This short paper examines the inter-rater reliability of native and nonnative raters in their assessment of L2 Arabic speech produced by American learners. It was predicted that ratings provided by native speakers of Arabic would be more consistent, and show less variance, than ratings provided by nonnative speakers of Arabic. In a rating experiment, native and nonnative raters evaluated the “nativeness” of American learners’ production of Arabic guttural consonants. Pearson’s correlation coefficients show strong, statistically significant inter-rater reliability in the judgments of the native raters, and poor, statistically non-significant reliability in the judgments of the nonnative raters. The findings also indicate that, overall, the native and nonnative rater groups produced comparable ratings, although no strong correlation could be established.
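As an illustration of the statistic named above, the following minimal sketch computes Pearson's r between two raters' scores and averages it over all rater pairs within a group. It is not the procedure used in the study: the function names, the rating scale, and the example scores are all hypothetical, and significance testing is omitted.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson's correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two rating lists of equal length (>= 2)")
    mx, my = mean(x), mean(y)
    # Sum of cross-products of deviations from each rater's mean.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a rater with constant ratings has no defined r")
    return cov / (sx * sy)


def mean_pairwise_r(ratings_by_rater):
    """Average Pearson's r over every pair of raters in one group."""
    pairs = combinations(ratings_by_rater.values(), 2)
    return mean(pearson_r(a, b) for a, b in pairs)


# Hypothetical "nativeness" scores given by three raters to the same
# six speech tokens (invented for illustration, not data from the study).
example_group = {
    "rater_a": [1, 2, 4, 5, 3, 2],
    "rater_b": [2, 2, 5, 5, 3, 1],
    "rater_c": [1, 3, 4, 4, 3, 2],
}

if __name__ == "__main__":
    print(round(mean_pairwise_r(example_group), 3))
```

Averaging pairwise coefficients is only one way to summarize agreement within a group. With more than two raters, an intraclass correlation is a common alternative.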