%0 Journal Article
%T 托福iBT口语考试的评分等级评估——基于中国考生的数据 (Evaluating the Rating Scale of the TOEFL iBT Speaking Test—Based on Data from Chinese Test Takers)
%A 汪雯
%A 许丹
%A 钟舒婷
%J Overseas English Testing: Pedagogy and Research
%P 36-46
%@ 2643-5470
%D 2025
%I Hans Publishing
%R 10.12677/oetpr.2025.71005
%X The increasing number of Chinese students studying in North America has highlighted the need to evaluate their English proficiency, particularly in high-stakes language tests like the TOEFL iBT. The TOEFL speaking test is often a challenge for these students, and their consistently low speaking scores have prompted discussion of the underlying causes, including the rating scale descriptors used to assess their performances. This paper assesses the degree to which these descriptors, particularly those for the integrated speaking tasks, accurately reflect the linguistic features produced by Chinese test takers. Using empirical data, the study focuses on Task 2 of the TOEFL speaking test, which combines reading, listening, and speaking skills. Recorded spoken responses from four Chinese students enrolled in TOEFL preparatory courses were analyzed; every response had received a score of 3 out of 4. The analysis is grounded in a content analysis framework that examines the three sub-constructs of the TOEFL speaking test's rating scale: delivery, language use, and topic development. The findings highlight several discrepancies between the linguistic features exhibited by the test takers and the descriptors in the TOEFL rating scale. In the area of delivery, the linguistic features of the students' responses (such as fluency, pronunciation, and intonation) were only partially consistent with the corresponding descriptors. Although most responses were intelligible, the students did not show the expected fluency. For example, the fluency of the responses did not fully reach the "fluidity" expected in the descriptors, which suggests that fluency and intelligibility may not always be related. This inconsistency may indicate that the two constructs should be assessed separately, since they may reflect different aspects of speaking performance. In the area of language use, the four responses differed in several respects in their grammar and vocabulary. While each response showed some lexical accuracy, there were notable problems with grammatical accuracy and sentence complexity, and these problems often led to incomplete or inaccurate communication. The descriptor for this sub-construct mentions "some imprecise or inaccurate use of vocabulary or grammatical structures" but does not specify the permissible degree of error, leaving room for rater interpretation. This lack of specificity may lead to inconsistent scores when raters rely on subjective interpretation. The analysis of topic development also revealed problems with the rating descriptors. Although the responses were generally coherent, they often lacked specificity and completeness and did not fully meet the task requirements. Some choppy or disconnected ideas appeared in the responses, making it difficult for the researchers to match the linguistic features of the responses to the descriptors in the rating scale. The vagueness of the topic development descriptors (for example, "incomplete or inaccurate information") does make rating somewhat more convenient, but it may fail to reflect students' actual ability to develop a topic. The findings suggest that although the TOEFL iBT speaking descriptors provide a valuable framework for evaluating responses, there are mismatches between the descriptors and the linguistic features that test takers produce. These discrepancies may affect the accuracy and fairness of the rating process. The paper calls for further research in this area to ensure that students' speaking ability is assessed more fairly and validly.
%K TOEFL Speaking
%K Rating Scale
%K Test Validity
%K Test Reliability
%U http://www.hanspub.org/journal/PaperInformation.aspx?PaperID=108592