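Authors: LIN Chun (林椿)[1]; XIAO Yunnan (肖云南)[1]
Affiliation: [1] Hunan University
Source: Foreign Languages in China (《中国外语》), 2018, No. 5, pp. 72-84 (13 pages)
Funding: Supported by the Scientific Research Project of the Hunan Provincial Department of Education, "Research on English Education Performance Based on Dynamic Assessment" (15C1438), and the Hunan Provincial Philosophy and Social Science Fund project "Research on an ESL Placement Test System Based on Cognitive Diagnosis Theory" (16YBA392).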
Abstract: To explore the differences in scoring reliability and rater behavior between native Chinese speaking (NCS) raters and native English speaking (NES) raters, Generalizability Theory (GT) and the many-facet Rasch model (MFRM) were applied to analyze their holistic scores for 448 English writing samples written by Chinese university students. The results were as follows: (1) Raters' mother tongue background had a significant impact on their rating of English writing; with NCS raters, two raters were enough for both the generalizability coefficient and the dependability coefficient to reach 0.9 or above, whereas with NES raters, three raters were needed for both coefficients to reach 0.7 or above. (2) Raters showed sound intra-rater consistency, but they differed significantly from each other in severity; NES raters tended to be severe towards examinees in the low, medium, and high level groups and lenient towards the examinee with the highest ability; NCS raters tended to be lenient towards the high level group and severe towards the examinee with the lowest ability. GT and MFRM showed, from the macro and micro levels respectively, that the scoring quality of NCS raters did not differ from that of NES raters in terms of central tendency, while the scoring quality of NCS raters was somewhat higher in terms of reliability coefficients, rater consistency, grasp of the rating scale, and rater-examinee interaction.