Affiliations: [1] Beijing Academy of Educational Sciences; [2] Beijing Normal University
Source: 《中国考试》 (Journal of China Examinations), 2015, No. 2, pp. 39-48, 10 pages in total
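Funding: An outcome of the 2012 Youth Special Project of the Beijing Educational Science "Twelfth Five-Year Plan", "Applied Research on the Cognitive Diagnostic Function of Academic Achievement Tests" (CHA12109)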
Abstract: Large-scale examinations currently score essays mostly under a double-marking model. This study uses the many-facet Rasch model (MFRM) to analyze the sources of rater error, and the main factors behind it, in large-scale English essay scoring under double marking. An analysis of 2,427 essays scored by 57 raters found that: ① raters differed significantly in severity; ② about 22.8% of raters showed poor inter-rater consistency, and about 3.5% showed excessively high inter-rater consistency; ③ about 90% of raters showed high intra-rater consistency, but 8.8% showed very poor intra-rater consistency and about 2% showed excessively high intra-rater consistency; ④ overall, raters differed in how severely they applied the different rating criteria (or dimensions) and the different rating-scale categories, while the rater-by-examinee interaction and the three-way rater-by-examinee-by-criterion interaction were not significant; ⑤ raters were equally severe toward male and female students.
Abstract (English version): This study investigates the extent to which second-language writing performance scores were influenced by rater effects in a large-scale assessment in China. Writing samples were obtained from 2427 (1491 female, 936 male) first-grade junior high school students. The 54 raters in this study were all experienced specialists in the field of teaching English as a second language. Each examinee was scored by two raters, assigned at random. Each writing sample was scored according to five criteria: ① Information, a 4-point scale used to measure content; ② Gracture, a 4-point scale used to evaluate the sentence; ③ Mechanics, a 3-point scale for the overall structure; ④ Length, a 2-point scale used to measure the number of words; and ⑤ Coherence, a 3-point scale for the expression. The MFRM analysis was completed using the Facets software. Three facets (persons, raters, and rating criteria) were analyzed based on the partial credit model. The findings indicated that ① raters differed in severity or leniency; ② some raters could not follow the rating scale consistently, while others could not stay close to their own scoring standard; ③ raters were able to maintain a constant level of severity across all examinees, but not across all five criteria; ④ there was no differential rater functioning related to examinee gender, meaning that raters maintained a consistent severity or leniency across male and female examinees. The MFRM study has a number of implications for rating issues in L2 writing assessment. Individual feedback can improve the efficiency of rater training and help ensure the objectivity and fairness of writing performance assessment.
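For orientation, a three-facet model of the kind the abstract names (persons, raters, rating criteria, in partial credit form) is commonly written as below. This is the standard textbook formulation, not an equation taken from the article, and the symbols are assumed notation.

```latex
% Assumed notation (not from the article):
%   P_{nijk}  probability that examinee n receives category k from rater j on criterion i
%   \theta_n  writing ability of examinee n
%   \alpha_j  severity of rater j
%   \delta_i  difficulty of criterion i
%   \tau_{ik} threshold between categories k-1 and k, specific to criterion i
\[
  \ln\frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \alpha_j - \delta_i - \tau_{ik}
\]
```

Letting the thresholds depend on the criterion (the index i on the threshold term) is what makes this the partial credit form, which suits criteria scored on scales of different lengths, such as the 4-, 3-, and 2-point scales listed in the abstract.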
Keywords: subjective-item scoring; many-facet Rasch model; rater error analysis
Classification number: G405 [Culture and Science: Principles of Education]