Author: 袁毓林 YUAN Yulin
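Affiliations: [1] Department of Chinese Language and Literature, Faculty of Arts and Humanities, University of Macau, Macau, China 999078; [2] Department of Chinese Language and Literature and Center for Chinese Linguistics, Peking University, Beijing 100871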
Source: Chinese Linguistics (《汉语学报》), 2024, Issue 4, pp. 2-16 (15 pages)
Funding: Supported by the University of Macau Chair Professor Research and Development Fund (No. CPG2024-00005-FAH) and the Start-up Research Grant (No. SRG2022-00011-FAH).
Abstract: This paper discusses how the language performance of ChatGPT and other modern large language models (LLMs) can be properly evaluated by comparing it against a human baseline. First, ChatGPT is tested on pronominal reference ambiguity and the scope of negation, demonstrating the strong performance of LLMs in semantic understanding and commonsense reasoning. Second, the paper briefly introduces the Winograd Schema Challenge (WSC) and its upgraded version, the WinoGrande dataset, and presents our proposals for improving this type of test item and the way machine performance on it is evaluated: (a) extending "sentence pairs", which differ only in their trigger word, to "sentence couples", which also differ in their anchor word; and (b) comparing machine performance with the performance of human subjects. Third, the paper describes how sentence pairs and sentence couples were used to test both ChatGPT and human subjects, and contrasts the performance of humans and the machine. On this basis, it concludes that the language performance of large language models is approaching that of humans.
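The pair-versus-couple evaluation summarized in the abstract can be illustrated with a small scoring sketch. This is a minimal illustration under our own assumptions, not the author's procedure: a sentence pair is assumed to be two sentences that differ only in the trigger word (here "too large" / "too small" in the classic Winograd trophy/suitcase schema), a sentence couple is assumed to be two such pairs that additionally differ in an anchor word (here, hypothetically, "doesn't fit" vs "fits"), and a group is counted as correct only if every sentence in it is resolved correctly. The names `Item`, `group_accuracy`, `compare_to_baseline` and all predictions are hypothetical toy data, not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    pair_id: str    # hypothetical: shared by the 2 sentences of a pair
    couple_id: str  # hypothetical: shared by the 4 sentences of a couple
    sentence: str
    answer: str     # gold referent of the ambiguous pronoun "it"


def sentence_accuracy(items, predictions):
    """Fraction of individual sentences whose referent is resolved correctly."""
    return sum(p == it.answer for it, p in zip(items, predictions)) / len(items)


def group_accuracy(items, predictions, key):
    """Fraction of groups in which EVERY sentence is resolved correctly."""
    groups = defaultdict(list)
    for it, p in zip(items, predictions):
        groups[key(it)].append(p == it.answer)
    return sum(all(flags) for flags in groups.values()) / len(groups)


def compare_to_baseline(items, model_preds, human_preds):
    """Return {metric: (model_score, human_score)} for three scoring levels."""
    metrics = {
        "sentence": lambda p: sentence_accuracy(items, p),
        "pair": lambda p: group_accuracy(items, p, key=lambda it: it.pair_id),
        "couple": lambda p: group_accuracy(items, p, key=lambda it: it.couple_id),
    }
    return {name: (f(model_preds), f(human_preds)) for name, f in metrics.items()}


# Hypothetical toy couple: pair p1 varies the trigger word; pair p2 also
# changes an (assumed) anchor word, "doesn't fit" -> "fits".
ITEMS = [
    Item("p1", "c1", "The trophy doesn't fit into the brown suitcase because it is too large.", "trophy"),
    Item("p1", "c1", "The trophy doesn't fit into the brown suitcase because it is too small.", "suitcase"),
    Item("p2", "c1", "The trophy fits into the brown suitcase because it is so small.", "trophy"),
    Item("p2", "c1", "The trophy fits into the brown suitcase because it is so large.", "suitcase"),
]
MODEL_PREDS = ["trophy", "suitcase", "trophy", "trophy"]    # toy: last answer wrong
HUMAN_PREDS = ["trophy", "suitcase", "trophy", "suitcase"]  # toy human baseline

if __name__ == "__main__":
    for name, (m, h) in compare_to_baseline(ITEMS, MODEL_PREDS, HUMAN_PREDS).items():
        print(f"{name:>8}: model={m:.2f} human={h:.2f} gap={m - h:+.2f}")
```

With these toy predictions a single error leaves sentence-level accuracy at 0.75 but drops pair-level accuracy to 0.50 and couple-level accuracy to 0.00, which shows why grouped scoring is a stricter test. In a real study the human column would aggregate many subjects rather than one list of answers.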
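Keywords: semantic understanding/commonsense reasoning; ChatGPT/large language models; Winograd schema/sentence pairs and sentence couples; machine performance/human baseline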
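CLC numbers: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]; H08 [Automation and Computer Technology: Control Science and Engineering]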