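Affiliations: [1] School of Literature, Ludong University, Yantai, Shandong 264025 [2] Key Laboratory of Computational Linguistics (Ministry of Education), Peking University, Beijing 100871 [3] Collaborative Innovation Center for Language Ability, Xuzhou, Jiangsu 221009
Source: Journal of Chinese Information Processing (《中文信息学报》), 2017, No. 5, pp. 1-7, 20 (8 pages in total)
Funding: National Natural Science Foundation of China (61572245); National Key Basic Research Program of China (2014CB340504); National Social Science Foundation of China (15BYY094)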
Abstract: Part-of-speech annotation has long drawn attention from Chinese information processing, Chinese grammar, and lexicology alike. Scholars have proposed multiple part-of-speech tagging schemes that differ considerably from one another, yet no systematic comparison of large-scale part-of-speech annotation projects has been made so far. This paper compares two large dictionary annotation projects: 《现代汉语词典》 (Dictionary of Contemporary Chinese), 5th edition, and 《现代汉语语法信息词典》 (Grammatical Knowledge-Base Dictionary). Using a proposed part-of-speech mapping algorithm, it automatically detects the differences between the two dictionaries' part-of-speech annotations and then analyzes their causes. The results show that the two dictionaries are highly consistent (83.5% of annotations are identical), and that the differences can be attributed to three main causes: part-of-speech shift, inconsistent criteria for part-of-speech judgment, and differences in the senses included.
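Keywords: 现代汉语词典 (Dictionary of Contemporary Chinese); 现代汉语语法信息词典 (Grammatical Knowledge-Base Dictionary); part-of-speech annotation; part-of-speech mapping
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]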