Analysis of Parts-of-speech Correspondence Between DCC and GKB (《现汉》与《语法信息词典》词类对应分析)  Cited by: 3

Authors: Qiu Likun (邱立坤) [1], Zhao Hui (赵慧), Yu Shiwen (俞士汶) [2,3], Zhu Xuefeng (朱学锋)

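Affiliations: [1] School of Chinese Language and Literature, Ludong University, Yantai, Shandong 264025; [2] Key Laboratory of Computational Linguistics (Ministry of Education), Peking University, Beijing 100871; [3] Collaborative Innovation Center for Language Ability, Xuzhou, Jiangsu 221009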

Source: Journal of Chinese Information Processing (《中文信息学报》), 2017, No. 5, pp. 1-7, 20 (8 pages)

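Funding: National Natural Science Foundation of China (61572245); National Basic Research Program of China (2014CB340504); National Social Science Foundation of China (15BYY094)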

Abstract: Part-of-speech annotation has long attracted attention from Chinese information processing, Chinese grammar studies, and lexicology. Multiple part-of-speech tagging systems have been proposed, and they differ considerably from one another, yet so far no one has systematically compared large-scale part-of-speech annotation projects. This paper compares two such projects, the Dictionary of Contemporary Chinese (《现代汉语词典》, 5th edition) and the Grammatical Knowledge-Base Dictionary (《现代汉语语法信息词典》). Using a proposed part-of-speech mapping algorithm, it automatically detects the differences between the part-of-speech annotations of the two dictionaries and then analyzes their causes. The analysis shows that the two dictionaries agree to a high degree (83.5% of annotations are identical), and that the differences can be attributed to three main causes: part-of-speech shifting, inconsistent criteria for determining part of speech, and differences in the senses each dictionary records.

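Keywords: Dictionary of Contemporary Chinese (现代汉语词典); Grammatical Knowledge-Base Dictionary (现代汉语语法信息词典); part-of-speech annotation; part-of-speech correspondence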

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
