Recognition of Chinese Inner Maximal Noun Phrase


Author: QIAN Xiaofei (钱小飞)[1]

Affiliation: [1] College of Liberal Arts, Shanghai University, Shanghai 200444, China

Source: Journal of Zhejiang International Studies University (《浙江外国语学院学报》), 2019, No. 6, pp. 59-67 (9 pages)

Abstract: Chinese noun phrases have complex internal structures. Identifying the maximal noun constituents nested inside a noun phrase helps resolve low-level syntactic ambiguity and uncover argument structures and semantic relations. This paper analyzes the multi-level distribution of inner maximal noun phrases and identifies data sparseness, structural ambiguity, and boundary ambiguity as the main difficulties for recognition. It proposes a recognition method that combines a Conditional Random Field (CRF) model with promotion rules based on nominal base chunks, achieving a structural precision of 85.23% and a structural recall of 78.71%. The experimental results show that recognition is limited mainly by ambiguity arising from misrecognized higher-level structures, coordinate structures, the "v n n" pattern, subject-predicate structures following "De", and special ambiguous sequences. Solving these problems requires more syntactic and semantic knowledge, for example listing simple chunks that contain verbs in the lexicon and, at the syntactic level, introducing a mechanism that verifies candidates against syntactic rules.
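
The abstract reports results as structural precision and structural recall, but does not spell out the matching criterion. The sketch below assumes the common exact-match convention: a recognized inner maximal noun phrase counts as correct only when both its left and right boundaries coincide with a gold-standard phrase, so a single boundary error is penalized in both precision and recall. The function name `structure_scores` and the `(sentence_id, start, end)` span representation are hypothetical illustrations, not the author's evaluation code; F1 is included for convenience and is not a figure reported in the abstract.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span representation: (sentence_id, start, end),
# with token offsets and an exclusive end position.
Span = Tuple[int, int, int]


def structure_scores(
    gold: Iterable[Span], predicted: Iterable[Span]
) -> Tuple[float, float, float]:
    """Micro-averaged exact-match precision, recall and F1 over phrase spans.

    Assumption: a predicted phrase is correct only if a gold phrase in the
    same sentence has exactly the same start and end boundaries.
    """
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    correct = len(gold_set & pred_set)  # spans matching on both boundaries
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Sentence 0 contains a boundary error: (0, 5, 7) vs. gold (0, 5, 8).
    gold = [(0, 0, 3), (0, 5, 8), (1, 2, 4)]
    pred = [(0, 0, 3), (0, 5, 7), (1, 2, 4)]
    p, r, f = structure_scores(gold, pred)
    print(f"P={p:.2%} R={r:.2%} F1={f:.2%}")  # prints P=66.67% R=66.67% F1=66.67%
```

Under this convention the right-boundary error in sentence 0 removes one correct match from both the numerator of precision and the numerator of recall, which is one way boundary ambiguity shows up directly in the reported scores.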

Keywords: inner maximal noun phrase; recognition; conditional random field; nominal base chunk promotion

CLC number: H087 [Language and Writing: Linguistics]
