Robust Pronominal Resolution within Chinese Text (鲁棒性的汉语人称代词消解)  Cited by: 36


Authors: WANG Houfeng (王厚峰)[1], MEI Zheng (梅铮)[1]

Affiliation: [1] Department of Computer Science and Technology, Peking University, Beijing 100871

Source: Journal of Software (《软件学报》), 2005, No. 5, pp. 700-707 (8 pages)

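Funding: National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program)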

Abstract: Anaphora resolution plays an increasingly important role in natural language processing, and many NLP applications need efficient, robust resolution strategies. Traditional approaches, however, rely on multiple levels of knowledge (syntactic, semantic, contextual, and even domain knowledge), which is difficult to acquire reliably at the current state of the art. Taking the characteristics of Chinese into account, this paper proposes a knowledge-poor method for resolving personal pronouns that uses only three features: number, gender, and grammatical role. The method has two steps. First, simple constraints over these three features filter out expressions whose features are inconsistent with the pronoun, leaving a set of possible antecedent candidates. Second, a scoring algorithm computes scores for the candidates, and the highest-scoring candidate is selected as the antecedent. The scoring algorithm does not score every candidate exhaustively; a dynamic evaluation mechanism stops the computation automatically once a termination condition is met, which keeps the computational complexity under control. The method also requires no deep analysis of the text and is easy to implement. The authors report that the method achieved satisfactory results in testing.

Keywords: personal pronoun resolution; antecedent; feature; filtering; scoring algorithm

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
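
The two-step procedure described in the abstract (feature-based filtering, then scoring with early termination) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the weights, the scoring formula, or the termination condition, so `ROLE_WEIGHT`, `DISTANCE_PENALTY`, the `Mention` fields, and the upper-bound stopping test below are all hypothetical choices made only for illustration. The sketch also assumes that number and gender act as hard filters while grammatical role feeds the score.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    text: str
    number: Optional[str]  # "sg", "pl", or None when unknown
    gender: Optional[str]  # "m", "f", or None when unknown
    role: str              # "subj", "obj", or "other"
    distance: int          # sentences back from the pronoun (0 = same sentence)


# Hypothetical weights; the abstract does not state the real values.
ROLE_WEIGHT = {"subj": 3.0, "obj": 2.0, "other": 1.0}
MAX_ROLE = max(ROLE_WEIGHT.values())
DISTANCE_PENALTY = 1.0


def compatible(a: Optional[str], b: Optional[str]) -> bool:
    """Unknown feature values are treated as compatible with anything."""
    return a is None or b is None or a == b


def filter_candidates(pronoun: Mention, mentions: List[Mention]) -> List[Mention]:
    """Step 1: drop mentions whose number or gender conflicts with the pronoun."""
    return [m for m in mentions
            if compatible(m.number, pronoun.number)
            and compatible(m.gender, pronoun.gender)]


def score(m: Mention) -> float:
    """Assumed scoring: grammatical-role weight minus a distance penalty."""
    return ROLE_WEIGHT.get(m.role, 1.0) - DISTANCE_PENALTY * m.distance


def resolve(pronoun: Mention, mentions: List[Mention]) -> Optional[Mention]:
    """Step 2: score candidates nearest-first and stop early when possible."""
    candidates = sorted(filter_candidates(pronoun, mentions),
                        key=lambda m: m.distance)
    best, best_score = None, float("-inf")
    for m in candidates:
        # Best score any candidate at this distance or farther could reach.
        upper_bound = MAX_ROLE - DISTANCE_PENALTY * m.distance
        if best is not None and upper_bound <= best_score:
            break  # no remaining candidate can beat the current best
        s = score(m)
        if s > best_score:
            best, best_score = m, s
    return best


# Usage: resolve a feminine singular pronoun against five earlier mentions.
pronoun = Mention("ta", "sg", "f", "subj", 0)
mentions = [
    Mention("Xiao Ming", "sg", "m", "subj", 0),   # removed: gender conflict
    Mention("laoshimen", "pl", None, "obj", 0),   # removed: number conflict
    Mention("Xiao Hong", "sg", "f", "obj", 1),
    Mention("Li laoshi", "sg", "f", "subj", 1),
    Mention("Wang ayi", "sg", "f", "subj", 3),    # never scored: bound too low
]
result = resolve(pronoun, mentions)
print(result.text if result else None)
```

With these assumed weights the sketch selects "Li laoshi", and the loop stops before scoring the mention three sentences back because its best possible score cannot exceed the current best. Sorting candidates nearest-first is what makes the bound monotone and the early exit safe.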

 
