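Affiliations: [1] School of Foreign Languages, Dalian University of Technology, Dalian, Liaoning 116024 [2] School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024
Source: Journal of Chinese Information Processing (《中文信息学报》), 2016, No. 6, pp. 59-66 (8 pages)
Funding: Humanities and Social Sciences Research Planning Fund of the Ministry of Education (13YJAZH062)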
Abstract: Noun phrase identification plays an important role in syntactic parsing, and resolving noun-phrase ambiguity is one of the bottlenecks of English-Chinese machine translation. This study on the automatic identification of English functional noun phrases (NPs) recasts the task of resolving structural ambiguity caused by noun phrases as an NP chunking task. Functional noun phrases are noun phrases whose boundaries are defined by their syntactic functions in clauses. Using a business-domain corpus, the study identifies both the boundaries of NP chunks and their syntactic function types by refining the part-of-speech (POS) tagset and adopting a conditional random fields (CRFs) model combined with semantic information. The Penn Treebank tagset is refined during pre-processing, and semantic features are added to the CRFs model mainly to identify adverbial (adjunct) noun phrases. Test results show that the system achieves an F-score of 89.04% in the open test using gold-standard POS tags, and that refining the POS tagset helps NP identification, raising the F-score by 2.21% over the model using the Penn Treebank POS tags. Applying the functional noun phrase information to the NiuTrans statistical machine translation (SMT) system slightly improves English-Chinese translation quality.
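The F-score reported in the abstract is a chunk-level measure. The record does not say how the authors computed it, so the following is only a minimal sketch of the usual exact-match chunk evaluation, assuming a BIO tagging scheme. The function labels `SBJ`, `OBJ` and `ADV` and the helper names `extract_chunks` and `chunk_f1` are hypothetical and do not come from the paper.

```python
def extract_chunks(tags):
    """Return the set of (start, end, label) spans in a BIO tag sequence.

    `end` is exclusive. Labels such as "SBJ" are illustrative assumptions.
    """
    chunks = set()
    start, label = None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        # A chunk continues only on an I- tag that carries the same label.
        if tag.startswith("I-") and label is not None and tag[2:] == label:
            continue
        if label is not None:
            chunks.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    return chunks


def chunk_f1(gold_tags, pred_tags):
    """Precision, recall and F1 over chunks matching in both span and label."""
    gold = extract_chunks(gold_tags)
    pred = extract_chunks(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the last chunk has the right boundary but the wrong function.
gold = ["B-SBJ", "I-SBJ", "O", "B-OBJ", "I-OBJ", "O", "B-ADV", "I-ADV"]
pred = ["B-SBJ", "I-SBJ", "O", "B-OBJ", "I-OBJ", "O", "B-OBJ", "I-OBJ"]
precision, recall, f1 = chunk_f1(gold, pred)
```

Under this scoring, a chunk counts as correct only when both its boundary and its function label match, which fits the abstract's stated goal of identifying boundaries and syntactic functions together.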
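Keywords: functional noun phrases; noun phrase identification; conditional random fields model; semantic information
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]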