Japanese word sense disambiguation system based on deep feature extraction

Authors: Lei Xuemei (雷雪梅)[1], Wang Daliang (王大亮)[2], Tanaka Takaaki (田中贵秋), Zeng Guangping (曾广平)[1]

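Affiliations: [1] School of Information Engineering, University of Science and Technology Beijing, Beijing 100083, China; [2] China Telecom Group System Integration Co., Beijing 100035, China; [3] Natural Language Research Group, NTT Communication Science Laboratories, Kyoto 619-0237, Japan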

Source: Journal of University of Science and Technology Beijing, 2010, No. 2, pp. 263-269 (7 pages)

Funding: National High Technology Research and Development Program of China (863 Program) (No. 2007AA01Z170)

Abstract: The features for word sense disambiguation (WSD) are drawn from the context. Because Japanese shares linguistic characteristics with both Chinese and English, feature extraction for Japanese is more complicated. To handle these characteristics, this paper builds on a WSD logic model and exploits the maximum entropy model's strength in integrating information from multiple sources, adopting a deep feature extraction method that introduces semantic and syntactic features for disambiguation. In addition, to avoid skewed sense assignments to individual words, word sense tagging over word sequences is carried out with the Beam Search algorithm. Experimental results show that, compared with methods that use only surface lexical features, the Japanese WSD system constructed in this paper improves overall disambiguation accuracy by 2% to 3% and accuracy on verbs by 5%.
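The abstract combines two standard components: a maximum entropy (log-linear) classifier that scores each candidate sense from a set of surface, syntactic and semantic features, and beam search over the word sequence so that related sense choices are made jointly rather than one word at a time. The Python sketch below illustrates that combination under stated assumptions; it is not the authors' system. The sense inventory (`SENSES`), the thesaurus classes (`SEM_CLASS`), the feature templates and all weights in `WEIGHTS` are hypothetical, and treating the noun before を as the verb's object is only a crude stand-in for real syntactic analysis.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sense inventory; words not listed are treated as monosemous.
SENSES: Dict[str, List[str]] = {
    "電話": ["device", "call"],
    "眼鏡": ["glasses"],
    "を": ["acc"],
    "かける": ["hang", "make_call", "wear"],
}

# Hypothetical thesaurus mapping nouns to semantic classes (semantic features).
SEM_CLASS: Dict[str, str] = {"電話": "communication", "眼鏡": "accessory"}

# Hypothetical weights lambda for (feature, sense) pairs; a real system would
# estimate them from a sense-tagged corpus (e.g. with GIS or L-BFGS).
WEIGHTS: Dict[Tuple[str, str], float] = {
    ("w=電話", "device"): 0.2,
    ("w=電話", "call"): 0.6,
    ("w=かける", "hang"): 0.5,
    ("obj_class=communication", "make_call"): 1.5,
    ("obj_class=accessory", "wear"): 2.0,
    ("obj_sense=call", "make_call"): 0.5,
}


def extract_features(words: List[str], i: int, senses_so_far: List[str]) -> List[str]:
    """Surface, shallow-syntactic and semantic features for words[i]."""
    prev_w = words[i - 1] if i > 0 else "<s>"
    next_w = words[i + 1] if i + 1 < len(words) else "</s>"
    feats = [f"w={words[i]}", f"prev_w={prev_w}", f"next_w={next_w}"]
    # Crude syntactic cue (assumption): a word directly followed by the
    # particle を is taken as the object of words[i].
    if i >= 2 and words[i - 1] == "を":
        obj = words[i - 2]
        feats.append(f"obj={obj}")
        if obj in SEM_CLASS:
            feats.append(f"obj_class={SEM_CLASS[obj]}")
        # Sense already chosen for the object within this hypothesis.
        feats.append(f"obj_sense={senses_so_far[i - 2]}")
    return feats


def maxent_probs(feats: List[str], candidates: List[str]) -> Dict[str, float]:
    """p(s|x) = exp(sum_i lambda_i f_i(x, s)) / Z(x) over the candidate senses."""
    scores = {s: sum(WEIGHTS.get((f, s), 0.0) for f in feats) for s in candidates}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {s: math.exp(v - m) for s, v in scores.items()}
    z = sum(exps.values())
    return {s: e / z for s, e in exps.items()}


def beam_search(words: List[str], k: int = 2) -> Tuple[List[str], float]:
    """Keep the k highest-scoring partial sense sequences at each position."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for i, w in enumerate(words):
        candidates = SENSES.get(w, [w])
        expanded = []
        for logp, seq in beams:
            probs = maxent_probs(extract_features(words, i, seq), candidates)
            for s, p in probs.items():
                expanded.append((logp + math.log(p), seq + [s]))
        beams = sorted(expanded, key=lambda b: b[0], reverse=True)[:k]
    best_logp, best_seq = beams[0]
    return best_seq, math.exp(best_logp)


if __name__ == "__main__":
    for sent in (["電話", "を", "かける"], ["眼鏡", "を", "かける"]):
        seq, p = beam_search(sent, k=2)
        print(sent, seq, round(p, 3))
```

With `k=2`, this sketch tags かける as `make_call` in 電話をかける ("make a phone call") and as `wear` in 眼鏡をかける ("put on glasses"). Because the `obj_sense` feature depends on the sense already chosen for the object noun, each hypothesis in the beam is scored in its own context, which is what separates joint sequence tagging from tagging every word in isolation.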

Keywords: natural language processing; word sense disambiguation; maximum entropy model; feature extraction

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
