On the Application of Log-linear Models to Part-of-Speech Tagging  Cited by: 1

Improved Part-of-speech Tagging by Log-linear Models


Authors: 王保芳 (Wang Baofang)[1], 张瑞强 (Zhang Ruiqiang)

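Affiliations: [1] Medical College, Henan University, Kaifeng, Henan 475004, China; [2] 日本先端基础技术研究所 (Advanced Basic Technology Research Institute), Kyoto, Japan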

Source: 《计算机科学》 (Computer Science), 2008, No. 5, pp. 163-166 (4 pages)

Abstract: Part-of-speech (POS) tagging is a long-standing problem in natural language understanding, yet tagging accuracy remains low when the tagset is large. This paper studies English POS tagging with a large tagset using two traditional approaches, the hidden Markov model (HMM) and the maximum entropy model, and on that basis proposes a log-linear model that integrates multiple probability models to improve tagging accuracy. For the HMM approach, a new smoothing method is introduced. For the maximum entropy approach, detailed local and long-range word and tag information is integrated as predictive features, which proved very effective for POS tagging. The log-linear model combines the HMM and the maximum entropy model, and the three approaches are compared. The experimental results show that the maximum entropy model integrating multiple information sources achieves higher tagging accuracy than the HMM, and that the largest improvement comes from the proposed log-linear model, which reaches a tagging accuracy of 81.52%.
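As an illustration of what combining probability models log-linearly can look like, the sketch below scores each candidate tag t as the weighted sum of log-probabilities, sum_i w_i * log p_i(t), and then normalizes over the tags. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `log_linear_combine`, the toy probabilities, and the weights are all hypothetical, and the abstract does not say how the weights were chosen or which component scores were used.

```python
import math


def log_linear_combine(model_probs, weights):
    """Combine per-tag probabilities from several models log-linearly.

    model_probs: list of dicts mapping tag -> probability, one dict per model.
                 Probabilities are assumed to be strictly positive (smoothed).
    weights:     list of floats, one weight per model (hypothetical values).
    Returns a dict mapping tag -> normalized combined probability.
    """
    # Only tags scored by every model can be combined.
    tags = set(model_probs[0])
    for probs in model_probs[1:]:
        tags &= set(probs)

    # Unnormalized log-linear score: weighted sum of log-probabilities.
    scores = {
        tag: sum(w * math.log(p[tag]) for w, p in zip(weights, model_probs))
        for tag in tags
    }

    # Normalize with log-sum-exp for numerical stability.
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {tag: math.exp(s - log_z) for tag, s in scores.items()}


# Toy distributions for a single word; the numbers are invented for illustration.
hmm_probs = {"NN": 0.6, "VB": 0.3, "JJ": 0.1}
maxent_probs = {"NN": 0.3, "VB": 0.6, "JJ": 0.1}

combined = log_linear_combine([hmm_probs, maxent_probs], weights=[0.4, 0.6])
best_tag = max(combined, key=combined.get)
```

With these toy inputs the maximum entropy model carries the larger weight, so its preferred tag wins the combination. In practice the weights would be tuned on held-out data, and the combination would be applied inside a sequence decoder rather than to one word in isolation.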

Keywords: log-linear model; maximum entropy model; part-of-speech tagging; natural language understanding

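Classification: TP391 [Automation and Computer Technology: Computer Application Technology]; O212 [Automation and Computer Technology: Computer Science and Technology]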

 
