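Affiliations: [1] School of Medicine, Henan University, Kaifeng, Henan 475004, China [2] Institute of Advanced Basic Technology (日本先端基础技术研究所), Kyoto, Japan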
Source: Computer Science (《计算机科学》), 2008, No. 5, pp. 163-166 (4 pages)
Abstract: Part-of-speech (POS) tagging is a long-standing problem in natural language understanding, yet tagging accuracy remains low when the tagset is large. We therefore studied large-tagset POS tagging with the hidden Markov model (HMM) and the maximum entropy method, and on that basis propose a log-linear model approach to improve tagging accuracy. For the HMM, we propose a new smoothing algorithm; for the maximum entropy model, we integrate detailed local and long-range contextual features; the log-linear model integrates the HMM and the maximum entropy model, and we compare the approaches. The results show that the log-linear model, which combines multiple information sources, reaches a tagging accuracy of 81.52%, better than the traditional HMM.
English abstract: This paper presents our latest approach to improving English part-of-speech tagging with a large tagset by using a log-linear model. We found that integrating multiple probability models log-linearly led to significant improvements in part-of-speech tagging. We compared the proposed approach with two traditional approaches, the hidden Markov model (HMM) and the maximum entropy principle. The HMM approach was implemented with a new smoothing method, while for the maximum entropy approach we integrated detailed local and long-range word and tag information as predictive features. These predictive features proved very effective for part-of-speech tagging. The experimental results showed that the maximum entropy model integrating multiple sources of information achieved higher POS tagging accuracy than the HMM models; the largest improvements, however, were achieved by the proposed log-linear models.
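The log-linear integration mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `log_linear_combine`, the tag probabilities, and the weights 0.4 and 0.6 are all hypothetical, and the sketch combines per-tag probabilities for a single word only, rather than scoring full tag sequences.

```python
import math


def log_linear_combine(model_probs, weights):
    """Combine per-tag probabilities from several models log-linearly.

    model_probs: list of dicts mapping tag -> probability (one dict per model).
    weights: list of floats, one per model (hypothetical values, not from the paper).
    Returns a dict mapping tag -> normalized combined probability.
    """
    # Only score tags that every model assigns a probability to.
    tags = set(model_probs[0])
    for probs in model_probs[1:]:
        tags &= set(probs)

    floor = 1e-12  # avoids log(0) for tags a model considers impossible
    scores = {
        tag: sum(w * math.log(max(p[tag], floor))
                 for w, p in zip(weights, model_probs))
        for tag in tags
    }

    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {tag: math.exp(s - max_score) for tag, s in scores.items()}
    z = sum(exp_scores.values())
    return {tag: v / z for tag, v in exp_scores.items()}


# Made-up distributions standing in for an HMM and a maximum entropy tagger.
hmm = {"NN": 0.6, "VB": 0.3, "JJ": 0.1}
maxent = {"NN": 0.3, "VB": 0.6, "JJ": 0.1}

combined = log_linear_combine([hmm, maxent], [0.4, 0.6])
best_tag = max(combined, key=combined.get)
```

In this toy case the maximum entropy model carries the larger weight, so its preferred tag wins. In practice the weights would be tuned on held-out data; the abstract does not say how the authors set them.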
Keywords: log-linear model; maximum entropy model; part-of-speech tagging; natural language understanding
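Classification: TP391 [Automation and Computer Technology: Computer Application Technology]; O212 [Automation and Computer Technology: Computer Science and Technology]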