Research on a Chinese Automatic Word Segmentation System Based on an N-Gram Statistical Model (基于N元语法的汉语自动分词系统研究)  Cited by: 2

Authors: Shi Jia (石佳)[1], Cai Wandong (蔡皖东)[1]

Affiliation: [1] School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710072, China

Source: Microelectronics & Computer (《微电子学与计算机》), 2009, No. 7, pp. 98-101 (4 pages)

Abstract: This paper presents a Chinese automatic word segmentation system based on N-gram statistical models. The method integrates segmentation with part-of-speech (POS) tagging and uses the tagging to help evaluate the segmentation results. First, the system generates the top N segmentation results as a candidate set, using a dictionary combined with a unigram statistical model. Next, it tags the candidates with parts of speech using a bigram statistical model. Finally, it uses contextual "understanding" information from the text to determine the best segmentation. Experimental results show that feedback from POS tagging effectively improves segmentation accuracy.
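The three stages described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the dictionary, all probabilities, the tag set (`n`, `v`, `u`), the function names, and the final scoring rule (adding the unigram and tagging log-probabilities) are assumptions invented for illustration, and the paper's contextual "understanding" step is not modeled.

```python
import heapq
import math

# Toy models, invented for illustration only (not taken from the paper).
UNIGRAM = {"研究": 0.05, "研究生": 0.06, "生命": 0.04, "命": 0.04,
           "生": 0.01, "的": 0.3, "起源": 0.02}          # P(word)
EMIT = {"研究": {"v": 0.1, "n": 0.05}, "研究生": {"n": 0.05},
        "生命": {"n": 0.1}, "命": {"n": 0.02}, "生": {"v": 0.02},
        "的": {"u": 0.9}, "起源": {"n": 0.05}}           # P(word | tag)
TRANS = {("<s>", "v"): 0.3, ("<s>", "n"): 0.5, ("v", "n"): 0.6,
         ("n", "n"): 0.1, ("n", "u"): 0.3, ("u", "n"): 0.7,
         ("n", "v"): 0.3, ("v", "v"): 0.1, ("v", "u"): 0.1}  # P(tag | prev)
FLOOR = 1e-6  # probability assumed for unseen tag transitions


def n_best_segmentations(text, n=3, max_len=4):
    """Stage 1: top-n dictionary segmentations ranked by unigram log-prob."""
    # best[i] holds up to n (log_prob, words) pairs covering text[:i].
    best = [[] for _ in range(len(text) + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in UNIGRAM:
                for score, words in best[start]:
                    candidates.append(
                        (score + math.log(UNIGRAM[word]), words + [word]))
        best[end] = heapq.nlargest(n, candidates, key=lambda c: c[0])
    return best[len(text)]


def tag_bigram(words):
    """Stage 2: Viterbi search for the best tag sequence under a tag bigram."""
    # paths[tag] = (best log-prob of a tag sequence ending in tag, that sequence)
    paths = {"<s>": (0.0, [])}
    for word in words:
        new_paths = {}
        for tag, p_emit in EMIT.get(word, {}).items():
            score, tags = max(
                (s + math.log(TRANS.get((prev, tag), FLOOR)), t)
                for prev, (s, t) in paths.items())
            new_paths[tag] = (score + math.log(p_emit), tags + [tag])
        if not new_paths:  # word has no known tag in the toy lexicon
            return float("-inf"), []
        paths = new_paths
    return max(paths.values())


def segment(text, n=3):
    """Stage 3 (assumed rule): rerank candidates by unigram + tagging score."""
    scored = []
    for seg_score, words in n_best_segmentations(text, n):
        tag_score, tags = tag_bigram(words)
        scored.append((seg_score + tag_score, words, tags))
    if not scored:
        return [], []
    _, words, tags = max(scored, key=lambda c: c[0])
    return words, tags


if __name__ == "__main__":
    print(n_best_segmentations("研究生命的起源")[0][1])
    print(segment("研究生命的起源"))
```

With these toy numbers, the unigram model alone ranks 研究生/命/的/起源 first, while adding the tagging score promotes 研究/生命/的/起源 (tagged v, n, u, n). That is the kind of feedback from POS tagging to segmentation the abstract describes, shown here only on invented data.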

Keywords: unigram model; bigram model; Chinese word segmentation; part-of-speech tagging

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
