Affiliation: [1] School of Computer Science, Northwestern Polytechnical University, Xi'an, Shaanxi 710072
Source: Microelectronics & Computer (《微电子学与计算机》), 2009, No. 7, pp. 98-101 (4 pages)
Abstract: This paper presents an automatic Chinese word segmentation system based on an N-gram statistical model. The system combines segmentation with part-of-speech (POS) tagging and uses the tagging to help evaluate the segmentation results. First, it generates the N best segmentations as a candidate set, using a dictionary together with a unigram statistical model. Next, it tags each candidate using a bigram statistical model. Finally, it selects the best segmentation using contextual "understanding" information from the text. Experiments show that feedback from POS tagging effectively improves segmentation accuracy.
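The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lexicon, the probabilities, the tag set, and the function names (`n_best_segmentations`, `tag_score`, `segment`) are all hypothetical, and the final selection rule (adding the unigram and tagging log-probabilities) is an assumption, since the abstract does not state how the contextual information is combined.

```python
import math

# Hypothetical toy lexicon: word -> (unigram probability, {tag: P(word | tag)}).
LEXICON = {
    "研究": (0.02, {"v": 0.05, "n": 0.01}),
    "研究生": (0.03, {"n": 0.02}),
    "生命": (0.02, {"n": 0.03}),
    "命": (0.02, {"n": 0.005}),
    "起源": (0.02, {"n": 0.02}),
}
# Hypothetical tag bigram probabilities P(tag | previous tag); "<s>" is sentence start.
TRANS = {
    ("<s>", "v"): 0.4, ("<s>", "n"): 0.6,
    ("v", "n"): 0.8, ("v", "v"): 0.2,
    ("n", "n"): 0.3, ("n", "v"): 0.7,
}


def n_best_segmentations(text, n=3, max_len=4):
    """Stage 1: dictionary + unigram model, keeping the n best paths per position."""
    # best[i] holds up to n (log_prob, words) pairs that cover text[:i].
    best = [[] for _ in range(len(text) + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in LEXICON:
                logp = math.log(LEXICON[word][0])
                for score, words in best[start]:
                    candidates.append((score + logp, words + [word]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        best[end] = candidates[:n]
    return best[len(text)]


def tag_score(words):
    """Stage 2: Viterbi search over a bigram tag model; returns (log_prob, tags)."""
    paths = {"<s>": (0.0, [])}
    for word in words:
        new_paths = {}
        for tag, emit in LEXICON[word][1].items():
            options = [
                (score + math.log(TRANS[(prev, tag)]) + math.log(emit), tags + [tag])
                for prev, (score, tags) in paths.items()
                if (prev, tag) in TRANS
            ]
            if options:
                new_paths[tag] = max(options, key=lambda o: o[0])
        paths = new_paths
    if not paths:
        return (-math.inf, [])
    return max(paths.values(), key=lambda p: p[0])


def segment(text, n=3):
    """Stage 3 (assumed rule): pick the candidate with the best combined score."""
    scored = []
    for seg_score, words in n_best_segmentations(text, n):
        pos_score, tags = tag_score(words)
        scored.append((seg_score + pos_score, words, tags))
    return max(scored, key=lambda s: s[0])


if __name__ == "__main__":
    for score, words in n_best_segmentations("研究生命起源"):
        print(round(score, 3), "/".join(words))
    print(segment("研究生命起源")[1:])
```

With these made-up numbers, the unigram stage alone ranks 研究生/命/起源 above 研究/生命/起源, while the tagging stage reverses the order, which is the kind of feedback effect the abstract reports.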
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]