Research and Application of a Chinese Word Segmentation Model Combining Dictionary and Statistical Methods (cited by: 18)

Analysis and application of Chinese word segmentation model which consist of dictionary and statistics method

Source: Computer Engineering and Design (《计算机工程与设计》), 2012, No. 1, pp. 387-391 (5 pages)

Funding: National Natural Science Foundation of China (71001085)

Abstract: Traditional dictionary-based and statistics-based word segmentation methods each fall short in efficiency and recognition ability. To address this, the paper analyzes the characteristics of text data in one specific domain, product names in e-commerce, and studies the mmseg segmentation method and a processing method based on mutual information. Combining the strengths of the two kinds of method, it applies the mmseg segmentation algorithm and the mutual information algorithm in the segmentation process, and designs and implements a fast, highly accurate segmentation model. Test results indicate that the model handles segmentation speed and efficiency comparatively well.
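The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: mmseg is generally described as a maximum-matching method with additional disambiguation rules, while the code below uses plain forward maximum matching only, and estimates pointwise mutual information for adjacent character pairs from raw counts. The function names, the toy dictionary, and the toy corpus of product names are all assumptions made for illustration.

```python
import math
from collections import Counter


def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if none matches."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def pmi(corpus, x, y):
    """Pointwise mutual information log2(p(xy) / (p(x) * p(y))) of the
    adjacent character pair (x, y), estimated from counts in the corpus."""
    chars = Counter()
    pairs = Counter()
    for line in corpus:
        chars.update(line)
        pairs.update(zip(line, line[1:]))
    n_chars = sum(chars.values())
    n_pairs = sum(pairs.values())
    if pairs[(x, y)] == 0:
        return float("-inf")  # the pair never occurs adjacently
    p_xy = pairs[(x, y)] / n_pairs
    return math.log2(p_xy / ((chars[x] / n_chars) * (chars[y] / n_chars)))


# Hypothetical toy data, for illustration only.
toy_dictionary = {"无线", "蓝牙", "耳机"}
toy_corpus = ["蓝牙耳机", "蓝牙音箱", "无线耳机"]

if __name__ == "__main__":
    print(forward_max_match("无线蓝牙耳机", toy_dictionary))
    print(pmi(toy_corpus, "蓝", "牙"), pmi(toy_corpus, "牙", "耳"))
```

In a combined scheme of this kind, the dictionary pass handles known words quickly, and a high mutual information score between adjacent characters left unmatched can suggest that they belong to one out-of-dictionary word. How the paper actually combines the two steps is not stated in the abstract.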

Keywords: word segmentation; mmseg algorithm; mutual information; dictionary; statistics

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
