串频统计和词形匹配相结合的汉语自动分词系统 (Cited by: 65)

A Chinese Automatic Word Segmentation System Based on String Frequency Statistics Combined with Word Matching

Source: Journal of Chinese Information Processing (《中文信息学报》), 1998, No. 1, pp. 17-25 (9 pages)

Abstract: This paper presents a software system for automatic Chinese word segmentation. The system scans the source text three times. In the first pass, cut marks are used to split the text into a sequence of short Chinese character strings. In the second pass, a weight is computed for every substring of each short string from its frequency in context, and substrings with large weights are taken as candidate words. In the third pass, the short strings are segmented using the candidate word set together with a dictionary of common words. Experiments show that the system's segmentation precision is about 1.5% and that it recognizes most new (out-of-dictionary) words. The system is well suited to document retrieval and similar applications.
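The three passes described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual weighting formula, the list of cut marks, or the matching strategy, so everything specific here is an assumption made for illustration: cut marks are taken to be any non-Chinese character, the weight is the hypothetical formula `(frequency - 1) * length`, and the third pass uses plain forward maximum matching. The names `split_short_strings`, `candidate_words`, and `segment` are likewise hypothetical.

```python
import re
from collections import Counter

# Assumption: any run of characters outside the basic CJK ideograph block
# (punctuation, digits, Latin letters, whitespace) acts as a cut mark.
CUT_MARKS = re.compile(r"[^\u4e00-\u9fff]+")


def split_short_strings(text):
    """Pass 1: cut the text at cut marks into short Chinese character strings."""
    return [s for s in CUT_MARKS.split(text) if s]


def candidate_words(short_strings, max_len=4, min_weight=2):
    """Pass 2: weight every substring of length 2..max_len by its frequency.

    The weight (freq - 1) * length is a hypothetical stand-in for the paper's
    formula: substrings seen only once get weight 0 and are never candidates.
    """
    freq = Counter()
    for s in short_strings:
        for i in range(len(s)):
            for j in range(i + 2, min(i + max_len, len(s)) + 1):
                freq[s[i:j]] += 1
    return {w for w, f in freq.items() if (f - 1) * len(w) >= min_weight}


def segment(short_string, vocab, max_len=4):
    """Pass 3: forward maximum matching against candidates plus dictionary.

    Forward maximum matching is an assumed strategy; a single character is
    always accepted as a fallback so the loop is guaranteed to advance.
    """
    out, i = [], 0
    while i < len(short_string):
        for n in range(min(max_len, len(short_string) - i), 0, -1):
            piece = short_string[i:i + n]
            if n == 1 or piece in vocab:
                out.append(piece)
                i += n
                break
    return out
```

A typical call sequence would be to run `split_short_strings` on the whole text, build `vocab` as the union of `candidate_words(...)` and a set of common dictionary words, and then call `segment` on each short string. Because candidates come from the text itself, a repeated string that is missing from the dictionary can still be segmented as one unit, which is the mechanism behind the new-word recognition the abstract mentions.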

Keywords: Chinese information processing; automatic word segmentation; Chinese; string frequency statistics; word matching

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
