Design and Implementation of a Fast Dictionary-Free Word Extraction Algorithm

Source: Control & Automation (《微计算机信息》), 2008, No. 27, pp. 181-183 (3 pages)

Abstract: Word extraction is the most fundamental task in Chinese natural language processing. This paper proposes a dictionary-free word extraction algorithm that combines the t-score with bisection. The algorithm first preprocesses the raw text, using the auxiliary information carried by noise words to make an initial segmentation; some words are extracted at this stage and stored in the result set. The proposed algorithm then performs a second extraction pass, which applies the N-gram idea. Experiments show that the algorithm not only extracts words quickly but also yields relatively long words, which preserves the integrity of the Chinese language and lays a good foundation for subsequent semantic analysis and index construction.
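The abstract names the two stages but does not give their details, so the following is only a minimal sketch of how such a pipeline could look, not the authors' implementation. Everything specific in it is an assumption: the noise-word list `NOISE_WORDS`, the use of the common collocation t-score (observed minus expected bigram count, divided by the square root of the observed count), the reading of "bisection" as recursively splitting a segment in two at its weakest character boundary, and the `threshold` parameter.

```python
import math
import re
from collections import Counter

# Hypothetical noise-word list; the abstract does not give the paper's list.
NOISE_WORDS = ["的", "了", "是", "在", "和"]


def split_on_noise(text, noise_words=NOISE_WORDS):
    """Initial cut: break text at noise words and at any non-Chinese run."""
    alternatives = [re.escape(w) for w in noise_words] + [r"[^\u4e00-\u9fff]+"]
    return [seg for seg in re.split("|".join(alternatives), text) if seg]


def count_ngrams(segments):
    """Count single characters and adjacent character pairs (N-grams, N <= 2)."""
    uni, bi = Counter(), Counter()
    for seg in segments:
        uni.update(seg)
        bi.update(seg[i:i + 2] for i in range(len(seg) - 1))
    return uni, bi


def t_score(pair, uni, bi, total):
    """Assumed collocation t-score: (observed - expected) / sqrt(observed)."""
    observed = bi[pair]
    if observed == 0:
        return float("-inf")
    expected = uni[pair[0]] * uni[pair[1]] / total
    return (observed - expected) / math.sqrt(observed)


def bisect_segment(seg, uni, bi, total, threshold):
    """Split seg in two at its weakest boundary, recursively (assumed reading)."""
    if len(seg) < 2:
        return [seg]
    scores = [t_score(seg[i:i + 2], uni, bi, total) for i in range(len(seg) - 1)]
    weakest = min(range(len(scores)), key=scores.__getitem__)
    if scores[weakest] >= threshold:
        return [seg]  # every adjacent pair is bound strongly enough: keep whole
    left, right = seg[:weakest + 1], seg[weakest + 1:]
    return (bisect_segment(left, uni, bi, total, threshold)
            + bisect_segment(right, uni, bi, total, threshold))


def extract_words(text, threshold=1.0):
    """Two passes: noise-word cut, then t-score bisection of each segment."""
    segments = split_on_noise(text)
    uni, bi = count_ngrams(segments)
    total = sum(uni.values())
    words = []
    for seg in segments:
        words.extend(bisect_segment(seg, uni, bi, total, threshold))
    return words
```

Calling `extract_words(text, threshold=...)` on a raw string returns the candidate words in order of appearance. Because a segment is kept whole whenever all of its adjacent pairs score above the threshold, a sketch of this kind tends to favor longer units, which is consistent with the abstract's claim about word length; the threshold itself would have to be tuned on real data.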

Keywords: dictionary-free; t-score; bisection; fast word extraction

Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
