Realization of Chinese Word Segmentation Algorithm Optimization Based on Data Dictionary

Times cited: 4

Author: BAO Shuguang (鲍曙光), Vocational Education Center, China Coast Guard Academy, Ningbo 315801, China

Source: Modern Information Technology (《现代信息科技》), 2022, No. 7, pp. 80-84 (5 pages)

Abstract: Chinese word segmentation is the foundation of Chinese natural language understanding. This paper implements forward, reverse, longest-word, and shortest-word segmentation algorithms in C#. It compares the algorithms by analyzing a large number of sample instances and describes how segmentation can be applied to new word discovery and ambiguity discovery. The paper focuses on how the data structure behind the data dictionary, such as a relational database or a text file, affects segmentation speed. It also introduces an unconventional data dictionary index table that greatly improves the speed of the segmentation algorithm.

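Keywords: Chinese word segmentation; algorithm optimization; new word discovery; ambiguity elimination; natural language recognition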

Chinese Library Classification: TP391 [Automation and Computer Technology: Computer Application Technology]
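The abstract names forward and reverse segmentation over a data dictionary and the use of segmentation for ambiguity discovery. The block below is a minimal sketch of those ideas. It is written in Python rather than the C# the paper used, and it assumes the standard maximum-matching formulation. It runs forward and reverse maximum matching against a dictionary grouped by first character, then flags a sentence as possibly ambiguous when the two directions disagree. The first-character grouping and every function name here (`build_index`, `forward_max_match`, `reverse_max_match`, `detect_ambiguity`) are hypothetical illustrations. This record does not describe the structure of the paper's own index table.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical index layout (an assumption, not the paper's index table):
# each dictionary word is filed under its first character, so a lookup only
# checks the small bucket of words sharing that character.
Index = Dict[str, Set[str]]


def build_index(words: List[str]) -> Index:
    """Group dictionary words by their first character."""
    index: Index = {}
    for word in words:
        if word:
            index.setdefault(word[0], set()).add(word)
    return index


def max_word_length(index: Index) -> int:
    """Length of the longest dictionary word (at least 1)."""
    return max((len(w) for bucket in index.values() for w in bucket), default=1)


def in_dict(piece: str, index: Index) -> bool:
    """Dictionary membership test through the first-character bucket."""
    return piece in index.get(piece[0], set())


def forward_max_match(text: str, index: Index, max_len: int) -> List[str]:
    """Scan left to right, taking the longest dictionary word at each position.
    Fall back to a single character when nothing matches."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or in_dict(piece, index):
                break
        words.append(piece)
        i += size
    return words


def reverse_max_match(text: str, index: Index, max_len: int) -> List[str]:
    """Scan right to left, taking the longest dictionary word that ends at each
    position. The collected words are reversed at the end."""
    words: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or in_dict(piece, index):
                break
        words.append(piece)
        j -= size
    words.reverse()
    return words


def detect_ambiguity(text: str, index: Index,
                     max_len: int) -> Tuple[List[str], List[str], bool]:
    """Flag a possible ambiguity when forward and reverse results disagree."""
    fwd = forward_max_match(text, index, max_len)
    rev = reverse_max_match(text, index, max_len)
    return fwd, rev, fwd != rev


if __name__ == "__main__":
    idx = build_index(["研究", "研究生", "生命", "起源"])
    m = max_word_length(idx)
    print(detect_ambiguity("研究生命起源", idx, m))
    # (['研究生', '命', '起源'], ['研究', '生命', '起源'], True)
```

Consider the classic phrase 研究生命起源 ("studying the origin of life"). Forward matching greedily takes 研究生 ("graduate student") and is left with the single character 命. Reverse matching instead yields 研究 / 生命 / 起源. This disagreement between the two directions is what marks the span as a candidate ambiguity.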