Jlppeccz: A New Chinese Word Segmentation Algorithm Based on Neighboring Match


Authors: 耿新青 (Geng Xinqing)[1], 陶凤梅 (Tao Fengmei)[1], 黄宏光 (Huang Hongguang)[1]

Affiliation: [1] Department of Mathematics, Anshan Normal University, Anshan, Liaoning 114007, China

Source: Journal of Anshan Normal University (《鞍山师范学院学报》), 2010, No. 4, pp. 46-48 (3 pages)

Funding: Supported by the National Natural Science Foundation of China (60275020)

Abstract: This paper presents Jlppeccz, a new Chinese word segmentation algorithm based on neighboring match. The traditional MM (maximum matching) algorithm depends strongly on its dictionary and easily produces ambiguous segmentations. Jlppeccz first divides an article into sentences, using punctuation marks as boundaries, and then uses neighboring match to cut each sentence into words of 1 to 4 characters. It searches the lexicon and recombines the segmented words, merging small words into larger ones. The processed words are stored in a temporary lexicon so that they can be looked up when later sentences are processed, and new words can also be added to the lexicon. Compared with the classical MM algorithm and the word-frequency statistics method, Jlppeccz offers a considerable improvement. Experiments show that it achieves higher precision and efficiency than the MM algorithm, and an example demonstrates its effectiveness.
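The abstract names the stages of the pipeline but does not give the neighboring-match rule itself, so the following Python sketch is only an illustration of the overall flow, not the authors' implementation. It assumes that "neighboring match" can be approximated by repeatedly merging adjacent tokens whose concatenation is a known word of at most 4 characters. The function names (`split_sentences`, `merge_small_words`, `segment`), the punctuation set, and the sample lexicon are all hypothetical.

```python
import re

# Illustrative punctuation set used as sentence boundaries (an assumption).
PUNCT = r"[，。！？；：、,.!?;:\s]+"
MAX_WORD_LEN = 4  # the abstract states that words have 1 to 4 characters


def split_sentences(text):
    """Split text into sentence fragments at punctuation marks."""
    return [s for s in re.split(PUNCT, text) if s]


def merge_small_words(tokens, known):
    """Repeatedly merge adjacent tokens whose concatenation is a known word.

    This pairwise rule is an assumed stand-in for the paper's neighboring
    match; it cannot build a word unless its left part is itself known.
    """
    merged = True
    while merged:
        merged = False
        out = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens):
                cand = tokens[i] + tokens[i + 1]
                if len(cand) <= MAX_WORD_LEN and cand in known:
                    out.append(cand)
                    i += 2
                    merged = True
                    continue
            out.append(tokens[i])
            i += 1
        tokens = out
    return tokens


def segment(text, lexicon):
    """Segment text sentence by sentence, caching found words for later sentences."""
    temp_lexicon = set()
    result = []
    for sentence in split_sentences(text):
        known = lexicon | temp_lexicon
        # Start from single characters and merge small words into larger ones.
        tokens = merge_small_words(list(sentence), known)
        # Store multi-character words in the temporary lexicon.
        temp_lexicon.update(t for t in tokens if len(t) > 1)
        result.append(tokens)
    return result, temp_lexicon


if __name__ == "__main__":
    sample_lexicon = {"中文", "分词", "算法", "中文分词"}
    words, temp = segment("中文分词算法，分词。", sample_lexicon)
    print(words)
```

In this simplified form the temporary lexicon only caches words that were already found in the main lexicon. How the paper populates it with newly discovered words, and how it resolves ambiguity, is not stated in the abstract.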

Keywords: Chinese word segmentation; neighboring match; word segmentation system

Classification number: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]

 
