Research on Computer-Based Automatic Word Segmentation (基于计算机自动分词的研究)  Cited by: 3

Automatic Segmentation Study Based on Computer


Authors: Li Ruifang (李瑞芳)[1], Sun Jian (孙健)[1], Li Na (李娜)[1]

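Affiliation: [1] School of Computer Science and Technology, Shenyang Institute of Chemical Technology, Shenyang, Liaoning 110142, China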

Source: Journal of Shenyang Institute of Chemical Technology (《沈阳化工学院学报》), 2008, No. 3, pp. 255-259 (5 pages)

Abstract: The demands placed on Chinese word segmentation keep rising. Building on the bidirectional matching theory used in the original mechanical (dictionary-based) segmentation method, this paper improves that method to raise both segmentation speed and accuracy. A word-head table is added for forward matching and another for reverse matching, and the program is designed around the hash-based structure of Map and Set in the Java language. The improved method was tested on "A Dream of Red Mansions" (《红楼梦》); word frequency statistics for the whole book took only 15 seconds. The results show that the improved method is feasible and that, compared with the original method, both speed and accuracy improve considerably.

Keywords: Chinese word segmentation; bidirectional matching; forward matching; reverse matching; Java

Classification number: TP32 [Automation and Computer Technology: Computer System Architecture]
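
Illustrative sketch: The abstract describes the approach only in outline, so the following is a minimal sketch of dictionary-based bidirectional maximum matching with hash-backed lookup tables, not the authors' implementation. The class name `HeadTableSegmenter`, the choice to index reverse matching by a word's last character, and the rule for choosing between the two results (fewer words wins, ties go to reverse matching) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class: maximum matching with per-character lookup tables.
public class HeadTableSegmenter {
    // Assumed "word-head table": first character -> dictionary words starting with it.
    private final Map<Character, Set<String>> headTable = new HashMap<>();
    // Assumed counterpart for reverse matching: last character -> words ending with it.
    private final Map<Character, Set<String>> tailTable = new HashMap<>();
    private int maxLen = 1; // length of the longest dictionary word

    public HeadTableSegmenter(Collection<String> dictionary) {
        for (String w : dictionary) {
            if (w.isEmpty()) continue;
            headTable.computeIfAbsent(w.charAt(0), k -> new HashSet<>()).add(w);
            tailTable.computeIfAbsent(w.charAt(w.length() - 1), k -> new HashSet<>()).add(w);
            maxLen = Math.max(maxLen, w.length());
        }
    }

    // Forward maximum matching: scan left to right, longest candidate first.
    public List<String> forward(String text) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            Set<String> candidates = headTable.get(text.charAt(i));
            int len = 1; // fall back to a single character when nothing matches
            if (candidates != null) {
                for (int l = Math.min(maxLen, text.length() - i); l > 1; l--) {
                    if (candidates.contains(text.substring(i, i + l))) { len = l; break; }
                }
            }
            out.add(text.substring(i, i + len));
            i += len;
        }
        return out;
    }

    // Reverse maximum matching: scan right to left, longest candidate first.
    public List<String> reverse(String text) {
        LinkedList<String> out = new LinkedList<>();
        int end = text.length();
        while (end > 0) {
            Set<String> candidates = tailTable.get(text.charAt(end - 1));
            int len = 1;
            if (candidates != null) {
                for (int l = Math.min(maxLen, end); l > 1; l--) {
                    if (candidates.contains(text.substring(end - l, end))) { len = l; break; }
                }
            }
            out.addFirst(text.substring(end - len, end));
            end -= len;
        }
        return out;
    }

    // Assumed reconciliation rule: prefer the result with fewer words; ties go to reverse.
    public List<String> segment(String text) {
        List<String> f = forward(text);
        List<String> r = reverse(text);
        return r.size() <= f.size() ? r : f;
    }
}
```

With a hypothetical four-word dictionary {研究, 研究生, 生命, 起源} and the input 研究生命起源, `forward` yields 研究生 / 命 / 起源 while `reverse` yields 研究 / 生命 / 起源. Looking up the first (or last) character in a `HashMap` means a position with no dictionary word is rejected after a single lookup instead of a full run of substring probes, which is one plausible reading of where the speed gain reported in the abstract comes from.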

 
