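Affiliation: [1] School of Computer Science and Technology, Shenyang Institute of Chemical Technology, Shenyang, Liaoning 110142
Source: Journal of Shenyang Institute of Chemical Technology (《沈阳化工学院学报》), 2008, No. 3, pp. 255-259 (5 pages)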
Abstract: As the demands placed on Chinese word segmentation keep rising, this paper improves the bidirectional matching approach used in mechanical (dictionary-based) segmentation in order to raise both speed and accuracy. A word-head (prefix) table is added to each of the forward-matching and backward-matching passes, and the program is built on the hash-based Map and Set structures of the Java language. The improved method was tested on "A Dream of Red Mansions" (《红楼梦》); computing word-frequency statistics for the whole book took only 15 seconds. The results show that the improved method is feasible and that it is considerably faster and more accurate than the original method.
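The abstract gives no implementation details, so the following is only a minimal sketch of how forward and backward maximum matching might be combined with hash-based head tables. Everything named here is an assumption rather than the authors' code: the class name `BidirectionalSegmenter`, indexing the backward pass by the last character of each word, and the tie-breaking rule (fewer tokens wins, and the backward result wins a tie) are all illustrative choices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: not the implementation described in the paper.
class BidirectionalSegmenter {
    // Head table for the forward pass: first character -> words starting with it.
    private final Map<Character, Set<String>> byFirstChar = new HashMap<>();
    // Assumed counterpart for the backward pass: last character -> words ending with it.
    private final Map<Character, Set<String>> byLastChar = new HashMap<>();
    private int maxWordLength = 1;

    BidirectionalSegmenter(Iterable<String> dictionary) {
        for (String word : dictionary) {
            if (word.length() < 2) {
                continue; // single characters are the fallback token, no lookup needed
            }
            byFirstChar.computeIfAbsent(word.charAt(0), k -> new HashSet<>()).add(word);
            byLastChar.computeIfAbsent(word.charAt(word.length() - 1), k -> new HashSet<>()).add(word);
            maxWordLength = Math.max(maxWordLength, word.length());
        }
    }

    // Forward maximum matching: scan left to right, longest dictionary word first.
    List<String> forward(String text) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            Set<String> candidates = byFirstChar.get(text.charAt(start));
            int matched = 1; // emit a single character when nothing matches
            if (candidates != null) {
                for (int len = Math.min(maxWordLength, text.length() - start); len >= 2; len--) {
                    if (candidates.contains(text.substring(start, start + len))) {
                        matched = len;
                        break;
                    }
                }
            }
            tokens.add(text.substring(start, start + matched));
            start += matched;
        }
        return tokens;
    }

    // Backward maximum matching: scan right to left, then restore reading order.
    List<String> backward(String text) {
        List<String> tokens = new ArrayList<>();
        int end = text.length();
        while (end > 0) {
            Set<String> candidates = byLastChar.get(text.charAt(end - 1));
            int matched = 1;
            if (candidates != null) {
                for (int len = Math.min(maxWordLength, end); len >= 2; len--) {
                    if (candidates.contains(text.substring(end - len, end))) {
                        matched = len;
                        break;
                    }
                }
            }
            tokens.add(text.substring(end - matched, end));
            end -= matched;
        }
        Collections.reverse(tokens);
        return tokens;
    }

    // Assumed disambiguation rule: prefer fewer tokens; on a tie, prefer the backward result.
    List<String> segment(String text) {
        List<String> f = forward(text);
        List<String> b = backward(text);
        return f.size() < b.size() ? f : b;
    }
}
```

With the toy dictionary {研究, 研究生, 生命, 起源} and the input 研究生命起源, the forward pass in this sketch yields 研究生 / 命 / 起源 while the backward pass yields 研究 / 生命 / 起源; both have three tokens, so `segment` returns the backward result. The head tables let each position skip the dictionary entirely when no word starts (or ends) with the current character, which is one plausible reading of where the reported speed-up comes from.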
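Keywords: Chinese word segmentation; bidirectional matching; forward matching; backward matching; Java
Classification number: TP32 [Automation and Computer Technology: Computer System Architecture]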