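Affiliations: [1] School of Information Systems and Management, National University of Defense Technology, Changsha 410073; [2] College of Liberal Arts, Hunan Normal University, Changsha 410081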
Source: Science Technology and Engineering (《科学技术与工程》), 2010, No. 1, pp. 85-89 (5 pages)
Funding: Supported by the "11th Five-Year Plan" Weapons and Equipment Advance Research Project (513300102)
Abstract: To address the loss of feature information in the terms produced by automatic Chinese word segmentation, this paper proposes a Chinese text segmentation method that uses the word string (lexical chunk) as the segmentation unit. The segmentation process is divided into three sub-processes. First, the text is segmented using backward maximum matching. Second, stop words are removed from the segmentation result. Third, the mutual information and adjacent co-occurrence frequency of the terms obtained from the first segmentation pass are computed, and the results determine which terms are combined into word strings. Experimental results show that the combined word strings carry richer semantic information, which helps improve feature selection and text classification performance.
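The three sub-processes in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `max_len` window, the pointwise form of mutual information, and the `min_pmi` and `min_count` thresholds are all assumptions, since the record does not give the paper's exact formula or threshold values.

```python
import math
from collections import Counter


def backward_max_match(text, lexicon, max_len=4):
    """Segment text right to left, taking the longest lexicon word each time."""
    tokens = []
    end = len(text)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            # Fall back to a single character when no lexicon word matches.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                end -= size
                break
    tokens.reverse()
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def merge_into_chunks(docs, min_pmi=1.0, min_count=2):
    """Join adjacent tokens whose co-occurrence count and PMI pass the
    (assumed) thresholds; docs is a list of token lists."""
    unigrams = Counter(t for doc in docs for t in doc)
    bigrams = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())

    def pmi(a, b):
        # Pointwise mutual information estimated from corpus counts.
        p_ab = bigrams[(a, b)] / total_bi
        p_a = unigrams[a] / total_uni
        p_b = unigrams[b] / total_uni
        return math.log2(p_ab / (p_a * p_b))

    merged_docs = []
    for doc in docs:
        out, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc):
                pair = (doc[i], doc[i + 1])
                if bigrams[pair] >= min_count and pmi(*pair) >= min_pmi:
                    out.append(doc[i] + doc[i + 1])
                    i += 2
                    continue
            out.append(doc[i])
            i += 1
        merged_docs.append(out)
    return merged_docs
```

In this sketch, a pair is merged only once per left-to-right scan, so word strings contain at most two terms; repeating `merge_into_chunks` on its own output would be one way to build longer strings, though the record does not say whether the paper does this.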
Classification: TP391.3 [Automation and Computer Technology: Computer Application Technology]