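Affiliations: [1] School of Information Systems and Management, National University of Defense Technology, Changsha 410073 [2] College of Liberal Arts, Hunan Normal University, Changsha 410081
Source: Computer Science (《计算机科学》), 2010, No. 2, pp. 171-174 (4 pages)
Funding: Supported by the "Eleventh Five-Year Plan" weapons and equipment advance research project (513300102)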
Abstract: Traditional word segmentation methods, when applied to military texts, suffer from a large number of out-of-vocabulary words and from entries whose feature information is incomplete. To address this, the segmentation process is decomposed into several sub-processes, and military texts are segmented with the word string (lexical chunk) as the unit. First, the text is scanned in both directions against a dictionary, and fields where the segmentation is ambiguous are marked. Stop words are then removed from the fields whose segmentation results agree. Next, the mutual information and adjacent co-occurrence frequency between the entries produced by this first segmentation pass are computed, and, based on the results, the corresponding entries are combined into word strings and marked. Finally, the marked ambiguous fields and word strings are extracted and reviewed manually. Experimental results show that the combined word strings carry richer feature information and give better segmentation results.
Classification: TP391.3 [Automation and Computer Technology: Computer Application Technology]
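
The abstract names the processing steps but gives no formulas, thresholds, or data structures, so the following is only a minimal Python sketch of how those steps might fit together, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the function names, the `max_len`, `min_count`, and `min_pmi` parameters, the use of pointwise mutual information as the "mutual information" score, flagging a whole field as ambiguous when the forward and backward scans disagree, and skipping any pair that contains a stop word.

```python
import math
from collections import Counter


def forward_max_match(text, dictionary, max_len=3):
    """Greedy left-to-right longest match; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, dictionary, max_len=3):
    """Greedy right-to-left longest match; unknown characters become single tokens."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bidirectional_scan(text, dictionary, max_len=3):
    """Return (tokens, is_ambiguous). Assumption: a field is ambiguous
    whenever the two scan directions produce different segmentations."""
    fwd = forward_max_match(text, dictionary, max_len)
    bwd = backward_max_match(text, dictionary, max_len)
    return fwd, fwd != bwd


def word_string_candidates(fields, stop_words=frozenset(), min_count=2, min_pmi=1.0):
    """Score adjacent entry pairs by co-occurrence count and pointwise mutual
    information (an assumed scoring choice), and return the pairs that pass
    both hypothetical thresholds, joined into candidate word strings."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in fields:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total_tokens = sum(unigrams.values())
    total_pairs = sum(bigrams.values())
    candidates = {}
    for (a, b), count in bigrams.items():
        if count < min_count or a in stop_words or b in stop_words:
            continue
        p_pair = count / total_pairs
        p_a = unigrams[a] / total_tokens
        p_b = unigrams[b] / total_tokens
        pmi = math.log2(p_pair / (p_a * p_b))
        if pmi >= min_pmi:
            candidates[a + b] = pmi
    return candidates
```

As a usage illustration with a toy dictionary, `bidirectional_scan("研究生命起源", {"研究", "研究生", "生命", "起源"})` segments the text as `["研究生", "命", "起源"]` in the forward direction and `["研究", "生命", "起源"]` in the backward direction, so the field would be marked ambiguous and left for manual review. Fields on which the two directions agree would instead be passed to `word_string_candidates`, where entries that co-occur frequently, such as "导弹" followed by "驱逐舰", can be proposed as a single word string.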