基于词条组合的军事类文本分词方法  Cited by: 2

Word Segmentation Approach in Military Text on the Basis of Word Combination

Authors: Huang Wei (黄魏)[1], Gao Bing (高兵)[1], Liu Yi (刘异)[2], Yang Kewei (杨克巍)[1]

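Affiliations: [1] School of Information Systems and Management, National University of Defense Technology, Changsha 410073; [2] College of Liberal Arts, Hunan Normal University, Changsha 410081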

Source: Computer Science (《计算机科学》), 2010, No. 2, pp. 171-174 (4 pages)

Funding: Supported by the "Eleventh Five-Year Plan" Weapons and Equipment Advance Research Project (513300102)

Abstract: When applied to military texts, traditional word segmentation methods face two problems: many out-of-vocabulary (unknown) words, and incomplete feature information for some terms. To address this, the paper decomposes segmentation into several sub-processes and segments military texts using word strings (lexical chunks) as the unit. First, the text is scanned in both directions against a dictionary, and fields where the segmentation is ambiguous are marked. Stop words are then removed from the fields on which the two scans agree. Next, the mutual information and adjacent co-occurrence frequency are computed between the terms produced by this first segmentation pass; based on these values, the corresponding terms are combined into word strings and marked. Finally, the marked ambiguous fields and word strings are extracted and reviewed manually. Experimental results show that the word strings produced by term combination carry richer feature information and give better segmentation results.
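
The steps named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which matching strategy, mutual-information formula, or thresholds were used, so greedy maximum matching and pointwise mutual information are assumed here. The function names, the `max_len`, `min_count` and `min_pmi` parameters, and the toy data are all hypothetical. Stop-word removal and manual review are omitted, and ambiguity is flagged for a whole input string rather than localized to a field.

```python
import math
from collections import Counter


def forward_max_match(text, vocab, max_len=4):
    """Greedy longest-match scan from left to right (assumed strategy)."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len=4):
    """Greedy longest-match scan from right to left (assumed strategy)."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def is_ambiguous(text, vocab, max_len=4):
    """Flag the text when the two scan directions disagree."""
    return (forward_max_match(text, vocab, max_len)
            != backward_max_match(text, vocab, max_len))


def combine_candidates(sentences, min_count=2, min_pmi=1.0):
    """Score adjacent term pairs from a first segmentation pass.

    sentences: list of token lists.
    Returns {(left, right): pmi} for pairs that co-occur adjacently at
    least min_count times and whose pointwise mutual information
    reaches min_pmi. Both thresholds are illustrative assumptions.
    """
    unigrams = Counter(tok for sent in sentences for tok in sent)
    bigrams = Counter(pair for sent in sentences
                      for pair in zip(sent, sent[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    selected = {}
    for (left, right), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / total_bi
        p_left = unigrams[left] / total_uni
        p_right = unigrams[right] / total_uni
        pmi = math.log2(p_pair / (p_left * p_right))
        if pmi >= min_pmi:
            selected[(left, right)] = pmi
    return selected
```

With a toy vocabulary such as `{"研究", "研究生", "生命", "起源"}`, the two scans split "研究生命起源" differently, so `is_ambiguous` flags it for review. In a small corpus where "导弹" and "驱逐舰" repeatedly appear side by side, `combine_candidates` proposes merging them into the word string "导弹驱逐舰".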

Keywords: military; text; word segmentation; term

Classification code: TP391.3 [Automation and Computer Technology: Computer Application Technology]

 
