Ontology Concept Learning Method for Compound Terms  (Cited by: 10)

Authors: LI Jianghua (李江华)[1,2], SHI Peng (时鹏)[1], HU Changjun (胡长军)[3]

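Affiliations: [1] National Center for Materials Service Safety, University of Science and Technology Beijing, Beijing 100083; [2] School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000; [3] School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083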

Source: Computer Science (《计算机科学》), 2013, No. 5, pp. 168-172 (5 pages)

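Funding: National Science and Technology Support Program of the 12th Five-Year Plan (2011BAK08B04); Fundamental Research Funds for the Central Universities (FRF-TP-12-162A); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ12345)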

Abstract: Term extraction plays an important role in text-based ontology concept learning. Because Chinese text has no explicit boundaries between words, domain terms, and compound terms in particular, are difficult to extract. Traditional extraction methods lack semantic support, are computationally expensive, and have low accuracy. To address these shortcomings, this paper proposes an ontology concept learning method suited to compound term extraction. First, natural language processing techniques filter out components that are irrelevant to terms, and sentences are cut naturally at punctuation marks and at the removed components. This yields a complete candidate set for domain term extraction and keeps candidate compound terms from being split incorrectly. On this basis, candidate terms are filtered with a multi-strategy scheme that uses term frequency and information entropy, according to the statistical and distributional characteristics of terms in the domain. Synonymous terms are then recognized and merged to obtain the domain concept set. Experiments show that the method extracts both single-word and compound domain terms from domain text with relatively high precision.
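
The two central steps of the abstract, natural cutting and frequency/entropy filtering, can be illustrated with a minimal sketch. It is not the authors' implementation. Several things here are assumptions made for illustration: the stop list `NON_TERM_WORDS` stands in for the "irrelevant components" (the paper identifies them with natural language processing, which is not reproduced here), whitespace-tokenized English text replaces Chinese text, entropy is taken as the entropy of a term's distribution over documents, and the thresholds `min_freq` and `min_entropy` are arbitrary. Synonym recognition and merging is omitted.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for the components that cannot be part of a term.
NON_TERM_WORDS = {"the", "of", "is", "a", "in", "and"}


def cut_candidates(text, non_term_words=NON_TERM_WORDS):
    """Cut text at punctuation and at non-term words.

    Each surviving run of adjacent words is kept whole as one candidate,
    so a compound term is never split internally.
    """
    candidates = []
    # Cut at punctuation first (anything that is not a word char, space or hyphen).
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        run = []
        for token in fragment.split():
            if token in non_term_words:
                # A removed component also acts as a cut point.
                if run:
                    candidates.append(" ".join(run))
                run = []
            else:
                run.append(token)
        if run:
            candidates.append(" ".join(run))
    return candidates


def distribution_entropy(doc_counts):
    """Shannon entropy (bits) of a term's occurrence counts across documents."""
    total = sum(doc_counts)
    return -sum((c / total) * math.log2(c / total) for c in doc_counts if c > 0)


def filter_terms(documents, min_freq=2, min_entropy=0.5):
    """Keep candidates that are frequent and spread across the domain corpus."""
    per_doc = defaultdict(Counter)  # term -> {document index: count}
    for index, document in enumerate(documents):
        for candidate in cut_candidates(document):
            per_doc[candidate][index] += 1

    kept = {}
    for term, counts in per_doc.items():
        freq = sum(counts.values())
        entropy = distribution_entropy(list(counts.values()))
        if freq >= min_freq and entropy >= min_entropy:
            kept[term] = (freq, entropy)
    return kept


if __name__ == "__main__":
    documents = [
        "Stress corrosion cracking is a failure mode of the alloy.",
        "In seawater, stress corrosion cracking of the alloy is a risk.",
        "The failure mode is a risk.",
    ]
    for term, (freq, entropy) in sorted(filter_terms(documents).items()):
        print(f"{term}: freq={freq}, entropy={entropy:.2f}")
```

In this toy corpus the compound candidate "stress corrosion cracking" survives as a single unit because no cut point falls inside it, while "seawater" is dropped because it occurs only once. Requiring a minimum entropy favors terms that appear evenly across domain documents; whether the paper applies entropy in this direction, or to some other distribution, is not stated in the abstract.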

Keywords: term extraction; term filtering; compound terms; ontology concept learning

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
