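Affiliations: [1] National Center for Materials Service Safety, University of Science and Technology Beijing, Beijing 100083 [2] School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000 [3] School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083
Source: Computer Science (《计算机科学》), 2013, No. 5, pp. 168-172 (5 pages)
Funding: National Science and Technology Support Program of the 12th Five-Year Plan (2011BAK08B04); Fundamental Research Funds for the Central Universities (FRF-TP-12-162A); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ12345)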
Abstract: Term extraction plays an important role in text-based ontology concept learning. Because Chinese text has no explicit boundaries between words, domain terms, and compound terms in particular, are difficult to extract. Traditional extraction methods lack semantic support, are computationally expensive, and have low precision. To address these shortcomings, this paper proposes an ontology concept learning method suited to compound term extraction. First, natural language processing techniques filter out components unrelated to terms, and sentences are cut at punctuation marks and at the removed parts; this yields a complete candidate set and keeps candidate compound terms from being split incorrectly. Then, based on the domain statistics and distribution characteristics of terms, the candidates are screened with a multi-strategy filter that uses term frequency and information entropy. Finally, synonymous terms are recognized and merged to obtain the domain concept set. Experiments show that the method extracts both single-word and compound domain terms from domain text with relatively high precision.
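The pipeline outlined in the abstract (cut at punctuation and removed words, then filter candidates by term frequency and information entropy) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption: the stop-word list, the thresholds `min_freq` and `min_entropy`, and the reading of "information entropy" as the entropy of a term's distribution across documents are all hypothetical, and synonym merging is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical delimiters: common punctuation plus a tiny illustrative stop-word list.
PUNCT = r"[，。；：、！？,.;:!?\s]+"
STOP_WORDS = ["的", "是", "在", "和"]


def candidate_segments(text, stop_words=STOP_WORDS):
    """Cut text at punctuation and stop words; the pieces in between are candidates."""
    pattern = PUNCT + "|" + "|".join(map(re.escape, stop_words))
    return [seg for seg in re.split(pattern, text) if seg]


def distribution_entropy(term, documents):
    """Entropy (in bits) of how a term's occurrences spread across documents.

    Assumption: a domain term spread evenly over the domain corpus scores high.
    Occurrences are counted as plain substring matches.
    """
    counts = [doc.count(term) for doc in documents]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def extract_terms(documents, min_freq=2, min_entropy=0.5):
    """Keep candidates that pass both the frequency and the entropy threshold."""
    freq = Counter()
    for doc in documents:
        freq.update(candidate_segments(doc))
    return sorted(
        term
        for term, count in freq.items()
        if count >= min_freq and distribution_entropy(term, documents) >= min_entropy
    )


if __name__ == "__main__":
    docs = ["本体学习的方法是术语提取", "术语提取和本体学习"]
    print(extract_terms(docs))
```

Because the text is never segmented into individual words, a compound such as 术语提取 survives as a single candidate, which is the property the abstract emphasizes. A real system would replace the stop-word list with part-of-speech based filtering and tune the thresholds on a domain corpus.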
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]