Automatic extraction of synonymy relations using supervised learning  Cited by: 8

Authors: SUN Xia (孙霞) [1], DONG Lehong (董乐红) [1]

Affiliation: [1] School of Information Science and Technology, Northwest University, Xi'an, Shaanxi 710069, China

Source: Journal of Northwest University (Natural Science Edition), 2008, No. 1, pp. 35-39 (5 pages)

Funding: National Natural Science Foundation of China (60473136); Doctoral Program Fund (20040698028)

Abstract: Aim To automatically acquire synonymy relations between terms from large-scale domain text using supervised learning. Methods Synonymy relation extraction is treated as a binary classification problem and divided into a training phase and an extraction phase, comprising four processing modules: preprocessing, feature generation, model training, and classification. Synonymy relations in sample sentences obtained by preprocessing are annotated and used as training data; machine learning models are trained on them and then used to extract synonymy relations. The features considered include words, symbols, punctuation, local position, and global position, which characterize the context in which domain synonymy occurs. A feature weighting method is also proposed that more accurately estimates the role of each selected feature in classification. Results A new model for extracting synonymy relations is proposed and built, and its key algorithms are given and implemented. Conclusion The experimental results show that SVM, a discriminative learning method, performs best, with an F1 score 24.4% higher than that of the pattern-based (template-based) method, which substantially improves the accuracy of synonymy relation extraction. The proposed method also mitigates the poor domain adaptability of pattern-based methods: a model obtained in one domain can be applied to another. The defined features and their weight computation are better suited to discriminative learning algorithms.

Keywords: domain synonymy relation; machine learning; binary classification

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
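
The four modules named in the abstract (preprocessing, feature generation, model training, classification) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the feature definitions or the weighting scheme, so the features below are guesses at "words, symbols, local position, global position" with raw counts as weights, and a simple perceptron stands in for the SVM reported in the paper. All function names and the toy sentences are hypothetical.

```python
from collections import Counter

def preprocess(sentence):
    """Module 1: lowercase and tokenize, keeping punctuation as tokens."""
    tokens, word = [], ""
    for ch in sentence.lower():
        if ch.isalnum() or ch == "-":
            word += ch
        else:
            if word:
                tokens.append(word)
                word = ""
            if not ch.isspace():
                tokens.append(ch)
    if word:
        tokens.append(word)
    return tokens

def make_features(tokens, i, j):
    """Module 2: features for the candidate term pair at positions i < j.
    Assumed feature set (not from the paper): tokens between the terms,
    capped gap length (local position), sentence third (global position)."""
    feats = Counter()
    for tok in tokens[i + 1:j]:
        kind = "word" if tok[0].isalnum() else "symbol"
        feats[f"{kind}_between={tok}"] += 1
    feats[f"local_gap={min(j - i - 1, 5)}"] = 1
    feats[f"global_pos={3 * i // len(tokens)}"] = 1
    return feats

def score(weights, feats):
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def train(examples, epochs=10):
    """Module 3: perceptron training (a stand-in for the paper's SVM).
    Each example is (features, label) with label +1 or -1."""
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:
            if label * score(weights, feats) <= 0:  # misclassified: update
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + label * v
    return weights

def classify(weights, feats):
    """Module 4: binary decision, True means 'synonymy relation'."""
    return score(weights, feats) > 0

def candidate(sentence, term_a, term_b):
    """Build features for a single-token term pair in a sentence."""
    tokens = preprocess(sentence)
    return make_features(tokens, tokens.index(term_a), tokens.index(term_b))

# Toy annotated sentences, invented for illustration only.
train_data = [
    (candidate("A laptop, also called a notebook, is portable.", "laptop", "notebook"), +1),
    (candidate("The sofa (also called a couch) is soft.", "sofa", "couch"), +1),
    (candidate("The laptop sits on the desk.", "laptop", "desk"), -1),
    (candidate("The cat chased the mouse.", "cat", "mouse"), -1),
]
weights = train(train_data)
is_synonym = classify(
    weights,
    candidate("A car, also called an automobile, has wheels.", "car", "automobile"),
)
print(is_synonym)
```

Because the features describe the context between the two terms and not the terms themselves, a model trained on one set of terms can score unseen pairs, which is the intuition behind the cross-domain claim in the abstract.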

 
