Disambiguating Biomedical Abbreviations Based on K-Nearest Neighbor with Weighted Voting Method (基于加权投票K-近邻法的生物医学缩略语消歧)

Cited by: 3


Authors: Yu Zhonghua (于中华)[1], Chen Rong (陈蓉)[1], Hu Junfeng (胡俊锋)[1], Chen Yuan (陈源)[1]

Affiliation: [1] College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China

Source: Journal of Chinese Information Processing (《中文信息学报》), 2008, No. 2, pp. 18-23 (6 pages)

Funding: Supported by the National Natural Science Foundation of China (90409007)

Abstract: Extracting information from the biomedical literature is important for making full use of the results achieved in the biomedical field and for promoting further progress in biology and medicine. Addressing the problem of analyzing and understanding biomedical abbreviations, this paper proposes a disambiguation algorithm based on K-nearest neighbor (K-NN) with weighted voting. The algorithm automatically generates labeled instances based on the "One Sense Per Discourse" hypothesis, uses global feature words that express the topic of the text as disambiguation features, and classifies with weighted-voting K-NN. Experiments on a real corpus containing 177,762 Medline abstracts show that the proposed algorithm clearly outperforms the algorithms in related work. The experiments also show that, for abbreviation disambiguation, weighted-voting K-NN not only achieves higher prediction accuracy than classical K-NN but is also more stable.
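Illustration: the weighted-voting idea can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the weighting scheme, so cosine similarity over bag-of-words vectors is assumed here as the vote weight, and the function names, toy contexts, and candidate senses of "PCA" are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weighted_knn(train, query_words, k=3):
    """Pick the sense with the largest similarity-weighted vote.

    train: non-empty list of (context_words, sense) pairs.
    ASSUMPTION: each of the k nearest neighbours votes with a weight equal
    to its cosine similarity to the query; the paper may weight differently.
    """
    query = Counter(query_words)
    scored = [(cosine(Counter(words), query), sense) for words, sense in train]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for sim, sense in scored[:k]:
        votes[sense] += sim
    return max(votes, key=votes.get)


def majority_knn(train, query_words, k=3):
    """Classical K-NN for comparison: every neighbour gets one equal vote."""
    query = Counter(query_words)
    scored = [(cosine(Counter(words), query), sense) for words, sense in train]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    counts = Counter(sense for _, sense in scored[:k])
    return counts.most_common(1)[0][0]


# Hypothetical toy data: topic words from documents containing "PCA".
train = [
    (["pain", "patient", "surgery", "morphine"], "patient-controlled analgesia"),
    (["pain", "matrix", "variance", "dimension"], "principal component analysis"),
    (["patient", "matrix", "eigenvector", "feature"], "principal component analysis"),
]
query = ["pain", "patient", "surgery", "morphine"]

print(weighted_knn(train, query, k=3))
print(majority_knn(train, query, k=3))
```

On this toy data the two rules disagree: one very similar neighbour outweighs two distant ones under weighted voting, while the equal-vote rule follows the two distant neighbours. That sensitivity to neighbour quality is one plausible reason a weighted vote could be less dependent on the choice of K, though the abstract itself gives no mechanism.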

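Keywords: computer application; Chinese information processing; biomedical information extraction; abbreviation disambiguation; K-nearest neighbor with weighted voting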

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]

 
