Research on the Correlation Strength of Second-Order Co-occurrence Concepts Based on Mutual Information (基于互信息的二阶共现概念相关度研究)  Cited by: 2


Authors: Liu Juhong (刘菊红)[1,2], Miao Yougang (缪有刚)[1], Yu Jianrong (于建荣)[1]

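Affiliations: [1] Shanghai Information Center for Life Sciences, Chinese Academy of Sciences, Shanghai 200031; [2] National Science Library, Chinese Academy of Sciences, Beijing 100190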

Source: Library and Information Service (《图书情报工作》), 2009, No. 18, pp. 123-127 (5 pages)

Abstract: The expansion of the intermediate set (B terms) and the target set (C terms) lowers the accuracy of disjoint-literature-based discovery. Ranking-based methods have shortcomings, and an excessive focus on ranking the B set departs from the actual goal of discovering interesting A and C concepts. Directly computing the correlation strength of second-order co-occurrence concepts is a weak link in disjoint-literature-based discovery. Building on the mutual information measure and regression analysis, the paper constructs an algorithm that computes the correlation strength between second-order co-occurrence concepts. Using literature on type 2 diabetes mellitus indexed in PubMed as a sample, it tests the feasibility of the algorithm empirically. The model achieves good results and offers a new method for extracting and evaluating relationships between second-order co-occurrence concepts.
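The abstract does not give the algorithm itself, so the following is only a minimal sketch of the two building blocks it names: mutual information between directly co-occurring concepts, and the A-B-C linkage that defines second-order co-occurrence. The function names (`pmi`, `doc_counts`, `linking_terms`), the use of pointwise mutual information over document counts, and the representation of each document as a set of concept labels are all assumptions made for illustration. The paper's regression step is not reproduced here.

```python
import math


def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information log2(P(x,y) / (P(x) * P(y))).

    All arguments are document counts: n_xy documents contain both
    concepts, n_x and n_y contain each concept, n_total is the corpus size.
    Assumes n_x > 0 and n_y > 0.
    """
    if n_xy == 0:
        return float("-inf")  # the two concepts never co-occur
    return math.log2((n_xy * n_total) / (n_x * n_y))


def doc_counts(docs, x, y):
    """Return (n_xy, n_x, n_y, n_total) for concepts x and y.

    docs is a list of sets of concept labels (a hypothetical toy format).
    """
    n_x = sum(1 for d in docs if x in d)
    n_y = sum(1 for d in docs if y in d)
    n_xy = sum(1 for d in docs if x in d and y in d)
    return n_xy, n_x, n_y, len(docs)


def linking_terms(docs, a, c):
    """Intermediate (B) concepts that co-occur with a and, separately, with c."""
    with_a = set().union(*(d for d in docs if a in d)) - {a, c}
    with_c = set().union(*(d for d in docs if c in d)) - {a, c}
    return with_a & with_c
```

Under these assumptions, a pair (A, C) with a direct co-occurrence count of zero but a non-empty `linking_terms` result is a second-order co-occurrence pair. The first-order values `pmi(A, B)` and `pmi(B, C)` for each linking B would then be the inputs that a model such as the paper's combines into a single A-C correlation strength.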

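Keywords: mutual information; second-order co-occurrence; correlation strength; type 2 diabetes mellitus; disjoint-literature-based discovery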

Classification number: G201 [Cultural Science - Communication Studies]

 
