Comparative analysis of subject novelty detection methods in medical literature  Cited by: 5

Authors: CHEN Si-si[1], DONG Li-ping[1], XU Dan[1], GUO Ji-jun[1]

Affiliation: [1] Library of China Medical University, Shenyang 110122, Liaoning Province, China

Source: Chinese Journal of Medical Library and Information Science, 2018, No. 2, pp. 20-25 (6 pages)

Abstract: Objective: To study the feasibility of applying novelty detection models to assess the subject novelty of medical literature, and to compare the strengths and weaknesses of two novelty detection methods: the word-overlap method and the co-word-based inverse document frequency (IDF) quantification method. Methods: Eight research topics in biomedicine were selected and the literature was collected from PubMed. Two novelty detection models were built, and their feasibility was evaluated against expert assessments of subject novelty using ROC curves and AUC values. Results: The novelty scores computed by the word-overlap method fluctuated more widely and therefore reflected differences in content between documents more clearly in the data. ROC and AUC analysis showed that the word-overlap method identified novel literature with a certain degree of accuracy, whereas the co-word-based IDF quantification method showed low accuracy. Conclusion: The novelty scores produced by the two methods are moderately correlated, the difference between their means is statistically significant, and the word-overlap method performs better than the co-word-based IDF quantification method.
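The abstract names two steps, a word-overlap novelty score and an AUC comparison against expert labels. They can be illustrated with a minimal sketch. This record does not give the article's actual formulas, so everything here is an assumption: the overlap definition (one minus the largest share of a document's words already present in any single earlier document), the naive tokenizer, and the function names `tokens`, `overlap_novelty`, and `auc` are all hypothetical. The co-word-based IDF method is not sketched because its definition is not recoverable from the abstract.

```python
import re


def tokens(text):
    """Return the set of lowercase words; a naive stand-in for real term extraction."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_novelty(doc, earlier_docs):
    """Assumed word-overlap novelty: 1 minus the largest fraction of doc's
    words that already appear in any single earlier document."""
    words = tokens(doc)
    if not words or not earlier_docs:
        return 1.0  # nothing to compare against, treat as fully novel
    max_overlap = max(len(words & tokens(prev)) / len(words) for prev in earlier_docs)
    return 1.0 - max_overlap


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison: the probability that
    a document labeled novel (1) outscores one labeled not novel (0),
    counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one novel and one non-novel label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With documents sorted by publication date, each one could be scored against all earlier ones with `overlap_novelty`, and the resulting scores passed to `auc` together with the expert novelty labels. An AUC near 0.5 would indicate that the scores discriminate no better than chance.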

Keywords: literature subject; novelty detection; ROC curve; feasibility analysis

Classification code: G254.2 [Cultural Sciences - Library Science]

 
