Identifying “Sleeping Beauties” in Cell Biology and Exploring Their Classical Applications through an Improved BP Network and Function Fitting

Cited by: 4


Authors: Hu Zewen; Jin Xinyue; Cui Jingjing (School of Management Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044)

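Affiliation: [1] School of Management Science and Engineering, Nanjing University of Information Science & Technology, Nanjing 210044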

Source: Journal of the China Society for Scientific and Technical Information, 2023, No. 6, pp. 711-728 (18 pages)

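Funding: National Social Science Fund of China project “Research on Methods and Applications for Identifying Potential ‘High-Quality’ Papers in Massive Scientific and Technical Literature” (20CTQ031).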

Abstract: Fully mining and widely using the “sleeping beauties” hidden in massive volumes of scientific and technical literature allows their scientific value to be realized and lets them drive the development of science and technology. This study designs and implements a back propagation (BP) neural network model that incorporates the K-value algorithm, together with a univariate quadratic function fitting model that incorporates the least squares method and an iterative algorithm. Both are used to identify “sleeping beauties” among 401,130 cell biology papers published from 1990 to 2010, and the classical applications of the identified papers are explored. The results show the following. (1) The BP neural network model clearly improves the degree of automation and the effectiveness of “sleeping beauty” identification and is not affected by the length of the citation period; however, some “sleeping beauties” must be identified in advance to train the model. Incorporating the least squares method, an iterative algorithm, and a slicing algorithm improves the identification efficiency of the quadratic function and Gini coefficient methods. (2) Quadratic function fitting is only slightly affected by the citation period, whereas the Gini coefficient is strongly affected: the number of “sleeping beauties” identified among papers with a shorter citation period (published between 2001 and 2010) is 15 times the number identified among papers with a longer citation period (published between 1990 and 2000). (3) Even within the same field, the results differ markedly between methods. The K-value algorithm, the BP neural network, and the quadratic function give the better identification performance but identify few papers: among 257,562 papers they identify between 30 and 223 “sleeping beauties”, less than 0.09% of the total. The Gini coefficient method, because of its sensitivity to the citation period, gives the poorest identification performance and the largest number of identified papers, 0.41% of the total. (4) The annual share of “sleeping beauties” in cell biology is fairly stable, remaining between 0.02% and 0.17%. (5) The identification results can be widely applied to comparing the bibliometric characteristics of papers of different value, to identifying and recommending research hotspots in a field, and to identifying, disseminating, and recommending potential high-quality or high-value papers.
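Two of the indicators named in the abstract, least-squares quadratic fitting and the Gini coefficient, can be illustrated on a single paper's yearly citation counts. The following is only a minimal sketch under stated assumptions: the function names `fit_quadratic` and `gini`, the sample citation history, and the idea of reading a positive leading coefficient as a late-rising curve are all illustrative and are not taken from the paper, whose exact criteria, thresholds, iterative and slicing steps are not given in this record.

```python
import numpy as np


def fit_quadratic(citations):
    """Least-squares fit of c(t) = a*t**2 + b*t + c0 to yearly citation counts.

    t is the number of years since publication (0, 1, 2, ...).
    Returns the coefficients (a, b, c0).
    """
    y = np.asarray(citations, dtype=float)
    t = np.arange(y.size, dtype=float)
    a, b, c0 = np.polyfit(t, y, deg=2)  # ordinary least squares
    return a, b, c0


def gini(citations):
    """Gini coefficient of the yearly citation counts (0 = perfectly even)."""
    x = np.sort(np.asarray(citations, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * total) - (n + 1) / n)


# Hypothetical citation history: a long dormant period, then a late surge.
history = [0, 1, 0, 0, 1, 0, 1, 2, 6, 15, 31]
a, b, c0 = fit_quadratic(history)
print("convex (a > 0):", a > 0)
print("Gini coefficient:", round(gini(history), 3))
```

In this sketch a convex fit (a > 0) together with a high Gini value marks citations that are concentrated late in the paper's life. Because a short citation window tends to concentrate citations in a few years, it also hints at why the abstract reports the Gini coefficient as sensitive to the length of the citation period.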

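Keywords: cell biology; sleeping beauties; BP neural network; univariate quadratic function; Gini coefficient; typical applications; research hotspots; bibliometrics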

Classification: Q2-05 [Biology: Cell Biology]; TP183 [Automation and Computer Technology: Control Theory and Control Engineering]

 
