Authors: 潘有能 [1], 吕晶晶, 丁楠 [2] (PAN Youneng; LV Jingjing; DING Nan) (Department of Information Resources Management, School of Public Affairs, Hangzhou 310058, China; Zhejiang University Libraries, Zhejiang University, Hangzhou 310028, China)
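Affiliations: [1] Department of Information Resources Management, School of Public Affairs, Zhejiang University, Hangzhou, Zhejiang 310058; [2] Zhejiang University Library, Hangzhou, Zhejiang 310027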
Source: 《情报科学》 (Information Science), 2023, No. 9, pp. 138-145, 154 (9 pages)
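Funding: Zhejiang Provincial Philosophy and Social Sciences Planning Project "Research on Scientific Data Evaluation Based on Citation Networks" (20NDJC039YB).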
Abstract: 【Purpose/significance】In the era of open science, in which everything is interconnected, linking scientific data with scientific literature has become an important measure for promoting open access to, sharing of, and reuse of scientific data. 【Method/process】To identify and extract the hidden linkage between scientific data and scientific literature, this paper constructs a linkage recognition model based on the Labeled-LDA model, supplemented by a rule-based recognition method. Taking the biomedical field as an example, the model is trained and tested through text mining on three linkage cases: standardized citation, non-standardized citation, and no citation. 【Result/conclusion】On the standardized-citation test set, the model achieves a recognition rate of about 0.9 and an F value of about 0.5, indicating relatively stable recognition performance. On the non-standardized-citation and no-citation test sets, the recognition rates are 0.465 and 0.5 respectively, which also shows strong portability and application potential. Manual judgment of the recognition results for the non-standardized-citation and no-citation cases confirms that non-standard data citation does exist in scientific research, and that the academic community needs to jointly promote the standardization of data citation. 【Innovation/limitation】Compared with other studies, the model constructed in this paper provides a methodological reference and basis for semantics-based linkage recognition, and can be applied to large-scale corpus research, thereby promoting knowledge discovery of deeper semantic associations.
Keywords: scientific data; scientific literature; Labeled-LDA; linkage recognition; data citation