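Affiliations: [1] National Science Library, Chinese Academy of Sciences, Beijing 100190; [2] Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190; [3] Chinese Academy of Social Sciences Evaluation Studies, Beijing 100732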
Source: Information Studies: Theory & Application (《情报理论与实践》), 2023, No. 8, pp. 182-192 (11 pages)
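Funding: An outcome of the National Social Science Fund of China project "Research on the Construction of a Data-Driven Discipline Classification System and Its Application in Social Science Evaluation" (Project No. 22BTQ018).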
Abstract: [Purpose/significance] This paper reviews research on algorithmically constructed paper-level classification systems, summarizes the problems and shortcomings of existing studies, particularly applied ones, and provides a reference for related theoretical research and practical applications. [Method/process] First, the concept and scope of algorithmically constructed paper-level classification systems are defined, and their development is traced through three stages. Second, related research is organized by the steps of the construction process: building data relationships, clustering, and describing disciplinary domains. Finally, the characteristics of these classification systems are reviewed, together with their applications in constructing domain datasets, describing disciplinary structures, and disciplinary standardization, and the problems and shortcomings of current studies and possible future directions are identified. [Result/conclusion] Algorithmically constructed paper-level classification systems and their applications remain a frontier topic in scientometrics. As classification data become more accessible, the acceptance and scope of use of paper-level classification systems continue to expand. However, the following major issues remain and warrant further research: ① Because widely accepted, large-scale gold-standard data are lacking, the accuracy of algorithmically constructed paper-level classification systems is still debated, and there is no consensus on the relative merits of similarity algorithms. ② Naming clusters with computationally derived keywords is currently the primary approach, but the resulting labels are hard to read and make it difficult for users to quickly grasp the content of each category. ③ Comparative studies against other classification systems in applications are still few, so the characteristics of paper-level classification systems have not been fully revealed. Future research can delve deeper into the main application areas of constructing domain datasets, describing disciplinary structures, and disciplinary standardization, draw on deep-learning document representation and document summarization techniques to address the shortcomings of traditional algorithms, and explore the construction of paper-level classification systems for the humanities and social sciences.