Authors: Zhang Qiang (张强); Wang Xiaoran (王潇冉); Gao Ying (高颖); Wang Changjue (王常珏); Zhou Hong (周洪) (School of Humanities, Anhui Polytechnic University, Wuhu 241000; School of Information Management, Central China Normal University, Wuhan 430079; School of Computer and Information, Anhui Polytechnic University, Wuhu 241000; Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100191; Wuhan Library and Intelligence Center, Chinese Academy of Sciences, Wuhan 430071)
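Affiliations: [1] School of Humanities, Anhui Polytechnic University, Wuhu 241000 [2] School of Information Management, Central China Normal University, Wuhan 430079 [3] School of Computer and Information, Anhui Polytechnic University, Wuhu 241000 [4] School of Economics and Management, University of Chinese Academy of Sciences, Beijing 101190 [5] Wuhan Library and Intelligence Center, Chinese Academy of Sciences, Wuhan 430071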
Source: Library and Information Service (《图书情报工作》), 2024, No. 8, pp. 35-47 (13 pages)
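Funding: One of the research outputs of the Anhui Provincial Major Teaching Research Project "Reform and Practice of the Humanities Quality Education System in the Context of New Engineering" (Project No. 2020jyxm0152) and the Young Leading Talents Project of the Wuhan Library and Intelligence Center, Chinese Academy of Sciences (Project No. E0KZ451).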
Abstract: [Purpose/Significance] This study examines the similarities and differences between Chinese paper abstracts generated by ChatGPT and those written by scholars, to inform AI-generated academic paper detection and related research.
[Method/Process] First, taking the field of information resource management as an example, 500 highly cited papers from the past three years were sampled from each of library science, information science, and archival science. ChatGPT was prompted with each paper title to generate a corresponding abstract, and the results were assembled into a dataset. Second, nine machine learning and deep learning algorithms were used to classify abstracts as ChatGPT-generated or scholar-written. Finally, the two sets of abstracts were compared from multiple perspectives: text features, topic models, and ROUGE evaluation.
[Result/Conclusion] Mainstream machine learning and deep learning algorithms trained on the dataset can effectively distinguish AI-generated abstracts from scholar-written ones. BERT and ERNIE perform best, while RF and XGBoost perform best among the machine learning algorithms. ChatGPT-generated abstracts contain more characters and more sentences than scholar-written ones, and their keywords are mostly formulaic transitional words. The two sets share most topics but differ on topics such as "disciplinary system" and "digital humanities". Quantitative analysis with ROUGE and cosine similarity shows that ChatGPT-generated abstracts clearly resemble scholar-written abstracts in form but not in spirit.
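The abstract names ROUGE and cosine similarity as its quantitative measures but does not describe how they were computed. The Python sketch below is only an illustration of the two measures, not the authors' implementation. It assumes character-level tokens (a simple stand-in for Chinese word segmentation) and raw count vectors. The function names `char_counts`, `rouge1_f1`, and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def char_counts(text):
    """Count characters, skipping whitespace.

    Assumption: character unigrams stand in for proper word segmentation.
    """
    return Counter(ch for ch in text if not ch.isspace())


def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand, ref = char_counts(candidate), char_counts(reference)
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw character-count vectors."""
    va, vb = char_counts(text_a), char_counts(text_b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Placeholder strings, not data from the study.
    generated = "本文探讨了图书馆学的发展趋势"
    written = "本文分析图书馆学研究的发展"
    print(rouge1_f1(generated, written))
    print(cosine_similarity(generated, written))
```

Both scores depend only on surface token overlap, so a pair of abstracts can score highly while differing in argument or emphasis, which matches the form-versus-spirit distinction the abstract draws.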