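Authors: GUO Xin (郭鑫) [1]; WANG Yibo (王一博); WANG Jimin (王继民) [1]
Affiliations: [1] Department of Information Management, Peking University; [2] Peking University Library
Source: Library Tribune (《图书馆论坛》), 2024, No. 3, pp. 134-143 (10 pages)
Funding: Research output of the National Social Science Fund of China Key Project "Research on Key Issues and Platform Construction for Unified Discovery of Open Scientific Datasets" (Project No. 20ATQ007).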
Abstract: Academic writing is one of the main applications of ChatGPT. This paper takes core journal articles in the field of information science as its research object. Starting from three dimensions (word, sentence, and document), it first uses text processing methods such as part-of-speech tagging and n-grams to compare the introductions of articles produced by ChatGPT and by humans. It then treats determining whether academic content was generated by ChatGPT as a binary classification task: Naive Bayes, Support Vector Machine, and Random Forest algorithms are employed for text classification experiments, and the SHAP method is used to analyze the importance of textual structural features. The study finds that ChatGPT is relatively weak at describing factual information tied to specific dates and at citing policy documents or research reports. The introductions it generates fall within a narrow range of lengths, and its academic writing is more "rule-following" than that of human authors. Plagiarism detection tools usually cannot accurately assess the originality of ChatGPT-generated content, but classification models can distinguish fairly easily whether an introduction was generated by ChatGPT. Average sentence length, lexical diversity, and text length are the textual structural features that most influence the classification results.
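Keywords: ChatGPT; academic paper writing; information science; text classification; plagiarism detection
Classification codes: G353.1 [Cultural Sciences: Information Science]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]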
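The abstract names average sentence length, lexical diversity, and text length as the most influential textual structural features. The record does not say how the authors computed them, so the following is only a minimal sketch under stated assumptions: the function name `structural_features` is hypothetical, lexical diversity is taken to be a simple type-token ratio, and tokenization is deliberately crude (one token per CJK character or per run of Latin letters and digits) rather than a proper word segmenter.

```python
import re


def structural_features(text: str) -> dict:
    """Compute three simple structural features for one passage.

    Assumptions (not from the paper): sentences end at Chinese or English
    terminal punctuation, and lexical diversity is a type-token ratio.
    """
    # Split on sentence-ending punctuation; drop empty fragments.
    # Note: a plain "." split also breaks on decimals such as "3.5".
    sentences = [s for s in re.split(r"[。!?.!?]+", text) if s.strip()]
    # Crude tokens: each CJK character, or each run of Latin letters/digits.
    tokens = re.findall(r"[\u4e00-\u9fff]|[A-Za-z0-9]+", text)
    n_tokens = len(tokens)
    unique_tokens = {t.lower() for t in tokens}
    return {
        "text_length": n_tokens,
        "avg_sentence_length": n_tokens / len(sentences) if sentences else 0.0,
        "lexical_diversity": len(unique_tokens) / n_tokens if n_tokens else 0.0,
    }
```

Feature dictionaries like these could be stacked into a matrix and passed to any of the classifiers the abstract mentions; a real replication would likely need a Chinese word segmenter in place of the per-character tokens used here.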