Feature Analysis of Chinese Academic Content Generated by ChatGPT: Taking the Field of Intelligence as an Example  Cited by: 11


Authors: 郭鑫 (GUO Xin) [1]; 王一博 (WANG Yibo); 王继民 (WANG Jimin) [1]

Affiliations: [1] Department of Information Management, Peking University; [2] Peking University Library

Source: 《图书馆论坛》 (Library Tribune), 2024, No. 3, pp. 134-143 (10 pages)

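Funding: Research output of the National Social Science Fund of China Key Project "Research on Key Issues and Platform Construction for the Unified Discovery of Open Scientific Datasets" (Project No. 20ATQ007).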

Abstract: Academic writing is one of the main applications of ChatGPT. Taking core-journal articles in information science (情报学) as its object of study, this paper first compares the introductions of articles written by ChatGPT and by humans along three dimensions (word, sentence, and passage), using text-processing methods such as part-of-speech tagging and n-grams. It then treats the question of whether academic content was generated by ChatGPT as a binary classification task, runs text classification experiments with Naive Bayes, Support Vector Machine, and Random Forest algorithms, and uses the SHAP method to analyze the importance of textual structural features. The study finds that ChatGPT is comparatively weak at describing factual information tied to specific dates and at citing policy documents or research reports; the introductions it generates cluster in a narrow range of lengths; and its writing is more "rule-following" than that of human authors. Plagiarism-detection tools usually cannot accurately assess the originality of ChatGPT-generated content, but classification models can distinguish fairly easily whether an introduction was generated by ChatGPT. Average sentence length, lexical diversity, and text length are the textual structural features that most influence the classification results.

Keywords: ChatGPT; academic paper writing; information science (情报学); text classification; plagiarism detection

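Classification codes: G353.1 [Cultural Science: Information Science]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]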
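
The abstract names average sentence length, lexical diversity, and text length as the most influential structural features. The sketch below shows one way such features might be computed for a Chinese passage. It is not the authors' implementation: the function name `structural_features`, the character-level tokenization (each CJK character or ASCII word counts as one token), the use of the type-token ratio as the lexical diversity measure, and the choice of sentence-ending punctuation are all assumptions made for illustration. The record does not say which word segmenter or diversity measure the paper used.

```python
import re

# Assumption: sentences end at Chinese or English terminal punctuation.
# The ASCII period is left out so that decimals such as "3.5" are not split.
SENTENCE_END = re.compile(r"[。!?!?]+")

# Assumption: one token per CJK character or per run of ASCII letters/digits.
# A real study would more likely use a word segmenter instead.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9]+")


def structural_features(text):
    """Compute three structural features for one passage (hypothetical helper)."""
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    tokens = TOKEN.findall(text)
    n_tokens = len(tokens)
    return {
        # Text length measured in tokens.
        "text_length": n_tokens,
        # Mean number of tokens per sentence.
        "avg_sentence_length": n_tokens / len(sentences) if sentences else 0.0,
        # Type-token ratio: distinct tokens divided by all tokens.
        "lexical_diversity": len(set(tokens)) / n_tokens if n_tokens else 0.0,
    }


if __name__ == "__main__":
    sample = "学术写作是ChatGPT的主要应用方向之一。文章以情报学领域的核心期刊论文为研究对象。"
    print(structural_features(sample))
```

Feature dictionaries like these could be stacked into a matrix and passed to any of the classifiers mentioned in the abstract. Note that the type-token ratio is sensitive to text length, so texts of very different lengths are not directly comparable on this measure.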

 
