Authors: Wang Mo; Cui Yunpeng[1,2]; Chen Li; Li Huan
Affiliations: [1] Agricultural Information Institute of Chinese Academy of Agricultural Sciences, Beijing 100081, China; [2] Key Laboratory of Big Agri-data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), 2020, No. 6, pp. 60-68 (9 pages)
Funding: This work is one of the research outputs of the Chinese Academy of Agricultural Sciences Science and Technology Innovation Program project "Association Discovery and Computational Mining of Multi-source Heterogeneous Agricultural Big Data" (Project No. CAAS-ASTIP-2016-AII).
Abstract: [Objective] This study uses a deep learning language representation model to learn sentence representations of research articles and builds an argumentative zoning (move classification) model on top of them to improve classification performance. [Methods] We adopted the pre-trained deep learning language representation model BERT and refined the model input with each sentence's position in the article. Transfer learning on an annotated dataset of biochemistry journal articles yielded sentence-level embeddings, which were then fed into a neural network classifier to train the argumentative zoning model. [Results] Experiments on a public dataset showed that, on the eleven-class task, overall accuracy reached 81.3%, an improvement of 29.7% over the best performance reported in previous studies; on the seven-class core task, accuracy reached 85.5%. [Limitations] Owing to constraints of the experimental environment, the pre-trained parameters of the refined-input model came from the original model structure; how well these transferred parameters suit the new model input remains to be explored. [Conclusions] The proposed method substantially outperforms the traditional "feature construction + machine learning" classifiers and also improves somewhat on the original BERT model. It requires no manual feature engineering and is not tied to a specific language, so it can be applied to argumentative zoning of Chinese academic papers and has considerable practical potential.
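The abstract says the model input was refined with each sentence's position in the article but does not say how that position was encoded. The following is only an illustrative sketch under an assumption of ours: each sentence is prefixed with a coarse relative-position marker before it would be tokenized for BERT. The names `position_bucket` and `add_position_markers`, the `[POS_k]` marker format, and the default of 10 buckets are all hypothetical and are not taken from the paper.

```python
from typing import List


def position_bucket(index: int, total: int, n_buckets: int = 10) -> int:
    """Map a 0-based sentence index to a coarse relative-position bucket.

    Hypothetical helper: the bucketing scheme is an assumption, not the
    paper's documented method.
    """
    if total <= 0 or not 0 <= index < total:
        raise ValueError("index must satisfy 0 <= index < total")
    # Integer arithmetic keeps the result in the range 0 .. n_buckets - 1.
    return index * n_buckets // total


def add_position_markers(sentences: List[str], n_buckets: int = 10) -> List[str]:
    """Prefix every sentence with a marker for its relative position.

    The marked strings would then be tokenized and passed to a sentence
    encoder; that step is omitted here.
    """
    total = len(sentences)
    return [
        f"[POS_{position_bucket(i, total, n_buckets)}] {sentence}"
        for i, sentence in enumerate(sentences)
    ]


if __name__ == "__main__":
    article = [
        "Protein folding remains poorly understood.",
        "We measured folding rates in vitro.",
        "Rates increased with temperature.",
        "These results suggest a two-state model.",
    ]
    for line in add_position_markers(article):
        print(line)
```

In a real pipeline the markers would need to be registered as special tokens in the tokenizer so that they are not split into word pieces; whether the authors did anything comparable is not stated in the abstract.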
Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]