Authors: MA Jun; LV Lu-cheng; ZHAO Ya-juan [2]; LI Cong-ying
Affiliations: [1] Information Research Center of Military Sciences, Academy of Military Sciences, Beijing 100142, China; [2] National Science Library, Chinese Academy of Sciences, Beijing 100190, China
Source: Chinese Journal of Medical Library and Information Science, 2022, No. 11, pp. 20-28 (9 pages)
Abstract: Objective To support the accurate automatic classification of large-scale Chinese patents, this paper explored pre-trained language models with improved Chinese patent text representation. Methods Starting from the Chinese pre-trained language model RoBERTa, transfer learning was performed on a large-scale Chinese invention patent corpus using the masked language model task with a single-character masking strategy and a whole-word masking strategy respectively, yielding two models with improved Chinese patent text representation: ZL-RoBERTa and ZL-RoBERTa-wwm. The models were then applied to a patent text classification task and compared with a typical deep learning model (Word2Vec+BiGRU+ATT+TextCNN) and the current state-of-the-art pre-trained language models BERT and RoBERTa. Results The ZL-RoBERTa-based and ZL-RoBERTa-wwm-based Chinese patent classification models achieved higher precision, recall, and F1 scores on the patent text classification task. Conclusion Chinese patent pre-trained language models with improved text representation are more effective for patent text classification, providing a model basis for the subsequent application of pre-trained models in patent intelligence work.
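The core distinction in the Methods above is between the two masking strategies used for the masked language model task: single-character masking decides per character, so a multi-character Chinese word can be only partially masked, while whole-word masking decides per word and masks all of its characters together. The following is a minimal, self-contained Python sketch of that difference; the pre-segmented example phrase, the masking probability, and the function names are illustrative and not taken from the paper's implementation.

```python
import random

MASK = "[MASK]"

def char_masking(words, mask_prob=0.15, seed=0):
    """Single-character masking: each character is masked independently,
    so a multi-character word may end up only partially masked."""
    rng = random.Random(seed)
    chars = [c for w in words for c in w]
    return [MASK if rng.random() < mask_prob else c for c in chars]

def whole_word_masking(words, mask_prob=0.15, seed=0):
    """Whole-word masking: the masking decision is made once per word,
    and every character of a selected word is replaced by [MASK]."""
    rng = random.Random(seed)
    out = []
    for w in words:
        if rng.random() < mask_prob:
            out.extend(MASK for _ in w)
        else:
            out.extend(w)
    return out

# A pre-segmented patent-style phrase: "patent / text / automatic / classification"
words = ["专利", "文本", "自动", "分类"]
print(char_masking(words, mask_prob=0.3, seed=1))       # may split a word
print(whole_word_masking(words, mask_prob=0.3, seed=1)) # masks whole words
```

In practice this per-word decision is what lets ZL-RoBERTa-wwm learn representations aligned with Chinese word boundaries, whereas the per-character variant can leave a model predicting one half of a word from the other half.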