Authors: WANG Yongsheng, LIU Yali [1], ZONG Guohao, WANG Di, WANG Rui, WANG Jinbang, LI Fenglin, JIA Nan [1], FENG Weihua [1]
Affiliations: [1] Zhengzhou Tobacco Research Institute of CNTC, Zhengzhou 450001, China; [2] Suzhou Branch of Jiangsu Provincial Tobacco Company, Suzhou 215008, Jiangsu, China
Source: Tobacco Science & Technology (《烟草科技》), 2024, No. 6, pp. 99-106 (8 pages)
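Funding: Henan Province Science and Technology Research Project "Research on Construction Technology of a Knowledge-Graph-Based Expert System for Tobacco Diseases and Pests" (232102210073); China National Tobacco Corporation (CNTC) Major Special Project "Research and Construction of a Platform for Integrating Literature and Information Resources of Tobacco-Related Disciplines" [110202101031(SJ-02)]; CNTC Key R&D Project "Research on Key Core Technology Needs and Technology Foresight for the Tobacco Industry" (110202102048); Zhengzhou Tobacco Research Institute Young Talent Support Program project "Research on Text Analysis Technology Based on Tobacco Scientific and Technological Literature" (602020CR0360).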
Abstract: To quickly extract knowledge from tobacco-related scientific and technological literature, an interactive, iterative learning method for annotating and recognizing tobacco knowledge entities was used to build a text annotation corpus for the tobacco domain. A text annotation specification suited to the tobacco domain was designed, and a BERT+CRF (Bidirectional Encoder Representations from Transformers + Conditional Random Field) deep learning network model was used to recognize and pre-annotate tobacco named entities. Combined with manual proofreading, this expanded the original corpus and improved the model's performance. The results showed that the annotation consistency F1 score of the corpus reached 92.4%, and that the BERT+CRF model recognized entities better than the commonly used CRF and BiLSTM+CRF named entity recognition models. This technique can support better text analysis and knowledge mining in the tobacco domain.
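The abstract reports annotation consistency as an F1 score but does not say how it was computed. One common way to measure agreement between two annotators is sketched below, assuming exact matching on entity spans and labels; the function name `annotation_f1`, the `(start, end, label)` tuple layout, and the example labels are hypothetical and not taken from the paper.

```python
def annotation_f1(ann_a, ann_b):
    """F1 agreement between two annotators' entity sets.

    Each annotation is a (start, end, label) tuple. This tuple layout and
    the exact-match criterion are assumptions made for illustration only.
    """
    a, b = set(ann_a), set(ann_b)
    if not a and not b:
        return 1.0  # both annotators marked nothing: treat as full agreement
    matched = len(a & b)  # entities with identical span and label
    if matched == 0:
        return 0.0
    precision = matched / len(b)  # annotator A treated as the reference
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the annotators agree on two of three entities.
annotator_a = [(0, 2, "DISEASE"), (5, 8, "PEST"), (10, 12, "CHEMICAL")]
annotator_b = [(0, 2, "DISEASE"), (5, 8, "PEST"), (14, 16, "CHEMICAL")]
score = annotation_f1(annotator_a, annotator_b)
```

Because F1 is symmetric in precision and recall, swapping which annotator is treated as the reference does not change the score.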