Authors: 徐翀 (XU Chong); 王其清 (WANG Qiqing)
Affiliation: [1] State Grid Energy Research Institute Co., Ltd., Changping District, Beijing 102209, China
Source: Electric Power Information and Communication Technology (《电力信息与通信技术》), 2023, No. 4, pp. 31-36 (6 pages)
Funding: Science and Technology Project of the Headquarters of State Grid Corporation of China, "Research and Development of Intelligent Expert Selection Technology for Science and Technology Consulting Based on Knowledge Graph" (1400-202057269A-0-0-00)
Abstract: To overcome the challenges that the specialized, interdisciplinary nature of electric power science and technology texts poses for knowledge acquisition, a domain language model for the power technology field is proposed to achieve more accurate text representation. A Transformer-based language model is pre-trained on a large corpus of power technology papers, patents, project documents, and other texts. Two knowledge extraction tasks, power technology term classification and distantly supervised entity-relation extraction, are designed to validate the model. Experimental results show that the F1 score of the proposed domain language model on the term classification task is more than 10% higher than that of the word2vec baseline, and its AUC score on the entity-relation extraction task is about 2% higher than that of the BERT baseline. The proposed model thus provides higher-quality feature representations for downstream knowledge acquisition tasks.
Classification: TP391.1 [Automation and Computer Technology: Computer Application Technology]
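The abstract reports results with two standard metrics: F1 for the term classification task and AUC for the relation extraction task. As a point of reference, both metrics can be computed as follows; this is a minimal pure-Python sketch on toy labels and scores, not the paper's data or evaluation code.

```python
# Illustration of the two evaluation metrics named in the abstract:
# F1 (term classification) and AUC (relation extraction).
# All label/score values below are toy examples.

def f1_score(y_true, y_pred):
    """Binary F1 = 2 * P * R / (P + R), from hard 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc_score(y_true, y_score):
    """AUC = probability that a random positive example is scored
    above a random negative one (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0]
    y_pred = [1, 0, 0, 1, 1]              # hard predictions for F1
    y_score = [0.9, 0.2, 0.6, 0.8, 0.4]   # soft scores for AUC
    print(round(f1_score(y_true, y_pred), 3))   # 0.667
    print(round(auc_score(y_true, y_score), 3)) # 1.0
```

Reporting F1 suits the classification task (it balances precision and recall on imbalanced term categories), while AUC suits distantly supervised relation extraction, where the model outputs confidence scores over noisy automatically-labeled pairs and a threshold-free ranking metric is more informative.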