Authors: XU Huan (徐欢) [1]; WANG Yao (王尧); XIAO Zhanhui (萧展辉); SHEN Yuhong (沈宇红)
Affiliations: [1] China Southern Power Grid Co., Ltd., Guangzhou 510000, Guangdong, China; [2] Digital Grid Research Institute of China Southern Power Grid Co., Ltd., Guangzhou 510000, Guangdong, China
Source: Power Systems and Big Data (《电力大数据》), 2022, No. 9, pp. 37-44 (8 pages)
Abstract: Fine-grained data processing is a major challenge for power grid companies undergoing digital transformation. Because power grid enterprises are relatively closed, their specialized language and knowledge are difficult to integrate with general-purpose knowledge from outside the industry. To address this, the paper builds on the teacher-student framework, combined with fine-tuning, to design an information representation model: a lightweight domain information representation model based on knowledge transfer and distillation. The model uses a general knowledge framework as its base and distills specialized and general knowledge into a unified vector space. According to the authors, the model is faster, lighter, and more effective than a general-purpose large model, and it needs only incremental learning on a small set of domain-specific samples, on the order of hundreds, to fuse general and specialized knowledge efficiently. To verify the model's effectiveness, we ran experiments on a text similarity computation task; the results show that NDCG@5 improves by 5.76%. The model also reduces resource consumption and improves search efficiency.
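Keywords: knowledge transfer; model distillation; domain information retrieval model; digital transformation
Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]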
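
The abstract reports its result as NDCG@5 but this record does not say how the metric was computed. As a rough illustration only, the sketch below uses one common definition (linear gain with a log2 rank discount); the function names `dcg_at_k` and `ndcg_at_k` and the example relevance labels are assumptions made here, not taken from the paper, which may use a different gain function or implementation.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results.

    `relevances` holds graded relevance labels in ranked order
    (position 1 first). Assumption: linear gain, log2 discount.
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 5) -> float:
    """DCG@k normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        # No relevant items at all: define the score as 0.
        return 0.0
    return dcg_at_k(relevances, k) / ideal


if __name__ == "__main__":
    # Hypothetical relevance labels for one query's ranked results.
    ranked = [2, 3, 0, 1, 0]
    print(f"NDCG@5 = {ndcg_at_k(ranked, k=5):.4f}")
```

A score of 1.0 means the top five results are already in the best possible order; averaging the per-query score over a test set gives the kind of figure quoted in the abstract.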