Authors: PAN Zhenhao (潘桢皓); GUAN Donghai (关东海); YUAN Weiwei (袁伟伟) [1]; GUO Ran (郭然)
Affiliations: [1] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211106, China; [2] College of Physics and Materials Science, Guangzhou University, Guangzhou, Guangdong 510006, China
Source: Journal of Computer Applications (《计算机应用》), 2024, Issue S01, pp. 18-23 (6 pages)
Funding: Jiangsu Province Aviation Fund project (ASFC-20200055052005).
Abstract: Improving synonym mining usually requires an existing synonym database for the relevant domain, but such databases are extremely scarce, which hinders model optimization. To address the difficulty of continuously improving a model's synonym mining performance in a domain that lacks a synonym database, a SYNonym mining model based on Active learning and Continuous learning (SYN-AC) is proposed. First, an active learning method is used to obtain expert-labeled data, a new loss function is designed, and the labeled data are used to fine-tune the model. Second, to reduce time and space consumption, a continuous learning method is adopted so that the model keeps improving its synonym mining performance while training only on the currently labeled group of data, without being fine-tuned again on all labeled data each time. Three datasets are used to simulate the expert labeling process. The experimental results show that, on two of these datasets, SYN-AC improves the F1 score by 9.34 and 2.75 percentage points, respectively, over the best-performing BERT (Bidirectional Encoder Representations from Transformers) model. These results indicate that SYN-AC can effectively improve synonym mining.
Keywords: synonym mining; active learning; continuous learning; BERT; cosine similarity
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]