Authors: Liu Fanping (刘凡平); Chen Hui (陈慧); Shen Zhenlei (沈振雷); Wu Yejian (吴业俭)
Affiliation: [1] Shanghai 2345 Network Technology Co., Ltd. (上海二三四五网络科技有限公司), Shanghai 201203, China
Source: Computer Applications and Software (《计算机应用与软件》), 2023, No. 6, pp. 173-180 (8 pages)
Abstract: Current new word discovery methods suffer from low accuracy, poor portability, and a dependence on large-scale corpora. To address these problems, this paper proposes an open-domain new word recognition method based on BERT. Exploiting BERT's strong grasp of sentence meaning, a word and its context are fed into the model to train a word recognizer. The test text, treated as a byte stream, is scanned with a sliding window of size N to form a set of candidate words. Each candidate is then classified to decide whether it constitutes a word in its context; if it does and it is absent from the standard lexicon, it is reported as a new word. The method is compared with a new word discovery method based on mutual information and left-right entropy and with one based on conditional random fields. The results show that the proposed method achieves higher precision and F1 score, and also higher recall on named entity recognition.
Classification No.: TP3 [Automation and Computer Technology: Computer Science and Technology]
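The candidate-generation and filtering steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `sliding_candidates` and `find_new_words` are hypothetical, the window runs over Python characters rather than raw bytes, and the BERT-based word recognizer is replaced by a placeholder callable `is_word(candidate, context)` that a real system would back with a fine-tuned classifier.

```python
from typing import Callable, List, Set


def sliding_candidates(text: str, n: int) -> List[str]:
    """Return every window of exactly n consecutive characters in text."""
    if n < 1:
        raise ValueError("window size n must be at least 1")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def find_new_words(
    text: str,
    n: int,
    is_word: Callable[[str, str], bool],
    lexicon: Set[str],
) -> List[str]:
    """Return candidates judged to be words in context but missing from lexicon.

    is_word stands in for the trained recognizer (an assumption of this
    sketch); lexicon stands in for the standard word list.
    """
    new_words: List[str] = []
    seen: Set[str] = set()
    for candidate in sliding_candidates(text, n):
        if candidate in seen:
            continue  # classify each distinct candidate only once
        seen.add(candidate)
        if is_word(candidate, text) and candidate not in lexicon:
            new_words.append(candidate)
    return new_words


# Toy stand-ins for illustration only (hypothetical data, not from the paper).
TOY_ACCEPTED = {"喜欢", "奥利给", "这个"}
TOY_LEXICON = {"喜欢", "这个"}


def toy_is_word(candidate: str, context: str) -> bool:
    """Placeholder recognizer: accepts a fixed set and ignores the context."""
    return candidate in TOY_ACCEPTED


print(find_new_words("我喜欢奥利给这个词", 3, toy_is_word, TOY_LEXICON))
# prints ['奥利给']
```

Here the window size is fixed per call; covering several word lengths would mean calling `find_new_words` once per value of `n` and merging the results.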