Authors: DUAN Jian-yong[1], YOU Shi-xin, ZHANG Mei[1], WANG Hao[1] (School of Information, North China University of Technology, Beijing 100144, China)
Source: Computer Science (《计算机科学》), 2020, Issue S02, pp. 73-77 (5 pages)
Funding: National Natural Science Foundation of China (61972003, 61672040).
Abstract: With the development of the Internet, the volume of webpage data, new-media text and other data keeps growing, and full-text information retrieval is no longer efficient enough to support retrieval over such massive data. Keyword extraction is therefore widely used in search engines (such as Baidu Search) and in new-media services (such as news retrieval). The fusion model uses a BiLSTM-CRF architecture and fuses multiple handcrafted features, which lets it perform keyword extraction more effectively. On top of word-embedding features, it incorporates part-of-speech, word-frequency, word-length and word-position features; this multidimensional feature information helps the model capture deep feature information about keywords more comprehensively. By combining the wide coverage and strong learning ability of deep learning with the precise expressiveness of handcrafted features, the fusion model further improves feature mining and shortens the training time. In addition, the model adopts a new labeling scheme, LMRSN, to extract key phrases more effectively. Experimental results show that the fusion model achieves an F1 score of 62.08, far higher than that of the traditional models it is compared against.
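The abstract describes two input-side ideas: concatenating handcrafted features (part of speech, word frequency, word length, word position) with word embeddings before the BiLSTM-CRF, and tagging key phrases with an "LMRSN" scheme. The sketch below illustrates both under stated assumptions; it is not the authors' implementation. The abstract does not define the LMRSN labels, so the code *assumes* they work like BMES-style tags: L = leftmost token of a multi-token phrase, M = middle, R = rightmost, S = single-token keyword, N = non-keyword. The POS inventory `POS_TAGS`, the feature encodings and all function names are hypothetical, and the BiLSTM-CRF itself is omitted.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical POS inventory; the paper's actual tag set is not given in the abstract.
POS_TAGS = ["n", "v", "a", "x"]


def handcrafted_features(tokens: Sequence[str], pos: Sequence[str]) -> List[List[float]]:
    """Per-token features: one-hot POS, relative term frequency, word length, relative position.

    These encodings are illustrative assumptions, not the paper's exact scheme.
    """
    counts = Counter(tokens)
    n = len(tokens)
    feats = []
    for i, (tok, tag) in enumerate(zip(tokens, pos)):
        one_hot = [1.0 if tag == t else 0.0 for t in POS_TAGS]
        freq = counts[tok] / n                      # term frequency within this text
        length = float(len(tok))                    # word length in characters
        position = i / (n - 1) if n > 1 else 0.0    # 0 = first token, 1 = last token
        feats.append(one_hot + [freq, length, position])
    return feats


def fuse(embeddings: Sequence[Sequence[float]], feats: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each word embedding with its handcrafted feature vector (the BiLSTM input)."""
    return [list(e) + list(f) for e, f in zip(embeddings, feats)]


def to_lmrsn(n: int, spans: Sequence[Tuple[int, int]]) -> List[str]:
    """Tag n tokens given keyphrase spans [start, end), using the ASSUMED LMRSN meanings."""
    tags = ["N"] * n
    for start, end in spans:
        if end - start == 1:
            tags[start] = "S"
        else:
            tags[start] = "L"
            for j in range(start + 1, end - 1):
                tags[j] = "M"
            tags[end - 1] = "R"
    return tags


def decode_lmrsn(tokens: Sequence[str], tags: Sequence[str], sep: str = " ") -> List[str]:
    """Recover keyphrases from a tag sequence (e.g. a CRF's best path); ill-formed runs are dropped."""
    phrases: List[str] = []
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "S":
            phrases.append(tok)
            current = []
        elif tag == "L":
            current = [tok]
        elif tag == "M" and current:
            current.append(tok)
        elif tag == "R" and current:
            current.append(tok)
            phrases.append(sep.join(current))
            current = []
        else:
            current = []  # "N" or an M/R without a preceding L
    return phrases


if __name__ == "__main__":
    toks = ["BiLSTM", "supports", "keyword", "extraction", "with", "deep", "learning", "models"]
    tags = to_lmrsn(len(toks), [(0, 1), (2, 4), (5, 8)])
    print(tags)                      # ['S', 'N', 'L', 'R', 'N', 'L', 'M', 'R']
    print(decode_lmrsn(toks, tags))  # ['BiLSTM', 'keyword extraction', 'deep learning models']
```

For segmented Chinese text, `sep=""` rejoins tokens without spaces. The advantage of a span-aware scheme of this kind over plain keyword/non-keyword labels is that multi-word key phrases can be recovered as units rather than as disconnected words.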
Keywords: extraction; deep learning; feature fusion; information retrieval; long short-term memory network
CLC number: TP391.1 [Automation and Computer Technology: Computer Application Technology]