Authors: HU Xiao-rong, YAO Chang-qing, GAO Ying-fan (Institute of Scientific and Technical Information of China, Beijing 100038, China)
Source: 《情报科学》 (Information Science), 2019, No. 6, pp. 49-54 (6 pages)
Abstract: [Purpose/Significance] Phrase recognition methods based on statistical features suffer from noise. To address this, the paper proposes a phrase recognition method that fuses multiple strategies. [Method/Process] The method first fuses multiple statistics to extract candidate phrases and applies a preliminary filter based on a stop word list. It then uses the strong semantic expressiveness of word vectors to filter the candidates further, improving the accuracy of phrase recognition. Experiments are run on patent texts from the environmental protection field, and two word-vector libraries, trained on the Sogou news corpus and on Chinese patent data, are used to optimize recognition. [Result/Conclusion] The experiments show that the method, which incorporates deep learning, improves the accuracy of phrase recognition. Filtering of results for small corpora and low thresholds still requires further study.
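The three stages named in the abstract (statistical candidate extraction, stop-word filtering, word-vector filtering) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the abstract does not say which statistics are fused, so frequency and pointwise mutual information (PMI) stand in for them; the word-vector filter is assumed to compare the two constituent words by cosine similarity; and the function names, thresholds, toy sentences, and toy two-dimensional vectors are all hypothetical.

```python
import math
from collections import Counter


def candidate_bigrams(sentences, min_count=2, min_pmi=0.0):
    """Stage 1 (assumed statistics): keep adjacent word pairs whose
    frequency and PMI both reach their thresholds."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    scored = {}
    for (a, b), n in bigrams.items():
        if n < min_count:
            continue
        p_ab = n / total_bi
        p_a = unigrams[a] / total_uni
        p_b = unigrams[b] / total_uni
        pmi = math.log(p_ab / (p_a * p_b))
        if pmi >= min_pmi:
            scored[(a, b)] = pmi
    return scored


def drop_stopword_pairs(candidates, stopwords):
    """Stage 2: discard candidates that contain a stop word."""
    return {pair: score for pair, score in candidates.items()
            if pair[0] not in stopwords and pair[1] not in stopwords}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def filter_by_vectors(candidates, vectors, min_sim=0.3):
    """Stage 3 (assumed criterion): keep a pair only if both words have
    vectors and those vectors are similar enough."""
    return {(a, b): score for (a, b), score in candidates.items()
            if a in vectors and b in vectors
            and cosine(vectors[a], vectors[b]) >= min_sim}


# Toy data, invented for this sketch; real vectors would come from a
# trained word-embedding model.
sentences = [
    ["the", "waste", "water", "treatment", "plant", "shows", "gains"],
    ["the", "waste", "water", "treatment", "plant", "shows", "loss"],
]
vectors = {
    "waste": [1.0, 0.1], "water": [0.9, 0.2], "treatment": [0.8, 0.3],
    "plant": [0.7, 0.4], "shows": [-0.5, 1.0],
}

candidates = candidate_bigrams(sentences)
no_stop = drop_stopword_pairs(candidates, {"the"})
kept = filter_by_vectors(no_stop, vectors)
print(sorted(kept))
```

In this toy run the stop-word step removes the pair starting with "the", and the vector step removes ("plant", "shows") because the two toy vectors point in different directions. Lowering `min_count` or `min_sim` admits more candidates, which corresponds loosely to the low-threshold setting the abstract flags as needing further study.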