Authors: MA Jianhong[1], YANG Hao[1], YAO Shuang
Affiliation: [1] School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300401, China
Source: Journal of Zhengzhou University: Natural Science Edition, 2018, No. 2, pp. 86-91 (6 pages)
Abstract: Sentence feature extraction and similarity calculation are important problems in natural language processing. Current similarity calculation methods for Chinese sentences do not fully account for sentence semantics, which makes their results insufficiently accurate. This paper proposes an algorithm for sentence semantic feature extraction and similarity calculation based on a deep sparse auto-encoder. Each sentence is first represented as a high-dimensional, sparse vector. Deep unsupervised learning is then used to learn the nonlinear features of the sentence, transforming the high-dimensional, sparse vector into a low-dimensional space of essential features. This is a more purely end-to-end learning process that avoids building a stop-word list and performing word segmentation, and it yields a low-dimensional feature representation that can be used directly for sentence similarity calculation. Experimental results show that, compared with a sentence similarity method based on the relation vector model, the extracted features improve the accuracy of similarity calculation, and the time complexity of the calculation is only O(n).
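Keywords: auto-encoder; unsupervised feature learning; semantic feature extraction; similarity calculation
Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]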
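
The pipeline summarized in the abstract (sparse sentence vector, stacked sparse auto-encoder, similarity on the low-dimensional features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the record does not say how sentences are vectorized, how many layers are used, or which similarity measure is applied, so the bag-of-characters encoding, the layer sizes (8, 4), the KL sparsity penalty settings, the cosine measure, and every function name below are assumptions chosen for illustration.

```python
import numpy as np

def char_vectors(sentences):
    """Assumed encoding: binary bag-of-characters, so no word segmentation is needed."""
    vocab = sorted({ch for s in sentences for ch in s})
    index = {ch: i for i, ch in enumerate(vocab)}
    X = np.zeros((len(sentences), len(vocab)))
    for row, s in enumerate(sentences):
        for ch in s:
            X[row, index[ch]] = 1.0
    return X

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sparse_autoencoder(X, hidden, rho=0.1, beta=0.1, lr=0.5, epochs=500, seed=0):
    """Train one sparse auto-encoder layer with batch gradient descent.

    Loss = squared reconstruction error / (2n) + beta * KL(rho || mean activation).
    Returns the encoder parameters and the loss recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)              # hidden code
        Y = sigmoid(H @ W2 + b2)              # reconstruction of the input
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)
        kl = np.sum(rho * np.log(rho / rho_hat)
                    + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
        losses.append(0.5 * np.sum((Y - X) ** 2) / n + beta * kl)
        # Backpropagation through the decoder, the sparsity penalty, and the encoder.
        dZ2 = (Y - X) / n * Y * (1 - Y)
        dH = dZ2 @ W2.T + beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n
        dZ1 = dH * H * (1 - H)
        W2 -= lr * (H.T @ dZ2); b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1); b1 -= lr * dZ1.sum(axis=0)
    return (W1, b1), losses

def encode(X, layers):
    """Apply the trained encoder layers in order."""
    for W, b in layers:
        X = sigmoid(X @ W + b)
    return X

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy corpus (hypothetical sentences, not data from the paper).
sentences = ["今天天气很好", "今天天气不错", "我喜欢自然语言处理"]
X = char_vectors(sentences)

# Greedy layer-wise training: each layer is fit on the previous layer's codes.
layers, history, H = [], [], X
for hidden in (8, 4):
    layer, losses = train_sparse_autoencoder(H, hidden)
    layers.append(layer)
    history.append(losses)
    H = encode(H, [layer])

features = encode(X, layers)                  # low-dimensional sentence features
print(cosine(features[0], features[1]), cosine(features[0], features[2]))
```

With three sentences the learned features are not meaningful; the sketch only shows the data flow. Once the encoder is trained, encoding a sentence and comparing two fixed-size feature vectors are each a single pass over the input, which is in the spirit of the linear cost quoted in the abstract, although the abstract does not state what n denotes.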