Authors: QIAN Ling-long (钱玲龙), WU Jiao (武娇), WANG Ren-feng (王人锋), LU Hui-juan (陆慧娟) [1]
Affiliation: [1] China Jiliang University, Hangzhou 310018, China
Source: Computer Science (《计算机科学》), 2020, Issue S02, pp. 97-105 (9 pages)
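Funding: National Natural Science Foundation of China (61272315, 61602431); Zhejiang Provincial Natural Science Foundation (LQ20F030015); National Undergraduate Innovation and Entrepreneurship Training Program, "Intelligent Reading Model Based on Natural Language Processing" (201810356020).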
Abstract: Automatic document summarization is an important task in natural language processing. Because document semantics are difficult to understand accurately, most methods rank sentences by importance using hand-crafted features such as word frequency and keywords, and extract the summary from that ranking. Inspired by sparse representation theory, this paper proposes a dynamic semantic space partition algorithm based on sparse representation. The algorithm performs dictionary learning on each initially partitioned semantic subspace, uses the learned dictionaries to sparsely reconstruct every sentence vector, and reassigns each sentence vector to the partition with the smallest reconstruction error, iteratively repartitioning the semantic space. To extract summary sentences within each resulting subspace, the paper also proposes an automatic summary extraction algorithm based on sparse similarity ranking. All sentence vectors in a semantic subspace serve as dictionary atoms; sparse reconstruction yields a sparse similarity that reflects how well one sentence semantically represents the others. The cumulative sparse similarity of each sentence measures its ability to represent the semantic information of the subspace, and the top N sentences under this ranking are extracted as the summary. Experiments on a dataset of travel reviews of popular attractions from the TripAdvisor website show that the semantic space reconstruction error converges stably after about 5 iterations and is reduced by about 17% on average, and that the algorithm is insensitive to data dimensionality. The extracted summaries avoid repeatedly selecting redundant, highly repetitive text, making this an effective multi-document automatic summarization method.
Classification No.: TP391.1 [Automation and Computer Technology: Computer Application Technology]
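Illustrative sketch: the sparse-similarity ranking step described in the abstract can be pictured roughly as follows. This is not the authors' implementation. The record gives no formulas, so several choices here are assumptions: the sparse reconstruction is done with an L1-regularized least-squares (Lasso) fit solved by cyclic coordinate descent, each sentence is reconstructed from the other sentences in its subspace (itself excluded), and the sparse similarity is taken as the absolute value of a reconstruction coefficient. The names `sparse_code`, `rank_by_cumulative_sparse_similarity`, and the parameter `lam` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for an L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sparse_code(target, atoms, lam=0.1, sweeps=100):
    """Assumed solver: cyclic coordinate descent for
    min_c 0.5 * ||target - sum_j c[j] * atoms[j]||^2 + lam * sum_j |c[j]|.
    """
    coeffs = [0.0] * len(atoms)
    residual = list(target)  # target minus the current reconstruction
    for _ in range(sweeps):
        for j, atom in enumerate(atoms):
            norm_sq = dot(atom, atom)
            if norm_sq == 0.0:
                continue  # a zero atom cannot explain anything
            # Correlation of atom j with the residual, with atom j's own
            # current contribution added back in.
            rho = dot(atom, residual) + coeffs[j] * norm_sq
            new_c = soft_threshold(rho, lam) / norm_sq
            delta = new_c - coeffs[j]
            if delta != 0.0:
                residual = [r - delta * a for r, a in zip(residual, atom)]
                coeffs[j] = new_c
    return coeffs


def rank_by_cumulative_sparse_similarity(sentence_vectors, top_n, lam=0.1):
    """Score each sentence by how much it contributes to sparsely
    reconstructing the other sentences, then return the top_n indices
    together with all scores."""
    n = len(sentence_vectors)
    scores = [0.0] * n
    for i, target in enumerate(sentence_vectors):
        others = [j for j in range(n) if j != i]
        atoms = [sentence_vectors[j] for j in others]
        coeffs = sparse_code(target, atoms, lam)
        for j, c in zip(others, coeffs):
            scores[j] += abs(c)  # assumed definition of sparse similarity
    ranked = sorted(range(n), key=lambda j: scores[j], reverse=True)
    return ranked[:top_n], scores


if __name__ == "__main__":
    # Toy 2-D "sentence vectors" for one semantic subspace (made-up data).
    vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    top, scores = rank_by_cumulative_sparse_similarity(vectors, top_n=2)
    print(top, scores)
```

In a fuller pipeline this ranking would be run separately inside each semantic subspace produced by the partition step, with `top_n` sentences taken from each; the partition and dictionary-learning steps are not sketched here.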