Authors: SUN Bifan (孙璧凡); GU Lichuan (辜丽川) [2,3,4]
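Affiliations: [1] School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036; [2] School of Information and Computer Science, Anhui Agricultural University, Hefei, Anhui 230036; [3] Anhui Provincial Key Laboratory of Smart Agricultural Technology and Equipment, School of Information and Artificial Intelligence, Hefei, Anhui 230036; [4] Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, School of Information and Artificial Intelligence, Hefei, Anhui 230036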
Source: Journal of Huainan Normal University (《淮南师范学院学报》), 2025, No. 2, pp. 131-135 (5 pages)
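Funding: Anhui Provincial Department of Education research project "Research on Combined Text Semantic Representation and Inference Optimization Methods Based on Topic Modeling" (2022AH050889); National Natural Science Foundation of China project "Research on Auxiliary Diagnosis Methods for Non-Small Cell Lung Cancer Based on Weighted Privacy-Preserving Computation" (62301006).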
Abstract: Text summarization is typically used to distill the core content of large volumes of text. However, few summarization models are designed specifically for scientific and technological literature rather than general-purpose text. This paper proposes RTsum (Rhetorical Topic summarization model), an abstractive summarization model oriented to the discourse structure of scientific and technological literature. It incorporates a rhetorical structure classification module and uses the discourse structure information of the literature to guide a neural topic model, obtaining global semantics with higher factual consistency and thereby producing high-quality summaries. Specifically, RTsum first classifies the sentences of the original document according to the document's discourse information. It then integrates a hierarchical Transformer encoder with a neural topic model. This integration combines the global semantics of the text with its rhetorical structure information and reduces redundancy from suboptimal topic sentences. The topic distribution, optimized through rhetorical classification, is then fused into abstractive summary generation to improve the quality of scientific literature summaries. Experimental results show that, on the CORD-19 and XSUM datasets, the accuracy and factual consistency metrics of the summaries generated by RTsum achieve maximum improvements of 7.68% and 9.09%, respectively. These findings indicate that RTsum improves the factuality and accuracy of abstractive text summarization for scientific and technological literature.
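The two-stage idea in the abstract (classify each sentence by rhetorical category, then let that category shape the topic distribution) can be illustrated with a toy sketch. This is not the authors' implementation: a keyword lookup stands in for the rhetorical classifier, a weighted bag-of-words stands in for the neural topic model, and no Transformer encoder is involved. The cue words, category names, and weights (`MOVE_CUES`, `MOVE_WEIGHTS`) are hypothetical values chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical cue words per rhetorical category (illustrative only).
MOVE_CUES = {
    "background": {"typically", "however", "scarcity"},
    "method": {"proposes", "integrates", "categorizes"},
    "result": {"results", "improvements", "achieved"},
}

# Hypothetical weights: how strongly each category influences the topic mix.
MOVE_WEIGHTS = {"background": 0.5, "method": 1.0, "result": 1.5, "other": 0.25}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def classify_move(sentence):
    """Pick the category whose cue words overlap most with the sentence."""
    tokens = set(tokenize(sentence))
    scores = {move: len(tokens & cues) for move, cues in MOVE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def move_weighted_distribution(sentences):
    """Build a term distribution in which each token counts by its sentence's category weight."""
    weighted = Counter()
    for sentence in sentences:
        weight = MOVE_WEIGHTS[classify_move(sentence)]
        for token in tokenize(sentence):
            weighted[token] += weight
    total = sum(weighted.values())
    return {token: value / total for token, value in weighted.items()}
```

Under these assumed weights, terms from sentences labeled as results contribute more probability mass than terms from method or background sentences, which mirrors in miniature how a classification signal could steer a topic distribution before summary generation.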
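Keywords: abstractive text summarization; domain text analysis; deep learning; rhetorical move classification; natural language processing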
Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]