The Application of Scientific Literature Discourse Analysis in Text Summarization Computing (科技文献篇章分析在文本摘要中的计算机应用)


Authors: SUN Bifan (孙璧凡); GU Lichuan (辜丽川) [2,3,4]

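Affiliations: [1] School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036; [2] School of Information and Computer Science, Anhui Agricultural University, Hefei, Anhui 230036; [3] Anhui Provincial Key Laboratory of Smart Agricultural Technology and Equipment, School of Information and Artificial Intelligence, Hefei, Anhui 230036; [4] Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, School of Information and Artificial Intelligence, Hefei, Anhui 230036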

Source: Journal of Huainan Normal University (《淮南师范学院学报》), 2025, No. 2, pp. 131-135 (5 pages)

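Funding: Scientific Research Project of the Anhui Provincial Department of Education, "Research on Combined Text Semantic Representation and Reasoning Optimization Methods Based on Topic Modeling" (2022AH050889); National Natural Science Foundation of China, "Research on Auxiliary Diagnosis Methods for Non-Small Cell Lung Cancer Based on Weighted Privacy-Preserving Computation" (62301006).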

Abstract: Text summarization is widely used to distill the core content of large volumes of text, yet few summarization models are designed specifically for scientific literature rather than general-purpose text. This paper proposes RTsum (Rhetorical Topic summarization model), an abstractive summarization model oriented toward the discourse structure of scientific literature. RTsum incorporates a rhetorical move classification module and uses the discourse structure of scientific papers to guide a neural topic model, obtaining global semantics with higher factual consistency and thereby producing higher-quality summaries. Specifically, RTsum first classifies the sentences of the source document according to the document's discourse structure, then combines a hierarchical Transformer encoder with a neural topic model. This design joins the global semantics of the text with rhetorical move information, reduces redundancy from suboptimal topic sentences, and feeds the topic distribution refined by move classification into abstractive summary generation. Experiments on the CORD-19 and XSUM datasets show that summaries generated by RTsum improve accuracy-related and factual-consistency-related metrics by up to 7.68% and 9.09%, respectively, indicating improved factuality and accuracy for abstractive summarization of scientific literature.

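Keywords: abstractive text summarization; domain text analysis; deep learning; rhetorical move classification; natural language processing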

Classification No. (CLC): TP391 [Automation and Computer Technology: Computer Application Technology]

 
