Unsupervised Abstract Extraction Based on Genetic Algorithm (基于遗传算法的非监督摘要提取)  Cited by: 1


Authors: WANG Tao (王涛); FAN Xiaobo (范晓波); XU Xiaobo (胥小波) (Institute of Science and Technology Information of Sichuan, Chengdu, Sichuan 610000, China; China Electronic Technology Cyber Security Co., Ltd., Chengdu, Sichuan 610000, China)

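Affiliations: [1] Institute of Science and Technology Information of Sichuan, Chengdu, Sichuan 610000 [2] China Electronic Technology Cyber Security Co., Ltd., Chengdu, Sichuan 610000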

Source: Communications Technology (《通信技术》), 2021, No. 5, pp. 1120-1125 (6 pages)

Abstract: A central difficulty in abstract extraction is describing a whole document concisely without losing key information. Supervised models usually require a large training corpus, which limits their practical use. Subset selection algorithms are an effective approach to unsupervised automatic document summarization: in such models, abstract extraction is formulated as finding the optimum of an objective expression. However, optimizing the subset selection expression is an NP problem, and greedy algorithms are currently the common way to solve it. This paper therefore proposes a new unsupervised abstract extraction framework based on a genetic algorithm, which also fully takes into account the importance of the first and last sentences of paragraphs in Chinese text. Experimental results indicate that the proposed method achieves good extraction performance.
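The abstract does not give the objective expression or the genetic operators, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical objective (Jaccard-based coverage minus a redundancy penalty, plus a small bonus for paragraph-initial and paragraph-final sentences), whitespace tokenization, truncation selection, one-point crossover, and bit-flip mutation. All function names and parameter values are illustrative.

```python
import random

def tokenize(sentence):
    # Whitespace tokens for illustration only; Chinese text would need a word segmenter.
    return set(sentence.lower().split())

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def fitness(mask, tokens, boundary, budget, bonus=0.1, penalty=0.5):
    # Assumed objective: coverage - penalty * redundancy + position bonus.
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen or len(chosen) > budget:
        return float("-inf")
    coverage = sum(max(jaccard(t, tokens[j]) for j in chosen) for t in tokens)
    redundancy = sum(jaccard(tokens[i], tokens[j])
                     for i in chosen for j in chosen if i < j)
    position = bonus * sum(1 for i in chosen if boundary[i])
    return coverage - penalty * redundancy + position

def random_mask(n, budget, rng):
    picks = set(rng.sample(range(n), rng.randint(1, min(budget, n))))
    return [1 if i in picks else 0 for i in range(n)]

def repair(mask, budget, rng):
    # Keep every individual feasible: between 1 and `budget` selected sentences.
    chosen = [i for i, bit in enumerate(mask) if bit]
    while len(chosen) > budget:
        mask[chosen.pop(rng.randrange(len(chosen)))] = 0
    if not chosen:
        mask[rng.randrange(len(mask))] = 1
    return mask

def summarize(sentences, boundary, budget=2, pop_size=30,
              generations=40, mutation_rate=0.1, seed=0):
    # Requires at least two sentences and pop_size >= 4.
    rng = random.Random(seed)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)

    def score(mask):
        return fitness(mask, tokens, boundary, budget)

    population = [random_mask(n, budget, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]   # truncation selection with elitism
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]          # bit-flip mutation
            children.append(repair(child, budget, rng))
        population = parents + children
    best = max(population, key=score)
    return [i for i, bit in enumerate(best) if bit]
```

Calling `summarize(sentences, boundary, budget=2)` returns the indices of the selected sentences, where `boundary[i]` is `True` when sentence `i` opens or closes a paragraph. Unlike a greedy procedure, which adds one sentence at a time, the population-based search evaluates whole candidate subsets, which is the motivation the abstract gives for replacing greedy selection.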

Keywords: abstract extraction; genetic algorithm; subset selection; NP problem

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]

 
