Authors: ZI Kangli (资康莉), WANG Shi (王石) [1], CAO Cungen (曹存根) [1]
Affiliations: [1] Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049
Source: Chinese High Technology Letters (《高技术通讯》), 2023, No. 8, pp. 836-848 (13 pages)
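Funding: Supported by the National Key Research and Development Program of China (2022YFC3302300) and the National 242 Information Security Program (2022A056).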
Abstract: Headline generation, a branch of text summarization, helps people obtain information efficiently. To address the lack of large-scale, high-quality annotated Chinese data for Chinese headline generation, this paper exploits the fact that a headline can often be composed of words from the source text. Combining unsupervised learning models with a supervised sequence labeling model, it proposes an extractive deep neural network method and model for Chinese headline generation that incorporates a clustering model and a topic model. On Chinese news data that lacks manually annotated category labels, the model uses the clustering and topic models to mine latent feature information automatically, obtaining distinct data clusters and the topic words within each cluster to assist headline generation. This makes the model well suited to Chinese news datasets that have latent topic categories and headlines of uneven quality. Experiments on a publicly available Chinese news headline dataset show that the proposed model outperforms the baseline models on micro F1, BLEU, ROUGE, and compression ratio.
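The extractive formulation in the abstract can be illustrated with a minimal sketch. It assumes (the record does not say) that the sequence labeler assigns each source token a binary keep/drop label, that compression ratio means headline length divided by source length, and that micro F1 is computed over the "keep" label pooled across documents. All function names are hypothetical, and the clustering and topic-model components are not shown.

```python
from typing import List


def extract_headline(tokens: List[str], labels: List[int]) -> List[str]:
    """Keep the tokens labeled 1, preserving their order in the source."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return [tok for tok, keep in zip(tokens, labels) if keep == 1]


def compression_ratio(tokens: List[str], headline: List[str]) -> float:
    """Headline length over source length (an assumed definition)."""
    return len(headline) / len(tokens) if tokens else 0.0


def micro_f1(gold: List[List[int]], pred: List[List[int]]) -> float:
    """F1 of the 'keep' label with counts pooled over all documents."""
    tp = fp = fn = 0
    for gold_doc, pred_doc in zip(gold, pred):
        for g, p in zip(gold_doc, pred_doc):
            tp += int(g == 1 and p == 1)
            fp += int(g == 0 and p == 1)
            fn += int(g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, made-up example: four source tokens, two kept for the headline.
    source = ["a", "b", "c", "d"]
    predicted = [1, 0, 1, 0]
    headline = extract_headline(source, predicted)
    print(headline, compression_ratio(source, headline))
```

In this framing, topic words obtained from each cluster would enter as additional input features to the labeler; how the paper does this is not described in the abstract.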