Deep Learning-based Method for Mining Ocean Hot Spot News


Authors: QIN Xianping, DING Zhaoxu, ZHONG Guoqiang, WANG Dong[2]

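Affiliations: [1] College of Computer Science and Technology, Ocean University of China, Qingdao, Shandong 266404, China; [2] Library of Ocean University of China, Qingdao, Shandong 266404, China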

Source: Computer Science, 2024, Issue S02, pp. 98-107 (10 pages)

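Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0100400); Natural Science Foundation of Shandong Province (ZR2020MF131, ZR2021ZD19); Qingdao Science and Technology Plan Project (21-1-4-ny-19-nsh); Library and Information Research Fund of Ocean University of China (202253006).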

Abstract: The rapid development of the mobile Internet and the popularity of modern mobile clients have driven the growth of online news, social media, and self-media, giving users access to a large and diverse volume of information. As China's maritime power strategy advances steadily and national ocean awareness grows markedly, the Internet carries a great deal of information about the ocean: media reports and public opinion on ocean topics proliferate online, and hotspot events occur frequently. To handle this multi-source, multi-attribute online ocean information, this paper proposes a deep learning-based system for automatically mining hot ocean news, built on multi-source text clustering and automatic summarization. The system has five functional modules: automatic collection of multi-source ocean-related data, data preprocessing, feature extraction, text clustering, and automatic summarization. Specifically, a web crawler collects diverse, scattered ocean data from multiple sources, automatically structures it, and stores it in a database. Clustering is then performed according to the similarity of text features and the relationships between texts, and the resulting clusters provide the input for subsequent summary generation and topic discovery. Finally, exploiting the strong contextual understanding and rich language expression of pre-trained language models, the paper proposes a method for automatically summarizing ocean news based on a pre-trained language model. Multiple sets of experiments demonstrate the effectiveness of the proposed method on all evaluation metrics and highlight its advantages for mining multi-source, heterogeneous online ocean news. The method offers a feasible solution for processing scattered ocean information and generating more readable summaries, and is of significance for improving the efficiency of ocean information retrieval, monitoring public opinion trends, and promoting the application and dissemination of ocean information.
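The abstract says clustering is based on "the similarity of text features and the relationships between texts" but does not name the features or the algorithm. The following is a minimal sketch under our own assumptions, not the authors' method: TF-IDF features (smoothed IDF), cosine similarity, and single-pass incremental threshold clustering, a common baseline for news topic detection. Documents are assumed to be already tokenized; Chinese news would first need word segmentation (for example with jieba). The function names `tfidf_vectors`, `single_pass_cluster`, and `hot_topics`, the threshold 0.3, and the toy token lists are all hypothetical. Ranking clusters by size is only a crude proxy for "hotness", and the member closest to the centroid is returned as a candidate input for a downstream summarizer.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per tokenized doc.

    Assumption: smoothed IDF, ln((1 + n) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        if not doc:  # empty document -> empty vector
            vectors.append({})
            continue
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def add_into(target, vec):
    """Accumulate vec into target in place (a summed, unnormalized centroid)."""
    for t, w in vec.items():
        target[t] = target.get(t, 0.0) + w


def single_pass_cluster(vectors, threshold=0.3):
    """Visit docs in order; join the most similar cluster if its centroid
    similarity reaches `threshold` (hypothetical value), else open a new one."""
    clusters, centroids = [], []
    for i, vec in enumerate(vectors):
        sims = [cosine(vec, c) for c in centroids]
        if sims and max(sims) >= threshold:
            best = sims.index(max(sims))
            clusters[best].append(i)
            add_into(centroids[best], vec)
        else:
            clusters.append([i])
            centroids.append(dict(vec))
    return clusters


def hot_topics(docs, threshold=0.3):
    """Cluster docs and rank clusters by size, largest first.

    Returns (size, representative_doc_index, member_indices) tuples; the
    representative is the member most similar to the cluster centroid.
    """
    vectors = tfidf_vectors(docs)
    topics = []
    for members in single_pass_cluster(vectors, threshold):
        centroid = {}
        for i in members:
            add_into(centroid, vectors[i])
        rep = max(members, key=lambda i: cosine(vectors[i], centroid))
        topics.append((len(members), rep, members))
    topics.sort(key=lambda topic: topic[0], reverse=True)
    return topics


if __name__ == "__main__":
    # Hypothetical, pre-tokenized toy headlines.
    news = [
        "typhoon coastal ports closed".split(),
        "typhoon coastal ports flights".split(),
        "deep sea vessel launched".split(),
        "deep sea vessel voyage".split(),
        "typhoon coastal warning".split(),
    ]
    for size, rep, members in hot_topics(news):
        print(size, " ".join(news[rep]), members)
```

On these toy inputs the three typhoon headlines form the largest cluster and the two research-vessel headlines form a second one. In the system described in the abstract, each cluster would then be passed to the pre-trained-language-model summarizer; this sketch stops at ranking, and a single-pass scheme is order-sensitive, so a real pipeline might prefer a batch algorithm.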

Keywords: ocean news; text clustering; automatic summarization; deep learning; natural language processing; pre-trained models

CLC number: TP391 [Automation and Computer Technology: Computer Application Technology]
