Fund News Classification Based on Clustering Annotation and Multi-Granularity Feature Fusion

Original title: 聚类标注和多粒度特征融合的基金新闻分类


Authors: HU Juxiang (胡菊香); LÜ Xueqiang (吕学强); YOU Xindong (游新冬) [2]; ZHOU Jianshe (周建设)

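Affiliations: [1] Research Center for Language Intelligence of China, Capital Normal University, Beijing 100048, China; [2] Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China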

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2024, No. 2, pp. 257-264 (8 pages)

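Funding: Supported by the National Natural Science Foundation of China (62171043); the Beijing Natural Science Foundation (4212020); State Language Commission projects (ZDI145-10, YB145-3); and the Scientific Research Program of the Beijing Municipal Education Commission (KM202111232001).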

Abstract: Manual category labeling is time-consuming, laborious, and inefficient. Existing text classification methods also ignore the relationships between words and between sentences, and do not assign higher weights to the features that matter most for classification. To address these problems, this paper proposes a fund news classification method based on cluster-weighted labeling and multi-granularity feature fusion. The cluster-weighted labeling algorithm computes a weighted combination of the K-Means and DBSCAN clustering results and uses it to label fund text data automatically; a small amount of manual proofreading then yields the labeled data used for fund news classification. The multi-granularity feature fusion classification algorithm works in three steps. At the word granularity, it builds a stop-word list and an extended dictionary. At the sentence granularity, it extracts news summaries to capture text that is more semantically related. Finally, it embeds a multi-head attention mechanism into the BERT model to assign higher weights to key features and thereby improve classification accuracy. Experiments from multiple perspectives indicate that the method is efficient and effective: its classification precision reaches 95.21%, outperforming existing methods.
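The abstract does not say how the K-Means and DBSCAN results are weighted, so the following is only a minimal sketch of one plausible reading: each clusterer proposes a category per document, the proposals are combined by weighted voting, and close calls are flagged for the manual proofreading step. The function name `weighted_label`, the weights, the `margin` threshold, and the category names are all hypothetical and do not come from the paper.

```python
from collections import defaultdict


def weighted_label(kmeans_cat, dbscan_cat, w_kmeans=0.6, w_dbscan=0.4, margin=0.3):
    """Combine two per-document category proposals by weighted voting.

    kmeans_cat / dbscan_cat: the category each clusterer proposes for every
    document. A DBSCAN entry of None means the document was treated as noise.
    Returns one (category, needs_review) pair per document.
    All weights and the margin are illustrative assumptions.
    """
    results = []
    for k_cat, d_cat in zip(kmeans_cat, dbscan_cat):
        scores = defaultdict(float)
        scores[k_cat] += w_kmeans
        if d_cat is not None:
            scores[d_cat] += w_dbscan

        # Rank categories by accumulated weight, highest first.
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        best_cat, best_score = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0.0

        # A narrow lead means the two clusterers disagree: send to a human.
        needs_review = (best_score - runner_up) < margin
        results.append((best_cat, needs_review))
    return results


if __name__ == "__main__":
    # Hypothetical category proposals for three documents.
    kmeans_cat = ["equity", "bond", "equity"]
    dbscan_cat = ["equity", "equity", None]
    for doc_id, (cat, review) in enumerate(weighted_label(kmeans_cat, dbscan_cat)):
        print(doc_id, cat, "review" if review else "auto")
```

In this sketch only the documents on which the two clusterers disagree reach a human annotator, which is one way the "small amount of manual proofreading" could be kept small.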

Keywords: multi-granularity; feature fusion; text classification; deep learning

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
