User Review Classification Algorithm Based on GBDT and Double-layer Drift Detection

Authors: ZHANG Tuyi (章涂义); LIU Sanmin (刘三民) (School of Computer and Information, Anhui Polytechnic University, Wuhu 241000, China)

Affiliation: [1] School of Computer and Information, Anhui Polytechnic University, Wuhu 241000, Anhui, China

Source: Journal of Hubei Minzu University: Natural Science Edition, 2025, No. 1, pp. 60-66 (7 pages)

Funding: Natural Science Foundation of Anhui Province (2308085MF220); Key Natural Science Research Project of Universities in Anhui Province (2022AH050972, KJ2021A0516).

Abstract: To address concept drift in user review data streams and improve classification accuracy, a user review classification algorithm based on the gradient boosted decision tree (GBDT) with double-layer drift detection (GBDT-D3) is proposed. First, potential drift is detected quickly by computing the loss improvement ratio within the GBDT algorithm. Next, once a drift warning has been raised, the movement of sample centroids across data chunks is monitored to verify the drift precisely. This double-layer detection mechanism reduces false alarms and missed detections in user review data streams while improving the classification of dynamic text data streams. Finally, the GBDT model is updated according to the double-layer drift detection report, which improves the stability of the classifier. Experiments on seven real-world text datasets exhibiting user interest drift show that GBDT-D3 clearly outperforms traditional online ensemble learning algorithms in both classification accuracy and performance stability. GBDT-D3 efficiently identifies concept drift in user review data streams and improves classification accuracy, providing an effective solution for classifying dynamic text data streams.
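The two detection layers described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the loss improvement ratio is taken to be the relative loss reduction between consecutive chunks, the centroid movement is measured as a Euclidean distance, and the names `loss_improvement_ratio`, `centroid_shift`, `detect_drift`, `warn_ratio`, and `shift_threshold` are hypothetical rather than taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def loss_improvement_ratio(prev_loss: float, curr_loss: float) -> float:
    """Relative loss reduction from the previous chunk to the current one.

    Assumed definition; the paper's exact formula is not given in the abstract.
    """
    if prev_loss <= 0.0:
        return 0.0
    return (prev_loss - curr_loss) / prev_loss


def centroid(chunk: Sequence[Vector]) -> List[float]:
    """Per-feature mean of all sample vectors in a data chunk."""
    n = len(chunk)
    return [sum(column) / n for column in zip(*chunk)]


def centroid_shift(ref_chunk: Sequence[Vector], new_chunk: Sequence[Vector]) -> float:
    """Euclidean distance between the centroids of two chunks (assumed metric)."""
    return math.dist(centroid(ref_chunk), centroid(new_chunk))


def detect_drift(
    prev_loss: float,
    curr_loss: float,
    ref_chunk: Sequence[Vector],
    new_chunk: Sequence[Vector],
    warn_ratio: float = -0.2,      # hypothetical warning threshold
    shift_threshold: float = 1.0,  # hypothetical confirmation threshold
) -> str:
    """Return 'stable', 'warning', or 'drift' for the current chunk."""
    # Layer 1: a sharply worsening loss raises a drift warning.
    if loss_improvement_ratio(prev_loss, curr_loss) > warn_ratio:
        return "stable"
    # Layer 2: the warning is confirmed only if the centroid has moved far enough.
    if centroid_shift(ref_chunk, new_chunk) >= shift_threshold:
        return "drift"
    return "warning"


reference = [[0.0, 0.0], [2.0, 0.0]]
shifted = [[4.0, 4.0], [6.0, 4.0]]
print(detect_drift(0.5, 0.9, reference, shifted))
```

In this sketch a "drift" result would be the signal to update or retrain the GBDT model, while a "warning" that the second layer does not confirm leaves the model unchanged, which is one way the second layer could suppress false alarms.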

Keywords: text data stream classification; concept drift detection; user reviews; gradient boosted decision tree; data distribution

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
