Authors: YANG Jin [1], ZHANG Chen (School of Science, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source: Computer and Modernization (《计算机与现代化》), 2022, No. 7, pp. 47-53 (7 pages)
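Funding: National Natural Science Foundation of China (12071293); Humanities and Social Sciences Planning Fund of the Ministry of Education (16YJA630037); Shanghai First-Class Discipline Construction Project (S1201YLXK).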
Abstract: With the development of the Internet, online shopping has become an increasingly common choice. To recommend products to customers more effectively, features are first extracted from the raw data and then selected with the mutual information method. An improved EasyEnsemble algorithm is used to handle class imbalance. Its ensemble strategy compensates for the information lost through undersampling, so the sample data are fully used and the effect of the gap between the numbers of positive and negative samples is reduced. Finally, XGBoost and random forest are combined by soft voting into a final classifier for prediction, which is compared with each single algorithm and gives better results. On data provided by the Alibaba Tianchi competition, with precision P, recall R, and the F1 score as evaluation metrics, the method is compared with currently popular machine learning algorithms, which verifies its effectiveness.
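The abstract names four generic steps: mutual-information feature selection, EasyEnsemble-style balanced undersampling, soft voting of two classifiers, and scoring by P, R and F1. The following is a minimal pure-Python sketch of those generic steps under stated assumptions. It is not the authors' implementation. The abstract does not describe the "improved" EasyEnsemble variant, the number of selected features, or any model settings. The sketch assumes that features are already discretized and that label 1 marks the positive (minority) class. `easy_ensemble_subsets` reproduces only the standard EasyEnsemble idea of drawing several balanced undersampled subsets. In the original EasyEnsemble, an AdaBoost learner is then trained on each subset, and that part is omitted here. XGBoost and random forest are not reproduced either; their positive-class probabilities are assumed to arrive as plain lists. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(feature, labels):
    """I(X;Y) in nats between a discrete feature column and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px, py = Counter(feature), Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_features(rows, labels, k):
    """Indices of the k columns with the highest mutual information with labels."""
    scores = [
        (mutual_information([row[j] for row in rows], labels), j)
        for j in range(len(rows[0]))
    ]
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def easy_ensemble_subsets(labels, n_subsets, seed=0):
    """Balanced index subsets: every positive (minority) sample plus an equally
    sized random draw of negatives, redrawn per subset so that different parts
    of the majority class are used across the ensemble.
    Assumes there are at least as many negatives as positives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return [pos + rng.sample(neg, len(pos)) for _ in range(n_subsets)]


def soft_vote(prob_lists, weights=None):
    """Weighted average of each model's positive-class probabilities."""
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(len(prob_lists[0]))
    ]


def precision_recall_f1(y_true, y_pred):
    """Precision P, recall R and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: column 0 copies the label, column 1 is unrelated to it.
    rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
    labels = [0, 0, 1, 1]
    print(select_features(rows, labels, k=1))
    # Hypothetical stand-ins for XGBoost and random-forest probabilities.
    blended = soft_vote([[0.9, 0.2, 0.6, 0.4], [0.7, 0.4, 0.8, 0.2]])
    y_pred = [1 if p >= 0.5 else 0 for p in blended]
    print(precision_recall_f1([1, 0, 1, 1], y_pred))
```

With real models, the probability lists would typically come from `predict_proba(X)[:, 1]` on scikit-learn's `RandomForestClassifier` and XGBoost's `XGBClassifier`. The imbalanced-learn package also provides a ready-made `EasyEnsembleClassifier`. The 0.5 decision threshold and equal voting weights are illustrative choices, not values taken from the paper.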
Keywords: mutual information; class imbalance; EasyEnsemble; XGBoost
CLC classification number: TP391 [Automation and Computer Technology: Computer Application Technology]