Authors: Liang Huaixin (梁怀新), Hao Lianwang (郝连旺)[1], Song Jialin (宋佳霖)[1], Zheng Cunfang (郑存芳), Hong Wenxue (洪文学)[1] (Institute of Biomedical Engineering, Yanshan University, Qinhuangdao 066004)
Affiliation: [1] Institute of Biomedical Engineering, Yanshan University, Qinhuangdao 066004
Source: Chinese High Technology Letters (《高技术通讯》), 2018, No. 1, pp. 39-51 (13 pages)
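Funding: Supported by the National Natural Science Foundation of China (61273019; 81373767; 61501397; 61201111) and the Natural Science Foundation of Hebei Province (F2016203443)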
Abstract: A data visualization and pattern recognition method is proposed that fuses incremental learning with least absolute shrinkage and selection operator (Lasso) feature selection. The method first applies a first-stage Lasso selection to the normalized data to reduce dimensionality, then granulates the continuous attributes using the Gini index and feeds the result into an incremental pattern learning system. When incremental learning sharply increases the number of dimensions, a second-stage Lasso selection is applied to produce a consistent pattern decision table, from which an attribute partial order structure diagram is generated for visual rule discovery. Five datasets from UCI are used, and classification accuracy is compared against 1NN, 3NN, SVM, Adaboost, and Random Forest. The experiments show that the accuracy of the proposed method is generally higher than that of the other classifiers, and that the attribute partial order structure diagram has a clear, well-separated layer structure. An incremental learning experiment characterizes how accuracy and updates to the diagram structure vary with the proportion of incremental data. On the Pima Indians Diabetes data, once 40% of the data has been learned, the accuracy (77.66%) exceeds that of Adaboost (75.32%), SVM (77.27%), 1NN (59.74%), and 3NN (75.97%) trained on all of the data. The results indicate that the method is effective for data visualization and pattern recognition.
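Keywords: incremental learning; least absolute shrinkage and selection operator (Lasso); attribute partial order structure diagram; visualization; pattern recognition; granulation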
Classification number: TP301.6 [Automation and Computer Technology: Computer Architecture]
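
The abstract mentions granulating continuous attributes with the Gini index but does not say how the cut points are chosen. The sketch below illustrates one common reading, a single CART-style binary split that minimizes weighted Gini impurity. It is an assumption for illustration, not the authors' procedure, and the function names `gini`, `best_gini_split`, and `granulate` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_gini_split(values, labels):
    """Find the threshold on one continuous attribute that minimizes the
    weighted Gini impurity of the two resulting groups.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (threshold, weighted_gini); threshold is None when no split
    improves on the unsplit impurity.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no cut point between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = (low + high) / 2, score
    return best_threshold, best_score


def granulate(values, threshold):
    """Map each continuous value to a binary granule (0 or 1).

    Assumption: one cut point per attribute; threshold must not be None.
    """
    return [0 if v <= threshold else 1 for v in values]


if __name__ == "__main__":
    # Toy data (made up for illustration, not from the paper).
    x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    y = [0, 0, 0, 1, 1, 1]
    threshold, score = best_gini_split(x, y)
    print(threshold, score, granulate(x, threshold))
```

Each granulated attribute then becomes a discrete column, which is the kind of input a decision table needs. A multi-interval granulation would apply the same split recursively within each side.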