Affiliation: [1] School of Computer Science and Telecommunication Engineering, Jiangsu University, Zhenjiang, Jiangsu 212013
Source: Computer Engineering (《计算机工程》), 2010, No. 20, pp. 52-54 (3 pages)
Funding: Jiangsu Provincial High-Tech Research Fund (BG2007028); Natural Science Foundation of Jiangsu Higher Education Institutions (09KJB52003)
Abstract: A classification model is proposed for multi-relational, multi-class imbalanced data. In the preprocessing stage, each target class is assigned an Error-Correcting Output Code (ECOC), virtual joins are set up between the target relation and the background relations, and attributes are aggregated using aggregation functions suited to each feature. The data are then divided into a training set and a validation set. In the training stage, multiple sub-classifiers are built on the training set by combining One-vs-All decomposition with the CrossMine algorithm, and each sub-classifier is evaluated by its AUC (area under the ROC curve). In the validation stage, the word formed by concatenating the sub-classifiers' outputs is compared with the ECOC of each target class, and the class with the smallest Hamming distance is chosen as the final classification. Experiments on synthetic and real datasets verify the validity and classification performance of the model.
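The ECOC decoding and AUC evaluation steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the example code matrix `CODES`, the helper names (`hamming`, `ecoc_decode`, `one_vs_all_codes`, `auc`), and the first-class-wins tie-breaking rule are all hypothetical. The CrossMine sub-classifiers are not modeled; their binary outputs are simply taken as given.

```python
from typing import Dict, List, Sequence

# Hypothetical 3-class, 5-bit code matrix (not from the paper).
# Pairwise Hamming distances between codewords are 4, 3, 3,
# so any single flipped sub-classifier output can still be corrected.
CODES: Dict[str, List[int]] = {
    "A": [1, 1, 1, 0, 0],
    "B": [0, 1, 0, 1, 1],
    "C": [1, 0, 0, 1, 0],
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary words differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode(code_matrix: Dict[str, List[int]], outputs: Sequence[int]) -> str:
    """Pick the class whose codeword is nearest (in Hamming distance) to the
    word formed by concatenating the sub-classifier outputs.
    Assumption: ties go to the class listed first in code_matrix."""
    return min(code_matrix, key=lambda c: hamming(code_matrix[c], outputs))


def one_vs_all_codes(classes: Sequence[str]) -> Dict[str, List[int]]:
    """One-vs-All code matrix: sub-classifier j answers "is it class j?",
    so class i's codeword has a 1 only at position i."""
    n = len(classes)
    return {c: [1 if i == j else 0 for j in range(n)] for i, c in enumerate(classes)}


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive example scores higher, with ties counted as 1/2."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, `ecoc_decode(CODES, [1, 1, 0, 0, 0])` returns `"A"`: the output word differs from A's codeword in one bit, from C's in two, and from B's in three. A pure One-vs-All matrix has a minimum inter-codeword distance of only 2. It can therefore detect a single wrong output but not correct it (an all-zero output word is equidistant from every class), which is one reason a longer error-correcting code may be preferred.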
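Keywords: multi-relational classification; imbalanced data; multi-class classification; error-correcting output codes; One-vs-All decomposition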
CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]