Privileged LSSVM for classification and simultaneous importance identification of missing information on incomplete data (original Chinese title: 不完整数据分类与缺失信息重要性识别特权LSSVM)

Cited by: 2

Authors: WU Han (吴晗); WANG Shitong (王士同) (School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China)

Affiliation: [1] School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, Jiangsu, China

Source: CAAI Transactions on Intelligent Systems (《智能系统学报》), 2023, No. 4, pp. 743-753 (11 pages)

Funding: National Natural Science Foundation of China (61972181).

Abstract: When classifying incomplete data, the common strategy of discarding samples with missing values shrinks the training set and can degrade classification performance. Following the strategy of handling missing data and building the classification model simultaneously, this paper proposes a privileged least squares support vector machine (P-LSSVM) based on learning using privileged information (LUPI). The method both improves classification performance and determines the importance of missing features without bias. The basic idea is to treat the classifier trained on the available complete data as privileged information that guides the learning of a least squares support vector machine (LSSVM) on the whole incomplete data set. An additive kernel expresses the importance of each feature, including the missing ones; the privileged information is derived from training on the complete data, and P-LSSVM is constructed from it. Finally, the proposed leave-one-out cross-validation method completes the unbiased identification of missing-feature importance. Experimental results show that the proposed method achieves higher average testing accuracy than the compared algorithms while also identifying the importance of missing features.
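To make the two building blocks named in the abstract more concrete, the following is a minimal sketch of a standard LSSVM classifier trained with an additive kernel, written in plain Python. It is not the authors' P-LSSVM: the privileged-information term, the handling of missing values, and the leave-one-out importance estimation are all omitted. The function names, the choice of a per-feature RBF kernel, the fixed per-feature `weights`, and the hyperparameters `C` and `gamma` are assumptions made only for illustration.

```python
import math

def additive_rbf_kernel(x, z, weights, gamma=1.0):
    """Weighted sum of one-dimensional RBF kernels, one term per feature.
    The weights are hypothetical stand-ins for per-feature importance."""
    return sum(w * math.exp(-gamma * (a - b) ** 2)
               for w, a, b in zip(weights, x, z))

def solve_linear_system(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_lssvm(X, y, weights, C=10.0, gamma=1.0):
    """Standard LSSVM classifier: solve the linear KKT system
    [[0, y^T], [y, Omega + I/C]] [b; alpha] = [0; 1],
    where Omega_ij = y_i * y_j * K(x_i, x_j)."""
    n = len(X)
    A = [[0.0] + [float(v) for v in y]]
    for i in range(n):
        row = [float(y[i])]
        for j in range(n):
            k = y[i] * y[j] * additive_rbf_kernel(X[i], X[j], weights, gamma)
            row.append(k + (1.0 / C if i == j else 0.0))
        A.append(row)
    sol = solve_linear_system(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]  # bias b, dual coefficients alpha

def predict(x, X, y, alpha, b, weights, gamma=1.0):
    """Sign of the LSSVM decision function at x."""
    score = b + sum(a * yi * additive_rbf_kernel(xi, x, weights, gamma)
                    for a, yi, xi in zip(alpha, y, X))
    return 1 if score >= 0 else -1

# Toy complete data (made up): only the first feature separates the classes.
X = [[0.0, 0.5], [0.2, 0.1], [2.0, 0.4], [2.2, 0.2]]
y = [-1, -1, 1, 1]
weights = [1.0, 1.0]  # equal importance for both features (assumption)
b, alpha = fit_lssvm(X, y, weights)
print(predict([0.1, 0.3], X, y, alpha, b, weights))
print(predict([2.1, 0.3], X, y, alpha, b, weights))
```

Because the kernel is a sum of per-feature terms, each feature's contribution to the decision function can be inspected separately, which is the property the abstract relies on when it speaks of expressing feature importance through an additive kernel.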

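Keywords: least squares support vector machine; learning using privileged information; additive kernel; missing data; k-nearest neighbor; sample space; privileged space; data quality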

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
