High-Dimensional Feature Selection Algorithm for Lung Tumors Based on Information Gain and Neighborhood Rough Set

Cited by: 3


Authors: LU Huiling (陆惠玲) [1]; ZHOU Tao (周涛) [1,2,4]; ZHANG Feifei (张飞飞); HUO Bingqiang (霍兵强)

Affiliations: [1] School of Science, Ningxia Medical University, Yinchuan 750004, China; [2] School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China; [3] China Telecom Corporation Limited Ningxia Branch, Yinchuan 750002, China; [4] Ningxia Key Laboratory of Intelligent Information and Big Data Processing, Yinchuan 750021, China

Source: Journal of Data Acquisition and Processing (《数据采集与处理》), 2020, No. 3, pp. 536-548 (13 pages)

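Funding: National Natural Science Foundation of China (61561040); Ningxia 312 Talent Program; North Minzu University Scientific Research Start-up Fund for Introduced Talents (2020KYQD08).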

Abstract: Excessive redundant and irrelevant attributes degrade lung tumor diagnosis, and the Pawlak rough set is suitable only for discrete variables, which leads to a large loss of original information. To address both problems, a high-dimensional feature selection algorithm for lung tumors that combines information gain with a neighborhood rough set is proposed (Information gain-neighborhood rough set-support vector machine, IG-NRS-SVM). The algorithm first extracts 104-dimensional features from 3000 lung tumor CT images to construct a decision information table. It then uses the information gain results to select a highly relevant feature subset and applies the neighborhood rough set to eliminate highly redundant attributes, so that two successive attribute reductions yield the optimal feature subset. Finally, a support vector machine optimized by a grid search algorithm is used to build a classification model that distinguishes benign from malignant lung tumors. The feasibility and effectiveness of the method are verified from two perspectives, reduction and classification, and the method is compared with no reduction, Pawlak rough set reduction, information gain reduction, and neighborhood rough set reduction. The results show that the hybrid algorithm is more accurate than the other algorithms compared, reaching an accuracy of 96.17%, and that it effectively reduces time complexity. It offers some reference value for computer-aided diagnosis of lung tumors.
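The two-stage reduction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the details, so the equal-width binning used for information gain, the Euclidean neighborhood with radius `delta`, the forward greedy search, the toy data, and all function names (`information_gain`, `dependency`, `nrs_reduct`, `ig_nrs_select`) are assumptions made for illustration. The final step, an SVM tuned by grid search, is omitted here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels, bins=4):
    """Information gain of one continuous feature.
    Assumption: the feature is discretized by equal-width binning first."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features
    binned = [min(int((v - lo) / width), bins - 1) for v in column]
    n = len(labels)
    cond = 0.0
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

def dependency(X, labels, features, delta):
    """Neighborhood dependency: fraction of samples whose delta-neighborhood
    (Euclidean distance over `features`) contains only their own class."""
    if not features:
        return 0.0
    def dist(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))
    pure = sum(
        all(labels[j] == labels[i] for j, xj in enumerate(X) if dist(xi, xj) <= delta)
        for i, xi in enumerate(X)
    )
    return pure / len(X)

def nrs_reduct(X, labels, candidates, delta=0.2):
    """Forward greedy reduction: add the feature that raises dependency the
    most; stop as soon as no candidate improves it."""
    selected, best, remaining = [], 0.0, list(candidates)
    while remaining:
        score, f = max(
            ((dependency(X, labels, selected + [f], delta), f) for f in remaining),
            key=lambda t: t[0],
        )
        if score <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected

def ig_nrs_select(X, labels, top_k, delta=0.2):
    """Stage 1: keep the top_k features by information gain (relevance).
    Stage 2: prune redundant ones with the neighborhood rough set."""
    n_features = len(X[0])
    ig = [information_gain([row[f] for row in X], labels) for f in range(n_features)]
    ranked = sorted(range(n_features), key=lambda f: ig[f], reverse=True)[:top_k]
    return nrs_reduct(X, labels, ranked, delta)

# Toy data (hypothetical, scaled to [0, 1]): feature 0 separates the classes,
# feature 1 duplicates feature 0 (redundant), feature 2 is irrelevant.
X = [[0.10, 0.10, 0.5], [0.15, 0.15, 0.9], [0.20, 0.20, 0.1],
     [0.80, 0.80, 0.5], [0.85, 0.85, 0.9], [0.90, 0.90, 0.1]]
y = [0, 0, 0, 1, 1, 1]

print(ig_nrs_select(X, y, top_k=2))  # [0]
```

On this toy table, information gain discards the irrelevant feature 2, and the neighborhood rough set step then drops feature 1 because it adds no dependency beyond feature 0. The selected columns would then be passed to a classifier such as an SVM whose hyperparameters are chosen by grid search.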

Keywords: information gain; neighborhood rough set; support vector machine; lung tumor; feature selection

Classification number: TP391.4 [Automation and Computer Technology: Computer Application Technology]

 
