Classification of metabolic syndrome data based on Lasso feature selection  Cited by: 2


Authors: Yan Ci (闫慈)[1], Tian Xianghua (田翔华)[2], 阿拉依·阿汗, Zhang Weiwen (张伟文)[1], Cao Mingqin (曹明芹)[1]

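Affiliations: [1] School of Public Health, Xinjiang Medical University, Urumqi 830011; [2] College of Medical Engineering and Technology, Xinjiang Medical University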

Source: Journal of Public Health and Preventive Medicine (《公共卫生与预防医学》), 2017, No. 6, pp. 31-33 (3 pages)

Funding: National Natural Science Foundation of China (71663053); Xinjiang Science and Technology Aid Project (2016E02082)

Abstract: Objective: Physical examination data are high-dimensional and highly redundant. This study applies Lasso feature selection to such data to provide a methodological reference for reducing redundancy in high-dimensional physical examination data. Methods: Taking metabolic syndrome as the entry point, records of 34,981 people examined in 2016 at a physical examination center in Urumqi were collected, each comprising 75 variables. Lasso was used to screen for the variables strongly associated with metabolic syndrome. Using F-measure, G-mean, and area under the ROC curve as evaluation criteria, the performance of a decision tree in classifying metabolic syndrome patients was compared before and after Lasso feature selection. Results: After Lasso feature selection, the physical examination variables were reduced to 34 inflammatory factors strongly associated with metabolic syndrome, and the classification performance of the C4.5 decision tree improved. Conclusions: Before classifying high-dimensional physical examination data, Lasso feature selection is recommended to reduce data redundancy and improve the performance of the classification algorithm.
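The workflow in the abstract (Lasso screening, then scoring a classifier with F-measure, G-mean, and area under the ROC curve) can be illustrated with a minimal sketch. It is not the authors' implementation: the squared-error Lasso objective, the cyclic coordinate-descent solver, the penalty value `lam=0.1`, the F-measure with equal weight on precision and recall, and the four-row toy data set are all assumptions made for illustration, and the C4.5 decision tree itself is not reproduced here.

```python
import math


def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of rows. Columns are assumed centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            beta[j] = 0.0 if z == 0 else soft_threshold(rho / n, lam) / (z / n)
    return beta


def select_features(beta, tol=1e-12):
    """Indices of features whose Lasso coefficient is non-zero."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]


def f_measure(tp, fp, fn):
    """F-measure with equal weight on precision and recall (assumed beta = 1)."""
    return 2 * tp / (2 * tp + fp + fn)


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def auc(scores, labels):
    """Area under the ROC curve as the share of correctly ordered
    (positive, negative) pairs; ties count as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three orthogonal, centred features. Only feature 0
# carries a strong signal; feature 2 has a weak effect below the penalty.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.05, 1.95, -2.05, -1.95]

beta = lasso_coordinate_descent(X, y, lam=0.1)
selected = select_features(beta)
print("coefficients:", beta)
print("selected feature indices:", selected)
```

On real examination data, the retained columns would then be passed to the classifier, and the three metrics compared with and without the selection step. Because metabolic syndrome is a binary outcome, an L1-penalised logistic regression would be a natural alternative to the squared-error objective assumed here.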

Keywords: Lasso; feature selection; physical examination; metabolic syndrome; classification

Classification number: R331 [Medicine and Health: Human Physiology]

 
