Authors: 闫慈[1] 田翔华[2] 阿拉依.阿汗 张伟文[1] 曹明芹[1]
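Affiliations: [1] School of Public Health, Xinjiang Medical University, Urumqi 830011; [2] School of Medical Engineering and Technology, Xinjiang Medical University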
Source: 《公共卫生与预防医学》 (Journal of Public Health and Preventive Medicine), 2017, No. 6, pp. 31-33 (3 pages)
Funding: National Natural Science Foundation of China (71663053); Xinjiang Science and Technology Aid Project (新疆科技支疆项目, 2016E02082)
Abstract: Objective: Physical examination data are high-dimensional and highly redundant. This study applied Lasso feature selection to such data to provide a methodological reference for reducing redundancy in high-dimensional physical examination data. Methods: Taking metabolic syndrome as the entry point, records of 34,981 people examined in 2016 were collected from a physical examination center in Urumqi; each record contained 75 variables. Lasso was used to screen for variables strongly related to metabolic syndrome. Using the F-measure, G-mean, and area under the ROC curve as evaluation criteria, the performance of a decision tree in classifying examinees with metabolic syndrome was compared before and after Lasso feature selection. Results: After Lasso feature selection, the physical examination variables were reduced to 34 inflammatory factors strongly related to metabolic syndrome, and the classification performance of the C4.5 decision tree improved. Conclusions: Applying Lasso feature selection before classifying high-dimensional physical examination data is recommended, as it reduces data redundancy and improves the performance of the classification algorithm.
Classification number: R331 [Medicine and Health: Human Physiology]
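The workflow described in the abstract (Lasso feature selection, then a decision tree, compared on F-measure, G-mean, and area under the ROC curve) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (generated with scikit-learn's `make_classification`, 75 variables to mirror the record), the Lasso penalty `alpha=0.02` and the tree depth are arbitrary choices, and scikit-learn's `DecisionTreeClassifier` (a CART variant with an entropy criterion) stands in for C4.5, which scikit-learn does not provide.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def evaluate(X_train, X_test, y_train, y_test):
    """Fit a decision tree and return F-measure, G-mean, and AUC on the test set."""
    # Assumption: CART with entropy splits as a stand-in for C4.5.
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=6, random_state=0)
    tree.fit(X_train, y_train)
    pred = tree.predict(X_test)
    prob = tree.predict_proba(X_test)[:, 1]
    sensitivity = recall_score(y_test, pred, pos_label=1)
    specificity = recall_score(y_test, pred, pos_label=0)
    return {
        "f_measure": f1_score(y_test, pred),
        # G-mean: geometric mean of sensitivity and specificity.
        "g_mean": float(np.sqrt(sensitivity * specificity)),
        "auc": roc_auc_score(y_test, prob),
    }


# Synthetic, imbalanced stand-in for the examination data (hypothetical values).
X, y = make_classification(
    n_samples=2000, n_features=75, n_informative=10, n_redundant=30,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize so the L1 penalty treats all variables on the same scale.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Lasso shrinks some coefficients to zero; keep the variables that survive.
selector = SelectFromModel(Lasso(alpha=0.02))  # alpha is an illustrative choice
selector.fit(X_train_s, y_train)
X_train_sel = selector.transform(X_train_s)
X_test_sel = selector.transform(X_test_s)
n_selected = X_train_sel.shape[1]

before = evaluate(X_train_s, X_test_s, y_train, y_test)
after = evaluate(X_train_sel, X_test_sel, y_train, y_test)

print(f"variables kept: {n_selected} of {X.shape[1]}")
print("before selection:", before)
print("after selection: ", after)
```

Whether the metrics improve after selection depends on the data and on `alpha`; in practice the penalty would normally be tuned by cross-validation (for example with `LassoCV`) rather than fixed by hand.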