A Study on the Lasso Feature-based Selection Algorithm for Breast Cancer Binary Classification  Cited by: 1

Original title: 基于Lasso特征选择乳腺癌二分类算法研究


Authors: FENG Xin (冯欣); ZHANG Hang (张航); XIN Ruihao (辛瑞昊) (School of Mathematics and Science, Jilin Institute of Chemical Technology, Jilin City 132022, China; School of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin City 132022, China)

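Affiliations: [1] School of Science, Jilin Institute of Chemical Technology, Jilin City, Jilin 132022, China [2] School of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin City, Jilin 132022, China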

Source: Journal of Jilin Institute of Chemical Technology (《吉林化工学院学报》), 2023, No. 1, pp. 23-28 (6 pages)

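Funding: Key Scientific Research Project of the Jilin Provincial Department of Education (1KH20220245K1); Humanities and Social Sciences Research Project of the Jilin Provincial Department of Education (JIKH 20220226SK); Jilin Province Higher Education Research Project (JGJX2021D226); Natural Science Foundation of Jilin Province (YDZJ202301ZYTS288); Natural Science Foundation of Jilin Province (YDZJ202301ZYTS401).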

Abstract: In recent years, with the rapid development of big data mining technology in the medical industry, clinical precision treatment has become a research hotspot in the field of medical big data. Based on the breast cancer dataset in the UCI database, this study constructs a binary classification algorithm to predict breast tumour type. To handle the imbalanced dataset, optimise feature selection, and evaluate classification accuracy, the study uses machine learning techniques including random oversampling, least absolute shrinkage and selection operator (Lasso) regression for feature selection, and sequential forward selection (SFS). The results show that the random forest algorithm using six of the features achieved the highest classification accuracy (97.07%), an improvement over the accuracy obtained without feature selection, and may offer new ideas for breast cancer detection.
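The abstract gives no implementation details, so the following is only a minimal, dependency-free sketch of its first two stages: random oversampling of the minority class, then Lasso-based feature selection. The toy data, the function names, the `alpha` value, and the use of coordinate descent as the Lasso solver are all assumptions made for illustration; none of them comes from the paper. The SFS and random forest stages are not shown.

```python
import random

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out

def standardize(X):
    """Scale each column to zero mean and unit variance (constant columns are only centred)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 or 1.0
            for j in range(p)]
    return [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]

def soft_threshold(z, t):
    """Return sign(z) * max(|z| - t, 0), the shrinkage step behind Lasso sparsity."""
    return max(z - t, 0.0) if z > 0 else min(z + t, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n) * ||y - Xw||^2 + alpha * ||w||_1 for centred y."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            z = sum(c * c for c in col) / n
            new_w = soft_threshold(rho, alpha) / z if z > 0 else 0.0
            delta = new_w - w[j]
            if delta:
                residual = [r - c * delta for r, c in zip(residual, col)]
                w[j] = new_w
    return w

# Hypothetical toy data: 6 samples of class 0 and 2 of class 1 (imbalanced).
X = [[1.0, 5.0, 0.2], [1.2, 3.0, 0.9], [0.9, 4.0, 0.4], [1.1, 6.0, 0.7],
     [1.0, 2.0, 0.1], [0.8, 5.5, 0.6], [3.0, 4.5, 0.5], [3.2, 3.5, 0.3]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y, seed=0)
X_std = standardize(X_bal)
y_mean = sum(y_bal) / len(y_bal)
y_centred = [label - y_mean for label in y_bal]

weights = lasso_coordinate_descent(X_std, y_centred, alpha=0.1)
# Features with a non-zero Lasso coefficient are kept for the later stages.
selected = [j for j, wj in enumerate(weights) if abs(wj) > 1e-8]
print("class counts:", {c: y_bal.count(c) for c in set(y_bal)})
print("selected feature indices:", selected)
```

In practice the selected columns would then be passed to a wrapper method such as SFS and to the final classifier. A larger `alpha` shrinks more coefficients to exactly zero and so keeps fewer features.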

Keywords: breast cancer; Lasso; SFS

Classification No.: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
