Research on a Cancer Prediction Classification Algorithm Based on Distance Measurement


Authors: YIN Lifeng (殷丽凤)[1]; LIU Haoqi (刘浩琦) (School of Rail Intelligence Engineering, Dalian Jiaotong University, Dalian 116028, China)

Affiliation: [1] School of Rail Intelligence Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China

Source: Journal of Dalian Jiaotong University (《大连交通大学学报》), 2025, No. 2, pp. 106-112 (7 pages)

Funding: National Natural Science Foundation of China (61771087).

Abstract: To improve the efficiency and accuracy of classification algorithms, a binary classification model based on distance metrics is proposed and applied to cancer identification. First, k-means clustering is used to find the cluster centers of the data set, and the Manhattan distance, cosine similarity, and Mahalanobis distance from each sample point to the cluster centers are calculated. Second, these distance metrics replace the original attributes as the input to GBM and XGBoost classifiers, which compresses the data attributes, reduces the training load on the classifiers, and improves training efficiency; the trained model is then used to predict the test set. Finally, three groups of training schemes are designed for comparative experiments, model performance is evaluated with classification evaluation criteria, and the soundness of the proposed model, TCDM, is verified from multiple angles by controlling parameters. The experimental results show that TCDM achieves higher performance and accuracy than other classification models in cancer identification.
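The feature-construction step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `kmeans_centers` and `distance_features`, the choice of k = 2 clusters, the use of a pooled covariance matrix for the Mahalanobis distance, and the synthetic data are all assumptions made for the example. Only NumPy is used, with a plain Lloyd's k-means standing in for whatever clustering routine the paper relies on.

```python
import numpy as np

def kmeans_centers(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (hypothetical helper); returns k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every sample to its nearest center (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def distance_features(X, centers, cov_inv):
    """Map each sample to 3 * k features: Manhattan distance, cosine
    similarity, and Mahalanobis distance to every cluster center."""
    diff = X[:, None, :] - centers[None, :, :]            # shape (n, k, d)
    manhattan = np.abs(diff).sum(axis=2)                  # shape (n, k)
    norms = (np.linalg.norm(X, axis=1)[:, None]
             * np.linalg.norm(centers, axis=1)[None, :])
    cosine = (X @ centers.T) / np.maximum(norms, 1e-12)   # guard against /0
    sq = np.einsum("nkd,de,nke->nk", diff, cov_inv, diff)
    mahalanobis = np.sqrt(np.maximum(sq, 0.0))            # clip rounding noise
    return np.hstack([manhattan, cosine, mahalanobis])

# Synthetic two-group data with 10 attributes (an assumption for this demo).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 10)),
               rng.normal(3.0, 1.0, (100, 10))])

centers = kmeans_centers(X, k=2)
cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))  # pooled covariance (assumed)
F = distance_features(X, centers, cov_inv)
print(X.shape, "->", F.shape)
```

Here the 10 original attributes are compressed into 6 distance features. In a full pipeline, the centers and covariance would presumably be estimated on the training split only, and the matrix `F` together with the class labels could then be passed to a gradient-boosting classifier such as `sklearn.ensemble.GradientBoostingClassifier` or `xgboost.XGBClassifier`.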

Keywords: classification algorithm; cosine similarity; Mahalanobis distance; Manhattan distance; k-means clustering

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
