Authors: YIN Lifeng (殷丽凤)[1], LIU Haoqi (刘浩琦)
Affiliation: [1] School of Rail Intelligence Engineering, Dalian Jiaotong University, Dalian 116028, Liaoning, China
Source: Journal of Dalian Jiaotong University (《大连交通大学学报》), 2025, No. 2, pp. 106-112 (7 pages)
Funding: National Natural Science Foundation of China (61771087).
Abstract: To improve the efficiency and accuracy of classification, a binary classification model based on distance metrics is proposed and applied to cancer identification. First, k-means clustering is used to find the cluster centers of the data set, and the Manhattan distance, cosine similarity, and Mahalanobis distance from each sample point to the cluster centers are calculated. Second, these distance metrics replace the original attributes as the input to GBM and XGBoost classifiers, which compresses the data attributes, reduces the training burden on the classifiers, and improves training efficiency; the trained model is then used to predict the test set. Finally, three groups of training schemes are designed for comparative experiments, model performance is evaluated with classification evaluation criteria, and the rationality of TCDM is verified from multiple angles by controlling parameters. The experimental results show that TCDM achieves higher performance and accuracy than other classification models in cancer identification.
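Keywords: classification algorithm; cosine similarity; Mahalanobis distance; Manhattan distance; k-means clustering
Classification number: TP391 [Automation and Computer Technology - Computer Application Technology]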
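
The feature-construction step described in the abstract (cluster with k-means, then replace each sample's attributes with its distances to the cluster centers) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `kmeans` and `distance_features`, the synthetic data, the choice of k = 2, and the use of a single pooled covariance matrix for the Mahalanobis distance are all hypothetical. The GBM and XGBoost training step is left out.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def distance_features(X, centers):
    """Replace each sample's attributes with its distances to every center.

    Columns: k Manhattan distances, k cosine similarities, then
    k Mahalanobis distances, giving shape (n_samples, 3 * k).
    """
    diff = X[:, None, :] - centers[None, :, :]             # (n, k, d)
    manhattan = np.abs(diff).sum(axis=2)                   # (n, k)

    eps = 1e-12  # guards against division by zero for zero vectors
    x_norm = np.linalg.norm(X, axis=1, keepdims=True)      # (n, 1)
    c_norm = np.linalg.norm(centers, axis=1)               # (k,)
    cosine = (X @ centers.T) / (x_norm * c_norm + eps)     # (n, k)

    # Assumption: one covariance matrix estimated from all of X.
    # The pseudo-inverse keeps this defined if the covariance is singular.
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    quad = np.einsum("nkd,de,nke->nk", diff, cov_inv, diff)
    mahalanobis = np.sqrt(np.maximum(quad, 0.0))           # (n, k)

    return np.hstack([manhattan, cosine, mahalanobis])

# Synthetic two-class data with 10 attributes (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(3, 1, (50, 10))])

centers = kmeans(X, k=2)
F = distance_features(X, centers)
print(F.shape)  # (100, 6)
```

With k = 2 centers, the 10 original attributes shrink to 6 distance features per sample. That reduction is the attribute compression the abstract refers to, and `F` would then take the place of `X` as the classifier input.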