Diagnosis of Lung Cancer by Human Serum Raman Spectroscopy Combined With Six Machine Learning Algorithms

Authors: NI Qin-ru; OU Quan-hong [1]; SHI You-ming [2]; LIU Chao [3]; ZUO Ye-hao; ZHI Zhao-xing; REN Xian-pei; LIU Gang [1] (College of Physics and Electronic Information, Yunnan Normal University, Kunming 650500, China; College of Physics and Electronic Engineering, Qujing Normal University, Qujing 655011, China; Department of Nuclear Medicine, Yunnan Cancer Hospital, Kunming 650118, China; College of Physics and Electronic Engineering, Sichuan University of Science & Engineering, Zigong 643000, China)

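Affiliations: [1] College of Physics and Electronic Information, Yunnan Normal University, Kunming, Yunnan 650502; [2] College of Physics and Electronic Engineering, Qujing Normal University, Qujing, Yunnan 655011; [3] Department of Nuclear Medicine, Yunnan Cancer Hospital, Kunming, Yunnan 650118; [4] College of Physics and Electronic Engineering, Sichuan University of Science & Engineering, Zigong, Sichuan 643000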

Source: Spectroscopy and Spectral Analysis (光谱学与光谱分析), 2025, No. 3, pp. 685-691 (7 pages)

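Funding: Supported by the Yunnan Fundamental Research Projects (202301AT070068), the Yunnan Expert Workstation (202205AF150008), and the National Natural Science Foundation of China (31760341).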

Abstract: Lung cancer is a serious threat to human health, and its incidence in China has been rising in recent years. Imaging and histopathological examinations are the main screening methods for lung cancer. Imaging is widely used for preliminary screening, but its results carry some uncertainty. Histopathological examination is accurate and is the “gold standard” for lung cancer diagnosis, but obtaining tissue samples is highly invasive. A reliable and minimally invasive diagnostic method for lung cancer is therefore needed. Serum samples are more convenient and less invasive to obtain than pathological tissue samples. Raman spectroscopy is simple, rapid, and highly sensitive, and it can provide biochemical information on serum. In this study, Raman spectra were measured for serum samples from 155 healthy subjects and 92 lung cancer patients. Curve fitting (sub-peak decomposition) was applied to the spectra of both groups in the range of 1800 to 800 cm^(-1). Compared with healthy subjects, the area percentages of the sub-peaks near 1005 and 1091 cm^(-1) in lung cancer patients increased by 3.36% and 5.32%, respectively, whereas those of the sub-peaks near 964, 1522, and 1586 cm^(-1) decreased by 2.3%, 2.82%, and 5.6%, respectively. These curve-fitting results give a preliminary indication that the levels of biochemical substances such as carotenoids, amino acids, ribose, and nucleic acids are altered in the serum of lung cancer patients. To explore the Raman spectral characteristics of serum from lung cancer patients further, machine learning was used to mine the information hidden in the spectral data. Principal component analysis (PCA) was first used to extract feature variables from the spectra. These variables were then used to build classification models with six algorithms: support vector machine (SVM), random forest (RF), k-nearest neighbors (kNN), logistic regression classification (LRC), decision tree (DT), and Bayes. Leave-one-out cross-validation was used to evaluate the predictive performance of the models. The SVM model classified the serum Raman spectra of lung cancer patients best, with accuracy, sensitivity, specificity, and F1 score of 98%, 94.44%, 100%, and 97.14%, respectively. The mean area under the ROC curve for the SVM model under 9-fold cross-validation was 0.94, indicating good predictive performance. The results show that serum Raman spectroscopy combined with machine learning algorithms can diagnose lung cancer effectively; the approach is minimally invasive and accurate, making it a potential technique for lung cancer diagnosis.
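As a reading aid for the four performance figures quoted in the abstract, the short Python sketch below shows how accuracy, sensitivity, specificity, and F1 score follow from confusion-matrix counts. The abstract does not report a confusion matrix; the counts used here (17 true positives, 1 false negative, 0 false positives, 32 true negatives) are a hypothetical assumption, chosen only because they reproduce the reported percentages. The function name `classification_metrics` is likewise illustrative and not taken from the paper.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute binary classification metrics from confusion-matrix counts.

    Treats lung cancer as the positive class. Assumes every denominator
    is non-zero (no guard against empty classes).
    """
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    sensitivity = tp / (tp + fn)   # fraction of cancer samples detected
    specificity = tn / (tn + fp)   # fraction of healthy samples cleared
    precision = tp / (tp + fp)     # fraction of positive calls that are correct
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts (an assumption, not data from the study).
metrics = classification_metrics(tp=17, fn=1, fp=0, tn=32)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these assumed counts the script prints 98.00%, 94.44%, 100.00%, and 97.14%. Other count combinations may also be consistent with the rounded figures, so this should not be read as the study's actual test-set composition.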

Keywords: Raman spectroscopy; lung cancer; serum; machine learning; support vector machine

Classification code: O657.3 [Science: Analytical Chemistry]

 
