Early diagnosis and prediction of Parkinson's disease based on clustering medical text data  (Cited by: 6)


Authors: ZHANG Xiaobo[1,2,3], YANG Yan[1,2,3], LI Tianrui[1,2,3], LU Fan, PENG Lilan

Affiliations: [1] School of Information Science and Technology, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; [2] Institute of Artificial Intelligence, Southwest Jiaotong University, Chengdu, Sichuan 611756, China; [3] National Engineering Laboratory of Integrated Transportation Big Data Application Technology (Southwest Jiaotong University), Chengdu, Sichuan 611756, China

Source: Journal of Computer Applications (《计算机应用》), 2020, Issue 10, pp. 3088-3094 (7 pages)

Funding: National Natural Science Foundation of China (61976247); Key Research and Development Program of Sichuan Province (20ZDYF2837).

Abstract: To address the early intelligent diagnosis of Parkinson's Disease (PD), which is more common in the elderly, clustering techniques based on medical examination text data are proposed for the analysis and prediction of PD. First, the original dataset is preprocessed to obtain effective feature information, and Principal Component Analysis (PCA) is used to reduce the original features to eight feature spaces of different dimensionality. Then, five classical clustering models and three clustering ensemble methods are each applied to the data in the eight feature spaces. Finally, four clustering performance indexes are adopted to assess the prediction of PD subjects with dopamine deficiency, healthy controls, and PD subjects with Scans Without Evidence of Dopamine Deficiency (SWEDD). Simulation results show that the clustering accuracy of the Gaussian Mixture Model (GMM) reaches 89.12% when the PCA feature dimension is 30, that of Spectral Clustering (SC) reaches 61.41% when the PCA feature dimension is 70, and that of the Meta-CLustering Algorithm (MCLA) reaches 59.62% when the PCA feature dimension is 80. Comparative experiments show that, among the five classical clustering methods, GMM performs best when the PCA feature dimension is less than 40, and that, among the three clustering ensemble methods, MCLA performs well across the different feature dimensions. These results provide technical and theoretical support for the early intelligent auxiliary diagnosis of PD.
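The pipeline summarized in the abstract (reduce features with PCA, cluster into three groups, score against known labels) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the data are synthetic blobs from scikit-learn, the features are standardized before PCA, the GMM uses diagonal covariances, and "clustering accuracy" is taken to mean the fraction of samples matched under the best one-to-one relabeling of clusters, which the abstract does not define. The helper names `clustering_accuracy` and `pca_gmm_cluster` are hypothetical.

```python
from itertools import permutations

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def clustering_accuracy(y_true, y_pred, n_classes=3):
    """Fraction of samples matched under the best one-to-one relabeling
    of cluster IDs (brute force over permutations; fine for 3 clusters)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    best = 0.0
    for perm in permutations(range(n_classes)):
        # perm[c] is the class that cluster ID c is mapped to.
        mapped = np.array([perm[c] for c in y_pred])
        best = max(best, float(np.mean(mapped == y_true)))
    return best


def pca_gmm_cluster(X, n_dims=30, n_clusters=3, seed=0):
    """Standardize, project onto n_dims principal components, cluster with a GMM."""
    X_std = StandardScaler().fit_transform(X)
    X_low = PCA(n_components=n_dims, random_state=seed).fit_transform(X_std)
    gmm = GaussianMixture(
        n_components=n_clusters,
        covariance_type="diag",  # assumption: keeps the fit stable on small samples
        random_state=seed,
    )
    return gmm.fit_predict(X_low)


# Synthetic stand-in data (NOT the paper's dataset): 300 samples, 100 features, 3 groups.
X, y = make_blobs(n_samples=300, n_features=100, centers=3, random_state=0)
labels = pca_gmm_cluster(X, n_dims=30)
acc = clustering_accuracy(y, labels)
print(f"clustering accuracy at 30 PCA dimensions: {acc:.4f}")
```

Repeating the call with different `n_dims` values would mimic the comparison across PCA dimensions that the abstract describes; the numbers obtained on synthetic data say nothing about the reported results.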

Keywords: Parkinson's disease; medical text data; Principal Component Analysis (PCA); clustering; clustering ensemble

Classification code: TP391.7 [Automation and Computer Technology: Computer Application Technology]

 
