Self-Supervised Learning Based on Graph Structural Clustering for Disease Diagnosis Method  Cited by: 1


Authors: ZHANG Zhengkang; YANG Dan [1]; NIE Tiezheng; KOU Yue (School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, Liaoning, China; School of Computer Science and Engineering, Northeastern University, Shenyang 110169, Liaoning, China)

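Affiliations: [1] School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, Liaoning, China; [2] School of Computer Science and Engineering, Northeastern University, Shenyang 110169, Liaoning, China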

Source: Computer Engineering (《计算机工程》), 2024, No. 7, pp. 360-371 (12 pages)

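Funding: National Natural Science Foundation of China (62072084, 62072086); Scientific Research Project of the Education Department of Liaoning Province (LJKMZ20220646).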

Abstract: In recent years, graph self-supervised learning has been applied to disease diagnosis to alleviate the scarcity of medical labels and the cost of manual annotation. However, the performance of existing graph self-supervised learning methods relies heavily on high-quality positive and negative samples, which limits the flexibility and generalizability of disease diagnosis. Moreover, patients' multimodal data are not fully exploited when constructing medical heterogeneous attributed graphs, which degrades diagnostic performance. This study therefore proposes SC4DD, a self-supervised learning framework based on the Structural Clustering of a medical heterogeneous attributed graph for Disease Diagnosis. The framework uses patients' structured data and unstructured clinical text summaries to construct a medical heterogeneous attributed graph, and generates pseudo-labels for nodes with a structural clustering algorithm on the graph. Because different meta-paths differ in importance for learning patient embeddings, and different modalities of medical data differ in their influence on the diagnosis results, a heterogeneous Graph Neural Network (GNN) with an attention mechanism is introduced as the encoder; the pseudo-labels serve as self-supervised signals that help the encoder learn the attention coefficients and the patient embeddings. Experimental results on the MIMIC-III dataset show that SC4DD outperforms the baseline methods and effectively improves disease diagnosis performance. In particular, compared with HeCo, the best-performing baseline, SC4DD improves Macro-F1 by 1.46%, 0.97%, and 0.94% and Micro-F1 by 0.91%, 0.84%, and 0.52% with 2%, 3%, and 4% of nodes labeled, respectively.
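The abstract reports both Macro-F1 and Micro-F1. As a reminder of how the two differ, here is a minimal sketch using the standard definitions for single-label multi-class prediction. The function name `f1_scores` and the toy labels are illustrative assumptions; they are not taken from the paper or its evaluation code.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 values, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Toy labels (hypothetical, not from the paper): class "a" is frequent, "b" is rare.
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "a", "a", "b"]
macro_f1, micro_f1 = f1_scores(y_true, y_pred)
print(f"Macro-F1 = {macro_f1:.4f}, Micro-F1 = {micro_f1:.4f}")
```

Macro-F1 weights every class equally, so a single error on the rare class lowers it more than it lowers Micro-F1. In this single-label setting Micro-F1 equals overall accuracy.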

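Keywords: disease diagnosis; electronic medical records; graph self-supervised learning; graph neural network; medical heterogeneous attributed graph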

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
