Unsupervised Denoising of Single-Cell Multi-Omics Data Based on a Multi-Head Autoencoder Network


Authors: LI Shuangyi (李双翼); LIU Farong (刘发荣); REN Sheng (任胜); YU Bin (于彬)

Affiliations: [1] College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China; [2] College of Data Science, Qingdao University of Science and Technology, Qingdao 266061, Shandong, China

Source: Journal of Qingdao University of Science and Technology (Natural Science Edition), 2024, No. 4, pp. 146-158 (13 pages)

Funding: National Natural Science Foundation of China (62172248); Natural Science Foundation of Shandong Province (ZR2021MF098).

Abstract: Single-cell multi-omics sequencing is widely used in biomedical research and generates large volumes of diverse omics data. However, raw single-cell multi-omics data contain multiple types of sequencing noise and redundant information, which complicates downstream biomedical analysis. Existing denoising methods mainly rely on a single data-distribution assumption and are tailored to one omics modality at a time, which severely limits their ability to process different omics data jointly. This study proposes a denoising method for single-cell multi-omics data, called scMAED (single-cell multi-omics data via a multi-head autoencoder network to denoising). The model adds a classification decoder to a multi-head autoencoder network to remove as much noise as possible in an unsupervised manner. First, two encoders independently learn the internal features of the multi-omics data, and their low-dimensional outputs are combined for joint decoding. Second, the classification decoder makes no data-distribution assumption and uses predicted cell-cluster labels as feedback to remove complex noise as far as possible. Finally, principal component analysis and t-SNE are used for visualization. The model is evaluated on simulated datasets and real mouse datasets; the results show that scMAED outperforms the comparison methods in denoising and can substantially improve the quality of single-cell multi-omics data.
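
The architecture outlined in the abstract (two encoders, a joint low-dimensional representation, joint decoding, and a classification decoder driven by predicted cluster labels) can be sketched roughly as follows. This is an illustrative, untrained forward pass in NumPy, not the authors' implementation. The layer sizes, the single-layer encoders and decoders, the mean-squared reconstruction term, the use of the classifier's own argmax as pseudo-labels, and all names (`MultiHeadAutoencoderSketch`, `sketch_loss`, `x_a`, `x_b`) are assumptions made for the example; the abstract does not specify any of them.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dense(n_in, n_out):
    """Random weights and zero bias (untrained; for illustration only)."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

class MultiHeadAutoencoderSketch:
    """Hypothetical two-encoder autoencoder with an extra classification head."""

    def __init__(self, dim_a, dim_b, latent=16, n_clusters=5):
        self.enc_a = dense(dim_a, latent)        # encoder for omics layer A
        self.enc_b = dense(dim_b, latent)        # encoder for omics layer B
        self.dec_a = dense(2 * latent, dim_a)    # decoders read the joint code
        self.dec_b = dense(2 * latent, dim_b)
        self.cls = dense(2 * latent, n_clusters) # classification decoder

    def forward(self, x_a, x_b):
        z_a = relu(x_a @ self.enc_a[0] + self.enc_a[1])
        z_b = relu(x_b @ self.enc_b[0] + self.enc_b[1])
        z = np.concatenate([z_a, z_b], axis=1)   # joint low-dimensional features
        rec_a = z @ self.dec_a[0] + self.dec_a[1]
        rec_b = z @ self.dec_b[0] + self.dec_b[1]
        probs = softmax(z @ self.cls[0] + self.cls[1])
        return z, rec_a, rec_b, probs

def sketch_loss(x_a, x_b, rec_a, rec_b, probs, pseudo_labels, weight=1.0):
    """Assumed objective: reconstruction error plus cross-entropy on pseudo-labels."""
    mse = np.mean((x_a - rec_a) ** 2) + np.mean((x_b - rec_b) ** 2)
    picked = probs[np.arange(len(pseudo_labels)), pseudo_labels]
    ce = -np.mean(np.log(picked + 1e-12))
    return mse + weight * ce

# Toy count matrices standing in for two omics layers (100 cells each).
x_a = np.log1p(rng.poisson(2.0, (100, 50)).astype(float))
x_b = np.log1p(rng.poisson(5.0, (100, 10)).astype(float))

model = MultiHeadAutoencoderSketch(dim_a=50, dim_b=10)
z, rec_a, rec_b, probs = model.forward(x_a, x_b)
pseudo_labels = probs.argmax(axis=1)  # assumption: self-predicted cluster labels
loss = sketch_loss(x_a, x_b, rec_a, rec_b, probs, pseudo_labels)
print(z.shape, rec_a.shape, rec_b.shape, probs.shape)
```

In a trained version the reconstructions `rec_a` and `rec_b` would serve as the denoised matrices, and the joint code `z` (or the denoised data) would be passed to principal component analysis and t-SNE for visualization, as the abstract describes.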

Keywords: single-cell multi-omics data; deep learning; multi-head autoencoder network; denoising

Classification number: Q811.4 [Biology: Bioengineering]

 
