Divide and Conquer Imputation for Dropouts in Single-Cell Data based on Negative Binomial Distribution


Authors: XIONG Zhen-zhen; ZHANG Ben-gong

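Affiliations: [1] School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, Hubei 430200, China; [2] Research Center of Nonlinear Science, Wuhan Textile University, Wuhan, Hubei 430200, China; [3] School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan, Hubei 430200, China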

Source: Journal of Wuhan Textile University, 2023, No. 1, pp. 14-20 (7 pages)

Abstract: Single-cell RNA sequencing (scRNA-seq) enables high-throughput, high-resolution study of individual cells and is an important technical means of investigating cell function and the underlying gene regulatory mechanisms at the single-cell level. The technology also brings new challenges: scRNA-seq data are large in scale, noisy, and highly heterogeneous. In particular, the high proportion of missing values, known as dropouts, seriously affects the reliability of downstream analysis and can even obscure important gene-gene relationships. This paper proposes ND-Impute (Negative binomial distribution based Divide and conquer strategy for imputation), a divide-and-conquer imputation strategy based on the negative binomial distribution, for processing scRNA-seq data. The method assumes that scRNA-seq data follow a negative binomial distribution, uses an autoencoder with a specific loss function to obtain data-specific parameters, and applies a divide-and-conquer strategy to estimate the underlying gene expression values. Comparisons of clustering performance, correlation, and error analysis show that the method effectively recovers missing data and improves the accuracy of subsequent analyses.
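Illustration: the abstract does not spell out the loss function, so the following is only a minimal sketch of what a negative binomial likelihood for count data can look like, not the ND-Impute loss itself. It assumes the common mean/dispersion parameterization (mean `mu`, dispersion `theta`, variance `mu + mu**2 / theta`); the function names `nb_log_pmf`, `nb_nll`, and `nb_zero_prob` are hypothetical and chosen here for illustration.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial distribution.

    Assumed parameterization (not taken from the paper): mean mu > 0,
    dispersion theta > 0, so that variance = mu + mu**2 / theta.
    """
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def nb_nll(counts, mus, theta):
    """Average negative log-likelihood of observed counts given per-entry means.

    In an autoencoder setting the means would come from the decoder output;
    here they are passed in directly.
    """
    total = sum(nb_log_pmf(x, m, theta) for x, m in zip(counts, mus))
    return -total / len(counts)


def nb_zero_prob(mu, theta):
    """Probability that the model itself produces a zero count."""
    return (theta / (theta + mu)) ** theta


if __name__ == "__main__":
    observed = [0, 3, 0, 7, 1]
    predicted_means = [0.5, 2.5, 1.0, 6.0, 1.5]
    print("NLL:", nb_nll(observed, predicted_means, theta=2.0))
    print("P(zero | mu=1.0, theta=2.0):", nb_zero_prob(1.0, 2.0))
```

Under this parameterization a zero count can already be fairly likely for genes with a low mean, which is one reason a count model is a natural starting point when deciding which zeros are plausible dropouts and which reflect genuinely low expression.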

Keywords: single-cell RNA sequencing; dropout (missing data); imputation strategy; cluster analysis

Classification number: O211.9 [Science: Probability Theory and Mathematical Statistics]

 
