Network diffusion for scalable embedding of massive single-cell ATAC-seq data  

Scalable embedding of ultra-large-scale single-cell chromatin accessibility data based on network diffusion


Authors: Kangning Dong (董康宁); Shihua Zhang (张世华)

Affiliations: [1] NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; [2] School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; [3] Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China; [4] Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310024, China

Source: Science Bulletin (科学通报, English edition), 2021, Issue 22, pp. 2271-2276, M0003 (7 pages)

Funding: Supported by the National Key R&D Program of China (2019YFA0709501); the National Natural Science Foundation of China (61621003); the National Ten Thousand Talent Program for Young Top-notch Talents; the CAS Frontier Science Research Key Project for Top Young Scientist (QYZDB-SSW-SYS008); and the Shanghai Municipal Science and Technology Major Project (2017SHZDZX01).

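Abstract: Cell type-specific genomic regulation is driven by the binding of transcription factors (TFs) in accessible genomic regions. Thus, chromatin accessibility can be used to identify cis-regulatory elements and directly depict cellular identity. Single-cell Assay for Transposase-Accessible Chromatin using sequencing (single-cell ATAC-seq or scATAC-seq) has enabled genome-wide profiling of chromatin accessibility at single-cell resolution and can thus reveal epigenetic heterogeneity at the cellular level [1]. scATAC-seq data are high-dimensional, sparse, and nearly binary, which poses major challenges for effective data analysis and representation. This paper proposes scAND, a network-based bioinformatics modeling method that represents the binarized data as a bipartite graph and applies a simple, fast network diffusion algorithm to reduce the impact of data sparsity and improve the representational power of the data. In large-scale comparative analyses, scAND shows clear advantages in clustering accuracy, robustness, and scalability. More importantly, scAND simultaneously obtains low-dimensional representations of peaks, which can in turn be used to effectively integrate data from different batches. scAND provides an efficient and fast means of analyzing the growing volume of ultra-large-scale single-cell chromatin accessibility data.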
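The abstract describes two steps: representing the binarized cell-by-peak matrix as a bipartite graph, and diffusing over that graph to counter sparsity before embedding. The sketch below illustrates that general idea on a toy matrix. It is not the scAND implementation: the damped truncated random walk, the parameters `steps` and `beta`, the SVD-based embedding, and the function names `diffuse_bipartite` and `embed` are all assumptions made for illustration, since the abstract does not specify the diffusion operator or the embedding step.

```python
import numpy as np


def diffuse_bipartite(X, steps=3, beta=0.5):
    """Smooth a binary cell-by-peak matrix via damped diffusion on its bipartite graph.

    Hypothetical helper: `steps` and `beta` are illustrative choices, not values
    taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    n_cells, n_peaks = X.shape
    n = n_cells + n_peaks

    # Bipartite adjacency: cells connect only to peaks, and peaks only to cells.
    A = np.zeros((n, n))
    A[:n_cells, n_cells:] = X
    A[n_cells:, :n_cells] = X.T

    # Row-normalize into a random-walk transition matrix; isolated nodes stay zero.
    deg = A.sum(axis=1, keepdims=True)
    P = np.divide(A, deg, out=np.zeros_like(A), where=deg > 0)

    # Accumulate damped walk probabilities: S = sum_{k=1..steps} beta^k * P^k.
    S = np.zeros_like(A)
    Pk = np.eye(n)
    for k in range(1, steps + 1):
        Pk = Pk @ P
        S += (beta ** k) * Pk

    # Only odd-length walks link a cell to a peak; return that block.
    return S[:n_cells, n_cells:]


def embed(M, dim=2):
    """Embed cells and peaks jointly with a truncated SVD of the diffused matrix."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    scale = np.sqrt(s[:dim])
    return U[:, :dim] * scale, Vt[:dim].T * scale


# Toy data: two groups of cells with disjoint peak sets.
X = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
])
M = diffuse_bipartite(X)
cell_emb, peak_emb = embed(M, dim=2)
```

In this toy example, cell 1 has no read in peak 1, yet the diffused matrix assigns it a small positive weight there because it shares peak 0 with cell 0. Entries linking the two disconnected groups remain zero. For data at the scale the abstract targets, sparse matrices and a randomized or truncated solver would be needed in place of the dense operations used here.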

Keywords: data sparsity; binarization; diffusion algorithm; ultra-large scale; data analysis; CAN; bioinformatics; bipartite graph

Classification code: Q811.4 [Biology: Bioengineering]

 
