Affiliations: [1] Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders (Ministry of Education), Shanghai Jiao Tong University; [2] Institute of Social Cognitive and Behavioral Sciences, Shanghai Jiao Tong University; [3] School of Bio-medical Engineering, Shanghai Jiao Tong University; [4] Shanghai Changning Mental Health Center
Source: Journal of Genetics and Genomics (遗传学报, English edition), 2015, Issue 8, pp. 445-453 (9 pages)
Funding: Supported by the National Key Basic Research Program of China (973 Program) (No. 2015CB559100); the National High Technology Research and Development Program of China (863 Program) (Nos. 2012AA02A515 and 2012AA021802); the Natural Science Foundation of China (Nos. 31325014, 81130022, 81272302 and 81421061); the National Program for Support of Top-Notch Young Professionals; the Program of Shanghai Subject Chief Scientist (No. 15XD1502200); and the "Shu Guang" project supported by the Shanghai Municipal Education Commission and the Shanghai Education Development Foundation (No. 12SG17)
Abstract: Population stratification is a problem in genetic association studies because it is likely to highlight loci that underlie the population structure rather than disease-related loci. At present, principal component analysis (PCA) has been proven to be an effective way to correct for population stratification. However, the conventional PCA algorithm is time-consuming when dealing with large datasets. We developed a graphic processing unit (GPU)-based PCA software named SHEsisPCA (http://analysis.bio-x.cn/SHEsisMain.htm) that is highly parallel, with a maximum speedup of more than 100-fold compared with its CPU version. A clustering algorithm based on X-means was also implemented to detect population subgroups and to obtain matched cases and controls, in order to reduce genomic inflation and increase power. A study of both simulated and real datasets showed that SHEsisPCA ran at an extremely high speed while the accuracy was hardly reduced. Therefore, SHEsisPCA can help correct for population stratification much more efficiently than conventional CPU-based algorithms.
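As a rough illustration of the PCA step the abstract refers to, the sketch below computes the top principal components of a small simulated genotype matrix on the CPU with NumPy. It is not SHEsisPCA's GPU implementation and does not include the X-means clustering step. The function names (`genotype_pcs`, `simulate_two_populations`), the 0/1/2 genotype coding, the allele-frequency standardization, and the simulation parameters are all assumptions made for this example.

```python
import numpy as np


def genotype_pcs(genotypes, n_components=2):
    """Top principal components of an (individuals x SNPs) matrix coded 0/1/2.

    Hypothetical helper: the standardization used here is a common choice,
    not necessarily the one used by SHEsisPCA.
    """
    g = np.asarray(genotypes, dtype=float)
    # Estimated allele frequency of each SNP.
    p = g.mean(axis=0) / 2.0
    # Drop monomorphic SNPs, which would cause division by zero below.
    keep = (p > 0) & (p < 1)
    g, p = g[:, keep], p[keep]
    # Center each SNP and scale by its expected binomial standard deviation.
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # Individual-by-individual covariance matrix.
    cov = z @ z.T / z.shape[1]
    # eigh returns eigenvalues in ascending order, so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order]


def simulate_two_populations(n_per_pop=50, n_snps=500, seed=0):
    """Simulate two subpopulations with different allele frequencies (assumed model)."""
    rng = np.random.default_rng(seed)
    freq_a = rng.uniform(0.1, 0.9, n_snps)
    freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_snps), 0.05, 0.95)
    pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
    pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))
    labels = np.repeat([0, 1], n_per_pop)
    return np.vstack([pop_a, pop_b]), labels


genotypes, labels = simulate_two_populations()
pcs, eigvals = genotype_pcs(genotypes, n_components=2)

# Because the SNP columns are centered, each returned component has mean zero,
# so the sign of PC1 gives a simple two-group split of the individuals.
split = (pcs[:, 0] > 0).astype(int)
agreement = max(np.mean(split == labels), np.mean(split != labels))
```

In a stratified sample such as this simulated one, the leading component is expected to track the subpopulation labels; in an association analysis, the top components would typically be included as covariates, or used as input to a clustering step, to account for that structure.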
Keywords: Population stratification; Principal component analysis; Graphic processing unit; Cluster; Matched cases and controls; Genetic studies
Classification number: R394 [Medicine and Health: Medical Genetics]