DOI: 10.1126/science.abf7117
Haplotype-resolved diverse human genomes and integrated analysis of structural variation
Peter Ebert; Peter A. Audano; Qihui Zhu; Bernardo Rodriguez-Martin; David Porubsky; Marc Jan Bonder; Arvis Sulovari; Jana Ebler; Weichen Zhou; Rebecca Serra Mari; Feyza Yilmaz; Xuefang Zhao; PingHsun Hsieh; Joyce Lee; Sushant Kumar; Jiadong Lin; Tobias Rausch; Yu Chen; Jingwen Ren; Martin Santamarina; Wolfram Höps; Hufsah Ashraf; Nelson T. Chuang; Xiaofei Yang; Katherine M. Munson; Alexandra P. Lewis; Susan Fairley; Luke J. Tallon; Wayne E. Clarke; Anna O. Basile; Marta Byrska-Bishop; André Corvelo; Uday S. Evani; Tsung-Yu Lu; Mark J. P. Chaisson; Junjie Chen; Chong Li; Harrison Brand; Aaron M. Wenger; Maryam Ghareghani; William T. Harvey; Benjamin Raeder; Patrick Hasenfeld; Allison A. Regier; Haley J. Abel; Ira M. Hall; Paul Flicek; Oliver Stegle; Mark B. Gerstein; Jose M. C. Tubio; Zepeng Mu; Yang I. Li; Xinghua Shi; Alex R. Hastie; Kai Ye; Zechen Chong; Ashley D. Sanders; Michael C. Zody; Michael E. Talkowski; Ryan E. Mills; Scott E. Devine; Charles Lee; Jan O. Korbel; Tobias Marschall; Evan E. Eichler
2021-04-02
Journal: Science
Publication year: 2021
Abstract:

Many human genomes have been reported using short-read technology, but it is difficult to resolve structural variants (SVs) using these data. These genomes thus lack comprehensive comparisons among individuals and populations. Ebert et al. used long-read structural variation calling across 64 human genomes representing diverse populations and developed new methods for variant discovery. This approach allowed the authors to increase the number of confirmed SVs and to describe the patterns of variation across populations. From this dataset, they identified quantitative trait loci affected by these SVs and determined how they may affect gene expression and potentially explain genome-wide association study hits. This information provides insights into patterns of normal human genetic variation and generates reference genomes that better represent the diversity of our species. Science, this issue p. eabf7117

### INTRODUCTION

The characterization of the full spectrum of genetic variation is critical to understanding human health and disease. Recent technological advances have made it possible to survey genetic variants on the level of fully reconstructed haplotypes, leading to substantially improved sensitivity in detecting and characterizing large structural variants (SVs), including complex classes.

### RATIONALE

We focused on comprehensive genetic variant discovery from a human diversity panel representing 25 human populations. We leveraged a recently developed computational pipeline that combines long-read technology and single-cell template strand sequencing (Strand-seq) to generate fully phased diploid genome assemblies without guidance of a reference genome or use of parent-child trio information. Variant discovery from high-quality haplotype assemblies increases sensitivity and yields variants that are not only sequence resolved but also embedded in their genomic context, substantially improving genotyping in short-read sequenced cohorts and providing an assessment of their potential functional relevance.

### RESULTS

We generated fully phased genome assemblies for 35 individuals (32 unrelated and three children from parent-child trios). Genomes are highly contiguous [average minimum contig length needed to cover 50% of the genome: 26 million base pairs (Mbp)], accurate at the base-pair level (quality value > 40), correctly phased (average switch error rate 0.18%), and nearly complete compared with GRCh38 (median aligned contig coverage >95%). From the set of 64 unrelated haplotype assemblies, we identified 15.8 million single-nucleotide variants (SNVs), 2.3 million insertions/deletions (indels; 1 to 49 bp in length), 107,590 SVs (≥50 bp), 316 inversions, and 9453 nonreference mobile elements. The large fraction of African individuals in our study (11 of 35) enhances the discovery of previously unidentified variation (approximately twofold increase in discovery rate compared with non-Africans). Overall, ~42% of SVs are previously unidentified compared with recent long-read-based studies. Using orthogonal technologies, we validated most events and discovered ~35 structurally divergent regions per human genome (>50 kbp) not yet fully resolved with long-read genome assembly. We found that homology-mediated mechanisms of SV formation are twice as common as expected from previous reports that used short-read sequencing. We constructed a phylogeny of active L1 source elements and observed a correlation between evolutionary age and features such as the activity level, suggesting that younger elements contribute disproportionately to disease-causing variation. Transduction tracing allowed the identification of 54 active SVA retrotransposon source elements, which mobilize nonrepetitive sequences at their 5′ and 3′ ends. We genotyped up to 50,340 SVs into Illumina short-read data from the 1000 Genomes Project and identified variants associated with changes in gene expression, such as a 1069-bp SV near the gene LIPI, a locus that is associated with cardiac failure. We further identified 117 loci that show evidence for population stratification. These are candidates for local adaptation, such as a 4.0-kbp deletion of regulatory DNA LCT (lactase gene) among Europeans.

### CONCLUSION

Fully reconstructed haplotype assemblies triple SV discovery when compared with short-read data and improve genotyping, leading to insights into SV mechanism of origin, evolutionary history, and disease association.

Figure: Discovery and analysis of global human genetic diversity. Starting from a global panel of human diversity (top), we discovered structural variation from fully phased diploid genome assemblies (middle), resulting in a comprehensive catalog of sequence- and context-resolved variants. This facilitates integrative analysis and identification of new associations between variants and molecular phenotypes (bottom). SAS, South Asian; AMR, Admixed American; AFR, African; EUR, European; EAS, East Asian; INV, inversion; INS, insertion; DEL, deletion; MEI, mobile element insertion.

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
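
The contiguity and accuracy figures quoted in the abstract follow standard definitions. As a minimal sketch (not the authors' pipeline), the Python below computes the "minimum contig length needed to cover 50% of the genome" from a list of contig lengths and converts a Phred-scaled quality value into a per-base error rate. The function names `contig_n50` and `phred_error_rate` and the optional `genome_size` parameter are illustrative assumptions. The abstract does not say whether the 26 Mbp figure is measured against the assembly length or a reference genome size, so the sketch supports both.

```python
from typing import Iterable, Optional


def contig_n50(lengths: Iterable[int], genome_size: Optional[int] = None) -> int:
    """Return the smallest contig length L such that contigs of length >= L
    together cover at least half of the target size.

    With genome_size=None the target is the total assembly length (N50).
    Passing an assumed genome size gives the NG50 variant instead.
    """
    ordered = sorted(lengths, reverse=True)
    total = genome_size if genome_size is not None else sum(ordered)
    target = total / 2
    covered = 0
    for length in ordered:
        covered += length
        if covered >= target:
            return length
    return 0  # the contigs never reach half of the target size


def phred_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to an expected per-base error rate."""
    return 10 ** (-qv / 10)


if __name__ == "__main__":
    toy_contigs = [10, 8, 6, 4, 2]  # toy lengths, not data from the study
    print("N50:", contig_n50(toy_contigs))
    print("NG50 (assumed genome size 60):", contig_n50(toy_contigs, genome_size=60))
    print("Error rate at QV 40:", phred_error_rate(40))
```

Under the Phred convention, the quality value above 40 reported in the abstract corresponds to fewer than one expected base error per 10,000 bases.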
Field: Climate Change; Resources and Environment
Document type: Journal article
Item identifier: http://119.78.100.173/C666/handle/2XK7JSWQ/321131
Collections: Climate Change; Resources and Environmental Science
Recommended citation
GB/T 7714
Peter Ebert, Peter A. Audano, Qihui Zhu, et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation[J]. Science, 2021.
APA Ebert, P., Audano, P. A., Zhu, Q., Rodriguez-Martin, B., Porubsky, D., ... & Eichler, E. E. (2021). Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science.
MLA Peter Ebert, et al. "Haplotype-resolved diverse human genomes and integrated analysis of structural variation." Science (2021).