期刊界 (All Journals): search journals worldwide and share academic results. Professional journal search, journal informatization, academic search

1.

Assessment of Genotype Imputation Performance Using 1000 Genomes in African American Studies

Dana B. Hancock Joshua L. Levy Nathan C. Gaddis Laura J. Bierut Nancy L. Saccone Grier P. Page Eric O. Johnson 《PloS one》2012,7(11)

Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Overall, imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range of reference populations, such as African Americans (ASW), but their imputation performance has had limited evaluation. Using 595 African Americans genotyped on Illumina’s HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (EUR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs with imputed and true genotype agreement); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for low minor allele frequency [MAF] SNPs); and (3) average r2hat (estimated correlation between the imputed and true genotypes, for all imputed SNPs). Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%–93%), but IMPUTE2 had the highest IQS (81%–83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). Imputation quality for most programs was reduced by the addition of more distantly related reference populations, due entirely to the introduction of low frequency SNPs (MAF≤2%) that are monomorphic in the more closely related panels.
While imputation was optimized by using IMPUTE2 with reference to the ALL panel (average r2hat = 0.86 for SNPs with MAF>2%), use of the ALL panel for African American studies requires careful interpretation of the population specificity and imputation quality of low frequency SNPs.
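The difference between raw concordance and a chance-adjusted score can be illustrated with a small sketch. It assumes a kappa-style adjustment, (observed agreement - chance agreement) / (1 - chance agreement), which is one common way to define such a score; the function names and the toy genotype vectors are hypothetical and are not taken from the study.

```python
from collections import Counter


def concordance(true_gt, imputed_gt):
    """Fraction of masked genotypes where the imputed call equals the true call."""
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def chance_adjusted_score(true_gt, imputed_gt):
    """Kappa-style score: concordance corrected for agreement expected by chance.

    Assumption: chance agreement is estimated from the marginal genotype
    frequencies of the true and imputed calls.
    """
    n = len(true_gt)
    p_obs = concordance(true_gt, imputed_gt)
    true_freq = Counter(true_gt)
    imp_freq = Counter(imputed_gt)
    p_chance = sum(true_freq[g] * imp_freq[g] for g in true_freq) / (n * n)
    if p_chance == 1.0:
        return float("nan")  # undefined when both call sets are monomorphic
    return (p_obs - p_chance) / (1.0 - p_chance)


# Toy rare SNP (genotypes coded as minor-allele counts): imputation calls
# every sample homozygous for the major allele and misses both carriers.
true_gt = [0] * 18 + [1, 1]
imputed_gt = [0] * 20

raw = concordance(true_gt, imputed_gt)
adjusted = chance_adjusted_score(true_gt, imputed_gt)
print(raw, adjusted)
```

In this toy case the raw concordance is high even though no carrier was imputed correctly, while the adjusted score drops to zero, which is why a chance-adjusted metric is more informative for low-MAF SNPs.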

2.

HLA Typing from 1000 Genomes Whole Genome and Whole Exome Illumina Data

Endre Major Krisztina Rigó Tim Hague Attila Bérces Szilveszter Juhos 《PloS one》2013,8(11)

Specific HLA genotypes are known to be linked to either resistance or susceptibility to certain diseases or sensitivity to certain drugs. In addition, high accuracy HLA typing is crucial for organ and bone marrow transplantation. The most widespread high resolution HLA typing method used to date is Sanger sequencing based typing (SBT), and next generation sequencing (NGS) based HLA typing is just starting to be adopted as a higher throughput, lower cost alternative. By HLA typing the HapMap subset of the public 1000 Genomes paired Illumina data, we demonstrate that HLA-A, B and C typing is possible from exome sequencing samples with higher than 90% accuracy. The older 1000 Genomes whole genome sequencing read sets are less reliable and generally unsuitable for the purpose of HLA typing. We also propose using coverage % (the extent of exons covered) as a quality check (QC) measure to increase reliability.

3.

Performance of Genotype Imputation for Low Frequency and Rare Variants from the 1000 Genomes

Hou-Feng Zheng Jing-Jing Rong Ming Liu Fang Han Xing-Wei Zhang J. Brent Richards Li Wang 《PloS one》2015,10(1)

Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most imputations have been run using HapMap samples as the reference, and imputation of low frequency and rare variants (minor allele frequency [MAF] < 5%) has not been systematically assessed. With the emergence of next-generation sequencing, large reference panels (such as the 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, in order to estimate the performance of low frequency and rare variant imputation, we imputed 153 individuals, each of whom had data from 3 genotype arrays (317k, 610k and 1 million SNPs), to three different reference panels: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1) by using IMPUTE version 2. The differences between these three releases of the 1000 Genomes data are the sample size, ancestry diversity, number of variants and their frequency spectrum. We found that both reference panel and GWAS chip density affect the imputation of low frequency and rare variants. 1KGphase1 outperformed the other 2 panels, with a higher concordance rate, a higher proportion of well-imputed variants (info>0.4) and a higher mean info score in each MAF bin. Similarly, the 1M chip array outperformed 610K and 317K. However, for very rare variants (MAF≤0.3%), only 0–1% of the variants were well imputed. We conclude that the imputation of low frequency and rare variants improves with larger reference panels and higher density of genome-wide genotyping arrays. Yet, despite a large reference panel size and dense genotyping density, very rare variants remain difficult to impute.
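A minimal sketch of how the proportion of well-imputed variants per MAF bin could be tabulated. The info>0.4 threshold comes from the abstract; the bin edges, the function name, and the toy (MAF, info) pairs are hypothetical illustrations rather than the study's actual bins or data.

```python
def well_imputed_by_maf_bin(variants, bin_edges, info_threshold=0.4):
    """Proportion of variants with info above the threshold, per MAF bin.

    variants: iterable of (maf, info) pairs.
    bin_edges: ascending upper bounds; a variant falls in the first bin
               whose upper bound is >= its MAF.
    Returns {upper_bound: proportion, or None for an empty bin}.
    """
    counts = {edge: [0, 0] for edge in bin_edges}  # edge -> [well_imputed, total]
    for maf, info in variants:
        for edge in bin_edges:
            if maf <= edge:
                counts[edge][1] += 1
                if info > info_threshold:
                    counts[edge][0] += 1
                break  # each variant is counted in exactly one bin
    return {
        edge: (good / total if total else None)
        for edge, (good, total) in counts.items()
    }


# Hypothetical variants as (MAF, info score) pairs.
toy_variants = [(0.002, 0.1), (0.003, 0.5), (0.02, 0.6), (0.04, 0.3), (0.04, 0.9)]
toy_bins = [0.003, 0.01, 0.05]  # assumed bin upper bounds
proportions = well_imputed_by_maf_bin(toy_variants, toy_bins)
print(proportions)
```

Variants with a MAF above the last bound are ignored in this sketch; a real analysis would choose bins that cover the full frequency range of interest.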

4.

Technology-specific error signatures in the 1000 Genomes Project data

Nothnagel M Herrmann A Wolf A Schreiber S Platzer M Siebert R Krawczak M Hampe J 《Human genetics》2011,130(4):505-516

Next-generation sequencing (NGS) will likely facilitate a better understanding of the causes and consequences of human genetic variability. In this context, the validity of NGS-inferred single-nucleotide variants (SNVs) is of paramount importance. We therefore developed a statistical framework to assess the fidelity of three common NGS platforms. Using aligned DNA sequence data from two completely sequenced HapMap samples as included in the 1000 Genomes Project, we unraveled remarkably different error profiles for the three platforms. Compared to confirmed HapMap variants, newly identified SNVs included a substantial proportion of false positives (3–17%). Consensus calling by more than one platform yielded significantly lower error rates (1–4%). This implies that the use of multiple NGS platforms may be more cost-efficient than relying upon a single technology alone, particularly in physically localized sequencing experiments that rely upon small error rates. Our study thus highlights that NGS platforms differ in how well they suit particular practical applications, and that NGS-based studies require stringent data quality control for their results to be valid.

5.

The 1000 Genomes Project: data management and community access (total citations: 1; self-citations: 0; citations by others: 1)

Clarke L Zheng-Bradley X Smith R Kulesha E Xiao C Toneva I Vaughan B Preuss D Leinonen R Shumway M Sherry S Flicek P; Genomes Project Consortium 《Nature methods》2012,9(5):459-462

The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology. In addition to the primary scientific goals of creating both a deep catalog of human genetic variation and extensive methods to accurately discover and characterize variation using new sequencing technologies, the project makes all of its data publicly available. Members of the project data coordination center have developed and deployed several tools to enable widespread data access.

6.

Performance of genotype imputations using data from the 1000 Genomes Project

Sung YJ Wang L Rankinen T Bouchard C Rao DC 《Human heredity》2012,73(1):18-25

Genotype imputations based on 1000 Genomes (1KG) Project data have the advantage of imputing many more SNPs than imputations based on HapMap data. They also provide an opportunity to discover associations with relatively rare variants. Recent investigations are increasingly using 1KG data for genotype imputations, but only limited evaluations of the performance of this approach are available. In this paper, we empirically evaluated imputation performance using 1KG data by comparing imputation results to those using the HapMap Phase II data that have been widely used. We used three reference panels: the CEU panel consisting of 120 haplotypes from HapMap II and 1KG data (June 2010 release) and the EUR panel consisting of 566 haplotypes also from 1KG data (August 2010 release). We used Illumina 324,607 autosomal SNPs genotyped in 501 individuals of European ancestry. Our most important finding was that both 1KG reference panels provided much higher imputation yield than the HapMap II panel. There were more than twice as many successfully imputed SNPs as there were using the HapMap II panel (6.7 million vs. 2.5 million). Our second most important finding was that accuracy using both 1KG panels was high and almost identical to accuracy using the HapMap II panel. Furthermore, after removing SNPs with MACH Rsq <0.3, accuracy for both rare and low frequency SNPs was very high and almost identical to accuracy for common SNPs. We found that imputation using the 1KG-EUR panel had advantages in successfully imputing rare, low frequency and common variants. Our findings suggest that 1KG-based imputation can increase the opportunity to discover significant associations for SNPs across the allele frequency spectrum. Because the 1KG Project is still underway, we expect that later versions will provide even better imputation performance.

7.

Imputation of Exome Sequence Variants into Population- Based Samples and Blood-Cell-Trait-Associated Loci in African Americans: NHLBI GO Exome Sequencing Project

Paul L. Auer Jill M. Johnsen Andrew D. Johnson Benjamin A. Logsdon Leslie A. Lange Michael A. Nalls Guosheng Zhang Nora Franceschini Keolu Fox Ethan M. Lange Stephen S. Rich Christopher J. O’Donnell Rebecca D. Jackson Robert B. Wallace Zhao Chen Timothy A. Graubert James G. Wilson Hua Tang Guillaume Lettre Alex P. Reiner Santhi K. Ganesh Yun Li 《American journal of human genetics》2012,91(5):794-808

Researchers have successfully applied exome sequencing to discover causal variants in selected individuals with familial, highly penetrant disorders. We demonstrate the utility of exome sequencing followed by imputation for discovering low-frequency variants associated with complex quantitative traits. We performed exome sequencing in a reference panel of 761 African Americans and then imputed newly discovered variants into a larger sample of more than 13,000 African Americans for association testing with the blood cell traits hemoglobin, hematocrit, white blood count, and platelet count. First, we illustrate the feasibility of our approach by demonstrating genome-wide-significant associations for variants that are not covered by conventional genotyping arrays; for example, one such association is that between higher platelet count and an MPL c.117G>T (p.Lys39Asn) variant encoding a p.Lys39Asn amino acid substitution of the thrombopoietin receptor gene (p = 1.5 × 10⁻¹¹). Second, we identified an association between missense variants of LCT and higher white blood count (p = 4 × 10⁻¹³). Third, we identified low-frequency coding variants that might account for allelic heterogeneity at several known blood cell-associated loci: MPL c.754T>C (p.Tyr252His) was associated with higher platelet count; CD36 c.975T>G (p.Tyr325*) was associated with lower platelet count; and several missense variants at the α-globin gene locus were associated with lower hemoglobin. By identifying low-frequency missense variants associated with blood cell traits not previously reported by genome-wide association studies, we establish that exome sequencing followed by imputation is a powerful approach to dissecting complex, genetically heterogeneous traits in large population-based studies.

8.

Efficient Genome-Wide Detection and Cataloging of EMS-Induced Mutations Using Exome Capture and Next-Generation Sequencing

Isabelle M. Henry Ugrappa Nagalakshmi Meric C. Lieberman Kathie J. Ngo Ksenia V. Krasileva Hans Vasquez-Gross Alina Akhunova Eduard Akhunov Jorge Dubcovsky Thomas H. Tai Luca Comai 《The Plant cell》2014,26(4):1382-1397

Chemical mutagenesis efficiently generates phenotypic variation in otherwise homogeneous genetic backgrounds, enabling functional analysis of genes. Advances in mutation detection have brought the utility of induced mutant populations on par with those produced by insertional mutagenesis, but systematic cataloguing of mutations would further increase their utility. We examined the suitability of multiplexed global exome capture and sequencing coupled with custom-developed bioinformatics tools to identify mutations in well-characterized mutant populations of rice (Oryza sativa) and wheat (Triticum aestivum). In rice, we identified ∼18,000 induced mutations from 72 independent M2 individuals. Functional evaluation indicated the recovery of potentially deleterious mutations for >2600 genes. We further observed that specific sequence and cytosine methylation patterns surrounding the targeted guanine residues strongly affect their probability of being alkylated by ethyl methanesulfonate. Application of these methods to six independent M2 lines of tetraploid wheat demonstrated that our bioinformatics pipeline is applicable to polyploids. In conclusion, we provide a method for developing large-scale induced mutation resources with relatively small investments that is applicable to resource-poor organisms. Furthermore, our results demonstrate that large libraries of sequenced mutations can be readily generated, providing enhanced opportunities to study gene function and assess the effect of sequence and chromatin context on mutations.

9.

Read Length and Repeat Resolution: Exploring Prokaryote Genomes Using Next-Generation Sequencing Technologies

Matt J. Cahill Claudio U. Köser Nicholas E. Ross John A. C. Archer 《PloS one》2010,5(7)

Background

There are a growing number of next-generation sequencing technologies. At present, the most cost-effective options also produce the shortest reads. However, even for prokaryotes, there is uncertainty concerning the utility of these technologies for the de novo assembly of complete genomes. This reflects an expectation that short reads will be unable to resolve small, but presumably abundant, repeats.

Methodology/Principal Findings

Using a simple model of repeat assembly, we develop and test a technique that, for any read length, can estimate the occurrence of unresolvable repeats in a genome, and thus predict the number of gaps that would need to be closed to produce a complete sequence. We apply this technique to 818 prokaryote genome sequences. This provides a quantitative assessment of the relative performance of various lengths. Notably, unpaired reads of only 150nt can reconstruct approximately 50% of the analysed genomes with fewer than 96 repeat-induced gaps. Nonetheless, there is considerable variation amongst prokaryotes. Some genomes can be assembled to near contiguity using very short reads while others require much longer reads.

Conclusions

Given the diversity of prokaryote genomes, a sequencing strategy should be tailored to the organism under study. Our results will provide researchers with a practical resource to guide the selection of the appropriate read length.

10.

High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios

《Cell》2022,185(18):3426-3440.e19


11.

Comparative Analysis of rs12979860 SNP of the IFNL3 Gene in Children with Hepatitis C and Ethnic Matched Controls Using 1000 Genomes Project Data

Giuseppe Indolfi Giusi Mangone Elisa Bartolini Gabriella Nebbia Pier Luigi Calvo Maria Moriondo Pier-Angelo Tovo Maurizio de Martino Chiara Azzari Massimo Resti 《PloS one》2014,9(1)

The rs12979860 single nucleotide polymorphism located on chromosome 19q13.13 near the interferon L3 gene (formerly and commonly known as interleukin 28B gene) has been associated in adults with both spontaneous and treatment-induced clearance of hepatitis C virus. Although the exact mechanism of these associations remains unclear, it suggests that variation in genes involved in the immune response against the virus favours viral clearance. Limited and preliminary data are available on this issue in children. The aim of the present study was to evaluate, in a representative cohort of children with perinatal infection, the potential association between rs12979860 single nucleotide polymorphism and the outcome of hepatitis C virus infection. Allele and genotype frequencies were evaluated in 30 children who spontaneously cleared the virus and in 147 children with persistent infection and were compared with a population sample of ethnically matched controls with unknown hepatitis C status obtained using the 1000 Genomes Project data. The C allele and the C/C genotype showed greater frequencies in the clearance group (76.7% and 56.7%, respectively) when compared with both children with viral persistence (C allele 56.5%, p = 0.004; C/C genotype 32.7%, p = 0.02) and with the ethnically matched individuals (C allele 59.7%, p = 0.02; C/C genotype 34.7%, p = 0.03). Children with the C/C genotype were 2 times more likely to clear hepatitis C virus relative to children with the C/T and T/T genotypes combined (odds ratio: 2.7; 90% confidence intervals: 1.3–5.8). The present study provides evidence that the rs12979860 single nucleotide polymorphism influences the natural history of hepatitis C virus in children.
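The reported odds ratio can be approximately reconstructed from the group sizes and genotype percentages given above. The sketch below assumes the underlying counts are 17 of 30 C/C carriers in the clearance group (56.7%) and 48 of 147 in the persistence group (32.7%); these counts are back-calculated by rounding and are not stated in the abstract. The Wald interval is a generic textbook approximation, so it need not reproduce the published confidence limits, which may come from a different method.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.645):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: C/C genotype, cleared          b: C/C genotype, persistent
    c: other genotypes, cleared       d: other genotypes, persistent
    z: normal quantile (1.645 corresponds to a two-sided 90% interval)
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Assumed counts, back-calculated from the percentages in the abstract.
cc_cleared, cc_persistent = 17, 48
other_cleared, other_persistent = 30 - 17, 147 - 48

or_est, ci_low, ci_high = odds_ratio_wald(
    cc_cleared, cc_persistent, other_cleared, other_persistent
)
print(round(or_est, 2), round(ci_low, 2), round(ci_high, 2))
```

Under these assumed counts the point estimate rounds to the 2.7 quoted in the abstract.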

12.

Comprehensive Survey of SNPs in the Affymetrix Exon Array Using the 1000 Genomes Dataset

Eric R. Gamazon Wei Zhang M. Eileen Dolan Nancy J. Cox 《PloS one》2010,5(2)


13.

Analysis of genomic variation in non-coding elements using population-scale sequencing data from the 1000 Genomes Project

Mu XJ Lu ZJ Kong Y Lam HY Gerstein MB 《Nucleic acids research》2011,39(16):7058-7076


14.

Identification of Potential Antisense Transcripts in Rice Using Conventional Microarray

Qiang Gan Dejun Li Guozhen Liu Lihuang Zhu 《Molecular biotechnology》2012,51(1):37-43


15.

Amino Acid Changes in Disease-Associated Variants Differ Radically from Variants Observed in the 1000 Genomes Project Dataset

Tjaart A. P. de Beer Roman A. Laskowski Sarah L. Parks Botond Sipos Nick Goldman Janet M. Thornton 《PLoS computational biology》2013,9(12)

The 1000 Genomes Project data provides a natural background dataset for amino acid germline mutations in humans. Since the direction of mutation is known, the amino acid exchange matrix generated from the observed nucleotide variants is asymmetric and the mutabilities of the different amino acids are very different. These differences predominantly reflect preferences for nucleotide mutations in the DNA (especially the high mutation rate of the CpG dinucleotide, which makes arginine mutability very much higher than other amino acids) rather than selection imposed by protein structure constraints, although there is evidence for the latter as well. The variants occur predominantly on the surface of proteins (82%), with a slight preference for sites which are more exposed and less well conserved than random. Mutations to functional residues occur about half as often as expected by chance. The disease-associated amino acid variant distributions in OMIM are radically different from those expected on the basis of the 1000 Genomes dataset. The disease-associated variants preferentially occur in more conserved sites, compared to 1000 Genomes mutations. Many of the amino acid exchange profiles appear to exhibit an anti-correlation, with common exchanges in one dataset being rare in the other. Disease-associated variants exhibit more extreme differences in amino acid size and hydrophobicity. More modelling of the mutational processes at the nucleotide level is needed, but these observations should contribute to an improved prediction of the effects of specific variants in humans.

16.

Identification of Medically Actionable Secondary Findings in the 1000 Genomes

Emily Olfson Catherine E. Cottrell Nicholas O. Davidson Christina A. Gurnett Jonathan W. Heusel Nathan O. Stitziel Li-Shiun Chen Sarah Hartz Rakesh Nagarajan Nancy L. Saccone Laura J. Bierut 《PloS one》2015,10(9)

The American College of Medical Genetics and Genomics (ACMG) recommends that clinical sequencing laboratories return secondary findings in 56 genes associated with medically actionable conditions. Our goal was to apply a systematic, stringent approach consistent with clinical standards to estimate the prevalence of pathogenic variants associated with such conditions using a diverse sequencing reference sample. Candidate variants in the 56 ACMG genes were selected from Phase 1 of the 1000 Genomes dataset, which contains sequencing information on 1,092 unrelated individuals from across the world. These variants were filtered using the Human Gene Mutation Database (HGMD) Professional version and defined parameters, appraised through literature review, and examined by a clinical laboratory specialist and expert physician. Over 70,000 genetic variants were extracted from the 56 genes, and filtering identified 237 variants annotated as disease causing by HGMD Professional. Literature review and expert evaluation determined that 7 of these variants were pathogenic or likely pathogenic. Furthermore, 5 additional truncating variants not listed as disease causing in HGMD Professional were identified as likely pathogenic. These 12 secondary findings are associated with diseases that could inform medical follow-up, including cancer predisposition syndromes, cardiac conditions, and familial hypercholesterolemia. The majority of the identified medically actionable findings were in individuals from the European (5/379) and Americas (4/181) ancestry groups, with fewer findings in Asian (2/286) and African (1/246) ancestry groups. Our results suggest that medically relevant secondary findings can be identified in approximately 1% (12/1092) of individuals in a diverse reference sample. 
As clinical sequencing laboratories continue to implement the ACMG recommendations, our results highlight that at least a small number of potentially important secondary findings can be selected for return. Our results also confirm that understudied populations will not reap proportionate benefits of genomic medicine, highlighting the need for continued research efforts on genetic diseases in these populations.
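The per-group and overall rates quoted in this abstract follow from simple arithmetic on the reported counts. The short sketch below uses only the numbers given in the abstract; the variable names are illustrative.

```python
# (findings, individuals) per ancestry group, as reported in the abstract.
findings = {
    "European": (5, 379),
    "Americas": (4, 181),
    "Asian": (2, 286),
    "African": (1, 246),
}

total_found = sum(found for found, _ in findings.values())
total_people = sum(people for _, people in findings.values())

# Rate of medically actionable findings per group and overall.
group_rates = {group: found / people for group, (found, people) in findings.items()}
overall_rate = total_found / total_people

for group, rate in group_rates.items():
    print(f"{group}: {rate:.2%}")
print(f"Overall: {total_found}/{total_people} = {overall_rate:.2%}")
```

The group counts add up to the 12 findings in 1,092 individuals stated in the abstract, and the per-group rates make the disparity between ancestry groups explicit.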

17.

Relationship between Deleterious Variation, Genomic Autozygosity, and Disease Risk: Insights from The 1000 Genomes Project

Trevor J. Pemberton Zachary A. Szpiech 《American journal of human genetics》2018,102(4):658-675


18.

Evaluation of microsatellite variation in the 1000 Genomes Project pilot studies is indicative of the quality and utility of the raw data and alignments

McIver LJ Fondon JW Skinner MA Garner HR 《Genomics》2011,97(4):193-199

We performed an analysis of global microsatellite variation on the two kindreds sequenced at high depth (~20×-60×) in the 1000 Genomes Project pilot studies because alterations in these highly mutable repetitive sequences have been linked with many phenotypes and disease risks. The standard alignment technique performs poorly in microsatellite regions as a consequence of low effective coverage (~1×-5×) resulting in 79% of the informative loci exhibiting non-Mendelian inheritance patterns. We used a more stringent approach in computing robust allelotypes resulting in 94.4% of the 1095 informative repeats conforming to traditional inheritance. The high-confidence allelotypes were analyzed to obtain an estimate of the minimum polymorphism rate as a function of motif length, motif sequence, and distribution within the genome.

19.

Editor's choice: Next-Generation Annotation of Prokaryotic Genomes with EuGene-P: Application to Sinorhizobium meliloti 2011

Erika Sallet Brice Roux Laurent Sauviac Marie-Françoise Jardinaud Sébastien Carrère Thomas Faraut Fernanda de Carvalho-Niebel Jérôme Gouzy Pascal Gamas Delphine Capela Claude Bruand Thomas Schiex 《DNA research》2013,20(4):339-354


20.

Local Exome Sequences Facilitate Imputation of Less Common Variants and Increase Power of Genome Wide Association Studies

Peter K. Joshi James Prendergast Ross M. Fraser Jennifer E. Huffman Veronique Vitart Caroline Hayward Ruth McQuillan Dominik Glodzik Ozren Polašek Nicholas D. Hastie Igor Rudan Harry Campbell Alan F. Wright Chris S. Haley James F. Wilson Pau Navarro 《PloS one》2013,8(7)

The analysis of less common variants in genome-wide association studies promises to elucidate complex trait genetics but is hampered by low power to reliably detect association. We show that addition of population-specific exome sequence data to global reference data allows more accurate imputation, particularly of less common SNPs (minor allele frequency 1–10%) in two very different European populations. The imputation improvement corresponds to an increase in effective sample size of 28–38%, for SNPs with a minor allele frequency in the range 1–3%.