
1.

Accuracy of genotype imputation in Nelore cattle

Roberto Carvalheiro Solomon A Boison Haroldo H R Neves Mehdi Sargolzaei Flavio S Schenkel Yuri T Utsunomiya Ana Maria Pérez O’Brien Johann Sölkner John C McEwan Curtis P Van Tassell Tad S Sonstegard José Fernando Garcia 《Genetics Selection Evolution》2014,46(1)

Background

Genotype imputation from low-density (LD) to high-density single nucleotide polymorphism (SNP) chips is an important step before applying genomic selection, since denser chips tend to provide more reliable genomic predictions. Imputation methods rely partially on linkage disequilibrium between markers to infer unobserved genotypes. Bos indicus cattle (e.g. Nelore breed) are characterized, in general, by lower levels of linkage disequilibrium between genetic markers at short distances, compared to taurine breeds. Thus, it is important to evaluate the accuracy of imputation to better define which imputation method and chip are most appropriate for genomic applications in indicine breeds.

Methods

Accuracy of genotype imputation in Nelore cattle was evaluated using different LD chips, imputation software and sets of animals. Twelve commercial and customized LD chips with densities ranging from 7 K to 75 K were tested. Customized LD chips were virtually designed taking into account minor allele frequency, linkage disequilibrium and distance between markers. Software programs FImpute and BEAGLE were applied to impute genotypes. From 995 bulls and 1247 cows that were genotyped with the Illumina® BovineHD chip (HD), 793 sires composed the reference set, and the remaining 202 younger sires and all the cows composed two separate validation sets for which genotypes were masked except for the SNPs of the LD chip that were to be tested.

Results

Imputation accuracy increased with the SNP density of the LD chip. However, the gain in accuracy with LD chips with more than 15 K SNPs was relatively small because accuracy was already high at this density. Commercial and customized LD chips with equivalent densities presented similar results. FImpute outperformed BEAGLE for all LD chips and validation sets. Regardless of the imputation software used, accuracy tended to increase as the relatedness between imputed and reference animals increased, especially for the 7 K chip.

Conclusions

If the Illumina® BovineHD is considered as the target chip for genomic applications in the Nelore breed, cost-effectiveness can be improved by genotyping part of the animals with a chip containing around 15 K useful SNPs and imputing their high-density missing genotypes with FImpute.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0069-1) contains supplementary material, which is available to authorized users.

2.

Evaluating coverage of exons by HapMap SNPs

Xiao Dong Tingyan Zhong Tao Xu Yunting Xia Biqing Li Chao Li Liyun Yuan Guohui Ding Yixue Li 《Genomics》2013,101(1):20-23

Genome-wide association (GWA) studies are currently one of the most powerful tools in identifying disease-associated genes or variants. In typical GWA studies, single-nucleotide polymorphisms (SNPs) are often used as genetic markers. Therefore, it is critical to estimate the percentage of genetic variations which can be covered by SNPs through linkage disequilibrium (LD). In this study, we use the concept of haplotype blocks to evaluate the coverage of five SNP sets, including the HapMap and four commercial arrays, for every exon in the human genome. We show that although some chips can reach coverage similar to that of the HapMap, only about 50% of exons are completely covered by haplotype blocks of HapMap SNPs. We suggest that further high-resolution genotyping methods are required to provide adequate genome-wide power for identifying variants.
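As a rough illustration of the "completely covered" criterion in the abstract above, the sketch below counts the exons that lie entirely inside one haplotype block. The containment rule, the function name `fraction_fully_covered` and the toy coordinates are assumptions made for illustration; the abstract does not give the authors' exact definition.

```python
def fraction_fully_covered(exons, blocks):
    """Fraction of exons lying entirely inside a single haplotype block.

    exons, blocks: lists of (start, end) positions on one chromosome,
    with inclusive ends (hypothetical input format).
    """
    if not exons:
        raise ValueError("no exons given")
    covered = sum(
        any(b_start <= e_start and e_end <= b_end for b_start, b_end in blocks)
        for e_start, e_end in exons
    )
    return covered / len(exons)


# Toy coordinates, invented for this example.
toy_exons = [(100, 200), (500, 650), (900, 950), (1200, 1300)]
toy_blocks = [(50, 300), (600, 1000)]
toy_coverage = fraction_fully_covered(toy_exons, toy_blocks)
print(toy_coverage)
```

In the toy data, the second exon starts before its block and the fourth exon has no block at all, so only two of the four exons count as completely covered.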

3.

Genotype-Imputation Accuracy across Worldwide Human Populations

Lucy Huang Yun Li Andrew B. Singleton John A. Hardy Gonçalo Abecasis Noah A. Rosenberg Paul Scheet 《American journal of human genetics》2009,84(2):235-250

A current approach to mapping complex-disease-susceptibility loci in genome-wide association (GWA) studies involves leveraging the information in a reference database of dense genotype data. By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and tested for disease association. This imputation strategy has been successful for GWA studies in populations well represented by existing reference panels. We used genotypes at 513,008 autosomal single-nucleotide polymorphism (SNP) loci in 443 unrelated individuals from 29 worldwide populations to evaluate the “portability” of the HapMap reference panels for imputation in studies of diverse populations. When a single HapMap panel was leveraged for imputation of randomly masked genotypes, European populations had the highest imputation accuracy, followed by populations from East Asia, Central and South Asia, the Americas, Oceania, the Middle East, and Africa. For each population, we identified “optimal” mixtures of reference panels that maximized imputation accuracy, and we found that in most populations, mixtures including individuals from at least two HapMap panels produced the highest imputation accuracy. From a separate survey of additional SNPs typed in the same samples, we evaluated imputation accuracy in the scenario in which all genotypes at a given SNP position were unobserved and were imputed on the basis of data from a commercial “SNP chip,” again finding that most populations benefited from the use of combinations of two or more HapMap reference panels. Our results can serve as a guide for selecting appropriate reference panels for imputation-based GWA analysis in diverse populations.

4.

Genome-wide selection of tag SNPs using multiple-marker correlation

Hao K 《Bioinformatics (Oxford, England)》2007,23(23):3178-3184

MOTIVATIONS: The tag SNP approach is a valuable tool in whole genome association studies, and a variety of algorithms have been proposed to identify the optimal tag SNP set. Currently, most tag SNP selection is based on two-marker (pairwise) linkage disequilibrium (LD). Recent literature has shown that multiple-marker LD also contains useful information that can further increase the genetic coverage of the tag SNP set. Thus, tag SNP selection methods that incorporate multiple-marker LD are expected to have advantages in terms of genetic coverage and statistical power. RESULTS: We propose a novel algorithm to select tag SNPs in an iterative procedure. In each iteration, the SNP that captures the most neighboring SNPs (through pairwise and multiple-marker LD) is selected as a tag SNP. We optimize the algorithm and computer program to make our approach feasible on today's typical workstations. Benchmarked using HapMap release 21, our algorithm outperforms the standard pairwise LD approach in several aspects. (i) It improves genetic coverage (e.g. by 7.2% for 200 K tag SNPs in HapMap CEU) compared to its conventional pairwise counterpart, when conditioning on a fixed tag SNP number. (ii) It saves genotyping costs substantially when conditioning on fixed genetic coverage (e.g. 34.1% saving in HapMap CEU at 90% coverage). (iii) Tag SNPs identified using multiple-marker LD have good portability across closely related ethnic groups and (iv) show higher statistical power in association tests than those selected using conventional methods. AVAILABILITY: A computer software suite, multiTag, has been developed based on this novel algorithm. The program is freely available by written request to the author at ke_hao@merck.com
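The iterative selection described in the abstract above can be illustrated with a minimal greedy sketch. It uses pairwise r² only; the multiple-marker LD step is not reproduced, and this is not the multiTag implementation. The function name `greedy_tag_snps`, the 0.8 threshold and the toy r² matrix are assumptions made for illustration.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedy tag SNP selection from a symmetric pairwise r^2 matrix.

    r2[i][j] is the r^2 between SNP i and SNP j (hypothetical input).
    Returns the chosen tag SNP indices in selection order.
    """
    n = len(r2)
    uncovered = set(range(n))
    tags = []
    while uncovered:
        best, best_captured = None, set()
        for i in range(n):
            if i in tags:
                continue
            # Still-uncovered SNPs that candidate i would capture (itself included).
            captured = {j for j in uncovered if j == i or r2[i][j] >= threshold}
            if len(captured) > len(best_captured):
                best, best_captured = i, captured
        tags.append(best)
        uncovered -= best_captured
    return tags


# Toy matrix, invented for this example: SNPs 0-2 form one LD cluster, SNPs 3-4 another.
toy_r2 = [
    [1.00, 0.90, 0.50, 0.10, 0.00],
    [0.90, 1.00, 0.85, 0.10, 0.00],
    [0.50, 0.85, 1.00, 0.20, 0.10],
    [0.10, 0.10, 0.20, 1.00, 0.95],
    [0.00, 0.00, 0.10, 0.95, 1.00],
]
toy_tags = greedy_tag_snps(toy_r2)
print(toy_tags)
```

Here SNP 1 is picked first because it captures SNPs 0, 1 and 2 at the chosen threshold, and SNP 3 is picked next to cover the remaining pair. Ties are broken by the lowest index, which is an arbitrary choice in this sketch.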

5.

Design of low density SNP chips for genotype imputation in layer chicken

Florian Herry Frédéric Hérault David Picard Druet Amandine Varenne Thierry Burlot Pascale Le Roy Sophie Allais 《BMC genetics》2018,19(1):108

Background

The main goal of selection is to achieve genetic gain for a population by choosing the best breeders among a set of selection candidates. Since 2013, the use of a high density genotyping chip (600K Affymetrix® Axiom® HD genotyping array) for chicken has enabled the implementation of genomic selection in layer and broiler breeding, but the genotyping costs remain high for routine use on a large number of selection candidates. Developing a low density genotyping chip that lowers these costs is therefore of interest. To this end, various simulation studies have been conducted to find the best way to select a set of SNPs for low density genotyping of two laying hen lines.

Results

To design low density SNP chips, two methodologies, based on equidistance (EQ) or on linkage disequilibrium (LD), were compared. Imputation accuracy was assessed as the mean correlation between true and imputed genotypes. The results showed that correlations were more sensitive to false imputation of SNPs with low Minor Allele Frequency (MAF) when the EQ methodology was used. An increase in imputation accuracy was obtained when SNP density was increased, either through an increase in the number of selected windows on a chromosome or through a higher LD threshold. Moreover, the results varied depending on the type of chromosome (macro- or micro-chromosome). The LD methodology made it possible to optimize the number of SNPs by reducing the SNP density on macro-chromosomes and increasing it on micro-chromosomes. Imputation accuracy also increased when the size of the reference population was increased. Conversely, imputation accuracy decreased when the degree of kinship between reference and candidate populations was reduced. Finally, adding the selection candidates’ dams to the reference population, in addition to their sires, gave better imputation results.
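The accuracy measure named in the paragraph above, the mean correlation between true and imputed genotypes, can be sketched as follows. This is an illustration under stated assumptions: genotypes are coded 0/1/2, the correlation is computed per SNP across animals and then averaged, and SNPs without variation are skipped. The abstract does not say whether the authors averaged per SNP or per animal, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def mean_snp_correlation(true_geno, imputed_geno):
    """Mean per-SNP correlation; rows are SNPs, columns are animals."""
    values = [pearson(t, i) for t, i in zip(true_geno, imputed_geno)]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


# Toy genotypes, invented for this example: the second SNP has one miscalled animal.
toy_true = [[0, 1, 2, 1], [0, 0, 1, 2]]
toy_imputed = [[0, 1, 2, 1], [0, 1, 1, 2]]
toy_accuracy = mean_snp_correlation(toy_true, toy_imputed)
print(round(toy_accuracy, 3))
```

A correlation-based measure weights errors by how much a SNP varies, so a single miscall at a SNP with few carriers of the minor allele tends to lower the correlation more than the same miscall at a common SNP.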

Conclusions

Whatever the SNP chip, the methodology, and the scenario studied, highly accurate imputations were obtained, with mean correlations higher than 0.83. The key to achieving good imputation results is to take into account the chicken lines’ LD when designing a low density SNP chip, and to include the candidates’ direct parents in the reference population.

6.

Assessing Accuracy of Genotype Imputation in American Indians

Alka Malhotra Sayuko Kobes Clifton Bogardus William C. Knowler Leslie J. Baier Robert L. Hanson 《PloS one》2014,9(7)

Background

Genotype imputation is commonly used in genetic association studies to test untyped variants using information on linkage disequilibrium (LD) with typed markers. Imputing genotypes requires a suitable reference population in which the LD pattern is known, most often one selected from HapMap. However, some populations, such as American Indians, are not represented in HapMap. In the present study, we assessed accuracy of imputation using HapMap reference populations in a genome-wide association study in Pima Indians.

Results

Data from six randomly selected chromosomes were used. Genotypes in the study population were masked (either 1% or 20% of SNPs available for a given chromosome). The masked genotypes were then imputed using the software Markov Chain Haplotyping Algorithm. Using four HapMap reference populations, average genotype error rates ranged from 7.86% for Mexican Americans to 22.30% for Yoruba. In contrast, use of the original Pima Indian data as a reference resulted in an average error rate of 1.73%.
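The masking-and-comparison procedure in the paragraph above can be sketched as below. The sketch only shows how a fraction of SNPs might be chosen for masking and how a genotype error rate could be computed afterwards; it does not perform imputation. The function names, the fixed seed and the toy genotypes are assumptions made for illustration.

```python
import random


def choose_masked_snps(n_snps, fraction, seed=0):
    """Pick a random subset of SNP indices to mask (at least one SNP)."""
    rng = random.Random(seed)
    k = max(1, round(n_snps * fraction))
    return sorted(rng.sample(range(n_snps), k))


def genotype_error_rate(true_geno, imputed_geno, masked_snps):
    """Share of masked genotypes that were imputed incorrectly.

    Rows are individuals, columns are SNPs (hypothetical layout).
    """
    total = errors = 0
    for true_row, imputed_row in zip(true_geno, imputed_geno):
        for s in masked_snps:
            total += 1
            errors += true_row[s] != imputed_row[s]  # True counts as 1
    return errors / total


# Toy data, invented for this example: SNPs 2 and 3 were masked and then imputed.
toy_true = [[0, 1, 2, 1, 0], [2, 1, 0, 1, 1]]
toy_imputed = [[0, 1, 1, 1, 0], [2, 1, 0, 1, 1]]
toy_error = genotype_error_rate(toy_true, toy_imputed, [2, 3])
print(toy_error)

masked_20_percent = choose_masked_snps(100, 0.20)
print(len(masked_20_percent))
```

Only the masked positions enter the error rate, so genotypes that were observed directly do not inflate the apparent accuracy.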

Conclusions

Our results suggest that the use of HapMap reference populations results in substantial inaccuracy in the imputation of genotypes in American Indians. A possible solution would be to densely genotype or sequence a reference American Indian population.

7.

Comparing strategies for selection of low-density SNPs for imputation-mediated genomic prediction in U.S. Holsteins

Jun He Jiaqi Xu Xiao-Lin Wu Stewart Bauck Jungjae Lee Gota Morota Stephen D. Kachman Matthew L. Spangler 《Genetica》2018,146(2):137-149

SNP chips are commonly used for genotyping animals in genomic selection, but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly-spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821–0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825–0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs in association with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes, whereas trait-associated SNPs improved genomic prediction accuracies. Thus, optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications for the design of LD SNP chips for imputation-enabled genomic prediction.

8.

Arrayed primer extension in the "array of arrays" format: a rational approach for microarray-based SNP genotyping

Klitø NG Tan Q Nyegaard M Brusgaard K Thomassen M Skouboe C Dahlgaard J Kruse TA 《Genetic testing》2007,11(2):160-166

This study provides a new version of the arrayed primer extension (APEX) protocol adapted to the 'array of arrays' platform using an instrumental setup for microarray processing not previously described. The primary aim of the study is to implement a system for rational cost-efficient genotyping where multiple single-nucleotide polymorphisms (SNPs) and individuals are genotyped on each microarray slide. Genotyping results are collected across 185 healthy Danish subjects and 76 SNPs on chromosome 3q13.31, because linkage to atopic disease phenotypes has been suggested in the Danish population. Linkage disequilibrium (LD) results from the experimental data are used in a novel comparison to baseline data defined by the international HapMap SNP database. Comparison of the LD results reveals a strong linear correlation irrespective of the LD measure considered: R²(D') = 0.73 and R²(r²) = 0.54. In conclusion, our results show that this setup is strong enough to support high-throughput genotyping, and these observations support that the HapMap genotype resource is important for defining SNP panels aimed at gene mapping in local subpopulations from Europe.
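For readers unfamiliar with the two LD measures compared in the abstract above, the sketch below computes |D'| and r² for one pair of biallelic SNPs using their standard textbook definitions. It assumes phased haplotype counts are available, which is a simplification (genotype data usually need phasing or an estimation step first); the function name and the toy counts are hypothetical and do not come from the study.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (|D'|, r^2) from counts of the four two-locus haplotypes."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at the first SNP
    p_B = (n_AB + n_aB) / n  # frequency of allele B at the second SNP
    d = p_AB - p_A * p_B     # raw disequilibrium coefficient D

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one SNP is monomorphic")
    r2 = d * d / denom

    # D is scaled by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d_prime, r2


# Toy haplotype counts, invented for this example.
toy_d_prime, toy_r2 = ld_measures(40, 10, 10, 40)
print(round(toy_d_prime, 2), round(toy_r2, 2))
```

The two measures can differ for the same pair of SNPs, which is consistent with the abstract reporting a separate agreement statistic for each of them.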

9.

Gene-centric characteristics of genome-wide association studies

Dong C Qian Z Jia P Wang Y Huang W Li Y 《PloS one》2007,2(12):e1262

Background

High-throughput genotyping chips have contributed greatly to genome-wide association (GWA) studies that identify novel disease susceptibility single nucleotide polymorphisms (SNPs). The high-density chips are designed using two different SNP selection approaches: the direct gene-centric approach and the indirect quasi-random SNP or linkage disequilibrium (LD)-based tagSNP approaches. Although all these approaches can provide high genome coverage and ascertain variants in genes, it is not clear to what extent they capture the common genic variants. It is also important to characterize and compare the differences between these approaches.

Methodology/Principal Findings

In our study, by using both the Phase II HapMap data and the disease variants extracted from OMIM, a gene-centric evaluation was first performed to evaluate the ability of the approaches to capture the disease variants in the Caucasian population. Then the distribution patterns of SNPs were also characterized in genic regions, evolutionarily conserved introns and nongenic regions, ontologies and pathways. The results show that, no matter which SNP selection approach is used, the current high-density SNP chips provide very high coverage in genic regions and can capture most known common disease variants under the HapMap framework. The results also show that the differences between the direct and the indirect approaches are relatively small. Both have similar SNP distribution patterns in these gene-centric characteristics.

Conclusions/Significance

This study suggests that the indirect approaches not only have the advantage of high coverage but also are useful for studies focusing on various functional SNPs either in genes or in the conserved regions that the direct approach supports. The study and the annotation of characteristics will be helpful for designing and analyzing GWA studies that aim to identify genetic risk factors involved in common diseases, especially variants in genes and conserved regions.

10.

SNP identification, linkage disequilibrium, and haplotype analysis for a 200-kb genomic region in a Korean population

Kim KJ Lee HJ Park MH Cha SH Kim KS Kim HT Kimm K Oh B Lee JY 《Genomics》2006,88(5):535-540

Understanding patterns of linkage disequilibrium (LD) across genomes may facilitate association mapping studies to localize genetic variants influencing complex diseases, a recognition that led to the International Haplotype Mapping Project (HapMap). Divergent patterns of haplotype frequency and LD across global populations require that the HapMap database be supplemented with haplotype and LD data from additional populations. We conducted a pilot study of the LD and haplotype structure of a genomic region in a Korean population. A total of 165 SNPs were identified in a 200-kb region of 22q13.2 by direct sequencing. Unphased genotype data were generated for 76 SNPs in 90 unrelated Korean individuals. LD, haplotype diversity, and recombination rates were assessed in this region and compared with the HapMap database. The pattern of LD and haplotype frequencies of Korean samples showed a high degree of similarity with Japanese data. There was a strong correlation between high LD and low recombination frequency in this region. We found considerable similarities in local LD patterns between three Asian populations (Han Chinese, Japanese, and Korean) and the CEPH population. Haplotype frequencies were, however, significantly different between them. Our results should further the understanding of distinctive Korean genomic features and assist in designing appropriate association studies.