Similar articles
1.

Background

Genotyping with the medium-density Bovine SNP50 BeadChip® (50K) is now standard in cattle. The high-density BovineHD BeadChip®, which contains 777 609 single nucleotide polymorphisms (SNPs), was developed in 2010. Increasing marker density increases the level of linkage disequilibrium between quantitative trait loci (QTL) and SNPs and the accuracy of QTL localization and genomic selection. However, re-genotyping all animals with the high-density chip is not economically feasible. An alternative strategy is to genotype part of the animals with the high-density chip and to impute high-density genotypes for animals already genotyped with the 50K chip. Thus, it is necessary to investigate the error rate when imputing from the 50K to the high-density chip.

Methods

A total of 5153 animals from 16 breeds (89 to 788 per breed) were genotyped with the high-density chip. Imputation error rates from the 50K to the high-density chip were computed for each breed with a validation set that included the youngest 20% of animals. Marker genotypes were masked for animals in the validation population in order to mimic 50K genotypes. Imputation was carried out using the Beagle 3.3.0 software.
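
The validation scheme described in these Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names are hypothetical, genotypes are assumed to be coded as allele counts (0, 1 or 2), and the allele error rate is assumed to be the number of wrongly imputed alleles divided by twice the number of masked genotypes.

```python
def split_youngest(animals, fraction=0.2):
    """Split (animal_id, birth_year) pairs into reference and validation sets.

    The validation set holds the youngest `fraction` of animals.
    """
    ordered = sorted(animals, key=lambda a: a[1])
    n_val = round(len(ordered) * fraction)
    cut = len(ordered) - n_val
    return ordered[:cut], ordered[cut:]


def mask_to_low_density(genotypes, low_density_idx):
    """Keep only markers present on the low-density chip; mask the rest as None."""
    keep = set(low_density_idx)
    return [g if i in keep else None for i, g in enumerate(genotypes)]


def allele_error_rate(true_gt, imputed_gt, masked_idx):
    """Share of wrongly imputed alleles among the masked markers.

    Assumes 0/1/2 coding, so |true - imputed| counts wrong alleles per genotype.
    """
    if not masked_idx:
        raise ValueError("no masked markers to evaluate")
    wrong = sum(abs(true_gt[i] - imputed_gt[i]) for i in masked_idx)
    return wrong / (2 * len(masked_idx))
```

In such a setup the imputation software would be run between the masking step and the error-rate step; only the bookkeeping around it is shown here.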

Results

Mean allele imputation error rates ranged from 0.31% to 2.41% depending on the breed. In total, 1980 SNPs had high imputation error rates in several breeds, which is probably due to genome assembly errors, and we recommend discarding these in future studies. Differences in imputation accuracy between breeds were related to the high-density-genotyped sample size and to the genetic relationship between reference and validation populations, whereas differences in effective population size and level of linkage disequilibrium showed limited effects. Accordingly, imputation accuracy was higher in breeds with large populations and in dairy breeds than in beef breeds. More than 99% of the alleles were correctly imputed if more than 300 animals were genotyped at high-density. No improvement was observed when multi-breed imputation was performed.

Conclusion

In all breeds, imputation accuracy was higher than 97%, which indicates that imputation to the high-density chip was accurate. Imputation accuracy depends mainly on the size of the reference population and the relationship between reference and target populations.

2.

Background

Imputation of genotypes from low-density to higher density chips is a cost-effective method to obtain high-density genotypes for many animals, based on genotypes of only a relatively small subset of animals (reference population) on the high-density chip. Several factors influence the accuracy of imputation and our objective was to investigate the effects of the size of the reference population used for imputation and of the imputation method used and its parameters. Imputation of genotypes was carried out from 50 000 (moderate-density) to 777 000 (high-density) SNPs (single nucleotide polymorphisms).

Methods

The effect of reference population size was studied in two datasets: one with 548 and one with 1289 Holstein animals, genotyped with the Illumina BovineHD chip (777 k SNPs). A third dataset included the 548 animals genotyped with the 777 k SNP chip and 2200 animals genotyped with the Illumina BovineSNP50 chip. In each dataset, 60 animals were chosen as validation animals, for which all high-density genotypes were masked, except for the Illumina BovineSNP50 markers. Imputation was studied in a subset of six chromosomes, using the imputation software programs Beagle and DAGPHASE.

Results

Imputation with DAGPHASE and Beagle resulted in 1.91% and 0.87% allelic imputation error rates in the dataset with 548 high-density genotypes, when scale and shift parameters were 2.0 and 0.1, and 1.0 and 0.0, respectively. When Beagle was used alone, the imputation error rate was 0.67%. If the information obtained by Beagle was subsequently used in DAGPHASE, imputation error rates were slightly higher (0.71%). When 2200 moderate-density genotypes were added and Beagle was used alone, imputation error rates were slightly lower (0.64%). The fewest imputation errors were obtained with Beagle in the reference set with 1289 high-density genotypes (0.41%).

Conclusions

For imputation of genotypes from the 50 k to the 777 k SNP chip, Beagle gave the lowest allelic imputation error rates. Imputation error rates decreased with increasing size of the reference population. For applications for which computing time is limiting, DAGPHASE using information from Beagle can be considered as an alternative, since it reduces computation time and increases imputation error rates only slightly.

3.

Background

Despite the dramatic reduction in the cost of high-density genotyping that has occurred over the last decade, it remains one of the limiting factors for obtaining the large datasets required for genomic studies of disease in the horse. In this study, we investigated the potential for low-density genotyping and subsequent imputation to address this problem.

Results

Using the haplotype phasing and imputation program BEAGLE, it is possible to impute genotypes from low- to high-density (50K) in the Thoroughbred horse with reasonable to high accuracy. Analysis of the sources of variation in imputation accuracy revealed dependence both on the minor allele frequency of the single nucleotide polymorphisms (SNPs) being imputed and on the underlying linkage disequilibrium structure. Whereas equidistant spacing of the SNPs on the low-density panel worked well, optimising SNP selection to increase their minor allele frequency was advantageous, even when the panel was subsequently used in a population of different geographical origin. Replacing base pair position with linkage disequilibrium map distance reduced the variation in imputation accuracy across SNPs. Whereas a 1K SNP panel was generally sufficient to ensure that more than 80% of genotypes were correctly imputed, other studies suggest that a 2K to 3K panel is more efficient to minimize the subsequent loss of accuracy in genomic prediction analyses. The relationship between accuracy and genotyping costs for the different low-density panels suggests that a 2K SNP panel would represent good value for money.

Conclusions

Low-density genotyping with a 2K SNP panel followed by imputation provides a compromise between cost and accuracy that could promote more widespread genotyping, and hence the use of genomic information in horses. In addition to offering a low cost alternative to high-density genotyping, imputation provides a means to combine datasets from different genotyping platforms, which is becoming necessary since researchers are starting to use the recently developed equine 70K SNP chip. However, more work is needed to evaluate the impact of between-breed differences on imputation accuracy.

4.

Background

The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina BovineSNP50 BeadChip and Illumina BovineHD BeadChip, to whole-genome sequence data is an attractive and less expensive approach to obtain whole-genome sequence genotypes for a large number of individuals than sequencing all individuals. Our objective was to investigate accuracy of imputation from lower density SNP panels to whole-genome sequence data in a typical dataset for cattle.

Methods

Whole-genome sequence data of chromosome 1 (1 737 471 SNPs) for 114 Holstein Friesian bulls were used. Beagle software was used for imputation from the BovineSNP50 (3132 SNPs) and BovineHD (40 492 SNPs) beadchips. Accuracy was calculated as the correlation between observed and imputed genotypes and assessed by five-fold cross-validation. Three scenarios S40, S60 and S80 with respectively 40%, 60%, and 80% of the individuals as reference individuals were investigated.
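
The accuracy measure named in these Methods, the correlation between observed and imputed genotypes per SNP, can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: genotypes are assumed to be coded 0/1/2, the function names are hypothetical, and SNPs with no variation in either vector are skipped because their correlation is undefined.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def mean_accuracy_per_snp(observed, imputed):
    """Average per-SNP correlation across validation animals.

    observed, imputed: one row per animal, one column per SNP.
    """
    n_snps = len(observed[0])
    values = []
    for j in range(n_snps):
        r = pearson([row[j] for row in observed], [row[j] for row in imputed])
        if r is not None:  # skip SNPs where the correlation is undefined
            values.append(r)
    return sum(values) / len(values) if values else None
```

In a five-fold cross-validation, this function would be applied to each fold's validation animals in turn and the fold results averaged.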

Results

Mean accuracies of imputation per SNP from the BovineHD panel to sequence data and from the BovineSNP50 panel to sequence data for scenarios S40 and S80 ranged from 0.77 to 0.83 and from 0.37 to 0.46, respectively. Stepwise imputation from the BovineSNP50 to BovineHD panel and then to sequence data for scenario S40 improved accuracy per SNP to 0.65 but it varied considerably between SNPs.

Conclusions

Accuracy of imputation to whole-genome sequence data was generally high for imputation from the BovineHD beadchip, but was low from the BovineSNP50 beadchip. Stepwise imputation from the BovineSNP50 to the BovineHD beadchip and then to sequence data substantially improved accuracy of imputation. SNPs with a low minor allele frequency were more difficult to impute correctly and the reliability of imputation varied more. Linkage disequilibrium between an imputed SNP and the SNP on the lower density panel, minor allele frequency of the imputed SNP and size of the reference group affected imputation reliability.

5.

Background

Recombination events tend to occur in hotspots and vary in number among individuals. The presence of recombination influences the accuracy of haplotype phasing and the imputation of missing genotypes. Genes that influence genome-wide recombination rate have been discovered in mammals, yeast, and plants. Our aim was to investigate the influence of recombination on haplotype phasing, locate recombination hotspots, scan the genome for Quantitative Trait Loci (QTL) and identify candidate genes that influence recombination, and quantify the impact of recombination on the accuracy of genotype imputation in beef cattle.

Methods

2775 Angus and 1485 Limousin parent-verified sire/offspring pairs were genotyped with the Illumina BovineSNP50 chip. Haplotype phasing was performed with DAGPHASE and BEAGLE using UMD3.1 assembly SNP (single nucleotide polymorphism) coordinates. Recombination events were detected by comparing the two reconstructed chromosomal haplotypes inherited by each offspring with those of their sires. Expected crossover probabilities were estimated assuming no interference and a binomial distribution for the frequency of crossovers. The BayesB approach for genome-wide association analysis implemented in the GenSel software was used to identify genomic regions harboring QTL with large effects on recombination. BEAGLE was used to impute Angus genotypes from a 7K subset to the 50K chip.

Results

DAGPHASE was superior to BEAGLE in haplotype phasing, which indicates that linkage information from relatives can improve its accuracy. The estimated genetic length of the 29 bovine autosomes was 3097 cM, with a genome-wide recombination distance averaging 1.23 cM/Mb. In Angus and Limousin, 427 and 348 windows containing recombination hotspots were detected, respectively, of which 166 were in common. Several significant SNPs and candidate genes that influence genome-wide recombination were localized in QTL regions detected in the two breeds. High recombination rates hinder the accuracy of haplotype phasing and genotype imputation.

Conclusions

Small population sizes, inadequate half-sib family sizes, recombination, gene conversion, genotyping errors, and map errors reduce the accuracy of haplotype phasing and genotype imputation. Candidate regions associated with recombination were identified in both breeds. Recombination analysis may improve the accuracy of haplotype phasing and genotype imputation from low- to high-density SNP panels.

6.

Background

Genotype imputation is commonly used as an initial step in genomic selection because accurately imputed genotypes can replace actual genotypes at a lower cost without reducing the accuracy of genomic selection. Performance of imputation has rarely been investigated in crossbred animals and, in particular, in pigs. The extent and pattern of linkage disequilibrium differ in crossbred versus purebred animals, which may impact the performance of imputation. In this study, we first compared different scenarios of imputation from 5 K to 8 K single nucleotide polymorphisms (SNPs) in genotyped Danish Landrace and Yorkshire and crossbred Landrace-Yorkshire datasets and, second, we compared imputation from 8 K to 60 K SNPs in genotyped purebred and simulated crossbred datasets. All imputations were done using Beagle version 3.3.2. Then, we investigated the reasons that could explain the differences observed.

Results

Genotype imputation performs as well in crossbred animals as in purebred animals when both parental breeds are included in the reference population. When the size of the reference population is very large, it is not necessary to use a reference population that combines the two breeds to impute the genotypes of purebred animals because a within-breed reference population can provide a very high level of imputation accuracy (correct rate ≥ 0.99, correlation ≥ 0.95). However, to ensure that similar imputation accuracies are obtained for crossbred animals, a reference population that combines both parental purebred animals is required. Imputation accuracies are higher when a larger proportion of haplotypes are shared between the reference population and the validation (imputed) populations.

Conclusions

The results from both real data and pedigree-based simulated data demonstrate that genotype imputation from low-density panels to medium-density panels is highly accurate in both purebred and crossbred pigs. In crossbred pigs, combining the parental purebred animals in the reference population is necessary to obtain high imputation accuracy.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0134-4) contains supplementary material, which is available to authorized users.

7.

Background

A cost-effective strategy to increase the density of available markers within a population is to sequence a small proportion of the population and impute whole-genome sequence data for the remaining population. Increased densities of typed markers are advantageous for genome-wide association studies (GWAS) and genomic predictions.

Methods

We obtained genotypes for 54 602 SNPs (single nucleotide polymorphisms) in 1077 Franches-Montagnes (FM) horses and Illumina paired-end whole-genome sequencing data for 30 FM horses and 14 Warmblood horses. After variant calling, the sequence-derived SNP genotypes (~13 million SNPs) were used for genotype imputation with the software programs Beagle, Impute2 and FImpute.

Results

The mean imputation accuracy of FM horses using Impute2 was 92.0%. Imputation accuracy using Beagle and FImpute was 74.3% and 77.2%, respectively. In addition, for Impute2 we determined the imputation accuracy of all individual horses in the validation population, which ranged from 85.7% to 99.8%. The subsequent inclusion of Warmblood sequence data further increased the correlation between true and imputed genotypes for most horses, especially for horses with a high level of admixture. The final imputation accuracy of the horses ranged from 91.2% to 99.5%.

Conclusions

Using Impute2, the imputation accuracy was higher than 91% for all horses in the validation population, which indicates that direct imputation of 50k SNP-chip data to sequence level genotypes is feasible in the FM population. The individual imputation accuracy depended mainly on the applied software and the level of admixture.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0063-7) contains supplementary material, which is available to authorized users.

8.

Key message

Imputing genotypes from the 90K SNP chip to exome sequence in wheat was moderately accurate. We investigated the factors that affect imputation and propose several strategies to improve accuracy.

Abstract

Imputing genetic marker genotypes from low to high density has been proposed as a cost-effective strategy to increase the power of downstream analyses (e.g. genome-wide association studies and genomic prediction) for a given budget. However, imputation is often imperfect and its accuracy depends on several factors. Here, we investigate the effects of reference population selection algorithms, marker density and imputation algorithms (Beagle4 and FImpute) on the accuracy of imputation from low SNP density (9K array) to the Infinium 90K single-nucleotide polymorphism (SNP) array for a collection of 837 hexaploid wheat Watkins landrace accessions. Based on these results, we then used the best performing reference selection and imputation algorithms to investigate imputation from 90K to exome sequence for a collection of 246 globally diverse wheat accessions. Accession-to-nearest-entry and genomic relationship-based methods were the best performing selection algorithms, and FImpute resulted in higher accuracy and was more efficient than Beagle4. The accuracy of imputing exome capture SNPs was comparable to imputing from 9K to 90K at approximately 0.71. This relatively low imputation accuracy is in part due to inconsistency between 90K and exome sequence formats. We also found the accuracy of imputation could be substantially improved to 0.82 when choosing an equivalent number of exome SNPs, instead of 90K SNPs on the existing array, as the lower density set. We present a number of recommendations to increase the accuracy of exome imputation.

9.

Background

The most common application of imputation is to infer genotypes of a high-density panel of markers on animals that are genotyped for a low-density panel. However, the increase in accuracy of genomic predictions resulting from an increase in the number of markers tends to reach a plateau beyond a certain density. Another application of imputation is to increase the size of the training set with un-genotyped animals. This strategy can be particularly successful when a set of closely related individuals are genotyped.

Methods

Imputation on completely un-genotyped dams was performed using known genotypes from the sire of each dam, one offspring and the offspring’s sire. Two methods were applied based on either allele or haplotype frequencies to infer genotypes at ambiguous loci. Results of these methods and of two available software packages were compared. Quality of imputation under different population structures was assessed. The impact of using imputed dams to enlarge training sets on the accuracy of genomic predictions was evaluated for different populations, heritabilities and sizes of training sets.

Results

Imputation accuracy ranged from 0.52 to 0.93 depending on the population structure and the method used. The method that used allele frequencies performed better than the method based on haplotype frequencies. Accuracy of imputation was higher for populations with higher levels of linkage disequilibrium and with larger proportions of markers with more extreme allele frequencies. Inclusion of imputed dams in the training set increased the accuracy of genomic predictions. Gains in accuracy ranged from close to zero to 37.14%, depending on the simulated scenario. Generally, the larger the accuracy already obtained with the genotyped training set, the lower the increase in accuracy achieved by adding imputed dams.

Conclusions

Whenever a reference population resembling the family configuration considered here is available, imputation can be used to achieve an extra increase in accuracy of genomic predictions by enlarging the training set with completely un-genotyped dams. This strategy was shown to be particularly useful for populations with lower levels of linkage disequilibrium, for genomic selection on traits with low heritability, and for species or breeds for which the size of the reference population is limited.

10.

Background

Currently, genome-wide evaluation of cattle populations is based on SNP-genotyping using ~ 54 000 SNP. Increasing the number of markers might improve genomic predictions and power of genome-wide association studies. Imputation of genotypes makes it possible to extrapolate genotypes from lower to higher density arrays based on a representative reference sample for which genotypes are obtained at higher density.

Methods

Genotypes using 639 214 SNP were available for 797 bulls of the Fleckvieh cattle breed. The data set was divided into a reference and a validation population. Genotypes for all SNP except those included in the BovineSNP50 Bead chip were masked and subsequently imputed for animals of the validation population. Imputation of genotypes was performed with Beagle, findhap.f90, MaCH and Minimac. The accuracy of the imputed genotypes was assessed for four different scenarios including 50, 100, 200 and 400 animals as reference population. The reference animals were selected to account for 78.03%, 89.21%, 97.47% and > 99% of the gene pool of the genotyped population, respectively.

Results

Imputation accuracy increased as the number of animals and relatives in the reference population increased. Population-based algorithms provided highly reliable imputation of genotypes, even for scenarios with 50 and 100 reference animals only. Using MaCH and Minimac, the correlation between true and imputed genotypes was > 0.975 with 100 reference animals only. Pre-phasing the genotypes of both the reference and validation populations not only provided highly accurate imputed genotypes but was also computationally efficient. Genome-wide analysis of imputation accuracy led to the identification of many misplaced SNP.

Conclusions

Genotyping key animals at high density and subsequent population-based genotype imputation yield high imputation accuracy. Pre-phasing the genotypes of the reference and validation populations is computationally efficient and results in high imputation accuracy, even when the reference population is small.

11.

Background

In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to those based on SNP genotypes. Also, at each generation of selection, recombination events between a SNP and a mutation can cause decay in reliability of genomic predictions based on markers rather than on the causal variants. Our objective was to investigate the use of imputed whole-genome sequence genotypes versus high-density SNP genotypes on (the persistency of) the reliability of genomic predictions using real cattle data.

Methods

Highly accurate phenotypes based on daughter performance and Illumina BovineHD Beadchip genotypes were available for 5503 Holstein Friesian bulls. The BovineHD genotypes (631,428 SNPs) of each bull were used to impute whole-genome sequence genotypes (12,590,056 SNPs) using the Beagle software. Imputation was done using a multi-breed reference panel of 429 sequenced individuals. Genomic estimated breeding values for three traits were predicted using a Bayesian stochastic search variable selection (BSSVS) model and a genome-enabled best linear unbiased prediction model (GBLUP). Reliabilities of predictions were based on 2087 validation bulls, while the other 3416 bulls were used for training.

Results

Prediction reliabilities ranged from 0.37 to 0.52. BSSVS performed better than GBLUP in all cases. Reliabilities of genomic predictions were slightly lower with imputed sequence data than with BovineHD chip data. Also, the reliabilities tended to be lower for both sequence data and BovineHD chip data when relationships between training animals were low. No increase in persistency of prediction reliability using imputed sequence data was observed.

Conclusions

Compared to BovineHD genotype data, using imputed sequence data for genomic prediction produced no advantage. To investigate the putative advantage of genomic prediction using (imputed) sequence data, a training set with a larger number of individuals that are distantly related to each other and genomic prediction models that incorporate biological information on the SNPs or that apply stricter SNP pre-selection should be considered.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0149-x) contains supplementary material, which is available to authorized users.

12.

Background

Efficient, robust, and accurate genotype imputation algorithms make large-scale application of genomic selection cost effective. An algorithm that imputes alleles or allele probabilities for all animals in the pedigree and for all genotyped single nucleotide polymorphisms (SNP) provides a framework to combine all pedigree, genomic, and phenotypic information into a single-stage genomic evaluation.

Methods

An algorithm was developed for imputation of genotypes in pedigreed populations that allows imputation for completely ungenotyped animals and for low-density genotyped animals, accommodates a wide variety of pedigree structures for genotyped animals, imputes unmapped SNP, and works for large datasets. The method involves simple phasing rules, long-range phasing and haplotype library imputation and segregation analysis.

Results

Imputation accuracy was high and computational cost was feasible for datasets with pedigrees of up to 25 000 animals. The resulting single-stage genomic evaluation increased the accuracy of estimated genomic breeding values compared to a scenario in which phenotypes on relatives that were not genotyped were ignored.

Conclusions

The developed imputation algorithm and software and the resulting single-stage genomic evaluation method provide powerful new ways to exploit imputation and to obtain more accurate genetic evaluations.

13.

Background

Nelore and Gir are the two most important indicine cattle breeds for production of beef and milk in Brazil. Historical records state that these breeds were introduced in Brazil from the Indian subcontinent, crossed to local taurine cattle in order to quickly increase the population size, and then backcrossed to the original breeds to recover indicine adaptive and productive traits. Previous investigations based on sparse DNA markers detected taurine admixture in these breeds. High-density genome-wide analyses can provide high-resolution information on the genetic composition of current Nelore and Gir populations, estimate more precisely the levels and nature of taurine introgression, and shed light on their history and the strategies that were used to expand these breeds.

Results

We used the high-density Illumina BovineHD BeadChip with more than 777 K single nucleotide polymorphisms (SNPs) that were reduced to 697 115 after quality control filtering to investigate the structure of Nelore and Gir populations and seven other worldwide populations for comparison. Multidimensional scaling and model-based ancestry estimation clearly separated the indicine, European taurine and African taurine ancestries. The average level of taurine introgression in the autosomal genome of Nelore and Gir breeds was less than 1% but was 9% for the Brahman breed. Analyses based on the mitochondrial SNPs present in the Illumina BovineHD BeadChip did not clearly differentiate taurine and indicine haplotype groupings.

Conclusions

The low level of taurine ancestry observed for both Nelore and Gir breeds confirms the historical records of crossbreeding and supports a strong directional selection against taurine haplotypes via backcrossing. Random sampling in production herds across the country and subsequent genotyping would be useful for a more complete view of the admixture levels in the commercial Nelore and Gir populations.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0109-5) contains supplementary material, which is available to authorized users.

14.

Background

The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low density single nucleotide polymorphisms (SNP) panels i.e. 3K or 7K to a high density panel with 50K SNP. No pedigree information was used.

Methods

Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content.

Results

In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90% and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, computing time required by the partial least squares regression method was on average around 10 times lower than computing time required by Beagle. Using the partial least squares regression method in the multi-breed analysis resulted in lower imputation accuracies than using single-breed data. The impact of the SNP-genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip.

Conclusions

Results of the present work suggested that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available.

15.

Background

Genotype imputation is commonly used in genetic association studies to test untyped variants using information on linkage disequilibrium (LD) with typed markers. Imputing genotypes requires a suitable reference population in which the LD pattern is known, most often one selected from HapMap. However, some populations, such as American Indians, are not represented in HapMap. In the present study, we assessed accuracy of imputation using HapMap reference populations in a genome-wide association study in Pima Indians.

Results

Data from six randomly selected chromosomes were used. Genotypes in the study population were masked (either 1% or 20% of SNPs available for a given chromosome). The masked genotypes were then imputed using the software Markov Chain Haplotyping Algorithm. Using four HapMap reference populations, average genotype error rates ranged from 7.86% for Mexican Americans to 22.30% for Yoruba. In contrast, use of the original Pima Indian data as a reference resulted in an average error rate of 1.73%.

Conclusions

Our results suggest that the use of HapMap reference populations results in substantial inaccuracy in the imputation of genotypes in American Indians. A possible solution would be to densely genotype or sequence a reference American Indian population.

16.

Background

Genotype imputation can help reduce genotyping costs particularly for implementation of genomic selection. In applications entailing large populations, recovering the genotypes of untyped loci using information from reference individuals that were genotyped with a higher density panel is computationally challenging. Popular imputation methods are based upon the Hidden Markov model and have computational constraints due to an intensive sampling process. A fast, deterministic approach, which makes use of both family and population information, is presented here. All individuals are related and, therefore, share haplotypes which may differ in length and frequency based on their relationships. The method starts with family imputation if pedigree information is available, and then exploits close relationships by searching for long haplotype matches in the reference group using overlapping sliding windows. The search continues as the window size is shrunk in each chromosome sweep in order to capture more distant relationships.
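
The idea of searching for haplotype matches in overlapping sliding windows that shrink with each sweep can be illustrated with a toy sketch. It is a loose illustration only, not the published algorithm: it assumes phased haplotypes coded 0/1, ignores pedigree and family imputation entirely, uses a half-window step as the overlap, and fills an untyped locus only when every matching reference haplotype agrees. The function name and all of these rules are assumptions.

```python
def impute_by_window_matching(target, reference, window_sizes):
    """Fill untyped loci (None) in `target` from matching reference haplotypes.

    target: list of alleles (0/1) with None at untyped loci.
    reference: fully typed haplotypes of the same length as target.
    window_sizes: window lengths in markers, longest first, so that long
        matches (close relatives) are used before short ones.
    """
    result = list(target)
    n = len(target)
    for size in window_sizes:
        step = max(1, size // 2)  # consecutive windows overlap by about half
        for start in range(0, n, step):
            end = min(n, start + size)
            typed = [i for i in range(start, end) if target[i] is not None]
            if not typed:
                continue
            # Reference haplotypes identical to the target at all typed loci
            matches = [h for h in reference
                       if all(h[i] == target[i] for i in typed)]
            if not matches:
                continue
            for i in range(start, end):
                if result[i] is None:
                    alleles = {h[i] for h in matches}
                    if len(alleles) == 1:  # fill only when matches agree
                        result[i] = alleles.pop()
    return result
```

Being deterministic, a scheme of this kind avoids the repeated sampling of Hidden Markov model approaches, which is the source of the speed advantage described above.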

Results

The proposed method gave imputation accuracy higher than or similar to that of Beagle and Impute2 in cattle data sets when all available information was used. When close relatives of target individuals were present in the reference group, the method resulted in higher accuracy than the other two methods even when the pedigree was not used. Rare variants were also imputed with higher accuracy. Finally, computing requirements were considerably lower than those of Beagle and Impute2. The presented method took 28 minutes to impute from 6 k to 50 k genotypes for 2,000 individuals with a reference size of 64,429 individuals.

Conclusions

The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation. In addition to its high imputation accuracy, the method is fast, owing to its deterministic nature and, therefore, it can easily be used in large data sets where the use of other methods is impractical.

17.

Background

The main goal of selection is to achieve genetic gain for a population by choosing the best breeders among a set of selection candidates. Since 2013, the use of a high density genotyping chip (600K Affymetrix® Axiom® HD genotyping array) for chicken has enabled the implementation of genomic selection in layer and broiler breeding, but genotyping costs remain high for routine use on a large number of selection candidates. A low density genotyping chip would therefore be of interest as a way to lower these costs. To this end, various simulation studies were conducted to find the best way to select a set of SNPs for low density genotyping of two laying hen lines.

Results

To design low density SNP chips, two methodologies, based on equidistance (EQ) or on linkage disequilibrium (LD), were compared. Imputation accuracy was assessed as the mean correlation between true and imputed genotypes. The results showed that correlations were more sensitive to false imputation of SNPs with low minor allele frequency (MAF) when the EQ methodology was used. Imputation accuracy increased when SNP density was increased, either through an increase in the number of selected windows on a chromosome or through a higher LD threshold. Moreover, the results varied depending on the type of chromosome (macro- or micro-chromosome). The LD methodology made it possible to optimize the number of SNPs by reducing the SNP density on macro-chromosomes and increasing it on micro-chromosomes. Imputation accuracy also increased when the size of the reference population was increased. Conversely, imputation accuracy decreased when the degree of kinship between reference and candidate populations was reduced. Finally, adding the selection candidates’ dams to the reference population, in addition to their sires, gave better imputation results.
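
The accuracy measure used above, the mean correlation between true and imputed genotypes, can be sketched as follows. The abstract does not say whether correlations were computed per SNP or per individual; this example assumes per-SNP Pearson correlations across individuals, averaged over SNPs, with genotypes coded 0/1/2. The function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation, or None when either vector has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def mean_snp_correlation(true_rows, imputed_rows):
    """Average per-SNP correlation; rows are individuals, columns are SNPs."""
    per_snp = []
    for true_col, imputed_col in zip(zip(*true_rows), zip(*imputed_rows)):
        r = pearson(true_col, imputed_col)
        if r is not None:  # SNPs with no variation in either set are skipped
            per_snp.append(r)
    if not per_snp:
        raise ValueError("no SNP with a defined correlation")
    return sum(per_snp) / len(per_snp)


# Toy data (hypothetical): 4 individuals x 3 SNPs, one imputation error.
true_rows = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 2]]
imputed_rows = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 1, 2]]

print(round(mean_snp_correlation(true_rows, imputed_rows), 3))
```

At a low-MAF SNP most genotypes are identical, so a single wrong call removes a large share of the variance the correlation is built on. This is one reason a correlation-based measure is sensitive to errors at such SNPs.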

Conclusions

Whatever the SNP chip, methodology, and scenario studied, highly accurate imputations were obtained, with mean correlations higher than 0.83. The key to good imputation results is to take the chicken lines’ LD into account when designing a low density SNP chip, and to include the candidates’ direct parents in the reference population.

18.

Background

The advent of low cost next generation sequencing has made it possible to sequence a large number of dairy and beef bulls which can be used as a reference for imputation of whole genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from a high density SNP marker panel to whole genome sequence level. Data contained 132 Holstein, 42 Jersey, 52 Nordic Red and 16 Brown Swiss bulls with whole genome sequence data; 16 Holstein, 27 Jersey and 29 Nordic Reds had previously been typed with the bovine high density SNP panel and were used for validation. We investigated the effect of enlarging the reference population by combining data across breeds on the accuracy of imputation, and the accuracy and speed of both IMPUTE2 and BEAGLE using either genotype probability reference data or pre-phased reference data. All analyses were done on Bovine autosome 29 using 387,436 bi-allelic variants and 13,612 SNP markers from the bovine HD panel.

Results

A combined breed reference population led to higher imputation accuracies than did a single breed reference. The highest accuracy of imputation for all three test breeds was achieved when using BEAGLE with un-phased reference data (mean genotype correlations of 0.90, 0.89 and 0.87 for Holstein, Jersey and Nordic Red, respectively), but IMPUTE2 with un-phased reference data gave similar accuracies for Holstein and Nordic Red. Pre-phasing the reference data led to only a minor decrease in imputation accuracy but gave a large improvement in computation time. Pre-phasing with BEAGLE was substantially faster than pre-phasing with SHAPEIT2 (2.5 hours vs. 52 hours for 242 individuals), and imputation with pre-phased data was faster in IMPUTE2 than in BEAGLE (5 minutes vs. 50 minutes per individual).

Conclusion

Combining reference populations across breeds is a good option to increase the size of the reference data and, in turn, the accuracy of imputation when only a few animals are available. Pre-phasing the reference data only slightly decreases the accuracy but gives substantial improvements in speed. Using BEAGLE for pre-phasing and IMPUTE2 for imputation is a fast and accurate strategy.

19.

Background

Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information.

Methods

A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method.

Results

About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees than when fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes than when fitting markers.

Conclusions

Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.

20.

Background

Genotyping accounts for a substantial part of the cost of genomic selection (GS). Using both dense and sparse SNP chips, together with imputation of missing genotypes, can reduce these costs. The aim of this study was to identify the set of candidates that are most important for dense genotyping, when they are used to impute the genotypes of sparsely genotyped animals. In a real pig pedigree, the 2500 most recently born pigs of the last generation, i.e. the target animals, were used for sparse genotyping. Their missing genotypes were imputed using either Beagle or LDMIP from T densely genotyped candidates chosen from the whole pedigree. A new optimization method was derived to identify the best animals for dense genotyping, which minimized the conditional genetic variance of the target animals, using either the pedigree-based relationship matrix (MCA), or a genotypic relationship matrix based on sparse marker genotypes (MCG). These and five other methods for selecting the T animals were compared using T = 100 or 200 animals. SNP genotypes were obtained assuming Ne = 100 or 200, and MAF thresholds were set to D = 0.01, 0.05 or 0.10. The performance of the methods was compared using the following criteria: call rate of true genotypes, accuracy of genotype prediction, and accuracy of genomic evaluations using the imputed genotypes.

Results

For all criteria, MCA and MCG performed better than other selection methods, significantly so for all methods other than selection of sires with the largest numbers of offspring. Methods that choose animals that have the closest average relationship or contribution to the target population gave the lowest accuracy of imputation, in some cases worse than random selection, and should be avoided in practice.

Conclusion

Minimization of the conditional variance of the genotypes in target animals provided an effective optimization procedure for prioritizing animals for genotyping or sequencing.

Electronic supplementary material

The online version of this article (doi:10.1186/1297-9686-46-46) contains supplementary material, which is available to authorized users.
