首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
The objective of this study was to quantify the accuracy of imputing the genotype of parents using information on the genotype of their progeny and a family-based and population-based imputation algorithm. Two separate data sets were used, one containing both dairy and beef animals (n=3122) with high-density genotypes (735 151 single nucleotide polymorphisms (SNPs)) and the other containing just dairy animals (n=5489) with medium-density genotypes (51 602 SNPs). Imputation accuracy of three different genotype density panels were evaluated representing low (i.e. 6501 SNPs), medium and high density. The full genotypes of sires with genotyped half-sib progeny were masked and subsequently imputed. Genotyped half-sib progeny group sizes were altered from 4 up to 12 and the impact on imputation accuracy was quantified. Up to 157 and 258 sires were used to test the accuracy of imputation in the dairy plus beef data set and the dairy-only data set, respectively. The efficiency and accuracy of imputation was quantified as the proportion of genotypes that could not be imputed, and as both the genotype concordance rate and allele concordance rate. The median proportion of genotypes per animal that could not be imputed in the imputation process decreased as the number of genotyped half-sib progeny increased; values for the medium-density panel ranged from a median of 0.015 with a half-sib progeny group size of 4 to a median of 0.0014 to 0.0015 with a half-sib progeny group size of 8. The accuracy of imputation across different paternal half-sib progeny group sizes was similar in both data sets. Concordance rates increased considerably as the number of genotyped half-sib progeny increased from four (mean animal allele concordance rate of 0.94 in both data sets for the medium-density genotype panel) to five (mean animal allele concordance rate of 0.96 in both data sets for the medium-density genotype panel) after which it was relatively stable up to a half-sib progeny group size of eight. In the data set with dairy-only animals, sufficient sires with paternal half-sib progeny groups up to 12 were available and the within-animal mean genotype concordance rates continued to increase up to this group size. The accuracy of imputation was worst for the low-density genotypes, especially with smaller half-sib progeny group sizes but the difference in imputation accuracy between density panels diminished as progeny group size increased; the difference between high and medium-density genotype panels was relatively small across all half-sib progeny group sizes. Where biological material or genotypes are not available on individual animals, at least five progeny can be genotyped (on either a medium or high-density genotyping platform) and the parental alleles imputed with, on average, ⩾96% accuracy.  相似文献   

2.
In livestock, many studies have reported the results of imputation to 50k single nucleotide polymorphism (SNP) genotypes for animals that are genotyped with low-density SNP panels. The objective of this paper is to review different measures of correctness of imputation, and to evaluate their utility depending on the purpose of the imputed genotypes. Across studies, imputation accuracy, computed as the correlation between true and imputed genotypes, and imputation error rates, that counts the number of incorrectly imputed alleles, are commonly used measures of imputation correctness. Based on the nature of both measures and results reported in the literature, imputation accuracy appears to be a more useful measure of the correctness of imputation than imputation error rates, because imputation accuracy does not depend on minor allele frequency (MAF), whereas imputation error rate depends on MAF. Therefore imputation accuracy can be better compared across loci with different MAF. Imputation accuracy depends on the ability of identifying the correct haplotype of a SNP, but many other factors have been identified as well, including the number of genotyped immediate ancestors, the number of animals with genotypes at the high-density panel, the SNP density on the low- and high-density panel, the MAF of the imputed SNP and whether imputed SNP are located at the end of a chromosome or not. Some of these factors directly contribute to the linkage disequilibrium between imputed SNP and SNP on the low-density panel. When imputation accuracy is assessed as a predictor for the accuracy of subsequent genomic prediction, we recommend that: (1) individual-specific imputation accuracies should be used that are computed after centring and scaling both true and imputed genotypes; and (2) imputation of gene dosage is preferred over imputation of the most likely genotype, as this increases accuracy and reduces bias of the imputed genotypes and the subsequent genomic predictions.  相似文献   

3.

Background

Despite the dramatic reduction in the cost of high-density genotyping that has occurred over the last decade, it remains one of the limiting factors for obtaining the large datasets required for genomic studies of disease in the horse. In this study, we investigated the potential for low-density genotyping and subsequent imputation to address this problem.

Results

Using the haplotype phasing and imputation program, BEAGLE, it is possible to impute genotypes from low- to high-density (50K) in the Thoroughbred horse with reasonable to high accuracy. Analysis of the sources of variation in imputation accuracy revealed dependence both on the minor allele frequency of the single nucleotide polymorphisms (SNPs) being imputed and on the underlying linkage disequilibrium structure. Whereas equidistant spacing of the SNPs on the low-density panel worked well, optimising SNP selection to increase their minor allele frequency was advantageous, even when the panel was subsequently used in a population of different geographical origin. Replacing base pair position with linkage disequilibrium map distance reduced the variation in imputation accuracy across SNPs. Whereas a 1K SNP panel was generally sufficient to ensure that more than 80% of genotypes were correctly imputed, other studies suggest that a 2K to 3K panel is more efficient to minimize the subsequent loss of accuracy in genomic prediction analyses. The relationship between accuracy and genotyping costs for the different low-density panels, suggests that a 2K SNP panel would represent good value for money.

Conclusions

Low-density genotyping with a 2K SNP panel followed by imputation provides a compromise between cost and accuracy that could promote more widespread genotyping, and hence the use of genomic information in horses. In addition to offering a low cost alternative to high-density genotyping, imputation provides a means to combine datasets from different genotyping platforms, which is becoming necessary since researchers are starting to use the recently developed equine 70K SNP chip. However, more work is needed to evaluate the impact of between-breed differences on imputation accuracy.  相似文献   

4.
Genotyping sheep for genome‐wide SNPs at lower density and imputing to a higher density would enable cost‐effective implementation of genomic selection, provided imputation was accurate enough. Here, we describe the design of a low‐density (12k) SNP chip and evaluate the accuracy of imputation from the 12k SNP genotypes to 50k SNP genotypes in the major Australian sheep breeds. In addition, the impact of imperfect imputation on genomic predictions was evaluated by comparing the accuracy of genomic predictions for 15 novel meat traits including carcass and meat quality and omega fatty acid traits in sheep, from 12k SNP genotypes, imputed 50k SNP genotypes and real 50k SNP genotypes. The 12k chip design included 12 223 SNPs with a high minor allele frequency that were selected with intermarker spacing of 50–475 kb. SNPs for parentage and horned or polled tests also were represented. Chromosome ends were enriched with SNPs to reduce edge effects on imputation. The imputation performance of the 12k SNP chip was evaluated using 50k SNP genotypes of 4642 animals from six breeds in three different scenarios: (1) within breed, (2) single breed from multibreed reference and (3) multibreed from a single‐breed reference. The highest imputation accuracies were found with scenario 2, whereas scenario 3 was the worst, as expected. Using scenario 2, the average imputation accuracy in Border Leicester, Polled Dorset, Merino, White Suffolk and crosses was 0.95, 0.95, 0.92, 0.91 and 0.93 respectively. Imputation scenario 2 was used to impute 50k genotypes for 10 396 animals with novel meat trait phenotypes to compare genomic prediction accuracy using genomic best linear unbiased prediction (GBLUP) with real and imputed 50k genotypes. The weighted mean imputation accuracy achieved was 0.92. The average accuracy of genomic estimated breeding values (GEBVs) based on only 12k data was 0.08 across traits and breeds, but accuracies varied widely. The mean GBLUP accuracies with imputed 50k data more than doubled to 0.21. Accuracies of genomic prediction were very similar for imputed and real 50k genotypes. There was no apparent impact on accuracy of GEBVs as a result of using imputed rather than real 50k genotypes, provided imputation accuracy was >90%.  相似文献   

5.
The uptake of genomic selection (GS) by the swine industry is still limited by the costs of genotyping. A feasible alternative to overcome this challenge is to genotype animals using an affordable low-density (LD) single nucleotide polymorphism (SNP) chip panel followed by accurate imputation to a high-density panel. Therefore, the main objective of this study was to screen incremental densities of LD panels in order to systematically identify one that balances the tradeoffs among imputation accuracy, prediction accuracy of genomic estimated breeding values (GEBVs), and genotype density (directly associated with genotyping costs). Genotypes using the Illumina Porcine60K BeadChip were available for 1378 Duroc (DU), 2361 Landrace (LA) and 3192 Yorkshire (YO) pigs. In addition, pseudo-phenotypes (de-regressed estimated breeding values) for five economically important traits were provided for the analysis. The reference population for genotyping imputation consisted of 931 DU, 1631 LA and 2103 YO animals and the remainder individuals were included in the validation population of each breed. A LD panel of 3000 evenly spaced SNPs (LD3K) yielded high imputation accuracy rates: 93.78% (DU), 97.07% (LA) and 97.00% (YO) and high correlations (>0.97) between the predicted GEBVs using the actual 60 K SNP genotypes and the imputed 60 K SNP genotypes for all traits and breeds. The imputation accuracy was influenced by the reference population size as well as the amount of parental genotype information available in the reference population. However, parental genotype information became less important when the LD panel had at least 3000 SNPs. The correlation of the GEBVs directly increased with an increase in imputation accuracy. When genotype information for both parents was available, a panel of 300 SNPs (imputed to 60 K) yielded GEBV predictions highly correlated (⩾0.90) with genomic predictions obtained based on the true 60 K panel, for all traits and breeds. For a small reference population size with no parents on reference population, it is recommended the use of a panel at least as dense as the LD3K and, when there are two parents in the reference population, a panel as small as the LD300 might be a feasible option. These findings are of great importance for the development of LD panels for swine in order to reduce genotyping costs, increase the uptake of GS and, therefore, optimize the profitability of the swine industry.  相似文献   

6.
The aim of this study was to evaluate the impact of genotype imputation on the performance of the GBLUP and Bayesian methods for genomic prediction. A total of 10,309 Holstein bulls were genotyped on the BovineSNP50 BeadChip (50 k). Five low density single nucleotide polymorphism (SNP) panels, containing 6,177, 2,480, 1,536, 768 and 384 SNPs, were simulated from the 50 k panel. A fraction of 0%, 33% and 66% of the animals were randomly selected from the training sets to have low density genotypes which were then imputed into 50 k genotypes. A GBLUP and a Bayesian method were used to predict direct genomic values (DGV) for validation animals using imputed or their actual 50 k genotypes. Traits studied included milk yield, fat percentage, protein percentage and somatic cell score (SCS). Results showed that performance of both GBLUP and Bayesian methods was influenced by imputation errors. For traits affected by a few large QTL, the Bayesian method resulted in greater reductions of accuracy due to imputation errors than GBLUP. Including SNPs with largest effects in the low density panel substantially improved the accuracy of genomic prediction for the Bayesian method. Including genotypes imputed from the 6 k panel achieved almost the same accuracy of genomic prediction as that of using the 50 k panel even when 66% of the training population was genotyped on the 6 k panel. These results justified the application of the 6 k panel for genomic prediction. Imputations from lower density panels were more prone to errors and resulted in lower accuracy of genomic prediction. But for animals that have close relationship to the reference set, genotype imputation may still achieve a relatively high accuracy.  相似文献   

7.
Although genomic selection offers the prospect of improving the rate of genetic gain in meat, wool and dairy sheep breeding programs, the key constraint is likely to be the cost of genotyping. Potentially, this constraint can be overcome by genotyping selection candidates for a low density (low cost) panel of SNPs with sparse genotype coverage, imputing a much higher density of SNP genotypes using a densely genotyped reference population. These imputed genotypes would then be used with a prediction equation to produce genomic estimated breeding values. In the future, it may also be desirable to impute very dense marker genotypes or even whole genome re‐sequence data from moderate density SNP panels. Such a strategy could lead to an accurate prediction of genomic estimated breeding values across breeds, for example. We used genotypes from 48 640 (50K) SNPs genotyped in four sheep breeds to investigate both the accuracy of imputation of the 50K SNPs from low density SNP panels, as well as prospects for imputing very dense or whole genome re‐sequence data from the 50K SNPs (by leaving out a small number of the 50K SNPs at random). Accuracy of imputation was low if the sparse panel had less than 5000 (5K) markers. Across breeds, it was clear that the accuracy of imputing from sparse marker panels to 50K was higher if the genetic diversity within a breed was lower, such that relationships among animals in that breed were higher. The accuracy of imputation from sparse genotypes to 50K genotypes was higher when the imputation was performed within breed rather than when pooling all the data, despite the fact that the pooled reference set was much larger. For Border Leicesters, Poll Dorsets and White Suffolks, 5K sparse genotypes were sufficient to impute 50K with 80% accuracy. For Merinos, the accuracy of imputing 50K from 5K was lower at 71%, despite a large number of animals with full genotypes (2215) being used as a reference. For all breeds, the relationship of individuals to the reference explained up to 64% of the variation in accuracy of imputation, demonstrating that accuracy of imputation can be increased if sires and other ancestors of the individuals to be imputed are included in the reference population. The accuracy of imputation could also be increased if pedigree information was available and was used in tracking inheritance of large chromosome segments within families. In our study, we only considered methods of imputation based on population‐wide linkage disequilibrium (largely because the pedigree for some of the populations was incomplete). Finally, in the scenarios designed to mimic imputation of high density or whole genome re‐sequence data from the 50K panel, the accuracy of imputation was much higher (86–96%). This is promising, suggesting that in silico genome re‐sequencing is possible in sheep if a suitable pool of key ancestors is sequenced for each breed.  相似文献   

8.

Background

Although the X chromosome is the second largest bovine chromosome, markers on the X chromosome are not used for genomic prediction in some countries and populations. In this study, we presented a method for computing genomic relationships using X chromosome markers, investigated the accuracy of imputation from a low density (7K) to the 54K SNP (single nucleotide polymorphism) panel, and compared the accuracy of genomic prediction with and without using X chromosome markers.

Methods

The impact of considering X chromosome markers on prediction accuracy was assessed using data from Nordic Holstein bulls and different sets of SNPs: (a) the 54K SNPs for reference and test animals, (b) SNPs imputed from the 7K to the 54K SNP panel for test animals, (c) SNPs imputed from the 7K to the 54K panel for half of the reference animals, and (d) the 7K SNP panel for all animals. Beagle and Findhap were used for imputation. GBLUP (genomic best linear unbiased prediction) models with or without X chromosome markers and with or without a residual polygenic effect were used to predict genomic breeding values for 15 traits.

Results

Averaged over the two imputation datasets, correlation coefficients between imputed and true genotypes for autosomal markers, pseudo-autosomal markers, and X-specific markers were 0.971, 0.831 and 0.935 when using Findhap, and 0.983, 0.856 and 0.937 when using Beagle. Estimated reliabilities of genomic predictions based on the imputed datasets using Findhap or Beagle were very close to those using the real 54K data. Genomic prediction using all markers gave slightly higher reliabilities than predictions without X chromosome markers. Based on our data which included only bulls, using a G matrix that accounted for sex-linked relationships did not improve prediction, compared with a G matrix that did not account for sex-linked relationships. A model that included a polygenic effect did not recover the loss of prediction accuracy from exclusion of X chromosome markers.

Conclusions

The results from this study suggest that markers on the X chromosome contribute to accuracy of genomic predictions and should be used for routine genomic evaluation.  相似文献   

9.

Background

The most common application of imputation is to infer genotypes of a high-density panel of markers on animals that are genotyped for a low-density panel. However, the increase in accuracy of genomic predictions resulting from an increase in the number of markers tends to reach a plateau beyond a certain density. Another application of imputation is to increase the size of the training set with un-genotyped animals. This strategy can be particularly successful when a set of closely related individuals are genotyped.

Methods

Imputation on completely un-genotyped dams was performed using known genotypes from the sire of each dam, one offspring and the offspring’s sire. Two methods were applied based on either allele or haplotype frequencies to infer genotypes at ambiguous loci. Results of these methods and of two available software packages were compared. Quality of imputation under different population structures was assessed. The impact of using imputed dams to enlarge training sets on the accuracy of genomic predictions was evaluated for different populations, heritabilities and sizes of training sets.

Results

Imputation accuracy ranged from 0.52 to 0.93 depending on the population structure and the method used. The method that used allele frequencies performed better than the method based on haplotype frequencies. Accuracy of imputation was higher for populations with higher levels of linkage disequilibrium and with larger proportions of markers with more extreme allele frequencies. Inclusion of imputed dams in the training set increased the accuracy of genomic predictions. Gains in accuracy ranged from close to zero to 37.14%, depending on the simulated scenario. Generally, the larger the accuracy already obtained with the genotyped training set, the lower the increase in accuracy achieved by adding imputed dams.

Conclusions

Whenever a reference population resembling the family configuration considered here is available, imputation can be used to achieve an extra increase in accuracy of genomic predictions by enlarging the training set with completely un-genotyped dams. This strategy was shown to be particularly useful for populations with lower levels of linkage disequilibrium, for genomic selection on traits with low heritability, and for species or breeds for which the size of the reference population is limited.  相似文献   

10.
Application of imputation methods to accurately predict a dense array of SNP genotypes in the dog could provide an important supplement to current analyses of array-based genotyping data. Here, we developed a reference panel of 4,885,283 SNPs in 83 dogs across 15 breeds using whole genome sequencing. We used this panel to predict the genotypes of 268 dogs across three breeds with 84,193 SNP array-derived genotypes as inputs. We then (1) performed breed clustering of the actual and imputed data; (2) evaluated several reference panel breed combinations to determine an optimal reference panel composition; and (3) compared the accuracy of two commonly used software algorithms (Beagle and IMPUTE2). Breed clustering was well preserved in the imputation process across eigenvalues representing 75 % of the variation in the imputed data. Using Beagle with a target panel from a single breed, genotype concordance was highest using a multi-breed reference panel (92.4 %) compared to a breed-specific reference panel (87.0 %) or a reference panel containing no breeds overlapping with the target panel (74.9 %). This finding was confirmed using target panels derived from two other breeds. Additionally, using the multi-breed reference panel, genotype concordance was slightly higher with IMPUTE2 (94.1 %) compared to Beagle; Pearson correlation coefficients were slightly higher for both software packages (0.946 for Beagle, 0.961 for IMPUTE2). Our findings demonstrate that genotype imputation from SNP array-derived data to whole genome-level genotypes is both feasible and accurate in the dog with appropriate breed overlap between the target and reference panels.  相似文献   

11.
Accurate genomic analyses are predicated on access to a large quantity of accurately genotyped and phenotyped animals. Because the cost of genotyping is often less than the cost of phenotyping, interest is increasing in generating genotypes for phenotyped animals. In some instances this may imply the requirement to genotype older animals with greater phenotypic information content. Biological material for these older informative animals may, however, no longer exist. The objective of the present study was to quantify the ability to impute 11 129 single nucleotide polymorphism (SNP) genotypes of non-genotyped animals (in this instance sires) from the genotypes of their progeny with or without including the genotypes of the progenys’ dams (i.e. mates of the sire to be imputed). The impact on the accuracy of genotype imputation by including more progeny (and their dams’) genotypes in the imputation reference population was also quantified. When genotypes of the dams were not available, genotypes of 41 sires with at least 15 genotyped progeny were used for the imputation; when genotypes of the dams were available, genotypes of 21 sires with at least 10 genotyped progeny were used for the imputation. Imputation was undertaken exploiting family and population level information. The mean and variability in the proportion of genotypes per individual that could not be imputed reduced as the number of progeny genotypes used per individual increased. Little improvement in the proportion of genotypes that could not be imputed was achieved once genotypes of seven progeny and their dams were used or genotypes of 11 progeny without their respective dam’s genotypes were used. Mean imputation accuracy per individual (depicted by both concordance rates and correlation between true and imputed) increased with increasing progeny group size. Moreover, the range in mean imputation accuracy per individual reduced as more progeny genotypes were used in the imputation. If the genotype of the mate of the sire was also used, high accuracy of imputation (mean genotype concordance rate per individual of 0.988), with little additional benefit thereafter, was achieved with seven genotyped progeny. In the absence of genotypes on the dam, similar imputation accuracy could not be achieved even using genotypes on up to 15 progeny. Results therefore suggest, at least for the SNP density used in the present study, that it is possible to accurately impute the genotypes of a non-genotyped parent from the genotypes of its progeny and there is a benefit of also including the genotype of the sire’s mate (i.e. dam of the progeny).  相似文献   

12.

Background

Genotyping with the medium-density Bovine SNP50 BeadChip® (50K) is now standard in cattle. The high-density BovineHD BeadChip®, which contains 777 609 single nucleotide polymorphisms (SNPs), was developed in 2010. Increasing marker density increases the level of linkage disequilibrium between quantitative trait loci (QTL) and SNPs and the accuracy of QTL localization and genomic selection. However, re-genotyping all animals with the high-density chip is not economically feasible. An alternative strategy is to genotype part of the animals with the high-density chip and to impute high-density genotypes for animals already genotyped with the 50K chip. Thus, it is necessary to investigate the error rate when imputing from the 50K to the high-density chip.

Methods

Five thousand one hundred and fifty three animals from 16 breeds (89 to 788 per breed) were genotyped with the high-density chip. Imputation error rates from the 50K to the high-density chip were computed for each breed with a validation set that included the 20% youngest animals. Marker genotypes were masked for animals in the validation population in order to mimic 50K genotypes. Imputation was carried out using the Beagle 3.3.0 software.

Results

Mean allele imputation error rates ranged from 0.31% to 2.41% depending on the breed. In total, 1980 SNPs had high imputation error rates in several breeds, which is probably due to genome assembly errors, and we recommend to discard these in future studies. Differences in imputation accuracy between breeds were related to the high-density-genotyped sample size and to the genetic relationship between reference and validation populations, whereas differences in effective population size and level of linkage disequilibrium showed limited effects. Accordingly, imputation accuracy was higher in breeds with large populations and in dairy breeds than in beef breeds. More than 99% of the alleles were correctly imputed if more than 300 animals were genotyped at high-density. No improvement was observed when multi-breed imputation was performed.

Conclusion

In all breeds, imputation accuracy was higher than 97%, which indicates that imputation to the high-density chip was accurate. Imputation accuracy depends mainly on the size of the reference population and the relationship between reference and target populations.  相似文献   

13.
SNP chips are commonly used for genotyping animals in genomic selection but strategies for selecting low-density (LD) SNPs for imputation-mediated genomic selection have not been addressed adequately. The main purpose of the present study was to compare the performance of eight LD (6K) SNP panels, each selected by a different strategy exploiting a combination of three major factors: evenly-spaced SNPs, increased minor allele frequencies, and SNP-trait associations either for single traits independently or for all the three traits jointly. The imputation accuracies from 6K to 80K SNP genotypes were between 96.2 and 98.2%. Genomic prediction accuracies obtained using imputed 80K genotypes were between 0.817 and 0.821 for daughter pregnancy rate, between 0.838 and 0.844 for fat yield, and between 0.850 and 0.863 for milk yield. The two SNP panels optimized on the three major factors had the highest genomic prediction accuracy (0.821–0.863), and these accuracies were very close to those obtained using observed 80K genotypes (0.825–0.868). Further exploration of the underlying relationships showed that genomic prediction accuracies did not respond linearly to imputation accuracies, but were significantly affected by genotype (imputation) errors of SNPs in association with the traits to be predicted. SNPs optimal for map coverage and MAF were favorable for obtaining accurate imputation of genotypes whereas trait-associated SNPs improved genomic prediction accuracies. Thus, optimal LD SNP panels were the ones that combined both strengths. The present results have practical implications on the design of LD SNP chips for imputation-enabled genomic prediction.  相似文献   

14.

Background

Genotype imputation is commonly used as an initial step in genomic selection since the accuracy of genomic selection does not decline if accurately imputed genotypes are used instead of actual genotypes but for a lower cost. Performance of imputation has rarely been investigated in crossbred animals and, in particular, in pigs. The extent and pattern of linkage disequilibrium differ in crossbred versus purebred animals, which may impact the performance of imputation. In this study, first we compared different scenarios of imputation from 5 K to 8 K single nucleotide polymorphisms (SNPs) in genotyped Danish Landrace and Yorkshire and crossbred Landrace-Yorkshire datasets and, second, we compared imputation from 8 K to 60 K SNPs in genotyped purebred and simulated crossbred datasets. All imputations were done using software Beagle version 3.3.2. Then, we investigated the reasons that could explain the differences observed.

Results

Genotype imputation performs as well in crossbred animals as in purebred animals when both parental breeds are included in the reference population. When the size of the reference population is very large, it is not necessary to use a reference population that combines the two breeds to impute the genotypes of purebred animals because a within-breed reference population can provide a very high level of imputation accuracy (correct rate ≥ 0.99, correlation ≥ 0.95). However, to ensure that similar imputation accuracies are obtained for crossbred animals, a reference population that combines both parental purebred animals is required. Imputation accuracies are higher when a larger proportion of haplotypes are shared between the reference population and the validation (imputed) populations.

Conclusions

The results from both real data and pedigree-based simulated data demonstrate that genotype imputation from low-density panels to medium-density panels is highly accurate in both purebred and crossbred pigs. In crossbred pigs, combining the parental purebred animals in the reference population is necessary to obtain high imputation accuracy.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0134-4) contains supplementary material, which is available to authorized users.  相似文献   

15.
Availability of high-density single nucleotide polymorphism (SNP) genotyping platforms provided unprecedented opportunities to enhance breeding programmes in livestock, poultry and plant species, and to better understand the genetic basis of complex traits. Using this genomic information, genomic breeding values (GEBVs), which are more accurate than conventional breeding values. The superiority of genomic selection is possible only when high-density SNP panels are used to track genes and QTLs affecting the trait. Unfortunately, even with the continuous decrease in genotyping costs, only a small fraction of the population has been genotyped with these high-density panels. It is often the case that a larger portion of the population is genotyped with low-density and low-cost SNP panels and then imputed to a higher density. Accuracy of SNP genotype imputation tends to be high when minimum requirements are met. Nevertheless, a certain rate of genotype imputation errors is unavoidable. Thus, it is reasonable to assume that the accuracy of GEBVs will be affected by imputation errors; especially, their cumulative effects over time. To evaluate the impact of multi-generational selection on the accuracy of SNP genotypes imputation and the reliability of resulting GEBVs, a simulation was carried out under varying updating of the reference population, distance between the reference and testing sets, and the approach used for the estimation of GEBVs. Using fixed reference populations, imputation accuracy decayed by about 0.5% per generation. In fact, after 25 generations, the accuracy was only 7% lower than the first generation. When the reference population was updated by either 1% or 5% of the top animals in the previous generations, decay of imputation accuracy was substantially reduced. These results indicate that low-density panels are useful, especially when the generational interval between reference and testing population is small. As the generational interval increases, the imputation accuracies decay, although not at an alarming rate. In absence of updating of the reference population, accuracy of GEBVs decays substantially in one or two generations at the rate of 20% to 25% per generation. When the reference population is updated by 1% or 5% every generation, the decay in accuracy was 8% to 11% after seven generations using true and imputed genotypes. These results indicate that imputed genotypes provide a viable alternative, even after several generations, as long the reference and training populations are appropriately updated to reflect the genetic change in the population.  相似文献   

16.

Background

Genomic selection has become a standard tool in dairy cattle breeding. However, for other animal species, implementation of this technology is hindered by the high cost of genotyping. One way to reduce the routine costs is to genotype selection candidates with an SNP (single nucleotide polymorphism) panel of reduced density. This strategy is investigated in the present paper. Methods are proposed for the approximation of SNP positions, for selection of SNPs to be included in the low-density panel, for genotype imputation, and for the estimation of the accuracy of genomic breeding values. The imputation method was developed for a situation in which selection candidates are genotyped with an SNP panel of reduced density but have high-density genotyped sires. The dams of selection candidates are not genotyped. The methods were applied to a sire line pig population with 895 German Piétrain boars genotyped with the PorcineSNP60 BeadChip.

Results

Genotype imputation error rates were 0.133 for a 384 marker panel, 0.079 for a 768 marker panel, and 0.022 for a 3000 marker panel. Error rates for markers with approximated positions were slightly larger. Availability of high-density genotypes for close relatives of the selection candidates reduced the imputation error rate. The estimated decrease in the accuracy of genomic breeding values due to imputation errors was 3% for the 384 marker panel and negligible for larger panels, provided that at least one parent of the selection candidates was genotyped at high-density.Genomic breeding values predicted from deregressed breeding values with low reliabilities were more strongly correlated with the estimated BLUP breeding values than with the true breeding values. This was not the case when a shortened pedigree was used to predict BLUP breeding values, in which the parents of the individuals genotyped at high-density were considered unknown.

Conclusions

Genomic selection with imputation from very low- to high-density marker panels is a promising strategy for the implementation of genomic selection at acceptable costs. A panel size of 384 markers can be recommended for selection candidates of a pig breeding program if at least one parent is genotyped at high-density, but this appears to be the lower bound.  相似文献   

17.

Background

Genotype imputation from low-density (LD) to high-density single nucleotide polymorphism (SNP) chips is an important step before applying genomic selection, since denser chips tend to provide more reliable genomic predictions. Imputation methods rely partially on linkage disequilibrium between markers to infer unobserved genotypes. Bos indicus cattle (e.g. Nelore breed) are characterized, in general, by lower levels of linkage disequilibrium between genetic markers at short distances, compared to taurine breeds. Thus, it is important to evaluate the accuracy of imputation to better define which imputation method and chip are most appropriate for genomic applications in indicine breeds.

Methods

Accuracy of genotype imputation in Nelore cattle was evaluated using different LD chips, imputation software and sets of animals. Twelve commercial and customized LD chips with densities ranging from 7 K to 75 K were tested. Customized LD chips were virtually designed taking into account minor allele frequency, linkage disequilibrium and distance between markers. Software programs FImpute and BEAGLE were applied to impute genotypes. From 995 bulls and 1247 cows that were genotyped with the Illumina® BovineHD chip (HD), 793 sires composed the reference set, and the remaining 202 younger sires and all the cows composed two separate validation sets for which genotypes were masked except for the SNPs of the LD chip that were to be tested.

Results

Imputation accuracy increased with the SNP density of the LD chip. However, the gain in accuracy with LD chips with more than 15 K SNPs was relatively small because accuracy was already high at this density. Commercial and customized LD chips with equivalent densities presented similar results. FImpute outperformed BEAGLE for all LD chips and validation sets. Regardless of the imputation software used, accuracy tended to increase as the relatedness between imputed and reference animals increased, especially for the 7 K chip.

Conclusions

If the Illumina® BovineHD is considered as the target chip for genomic applications in the Nelore breed, cost-effectiveness can be improved by genotyping part of the animals with a chip containing around 15 K useful SNPs and imputing their high-density missing genotypes with FImpute.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0069-1) contains supplementary material, which is available to authorized users.  相似文献   

18.
《Genomics》2021,113(2):655-668
Genotyping-by-sequencing (GBS) provides the marker density required for genomic predictions (GP). However, GBS gives a high proportion of missing SNP data which, for species without a chromosome-level genome assembly, must be imputed without knowing the SNP physical positions. Here, we compared GP accuracy with seven map-independent and two map-dependent imputation approaches, and when using all SNPs against the subset of genetically mapped SNPs. We used two rubber tree (Hevea brasiliensis) datasets with three traits. The results showed that the best imputation approaches were LinkImputeR, Beagle and FImpute. Using the genetically mapped SNPs increased GP accuracy by 4.3%. Using LinkImputeR on all the markers allowed avoiding genetic mapping, with a slight decrease in GP accuracy. LinkImputeR gave the highest level of correctly imputed genotypes and its performances were further improved by its ability to define a subset of SNPs imputed optimally. These results will contribute to the efficient implementation of genomic selection with GBS. For Hevea, GBS is promising for rubber yield improvement, with GP accuracies reaching 0.52.  相似文献   

19.
Related individuals share potentially long chromosome segments that trace to a common ancestor. We describe a phasing algorithm (ChromoPhase) that utilizes this characteristic of finite populations to phase large sections of a chromosome. In addition to phasing, our method imputes missing genotypes in individuals genotyped at lower marker density when more densely genotyped relatives are available. ChromoPhase uses a pedigree to collect an individual's (the proband) surrogate parents and offspring and uses genotypic similarity to identify its genomic surrogates. The algorithm then cycles through the relatives and genomic surrogates one at a time to find shared chromosome segments. Once a segment has been identified, any missing information in the proband is filled in with information from the relative. We tested ChromoPhase in a simulated population consisting of 400 individuals at a marker density of 1500/M, which is approximately equivalent to a 50K bovine single nucleotide polymorphism chip. In simulated data, 99.9% loci were correctly phased and, when imputing from 100 to 1500 markers, more than 87% of missing genotypes were correctly imputed. Performance increased when the number of generations available in the pedigree increased, but was reduced when the sparse genotype contained fewer loci. However, in simulated data, ChromoPhase correctly imputed at least 12% more genotypes than fastPHASE, depending on sparse marker density. We also tested the algorithm in a real Holstein cattle data set to impute 50K genotypes in animals with a sparse 3K genotype. In these data 92% of genotypes were correctly imputed in animals with a genotyped sire. We evaluated the accuracy of genomic predictions with the dense, sparse, and imputed simulated data sets and show that the reduction in genomic evaluation accuracy is modest even with imperfectly imputed genotype data. Our results demonstrate that imputation of missing genotypes, and potentially full genome sequence, using long-range phasing is feasible.  相似文献   

20.
Genotype imputation, used in genome-wide association studies to expand coverage of single nucleotide polymorphisms (SNPs), has performed poorly in African Americans compared to less admixed populations. Overall, imputation has typically relied on HapMap reference haplotype panels from Africans (YRI), European Americans (CEU), and Asians (CHB/JPT). The 1000 Genomes project offers a wider range of reference populations, such as African Americans (ASW), but their imputation performance has had limited evaluation. Using 595 African Americans genotyped on Illumina’s HumanHap550v3 BeadChip, we compared imputation results from four software programs (IMPUTE2, BEAGLE, MaCH, and MaCH-Admix) and three reference panels consisting of different combinations of 1000 Genomes populations (February 2012 release): (1) 3 specifically selected populations (YRI, CEU, and ASW); (2) 8 populations of diverse African (AFR) or European (AFR) descent; and (3) all 14 available populations (ALL). Based on chromosome 22, we calculated three performance metrics: (1) concordance (percentage of masked genotyped SNPs with imputed and true genotype agreement); (2) imputation quality score (IQS; concordance adjusted for chance agreement, which is particularly informative for low minor allele frequency [MAF] SNPs); and (3) average r2hat (estimated correlation between the imputed and true genotypes, for all imputed SNPs). Across the reference panels, IMPUTE2 and MaCH had the highest concordance (91%–93%), but IMPUTE2 had the highest IQS (81%–83%) and average r2hat (0.68 using YRI+ASW+CEU, 0.62 using AFR+EUR, and 0.55 using ALL). Imputation quality for most programs was reduced by the addition of more distantly related reference populations, due entirely to the introduction of low frequency SNPs (MAF≤2%) that are monomorphic in the more closely related panels. While imputation was optimized by using IMPUTE2 with reference to the ALL panel (average r2hat = 0.86 for SNPs with MAF>2%), use of the ALL panel for African American studies requires careful interpretation of the population specificity and imputation quality of low frequency SNPs.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号