首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 22 毫秒
1.

Background

Knowing the phase of marker genotype data can be useful in genome-wide association studies, because it makes it possible to use analysis frameworks that account for identity by descent or parent of origin of alleles and it can lead to a large increase in data quantities via genotype or sequence imputation. Long-range phasing and haplotype library imputation constitute a fast and accurate method to impute phase for SNP data.

Methods

A long-range phasing and haplotype library imputation algorithm was developed. It combines information from surrogate parents and long haplotypes to resolve phase in a manner that is not dependent on the family structure of a dataset or on the presence of pedigree information.

Results

The algorithm performed well in both simulated and real livestock and human datasets in terms of both phasing accuracy and computation efficiency. The percentage of alleles that could be phased in both simulated and real datasets of varying size generally exceeded 98% while the percentage of alleles incorrectly phased in simulated data was generally less than 0.5%. The accuracy of phasing was affected by dataset size, with lower accuracy for dataset sizes less than 1000, but was not affected by effective population size, family data structure, presence or absence of pedigree information, and SNP density. The method was computationally fast. In comparison to a commonly used statistical method (fastPHASE), the current method made about 8% less phasing mistakes and ran about 26 times faster for a small dataset. For larger datasets, the differences in computational time are expected to be even greater. A computer program implementing these methods has been made available.

Conclusions

The algorithm and software developed in this study make feasible the routine phasing of high-density SNP chips in large datasets.  相似文献   

2.

Background

Identifying recombination events and the chromosomal segments that constitute a gamete is useful for a number of applications in genomic analyses. In livestock, genotypic data are commonly available for half-sib families. We propose a straightforward but computationally efficient method to use single nucleotide polymorphism marker genotypes on half-sibs to reconstruct the recombination and segregation events that occurred during meiosis in a sire to form the haplotypes observed in its offspring. These meiosis events determine a block structure in paternal haplotypes of the progeny and this can be used to phase the genotypes of individuals in single half-sib families, to impute haplotypes of the sire if they are not genotyped or to impute the paternal strand of the offspring’s sequence based on sequence data of the sire.

Methods

The hsphase algorithm exploits information from opposing homozygotes among half-sibs to identify recombination events, and the chromosomal regions from the paternal and maternal strands of the sire (blocks) that were inherited by its progeny. This information is then used to impute the sire’s genotype, which, in turn, is used to phase the half-sib family. Accuracy (defined as R2) and performance of this approach were evaluated by using simulated and real datasets. Phasing results for the half-sibs were benchmarked to other commonly used phasing programs – AlphaPhase, BEAGLE and PedPhase 3.

Results

Using a simulated dataset with 20 markers per cM, and for a half-sib family size of 4 and 40, the accuracy of block detection, was 0.58 and 0.96, respectively. The accuracy of inferring sire genotypes was 0.75 and 1.00 and the accuracy of phasing was around 0.97, respectively. hsphase was more robust to genotyping errors than PedPhase 3, AlphaPhase and BEAGLE. Computationally, hsphase was much faster than AlphaPhase and BEAGLE.

Conclusions

In half-sib families of size 8 and above, hsphase can accurately detect block structure of paternal haplotypes, impute genotypes of ungenotyped sires and reconstruct haplotypes in progeny. The method is much faster and more accurate than other widely used population-based phasing programs. A program implementing the method is freely available as an R package (hsphase).  相似文献   

3.

Background

Efficient, robust, and accurate genotype imputation algorithms make large-scale application of genomic selection cost effective. An algorithm that imputes alleles or allele probabilities for all animals in the pedigree and for all genotyped single nucleotide polymorphisms (SNP) provides a framework to combine all pedigree, genomic, and phenotypic information into a single-stage genomic evaluation.

Methods

An algorithm was developed for imputation of genotypes in pedigreed populations that allows imputation for completely ungenotyped animals and for low-density genotyped animals, accommodates a wide variety of pedigree structures for genotyped animals, imputes unmapped SNP, and works for large datasets. The method involves simple phasing rules, long-range phasing and haplotype library imputation and segregation analysis.

Results

Imputation accuracy was high and computational cost was feasible for datasets with pedigrees of up to 25 000 animals. The resulting single-stage genomic evaluation increased the accuracy of estimated genomic breeding values compared to a scenario in which phenotypes on relatives that were not genotyped were ignored.

Conclusions

The developed imputation algorithm and software and the resulting single-stage genomic evaluation method provide powerful new ways to exploit imputation and to obtain more accurate genetic evaluations.  相似文献   

4.

Background

The advent of low cost next generation sequencing has made it possible to sequence a large number of dairy and beef bulls which can be used as a reference for imputation of whole genome sequence data. The aim of this study was to investigate the accuracy and speed of imputation from a high density SNP marker panel to whole genome sequence level. Data contained 132 Holstein, 42 Jersey, 52 Nordic Red and 16 Brown Swiss bulls with whole genome sequence data; 16 Holstein, 27 Jersey and 29 Nordic Reds had previously been typed with the bovine high density SNP panel and were used for validation. We investigated the effect of enlarging the reference population by combining data across breeds on the accuracy of imputation, and the accuracy and speed of both IMPUTE2 and BEAGLE using either genotype probability reference data or pre-phased reference data. All analyses were done on Bovine autosome 29 using 387,436 bi-allelic variants and 13,612 SNP markers from the bovine HD panel.

Results

A combined breed reference population led to higher imputation accuracies than did a single breed reference. The highest accuracy of imputation for all three test breeds was achieved when using BEAGLE with un-phased reference data (mean genotype correlations of 0.90, 0.89 and 0.87 for Holstein, Jersey and Nordic Red respectively) but IMPUTE2 with un-phased reference data gave similar accuracies for Holsteins and Nordic Red. Pre-phasing the reference data only lead to a minor decrease in the imputation accuracy, but gave a large improvement in computation time. Pre-phasing with BEAGLE was substantially faster than pre-phasing with SHAPEIT2 (2.5 hours vs. 52 hours for 242 individuals), and imputation with pre-phased data was faster in IMPUTE2 than in BEAGLE (5 minutes vs. 50 minutes per individual).

Conclusion

Combining reference populations across breeds is a good option to increase the size of the reference data and in turn the accuracy of imputation when only few animals are available. Pre-phasing the reference data only slightly decreases the accuracy but gives substantial improvements in speed. Using BEAGLE for pre-phasing and IMPUTE2 for imputation is a fast and accurate strategy.  相似文献   

5.

Background

Imputation of genotypes for ungenotyped individuals could enable the use of valuable phenotypes created before the genomic era in analyses that require genotypes. The objective of this study was to investigate the accuracy of imputation of non-genotyped individuals using genotype information from relatives.

Methods

Genotypes were simulated for all individuals in the pedigree of a real (historical) dataset of phenotyped dairy cows and with part of the pedigree genotyped. The software AlphaImpute was used for imputation in its standard settings but also without phasing, i.e. using basic inheritance rules and segregation analysis only. Different scenarios were evaluated i.e.: (1) the real data scenario, (2) addition of genotypes of sires and maternal grandsires of the ungenotyped individuals, and (3) addition of one, two, or four genotyped offspring of the ungenotyped individuals to the reference population.

Results

The imputation accuracy using AlphaImpute in its standard settings was lower than without phasing. Including genotypes of sires and maternal grandsires in the reference population improved imputation accuracy, i.e. the correlation of the true genotypes with the imputed genotype dosages, corrected for mean gene content, across all animals increased from 0.47 (real situation) to 0.60. Including one, two and four genotyped offspring increased the accuracy of imputation across all animals from 0.57 (no offspring) to 0.73, 0.82, and 0.92, respectively.

Conclusions

At present, the use of basic inheritance rules and segregation analysis appears to be the best imputation method for ungenotyped individuals. Comparison of our empirical animal-specific imputation accuracies to predictions based on selection index theory suggested that not correcting for mean gene content considerably overestimates the true accuracy. Imputation of ungenotyped individuals can help to include valuable phenotypes for genome-wide association studies or for genomic prediction, especially when the ungenotyped individuals have genotyped offspring.  相似文献   

6.

Background

The major obstacles for the implementation of genomic selection in Australian beef cattle are the variety of breeds and in general, small numbers of genotyped and phenotyped individuals per breed. The Australian Beef Cooperative Research Center (Beef CRC) investigated these issues by deriving genomic prediction equations (PE) from a training set of animals that covers a range of breeds and crosses including Angus, Murray Grey, Shorthorn, Hereford, Brahman, Belmont Red, Santa Gertrudis and Tropical Composite. This paper presents accuracies of genomically estimated breeding values (GEBV) that were calculated from these PE in the commercial pure-breed beef cattle seed stock sector.

Methods

PE derived by the Beef CRC from multi-breed and pure-breed training populations were applied to genotyped Angus, Limousin and Brahman sires and young animals, but with no pure-breed Limousin in the training population. The accuracy of the resulting GEBV was assessed by their genetic correlation to their phenotypic target trait in a bi-variate REML approach that models GEBV as trait observations.

Results

Accuracies of most GEBV for Angus and Brahman were between 0.1 and 0.4, with accuracies for abattoir carcass traits generally greater than for live animal body composition traits and reproduction traits. Estimated accuracies greater than 0.5 were only observed for Brahman abattoir carcass traits and for Angus carcass rib fat. Averaged across traits within breeds, accuracies of GEBV were highest when PE from the pooled across-breed training population were used. However, for the Angus and Brahman breeds the difference in accuracy from using pure-breed PE was small. For the Limousin breed no reasonable results could be achieved for any trait.

Conclusion

Although accuracies were generally low compared to published accuracies estimated within breeds, they are in line with those derived in other multi-breed populations. Thus PE developed by the Beef CRC can contribute to the implementation of genomic selection in Australian beef cattle breeding.  相似文献   

7.

Background

Despite the dramatic reduction in the cost of high-density genotyping that has occurred over the last decade, it remains one of the limiting factors for obtaining the large datasets required for genomic studies of disease in the horse. In this study, we investigated the potential for low-density genotyping and subsequent imputation to address this problem.

Results

Using the haplotype phasing and imputation program, BEAGLE, it is possible to impute genotypes from low- to high-density (50K) in the Thoroughbred horse with reasonable to high accuracy. Analysis of the sources of variation in imputation accuracy revealed dependence both on the minor allele frequency of the single nucleotide polymorphisms (SNPs) being imputed and on the underlying linkage disequilibrium structure. Whereas equidistant spacing of the SNPs on the low-density panel worked well, optimising SNP selection to increase their minor allele frequency was advantageous, even when the panel was subsequently used in a population of different geographical origin. Replacing base pair position with linkage disequilibrium map distance reduced the variation in imputation accuracy across SNPs. Whereas a 1K SNP panel was generally sufficient to ensure that more than 80% of genotypes were correctly imputed, other studies suggest that a 2K to 3K panel is more efficient to minimize the subsequent loss of accuracy in genomic prediction analyses. The relationship between accuracy and genotyping costs for the different low-density panels, suggests that a 2K SNP panel would represent good value for money.

Conclusions

Low-density genotyping with a 2K SNP panel followed by imputation provides a compromise between cost and accuracy that could promote more widespread genotyping, and hence the use of genomic information in horses. In addition to offering a low cost alternative to high-density genotyping, imputation provides a means to combine datasets from different genotyping platforms, which is becoming necessary since researchers are starting to use the recently developed equine 70K SNP chip. However, more work is needed to evaluate the impact of between-breed differences on imputation accuracy.  相似文献   

8.

Background

Imputation of genotypes from low-density to higher density chips is a cost-effective method to obtain high-density genotypes for many animals, based on genotypes of only a relatively small subset of animals (reference population) on the high-density chip. Several factors influence the accuracy of imputation and our objective was to investigate the effects of the size of the reference population used for imputation and of the imputation method used and its parameters. Imputation of genotypes was carried out from 50 000 (moderate-density) to 777 000 (high-density) SNPs (single nucleotide polymorphisms).

Methods

The effect of reference population size was studied in two datasets: one with 548 and one with 1289 Holstein animals, genotyped with the Illumina BovineHD chip (777 k SNPs). A third dataset included the 548 animals genotyped with the 777 k SNP chip and 2200 animals genotyped with the Illumina BovineSNP50 chip. In each dataset, 60 animals were chosen as validation animals, for which all high-density genotypes were masked, except for the Illumina BovineSNP50 markers. Imputation was studied in a subset of six chromosomes, using the imputation software programs Beagle and DAGPHASE.

Results

Imputation with DAGPHASE and Beagle resulted in 1.91% and 0.87% allelic imputation error rates in the dataset with 548 high-density genotypes, when scale and shift parameters were 2.0 and 0.1, and 1.0 and 0.0, respectively. When Beagle was used alone, the imputation error rate was 0.67%. If the information obtained by Beagle was subsequently used in DAGPHASE, imputation error rates were slightly higher (0.71%). When 2200 moderate-density genotypes were added and Beagle was used alone, imputation error rates were slightly lower (0.64%). The least imputation errors were obtained with Beagle in the reference set with 1289 high-density genotypes (0.41%).

Conclusions

For imputation of genotypes from the 50 k to the 777 k SNP chip, Beagle gave the lowest allelic imputation error rates. Imputation error rates decreased with increasing size of the reference population. For applications for which computing time is limiting, DAGPHASE using information from Beagle can be considered as an alternative, since it reduces computation time and increases imputation error rates only slightly.  相似文献   

9.

Background

Genotype imputation is commonly used in genetic association studies to test untyped variants using information on linkage disequilibrium (LD) with typed markers. Imputing genotypes requires a suitable reference population in which the LD pattern is known, most often one selected from HapMap. However, some populations, such as American Indians, are not represented in HapMap. In the present study, we assessed accuracy of imputation using HapMap reference populations in a genome-wide association study in Pima Indians.

Results

Data from six randomly selected chromosomes were used. Genotypes in the study population were masked (either 1% or 20% of SNPs available for a given chromosome). The masked genotypes were then imputed using the software Markov Chain Haplotyping Algorithm. Using four HapMap reference populations, average genotype error rates ranged from 7.86% for Mexican Americans to 22.30% for Yoruba. In contrast, use of the original Pima Indian data as a reference resulted in an average error rate of 1.73%.

Conclusions

Our results suggest that the use of HapMap reference populations results in substantial inaccuracy in the imputation of genotypes in American Indians. A possible solution would be to densely genotype or sequence a reference American Indian population.  相似文献   

10.

Background

Genotype imputation from low-density (LD) to high-density single nucleotide polymorphism (SNP) chips is an important step before applying genomic selection, since denser chips tend to provide more reliable genomic predictions. Imputation methods rely partially on linkage disequilibrium between markers to infer unobserved genotypes. Bos indicus cattle (e.g. Nelore breed) are characterized, in general, by lower levels of linkage disequilibrium between genetic markers at short distances, compared to taurine breeds. Thus, it is important to evaluate the accuracy of imputation to better define which imputation method and chip are most appropriate for genomic applications in indicine breeds.

Methods

Accuracy of genotype imputation in Nelore cattle was evaluated using different LD chips, imputation software and sets of animals. Twelve commercial and customized LD chips with densities ranging from 7 K to 75 K were tested. Customized LD chips were virtually designed taking into account minor allele frequency, linkage disequilibrium and distance between markers. Software programs FImpute and BEAGLE were applied to impute genotypes. From 995 bulls and 1247 cows that were genotyped with the Illumina® BovineHD chip (HD), 793 sires composed the reference set, and the remaining 202 younger sires and all the cows composed two separate validation sets for which genotypes were masked except for the SNPs of the LD chip that were to be tested.

Results

Imputation accuracy increased with the SNP density of the LD chip. However, the gain in accuracy with LD chips with more than 15 K SNPs was relatively small because accuracy was already high at this density. Commercial and customized LD chips with equivalent densities presented similar results. FImpute outperformed BEAGLE for all LD chips and validation sets. Regardless of the imputation software used, accuracy tended to increase as the relatedness between imputed and reference animals increased, especially for the 7 K chip.

Conclusions

If the Illumina® BovineHD is considered as the target chip for genomic applications in the Nelore breed, cost-effectiveness can be improved by genotyping part of the animals with a chip containing around 15 K useful SNPs and imputing their high-density missing genotypes with FImpute.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0069-1) contains supplementary material, which is available to authorized users.  相似文献   

11.

Background

A cost-effective strategy to increase the density of available markers within a population is to sequence a small proportion of the population and impute whole-genome sequence data for the remaining population. Increased densities of typed markers are advantageous for genome-wide association studies (GWAS) and genomic predictions.

Methods

We obtained genotypes for 54 602 SNPs (single nucleotide polymorphisms) in 1077 Franches-Montagnes (FM) horses and Illumina paired-end whole-genome sequencing data for 30 FM horses and 14 Warmblood horses. After variant calling, the sequence-derived SNP genotypes (~13 million SNPs) were used for genotype imputation with the software programs Beagle, Impute2 and FImpute.

Results

The mean imputation accuracy of FM horses using Impute2 was 92.0%. Imputation accuracy using Beagle and FImpute was 74.3% and 77.2%, respectively. In addition, for Impute2 we determined the imputation accuracy of all individual horses in the validation population, which ranged from 85.7% to 99.8%. The subsequent inclusion of Warmblood sequence data further increased the correlation between true and imputed genotypes for most horses, especially for horses with a high level of admixture. The final imputation accuracy of the horses ranged from 91.2% to 99.5%.

Conclusions

Using Impute2, the imputation accuracy was higher than 91% for all horses in the validation population, which indicates that direct imputation of 50k SNP-chip data to sequence level genotypes is feasible in the FM population. The individual imputation accuracy depended mainly on the applied software and the level of admixture.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0063-7) contains supplementary material, which is available to authorized users.  相似文献   

12.
13.

Background

The use of whole-genome sequence data can lead to higher accuracy in genome-wide association studies and genomic predictions. However, to benefit from whole-genome sequence data, a large dataset of sequenced individuals is needed. Imputation from SNP panels, such as the Illumina BovineSNP50 BeadChip and Illumina BovineHD BeadChip, to whole-genome sequence data is an attractive and less expensive approach to obtain whole-genome sequence genotypes for a large number of individuals than sequencing all individuals. Our objective was to investigate accuracy of imputation from lower density SNP panels to whole-genome sequence data in a typical dataset for cattle.

Methods

Whole-genome sequence data of chromosome 1 (1737 471 SNPs) for 114 Holstein Friesian bulls were used. Beagle software was used for imputation from the BovineSNP50 (3132 SNPs) and BovineHD (40 492 SNPs) beadchips. Accuracy was calculated as the correlation between observed and imputed genotypes and assessed by five-fold cross-validation. Three scenarios S40, S60 and S80 with respectively 40%, 60%, and 80% of the individuals as reference individuals were investigated.

Results

Mean accuracies of imputation per SNP from the BovineHD panel to sequence data and from the BovineSNP50 panel to sequence data for scenarios S40 and S80 ranged from 0.77 to 0.83 and from 0.37 to 0.46, respectively. Stepwise imputation from the BovineSNP50 to BovineHD panel and then to sequence data for scenario S40 improved accuracy per SNP to 0.65 but it varied considerably between SNPs.

Conclusions

Accuracy of imputation to whole-genome sequence data was generally high for imputation from the BovineHD beadchip, but was low from the BovineSNP50 beadchip. Stepwise imputation from the BovineSNP50 to the BovineHD beadchip and then to sequence data substantially improved accuracy of imputation. SNPs with a low minor allele frequency were more difficult to impute correctly and the reliability of imputation varied more. Linkage disequilibrium between an imputed SNP and the SNP on the lower density panel, minor allele frequency of the imputed SNP and size of the reference group affected imputation reliability.  相似文献   

14.

Background

Genotyping with the medium-density Bovine SNP50 BeadChip® (50K) is now standard in cattle. The high-density BovineHD BeadChip®, which contains 777 609 single nucleotide polymorphisms (SNPs), was developed in 2010. Increasing marker density increases the level of linkage disequilibrium between quantitative trait loci (QTL) and SNPs and the accuracy of QTL localization and genomic selection. However, re-genotyping all animals with the high-density chip is not economically feasible. An alternative strategy is to genotype part of the animals with the high-density chip and to impute high-density genotypes for animals already genotyped with the 50K chip. Thus, it is necessary to investigate the error rate when imputing from the 50K to the high-density chip.

Methods

Five thousand one hundred and fifty three animals from 16 breeds (89 to 788 per breed) were genotyped with the high-density chip. Imputation error rates from the 50K to the high-density chip were computed for each breed with a validation set that included the 20% youngest animals. Marker genotypes were masked for animals in the validation population in order to mimic 50K genotypes. Imputation was carried out using the Beagle 3.3.0 software.

Results

Mean allele imputation error rates ranged from 0.31% to 2.41% depending on the breed. In total, 1980 SNPs had high imputation error rates in several breeds, which is probably due to genome assembly errors, and we recommend to discard these in future studies. Differences in imputation accuracy between breeds were related to the high-density-genotyped sample size and to the genetic relationship between reference and validation populations, whereas differences in effective population size and level of linkage disequilibrium showed limited effects. Accordingly, imputation accuracy was higher in breeds with large populations and in dairy breeds than in beef breeds. More than 99% of the alleles were correctly imputed if more than 300 animals were genotyped at high-density. No improvement was observed when multi-breed imputation was performed.

Conclusion

In all breeds, imputation accuracy was higher than 97%, which indicates that imputation to the high-density chip was accurate. Imputation accuracy depends mainly on the size of the reference population and the relationship between reference and target populations.  相似文献   

15.

Background

The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low density single nucleotide polymorphisms (SNP) panels i.e. 3K or 7K to a high density panel with 50K SNP. No pedigree information was used.

Methods

Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content.

Results

In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90 and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, computing time required by the partial least squares regression method was on average around 10 times lower than computing time required by Beagle. Using the partial least squares regression method in the multi-breed resulted in lower imputation accuracies than using single-breed data. The impact of the SNP-genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip.

Conclusions

Results of the present work suggested that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available.  相似文献   

16.

Background

In sexually reproducing organisms, meiotic crossovers ensure the proper segregation of chromosomes and contribute to genetic diversity by shuffling allelic combinations. Such genetic reassortment is exploited in breeding to combine favorable alleles, and in genetic research to identify genetic factors underlying traits of interest via linkage or association-based approaches. Crossover numbers and distributions along chromosomes vary between species, but little is known about their intraspecies variation.

Results

Here, we report on the variation of recombination rates between 22 European maize inbred lines that belong to the Dent and Flint gene pools. We genotype 23 doubled-haploid populations derived from crosses between these lines with a 50 k-SNP array and construct high-density genetic maps, showing good correspondence with the maize B73 genome sequence assembly. By aligning each genetic map to the B73 sequence, we obtain the recombination rates along chromosomes specific to each population. We identify significant differences in recombination rates at the genome-wide, chromosome, and intrachromosomal levels between populations, as well as significant variation for genome-wide recombination rates among maize lines. Crossover interference analysis using a two-pathway modeling framework reveals a negative association between recombination rate and interference strength.

Conclusions

To our knowledge, the present work provides the most comprehensive study on intraspecific variation of recombination rates and crossover interference strength in eukaryotes. Differences found in recombination rates will allow for selection of high or low recombining lines in crossing programs. Our methodology should pave the way for precise identification of genes controlling recombination rates in maize and other organisms.  相似文献   

17.

Background

Currently, genome-wide evaluation of cattle populations is based on SNP-genotyping using ~ 54 000 SNP. Increasing the number of markers might improve genomic predictions and power of genome-wide association studies. Imputation of genotypes makes it possible to extrapolate genotypes from lower to higher density arrays based on a representative reference sample for which genotypes are obtained at higher density.

Methods

Genotypes using 639 214 SNP were available for 797 bulls of the Fleckvieh cattle breed. The data set was divided into a reference and a validation population. Genotypes for all SNP except those included in the BovineSNP50 Bead chip were masked and subsequently imputed for animals of the validation population. Imputation of genotypes was performed with Beagle, findhap.f90, MaCH and Minimac. The accuracy of the imputed genotypes was assessed for four different scenarios including 50, 100, 200 and 400 animals as reference population. The reference animals were selected to account for 78.03%, 89.21%, 97.47% and > 99% of the gene pool of the genotyped population, respectively.

Results

Imputation accuracy increased as the number of animals and relatives in the reference population increased. Population-based algorithms provided highly reliable imputation of genotypes, even for scenarios with 50 and 100 reference animals only. Using MaCH and Minimac, the correlation between true and imputed genotypes was > 0.975 with 100 reference animals only. Pre-phasing the genotypes of both the reference and validation populations not only provided highly accurate imputed genotypes but was also computationally efficient. Genome-wide analysis of imputation accuracy led to the identification of many misplaced SNP.

Conclusions

Genotyping key animals at high density and subsequent population-based genotype imputation yield high imputation accuracy. Pre-phasing the genotypes of the reference and validation populations is computationally efficient and results in high imputation accuracy, even when the reference population is small.  相似文献   

18.

Background

Genotype imputation can help reduce genotyping costs particularly for implementation of genomic selection. In applications entailing large populations, recovering the genotypes of untyped loci using information from reference individuals that were genotyped with a higher density panel is computationally challenging. Popular imputation methods are based upon the Hidden Markov model and have computational constraints due to an intensive sampling process. A fast, deterministic approach, which makes use of both family and population information, is presented here. All individuals are related and, therefore, share haplotypes which may differ in length and frequency based on their relationships. The method starts with family imputation if pedigree information is available, and then exploits close relationships by searching for long haplotype matches in the reference group using overlapping sliding windows. The search continues as the window size is shrunk in each chromosome sweep in order to capture more distant relationships.

Results

The proposed method gave higher or similar imputation accuracy than Beagle and Impute2 in cattle data sets when all available information was used. When close relatives of target individuals were present in the reference group, the method resulted in higher accuracy compared to the other two methods even when the pedigree was not used. Rare variants were also imputed with higher accuracy. Finally, computing requirements were considerably lower than those of Beagle and Impute2. The presented method took 28 minutes to impute from 6 k to 50 k genotypes for 2,000 individuals with a reference size of 64,429 individuals.

Conclusions

The proposed method efficiently makes use of information from close and distant relatives for accurate genotype imputation. In addition to its high imputation accuracy, the method is fast, owing to its deterministic nature and, therefore, it can easily be used in large data sets where the use of other methods is impractical.  相似文献   

19.

Background

We conducted a genome-wide linkage analysis to identify quantitative trait loci (QTL) that influence meat quality-related traits in a large F2 intercross between Landrace and Korean native pigs. Thirteen meat quality-related traits of the m. longissimus lumborum et thoracis were measured in more than 830 F2 progeny. All these animals were genotyped with 173 microsatellite markers located throughout the pig genome, and the GridQTL program based on the least squares regression model was used to perform the QTL analysis.

Results

We identified 23 genome-wide significant QTL in eight chromosome regions (SSC1, 2, 6, 7, 9, 12, 13, and 16) (SSC for Sus Scrofa) and detected 51 suggestive QTL in the 17 chromosome regions. QTL that affect 10 meat quality traits were detected on SSC12 and were highly significant at the genome-wide level. In particular, the QTL with the largest effect affected crude fat percentage and explained 22.5% of the phenotypic variance (F-ratio = 278.0 under the additive model, nominal P = 5.5 × 10−55). Interestingly, the QTL on SSC12 that influenced meat quality traits showed an obvious trend for co-localization.

Conclusions

Our results confirm several previously reported QTL. In addition, we identified novel QTL for meat quality traits, which together with the associated positional candidate genes improve the knowledge on the genetic structure that underlies genetic variation for meat quality traits in pigs.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0080-6) contains supplementary material, which is available to authorized users.  相似文献   

20.

Background

In a previous study in the Fleckvieh dual purpose cattle breed, we mapped a quantitative trait locus (QTL) affecting milk yield (MY1), milk protein yield (PY1) and milk fat yield (FY1) during first lactation to the distal part of bovine chromosome 5 (BTA5), but the confidence interval was too large for positional cloning of the causal gene. Our objective here was to refine the position of this QTL and to define the candidate region for high-throughput sequencing.

Methods

In addition to those previously studied, new Fleckvieh families were genotyped, in order to increase the number of recombination events. Twelve new microsatellites and 240 SNP markers covering the most likely QTL region on BTA5 were analysed. Based on haplotype analysis performed in this complex pedigree, families segregating for the low frequency allele of this QTL (minor allele) were selected. Single- and multiple-QTL analyses using combined linkage and linkage disequilibrium methods were performed.

Results

Single nucleotide polymorphism haplotype analyses on representative family sires and their ancestors revealed that the haplotype carrying the minor QTL allele is rare and most probably originates from a unique ancestor in the mapping population. Analyses of different subsets of families, created according to the results of haplotype analysis and availability of SNP and microsatellite data, refined the previously detected QTL affecting MY1 and PY1 to a region ranging from 117.962 Mb to 119.018 Mb (1.056 Mb) on BTA5. However, the possibility of a second QTL affecting only PY1 at 122.115 Mb was not ruled out.

Conclusion

This study demonstrates that targeting families segregating for a less frequent QTL allele is a useful method. It improves the mapping resolution of the QTL, which is due to the division of the mapping population based on the results of the haplotype analysis and to the increased frequency of the minor allele in the families. Consequently, we succeeded in refining the region containing the previously detected QTL to 1 Mb on BTA5. This candidate region contains 27 genes with unknown or partially known function(s) and is small enough for high-throughput sequencing, which will allow future detailed analyses of candidate genes.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号