
1.

Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering (total citations: 17; self-citations: 3; citations by others: 14)


Browning SR Browning BL 《American journal of human genetics》2007,81(5):1084-1097

Whole-genome association studies present many new statistical and computational challenges due to the large quantity of data obtained. One of these challenges is haplotype inference; methods for haplotype inference designed for small data sets from candidate-gene studies do not scale well to the large number of individuals genotyped in whole-genome association studies. We present a new method and software for inference of haplotype phase and missing data that can accurately phase data from whole-genome association studies, and we present the first comparison of haplotype-inference methods for real and simulated data sets with thousands of genotyped individuals. We find that our method outperforms existing methods in terms of both speed and accuracy for large data sets with thousands of individuals and densely spaced genetic markers, and we use our method to phase a real data set of 3,002 individuals genotyped for 490,032 markers in 3.1 days of computing time, with 99% of masked alleles imputed correctly. Our method is implemented in the Beagle software package, which is freely available.
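The "masked alleles imputed correctly" figure in the abstract above is a masking-based accuracy measure. A minimal sketch of how such a score could be computed is given here; the function name, the 0/1 allele coding and the toy data are assumptions made for illustration and are not taken from the Beagle software.

```python
def masked_allele_accuracy(true_alleles, imputed_alleles, masked_positions):
    """Fraction of masked alleles that were imputed back to their true value."""
    if not masked_positions:
        raise ValueError("no masked positions to score")
    correct = sum(
        1 for pos in masked_positions
        if imputed_alleles[pos] == true_alleles[pos]
    )
    return correct / len(masked_positions)


# Hypothetical toy data: alleles coded 0/1 along one haplotype.
true_alleles = [0, 1, 1, 0, 1, 0, 0, 1]
imputed_alleles = [0, 1, 0, 0, 1, 0, 0, 1]
masked_positions = [1, 2, 5, 7]  # positions hidden before imputation

accuracy = masked_allele_accuracy(true_alleles, imputed_alleles, masked_positions)
print(accuracy)
```

Only the masked positions are scored, so alleles that were observed directly do not inflate the result.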

2.

Selection of genetic markers for association analyses, using linkage disequilibrium and haplotypes


Meng Z Zaykin DV Xu CF Wagner M Ehm MG 《American journal of human genetics》2003,73(1):115-130

The genotyping of closely spaced single-nucleotide polymorphism (SNP) markers frequently yields highly correlated data, owing to extensive linkage disequilibrium (LD) between markers. The extent of LD varies widely across the genome and drives the number of frequent haplotypes observed in small regions. Several studies have illustrated the possibility that LD or haplotype data could be used to select a subset of SNPs that optimize the information retained in a genomic region while reducing the genotyping effort and simplifying the analysis. We propose a method based on the spectral decomposition of the matrices of pairwise LD between markers, and we select markers on the basis of their contributions to the total genetic variation. We also modify Clayton's "haplotype tagging SNP" selection method, which utilizes haplotype information. For both methods, we propose sliding window-based algorithms that allow the methods to be applied to large chromosomal regions. Our procedures require genotype information about a small number of individuals for an initial set of SNPs and selection of an optimum subset of SNPs that could be efficiently genotyped on larger numbers of samples while retaining most of the genetic variation in samples. We identify suitable parameter combinations for the procedures, and we show that a sample size of 50-100 individuals achieves consistent results in studies of simulated data sets in linkage equilibrium and LD. When applied to experimental data sets, both procedures were similarly effective at reducing the genotyping requirement while maintaining the genetic information content throughout the regions. We also show that haplotype-association results that Hosking et al. obtained near CYP2D6 were almost identical before and after marker selection.

3.

Mapping loci influencing blood pressure in the Framingham pedigrees using model-free LOD score analysis of a quantitative trait

Knight J North BV Sham PC Curtis D 《BMC genetics》2003,4(Z1):S74

This paper presents a method of performing model-free LOD-score based linkage analysis on quantitative traits. It is implemented in the QMFLINK program. The method is used to perform a genome screen on the Framingham Heart Study data. A number of markers that show some support for linkage in our study coincide substantially with those implicated in other linkage studies of hypertension. Although the new method needs further testing on additional real and simulated data sets, we can already say that it is straightforward to apply and may offer a useful complementary approach to previously available methods for the linkage analysis of quantitative traits.

4.

Modeling genetic inheritance of copy number variations


Wang K Chen Z Tadesse MG Glessner J Grant SF Hakonarson H Bucan M Li M 《Nucleic acids research》2008,36(21):e138

Copy number variations (CNVs) are being used as genetic markers or functional candidates in gene-mapping studies. However, unlike single nucleotide polymorphism or microsatellite genotyping techniques, most CNV detection methods are limited to detecting total copy numbers, rather than copy number in each of the two homologous chromosomes. To address this issue, we developed a statistical framework for intensity-based CNV detection platforms using family data. Our algorithm identifies CNVs for a family simultaneously, thus avoiding the generation of calls with Mendelian inconsistency while maintaining the ability to detect de novo CNVs. Applications to simulated data and real data indicate that our method significantly improves both call rates and accuracy of boundary inference, compared to existing approaches. We further illustrate the use of Mendelian inheritance to infer SNP allele compositions in each of the two homologous chromosomes in CNV regions using real data. Finally, we applied our method to a set of families genotyped using both the Illumina HumanHap550 and Affymetrix genome-wide 5.0 arrays to demonstrate its performance on both inherited and de novo CNVs. In conclusion, our method produces accurate CNV calls, gives probabilistic estimates of CNV transmission and builds a solid foundation for the development of linkage and association tests utilizing CNVs.

5.

Estimation by simulation of the efficiency of the French marker-assisted selection program in dairy cattle (Open Access publication)

François Guillaume Sébastien Fritz Didier Boichard Tom Druet 《Genetics Selection Evolution》2008,40(1):91-102

The efficiency of the French marker-assisted selection (MAS) program was estimated by a simulation study. The data files of two different time periods were used: April 2004 and 2006. The simulation method used the structure of the existing French MAS: same pedigree, same marker genotypes and same animals with records. The program simulated breeding values and new records based on this existing structure and knowledge of the QTL used in MAS (variance and frequency). Reliabilities of genetic values of young animals (less than one year old) obtained with and without marker information were compared to assess the efficiency of MAS for evaluation of milk, fat and protein yields and fat and protein contents. Mean gains of reliability ranged from 0.015 to 0.094 and from 0.038 to 0.114 in 2004 and 2006, respectively. The larger number of animals genotyped and the use of a new set of genetic markers can explain the improvement of MAS reliability from 2004 to 2006. This improvement was also observed by analysis of information content for young candidates. The gain of MAS reliability with respect to classical selection was larger for sons of sires with genotyped progeny daughters with records. Finally, it was shown that when superiority of MAS over classical selection was estimated with daughter yield deviations obtained after progeny test instead of true breeding values, the gain was underestimated.

6.

Accuracy of direct genomic breeding values for nationally evaluated traits in US Limousin and Simmental beef cattle

Mahdi Saatchi Robert D Schnabel Megan M Rolf Jeremy F Taylor Dorian J Garrick 《Genetics Selection Evolution》2012,44(1):38

Background

In national evaluations, direct genomic breeding values can be considered as correlated traits to those for which phenotypes are available for traditional estimation of breeding values. For this purpose, estimates of the accuracy of direct genomic breeding values expressed as genetic correlations between traits and their respective direct genomic breeding values are required.

Methods

We derived direct genomic breeding values for 2239 registered Limousin and 2703 registered Simmental beef cattle genotyped with either the Illumina BovineSNP50 BeadChip or the Illumina BovineHD BeadChip. For the 264 Simmental animals that were genotyped with the BovineHD BeadChip, genotypes for markers present on the BovineSNP50 BeadChip were extracted. Deregressed estimated breeding values were used as observations in weighted analyses that estimated marker effects to derive direct genomic breeding values for each breed. For each breed, genotyped individuals were clustered into five groups using K-means clustering, with the aim of increasing within-group and decreasing between-group pedigree relationships. Cross-validation was performed five times for each breed, using four groups for training and the fifth group for validation. For each trait, we then applied a weighted bivariate analysis of the direct genomic breeding values of genotyped animals from all five validation sets and their corresponding deregressed estimated breeding values to estimate variance and covariance components.

Results

After minimizing relationships between training and validation groups, estimated genetic correlations between each trait and its direct genomic breeding values ranged from 0.39 to 0.76 in Limousin and from 0.29 to 0.65 in Simmental. The efficiency of selection based on direct genomic breeding values relative to selection based on parent average information ranged from 0.68 to 1.28 in genotyped Limousin and from 0.51 to 1.44 in genotyped Simmental animals. The efficiencies were higher for 323 non-genotyped young Simmental animals, born after January 2012, and ranged from 0.60 to 2.04.

Conclusions

Direct genomic breeding values show promise for routine use by Limousin and Simmental breeders to improve the accuracy of predicted genetic merit of their animals at a young age and increase response to selection. Benefits from selecting on direct genomic breeding values are greater for breeders who use natural mating sires in their herds than for those who use artificial insemination sires. Producers with unregistered commercial Limousin and Simmental cattle could also benefit from being able to identify genetically superior animals in their herds, an opportunity that has in the past been limited to seed stock animals.

7.

Reconstructing sibling relationships in wild populations

Berger-Wolf TY Sheikh SI DasGupta B Ashley MV Caballero IC Chaovalitwongse W Putrevu SL 《Bioinformatics (Oxford, England)》2007,23(13):i49-i56

Reconstruction of sibling relationships from genetic data is an important component of many biological applications. In particular, the growing application of molecular markers (microsatellites) to study wild populations of plants and animals has created the need for new computational methods of establishing pedigree relationships, such as sibgroups, among individuals in these populations. Most current methods for sibship reconstruction from microsatellite data use statistical and heuristic techniques that rely on a priori knowledge about various parameter distributions. Moreover, these methods are designed for data with a large number of sampled loci and small family groups, both of which typically do not hold for wild populations. We present a deterministic technique that parsimoniously reconstructs sibling groups using only Mendelian laws of inheritance. We validate our approach using both simulated and real biological data and compare it to other methods. Our method is highly accurate on real data and compares favorably with other methods on simulated data with few loci and large family groups. It is the only method that does not rely on a priori knowledge about the population under study. Thus, our method is particularly appropriate for reconstructing sibling groups in wild populations.

8.

Multipoint mapping of viability and segregation distorting loci using molecular markers (total citations: 6; self-citations: 0; citations by others: 6)

Vogl C Xu S 《Genetics》2000,155(3):1439-1447

In line-crossing experiments, deviations from Mendelian segregation ratios are usually observed for some markers. We hypothesize that these deviations are caused by one or more segregation-distorting loci (SDL) linked to the markers. We develop both a maximum-likelihood (ML) method and a Bayesian method to map SDL using molecular markers. The ML mapping is implemented via an EM algorithm and the Bayesian method is performed via Markov chain Monte Carlo (MCMC). The Bayesian mapping is computationally more intensive than the ML mapping but can handle more complicated models such as multiple SDL and a variable number of SDL. Both methods are applied to a set of simulated data and real data from a cross of two Scots pine trees.

9.

Imputation of missing genotypes from sparse to high density using long-range phasing

Daetwyler HD Wiggans GR Hayes BJ Woolliams JA Goddard ME 《Genetics》2011,189(1):317-327

Related individuals share potentially long chromosome segments that trace to a common ancestor. We describe a phasing algorithm (ChromoPhase) that utilizes this characteristic of finite populations to phase large sections of a chromosome. In addition to phasing, our method imputes missing genotypes in individuals genotyped at lower marker density when more densely genotyped relatives are available. ChromoPhase uses a pedigree to collect an individual's (the proband) surrogate parents and offspring and uses genotypic similarity to identify its genomic surrogates. The algorithm then cycles through the relatives and genomic surrogates one at a time to find shared chromosome segments. Once a segment has been identified, any missing information in the proband is filled in with information from the relative. We tested ChromoPhase in a simulated population consisting of 400 individuals at a marker density of 1500/M, which is approximately equivalent to a 50K bovine single nucleotide polymorphism chip. In simulated data, 99.9% loci were correctly phased and, when imputing from 100 to 1500 markers, more than 87% of missing genotypes were correctly imputed. Performance increased when the number of generations available in the pedigree increased, but was reduced when the sparse genotype contained fewer loci. However, in simulated data, ChromoPhase correctly imputed at least 12% more genotypes than fastPHASE, depending on sparse marker density. We also tested the algorithm in a real Holstein cattle data set to impute 50K genotypes in animals with a sparse 3K genotype. In these data 92% of genotypes were correctly imputed in animals with a genotyped sire. We evaluated the accuracy of genomic predictions with the dense, sparse, and imputed simulated data sets and show that the reduction in genomic evaluation accuracy is modest even with imperfectly imputed genotype data. 
Our results demonstrate that imputation of missing genotypes, and potentially full genome sequence, using long-range phasing is feasible.

10.

The effect of missing marker genotypes on the accuracy of gene-assisted breeding value estimation: a comparison of methods

Mulder HA Meuwissen TH Calus MP Veerkamp RF 《Animal : an international journal of animal bioscience》2010,4(1):9-19

In livestock populations, missing genotypes on a large proportion of the animals is a major problem when implementing gene-assisted breeding value estimation for genes with known effect. The objective of this study was to compare different methods to deal with missing genotypes on accuracy of gene-assisted breeding value estimation for identified bi-allelic genes using Monte Carlo simulation. A nested full-sib half-sib structure was simulated with a mixed inheritance model with one bi-allelic quantitative trait loci (QTL) and a polygenic effect due to infinite number of polygenes. The effect of the QTL was included in gene-assisted BLUP either by random regression on predicted gene content, i.e. the number of positive alleles, or including haplotype effects in the model with an inverse IBD matrix to account for identity-by-descent relationships between haplotypes using linkage analysis information (IBD-LA). The inverse IBD matrix was constructed using segregation indicator probabilities obtained from multiple marker iterative peeling. Gene contents for unknown genotypes were predicted using either multiple marker iterative peeling or mixed model methodology. For both methods, gene-assisted breeding value estimation increased accuracies of total estimated breeding value (EBV) with 0% to 22% for genotyped animals in comparison to conventional breeding value estimation. For animals that were not genotyped, the increase in accuracy was much lower (0% to 5%), but still substantial when the heritability was 0.1 and when the QTL explained at least 15% of the genetic variance. Regression on predicted gene content yielded higher accuracies than IBD-LA. Allele substitution effects were, however, overestimated, especially when only sires and males in the last generation were genotyped. For juveniles without phenotypic records and traits measured only on females, the superiority of regression on gene content over IBD-LA was larger than when all animals had phenotypes. 
Missing gene contents were predicted with higher accuracy using multiple-marker iterative peeling than with using mixed model methodology, but the difference in accuracy of total EBV was negligible and mixed model methodology was computationally much faster than multiple iterative peeling. For large livestock populations it can be concluded that gene-assisted breeding value estimation can be practically best performed by regression on gene contents, using mixed model methodology to predict missing marker genotypes, combining phenotypic information of genotyped and ungenotyped animals in one evaluation. This technique would be, in principle, also feasible for genomic selection. It is expected that genomic selection for ungenotyped animals using predicted single nucleotide polymorphism gene contents might be beneficial especially for low heritable traits.

11.

A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dual-purpose Belgian Blue cattle

Gengler N Mayeres P Szydlowski M 《Animal : an international journal of animal bioscience》2007,1(1):21-28

Gene content is the number of copies of a particular allele in a genotype of an animal. Gene content can be used to study additive gene action of candidate genes. Usually genotype data are available only for a part of the population, and for the rest gene contents have to be calculated based on typed relatives. Methods to calculate expected gene content for animals on large complex pedigrees are relatively complex. In this paper we proposed a practical method to calculate gene content using a linear regression. The method does not estimate genotype probabilities but these can be approximated from gene content assuming Hardy-Weinberg proportions. The approach was compared with other methods on multiple simulated data sets for real bovine pedigrees of 1 082 and 907 903 animals. Different allelic frequencies (0.4 and 0.2) and proportions of the missing genotypes (90, 70, and 50%) were considered in simulation. The simulation showed that the proposed method has similar capability to predict gene content as the iterative peeling method; however, it requires less time and can be more practical for large pedigrees. The method was also applied to real data on the bovine myostatin locus on a large dual-purpose Belgian Blue pedigree of 235 133 animals. It was demonstrated that the proposed method can be easily adapted for particular pedigrees.
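The abstract above notes that genotype probabilities can be approximated from gene content under Hardy-Weinberg proportions. One plausible reading of that step is sketched below: half of an animal's predicted gene content is treated as its allele probability, and the two alleles are assumed independent. The function name and this exact formula are assumptions for illustration; the abstract does not spell out the authors' formula.

```python
def genotype_probs_from_gene_content(gene_content):
    """Approximate P(0 copies), P(1 copy), P(2 copies) of the allele.

    Assumes Hardy-Weinberg proportions with allele probability
    p = gene_content / 2 (an illustrative assumption).
    """
    if not 0.0 <= gene_content <= 2.0:
        raise ValueError("gene content must lie between 0 and 2")
    p = gene_content / 2.0
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)


# Hypothetical predicted gene content for an ungenotyped animal.
probs = genotype_probs_from_gene_content(0.8)
print(probs)

# The expected gene content implied by the probabilities
# recovers the input value.
expected = probs[1] + 2.0 * probs[2]
print(expected)
```

By construction the three probabilities sum to one and reproduce the input gene content as their expectation, which is a quick consistency check on any such approximation.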

12.

Prediction of haplotypes for ungenotyped animals and its effect on marker-assisted breeding value estimation

Han A Mulder Mario PL Calus Roel F Veerkamp 《Genetics Selection Evolution》2010,42(1):10

Background

In livestock populations, missing genotypes on a large proportion of animals are a major problem when implementing the estimation of marker-assisted breeding values using haplotypes. The objective of this article is to develop a method to predict haplotypes of animals that are not genotyped using mixed model equations and to investigate the effect of using these predicted haplotypes on the accuracy of marker-assisted breeding value estimation.

Methods

For genotyped animals, haplotypes were determined and for each animal the number of haplotype copies (nhc) was counted, i.e. 0, 1 or 2 copies. In a mixed model framework, nhc for each haplotype were predicted for ungenotyped animals as well as for genotyped animals using the additive genetic relationship matrix. The heritability of nhc was assumed to be 0.99, allowing for minor genotyping and haplotyping errors. The predicted nhc were subsequently used in marker-assisted breeding value estimation by applying random regression on these covariables. To evaluate the method, a population was simulated with one additive QTL and an additive polygenic genetic effect. The QTL was located in the middle of a haplotype based on SNP-markers.

Results

The accuracy of predicted haplotype copies for ungenotyped animals ranged between 0.59 and 0.64 depending on haplotype length. Because powerful BLUP-software was used, the method was computationally very efficient. The accuracy of total EBV increased for genotyped animals when marker-assisted breeding value estimation was compared with conventional breeding value estimation, but for ungenotyped animals the increase was marginal unless the heritability was smaller than 0.1. Haplotypes based on four markers yielded the highest accuracies and when only the nearest left marker was used, it yielded the lowest accuracy. The accuracy increased with increasing marker density. Accuracy of the total EBV approached that of gene-assisted BLUP when 4-marker haplotypes were used with a distance of 0.1 cM between the markers.

Conclusions

The proposed method is computationally very efficient and suitable for marker-assisted breeding value estimation in large livestock populations including effects of a number of known QTL. Marker-assisted breeding value estimation using predicted haplotypes increases accuracy especially for traits with low heritability.
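The Methods paragraph of this abstract describes counting, for each genotyped animal, the number of haplotype copies (nhc: 0, 1 or 2) of each haplotype. A minimal sketch of that counting step follows; the data layout (a pair of phased haplotype strings per animal), the function name and the toy animals are assumptions for illustration, and the mixed-model prediction of nhc for ungenotyped animals is not shown.

```python
from collections import Counter


def haplotype_copy_counts(phased, haplotypes):
    """Return {animal: [nhc for each haplotype in `haplotypes`]}.

    `phased` maps an animal to its two phased haplotypes.
    """
    counts = {}
    for animal, (hap_a, hap_b) in phased.items():
        carried = Counter([hap_a, hap_b])
        counts[animal] = [carried[h] for h in haplotypes]
    return counts


# Hypothetical phased 4-marker haplotypes for three genotyped animals.
phased = {
    "animal_1": ("0101", "0101"),
    "animal_2": ("0101", "1100"),
    "animal_3": ("1100", "0011"),
}
haplotypes = sorted({h for pair in phased.values() for h in pair})
nhc = haplotype_copy_counts(phased, haplotypes)

for animal, row in nhc.items():
    print(animal, dict(zip(haplotypes, row)))
```

Each animal's counts sum to 2, one per chromosome copy; columns of this table are the covariables that the abstract says are used in random regression.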

13.

Integrated analysis of gene expression and copy number data on gene shaving using independent component analysis

Sheng J Deng HW Calhoun VD Wang YP 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(6):1568-1579

DNA microarray gene expression and microarray-based comparative genomic hybridization (aCGH) have been widely used for biomedical discovery. Because of the large number of genes and the complex nature of biological networks, various analysis methods have been proposed. One such method is "gene shaving," a procedure which identifies subsets of the genes with coherent expression patterns and large variation across samples. Since combining genomic information from multiple sources can improve classification and prediction of diseases, in this paper we proposed a new method, "ICA gene shaving" (ICA, independent component analysis), for jointly analyzing gene expression and copy number data. First we used ICA to analyze joint measurements, gene expression and copy number, of a biological system and project the data onto statistically independent biological processes. Next, we used these results to identify patterns of variation in the data and then applied an iterative shaving method. We investigated the properties of our proposed method by analyzing both simulated and real data. We demonstrated the robustness of our method to noise using simulated data. Using breast cancer data, we showed that our method is superior to the Generalized Singular Value Decomposition (GSVD) gene shaving method for identifying genes associated with breast cancer.

14.

Genomic evaluations with many more genotypes

Paul M VanRaden Jeffrey R O'Connell George R Wiggans Kent A Weigel 《Genetics Selection Evolution》2011,43(1):10

Background

Genomic evaluations in Holstein dairy cattle have quickly become more reliable over the last two years in many countries as more animals have been genotyped for 50,000 markers. Evaluations can also include animals genotyped with more or fewer markers using new tools such as the 777,000 or 2,900 marker chips recently introduced for cattle. Gains from more markers can be predicted using simulation, whereas strategies to use fewer markers have been compared using subsets of actual genotypes. The overall cost of selection is reduced by genotyping most animals at less than the highest density and imputing their missing genotypes using haplotypes. Algorithms to combine different densities need to be efficient because numbers of genotyped animals and markers may continue to grow quickly.

Methods

Genotypes for 500,000 markers were simulated for the 33,414 Holsteins that had 50,000 marker genotypes in the North American database. Another 86,465 non-genotyped ancestors were included in the pedigree file, and linkage disequilibrium was generated directly in the base population. Mixed density datasets were created by keeping 50,000 (every tenth) of the markers for most animals. Missing genotypes were imputed using a combination of population haplotyping and pedigree haplotyping. Reliabilities of genomic evaluations using linear and nonlinear methods were compared.

Results

Differing marker sets for a large population were combined with just a few hours of computation. About 95% of paternal alleles were determined correctly, and > 95% of missing genotypes were called correctly. Reliability of breeding values was already high (84.4%) with 50,000 simulated markers. The gain in reliability from increasing the number of markers to 500,000 was only 1.6%, but more than half of that gain resulted from genotyping just 1,406 young bulls at higher density. Linear genomic evaluations had reliabilities 1.5% lower than the nonlinear evaluations with 50,000 markers and 1.6% lower with 500,000 markers.

Conclusions

Methods to impute genotypes and compute genomic evaluations were affordable with many more markers. Reliabilities for individual animals can be modified to reflect success of imputation. Breeders can improve reliability at lower cost by combining marker densities to increase both the numbers of markers and animals included in genomic evaluation. Larger gains are expected from increasing the number of animals than the number of markers.
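The Methods paragraph of this abstract creates mixed-density data sets by keeping every tenth marker (50,000 of 500,000) for most animals. A minimal sketch of that thinning step is shown here; the function name, the use of `None` as the missing code and the toy genotype are assumptions for illustration, not the authors' implementation.

```python
MISSING = None  # assumed code for a genotype that is not observed


def mask_to_low_density(genotype, step=10):
    """Keep every `step`-th marker and set all other markers to MISSING."""
    return [g if i % step == 0 else MISSING for i, g in enumerate(genotype)]


# Hypothetical dense genotype: 30 markers coded 0/1/2.
dense = [(3 * i) % 3 + (i % 2) for i in range(30)]
low_density = mask_to_low_density(dense)

retained = [i for i, g in enumerate(low_density) if g is not MISSING]
print(retained)

# At the scale described in the abstract, the same rule keeps
# 50,000 of 500,000 markers.
n_kept = len(range(0, 500_000, 10))
print(n_kept)
```

The masked positions are the genotypes that an imputation step would then have to fill in, so the same mask can double as the scoring set for imputation accuracy.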

15.

Modeling of identity-by-descent processes along a chromosome between haplotypes and their genotyped ancestors

Druet T Farnir FP 《Genetics》2011,188(2):409-419

Identity-by-descent probabilities are important for many applications in genetics. Here we propose a method for modeling the transmission of the haplotypes from the closest genotyped relatives along an entire chromosome. The method relies on a hidden Markov model where hidden states correspond to the set of all possible origins of a haplotype within a given pedigree. Initial state probabilities are estimated from average genetic contribution of each origin to the modeled haplotype while transition probabilities are computed from recombination probabilities and pedigree relationships between the modeled haplotype and the various possible origins. The method was tested on three simulated scenarios based on real data sets from dairy cattle, Arabidopsis thaliana, and maize. The mean identity-by-descent probabilities estimated for the truly inherited parental chromosome ranged from 0.94 to 0.98 according to the design and the marker density. The lowest values were observed in regions close to crossing over or where the method was not able to discriminate between several origins due to their similarity. It is shown that the estimated probabilities were correctly calibrated. For marker imputation (or QTL allele prediction for fine mapping or genomic selection), the method was efficient, with 3.75% allelic imputation error rates on a dairy cattle data set with a low marker density map (1 SNP/Mb). The method should prove useful for situations we are facing now in experimental designs and in plant and animal breeding, where founders are genotyped with relatively high marker densities and last generation(s) genotyped with a lower-density panel.

16.

Invited review: efficient computation strategies in genomic selection

《Animal : an international journal of animal bioscience》2017,11(5):731-736

The purpose of this study is to review and evaluate computing methods used in genomic selection for animal breeding. Commonly used models include SNP BLUP with extensions (BayesA, etc.), genomic BLUP (GBLUP) and single-step GBLUP (ssGBLUP). These models are applied for genomewide association studies (GWAS), genomic prediction and parameter estimation. Solving methods include finite Cholesky decomposition possibly with a sparse implementation, and iterative Gauss–Seidel (GS) or preconditioned conjugate gradient (PCG), the last two methods possibly with iteration on data. Details are provided that can drastically decrease some computations. For SNP BLUP, especially with sampling and a large number of SNP, the only choice is GS with iteration on data and adjustment of residuals. If only solutions are required, PCG by iteration on data is a clear choice. A genomic relationship matrix (GRM) has limited dimensionality due to small effective population size, resulting in an infinite number of generalized inverses of GRM for large genotyped populations. A specific inverse called APY requires only a small fraction of GRM, is sparse and can be computed and stored at a low cost for millions of animals. With APY inverse and PCG iteration, GBLUP and ssGBLUP can be applied to any population. Both tools can be applied to GWAS. When the system of equations is sparse but contains dense blocks, a recently developed package for sparse Cholesky decomposition and sparse inversion called YAMS has greatly improved performance over packages where such blocks were treated as sparse. With YAMS, GREML and possibly single-step GREML can be applied to populations with >50 000 genotyped animals. From a computational perspective, genomic selection is becoming a mature methodology.

17.

Enlarging a training set for genomic selection by imputation of un-genotyped animals in populations of varying genetic architecture

Eduardo CG Pimentel Monika Wensch-Dorendorf Sven König Hermann H Swalve 《Genetics Selection Evolution》2013,45(1):12

Background

The most common application of imputation is to infer genotypes of a high-density panel of markers on animals that are genotyped for a low-density panel. However, the increase in accuracy of genomic predictions resulting from an increase in the number of markers tends to reach a plateau beyond a certain density. Another application of imputation is to increase the size of the training set with un-genotyped animals. This strategy can be particularly successful when a set of closely related individuals are genotyped.

Methods

Imputation on completely un-genotyped dams was performed using known genotypes from the sire of each dam, one offspring and the offspring’s sire. Two methods were applied based on either allele or haplotype frequencies to infer genotypes at ambiguous loci. Results of these methods and of two available software packages were compared. Quality of imputation under different population structures was assessed. The impact of using imputed dams to enlarge training sets on the accuracy of genomic predictions was evaluated for different populations, heritabilities and sizes of training sets.

Results

Imputation accuracy ranged from 0.52 to 0.93 depending on the population structure and the method used. The method that used allele frequencies performed better than the method based on haplotype frequencies. Accuracy of imputation was higher for populations with higher levels of linkage disequilibrium and with larger proportions of markers with more extreme allele frequencies. Inclusion of imputed dams in the training set increased the accuracy of genomic predictions. Gains in accuracy ranged from close to zero to 37.14%, depending on the simulated scenario. Generally, the larger the accuracy already obtained with the genotyped training set, the lower the increase in accuracy achieved by adding imputed dams.

Conclusions

Whenever a reference population resembling the family configuration considered here is available, imputation can be used to achieve an extra increase in accuracy of genomic predictions by enlarging the training set with completely un-genotyped dams. This strategy was shown to be particularly useful for populations with lower levels of linkage disequilibrium, for genomic selection on traits with low heritability, and for species or breeds for which the size of the reference population is limited.
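
As a hedged illustration of the pedigree-based step described in the Methods above, the following Python sketch deduces a dam's genotype at one biallelic locus from her sire, one offspring and the offspring's sire. It is not the authors' implementation: the 0/1/2 genotype coding, the function names and the assumption of no genotyping or Mendelian errors are illustrative choices. It returns None at ambiguous loci, which is where the abstract's two methods fall back on allele or haplotype frequencies.

```python
from typing import Optional

# Assumed coding: a genotype is the number of copies of allele "B" (0, 1 or 2);
# a single allele is 0 (allele "A") or 1 (allele "B").


def allele_from_homozygote(genotype: int) -> Optional[int]:
    """Allele a parent must transmit; known only when the parent is homozygous."""
    if genotype == 0:
        return 0
    if genotype == 2:
        return 1
    return None  # heterozygous parent: either allele may be transmitted


def allele_transmitted_by_dam(offspring: int, offspring_sire: int) -> Optional[int]:
    """Allele the dam passed to her offspring, if it can be deduced."""
    if offspring == 0:
        return 0
    if offspring == 2:
        return 1
    # Heterozygous offspring: the dam gave the allele the sire did not give.
    paternal = allele_from_homozygote(offspring_sire)
    if paternal is None:
        return None
    return 1 - paternal


def deduce_dam_genotype(dam_sire: int, offspring: int, offspring_sire: int) -> Optional[int]:
    """Return the dam's genotype when Mendelian rules alone determine it."""
    inherited = allele_from_homozygote(dam_sire)  # allele received from her sire
    transmitted = allele_transmitted_by_dam(offspring, offspring_sire)
    if inherited is not None and transmitted is not None and inherited != transmitted:
        return 1  # she carries both alleles, so she is heterozygous
    return None  # ambiguous locus: needs frequency-based inference


if __name__ == "__main__":
    print(deduce_dam_genotype(dam_sire=2, offspring=0, offspring_sire=0))
    print(deduce_dam_genotype(dam_sire=2, offspring=2, offspring_sire=2))
```

In this simplified view, only loci where the two deduced alleles differ are resolved with certainty; when they agree, the dam is known to carry at least one copy and her second allele remains open.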

18.

Quality Control of Genotypes Using Heritability Estimates of Gene Content at the Marker

Natalia S. Forneris Andres Legarra Zulma G. Vitezica Shogo Tsuruta Ignacio Aguilar Ignacy Misztal Rodolfo J. C. Cantet 《Genetics》2015,199(3):675-681

Quality control filtering of single-nucleotide polymorphisms (SNPs) is a key step when analyzing genomic data. Here we present a practical method to identify low-quality SNPs, meaning markers whose genotypes are wrongly assigned for a large proportion of individuals, by estimating the heritability of gene content at each marker, where gene content is the number of copies of a particular reference allele in a genotype of an animal (0, 1, or 2). If there is no mutation at the marker, gene content has an additive heritability of 1 by construction. The method uses restricted maximum likelihood (REML) to estimate heritability of gene content at each SNP and also builds a likelihood-ratio test statistic to test for zero error variance in genotyping. As a by-product, estimates of the allele frequencies of markers at the base population are obtained. Using simulated data with 10% permutation error (4% actual error) in genotyping, the method had a specificity of 0.96 (4% of correct markers are rejected) and a sensitivity of 0.99 (1% of wrong markers are accepted) if markers with heritability lower than 0.975 are discarded. Checking of Mendelian errors resulted in a lower sensitivity (0.84) for the same simulation. The proposed method is further illustrated with a real data set with genotypes from 3534 animals genotyped for 50,433 markers from the Illumina PorcineSNP60 chip and a pedigree of 6473 individuals; those markers underwent very little quality control. A total of 4099 markers with P-values lower than 0.01 were discarded based on our method, with associated estimates of heritability as low as 0.12. Contrary to other techniques, our method uses all information in the population simultaneously, can be used in any population with markers and pedigree recordings, and is simple to implement using standard software for REML estimation. Scripts for its use are provided.
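
To make the definitions in this abstract concrete, the short Python sketch below codes gene content (0, 1, or 2 copies of a reference allele) and computes the sensitivity and specificity of a heritability-threshold filter. It is an illustration only, not the authors' scripts: the REML estimation itself is not shown, and the function names, allele labels and heritability values are made-up assumptions. Only the 0.975 threshold is taken from the abstract.

```python
def gene_content(genotype, ref_allele):
    """Number of copies of ref_allele in a two-allele genotype such as ('A', 'B')."""
    return sum(1 for allele in genotype if allele == ref_allele)


def filter_metrics(h2_estimates, is_wrong, threshold=0.975):
    """Sensitivity and specificity of discarding markers with h2 below threshold.

    sensitivity: share of wrong markers that are discarded
    specificity: share of correct markers that are kept
    """
    discarded = [h2 < threshold for h2 in h2_estimates]
    n_wrong = sum(is_wrong)
    n_correct = len(is_wrong) - n_wrong
    wrong_discarded = sum(d and w for d, w in zip(discarded, is_wrong))
    correct_kept = sum((not d) and (not w) for d, w in zip(discarded, is_wrong))
    return wrong_discarded / n_wrong, correct_kept / n_correct


if __name__ == "__main__":
    # Hypothetical per-marker heritability estimates and true marker status.
    h2 = [0.99, 0.98, 0.97, 0.50, 0.12, 0.999]
    wrong = [False, False, False, True, True, False]
    sensitivity, specificity = filter_metrics(h2, wrong)
    print(gene_content(("A", "B"), "A"), sensitivity, specificity)
```

In the toy data, one correct marker (estimate 0.97) falls below the threshold and is rejected, which is the kind of loss the reported specificity of 0.96 quantifies.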

19.

Genomic prediction ability for beef fatty acid profile in Nelore cattle using different pseudo-phenotypes

Hermenegildo Lucas Justino Chiaia Elisa Peripolli Rafael Medeiros de Oliveira Silva Fabiele Loise Braga Feitosa Marcos Vinícius Antunes de Lemos Mariana Piatto Berton Bianca Ferreira Olivieri Rafael Espigolan Rafael Lara Tonussi Daniel Gustavo Mansan Gordo Lucia Galvão de Albuquerque Henrique Nunes de Oliveira Adrielle Mathias Ferrinho Lenise Freitas Mueller Sabrina Kluska Humberto Tonhati Angélica Simone Cravo Pereira Ignacio Aguilar Fernando Baldi 《Journal of applied genetics》2018,59(4):493-501

The aim of the present study was to compare the predictive ability of the SNP-BLUP model using different pseudo-phenotypes, namely phenotype adjusted for fixed effects (Yc), estimated breeding value (EBV), and genomic estimated breeding value (GEBV), using simulated and real data for the beef fatty acid (FA) profile of Nelore cattle finished in feedlot. A pedigree with phenotypes and genotypes of 10,000 animals was simulated, considering 50% of multiple sires in the pedigree. Two traits were simulated, one with high heritability (0.58) and another with low heritability (0.13). Ten replicates were performed for each trait and results were averaged among replicates. A historical population was created from generation zero to 2020, with a constant size of 2000 animals (from generation zero to 1000) to produce different levels of linkage disequilibrium (LD). Subsequently, there was a gradual reduction in the number of animals (from 2000 to 600), producing a “bottleneck effect” and, consequently, genetic drift and LD from generation 1001 to 2020. A total of 335,000 markers (with minor allele frequency (MAF) greater than or equal to 0.02) and 1000 QTL were randomly selected from the last generation of the historical population to generate genotypic data for the test population. The phenotypes were computed as the sum of the QTL effects and an error term sampled from a normal distribution with zero mean and variance equal to 0.88. For simulated data, 4000 animals from generations 7, 8, and 9 (with genotype and phenotype) were used as the training population, and 1000 animals of the last generation (10) were used as the validation population. A total of 937 Nelore bulls with phenotypes for fatty acid profiles (sums of saturated, monounsaturated, omega-3, omega-6, and polyunsaturated fatty acids, and the ratio of polyunsaturated to saturated fatty acids) were genotyped using the Illumina BovineHD BeadChip (Illumina, San Diego, CA) with 777,962 SNP.
To compare the accuracy and bias of direct genomic value (DGV) for different pseudo-phenotypes, the correlation between true breeding value (TBV) or DGV and the pseudo-phenotypes, and the linear regression coefficient of the pseudo-phenotypes on TBV (simulated data) or DGV (real data), were calculated. For simulated data, the correlations between DGV and TBV for high heritability traits were higher than those obtained with low heritability traits. For simulated and real data, the prediction ability was higher for GEBV than for Yc and EBV. For simulated data, the regression coefficient estimates (b_(Yc,DGV)) were on average lower than 1 for both high and low heritability traits, indicating inflation. The results were more biased for Yc and EBV than for GEBV. For real data, the GEBV displayed less biased results compared to Yc and EBV for SFA, MUFA, n-3, n-6, and PUFA/SFA. Despite the less biased results for PUFA using the EBV as pseudo-phenotype, the b_(Yi,DGV) estimates obtained for the different pseudo-phenotypes (Yc, EBV and GEBV) were very close. Genomic information can assist in improving beef fatty acid profile in Zebu cattle, since the use of genomic information yielded genomic values for fatty acid profile with accuracies ranging from low to moderate. Considering both simulated and real data, the ssGBLUP model is an appropriate alternative to obtain more reliable and less biased GEBVs as pseudo-phenotype in situations of missing pedigree, due to a high proportion of multiple sires, being more adequate than EBV and Yc to predict direct genomic value for beef fatty acid profile.
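
The two evaluation statistics named in this abstract, a correlation as the measure of prediction ability and a regression coefficient as the measure of bias, can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the study's code: the function names and the toy values are hypothetical, and a regression slope of 1 is taken to mean unbiased predictions.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson_correlation(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(y, x):
    """Slope of the simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical direct genomic values and pseudo-phenotypes for four animals.
    dgv = [1.0, 2.0, 3.0, 4.0]
    pseudo_phenotype = [0.5, 1.0, 1.5, 2.0]
    print(pearson_correlation(dgv, pseudo_phenotype))
    print(regression_slope(pseudo_phenotype, dgv))
```

With these toy values the ranking is reproduced perfectly while the slope of the pseudo-phenotype on DGV is below 1, the pattern the abstract describes as inflation.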

20.

SNP markers trace familial linkages in a cloned population of Pinus taeda—prospects for genomic selection

Jaime Zapata-Valenzuela Fikret Isik Christian Maltecca Jill Wegrzyn David Neale Steve McKeand Ross Whetten 《Tree Genetics & Genomes》2012,8(6):1307-1318

Advances in DNA sequencing technology have made possible the genotyping of thousands of single-nucleotide polymorphism (SNP) markers, and new methods of statistical analysis are emerging to apply these advances in plant breeding programs. We report the utility of markers for prediction of breeding values in a forest tree species using empirical genotype data (3,406 polymorphic SNP loci). A total of 526 Pinus taeda L. clones tested widely in field trials were phenotyped at age 5 years. Only 149 clones from 13 full-sib crosses were genotyped. Markers were fit simultaneously to predict marker additive and dominance effects. Subsets of the 149 genotyped clones were used to train a model using all markers. Cross-validation strategies were followed for the remaining subset of genotyped individuals. The accuracy of genomic estimated breeding values ranged from 0.61 to 0.83 for wood lignin and cellulose content, and from 0.30 to 0.68 for height and volume traits. The accuracies of predictions based on markers were comparable with the accuracies based on pedigree. Because of the small number of SNP markers used and the relatively small population size, we suggest that observed accuracies in this study trace familial linkage rather than historical linkage disequilibrium with trait loci. Prediction accuracies of models that use only a subset of markers were generally comparable with the accuracies of the models using all markers, regardless of whether markers are associated with the phenotype. The results suggest that using SNP loci for selection instead of phenotype is efficient under different relative lengths of the breeding cycle, which would allow cost-effective applications in tree breeding programs. Prospects for applications of genomic selection to P. taeda breeding are discussed.