Similar literature
1.

Background

In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to those based on SNP genotypes. Also, at each generation of selection, recombination events between a SNP and a mutation can cause decay in reliability of genomic predictions based on markers rather than on the causal variants. Our objective was to investigate the use of imputed whole-genome sequence genotypes versus high-density SNP genotypes on (the persistency of) the reliability of genomic predictions using real cattle data.

Methods

Highly accurate phenotypes based on daughter performance and Illumina BovineHD Beadchip genotypes were available for 5503 Holstein Friesian bulls. The BovineHD genotypes (631,428 SNPs) of each bull were used to impute whole-genome sequence genotypes (12,590,056 SNPs) using the Beagle software. Imputation was done using a multi-breed reference panel of 429 sequenced individuals. Genomic estimated breeding values for three traits were predicted using a Bayesian stochastic search variable selection (BSSVS) model and a genome-enabled best linear unbiased prediction model (GBLUP). Reliabilities of predictions were based on 2087 validation bulls, while the other 3416 bulls were used for training.

Results

Prediction reliabilities ranged from 0.37 to 0.52. BSSVS performed better than GBLUP in all cases. Reliabilities of genomic predictions were slightly lower with imputed sequence data than with BovineHD chip data. Also, the reliabilities tended to be lower for both sequence data and BovineHD chip data when relationships between training animals were low. No increase in persistency of prediction reliability using imputed sequence data was observed.

Conclusions

Compared to BovineHD genotype data, using imputed sequence data for genomic prediction produced no advantage. To investigate the putative advantage of genomic prediction using (imputed) sequence data, a training set with a larger number of individuals that are distantly related to each other and genomic prediction models that incorporate biological information on the SNPs or that apply stricter SNP pre-selection should be considered.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0149-x) contains supplementary material, which is available to authorized users.

2.

Background

The latest generations of single nucleotide polymorphism (SNP) arrays make it possible to study copy-number variations in addition to genotyping measures.

Results

MPAgenomics, standing for multi-patient analysis (MPA) of genomic markers, is an R-package devoted to: (i) efficient segmentation and (ii) selection of genomic markers from multi-patient copy number and SNP data profiles. It provides wrappers from commonly used packages to streamline their repeated (sometimes difficult) manipulation, offering an easy-to-use pipeline for beginners in R. The segmentation of successive multiple profiles (finding losses and gains) is performed with an automatic choice of parameters involved in the wrapped packages. Considering multiple profiles at the same time, MPAgenomics wraps efficient penalized regression methods to select relevant markers associated with a given outcome.

Conclusions

MPAgenomics provides an easy tool to analyze data from SNP arrays in R. The R-package MPAgenomics is available on CRAN.

3.

Background

Many areas critical to agricultural production and research, such as breeding and trait mapping in plants and livestock, require robust and scalable genotyping platforms. Genotyping-by-sequencing (GBS) is one such method and is highly suited to non-human organisms. In the GBS protocol, genomic DNA is fractionated via restriction digest, then reduced representation is achieved through size selection. Since many restriction sites are conserved across a species, the sequenced portion of the genome is highly consistent within a population. This makes the GBS protocol highly suited for experiments that require surveying large numbers of markers within a population, such as those involving genetic mapping, breeding, and population genomics. We have modified the GBS technology in a number of ways. Custom, enzyme-specific adaptors have been replaced with standard Illumina adaptors compatible with blunt-end restriction enzymes. Multiplexing is achieved through a dual barcoding system, and bead-based library preparation protocols allow for in-solution size selection and eliminate the need for columns and gels.

Results

A panel of eight restriction enzymes was selected for testing on B73 maize and Nipponbare rice genomic DNA. Quality of the data was demonstrated by identifying that the vast majority of reads from each enzyme aligned to restriction sites predicted in silico. The link between enzyme parameters and experimental outcome was demonstrated by showing that the sequenced portion of the genome was adaptable by selecting enzymes based on motif length, complexity, and methylation sensitivity. The utility of the new GBS protocol was demonstrated by correctly mapping several in a maize F2 population resulting from a B73 × Country Gentleman test cross.

Conclusions

This technology is readily adaptable to different genomes, highly amenable to multiplexing and compatible with over forty commercially available restriction enzymes. These advancements represent a major improvement in genotyping technology by providing a highly flexible and scalable GBS that is readily implemented for studies on genome-wide variation.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-979) contains supplementary material, which is available to authorized users.

4.

Background

Recent developments in SNP discovery and high throughput genotyping technology have made the use of high-density SNP markers to predict breeding values feasible. This involves estimation of the SNP effects in a training data set, and use of these estimates to evaluate the breeding values of other "evaluation" individuals. Simulation studies have shown that these predictions of breeding values can be accurate, when training and evaluation individuals are (closely) related. However, many general applications of genomic selection require the prediction of breeding values of "unrelated" individuals, i.e. individuals from the same population, but not particularly closely related to the training individuals.

Methods

Accuracy of selection was investigated by computer simulation of small populations. Using scaling arguments, the results were extended to different populations, training data sets and genome sizes, and different trait heritabilities.

Results

Prediction of breeding values of unrelated individuals required a substantially higher marker density and number of training records than when prediction individuals were offspring of training individuals. However, when the number of records was 2*Ne*L and the number of markers was 10*Ne*L, the breeding values of unrelated individuals could be predicted with accuracies of 0.88–0.93, where Ne is the effective population size and L the genome size in Morgan. Reducing this requirement to 1*Ne*L individuals reduced prediction accuracies to 0.73–0.83.

Conclusion

For livestock populations, 1NeL requires about 30,000 training records, but this may be reduced if training and evaluation animals are related. A prediction equation is presented that predicts accuracy when training and evaluation individuals are related. For humans, 1NeL requires ~350,000 individuals, which means that human disease risk prediction is possible only for diseases that are determined by a limited number of genes. Otherwise, genotyping and phenotypic recording need to become very common in the future.
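The scaling rules quoted in this abstract (training records as a multiple of Ne*L, markers as 10*Ne*L) are simple arithmetic, and a minimal sketch may make them concrete. The function names are hypothetical, and the example values Ne = 1000 and L = 30 Morgan are assumptions chosen only because their product matches the roughly 30,000 records quoted for livestock; the abstract itself does not state which Ne and L it used.

```python
def required_training_records(ne, genome_morgan, multiplier=1):
    """Number of training records as multiplier * Ne * L.

    ne            -- effective population size (Ne)
    genome_morgan -- genome size L in Morgan
    multiplier    -- 1 or 2 in the abstract (1*Ne*L or 2*Ne*L)
    """
    return multiplier * ne * genome_morgan


def required_markers(ne, genome_morgan):
    """Number of markers as 10 * Ne * L, the density quoted in the abstract."""
    return 10 * ne * genome_morgan


if __name__ == "__main__":
    # Illustrative (assumed) livestock-like values, not taken from the paper.
    ne, genome = 1000, 30
    print(required_training_records(ne, genome))      # 1*Ne*L
    print(required_training_records(ne, genome, 2))   # 2*Ne*L
    print(required_markers(ne, genome))               # 10*Ne*L
```

Because both quantities grow linearly in Ne and L, a tenfold larger effective population size implies a tenfold larger training set, which is the reason the human figure quoted in the abstract is so much larger than the livestock one.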

5.

Background

Genomic selection can increase genetic gain within aquaculture breeding programs, but the high costs related to high-density genotyping of a large number of individuals would make the breeding program expensive. In this study, a low-cost method using low-density genotyping of pre-selected candidates and their sibs was evaluated by stochastic simulation.

Methods

A breeding scheme with selection for two traits, one measured on candidates and one on sibs was simulated. Genomic breeding values were estimated within families and combined with conventional family breeding values for candidates that were pre-selected based on conventional BLUP breeding values. This strategy was compared with a conventional breeding scheme and a full genomic selection program for which genomic breeding values were estimated across the whole population. The effects of marker density, level of pre-selection and number of sibs tested and genotyped for the sib-trait were studied.

Results

Within-family genomic breeding values increased genetic gain by 15% and reduced rate of inbreeding by 15%. Genetic gain was robust to a reduction in marker density, with only moderate reductions, even for very low densities. Pre-selection of candidates down to approximately 10% of the candidates before genotyping also had minor effects on genetic gain, but depended somewhat on marker density. The number of test-individuals, i.e. individuals tested for the sib-trait, affected genetic gain, but the fraction of the test-individuals genotyped only affected the relative contribution of each trait to genetic gain.

Conclusions

A combination of genomic within-family breeding values, based on low-density genotyping, and conventional BLUP family breeding values was shown to be a possible low marker density implementation of genomic selection for species with large full-sib families for which the costs of genotyping must be kept low without compromising the effect of genomic selection on genetic gain.

6.

Background

Genotyping accounts for a substantial part of the cost of genomic selection (GS). Using both dense and sparse SNP chips, together with imputation of missing genotypes, can reduce these costs. The aim of this study was to identify the set of candidates that are most important for dense genotyping, when they are used to impute the genotypes of sparsely genotyped animals. In a real pig pedigree, the 2500 most recently born pigs of the last generation, i.e. the target animals, were used for sparse genotyping. Their missing genotypes were imputed using either Beagle or LDMIP from T densely genotyped candidates chosen from the whole pedigree. A new optimization method was derived to identify the best animals for dense genotyping, which minimized the conditional genetic variance of the target animals, using either the pedigree-based relationship matrix (MCA), or a genotypic relationship matrix based on sparse marker genotypes (MCG). These and five other methods for selecting the T animals were compared using T = 100 or 200 animals; SNP genotypes were obtained assuming Ne = 100 or 200, with MAF thresholds set to D = 0.01, 0.05 or 0.10. The performances of the methods were compared using the following criteria: call rate of true genotypes, accuracy of genotype prediction, and accuracy of genomic evaluations using the imputed genotypes.

Results

For all criteria, MCA and MCG performed better than other selection methods, significantly so for all methods other than selection of sires with the largest numbers of offspring. Methods that choose animals that have the closest average relationship or contribution to the target population gave the lowest accuracy of imputation, in some cases worse than random selection, and should be avoided in practice.

Conclusion

Minimization of the conditional variance of the genotypes in target animals provided an effective optimization procedure for prioritizing animals for genotyping or sequencing.

Electronic supplementary material

The online version of this article (doi:10.1186/1297-9686-46-46) contains supplementary material, which is available to authorized users.

7.

Background

Commercial breeding programs seek to maximise the rate of genetic gain while minimizing the costs of attaining that gain. Genomic information offers great potential to increase rates of genetic gain but it is expensive to generate. Low-cost genotyping strategies combined with genotype imputation offer dramatically reduced costs. However, both the costs and accuracy of imputation of these strategies are highly sensitive to several factors. The objective of this paper was to explore the cost and imputation accuracy of several alternative genotyping strategies in pedigreed populations.

Methods

Pedigree and genotype data from a commercial pig population were used. Several alternative genotyping strategies were explored. The strategies differed in the density of genotypes used for the ancestors and the individuals to be imputed. Parents, grandparents, and other relatives that were not descendants, were genotyped at high-density, low-density, or extremely low-density, and associated costs and imputation accuracies were evaluated.

Results

Imputation accuracy and cost were influenced by the alternative genotyping strategies. Given the mating ratios and the numbers of offspring produced by males and females, an optimized low-cost genotyping strategy for a commercial pig population could involve genotyping male parents at high-density, female parents at low-density (e.g. 3000 SNP), and selection candidates at very low-density (384 SNP).

Conclusions

Among the selection candidates, 95.5 % and 93.5 % of the genotype variation contained in the high-density SNP panels were recovered using genotyping strategies that cost, respectively, $24.74 and $20.58 per candidate.

8.

Background

At the current price, the use of high-density single nucleotide polymorphism (SNP) genotyping assays in genomic selection of dairy cattle is limited to applications involving elite sires and dams. The objective of this study was to evaluate the use of low-density assays to predict direct genomic value (DGV) on five milk production traits, an overall conformation trait, a survival index, and two profit index traits (APR, ASI).

Methods

Dense SNP genotypes were available for 42,576 SNP for 2,114 Holstein bulls and 510 cows. A subset of 1,847 bulls born between 1955 and 2004 was used as a training set to fit models with various sets of pre-selected SNP. A group of 297 bulls born between 2001 and 2004 and all cows born between 1992 and 2004 were used to evaluate the accuracy of DGV prediction. Ridge regression (RR) and partial least squares regression (PLSR) were used to derive prediction equations and to rank SNP based on the absolute value of the regression coefficients. Four alternative strategies were applied to select subsets of SNP, namely: subsets of the highest ranked SNP for each individual trait, or a single subset of evenly spaced SNP, where SNP were selected based on their rank for ASI, APR or minor allele frequency within intervals of approximately equal length.
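The "evenly spaced" selection strategy described above, picking one SNP per interval of approximately equal length according to a ranking criterion, can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the function name `select_evenly_spaced` is hypothetical, positions are assumed to lie on a single chromosome, and `scores` stands in for whichever criterion is used (for example the absolute regression coefficient for ASI or APR, or minor allele frequency), with higher values preferred.

```python
def select_evenly_spaced(positions, scores, n_intervals):
    """Pick the best-scoring SNP in each of n_intervals equal-length bins.

    positions   -- SNP positions on one chromosome (assumed, e.g. base pairs)
    scores      -- ranking criterion per SNP; higher is better (assumed)
    n_intervals -- number of equal-length intervals to split the range into
    Returns the indices of the selected SNPs, ordered along the chromosome.
    Intervals that contain no SNP contribute nothing to the result.
    """
    lo, hi = min(positions), max(positions)
    # Guard against a zero-width range when all positions coincide.
    width = (hi - lo) / n_intervals or 1
    best = {}
    for idx, (pos, score) in enumerate(zip(positions, scores)):
        # The right-most position would fall into bin n_intervals; clamp it.
        bin_id = min(int((pos - lo) / width), n_intervals - 1)
        if bin_id not in best or score > scores[best[bin_id]]:
            best[bin_id] = idx
    return [best[b] for b in sorted(best)]


if __name__ == "__main__":
    positions = [0, 10, 40, 60, 90, 100]
    scores = [0.1, 0.9, 0.3, 0.5, 0.2, 0.8]
    print(select_evenly_spaced(positions, scores, 2))
```

Selecting one marker per bin keeps coverage roughly uniform along the genome, while the score decides which marker represents each bin.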

Results

RR and PLSR performed very similarly to predict DGV, with PLSR performing better for low-density assays and RR for higher-density SNP sets. When using all SNP, DGV predictions for production traits, which have a higher heritability, were more accurate (0.52-0.64) than for survival (0.19-0.20), which has a low heritability. The gain in accuracy using subsets that included the highest ranked SNP for each trait was marginal (5-6%) over a common set of evenly spaced SNP when at least 3,000 SNP were used. Subsets containing 3,000 SNP provided more than 90% of the accuracy that could be achieved with a high-density assay for cows, and 80% of the high-density assay for young bulls.

Conclusions

Accurate genomic evaluation of the broader bull and cow population can be achieved with a single genotyping assay containing ~3,000 to 5,000 evenly spaced SNP.

9.

Background

Simulation studies have shown that accuracy and genetic gain are increased in genomic selection schemes compared to traditional aquaculture sib-based schemes. In genomic selection, accuracy of selection can be maximized by increasing the precision of the estimation of SNP effects and by maximizing the relationships between test sibs and candidate sibs. Another means of increasing the accuracy of the estimation of SNP effects is to create individuals in the test population with extreme genotypes. The latter approach was studied here with creation of double haploids and use of non-random mating designs.

Methods

Six alternative breeding schemes were simulated in which the design of the test population was varied: test sibs inherited maternal (Mat), paternal (Pat) or a mixture of maternal and paternal (MatPat) double haploid genomes or test sibs were obtained by maximum coancestry mating (MaxC), minimum coancestry mating (MinC), or random (RAND) mating. Three thousand test sibs and 3000 candidate sibs were genotyped. The test sibs were recorded for a trait that could not be measured on the candidates and were used to estimate SNP effects. Selection was done by truncation on genome-wide estimated breeding values and 100 individuals were selected as parents each generation, equally divided between both sexes.

Results

Results showed a 7 to 19% increase in selection accuracy and a 6 to 22% increase in genetic gain in the MatPat scheme compared to the RAND scheme. These increases were greater with lower heritabilities. Among all other scenarios, i.e. Mat, Pat, MaxC, and MinC, no substantial differences in selection accuracy and genetic gain were observed.

Conclusions

In conclusion, a test population designed with a mixture of paternal and maternal double haploids, i.e. the MatPat scheme, increases substantially the accuracy of selection and genetic gain. This will be particularly interesting for traits that cannot be recorded on the selection candidates and require the use of sib tests, such as disease resistance and meat quality.

10.

Background

Advances in genotyping technology, such as genotyping by sequencing (GBS), are making genomic prediction more attractive to reduce breeding cycle times and costs associated with phenotyping. Genomic prediction and selection have been studied in several crop species, but no reports exist in soybean. The objectives of this study were to (i) evaluate prospects for genomic selection using GBS in a typical soybean breeding program and (ii) evaluate the effect of GBS marker selection and imputation on genomic prediction accuracy. To achieve these objectives, a set of soybean lines sampled from the University of Nebraska Soybean Breeding Program were genotyped using GBS and evaluated for yield and other agronomic traits at multiple Nebraska locations.

Results

Genotyping by sequencing scored 16,502 single nucleotide polymorphisms (SNPs) with minor-allele frequency (MAF) > 0.05 and percentage of missing values ≤ 5% on 301 elite soybean breeding lines. When SNPs with up to 80% missing values were included, 52,349 SNPs were scored. Prediction accuracy for grain yield, assessed using cross validation, was estimated to be 0.64, indicating good potential for using genomic selection for grain yield in soybean. Filtering SNPs based on missing data percentage had little to no effect on prediction accuracy, especially when random forest imputation was used to impute missing values. The highest accuracies were observed when random forest imputation was used on all SNPs, but differences were not significant. A standard additive G-BLUP model was robust; modeling additive-by-additive epistasis did not provide any improvement in prediction accuracy. The effect of training population size on accuracy began to plateau around 100, but accuracy steadily climbed until the largest possible size was used in this analysis. Including only SNPs with MAF > 0.30 provided higher accuracies when training populations were smaller.
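The SNP filters mentioned in these results (minor-allele frequency above 0.05 and at most 5% missing values, or a relaxed limit of 80% missing) can be illustrated with a small sketch. The function name `filter_snps` and the data layout are assumptions made for this example: genotypes are taken to be coded as counts of one allele (0, 1 or 2) with `None` marking a missing call. The abstract does not describe how the authors stored or filtered their data.

```python
def filter_snps(genotypes, min_maf=0.05, max_missing=0.05):
    """Return column indices of SNPs passing MAF and missing-rate filters.

    genotypes   -- list of rows, one per line/individual; each entry is an
                   allele count 0/1/2, or None for a missing call (assumed coding)
    min_maf     -- keep SNPs with minor-allele frequency strictly above this
    max_missing -- keep SNPs with a fraction of missing calls at or below this
    """
    n_ind = len(genotypes)
    n_snp = len(genotypes[0])
    kept = []
    for j in range(n_snp):
        calls = [row[j] for row in genotypes if row[j] is not None]
        if not calls:
            continue  # nothing was called for this SNP, so it cannot be used
        missing = 1.0 - len(calls) / n_ind
        # Frequency of the counted allele among the called genotypes.
        p = sum(calls) / (2.0 * len(calls))
        maf = min(p, 1.0 - p)
        if maf > min_maf and missing <= max_missing:
            kept.append(j)
    return kept


if __name__ == "__main__":
    toy = [
        [0, 1, 2, 0],
        [1, 1, None, 2],
        [2, 1, 2, None],
        [0, 0, 2, 2],
    ]
    print(filter_snps(toy))                    # strict missing-data filter
    print(filter_snps(toy, max_missing=0.80))  # relaxed missing-data filter
```

Relaxing `max_missing` admits more SNPs, which mirrors the larger marker count reported when SNPs with up to 80% missing values were included; any retained missing calls would then still need to be imputed before prediction.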

Conclusions

Using GBS for genomic prediction in soybean holds good potential to expedite genetic gain. Our results suggest that standard additive G-BLUP models can be used on unfiltered, imputed GBS data without loss in accuracy.

11.

Background

The prediction of the outcomes from multistage breeding schemes is especially important for the introduction of genomic selection in dairy cattle. Decorrelated selection indices can be used for the optimisation of such breeding schemes. However, they decrease the accuracy of estimated breeding values and, therefore, the genetic gain to an unforeseeable extent and have not been applied to breeding schemes with different generation intervals and selection intensities in each selection path.

Methods

A grid search was applied in order to identify optimum breeding plans to maximise the genetic gain per year in a multistage, multipath dairy cattle breeding program. In this program, different values of the accuracy of estimated genomic breeding values and of their costs per individual were applied, whereby the total breeding costs were restricted. Both decorrelated indices and optimum selection indices were used together with fast multidimensional integration algorithms to produce results.

Results

In comparison to optimum indices, the genetic gain with decorrelated indices was up to 40% less and the proportion of individuals undergoing genomic selection was different. Additionally, the interaction between selection paths was counter-intuitive and difficult to interpret. Independent of using decorrelated or optimum selection indices, genomic selection replaced traditional progeny testing when maximising the genetic gain per year, as long as the accuracy of estimated genomic breeding values was ≥ 0.45. Overall breeding costs were mainly generated in the path "dam-sire". Selecting males was still the main source of genetic gain per year.

Conclusion

Decorrelated selection indices should not be used because of misleading results and the availability of accurate and fast algorithms for exact multidimensional integration. Genomic selection is the method of choice when maximising the genetic gain per year but genotyping females may not allow for a reduction in overall breeding costs. Furthermore, the economic justification of genotyping females remains questionable.

12.

Background

A large single nucleotide polymorphism (SNP) dataset was used to analyze genome-wide diversity in a diverse collection of watermelon cultivars representing globally cultivated watermelon genetic diversity. The marker density required for conducting successful association mapping depends on the extent of linkage disequilibrium (LD) within a population. Use of genotyping by sequencing reveals large numbers of SNPs that in turn generate opportunities in genome-wide association mapping and marker-assisted selection, even in crops such as watermelon for which few genomic resources are available. In this paper, we used genome-wide genetic diversity to study LD, selective sweeps, and pairwise FST distributions among worldwide cultivated watermelons to track signals of domestication.

Results

We examined 183 Citrullus lanatus var. lanatus accessions representing domesticated watermelon and generated a set of 11,485 SNP markers using genotyping by sequencing. With a diverse panel of worldwide cultivated watermelons, we identified a set of 5,254 SNPs with a minor allele frequency of ≥ 0.05, distributed across the genome. All ancestries were traced to Africa and an admixture of various ancestries constituted secondary gene pools across various continents. A sliding window analysis using pairwise FST values was used to resolve selective sweeps. We identified strong selection on chromosomes 3 and 9 that might have contributed to the domestication process. Pairwise analysis of adjacent SNPs within a chromosome as well as within a haplotype allowed us to estimate genome-wide LD decay. LD was also detected within individual genes on various chromosomes. Principal component and ancestry analyses were used to account for population structure in a genome-wide association study. We further mapped important genes for soluble solid content using a mixed linear model.

Conclusions

Information concerning the SNP resources, population structure, and LD developed in this study will help in identifying agronomically important candidate genes from the genomic regions underlying selection and for mapping quantitative trait loci using a genome-wide association study in sweet watermelon.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-767) contains supplementary material, which is available to authorized users.

13.

Background

In plant breeding, there are two primary applications for DNA markers in selection: 1) selection of known genes using a single marker assay (marker-assisted selection; MAS); and 2) whole-genome profiling and prediction (genomic selection; GS). Typically, marker platforms have addressed only one of these objectives.

Results

We have developed spiked genotyping-by-sequencing (sGBS), which combines targeted amplicon sequencing with reduced representation genotyping-by-sequencing. To minimize the cost of targeted assays, we utilize a small percent of sequencing capacity available in runs of GBS libraries to “spike” amplified targets of a priori alleles tagged with a different set of unique barcodes. This open platform allows multiple, single-target loci to be assayed while simultaneously generating a whole-genome profile. This dual-genotyping approach allows different sets of samples to be evaluated for single markers or whole genome-profiling. Here, we report the application of sGBS on a winter wheat panel that was screened for converted KASP markers and newly-designed markers targeting known polymorphisms in the leaf rust resistance gene Lr34.

Conclusions

The flexibility and low cost of sGBS will enable a range of applications across genetics research. Specifically in breeding applications, the sGBS approach will allow breeders to obtain a whole-genome profile of important individuals while simultaneously targeting specific genes for a range of selection strategies across the breeding program.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1404-9) contains supplementary material, which is available to authorized users.

14.

Background

Using haplotype blocks as predictors rather than individual single nucleotide polymorphisms (SNPs) may improve genomic predictions, since haplotypes are in stronger linkage disequilibrium with the quantitative trait loci than are individual SNPs. It has also been hypothesized that an appropriate selection of a subset of haplotype blocks can result in similar or better predictive ability than when using the whole set of haplotype blocks. This study investigated genomic prediction using a set of haplotype blocks that contained the SNPs with large effects estimated from an individual SNP prediction model. We analyzed protein yield, fertility and mastitis of Nordic Holstein cattle, and used high-density markers (about 770k SNPs). To reach an optimum number of haplotype variables for genomic prediction, predictions were performed using subsets of haplotype blocks that contained a range of 1000 to 50 000 main SNPs.

Results

The use of haplotype blocks improved the prediction reliabilities, even when selection focused on only a group of haplotype blocks. In this case, the use of haplotype blocks that contained the 20 000 to 50 000 SNPs with the highest effect was sufficient to outperform the model that used all individual SNPs as predictors (up to 1.3 % improvement in prediction reliability for mastitis, compared to individual SNP approach), and the achieved reliabilities were similar to those using all haplotype blocks available in the genome data (from 0.6 % lower to 0.8 % higher reliability).

Conclusions

Haplotype blocks used as predictors can improve the reliability of genomic prediction compared to the individual SNP model. Furthermore, the use of a subset of haplotype blocks that contains the main SNP effects from genomic data could be a feasible approach to genomic prediction in dairy cattle, given an increase in density of genotype data available. The predictive ability of the models that use a subset of haplotype blocks was similar to that obtained using either all haplotype blocks or all individual SNPs, with the benefit of having a much lower computational demand.

15.

Background

A RIL population between Solanum lycopersicum cv. Moneymaker and S. pimpinellifolium G1.1554 was genotyped with a custom made SNP array. Additionally, a subset of the lines was genotyped by sequencing (GBS).

Results

A total of 1974 polymorphic SNPs were selected to develop a linkage map of 715 unique genetic loci. We generated plots for visualizing the recombination patterns of the population relating physical and genetic positions along the genome. This linkage map was used to identify two QTLs for TYLCV resistance which contained favourable alleles derived from S. pimpinellifolium. Further GBS was used to saturate regions of interest, and the mapping resolution of the two QTLs was improved. The analysis showed highest significance on Chromosome 11 close to the region of 51.3 Mb (qTy-p11) and another on Chromosome 3 near 46.5 Mb (qTy-p3). Furthermore, we explored the population using untargeted metabolic profiling, and the most significant differences between susceptible and resistant plants were mainly associated with sucrose and flavonoid glycosides.

Conclusions

The SNP information obtained from an array allowed a first QTL screening of our RIL population. With additional SNP data of a subset of RILs, obtained through GBS, we were able to perform an in silico mapping improvement to further confirm regions associated with our trait of interest. With the combination of different -omics platforms we provide valuable insight into the genetics of S. pimpinellifolium-derived TYLCV resistance.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-1152) contains supplementary material, which is available to authorized users.

16.

Background

Genomic prediction faces two main statistical problems: multicollinearity and n ≪ p (many fewer observations than predictor variables). Principal component (PC) analysis is a multivariate statistical method that is often used to address these problems. The objective of this study was to compare the performance of PC regression (PCR) for genomic prediction with that of a commonly used REML model with a genomic relationship matrix (GREML) and to investigate the full potential of PCR for genomic prediction.

Methods

The PCR model used either a common or a semi-supervised approach, where PC were selected based either on their eigenvalues (i.e. proportion of variance explained by SNP (single nucleotide polymorphism) genotypes) or on their association with phenotypic variance in the reference population (i.e. the regression sum of squares contribution). Cross-validation within the reference population was used to select the optimum PCR model that minimizes mean squared error. Pre-corrected average daily milk, fat and protein yields of 1609 first lactation Holstein heifers, from Ireland, UK, the Netherlands and Sweden, which were genotyped with 50 k SNPs, were analysed. Each testing subset included animals from only one country, or from only one selection line for the UK.

Results

In general, accuracies of GREML and PCR were similar but GREML slightly outperformed PCR. Inclusion of genotyping information of validation animals into model training (semi-supervised PCR), did not result in more accurate genomic predictions. The highest achievable PCR accuracies were obtained across a wide range of numbers of PC fitted in the regression (from one to more than 1000), across test populations and traits. Using cross-validation within the reference population to derive the number of PC, yielded substantially lower accuracies than the highest achievable accuracies obtained across all possible numbers of PC.

Conclusions

On average, PCR performed only slightly less well than GREML. When the optimal number of PC was determined based on realized accuracy in the testing population, PCR showed a higher potential in terms of achievable accuracy, but this potential was not realized when PC selection was based on cross-validation. A standard approach for selecting the optimal set of PC in PCR remains a challenge.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-014-0060-x) contains supplementary material, which is available to authorized users.

17.

Background

Polyploidy is a major component of eukaryote evolution. Estimation of allele copy numbers for molecular markers has long been considered a challenge for polyploid species, while this process is essential for most genetic research. With the increasing availability and whole-genome coverage of single nucleotide polymorphism (SNP) markers, it is essential to implement a versatile SNP genotyping method to assign allelic configuration efficiently in polyploids.

Scope

This work evaluates the usefulness of the KASPar method, based on competitive allele-specific PCR, for the assignment of SNP allelic configuration. Citrus was chosen as a model because of its economic importance, the ongoing worldwide polyploidy manipulation projects for cultivar and rootstock breeding, and the increasing availability of SNP markers.

Conclusions

Fifteen SNP markers were successfully designed that produced clear allele signals that were in agreement with previous genotyping results at the diploid level. The analysis of DNA mixes between two haploid lines (Clementine and pummelo) at 13 different ratios revealed a very high correlation (average = 0.9796; s.d. = 0.0094) between the allele ratio and two parameters [θ angle = tan⁻¹(y/x) and y′ = y/(x + y)] derived from the two normalized allele signals (x and y) provided by KASPar. Separated cluster analysis and analysis of variance (ANOVA) from mixed DNA simulating triploid and tetraploid hybrids provided 99.71 % correct allelic configuration. Moreover, triploid populations arising from 2n gametes and interploid crosses were easily genotyped and provided useful genetic information. This work demonstrates that the KASPar SNP genotyping technique is an efficient way to assign heterozygous allelic configurations within polyploid populations. This method is accurate, simple and cost-effective. Moreover, it may be useful for quantitative studies, such as relative allele-specific expression analysis and bulk segregant analysis.
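
As a small illustration, the two parameters named in the abstract (the θ angle and y′) can be computed from a pair of normalized allele signals as sketched below. The function name `kaspar_parameters` and the example signal values are assumptions for illustration; `math.atan2(y, x)` is used in place of a literal tan⁻¹(y/x) because it gives the same angle for x > 0 and remains defined when x = 0.

```python
import math

def kaspar_parameters(x, y):
    """Return (theta, y_prime) for two normalized allele signals x and y.

    theta   = arctan(y / x), in radians
    y_prime = y / (x + y)
    """
    theta = math.atan2(y, x)   # equals atan(y / x) for x > 0; defined at x == 0
    y_prime = y / (x + y)
    return theta, y_prime

# Hypothetical signals: equal contribution from both alleles.
theta, y_prime = kaspar_parameters(1.0, 1.0)
print(theta, y_prime)
```

Both parameters increase monotonically with the share of the y allele, which is what allows them to be correlated with the allele ratio of a DNA mix.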

18.

Background

A newly recognized type of genetic variation, Copy Number Variation (CNV), is detected in mammalian genomes, e.g. the cattle genome. This form of variation can potentially cause phenotypic variation. Our objective was to determine whether dense SNP (single nucleotide polymorphisms) panels can capture the genetic variation due to a simple bi-allelic CNV, with the prospect of including the effect of such structural variations into genomic predictions.

Methods

A deletion type CNV on bovine chromosome 6 was predicted from its neighboring SNP with a multiple regression model. Our dataset consisted of CNV genotypes of 1,682 cows, along with 100 surrounding SNP genotypes. A prediction model was fitted considering 10 to 100 surrounding SNP and the accuracy obtained directly from the model was confirmed by cross-validation.

Results and conclusions

The accuracy of prediction increased with an increasing number of SNP in the model and the predicted accuracies were similar to those obtained by cross-validation. A substantial increase in accuracy was observed when the number of SNP increased from 10 to 50 but thereafter the increase was smaller, reaching the highest accuracy (0.94) with 100 surrounding SNP. Thus, we conclude that the genotype of a deletion type CNV and its putative QTL effect can be predicted with a maximum accuracy of 0.94 from surrounding SNP. This high prediction accuracy suggests that genetic variation due to simple deletion CNV is well captured by dense SNP panels. Since genomic selection relies on the availability of a dense marker panel with markers in close linkage disequilibrium to the QTL in order to predict their genetic values, we also discuss opportunities for genomic selection to predict the effects of CNV by dense SNP panels, when CNV cause variation in quantitative traits.

19.

Background

The selection of variable sites for inclusion in genomic analyses can influence results, especially when exemplar populations are used to determine polymorphic sites. We tested the impact of ascertainment bias on the inference of population genetic parameters using empirical and simulated data representing the three major continental groups of cattle: European, African, and Indian. We simulated data under three demographic models. Each simulated data set was subjected to three ascertainment schemes: (I) random selection; (II) geographically biased selection; and (III) selection biased toward loci polymorphic in multiple groups. Empirical data comprised samples of 25 individuals representing each continental group. These cattle were genotyped for 47,506 loci from the bovine 50 K SNP panel. We compared the inference of population histories for the empirical and simulated data sets across different ascertainment conditions using FST and principal components analysis (PCA).

Results

Bias toward shared polymorphism across continental groups is apparent in the empirical SNP data. Bias toward uneven levels of within-group polymorphism decreases estimates of FST between groups. Subpopulation-biased selection of SNPs changes the weighting of principal component axes and can affect inferences about proportions of admixture and population histories using PCA. PCA-based inferences of population relationships are largely congruent across types of ascertainment bias, even when ascertainment bias is strong.

Conclusions

Analyses of ascertainment bias in genomic data have largely been conducted on human data. As genomic analyses are being applied to non-model organisms, and across taxa with deeper divergences, care must be taken to consider the potential for bias in ascertainment of variation to affect inferences. Estimates of FST, time of separation, and population divergence as estimated by principal components analysis can be misleading if this bias is not taken into account.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1469-5) contains supplementary material, which is available to authorized users.

20.

Background

Genomic selection (GS) uses molecular breeding values (MBV) derived from dense markers across the entire genome for selection of young animals. The accuracy of MBV prediction is important for a successful application of GS. Recently, several methods have been proposed to estimate MBV. Initial simulation studies have shown that these methods can accurately predict MBV. In this study we compared the accuracies and possible bias of five different regression methods in an empirical application in dairy cattle.

Methods

Genotypes of 7,372 SNP and highly accurate EBV of 1,945 dairy bulls were used to predict MBV for protein percentage (PPT) and a profit index (Australian Selection Index, ASI). Marker effects were estimated by least squares regression (FR-LS), Bayesian regression (Bayes-R), random regression best linear unbiased prediction (RR-BLUP), partial least squares regression (PLSR) and nonparametric support vector regression (SVR) in a training set of 1,239 bulls. Accuracy and bias of MBV prediction were calculated from cross-validation of the training set and tested against a test team of 706 young bulls.
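
Of the five methods listed, RR-BLUP is the simplest to write down: it is equivalent to ridge regression of the phenotype (here, EBV) on all marker genotypes with a single shrinkage parameter. The sketch below is a minimal illustration under stated assumptions, not the study's implementation: the data are simulated, the function name `rr_blup_marker_effects` is made up for the example, and the shrinkage parameter `lam` (the ratio of residual to marker-effect variance) is simply fixed, whereas in practice the variance components would be estimated.

```python
import numpy as np

def rr_blup_marker_effects(Z, y, lam):
    """Ridge estimate of marker effects, using the n x n (dual) form:
    beta = Zc' (Zc Zc' + lam * I)^-1 (y - mean(y)), with Zc the column-centered genotypes.
    """
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()
    n = Zc.shape[0]
    alpha = np.linalg.solve(Zc @ Zc.T + lam * np.eye(n), yc)
    return Zc.T @ alpha

# Simulated example (assumed data): 50 bulls, 80 SNP genotypes coded 0/1/2.
rng = np.random.default_rng(0)
Z = rng.integers(0, 3, size=(50, 80)).astype(float)
true_effects = rng.normal(0, 0.2, size=80)
y = Z @ true_effects + rng.normal(0, 1.0, size=50)

Z_train, y_train = Z[:30], y[:30]   # training set
Z_test, y_test = Z[30:], y[30:]     # held-out "young" animals

lam = 10.0                          # assumed shrinkage parameter
beta = rr_blup_marker_effects(Z_train, y_train, lam)

# Molecular breeding values for the test animals, and their correlation with y.
mbv = y_train.mean() + (Z_test - Z_train.mean(axis=0)) @ beta
accuracy = np.corrcoef(mbv, y_test)[0, 1]
print(accuracy)
```

The n × n form is used because the number of markers usually exceeds the number of animals; it gives the same marker effects as solving the p × p ridge equations directly.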

Results

For both traits, FR-LS using a subset of SNP was significantly less accurate than all other methods, which used all SNP. Accuracies obtained by Bayes-R, RR-BLUP, PLSR and SVR were very similar for ASI (0.39-0.45) and for PPT (0.55-0.61). Overall, SVR gave the highest accuracy. All methods resulted in biased MBV predictions for ASI; for PPT, only RR-BLUP and SVR predictions were unbiased. A significant decrease in accuracy of prediction of ASI was seen in young test cohorts of bulls compared to the accuracy derived from cross-validation of the training set. This reduction was not apparent for PPT. Combining MBV predictions with pedigree-based predictions gave 1.05 to 1.34 times higher accuracies compared to predictions based on pedigree alone. The methods differed considerably in computational requirements, with PLSR and RR-BLUP requiring the least computing time.

Conclusions

The four methods that use information from all SNP, namely RR-BLUP, Bayes-R, PLSR and SVR, generate similar accuracies of MBV prediction for genomic selection, and their use in the selection of immediate future generations in dairy cattle will be comparable. The use of FR-LS in genomic selection is not recommended.
