Similar Articles
1.
Genome-wide breeding value (GWEBV) estimation methods can be classified according to the prior distribution they assume for marker effects. Genome-wide BLUP methods assume a normal prior distribution with a constant variance for all markers and are computationally fast. Bayesian methods apply more flexible prior distributions of SNP effects that allow for very large SNP effects, although most effects are small or even zero; however, these methods are often computationally demanding because they rely on Markov chain Monte Carlo sampling. In this study, we adopted the Pareto principle to weight the available marker loci, i.e., we assume that x% of the loci explain (100 - x)% of the total genetic variance. Under this principle, it is also possible to define the variances of the prior distributions of the "big" and "small" SNP: the relatively few big SNP explain a large proportion of the genetic variance, whereas the majority of SNP have small effects and explain a minor proportion of it. We name this method MixP, in which the prior distribution is a mixture of two normal distributions, one with a big variance and one with a small variance. Simulation results using a real Norwegian Red cattle pedigree show that MixP is at least as accurate as the other methods in all studied cases. Assuming the overall genetic variance is known, this method also reduces the number of hyper-parameters of the prior distribution from two (proportion and variance of SNP with big effects) to one (proportion of SNP with big effects). The mixture-of-normals prior made it possible to solve the equations iteratively, which reduced the computational load by two orders of magnitude. In an era of marker densities reaching millions and of whole-genome sequence data, MixP provides a computationally feasible Bayesian method of analysis.
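The Pareto split fixes both prior variances once the total genetic variance is known, which is why only the proportion x remains as a hyper-parameter. The sketch below is a minimal illustration, not the authors' MixP implementation: it assumes SNP effects on a standardized genotype scale (so each SNP contributes its effect variance to the total), and the function names `mixp_variances` and `mixture_density` are hypothetical.

```python
import math


def mixp_variances(total_var: float, n_snp: int, x_percent: float) -> tuple[float, float]:
    """Per-SNP prior variances for 'big' and 'small' SNP under a Pareto-type split.

    Assumes x% of the SNP explain (100 - x)% of total_var and that effects are on a
    standardized genotype scale (each SNP contributes its effect variance).
    """
    pi_big = x_percent / 100.0                     # proportion of SNP with big effects
    n_big = pi_big * n_snp
    n_small = (1.0 - pi_big) * n_snp
    var_big = (1.0 - pi_big) * total_var / n_big   # big SNP share (100 - x)% of the variance
    var_small = pi_big * total_var / n_small       # small SNP share the remaining x%
    return var_big, var_small


def mixture_density(beta: float, pi_big: float, var_big: float, var_small: float) -> float:
    """Prior density of one SNP effect: pi*N(0, var_big) + (1 - pi)*N(0, var_small)."""
    def normal_pdf(v: float, var: float) -> float:
        return math.exp(-v * v / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

    return pi_big * normal_pdf(beta, var_big) + (1.0 - pi_big) * normal_pdf(beta, var_small)
```

With a total variance of 1.0, 1,000 SNP and x = 20, the 200 big SNP each get a variance of 0.004 and the 800 small SNP each get 0.00025, which together sum back to 1.0.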

2.
3.
4.
Brown rust, caused by Puccinia melanocephala, has had devastating effects on sugarcane (Saccharum spp.) breeding programs and commercial production. The discovery of Bru1, a major gene conferring resistance to brown rust, represented a substantial breakthrough, and markers for Bru1 are the first available for sugarcane molecular breeding. The contribution of Bru1 to brown rust resistance in the Canal Point (CP) sugarcane breeding program was determined as a means of directing future breeding strategies. Bru1 was detected in 285 of 1,072 (27 %) clones used for crossing; this germplasm represents the genetic base for cultivar development in Florida. The frequency of Bru1 was highest in CP clones (42 %) and lowest among Louisiana clones (6 %). Bru1 was not detected in clones with year assignments before 1953, but its frequency increased from 15 % (assignments 1975–1985) to 47 % in the current decade. The increase coincided with the introduction of brown rust to Florida. Bru1 was detected in 155 (32 %) of 485 parental clones tested for brown rust susceptibility at two field locations. Of the clones classified as resistant to brown rust, 154 (59 %) carried Bru1, whereas none of the 100 susceptible clones carried the gene. Bru1 was detected in 667 (44 %) clones in the second clonal stage of selection, 87 % of which were free of brown rust symptoms. Bru1 is the predominant source of resistance in the Florida sugarcane genetic base. Efforts to identify and integrate new brown rust resistance genes must be pursued to minimize the risks associated with a future breakdown of the major-gene resistance provided by Bru1.

5.
The narrow genetic base of peach (Prunus persica L. Batsch) challenges efforts to accurately dissect the genetic architecture of complex traits. Standardized phenotypic assessment of pedigree-linked breeding germplasm, together with new molecular strategies and analytical approaches developed during the RosBREED project to enable marker-assisted breeding (MAB) in Rosaceae crops, has overcome several aspects of this challenge. The genetic underpinnings of fruit size (fruit equatorial diameter, FD) and weight (fresh weight, FW), the two most important components of yield, were investigated using pedigree-based analysis (PBA) in a Bayesian framework, which has emerged as an alternative strategy for studying the genetics of quantitative traits within diverse breeding germplasm across breeding programs. In this study, a complex pedigree with the common founder “Orange Cling” was identified, and FD and FW data from 2011 and 2012 were analyzed. A genetic model including additive and dominance genetic effects was considered, and its robustness was evaluated by using various prior and initial values in the Markov chain Monte Carlo procedure. Five QTLs were identified, which accounted for up to 29 % and 17 % of the phenotypic variation for FD and FW, respectively. Additionally, genomic breeding values were obtained for both traits, with accuracies >85 %. This approach serves as a model study for performing PBA across diverse pedigrees. By incorporating multiple breeding programs, the method and results presented highlight the ability of this strategy to identify genomic resources as targets for DNA marker development and subsequent MAB within each program.

6.
We tested the hypothesis that mating strategies using genomic information realise lower rates of inbreeding (∆F) than those using pedigree information, without compromising rates of genetic gain (∆G). We used stochastic simulation to compare the ∆F and ∆G realised by two mating strategies with pedigree and genomic information in five breeding schemes. The two mating strategies were minimum-coancestry mating (MC) and minimising the covariance between ancestral genetic contributions (MCAC). We also simulated random mating (RAND) as a reference point. Generations were discrete. Animals were truncation-selected for a single trait controlled by 2000 quantitative trait loci, and the trait was observed for all selection candidates before selection. The selection criterion was genomic breeding values predicted by a ridge-regression model. MC and MCAC with genomic information realised 6% to 22% less ∆F than MC and MCAC with pedigree information, without compromising ∆G, across breeding schemes. MC and MCAC realised similar ∆F and ∆G. In turn, MC and MCAC with genomic information realised 28% to 44% less ∆F and up to 14% higher ∆G than RAND. These results indicate that MC and MCAC are more effective at controlling rates of inbreeding with genomic information than with pedigree information. This implies that, in breeding schemes with truncation selection, genomic information should be used for more than just predicting breeding values.
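Minimum-coancestry mating, in its simplest form, chooses the pairing of selected males and females that minimises the summed coancestry of mates. The sketch below assumes equal numbers of males and females mated one-to-one and a coancestry matrix already computed from pedigree or genomic relationships; `min_coancestry_mating` is a hypothetical name, and the exhaustive search is only practical for a handful of animals (a linear assignment solver such as `scipy.optimize.linear_sum_assignment` scales better). It does not implement MCAC.

```python
from itertools import permutations


def min_coancestry_mating(coancestry):
    """Pair males (rows) with females (columns) one-to-one so that the summed
    coancestry of mated pairs is minimal. Exhaustive search: tiny n only."""
    n = len(coancestry)
    best_pairs, best_total = None, float("inf")
    for perm in permutations(range(n)):          # perm[i] = female mated to male i
        total = sum(coancestry[i][perm[i]] for i in range(n))
        if total < best_total:
            best_pairs, best_total = list(enumerate(perm)), total
    return best_pairs, best_total
```

For the coancestry matrix [[0.25, 0.05, 0.10], [0.05, 0.20, 0.15], [0.12, 0.16, 0.30]] (rows are males, columns are females), the search pairs male 0 with female 2, male 1 with female 0 and male 2 with female 1, for a total coancestry of 0.31.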

7.
Improving missing value estimation in microarray data with gene ontology (cited 3 times; self-citations: 0; citations by others: 3)
MOTIVATION: Gene expression microarray experiments produce datasets with frequent missing expression values. Accurate estimation of missing values is an important prerequisite for efficient data analysis, as many statistical and machine learning techniques either require a complete dataset or produce results that depend significantly on the quality of such estimates. A limitation of existing estimation methods for microarray data is that they use no external information; the estimation is based solely on the expression data. We hypothesized that using a priori information on functional similarities, available from public databases, facilitates missing value estimation. RESULTS: We investigated whether semantic similarity derived from gene ontology (GO) annotations could improve the selection of relevant genes for missing value estimation. The relative contribution of each information source was estimated automatically from the data using an adaptive weight selection procedure. Our experimental results on yeast cDNA microarray datasets indicated that considering GO information in the k-nearest neighbor algorithm can enhance its performance considerably, especially when the number of experimental conditions is small and the percentage of missing values is high. The performance gain was less evident with a more sophisticated estimation method. We conclude that even a small proportion of annotated genes can provide improvements in data quality that are significant for the eventual interpretation of microarray experiments. AVAILABILITY: Java and Matlab code is available on request from the authors. SUPPLEMENTARY MATERIAL: Available online at http://users.utu.fi/jotatu/GOImpute.html.
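A k-nearest-neighbor estimate replaces a missing value with the values of the most similar genes in the same condition. The sketch below assumes a simple unweighted mean over the k neighbors and a hypothetical blending rule, (1 - weight) * expression distance + weight * (1 - GO similarity), to show where GO information could enter; it is not the authors' adaptive weighting procedure, and `knn_impute_value` and `go_sim` are illustrative names.

```python
import math


def knn_impute_value(data, target, col, k, go_sim=None, weight=0.0):
    """Estimate data[target][col] (None marks missing) as the mean of that column
    over the k nearest genes that have it observed.

    Expression distance: root mean squared difference over columns observed in both
    genes. If go_sim (hypothetical gene-by-gene similarity in [0, 1]) is given, the
    distance is blended as (1 - weight) * expr_dist + weight * (1 - go_sim).
    """
    candidates = []
    for g, row in enumerate(data):
        if g == target or row[col] is None:
            continue
        shared = [j for j in range(len(row))
                  if j != col and row[j] is not None and data[target][j] is not None]
        if not shared:
            continue
        expr_dist = math.sqrt(
            sum((row[j] - data[target][j]) ** 2 for j in shared) / len(shared))
        dist = expr_dist
        if go_sim is not None:
            dist = (1.0 - weight) * expr_dist + weight * (1.0 - go_sim[target][g])
        candidates.append((dist, row[col]))
    candidates.sort(key=lambda t: t[0])          # closest genes first
    nearest = candidates[:k]
    return sum(v for _, v in nearest) / len(nearest)
```

Setting `weight=0.0` reduces this to expression-only KNN; raising it lets annotation similarity override expression similarity, which is the kind of trade-off an adaptive weight would have to tune.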

8.
9.
10.
Records on groups of individuals could be valuable for predicting breeding values when a trait, such as feed intake or egg production, is difficult or costly to measure on single individuals. Adding genomic information has been shown to improve the accuracy of genetic evaluation of quantitative traits with individual records. Here, we investigated the value of genomic information for traits with group records. We also investigated the improvement in accuracy of genetic evaluation for group-recorded traits when information on a correlated trait with individual records is included. The study was based on a simulated pig population and included three scenarios of group structure and size. Both genomic information and a correlated trait increased the accuracy of estimated breeding values (EBVs) for traits with group records. The accuracies of EBVs obtained from group records with a group size of 24 were much lower than those with a group size of 12. Random assignment of animals to pens led to lower accuracy because of the weaker relationships between individuals within each group. These results suggest that group records are valuable for the genetic evaluation of a trait that is difficult to record on individuals, and that the accuracy of such evaluations can be increased considerably by using genomic information. Moreover, the genetic evaluation of a trait with group records can be greatly improved by using a bivariate model that includes correlated traits recorded on individuals. For efficient use of group records in genetic evaluation, relatively small groups and close relationships between individuals within a group are recommended.
Subject terms: Genetic markers, Animal breeding

11.
Plant height is an important trait related to yield potential and plant architecture. A suitable plant height plays a crucial role in improving rice yield and lodging resistance. In this study, we found that the traditional upland landrace ‘Kaowenghan’ (KWH) showed a distinctive semi-dwarf phenotype. To identify the semi-dwarf gene from KWH, we developed BC2F4 semi-dwarf introgression lines (ILs) by crossing the japonica rice cultivar ‘Dianjingyou1’ (DJY1) with KWH in a DJY1 background. The plant height of the homozygous semi-dwarf IL (IL-87) was significantly lower than that of DJY1. The phenotype of the F1 progeny of IL-87 and DJY1 showed that the semi-dwarf phenotype was semidominant. QTL mapping indicated that the semi-dwarf phenotype was controlled by a major QTL, qDH1, localized between markers RM6696 and RM12047 on chromosome 1. We also developed near-isogenic lines (NILs) from the BC3F3 population and found that the yield of the homozygous NIL (NIL-2) was not significantly different from that of DJY1. Breeding value evaluation, based on the plant height of progeny from crosses between NIL-2 and cultivars of different genetic backgrounds, indicates that the novel semi-dwarf gene has potential as a genetic resource for rice breeding.

12.

Background

In livestock populations, missing genotypes on a large proportion of animals are a major obstacle to implementing marker-assisted breeding value estimation using haplotypes. The objective of this article is to develop a method, based on mixed model equations, to predict the haplotypes of animals that are not genotyped, and to investigate the effect of using these predicted haplotypes on the accuracy of marker-assisted breeding value estimation.

Methods

For genotyped animals, haplotypes were determined, and for each animal the number of haplotype copies (nhc), i.e. 0, 1 or 2, was counted. In a mixed model framework, nhc for each haplotype were predicted for ungenotyped as well as genotyped animals using the additive genetic relationship matrix. The heritability of nhc was assumed to be 0.99, allowing for minor genotyping and haplotyping errors. The predicted nhc were subsequently used in marker-assisted breeding value estimation as covariables in a random regression. To evaluate the method, a population was simulated with one additive QTL and an additive polygenic genetic effect. The QTL was located in the middle of a haplotype based on SNP markers.
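The prediction step can be pictured as ordinary BLUP with nhc as the "phenotype". The sketch below is a minimal illustration under stated assumptions: one haplotype, a model nhc = μ + u + e for genotyped animals with var(u) = Aσ²u, and λ = (1 - h²)/h² with h² = 0.99. `additive_relationship` and `predict_nhc` are hypothetical helper names; explicitly inverting A is fine for a toy pedigree, whereas production BLUP software builds A⁻¹ directly.

```python
import numpy as np


def additive_relationship(sires, dams):
    """Tabular method for the additive relationship matrix A.
    Animals must be ordered parents-before-offspring; unknown parents are -1."""
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        for j in range(i):
            a_js = A[j, s] if s >= 0 else 0.0
            a_jd = A[j, d] if d >= 0 else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
    return A


def predict_nhc(A, genotyped, nhc, h2=0.99):
    """Solve Henderson's mixed model equations for nhc = mu + u + e and return
    mu_hat + u_hat for every animal in A (genotyped or not)."""
    n, m = A.shape[0], len(genotyped)
    X = np.ones((m, 1))                          # overall mean
    Z = np.zeros((m, n))
    Z[np.arange(m), genotyped] = 1.0             # links records to animals
    y = np.asarray(nhc, dtype=float)
    lam = (1.0 - h2) / h2                        # sigma_e^2 / sigma_u^2
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.linalg.inv(A)]])
    rhs = np.concatenate([X.T @ y, Z.T @ y])
    sol = np.linalg.solve(lhs, rhs)
    return sol[0] + sol[1:]
```

For two unrelated genotyped parents carrying 2 and 0 copies and their ungenotyped offspring (`additive_relationship([-1, -1, 0], [-1, -1, 1])`), the offspring is predicted to carry 1.0 copy, while the parents are shrunk only slightly (to 1.99 and 0.01) because h² is close to 1.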

Results

The accuracy of predicted haplotype copies for ungenotyped animals ranged between 0.59 and 0.64, depending on haplotype length. Because efficient BLUP software was used, the method was computationally very efficient. For genotyped animals, the accuracy of total EBV increased when marker-assisted breeding value estimation was compared with conventional breeding value estimation, but for ungenotyped animals the increase was marginal unless the heritability was smaller than 0.1. Haplotypes based on four markers yielded the highest accuracies, and using only the nearest marker on the left yielded the lowest accuracy. Accuracy increased with increasing marker density. The accuracy of total EBV approached that of gene-assisted BLUP when 4-marker haplotypes were used with a distance of 0.1 cM between markers.

Conclusions

The proposed method is computationally very efficient and suitable for marker-assisted breeding value estimation in large livestock populations, including the effects of a number of known QTL. Marker-assisted breeding value estimation using predicted haplotypes increases accuracy, especially for traits with low heritability.

13.
Maximum yield in highly unpredictable environments should be associated with selecting genotypes that perform well in both good and poor environments. Several stability parameters have been proposed to identify superior genotypes over a wide range of environments. However, none of these has been used as a selection criterion because of their low heritability. The objective of this study was to compare the relative efficiency of predicted gain from indirect selection among three stability parameters, namely the regression coefficient (b), the deviation from regression (S²d), and principal component scores (PC) from the AMMI model; two indices combining mean yield and a stability parameter; and three indices involving yield in the best, the worst, and an intermediate environment. Two hundred S1 families from each of two sorghum populations (TP24D and KP9B) were evaluated in four dryland environments over 2 years. The low heritability estimates and the low genetic correlations between the various stability parameters and mean yield resulted in low relative efficiency of these parameters as indirect selection criteria for high yield across environments. However, when the parameters were combined with overall mean yield to create indices, the relative efficiency increased for all environments. In terms of resource allocation, these indices were not as efficient as mean productivity, rank summation, and a selection index that involved fewer environments in their estimation. Contribution no. 9820 of the Agricultural Research Division, Univ. of Neb., and no. 92-203-J of the Kansas Exp. Stn.
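The regression coefficient b and the deviation from regression come from regressing each genotype's yield on an environmental index (environment mean minus grand mean). The sketch below assumes a genotype-by-environment table of means and reports the residual mean square about each regression; it omits the pooled-error correction (s²e/r) that the Eberhart-Russell S²d subtracts, since that needs replicate data. `stability_parameters` is a hypothetical name.

```python
def stability_parameters(yields):
    """Regression-on-environmental-index stability for a table of means.

    yields[g][e] is the mean yield of genotype g in environment e.
    Returns, per genotype, (mean, b, residual mean square about the regression).
    The residual mean square is sum(delta^2) / (n_env - 2), NOT corrected for
    pooled error as in the Eberhart-Russell S_d^2.
    """
    n_gen, n_env = len(yields), len(yields[0])
    grand = sum(map(sum, yields)) / (n_gen * n_env)
    # Environmental index I_j: environment mean minus grand mean (sums to zero).
    index = [sum(yields[g][e] for g in range(n_gen)) / n_gen - grand
             for e in range(n_env)]
    sum_i2 = sum(i * i for i in index)
    results = []
    for row in yields:
        mean = sum(row) / n_env
        b = sum(y * i for y, i in zip(row, index)) / sum_i2   # slope on the index
        resid = [y - (mean + b * i) for y, i in zip(row, index)]
        ms_dev = sum(r * r for r in resid) / (n_env - 2)
        results.append((mean, b, ms_dev))
    return results
```

A genotype with b > 1 responds more than average to favourable environments, b < 1 indicates below-average responsiveness, and a small residual mean square means that response is predictable.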

14.
A method for estimating major gene effects that uses Gibbs sampling to infer the genotypes of individuals with unknown genotypes was compared with a standard mixed-model analysis. The purpose of this study was to evaluate the effect of including information from individuals with unknown genotypes on the estimates of single-gene effects and on their error variances (Ve). When genotypes were known for all individuals, results from the Gibbs sampling method (GS) were similar to those from the mixed model (MM). In the absence of selection, when information from individuals with unknown genotypes was included, GS yielded unbiased estimates of the major gene effects while reducing their Ve. This reduction in Ve depended on the gene frequency and the mode of action of the major locus. For the additive effect, the reduction in Ve ranged from 29 to 69% of the total reduction that would have been obtained if all individuals had had known genotypes. Similarly, the reduction in Ve for the dominance effect ranged from 12 to 58%. GS estimates generally had small detectable biases when the polygenic heritability used in the analysis was inflated or estimated simultaneously. However, the benefit of using information from individuals with unknown genotypes was maintained when comparing the mean square errors of GS and MM estimates with genotypes known for only a subset of the population. When the population had been under selection, using Gibbs sampling to incorporate information from individuals without genotypes substantially reduced the bias and mean square error found in MM analysis of partial data. Nevertheless, some bias was detected with Gibbs sampling. The frequency of the major gene in the base population was also well estimated, despite its change over generations due to selection.
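At the heart of such a sampler is the full conditional distribution of one ungenotyped individual's genotype, given its record and the current values of all other parameters. The sketch below assumes genotype values AA = +a, Aa = d, aa = -a, normal residuals, and a Hardy-Weinberg prior with allele frequency p; that prior is a simplification (inside a pedigree the prior would come from the parents' current genotypes), and `genotype_conditional` and `sample_genotype` are hypothetical names.

```python
import math
import random


def genotype_conditional(y, mu, u, a, d, var_e, p):
    """Full conditional P(genotype | y, mu, u, a, d, var_e) for one individual.

    Assumes AA = +a, Aa = d, aa = -a, normal residuals with variance var_e, and a
    Hardy-Weinberg prior with frequency p for allele A (a simplification).
    """
    prior = {"AA": p * p, "Aa": 2 * p * (1 - p), "aa": (1 - p) ** 2}
    effect = {"AA": a, "Aa": d, "aa": -a}
    weights = {}
    for g in prior:
        resid = y - mu - u - effect[g]
        # The normal density's constant cancels because var_e is shared.
        weights[g] = prior[g] * math.exp(-resid * resid / (2.0 * var_e))
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def sample_genotype(probs, rng=random):
    """One Gibbs update: draw a genotype from its full conditional."""
    genotypes = list(probs)
    return rng.choices(genotypes, weights=[probs[g] for g in genotypes], k=1)[0]
```

In a complete sampler this draw alternates with updates of μ, the polygenic effects, a, d and the variances, each conditioned on the currently sampled genotypes.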

15.
Gene expression microarray experiments frequently generate datasets with multiple missing values. However, most analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Therefore, the accurate estimation of missing values in such datasets has been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm that is specially suited to estimating missing values in gene expression time series data. The algorithm uses the Dynamic Time Warping (DTW) distance to measure the similarity between time expression profiles, and then selects, for each gene expression profile with missing values, a dedicated set of candidate profiles for estimation. Three DTW-based imputation (DTWimpute) algorithms were considered: position-wise, neighborhood-wise, and two-pass imputation. These were initially prototyped in Perl, and their accuracy was evaluated on yeast expression time series data using several different parameter settings. The experiments showed that the two-pass algorithm consistently outperforms the neighborhood-wise and position-wise algorithms, in particular for datasets with a higher proportion of missing entries. The performance of the two-pass DTWimpute algorithm was further benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community, and the former proved superior to the latter. Motivated by these findings, which clearly indicate the added value of DTW techniques for missing value estimation in time series data, we built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also provides a choice between three different initial rough imputation methods.
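The DTW distance itself is a short dynamic program: it finds the cheapest alignment between two profiles when either one may be locally stretched in time. The sketch below is the textbook formulation with absolute difference as the local cost; the abstract does not state which local cost, step pattern or window DTWimpute uses, so treat those as assumptions. `dtw_distance` is a hypothetical name.

```python
def dtw_distance(x, y):
    """Classic dynamic-time-warping distance between two sequences.

    Local cost is |x_i - y_j|; allowed steps are (i-1, j), (i, j-1) and (i-1, j-1),
    with no warping window.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch y
                                     cost[i][j - 1],      # stretch x
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

Unlike a point-by-point Euclidean distance, `dtw_distance([0, 1, 2], [0, 0, 1, 2])` is 0.0, because the repeated first time point is absorbed by warping; for `[1, 2, 3]` versus `[2, 3, 4]` it returns 2.0 rather than the 3.0 of a rigid alignment.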

16.
17.
A Bayesian missing value estimation method for gene expression profile data (cited 13 times; self-citations: 0; citations by others: 13)
MOTIVATION: Gene expression profile analyses have been used in numerous studies covering a broad range of areas in biology. When unreliable measurements are excluded, missing values are introduced into gene expression profiles. Although existing multivariate analysis methods have difficulty handling missing values, this problem has received little attention. There are many options for dealing with missing values, and they can lead to drastically different results. Ignoring missing values is the simplest and most frequently applied approach, but it has its flaws. In this article, we propose an estimation method for missing values based on Bayesian principal component analysis (BPCA). Although estimating a probabilistic model and its latent variables simultaneously within a Bayesian inference framework is not new in principle, an actual BPCA implementation that can estimate arbitrary missing values is new in terms of statistical methodology. RESULTS: When applied to DNA microarray data from various experimental conditions, the BPCA method exhibited markedly better estimation ability than other recently proposed methods, such as singular value decomposition and K-nearest neighbors. Whereas the estimation performance of existing methods depends on model parameters that are difficult to determine, our BPCA method is free from this difficulty. Accordingly, the BPCA method provides accurate and convenient estimation of missing values. AVAILABILITY: The software is available at http://hawaii.aist-nara.ac.jp/~shige-o/tools/.
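For contrast with BPCA, the kind of SVD-based baseline the abstract mentions can be sketched as an iterative low-rank reconstruction. This is a swapped-in, simpler technique (a hard-impute style scheme), not BPCA and not necessarily the exact SVD method used in the comparison: missing cells start at their row mean and are repeatedly replaced by a rank-`rank` SVD reconstruction while observed cells stay fixed. Note that `rank` is exactly the kind of hard-to-choose model parameter the abstract says BPCA avoids. `svd_impute` is a hypothetical name.

```python
import numpy as np


def svd_impute(data, rank, n_iter=100):
    """Iterative low-rank SVD imputation (hard-impute style), NOT BPCA.

    data: 2-D array with np.nan marking missing entries. Missing cells start at
    their row mean; each iteration overwrites them with the rank-`rank` SVD
    reconstruction, while observed cells keep their measured values.
    """
    x = np.array(data, dtype=float)
    missing = np.isnan(x)
    row_means = np.nanmean(x, axis=1)
    x[missing] = row_means[np.nonzero(missing)[0]]   # initial rough fill
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(x, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        x[missing] = low_rank[missing]                # update only missing cells
    return x
```

On an exactly rank-1 matrix with one entry removed, this scheme recovers the removed entry; on real expression data the result depends on the chosen rank and on whether the data are well described by a few components.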

18.
19.
New molecular techniques focused on genome analysis open new possibilities for more accurate evaluation of economically important traits in farm animals. Milk production traits are typical quantitative traits controlled by a number of genes, and mutations in these genes may alter animal performance as well as breeding values. In this study, we investigated the effect of the Kpn2I restriction fragment length polymorphism in the leptin gene on bull breeding values for milk yield, fat and protein yield, and fat and protein percentage. To test for an association between this leptin single-nucleotide polymorphism in exon 2 and milk productivity, we genotyped 134 Iranian Holstein bulls. Breeding values for milk-related traits (milk yield, fat and protein yield, and their percentages) were estimated by BLUP under an animal model. The effect of the Kpn2I genotypes on the breeding values for milk-related traits was examined using least squares methods. The T allele frequency was 0.425, and genotypes were distributed according to Hardy-Weinberg equilibrium. Bulls with the TT genotype had higher milk, fat and protein yields than TC and CC bulls (P < 0.05), whereas bulls with the CC genotype had a higher protein percentage than TT and TC bulls (P < 0.05). The association between the leptin polymorphism and milk production traits suggests that this marker may be useful for selection based on molecular information.
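Under Hardy-Weinberg equilibrium, the reported T allele frequency of 0.425 fixes the expected genotype frequencies (p², 2pq, q²). The sketch below computes expected counts and the usual 1-df chi-square goodness-of-fit statistic; the observed genotype counts are not given in the abstract, so the chi-square function needs them supplied, and the example counts in the test are hypothetical. Function names are illustrative.

```python
def hwe_expected(p, n):
    """Expected genotype counts under Hardy-Weinberg equilibrium for allele
    frequency p (here the T allele) in a sample of n animals."""
    q = 1.0 - p
    return {"TT": p * p * n, "TC": 2 * p * q * n, "CC": q * q * n}


def hwe_chi_square(observed):
    """Chi-square goodness-of-fit statistic (1 df) for Hardy-Weinberg equilibrium,
    with p estimated from the observed genotype counts."""
    n = sum(observed.values())
    p = (2 * observed["TT"] + observed["TC"]) / (2 * n)
    expected = hwe_expected(p, n)
    return sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)
```

For p = 0.425 and 134 bulls, the expected counts are about 24.2 TT, 65.5 TC and 44.3 CC; a chi-square statistic below 3.84 (the 5 % critical value for 1 df) is consistent with equilibrium.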

20.
Bayesian (via Gibbs sampling) and empirical BLUP (EBLUP) estimation of fixed effects and breeding values were compared by simulation. Combinations of two simulation models (with or without a contemporary group (CG) effect), three selection schemes (random, phenotypic and BLUP selection), two levels of heritability (0.20 and 0.50) and two levels of pedigree information (0% and 15% randomly missing) were considered. Populations consisted of 450 animals spread over six discrete generations. An infinitesimal additive genetic animal model was assumed when simulating data. In all situations, EBLUP and Bayesian estimates of CG effects and breeding values were essentially the same with respect to Spearman's rank correlation between true and estimated values. Bias and mean square error (MSE) of EBLUP and Bayesian estimates of CG effects and breeding values showed the same pattern over the range of simulated scenarios. Neither method was biased by phenotypic or BLUP selection when pedigree information was complete, although the MSE of estimated breeding values increased in situations where CG effects were present. Estimation of breeding values by the Bayesian and EBLUP methods was similarly affected by the joint effect of phenotypic or BLUP selection and randomly missing pedigree information. For both methods, bias and MSE of estimated breeding values and CG effects increased substantially across generations.
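The three comparison criteria (bias, MSE and Spearman's rank correlation between true and estimated values) are straightforward to compute. The sketch below assumes bias is defined as mean(estimate - true), which is a common convention but is not stated in the abstract; Spearman's correlation is computed as the Pearson correlation of average ranks, so ties are handled. Function names are illustrative.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0            # average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def evaluate(true, est):
    """Bias (mean of est - true), MSE, and Spearman's rank correlation."""
    n = len(true)
    bias = sum(e - t for t, e in zip(true, est)) / n
    mse = sum((e - t) ** 2 for t, e in zip(true, est)) / n
    rho = pearson(average_ranks(true), average_ranks(est))
    return bias, mse, rho
```

Because Spearman's correlation only uses ranks, a method can show a rank correlation of 1 while still being biased, which is why the study reports bias and MSE alongside it.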


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417