Similar Literature
20 similar records found
1.
Animal breeding faces one of the most significant changes of the past decades: the implementation of genomic selection. Genomic selection uses dense marker maps to predict the breeding value of animals, with reported accuracies up to 0.31 higher than those of pedigree indexes, without the need to phenotype the animals themselves or close relatives thereof. The basic principle is that, because of the high marker density, each quantitative trait locus (QTL) is in linkage disequilibrium (LD) with at least one nearby marker. The process involves putting together a reference population of animals with known phenotypes and genotypes to estimate the marker effects. Marker effects have been estimated with several different methods that generally aim at reducing the dimensions of the marker data. Nearly all reported models include only additive effects. Once the marker effects are estimated, breeding values of young selection candidates can be predicted with reported accuracies of up to 0.85. Although results from simulation studies suggest that different models may yield more accurate genomic estimated breeding values (GEBVs) for different traits, depending on the underlying QTL distribution of the trait, there is so far little evidence from studies based on real data to support this. The accuracy of genomic predictions strongly depends on characteristics of the reference population, such as the number of animals, the number of markers, and the heritability of the recorded phenotype. Another important factor is the relationship between animals in the reference population and the evaluated animals. The breakdown of LD between markers and QTL across generations argues for frequent re-estimation of marker effects to maintain the accuracy of GEBVs at an acceptable level. Therefore, when marker effects are re-estimated infrequently, it becomes more important that the model estimating them capitalizes on LD information that is persistent across generations.
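A minimal sketch of the two-step workflow this abstract describes, using ridge regression (one of the dimension-reducing shrinkage methods mentioned) on simulated genotypes and phenotypes; all data, sample sizes and the shrinkage parameter are illustrative assumptions, not values from the study.

```python
# Two-step genomic selection sketch: (1) estimate marker effects in a reference
# population with known phenotypes, (2) predict GEBVs of candidates from
# genotypes alone. All quantities here are simulated assumptions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_ref, n_cand, n_snp = 1000, 200, 5000

# Marker genotypes coded 0/1/2 and a trait controlled by 50 QTL.
X = rng.binomial(2, 0.3, size=(n_ref + n_cand, n_snp)).astype(float)
qtl = rng.choice(n_snp, 50, replace=False)
true_bv = X[:, qtl] @ rng.normal(0, 1, 50)
phenotype = true_bv + rng.normal(0, true_bv.std(), n_ref + n_cand)

# Step 1: estimate marker effects in the reference population.
model = Ridge(alpha=n_snp)            # heavy shrinkage: many markers, small effects
model.fit(X[:n_ref], phenotype[:n_ref])

# Step 2: predict GEBVs of unphenotyped selection candidates.
gebv = model.predict(X[n_ref:])
accuracy = np.corrcoef(gebv, true_bv[n_ref:])[0, 1]
print(f"accuracy of GEBV (corr with true breeding value): {accuracy:.2f}")
```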

2.
3.
On marker-assisted prediction of genetic value: beyond the ridge
Gianola D, Perez-Enciso M, Toro MA. Genetics 2003, 163(1): 347-365
Marker-assisted genetic improvement of agricultural species exploits statistical dependencies in the joint distribution of marker genotypes and quantitative traits. An issue is how molecular information (e.g., dense marker maps) and phenotypic information (e.g., some measure of yield in plants) are to be used for predicting the genetic value of candidates for selection. Multiple regression, selection index techniques, best linear unbiased prediction, and ridge regression of phenotypes on marker genotypes have been suggested, as well as more elaborate methods. Here, phenotype-marker associations are modeled hierarchically via multilevel models including chromosomal effects, a spatial covariance of marked effects within chromosomes, background genetic variability, and family heterogeneity. Lorenz curves and Gini coefficients are suggested for assessing the inequality of the contributions of different marked effects to genetic variability. Classical and Bayesian methods are presented. The Bayesian approach includes a Markov chain Monte Carlo implementation. The generality and flexibility of the Bayesian method is illustrated when a Lorenz curve is to be inferred.
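A quick numeric sketch of the Lorenz-curve/Gini-coefficient summary suggested above, applied here to a hypothetical vector of squared marker effects as a crude stand-in for their contributions to genetic variability.

```python
# Lorenz curve and Gini coefficient for the inequality of marker-effect
# contributions. The input vector is an illustrative assumption.
import numpy as np

def lorenz_and_gini(contributions):
    """Return Lorenz curve points and the Gini coefficient of non-negative values."""
    x = np.sort(np.asarray(contributions, dtype=float))
    lorenz = np.insert(np.cumsum(x) / x.sum(), 0, 0.0)   # cumulative share, 0 to 1
    n = x.size
    # Standard ordered-values formula for the Gini coefficient.
    gini = (2 * np.sum(np.arange(1, n + 1) * x)) / (n * x.sum()) - (n + 1) / n
    return lorenz, gini

# Example: squared marker effects where a few markers dominate.
effects = np.random.default_rng(1).normal(0, 1, 1000)
effects[:20] *= 10
_, gini = lorenz_and_gini(effects ** 2)
print(f"Gini coefficient: {gini:.2f}")   # close to 1 => very unequal contributions
```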

4.
Genomic selection uses genome-wide dense SNP marker genotyping for the prediction of genetic values, and consists of two steps: (1) estimation of SNP effects, and (2) prediction of genetic value based on SNP genotypes and estimates of their effects. For the former step, BayesB-type estimators have been proposed, which assume a priori that many markers have no effect and that some have an effect drawn from a gamma or exponential distribution, i.e. a fat-tailed distribution. Whilst such estimators have been developed using Markov chain Monte Carlo (MCMC), here we derive a much faster non-MCMC estimator by performing the required integrations analytically. The accuracy of the genome-wide breeding value estimates was 0.011 (s.e. 0.005) lower than that of the MCMC-based BayesB predictor, which may be because the integrations were performed one-by-one instead of for all SNPs simultaneously. The bias of the new method was opposite to that of MCMC-based BayesB, in that the new method underestimates the breeding values of the best selection candidates, whereas MCMC-BayesB overestimates them. The new method was computationally several orders of magnitude faster than MCMC-based BayesB, which will mainly be advantageous in computer simulations of entire breeding schemes, in cross-validation testing, and in practical schemes with frequent re-estimation of breeding values.

5.
A Bayesian nonparametric form of regression based on Dirichlet process priors is adapted to the analysis of quantitative traits possibly affected by cryptic forms of gene action, and to the context of SNP-assisted genomic selection, where the main objective is to predict a genomic signal on phenotype. The procedure clusters unknown genotypes into groups with distinct genetic values, but in a setting in which the number of clusters is unknown a priori, so that standard methods for finite mixture analysis do not work. The central assumption is that genetic effects follow an unknown distribution with some “baseline” family, which is a normal process in the cases considered here. A Bayesian analysis based on the Gibbs sampler produces estimates of the number of clusters, posterior means of genetic effects, a measure of credibility in the baseline distribution, as well as estimates of parameters of the latter. The procedure is illustrated with a simulation representing two populations. In the first one, there are 3 unknown QTL with additive, dominance and epistatic effects; in the second, there are 10 QTL with additive, dominance and additive × additive epistatic effects. In the two populations, baseline parameters are inferred correctly. The Dirichlet process model infers the number of unique genetic values correctly in the first population, but underestimates it in the second: the true number of clusters is over 900, whereas the model gives a posterior mean estimate of about 140, probably because more replication of genotypes is needed for correct inference. The impact on inferences of the prior distribution of a key parameter (M), and of the extent of replication, was examined via an analysis of mean body weight in 192 paternal half-sib families of broiler chickens, where each sire was genotyped for nearly 7,000 SNPs. In this small sample, it was found that inference about the number of clusters was affected by the prior distribution of M. For a set of combinations of parameters of a given prior distribution, the effects of the prior dissipated when the number of replicate samples per genotype was increased. Thus, the Dirichlet process model seems to be useful for gauging the number of QTL affecting the trait: if the number of clusters inferred is small, probably just a few QTL code for the trait. If the number of clusters inferred is large, this may imply that standard parametric models based on the baseline distribution may suffice. However, priors may be influential, especially if the sample size is not large and only a few genotypic configurations have replicate phenotypes in the sample.
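A rough analogue of the clustering task described above, sketched with scikit-learn's truncated Dirichlet-process Gaussian mixture (variational inference rather than the paper's Gibbs sampler); the replicated phenotypes around three distinct genetic values are simulated assumptions.

```python
# Infer the number of distinct genetic values with a truncated DP Gaussian
# mixture. Data and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(2)
# Replicated phenotypes drawn around 3 distinct genetic values (3-QTL-like case).
true_values = np.array([-2.0, 0.0, 2.5])
y = np.concatenate([rng.normal(v, 0.3, 200) for v in true_values]).reshape(-1, 1)

dp = BayesianGaussianMixture(
    n_components=20,                                   # truncation level
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,                    # plays a role analogous to M
    random_state=0,
).fit(y)

# Count components that actually receive appreciable posterior weight.
occupied = np.sum(dp.weights_ > 0.01)
print(f"inferred number of distinct genetic values: {occupied}")
```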

6.
Microarray data provide information on the expression levels of thousands of genes in a cell in a single experiment. Numerous efforts have been made to use gene expression profiles to improve the precision of tumor classification. In the present study we used the benchmark colon cancer data set for analysis. Feature selection was done using the t-statistic. A comparative study of the class-prediction accuracy of three different classifiers, namely support vector machine (SVM), neural networks and logistic regression, was performed using the top 10 genes ranked by the t-statistic. SVM turned out to be the best classifier for this dataset based on the area under the receiver operating characteristic curve (AUC) and total accuracy. Logistic regression ranks as the next best classifier, followed by the multilayer perceptron (MLP). The top 10 genes we selected for classification are all well documented for their variable expression in colon cancer. We conclude that SVM together with t-statistic-based feature selection is an efficient and viable alternative to popular techniques.
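A compact sketch of the pipeline described above (t-statistic gene ranking, top-10 selection, classifier comparison by AUC), run on synthetic expression data since the colon-cancer benchmark itself is not included here.

```python
# Rank genes by a two-sample t-statistic, keep the top 10, and compare
# classifiers by cross-validated AUC. Expression data are synthetic assumptions.
# (In a real study the gene selection should be nested inside the CV loop.)
import numpy as np
from scipy.stats import ttest_ind
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(62, 2000))                   # 62 samples x 2000 genes (synthetic)
y = np.repeat([0, 1], 31)
X[y == 1, :10] += 1.5                             # 10 genuinely informative genes

# Feature selection: top 10 genes by absolute t-statistic between classes.
t, _ = ttest_ind(X[y == 0], X[y == 1], axis=0)
top10 = np.argsort(-np.abs(t))[:10]

for name, clf in [("SVM", SVC(kernel="linear")),
                  ("LogReg", LogisticRegression(max_iter=1000))]:
    auc = cross_val_score(clf, X[:, top10], y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean cross-validated AUC = {auc:.2f}")
```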

7.

Background

Minimotifs are short contiguous peptide sequences in proteins that have known functions. At its simplest level, a minimotif sequence is present in a source protein and has an activity relationship with a target, which in most cases is a protein. While many scientists routinely investigate new minimotif functions in proteins, the major web-based discovery tools have a high rate of false-positive prediction. Any new approach that reduces false positives will be of great help to biologists.

Methods and Findings

We have built three filters that use genetic interactions to reduce false-positive minimotif predictions. The basic filter identifies those minimotifs where the source/target protein pair has a known genetic interaction. The HomoloGene genetic interaction filter extends these predictions to predicted genetic interactions of orthologous proteins, and the node-based filter identifies those minimotifs where a protein that has a genetic interaction with the source also has a genetic interaction with the target. Each filter was evaluated with a test data set containing thousands of true and false positives. Based on sensitivity and selectivity performance metrics, the basic filter had the best discrimination of true positives, whereas the node-based filter had the highest sensitivity. We have implemented these genetic interaction filters on the Minimotif Miner 2.3 website. The genetic interaction filter is particularly useful for improving predictions of posttranslational modifications such as phosphorylation and proteolytic cleavage sites.
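A toy illustration of the basic filter just described, with hypothetical protein names and interaction pairs; the real filters operate on the Minimotif Miner and HomoloGene data sets.

```python
# Keep only those predicted minimotifs whose source/target protein pair has a
# known genetic interaction. All names and pairs are hypothetical placeholders.
predicted_minimotifs = [
    {"motif": "PxxP", "source": "GENE_A", "target": "GENE_B"},
    {"motif": "S/TQ", "source": "GENE_C", "target": "GENE_D"},
    {"motif": "KEN",  "source": "GENE_A", "target": "GENE_E"},
]

# Known genetic interactions, stored as unordered pairs.
genetic_interactions = {frozenset(p) for p in [("GENE_A", "GENE_B"),
                                               ("GENE_C", "GENE_D")]}

def basic_filter(predictions, interactions):
    """Retain predictions whose source/target pair is a known genetic interaction."""
    return [p for p in predictions
            if frozenset((p["source"], p["target"])) in interactions]

for p in basic_filter(predicted_minimotifs, genetic_interactions):
    print(p["motif"], p["source"], "->", p["target"])
```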

Conclusions

Genetic interaction data sets can be used to reduce false-positive minimotif predictions. Minimotif prediction in known genetic interactions can help to refine the mechanisms behind the functional connection between genes revealed by genetic experimentation and screens.

8.
Long N, Gianola D, Rosa GJ, Weigel KA. Genetica 2011, 139(7): 843-854
It has become increasingly clear from systems biology arguments that interaction and non-linearity play an important role in the genetic regulation of phenotypic variation for complex traits. Marker-assisted prediction of genetic values assuming additive gene action has been widely investigated because of its relevance in artificial selection. On the other hand, it has been less well studied when non-additive effects hold. Here, we explored a nonparametric model, radial basis function (RBF) regression, for predicting quantitative traits under different gene action modes (additivity, dominance and epistasis). Using simulation, it was found that RBF had a better ability (higher predictive correlations and lower predictive mean squared errors) to predict the merit of individuals in future generations in the presence of non-additive effects than a linear additive model, the Bayesian Lasso. This was true for populations undergoing either directional or random selection over several generations. Under additive gene action, RBF was slightly worse than the Bayesian Lasso. While prediction of genetic values under additive gene action is well handled by a variety of parametric models, nonparametric RBF regression is a useful counterpart for dealing with situations where non-additive gene action is suspected, and it is robust irrespective of the mode of gene action.
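A hedged sketch of the comparison above, with scikit-learn's RBF kernel ridge regression standing in for RBF regression and a plain ridge model standing in for the linear additive benchmark; the simulated epistatic trait is an assumption, not the paper's simulation design.

```python
# Contrast a nonparametric RBF model with a linear additive model when the
# trait includes epistatic effects. All simulated quantities are assumptions.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import Ridge

rng = np.random.default_rng(4)
X = rng.binomial(2, 0.4, size=(600, 100)).astype(float)        # SNP genotypes
g = X[:, 0] * X[:, 1] - X[:, 2] * X[:, 3] + 0.5 * X[:, 4]       # epistatic + additive
y = g + rng.normal(0, g.std(), 600)

train, test = slice(0, 400), slice(400, 600)
for name, model in [("RBF kernel ridge", KernelRidge(kernel="rbf", alpha=1.0, gamma=0.01)),
                    ("linear ridge", Ridge(alpha=1.0))]:
    model.fit(X[train], y[train])
    r = np.corrcoef(model.predict(X[test]), g[test])[0, 1]
    print(f"{name}: predictive correlation with true merit = {r:.2f}")
```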

9.
Inherited malformations and genetic disorders together account for around 3-5% of newborns. The frequency is much higher in the early stages of pregnancy, because serious malformations and genetic disorders usually lead to spontaneous abortion. Prenatal diagnosis allows identification of malformations and/or some genetic syndromes in fetuses during the first trimester of pregnancy. Thereafter, depending on the severity of the disorder, a decision should be made regarding the subsequent course of the pregnancy, taking into account the possibilities of treatment, the parents' acceptance of a handicapped child and, in some cases, the possibility of terminating the pregnancy. Prenatal testing includes both screening and diagnostic procedures. Screening procedures such as first- and second-trimester biochemical and/or ultrasound screening, first-trimester combined ultrasound/biochemical screening and integrated screening should be widely offered to pregnant women. However, interpretation of screening results requires awareness of both the sensitivity and the predictive value of these procedures. In prenatal diagnosis, ultrasound/MRI examinations as well as genetic procedures are offered to pregnant women. A variety of approaches for genetic prenatal analysis are now available, including preimplantation diagnosis, chorionic villus sampling, amniocentesis and fetal blood sampling, as well as promising experimental procedures (e.g. fetal cell and DNA isolation from maternal blood). Remarkable progress in genetic methods has opened new possibilities for reliable genetic diagnosis. Although karyotyping is widely accepted as the gold standard, a discussion is ongoing throughout Europe about shifting to new genetic techniques that allow rapid results in the prenatal diagnosis of aneuploidy (e.g. RAPID-FISH, MLPA, quantitative PCR).

10.
We have developed an iterative hybrid algorithm (HA) to predict the 3D structure of peptides starting from their amino acid sequence. The HA consists of a modified genetic algorithm (GA) coupled to a local optimizer. Each HA iteration is carried out in two phases. In the first phase, several GA runs are performed over the entire peptide conformational space. In the second phase, we use the manifestation of what we have called conformational memories, which arises at the end of the first phase, as a way of reducing the peptide conformational space in subsequent HA iterations. Use of conformational memories speeds up and refines the localization of the structure at the putative Global Energy Minimum (GEM), since conformational barriers are avoided. The algorithm has been used to successfully predict the putative GEM for Met- and Leu-enkephalin, and to obtain useful information on the 3D structure of the polyglycine 8-mer and the 16-residue (AAQAA)3Y peptide. The number of fitness-function evaluations needed to locate the putative GEMs is smaller than that reported for other heuristic methods. This study opens the possibility of using genetic algorithms for high-level predictions of the secondary structure of polypeptides.
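A highly simplified sketch of the hybrid idea above: a GA over torsion angles, a local optimizer applied to the best individual, and a second iteration whose search ranges are narrowed around the first iteration's result (a crude stand-in for conformational memories). The energy function is a toy surface, not a peptide force field.

```python
# GA + local optimizer over torsion angles, with a narrowed second iteration.
# Toy energy surface and all parameters are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n_angles = 8                                   # e.g. backbone torsions of a short peptide

def energy(angles):
    """Toy rugged energy surface with its global minimum at zero."""
    return np.sum(angles ** 2) + 2.0 * np.sum(1 - np.cos(3 * angles))

def ga_run(low, high, pop_size=60, generations=80):
    pop = rng.uniform(low, high, size=(pop_size, n_angles))
    for _ in range(generations):
        fitness = np.apply_along_axis(energy, 1, pop)
        parents = pop[np.argsort(fitness)[: pop_size // 2]]        # truncation selection
        children = parents[rng.integers(0, len(parents), pop_size - len(parents))]
        children = children + rng.normal(0, 0.2, children.shape)   # mutation
        pop = np.vstack([parents, np.clip(children, low, high)])
    best = pop[np.argmin(np.apply_along_axis(energy, 1, pop))]
    return minimize(energy, best).x                                # local refinement

# Iteration 1 searches the full angle range; iteration 2 narrows the ranges
# around where good conformations were found (the "conformational memory").
best1 = ga_run(-np.pi, np.pi)
best2 = ga_run(best1 - 0.5, best1 + 0.5)
print("energy at putative GEM:", round(energy(best2), 4))
```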

11.
Semiparametric transformation models provide a very general framework for studying the effects of (possibly time-dependent) covariates on survival time and recurrent event times. Assessing the adequacy of these models is an important task because model misspecification affects the validity of inference and the accuracy of prediction. In this paper, we introduce appropriate time-dependent residuals for these models and consider the cumulative sums of the residuals. Under the assumed model, the cumulative sum processes converge weakly to zero-mean Gaussian processes whose distributions can be approximated through Monte Carlo simulation. These results enable one to assess, both graphically and numerically, how unusual the observed residual patterns are in reference to their null distributions. The residual patterns can also be used to determine the nature of model misspecification. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. Three medical studies are provided for illustrations.

12.
Inferences for a semiparametric model with panel data
Cheng SC, Wei LJ. Biometrika 2000, 87(1): 89-97

13.
Two bulking procedures (bulking individuals before and after genotyping) are commonly applied in similarity-based studies of genetic distance at the population or higher level, but their effectiveness is largely unknown. In this study, expected population-pairwise similarity for both bulking procedures is derived with dominant and co-dominant diallelic markers. Numerical examples for the derived formulae are given with up to ten individuals randomly selected from each population. The procedure of bulking individuals after genotyping with either marker system is generally more informative than the procedure of bulking individuals before genotyping, because the former incorporates the information from marker alleles of intermediate frequency. Both procedures are effective with 5–10 individuals selected randomly from either population, but the procedure of bulking before genotyping requires a genotyping effort several-fold less than the procedure of bulking after genotyping. For either bulking procedure, a co-dominant marker system is generally more informative than a dominant marker system.

14.
Genetic interactions provide information about genes and processes with overlapping functions in biological systems. For Saccharomyces cerevisiae, computational integration of multiple types of functional genomic data is used to generate genome-wide predictions of genetic interactions. However, this methodology cannot be applied to the vastly more complex genome of metazoans, and only recently has the first metazoan genome-wide prediction of genetic interactions been reported. The prediction for Caenorhabditis elegans was generated by computationally integrating functional genomic data from S. cerevisiae, C. elegans and Drosophila melanogaster. This achievement is an important step toward system-level understanding of biological systems and human diseases.

15.
An important task of human genetics studies is to accurately predict disease risk in individuals based on genetic markers, which allows identification of individuals at high disease risk and facilitates their disease treatment and prevention. Although hundreds of genome-wide association studies (GWAS) have been conducted on many complex human traits in recent years, there has been only limited success in translating these GWAS data into clinically useful risk prediction models. The predictive capability of GWAS data is largely bottlenecked by the available training sample size, owing to the presence of numerous variants carrying only small to modest effects. Recent studies have shown that different human traits may share common genetic bases. Therefore, an attractive strategy to increase the training sample size, and hence improve prediction accuracy, is to integrate data from genetically correlated phenotypes. Yet the utility of genetic correlation in risk prediction has not been explored in the literature. In this paper, we analyzed GWAS data for bipolar and related disorders and schizophrenia with a bivariate ridge regression method, and found that jointly predicting the two phenotypes could substantially increase prediction accuracy as measured by the area under the receiver operating characteristic curve. We found similar improvements in prediction accuracy when we jointly analyzed GWAS data for Crohn's disease and ulcerative colitis. These empirical observations were substantiated through comprehensive simulation studies, suggesting that a gain in prediction accuracy can be obtained by combining phenotypes with relatively high genetic correlations. Through both real-data and simulation studies, we demonstrated that pleiotropy can be leveraged as a valuable asset that opens up a new opportunity to improve genetic risk prediction in the future.
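A deliberately crude sketch of the idea of borrowing strength from a correlated phenotype: assuming a genetic correlation near one, individuals phenotyped for the second trait are pooled with the first-trait training set under a shared effect vector. This is a simplification of the bivariate ridge regression used in the paper; all data are simulated assumptions.

```python
# Pool samples from a genetically correlated trait to increase the effective
# training size for ridge-based risk prediction. Simulated assumptions only.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)
p = 3000
causal = rng.random(p) < 0.01
shared = rng.normal(0, 1, p) * causal               # effects shared by both traits

def simulate(n):
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
    g = X @ shared
    return X, g + rng.normal(0, g.std(), n)

X1, y1 = simulate(800)        # individuals phenotyped for trait 1 (training)
X2, y2 = simulate(800)        # different individuals phenotyped for correlated trait 2
Xt, _ = simulate(500)         # test individuals
g_test = Xt @ shared

single = Ridge(alpha=p).fit(X1, y1)
pooled = Ridge(alpha=p).fit(np.vstack([X1, X2]), np.concatenate([y1, y2]))
print("trait-1 only :", np.corrcoef(single.predict(Xt), g_test)[0, 1])
print("pooled traits:", np.corrcoef(pooled.predict(Xt), g_test)[0, 1])
```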

16.
Xu R, Harrington DP. Biometrics 2001, 57(3): 875-885
A semiparametric estimate of an average regression effect with right-censored failure time data has recently been proposed under the Cox-type model where the regression effect beta(t) is allowed to vary with time. In this article, we derive a simple algebraic relationship between this average regression effect and a measurement of group differences in k-sample transformation models when the random error belongs to the G(rho) family of Harrington and Fleming (1982, Biometrika 69, 553-566), the latter being equivalent to the conditional regression effect in a gamma frailty model. The models considered here are suitable for the attenuating hazard ratios that often arise in practice. The results reveal an interesting connection among the above three classes of models as alternatives to the proportional hazards assumption and add to our understanding of the behavior of the partial likelihood estimate under nonproportional hazards. The algebraic relationship provides a simple estimator under the transformation model. We develop a variance estimator based on the empirical influence function that is much easier to compute than the previously suggested resampling methods. When there is truncation in the right tail of the failure times, we propose a method of bias correction to improve the coverage properties of the confidence intervals. The estimate, its estimated variance, and the bias correction term can all be calculated with minor modifications to standard software for proportional hazards regression.

17.
This paper proposes a semiparametric methodology for modeling multivariate and conditional distributions. We first build a multivariate distribution whose dependence structure is induced by a Gaussian copula and whose marginal distributions are estimated nonparametrically via mixtures of B-spline densities. The conditional distribution of a given variable is obtained in closed form from this multivariate distribution. We take a Bayesian approach, using Markov chain Monte Carlo methods for inference. We study the frequentist properties of the proposed methodology via simulation and apply the method to estimation of conditional densities of summary statistics, used for computing conditional local false discovery rates, from genetic association studies of schizophrenia and cardiovascular disease risk factors.
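A simplified sketch of the copula construction above, with empirical CDFs in place of the paper's B-spline mixture marginals and a plug-in estimate instead of MCMC; it shows how the conditional distribution follows in closed form on the latent Gaussian scale. Data are simulated assumptions.

```python
# Gaussian copula with nonparametric (empirical) marginals and a closed-form
# conditional distribution on the latent normal scale. Simulated assumptions.
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(7)
# Two dependent, non-normal summary statistics.
z = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], size=5000)
x, y = np.exp(z[:, 0]), z[:, 1] ** 3

# Map each margin to normal scores through its empirical CDF.
u = norm.ppf(rankdata(x) / (len(x) + 1))
v = norm.ppf(rankdata(y) / (len(y) + 1))
rho = np.corrcoef(u, v)[0, 1]                       # copula correlation estimate

def conditional_quantiles(x0, probs):
    """Quantiles of y | x = x0 implied by the fitted Gaussian copula."""
    u0 = norm.ppf((np.searchsorted(np.sort(x), x0) + 0.5) / (len(x) + 1))
    # Latent conditional: normal with mean rho*u0 and variance 1 - rho^2.
    v_q = norm.ppf(probs, loc=rho * u0, scale=np.sqrt(1 - rho ** 2))
    return np.quantile(y, norm.cdf(v_q))            # back-transform via y's quantiles

print(conditional_quantiles(np.median(x), np.array([0.1, 0.5, 0.9])))
```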

18.
A fuzzy guided genetic algorithm for operon prediction
Motivation: The operon structure of the prokaryotic genome is a critical input for the reconstruction of regulatory networks at the whole-genome level. As experimental methods for the detection of operons are difficult and time-consuming, efforts are being put into developing computational methods that can use available biological information to predict operons.
Method: A genetic algorithm is developed to evolve a starting population of putative operon maps of the genome into progressively better predictions. Fuzzy scoring functions based on multiple criteria are used for assessing the ‘fitness’ of the newly evolved operon maps and guiding their evolution.
Results: The algorithm organizes the whole genome into operons. The fuzzy guided genetic algorithm-based approach makes it possible to use diverse biological information, such as genome sequence data, functional annotations and conservation across multiple genomes, to guide the organization process. This approach does not require any prior training with experimental operons. The predictions from this algorithm for Escherichia coli K12 and Bacillus subtilis are evaluated against experimentally discovered operons for these organisms. The accuracy of the method is evaluated using an ROC (receiver operating characteristic) analysis. The area under the ROC curve is around 0.9, which indicates excellent accuracy.
Contact: roschen_csir@rediffmail.com
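A toy illustration of the fuzzy multi-criteria scoring idea above: each adjacent gene pair receives graded membership scores (intergenic distance, strand agreement, shared function) that are aggregated into a single fitness contribution; in the actual algorithm such scores guide the evolution of whole operon maps. All membership functions, cut-offs and gene names here are hypothetical.

```python
# Fuzzy scoring of candidate operon gene pairs from multiple criteria.
# Membership functions and thresholds are hypothetical assumptions.
import numpy as np

def distance_membership(gap_bp):
    """Close gene pairs (small intergenic gap) score near 1; distant pairs near 0."""
    return float(np.clip(1.0 - gap_bp / 300.0, 0.0, 1.0))

def fuzzy_fitness(gap_bp, same_strand, shared_function):
    scores = [distance_membership(gap_bp),
              1.0 if same_strand else 0.0,
              1.0 if shared_function else 0.3]      # weak evidence is not fatal
    return float(np.mean(scores))                   # aggregate the criteria

# Candidate adjacent gene pairs on a genome (hypothetical values).
pairs = [("geneA-geneB", 40, True, True),
         ("geneB-geneC", 250, True, False),
         ("geneC-geneD", 90, False, False)]
for name, gap, strand, func in pairs:
    print(name, "operon-pair score:", round(fuzzy_fitness(gap, strand, func), 2))
```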

19.
Grigoletto M, Akritas MG. Biometrics 1999, 55(4): 1177-1187
We propose a method for fitting semiparametric models such as the proportional hazards (PH), additive risks (AR), and proportional odds (PO) models. Each of these semiparametric models implies that some transformation of the conditional cumulative hazard function (at each t) depends linearly on the covariates. The proposed method is based on nonparametric estimation of the conditional cumulative hazard function, forming a weighted average over a range of t-values, and subsequent use of least squares to estimate the parameters suggested by each model. An approximation to the optimal weight function is given. This allows semiparametric models to be fitted even in incomplete data cases where the partial likelihood fails (e.g., left censoring, right truncation). However, the main advantage of this method rests in the fact that neither the interpretation of the parameters nor the validity of the analysis depend on the appropriateness of the PH or any of the other semiparametric models. In fact, we propose an integrated method for data analysis where the role of the various semiparametric models is to suggest the best fitting transformation. A single continuous covariate and several categorical covariates (factors) are allowed. Simulation studies indicate that the test statistics and confidence intervals have good small-sample performance. A real data set is analyzed.

20.
On prediction of genetic values in marker-assisted selection.
Lange C, Whittaker JC. Genetics 2001, 159(3): 1375-1381
We suggest a new approximation for the prediction of genetic values in marker-assisted selection. The new approximation is compared to the standard approach. It is shown that the new approach will often provide substantially better prediction of genetic values; furthermore, the new approximation avoids some of the known statistical problems of the standard approach. The advantages of the new approach are illustrated by a simulation study in which the new approximation outperforms both the standard approach and phenotypic selection.
