Similar literature
 20 similar documents found (search time: 15 ms)
1.

Objective

Multiple linear regression (MLR) and machine learning techniques in pharmacogenetic algorithm-based warfarin dosing have been reported. However, the performance of these algorithms in racially diverse groups has never been objectively evaluated and compared. In this literature-based study, we compared the performance of eight machine learning techniques with that of MLR in a large, racially diverse cohort.

Methods

MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied in warfarin dose algorithms in a cohort from the International Warfarin Pharmacogenetics Consortium database. Covariates obtained by stepwise regression from 80% of randomly selected patients were used to develop algorithms. To compare the performances of these algorithms, the mean percentage of patients whose predicted dose fell within 20% of the actual dose (mean percentage within 20%) and the mean absolute error (MAE) were calculated in the remaining 20% of patients. The performances of these techniques in different races, as well as the dose ranges of therapeutic warfarin were compared. Robust results were obtained after 100 rounds of resampling.
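The two evaluation metrics described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example doses (in mg/week) are hypothetical, and "within 20%" is assumed to mean that the absolute difference between predicted and actual dose is at most 20% of the actual dose.

```python
def mean_absolute_error(actual, predicted):
    """Mean absolute difference between actual and predicted doses."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def percentage_within_20(actual, predicted):
    """Percentage of patients whose predicted dose is within 20% of the actual dose."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= 0.2 * a)
    return 100.0 * hits / len(actual)


# Hypothetical weekly doses (mg/week) for four held-out patients.
actual_dose = [35.0, 21.0, 49.0, 28.0]
predicted_dose = [30.0, 28.0, 45.0, 27.0]

mae = mean_absolute_error(actual_dose, predicted_dose)
within_20 = percentage_within_20(actual_dose, predicted_dose)
print(mae, within_20)
```

In a resampling design like the one described, both functions would be applied to the held-out 20% in each round and the results averaged over the rounds.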

Results

BART, MARS and SVR were statistically indistinguishable and significantly outperformed all the other approaches in the whole cohort (MAE: 8.84–8.96 mg/week, mean percentage within 20%: 45.88%–46.35%). In the White population, MARS and BART showed higher mean percentage within 20% and lower mean MAE than MLR (all p values < 0.05). In the Asian population, SVR, BART, MARS and LAR performed the same as MLR. MLR and LAR performed best among the Black population. When patients were grouped by warfarin dose range, all machine learning techniques except ANN and LAR showed significantly higher mean percentage within 20% and lower MAE (all p values < 0.05) than MLR in the low- and high-dose ranges.

Conclusion

Overall, the machine learning-based techniques BART, MARS and SVR outperformed MLR in warfarin pharmacogenetic dosing. Algorithm performance differed among races. Moreover, machine learning-based algorithms tended to perform better than MLR in the low- and high-dose ranges.

2.
This study aimed to assess the predictive ability of different machine learning (ML) methods for genomic prediction of reproductive traits in Nellore cattle. The studied traits were age at first calving (AFC), scrotal circumference (SC), early pregnancy (EP) and stayability (STAY). The numbers of genotyped animals and SNP markers available were 2342 and 321 419 (AFC), 4671 and 309 486 (SC), 2681 and 319 619 (STAY) and 3356 and 319 108 (EP). Predictive ability of support vector regression (SVR), Bayesian regularized artificial neural network (BRANN) and random forest (RF) were compared with results obtained using parametric models (genomic best linear unbiased predictor, GBLUP, and Bayesian least absolute shrinkage and selection operator, BLASSO). A 5‐fold cross‐validation strategy was performed and the average prediction accuracy (ACC) and mean squared errors (MSE) were computed. The ACC was defined as the linear correlation between predicted and observed breeding values for categorical traits (EP and STAY) and as the correlation between predicted and observed adjusted phenotypes divided by the square root of the estimated heritability for continuous traits (AFC and SC). The average ACC varied from low to moderate depending on the trait and model under consideration, ranging between 0.56 and 0.63 (AFC), 0.27 and 0.36 (SC), 0.57 and 0.67 (EP), and 0.52 and 0.62 (STAY). SVR provided slightly better accuracies than the parametric models for all traits, increasing the prediction accuracy for AFC to around 6.3 and 4.8% compared with GBLUP and BLASSO respectively. Likewise, there was an increase of 8.3% for SC, 4.5% for EP and 4.8% for STAY, comparing SVR with both GBLUP and BLASSO. In contrast, the RF and BRANN did not present competitive predictive ability compared with the parametric models. The results indicate that SVR is a suitable method for genome‐enabled prediction of reproductive traits in Nellore cattle. 
Further, the optimal kernel bandwidth parameter in the SVR model was trait‐dependent; thus, fine‐tuning this hyper‐parameter in the training phase is crucial.
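The accuracy measure defined for the continuous traits (correlation between predicted values and adjusted phenotypes, divided by the square root of the heritability) can be sketched as below. The function names, the fold assignment scheme and the example values are assumptions made for illustration; the abstract does not say how the five folds were formed.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def accuracy_continuous(predicted, adjusted_phenotype, heritability):
    """Correlation scaled by the square root of the estimated heritability."""
    return pearson(predicted, adjusted_phenotype) / math.sqrt(heritability)


def k_fold_indices(n_animals, k=5):
    """Assumed scheme: animal i goes to fold i mod k (interleaved folds)."""
    return [list(range(fold, n_animals, k)) for fold in range(k)]


# Hypothetical values: a perfectly correlated prediction and h2 = 0.25.
acc = accuracy_continuous([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], heritability=0.25)
folds = k_fold_indices(10)
print(acc, folds)
```

Dividing by the square root of the heritability can push the value above 1 in small examples like this one; with real data it rescales the phenotypic correlation towards the correlation with the true breeding value.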

3.
One of the most important applications of genomic selection in maize breeding is to predict and identify the best untested lines from biparental populations, when the training and validation sets are derived from the same cross. Nineteen tropical maize biparental populations evaluated in multienvironment trials were used in this study to assess prediction accuracy of different quantitative traits using low-density (~200 markers) and genotyping-by-sequencing (GBS) single-nucleotide polymorphisms (SNPs). An extension of the Genomic Best Linear Unbiased Predictor that incorporates genotype × environment (GE) interaction was used to predict genotypic values; cross-validation methods were applied to quantify prediction accuracy. Our results showed that: (1) low-density SNPs (~200 markers) were largely sufficient to get good prediction in biparental maize populations for simple traits with moderate-to-high heritability, but GBS outperformed low-density SNPs for complex traits and simple traits evaluated under stress conditions with low-to-moderate heritability; (2) heritability and genetic architecture of target traits affected prediction performance: prediction accuracy of complex traits (grain yield) was consistently lower than that of simple traits (anthesis date and plant height), and prediction accuracy under stress conditions was consistently lower and more variable than under well-watered conditions for all the target traits because of their poor heritability under stress conditions; and (3) the prediction accuracy of GE models was found to be superior to that of non-GE models for complex traits and marginal for simple traits.

4.
Several methods have been used for genome-enabled prediction (or genomic selection) of complex traits, for example, multiple regression models describing a target trait with a linear function of a set of genetic markers. Genomic selection studies have been focused mostly on single-trait analyses. However, most profitability traits are genetically correlated, and an increase in prediction accuracy of genomic breeding values for genetically correlated traits is expected when using multiple-trait models. Thus, this study was carried out to assess the accuracy of genomic prediction for carcass and meat quality traits in Nelore cattle, using single- and multiple-trait approaches. The study considered 15 780, 15 784, 15 742 and 526 records of rib eye area (REA, cm2), back fat thickness (BF, mm), rump fat (RF, mm) and Warner–Bratzler shear force (WBSF, kg), respectively, in Nelore cattle, from the Nelore Brazil Breeding Program. Animals were genotyped with a low-density single nucleotide polymorphism (SNP) panel and subsequently imputed to arrays with 54 and 777 k SNPs. Four Bayesian specifications of genomic regression models, namely, Bayes A, Bayes B, Bayes Cπ and Bayesian Ridge Regression; blending methods, BLUP; and single-step genomic best linear unbiased prediction (ssGBLUP) methods were compared in terms of prediction accuracy using a fivefold cross-validation. Estimates of heritability ranged from 0.20 to 0.35 and from 0.21 to 0.46 for RF and WBSF on single- and multiple-trait analyses, respectively. Prediction accuracies for REA, BF, RF and WBSF were all similar using the different specifications of regression models. In addition, this study has shown the impact of genomic information upon genetic evaluations in beef cattle using the multiple-trait model, which was also advantageous compared to the single-trait model because it accounted for the selection process using multiple traits at the same time. 
The advantage of multi-trait analyses is attributed to the consideration of correlations and genetic influences between the traits, in addition to the non-random association of alleles.

5.
High-throughput genotyping and sequencing techniques are rapidly and inexpensively providing large amounts of human genetic variation data. Single Nucleotide Polymorphisms (SNPs) are an important source of human genome variability and have been implicated in several human diseases, including cancer. Amino acid mutations resulting from non-synonymous SNPs in coding regions may generate protein functional changes that affect cell proliferation. In this study, we developed a machine learning approach to predict cancer-causing missense variants. We present a Support Vector Machine (SVM) classifier trained on a set of 3163 cancer-causing variants and an equal number of neutral polymorphisms. The method achieves 93% overall accuracy, a correlation coefficient of 0.86, and an area under the ROC curve of 0.98. When compared with previously developed algorithms such as SIFT and CHASM, our method achieves higher prediction accuracy and correlation coefficient in identifying cancer-causing variants.
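For a binary classifier of this kind, overall accuracy and a correlation coefficient can both be computed from the four confusion-matrix counts. The sketch below assumes that the reported correlation coefficient is the Matthews correlation coefficient, which the abstract does not state; the counts are hypothetical and are not the study's results.

```python
import math


def overall_accuracy(tp, tn, fp, fn):
    """Fraction of all variants that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a balanced test set of 200 variants.
acc = overall_accuracy(tp=90, tn=80, fp=20, fn=10)
mcc = matthews_corrcoef(tp=90, tn=80, fp=20, fn=10)
print(acc, mcc)
```

On a balanced set such as the one described (equal numbers of cancer-causing and neutral variants), accuracy is not distorted by class imbalance, so the two measures tend to move together.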

6.
High accuracy is paramount when predicting biochemical characteristics using Quantitative Structural-Property Relationships (QSPRs). Although existing graph-theoretic kernel methods combined with machine learning techniques are efficient for QSPR model construction, they cannot distinguish topologically identical chiral compounds which often exhibit different biological characteristics. In this paper, we propose a new method that extends the recently developed tree pattern graph kernel to accommodate stereoisomers. We show that Support Vector Regression (SVR) with a chiral graph kernel is useful for target property prediction by demonstrating its application to a set of human vitamin D receptor ligands currently under consideration for their potential anti-cancer effects.

7.
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. 
We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.

8.
The accurate representation of species distribution derived from sampled data is essential for management purposes and to underpin population modelling. Additionally, the prediction of species distribution for an expanded area, beyond the sampling area, can reduce sampling costs. Here, several well-established and recently developed habitat modelling techniques are investigated in order to identify the most suitable approach to use with presence–absence acoustic data. The fitting efficiency of the modelling techniques is initially tested on the training dataset while their predictive capacity is evaluated using a verification set. For the comparison among models, Receiver Operating Characteristics (ROC), Kappa statistics, correlation and confusion matrices are used. Boosted Regression Trees (BRT) and Associative Neural Networks (ASNN), which are both within the machine learning category, outperformed the other modelling approaches tested.

9.
Gross Primary Productivity (GPP) is the amount of CO2 sequestered during plant photosynthesis. GPP is an important indicator of ecosystem health in various ecologies and is used to assess climate change. The objective of the present work is to propose a machine learning based GPP estimation model using remote sensing (RS) data in combination with meteorological data (MET) and topographical data (TOPO) for prediction of GPP, which can be upscaled in temporal and spatial resolution. Random Forest Regression (RFR) is proposed for this using the Fluxnet2015 GPP dataset for the Australian region. This model attained a very high accuracy, with an R² value of 0.82 as estimated by 10-fold cross-validation. The model was compared with state-of-the-art machine learning models and found to perform better than the others. Different feature sets, such as MET-features and TOPO-features, were evaluated in combination with RS-features. The results showed that the RFR model performed better when MET and TOPO features were combined with RS-features. GPP prediction for the year 2014, at 8-day temporal and 500 m spatial resolution for the Australian region for different plant function types, is demonstrated using the proposed model and produced a very high R² (0.84) when compared with ground truth. Thus, the proposed RFR approach to GPP estimation showed significant improvement in regional carbon cycle studies and can also be employed for simulating future GPP under different climate scenarios.

10.
Accelerating biomass improvement is a major goal of Miscanthus breeding. The development and implementation of genomic-enabled breeding tools, like marker-assisted selection (MAS) and genomic selection, has the potential to improve the efficiency of Miscanthus breeding. The present study conducted genome-wide association (GWA) and genomic prediction of biomass yield and 14 yield-component traits in Miscanthus sacchariflorus. We evaluated a diversity panel with 590 accessions of M. sacchariflorus grown across 4 years in one subtropical and three temperate locations and genotyped with 268,109 single-nucleotide polymorphisms (SNPs). The GWA study identified a total of 835 significant SNPs and 674 candidate genes across all traits and locations. Of the significant SNPs identified, 280 were localized in mapped quantitative trait loci intervals and proximal to SNPs identified for similar traits in previously reported Miscanthus studies, providing additional support for the importance of these genomic regions for biomass yield. Our study gave insights into the genetic basis for yield-component traits in M. sacchariflorus that may facilitate marker-assisted breeding for biomass yield. Genomic prediction accuracy for the yield-related traits ranged from 0.15 to 0.52 across all locations and genetic groups. Prediction accuracies within the six genetic groupings of M. sacchariflorus were limited due to low sample sizes. Nevertheless, the Korea/NE China/Russia (N = 237) genetic group had the highest prediction accuracy of all genetic groups (ranging 0.26–0.71), suggesting that with adequate sample sizes, there is strong potential for genomic selection within the genetic groupings of M. sacchariflorus. This study indicated that MAS and genomic prediction will likely be beneficial for conducting population-improvement of M. sacchariflorus.

11.
Abstract

An accurate and rapid toxic gas concentration prediction model plays an important role in emergency response to sudden gas leaks. However, it is difficult for existing dispersion models to meet accuracy and efficiency requirements at the same time. Although some researchers have developed new forecasting models with traditional machine learning, such as back propagation (BP) neural networks and support vector machines (SVM), the accuracy of the predictions obtained from such models still needs to be improved. New prediction models based on deep learning are therefore proposed in this paper. Deep learning has obvious advantages over traditional machine learning in prediction and classification. Deep belief networks (DBNs) and convolutional neural networks (CNNs) are used to build new dispersion models here. Both models are compared with the Gaussian plume model, a computational fluid dynamics (CFD) model and models based on traditional machine learning in terms of accuracy, prediction time, and computation time. The experimental results show that the CNN model performs better when all evaluation indexes are considered.

12.
The antigenic variability of influenza viruses has always made influenza vaccine development challenging. The punctuated nature of antigenic drift of influenza virus suggests that a relatively small number of genetic changes or combinations of genetic changes may drive changes in antigenic phenotype. The present study aimed to identify antigenicity-associated sites in the hemagglutinin protein of A/H1N1 seasonal influenza virus using computational approaches. Random Forest Regression (RFR) and Support Vector Regression based on Recursive Feature Elimination (SVR-RFE) were applied to H1N1 seasonal influenza viruses and used to analyze the associations between amino acid changes in the HA1 polypeptide and antigenic variation based on hemagglutination-inhibition (HI) assay data. Twenty-three and twenty antigenicity-associated sites were identified by RFR and SVR-RFE, respectively, by considering the joint effects of amino acid residues on antigenic drift. Our proposed approaches were further validated with the H3N2 dataset. The prediction models developed in this study can quantitatively predict antigenic differences with high prediction accuracy based only on HA1 sequences. Application of the study results can increase understanding of H1N1 seasonal influenza virus antigenic evolution and accelerate the selection of vaccine strains.

13.
Many non-synonymous SNPs (nsSNPs) are associated with diseases, and numerous machine learning methods have been applied to train classifiers for sorting disease-associated nsSNPs from neutral ones. The continuously accumulated nsSNP data allows us to further explore better prediction approaches. In this work, we partitioned the training data into 20 subsets according to either original or substituted amino acid type at the nsSNP site. Using support vector machine (SVM), training classification models on each subset resulted in an overall accuracy of 76.3% or 74.9% depending on the two different partition criteria, while training on the whole dataset obtained an accuracy of only 72.6%. Moreover, the dataset was also randomly divided into 20 subsets, but the corresponding accuracy was only 73.2%. Our results demonstrated that partitioning the whole training dataset into subsets properly, i.e., according to the residue type at the nsSNP site, will improve the performance of the trained classifiers significantly, which should be valuable in developing better tools for predicting the disease-association of nsSNPs.

14.

Background

Advances in genotyping technology, such as genotyping by sequencing (GBS), are making genomic prediction more attractive to reduce breeding cycle times and costs associated with phenotyping. Genomic prediction and selection has been studied in several crop species, but no reports exist in soybean. The objectives of this study were (i) evaluate prospects for genomic selection using GBS in a typical soybean breeding program and (ii) evaluate the effect of GBS marker selection and imputation on genomic prediction accuracy. To achieve these objectives, a set of soybean lines sampled from the University of Nebraska Soybean Breeding Program were genotyped using GBS and evaluated for yield and other agronomic traits at multiple Nebraska locations.

Results

Genotyping by sequencing scored 16,502 single nucleotide polymorphisms (SNPs) with minor-allele frequency (MAF) > 0.05 and percentage of missing values ≤ 5% on 301 elite soybean breeding lines. When SNPs with up to 80% missing values were included, 52,349 SNPs were scored. Prediction accuracy for grain yield, assessed using cross validation, was estimated to be 0.64, indicating good potential for using genomic selection for grain yield in soybean. Filtering SNPs based on missing data percentage had little to no effect on prediction accuracy, especially when random forest imputation was used to impute missing values. The highest accuracies were observed when random forest imputation was used on all SNPs, but differences were not significant. A standard additive G-BLUP model was robust; modeling additive-by-additive epistasis did not provide any improvement in prediction accuracy. The effect of training population size on accuracy began to plateau around 100, but accuracy steadily climbed until the largest possible size was used in this analysis. Including only SNPs with MAF > 0.30 provided higher accuracies when training populations were smaller.
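The SNP filtering step reported above (minor-allele frequency above a threshold, missing-value percentage at or below a threshold) might look like the following sketch. The genotype coding (0, 1 or 2 copies of one allele, with None for a missing call), the function names and the example data are assumptions made for illustration, not details taken from the study.

```python
def minor_allele_frequency(calls):
    """MAF from genotype calls coded 0/1/2; missing calls (None) are ignored."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    allele_freq = sum(observed) / (2 * len(observed))
    return min(allele_freq, 1 - allele_freq)


def missing_rate(calls):
    """Fraction of lines with no genotype call at this SNP."""
    return sum(1 for g in calls if g is None) / len(calls)


def filter_snps(genotypes, min_maf=0.05, max_missing=0.05):
    """Keep SNP names with MAF > min_maf and missing rate <= max_missing."""
    return [
        name
        for name, calls in genotypes.items()
        if minor_allele_frequency(calls) > min_maf
        and missing_rate(calls) <= max_missing
    ]


# Hypothetical calls for three SNPs scored on 20 lines.
genotypes = {
    "snp_a": [0, 1, 2, 1] * 5,     # common allele, no missing calls
    "snp_b": [0] * 19 + [1],       # rare allele (MAF = 0.025)
    "snp_c": [0, 1, 2, None] * 5,  # 25% missing calls
}
strict = filter_snps(genotypes)
relaxed = filter_snps(genotypes, max_missing=0.80)
print(strict, relaxed)
```

Relaxing `max_missing` to 0.80 mirrors the second scenario in the abstract, in which SNPs with up to 80% missing values were retained before imputation.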

Conclusions

Using GBS for genomic prediction in soybean holds good potential to expedite genetic gain. Our results suggest that standard additive G-BLUP models can be used on unfiltered, imputed GBS data without loss in accuracy.

15.
Detecting, characterizing, and interpreting gene-gene interactions or epistasis in studies of human disease susceptibility is both a mathematical and a computational challenge. To address this problem, we have previously developed a multifactor dimensionality reduction (MDR) method for collapsing high-dimensional genetic data into a single dimension (i.e. constructive induction) thus permitting interactions to be detected in relatively small sample sizes. In this paper, we describe a comprehensive and flexible framework for detecting and interpreting gene-gene interactions that utilizes advances in information theory for selecting interesting single-nucleotide polymorphisms (SNPs), MDR for constructive induction, machine learning methods for classification, and finally graphical models for interpretation. We illustrate the usefulness of this strategy using artificial datasets simulated from several different two-locus and three-locus epistasis models. We show that the accuracy, sensitivity, specificity, and precision of a naïve Bayes classifier are significantly improved when SNPs are selected based on their information gain (i.e. class entropy removed) and reduced to a single attribute using MDR. We then apply this strategy to detecting, characterizing, and interpreting epistatic models in a genetic study (n = 500) of atrial fibrillation and show that both classification and model interpretation are significantly improved.
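Information gain, described above as the class entropy removed by knowing a SNP, can be illustrated with a short sketch. The function names and the toy genotype and case/control vectors are hypothetical; this shows only the single-SNP calculation, not the MDR step or the authors' full framework.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(genotypes, labels):
    """Class entropy minus the class entropy conditional on the SNP genotype."""
    n = len(labels)
    by_genotype = {}
    for genotype, label in zip(genotypes, labels):
        by_genotype.setdefault(genotype, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in by_genotype.values())
    return entropy(labels) - conditional


# Hypothetical data: four subjects, case (1) / control (0) status.
status = [0, 0, 1, 1]
informative_snp = [0, 0, 1, 1]    # genotype fully determines status
uninformative_snp = [0, 1, 0, 1]  # genotype tells nothing about status
print(information_gain(informative_snp, status))
print(information_gain(uninformative_snp, status))
```

A purely epistatic pair of SNPs would each score near zero here on their own, which is why the framework also considers interactions rather than ranking SNPs only one at a time.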

16.
Economically important reproduction traits in sheep, such as number of lambs weaned and litter size, are expressed only in females and later in life after most selection decisions are made, which makes them ideal candidates for genomic selection. Accurate genomic predictions would lead to greater genetic gain for these traits by enabling accurate selection of young rams with high genetic merit. The aim of this study was to design and evaluate the accuracy of a genomic prediction method for female reproduction in sheep using daughter trait deviations (DTD) for sires and ewe phenotypes (when individual ewes were genotyped) for three reproduction traits: number of lambs born (NLB), litter size (LSIZE) and number of lambs weaned. Genomic best linear unbiased prediction (GBLUP), BayesR and pedigree BLUP analyses of the three reproduction traits measured on 5340 sheep (4503 ewes and 837 sires) with real and imputed genotypes for 510 174 SNPs were performed. The prediction of breeding values using both sire and ewe trait records was validated in Merino sheep. Prediction accuracy was evaluated by across sire family and random cross‐validations. Accuracies of genomic estimated breeding values (GEBVs) were assessed as the mean Pearson correlation adjusted by the accuracy of the input phenotypes. The addition of sire DTD into the prediction analysis resulted in higher accuracies compared with using only ewe records in genomic predictions or pedigree BLUP. Using GBLUP, the average accuracy based on the combined records (ewes and sire DTD) was 0.43 across traits, but the accuracies varied by trait and type of cross‐validations. The accuracies of GEBVs from random cross‐validations (range 0.17–0.61) were higher than were those from sire family cross‐validations (range 0.00–0.51). The GEBV accuracies of 0.41–0.54 for NLB and LSIZE based on the combined records were amongst the highest in the study. 
Although BayesR was not significantly different from GBLUP in prediction accuracy, it identified several candidate genes which are known to be associated with NLB and LSIZE. The approach provides a way to make use of all data available in genomic prediction for traits that have limited recording.

17.
Background

Previous epidemiological studies have examined the prevalence and risk factors for a variety of parasitic illnesses, including protozoan and soil-transmitted helminth (STH, e.g., hookworms and roundworms) infections. Despite advancements in machine learning for data analysis, the majority of these studies use traditional logistic regression to identify significant risk factors.

Methods

In this study, we used data from a survey of 54 risk factors for intestinal parasitosis in 954 Ethiopian school children. We investigated whether machine learning approaches can supplement traditional logistic regression in identifying intestinal parasite infection risk factors. We used feature selection methods such as InfoGain (IG), ReliefF (ReF), Joint Mutual Information (JMI), and Minimum Redundancy Maximum Relevance (MRMR). Additionally, we predicted children’s parasitic infection status using classifiers such as Logistic Regression (LR), Support Vector Machines (SVM), Random Forests (RF) and XGBoost (XGB), and compared their accuracy and area under the receiver operating characteristic curve (AUROC) scores. For optimal model training, we performed tenfold cross-validation and tuned the classifier hyperparameters. We balanced our dataset using the Synthetic Minority Oversampling (SMOTE) method. Additionally, we used association rule learning to establish a link between risk factors and parasitic infections.

Key findings

Our study demonstrated that machine learning could be used in conjunction with logistic regression. Using machine learning, we developed models that accurately predicted four parasitic infections: any parasitic infection at 79.9% accuracy, helminth infection at 84.9%, any STH infection at 95.9%, and protozoan infection at 94.2%. The Random Forests (RF) and Support Vector Machines (SVM) classifiers achieved the highest accuracy when top 20 risk factors were considered using Joint Mutual Information (JMI) or all features were used. The best predictors of infection were socioeconomic, demographic, and hematological characteristics.

Conclusions

We demonstrated that feature selection and association rule learning are useful strategies for detecting risk factors for parasite infection. Additionally, we showed that advanced classifiers might be utilized to predict children’s parasitic infection status. When combined with standard logistic regression models, machine learning techniques can identify novel risk factors and predict infection risk.

18.
Dominance may be an important source of non-additive genetic variance for many traits of dairy cattle. However, nearly all prediction models for dairy cattle have included only additive effects because of the limited number of cows with both genotypes and phenotypes. The role of dominance in the Holstein and Jersey breeds was investigated for eight traits: milk, fat, and protein yields; productive life; daughter pregnancy rate; somatic cell score; fat percent and protein percent. Additive and dominance variance components were estimated and then used to estimate additive and dominance effects of single nucleotide polymorphisms (SNPs). The predictive abilities of three models with both additive and dominance effects and a model with additive effects only were assessed using ten-fold cross-validation. One procedure estimated dominance values, and another estimated dominance deviations; calculation of the dominance relationship matrix was different for the two methods. The third approach enlarged the dataset by including cows with genotype probabilities derived using genotyped ancestors. For yield traits, dominance variance accounted for 5 and 7% of total variance for Holsteins and Jerseys, respectively; using dominance deviations resulted in smaller dominance and larger additive variance estimates. For non-yield traits, dominance variances were very small for both breeds. For yield traits, including additive and dominance effects fit the data better than including only additive effects; average correlations between estimated genetic effects and phenotypes showed that prediction accuracy increased when both effects rather than just additive effects were included. No corresponding gains in prediction ability were found for non-yield traits. Including cows with derived genotype probabilities from genotyped ancestors did not improve prediction accuracy. 
The largest additive effects were located on chromosome 14 near DGAT1 for yield traits for both breeds; those SNPs also showed the largest dominance effects for fat yield (both breeds) as well as for Holstein milk yield.

19.

Key message

New methods that incorporate the main and interaction effects of high-dimensional markers and of high-dimensional environmental covariates gave increased prediction accuracy of grain yield in wheat across and within environments.

Abstract

In most agricultural crops the effects of genes on traits are modulated by environmental conditions, leading to genetic by environmental interaction (G × E). Modern genotyping technologies allow characterizing genomes in great detail and modern information systems can generate large volumes of environmental data. In principle, G × E can be accounted for using interactions between markers and environmental covariates (ECs). However, when genotypic and environmental information is high dimensional, modeling all possible interactions explicitly becomes infeasible. In this article we show how to model interactions between high-dimensional sets of markers and ECs using covariance functions. The model presented here consists of (random) reaction norm where the genetic and environmental gradients are described as linear functions of markers and of ECs, respectively. We assessed the proposed method using data from Arvalis, consisting of 139 wheat lines genotyped with 2,395 SNPs and evaluated for grain yield over 8 years and various locations within northern France. A total of 68 ECs, defined based on five phases of the phenology of the crop, were used in the analysis. Interaction terms accounted for a sizable proportion (16 %) of the within-environment yield variance, and the prediction accuracy of models including interaction terms was substantially higher (17–34 %) than that of models based on main effects only. Breeding for target environmental conditions has become a central priority of most breeding programs. Methods, like the one presented here, that can capitalize upon the wealth of genomic and environmental information available, will become increasingly important.

20.
Prediction of genetic risk for disease is needed for preventive and personalized medicine. Genome-wide association studies have found unprecedented numbers of variants associated with complex human traits and diseases. However, these variants explain only a small proportion of genetic risk. Mounting evidence suggests that many traits, relevant to public health, are affected by large numbers of small-effect genes and that prediction of genetic risk to those traits and diseases could be improved by incorporating large numbers of markers into whole-genome prediction (WGP) models. We developed a WGP model incorporating thousands of markers for prediction of skin cancer risk in humans. We also considered other ways of incorporating genetic information into prediction models, such as family history or ancestry (using principal components, PCs, of informative markers). Prediction accuracy was evaluated using the area under the receiver operating characteristic curve (AUC) estimated in a cross-validation. Incorporation of genetic information (i.e., familial relationships, PCs, or WGP) yielded a significant increase in prediction accuracy: from an AUC of 0.53 for a baseline model that accounted for nongenetic covariates to AUCs of 0.58 (pedigree), 0.62 (PCs), and 0.64 (WGP). In summary, prediction of skin cancer risk could be improved by considering genetic information and using a large number of single-nucleotide polymorphisms (SNPs) in a WGP model, which allows for the detection of patterns of genetic risk that are above and beyond those that can be captured using family history. We discuss avenues for improving prediction accuracy and speculate on the possible use of WGP to prospectively identify individuals at high risk.
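The AUC used here as the accuracy measure equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the example scores are hypothetical and unrelated to the study's data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks for two cases (1) and two controls (0).
predicted_risk = [0.9, 0.8, 0.4, 0.3]
disease_status = [1, 0, 1, 0]
print(auc(predicted_risk, disease_status))
```

An AUC of 0.5 corresponds to ranking cases and controls at random, which puts the reported baseline of 0.53 and the WGP value of 0.64 in context. The pairwise loop is quadratic in sample size; rank-based formulas give the same value faster on large cohorts.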
