Similar literature
20 similar documents found (search time: 31 ms)
1.
The availability of high-density panels of molecular markers has prompted the adoption of genomic selection (GS) methods in animal and plant breeding. In GS, parametric, semi-parametric and non-parametric regression models are used for predicting quantitative traits. This article shows how to use neural networks with radial basis functions (RBFs) for prediction with dense molecular markers. We illustrate the use of the linear Bayesian LASSO regression model and of two non-linear regression models, reproducing kernel Hilbert spaces (RKHS) regression and radial basis function neural networks (RBFNN), on simulated data and real maize lines genotyped with 55,000 markers and evaluated for several trait-environment combinations. The empirical results of this study indicated that the three models showed similar overall prediction accuracy, with a slight and consistent superiority of RKHS and RBFNN over the additive Bayesian LASSO model. Results from the simulated data indicate that RKHS and RBFNN models captured epistatic effects; however, adding non-signal (redundant) predictors (interaction between markers) can adversely affect the predictive accuracy of the non-linear regression models.
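As a rough illustration of the radial basis functions named in this abstract, the sketch below evaluates a Gaussian RBF on marker genotype vectors. The Gaussian form, the bandwidth `h`, the 0/1/2 genotype coding and the toy vectors are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def squared_distance(x, c):
    """Squared Euclidean distance between two equal-length marker vectors."""
    if len(x) != len(c):
        raise ValueError("vectors must have the same length")
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def gaussian_rbf(x, c, h):
    """Gaussian radial basis function exp(-h * ||x - c||^2).

    h is an assumed bandwidth parameter; larger h makes the function
    decay faster as x moves away from the centre c.
    """
    return math.exp(-h * squared_distance(x, c))


# Hypothetical genotypes coded as 0/1/2 copies of an allele.
line_a = [0, 1, 2, 1]
line_b = [0, 1, 2, 1]  # identical to line_a
line_c = [2, 1, 0, 1]  # differs from line_a at two markers
h = 0.1

similarity_ab = gaussian_rbf(line_a, line_b, h)  # identical lines give 1.0
similarity_ac = gaussian_rbf(line_a, line_c, h)  # distant lines give a value below 1
```

Identical genotypes map to a value of 1 and the value decays towards 0 as lines diverge, which is what lets kernel-type models respond to non-additive similarity patterns.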

2.
A metabolomics approach for prediction of bacteremic sepsis in patients in the emergency room (ER) was investigated. In a prospective study, whole blood samples from 65 patients with bacteremic sepsis and 49 ER controls were compared. The blood samples were analyzed using gas chromatography coupled to time-of-flight mass spectrometry. Multivariate and logistic regression modeling was employed, using either metabolites identified by chromatography or conventional laboratory parameters and clinical scores of infection. A predictive model of bacteremic sepsis with 107 metabolites was developed and validated. The number of metabolites was reduced stepwise until a set of 6 predictive metabolites was identified. A 6-metabolite predictive logistic regression model showed a sensitivity of 0.91 (95% CI 0.69–0.99) and a specificity of 0.84 (95% CI 0.58–0.94) with an AUC of 0.93 (95% CI 0.89–1.01). Myristic acid was the single most predictive metabolite, with a sensitivity of 1.00 (95% CI 0.85–1.00) and specificity of 0.95 (95% CI 0.74–0.99), and performed better than various combinations of conventional laboratory and clinical parameters. We found that a metabolomics approach for analysis of acute blood samples was useful for identification of patients with bacteremic sepsis. Metabolomics should be further evaluated as a new tool for infection diagnostics.
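To make the idea of a 6-variable logistic regression model concrete, the following sketch turns six metabolite levels into a predicted probability. The intercept, the coefficients and the standardized input values are hypothetical placeholders, not estimates from the study.

```python
import math


def logistic_probability(intercept, coefficients, values):
    """Predicted probability from a logistic regression model.

    Computes 1 / (1 + exp(-(intercept + sum(b_i * x_i)))).
    """
    if len(coefficients) != len(values):
        raise ValueError("need one value per coefficient")
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical intercept and coefficients for six standardized metabolite levels.
intercept = -0.5
coefficients = [1.2, -0.8, 0.5, 0.3, -0.4, 0.9]

# A patient at the reference level of every metabolite.
baseline_patient = [0.0] * 6
# The same patient with the first metabolite raised by two units.
elevated_patient = [2.0, 0.0, 0.0, 0.0, 0.0, 0.0]

p_baseline = logistic_probability(intercept, coefficients, baseline_patient)
p_elevated = logistic_probability(intercept, coefficients, elevated_patient)
```

A probability threshold applied to such predictions is what produces the reported sensitivity and specificity; the threshold used in the study is not stated in the abstract.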

3.
In studies of human balance, it is common to fit stimulus-response data by tuning the time-delay and gain parameters of a simple delayed feedback model. Many interpret this fitted model as evidence that predictive processes are not required to explain existing data on standing balance. However, two questions lead us to doubt this approach. First, does fitting a delayed feedback model lead to reliable estimates of the time-delay? Second, can a non-predictive controller provide an explanation compatible with the independently estimated time delay? For methodological and experimental clarity, we study human balancing of a simulated inverted pendulum via joystick and screen. A two-step approach to data analysis is used: first, a non-parametric model (the closed-loop impulse response) is estimated from the experimental data; second, a parametric model is fitted to the non-parametric impulse response by adjusting time-delay and controller parameters. To support the second step, a new explicit formula relating controller parameters to closed-loop impulse response is derived. Two classes of controller are investigated within a common state-space context: non-predictive and predictive. It is found that the time-delay estimate arising from the second step is strongly dependent on which controller class is assumed; in particular, the non-predictive control assumption leads to time-delay estimates that are smaller than those arising from the predictive assumption. Moreover, the time-delays estimated using the non-predictive control assumption are not consistent with a lower bound on the time-delay of the non-parametric model, whereas the corresponding predictive result is consistent. Thus, while the goodness of fit only marginally favoured predictive over non-predictive control, if we add the constraint that the model must reproduce the non-parametric time delay, then the non-predictive control model fails.
We conclude (1) that the time-delay should be estimated independently of fitting a low order parametric model, (2) that balance of the simulated inverted pendulum could not be explained by the non-predictive control model and (3) that predictive control provided a better explanation than non-predictive control.

4.
Motivated by a clinical prediction problem, a simulation study was performed to compare different approaches for building risk prediction models. Robust prediction models for hospital survival in patients with acute heart failure were to be derived from three highly correlated blood parameters measured up to four times, with predictive ability having explicit priority over interpretability. Methods that relied only on the original predictors were compared with methods using an expanded predictor space including transformations and interactions. Predictors were simulated as transformations and combinations of multivariate normal variables which were fitted to the partly skewed and bimodally distributed original data in such a way that the simulated data mimicked the original covariate structure. Different penalized versions of logistic regression as well as random forests and generalized additive models were investigated using classical logistic regression as a benchmark. Their performance was assessed based on measures of predictive accuracy, model discrimination, and model calibration. Three different scenarios using different subsets of the original data with different numbers of observations and events per variable were investigated. In the investigated setting, where a risk prediction model should be based on a small set of highly correlated and interconnected predictors, Elastic Net and also Ridge logistic regression showed good performance compared to their competitors, while other methods did not lead to substantial improvements or even performed worse than standard logistic regression. Our work demonstrates how simulation studies that mimic relevant features of a specific data set can support the choice of a good modeling strategy.

7.
Species Distribution Models (SDMs) are a powerful tool to derive habitat suitability predictions relating species occurrence data with habitat features. Two of the most frequently applied algorithms to model species-habitat relationships are Generalised Linear Models (GLM) and Random Forest (RF). The former is a parametric regression model providing functional models with direct interpretability. The latter is a machine learning non-parametric algorithm, more tolerant than other approaches in its assumptions, which has often been shown to outperform parametric algorithms. Other approaches have been developed to produce robust SDMs, like training data bootstrapping and spatial scale optimisation. Using felid presence-absence data from three study regions in Southeast Asia (mainland, Borneo and Sumatra), we tested the performances of SDMs by implementing four modelling frameworks: GLM and RF with bootstrapped and non-bootstrapped training data. With Mantel and ANOVA tests we explored how the four combinations of algorithms and bootstrapping influenced SDMs and their predictive performances. Additionally, we tested how scale-optimisation responded to species' size, taxonomic associations (species and genus), study area and algorithm. We found that choice of algorithm had a strong effect in determining the differences between SDMs' spatial predictions, while bootstrapping had no effect. Additionally, algorithm, followed by study area and species, was the main factor driving differences in the spatial scales identified. SDMs trained with GLM showed higher predictive performance; however, ANOVA tests revealed that algorithm had a significant effect only in explaining the variance observed in sensitivity and specificity and, when interacting with bootstrapping, in Percent Correctly Classified (PCC). Bootstrapping significantly explained the variance in specificity, PCC and True Skill Statistic (TSS).
Our results suggest that there are systematic differences in the scales identified and in the predictions produced by GLM vs. RF, but that neither approach was consistently better than the other. The divergent predictions and inconsistent predictive abilities suggest that analysts should not assume machine learning is inherently superior and should test multiple methods. Our results have strong implications for SDM development, revealing the inconsistencies introduced by the choice of algorithm on scale optimisation, with GLM selecting broader scales than RF.
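The performance measures named in this abstract (sensitivity, specificity, PCC and TSS) can all be derived from a presence-absence confusion matrix. The sketch below uses the standard definitions, with TSS taken as sensitivity + specificity − 1; the counts are hypothetical and do not come from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Presence-absence performance measures from confusion-matrix counts.

    tp/fn: presences predicted as presence/absence.
    tn/fp: absences predicted as absence/presence.
    """
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one presence and one absence")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Percent Correctly Classified, expressed as a percentage.
        "pcc": 100.0 * (tp + tn) / total,
        # True Skill Statistic: ranges from -1 to 1, with 0 meaning no skill.
        "tss": sensitivity + specificity - 1.0,
    }


# Hypothetical evaluation counts for one model.
metrics = classification_metrics(tp=40, fp=20, tn=30, fn=10)
```

Because TSS combines both error types, two models with the same PCC can still differ in TSS when presences and absences are unbalanced.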

8.
Managing forest ecosystems for sustainable, multiple use requires forest resource managers to understand and predict how plant species composition and distribution vary across environmental gradients and respond to landscape scale disturbances. This study demonstrates predictive vegetation modeling and mapping for a Northeast Oregon forest using non-parametric Multiplicative Regression (NPMR) with presence/absence data for the species Clintonia uniflora (CLUN) and a set of stand structural and raster-based predictor variables. NPMR is a flexible probability modeling system that can find the best subset of habitat factors influencing species occurrence. NPMR was compared with logistic regression (LR) by building reduced models from variables selected as best by NPMR and full models from variables identified as significant with a forward stepwise process and further manual testing. log β was used to select models with the highest predictive capability. NPMR models were less complex and had higher predictive capability than LR for all modeling approaches. Spatial coordinates were among the most powerful predictors and the modeling approach with physiographic and stand structural variables together was the most improved relative to the average frequency of occurrence. GIS probability maps produced with the application of the physiographic models showed good spatial congruence between high probability values and plots that contained CLUN. NPMR proved to be a reliable probability modeling and mapping tool that could be used as the analytical link between monitoring and quantifying the status and trends of vegetation resources.

9.
Aim  To highlight the benefit of using habitat use to improve the accuracy of predictive road fatality models.
Location  The Snowy Mountains Highway in southern New South Wales, Australia.
Methods  A binary logistic regression model was constructed using wombat fatality presences and randomly generated absences. Species-specific habitat variables were included as predictors in the model selection process as well as two spatially explicit measures of wombat habitat use. Generalized additive models (GAMs) were constructed for each possible combination of predictors in R. The final model was selected by comparing all model subsets for the eight predictors and employing the one standard error rule to select the best model set.
Results  The final predictive model had high discriminatory power and incorporated both measures of species habitat use, greatly exceeding the variation explained by a previously published model for the same species and road.
Main Conclusions  Our findings highlight the importance of incorporating variables that describe habitat use by fauna for predictive modelling of animal-vehicle crashes. Models that ignore landscape patterns are limited in their capacity to identify hotspots and inform managers of locations to engage in mitigation.
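The one standard error rule mentioned in the Methods can be sketched as follows: find the candidate with the lowest estimated error, then prefer the simplest candidate whose error lies within one standard error of that minimum. The candidate names, error values and standard errors below are hypothetical, and measuring simplicity by the number of predictors is an assumption.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    n_predictors: int
    cv_error: float   # estimated prediction error (lower is better)
    std_error: float  # standard error of that estimate


def one_standard_error_choice(candidates):
    """Pick the simplest model within one standard error of the best one."""
    if not candidates:
        raise ValueError("no candidate models supplied")
    best = min(candidates, key=lambda c: c.cv_error)
    threshold = best.cv_error + best.std_error
    eligible = [c for c in candidates if c.cv_error <= threshold]
    # Fewest predictors wins; ties are broken by the lower error.
    return min(eligible, key=lambda c: (c.n_predictors, c.cv_error))


# Hypothetical model subsets with their estimated errors.
candidates = [
    Candidate("2 predictors", 2, 0.30, 0.02),
    Candidate("4 predictors", 4, 0.22, 0.02),
    Candidate("6 predictors", 6, 0.20, 0.03),
    Candidate("8 predictors", 8, 0.21, 0.03),
]
chosen = one_standard_error_choice(candidates)
```

Here the 6-predictor model has the lowest error, but the 4-predictor model falls inside the one-standard-error band and is therefore preferred as the more parsimonious choice.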

10.
Complex, high-dimensional data sets pose significant analytical challenges in the post-genomic era. Such data sets are not exclusive to genetic analyses and are also pertinent to epidemiology. There has been considerable effort to develop hypothesis-free data mining and machine learning methodologies. However, current methodologies lack exhaustivity and general applicability. Here we use a novel non-parametric, non-euclidean data mining tool, HyperCube®, to explore exhaustively a complex epidemiological malaria data set by searching for over density of events in m-dimensional space. Hotspots of over density correspond to strings of variables, rules, that determine, in this case, the occurrence of Plasmodium falciparum clinical malaria episodes. The data set contained 46,837 outcome events from 1,653 individuals and 34 explanatory variables. The best predictive rule contained 1,689 events from 148 individuals and was defined as: individuals present during 1992–2003, aged 1–5 years old, having hemoglobin AA, and having had previous Plasmodium malariae malaria parasite infection ≤10 times. These individuals had 3.71 times more P. falciparum clinical malaria episodes than the general population. We validated the rule in two different cohorts. We compared and contrasted the HyperCube® rule with the rules using variables identified by both traditional statistical methods and non-parametric regression tree methods. In addition, we tried all possible sub-stratified quantitative variables. No other model with equal or greater representativity gave a higher Relative Risk. Although three of the four variables in the rule were intuitive, the effect of number of P. malariae episodes was not. HyperCube® efficiently sub-stratified quantitative variables to optimize the rule and was able to identify interactions among the variables, tasks not easy to perform using standard data mining methods. 
Search of local over density in m-dimensional space, explained by easily interpretable rules, is thus seemingly ideal for generating hypotheses for large datasets to unravel the complexity inherent in biological systems.

11.
This paper presents a synergistic parametric and non-parametric modeling study of short-term plasticity (STP) in the Schaffer collateral to hippocampal CA1 pyramidal neuron (SC) synapse. Parametric models in the form of sets of differential and algebraic equations have been proposed on the basis of the current understanding of biological mechanisms active within the system. Non-parametric Poisson–Volterra models are obtained herein from broadband experimental input–output data. The non-parametric model is shown to provide better prediction of the experimental output than a parametric model with a single set of facilitation/depression (FD) processes. The parametric model is then validated in terms of its input–output transformational properties using the non-parametric model since the latter constitutes a canonical and more complete representation of the synaptic nonlinear dynamics. Furthermore, discrepancies between the experimentally-derived non-parametric model and the equivalent non-parametric model of the parametric model suggest the presence of multiple FD processes in the SC synapses. Inclusion of an additional set of FD processes in the parametric model makes it replicate better the characteristics of the experimentally-derived non-parametric model. This improved parametric model in turn provides the requisite biological interpretability that the non-parametric model lacks.

12.
Different studies have demonstrated the importance of comorbidities to better understand the origin and evolution of medical complications. This study focuses on improvement of the predictive model interpretability based on simple logical features representing comorbidities. We use group lasso based feature interaction discovery followed by a post-processing step, where simple logic terms are added. In the final step, we reduce the feature set by applying lasso logistic regression to obtain a compact set of non-zero coefficients that represent a more comprehensible predictive model. The effectiveness of the proposed approach was demonstrated on a pediatric hospital discharge dataset that was used to build a readmission risk estimation model. The evaluation of the proposed method demonstrates a reduction of the initial set of features in a regression model by 72%, with a slight improvement in the Area Under the ROC Curve metric from 0.763 (95% CI: 0.755–0.771) to 0.769 (95% CI: 0.761–0.777). Additionally, our results show improvement in comprehensibility of the final predictive model using simple comorbidity based terms for logistic regression.
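The reason a lasso penalty yields a compact set of non-zero coefficients can be seen in the soft-thresholding operator, which is the proximal operator of the L1 norm and is used inside coordinate-wise lasso solvers. The sketch below applies it to a handful of hypothetical coefficient values; it is not the fitting procedure used in the study, and the penalty strength `lam` is an assumed value.

```python
def soft_threshold(z, lam):
    """Soft-thresholding: sign(z) * max(|z| - lam, 0).

    Values with magnitude at most lam are set exactly to zero;
    larger values are shrunk towards zero by lam.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalized coefficient values for five features.
unpenalized = [0.9, -0.05, 0.3, -0.6, 0.02]
lam = 0.25  # assumed penalty strength

shrunk = [soft_threshold(z, lam) for z in unpenalized]
# Indices of features that survive the penalty with a non-zero coefficient.
kept = [i for i, b in enumerate(shrunk) if b != 0.0]
```

Weak features are removed outright rather than merely shrunk, which is how an L1 penalty can cut a feature set substantially while leaving discrimination roughly unchanged.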

14.
The prediction of antibody-protein (antigen) interactions is very difficult due to the huge variability that characterizes the structure of the antibodies. The region of the antigen bound to the antibodies is called the epitope. Experimental data indicate that many antibodies react with a panel of distinct epitopes (positive reaction). Challenge 1 of DREAM5 aims at understanding whether there exist rules for predicting the reactivity of a peptide/epitope, i.e., its capability to bind to human antibodies. DREAM5 provided a training set of peptides with experimentally identified high and low reactivities to human antibodies. On the basis of this training set, the participants in the challenge were asked to develop a predictive model of reactivity. A test set was then provided to evaluate the performance of the model implemented so far. We developed a logistic regression model to predict the peptide reactivity, treating the challenge as a machine learning problem. The initial features were generated on the basis of the available knowledge and the information reported in the dataset. Our predictive model had the second best performance of the challenge. We also developed a method, based on a clustering approach, able to generate "in silico" a list of positive and negative new peptide sequences, as requested by the DREAM5 "bonus round" additional challenge. The paper describes the developed model and its results in terms of reactivity prediction, and highlights some open issues concerning the propensity of a peptide to react with human antibodies.

17.
Precise evaluation of prognosis is crucial for clinical treatment decisions in bladder cancer (BCa). Therefore, establishing an effective prognostic model for BCa has significant clinical implications. We performed WGCNA and DEG screening to initially identify the candidate genes. The candidate genes were used to construct a LASSO Cox regression model. The effectiveness and accuracy of the prognostic model were tested by internal/external validation, pan‐cancer validation and time‐dependent ROC. Additionally, a nomogram based on the parameters selected from univariate and multivariate Cox regression analysis was constructed. Eight genes were eventually screened out as progression‐related differentially expressed candidates in BCa. LASSO Cox regression analysis identified 3 genes to build the outcome model in E‐MTAB‐4321, and the outcome model performed well in predicting progression‐free survival of BCa patients in the discovery and test sets. Subsequently, the model also showed good predictive value for the OS and DFS of BCa patients in three other datasets. Time‐dependent ROC indicated an ideal predictive accuracy of the outcome model. Meanwhile, the nomogram showed good performance and clinical utility. In addition, the prognostic model also exhibited good performance in pan‐cancer patients. Our outcome model was the first prognosis model for human bladder cancer progression prediction via integrative bioinformatics analysis, which may aid in clinical decision‐making.

18.
Alternative splicing (AS) is critically associated with tumorigenesis and patient prognosis. Here, we systematically analyzed survival-associated AS signatures in oral squamous cell carcinoma (OSCC) and evaluated their prognostic predictive values. Survival-related AS events were identified by univariate and multivariate Cox regression analyses using OSCC data from the TCGA head neck squamous cell carcinoma data set. The Percent Spliced In value, calculated by SpliceSeq and ranging from 0 to 1, was used to quantify seven types of AS events. A predictive model based on AS events was constructed by least absolute shrinkage and selection operator Cox regression analysis and further validated using a training-testing cohort design. Patient survival was estimated using the Kaplan–Meier method and compared with the log-rank test. The area under the receiver operating characteristic curve was used to evaluate the predictive abilities of these predictive models. Furthermore, gene–gene interaction networks and the splicing factors (SFs)-AS regulatory network were generated by Cytoscape. A total of 825 survival-related AS events within 719 genes were identified in OSCC samples. The integrative predictive model was better at predicting patient outcomes than the models built with individual AS events. The predictive model based on three AS-related genes also effectively predicted patients’ survival. Moreover, seven survival-related SFs were detected in OSCC including RBM4, HNRNPD, and HNRNPC, which have been linked to tumorigenesis. The SF-AS network revealed a significant correlation between survival-related AS genes and these SFs. Our findings revealed a systemic portrait of survival-associated AS events and the splicing network in OSCC, suggesting that AS events might serve as novel prognostic biomarkers and therapeutic targets for OSCC.
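The Kaplan–Meier method named in this abstract can be sketched in a few lines: at each distinct event time, the running survival estimate is multiplied by one minus the fraction of at-risk patients who had the event. The follow-up times and event indicators below are hypothetical, and the sketch omits confidence intervals and the log-rank comparison.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient.
    events: 1 if the event was observed at that time, 0 if censored.
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (arbitrary time units).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Censored patients contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is why the estimate changes only at observed event times.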

19.
Osteologists commonly assess the sex of skeletal remains found in forensic and archaeological contexts based on ordinal scores of subjectively assessed sexually dimorphic traits. Using known‐sex samples, logistic regression (LR) discriminant functions have been recently developed, which allow sex probabilities to be determined. A limitation of LR is that it emphasizes main effects and not interactions. Chi‐square automatic interaction detection (CHAID) is an alternative classification strategy that emphasizes the information in variable interactions and uses decision trees to maximize the probability of correct sex determinations. We used CHAID to analyze the predictive value of the 31 possible combinations of five sexually dimorphic skull traits that Walker used previously to develop logistic regression sex determination equations. The samples consisted of 304 individuals of known sex of English, African American, and European American origin. Based on practical considerations, selection criteria for the best sex predictive trait combinations (SPTCs) were set at accuracies for both sexes of 75% or greater and sex biases lower than 5%. Although several of the trees meeting these criteria were produced for the English and European American samples, none met them for the African American sample. In the series of out‐of‐sample tests we performed, the trees from the English and combined sample of all groups predicted best. Am J Phys Anthropol, 2009. © 2009 Wiley‐Liss, Inc.
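The figure of 31 possible combinations follows from counting the non-empty subsets of five traits (2^5 − 1 = 31). The sketch below enumerates them and applies the stated selection criteria. The trait labels are placeholders, and treating sex bias as the absolute difference between the two per-sex accuracies is an assumption, since the abstract does not define it.

```python
from itertools import combinations

# Placeholder labels; the abstract does not name the five skull traits.
traits = ["trait_1", "trait_2", "trait_3", "trait_4", "trait_5"]

# Every non-empty subset of the five traits.
trait_combinations = [
    combo
    for size in range(1, len(traits) + 1)
    for combo in combinations(traits, size)
]


def meets_selection_criteria(female_accuracy, male_accuracy):
    """Check the criteria from the abstract, with accuracies in percent.

    Assumes sex bias means the absolute difference between the two
    per-sex accuracies.
    """
    both_accurate = female_accuracy >= 75.0 and male_accuracy >= 75.0
    sex_bias = abs(female_accuracy - male_accuracy)
    return both_accurate and sex_bias < 5.0


n_combinations = len(trait_combinations)
```

For example, under these assumptions a tree that classifies 82% of females and 79% of males correctly would pass, while one with accuracies of 90% and 76% would fail on sex bias.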
