Similar articles
Found 20 similar articles (search time: 31 ms)
1.
Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies have tried to address this sample-size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimal for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset of the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a sparse selection index (SSI) that integrates selection index methodology with sparsity-inducing techniques commonly used in high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); genomic best linear unbiased prediction (G-BLUP), the prediction method most commonly used in plant and animal breeding, emerges as the special case λ = 0. We present the methodology and demonstrate, using two wheat data sets with phenotypes collected in 10 different environments, that the SSI can achieve significant (between 5 and 10%) gains in prediction accuracy relative to G-BLUP.
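The G-BLUP special case and the role of λ can be sketched in a few lines of numpy. This is a toy illustration, not the authors' implementation: the sample sizes are made up, and simple soft-thresholding of the G-BLUP weights stands in for the penalized selection-index problem that the SSI actually solves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (sizes are illustrative): n training lines, p SNP markers.
n, p, n_test = 50, 200, 5
X = rng.choice([0.0, 1.0, 2.0], size=(n + n_test, p))   # genotype codes
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)      # centered and scaled
G = X @ X.T / p                                         # genomic relationship matrix

y = X[:n] @ rng.normal(scale=0.1, size=p) + rng.normal(scale=0.5, size=n)

# G-BLUP: each prediction-set individual gets a dense weight vector on the
# training phenotypes, the rows of G_test,train (G_train + k I)^(-1), where k
# plays the role of the noise-to-signal variance ratio.
k = 1.0
B = np.linalg.solve(G[:n, :n] + k * np.eye(n), G[:n, n:]).T  # (n_test, n) weights
u_gblup = B @ y

# Sparse-index stand-in: soft-threshold each weight vector so that every
# individual keeps only its largest "support points". The real SSI solves a
# penalized selection-index problem; thresholding only illustrates the idea
# that lam = 0 recovers G-BLUP and larger lam gives sparser indices.
def sparse_index(B, lam):
    return np.sign(B) * np.maximum(np.abs(B) - lam, 0.0)

u_ssi = sparse_index(B, 0.005) @ y
```

With lam = 0 the index is exactly the dense G-BLUP weight matrix; increasing lam zeroes out the small weights, so each prediction draws on fewer training records.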

2.
3.
A recurrent neural network modeling approach for software reliability prediction with respect to cumulative failure time is proposed. The proposed network structure is capable of learning and recognizing the inherent temporal properties of a cumulative failure time sequence. Further, Bayesian regularization is applied to the network training scheme by adding a penalty term on the sum of squared connection weights, which improves generalization and lowers susceptibility to overfitting. The performance of the proposed approach has been tested on four real-time control and flight dynamic application data sets. Numerical results show that the approach is robust across different software projects and performs better, with respect to both goodness of fit and next-step predictability, than existing neural network models for failure time prediction.
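The regularization idea, a data-misfit term plus a penalty on the network weights, can be sketched for a minimal recurrent cell. Everything here (the one-unit Elman-style cell, the toy failure times, the α and β values) is illustrative, not the architecture or data used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy cumulative failure times (monotone increasing), scaled to [0, 1].
t = np.cumsum(rng.exponential(scale=5.0, size=30))
t = t / t.max()

# Minimal one-unit recurrent cell: h_k = tanh(w_in*x_k + w_rec*h_{k-1} + b).
def rnn_predict(params, x):
    w_in, w_rec, w_out, b = params
    h, out = 0.0, []
    for xk in x:
        h = np.tanh(w_in * xk + w_rec * h + b)
        out.append(w_out * h)
    return np.array(out)

# Bayesian-regularization-style objective: weighted data misfit plus a
# weight penalty, F = beta * sum(errors^2) + alpha * sum(weights^2).
def objective(params, x, y, alpha=0.01, beta=1.0):
    e = rnn_predict(params, x) - y
    return beta * np.sum(e ** 2) + alpha * np.sum(np.square(params))

params = np.array([0.5, 0.3, 1.0, 0.0])
x, y = t[:-1], t[1:]                       # one-step-ahead prediction targets
f_plain = objective(params, x, y, alpha=0.0)
f_reg = objective(params, x, y, alpha=0.01)
```

Setting alpha > 0 adds exactly the weight penalty to the training objective; in Bayesian regularization alpha and beta are re-estimated from the data rather than fixed as here.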

4.
In this article we aim to improve the performance of whole-brain functional imaging at very high temporal resolution (100 ms or less). This is achieved by utilizing a nonlinear regularized parallel image reconstruction scheme, in which the penalty term of the cost function is set to the L1-norm measured in some transform domain. This type of image reconstruction has gained much attention recently due to its application in compressed sensing and has proven to yield superior spatial resolution and image quality over, e.g., Tikhonov-regularized image reconstruction. We demonstrate that nonlinear regularization makes it possible to localize brain activation more accurately from highly undersampled k-space data, at the expense of an increase in computation time.
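A minimal sketch of L1-regularized reconstruction, using the iterative soft-thresholding algorithm (ISTA) on a toy linear system. The random matrix below is a stand-in for the undersampled parallel-MRI encoding operator, and ISTA is one generic solver for this penalty, not necessarily the algorithm used in the article.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy undersampled system A x = b with a sparse x (m < n measurements).
n, m = 100, 40
x_true = np.zeros(n)
x_true[rng.choice(n, size=5, replace=False)] = rng.normal(size=5)
A = rng.normal(size=(m, n)) / np.sqrt(m)
b = A @ x_true

def soft(v, t):
    """Soft-thresholding: the proximal operator of the L1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# ISTA: gradient step on ||Ax - b||^2 followed by soft-thresholding.
lam = 0.01
step = 1.0 / np.linalg.norm(A, 2) ** 2
x = np.zeros(n)
for _ in range(1000):
    x = soft(x - step * A.T @ (A @ x - b), step * lam)
```

Despite having fewer measurements than unknowns, the L1 penalty drives most coefficients to exactly zero and concentrates the reconstruction on the true support, which is the property the nonlinear regularization exploits.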

5.
Understanding the behavior of skeletal muscle is critical to implementing computational methods for studying how the body responds to compressive loading. This work presents a novel approach to studying the fully nonlinear response of skeletal muscle in compression. Porcine muscle was compressed in both the longitudinal and transverse directions in five stress-relaxation steps. Each step consisted of 5% engineering strain applied over 1 s, followed by a relaxation period until equilibrium was reached, defined as an observed change of 1 g/min. The resulting data were analyzed to identify the peak and equilibrium stresses as well as the relaxation time for all samples. Additionally, a fully nonlinear strain energy density-based Prony series constitutive model was implemented and validated with independent constant-rate compressive data. A nonlinear least squares optimization approach using the Levenberg-Marquardt algorithm was implemented to fit model behavior to experimental data. The results suggested that the time-dependent material response plays a key role in the anisotropy of skeletal muscle, as increasing strain showed differences in peak stress and relaxation time (p < 0.05), whereas differences in equilibrium stress disappeared (p > 0.05). The optimization procedure produced a single set of hyper-viscoelastic parameters characterizing compressive muscle behavior under stress-relaxation conditions. The constitutive model is the first orthotropic, fully nonlinear hyper-viscoelastic model of skeletal muscle in compression that maintains agreement with constitutive physical boundaries. The model provided an excellent fit to the experimental data and agreed well with the independent validation in the transverse direction.
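The Prony-series fitting step can be sketched with scipy's Levenberg-Marquardt driver. The two-term series, parameter values, and noise level below are synthetic stand-ins for the porcine stress-relaxation data, and the real model is a full strain-energy-based formulation rather than this scalar curve.

```python
import numpy as np
from scipy.optimize import least_squares

# Synthetic stress-relaxation curve (arbitrary units): a two-term Prony series
# sigma(t) = s_inf + s1*exp(-t/tau1) + s2*exp(-t/tau2). Values are illustrative.
t = np.linspace(0.0, 60.0, 200)
true = np.array([10.0, 8.0, 0.8, 5.0, 12.0])   # s_inf, s1, tau1, s2, tau2

def prony(p, t):
    s_inf, s1, tau1, s2, tau2 = p
    return s_inf + s1 * np.exp(-t / tau1) + s2 * np.exp(-t / tau2)

rng = np.random.default_rng(3)
sigma = prony(true, t) + rng.normal(scale=0.05, size=t.size)

# Levenberg-Marquardt nonlinear least squares on the residuals, the role the
# algorithm plays in the fitting procedure described above.
fit = least_squares(lambda p: prony(p, t) - sigma,
                    x0=[8.0, 6.0, 1.0, 6.0, 10.0], method="lm")
```

Note that the two exponential terms are exchangeable, so the fitted (s_i, tau_i) pairs may come back in swapped order; the equilibrium term s_inf is identified unambiguously.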

6.
In this paper, we review recent advances in blind source separation (BSS) and independent component analysis (ICA) for nonlinear mixing models. After a general introduction to BSS and ICA, we discuss uniqueness and separability issues in more detail, presenting some new results. A fundamental difficulty in the nonlinear BSS problem, and even more so in the nonlinear ICA problem, is that without extra constraints the solutions are non-unique; such constraints are often implemented through a suitable regularization. In this paper, we explore two possible approaches. The first is based on structural constraints. In particular, post-nonlinear mixtures, in which a nonlinearity is applied to linear mixtures, are an important special case; for such mixtures the ambiguities are essentially the same as for the linear ICA or BSS problems. The second approach uses Bayesian inference methods to estimate the best statistical parameters under almost unconstrained models to which priors can easily be added. In the latter part of the paper, various separation techniques proposed for post-nonlinear mixtures and general nonlinear mixtures are reviewed.

7.
This study focuses on predicting breathing patterns, which is crucial for dealing with system latency in the treatment of moving lung tumors. Predicting respiratory motion in real time is challenging due to the inherently chaotic nature of breathing patterns, i.e., their sensitive dependence on initial conditions. In this work, nonlinear prediction methods are used to predict the short-term evolution of the respiratory system for 62 patients whose breathing time series were acquired using a respiratory position management (RPM) system. Single-step and N-point multi-step predictions are performed for sampling rates of 5 Hz and 10 Hz. We compare the prediction accuracy of the employed nonlinear prediction methods with that of adaptive Infinite Impulse Response (IIR) prediction filters. A Local Average Model (LAM) and local linear models (LLMs), combined with a set of linear regularization techniques for solving ill-posed regression problems, are implemented. For all sampling frequencies, both single-step and N-point multi-step prediction results obtained using the LAM and LLMs with regularization outperform the IIR prediction filters for the selected sample patients. Moreover, since the simple LAM performs as well as the more complicated LLMs in our patient sample, its use for nonlinear prediction is recommended.
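The Local Average Model is easy to state concretely: embed the series in delay coordinates, find similar past states, and average what followed them. A sketch under assumed parameters (embedding dimension 3, 10 neighbors, a synthetic 5 Hz trace standing in for an RPM breathing signal):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(4)

# Toy quasi-periodic trace at 5 Hz (0.25 Hz breathing rate, mild noise).
fs, n = 5.0, 600
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 0.25 * t) + 0.05 * rng.normal(size=n)

def lam_predict(x, dim=3, k=10):
    """Local Average Model one-step prediction: embed the series in delay
    vectors, find the k past vectors nearest the current one, and average
    their observed successors."""
    emb = sliding_window_view(x, dim)        # emb[i] = x[i : i + dim]
    query, past = emb[-1], emb[:-1]          # past vectors have known successors
    nn = np.argsort(np.linalg.norm(past - query, axis=1))[:k]
    return x[nn + dim].mean()

pred = lam_predict(x)
```

N-point multi-step prediction follows by feeding each prediction back in as the newest sample and repeating.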

8.
Secondary structures of proteins have been predicted from their Fourier transform infrared spectra using neural networks. To improve the generalization ability of the neural networks, the training data set was artificially enlarged by linear interpolation. The leave-one-out approach was used to demonstrate the applicability of the method. Bayesian regularization was used to train the neural networks, and the predictions were further improved by the maximum-likelihood estimation method. The networks were tested, and standard errors of prediction (SEP) of 4.19% for alpha helix, 3.49% for beta sheet, and 3.15% for turns were achieved. The results indicate a significant decrease in the SEP for each type of structure parameter compared with previous work.

9.
We introduce the dendroTools R package for studying statistical relationships between tree-ring parameters and daily environmental data. The core function of the package is daily_response(), which slides a moving window through daily environmental data and calculates statistical metrics against one or more tree-ring proxies. Available metrics are the correlation coefficient, the coefficient of determination, and the adjusted coefficient of determination. In addition to linear regression, it is possible to use a nonlinear artificial neural network with the Bayesian regularization training algorithm (brnn). dendroTools provides the opportunity to use daily climate data and robust nonlinear functions for the analysis of climate-growth relationships. Models should thus be better adapted to the real (continuous) growth of trees and should gain in predictive capability. The dendroTools R package is freely available in the CRAN repository. The functionality of the package is demonstrated on two examples, one using a mean vessel area (MVA) chronology and one a traditional tree-ring width (TRW) chronology.
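The moving-window logic of daily_response() can be sketched in Python (the package itself is R); the simulated climate signal, window length, and correlation-only metric below are illustrative, not the package's actual interface.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy data: 40 years of daily temperature and a tree-ring proxy that responds
# to mean temperature over days 150-200 (all values simulated).
years, days = 40, 365
temp = rng.normal(size=(years, days))
trw = temp[:, 150:200].mean(axis=1) + 0.1 * rng.normal(size=years)

def daily_response(proxy, daily, window):
    """Slide a window over the daily data, correlate each window's seasonal
    mean with the proxy, and return (best correlation, best start day)."""
    best_r, best_day = 0.0, 0
    for start in range(days - window + 1):
        season = daily[:, start:start + window].mean(axis=1)
        r = np.corrcoef(proxy, season)[0, 1]
        if abs(r) > abs(best_r):
            best_r, best_day = r, start
    return best_r, best_day

r, day = daily_response(trw, temp, window=50)
```

The sliding window recovers the season in which growth actually responds to climate, which is the information a fixed monthly aggregation can blur.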

10.
Model complexity in ecological niche modelling has recently been considered an important issue that may affect model performance. New methodological developments have implemented the Akaike information criterion (AIC) to capture model complexity in the Maxent algorithm. AIC is calculated from the number of parameters and the likelihoods of the continuous raw outputs. The ENMeval R package allows users to perform species-specific tuning of Maxent settings, running models with different combinations of regularization multiplier and feature classes; all these models are then compared using AIC corrected for small sample size (AICc). This approach focuses on finding the "best" model parametrization and is thought to optimize model complexity and, therefore, predictive ability. We found that most of the niche modelling studies we examined (68%) treat AIC as a criterion of predictive accuracy in geographical distribution, in other words, as a criterion for choosing the models with the highest capacity to discriminate between presences and absences. However, the link between AIC and geographical predictive accuracy has not been tested so far. Here, we evaluated this relationship using a set of simulated (virtual) species. We created nine virtual species with different ecological and geographical traits (e.g., niche position, niche breadth, range size) and generated different sets of true presence and absence data across geography. We built a set of models using the Maxent algorithm with different regularization values and feature schemes and calculated AIC values for each model. For each model, we obtained binary predictions using different threshold criteria and validated them using independent presence and absence data. We correlated AIC values against standard validation metrics (e.g., Kappa, TSS) and the number of pixels correctly predicted as presences and absences.
We did not find a correlation between AIC values and predictive accuracy from validation metrics. In general, the models with the lowest AIC values tended to generate geographical predictions with high commission and omission errors. The results were consistent across all simulated species. Finally, we suggest that AIC should not be used if users are interested in prediction more than explanation in ecological niche modelling.
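The two quantities being contrasted, AIC(c) and geographic validation metrics such as TSS, are cheap to compute, which makes the distinction concrete. A sketch with the AICc small-sample correction and TSS from a binary confusion matrix (formulas are standard; the numbers below are not from the study):

```python
import numpy as np

# AICc as used in ENMeval-style model tuning: AIC corrected for small samples,
# computed from the model log-likelihood, parameter count k, and sample size n.
def aicc(log_lik, k, n):
    aic = 2 * k - 2 * log_lik
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# TSS (True Skill Statistic), a standard geographic validation metric:
# sensitivity + specificity - 1, from binary predictions and observations.
def tss(pred, obs):
    pred, obs = np.asarray(pred, bool), np.asarray(obs, bool)
    sens = (pred & obs).sum() / obs.sum()
    spec = (~pred & ~obs).sum() / (~obs).sum()
    return sens + spec - 1.0
```

The two answer different questions: AICc ranks candidate parametrizations by fit-complexity trade-off, while TSS scores binary geographic predictions against independent presences and absences, and nothing forces the two rankings to agree.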

11.

Background

Artificial neural networks (ANN) mimic the function of the human brain and are capable of performing massively parallel computations for data processing and knowledge representation. ANN can capture nonlinear relationships between predictors and responses and can adaptively learn complex functional forms, in particular for situations where conventional regression models are ineffective. In a previous study, an ANN with Bayesian regularization outperformed a benchmark linear model when predicting milk yield in dairy cattle and grain yield in wheat. Although breeding values rely on the assumption of additive inheritance, the predictive capabilities of ANN are of interest from the perspective of their potential to increase the accuracy of prediction of molecular breeding values used for genomic selection. This motivated the present study, in which the aim was to investigate the accuracy of ANN when predicting the expected progeny difference (EPD) of marbling score in Angus cattle. Various ANN architectures were explored, involving two training algorithms, two types of activation functions, and 1 to 4 neurons in the hidden layer. For comparison, BayesCπ models were used to select a subset of optimal markers (referred to as feature selection) under the assumption of additive inheritance, and the marker effects were then estimated using BayesCπ with π set equal to zero. This procedure, referred to as BayesCpC, was implemented on a high-throughput computing cluster.

Results

The ANN with Bayesian regularization (BRANN) predicted EPD as well as BayesCpC, based on prediction accuracy and sum of squared errors. With the 3K-SNP panel, for example, prediction accuracy was 0.776 using BayesCpC and ranged from 0.776 to 0.807 using BRANN. With the selected 700-SNP panel, prediction accuracy was 0.863 for BayesCpC and ranged from 0.842 to 0.858 for BRANN. However, prediction accuracy for the ANN with scaled conjugate gradient back-propagation was lower, ranging from 0.653 to 0.689 with the 3K-SNP panel and from 0.743 to 0.793 with the selected 700-SNP panel.

Conclusions

ANN with Bayesian regularization performed as well as linear Bayesian regression models in predicting additive genetic values, supporting the idea that ANN are useful as universal approximators of functions of interest in breeding contexts.

12.
Contreras M, Ryan LM. Biometrics 2000, 56(4):1268-1271
In this article, we present an estimation approach for solving nonlinear constrained generalized estimating equations that can be implemented using object-oriented software for nonlinear programming, such as nlminb in Splus or fmincon and lsqnonlin in Matlab. We show how standard estimating equation theory includes this method as a special case so that our estimates, when unconstrained, will remain consistent and asymptotically normal. To illustrate this method, we fit a nonlinear dose-response model with nonnegative mixed bound constraints to clustered binary data from a developmental toxicity study. Satisfactory confidence intervals are found using a nonparametric bootstrap method when a common correlation coefficient is assumed for all the dose groups and for some of the dose-specific groups.
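The general recipe, a nonlinear model fitted by bound-constrained least squares in an off-the-shelf optimizer, can be sketched with scipy in place of fmincon/lsqnonlin. The logistic-type dose-response curve, data, and bounds below are illustrative, not the model or data from the study.

```python
import numpy as np
from scipy.optimize import least_squares

# Toy dose-response data: response probability rising with dose under a
# logistic-type curve p(d) = b0 + (1 - b0) / (1 + exp(-(d - b2) / b1)).
dose = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
p_obs = np.array([0.05, 0.08, 0.20, 0.55, 0.90])

def model(b, d):
    b0, b1, b2 = b
    return b0 + (1 - b0) / (1 + np.exp(-(d - b2) / b1))

# Bound-constrained nonlinear least squares, the role played by
# lsqnonlin/fmincon in Matlab or nlminb in Splus: keep the baseline b0 in
# [0, 1] and the slope/location parameters nonnegative.
fit = least_squares(lambda b: model(b, dose) - p_obs,
                    x0=[0.1, 1.0, 1.5],
                    bounds=([0.0, 1e-6, 0.0], [1.0, 10.0, 10.0]))
```

The bounds keep every iterate in the feasible region, mirroring how the article's constrained estimating-equation problem is handed to a general nonlinear programming routine.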

13.
Comparative proteomic studies often use the statistical tests included in the software for analyzing digitized images of two-dimensional electrophoresis gels. As these programs include only limited capabilities for statistical analysis, many studies do not further describe their statistical approach. To find potential differences produced by different data processing, we compared the results of (1) Student's t-test in a spreadsheet program, (2) the intrinsic algorithms implemented in the Phoretix 2D gel analysis software, and (3) the SAM algorithm originally developed for microarray analysis. We applied the algorithms to proteome data of undifferentiated versus in vitro differentiated neural stem cells. We found 367 differentially expressed spots using Student's t-test, 203 spots using the algorithms in Phoretix 2D, and 119 spots using SAM, with an overlap of 42 spots detected by all three algorithms. Applying different statistical approaches to the same dataset thus resulted in divergent sets of protein spots labeled as statistically "significant". Currently there is no agreement on the statistical processing of 2DE datasets, but the statistical tests applied in 2DE studies should be documented, and tools for the statistical analysis of proteome data should be implemented and documented in the existing 2DE software.
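The effect described, different pipelines labeling different spot sets as significant, is easy to reproduce on simulated data: even within the t-test route alone, the multiplicity correction changes the list. All numbers below are simulated, not from the study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# Toy 2-DE spot volumes: two groups with 6 gels each and 500 spots, of which
# the first 50 truly differ (simulated, illustrative).
spots, per_group = 500, 6
a = rng.normal(size=(per_group, spots))
b = rng.normal(size=(per_group, spots))
b[:, :50] += 2.0                        # true differences in the first 50 spots

# Spot-wise Student's t-test, approach (1) in the comparison above.
t, p = stats.ttest_ind(a, b, axis=0)
sig_unadjusted = np.flatnonzero(p < 0.05)

# A Bonferroni-style correction shrinks the list, illustrating how the choice
# of statistical pipeline changes which spots come out "significant".
sig_bonferroni = np.flatnonzero(p < 0.05 / spots)
```

Documenting exactly which test and which threshold produced a spot list, as the abstract recommends, is what makes such results comparable across studies.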

14.
Experimental designs involving repeated measurements on experimental units are widely used in physiological research. Often, relatively many consecutive observations on each experimental unit are involved, and the data may be quite nonlinear. Yet one of the most commonly used statistical methods for such data sets in physiological research is the repeated-measurements ANOVA model. The problem is that this model is not well suited to data sets with many consecutive measurements: it does not deal with nonlinear features of the data, and its interpretability may be low. The use of inappropriate statistical models increases the likelihood of drawing wrong conclusions. The aim of this article is to illustrate, for a reasonably typical repeated-measurements data set, how fundamental assumptions of the repeated-measurements ANOVA model are inappropriate, and how researchers may benefit from adopting different modeling approaches using a variety of different kinds of models. We emphasize intuitive ideas rather than mathematical rigor. We illustrate how such models represent alternatives that 1) can have much higher interpretability, 2) are more likely to meet underlying assumptions, 3) provide better fitted models, and 4) are readily implemented in widely distributed software products.

15.
This study assesses the ability of a novel family of machine learning algorithms to identify changes in relative protein expression levels, measured using 2-D DIGE data, that support accurate class prediction. The analysis used a training set of 36 total cellular lysates comprising six normal and three cancer biological replicates (the remaining samples are technical replicates) and a validation set of four normal and two cancer samples. Protein samples were separated by 2-D DIGE, and expression was quantified using DeCyder-2D Differential Analysis Software. The relative expression reversal (RER) classifier correctly classified 9/9 training biological samples (p < 0.022), as estimated using a modified version of leave-one-out cross-validation, and 6/6 validation samples. The classification rule involved comparing expression levels for a single pair of protein spots, tropomyosin isoforms and alpha-enolase, both of which have previously been associated with cancer as potential biomarkers. The data were also analyzed using algorithms similar to those found in the extended data analysis package of the DeCyder software. We propose that, by accounting for sources of within- and between-gel variation, RER classifiers applied to 2-D DIGE data provide a useful approach for identifying biomarkers that discriminate among protein samples of interest.
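The RER rule itself is strikingly simple: classification depends only on which of two spots is expressed more, not on absolute levels. A toy sketch with simulated expression values (in the study the decisive pair involved tropomyosin and alpha-enolase spots; here the spots and the reversal are fabricated for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy expression matrix: rows are samples, columns are protein spots.
n_normal, n_cancer, n_spots = 6, 3, 20
normal = rng.normal(loc=1.0, size=(n_normal, n_spots))
cancer = rng.normal(loc=1.0, size=(n_cancer, n_spots))
normal[:, 3] = normal[:, 7] + 1.0      # in normals, spot 3 > spot 7
cancer[:, 3] = cancer[:, 7] - 1.0      # the ordering reverses in cancer

X = np.vstack([normal, cancer])
y = np.array([0] * n_normal + [1] * n_cancer)   # 0 = normal, 1 = cancer

def rer_rule(X, i, j):
    """Relative expression reversal: predict class from the ordering of a
    single spot pair (i, j); no absolute expression threshold is needed."""
    return (X[:, i] < X[:, j]).astype(int)

pred = rer_rule(X, 3, 7)
accuracy = (pred == y).mean()
```

Because the rule compares two spots within the same gel, it is insensitive to gel-to-gel scaling, which is exactly the within- and between-gel variation the abstract highlights.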

16.
17.
We describe a procedure for model averaging of relaxed molecular clock models in Bayesian phylogenetics. Our approach allows us to model the distribution of rates of substitution across branches averaged over a set of models, rather than conditioned on a single model. We implement this procedure and test it on simulated data to show that our method can accurately recover the true underlying distribution of rates. We applied the method to a set of alignments taken from a data set of 12 mammalian species and uncovered evidence that lognormally distributed rates describe this data set better than exponentially distributed rates do. Additionally, our implementation of model averaging permits accurate calculation of the Bayes factor(s) between two or more relaxed molecular clock models. Finally, we introduce a new computational approach for sampling rates of substitution across branches that improves the convergence of our Markov chain Monte Carlo algorithms in this context. Our methods are implemented in the BEAST 1.6 software package, available at http://beast-mcmc.googlecode.com.

18.
The widely used "Maxent" software for modeling species distributions from presence-only data (Phillips et al., Ecological Modelling, 190, 2006, 231) tends to produce models with high predictive performance but low ecological interpretability, and the implications of Maxent's statistical approach to variable transformation, model fitting, and model selection remain underappreciated. In particular, Maxent's approach to model selection through lasso regularization has been shown to give less parsimonious distribution models than subset selection, that is, models which are more complex but not necessarily predictively better. In this paper, we introduce the MIAmaxent R package, which provides a statistical approach to modeling species distributions similar to Maxent's, but with subset selection instead of lasso regularization. The simpler models typically produced by subset selection are ecologically more interpretable, and making distribution models more grounded in ecological theory is a fundamental motivation for using MIAmaxent. To that end, the package executes variable transformation based on expected occurrence-environment relationships and contains tools for exploring data and interrogating models in light of knowledge of the modeled system. Additionally, MIAmaxent implements two different kinds of model fitting: maximum entropy fitting for presence-only data and logistic regression (GLM) for presence-absence data. Unlike Maxent, MIAmaxent decouples variable transformation, model fitting, and model selection, which facilitates methodological comparisons and gives the modeler greater flexibility in choosing a statistical approach to a given distribution modeling problem.
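Subset selection, the alternative to lasso regularization contrasted above, can be sketched as greedy forward selection by residual sum of squares. The data and variable count below are made up, and MIAmaxent's own selection procedure differs in detail; this only illustrates why subset selection yields models with few, explicitly chosen terms.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy regression problem: 8 candidate environmental variables, only two of
# which actually matter (variables and sizes are illustrative).
n, p = 200, 8
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 1] - 1.5 * X[:, 4] + rng.normal(scale=0.5, size=n)

def forward_select(X, y, max_terms):
    """Greedy subset selection: at each step, add the variable that most
    reduces the residual sum of squares of an OLS fit with intercept."""
    chosen = []
    for _ in range(max_terms):
        best, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            A = np.column_stack([np.ones(len(y)), X[:, chosen + [j]]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ beta) ** 2)
            if rss < best_rss:
                best, best_rss = j, rss
        chosen.append(best)
    return chosen

subset = forward_select(X, y, max_terms=2)
```

Unlike a lasso fit, where every variable can carry a shrunken nonzero coefficient, the result here is an explicit short list of retained variables, which is what makes the model easy to interrogate ecologically.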

19.
Loh PR, Tucker G, Berger B. PLoS ONE 2011, 6(12):e29095
A major goal of large-scale genomics projects is to enable the use of data from high-throughput experimental methods to predict complex phenotypes such as disease susceptibility. The DREAM5 Systems Genetics B Challenge solicited algorithms to predict soybean plant resistance to the pathogen Phytophthora sojae from training sets including phenotype, genotype, and gene expression data. The challenge test set was divided into three subcategories: one requiring prediction based on genotype data only, another on gene expression data only, and the third on both. Here we present our approach, based primarily on regularized regression, which received the best-performer award for subchallenge B2 (gene expression only). We found that despite the availability of 941 genotype markers and 28,395 gene expression features, the optimal models determined by cross-validation typically used fewer than ten predictors, underscoring the importance of strong regularization in noisy datasets with far more features than samples. We also present a substantial analysis of the training and test setup of the challenge, identifying high variance in performance on the gold-standard test sets.
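The regime described, far more features than samples with strong regularization selecting under ten predictors, can be reproduced with a bare-bones lasso via coordinate descent. The sizes and penalty value below are illustrative, not the DREAM5 data or the authors' exact model.

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy "more features than samples" setting: 60 samples, 500 features, only
# the first 5 of which carry signal (all simulated).
n, p = 60, 500
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3.0, -2.5, 2.0, -1.5, 1.0]
y = X @ beta_true + rng.normal(size=n)

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso by cyclic coordinate descent with soft-thresholding, for the
    objective 0.5*||y - X beta||^2 + lam*||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()                      # running residual y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * beta[j]    # remove feature j's contribution
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * beta[j]    # add back the updated contribution
    return beta

beta = lasso_cd(X, y, lam=30.0)
n_selected = int(np.count_nonzero(beta))
```

With a sufficiently large penalty, the fit retains only a handful of the 500 candidate features, the same qualitative behavior the cross-validation experiments in the abstract report.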

20.
