Similar literature
 Found 20 similar articles; search time: 31 ms
1.
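The cross-validation statistic CV used in model selection is improved to obtain RCV, a new criterion for checking whether a model is adequate. RCV contains the information in CV and also accounts for goodness of fit, the number of parameters to be estimated in the model, and the sample size. Compared with AIC, BIC, and CV, it has better stability and discriminating power.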

2.
Currently available methods for model selection used in phylogenetic analysis are based on an initial fixed-tree topology. Once a model is picked based on this topology, a rigorous search of the tree space is run under that model to find the maximum-likelihood estimate of the tree (topology and branch lengths) and the maximum-likelihood estimates of the model parameters. In this paper, we propose two extensions to the decision-theoretic (DT) approach that relax the fixed-topology restriction. We also relax the fixed-topology restriction for the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) methods. We compare the performance of the different methods (the relaxed, restricted, and the likelihood-ratio test [LRT]) using simulated data. This comparison is done by evaluating the relative complexity of the models resulting from each method and by comparing the performance of the chosen models in estimating the true tree. We also compare the methods relative to one another by measuring the closeness of the estimated trees corresponding to the different chosen models under these methods. We show that varying the topology does not have a major impact on model choice. We also show that the outcome of the two proposed extensions is identical and is comparable to that of the BIC, Extended-BIC, and DT. Hence, using the simpler methods in choosing a model for analyzing the data is more computationally feasible, with results comparable to the more computationally intensive methods. Another outcome of this study is that earlier conclusions about the DT approach are reinforced. That is, LRT, Extended-AIC, and AIC result in more complicated models that do not contribute to the performance of the phylogenetic inference, yet cause a significant increase in the time required for data analysis.

3.
Ishiguro, Sakamoto, and Kitagawa (1997, Annals of the Institute of Statistical Mathematics 49, 411-434) proposed EIC as an extension of the Akaike information criterion (AIC); the idea leading to EIC is to correct the bias of the log-likelihood, considered as an estimator of the Kullback-Leibler information, using the bootstrap. We develop this criterion for use in multivariate semiparametric situations, and argue that it can be used for choosing among parametric and semiparametric estimators. A simulation study based on a regression model shows that EIC is better than its competitors, although likelihood cross-validation performs nearly as well except for small sample sizes. Its use is illustrated by estimating the mean evolution of viral RNA levels in a group of infants infected by HIV.

4.
Model complexity in ecological niche modelling has recently been considered an important issue that might affect model performance. New methodological developments have implemented the Akaike information criterion (AIC) to capture model complexity in the Maxent algorithm. AIC is calculated based on the number of parameters and the likelihoods of continuous raw outputs. The ENMeval R package allows users to perform species-specific tuning of Maxent settings by running models with different combinations of regularization multiplier and feature classes; finally, all these models are compared using AIC corrected for small sample size. This approach aims to find the “best” model parametrization and is thought to maximize the model complexity and, therefore, its predictability. We found that most of the niche modelling studies we examined (68%) tend to consider AIC as a criterion of predictive accuracy in geographical distribution. In other words, AIC is used as a criterion to choose those models with the highest capacity to discriminate between presences and absences. However, the link between AIC and geographical predictive accuracy has not been tested so far. Here, we evaluated this relationship using a set of simulated (virtual) species. We created a set of nine virtual species with different ecological and geographical traits (e.g., niche position, niche breadth, range size) and generated different sets of true presence and absence data across geography. We built a set of models using the Maxent algorithm with different regularization values and feature schemes and calculated AIC values for each model. For each model, we obtained binary predictions using different threshold criteria and validated them using independent presence and absence data. We correlated AIC values against standard validation metrics (e.g., Kappa, TSS) and the number of pixels correctly predicted as presences and absences.
We did not find a correlation between AIC values and predictive accuracy from validation metrics. In general, those models with the lowest AIC values tend to generate geographical predictions with high commission and omission errors. The results were consistent across all species simulated. Finally, we suggest that AIC should not be used if users are interested in prediction more than explanation in ecological niche modelling.

5.
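Growth parameters are key inputs to fishery stock assessment and management strategies, so choosing an appropriate growth model for the target species is essential. Taking Saurida tumbil in the Beibu Gulf as an example, this study used body length and age-determination data collected monthly from December 2006 to July 2009 (n=2046), fitted five candidate growth models, estimated the growth parameters by maximum likelihood under an additive error structure, and assessed goodness of fit with the adjusted coefficient of determination (R²adj), root mean square error (RMSE), Akaike information criterion (AIC), and Bayesian information criterion (BIC). The results show that, with the present large sample, the four statistics ranked the models' goodness of fit consistently. Multi-model inference showed that the Generalized VBGF received sufficient support, accounting for 95.9% of the AIC weight, and can by itself describe the length-at-age relationship of S. tumbil; the growth equation is $L_t = 578.49\,[1 - e^{-0.051(t-0.14)}]^{0.361}$.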
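As a rough illustration of the growth equation reported in this abstract, the sketch below evaluates it at a few ages. It assumes the equation reads L_t = 578.49 [1 - e^(-0.051 (t - 0.14))]^0.361; the function and constant names are hypothetical, and the units of length and age are not stated in the abstract, so none are assumed here.

```python
import math

# Parameter values as reported in the abstract; the names are illustrative only.
L_INF = 578.49   # asymptotic length
K = 0.051        # growth coefficient
T0 = 0.14        # theoretical age at zero length
B = 0.361        # shape exponent of the generalized form


def length_at_age(t: float) -> float:
    """Evaluate L_t = L_INF * (1 - exp(-K * (t - T0))) ** B.

    For t <= T0 the bracket is non-positive, so 0.0 is returned
    (a convention chosen for this sketch, not taken from the paper).
    """
    if t <= T0:
        return 0.0
    return L_INF * (1.0 - math.exp(-K * (t - T0))) ** B


if __name__ == "__main__":
    for age in (1, 2, 5, 10):
        print(age, round(length_at_age(age), 1))
```

The curve rises monotonically from zero at t = 0.14 toward the asymptote 578.49.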

6.
Claeskens G, Consentino F. Biometrics, 2008, 64(4): 1062-1069
SUMMARY: Application of classical model selection methods such as Akaike's information criterion (AIC) becomes problematic when observations are missing. In this article we propose some variations on the AIC, which are applicable to missing covariate problems. The method is directly based on the expectation maximization (EM) algorithm and is readily available for EM-based estimation methods, without much additional computational effort. The missing data AIC criteria are formally derived and shown to work in a simulation study and by application to data on diabetic retinopathy.

7.
The objective of this paper is to introduce the logical basis of AIC-based model selection to persons analyzing capture-recapture data and to explore the key theoretical aspect of AIC-based model selection, for open-model capture-recapture, needed for AIC to perform well in this context. Almost all previous work on AIC assumes a Gaussian model; that assumption does not hold for capture-recapture models. Assuming the Cormack-Jolly-Seber model as the true model, we used numerical methods to evaluate the expectation of the log-likelihood relative to Akaike's target predictive log-likelihood. The use of this particular target criterion was motivated by the idea of using the Kullback-Leibler discrepancy for model selection, for which Akaike found the bias of the sample log-likelihood was asymptotically K, where K = the number of estimated (by MLE) parameters. In some sense, then, AIC is a bias-adjusted log-likelihood. For a set of 81 plausible cases, we evaluated this bias almost exactly. The ratio of this bias to the first-order theory (bias of K) and to second-order theory (K + a sample size adjustment) is essentially 1 for these 81 cases. Thus, AIC should be a suitable basis for model selection in open-model capture-recapture.
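The "bias-adjusted log-likelihood" reading of AIC can be made concrete with a short sketch. It uses the standard definition AIC = -2 log L + 2K, and for the second-order version it assumes the commonly used small-sample correction AICc = AIC + 2K(K+1)/(n-K-1); whether this is exactly the sample-size adjustment evaluated in the paper is an assumption. Function names are illustrative.

```python
def aic(log_likelihood: float, k: int) -> float:
    """First-order criterion: -2 * log-likelihood + 2 * K."""
    return -2.0 * log_likelihood + 2.0 * k


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Second-order criterion: AIC plus the usual small-sample term.

    Assumes the common correction 2K(K+1)/(n-K-1), which requires n > K + 1.
    """
    if n - k - 1 <= 0:
        raise ValueError("sample size n must exceed K + 1")
    return aic(log_likelihood, k) + (2.0 * k * (k + 1)) / (n - k - 1)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(aic(-100.0, 3))
    print(aicc(-100.0, 3, 24))
```

As n grows with K fixed, the correction term shrinks toward zero and the two criteria agree.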

8.
Ding J, Wang JL. Biometrics, 2008, 64(2): 546-556
Summary. In clinical studies, longitudinal biomarkers are often used to monitor disease progression and failure time. Joint modeling of longitudinal and survival data has certain advantages and has emerged as an effective way to mutually enhance information. Typically, a parametric longitudinal model is assumed to facilitate the likelihood approach. However, the choice of a proper parametric model turns out to be more elusive than models for standard longitudinal studies in which no survival endpoint occurs. In this article, we propose a nonparametric multiplicative random effects model for the longitudinal process, which has many applications and leads to a flexible yet parsimonious nonparametric random effects model. A proportional hazards model is then used to link the biomarkers and event time. We use B-splines to represent the nonparametric longitudinal process, and select the number of knots and degrees based on a version of the Akaike information criterion (AIC). Unknown model parameters are estimated through maximizing the observed joint likelihood, which is iteratively maximized by the Monte Carlo Expectation Maximization (MCEM) algorithm. Due to the simplicity of the model structure, the proposed approach has good numerical stability and compares well with the competing parametric longitudinal approaches. The new approach is illustrated with primary biliary cirrhosis (PBC) data, aiming to capture nonlinear patterns of serum bilirubin time courses and their relationship with survival time of PBC patients.

9.
10.
In order to have confidence in model-based phylogenetic analysis, the model of nucleotide substitution adopted must be selected in a statistically rigorous manner. Several model-selection methods are applicable to maximum likelihood (ML) analysis, including the hierarchical likelihood-ratio test (hLRT), Akaike information criterion (AIC), Bayesian information criterion (BIC), and decision theory (DT), but their performance relative to empirical data has not been investigated thoroughly. In this study, we use 250 phylogenetic data sets obtained from TreeBASE to examine the effects that choice in model selection has on ML estimation of phylogeny, with an emphasis on optimal topology, bootstrap support, and hypothesis testing. We show that the use of different methods leads to the selection of two or more models for approximately 80% of the data sets and that the AIC typically selects more complex models than alternative approaches. Although ML estimation with different best-fit models results in incongruent tree topologies approximately 50% of the time, these differences are primarily attributable to alternative resolutions of poorly supported nodes. Furthermore, topologies and bootstrap values estimated with ML using alternative statistically supported models are more similar to each other than to topologies and bootstrap values estimated with ML under the Kimura two-parameter (K2P) model or maximum parsimony (MP). In addition, Swofford-Olsen-Waddell-Hillis (SOWH) tests indicate that ML trees estimated with alternative best-fit models are usually not significantly different from each other when evaluated with the same model. However, ML trees estimated with statistically supported models are often significantly suboptimal to ML trees made with the K2P model when both are evaluated with K2P, indicating that not all models perform in an equivalent manner. 
Nevertheless, the use of alternative statistically supported models generally does not affect tests of monophyletic relationships under either the Shimodaira-Hasegawa (S-H) or SOWH methods. Our results suggest that although choice in model selection has a strong impact on optimal tree topology, it rarely affects evolutionary inferences drawn from the data because differences are mainly confined to poorly supported nodes. Moreover, since ML with alternative best-fit models tends to produce more similar estimates of phylogeny than ML under the K2P model or MP, the use of any statistically based model-selection method is vastly preferable to forgoing the model-selection process altogether.

11.
I show how one can estimate the shape of a thermal performance curve using information theory. This approach ranks plausible models by their Akaike information criterion (AIC), which is a measure of a model's ability to describe the data discounted by the model's complexity. I analyze previously published data to demonstrate how one applies this approach to describe a thermal performance curve. This exemplary analysis produced two interesting results. First, a model with a very high r² (a modified Gaussian function) appeared to overfit the data. Second, the model favored by information theory (a Gaussian function) has been used widely in optimality studies of thermal performance curves. Finally, I discuss the choice between regression and ANOVA when comparing thermal performance curves and highlight a superior method called template mode of variation. Much progress can be made by abandoning traditional methods for a method that combines information theory with template mode of variation.

12.
Membrane proteins move in heterogeneous environments with spatially (sometimes temporally) varying friction and with biochemical interactions with various partners. It is important to reliably distinguish different modes of motion to improve our knowledge of the membrane architecture and to understand the nature of interactions between membrane proteins and their environments. Here, we present an analysis technique for single molecule tracking (SMT) trajectories that can determine the preferred model of motion that best matches observed trajectories. The method uses Bayesian inference to calculate the posterior probability of an observed trajectory according to a certain model. Information theory criteria, such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and modified AIC (AICc), are used to select the preferred model. The considered group of models includes free Brownian motion, and confined motion in 2nd or 4th order potentials. We determine the best information criteria for classifying trajectories. We tested its limits through simulations matching large sets of experimental conditions and we built a decision tree. This decision tree first uses the BIC to distinguish between free Brownian motion and confined motion. In a second step, it classifies the confining potential further using the AIC. We apply the method to experimental Clostridium perfringens ε-toxin (CPT) receptor trajectories to show that these receptors are confined by a spring-like potential. An adaptation of this technique was applied on a sliding window in the temporal dimension along the trajectory. We applied this adaptation to experimental CPT trajectories that lose confinement due to disaggregation of confining domains. This new technique adds another dimension to the discussion of SMT data.
The mode of motion of a receptor might hold more biologically relevant information than the diffusion coefficient or domain size and may be a better tool to classify and compare different SMT experiments.
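The two-step decision tree described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the criterion values are taken as already computed for each model, lower values are treated as better, the first step is assumed to compare the BIC of the free model with the best BIC among the confined models, and the model labels "free", "quadratic", and "quartic" are hypothetical names for free Brownian motion and the 2nd and 4th order potentials.

```python
CONFINED_MODELS = ("quadratic", "quartic")  # hypothetical labels


def classify_trajectory(bic: dict, aic: dict) -> str:
    """Return the preferred mode of motion for one trajectory.

    bic, aic: mappings from model label to criterion value (lower is better).
    Step 1 (BIC): free Brownian motion versus confined motion.
    Step 2 (AIC): which confining potential, if the motion is confined.
    """
    best_confined_bic = min(bic[m] for m in CONFINED_MODELS)
    if bic["free"] <= best_confined_bic:
        return "free"
    return min(CONFINED_MODELS, key=lambda m: aic[m])


if __name__ == "__main__":
    # Made-up criterion values, for illustration only.
    bic = {"free": 420.0, "quadratic": 395.0, "quartic": 398.0}
    aic = {"free": 415.0, "quadratic": 388.0, "quartic": 390.0}
    print(classify_trajectory(bic, aic))
```

For the sliding-window adaptation mentioned in the abstract, the same function could be applied to criterion values computed on each window, though how the windows are chosen is not specified there.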

13.
Reversible-jump Markov chain Monte Carlo (RJ-MCMC) is a technique for simultaneously evaluating multiple related (but not necessarily nested) statistical models that has recently been applied to the problem of phylogenetic model selection. Here we use a simulation approach to assess the performance of this method and compare it to Akaike weights, a measure of model uncertainty that is based on the Akaike information criterion. Under conditions where the assumptions of the candidate models matched the generating conditions, both Bayesian and AIC-based methods perform well. The 95% credible interval contained the generating model close to 95% of the time. However, the size of the credible interval differed with the Bayesian credible set containing approximately 25% to 50% fewer models than an AIC-based credible interval. The posterior probability was a better indicator of the correct model than the Akaike weight when all assumptions were met but both measures performed similarly when some model assumptions were violated. Models in the Bayesian posterior distribution were also more similar to the generating model in their number of parameters and were less biased in their complexity. In contrast, Akaike-weighted models were more distant from the generating model and biased towards slightly greater complexity. The AIC-based credible interval appeared to be more robust to the violation of the rate homogeneity assumption. Both AIC and Bayesian approaches suggest that substantial uncertainty can accompany the choice of model for phylogenetic analyses, suggesting that alternative candidate models should be examined in analysis of phylogenetic data. [AIC; Akaike weights; Bayesian phylogenetics; model averaging; model selection; model uncertainty; posterior probability; reversible jump.].
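Akaike weights and an AIC-based 95% set of models, as discussed in this abstract, can be computed along the following lines. The sketch uses the standard definition w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), with Δ_i the AIC difference from the best model, and builds the set by adding models in order of decreasing weight until the cumulative weight reaches the chosen level; the exact construction used in the paper is an assumption, and the function names and AIC values are illustrative.

```python
import math


def akaike_weights(aic_values: dict) -> dict:
    """Map each model to its Akaike weight (weights sum to 1)."""
    best = min(aic_values.values())
    relative = {m: math.exp(-0.5 * (v - best)) for m, v in aic_values.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


def confidence_set(weights: dict, level: float = 0.95) -> list:
    """Smallest set of top-weighted models whose weights sum to >= level."""
    chosen, cumulative = [], 0.0
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for model, weight in ranked:
        chosen.append(model)
        cumulative += weight
        if cumulative >= level:
            break
    return chosen


if __name__ == "__main__":
    # Made-up AIC values, for illustration only.
    w = akaike_weights({"A": 100.0, "B": 102.0, "C": 110.0})
    print(w)
    print(confidence_set(w))
```

The number of models in the returned set is one simple measure of model uncertainty: a single dominant model gives a set of size one, while near-ties give larger sets.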

14.
Diseased animals may exhibit behavioral shifts that increase or decrease their probability of being randomly sampled. In harvest-based sampling approaches, animal movements, changes in habitat utilization, changes in breeding behaviors during harvest periods, or differential susceptibility to harvest via behaviors like hiding or decreased sensitivity to stimuli may result in a non-random sample that biases prevalence estimates. We present a method that can be used to determine whether bias exists in prevalence estimates from harvest samples. Using data from harvested mule deer (Odocoileus hemionus) sampled in northcentral Colorado (USA) during fall hunting seasons 1996-98 and Akaike's information criterion (AIC) model selection, we detected within-year trends indicating potential bias in harvest-based prevalence estimates for chronic wasting disease (CWD). The proportion of CWD-positive deer harvested slightly increased through time within a year. We speculate that differential susceptibility to harvest or breeding season movements may explain the positive trend in proportion of CWD-positive deer harvested during fall hunting seasons. Detection of bias may provide information about temporal patterns of a disease, suggest biological hypotheses that could further understanding of a disease, or provide wildlife managers with information about when diseased animals are more or less likely to be harvested. Although AIC model selection can be useful for detecting bias in data, it has limited utility in determining underlying causes of bias. In cases where bias is detected in data using such model selection methods, design-based methods (i.e., experimental manipulation) may be necessary to assign causality.

15.
In phylogenetic analyses of molecular sequence data, partitioning involves estimating independent models of molecular evolution for different sets of sites in a sequence alignment. Choosing an appropriate partitioning scheme is an important step in most analyses because it can affect the accuracy of phylogenetic reconstruction. Despite this, partitioning schemes are often chosen without explicit statistical justification. Here, we describe two new objective methods for the combined selection of best-fit partitioning schemes and nucleotide substitution models. These methods allow millions of partitioning schemes to be compared in realistic time frames and so permit the objective selection of partitioning schemes even for large multilocus DNA data sets. We demonstrate that these methods significantly outperform previous approaches, including both the ad hoc selection of partitioning schemes (e.g., partitioning by gene or codon position) and a recently proposed hierarchical clustering method. We have implemented these methods in an open-source program, PartitionFinder. This program allows users to select partitioning schemes and substitution models using a range of information-theoretic metrics (e.g., the Bayesian information criterion, Akaike information criterion [AIC], and corrected AIC). We hope that PartitionFinder will encourage the objective selection of partitioning schemes and thus lead to improvements in phylogenetic analyses. PartitionFinder is written in Python and runs under Mac OSX 10.4 and above. The program, source code, and a detailed manual are freely available from www.robertlanfear.com/partitionfinder.

16.
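To study heterogeneity in the growth of juvenile Japanese anchovy, this study used juvenile samples collected during special licensed fishing in the coastal waters of Zhejiang from April to June 2019, and fitted a generalized linear model and nine linear mixed-effects models to analyze heterogeneity in the relationship between fork length and body weight. The results showed that the fork length of the sampled juveniles ranged from 14 to 74 mm, with a mean of 33 mm and a dominant fork length group of 21-50 mm; body weight ranged from 0.01 to 2.96 g, with a mean of 0.28 g and a dominant weight group of 0.01-0.50 g. According to the Akaike information criterion, the linear mixed-effects model with random effects of month and sea area on the growth parameters a and b fitted best; cross-validation also confirmed that its predictions were the best. In the best model, the fixed value of growth parameter a was 0.24×10⁻⁵ and its estimates varied little, while the fixed value of b was 3.246, with estimates ranging from 3.206 to 3.272, indicating positive allometric growth in juvenile Japanese anchovy. This shows that month and sea area have a significant effect on the fork length-weight relationship of juvenile Japanese anchovy.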
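The fixed-effect parameters reported in this abstract can be plugged into the usual power-law length-weight relationship W = a L^b as a quick sanity check. The sketch assumes that form, with fork length in mm and body weight in g (the units used for the sample ranges in the abstract); function names are illustrative, and the month and sea-area random effects of the mixed model are not represented.

```python
A_FIXED = 0.24e-5  # fixed value of growth parameter a, as reported
B_FIXED = 3.246    # fixed value of growth parameter b, as reported


def weight_from_length(fork_length_mm: float,
                       a: float = A_FIXED, b: float = B_FIXED) -> float:
    """Predicted body weight (g) from fork length (mm), assuming W = a * L**b."""
    return a * fork_length_mm ** b


def growth_type(b: float, tolerance: float = 1e-9) -> str:
    """Classify growth by the exponent: b = 3 is isometric, b > 3 positive allometric."""
    if abs(b - 3.0) <= tolerance:
        return "isometric"
    return "positive allometric" if b > 3.0 else "negative allometric"


if __name__ == "__main__":
    print(weight_from_length(33.0))   # prediction at the reported mean fork length
    print(growth_type(B_FIXED))
```

At the reported mean fork length of 33 mm the prediction comes out at roughly 0.2 g, the same order as the reported mean weight of 0.28 g; the two need not match exactly, since the mean of a power function differs from the function at the mean.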

17.
Nonparametric mixed effects models for unequally sampled noisy curves   (total citations: 7; self-citations: 0; citations by others: 7)
Rice JA, Wu CO. Biometrics, 2001, 57(1): 253-259
We propose a method of analyzing collections of related curves in which the individual curves are modeled as spline functions with random coefficients. The method is applicable when the individual curves are sampled at variable and irregularly spaced points. This produces a low-rank, low-frequency approximation to the covariance structure, which can be estimated naturally by the EM algorithm. Smooth curves for individual trajectories are constructed as best linear unbiased predictor (BLUP) estimates, combining data from that individual and the entire collection. This framework leads naturally to methods for examining the effects of covariates on the shapes of the curves. We use model selection techniques--Akaike information criterion (AIC), Bayesian information criterion (BIC), and cross-validation--to select the number of breakpoints for the spline approximation. We believe that the methodology we propose provides a simple, flexible, and computationally efficient means of functional data analysis.

18.
Model choice techniques are proposed for logistic regression, based on estimation of a prediction criterion similar to Akaike's information criterion. For artificial insemination data on cattle, we wish to study the influence of a factor on the proportion of successes; standard testing methods do not always seem suitable when the objective is prediction. Two methods of estimating the prediction criterion are applied to these data: simulated bootstrap and asymptotic estimates. Some empirical properties of these estimates are studied.

19.
Adhesion flow assays are commonly employed to characterize the kinetics and force-dependence of receptor-ligand interactions. As transient cellular adhesion events are often mediated by a small number of receptor-ligand complexes (tether bonds), their durations are highly variable, which in turn presents obstacles to standard methods of analysis. In this paper, we employ the stochastic approach to chemical kinetics to construct the pause time distribution. Using this distribution, we develop a maximum likelihood (ML) approach to the robust estimation of rate constants associated with receptor-mediated transient adhesion and their confidence intervals. We then formulate robust estimators of the parameters of models for the force-dependence of the off-rate. Lastly, we develop a robust method of elucidating the force-dependence of the off-rate using Akaike's information criterion (AIC). Our findings conclusively demonstrate that ML estimators of adhesion kinetics are substantial improvements over more conventional approaches, and when combined with Fisher information, they may be used to objectively and reproducibly distinguish the kinetics of different receptor-ligand complexes. Software for the implementation of these methods with experimental data is publicly available for download at http://www.laurenzi.net.

20.
Aim: Although parameter estimates are not as affected by spatial autocorrelation as Type I errors, the change from classical null hypothesis significance testing to model selection under an information theoretic approach does not completely avoid problems caused by spatial autocorrelation. Here we briefly review the model selection approach based on the Akaike information criterion (AIC) and present a new routine for Spatial Analysis in Macroecology (SAM) software that helps establish minimum adequate models in the presence of spatial autocorrelation.
Innovation: We illustrate how a model selection approach based on the AIC can be used with geographical data by modelling patterns of mammal species in South America represented in a grid system (n = 383) with 2° of resolution, as a function of five environmental explanatory variables, performing an exhaustive search of minimum adequate models considering three regression methods: non-spatial ordinary least squares (OLS), spatial eigenvector mapping, and the autoregressive (lagged-response) model. The models selected by spatial methods included a smaller number of explanatory variables than the one selected by OLS, and minimum adequate models contain different explanatory variables, although model averaging revealed a similar rank of explanatory variables.
Main conclusions: We stress that the AIC is sensitive to the presence of spatial autocorrelation, generating unstable and overfitted minimum adequate models to describe macroecological data based on non-spatial OLS regression. Alternative regression techniques provided different minimum adequate models and have different uncertainty levels. Despite this, the averaged model based on Akaike weights generates consistent and robust results across different methods and may be the best approach for understanding macroecological patterns.
