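An improvement to the cross-validation statistic CV used in model selection yields a new criterion, RCV, for judging whether a model is adequate. RCV incorporates the information in CV together with the goodness of fit, the number of parameters to be estimated, and the sample size, and it shows better stability and discriminating power than AIC, BIC, and CV.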

Abstract: As use of Akaike's Information Criterion (AIC) for model selection has become increasingly common, so has a mistake involving interpretation of models that are within 2 AIC units (ΔAIC ≤ 2) of the top-supported model. A model that adds one parameter to the top-supported model often falls within 2 ΔAIC units only because the penalty for the additional parameter is +2 AIC units while model deviance is not reduced by enough to overcome that penalty; hence, the additional parameter provides no net reduction in AIC. Simply put, the uninformative parameter does not explain enough variation to justify its inclusion in the model and it should not be interpreted as having any ecological effect. Models with uninformative parameters are frequently presented as being competitive in the Journal of Wildlife Management, including 72% of all AIC-based papers in 2008, and authors and readers need to be more aware of this problem and take appropriate steps to eliminate misinterpretation. I reviewed 5 potential solutions to this problem: 1) report all models but ignore or dismiss those with uninformative parameters, 2) use model averaging to ameliorate the effect of uninformative parameters, 3) use 95% confidence intervals to identify uninformative parameters, 4) perform all-possible subsets regression and use weight-of-evidence approaches to discriminate useful from uninformative parameters, or 5) adopt a methodological approach that allows models containing uninformative parameters to be culled from reported model sets. The first approach is preferable for small sets of a priori models, whereas the last 2 approaches should be used for large model sets or exploratory modeling.
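
The penalty arithmetic described in this abstract can be illustrated with a minimal sketch. The log-likelihood values, parameter counts, and function names below are hypothetical and chosen only for illustration; the only thing taken from the abstract is the rule that an extra parameter costs +2 AIC units and is uninformative when deviance drops by less than that.

```python
def aic(log_likelihood, n_params):
    """AIC = -2 log L + 2K (deviance plus a 2-unit penalty per parameter)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def delta_aic(aic_values):
    """Difference between each model's AIC and the smallest AIC in the set."""
    best = min(aic_values)
    return [value - best for value in aic_values]


def has_uninformative_parameters(loglik_top, loglik_bigger, extra_params=1):
    """True when a model nesting the top model fails to pay for its extras.

    The deviance reduction must exceed 2 units per added parameter for the
    larger model to achieve a net reduction in AIC.
    """
    deviance_drop = -2.0 * loglik_top - (-2.0 * loglik_bigger)
    return deviance_drop < 2.0 * extra_params


# Hypothetical fits: the bigger model adds one parameter to the top model.
LOGLIK_TOP = -100.0      # assumed log-likelihood, K = 3
LOGLIK_BIGGER = -99.7    # assumed log-likelihood, K = 4

aic_top = aic(LOGLIK_TOP, 3)
aic_bigger = aic(LOGLIK_BIGGER, 4)
deltas = delta_aic([aic_top, aic_bigger])
flagged = has_uninformative_parameters(LOGLIK_TOP, LOGLIK_BIGGER)

print(deltas, flagged)
```

In this made-up case the larger model sits within 2 ΔAIC units of the top model, yet its deviance improves by well under 2 units, so the sketch flags the added parameter as uninformative rather than competitive.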

Model-free analysis is a technique commonly used within the field of NMR spectroscopy to extract atomic-resolution, interpretable dynamic information on multiple timescales from the R1, R2, and steady-state NOE. Model-free approaches employ two disparate areas of data analysis: the discipline of mathematical optimisation, specifically the minimisation of a χ² function, and the statistical field of model selection. By searching through a large number of model-free minimisations, which were set up using synthetic relaxation data for which the true underlying dynamics is known, certain model-free models have been found to fail at times. The failure is characterised by either the internal correlation times, τe, τf, or τs, or the global correlation time parameter, local τm, heading towards infinity, with the result that the final parameter values are far from the true values. In a number of cases the minimised χ² value of the failed model is significantly lower than that of all other models and, hence, the failed model will be the one chosen by model selection techniques. If these models are not removed prior to model selection, the final model-free results could be far from the truth. By implementing a series of empirical rules involving inequalities, these models can be specifically isolated and removed. Model-free analysis should therefore consist of three distinct steps: model-free minimisation, model-free model elimination, and finally model-free model selection. Failure has also been found to affect the individual Monte Carlo simulations used within error analysis. Each simulation involves an independent randomised relaxation data set and model-free minimisation; thus simulations suffer from exactly the same types of failure as model-free models. Therefore, to prevent these outliers from causing a significant overestimation of the errors, the failed Monte Carlo simulations need to be culled prior to calculating the parameter standard deviations.

The objective of this paper is to introduce the logical basis of AIC-based model selection to persons analyzing capture-recapture data and to explore the key theoretical aspect of AIC-based model selection, for open-model capture-recapture, needed for AIC to perform well in this context. Almost all previous work on AIC assumes a Gaussian model; that assumption does not hold for capture-recapture models. Assuming the Cormack-Jolly-Seber model as the true model, we used numerical methods to evaluate the expectation of the log-likelihood relative to Akaike's target predictive log-likelihood. The use of this particular target criterion was motivated by the idea of using the Kullback-Leibler discrepancy for model selection, for which Akaike found the bias of the sample log-likelihood was asymptotically K, where K = the number of estimated (by MLE) parameters. In some sense, then, AIC is a bias-adjusted log-likelihood. For a set of 81 plausible cases, we evaluated this bias almost exactly. The ratio of this bias to the first-order theory (bias of K) and to second-order theory (K + a sample size adjustment) is essentially 1 for these 81 cases. Thus, AIC should be a suitable basis for model selection in open-model capture-recapture.
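
The "bias-adjusted log-likelihood" reading of AIC can be sketched in a few lines. The abstract does not state the form of its sample-size adjustment, so the second-order term below is an assumption: it is the familiar small-sample correction K(K + 1)/(n − K − 1) that underlies AICc. The function names and the numbers in the usage lines are likewise hypothetical.

```python
def first_order_bias(n_params):
    """Asymptotic bias of the sample log-likelihood: K."""
    return float(n_params)


def second_order_bias(n_params, n_obs):
    """K plus an assumed small-sample adjustment, K(K + 1) / (n - K - 1)."""
    denom = n_obs - n_params - 1
    if denom <= 0:
        raise ValueError("need n_obs > n_params + 1")
    return n_params + n_params * (n_params + 1) / denom


def bias_adjusted_loglik(log_likelihood, bias):
    """Log-likelihood corrected for its optimistic bias."""
    return log_likelihood - bias


# Hypothetical fit: log L = -50 with K = 4 parameters and n = 25 observations.
LOGLIK, K, N = -50.0, 4, 25

aic_value = -2.0 * bias_adjusted_loglik(LOGLIK, first_order_bias(K))
aicc_value = -2.0 * bias_adjusted_loglik(LOGLIK, second_order_bias(K, N))

print(aic_value, aicc_value)
```

Multiplying the bias-adjusted log-likelihood by −2 recovers the usual AIC under the first-order bias and, under the assumed adjustment, the AICc form; the abstract's ratio of evaluated bias to either term being close to 1 is what would justify using these penalties.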

We use first principles of population genetics to model the evolution of proteins under persistent positive selection (PPS). PPS may occur when organisms are subjected to persistent environmental change, during adaptive radiations, or in host–pathogen interactions. Our mutation–selection model indicates protein evolution under PPS is an irreversible Markov process, and thus proteins under PPS show a strongly asymmetrical distribution of selection coefficients among amino acid substitutions. Our model shows that the criterion ω > 1 (where ω is the ratio of nonsynonymous to synonymous codon substitution rates) for detecting positive selection is conservative and indeed arbitrary, because in real proteins many mutations are highly deleterious and are removed by selection even at positively selected sites. We use a penalized-likelihood implementation of the PPS model to successfully detect PPS in plant RuBisCO and influenza HA proteins. By directly estimating selection coefficients at protein sites, our inference procedure bypasses the need for using ω as a surrogate measure of selection and improves our ability to detect molecular adaptation in proteins.

Serum C-reactive protein (CRP) is used as a marker of inflammation in several diseases including autoimmune disease and cardiovascular disease. CRP, a member of the pentraxin family, is composed of five identical subunits. CRP has diverse ligand-binding properties which depend upon different structural states of CRP. However, little is known about the molecular dynamics and interaction properties of CRP. In this study, we used the SAPS, SCRATCH protein predictor, PDBsum, ConSurf, ProtScale, Drawhca, ASAView, SCide and SRide servers and performed comprehensive analyses of molecular dynamics, protein–protein and residue–residue interactions of CRP. We used the 1GNH.pdb file for the crystal structure of human CRP, which generated two pentamers (ABCDE and FGHIJ). The numbers of residues involved in residue–residue interactions between the A–B, B–C, C–D, D–E, F–G, G–H, H–I, I–J, A–E and F–J subunits were 12, 11, 10, 11, 12, 11, 10, 11, 10 and 10, respectively. Fifteen antiparallel β sheets were involved in β-sheet topology, and five β hairpins were involved in forming the secondary structure. Analysis of hydrophobic segment distribution revealed deviations in surface hydrophobicity at different cavities present in CRP. Approximately 33% of all residues were involved in the stabilization centers. We show that bioinformatics tools can provide a rapid method to predict molecular dynamics and interaction properties of CRP. Our prediction of molecular dynamics and interaction properties of CRP, combined with the modeling data based on the known 3D structure of CRP, is helpful in designing stable forms of CRP mutants for structure–function studies of CRP and may facilitate in silico drug design for therapeutic targeting of CRP.

Because nonlinear relations have been found between the selection differential and the realised selection response, we investigated variants of the genetic-statistical model that include this nonlinearity. The model variations concerned not only the postulated form of the relationship between phenotype, genotype and environment but also the assumptions about the distribution of the variates. In one special case investigated, the linear model equation P = G ± e was retained, but the distributions of P and G were defined over a range limited in one direction. For P we defined a modified normal distribution, and the distribution of the random vector (G, e) was taken to be non-normal with cov(G, e) ≠ 0. From the solution set of an integral equation we obtained a density function of the random vector (P, G), which includes the expected selection response of the usual genetic-statistical model as an approximate special case. The genetic parameters that result from the changed model were derived; however, they could be represented only partially, as integral functions. A subsequent paper reports on the examination of these model variants, which depend on a nonlinearity parameter c.


The gel-to-fluid (ordered-to-disordered) phase transition observed in biological membranes is simulated using constant-energy molecular dynamics. The surface part of the membrane is modelled as a two-dimensional matrix formed by the head groups of the phospholipid molecules. Head molecules, which are modelled as three fused spheres with three force centers, interact with each other via van der Waals and Coulomb-type interactions. The so-called impurity, or foreign molecule, embedded in the surface represents the protein-type molecules that are present in biological membranes and control their activity. It is modelled as a pentagon with one force center at each corner. It also interacts with the surface molecules via van der Waals and Coulomb-type interactions. The surface density is kept constant in the simulations of the systems with and without the impurity. Structural and orientational changes due to the impurity were observed and confirmed by monitoring the two-dimensional order parameter. It is shown that melting of the surface, that is, breakdown of the ordering of the surface molecules, becomes easier and that the ordered-to-disordered phase transition temperature is lowered by 100 K if the impurity is present.

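With the rapid development of mass spectrometry, proteomics has become another research hotspot after genomics and transcriptomics, and finding reliable differentially expressed proteins is crucial to biomarker discovery. How to screen for differential proteins accurately and sensitively has therefore become one of the main research topics in mass spectrometry-based quantitative proteomics. Many methods currently address this problem, but their ranges of applicability differ. Overall, the statistical strategies for screening differential proteins from mass spectrometry data fall into 3 classes: strategies based on classical statistics, statistical testing strategies based on Bayesian statistics, and other strategies. Each class has its own scope of application, characteristics, and shortcomings. In addition, the screening process produces some false-positive results, and other methods can be used to control the quality of the differentially expressed proteins and so improve the reliability of the statistical test results.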

Abstract: Statistical inference is an important element of science, but these inferences are constrained within the framework established by the objectives and design of a study. The choice of approach to data analysis, while important, has far less consequence on scientific inference than claimed by Sleep et al. (2007). Their principal assertion (that when model selection is used as the approach to data analysis, all studies provide a reliable foundation for distinguishing among mechanistic explanatory hypotheses) is incorrect and encourages faulty inferences. Sleep et al. (2007) overlook the critical distinction between inferences that result from studies designed a priori to discriminate among a set of candidate explanations versus inferences that result from exploring data post hoc from studies designed originally to meet pattern-based objectives. No approach to data analysis, including model selection, has the power to overcome fundamental limitations on inferences imposed by study design. The comments by Sleep et al. (2007) reinforce the need for scientists to understand clearly the inferential basis for their scientific claims, including the roles and limitations of data analysis.


The quasicrystal structure is considered to be a new type of ordered phase because its Fourier transform has Laue spots with icosahedral symmetry, which is inconsistent with a crystal structure. Computer simulation of the formation process of a quasicrystal was performed by the molecular dynamics method. On the basis of the Strandburg type of quasicrystal model, we developed an algorithm for the formation process of a binary quasicrystal that reflects the procedure as realistically as possible. The Fourier transforms of some of the obtained structures showed decagonal symmetry, although the spots are rather diffuse. It is shown that the potential parameters and experimental conditions must be restricted to produce a perfect quasicrystal structure.

Reduced bioavailability of nonpolar contaminants due to sorption to natural organic matter is an important factor controlling biodegradation of pollutants in the environment. We established enrichment cultures in which solid organic phases were used to reduce phenanthrene bioavailability to different degrees (R. J. Grosser, M. Friedrich, D. M. Ward, and W. P. Inskeep, Appl. Environ. Microbiol. 66:2695–2702, 2000). Bacteria enriched and isolated from contaminated soils under these conditions were analyzed by denaturing gradient gel electrophoresis (DGGE) and sequencing of PCR-amplified 16S ribosomal DNA segments. Compared to DGGE patterns obtained with enrichment cultures containing sand or no sorptive solid phase, different DGGE patterns were obtained with enrichment cultures containing phenanthrene sorbed to beads of Amberlite IRC-50 (AMB), a weak cation-exchange resin, and especially Biobead SM7 (SM7), a polyacrylic resin that sorbed phenanthrene more strongly. SM7 enrichments selected for mycobacterial phenanthrene mineralizers, whereas AMB enrichments selected for a Burkholderia sp. that degrades phenanthrene. Identical mycobacterial and Burkholderia 16S rRNA sequence segments were found in SM7 and AMB enrichment cultures inoculated with contaminated soil from two geographically distant sites. Other closely related Burkholderia sp. populations, some of which utilized phenanthrene, were detected in sand and control enrichment cultures. Our results are consistent with the hypothesis that different phenanthrene-utilizing bacteria inhabiting the same soils may be adapted to different phenanthrene bioavailabilities.

Human lysozyme has a structure similar to that of hen lysozyme and differs in amino acid sequence by 51 out of 129 residues, with one insertion at the position between residues 47 and 48 of hen lysozyme. The backbone dynamics of free and (NAG)3-bound human lysozyme were determined by measurements of 15N nuclear relaxation. The relaxation data were analyzed using the Lipari-Szabo formalism and were compared with those of hen lysozyme, which were reported previously (Mine S et al., 1999, J Mol Biol 286:1547-1565). The backbone dynamics of free human and hen lysozymes showed very similar behavior except for some residues, indicating that the difference in amino acid sequence did not affect the overall backbone dynamics and that the folded pattern was the major determinant of the internal motion of lysozymes. On the other hand, a number of residues in (NAG)3-bound human and hen lysozymes showed an increase or decrease in the order parameters at or near active sites on the binding of (NAG)3, indicating changes in motion on the picosecond-to-nanosecond timescale. These results suggested that the immobilization of residues upon binding (NAG)3 resulted in an entropy penalty and that this penalty was compensated by mobilizing other residues. However, when the internal motions of ligand-bound human and hen lysozymes were compared, differences in dynamic behavior were found at substrate binding sites, reflecting a subtle difference in the substrate-binding mode or efficiency of activity between them.
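
For readers unfamiliar with the Lipari-Szabo formalism mentioned in this abstract, a minimal sketch of its original (single internal motion) spectral density is given below. The abstract does not say which model-free variants or parameter values were used, so the function name and the example values for the order parameter S², the global correlation time τm, and the internal correlation time τe are assumptions for illustration only.

```python
import math


def lipari_szabo_spectral_density(omega, s2, tau_m, tau_e):
    """Original Lipari-Szabo spectral density J(omega).

    J = (2/5) * [ S^2 * tau_m / (1 + (omega*tau_m)^2)
                  + (1 - S^2) * tau / (1 + (omega*tau)^2) ],
    with 1/tau = 1/tau_m + 1/tau_e.

    omega is in rad/s; tau_m and tau_e are in seconds; 0 <= s2 <= 1.
    """
    if not 0.0 <= s2 <= 1.0:
        raise ValueError("order parameter S^2 must lie in [0, 1]")
    # Effective correlation time combining global and internal motion.
    tau = tau_m * tau_e / (tau_m + tau_e)
    global_term = s2 * tau_m / (1.0 + (omega * tau_m) ** 2)
    internal_term = (1.0 - s2) * tau / (1.0 + (omega * tau) ** 2)
    return 0.4 * (global_term + internal_term)


# Hypothetical values: 5 ns global tumbling, 50 ps internal motion.
TAU_M = 5e-9
TAU_E = 50e-12

j_rigid = lipari_szabo_spectral_density(0.0, 1.0, TAU_M, TAU_E)
j_mobile = lipari_szabo_spectral_density(0.0, 0.8, TAU_M, TAU_E)

print(j_rigid, j_mobile)
```

With these assumed values, lowering S² from 1.0 to 0.8 reduces J(0), which is the sense in which a change in order parameter on ligand binding reports on a change in fast internal motion.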


We investigated protein/DNA interactions using molecular dynamics simulations of a model, solvated in a 10 Angstrom water layer, of the estrogen receptor (ER) protein DNA binding domain (DBD) and the DNA of a non-consensus estrogen response element (ERE) consisting of 29 nucleotide base pairs. This ERE nucleotide sequence occurs naturally upstream of the Xenopus laevis vitellogenin A1 gene. The ER DBD is encoded by three exons: exons 2 and 3, which encode the two zinc-binding motifs, and a sequence of exon 4, which encodes a predicted alpha helix. We generated a computer model of the ER DBD using atomic coordinates derived from the average of 30 nuclear magnetic resonance (NMR) spectroscopy coordinate sets. Amino acids on the carboxyl end of the ER DBD were disordered in both X-ray crystallography and NMR determinations, and no coordinates were reported. This disordered region includes 10 amino acids of a predicted alpha helix encoded in exon 4 at the exon 3/4 splice junction. These amino acids are known to be important in DNA binding and are also believed to function as a nuclear translocation signal sequence for the ER protein. We generated a computer model of the predicted alpha helix consisting of the 10 amino acids encoded in exon 4 and attached this helix to the carboxyl end of the ER DBD at the exon 3/4 splice junction site. We docked the ER DBD model within the DNA major groove halfsites of the 29 base pair non-consensus ERE and flanking nucleotides. We constructed a solvated model with the ER DBD/ERE complex surrounded by a 10 Angstrom water layer and conducted molecular dynamics simulations. Hydrogen bonding interactions were monitored. In addition, van der Waals and electrostatic interaction energies were calculated. Amino acids of the ER DBD DNA recognition helix formed both direct and water-mediated hydrogen bonds at cognate codon-anticodon nucleotide base and backbone sites within the ERE DNA right major groove halfsite. Amino acids of the ER DBD exon 4-encoded predicted alpha helix formed direct and water-mediated hydrogen bonds with base and backbone sites of their cognate codon-anticodon nucleotides within the minor grooves flanking the ERE DNA major groove halfsites. These interactions together induced bending of the DNA into the protein.


Two potential parameter sets for alkali silicates were derived on the basis of ab initio MO calculations. One is a model containing completely ionic alkali (model I), and the other is derived from cluster calculations (model II). These sets were tested against the crystal, glass, and liquid of metasilicates. Model II reproduces these structures well under constant-pressure conditions and is found to be better than model I as a whole.


Point mutations in the human prion protein gene, leading to amino acid substitutions in the human prion protein, contribute to conversion of PrPC to PrPSc and amyloid formation, resulting in prion diseases such as familial Creutzfeldt-Jakob disease (CJD), Gerstmann-Sträussler-Scheinker disease (GSS), and fatal familial insomnia. We investigated the effects of prevalent mutations, including Q217R, D202N, and F198S, on the human prion protein and compared the mutant models with the wild type. Structural analyses of the models were performed with molecular modeling and molecular dynamics simulation methods. According to our results, frequently occurring mutations are observed in conserved and fully conserved sequences of the human prion protein, and the largest fluctuations occur in Helix 1 around residues 144–152 and at the C-terminal end of Helix 2. Our analysis of results obtained from MD simulation clearly shows that this long-range effect plays an important role in the conformational fluctuations in mutant structures of the human prion protein. Results obtained from molecular modeling, such as creation or elimination of some hydrogen bonds, increase or decrease of the accessible surface area and molecular surface, loss or accumulation of negative or positive charges at specific positions, and altered polarity and pKa values, show that amino acid point mutations, although they do not drastically change the stability of PrP, might have some local impacts on the protein interactions that are required for oligomerization into fibrillar species.
