Similar articles
20 similar articles found.
1.
A central goal of computational biology is the prediction of phenotype from DNA and protein sequence data. Recent models of sequence change use in silico prediction systems to incorporate the effects of phenotype on evolutionary rates. These models have been designed for analyzing sequence data from different species and have been accompanied by statistical techniques for estimating model parameters when the incorporation of phenotype induces dependent change among sequence positions. A difficulty with these efforts to link phenotype and interspecific evolution is that evolution occurs within populations, and parameters of interspecific models should have population genetic interpretations. We show, with two examples, how population genetic interpretations can be assigned to evolutionary models. The first example considers the impact of RNA secondary structure on sequence change, and the second reflects the tendency for protein tertiary structure to influence nonsynonymous substitution rates. We argue that statistical fit to data should not be the sole criterion for assessing models of sequence change. A good interspecific model should also yield a clear and biologically plausible population genetic interpretation.

2.
Accounting for spatial pattern when modeling organism-environment interactions (cited 10 times in total: 0 self-citations, 10 citations by others)
Statistical models of environment-abundance relationships may be influenced by spatial autocorrelation in abundance, environmental variables, or both. Failure to account for spatial autocorrelation can lead to incorrect conclusions regarding both the absolute and relative importance of environmental variables as determinants of abundance. We consider several classes of statistical models that are appropriate for modeling environment-abundance relationships in the presence of spatial autocorrelation, and apply these to three case studies: 1) abundance of voles in relation to habitat characteristics; 2) a plant competition experiment; and 3) abundance of Oribatid mites along environmental gradients. We find that when spatial pattern is accounted for in the modeling process, conclusions about environmental control over abundance can change dramatically. We conclude with five lessons: 1) spatial models are easy to calculate with several of the most common statistical packages; 2) results from spatially-structured models may point to conclusions radically different from those suggested by a spatially independent model; 3) not all spatial autocorrelation in abundances results from spatial population dynamics; it may also result from abundance associations with environmental variables not included in the model; 4) the different spatial models do have different mechanistic interpretations in terms of ecological processes – thus ecological model selection should take primacy over statistical model selection; 5) the conclusions of the different spatial models are typically fairly similar – making any correction is more important than quibbling about which correction to make.

3.
Guo W, Brown MB. Biometrics 2000, 56(3): 686-691
Structural time series models have applications in many different fields such as biology, economics, and meteorology. A structural time series model can be represented as a state-space model where the states of the system represent the unobserved components and the structural parameters have clear interpretations. This paper introduces a class of structural time series models that incorporate feedback from the latent components of the history. An iterative procedure is proposed for estimation. These models allow flexible and robust feedback mechanisms, have clear interpretations, and have a computationally efficient estimation procedure. They are applied to hormone data to characterize hormone secretion and to explore a potential feedback mechanism.

4.
Experimental tests of clearly articulated hypotheses are an increasingly widespread feature of modern marine ecology. Increased use of experiments has not, however, been accompanied by increased understanding of the logical structure of falsificationist tests. Most observations can be explained by several different models or theories. To distinguish among these requires demonstration of the falsity of the consequences or predictions of incorrect models. This is best achieved by deriving from each model one or more hypotheses (predictions) about the type, form or nature of observations that should occur in some not-yet-examined set of circumstances. Because of logical constraints on the possibility of proving the correctness of such hypotheses, they must be inverted to form logical null hypotheses which comprise all alternative possibilities to those predicted in the hypotheses. Correctness or not of null hypotheses can then be ascertained by an appropriately designed experiment (or test), leading to unambiguous rejection or retention of the null hypotheses. The former corroborates the hypotheses and provides support for the correctness of the explanatory model for the original observations. In contrast, retention of a null hypothesis identifies an incorrect model. The growth of knowledge is thus the elimination of false models, theories and explanations. Ecological experiments usually require statistical procedures for determining whether or not null hypotheses should be retained. Construction of statistical null hypotheses (i.e. definitions of parameters of frequency distributions of test statistics) sometimes requires that these be identical to logical hypotheses (and not to the logical nulls). This leads to irrational acceptance of hypotheses and the models or theories from which they were derived. It also poses immense problems for determinations of statistical power of experiments. 
Ecological experiments are analysed to reveal the nature of, and linkages between, their components in relation to falsificationism, statistical procedures and the logical properties and interpretations of ecological theories.

5.
A mathematical model of a process contains parameters supposedly characterizing the system which manifests the process. If the parameters are statistically distributed in a population of such systems, the process manifested by the entire population will in general be described by a different mathematical model. Thus a choice is always at hand between two or more mathematical models, depending on which parameters (if any) are assumed to be distributed and, if so, how. Examples of such alternative interpretations are given for mathematical models of some behavioral processes.

6.
Intraclass correlation (ICC) is an established tool to assess inter-rater reliability. In a seminal paper published in 1979, Shrout and Fleiss considered three statistical models for inter-rater reliability data with a balanced design. In their first two models, an infinite population of raters was considered, whereas in their third model, the raters in the sample were considered to be the whole population of raters. In the present paper, we show that the two distinct estimates of ICC developed for the first two models can both be applied to the third model and we discuss their different interpretations in this context.
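
As a rough illustration of the quantities involved, the sketch below computes the single-rater ICC estimates usually written ICC(1,1), ICC(2,1) and ICC(3,1) from the ANOVA mean squares of a balanced targets-by-raters table, using the standard textbook formulas. The function name `icc_single`, the variable names and the small ratings table are assumptions made for this example and are not taken from the paper.

```python
def icc_single(ratings):
    """Return (ICC(1,1), ICC(2,1), ICC(3,1)) for a balanced table.

    ratings: list of rows, one row per target and one column per rater.
    """
    n = len(ratings)       # number of targets
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares of the two-way layout without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_targets = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_targets - ss_raters

    bms = ss_targets / (n - 1)                      # between-targets mean square
    wms = (ss_total - ss_targets) / (n * (k - 1))   # within-targets mean square
    jms = ss_raters / (k - 1)                       # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))            # residual mean square

    icc1 = (bms - wms) / (bms + (k - 1) * wms)
    icc2 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc3 = (bms - ems) / (bms + (k - 1) * ems)
    return icc1, icc2, icc3


# Illustrative ratings: 6 targets (rows) scored by 4 raters (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc1, icc2, icc3 = icc_single(ratings)
print(round(icc1, 3), round(icc2, 3), round(icc3, 3))
```

With strong rater effects, as in this table, the three estimates differ widely. That gap is why the choice of model, and the interpretation attached to each estimate, matters.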

7.
Identification of those at greatest risk of death due to the substantial threat of COVID-19 can benefit from novel approaches to epidemiology that leverage large datasets and complex machine-learning models, provide data-driven intelligence, and guide decisions such as intensive-care unit admission (ICUA). The objective of this study is two-fold, one substantive and one methodological: substantively to evaluate the association of demographic and health records with two related, yet different, outcomes of severe COVID-19 (viz., death and ICUA); methodologically to compare interpretations based on logistic regression and on gradient-boosted decision tree (GBDT) predictions interpreted by means of the Shapley impacts of covariates. The very different associations of some factors, e.g., obesity and chronic respiratory diseases, with death and ICUA may guide review of practice. Shapley explanation of GBDTs identified varying effects of some factors among patients, thus emphasising the importance of individual patient assessment. The results of this study are also relevant for the evaluation of complex automated clinical decision systems, which should optimise prediction scores whilst remaining interpretable to clinicians and mitigating potential biases.

8.
MOTIVATION: Although several recently proposed analysis packages for microarray data can cope with heavy-tailed noise, many applications rely on Gaussian assumptions. Gaussian noise models foster computational efficiency. This comes, however, at the expense of increased sensitivity to outlying observations. Assessing potential insufficiencies of Gaussian noise in microarray data analysis is thus important and of general interest. RESULTS: To this end, we propose assessing different noise models on a large number of microarray experiments. The goodness of fit of noise models is quantified by a hierarchical Bayesian analysis of variance model, which predicts normalized expression values as a mixture of a Gaussian density and t-distributions with adjustable degrees of freedom. Inference of differentially expressed genes is taken into consideration at a second mixing level. To attain far-reaching validity, our investigations cover a wide range of analysis platforms and experimental settings. As the most striking result, we find, irrespective of the chosen preprocessing and normalization method, that in all experiments a heavy-tailed noise model is a better fit than a simple Gaussian. Further investigations revealed that an appropriate choice of noise model has a considerable influence on biological interpretations drawn at the level of inferred genes and gene ontology terms. We conclude from our investigation that neglecting the overdispersed noise in microarray data can mislead scientific discovery and suggest that the convenience of Gaussian-based modelling should be replaced by non-parametric approaches or other methods that account for heavy-tailed noise.

9.
The availability of novel biomarkers in several branches of medicine opens room for refining prognosis by adding factors on top of those having an established role. It is accepted that the impact of novel factors should not rely solely on regression coefficients and their significance but also on predictive power measures, such as Brier score and ROC‐based quantities. However, novel factors that are promising at the exploratory stage often result in disappointingly low impact on predictive power. This motivated the proposal of the net reclassification improvement and the integrated discrimination improvement, as direct measures of predictive power gain due to additional factors based on the concept of reclassification tables. These measures became extremely popular in cardiovascular disease and cancer applications, given their apparently easy interpretation. However, recent contributions in the biostatistical literature highlighted the tendency of these measures to indicate as advantageous models obtained by adding unrelated factors. These measures should not be used in practice. A further measure proposed a decade ago, the net benefit, is becoming a standard in assessing the consequences in terms of costs and benefits when using a risk predictor in practice for classification. This work reviews the conceptual formulations and interpretations of the available graphical methods and summary measures for evaluating risk predictor models. The aim is to provide guidance on the evaluation process that takes a risk predictor from model development to use in clinical practice for binary decision rules.
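
To make the net benefit concrete, here is a minimal sketch assuming the usual decision-curve definition, NB = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability at which a patient would be classified as high risk. The function names, the outcome vector and the predicted risks are hypothetical and serve only as an illustration.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of classifying as positive everyone with risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # A false positive is weighted by the odds of the threshold probability.
    weight = threshold / (1.0 - threshold)
    return tp / n - weight * fp / n


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every subject as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes (1 = event) and predicted risks for ten subjects.
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
risk = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.05, 0.5, 0.35]

nb_model = net_benefit(y_true, risk, 0.3)
nb_all = net_benefit_treat_all(y_true, 0.3)
print(nb_model, nb_all)
```

A risk predictor is useful at a given threshold only if its net benefit exceeds that of both reference strategies: classifying everyone as positive and classifying no one as positive (net benefit zero).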

10.
Summary: We consider the problem of estimating the effect of exposure on multiple continuous outcomes, when the outcomes are measured on different scales and are nested within multiple outcome classes, or “domains.” Our Bayesian model extends the linear mixed models approach to allow the exposure effect to differ across domains and across outcomes within domains. Our model can be parameterized to allow shrinkage of the effects within the different levels of nesting, or to allow fixed domain‐specific effects with no shrinkage. Our model also allows covariate effects to differ across outcomes and domains. Our methodology is applied to data on prenatal methylmercury exposure and multiple outcomes in four domains measured at 9 years of age on children enrolled in the Seychelles Child Development Study. We used three different priors and found that our main conclusions were not sensitive to the choice of prior. Simulation studies examine the model performance under alternative scenarios. Our results demonstrate that a sizeable increase in power is possible.

11.
Two types of behavior have been previously reported in models of immune networks. The typical behavior of simple models, which involve B cells only, is stationary behavior involving several steady states. Finite amplitude perturbations may cause the model to switch between different equilibria. The typical behavior of more realistic models, which involve both B cells and antibody, consists of autonomous oscillations and/or chaos. While stationary behavior leads to easy interpretations in terms of idiotypic memory, oscillatory behavior seems to be in better agreement with experimental data obtained in unimmunized animals. Here we study a series of models of the idiotypic interaction between two B cell clones. The models differ with respect to the incorporation of antibodies, B cell maturation and compartmentalization. The most complicated model in the series has two realistic parameter regimes in which the behavior is respectively stationary and chaotic. The stability of the equilibrium states and the structure and interactions of the stable and unstable manifolds of the saddle-type equilibria turn out to be factors influencing the model's behavior. Whether or not the model is able to attain any form of sustained oscillatory behavior, i.e. limit cycles or chaos, seems to be determined by (global) bifurcations involving the stable and unstable manifolds of the equilibrium states. We attempt to determine whether such behavior should be expected to be attained from reasonable initial conditions by incorporating an immune response to an antigen in the model. A comparison of the behavior of the model with experimental data from the literature provides suggestions for the parameter regime in which the immune system is operating.

12.
The interactions of monomeric and dimeric kinesin and ncd constructs with microtubules have been investigated using cryo-electron microscopy (cryo-EM) and several biochemical methods. There is a good consensus on the structure of dimeric ncd when bound to a tubulin dimer, showing one head attached directly to tubulin and the second head tethered to the first. However, the 3D maps of dimeric kinesin motor domains are still quite controversial and leave room for different interpretations. Here we reinvestigated the microtubule binding patterns of dimeric kinesins by cryo-EM and digital 3D reconstruction under different nucleotide conditions and different motor:tubulin ratios, and determined the molecular mass of motor-tubulin complexes by STEM. The two methods gave complementary results. We found that the ratio of bound kinesin motor heads to αβ-tubulin dimers never exceeded 1.5, irrespective of the initial mixing ratios. It appears that each kinesin dimer occupies two microtubule-binding sites, provided that there is a free one nearby. Thus the appearances of different image reconstructions can be explained by non-specific excess binding of motor heads. Consequently, the use of different apparent density distributions for docking the X-ray structures onto the microtubule surface leads to different and mutually exclusive models. We propose that in conditions of stoichiometric binding the two heads of a kinesin dimer separate and bind to different tubulin subunits. This is in contrast to ncd, where the two heads remain tightly attached on the microtubule surface. Using dimeric kinesin molecules crosslinked in their neck domain, we also found that they stabilize protofilaments axially, but not laterally, which is a strong indication that the two heads of the dimers bind along one protofilament, rather than laterally bridging two protofilaments.
A molecular walking model based on these results summarizes our conclusions and illustrates the implications of symmetry for such models.

13.
Phylogenetic dating is one of the most powerful and commonly used methods of drawing epidemiological interpretations from pathogen genomic data. Building such trees requires considering a molecular clock model which represents the rate at which substitutions accumulate on genomes. When the molecular clock rate is constant throughout the tree then the clock is said to be strict, but this is often not an acceptable assumption. Alternatively, relaxed clock models consider variations in the clock rate, often based on a distribution of rates for each branch. However, we show here that the distributions of rates across branches in commonly used relaxed clock models are incompatible with the biological expectation that the sum of the numbers of substitutions on two neighboring branches should be distributed as the substitution number on a single branch of equivalent length. We call this expectation the additivity property. We further show how assumptions of commonly used relaxed clock models can lead to estimates of evolutionary rates and dates with low precision and biased confidence intervals. We therefore propose a new additive relaxed clock model where the additivity property is satisfied. We illustrate the use of our new additive relaxed clock model on a range of simulated and real data sets, and we show that using this new model leads to more accurate estimates of mean evolutionary rates and ancestral dates.

14.
Modellers of large-scale genome rearrangement events, in which segments of DNA are inverted, moved, swapped, or even inserted or deleted, have found a natural syntax in the language of permutations. Despite this, there has been a wide range of modelling choices, assumptions and interpretations that make navigating the literature a significant challenge. Indeed, even authors of papers that use permutations to model genome rearrangement can struggle to interpret each other's work, because of subtle differences in basic assumptions that are often deeply ingrained (and consequently sometimes not even mentioned). In this paper, we describe the different ways in which permutations have been used to model genomes and genome rearrangement events, presenting some features and limitations of each approach, and show how the various models are related. This paper will help researchers navigate the landscape of permutation-based genome rearrangement models and make it easier for authors to present clear and consistent models.

15.
Wu H, Ding AA. Biometrics 1999, 55(2): 410-418
In this paper, we introduce a novel application of hierarchical nonlinear mixed-effect models to HIV dynamics. We show that a simple model with a sum of exponentials can give a good fit to the observed clinical data of HIV-1 dynamics (HIV-1 RNA copies) after initiation of potent antiviral treatments and can also be justified by a biological compartment model for the interaction between HIV and its host cells. This kind of model enjoys both biological interpretability and mathematical simplicity after reparameterization and simplification. A model simplification procedure is proposed and illustrated through examples. We interpret and justify various simplified models based on clinical data taken during different phases of viral dynamics during antiviral treatments. We suggest the hierarchical nonlinear mixed-effect model approach for parameter estimation and other statistical inferences. In the context of an AIDS clinical trial involving patients treated with a combination of potent antiviral agents, we show how the models may be used to draw biologically relevant interpretations from repeated HIV-1 RNA measurements and demonstrate the potential use of the models in clinical decision-making.

16.
Structural models of the variable domains of the murine anti-2-phenyloxazolone IgG (Ox1 idiotype) and its somatic variant, which has higher affinity to the hapten 2-phenyloxazolone, were constructed by computer-aided model building using known structures of highly homologous immunoglobulins as templates. Molecular dynamics simulations were used to dock the hapten between the VL and VH domains. The hapten is predicted to bind to slightly different sites in the two models. Hypotheses concerning the role of a number of preferred mutations in anti-oxazolone variants are presented. These can be tested by mutagenesis and crystallography. In particular, the higher binding affinities of the different antibody variants are shown to correlate with better complementarity of electrostatics. The molecular dynamics simulations also suggest that two mobile tryptophans at the mouth of the pocket may play an important role in the binding of hapten.

17.
Ecological stressors (i.e., environmental factors outside their normal range of variation) can mediate each other through their interactions, leading to unexpected combined effects on communities. Determining whether the net effect of stressors is ecologically surprising requires comparing their cumulative impact to a null model that represents the linear combination of their individual effects (i.e., an additive expectation). However, we show that standard additive and multiplicative null models that base their predictions on the effects of single stressors on community properties (e.g., species richness or biomass) do not provide this linear expectation, leading to incorrect interpretations of antagonistic and synergistic responses by communities. We present an alternative, the compositional null model, which instead bases its predictions on the effects of stressors on individual species, and then aggregates them to the community level. Simulations demonstrate the improved ability of the compositional null model to accurately provide a linear expectation of the net effect of stressors. We simulate the response of communities to paired stressors that affect species in a purely additive fashion and compare the relative abilities of the compositional null model and two standard community property null models (additive and multiplicative) to predict these linear changes in species richness and community biomass across different combinations (both positive, negative, or opposite) and intensities of stressors. The compositional model predicts the linear effects of multiple stressors under almost all scenarios, allowing for proper classification of net effects, whereas the standard null models do not. 
Our findings suggest that current estimates of the prevalence of ecological surprises on communities based on community property null models are unreliable, and should be improved by integrating the responses of individual species to the community level as does our compositional null model.
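
The difference between the two kinds of null model can be seen in a toy example. The sketch below assumes that each stressor changes each species' abundance by a fixed additive amount, that abundances cannot fall below zero, and that a species is present when its abundance is positive. These rules, the function names and the numbers are assumptions made for illustration, not the simulation design of the study.

```python
def richness(abundances):
    """Number of species with positive abundance."""
    return sum(1 for a in abundances if a > 0)


def apply_effects(control, *effects):
    """Add per-species abundance changes to the control, flooring at zero."""
    return [max(0.0, c + sum(e[i] for e in effects))
            for i, c in enumerate(control)]


def additive_property_null(control, effect_a, effect_b, prop):
    """Community-property null: sum the single-stressor changes in the property."""
    base = prop(control)
    change_a = prop(apply_effects(control, effect_a)) - base
    change_b = prop(apply_effects(control, effect_b)) - base
    return base + change_a + change_b


def compositional_null(control, effect_a, effect_b, prop):
    """Compositional null: combine effects per species, then aggregate."""
    return prop(apply_effects(control, effect_a, effect_b))


# Hypothetical four-species community and two stressors.
control = [10.0, 5.0, 2.0, 8.0]
effect_a = [-10.0, -5.0, 0.0, -1.0]
effect_b = [-10.0, 0.0, -2.0, -1.0]

for name, prop in (("richness", richness), ("biomass", sum)):
    print(name,
          additive_property_null(control, effect_a, effect_b, prop),
          compositional_null(control, effect_a, effect_b, prop))
```

In this toy community the first species is eliminated by either stressor alone, so the community-level additive null counts its loss twice. It predicts a richness of 0 and a negative biomass, whereas combining effects per species gives one surviving species.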

18.
Aim: This paper reviews possible candidate models that may be used in theoretical modelling and empirical studies of species–area relationships (SARs). The SAR is an important and well‐proven tool in ecology. The power and the exponential functions are by far the models that are best known and most frequently applied to species–area data, but they might not be the most appropriate. Recent work indicates that the shape of species–area curves in arithmetic space is often not convex but sigmoid and also has an upper asymptote. Methods: Characteristics of six convex and eight sigmoid models are discussed and interpretations of different parameters summarized. The convex models include the power, exponential, Monod, negative exponential, asymptotic regression and rational functions, and the sigmoid models include the logistic, Gompertz, extreme value, Morgan–Mercer–Flodin, Hill, Michaelis–Menten, Lomolino and Chapman–Richards functions plus the cumulative Weibull and beta‐P distributions. Conclusions: There are two main types of species–area curves: sample curves that are inherently convex and isolate curves, which are sigmoid. Both types may have an upper asymptote. A few have attempted to fit convex asymptotic and/or sigmoid models to species–area data instead of the power or exponential models. Some of these or other models reviewed in this paper should be useful, especially if species–area models are to be based more on biological processes and patterns in nature than mere curve fitting. The negative exponential function is an example of a convex model and the cumulative Weibull distribution an example of a sigmoid model that should prove useful. A location parameter may be added to these two and some of the other models to simulate absolute minimum area requirements.
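
Three of the functions named above can be written down directly. The sketch below uses commonly cited parameterizations: power S = c·A^z, negative exponential S = d·(1 − exp(−z·A)), and cumulative Weibull S = d·(1 − exp(−z·A^f)). Whether these match the exact forms used in the paper is an assumption, and the parameter names and values are chosen only for illustration.

```python
import math


def power_sar(area, c, z):
    """Power function: convex, no upper asymptote."""
    return c * area ** z


def negative_exponential_sar(area, d, z):
    """Negative exponential: convex, upper asymptote d."""
    return d * (1.0 - math.exp(-z * area))


def cumulative_weibull_sar(area, d, z, f):
    """Cumulative Weibull: upper asymptote d; sigmoid in arithmetic space when f > 1."""
    return d * (1.0 - math.exp(-z * area ** f))


# Arbitrary illustrative parameter values.
for area in (1, 10, 100, 1000):
    print(area,
          round(power_sar(area, c=5.0, z=0.25), 2),
          round(negative_exponential_sar(area, d=50.0, z=0.01), 2),
          round(cumulative_weibull_sar(area, d=50.0, z=0.0001, f=2.0), 2))
```

In this parameterization the cumulative Weibull reduces to the negative exponential when f = 1, so the extra shape parameter is what allows the curve to become sigmoid.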

19.
Three-dimensional models of NB-ARC domains in five different proteins were constructed based on the recently published crystal structure of the apoptotic protease activating factor 1 (Apaf-1): two from tomato and one each from flax, Arabidopsis, and nematode. Standard multiple sequence alignment was performed for chosen members of the NB-ARC domains, which are very divergent from each other in protein sequence, followed by homology model building and structure refinement. In this alignment, amino acid insertions and deletions between members generally fall in loop regions or at the ends of alpha helices. Despite the sequence divergence between the species, it is argued that the NB-ARC domains carry out similar biological functions in the various species, notably ATP binding and ATPase activity. From our comparative study of these models, it is predicted that NB-ARC domains should bind ADP/ATP rather than GDP/GTP. Both natural and induced mutants of Arabidopsis within the RPS2 locus and their phenotypes for disease reaction against Pseudomonas syringae are rationalized from the protein model. The Apaf-1 Thr263 and Arg265 positions, which are totally conserved within the NB-ARC domains, are predicted to take an active part in the catalytic activity of the kinase-3 motif, the arginine being known as the sensor I motif in AAA+ proteins. This was later verified for the Ced-4 crystal structure in complex with Ced-9. Our model of Ced-4 based on Apaf-1 was also compared with its crystal structure in the Ced-4-Ced-9 complex; the three-layered alpha/beta domain superposes quite well, helical domain I is shifted by about 5 Å, but the winged helix domain is rotated away to a new position. Since Apaf-1 was crystallized with ADP and Ced-4-Ced-9 with magnesium-ATP, this rotation signifies a change in structure of these NB-ARC domains between the two forms.
Further, we hypothesize that certain mutants in the plant R proteins called 'constitutive gain-of-function' or 'autocatalytic' hold their winged helix domains permanently in a position like that of the magnesium-ATP form observed for Ced-4, avoiding the closed ADP conformation. The models are also validated with mutagenesis data for a related tomato protein I-2, tomato Prf and flax, including loss of function, wild type and autocatalytic phenotypes, and compared with similar data for potato and tobacco proteins, for which models were not built. These three-dimensional models would help us to understand the spatial arrangement and function of R proteins and their conserved motifs.

20.
Background: Protein domains are commonly used to assess the functional roles and evolutionary relationships of proteins and protein families. Here, we use the Pfam protein family database to examine a set of candidate partial domains. Pfam protein domains are often thought of as evolutionarily indivisible, structurally compact, units from which larger functional proteins are assembled; however, almost 4% of Pfam27 PfamA domains are shorter than 50% of their family model length, suggesting that more than half of the domain is missing at those locations. To better understand the structural nature of partial domains in proteins, we examined 30,961 partial domain regions from 136 domain families contained in a representative subset of PfamA domains (RefProtDom2 or RPD2). Results: We characterized three types of apparent partial domains: split domains, bounded partials, and unbounded partials. We find that bounded partial domains are over-represented in eukaryotes and in lower quality protein predictions, suggesting that they often result from inaccurate genome assemblies or gene models. We also find that a large percentage of unbounded partial domains produce long alignments, which suggests that their annotation as a partial is an alignment artifact; yet some can be found as partials in other sequence contexts. Conclusions: Partial domains are largely the result of alignment and annotation artifacts and should be viewed with caution. The presence of partial domain annotations in proteins should raise the concern that the prediction of the protein’s gene may be incomplete. In general, protein domains can be considered the structural building blocks of proteins.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0656-7) contains supplementary material, which is available to authorized users.
