Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
The general theories of molecular evolution depend on relatively arbitrary assumptions about the relative distribution and rate of advantageous, deleterious, neutral, and nearly neutral mutations. The Fisher geometrical model (FGM) has been used to make distributions of mutations biologically interpretable. We explored an FGM-based molecular model to represent molecular evolutionary processes typically studied by nearly neutral and selection models, but in which distributions and relative rates of mutations with different selection coefficients are a consequence of biologically interpretable parameters, such as the average size of the phenotypic effect of mutations and the number of traits (complexity) of organisms. A variant of the FGM-based model that we called the static regime (SR) represents evolution as a nearly neutral process in which substitution rates are determined by a dynamic substitution process in which the population's phenotype remains around a suboptimum equilibrium fitness produced by a balance between slightly deleterious and slightly advantageous compensatory substitutions. As in previous nearly neutral models, the SR predicts a negative relationship between molecular evolutionary rate and population size; however, SR does not have the unrealistic properties of previous nearly neutral models such as the narrow window of selection strengths in which they work. In addition, the SR suggests that compensatory mutations cannot explain the high rate of fixations driven by positive selection currently found in DNA sequences, contrary to what has been previously suggested. We also developed a generalization of SR in which the optimum phenotype can change stochastically due to environmental or physiological shifts, which we called the variable regime (VR). VR models evolution as an interplay between adaptive processes and nearly neutral steady-state processes. 
When strong environmental fluctuations are incorporated, the process becomes a selection model in which evolutionary rate does not depend on population size, but is critically dependent on the complexity of organisms and mutation size. For SR as well as VR we found that key parameters of molecular evolution are linked by biological factors, and we showed that they cannot be fixed independently by arbitrary criteria, as has usually been assumed in previous molecular evolutionary models.  相似文献   

2.
Qian B  Goldstein RA 《Proteins》2003,52(3):446-453
It is often desired to identify further homologs of a family of biological sequences from the ever-growing sequence databases. Profile hidden Markov models excel at capturing the common statistical features of a group of biological sequences. With these common features, we can search the biological database and find new homologous sequences. Most general profile hidden Markov model methods, however, treat the evolutionary relationships between the sequences in a homologous group in an ad-hoc manner. We hereby introduce a method to incorporate phylogenetic information directly into hidden Markov models, and demonstrate that the resulting model performs better than most of the current multiple sequence-based methods for finding distant homologs.  相似文献   

3.
C Fuchs 《Gene》1980,10(4):371-373
Several Markov chain models (up to fourth order) have been fitted to the sequences of the seven DNAs presented in Fuchs et al. (1980). Two methods for determining the order of the Markov chain are applied to the data. The two methods lead to different conclusions, and we discuss these discrepancies. When the distribution of the nucleotides in a DNA sequence is investigated, it is suggested that the study of the order of the Markov model should be supplemented with additional analysis.
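
As a rough illustration of what choosing the order of a Markov chain for a DNA sequence can look like, the sketch below fits chains of order 0 to 4 by maximum likelihood and picks the order with the lowest BIC. The BIC criterion and the function names (`log_likelihood`, `bic`, `select_order`) are assumptions made for this example; the abstract does not say which two methods were actually used.

```python
import math
from collections import Counter


def log_likelihood(seq, order, start):
    """Maximised log-likelihood of an order-`order` Markov chain.

    Only positions from `start` onward are scored, so that chains of
    different orders are compared on the same set of observations.
    """
    ctx_counts = Counter()
    trans_counts = Counter()
    for i in range(start, len(seq)):
        ctx = seq[i - order:i]          # the preceding `order` symbols
        trans_counts[(ctx, seq[i])] += 1
        ctx_counts[ctx] += 1
    return sum(n * math.log(n / ctx_counts[ctx])
               for (ctx, _), n in trans_counts.items())


def bic(seq, order, max_order, alphabet_size=4):
    """BIC = -2 log L + (number of free parameters) * log(n)."""
    n_params = (alphabet_size ** order) * (alphabet_size - 1)
    n_obs = len(seq) - max_order
    return -2.0 * log_likelihood(seq, order, max_order) + n_params * math.log(n_obs)


def select_order(seq, max_order=4):
    """Return the order in 0..max_order with the smallest BIC."""
    return min(range(max_order + 1), key=lambda k: bic(seq, k, max_order))


if __name__ == "__main__":
    # Toy sequence in which each base is determined by the two preceding bases.
    print(select_order("AACC" * 50))
```

A different criterion (for example AIC or a likelihood-ratio test) can favour a different order on the same data, which is one way two methods may disagree.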

5.
The general Markov model (GMM) of nucleotide substitution does not assume the evolutionary process to be stationary, reversible, or homogeneous. The GMM can be simplified by assuming the evolutionary process to be stationary. A stationary GMM is appropriate for analyses of phylogenetic data sets that are compositionally homogeneous; a data set is considered to be compositionally homogeneous if a statistical test does not detect significant differences in the marginal distributions of the sequences. Though the general time-reversible (GTR) model assumes stationarity, it also assumes reversibility and homogeneity. We propose two new stationary and nonhomogeneous models--one constrains the GMM to be reversible, whereas the other does not. The two models, coupled with the GTR model, comprise a set of nested models that can be used to test the assumptions of reversibility and homogeneity for stationary processes. The two models are extended to incorporate invariable sites and used to analyze a seven-taxon hominoid data set that displays compositional homogeneity. We show that within the class of stationary models, a nonhomogeneous model fits the hominoid data better than the GTR model. We note that if one considers a wider set of models that are not constrained to be stationary, then an even better fit can be obtained for the hominoid data. However, the methods for reducing model complexity from an extremely large set of nonstationary models are yet to be developed.  相似文献   

6.
MacKay Altman R 《Biometrics》2004,60(2):444-450
In this article, we propose a graphical technique for assessing the goodness-of-fit of a stationary hidden Markov model (HMM). We show that plots of the estimated distribution against the empirical distribution detect lack of fit with high probability for large sample sizes. By considering plots of the univariate and multidimensional distributions, we are able to examine the fit of both the assumed marginal distribution and the correlation structure of the observed data. We provide general conditions for the convergence of the empirical distribution to the true distribution, and demonstrate that these conditions hold for a wide variety of time-series models. Thus, our method allows us to compare not only the fit of different HMMs, but also that of other models as well. We illustrate our technique using a multiple sclerosis data set.  相似文献   

7.
We develop a reversible jump Markov chain Monte Carlo approach to estimating the posterior distribution of phylogenies based on aligned DNA/RNA sequences under several hierarchical evolutionary models. Using a proper, yet nontruncated and uninformative prior, we demonstrate the advantages of the Bayesian approach to hypothesis testing and estimation in phylogenetics by comparing different models for the infinitesimal rates of change among nucleotides, for the number of rate classes, and for the relationships among branch lengths. We compare the relative probabilities of these models and the appropriateness of a molecular clock using Bayes factors. Our most general model, first proposed by Tamura and Nei, parameterizes the infinitesimal change probabilities among nucleotides (A, G, C, T/U) into six parameters, consisting of three parameters for the nucleotide stationary distribution, two rate parameters for nucleotide transitions, and another parameter for nucleotide transversions. Nested models include the Hasegawa, Kishino, and Yano model with equal transition rates and the Kimura model with a uniform stationary distribution and equal transition rates. To illustrate our methods, we examine simulated data, 16S rRNA sequences from 15 contemporary eubacteria, halobacteria, eocytes, and eukaryotes, 9 primates, and the entire HIV genome of 11 isolates. We find that the Kimura model is too restrictive, that the Hasegawa, Kishino, and Yano model can be rejected for some data sets, that there is evidence for more than one rate class and a molecular clock among similar taxa, and that a molecular clock can be rejected for more distantly related taxa.  相似文献   
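
The parameterization described above can be sketched as a small rate-matrix builder: a stationary distribution over (A, G, C, T), one rate for purine transitions (A<->G), one for pyrimidine transitions (C<->T), and one for transversions. The function name `tn93_rate_matrix`, the argument names, and the unnormalised scaling are assumptions for illustration, not the authors' implementation.

```python
def tn93_rate_matrix(pi, alpha_r, alpha_y, beta):
    """Build a Tamura-Nei style rate matrix Q in base order A, G, C, T.

    pi       stationary frequencies (length 4, summing to 1)
    alpha_r  rate for purine transitions (A <-> G)
    alpha_y  rate for pyrimidine transitions (C <-> T)
    beta     rate for transversions
    """
    bases = "AGCT"
    purines = {"A", "G"}
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(bases):
        for j, b in enumerate(bases):
            if i == j:
                continue
            if a in purines and b in purines:
                rate = alpha_r
            elif a not in purines and b not in purines:
                rate = alpha_y
            else:
                rate = beta
            q[i][j] = rate * pi[j]
        # Diagonal entry makes each row sum to zero.
        q[i][i] = -sum(q[i])
    return q


def hky_rate_matrix(pi, alpha, beta):
    """Nested case: both transition rates are equal."""
    return tn93_rate_matrix(pi, alpha, alpha, beta)


def kimura_rate_matrix(alpha, beta):
    """Nested case: equal transition rates and uniform base frequencies."""
    return tn93_rate_matrix([0.25] * 4, alpha, alpha, beta)


if __name__ == "__main__":
    for row in tn93_rate_matrix([0.3, 0.2, 0.2, 0.3], 2.0, 3.0, 1.0):
        print(["%.3f" % x for x in row])
```

Setting `alpha_r == alpha_y` gives the Hasegawa, Kishino, and Yano form, and additionally fixing `pi` to be uniform gives the Kimura form, which is the nesting the abstract refers to.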

8.
In this article, we introduce the drifting Markov models (DMMs) which are inhomogeneous Markov models designed for modeling the heterogeneities of sequences (in our case DNA or protein sequences) in a more flexible way than homogeneous Markov chains or even hidden Markov models (HMMs). We focus here on the polynomial drift: the transition matrix varies in a polynomial way. To show the reliability of our models on DNA, we exhibit high similarities between the probability distributions of nucleotides obtained by our models and the frequencies of these nucleotides computed by using a sliding window. In a further step, these DMMs can be used as the states of an HMM: on each of its segments, the observed process can be modeled by a drifting Markov model. Search of rare words in DNA sequences remains possible with DMMs and according to the fits provided, DMMs turn out to be a powerful tool for this purpose. The software is available on request from the author. It will soon be integrated on seq++ library (http://stat.genopole.cnrs.fr/seqpp/).  相似文献   

9.
We derive an analytic expression for site-specific stationary distributions of amino acids from the structurally constrained neutral (SCN) model of protein evolution with conservation of folding stability. The stationary distributions that we obtain have a Boltzmann-like shape, and their effective temperature parameter, measuring the limit of divergent evolutionary changes at a given site, can be predicted from a site-specific topological property, the principal eigenvector of the contact matrix of the native conformation of the protein. These analytic results, obtained without free parameters, are compared with simulations of the SCN model and with the site-specific amino acid distributions obtained from the Protein Data Bank. These results also provide new insights into how the topology of a protein fold influences its designability, i.e., the number of sequences compatible with that fold. The dependence of the effective temperature on the principal eigenvector decreases for longer proteins, as a possible consequence of the fact that selection for thermodynamic stability becomes weaker in this case.  相似文献   

10.
Friel  Nial; Rue  Havard 《Biometrika》2007,94(3):661-672
We illustrate how the recursive algorithm of Reeves & Pettitt (2004) for general factorizable models can be extended to allow exact sampling, maximization of distributions and computation of marginal distributions. All of the methods we describe apply to discrete-valued Markov random fields with nearest neighbour interactions defined on regular lattices; in particular we illustrate that exact inference can be performed for hidden autologistic models defined on moderately sized lattices. In this context we offer an extension of this methodology which allows approximate inference to be carried out for larger lattices without resorting to simulation techniques such as Markov chain Monte Carlo. In particular our work offers the basis for an automatic inference machine for such models.

11.
Summary Tree growth is assumed to be mainly the result of three components: (i) an endogenous component assumed to be structured as a succession of roughly stationary phases separated by marked change points that are asynchronous among individuals, (ii) a time‐varying environmental component assumed to take the form of synchronous fluctuations among individuals, and (iii) an individual component corresponding mainly to the local environment of each tree. To identify and characterize these three components, we propose to use semi‐Markov switching linear mixed models, i.e., models that combine linear mixed models in a semi‐Markovian manner. The underlying semi‐Markov chain represents the succession of growth phases and their lengths (endogenous component) whereas the linear mixed models attached to each state of the underlying semi‐Markov chain represent—in the corresponding growth phase—both the influence of time‐varying climatic covariates (environmental component) as fixed effects, and interindividual heterogeneity (individual component) as random effects. In this article, we address the estimation of Markov and semi‐Markov switching linear mixed models in a general framework. We propose a Monte Carlo expectation–maximization like algorithm whose iterations decompose into three steps: (i) sampling of state sequences given random effects, (ii) prediction of random effects given state sequences, and (iii) maximization. The proposed statistical modeling approach is illustrated by the analysis of successive annual shoots along Corsican pine trunks influenced by climatic covariates.  相似文献   

12.
Mason Liang  Rasmus Nielsen 《Genetics》2014,197(3):953-967
The distribution of admixture tract lengths has received considerable attention, in part because it can be used to infer the timing of past gene flow events between populations. It is commonly assumed that these lengths can be modeled as independently and identically distributed (iid) exponential random variables. This assumption is fundamental for many popular methods that analyze admixture using hidden Markov models. We compare the expected distribution of admixture tract lengths under a number of population-genetic models to the distribution predicted by the Wright–Fisher model with recombination. We show that under the latter model, the assumption of iid exponential tract lengths does not hold for recent or for ancient admixture events and that relying on this assumption can lead to false positives when inferring the number of admixture events. To further investigate the tract-length distribution, we develop a dyadic interval-based stochastic process for generating admixture tracts. This representation is useful for analyzing admixture tract-length distributions for populations with recent admixture, a scenario in which existing models perform poorly.  相似文献   

13.
Hubbell’s neutral theory claims that ecological patterns such as species abundance distributions can be explained by a stochastic model based on simple assumptions. One of these assumptions, the point mutation assumption, states that every individual has the same probability to speciate. Etienne et al. have argued that other assumptions on the speciation process could be more realistic, for example, that every species has the same probability to speciate (Etienne, et al. in Oikos 116:241–258, 2007). They introduced a number of neutral community models with a different speciation process, and conjectured formulas for their stationary species abundance distribution. Here we study a generalised neutral community model, encompassing these modified models, and derive its stationary distribution, thus proving the conjectured formulas.  相似文献   

14.
Habitat selection models are used in ecology to link the spatial distribution of animals to environmental covariates and identify preferred habitats. The most widely used models of this type, resource selection functions, aim to capture the steady-state distribution of space use of the animal, but they assume independence between the observed locations of an animal. This is unrealistic when location data display temporal autocorrelation. The alternative approach of step selection functions embeds habitat selection in a model of animal movement, to account for the autocorrelation. However, inferences from step selection functions depend on the underlying movement model, and they do not readily predict steady-state space use. We suggest an analogy between parameter updates and target distributions in Markov chain Monte Carlo (MCMC) algorithms, and step selection and steady-state distributions in movement ecology, leading to a step selection model with an explicit steady-state distribution. In this framework, we explain how maximum likelihood estimation can be used for simultaneous inference about movement and habitat selection. We describe the local Gibbs sampler, a novel rejection-free MCMC scheme, use it as the basis of a flexible class of animal movement models, and derive its likelihood function for several important special cases. In a simulation study, we verify that maximum likelihood estimation can recover all model parameters. We illustrate the application of the method with data from a zebra.

15.
Summary: In order to clarify some controversial phylogenies, such as those regarding the triplet of human, rodent, and cow and the evolutionary position of Lagomorpha with respect to other mammals, we have analyzed both nuclear and mitochondrial genes using the stationary Markov model developed in our laboratory. We found that the two sets of genes give different results. In particular, the mitochondrial tree showed rabbit linked first to rodents and the rabbit-rodents branch linked to artiodactyls, with human as the outgroup. The most favored nuclear tree showed human linked first to artiodactyls and the human-artiodactyls branch linked to rabbit, with rodents as the outgroup. The obvious questions, (1) which tree is the correct one, (2) whether both trees can be incorrect, and (3) how we can explain such an evolutionary pattern, are discussed on the basis of our limited knowledge of factors that influence the clocklike behavior of biological macromolecules.

16.
Summary Continuous‐time multistate models are widely used for categorical response data, particularly in the modeling of chronic diseases. However, inference is difficult when the process is only observed at discrete time points, with no information about the times or types of events between observation times, unless a Markov assumption is made. This assumption can be limiting as rates of transition between disease states might instead depend on the time since entry into the current state. Such a formulation results in a semi‐Markov model. We show that the computational problems associated with fitting semi‐Markov models to panel‐observed data can be alleviated by considering a class of semi‐Markov models with phase‐type sojourn distributions. This allows methods for hidden Markov models to be applied. In addition, extensions to models where observed states are subject to classification error are given. The methodology is demonstrated on a dataset relating to development of bronchiolitis obliterans syndrome in post‐lung‐transplantation patients.  相似文献   

17.
The underlying molecular mechanisms of metabolic and genetic regulations are computationally identical and can be described by a finite state Markov process. We establish a common computational model for both regulations based on the stationary distribution of the Markov process with the aim of establishing a unified, quantitative model of general biological regulations. Various existing results regarding intracellular regulations are derived including the classical Michaelis-Menten equation and its generalization to more complex allosteric enzymes in a systematic way. The notion of probability flow is introduced to distinguish the equilibrium stationary distribution from the non-equilibrium one; it plays a crucial role in the analysis of stationary state equations. A graphical criterion to guarantee the existence of an equilibrium stationary distribution is derived, which turns out to be identical to the classical Wegscheider condition. Simple graphical methods to compute the equilibrium and non-equilibrium stationary distributions are derived based crucially on the probability flow, which dramatically simplifies the classical methods still used in enzymology.  相似文献   
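
A minimal sketch of the simplest case mentioned above: an enzyme modelled as a two-state Markov process (free or substrate-bound), whose stationary probability of being bound yields the Michaelis-Menten rate law. The rate names `k_on`, `k_off`, `k_cat` and the per-enzyme normalisation are assumptions chosen for this illustration and are not taken from the paper.

```python
import math


def stationary_two_state(rate_01, rate_10):
    """Stationary distribution (p0, p1) of a two-state continuous-time chain.

    rate_01 is the rate of leaving state 0 for state 1, rate_10 the reverse.
    Balance requires p0 * rate_01 == p1 * rate_10 with p0 + p1 == 1.
    """
    total = rate_01 + rate_10
    return rate_10 / total, rate_01 / total


def turnover_rate(s, k_on, k_off, k_cat):
    """Product formation per enzyme: k_cat times the probability of being bound.

    State 0 = free enzyme, state 1 = enzyme-substrate complex.
    Binding occurs at rate k_on * s; the complex is left at rate k_off + k_cat.
    """
    _, p_bound = stationary_two_state(k_on * s, k_off + k_cat)
    return k_cat * p_bound


def michaelis_menten(s, k_on, k_off, k_cat):
    """Classical form v = k_cat * s / (K_m + s) with K_m = (k_off + k_cat) / k_on."""
    k_m = (k_off + k_cat) / k_on
    return k_cat * s / (k_m + s)


if __name__ == "__main__":
    for s in (0.1, 1.0, 10.0):
        print(s, turnover_rate(s, 2.0, 1.0, 3.0), michaelis_menten(s, 2.0, 1.0, 3.0))
```

Because product release makes the cycle irreversible, this two-state example carries a net probability flow, so it corresponds to the non-equilibrium stationary case in the abstract's terminology.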

19.
Hidden Markov models (HMMs) are effective tools to detect series of statistically homogeneous structures, but they are not well suited to analysing complex structures. For example, the duration of stay in a state of an HMM must follow a geometric law. Numerous other methodological difficulties are encountered when using HMMs to segregate genes from transposons or retroviruses, or to determine the isochore classes of genes. The aim of this paper is to analyse these methodological difficulties, and to suggest new tools for the exploration of genome data. We show that HMMs can be used to analyse complex gene structures with bell-shaped length distributions by using convolutions of geometric distributions. Thus, we have introduced macro-states to model the distributions of the lengths of the regions. Our study shows that a simple HMM can be used to model the isochore organisation of the mouse genome. This potential use of Markovian models to help in data exploration has been underestimated until now.
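
The point about state durations can be checked numerically. In the sketch below, a single state with self-transition probability `p_stay` has a geometric length distribution whose mode is always 1, whereas a chain of identical sub-states visited in series (one possible reading of a "macro-state") has a length distribution equal to the convolution of those geometrics, which is hump-shaped. The function names and the choice of identical sub-states are assumptions for illustration only.

```python
from math import comb


def geometric_pmf(p_stay, max_len):
    """P(duration = d) for one HMM state, d = 1..max_len (index 0 is unused)."""
    return [0.0] + [(p_stay ** (d - 1)) * (1.0 - p_stay)
                    for d in range(1, max_len + 1)]


def convolve(a, b, max_len):
    """Distribution of the sum of two independent durations, truncated at max_len."""
    out = [0.0] * (max_len + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= max_len:
                out[i + j] += ai * bj
    return out


def macro_state_pmf(p_stay, n_sub, max_len):
    """Length distribution of n_sub identical sub-states traversed in series."""
    single = geometric_pmf(p_stay, max_len)
    pmf = single
    for _ in range(n_sub - 1):
        pmf = convolve(pmf, single, max_len)
    return pmf


def mode(pmf):
    """Index of the most probable duration."""
    return max(range(len(pmf)), key=pmf.__getitem__)


if __name__ == "__main__":
    print("single state mode:", mode(macro_state_pmf(0.9, 1, 200)))
    print("five sub-states mode:", mode(macro_state_pmf(0.9, 5, 200)))
```

With identical sub-states the convolution is a negative binomial distribution, so the numerical result can be compared against the closed form `comb(d - 1, n_sub - 1) * (1 - p_stay) ** n_sub * p_stay ** (d - n_sub)`.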

20.
Models of amino acid substitution present challenges beyond those often faced with the analysis of DNA sequences. The alignments of amino acid sequences are often small, whereas the number of parameters to be estimated is potentially large when compared with the number of free parameters for nucleotide substitution models. Most approaches to the analysis of amino acid alignments have focused on the use of fixed amino acid models in which all of the potentially free parameters are fixed to values estimated from a large number of sequences. Often, these fixed amino acid models are specific to a gene or taxonomic group (e.g. the Mtmam model, which has parameters that are specific to mammalian mitochondrial gene sequences). Although the fixed amino acid models succeed in reducing the number of free parameters to be estimated (indeed, they reduce the number of free parameters from approximately 200 to 0), it is possible that none of the currently available fixed amino acid models is appropriate for a specific alignment. Here, we present four approaches to the analysis of amino acid sequences. First, we explore the use of a general time reversible model of amino acid substitution using a Dirichlet prior probability distribution on the 190 exchangeability parameters. Second, we explore the behaviour of prior probability distributions that are 'centred' on the rates specified by the fixed amino acid model. Third, we consider a mixture of fixed amino acid models. Finally, we consider constraints on the exchangeability parameters as partitions, similar to how nucleotide substitution models are specified, and place a Dirichlet process prior model on all the possible partitioning schemes.
