1.
In 1994, Muse and Gaut (MG) and Goldman and Yang (GY) proposed evolutionary models that recognize the coding structure of the nucleotide sequences under study, by defining a Markovian substitution process with a state space consisting of the 61 sense codons (assuming the universal genetic code). Several variations and extensions to their models have since been proposed, but no general and flexible framework for contrasting the relative performance of alternative approaches has yet been applied. Here, we compute Bayes factors to evaluate the relative merit of several MG and GY styles of codon substitution models, including recent extensions acknowledging heterogeneous nonsynonymous rates across sites, as well as selective effects inducing uneven amino acid or codon preferences. Our results on three real data sets support a logical model construction following the MG formulation, allowing for a flexible account of global amino acid or codon preferences, while maintaining distinct parameters governing overall nucleotide propensities. Through posterior predictive checks, we highlight the importance of such a parameterization. Altogether, the framework presented here suggests a broad modeling project in the MG style, stressing the importance of combining and contrasting available model formulations and grounding developments in a sound probabilistic paradigm.
2.
Huelsenbeck JP Joyce P Lakner C Ronquist F 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2008,363(1512):3941-3953
Models of amino acid substitution present challenges beyond those often faced with the analysis of DNA sequences. The alignments of amino acid sequences are often small, whereas the number of parameters to be estimated is potentially large when compared with the number of free parameters for nucleotide substitution models. Most approaches to the analysis of amino acid alignments have focused on the use of fixed amino acid models in which all of the potentially free parameters are fixed to values estimated from a large number of sequences. Often, these fixed amino acid models are specific to a gene or taxonomic group (e.g. the Mtmam model, which has parameters that are specific to mammalian mitochondrial gene sequences). Although the fixed amino acid models succeed in reducing the number of free parameters to be estimated (indeed, they reduce the number of free parameters from approximately 200 to 0), it is possible that none of the currently available fixed amino acid models is appropriate for a specific alignment. Here, we present four approaches to the analysis of amino acid sequences. First, we explore the use of a general time reversible model of amino acid substitution using a Dirichlet prior probability distribution on the 190 exchangeability parameters. Second, we explore the behaviour of prior probability distributions that are 'centred' on the rates specified by the fixed amino acid model. Third, we consider a mixture of fixed amino acid models. Finally, we consider constraints on the exchangeability parameters as partitions, similar to how nucleotide substitution models are specified, and place a Dirichlet process prior model on all the possible partitioning schemes.
3.
Simon Y.W. Ho 《Biology letters》2009,5(3):421-424
Molecular evolutionary rates can show significant variation among lineages, complicating the task of estimating substitution rates and divergence times using phylogenetic methods. Accordingly, relaxed molecular clock models have been developed to accommodate such rate heterogeneity, but these often make the assumption of rate autocorrelation among lineages. In this paper, I examine the validity of this assumption.
4.
Frequentist properties of Bayesian posterior probabilities of phylogenetic trees under simple and complex substitution models (total citations: 1; self-citations: 0; citations by others: 1)
What does the posterior probability of a phylogenetic tree mean? This simulation study shows that Bayesian posterior probabilities have the meaning that is typically ascribed to them; the posterior probability of a tree is the probability that the tree is correct, assuming that the model is correct. At the same time, the Bayesian method can be sensitive to model misspecification, and the sensitivity of the Bayesian method appears to be greater than the sensitivity of the nonparametric bootstrap method (using maximum likelihood to estimate trees). Although the estimates of phylogeny obtained by use of the method of maximum likelihood or the Bayesian method are likely to be similar, the assessment of the uncertainty of inferred trees via either bootstrapping (for maximum likelihood estimates) or posterior probabilities (for Bayesian estimates) is not likely to be the same. We suggest that the Bayesian method be implemented with the most complex models of those currently available, as this should reduce the chance that the method will concentrate too much probability on too few trees.
5.
The increasing ability to extract and sequence DNA from noncontemporaneous tissue offers biologists the opportunity to analyse ancient DNA (aDNA) together with modern DNA (mDNA) to address the taxonomy of extinct species, evolutionary origins, historical phylogeography and biogeography. Perhaps more exciting are recent developments in coalescence-based Bayesian inference that offer the potential to use temporal information from aDNA and mDNA for the estimation of substitution rates and divergence dates as an alternative to fossil and geological calibration. This comes at a time of growing interest in the possibility of time dependency for molecular rate estimates. In this study, we provide a critical assessment of Bayesian Markov chain Monte Carlo (MCMC) analysis for the estimation of substitution rate using simulated samples of aDNA and mDNA. We conclude that the current models and priors employed in Bayesian MCMC analysis of heterochronous mtDNA are susceptible to an upward bias in the estimation of substitution rates because of model misspecification when the data come from populations with less than simple demographic histories, including sudden short-lived population bottlenecks or pronounced population structure. However, when model misspecification is only mild, then the 95% highest posterior density intervals provide adequate frequentist coverage of the true rates.
6.
Molting rate is a key life history parameter in copepods. Since copepod population growth is an inherently exponential process, accurate formulation of molting rate is of critical importance. Many experiments have been conducted to culture different copepod species under varying temperatures and food concentrations. Probability density functions (PDFs) then were used to estimate the median development time (MDT) of different copepod stages from the experimental data. These MDTs are used in copepod population models. Asymmetrical PDFs are widely used to model molting rate, because the shapes of these curves are similar to laboratory data on cohort development. In this paper, we developed an individual stochastic model (ISM) to simulate the molting rate with different PDFs. We showed that there was no connection between the asymmetry of cohorts and the asymmetry of the molting PDF. Although age-within-stage models have been widely used to simulate copepod population dynamics, we found that none had used the correct formulation of molting rate. The population model requires the probability of molting at each time step, whereas the laboratory-derived PDF is the frequency distribution of stage duration. Therefore, the PDF cannot be applied directly to the population model. We present here a corrected formula based on the PDF for use in copepod population models, termed the probability of molting for remaining individuals (PMR). Despite emphasis on use of the gamma function for copepod molting, we found simpler functions work equally well, but that prior use of incorrect molting rate functions in copepod models can seriously overestimate generation time.
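The distinction drawn in this abstract, between a frequency distribution of stage duration and a per-time-step molting probability, can be illustrated with a small sketch. The abstract does not give the PMR formula, so the code below assumes it corresponds to the standard discrete hazard (the probability of molting at a step, conditional on not having molted earlier). The function names and the four-step distribution are hypothetical and chosen only for illustration.

```python
def molting_hazard(duration_pmf):
    """Convert a stage-duration frequency distribution into per-step
    molting probabilities for individuals still in the stage.

    Assumption: this discrete hazard stands in for the paper's PMR,
    whose exact formula is not given in the abstract.
    """
    hazards = []
    remaining = 1.0  # fraction of the cohort that has not yet molted
    for p in duration_pmf:
        # Probability of molting now, given no molt so far.
        hazards.append(p / remaining if remaining > 1e-12 else 1.0)
        remaining -= p
    return hazards


def molted_per_step(step_probs):
    """Fraction of the initial cohort that molts at each step when
    step_probs[k] is applied to the individuals still in the stage."""
    remaining = 1.0
    molted = []
    for q in step_probs:
        molted.append(remaining * q)
        remaining -= remaining * q
    return molted


# Hypothetical stage-duration distribution over four time steps.
pmf = [0.1, 0.4, 0.3, 0.2]

# Applying the hazard step by step reproduces the observed distribution.
print(molted_per_step(molting_hazard(pmf)))

# Applying the distribution itself as a per-step probability does not:
# part of the cohort is left in the stage after the last step.
print(molted_per_step(pmf))
```

In this toy case, using the frequency distribution directly as a per-step probability leaves roughly 30% of the cohort unmolted after the final step, which is the direction of error (development that is too slow) that the abstract warns about.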
7.
MOTIVATION: Mapping character state changes over phylogenetic trees is central to the study of evolution. However, current probabilistic methods for generating such mappings are ill-suited to certain types of evolutionary models, in particular, the widely used models of codon substitution. RESULTS: We describe a general method, based on a uniformization technique, which can be utilized to generate realizations of a Markovian substitution process conditional on an alignment of character states and a given tree topology. The method is applicable under a wide range of evolutionary models, and to illustrate its usefulness in practice, we embed it within a data augmentation-based Markov chain Monte Carlo sampler, for approximating posterior distributions under previously proposed codon substitution models. The sampler is found to be more efficient than the conventional pruning-based sampler with the decorrelation times between draws from the posterior reduced by a factor of 20 or more.
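For readers unfamiliar with uniformization, the sketch below shows only the basic identity behind the technique, not the authors' conditional sampler. A continuous-time Markov chain with rate matrix Q is rewritten as a discrete chain with transition matrix R = I + Q/mu, whose jumps occur at the events of a Poisson process with rate mu, so that P(t) is a Poisson-weighted sum of powers of R. The two-state rate matrix and all names are assumptions made for illustration.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def uniformized_transition_matrix(q, t, n_terms=200):
    """Approximate P(t) = exp(Q t) by uniformization:
    P(t) = sum_k Poisson(k; mu * t) * R**k, with R = I + Q / mu."""
    n = len(q)
    mu = max(-q[i][i] for i in range(n))  # uniformization rate
    r = [[(1.0 if i == j else 0.0) + q[i][j] / mu for j in range(n)]
         for i in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    weight = math.exp(-mu * t)  # Poisson probability of zero events
    result = [[0.0] * n for _ in range(n)]
    for k in range(n_terms):
        for i in range(n):
            for j in range(n):
                result[i][j] += weight * power[i][j]
        power = mat_mul(power, r)       # R**(k + 1)
        weight *= mu * t / (k + 1)      # Poisson probability of k + 1 events
    return result


# Hypothetical two-state process: rate a from state 0 to 1, rate b back.
a, b, t = 1.0, 2.0, 0.5
q = [[-a, a], [b, -b]]
p = uniformized_transition_matrix(q, t)

# Closed form for the two-state chain, used here as a check.
p00_exact = b / (a + b) + a / (a + b) * math.exp(-(a + b) * t)
print(p[0][0], p00_exact)
```

The same decomposition is what makes path sampling convenient: given the number of Poisson events, the states at those events follow the discrete chain R, and some of the events are "virtual" jumps that leave the state unchanged.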
8.
Yin G 《Biometrics》2005,61(2):552-558
Due to natural or artificial clustering, multivariate survival data often arise in biomedical studies, for example, a dental study involving multiple teeth from each subject. A certain proportion of subjects in the population who are not expected to experience the event of interest are considered to be "cured" or insusceptible. To model correlated or clustered failure time data incorporating a surviving fraction, we propose two forms of cure rate frailty models. One model naturally introduces frailty based on biological considerations while the other is motivated from the Cox proportional hazards frailty model. We formulate the likelihood functions based on piecewise constant hazards and derive the full conditional distributions for Gibbs sampling in the Bayesian paradigm. As opposed to the Cox frailty model, the proposed methods demonstrate great potential in modeling multivariate survival data with a cure fraction. We illustrate the cure rate frailty models with a root canal therapy data set.
9.
Accurate estimates of mitochondrial substitution rates are central to molecular studies of human evolution, but meaningful comparisons of published studies are problematic because of the wide range of methodologies and data sets employed. These differences are nowhere more pronounced than among rates estimated from phylogenies, genealogies, and pedigrees. By using a data set comprising mitochondrial genomes from 177 humans, we estimate substitution rates for various data partitions by using Bayesian phylogenetic analysis with a relaxed molecular clock. We compare the effect of multiple internal calibrations with the customary human-chimpanzee split. The analyses reveal wide variation among estimated substitution rates and divergence times made with different partitions and calibrations, with evidence of substitutional saturation, natural selection, and significant rate heterogeneity among lineages and among sites. Collectively, the results support dates for migration out of Africa and the common mitochondrial ancestor of humans that are considerably more recent than most previous estimates. Our results also demonstrate that human mitochondrial genomes exhibit a number of molecular evolutionary complexities that necessitate the use of sophisticated analytical models for genetic analyses.
10.
In case-control studies of gene-environment association with disease, when genetic and environmental exposures can be assumed to be independent in the underlying population, one may exploit the independence in order to derive more efficient estimation techniques than the traditional logistic regression analysis (Chatterjee and Carroll, 2005, Biometrika 92, 399-418). However, covariates that stratify the population, such as age, ethnicity and the like, could potentially lead to nonindependence. In this article, we provide a novel semiparametric Bayesian approach to model stratification effects under the assumption of gene-environment independence in the control population. We illustrate the methods by applying them to data from a population-based case-control study on ovarian cancer conducted in Israel. A simulation study is conducted to compare our method with other popular choices. The results reflect that the semiparametric Bayesian model allows incorporation of key scientific evidence in the form of a prior and offers a flexible, robust alternative when standard parametric model assumptions do not hold.
11.
V G Red'ko 《Biofizika》1986,31(3):511-516
The stage evolution of a "population" consisting of informational sequences having fixed length N is investigated. It is assumed that at each stage sequences of a "population" are selected, reproduced, mutated and diluted. The character of the evolution is revealed and the evolution rate is estimated for different values of N, "population" size n, and a number of different symbols lambda in sequences. According to the obtained estimates, it is possible to find sequences in an evolution process which are close to the "optimum" one using far fewer sequences (approximately lambda N4) than for a random search (approximately lambda N).
12.
The human visual system is the most complex pattern recognition device known. In ways that are yet to be fully understood, the visual cortex arrives at a simple and unambiguous interpretation of data from the retinal image that is useful for the decisions and actions of everyday life. Recent advances in Bayesian models of computer vision and in the measurement and modeling of natural image statistics are providing the tools to test and constrain theories of human object perception. In turn, these theories are having an impact on the interpretation of cortical function.
13.
We prove that a wide class of Markov models of neighbor-dependent substitution processes on the integer line is solvable. This class contains some models of nucleotidic substitutions recently introduced and studied empirically by molecular biologists. We show that the polynucleotidic frequencies at equilibrium solve some finite-size linear systems. This provides, for the first time up to our knowledge, explicit and algebraic formulas for the stationary frequencies of non-degenerate neighbor-dependent models of DNA substitutions. Furthermore, we show that the dynamics of these stochastic processes and their distribution at equilibrium exhibit some stringent, rather unexpected, independence properties. For example, nucleotidic sites at distance at least three evolve independently, and all the sites, when encoded as purines and pyrimidines, evolve independently.
14.
Evaluation of feral pig control in Hawaiian protected areas using Bayesian catch-effort models (total citations: 1; self-citations: 0; citations by others: 1)
《New Zealand Journal of Ecology》2011,35(2):182-188
In 2007 The Nature Conservancy (TNC) undertook an intensive ungulate control programme throughout three of its preserves on the Hawaiian islands of Maui and Moloka'i, with one aim being to reduce feral pig numbers to zero or near zero. The preserves were divided into manageable zones and over a 2 to 5 month period hunted from the ground with dogs in a series of up to four sweeps across the zones. More focussed hunting followed at sites with evidence of survivors. We used the data collected by the hunters to evaluate the efficacy of the control programme. The data comprised the number of pigs shot per zone per sweep and the hunters' effort and were used to fit a Weibull catch-effort model within a Bayesian framework. The fitted model provided posterior parameter estimates of the initial number of pigs resident in each zone and the relationship between hunting effort and the probability of detecting (and dispatching) a pig. The large shape parameter estimate indicated that the probability of detecting a pig increased substantially with cumulative hunting effort or experience in that zone. The control programme was successful in six out of eight of the control zones, reducing pig numbers to zero or one per zone (equating to <1 pig per km²), but was less successful in two zones where an estimated 9-14 pigs remained. However, there were large credible intervals around some of the parameter estimates, suggesting an additional source of variation that was not captured by the current model. We suggest this was due to immigration of pigs back into the preserves. The quantified relationship between search effort and the probability of detecting a pig was used to make predictions on how much effort is required to detect all pigs, and can be used by TNC to interpret future monitoring data.
15.
We test models for the evolution of helical regions of RNA sequences, where the base pairing constraint leads to correlated compensatory substitutions occurring on either side of the pair. These models are of three types: 6-state models include only the four Watson-Crick pairs plus GU and UG; 7-state models include a single mismatch state that combines all of the 10 possible mismatches; 16-state models treat all mismatch states separately. We analyzed a set of eubacterial ribosomal RNA sequences with a well-established phylogenetic tree structure. For each model, the maximum-likelihood values of the parameters were obtained. The models were compared using the Akaike information criterion, the likelihood-ratio test, and Cox's test. With a high significance level, models that permit a nonzero rate of double substitutions performed better than those that assume zero double substitution rate. Some models assume symmetry between GC and CG, between AU and UA, and between GU and UG. Models that relaxed this symmetry assumption performed slightly better, but the tests did not all agree on the significance level. The most general time-reversible model significantly outperformed any of the simplifications. We consider the relative merits of all these models for molecular phylogenetics.
16.
The nonsynonymous/synonymous substitution rate ratio versus the radical/conservative replacement rate ratio in the evolution of mammalian genes (total citations: 1; self-citations: 0; citations by others: 1)
There are 2 ways to infer selection pressures in the evolution of protein-coding genes, the nonsynonymous and synonymous substitution rate ratio (K(A)/K(S)) and the radical and conservative amino acid replacement rate ratio (K(R)/K(C)). Because the K(R)/K(C) ratio depends on the definition of radical and conservative changes in the classification of amino acids, we develop an amino acid classification that maximizes the correlation between K(A)/K(S) and K(R)/K(C). An analysis of 3,375 orthologous gene groups among 5 mammalian species shows that our classification gives a significantly higher correlation coefficient between the 2 ratios than those of existing classifications. However, there are many orthologous gene groups with a low K(A)/K(S) but a high K(R)/K(C) ratio. Examining the functions of these genes, we found an overrepresentation of functional categories related to development. To determine if the overrepresentation is stage specific, we examined the expression patterns of these genes at different developmental stages of the mouse. Interestingly, these genes are highly expressed in the early middle stage of development (blastocyst to amnion). It is commonly thought that developmental genes tend to be conservative in evolution, but some molecular changes in developmental stages should have contributed to morphological divergence in adult mammals. Therefore, we propose that the relaxed pressures indicated by the K(R)/K(C) ratio but not by K(A)/K(S) in the early middle stage of development may be important for the morphological divergence of mammals at the adult stage, whereas purifying selection detected by K(A)/K(S) occurs in the early middle developmental stage.
17.
Statistical tests of models of DNA substitution (total citations: 32; self-citations: 0; citations by others: 32)
Nick Goldman 《Journal of molecular evolution》1993,36(2):182-198
Summary: Penny et al. have written that "The most fundamental criterion for a scientific method is that the data must, in principle, be able to reject the model. Hardly any [phylogenetic] tree-reconstruction methods meet this simple requirement." The ability to reject models is of such great importance because the results of all phylogenetic analyses depend on their underlying models: to have confidence in the inferences, it is necessary to have confidence in the models. In this paper, a test statistic suggested by Cox is employed to test the adequacy of some statistical models of DNA sequence evolution used in the phylogenetic inference method introduced by Felsenstein. Monte Carlo simulations are used to assess significance levels. The resulting statistical tests provide an objective and very general assessment of all the components of a DNA substitution model; more specific versions of the test are devised to test individual components of a model. In all cases, the new analyses have the additional advantage that values of phylogenetic parameters do not have to be assumed in order to perform the tests.
18.
19.
The nucleotide substitution matrix inferred from avian data sets using cytochrome b differs considerably from the models commonly used in phylogenetic analyses. To analyze the possible effects of this particular pattern of change in phylogeny estimation we performed a computer simulation in which we started with a real sequence and used the inferred model of change to produce a tree of 10 species. Maximum parsimony (MP), maximum likelihood (ML), and various distance methods were then used to recover the topology and the branch lengths. We used two kinds of data with varying levels of variation. In addition, we tested with the removal of third positions and different weighting schemes. At low levels of variation, MP was outstanding in recovering the topology (90% correct), while unweighted pair-group method, arithmetic average (UPGMA), regardless of distances used, was poor (40%). At the higher level, most methods had a chance of around 40%-58% of finding the true tree. However, in most cases, the trees found were only slightly wrong, with only one or a few branches misplaced. On the other hand, the use of a "wrong" model had serious effects on the estimation of branch lengths (distances). Although precision was high, accuracy was poor with most methods, giving branch lengths that were biased downward. When seeded with the true distance matrix, Fitch and NJ always found the true tree, while UPGMA frequently failed to do so. The effect of removing third positions was dramatic at low levels of variation, because only one MP program was able to find a true tree at all, albeit rarely, while none of the others ever did so. At higher levels, the situation was better, but still much worse than with the whole data set.
20.
Maximum-likelihood approaches to phylogenetic estimation have the potential of great flexibility, even though current implementations are highly constrained. One such constraint has been the limitation to one-parameter models of substitution. A general implementation of Newton's maximization procedure was developed that allows the maximum likelihood method to be used with multiparameter models. The Estimate and Maximize (EM) algorithm was also used to obtain a good approximation to the maximum likelihood for a certain class of multiparameter models. The condition for which a multiparameter model will only have a single maximum on the likelihood surface was identified. Two- and three-parameter models of substitution in base-paired regions of RNA sequences were used as examples for computer simulations to show that these implementations of the maximum likelihood method are not substantially slower than one-parameter models. Newton's method is much faster than the EM method but may be subject to divergence in some circumstances. In these cases the EM method can be used to restore convergence.