Similar literature
Found 20 similar articles (search time: 46 ms)
1.
Previous work has shown that it is often essential to account for the variation in rates at different sites in phylogenetic models in order to avoid phylogenetic artifacts such as long branch attraction. In most current models, the gamma distribution is used for the rates-across-sites distributions and is implemented as an equal-probability discrete gamma. In this article, we introduce discrete distribution estimates with large numbers of equally spaced rate categories allowing us to investigate the appropriateness of the gamma model. With large numbers of rate categories, these discrete estimates are flexible enough to approximate the shape of almost any distribution. Likelihood ratio statistical tests and a nonparametric bootstrap confidence-bound estimation procedure based on the discrete estimates are presented that can be used to test the fit of a parametric family. We applied the methodology to several different protein data sets, and found that although the gamma model often provides a good parametric model for this type of data, rate estimates from an equal-probability discrete gamma model with a small number of categories will tend to underestimate the largest rates. In cases when the gamma model assumption is in doubt, rate estimates coming from the discrete rate distribution estimate with a large number of rate categories provide a robust alternative to gamma estimates. An alternative implementation of the gamma distribution is proposed that, for equal numbers of rate categories, is computationally more efficient during optimization than the standard gamma implementation and can provide more accurate estimates of site rates.
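
To make the idea of equally spaced rate categories concrete, the following is a minimal sketch, not the authors' estimation procedure. It assumes that category weights are taken from a mean-one gamma density evaluated at bin midpoints; the function names, the `max_rate` cutoff, and the rescaling step are illustrative choices that do not come from the abstract.

```python
import math


def gamma_pdf(r, alpha):
    """Density of a gamma distribution with shape alpha and mean 1 (rate = alpha)."""
    if r <= 0:
        return 0.0
    log_pdf = (alpha * math.log(alpha) - math.lgamma(alpha)
               + (alpha - 1) * math.log(r) - alpha * r)
    return math.exp(log_pdf)


def equally_spaced_gamma(alpha, n_categories, max_rate):
    """Return (rates, weights) for equally spaced rate categories.

    Rates start as midpoints of n_categories equal-width bins on (0, max_rate].
    Weights are gamma densities at those midpoints, normalized to sum to 1.
    Rates are then rescaled so that the weighted mean rate is 1.
    (Hypothetical helper: the truncation at max_rate is an assumption.)
    """
    width = max_rate / n_categories
    rates = [(i + 0.5) * width for i in range(n_categories)]
    dens = [gamma_pdf(r, alpha) for r in rates]
    total = sum(dens)
    weights = [d / total for d in dens]
    mean = sum(r * w for r, w in zip(rates, weights))
    rates = [r / mean for r in rates]
    return rates, weights


rates, weights = equally_spaced_gamma(alpha=0.5, n_categories=50, max_rate=10.0)
```

Unlike the equal-probability discretization, this layout places categories in the upper tail of the distribution, which is where the abstract reports that a small number of equal-probability categories tends to underestimate rates.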

2.
It has long been recognized that the rates of molecular evolution vary amongst sites in proteins. The usual model for rate heterogeneity assumes independent rate variation according to a rate distribution. In such models the rate at a site, although random, is assumed fixed throughout the evolutionary tree. Recent work by several groups has suggested that rates at sites often vary across subtrees of the larger tree as well as across sites. This phenomenon is not captured by most phylogenetic models but instead is more similar to the covarion model of Fitch and coworkers. In this article we present methods that can be useful in detecting whether different rates occur in two different subtrees of the larger tree and where these differences occur. Parametric bootstrapping and orthogonal regression methodologies are used to test for rate differences and to make statements about the general differences in the rates at sites. Confidence intervals based on the conditional distributions of rates at sites are then used to detect where the rate differences occur. Such methods will be helpful in studying the phylogenetic, structural, and functional bases of changes in evolutionary rates at sites, a phenomenon that has important consequences for deep phylogenetic inference.

3.
Mark-recapture techniques are widely used to estimate the size of wildlife populations. However, in cetacean photo-identification studies, it is often impractical to sample across the entire range of the population. Consequently, negatively biased population estimates can result when large portions of a population are unavailable for photographic capture. To overcome this problem, we propose that individuals be sampled from a number of discrete sites located throughout the population's range. The recapture of individuals between sites can then be presented in a simple contingency table, where the cells refer to discrete categories formed by combinations of the study sites. We present a Bayesian framework for fitting a suite of log-linear models to these data, with each model representing a different hypothesis about dependence between sites. Modeling dependence facilitates the analysis of opportunistic photo-identification data from study sites located due to convenience rather than by design. Because inference about population size is sensitive to model choice, we use Bayesian Markov chain Monte Carlo approaches to estimate posterior model probabilities, and base inference on a model-averaged estimate of population size. We demonstrate this method in the analysis of photographic mark-recapture data for bottlenose dolphins from three coastal sites around NE Scotland.

4.
Invariant sites are a common feature of amino acid sequence evolution. The presence of invariant sites is frequently attributed to the need to preserve function through site-specific conservation of amino acid residues. Amino acid substitution models without a provision for invariant sites often fit the data significantly worse than those that allow for an excess of invariant sites beyond those predicted by models that only incorporate rate variation among sites (e.g., a Gamma distribution). An alternative is epistasis between sites to preserve residue interactions that can create invariant sites. Through computer-simulated sequence evolution, we evaluated the relative effects of site-specific preferences and site-site couplings in the generation of invariant sites and the modulation of the rate of molecular evolution. In an analysis of ten major families of protein domains with diverse sequence and functional properties, we find that the negative selection imposed by epistasis creates many more invariant sites than site-specific residue preferences alone. Further, epistasis plays an increasingly larger role in creating invariant sites over longer evolutionary periods. Epistasis also dictates rates of domain evolution over time by exerting significant additional purifying selection to preserve site couplings. These patterns illuminate the mechanistic role of epistasis in the processes underlying observed site invariance and evolutionary rates.

5.
Substitution rates are one of the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and likely caused by variation in functional constraint across the genes encoded in the sequences. Rate variation across alignment sites is important to accommodate in a phylogenetic analysis; failure to account for across-site rate variation can cause biased estimates of phylogeny or other model parameters. Traditionally, rate variation across sites has been modeled by treating the rate for a site as a random variable drawn from some probability distribution (such as the gamma probability distribution) or by partitioning sites to different rate classes and estimating the rate for each class independently. We consider a different approach, related to site-specific models in which sites are partitioned to rate classes. However, instead of treating the partitioning scheme in which sites are assigned to rate classes as a fixed assumption of the analysis, we treat the rate partitioning as a random variable under a Dirichlet process prior. We find that the Dirichlet process prior model for across-site rate variation fits alignments of DNA sequence data better than commonly used models of across-site rate variation. The method appears to identify the underlying codon structure of protein-coding genes; rate partitions that were sampled by the Markov chain Monte Carlo procedure were closer to a partition in which sites are assigned to rate classes by codon position than to randomly permuted partitions but still allow for additional variability across sites.
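
A Dirichlet process prior over partitions can be simulated with the Chinese restaurant process. The sketch below draws one random assignment of sites to rate classes from that prior. It is an illustration only: it does not reproduce the authors' MCMC sampler, and the function name and `concentration` parameter are assumed for the example.

```python
import random


def sample_crp_partition(n_sites, concentration, rng):
    """Assign n_sites to rate classes by the Chinese restaurant process.

    Each site joins existing class k with probability proportional to the
    number of sites already in k, or opens a new class with probability
    proportional to `concentration` (hypothetical parameter name).
    """
    assignments = []
    class_sizes = []
    for _ in range(n_sites):
        # The last weight corresponds to opening a new class.
        weights = class_sizes + [concentration]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            class_sizes.append(1)
        else:
            class_sizes[k] += 1
        assignments.append(k)
    return assignments, class_sizes


rng = random.Random(0)
assignments, class_sizes = sample_crp_partition(300, concentration=1.0, rng=rng)
```

Under this prior the number of rate classes is not fixed in advance. Larger values of the concentration parameter favor partitions with more classes.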

6.
The w statistic introduced by Lockhart et al. (1998. A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages. Mol Biol Evol. 15:1183-1188) is a simple and easily calculated statistic intended to detect heterotachy by comparing amino acid substitution patterns between two monophyletic groups of protein sequences. It is defined as the difference between the fraction of varied sites in both groups and the fraction of varied sites in each group. The w test has been used to distinguish a covarion process from equal-rates and rates-across-sites processes. Using simulation we show that the w test is effective for small data sets and for data sets that have low substitution rates in the groups but can have difficulties when these conditions are not met. Using site entropy as a measure of variability of a sequence site, we modify the w statistic to a w' statistic by assigning as varied in one group those sites that are actually varied in both groups but have a large entropy difference. We show that the w' test has more power to detect two kinds of heterotachy processes (covarion and bivariate rate shifts) in large and variable data. We also show that a test of Pearson's correlation of the site entropies between two monophyletic groups can be used to detect heterotachy and has more power than the w' test. Furthermore, we demonstrate that there are settings where the correlation test as well as w and w' tests do not detect heterotachy signals in data simulated under a branch length mixture model. In such cases, it is sometimes possible to detect heterotachy through subselection of appropriate taxa. Finally, we discuss the abilities of the three statistical tests to detect a fourth mode of heterotachy: lineage-specific changes in proportion of variable sites.
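
The entropy-correlation idea can be sketched as follows. This is a minimal illustration assuming Shannon entropy of the residue frequencies in each alignment column, with no special handling of gaps. The function names are hypothetical, and the code computes only the correlation, not the significance test described in the abstract.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def entropy_correlation(group_a, group_b):
    """Correlate per-site entropies of two aligned groups of equal length."""
    ent_a = [site_entropy(col) for col in zip(*group_a)]
    ent_b = [site_entropy(col) for col in zip(*group_b)]
    return pearson(ent_a, ent_b)


# Toy alignments (made-up sequences, four sites each).
group_a = ["ACDE", "ACDF", "ACEG", "ADFH"]
group_b = ["ACDE", "ACDE", "ACDF", "ACEG"]
r = entropy_correlation(group_a, group_b)
```

A correlation near 1 indicates that the same sites are variable in both groups. A lower correlation would be consistent with site rates that differ between the groups.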

7.
Characterizing the microenvironment surrounding protein sites. (Citations: 4 total, 0 self-citations, 4 by others)
Sites are microenvironments within a biomolecular structure, distinguished by their structural or functional role. A site can be defined by a three-dimensional location and a local neighborhood around this location in which the structure or function exists. We have developed a computer system to facilitate structural analysis (both qualitative and quantitative) of biomolecular sites. Our system automatically examines the spatial distributions of biophysical and biochemical properties, and reports those regions within a site where the distribution of these properties differs significantly from control nonsites. The properties range from simple atom-based characteristics such as charge to polypeptide-based characteristics such as type of secondary structure. Our analysis of sites uses nonsites as controls, providing a baseline for the quantitative assessment of the significance of the features that are uncovered. In this paper, we use radial distributions of properties to study three well-known sites (the binding sites for calcium, the milieu of disulfide bridges, and the serine protease active site). We demonstrate that the system automatically finds many of the previously described features of these sites and augments these features with some new details. In some cases, we cannot confirm the statistical significance of previously reported features. Our results demonstrate that analysis of protein structure is sensitive to assumptions about background distributions, and that these distributions should be considered explicitly during structural analyses.

8.
Detection-nondetection data are often used to investigate species range dynamics using Bayesian occupancy models which rely on the use of Markov chain Monte Carlo (MCMC) methods to sample from the posterior distribution of the parameters of the model. In this article we develop two Variational Bayes (VB) approximations to the posterior distribution of the parameters of a single-season site occupancy model which uses logistic link functions to model the probabilities of species occurrence at sites and of species detection. This task is accomplished through the development of iterative algorithms that do not use MCMC methods. Simulations and small practical examples demonstrate the effectiveness of the proposed technique. We specifically show that (under certain circumstances) the variational distributions can provide accurate approximations to the true posterior distributions of the parameters of the model when the number of visits per site (K) is as low as three and that the accuracy of the approximations improves as K increases. We also show that the methodology can be used to obtain the posterior distribution of the predictive distribution of the proportion of sites occupied (PAO).
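
For orientation, the likelihood that such approximations target can be sketched for the simplest case. The code below assumes an intercept-only single-season occupancy model with logistic links, so occupancy and detection probabilities are constant across sites and visits. The parameter names `beta0` and `alpha0` are hypothetical. The code evaluates the log-likelihood only and does not implement the Variational Bayes algorithms.

```python
import math


def expit(x):
    """Inverse of the logistic link."""
    return 1.0 / (1.0 + math.exp(-x))


def site_log_likelihood(history, psi, p):
    """Log-likelihood of one site's detection history (list of 0/1 over K visits).

    An occupied site yields the history with probability p^d * (1 - p)^(K - d).
    An all-zero history can also arise from an unoccupied site, with
    probability 1 - psi.
    """
    d = sum(history)
    k = len(history)
    occupied = psi * p ** d * (1 - p) ** (k - d)
    if d == 0:
        return math.log(occupied + (1 - psi))
    return math.log(occupied)


def log_likelihood(histories, beta0, alpha0):
    """Total log-likelihood; beta0 and alpha0 are intercepts on the logit scale."""
    psi = expit(beta0)   # occupancy probability
    p = expit(alpha0)    # per-visit detection probability
    return sum(site_log_likelihood(h, psi, p) for h in histories)


# Made-up detection histories for four sites with K = 3 visits each.
histories = [[0, 0, 0], [1, 0, 1], [0, 1, 0], [0, 0, 0]]
ll = log_likelihood(histories, beta0=0.0, alpha0=0.0)
```

The mixture term for all-zero histories is what makes the posterior awkward to compute exactly, which is one reason MCMC or variational approximations are used.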

9.
Nucleotide substitution in both coding and noncoding regions is context-dependent, in the sense that substitution rates depend on the identity of neighboring bases. Context-dependent substitution has been modeled in the case of two sequences and an unrooted phylogenetic tree, but it has only been accommodated in limited ways with more general phylogenies. In this article, extensions are presented to standard phylogenetic models that allow for better handling of context-dependent substitution, yet still permit exact inference at reasonable computational cost. The new models improve goodness of fit substantially for both coding and noncoding data. Considering context dependence leads to much larger improvements than does using a richer substitution model or allowing for rate variation across sites, under the assumption of site independence. The observed improvements appear to derive from three separate properties of the models: their explicit characterization of context-dependent substitution within N-tuples of adjacent sites, their ability to accommodate overlapping N-tuples, and their rich parameterization of the substitution process. Parameter estimation is accomplished using an expectation maximization algorithm, with a quasi-Newton algorithm for the maximization step; this approach is shown to be preferable to ordinary Newton methods for parameter-rich models. Overlapping tuples are efficiently handled by assuming Markov dependence of the observed bases at each site on those at the N - 1 preceding sites, and the required conditional probabilities are computed with an extension of Felsenstein's algorithm. Estimated substitution rates based on a data set of about 160,000 noncoding sites in mammalian genomes indicate a pronounced CpG effect, but they also suggest a complex overall pattern of context-dependent substitution, comprising a variety of subtle effects. 
Estimates based on about 3 million sites in coding regions demonstrate that amino acid substitution rates can be learned at the nucleotide level, and suggest that context effects across codon boundaries are significant.

10.
The amount of missing data in many contemporary phylogenetic analyses has substantially increased relative to previous norms, particularly in supermatrix studies that compile characters from multiple previous analyses. In such cases the missing data are non‐randomly distributed and usually present in all partitions (i.e. groups of characters) sampled. Parametric methods often provide greater resolution and support than parsimony in such cases, yet this may be caused by extrapolation of branch lengths from one partition to another. In this study I use contrived and simulated examples to demonstrate that likelihood, even when applied to simple matrices with little or no homoplasy, homogeneous evolution across groups of characters, perfect model fit, and hundreds or thousands of variable characters, can provide strong support for incorrect topologies when the matrices have non‐random distributions of missing data distributed across all partitions. I do so using a systematic exploration of alternative seven‐taxon tree topologies and distributions of missing data in two partitions to demonstrate that these likelihood‐based artefacts may occur frequently and are not shared by parsimony. I also demonstrate that Bayesian Markov chain Monte Carlo analysis is more robust to these artefacts than is likelihood. © The Willi Hennig Society 2011.

11.
We have investigated the effects of different among-site rate variation models on the estimation of substitution model parameters, branch lengths, topology, and bootstrap proportions under minimum evolution (ME) and maximum likelihood (ML). Specifically, we examined equal rates, invariable sites, gamma-distributed rates, and site-specific rates (SSR) models, using mitochondrial DNA sequence data from three protein-coding genes and one tRNA gene from species of the New Zealand cicada genus Maoricicada. Estimates of topology were relatively insensitive to the substitution model used; however, estimates of bootstrap support, branch lengths, and R-matrices (underlying relative substitution rate matrix) were strongly influenced by the assumptions of the substitution model. We identified one situation where ME and ML tree building became inaccurate when implemented with an inappropriate among-site rate variation model. Despite the fact that the SSR models often have a better fit to the data than do invariable sites and gamma rates models, SSR models have some serious weaknesses. First, SSR rate parameters are not comparable across data sets, unlike the proportion of invariable sites or the alpha shape parameter of the gamma distribution. Second, the extreme among-site rate variation within codon positions is problematic for SSR models, which explicitly assume rate homogeneity within each rate class. Third, the SSR models appear to give severe underestimates of R-matrices and branch lengths relative to invariable sites and gamma rates models in this example. We recommend performing phylogenetic analyses under a range of substitution models to test the effects of model assumptions not only on estimates of topology but also on estimates of branch length and nodal support.

12.
In ligand binding studies, ligand depletion often limits the accuracy of the results obtained. This problem is approached by employing the simple observation that as the concentration of receptor in the assay is reduced, ligand depletion is also reduced. Measuring apparent K(D) values of a ligand at multiple concentrations of receptor with extrapolation to infinitely low receptor concentration takes ligand depletion into account and, depending on the binding model employed, yields a K(D) within the defined limits of accuracy. We apply this analysis to the binding of epidermal growth factor (EGF) to the EGF receptor expressed in intact 32D cells, using a homogeneous fluorescein-labeled preparation of EGF and measuring binding by flow cytometry. Binding isotherms were carried out at varying cell densities with each isotherm fit to the generally applied model with two independent binding sites. Examination of the variation in the K(D) values versus cell density yields a high-affinity site that accounts for 18% of the sites and a lower affinity site that accounts for the remainder. However, further examination of these data suggests that while consistent with each individual isotherm, the simple model of two independent binding sites that is generally applied to EGF binding to the EGF receptor is inconsistent with the changes in the apparent K(D) values seen across varying cell densities.
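
The extrapolation step can be illustrated under a simplifying assumption that is not the model used in the abstract: a single class of binding sites, with the apparent K(D) read off as the total ligand concentration at half-maximal binding. In that case half of the receptor is bound when free ligand equals K(D), so the apparent value is K(D) + R_total/2 and the intercept of a straight-line fit at zero receptor concentration recovers K(D). The function name and the example values are hypothetical.

```python
def extrapolate_kd(receptor_concs, apparent_kds):
    """Least-squares line through (receptor concentration, apparent K_D).

    Returns (intercept, slope). The intercept estimates K_D at infinitely
    low receptor concentration, assuming a linear dependence (exact for a
    one-site model with K_D read from total ligand at half saturation).
    """
    n = len(receptor_concs)
    mx = sum(receptor_concs) / n
    my = sum(apparent_kds) / n
    sxx = sum((x - mx) ** 2 for x in receptor_concs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(receptor_concs, apparent_kds))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Synthetic example: true K_D = 2.0 (arbitrary units), one-site depletion.
receptor = [0.5, 1.0, 2.0, 4.0]
apparent = [2.0 + r / 2 for r in receptor]
kd, slope = extrapolate_kd(receptor, apparent)
```

With more than one class of sites, as in the two-site model fit in the abstract, the dependence on receptor concentration need not be linear, so a straight-line fit would be only a first approximation.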

13.
We present an approach for identifying genes under natural selection using polymorphism and divergence data from synonymous and non-synonymous sites within genes. A generalized linear mixed model is used to model the genome-wide variability among categories of mutations and estimate its functional consequence. We demonstrate how the model's estimated fixed and random effects can be used to identify genes under selection. The parameter estimates from our generalized linear model can be transformed to yield population genetic parameter estimates for quantities including the average selection coefficient for new mutations at a locus, the synonymous and non-synonymous mutation rates, and species divergence times. Furthermore, our approach incorporates stochastic variation due to the evolutionary process and can be fit using standard statistical software. The model is fit in both the empirical Bayes and Bayesian settings using the lme4 package in R, and Markov chain Monte Carlo methods in WinBUGS. Using simulated data we compare our method to existing approaches for detecting genes under selection: the McDonald-Kreitman test, and two versions of the Poisson random field based method MKprf. Overall, we find our method universally outperforms existing methods for detecting genes subject to selection using polymorphism and divergence data.
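
As a point of reference for the baseline mentioned above, the McDonald-Kreitman test compares a 2x2 table of non-synonymous and synonymous counts for fixed differences (Dn, Ds) and polymorphisms (Pn, Ps). The sketch below computes the neutrality index and a two-sided Fisher exact p-value using only the standard library. It does not implement the mixed model from the abstract, and the function names are illustrative.

```python
from math import comb


def neutrality_index(dn, ds, pn, ps):
    """Neutrality index NI = (Pn / Ps) / (Dn / Ds); NI < 1 suggests excess divergence."""
    return (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Example counts (the Adh table commonly used to illustrate the test).
dn, ds, pn, ps = 7, 17, 2, 42
ni = neutrality_index(dn, ds, pn, ps)
p_value = fisher_exact_two_sided(dn, ds, pn, ps)
```

The test treats each gene in isolation, whereas the approach in the abstract models variability across genes jointly.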

14.
Summary: I present an inclusive-fitness model for the evolution of dispersal rates of the offspring of asexual organisms living in discrete sites, which vary in available resources. I also assume a stable and saturated condition and that the offspring can respond to the variation in the capacity (amount of resources) of their natal sites. The model was tested using data obtained from the intergall migration in the yezo-spruce gall aphid, Adelges japonicus. All the parameters needed for the model, which included the cost of dispersal, both dispersal rates and available resources in each site, were estimated from field examinations. The data fit the model well, suggesting the importance of kin selection in determining the dispersal rates. Both actual and ESS dispersal rates are shown as concave functions of site capacity with a minimum rate for intermediate site capacity. The effect of both actual and ESS dispersal is to reduce, but not eliminate sibling competition within natal sites, which is most severe in intermediate site capacity.

15.
Mitochondrial DNA data have been used extensively to study evolution and early human origins. These applications require estimates of the rate at which nucleotide substitutions occur in the DNA sequence. We consider the problem of estimating substitution rates in the presence of site-to-site rate variation. A coalescent model is presented that allows for different substitution rates for purines and pyrimidines, as well as more detailed models that allow fast and slow rates within each of the purine and pyrimidine classes. A method for estimating such rates is presented. Even for these simple models of site heterogeneity, there are, typically, insufficient data to obtain reliable estimates of site-specific substitution rates. However, estimates of the average rate across all sites appear to be relatively stable even in the presence of site heterogeneity. Simulations of models with site-to-site variation in mutation rate show that hypervariable sites can produce peaks in the pairwise difference curves that have previously been attributed to population dynamics.

16.
17.
The development of parasitological immunity against malaria affects the ability to detect infection, the efficiency of the local human parasite reservoir at infecting mosquitoes, and the response to reintroduction of parasites to previously cleared areas. Observations of similar age-trends in detected prevalence and mean parasitaemia across more than an order-of-magnitude of variation in baseline transmission complicate simple exposure-driven explanations. Mathematical models often employ age-dependent immune factors to match the observed trends, while the present model uses a new detailed mechanistic model of parasite transmission dynamics to explain age-trends through the mechanism of parasite diversity. Illustrative simulations are performed for multiple field sites in Tanzania and Nigeria, and observed age-trends and seasonality in parasite prevalence are recreated in silico, proffering possible mechanistic explanations of the observational data. Observed temporal dynamics in measured parasitaemia are recreated for each location and age-prevalence outputs are studied. Increasing population-level diversity in malaria surface antigens delays development of broad parasitological immunity. A local parasite population with high diversity can recreate the observed trends in age-prevalence across more than an order of magnitude of variation in transmission intensities. Mechanistic models of human immunity and parasite antigen diversity can recreate the observed temporal patterns for the development of parasitological immunity across a wide range of transmission intensities. This has implications for the distribution of disease burden across the population, the human transmission reservoir, design of elimination campaigns, and development and roll-out of potential vaccines.

18.
When the number of nucleotides examined is relatively small, the estimators of nucleotide substitutions between DNA sequences often introduce systematic error even if the data used fit the mathematical model underlying the estimation formula. The systematic error of this kind is especially large for models that allow variation in substitution rate among different sites. In the present paper we present a number of formulas that produce virtually bias-free estimates of evolutionary distances for these models. Correspondence to: M. Nei
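
For context, the kind of estimator at issue can be illustrated with the standard Jukes-Cantor distance and its gamma-rates extension. These are the usual uncorrected estimators, not the bias-free formulas the abstract refers to, which are not given in the text. Here `p` is the observed proportion of differing sites and `alpha` is the gamma shape parameter. The function names are chosen for the example.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance with equal rates across sites (requires p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates across sites.

    d = (3/4) * alpha * [(1 - 4p/3)^(-1/alpha) - 1], which approaches the
    equal-rates distance as alpha grows large.
    """
    return 0.75 * alpha * ((1 - 4 * p / 3) ** (-1 / alpha) - 1)


p = 0.3
d_equal = jc_distance(p)
d_gamma = jc_gamma_distance(p, alpha=0.5)
```

Both functions are nonlinear in p, so sampling error in p from short sequences does not average out in the distance estimate. The effect grows as alpha gets smaller, in line with the abstract's remark that the error is especially large for models with rate variation among sites.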

19.
We used Bayesian phylogenetic analysis of 5 kb of chloroplast DNA data from 68 Sapotaceae species to clarify phylogenetic relationships within Sapotoideae, one of the two major clades within Sapotaceae. Variation in substitution rates through time was shown to be a very important aspect of molecular evolution for this data set. Relative rates tests indicated that changes in overall rate have taken place in several lineages during the history of the group and Bayes factors strongly supported a covarion model, which allows the rate of a site to vary over time, over commonly used models that only allow rates to vary across sites. Rate variation over time was actually found to be a more important model component than rate variation across sites. The covarion model was originally developed for coding gene sequences and has so far only been tested for this type of data. The fact that it performed so well with the present data set, consisting mainly of data from noncoding spacer regions, suggests that it deserves a wider consideration in model based phylogenetic inference. Repeatability of phylogenetic results was very difficult to obtain with the more parameter rich models, and analyses with identical settings often supported different topologies. Overparameterization may be the reason why the MCMC did not sample from the posterior distribution in these cases. The problem could, however, be overcome by using less parameter rich evolutionary models, and adjusting the MCMC settings. The phylogenetic results showed that two taxa, previously thought to belong in Sapotoideae, are not part of this group. Eberhardtia aurata is the sister of the two major Sapotaceae clades, Chrysophylloideae and Sapotoideae, and Neohemsleya usambarensis belongs in Chrysophylloideae. Within Sapotoideae two clades, Sideroxyleae and Sapoteae, were strongly supported. 
Bayesian analysis of the character history of some floral morphological traits showed that the ancestral type of flower in Sapotoideae may have been characterized by floral parts (sepals, petals, stamens, and staminodes) in single whorls of five, entire corolla lobes, and seeds with an adaxial hilum.

20.
Morphology reflects ecological pressures, phylogeny, and genetic and biophysical constraints. Disentangling their influence is fundamental to understanding selection and trait evolution. Here, we assess the contributions of function, phylogeny, and habitat to patterns of plastron (ventral shell) shape variation in emydine turtles. We quantify shape variation using geometric morphometrics, and determine the influence of several variables on shape using path analysis. Factors influencing plastron shape variation are similar between emydine turtles and the more inclusive Testudinoidea. We evaluate the fit of various evolutionary models to the shape data to investigate the selective landscape responsible for the observed morphological patterns. The presence of a hinge on the plastron accounts for most morphological variance, but phylogeny and habitat also correlate with shape. The distribution of shape variance across emydine phylogeny is most consistent with an evolutionary model containing two adaptive zones: one for turtles with kinetic plastra, and one for turtles with rigid plastra. Models with more complex adaptive landscapes often fit the data only as well as the null model (purely stochastic evolution). The adaptive landscape of plastron shape in Emydinae may be relatively simple because plastral kinesis imposes overriding mechanical constraints on the evolution of form.
