Similar literature
 20 similar articles found (search time: 15 ms)
1.
Examining rates and patterns of nucleotide substitution in plants   Total citations: 19, self-citations: 0, citations by others: 19
Driven by rapid improvements in affordable computing power and by the even faster accumulation of genomic data, the statistical analysis of molecular sequence data has become an active area of interdisciplinary research. Maximum likelihood methods have become mainstream because of their desirable properties and, more importantly, their potential for providing statistically sound solutions in complex data analysis settings. In this chapter, a review of recent literature focusing on rates and patterns of nucleotide substitution in the nuclear, chloroplast, and mitochondrial genomes of plants demonstrates the power and flexibility of these new methods. The emerging picture of the nucleotide substitution process in plants is a complex one. Evolutionary rates are seen to be quite variable, both among genes and among plant lineages. However, there are hints, particularly in the chloroplast, that individual factors can have important effects on many genes simultaneously.

2.
Phylogenetic tree reconstruction frequently assumes the homogeneity of the substitution process over the whole tree. To test this assumption statistically, we propose a test based on the sample covariance matrix of the set of substitution rate matrices estimated from pairwise sequence comparison. The sample covariance matrix is condensed into a one-dimensional test statistic $\Delta = \sum_i \ln(1 + \delta_i)$, where the $\delta_i$ are the eigenvalues of the sample covariance matrix. The test does not assume a specific mutational model. It analyses the variation in the estimated rate matrices. The distribution of this test statistic is determined by simulations based on the phylogeny estimated from the data. We study the power of the test under various scenarios and apply the test to X chromosome and mtDNA primate sequence data. Finally, we demonstrate how to include rate variation in the test.
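The statistic in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: each estimated rate matrix is assumed to have been flattened into a vector beforehand, the function names are hypothetical, and the sum over eigenvalues is evaluated through the identity sum ln(1 + delta_i) = ln det(I + S), which avoids an eigenvalue solver. The simulation step that gives the null distribution is not shown.

```python
import math

def sample_covariance(vectors):
    """Unbiased sample covariance matrix of equal-length vectors."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for v in vectors:
        for a in range(d):
            for b in range(d):
                cov[a][b] += (v[a] - mean[a]) * (v[b] - mean[b])
    return [[c / (n - 1) for c in row] for row in cov]

def log_det_identity_plus(s):
    """ln det(I + S) by Gaussian elimination with partial pivoting.

    S is positive semi-definite, so I + S has a positive determinant and
    the result equals the sum of ln(1 + delta_i) over the eigenvalues of S.
    """
    d = len(s)
    m = [[s[i][j] + (1.0 if i == j else 0.0) for j in range(d)] for i in range(d)]
    log_det = 0.0
    for k in range(d):
        pivot = max(range(k, d), key=lambda r: abs(m[r][k]))
        m[k], m[pivot] = m[pivot], m[k]
        log_det += math.log(abs(m[k][k]))
        for r in range(k + 1, d):
            factor = m[r][k] / m[k][k]
            for c in range(k, d):
                m[r][c] -= factor * m[k][c]
    return log_det

def delta_statistic(rate_vectors):
    """Delta for a list of flattened rate-matrix estimates (hypothetical input layout)."""
    return log_det_identity_plus(sample_covariance(rate_vectors))
```

Identical rate estimates give a zero covariance matrix and therefore a statistic of 0; larger values indicate more variation among the pairwise estimates.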

3.
A new method for calculating evolutionary substitution rates   Total citations: 39, self-citations: 0, citations by others: 39
Summary: In this paper we present a new method for analysing molecular evolution in homologous genes based on a general stationary Markov process. The elaborate statistical analysis necessary to apply the method effectively has been performed using Monte Carlo techniques. We have applied our method to the silent third position of the codon of the five mitochondrial genes coding for identified proteins of four mammalian species (rat, mouse, cow and man). We found that the method applies satisfactorily to the first three species, while the last appears to be outside the scope of the present approach. The method allows one to calculate the evolutionarily effective silent substitution rate ($v_s$) for mitochondrial genes, which in the species mentioned above is $1.4 \times 10^{-8}$ nucleotide substitutions per site per year. We have also determined the divergence time ratios between the pairs mouse-cow/rat-mouse and rat-cow/rat-mouse. In both cases this value is approximately 1.4.

4.
We introduce another view of sequence evolution. Contrary to other approaches, we model the substitution process in two steps. First, we assume (arbitrary) scaled branch lengths on a given phylogenetic tree. Second, we allocate a Poisson-distributed number of substitutions on the branches. The probability of placing a mutation on a branch is proportional to its relative branch length. More importantly, the action of a single mutation on an alignment column is described by a doubly stochastic matrix, the so-called one-step mutation matrix. This matrix leads to analytical formulae for the posterior probability distribution of the number of substitutions for an alignment column.
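The two-step process described above can be sketched as follows. All names here are hypothetical, and the one-step matrix is an assumed Jukes-Cantor-like example (each nucleotide changes to one of the other three with probability 1/3); the abstract does not specify the matrix actually used.

```python
import math
import random

def sample_poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def allocate_substitutions(branch_lengths, expected_total, rng):
    """Place a Poisson number of substitutions on branches.

    Each substitution lands on a branch with probability proportional
    to that branch's share of the total tree length.
    """
    n = sample_poisson(rng, expected_total)
    names = list(branch_lengths)
    total = sum(branch_lengths.values())
    weights = [branch_lengths[name] / total for name in names]
    counts = dict.fromkeys(names, 0)
    for name in rng.choices(names, weights=weights, k=n):
        counts[name] += 1
    return counts

# Assumed example of a one-step mutation matrix over A, C, G, T.
ONE_STEP = [[0.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]

def is_doubly_stochastic(matrix, tol=1e-12):
    """True if entries are non-negative and every row and column sums to 1."""
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in matrix)
    cols_ok = all(abs(sum(col) - 1.0) < tol for col in zip(*matrix))
    nonneg = all(x >= 0.0 for row in matrix for x in row)
    return rows_ok and cols_ok and nonneg
```

A call such as `allocate_substitutions({"a": 1.0, "b": 3.0}, 5.0, random.Random(1))` returns a count per branch; over many draws, branch `b` would receive about three times as many substitutions as branch `a`.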

5.
Statistical methods for detecting molecular adaptation   Total citations: 2, self-citations: 0, citations by others: 2
The past few years have seen the development of powerful statistical methods for detecting adaptive molecular evolution. These methods compare synonymous and nonsynonymous substitution rates in protein-coding genes, and regard a nonsynonymous rate elevated above the synonymous rate as evidence for Darwinian selection. Numerous cases of molecular adaptation are being identified in various systems from viruses to humans. Although previous analyses averaging rates over sites and time have little power, recent methods designed to detect positive selection at individual sites and lineages have been successful. Here, we summarize recent statistical methods for detecting molecular adaptation, and discuss their limitations and possible improvements.
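The comparison of nonsynonymous and synonymous rates described above is usually summarized as the ratio omega = dN/dS. The sketch below only illustrates that ratio and its conventional reading; the function names and labels are hypothetical, and a point estimate above 1 is not a statistical test of the kind the abstract reviews.

```python
def omega(dn, ds):
    """Ratio of nonsynonymous (dN) to synonymous (dS) substitution rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds

def interpret(dn, ds):
    """Conventional reading of omega; a real analysis would test significance."""
    ratio = omega(dn, ds)
    if ratio > 1.0:
        return "candidate positive selection"
    if ratio < 1.0:
        return "purifying selection"
    return "neutral"
```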

6.
Extreme environmental perturbations are rare, but may have important evolutionary consequences. Responses to current perturbations may provide important information about the ability of living organisms to cope with similar conditions in the evolutionary past. Radioactive contamination from Chernobyl constitutes one such extreme perturbation, with significant but highly variable impact on local population density and mutation rates of different species of animals and plants. We explicitly tested the hypothesis that species with strong impacts of radiation on abundance were those with high rates of historical mutation accumulation as reflected by cytochrome b mitochondrial DNA base‐pair substitution rates during past environmental perturbations. Using a dataset of 32 species of birds, we show higher historical mitochondrial substitution rates in species with the strongest negative impact of local levels of radiation on local population density. These effects were robust to different estimates of impact of radiation on abundance, weighting of estimates of abundance by sample size, statistical control for similarity in the response among species because of common phylogenetic descent, and effects of population size and longevity. Therefore, species that respond strongly to the impact of radiation from Chernobyl are also the species that in the past have been most susceptible to factors that have caused high substitution rates in mitochondrial DNA.

7.
The variation of amino acid substitution rates in proteins depends on several variables. Among these, the protein's expression level, functional category, essentiality, or metabolic costs of its amino acid residues may play an important role. However, the relative importance of each variable has not yet been evaluated in comparative analyses. To this end, we performed regression analyses combining data available on these variables and on evolutionary rates, in two well-documented model bacteria, Escherichia coli and Bacillus subtilis. In both bacteria, the level of expression of the protein in the cell was by far the most important driving force constraining the amino acid substitution rate. Subsequent inclusion in the analysis of the other variables added little further information. Furthermore, when the rates of synonymous substitutions were included in the analysis of the E. coli data, only the variable expression levels remained statistically significant. The rate of nonsynonymous substitution was shown to correlate with expression levels independently of the rate of synonymous substitution. These results suggest an important direct influence of expression levels, or at least codon usage bias for translation optimization, on the rates of nonsynonymous substitutions in bacteria. They also indicate that when a control for this variable is included, essentiality plays no significant role in the rate of protein evolution in bacteria, as is the case in eukaryotes.

8.
Genetic sequence data typically exhibit variability in substitution rates across sites. In practice, there is often too little variation to fit a different rate for each site in the alignment, but the distribution of rates across sites may not be well modeled using simple parametric families. Mixtures of different distributions can capture more complex patterns of rate variation, but are often parameter-rich and difficult to fit. We present a simple hierarchical model in which a baseline rate distribution, such as a gamma distribution, is discretized into several categories, the quantiles of which are estimated using a discretized beta distribution. Although this approach involves adding only two extra parameters to a standard distribution, a wide range of rate distributions can be captured. Using simulated data, we demonstrate that a "beta-" model can reproduce the moments of the rate distribution more accurately than the distribution used to simulate the data, even when the baseline rate distribution is misspecified. Using hepatitis C virus and mammalian mitochondrial sequences, we show that a beta- model can fit as well as or better than a model with multiple discrete rate categories, and compares favorably with a model which fits a separate rate category to each site. We also demonstrate this discretization scheme in the context of codon models specifically aimed at identifying individual sites undergoing adaptive or purifying evolution.

9.
Rates of biological diversification should ultimately correspond to rates of genome evolution. Recent studies have compared diversification rates with phylogenetic branch lengths, but incomplete phylogenies hamper such analyses for many taxa. Herein, we use pairwise comparisons of confamilial sauropsid (bird and reptile) mitochondrial DNA (mtDNA) genome sequences to estimate substitution rates. These molecular evolutionary rates are considered in light of the age and species richness of each taxonomic family, using a random-walk speciation–extinction process to estimate rates of diversification. We find the molecular clock ticks at disparate rates in different families and at different genes. For example, evolutionary rates are relatively fast in snakes and lizards, intermediate in crocodilians and slow in turtles and birds. There was also rate variation across genes, where non-synonymous substitution rates were fastest at ATP8 and slowest at CO3. Family-by-gene interactions were significant, indicating that local clocks vary substantially among sauropsids. Most importantly, we find evidence that mitochondrial genome evolutionary rates are positively correlated with speciation rates and with contemporary species richness. Nuclear sequences are poorly represented among reptiles, but the correlation between rates of molecular evolution and species diversification also extends to 18 avian nuclear genes we tested. Thus, the nuclear data buttress our mtDNA findings.

10.
As species richness varies along the tree of life, there is great interest in identifying factors that affect the rates at which lineages speciate or go extinct. To this end, theoretical biologists have developed a suite of phylogenetic comparative methods that aim to identify where shifts in diversification rates have occurred along a phylogeny and whether they are associated with some traits. Using these methods, numerous studies have predicted that speciation and extinction rates vary across the tree of life. In this study, we show that asymmetric rates of sequence evolution lead to systematic biases in the inferred phylogeny, which in turn lead to erroneous inferences regarding lineage diversification patterns. The results demonstrate that as the asymmetry in sequence evolution rates increases, so does the tendency to select more complicated models that include the possibility of diversification rate shifts. These results thus suggest that any inference regarding shifts in diversification pattern should be treated with great caution, at least until any biases regarding the molecular substitution rate have been ruled out.

11.
Summary: The nucleotide substitution rate in structural portions of the embryonic β-globin genes of placental mammals is lower than that for the adult β-globin genes. This difference occurs entirely within the class of substitutions that result in nonsynonymous (replacement) differences between these genes, and therefore represents a constraint on the structure of the mammalian embryonic β-globin proteins relative to the adult proteins (Shapiro et al. 1983; Hardison 1984). A similar effect has also been observed in marsupial mammals (Koop and Goodman 1988). In an effort to determine whether the observed rates are evidence of a uniform degree of selective constraint on the embryonic β-globin genes, analyses were performed that compared replacement substitution rates. The analyses reveal that embryonic β-globin genes appear to have been fixing replacement substitutions at nearly the same average rate not only in placental and marsupial mammals but in avian and amphibian species as well. In contrast, the adult β-globin genes from these organisms appear to have a more variable rate of replacement substitution with an especially low rate for birds. In the chicken (Gallus gallus), the adult β-globin gene replacement substitution rate appears to be lower than the embryonic replacement substitution rate.

12.
We develop a new model for studying the molecular evolution of protein-coding DNA sequences. In contrast to existing models, we incorporate the potential for site-to-site heterogeneity of both synonymous and nonsynonymous substitution rates. We demonstrate that within-gene heterogeneity of synonymous substitution rates appears to be common. Using the new family of models, we investigate the utility of a variety of new statistical inference procedures, and we pay particular attention to issues surrounding the detection of sites undergoing positive selection. We discuss how failure to model synonymous rate variation can lead to misidentification of sites as positively selected.

13.
IQPNNI: moving fast through tree space and stopping in time   Total citations: 12, self-citations: 0, citations by others: 12
An efficient tree reconstruction method (IQPNNI) is introduced to reconstruct a phylogenetic tree based on DNA or amino acid sequence data. Our approach combines various fast algorithms to generate a list of potential candidate trees. The key ingredient is the definition of so-called important quartets (IQs), which allow the computation of an intermediate tree in $O(n^2)$ time for n sequences. The resulting tree is then further optimized by applying the nearest neighbor interchange (NNI) operation. Subsequently a random fraction of the sequences is deleted from the best tree found so far. The deleted sequences are then re-inserted in the smaller tree using the important quartet puzzling (IQP) algorithm. These steps are repeated several times and the best tree, with respect to the likelihood criterion, is considered as the inferred phylogenetic tree. Moreover, we suggest a rule which indicates when to stop the search. Simulations show that IQPNNI gives a slightly better accuracy than other programs tested. Moreover, we applied the approach to 218 small subunit rRNA sequences and 500 rbcL sequences. We found trees with higher likelihood compared to the results by others. A program to reconstruct DNA or amino acid based phylogenetic trees is available online (http://www.bi.uni-duesseldorf.de/software/iqpnni).

14.
The study of recent human evolution, or the origin of modern humans, is currently dominated by two theories. The recent African origin hypothesis holds that there was a single origin of modern humans in Africa about 100,000 years ago, after which these humans dispersed throughout the rest of the world, mixing little or not at all with nonmodern populations. The multiregional evolution hypothesis holds that there was no single origin of modern humans but, instead, that the mutations and other traits that led to modern humans were spread in concert throughout the old world by gene flow, leading to genetic continuity among old world populations during the past million years. Although both of these theories are based on observations stemming from the fossil record, much discussion and controversy during the past six years has focused on the application and interpretation of studies of DNA variation, particularly mitochondrial DNA (mtDNA). The past year, especially, has brought new data, interpretations, and controversies. Indeed, I initially resisted writing this review, on the grounds that new information would be likely to render it obsolete by the time it was published. However, now that the dust is starting to settle, it seems timely to review various investigations and interpretations and where they are likely to lead. While the focus of this review is the mtDNA story, brief mention is made of studies of nuclear DNA variation (both autosomal and Y-chromosome DNA) and the implications of the genetic data with regard to the fossil record and our understanding of recent human evolution.

15.
Likelihood methods for detecting temporal shifts in diversification rates   Total citations: 8, self-citations: 0, citations by others: 8
Maximum likelihood is a potentially powerful approach for investigating the tempo of diversification using molecular phylogenetic data. Likelihood methods distinguish between rate-constant and rate-variable models of diversification by fitting birth-death models to phylogenetic data. Because model selection in this context is a test of the null hypothesis that diversification rates have been constant over time, strategies for selecting best-fit models must minimize Type I error rates while retaining power to detect rate variation when it is present. Here I examine model selection, parameter estimation, and power to reject the null hypothesis using likelihood models based on the birth-death process. The Akaike information criterion (AIC) has often been used to select among diversification models; however, I find that selecting models based on the lowest AIC score leads to a dramatic inflation of the Type I error rate. When appropriately corrected to reduce Type I error rates, the birth-death likelihood approach performs as well as or better than the widely used gamma statistic, at least when diversification rates have shifted abruptly over time. Analyses of datasets simulated under a range of rate-variable diversification scenarios indicate that the birth-death likelihood method has much greater power to detect variation in diversification rates when extinction is present. Furthermore, this method appears to be the only approach available that can distinguish between a temporal increase in diversification rates and a rate-constant model with nonzero extinction. I illustrate use of the method by analyzing a published phylogeny for Australian agamid lizards.
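The model-selection issue raised above can be made concrete with a small sketch. It uses the standard definition AIC = 2k - 2 ln L and contrasts "pick the lowest AIC" with requiring the AIC difference to exceed a critical value. The function names are hypothetical, and `critical_delta` is an assumed parameter that would have to be calibrated, for example by simulating trees under the rate-constant null; the abstract does not give its value.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def prefer_rate_variable(lnl_constant, k_constant, lnl_variable, k_variable,
                         critical_delta=0.0):
    """Compare a rate-constant model with a rate-variable one.

    Returns (decision, delta) where delta = AIC_constant - AIC_variable.
    With critical_delta = 0 this is plain lowest-AIC selection; a larger,
    simulation-calibrated value makes rejecting rate constancy harder.
    """
    delta = aic(lnl_constant, k_constant) - aic(lnl_variable, k_variable)
    return delta > critical_delta, delta
```

For instance, log-likelihoods of -100 (one parameter) and -97 (three parameters) give a difference of 2: lowest-AIC selection would favor the rate-variable model, while an assumed critical value of 4 would retain the rate-constant model.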

16.
We derive an expectation maximization algorithm for maximum-likelihood training of substitution rate matrices from multiple sequence alignments. The algorithm can be used to train hidden substitution models, where the structural context of a residue is treated as a hidden variable that can evolve over time. We used the algorithm to train hidden substitution matrices on protein alignments in the Pfam database. Measuring the accuracy of multiple alignment algorithms with reference to BAliBASE (a database of structural reference alignments), our substitution matrices consistently outperform the PAM series, with the improvement steadily increasing as up to four hidden site classes are added. We discuss several applications of this algorithm in bioinformatics.

17.
The Japanese quail (Coturnix japonica; JQ) is one of the domesticated fowl species of Japan. To provide DNA sequence information for examination of its phylogenetic position in the order Galliformes, the complete sequence of the JQ mitochondria was determined. Sequence analysis revealed that the JQ mitochondrial genome is a circular DNA of 16 697 basepairs (bp), which is smaller than the chicken mitochondrial DNA of 16 775 bp, but the genomic structure of JQ mitochondria was the same as that of the chicken. The sequence homologies of all mitochondrial genes, including those for 12S and 16S ribosomal RNA (rRNA), between Japanese quail and chicken ranged from 78.0 to 89.9%. Because the sequences of NADH dehydrogenase subunit 2 and cytochrome b genes had been reported in five species [Phasianus colchicus (ring-neck pheasant: RP), Gallus gallus domesticus (chicken: CH), Perdix perdix (grey partridge: GP), Bambusicola thoracia (Chinese bamboo partridge: CP), and Aythya americana (redhead: RH)], the concatenated nucleotide sequences (2184 bp) and amino acid sequences of these two genes were used in a phylogenetic analysis of JQ against these five species using a maximum likelihood (ML) method. Analyses using the first and second bases of the codons, and the third base of the codons, indicated a phylogenetic tree of [RH, (RP, GP), (JQ, (CH, CP))]. A phylogenetic tree of [RH, JQ, (RP, GP), (CH, CP)] was determined using amino acid sequences. Because the local bootstrap values for the JQ branch in these trees are not high, additional sequence is necessary for construction of a reliable tree.

18.
Choice of a substitution model is a crucial step in the maximum likelihood (ML) method of phylogenetic inference, and investigators tend to prefer complex mathematical models to simple ones. However, when complex models with many parameters are used, the extent of noise in statistical inferences increases, and thus complex models may not produce the true topology with a higher probability than simple ones. This problem was studied using computer simulation. When the number of nucleotides used was relatively large (1000 bp), the HKY+Gamma model showed a smaller $d_T$ (topological distance between the inferred and the true trees) than the JC and Kimura models. For shorter sequences (300 bp), a simpler model and search algorithm, such as the JC model and SA+NNI search, were found to be as efficient as more complicated searches and models in terms of topological distances, although the topologies obtained under the HKY+Gamma model had the highest likelihood values. The performance of the relatively simple search algorithm SA+NNI was found to be essentially the same as that of the more extensive SA+TBR search under all models studied. Similarly to the conclusions reached by Takahashi and Nei [Mol. Biol. Evol. 17 (2000) 1251], our results indicate that simple models can be as efficient as complex models, and that use of complex models does not necessarily give more reliable trees compared with simple models.

19.
Summary: A method of estimating the number of nucleotide substitutions from amino acid sequence data is developed by using Dayhoff's mutation probability matrix. This method takes into account the effect of nonrandom amino acid substitutions and gives an estimate which is similar to the value obtained by Fitch's counting method, but larger than the estimate obtained under the assumption of random substitutions (Jukes and Cantor's formula). Computer simulations based on Dayhoff's mutation probability matrix have suggested that Jukes and Holmquist's method of estimating the number of nucleotide substitutions gives an overestimate when amino acid substitution is not random and the variance of the estimate is generally very large. It is also shown that when the number of nucleotide substitutions is small, this method tends to give an overestimate even when amino acid substitution is purely at random.
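For orientation, the random-substitution assumption mentioned above can be illustrated with the textbook Jukes-Cantor correction for nucleotide sequences, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. This is an assumption-laden sketch: it is the standard nucleotide form of the formula, not the Dayhoff-matrix method of the abstract, and the function name is hypothetical.

```python
import math

def jukes_cantor_distance(p):
    """Expected substitutions per site given observed proportion p of differing sites.

    Assumes all nucleotide changes are equally likely (random substitution).
    The correction is undefined once p reaches 3/4 (saturation).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected distance always exceeds the raw proportion for p > 0, because multiple substitutions at the same site hide some of the change.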

20.
In risk assessment and environmental monitoring studies, concentration measurements frequently fall below detection limits (DL) of measuring instruments, resulting in left-censored data. The principal approaches for handling censored data include the substitution-based method, maximum likelihood estimation, robust regression on order statistics, and Kaplan-Meier. In practice, censored data are substituted with an arbitrary value prior to use of traditional statistical methods. Although some studies have evaluated the substitution performance in estimating population characteristics, they have focused mainly on normally and lognormally distributed data that contain a single DL. We employ Monte Carlo simulations to assess the impact of substitution when estimating population parameters based on censored data containing multiple DLs. We also consider different distributional assumptions including lognormal, Weibull, and gamma. We show that the reliability of the estimates after substitution is highly sensitive to distributional characteristics such as mean, standard deviation, skewness, and also data characteristics such as censoring percentage. The results highlight that although the performance of the substitution-based method improves as the censoring percentage decreases, its performance still depends on the population's distributional characteristics. Practical implications that follow from our findings indicate that caution must be taken in using the substitution method when analyzing censored environmental data.
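The substitution-based method discussed above can be sketched as follows. The data layout is an assumption made for illustration: each measurement is a `(value, detection_limit)` pair, with `value` set to `None` when the reading fell below its detection limit. The substitution factors shown (0, 1/2, 1) are common conventions, not values taken from the abstract.

```python
import statistics

def substitute_censored(measurements, factor):
    """Replace each censored reading with factor * its own detection limit."""
    return [dl * factor if value is None else value for value, dl in measurements]

def summary_after_substitution(measurements, factor):
    """Mean and sample standard deviation after substitution."""
    filled = substitute_censored(measurements, factor)
    return statistics.mean(filled), statistics.stdev(filled)

# Hypothetical sample with two different detection limits (1.0 and 0.5).
sample = [(None, 1.0), (2.0, 1.0), (None, 0.5), (3.0, 0.5)]
for factor in (0.0, 0.5, 1.0):
    mean, sd = summary_after_substitution(sample, factor)
    print(f"factor={factor}: mean={mean:.4f}, sd={sd:.4f}")
```

Running the loop shows how the estimated mean shifts with the arbitrary choice of factor, which is the sensitivity the abstract warns about.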
