Similar literature
 20 similar articles found (search time: 15 ms)
1.
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth-death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program dnamlepsilon improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.
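The abstract above relies on the Felsenstein peeling algorithm with a rate matrix augmented by the gap character. The following is a minimal sketch of that idea, not the dnamlepsilon implementation. It assumes a reversible, equal-rate five-state model (A, C, G, T and the gap) with a closed-form transition probability, whereas the abstract describes a non-reversible birth-death model. The tree encoding (nested tuples of `(left, left_length, right, right_length)`) and all function names are hypothetical.

```python
import math

ALPHABET = "ACGT-"  # assumption: the gap character is simply a fifth state
K = len(ALPHABET)


def transition_prob(i, j, t, mu=1.0):
    """P(state j after time t | state i) for an equal-rate K-state model.

    Every off-diagonal rate equals mu, so P(t) has a closed form.
    """
    decay = math.exp(-K * mu * t)
    if i == j:
        return 1.0 / K + (K - 1) / K * decay
    return 1.0 / K - decay / K


def partial_likelihoods(node, column):
    """Peeling step: L[s] = P(leaves below node | node is in state s)."""
    if isinstance(node, str):  # leaf: indicator vector for the observed character
        observed = column[node]
        return [1.0 if s == observed else 0.0 for s in ALPHABET]
    left, t_left, right, t_right = node
    l_left = partial_likelihoods(left, column)
    l_right = partial_likelihoods(right, column)
    result = []
    for i in range(K):
        sum_left = sum(transition_prob(i, j, t_left) * l_left[j] for j in range(K))
        sum_right = sum(transition_prob(i, j, t_right) * l_right[j] for j in range(K))
        result.append(sum_left * sum_right)
    return result


def column_likelihood(tree, column):
    """Likelihood of one alignment column, with a uniform root distribution."""
    return sum(p / K for p in partial_likelihoods(tree, column))


def alignment_log_likelihood(tree, alignment):
    """Sum of per-column log-likelihoods; columns are treated as independent."""
    length = len(next(iter(alignment.values())))
    total = 0.0
    for c in range(length):
        column = {name: seq[c] for name, seq in alignment.items()}
        total += math.log(column_likelihood(tree, column))
    return total


if __name__ == "__main__":
    # Hypothetical three-taxon tree and toy alignment.
    tree = (("a", 0.1, "b", 0.2), 0.05, "c", 0.3)
    alignment = {"a": "AC-T", "b": "AC-T", "c": "ACGT"}
    print(alignment_log_likelihood(tree, alignment))
```

Because each column costs one post-order pass over the tree, the gap state adds only a constant factor to the usual four-state computation, which is the efficiency property the abstract refers to.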

2.
Finding correct species relationships using phylogeny reconstruction based on molecular data is dependent on several empirical and technical factors. These include the choice of DNA sequence from which phylogeny is to be inferred, the establishment of character homology within a sequence alignment, and the phylogeny algorithm used. Nevertheless, sequencing and phylogeny tools provide a way of testing certain hypotheses regarding the relationship among the organisms for which phenotypic characters demonstrate conflicting evolutionary information. The protozoan family Sarcocystidae is one such group for which molecular data have been applied phylogenetically to resolve questionable relationships. However, analyses carried out to date, particularly based on small-subunit ribosomal DNA, have not resolved all of the relationships within this family. Analysis of more than one gene is necessary in order to obtain a robust species signal, and some DNA sequences may not be appropriate in terms of their phylogenetic information content. With this in mind, we tested the informativeness of our chosen molecule, the large-subunit ribosomal DNA (lsu rDNA), by using subdivisions of the sequence in phylogenetic analysis through PAUP, fastDNAml, and neighbor joining. The segments of sequence applied correspond to areas of higher nucleotide variation in a secondary-structure alignment involving 21 taxa. We found that subdivision of the entire lsu rDNA is inappropriate for phylogenetic analysis of the Sarcocystidae. There are limited informative nucleotide sites in the lsu rDNA for certain clades, such as the one encompassing the subfamily Toxoplasmatinae. Consequently, the removal of any segment of the alignment compromises the final tree topology. We also tested the effect of using two different alignment procedures (CLUSTAL W and the structure alignment using DCSE) and three different tree-building methods on the final tree topology. 
This work shows that congruence between different methods in the formation of clades may be a feature of robust topology; however, a sequence alignment based on primary structure may not be comparing homologous nucleotides even though the expected topology is obtained. Our results support previous findings showing the paraphyly of the current genera Sarcocystis and Hammondia and again bring to question the relationships of Sarcocystis muris, Isospora felis, and Neospora caninum. In addition, results based on phylogenetic analysis of the structure alignment suggest that Sarcocystis zamani and Sarcocystis singaporensis, which have reptilian definitive hosts, are monophyletic with Sarcocystis species using mammalian definitive hosts if the genus Frenkelia is synonymized with Sarcocystis.

3.
The complete nucleotide sequence of the small ribosomal subunit RNA of the gastropod, Limicolaria kambeul, was determined and used to infer a secondary structure model. In order to clarify the phylogenetic position of the Mollusca among the Metazoa, an evolutionary tree was constructed by neighbor-joining, starting from an alignment of small ribosomal subunit RNA sequences. The Mollusca appear to be a monophyletic group, related to Arthropoda and Chordata in an unresolved trichotomy.

4.
Although probabilistic models of genotype (e.g., DNA sequence) evolution have been greatly elaborated, less attention has been paid to the effect of phenotype on the evolution of the genotype. Here we propose an evolutionary model and a Bayesian inference procedure that are aimed at filling this gap. In the model, RNA secondary structure links genotype and phenotype by treating the approximate free energy of a sequence folded into a secondary structure as a surrogate for fitness. The underlying idea is that a nucleotide substitution resulting in a more stable secondary structure should have a higher rate than a substitution that yields a less stable secondary structure. This free energy approach incorporates evolutionary dependencies among sequence positions beyond those that are reflected simply by jointly modeling change at paired positions in an RNA helix. Although there is not a formal requirement with this approach that secondary structure be known and nearly invariant over evolutionary time, computational considerations make these assumptions attractive and they have been adopted in a software program that permits statistical analysis of multiple homologous sequences that are related via a known phylogenetic tree topology. Analyses of 5S ribosomal RNA sequences are presented to illustrate and quantify the strong impact that RNA secondary structure has on substitution rates. Analyses on simulated sequences show that the new inference procedure has reasonable statistical properties. Potential applications of this procedure, including improved ancestral sequence inference and location of functionally interesting sites, are discussed.

5.
In order to maximise the positional homology in the primary sequence alignment of the second internal transcribed spacer for 30 species of equine strongyloid nematodes, the secondary structures of the precursor ribosomal RNA were predicted using an approach combining an energy minimisation method and comparative sequence analysis. The results indicated that a common secondary structure model of the second internal transcribed spacer of these nematodes was maintained, despite significant interspecific differences (2–56%) in primary sequences. The secondary structure model was then used to refine the primary second internal transcribed spacer sequence alignment. The “manual” and “structure” alignments were both subjected to phylogenetic analysis using three different tree-building methods to compare the effect of using different sequence alignments on phylogenetic inference. The topologies of the phylogenetic trees inferred from the manual second internal transcribed spacer alignment were usually different to those derived from the structure second internal transcribed spacer alignment. The results suggested that the positional homology in the second internal transcribed spacer primary sequence alignment was maximised when the secondary structure model was taken into consideration.

6.
7.
Summary: We present compositional statistics, a new method of phylogenetic inference, which is an extension of evolutionary parsimony. Compositional statistics takes account of the base composition of the compared sequences by using nucleotide positions that evolutionary parsimony ignores. It shares with evolutionary parsimony the features of rate invariance and the fundamental distinction between transitions and transversions. Of the presently available methods of phylogenetic inference, compositional statistics is based on the fewest and mildest assumptions about the mode of DNA sequence evolution. It is therefore applicable to phylogenetic studies of the most distantly related organisms or molecules. This was illustrated by analyzing conservative positions in the DNA sequences of the large subunit of RNA polymerase from three archaebacterial groups, a eubacterium, a chloroplast, and the three eukaryotic polymerases. Internally consistent results, which are in accord with our knowledge of organelle origin and archaebacterial physiology, were achieved.

8.
Archaeal phylogeny based on ribosomal proteins   Total citations: 9 (self-citations: 0; citations by others: 9)
Until recently, phylogenetic analyses of Archaea have mainly been based on ribosomal RNA (rRNA) sequence comparisons, leading to the distinction of the two major archaeal phyla: the Euryarchaeota and the Crenarchaeota. Here, thanks to the recent sequencing of several archaeal genomes, we have constructed a phylogeny based on the fusion of the sequences of the 53 ribosomal proteins present in most of the archaeal species. This phylogeny was remarkably congruent with the rRNA phylogeny, suggesting that both reflected the actual phylogeny of the domain Archaea even if some nodes remained unresolved. In both cases, the branches leading to hyperthermophilic species were short, suggesting that the evolutionary rate of their genes has been slowed down by structural constraints related to environmental adaptation. In addition, to estimate the impact of lateral gene transfer (LGT) on our tree reconstruction, we used a new method that revealed that 8 genes out of the 53 ribosomal proteins used in our study were likely affected by LGT. This strongly suggested that a core of 45 nontransferred ribosomal protein genes existed in Archaea that can be tentatively used to infer the phylogeny of this domain. Interestingly, the tree obtained using only the eight ribosomal proteins likely affected by LGT was not very different from the consensus tree, indicating that LGT mainly brought random phylogenetic noise. The major difference involves organisms living in similar environments, suggesting that LGTs are mainly directed by the physical proximity of the organisms rather than by their phylogenetic proximity.

9.
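Objective: To construct a phylogenetic tree of arthropod α-amylases, examine their evolutionary relationships, and identify the signature sequences of the α-amylases that cluster together in the tree. Methods: Fifty-six arthropod α-amylase amino acid sequences were selected from the National Center for Biotechnology Information (NCBI) database. The sequences were aligned with CLUSTALX 2.0, the phylogenetic tree was built with MEGA 6.0, and the signature sequences of the clustered α-amylases were identified with BOXSHADE. Results: The 56 α-amylases grouped into four major clusters, A, B, C, and D. The signature sequence of cluster A was "VD NHD NQ", that of cluster B was "ID NHD NX", that of cluster C was "ID NHD NQ", and that of cluster D was "XGN NHD X". All four clusters contained the conserved NHD (asparagine-histidine-aspartic acid) sequence, but the amino acids flanking it on either side differed. Conclusion: The 56 arthropod α-amylases fall into four clusters, each with its own signature sequence, but all contain the conserved sequence NHD.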

10.
Traditional phylogenetic analysis is based on multiple sequence alignment. With the development of worldwide genome sequencing projects, more and more completely sequenced genomes are becoming available. However, traditional sequence alignment tools cannot handle large-scale genome sequences. The development of new algorithms that infer phylogenetic relationships from whole-genome information without alignment therefore represents a new direction for phylogenetic study in the post-genome era. In the present study, a novel algorithm based on BBC (base-base correlation) is proposed to analyze the phylogenetic relationships of HEV (hepatitis E virus). When 48 HEV genome sequences are analyzed, the phylogenetic tree constructed with the BBC algorithm is consistent with that of a previous study. Compared with sequence alignment methods, the BBC algorithm calculates evolutionary distances between whole-genome sequences more rapidly and does not require human intervention such as gene identification or parameter selection. The BBC algorithm can serve as an alternative for rapidly constructing phylogenetic trees and inferring evolutionary relationships.
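The abstract above does not state the BBC formula, so the following is only a sketch of one plausible alignment-free scheme in that spirit. It assumes that each genome is summarised by a 16-dimensional vector, one entry per ordered base pair, where each entry sums a mutual-information-style term over gaps 1 to `max_gap`, and that genomes are compared by Euclidean distance. The function names, the `max_gap` default, and the distance choice are all assumptions.

```python
import math
from itertools import product

BASES = "ACGT"


def bbc_vector(seq, max_gap=3):
    """Return a 16-dimensional base-base correlation vector for one sequence.

    Entry (a, b) sums p_ab(gap) * log2(p_ab(gap) / (p_a * p_b)) over gaps,
    where p_ab(gap) is the frequency of base a followed by base b `gap`
    positions later, and p_a, p_b are single-base frequencies.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    freq = {base: seq.count(base) / n for base in BASES}
    vector = []
    for a, b in product(BASES, repeat=2):
        total = 0.0
        for gap in range(1, max_gap + 1):
            pairs = n - gap
            if pairs <= 0:
                continue
            count = sum(1 for k in range(pairs) if seq[k] == a and seq[k + gap] == b)
            if count == 0:
                continue  # a zero joint frequency contributes nothing
            p_joint = count / pairs
            total += p_joint * math.log2(p_joint / (freq[a] * freq[b]))
        vector.append(total)
    return vector


def bbc_distance(seq_a, seq_b, max_gap=3):
    """Euclidean distance between two BBC vectors (no alignment needed)."""
    va = bbc_vector(seq_a, max_gap)
    vb = bbc_vector(seq_b, max_gap)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(va, vb)))


if __name__ == "__main__":
    # Toy sequences, not real HEV genomes.
    print(bbc_distance("ACGTACGTACGT", "AAAACCCCGGGGTTTT"))
```

A pairwise distance matrix built this way could then be passed to any distance-based tree method such as neighbor joining; the sequences never need to be aligned, which is the property the abstract emphasises.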

11.
Ribosomal DNA internal transcribed spacers (ITS) and partial external transcribed spacers (ETSf) are popularly used to infer evolutionary hypotheses. However, there is generally little consideration given to the secondary structures of these small RNA molecules and their potential effects on sequence alignment and phylogenetic analyses. Intergeneric relationships amongst three of the four major lineages in the Sapindaceae (the Dodonaeoideae, Hippocastanoideae and Xanthoceroideae) were assessed in two steps. First, secondary structure predictions were generated for ITS and partial ETSf sequences and used to assist alignment of the sequences. Second, the alignment was analyzed using RNA-specific models of sequence evolution that account for the variation in nucleotide evolution in the independent loop and covarying stem regions of the ribosomal spacers. These models and the phylogenies drawn from these analyses were compared with those from analyses using ‘traditional’ 4-state models and previous plastid analyses. These analyses identified that paired-site models developed to deal specifically with stem structures in RNA-encoding sequences more appropriately account for the evolutionary history of the sequences than traditional 4-state substitution models.

12.
Alignment of nucleotide and/or amino acid sequences is a fundamental component of sequence‐based molecular phylogenetic studies. Here we examined how different alignment methods affect the phylogenetic trees that are inferred from the alignments. We used simulations to determine how alignment errors can lead to systematic biases that affect phylogenetic inference from those sequences. We compared four approaches to sequence alignment: progressive pairwise alignment, simultaneous multiple alignment of sequence fragments, local pairwise alignment and direct optimization. When taking into account branch support, implied alignments produced by direct optimization were found to show the most extreme behaviour (based on the alignment programs for which nearly equivalent alignment parameters could be set) in that they provided the strongest support for the correct tree in the simulations in which it was easy to resolve the correct tree and the strongest support for the incorrect tree in our long‐branch‐attraction simulations. When applied to alignment‐sensitive process partitions with different histories, direct optimization showed the strongest mutual influence between the process partitions when they were aligned and phylogenetically analysed together, which makes detecting recombination more difficult. Simultaneous alignment performed well relative to direct optimization and progressive pairwise alignment across all simulations. Rather than relying upon methods that integrate alignment and tree search into a single step without accounting for alignment uncertainty, as with implied alignments, we suggest that simultaneous alignment using the similarity criterion, within the context of information available on biological processes and function, be applied whenever possible for sequence‐based phylogenetic analyses.

13.
14.
Summary: Partial nucleotide sequences for the 5S and 5.8S rRNAs from the dinoflagellate Crypthecodinium cohnii have been determined, using a rapid chemical sequencing method, for the purpose of studying dinoflagellate phylogeny. The 5S RNA sequence shows the most homology (75%) with the 5S sequences of higher animals and the least homology (< 60%) with prokaryotic sequences. In addition, it lacks certain residues which are highly conserved in prokaryotic molecules but are generally missing in eukaryotes. These findings suggest a distant relationship between dinoflagellates and the prokaryotes. Using two different sequence alignments and several different methods for selecting an optimum phylogenetic tree for a collection of 5S sequences including higher plants and animals, fungi, and bacteria in addition to the C. cohnii sequence, the dinoflagellate lineage was joined to the tree at the point of the plant-animal divergence, well above the branching point of the fungi. This result is of interest because it implies that the well-documented absence in dinoflagellates of histones and the typical nucleosomal subunit structure of eukaryotic chromatin is the result of secondary loss, and not an indication of an extremely primitive state, as was previously suggested. Computer simulations of 5S RNA evolution have been carried out in order to demonstrate that the above-mentioned phylogenetic placement is not likely to be the result of random sequence convergence. We have also constructed a phylogeny for 5.8S RNA sequences in which plants, animals, fungi and the dinoflagellates are again represented. While the order of branching on this tree is the same as in the 5S tree for the organisms represented, because it lacks prokaryotes, the 5.8S tree cannot be considered a strong independent confirmation of the 5S result.
Moreover, 5.8S RNA appears to have experienced very different rates of evolution in different lineages, indicating that it may not be the best indicator of evolutionary relationships. We have also considered the existing biological data regarding dinoflagellate evolution in relation to our molecular phylogenetic evidence.

15.
Many prokaryotes have multiple ribosomal RNA operons. Generally, sequence differences between small subunit (SSU) rRNA genes are minor (<1%) and cause little concern for phylogenetic inference or environmental diversity studies. For Halobacteriales, an order of extremely halophilic, aerobic Archaea, within-genome SSU rRNA sequence divergence can exceed 5%, rendering phylogenetic assignment problematic. The RNA polymerase B' subunit gene (rpoB') is a single-copy conserved gene that may be an appropriate alternative phylogenetic marker for Halobacteriales. We sequenced a fragment of the rpoB' gene from 21 species, encompassing 15 genera of Halobacteriales. To examine the utility of rpoB' as a phylogenetic marker in Halobacteriales, we investigated three properties of rpoB' trees: the variation in resolution between trees inferred from the rpoB' DNA and RpoB' protein alignment, the degree of mutational saturation between taxa, and congruence with the SSU rRNA tree. The rpoB' DNA and protein trees were for the most part congruent and consistently recovered two well-supported monophyletic groups, the clade I and clade II haloarchaea, within a collection of less well resolved Halobacteriales lineages. A comparison of observed versus inferred numbers of substitution revealed mutational saturation in the rpoB' DNA data set, particularly between more distant species. Thus, the RpoB' protein sequence may be more reliable than the rpoB' DNA sequence for inferring Halobacteriales phylogeny. AU tests of tree selection indicated the trees inferred from rpoB' DNA and protein alignments were significantly incongruent with the SSU rRNA tree. We discuss possible explanations for this incongruence, including tree reconstruction artifact, differential paralog sampling, and lateral gene transfer. This is the first study of Halobacteriales evolution based on a marker other than the SSU rRNA gene. 
In addition, we present a valuable phylogenetic framework encompassing a broad diversity of Halobacteriales, in which novel sequences can be inserted for evolutionary, ecological, or taxonomic investigations.

16.
The availability of the complete genome sequence of Mycobacterium tuberculosis allows its phylogenetic analysis based on the whole genome rather than single genes. As a genome-based tree is more representative of whole organisms and less inconsistent than single-gene trees, it could provide a better index for interpretation and inference about the origin and nature of species. The standard bacterial phylogeny based on 16S ribosomal RNA sequence comparison shows that M. tuberculosis is more related to Gram-positive than to Gram-negative bacteria. Our results based on genome comparison in terms of shared orthologous genes challenge this implication. We demonstrate that M. tuberculosis is more related to Gram-negative than to Gram-positive bacteria by a quantitative analysis on the genome tree. The numerical distance data derived from genome comparison and those from 16S rRNA comparison show a highly significant correlation, implying that conserved gene content carries a strong phylogenetic signature in evolution.
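The abstract above compares genomes by their shared orthologous genes but does not say how the numerical distance is computed. As a hedged illustration only, the sketch below assumes one common gene-content measure: one minus the number of shared orthologue identifiers divided by the size of the smaller genome. The function names, genome labels, and orthologue identifiers are hypothetical.

```python
def gene_content_distance(genes_a, genes_b):
    """Distance in [0, 1] based on shared orthologue identifiers.

    Assumption: normalise by the smaller gene set so that a small genome
    fully contained in a larger one has distance 0.
    """
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        raise ValueError("both gene sets must be non-empty")
    shared = len(a & b)
    return 1.0 - shared / min(len(a), len(b))


def distance_matrix(genomes):
    """Pairwise gene-content distances keyed by (name, name)."""
    names = sorted(genomes)
    return {
        (x, y): gene_content_distance(genomes[x], genomes[y])
        for x in names
        for y in names
    }


if __name__ == "__main__":
    # Toy genomes described by made-up orthologue identifiers.
    genomes = {
        "g1": {"o1", "o2", "o3", "o4"},
        "g2": {"o1", "o2", "o3"},
        "g3": {"o4", "o5"},
    }
    for pair, dist in sorted(distance_matrix(genomes).items()):
        print(pair, dist)
```

A matrix of this kind can be fed to a distance-based tree builder and correlated with 16S rRNA distances, which is the kind of comparison the abstract reports.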

17.
The reconstruction and synthesis of ancestral RNAs is a feasible goal for paleogenetics. This will require new bioinformatics methods, including a robust statistical framework for reconstructing histories of substitutions, indels and structural changes. We describe a “transducer composition” algorithm for extending pairwise probabilistic models of RNA structural evolution to models of multiple sequences related by a phylogenetic tree. This algorithm draws on formal models of computational linguistics as well as the 1985 protosequence algorithm of David Sankoff. The output of the composition algorithm is a multiple-sequence stochastic context-free grammar. We describe dynamic programming algorithms, which are robust to null cycles and empty bifurcations, for parsing this grammar. Example applications include structural alignment of non-coding RNAs, propagation of structural information from an experimentally-characterized sequence to its homologs, and inference of the ancestral structure of a set of diverged RNAs. We implemented the above algorithms for a simple model of pairwise RNA structural evolution; in particular, the algorithms for maximum likelihood (ML) alignment of three known RNA structures and a known phylogeny and inference of the common ancestral structure. We compared this ML algorithm to a variety of related, but simpler, techniques, including ML alignment algorithms for simpler models that omitted various aspects of the full model and also a posterior-decoding alignment algorithm for one of the simpler models. In our tests, incorporation of basepair structure was the most important factor for accurate alignment inference; appropriate use of posterior-decoding was next; and fine details of the model were least important. Posterior-decoding heuristics can be substantially faster than exact phylogenetic inference, so this motivates the use of sum-over-pairs heuristics where possible (and approximate sum-over-pairs). 
For more exact probabilistic inference, we discuss the use of transducer composition for ML (or MCMC) inference on phylogenies, including possible ways to make the core operations tractable.

18.

Background

The analysis of RNA sequences, once a small niche field for a small collection of scientists whose primary emphasis was the structure and function of a few RNA molecules, has grown most significantly with the realizations that 1) RNA is implicated in many more functions within the cell, and 2) the analysis of ribosomal RNA sequences is revealing more about the microbial ecology within all biological and environmental systems. The accurate and rapid alignment of these RNA sequences is essential to decipher the maximum amount of information from this data.

Methods

Two computer systems that utilize the Gutell lab's RNA Comparative Analysis Database (rCAD) were developed to align sequences to an existing template alignment available at the Gutell lab's Comparative RNA Web (CRW) Site. Multiple dimensions of cross-indexed information are contained within the relational database - rCAD, including sequence alignments, the NCBI phylogenetic tree, and comparative secondary structure information for each aligned sequence. The first program, CRWAlign-1 creates a phylogenetic-based sequence profile for each column in the alignment. The second program, CRWAlign-2 creates a profile based on phylogenetic, secondary structure, and sequence information. Both programs utilize their profiles to align new sequences into the template alignment.

Results

The accuracies of the two CRWAlign programs were compared with the best template-based rRNA alignment programs and the best de-novo alignment programs. We have compared our programs with a total of eight alternative alignment methods on different sets of 16S rRNA alignments with sequence percent identities ranging from 50% to 100%. Both CRWAlign programs were superior to these other programs in accuracy and speed.

Conclusions

Both CRWAlign programs can be used to align the very extensive amount of RNA sequencing that is generated due to the rapid next-generation sequencing technology. This latter technology is augmenting the new paradigm that RNA is intimately implicated in a significant number of functions within the cell. In addition, the use of bacterial 16S rRNA sequencing in the identification of the microbiome in many different environmental systems creates a need for rapid and highly accurate alignment of bacterial 16S rRNA sequences.

19.
Phylogenetic analyses of gene and protein sequences have led to two major competing views of the universal phylogeny, the evolutionary tree relating the three kinds of living organisms, Bacteria, Archaea, and Eukarya. In the first scheme, called "the archaebacterial tree," organisms of the same type are clustered together. In the second scenario, called "the eocyte tree," the archaeal phylum of Crenarchaeota is more closely related to eukaryotes than are other Archaea. A major property of the evolution of functional ribosomal and protein-encoding genes is that the rate of nucleotide and amino acid substitution varies across sequence sites. Here, using distance-based and maximum-likelihood methods, we show that universal phylogenies of ribosomal RNAs and RNA polymerases built by ignoring this variation are biased toward the archaebacterial tree because of attraction between long branches. In contrast, taking among-site rate variability into account gives support for the eocyte tree.

20.
The substitution rate of the individual positions in an alignment of 750 eukaryotic small ribosomal subunit RNA sequences was estimated. From the resulting rate distribution, an equation was derived that gives a more precise relationship between sequence dissimilarity and evolutionary distance than hitherto available. Trees constructed on the basis of evolutionary distances computed by this new equation for small ribosomal subunit RNA sequences from ciliates, apicomplexans, dinoflagellates, oomycetes, hyphochytriomycetes, bicosoecids, labyrinthuloids, and heterokont algae show a more consistent tree topology than trees constructed in the absence of substitution rate calibration. In particular, they do not suffer from anomalies caused by the presence of extremely long branches.
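The abstract above does not reproduce its calibrated equation, so it is not shown here. To illustrate why accounting for rate variation among sites changes the mapping from dissimilarity to distance, the sketch below uses two standard textbook formulas instead: the Jukes-Cantor distance and its gamma-distributed-rates variant. The shape parameter `alpha` and the function names are illustrative assumptions.

```python
import math


def jc_distance(p):
    """Jukes-Cantor distance from the observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates across sites.

    Smaller alpha means stronger rate variation and a larger correction.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


if __name__ == "__main__":
    for p in (0.05, 0.15, 0.30, 0.45):
        print(p, round(jc_distance(p), 4), round(jc_gamma_distance(p, 0.5), 4))
```

For the same observed dissimilarity, the gamma-corrected distance is larger than the uniform-rate one, and the gap widens as dissimilarity grows. Underestimating long distances in this way is one source of the long-branch anomalies the abstract mentions.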
