Similar articles
 20 similar articles found (search time: 15 ms)
1.
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth-death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program dnamlepsilon improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.
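The idea of peeling over an alphabet augmented by the gap character can be sketched as follows. This is a minimal illustration, not the dnamlepsilon model: it substitutes a symmetric, equal-rate five-state model (which has a closed-form transition matrix) for the non-reversible birth-death process described in the abstract. The names `transition_matrix`, `partial_likelihoods`, `site_likelihood`, the rate parameter `mu`, and the nested-tuple tree encoding are all assumptions made for this sketch.

```python
import math

STATES = "ACGT-"  # assumption: the gap is treated as a fifth character state
K = len(STATES)


def transition_matrix(t, mu=1.0):
    """Closed-form P(t) for an equal-rate K-state model (a stand-in model)."""
    decay = math.exp(-K * mu * t)
    same = 1.0 / K + (K - 1) / K * decay
    diff = 1.0 / K - decay / K
    return [[same if i == j else diff for j in range(K)] for i in range(K)]


def partial_likelihoods(node, mu=1.0):
    """Peeling step: likelihood of the data below `node` for each state at `node`.

    A leaf is a one-character string; an internal node is a tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    """
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in STATES]
    left, t_left, right, t_right = node
    result = [1.0] * K
    for child, t in ((left, t_left), (right, t_right)):
        child_vec = partial_likelihoods(child, mu)
        p = transition_matrix(t, mu)
        for i in range(K):
            result[i] *= sum(p[i][j] * child_vec[j] for j in range(K))
    return result


def site_likelihood(tree, mu=1.0):
    """Likelihood of one alignment column, with a uniform root distribution."""
    return sum(partial_likelihoods(tree, mu)) / K


# One column of a three-sequence alignment in which the third sequence has a gap.
column_tree = (("A", 0.1, "A", 0.2), 0.3, "-", 0.4)
column_likelihood = site_likelihood(column_tree)
```

Multiplying `site_likelihood` over all columns gives the alignment likelihood under this stand-in model; the cost per column stays linear in the number of nodes, which is the efficiency property the abstract refers to.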

2.
Traditional phylogenetic analysis is based on multiple sequence alignment. As worldwide genome sequencing projects advance, more and more completely sequenced genomes are becoming available. However, traditional sequence alignment tools cannot cope with large-scale genome sequences. The development of new algorithms that infer phylogenetic relationships from whole-genome information without alignment therefore represents a new direction for phylogenetic study in the post-genome era. In the present study, a novel algorithm based on BBC (base-base correlation) is proposed to analyze the phylogenetic relationships of HEV (Hepatitis E virus). When 48 HEV genome sequences are analyzed, the phylogenetic tree constructed with the BBC algorithm is consistent with that of a previous study. Compared with sequence alignment methods, the BBC algorithm calculates evolutionary distances between whole-genome sequences more rapidly and requires no human intervention such as gene identification or parameter selection. The BBC algorithm can serve as an alternative for rapidly constructing phylogenetic trees and inferring evolutionary relationships.

3.
Microsporidia branch at the base of eukaryotic phylogenies inferred from translation elongation factor 1alpha (EF-1alpha) sequences. Because these parasitic eukaryotes are fungi (or close relatives of fungi), it is widely accepted that fast-evolving microsporidian sequences are artifactually "attracted" to the long branch leading to the archaebacterial (outgroup) sequences ("long-branch attraction," or "LBA"). However, no previous studies have explicitly determined the reason(s) why the artifactual allegiance of microsporidia and archaebacteria ("M + A") is recovered by all phylogenetic methods, including maximum likelihood, a method that is supposed to be resistant to classical LBA. Here we show that the M + A affinity can be attributed to those alignment sites associated with large differences in evolutionary site rates between the eukaryotic and archaebacterial subtrees. Therefore, failure to model the significant evolutionary rate distribution differences (covarion shifts) between the ingroup and outgroup sequences is apparently responsible for the artifactual basal position of microsporidia in phylogenetic analyses of EF-1alpha sequences. Currently, no evolutionary model that accounts for discrete changes in the site rate distribution on particular branches is available for either protein or nucleotide level phylogenetic analysis, so the same artifacts may affect many other "deep" phylogenies. Furthermore, given the relative similarity of the site rate patterns of microsporidian and archaebacterial EF-1alpha proteins ("parallel site rate variation"), we suggest that the microsporidian orthologs may have lost some eukaryotic EF-1alpha-specific nontranslational functions, exemplifying the extreme degree of reduction in this parasitic lineage.

4.

Background  

The comparison of homologous sequences from different species is an essential approach to reconstruct the evolutionary history of species and of the genes they harbour in their genomes. Several complete mitochondrial and nuclear genomes are now available, increasing the importance of using multiple sequence alignment algorithms in comparative genomics. MtDNA has long been used in phylogenetic analysis, and errors in the alignments can lead to errors in the interpretation of evolutionary information. Although a large number of multiple sequence alignment algorithms have been proposed to date, they all deal with linear DNA and cannot directly handle circular DNA. Researchers interested in aligning circular DNA sequences must first rotate them to the "right" place using an essentially manual process, before they can use multiple sequence alignment tools.
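The manual rotation step mentioned above can be illustrated with a short sketch. It is not the method proposed in the article (the listing shows only the Background section); it simply re-linearizes a circular sequence so that it starts at a user-chosen anchor motif, allowing for the case where the motif spans the arbitrary cut point. The function name `rotate_to_anchor` and the example sequences are hypothetical.

```python
def rotate_to_anchor(sequence, anchor):
    """Rotate a circular sequence so that it begins with `anchor`.

    The sequence is searched as a circle: the first len(anchor) - 1 characters
    are appended so that an anchor spanning the linearization point is found.
    Raises ValueError if the anchor does not occur anywhere on the circle.
    """
    if not anchor or len(anchor) > len(sequence):
        raise ValueError("anchor must be non-empty and no longer than the sequence")
    circular_view = sequence + sequence[: len(anchor) - 1]
    start = circular_view.find(anchor)
    if start == -1:
        raise ValueError("anchor not found in circular sequence")
    return sequence[start:] + sequence[:start]


# Two linearizations of the same circular molecule, cut at different places.
first = rotate_to_anchor("GGTTAC", "ACG")
second = rotate_to_anchor("TTACGG", "ACG")
```

After rotation, both linearizations share the same start and can be passed to an ordinary linear multiple sequence alignment tool. Choosing a good anchor automatically is the hard part and is left open here.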

5.

Background

Protein sequence profile-profile alignment is an important approach to recognizing remote homologs and generating accurate pairwise alignments. It plays an important role in protein sequence database search, protein structure prediction, protein function prediction, and phylogenetic analysis.

Results

In this work, we integrate predicted solvent accessibility, torsion angles and evolutionary residue coupling information with the pairwise Hidden Markov Model (HMM) based profile alignment method to improve profile-profile alignments. The evaluation results demonstrate that adding predicted relative solvent accessibility and torsion angle information improves the accuracy of profile-profile alignments. The evolutionary residue coupling information is helpful in some cases, but its contribution to the improvement is not consistent.

Conclusion

Incorporating the new structural information such as predicted solvent accessibility and torsion angles into the profile-profile alignment is a useful way to improve pairwise profile-profile alignment methods.

6.
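Liu Chaoyang, Zhuang Wenying. Mycosystema (菌物学报), 2011, 30(6): 912-919
This study examines the influence of ribosomal small subunit secondary structure on fungal phylogenetic analysis. A comparison of phylogenetic trees built with different methods shows that analyses incorporating secondary structure information produce more reasonable topologies than traditional methods. Besides being used to optimize sequence alignments, secondary structure information also needs to be integrated into the nucleotide substitution model. Appropriate alignment methods, evolutionary models, and tree-building algorithms help reveal the relationships among taxa more accurately.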

7.
Reconstructing the evolutionary history of protein sequences will provide a better understanding of divergence mechanisms of protein superfamilies and their functions. Long-term protein evolution often includes dynamic changes such as insertion, deletion, and domain shuffling. Such dynamic changes make reconstructing protein sequence evolution difficult and affect the accuracy of molecular evolutionary methods, such as multiple alignments and phylogenetic methods. Unfortunately, currently available simulation methods are not sufficiently flexible and do not allow biologically realistic dynamic protein sequence evolution. We introduce a new method, indel-Seq-Gen (iSG), that can simulate realistic evolutionary processes of protein sequences with insertions and deletions (indels). Unlike other simulation methods, iSG allows the user to simulate multiple subsequences according to different evolutionary parameters, which is necessary for generating realistic protein families with multiple domains. iSG tracks all evolutionary events including indels and outputs the "true" multiple alignment of the simulated sequences. iSG can also generate a larger sequence space by allowing the use of multiple related root sequences. With all these functions, iSG can be used to test the accuracy of, for example, multiple alignment methods, phylogenetic methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein family classification methods. We empirically evaluated the performance of iSG against currently available methods by simulating the evolution of the G protein-coupled receptor and lipocalin protein families. We examined their true multiple alignments, reconstruction of the transmembrane regions and beta-strands, and the results of similarity search against a protein database using the simulated sequences. We also presented an example of using iSG for examining how phylogenetic reconstruction is affected by high indel rates.

8.
Alignment ambiguity is a widespread problem in molecular evolutionary studies that has received insufficient attention. Most studies ignore such regions by deleting them before analyses, even though alignment-ambiguous regions can contain useful phylogenetic and evolutionary information. The alignment ambiguity might affect only one taxon, the region being readily alignable and phylogenetically informative across all other taxa. Alternatively, all possible alignments can consistently imply certain relationships. Because they are usually the most rapidly evolving regions, alignment-ambiguous regions might be those that are most able to resolve closely spaced divergences and contribute to estimates of branch lengths, evolutionary rates and divergence times. Three methods to incorporate such regions into phylogenetic and evolutionary analyses have been devised. The multiple analysis method evaluates each plausible alignment separately and seeks areas of congruence among the resultant trees, whereas the elision method combines all plausible alignments into a single analysis. Fragment-level alignment (= fixed states, INAASE) treats the entire unalignable section as a single but highly complex multistate character. Although these methods still need refining, they are preferable to discarding large portions of hard-earned and potentially informative sequence data.

9.
A comparison of MSA tools   (total citations: 3; self-citations: 0; citations by others: 3)
Multiple sequence alignment (MSA) is essential in phylogenetic, evolutionary and functional analysis. Several MSA tools are available in the literature. Here, we use several MSA tools, namely ClustalX, Align-m, T-Coffee, SAGA, ProbCons, MAFFT, MUSCLE and DIALIGN, to illustrate comparative phylogenetic tree analysis for two datasets. Results show that there is no single MSA tool that consistently outperforms the rest in producing reliable phylogenetic trees.

10.
A new sequence distance measure for phylogenetic tree construction   (total citations: 5; self-citations: 0; citations by others: 5)
MOTIVATION: Most existing approaches for phylogenetic inference use multiple alignment of sequences and assume some sort of an evolutionary model. The multiple alignment strategy does not work for all types of data, e.g. whole genome phylogeny, and the evolutionary models may not always be correct. We propose a new sequence distance measure based on the relative information between the sequences using Lempel-Ziv complexity. The distance matrix thus obtained can be used to construct phylogenetic trees. RESULTS: The proposed approach does not require sequence alignment and is totally automatic. The algorithm has successfully constructed consistent phylogenies for real and simulated data sets. AVAILABILITY: Available on request from the authors.
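An alignment-free distance built on Lempel-Ziv complexity can be sketched as below. The abstract does not give the exact formula, so the normalization used in `lz_distance` is an assumption chosen for illustration (the extra phrases needed to describe one sequence after the other has been seen, divided by the larger individual complexity). The function names `lz_complexity` and `lz_distance` are likewise hypothetical, and the quadratic-time substring search is kept for clarity rather than speed.

```python
def lz_complexity(s):
    """Count the phrases in the Lempel-Ziv (1976) exhaustive parsing of `s`.

    Each phrase is extended for as long as it can still be copied from a
    substring that starts before the phrase; the next character ends it.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # s[i:i+length] must occur starting at an earlier position (overlap allowed).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def lz_distance(s, q):
    """Assumed normalization: new phrases contributed by the other sequence."""
    c_s, c_q = lz_complexity(s), lz_complexity(q)
    extra = max(lz_complexity(s + q) - c_s, lz_complexity(q + s) - c_q)
    return extra / max(c_s, c_q)


# Pairwise distances for a toy set of sequences (illustrative data only).
toy = {"x": "ACGTACGTAC", "y": "ACGTACGTTC", "z": "GGGGGCCCCC"}
matrix = {(a, b): lz_distance(toy[a], toy[b]) for a in toy for b in toy if a < b}
```

The resulting `matrix` could be handed to any distance-based tree builder such as neighbor joining. Note that under this normalization the distance of a sequence to itself is small but not necessarily zero.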

11.
Both multiple sequence alignment and phylogenetic analysis are problematic in the "twilight zone" of sequence similarity (≤ 25% amino acid identity). Herein we explore the accuracy of phylogenetic inference at extreme sequence divergence using a variety of simulated data sets. We evaluate four leading multiple sequence alignment (MSA) methods (MAFFT, T-COFFEE, CLUSTAL, and MUSCLE) and six commonly used programs of tree estimation (Distance-based: Neighbor-Joining; Character-based: PhyML, RAxML, GARLI, Maximum Parsimony, and Bayesian) against a novel MSA-independent method (PHYRN) described here. Strikingly, at "midnight zone" genetic distances (~7% pairwise identity and 4.0 gaps per position), PHYRN returns high-resolution phylogenies that outperform traditional approaches. We reason this is due to PHYRN's capability to amplify informative positions, even at the most extreme levels of sequence divergence. We also assess the applicability of the PHYRN algorithm for inferring deep evolutionary relationships in the divergent DANGER protein superfamily, for which PHYRN infers a more robust tree compared to MSA-based approaches. Taken together, these results demonstrate that PHYRN represents a powerful mechanism for mapping uncharted frontiers in highly divergent protein sequence data sets.

12.
A "Long Indel" model for evolutionary sequence alignment   (total citations: 7; self-citations: 0; citations by others: 7)
We present a new probabilistic model of sequence evolution, allowing indels of arbitrary length, and give sequence alignment algorithms for our model. Previously implemented evolutionary models have allowed (at most) single-residue indels or have introduced artifacts such as the existence of indivisible "fragments." We compare our algorithm to these previous methods by applying it to the structural homology dataset HOMSTRAD, evaluating the accuracy of (1) alignments and (2) evolutionary time estimates. With our method, it is possible (for the first time) to integrate probabilistic sequence alignment, with reliability indicators and arbitrary gap penalties, in the same framework as phylogenetic reconstruction. Our alignment algorithm requires that we evaluate the likelihood of any specific path of mutation events in a continuous-time Markov model, with the event times integrated out. To this effect, we introduce a "trajectory likelihood" algorithm (Appendix A). We anticipate that this algorithm will be useful in more general contexts, such as Markov Chain Monte Carlo simulations.

13.
14.
15.
We have identified and characterized the full-length cDNA sequence of macrophage migration inhibitory factor (MIF) from the American dog tick, Dermacentor variabilis. The nucleotide and putative amino acid sequences from this study shared a high level of sequence conservation with other tick MIFs. The bioinformatics analysis showed cross-species conservation of the MIF amino acid sequence in ticks, insects and nematodes. The multiple sequence alignment identified Pro 1, 3, 55; Thr 7, 112; Asn 8, 72; Ile 64, 96; Gly 65, 110; Ser 63; and Leu 87 as highly conserved among the sequences selected for this study. Tick MIF does not have the oxidoreductase domain found in MIFs from other animals, suggesting that tick MIF is not capable of performing as an oxidoreductase. The phylogenetic analysis revealed that tick MIFs share a closer evolutionary proximity to parasitic nematode MIFs than to insect MIFs.

16.
BEAST: Bayesian evolutionary analysis by sampling trees   (total citations: 2; self-citations: 0; citations by others: 2)

Background  

The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided and tree-based models suitable for both within- and between-species sequence data are implemented.

17.
A comparative analysis is presented of 24 known amino acid sequences of RNA-dependent RNA polymerases of positive strand RNA viruses infecting animals, plants and bacteria. Using a newly proposed methodology of group alignment for weakly similar sequences, evolutionarily conserved fragments of all these proteins were unambiguously aligned. A unique pattern (consensus) of 7 invariant amino acid residues was revealed which is absent from the sequences of other RNA and DNA polymerases and is thought to unequivocally identify the RNA-dependent RNA polymerases of positive strand RNA viruses. Based on the obtained alignment, a tentative phylogenetic tree of viral RNA polymerases was constructed for the first time. The RNA-dependent RNA polymerases of positive strand RNA viruses are concluded to comprise a distinct family of evolutionarily related proteins.

18.
Phylogenetic signal, evolutionary process, and rate   (total citations: 1; self-citations: 0; citations by others: 1)
A recent advance in the phylogenetic comparative analysis of continuous traits has been explicit, model-based measurement of "phylogenetic signal" in data sets composed of observations collected from species related by a phylogenetic tree. Phylogenetic signal is a measure of the statistical dependence among species' trait values due to their phylogenetic relationships. Although phylogenetic signal is a measure of pattern (statistical dependence), there has nonetheless been a widespread propensity in the literature to attribute this pattern to aspects of the evolutionary process or rate. This may be due, in part, to the perception that high evolutionary rate necessarily results in low phylogenetic signal; and, conversely, that low evolutionary rate or stabilizing selection results in high phylogenetic signal (due to the resulting high resemblance between related species). In this study, we use individual-based numerical simulations on stochastic phylogenetic trees to clarify the relationship between phylogenetic signal, rate, and evolutionary process. Under the simplest model for quantitative trait evolution, homogeneous rate genetic drift, there is no relation between evolutionary rate and phylogenetic signal. For other circumstances, such as functional constraint, fluctuating selection, niche conservatism, and evolutionary heterogeneity, the relationship between process, rate, and phylogenetic signal is complex. For these reasons, we recommend against interpretations of evolutionary process or rate based on estimates of phylogenetic signal.
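The drift result can be made concrete with the standard Brownian-motion formulation of trait evolution. This is an illustrative sketch, not the individual-based simulation used in the study, and the notation is introduced here as an assumption: $\sigma^2$ is the evolutionary rate, $C$ is the matrix of branch lengths shared between pairs of species, and $x_i$ is the trait value of species $i$.

```latex
\operatorname{Cov}(x_i, x_j) = \sigma^2 C_{ij},
\qquad
\operatorname{Corr}(x_i, x_j)
  = \frac{\sigma^2 C_{ij}}{\sqrt{\sigma^2 C_{ii}\,\sigma^2 C_{jj}}}
  = \frac{C_{ij}}{\sqrt{C_{ii}\,C_{jj}}}
```

Because $\sigma^2$ cancels, the expected statistical dependence among species under this model is set by the tree alone, which is consistent with the statement that rate and signal are unrelated under homogeneous-rate drift.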

19.

Background  

Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence alignment yields better results than pairwise sequence alignment, but this assumption has rarely been tested, and never with the control provided by simulation analysis. This study used sequence simulation to examine the gain in accuracy from adding a third sequence to a pairwise alignment, concentrating on how the phylogenetic position of the additional sequence relative to the first pair changes the accuracy of the initial pair's alignment as well as their estimated evolutionary distance.

20.
Phylogenetic studies based on DNA sequences typically ignore the potential occurrence of recombination, which may produce different alignment regions with different evolutionary histories. Traditional phylogenetic methods assume that a single history underlies the data. If recombination is present, can we expect the inferred phylogeny to represent any of the underlying evolutionary histories? We examined this question by applying traditional phylogenetic reconstruction methods to simulated recombinant sequence alignments. The effect of recombination on phylogeny estimation depended on the relatedness of the sequences involved in the recombinational event and on the extent of the different regions with different phylogenetic histories. Given the topologies examined here, when the recombinational event was ancient, or when recombination occurred between closely related taxa, one of the two phylogenies underlying the data was generally inferred. In this scenario, the evolutionary history corresponding to the majority of the positions in the alignment was generally recovered. Very different results were obtained when recombination occurred recently among divergent taxa. In this case, when the recombinational breakpoint divided the alignment into two regions of similar length, a phylogeny that was different from any of the true phylogenies underlying the data was inferred.
