Similar articles
20 similar articles found.
1.
2.
We use a multigene data set (the mitochondrial locus and nine nuclear gene regions) to test phylogenetic relationships in the South American "lava lizards" (genus Microlophus) and describe a strategy for aligning noncoding sequences that accounts for differences in tempo and class of mutational events. We focus on seven nuclear introns that vary in size and frequency of multibase length mutations (i.e., indels) and present a manual alignment strategy that incorporates indels for each intron. Our method is based on mechanistic explanations of intron evolution and does not require a guide tree. We also use a progressive alignment algorithm (Probabilistic Alignment Kit; PRANK) that distinguishes insertions from deletions and avoids the "gapcost" conundrum. We describe an approach to selecting a guide tree purged of ambiguously aligned regions and use this to refine PRANK performance. We show that although manual alignment is successful in finding repeat motifs and the most obvious indels, some regions can only be subjectively aligned, and there are limits to the size and complexity of a data matrix for which this approach can be taken. PRANK alignments identified more parsimony-informative indels while simultaneously increasing nucleotide identity in conserved sequence blocks flanking the indel regions. When comparing manual and PRANK alignments with two widely used methods (CLUSTAL, MUSCLE) for the alignment of the most length-variable intron, only PRANK recovered a tree congruent at deeper nodes with the combined data tree inferred from all nuclear gene regions. We take this concordance as an objective function of alignment quality and present a strongly supported phylogenetic hypothesis for Microlophus relationships. From this hypothesis we show that (1) a coded indel data partition derived from the PRANK alignment contributed significantly to nodal support and (2) the indel data set permitted detection of significant conflict between mitochondrial and nuclear data partitions, which we hypothesize arose from secondary contact of distantly related taxa, followed by hybridization and mtDNA introgression.

3.
Substitution rates are one of the most fundamental parameters in a phylogenetic analysis and are represented in phylogenetic models as the branch lengths on a tree. Variation in substitution rates across an alignment of molecular sequences is well established and likely caused by variation in functional constraint across the genes encoded in the sequences. Rate variation across alignment sites is important to accommodate in a phylogenetic analysis; failure to account for across-site rate variation can cause biased estimates of phylogeny or other model parameters. Traditionally, rate variation across sites has been modeled by treating the rate for a site as a random variable drawn from some probability distribution (such as the gamma probability distribution) or by partitioning sites into different rate classes and estimating the rate for each class independently. We consider a different approach, related to site-specific models in which sites are partitioned into rate classes. However, instead of treating the partitioning scheme in which sites are assigned to rate classes as a fixed assumption of the analysis, we treat the rate partitioning as a random variable under a Dirichlet process prior. We find that the Dirichlet process prior model for across-site rate variation fits alignments of DNA sequence data better than commonly used models of across-site rate variation. The method appears to identify the underlying codon structure of protein-coding genes; rate partitions that were sampled by the Markov chain Monte Carlo procedure were closer to a partition in which sites are assigned to rate classes by codon position than to randomly permuted partitions but still allow for additional variability across sites.
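
To make the idea of a random partition of sites concrete, the sketch below draws one partition from a Chinese restaurant process, which is the standard sequential construction of a Dirichlet process prior over partitions. It is only an illustration of the prior, not the authors' MCMC sampler; the function name `sample_crp_partition` and the concentration parameter `alpha` are assumptions introduced here.

```python
import random


def sample_crp_partition(n_sites, alpha, seed=None):
    """Draw one partition of sites into rate classes from a Chinese restaurant process.

    n_sites: number of alignment sites (hypothetical input).
    alpha:   concentration parameter (assumed name); larger values favour more classes.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    classes = []  # classes[k] holds the site indices assigned to rate class k
    for site in range(n_sites):
        # An existing class is chosen in proportion to its size,
        # a brand-new class in proportion to alpha.
        weights = [len(c) for c in classes] + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(classes):
            classes.append([site])
        else:
            classes[k].append(site)
    return classes


if __name__ == "__main__":
    partition = sample_crp_partition(n_sites=30, alpha=1.0, seed=1)
    for k, members in enumerate(partition):
        print(f"rate class {k}: sites {members}")
```

In a full analysis each class would carry its own rate, and the partition would be updated during MCMC rather than drawn once.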

4.
MOTIVATION: Algorithm development for finding typical patterns in sequences, especially multiple pseudo-repeats (pseudo-periodic regions), is at the core of many problems arising in biological sequence and structure analysis. In fact, one of the most significant features of biological sequences is their high quasi-repetitiveness. Variation in the quasi-repetitiveness of genomic and proteomic texts demonstrates the presence and density of different biologically important information. It is very important to develop sensitive automatic computational methods for identifying pseudo-periodic regions of sequences, through which we can infer, describe and understand biological properties and seek precise molecular details of biological structures, dynamics, interactions and evolution. RESULTS: We develop a novel, powerful computational tool for partitioning a sequence into pseudo-periodic regions. The pseudo-periodic partition is defined as the partition that, intuitively, has the minimal bias relative to some perfect-periodic partition of the sequence, based on the evolutionary distance. We devise a quadratic time and space algorithm for detecting a pseudo-periodic partition of a given sequence, which corresponds to the shortest path in the main diagonal of the directed acyclic weighted graph constructed by the Smith-Waterman self-alignment of the sequence. We use several typical examples to demonstrate the use of our algorithm and software system in detecting functional or structural domains and regions of proteins. A major advantage of our software is that it has a single parameter, the granularity factor, and any biological sequence family can be chosen as a training set to determine the best value of that parameter. In general, we choose all repeats (including many pseudo-repeats) in the SWISS-PROT amino acid sequence database as a typical training set. We show that the resulting granularity factor is 0.52 and that the average agreement accuracy of pseudo-periodic partitions, detected by our software for all pseudo-repeats in the SWISS-PROT database, is as high as 97.6%.

5.
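Liu Chaoyang, Zhuang Wenying. 菌物学报 (Mycosystema), 2013, 32(3): 563-573
In phylogenetic analyses based on rRNA genes, heterogeneity in evolutionary rate among sites can be an important source of systematic error. Using 52 fungi as the study set, we built partitioning strategies from rRNA secondary-structure features and examined how different partitioning strategies affect Bayesian analysis. The results show that the best-fit nucleotide substitution model for each structural partition, and its parameters, are closely tied to the partition type. Compared with the conventional Bayesian approach, partitioning by structural loops had no significant effect on the results, whereas introducing stem elements led to higher marginal likelihoods and support values. Moreover, simply increasing the number of subpartitions without regard to structural features raised the Bayes factor but did not improve the ability to resolve relationships, indicating that a sound partitioning strategy should be based on biological function (or secondary-structure features) rather than on purely mathematical considerations.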

6.
The evolution of homologous sequences affected by recombination or gene conversion cannot be adequately explained by a single phylogenetic tree. Many tree-based methods for sequence analysis, for example, those used for detecting sites evolving nonneutrally, have been shown to fail if such phylogenetic incongruity is ignored. However, it may be possible to propose several phylogenies that can correctly model the evolution of nonrecombinant fragments. We propose a model-based framework that uses a genetic algorithm to search a multiple-sequence alignment for putative recombination break points, quantifies the level of support for their locations, and identifies sequences or clades involved in putative recombination events. The software implementation can be run quickly and efficiently in a distributed computing environment, and various components of the methods can be chosen for computational expediency or statistical rigor. We evaluate the performance of the new method on simulated alignments and on an array of published benchmark data sets. Finally, we demonstrate that prescreening alignments with our method allows one to analyze recombinant sequences for positive selection.

7.
In many phylogenetic problems, assuming that species have evolved from a common ancestor by a simple branching process is unrealistic. Reticulate phylogenetic models, however, have been largely neglected because the concept of reticulate evolution has not been supported by appropriate analytical tools and software. The reticulate model can adequately describe such complicated mechanisms as hybridization between species or lateral gene transfer in bacteria. In this paper, we describe a new algorithm for inferring reticulate phylogenies from evolutionary distances among species. The algorithm is capable of detecting contradictory signals encompassed in a phylogenetic tree and identifying possible reticulate events that may have occurred during evolution. The algorithm produces a reticulate phylogeny by gradually improving upon the initial solution provided by a phylogenetic tree model. The new algorithm is compared to the popular SplitsGraph method in a reanalysis of the evolution of photosynthetic organisms. A computer program to construct and visualize reticulate phylogenies, called T-Rex (Tree and Reticulogram Reconstruction), is available to researchers at the following URL: www.fas.umontreal.ca/biol/casgrain/en/labo/t-rex.

8.
Reticulate, or non-bifurcating, evolution is now recognized as an important phenomenon shaping the histories of many organisms. It appears to be particularly common in plants, especially in ferns, which have relatively few barriers to intra- and interspecific hybridization. Reticulate evolutionary patterns have been recognized in many fern groups, though very few have been studied rigorously using modern molecular phylogenetic techniques in order to determine the causes of the reticulate patterns. In the current study, we examine patterns of branching and reticulate evolution in the genus Dryopteris, the woodferns. The North American members of this group have long been recognized as a classic example of reticulate evolution in plants, and we extend analysis of the genus to all 30 species in the New World, as well as numerous taxa from other regions. We employ sequence data from the plastid and nuclear genomes and use maximum parsimony (MP), maximum likelihood (ML), Bayesian inference (BI), and divergence time analyses to explore the relationships of New World Dryopteris to other regions and to reconstruct the timing and events that may have led to taxa displaying reticulate rather than strictly branching histories. We find evidence for reticulation among both the North and Central/South American groups of species, and our data support a classic hypothesis for reticulate evolution via allopolyploid speciation in the North American taxa, including an extinct diploid progenitor in this group. In the Central and South American species, we find evidence of extensive reticulation involving unknown ancestors from Asia, and we reject deep coalescent processes such as incomplete lineage sorting in favor of more recent intercontinental hybridization and chloroplast capture as an explanation for the origin of the Latin American reticulate taxa.

9.
The number of nuclear small subunit (SSU) ribosomal RNA (rRNA) sequences for Nematoda has increased dramatically in recent years, and although their use in constructing phylogenies has also increased, relatively little attention has been given to their alignment. Here we examined the sensitivity of the nematode SSU data set to different alignment parameters and to the removal of alignment ambiguous regions. Ten alignments were created with CLUSTAL W using different sets of alignment parameters (10 full alignments), and each alignment was examined by eye and alignment ambiguous regions were removed (creating 10 reduced alignments). These alignment ambiguous regions were analyzed as a third type of data set, culled alignments. Maximum parsimony, neighbor-joining, and parsimony bootstrap analyses were performed. The resulting phylogenies were compared to each other by the symmetric difference distance tree comparison metric (SymD). The correlation of the phylogenies with the alignment parameters was tested by comparing matrices from SymD with corresponding matrices of Manhattan distances representing the alignment parameters. Differences among individual parsimony trees from the full alignments were frequently correlated with the differences among alignment parameters (580/1000 tests), as were trees from the culled alignments (403/1000 tests). Differences among individual parsimony trees from the reduced alignments were less frequently correlated with the differences among alignment parameters (230/1000 tests). Differences among majority-rule consensus trees (50%) from the parsimony analysis of the full alignments were significantly correlated with the differences among alignment parameters, whereas consensus trees from the reduced and culled analyses were not correlated with the alignment parameters. 
These patterns of correlation confirm that choice of alignment parameters has the potential to bias the resultant phylogenies for the nematode SSU data set, and suggest that the removal of alignment ambiguous regions reduces this effect. Finally, we discuss the implications of conservative phylogenetic hypotheses for Nematoda produced by exploring alignment space and removing alignment ambiguous regions for SSU rDNA.
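
The two distances compared in this abstract can be sketched as follows. The symmetric difference distance counts the bipartitions (splits) found in one tree but not the other, and the Manhattan distance sums absolute differences between alignment parameter settings. The split encoding, the helper `canonical_split`, and the example parameter values are illustrative assumptions, not taken from the study.

```python
def canonical_split(side, taxa):
    """Return a split as the side that does not contain the reference taxon.

    The reference taxon is simply the alphabetically first one (an assumed
    convention), so each bipartition has exactly one representation.
    """
    side = frozenset(side)
    all_taxa = frozenset(taxa)
    reference = min(all_taxa)
    return all_taxa - side if reference in side else side


def symmetric_difference_distance(splits_a, splits_b):
    """Number of splits present in exactly one of the two trees."""
    return len(set(splits_a) ^ set(splits_b))


def manhattan_distance(params_a, params_b):
    """Sum of absolute differences between two parameter vectors."""
    if len(params_a) != len(params_b):
        raise ValueError("parameter vectors must have equal length")
    return sum(abs(x - y) for x, y in zip(params_a, params_b))


if __name__ == "__main__":
    taxa = ["A", "B", "C", "D", "E"]
    # Internal splits of ((A,B),C,(D,E)) and ((A,C),B,(D,E)).
    tree_1 = {canonical_split("AB", taxa), canonical_split("DE", taxa)}
    tree_2 = {canonical_split("AC", taxa), canonical_split("DE", taxa)}
    print(symmetric_difference_distance(tree_1, tree_2))
    # Hypothetical (gap opening, gap extension) settings for two alignments.
    print(manhattan_distance((10.0, 0.2), (15.0, 0.5)))
```

Computing both distances for every pair of alignments gives the two matrices whose correlation the abstract describes testing.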

10.
In a case study of fungi of the class Sordariomycetes, we evaluated the effect of multiple sequence alignment (MSA) on the reliability of the phylogenetic trees, topology and confidence of major phylogenetic clades. We compared two main approaches for constructing MSA based on (1) the knowledge of the secondary (2D) structure of ribosomal RNA (rRNA) genes, and (2) automatic construction of MSA by four alignment programs characterized by different algorithms and evaluation methods, CLUSTAL, MAFFT, MUSCLE, and SAM. In the primary fungal sequences of the two functional rRNA genes, the nuclear small and large ribosomal subunits (18S and 28S), we identified four and six highly variable regions, respectively, which correspond mainly to hairpin loops in the 2D structure. These loops are often positioned in expansion segments, which are missing or are not completely developed in the Archaeal and Eubacterial kingdoms. Proper sorting of these sites was key to constructing an accurate MSA. We used DNA sequences from 28S as an example for one-gene analysis. Five different MSAs were created and analyzed with maximum parsimony and maximum likelihood methods. Phylogenies inferred from the alignments that were improved using 2D structure with identified homologous segments, and from those constructed with the MAFFT alignment program with all highly variable regions included, provided the most reliable phylograms, with higher bootstrap support for the majority of clades. We provide examples demonstrating that re-evaluating ambiguous positions in the consensus sequences using 2D structure and covariance is a promising means of improving the quality and reliability of sequence alignments.

11.
The nonsynonymous to synonymous substitution rate ratio (omega = d(N)/d(S)) provides a sensitive measure of selective pressure at the protein level, with omega values <1, =1, and >1 indicating purifying selection, neutral evolution, and diversifying selection, respectively. Maximum likelihood models of codon substitution developed recently account for variable selective pressures among amino acid sites by employing a statistical distribution for the omega ratio among sites. Those models, called random-sites models, are suitable when we do not know a priori which sites are under what kind of selective pressure. Sometimes prior information (such as the tertiary structure of the protein) might be available to partition sites in the protein into different classes, which are expected to be under different selective pressures. It is then sensible to use such information in the model. In this paper, we implement maximum likelihood models for prepartitioned data sets, which account for the heterogeneity among site partitions by using different omega parameters for the partitions. The models, referred to as fixed-sites models, are also useful for combined analysis of multiple genes from the same set of species. We apply the models to data sets of the major histocompatibility complex (MHC) class I alleles from human populations and of the abalone sperm lysin genes. Structural information is used to partition sites in MHC into two classes: those in the antigen recognition site (ARS) and those outside. Positive selection is detected in the ARS by the fixed-sites models. Similarly, sites in lysin are classified into the buried and solvent-exposed classes according to the tertiary structure, and positive selection was detected at the solvent-exposed sites. The random-sites models identified a number of sites under positive selection in each data set, confirming and elaborating the results of the fixed-sites models. 
The analysis demonstrates the utility of the fixed-sites models, as well as the power of previous random-sites models, which do not use the prior information to partition sites.
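
As a small illustration of how the omega ratio is read for prepartitioned sites, the sketch below classifies each site class by its omega value using the thresholds stated in the abstract. In the paper omega is estimated by maximum likelihood under a codon model; here the per-class d(N) and d(S) values are made-up placeholders, and the function names are assumptions.

```python
def classify_selection(omega, tol=1e-9):
    """Interpret omega = dN/dS: <1 purifying, =1 neutral, >1 diversifying."""
    if omega < 1 - tol:
        return "purifying selection"
    if omega > 1 + tol:
        return "diversifying selection"
    return "neutral evolution"


def omega_by_partition(rates):
    """Map each site class to (omega, interpretation).

    rates: dict of class name -> (dN, dS); the values are hypothetical inputs.
    """
    result = {}
    for name, (dn, ds) in rates.items():
        if ds <= 0:
            raise ValueError(f"dS must be positive for class {name!r}")
        omega = dn / ds
        result[name] = (omega, classify_selection(omega))
    return result


if __name__ == "__main__":
    # Placeholder numbers only, not results from the study.
    example = {"ARS": (0.30, 0.15), "non-ARS": (0.05, 0.20)}
    for name, (omega, label) in omega_by_partition(example).items():
        print(f"{name}: omega = {omega:.2f} ({label})")
```

A fixed-sites analysis would fit one omega per class jointly with the tree; the simple ratio above only shows how the resulting values are interpreted.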

12.
A new sequence distance measure for phylogenetic tree construction   Cited by: 5 (self-citations: 0, citations by others: 5)
MOTIVATION: Most existing approaches for phylogenetic inference use multiple alignment of sequences and assume some sort of an evolutionary model. The multiple alignment strategy does not work for all types of data, e.g. whole genome phylogeny, and the evolutionary models may not always be correct. We propose a new sequence distance measure based on the relative information between the sequences using Lempel-Ziv complexity. The distance matrix thus obtained can be used to construct phylogenetic trees. RESULTS: The proposed approach does not require sequence alignment and is totally automatic. The algorithm has successfully constructed consistent phylogenies for real and simulated data sets. AVAILABILITY: Available on request from the authors.
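
An alignment-free distance of this kind might be sketched as below. The complexity function counts the phrases in a Lempel-Ziv (1976) style parsing, and the distance asks how many new phrases one sequence needs when appended to the other. The abstract does not give the exact formula, so the normalisation used in `lz_distance` is one plausible choice and should be read as an assumption.

```python
def lz_complexity(s):
    """Number of phrases in a Lempel-Ziv (1976) style parsing of s.

    Each phrase is the shortest substring starting at the current position
    that cannot be copied from text beginning at an earlier position.
    """
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend while the candidate phrase also occurs starting before position i.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def lz_distance(s, q):
    """Normalised distance from the extra phrases needed to build one sequence after the other.

    Assumed form: max(c(sq) - c(s), c(qs) - c(q)) / max(c(s), c(q)).
    """
    if not s or not q:
        raise ValueError("sequences must be non-empty")
    c_s, c_q = lz_complexity(s), lz_complexity(q)
    extra = max(lz_complexity(s + q) - c_s, lz_complexity(q + s) - c_q)
    return extra / max(c_s, c_q)


if __name__ == "__main__":
    seqs = {"t1": "ACGTACGTAC", "t2": "ACGTACGTTC", "t3": "GGGGCCCCAA"}
    names = sorted(seqs)
    for a in names:
        print(a, [round(lz_distance(seqs[a], seqs[b]), 3) for b in names])
```

The resulting matrix could then be passed to any distance-based tree-building method, such as neighbor joining.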

13.
Phylogenetic studies based on DNA sequences typically ignore the potential occurrence of recombination, which may produce different alignment regions with different evolutionary histories. Traditional phylogenetic methods assume that a single history underlies the data. If recombination is present, can we expect the inferred phylogeny to represent any of the underlying evolutionary histories? We examined this question by applying traditional phylogenetic reconstruction methods to simulated recombinant sequence alignments. The effect of recombination on phylogeny estimation depended on the relatedness of the sequences involved in the recombinational event and on the extent of the different regions with different phylogenetic histories. Given the topologies examined here, when the recombinational event was ancient, or when recombination occurred between closely related taxa, one of the two phylogenies underlying the data was generally inferred. In this scenario, the evolutionary history corresponding to the majority of the positions in the alignment was generally recovered. Very different results were obtained when recombination occurred recently among divergent taxa. In this case, when the recombinational breakpoint divided the alignment into two regions of similar length, a phylogeny that was different from any of the true phylogenies underlying the data was inferred.

14.
It is at present difficult to accurately position gaps in sequence alignment and to determine substructural homology in structure alignment when reconstructing phylogenies based on highly divergent sequences. Therefore, we have developed a new strategy for inferring phylogenies based on highly divergent sequences. In this new strategy, the whole secondary structure presented as a string in bracket notation is used as phylogenetic characters to infer phylogenetic relationships. It is no longer necessary to decompose the secondary structure into homologous substructural components. In this study, reliable phylogenetic relationships of eight species in Pectinidae were inferred from the structure alignment, but not from sequence alignment, even with the aid of structural information. The results suggest that this new strategy should be useful for inferring phylogenetic relationships based on highly divergent sequences. Moreover, the structural evolution of ITS1 in Pectinidae was also investigated. The whole ITS1 structure could be divided into four structural domains. Compensatory changes were found in all four structural domains. Structural motifs in these domains were identified further. These motifs, especially those in D2 and D3, may have important functions in the maturation of rRNAs.
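
For readers unfamiliar with bracket notation, the sketch below parses a dot-bracket string into its base pairs, where "(" and ")" mark paired positions and "." marks unpaired ones. It only shows how such a string encodes a structure; how the study turns these strings into phylogenetic characters is not specified in the abstract, and the function name and example string are assumptions.

```python
def base_pairs(structure):
    """Return the sorted list of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)  # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # pair with the most recent open bracket
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # A made-up hairpin followed by a short unpaired tail.
    print(base_pairs("((((....))))..."))
```

Because the whole string is kept, two structures can be compared position by position like any other character string once they are aligned.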

15.
The performance of the computer program for phylogenetic analysis, POY, and its two implemented methods, "optimization alignment" and "fixed-states optimization," are explored for four data sets. Four gap costs are analyzed for every partition; some of the partitions (the 18S rRNA) are treated as a single fragment or in increasing fragments of 3, 10, and 30. Comparisons within and among methods are undertaken according to gap cost, number of fragments in which the sequences are divided, tree length, character congruence, topological congruence, primary homology statements, and computation time.

16.

Background

Phylogenetic methods produce hierarchies of molecular species, inferring knowledge about taxonomy and evolution. However, there is not yet a consensus methodology that provides a crisp partition of taxa, desirable when considering the problem of intra/inter-patient quasispecies classification or infection transmission event identification. We introduce the threshold bootstrap clustering (TBC), a new methodology for partitioning molecular sequences, that does not require a phylogenetic tree estimation.

Methodology/Principal Findings

The TBC is an incremental partition algorithm, inspired by the stochastic Chinese restaurant process, and takes advantage of resampling techniques and models of sequence evolution. TBC uses as input a multiple alignment of molecular sequences and its output is a crisp partition of the taxa into an automatically determined number of clusters. By varying initial conditions, the algorithm can produce different partitions. We describe a procedure that selects a prime partition among a set of candidate ones and calculates a measure of cluster reliability. TBC was successfully tested for the identification of type-1 human immunodeficiency and hepatitis C virus subtypes, and compared with previously established methodologies. It was also evaluated in the problem of HIV-1 intra-patient quasispecies clustering, and for transmission cluster identification, using a set of sequences from patients with known transmission event histories.

Conclusion

TBC has been shown to be effective for the subtyping of HIV and HCV, and for identifying intra-patient quasispecies. To some extent, the algorithm was also able to infer clusters corresponding to events of infection transmission. The computational complexity of TBC is quadratic in the number of taxa, lower than that of other established methods; in addition, TBC has been enhanced with a measure of cluster reliability. TBC can be useful for characterising molecular quasispecies in a broad context.

17.
Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super-gene alignment, which is then analyzed to generate the species tree. In the other, phylogenies are inferred separately from each gene, and a consensus of these gene phylogenies is used to represent the species tree. Here, we have compared these two approaches by means of computer simulation, using 448 parameter sets, including evolutionary rate, sequence length, base composition, and transition/transversion rate bias. In these simulations, we emphasized a worst-case scenario analysis in which 100 replicate datasets for each evolutionary parameter set (gene) were generated, and the replicate dataset that produced a tree topology showing the largest number of phylogenetic errors was selected to represent that parameter set. Both randomly selected and worst-case replicates were utilized to compare the consensus and concatenation approaches primarily using the neighbor-joining (NJ) method. We find that the concatenation approach yields more accurate trees, even when the sequences concatenated have evolved with very different substitution patterns and no attempts are made to accommodate these differences while inferring phylogenies. These results appear to hold true for parsimony and likelihood methods as well. The concatenation approach shows >95% accuracy with only 10 genes. However, this gain in accuracy is sometimes accompanied by reinforcement of certain systematic biases, resulting in spuriously high bootstrap support for incorrect partitions, whether we employ site, gene, or a combined bootstrap resampling approach. Therefore, it will be prudent to report the number of individual genes supporting an inferred clade in the concatenated sequence tree, in addition to the bootstrap support.
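
The concatenation step described above can be sketched as joining per-gene alignments taxon by taxon into a single super-gene alignment. The dictionary-based input format, the function name, and the choice to pad genes missing for a taxon with gap characters are assumptions made for illustration; the simulations in the abstract need not involve missing genes.

```python
def concatenate_alignments(gene_alignments, missing="-"):
    """Join per-gene alignments (dicts of taxon -> aligned sequence) into one supermatrix."""
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for index, aln in enumerate(gene_alignments):
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {index} is empty or not aligned to a common length")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa lacking this gene are padded with the missing-data symbol.
            parts[taxon].append(aln.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


if __name__ == "__main__":
    genes = [
        {"sp1": "ACGT", "sp2": "ACGA", "sp3": "ATGA"},
        {"sp1": "TTG", "sp3": "TAG"},
    ]
    for taxon, seq in concatenate_alignments(genes).items():
        print(taxon, seq)
```

The consensus approach would instead infer one tree per entry of `genes` and summarise those trees afterwards.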

18.
The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon, known as heterotachy, can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe it, and that it serves to identify the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.
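
The core of such a mixture can be sketched as a weighted sum, at every site, of the likelihoods obtained under each branch-length set. The sketch assumes the per-site likelihoods have already been computed (for example with a pruning algorithm, not shown) and that each branch-length set has a mixture weight; the weights, the function name, and the numbers are assumptions rather than details given in the abstract.

```python
import math


def mixture_log_likelihood(site_likelihoods, weights):
    """Log-likelihood of an alignment under a mixture of branch-length sets.

    site_likelihoods[i][j]: likelihood of site i under branch-length set j
                            (assumed precomputed on one fixed topology).
    weights[j]:             mixture weight of set j; weights must sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for per_set in site_likelihoods:
        if len(per_set) != len(weights):
            raise ValueError("each site needs one likelihood per branch-length set")
        # Sum over branch-length sets at this site, then accumulate on the log scale.
        total += math.log(sum(w * lik for w, lik in zip(weights, per_set)))
    return total


if __name__ == "__main__":
    # Placeholder likelihoods for three sites under two branch-length sets.
    likelihoods = [[0.010, 0.002], [0.001, 0.020], [0.005, 0.005]]
    print(mixture_log_likelihood(likelihoods, weights=[0.6, 0.4]))
```

A site that fits the first set well and another that fits the second both contribute a reasonable likelihood, which is how the mixture lets different sites follow different branch lengths on one tree.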

19.
Modelling phylogenetic relationships using reticulated networks   Cited by: 1 (self-citations: 0, citations by others: 1)
Makarenkov, V., Legendre, P. & Desdevises, Y. (2004). Modelling phylogenetic relationships using reticulated networks. Zoologica Scripta, 33, 89–96.
Most traditional methods of phylogenetic analysis assume that species evolution can be represented by means of a bifurcating tree model. In many phylogenetic situations, however, some of the evolutionary links between species are due to reticulate evolution. For instance, reticulate models can adequately describe such complicated mechanisms as lateral gene transfer in bacteria or species hybridization. The theoretical concepts of reticulate evolution developed in the 1980s and 1990s need to be supported by appropriate analytical tools and software. In this paper, we present the main features of a new distance-based method for modelling phylogenetic relationships among species by means of reticulated networks (RNs). The method uses the least-squares model to build an RN by gradually improving upon the solution provided by a phylogenetic tree. A computer program facilitating the reconstruction and visualization of reticulate phylogenies is made available to researchers. In the application section, we illustrate the usefulness of the method by studying the evolution of honeybees (genus Apis). The method for reconstructing RNs has been included in the T-Rex (Tree and Reticulogram Reconstruction) package recently developed by the first-named author.

20.

Background

Visualising the evolutionary history of a set of sequences is a challenge for molecular phylogenetics. One approach is to use undirected graphs, such as median networks, to visualise phylogenies where reticulate relationships such as recombination or homoplasy are displayed as cycles. Median networks contain binary representations of sequences as nodes, with edges connecting those sequences differing at one character; hypothetical ancestral nodes are invoked to generate a connected network which contains all most parsimonious trees. Quasi-median networks are a generalisation of median networks which are not restricted to binary data, although phylogenetic information contained within the multistate positions can be lost during the preprocessing of data. Where the history of a set of samples contains frequent homoplasies or recombination events, quasi-median networks will have a complex topology. Graph reduction or pruning methods have been used to reduce network complexity, but some of these methods are inapplicable to datasets in which recombination has occurred and others are procedurally complex and/or result in disconnected networks.

Results

We address the problems inherent in construction and reduction of quasi-median networks. We describe a novel method of generating quasi-median networks that uses all characters, both binary and multistate, without imposing an arbitrary ordering of the multistate partitions. We also describe a pruning mechanism which maintains at least one shortest path between observed sequences, displaying the underlying relations between all pairs of sequences while maintaining a connected graph.

Conclusion

Application of this approach to 5S rDNA sequence data from sea beet produced a pruned network within which genetic isolation between populations by distance was evident, demonstrating the value of this approach for exploration of evolutionary relationships.
