Similar Documents
20 similar documents found (search time: 15 ms)
1.
A fast method for reconstructing phylogenies from distance data is presented. The method is economical in the number of pairwise comparisons needed. It can be combined with a new phylogenetic alignment procedure to yield an algorithm that gives a complete history of a set of homologous sequences. The method is applicable to very large distance matrices. An auxiliary program was developed that simplifies large phylogenies without ignoring biologically essential features. A set of 213 globins from vertebrates, plants, and Vitreoscilla (a prokaryote) was analyzed using this method.

2.
Efficient determination of evolutionary distances is important for the correct reconstruction of phylogenetic trees. The performance of a pooled distance for tree reconstruction can be improved by applying large weights to distances that are appropriate for reconstructing phylogenetic trees and small weights to inappropriate ones. We developed two weighting methods, the modified Tajima–Takezaki method and the modified least-squares method, for reconstructing phylogenetic trees from multiple loci. Computer simulations showed that both new methods were more efficient at recovering the correct topology than the unweighted method. We then reconstructed hominoid phylogenetic trees from mitochondrial DNA using the new methods and found that both the modified Tajima–Takezaki and the modified least-squares methods significantly increased the levels of bootstrap support.
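To make the weighting idea concrete, here is a minimal Python sketch of pooling per-locus distances for one pair of taxa. It assumes Jukes–Cantor distances and plain inverse-variance weights; this is a generic scheme, not the modified Tajima–Takezaki or modified least-squares weights described above, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

def jc_distance(p: float) -> float:
    """Jukes-Cantor distance from the proportion p of differing sites (assumes 0 <= p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc_variance(p: float, n_sites: int) -> float:
    """Large-sample variance of the Jukes-Cantor distance for a locus of n_sites."""
    return p * (1.0 - p) / (n_sites * (1.0 - 4.0 * p / 3.0) ** 2)

def pooled_distance(loci: List[Tuple[float, int]]) -> float:
    """Pool per-locus distances, weighting each by the inverse of its variance.

    loci: (p, n_sites) pairs; assumes 0 < p < 0.75 so every variance is positive.
    Loci with noisier distance estimates receive smaller weights.
    """
    num = den = 0.0
    for p, n_sites in loci:
        w = 1.0 / jc_variance(p, n_sites)
        num += w * jc_distance(p)
        den += w
    return num / den

# A long, moderately divergent locus dominates a short, more saturated one.
print(pooled_distance([(0.10, 1000), (0.30, 100)]))
```

Replacing the inverse-variance weights with another rule only changes the line computing `w`.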

3.
MOTIVATION: Inferring networks of proteins from biological data is a central issue of computational biology. Most network inference methods, including Bayesian networks, take unsupervised approaches in which the network is entirely unknown at the outset and all edges have to be predicted. A more realistic supervised framework, proposed recently, assumes that a substantial part of the network is known. We propose a new kernel-based method for supervised graph inference based on multiple types of biological datasets such as gene expression, phylogenetic profiles and amino acid sequences. Notably, our method assigns a weight to each type of dataset and thereby selects informative ones. Data selection is useful for reducing data collection costs: for example, when a similar network inference problem must be solved for other organisms, the datasets excluded by our algorithm need not be collected. RESULTS: First, we formulate supervised network inference as a kernel matrix completion problem, in which inferring edges reduces to estimating the missing entries of a kernel matrix. Then, we propose an expectation-maximization algorithm that simultaneously infers the missing entries of the kernel matrix and the weights of the multiple datasets. By introducing the weights, we can integrate multiple datasets selectively and thereby exclude irrelevant and noisy ones. Our approach performed favorably on two biological networks: a metabolic network and a protein interaction network. AVAILABILITY: Software is available on request.

4.
Phylogenetic networks are a generalization of phylogenetic trees that allow for the representation of nontreelike evolutionary events, like recombination, hybridization, or lateral gene transfer. While much progress has been made in finding practical algorithms for reconstructing a phylogenetic network from a set of sequences, all attempts to endow a class of phylogenetic networks (strictly extending the class of phylogenetic trees) with a well-founded distance measure have, to the best of our knowledge and with the sole exception of the bipartition distance on regular networks, failed so far. In this paper, we present and study a new meaningful class of phylogenetic networks, called tree-child phylogenetic networks, and we provide an injective representation of these networks as multisets of vectors of natural numbers, their path multiplicity vectors. We then use this representation to define a distance on this class that extends the well-known Robinson-Foulds distance for phylogenetic trees and to give an alignment method for pairs of networks in this class. Simple polynomial algorithms are provided for reconstructing a tree-child phylogenetic network from its path multiplicity vectors, for computing the distance between two tree-child phylogenetic networks, and for aligning a pair of tree-child phylogenetic networks. They have been implemented as a Perl package and a Java applet, which can be found at http://bioinfo.uib.es/~recerca/phylonetworks/mudistance/.
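As we understand the construction, path multiplicity (μ) vectors are easy to compute on a rooted directed acyclic graph: a leaf labelled i gets the i-th unit vector, and an internal node gets the sum of its children's vectors, which counts the paths from that node to each leaf. The Python sketch below computes the distance as the size of the multiset symmetric difference of μ-vectors. It assumes both networks share the same leaf labels and are stored as hypothetical node-to-children dictionaries; it does not check the tree-child property and is not the authors' Perl or Java code.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical encoding: every node maps to its list of children; leaves map to [].
Network = Dict[str, List[str]]

def mu_vectors(net: Network) -> Counter:
    """Multiset of path-multiplicity vectors, one per node of a rooted DAG."""
    leaves = sorted(v for v, children in net.items() if not children)
    index = {leaf: k for k, leaf in enumerate(leaves)}
    memo: Dict[str, Tuple[int, ...]] = {}

    def mu(v: str) -> Tuple[int, ...]:
        if v not in memo:
            vec = [0] * len(leaves)
            if not net[v]:
                vec[index[v]] = 1  # a leaf reaches only itself
            else:
                for child in net[v]:  # paths from v = sum of paths from its children
                    for k, count in enumerate(mu(child)):
                        vec[k] += count
            memo[v] = tuple(vec)
        return memo[v]

    return Counter(mu(v) for v in net)

def mu_distance(n1: Network, n2: Network) -> int:
    """Size of the symmetric difference of the two multisets of mu-vectors."""
    m1, m2 = mu_vectors(n1), mu_vectors(n2)
    return sum(((m1 - m2) + (m2 - m1)).values())

t1 = {"r": ["x", "C"], "x": ["A", "B"], "A": [], "B": [], "C": []}
t2 = {"r": ["y", "A"], "y": ["B", "C"], "A": [], "B": [], "C": []}
print(mu_distance(t1, t2))  # 2: cluster {A,B} is only in t1, {B,C} only in t2
```

On trees every μ-vector is the 0/1 indicator of a node's cluster, so the distance counts clusters present in only one tree, in the spirit of the Robinson-Foulds distance; on networks, a node below a reticulation contributes counts greater than one.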

5.
In this paper, we evaluate the relative performance of competing approaches for estimating phylogenies from incomplete distance matrices. The direct approach proceeds with phylogenetic reconstruction while ignoring missing cells, whereas the indirect approach first estimates the missing distances and then performs the phylogenetic analysis. Two distinct indirect procedures, based on the ultrametric inequality and the four-point condition, are further compared. Using simulations, we show that the indirect methods give more reliable results. As expected, the phylogenies become less accurate as the percentage of missing cells increases, but combining different estimation methods greatly improves accuracy. An application to bat phylogeny confirms the results of the simulation study and illustrates the effect of missing distances on the construction of supertrees.
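The ultrametric inequality gives a direct way to estimate a missing cell: in an ultrametric, d(i,j) ≤ max(d(i,k), d(j,k)) for every third taxon k, with equality whenever d(i,k) ≠ d(j,k). The Python sketch below takes the tightest such bound over all k. This estimator is an illustrative assumption, not necessarily the exact rule evaluated in the paper, and the function names are hypothetical. When i and j are sister taxa, every k ties and only an upper bound is returned.

```python
from typing import List, Optional

# Distance matrix with None marking a missing cell (hypothetical encoding).
Matrix = List[List[Optional[float]]]

def ultrametric_estimate(d: Matrix, i: int, j: int) -> Optional[float]:
    """Tightest upper bound on d[i][j] implied by d(i,j) <= max(d(i,k), d(j,k))."""
    bounds = [max(d[i][k], d[j][k]) for k in range(len(d))
              if k not in (i, j) and d[i][k] is not None and d[j][k] is not None]
    return min(bounds) if bounds else None

def fill_missing(d: Matrix) -> Matrix:
    """Copy of d with each missing cell replaced by its estimate,
    computed from the originally observed distances only."""
    out = [row[:] for row in d]
    n = len(d)
    for i in range(n):
        for j in range(i + 1, n):
            if d[i][j] is None:
                out[i][j] = out[j][i] = ultrametric_estimate(d, i, j)
    return out

# Ultrametric for (((A,B),C),D) with d(A,C) removed.
D = [[0, 2, None, 6],
     [2, 0, 4, 6],
     [None, 4, 0, 6],
     [6, 6, 6, 0]]
print(fill_missing(D)[0][2])  # 4, recovered because k = B breaks the tie
```

The four-point condition can be used the same way with quartets, bounding d(i,j) by max(d(i,k)+d(j,l), d(i,l)+d(j,k)) − d(k,l).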

6.
Summary: The common but generally overlooked problem of how best to construct phylogenies from orthologous amino acid sequences, when their alignment requires gaps denoting insertions/deletions in the evolutionary history of their genes since their common ancestor, has been studied. Three diverse methods were examined: 1. each missing residue in a gap is weighted as equivalent to the average number of minimum nucleotide replacements in known conjugate amino acid pairs of those same two sequences, a weight that necessarily differs for each pair of sequences; 2. each missing residue in a gap is weighted as equivalent to a fixed number of nucleotide replacements; and 3. each gap, regardless of length, is weighted as equivalent to a fixed number of nucleotide replacements. For the flavodoxins, each method yielded a different best tree, suggesting that the choice of method may be crucial. For the plant ferredoxins, all methods gave results inconsistent with botanical classification, suggesting that the sequences may not all be orthologous. For the bacterial ferredoxins, the method mattered less than the actual weight used, with five different best trees obtained depending on the weight. The best tree for all ferredoxins combined (prokaryotic plus eukaryotic) proved to depend greatly on the gap locations, with several reasonable alignments yielding different best trees. These results also suggest that functional equivalence may well prove to be a poor guide to which residues share a common ancestral codon. The rubredoxin sequences show that a partial internal gene duplication occurred in the Pseudomonas line, probably very soon after its divergence from the other genera. Together, the results clearly indicate that the phylogenetic answer obtained may depend greatly on how gaps are treated, but they do not indicate which treatment is best. This is partly because the phylogenies of the taxa represented are not known with sufficient confidence to tell when the procedures are performing best.

7.
SUMMARY: ProfDist is a user-friendly software package that uses the profile neighbor-joining (PNJ) method to infer phylogenies from profile distances on DNA or RNA sequences. It is a tool for reconstructing and visualizing large phylogenetic trees, providing new and standard features with a special focus on time efficiency, robustness and accuracy. AVAILABILITY: A Windows version of ProfDist comes with a graphical user interface and is freely available at http://profdist.bioapps.biozentrum.uni-wuerzburg.de
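In profile-based methods, a group of closely related aligned sequences is summarised by a profile of per-column residue frequencies, and distances are computed between profiles rather than between single sequences. The Python sketch below illustrates that idea with a simple expected-mismatch (p-distance) between two profiles. It is a stand-in built on assumptions, not ProfDist's distance estimator, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Profile = List[Dict[str, float]]  # one residue-frequency map per alignment column

def make_profile(seqs: List[str]) -> Profile:
    """Column-wise residue frequencies of aligned sequences, ignoring gaps ('-')."""
    profile: Profile = []
    for col in range(len(seqs[0])):
        residues = [s[col] for s in seqs if s[col] != "-"]
        n = len(residues)
        profile.append({r: c / n for r, c in Counter(residues).items()} if n else {})
    return profile

def profile_p_distance(p: Profile, q: Profile) -> float:
    """Probability that residues drawn from p and q differ, averaged over
    the columns where both profiles have data."""
    diffs = [1.0 - sum(f * qc.get(r, 0.0) for r, f in pc.items())
             for pc, qc in zip(p, q) if pc and qc]
    return sum(diffs) / len(diffs) if diffs else float("nan")

group = make_profile(["AAC", "AGC"])  # a two-sequence group
single = make_profile(["AAC"])        # a profile of one sequence
print(profile_p_distance(group, single))
```

A model-based correction, such as Jukes–Cantor, could then be applied to the resulting p-distance before tree building.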

8.
Species trees have traditionally been inferred from a few selected markers, and genome‐wide investigations remain largely restricted to model organisms or small groups of species for which sampling of fresh material is available, leaving out most of the existing and historical species diversity. The genomes of an increasing number of species, including specimens extracted from natural history collections, are being sequenced at low depth. While these data sets are widely used to analyse organelle genomes, the nuclear fraction is generally ignored. Here we evaluate different reference‐based methods to infer phylogenies of large taxonomic groups from such data sets. Using the example of the Oleeae tribe, a worldwide‐distributed group, we build phylogenies based on single nucleotide polymorphisms (SNPs) obtained using two reference genomes (the olive and ash trees). The inferred phylogenies are overall congruent, yet present differences that might reflect the effect of distance to the reference on the amount of missing data. To limit this issue, genome complexity was reduced by using pairs of orthologous coding sequences as the reference, thus allowing us to combine SNPs obtained using two distinct references. Concatenated and coalescence trees based on these combined SNPs suggest events of incomplete lineage sorting and/or hybridization during the diversification of this large phylogenetic group. Our results show that genome‐wide phylogenetic trees can be inferred from low‐depth sequence data sets for eukaryote groups with complex genomes and reticulate evolutionary histories. This opens new avenues for large‐scale phylogenomics and biogeographical analyses covering both the extant and the historical diversity stored in museum collections.

9.
MOTIVATION: Most molecular phylogenies are based on sequence alignments. Consequently, they fail to account for modes of sequence evolution that involve frequent insertions or deletions. Here we present a method for generating accurate gene and species phylogenies from whole genome sequence that makes use of short character string matches not placed within explicit alignments. In this work, the singular value decomposition of a sparse tetrapeptide frequency matrix is used to represent the proteins of organisms uniquely and precisely as vectors in a high-dimensional space. Vectors of this kind can be used to calculate pairwise distance values based on the angle separating the vectors, and the resulting distance values can be used to generate phylogenetic trees. Protein trees so derived can be examined directly for homologous sequences. Alternatively, vectors defining each of the proteins within an organism can be summed to provide a vector representation of the organism, which is then used to generate species trees. RESULTS: Using a large mitochondrial genome dataset, we have produced species trees that are largely in agreement with previously published trees based on the analysis of identical datasets using different methods. These trees also agree well with currently accepted phylogenetic theory. In principle, our method could be used to compare much larger bacterial or nuclear genomes in full molecular detail, ultimately allowing accurate gene and species relationships to be derived from a comprehensive comparison of complete genomes. Unlike in alignment-based phylogenetic methods, sequences that evolve by relative insertion or deletion would tend to remain recognizably similar.
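The vector-angle idea can be sketched in a few lines of Python with NumPy. This sketch assumes a dense count matrix restricted to the tetrapeptides that actually occur (the study used a sparse matrix), proteins of at least four residues, and an optional truncation rank `k`; it illustrates the described approach and is not the authors' code.

```python
from collections import Counter
from typing import List, Optional

import numpy as np

def tetrapeptide_counts(seq: str) -> Counter:
    """Overlapping 4-residue words of a protein sequence."""
    return Counter(seq[i:i + 4] for i in range(len(seq) - 3))

def protein_vectors(proteins: List[str], k: Optional[int] = None) -> np.ndarray:
    """One row per protein: its coordinates (S @ V^T) from the SVD of a
    tetrapeptide-by-protein count matrix, optionally truncated to rank k."""
    counts = [tetrapeptide_counts(p) for p in proteins]
    vocab = sorted(set().union(*counts))  # only words that actually occur
    a = np.array([[c[w] for c in counts] for w in vocab], dtype=float)
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    if k is not None:
        s, vt = s[:k], vt[:k]  # keep the k largest singular values
    return (s[:, None] * vt).T

def angle(u: np.ndarray, v: np.ndarray) -> float:
    """Angle in radians between two vectors, used as a pairwise distance."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

vecs = protein_vectors(["MKVLAAGIVL", "MKVLAAGIVL", "WYHPRSTEQD"])
print(angle(vecs[0], vecs[1]), angle(vecs[0], vecs[2]))
organism = vecs[:2].sum(axis=0)  # organism vector = sum of its protein vectors
```

Without truncation, the angles equal those between the raw count columns, because the left singular vectors are orthonormal; truncating to rank `k` is what turns the representation into an approximation.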

10.
MOTIVATION: The Profile Neighbor Joining (PNJ) algorithm, as implemented in the software ProfDist, is computationally efficient at reconstructing very large trees. Beyond the sheer amount of sequence data, structure is also important in RNA alignment analysis and phylogenetic reconstruction. RESULTS: To this end, ProfDistS provides a phylogenetic workflow that uses individual RNA secondary structures to reconstruct phylogenies from sequence-structure alignments, using PNJ with either manual or iterative, automatic profile definition. ProfDistS can also handle protein sequences.

11.
The problem of missing data is often considered to be the most important obstacle in reconstructing the phylogeny of fossil taxa and in combining data from diverse characters and taxa for phylogenetic analysis. Empirical and theoretical studies show that including highly incomplete taxa can lead to multiple equally parsimonious trees, poorly resolved consensus trees, and decreased phylogenetic accuracy. However, the mechanisms that cause incomplete taxa to be problematic have remained unclear. It has been widely assumed that incomplete taxa are problematic because of the proportion or amount of missing data that they bear. In this study, I use simulations to show that the reduced accuracy associated with including incomplete taxa is caused by these taxa bearing too few complete characters rather than too many missing data cells. This seemingly subtle distinction has a number of important implications. First, the so-called missing data problem for incomplete taxa is, paradoxically, not directly related to their amount or proportion of missing data. Thus, the level of completeness alone should not guide the exclusion of taxa (contrary to common practice), and these results may explain why empirical studies have sometimes found little relationship between the completeness of a taxon and its impact on an analysis. These results also (1) suggest a more effective strategy for dealing with incomplete taxa, (2) call into question a justification of the controversial phylogenetic supertree approach, and (3) show the potential for the accurate phylogenetic placement of highly incomplete taxa, both when combining diverse data sets and when analyzing relationships of fossil taxa.

12.
Elevated substitution rates estimated from ancient DNA sequences
Ancient DNA sequences can offer valuable insights into molecular evolutionary processes that are not directly accessible via modern DNA. They are particularly suitable for estimating substitution rates because their ages provide calibrating information in phylogenetic analyses, circumventing the difficult task of choosing independent calibration points. The substitution rates obtained from such datasets have typically been high, falling between the rates estimated from pedigrees and those estimated from species phylogenies. Many of these estimates have been made using a Bayesian phylogenetic method that explicitly accommodates heterochronous data. Prompted by recent criticism of this method, we present a comprehensive simulation study that validates its performance. For datasets of moderate size, it produces accurate estimates of rates while appearing robust to assumptions about demographic history. We then analyse a large collection of 749 ancient and 727 modern DNA sequences from 19 species of animals, plants and bacteria. Our new estimates confirm that the substitution rates estimated from ancient DNA sequences are elevated above long-term phylogenetic levels.
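The Bayesian tip-dating method itself is beyond a short example, but the calibrating role of sample ages can be illustrated with root-to-tip regression, a different and much simpler technique often used as a quick check of temporal signal. In this Python sketch the root-to-tip distances are assumed to come from an already estimated rooted tree, and sampling dates are in calendar years, so older samples have smaller dates.

```python
from typing import List, Tuple

def root_to_tip_rate(dates: List[float], dists: List[float]) -> Tuple[float, float]:
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, root_date): the slope (substitutions/site/year) and the
    x-intercept, i.e. the date at which the fitted line reaches zero distance.
    """
    n = len(dates)
    mx = sum(dates) / n
    my = sum(dists) / n
    sxx = sum((x - mx) ** 2 for x in dates)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dates, dists))
    rate = sxy / sxx
    intercept = my - rate * mx
    return rate, -intercept / rate

# Hypothetical samples from 2000 years ago, 1000 years ago and the present.
print(root_to_tip_rate([0.0, 1000.0, 2000.0], [0.10, 0.11, 0.12]))
```

Samples of known age spread along the x-axis are what make the slope, and hence the rate, estimable without external calibration points.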

13.

Background

Phylogenetic analyses based on datasets rich in both genes and species (phylogenomics) are becoming a standard approach to resolve evolutionary questions. However, several difficulties are associated with the assembly of large datasets, such as multiple copies of a gene per species (paralogous or xenologous genes), lack of some genes for a given species, or partial sequences. The use of undetected paralogous or xenologous genes in phylogenetic inference can lead to inaccurate results, and the use of partial sequences to a lack of resolution. A tool that selects sequences, species, and genes, while dealing with these issues, is needed in a phylogenomics context.

Results

Here, we present SCaFoS, a tool that quickly assembles phylogenomic datasets containing maximal phylogenetic information while adjusting the amount of missing data in the selection of species, sequences and genes. Starting from individual sequence alignments, and using monophyletic groups defined by the user, SCaFoS creates chimeras with partial sequences, or selects, among multiple sequences, the orthologous and/or slowest-evolving sequences. Once sequences representing each predefined monophyletic group have been selected, SCaFoS retains genes according to the user's allowed level of missing data and generates files for super-matrix and super-tree analyses in several formats compatible with standard phylogenetic inference software. Because no clear-cut criteria exist for sequence selection, a semi-automatic mode is available to accommodate the user's expertise.

Conclusion

SCaFoS can handle datasets of hundreds of species and genes, at either the amino acid or the nucleotide level. It has a graphical interface and can be integrated into an automatic workflow. Moreover, SCaFoS is the first tool that integrates the user's knowledge to select orthologous sequences, creates chimerical sequences to reduce missing data, and selects genes according to their level of missing data. Finally, by applying SCaFoS to different datasets, we show that the judicious selection of genes, species and sequences reduces tree reconstruction artefacts, especially if the dataset includes fast-evolving species.
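Two of the operations described above, building a chimera from partial sequences of the same predefined group and retaining genes by their level of missing data, can be sketched in Python as follows. The helper names, the priority order of the input sequences, and the threshold semantics are hypothetical; this is not SCaFoS's code or interface.

```python
from typing import Dict, List

GAP_CHARS = frozenset("-?")  # assumed gap / missing-data symbols

def make_chimera(seqs: List[str]) -> str:
    """Merge partial aligned sequences of one group column by column;
    earlier sequences in the list take priority where several have data."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        next((s[col] for s in seqs if s[col] not in GAP_CHARS), "-")
        for col in range(len(seqs[0]))
    )

def select_genes(genes: Dict[str, Dict[str, str]], groups: List[str],
                 max_missing: float) -> List[str]:
    """Keep genes for which the fraction of groups lacking a sequence is <= max_missing."""
    kept = []
    for gene, by_group in genes.items():
        missing = sum(1 for g in groups if g not in by_group) / len(groups)
        if missing <= max_missing:
            kept.append(gene)
    return kept

print(make_chimera(["AC--", "--GT"]))
print(select_genes({"g1": {"G1": "A", "G2": "C"}, "g2": {"G1": "A"}}, ["G1", "G2"], 0.0))
```

Putting the preferred sequence (for example, the slowest-evolving one) first in the list lets it win every column where it has data.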

14.
Craniometric measurements are a useful tool for studying the differentiation of mammal populations. However, the fragility of skulls often leads to incomplete data matrices. Damaged specimens or incomplete sets of measurements are usually discarded prior to statistical analysis. We assessed the performance of two strategies that avoid eliminating observations: (1) pairwise deletion of missing cells, and (2) estimation of missing data from the available measurements. The effect of these approaches on the computation of inter-individual distances and on population differentiation analyses was evaluated using craniometric measurements obtained from insular populations of deer mice, Peromyscus maniculatus (Wagner, 1845). In our simulations, Euclidean distances were greatly altered by pairwise deletion, whereas Gower's distance coefficient corrected for missing data provided accurate results. Among the estimation methods compared in this paper, regression-based approximations weighted by coefficients of determination (r²) outperformed the competing approaches. We further show that incomplete sets of craniometric measurements can be used to compute distance matrices, provided that an appropriate coefficient is selected. However, estimation procedures provide a flexible approach that allows researchers to analyse incomplete data sets.
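For quantitative variables, the Gower coefficient averages range-scaled absolute differences over the variables measured on both specimens, so a missing cell shrinks the denominator instead of pulling the distance toward zero. The Python sketch below contrasts it with Euclidean distance under pairwise deletion. It assumes all measurements are quantitative and equally weighted, and the helper names are hypothetical.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]  # one specimen; None marks a missing measurement

def variable_ranges(data: List[Row]) -> List[float]:
    """Range (max - min) of each measurement over the specimens where it is present."""
    ranges = []
    for col in zip(*data):
        present = [v for v in col if v is not None]
        ranges.append(max(present) - min(present) if present else 0.0)
    return ranges

def gower_distance(x: Row, y: Row, ranges: List[float]) -> float:
    """Range-scaled absolute differences averaged over shared, non-constant variables."""
    terms = [abs(a - b) / r for a, b, r in zip(x, y, ranges)
             if a is not None and b is not None and r > 0]
    return sum(terms) / len(terms) if terms else float("nan")

def euclidean_pairwise_deletion(x: Row, y: Row) -> float:
    """Euclidean distance over shared variables only; not rescaled, so pairs
    with more missing cells look artificially close."""
    return sum((a - b) ** 2 for a, b in zip(x, y)
               if a is not None and b is not None) ** 0.5

data = [[1.0, 2.0, None], [3.0, 2.0, 5.0], [5.0, 4.0, 15.0]]
rng = variable_ranges(data)
print(gower_distance(data[0], data[1], rng), euclidean_pairwise_deletion(data[0], data[1]))
```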

15.
It is currently difficult to position gaps accurately in sequence alignment and to determine substructural homology in structure alignment when reconstructing phylogenies from highly divergent sequences. We therefore developed a new strategy for inferring phylogenies from highly divergent sequences, in which the whole secondary structure, written as a string in bracket notation, is used as the set of phylogenetic characters. It is no longer necessary to decompose the secondary structure into homologous substructural components. In this study, reliable phylogenetic relationships among eight species of Pectinidae were inferred from the structure alignment, but not from the sequence alignment, even with the aid of structural information. The results suggest that this new strategy should be useful for inferring phylogenetic relationships from highly divergent sequences. The structural evolution of ITS1 in Pectinidae was also investigated. The whole ITS1 structure could be divided into four structural domains, and compensatory changes were found in all four. Structural motifs in these domains were further identified. These motifs, especially those in D2 and D3, may have important functions in rRNA maturation.
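Bracket (dot-bracket) notation encodes an RNA secondary structure as a string in which matching parentheses mark paired bases and dots mark unpaired ones. The Python sketch below parses such a string into base pairs with a stack and compares two structures by base-pair distance. This is a common, simple structural comparison used here for illustration, not the structure-alignment procedure of the study, and it assumes equal-length structures.

```python
from typing import Set, Tuple

def base_pairs(structure: str) -> Set[Tuple[int, int]]:
    """Parse dot-bracket notation into a set of 0-based (i, j) base pairs."""
    stack = []
    pairs = set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs

def base_pair_distance(s1: str, s2: str) -> int:
    """Number of base pairs present in exactly one of two equal-length structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    return len(base_pairs(s1) ^ base_pairs(s2))

print(sorted(base_pairs("((..))")))
print(base_pair_distance("((..))", "(....)"))
```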

16.
The neighbor-joining (NJ) method is widely used in reconstructing large phylogenies because of its computational speed and its high accuracy in phylogenetic inference, as revealed by computer simulation studies. However, most computer simulation studies have quantified the overall performance of the NJ method in terms of the percentage of branches inferred correctly or the percentage of replications in which the correct tree is recovered. We have examined other aspects of its performance, such as the relative efficiency in correctly reconstructing shallow (close to the external branches of the tree) and deep branches in large phylogenies; the contribution of zero-length branches to topological errors in the inferred trees; and the influence of increasing the tree size (number of sequences), evolutionary rate, and sequence length on the efficiency of the NJ method. Results show that the correct reconstruction of deep branches is no more difficult than that of shallower branches. The presence of zero-length branches in realized trees contributes significantly to the overall error observed in the NJ tree, especially in large phylogenies or slowly evolving genes. Furthermore, the tree size does not influence the efficiency of NJ in reconstructing shallow and deep branches in our simulation study, in which the evolutionary process is assumed to be homogeneous in all lineages. Received: 7 March 2000 / Accepted: 2 August 2000
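For reference, each neighbor-joining step picks the pair (i, j) minimizing Q(i, j) = (n − 2)d(i, j) − r_i − r_j, where r_i is the sum of row i of the distance matrix, joins the pair, and updates the distances. The Python sketch below is a plain O(n³) textbook version meant only to make the procedure concrete; it needs at least three taxa, and ties in Q (which arise, for example, when some branches have zero length) are broken by taking the first pair found.

```python
from typing import List

def neighbor_joining(labels: List[str], dist: List[List[float]]) -> str:
    """Return an unrooted Newick string built by neighbor joining (needs >= 3 taxa)."""
    nodes = list(labels)              # Newick text of each active cluster
    d = [row[:] for row in dist]      # working copy of the distance matrix
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]   # net divergence of each cluster
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to clusters i and j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining cluster.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three clusters left: attach them to a single central node.
    a, b, c = nodes
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"

# Additive distances for ((A:1,B:2):5,(C:3,D:4)).
D = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
print(neighbor_joining(["A", "B", "C", "D"], D))  # (C:3,D:4,(A:1,B:2):5);
```

On additive distances like these NJ recovers the true tree and branch lengths; on noisy data it can return negative branch lengths, which this sketch does not correct.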

17.
Meta-analysis is increasingly used in ecology and evolutionary biology. Yet, in these fields this technique has an important limitation: phylogenetic non-independence exists among taxa, violating the statistical assumptions underlying traditional meta-analytic models. Recently, meta-analytical techniques incorporating phylogenetic information have been developed to address this issue. However, no syntheses have evaluated how often including phylogenetic information changes meta-analytic results. To address this gap, we built phylogenies for and re-analysed 30 published meta-analyses, comparing results for traditional vs. phylogenetic approaches and assessing which characteristics of phylogenies best explained changes in meta-analytic results and relative model fit. Accounting for phylogeny significantly changed estimates of the overall pooled effect size in 47% of datasets for fixed-effects analyses and 7% of datasets for random-effects analyses. Accounting for phylogeny also changed whether those effect sizes were significantly different from zero in 23 and 40% of our datasets (for fixed- and random-effects models, respectively). Across datasets, decreases in pooled effect size magnitudes after incorporating phylogenetic information were associated with larger phylogenies and those with stronger phylogenetic signal. We conclude that incorporating phylogenetic information in ecological meta-analyses is important, and we provide practical recommendations for doing so.
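In a fixed-effects meta-analysis the pooled effect is the inverse-variance weighted mean of the effect sizes; a phylogenetic version can be written as generalized least squares in which the covariance matrix also contains a term proportional to a phylogenetic correlation matrix. The NumPy sketch below covers both cases in one function. The correlation matrix `C` and the variance component `sigma2_phylo` are assumed to be supplied (in practice `C` is usually derived from shared branch lengths and the variance component estimated, e.g. by REML), so this is an illustration rather than any specific package's model.

```python
from typing import Optional, Sequence, Tuple

import numpy as np

def pooled_effect(y: Sequence[float], v: Sequence[float],
                  C: Optional[Sequence[Sequence[float]]] = None,
                  sigma2_phylo: float = 0.0) -> Tuple[float, float]:
    """GLS estimate of the overall effect and its standard error.

    y: effect sizes; v: their sampling variances. With C=None this is the
    fixed-effects inverse-variance mean; otherwise sigma2_phylo * C is added
    to the covariance, so closely related taxa count as partly redundant.
    """
    y_arr = np.asarray(y, dtype=float)
    W = np.diag(np.asarray(v, dtype=float))
    if C is not None:
        W = W + sigma2_phylo * np.asarray(C, dtype=float)
    ones = np.ones_like(y_arr)
    w_inv_ones = np.linalg.solve(W, ones)  # W^{-1} 1
    w_inv_y = np.linalg.solve(W, y_arr)    # W^{-1} y
    precision = float(ones @ w_inv_ones)
    beta = float(ones @ w_inv_y) / precision
    return beta, precision ** -0.5

C = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]  # taxa 1 and 2 are close relatives
print(pooled_effect([1.0, 1.0, 4.0], [1.0, 1.0, 1.0]))           # plain mean: 2.0
print(pooled_effect([1.0, 1.0, 4.0], [1.0, 1.0, 1.0], C, 1.0))
```

In the second call the two related taxa behave more like a single observation, so the pooled estimate moves from 2.0 toward the independent third taxon (to about 2.26).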

18.
19.
We describe a statistical framework for reconstructing the sequence of transmission events between observed cases of an endemic infectious disease using genetic, temporal and spatial information. Previous approaches to reconstructing transmission trees have assumed all infections in the study area originated from a single introduction and that a large fraction of cases were observed. There are as yet no approaches appropriate for endemic situations in which a disease is already well established in a host population and in which there may be multiple origins of infection, or that can enumerate unobserved infections missing from the sample. Our proposed framework addresses these shortcomings, enabling reconstruction of partially observed transmission trees and estimating the number of cases missing from the sample. Analyses of simulated datasets show the method to be accurate in identifying direct transmissions, while introductions and transmissions via one or more unsampled intermediate cases could be identified at high to moderate levels of case detection. When applied to partial genome sequences of rabies virus sampled from an endemic region of South Africa, our method reveals several distinct transmission cycles with little contact between them, and direct transmission over long distances suggesting significant anthropogenic influence in the movement of infected dogs.

20.
phangorn: phylogenetic analysis in R
SUMMARY: phangorn is a package for phylogenetic reconstruction and analysis in the R language. Previously it was only possible to estimate phylogenetic trees with distance methods in R. phangorn now offers the possibility of reconstructing phylogenies with distance-based methods, maximum parsimony or maximum likelihood (ML), and of performing Hadamard conjugation. Extending the general ML framework, this package provides the possibility of estimating mixture and partition models. Furthermore, phangorn offers several functions for comparing trees, phylogenetic models or splits, simulating character data and performing congruence analyses. AVAILABILITY: phangorn can be obtained through the CRAN homepage http://cran.r-project.org/web/packages/phangorn/index.html. phangorn is licensed under GPL 2.
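Maximum parsimony, one of the criteria listed above, scores a tree by the minimum number of character-state changes it requires. The Python sketch below implements Fitch's classic algorithm for rooted binary trees written as nested tuples. It illustrates the criterion only; it is not phangorn's implementation or API, and the tree and alignment encodings are assumptions.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a leaf is its taxon name, an internal node a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]

def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch bottom-up pass for one site: (candidate state set, number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch_site(left, states)
    rs, rc = fitch_site(right, states)
    common = ls & rs
    if common:
        return common, lc + rc        # children can agree: no extra change
    return ls | rs, lc + rc + 1       # disagreement costs one change

def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Total Fitch parsimony score of a tree over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for site in range(length):
        states = {taxon: seq[site] for taxon, seq in alignment.items()}
        total += fitch_site(tree, states)[1]
    return total

aln = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CAT"}
print(parsimony_score((("A", "B"), ("C", "D")), aln))  # 2
print(parsimony_score((("A", "C"), ("B", "D")), aln))  # 4
```

The tree grouping A with B needs fewer changes, so parsimony prefers it for this toy alignment.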


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417