Found 20 similar records; search time: 15 ms
1.
The future of phylogeny reconstruction (cited by: 1; self-citations: 0; citations by others: 1)
JAMES S. FARRIS, Zoologica Scripta 1997, 26(4): 303-311
A new approach to phylogenetic analysis, parsimony jackknifing, uses simple parsimony calculations combined with resampling of characters to arrive at a tree comprising well-supported groups. This is usually much the same as the consensus of most-parsimonious trees found from extensive multiple-tree calculations, but the new method is thousands of times faster, allowing analysis of much larger data matrices, and also provides information on the strength of support for different groups. Jackknife frequencies provide a more reliable assessment of support than do alternative methods, notably confidence probability (CP) and T-PTP testing.
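Parsimony jackknifing can be illustrated on a toy quartet. The sketch below is a minimal illustration under stated assumptions, not the published algorithm: it scores the three possible quartet splits with Fitch parsimony and retains each character with probability ~0.63 (roughly 1 − 1/e, the retention rate associated with jackknifing); the function names and the dict-per-character data layout are invented for this example.

```python
import random
from collections import Counter

def fitch_quartet(col, topo):
    # Fitch parsimony score of one character on an unrooted quartet.
    # col: dict taxon -> state; topo: ((t1, t2), (t3, t4)) split.
    (a, b), (c, d) = topo
    score = 0
    u = {col[a]} & {col[b]}
    if not u:
        u = {col[a]} | {col[b]}
        score += 1
    v = {col[c]} & {col[d]}
    if not v:
        v = {col[c]} | {col[d]}
        score += 1
    if not (u & v):  # change needed on the internal edge
        score += 1
    return score

def best_split(cols, taxa):
    # the most-parsimonious of the three possible quartet splits
    a, b, c, d = taxa
    topos = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    scores = [sum(fitch_quartet(col, t) for col in cols) for t in topos]
    return topos[scores.index(min(scores))]

def jackknife_support(cols, taxa, reps=200, keep=0.63):
    # resample characters and record how often each split wins
    wins = Counter()
    for _ in range(reps):
        sample = [c for c in cols if random.random() < keep]
        if sample:
            wins[best_split(sample, taxa)] += 1
    return {t: n / reps for t, n in wins.items()}
```

With 20 characters supporting one split and 2 conflicting, the supported split wins essentially every replicate, which is the kind of support signal the abstract describes.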
2.
The methodology of coding polymorphic taxa has received limited attention to date. A search of the taxonomic literature revealed seven types of coding methods. Apart from ignoring polymorphic characters (sometimes called the fixed-only method), two main categories can be distinguished: methods that identify the start of a new character state with the origin of an evolutionary novelty, and methods that identify the new state with the fixation of a novelty. The methods of the first category introduce soft reversals, yielding signals that support cladograms incompatible with true phylogenies. We conclude that coding the plesiomorphy is the method to be preferred, unless the ancestral state is unknown, in which case coding as ambiguous is recommended. This holds for coding polymorphism in species as well as in supraspecific taxa. In this light we remark on methods proposed by previous authors.
3.
BACKGROUND: In sparse-view CT imaging, strong streak artifacts may appear around bony structures, and they often compromise image readability. Compressed sensing (CS)- or total variation (TV)-minimization-based image reconstruction methods have reduced the streak artifacts to a great extent, but sparse-view CT imaging still suffers from residual streak artifacts. We introduce a new bone-induced streak artifact reduction method for CS-based image reconstruction. METHODS: We first identify the high-intensity bony regions in the image reconstructed by the filtered backprojection (FBP) method, and we calculate the sinogram stemming from the bony regions only. We then subtract this calculated sinogram, which stands for the bony regions, from the measured sinogram before performing the CS-based image reconstruction. The image reconstructed from the subtracted sinogram represents the soft tissues with few streak artifacts. To restore the original image intensity in the bony regions, we add the bony region image, identified from the FBP image, to the soft tissue image to form a combined image. We then perform the CS-based image reconstruction again on the measured sinogram, using the combined image as the initial condition of the iteration. For experimental validation of the proposed method, we imaged a contrast phantom and a rat using a micro-CT and evaluated the reconstructed images on two figures of merit: relative mean square error and total variation caused by the streak artifacts. RESULTS: On visual inspection, the images reconstructed by the proposed method show smaller streak artifacts than those reconstructed by the original CS-based method. The quantitative image evaluation studies also show that the proposed method outperforms the conventional CS-based method.
CONCLUSIONS: The proposed method can effectively suppress streak artifacts stemming from bony structures in sparse-view CT imaging.
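The bone/soft-tissue sinogram split described in METHODS relies on the linearity of the projection operator. Below is a minimal numpy sketch of just that subtraction step, under strong simplifying assumptions: a toy two-view projector (row and column sums) stands in for the Radon transform, a plain threshold stands in for the bone segmentation, and the first-pass FBP image is supplied by the caller; the function names are invented for this illustration.

```python
import numpy as np

def project(img):
    # toy two-view "sinogram": row sums and column sums stand in for a
    # full Radon transform (the operator only needs to be linear)
    return np.concatenate([img.sum(axis=0), img.sum(axis=1)])

def split_out_bone(first_pass_img, measured_sino, bone_threshold):
    # segment high-intensity bone from a first-pass reconstruction,
    # forward-project it, and subtract its sinogram so the remainder
    # represents soft tissue only
    bone = np.where(first_pass_img > bone_threshold, first_pass_img, 0.0)
    soft_sino = measured_sino - project(bone)
    return bone, soft_sino
```

Because projection is linear, the subtracted sinogram equals the projection of the image with the bone removed, which is what makes the subsequent soft-tissue-only reconstruction meaningful.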
4.
In this study the limitations of the RAPD technique for phylogenetic analysis of very closely related and less related species of Drosophila are examined. In addition, assumptions of positional homology of amplified fragments in different species are examined by cross-hybridization of RAPD fragments. It is demonstrated that in Drosophila the use of RAPD markers is very efficient in identification of species. For assessment of phylogenetic relationships, however, the method is limited to sibling species, and reliable measures for genetic distances cannot be obtained. Hybridization experiments demonstrate that fragments of similar length amplified from different species are not always derived from corresponding loci, and that not all RAPD fragments within the same amplification pattern are independent.
5.
6.
Although the reconstruction of phylogenetic trees and the computation of multiple sequence alignments are highly interdependent, these two areas of research lead quite separate lives, the former often making use of stochastic modeling, whereas the latter normally does not. Despite the fact that reasonable insertion and deletion models for sequence pairs were already introduced more than 10 years ago, they have only recently been applied to multiple alignment, and only in their simplest version. In this paper we present and discuss a strategy based on simulated annealing, which makes use of these models to infer a phylogenetic tree for a set of DNA or protein sequences together with the sequences' indel history, i.e., their multiple alignment augmented with information about the positioning of insertion and deletion events in the tree. Our method is also the first application of the TKF2 model in the context of multiple sequence alignment. We validate the method via simulations and illustrate it using a data set of primate mtDNA.
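The paper's search strategy is simulated annealing over the joint space of trees and alignments. As an illustration of just the annealing loop (not the TKF2 model or the tree/alignment proposal moves), here is a generic skeleton minimizing a toy objective; the cooling schedule and parameter values are arbitrary choices for this example.

```python
import math
import random

def simulated_annealing(state, energy, neighbor,
                        t0=1.0, cooling=0.995, steps=2000):
    # generic annealing loop: always accept improvements, accept a
    # worsening move with probability exp(-delta / temperature), and
    # cool the temperature geometrically; track the best state seen
    e = energy(state)
    best, best_e = state, e
    t = t0
    for _ in range(steps):
        cand = neighbor(state)
        ce = energy(cand)
        if ce < e or random.random() < math.exp((e - ce) / t):
            state, e = cand, ce
            if e < best_e:
                best, best_e = state, e
        t *= cooling
    return best, best_e
```

In the paper's setting, `state` would be a tree plus indel history, `energy` a negative log-likelihood under the indel model, and `neighbor` a topology or alignment rearrangement; here a quadratic over the integers suffices to exercise the loop.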
7.
Wu G, You JH, Lin G, IEEE/ACM Transactions on Computational Biology and Bioinformatics 2007, 4(1): 139-152
In this paper, a new representation is presented for the maximum quartet consistency (MQC) problem, in which solving the MQC problem becomes searching for an ultrametric matrix that satisfies a maximum number of given quartet topologies. A number of structural properties of the MQC problem in this new representation are characterized by formulating the problem in answer set programming, a recent and powerful logic programming tool for modeling and solving search problems. Using these properties, a number of optimization techniques are proposed to speed up the search process. The experimental results on a number of simulated data sets suggest that the new representation, combined with answer set programming, presents a unique perspective on the MQC problem.
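The reformulation above searches over ultrametric matrices. A matrix is ultrametric exactly when every triple of taxa satisfies the three-point condition: of the three pairwise distances, the two largest are equal. A small checker for that condition, written for this summary rather than taken from the paper, might look like:

```python
from itertools import combinations

def is_ultrametric(d, taxa, tol=1e-9):
    # three-point condition: for every triple of taxa, the two largest
    # of the three pairwise distances must be (numerically) equal
    for i, j, k in combinations(taxa, 3):
        a, b, c = sorted([d[i][j], d[i][k], d[j][k]])
        if c - b > tol:
            return False
    return True
```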
8.
GREGORY E. WEBB, Lethaia: An International Journal of Palaeontology and Stratigraphy 1994, 27(3): 185-192
Webb, G.E. 1994 10 15: Parallelism, non-biotic data and phylogeny reconstruction in paleobiology.
Many systematists equate parallelism and convergence. However, whereas convergence is relatively uncommon and easily recognized using divergent characters, parallelism is common but more difficult to recognize because divergent characters are less abundant. Cladists, in particular, equate homeomorphy with convergence and reject parallelism as a distinct concept. Unfortunately, cladistic parsimony analysis may not resolve most parallelism. Therefore, criteria for the a priori recognition and objective evaluation of parallelism are very significant. Non-biotic data (e.g., stratigraphic and geographic distribution) provide independent criteria for the construction of hypotheses of parallelism in cases where taxa (1) were geographically isolated during homeomorphic character-state transformations, (2) occurred with endemic faunas, and (3) evolved in similar environmental conditions as suggested by paleoecological data. Australian lithostrotionoid corals were long considered congeneric with European taxa. However, because of their geographic isolation, occurrence with endemic rugose corals and occurrence in depositional environments similar to those of European forms, they are now considered a homeomorphic clade, resulting from an extended sequence of parallel character-state transformations. The high degree of parallelism, combined with abundant symplesiomorphic characters, led to erroneous phylogenetic inferences when non-biotic data were excluded from analysis. Keywords: cladistics, homeomorphy, lithostrotionoid corals, parallelism, phylogeny.
9.
10.
Phylogenetic inference is well known to be problematic if both long and short branches occur together in the underlying tree. With biological data, correcting for this problem may require simultaneous consideration of both substitution biases and rate heterogeneity between lineages and across sequence positions. A particular form of the latter is the presence of invariable sites, which are well known to mislead estimation of genetic divergences. Here we describe a capture-recapture method to estimate the proportion of invariable sites in an alignment of amino acids or nucleotides. We use it to investigate phylogenetic signals in 18S ribosomal DNA sequences from holometabolous insects. Our results suggest that, as taxa diverged, their 18S rDNA sequences have altered both in their distribution of sites that can vary and in their base compositions.
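A capture-recapture estimate of the number of variable sites can be sketched with the classical Lincoln-Petersen estimator: sites observed to vary in one taxon subsample play the role of "marked" individuals, and sites varying in a second subsample are the "recapture" sample. This is an illustrative stand-in for the general idea, not necessarily the estimator used in the paper, and the function name is invented here.

```python
def estimate_invariable(varies1, varies2, length):
    # varies1, varies2: sets of site indices observed to vary in two
    # independent taxon subsamples; length: alignment length.
    # Lincoln-Petersen: total variable sites ~ n1 * n2 / (overlap).
    n1, n2 = len(varies1), len(varies2)
    m = len(varies1 & varies2)
    if m == 0:
        raise ValueError("no recaptured sites; cannot estimate")
    est_variable = n1 * n2 / m
    return max(0.0, 1.0 - est_variable / length)
```

For example, 30 variable sites in one subsample and 20 in another with an overlap of 15 gives an estimated 40 variable sites, hence an estimated 60% invariable in a 100-site alignment.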
11.
Background
Maximum parsimony phylogenetic tree reconstruction from genetic variation data is a fundamental problem in computational genetics, with many practical applications in population genetics, whole genome analysis, and the search for genetic predictors of disease. Efficient methods are available for reconstruction of maximum parsimony trees from haplotype data, but such data are difficult to determine directly for autosomal DNA. Data are more commonly available in the form of genotypes, which consist of conflated combinations of pairs of haplotypes from homologous chromosomes. Currently, there are no general algorithms for the direct reconstruction of maximum parsimony phylogenies from genotype data. Hence, phylogenetic applications for autosomal data must rely on other methods to first infer haplotypes computationally from genotypes.
12.
Keane TM, Naughton TJ, Travers SA, McInerney JO, McCormack GP, Bioinformatics (Oxford, England) 2005, 21(7): 969-974
MOTIVATION: In recent years there has been increased interest in producing large and accurate phylogenetic trees using statistical approaches. However, for a large number of taxa it is not feasible to construct large and accurate trees using only a single processor. A number of specialized parallel programs have been produced in an attempt to address the huge computational requirements of maximum likelihood. We express a number of concerns about the current set of parallel phylogenetic programs, whose limitations severely restrict the widespread availability and use of parallel computing in maximum likelihood-based phylogenetic analysis. RESULTS: We have identified the suitability of phylogenetic analysis to large-scale heterogeneous distributed computing. We have completed a distributed and fully cross-platform phylogenetic tree building program called distributed phylogeny reconstruction by maximum likelihood. It uses an already proven maximum likelihood-based tree building algorithm and a popular phylogenetic analysis library for all its likelihood calculations, and it offers one of the most extensive sets of DNA substitution models currently available. We are the first, to our knowledge, to report the completion of a distributed phylogenetic tree building program that can achieve near-linear speedup while using only the idle clock cycles of machines. For those in an academic or corporate environment with hundreds of idle desktop machines, we have shown how distributed computing can deliver a 'free' ML supercomputer.
13.
Models of codon substitution have been commonly used to compare protein-coding DNA sequences and are particularly effective in detecting signals of natural selection acting on the protein. Their utility in reconstructing molecular phylogenies and in dating species divergences has not been explored. Codon models naturally accommodate synonymous and nonsynonymous substitutions, which occur at very different rates and may be informative for recent and ancient divergences, respectively. Thus codon models may be expected to make efficient use of the phylogenetic information in protein-coding DNA sequences. Here we applied codon models to 106 protein-coding genes from eight yeast species to reconstruct phylogenies using the maximum likelihood method, in comparison with nucleotide- and amino acid-based analyses. The results appeared to confirm that expectation. Nucleotide-based analyses under simplistic substitution models were efficient in recovering recent divergences, whereas amino acid-based analyses performed better at recovering deep divergences. Codon models appeared to combine the advantages of amino acid and nucleotide data and performed well at recovering both recent and deep divergences. Estimation of relative species divergence times using amino acid and codon models suggested that translation of gene sequences into proteins led to information loss ranging from 30% for deep nodes to 66% for recent nodes. Although the computational burden makes codon models unfeasible for tree search in large data sets, we suggest that they may be useful for comparing candidate trees. Nucleotide models that accommodate the differences in evolutionary dynamics at the three codon positions also performed well, at much lower computational cost. We discuss the relationship between a model's fit to data and its utility in phylogeny reconstruction, and caution against the use of overly complex substitution models.
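The synonymous/nonsynonymous distinction that codon models exploit can be made concrete with a toy classifier. The dictionary below is a tiny excerpt of the standard genetic code (just enough for the demo), and the function name is invented for this illustration.

```python
# tiny excerpt of the standard genetic code (codon -> amino acid)
CODE = {"TTT": "F", "TTC": "F",          # phenylalanine
        "TTA": "L", "CTT": "L",          # leucine
        "ATT": "I"}                      # isoleucine

def substitution_type(codon1, codon2):
    # a codon change is synonymous if it leaves the encoded amino
    # acid unchanged, nonsynonymous otherwise
    if codon1 == codon2:
        raise ValueError("not a substitution")
    return "synonymous" if CODE[codon1] == CODE[codon2] else "nonsynonymous"
```

Because these two classes of change accumulate at very different rates, a codon model effectively carries both a fast clock (synonymous, useful for recent splits) and a slow clock (nonsynonymous, useful for deep splits) in one likelihood.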
14.
Each person's genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person's genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype. The determination of the haplotypes within a population is essential for understanding genetic variation and the inheritance of complex diseases. The haplotype mapping project, a successor to the human genome project, seeks to determine the common haplotypes in the human population. Since experimental determination of a person's genotype is less expensive than determining its component haplotypes, algorithms are required for computing haplotypes from genotypes. Two observations aid in this process: first, the human genome contains short blocks within which only a few different haplotypes occur; second, as suggested by Gusfield, it is reasonable to assume that the haplotypes observed within a block have evolved according to a perfect phylogeny, in which at most one mutation event has occurred at any site, and no recombination occurred at the given region. We present a simple and efficient polynomial-time algorithm for inferring haplotypes from the genotypes of a set of individuals assuming a perfect phylogeny. Using a reduction to 2-SAT we extend this algorithm to handle constraints that apply when we have genotypes from both parents and child. We also present a hardness result for the problem of removing the minimum number of individuals from a population to ensure that the genotypes of the remaining individuals are consistent with a perfect phylogeny. Our algorithms have been tested on real data and give biologically meaningful results. Our webserver (http://www.cs.columbia.edu/compbio/hap/) is publicly available for predicting haplotypes from genotype data and partitioning genotype data into blocks.
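The perfect-phylogeny assumption underlying the algorithm can be checked on haplotype data with the classical four-gamete condition: a set of binary haplotypes admits a perfect phylogeny if and only if no pair of sites exhibits all four gametes 00, 01, 10, 11. The paper's genotype-phasing algorithm and its 2-SAT extension are more involved; the sketch below covers only this consistency test.

```python
from itertools import combinations

def admits_perfect_phylogeny(haplotypes):
    # haplotypes: list of equal-length 0/1 strings (one per chromosome).
    # A perfect phylogeny exists iff no pair of sites shows all four
    # gametes 00, 01, 10, 11 (the four-gamete test).
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True
```

A pair of sites showing all four gametes implies either a repeated mutation at a site or a recombination event, both of which the perfect-phylogeny block model excludes.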
15.
Springer MS, DeBry RW, Douady C, Amrine HM, Madsen O, de Jong WW, Stanhope MJ, Molecular Biology and Evolution 2001, 18(2): 132-143
Both mitochondrial and nuclear gene sequences have been employed in efforts to reconstruct deep-level phylogenetic relationships. A fundamental question in molecular systematics concerns the efficacy of different types of sequences in recovering clades at different taxonomic levels. We compared the performance of four mitochondrial data sets (cytochrome b, cytochrome oxidase II, NADH dehydrogenase subunit I, 12S rRNA-tRNA-16S rRNA) and eight nuclear data sets (exonic regions of alpha-2B adrenergic receptor, aquaporin, beta-casein, gamma-fibrinogen, interphotoreceptor retinoid binding protein, kappa-casein, protamine, von Willebrand factor) in recovering deep-level mammalian clades. We employed parsimony and minimum evolution with a variety of distance corrections for superimposed substitutions. In 32 different pairwise comparisons between these mitochondrial and nuclear data sets, we used the maximum set of overlapping taxa. In each case, the variable-length bootstrap was used to resample at the size of the smaller data set. The nuclear exons consistently performed better than mitochondrial protein and rRNA-tRNA coding genes on a per-residue basis in recovering benchmark clades. We also concatenated nuclear genes for overlapping taxa and made comparisons with concatenated mitochondrial protein-coding genes from complete mitochondrial genomes. The variable-length bootstrap was used to score the recovery of benchmark clades as a function of the number of resampled base pairs. In every case, the nuclear concatenations were more efficient than the mitochondrial concatenations in recovering benchmark clades. Among the genes included in our study, the nuclear genes were much less affected by superimposed substitutions. Nuclear genes having appropriate rates of substitution should receive strong consideration in efforts to reconstruct deep-level phylogenetic relationships.
16.
Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle (cited by: 15; self-citations: 0; citations by others: 15)
The minimum evolution (ME) approach to phylogeny estimation has been shown to be statistically consistent when it is used in conjunction with ordinary least-squares (OLS) fitting of a metric to a tree structure. The traditional approach to using ME has been to start with the neighbor joining (NJ) topology for a given matrix and then do a topological search from that starting point. The first stage requires O(n^3) time, where n is the number of taxa, while current implementations of the second are in O(p n^3) or more, where p is the number of swaps performed by the program. In this paper, we examine a greedy approach to minimum evolution which produces a starting topology in O(n^2) time. Moreover, we provide an algorithm that searches for the best topology using nearest neighbor interchanges (NNIs), where the cost of doing p NNIs is O(n^2 + p n), i.e., O(n^2) in practice because p is always much smaller than n. The greedy minimum evolution (GME) algorithm, when used in combination with NNIs, produces trees which are fairly close to NJ trees in terms of topological accuracy. We also examine ME under a balanced weighting scheme, where sibling subtrees have equal weight, as opposed to the standard "unweighted" OLS, where all taxa have the same weight, so that the weight of a subtree equals the number of its taxa. The balanced minimum evolution scheme (BME) runs slower than the OLS version, requiring O(n^2 diam(T)) operations to build the starting tree and O(p n diam(T)) to perform the NNIs, where diam(T) is the topological diameter of the output tree. Under the usual Yule-Harding distribution on phylogenetic trees, the expected diameter is in O(log n), so our algorithms are in practice faster than NJ. Moreover, this BME scheme yields a very significant improvement over NJ and other distance-based algorithms, especially with large trees, in terms of topological accuracy.
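On a quartet, the minimum-evolution principle reduces to the four-point condition: with additive distances, the preferred split is the one pairing the taxa so that the sum of the two within-pair distances is smallest. The GME/BME algorithms above build and rearrange full trees; the quartet case, with an invented function name, can be sketched in a few lines.

```python
def best_quartet(d, w, x, y, z):
    # four-point criterion: for additive distances, the split whose two
    # within-pair distances have the smallest sum minimizes total tree
    # length on the quartet
    sums = {
        ((w, x), (y, z)): d[w][x] + d[y][z],
        ((w, y), (x, z)): d[w][y] + d[x][z],
        ((w, z), (x, y)): d[w][z] + d[x][y],
    }
    return min(sums, key=sums.get)
```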
17.
Anne Kupczok, Heiko A. Schmidt, Arndt von Haeseler, Algorithms for Molecular Biology: AMB 2010, 5(1): 37
Background
The availability of many gene alignments with overlapping taxon sets raises the question of which strategy is the best to infer species phylogenies from multiple gene information. Methods and programs abound that use the gene alignment in different ways to reconstruct the species tree. In particular, different methods combine the original data at different points along the way from the underlying sequences to the final tree. Accordingly, they are classified into superalignment, supertree and medium-level approaches. Here, we present a simulation study to compare different methods from each of these three approaches.
18.
Background
The rapid accumulation of whole-genome data has renewed interest in the study of using gene-order data for phylogenetic analyses and ancestral reconstruction. Current software and web servers typically do not support duplication and loss events along with rearrangements.
Results
MLGO (Maximum Likelihood for Gene-Order Analysis) is a web tool for the reconstruction of phylogeny and/or ancestral genomes from gene-order data. MLGO is based on likelihood computation and shows advantages over existing methods in terms of accuracy, scalability and flexibility.
Conclusions
To the best of our knowledge, it is the first web tool for analysis of large-scale genomic changes including not only rearrangements but also gene insertions, deletions and duplications. The web tool is available from http://www.geneorder.org/server.php.
19.
20.
A comparison of several protein phylogeny reconstruction methods was carried out on a set of natural protein sequences. The programs of the PHYLIP package and the FastME, PhyML and TreeTop programs were tested. In contrast to several earlier studies that used simulated sequences, our results demonstrate the superiority of distance methods over the maximum likelihood method.