1.
2.
Accuracy of phylogenetic trees estimated from DNA sequence data (cited 4 times: 1 self-citation, 3 by others)
The relative merits of four different tree-making methods in obtaining the
correct topology were studied by using computer simulation. The methods
studied were the unweighted pair-group method with arithmetic mean (UPGMA),
Fitch and Margoliash's (FM) method, the distance Wagner (DW) method, and
Tateno et al.'s modified Farris (MF) method. An ancestral DNA sequence was
assumed to evolve into eight sequences following a given model tree. Both
constant and varying rates of nucleotide substitution were considered. Once
the DNA sequences for the eight extant species were obtained, phylogenetic
trees were constructed by using corrected (d) and uncorrected (p)
nucleotide substitutions per site. The topologies of the trees obtained
were then compared with that of the model tree. The results obtained can be
summarized as follows: (1) The probability of obtaining the correct rooted
or unrooted tree is low unless a large number of nucleotide differences
exists between different sequences. (2) When the number of nucleotide
substitutions per sequence is small or moderately large, the FM, DW, and MF
methods show a better performance than UPGMA in recovering the correct
topology. The former group of methods is particularly good for obtaining
the correct unrooted tree. (3) When the number of substitutions per
sequence is large, UPGMA is at least as good as the other methods,
particularly for obtaining the correct rooted tree. (4) When the rate of
nucleotide substitution varies with evolutionary lineage, the FM, DW, and
MF methods show a better performance in obtaining the correct topology than
UPGMA, except when a rooted tree is to be produced from data with a large
number of nucleotide substitutions per sequence. (ABSTRACT TRUNCATED AT 250
WORDS)
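The abstract above distinguishes uncorrected (p) from corrected (d) nucleotide substitutions per site but does not say which correction was applied. The following is a minimal sketch that assumes the Jukes-Cantor correction, $d = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)$. The function names `p_distance` and `jukes_cantor_distance` are illustrative and are not taken from the paper.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected distance: proportion of aligned sites that differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal length, and non-empty")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jukes_cantor_distance(p: float) -> float:
    """Corrected substitutions per site under Jukes-Cantor (an assumed model)."""
    if p >= 0.75:
        # The logarithm's argument reaches zero at p = 0.75 (saturation).
        raise ValueError("correction is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The corrected value always exceeds p for p > 0, because multiple substitutions at the same site are hidden in the raw count. The gap widens as sequences diverge, which is one reason the choice between p and d can matter for distance-based tree building.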
3.
In a previous paper (Klotz et al., 1979) we described a method for determining evolutionary trees from sequence data when rates of evolution of the sequences might differ greatly. It was shown theoretically that the method always gave the correct topology and root when the exact number of mutation differences between sequences and from their common ancestor was known. However, the method is impractical to use in most situations because it requires some knowledge of the ancestor. In the present paper we describe another method, related to the previous one, in which a present-day sequence can serve temporarily as an ancestor for purposes of determining the evolutionary tree regardless of the rates of evolution of the sequences involved. This new method can be carried out with high precision without the aid of a computer, and, unlike other methods, it does not increase in difficulty rapidly as the number of sequences involved in the study increases.
4.
Estimating species trees using multiple-allele DNA sequence data (cited 3 times: 0 self-citations, 3 by others)
Liu L, Pearl DK, Brumfield RT, Edwards SV. Evolution; international journal of organic evolution. 2008, 62(8): 2080-2091
Several techniques, such as concatenation and consensus methods, are available for combining data from multiple loci to produce a single statement of phylogenetic relationships. However, when multiple alleles are sampled from individual species, it becomes more challenging to estimate relationships at the level of species, either because concatenation becomes inappropriate due to conflicts among individual gene trees, or because the species from which multiple alleles have been sampled may not form monophyletic groups in the estimated tree. We propose a Bayesian hierarchical model to reconstruct species trees from multiple-allele, multilocus sequence data, building on a recently proposed method for estimating species trees from single allele multilocus data. A two-step Markov Chain Monte Carlo (MCMC) algorithm is adopted to estimate the posterior distribution of the species tree. The model is applied to estimate the posterior distribution of species trees for two multiple-allele datasets: yeast (Saccharomyces) and birds (Manacus manakins). The estimates of the species trees using our method are consistent with those inferred from other methods and genetic markers, but in contrast to other species tree methods, it provides credible regions for the species tree. The Bayesian approach described here provides a powerful framework for statistical testing and integration of population genetics and phylogenetics.
5.
MOTIVATION: B cells responding to antigenic stimulation can fine-tune their binding properties through a process of affinity maturation composed of somatic hypermutation, affinity-selection and clonal expansion. The mutation rate of the B cell receptor DNA sequence, and the effect of these mutations on affinity and specificity, are of critical importance for understanding immune and autoimmune processes. Unbiased estimates of these properties are currently lacking due to the short time-scales involved and the small numbers of sequences available. RESULTS: We have developed a bioinformatic method based on a maximum likelihood analysis of phylogenetic lineage trees to estimate the parameters of a B cell clonal expansion model, which includes somatic hypermutation with the possibility of lethal mutations. Lineage trees are created from clonally related B cell receptor DNA sequences. Important links between tree shapes and underlying model parameters are identified using mutual information. Parameters are estimated using a likelihood function based on the joint distribution of several tree shapes, without requiring a priori knowledge of the number of generations in the clone (which is not available for rapidly dividing populations in vivo). A systematic validation on synthetic trees produced by a mutating birth-death process simulation shows that our estimates are precise and robust to several underlying assumptions. These methods are applied to experimental data from autoimmune mice to demonstrate the existence of hypermutating B cells in an unexpected location in the spleen.
6.
Summary: Three new methods for constructing evolutionary trees from molecular sequence data are presented. These methods are based on a theory for correcting for non-constant evolutionary rates (Klotz et al. 1979; Klotz and Blanken 1981). Extensive computer simulations were run to compare these new methods to the commonly used criteria of Dayhoff (1978) and Fitch and Margoliash (1967). The results of these simulations showed that two of the new methods performed as well as Dayhoff's criterion, significantly better than that of Fitch and Margoliash, and as well as a simple variation of the latter (Prager and Wilson 1978) where any topology containing negative branch mutations is discarded. However, no method yielded the correct topology all of the time, which demonstrated the need to determine confidence estimates in a particular result when evolutionary trees are determined from sequence data.
7.
Accuracy of estimated phylogenetic trees from molecular data (cited 27 times: 0 self-citations, 27 by others)
The accuracies and efficiencies of three different methods of making phylogenetic trees from gene frequency data were examined by using computer simulation. The methods examined are UPGMA, Farris' (1972) method, and Tateno et al.'s (1982) modified Farris method. In the computer simulation eight species (or populations) were assumed to evolve according to a given model tree, and the evolutionary changes of allele frequencies were followed by using the infinite-allele model. At the end of the simulated evolution five genetic distance measures (Nei's standard and minimum distances, Rogers' distance, Cavalli-Sforza's f theta, and the modified Cavalli-Sforza distance) were computed for all pairs of species, and the distance matrix obtained for each distance measure was used for reconstructing a phylogenetic tree. The phylogenetic tree obtained was then compared with the model tree. The results obtained indicate that in all tree-making methods examined the accuracies of both the topology and branch lengths of a reconstructed tree (rooted tree) are very low when the number of loci used is less than 20 but gradually increase with increasing number of loci. When the expected number of gene substitutions (M) for the shortest branch is 0.1 or more per locus and 30 or more loci are used, the topological error as measured by the distortion index (dT) is not great, but the probability of obtaining the correct topology (P) is less than 0.5 even with 60 loci. When M is as small as 0.004, P is substantially lower. In obtaining a good topology (small dT and high P) UPGMA and the modified Farris method generally show a better performance than the Farris method. The poor performance of the Farris method is observed even when Rogers' distance which obeys the triangle inequality is used. The main reason for this seems to be that the Farris method often gives overestimates of branch lengths. For estimating the expected branch lengths of the true tree UPGMA shows the best performance. 
For this purpose Nei's standard distance gives a better result than the others because of its linear relationship with the number of gene substitutions. Rogers' or Cavalli-Sforza's distance gives a phylogenetic tree in which the parts near the root are condensed and the other parts are elongated. It is recommended that more than 30 loci, including both polymorphic and monomorphic loci, be used for making phylogenetic trees. The conclusions from this study seem to apply also to data on nucleotide differences obtained by the restriction enzyme techniques.
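Nei's standard distance, singled out in the abstract above, is defined as $D = -\ln I$ with $I = J_{xy} / \sqrt{J_x J_y}$. Here $J_x$ and $J_y$ are the within-population sums of squared allele frequencies over loci, and $J_{xy}$ is the corresponding sum of cross-products. The sketch below is a minimal illustration that ignores the sample-size bias correction. The function name `nei_standard_distance` and the input layout (one list of allele frequencies per locus) are assumptions made for this example.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance D = -ln(I), without bias correction.

    freqs_x, freqs_y: one list of allele frequencies per locus, with alleles
    listed in the same order in both populations.
    """
    if len(freqs_x) != len(freqs_y) or not freqs_x:
        raise ValueError("both populations need the same non-empty set of loci")
    jx = jy = jxy = 0.0
    for x, y in zip(freqs_x, freqs_y):
        jx += sum(a * a for a in x)
        jy += sum(b * b for b in y)
        jxy += sum(a * b for a, b in zip(x, y))
    # Dividing each sum by the number of loci would cancel in this ratio.
    identity = jxy / math.sqrt(jx * jy)
    if identity <= 0.0:
        raise ValueError("no shared alleles: the distance is infinite")
    return -math.log(identity)
```

Monomorphic loci enter the sums like any other locus, which is consistent with the abstract's recommendation to include both polymorphic and monomorphic loci.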
8.
Accuracy of estimated phylogenetic trees from molecular data (cited 2 times: 0 self-citations, 2 by others)
Summary: The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the computer simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. The nucleotide substitution in this sequence was assumed to occur following the Poisson distribution, negative binomial distribution or a model of temporally varying rate. Estimates of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences that were generated at the end of the evolution considered, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large the Farris and modified Farris methods tend to be better than UPGMA and the F/M method for obtaining a good topology. For estimating the number of nucleotide substitutions for each branch of the tree, however, the modified Farris method shows a better performance than the Farris method. When the coefficient of variation of branch length is small, however, UPGMA shows the best performance among the four methods examined. Nevertheless, any tree-making method is likely to make errors in obtaining the correct topology with a high probability, unless all branch lengths of the true tree are sufficiently long. It is also shown that the agreement between patristic and observed genetic distances is not a good indicator of the goodness of the tree obtained.
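UPGMA, the baseline in several of the simulation studies listed here, repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted mean of the two old distances. The sketch below is a compact illustration of that textbook procedure, not the code used in any of the papers. The function name `upgma` and the `frozenset`-keyed distance dictionary are choices made for this example.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between labels a and b.
    Returns (tree, height): a nested tuple of labels and the root height.
    """
    # Active clusters: id -> (subtree, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {frozenset((i, j)): dist[frozenset((names[i], names[j]))]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Pick the closest pair of active clusters.
        pair = min((frozenset(p) for p in combinations(clusters, 2)),
                   key=d.__getitem__)
        i, j = sorted(pair)
        (ti, ni, _), (tj, nj, _) = clusters.pop(i), clusters.pop(j)
        # Distance to the merged cluster is the size-weighted mean.
        for k in clusters:
            d[frozenset((next_id, k))] = (
                ni * d[frozenset((i, k))] + nj * d[frozenset((j, k))]
            ) / (ni + nj)
        # The new node sits at half the distance between the merged clusters.
        clusters[next_id] = ((ti, tj), ni + nj, d[pair] / 2.0)
        next_id += 1
    tree, _, height = next(iter(clusters.values()))
    return tree, height
```

Because every node is placed at half the inter-cluster distance, the method implicitly assumes a constant rate across lineages. That matches the abstract's finding that UPGMA does best when the coefficient of variation of branch length is small.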
9.
10.
T J Beanland, C J Howe. Comparative biochemistry and physiology. B, Comparative biochemistry. 1992, 102(4): 643-659
1. Procedures for multiple alignment of sequence data, subsequent phylogenetic inference, and testing of the trees derived are presented. 2. The assumptions underlying different approaches and the extent to which they are valid are discussed.
11.
Kinematic data from 3D gait analysis together with musculoskeletal modeling techniques allow the derivation of muscle-tendon lengths during walking. However, kinematic data are subject to soft tissue artifacts (STA), referring to skin marker displacements during movement. STA are known to significantly affect the computation of joint kinematics, and would therefore also have an effect on muscle-tendon lengths which are derived from the segmental positions. The present study aimed to introduce an analytical approach to calculate the error propagation from STA to modeled muscle-tendon lengths. Skin marker coordinates were assigned uncorrelated, isotropic error functions with given standard deviations accounting for STA. Two different musculoskeletal models were specified; one with the joints moving freely in all directions, and one with the joints constrained to rotation but no translation. Using reference kinematic data from two healthy boys (mean age 9 y 5 m), the propagation of STA to muscle-tendon lengths was quantified for semimembranosus, gastrocnemius and soleus. The resulting average SD ranged from 6% to 50% of the normalized muscle-tendon lengths during gait depending on the muscle, the STA magnitudes and the musculoskeletal model. These results highlight the potential impact STA has on the biomechanical analysis of modeled muscle-tendon lengths during walking, and suggest the need for caution in the clinical interpretation of muscle-tendon lengths derived from joint kinematics.
12.
13.
Reduced-representation genome sequencing represents a new source of data for systematics, and its potential utility in interspecific phylogeny reconstruction has not yet been explored. One approach that seems especially promising is the use of inexpensive short-read technologies (e.g., Illumina, SOLiD) to sequence restriction-site associated DNA (RAD)--the regions of the genome that flank the recognition sites of restriction enzymes. In this study, we simulated the collection of RAD sequences from sequenced genomes of different taxa (Drosophila, mammals, and yeasts) and developed a proof-of-concept workflow to test whether informative data could be extracted and used to accurately reconstruct "known" phylogenies of species within each group. The workflow consists of three basic steps: first, sequences are clustered by similarity to estimate orthology; second, clusters are filtered by taxonomic coverage; and third, they are aligned and concatenated for "total evidence" phylogenetic analysis. We evaluated the performance of clustering and filtering parameters by comparing the resulting topologies with well-supported reference trees and we were able to identify conditions under which the reference tree was inferred with high support. For Drosophila, whole genome alignments allowed us to directly evaluate which parameters most consistently recovered orthologous sequences. For the parameter ranges explored, we recovered the best results at the low ends of sequence similarity and taxonomic representation of loci; these generated the largest supermatrices with the highest proportion of missing data. Applications of the method to mammals and yeasts were less successful, which we suggest may be due partly to their much deeper evolutionary divergence times compared to Drosophila (crown ages of approximately 100 and 300 versus 60 Mya, respectively). 
RAD sequences thus appear to hold promise for reconstructing phylogenetic relationships in younger clades in which sufficient numbers of orthologous restriction sites are retained across species.
14.
Since metabolites cannot be predicted from the genome sequence, high-throughput de novo identification of small molecules is highly sought. Mass spectrometry (MS) in combination with a fragmentation technique is commonly used for this task. Unfortunately, automated analysis of such data is in its infancy. Recently, fragmentation trees have been proposed as an analysis tool for such data. Additional fragmentation steps (MS(n)) reveal more information about the molecule. We propose to use MS(n) data for the computation of fragmentation trees, and present the Colorful Subtree Closure problem to formalize this task: There, we search for a colorful subtree inside a vertex-colored graph, such that the weight of the transitive closure of the subtree is maximal. We give several negative results regarding the tractability and approximability of this and related problems. We then present an exact dynamic programming algorithm, which is parameterized by the number of colors in the graph and is swift in practice. Evaluation of our method on a dataset of 45 reference compounds showed that the quality of constructed fragmentation trees is improved by using MS(n) instead of MS2 measurements.
15.
Genome-scale sequence data have become increasingly available in the phylogenetic studies for understanding the evolutionary histories of species. However, it is challenging to develop probabilistic models to account for heterogeneity of phylogenomic data. The multispecies coalescent model describes gene trees as independent random variables generated from a coalescence process occurring along the lineages of the species tree. Since the multispecies coalescent model allows gene trees to vary across genes, coalescent-based methods have been popularly used to account for heterogeneous gene trees in phylogenomic data analysis. In this paper, we summarize and evaluate the performance of coalescent-based methods for estimating species trees from genome-scale sequence data. We investigate the effects of deep coalescence and mutation on the performance of species tree estimation methods. We found that the coalescent-based methods perform well in estimating species trees for a large number of genes, regardless of the degree of deep coalescence and mutation. The performance of the coalescent methods is negatively correlated with the lengths of internal branches of the species tree.
16.
17.
With the large amount of genomics and proteomics data that we are confronted with, computational support for the elucidation of protein function becomes more and more pressing. Many different kinds of biological data harbour signals of protein function, but these signals are often concealed. Computational methods that use protein sequence and structure data can be used for discovering these signals. They provide information that can substantially speed up experimental function elucidation. In this review we concentrate on such methods.
18.
Diversification is nested, and early models suggested this could lead to a great deal of evolutionary redundancy in the Tree of Life. This result is based on a particular set of branch lengths produced by the common coalescent, where pendant branches leading to tips can be very short compared with branches deeper in the tree. Here, we analyze alternative and more realistic Yule and birth-death models. We show how censoring at the present both makes average branches one half what we might expect and makes pendant and interior branches roughly equal in length. Although dependent on whether we condition on the size of the tree, its age, or both, these results hold both for the Yule model and for birth-death models with moderate extinction. Importantly, the rough equivalency in interior and exterior branch lengths means that the loss of evolutionary history with loss of species can be roughly linear. Under these models, the Tree of Life may offer limited redundancy in the face of ongoing species loss.
19.
Jotun J. Hein. Bulletin of mathematical biology. 1989, 51(5): 597-603
In this article the question of reconstructing a phylogeny from additive distance data is addressed. Previous algorithms used
the complete distance matrix of the n OTUs (Operational Taxonomic Unit), that corresponds to the tips of the tree. This used
$O(n^2)$ computing time. It is shown that this is wasteful for biologically reasonable trees. If the tree has internal nodes with
degrees that are bounded, an $O(n \log n)$ algorithm is possible. It is also shown if the nodes can have unbounded degrees the
problem has $n^2$ as lower bound.
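"Additive distance data" in the abstract above refers to distances that fit some tree exactly. A standard characterization is the four-point condition: for any four taxa, the two largest of the three pairwise sums d(a,b)+d(c,e), d(a,c)+d(b,e), d(a,e)+d(b,c) are equal. The sketch below only checks that condition by brute force over all quartets, so it is far slower than the algorithm the abstract describes and is not Hein's method. The function name `is_additive` and the tolerance parameter are assumptions made for this example.

```python
from itertools import combinations


def is_additive(d, tol=1e-9):
    """Return True if the symmetric matrix d meets the four-point condition."""
    n = len(d)
    for a, b, c, e in combinations(range(n), 4):
        sums = sorted((d[a][b] + d[c][e],
                       d[a][c] + d[b][e],
                       d[a][e] + d[b][c]))
        # On a tree, the two largest of the three sums must coincide.
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True
```

As a usage example, a quartet with pendant branches 1, 2, 1, 4 and an internal branch of length 3 gives the matrix `[[0, 3, 5, 8], [3, 0, 6, 9], [5, 6, 0, 5], [8, 9, 5, 0]]`, which passes the check. Changing a single entry pair breaks additivity.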
20.
Distributions of duplicated sequences from genome self-alignment are characterized, including forward and backward alignments in bacteria and eukaryotes. A Markovian process without auto-correlation should generate an exponential distribution expected from local effects of point mutation and selection on localised function; however, the observed distributions show substantial deviation from exponential form (they are roughly algebraic instead), suggesting a novel kind of long-distance correlation that must be non-local in origin.