Similar articles
 Found 20 similar articles; search time 31 ms
1.
Evolutionary processes have been described not only in biology but also for a wide range of human cultural activities including languages and law. In contrast to the evolution of DNA or protein sequences, the detailed mechanisms giving rise to the observed evolution-like processes are unknown or only partially known. The absence of a mechanistic model of evolution implies that it remains unknown how the distances between different taxa have to be quantified. Considering distortions of metric distances, we first show that poor choices of the distance measure can lead to incorrect phylogenetic trees. Based on the well-known fact that phylogenetic inference requires additive metrics, we then show that the correct phylogeny can be computed from a distance matrix \({\mathbf {D}}\) if there is a monotonic, subadditive function \(\zeta\) such that \(\zeta ^{-1}({\mathbf {D}})\) is additive. The required metric-preserving transformation \(\zeta\) can be computed as the solution of an optimization problem. This result shows that the problem of phylogeny reconstruction is well defined even if a detailed mechanistic model of the evolutionary process remains elusive.
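
The additivity requirement mentioned in this abstract can be illustrated with the standard four-point condition. The sketch below is not taken from the article: the function name `is_additive`, the example tree, and the choice of the square root as a monotonic, subadditive distortion \(\zeta\) are all assumptions made for illustration.

```python
from itertools import combinations
import math


def is_additive(d, tol=1e-9):
    """Four-point condition: in every quartet, the two largest of the
    three pairwise distance sums must be equal (up to tol)."""
    for i, j, k, l in combinations(range(len(d)), 4):
        sums = sorted([d[i][j] + d[k][l],
                       d[i][k] + d[j][l],
                       d[i][l] + d[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical tree ((A,B),(C,D)): pendant branches 1, 2, 3, 4 and
# an internal branch of length 5.
tree_d = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]

# Assumed distortion zeta(x) = sqrt(x): monotonic and subadditive.
distorted = [[math.sqrt(x) for x in row] for row in tree_d]

# Applying the inverse zeta^{-1}(x) = x**2 recovers an additive matrix.
restored = [[x ** 2 for x in row] for row in distorted]
```

In this toy case the distorted matrix fails the four-point condition while the restored one passes it, which is the situation the abstract describes for \(\zeta ^{-1}({\mathbf {D}})\).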

2.
Phylogenetic inference under the pure drift model   Total citations: 1, self-citations: 1, citations by others: 0
When pairwise genetic distances are used for phylogenetic reconstruction, it is usually assumed that the genetic distance between two taxa contains information about the time after the two taxa diverged. As a result, upon an appropriate transformation if necessary, the distance usually can be fitted to a linear model such that it is expressed as the sum of lengths of all branches that connect the two taxa in a given phylogeny. This kind of distance is referred to as "additive distance." For a phylogenetic tree exclusively driven by random genetic drift, genetic distances related to coancestry coefficients (theta XY) between any two taxa are more suitable. However, these distances are fundamentally different from the additive distance in that coancestry does not contain any information about the time after two taxa split from a common ancestral population; instead, it reflects the time before the two taxa diverged. In other words, the magnitude of theta XY provides information about how long the two taxa share the same evolutionary pathways. The fundamental difference between the two kinds of distances has led to a different algorithm of evaluating phylogenetic trees when theta XY and related distance measures are used. Here we present the new algorithm using the ordinary least-squares approach but fitting to a different linear model. This treatment allows genetic variation within a taxon to be included in the model. Monte Carlo simulation for a rooted phylogeny of four taxa has verified the efficacy and consistency of the new method. Application of the method to human population was demonstrated.

3.
Varied approaches to estimating confidence intervals for immunological and hybridization distances can be uniformly applied to any matrix of distances. One procedure bootstraps the pairwise dissimilarities between the distances of every pair of taxa to all others, creating a derived matrix of distances for which dispersions can be estimated. Another approach bootstraps the sample of differences between pairwise homologous branch lengths concerning each pair of taxa and between asymmetric halves of the matrix, to find a standard error of the dispersions. This allows comparison of the robustness of trees among different sources of data. DNA hybridization, transferrin immunology and protein immunodiffusion matrices all yield much the same result once standard deviations of dissimilarities are acknowledged: namely, unresolvable trichotomies among the human-chimp-gorilla clade and among this clade with orang and gibbon; conventional relationships among hominoids, cercopithecoids, ceboids and strepsirhines; and a polychotomy among anthropoids, strepsirhines, tarsiers, tupaiids and dermopterans.

4.
El'chinova GI. Genetika, 2000, 36(6): 856-858
A new metric based on Malecot's parameters of isolation by distance is proposed for estimation of genetic similarity between populations. This metric is in good agreement with the angular metric used for estimation of interpopulation genetic distances. It is suggested to term the new metric Malecot's metric.

5.
Biplots for multifactorial analysis of distance   Total citations: 1, self-citations: 0, citations by others: 1
Krzanowski WJ. Biometrics, 2004, 60(2): 517-524
Many data sets in practice fit a multivariate analysis of variance (MANOVA) structure, but do not accord with MANOVA assumptions for their analysis. One way forward is to calculate the matrix of dissimilarities or distances between every pair of individuals, and then to conduct an analysis of distance on the resulting data. Various metric scaling plots can be used to interpret the results of the analysis. However, developments to date of this approach have focused mainly on the individuals in the sample, and little attention has been paid to the assessment of influence of the original variables on the results. The present article attempts to rectify this omission. We discuss the inclusion of biplots on all forms of metric scaling representations in the analysis of distance. Exact biplots will often be nonlinear so we propose a simple linear approximation, and contrast it with other simple linear possibilities. An example from ecology illustrates the methodology.

6.

Background  

Distance-based methods are popular for reconstructing evolutionary trees thanks to their speed and generality. A number of methods exist for estimating distances from sequence alignments, which often involves some sort of correction for multiple substitutions. The problem is to accurately estimate the number of true substitutions given an observed alignment. So far, the most accurate protein distance estimators have looked for the optimal matrix in a series of transition probability matrices, e.g. the Dayhoff series. The evolutionary distance between two aligned sequences is here estimated as the evolutionary distance of the optimal matrix. The optimal matrix can be found either by an iterative search for the Maximum Likelihood matrix, or by integration to find the Expected Distance. As a consequence, these methods are more complex to implement and computationally heavier than correction-based methods. Another problem is that the result may vary substantially depending on the evolutionary model used for the matrices. An ideal distance estimator should produce consistent and accurate distances independent of the evolutionary model used.

7.
The genetic relationships among 30 populations of 11 species and five genera of North American Unionidae were assessed by using standard allozyme procedures. Emphasis was on relationships among populations and species of Elliptio and Fusconaia. Multi-dimensional scaling based on a matrix of Nei's (1972) genetic distances substantiated the immunoelectrophoretic results of Davis & Fuller (1981), which demonstrated the distinct and divergent taxonomic groups Anodontinae, Margaritiferinae, and Ambleminae, plus the close relationship of Elliptio and Fusconaia, which justifies their inclusion within the same tribe. Genetic distance appears to increase regularly with time since divergence of taxa. The E. complanata species group is an apparently recent radiation and probably is actively radiating today. The I values among species of this group range from 0.90 to 0.99. Considerable heterozygosity, numerous polymorphic loci, and much interpopulation phenotypic diversity were also recorded for this group. Some taxa that have been considered synonyms are demonstrated to be valid species. Reasons for the low genetic distances among unionid taxa are discussed. Standard allozyme analyses are shown to be of great value for assessing relationships among unionid taxa.

8.
Dendritic morphology is the structural correlate for receiving and processing inputs to a neuron. An interesting question then is what the design principles and the functional consequences of enlarged or shrunken dendritic trees might be. As yet, only a few studies have examined the effects of neuron size changes. Two theoretical scaling modes have been analyzed, conservative (isoelectrotonic) scaling (preserves the passive and active response properties) and isometric scaling (steps up low-pass filtering of inputs). It has been suggested that both scaling modes were verified in neuroanatomical studies. To overcome obvious limitations of these studies like small size of analyzed samples and restricted validity of utilized scaling measures, we considered the scaling problem of neurons on the basis of large sample data and by employing a more general method of scaling analysis. This method consists in computing the morphoelectrotonic transform (MET) of neurons. The MET maps the neuron from anatomical space into electrotonic space using the logarithm of voltage attenuation as the distance metric. The theory underlying this approach is described and then applied to two samples of morphologically reconstructed pyramidal neurons (cells from neocortex of wildtype and synRas transgenic mice) using the NEURON simulator. In a previous study, we could verify a striking increase of dendritic tree size in synRas pyramidal neurons. Surprisingly, in this study the statistical analysis of the sample MET dendrograms revealed that the electrotonic architecture of these neurons scaled roughly in a MET-conserving mode. In conclusion, our results suggest only a minor impact of the Ras protein on dendritic electroanatomy, with non-significant changes of most regions of the corresponding METs.

9.
Phylogenomic studies aim to build phylogenies from large sets of homologous genes. Such "genome-sized" data require fast methods, because of the typically large numbers of taxa examined. In this framework, distance-based methods are useful for exploratory studies and building a starting tree to be refined by a more powerful maximum likelihood (ML) approach. However, estimating evolutionary distances directly from concatenated genes gives poor topological signal as genes evolve at different rates. We propose a novel method, named super distance matrix (SDM), which follows the same line as average consensus supertree (ACS; Lapointe and Cucumel, 1997) and combines the evolutionary distances obtained from each gene into a single distance supermatrix to be analyzed using a standard distance-based algorithm. SDM deforms the source matrices, without modifying their topological message, to bring them as close as possible to each other; these deformed matrices are then averaged to obtain the distance supermatrix. We show that this problem is equivalent to the minimization of a least-squares criterion subject to linear constraints. This problem has a unique solution which is obtained by resolving a linear system. As this system is sparse, its practical resolution requires \(O(n^{a}k^{a})\) time, where n is the number of taxa, k the number of matrices, and a < 2, which allows the distance supermatrix to be quickly obtained. Several uses of SDM are proposed, from fast exploratory studies to more accurate approaches requiring heavier computing time. Using simulations, we show that SDM is a relevant alternative to the standard matrix representation with parsimony (MRP) method, notably when the taxa sets of the different genes have low overlap. We also show that SDM can be used to build an excellent starting tree for an ML approach, which both reduces the computing time and increases the topological accuracy. We use SDM to analyze the data set of Gatesy et al. (2002, Syst. Biol. 51: 652-664) that involves 48 genes of 75 placental mammals. The results indicate that these genes have strong rate heterogeneity and confirm the simulation conclusions.

10.
Given a collection of discrete characters (e.g., aligned DNA sites, gene adjacencies), a common measure of distance between taxa is the proportion of characters for which taxa have different character states. Tree reconstruction based on these (uncorrected) distances can be statistically inconsistent and can lead to trees different from those obtained using character-based methods such as maximum likelihood or maximum parsimony. However, in these cases the distance data often reveal their unreliability by some deviation from additivity, as indicated by conflicting support for more than one tree. We describe two results that show how uncorrected (and miscorrected) distance data can be simultaneously perfectly additive and misleading. First, multistate character data can be perfectly compatible and define one tree, and yet the uncorrected distances derived from these characters are perfectly treelike (and obey a molecular clock), only for a completely different tree. Second, under a Markov model of character evolution a similar phenomenon can occur; not only is there statistical inconsistency using uncorrected distances, but there is no evidence of this inconsistency because the distances look perfectly treelike (this does not occur in the classic two-parameter Felsenstein zone). We characterize precisely when uncorrected distances are additive on the true (and on a false) tree for four taxa. We also extend this result to a more general setting that applies to distances corrected according to an incorrect model.
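
The uncorrected distance defined in the first sentence of this abstract (the proportion of characters with differing states) can be written in a few lines. This is a minimal sketch; the function name `p_distance` and the example sequences are hypothetical, and gaps or ambiguity codes are not handled.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected distance: fraction of aligned characters that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


# Two hypothetical aligned DNA fragments differing at 2 of 8 sites.
example = p_distance("ACGTACGT", "ACGTTCGA")
```

Because this measure ignores multiple substitutions at the same site, it underestimates large evolutionary distances, which is why the abstract treats it as "uncorrected".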

11.
Phylogenetic methods that use matrices of pairwise distances between sequences (e.g., neighbor joining) will only give accurate results when the initial estimates of the pairwise distances are accurate. For many different models of sequence evolution, analytical formulae are known that give estimates of the distance between two sequences as a function of the observed numbers of substitutions of various classes. These are often of a form that we call "log transform formulae". Errors in these distance estimates become larger as the time t since divergence of the two sequences increases. For long times, the log transform formulae can sometimes give divergent distance estimates when applied to finite sequences. We show that these errors become significant when \(t \approx \tfrac{1}{2}|\lambda_{\max}|^{-1}\log N\), where \(\lambda_{\max}\) is the eigenvalue of the substitution rate matrix with the largest absolute value and N is the sequence length. Various likelihood-based methods have been proposed to estimate the values of parameters in rate matrices. If rate matrix parameters are known with reasonable accuracy, it is possible to use the maximum likelihood method to estimate evolutionary distances while keeping the rate parameters fixed. We show that errors in distances estimated in this way only become significant when \(t \approx \tfrac{1}{2}|\lambda_{1}|^{-1}\log N\), where \(\lambda_{1}\) is the eigenvalue of the substitution rate matrix with the smallest nonzero absolute value. The accuracy of likelihood-based distance estimates is therefore much higher than that of estimates based on log transform formulae, particularly in cases where there is a large range of timescales involved in the rate matrix (e.g., when the ratio of transition to transversion rates is large). We discuss several practical ways of estimating the rate matrix parameters before distance calculation and hence of increasing the accuracy of distance estimates.
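
A familiar instance of a log transform formula is the Jukes-Cantor correction, \(d = -\tfrac{3}{4}\ln(1 - \tfrac{4}{3}p)\), where p is the observed proportion of differing sites. The abstract does not name a specific model, so the choice of Jukes-Cantor here is an assumption made for illustration, as is the function name `jc69_distance`. The sketch shows how the estimate diverges once p reaches 3/4, which can happen by chance in finite sequences.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor corrected distance from the observed proportion p
    of differing sites. Returns math.inf where the logarithm's
    argument is non-positive (the estimate diverges)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        return math.inf
    return -0.75 * math.log(arg)


# The correction grows faster than p and blows up as p approaches 0.75.
for p in (0.1, 0.3, 0.6, 0.74, 0.75):
    print(p, jc69_distance(p))
```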

12.
Distance based algorithms are a common technique in the construction of phylogenetic trees from taxonomic sequence data. The first step in the implementation of these algorithms is the calculation of a pairwise distance matrix to give a measure of the evolutionary change between any pair of the extant taxa. A standard technique is to use the log det formula to construct pairwise distances from aligned sequence data. We review a distance measure valid for the most general models, and show how the log det formula can be used as an estimator thereof. We then show that the foundation upon which the log det formula is constructed can be generalized to produce a previously unknown estimator which improves the consistency of the distance matrices constructed from the log det formula. This distance estimator provides a consistent technique for constructing quartets from phylogenetic sequence data under the assumption of the most general Markov model of sequence evolution.

13.
The study compares distance relationships in Eskimoid populations based on metric and attribute data with linguistic relationships based on structural and lexicostatistical data. Taxonomic congruence and the non-specificity hypothesis are investigated by matrix correlations and by a clustering procedure. The matrix correlation approaches employed are the Pearson product-moment correlation coefficient and the Spearman rank-order correlation coefficient. An unweighted pair-group clustering procedure provides a visual comparison of biological and linguistic relationships. Data consist of 74 craniometric measurements and 28 cranial observations taken on 12 Eskimoid populations. Mahalanobis' \(D^{2}\) and Balakrishnan and Sanghvi's \(B^{2}\) were used to compute the metric and attribute distances, respectively. The results indicate that a strict adherence to the non-specificity hypothesis is untenable. Also, there is better concordance between the sexes for metric distances than for attribute distances, and the metric data are more concordant with linguistic relationships than are the attribute data.
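
A matrix correlation of the Pearson kind can be sketched by correlating the off-diagonal entries of two symmetric distance matrices. The helper names and the toy matrices below are hypothetical, and the sketch says nothing about significance testing, which the abstract does not describe.

```python
import math


def upper_triangle(m):
    """Entries above the diagonal of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def matrix_correlation(m1, m2):
    """Correlation between the pairwise distances in two matrices."""
    return pearson(upper_triangle(m1), upper_triangle(m2))


# Hypothetical "biological" and "linguistic" distances for 3 groups.
bio = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
ling = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
r = matrix_correlation(bio, ling)
```

Here the second matrix is an exact multiple of the first, so the correlation is 1 up to rounding; real data sets would of course give intermediate values.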

14.

Background  

A phylogenetic network is a generalization of phylogenetic trees that allows the representation of conflicting signals or alternative evolutionary histories in a single diagram. There are several methods for constructing these networks. Some of these methods are based on distances among taxa. In practice, distance-based methods run faster than other methods. Neighbor-Net (N-Net) is a distance-based method. N-Net produces a circular ordering from a distance matrix, then constructs a collection of weighted splits using this circular ordering. SplitsTree, a program that uses these weighted splits, then draws the phylogenetic network. In general, finding an optimal circular ordering is an NP-hard problem. N-Net is a heuristic algorithm, based on the neighbor-joining algorithm, for finding the optimal circular ordering.

15.
Summary: In this paper, we present a reassessment of the sampling properties of the metric matrix distance geometry algorithm, which is in widespread use in the determination of three-dimensional structures from nuclear magnetic resonance (NMR) data. To this end, we compare the conformational space sampled by structures generated with a variety of metric matrix distance geometry protocols. As test systems we use an unconstrained polypeptide, and a small protein (rabbit neutrophil defensin peptide 5) for which only few tertiary distances had been derived from the NMR data, allowing several possible folds of the polypeptide chain. A process called metrization in the preparation of a trial distance matrix has a very large effect on the sampling properties of the algorithm. It is shown that, depending on the metrization protocol used, metric matrix distance geometry can have very good sampling properties indeed, both for the unconstrained model system and the NMR-structure case. We show that the sampling properties are to a great degree determined by the way in which the first few distances are chosen within their bounds. Further, we present a new protocol (partial metrization) that is computationally more efficient but has the same excellent sampling properties. This novel protocol has been implemented in an expanded new release of the program X-PLOR with distance geometry capabilities.

16.
We examined the efficiencies of ordination methods in the treatment of gene frequency data at the intraspecific level, using metric and nonmetric distance measures (Nei's and Rogers' genetic distances, \(\chi^{2}\) distance). We assessed initial processes responsible for the geographical distribution of the Mediterranean land snail Helix aspersa. Seventeen enzyme loci from 30 North African snail populations were considered in the present analysis. Five combinations of distance/multivariate analysis were compared: correspondence analysis (CA), nonmetric multidimensional scaling (NMDS) on Nei's, Rogers', and \(\chi^{2}\) distances, and principal coordinates analysis on Rogers' distances. Configuration of the objects resulting from ordination was projected onto three-dimensional graphics with the minimum spanning tree or the relative neighborhood graph superimposed. Pre- and postordination or clustering distance matrices were compared by means of correlation methods. As expected, all combinations led to a clear west versus east pattern of variation. However, the intraregional relationships and degree of connectivity between pairs of operational taxonomic units were not necessarily constant from one method to another. Ordination methods applied with Nei's and Rogers' distances provided the best fit with the original distances (r = 0.98), compared with UPGMA clustering (r ≈ 0.75). The Nei/NMDS combination seems to be a good compromise (distortion index dt = 10%) between Rogers/NMDS, which produces a more confusing pattern of differentiation (dt = 24%), and \(\chi^{2}\)/CA, which tends to distort large distances (dt = 31%). NMDS obviously provides a powerful method to summarize relationships between populations, when neither hierarchical structure nor phylogenetic inference is required. These findings lead to a discussion of the good performance of NMDS, the appropriate distances to be used, and the potential application of this method to other types of allelic data (such as microsatellite loci) or data on nucleotide sequences of genes.

17.
Great Lakes coastal wetlands are widely recognized as areas of concentrated biodiversity and productivity, but the factors that influence diversity and productivity within these systems are largely unknown. Several recent studies have suggested that the abundance and diversity of flora and fauna in coastal wetlands may be related to distance from the open water/macrophyte edge. We examined this possibility for three faunal groups inhabiting a coastal wetland in Saginaw Bay, Lake Huron. We sampled crustacean zooplankton and benthic macro-invertebrates at five distances from open water in the summer of 1994, and fish at three distances from open water in 1994 and 1995. We found significant spatial trends in the total abundance and diversity of zooplankton and fish, as well as the diversity of benthic macro-invertebrates. Zooplankton abundance and taxa richness were highest at intermediate distances from open water in a transition zone between the well-mixed bayward portion of the wetland, and the non-circulating nearshore area. Benthic macro-invertebrate taxa richness increased linearly with distance from open water. In contrast, fish abundance and species richness declined linearly and substantially (abundance by 78%, species richness by 40%) with distance from open water. Of the 40 taxa examined in this study, 21 had significant horizontal trends in abundance. This led to notable differences in community composition throughout the wetland. Our results suggest that distance from open water may be a primary determinant of the spatial distributions of numerous organismal groups inhabiting this coastal wetland. Several possible reasons for these distributions are discussed.

18.
MOTIVATION: Comparing two protein databases is a fundamental task in biosequence annotation. Given two databases, one must find all pairs of proteins that align with high score under a biologically meaningful substitution score matrix, such as a BLOSUM matrix (Henikoff and Henikoff, 1992). Distance-based approaches to this problem map each peptide in the database to a point in a metric space, such that peptides aligning with higher scores are mapped to closer points. Many techniques exist to discover close pairs of points in a metric space efficiently, but the challenge in applying this work to proteomic comparison is to find a distance mapping that accurately encodes all the distinctions among residue pairs made by a proteomic score matrix. Buhler (2002) proposed one such mapping but found that it led to a relatively inefficient algorithm for protein-protein comparison. RESULTS: This work proposes a new distance mapping for peptides under the BLOSUM matrices that permits more efficient similarity search. We first propose a new distance function on peptides derived from a given score matrix. We then show how to map peptides to bit vectors such that the distance between any two peptides is closely approximated by the Hamming distance (i.e. number of mismatches) between their corresponding bit vectors. We combine these two results with the LSH-ALL-PAIRS-SIM algorithm of Buhler (2002) to produce an improved distance-based algorithm for proteomic comparison. An initial implementation of the improved algorithm exhibits sensitivity within 5% of that of the original LSH-ALL-PAIRS-SIM, while running up to eight times faster.
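
The Hamming distance that this abstract relies on is cheap to compute when bit vectors are stored as integers. The sketch below shows only that final comparison step; the peptide-to-bit-vector mapping is the article's contribution and is not reproduced here, so the two bit patterns are arbitrary placeholders and `hamming_distance` is a hypothetical helper name.

```python
def hamming_distance(bits_a, bits_b, width):
    """Number of positions at which two width-bit vectors differ.
    Vectors are stored as non-negative Python integers."""
    mask = (1 << width) - 1          # keep only the lowest `width` bits
    return bin((bits_a ^ bits_b) & mask).count("1")


# Placeholder 6-bit encodings of two peptides (not a real mapping).
vec_a = 0b101100
vec_b = 0b100101
mismatches = hamming_distance(vec_a, vec_b, 6)
```

XOR marks the differing positions and counting set bits gives the number of mismatches, which is why bit-vector encodings suit fast all-pairs similarity search.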

19.
Relative weighting of characters used in taxonomic decisions is detected by comparison with taxonomic models in which characters are given equal weights. Classifications are analysed for implied distance inequalities between triplets of taxa, and the minimal weights applied to the distances between taxa necessary to satisfy these constraints are estimated. Weights acting as multipliers on interacting characters are compared with weight estimations: geometric parameters which depend upon the relative locations of the taxa in taxonomic space.

20.
Summary: Operator metrics are explicitly designed to measure evolutionary distances from nucleic acid sequences when substitution rates differ greatly among the organisms being compared, or when substitutions have been extensive. Unlike lengths calculated by the distance matrix and parsimony methods, in which substitutions in one branch of a tree can alter the measured length of another branch, lengths determined by operator metrics are not affected by substitutions outside the branch. In the method, lengths (operator metrics) corresponding to each of the branches of an unrooted tree are calculated. The metric length of a branch reconstructs the number of (transversion) differences between sequences at a tip and a node (or between nodes) of a tree. The theory is general and is fundamentally independent of differences in substitution rates among the organisms being compared. Mathematically, the independence has been obtained because the metrics are eigenvectors of fundamental equations which describe the evolution of all unrooted trees. Even under conditions when both the distance matrix method and a simple parsimony length method are shown to indicate lengths that are an order of magnitude too large or too small, the operator metrics are accurate. Examples, using data calculated with evolutionary rates and branchings designed to confuse the measurement of branch lengths and to camouflage the topology of the true tree, demonstrate the validity of operator metrics. The method is robust. Operator metric distances are easy to calculate, can be extended to any number of taxa, and provide a statistical estimate of their variances. The utility of the method is demonstrated by using it to analyze the origins and evolution of chloroplasts, mitochondria, and eubacteria.
