Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
D-H Kim, D Heber, D W Still. Génome, 2004, 47(1):102-111
The taxonomy of Echinacea is based on morphological characters and has varied depending on the monographer. The genus consists of either nine species and four varieties or four species and eight varieties. We have used amplified fragment length polymorphisms (AFLP) to assess genetic diversity and phenetic relationships among nine species and three varieties of Echinacea (sensu McGregor). A total of 1086 fragments, of which approximately 90% were polymorphic among Echinacea taxa, were generated from six primer combinations. Nei and Li's genetic distance coefficient and the neighbor-joining algorithm were employed to construct a phenetic tree. Genetic distance results indicate that all Echinacea species are closely related, and the average pairwise distance between populations was approximately three times the intrapopulation distances. The topology of the neighbor-joining tree strongly supports two major clades, one containing Echinacea purpurea, Echinacea sanguinea, and Echinacea simulata and the other containing the remainder of the Echinacea taxa (sensu McGregor). The species composition within the clades differs between our AFLP data and the morphometric treatment offered by Binns and colleagues. We also discuss the suitability of AFLP in determining phylogenetic relationships.
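
As a rough illustration of the distance step named in this abstract, the sketch below computes a Nei and Li style distance between two AFLP band profiles. It assumes the commonly used form 1 - 2*n_xy / (n_x + n_y) on presence/absence vectors; the abstract does not say which variant or software the authors used, and the function name and example profiles are hypothetical.

```python
def nei_li_distance(profile_a, profile_b):
    """Distance 1 - 2*n_ab / (n_a + n_b) between two binary band profiles.

    Assumption: each profile is a sequence of 0/1 flags, one per AFLP fragment,
    and both profiles score the same fragments in the same order.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    shared = sum(1 for x, y in zip(profile_a, profile_b) if x and y)
    total = sum(profile_a) + sum(profile_b)
    if total == 0:
        return 0.0  # no bands in either profile: treat as identical
    return 1.0 - 2.0 * shared / total


# Hypothetical profiles for two samples scored on six fragments.
sample_x = [1, 1, 0, 1, 0, 1]
sample_y = [1, 0, 0, 1, 1, 1]
print(nei_li_distance(sample_x, sample_y))
```

A matrix of such distances over all sample pairs is the kind of input a neighbor-joining routine expects.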

2.
An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus (pseudo) protein for the set. The algorithm is a heuristic in that it computes an approximation to the optimal multiple structure alignment that minimizes the sum of the pairwise distances between the protein structures. The algorithm chooses an input protein as the initial consensus and computes a correspondence between the protein structures (which are represented as sets of unit vectors) using an approach analogous to the center-star method for multiple sequence alignment. From this correspondence, a set of rotation matrices (optimal for the given correspondence) is derived to align the structures and derive the new consensus. The process is iterated until the sum of pairwise distances converges. The computation of the optimal rotations is itself an iterative process that both makes use of the current consensus and generates simultaneously a new one. This approach is based on an interesting result that allows the sum of all pairwise distances to be represented compactly as distances to the consensus. Experimental results on several protein families are presented, showing that the algorithm converges quite rapidly.

3.
The popular neighbor-joining (NJ) algorithm used in phylogenetics is a greedy algorithm for finding the balanced minimum evolution (BME) tree associated with a dissimilarity map. From this point of view, NJ is "optimal" when the algorithm outputs the tree which minimizes the balanced minimum evolution criterion. We use the fact that the NJ tree topology and the BME tree topology are determined by polyhedral subdivisions of the spaces of dissimilarity maps to study the optimality of the neighbor-joining algorithm. In particular, we investigate and compare the polyhedral subdivisions for n ≤ 8. This requires the measurement of volumes of spherical polytopes in high dimension, which we obtain using a combination of Monte Carlo methods and polyhedral algorithms. Our results include a demonstration that highly unrelated trees can be co-optimal in BME reconstruction, and that NJ regions are not convex. We obtain the ℓ2 radius for neighbor-joining for n = 5 and we conjecture that the ability of the neighbor-joining algorithm to recover the BME tree depends on the diameter of the BME tree.

4.

Background

Microbial typing methods are commonly used to study the relatedness of bacterial strains. Sequence-based typing methods are a gold standard for epidemiological surveillance due to the inherent portability of sequence and allelic profile data, fast analysis times and their capacity to create common nomenclatures for strains or clones. This has led to the development of several novel methods and to databases being made available for many microbial species. With the mainstream use of High Throughput Sequencing, the amount of data being accumulated in these databases is huge, with thousands of different profiles stored. On the other hand, computing genetic evolutionary distances among a set of typing profiles or taxa dominates the running time of many phylogenetic inference methods. It is also important to note that most definitions of genetic evolutionary distance rely, even if indirectly, on computing the pairwise Hamming distance among sequences or profiles.

Results

We propose here an average-case linear-time algorithm to compute pairwise Hamming distances among a set of taxa under a given Hamming distance threshold. This article includes both a theoretical analysis and extensive experimental results concerning the proposed algorithm. We further show how this algorithm can be successfully integrated into a well-known phylogenetic inference method, and how it can be used to speed up querying local phylogenetic patterns over large typing databases.
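
For orientation, the problem stated above (all pairs of profiles whose Hamming distance is at most a threshold) can be sketched with a naive quadratic baseline. This is not the average-case linear-time algorithm the abstract proposes, whose details are not given here; it only makes the input and output concrete. The function names and the toy allelic profiles are hypothetical.

```python
from itertools import combinations


def hamming_within(p, q, k):
    """Return the Hamming distance of p and q if it is <= k, else None.

    Stops scanning as soon as the mismatch count exceeds k.
    Assumes p and q have equal length.
    """
    mismatches = 0
    for x, y in zip(p, q):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return None
    return mismatches


def pairs_within_threshold(profiles, k):
    """Map (i, j) with i < j to the distance, for every pair within threshold k."""
    result = {}
    for (i, p), (j, q) in combinations(enumerate(profiles), 2):
        d = hamming_within(p, q, k)
        if d is not None:
            result[(i, j)] = d
    return result


# Hypothetical typing profiles (one allele number per locus).
profiles = [(1, 2, 3, 4), (1, 2, 3, 5), (9, 9, 9, 9), (1, 2, 0, 5)]
print(pairs_within_threshold(profiles, 1))
```

The baseline compares every pair, so its cost grows quadratically with the number of profiles; avoiding that cost is the motivation the abstract gives for a faster method.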

5.
We describe a novel method for efficient reconstruction of phylogenetic trees, based on sequences of whole genomes or proteomes, whose lengths may greatly vary. The core of our method is a new measure of pairwise distances between sequences. This measure is based on computing the average lengths of maximum common substrings, which is intrinsically related to information theoretic tools (Kullback-Leibler relative entropy). We present an algorithm for efficiently computing these distances. In principle, the distance between two sequences of length l can be calculated in O(l) time. We implemented the algorithm using suffix arrays; our implementation is fast enough to enable the construction of the proteome phylogenomic tree for hundreds of species and the genome phylogenomic forest for almost two thousand viruses. An initial analysis of the results exhibits a remarkable agreement with "acceptable phylogenetic and taxonomic truth." To assess our approach, our results were compared to the traditional (single-gene or protein-based) maximum likelihood method. The obtained trees were compared to implementations of a number of alternative approaches, including two that were previously published in the literature, and to the published results of a third approach. When their outcomes and running times were compared to ours, using "traditional" trees and a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin. The simplicity and speed of our method allow for a whole genome analysis with the greatest scope attempted so far. We describe here five different applications of the method, which not only show the validity of the method, but also suggest a number of novel phylogenetic insights.
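
The core quantity named in this abstract, the average length of maximum common substrings, can be illustrated with a deliberately naive sketch. For each position in one sequence it finds the longest substring starting there that also occurs somewhere in the other sequence, then averages those lengths. The abstract reports a suffix-array implementation running in linear time; the version below is far slower and stops short of turning the average into a distance, since that normalization is not spelled out in the abstract. The function name is hypothetical.

```python
def average_common_substring_length(a, b):
    """Average, over start positions i in a, of the longest a[i:i+k] found in b.

    Naive version for illustration only; slow on long sequences.
    """
    if not a:
        raise ValueError("first sequence must be non-empty")
    total = 0
    for i in range(len(a)):
        length = 0
        # Extend the match while the longer substring still occurs in b.
        while i + length < len(a) and a[i:i + length + 1] in b:
            length += 1
        total += length
    return total / len(a)


print(average_common_substring_length("abcab", "cabd"))
```

Intuitively, closely related sequences share long substrings and give a large average, so a distance built from this quantity should decrease as the average grows.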

6.

Background  

An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus (pseudo) protein which captures common substructures present in the given proteins. The algorithm represents each protein as a sequence of triples of coordinates of the alpha-carbon atoms along the backbone. It then computes iteratively a sequence of transformation matrices (i.e., translations and rotations) to align the proteins in space and generate the consensus. The algorithm is a heuristic in that it computes an approximation to the optimal alignment that minimizes the sum of the pairwise distances between the consensus and the transformed proteins.

7.
The upland mesic rainforests of eastern Australia have been described as a "mesothermal archipelago" where a chain of cool mountain "islands" arise from a warm "sea" of tropical and subtropical lowlands. An endemic freshwater crayfish belonging to the genus Euastacus is found on each of these mountain "islands." The Euastacus are particularly suitable for the study of evolution because each mountain harbors a unique species, there are many taxa present providing replication within the group and, most importantly, their distribution is linear, extending along a south-north axis. This group could have evolved by "simultaneous vicariance" where there was one vicariant separation event of a widespread ancestor, or by "south to north stepping stone dispersal" where there were long distance dispersal events from neighboring mountain islands, starting in the south and proceeding north in a dispersal-colonization wave. We used pairwise genetic distances between nearest geographic neighbors as a novel way to test the two hypotheses. If diversification was due to "south to north stepping stone dispersal," then pairwise genetic distances between nearest geographic neighbors should decrease progressively the farther north the taxon pairs are found, reflecting the decreasing periods of isolation. In this case there should be a negative correlation between the south to north rank order of nearest neighbors and pairwise genetic distances. A Spearman's correlation on 16S mtDNA pairwise genetic distances and geographic rank order was not significant, indicating there was no support for the south to north stepping stone dispersal hypothesis. If simultaneous vicariance was responsible for diversification then all nearest geographic neighbor taxon pairs should have similar genetic distances and, therefore, the variance in nearest neighbor distances should be zero, or close to it. 
To test if the observed variance was tending towards zero we developed a randomization test where nearest neighbor taxon pairs were assigned random genetic distances and the variances calculated. The observed variance lay in the < 0.05 range of the simulated variances, providing support for the simultaneous vicariance hypothesis. The data also suggest there was simultaneous vicariance of at least two ancestral Queensland lineages. The timing of this vicariant event was probably in the Pliocene, which is consistent with the divergence times reported for other Australian mesic rainforest restricted taxa.
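
The randomization test described above can be sketched as follows. The abstract does not say where the random genetic distances are drawn from, so this sketch assumes they are resampled with replacement from a pool of candidate distances; that choice, the function name, and the example numbers are all assumptions for illustration rather than the authors' procedure.

```python
import random
from statistics import pvariance


def variance_randomization_test(observed, pool, n_iter=10000, seed=0):
    """Estimate how often random distances give a variance <= the observed one.

    observed: genetic distances for the nearest-neighbor taxon pairs.
    pool: distances to resample from (an assumption, see the text above).
    Returns (observed variance, fraction of simulations at or below it).
    """
    rng = random.Random(seed)
    observed_variance = pvariance(observed)
    at_or_below = 0
    for _ in range(n_iter):
        simulated = [rng.choice(pool) for _ in observed]
        if pvariance(simulated) <= observed_variance:
            at_or_below += 1
    return observed_variance, at_or_below / n_iter


# Hypothetical distances: the observed values are tightly clustered.
observed = [0.10, 0.11, 0.10, 0.12]
pool = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40, 0.11, 0.12]
print(variance_randomization_test(observed, pool))
```

A small returned fraction means the observed variance sits in the lower tail of the simulated variances, which is the pattern the abstract reports as support for simultaneous vicariance.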

8.
Determining molecular structure from interatomic distances is an important and challenging problem. Given a molecule with n atoms, lower and upper bounds on interatomic distances can usually be obtained only for a small subset of the atom pairs, using NMR. Given the bounds so obtained on the distances between some of the atom pairs, it is often useful to compute tighter bounds on all the pairwise distances. This process is referred to as bound smoothing. The initial lower and upper bounds for the pairwise distances not measured are usually assumed to be 0 and ∞. One method for bound smoothing is to use the limits imposed by the triangle inequality. The distance bounds so obtained can often be tightened further by applying the tetrangle inequality, that is, the limits imposed on the six pairwise distances among a set of four atoms (instead of three for the triangle inequalities). The tetrangle inequality is expressed by the Cayley-Menger determinants. For every quadruple of atoms, each pass of the tetrangle inequality bound smoothing procedure finds upper and lower limits on each of the six distances in the quadruple. Applying the tetrangle inequalities to each of the C(n, 4) quadruples requires O(n^4) time. Here, we propose a parallel algorithm for bound smoothing employing the tetrangle inequality. Each pass of our algorithm requires O(n^3 log n) time on a CREW PRAM (Concurrent Read Exclusive Write Parallel Random Access Machine) with processors. An implementation of this parallel algorithm on the Intel Paragon XP/S and its performance are also discussed.
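
The simpler triangle-inequality smoothing mentioned in this abstract can be sketched sequentially. Upper bounds are tightened with u_ij <= u_ik + u_kj, and lower bounds are then raised with l_ij >= l_ik - u_kj, in Floyd-Warshall style loops. This is a textbook-style sketch under those assumptions, not the parallel tetrangle algorithm the abstract proposes; the function name and the three-atom example are hypothetical.

```python
import math


def triangle_smooth(lower, upper):
    """Tighten symmetric lower/upper distance bound matrices via triangle limits.

    Unmeasured pairs are expected as lower = 0 and upper = math.inf.
    Returns new (lower, upper) matrices; the inputs are not modified.
    """
    n = len(upper)
    u = [row[:] for row in upper]
    l = [row[:] for row in lower]
    # Upper bounds: a path through atom k bounds the direct distance from above.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if u[i][k] + u[k][j] < u[i][j]:
                    u[i][j] = u[i][k] + u[k][j]
    # Lower bounds: if i-k is long and k-j is short, i-j cannot be short.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                candidate = max(l[i][k] - u[k][j], l[k][j] - u[i][k])
                if candidate > l[i][j]:
                    l[i][j] = candidate
    return l, u


# Hypothetical three-atom example: pair (0, 2) is unmeasured.
inf = math.inf
lower = [[0.0, 3.0, 0.0], [3.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
upper = [[0.0, 3.5, inf], [3.5, 0.0, 1.0], [inf, 1.0, 0.0]]
new_lower, new_upper = triangle_smooth(lower, upper)
print(new_lower[0][2], new_upper[0][2])
```

The triple loop costs O(n^3); the tetrangle pass described in the abstract examines quadruples instead of triples, which is where the O(n^4) sequential cost comes from.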

9.
Phylogenetic inference under the pure drift model   Total citations: 1 (self-citations: 1, citations by others: 0)
When pairwise genetic distances are used for phylogenetic reconstruction, it is usually assumed that the genetic distance between two taxa contains information about the time after the two taxa diverged. As a result, upon an appropriate transformation if necessary, the distance usually can be fitted to a linear model such that it is expressed as the sum of lengths of all branches that connect the two taxa in a given phylogeny. This kind of distance is referred to as "additive distance." For a phylogenetic tree exclusively driven by random genetic drift, genetic distances related to coancestry coefficients (θ_XY) between any two taxa are more suitable. However, these distances are fundamentally different from the additive distance in that coancestry does not contain any information about the time after two taxa split from a common ancestral population; instead, it reflects the time before the two taxa diverged. In other words, the magnitude of θ_XY provides information about how long the two taxa share the same evolutionary pathways. The fundamental difference between the two kinds of distances has led to a different algorithm of evaluating phylogenetic trees when θ_XY and related distance measures are used. Here we present the new algorithm using the ordinary-least-squares approach but fitting to a different linear model. This treatment allows genetic variation within a taxon to be included in the model. Monte Carlo simulation for a rooted phylogeny of four taxa has verified the efficacy and consistency of the new method. Application of the method to human populations was demonstrated.

10.
FastJoin, an improved neighbor-joining algorithm   Total citations: 1 (self-citations: 0, citations by others: 1)
Reconstructing the evolutionary history of a set of species is an elementary problem in biology, and methods for solving this problem are evaluated based on two characteristics: accuracy and efficiency. Neighbor-joining reconstructs phylogenetic trees by iteratively picking a pair of nodes to merge as a new node until only one node remains; due to its good accuracy and speed, it has been embraced by the phylogeny research community. With the advent of large amounts of data, improved fast and precise methods for reconstructing evolutionary trees have become necessary. We improved the neighbor-joining algorithm by iteratively picking two pairs of nodes and merging them into two new nodes, until only one node remains. We found that, for a purely additive tree, a second pair of true neighbors can be merged in each iteration of the clustering procedure, in addition to the pair chosen by the neighbor-joining criterion. These additional neighbors would otherwise be selected by a later iteration of the neighbor-joining method, so the improved algorithm constructs the same phylogenetic tree as the neighbor-joining algorithm for the same input data. By combining the improved neighbor-joining algorithm with the upper-bound computation optimization of RapidNJ and the external-storage approach of ERapidNJ, we propose FastJoin, a new method for reconstructing phylogenetic trees. Experiments on several data sets showed that the new algorithm yields a significant speed-up compared to classic neighbor-joining, and that FastJoin is empirically superior to almost all other neighbor-joining implementations.
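
To make "picking a pair of nodes to merge" concrete, the sketch below shows the selection step of classic neighbor-joining: it evaluates Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) for every pair and returns the pair with the smallest value. It covers only this one step of the standard algorithm and does not attempt FastJoin's two-pair selection or the RapidNJ bounds; the function name and the five-taxon distance matrix are illustrative.

```python
def pick_nj_pair(dist):
    """Return the pair (i, j), i < j, that minimizes the neighbor-joining Q value.

    dist is a symmetric matrix of pairwise distances with a zero diagonal.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("need at least three taxa")
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair


# Illustrative additive distances for five taxa labeled 0..4.
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(pick_nj_pair(dist))
```

In this matrix the runner-up is the pair (3, 4), which is also a pair of true neighbors; that is the kind of second pair the abstract says can be merged in the same iteration.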

11.
In the reconstruction of a large phylogenetic tree, the most difficult part is usually the problem of how to explore the topology space to find the optimal topology. We have developed a "divide-and-conquer" heuristic algorithm in which an initial neighbor-joining (NJ) tree is divided into subtrees at internal branches having bootstrap values higher than a threshold. The topology search is then conducted by using the maximum-likelihood method to reevaluate all branches with a bootstrap value lower than the threshold while keeping the other branches intact. Extensive simulation showed that our simple method, the neighbor-joining maximum-likelihood (NJML) method, is highly efficient in improving NJ trees. Furthermore, the performance of the NJML method is nearly equal to or better than existing time-consuming heuristic maximum-likelihood methods. Our method is suitable for reconstructing relatively large molecular phylogenetic trees (number of taxa ≥ 16).

12.
H Tyson. Génome, 1992, 35(2):360-371
Optimum alignment in all pairwise combinations among a group of amino acid sequences generated a distance matrix. These distances were clustered to evaluate relationships among the sequences. The degree of relationship among sequences was also evaluated by calculating specific distances from the distance matrix and examining correlations between patterns of specific distances for pairs of sequences. The sequences examined were a group of 20 amino acid sequences of scorpion toxins originally published and analyzed by M.J. Dufton and H. Rochat in 1984. Alignment gap penalties were constant for all 190 pairwise sequence alignments and were chosen after assessing the impact of changing penalties on resultant distances. The total distances generated by the 190 pairwise sequence alignments were clustered using complete (farthest neighbour) linkage. The square, symmetrical input distance matrix is analogous to diallel cross data where reciprocal and parental values are absent. Diallel analysis methods provided analogues for the distance matrix to genetical specific combining abilities, namely specific distances between all sequence pairs that are independent of the average distances shown by individual sequences. Correlation of specific distance patterns, with transformation to modified z values and a stringent probability level, were used to delineate subgroups of related sequences. These were compared with complete linkage clustering results. Excellent agreement between the two approaches was found. Three originally outlying sequences were placed within the four new subgroups.

13.

Background

A recursive algorithm to calculate the fifteen detailed coefficients of identity is introduced. Previous recursive procedures based on the generalized coefficients of kinship provided the detailed coefficients of identity under the assumption that the two individuals were not an ancestor of each other.

Findings

By using gametic relationships to include three, four or two pairs of gametes, we can obtain these coefficients for any pair of individuals. We have developed a novel linear transformation that allows for the calculation of pairwise detailed identity coefficients for any pedigree given the gametic relationships. We illustrate the procedure using the well-known pedigree of Julio and Mencha, which contains 20 Jicaque Indians of Honduras, to calculate their detailed coefficients.

Conclusions

The proposed algorithm can be used to calculate the detailed identity coefficients of two or more individuals with any pedigree relationship.

14.
The serine β-lactamases present a special problem for phylogenetics because they have diverged so much that they fall into three classes that share no detectable sequence homology among themselves. Here we offer a solution to the problem in the form of two phylogenies that are based on a protein structure alignment. In the first, structural alignments were used as a guide for aligning amino acid sequences and in the second, the average root mean square distances between the alpha carbons of the proteins were used to create a pairwise distance matrix from which a neighbor-joining phylogeny was created. From those phylogenies, we show that the Class A and Class D β-lactamases are sister taxa and that the divergence of the Class C β-lactamases predated the divergence of the Class A and Class D β-lactamases.

15.
Both multiple sequence alignment and phylogenetic analysis are problematic in the "twilight zone" of sequence similarity (≤ 25% amino acid identity). Herein we explore the accuracy of phylogenetic inference at extreme sequence divergence using a variety of simulated data sets. We evaluate four leading multiple sequence alignment (MSA) methods (MAFFT, T-COFFEE, CLUSTAL, and MUSCLE) and six commonly used programs of tree estimation (Distance-based: Neighbor-Joining; Character-based: PhyML, RAxML, GARLI, Maximum Parsimony, and Bayesian) against a novel MSA-independent method (PHYRN) described here. Strikingly, at "midnight zone" genetic distances (~7% pairwise identity and 4.0 gaps per position), PHYRN returns high-resolution phylogenies that outperform traditional approaches. We reason this is due to PHYRN's capability to amplify informative positions, even at the most extreme levels of sequence divergence. We also assess the applicability of the PHYRN algorithm for inferring deep evolutionary relationships in the divergent DANGER protein superfamily, for which PHYRN infers a more robust tree compared to MSA-based approaches. Taken together, these results demonstrate that PHYRN represents a powerful mechanism for mapping uncharted frontiers in highly divergent protein sequence data sets.

16.
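Sequence analysis of the mitochondrial 12S rRNA gene in fishes of the genus Oxyeleotris   Total citations: 7 (self-citations: 0, citations by others: 7)
The mitochondrial 12S rRNA genes of Oxyeleotris lineolatus, Oxyeleotris marmoratus and Odontobutis haifengensis were amplified by PCR and sequenced. Together with homologous sequences downloaded from GenBank, these were used to analyze the phylogenetic relationships of five fish species. In the neighbor-joining tree built under the Kimura 2-parameter model, O. marmoratus (native to Thailand) and O. lineolatus (native to Australia) each formed a monophyletic group, and the two were the most closely related sister groups. Odontobutis haifengensis was more distantly related to the other groups, which supports the traditional taxonomic treatment of separating the genus Oxyeleotris from the genus Eleotris. Within Oxyeleotris, no intraspecific DNA sequence differences were found in O. marmoratus or O. lineolatus, whereas interspecific differences were clear, indicating that the mitochondrial 12S rRNA gene can serve as a good molecular marker for species identification in the family Eleotridae.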
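
The Kimura 2-parameter model used for the tree above converts the observed proportions of transitions (P) and transversions (Q) between two aligned sequences into a distance d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)). The sketch below applies that standard formula to a pair of aligned DNA strings; it skips sites with gaps or ambiguous bases, which is one common convention and an assumption here, and the function name and example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent for the model.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical aligned fragments with one transition and one transversion.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"))
```

Weighting transversions differently from transitions is what distinguishes this model from a simple proportion of differing sites.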

17.
We propose permutation tests based on the pairwise distances between microarrays to compare location, variability, or equivalence of gene expression between two populations. For these tests the entire microarray or some pre-specified subset of genes is the unit of analysis. The pairwise distances only have to be computed once so the procedure is not computationally intensive despite the high dimensionality of the data. An R software package, permtest, implementing the method is freely available from the Comprehensive R Archive Network at http://cran.r-project.org.
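
The general idea of a distance-based permutation test can be sketched in a few lines. The abstract does not give the test statistics used by permtest, so the sketch below assumes a simple location statistic (mean between-group distance minus mean within-group distance) and is written in Python rather than R; it is not the package's implementation, and all names and data are hypothetical. It does reflect the point made in the abstract that the distance matrix is computed once and only the group labels are shuffled.

```python
import random
from itertools import combinations


def location_statistic(dist, labels):
    """Mean between-group distance minus mean within-group distance."""
    between, within = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_p_value(dist, labels, n_perm=2000, seed=0):
    """Return (observed statistic, permutation p-value) for two labeled groups."""
    rng = random.Random(seed)
    observed = location_statistic(dist, labels)
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # the distance matrix itself is never recomputed
        if location_statistic(dist, shuffled) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical one-dimensional "arrays": two well-separated groups of three.
values = [0, 1, 2, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
dist = [[abs(a - b) for b in values] for a in values]
print(permutation_p_value(dist, labels))
```

With only six samples there are few distinct relabelings, so the p-value cannot become very small; real microarray studies would need larger groups for a stringent threshold.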

18.

Background  

A phylogenetic network is a generalization of phylogenetic trees that allows the representation of conflicting signals or alternative evolutionary histories in a single diagram. There are several methods for constructing these networks. Some of these methods are based on distances among taxa. In practice, distance-based methods run faster than other methods. Neighbor-Net (N-Net) is a distance-based method. N-Net produces a circular ordering from a distance matrix, then constructs a collection of weighted splits using that circular ordering. SplitsTree, a program that uses these weighted splits, then builds the phylogenetic network. In general, finding an optimal circular ordering is an NP-hard problem. N-Net is a heuristic algorithm, based on the neighbor-joining algorithm, for finding the optimal circular ordering.

19.
MOTIVATION: The construction of evolutionary trees is one of the major problems in computational biology, mainly due to its complexity. RESULTS: We present a new tree construction method that constructs a tree with minimum score for a given set of sequences, where the score is the amount of evolution measured in PAM distances. To do this, the problem of tree construction is reduced to the Traveling Salesman Problem (TSP). The input to the TSP algorithm is the set of pairwise distances between the sequences, and the output is a circular tour through the optimal, unknown tree plus the minimum score of the tree. The circular order and the score can be used to construct the topology of the optimal tree. Our method can be used for any scoring function that correlates to the amount of changes along the branches of an evolutionary tree; for instance, it could also be used for parsimony scores, but it cannot be used for least squares fit of distances. A TSP solution reduces the space of all possible trees to 2n. Using this order, we can guarantee that we reconstruct a correct evolutionary tree if the absolute value of the error for each distance measurement is smaller than [formula image not recovered], where [symbol image not recovered] is the length of the shortest edge in the tree. For data sets with large errors, a dynamic programming approach is used to reconstruct the tree. Finally, simulations and experiments with real data are shown.

20.
We report the application of an integrated computational approach for biomolecular structure determination at a low resolution. In particular, a neural network is trained to predict the spatial proximity of C-alpha atoms that are less than a given threshold apart, whereas a Kalman filter algorithm is employed to outline the biomolecular fold, with a constraints set that includes these pairwise atomic distances, and the distances and angles that define the structure as it is known from the protein's sequence. The results for Crambin demonstrate that this integrated approach is useful for molecular structure prediction at a low resolution and may also complement existing experimental distance data for a protein structure determination. © 1996 John Wiley & Sons, Inc.
