Similar articles
20 similar articles found
1.
β-Lactamase enzymes cleave the amide bond in the β-lactam ring, rendering β-lactam antibiotics harmless to bacteria. In this communication we studied the structure-function relationships and phylogenies of class A, B and D beta-lactamases using structure-based sequence alignment and the PHYLIP programs, respectively. The structure-based sequence alignment data suggest that in different isolates of TEM-1, mutations did not occur at or near sequence motifs. Since deletions are reported to be lethal to the structure and function of the enzyme, the antibiotic hydrolysis profile and specificity of these variants will be affected. The alignment data for the class A enzymes SHV-1 and CTX-M-15, the class D enzyme OXA-10, and the class B enzymes VIM-2 and SIM-1 show that the sequence motifs, along with other parts of the polypeptide, are essentially conserved. These results imply that the conformations of these beta-lactamases are close to the native state and that they possess normal hydrolytic activity towards beta-lactam antibiotics. However, class B enzymes such as IMP-1 and NDM-1 are less conserved than the class A and D enzymes studied here because mutations and deletions occurred at critically important regions such as the active site. Therefore, the structure of these beta-lactamases will be altered and their antibiotic hydrolysis profile will be affected. Phylogenetic studies suggest that class A and D beta-lactamases, including TOHO-1 and OXA-10 respectively, evolved by horizontal gene transfer (HGT), whereas other members of class A such as TEM-1 evolved by a gene duplication mechanism. Taken together, these studies justify the structure-function relationship of beta-lactamases, and the phylogenetic studies suggest that these enzymes evolved by different mechanisms.

2.

Background

Existing sequence alignment algorithms use heuristic scoring schemes based on biological expertise, which cannot be used as objective distance metrics. As a result one relies on crude measures, like the p- or log-det distances, or makes explicit, and often too simplistic, a priori assumptions about sequence evolution. Information theory provides an alternative, in the form of mutual information (MI). MI is, in principle, an objective and model independent similarity measure, but it is not widely used in this context and no algorithm for extracting MI from a given alignment (without assuming an evolutionary model) is known. MI can be estimated without alignments, by concatenating and zipping sequences, but so far this has only produced estimates with uncontrolled errors, despite the fact that the normalized compression distance based on it has shown promising results.
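The normalized compression distance mentioned above can be illustrated with a minimal sketch. It assumes zlib as the compressor (the abstract does not say which compressor was used), and the function names are hypothetical. It uses the common definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes; zlib is only a stand-in for an ideal compressor."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences (illustrative only)."""
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # "concatenate and zip"
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Values near 0 indicate that one sequence adds little information beyond the other; values near 1 indicate essentially unrelated sequences. A general-purpose compressor gives only a rough estimate, which is consistent with the uncontrolled errors the abstract describes.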

Results

We describe a simple approach to get robust estimates of MI from global pairwise alignments. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. For animal mitochondrial DNA our approach uses the alignments made by popular global alignment algorithms to produce MI estimates that are strikingly close to estimates obtained from the alignment free methods mentioned above. We point out that, due to the fact that it is not additive, normalized compression distance is not an optimal metric for phylogenetics but we propose a simple modification that overcomes the issue of additivity. We test several versions of our MI based distance measures on a large number of randomly chosen quartets and demonstrate that they all perform better than traditional measures like the Kimura or log-det (resp. paralinear) distances.

Conclusions

Several versions of MI based distances outperform conventional distances in distance-based phylogeny. Even a simplified version based on single letter Shannon entropies, which can be easily incorporated in existing software packages, gave superior results throughout the entire animal kingdom. But we see the main virtue of our approach in a more general way. For example, it can also help to judge the relative merits of different alignment algorithms, by estimating the significance of specific alignments. It strongly suggests that information theory concepts can be exploited further in sequence analysis.
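As a rough illustration of the simplified, single-letter Shannon variant, the sketch below computes a plug-in estimate of mutual information, I(X;Y) = H(X) + H(Y) - H(X,Y), from the columns of a pairwise alignment. This is not the authors' estimator: the function names, the decision to skip gapped columns, and the absence of any bias correction are assumptions made here for brevity.

```python
from collections import Counter
from math import log2


def entropy(counts: Counter) -> float:
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def alignment_mutual_information(a: str, b: str, gap: str = "-") -> float:
    """Plug-in mutual information (bits per site) between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns containing a gap in either sequence are ignored.
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    h_x = entropy(Counter(x for x, _ in pairs))
    h_y = entropy(Counter(y for _, y in pairs))
    h_xy = entropy(Counter(pairs))
    return h_x + h_y - h_xy
```

Two identical sequences with uniform base composition give 2 bits per site, while columns that pair bases independently give 0.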

3.
Background

Protein domains are commonly used to assess the functional roles and evolutionary relationships of proteins and protein families. Here, we use the Pfam protein family database to examine a set of candidate partial domains. Pfam protein domains are often thought of as evolutionarily indivisible, structurally compact, units from which larger functional proteins are assembled; however, almost 4% of Pfam27 PfamA domains are shorter than 50% of their family model length, suggesting that more than half of the domain is missing at those locations. To better understand the structural nature of partial domains in proteins, we examined 30,961 partial domain regions from 136 domain families contained in a representative subset of PfamA domains (RefProtDom2 or RPD2).

Results

We characterized three types of apparent partial domains: split domains, bounded partials, and unbounded partials. We find that bounded partial domains are over-represented in eukaryotes and in lower quality protein predictions, suggesting that they often result from inaccurate genome assemblies or gene models. We also find that a large percentage of unbounded partial domains produce long alignments, which suggests that their annotation as a partial is an alignment artifact; yet some can be found as partials in other sequence contexts.

Conclusions

Partial domains are largely the result of alignment and annotation artifacts and should be viewed with caution. The presence of partial domain annotations in proteins should raise the concern that the prediction of the protein’s gene may be incomplete. In general, protein domains can be considered the structural building blocks of proteins.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0656-7) contains supplementary material, which is available to authorized users.

4.
Inferring evolutionary processes from phylogenies (total citations: 23; self-citations: 0; citations by others: 23)
Evolutionary processes shape the regular trends of evolution and are responsible for the diversity and distribution of contemporary species. They include correlated evolutionary change and trajectories of trait evolution, convergent and parallel evolution, differential rates of evolution, speciation and extinction, the order and direction of change in characters, and the nature of the evolutionary process itself: does change accumulate gradually, episodically, or in punctuational bursts? Phylogenies, in combination with information on species, contain the imprint of these historical evolutionary processes. By applying comparative methods based upon statistical models of evolution to well resolved phylogenies, it is possible to infer the historical evolutionary processes that must have existed in the past, given the patterns of diversity seen in the present. I describe a set of maximum likelihood statistical methods for inferring such processes. The methods estimate parameters of statistical models for inferring correlated evolutionary change in continuously varying characters, for detecting correlated evolution in discrete characters, for estimating rates of evolution, and for investigating the nature of the evolutionary process itself. They also anticipate the wealth of information becoming available to biological scientists from genetic studies that pin down relationships among organisms with unprecedented accuracy.

5.
Inferring speciation rates from phylogenies (total citations: 6; self-citations: 0; citations by others: 6)
It is possible to estimate the rate of diversification of clades from phylogenies with a temporal dimension. First, I present several methods for constructing confidence intervals for the speciation rate under the simple assumption of a pure birth process. I discuss the relationships among these methods in the hope of clarifying some fundamental theory in this area. Their performances are compared in a simulation study, and one is recommended for use as a result. A variety of other questions that may, in fact, be the questions of primary interest (e.g., Has the rate of cladogenesis been declining?) are then recast as biological variants of the purely statistical question: Is the birth process model appropriate for my data? Seen in this way, a preexisting arsenal of statistical techniques is opened up for use in this area: in particular, techniques developed for the analysis of Poisson processes and the analysis of survival data. These two approaches start from different representations of the data (the branch lengths in the tree), and I explicitly relate the two. Aiming for a synoptic account of useful theory in this area, I briefly discuss some important results from the analysis of two distinct birth-death processes: the one introduced into this area by Hey (1992) is refitted with some powerful statistical tools.
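For orientation, a point estimate of the speciation rate under a pure birth process can be sketched as the number of speciation events divided by the summed branch length. The abstract does not say which estimator or interval method is recommended, so this is only the textbook maximum likelihood form. The input format is an assumption: a mapping from lineage count k to the time during which k lineages existed, with the last interval running to the present.

```python
def yule_rate_mle(intervals: dict[int, float]) -> float:
    """Pure-birth (Yule) rate estimate: speciation events / total branch length.

    `intervals` maps a lineage count k to the duration for which exactly k
    lineages existed (hypothetical input format chosen for this sketch).
    """
    ks = sorted(intervals)
    if len(ks) < 2 or ks != list(range(ks[0], ks[0] + len(ks))):
        raise ValueError("lineage counts must be consecutive, e.g. 2, 3, ..., n")
    # Each step from k to k + 1 lineages is one speciation event.
    speciation_events = len(ks) - 1
    # k lineages persisting for time t contribute k * t of branch length.
    total_branch_length = sum(k * intervals[k] for k in ks)
    return speciation_events / total_branch_length
```

For example, a four-tip tree with intervals `{2: 1.0, 3: 0.5, 4: 0.25}` has two events over a total branch length of 4.5, giving a rate of about 0.44 per unit time.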

6.
7.
The present study illustrates a method for analysing the biogeography of a group that is based on the group's phylogeny but does not invoke founder dispersal or centre of origin. The case studies presented include groups from many different parts of the world, but most are from the south-west Pacific. The idea that basal groups are ancestral is not valid as a generalization. Neither the basal group, nor the oldest fossil represents the centre of origin, the time of origin or the ancestral ecology. Basal groups comprise less diverse sister groups and their distributions occur around centres of differentiation in already widespread ancestors, and not centres of origin for the whole group. Thus, the sequence of nodes in a phylogeny may indicate the spatial sequence of differentiation in a widespread ancestor rather than a series of founder dispersal events. Allocation of clades to a priori geographic areas, such as the continents, in the initial stages of biogeographic analysis has often involved incorrect assumptions of sympatry. This has led to the idea that the ‘areas of sympatry’ were centres of origin. Areas other than those defined by the taxa themselves need not be used in analysis. The fossil-calibrated molecular clock, with dates transmogrified from minimum to maximum dates, has been used to test for vicariance. Recent work in population genetics, however, indicates that allopatry is caused by vicariance rather than founder dispersal, and so vicariance can instead be used to test the clock. Deriving evolutionary chronology by calibrating spatial vicariance in molecular clades with associated tectonic events is more reasonable than relying on the fossil record to give maximum (absolute) dates. © 2009 The Linnean Society of London, Biological Journal of the Linnean Society, 2009, 98, 757–774.

8.
Rubin BE, Ree RH, Moreau CS. PLoS ONE 2012, 7(4): e33394
Reduced-representation genome sequencing represents a new source of data for systematics, and its potential utility in interspecific phylogeny reconstruction has not yet been explored. One approach that seems especially promising is the use of inexpensive short-read technologies (e.g., Illumina, SOLiD) to sequence restriction-site associated DNA (RAD), the regions of the genome that flank the recognition sites of restriction enzymes. In this study, we simulated the collection of RAD sequences from sequenced genomes of different taxa (Drosophila, mammals, and yeasts) and developed a proof-of-concept workflow to test whether informative data could be extracted and used to accurately reconstruct "known" phylogenies of species within each group. The workflow consists of three basic steps: first, sequences are clustered by similarity to estimate orthology; second, clusters are filtered by taxonomic coverage; and third, they are aligned and concatenated for "total evidence" phylogenetic analysis. We evaluated the performance of clustering and filtering parameters by comparing the resulting topologies with well-supported reference trees and we were able to identify conditions under which the reference tree was inferred with high support. For Drosophila, whole genome alignments allowed us to directly evaluate which parameters most consistently recovered orthologous sequences. For the parameter ranges explored, we recovered the best results at the low ends of sequence similarity and taxonomic representation of loci; these generated the largest supermatrices with the highest proportion of missing data. Applications of the method to mammals and yeasts were less successful, which we suggest may be due partly to their much deeper evolutionary divergence times compared to Drosophila (crown ages of approximately 100 and 300 versus 60 Mya, respectively). RAD sequences thus appear to hold promise for reconstructing phylogenetic relationships in younger clades in which sufficient numbers of orthologous restriction sites are retained across species.

9.
Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well established probabilistic models. We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches. Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. Its support values follow a similar scale and its receiver-operating characteristics are nearly identical, indicating that it provides similar levels of sensitivity and specificity. Thus our assessment method makes it possible to conduct phylogenetic analyses on whole genomes with the same degree of confidence as for analyses on aligned sequences. Extensions to search-based inference methods such as maximum parsimony and maximum likelihood are possible, but remain to be thoroughly tested.
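The classic sequence-based bootstrap that this abstract uses as its reference point resamples alignment columns with replacement. The sketch below shows only that resampling step for aligned sequences; it does not implement the authors' whole-genome method, and the function name and dictionary-based alignment format are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one bootstrap replicate: columns drawn with replacement."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # The same sampled column indices are applied to every taxon, so each
    # replicate column is an intact column of the original alignment.
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}
```

Each replicate would then be passed to the tree-building method, and the support for a branch is the fraction of replicate trees that contain it. With rearrangement data the whole genome is a single character, so this column resampling has nothing to draw from, which is the gap the abstract addresses.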

10.
This communication examines the question of phylogenetic congruency, i.e., whether or not the branching order of evolutionary trees is independent of the protein studied. It was found that trees constructed for birds on the basis of immunological comparison of their transferrins, albumins, and ovalbumins agree approximately with a published tree based on the amino acid sequences of their lysozymes c. This congruency is especially noteworthy with respect to the phylogenetic position of the chachalaca, a Mexican bird classified on morphological grounds in the family Cracidae of the order Galliformes. At the protein level, this species differs as much from non-cracid galliform birds as does the duck, which belongs to another order. Despite the organismal similarity between cracid and non-cracid galliform birds, the molecular relationship is remote. If this contrast between organismal and molecular results had been based on comparative studies with only lysozyme, one could have ascribed the contrast to the possibility that chachalaca lysozyme was paralogous, rather than orthologous, to the other bird lysozymes c. Examination of several proteins is thus desirable in cases of possible paralogy. This work was supported in part by grants GB-42028X from NSF and GM-21509 from NIH.

11.
The theory of ‘punctuated equilibrium’ hypothesises that most morphological change in species takes place in rapid bursts triggered by speciation. Eldredge and Gould postulated the theory in 1972, as an alternative to the idea that morphological change slowly accumulates in the course of time, a then common belief they dubbed ‘phyletic gradualism’. Ever since its introduction the theory of punctuated equilibrium has been the subject of speculation rather than empirical validation. Here I present a method to detect punctuated evolution without reference to fossil data, based on the phenotypes of extant species and on their relatedness as revealed by molecular phylogeny. The method involves a general mathematical model describing morphological differentiation of two species over time. The two parameters in the model, the rates of punctual (cladogenetic) and gradual (anagenetic) change, are estimated from plots of morphological diversification against time since divergence of extant species.

12.
Haplotype reconstruction from SNP alignment (total citations: 4; self-citations: 0; citations by others: 4)
In this paper, we describe a method for statistical reconstruction of haplotypes from a set of aligned SNP fragments. We consider the case of a pair of homologous human chromosomes, one from the mother and the other from the father. After fragment assembly, we wish to reconstruct the two haplotypes of the parents. Given a set of potential SNP sites inferred from the assembly alignment, we wish to divide the fragment set into two subsets, each of which represents one chromosome. Our method is based on a statistical model of sequencing errors, compositional information, and haplotype memberships. We calculate probabilities of different haplotypes conditional on the alignment. Due to computational complexity, we first determine phases for neighboring SNPs. Then we connect them and construct haplotype segments. Also, we compute the accuracy or confidence of the reconstructed haplotypes. We discuss other issues, such as alternative methods, parameter estimation, computational efficiency, and relaxation of assumptions.

13.
14.
The Cavender-Felsenstein edge-length invariants for binary characters on 4-trees provide the starting point for the development of "customized" invariants for evaluating and comparing phylogenetic hypotheses. The binary character invariants may be generalized to k-valued characters without losing the quadratic nature of the invariants as functions of the theoretical frequencies f(UVXY) of observable character configurations (U at organism 1, V at 2, etc.). The key to the approach is that certain sets of these configurations constitute events which are probabilistically independent from other such sets, under the symmetric Markov change models studied. By introducing more complex sets of configurations, we find the quadratic invariants for 5-trees in the binary model and for individual edges in 6-trees or, indeed, in any size tree. The same technique allows us to formulate invariants for entire trees, but these are cubic functions for 6-trees and are higher-degree polynomials for larger trees. With k-valued characters and, especially, with large trees, the types of configuration sets (events) used in the simpler examples are too rare (i.e., their predicted frequencies are too low) to be useful, and the construction of meaningful pairs of independent events becomes an important and nontrivial task in designing invariants suited to testing specific hypotheses. In a very natural way, this approach fits in with well-known statistical methodology for contingency tables. We explore use of events such as "only transitions occur for character i (i.e., position i in a nucleic acid sequence) in subtree a" in analyzing a set of data on ribosomal RNA in the context of the controversy over the origins of archaebacteria, eubacteria, and eukaryotes.

15.
We give a 5-approximation algorithm to the rooted Subtree-Prune-and-Regraft (rSPR) distance between two phylogenies, which was recently shown to be NP-complete. This paper presents the first approximation result for this important tree distance. The algorithm follows a standard format for tree distances. The novel ideas are in the analysis. In the analysis, the cost of the algorithm uses a "cascading" scheme that accounts for possible wrong moves. This accounting is missing from previous analysis of tree distance approximation algorithms. Further, we show how all algorithms of this type can be implemented in linear time and give experimental results.

16.

Background  

Gene trees that arise in the context of reconstructing the evolutionary history of polyploid species are often multiply-labeled, that is, the same leaf label can occur several times in a single tree. This property considerably complicates the task of forming a consensus of a collection of such trees compared to usual phylogenetic trees.

17.
Biodiversity arises from the balance between speciation and extinction. Fossils record the origins and disappearance of organisms, and the branching patterns of molecular phylogenies allow estimation of speciation and extinction rates, but the patterns of diversification are frequently incongruent between these two data sources. I tested two hypotheses about the diversification of primates based on ~600 fossil species and 90% complete phylogenies of living species: (1) diversification rates increased through time; (2) a significant extinction event occurred in the Oligocene. Consistent with the first hypothesis, analyses of phylogenies supported increasing speciation rates and negligible extinction rates. In contrast, fossils showed that while speciation rates increased, speciation and extinction rates tended to be nearly equal, resulting in zero net diversification. Partially supporting the second hypothesis, the fossil data recorded a clear pattern of diversity decline in the Oligocene, although diversification rates were near zero. The phylogeny supported increased extinction ~34 Ma, but also elevated extinction ~10 Ma, coinciding with diversity declines in some fossil clades. The results demonstrated that estimates of speciation and extinction ignoring fossils are insufficient to infer diversification and information on extinct lineages should be incorporated into phylogenetic analyses.

18.
Molecular biologists strive to infer evolutionary relationships from quantitative macromolecular comparisons obtained by immunological, DNA hybridization, electrophoretic or amino acid sequencing techniques. The problem is to find unrooted phylogenies that best approximate a given dissimilarity matrix according to a goodness-of-fit measure, for example the least-squares-fit criterion or Farris's f statistic. Computational costs of known algorithms guaranteeing optimal solutions to these problems increase exponentially with problem size; practical computational considerations limit the algorithms to analyzing small problems. It is established here that problems of phylogenetic inference based on the least-squares-fit criterion and the f statistic are NP-complete and thus are so difficult computationally that efficient optimal algorithms are unlikely to exist for them. The Natural Sciences and Engineering Research Council of Canada partially supported this research through an individual operating grant (A4142) to W.H.E. Day.
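To make the least-squares-fit criterion concrete, the sketch below scores one given tree against a dissimilarity matrix by summing squared differences between observed dissimilarities and path lengths on the tree. It assumes the unweighted form of the criterion and a tree supplied as a list of weighted edges; both choices, and the function names, are illustrative. Scoring a single tree is easy; the hard (NP-complete) part described above is searching for the tree that minimises this score.

```python
from itertools import combinations


def path_lengths(edges, source):
    """Distances from `source` to every node of a weighted tree given as (u, v, w) edges."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, []).append((v, w))
        adjacency.setdefault(v, []).append((u, w))
    dist = {source: 0.0}
    stack = [source]
    while stack:  # depth-first walk; a tree has exactly one path to each node
        node = stack.pop()
        for neighbour, w in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + w
                stack.append(neighbour)
    return dist


def least_squares_fit(dissimilarity, edges, leaves):
    """Sum over leaf pairs of (observed dissimilarity - tree path length) squared."""
    total = 0.0
    for a, b in combinations(leaves, 2):
        tree_distance = path_lengths(edges, a)[b]
        total += (dissimilarity[frozenset((a, b))] - tree_distance) ** 2
    return total
```

A score of 0 means the dissimilarities are exactly tree-additive on the given topology and edge lengths.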

19.
Median-joining networks for inferring intraspecific phylogenies (total citations: 72; self-citations: 0; citations by others: 72)
Reconstructing phylogenies from intraspecific data (such as human mitochondrial DNA variation) is often a challenging task because of large sample sizes and small genetic distances between individuals. The resulting multitude of plausible trees is best expressed by a network which displays alternative potential evolutionary paths in the form of cycles. We present a method ("median joining" [MJ]) for constructing networks from recombination-free population data that combines features of Kruskal's algorithm for finding minimum spanning trees by favoring short connections, and Farris's maximum-parsimony (MP) heuristic algorithm, which sequentially adds new vertices called "median vectors", except that our MJ method does not resolve ties. The MJ method is hence closely related to the earlier approach of Foulds, Hendy, and Penny for estimating MP trees but can be adjusted to the level of homoplasy by setting a parameter epsilon. Unlike our earlier reduced median (RM) network method, MJ is applicable to multistate characters (e.g., amino acid sequences). An additional feature is the speed of the implemented algorithm: a sample of 800 worldwide mtDNA hypervariable segment I sequences requires less than 3 h on a Pentium 120 PC. The MJ method is demonstrated on a Tibetan mitochondrial DNA RFLP data set.
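One ingredient named above, Kruskal's algorithm favoring short connections, can be sketched on Hamming distances between haplotypes. This is not the MJ method: it resolves ties arbitrarily (MJ does not), returns a single tree rather than a network, and adds no median vectors. The haplotype names and functions are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def kruskal_mst(sequences: dict[str, str]) -> list[tuple[str, str, int]]:
    """Minimum spanning tree over haplotypes, shortest connections first."""
    parent = {name: name for name in sequences}

    def find(name: str) -> str:
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    candidate_edges = sorted(
        (hamming(sequences[a], sequences[b]), a, b)
        for a, b in combinations(sorted(sequences), 2)
    )
    tree = []
    for distance, a, b in candidate_edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # joining two components cannot create a cycle
            parent[root_a] = root_b
            tree.append((a, b, distance))
    return tree
```

Keeping every equally short connection instead of breaking ties would produce a minimum spanning network with cycles, which is closer in spirit to the network the abstract describes.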

20.
GENIE implements a statistical framework for inferring the demographic history of a population from phylogenies that have been reconstructed from sampled DNA sequences. The methods are based on population genetic models known collectively as coalescent theory. AVAILABILITY: GENIE is available from http://evolve.zoo.ox.ac.uk. All popular operating systems are supported.
