Similar literature
 Found 20 similar documents (search time: 46 ms)
1.
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program, which is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.

2.
If lateral gene transfer (LGT) has affected all genes over the course of prokaryotic evolution, reconstruction of organismal phylogeny is compromised. However, if a core of genes is immune to transfer, then the evolutionary history of that core might be our most reliable guide to the evolution of organisms. Such a core should be preferentially included in the subset of genes shared by all organisms, but where universally conserved genes have been analyzed, there is too little phylogenetic signal to allow determination of whether or not they indeed have the same history (Hansmann and Martin 2000; Teichmann and Mitchison 1999). Here we look at a more restricted set, 521 homologous genes (COGs) simultaneously present in four sequenced euryarchaeal genomes. Although there is overall little robust phylogenetic signal in this data set, there is, among well-supported trees, strong representation of all three possible four-taxon topologies. "Informational" genes seem no less subject to LGT than are "operational" genes within the euryarchaeotes. We conclude that (i) even in this collection of conserved genes there has been extensive LGT (orthologous gene replacement) and (ii) the notion that there is a core of nontransferable genes (the "core hypothesis") has not been proven and may be unprovable. Received: 7 November 2000 / Accepted: 20 February 2001

3.
MOTIVATION: Phylogenomic approaches towards functional and evolutionary annotation of unknown sequences have been suggested to be superior to those based only on pairwise local alignments. User-friendly software tools making the advantages of phylogenetic annotation available for the ever widening range of bioinformatically uninitiated biologists involved in genome/EST annotation projects are, however, not available. We were particularly confronted with this issue in the annotation of sequences from different groups of complex algae originating from secondary endosymbioses, where the identification of the phylogenetic origin of genes is often more problematic than in taxa well represented in the databases (e.g. animals, plants or fungi). RESULTS: We present a flexible pipeline with a user-friendly, interactive graphical user interface running on desktop computers that automatically performs a basic local alignment search tool (BLAST) search of query sequences, selects a representative subset of the hits, then creates a multiple alignment from the selected sequences, and finally computes a phylogenetic tree. The pipeline, named PhyloGena, uses public domain software for all standard bioinformatics tasks (similarity search, multiple alignment, and phylogenetic reconstruction). As the major technological innovation, selection of a meaningful subset of BLAST hits was implemented using logic programming, mimicking the selection procedure; the results (BLAST tables, multiple alignments and phylogenetic trees) are displayed graphically, allowing the user to interact with the pipeline and deduce the function and phylogenetic origin of the query. PhyloGena thus makes phylogenomic annotation available also for those biologists without access to large computing facilities and with little informatics background. Although phylogenetic annotation is particularly useful when working with composite genomes (e.g. from complex algae), PhyloGena can be helpful in expressed sequence tag and genome annotation also in other organisms. AVAILABILITY: PhyloGena (executables for LINUX and Windows 2000/XP as well as source code) is available by anonymous ftp from http://www.awi.de/en/phylogena.

4.

Background  

The rapidly increasing number of completely sequenced genomes led to the establishment of the COG-database which, based on sequence homologies, assigns similar proteins from different organisms to clusters of orthologous groups (COGs). There are several bioinformatic studies that made use of this database to determine (hyper)thermophile-specific proteins by searching for COGs containing (almost) exclusively proteins from (hyper)thermophilic genomes. However, public software to perform individually definable group-specific searches is not available.

5.
The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, in which proteins encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and shared with bacteria and/or archaea were included. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis.

6.
A 17-dimensional vector named the proteome vector is defined to represent an organism. The components of the vector reflect the relative contents of protein-encoding genes in the 17 Clusters of Orthologous Groups of proteins (COG) classes in the whole genome of the relevant organism. Based on the definition of this proteome vector, the fuzzy clustering of 36 completely sequenced organisms (8 archaea, 24 bacteria, and 4 eukarya) was performed and a proteome tree was constructed. Our results show that (1) the 36 organisms can be 100% correctly classified into three clusters corresponding to the three primary kingdoms, (2) our proteome tree is remarkably similar to that derived from 16S rRNA, and (3) the chromosomes and/or plasmids belonging to the same organism have very similar gene composition. Based on these results, we argue that the 17-dimensional proteome vector could be a good criterion for clustering approaches and to a large extent reveals the phylogenetic properties of organisms; the Three Primary Kingdoms Hypothesis is trustworthy although the existence of lateral gene transfer (LGT) brings controversy to the construction of the "universal tree of life."
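As an illustration of the proteome vector described in this abstract, the following is a minimal Python sketch. The abstract does not list the 17 classes, so the letter codes in `COG_CLASSES` are an assumption, and the function names are hypothetical. The Euclidean distance is only a stand-in for comparing two organisms; it is not the fuzzy clustering the authors report.

```python
from collections import Counter
import math

# Assumed one-letter codes for 17 COG functional classes (not given in the abstract).
COG_CLASSES = "JKLDOMNPTCGEFHIQR"


def proteome_vector(gene_classes, classes=COG_CLASSES):
    """Return the relative content of genes in each COG class.

    gene_classes: iterable with one class letter per protein-encoding gene.
    Genes whose letter is not in `classes` are ignored.
    """
    counts = Counter(c for c in gene_classes if c in classes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genes assigned to any of the given classes")
    return [counts[c] / total for c in classes]


def euclidean_distance(u, v):
    """Plain Euclidean distance between two proteome vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy genomes with four annotated genes each (illustrative data only).
vector_a = proteome_vector(["J", "J", "K", "C"])
vector_b = proteome_vector(["J", "K", "K", "C"])
distance = euclidean_distance(vector_a, vector_b)
```

Because each component is a fraction of the classified genes, the vector sums to 1 regardless of genome size, which is what makes organisms with very different gene counts comparable.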

7.
8.
A complete understanding of the biology of an organism necessarily starts with knowledge of its genetic makeup. Proteins encoded in a genome must be identified and characterized, and the presence or absence of specific sets of proteins must be noted in order to determine the possible biochemical pathways or functional systems utilized by that organism. The COG database presents a set of tools suited to these purposes, including the ability to select protein families (COGs) that contain proteins from a specified set of species. The selection is based upon a phylogenetic pattern, which is a shorthand representation of the presence or absence of a particular species in a COG. Here we present the use of phylogenetic patterns as a means to perform targeted searches for undetected protein-coding genes in complete genomes.
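The idea of selecting COGs by phylogenetic pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the COG database's own tooling: the `+`/`-` encoding, the species codes `sp1` to `sp4`, the COG identifiers and both function names are hypothetical.

```python
def phylogenetic_pattern(members, species_order):
    """Encode presence (+) or absence (-) of each species in one COG."""
    return "".join("+" if sp in members else "-" for sp in species_order)


def select_cogs(cogs, present=(), absent=()):
    """Return ids of COGs that contain every species in `present`
    and none of the species in `absent`."""
    present, absent = set(present), set(absent)
    return sorted(
        cog_id
        for cog_id, members in cogs.items()
        if present <= members and not (absent & members)
    )


# Toy data: which species have a protein in which COG (illustrative only).
species_order = ["sp1", "sp2", "sp3", "sp4"]
cogs = {
    "COG_A": {"sp1", "sp2", "sp3", "sp4"},
    "COG_B": {"sp1", "sp2", "sp3"},
    "COG_C": {"sp1"},
}

# COGs found in three related genomes but missing from sp4: these are the
# kind of candidates one might re-examine for undetected genes in sp4.
candidates = select_cogs(cogs, present=["sp1", "sp2", "sp3"], absent=["sp4"])
patterns = {cog_id: phylogenetic_pattern(m, species_order) for cog_id, m in cogs.items()}
```

A COG whose pattern shows a single gap among otherwise closely related genomes is a natural starting point for a targeted search in the genome with the gap.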

9.
We created GENECAP to facilitate analysis of multilocus genotype data for use in non-invasive DNA sampling and genetic capture-recapture studies. GENECAP is a Microsoft Excel macro that uses multilocus genetic data to match samples with identical genotypes, calculate frequency of alleles, identify sample genotypes that differ by one and two alleles, calculate probabilities of identity, and match probabilities for matching samples. GENECAP allows the user to include background data and samples with missing genotypes for multiple loci. Capture histories for each user-defined sampling period are output in formats consistent with commonly employed population estimation programs.
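The matching steps listed in this abstract can be illustrated with a short Python sketch. It is not the Excel macro itself: the data layout (each genotype as a tuple of two-allele loci), the function names and the sample data are assumptions, missing genotypes are not handled, and the probability-of-identity function uses the standard single-locus formula for unrelated individuals under Hardy-Weinberg proportions, which may differ from what the macro computes.

```python
from collections import Counter, defaultdict
from itertools import combinations


def normalize(genotype):
    """Sort the two alleles at each locus so (102, 100) equals (100, 102)."""
    return tuple(tuple(sorted(locus)) for locus in genotype)


def group_identical(samples):
    """Return lists of sample ids that share an identical multilocus genotype."""
    groups = defaultdict(list)
    for sample_id, genotype in samples.items():
        groups[normalize(genotype)].append(sample_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def allele_mismatches(g1, g2):
    """Count alleles that differ between two genotypes, locus by locus."""
    total = 0
    for locus1, locus2 in zip(g1, g2):
        shared = sum((Counter(locus1) & Counter(locus2)).values())
        total += 2 - shared
    return total


def near_matches(samples, max_mismatches=2):
    """Return (id1, id2, n) for pairs differing by 1..max_mismatches alleles."""
    pairs = []
    for (id1, g1), (id2, g2) in combinations(sorted(samples.items()), 2):
        n = allele_mismatches(g1, g2)
        if 1 <= n <= max_mismatches:
            pairs.append((id1, id2, n))
    return pairs


def probability_of_identity(allele_freqs):
    """Single-locus probability that two unrelated individuals match."""
    homozygotes = sum(p ** 4 for p in allele_freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(allele_freqs, 2))
    return homozygotes + heterozygotes


# Toy two-locus genotypes (allele sizes are illustrative only).
samples = {
    "s1": ((100, 102), (200, 200)),
    "s2": ((102, 100), (200, 200)),
    "s3": ((100, 102), (200, 204)),
    "s4": ((104, 106), (210, 212)),
}
identical = group_identical(samples)
close = near_matches(samples)
```

Pairs that differ by only one or two alleles are worth flagging because they may be the same individual with a genotyping error rather than two distinct animals.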

10.
11.
Phylogenomic studies produce increasingly large phylogenetic forests of trees with patchy taxonomical sampling. Typically, prokaryotic data generate thousands of gene trees of all sizes that are difficult, if not impossible, to root. Their topologies do not match the genealogy of lineages, as they are influenced not only by duplication, losses, and vertical descent but also by lateral gene transfer (LGT) and recombination. Because this complexity in part reflects the diversity of evolutionary processes, the study of phylogenetic forests is thus a great opportunity to improve our understanding of prokaryotic evolution. Here, we show how the rich evolutionary content of such novel phylogenetic objects can be exploited through the development of new approaches designed specifically for extracting the multiple evolutionary signals present in the forest of life, that is, by slicing up trees into remarkable bits and pieces: clans, slices, and clips. We harvested a forest of 6,901 unrooted gene trees comprising up to 100 prokaryotic genomes (41 archaea and 59 bacteria) to search for evolutionary events that a species tree would not account for. We identified 1) trees and partitions of trees that reflected the lifestyle of organisms rather than their taxonomy, 2) candidate lifestyle-specific genetic modules, used by distinct unrelated organisms to adapt to the same environment, 3) gene families, nonrandomly distributed in the functional space, that were frequently exchanged between archaea and bacteria, sometimes without major changes in their sequences. Finally, 4) we reconstructed polarized networks of genetic partnerships between archaea and bacteria to describe some of the rules affecting LGT between these two Domains.

12.

Background  

A significant number of proteins have been shown to be intrinsically disordered, meaning that they lack a fixed 3D structure or contain regions that do not possess a well-defined 3D structure. It has also been proven that a protein's disorder content is related to its function. We have performed an exhaustive analysis and comparison of the disorder content of proteins from prokaryotic organisms (i.e., superkingdoms Archaea and Bacteria) with respect to the functional categories they belong to, i.e., Clusters of Orthologous Groups of proteins (COGs) and groups of COGs: Cellular processes (Cp), Information storage and processing (Isp), Metabolism (Me) and Poorly characterized (Pc).

13.
MOTIVATION: Comparative sequence analysis is widely used to study genome function and evolution. This approach first requires the identification of homologous genes and then the interpretation of their homology relationships (orthology or paralogy). To provide help in this complex task, we developed three databases of homologous genes containing sequences, multiple alignments and phylogenetic trees: HOBACGEN, HOVERGEN and HOGENOM. In this paper, we present two new tools for automating the search for orthologs or paralogs in these databases. RESULTS: First, we have developed and implemented an algorithm to infer speciation and duplication events by comparison of gene and species trees (tree reconciliation). Second, we have developed a general method to search our databases for the gene families whose tree topology matches a particular tree pattern. This algorithm of unordered tree pattern matching has been implemented in the FamFetch graphical interface. With the help of a graphical editor, the user can specify the topology of the tree pattern, and set constraints on its nodes and leaves. Then, this pattern is compared with all the phylogenetic trees of the database, to retrieve the families in which one or several occurrences of this pattern are found. By specifying ad hoc patterns, it is therefore possible to identify orthologs in our databases.
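To make the tree reconciliation step more concrete, here is a minimal Python sketch of the textbook LCA-mapping approach, which may differ from the algorithm the authors implemented. Trees are assumed to be rooted, binary and written as nested tuples with string leaves; the function names, the gene-to-species dictionary and the toy trees are all hypothetical.

```python
def clades(tree):
    """Return (leaf set of `tree`, list of all clades in it) for a nested-tuple tree."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    left, right = tree
    left_set, left_clades = clades(left)
    right_set, right_clades = clades(right)
    both = left_set | right_set
    return both, left_clades + right_clades + [both]


def lca(species_clades, species_set):
    """Smallest species-tree clade containing `species_set` (its LCA)."""
    return min((c for c in species_clades if species_set <= c), key=len)


def reconcile(gene_tree, species_tree, gene_to_species):
    """Label each internal gene-tree node as 'speciation' or 'duplication'.

    A node is a duplication when it maps to the same species-tree clade
    as at least one of its children. Returns (sorted gene names, label)
    tuples in post-order.
    """
    _, species_clades = clades(species_tree)
    events = []

    def visit(node):
        if isinstance(node, str):
            mapped = lca(species_clades, frozenset([gene_to_species[node]]))
            return mapped, [node]
        left, right = node
        left_map, left_genes = visit(left)
        right_map, right_genes = visit(right)
        mapped = lca(species_clades, left_map | right_map)
        kind = "duplication" if mapped in (left_map, right_map) else "speciation"
        genes = sorted(left_genes + right_genes)
        events.append((genes, kind))
        return mapped, genes

    visit(gene_tree)
    return events


# Toy example: a duplication before the human/mouse split (illustrative only).
species_tree = (("human", "mouse"), "chicken")
gene_tree = ((("h1", "m1"), ("h2", "m2")), "c1")
gene_to_species = {"h1": "human", "h2": "human", "m1": "mouse",
                   "m2": "mouse", "c1": "chicken"}
events = reconcile(gene_tree, species_tree, gene_to_species)
```

Genes whose last common ancestor is a speciation node are orthologs, while those separated by a duplication node are paralogs. In the toy data, `h1` and `m1` are orthologs and `h1` and `m2` are paralogs.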

14.
15.
In view of their propositional content (i.e. they can be right or wrong), character statements (i.e. statements that predicate characters of organisms) are treated as low-level hypotheses. The thesis of the present study is that such character statements, as do more complex scientific theories, come with variable scope. The scope of a hypothesis, or theory, is the domain of discourse over which the hypothesis, or theory, ranges. A character statement is initially introduced within the context of a certain domain of discourse that is defined by the scale of the initial phylogenetic analysis. The doctrine of 'total evidence' requires the inclusion of previously introduced characters in subsequent studies. As a consequence, the initial scope of character statements is widened to the extent that the scale of subsequent analyses is broadened. Scope expansion for character statements may result in incomplete characters, in the subdivision of characters, or in ambiguity of reference (indeterminacy of the extension of anatomical terms). Character statements with a wide scope are desirable because they refer to characters with the potential to resolve deep nodes in phylogenetic analyses. Care must be taken to preserve referential unambiguity of anatomical terms if the originally restricted scope of a character statement is expanded to match a broad-scale phylogenetic analysis. © 2007 The Linnean Society of London, Biological Journal of the Linnean Society, 2007, 92, 297–308.

16.
Within the methodology of phylogenetic systematics four hierarchic levels are distinguished: the “Central Claim” (to reconstruct phylogeny), methodological postulate (to conclude analysis with a purely dichotomous cladogram if ever possible), method (search for sister-group relationships by character analysis), and “Taxonomic Principle” (establishment of a classification reflecting merely the recognized genealogy). Certain limits of applicability and reliability of traditional phylogenetic systematics are specified: genealogy can only be analysed among taxa with perceptible evolutionary novelties; reticulated genealogy is not yet regarded; events other than cladogenetic ones cannot be recognised. Phylogenetic systematics is an independent method which has not been absorbed by any type of “pattern” or “transformed” cladism. Phylogenetic systematics relies on the theory of evolution, which does not lead into circularity, since phylogenetic systematics does not claim to prove or to explain evolution whatsoever.

17.
Fourier outline shape analysis is a powerful tool for the morphometric study of two-dimensional form in organisms lacking many biologically homologous landmarks. Several improvements to the method are described herein; these modifications are incorporated into the new computer programs Hangle, Hmatch and Hcurve. First, automated tracing of outlines using image capture software, although desirable, results in high frequency pixel 'noise' that can corrupt the Fourier analysis. Program Hangle eliminates this noise using optional and variable levels of outline smoothing. Secondly, a widely used Fourier technique, elliptic Fourier analysis (EFA, Kuhl and Giardina 1982), yields coefficients that are not computationally independent of each other, a condition that hampers and compromises statistical analysis. In addition, EFA increasingly downweights successively more detailed features of the outline. Program Hangle solves both of these problems. Lastly, Fourier methods in general are sensitive to the placement of the starting position of the digitized trace. This problem is acute when the organisms under study have no unambiguously defined, homologous point on the outline from which to start the trace. Program Hangle allows the user to normalize for starting position using various properties of individual outlines. Alternatively, Hmatch takes a new approach and can be used to normalize using properties of the entire population under study. Key words: Fourier shape analysis, morphometric studies, new computer programs, foraminiferal outlines.

18.
19.
We present a new, broadly applicable measure of the spatial restriction of phylogenetic diversity, termed phylogenetic endemism (PE). PE combines the widely used phylogenetic diversity and weighted endemism measures to identify areas where substantial components of phylogenetic diversity are restricted. Such areas are likely to be of considerable importance for conservation. PE has a number of desirable properties not combined in previous approaches. It assesses endemism consistently, independent of taxonomic status or level, and independent of previously defined political or biological regions. The results can be directly compared between areas because they are based on equivalent spatial units. PE builds on previous phylogenetic analyses of endemism, but provides a more general solution for mapping endemism of lineages. We illustrate the broad applicability of PE using examples of Australian organisms having contrasting life histories: pea-flowered shrubs of the genus Daviesia (Fabaceae) and the Australian species of the Australo-Papuan tree frog radiation within the family Hylidae.
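The abstract does not state a formula for PE. A commonly cited form sums, over every branch present in an area, the branch length divided by the number of spatial units in which any descendant of that branch occurs. The Python sketch below assumes that form; the input layout, the function name and the toy tree are hypothetical.

```python
def phylogenetic_endemism(branches, occurrences):
    """Compute PE per grid cell.

    branches: iterable of (branch_length, set of tip taxa below the branch).
    occurrences: dict mapping each taxon to the set of cells it occupies.
    Each branch contributes length / range size to every cell in its range.
    """
    pe = {}
    for length, tips in branches:
        clade_range = set().union(*(occurrences.get(tip, set()) for tip in tips))
        if not clade_range:
            continue  # branch has no recorded occurrences
        share = length / len(clade_range)
        for cell in clade_range:
            pe[cell] = pe.get(cell, 0.0) + share
    return pe


# Toy tree ((A:1, B:1):2, C:3) on four grid cells (illustrative data only).
branches = [
    (1.0, {"A"}),
    (1.0, {"B"}),
    (2.0, {"A", "B"}),
    (3.0, {"C"}),
]
occurrences = {
    "A": {"cell1"},
    "B": {"cell1", "cell2"},
    "C": {"cell2", "cell3", "cell4"},
}
pe_by_cell = phylogenetic_endemism(branches, occurrences)
```

Under this form each branch's length is shared equally among the cells where it occurs, so PE summed over all cells equals the total branch length of the tree. That property is a convenient sanity check on any implementation.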

20.
Three null models have been proposed to predict the relative frequencies of topologies of phylogenetic trees. One null model assumes each distinguishable n-member tree is equally likely (proportional-to-distinguishable-arrangements model). A second model assumes that each topological type is equally likely (equiprobable model). A third model assumes that the probability of each topological type is determined by random speciation (Markov model). We sampled published phylogenetic trees from three major groups of organisms: division Angiospermae, class Insecta, and superclass Tetrapoda. Our sampling was more restricted than previous studies and was designed to test whether observed topological frequencies were distinguishable from those predicted by the three null models. The pattern of evolution reflected in five-member phylogenetic trees is different from predictions of the equiprobable and Markov model but is indistinguishable from the proportional-to-distinguishable-arrangements model. This indicates that 1) speciation (and/or extinction) is not equally likely among all taxa, even for small phylogenies; or 2) systematists' attempts at reconstructing small phylogenies are, on average, indistinguishable from those expected if they had merely selected a tree at random from the pool of all possible trees. The topology frequencies were not different among the three groups of organisms, suggesting that factors shaping patterns of speciation and extinction are consistent among major taxonomic groups.
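The three null models can be compared numerically for small trees. The Python sketch below assumes rooted, binary trees and enumerates their unlabelled topological types; the function names and the nested-tuple encoding are hypothetical, and the study's own counting conventions may differ (for example, if unrooted trees were used).

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

LEAF = "."  # marker for a single tip


@lru_cache(maxsize=None)
def shapes(n):
    """All unlabelled rooted binary tree shapes with n tips, as nested tuples."""
    if n == 1:
        return (LEAF,)
    out = []
    for i in range(1, n // 2 + 1):
        small, large = shapes(i), shapes(n - i)
        for a_index, a in enumerate(small):
            for b_index, b in enumerate(large):
                if i == n - i and a_index > b_index:
                    continue  # skip mirror images of equal-sized subtrees
                out.append((a, b))
    return tuple(out)


def size(shape):
    """Number of tips in a shape."""
    return 1 if shape == LEAF else size(shape[0]) + size(shape[1])


def labelled_count(shape):
    """Number of distinguishable labelled trees having this shape."""
    if shape == LEAF:
        return 1
    a, b = shape
    count = comb(size(shape), size(a)) * labelled_count(a) * labelled_count(b)
    return count // 2 if a == b else count


def markov_prob(shape):
    """Probability of the shape under random speciation (Markov model)."""
    if shape == LEAF:
        return Fraction(1)
    a, b = shape
    orderings = 1 if a == b else 2
    return Fraction(orderings, size(shape) - 1) * markov_prob(a) * markov_prob(b)


def topology_table(n):
    """Per-shape probabilities under the three null models."""
    all_shapes = shapes(n)
    total_labelled = sum(labelled_count(s) for s in all_shapes)
    return [
        {
            "shape": s,
            "pda": Fraction(labelled_count(s), total_labelled),
            "equiprobable": Fraction(1, len(all_shapes)),
            "markov": markov_prob(s),
        }
        for s in all_shapes
    ]


table = topology_table(5)
```

For five-member trees this gives three topological types, and the models disagree most about the fully asymmetric one: it is the most probable type under the proportional-to-distinguishable-arrangements model, but not under the Markov model.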


Copyright © Beijing Qinyun Technology Development Co., Ltd.  ICP filing no. 京ICP备09084417号