Similar articles
1.
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in 21 complete genomes of bacteria, archaea and eukaryotes (http://www.ncbi.nlm.nih.gov/COG). The COGs were constructed by applying the criterion of consistency of genome-specific best hits to the results of an exhaustive comparison of all protein sequences from these genomes. The database comprises 2091 COGs that include 56-83% of the gene products from each of the complete bacterial and archaeal genomes and approximately 35% of those from the yeast Saccharomyces cerevisiae genome. The COG database is accompanied by the COGNITOR program, which is used to fit new proteins into the COGs and can be applied to functional and phylogenetic annotation of newly sequenced genomes.
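The abstract above does not spell out how "consistency of genome-specific best hits" is checked. One plausible reading, sketched here under the assumption that consistency means three proteins from three different genomes are all each other's best hits, is to search the best-hit table for such triangles. The function name, the data layout and the toy genome and protein labels are all hypothetical.

```python
from itertools import combinations

def best_hit_triangles(best_hit):
    """Find triples of proteins from three genomes with mutually consistent best hits.

    best_hit maps (protein, target_genome) -> that protein's best hit in target_genome.
    Each protein is a (genome, name) tuple. This layout is an assumption for illustration.
    """
    proteins = sorted({protein for protein, _ in best_hit})
    triangles = []
    for a, b, c in combinations(proteins, 3):
        if len({a[0], b[0], c[0]}) < 3:
            continue  # the three proteins must come from three different genomes
        ordered_pairs = [(a, b), (b, a), (a, c), (c, a), (b, c), (c, b)]
        # Every protein must name the other two as its best hit in their genomes.
        if all(best_hit.get((x, y[0])) == y for x, y in ordered_pairs):
            triangles.append((a, b, c))
    return triangles

# Hypothetical toy best-hit table for three genomes.
hits = {
    (("Eco", "p1"), "Bsu"): ("Bsu", "q1"),
    (("Eco", "p1"), "Mja"): ("Mja", "r1"),
    (("Bsu", "q1"), "Eco"): ("Eco", "p1"),
    (("Bsu", "q1"), "Mja"): ("Mja", "r1"),
    (("Mja", "r1"), "Eco"): ("Eco", "p1"),
    (("Mja", "r1"), "Bsu"): ("Bsu", "q1"),
    (("Eco", "p2"), "Bsu"): ("Bsu", "q1"),  # one-way hit, not reciprocated
}
print(best_hit_triangles(hits))
```

In this toy table only p1, q1 and r1 form a triangle; p2 is left out because q1 does not return it as a best hit. A full pipeline would presumably also merge triangles that share a side, which this sketch does not attempt.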

2.
The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45,350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, which includes proteins that are encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and are shared with bacteria and/or archaea. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis.

3.
MOTIVATION: Most molecular phylogenies are based on sequence alignments. Consequently, they fail to account for modes of sequence evolution that involve frequent insertions or deletions. Here we present a method for generating accurate gene and species phylogenies from whole genome sequence that makes use of short character string matches not placed within explicit alignments. In this work, the singular value decomposition of a sparse tetrapeptide frequency matrix is used to represent the proteins of organisms uniquely and precisely as vectors in a high-dimensional space. Vectors of this kind can be used to calculate pairwise distance values based on the angle separating the vectors, and the resulting distance values can be used to generate phylogenetic trees. Protein trees so derived can be examined directly for homologous sequences. Alternatively, vectors defining each of the proteins within an organism can be summed to provide a vector representation of the organism, which is then used to generate species trees. RESULTS: Using a large mitochondrial genome dataset, we have produced species trees that are largely in agreement with previously published trees based on the analysis of identical datasets using different methods. These trees also agree well with currently accepted phylogenetic theory. In principle, our method could be used to compare much larger bacterial or nuclear genomes in full molecular detail, ultimately allowing accurate gene and species relationships to be derived from a comprehensive comparison of complete genomes. In contrast to phylogenetic methods based on alignments, sequences that evolve by relative insertion or deletion would tend to remain recognizably similar.
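The alignment-free idea in this abstract can be illustrated with a simplified sketch. It counts overlapping tetrapeptides and measures the angle between the raw count vectors; the singular value decomposition step described in the abstract is deliberately omitted, so this is not the authors' full method. The function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter

def tetrapeptide_counts(seq):
    """Count overlapping 4-residue substrings of a protein sequence."""
    return Counter(seq[i:i + 4] for i in range(len(seq) - 3))

def angular_distance(u, v):
    """Angle in radians between two sparse tetrapeptide count vectors."""
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("each sequence needs at least four residues")
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    cosine = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, cosine)))

# Hypothetical toy sequences: b is a with one residue deleted, c is unrelated.
a = tetrapeptide_counts("MKVLAAGIVK")
b = tetrapeptide_counts("MKVLAGIVK")
c = tetrapeptide_counts("WWPPHHCCDD")
print(angular_distance(a, b), angular_distance(a, c))
```

The single-residue deletion leaves most tetrapeptides intact, so the first angle is smaller than the second, which matches the abstract's point that sequences differing by insertions or deletions remain recognizably similar. Because the vectors are `Counter` objects, the per-protein vectors of one organism could be summed with `+` to give an organism-level vector.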

4.
5.
The major organ systems of Goniodoris castanea were investigated by histological means, with an emphasis on those structures that are difficult to see by dissection. The species is characterized by some peculiar features that are unique or rare within the Nudibranchia, such as the complete absence of specialized vacuolated cells, the presence of globular salivary glands, the presence of cuticular structures in the proximal intestine, a muscular sphincter around the distal vaginal duct, and the position of the blood gland closer to the pericardium than to the nervous system. Some of these characters are discussed in a phylogenetic context, although any thorough phylogenetic analysis remains preliminary owing to the lack of knowledge of probably related species.

6.
Evolution of the Rab family of small GTP-binding proteins. Total citations: 33 (self-citations: 0; citations by others: 33)
Rab proteins are small GTP-binding proteins that form the largest family within the Ras superfamily. Rab proteins regulate vesicular trafficking pathways, behaving as membrane-associated molecular switches. Here, we have identified the complete Rab families in Caenorhabditis elegans (29 members), Drosophila melanogaster (29), Homo sapiens (60) and Arabidopsis thaliana (57), and we defined criteria for annotation of this protein family in each organism. We studied sequence conservation patterns and observed that the RabF motifs and the RabSF regions previously described in mammalian Rabs are conserved across species. This is consistent with conserved recognition mechanisms by general regulators and specific effectors. We used phylogenetic analysis and other approaches to reconstruct the multiplication of the Rab family and observed that this family shows a strict phylogeny of function as opposed to a phylogeny of species. Furthermore, we observed that Rabs co-segregating in phylogenetic trees show a pattern of similar cellular localisation and/or function. Therefore, animal and fungal Rab proteins can be grouped in "Rab functional groups" according to their segregating patterns in phylogenetic trees. These functional groups reflect similarity of sequence, localisation and/or function, and may also represent shared ancestry. Rab functional groups can help the understanding of the functional evolution of the Rab family in particular and vesicular transport in general, and may be used to predict general functions for novel Rab sequences.

7.
MOTIVATION: Functional linkages imply pairwise relationships between proteins that work together to implement biological tasks. During evolution, functionally linked proteins are likely to be preserved or eliminated across a range of genomes in a correlated fashion. Based on this hypothesis, phylogenetic profiling-based approaches try to detect pairs of protein families that show similar evolutionary patterns. Traditionally, the evolutionary pattern of a protein is encoded by either a binary profile of presence and absence of this protein across species or an occurrence profile that indicates the distribution of copies of this protein across species. RESULTS: In our study, we characterize each protein by its enhanced phylogenetic tree, a novel graphical model of the evolution of a protein family in which speciation and duplication events are explicitly marked. By topological comparison between enhanced phylogenetic trees, we are able to detect the functionally associated protein pairs. Because the enhanced phylogenetic trees contain more evolutionary information about proteins, our method shows greater performance and discovers functional linkages among proteins more reliably compared with the conventional approaches.

8.
The phylogenetic distribution of Methanococcus jannaschii proteins can provide, for the first time, an estimate of the genome content of the last common ancestor of the three domains of life. Relying on annotation and comparison with reference to the species distribution of sequence similarities results in 324 proteins forming the universal family set. This set is very well characterized and relatively small and nonredundant, containing 301 biochemical functions, of which 246 are unique. This universal function set contains mostly genes coding for energy metabolism or information processing. It appears that the Last Universal Common Ancestor was an organism with metabolic networks and genetic machinery similar to those of extant unicellular organisms.

9.
The reconstruction of bacterial evolutionary relationships has proven to be a daunting task because variable mutation rates and horizontal gene transfer (HGT) among species can cause grave incongruities between phylogenetic trees based on single genes. Recently, a highly robust phylogenetic tree was constructed for 13 gamma-proteobacteria using the combined alignments of 205 conserved orthologous proteins [1]. Only two proteins had incongruent tree topologies, which were attributed to HGT between Pseudomonas species and Vibrio cholerae or enterics. While the evolutionary relationships among these species appear to be resolved, further analysis suggests that HGT events with other bacterial partners likely occurred; this alters the implicit assumption of gamma-proteobacteria monophyly. Thus, any thorough reconstruction of bacterial evolution must not only choose a suitable set of molecular markers but also strive to reduce potential bias in the selection of species.

10.
We have determined the solution NMR structure of SACOL2532, a putative GCN5-like N-acetyltransferase (GNAT) from Staphylococcus aureus. SACOL2532 was shown to bind both CoA and acetyl-CoA, and structures with and without bound CoA were determined. Based on analysis of the structure and sequence, a subfamily of small GCN5-related N-acetyltransferase (GNAT)-like proteins can be defined. Proteins from this subfamily, which is largely congruent with COG2388, are characterized by a cysteine residue in the acetyl-CoA binding site near the acetyl group, by their small size in relation to other GNATs, by a lack of obvious substrate binding site, and by a distinct conformation of bound CoA in relation to other GNATs. Subfamily members are found in many bacterial and eukaryotic genomes, and in some archaeal genomes. Whereas other GNATs transfer the acetyl group of acetyl-CoA directly to an aliphatic amine, the presence of the conserved cysteine residue suggests that proteins in the COG2388 GNAT-subfamily transfer an acetyl group from acetyl-CoA to one or more presently unidentified aliphatic amines via an acetyl (cysteine) enzyme intermediate. The apparent absence of a substrate-binding region suggests that the substrate is a macromolecule, such as another protein, or that a second protein subunit providing a substrate-binding region must combine with SACOL2532 to make a fully functional N-acetyl transferase.

11.
COG0354 proteins have been implicated in synthesis or repair of iron/sulfur (Fe/S) clusters in all domains of life, and those of bacteria, animals, and protists have been shown to require a tetrahydrofolate to function. Two COG0354 proteins were identified in Arabidopsis and many other plants, one (At4g12130) related to those of α-proteobacteria and predicted to be mitochondrial, the other (At1g60990) related to those of cyanobacteria and predicted to be plastidial. Grasses and poplar appear to lack the latter. The predicted subcellular locations of the Arabidopsis proteins were validated by in vitro import assays with purified pea organelles and by targeting assays in Arabidopsis and tobacco protoplasts using green fluorescent protein fusions. The At4g12130 protein was shown to be expressed mainly in flowers, siliques, and seeds, whereas the At1g60990 protein was expressed mainly in young leaves. The folate dependence of both Arabidopsis proteins was established by functional complementation of an Escherichia coli COG0354 (ygfZ) deletant; both plant genes restored in vivo activity of the Fe/S enzyme MiaB but restoration was abrogated when folates were eliminated by deleting folP. Insertional inactivation of At4g12130 was embryo lethal; this phenotype was reversed by genetic complementation of the mutant. These data establish that COG0354 proteins have a folate-dependent function in mitochondria and plastids, and that the mitochondrial protein is essential. That plants retain mitochondrial and plastidial COG0354 proteins with distinct phylogenetic origins emphasizes how deeply the extant Fe/S cluster assembly machinery still reflects the ancient endosymbioses that gave rise to plants.

12.
The microbial communities of three different habitat types and from two sediment depths in the River Elbe were investigated by fluorescence in situ hybridization at various levels of complexity. Differences in the microbial community composition of free-flowing river water, water within the hyporheic interstitial and sediment-associated bacteria were quantitatively analyzed using domain- and group-specific oligonucleotide probes. Qualitative data on the presence/absence of specific bacterial taxa were gathered using genus- and species-specific probes. The complete data set was statistically processed by univariate statistical approaches, and two-dimensional ordinations of nonmetric multidimensional scaling. The analysis showed: (1) that the resolution of microbial community structures at microenvironments, habitats and locations can be regulated by targeted application of oligonucleotides on phylogenetic levels ranging from domains to species, and (2) that an extensive qualitative presence/absence analysis of multiparallel hybridization assays enables a fine-scale apportionment of spatial differences in microbial community structures that is robust against apparent limitations of fluorescence in situ hybridization such as false positive hybridization signals or inaccessibility of in situ oligonucleotide probes. A general model for the correlation of the phylogenetic depth of focus and the relative spatial resolution of microbial communities by fluorescence in situ hybridization is presented.

13.
ABSTRACT: BACKGROUND: The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. By definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence, resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. RESULTS: Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. CONCLUSIONS: Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context that does not require substantial programming skills.

14.
Whole-genome transporter analyses have been conducted on 141 organisms whose complete genome sequences are available. For each organism, the complete set of membrane transport systems was identified with predicted functions, and classified into protein families based on the transporter classification system. Organisms with larger genome sizes generally possessed a relatively greater number of transport systems. In prokaryotes and unicellular eukaryotes, the significant factor in the increase in transporter content with genome size was a greater diversity of transporter types. In contrast, in multicellular eukaryotes, a greater number of paralogs in specific transporter families was the more important factor in the increase in transporter content with genome size. Both eukaryotic and prokaryotic intracellular pathogens and endosymbionts exhibited markedly limited transport capabilities. Hierarchical clustering of phylogenetic profiles of transporter families, derived from the presence or absence of a certain transporter family, showed that clustering patterns of organisms were correlated to both their evolutionary history and their overall physiology and lifestyles.

15.
MOTIVATION: The phylogenetic profile of a protein is a string that encodes the presence or absence of the protein in every fully sequenced genome. Because proteins that participate in a common structural complex or metabolic pathway are likely to evolve in a correlated fashion, the phylogenetic profiles of such proteins are often 'similar' or at least 'related' to each other. The question we address in this paper is the following: how to measure the 'similarity' between two profiles, in an evolutionarily relevant way, in order to develop efficient function prediction methods? RESULTS: We show how the profiles can be mapped to a high-dimensional vector space which incorporates evolutionarily relevant information, and we provide an algorithm to compute efficiently the inner product in that space, which we call the tree kernel. The tree kernel can be used by any kernel-based analysis method for classification or data mining of phylogenetic profiles. As an application a Support Vector Machine (SVM) trained to predict the functional class of a gene from its phylogenetic profile is shown to perform better with the tree kernel than with a naive kernel that does not include any information about the phylogenetic relationships among species. Moreover a kernel principal component analysis (KPCA) of the phylogenetic profiles illustrates the sensitivity of the tree kernel to evolutionarily relevant variations.
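As a point of reference for the abstract above, the sketch below shows a baseline of the kind it contrasts against: a kernel that treats a phylogenetic profile as a plain binary vector and ignores the species tree. It assumes the "naive kernel" is the ordinary inner product of two profiles, which the abstract does not state explicitly, and it does not attempt the tree kernel itself. The function name and the example profiles are hypothetical.

```python
def naive_kernel(p, q):
    """Inner product of two binary phylogenetic profiles given as '0'/'1' strings.

    Position i of each string records presence ('1') or absence ('0') of the
    protein in genome i. No information about the species tree is used.
    """
    if len(p) != len(q):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(int(a) * int(b) for a, b in zip(p, q))

# Hypothetical profiles over six genomes.
profile_a = "110110"
profile_b = "110100"  # differs from profile_a in one genome
profile_c = "001001"  # present only where profile_a is absent

print(naive_kernel(profile_a, profile_b))  # genomes where both proteins are present
print(naive_kernel(profile_a, profile_c))
```

The value simply counts genomes in which both proteins are present, so two profiles that differ only between closely related species are penalized as much as profiles that differ across distant lineages. That blindness to phylogeny is the limitation the tree kernel is described as addressing.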

16.
Galperin MY, Koonin EV. Genetica 1999, 106(1-2): 159-170
Computational analysis of complete genomes, followed by experimental testing of emerging hypotheses (the area of research often referred to as functional genomics), aims at deciphering the wealth of information contained in genome sequences and at using it to improve our understanding of the mechanisms of cell function. This review centers on the recent progress in genome analysis with special emphasis on the new insights in enzyme evolution. Standard methods of predicting functions for new proteins are listed and the common errors in their application are discussed. A new method of improving the functional predictions is introduced, based on a phylogenetic approach to functional prediction, as implemented in the recently constructed Clusters of Orthologous Groups (COG) database (available at http://www.ncbi.nlm.nih.gov/COG). This approach provides a convenient way to characterize the protein families (and metabolic pathways) that are present or absent in any given organism. Comparative analysis of microbial genomes based on this approach shows that metabolic diversity generally correlates with genome size: parasitic bacteria code for fewer enzymes and fewer metabolic pathways than their free-living relatives. Comparison of different genomes reveals another evolutionary trend, the non-orthologous gene displacement of some enzymes by unrelated proteins with the same cellular function. An examination of the phylogenetic distribution of such cases provides new clues to the problems of biochemical evolution, including evolution of glycolysis and the TCA cycle. This revised version was published online in October 2005 with corrections to the Cover Date.

17.
The complete mitochondrial genome of the endangered Banded Hare wallaby (Lagostrophus fasciatus) was sequenced and used for phylogenetic analysis. The data set consisted of 10,377 nucleotides (3459 amino acids) from three kangaroo species. The phylogenetic analyses strongly supported the hypothesis that the Banded Hare wallaby is the sister-group of the wallaroo (subfamily Macropodidae). In addition to the phylogenetic reconstruction, the mt control region, or D-loop, from Australian marsupials has been mapped for the first time. The results show that the organization of the kangaroo control region is similar to that of placental mammals. The presence of a duplicated CSB-1 block found in all three kangaroo species is an uncommon feature of mammalian mtDNA. The CSB domain was found to be the most variable region in the control region, followed by a less variable ETAS domain.

18.
The ionome is the elemental composition of a living organism, its tissues, cells or cell compartments. The ionomes of roots, stems and leaves of 14 native Brazilian forest species were characterised to examine the relationships between plant and organ ionomes and the phylogenetic and ecological affiliations of species. The null hypothesis that ionomes of Brazilian forest species and their organs do not differ was tested. Concentrations of mineral nutrients in roots, stems and leaves were determined for 14 Brazilian forest species, representing seven angiosperm orders, grown hydroponically in a complete nutrient solution. The 14 species could be differentiated by their ionomes and the partitioning of mineral nutrients between organs. The ionomic differences between the 14 species did not reflect their phylogenetic relationships or successional ecology. Differences between shoot ionomes and root ionomes were greater than differences in the ionome of an organ when compared among genotypes. In conclusion, differences in ionomes of species and their organs reflect a combination of ancient phylogenetic and recent environmental adaptations.

19.
This paper reports an intraorder study on the D-loop-containing region of the mitochondrial DNA in rodents. A complete multialignment of this region is not feasible with the exception of some conserved regions. The comparative analysis of 25 complete rodent sequences from 23 species plus one lagomorph has revealed that only the central domain (CD), a conserved region of about 80 bp in the extended termination-associated sequences (ETAS) domain, adjacent to the CD, the ETAS1, and conserved sequence block (CSB) 1 blocks are present in all rodent species, whereas the presence of CSB2 and CSB3 is erratic within the order. We have also found a conserved region of 90 bp located between tRNAPro and ETAS1 present in fat dormouse, squirrel, guinea pig, and rabbit. Repeated sequences are present in both the ETAS and the CSB domain, but the repeats differ in length, copy number, and base composition in different species. The potential use of the D-loop for evolutionary studies has been investigated; the presence/absence of conserved blocks and/or repeated sequences cannot be used as a reliable phylogenetic marker, since in some cases they may be shared by distantly related organisms but not by close ones, while in others a relationship between tree topology and presence/absence of such motifs is observed. Better results can be obtained by the use of the CD, which, however, due to its reduced size, when used for tracing a phylogenetic tree, shows some nodes with low statistical support. Received: 26 February 2001 / Accepted: 6 June 2001

20.
The present study is the first to consider human and nonhuman consumers together to reveal several general patterns of plant utilization. We provide evidence that at a global scale, plant apparency and phylogenetic isolation can be important predictors of plant utilization and consumer diversity. Using the number of species or genera or the distribution area of each plant family as the island “area” and the minimum phylogenetic distance to common plant families as the island “distance”, we fitted presence–area relationships and presence–distance relationships with a binomial GLM (generalized linear model) with a logit link. The presence–absence of consumers among each plant family strongly depended on plant apparency (family size and distribution area); the diversity of consumers increased with plant apparency but decreased with phylogenetic isolation. When consumers extended their host breadth, unapparent plants became more likely to be used. Common uses occurred more often on common plants and their relatives, showing higher host phylogenetic clustering than uncommon uses. On the contrary, highly specialized uses might be related to the rarity of plant chemicals and were therefore very species‐specific. In summary, our results provide a global illustration of plant–consumer combinations and reveal several general patterns of plant utilization across humans, insects and microbes. First, plant apparency and plant phylogenetic isolation generally govern plant utilization value, with uncommon and isolated plants suffering fewer parasites. Second, extension of the breadth of utilized hosts helps explain the presence of consumers on unapparent plants. Finally, the phylogenetic clustering structure of host plants is different between common uses and uncommon uses. 
The strength of such consistent plant utilization patterns across a diverse set of usage types suggests that the persistence and accumulation of consumer diversity and use value for plant species are determined by similar ecological and evolutionary processes.
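The presence-area relationship described in this abstract is fitted with a binomial GLM with a logit link, which for a single predictor is ordinary logistic regression. The sketch below fits such a model by Newton-Raphson using only the standard library. It is an illustration under stated assumptions, not the authors' analysis: the predictor is taken to be log10 of family size, the eight data points are invented, and the function name is hypothetical.

```python
import math

def fit_logistic(xs, ys, iterations=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Equivalent to a binomial GLM with a logit link and one predictor.
    Returns the intercept b0 and slope b1.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of X^T W X (negative Hessian)
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X^T W X) * delta = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Invented toy data: x = log10(number of species in a plant family),
# y = 1 if a given consumer type is recorded on that family, else 0.
log_family_size = [0, 0, 1, 1, 2, 2, 3, 3]
consumer_present = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_logistic(log_family_size, consumer_present)
print(intercept, slope)
```

A positive slope here corresponds to the pattern reported in the abstract, where the probability that a consumer is present rises with plant apparency. A presence-distance fit would use the same function with phylogenetic distance as the predictor, and the abstract's findings would lead one to expect a negative slope there. In practice a statistics package would be used instead, since it also reports standard errors and handles separable data.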

