Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Functional classification of proteins is central to comparative genomics. The need for algorithms tuned to enable integrative interpretation of analytical data is felt globally. The availability of general, automated software with built-in flexibility will significantly aid this activity. We have prepared ARC (Automated Resource Classifier), an open-source software package that meets user requirements for flexibility. The default classification scheme, based on keyword matching, is agglomerative and directs entries into any of 7 basic non-overlapping functional classes: Cell wall, Cell membrane and Transporters (C), Cell division, Information (I), Translocation (L), Metabolism, Stress, and Signal and communication (S), plus 2 ancillary classes: Others (O) and Hypothetical. The keyword library of ARC was built serially by first drawing keywords from Bacillus subtilis and Escherichia coli K12. In subsequent steps, this library was further enriched by collecting terms from the archaeal representative Archaeoglobus fulgidus, Gene Ontology, and Gene Symbols. ARC is 94.04% successful on 675,663 annotated proteins from 348 prokaryotes. Three examples are provided to illuminate current perspectives on mycobacterial physiology and the costs of proteins in 333 prokaryotes. ARC is available at http://arc.igib.res.in.
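As a rough illustration of keyword-match classification of the kind this abstract describes, the sketch below assigns an annotation string to a functional class by searching for keywords. The keyword lists, the `classify` function, and the first-match rule are all hypothetical and are not ARC's actual library or logic; only the class names are taken from the abstract.

```python
# Hypothetical keyword library: class name -> illustrative keywords.
# These are NOT ARC's keywords; they are invented for this sketch.
KEYWORDS = {
    "Cell wall, Cell membrane and Transporters": ["transporter", "permease", "peptidoglycan"],
    "Cell division": ["cell division", "septum"],
    "Information": ["polymerase", "ribosomal", "trna"],
    "Metabolism": ["dehydrogenase", "synthase"],
    "Stress": ["heat shock", "chaperone"],
}


def classify(annotation: str) -> str:
    """Return the first class whose keyword occurs in the annotation text."""
    text = annotation.lower()
    # Assumption: entries annotated as hypothetical go to the ancillary class first.
    if "hypothetical" in text:
        return "Hypothetical"
    for functional_class, words in KEYWORDS.items():
        if any(word in text for word in words):
            return functional_class
    # Nothing matched: fall back to the second ancillary class.
    return "Others"


print(classify("DNA polymerase III subunit alpha"))
print(classify("conserved hypothetical protein"))
```

A real keyword library would need rules for annotations that match several classes; the first-match rule here is only the simplest possible choice.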

2.
SUMMARY: Vbmp is an R package for Gaussian Process classification of data over multiple classes. It features multinomial probit regression with Gaussian Process priors and estimates class posterior probabilities employing fast variational approximations to the full posterior. This software also incorporates feature weighting by means of Automatic Relevance Determination. Being equipped with only one main function and reasonable default values for optional parameters, vbmp combines flexibility with ease of use, as is demonstrated on a breast cancer microarray study. AVAILABILITY: The R library vbmp implementing this method is part of Bioconductor and can be downloaded from http://www.dcs.gla.ac.uk/~girolami

3.
Gene prediction methods for eukaryotic genomes are still not fully satisfactory. One way to improve gene prediction accuracy, proven to be relevant for prokaryotes, is to consider more than one model of genes. Thus, we used our classification of Arabidopsis thaliana genes in two classes (CU(1) and CU(2)), previously delineated according to statistical features, in the GeneMark gene identification program. For each gene class, as well as for the two classes combined, a Markov model was developed (respectively, GM-CU(1), GM-CU(2) and GM-all) and then used on a test set of 168 genes to compare their respective efficiency. We concluded from this analysis that GM-CU(1) is more sensitive than GM-CU(2), which seems to be more specific to a gene type. Moreover, GM-all does not give better results than GM-CU(1), and combining results from GM-CU(1) and GM-CU(2) greatly improves prediction efficiency in comparison with predictions made with GM-all only. Thus, this work confirms the need to consider more than one gene model for gene prediction in eukaryotic genomes, and to look for gene classes in order to build these models.

4.
The PRODISTIN Web Site is a web service allowing users to functionally classify genes/proteins from any type of interaction network. The resulting computation provides a classification tree in which (1) genes/proteins are clustered according to the identity of their interaction partners and (2) functional classes are delineated in the tree using the Biological Process Gene Ontology annotations. AVAILABILITY: The PRODISTIN Web Site is freely accessible at http://gin.univ-mrs.fr/webdistin

5.
6.
A chromosomal region of Pectobacterium chrysanthemi PY35 that contains genes for glycogen synthesis was isolated from a cosmid library. The operon consists of glycogen branching enzyme (glgB), glycogen debranching enzyme (glgX), ADP-glucose pyrophosphorylase (glgC), glycogen synthase (glgA), and glycogen phosphorylase (glgP) genes. Gene organization is similar to that of Escherichia coli. The purified ADP-glucose pyrophosphorylase (GlgC) was activated by fructose 1,6-bisphosphate and inhibited by AMP. The constructed glgX::Omega mutant failed to integrate into the chromosome of P. chrysanthemi by marker exchange. Phylogenetic analysis based on the 16S rDNA and the amino acid sequence of Glg enzymes showed correlation with other bacteria. gamma-Proteobacteria have the glgX gene instead of the bacilli glgD gene in the glg operon. The possible evolutionary implications of the results among the prokaryotes are discussed.

7.
Glycosylation is one of the most abundant protein posttranslational modifications. Protein glycosylation plays important roles not only in eukaryotes but also in prokaryotes. To further understand the roles of protein glycosylation in prokaryotes, we developed a lectin binding assay to screen glycoproteins on an Escherichia coli proteome microarray containing 4,256 affinity-purified E. coli proteins. Twenty-three E. coli proteins that bound Wheat-Germ Agglutinin (WGA) were identified. PANTHER protein classification analysis showed that these glycoprotein candidates were highly enriched in metabolic process and catalytic activity classes. One sub-network centered on deoxyribonuclease I (sbcB) was identified. Bioinformatics analysis suggests that prokaryotic protein glycosylation may play roles in nucleotide and nucleic acid metabolism. Fifteen of the 23 glycoprotein candidates were validated by lectin (WGA) staining, thereby increasing the number of validated E. coli glycoproteins from 3 to 18. By cataloguing glycoproteins in E. coli, our study greatly extends our understanding of protein glycosylation in prokaryotes.

8.
Cell division mechanisms in eukaryotes and prokaryotes have until recently been seen as being widely different. However, pole-to-pole oscillations of proteins like MinE in prokaryotes are now known to determine the division plane. These protein waves arise through spontaneous pattern-forming reaction-diffusion mechanisms, based on cooperative binding of the proteins to a quasistationary matrix (like the cell membrane or DNA). Rather than waves, stationary bipolar pattern formation may arise as well. Some of the proteins involved have eukaryotic homologs (e.g. FtsZ and tubulin), pointing to a possible ancient shared mechanism. Tubulin polymerizes to microtubules in the spindle. Mitotic microtubules are in a highly dynamic state, frequently undergoing rapid shortening (catastrophe), and fragments formed from the microtubule ends are inferred to enhance the destabilization. Here, we show that cooperative binding of such fragments to microtubules may set up a pattern-forming mechanism similar to that seen in prokaryotes. The result is a spontaneously formed, well controllable, bipolar state of microtubule dynamics in the cell, which may contribute to defining the bipolar spindle.

9.
The function of a protein is primarily dictated by its structure. Therefore it is far more logical to look for functional clues in the protein's overall 3-dimensional fold, or global structure. In this paper, we have developed a novel Support Vector Machine (SVM) based prediction model for functional classification and prediction of proteins using features extracted from the global structure based on fragment libraries. Fragment libraries have previously been used for ab initio modelling of proteins and protein structure comparisons. The query protein structure is broken down into a collection of short contiguous backbone fragments, and this collection is discretized using a library of fragments. The input feature vector is a frequency vector that counts the number of occurrences of each library fragment in the collection of fragments, obtained by all-to-all fragment comparisons. SVM models were trained and optimised to obtain the best 10-fold cross-validation accuracy for classification. As an example, this method was applied to the prediction and classification of Cell Adhesion Molecules (CAMs). Thirty-four different fragment libraries with sizes ranging from 4 to 400 and fragment lengths ranging from 4 to 12 were used to obtain the best prediction model. The best 10-fold CV accuracy of 95.25% was obtained for a library of 400 fragments of length 10. An accuracy of 87.5% was obtained on an unseen test dataset consisting of 20 CAMs and 20 NonCAMs. This shows that protein structure can be accurately and uniquely described using 400 representative fragments of length 10.
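The feature construction described in this abstract (split the structure into contiguous fragments, match each to its nearest library fragment, count the matches) can be sketched as follows. This is a minimal sketch under loud assumptions: each residue is reduced to a single made-up numeric descriptor, and plain Euclidean distance stands in for whatever structural comparison the authors used. The names `windows`, `distance`, and `frequency_vector` are hypothetical.

```python
import math


def windows(values, length):
    """All contiguous fragments of the given length."""
    return [values[i:i + length] for i in range(len(values) - length + 1)]


def distance(a, b):
    """Euclidean distance; a stand-in for a structural measure such as RMSD."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def frequency_vector(values, library):
    """Count how often each library fragment is the nearest match."""
    length = len(library[0])
    counts = [0] * len(library)
    for fragment in windows(values, length):
        # Compare the fragment against every library entry and keep the closest.
        nearest = min(range(len(library)), key=lambda k: distance(fragment, library[k]))
        counts[nearest] += 1
    return counts


# Toy data (invented): a library of 2 fragments of length 3 and a 6-residue chain.
library = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
backbone = [0.0, 0.1, 0.0, 0.9, 1.0, 1.1]
vector = frequency_vector(backbone, library)
print(vector)
```

The resulting vector has one entry per library fragment regardless of chain length, which is what makes it usable as a fixed-size input to a classifier such as an SVM.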

10.
More than 30 organisms have been sequenced entirely. Here, we applied a variety of simple bioinformatics tools to analyze 29 proteomes for representatives from all three kingdoms: eukaryotes, prokaryotes, and archaebacteria. We confirmed that eukaryotes have relatively more long proteins than prokaryotes and archaea, and that the overall amino acid composition is similar among the three. We predicted that approximately 15%-30% of all proteins contained transmembrane helices. We could not find a correlation between the content of membrane proteins and the complexity of the organism. In particular, we did not find significantly higher percentages of helical membrane proteins in eukaryotes than in prokaryotes or archaea. However, we found more proteins with seven transmembrane helices in eukaryotes and more with six and 12 transmembrane helices in prokaryotes. We found twice as many coiled-coil proteins in eukaryotes (10%) as in prokaryotes and archaea (4%-5%), and we predicted approximately 15%-25% of all proteins to be secreted by most eukaryotes and prokaryotes. Every tenth protein had no known homolog in current databases, and 30%-40% of the proteins fell into structural families with >100 members. A classification by cellular function verified that eukaryotes have a higher proportion of proteins for communication with the environment. Finally, we found at least one homolog of experimentally known structure for approximately 20%-45% of all proteins; the regions with structural homology covered 20%-30% of all residues. These numbers may or may not suggest that there are 1200-2600 folds in the universe of protein structures. All predictions are available at http://cubic.bioc.columbia.edu/genomes.

11.
ABSTRACT: BACKGROUND: Understanding protein subcellular localization is a necessary component toward understanding the overall function of a protein. Numerous computational methods have been published over the past decade, with varying degrees of success. Despite the large number of published methods in this area, only a small fraction of them are available for researchers to use in their own studies. Of those that are available, many are limited by predicting only a small number of major organelles in the cell. Additionally, the majority of methods predict only a single location, even though it is known that a large fraction of the proteins in eukaryotic species shuttle between locations to carry out their function. FINDINGS: We present a software package and a web server for predicting subcellular localization of protein sequences based on the ngLOC method. ngLOC is an n-gram-based Bayesian classifier that predicts subcellular localization of proteins in both prokaryotes and eukaryotes. The overall prediction accuracy varies from 89.8% to 91.4% across species. This program can predict 11 distinct locations each in plant and animal species. ngLOC also predicts 4 and 5 distinct locations on gram-positive and gram-negative bacterial datasets, respectively. CONCLUSIONS: ngLOC is a generic method that can be trained by data from a variety of species or classes for predicting protein subcellular localization. The standalone software is freely available for academic use under GNU GPL, and the ngLOC web server is also accessible at http://ngloc.unmc.edu.
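To make the idea of an n-gram-based Bayesian classifier concrete, here is a minimal sketch of a multinomial naive Bayes model over sequence n-grams with add-one smoothing. It is not the ngLOC implementation: the class `NgramNaiveBayes`, the choice of n = 2, the smoothing scheme, and the toy sequences and location labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(sequence, n=2):
    """Overlapping n-grams of a sequence string."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


class NgramNaiveBayes:
    def __init__(self, n=2):
        self.n = n
        self.counts = defaultdict(Counter)  # location -> n-gram counts
        self.docs = Counter()               # location -> number of training sequences
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, sequences, locations):
        for seq, loc in zip(sequences, locations):
            grams = ngrams(seq, self.n)
            self.counts[loc].update(grams)
            self.docs[loc] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, sequence):
        total_docs = sum(self.docs.values())
        scores = {}
        for loc in self.docs:
            total = sum(self.counts[loc].values())
            score = math.log(self.docs[loc] / total_docs)  # log prior
            for gram in ngrams(sequence, self.n):
                # Add-one smoothing so unseen n-grams do not zero out a class.
                score += math.log((self.counts[loc][gram] + 1) / (total + len(self.vocab)))
            scores[loc] = score
        return scores

    def predict(self, sequence):
        scores = self.log_scores(sequence)
        return max(scores, key=scores.get)


# Toy training data (invented sequences and labels).
model = NgramNaiveBayes(n=2).fit(
    ["MKKLLLAAA", "MKLLLLAVA", "DEDEEDKDE", "EEDDEKDED"],
    ["membrane", "membrane", "cytoplasm", "cytoplasm"],
)
print(model.predict("LLLAAL"))
```

Because `log_scores` returns a score for every location, a multi-location prediction of the kind the abstract mentions could in principle be built by reporting several top-scoring locations rather than only the maximum; how ngLOC actually does this is not stated here.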

12.
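1 Introduction. The emergence and development of landscape ecology has brought vitality and many new ideas to traditional ecology and geography. Its research methods and results provide new scientific methods and a basis for decision-making in resource development and in environmental and ecological protection. Landscape heterogeneity is one of the core concepts of landscape ecology. Landscape pattern is the expression of landscape heterogeneity [2,3,5,9]. Landscape pattern analysis is one of the research tasks of landscape ecology; it quantitatively studies how patches are distributed in the landscape. The purpose of spatial pattern analysis is to discover potentially meaningful order and regularity in a seemingly disordered landscape [5]. Landscape pattern analysis matters because landscape pattern affects the flows of elements within it: different landscape patterns, or the dynamic evolution of landscape patterns, lead to changes in regional landscape function [7]. Landscape pattern also affects species abundance…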

13.
Apurinic/apyrimidinic (AP) sites, a prominent type of DNA damage, are repaired through the base excision repair mechanism in both prokaryotes and eukaryotes and may interfere with many other cellular processes. A full repertoire of AP site-binding proteins in cells is presently unknown, preventing reliable assessment of harm inflicted by these ubiquitous lesions and of their involvement in the flux of DNA metabolism. We present a proteomics-based strategy for assembling at least a partial catalogue of proteins capable of binding AP sites in DNA. The general scheme relies on the sensitivity of many AP site-bound protein species to NaBH(4) cross-linking. An affinity-tagged substrate is used to facilitate isolation of the cross-linked species, which are then separated and analyzed by mass spectrometry methods. We report identification of seven proteins from Escherichia coli (AroF, DnaK, MutM, PolA, TnaA, TufA, and UvrA) and two proteins from bakers' yeast (ARC1 and Ygl245wp) reactive for AP sites in this system.

14.
Mammalian mitochondrial small subunit ribosomal proteins were separated by two-dimensional polyacrylamide gel electrophoresis. The proteins in six individual spots were subjected to in-gel tryptic digestion. Peptides were separated by capillary liquid chromatography, and the sequences of selected peptides were obtained by electrospray tandem mass spectrometry. The peptide sequences obtained were used to screen human expressed sequence tag databases, and complete consensus cDNAs were assembled. Mammalian mitochondrial small subunit ribosomal proteins from six different classes of ribosomal proteins were identified. Only two of these proteins have significant sequence similarities to ribosomal proteins from prokaryotes. These proteins correspond to Escherichia coli S10 and S14. Homologs of two human mitochondrial proteins not found in prokaryotes were observed in the genomes of Drosophila melanogaster and Caenorhabditis elegans. A homolog of one of these proteins was observed in D. melanogaster but not in C. elegans, while a homolog of the other was present in C. elegans but not in D. melanogaster. A homolog of one of the ribosomal proteins not found in prokaryotes was tentatively identified in the yeast genome. This latter protein is the first reported example of a ribosomal protein that is shared by mitochondrial ribosomes from lower and higher eukaryotes and that does not have a homolog in prokaryotes.

15.
16.
Rational classification of proteins encoded in sequenced genomes is critical for making the genome sequences maximally useful for functional and evolutionary studies. The family of DNA-binding proteins is one of the most populated and studied amongst the various genomes of bacteria, archaea and eukaryotes, and the Web-based system presented here is an approach to their classification. The DnaProt resource is an annotated and searchable collection of protein sequences for the families of DNA-binding proteins. The database contains 3238 full-length sequences (retrieved from the SWISS-PROT database, release 38) that include at least one DNA-binding domain. Sequence entries are organized into families defined by PROSITE patterns, PRINTS motifs and de novo excised signatures. Combining global similarities and functional motifs into a single classification scheme, DNA-binding proteins are classified into 33 unique classes, which helps to reveal comprehensive family relationships. To maximize family information retrieval, DnaProt contains a collection of multiple alignments for each DNA-binding family, while the recognized motifs can be used as diagnostically functional fingerprints. All available structural class representatives have been referenced. The resource was developed as a Web-based management system for free online access to customized data sets. Entries are fully hyperlinked to facilitate easy retrieval of the original records from the source databases, while functional and phylogenetic annotation will be applied to newly sequenced genomes. The database is freely available for online search of a library containing specific patterns of the identified DNA-binding protein classes and retrieval of individual entries from our WWW server (http://kronos.biol.uoa.gr/~mariak/dbDNA.html).

17.
MOTIVATION: Probabilistic graphical models have been developed in the past for the task of protein classification. In many cases, classifications obtained from the Gene Ontology have been used to validate these models. In this work we directly incorporate the structure of the Gene Ontology into the graphical representation for protein classification. We present a method in which each protein is represented by a replicate of the Gene Ontology structure, effectively modeling each protein in its own 'annotation space'. Proteins are also connected to one another according to different measures of functional similarity, after which belief propagation is run to make predictions at all ontology terms. RESULTS: The proposed method was evaluated on a set of 4879 proteins from the Saccharomyces Genome Database whose interactions were also recorded in the GRID project. Results indicate that direct utilization of the Gene Ontology improves predictive ability, outperforming traditional models that do not take advantage of dependencies among functional terms. Average increase in accuracy (precision) of positive and negative term predictions of 27.8% (2.0%) over three different similarity measures and three subontologies was observed. AVAILABILITY: C/C++/Perl implementation is available from authors upon request.

18.
Pancsa R, Tompa P. PLoS ONE, 2012, 7(4): e34687
Based on early bioinformatic studies on a handful of species, the frequency of structural disorder of proteins is generally thought to be much higher in eukaryotes than in prokaryotes. To refine this view, we present here a comparative prediction study and analysis of 194 fully described eukaryotic proteomes and 87 reference prokaryotes for structural disorder. We found that structural disorder does distinguish eukaryotes from prokaryotes, but its frequency spans a very wide range in the two superkingdoms, and these ranges largely overlap. The number of disordered binding regions and of different Pfam domain types also helps to distinguish eukaryotes from prokaryotes. Unexpectedly, the highest levels, and the highest variability, of predicted disorder are found in protists, i.e. single-celled eukaryotes, often surpassing more complex eukaryotic organisms, plants and animals. This trend contrasts with that of the number of domain types, which increases rather monotonically toward more complex organisms. The level of structural disorder appears to be strongly correlated with lifestyle, because some obligate intracellular parasites and endosymbionts have the lowest levels, whereas host-changing parasites have the highest level of predicted disorder. We conclude that protists have been the evolutionary hot-bed of experimentation with structural disorder, in a period when structural disorder was actively invented and the major functional classes of disordered proteins established.

19.

Background

Computational identification of apicoplast-targeted proteins is important in drug target determination for diseases such as malaria. While there are established methods for identifying proteins with a bipartite signal in multiple species of Apicomplexa, not all apicoplast-targeted proteins possess this bipartite signature. The publication of recent experimental findings of apicoplast membrane proteins, called transmembrane proteins, that do not possess a bipartite signal has made it feasible to devise a machine learning approach for identifying this new class of apicoplast-targeted proteins computationally.

Methodology/principal findings

In this work, we develop a method for predicting apicoplast-targeted transmembrane proteins for multiple species of Apicomplexa, whereby several classifiers trained on different feature sets and based on different algorithms are evaluated and combined in an ensemble classification model to obtain the best expected performance. The feature sets considered are the hydrophobicity and composition characteristics of amino acids over transmembrane domains, the existence of short sequence motifs over cytosolically disposed regions, and Gene Ontology (GO) terms associated with given proteins. Our model, ApicoAMP, is an ensemble classification model that combines decisions of classifiers following the majority vote principle. ApicoAMP is trained on a set of proteins from 11 apicomplexan species and achieves 91% overall expected accuracy.

Conclusions/significance

ApicoAMP is the first computational model capable of identifying apicoplast-targeted transmembrane proteins in Apicomplexa. The ApicoAMP prediction software is available at http://code.google.com/p/apicoamp/ and http://bcb.eecs.wsu.edu.
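The majority-vote combination mentioned in this abstract can be sketched in a few lines. This is only an illustration of the voting principle, not ApicoAMP itself: the three rule functions, their thresholds, the dictionary keys, and the GO term string are hypothetical stand-ins for trained classifiers over the three feature sets the abstract lists.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    With an odd number of binary classifiers there are no ties; otherwise
    the label encountered first among the tied ones is returned.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical stand-ins for trained classifiers, one per feature set.
def hydrophobicity_rule(protein):
    return protein["mean_tm_hydrophobicity"] > 0.5  # made-up threshold


def motif_rule(protein):
    return protein["has_cytosolic_motif"]


def go_rule(protein):
    return "plastid" in protein["go_terms"]  # made-up GO term label


CLASSIFIERS = [hydrophobicity_rule, motif_rule, go_rule]


def ensemble_predict(protein):
    """True if most classifiers call the protein apicoplast-targeted."""
    return majority_vote([classifier(protein) for classifier in CLASSIFIERS])


example = {
    "mean_tm_hydrophobicity": 0.7,
    "has_cytosolic_motif": False,
    "go_terms": ["plastid"],
}
print(ensemble_predict(example))
```

Using an odd number of voters, as here, avoids ties for a two-class decision; the abstract does not say how many classifiers the published ensemble combines.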

20.
Gene Ontology annotation helps to explain phenomena in the life sciences and can provide great support for biomedical research. The use of the Gene Ontology is gradually affecting the way people store and understand bioinformatic data. To facilitate the prediction of gene functions with the aid of text mining methods and existing resources, we transform the task into a multi-label top-down classification problem and develop a method that uses the hierarchical relationships in the Gene Ontology structure to relieve the quantitative imbalance between positive and negative training samples. The method also enhances the discriminating ability of the classifiers by retaining and highlighting the key training samples. Additionally, the top-down classifier based on a tree structure takes the relationships among target classes into consideration and thus resolves the incompatibility between the classification results and the Gene Ontology structure. Our experiment on the Gene Ontology annotation corpus achieves an F-value of 50.7% (precision: 52.7%, recall: 48.9%). The experimental results demonstrate that when the training set is small, it can be expanded via topological propagation of associated documents between the parent and child nodes in the tree structure. The top-down classification model applies to sets of texts organized in an ontology structure or with a hierarchical relationship.
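The reported F-value can be checked against the reported precision and recall. The sketch below assumes the F-value is the balanced F-measure (harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` and its `beta` parameter are illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values from the abstract, in percent.
f1 = f_measure(52.7, 48.9)
print(round(f1, 1))  # 50.7
```

Under that assumption the three figures in the abstract are mutually consistent.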
