Similar literature
 Found 20 similar articles; search time 31 ms
1.
A substantial percentage of the putative protein-encoding open reading frames (ORFs) in bacterial genomes have no homolog of known function, and their function cannot be confidently assigned on the basis of sequence similarity. Methods not based on sequence similarity are needed and are being developed. One method, SVMProt (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi), predicts protein functional family irrespective of sequence similarity (Nucleic Acids Res. 2003;31:3692-3697). While it has been tested on a large number of proteins, its capability for non-homologous proteins has so far been evaluated for a relatively small number of proteins, and additional tests are needed to assess SVMProt more fully. In this work, 90 novel bacterial proteins (non-homologous to known proteins) are used to evaluate the capability of SVMProt. These proteins are such that none of their homologs are in the Swiss-Prot database, their functions are not clearly described in the literature, and they themselves and their homologs are not included in the training sets of SVMProt. They represent proteins whose function cannot be confidently predicted by sequence similarity methods at present. The predicted functional class of 76.7% of these proteins shows various levels of consistency with the literature-described function, compared to the overall accuracy of 87% for the SVMProt functional class assignment of 34,582 proteins that have at least one homolog of known function. Our study suggests that SVMProt is capable of assigning functional class for novel bacterial proteins at a level not much lower than that of sequence alignment methods for homologous proteins.

2.
In plant genomes, the function of a substantial percentage of the putative protein-coding open reading frames (ORFs) is unknown. These ORFs have no significant sequence similarity to known proteins, which complicates the task of functional study of these proteins. Efforts are being made to explore methods that are complementary to, or may be used in combination with, sequence alignment and clustering methods. A web-based protein functional class prediction software, SVMProt, has shown some capability for predicting functional class of distantly related proteins. Here the usefulness of SVMProt for functional study of novel plant proteins is evaluated. To test SVMProt, 49 plant proteins (without a sequence homolog in the Swiss-Prot protein database, not in the SVMProt training set, and with functional indications provided in the literature) were selected from a comprehensive search of MEDLINE abstracts and Swiss-Prot databases in 1999-2004. These represent unique proteins the function of which, at present, cannot be confidently predicted by sequence alignment and clustering methods. The predicted functional class of 31 proteins was consistent, and that of four other proteins was weakly consistent, with published functions. Overall, the functional class of 71.4% of these proteins was consistent, or weakly consistent, with functional indications described in the literature. SVMProt shows a certain level of ability to provide useful hints about the functions of novel plant proteins with no similarity to known proteins.
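The overall figure quoted in this abstract follows directly from the counts it gives (31 consistent plus 4 weakly consistent out of 49 proteins). A minimal sketch of that arithmetic; the function name `consistency_rate` is introduced here for illustration only and is not part of SVMProt:

```python
def consistency_rate(consistent: int, weakly_consistent: int, total: int) -> float:
    """Percentage of proteins whose predicted class is consistent or weakly
    consistent with the literature-described function."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (consistent + weakly_consistent) / total


# Counts taken from the abstract: 31 consistent, 4 weakly consistent, 49 tested.
rate = consistency_rate(31, 4, 49)
print(f"{rate:.1f}%")  # prints 71.4%
```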

3.
Cai CZ  Han LY  Ji ZL  Chen X  Chen YZ 《Nucleic acids research》2003,31(13):3692-3697
Prediction of protein function is of significance in studying biological processes. One approach for function prediction is to classify a protein into a functional family. Support vector machine (SVM) is a useful method for such classification, which may involve proteins with diverse sequence distribution. We have developed a web-based software, SVMProt, for SVM classification of a protein into a functional family from its primary sequence. The SVMProt classification system is trained from representative proteins of a number of functional families and seed proteins of Pfam curated protein families. It currently covers 54 functional families and additional families will be added in the near future. The computed accuracy for protein family classification is found to be in the range of 69.1-99.6%. SVMProt shows a certain degree of capability for the classification of distantly related proteins and homologous proteins of different function and thus may be used as a protein function prediction tool that complements sequence alignment methods. SVMProt can be accessed at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.
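An SVM needs a fixed-length numeric vector for every protein, whatever the sequence length. The abstract does not say which descriptors SVMProt derives from the primary sequence, so the sketch below uses plain amino-acid composition purely as an assumed stand-in; `composition_vector` and the toy peptide are hypothetical and are not SVMProt's encoding.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in the sequence.

    Assumption: simple composition is used only to illustrate a
    sequence-to-vector encoding; it is not the SVMProt descriptor set.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy peptide; a real input would be a full protein sequence.
vec = composition_vector("MKRKTEAAG")
print(len(vec))  # 20 features, regardless of sequence length
```

Vectors of this kind, labeled by family membership, are what a classifier would be trained on; the choice of descriptor largely determines how well distantly related proteins can be separated.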

4.
Han LY  Cai CZ  Ji ZL  Cao ZW  Cui J  Chen YZ 《Nucleic acids research》2004,32(21):6437-6444
The function of a protein that has no sequence homolog of known function is difficult to assign on the basis of sequence similarity. The same problem may arise for homologous proteins of different functions if one is newly discovered and the other is the only known protein of similar sequence. It is desirable to explore methods that are not based on sequence similarity. One approach is to assign the functional family of a protein to provide a useful hint about its function. Several groups have employed a statistical learning method, support vector machines (SVMs), for predicting protein functional family directly from sequence irrespective of sequence similarity. These studies showed that SVM prediction accuracy is at a level useful for functional family assignment. But its capability for assignment of distantly related proteins and homologous proteins of different functions has not been critically and adequately assessed. Here SVM is tested for functional family assignment of two groups of enzymes. One consists of 50 enzymes that have no homolog of known function from PSI-BLAST search of protein databases. The other contains eight pairs of homologous enzymes of different families. SVM correctly assigns 72% of the enzymes in the first group and 62% of the enzyme pairs in the second group, suggesting that it is potentially useful for facilitating functional study of novel proteins. A web version of our software, SVMProt, is accessible at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi.

5.
Elucidation of the interaction of proteins with different molecules is of significance in the understanding of cellular processes. Computational methods have been developed for the prediction of protein-protein interactions. But insufficient attention has been paid to the prediction of protein-RNA interactions, which play central roles in regulating gene expression and certain RNA-mediated enzymatic processes. This work explored the use of a machine learning method, support vector machines (SVM), for the prediction of RNA-binding proteins directly from their primary sequence. Based on the knowledge of known RNA-binding and non-RNA-binding proteins, an SVM system was trained to recognize RNA-binding proteins. A total of 4011 RNA-binding and 9781 non-RNA-binding proteins was used to train and test the SVM classification system, and an independent set of 447 RNA-binding and 4881 non-RNA-binding proteins was used to evaluate the classification accuracy. Testing results using this independent evaluation set show a prediction accuracy of 94.1%, 79.3%, and 94.1% for rRNA-, mRNA-, and tRNA-binding proteins, and 98.7%, 96.5%, and 99.9% for non-rRNA-, non-mRNA-, and non-tRNA-binding proteins, respectively. The SVM classification system was further tested on a small class of snRNA-binding proteins with only 60 available sequences. The prediction accuracy is 40.0% and 99.9% for snRNA-binding and non-snRNA-binding proteins, indicating a need for a sufficient number of proteins to train SVM. The SVM classification systems trained in this work were added to our Web-based protein functional classification software SVMProt, at http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi. Our study suggests the potential of SVM as a useful tool for facilitating the prediction of protein-RNA interactions.

6.
Lipid binding proteins play important roles in signaling, regulation, membrane trafficking, immune response, lipid metabolism, and transport. Because of their functional and sequence diversity, it is desirable to explore additional methods for predicting lipid binding proteins irrespective of sequence similarity. This work explores the use of support vector machines (SVMs) as such a method. SVM prediction systems are developed using 14,776 lipid binding and 133,441 nonlipid binding proteins and are evaluated by an independent set of 6,768 lipid binding and 64,761 nonlipid binding proteins. The computed prediction accuracy is 78.9, 79.5, 82.2, 79.5, 84.4, 76.6, 90.6, 79.0, and 89.9% for lipid degradation, lipid metabolism, lipid synthesis, lipid transport, lipid binding, lipopolysaccharide biosynthesis, lipoprotein, lipoyl, and all lipid binding proteins, respectively. The accuracy for the nonmember proteins of each class is 99.9, 99.2, 99.6, 99.8, 99.9, 99.8, 98.5, 99.9, and 97.0%, respectively. Comparable accuracies are obtained when homologous proteins are considered as one, or by using a different SVM kernel function. Our method predicts 86.8% of the 76 lipid binding proteins nonhomologous to any protein in the Swiss-Prot database and 89.0% of the 73 known lipid binding domains as lipid binding. These findings suggest the usefulness of SVMs for facilitating the prediction of lipid binding proteins. Our software can be accessed at the SVMProt server (http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi).

7.
Fielding BC  Tan YJ  Shuo S  Tan TH  Ooi EE  Lim SG  Hong W  Goh PY 《Journal of virology》2004,78(14):7311-7318
A novel coronavirus (CoV) has been identified as the etiological agent of severe acute respiratory syndrome (SARS). The SARS-CoV genome encodes the characteristic essential CoV replication and structural proteins. Additionally, the genome contains six group-specific open reading frames (ORFs) larger than 50 amino acids, with no known homologues. As with the group-specific genes of the other CoVs, little is known about the SARS-CoV group-specific genes. SARS-CoV ORF7a encodes a putative unique 122-amino-acid protein, designated U122 in this study. The deduced sequence contains a probable cleaved signal sequence and a C-terminal transmembrane helix, indicating that U122 is likely to be a type I membrane protein. The C-terminal tail also contains a typical endoplasmic reticulum (ER) retrieval motif, KRKTE. U122 was expressed in SARS-CoV-infected Vero E6 cells, as it could be detected by Western blot and immunofluorescence analyses. U122 is localized to the perinuclear region of both SARS-CoV-infected and transfected cells and colocalized with ER and intermediate compartment markers. Mutational analyses showed that both the signal peptide sequence and ER retrieval motif were functional.

8.
9.
Identification of different protein functions facilitates a mechanistic understanding of Japanese encephalitis virus (JEV) infection and opens novel means for drug development. Support vector machines (SVM), useful for predicting the functional class of distantly related proteins, are employed to ascribe a possible functional class to JEV proteins. Our study, based on SVMProt and the available JE virus sequences, suggests that the structural and nonstructural proteins of the JEV genome possibly belong to diverse protein functional classes that are expected to occur in the life cycle of JE virus. Protein functions common to both structural and nonstructural proteins are iron-binding, metal-binding, lipid-binding, copper-binding, transmembrane, outer membrane, and the channels/pores - pore-forming toxins (proteins and peptides) group of proteins. Nonstructural proteins perform functions such as actin binding, zinc-binding, calcium-binding, hydrolases, carbon-oxygen lyases and P-type ATPase, and include proteins belonging to the major facilitator family (MFS), the secreting main terminal branch (MTB) family, phosphotransfer-driven group translocators and the ATP-binding cassette (ABC) family. Structural proteins, besides belonging to the same structural group of proteins (capsid, structural, envelope), also perform functions such as nuclear receptor, antibiotic resistance, RNA-binding, DNA-binding, magnesium-binding, isomerase (intramolecular) and oxidoreductase, and participate in the type II (general) secretory pathway (IISP).

10.
11.

Background

Mimivirus, isolated from A. polyphaga, is the largest virus discovered so far. It is unique among all the viruses in having genes related to translation, DNA repair and replication which bear close homology to eukaryotic genes. Nevertheless, only a small fraction of the proteins (33%) encoded in this genome has been assigned a function. Furthermore, a large fraction of the unassigned protein sequences bear no sequence similarity to proteins from other genomes. These sequences are referred to as ORFans. Because of their lack of sequence similarity to other proteins, they cannot be assigned putative functions using standard sequence comparison methods. As part of our genome-wide computational efforts aimed at characterizing Mimivirus ORFans, we have applied fold-recognition methods to predict the structure of these ORFans, and have then derived functions based on the conservation of functionally important residues in sequence-template alignments.

Results

Using fold recognition, we have identified highly confident computational 3D structural assignments for 21 Mimivirus ORFans. In addition, highly confident functional predictions for 6 of these ORFans were derived by analyzing the conservation of functional motifs between the predicted structures and proteins of known function. This analysis allowed us to classify these 6 previously unannotated ORFans into their specific protein families: carboxylesterase/thioesterase, metal-dependent deacetylase, P-loop kinases, 3-methyladenine DNA glycosylase, BTB domain and eukaryotic translation initiation factor eIF4E.

Conclusion

Using stringent fold recognition criteria we have assigned three-dimensional structures for 21 of the ORFans encoded in the Mimivirus genome. Further, based on the 3D models and an analysis of the conservation of functionally important residues and motifs, we were able to derive functional attributes for 6 of the ORFans. Our computational identification of important functional sites in these ORFans can be the basis for a subsequent experimental verification of our predictions. Further computational and experimental studies are required to elucidate the 3D structures and functions of the remaining Mimivirus ORFans.

12.
Of the membrane proteins of known structure, we found that a remarkable 67% of the water soluble domains are structurally similar to water soluble proteins of known structure. Moreover, 41% of known water soluble protein structures share a domain with an already known membrane protein structure. We also found that functional residues are frequently conserved between extramembrane domains of membrane and soluble proteins that share structural similarity. These results suggest membrane and soluble proteins readily exchange domains and their attendant functionalities. The exchanges between membrane and soluble proteins are particularly frequent in eukaryotes, indicating that this is an important mechanism for increasing functional complexity. The high level of structural overlap between the two classes of proteins provides an opportunity to employ the extensive information on soluble proteins to illuminate membrane protein structure and function, for which much less is known. To this end, we employed structure guided sequence alignment to elucidate the functions of membrane proteins in the human genome. Our results bridge the gap of fold space between membrane and water soluble proteins and provide a resource for the prediction of membrane protein function. A database of predicted structural and functional relationships for proteins in the human genome is provided at sbi.postech.ac.kr/emdmp.

13.
MOTIVATION: The recent outbreak of severe acute respiratory syndrome (SARS) caused by SARS coronavirus (SARS-CoV) has necessitated an in-depth molecular understanding of the virus to identify new drug targets. The availability of the complete genome sequences of several strains of SARS virus makes it possible to identify protein-coding genes and define their functions. A computational approach to identifying protein-coding genes and their putative functions will help in designing experimental protocols. RESULTS: In this paper, a novel analysis of the SARS genome using the gene prediction method GeneDecipher, developed in our laboratory, is presented. Each of the 18 newly sequenced SARS-CoV genomes has been analyzed using GeneDecipher. In addition to polyprotein 1ab(1), polyprotein 1a and the four genes coding for major structural proteins spike (S), small envelope (E), membrane (M) and nucleocapsid (N), six to eight additional proteins have been predicted depending upon the strain analyzed. Their lengths range between 61 and 274 amino acids. Our method also suggests that polyprotein 1ab, polyprotein 1a, S, M and N are proteins of viral origin and the others are of prokaryotic origin. Putative functions of all predicted protein-coding genes have been suggested using conserved peptides present in their open reading frames. AVAILABILITY: Detailed results of GeneDecipher analysis of all the 18 strains of SARS-CoV genomes are available at http://www.igib.res.in/sarsanalysis.html

14.
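Direct sequencing of a stool sample from a SARS patient yielded the complete genome sequence of SARS-CoV BJ202 (AY864806). Comparative genomics methods were used to analyze BJ202 together with the 115 SARS-CoV genome sequences published in GenBank. With the GZ02 sequence as the reference, a total of 278 single-nucleotide polymorphism (SNP) sites were found to be present in two or more genomes. The polymorphic sites were unevenly distributed across the SARS-CoV genome: about half of the variant sites (50.4%, 140/278) fell in the one-third of the genome at the 3' end. The regions encoding Orf10-11, Orf3/4 and the E, M and S proteins had relatively high mutation rates. Eleven cDNA fragments containing 12 polymorphic sites of the BJ202 genome and four cDNA fragments containing no known polymorphic sites were cloned and sequenced (15 fragments, 6.0 kb in total). The results showed that two different nucleotides were detected at each of the three polymorphic sites unique to BJ202 (13804, 15031 and 20792) and at three other polymorphic sites (26428, 26477 and 27243). Position 18379, although no mutation had been found there in the 115 published SARS-CoV genomes, is in fact also a polymorphic site: of 14 clones, 8 carried A and 6 carried G at this position. Across all 116 SARS-CoV genomes there were 18 types of deletion and 2 types of insertion. Most deletions occurred in the region encoding ORF9 and ORF10-11 (positions 27700-28000 bp of the genome sequence). A phylogenetic tree of the 116 SARS-CoV genomes was constructed by the neighbor-joining method; BJ202 is relatively closely related to SARS-CoV strains such as BJ01 and LLJ-2004.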

15.
In this work, the severe acute respiratory syndrome associated coronavirus (SARS-CoV) genome BJ202 (AY864806) was completely sequenced. The genome was obtained directly from the stool sample of a patient in Beijing. Comparative genomics methods were used to analyze the sequence variations of 116 SARS-CoV genomes (including BJ202) available in the NCBI GenBank. With the genome sequence of GZ02 as the reference, there were 41 polymorphic sites identified in BJ202 and a total of 278 polymorphic sites present in at least two of the 116 genomes. The distribution of the polymorphic sites was biased over the whole genome. Nearly half of the variations (50.4%, 140/278) clustered in the one third of the whole genome at the 3′ end (19.0 kb-29.7 kb). Regions encoding Orf10-11, Orf3/4, E, M and S protein had the highest mutation rates. A total of 15 PCR products (about 6.0 kb of the genome), including 11 fragments containing 12 known polymorphic sites and 4 fragments without identified polymorphic sites, were cloned and sequenced. Results showed that 3 unique polymorphic sites of BJ202 (positions 13804, 15031 and 20792) along with 3 other polymorphic sites (26428, 26477 and 27243) all contained 2 kinds of nucleotides. Interestingly, position 18379, which has not been identified as polymorphic in any of the other 115 published SARS-CoV genomes, is actually a polymorphic site. The nucleotide composition at this site is A (8 clones) to G (6 clones). Among the 116 SARS-CoV genomes, 18 types of deletions and 2 insertions were identified. Most of them were related to a 300 bp region (27700-28000) which encodes parts of the putative ORF9 and ORF10-11. A phylogenetic tree illustrating the divergence of the whole BJ202 genome from 115 other completely sequenced SARS-CoVs was also constructed. BJ202 was phylogenetically closer to BJ01 and LLJ-2004.
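The claim that the polymorphic sites are biased toward the 3′ end can be checked with the numbers in the abstract: 140 of 278 sites lie in the 19.0-29.7 kb region. The sketch below compares the share of sites with the share of genome length; taking 29.7 kb as the total genome length is an assumption read off the quoted region boundary, and all names are illustrative.

```python
def percent(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Figures quoted in the abstract.
TOTAL_SITES = 278
SITES_IN_3PRIME_REGION = 140
REGION_START_KB, REGION_END_KB = 19.0, 29.7

# Assumption: the genome ends where the quoted 3' region ends (29.7 kb).
GENOME_LENGTH_KB = 29.7

site_share = percent(SITES_IN_3PRIME_REGION, TOTAL_SITES)
length_share = percent(REGION_END_KB - REGION_START_KB, GENOME_LENGTH_KB)

# A ratio above 1 means the region holds more sites than its length alone predicts.
enrichment = site_share / length_share

print(f"{site_share:.1f}% of sites fall in {length_share:.1f}% of the genome")
print(f"enrichment factor: {enrichment:.2f}")
```

Under a uniform distribution the two percentages would be about equal, so a ratio clearly above 1 is what "biased" means here.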

16.
17.
Severe acute respiratory syndrome-associated coronavirus (SARS-CoV) is a newly identified member of the family Coronaviridae and poses a serious public health threat. Recent studies indicated that the SARS-CoV viral spike glycoprotein is a class I viral fusion protein. A fusion peptide present at the N-terminal region of class I viral fusion proteins is believed to initiate viral and cell membrane interactions and subsequent fusion. Although the SARS-CoV fusion protein heptad repeats have been well characterized, the fusion peptide has yet to be identified. Based on the conserved features of known viral fusion peptides and using Wimley and White interfacial hydrophobicity plots, we have identified two putative fusion peptides (SARS(WW-I) and SARS(WW-II)) at the N terminus of the SARS-CoV S2 subunit. Both peptides are hydrophobic and rich in alanine, glycine, and/or phenylalanine residues and contain a canonical fusion tripeptide along with a central proline residue. Only the SARS(WW-I) peptide strongly partitioned into the membranes of large unilamellar vesicles (LUV), adopting a beta-sheet structure. Likewise, only SARS(WW-I) induced the fusion of LUV and caused membrane leakage of vesicle contents at peptide/lipid ratios of 1:50 and 1:100, respectively. The activity of this synthetic peptide appeared to be dependent on its amino acid (aa) sequence, as scrambling the peptide rendered it unable to partition into LUV, assume a defined secondary structure, or induce both fusion and leakage of LUV. Based on the activity of SARS(WW-I), we propose that the hydrophobic stretch of 19 aa corresponding to residues 770 to 788 is a fusion peptide of the SARS-CoV S2 subunit.
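A hydrophobicity plot of the kind mentioned in this abstract is a sliding-window average of per-residue scores along the sequence. The sketch below shows that scan under stated assumptions: it uses the Kyte-Doolittle hydropathy scale as a stand-in, not the Wimley-White interfacial scale the authors used, and the 19-residue default window simply mirrors the length of the proposed peptide. The function names and the toy sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values (stand-in scale; NOT the Wimley-White
# interfacial scale used in the study).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(sequence: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every window of the given length.

    Raises KeyError if the sequence contains a non-standard residue code.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in seq[start:start + window]) / window
        for start in range(len(seq) - window + 1)
    ]


def most_hydrophobic_window(sequence: str, window: int = 19) -> tuple[int, float]:
    """Return (1-based start position, mean score) of the top-scoring window."""
    scores = window_scores(sequence, window)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores[best]


# Hypothetical toy sequence: a hydrophobic core flanked by charged residues.
start, score = most_hydrophobic_window("KKKKIIIIKKKK", window=4)
print(start, score)  # prints 5 4.5
```

A scan like this only nominates candidate stretches; in the study the candidates were then tested experimentally for membrane partitioning and fusion activity.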

18.
The crystal structure of a conserved domain of nonstructural protein 3 (nsP3) from severe acute respiratory syndrome coronavirus (SARS-CoV) has been solved by single-wavelength anomalous dispersion to 1.4 Å resolution. The structure of this "X" domain, seen in many single-stranded RNA viruses, reveals a three-layered alpha/beta/alpha core with a macro-H2A-like fold. The putative active site is a solvent-exposed cleft that is conserved in its three structural homologs, yeast Ymx7, Archaeoglobus fulgidus AF1521, and Er58 from E. coli. Its sequence is similar to yeast YBR022W (also known as Poa1P), a known phosphatase that acts on ADP-ribose-1'-phosphate (Appr-1'-p). The SARS nsP3 domain readily removes the 1' phosphate group from Appr-1'-p in in vitro assays, confirming its phosphatase activity. Sequence and structure comparison of all known macro-H2A domains combined with available functional data suggests that proteins of this superfamily form an emerging group of nucleotide phosphatases that dephosphorylate Appr-1'-p.

19.
A number of structural genomics/proteomics initiatives are focused on bacterial or viral pathogens. In this article, we will review the progress of structural proteomics initiatives targeting the SARS coronavirus (SARS-CoV), the etiological agent of the 2003 worldwide epidemic that culminated in approximately 8,000 cases and 800 deaths. The SARS-CoV genome encodes 28 proteins in three distinct classes, many of them with unknown function and sharing low similarity to other proteins. The structures of 16 SARS-CoV proteins or functional domains have been determined to date. Remarkably, eight of these 16 proteins or functional domains have novel folds, indicating the uniqueness of the coronavirus proteins. The results of SARS-CoV structural proteomics initiatives will have several profound biological impacts, including elucidation of the structure-function relationships of coronavirus proteins; identification of targets for the design of anti-viral compounds against SARS-CoV and other coronaviruses; and addition of new protein folds to the fold space, with further understanding of the structure-function relationships for several new protein families. We discuss the use of structural proteomics in response to emerging infectious diseases such as SARS-CoV and to increase preparedness against future emerging coronaviruses.

20.
Although Pyrococcus furiosus is one of the best studied hyperthermophilic archaea, to date no experimental investigation of the extent of protein secretion has been performed. We describe experimental verification of the extracellular proteome of P. furiosus grown on starch. LC–MS/MS-based analysis of culture supernatants led to the identification of 58 proteins. Fifteen of these proteins had a putative N-terminal signal peptide (SP), tagging the proteins for translocation across the membrane. The detected proteins with predicted SPs and known function were almost exclusively involved in important extracellular functions, like substrate degradation or transport. Most of the 43 proteins without predicted N-terminal signal sequences are known to have intracellular functions, mainly (70%) related to intracellular metabolism. In silico analyses indicated that the genome of P. furiosus encodes 145 proteins with N-terminal SPs, including 21 putative lipoproteins and 17 with a class III peptide. From these we identified 15 (10%; 7 SPI, 3 SPIII and 5 lipoproteins) under the specific growth conditions of this study. The putative lipoprotein signal peptides have a unique sequence motif, distinct from the motifs in bacteria and other archaeal orders.
