Similar Articles
13 similar articles found
1.
Guo J  Lin Y  Liu X 《Proteomics》2006,6(19):5099-5105
This paper proposes a new integrative system, GNBSL (Gram-negative bacteria subcellular localization), for predicting the subcellular localization of Gram-negative bacterial proteins. First, the system generates a position-specific frequency matrix (PSFM) and a position-specific scoring matrix (PSSM) for each protein sequence by searching the Swiss-Prot database. Four modules then extract different features from the PSFM and the PSSM: whole-sequence amino acid composition, N- and C-terminal amino acid composition, dipeptide composition, and segment composition. Four probabilistic neural network (PNN) classifiers are applied to the features from these modules. To further improve performance, two modules trained by support vector machines (SVMs) are added to the system: one extracts the residue-couple distribution from the amino acid sequence, and the other applies a pairwise profile alignment kernel to measure the local similarity between every two sequences. Finally, an additional SVM fuses the outputs of the six modules. Tests on a benchmark dataset show that the overall success rate of GNBSL is higher than those of PSORT-B, CELLO, and PSLpred. The GNBSL web server is available at http://166.111.24.5/webtools/GNBSL/index.htm.
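GNBSL derives its composition features from the PSFM/PSSM profiles, and the abstract does not give the exact formulas. As a simpler illustration only, the sketch below computes the analogous features directly from the raw amino acid sequence: whole-sequence composition, N- and C-terminal composition, and dipeptide composition. The terminal window length `n=25` is an assumption, not a value from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aa_composition(seq: str) -> list[float]:
    """Fraction of each standard amino acid in seq (20 values)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for ch in seq.upper():
        if ch in counts:  # skip non-standard symbols such as X or B
            counts[ch] += 1
            total += 1
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def terminal_composition(seq: str, n: int = 25) -> list[float]:
    """Composition of the first n and last n residues (40 values); n is an assumed window."""
    return aa_composition(seq[:n]) + aa_composition(seq[-n:])


def dipeptide_composition(seq: str) -> list[float]:
    """Fraction of each of the 400 ordered pairs of adjacent standard residues."""
    index = {dp: i for i, dp in enumerate(DIPEPTIDES)}
    counts = [0] * len(DIPEPTIDES)
    s = seq.upper()
    total = 0
    for i in range(len(s) - 1):
        j = index.get(s[i:i + 2])
        if j is not None:  # pairs containing a non-standard residue are ignored
            counts[j] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]
```

For example, `dipeptide_composition("ACAC")` gives 2/3 for the pair AC and 1/3 for CA, since the sequence contains the adjacent pairs AC, CA, AC.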

2.
Despite studies of the mechanisms underlying the intracellular localization of membrane proteins, the specific mechanisms by which each membrane protein localizes to the endoplasmic reticulum, Golgi apparatus, or plasma membrane in the secretory pathway remain unclear. In this study, a discriminant analysis of endoplasmic reticulum-, Golgi apparatus-, and plasma membrane-localized type II membrane proteins was performed using a position-specific scoring matrix derived from the amino acid propensities of the sequences around signal-anchors. The possibility that the sequence around the signal-anchor is a factor distinguishing each localization group was evaluated. The discrimination accuracy between the Golgi apparatus- and plasma membrane-localized type II membrane proteins was as high as 90%, indicating that, in addition to other factors, the sequence around the signal-anchor is an essential component of the selection mechanism for Golgi and plasma membrane localization. These results may improve the use of membrane proteins for drug delivery and therapeutic applications.
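A position-specific scoring matrix of this kind can be sketched as per-position log-odds of residue frequencies in windows aligned on the signal-anchor, with a new window assigned to whichever group's matrix scores it higher. The window length, pseudocount, uniform background, and the toy windows in the example are assumptions; the abstract does not state the authors' exact scoring or decision rule.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(windows, pseudocount=1.0, background=None):
    """Log-odds matrix: pssm[pos][aa] = log2(f(pos, aa) / b(aa)).

    windows: equal-length sequences aligned on the same anchor position.
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    if background is None:
        # Uniform background frequencies (assumption).
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    pssm = []
    for pos in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for w in windows:
            if w[pos] in counts:
                counts[w[pos]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background[aa])
                     for aa in AMINO_ACIDS})
    return pssm


def score(pssm, window):
    """Sum of position-specific log-odds; non-standard residues contribute 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))


def classify(window, pssm_a, pssm_b):
    """Assign the window to group 'a' if its matrix scores higher, else 'b'."""
    return "a" if score(pssm_a, window) > score(pssm_b, window) else "b"
```

In use, one matrix would be built from windows of Golgi-localized proteins and another from plasma membrane-localized ones; the pseudocount keeps residues unseen in a small training set from producing log(0).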

3.
Nair R  Rost B 《Proteins》2003,53(4):917-930
The native subcellular compartment of a protein is one aspect of its function. Thus, predicting localization is an important step toward predicting function. Short zip code-like sequence fragments regulate some of the shuttling between compartments. Cataloguing and predicting such motifs is the most accurate means of determining localization in silico. However, only a few motifs are currently known, and not all trafficking appears to be regulated in this way. The amino acid composition of a protein correlates with its localization, and all general prediction methods have exploited this observation. Here, we explored the evolutionary information contained in multiple alignments and aspects of protein structure to predict localization in the absence of homology and targeting motifs. Our final system combined statistical rules and a variety of neural networks to achieve an overall four-state accuracy above 65%, a significant improvement over systems using only composition. The system performed best for extracellular and nuclear proteins; it was significantly less accurate than TargetP for mitochondrial proteins. Interestingly, all methods that were developed on SWISS-PROT sequences failed grossly when fed sequences from proteins of known structure taken from PDB. We therefore developed two separate systems: one for proteins of known structure and one for proteins of unknown structure. Finally, we applied the PDB-based system along with homology-based inferences and automatic text analysis to annotate all eukaryotic proteins in the PDB (http://cubic.bioc.columbia.edu/db/LOC3D). We imagine that this pilot method, certainly in combination with similar tools, may be valuable for target selection in structural genomics.

4.
5.

Background

Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations and applied it to the Gene Ontology (GO) sequence database (GOSeqLite). This method involved artificially adding errors to sequence annotations at known rates and using regression to model the impact on the precision of annotations based on BLAST-matched sequences.
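The mechanics of the error-injection step, and of fitting a regression line of measured precision against the injected error rate, can be sketched as follows. The abstract does not specify the regression model or how the baseline error rate is read off it, so this sketch does not reproduce the reported estimate. `measure_precision` is a hypothetical callback standing in for the BLAST-based annotation-transfer evaluation, and the labels are toy values.

```python
import random


def inject_errors(labels, rate, label_pool, rng):
    """Replace each label, with probability `rate`, by a different label from label_pool."""
    noisy = []
    for lab in labels:
        if rng.random() < rate:
            alternatives = [x for x in label_pool if x != lab]
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(lab)
    return noisy


def precision_curve(labels, label_pool, rates, measure_precision, seed=0):
    """Precision measured (by a caller-supplied, hypothetical callback) at each injected rate."""
    rng = random.Random(seed)
    return [measure_precision(inject_errors(labels, r, label_pool, rng)) for r in rates]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope
```

Because the injected rates are known exactly, the fitted slope describes how sensitive the precision measurement is to annotation errors; relating that curve back to the unknown pre-existing error rate requires the paper's own model.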

Results

We estimated the error rate of curated GO sequence annotations in the GOSeqLite database (March 2006) at between 28% and 30%. Annotations made without use of sequence similarity based methods (non-ISS) had an estimated error rate of between 13% and 18%. Annotations made with the use of sequence similarity methodology (ISS) had an estimated error rate of 49%.

Conclusion

While the overall error rate is reasonably low, it would be prudent to treat all ISS annotations with caution. Electronic annotators that use ISS annotations as the basis of predictions are likely to have higher false prediction rates, and for this reason designers of these systems should consider avoiding ISS annotations where possible. Electronic annotators that use ISS annotations to make predictions should be viewed sceptically. We recommend that curators thoroughly review ISS annotations before accepting them as valid. Overall, users of curated sequence annotations from the GO database should feel assured that they are using a comparatively high-quality source of information.

6.
Shen HB  Chou KC 《Biopolymers》2007,85(3):233-240
Viruses can reproduce their progeny only within a host cell, and their actions depend both on their destructive tendencies toward a specific host cell and on environmental conditions. Therefore, knowledge of the subcellular localization of viral proteins in a host cell or virus-infected cell is very useful for in-depth study of their functions and mechanisms, as well as for designing antiviral drugs. An analysis of the Swiss-Prot database (version 50.0, released on May 30, 2006) indicates that only 23.5% of viral protein entries are annotated with their subcellular locations in this regard. For the Gene Ontology database, the corresponding percentage is 23.8%. Such a gap calls for the development of high-throughput tools for timely annotation of the localization of viral proteins within host and virus-infected cells. In this article, a predictor called "Virus-PLoc" has been developed that fuses many basic classifiers, each engineered according to the K-nearest neighbor rule. The overall jackknife success rate obtained by Virus-PLoc in identifying the subcellular compartments of viral proteins was 80% on a benchmark dataset in which none of the proteins has more than 25% sequence identity to any other in the same location. Virus-PLoc will be freely available as a web server at http://202.120.37.186/bioinf/virus for public use. Furthermore, Virus-PLoc has been used to provide large-scale predictions for all viral protein entries in the Swiss-Prot database that lack subcellular location annotations or are annotated as uncertain. The results have been deposited in a downloadable Microsoft Excel file named "Tab_Virus-PLoc.xls." This file is available at the same website and will be updated twice a year to include new viral protein entries and reflect the continuing development of Virus-PLoc.
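An ensemble of K-nearest-neighbor classifiers, evaluated with the jackknife (leave-one-out) test, can be sketched in a few lines. As an assumption, the basic classifiers here differ only in k and are fused by majority vote over Euclidean distances; Virus-PLoc's actual feature vectors, distance measure, and fusion rule are not given in the abstract, and the two-dimensional points in the example are toy data.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """train: list of (feature_vector, label); majority vote among the k nearest (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def ensemble_predict(train, query, ks=(1, 3, 5)):
    """Fuse several KNN classifiers (one per k, an assumed design) by majority vote."""
    votes = Counter(knn_predict(train, query, k) for k in ks)
    return votes.most_common(1)[0][0]


def jackknife_accuracy(data, predict):
    """Leave-one-out: predict each sample from all the others; return the success rate."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        correct += predict(rest, x) == y
    return correct / len(data)
```

The jackknife test is deterministic for a given dataset, which is one reason it is favored for comparing predictors on the same benchmark: every sample is tested exactly once against a model built from all the others.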

7.
Revealing the subcellular location of newly discovered protein sequences can provide insight into their function and guide research at the cellular level. The rapidly increasing number of sequences entering genome databanks has called for the development of automated analysis methods. Currently, most existing methods for predicting protein subcellular locations cover only one species or a very limited number of species. Therefore, it is necessary to develop reliable and effective computational approaches that further improve the performance of protein subcellular location prediction and, at the same time, cover more species. The current study reports the development of a novel predictor called MSLoc-DT to predict the protein subcellular locations of human, animal, plant, bacteria, virus, fungi, and archaea by introducing a novel feature extraction approach termed Amino Acid Index Distribution (AAID) and then fusing gene ontology information, sequential evolutionary information, and sequence statistical information through four different modes of pseudo amino acid composition (PseAAC) with a decision template rule. Using the jackknife test, MSLoc-DT achieves 86.5, 98.3, 90.3, 98.5, 95.9, 98.1, and 99.3% overall accuracy for human, animal, plant, bacteria, virus, fungi, and archaea, respectively, on seven stringent benchmark datasets. Compared with other predictors (e.g., Gpos-PLoc, Gneg-PLoc, Virus-PLoc, Plant-PLoc, Plant-mPLoc, ProLoc-Go, Hum-PLoc, GOASVM) on the Gram-positive, Gram-negative, virus, plant, eukaryotic, and human datasets, the new MSLoc-DT predictor is much more effective and robust. Although the MSLoc-DT predictor is designed to predict the single location of proteins, our method can be extended to multiple locations of proteins by introducing multilabel machine learning approaches, such as the support vector machine and deep learning, as substitutes for the K-nearest neighbor (KNN) method.
As a user-friendly web server, MSLoc-DT is freely accessible at http://bioinfo.ibp.ac.cn/MSLOC_DT/index.html.
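The decision template rule, following Kuncheva's formulation, averages for each class the decision profiles of the training samples in that class, then assigns a new sample to the class whose template is closest to its profile. A minimal sketch follows. In MSLoc-DT the profiles would come from the base classifiers (KNN on the four PseAAC modes); here the profiles are made-up numbers, and squared Euclidean distance is an assumed choice of measure.

```python
def mean_matrix(mats):
    """Element-wise mean of equally shaped matrices (lists of rows)."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[r][c] for m in mats) / len(mats) for c in range(cols)] for r in range(rows)]


def build_templates(profiles, labels):
    """profiles: one decision profile per training sample, an L x C matrix whose
    row l holds base classifier l's support for each of the C classes."""
    by_class = {}
    for dp, y in zip(profiles, labels):
        by_class.setdefault(y, []).append(dp)
    return {y: mean_matrix(dps) for y, dps in by_class.items()}


def sq_distance(a, b):
    """Squared Euclidean distance between two matrices of the same shape."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def dt_classify(profile, templates):
    """Pick the class whose template is closest to the sample's decision profile."""
    return min(templates, key=lambda y: sq_distance(profile, templates[y]))
```

Unlike a simple majority vote, the template records how each base classifier typically responds to each class, including its systematic confusions, so a classifier that is reliably wrong in a consistent way still contributes useful evidence.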

8.
9.
Given an uncharacterized protein sequence, how can we identify whether it is a membrane protein? If it is, to which membrane protein type does it belong? These questions are important because they are closely related to the biological function of the query protein and to its interactions with other molecules in a biological system. In particular, with the avalanche of protein sequences generated in the post-genomic age and the much slower progress in determining their functions by biochemical experiments, it is highly desirable to develop an automated method that can help address these questions. In this study, a two-layer predictor called MemType-2L has been developed: the first-layer prediction engine identifies a query protein as membrane or non-membrane; if it is a membrane protein, the process automatically continues with the second-layer prediction engine, which further identifies its type among the following eight categories: (1) type I, (2) type II, (3) type III, (4) type IV, (5) multipass, (6) lipid-chain-anchored, (7) GPI-anchored, and (8) peripheral. MemType-2L incorporates evolutionary information by representing protein samples as Pse-PSSM (pseudo position-specific score matrix) vectors, and it uses an ensemble classifier formed by fusing many powerful individual OET-KNN (optimized evidence-theoretic K-nearest neighbor) classifiers. The success rates obtained by MemType-2L on a newly constructed stringent dataset, by both the jackknife test and the independent dataset test, are quite high, indicating that MemType-2L may become a very useful high-throughput tool. As a web server, MemType-2L is freely accessible to the public at http://chou.med.harvard.edu/bioinf/MemType.
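Pse-PSSM, as it is commonly defined, standardizes each row of an L×20 PSSM, averages each of the 20 columns over the sequence, and appends sequence-order terms: for lag λ, the mean squared difference between values in the same column λ positions apart. The sketch below follows that formulation. The maximum lag used by MemType-2L is not stated in the abstract, so `max_lag` is a free parameter; the raw PSSM would normally come from a PSI-BLAST search, and the matrices in the example are toy values.

```python
import math


def normalize_pssm(raw):
    """Standardize each row (residue position) of an L x 20 PSSM to zero mean, unit variance."""
    out = []
    for row in raw:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((v - mean) ** 2 for v in row) / len(row))
        out.append([(v - mean) / sd if sd > 0 else 0.0 for v in row])
    return out


def pse_pssm(raw, max_lag=1):
    """20 column means, then 20 sequence-order terms for each lag 1..max_lag."""
    p = normalize_pssm(raw)
    length, width = len(p), len(p[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    features = [sum(p[i][j] for i in range(length)) / length for j in range(width)]
    for lag in range(1, max_lag + 1):
        for j in range(width):
            diffs = sum((p[i][j] - p[i + lag][j]) ** 2 for i in range(length - lag))
            features.append(diffs / (length - lag))
    return features
```

The result has a fixed length of 20 × (1 + `max_lag`) regardless of sequence length, which is what lets variable-length proteins be fed to a KNN-style classifier.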

10.
Recent advances in large-scale genome sequencing have led to the rapid accumulation of amino acid sequences of proteins whose functions are unknown. Since the functions of these proteins are closely correlated with their subcellular localizations, many efforts have been made to develop methods for predicting protein subcellular location. In this study, based on a strategy of hybridizing the functional domain composition and the pseudo-amino acid composition (Cai and Chou [2003]: Biochem. Biophys. Res. Commun. 305:407-411), the Intimate Sorting Algorithm (ISort predictor) was developed for predicting protein subcellular location. As a showcase, the same plant and non-plant protein datasets investigated by previous researchers were used for demonstration. The overall jackknife success rate was 85.4% for the plant protein dataset and 91.9% for the non-plant protein dataset. These are the highest success rates so far achieved on the two datasets under a rigorous cross-validation procedure, further confirming that such a hybrid approach may become a very useful high-throughput tool in bioinformatics, proteomics, and molecular cell biology.

11.
Prediction of protein subcellular location is a meaningful task that has attracted much attention in recent years. Many subcellular location predictors have been developed, but most can handle only single-location proteins. However, some proteins may belong to two or more subcellular locations. It is important to develop predictors that can handle such multiplex proteins, because they have extremely useful implications for both basic biological research and drug discovery. Given that the number of methods dealing with multiplex proteins is limited, it is worthwhile to explore new methods that can predict the subcellular locations of proteins with both single and multiple sites. Applying different feature extraction methods and different prediction models to different benchmark datasets may yield more general conclusions. In this paper, two different feature extraction methods and two different neural network models were applied to three benchmark datasets of different kinds of proteins, namely datasets constructed specifically for Gram-positive bacterial proteins, plant proteins, and virus proteins. These benchmark datasets have different numbers of location sites. The results show that the RBF neural network clearly outperforms the BP neural network on these datasets, regardless of which feature extraction method is chosen.
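To make the RBF network concrete: in its simplest (exact-interpolation) form, each training sample becomes a Gaussian hidden unit, and the linear output weights are solved from one-hot class targets. The sketch below uses that form with a small Gaussian-elimination solver to stay dependency-free. The Gaussian width, the one-unit-per-sample design, and the XOR-style toy data are simplifications for illustration, not the configuration used in the paper, and no BP network is shown.

```python
import math


def gaussian(x, c, width):
    """Gaussian radial basis function centered at c."""
    return math.exp(-math.dist(x, c) ** 2 / (2 * width ** 2))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (a is square)."""
    n = len(a)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_rbf(xs, labels, width=1.0):
    """One Gaussian unit per training sample; output weights solved per class (one-hot)."""
    classes = sorted(set(labels))
    phi = [[gaussian(x, c, width) for c in xs] for x in xs]
    weights = {k: solve(phi, [1.0 if y == k else 0.0 for y in labels]) for k in classes}
    return xs, weights, width


def predict_rbf(model, x):
    """Return the class whose output unit has the largest activation."""
    centers, weights, width = model
    h = [gaussian(x, c, width) for c in centers]
    return max(weights, key=lambda k: sum(w * v for w, v in zip(weights[k], h)))
```

With distinct training points the Gaussian matrix is positive definite, so the system always has a solution; a BP network, by contrast, fits its weights by iterative gradient descent, which is one commonly cited reason RBF networks train faster.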

12.
13.
Arrays of MS2 binding sites are inserted into mRNAs and are commonly used to visualize mRNA localization in vivo through expression of an MS2-GFP fusion protein. In Saccharomyces cerevisiae, we observed that arrays of MS2 binding sites inhibit 5′ to 3′ degradation of the mRNA in yeast cells and lead to the accumulation of a 3′ mRNA fragment containing the MS2 binding sites. This accumulation can depend on the binding of the MS2 stem loops (MS2-SL) by MS2 coat proteins (MCPs). Since such decay fragments can still bind MCP-GFP, the localization of these mRNA fragments can complicate the interpretation of the localization of full-length mRNA in vivo.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417