20 similar records found (search time: 15 ms)
1.
Jiangning Song, Hao Tan, Andrew J. Perry, Tatsuya Akutsu, Geoffrey I. Webb, James C. Whisstock, Robert N. Pike. PLoS ONE 2012, 7(11)
The ability to catalytically cleave protein substrates after synthesis is fundamental to all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of a protease's substrate specificity can dramatically improve our ability to predict its target protein substrates, but this information must be used effectively if protein substrates are to be identified efficiently by in silico approaches. To address this problem, we present PROSPER, an integrated feature-based server for in silico identification of protease substrates and their cleavage sites for twenty-four different proteases. PROSPER combines established specificity information for these proteases (derived from the MEROPS database) with a machine learning approach that predicts cleavage sites from different but complementary sequence and structure characteristics. The features used by PROSPER include the local amino acid sequence profile, predicted secondary structure, solvent accessibility and predicted native disorder. Thus, for proteases with known amino acid specificity, PROSPER provides a convenient, ready-to-use tool for identifying protein substrates of these enzymes. Systematic prediction analysis for the twenty-four proteases currently included revealed that these features substantially improve the identification of known cleavage sites in substrates of these enzymes. Compared with two state-of-the-art prediction tools, PoPS and SitePrediction, PROSPER achieves greater accuracy and coverage.
To our knowledge, PROSPER is the first comprehensive server capable of predicting cleavage sites of multiple proteases within a single substrate sequence using machine learning techniques. It is freely available at http://lightning.med.monash.edu.au/PROSPER/.
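PROSPER's sequence features are derived from the residues flanking each candidate cleavage site. The sketch below is a minimal illustration, assuming the common Schechter and Berger convention of an eight-residue window (P4 to P4') around the scissile bond and a plain one-hot encoding; the window size, the padding symbol and the encoding are assumptions for illustration, not PROSPER's actual feature set, which also uses sequence profiles and predicted structural properties.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # placeholder for positions that fall beyond the sequence ends (assumption)


def cleavage_window(seq, site, left=4, right=4):
    """Return residues P4..P1 followed by P1'..P4' for one candidate bond.

    `site` is the 0-based index of P1', i.e. the bond lies between
    seq[site - 1] (P1) and seq[site] (P1'). Out-of-range positions are padded.
    """
    window = []
    for i in range(site - left, site + right):
        window.append(seq[i] if 0 <= i < len(seq) else PAD)
    return "".join(window)


def one_hot(window):
    """Flatten a window into 20 binary values per position (padding -> all zeros)."""
    vec = []
    for residue in window:
        vec.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vec


def all_windows(seq, left=4, right=4):
    """Windows for every internal peptide bond (P1' index 1 .. len(seq) - 1)."""
    return {site: cleavage_window(seq, site, left, right) for site in range(1, len(seq))}
```

Each window, labelled by whether a known cleavage occurs at that bond, could then be passed to any classifier.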
2.
Calpain, an intracellular Ca2+-dependent cysteine protease, is known to play a role in a wide range of metabolic pathways through limited proteolysis of its substrates. However, only a limited number of these substrates are currently known, and the exact mechanism of substrate recognition and cleavage by calpain remains largely unknown. Although previous research has successfully applied standard machine-learning algorithms to accurately predict substrate cleavage by other similar types of proteases, this approach does not extend well to calpain, possibly because of its particular mode of proteolytic action and the limited amount of experimental data. Using Multiple Kernel Learning, a recent extension of the classic Support Vector Machine framework, we were able to train complex models based on rich, heterogeneous feature sets, leading to significantly improved prediction quality (a 6% improvement over the highest AUC achieved by state-of-the-art methods). In addition to producing a stronger machine-learning model for predicting calpain cleavage, we were able to highlight the importance and role of each substrate-sequence feature in defining specificity: primary sequence, secondary structure and solvent accessibility. Most notably, we showed that there are significant specificity differences across calpain sub-types, contrary to previous assumptions. Prediction accuracy was further validated on an unbiased test set of mutated calpastatin sequences (calpastatin is the endogenous inhibitor of calpain) modified so that they no longer block calpain's proteolytic action. An online implementation of our prediction tool is available at http://calpain.org.
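Multiple Kernel Learning replaces the single kernel of an SVM with a combination of base kernels, typically one per feature group (here: primary sequence, secondary structure, solvent accessibility). The sketch below is a minimal illustration that assumes one RBF kernel per feature view and fixed, non-negative weights; in real MKL the weights are learned jointly with the classifier, and the view names, gammas and weights shown are hypothetical.

```python
import math


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel between two equal-length numeric vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def combined_kernel(x, y, weights, gammas):
    """Weighted sum of per-view RBF kernels.

    x and y map a feature-view name (e.g. "seq", "ss") to a numeric vector.
    In MKL the non-negative weights are learned; here they are supplied.
    """
    total = 0.0
    for view, w in weights.items():
        if w < 0:
            raise ValueError("kernel weights must be non-negative")
        total += w * rbf(x[view], y[view], gammas[view])
    return total


def gram_matrix(samples, weights, gammas):
    """Full kernel matrix over a list of multi-view samples."""
    n = len(samples)
    return [
        [combined_kernel(samples[i], samples[j], weights, gammas) for j in range(n)]
        for i in range(n)
    ]
```

A non-negative combination of valid kernels is itself a valid kernel, so the resulting matrix could be handed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.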
3.
ChangKug Kim, SooJin Kwon, GangSeob Lee, HwanKi Lee, JiWeon Choi, YongHwan Kim, JangHo Hahn. Bioinformation 2009, 3(8):344-345
The AllergenPro database is a web-based system that provides information about allergens in microbes, animals and plants. The database has three major parts and functions: (i) a database list; (ii) allergen search; and (iii) allergenicity prediction. It contains information on 2,434 allergens from microbes (712 records), animals (617 records) and plants (1,105 records). In addition, the database provides bioinformatics tools for allergenicity prediction: users can search for specific allergens in various ways and can run allergenicity prediction using three different methods.
Availability
The database is available for free at http://www.niab.go.kr/nabic/
4.
Shanmugam Anandakumar, Saravanan Vijayakumar, Nagarajan Arumugam, M. Michael Gromiha. Bioinformation 2015, 11(11):512-513
Mammalian Mitochondrial ncRNA is a web-based database that provides information on non-coding RNAs (ncRNAs) in mammals. It supports simple searching, sequence comparison with BLAST, and retrieval of information on the predicted structures and functions of mammalian ncRNAs.
Availability
The database is available for free at http://www.iitm.ac.in/bioinfo/mmndb/
5.
Chang-Kug Kim, Young-Joo Seol, Dong-Jun Lee, Jae-Hee Lee, Tae-Ho Lee, Dong-Suk Park. Bioinformation 2014, 10(10):664-666
The National Agricultural Biotechnology Information Center (NABIC) in South Korea reconstructed the RiceQTLPro database for gene positional analysis and structure prediction of the chromosomes. This database is an integrated web-based system providing information about quantitative trait loci (QTL) markers in the rice plant. RiceQTLPro has three main features: (1) a list of QTL markers, (2) keyword search of markers, and (3) search of marker positions on the rice chromosomes. The updated database provides information on 112 QTL markers together with 817 polymorphic markers distributed across the 12 rice chromosomes.
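The keyword and chromosome-position searches described above can be served from a simple per-chromosome sorted index. The sketch below is an illustration only, assuming markers carry a name, a chromosome number and a map position in centimorgans; the `Marker` and `MarkerIndex` names and the example marker names are hypothetical and do not reflect RiceQTLPro's actual schema.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str
    chromosome: int      # 1..12 for rice
    position_cm: float   # map position in centimorgans (assumed unit)


class MarkerIndex:
    """Per-chromosome sorted index supporting position-range and keyword queries."""

    def __init__(self, markers):
        by_chr = defaultdict(list)
        for m in markers:
            by_chr[m.chromosome].append(m)
        # Sort each chromosome's markers by position so ranges can be bisected.
        self._markers = {c: sorted(ms, key=lambda m: m.position_cm) for c, ms in by_chr.items()}
        self._positions = {c: [m.position_cm for m in ms] for c, ms in self._markers.items()}

    def in_range(self, chromosome, start_cm, end_cm):
        """Markers on `chromosome` with start_cm <= position <= end_cm."""
        positions = self._positions.get(chromosome, [])
        lo = bisect_left(positions, start_cm)
        hi = bisect_right(positions, end_cm)
        return self._markers.get(chromosome, [])[lo:hi]

    def search(self, keyword):
        """Case-insensitive substring search on marker names."""
        kw = keyword.lower()
        return [m for ms in self._markers.values() for m in ms if kw in m.name.lower()]
```

Range queries cost O(log n) per chromosome plus the size of the result, which suits read-mostly map data.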
Availability
The database is available for free at http://nabic.rda.go.kr/gere/rice/geneticMap/
6.
As one of the most important and ubiquitous post-translational modifications (PTMs) of proteins, S-nitrosylation plays important roles in a variety of biological processes, including the regulation of cellular dynamics and plasticity. Identifying S-nitrosylated substrates together with their exact sites is crucial for understanding the molecular mechanisms of S-nitrosylation. In contrast with labor-intensive and time-consuming experimental approaches, computational prediction of S-nitrosylation sites is convenient and fast. In this work, we developed GPS-SNO 1.0, a novel software tool for predicting S-nitrosylation sites. We substantially improved our previously developed algorithm and released it as the GPS 3.0 algorithm for GPS-SNO. In comparison with other methods, GPS 3.0 achieved better prediction performance, with an accuracy of 75.80%, a sensitivity of 53.57% and a specificity of 80.14%. As an application of GPS-SNO 1.0, we predicted putative S-nitrosylation sites in hundreds of potentially S-nitrosylated substrates whose exact S-nitrosylation sites had not been experimentally determined. GPS-SNO 1.0 should therefore prove a useful tool for experimentalists. The online service and local packages of GPS-SNO were implemented in Java and are freely available at http://sno.biocuckoo.org/.
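Sensitivity, specificity and accuracy, as reported above, follow the standard confusion-matrix definitions. The sketch below shows them; the counts in the usage are made up for illustration. Because accuracy equals the class-size-weighted mean of sensitivity and specificity, an accuracy close to the specificity implies that negative sites outnumber positive ones in the evaluation set.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true sites that are predicted
    specificity = tn / (tn + fp)   # fraction of non-sites correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 100 true sites, 100 non-sites.
sn, sp, ac = classification_metrics(tp=50, fn=50, tn=80, fp=20)
```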
7.
An important aspect of the functional annotation of enzymes is not only the type of reaction an enzyme catalyses but also its substrate specificity, which can vary widely within the same family. In many cases, family membership and even substrate specificity can be predicted from the enzyme sequence alone using a nearest neighbour classification rule. However, combining structural and sequence information can improve the interpretability and accuracy of predictive models. The method presented here, Active Site Classification (ASC), automatically extracts the residues lining the active site from one representative three-dimensional structure, together with the corresponding residues from the sequences of other family members. From a set of representatives with known substrate specificity, a Support Vector Machine (SVM) can then learn a model of substrate specificity. Applied to a sequence of unknown specificity, the SVM then predicts the most likely substrate. The models can also be analysed to reveal the structural reasons underlying substrate specificity and thus yield valuable insights into the mechanisms of enzyme specificity. We illustrate the high prediction accuracy achieved on two benchmark data sets, and the structural insights gained from ASC, through a detailed analysis of the family of decarboxylating dehydrogenases. The ASC web service is available at http://asc.informatik.uni-tuebingen.de/.
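The nearest neighbour rule mentioned above can be applied directly to aligned active-site residues. The sketch below illustrates that baseline rather than ASC's SVM, assuming each enzyme is represented by a string of active-site residues in a fixed alignment order and compared by Hamming distance; the signatures and substrate labels in the usage are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two aligned signatures differ."""
    if len(a) != len(b):
        raise ValueError("active-site signatures must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_substrate(query, labelled):
    """Predict the substrate of `query` from the closest labelled signature.

    labelled: list of (signature, substrate) pairs. Ties are broken by list order.
    """
    _, substrate = min(labelled, key=lambda item: hamming(query, item[0]))
    return substrate


# Hypothetical active-site signatures and labels.
training = [("DRHKS", "substrate_A"), ("DRHKA", "substrate_A"), ("ERYQS", "substrate_B")]
```

An SVM trained on the same residue encoding replaces this single-neighbour vote with a learned decision boundary, which is what allows per-residue weights to be inspected.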
8.
As one of the most widespread protein post-translational modifications, phosphorylation is involved in many biological processes, such as the cell cycle and apoptosis. Identifying phosphorylated substrates and their corresponding sites will facilitate understanding of the molecular mechanisms of phosphorylation. Compared with labor-intensive and time-consuming experimental approaches, computational prediction of phosphorylation sites is highly desirable because it is convenient and fast. In this paper, we developed a new bioinformatics tool, CKSAAP_PhSite, which ignores kinase information and uses only primary sequence information to predict protein phosphorylation sites. CKSAAP_PhSite encodes sequences by the composition of k-spaced amino acid pairs (CKSAAP) and uses a support vector machine as the predictor. For serine, CKSAAP_PhSite achieved a sensitivity of 84.81%, a specificity of 86.07% and an accuracy of 85.43%; for threonine, a sensitivity of 78.59%, a specificity of 82.26% and an accuracy of 80.31%; and for tyrosine, a sensitivity of 74.44%, a specificity of 78.03% and an accuracy of 76.21%. Results from cross-validation and an independent benchmark suggest that the method is very promising for predicting phosphorylation sites and can serve as a useful complementary tool for the community. CKSAAP_PhSite is publicly available at http://59.73.198.144/cksaap_phsite/.
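The CKSAAP encoding counts, for each gap k, how often every ordered residue pair occurs k positions apart, normalised by the number of such pairs. The sketch below is a minimal version, assuming the input is a sequence window around a candidate S/T/Y residue and that `k_max` is chosen by the user; the window length, `k_max` and the handling of non-standard symbols are assumptions, not CKSAAP_PhSite's published settings.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pair types


def cksaap(seq, k_max=3):
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    For each gap k, count pairs (seq[i], seq[i + k + 1]) and divide by the number
    of such pairs in the window. Pairs with non-standard symbols (e.g. padding)
    are not counted but still contribute to the denominator.
    Returns a flat vector of length 400 * (k_max + 1).
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        n_pairs = len(seq) - k - 1
        for i in range(max(n_pairs, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in PAIRS)
    return features
```

The fixed-length vector produced for each window is what an SVM would then consume.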
9.
10.
11.
Chang Kug Kim, Jung Sun Kim, Gang Seob Lee, Beom Seok Park, Jang Ho Hahn. Bioinformation 2008, 3(2):61-62
The Plant Genetic Map Database (PlantGM) is a web-based system that provides information about genetic markers in rice (Oryza sativa) and Chinese cabbage (Brassica rapa). The database has three major parts and functions: (1) Map Search, (2) Marker Search, and (3) QTL Search. At present, the database provides characterization information for 3,258 genetic markers: 2,800 RFLP and 112 QTL markers for rice, plus 321 RFLP and 25 PCR-based markers for Chinese cabbage. In addition, a rice genetic linkage map was constructed using 1,054 of the 2,912 rice markers.
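The marker counts quoted above are internally consistent; the breakdown below shows how the rice subtotal (from which the 1,054 linkage-map markers were drawn) and the overall total arise.

```latex
\begin{aligned}
\text{rice:}\quad & 2{,}800\ (\text{RFLP}) + 112\ (\text{QTL}) = 2{,}912 \\
\text{Chinese cabbage:}\quad & 321\ (\text{RFLP}) + 25\ (\text{PCR-based}) = 346 \\
\text{total:}\quad & 2{,}912 + 346 = 3{,}258
\end{aligned}
```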
Availability
The database is available for free at http://www.niab.go.kr/nabic/PlantGM
12.
As the biomedical impact of small RNAs grows, so does the need to understand competing structural alternatives for regions of functional interest. Suboptimal structure analysis provides significantly more RNA base-pairing information than a single minimum free energy prediction. Yet computational enhancements such as Boltzmann sampling have not been fully adopted by experimentalists, because identifying meaningful patterns in these data can be challenging. Profiling is a novel approach to mining RNA suboptimal structure data that makes the power of ensemble-based analysis accessible in a stable and reliable way. Balancing abstraction and specificity, profiling identifies significant combinations of base pairs that dominate low-energy RNA secondary structures. By design, it highlights critical similarities and differences, yielding crucial information for molecular biologists. The code is freely available via http://gtfold.sourceforge.net/profiling.html.
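A first step in mining a Boltzmann sample is tallying how often each base pair occurs across the sampled structures. The sketch below does only that, assuming the sample is supplied as dot-bracket strings from any sampler; it is a simplification, since profiling as described above identifies significant combinations of base pairs rather than individual pairs, and the 0.5 threshold is an arbitrary illustrative default.

```python
from collections import Counter


def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs (0-based, i < j) in a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def frequent_pairs(sample, threshold=0.5):
    """Base pairs present in at least `threshold` of the sampled structures."""
    counts = Counter()
    for structure in sample:
        counts.update(base_pairs(structure))
    n = len(sample)
    return {pair: c / n for pair, c in counts.items() if c / n >= threshold}
```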
13.
Chang-Kug Kim, Young-Joo Seol, Dong-Jun Lee, In-Seon Jeong, Ung-Han Yoon, Jong-Yeol Lee, Gang-Seob Lee, Dong-Suk Park. Bioinformation 2014, 10(6):378-380
The National Agricultural Biotechnology Information Center (NABIC) reconstructed the AllergenPro database for allergenic protein analysis and allergenicity prediction. AllergenPro is an integrated web-based system providing information about allergens in foods, microorganisms, animals and plants. The allergen database has three main features: (1) an allergen list with epitopes, (2) keyword search of allergens, and (3) methods for allergenicity prediction. The updated AllergenPro presents search-based allergen information through a user-friendly web interface, and users can run allergenicity prediction tools based on three different methods: (1) FAO/WHO, (2) motif-based and (3) epitope-based methods.
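The FAO/WHO (2001) guideline commonly cited for allergenicity screening flags a protein if it shares more than 35% identity with a known allergen over an 80-amino-acid window, or an identical run of six contiguous amino acids. The sketch below is a simplified illustration of those two criteria and is not AllergenPro's implementation: it compares windows without gaps (the guideline is normally applied to alignments), and it skips the identity rule when a sequence is shorter than the window.

```python
def shares_exact_peptide(query, allergen, k=6):
    """True if query and allergen share any identical run of k contiguous residues."""
    kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return any(query[i:i + k] in kmers for i in range(len(query) - k + 1))


def best_window_identity(query, allergen, window=80):
    """Highest ungapped fractional identity between any two `window`-length segments.

    Simplification: no gaps are allowed. Returns 0.0 if either sequence is shorter
    than the window.
    """
    best = 0.0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(allergen) - window + 1):
            a = allergen[j:j + window]
            identity = sum(x == y for x, y in zip(q, a)) / window
            best = max(best, identity)
    return best


def fao_who_flag(query, allergens, identity_cutoff=0.35, window=80, k=6):
    """Flag the query if any known allergen meets either criterion."""
    return any(
        best_window_identity(query, a, window) > identity_cutoff
        or shares_exact_peptide(query, a, k)
        for a in allergens
    )
```

The brute-force window comparison is quadratic in sequence length, which is acceptable for a sketch but would be replaced by an alignment tool in practice.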
Availability
The database is available for free at http://nabic.rda.go.kr/allergen/
14.
15.
16.
Ji-Hyun Lee, Kyoung Mii Park, Dong-Jin Han, Nam Young Bang, Do-Hee Kim, Hyeongjin Na, Semi Lim, Tae Bum Kim, Dae Gyu Kim, Hyun-Jung Kim, Yeonseok Chung, Sang Hyun Sung, Young-Joon Surh, Sunghoon Kim, Byung Woo Han. PLoS ONE 2015, 10(11)
Despite the growing attention given to Traditional Medicine (TM) worldwide, there is no well-known, publicly available, integrated bio-pharmacological Traditional Korean Medicine (TKM) database for researchers in drug discovery. In this study, we constructed PharmDB-K, which offers comprehensive information on TKM-associated drugs (compounds), disease indications and protein relationships. To explore the underlying molecular interactions of TKM, we integrated fourteen different databases, six pharmacopoeias and the literature, established a massive bio-pharmacological network for TKM, and experimentally validated some cases predicted from PharmDB-K analyses. Currently, PharmDB-K contains information on 262 TKMs, 7,815 drugs, 3,721 diseases, 32,373 proteins and 1,887 side effects. A unique feature of PharmDB-K is a set of 400 indicator compounds used for the standardization of herbal medicines. Furthermore, PharmDB-K is served through phExplorer (network visualization software) and BioMart (a data federation framework) for convenient search and analysis of the TKM network. Database URL: http://pharmdb-k.org, http://biomart.i-pharm.org.
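A network that links TKMs, compounds, proteins and diseases can be queried by walking typed edges. The sketch below is a minimal in-memory illustration, assuming an undirected graph where each node has one type; the class name, node types and node identifiers are hypothetical and do not reflect PharmDB-K's schema or its phExplorer/BioMart interfaces.

```python
from collections import defaultdict


class BioPharmNetwork:
    """Undirected network whose nodes carry a type (e.g. tkm, compound, protein, disease)."""

    def __init__(self):
        self.neighbours = defaultdict(set)
        self.node_type = {}

    def add_edge(self, a, a_type, b, b_type):
        self.node_type[a] = a_type
        self.node_type[b] = b_type
        self.neighbours[a].add(b)
        self.neighbours[b].add(a)

    def linked(self, node, via_type, target_type):
        """Nodes of `target_type` reachable from `node` through one intermediate of `via_type`."""
        found = set()
        for mid in self.neighbours[node]:
            if self.node_type[mid] != via_type:
                continue
            for target in self.neighbours[mid]:
                if self.node_type[target] == target_type:
                    found.add(target)
        return found


# Hypothetical example: which proteins are linked to a TKM through its compounds?
net = BioPharmNetwork()
net.add_edge("TKM_1", "tkm", "compound_a", "compound")
net.add_edge("TKM_1", "tkm", "compound_b", "compound")
net.add_edge("compound_a", "compound", "protein_x", "protein")
net.add_edge("compound_b", "compound", "protein_y", "protein")
net.add_edge("compound_b", "compound", "disease_z", "disease")
```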
17.
We introduce a novel contact prediction method that achieves high prediction accuracy by combining evolutionary and physicochemical information about native contacts. We obtain evolutionary information from multiple sequence alignments and physicochemical information from predicted ab initio protein structures. These structures represent low-energy states in an energy landscape and thus capture the physicochemical information encoded in the energy function. Such low-energy structures are likely to contain native contacts, even if their overall fold is not native. To differentiate native from non-native contacts in these structures, we develop a graph-based representation of the structural context of contacts. We then use this representation to train a support vector machine classifier to identify the most likely native contacts in otherwise non-native structures. The resulting contact predictions are highly accurate. Because it combines two sources of information, evolutionary and physicochemical, the method maintains prediction accuracy even when few sequence homologs are available. We show that the predicted contacts help to improve ab initio structure prediction. A web service is available at http://compbio.robotics.tu-berlin.de/epc-map/.
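Contacts in predicted low-energy structures are the raw material for the classifier described above. The sketch below extracts contacts from a set of decoys and reports how often each appears, assuming one representative atom per residue (e.g. C-beta) and the widely used convention of a distance below 8 Å with a sequence separation of at least 6; both thresholds are assumptions. This frequency count is a naive baseline, not the graph-based SVM the method uses to separate native from non-native contacts.

```python
import math
from collections import Counter


def contacts(coords, cutoff=8.0, min_separation=6):
    """Residue pairs (i, j) closer than `cutoff` Å with j - i >= min_separation.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    """
    result = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                result.add((i, j))
    return result


def contact_frequencies(decoys, **kwargs):
    """Fraction of decoy structures in which each contact is observed."""
    counts = Counter()
    for coords in decoys:
        counts.update(contacts(coords, **kwargs))
    return {pair: c / len(decoys) for pair, c in counts.items()}
```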
18.
19.
Konstantinos Mavromatis, Ken Chu, Natalia Ivanova, Sean D. Hooper, Victor M. Markowitz, Nikos C. Kyrpides. PLoS ONE 2009, 4(11)
Computational methods for determining the function of genes in newly sequenced genomes have traditionally been based on sequence similarity to genes whose function has been identified experimentally. Function prediction methods can be extended using gene context analysis approaches, such as examining the conservation of chromosomal gene clusters, gene fusion events and co-occurrence profiles across genomes. Context analysis is based on the observation that functionally related genes often share a similar gene context, and it relies on identifying such events across a phylogenetically diverse collection of genomes. We used the data management system of Integrated Microbial Genomes (IMG) as the framework to implement and explore the power of gene context analysis methods, because it integrates one of the largest available collections of genomes. Visualization and search tools that facilitate gene context analysis have been developed and applied across all publicly available archaeal and bacterial genomes in IMG. These computations are now maintained as part of IMG's regular genome content update cycle. IMG is available at http://img.jgi.doe.gov.
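Co-occurrence (phylogenetic) profiles record the presence or absence of a gene family in each genome; families with near-identical profiles are candidates for functional linkage. The sketch below is a minimal illustration, assuming each genome is given as a set of gene-family identifiers and profiles are compared by Hamming distance; the family IDs, genomes and the distance cutoff are hypothetical and do not reflect IMG's implementation.

```python
def profile(gene_family, genomes):
    """Presence/absence vector of a gene family across an ordered list of genomes."""
    return tuple(1 if gene_family in content else 0 for content in genomes)


def profile_distance(p, q):
    """Hamming distance between two phylogenetic profiles."""
    return sum(a != b for a, b in zip(p, q))


def co_occurring(families, genomes, max_distance=0):
    """Pairs of gene families whose profiles differ in at most `max_distance` genomes."""
    profiles = {f: profile(f, genomes) for f in families}
    names = sorted(profiles)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if profile_distance(profiles[a], profiles[b]) <= max_distance
    ]


# Hypothetical genomes, each represented by the gene families it contains.
genomes = [{"famA", "famB", "famC"}, {"famA", "famB"}, {"famC"}, {"famA", "famB", "famC"}]
```

Requiring identical profiles (`max_distance=0`) is strict; real analyses usually tolerate some mismatches and account for phylogenetic relatedness among the genomes.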
20.