Similar articles
 20 similar articles found
1.
Identifying the interface between two interacting proteins provides important clues to the function of a protein, and is becoming increasingly relevant to drug discovery. Here, surface patch analysis was combined with a Bayesian network to predict protein-protein binding sites with a success rate of 82% on a benchmark dataset of 180 proteins, improving by 6% on previous work and well above the 36% that would be achieved by a random method. A comparable success rate was achieved even when evolutionary information was missing, a further improvement on our previous method which was unable to handle incomplete data automatically. In a case study of the Mog1p family, we showed that our Bayesian network method can aid the prediction of previously uncharacterised binding sites and provide important clues to protein function. On Mog1p itself a putative binding site involved in the SLN1-SKN7 signal transduction pathway was detected, as was a Ran binding site, previously characterized solely by conservation studies, even though our automated method operated without using homologous proteins. On the remaining members of the family (two structural genomics targets, and a protein involved in the photosystem II complex in higher plants) we identified novel binding sites with little correspondence to those on Mog1p. These results suggest that members of the Mog1p family bind to different proteins and probably have different functions despite sharing the same overall fold. We also demonstrated the applicability of our method to drug discovery efforts by successfully locating a number of binding sites involved in the protein-protein interaction network of papilloma virus infection. In a separate study, we attempted to distinguish between the two types of binding site, obligate and non-obligate, within our dataset using a second Bayesian network. This proved difficult, although some separation was achieved on the basis of patch size, electrostatic potential and conservation. Such was the similarity between the two interacting patch types that we were able to use obligate binding site properties to predict the location of non-obligate binding sites and vice versa.

2.
3.
As one of the most common post-translational modifications, ubiquitination regulates the quantity and function of a variety of proteins. Experimental and clinical investigations have also suggested the crucial roles of ubiquitination in several human diseases. The complicated sequence context of human ubiquitination sites revealed by proteomic studies highlights the need to develop effective computational strategies to predict human ubiquitination sites. Here we report the establishment of a novel human-specific ubiquitination site predictor through the integration of multiple complementary classifiers. Firstly, a Support Vector Machine (SVM) classifier was constructed based on the composition of k-spaced amino acid pairs (CKSAAP) encoding, which has been utilized in our previous yeast ubiquitination site predictor. To further exploit the pattern and properties of the ubiquitination sites and their flanking residues, three additional SVM classifiers were constructed using the binary amino acid encoding, the AAindex physicochemical property encoding and the protein aggregation propensity encoding, respectively. Through an integration that relied on logistic regression, the resulting predictor, termed hCKSAAP_UbSite, achieved an area under the ROC curve (AUC) of 0.770 in a 5-fold cross-validation test on a class-balanced training dataset. When tested on a class-balanced independent testing dataset that contains 3419 ubiquitination sites, hCKSAAP_UbSite also achieved a robust performance with an AUC of 0.757. Specifically, it consistently performed better than the predictor using the CKSAAP encoding alone and two other publicly available predictors which are not human-specific. Given its promising performance on our large-scale datasets, hCKSAAP_UbSite has been made publicly available at our server (http://protein.cau.edu.cn/cksaap_ubsite/).
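
The CKSAAP encoding named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition: for each gap k, count the residue pairs separated by k positions and divide by the number of such pairs in the window. The function name `cksaap`, the default `k_max=2`, and the example window are illustrative assumptions and are not taken from the abstract.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order ("AA", "AC", ..., "YY").
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(window, k_max=2):
    """Return the k-spaced pair composition of a sequence window.

    For each gap k in 0..k_max, the vector holds the frequency of every
    ordered pair (window[i], window[i + k + 1]). The window length and
    k_max are hypothetical defaults chosen for illustration.
    """
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = len(window) - k - 1  # number of k-spaced pairs in the window
        for i in range(total):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # skip pairs with non-standard residues
                counts[pair] += 1
        features.extend(counts[p] / total if total > 0 else 0.0 for p in PAIRS)
    return features


vector = cksaap("AKAKA", k_max=1)
print(len(vector))  # 400 features per gap value
```

Each gap value contributes 400 features, so the vector length grows linearly with `k_max`; such a vector could then be passed to any classifier.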

4.
The CAPRI (Critical Assessment of Predicted Interactions) and CASP (Critical Assessment of protein Structure Prediction) experiments have demonstrated the power of community-wide tests of methodology in assessing the current state of the art and spurring progress in the very challenging areas of protein docking and structure prediction. We sought to bring the power of community-wide experiments to bear on a very challenging protein design problem that provides a complementary but equally fundamental test of current understanding of protein-binding thermodynamics. We have generated a number of designed protein-protein interfaces with very favorable computed binding energies but which do not appear to be formed in experiments, suggesting that there may be important physical chemistry missing in the energy calculations. A total of 28 research groups took up the challenge of determining what is missing: we provided structures of 87 designed complexes and 120 naturally occurring complexes and asked participants to identify energetic contributions and/or structural features that distinguish between the two sets. The community found that electrostatics and solvation terms partially distinguish the designs from the natural complexes, largely due to the nonpolar character of the designed interactions. Beyond this polarity difference, the community found that the designed binding surfaces were, on average, structurally less embedded in the designed monomers, suggesting that backbone conformational rigidity at the designed surface is important for realization of the designed function. These results can be used to improve computational design strategies, but there is still much to be learned; for example, one designed complex, which does form in experiments, was classified by all metrics as a nonbinder.

5.
Bhasin M, Zhang H, Reinherz EL, Reche PA. FEBS Letters 2005, 579(20):4302-4308
DNA methylation plays a key role in the regulation of gene expression. The most common type of DNA modification consists of the methylation of cytosine in the CpG dinucleotide. At the present time, there is no method available for the prediction of DNA methylation sites. Therefore, in this study we have developed a support vector machine (SVM)-based method for the prediction of cytosine methylation in CpG dinucleotides. Initially an SVM module was developed from human data for the prediction of human-specific methylation sites. This module achieved an MCC and AUC of 0.501 and 0.814, respectively, when evaluated using 5-fold cross-validation. The performance of this SVM-based module was better than that of classifiers built using alternative machine learning and statistical algorithms, including artificial neural networks, Bayesian statistics, and decision trees. Additional SVM modules were also developed based on mammalian- and vertebrate-specific methylation patterns. The SVM module based on human methylation patterns was used for genome-wide analysis of methylation sites. This analysis demonstrated that the percentage of methylated CpGs is higher in UTRs than in exonic and intronic regions of human genes. This method is available online for public use under the name Methylator at http://bio.dfci.harvard.edu/Methylator/.
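
The MCC reported in this abstract is the Matthews correlation coefficient. As a minimal sketch of the standard formula, it can be computed from the four cells of a binary confusion matrix; the function name `mcc` and the example counts are illustrative and do not come from the study.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, chosen only to show the call.
print(mcc(tp=40, tn=35, fp=15, fn=10))
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction), which is why it is often reported alongside AUC.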

6.
7.
8.
Protein-protein recognition, frequently mediated by members of large families of interaction domains, is one of the cornerstones of biological function. Here, we present a computational, structure-based method to predict the sequence space of peptides recognized by PDZ domains, one of the largest families of recognition proteins. As a test set, we use a considerable amount of recent phage display data that describe the peptide recognition preferences for 169 naturally occurring and engineered PDZ domains. For both wild-type PDZ domains and single point mutants, we find that 70-80% of the amino acids most frequently observed by phage display are predicted within the top five ranked amino acids. Phage display frequently identified recognition preferences for amino acids different from those present in the original crystal structure. Notably, in about half of these cases, our algorithm correctly captures these preferences, indicating that it can predict mutations that increase binding affinity relative to the starting structure. We also find that we can computationally recapitulate specificity changes upon mutation, a key test for successful forward design of protein-protein interface specificity. Across all evaluated data sets, we find that incorporating backbone sampling improves accuracy substantially, irrespective of whether a crystal or NMR structure is used as the starting conformation. Finally, we report successful prediction of several amino acid specificity changes from blind tests in the DREAM4 peptide recognition domain specificity prediction challenge. Because the foundational methods developed here are structure based, these results suggest that the approach can be more generally applied to specificity prediction and redesign of other protein-protein interfaces that have structural information but lack phage display data.
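
The "top five ranked amino acids" criterion in this abstract can be read as a top-k recovery rate. The sketch below assumes one score dictionary per peptide position and one observed residue per position. The function name `top_k_recovery`, the data layout, and the toy scores are hypothetical and are not taken from the paper.

```python
def top_k_recovery(predicted_scores, observed, k=5):
    """Fraction of positions whose observed residue is among the k
    highest-scoring predicted residues.

    predicted_scores: list of dicts mapping residue -> score (higher is better)
    observed: list of the most frequently observed residue per position
    """
    if len(predicted_scores) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for scores, residue in zip(predicted_scores, observed):
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        if residue in top_k:
            hits += 1
    return hits / len(observed)


# Toy example with four candidate residues per position and k=2.
scores = [
    {"A": 0.9, "L": 0.5, "V": 0.4, "K": 0.1},
    {"A": 0.1, "L": 0.2, "V": 0.9, "K": 0.8},
]
print(top_k_recovery(scores, ["L", "A"], k=2))
```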

9.
Finding structural similarities between proteins often helps reveal shared functionality, which otherwise might not be detected by native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure using a sequence of motifs derived from a structural alphabet. Using this approach, structure comparison is performed by comparing the corresponding motif sequences or structural sequences. In this article, we measure the performance of such alphabets in the context of the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the best matching fragment to the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain that matches the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs or protein fragments having four residues. We show that the global structural sequences approximate well the native structures of proteins, with an average coordinate root mean square of 0.69 Å over 2225 test proteins. The approximation is best for all α-proteins, while relatively poorer for all β-proteins. We then test the performance of four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library) with different classifiers in their ability to classify proteins that belong to five distinct folds of CATH. Unsurprisingly, the primary sequence alone performs poorly as a structure classifier. We show that addition of either secondary-structure information or local information from the structural sequence considerably improves the classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence but not well enough at this stage to be a viable alternative to more computationally intensive methods based on protein structure alignment.
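
The coordinate root mean square figure quoted in this abstract is a distance between matched atom positions. The sketch below computes it for two coordinate sets that are assumed to be already superposed and listed in the same atom order. It performs no optimal superposition, and the function name `coordinate_rmsd` is an illustrative choice.

```python
import math


def coordinate_rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z).

    Assumes both structures are already superposed and list atoms in the
    same order; no rotation or translation fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(coordinate_rmsd(native, model))  # every atom is shifted by 1 along z
```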

10.
To develop a non-invasive and sensitive diagnostic test for cancer using peripheral blood, we evaluated gene expression profiling of blood obtained from patients with cancer of the digestive system and normal subjects. The expression profiles of blood-derived total RNA obtained from 39 cancer patients (11 colon cancer, 14 gastric cancer, and 14 pancreatic cancer) were clearly different from those obtained from 15 normal subjects. By comparing the gene expression profiles of cancer patients and normal subjects, 25 cancer-differentiating genes (p < 5.0 × 10^-6 and fold differences >3) were identified, and an “expression index” deduced from the expression values of these genes differentiated the validation cohort (11 colon cancer, 8 gastric cancer, 18 pancreatic cancer, and 15 normal subjects) into cancer patients and normal subjects with 100% (37/37) and 87% (13/15) accuracy, respectively. Although the expression profiles were not clearly different between the cancer patients, some characteristic genes were identified according to the stage and type of the cancer. Interestingly, many immune-related genes such as antigen presenting, cell cycle accelerating, and apoptosis- and stress-inducing genes were up-regulated in cancer patients, reflecting the active turnover of immune regulatory cells in cancer patients. These results showed the potential relevance of peripheral blood gene expression profiling for the development of new diagnostic examination tools for cancer patients.

11.
This paper first identified differentially expressed miRNAs associated with early gastric cancer and then constructed connection networks among the identified miRNAs for early gastric cancer tissues and control tissues, respectively. Twenty-three differentially expressed miRNAs were identified, 18 of which differed from previously reported results on the same data, and they provide great discriminatory power between patients and controls. The two connection networks contain both conserved, unchanged sub-networks and sub-networks that differ between them. From the consistency and differences between the two connection networks, we disclosed several new biological features that promote early gastric cancer development. This study identifies 23 miRNAs that are specific to early gastric cancer and merit further experimental study. The revealed biological features of early gastric cancer will provide new insights into the molecular mechanisms of this disease.

12.
Tissue transglutaminase (TG2) catalyzes the Ca²⁺-dependent posttranslational modification of proteins via formation of isopeptide bonds between their glutamine and lysine residues. Although substrate specificity of TG2 has been studied repeatedly at the sequence level, no clear consensus sequences have been determined so far. With the use of the extensive structural information on TG2 substrate proteins listed in the TRANSDAB Wiki database, a slight preference of TG2 for glutamine and lysine residues situated in turns could be observed. When the spatial environment of the favored glutamine and lysine residues was analyzed with logistic regression, the presence of specific amino acid patterns was identified. By using the occurrence of the predictor amino acids as selection criteria, several polypeptides were predicted and later identified as novel in vitro substrates for TG2. By studying the sequences of TG2 substrate proteins that lack an available crystal structure, we could also show that the presence of substrate glutamine and lysine residues in intrinsically disordered regions strongly favors substrate selection. The collected structural data have provided novel understanding of how this versatile enzyme selects its substrates in various cell compartments and tissues.

13.
DNA-binding proteins (DBPs) participate in various crucial processes in the life-cycle of the cells, and the identification and characterization of these proteins is of great importance. We present here a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the surface of proteins were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features like the electrostatic potential, cluster-based amino acid conservation patterns and the secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. Using 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, which is overall better than the performance of published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all correctly. The resulting classifier was applied to a collection of 757 proteins of known structure and unknown function. Of these proteins, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA using new structural motifs. The use of complementary computational tools supports the notion that at least some of them do bind DNA.

14.
15.
16.
Prostate-specific antigen (PSA) is a serum marker that is widely used for the diagnosis of prostatic diseases. Various subforms of free PSA, which are associated with prostate cancer differently, have been identified in sera. Thus, specific detection of certain subforms could permit discrimination between benign and malignant cases. Although the monoclonal antibody 5D3D11 displays the desired selectivity, its relatively weak binding affinity prevents its development into an effective diagnostic tool. The directed-evolution strategy presented here succeeds in enhancing affinity and immunoassay sensitivity while maintaining selectivity. Starting without structural data, we constructed four independent phage-display single-chain variable fragment (scFv) libraries targeting hot spots from CDR-L1, H1, H2, and H3. Mutations derived from each library were combined, yielding further affinity gains. This constitutes the first demonstration of additivity for independently selected complementarity-determining region (CDR) hot-spot mutations. The X-ray structure of the Fab′ 5D3D11-PSA complex (after it became available) inspired the design of two new libraries targeting CDR-L3 that resulted in other higher-affinity variants. Attempts at combining the new variants with previous ones did not result in further gains, suggesting that mutations from the two strategies provide alternative but noncomplementary solutions for affinity enhancement of 5D3D11. The results can be interpreted to provide a plausible explanation for the observed lack of additivity. Finally, with respect to the wild-type scFv, the best binders show an enhancement of sensitivity in sandwich immunoassay. Their ability to discriminate between prostate cancer sera and benign prostatic hyperplasia sera has now been confirmed by assaying sera from 63 patients.

17.
Recently, many long non-coding RNAs (lncRNAs) have been identified and their biological function has been characterized; however, our understanding of their underlying molecular mechanisms related to disease is still limited. To overcome the limitation in experimentally identifying disease–lncRNA associations, computational methods have been proposed as a powerful tool to predict such associations. These methods are usually based on the similarities between diseases or lncRNAs, since it was reported that similar diseases are associated with functionally similar lncRNAs. Therefore, prediction performance is highly dependent on how well the similarities can be captured. Previous studies have calculated the similarity between two diseases by mapping each disease to exactly one Disease Ontology (DO) term and then using a semantic similarity measure to calculate the similarity between them. However, the problem with this approach is that a disease can be described by more than one DO term. Until now, there is no annotation database of DO terms for diseases except for genes. In contrast, Human Phenotype Ontology (HPO) is designed to fully annotate human disease phenotypes. Therefore, in this study, we constructed disease similarity networks/matrices using HPO instead of DO. Then, we used these networks/matrices as inputs to two representative ranking algorithms, one machine learning-based (regularized least squares) and one network-based (heterogeneous graph-based inference). The results showed that the prediction performance of the two algorithms on HPO-based networks/matrices is better than that on DO-based ones. In addition, our method can predict 11 novel cancer-associated lncRNAs, which are supported by literature evidence.

18.
MicroRNAs (miRNAs) are a class of small non-coding RNAs that deregulate and/or decrease the expression of target messenger RNAs (mRNAs) and specifically contribute to complex diseases. In our study, we reanalyzed an integrated dataset to improve classification performance by rebuilding miRNA–mRNA modules, in which a group of deregulated miRNAs cooperatively regulates a group of significant mRNAs. Under five-fold cross-validation, the multi-step workflow considered both biologically and statistically significant correlations. First, among the statistically significant miRNAs, 6 were identified as core miRNAs. Second, 705 deregulated mRNAs were found in the 13 significant pathways enriched by gene set enrichment analysis (GSEA). Based on the union of predicted sets and correlation sets, 6 modules were built. Finally, after verification on test sets, three indexes, namely area under the ROC curve (AUC), accuracy and Matthews correlation coefficient (MCC), indicated that only 4 modules (miR-106b-CIT-KPNA2-miR-93, miR-106b-POLQ-miR-93, miR-107-BTRC-UBR3-miR-16 and miR-200c-miR-16-EIF2B5-miR-15b) had discriminative ability and that their classification performance was superior to that of the single molecules. When this workflow was applied to different subtypes, Module 1 was consistent across subtypes, while some other modules were specific to each subtype. Taken together, this method gives new insight into building modules related to complex diseases and can also help explain the mechanism of breast cancer (BC).

19.
Sequencing of microbial genomes is important because of the antibiotic and pathogenetic activities that microbes carry. However, even with the help of new assembly software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for genomes that are still being sequenced. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembly tool, a functional annotation pipeline, and a high-performance GI prediction module based on a support vector machine (SVM) method called genomic island genomic profile scanning (GI-GPS). The draft genomes of ongoing genome projects, in contigs or scaffolds, can be submitted to our Web server, which provides the functional annotation and highly probable GI predictions. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project.

20.
The analysis of allele-specific gene expression (ASE) is essential for the mapping of genetic variants that affect gene regulation, and for the identification of alleles that modify disease risk. Although RNA sequencing offers the opportunity to measure expression at the allele level, the availability of powerful statistical methods for mapping ASE in single or multiple individuals is limited. We developed a maximum likelihood model to characterize ASE in the human genome. Approximately 17% of genes displayed an allele-specific effect on gene expression in a single individual. Simulations using our model gave better performance and improved robustness when compared with the binomial test, across different coverage levels, allelic expression fractions and levels of random noise. In addition, our method can identify ASE in multiple individuals, with enhanced performance. This is helpful in understanding the mechanism of genetic regulation leading to expression changes, alternative splicing variants and even disease susceptibility.
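
The binomial test mentioned in this abstract as the comparison baseline can be sketched as follows. This is the baseline only and not the authors' maximum likelihood model, which the abstract does not specify. The sketch assumes read counts for a reference and an alternative allele at one heterozygous site and a null allelic fraction of 0.5. The function name `binomial_ase_pvalue` is hypothetical.

```python
from math import comb


def binomial_ase_pvalue(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial p-value for allelic imbalance.

    Sums the probabilities of all outcomes that are no more likely than
    the observed reference count under a null allelic fraction p.
    """
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, p_value)


print(binomial_ase_pvalue(10, 0))  # all 10 reads from one allele
print(binomial_ase_pvalue(5, 5))   # perfectly balanced reads
```

A test of this kind treats every read as an independent draw, so its behaviour depends strongly on coverage at the site.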
