Similar Articles
20 similar articles found
1.
We predicted gamma-turns from amino acid sequences using first-order Markov chain theory and enlarged representative data sets of protein chains selected from the Protein Data Bank (PDB). The following data sets were used for training and for deriving the probability values: (1) an initial data set of 315 protein chains comprising 904 gamma-turns and (2) a later data set, compiled to include new PDB entries, of 434 protein chains comprising 1053 gamma-turns. By excluding the 93 protein chains common to these two training data sets, we generated two mutually exclusive data sets of 222 and 341 protein chains for testing our predictions. Applying the amino acid probability values derived from the training data sets to the testing data sets yielded overall prediction accuracies of 54-57%. We recommend using the probability values derived from the 315-chain data set, which is richer in gamma-turns (904 in 315 chains, versus 1053 in 434) and also provides better predictions.
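The following is a minimal Python sketch of first-order Markov chain scoring for gamma-turns. It assumes two chains: one trained on 3-residue turn windows and one trained on non-turn windows. It also assumes Laplace pseudocounts and a log-odds threshold of zero. The function names and the toy training windows are hypothetical, and the abstract does not give the authors' exact probability scheme.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_markov(windows, alpha=1.0):
    """Train a first-order Markov chain on equal-length residue windows.

    Returns a function giving log P(window) = log P(a1) + sum log P(a_k | a_(k-1)),
    smoothed with a Laplace pseudocount `alpha` over the 20 standard residues.
    """
    n = len(AMINO_ACIDS)
    start_counts = Counter(w[0] for w in windows)
    pair_counts = Counter((a, b) for w in windows for a, b in zip(w, w[1:]))
    from_counts = Counter(a for w in windows for a in w[:-1])
    total_starts = len(windows)

    def log_prob(window):
        lp = math.log((start_counts[window[0]] + alpha) / (total_starts + alpha * n))
        for a, b in zip(window, window[1:]):
            lp += math.log((pair_counts[(a, b)] + alpha) / (from_counts[a] + alpha * n))
        return lp

    return log_prob


def log_odds(window, turn_model, nonturn_model):
    """Positive values favour the gamma-turn model over the non-turn model."""
    return turn_model(window) - nonturn_model(window)


def predict_turns(sequence, turn_model, nonturn_model, threshold=0.0):
    """Start indices of 3-residue windows whose log-odds exceed `threshold`."""
    return [i for i in range(len(sequence) - 2)
            if log_odds(sequence[i:i + 3], turn_model, nonturn_model) > threshold]


# Hypothetical toy training windows, for illustration only.
turn_model = train_markov(["GPG", "GPN", "APG"])
nonturn_model = train_markov(["LLL", "AAL", "LAL"])
print(predict_turns("LLGPGLL", turn_model, nonturn_model))
```

In the study's setting, the training windows would come from the 315- or 434-chain sets. Scoring would then be done on the mutually exclusive 222- and 341-chain test sets.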

2.
An integrated approach to the prediction of domain-domain interactions

Background  

The development of high-throughput technologies has produced several large-scale protein interaction data sets for multiple species, and significant efforts have been made to analyze these data sets in order to understand protein activities. Because the basic units of protein interactions are domain interactions, it is crucial to understand protein interactions at the domain level. The availability of many diverse biological data sets provides an opportunity to discover the domain interactions underlying protein interactions by integrating these data sets.

3.
4.
5.
In vitro protein stability studies are commonly conducted via thermal or chemical denaturation/renaturation of proteins. Conventional analyses of protein unfolding/(re)folding data require well-defined pre- and post-transition baselines to evaluate the Gibbs free-energy change associated with unfolding/(re)folding. This evaluation becomes problematic when there are insufficient data to determine the pre- or post-transition baseline. In this study, we establish a way to fit such partial chemical denaturation data by introducing second-order differential (SOD) analysis, which overcomes the limitations of the conventional fitting method. By reducing the number of baseline-related fitting parameters, SOD analysis successfully fits incomplete chemical denaturation data sets, in close agreement with the conventional evaluation of the equivalent complete data, whereas conventional fitting fails on the incomplete sets. This SOD fitting for abbreviated isothermal chemical denaturation further completes the set of methods for analyzing insufficient data sets in the two prevalent types of protein stability study.
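The derivation below illustrates how differentiating twice can reduce the number of baseline parameters. It assumes the standard two-state linear extrapolation model with linear pre- and post-transition baselines. This is an assumption: the abstract does not give the authors' exact formulation. The symbols are:

- x: denaturant concentration
- ΔG_H2O: unfolding free energy in water
- m: denaturant dependence
- alpha_N + beta_N x and alpha_U + beta_U x: native and unfolded baselines

```latex
\begin{aligned}
K(x) &= \exp\!\left(-\frac{\Delta G_{\mathrm{H_2O}} - m x}{RT}\right), \qquad
f_U(x) = \frac{K(x)}{1 + K(x)},\\
y(x) &= (\alpha_N + \beta_N x)\,[1 - f_U(x)] + (\alpha_U + \beta_U x)\, f_U(x)
      = \alpha_N + \beta_N x + (\Delta\alpha + \Delta\beta\, x)\, f_U(x),\\
y''(x) &= (\Delta\alpha + \Delta\beta\, x)\, f_U''(x) + 2\,\Delta\beta\, f_U'(x),\\
f_U'(x) &= \frac{m}{RT}\, f_U (1 - f_U), \qquad
f_U''(x) = \left(\frac{m}{RT}\right)^{2} f_U (1 - f_U)(1 - 2 f_U),
\end{aligned}
```

Here Δα = α_U − α_N and Δβ = β_U − β_N. Under these assumptions, the second derivative no longer contains the native baseline α_N + β_N x. The six parameters of the conventional fit therefore reduce to four: Δα, Δβ, ΔG_H2O and m. In practice y'' must be estimated numerically from measured data, which amplifies noise.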

6.
We present a tool to improve quantitative accuracy and precision in mass spectrometry-based shotgun proteomics: protein quantification by peptide quality control (PQPQ). The method is based on the assumption that the quantitative patterns of peptides derived from one protein correlate across samples. Dissonant patterns arise either from outlier peptides or from the presence of different protein species. Using correlation analysis, PQPQ identifies and excludes outliers and detects the existence of different protein species, which are then quantified separately. By validating the algorithm on seven data sets from different cancer studies, we show that processing data with PQPQ improves the information output of shotgun proteomics. The evaluation included data from two labeling procedures and three different instrument platforms. By using both peptide sequence data and quantitative data, the method improves quantitative accuracy and precision at the protein level and detects different protein species.
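The correlation idea can be illustrated with a simplified stand-in; this is not the published PQPQ algorithm. The sketch computes Pearson correlations between peptide profiles across samples. It flags peptides whose median correlation with the other peptides falls below a hypothetical cut-off of 0.5, then averages the remaining profiles. It covers only outlier exclusion, not the detection and separate quantification of alternative protein species.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def filter_peptides(profiles, min_corr=0.5):
    """Split peptides into (kept, outliers) by median correlation with the others.

    `min_corr` is a hypothetical cut-off; a peptide with no partner cannot be
    assessed and is flagged as an outlier.
    """
    kept, outliers = [], []
    for p in profiles:
        corrs = [pearson(profiles[p], profiles[q]) for q in profiles if q != p]
        if corrs and median(corrs) >= min_corr:
            kept.append(p)
        else:
            outliers.append(p)
    return kept, outliers


def protein_profile(profiles, kept):
    """Protein-level profile: sample-wise mean of the kept peptide profiles."""
    n_samples = len(profiles[kept[0]])
    return [sum(profiles[p][i] for p in kept) / len(kept) for i in range(n_samples)]


# Hypothetical peptide intensities across four samples.
peptides = {
    "pep1": [1.0, 2.0, 3.0, 4.0],
    "pep2": [1.1, 2.1, 2.9, 4.2],
    "pep3": [0.9, 1.8, 3.1, 3.9],
    "pep4": [4.0, 1.0, 3.5, 0.5],  # dissonant pattern
}
kept, outliers = filter_peptides(peptides)
print(kept, outliers, protein_profile(peptides, kept))
```

The median is used rather than the mean so that a single dissonant peptide does not drag down the scores of the consistent ones.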

7.
We evaluated the prediction of beta-turns from amino acid sequences using the residue-coupled model with an enlarged representative protein data set selected from the Protein Data Bank. Probability values derived from a data set of 425 protein chains yielded an overall beta-turn prediction accuracy of 68.74%, compared with the 94.7% reported earlier for a data set of 30 proteins using the same method. However, the overall accuracy obtained with probability values derived from the 30-protein data set drops to 40.74% when tested on the 425-chain data set. In contrast, probability values derived from the 425-chain data set used in this analysis gave consistent overall accuracies on three test sets:

- the earlier 30-protein data set (64.62%)
- a more recent representative data set of 619 protein chains (64.66%)
- a jackknife data set of 476 representative protein chains (63.38%)

We therefore recommend the probability values derived from the 425-chain representative data set reported here, which give more realistic and consistent predictions of beta-turns from amino acid sequences.
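The abstract does not define "overall accuracy". The sketch below assumes the per-residue definition commonly used in turn prediction: correctly predicted residues divided by all residues, times 100. It adds the Matthews correlation coefficient (MCC) as a complementary measure. The label strings and function names are hypothetical.

```python
import math


def confusion(observed, predicted):
    """Count TP, TN, FP, FN for per-residue labels ('T' = turn, anything else = non-turn)."""
    pairs = list(zip(observed, predicted))
    tp = sum(o == "T" and p == "T" for o, p in pairs)
    tn = sum(o != "T" and p != "T" for o, p in pairs)
    fp = sum(o != "T" and p == "T" for o, p in pairs)
    fn = sum(o == "T" and p != "T" for o, p in pairs)
    return tp, tn, fp, fn


def q_total(observed, predicted):
    """Overall accuracy in percent: correctly predicted residues / all residues."""
    tp, tn, fp, fn = confusion(observed, predicted)
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def mcc(observed, predicted):
    """Matthews correlation coefficient (0.0 when undefined)."""
    tp, tn, fp, fn = confusion(observed, predicted)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical per-residue labels for one short chain.
obs = "NNTTTTNNNN"
pred = "NTTTTNNNNN"
print(q_total(obs, pred), mcc(obs, pred))
```

A data set dominated by non-turn residues can give a high overall accuracy even for a weak predictor. MCC is therefore often reported alongside it.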

8.
Protein–protein interactions (PPIs) play very important roles in many cellular processes and provide rich information for discovering biological facts and knowledge. Various experimental approaches have been developed to generate large amounts of PPI data for different organisms. However, high-throughput experimental data usually suffer from high error rates, so the biological knowledge derived from them can be distorted or incorrect. It is therefore vital to assess the quality of protein interaction data and to extract reliable interactions from high-throughput experimental data. In this paper, we propose a new Semantic Reliability (SR) method to assess the reliability of each protein interaction and to identify potential false-positive interactions in a dataset. For each pair of target interacting proteins, SR considers two kinds of semantic influence: between the proteins that interact with the targets, and between the two target proteins themselves. Evaluations on real protein interaction datasets demonstrated that our method outperformed existing methods at extracting reliable interactions from the original datasets.
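The abstract does not give the SR formula, so this sketch substitutes a plainly different and simpler technique: a purely topological reliability score. Each interaction is scored by the Jaccard overlap of the two proteins' closed neighbourhoods (each protein plus its partners). Edges below a hypothetical threshold are flagged as candidate false positives. Unlike SR, it uses no semantic (annotation) information.

```python
from collections import defaultdict


def closed_neighbourhoods(edges):
    """Map each protein to the set containing itself and its interaction partners."""
    nbrs = defaultdict(set)
    for a, b in edges:
        nbrs[a].update((a, b))
        nbrs[b].update((a, b))
    return nbrs


def edge_scores(edges):
    """Jaccard overlap of the two closed neighbourhoods for every edge."""
    nbrs = closed_neighbourhoods(edges)
    return {(a, b): len(nbrs[a] & nbrs[b]) / len(nbrs[a] | nbrs[b]) for a, b in edges}


def suspect_edges(edges, threshold=0.4):
    """Edges scoring below a hypothetical threshold: candidate false positives."""
    return [e for e, s in edge_scores(edges).items() if s < threshold]


# Hypothetical network: two tightly knit triangles joined by one bridging edge.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "D")]
print(suspect_edges(edges))
```

Here the bridging edge C-D shares few partners on either side and is the only one flagged.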

9.
10.
Protein–protein interactions mediate essentially all biological processes. Although the quality of these data was widely questioned a decade ago, the reproducibility of large-scale protein interaction data is now much improved, and there is little question that the latest screens are of high quality. Moreover, common data standards and coordinated curation practices among the databases that collect the interactions have made these valuable data available to a wide group of researchers. Here, I review how protein–protein interactions are measured, collected and quality-controlled. I discuss how the architecture of molecular protein networks has informed disease biology. I also discuss how these data are now being computationally integrated with the newest genomic technologies, in particular genome-wide association studies and exome-sequencing projects. The goal is to improve our understanding of the molecular processes perturbed by genetics in human diseases. This article is part of a Special Issue entitled: From Genome to Function.

11.
Perhaps one of the most prominent realizations of recent years is the critical role that protein dynamics plays in many facets of cellular function. Characterizing protein dynamics is fundamental to understanding protein function, yet explicitly detecting an ensemble of protein conformations from dynamics data remains a paramount challenge in structural biology. Here, we report a new computational method, Sample and Select, for determining the ensemble of protein conformations consistent with NMR dynamics data. The method can be generalized and extended to other sources of dynamics data, enabling broad applicability in deciphering protein dynamics at different timescales. The structural ensemble derived from Sample and Select provides structural and dynamic information that should help in understanding and manipulating protein function.
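The abstract does not describe how Sample and Select chooses its ensemble. This sketch therefore uses a plainly substituted technique: greedy forward selection, which repeatedly adds the sampled conformer that most lowers the RMSD between ensemble-averaged observables and the experimental values. Conformers may be picked more than once, which gives them implicit weights. The conformer names, observable vectors and ensemble size are hypothetical.

```python
import math


def rmsd(pred, expt):
    """Root-mean-square deviation between predicted and experimental observables."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / len(expt))


def ensemble_average(members, conformers):
    """Average the back-calculated observables over the chosen members."""
    n_obs = len(conformers[members[0]])
    return [sum(conformers[m][i] for m in members) / len(members) for i in range(n_obs)]


def greedy_select(conformers, expt, size):
    """Pick `size` conformers (repeats allowed) whose averaged observables best match expt."""
    chosen = []
    for _ in range(size):
        best = min(conformers,
                   key=lambda c: rmsd(ensemble_average(chosen + [c], conformers), expt))
        chosen.append(best)
    return chosen


# Hypothetical back-calculated observables (e.g. two order parameters) per conformer.
conformers = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [2.0, 2.0]}
expt = [0.5, 0.5]
chosen = greedy_select(conformers, expt, 2)
print(chosen, rmsd(ensemble_average(chosen, conformers), expt))
```

No single conformer reproduces the data here, but an equal mixture of c1 and c2 does. This is the kind of situation an ensemble description is meant to capture.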

12.
13.
MOTIVATION: The current need for high-throughput protein interaction detection has resulted in interaction data being generated en masse by experimental methods such as yeast two-hybrid screens and protein chips. Such data can be erroneous and often do not provide adequate functional information for the detected interactions. It is therefore useful to develop an in silico approach to further validate and annotate the detected protein interactions. RESULTS: Because protein-protein interactions involve physical interactions between protein domains, domain-domain interaction information can be useful for validating, annotating and even predicting protein interactions. However, large-scale, experimentally determined domain-domain interaction data do not exist. Here, we describe an integrative approach to computationally derive putative domain interactions from multiple data sources, including protein interactions, protein complexes and Rosetta Stone sequences. We further demonstrate the usefulness of this integrative approach by applying the derived domain interactions to predict and validate protein-protein interactions. AVAILABILITY: A database of putative protein domain interactions derived using the method described in this paper is available at http://interdom.lit.org.sg.
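One simple way to derive putative domain interactions from protein interaction data alone is an association (frequency-ratio) score. For a domain pair, divide the number of interacting protein pairs that contain it by the number of all protein pairs that contain it. The sketch below is a minimal version under that assumption. It covers only the protein-interaction source; the integrative scoring across protein complexes and Rosetta Stone sequences is not shown. Protein and domain names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def domain_pairs(domains_a, domains_b):
    """Unordered domain pairs contributed by one protein pair."""
    return {tuple(sorted((x, y))) for x, y in product(domains_a, domains_b)}


def association_scores(protein_domains, interactions):
    """Score each domain pair as
    (# interacting protein pairs containing it) / (# all protein pairs containing it)."""
    observed = {frozenset(pair) for pair in interactions}
    hits, totals = Counter(), Counter()
    for p, q in combinations(sorted(protein_domains), 2):
        interacting = frozenset((p, q)) in observed
        for dp in domain_pairs(protein_domains[p], protein_domains[q]):
            totals[dp] += 1
            if interacting:
                hits[dp] += 1
    return {dp: hits[dp] / totals[dp] for dp in totals}


# Hypothetical proteins annotated with domains, plus two observed interactions.
protein_domains = {
    "P1": {"SH3"},
    "P2": {"PRM"},
    "P3": {"SH3"},
    "P4": {"PRM", "Kinase"},
}
interactions = [("P1", "P2"), ("P3", "P4")]
print(association_scores(protein_domains, interactions))
```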

14.
15.
Prediction of protein–protein interactions (PPIs) commonly involves a significant computational component. Rapid recent advances in the power of computational methods for protein interaction prediction motivate a review of the state of the art. We review the major approaches, organized by the primary source of data used: protein sequence, protein structure and protein co-abundance. The advent of deep learning (DL) has brought significant advances in interaction prediction, and we show how DL is used for each source data type. We review the literature taxonomically, present example case studies in each category, and conclude with observations about the strengths and weaknesses of machine learning methods in the context of the principal data sources for protein interaction prediction.
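As a minimal, non-DL illustration of the co-abundance data source, the sketch ranks candidate protein pairs by the Spearman rank correlation of their abundance profiles across conditions. Ties receive average ranks. The assumption is that co-varying abundance hints at interaction; the profiles and names are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation (0.0 if either vector is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_candidate_pairs(abundance):
    """All protein pairs, sorted by co-abundance score (highest first)."""
    scored = [((p, q), spearman(abundance[p], abundance[q]))
              for p, q in combinations(abundance, 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical abundance profiles across four conditions.
abundance = {"A": [1.0, 2.0, 3.0, 4.0], "B": [2.0, 4.0, 6.0, 8.0], "C": [4.0, 3.0, 2.0, 1.0]}
print(rank_candidate_pairs(abundance)[0])
```

A rank-based score is used here because it tolerates monotonic but non-linear relationships between abundance profiles.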

16.
To increase the efficiency of diffraction data collection for protein crystallographic studies, an automated system has been developed that stores frozen protein crystals, mounts them sequentially, aligns them to the X-ray beam, collects complete data sets and returns the crystals to storage. Advances in X-ray data collection technology, including more brilliant X-ray sources, improved focusing optics and faster-readout detectors, have reduced diffraction data acquisition times from days to hours at a typical protein crystallography laboratory [1,2]. In addition, the number of high-brilliance synchrotron beamlines dedicated to macromolecular crystallography has increased significantly, and data collection at these facilities can routinely take less than an hour per crystal. The number of crystals from which data can be collected in a 24 h period has increased substantially. Unattended X-ray data acquisition, including automated crystal mounting and alignment, is therefore a desirable goal for protein crystallography. Completing X-ray data collection more efficiently should benefit a number of fields, including the emerging field of structural genomics [3], structure-directed drug design and the newly developed screening by X-ray crystallography [4], as well as small-molecule applications.

17.
A key step in the analysis of mass spectrometry (MS)-based proteomics data is the inference of proteins from identified peptide sequences. Here we describe Re-Fraction, a novel machine learning algorithm that enhances deterministic protein identification. Re-Fraction uses several physical properties of proteins to assign them to the expected protein fractions that make up large-scale MS-based proteomics data. This information is then used to assign peptides to specific proteins. The approach is sensitive, highly specific and computationally efficient. We provide algorithms and source code for the current version of Re-Fraction, which accepts output tables from the MaxQuant environment. Nevertheless, the principles behind Re-Fraction can be applied to other protein identification pipelines in which data are generated from samples fractionated at the protein level. We demonstrate the utility of this approach by reanalyzing data from a previously published study. The reanalysis generated lists of proteins that Re-Fraction identified deterministically but that had previously been identified only as members of a protein group. We find that this approach is particularly useful in resolving protein groups composed of splice variants and homologues, which are frequently expressed in a cell- or tissue-specific manner and may have important biological consequences.
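The resolution step can be illustrated with a deliberately simplified sketch. Unlike Re-Fraction, which combines several physical properties with machine learning, it predicts each protein's expected fraction from molecular weight alone, using hypothetical cut-offs. A protein group is resolved only when exactly one member is expected in the fraction where its shared peptides were observed. The isoform names and masses are hypothetical.

```python
import bisect

# Hypothetical molecular-weight cut-offs (kDa) between adjacent fractions;
# fraction 0 holds the lightest proteins.
BOUNDARIES_KDA = (25.0, 50.0, 100.0)


def expected_fraction(mw_kda, boundaries=BOUNDARIES_KDA):
    """Predict a fraction index from molecular weight alone."""
    return bisect.bisect_right(boundaries, mw_kda)


def resolve_group(group, observed_fraction, mw_kda):
    """Return the single group member whose expected fraction matches the fraction
    where the shared peptides were observed, or None if zero or several match."""
    matches = [p for p in group if expected_fraction(mw_kda[p]) == observed_fraction]
    return matches[0] if len(matches) == 1 else None


# Hypothetical splice variants sharing all detected peptides.
mw = {"ISO1": 85.0, "ISO2": 42.0}
print(resolve_group(["ISO1", "ISO2"], 2, mw))
```

Returning None instead of guessing keeps the assignment deterministic. Ambiguous groups stay grouped, which matches the spirit of reporting only deterministic identifications.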

18.
Recent large-scale data sets of protein complex purifications have provided unprecedented insights into the organization of cellular protein complexes. Several computational methods have been developed to detect co-complexed proteins in these data sets, with the common aim of identifying biologically relevant protein complexes. However, much less is known about the network of direct physical protein contacts within the detected complexes. Our work therefore investigates whether direct physical contacts can be computationally derived by combining raw data from large-scale protein complex purifications. We assess four established scoring schemes and introduce a new scoring approach specifically devised to infer direct physical protein contacts from protein complex purifications. The physical contacts identified by the five methods are comprehensively benchmarked against different reference sets that provide evidence for true physical contacts. Our results show that raw purification data can indeed be exploited to determine high-confidence physical protein contacts within protein complexes. In particular, our new method outperforms competing approaches at discovering physical contacts involving proteins that have been screened multiple times in purification experiments. It also excels in the analysis of recent protein purification screens of molecular chaperones and protein kinases. In contrast to previous findings, we observe that physical contacts inferred from purification experiments can be qualitatively comparable to binary protein interactions measured by high-throughput experimental assays such as yeast two-hybrid. This suggests that computationally derived physical contacts might complement binary protein interaction assays. They could also guide large-scale interactome mapping projects by prioritizing putative physical contacts for further experimental screens.
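The scores compared in this work all start from the raw co-occurrence of proteins across purifications. The sketch below shows that raw signal with a simple Dice coefficient: twice the number of purifications containing both proteins, divided by the sum of each protein's appearances. This is an illustrative stand-in only; it is neither the authors' new score nor any of the four established schemes they assess. It also ignores bait/prey roles, so on its own it measures co-complex association rather than direct contact. The purification lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def copurification_scores(purifications):
    """Dice score 2*|both| / (|A| + |B|) for every co-purified protein pair."""
    appearances = Counter()
    together = Counter()
    for members in purifications:
        members = set(members)
        appearances.update(members)
        for p, q in combinations(sorted(members), 2):
            together[(p, q)] += 1
    return {pair: 2 * n / (appearances[pair[0]] + appearances[pair[1]])
            for pair, n in together.items()}


# Hypothetical purification experiments (proteins detected in each pull-down).
purifications = [["A", "B", "C"], ["A", "B"], ["A", "B", "D"], ["C", "D"]]
print(copurification_scores(purifications))
```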

19.
In recent years, cryo-electron microscopy (cryo-EM) has emerged as a key technology for determining protein structures, particularly those of large protein complexes and assemblies. A key challenge in cryo-EM data analysis is to automatically reconstruct accurate protein structures from cryo-EM density maps. In this review, we briefly survey deep learning methods for building protein structures from cryo-EM density maps, analyze their impact, and discuss the challenges of preparing high-quality data sets for training deep learning models. Looking ahead, more advanced deep learning models need to be developed to further advance the field. Such models should effectively integrate cryo-EM data with complementary data sources such as protein sequences and AlphaFold-predicted structures.

20.
MOTIVATION: Subcellular protein localization data are critical to the quantitative understanding of cellular function and regulation. Such data are acquired by observing and quantitatively analyzing fluorescently labeled proteins in living cells. Differentiating labeled protein from cellular artifacts remains an obstacle to accurate quantification. We have developed a novel hybrid machine-learning-based method to differentiate signal from artifact in membrane protein localization data. The method derives positional information via surface fitting and combines it with fluorescence-intensity-based data to generate input for a support vector machine. RESULTS: We have used this classifier to analyze signaling protein localization in T-cell activation. Our classifier outperformed previously available techniques and proved both flexible and adaptable. Training on heterogeneous data yielded a general classifier with good overall performance, whereas training on more specific data yielded an extremely high-performance specific classifier. We also demonstrate accurate automated learning using additional experimental data.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417