Similar literature
 20 similar articles found (search time: 46 ms)
1.
We present a knowledge‐based function to score protein decoys based on their similarity to native structure. A set of features is constructed to describe the structure and sequence of the entire protein chain. Furthermore, a qualitative relationship is established between the calculated features and the underlying electromagnetic interaction that dominates this scale. The features we use are associated with residue–residue distances, residue–solvent distances, pairwise knowledge‐based potentials and a four‐body potential. In addition, we introduce a new target to be predicted, the fitness score, which measures the similarity of a model to the native structure. This new approach enables us to obtain information both from decoys and from native structures. It is also devoid of previous problems associated with knowledge‐based potentials. These features were obtained for a large set of native and decoy structures and a back‐propagating neural network was trained to predict the fitness score. Overall this new scoring potential proved to be superior to the knowledge‐based scoring functions used as its inputs. In particular, in the latest CASP (CASP10) experiment our method was ranked third for all targets, and second for freely modeled hard targets among about 200 groups for top model prediction. Ours was the only method ranked in the top three for all targets and for hard targets. This shows that initial results from the novel approach are able to capture details that were missed by a broad spectrum of protein structure prediction approaches. Source code and executables from this work are freely available at http://mathmed.org/#Software and http://mamiris.com/. Proteins 2014; 82:752–759. © 2013 Wiley Periodicals, Inc.

2.
Locating sequences compatible with a protein structural fold is the well‐known inverse protein‐folding problem. While significant progress has been made, the success rate of protein design remains low. As a result, a library of designed sequences or profile of sequences is currently employed for guiding experimental screening or directed evolution. Sequence profiles can be computationally predicted by iterative mutations of a random sequence to produce energy‐optimized sequences, or by combining sequences of structurally similar fragments in a template library. The latter approach is computationally more efficient but yields less accurate profiles than the former because it lacks tertiary structural information. Here we present a method called SPIN that predicts Sequence Profiles by Integrated Neural network based on fragment‐derived sequence profiles and structure‐derived energy profiles. SPIN improves over the fragment‐derived profile by 6.7 percentage points (from 23.6% to 30.3%) in sequence identity between predicted and wild‐type sequences. The method also reduces the number of residues in low‐complexity regions by 15.7% and has a significantly better balance of hydrophilic and hydrophobic residues at the protein surface. The accuracy of the sequence profiles obtained is comparable to that of profiles generated by the protein design program RosettaDesign 3.5. This highly efficient method for predicting sequence profiles from structures will be useful as a single‐body scoring term for improving scoring functions used in protein design and fold recognition. It also complements protein design programs in guiding experimental design of the sequence library for screening and directed evolution of designed sequences. The SPIN server is available at http://sparks-lab.org. Proteins 2014; 82:2565–2573. © 2014 Wiley Periodicals, Inc.
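
The sequence identity quoted in this abstract can be illustrated with a minimal sketch. It assumes the predicted and wild-type sequences have equal length (fixed-backbone design, so no alignment is needed); the function name and the example sequences are hypothetical and are not taken from SPIN.

```python
def sequence_identity(predicted: str, wild_type: str) -> float:
    """Fraction of positions at which both sequences carry the same residue."""
    if len(predicted) != len(wild_type):
        raise ValueError("sequences must have the same length")
    if not wild_type:
        raise ValueError("sequences must be non-empty")
    # Count position-wise matches between the two sequences.
    matches = sum(p == w for p, w in zip(predicted, wild_type))
    return matches / len(wild_type)


# Hypothetical 10-residue sequences that differ at two positions.
print(sequence_identity("MKTAYIAKQR", "MKSAYLAKQR"))  # 0.8
```

Averaging this value over a test set of proteins would give a figure comparable in kind to the 23.6% and 30.3% reported above.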

3.
NETASA: neural network based prediction of solvent accessibility   (cited 3 times in total: 0 self-citations, 3 citations by others)
MOTIVATION: Prediction of the tertiary structure of a protein from its amino acid sequence is one of the most important problems in molecular biology. The successful prediction of solvent accessibility will be very helpful to achieve this goal. In the present work, we have implemented a server, NETASA, for predicting solvent accessibility of amino acids using our newly optimized neural network algorithm. Several new features in the neural network architecture and training method have been introduced, and the network learns faster to provide accuracy values that are comparable to or better than those of other methods of ASA prediction. RESULTS: Predictions in two- and three-state classification systems with several thresholds are provided. Our prediction method achieved accuracy of up to 90% for training and 88% for test data sets. Three-state prediction results provide a maximum of 65% accuracy for training and 63% for the test data. Applicability of neural networks for ASA prediction has been confirmed with a larger data set and wider range of state thresholds. Salient differences between a linear and an exponential network for ASA prediction have been analysed. AVAILABILITY: Online predictions are freely available at: http://www.netasa.org. Linux ix86 binaries of the program written for this work may be obtained by email from the corresponding author.

4.
Haipeng Gong 《Proteins》2017,85(12):2162-2169
Helix‐helix interactions are crucial in the structure assembly, stability and function of helix‐rich proteins including many membrane proteins. In spite of remarkable progress over the past decades, the accuracy of predicting protein structures from their amino acid sequences is still far from satisfactory. In this work, we focused on a simpler problem, the prediction of helix‐helix interactions, the results of which could facilitate practical protein structure prediction by constraining the sampling space. Specifically, we started from the noisy 2D residue contact maps derived from correlated residue mutations, and utilized ridge detection to identify the characteristic residue contact patterns for helix‐helix interactions. The ridge information as well as a few additional features were then fed into a machine learning model, HHConPred, to predict interactions between helix pairs. In an independent test, our method achieved an F‐measure of ~60% for predicting helix‐helix interactions. Moreover, although the model was trained mainly using soluble proteins, it could be extended to membrane proteins with at least comparable performance relative to previous approaches that were generated purely using membrane proteins. All data and source code are available at http://166.111.152.91/Downloads.html or https://github.com/dpxiong/HHConPred.
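
For readers unfamiliar with the F-measure reported in this abstract, the sketch below shows how it is commonly computed as the harmonic mean of precision and recall. Representing each interacting helix pair as a tuple of helix indices is an assumption made here for illustration; it is not the data format of HHConPred, and the example pairs are invented.

```python
def f_measure(true_pairs: set, predicted_pairs: set) -> float:
    """Harmonic mean of precision and recall for predicted helix-helix pairs."""
    true_positives = len(true_pairs & predicted_pairs)
    if true_positives == 0:
        # No correct prediction: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_positives / len(predicted_pairs)
    recall = true_positives / len(true_pairs)
    return 2 * precision * recall / (precision + recall)


# Hypothetical helix pairs, each written as (lower index, higher index).
observed = {(1, 2), (1, 3), (2, 4)}
predicted = {(1, 2), (2, 4), (3, 4)}
print(round(f_measure(observed, predicted), 3))  # 0.667
```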

5.
DNA metabarcoding offers new perspectives in biodiversity research. This recently developed approach to ecosystem study relies heavily on the use of next‐generation sequencing (NGS) and thus calls upon the ability to deal with huge sequence data sets. The obitools package satisfies this requirement thanks to a set of programs specifically designed for analysing NGS data in a DNA metabarcoding context. Their capacity to filter and edit sequences while taking into account taxonomic annotation helps to set up tailor‐made analysis pipelines for a broad range of DNA metabarcoding applications, including biodiversity surveys or diet analyses. The obitools package is distributed as open source software, available on the following website: http://metabarcoding.org/obitools. A Galaxy wrapper is available on the GenOuest core facility toolshed: http://toolshed.genouest.org.

6.
Regarding Paper “Sample size determination in clinical trials with multiple co‐primary endpoints including mixed continuous and binary variables” by T. Sozu, T. Sugimoto, and T. Hamasaki, Biometrical Journal (2012) 54 (5): 716–729. Article: http://dx.doi.org/10.1002/bimj.201100221 Authors' Reply: http://dx.doi.org/10.1002/bimj.201300032 This paper recently introduced a methodology for calculating the sample size in clinical trials with multiple mixed binary and continuous co‐primary endpoints modeled by the so‐called conditional grouped continuous model (CGCM). The purpose of this note is to clarify certain aspects of the methodology and propose an alternative approach based on latent means tests for the binary endpoints. We demonstrate that our approach is more powerful, yielding smaller sample sizes at powers comparable to those used in the paper.

7.
Nguyen MN  Rajapakse JC 《Proteins》2006,63(3):542-550
We address the problem of predicting solvent accessible surface area (ASA) of amino acid residues in protein sequences, without classifying them into buried and exposed types. A two-stage support vector regression (SVR) approach is proposed to predict real values of ASA from the position-specific scoring matrices generated from PSI-BLAST profiles. By adding SVR as the second stage to capture the influences on the ASA value of a residue by those of its neighbors, the two-stage SVR approach achieves improvements of mean absolute errors up to 3.3%, and correlation coefficients of 0.66, 0.68, and 0.67 on the Manesh dataset of 215 proteins, the Barton dataset of 502 nonhomologous proteins, and the Carugo dataset of 338 proteins, respectively, which are better than the scores published earlier on these datasets. A Web server for protein ASA prediction using the two-stage SVR method has been developed and is available (http://birc.ntu.edu.sg/~pas0186457/asa.html).

8.
We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families and derive meaningful functional labels, and offers a seamless approach to propagate functional annotation across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of un-annotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. For the proteome of Arabidopsis, our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single-constituent database within InterPro. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam’s capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.

9.
Liu R  Hu J 《PloS one》2011,6(10):e25560
Computational identification of heme-binding residues is beneficial for predicting and designing novel heme proteins. Here we proposed a novel method for heme-binding residue prediction by exploiting topological properties of these residues in the residue interaction networks derived from three-dimensional structures. Comprehensive analysis showed that key residues located in heme-binding regions are generally associated with the nodes with higher degree, closeness and betweenness, but lower clustering coefficient in the network. HemeNet, a support vector machine (SVM) based predictor, was developed to identify heme-binding residues by combining topological features with existing sequence and structural features. The results showed that incorporation of network-based features significantly improved the prediction performance. We also compared the residue interaction networks of heme proteins before and after heme binding and found that the topological features can characterize the heme-binding sites of apo structures as well as those of holo structures, which led to reliable performance improvement when we applied HemeNet to predicting the binding residues of proteins in the heme-free state. The HemeNet web server is freely accessible at http://mleg.cse.sc.edu/hemeNet/.
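
Three of the topological features named in this abstract (degree, closeness and clustering coefficient) can be sketched on a toy network as follows. This is an illustration under stated assumptions, not the HemeNet implementation: the network is a hypothetical four-residue graph stored as an adjacency dictionary, closeness is computed only over reachable nodes, and betweenness is left out to keep the example short.

```python
from collections import deque


def degree(graph: dict, node: str) -> int:
    """Number of residues in direct contact with the given residue."""
    return len(graph[node])


def clustering_coefficient(graph: dict, node: str) -> float:
    """Fraction of neighbour pairs of a node that are themselves in contact."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph[neighbours[i]]
    )
    return 2 * links / (k * (k - 1))


def closeness(graph: dict, node: str) -> float:
    """Inverse of the mean shortest-path length to all reachable nodes."""
    distance = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search over the unweighted network
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    total = sum(distance.values())
    return (len(distance) - 1) / total if total else 0.0


# Hypothetical residue interaction network: A contacts B, C, D; B contacts C.
toy_network = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(degree(toy_network, "A"), closeness(toy_network, "A"))  # 3 1.0
```

In a feature-based predictor of the kind described above, values like these would be computed per residue and concatenated with sequence and structural features before training the classifier.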

10.
Computational prediction of RNA‐binding residues is helpful in uncovering the mechanisms underlying protein‐RNA interactions. Traditional algorithms applied either a feature‐based or a template‐based prediction strategy on its own to recognize these crucial residues, which could restrict their predictive power. To improve RNA‐binding residue prediction, herein we propose the first integrative algorithm, termed RBRDetector (RNA‐Binding Residue Detector), which combines these two strategies. We developed a feature‐based approach that is an ensemble learning predictor comprising multiple structure‐based classifiers, in which well‐defined evolutionary and structural features in conjunction with sequential or structural microenvironment were used as the inputs of support vector machines. Meanwhile, we constructed a template‐based predictor to recognize the putative RNA‐binding regions by structurally aligning the query protein to the RNA‐binding proteins with known structures. The final RBRDetector algorithm is an ingenious fusion of our feature‐ and template‐based approaches based on a piecewise function. By validating our predictors with diverse types of structural data, including bound and unbound structures, native and simulated structures, and protein structures binding to different RNA functional groups, we consistently demonstrated that RBRDetector not only had clear advantages over its component methods, but also significantly outperformed the current state‐of‐the‐art algorithms. Nevertheless, the major limitation of our algorithm is that it performed relatively well on DNA‐binding proteins and thus incorrectly predicted the DNA‐binding regions as RNA‐binding interfaces. Finally, we implemented the RBRDetector algorithm as a user‐friendly web server, which is freely accessible at http://ibi.hzau.edu.cn/rbrdetector. Proteins 2014; 82:2455–2471. © 2014 Wiley Periodicals, Inc.

11.
STING Millennium Suite (SMS) is a new web-based suite of programs and databases providing visualization and a complex analysis of molecular sequence and structure for the data deposited at the Protein Data Bank (PDB). SMS operates with a collection of both publicly available data (PDB, HSSP, Prosite) and its own data (contacts, interface contacts, surface accessibility). Biologists find SMS useful because it provides a variety of algorithms and validated data, wrapped up in a user-friendly web interface. Using SMS it is now possible to analyze sequence to structure relationships, the quality of the structure, the nature and volume of atomic contacts of intra- and inter-chain type, relative conservation of amino acids at a specific sequence position based on multiple sequence alignment, and indications of folding essential residues (FER) based on the relationship of residue conservation to the intra-chain contacts and Calpha-Calpha and Cbeta-Cbeta distance geometry. Specific emphasis in SMS is given to interface forming residues (IFR), the amino acids that define the interactive portion of the protein surfaces. SMS may simultaneously display and analyze previously superimposed structures. PDB updates trigger SMS updates in a synchronized fashion. SMS is freely accessible for public data at http://www.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS and http://trantor.bioc.columbia.edu/SMS.

12.
GermOnline is a web-accessible relational database that enables life scientists to make a significant and sustained contribution to the annotation of genes relevant for the fields of mitosis, meiosis, germ line development and gametogenesis across species. This novel approach to genome annotation includes a platform for knowledge submission and curation as well as microarray data storage and visualization hosted by a global network of servers. AVAILABILITY: The database is accessible at http://www.germonline.org/. For convenient world-wide access we have set up a network of servers in Europe (http://germonline.unibas.ch/; http://germonline.igh.cnrs.fr/), Japan (http://germonline.biochem.s.u-tokyo.ac.jp/) and USA (http://germonline.yeastgenome.org/). SUPPLEMENTARY INFORMATION: Extended documentation of the database is available through the link 'About GermOnline' at the websites.

13.
We present a novel partner‐specific protein–protein interaction site prediction method called PAIRpred. Unlike most existing machine learning binding site prediction methods, PAIRpred uses information from both proteins in a protein complex to predict pairs of interacting residues from the two proteins. PAIRpred captures sequence and structure information about residue pairs through pairwise kernels that are used for training a support vector machine classifier. As a result, PAIRpred presents a more detailed model of protein binding, and offers state of the art accuracy in predicting binding sites at the protein level as well as inter‐protein residue contacts at the complex level. We demonstrate PAIRpred's performance on Docking Benchmark 4.0 and recent CAPRI targets. We present a detailed performance analysis outlining the contribution of different sequence and structure features, together with a comparison to a variety of existing interface prediction techniques. We have also studied the impact of binding‐associated conformational change on prediction accuracy and found PAIRpred to be more robust to such structural changes than existing schemes. As an illustration of the potential applications of PAIRpred, we provide a case study in which PAIRpred is used to analyze the nature and specificity of the interface in the interaction of human ISG15 protein with NS1 protein from influenza A virus. Python code for PAIRpred is available at http://combi.cs.colostate.edu/supplements/pairpred/. Proteins 2014; 82:1142–1155. © 2013 Wiley Periodicals, Inc.

14.
Introduction: This review is an update on recent progress in proteomic studies of formalin-fixed paraffin-embedded (FFPE) tissues, which open the opportunity to investigate diseases and research potential biomarkers, particularly when availability of fresh/frozen tissues is low.

Areas covered: We describe improvements to existing protocols, as well as new protocols, for deparaffinization and protein extraction of FFPE samples published from 2014 to today. Moreover, the growing interest in using FFPE tissues for mass spectrometry imaging is presented, together with the search for post-translational modifications.

Expert opinion: In the last few years, the number of papers using FFPE tissues in proteomic analysis has been growing. The interest in applying proteomic analysis to FFPE tissues lies in the easy accessibility of a great number of samples from archives. Nevertheless, the approach has not been standardized across research groups, which makes the results obtained essentially incomparable. This limitation should be overcome.


15.
Context: The endothelin system (Big-ET-1) is a key regulator in cardiovascular (CV) disease and congestive heart failure (CHF).

Objectives: We have examined the incremental value of Big-ET-1 in predicting total and CV mortality next to the well-established CV risk marker N-Terminal Pro-B-Type Natriuretic Peptide (NT-proBNP).

Methods: Big-ET-1 and NT-proBNP were determined in 2829 participants referred for coronary angiography (follow-up 9.9 years).

Results: Big-ET-1 is an independent predictor of total mortality, CV mortality and death due to CHF.

Discussion: The conjunct use of Big-ET-1 and NT-proBNP improves the risk stratification of patients with intermediate to high risk of CV death and CHF.

Conclusions: Big-ET-1 improves risk stratification in patients referred for coronary angiography.


16.
The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence‐search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino‐acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as “Protein Blocks” (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence‐search methods, nor does it explicitly incorporate the hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z‐score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web server that is freely available at http://www.bo-protscience.fr/forsa.

17.
In this brief report, we provide a pictorial essay on an international conference “Photosynthesis Research for Sustainability-2013 in honor of Jalal A. Aliyev” that was held in Baku, Azerbaijan, during June 5–9, 2013 (http://photosynthesis2013.cellreg.org/). We begin this report with a brief note on Jalal Aliyev, the honored scientist, and on John Walker (1997 Nobel laureate in Chemistry) who was a distinguished guest and lecturer at the Conference. We briefly describe the Conference, and the program. In addition to the excellent scientific program, a special feature of the Conference was the presentation of awards to nine outstanding young investigators; they are recognized in this report. We have also included several photographs to show the pleasant ambience at this conference. (See http://photosynthesis2013.cellreg.org/Photo-Gallery.php; https://www.dropbox.com/sh/qcr124dajwffwh6/TlcHBvFu4H?m; and https://www.copy.com/s/UDlxb9fgFXG9/Baku for more photographs taken by the authors as well as by others.) We invite the readers to the next conference, “Photosynthesis Research for Sustainability-2014: in honor of Vladimir A. Shuvalov”, to be held during June 2–7, 2014, in Pushchino, Russia. Detailed information for this will be posted at the website http://photosynthesis2014.cellreg.org/, and for the subsequent conference, “Photosynthesis Research for Sustainability-2015”, to be held in May or June 2015 in Baku, Azerbaijan, at http://photosynthesis2015.cellreg.org/.

18.
《Genomics》2019,111(6):1274-1282
A cell contains numerous protein molecules. One of the fundamental goals in molecular cell biology is to determine their subcellular locations since this information is extremely important to both basic research and drug development. In this paper, we report a novel and very powerful predictor called “pLoc_bal-mHum” for predicting the subcellular localization of human proteins based on their sequence information alone. Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the new predictor is remarkably superior to the existing state-of-the-art predictor in identifying the subcellular localization of human proteins. To maximize the convenience for the majority of experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mHum/, by which users can easily get their desired results without the need to go through the detailed mathematics.

19.
Capsule: Long-term population trends of gulls on the Isle of Canna, Scotland, showed a correlation to fish tonnage landed in a nearby port.

Aims: To assess whether gull numbers and breeding success at Canna have been influenced by the amount of fish discarded in the area.

Methods: We examined data on gull breeding numbers, breeding success and diet studied at Canna from 1969 to 2014, and data on fish landings at the nearby port of Mallaig for 1985 to 2014. We examined correlations between gull and fishery data, and performed a detrended analysis of Herring Gull Larus argentatus numbers in relation to demersal fish catch (the latter as a proxy for discard volumes).

Results: Gulls fed extensively on discards. Gull breeding numbers declined at Canna, especially between 2000 and 2006, the decline being more pronounced than seen in national totals. Gull breeding numbers correlated with demersal landings, even after detrending for long-term decreases in both.

Conclusions: Correlation between detrended Herring Gull breeding numbers and detrended demersal fish landings provided strong evidence for a causal link between fishery discarding and gull breeding numbers.
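
The detrended correlation described in this abstract can be sketched as follows. The abstract does not say how the series were detrended, so removing a least-squares linear trend from each series is an assumption made here, and the yearly values are invented for illustration rather than taken from the Canna or Mallaig data.

```python
from statistics import mean


def linear_residuals(values: list) -> list:
    """Residuals after removing a least-squares straight line fitted against time."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations to detrend")
    years = range(n)  # equally spaced time index 0, 1, 2, ...
    t_bar, v_bar = mean(years), mean(values)
    slope = sum((t - t_bar) * (v - v_bar) for t, v in zip(years, values)) / sum(
        (t - t_bar) ** 2 for t in years
    )
    return [v - (v_bar + slope * (t - t_bar)) for t, v in zip(years, values)]


def pearson(a: list, b: list) -> float:
    """Pearson correlation coefficient between two equally long series."""
    a_bar, b_bar = mean(a), mean(b)
    covariance = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    spread = (sum((x - a_bar) ** 2 for x in a) * sum((y - b_bar) ** 2 for y in b)) ** 0.5
    if spread == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / spread


# Hypothetical yearly counts; both series decline over time.
gull_pairs = [100, 90, 85, 70, 65, 50]
landings_tonnes = [520, 480, 470, 400, 390, 330]
r = pearson(linear_residuals(gull_pairs), linear_residuals(landings_tonnes))
print(round(r, 2))
```

Correlating the residuals rather than the raw series guards against two unrelated quantities appearing linked merely because both decline over the same period.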


20.
Understanding which peptides and proteins have the potential to undergo amyloid formation and what driving forces are responsible for amyloid-like fiber formation and stabilization remains limited. This is mainly because proteins that can undergo structural changes, which lead to amyloid formation, are quite diverse and share no obvious sequence or structural homology, despite the structural similarity found in the fibrils. To address these issues, a novel approach based on recursive feature selection and feed-forward neural networks was undertaken to identify key features highly correlated with the self-assembly problem. This approach allowed the identification of seven physicochemical and biochemical properties of the amino acids highly associated with the self-assembly of peptides and proteins into amyloid-like fibrils (normalized frequency of β-sheet, normalized frequency of β-sheet from LG, weights for β-sheet at the window position of 1, isoelectric point, atom-based hydrophobic moment, helix termination parameter at position j+1 and ΔG° values for peptides extrapolated in 0 M urea). Moreover, these features enabled the development of a new predictor (available at http://cran.r-project.org/web/packages/appnn/index.html) capable of accurately and reliably predicting the amyloidogenic propensity from the polypeptide sequence alone with a prediction accuracy of 84.9% against an external validation dataset of sequences with experimental in vitro evidence of amyloid formation.
