Similar Documents
20 similar documents found.
1.
In this paper, we present numerical evidence that supports the notion of minimization in the sequence space of proteins for a target conformation. We use the conformations of real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with minimum energy. We use an edge-weighted connectivity graph to rank the residue sites with a reduced amino acid alphabet, and then use continuous optimization to obtain the energy-minimizing sequences. Our methods enable the computation of a lower bound as well as a tight upper bound for the energy of a given conformation. We validate our results by using three different inter-residue energy matrices for five proteins from the PDB, and by comparing our energy-minimizing sequences with 80 million diverse sequences that are generated based on different considerations in each case. When we submitted some of our chosen energy-minimizing sequences to the Basic Local Alignment Search Tool (BLAST), we obtained sequences from the non-redundant protein sequence database that are similar to ours, with E-values of the order of 10⁻⁷. In summary, we conclude that proteins show a trend towards minimizing energy in sequence space but do not seem to adopt the global energy-minimizing sequence. The reason could be either that the existing energy matrices cannot accurately represent inter-residue interactions in the context of the protein environment, or that Nature does not push the optimization in sequence space any further once the protein is able to perform its function.
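As a rough illustration of the quantity being minimized: in a pairwise contact model, the energy of a sequence threaded onto a fixed conformation is the sum of inter-residue matrix entries over the conformation's contacts. The sketch below assumes a toy two-letter (H/P) alphabet, a hand-written contact list and a fixed number of H residues, all hypothetical, and the function names are illustrative. It enumerates sequences exhaustively, which is feasible only for tiny chains and is not the graph-ranking plus continuous-optimization method described above; the lower bound shown is only the trivial one (every contact at the matrix minimum).

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Contact = Tuple[int, int]
EnergyMatrix = Dict[Tuple[str, str], float]


def sequence_energy(seq: Sequence[str], contacts: List[Contact], e: EnergyMatrix) -> float:
    """Sum the pairwise matrix entry e[(a, b)] over every contact (i, j) of the fold."""
    return sum(e[(seq[i], seq[j])] for i, j in contacts)


def brute_force_minimum(n: int, contacts: List[Contact], e: EnergyMatrix,
                        alphabet: str = "HP", n_h: int = 2) -> Tuple[str, float]:
    """Exhaustively find a minimum-energy sequence with exactly n_h 'H' residues."""
    candidates = (s for s in product(alphabet, repeat=n) if s.count("H") == n_h)
    best = min(candidates, key=lambda s: sequence_energy(s, contacts, e))
    return "".join(best), sequence_energy(best, contacts, e)


def trivial_lower_bound(contacts: List[Contact], e: EnergyMatrix) -> float:
    """No sequence can score below every contact sitting at the matrix minimum."""
    return len(contacts) * min(e.values())


# Hypothetical HP-style matrix: only H-H contacts are favourable.
E_HP: EnergyMatrix = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}
CONTACTS: List[Contact] = [(0, 3), (1, 4)]

seq, energy = brute_force_minimum(5, CONTACTS, E_HP)
print(seq, energy, trivial_lower_bound(CONTACTS, E_HP))
```

Here the composition constraint (two H residues) keeps the optimum at -1.0, above the trivial bound of -2.0, which is why tighter bounds are worth computing.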

2.
A numerical study of the energy landscape of the space of model protein sequences is carried out. As a consequence of the heterogeneity of the contact energies among amino acids, the energy landscape displays a very rough profile, a behaviour typical of frustrated systems. This gives rise to a hierarchical clustering of low-energy sequences and can have evolutionary consequences.

3.
Abstract: The processing of the amyloid precursor protein (APP) was investigated in cells stably expressing different APP hybrid proteins. The cytoplasmic domain of APP was either deleted or replaced by the corresponding domain of the membrane protein TGN38, lamp-1, or LIMPII. The cytosolic domain of TGN38 in the APP molecule did not alter the secretion of βA4 when compared with the wild-type APP; however, APP associated with the cell surface and the nonamyloidogenic processing of APP were reduced. With the APP molecules carrying the lysosomal targeting signals of lamp-1 or LIMPII, a decrease in the secretion of βA4 was observed. Cell surface association and nonamyloidogenic processing were also impaired. This suggests increased degradation of APP and thus efficient targeting to the lysosomal system. Cells expressing the Swedish APP variant generated intracellular βA4 that accumulated after treatment with chloroquine. This effect was more dramatic with APP mutants carrying lysosomal targeting signals than with full-length APP. Our data suggest the existence of an intracellular site of βA4 generation from where βA4 is degraded rather than secreted.

4.
Few sequence alignment methods have been designed specifically for integral membrane proteins, even though these important proteins have distinct evolutionary and structural properties that might affect their alignments. Existing approaches typically consider membrane-related information either by using membrane-specific substitution matrices or by assigning distinct penalties for gap creation in transmembrane and non-transmembrane regions. Here, we ask whether favoring matching of predicted transmembrane segments within a standard dynamic programming algorithm can improve the accuracy of pairwise membrane protein sequence alignments. We tested various strategies using a specifically designed program called AlignMe. An updated set of homologous membrane protein structures, called HOMEP2, was used as a reference for optimizing the gap penalties. The best of the membrane-protein optimized approaches were then tested on an independent reference set of membrane protein sequence alignments from the BAliBASE collection. When secondary structure (S) matching was combined with evolutionary information (using a position-specific substitution matrix (P)), in an approach we called AlignMePS, the resultant pairwise alignments were typically among the most accurate over a broad range of sequence similarities when compared to available methods. Matching transmembrane predictions (T), in addition to evolutionary information, and secondary-structure predictions, in an approach called AlignMePST, generally reduces the accuracy of the alignments of closely-related proteins in the BAliBASE set relative to AlignMePS, but may be useful in cases of extremely distantly related proteins for which sequence information is less informative. The open source AlignMe code is available at https://sourceforge.net/projects/alignme/, and at http://www.forrestlab.org, along with an online server and the HOMEP2 data set.
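The dynamic-programming setting the abstract starts from can be sketched as a global (Needleman-Wunsch) alignment in which the cost of a gap depends on whether the skipped residue lies in a predicted transmembrane (TM) segment. This only illustrates the TM-dependent gap penalties mentioned above and is not AlignMe itself: the function name `nw_score`, the identity-based match/mismatch scores and the penalty values are hypothetical, and the TM flags are assumed to come from some external predictor.

```python
from typing import Sequence


def nw_score(a: str, b: str, tm_a: Sequence[bool], tm_b: Sequence[bool],
             match: float = 2.0, mismatch: float = -1.0,
             gap_tm: float = -4.0, gap_sol: float = -1.0) -> float:
    """Global alignment score with a linear gap cost that depends on whether the
    residue placed opposite a gap is in a predicted transmembrane (TM) segment."""
    def gap(in_tm: bool) -> float:
        return gap_tm if in_tm else gap_sol

    n, m = len(a), len(b)
    # f[i][j] = best score for aligning a[:i] with b[:j]
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = f[i - 1][0] + gap(tm_a[i - 1])
    for j in range(1, m + 1):
        f[0][j] = f[0][j - 1] + gap(tm_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            f[i][j] = max(f[i - 1][j - 1] + s,               # align a[i-1] with b[j-1]
                          f[i - 1][j] + gap(tm_a[i - 1]),    # a[i-1] opposite a gap
                          f[i][j - 1] + gap(tm_b[j - 1]))    # b[j-1] opposite a gap
    return f[n][m]


# The extra residue C is flagged as TM in the second call, so deleting it costs more.
print(nw_score("ACGT", "AGT", [False] * 4, [False] * 3))
print(nw_score("ACGT", "AGT", [False, True, False, False], [False] * 3))
```

Raising `gap_tm` relative to `gap_sol` pushes gaps out of membrane-spanning stretches; in the study above, the corresponding penalties were optimized against the HOMEP2 reference set.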

5.
We investigate methods of estimating residue correlation within protein sequences. We begin by using the mutual information (MI) of adjacent residues, and improve our methodology by defining the mutual information vector (MIV) to estimate long-range correlations between nonadjacent residues. We also consider correlation based on residue hydropathy rather than protein-specific interactions. Finally, in family classification experiments, the modeling power of MIV was shown to be significantly better than that of the classic MI method, reaching the level at which proteins can be classified without alignment information.
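A minimal sketch of the two quantities, assuming a simple plug-in (frequency-count) estimator pooled over all sequences, which the abstract does not specify: MI is computed for residue pairs separated by k positions, and the MIV is taken here to be the list of those MI values for k = 1..K. The function names are illustrative.

```python
import math
from collections import Counter
from typing import Iterable, List


def mutual_information(seqs: Iterable[str], k: int) -> float:
    """Plug-in estimate (in bits) of the MI between residues k positions apart,
    pooled over all sequences."""
    pairs: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k):
            pairs[(s[i], s[i + k])] += 1
    total = sum(pairs.values())
    if total == 0:
        return 0.0
    left: Counter = Counter()
    right: Counter = Counter()
    for (x, y), c in pairs.items():
        left[x] += c
        right[y] += c
    mi = 0.0
    for (x, y), c in pairs.items():
        p_xy = c / total
        mi += p_xy * math.log2(p_xy / ((left[x] / total) * (right[y] / total)))
    return mi


def mi_vector(seqs: List[str], max_k: int) -> List[float]:
    """MIV sketch: one MI value per separation k = 1..max_k."""
    return [mutual_information(seqs, k) for k in range(1, max_k + 1)]


print(mi_vector(["ACACACAC"], 2))
```

For the hydropathy-based variant, one could first map residues to coarse classes (for example hydrophobic versus polar) and pass the mapped strings to the same functions; that mapping would be an additional assumption.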

6.
7.
For most proteins, multiple sequence alignments are a viable method to identify functionally and structurally important amino acids, but for most organisms, there is a subset of proteins that are unique or found in a few closely related organisms. For these proteins, it is not possible to produce sequence alignments that are useful in identifying functionally or structurally important amino acids. We have investigated the relationship between amino acid conservation and five factors (the amino acid’s identity, N-terminal neighbor, C-terminal neighbor, the local hydropathy of surrounding amino acids, and the local expected net charge of the surrounding amino acids based on the primary sequence) in Escherichia coli proteins. For four of the factors examined (all but the amino acid’s identity), there is a significant relationship with conservation for some of the standard 20 amino acids. Using the combination of all five factors, we show that it is possible to calculate a score based on the primary sequences of a subset of E. coli proteins that has statistically significant predictive value with respect to predicting conserved amino acids in other E. coli proteins and Saccharomyces cerevisiae proteins. As these five variables show significant relationships with conservation, we have termed them conservation factors.
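All five conservation factors can be read off the primary sequence alone. The sketch below collects them per position under stated assumptions: the Kyte-Doolittle hydropathy scale (the abstract does not name a scale), a hypothetical window of three residues on each side that excludes the residue itself, and a simplified expected charge of +1 for K/R and -1 for D/E with histidine treated as uncharged. The function name is illustrative, and the scoring model that turns these factors into a conservation prediction is not reproduced.

```python
from typing import Dict, List, Tuple

# Kyte-Doolittle hydropathy scale (assumed; the abstract does not name a scale).
KD: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified expected charge near neutral pH; His treated as uncharged (assumption).
CHARGE: Dict[str, float] = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def conservation_features(seq: str, half_window: int = 3) -> List[Tuple[str, str, str, float, float]]:
    """Per position: (residue, N-terminal neighbour, C-terminal neighbour,
    mean hydropathy of surrounding residues, net charge of surrounding residues)."""
    rows = []
    for i, res in enumerate(seq):
        lo, hi = max(0, i - half_window), min(len(seq), i + half_window + 1)
        around = [seq[j] for j in range(lo, hi) if j != i]  # exclude the residue itself
        hyd = sum(KD[r] for r in around) / len(around) if around else 0.0
        chg = sum(CHARGE.get(r, 0.0) for r in around)
        n_nb = seq[i - 1] if i > 0 else "-"              # "-" marks a chain terminus
        c_nb = seq[i + 1] if i + 1 < len(seq) else "-"
        rows.append((res, n_nb, c_nb, hyd, chg))
    return rows


for row in conservation_features("MKLVDEAK", half_window=2):
    print(row)
```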

8.
It is understood that DNA and amino acid substitution rates are highly sequence context-dependent; e.g., C → T substitutions in vertebrates may occur much more frequently at CpG sites, and cysteine substitution rates may depend on support of the context for participation in a disulfide bond. Furthermore, many applications rely on quantitative models of nucleotide or amino acid substitution, including phylogenetic inference and identification of amino acid sequence positions involved in functional specificity. We describe quantification of the context dependence of nucleotide substitution rates using baboon, chimpanzee, and human genomic sequence data generated by the NISC Comparative Sequencing Program. Relative mutation rates are reported for the 96 classes of mutations of the form 5′ αβγ 3′ → 5′ αδγ 3′, where α, β, γ, and δ are nucleotides and β ≠ δ, based on maximum likelihood calculations. Our results confirm that C → T substitutions are enhanced at CpG sites compared with other transitions, relatively independent of the identity of the preceding nucleotide. While, as expected, transitions generally occur more frequently than transversions, we find that the most frequent transversions involve the C at CpG sites (CpG transversions) and that their rate is comparable to the rate of transitions at non-CpG sites. A four-class model of the rates of context-dependent evolution of primate DNA sequences, CpG transitions > non-CpG transitions ≈ CpG transversions > non-CpG transversions, captures qualitative features of the mutation spectrum. We find that despite qualitative similarity of mutation rates among different genomic regions, there are statistically significant differences.
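The count of 96 follows from strand symmetry: 4 × 4 flanking contexts, 4 central bases and 3 alternatives give 192 triplet mutations, and each one describes the same event as its reverse complement on the opposite strand, which halves the total. The sketch below enumerates the classes by reporting every mutation on the strand whose central base is a pyrimidine (a common convention, assumed here) and assigns a mutation to one of the four classes of the model above. Function names are illustrative.

```python
from itertools import product
from typing import Set, Tuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(s: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(s))


def canonical(src: str, dst: str) -> Tuple[str, str]:
    """Report a triplet mutation on the strand whose central base is a pyrimidine."""
    if src[1] in "CT":
        return src, dst
    return revcomp(src), revcomp(dst)


def all_classes() -> Set[Tuple[str, str]]:
    """Enumerate 5' a b g 3' -> 5' a d g 3' with b != d, merged with reverse complements."""
    classes = set()
    for a, b, g, d in product("ACGT", repeat=4):
        if b != d:
            classes.add(canonical(a + b + g, a + d + g))
    return classes


def four_class(src: str, dst: str) -> str:
    """Assign a mutation to one of the four classes of the simple primate model."""
    src, dst = canonical(src, dst)
    cpg = src[1] == "C" and src[2] == "G"  # the mutated C is followed by G
    transition = {src[1], dst[1]} in ({"A", "G"}, {"C", "T"})
    return ("CpG " if cpg else "non-CpG ") + ("transition" if transition else "transversion")


print(len(all_classes()))        # 192 raw mutations collapse to 96 classes
print(four_class("CGA", "CAA"))  # G->A here is C->T at a CpG on the other strand
```

The second example shows why the strand convention matters: a G → A change at the G of a CpG is counted as a CpG transition.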

9.
X-ray crystallography is the primary approach for solving the three-dimensional structure of a protein. However, a major bottleneck is that the multi-step experimental pipeline (sequence cloning, protein material production, purification, crystallization and, ultimately, structure determination) often fails to yield diffraction-quality crystals. Accordingly, predicting from the protein sequence the propensity of a protein to pass these experimental steps successfully may help reduce laborious experimental effort and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge of the important determinants of a protein sequence's propensity to produce diffraction-quality crystals remains largely incomplete. In practice, most existing methods perform worse when evaluated on larger, updated datasets. To address this problem, we constructed an up-to-date benchmark dataset and then developed a new approach, termed ‘PredPPCrys’, using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance for each of the five experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, the prediction outputs of PredPPCrys I were used as inputs to build second-level SVM classifiers (PredPPCrys II), which significantly enhanced prediction performance. Benchmarking experiments indicated that PredPPCrys outperforms most existing procedures on both up-to-date and previous datasets.
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.

10.
Abstract

A new Monte Carlo sampling scheme, the Modified Valley Restrained Monte Carlo procedure, is used to obtain the global energy minimum conformations of polypeptides such as Met-enkephalin and Melittin. For each peptide, we found close agreement with previous results from both theoretical and experimental studies. The simple idea of controlling the step size according to the Valley Function provides useful guidance in searching for global energy minimum structures and furthermore helps solve the multiple-minima problem.
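The abstract does not define the Valley Function, so the following is only a generic stand-in for the idea of step-size control: a one-dimensional Metropolis Monte Carlo search whose trial step grows when too many moves are accepted and shrinks when too few are. The function names, the 50% target acceptance rate, the adjustment factors, the temperature and the double-well test landscape are all assumptions for illustration, not the Modified Valley Restrained procedure.

```python
import math
import random
from typing import Callable, Tuple


def adaptive_step_mc(energy: Callable[[float], float], x0: float,
                     temperature: float = 0.5, step: float = 1.0,
                     n_steps: int = 5000, window: int = 100,
                     target_rate: float = 0.5, seed: int = 0) -> Tuple[float, float]:
    """Metropolis Monte Carlo search that widens or narrows its trial step
    depending on the recent acceptance rate; returns the lowest-energy point seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    accepted = 0
    for t in range(1, n_steps + 1):
        trial = x + rng.uniform(-step, step)
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            x, e = trial, e_trial
            accepted += 1
            if e < best_e:
                best_x, best_e = x, e
        if t % window == 0:
            # Accepting too often means steps are too timid; too rarely, too bold.
            step *= 1.2 if accepted / window > target_rate else 0.8
            accepted = 0
    return best_x, best_e


def double_well(x: float) -> float:
    """Toy 1-D landscape with two minima; the left one (x < 0) is lower."""
    return (x * x - 1.0) ** 2 + 0.3 * x


print(adaptive_step_mc(double_well, x0=1.0))
```

Starting in the higher right-hand well, the uphill acceptances let the walker cross the barrier, while the step-size rule keeps moves from becoming either uselessly small or almost always rejected.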

11.
We undertook this project in response to the rapidly increasing number of protein structures with unknown functions in the Protein Data Bank. Here, we combined a genetic algorithm with a support vector machine (SVM) to predict protein–protein binding sites. On a testing dataset of 50 hetero-complexes, we predicted the binding sites for 66% of the complexes. This classifier achieved greater sensitivity (60.17%), specificity (58.17%), accuracy (64.08%), and F-measure (54.79%), and a higher correlation coefficient (0.2502), than the SVM alone. These results can help guide biologists in designing specific experiments for protein analysis.
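For reference, the reported measures follow from a binary confusion matrix (binding versus non-binding residues). The abstract does not say which correlation coefficient it uses; the Matthews correlation coefficient, a common choice for this kind of task, is assumed here. The function name and the counts in the example are made up, and the sketch assumes that no denominator is zero.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard confusion-matrix metrics for a binary classifier (no zero denominators)."""
    sensitivity = tp / (tp + fn)             # recall on true binding residues
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"sensitivity": sensitivity, "specificity": specificity, "accuracy": accuracy,
            "precision": precision, "f_measure": f_measure, "mcc": mcc}


print(binary_metrics(tp=50, fp=10, tn=30, fn=10))
```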

12.

Background

Structured Logistic Regression (SLR) is a newly developed machine learning tool first proposed in the context of text categorization. The current availability of extensive protein sequence databases calls for an automated method to reliably classify sequences, and SLR seems well suited to this task. The classification of P-type ATPases, a large family of ATP-driven membrane pumps transporting essential cations, was selected as a test case that would generate important biological information as well as provide a proof of concept for the application of SLR to a large-scale bioinformatics problem.

Results

Using SLR, we have built classifiers to identify and automatically categorize P-type ATPases into one of 11 predefined classes. The SLR classifiers are compared with a Hidden Markov Model approach and shown to be highly accurate and scalable. We analysed the 9.3 million sequences in UniProtKB, which represent the bulk of currently known sequences, and attempted to classify a large number of P-type ATPases. To examine the distribution of pumps across organisms, we also applied SLR to 1,123 complete genomes from the Entrez genome database. Finally, we analysed the predicted membrane topology of the identified P-type ATPases.

Conclusions

Using the SLR-based classification tool we are able to run a large scale study of P-type ATPases. This study provides proof-of-concept for the application of SLR to a bioinformatics problem and the analysis of P-type ATPases pinpoints new and interesting targets for further biochemical characterization and structural analysis.
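SLR builds on logistic regression over features derived from the sequence text. The toy below shows only that underlying idea: a plain binary logistic-regression classifier over the set of length-k substrings (k-mers) of each sequence, trained by stochastic gradient ascent. It does not reproduce SLR's own feature handling or the 11-class setup; the names, k = 3, the learning rate, the omitted intercept and the four training sequences (loosely inspired by the DKTG phosphorylation motif of P-type ATPases) are hypothetical.

```python
import math
from typing import Dict, List, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Binary bag-of-substrings features: the set of length-k substrings."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(seqs: List[str], labels: List[int], k: int = 3,
                 lr: float = 0.5, epochs: int = 200) -> Dict[str, float]:
    """Stochastic gradient ascent on the logistic log-likelihood (no intercept)."""
    w: Dict[str, float] = {}
    for _ in range(epochs):
        for seq, y in zip(seqs, labels):
            feats = kmers(seq, k)
            p = sigmoid(sum(w.get(f, 0.0) for f in feats))
            for f in feats:
                w[f] = w.get(f, 0.0) + lr * (y - p)
    return w


def predict(w: Dict[str, float], seq: str, k: int = 3) -> float:
    """Probability of the positive class; unseen k-mers contribute zero weight."""
    return sigmoid(sum(w.get(f, 0.0) for f in kmers(seq, k)))


# Hypothetical toy data: label-1 sequences share a "DKTG" motif.
train_seqs = ["MDKTGT", "ADKTGA", "MAAAAA", "GGGGGG"]
train_labels = [1, 1, 0, 0]
w = train_logreg(train_seqs, train_labels)
print(round(predict(w, "CDKTGC"), 3), round(predict(w, "CAAAC"), 3))
```

A multi-class setup such as the 11 P-type ATPase classes could be handled with one such classifier per class (one-vs-rest), but that design choice is an assumption, not something the abstract states.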

13.
14.
Mass spectrometric (MS) data from human cell secretomes are usually searched against the conventional human protein database for identification. However, such searches may yield false identifications because the secretome is contaminated with fetal bovine serum (FBS) proteins. To overcome this challenge, we provide a composite protein database containing human sequences plus 199 FBS protein sequences for MS data searches of human cell secretomes. Searching against the human-FBS database returned more reliable results, with fewer false-positive and false-negative identifications, than searching either a human-only database or a human-bovine database. Furthermore, the improved results validated our strategy without requiring complex experiments such as SILAC. We expect our strategy to improve the accuracy of human secreted protein identification and to be of value for general use.

15.
The functional annotation of new protein sequences is a major challenge for genomic science. The best way to suggest the function of a protein from its sequence is to find a related protein for which biological information is available. Current alignment algorithms display a list of protein sequence stretches with significant similarity to different protein targets, ordered by their respective mathematical scores. However, statistical and biological significance do not always coincide; therefore, rearranging the program output according to biological characteristics rather than the mathematical score alone would help functional annotation. We describe a new method that predicts the putative function of a protein by integrating the results of the PSI-BLAST program with a fuzzy logic algorithm. Several protein sequence characteristics were tested for their ability to rearrange a PSI-BLAST profile more closely according to biological function. Four of them (amino acid content, matched segment length, and hydropathy and flexibility profiles) contributed positively to the accurate prediction of a protein's function from its sequence when integrated by a fuzzy logic algorithm into a program, BYPASS. Antonio Gómez and Juan Cedano contributed equally to this work.

16.
Rapid analysis of protein structure, interaction, and dynamics requires fast and automated assignment of 3D protein backbone triple-resonance NMR spectra. We introduce CASA, a new automated assignment method based on a depth-first ordered tree search, which uses hand-edited peak-pick lists from a flexible number of triple-resonance experiments. The program was tested on 13 artificially simulated peak lists for proteins of up to 723 residues, as well as on experimental data for four proteins. Under reasonable tolerances, it generated assignments matching those reported in the literature within a few minutes of CPU time. The program was also tested on proteins analyzed by other methods, with both simulated and experimental peak lists, and it generated good assignments in all relevant cases. Its robustness was further tested under various conditions.
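A depth-first ordered tree search for sequential assignment can be sketched as backtracking over residue positions: at each position, try the unused spin systems in a fixed order and keep one only if its Cα(i-1) shift matches the Cα(i) shift of the system placed before it, within a tolerance. This is a toy version, not CASA: it uses a single pair of Cα shifts per spin system and ignores residue-type information and the other triple-resonance experiments; the names, the 0.2 ppm tolerance and the shift values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical spin system: (CA shift of residue i, CA shift of residue i-1) in ppm.
SpinSystem = Tuple[float, float]


def assign_chain(systems: Dict[str, SpinSystem], n_residues: int,
                 tol: float = 0.2) -> Optional[List[str]]:
    """Depth-first search for an ordering of spin systems along the chain in which
    each system's CA(i-1) matches the previous system's CA(i) within `tol` ppm."""
    order: List[str] = []
    used = set()

    def dfs(pos: int) -> bool:
        if pos == n_residues:
            return True
        for name in sorted(systems):          # fixed order keeps the search deterministic
            if name in used:
                continue
            if order and abs(systems[name][1] - systems[order[-1]][0]) > tol:
                continue                      # sequential link broken: prune this branch
            order.append(name)
            used.add(name)
            if dfs(pos + 1):
                return True
            order.pop()                       # backtrack
            used.remove(name)
        return False

    return order if dfs(0) else None


peaks = {"s1": (58.3, 56.0), "s2": (52.0, 58.4), "s3": (56.1, 60.0)}
print(assign_chain(peaks, n_residues=3))
```

The search first tries s1 and s2 as the chain start, finds both lead to broken links, and backtracks until it reaches the consistent ordering s3, s1, s2.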

17.
A sequence combinatorial library screening method for protein core design
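This paper proposes a new method for screening combinatorial libraries of protein sequences, the heterotypic self-system optimization method (异型自系统最优法), for the de novo design of protein cores. Tests on the λ repressor, the phage 434 Cro protein, interleukin-4, thioredoxin, ubiquitin and other proteins show that the method is feasible for the de novo design of protein cores.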

18.
Brain slices from 20-day-old rats were incubated with [3H]palmitate for 2 hours in the absence or presence of the NO donors S-nitroso-N-acetyl-penicillamine (SNAP), ethyl-2-[hydroxyimino]-5-nitro-3-hexeneamide (NOR-3), 4-phenyl-3-furoxan carbonitrile (PFC) and sodium nitroprusside (SNP). Each of these drugs reduced the incorporation of [3H]palmitate into myelin proteolipid protein (PLP) in a concentration-dependent manner, SNP being the most active. The effect of SNAP was prevented by the NO scavenger PTIO (2-phenyl-4,4,5,5-tetramethylimidazoline-1-oxyl-3-oxide). Furthermore, decayed SNAP, sodium nitrite and N-nitrosopyrrolidine were inactive, suggesting that free NO and/or some of its direct oxidation products are the active molecular species. The amount of fatty acids bound to PLP and the rate of deacylation were unaffected by NO. Although NO diminished the number of thiols in brain and myelin proteins, with the formation of both nitrosothiols and disulfides, these changes did not parallel those in PLP acylation. In contrast, NO was effective at reducing the palmitoylation of brain and myelin lipids, and this effect, like that on PLP, was ascribed to a decrease in palmitoyl-CoA levels. The NO-induced reduction in acyl-CoA concentration was due to the decline in ATP levels, while the amount of [3H]palmitate incorporated into the tissue, the activity of palmitoyl-CoA ligase and palmitoyl-CoA hydrolase, and the concentration of CoASH were unaltered by the drugs. Experiments with endogenously synthesized [18O]fatty acids confirmed that NO affects predominantly the ATP-dependent palmitoylation of PLP. In conclusion, the inhibitory action of NO on the fatty acylation of PLP is indirect and caused by energy depletion.

19.

Motivation

Intrinsically disordered regions of proteins play an essential role in the regulation of various biological processes. Key to their regulatory function is often the binding to globular protein domains via sequence elements known as molecular recognition features (MoRFs). Development of computational tools for the identification of candidate MoRF locations in amino acid sequences is an important task and an area of growing interest. Given the relative sparseness of MoRFs in protein sequences, the accuracy of the available MoRF predictors is often inadequate for practical usage, which leaves a significant need and room for improvement. In this work, we introduce MoRFCHiBi_Web, which predicts MoRF locations in protein sequences with higher accuracy than current MoRF predictors.

Methods

Three distinct and largely independent property scores are computed with component predictors and then combined to generate the final MoRF propensity scores. The first score reflects the likelihood of sequence windows to harbour MoRFs and is based on amino acid composition and sequence similarity information. It is generated by MoRFCHiBi using small windows of up to 40 residues in size. The second score identifies long stretches of protein disorder and is generated by ESpritz with the DisProt option. Lastly, the third score reflects residue conservation and is assembled from PSSM files generated by PSI-BLAST. These propensity scores are processed and then hierarchically combined using Bayes rule to generate the final MoRFCHiBi_Web predictions.
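The abstract states that the three scores are combined hierarchically with Bayes rule but does not give the formula. One common way to do this, assumed here, treats each score as a calibrated posterior probability from a conditionally independent source and adds each score's log-odds, relative to a shared prior, onto the prior log-odds (a naive-Bayes combination). The function names, the prior and the example probabilities are hypothetical.

```python
import math
from typing import Sequence


def logit(p: float, eps: float = 1e-6) -> float:
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return math.log(p / (1.0 - p))


def combine_bayes(probs: Sequence[float], prior: float) -> float:
    """Naive-Bayes fusion of posterior probabilities from independent predictors.

    Each predictor's evidence is its posterior log-odds minus the prior log-odds
    (its log likelihood ratio); these are summed onto the prior log-odds."""
    prior_lo = logit(prior)
    total = prior_lo + sum(logit(p) - prior_lo for p in probs)
    return 1.0 / (1.0 + math.exp(-total))


# Hypothetical per-residue scores: window score, disorder score, conservation score.
print(combine_bayes([0.7, 0.8, 0.6], prior=0.1))
```

Because log-odds simply add under this independence assumption, combining two scores first and then folding in the third gives the same result as combining all three at once, so a hierarchical order of combination does not change the outcome in this particular sketch.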

Results

MoRFCHiBi_Web was tested on three datasets. Results show that MoRFCHiBi_Web outperforms previously developed predictors by generating less than half the false positive rate for the same true positive rate at practical threshold values. This level of accuracy paired with its relatively high processing speed makes MoRFCHiBi_Web a practical tool for MoRF prediction.

Availability

http://morf.chibi.ubc.ca:8080/morf/.

20.
Abstract

TATA-box binding protein (TBP) in monomeric form and the complexes it forms with DNA have been investigated with molecular dynamics simulations. Large TBP domain motions (bend and twist) are detected in the monomer as well as in the DNA complexes; these motions may be important for DNA binding by TBP. In the simulations of the complex, TBP interacts with guanine bases flanking the TATA element; these interactions may explain the preference for guanine observed at these DNA positions. Side chains of some TBP residues at the binding interface display significant dynamic flexibility, resulting in ‘flip-flop’ contacts involving multiple base pairs of the DNA. We discuss the possible functional significance of these observations.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417