Found 11 similar articles; search took 93 ms
1.
2.
A new approach is introduced for analyzing and ultimately predicting protein structures, defined at the level of Cα coordinates. We analyze hexamers (oligopeptides of six amino acid residues) and show that their structure tends to concentrate in specific clusters rather than vary continuously. Thus, we can use a limited set of standard structural building blocks taken from these clusters as representatives of the repertoire of observed hexamers. We demonstrate that protein structures can be approximated by concatenating such building blocks. We have identified about 100 building blocks by applying clustering algorithms, and have shown that they can "replace" about 76% of all hexamers in well-refined known proteins with an error of less than 1 Å, and can be joined together to cover 99% of the residues. After replacing each hexamer by a standard building block with similar conformation, we can approximately reconstruct the actual structure by smoothly joining the overlapping building blocks into a full protein. The reconstructed structures show, in most cases, high resemblance to the original structure, although using a limited number of building blocks and local criteria of concatenating them is not likely to produce a very precise global match. Since these building blocks reflect, in many cases, some sequence dependency, it may be possible to use the results of this study as a basis for a protein structure prediction procedure.
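The idea of "replacing" a hexamer with its nearest building block can be illustrated with a minimal sketch. It makes several assumptions that are not in the abstract: fragments are lists of six Cα (x, y, z) tuples, similarity is measured with a distance-matrix RMSD (chosen here because it needs no superposition; the authors' actual error measure may differ, and this one cannot distinguish mirror images), and the function names `distance_rmsd`, `nearest_block` and `replaceable_fraction` are hypothetical.

```python
import math
from itertools import combinations


def distance_rmsd(a, b):
    """Distance-matrix RMSD between two equal-length lists of (x, y, z) points.

    Assumption: stands in for whatever error measure the original study used.
    """
    if len(a) != len(b):
        raise ValueError("fragments must have the same number of residues")
    pairs = list(combinations(range(len(a)), 2))
    squared = 0.0
    for i, j in pairs:
        # Compare each intra-fragment distance in a with the same one in b.
        squared += (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
    return math.sqrt(squared / len(pairs))


def nearest_block(hexamer, blocks):
    """Return (index, error) of the building block closest to the hexamer."""
    errors = [distance_rmsd(hexamer, block) for block in blocks]
    best = min(range(len(blocks)), key=errors.__getitem__)
    return best, errors[best]


def replaceable_fraction(hexamers, blocks, cutoff=1.0):
    """Fraction of hexamers whose nearest block lies within the cutoff (in Å)."""
    hits = sum(1 for h in hexamers if nearest_block(h, blocks)[1] < cutoff)
    return hits / len(hexamers)
```

With a real library of about 100 clustered blocks, `replaceable_fraction` would be the analogue of the 76% figure quoted above, although the number obtained depends on the error measure chosen.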
3.
N-terminal N-myristoylation of proteins: prediction of substrate proteins from amino acid sequence (total citations: 5; self-citations: 0; citations by others: 5)
Myristoylation by the myristoyl-CoA:protein N-myristoyltransferase (NMT) is an important lipid anchor modification of eukaryotic and viral proteins. Automated prediction of N-terminal N-myristoylation from the substrate protein sequence alone is necessary for large-scale sequence annotation projects, but it requires a low rate of false positive hits in addition to sufficient sensitivity. Our previous analysis of substrate protein sequence variability, NMT sequences and 3D structures has revealed motif properties in addition to the known PROSITE motif that are utilized in a new predictor described here. The composite prediction function (with separate ad hoc parameterization (a) for queries from non-fungal eukaryotes and their viruses and (b) for sequences from fungal species) consists of terms evaluating amino acid type preferences at sequence positions close to the N terminus as well as terms penalizing deviations from the physical property pattern of amino acid side-chains encoded in multi-residue correlation within the motif sequence. The algorithm has been validated with a self-consistency and two jack-knife tests for the learning set as well as with kinetic data for model substrates. The sensitivity in recognizing documented NMT substrates is above 95% for both taxon-specific versions. The corresponding rate of false positive prediction (for sequences with an N-terminal glycine residue) is close to 0.5%; thus, the technique is applicable for large-scale automated sequence database annotation. The predictor is available as a public WWW server at the URL http://mendel.imp.univie.ac.at/myristate/.
Additionally, we propose a version of the predictor that identifies a number of proteolytic protein processing sites at internal glycine residues and that evaluates possible N-terminal myristoylation of the protein fragments. A scan of public protein databases revealed new potential NMT targets for which the myristoyl modification may be of critical importance for biological function. Among others, the list includes kinases, phosphatases, proteasomal regulatory subunit 4, kinase interacting proteins KIP1/KIP2, protozoan flagellar proteins, homologues of mitochondrial translocase TOM40, of the neuronal calcium sensor NCS-1 and of the cytochrome c-type heme lyase CCHL. Analyses of complete eukaryote genomes indicate that about 0.5% of all encoded proteins are apparent NMT substrates, except for a higher fraction in Arabidopsis thaliana (approximately 0.8%).
4.
A proposed architecture for the central domain of the bacterial enhancer-binding proteins based on secondary structure prediction and fold recognition
Joel Osuna, Xavier Soberon, Enrique Morett. Protein Science: a publication of the Protein Society, 1997, 6(3): 543-555
The expression of genes transcribed by the RNA polymerase with the alternative sigma factor σ54 (Eσ54) is absolutely dependent on activator proteins that bind to enhancer-like sites, located far upstream from the promoter. These unique prokaryotic proteins, known as enhancer-binding proteins (EBP), mediate open promoter complex formation in a reaction dependent on NTP hydrolysis. The best characterized proteins of this family of regulators are NtrC and NifA, which activate genes required for ammonia assimilation and nitrogen fixation, respectively. In a recent IRBM course ("Frontiers of protein structure prediction," IRBM, Pomezia, Italy, 1995; see web site http://www.mrc-cpe.cam.uk/irbm-course95/), one of us (J.O.) participated in the elaboration of the proposal that the Central domain of the EBPs might adopt the classical mononucleotide-binding fold. This suggestion was based on the results of a new protein fold recognition algorithm (Map) and on the mapping of correlated mutations calculated for the sequence family on the same mononucleotide-binding fold topology. In this work, we present new data that support the previous conclusion. The results from a number of different secondary structure prediction programs suggest that the Central domain could adopt an α/β topology. The fold recognition programs ProFIT 0.9, 3D PROFILE combined with secondary structure prediction, and 123D suggest a mononucleotide-binding fold topology for the Central domain amino acid sequence. Finally, and most importantly, three of five reported residue alterations that impair the Central domain ATPase activity of the Eσ54 activators are mapped to polypeptide regions that might play roles equivalent to those involved in nucleotide binding in the mononucleotide-binding proteins.
Furthermore, the known residue substitutions that alter the function of the Eσ54 activators, leaving intact the Central domain ATPase activity, are mapped on a region proposed to play a role equivalent to that of the effector region of the GTPase superfamily.
5.
Owing to the use of evolutionary information and advanced machine learning protocols, secondary structures of amino acid residues in proteins can be predicted from the primary sequence with more than 75% per-residue accuracy for the 3-state (i.e., helix, beta-strand, and coil) classification problem. In this work we investigate whether further progress may be achieved by incorporating the relative solvent accessibility (RSA) of an amino acid residue as a fingerprint of the overall topology of the protein. Toward that goal, we developed a novel method for secondary structure prediction that uses predicted RSA in addition to attributes derived from evolutionary profiles. Our general approach follows the 2-stage protocol of Rost and Sander, with a number of Elman-type recurrent neural networks (NNs) combined into a consensus predictor. The RSA is predicted using our recently developed regression-based method that provides real-valued RSA, with the overall correlation coefficients between the actual and predicted RSA of about 0.66 in rigorous tests on independent control sets. Using the predicted RSA, we were able to improve the performance of our secondary structure prediction by up to 1.4% and achieved the overall per-residue accuracy between 77.0% and 78.4% for the 3-state classification problem on different control sets comprising, together, 603 proteins without homology to proteins included in the training. The effects of including solvent accessibility depend on the quality of RSA prediction. In the limit of perfect prediction (i.e., when using the actual RSA values derived from known protein structures), the accuracy of secondary structure prediction increases by up to 4%. We also observed that projecting real-valued RSA into 2 discrete classes with the commonly used threshold of 25% RSA decreases the classification accuracy for secondary structure prediction. 
While the level of improvement of secondary structure prediction may be different for prediction protocols that implicitly account for RSA in other ways, we conclude that an increase in the 3-state classification accuracy may be achieved when combining RSA with a state-of-the-art protocol utilizing evolutionary profiles. The new method is available through a Web server at http://sable.cchmc.org.
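Two quantities mentioned in this abstract, the per-residue 3-state accuracy and the projection of real-valued RSA into two classes at a 25% threshold, can be sketched as follows. This is only an illustration under assumptions: the state letters H/E/C, the function names, and the choice to count a residue at exactly 25% as exposed are not taken from the source, and the sketch does not reproduce the neural-network predictor itself.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose 3-state label (H, E or C) matches the observed one."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def rsa_to_two_classes(rsa_values, threshold=0.25):
    """Project real-valued RSA (0.0 to 1.0) into buried/exposed classes.

    Assumption: residues at exactly the threshold are counted as exposed.
    """
    return ["exposed" if rsa >= threshold else "buried" for rsa in rsa_values]
```

The two-class projection discards how far a residue lies from the threshold, which is one plausible reading of why the abstract reports lower accuracy with discretized RSA than with real-valued RSA.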
6.
Matsuyuki Shirota, Kengo Kinoshita. Protein Science: a publication of the Protein Society, 2013, 22(6): 725-733
The amino acid sequences of soluble, ordered proteins with stable structures have evolved due to biological and physical requirements, thus distinguishing them from random sequences. Previous analyses have focused on extracting the features that frequently appear in protein substructures, such as α‐helix and β‐sheet, but the universal features of protein sequences have not been addressed. To clarify the differences between native protein sequences and random sequences, we analyzed 7368 soluble, ordered protein sequences by comparing the observed occurrences of the 400 amino acid pairs in local proximity (up to 10 residues apart along the sequence) with their expected occurrences in random sequences. We found that hydrophobic residue pairs and polar residue pairs are significantly decreased, whereas pairs between a hydrophobic residue and a polar residue are increased. This trend was universally observed regardless of the secondary structure content but was not observed in protein sequences that include intrinsically disordered regions, indicating that it can be a general rule of protein foldability. The possible benefits of this rule are discussed from the viewpoints of protein aggregation and disorder, which are both caused by low‐complexity regions of hydrophobic or polar residues.
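The comparison of observed and expected pair occurrences can be sketched in a few lines. The sketch assumes that the "random sequence" expectation is the product of the marginal residue frequencies at the two positions; the abstract does not specify the null model, so this is a stand-in, and `pair_ratios` is a hypothetical name.

```python
from collections import Counter


def pair_ratios(sequences, separation):
    """Observed/expected ratio for ordered residue pairs at positions i and i + separation.

    Assumption: the expected count is (count of a at first position) *
    (count of b at second position) / (total pairs), i.e. independence.
    """
    pair_counts = Counter()
    first_counts = Counter()
    second_counts = Counter()
    total = 0
    for seq in sequences:
        for i in range(len(seq) - separation):
            a, b = seq[i], seq[i + separation]
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
            second_counts[b] += 1
            total += 1
    ratios = {}
    for (a, b), observed in pair_counts.items():
        expected = first_counts[a] * second_counts[b] / total
        ratios[(a, b)] = observed / expected
    return ratios
```

Running this for separations 1 through 10 yields one ratio per ordered pair and separation; a ratio below 1 marks a pair that is depleted relative to the independence model, and a ratio above 1 marks an enriched pair.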
7.
8.
Fuzzy cluster analysis of simple physicochemical properties of amino acids for recognizing secondary structure in proteins.
G. Mocz. Protein Science: a publication of the Protein Society, 1995, 4(6): 1178-1187
Fuzzy cluster analysis has been applied to the 20 amino acids by using 65 physicochemical properties as a basis for classification. The clustering products, the fuzzy sets (i.e., classical sets with associated membership functions), have provided a new measure of amino acid similarities for use in protein folding studies. This work demonstrates that fuzzy sets of simple molecular attributes, when assigned to amino acid residues in a protein's sequence, can predict the secondary structure of the sequence with reasonable accuracy. An approach is presented for discriminating standard folding states, using near-optimum information splitting in half-overlapping segments of the sequence of assigned membership functions. The method is applied to a nonredundant set of 252 proteins and yields approximately 73% matching for correctly predicted and correctly rejected residues with approximately 60% overall success rate for the correctly recognized ones in three folding states: alpha-helix, beta-strand, and coil. The most useful attributes for discriminating these states appear to be related to size, polarity, and thermodynamic factors. Van der Waals volume, apparent average thickness of surrounding molecular free volume, and a measure of dimensionless surface electron density can explain approximately 95% of prediction results. Hydrogen bonding and hydrophobicity indices do not yet enable clear clustering and prediction.
9.
An analysis of the amino acid distributions at 15 positions, viz., N″, N′, Ncap, N1, N2, N3, N4, Mid, C4, C3, C2, C1, Ccap, C′, and C″ in 1,131 α-helices reveals that each position has its own unique characteristics. In general, natural helix sequences optimize by identifying the residues to be avoided at a given position and minimizing the occurrence of these avoided residues rather than by maximizing the preferred residues at various positions. Ncap is most selective in its choice of residues, with six amino acids (S, D, T, N, G, and P) being preferred at this position and another 11 (V, I, F, A, K, L, Y, R, E, M, and Q) being strongly avoided. Ser, Asp, and Thr are all more preferred at Ncap position than Asn, whose role at helix N-terminus has been highlighted by earlier analyses. Furthermore, Asn is also found to be almost equally preferred at helix C-terminus and a novel structural motif is identified, involving a hydrogen bond formed by Nδ2 of Asn at Ccap or C1 position, with the backbone carbonyl oxygen four residues inside the helix. His also forms a similar motif at the C-terminus. Pro is the most avoided residue in the main body (N4 to C4 positions) and at C-terminus, including Ccap of an α-helix. In 1,131 α-helices, no helix contains Pro at C3 or C2 positions. However, Pro is highly favoured at N1 and C′. The doublet X-Pro, with Pro at C′ position and extended backbone conformation for the X residue at Ccap, appears to be a common structural motif for termination of α-helices, in addition to the Schellman motif. Main body of the helix shows a high preference for aliphatic residues Ala, Leu, Val, and Ile, while these are avoided at helix termini. A propensity scale for amino acids to occur in the middle of helices has been obtained. Comparison of this scale with several previously reported scales shows that this scale correlates best with the experimentally determined values. Proteins 31:460–476, 1998. © 1998 Wiley-Liss, Inc.
10.
Identification of related proteins with weak sequence identity using secondary structure information
Geourjon C, Combet C, Blanchet C, Deléage G. Protein Science: a publication of the Protein Society, 2001, 10(4): 788-797
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Using even the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise comparisons of secondary structures have been measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure likeness is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E-value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins and to select proteins for experimental investigation in a structural genomic approach, as well as for genome annotation.
11.
New relationships found in the process of updating the structural classification of proteins (SCOP) database resulted in the revision of the structure of the N-terminal, DNA-binding domain of the transition state regulator AbrB. The dimeric AbrB domain shares a common fold with the addiction antidote MazE and the subunit of uncharacterized protein MraZ implicated in cell division and cell envelope formation. It has a detectable sequence similarity to both MazE and MraZ thus providing an evolutionary link between the two proteins. The putative DNA-binding site of AbrB is found on the same face as the DNA-binding site of MazE and appears similar, both in structure and sequence, to the exposed conserved region of MraZ. This strongly suggests that MraZ also binds DNA and allows for a consensus model of DNA recognition by the members of this novel protein superfamily.