Similar literature
 20 similar articles found (search time: 15 ms)
1.
The amino acid sequences of soluble, ordered proteins with stable structures have evolved under biological and physical requirements, which distinguishes them from random sequences. Previous analyses have focused on extracting the features that frequently appear in protein substructures, such as α‐helix and β‐sheet, but the universal features of protein sequences have not been addressed. To clarify the differences between native protein sequences and random sequences, we analyzed 7368 soluble, ordered protein sequences by comparing the observed occurrences of the 400 amino acid pairs in local proximity, up to 10 residues apart along the sequence, with their expected occurrences in random sequences. We found that hydrophobic residue pairs and polar residue pairs are significantly decreased, whereas pairs between a hydrophobic residue and a polar residue are increased. This trend was universally observed regardless of the secondary structure content but was not observed in protein sequences that include intrinsically disordered regions, indicating that it can be a general rule of protein foldability. The possible benefits of this rule are discussed from the viewpoints of protein aggregation and disorder, which are both caused by low‐complexity regions of hydrophobic or polar residues.
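
The observed-versus-expected comparison described in this abstract can be illustrated with a minimal sketch. The abstract does not state how the expected occurrences were computed; the code below assumes the simplest null model, in which the two positions are independent and follow the overall residue composition. The function name `pair_ratios` and its parameters are hypothetical.

```python
from collections import Counter


def pair_ratios(seqs, k):
    """Observed/expected ratio for ordered residue pairs (a, b),
    where b lies k positions after a in the same sequence.

    Assumption (not from the abstract): the expected count comes from
    a composition-only null model with independent positions.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    n_pairs = 0
    n_res = 0
    for s in seqs:
        residue_counts.update(s)
        n_res += len(s)
        for i in range(len(s) - k):
            pair_counts[(s[i], s[i + k])] += 1
            n_pairs += 1
    ratios = {}
    for (a, b), observed in pair_counts.items():
        freq_a = residue_counts[a] / n_res
        freq_b = residue_counts[b] / n_res
        expected = n_pairs * freq_a * freq_b
        ratios[(a, b)] = observed / expected
    return ratios


if __name__ == "__main__":
    # Toy input; real use would loop k from 1 to 10 over a sequence set.
    toy = ["MKVLAAGIVG", "MSDEKRLLAA"]
    for pair, ratio in sorted(pair_ratios(toy, 1).items()):
        print(pair, round(ratio, 2))
```

A ratio below 1 means the pair is depleted relative to the null model and a ratio above 1 means it is enriched; judging significance would need an additional statistical test that is not shown here.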

2.
The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence‐search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino‐acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as “Protein Blocks” (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence‐search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z‐score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales‐up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web‐server that is freely available at http://www.bo‐protscience.fr/forsa .

3.
Homaeian L, Kurgan LA, Ruan J, Cios KJ, Chen K. Proteins 2007, 69(3): 486-498
Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. A significant majority of successful methods for prediction of the secondary structure are based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict the amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. The PSSC-core is shown to provide statistically significantly better results when compared with the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups like exchange groups, chemical groups of the side chains and hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods. At the same time, it also provides useful insight into the design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure.

4.
Three thousand eight hundred ninety-nine beta-turns have been identified and classified using a nonhomologous data set of 205 protein chains. These were used to derive beta-turn positional potentials for turn types I' and II' for the first time and to provide updated potentials for formation of the more common types I, II, and VIII. Many of the sequence preferences for each of the 4 positions in turns can be rationalized in terms of the formation of stabilizing hydrogen bonds, preferences for amino acids to adopt a particular conformation in phi, psi space, and the involvement of turn types I' and II' in beta-hairpins. Only 1,632 (42%) of the turns occur in isolation; the remainder have at least 1 residue in common with another turn and have hence been classified as multiple turns. Several types of multiple turn have been identified and analyzed.

5.
Prediction of gene sequences and their exon-intron structure in large eukaryotic genomic sequences is one of the central problems of mathematical biology. Solving this problem involves, in particular, high-accuracy splice site recognition. Using statistical analysis of a splice site-containing human gene fragment database, some characteristic features were described for nucleotide sequences in the splicing site neighborhood, the frequencies of all nucleotides and dinucleotides were determined, and those with frequencies increased or decreased in comparison to a random sequence were identified. The results can be used in sequence annotation, splicing site prediction, and the recognition of the gene exon-intron structure.

6.
The architecture and weights of an artificial neural network model that predicts putative transmembrane sequences have been developed and optimized by the algorithm of structure evolution. The resulting filter is able to classify membrane/nonmembrane transition regions in sequences of integral human membrane proteins with high accuracy. Similar results have been obtained for both training and test set data, indicating that the network has focused on general features of transmembrane sequences rather than specializing on the training data. Seven physicochemical amino acid properties have been used for sequence encoding. The predictions are compared to hydrophobicity plots.

7.
Collagen fibrils represent a unique case of protein folding and self‐association. We have recently successfully developed triple‐helical peptides that can further self‐assemble into collagen‐mimetic mini‐fibrils. The 35 nm axially repeating structure of the mini‐fibrils, which is designated the d‐period, is highly reminiscent of the well‐known 67 nm D‐period of native collagens when examined using TEM and atomic force spectroscopy. We postulate that it is the pseudo‐identical repeating sequence units in the primary structure of the designed peptides that give rise to the d‐period of the quaternary structure of the mini‐fibrils. In this work, we characterize the self‐assembly of two additional designed peptides: peptide Col877 and peptide Col108rr. The triple‐helix domain of Col877 consists of three pseudo‐identical amino acid sequence units arranged in tandem, whereas that of Col108rr consists of three sequence units identical in amino acid composition but different in sequence. Both peptides form stable collagen triple helices, but only the Col877 triple helices self‐associate laterally under fibril forming conditions to form mini‐fibrils having the predicted d‐period. The Col108rr triple helices, however, only form nonspecific aggregates having no identifiable structural features. These results further accentuate the critical involvement of the repeating sequence units in the self‐assembly of collagen mini‐fibrils; the actual amino acid sequence of each unit has only secondary effects. Collagen is essential for tissue development and function. This novel approach to creating collagen‐mimetic fibrils can potentially impact fundamental research and have a wide range of biomedical and industrial applications.

8.
The amino-acid sequences of soluble, globular proteins must have hydrophobic residues to form a stable core, but excess sequence hydrophobicity can lead to loss of native state conformational specificity and aggregation. Previous studies of polar-to-hydrophobic mutations in the β-sheet of the Arc repressor dimer showed that a single substitution at position 11 (N11L) leads to population of an alternate dimeric fold in which the β-sheet is replaced by helix. Two additional hydrophobic mutations at positions 9 and 13 (Q9V and R13V) lead to population of a differently folded octamer along with both dimeric folds. Here we conduct a comprehensive study of the sequence determinants of this progressive loss of fold specificity. We find that the alternate dimer-fold specifically results from the N11L substitution and is not promoted by other hydrophobic substitutions in the β-sheet. We also find that three highly hydrophobic substitutions at positions 9, 11, and 13 are necessary and sufficient for oligomer formation, but the oligomer size depends on the identity of the hydrophobic residue in question. The hydrophobic substitutions increase thermal stability, illustrating how increased hydrophobicity can increase folding stability even as it degrades conformational specificity. The oligomeric variants are predicted to be aggregation-prone but may be hindered from doing so by proline residues that flank the β-sheet region. Loss of conformational specificity due to increased hydrophobicity can manifest itself at any level of structure, depending upon the specific mutations and the context in which they occur.

9.
10.
The nucleotide sequences of cloned genes coding for the elongation factor Tu of seven eubacteria have been determined. These genes were from Anacystis nidulans, Bacillus subtilis, Bacteroides fragilis, Deinonema spec., Pseudomonas cepacia, Shewanella putrefaciens and Streptococcus oralis. The primary structures of the genes were compared to the available sequences of prokaryotic elongation factors Tu and eukaryotic elongation factors 1 alpha. A conservation profile was determined for homologous amino acid residues. Sites of known or putative functions are usually located at highly conserved positions or within highly conserved sequence stretches. The aligned 24 amino acid sequences were used as the basis for a phylogenetic analysis. The phylogenetic tree corroborates the kingdom as well as phylum concept deduced from 16S rRNA data. Abbreviations: EF-Tu, elongation factor Tu; GDP, guanosine 5'-diphosphate; GTP, guanosine 5'-triphosphate; tuf gene, gene coding for elongation factor Tu.

11.
Beta-turns are sites at which proteins change their overall chain direction, and they occur with high frequency in globular proteins. The Protein Data Bank has many instances of conformations that resemble beta-turns but lack the characteristic N-H(i) --> O=C(i - 3) hydrogen bond of an authentic beta-turn. Here, we identify potential hydrogen-bonded beta-turns in the coil library, a Web-accessible database utility comprised of all residues not in repetitive secondary structure, neither alpha-helix nor beta-sheet (http://www.roselab.jhu.edu/coil). In particular, candidate turns were identified as four-residue segments satisfying highly relaxed geometric criteria but lacking a strictly defined hydrogen bond. Such candidates were then subjected to a minimization protocol to determine whether slight changes in torsion angles are sufficient to shift the conformation into reference-quality geometry without deviating significantly from the original structure. This approach of applying constrained minimization to known structures reveals a substantial population of previously unidentified, stringently defined, hydrogen-bonded beta-turns. In particular, 33% of coil library residues were classified as beta-turns prior to minimization. After minimization, 45% of such residues could be classified as beta-turns, with another 8% in 3(10) helices (which closely resemble type III beta-turns). Of the remaining coil library residues, 37% have backbone dihedral angles in left-handed polyproline II structure.

12.
Neural networks were used to generalize common themes found in transmembrane-spanning protein helices. Various-sized databases were used containing nonoverlapping sequences, each 25 amino acids long. Training consisted of sorting these sequences into 1 of 2 groups: transmembrane helical peptides or nontransmembrane peptides. Learning was measured using a test set 10% the size of the training set. As training set size increased from 214 sequences to 1,751 sequences, learning increased in a nonlinear manner from 75% to a high of 98%, then declined to a low of 87%. The final training database consisted of roughly equal numbers of transmembrane (928) and nontransmembrane (1,018) sequences. All transmembrane sequences were entered into the database with respect to their lipid membrane orientation: from inside the membrane to outside. Generalized transmembrane helix and nontransmembrane peptides were constructed from the maximally weighted connecting strengths of fully trained networks. Four generalized transmembrane helices were found to contain 9 consensus residues: a K-R-F triplet was found at the inside lipid interface, 2 isoleucine and 2 other phenylalanine residues were present in the helical body, and 2 tryptophan residues were found near the outside lipid interface. As a test of the training method, bacteriorhodopsin was examined to determine the position of its 7 transmembrane helices.

13.
In multi‐domain proteins, the domains typically run end‐to‐end, that is, one domain follows the C‐terminus of another domain. However, approximately 10% of multi‐domain proteins are formed by insertion of one domain sequence into that of another domain. Detecting such insertions within protein sequences is a fundamental challenge in structural biology. The haloacid dehalogenase superfamily (HADSF) serves as a challenging model system wherein a variable cap domain (~5–200 residues in length) accessorizes the ubiquitous Rossmann‐fold core domain, with variations in insertion site and topology corresponding to different classes of cap types. Herein, we describe a comprehensive computational strategy, CapPredictor, for determining large, variable domain insertions in protein sequences. Using a novel sequence‐alignment algorithm in conjunction with a structure‐guided sequence profile from 154 core‐domain‐only structures, more than 40,000 HADSF member sequences were assigned cap types. The resulting data set afforded insight into HADSF evolution. Notably, a similar distribution of cap‐type classes across different phyla was observed, indicating that all cap types existed in the last universal common ancestor. In addition, comparative analyses of the predicted cap‐type and functional assignments showed that different cap types carry out similar chemistries. Thus, while cap domains play a role in substrate recognition and chemical reactivity, cap‐type does not strictly define functional class. Through this example, we have shown that CapPredictor is an effective new tool for the study of form and function in protein families where domain insertion occurs. Proteins 2014; 82:1896–1906. © 2014 Wiley Periodicals, Inc.

14.
15.
16.
The intron/exon organization of the human gene for glycogen phosphorylase has been determined. The segments of the polypeptide chain that correspond to the 19 exons of the gene are examined for relationships between the three-dimensional structure of the protein and gene structure. Only weak correlations are observed between domains of phosphorylase and exons. The nucleotide binding domains that are found in phosphorylase and other glycolytic enzymes are examined for relationships between exons of the genes and structures of the domains. When mapped to the three-dimensional structures, the intron/exon boundaries are shown to be widely distributed in this family of protein domains.

17.
The complete amino acid sequence of the 125-residue photoactive yellow protein (PYP) from Ectothiorhodospira halophila has been determined to be MEHVAFGSEDIENTLAKMDDGQLDGLAFGAIQLDGDGNILQYNAAEGDITGRDPKEVIGKNFFKDVAPCTDSPEFYGKFKEGVASGNLNTMFEYTFDYQMTPTKVKVHMKKALSGDSYWVFVKRV. This is the first sequence to be reported for this class of proteins. There is no obvious sequence homology to any other protein, although the crystal structure, known at 2.4 Å resolution (McRee, D.E., et al., 1989, Proc. Natl. Acad. Sci. USA 86, 6533-6537), indicates a relationship to the similarly sized fatty acid binding protein (FABP), a representative of a family of eukaryotic proteins that bind hydrophobic molecules. The amino acid sequence exhibits no greater similarity between PYP and FABP than for proteins chosen at random (8%). The photoactive yellow protein contains an unidentified chromophore that is bleached by light but recovers within a second. Here we demonstrate that the chromophore is bound covalently to Cys 69 instead of Lys 111 as deduced from the crystal structure analysis. The partially exposed side chains of Tyr 76, 94, and 118, plus Trp 119 appear to be arranged in a cluster and probably become more exposed due to a conformational change of the protein resulting from light-induced chromophore bleaching. The charged residues are not uniformly distributed on the protein surface but are arranged in positive and negative clusters on opposite sides of the protein. The exact chemical nature of the chromophore remains undetermined, but we here propose a possible structure based on precise mass analysis of a chromophore-binding peptide by electrospray ionization mass spectrometry and on the fact that the chromophore can be cleaved off the apoprotein upon reduction with a thiol reagent. The molecular mass of the chromophore, including an SH group, is 147.6 Da (+/- 0.5 Da); the cysteine residue to which it is bound is at sequence position 69.

18.
Searches using position specific scoring matrices (PSSMs) have been commonly used in remote homology detection procedures such as PSI-BLAST and RPS-BLAST. A PSSM is typically generated using one of the sequences of a family as the reference sequence. In the case of PSI-BLAST searches the reference sequence is the same as the query. Recently we have shown that searches against the database of multiple family-profiles, with each one of the members of the family used as a reference sequence, are more effective than searches against the classical database of single family-profiles. Despite a relatively better overall performance when compared with common sequence-profile matching procedures, searches against the multiple family-profiles database result in a few false positives and false negatives. Here we show that profile length and divergence of the sequences used in the construction of a PSSM have a major influence on the performance of the multiple-profile based search approach. We also find that a simple parameter, defined as the number of PSSMs of a family that are hit by a query divided by the total number of PSSMs in the family, can effectively distinguish the true positives from the false positives in the multiple-profile search approach.
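
The parameter described at the end of this abstract, the fraction of a family's PSSMs hit by a query, can be sketched as follows. The input layout (a list of `(family, pssm_id)` hits and a dictionary of family sizes) and the function name `family_hit_fractions` are assumptions made for illustration and do not reflect the output format of any specific search tool.

```python
from collections import defaultdict


def family_hit_fractions(hits, family_sizes):
    """Fraction of each family's PSSMs that were hit by one query.

    hits: iterable of (family, pssm_id) pairs reported for the query
          (hypothetical layout; repeated hits on one PSSM count once).
    family_sizes: dict mapping family -> total number of PSSMs built
          for that family.
    """
    hit_pssms = defaultdict(set)
    for family, pssm_id in hits:
        hit_pssms[family].add(pssm_id)
    return {
        family: len(ids) / family_sizes[family]
        for family, ids in hit_pssms.items()
    }


if __name__ == "__main__":
    example_hits = [("famA", "p1"), ("famA", "p2"), ("famB", "q1")]
    example_sizes = {"famA": 4, "famB": 10}
    print(family_hit_fractions(example_hits, example_sizes))
```

A family whose fraction is high is supported by many of its profiles. The abstract does not give the cutoff used to separate true from false positives, so none is assumed here.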

19.
Protein C alpha coordinates are used to accurately reconstruct complete protein backbones and side-chain directions. This work employs potentials of mean force to align semirigid peptide groups around the axes that connect successive C alpha atoms. The algorithm works well for all residue types and secondary structure classes and is stable for imprecise C alpha coordinates. Tests on known protein structures show that root mean square errors in predicted main-chain and C beta coordinates are usually less than 0.3 Å. These results are significantly more accurate than can be obtained from competing approaches, such as modeling of backbone conformations from structurally homologous fragments.

20.
Summary. We examine in this paper one of the expected consequences of the hypothesis that modern proteins evolved from random heteropeptide sequences. Specifically, we investigate the lengthwise distributions of amino acids in a set of 1,789 protein sequences with little sequence identity using the run test statistic (r_o) of Mood (1940, Ann. Math. Stat. 11, 367–392). The probability density of r_o for a collection of random sequences has mean = 0 and variance = 1 [the N(0,1) distribution] and can be used to measure the tendency of amino acids of a given type to cluster together in a sequence relative to that of a random sequence. We implement the run test using binary representations of protein sequences in which the amino acids of interest are assigned a value of 1 and all others a value of 0. We consider individual amino acids and sets of various combinations of them based upon hydrophobicity (4 sets), charge (3 sets), volume (4 sets), and secondary structure propensity (3 sets). We find that any sequence chosen randomly has a 90% or greater chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation regardless of amino acid type. We regard this as strong support for the random-origin hypothesis. However, we do observe significant deviations from the random expectation, as might be expected after billions of years of evolution. Two important global trends are found: (1) Amino acids with a strong α-helix propensity show a strong tendency to cluster whereas those with β-sheet or reverse-turn propensity do not. (2) Clustered rather than evenly distributed patterns tend to be preferred by the individual amino acids and this is particularly so for methionine. Finally, we consider the problem of reconciling the random nature of protein sequences with structurally meaningful periodic “patterns” that can be detected by sliding-window, autocorrelation, and Fourier analyses. Two examples, rhodopsin and bacteriorhodopsin, show that such patterns are a natural feature of random sequences.
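
The binary encoding and run test described in this abstract can be sketched as below. This is the standard normal approximation to the runs test, which is approximately N(0,1) for random sequences; the exact normalization and sign convention of the r_o statistic used in the paper are not given in the abstract, so they are assumptions here, as is the function name `runs_z`.

```python
import math


def runs_z(seq, group):
    """Standardized runs statistic for one sequence.

    Residues in `group` are encoded as 1 and all others as 0, as in
    the abstract. Assumed convention: a negative value means fewer
    runs than expected (clustering), a positive value means more runs
    than expected (even spacing).
    """
    bits = [1 if c in group else 0 for c in seq]
    n = len(bits)
    n1 = sum(bits)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("both classes must occur in the sequence")
    # A new run starts wherever two neighbouring symbols differ.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    mean = 2 * n1 * n0 / n + 1
    var = 2 * n1 * n0 * (2 * n1 * n0 - n) / (n * n * (n - 1))
    if var <= 0:
        raise ValueError("sequence too short for the normal approximation")
    return (runs - mean) / math.sqrt(var)


if __name__ == "__main__":
    # Toy example with methionine as the residue of interest.
    print(runs_z("MMMMAKLVMGSTAAKL", {"M"}))
```

Applying the function to every sequence in a data set and comparing the resulting values with the N(0,1) distribution would mirror the kind of comparison the abstract describes.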


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号