Similar Articles
1.
What is the minimum number of letters required to fold a protein? (Cited 4 times: 0 self-citations, 4 by others)
Experimental studies have shown that the full sequence complexity of naturally occurring proteins is not required to generate rapidly folding and functional proteins; that is, proteins can be designed with fewer than 20 letters. This raises the question: what is the minimum number of amino acid types required to encode complex protein folds? Here, we investigate this issue from three angles. First, we study the minimum sequence complexity that preserves the structural information needed to detect distantly related homologues. Second, we compare the ability to design foldable model sequences over a wide range of reduced amino acid alphabets, to find the minimum number of letters whose design ability is similar to that of the full 20-letter alphabet. Finally, we survey the lower bound of alphabet size of globular proteins in a non-redundant protein database. These different approaches give a remarkably consistent picture: the minimum number of letters required to fold a protein is around ten.
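To make the idea of a reduced alphabet concrete, the sketch below rewrites a sequence in a 10-letter alphabet by mapping each residue to a group representative. The grouping in `GROUPS` is a hypothetical physicochemical partition chosen for illustration, not the alphabet derived in the study.

```python
# Hypothetical 10-group partition of the 20 standard amino acids, chosen for
# illustration only; it is not claimed to be the alphabet used in the study.
GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]

# Map every residue to the first letter of its group (the group representative).
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_sequence(seq: str) -> str:
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(REDUCE[aa] for aa in seq.upper())


def alphabet_size(seq: str) -> int:
    """Number of distinct letters actually used by a sequence."""
    return len(set(seq))
```

For example, `reduce_sequence("IVLMDE")` returns `"LLLLEE"`, and the full 20-letter alphabet collapses to 10 distinct letters.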

2.
Melo F, Marti-Renom MA. Proteins, 2006, 63(4):986-995
Reduced or simplified amino acid alphabets group the 20 naturally occurring amino acids into a smaller number of representative protein residues. To date, several reduced amino acid alphabets have been proposed, derived and optimized by a variety of methods. The resulting reduced alphabets have been applied to pattern recognition, generation of consensus sequences from multiple alignments, protein folding, and protein structure prediction. In this work, amino acid substitution matrices and statistical potentials were derived based on several reduced amino acid alphabets, and their performance was assessed in a large benchmark for the tasks of sequence alignment and fold assessment of protein structure models, using the standard alphabet of 20 amino acids as the reference frame. The results showed that a large reduction in the total number of residue types does not necessarily translate into a significant loss of discriminative power for sequence alignment and fold assessment. Therefore, some alphabets with only a few residue types are able to encode most of the relevant sequence/structure information present in the 20 standard amino acids. Based on these results, we suggest that the use of reduced amino acid alphabets may help increase the accuracy of current substitution matrices and statistical potentials for predicting the structures of remote homologs.
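One generic way to obtain a substitution matrix for a reduced alphabet is the standard log-odds construction, score(a, b) = log2(q_ab / e_ab), computed from pairs of residues observed aligned in trusted alignments. The sketch assumes the input pairs are already written in the reduced alphabet; it is not necessarily the derivation used by the authors.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def log_odds_matrix(aligned_pairs, alphabet):
    """Build a symmetric log-odds substitution matrix (scores in bits).

    aligned_pairs: iterable of (a, b) letters observed aligned in trusted
    alignments, already written in the reduced alphabet.
    """
    # Count unordered pairs so that (a, b) and (b, a) are pooled.
    pair_counts = Counter()
    for a, b in aligned_pairs:
        pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())

    # Background frequency of each letter, counted over both sides of every pair.
    letter_counts = Counter()
    for (a, b), n in pair_counts.items():
        letter_counts[a] += n
        letter_counts[b] += n
    p = {x: letter_counts[x] / (2 * total) for x in alphabet}

    matrix = {}
    for a, b in combinations_with_replacement(sorted(alphabet), 2):
        q = pair_counts[(a, b)] / total  # observed unordered-pair frequency
        # Expected frequency of the unordered pair if residues were independent.
        e = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        score = math.log2(q / e) if q > 0 and e > 0 else float("-inf")
        matrix[(a, b)] = matrix[(b, a)] = score
    return matrix
```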

3.
Using an information theoretic formalism, we optimize classes of amino acid substitution to be maximally indicative of local protein structure. Our statistically-derived classes are loosely identifiable with the heuristic constructions found in previously published work. However, while these other methods provide a more rigid idealization of physicochemically constrained residue substitution, our classes provide substantially more structural information with many fewer parameters. Moreover, these substitution classes are consistent with the paradigmatic view of the sequence-to-structure relationship in globular proteins which holds that the three-dimensional architecture is predominantly determined by the arrangement of hydrophobic and polar side chains with weak constraints on the actual amino acid identities. More specific constraints are imposed on the placement of prolines, glycines, and the charged residues. These substitution classes have been used in highly accurate predictions of residue solvent accessibility. They could also be used in the identification of homologous proteins, the construction and refinement of multiple sequence alignments, and as a means of condensing and codifying the information in multiple sequence alignments for secondary structure prediction and tertiary fold recognition. © 1996 Wiley-Liss, Inc.
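The information-theoretic criterion can be illustrated by the mutual information between a residue's substitution class and its local structure state, estimated from counts. This is a minimal sketch of that quantity only; the optimization over class assignments described in the abstract is not shown, and the class and state labels are hypothetical.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information I(C; S) in bits between residue class C and local
    structure state S, estimated from a list of observed (class, state) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    c_marg = Counter(c for c, _ in pairs)
    s_marg = Counter(s for _, s in pairs)
    mi = 0.0
    for (c, s), k in joint.items():
        p_cs = k / n
        # Each term compares the joint frequency with the product of marginals.
        mi += p_cs * math.log2(p_cs / ((c_marg[c] / n) * (s_marg[s] / n)))
    return mi
```

A class set that perfectly predicts a two-state structure label carries 1 bit; classes independent of structure carry 0 bits.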

4.
Several choices of amino acid substitution matrices are currently available for searching and alignment applications. These choices were evaluated using the BLAST searching program, which is extremely sensitive to differences among matrices, and the Prosite catalog, which lists members of hundreds of protein families. Matrices derived directly from either sequence-based or structure-based alignments of distantly related proteins performed much better overall than extrapolated matrices based on the Dayhoff evolutionary model. Similar results were obtained with the FASTA searching program. Improved performance appears to be general rather than family-specific, reflecting improved accuracy in scoring alignments. An implementation of a multiple matrix strategy was also tested. While no combination of three matrices performed as well as the single best matrix, BLOSUM 62, good results were obtained using a combination of sequence-based and structure-based matrices. This hybrid set of matrices is likely to be useful in certain situations. Our results illustrate the importance of matrix selection and the value of a comprehensive approach to evaluating protein comparison tools. © 1993 Wiley-Liss, Inc.

5.
A novel method has been developed for obtaining the correct alignment of a query sequence against remotely homologous proteins by extracting structural information from profiles of multiple structure alignments. The procedure combines a systematic search algorithm with a group of score functions based on sequence information and structural information. A limited number of top-scoring solutions (15,000) were selected as candidates for further examination. On a test set comprising 301 proteins from 75 protein families with sequence identity below 30%, our method raised the proportion of proteins whose first candidate was a completely correct alignment to 39.8%, whereas existing sequence-based alignment methods typically achieved only 16.1% to 22.7%. Furthermore, our approach provides multiple candidate alignments, which dramatically increases the chance of finding the correct one: completely correct alignments were found among the top-ranked 1000 candidates for 88.3% of the proteins. With the assistance of a sequence database, completely correct alignments were found among the top 1000 candidates for 94.3% of the proteins. From such a limited number of candidates, it should become possible to identify the correct alignment using a more time-consuming but more powerful method that exploits more detailed structural information, such as side-chain packing and energy minimization. The results indicate that this alignment strategy could help extend highly reliable methods for fold identification and homology modeling to the large number of homologous proteins with low sequence similarity. Details of the methods, together with the results and implications for future development, are presented.
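The candidate-filtering step (keep a limited number of top-scoring alignments, then ask how often the correct one is among the top k) can be sketched as follows. The equal-weight linear combination of a sequence score and a structure score is an assumption for illustration; the abstract does not specify how its score functions are combined.

```python
import heapq


def top_candidates(candidates, k, w_seq=0.5, w_struct=0.5):
    """Return ids of the k highest-scoring candidate alignments.

    candidates: iterable of (alignment_id, seq_score, struct_score).
    The weighted sum is a hypothetical combination rule.
    """
    scored = ((w_seq * s + w_struct * t, aid) for aid, s, t in candidates)
    return [aid for _, aid in heapq.nlargest(k, scored)]


def success_rate(ranked_lists, correct, k):
    """Fraction of queries whose correct alignment appears in the top k.

    ranked_lists: dict query -> list of candidate ids, best first.
    correct: dict query -> id of the correct alignment.
    """
    hits = sum(1 for q, ranked in ranked_lists.items() if correct[q] in ranked[:k])
    return hits / len(ranked_lists)
```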

6.
Hemoglobin from the cobra snake, Naja naja naja, was isolated and its chains were separated on a CM-cellulose column. The separation profile revealed an α chain and two β chains in the molar proportions [α]2, [β1]1, [β2]1. The N-terminal amino acid sequences of the intact chains and of the CNBr peptides were determined. The β2 chain was found to be heterogeneous, comprising a minor component amounting to 11%; this latter component showed substitutions at two positions, 9 and 14, within the first 30 residues sequenced.

7.
Summary

A conformational search by simulated annealing was performed on two peptides derived from the tetradecapeptide used to isolate the Xenopus laevis skin maturation RXVRG-endoprotease: the Ala 12 derivative, obtained by substitution in the hydrophobic C-terminal fragment, and the undecapeptide 4-14, obtained by deletion of an acidic-residue-rich tripeptide. No unique structure was found for the Ala 12 tetradecapeptide. This structural disorganization could explain the endoprotease's loss of activity towards the substituted peptide. For the undecapeptide, two different models consistent with the NMR data were found. The conformational differences between these two models are located in the consensus sequence, and in each case a hairpin-like conformation is observed. These results could be related to the enhanced cleavage activity of the maturation enzyme. The structures obtained are also compared with those of the original tetradecapeptide.
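For readers unfamiliar with the search method, here is a generic simulated-annealing loop with the Metropolis acceptance rule. The peptide force field, move set, and cooling schedule used in the study are not given in the abstract, so the toy energy function and all parameter values below are assumptions.

```python
import math
import random


def simulated_annealing(energy, x0, step, t_start=10.0, t_end=1e-3,
                        cooling=0.95, moves_per_t=100, seed=0):
    """Generic simulated annealing with geometric cooling.

    energy: function mapping a state to a float.
    step: function (state, rng) -> proposed new state.
    Returns the lowest-energy state visited and its energy.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            cand = step(x, rng)
            e_cand = energy(cand)
            # Metropolis rule: always accept downhill moves, accept uphill
            # moves with Boltzmann probability exp(-dE / T).
            if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
                x, e = cand, e_cand
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling
    return best_x, best_e
```

With a one-dimensional toy energy such as `lambda v: (v - 2.0) ** 2` and a step `lambda v, rng: v + rng.uniform(-0.5, 0.5)`, the returned minimum lies close to 2. A real conformational search would replace the state with a vector of torsion angles and the energy with a molecular force field.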

8.
Two trypsin inhibitors, LA-1 and LA-2, have been isolated from ridged gourd (Luffa acutangula Linn.) seeds and purified to homogeneity by gel filtration followed by ion-exchange chromatography. The isoelectric point is pH 4.55 for LA-1 and pH 5.85 for LA-2. The Stokes radius of each inhibitor is 11.4 Å. The fluorescence emission spectrum of each inhibitor is similar to that of free tyrosine. The bimolecular rate constant of acrylamide quenching is 1.0 × 10⁹ M⁻¹ s⁻¹ for LA-1 and 0.8 × 10⁹ M⁻¹ s⁻¹ for LA-2, and that of K₂HPO₄ quenching is 1.6 × 10¹¹ M⁻¹ s⁻¹ for LA-1 and 1.2 × 10¹¹ M⁻¹ s⁻¹ for LA-2. Analysis of the circular dichroism spectra yields 40% α-helix and 60% β-turn for LA-1, and 45% α-helix and 55% β-turn for LA-2. Inhibitors LA-1 and LA-2 consist of 28 and 29 amino acid residues, respectively. They lack threonine, alanine, valine, and tryptophan. Both inhibitors strongly inhibit trypsin by forming enzyme-inhibitor complexes at a 1:1 molar ratio. A chemical modification study suggests the involvement of an arginine in the reactive site of LA-1 and a lysine in that of LA-2. The inhibitors are very similar in their amino acid sequences and show sequence homology with other squash family inhibitors.
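Bimolecular quenching constants like those above are commonly obtained from the Stern-Volmer relation F0/F = 1 + K_SV[Q] together with kq = K_SV/τ0, where τ0 is the unquenched fluorescence lifetime. The abstract does not report the lifetimes or the fitting procedure, so this sketch, which fits K_SV with the intercept fixed at 1, illustrates the standard analysis rather than the authors' exact one.

```python
def stern_volmer_kq(quencher_conc, f0_over_f, tau0):
    """Estimate the Stern-Volmer constant K_SV (M^-1) and the bimolecular
    quenching constant kq (M^-1 s^-1).

    quencher_conc: quencher concentrations [Q] in M.
    f0_over_f: measured intensity ratios F0/F at those concentrations.
    tau0: unquenched fluorescence lifetime in seconds.
    """
    # Least-squares slope of (F0/F - 1) versus [Q], with the line forced
    # through the origin because F0/F = 1 when [Q] = 0.
    num = sum(q * (r - 1.0) for q, r in zip(quencher_conc, f0_over_f))
    den = sum(q * q for q in quencher_conc)
    k_sv = num / den
    return k_sv, k_sv / tau0
```

For synthetic data `[0.1, 0.2, 0.3]` M with F0/F of `[1.5, 2.0, 2.5]` and τ0 = 5 ns, the fit gives K_SV ≈ 5 M⁻¹ and kq ≈ 1 × 10⁹ M⁻¹ s⁻¹.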

9.
We report the partial amino acid sequence of the chicken intestinal microvillar 110-kDa protein that, as a complex with calmodulin, has previously been shown to exhibit myosin-like ATPase and actin-binding activities. The sequence shows a high degree of similarity to that of a novel vertebrate myosin I-like heavy chain encoded by a cDNA isolated from bovine intestine. This confirms that the bovine and chicken proteins are the first examples of Acanthamoeba myosin I-like proteins from higher eukaryotes. Comparison of available structural and functional data leads us to postulate that the myosin I family of proteins arose from the fusion of a conserved myosin head-like motor domain with variable COOH-terminal domains responsible for binding to specific intracellular structures.

10.
An automatic sequence search and analysis protocol (DomainFinder) based on PSI-BLAST and IMPALA, and using conservative thresholds, has been developed for reliably integrating gene sequences from GenBank into their respective structural families within the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath_new). DomainFinder assigns a new gene sequence to a CATH homologous superfamily provided that PSI-BLAST identifies a clear relationship to at least one other Protein Data Bank sequence within that superfamily. This has resulted in an expansion of the CATH protein family database (CATH-PFDB v1.6) from 19,563 domain structures to 176,597 domain sequences. A further 50,000 putative homologous relationships can be identified using less stringent cut-offs and these relationships are maintained within neighbour tables in the CATH Oracle database, pending further evidence of their suggested evolutionary relationship. Analysis of the CATH-PFDB has shown that only 15% of the sequence families are close enough to a known structure for reliable homology modeling. IMPALA/PSI-BLAST profiles have been generated for each of the sequence families in the expanded CATH-PFDB and a web server has been provided so that new sequences may be scanned against the profile library and be assigned to a structure and homologous superfamily.
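The assignment rule described above (a new sequence joins a superfamily if a search finds a clear relationship to at least one PDB sequence in it) can be written as a small filter over search hits. The hit format, the identifiers, and the E-value cutoff are hypothetical placeholders; the abstract only says that conservative thresholds were used.

```python
def assign_superfamily(hits, pdb_superfamily, e_cutoff=1e-4):
    """Assign a query sequence to a superfamily from search hits.

    hits: list of (subject_id, e_value) pairs for one query.
    pdb_superfamily: dict mapping structure-derived subject ids to
        superfamily ids; subjects not in it are ignored.
    e_cutoff: placeholder threshold, not DomainFinder's published value.
    Returns the superfamily of the best qualifying hit, or None.
    """
    qualifying = [(e, pdb_superfamily[s]) for s, e in hits
                  if s in pdb_superfamily and e <= e_cutoff]
    if not qualifying:
        return None  # no clear relationship: leave the sequence unassigned
    return min(qualifying)[1]  # lowest E-value wins
```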

11.
McGuffin LJ, Jones DT. Proteins, 2002, 48(1):44-52
The ultimate goal of structural genomics is to obtain the structure of each protein coded by each gene within a genome to determine gene function. Because of cost and time limitations, it remains impractical to solve the structure for every gene product experimentally. Up to a point, reasonably accurate three‐dimensional structures can be deduced for proteins with homologous sequences by using comparative modeling. Beyond this, fold recognition or threading methods can be used for proteins showing little homology to any known fold, although this is relatively time‐consuming and limited by the library of template folds currently available. Therefore, it is appropriate to develop methods that can increase our knowledge base, expanding our fold libraries by earmarking potentially “novel” folds for experimental structure determination. How can we sift through proteomic data rapidly and yet reliably identify novel folds as targets for structural genomics? We have analyzed a number of simple methods that discriminate between “novel” and “known” folds. We propose that simple alignments of secondary structure elements using predicted secondary structure could potentially be a more selective method than both a simple fold recognition method (GenTHREADER) and standard sequence alignment at finding novel folds when sequences show no detectable homology to proteins with known structures. Proteins 2002;48:44–52. © 2002 Wiley‐Liss, Inc.
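Aligning secondary-structure elements rather than residues can be sketched in two steps: collapse a per-residue H/E/C string into its run of elements, then score two element strings with a global (Needleman-Wunsch) alignment. The match, mismatch, and gap values are illustrative assumptions, not the scoring used in the paper.

```python
from itertools import groupby


def to_elements(ss: str, keep: str = "HE") -> str:
    """Collapse a per-residue secondary-structure string into its sequence
    of elements, dropping coil, e.g. 'CCHHHHCEEEE' -> 'HE'."""
    return "".join(k for k, _ in groupby(ss) if k in keep)


def align_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score between two element strings,
    computed with a rolling row of the dynamic-programming table."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]
```

Two helices separated by coil stay two elements (`to_elements("HHHCCHHH")` gives `"HH"`), so the element strings retain the order and number of helices and strands.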

12.
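Wu Yan, Tan Chengjie, Zhu Ping. Chinese Journal of Bioinformatics, 2012, 10(4):264-268
Previous studies have shown that the distribution of human T-lymphotropic virus type I (HTLV-I) is regional. This paper aims to propose a different method for analyzing this regionality. First, a total of 20 nucleotide sequences from Asia, South America, and Africa were selected from GenBank, and the homology within each region's sequence sample was analyzed with the molecular biology software Vector NTI Suite. Then, taking the amino acid content of each sequence as the object of study, a new formula was defined for homology analysis, and the results were compared with those obtained experimentally by other researchers. The conclusions reached by the different analysis methods were all consistent. This shows that the distribution of HTLV-I is clearly regional, and that the methods used in this paper are equally applicable to other epidemiological studies.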

13.
Summary

The complete amino acid sequence of the major sialoglycoprotein of horse erythrocyte membranes, glycophorin HA, was determined by manual sequencing methods, using tryptic, chymotryptic, and cyanogen bromide fragments. Glycophorin HA is a polypeptide chain of 120 amino acid residues and contains 10 oligosaccharide units attached to the amino-terminal side of the molecule. Its amino terminus is pyroglutamic acid. All of the oligosaccharides are linked O-glycosidically to threonine or serine residues. The amino acid sequence is consistent with the transmembrane orientation of glycophorins. There is no significant homology between the glycosylated domains of horse, human, and porcine glycophorins, but there is considerable homology between the hydrophobic domains of the three glycophorins, which interact with the lipid bilayer of the erythrocyte membrane.

14.
Liu HL, Lin JC. Proteins, 2004, 55(3):558-567
Homology models of the pore loop domain of six eukaryotic potassium channels, Kv1.1-Kv1.6, were generated based on the crystallographic structure of KcsA. The results of amino acid sequence alignment indicate that these Kv channels are composed of two structurally and functionally independent domains: the N-terminal 'voltage sensor' domain and the C-terminal 'pore loop' domain. The homology models reveal that the pore loop domains of these Kv channels exhibit folds similar to that of KcsA. The structural features and specific packing of aromatic residues around the selectivity filter of these Kv channels are nearly identical to those of KcsA, whereas most of the structural variations occur in the turret and in the inner and outer helices. The distribution of polar and nonpolar side chains on the surfaces of the KcsA and Kv channels shows a segregation of side chains common to most integral membrane proteins. In KcsA, the hydrogen bond between Glu71 and Asp80 plays an important role in stabilizing the channel; in the Kv family, the Val residue at the position corresponding to Glu71 instead stabilizes the channel by making hydrophobic contact with a Tyr residue from the signature sequence of the selectivity filter. These homology models of Kv channels provide particularly attractive subjects for further structure-based studies.

15.
Clostocin O is a phage tail-like bacteriocin produced by Clostridium saccharoperbutylacetonicum N1-4. A single particle of clostocin O was able to kill one sensitive cell. Clostocin O also had lytic activity, but this lytic activity was not essential to its action, because clostocin O showed sufficient killing activity even under conditions that inhibited lysis. Clostocin O infection inhibited the biosynthesis of macromolecules (protein, RNA, and DNA) in sensitive cells, and the amounts of these macromolecules in infected cells remained at their initial levels.

16.
Shatsky M, Nussinov R, Wolfson HJ. Proteins, 2006, 62(1):209-217
Routinely used multiple-sequence alignment methods use only sequence information and consequently may produce inaccurate alignments. Multiple-structure alignment methods, on the other hand, optimize the structural alignment while ignoring sequence information. Here, we present an optimization method that unifies sequence and structure information. The alignment score is based on standard amino acid substitution probabilities combined with newly computed three-dimensional structure alignment probabilities. The advantage of our alignment scheme is its ability to produce more accurate multiple alignments. We demonstrate the usefulness of the method in three applications: 1) computing more accurate multiple-sequence alignments, 2) analyzing protein conformational changes, and 3) computing amino acid structure-sequence conservation, with application to protein-protein docking prediction. The method is available at http://bioinfo3d.cs.tau.ac.il/staccato/.

17.
Ohlson T, Wallner B, Elofsson A. Proteins, 2004, 57(1):188-197
To improve the detection of related proteins, it is often useful to include evolutionary information for both the query and target proteins. One method to include this information is by the use of profile-profile alignments, where a profile from the query protein is compared with the profiles from the target proteins. Profile-profile alignments can be implemented in several fundamentally different ways. The similarity between two positions can be calculated using a dot-product, a probabilistic model, or an information theoretical measure. Here, we present a large-scale comparison of different profile-profile alignment methods. We show that the profile-profile methods perform at least 30% better than standard sequence-profile methods both in their ability to recognize superfamily-related proteins and in the quality of the obtained alignments. Although the performance of all methods is quite similar, profile-profile methods that use a probabilistic scoring function have an advantage as they can create good alignments and show a good fold recognition capacity using the same gap-penalties, while the other methods need to use different parameters to obtain comparable performances.
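Two of the column-similarity measures mentioned above can be sketched directly: a dot product of residue-frequency vectors, and a probabilistic log-odds score of the form log2 Σ pA(a)pB(a)/q(a), where q is a background distribution. This is one common form of a probabilistic column score, offered as an illustration; the specific scoring functions compared in the study may differ.

```python
import math


def dot_product_score(col_a, col_b):
    """Similarity of two profile columns as the dot product of their
    residue-frequency vectors (dicts mapping residue -> frequency)."""
    return sum(f * col_b.get(aa, 0.0) for aa, f in col_a.items())


def log_odds_column_score(col_a, col_b, background):
    """Probabilistic column score: log2 of sum_a pA(a) * pB(a) / q(a),
    where background[a] = q(a) is the background residue frequency."""
    s = sum(f * col_b.get(aa, 0.0) / background[aa] for aa, f in col_a.items())
    return math.log2(s) if s > 0 else float("-inf")
```

The log-odds form rewards shared rare residues more than shared common ones, which the plain dot product does not.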

18.
Several recent publications illustrated the advantages of using sequence profiles to recognize distant homologies between proteins. At the same time, the practical usefulness of distant homology recognition depends not only on the sensitivity of the algorithm, but also on the quality of the alignment between a prediction target and the template from the database of known proteins. Here, we study this question for several supersensitive protein comparison algorithms that were previously compared in their recognition sensitivity (Rychlewski et al., 2000). A database of protein pairs with similar structures but low sequence similarity is used to rate the alignments obtained with several different methods, including sequence-sequence, sequence-profile, and profile-profile alignment methods. We show that incorporating the evolutionary information encoded in sequence profiles into alignment calculation significantly increases alignment accuracy, bringing the alignments closer to those obtained from structure comparison. In general, alignment quality is correlated with recognition and alignment score significance. For every alignment method, alignments with statistically significant scores correlate with both correct structural templates and good quality alignments. At the same time, average alignment lengths differ among methods, making comparison between them difficult. For instance, the alignments obtained by FFAS, the profile-profile alignment algorithm developed in our group, are always longer than the alignments obtained with PSI-BLAST. To address this problem, we develop methods to truncate or extend alignments to cover a specified percentage of protein lengths. In most cases, the elongation of the alignment by profile-profile methods is reasonable, adding fragments of similar structure. Examples of erroneous alignments are examined, and it is shown that they can be identified based on model quality.
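Alignment coverage, and trimming an alignment to a target fraction of the query length, can be sketched as below. Trimming symmetrically from both ends is a hypothetical policy chosen for simplicity; the abstract does not say how alignments were truncated, and extension is not shown.

```python
def coverage(aligned_query_positions, query_length):
    """Fraction of the query covered by an alignment, given the query
    positions that are aligned to template residues."""
    return len(set(aligned_query_positions)) / query_length


def trim_to_coverage(aligned_pairs, query_length, target):
    """Trim an alignment symmetrically from both ends so it covers at most
    `target` fraction of the query.

    aligned_pairs: list of (query_pos, template_pos) tuples sorted by
    query position. Alignments already shorter than the target are
    returned unchanged.
    """
    keep = int(target * query_length)
    excess = max(0, len(aligned_pairs) - keep)
    start = excess // 2
    return aligned_pairs[start:start + min(keep, len(aligned_pairs))]
```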

19.
This paper discusses the benefit of mapping paired cysteine mutation patterns as a guide to identifying the positions of protein disulfide bonds. This information can facilitate the computer modeling of protein tertiary structure. First, a simple paired natural-cysteine-mutation map is presented that identifies the positions of putative disulfide bonds in protein families. The method is based on the observation that if, during evolution, a disulfide-bonded cysteine residue is not conserved, then its partner is likely to be mutated as well. For each target protein, protein databases were searched for the primary amino acid sequences of all known members of distinct protein families. Primary sequence alignment was carried out using the PileUp program in the GCG package. To search for correlated mutations, we listed only the positions where cysteine residues were highly conserved and highlighted the mutated residues. In proteins of known three-dimensional structure, a striking pattern of paired cysteine mutations correlated with the positions of known disulfide bridges. For proteins of unknown architecture, the mutation maps showed several positions where disulfide bridging might occur.
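The correlated-mutation idea can be sketched on a toy alignment: find columns where cysteine is highly conserved, then report pairs of such columns whose cysteines are lost in exactly the same sequences. The 80% conservation threshold and the exact-match criterion are simplifying assumptions; the study presents the patterns as a map rather than applying this particular rule.

```python
from itertools import combinations


def conserved_cys_columns(alignment, min_frac=0.8):
    """Indices of alignment columns where cysteine occurs in at least
    min_frac of the (equal-length) aligned sequences."""
    n = len(alignment)
    return [j for j in range(len(alignment[0]))
            if sum(seq[j] == "C" for seq in alignment) / n >= min_frac]


def paired_cys_candidates(alignment, min_frac=0.8):
    """Pairs of conserved-Cys columns whose Cys presence/absence pattern is
    identical across all sequences (kept together, lost together), requiring
    at least one sequence in which both cysteines are mutated."""
    cols = conserved_cys_columns(alignment, min_frac)
    patterns = {j: tuple(seq[j] == "C" for seq in alignment) for j in cols}
    return [(i, j) for i, j in combinations(cols, 2)
            if patterns[i] == patterns[j] and not all(patterns[i])]
```

For the five aligned sequences `ACGCTC` (four copies) and `ASGATC`, only columns 1 and 3 (0-based) lose their cysteines together, so the function reports `[(1, 3)]`; the fully conserved column 5 carries no pairing signal.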

20.
The earliest proteins had to rely on amino acids available on early Earth before the biosynthetic pathways for more complex amino acids evolved. In extant proteins, a significant fraction of the ‘late’ amino acids (such as Arg, Lys, His, Cys, Trp and Tyr) belong to essential catalytic and structure-stabilizing residues. How (or if) early proteins could sustain an early biosphere has been a major puzzle. Here, we analysed two combinatorial protein libraries representing proxies of the available sequence space at two different evolutionary stages. The first is composed of the entire alphabet of 20 amino acids while the second one consists of only 10 residues (ASDGLIPTEV) representing a consensus view of plausibly available amino acids through prebiotic chemistry. We show that compact conformations resistant to proteolysis are surprisingly similarly abundant in both libraries. In addition, the early alphabet proteins are inherently more soluble and refoldable, independent of the general Hsp70 chaperone activity. By contrast, chaperones significantly increase the otherwise poor solubility of the modern alphabet proteins suggesting their coevolution with the amino acid repertoire. Our work indicates that while both early and modern amino acids are predisposed to supporting protein structure, they do so with different biophysical properties and via different mechanisms.
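The two library designs can be caricatured by sampling random sequences from each alphabet. Uniform residue sampling is an assumption made for illustration; the abstract does not describe how the libraries were constructed.

```python
import random

EARLY = "ASDGLIPTEV"             # 10-letter alphabet listed in the abstract
MODERN = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def random_library(alphabet, n_seqs, length, seed=0):
    """Sample n_seqs random sequences of the given length, drawing each
    residue uniformly from `alphabet` (an illustrative assumption)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(length))
            for _ in range(n_seqs)]
```

By construction, no sequence drawn from `EARLY` contains any of the 'late' residues Arg, Lys, His, Cys, Trp, or Tyr.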
