Similar literature
20 similar articles found
1.
We report a comprehensive analysis of the numbers, lengths and amino acid compositions of transmembrane helices in 235 high-resolution structures of integral membrane proteins. The properties of 1551 transmembrane helices in the structures were compared with those obtained by analysis of the same amino acid sequences using topology prediction tools. Explanations for the 81 (5.2%) missing or additional transmembrane helices in the prediction results were identified. Main reasons for missing transmembrane helices were mis-identification of N-terminal signal peptides, breaks in α-helix conformation or charged residues in the middle of transmembrane helices and transmembrane helices with unusual amino acid composition. The main reason for additional transmembrane helices was mis-identification of amphipathic helices, extramembrane helices or hairpin re-entrant loops. Transmembrane helix length had an overall median of 24 residues and an average of 24.9 ± 7.0 residues and the most common length was 23 residues. The overall content of residues in transmembrane helices as a percentage of the full proteins had a median of 56.8% and an average of 55.7 ± 16.0%. Amino acid composition was analysed for the full proteins, transmembrane helices and extramembrane regions. Individual proteins or types of proteins with transmembrane helices containing extremes in contents of individual amino acids or combinations of amino acids with similar physicochemical properties were identified and linked to structure and/or function. In addition to overall median and average values, all results were analysed for proteins originating from different types of organism (prokaryotic, eukaryotic, viral) and for subgroups of receptors, channels, transporters and others.

2.
A suite of FORTRAN programs, PREF, is described for calculating preference functions from the data base of known protein structures and for comparing smoothed profiles of sequence-dependent preferences in proteins of unknown structure. Amino acid preferences for a secondary structure are considered as functions of a sequence environment. Sequence environment of amino acid residue in a protein is defined as an average over some physical, chemical, or statistical property of its primary structure neighbors. The frequency distribution of sequence environments in the data base of soluble protein structures is approximately normal for each amino acid type of known secondary conformation. An analytical expression for the dependence of preferences on sequence environment is obtained after each frequency distribution is replaced by corresponding Gaussian function. The preference for the α-helical conformation increases for each amino acid type with the increase of sequence environment of buried solvent-accessible surface areas. We show that a set of preference functions based on buried surface area is useful for predicting folding motifs in α-class proteins and in integral membrane proteins. The prediction accuracy for helical residues is 79% for 5 integral membrane proteins and 74% for 11 α-class soluble proteins. Most residues found in transmembrane segments of membrane proteins with known α-helical structure are predicted to be indeed in the helical conformation because of very high middle helix preferences. Both extramembrane and transmembrane helices in the photosynthetic reaction center M and L subunits are correctly predicted. We point out in the discussion that our method of conformational preference functions can identify what physical properties of the amino acids are important in the formation of particular secondary structure elements. © 1993 John Wiley & Sons, Inc.

3.
The amino acid sequences of proteins determine their three-dimensional structures and functions. However, how sequence information is related to structures and functions is still enigmatic. In this study, we show that at least a part of the sequence information can be extracted by treating amino acid sequences of proteins as a collection of English words, based on a working hypothesis that amino acid sequences of proteins are composed of short constituent amino acid sequences (SCSs) or “words”. We first confirmed that the English language very likely follows Zipf's law, a special case of a power law. We found that the rank-frequency plot of SCSs in proteins exhibits a similar distribution when low-rank tails are excluded. In comparison with natural English and “compressed” English without spaces between words, amino acid sequences of proteins show larger linear ranges and smaller exponents with heavier low-rank tails, demonstrating that the SCS distribution in proteins is largely scale-free. The distribution pattern of SCSs in proteins is similar among species, but species-specific features are also present. Based on the availability scores of SCSs, we found that sequence motifs are enriched in high-availability sites (i.e., “key words”) and vice versa. In fact, the highest availability peak within a given protein sequence often directly corresponds to a sequence motif. The amino acid composition of high-availability sites within motifs is different from that of entire motifs and all protein sequences, suggesting the possible functional importance of specific SCSs and their compositional amino acids within motifs. We anticipate that our availability-based word decoding approach is complementary to sequence alignment approaches in predicting functionally important sites of unknown proteins from their amino acid sequences.
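The rank-frequency analysis described in this abstract can be illustrated with a minimal Python sketch. It assumes that "words" are overlapping fixed-length k-mers, which is a simplification: the abstract does not define how SCSs are delimited or how availability scores are computed. The function names `kmer_rank_frequency` and `zipf_exponent` are hypothetical.

```python
from collections import Counter
import math


def kmer_rank_frequency(sequences, k=3):
    """Count overlapping k-mers (an assumed stand-in for SCSs) and rank them.

    Returns a list of (kmer, count) sorted by descending count.
    """
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def zipf_exponent(ranked):
    """Least-squares slope of log(count) against log(rank).

    A slope near -1 is what Zipf's law predicts; needs at least two ranks.
    """
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for _, count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    ranked = kmer_rank_frequency(["ABAB"], k=2)
    print(ranked, zipf_exponent(ranked))
```

On real proteomes the fit would normally be restricted to the linear range of the log-log plot, since the abstract notes that the tails deviate from a pure power law.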

4.
Over one-third of protein structures contain metal ions, which are necessary elements in living systems. Traditionally, structural biologists have investigated the properties of metalloproteins (proteins that bind metal ions) by physical means and have interpreted the function and reaction mechanisms of enzymes from their structures and from observations in in vitro experiments. However, for most proteins only the primary structure (amino acid sequence information) is known, and three-dimensional structures are not always available. In this paper, a direct analysis method is proposed to predict the metal-binding amino acid residues of a protein from its sequence information alone, using neural networks with sliding window-based feature extraction and biological feature encoding techniques. For four major bulk elements (calcium, potassium, magnesium, and sodium), the metal-binding residues are identified by the proposed method with higher than 90% sensitivity and very good accuracy under 5-fold cross validation. With such promising results, the method can be extended and used as a powerful methodology for metal-binding characterization of the rapidly increasing number of protein sequences in the future.
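As an illustration of sliding window-based feature extraction, the sketch below cuts a fixed-size window around every residue and one-hot encodes it. The window size, the padding symbol, and the one-hot scheme are assumptions made for the example; the abstract does not specify the actual encoding or network architecture, and the names `sliding_windows` and `one_hot` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "-"                             # assumed padding symbol for sequence ends


def sliding_windows(sequence, half_width=2):
    """Return one window of length 2*half_width+1 centred on each residue."""
    padded = PAD * half_width + sequence + PAD * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def one_hot(window):
    """Encode a window as a flat 0/1 vector with 20 entries per position.

    Padding positions are encoded as all zeros.
    """
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


if __name__ == "__main__":
    for window in sliding_windows("ACD", half_width=1):
        print(window, sum(one_hot(window)))
```

Each encoded window would then serve as the input vector for a per-residue binding/non-binding classifier.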

5.
Amino acid propensities for secondary structures have been used since the 1970s, when Chou and Fasman evaluated them within datasets of a few tens of proteins and developed a method to predict the secondary structure of proteins that is still in use, even though prediction methods have evolved to very different approaches and higher reliability. Propensity for secondary structures represents an intrinsic property of amino acids, and it is used for generating new algorithms and prediction methods. Our work therefore aimed to investigate which protein dataset is best for evaluating the amino acid propensities: larger but not homogeneous sets, or smaller but homogeneous sets, i.e., all-alpha, all-beta, alpha-beta proteins. As a first analysis, we evaluated amino acid propensities for helix, beta-strand, and coil in more than 2000 proteins from the PDBselect dataset. With these propensities, secondary structure predictions performed with a method very similar to that of Chou and Fasman gave us results better than the original one, which was based on propensities derived from the few tens of X-ray protein structures available in the 1970s. In a refined analysis, we subdivided the PDBselect dataset of proteins into three secondary structural classes, i.e., all-alpha, all-beta, and alpha-beta proteins. For each class, the amino acid propensities for helix, beta-strand, and coil have been calculated and used to predict secondary structure elements for proteins belonging to the same class by using resubstitution and jackknife tests. This second round of predictions further improved the results of the first round. Therefore, amino acid propensities for secondary structures became more reliable depending on the degree of homogeneity of the protein dataset used to evaluate them. Indeed, our results also indicate that all algorithms using propensities for secondary structure can still be improved to obtain better predictive results.
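A propensity of the Chou-Fasman kind is commonly computed as the fraction of a residue type found in a given conformation divided by the overall fraction of residues in that conformation. The sketch below implements that ratio for a single aligned pair of strings; the one-letter state labels (H, E, C) and the function name `propensities` are assumptions for illustration, and the abstract does not state which exact formula the authors used.

```python
from collections import Counter


def propensities(sequence, structure):
    """Propensity of each (residue, state) pair observed in the data.

    P(aa, ss) = (n[aa, ss] / n[aa]) / (n[ss] / N)
    Values above 1 mean the residue is over-represented in that state.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    total = len(sequence)
    aa_counts = Counter(sequence)
    ss_counts = Counter(structure)
    pair_counts = Counter(zip(sequence, structure))
    return {
        (aa, ss): (n / aa_counts[aa]) / (ss_counts[ss] / total)
        for (aa, ss), n in pair_counts.items()
    }


if __name__ == "__main__":
    # Toy data: four residues, three helical (H) and one coil (C).
    for key, value in sorted(propensities("AAGG", "HHHC").items()):
        print(key, round(value, 3))
```

Computing the table separately for all-alpha, all-beta, and alpha-beta subsets, as the abstract describes, only requires calling the function on each subset.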

6.
Liang HK, Huang CM, Ko MT, Hwang JK. Proteins 2005, 59(1):58-63
Structural analysis is useful in elucidating structural features responsible for enhanced thermal stability of proteins. However, due to the rapid increase of sequenced genomic data, there are far more protein sequences than the corresponding three-dimensional (3D) structures. The usual sequence-based amino acid composition analysis provides useful but simplified clues about the amino acid types related to thermal stability of proteins. In this work, we developed a statistical approach to identify the significant amino acid coupling sequence patterns in thermophilic proteins. The amino acid coupling sequence pattern is defined as any 2 types of amino acids separated by 1 or more amino acids. Using this approach, we construct the rho profiles for the coupling patterns. The rho value gives a measure of the relative occurrence of a coupling pattern in thermophiles compared with mesophiles. We found that thermophiles and mesophiles exhibit significant bias in their amino acid coupling patterns. We showed that such bias is mainly due to temperature adaptation instead of species or GC content variations. Though no single outstanding coupling pattern can adequately account for protein thermostability, we can use a group of amino acid coupling patterns having strong statistical significance (p values < 10^-7) to distinguish between thermophilic and mesophilic proteins. We found a good correlation between the optimal growth temperatures of the genomes and the occurrences of the coupling patterns (the correlation coefficient is 0.89). Furthermore, we can separate the thermophilic proteins from their mesophilic orthologs using the amino acid coupling patterns. These results may be useful in the study of the enhanced stability of proteins from thermophiles, especially when structural information is scarce. Proteins 2005. © 2005 Wiley-Liss, Inc.
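The coupling pattern defined above (two amino acid types separated by a fixed number of intervening residues) can be counted with a short Python sketch. The ratio computed here is a plain relative-frequency ratio between two sequence sets; it is an assumed stand-in and not the rho statistic of the paper, whose formula the abstract does not give. The names `coupling_counts` and `occurrence_ratio` are hypothetical.

```python
from collections import Counter


def coupling_counts(sequences, gap=1):
    """Count ordered residue pairs separated by exactly `gap` residues."""
    counts = Counter()
    step = gap + 1
    for seq in sequences:
        for i in range(len(seq) - step):
            counts[(seq[i], seq[i + step])] += 1
    return counts


def occurrence_ratio(set_a, set_b, pattern, gap=1):
    """Relative frequency of `pattern` in set_a divided by that in set_b.

    Returns None when the ratio is undefined (pattern absent from set_b
    or either set too short to contain any pair).
    """
    counts_a = coupling_counts(set_a, gap)
    counts_b = coupling_counts(set_b, gap)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0 or counts_b[pattern] == 0:
        return None
    return (counts_a[pattern] / total_a) / (counts_b[pattern] / total_b)


if __name__ == "__main__":
    # Toy sequences standing in for thermophilic and mesophilic proteins.
    print(occurrence_ratio(["AKAK"], ["AKAKGG"], ("A", "A"), gap=1))
```

Scanning `gap` over a range of values for every residue pair would give a profile per pattern, which is the general shape of analysis the abstract describes.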

7.
8.
We have critically evaluated hydrodynamic data from 21 proteins whose molecular dimensions are known from X-ray crystallography. We present two useful equations relating the molecular weights and sedimentation coefficients of globular proteins. The hydrodynamic data combined with data for small molecules from the literature indicate that failure of the Stokes equation occurs only for molecular weights <850. Calculated hydration values for the 21 proteins have a mean value and standard deviation of 0.53 ± 0.26 g H2O/g protein. Furthermore, statistical arguments indicate that only 5.3% of the variance is due to experimental error. The mean value and especially the dispersion of values are in sharp contrast to the values 0.36 ± 0.04 obtained by others from NMR measurements on frozen protein solutions. Hydration values calculated from NMR measurements are closely correlated with the number of charged and polar amino acid residues. In contrast to this result, our analysis of the amino acid compositions of the four proteins with the lowest hydration and the four monomeric proteins with the highest shows that the range of values we observe cannot be accounted for on the basis of amino acid composition. In fact there appears to be a weak correlation between the number of apolar residues and hydrodynamic hydration. We therefore conclude that the dispersion must result from variations in fine details of the surface structures of individual proteins. We propose a model of hemispherical clathrate cages which, if correct, would account for the differences in the data obtained by these two methods.

9.
The large number of macromolecular structures deposited with the Protein Data Bank (PDB) describing complexes between proteins and either physiological compounds or synthetic drugs has made possible a systematic analysis of the interactions occurring between proteins and their ligands. In this work, the binding pockets of about 4000 PDB protein-ligand complexes were investigated and amino acid and interaction types were analyzed. The residues observed with lowest frequency in protein sequences, Trp, His, Met, Tyr, and Phe, turned out to be the most abundant in binding pockets. Significant differences between drug-like and physiological compounds were found. On average, physiological compounds establish about twice as many hydrogen bonds with protein atoms as drugs do, whereas drugs rely more on hydrophobic interactions to establish target selectivity. The large number of PDB structures describing homologous proteins in complex with the same ligand made it possible to analyze the conservation of binding pocket residues among homologous protein structures bound to the same ligand, showing that Gly, Glu, Arg, Asp, His, and Thr are more conserved than other amino acids. Also in the cases in which the same ligand is bound to unrelated proteins, the binding pockets showed significant conservation in the residue types. In this case, the probability of co-occurrence of the same amino acid type in the binding pockets could be up to thirteen times higher than that expected on a random basis. The trends identified in this study may provide a useful guideline in the process of drug design and lead optimization. Copyright © 2014 John Wiley & Sons, Ltd.

10.
We have developed experimental approaches for the construction of protocellular structures under simulated primitive earth conditions and studied their formation and characteristics. Three types of envelopes (protein envelopes, lipid envelopes, and lipid-protein envelopes) are considered as candidates for protocellular structures. Simple protein envelopes and lipid envelopes are presumed to have originated at an early stage of chemical evolution, interacted mutually, and then evolved into more complex envelopes composed of both lipids and proteins. Three kinds of protein envelopes were constructed in situ from amino acids under simulated primitive earth conditions such as a fresh water tide pool, a warm sea, and a submarine hydrothermal vent. One protein envelope was formed from a mixture of amino acid amides at 80 °C using multiple hydration-dehydration cycles. Marigranules, protein envelope structures, were produced from mixtures of glycine and acidic, basic and aromatic amino acids at 105 °C in a modified sea medium enriched with essential transition elements. Thermostable microspheres were also formed from a mixture of glycine, alanine, valine, and aspartic acid at 250 °C and above. The microspheres did not form at lower temperatures and consist of silicates and peptide-like polymers containing imide bonds and amino acid residues enriched in valine. Amphiphilic proteins with molecular weights of 2000 were necessary for the formation of the protein envelopes. Stable lipid envelopes were formed from different dialkyl phospholipids and fatty acids. Large, stable, lipid-protein envelopes were formed from egg lecithin and the solubilized marigranules. Polycations such as polylysine and polyhistidine, or basic proteins such as lysozyme and cytochrome c, also stabilized lipid-protein envelopes.

11.
12.
Two-dimensional Fourier transform methods for homonuclear proton NMR spectroscopy have been introduced by Wüthrich and Ernst as a means of extending assignments in spectra of proteins. Multinuclear two-dimensional approaches also appear promising. We are applying current one- and two-dimensional NMR methods to protein family members that differ from one another by one or more amino acid substitutions. The overall goal is to elucidate relationships among the sequences, structures, and functions of these proteins: for example, to delineate primary structural requirements for changes in observable properties such as conformation, amino acid side chain dynamics, hydrogen exchange dynamics, pK'a values, and oxidation-reduction potentials. The ovomucoids from a variety of species of birds, which include a single set of 12 pairs of third-domain proteins (Mr = 6062 for turkey third domain, similar for others) that differ by single amino acid substitutions, provide a favorable system for the study of the structural and dynamic effects of single amino acid replacements. X-ray crystallographic structures of two ovomucoid third domains are available. Other series of proteins being studied by these methods include the photosynthetic electron transport proteins ferredoxin and plastocyanin.

13.
This review summarizes the data obtained by the author in studies on internal symmetry of the mirror type in primary structures of proteins. The methods for detection of symmetric segments in amino acid sequences are analyzed: (1) the method based on analysis of sequences of roots of amino acid codons; (2) the dot matrix method; (3) the method of internal symmetry scanning. The results of studies of internal symmetry in enzymes and signaling proteins are presented. The probable role of the internal symmetry in the structural-functional organization of proteins is discussed.

14.

Background  

Incorporating variable amino acid stereochemistry in molecular design has the potential to improve existing protein stability and create new topologies inaccessible to homochiral molecules. The Protein Data Bank has been a reliable, rich source of information on molecular interactions and their role in protein stability and structure. D-amino acids rarely occur naturally, making it difficult to infer general rules for how they would be tolerated in proteins through an analysis of existing protein structures. However, protein elements containing short left-handed turns and helices turn out to contain useful information. Molecular mechanisms used in proteins to stabilize left-handed elements by L-amino acids are structurally enantiomeric to potential synthetic strategies for stabilizing right-handed elements with D-amino acids.

15.
We seek to understand the interplay between amino acid sequence and local structure in proteins. Are some amino acids unique in their ability to fit harmoniously into certain local structures? What is the role of sequence in sculpting the putative native state folds from myriad possible conformations? In order to address these questions, we represent the local structure of each Cα atom of a protein by just two angles, θ and μ, and we analyze a set of more than 4,000 protein structures from the PDB. We use a hierarchical clustering scheme to divide the 20 amino acids into six distinct groups based on their similarity to each other in fitting local structural space. We present the results of a detailed analysis of patterns of amino acid specificity in adopting local structural conformations and show that the sequence-structure correlation is not very strong compared with a random assignment of sequence to structure. Yet, our analysis may be useful to determine an effective scoring rubric for quantifying the match of an amino acid to its putative local structure.

16.
Crasto CJ, Feng J. Proteins 2001, 42(3):399-413
We performed an extensive sequence analysis on the loops of proteins. By dividing a loop databank derived from the Protein Data Bank into groups, we analyzed the chemical characteristics and the sequence preferences of loops of different lengths and loops connecting different secondary structures in proteins. We found that a large population of loops in our loop databank (94.4%) is either partially or completely surface-exposed. A majority of surface loops in proteins are hydrophilic, whereas the chemical characteristics of interior loops are relatively neutral according to Eisenberg's consensus hydrophobicity scale. As a first step in investigating the intrinsic sequence-structure relationship of loop sequences in proteins, we performed a neighbor-dependent sequence analysis that calculated the effect of the neighboring amino acid type on the loop propensity of residues in loops. This method enhances the statistical significance of residue propensity, thus allowing us to explore the positional preference of amino acids in loops. Our analysis yielded a series of amino acid dyads that showed high preference for loop conformation. The data presented in this study should prove useful for developing potential codes in recognizing loop sequences in proteins.

17.
The molecular recognition and discrimination of very similar ligand moieties by proteins are important subjects in protein–ligand interaction studies. Specificity in the recognition of molecules is determined by the arrangement of protein and ligand atoms in space. The three pyrimidine bases, viz. cytosine, thymine, and uracil, are structurally similar, but the proteins that bind to them are able to discriminate between them and form interactions. Since nonbonded interactions are responsible for molecular recognition processes in biological systems, our work attempts to understand some of the underlying principles of such recognition of pyrimidine molecular structures by proteins. The preferences of the amino acid residues to contact the pyrimidine bases in terms of nonbonded interactions; amino acid residue–ligand atom preferences; main chain and side chain atom contributions of amino acid residues; and solvent-accessible surface area of ligand atoms when forming complexes are analyzed. Our analysis shows that the amino acid residues tyrosine and phenylalanine are highly involved in the pyrimidine interactions. Arginine prefers contacts with the cytosine base. The similarities and differences that exist between the interactions of the amino acid residues with each of the three pyrimidine base atoms in our analysis provide insights that can be exploited in designing specific inhibitors competitive to the ligands.

18.
Amino acids interact with each other, especially with neighboring amino acids, to generate protein structures. We studied the pattern of association and repulsion of amino acids based on 24,748 protein-coding genes from human, 11,321 from mouse, and 15,028 from Escherichia coli, and documented the pattern of neighbor preference of amino acids. All amino acids have different preferences for neighbors. We have also analyzed 7,342 proteins with known secondary structure and estimated the propensity of the 20 amino acids occurring in three of the major secondary structures, i.e., helices, sheets, and turns. Much of the neighbor preference can be explained by the propensity of the amino acids in forming different secondary structures, but there are also a number of intriguing association and repulsion patterns. The similarity in neighbor preference among amino acids is significantly correlated with the number of amino acid substitutions in both mitochondrial and nuclear genes, with amino acids having similar sets of neighbors replacing each other more frequently than those having very different sets of neighbors. This similarity in neighbor preference is incorporated into a new index of amino acid dissimilarities that can predict nonsynonymous codon substitutions better than the two existing indices of amino acid dissimilarities, i.e., Grantham's and Miyata's distances.

19.
The amino acid sequences of some fiber proteins possibly have a periodic structure. This periodicity can be analyzed using the Fourier transform of the mathematical image of the symbol sequence of amino acid residues in proteins. One of several possible methods of Fourier transform has been chosen as optimal for the given study. This optimal Fourier transform has been used to analyze the periodic structures in several fiber proteins of bacteriophage T4. Amino acids from some groups form sequences of alternating elements with a relatively small period (T=15); those from other groups form sequences with other small periods (T=10 and T=8). Relatively large periods of amino acid arrangement, with the entire amino acid sequence of the protein being divided between them into four or six equal parts, are a new finding. The data on protein structural periodicity make it possible to align the amino acid sequences according to the periodic structures of both types. The results obtained agree with the results of previous crystallographic and electron microscopic studies. Translated from Molekulyarnaya Biologiya, Vol. 39, No. 2, 2005, pp. 321–329. Original Russian Text Copyright © 2005 by Simakova, Simakov.
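One simple way to look for periodicity in a symbol sequence is to map it to a binary indicator signal for a chosen amino acid group and evaluate the Fourier power at candidate periods. The sketch below does that with a direct sum using only the standard library. The indicator mapping and the mean subtraction are assumptions; the abstract does not say which Fourier transform variant the authors selected as optimal, and the function names are hypothetical.

```python
import cmath


def indicator(sequence, group):
    """Map a sequence to 1.0 where the residue belongs to `group`, else 0.0."""
    return [1.0 if residue in group else 0.0 for residue in sequence]


def power_at_period(signal, period):
    """Fourier power of the mean-subtracted signal at the given period."""
    mean = sum(signal) / len(signal)
    total = sum(
        (x - mean) * cmath.exp(-2j * cmath.pi * n / period)
        for n, x in enumerate(signal)
    )
    return abs(total) ** 2 / len(signal)


def dominant_period(signal, periods):
    """Return the candidate period with the highest Fourier power."""
    return max(periods, key=lambda t: power_at_period(signal, t))


if __name__ == "__main__":
    # Synthetic example: glycine placed at every fifth position.
    toy = "GAAAA" * 6
    print(dominant_period(indicator(toy, {"G"}), range(2, 11)))
```

Because candidate periods are evaluated directly rather than on the fixed grid of a fast Fourier transform, periods that do not divide the sequence length can be tested as well.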

20.
Nuclear magnetic resonance spectra of membrane proteins containing multiple transmembrane helices have proven difficult to resolve due to the redundancy of aliphatic and Ser/Thr residues in transmembrane domains and the low chemical shift dispersity exhibited by residues in alpha-helical structures. Although ¹³C- and ¹⁵N-labeling are useful tools in the biophysical analysis of proteins, selective labeling of individual amino acids has been used to help elucidate more complete structures and to probe ligand-protein interactions. In general, selective labeling has been performed in Escherichia coli expression systems using minimal media supplemented with a single labeled amino acid and nineteen other unlabeled amino acids and/or by using auxotrophs for specific amino acids. Growth in minimal media often results in low yields of cells or expression products. We demonstrate a method in which one labeled amino acid is added to a rich medium. These conditions resulted in high expression (≥100 mg/L) of a test fusion protein and milligram quantities of the selectively labeled membrane peptide after cyanogen bromide cleavage to release the peptide from the fusion protein. High levels of ¹⁵N incorporation and acceptable levels of cross-labeling into other amino acid residues of the peptide were achieved. Growth in rich media is a simple and convenient alternative to growth in supplemented minimal media and is readily applicable to the expression of proteins selectively labeled with specific amino acids.
