Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Hydropathy plots or window averages over local stretches of the sequence of residue hydrophobicity have revealed patterns related to various protein tertiary structural features. This has enabled identification of regions of the sequence that are at the surface or within the interior of globular soluble proteins, regions located within the lipid bilayer of transmembrane proteins, portions of the sequence that characterize repeating motifs, as well as motifs that usefully characterize different protein structural families. This, therefore, provides one example of the generally expressed maxim that "sequence determines structure". On the other hand, a number of previous investigations have shown the rapidly varying values of residue hydrophobicity along the sequence to be distributed approximately randomly. So one might question just how much of the sequence actually determines structure. It is, therefore, of interest to extract that part of this rapidly varying distribution of residue hydrophobicity that is responsible for the longer wavelength variations that correlate with protein tertiary structural features and to determine their prevalence within the entire distribution. This is accomplished by a finite Fourier analysis of the sequence of residue hydrophobicity and of a new measure of residue distance from the protein interior. Calculations are performed on a number of globins, immunoglobulins, cuprodoxins, and papain-like structures. The spectral power of the Fourier amplitudes of the frequencies extracted, whose inverse transforms underlie the windowed values of residue hydrophobicity is shown to be a small fraction of the total power of the hydrophobicity distribution and thereby consistent with a distribution that might appear to be predominantly random. 
The wide range of sequence identity between proteins having the same fold, all exhibiting similar small fractions of power amplitude that correlate with the longer wavelength inside-to-outside excursions of the amino acid residues, supports the general contention that close sequence identity is an expression of a close evolutionary relationship rather than an expression of structural similarity. Practical implications of the present analysis for protein structure prediction and engineering are also described.
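The windowed averaging and finite Fourier analysis described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the Kyte-Doolittle scale, the window width, and the helper names `window_average`, `power_spectrum`, and `low_frequency_power_fraction` are all assumptions chosen for the example.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not name one).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def window_average(values, width=7):
    """Centered moving average; the window is truncated at the sequence ends."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def power_spectrum(values):
    """Power of each discrete Fourier component of the mean-centered signal."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [abs(sum(c * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, c in enumerate(centered))) ** 2 / n
            for k in range(n)]

def low_frequency_power_fraction(values, max_k):
    """Fraction of total power carried by wavenumbers 1..max_k.

    For a real signal, components k and n - k describe the same frequency,
    so both are counted.
    """
    spec = power_spectrum(values)
    n = len(spec)
    total = sum(spec)
    low = sum(p for k, p in enumerate(spec) if 0 < min(k, n - k) <= max_k)
    return low / total if total else 0.0
```

A hydrophobicity signal for a sequence string `seq` would be `[KD[aa] for aa in seq]`. A small value of `low_frequency_power_fraction` for such a signal would be in line with the abstract's observation that the long-wavelength components hold only a small share of the total power.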

2.
Repeating motifs in protein structures are thought to act as modular building blocks that allow complex proteins to be constructed economically. In this work, novel wavelet transform analysis techniques are used to detect and characterize repeating motifs in protein sequence and structure data, where the Kyte-Doolittle hydrophobicity scale (HΦ) and relative accessible surface area (rASA) data provide residue information about the protein sequence and structure, respectively. We analyze a variety of repeating protein motifs: TIM barrels, propeller blades, coiled coils and leucine-rich repeat structures. Detection and characterization of these motifs is performed using techniques based on the continuous wavelet transform (CWT). Results indicate that the wavelet transform techniques developed herein are a promising approach for the detection and characterization of repeating motifs in structural data and, in some instances, in sequence data.
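As a rough illustration of how a continuous wavelet transform responds to features in a per-residue signal, the sketch below computes one CWT row at a single scale. The Ricker ("Mexican hat") wavelet, the truncation at four scale widths, and the names `ricker` and `cwt_row` are assumptions; the abstract does not say which wavelet the authors used.

```python
import math

def ricker(t, scale):
    """Unnormalized Ricker wavelet evaluated at offset t for the given scale."""
    x = t / scale
    return (1.0 - x * x) * math.exp(-x * x / 2.0)

def cwt_row(signal, scale):
    """CWT coefficients of `signal` at one scale, one value per position.

    The wavelet is truncated at +/- 4 * scale residues (an assumed cutoff).
    """
    n = len(signal)
    half = int(4 * scale)
    row = []
    for b in range(n):
        acc = 0.0
        for t in range(max(0, b - half), min(n, b + half + 1)):
            acc += signal[t] * ricker(t - b, scale)
        row.append(acc / math.sqrt(scale))
    return row
```

Stacking `cwt_row` over a range of scales gives a scale-by-position map in which a repeating motif would be expected to appear as regularly spaced maxima at the scale matching the repeat length.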

3.
R B Russell  G J Barton 《Proteins》1992,14(2):309-323
An algorithm is presented for the accurate and rapid generation of multiple protein sequence alignments from tertiary structure comparisons. A preliminary multiple sequence alignment is performed using sequence information, which then determines an initial superposition of the structures. A structure comparison algorithm is applied to all pairs of proteins in the superimposed set and a similarity tree is calculated. Multiple sequence alignments are then generated by following the tree from the branches to the root. At each branchpoint of the tree, a structure-based sequence alignment and coordinate transformations are output, with the multiple alignment of all structures output at the root. The algorithm encoded in STAMP (STructural Alignment of Multiple Proteins) is shown to give alignments in good agreement with published structural accounts within the dehydrogenase fold domains, globins, and serine proteinases. In order to reduce the need for visual verification, two similarity indices are introduced to determine the quality of each generated structural alignment. Sc quantifies the global structural similarity between pairs or groups of proteins, whereas Pij' provides a normalized measure of the confidence in the alignment of each residue. STAMP alignments have the quality of each alignment characterized by Sc and Pij' values and thus provide a reproducible resource for studies of residue conservation within structural motifs.

4.
MOTIVATION: While protein secondary structure is well understood, representing the repetitive nature of tertiary packing in proteins remains difficult. We have developed a construct called the relative packing group (RPG) that applies the clique concept from graph theory as a natural basis for defining the packing motifs in proteins. An RPG is defined as a clique of residues, where every member contacts all others as determined by the Delaunay tessellation. Geometrically similar RPGs define a regular element of tertiary structure or tertiary motif (TerMo). This intuitive construct provides a simple approach to characterize general repetitive elements of tertiary structure. RESULTS: A dataset of over 4 million tetrahedral RPGs was clustered using different criteria to characterize the various aspects of regular tertiary structure in TerMos. Grouping these data within the SCOP classification levels of Family, Superfamily, Fold, Class and PDB showed that similar packing is shared across different folds. Classification of RPGs based on residue sequence locality reveals topological preferences according to protein sizes and secondary structure. We find that larger proteins favor RPGs with three local residues packed against a non-local residue. Classifying by secondary structure, helices prefer mostly local residues, sheets favor at least two local residues, while turns and coil are populated with more local residues. To depict these TerMos, we have developed two complementary and intuitive representations: (i) Dirichlet process mixture density estimation of the torsion angle distributions and (ii) kernel density estimation of the Cartesian coordinate distribution. The TerMo library and representations software are available upon request.
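The clique definition of a tetrahedral RPG can be illustrated with a small sketch. It assumes the residue contacts have already been derived (for example from a Delaunay tessellation) and are passed in as index pairs; the function names `tetrahedral_cliques` and `longest_local_run` are hypothetical, and the brute-force search is only suitable for small examples.

```python
from itertools import combinations

def tetrahedral_cliques(contacts):
    """Return every set of four residues in which all six pairs are in contact.

    `contacts` is an iterable of (i, j) residue-index pairs (assumed input).
    """
    neighbors = {}
    for a, b in contacts:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    cliques = []
    for quad in combinations(sorted(neighbors), 4):
        if all(v in neighbors[u] for u, v in combinations(quad, 2)):
            cliques.append(quad)
    return cliques

def longest_local_run(quad):
    """Size of the largest group of sequence-consecutive residues in a sorted clique."""
    best = run = 1
    for prev, cur in zip(quad, quad[1:]):
        run = run + 1 if cur - prev == 1 else 1
        best = max(best, run)
    return best
```

Under this simplified locality measure, a clique such as `(1, 2, 3, 10)` has three local residues packed against one non-local residue, the arrangement the abstract reports as favored in larger proteins.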

5.
Domains are the main structural and functional units of larger proteins. They tend to be contiguous in primary structure and can fold and function independently. It has been observed that 10–20% of all encoded proteins contain duplicated domains and the average pairwise sequence identity between them is usually low. In the present study, we have analyzed the structural similarity between domain repeats of proteins with known structures available in the Protein Data Bank using structure-based inter-residue interaction measures such as the number of long-range contacts, surrounding hydrophobicity, and pairwise interaction energy. We used the RADAR program to detect the repeats in a protein sequence, which were further validated using Pfam domain assignments. The sequence identity between the repeats in domains ranges from 20 to 40% and their secondary structural elements are well conserved. The number of long-range contacts, surrounding hydrophobicity calculations and pairwise interaction energy of the domain repeats clearly reveal the conservation of the 3-D structural environment in the repeats of domains. The proportions of mainchain–mainchain hydrogen bonds and hydrophobic interactions are also highly conserved between the repeats. The present study suggests that the computation of these structure-based parameters will give better clues about the tertiary environment of the repeats in domains. The folding rates of individual domains in the repeats, predicted using the long-range order parameter, correlate well with most of the experimentally observed folding rates for the analyzed independently folded domains.
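A long-range order parameter of the kind mentioned in this abstract can be sketched as the number of long-range contacts per residue. The sequence-separation threshold (more than 12 residues) and the 8 Å Cα distance cutoff are assumptions commonly used with this parameter, not values stated in the abstract, and the function name `long_range_order` is illustrative.

```python
import math

def long_range_order(ca_coords, min_separation=12, cutoff=8.0):
    """Long-range contacts per residue from a list of C-alpha (x, y, z) coordinates.

    A pair (i, j) counts when |i - j| > min_separation and the two atoms lie
    within `cutoff` angstroms of each other (both thresholds are assumed defaults).
    """
    n = len(ca_coords)
    contacts = 0
    for i in range(n):
        for j in range(i + min_separation + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts += 1
    return contacts / n
```

A fully extended chain has no long-range contacts and gives 0.0, while compact folds with many contacts between sequence-distant residues give larger values.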

6.
A hallmark of soluble globular protein tertiary structure is a hydrophobic core and a protein exterior populated predominantly by hydrophilic residues. Recent hydrophobic moment profiling of the spatial distribution of 30 globular proteins of diverse size and structure revealed features of this distribution that were comparable. Analogous profiling of the hydrophobicity distribution of the alpha-helical buried bundles of several transmembrane proteins, as the lipid/protein interface is approached from within the bilayer, reveals spatial hydrophobicity profiles that contrast with those obtained for the soluble proteins. The calculations, which enable relative changes of hydrophobicity to be simply identified over the entire spatial extent of the multimer within the lipid bilayer, show the accumulated zero-order moments of the bundles to be mainly inverted with respect to those found for the soluble proteins. This indicates a statistical increase in the average residue hydrophobic content as the lipid bilayer is approached. This result differs from that of a relatively recent calculation and qualitatively agrees with earlier calculations involving lipid exposed and buried residues of the alpha-helices of transmembrane proteins. Spatial profiling, over the entire spatial extent of the multimer with scaled values of residue hydrophobicity, provides information that is not available from calculations using lipid exposure alone.
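An accumulated zero-order moment profile can be sketched as the running sum of residue hydrophobicity inside a growing volume around the structure's centroid. This simplified version uses spherical distance from the unweighted centroid, which is an assumption made for brevity; the input layout and the function name are also illustrative.

```python
import math

def accumulated_zero_order_moment(coords, hydrophobicity, radii):
    """Sum of residue hydrophobicity within each radius of the centroid.

    coords: one (x, y, z) position per residue (for example, side-chain centroids).
    hydrophobicity: one scaled hydrophobicity value per residue.
    radii: increasing distances at which the accumulated sum is reported.
    """
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    dists = [math.dist(c, centroid) for c in coords]
    return [sum(h for d, h in zip(dists, hydrophobicity) if d <= r)
            for r in radii]
```

For a soluble protein with a hydrophobic core, such a profile would be expected to rise near the center and then fall as hydrophilic surface residues are included; the abstract reports that the transmembrane bundles show a largely inverted profile.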

7.
MOTIVATION: The underlying assumption of many sequence-based comparative studies in proteomics is that different aspects of protein structure and therefore functionality may be linked to particular sequence motifs. This holds true if sequence similarity is sufficiently high, but in general the relationship between protein sequence and structure appears complex and is not well understood. RESULTS: Statistical analysis of multiple and pairwise structural alignments of protein structures within SCOP folds is performed. The results indicate that multiple conservation of residue identity is not common and that the relationship between sequence and structure may be explained by a model based on the assumption that protein structure is tolerant to residue substitutions that preserve the hydropathic profile of the sequence. This model also explains the origin and specific value of the sequence similarity threshold, noticed in many previous studies, below which structural resemblance is not statistically expected.

8.
We present experimental evidence for the significant effect that water can have on the functional structure of proteins in solution. Human serum albumin (HSA) and bovine serum albumin (BSA) have an amino acid sequence identity of 75.52% and are chosen as model proteins. We employ EPR-based nanoscale distance measurements using double electron-electron resonance (DEER) spectroscopy and both albumins loaded with long chain fatty acids (FAs) in solution to globally (yet indirectly) characterize the tertiary protein structures from the bound ligands’ points of view. The complete primary structures and crystal structures of HSA and, as of recently, also BSA are available. We complement the picture as we have recently determined the DEER-derived solution structure of HSA and here present the corresponding BSA solution structure. The characteristic asymmetric FA distribution in the crystal structure of HSA can surprisingly be observed by DEER in BSA in solution. This indicates that the BSA conformational ensemble in solution seems to be narrow and close to the crystal structure of HSA. In contrast, for HSA in solution a much more symmetric FA distribution was found. Thus, conformational adaptability and flexibility dominate in the HSA solution structure while BSA seems to lack these properties. We further show that differences in amino acid hydropathies of specific structural regions in both proteins can be used to correlate the observed difference in the global (tertiary) solution structures with the differences on the primary structure level.

9.
10.
A fundamental characteristic of soluble globular protein structure is a hydrophobic core and protein exterior comprised predominantly of hydrophilic residues. This distribution of amino acid residue hydrophobicity, from protein interior to exterior, has recently been profiled with the use of hydrophobic moments. The calculations enable comparison of the radial hydrophobicity distribution of different proteins and revealed two features common to 30 proteins of diverse size and structure. One, a global feature, is the overall shape of the second-order ellipsoidal hydrophobic moment. The second, a specific feature, is a quasi-invariant hydrophobic ratio of distances. Both features are dependent upon the rates of increase, from protein interior to exterior, of the accumulated numbers of hydrophobic and hydrophilic amino acid residues. These rates can be simulated simply with a two-component nucleation model of protein hydrophobicity. The model provides insight into the origin of the shape of the observed hydrophobic moment profiles and of the observed range of hydrophobic ratios. Consistent with observation, it is shown that a relatively wide range of hydrophobic and hydrophilic rates of increase yield a relatively narrow range of hydrophobic ratios. Furthermore, the model identifies one factor, the decrease in residue density with increasing distance from the protein interior, that is critical in providing the range of values that is comparable with the observed range.

11.
A data collection which merges protein structural and sequence information is described. Structural superpositions amongst proteins with similar main-chain fold were performed or collected from the literature. Sequences taken from the protein primary structure databases were associated with the multiple structural alignments providing they were at least 50% homologous in residue identity to one of the structural sequences and at least 50% of the structural sequence residues were alignable. Such restrictions allow reasonable confidence that the primary sequences share the conformation of the tertiary structural templates, except in the less conserved loop regions. Multiple structural superpositions were collected for 38 familial groups containing a total of 209 tertiary structures; 45 structures had no superposable mates and were used individually. Other information is also provided, such as main-chain and side-chain conformational angles and secondary structural assignments. Wedding the primary and tertiary structural data resulted in an 8-fold increase of data bank sequence entries over those associated with the known three-dimensional architectures alone.

12.
Cytochrome P450 enzymes are hemeproteins that catalyze the monooxygenation of a wide range of structurally diverse substrates of endogenous and exogenous origin. These heme monooxygenases receive electrons from NADH/NADPH via electron transfer proteins. The cytochrome P450 enzymes, which constitute a diverse superfamily of more than 8,700 proteins, share a common tertiary fold but < 25% sequence identity. Based on their electron transfer protein partner, cytochrome P450 proteins are classified into six broad classes. Traditional methods of protein classification are based on the canonical paradigm that attributes proteins’ function to their three-dimensional structure, which is determined by their primary structure, that is, the amino acid sequence. It is increasingly recognized that protein dynamics play an important role in molecular recognition and catalytic activity. As the mobility of a protein is an intrinsic property that is encrypted in its primary structure, we examined whether different classes of cytochrome P450 enzymes display any unique patterns of intrinsic mobility. Normal mode analysis was performed to characterize the intrinsic dynamics of five classes of cytochrome P450 proteins. The present study revealed that cytochrome P450 enzymes share a strong dynamic similarity (root mean squared inner product > 55% and Bhattacharyya coefficient > 80%), despite the low sequence identity (< 25%) and sequence similarity (< 50%) across the cytochrome P450 superfamily. Noticeable differences in Cα atom fluctuations of structural elements responsible for substrate binding were observed. These differences in residue fluctuations might be crucial for substrate selectivity in these enzymes.
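The root mean squared inner product (RMSIP) quoted in this abstract compares two sets of normal modes. A minimal sketch is given below, assuming each set holds the same number of orthonormal mode vectors defined over the same aligned atoms; the function name `rmsip` and the input layout are illustrative, and the Bhattacharyya coefficient is not covered.

```python
import math

def rmsip(modes_a, modes_b):
    """Root mean squared inner product between two sets of mode vectors.

    Each argument is a list of equal-length vectors, assumed orthonormal
    within its own set. Returns 1.0 for identical subspaces and 0.0 for
    mutually orthogonal ones.
    """
    n = len(modes_a)
    total = 0.0
    for u in modes_a:
        for v in modes_b:
            dot = sum(x * y for x, y in zip(u, v))
            total += dot * dot
    return math.sqrt(total / n)
```

On this scale, the reported RMSIP above 55% corresponds to a return value greater than 0.55.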

13.
We propose a simple yet reliable computational framework that characterizes the differential mass and hydrophobicity distribution within structural classes of proteins. Radial partitioning of the protein interior successfully distinguished the mass and hydrophobicity distribution patterns in extremophilic proteins from those in their structurally aligned mesophilic counterparts. Distance-dependent mass and hydrophobicity magnitudes provide the structural insights needed to probe the hidden connections between packing, folding and stability within different structural classes of proteins. Two new computational markers are also proposed: one representing the total mass content, the other related to the hydrophobic centrality of proteins. Results reveal that mass and hydrophobicity packing within extremophilic proteins is indeed more compact than in their mesophilic counterparts, and analysis of the structural constraints within them supports this finding. Total mass (and hydrophobicity) content is found to be maximum in α/β thermophilic proteins and minimum in all-α mesophilic proteins.

14.
We have compared sequence and structural features of 35 proteins to their metabolic stabilities in HeLa cells. No relationship was observed between the half-life of an injected protein and its subunit molecular weight, isoelectric point, hydrophobicity, thermostability, surface charge density, or N-terminal residue. Other properties, including susceptibility to oxidation, specific combinations of amino acids, secondary structure composition, and solvent exposed residues, also failed to correlate with protein stability. Although a weak inverse correlation was obtained when stability was compared to asparagine and glutamine content, we conclude that the degradation of an injected protein is unlikely to be related to any single structural parameter. Rather, we hypothesize that it results from an interplay between subcellular location and still poorly defined surface features of the injected proteins.

15.
Structural genomics is the idea of covering protein space so that every protein sequence comes within model building distance of a protein of known structure. Unfortunately, reproducing the structural alignment of distantly related proteins is a difficult challenge for existing sequence alignment and motif search software. We have developed a new transitive alignment algorithm (MaxFlow), which generates accurate alignments between proteins deep in the twilight zone of sequence similarity, below 20% sequence identity. In particular, MaxFlow reliably identifies conserved core motifs between proteins which are only indirect PSI-Blast neighbours. Based on MaxFlow alignments, useful 3D models can be generated for all members of a superfamily from as few as a single structural template, despite hundreds of representatives at 40% sequence identity level and patchy detection of homology by PSI-Blast. We propose novel strategies for target prioritization using MaxFlow scores to predict the optimal templates in a superfamily. Our results support an increase in the granularity of covering protein space that has potentially enormous economic implications for planning the transition to the full production phase of structural genomics.

16.
17.
Lin HN  Notredame C  Chang JM  Sung TY  Hsu WL 《PloS one》2011,6(12):e27872
Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of the corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or, more recently, consistency-based approaches. In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words found across a sizeable fraction of the considered dataset, and supported by evolutionary analysis. These words are then used to define a position specific substitution matrix that better reflects the biological significance of local similarity. The experimental results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non-similar proteins. We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity.
SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.
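The idea of collecting words that recur across a sizeable fraction of a dataset can be illustrated with a short n-gram count. This is not the SymAlign scoring scheme: the word length, the coverage threshold, and the function name `conserved_words` are assumptions for the sketch, and no evolutionary weighting is applied.

```python
from collections import Counter

def conserved_words(sequences, n=3, min_fraction=0.5):
    """Map each n-gram to the fraction of sequences containing it.

    Only n-grams present in at least `min_fraction` of the sequences are kept.
    Each sequence contributes at most once per n-gram.
    """
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + n] for i in range(len(seq) - n + 1)})
    total = len(sequences)
    return {word: c / total for word, c in counts.items()
            if c / total >= min_fraction}
```

The resulting fractions could serve as simple weights when rewarding aligned positions that fall inside a shared word, which is the general intuition behind a weighted n-gram similarity.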

18.
If it is assumed that the primary sequence determines the three-dimensional folded structure of a protein, then the regular folding patterns, such as alpha-helix, beta-sheet, and other ordered patterns in the three-dimensional structure must correspond to the periodic distribution of the physical properties of the amino acids along the primary sequence. An AutoRegressive Moving Average (ARMA) model method of spectral analysis is applied to analyze protein sequences represented by the hydrophobicity of their amino acids. The results for several membrane proteins of known structures indicate that the periodic distribution of hydrophobicity of the primary sequence is closely related to the regular folding patterns in a protein's three-dimensional structure. We also applied the method to the transmembrane regions of acetylcholine receptor alpha subunit and Shaker potassium channel for which no atomic resolution structure is available. This work is an extension of our analysis of globular proteins by a similar method.

19.
Tao Y  Julian RR 《Biochemistry》2012,51(8):1796-1802
A simple mass spectrometry-based method capable of examining protein structure called SNAPP (selective noncovalent adduct protein probing) is used to evaluate the structural consequences of point mutations in naturally occurring sequence variants from different species. SNAPP monitors changes in the attachment of noncovalent adducts to proteins as a function of structural state. Mutations that lead to perturbations to the electrostatic surface structure of a protein affect noncovalent attachment and are easily observed with SNAPP. Mutations that do not alter the tertiary structure or electrostatic surface structure yield similar results by SNAPP. For example, bovine, porcine, and human insulin all have very similar backbone structures and no basic or acidic residue mutations, and the SNAPP distributions for all three proteins are very similar. In contrast, four variants of cytochrome c (cytc) have varying degrees of sequence homology, which are reflected in the observed SNAPP distributions. Bovine and pigeon cytc have several basic or acidic residue substitutions relative to horse cytc, but the SNAPP distributions for all three proteins are similar. This suggests that these mutations do not significantly influence the protein surface structure. On the other hand, yeast cytc has the least sequence homology and exhibits a unique, though related, SNAPP distribution. Even greater differences are observed for lysozyme. Hen and human lysozyme have identical tertiary structures but significant variations in the locations of numerous basic and acidic residues. The SNAPP distributions are quite distinct for the two forms of lysozyme, suggesting significant differences in the surface structures. In summary, SNAPP experiments are relatively easy to perform, require minimal sample consumption, and provide a facile route for comparison of protein surface structure between highly homologous proteins.

20.
A specific treatment of recurrent structural motifs that represent local bias information has been proven to be an important ingredient in de novo protein structure prediction. The significant majority of methods for local structure are based on building blocks, which still suffer from their inherently discrete nature. Instead of using building blocks, this work presents a new protocol framework for local structural motif prediction based on direct location along the protein sequence and probabilistic sampling in a continuous (φ, ψ) space. The protein sequence was first scanned by a sliding-window algorithm with a variable length of 7 to 19 residues, to match local segments to one of 82 motif patterns in the fragment library. Identified segments were then labeled and modeled as the correlations of backbone torsion angles with a mixture of bivariate cosine distributions in continuous (φ, ψ) space. 3D conformations of the corresponding segments were finally sampled by applying a backtrack algorithm to the hidden Markov model with a single output of (φ, ψ). For local motifs in the 50 proteins of the testing set, about 62% of eight-residue segments located with a high confidence value were predicted within 1.5 Å of their native structures by the method. The majority of local structural motifs were identified and sampled, which indicates that the proposed protocol may at least serve as a foundation for better protein tertiary structure prediction.
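The variable-length scan described in this abstract can be sketched as a generator over all candidate segments of 7 to 19 residues. Only the window enumeration is shown; how a segment is scored against the 82 motif patterns is not specified in the abstract, so no matching step is attempted, and the name `variable_windows` is hypothetical.

```python
def variable_windows(sequence, min_len=7, max_len=19):
    """Yield (start, segment) for every window length from min_len to max_len."""
    for length in range(min_len, max_len + 1):
        for start in range(len(sequence) - length + 1):
            yield start, sequence[start:start + length]
```

Each yielded segment would then be passed to whatever motif-matching function the fragment library provides, with the best-scoring pattern deciding the segment's label.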
