Similar literature (20 documents found)
1.
Many large proteins have evolved by internal duplication and fusion. For proteins with internal structural symmetry, this means that their sequences should be made up of identical repeats. However, many of these repeat signals can so far be seen only at the structural level. We propose a method of recurrent correlation analysis to detect the repeats of proteins directly from their sequences. The method showed that the internal repetitions of representative proteins from six folds of the mainly beta class could be identified directly at the sequence level.

2.
Stevens TJ  Paoli M 《Proteins》2008,70(2):378-387
The beta-propeller fold is a phylogenetically widespread, common protein architecture able to support a range of different functions such as catalysis, ligand binding and transport, regulation and protein binding. Interestingly, it appears that the beta-propeller topology is also compatible with strikingly diverse sequences. Amongst this diversity, there are three large groups of proteins with related sequences and very important cellular and intercellular regulatory functions: WD, kelch, and YWTD proteins. A characteristic common to these protein families is that their sequences, while distinct, all contain internal repeats 40-45 residues long. Through a pangenomic analysis using internal repeat profiles derived from the structurally known propeller modules of the eukaryotic protein RCC1 and the related prokaryotic protein BLIP-II, we have defined a new superfamily of propeller repeats, the RCC1-like repeats (RLRs). These sequences turn out to be more phylogenetically widespread than other large groups of propeller proteins, occurring in both prokaryotic and eukaryotic genomes. Interestingly, our research showed that RLR domains with different numbers of repeats exist, ranging from 3 to 7, and possibly more. A novel, intriguing finding is the discovery of sequences with 3 repeats, as well as proteins with 10 modular units, though in the latter case it is not clear whether these are made of two 5-bladed domains or a single, novel 10-bladed propeller. In addition, the results indicate that circular permutation events may have taken place in the evolution of these proteins. It is now established that the group of RLR proteins is extremely numerous and is characterized by unique, remarkable features which place it in a position of special interest as an important superfamily of proteins in nature.

3.
Internal repeats in protein sequences have wide-ranging implications for the structure and function of proteins. A careful analysis of the repeats in protein sequences may help us to better understand the structural organization of proteins and their evolutionary relations. In this paper, a mathematical method for searching for latent periodicity in protein sequences is developed. Using this method, we identified simple sequence repeats in the alkaline proteases and found that the sequences could show the same periodicity as their tertiary structures. This result may help to reduce difficulties in the study of the relationship between sequences and their structures.
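As a rough illustration of what searching for periodicity in a sequence can look like, the sketch below scores each candidate period by the fraction of positions that match the residue one period further along. This is a deliberately simple stand-in, not the mathematical method developed in the paper (the abstract does not describe it); the function names `self_match_fraction` and `best_period` and the toy sequence are hypothetical.

```python
def self_match_fraction(seq: str, lag: int) -> float:
    """Fraction of positions i where seq[i] equals seq[i + lag]."""
    n = len(seq) - lag
    if lag <= 0 or n <= 0:
        raise ValueError("lag must be between 1 and len(seq) - 1")
    matches = sum(1 for i in range(n) if seq[i] == seq[i + lag])
    return matches / n


def best_period(seq: str, max_lag: int) -> tuple[int, float]:
    """Return the (lag, score) pair with the highest self-match fraction.

    Ties are resolved in favour of the shortest lag.
    """
    upper = min(max_lag, len(seq) - 1)
    scores = {lag: self_match_fraction(seq, lag) for lag in range(1, upper + 1)}
    lag = max(scores, key=scores.get)  # first maximal lag wins on ties
    return lag, scores[lag]


# Toy sequence (hypothetical): a 5-residue unit repeated four times.
toy = "ACDEF" * 4
period, score = best_period(toy, max_lag=10)
```

Real protein repeats are usually far more diverged than this toy case, so exact-identity matching would typically be replaced by a substitution-matrix or profile-based score.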

4.
At least nine inherited neurodegenerative diseases, including Huntington's, are caused by poly(L-glutamine) (polyGln, polyQ) expansions > 35-40 repeats in widely or ubiquitously expressed proteins. Except for their expansions, these proteins have no sequence homologies, and their functions mostly remain unknown. Although each disease is characterized by a distinct pathology specific to a subset of neuronal cells, the formation of neuronal intranuclear aggregates containing protein with an expanded polyQ is the hallmark of, and a feature common to, most polyQ disorders. The neurodegeneration is thought to be caused by a toxic gain of function that occurs at the protein level and depends on the length of the expansion: longer repeats cause earlier age of onset and more severe symptoms. To address whether there is a structural difference between polyQ having < 40 versus > 40 residues, we undertook an X-ray fiber diffraction study of synthetic polyQ peptides having varying numbers of residues: Ac-Q8-NH2, D2Q15K2, K2Q28K2, and K2Q45K2. These particular lengths bracket both the range of normalcy (9-36 repeats) and the pathological (45 repeats), and therefore could be indicative of the structural changes expected in expanded polyQ domains. Contrary to expectations of different length-dependent morphologies, we accounted for all the X-ray patterns by slablike, beta-sheet structures, approximately 20 Å thick in the beta-chain direction, all having similar monoclinic lattices. Moreover, the slab thickness indicates that K2Q45K2, rather than forming a water-filled nanotube, must form multiple reverse turns.

5.
In recent years, a number of new protein structures that possess tandem repeats have emerged. Many of these proteins are composed of tandem arrays of β-hairpins. Today, the amount and variety of the data on these β-hairpin repeat (BHR) structures have reached a level that requires detailed analysis and further classification. In this paper, we classified the BHR proteins and compared their structures, repeat motif sequences, functions, and distribution across the major taxonomic kingdoms of life and within organisms. As a result, we identified six different BHR folds in tandem repeat proteins of Class III (elongated structures) and one BHR fold (up-and-down β-barrel) in Class IV (“closed” structures). Our survey reveals the high incidence of the BHR proteins among bacteria and viruses and their possible relationship to the structures of amyloid fibrils. It indicates that BHR folds will be an attractive target for future structural studies, especially in the context of age-related amyloidosis and emerging infectious diseases. This work allowed us to update the RepeatsDB database, which contains annotated tandem repeat protein structures, and to construct sequence profiles based on BHR structural alignments.

6.
Proteins that share even low sequence homologies are known to adopt similar folds. The beta-propeller structural motif is one such example. Identifying sequences that adopt a beta-propeller fold is useful to annotate protein structure and function. Often, tandem sequence repeats provide the necessary signal for identifying beta-propellers in proteins. In our recent analysis to identify cell surface proteins in archaeal and bacterial genomes, we identified some proteins that contain novel tandem repeats "LVIVD", "RIVW" and "LGxL". In this work, based on protein fold predictions and three-dimensional comparative modeling methods, we predicted that these repeat types fold as beta-propellers. Further, the evolutionary trace analysis of all proteins constituting amino acid sequence repeats in beta-propellers suggests that the novel repeats have diverged from a common ancestor.

7.
Comparative modeling methods can consistently produce reliable structural models for protein sequences with more than 25% sequence identity to proteins with known structure. However, there is a good chance that sequences with lower sequence identity also have their structural components represented in structural databases. To this end, we present a novel fragment-based method using sets of structurally similar local fragments of proteins. The approach differs from other fragment-based methods that use only single backbone fragments. Instead, we use a library of groups containing sets of sequence fragments with geometrically similar local structures and extract sequence-related properties to assign these specific geometrical conformations to target sequences. We test the ability of the approach to recognize correct SCOP folds for 273 sequences from the 49 most popular folds. 49% of these sequences have the correct fold as their top prediction, while 82% have the correct fold in one of the top five predictions. Moreover, the approach shows no performance reduction on a subset of sequence targets with less than 10% sequence identity to any protein used to build the library.
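The "top prediction" and "top five" figures above are top-k accuracies. A minimal sketch of how such a figure might be computed from ranked fold predictions follows; the function name `top_k_accuracy` and the toy fold labels are hypothetical and are not taken from the paper.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of targets whose true label is among the first k predictions.

    ranked_predictions: one list per target, best prediction first.
    true_labels: the correct label for each target, in the same order.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need exactly one ranked list per true label")
    if not true_labels:
        raise ValueError("no targets to score")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy data (hypothetical fold labels), four targets with three ranked guesses each.
ranked = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["c", "b", "a"],
    ["a", "c", "b"],
]
truth = ["a", "a", "a", "b"]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
top3 = top_k_accuracy(ranked, truth, k=3)
```

On this toy data the score rises as k grows, mirroring the gap between the top-1 and top-5 figures reported in the abstract.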

8.
The biologically active state of many proteins requires their prior homo-oligomerisation. Such complexes are typically symmetrical, a feature that has been proposed to increase their stability and facilitate the evolution of allosteric regulation. We wished to examine the possibility that similar structures and properties could arise from genetic amplifications leading to internal symmetrical repeats. For this, we identified internal structural repeats in a nonredundant Protein Data Bank subset. While testing if repeats in proteins tend to be symmetrical, we found that about half of the large internal repeats are symmetrical, most frequently around a rotation axis of 180°. These repeats were most likely created by genetic amplification processes because they show significant sequence similarity. Symmetrical repeats tend to have a fixed number of copies corresponding to their rotational symmetry order, that is, two for 180° rotation axis, whereas asymmetrical repeats are in longer proteins and show copy number variability. When possible, we confirmed that proteins with symmetrical repeats folding as an n-mer have homologues lacking the repeat with a higher oligomerisation number corresponding to the rotation symmetry order of the repeat. Phylogenetic analyses of these protein families suggest that typically, but not always, symmetrical repeats arise in one single event from proteins that are homo-oligomers. These results suggest that oligomerisation and amplification of internal sequences can interplay in evolutionary terms because they result in functional analogues when the latter exhibit rotational symmetry.

9.
Silva PJ 《Proteins》2008,70(4):1588-1594
Hydrophobic cluster analysis (HCA) has long been used as a tool to detect distant homologies between protein sequences, and to classify them into different folds. However, it relies on expert human intervention, and is sensitive to subjective interpretations of pattern similarities. In this study, we describe a novel algorithm to assess the similarity of hydrophobic amino acid distributions between two sequences. Our algorithm correctly identifies as misattributions several HCA-based proposals of structural similarity between unrelated proteins present in the literature. We have also used this method to identify the proper fold of a large variety of sequences, and to automatically select the most appropriate structure for homology modeling of several proteins with low sequence identity to any other member of the protein data bank. Automatic modeling of the target proteins based on these templates yielded structures with TM-scores (vs. experimental structures) above 0.60, even without further refinement. Besides enabling a reliable identification of the correct fold of an unknown sequence and the choice of suitable templates, our algorithm also shows that whereas most structural classes of proteins are very homogeneous in hydrophobic cluster composition, a tenth of the described families are compatible with a large variety of hydrophobic patterns. We have built a browsable database of every major representative hydrophobic cluster pattern present in each structural class of proteins, freely available at http://www2.ufp.pt/ pedros/HCA_db/index.htm.

10.
Gromiha MM  Suwa M 《Proteins》2006,63(4):1031-1037
Discriminating outer membrane proteins (OMPs) from other folding types of globular and membrane proteins is an important task both for identifying OMPs from genomic sequences and for the successful prediction of their secondary and tertiary structures. In this work, we have analyzed the performance of different methods, based on Bayes rules, logistic functions, neural networks, support vector machines, decision trees, etc. for discriminating OMPs. We found that most of the machine learning techniques discriminate OMPs with similar accuracy. The neural network-based method could discriminate the OMPs from other proteins [globular/transmembrane helical (TMH)] at the fivefold cross-validation accuracy of 91.0% in a dataset of 1,088 proteins. The accuracy of discriminating globular proteins is 88.8% and that of TMH proteins is 93.7%. Further, the neural network method is tested with globular proteins belonging to 30 different folding types and it could successfully exclude 95% of the considered proteins. The proteins with SAM domain such as knottins, rubredoxin, and thioredoxin folds are eliminated with 100% accuracy. These accuracy levels are comparable to or better than other methods in the literature. We suggest that this method could be effectively used to discriminate OMPs and for detecting OMPs in genomic sequences.

11.
Extended retro (reversed) peptide sequences have not previously been accommodated within functional proteins. Here, we show that the entire transmembrane portion of the beta-barrel of the pore-forming protein alpha-hemolysin can be formed by retrosequences comprising a total of 175 amino acid residues, 25 contributed by the central sequence of each subunit of the heptameric pore. The properties of wild-type and retro heptamers in planar bilayers are similar. The single-channel conductance of the retro pore is 15% less than that of the wild-type heptamer and its current-voltage relationship denotes close to ohmic behavior, while the wild-type pore is weakly rectifying. Both wild-type and retro pores are very weakly anion selective. These results and the examination of molecular models suggest that beta-barrels may be especially accepting of retro sequences compared to other protein folds. Indeed, the ability to form a retro domain could be diagnostic of a beta-barrel, explaining, for example, the activity of the retro forms of many membrane-permeabilizing peptides. By contrast with the wild-type subunits, monomeric retro subunits undergo premature assembly in the absence of membranes, most likely because the altered central sequence fails to interact with the remainder of the subunit, thereby initiating assembly. Despite this difficulty, a technique was devised for obtaining heteromeric pores containing both wild-type and retro subunits. Most probably as a consequence of unfavorable interstrand side-chain interactions, the heteromeric pores are less stable than either the wild-type or retro homoheptamers, as judged by the presence of subconductance states in single-channel recordings. Knowledge about the extraordinary plasticity of the transmembrane beta-barrel of alpha-hemolysin will be very useful in the de novo design of functional membrane proteins based on the beta-barrel motif.

12.
Bub3p is a protein that mediates the spindle checkpoint, a signaling pathway that ensures correct chromosome segregation in organisms ranging from yeast to mammals. It is known to function by co-localizing at least two other proteins, Mad3p and the protein kinase Bub1p, to the kinetochore of chromosomes that are not properly attached to mitotic spindles, ultimately resulting in cell cycle arrest. Prior sequence analysis suggested that Bub3p was composed of three or four WD repeats (also known as WD40 and beta-transducin repeats), short sequence motifs appearing in clusters of 4-16 found in many hundreds of eukaryotic proteins that fold into four-stranded blade-like sheets. We have determined the crystal structure of Bub3p from Saccharomyces cerevisiae at 1.1 Å and a crystallographic R-factor of 15.3%, revealing seven authentic repeats. In light of this, it appears that many of these repeats remain hidden in the sequences of other proteins. Analysis of random and site-directed mutants identifies the surface of Bub3p involved in checkpoint function through binding of Bub1p and Mad3p. Sequence alignments indicate that these surfaces are mostly conserved across Bub3 proteins from diverse species. A structural comparison with other proteins containing WD repeats suggests that these folds may bind partner proteins using similar surface areas on the top and sides of the propeller. The sequences composing these regions are the most divergent within the repeat across all WD repeat proteins and could potentially be modulated to provide specificity in partner protein binding without perturbation of the core structure.

13.
Hudáky I  Perczel A 《Proteins》2008,70(4):1389-1407
The prolylproline sequence unit is found in several naturally occurring linear and cyclic peptides with immunosuppressive and toxic activity. Furthermore, Pro-Pro units are abundant in collagen, in ligand motifs binding to SH3 or WW domains, as well as in vital enzymes such as DNA glycosylase and thrombin. In all these sequence units a special role is dedicated to conformation in order to successfully fulfill the appropriate biological function. Therefore, a detailed analysis of the basic conformational properties of Pro-Pro is expected to reveal the versatile structural role of this sequence. PCM (polarizable continuum model) calculations on the basis of ab initio and density functional theory investigations using the model peptide HCO-L-Pro-L-Pro-NH2 are presented. Cis-trans isomerism, backbone conformation and ring puckering are studied. A systematic comparison is made to experimental data gained on L-prolyl-L-proline sequence units retrieved from the Protein Data Bank as well as from the Cambridge Structural Database. PCM data are in good agreement with high-resolution X-ray crystallography. Population data derived from energy calculations and those gained directly from statistics predict that 87% of the Pro-Pro sequence units adopt elongated structures, while 13% form beta-turns. Both approaches prefer the same 6 out of the 36 ideally possible backbone folds. Polyproline II unit (t epsilonL t epsilonL), other elongated structures (c epsilonL t epsilonL, t epsilonL t alphaL and t epsilonL t gammaL), type VIa (t epsilonL c alphaL) and type I or III beta-turns (t alphaL t alphaL) altogether describe 96% of the prolylproline sequences. In disordered proteins or domains, Pro-Pro sequence units may sample the various conformers and contribute to the segmental motions.
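One common way to derive populations from computed conformer energies is Boltzmann weighting; the abstract does not say whether the authors used exactly this scheme, so the sketch below is an assumption. The function name `boltzmann_populations`, the conformer labels, and the energy values are all hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)


def boltzmann_populations(rel_energies_kcal, temperature_k=298.15):
    """Map conformer name -> fractional population from relative energies.

    rel_energies_kcal: dict of conformer name -> energy in kcal/mol.
    Energies are shifted by the minimum so the exponentials stay well scaled.
    """
    rt = R_KCAL * temperature_k
    lowest = min(rel_energies_kcal.values())
    weights = {
        name: math.exp(-(energy - lowest) / rt)
        for name, energy in rel_energies_kcal.items()
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical relative energies (kcal/mol) for three made-up conformers.
populations = boltzmann_populations(
    {"conf_A": 0.0, "conf_B": 0.5, "conf_C": 2.0}
)
```

Lower-energy conformers receive larger populations, and the fractions sum to one, which is what allows them to be compared against frequencies counted in structural databases.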

14.
Nucleoporins with phenylalanine-glycine repeats (FG Nups) function at the nuclear pore complex (NPC) to facilitate nucleocytoplasmic transport. In Saccharomyces cerevisiae, each FG Nup contains a large natively unfolded domain that is punctuated by FG repeats. These FG repeats are surrounded by hydrophilic amino acids (AAs) common to disordered protein domains. Here we show that the FG domain of Nups from human, fly, worm, and other yeast species is also enriched in these disorder-associated AAs, indicating that structural disorder is a conserved feature of FG Nups and likely serves an important role in NPC function. Despite the conservation of AA composition, FG Nup sequences from different species show extensive divergence. A comparison of the AA substitution rates of proteins with syntenic orthologs in four Saccharomyces species revealed that FG Nups have evolved at twice the rate of average yeast proteins with most substitutions occurring in sequences between FG repeats. The rapid evolution of FG Nups is poorly explained by parameters known to influence AA substitution rate, such as protein expression level, interactivity, and essentiality; instead their rapid evolution may reflect an intrinsic permissiveness of natively unfolded structures to AA substitutions. The overall lack of AA sequence conservation in FG Nups is sharply contrasted by discrete stretches of conserved sequences. These conserved sequences highlight known karyopherin and nucleoporin binding sites as well as other uncharacterized sites that may have important structural and functional properties.

15.
MOTIVATION: An estimated 25% of all eukaryotic proteins contain repeats, which underlines the importance of duplication for evolving new protein functions. Internal repeats often correspond to structural or functional units in proteins. Methods capable of identifying diverged repeated segments or domains at the sequence level can therefore assist in predicting domain structures, inferring hypotheses about function and mechanism, and investigating the evolution of proteins from smaller fragments. RESULTS: We present HHrepID, a method for the de novo identification of repeats in protein sequences. It is able to detect the sequence signature of structural repeats in many proteins that have not yet been known to possess internal sequence symmetry, such as outer membrane beta-barrels. HHrepID uses HMM-HMM comparison to exploit evolutionary information in the form of multiple sequence alignments of homologs. In contrast to a previous method, the new method (1) generates a multiple alignment of repeats; (2) utilizes the transitive nature of homology through a novel merging procedure with fully probabilistic treatment of alignments; (3) improves alignment quality through an algorithm that maximizes the expected accuracy; (4) is able to identify different kinds of repeats within complex architectures by a probabilistic domain boundary detection method and (5) improves sensitivity through a new approach to assess statistical significance. AVAILABILITY: Server: http://toolkit.tuebingen.mpg.de/hhrepid; Executables: ftp://ftp.tuebingen.mpg.de/pub/protevo/HHrepID

16.
We used sequence and structural comparisons to determine the fold for eukaryotic ornithine decarboxylase, which we found is related to alanine racemase. These enzymes have no detectable sequence identity with any protein of known structure, including three pyridoxal phosphate-utilizing enzymes. Our studies suggest that the N-terminal domain of ornithine decarboxylase folds into a beta/alpha-barrel. Through the analysis of known barrel structures we developed a topographic model of the pyridoxal phosphate-binding domain of ornithine decarboxylase, which predicts that the Schiff base lysine and a conserved glycine-rich sequence both map to the C-termini of the beta-strands. Other residues in this domain that are likely to have essential roles in catalysis, substrate, and cofactor binding were also identified, suggesting that this model will be a suitable guide to mutagenic analysis of the enzyme mechanism.

17.
The aromatic di-alanine repeat is a novel 12-amino acid-long motif constituting alternate small and large hydrophobic residues that mediate the close packing of alpha-helices. A hidden Markov model profile was constructed from the motifs initially described in soluble N-ethyl maleimide-sensitive factor attachment proteins (SNAP), a family of soluble proteins involved in intracellular membrane fusion. Scanning different sets of protein sequences showed unambiguously that this profile defines a structural motif independent of the tetratrico peptide repeat, another widespread alpha-helical motif. In addition to SNAP, aromatic di-alanine repeats are found in selective LIM homeodomain binding proteins (SLB) and in proteins from the Pyrococcus and Archaeoglobus prokaryotes.

18.
A census of protein repeats. Cited by: 20 (self-citations: 0; citations by others: 20)
In this study, we analyzed all known protein sequences for repeating amino acid segments. Although duplicated sequence segments occur in 14% of all proteins, eukaryotic proteins are three times more likely to have internal repeats than prokaryotic proteins. After clustering the repetitive sequence segments into families, we find repeats from eukaryotic proteins have little similarity with prokaryotic repeats, suggesting most repeats arose after the prokaryotic and eukaryotic lineages diverged. Consequently, protein classes with the highest incidence of repetitive sequences perform functions unique to eukaryotes. The frequency distribution of the repeating units shows only weak length dependence, implicating recombination rather than duplex melting or DNA hairpin formation as the limiting mechanism underlying repeat formation. The mechanism favors additional repeats once an initial duplication has been incorporated. Finally, we show that repetitive sequences containing small and relatively water-soluble residues are favored. We propose that error-prone repeat expansion allows repetitive proteins to evolve more quickly than non-repeat-containing proteins.

19.
A previous NMR investigation of model decapeptides with identical beta-strand sequences and different turn sequences demonstrated that, in these peptide systems, the turn residues played a more predominant role in defining the type of beta-hairpin adopted than cross-strand side-chain interactions. This result needed to be tested in longer beta-hairpin forming peptides, containing more potentially stabilizing cross-strand hydrogen bonds and side-chain interactions that might counterbalance the influence of the turn sequence. In that direction, we report here on the design and 1H NMR conformational study of three beta-hairpin forming pentadecapeptides. The design consists of adding two and three residues at the N- and C-termini, respectively, of the previously studied decapeptides. One of the designed pentadecapeptides includes a potentially stabilizing R-E salt bridge to investigate the influence of this interaction on beta-hairpin stability. We suggest that this peptide self-associates by forming intermolecular salt bridges. The other two pentadecapeptides behave as monomers. A conformational analysis of their 1H NMR spectra reveals that they adopt different types of beta-hairpin structure despite having identical strand sequences. Hence, the beta-turn sequence drives beta-hairpin formation in the investigated pentadecapeptides that adopt beta-hairpins that are longer than the average protein beta-hairpins. These results reinforce our previous suggestion concerning the key role played by the turn sequence in directing the kind of beta-hairpin formed by designed peptides.

20.
The dramatically increasing number of new protein sequences arising from genomics and proteomics creates a need for methods to rapidly and reliably infer the molecular and cellular functions of these proteins. One such approach, structural genomics, aims to delineate the total repertoire of protein folds in nature, thereby providing three-dimensional folding patterns for all proteins, and to infer molecular functions of the proteins based on the combined information of structures and sequences. The goal of obtaining protein structures on a genomic scale has motivated the development of high-throughput technologies and protocols for macromolecular structure determination that have begun to produce structures at a greater rate than previously possible. These new structures have revealed many unexpected functional inferences and evolutionary relationships that were hidden at the sequence level. Here, we present samples of structures determined at the Berkeley Structural Genomics Center and collaborators' laboratories to illustrate how structural information complements sequence information in inferring the functions of proteins with unknown molecular functions.
Two of the major premises of structural genomics are to discover a complete repertoire of protein folds in nature and to find molecular functions of the proteins whose functions are not predicted from sequence comparison alone. To achieve these objectives on a genomic scale, new methods, protocols, and technologies need to be developed by multi-institutional collaborations worldwide. As part of this effort, the Protein Structure Initiative has been launched in the United States (PSI; www.nigms.nih.gov/funding/psi.html).
Although infrastructure building and technology development are still the main focus of structural genomics programs [1–6], a considerable number of protein structures have already been produced, some of them coming directly out of semi-automated structure determination pipelines [6–10]. The Berkeley Structural Genomics Center (BSGC) has focused on the proteins of Mycoplasma or their homologues from other organisms as its structural genomics targets because of the minimal genome size of the Mycoplasmas as well as their relevance to human and animal pathogenicity (http://www.strgen.org). Here we present several protein examples encompassing a spectrum of functional inferences obtainable from their three-dimensional structures in five situations, where the inferences are new and testable, and are not predictable from protein sequence information alone.
