Similar Articles
20 similar articles found; search time 15 ms.
1.
A census of protein repeats.   Times cited: 20 (self-citations: 0; by others: 20)
In this study, we analyzed all known protein sequences for repeating amino acid segments. Duplicated sequence segments occur in 14% of all proteins, but eukaryotic proteins are three times more likely than prokaryotic proteins to have internal repeats. After clustering the repetitive sequence segments into families, we find that repeats from eukaryotic proteins have little similarity to prokaryotic repeats, suggesting that most repeats arose after the prokaryotic and eukaryotic lineages diverged. Consequently, the protein classes with the highest incidence of repetitive sequences perform functions unique to eukaryotes. The frequency distribution of the repeating units shows only weak length dependence, implicating recombination, rather than duplex melting or DNA hairpin formation, as the limiting mechanism underlying repeat formation. This mechanism favors additional repeats once an initial duplication has been incorporated. Finally, we show that repetitive sequences containing small and relatively water-soluble residues are favored. We propose that error-prone repeat expansion allows repetitive proteins to evolve more quickly than proteins without repeats.
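The abstract does not describe how repeating segments were detected. To make the idea of an internal repeat concrete, here is a naive sketch that finds the longest exact segment occurring twice without overlap in one sequence. This brute-force exact-match search is an illustrative stand-in, not the census's method. The function name `longest_internal_repeat` and the toy sequence are hypothetical.

```python
from __future__ import annotations


def longest_internal_repeat(seq: str, min_len: int = 3) -> tuple[str, int, int] | None:
    """Return (segment, first_start, later_start) for the longest exact segment that
    occurs at least twice in `seq` with non-overlapping copies, or None.

    Brute force (roughly cubic in len(seq)): fine for one protein, not for a database.
    """
    n = len(seq)
    # Try the longest lengths first, so the first hit found is the longest repeat.
    for length in range(n // 2, min_len - 1, -1):
        first_seen: dict[str, int] = {}
        for start in range(n - length + 1):
            segment = seq[start:start + length]
            if segment not in first_seen:
                first_seen[segment] = start
            elif start - first_seen[segment] >= length:  # the two copies do not overlap
                return segment, first_seen[segment], start
    return None


print(longest_internal_repeat("MKLVAAGKLVAAGE"))  # ('KLVAAG', 1, 7)
```

Real repeat surveys must also detect diverged copies, which requires alignment-based scoring rather than exact matching.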

2.
The pellicles of alveolates (ciliates, apicomplexans, and dinoflagellates) share a common organization, yet perform very divergent functions, including motility, host cell invasion, and armor. The alveolate pellicle consists of a system of flattened membrane sacs (alveoli, which are the defining feature of the group) below the plasma membrane, supported by a membrane skeleton as well as a network of microtubules and other filamentous elements. We recently showed that a family of proteins, the alveolins, is common and unique to this pellicular structure in alveolates. To identify additional proteins that contribute to this structure, we conducted a pellicle proteome study of the ciliate Tetrahymena thermophila. We found 1,173 proteins associated with this structure, 45% (529 proteins) of which were novel proteins without matches to other functionally characterized proteins. Expression of four newly identified T. thermophila pellicular proteins as green fluorescent protein (GFP) fusion constructs confirmed their pellicular location, and one new protein localized to the oral apparatus. Bioinformatic analysis revealed that 21% of the putative pellicular proteins, predominantly the novel ones, contained highly repetitive regions with strong amino acid biases for particular residues (K, E, Q, L, I, and V). When the T. thermophila novel proteins were compared with apicomplexan genomic data, 278 proteins with high sequence similarity were identified, suggesting that many of these putative pellicular components are shared among alveolates. Of these shared proteins, 126 contained the distinctive repeat regions. Localization of two such proteins in Toxoplasma gondii confirmed their role in the pellicle and, in doing so, identified two new proteins of the apicomplexan invasive structure: the apical complex.
Screening broadly for these repetitive domains in genomic data revealed large and actively evolving families of such proteins in alveolates, suggesting that these proteins might underpin the diversity and utility of their unique pellicular structure.
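The repeat regions described above are characterized by strong biases toward K, E, Q, L, I and V. As an illustration only, the sketch below flags sliding windows in which those residues make up at least a threshold fraction. The window size (20) and threshold (0.8) are arbitrary assumptions, not values from the study. `biased_windows` and the toy sequence are hypothetical.

```python
from __future__ import annotations

# Residues the abstract reports as over-represented in the pellicular repeat regions.
BIASED_RESIDUES = frozenset("KEQLIV")


def biased_windows(seq: str, window: int = 20, threshold: float = 0.8) -> list[tuple[int, int]]:
    """Merge all windows whose K/E/Q/L/I/V fraction is >= threshold into
    half-open (start, end) intervals. Intervals may include some flanking residues."""
    intervals: list[tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        fraction = sum(aa in BIASED_RESIDUES for aa in chunk) / window
        if fraction >= threshold:
            end = start + window
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], end)  # extend the previous hit
            else:
                intervals.append((start, end))
    return intervals


toy = "G" * 10 + "KEQLIV" * 5 + "G" * 10  # hypothetical 50-residue sequence
print(biased_windows(toy))  # [(6, 44)]
```

Because whole windows are merged, the reported interval extends a few residues past the biased core into the flanks. This is one reason real low-complexity filters refine boundaries afterwards.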

3.
The inappropriate genetic expansion of various repetitive DNA sequences underlies over 20 distinct inherited diseases. The genetic context of these repeats, in exons, introns and untranslated regions (UTRs), has played a major role in thinking about the mechanisms by which various repeat expansions might cause disease. Repeat expansions in exons are thought to give rise to expanded toxic protein repeats (e.g., polyQ). Repeat expansions in introns and UTRs (e.g., FXTAS) are thought to produce aberrant repeat-bearing RNAs that interact with and sequester a wide variety of essential proteins, resulting in cellular toxicity. However, a newly described phenomenon termed ‘repeat-associated non-AUG-dependent (RAN) translation’ offers a unifying picture: distinct repeat expansion-bearing RNAs might act as substrates for this noncanonical form of translation, leading to the production of a wide range of toxic proteins whose sequences are specified by the repeats. Here, we review how the model system Caenorhabditis elegans has been used to model many repeat disorders and discuss how RAN translation could be a previously unappreciated contributor to the toxicity associated with these different models.
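RAN translation does not depend on an AUG start codon, so a repeat can in principle be read in any of the three reading frames. The sketch below translates a (CAG)n repeat in each frame with the standard genetic code. It shows how one CAG expansion can encode polyglutamine, polyserine and polyalanine products. `translate_frames` is a hypothetical helper, and the example is illustrative rather than taken from the review.

```python
from __future__ import annotations

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frames(dna: str) -> list[str]:
    """Translate `dna` in frames 0, 1 and 2 without requiring a start codon.
    Stop codons appear as '*'; a trailing partial codon is ignored."""
    dna = dna.upper().replace("U", "T")
    return [
        "".join(CODON_TABLE[dna[p:p + 3]] for p in range(frame, len(dna) - 2, 3))
        for frame in range(3)
    ]


frames = translate_frames("CAG" * 10)
print(frames)  # ['QQQQQQQQQQ', 'SSSSSSSSS', 'AAAAAAAAA']
```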

4.
The loops which connect or flank helices/sheets in protein structures are known to be functionally important. Ironically, however, they are also the parts of a protein whose structure is least accurately predicted. Here, a new method to isolate and analyze loop regions in protein structure is proposed using the spatial coordinates of the solved three‐dimensional structure. The dispersion among the points of successive amino acid residues of a protein region in the Ramachandran map is used to calculate the Mean Separation between these points in the Ramachandran Plot (MSRP). Based on an analysis of 2935 protein secondary structure regions obtained with the DSSP software, spanning 2 to 64 residues and taken from a set of 170 proteins, it is shown that helices (MSRP < 17) and strands (MSRP < 64) are effectively demarcated from loop regions (MSRP > 130). Analysis of 43 DNA-binding and 98 ligand-binding proteins revealed several loop regions with a clear change in MSRP upon binding. The number of such loops correlated with the magnitude of backbone displacement in the protein upon binding. Can changes in MSRP quantify the temporal oscillations in dihedral angles among structured/unstructured regions in proteins? Molecular dynamics simulations (10 ns) revealed that deviations in MSRP among different snapshots in the trajectory were at least twofold higher for unstructured proteins than for ordered proteins. The above results validate the use of the MSRP parameter as a tool to identify and investigate functionally active loops and unstructured regions in protein structures. Proteins 2010. © 2009 Wiley‐Liss, Inc.
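The abstract defines MSRP only in words. Here is a minimal sketch that assumes MSRP is the mean Euclidean distance, in degrees, between the (φ, ψ) points of consecutive residues in a region. It also assumes angle differences are wrapped into [-180, 180) because the Ramachandran plot is periodic. Both the distance definition and the wrapping are assumptions, not confirmed by the source. The toy helical angles are made up.

```python
from __future__ import annotations

import math


def wrapped_diff(a: float, b: float) -> float:
    """Signed difference a - b between two angles in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def msrp(phi_psi: list[tuple[float, float]]) -> float:
    """Mean distance (degrees) between the Ramachandran points of consecutive residues.

    Assumed definition: the abstract does not give the exact formula.
    """
    if len(phi_psi) < 2:
        raise ValueError("need at least two residues")
    steps = [
        math.hypot(wrapped_diff(phi2, phi1), wrapped_diff(psi2, psi1))
        for (phi1, psi1), (phi2, psi2) in zip(phi_psi, phi_psi[1:])
    ]
    return sum(steps) / len(steps)


helix = [(-60.0, -45.0), (-63.0, -42.0), (-57.0, -47.0), (-62.0, -41.0)]  # toy helical angles
print(round(msrp(helix), 2))  # 6.62
```

Under this assumed definition, a tight helical cluster gives a small value, consistent in spirit with the helix cutoff quoted above. Without wrapping, residues near φ = ±180° would appear artificially far apart.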

5.
Local structural disorder imparts plasticity on linear motifs   Times cited: 5 (self-citations: 0; by others: 5)
MOTIVATION: The dynamic nature of protein interaction networks requires fast and transient molecular switches. The underlying recognition motifs (linear motifs, LMs) are usually short and evolutionarily variable segments, which in several cases, such as phosphorylation sites or SH3-binding regions, fall into locally disordered regions. We probed the generality of this phenomenon by predicting the intrinsic disorder of all LM-containing proteins listed in the Eukaryotic Linear Motif (ELM) database. RESULTS: We demonstrated that LMs are, on average, embedded in locally unstructured regions, while their amino acid composition and charge/hydropathy properties exhibit a mixture of features characteristic of both folded and disordered proteins. Overall, LMs are constructed by grafting a few specificity-determining residues that favor structural order onto a highly flexible carrier region. These results establish a connection between LMs and the molecular recognition elements of intrinsically unstructured proteins (IUPs), which realize a non-conventional mode of partner binding, mostly in regulatory functions.
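The core measurement here is whether motif instances fall inside predicted-disordered segments, and it can be sketched as follows. The regex `P..P` (a minimal PxxP SH3-binding core) and the per-residue disorder mask are hypothetical inputs. Real ELM patterns and a real disorder predictor's output would replace them. `motif_disorder_fraction` is a hypothetical helper.

```python
from __future__ import annotations

import re


def motif_disorder_fraction(seq: str, disordered: list[bool], pattern: str) -> float:
    """Fraction of regex motif matches that lie entirely in predicted-disordered residues.

    `disordered[i]` is True when residue i is predicted disordered. A zero-width
    lookahead is used so that overlapping motif instances are all counted.
    """
    if len(seq) != len(disordered):
        raise ValueError("sequence and disorder mask differ in length")
    spans = [(m.start(), m.start() + len(m.group(1)))
             for m in re.finditer(f"(?=({pattern}))", seq)]
    if not spans:
        return 0.0
    inside = sum(all(disordered[start:end]) for start, end in spans)
    return inside / len(spans)


seq = "PAAPGGGGPAAP"                   # hypothetical sequence
mask = [True] * 6 + [False] * 6        # hypothetical disorder prediction
print(motif_disorder_fraction(seq, mask, "P..P"))  # 0.5
```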

6.
Eukaryotic ribosomal RNA genes contain rapidly evolving regions of unknown function termed expansion segments. We present a comparative analysis of the primary and secondary structure of two expansion segments from the large-subunit rRNA gene of ten species of Drosophila and the tsetse fly Glossina morsitans morsitans. At the primary sequence level, most of the differences observed are single base substitutions. This is in marked contrast to observations in vertebrate species, in which the insertion or deletion of repetitive motifs, probably generated by a DNA-slippage mechanism, is a major factor in the evolution of these regions. The secondary structure of the two regions, supported by multiple compensatory base changes, is highly conserved among the species examined and supports the existence of a general folding pattern for all eukaryotes. Intriguingly, the evolutionary rate of expansion segments is very slow relative to other genic and non-genic regions of the Drosophila genome. These results suggest that the evolution of expansion segments in the rDNA multigene family is a balance between two forces. One is the homogenization of new mutations by unequal crossing over. The other is a combination of selection against some such mutations per se and selection for subsequent compensatory mutations that maintain a particular RNA secondary structure.
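"Supported by multiple compensatory base changes" refers to a standard comparative argument. If both partners of a predicted base pair differ between species yet can still pair, that is evidence the pair is real. Below is a minimal sketch over two aligned, gap-free sequences and a dot-bracket structure. The toy sequences are hypothetical, and counting G·U wobble pairs as valid is an assumption.

```python
from __future__ import annotations

VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairs_from_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Parse a dot-bracket string into sorted (i, j) base-pair indices."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unmatched '('")
    return sorted(pairs)


def compensatory_pairs(seq_a: str, seq_b: str, structure: str) -> list[tuple[int, int]]:
    """Pairs where both positions differ between the aligned sequences and both versions pair."""
    result = []
    for i, j in pairs_from_dot_bracket(structure):
        both_changed = seq_a[i] != seq_b[i] and seq_a[j] != seq_b[j]
        both_pair = (seq_a[i], seq_a[j]) in VALID_PAIRS and (seq_b[i], seq_b[j]) in VALID_PAIRS
        if both_changed and both_pair:
            result.append((i, j))
    return result


structure = "((((....))))"
species_a = "GCAUGAAAAUGC"  # hypothetical
species_b = "CCGUGAAAAUGG"  # G-C -> C-G at (0, 11); single change A -> G at 2 (G-U wobble)
print(compensatory_pairs(species_a, species_b, structure))  # [(0, 11)]
```

The single substitution at position 2 still pairs, but it is not counted because only one partner changed.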

7.
MOTIVATION: Partially and wholly unstructured proteins have now been identified in all kingdoms of life, and they are more common in eukaryotic organisms. This intrinsic disorder is related to certain critical functions. Apart from their fundamental interest, unstructured regions in proteins may prevent crystallization. Predicting disordered regions is therefore important for understanding protein function and may also help in designing genetic constructs. RESULTS: In this paper we present a computational tool for the detection of unstructured regions in proteins based on two properties of unfolded fragments: (1) disordered regions have a biased composition, and (2) they usually contain either small hydrophobic clusters or none at all. To quantify these two properties, we first calculate the amino acid distributions in structured and unstructured regions. Using these distributions, we calculate the probability that a given sequence fragment is part of a structured or an unstructured region. For each amino acid, the distance to the nearest hydrophobic cluster is also computed. Using these three values along a protein sequence allows us to predict unstructured regions with very simple rules. The method requires only the primary sequence and no multiple alignment, which makes it well suited to orphan proteins. AVAILABILITY: http://genomics.eu.org/
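The abstract names two ingredients: a composition-based likelihood and the distance to the nearest hydrophobic cluster. Both can be sketched as follows. The frequency tables are made-up placeholders; the real tool derives its distributions from structured and unstructured training regions. The hydrophobic residue set is an assumption, as is defining a "hydrophobic cluster" as a run of at least three hydrophobic residues. The log-odds score here stands in for the probability described in the abstract and is not the tool's exact formula.

```python
from __future__ import annotations

import math

HYDROPHOBIC = frozenset("AVILMFWYC")  # assumed hydrophobic residue set

# Placeholder residue frequencies for ordered vs disordered regions (made up for
# illustration; the real tool estimates them from training data).
ORDERED_FREQ = {"L": 0.10, "V": 0.08, "I": 0.07, "F": 0.05, "P": 0.04, "S": 0.05, "E": 0.06, "K": 0.05}
DISORDERED_FREQ = {"L": 0.05, "V": 0.04, "I": 0.03, "F": 0.02, "P": 0.09, "S": 0.10, "E": 0.10, "K": 0.09}
DEFAULT_FREQ = 0.04  # used for residues missing from either table


def log_odds_disordered(fragment: str) -> float:
    """Sum over residues of log(P_disordered / P_ordered); > 0 favours disorder."""
    return sum(
        math.log(DISORDERED_FREQ.get(aa, DEFAULT_FREQ) / ORDERED_FREQ.get(aa, DEFAULT_FREQ))
        for aa in fragment
    )


def distance_to_hydrophobic_cluster(seq: str, min_run: int = 3) -> list[float]:
    """Per-residue distance to the nearest run of >= min_run hydrophobic residues
    (a simplified 'cluster'); math.inf everywhere if no such run exists."""
    cluster_positions: list[int] = []
    run_start = None
    for i, aa in enumerate(seq + "*"):  # '*' sentinel closes a run at the C-terminus
        if aa in HYDROPHOBIC:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                cluster_positions.extend(range(run_start, i))
            run_start = None
    if not cluster_positions:
        return [math.inf] * len(seq)
    return [float(min(abs(i - c) for c in cluster_positions)) for i in range(len(seq))]


print(log_odds_disordered("PSEKPSEK") > 0, log_odds_disordered("LVIFLVIF") > 0)  # True False
print(distance_to_hydrophobic_cluster("SSLVISS"))  # [2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0]
```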

8.
GlobPlot: Exploring protein sequences for globularity and disorder   Times cited: 2 (self-citations: 0; by others: 2)
A major challenge in the proteomics and structural genomics era is to predict protein structure and function, including identifying proteins that are partially or wholly unstructured. Non-globular sequence segments often contain short linear peptide motifs (e.g. SH3-binding sites) that are important for protein function. We present here a new tool for discovering such unstructured, or disordered, regions within proteins. GlobPlot (http://globplot.embl.de) is a web service that allows the user to plot the tendency of the query protein toward order/globularity and disorder. We show examples with known proteins where it successfully identifies inter-domain segments containing linear motifs, as well as apparently ordered regions that do not contain any recognised domain. GlobPlot may be useful in domain-hunting efforts. The plots indicate that instances of known domains may often contain additional N- or C-terminal segments that appear ordered. Thus GlobPlot may be of use in designing constructs corresponding to globular proteins, as needed for many biochemical studies, particularly in structural biology. GlobPlot has a pipeline interface, GlobPipe, that allows advanced users to perform whole-proteome analysis. GlobPlot can also be used as a generic infrastructure package for graphically displaying any propensity.
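As we understand it, a GlobPlot-style curve is a running sum of a per-residue disorder propensity. Stretches where the curve climbs are disorder-prone, and stretches where it falls look globular. The sketch below assumes exactly that. The propensity values are hypothetical placeholders, not GlobPlot's published scale, and any smoothing the service applies is omitted. `cumulative_propensity` and `uphill_segments` are hypothetical helpers.

```python
from __future__ import annotations

from itertools import accumulate

# Hypothetical per-residue propensities (> 0 disorder-promoting, < 0 order-promoting).
# These are NOT GlobPlot's published values; unlisted residues count as 0.
PROPENSITY = {"P": 0.3, "S": 0.2, "E": 0.2, "K": 0.2, "Q": 0.2, "G": 0.1,
              "L": -0.3, "I": -0.3, "V": -0.3, "F": -0.3, "W": -0.3, "Y": -0.2}


def cumulative_propensity(seq: str) -> list[float]:
    """Running sum of per-residue propensities: the curve that would be plotted."""
    return list(accumulate(PROPENSITY.get(aa, 0.0) for aa in seq))


def uphill_segments(seq: str, min_len: int = 5) -> list[tuple[int, int]]:
    """Half-open (start, end) stretches of >= min_len residues over which the
    cumulative curve rises at every step, i.e. candidate disordered segments."""
    segments: list[tuple[int, int]] = []
    start = None
    for i, aa in enumerate(seq):
        if PROPENSITY.get(aa, 0.0) > 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(seq) - start >= min_len:
        segments.append((start, len(seq)))
    return segments


query = "LIVFL" + "PSEKPSEK" + "LIVFW"  # hypothetical query: ordered / linker / ordered
print(uphill_segments(query))  # [(5, 13)]
```

Requiring a rise at every single step is deliberately strict. A practical tool would smooth the curve first so that one order-promoting residue does not split a disordered linker.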

9.
Natively unstructured or disordered protein regions may increase the functional complexity of an organism; they are particularly abundant in eukaryotes and often evade structure determination. Many computational methods predict unstructured regions by training on outliers in otherwise well-ordered structures. Here, we introduce an approach that uses a neural network in a novel way. We hypothesize that very long contiguous segments with nonregular secondary structure (NORS regions) differ significantly from regular, well-structured loops, and that a method detecting such features could predict natively unstructured regions. Training our new method, NORSnet, on predicted information rather than on experimental data yielded three major advantages. It removed the overlap between testing and training, it systematically covered entire proteomes, and it explicitly focused on one particular aspect of unstructured regions with a simple structural interpretation, namely that they are loops. Our hypothesis was correct: well-structured and unstructured loops differ so substantially that NORSnet could distinguish them. Benchmarks on both previously used and new experimental data on unstructured regions revealed that NORSnet performed very well. Although it was not the best single prediction method, NORSnet was sufficiently accurate to flag unstructured regions in proteins that were previously not annotated. In one application, NORSnet revealed previously undetected unstructured regions in putative targets for structural genomics and may thereby contribute to increasing structural coverage of large eukaryotic families. NORSnet found unstructured regions at domain boundaries more often than expected at random. In another application, we estimated that 50%–70% of all worm proteins observed to have more than seven protein–protein interaction partners have unstructured regions.
A comparison of NORSnet and DISOPRED2 suggested that long unstructured loops are a major part of unstructured regions in molecular networks.
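NORSnet itself is a neural network. The feature it targets, a long stretch with little regular secondary structure, can however be stated directly on a secondary-structure string. Below is a minimal sketch that assumes a 3-state string (H = helix, E = strand, anything else = non-regular). It uses illustrative thresholds: a 70-residue window with at most 12% H/E. These thresholds are assumptions, not parameters given in the abstract, and `nors_regions` is a hypothetical helper.

```python
from __future__ import annotations


def nors_regions(ss: str, window: int = 70, max_regular: float = 0.12) -> list[tuple[int, int]]:
    """Merged half-open intervals covered by windows of length `window` whose
    fraction of regular residues ('H' or 'E') is at most `max_regular`."""
    regular = [1 if c in "HE" else 0 for c in ss]
    intervals: list[tuple[int, int]] = []
    if len(ss) < window:
        return intervals
    count = sum(regular[:window])  # regular residues in the first window
    for start in range(len(ss) - window + 1):
        if start > 0:
            # Slide by one: drop the residue that left, add the one that entered.
            count += regular[start + window - 1] - regular[start - 1]
        if count / window <= max_regular:
            end = start + window
            if intervals and start <= intervals[-1][1]:
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals


ss = "H" * 50 + "L" * 100 + "E" * 50  # hypothetical predicted secondary structure
print(nors_regions(ss))  # [(42, 158)]
```

The merged interval overshoots the 100-residue loop slightly because a window may contain up to eight regular residues at the default settings.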

10.
The II-III loop of the dihydropyridine receptor (DHPR) alpha(1s) subunit is a modulator of the ryanodine receptor (RyR1) Ca(2+) release channel in vitro and is essential for skeletal muscle contraction in vivo. Despite its importance, the structure of this loop has not been reported. We investigated its structure using a suite of NMR techniques, which revealed that the DHPR II-III loop is an intrinsically unstructured protein (IUP) and as such belongs to a burgeoning structural class of functionally important proteins. The loop does not possess a stable tertiary fold: it is highly flexible, with a strong N-terminal helix followed by nascent helical/turn elements and unstructured segments. Its residual structure is loosely globular, with the N and C termini in close proximity. The unstructured nature of the II-III loop may allow it to easily modify its interaction with RyR1 following a surface action potential and thus initiate rapid Ca(2+) release and contraction. The in vitro binding partner of the II-III loop was also investigated. The II-III loop interacts with the second of three structurally distinct SPRY domains in RyR1, whose function is unknown. This interaction occurs through two preformed N-terminal alpha-helical regions and a C-terminal hydrophobic element. The A peptide, corresponding to the helical N-terminal region, is a common probe of RyR function and binds to the same SPRY domain as the full II-III loop. Thus the second SPRY domain is an in vitro binding site for the II-III loop. The possible in vivo role of this region is discussed.

11.
Highly reiterated repetitive domains occur within the protein antigens of many parasitic taxa, including Plasmodium, Trypanosoma, Leishmania and Toxoplasma. In malaria it has been proposed that repeat regions may function as ligands for host proteins, or serve to suppress the development of immunity through a strategy of serological crossreactivity. In this article Louis Schofield presents a novel hypothesis, based on empirical evidence, that repetitive domains in antigens do not elicit protective immune responses and instead have evolved as a mechanism of immune evasion by their ability to induce thymus-independent B-cell activation. It is also proposed that this unusual response is associated with several forms of immunosuppression. The hypothesis has the added attraction of helping to explain several distinctive features of the molecular biology, evolution and immunology of repetitive regions in protein antigens of parasites.

12.
The preS1 of hepatitis B virus (HBV) is located at the outermost part of the envelope protein and possesses several functionally important regions, such as the hepatocyte receptor-binding site and virus-neutralizing epitopes. As a first step toward understanding the structure-function relationship of the preS1 antigen, we purified preS1 and characterized its structure by circular dichroism (CD) spectroscopy. PreS1 was purified to near homogeneity from a bacterially expressed glutathione S-transferase (GST)-preS1 fusion protein by a two-step procedure: affinity chromatography on a glutathione-agarose column followed by cation-exchange chromatography on a Mono S column. The CD analysis showed that the purified preS1, which was largely unstructured in aqueous solution, acquired significant (16%) alpha-helical structure when analyzed in 50% trifluoroethanol or 20 mM SDS. The results suggest that preS1 adopts a mainly unstructured conformation and may form induced secondary structure upon binding to target proteins or in a hydrophobic environment.

13.
Intrinsic disorder in cell-signaling and cancer-associated proteins   Times cited: 3 (self-citations: 0; by others: 3)
The number of intrinsically disordered proteins known to be involved in cell-signaling and regulation is growing rapidly. To test for a generalized involvement of intrinsic disorder in signaling and cancer, we applied a neural network predictor of natural disordered regions (PONDR VL-XT) to four protein datasets: human cancer-associated proteins (HCAP), signaling proteins (AfCS), eukaryotic proteins from SWISS-PROT (EU_SW) and non-homologous protein segments with well-defined (ordered) 3D structure (O_PDB_S25). PONDR VL-XT predicts ≥30 consecutive disordered residues for 79 (±5)%, 66 (±6)%, 47 (±4)% and 13 (±4)% of the proteins from HCAP, AfCS, EU_SW, and O_PDB_S25, respectively. This indicates significantly more intrinsic disorder in cancer-associated and signaling proteins than in the two control sets. The disorder analysis was extended to 11 additional functionally diverse categories of human proteins from SWISS-PROT. The proteins involved in metabolism, biosynthesis, and degradation, together with kinases, inhibitors, transport proteins, G-protein coupled receptors, and membrane proteins, are predicted to have at least twofold less disorder than regulatory, cancer-associated and cytoskeletal proteins. Whereas 44.5% of the proteins from representative non-membrane categories had sequence alignments with structures in the Protein Data Bank covering at least 75% of their lengths, only 17.3% of the cancer-associated proteins did. This relative lack of structural information correlated with the greater amount of predicted disorder in the HCAP dataset. A comparison of disorder predictions with the experimental structural data for a subset of the HCAP proteins indicated good agreement between prediction and observation. Our data suggest that intrinsically unstructured proteins play key roles in cell-signaling, regulation and cancer, where coupled folding and binding is a common mechanism.
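The headline statistic is the share of proteins with at least 30 consecutive predicted-disordered residues. It reduces to a longest-run computation over per-residue predictor scores. Here is a minimal sketch that assumes scores in [0, 1], with residues at or above 0.5 called disordered (a common cutoff, assumed here rather than taken from the abstract). The helper names and toy scores are hypothetical.

```python
from __future__ import annotations


def longest_disordered_run(scores: list[float], cutoff: float = 0.5) -> int:
    """Length of the longest run of consecutive residues with score >= cutoff."""
    best = current = 0
    for s in scores:
        current = current + 1 if s >= cutoff else 0
        best = max(best, current)
    return best


def fraction_with_long_disorder(dataset: list[list[float]], min_run: int = 30) -> float:
    """Fraction of proteins whose longest disordered run is at least `min_run` residues."""
    if not dataset:
        raise ValueError("empty dataset")
    hits = sum(longest_disordered_run(scores) >= min_run for scores in dataset)
    return hits / len(dataset)


toy_dataset = [
    [0.9] * 30 + [0.1] * 10,          # exactly 30 disordered in a row -> counts
    [0.9] * 29 + [0.1] + [0.9] * 29,  # broken into two runs of 29 -> does not count
    [0.2] * 100,                      # fully ordered
    [0.6] * 45,                       # 45 in a row -> counts
]
print(fraction_with_long_disorder(toy_dataset))  # 0.5
```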

14.
A variety of neurodegenerative disorders are associated with the expansion of trinucleotide repeat (TNR) sequences. These repetitive sequences are prone to adopting non-canonical structures, such as intrastrand stem-loop hairpins. Indeed, the formation and persistence of these hairpins during DNA replication and/or repair have been proposed as factors that facilitate TNR expansion. Given this proposed contribution of TNR hairpins to the expansion mechanism, disrupting such structures via strand invasion offers a potential means of preventing the disease-initiating expansion. In this work, we investigated the strand-invading ability of a (CTG)3 unstructured nucleic acid on a (CAG)10 TNR hairpin. Using fluorescence, optical, and electrophoretic methods, we observed instantaneous disruption of the (CAG)10 hairpin by (CTG)3 at low temperatures. Additionally, we identified three distinct duplex-like species that form between (CAG)10 and (CTG)3, in which 1, 2, or 3 (CTG)3 strands are hybridized to (CAG)10. The results presented here showcase (CTG)3 as an invader of a TNR hairpin and suggest that unstructured nucleic acids could serve as a scaffold for designing agents that prevent TNR expansion.
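The ceiling of three bound (CTG)3 strands follows from simple complementarity. Each 9-nt invader occupies 9 nt of the 30-nt (CAG)10 target, so at most three fit side by side. The sketch below counts candidate binding registers (overlapping allowed) and the maximum number of non-overlapping ones. It assumes perfect Watson-Crick pairing only and ignores hairpin competition and thermodynamics. The helper names are hypothetical.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def binding_sites(target: str, probe: str) -> list[int]:
    """Start positions in `target` where `probe` pairs antiparallel with no mismatch."""
    site = reverse_complement(probe)
    return [i for i in range(len(target) - len(site) + 1) if target[i:i + len(site)] == site]


def max_simultaneous_invaders(target: str, probe: str) -> int:
    """Greedy left-to-right choice of non-overlapping sites (optimal for equal-length sites)."""
    count, next_free = 0, 0
    for start in binding_sites(target, probe):
        if start >= next_free:
            count += 1
            next_free = start + len(probe)
    return count


cag10, ctg3 = "CAG" * 10, "CTG" * 3
print(len(binding_sites(cag10, ctg3)), max_simultaneous_invaders(cag10, ctg3))  # 8 3
```

The eight overlapping registers also hint at why several distinct duplex-like species can form. Each occupancy level can be realized in more than one register.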

15.
Repeat expansion diseases such as fragile X syndrome (FXS) result from increases in the size of a specific tandem repeat array. In addition to large expansions, small changes in repeat number and deletions are frequently seen in FXS pedigrees. No mouse model accurately recapitulates all aspects of this instability, particularly the occurrence of large expansions. This may be due to differences between mice and humans in cis- and/or trans-acting factors that affect repeat stability. The identification of such factors may help reveal the expansion mechanism and allow the development of suitable animal models for these disorders. We have examined the effect of age, dietary folate, and mutations in the Werner's syndrome helicase (WRN) and TRP53 genes on FXS repeat instability in mice. WRN facilitates replication of the FXS repeat and enhances Okazaki fragment processing, thereby reducing the incidence of processes that have been suggested to lead to expansion. p53 is a protein involved in DNA damage surveillance and repair. We find two types of repeat instability in these mice: small changes in repeat number, seen at frequencies approaching 100%, and large deletions, which occur at a frequency of about 10%. The frequency of these events was independent of WRN, p53, parental age, and folate levels. The large deletions occur at the same frequency in mice homozygous and heterozygous for the repeat, suggesting that they are not the result of an interallelic recombination event. In addition, no evidence of large expansions was seen. Our data thus show that the absence of repeat expansions in mice is not due to a more efficient WRN protein or p53-mediated error correction mechanism. They also suggest that these proteins, or the pathways in which they are active, may not be involved in expansion in humans either. Moreover, the fact that contractions occur in the absence of expansions suggests that these two processes occur by different mechanisms.

16.
17.
Dafforn TR, Smith CJ. EMBO reports, 2004, 5(11): 1046-1052
It is commonly assumed that a protein must adopt a tertiary structure to achieve its active native state and that regions of a protein that are devoid of alpha-helix or beta-sheet structures are functionally inert. Although extended proline-rich regions are recognized as presenting binding motifs to, for example, Src homology 2 (SH2) and SH3 domains, the idea persists that natively unfolded regions in functional proteins are simply 'spacers' between the folded domains. Such a view has been challenged in recent years and the importance of natively unfolded proteins in biology is now being recognized. In this review, we highlight the role of natively unfolded domains in the field of endocytosis, and show that some important endocytic proteins lack a traditionally folded structure and harbour important binding motifs in their unstructured linker regions.

18.
Many proteins, especially in eukaryotes, contain tandem repeats of several domains from the same family. These repeats have a variety of binding properties and are involved in protein–protein interactions as well as in binding to other ligands such as DNA and RNA. The rapid expansion of protein domain repeats is assumed to have occurred through internal tandem duplications. However, the exact mechanisms behind these tandem duplications are not well understood. Here, we have studied the evolution, function, protein structure, gene structure, and phylogenetic distribution of domain repeats. For this purpose, we assigned Pfam-A domain families to 24 proteomes, using more sensitive domain assignments in the repeat regions. These assignments confirmed previous findings that eukaryotes, and vertebrates in particular, contain a much higher fraction of proteins with repeats than prokaryotes. The internal sequence similarity within each protein revealed that domain repeats are often expanded through duplications of several domains at a time, while duplication of a single domain is less common. Many of the repeats appear to have been duplicated in the middle of the repeat region. This is in strong contrast to the evolution of other proteins, which mainly proceeds through the addition of single domains at either terminus. Further, we found that some domain families show distinct duplication patterns. For example, nebulin domains have mainly been expanded in units of seven domains at a time, while duplications in other domain families involve varying numbers of domains. Finally, no common mechanism for the expansion of all repeats could be detected. We found that the duplication patterns show no dependence on the size of the domains. Further, repeat expansion in some families can possibly be explained by exon shuffling. However, exon shuffling could not have created all repeats.

19.
We have performed a statistical analysis of unstructured amino acid residues in the protein structures available in the protein structure databank. We obtained data on the occurrence of disordered regions at the ends and in the middle of protein chains. The regions near the ends (fewer than 30 residues from the N- or C-terminus) contain 66% of unstructured residues (38% near the N-terminus and 28% near the C-terminus), although these terminal regions include only 23% of the amino acid residues. The frequencies of occurrence of unstructured residues have been calculated for each of the 20 residue types at different positions in the protein chain. The relative frequencies of occurrence of unstructured residues of the 20 types at the termini of protein chains differ from those in the middle of the chain. In other words, residues of the same type have different probabilities of being unstructured in the terminal regions and in the middle of the chain. The frequencies of occurrence of unstructured residues in the middle of the chain have been used as a scale for predicting disordered regions from amino acid sequence with FoldUnfold, a method we developed previously. This scale correlates at a level of 95% with the contact scale, which we previously developed and used for the same purpose. Testing the new scale on a database of 427 unstructured proteins and 559 completely structured proteins has shown that it can be successfully used to predict disordered regions in protein chains.
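The terminal-region statistics above can be computed from per-chain disorder masks, as sketched below. A residue counts as terminal if it lies fewer than 30 residues from either end. Residues close to both ends (in chains shorter than 60) are assigned to the nearer end, with ties going to the N-terminus; that tie-breaking rule is an assumption. By the abstract's own figures, 66% of unstructured residues sit in 23% of all residues, roughly a 2.9-fold enrichment. `terminal_disorder_stats` and the toy chain are hypothetical.

```python
from __future__ import annotations


def terminal_disorder_stats(chains: list[list[bool]], cutoff: int = 30) -> dict[str, float]:
    """Summarise where unstructured residues sit along the chains.

    Each chain is a list of booleans, True for an unstructured residue. Returns the
    shares of unstructured residues near the N- and C-termini and the share of all
    residues that lie in the terminal regions.
    """
    total = unstructured = terminal = near_n = near_c = 0
    for chain in chains:
        n = len(chain)
        for i, is_unstructured in enumerate(chain):
            d_n, d_c = i, n - 1 - i  # distances from the first and last residue
            is_terminal = min(d_n, d_c) < cutoff
            total += 1
            terminal += is_terminal
            if is_unstructured:
                unstructured += 1
                if is_terminal:
                    if d_n <= d_c:
                        near_n += 1
                    else:
                        near_c += 1
    if unstructured == 0:
        raise ValueError("no unstructured residues")
    return {
        "near_N": near_n / unstructured,
        "near_C": near_c / unstructured,
        "terminal_share_of_residues": terminal / total,
    }


toy_chain = [i < 10 or 45 <= i < 50 or i >= 95 for i in range(100)]  # hypothetical mask
print(terminal_disorder_stats([toy_chain]))
```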

20.
Polyglutamine (polyQ) repeat-containing proteins are widespread in the human proteome, but only nine of them are associated with highly incapacitating neurodegenerative disorders. The genetic expansion of the polyQ tract in disease-related proteins triggers a series of events resulting in neurodegeneration. The polyQ tract plays the leading role in the aggregation mechanism. Other elements, however, modulate aggregation propensity in the context of the full-length proteins, as implied by variations in the polyQ tract length required to trigger the onset of a given polyQ disease. Intrinsic features are emerging for several disease-related proteins, such as aggregation-prone regions (APRs) outside the polyQ segments and polyQ-flanking sequences, which synergistically participate in the aggregation process. The inherent polymorphic structure of polyQ stretches places polyQ proteins in a central position in protein–protein interaction networks, where interacting partners may additionally shield APRs or reshape the course of aggregation. Expansion of the polyQ tract perturbs cellular homeostasis and contributes to neuronal failure by modulating protein–protein interactions and enhancing toxic oligomerization. Post-translational modifications further regulate self-assembly in three ways: by directly altering the intrinsic aggregation propensity of polyQ proteins, by modulating their interaction with different macromolecules, or by modifying their removal by the cellular quality-control machinery. Here we review recent data on the multifaceted aggregation pathways of disease-related polyQ proteins, focusing on ataxin-3, the protein mutated in Machado-Joseph disease. A deeper mechanistic understanding of this network of events is crucial for developing effective therapies for polyQ diseases.
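Disease onset depends on polyQ tract length, so a basic step in analysing these proteins is locating the longest glutamine run. The minimal sketch below counts only uninterrupted Q runs. Whether tracts broken by other residues should be merged is left open, and `longest_polyq` and the toy sequence are hypothetical.

```python
from __future__ import annotations

import re


def longest_polyq(seq: str, min_len: int = 1) -> tuple[int, int] | None:
    """Return (start, length) of the longest uninterrupted glutamine run, or None
    if no run reaches `min_len`. Ties go to the first run."""
    best: tuple[int, int] | None = None
    for m in re.finditer(r"Q+", seq):
        length = m.end() - m.start()
        if length >= min_len and (best is None or length > best[1]):
            best = (m.start(), length)
    return best


toy = "MAQQAP" + "Q" * 12 + "PPK"  # hypothetical sequence with a short and a long Q run
print(longest_polyq(toy))  # (6, 12)
```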


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417