Similar literature
1.
Many protein regions have been shown to be intrinsically disordered, lacking unique structure under physiological conditions. These intrinsically disordered regions are not only very common in proteomes, but also crucial to the function of many proteins, especially those involved in signaling, recognition, and regulation. The goal of this work was to identify the prevalence, characteristics, and functions of conserved disordered regions within protein domains and families. A database was created to store the amino acid sequences of nearly one million proteins and their domain matches from the InterPro database, a resource integrating eight different protein family and domain databases. Disorder prediction was performed on these protein sequences. Regions of sequence corresponding to domains were aligned using a multiple sequence alignment tool. From this initial information, regions of conserved predicted disorder were found within the domains. The methodology for this search consisted of finding regions of consecutive positions in the multiple sequence alignments in which 90% or more of the sequences were predicted to be disordered. This procedure was constrained to find such regions of conserved disorder prediction that were at least 20 amino acids in length. The results of this work included 3,653 regions of conserved disorder prediction, found within 2,898 distinct InterPro entries. Most regions of conserved predicted disorder detected were short, with less than 10% of those found exceeding 30 residues in length.
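The search described in this abstract (runs of consecutive alignment columns where at least 90% of sequences are predicted disordered, with a minimum run length of 20) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function name `conserved_disorder_regions`, the per-residue label encoding (`D` for predicted disorder, `O` for predicted order, `-` for an alignment gap), and the choice to count gaps as "not disordered" are all hypothetical.

```python
from typing import List, Tuple


def conserved_disorder_regions(
    aligned_labels: List[str],
    min_fraction: float = 0.9,
    min_length: int = 20,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges of conserved predicted disorder.

    aligned_labels holds one string per aligned sequence, all the same length.
    Assumed encoding: 'D' = predicted disordered, anything else (including
    the gap character '-') counts as not disordered.
    """
    if not aligned_labels:
        return []
    width = len(aligned_labels[0])
    if any(len(row) != width for row in aligned_labels):
        raise ValueError("all aligned rows must have the same length")

    n_seqs = len(aligned_labels)
    regions: List[Tuple[int, int]] = []
    run_start = None  # first column of the current conserved run, if any

    for col in range(width):
        n_disordered = sum(1 for row in aligned_labels if row[col] == "D")
        conserved = n_disordered / n_seqs >= min_fraction
        if conserved and run_start is None:
            run_start = col
        elif not conserved and run_start is not None:
            # The run just ended; keep it only if it is long enough.
            if col - run_start >= min_length:
                regions.append((run_start, col))
            run_start = None

    # Close a run that extends to the last alignment column.
    if run_start is not None and width - run_start >= min_length:
        regions.append((run_start, width))
    return regions
```

With ten aligned sequences, nine of which are labeled disordered over the first 25 columns, the function reports a single region spanning columns 0 to 25 (end exclusive). How gaps should be treated is a real design choice that the abstract does not specify.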

2.
Leishmaniasis is a vector-borne disease caused by the protozoa Leishmania. We have analyzed and compared the sequences of three experimental exoproteomes of Leishmania promastigotes from different species to determine their specific features and to identify new candidate proteins involved in interactions of Leishmania with the host. The exoproteomes differ from the proteomes by a decrease in the average molecular weight per protein, in disordered amino acid residues and in basic proteins. The exoproteome of the visceral species is significantly enriched in sites predicted to be phosphorylated as well as in features frequently associated with molecular interactions (intrinsic disorder, number of disordered binding regions per protein, interaction and/or trafficking motifs) compared to the other species. The visceral species might thus have a larger interaction repertoire with the host than the other species. Less than 10% of the exoproteomes contain heparin-binding and RGD sequences, and ~30% the host targeting signal RXLXE/D/Q. These latter proteins might thus be exported inside the host cell during the intracellular stage of the infection. Furthermore, we have identified nine protein families conserved in the three exoproteomes with specific combinations of Pfam domains and selected eleven proteins containing at least three interaction and/or trafficking motifs including two splicing factors, phosphomannomutase, 2,3-bisphosphoglycerate-independent phosphoglycerate mutase, the paraflagellar rod protein-1D and a putative helicase. Their role in host–Leishmania interactions warrants further investigation but the putative ATP-dependent DEAD/H RNA helicase, which contains numerous interaction motifs, a host targeting signal and two disordered regions, is a very promising candidate.

3.
Domains are the main structural and functional units of larger proteins. They tend to be contiguous in primary structure and can fold and function independently. It has been observed that 10–20% of all encoded proteins contain duplicated domains and the average pairwise sequence identity between them is usually low. In the present study, we have analyzed the structural similarity between domain repeats of proteins with known structures available in the Protein Data Bank using structure-based inter-residue interaction measures such as the number of long-range contacts, surrounding hydrophobicity, and pairwise interaction energy. We used the RADAR program for detecting the repeats in a protein sequence, which were further validated using Pfam domain assignments. The sequence identity between the repeats in domains ranges from 20 to 40% and their secondary structural elements are well conserved. The number of long-range contacts, surrounding hydrophobicity calculations and pairwise interaction energy of the domain repeats clearly reveal the conservation of 3-D structure environment in the repeats of domains. The proportions of mainchain–mainchain hydrogen bonds and hydrophobic interactions are also highly conserved between the repeats. The present study has suggested that the computation of these structure-based parameters will give better clues about the tertiary environment of the repeats in domains. The folding rates of individual domains in the repeats predicted using the long-range order parameter indicate that the predicted folding rates correlate well with most of the experimentally observed folding rates for the analyzed independently folded domains.

4.
5.
Assembly of the cytoskeletal protein FtsZ into a ring‐like structure is required for bacterial cell division. Structurally, FtsZ consists of four domains: the globular N‐terminal core, a flexible linker, 8–9 conserved residues implicated in interactions with modulatory proteins, and a highly variable set of 4–10 residues at its very C terminus. Largely ignored and distinguished by lack of primary sequence conservation, the linker is presumed to be intrinsically disordered. Here we employ genetics, biochemistry and cytology to dissect the role of the linker in FtsZ function. Data from chimeric FtsZs substituting the native linker with sequences from unrelated FtsZs as well as a helical sequence from human beta‐catenin indicate that while variations in the primary sequence are well tolerated, an intrinsically disordered linker is essential for Bacillus subtilis FtsZ assembly. Linker lengths ranging from 25 to 100 residues supported FtsZ assembly, but replacing the B. subtilis FtsZ linker with a 249‐residue linker from Agrobacterium tumefaciens FtsZ interfered with cell division. Overall, our results support a model in which the linker acts as a flexible tether allowing FtsZ to associate with the membrane through a conserved C‐terminal domain while simultaneously interacting with itself and modulatory proteins in the cytoplasm.

6.
L. Jermutus, V. Guez, H. Bedouelle. Biochimie, 1999, 81(3):235-244
The C-terminal domain (residues 320-419) of tyrosyl-tRNA synthetase (TyrRS) from Bacillus stearothermophilus is disordered in the crystal structure and involved in the binding of the anticodon arm of tRNA(Tyr). The sequences of 11 TyrRSs of prokaryotic or mitochondrial origins were aligned and the alignment showed the existence of conserved residues in the sequences of the C-terminal domains. A consensus could be deduced from the application of five programs of secondary structure prediction to the 11 sequences of the query set. These results suggested that the sequences of the C-terminal domains determined a precise and conserved secondary structure. They predicted that the C-terminal domain would have a mixed fold (alpha/beta or alpha+beta), with the alpha-helices in the first half of the sequence and the beta-strands mainly in its second half. Several programs of fold recognition from sequence alone, by threading onto known structures, were applied but none of them identified a type of fold that would be common to the different sequences of the query set. Therefore, the fold of the C-terminal, anticodon binding domain might be novel.

7.
MOTIVATION: A large, high-quality database of homologous sequence alignments with good estimates of their corresponding phylogenetic trees will be a valuable resource to those studying phylogenetics. It will allow researchers to compare current and new models of sequence evolution across a large variety of sequences. The large quantity of data may provide inspiration for new models and methodology to study sequence evolution and may allow general statements about the relative effect of different molecular processes on evolution. RESULTS: The Pandit 7.6 database contains 4341 families of sequences derived from the seed alignments of the Pfam database of amino acid alignments of families of homologous protein domains (Bateman et al., 2002). Each family in Pandit includes an alignment of amino acid sequences that matches the corresponding Pfam family seed alignment, an alignment of DNA sequences that contain the coding sequence of the Pfam alignment when they can be recovered (overall, 82.9% of sequences taken from Pfam) and the alignment of amino acid sequences restricted to only those sequences for which a DNA sequence could be recovered. Each of the alignments has an estimate of the phylogenetic tree associated with it. The tree topologies were obtained using the neighbor joining method based on maximum likelihood estimates of the evolutionary distances, with branch lengths then calculated using a standard maximum likelihood approach.

8.
Multiple comparison or alignment of protein sequences has become a fundamental tool in many different domains in modern molecular biology, from evolutionary studies to prediction of 2D/3D structure, molecular function and inter-molecular interactions etc. By placing the sequence in the framework of the overall family, multiple alignments can be used to identify conserved features and to highlight differences or specificities. In this paper, we describe a comprehensive evaluation of many of the most popular methods for multiple sequence alignment (MSA), based on a new benchmark test set. The benchmark is designed to represent typical problems encountered when aligning the large protein sequence sets that result from today's high throughput biotechnologies. We show that alignment methods have significantly progressed and can now identify most of the shared sequence features that determine the broad molecular function(s) of a protein family, even for divergent sequences. However, we have identified a number of important challenges. First, the locally conserved regions, that reflect functional specificities or that modulate a protein's function in a given cellular context, are less well aligned. Second, motifs in natively disordered regions are often misaligned. Third, the badly predicted or fragmentary protein sequences, which make up a large proportion of today's databases, lead to a significant number of alignment errors. Based on this study, we demonstrate that the existing MSA methods can be exploited in combination to improve alignment accuracy, although novel approaches will still be needed to fully explore the most difficult regions. We then propose knowledge-enabled, dynamic solutions that will hopefully pave the way to enhanced alignment construction and exploitation in future evolutionary systems biology studies.

9.
A phosphoprotein (P) is found in all viruses of the Mononegavirales order. These proteins form homo-oligomers, fulfil similar roles in the replication cycles of the various viruses, but differ in their length and oligomerization state. Sequence alignments reveal no sequence similarity among proteins from viruses belonging to the same family. Sequence analysis and experimental data show that phosphoproteins from viruses of the Paramyxoviridae contain structured domains alternating with intrinsically disordered regions. Here, we used predictions of disorder and of secondary structure, and an analysis of sequence conservation, to predict the domain organization of the phosphoprotein from Sendai virus, vesicular stomatitis virus (VSV) and rabies virus (RV P). We devised a new procedure for combining the results from multiple prediction methods and locating the boundaries between disordered regions and structured domains. To validate the proposed modular organization predicted for RV P and to confirm that the putative structured domains correspond to autonomous folding units, we used two-hybrid and biochemical approaches to characterize the properties of several fragments of RV P. We found that both central and C-terminal domains can fold in isolation, that the central domain is the oligomerization domain, and that the C-terminal domain binds to nucleocapsids. Our results suggest a conserved organization of P proteins in the Rhabdoviridae family in concatenated functional domains resembling that of the P proteins in the Paramyxoviridae family.

10.
Classifications of proteins into groups of related sequences are in some respects like a periodic table for biology, allowing us to understand the underlying molecular biology of any organism. Pfam is a large collection of protein domains and families. Its scientific goal is to provide a complete and accurate classification of protein families and domains. The next release of the database will contain over 10,000 entries, which leads us to reflect on how far we are from completing this work. Currently Pfam matches 72% of known protein sequences, but for proteins with known structure Pfam matches 95%, which we believe represents the likely upper bound. Based on our analysis a further 28,000 families would be required to achieve this level of coverage for the current sequence database. We also show that as more sequences are added to the sequence databases the fraction of sequences that Pfam matches is reduced, suggesting that continued addition of new families is essential to maintain its relevance.

11.
The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain–motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information.

12.
We previously studied a 16‐amino acid‐residue fragment of the C‐terminal β‐hairpin of the B3 domain (residues 46–61), [IG(46–61)] of the immunoglobulin binding protein G from Streptococcus, and found that hydrophobic interactions and the turn region play an important role in stabilizing the structure. Based on these results, we carried out systematic structural studies of peptides derived from the sequence of IG(46–61) by systematically shortening the peptide by one residue at a time from both the C‐ and the N‐terminus. To determine the structure and stability of two resulting 12‐ and 14‐amino acid‐residue peptides, IG(48–59) and IG(47–60), respectively, we carried out circular dichroism, NMR, and calorimetric studies of these peptides in pure water. Our results show that IG(48–59) possesses organized three‐dimensional structure stabilized by hydrophobic interactions (Tyr50–Phe57 and Trp48–Val59) at T = 283 and 305 K. At T = 313 K, the structure breaks down because of increased chain entropy, but the turn region is preserved in the same position observed for the structure of the whole protein. The breakdown of structure occurs near the melting temperature of this peptide (Tm = 310 K) measured by differential scanning calorimetry (DSC). The melting temperature of IG(47–60) determined by DSC is Tm = 330 K and its structure is similar to that of the native β‐hairpin at all (lower) temperatures examined (283–313 K). Both of these truncated sequences are conserved in all known amino acid sequences of the B domains of the immunoglobulin binding protein G from bacteria. Thus, this study contributes to an understanding of the mechanism of folding of this whole family of proteins, and provides information about the mechanism of formation and stabilization of a β‐hairpin structural element. Proteins 2009. © 2008 Wiley‐Liss, Inc.

13.
We analyzed the mouse forebrain cytosolic phosphoproteome using sequential (protein and peptide) IMAC purifications, enzymatic dephosphorylation, and targeted tandem mass spectrometry analysis strategies. In total, using complementary phosphoenrichment and LC-MS/MS strategies, 512 phosphorylation sites on 540 non-redundant phosphopeptides from 162 cytosolic phosphoproteins were characterized. Analysis of protein domains and amino acid sequence composition of this data set of cytosolic phosphoproteins revealed that it is significantly enriched in intrinsic sequence disorder, and this enrichment is associated with both cellular location and phosphorylation status. The majority of phosphorylation sites found by MS were located outside of structural protein domains (97%) but were mostly located in regions of intrinsic sequence disorder (86%). 368 phosphorylation sites were located in long regions of disorder (over 40 amino acids long), and 94% of proteins contained at least one such long region of disorder. In addition, we found that 58 phosphorylation sites in this data set occur in 14-3-3 binding consensus motifs, linear motifs that are associated with unstructured regions in proteins. These results demonstrate that in this data set protein phosphorylation is significantly depleted in protein domains and significantly enriched in disordered protein sequences and that enrichment of intrinsic sequence disorder may be a common feature of phosphoproteomes. This supports the hypothesis that disordered regions in proteins allow kinases, phosphatases, and phosphorylation-dependent binding proteins to gain access to target sequences to regulate local protein conformation and activity.

14.
Regions of conserved disorder prediction (CDP) were found in protein domains from all available InterPro member databases, although with varying frequency. These CDP regions were found in proteins from all kingdoms of life, including viruses. However, eukaryotes had one order of magnitude more proteins containing long disordered regions than did archaea and bacteria. Sequence conservation in CDP regions varied, but was on average slightly lower than in regions of conserved order. In some cases, disordered regions evolve faster than ordered regions, in others they evolve slower, and in the rest they evolve at roughly the same rate. A variety of functions were found to be associated with domains containing conserved disorder. The most common were DNA/RNA binding and protein binding. Many ribosomal proteins also were found to contain conserved disordered regions. Other functions identified included membrane translocation and amino acid storage for germination. Due to limitations of current knowledge as well as the methodology used for this work, it was not determined whether these functions were directly associated with the predicted disordered region. However, the functions associated with conserved disorder in this work are in agreement with the functions found in other studies to correlate to disordered regions. We have established that intrinsic disorder may be more common in bacterial and archaeal proteins than previously thought, but this disorder is likely to be used for different purposes than in eukaryotic proteins, as well as occurring in shorter stretches of protein. Regions of predicted disorder were found to be conserved within a large number of protein families and domains. Although many think of such conserved domains as being ordered, in fact a significant number of them contain regions of disorder that are likely to be crucial to their functions.

15.
Proteins are often classified in a binary fashion as either structured or disordered. However, this approach has several deficits. Firstly, protein folding is always conditional on the physicochemical environment. A protein which is structured in some circumstances will be disordered in others. Secondly, it hides a fundamental asymmetry in behavior. While all structured proteins can be unfolded through a change in environment, not all disordered proteins have the capacity for folding. Failure to accommodate these complexities confuses the definition of both protein structural domains and intrinsically disordered regions. We illustrate these points with an experimental study of a family of small binding domains, drawn from the RNA polymerase of mumps virus and its closest relatives. Assessed at face value the domains fall on a structural continuum, with folded, partially folded, and near unstructured members. Yet the disorder present in the family is conditional, and these closely related polypeptides can access the same folded state under appropriate conditions. Any heuristic definition of the protein domain emphasizing conformational stability divides this domain family in two, in a way that makes no biological sense. Structural domains would be better defined by their ability to adopt a specific tertiary structure: a structure that may or may not be realized, dependent on the circumstances. This explicitly allows for the conditional nature of protein folding, and more clearly demarcates structural domains from intrinsically disordered regions that may function without folding.

16.
Pfam contains multiple alignments and hidden Markov model based profiles (HMM-profiles) of complete protein domains. The definition of domain boundaries, family members and alignment is done semi-automatically based on expert knowledge, sequence similarity, other protein family databases and the ability of HMM-profiles to correctly identify and align the members. Release 2.0 of Pfam contains 527 manually verified families which are available for browsing and on-line searching via the World Wide Web in the UK at http://www.sanger.ac.uk/Pfam/ and in the US at http://genome.wustl.edu/Pfam/. Pfam 2.0 matches one or more domains in 50% of Swissprot-34 sequences, and 25% of a large sample of predicted proteins from the Caenorhabditis elegans genome.

17.
Structural genomics projects require strategies for rapidly recognizing protein sequences appropriate for routine structure determination. For large proteins, this strategy includes the dissection of proteins into structural domains that form stable native structures. However, protein dissection essentially remains an empirical and often a tedious process. Here, we describe a simple strategy for rapidly identifying structural domains and assessing their structures. This approach combines the computational prediction of sequence regions corresponding to putative domains with an experimental assessment of their structures and stabilities by NMR and biochemical methods. We tested this approach with nine putative domains predicted from a set of 108 Thermus thermophilus HB8 sequences using PASS, a domain prediction program we previously reported. To facilitate the experimental assessment of the domain structures, we developed a generic 6-hour His-tag-based purification protocol, which enables the sample quality evaluation of a putative structural domain in a single day. As a result, we observed that half of the predicted structural domains were indeed natively folded, as judged by their HSQC spectra. Furthermore, two of the natively folded domains were novel, without related sequences classified in the Pfam and SMART databases, which is a significant result with regard to the ability of structural genomics projects to uniformly cover the protein fold space.

18.
Intrinsic disorder in the Protein Data Bank
The Protein Data Bank (PDB) is the preeminent source of protein structural information. PDB contains over 32,500 experimentally determined 3-D structures solved using X-ray crystallography or nuclear magnetic resonance spectroscopy. Intrinsically disordered regions fail to form a fixed 3-D structure under physiological conditions. In this study, we compare the amino-acid sequences of proteins whose structures are determined by X-ray crystallography with the corresponding sequences from the Swiss-Prot database. The analyzed dataset includes 16,370 structures, which represent 18,101 PDB chains and 5,434 different proteins from 910 different organisms (2,793 eukaryotic, 2,109 bacterial, 288 viral, and 244 archaeal). In this dataset, on average, each Swiss-Prot protein is represented by 7 PDB chains with 76% of the crystallized regions being represented by more than one structure. Intriguingly, the complete sequences of only approximately 7% of proteins are observed in the corresponding PDB structures, and only approximately 25% of the total dataset have >95% of their lengths observed in the corresponding PDB structures. This suggests that the vast majority of PDB proteins are shorter than their corresponding Swiss-Prot sequences and/or contain numerous residues that are not observed in maps of electron density. To determine the prevalence of disordered regions in PDB, the residues in the Swiss-Prot sequences were grouped into four general categories, "Observed" (which correspond to structured regions), "Not observed" (regions with missing electron density, potentially disordered), "Uncharacterized," and "Ambiguous," depending on their appearance in the corresponding PDB entries. This non-redundant set of residues can be viewed as a 'fragment' or empirical domain database that contains a set of experimentally determined structured regions or domains and a set of experimentally verified disordered regions or domains.
We studied the propensities and properties of residues in these four categories and analyzed their relations to the predictions of disorder using several algorithms. "Not observed," "Ambiguous," and "Uncharacterized" regions were shown to possess the amino acid compositional biases typical of intrinsically disordered proteins. The application of four different disorder predictors (PONDR® VL-XT, VL3-BA, VSL1P, and IUPred) revealed that the vast majority of residues in the "Observed" dataset are ordered, and that the "Not observed" regions are mostly disordered. The "Uncharacterized" regions possess some tendency toward order, whereas the predictions for the short "Ambiguous" regions are really ambiguous. Long "Ambiguous" regions (>70 amino acid residues) are mostly predicted to be ordered, suggesting that they are likely to be "wobbly" domains. Overall, we showed that completely ordered proteins are not highly abundant in PDB and many PDB sequences have disordered regions. In fact, in the analyzed dataset approximately 10% of the PDB proteins contain regions of consecutive missing or ambiguous residues longer than 30 amino acids and approximately 40% of the proteins possess short regions (≥10 and <30 amino acids long) of missing and ambiguous residues.
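The length bins quoted at the end of this abstract (runs of consecutive missing or ambiguous residues longer than 30, and short runs of ≥10 and <30) can be illustrated with a small sketch. Everything here is an assumption made for illustration, not the authors' pipeline: the function name `classify_missing_regions`, the one-letter status encoding (`O` observed, `N` not observed, `A` ambiguous, `U` uncharacterized), and the `other` bin, which collects runs shorter than 10 as well as runs of exactly 30 residues, a length that the stated thresholds leave unassigned.

```python
from itertools import groupby
from typing import Dict


def classify_missing_regions(status: str, missing: str = "NA") -> Dict[str, int]:
    """Count runs of consecutive missing/ambiguous residues by length bin.

    status is a per-residue string using an assumed encoding:
    'O' observed, 'N' not observed, 'A' ambiguous, 'U' uncharacterized.
    A run is a maximal stretch of residues whose symbol is in `missing`.
    """
    counts = {"long": 0, "short": 0, "other": 0}
    for is_missing, group in groupby(status, key=lambda s: s in missing):
        if not is_missing:
            continue
        length = sum(1 for _ in group)
        if length > 30:
            counts["long"] += 1       # longer than 30 residues
        elif 10 <= length < 30:
            counts["short"] += 1      # at least 10 and fewer than 30
        else:
            counts["other"] += 1      # shorter than 10, or exactly 30
    return counts
```

A chain would then count toward the "long" group if `counts["long"] > 0`, and toward the "short" group if `counts["short"] > 0`.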

20.