Similar articles
20 similar articles found (search time: 353 ms)
1.
Twenty-seven protein sequence elements, six to nine amino acids long, were extracted from 15 phylogenetically diverse complete prokaryotic proteomes. The elements are present in all of these proteomes, with at least one copy each (omnipresent elements), and have presumably been conserved since the last universal common ancestor (LUCA). All these omnipresent elements are identified in crystallized protein structures as parts of highly conserved closed loops, 25–30 residues long, thus representing the closed-loop modules discovered in 2000 by Berezovsky et al. The omnipresent peptides make up seven distinct groups, of which the largest groups, Aleph and Beth, contain 18 and four elements, respectively, which are related but different, while five other groups are represented by only one element each. The LUCA modules appear with one or several copies per protein molecule in a variety of combinations depending on the functional identity of the corresponding protein. The functional involvement of individual LUCA modules is outlined on the basis of known protein annotations. Analyses of all the related sequences in a large, formatted protein sequence space suggest that many, if not all, of the 27 omnipresent elements have a common sequence origin. This sequence space network analysis may lead to elucidation of the earliest stages of protein evolution.

2.
Elucidating protein function from its structure is central to the understanding of cellular mechanisms. This involves deciphering the dependence of local structural motifs on sequence. These structural motifs may be stabilized by direct or water‐mediated hydrogen bonding among the constituent residues. π‐Turns, defined by interactions between (i) and (i + 5) positions, are large enough to contain a central space that can embed a water molecule (or a protein moiety) to form a stable structure. This work is an analysis of such embedded π‐turns using a nonredundant dataset of protein structures. A total of 2965 embedded π‐turns have been identified, along with 281 embedded Schellman motifs, a type of π‐turn that occurs at the C‐termini of α‐helices. Embedded π‐turns and Schellman motifs have been classified on the basis of the protein atoms of the terminal turn residues that are linked by the embedded moiety, conformation, and residue composition, and compared with the turns that have terminal residues connected by direct hydrogen bonds. Geometrically, the turns have been fitted to a circle and the position of the linker relative to its center analyzed. The hydroxyl group of Ser and Thr, located at the (i + 3) position, is the most prominent linker for the side‐chain mediated π‐turns. Consideration of residue conservation among homologous sequences indicates the terminal and the linker positions to be the most conserved. The embedded π‐turn as a binding site (for the linker) is discussed in the context of the “nest,” a concave depression that is formed in protein structures with adjacent residues having enantiomeric main‐chain conformations. © 2013 Wiley Periodicals, Inc. Biopolymers 101: 441–453, 2014.

3.
La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5′TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5′ UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5′TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. A putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. These studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis.

4.
Employing whole-genome analysis we have characterized a large family of genes coding for calpain-related proteins in three kinetoplastid parasites. We have defined a total of 18 calpain-like sequences in Trypanosoma brucei, 27 in Leishmania major, and 24 in Trypanosoma cruzi. Sequence characterization revealed a well-conserved protease domain in most proteins, although residues critical for catalytic activity were frequently altered. Many of the proteins contain a novel N-terminal sequence motif unique to kinetoplastids. Furthermore, 24 of the sequences contain N-terminal fatty acid acylation motifs indicating association of these proteins with intracellular membranes. This extended family of proteins also includes a group of sequences that completely lack a protease domain but are specifically related to other kinetoplastid calpain-related proteins by a highly conserved N-terminal domain and by genomic organization. All sequences lack the C-terminal calmodulin-related calcium-binding domain typical of most mammalian calpains. Our analysis emphasizes the highly modular structure of calpains and calpain-like proteins, suggesting that they are involved in diverse cellular functions. The discovery of this surprisingly large family of calpain-like proteins in lower eukaryotes that combines novel and conserved sequence modules contributes to our understanding of the evolution of this abundant protein family. Electronic Supplementary Material is available for this article at and accessible for authorised users. [Reviewing Editor: Dr. John Oakeshott]

5.
It has recently been discovered that globular proteins are universally built from standard loop-n-lock units of about 30 amino acid residues. A hypothesis has been put forward of a loop stage in protein evolution, when the units were autonomous. Later they joined together, making longer chains. One would expect that the early individual loop-n-lock elements might still be detected in modern protein sequences as remnants of the hypothetical 30-residue sequence prototypes. Among several strong sequence motifs, extracted from protein sequences of 23 complete bacterial proteomes, one 32-residue prototype was studied here in detail. Numerous sequence segments related to the prototype are identified in the crystal structures of proteins of a PDB_SELECT database. Analysis of the respective chain trajectories for the cases with different degrees of sequence conservation confirms that the majority of the segments correspond to closed loops. In the evolutionary diversification of the prototypes the secondary structure yields first, while the sequence is still moderately conserved. The last feature to go is the chain return property. Apparently, the opening of the loops would severely destabilize the protein fold, which explains their conservation.

6.
Cobalamin-dependent methionine synthase is a large enzyme composed of structurally and functionally distinct regions. Recent studies have begun to define the roles of several regions of the protein. In particular, the structure of a 27 kDa cobalamin-binding fragment of the enzyme from Escherichia coli has been determined by X-ray crystallography, and has revealed the motifs and interactions responsible for recognition of the cofactor. The amino acid sequences of several adenosylcobalamin-dependent enzymes, the methylmalonyl coenzyme A mutases and glutamate mutases, show homology with the cobalamin-binding region of methionine synthase and retain conserved residues that are determinants for the binding of the prosthetic group, suggesting that these mutases and methionine synthase share common three-dimensional structures.

7.
MOTIVATION: RNA structure motifs contained in mRNAs have been found to play important roles in regulating gene expression. However, identification of novel RNA regulatory motifs using computational methods has not been widely explored. Effective tools for predicting novel RNA regulatory motifs based on genomic sequences are needed. RESULTS: We present a new method for predicting common RNA secondary structure motifs in a set of functionally or evolutionarily related RNA sequences. This method is based on comparison of stems (palindromic helices) between sequences and is implemented by applying graph-theoretical approaches. It first finds all possible stable stems in each sequence and compares stems pairwise between sequences by some defined features to find stems conserved across any two sequences. Then by applying a maximum clique finding algorithm, it finds all significant stems conserved across at least k sequences. Finally, it assembles in topological order all possible compatible conserved stems shared by at least k sequences and reports a number of the best assembled stem sets as the best candidate common structure motifs. This method does not require prior structural alignment of the sequences and is able to detect pseudoknot structures. We have tested this approach on some RNA sequences with known secondary structures, in which it is capable of detecting the real structures completely or partially correctly and outperforms other existing programs for similar purposes. AVAILABILITY: The algorithm has been implemented in C++ in a program called comRNA, which is available at http://ural.wustl.edu/softwares.html
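
The first step described in this abstract, enumerating candidate stems in a single sequence, can be illustrated with a minimal Python sketch. This is not the comRNA implementation. The abstract does not say how stem stability is judged, so a minimum stem length stands in for it here, and the function name `find_stems`, the parameters `min_len` and `min_loop`, and the inclusion of G-U wobble pairs are all assumptions made for illustration.

```python
# Base pairs treated as complementary (assumption: Watson-Crick plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stems(seq, min_len=4, min_loop=3):
    """Return maximal stems as (start, end, length) tuples.

    A stem pairs seq[start + k] with seq[end - k] for k = 0 .. length - 1.
    min_len is a stand-in for a real stability criterion (hypothetical),
    and min_loop is the minimum number of unpaired bases enclosed.
    """
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            # Skip helices that could be extended outward: they are not maximal.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            # Extend inward while bases pair and the enclosed loop stays large enough.
            while (i + length < j - length - min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                stems.append((i, j, length))
    return stems


if __name__ == "__main__":
    print(find_stems("GGGGAAAACCCC"))
```

The later steps (pairwise stem comparison, maximum clique search, and assembly of compatible stems) would operate on the tuples this function returns and are not sketched here.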

8.
Based on the similarity between the TIGR (trabecular-meshwork inducible glucocorticoid response) (also known as myocilin) and olfactomedin protein families identified throughout the length of the TIGR protein, we have identified more distantly related proteins to determine the elements essential to the function/structure of the TIGR and olfactomedin proteins. Using a sequence walk method and the Shotgun program, we have identified a family including 31 olfactomedin domain-containing sequences. Multiple sequence alignments and secondary structure analyses were used to identify conserved sequence elements. Pairwise identity in the olfactomedin domain ranges from 8 to 64%, with an average pairwise identity of 24%. The N-terminal regions of the proteins fall into two subgroups, one including the TIGR and olfactomedin families and another group of apparently unrelated domains. The TIGR and olfactomedin sequences display conserved motifs including a residual leucine zipper region and maintain a similar secondary structure throughout the N-terminal region. The correlation between conserved elements and disease-associated mutations and apparent polymorphisms in human TIGR was also examined to evaluate the apparent importance of conserved residues to the function/structure of TIGR. Several residues have been identified as essential to the function and/or structure of the human TIGR protein based on their degree of conservation across the family and their implication in the pathogenesis of primary open-angle glaucoma. Additionally, we have identified a group of chitinase sequences containing several of the highly conserved motifs present in the C-terminal region of the olfactomedin domain-containing sequences.

9.
Homology of 18 amino acid sequences of lens gamma-crystallins from several vertebrates (frog, mouse, rat, calf and human) has been considered. Pairwise sequence homology ranges from 57 to 100%, with a mean value of 74%. The spatial structures have been determined for only two calf gamma-crystallins. The protein molecule consists of four-fold repeated "motifs" (patterns) that are joined into two domains. Comparison of the 18 gamma-crystallin sequences showed that "motifs", domains and whole protein molecules have about 10, 30 and 58% conserved residues, respectively, which seems to be related to the evolution of these structural units. Structure analysis shows that almost all the conserved residues have an important structural meaning and play a basic role in the organization of the domain and molecular structure. This result allows us to conclude that the spatial structures of all the vertebrate gamma-crystallins considered are homologous.

10.
We compared two different approaches to sequence information analysis from the expressed sequence tag (EST) library constructed for the venom glands of the spider Agelena orientalis. Some results were more illustrative and reliable by the contig analysis technique, whereas our novel method, with specific structural markers introduced for protein structure detection, allowed us to overcome some limitations of the contig analysis. A novel technique was suggested for the identification in data banks of the spider's ion channel inhibitor toxins using primary structure features common to all spiders. Analysis of about 150 polypeptides made it possible to introduce 3 primary structure motifs for spider toxins: the Principal Structural Motif (PSM), which postulates the existence of 6 amino acid residues between the first and second cysteine residue and the Cys-Cys sequence at a distance of 5-10 amino acid residues from the second cysteine; the Extra Structural Motif (ESM), which postulates the existence of a pair of CXC fragments in the C-region; and the Processing Quadruplet Motif (PQM), which specifies the Arg residue at position -1 and Glu residues at positions -2, -3, and/or -4 in the precursor sequences just before the postprocessing site. In the processed data bank we found 48 toxinlike structures with ion channel inhibitor motifs. These include agelenin earlier isolated from Agelena opulenta and 25 more homologous sequences, 15 homologs of mu-agatoxin 2 from the spider Agelenopsis aperta, 3 structures with low homology to omega-agatoxin-IIIA, and 4 new structures. Also we showed that toxinlike structures exceed two thirds of the overall database sequences.
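
The PSM and PQM definitions in this abstract translate fairly directly into pattern checks. The Python sketch below is one possible reading, not the authors' code. It assumes both gaps in the PSM are cysteine-free (written as `[^C]`), it matches any cysteine pair with the right spacing rather than verifying that the pair is literally the first and second cysteine of the chain, and the names `has_psm` and `has_pqm` as well as the example sequences are made up for illustration.

```python
import re

# PSM as read from the abstract: Cys, 6 residues, Cys, 5-10 residues, Cys-Cys.
# Assumption: the gap residues are non-cysteine, hence [^C].
PSM = re.compile(r"C[^C]{6}C[^C]{5,10}CC")


def has_psm(mature_seq: str) -> bool:
    """True if the one-letter sequence contains a PSM-like cysteine pattern."""
    return PSM.search(mature_seq) is not None


def has_pqm(propeptide: str) -> bool:
    """True if the residues just before the processing site fit the PQM.

    propeptide is the precursor segment ending at position -1, so the PQM
    requires Arg as the last residue and at least one Glu at -2, -3 or -4.
    """
    return propeptide.endswith("R") and "E" in propeptide[-4:-1]


if __name__ == "__main__":
    print(has_psm("AACKGDSTYCSPLAGTCCNK"))  # toy sequence, not a real toxin
    print(has_pqm("MKLSEEAR"))              # toy propeptide
```

The ESM (a pair of CXC fragments in the C-region) is left out because the abstract does not define the extent of the C-region.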

11.
Protein tyrosine phosphorylation is an important regulatory mechanism in cell physiology. While the protein tyrosine kinase (PTKase) family has been extensively studied, only six protein tyrosine phosphatases (PTPases) have been described. By Southern blot analysis, genomic DNA from several different phyla was found to cross-hybridize with a cDNA probe encoding the human leukocyte-common antigen (LCA; CD45) PTPase domains. To pursue this observation further, total mRNA from the protochordate Styela plicata was used as a template to copy and amplify PTPase domains using polymerase chain reaction (PCR) technology. Twenty-seven distinct sequences were identified that contain hallmark residues of PTPases; two of these are similar to described mammalian PTPases. Southern blot analysis indicates that at least one other Styela sequence is highly conserved in a variety of phyla. Seven of the Styela domains have significant similarity to each other, indicating a subfamily of PTPases. However, most of the sequences are disparate. A comparison of the 27 Styela sequences with the ten known PTPase domain sequences reveals that only three residues are absolutely conserved and identifies regions that are highly divergent. The data indicate that the PTPase family will be as large and diverse as the PTKases. The extent and diversity of the PTPase family suggest that these enzymes are, in their own right, important regulators of cell behavior. The nucleotide sequence data reported in this paper have been submitted to the GenBank nucleotide sequence database and have been assigned the accession numbers M37986-M38041.

12.
Predictive motifs derived from cytosine methyltransferases.
Thirteen bacterial DNA methyltransferases that catalyze the formation of 5-methylcytosine within specific DNA sequences possess related structures. Similar building blocks (motifs), containing invariant positions, can be found in the same order in all thirteen sequences. Five of these blocks are highly conserved while a further five contain weaker similarities. One block, which has the most invariant residues, contains the proline-cysteine dipeptide of the proposed catalytic site. A region in the second half of each sequence is unusually variable both in length and sequence composition. Those methyltransferases that exhibit significant homology in this region share common specificity in DNA recognition. The five highly conserved motifs can be used to discriminate the known 5-methylcytosine forming methyltransferases from all other methyltransferases of known sequence, and from all other identified proteins in the PIR, GenBank and EMBL databases. These five motifs occur in a mammalian methyltransferase responsible for the formation of 5-methylcytosine within CG dinucleotides. By searching the unidentified open reading frames present in the GenBank and EMBL databases, two potential 5-methylcytosine forming methyltransferases have been found.

13.
In a previous paper we obtained ten (orthogonal) factors, linear combinations of which can express the properties of the 20 naturally occurring amino acids. In this paper, we assume that the most important properties (linear combinations of these ten factors) that determine the three-dimensional structure of a protein are conserved properties, i.e., those that have been conserved during evolution. Two definitions of a conserved property are presented: (1) a conserved property for an average protein is defined as that linear combination of the ten factors that optimally expresses the similarity of one amino acid to another (hence, little change during evolution), as given by the relatedness odds matrix of Dayhoff et al.; (2) a conserved property for each position in the amino acid sequence (locus) of a specific family of homologous proteins (the cytochrome c family or the globin family) is defined as that linear combination of the ten factors that is common among a set of amino acids at a given locus when the sequences are properly aligned. When the specificity at each locus is averaged over all loci, the same features are observed for three expressions of these two definitions, namely the conserved property for an average protein, the average conserved property for the cytochrome c family, and the average conserved property for the globin family; we find that bulk and hydrophobicity (information about packing and long-range interactions) are more important than other properties, such as the preference for adopting a specific backbone structure (information about short-range interactions). We also demonstrate that the sequence profile of a conserved property, defined for each locus of a protein family (definition 2), corresponds uniquely to the three-dimensional structure, while the conserved property for an average protein (definition 1) is not useful for the prediction of protein structure.
The amino acid sequences of numerous proteins are searched to find those that are similar, in terms of the conserved properties (definition 2), to sequences of the same size from one of the homologous families (cytochrome c and globin, respectively) for whose loci the conserved properties were defined. Many similar sequences are found, the number of similarities decreasing with increasing size of the segment. However, the segments must be rather long (15 residues) before the comparisons become meaningful. As an example, one sufficiently large sequence (20 residues) from a protein of known structure (apo-liver alcohol dehydrogenase, which is not a member of either family) is found to be similar in the conserved properties to a particular sequence of a member of the family of human hemoglobin chains, and the two sequences have similar structures. This means that, since conserved properties are expected to be structure determinants, we can use the conserved properties to predict an initial protein structure for subsequent energy minimization for a protein for which the conserved properties are similar to those of a family of proteins with a sufficiently large number of homologous amino acid sequences; such a large number of homologous sequences is required to define a conserved property for each locus of the homologous protein family.

14.
Voltage-gated ion channels (VGCs) mediate selective diffusion of ions across cell membranes to enable many vital cellular processes. Three-dimensional structure data are lacking for VGC proteins; hence, to better understand their function, there is a need to identify the conserved motifs using sequence analysis methods. In this study, we have used a profile-to-profile alignment method to identify several new conserved motifs specific to each transmembrane segment (TMS) of the voltage-sensing and the pore-forming modules of Ca2+, Na+, and K+ channel subfamilies. For Ca2+ and Na+, the functional theme of motif conservation is similar in all segments, while they differ from those of the K+ channel proteins. Nevertheless, the conservation is strikingly similar in the S4 segment of the voltage-sensing module across all subfamilies. In each subfamily and for each TMS, we have identified conserved motifs/residues and correlated their functional significance and disease associations in human, using mutational data from the literature.

15.
The development of remote homology detection methods is a challenging area in Bioinformatics. Sequence analysis-based approaches that address this problem have employed profiles, templates and Hidden Markov Models (HMMs). These methods often face limitations due to poor sequence similarities and non-uniform sequence dispersion in protein sequence space. Search procedures are often asymmetrical due to over- or under-representation of some protein families, and outliers often remain undetected. Intermediate sequences that share high similarities with more than one protein can help overcome such problems. Methods such as MulPSSM and Cascade PSI-BLAST that employ intermediate sequences achieve better coverage of members in searches. Others employ peptide modules or conserved patterns of motifs or residues, and overcome the dependence on high sequence similarity by using conserved patterns in searches to establish homology. We review some of these methods developed recently in India.

16.
The sequence of a cloned Anopheles stephensi gene showed 72% inferred amino acid identity with Drosophila melanogaster Dox-A2 and 93% with its putative ortholog in Anopheles gambiae. Dox-A2 is the reported but herein disputed structural locus for diphenol oxidase A2. Database searches identified Dox-A2 related gene sequences from 15 non-insect species from diverse groups. Phylogenetic trees based on alignments of inferred protein sequences, DNA, and protein motif searches and protein secondary structure predictions produced results consistent with expectations for genes that are orthologous. The only inconsistency was that the C-terminus appears to be more primitive in the yeasts than in plants. In mammals, plants, and yeast these genes have been shown to code for a non-ATPase subunit of the PA700 (19S) regulatory complex of 26S proteasome. The analyses indicated that the insect genes contain no divergent structural features, which taken within an appraisal of all available data, makes the reported alternative function highly improbable. A plausible additional role, in which the 26S proteasome is implicated in regulation of phenol oxidase, would also apply to at least the mammalian genes. No function has yet been reported for the other included sequences. These were from genome projects and included Caenorhabditis elegans, Arabidopsis thaliana, Fugu rubripes, and Toxoplasma gondii. A consensus of the results predicts a protein containing exceptionally long stretches of helix with a hydrophilic C-terminus. Phosphorylation site motifs were identified at two conserved positions. Possible SRY and GATA-1 binding motifs were found at conserved positions upstream of the mosquito genes. The location of A. stephensi Dox-A2 was determined by in situ hybridization at 34D on chromosome arm 3R. It is in a conserved gene cluster with respect to the other insects. However, the A. stephensi cluster contains a gene showing significant sequence identity to human and pigeon carnitine acetyltransferase genes, therefore showing divergence with the distal end of the D. melanogaster cluster. Received: 3 July 1998 / Accepted: 22 December 1999

17.
From protein sequence space to elementary protein modules
Frenkel ZM, Trifonov EN. Gene, 2008, 408(1-2): 64-71
The formatted protein sequence space is built from identical-size fragments of prokaryotic proteins (112 complete proteomes). Connecting sequence-wise similar fragments (points in the space) results in the formation of numerous networks that sometimes combine different types of proteins which nevertheless share fragments with similar or distantly related sequences. The networks are mapped on individual protein sequences, revealing distinct regions (modules) associated with prominent networks with well-defined functional identities. The presence of multiple sites of sequence conservation (modules) in a given protein sequence suggests that the annotated protein function may be decomposed into "elementary" subfunctions of the respective modules. The modules correspond to previously discovered conserved closed loop structures and their sequence prototypes.
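
The network construction described in this abstract (fixed-size fragments as points, links between similar fragments, networks as the resulting connected groups) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract gives neither the fragment size nor the similarity criterion, so the fraction of identical positions and the `threshold` parameter are placeholders, and the names `fragments`, `identity`, and `networks` are hypothetical.

```python
from itertools import combinations


def fragments(seq, size):
    """All overlapping fragments of a fixed size from one sequence."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]


def identity(a, b):
    """Fraction of identical positions between two equal-length fragments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def networks(frags, threshold):
    """Group fragments into networks (connected components).

    Two fragments are linked when their identity is at least `threshold`
    (an assumed criterion). Components are found with a small union-find.
    """
    frags = sorted(set(frags))
    parent = {f: f for f in frags}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for a, b in combinations(frags, 2):
        if identity(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for f in frags:
        groups.setdefault(find(f), set()).add(f)
    return list(groups.values())


if __name__ == "__main__":
    toy = ["GKSTL", "GKSTV", "GKTTV", "WWWWW"]  # toy fragments, not real data
    for net in networks(toy, threshold=0.8):
        print(sorted(net))
```

Note that GKSTL and GKTTV end up in the same network even though they are not directly linked: they are connected through GKSTV, which is how a network can join distantly related fragments. The all-pairs comparison is quadratic and would need indexing to scale to 112 proteomes.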

18.
The complement control protein (CCP) modules (also known as short consensus repeats) are defined by a consensus sequence within a stretch of about 60 amino acid residues. These modules have been identified more than 140 times in over 20 proteins, including 12 proteins of the complement system. The solution structure of the 16th CCP module from human complement factor H has been determined by a combination of 2-dimensional nuclear magnetic resonance spectroscopy and restrained simulated annealing. In all, 548 structurally important nuclear Overhauser enhancement cross-peaks were quantified as distance restraints and, together with 41 experimentally measured angle restraints, were incorporated into a simulated annealing protocol to determine a family of closely related structures that satisfied the experimental observations. The CCP structure is shown to be based on a beta-sandwich arrangement; one face made up of three beta-strands hydrogen-bonded to form a triple-stranded region at its centre and the other face formed from two separate beta-strands. Both faces of the molecule contribute highly conserved hydrophobic side-chains to a compact core. The regions between the beta-strands are composed of both well-defined turns and less well-defined loops. Analysis of CCP sequence alignments, in light of the determined structure, reveals a high degree of conservation amongst residues of obvious structural importance, while almost all insertions, deletions or replacements observed in the known sequences are found in the less well-defined loop regions. On the basis of these observations it is postulated that models of other CCP modules that are based on the structure presented here will be accurate. Certain families of CCP modules differ from the consensus in that they contain extra cysteine residues. As a test of structural consensus, the extra disulphide bridges are shown to be easily accommodated within the determined CCP model.

19.
Phosphoinositide-specific phospholipase C (PLC) is involved in Ca2+ mediated signalling events that lead to altered cellular status. Using various sequence-analysis methods, we identified two conserved motifs in known PLC sequences. The identified motifs are located in the C2 domain of plant PLCs and are not found in any other protein. These motifs are specifically found in the Ca2+ binding loops and form adjoining beta strands. Further, we identified certain conserved residues that are highly distinct from corresponding residues of animal PLCs. The motifs reported here could be used to annotate plant-specific phospholipase C sequences. Furthermore, we demonstrated that the C2 domain alone is capable of targeting PLC to the membrane in response to a Ca2+ signal. We also showed that the binding event results from a change in the hydrophobicity of the C2 domain upon Ca2+ binding. Bioinformatic analyses revealed that all PLCs from Arabidopsis and rice lack a transmembrane domain, myristoylation and GPI-anchor protein modifications. Our bioinformatic study indicates that plant PLCs are located in the cytoplasm, the nucleus and the mitochondria. Our results suggest that there are no distinct isoforms of plant PLCs, as have been proposed to exist in the soluble and membrane associated fractions. The same isoform could potentially be present in both subcellular fractions, depending on the calcium level of the cytosol. Overall, these data suggest that the C2 domain of PLC plays a vital role in calcium signalling.

20.
The refined crystal structures of chicken, yeast and trypanosomal triosephosphate isomerase (TIM) have been compared. TIM is known to exist in an "open" (unliganded) and "closed" (liganded) conformation. For chicken TIM only the refined open structure is available, whereas for yeast TIM and trypanosomal TIM refined structures of both the open and the closed structure have been used for this study. Comparison of these structures shows that the open structures of chicken TIM, yeast TIM and trypanosomal TIM are essentially identical. Also it is shown that the closed structures of yeast TIM and trypanosomal TIM are essentially identical. The conformational difference between the open and closed structures concerns a major shift (7 Å) in loop-6. Minor shifts are observed in the two adjacent loops, loop-5 (1 Å) and loop-7 (1 Å). The pairwise comparison of the three different TIM barrels shows that the 105 Cα atoms of the core superimpose within 0.9 Å. The sequences of these three TIMs have a pairwise sequence identity of approximately 50%. The residues that line the active site are 100% conserved. The residues interacting with each other across the dimer interface show extensive variability, but the direct hydrogen bonds between the two subunits are well conserved. The orientation of the two monomers with respect to each other is almost identical in the three different TIM structures. There are 56 (22%) conserved residues out of approximately 250 residues in 13 sequences. The functions of most of these conserved residues can be understood from the available open and closed structures of the three different TIMs. Some of these residues are quite far from the active site. For example, at a distance of 19 Å from the active site there is a conserved salt-bridge interaction between residues at the C-terminal ends of alpha-helix-6 and alpha-helix-7. This anchoring contrasts with the large conformational flexibility of loop-6 and loop-7 near the N termini of these helices. The flexibility of loop-6 is facilitated by a conserved large empty cavity near the N terminus of alpha-helix-6, which exists only in the open conformation.
