Similar articles
 20 similar articles found
1.
Huang Y  Xiao Y 《Proteins》2007,68(1):267-272
Protein folds may evolve from short peptide ancestors via gene duplication and fusion. For proteins with internal structural symmetry, this means that their sequences should be made up of identical repeats. However, many of these repeat signals can so far be seen only at the structural level. Motivated by the fact that proteins may have similar structures if their sequences have more than 25% identical amino acids, we suggest a method to detect the sequence repeats of proteins directly from their sequences. Using this method, we show that the internal repetitions of the immunoglobulin folds could be identified directly at the sequence level.

2.
Intragenic duplications of genetic material have important biological roles because of their protein sequence and structural consequences. We developed Swelfe to find internal repeats at three levels. Swelfe quickly identifies statistically significant internal repeats in DNA and amino acid sequences and in 3D structures using dynamic programming. The associated web server also shows the relationships between repeats at each level and facilitates visualization of the results. AVAILABILITY: http://bioserv.rpbs.jussieu.fr/swelfe. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
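The idea of finding internal repeats by dynamic programming can be illustrated with a minimal sketch. It is not Swelfe's actual algorithm: it assumes a plain Smith-Waterman local alignment of a sequence against itself, restricted to cells above the main diagonal so that the trivial match of the sequence with itself is excluded. The function name and the match, mismatch, and gap scores are hypothetical, and no statistical significance is estimated.

```python
def best_internal_repeat(seq, match=2, mismatch=-1, gap=-2):
    """Return (score, end_i, end_j) of the best local self-alignment with i < j.

    end_i and end_j are 1-based end positions of the two repeat copies.
    Scoring values are illustrative assumptions, not taken from any tool.
    """
    n = len(seq)
    # H[i][j]: best score of a local alignment ending at seq[i-1] and seq[j-1].
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best = (0, 0, 0)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):  # strictly above the diagonal
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # align seq[i-1] with seq[j-1]
                H[i - 1][j] + gap,    # gap in the second copy
                H[i][j - 1] + gap,    # gap in the first copy
            )
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


# Toy sequence: a 10-residue unit, a 3-residue linker, then the unit again.
print(best_internal_repeat("ACDEFGHIKLWWWACDEFGHIKL"))
```

Note that this sketch does not prevent the two copies from overlapping each other, so low-complexity runs can also score; a real tool would need to handle that case and assess significance.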

3.
The circumsporozoite gene of the Plasmodium cynomolgi complex   Total citations: 14 (self-citations: 0, citations by others: 14)
An analysis of the circumsporozoite (CS) genes of six closely related plasmodia is presented. Like other plasmodial antigens, the CS protein contains tandem repeats flanked by conventional nonrepeated sequences. Our analysis shows that the repeats, which encode the immunodominant epitope of the CS protein, diverge more rapidly than the remainder of the gene, and that the maintenance and evolution of the repeats cannot be explained as the result of selection at the protein level. We argue that a mechanism acts directly on the DNA sequence to constrain the internal divergence of the repeats, and as a result promotes their rapid divergence between taxa.

4.
The biologically active state of many proteins requires their prior homo-oligomerisation. Such complexes are typically symmetrical, a feature that has been proposed to increase their stability and facilitate the evolution of allosteric regulation. We wished to examine the possibility that similar structures and properties could arise from genetic amplifications leading to internal symmetrical repeats. For this, we identified internal structural repeats in a nonredundant Protein Data Bank subset. While testing if repeats in proteins tend to be symmetrical, we found that about half of the large internal repeats are symmetrical, most frequently around a rotation axis of 180°. These repeats were most likely created by genetic amplification processes because they show significant sequence similarity. Symmetrical repeats tend to have a fixed number of copies corresponding to their rotational symmetry order, that is, two for 180° rotation axis, whereas asymmetrical repeats are in longer proteins and show copy number variability. When possible, we confirmed that proteins with symmetrical repeats folding as an n-mer have homologues lacking the repeat with a higher oligomerisation number corresponding to the rotation symmetry order of the repeat. Phylogenetic analyses of these protein families suggest that typically, but not always, symmetrical repeats arise in one single event from proteins that are homo-oligomers. These results suggest that oligomerisation and amplification of internal sequences can interplay in evolutionary terms because they result in functional analogues when the latter exhibit rotational symmetry.

5.
MOTIVATION: An estimated 25% of all eukaryotic proteins contain repeats, which underlines the importance of duplication for evolving new protein functions. Internal repeats often correspond to structural or functional units in proteins. Methods capable of identifying diverged repeated segments or domains at the sequence level can therefore assist in predicting domain structures, inferring hypotheses about function and mechanism, and investigating the evolution of proteins from smaller fragments. RESULTS: We present HHrepID, a method for the de novo identification of repeats in protein sequences. It is able to detect the sequence signature of structural repeats in many proteins that have not yet been known to possess internal sequence symmetry, such as outer membrane beta-barrels. HHrepID uses HMM-HMM comparison to exploit evolutionary information in the form of multiple sequence alignments of homologs. In contrast to a previous method, the new method (1) generates a multiple alignment of repeats; (2) utilizes the transitive nature of homology through a novel merging procedure with fully probabilistic treatment of alignments; (3) improves alignment quality through an algorithm that maximizes the expected accuracy; (4) is able to identify different kinds of repeats within complex architectures by a probabilistic domain boundary detection method and (5) improves sensitivity through a new approach to assess statistical significance. AVAILABILITY: Server: http://toolkit.tuebingen.mpg.de/hhrepid; Executables: ftp://ftp.tuebingen.mpg.de/pub/protevo/HHrepID

6.
Internal repeats in protein sequences have wide-ranging implications for the structure and function of proteins. A keen analysis of the repeats in protein sequences may help us to better understand the structural organization of proteins and their evolutionary relations. In this paper, a mathematical method for searching for latent periodicity in protein sequences is developed. Using this method, we identified simple sequence repeats in the alkaline proteases and found that the sequences could show the same periodicity as their tertiary structures. This result may help us to reduce difficulties in the study of the relationship between sequences and their structures.

7.
Comparison of ARM and HEAT protein repeats   Total citations: 18 (self-citations: 0, citations by others: 18)
ARM and HEAT motifs are tandemly repeated sequences of approximately 50 amino acid residues that occur in a wide variety of eukaryotic proteins. An exhaustive search of sequence databases detected new family members and revealed that at least 1 in 500 eukaryotic protein sequences contain such repeats. It also rendered the similarity between ARM and HEAT repeats, believed to be evolutionarily related, readily apparent. All the proteins identified in the database searches could be clustered by sequence similarity into four groups: canonical ARM-repeat proteins and three groups of the more divergent HEAT-repeat proteins. This allowed us to build improved sequence profiles for the automatic detection of repeat motifs. Inspection of these profiles indicated that the individual repeat motifs of all four classes share a common set of seven highly conserved hydrophobic residues, which in proteins of known three-dimensional structure are buried within or between repeats. However, the motifs differ at several specific residue positions, suggesting important structural or functional differences among the classes. Our results illustrate that ARM and HEAT-repeat proteins, while having a common phylogenetic origin, have since diverged significantly. We discuss evolutionary scenarios that could account for the great diversity of repeats observed.

8.
Nucleoporins with phenylalanine-glycine repeats (FG Nups) function at the nuclear pore complex (NPC) to facilitate nucleocytoplasmic transport. In Saccharomyces cerevisiae, each FG Nup contains a large natively unfolded domain that is punctuated by FG repeats. These FG repeats are surrounded by hydrophilic amino acids (AAs) common to disordered protein domains. Here we show that the FG domain of Nups from human, fly, worm, and other yeast species is also enriched in these disorder-associated AAs, indicating that structural disorder is a conserved feature of FG Nups and likely serves an important role in NPC function. Despite the conservation of AA composition, FG Nup sequences from different species show extensive divergence. A comparison of the AA substitution rates of proteins with syntenic orthologs in four Saccharomyces species revealed that FG Nups have evolved at twice the rate of average yeast proteins with most substitutions occurring in sequences between FG repeats. The rapid evolution of FG Nups is poorly explained by parameters known to influence AA substitution rate, such as protein expression level, interactivity, and essentiality; instead their rapid evolution may reflect an intrinsic permissiveness of natively unfolded structures to AA substitutions. The overall lack of AA sequence conservation in FG Nups is sharply contrasted by discrete stretches of conserved sequences. These conserved sequences highlight known karyopherin and nucleoporin binding sites as well as other uncharacterized sites that may have important structural and functional properties.

9.
Apolipoprotein B (apoB) is the major protein component of plasma low density lipoproteins (LDL) and, through its binding to the LDL receptor, it plays a prominent role in lipoprotein metabolism and in the development of atherosclerosis. Specially developed computer programs were applied to detect potential internal repeats in the human apoB sequence and homology of some of these repeats with other apolipoproteins. The simultaneous computer alignment of several (repeated) sequences, carried out in an iterative way to generate consensus sequences, showed the presence of repeated amphipathic helical regions and of repeated hydrophobic proline-rich domains. Extensive Monte-Carlo statistics were used to demonstrate the statistical significance of the internal repeats. Both classes of repeats may contribute to the specific lipid-binding characteristics of apoB. Additional homology, detected between apoB and apoE, the other apolipoprotein-ligand of the LDL receptor, further defined the structural requirements for this receptor-ligand interaction. The computer programs developed in this study should also be useful for detecting internal repeats in other proteins.

10.
Mapping the stability distributions of proteins in their native folded states provides a critical link between structure, thermodynamics, and function. Linear repeat proteins have proven more amenable to this kind of mapping than globular proteins. C-terminal deletion studies of YopM, a large, linear leucine-rich repeat (LRR) protein, show that stability is distributed quite heterogeneously, yet a high level of cooperativity is maintained [1]. Key components of this distribution are three interfaces that strongly stabilize adjacent sequences, thereby maintaining structural integrity and promoting cooperativity. To better understand the distribution of interaction energy around these critical interfaces, we studied internal (rather than terminal) deletions of three LRRs in this region, including one of these stabilizing interfaces. Contrary to our expectation that deletion of structured repeats should be destabilizing, we find that internal deletion of folded repeats can actually stabilize the native state, suggesting that these repeats are destabilizing, although paradoxically, they are folded in the native state. We identified two residues within this destabilizing segment that deviate from the consensus sequence at a position that normally forms a stacked leucine ladder in the hydrophobic core. Replacement of these nonconsensus residues with leucine is stabilizing. This stability enhancement can be reproduced in the context of nonnative interfaces, but it requires an extended hydrophobic core. Our results demonstrate that different LRRs vary widely in their contribution to stability, and that this variation is context-dependent. These two factors are likely to determine the types of rearrangements that lead to folded, functional proteins, and in turn, are likely to restrict the pathways available for the evolution of linear repeat proteins.

11.
A census of protein repeats.   Total citations: 20 (self-citations: 0, citations by others: 20)
In this study, we analyzed all known protein sequences for repeating amino acid segments. Although duplicated sequence segments occur in 14% of all proteins, eukaryotic proteins are three times more likely to have internal repeats than prokaryotic proteins. After clustering the repetitive sequence segments into families, we find repeats from eukaryotic proteins have little similarity with prokaryotic repeats, suggesting most repeats arose after the prokaryotic and eukaryotic lineages diverged. Consequently, protein classes with the highest incidence of repetitive sequences perform functions unique to eukaryotes. The frequency distribution of the repeating units shows only weak length dependence, implicating recombination rather than duplex melting or DNA hairpin formation as the limiting mechanism underlying repeat formation. The mechanism favors additional repeats once an initial duplication has been incorporated. Finally, we show that repetitive sequences are favored that contain small and relatively water-soluble residues. We propose that error-prone repeat expansion allows repetitive proteins to evolve more quickly than non-repeat-containing proteins.

12.
All the protein sequences from SWISS-PROT database were analyzed for occurrence of single amino acid repeats, tandem oligo-peptide repeats, and periodically conserved amino acids. Single amino acid repeats of glutamine, serine, glutamic acid, glycine, and alanine seem to be tolerated to a considerable extent in many proteins. Tandem oligo-peptide repeats of different types with varying levels of conservation were detected in several proteins and found to be conspicuous, particularly in structural and cell surface proteins. It appears that repeated sequence patterns may be a mechanism that provides regular arrays of spatial and functional groups, useful for structural packing or for one to one interactions with target molecules. To facilitate further explorations, a database of Tandem Repeats in Protein Sequences (TRIPS) has been developed and is available at URL: http://www.ncl-india.org/trips.
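A scan for single amino acid repeats of the kind described above could look like the following sketch. It is only an illustration, not the procedure used to build TRIPS: the function name and the minimum run length of five residues are assumptions chosen for the example.

```python
import re


def single_residue_runs(seq, min_len=5):
    """Return (residue, start, length) for each run of at least min_len
    identical residues; start is a 0-based index into seq.

    min_len is an arbitrary illustrative threshold.
    """
    # (.) captures one residue; \1{k,} requires at least k further copies.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), len(m.group(0))) for m in pattern.finditer(seq)]


# Toy sequence with a run of six glutamines and a run of five glycines.
print(single_residue_runs("MAQQQQQQSTGGGGGA"))
```

Tandem oligo-peptide repeats and periodically conserved positions need more than a backreference on a single character, so they are outside the scope of this sketch.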

13.
Rapid automatic detection and alignment of repeats in protein sequences   Total citations: 11 (self-citations: 0, citations by others: 11)
Heger A  Holm L 《Proteins》2000,41(2):224-237
Many large proteins have evolved by internal duplication and many internal sequence repeats correspond to functional and structural units. We have developed an automatic algorithm, RADAR, for segmenting a query sequence into repeats. The segmentation procedure has three steps: (i) repeat length is determined by the spacing between suboptimal self-alignment traces; (ii) repeat borders are optimized to yield a maximal integer number of repeats, and (iii) distant repeats are validated by iterative profile alignment. The method identifies short composition biased as well as gapped approximate repeats and complex repeat architectures involving many different types of repeats in the query sequence. No manual intervention and no prior assumptions on the number and length of repeats are required. Comparison to the Pfam-A database indicates good coverage, accurate alignments, and reasonable repeat borders. Screening the Swissprot database revealed 3,000 repeats not annotated in existing domain databases. A number of these repeats had been described in the literature but most were novel. This illustrates how in times when curated databases grapple with ever increasing backlogs, automatic (re)analysis of sequences provides an efficient way to capture this important information.

14.
The relationship between the amino acid sequence and the three-dimensional structure of proteins with internal repeats is discussed. In particular, correlations between the amino acid composition and the ability to fold in a unique structure, as well as classification of the structures based on their repeat length, are described. This analysis suggests rules that can be used for the structural prediction of repeat-containing proteins. The paper is focused on prediction and modeling of solenoid-like proteins with the repeat length ranging between 5 and 40 residues. The models of leucine-rich repeat proteins and bacterial proteins with pentapeptide repeats are examined in light of the recently solved structures of the related molecules.

15.
Full-consensus designed ankyrin repeat proteins were designed with one to six identical repeats flanked by capping repeats. These proteins express well in Escherichia coli as soluble monomers. Compared to our previously described designed ankyrin repeat protein library, randomized positions have now been fixed according to sequence statistics and structural considerations. Their stability increases with length and is even higher than that of library members, and those with more than three internal repeats are resistant to denaturation by boiling or guanidine hydrochloride. Full denaturation requires their heating in 5 M guanidine hydrochloride. The folding and unfolding kinetics of the proteins with up to three internal repeats were analyzed, as the other proteins could not be denatured. Folding is monophasic, with a rate that is nearly identical for all proteins (~400-800 s⁻¹), indicating that essentially the same transition state must be crossed, possibly the folding of a single repeat. In contrast, the unfolding rate decreases by a factor of about 10⁴ with increasing repeat number, directly reflecting thermodynamic stability in these extraordinarily slow denaturation rates. The number of unfolding phases also increases with repeat number. We analyzed the folding thermodynamics and kinetics both by classical two-state and three-state cooperative models and by an Ising-like model, where repeats are considered as two-state folding units that can be stabilized by interacting with their folded nearest neighbors. This Ising model globally describes both equilibrium and kinetic data very well and allows for a detailed explanation of the ankyrin repeat protein folding mechanism.
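The Ising-like picture, in which each repeat is a two-state unit stabilized by folded nearest neighbors, can be made concrete with a small equilibrium sketch. This is a generic nearest-neighbor model under stated assumptions, not the fitted model from the study: the function name and both free-energy parameters (in units of RT) are hypothetical, capping repeats and denaturant dependence are ignored, and all states are enumerated by brute force.

```python
import itertools
import math


def folded_population(n_repeats, g_intrinsic=2.0, g_interface=-5.0):
    """Equilibrium population of the fully folded state.

    Each repeat is folded (1) or unfolded (0). Folding a repeat costs
    g_intrinsic; each pair of adjacent folded repeats gains g_interface.
    Energies are in units of RT and the defaults are illustrative only.
    """
    z = 0.0       # partition function over all 2**n_repeats states
    w_full = 0.0  # Boltzmann weight of the all-folded state
    for state in itertools.product((0, 1), repeat=n_repeats):
        g = g_intrinsic * sum(state)
        g += g_interface * sum(a and b for a, b in zip(state, state[1:]))
        w = math.exp(-g)
        z += w
        if all(state):
            w_full = w
    return w_full / z


for n in (1, 3, 6):
    print(n, round(folded_population(n), 3))
```

With these made-up parameters a single repeat is mostly unfolded on its own, while longer arrays are predominantly fully folded, which is the qualitative length dependence the abstract reports. Brute-force enumeration is only practical for small repeat numbers; a transfer-matrix formulation would be the usual choice for longer arrays.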

16.
Choi S  Jeon J  Yang JS  Kim S 《Proteins》2008,71(1):68-80
Symmetry plays significant roles in protein structure and function. Particularly, symmetric interfaces are known to act as switches for two-state conformational change. Membrane proteins often undergo two-state conformational change during the transport process of ion channels or the active/inactive transitions in receptors. Here, we provide the first comprehensive analyses of internal repeat symmetry in membrane proteins. We examined the known membrane protein structures and found that, remarkably, nearly half of them have internal repeat symmetry. Moreover, we found that the conserved cores of these internal repeats are positioned at the interface of symmetric units when they are mapped on structures. Because of the large sequence divergence that occurs between internal repeats, the inherent symmetry present in protein sequences often has only been detected after structure determination. We therefore developed a sensitive procedure to predict the internal repeat symmetry from sequence information and identified 4653 proteins that are likely to have internal repeat symmetry.

17.
Amino acid sequences are known to constantly mutate and diverge unless there is a limiting condition that makes such a change deleterious. However, closer examination of the sequence and structure reveals that a few large, cryptic repeats are nevertheless sequentially conserved. This leads to the question of why only certain repeats are conserved at the sequence level. It would be interesting to find out if these sequences maintain their conservation at the three-dimensional structure level. They can play an active role in protein and nucleotide stability, thus not only ensuring proper functioning but also potentiating malfunction and disease. Therefore, insights into any aspect of the repeats, be it structure, function or evolution, would prove to be of some importance. This study aims to address the relationship between protein sequence and its three-dimensional structure, by examining if large cryptic sequence repeats have the same structure.

18.
Summary: Hybridization experiments indicated that the maize genome contains a family of sequences closely related to the Ds1 element originally characterized from the Adh1-Fm335 allele of maize. Examples of these Ds1-related segments were cloned and sequenced. They also had the structural properties of mobile genetic elements, i.e., similar length and internal sequence homology with Ds1, 10- or 11-bp terminal inverted repeats, and characteristic duplications of flanking genomic DNA. All sequences with 11-bp terminal inverted repeats were flanked by 8-bp duplications, but the duplication flanking one sequence with 10-bp inverted repeats was only 6 bp. Similar Ds1-related sequences were cloned from Tripsacum dactyloides. They showed no more divergence from the maize sequences than the individual maize sequences showed when compared with each other. No consensus sequence was evident for the sites at which these sequences had inserted in genomic DNA.
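A terminal inverted repeat means that one end of the element reads as the reverse complement of the other end. The following sketch checks for that property. It is an illustration only: the function names are hypothetical, only perfect repeats over the unambiguous bases A, C, G, and T are handled, and the test sequence is a made-up toy, not a Ds1 sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string over A, C, G, T."""
    return dna.translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element, max_len=20):
    """Length of the longest perfect terminal inverted repeat (0 if none).

    A repeat of length k exists when the first k bases equal the reverse
    complement of the last k bases. max_len is an arbitrary search cap.
    """
    limit = min(max_len, len(element) // 2)
    for k in range(limit, 0, -1):  # try the longest candidate first
        if element[:k] == reverse_complement(element[-k:]):
            return k
    return 0


# Toy element: 11-bp left end, 5-bp spacer, reverse complement of the left end.
toy = "GATTACAGGTC" + "AAAAA" + reverse_complement("GATTACAGGTC")
print(terminal_inverted_repeat_length(toy))
```

Checking the flanking target-site duplication would additionally require the genomic sequence on both sides of the element, which this sketch does not model.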

19.
B Bornet  C Muller  F Paulus  M Branchard 《Génome》2002,45(5):890-896
Inter simple sequence repeat (ISSR) sequences as molecular markers can lead to the detection of polymorphism and also be a new approach to the study of SSR distribution and frequency. In this study, ISSR amplification with nonanchored primer was performed in closely related cauliflower lines. Forty-four different amplified fragments were sequenced. Sequences of PCR products are delimited by the expected motifs and number of repeats, which validates the ISSR nonanchored primer amplification technique. DNA and amino acids homology search between internal sequences and databases (i) show that the majority of the internal regions of ISSR had homologies with known sequences, mainly with genes coding for proteins implicated in DNA interaction or gene expression, which reflected the significance of amplified ISSR sequences and (ii) display long and numerous homologies with the Arabidopsis thaliana genome. ISSR amplifications revealed a high conservation of these sequences between Arabidopsis thaliana and Brassica oleracea var. botrytis. Thirty-four of the 44 ISSRs had one or several perfect or imperfect internal microsatellites. Such distribution indicates the presence in genomes of highly concentrated regions of SSR, or "SSR hot spots." Among the four nonanchored primers used in this study, trinucleotide repeats, and especially (CAA)5, were the most powerful primers for ISSR amplifications regarding the number of amplified bands, level of polymorphism, and their nature.

20.
La-related protein 1 (LARP1) regulates the stability of many mRNAs. These include 5′TOPs, mTOR-kinase responsive mRNAs with pyrimidine-rich 5′ UTRs, which encode ribosomal proteins and translation factors. We determined that the highly conserved LARP1-specific C-terminal DM15 region of human LARP1 directly binds a 5′TOP sequence. The crystal structure of this DM15 region refined to 1.86 Å resolution has three structurally related and evolutionarily conserved helix-turn-helix modules within each monomer. These motifs resemble HEAT repeats, ubiquitous helical protein-binding structures, but their sequences are inconsistent with consensus sequences of known HEAT modules, suggesting this structure has been repurposed for RNA interactions. A putative mTORC1-recognition sequence sits within a flexible loop C-terminal to these repeats. We also present modelling of pyrimidine-rich single-stranded RNA onto the highly conserved surface of the DM15 region. These studies lay the foundation necessary for proceeding toward a structural mechanism by which LARP1 links mTOR signalling to ribosome biogenesis.
