Similar literature
 Found 20 similar articles (search time: 78 ms)
1.
Summary: Low-complexity, repetitive protein sequences with a limited amino acid palette are abundant in nature, and many of them play an important role in the structure and function of certain types of proteins. However, such repetitive sequences often do not have rigidly defined motifs. Consequently, the identification of these low-complexity repetitive elements has proven challenging for existing pattern-matching algorithms. Here we introduce a new web-tool SubSeqer (http://compsysbio.org/subseqer/) which uses graphical visualization methods borrowed from protein interaction studies to identify and characterize repetitive elements in low-complexity sequences. Given their abundance, we suggest that SubSeqer represents a valuable resource for the study of typically neglected low-complexity sequences. Contact: jparkin{at}sickkids.ca Associate Editor: Limsoon Wong

2.
The distribution in the human genome of the largest family of mobile elements, the Alu sequences, has been investigated for the past 30 years, and the vast majority of Alu sequences were shown to have the highest density in GC-rich isochores. Ten years ago, it was discovered, however, that the small "youngest" (most recently transposed) Alu families had a strikingly different distribution compared with the "old" families. This raised the question as to how this change took place in evolution. We solved what was considered to be a "mystery" by 1) revisiting our previous results on the integration and stability of retroviral sequences, and 2) assessing the densities of acceptor sites TTTT/AA in isochore families. We could conclude 1) that the open state of chromatin structure plays a crucial role in allowing not only the initial integration of retroviral sequences but also that of the youngest Alu sequences, and 2) that the distribution of old Alus can be explained as due to Alu sequences being unstable in the GC-poor isochores but stable in the compositionally matching GC-rich isochores, again in line with what happens in the case of retroviral sequences.

3.
Genetically engineered mouse antibodies are now commonly in clinical use. However, their development is limited because the human immune system tends to regard them as foreign and this triggers an immune response. The solution is to make engineered antibodies appear more human. Here, we propose a method to assess the "degree of humanness" of antibody sequences providing a tool that may contribute to predictions of antigenicity. We analyzed sequences of antibodies belonging to various chains/classes in human and mouse. Our analysis of metrics based on percentage sequence identity between antibody sequences shows distinct differences between human and mouse sequences. Based on mean sequence identity and standard deviation, we calculated Z-scores for data sets of antibody sequences extracted from the Kabat database. We applied the analysis to a set of humanized and chimeric antibodies and to human germline sequences. We conclude that this approach may aid in the selection of more suitable mouse variable domains for antibody engineering to render them more human but in general, we find that typicality of a sequence compared with the expressed human repertoire is not well correlated with antigenicity. We have provided a Web server allowing humanness to be assigned for a sequence.
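
As an illustration of the Z-score idea in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes sequences are already aligned to equal length, and it takes the background distribution to be each reference sequence's mean identity to the rest of the reference set; that choice, and the function names `percent_identity`, `mean_identity` and `humanness_z_score`, are assumptions made for this example.

```python
from statistics import mean, pstdev


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def mean_identity(query: str, reference_set: list[str]) -> float:
    """Mean percentage identity of the query against every reference sequence."""
    return mean(percent_identity(query, ref) for ref in reference_set)


def humanness_z_score(query: str, human_set: list[str]) -> float:
    """Z-score of the query's mean identity relative to the human set itself.

    Assumption: the background is each human sequence's mean identity
    to all other human sequences (leave-one-out).
    """
    background = [
        mean_identity(seq, human_set[:i] + human_set[i + 1:])
        for i, seq in enumerate(human_set)
    ]
    mu, sigma = mean(background), pstdev(background)
    if sigma == 0:
        raise ValueError("background has zero variance; Z-score is undefined")
    return (mean_identity(query, human_set) - mu) / sigma
```

Under this reading, a positive score means the query is more typical of the reference set than an average member, and a negative score means it is less typical.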

4.
The functional significance of evolutionarily conserved motifs/patterns of short regions in proteins is well documented. Although a large number of sequences are conserved, only a small fraction of these are invariant across several organisms. Here, we have examined the structural features of the functionally important peptide sequences, which have been found invariant across diverse bacterial genera. Ramachandran angles (phi,psi) have been used to analyze the conformation, folding patterns and geometrical location (buried/exposed) of these invariant peptides in different crystal structures harboring these sequences. The analysis indicates that the peptides preferred a single conformation in different protein structures, with the exception of only a few longer peptides that exhibited some conformational variability. In addition, it is noticed that the variability of conformation occurs mainly due to flipping of peptide units about the virtual C(alpha)...C(alpha) bond. However, for a given invariant peptide, the folding patterns are found to be similar in almost all cases. Moreover, such peptides are found to be buried in the protein core. Thus, we can safely conclude that these invariant peptides are structurally important for the proteins, since they acquire unique structures across different proteins and can act as structural determinants (SD) of the proteins. The location of these SD peptides on the protein chain indicated that most of them are clustered towards the N-terminal and middle region of the protein with the C-terminal region exhibiting low preference. Another feature that emerges out of this study is that some of these SD peptides can also play the roles of "fold boundaries" or "hinge nucleus" in the protein structure. The study indicates that these SD peptides may act as chain-reversal signatures, guiding the proteins to adopt appropriate folds. In some cases the invariant signature peptides may also act as folding nuclei (FN) of the proteins.

5.
Sub-unit vaccines are synthetic or recombinant peptides representing T- or B-cell epitopes of major protein antigens from a particular pathogen. Epitope selection requires the synthesis of peptides that overlap the protein sequences and screening for the most effective ones. In this study a new method of immunogenic peptide selection based on the analysis of information structure of protein sequences is suggested. The analysis of known B-cell epitope location in the information structure of Aspergillus fumigatus proteins Asp f 2 and Asp f 3 has shown that epitopes are scattered along the sequences of proteins with the exception of sites with Increased Degree Information Coordination (IDIC). Based on these results, peptides from different allergens such as Asp f 2, Der p 1, and Fel d 1 were selected and produced in a recombinant form in the context of yeast virus-like particles (VLPs). Immunization of mice with VLPs containing peptides from allergens has induced the production of IgG able to recognize full-length antigens. This result suggests that the analysis of information structure of proteins can be used for the selection of peptides possessing cryptic B-cell epitope activity.

6.
Frith MC. PLoS ONE, 2011, 6(12): e28819
Detection of sequences that are homologous, i.e. descended from a common ancestor, is a fundamental task in computational biology. This task is confounded by low-complexity tracts (such as atatatatatat), which arise frequently and independently, causing strong similarities that are not homologies. There has been much research on identifying low-complexity tracts, but little research on how to treat them during homology search. We propose to find homologies by aligning sequences with "gentle" masking of low-complexity tracts. Gentle masking means that the match score involving a masked letter is min(0,S), where S is the unmasked score. Gentle masking slightly but noticeably improves the sensitivity of homology search (compared to "harsh" masking), without harming specificity. We show examples in three useful homology search problems: detection of NUMTs (nuclear copies of mitochondrial DNA), recruitment of metagenomic DNA reads to reference genomes, and pseudogene detection. Gentle masking is currently the best way to treat low-complexity tracts during homology search.
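
The min(0,S) rule stated in the abstract above can be sketched in a few lines. This is an illustration only: marking masked letters in lowercase, the +1/-1 scoring scheme, and the names `gentle_score` and `ungapped_score` are assumptions for the example, not details taken from the paper.

```python
def gentle_score(a: str, b: str, match: int = 1, mismatch: int = -1) -> int:
    """Score one aligned letter pair under gentle masking.

    Assumption: lowercase letters mark masked (low-complexity) positions.
    A pair involving a masked letter scores min(0, S), where S is the
    score the pair would get if nothing were masked.
    """
    unmasked = match if a.upper() == b.upper() else mismatch
    if a.islower() or b.islower():
        return min(0, unmasked)
    return unmasked


def ungapped_score(seq_a: str, seq_b: str) -> int:
    """Sum of gentle-masked pair scores over an ungapped alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped alignment requires equal lengths")
    return sum(gentle_score(x, y) for x, y in zip(seq_a, seq_b))
```

With this rule, a masked tract can never raise an alignment score, but mismatches inside it still count against the alignment, so two sequences that share only a low-complexity tract gain nothing from it.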

7.
Much attention is being paid to protein databases as an important information source for proteome research. Although used extensively for similarity searches, protein databases themselves have not fully been characterized. In a systematic attempt to reveal protein-database characters that could contribute to revealing how protein chains are constructed, frequency distributions of all possible combinatorial sets of three, four, and five amino acids ("triplets," "quartets," and "pentats"; collectively called constituent sequences) have been examined in the nonredundant (nr) protein database, demonstrating the existence of nonrandom bias in their "availability" at the population level. Nonexistent short sequences of pentats were found that showed low availability in biological proteins against their expected probabilities of occurrence. Among them, six representative ones were successfully synthesized as peptides with reasonably high yields in a conventional Fmoc method, excluding the possibility that a putative physicochemical energy barrier in forming them could be a direct cause for the low availability. They were also expressed as soluble fusion proteins in a conventional Escherichia coli BL21Star(DE3) system with reasonably high yield, again excluding a possible difficulty in their biological synthesis. Together, these results suggest that information on three-dimensional structures and functions of proteins exists in the context of connections of short constituent sequences, and that proteins are composed of evolutionarily selected constituent sequences, which are reflected in their availability differences in the database. These results may have biological implications for protein structural studies.
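
Comparing observed counts of short constituent sequences with their expected occurrence, as the abstract above describes, could be sketched as follows. The expectation used here (product of single-residue frequencies times the number of windows) is an assumed null model chosen for simplicity; the abstract does not specify how the authors computed expected probabilities, and `kmer_counts` and `availability_ratio` are hypothetical names.

```python
from collections import Counter
from math import prod


def kmer_counts(sequences: list[str], k: int) -> Counter:
    """Count every overlapping k-residue window across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def availability_ratio(sequences: list[str], kmer: str) -> float:
    """Observed / expected count of one k-mer.

    Assumed null model: residues occur independently with their
    database-wide frequencies.
    """
    k = len(kmer)
    observed = kmer_counts(sequences, k)[kmer]
    residue_counts = Counter("".join(sequences))
    total_residues = sum(residue_counts.values())
    total_windows = sum(max(len(seq) - k + 1, 0) for seq in sequences)
    probability = prod(residue_counts[r] / total_residues for r in kmer)
    expected = probability * total_windows
    return observed / expected if expected > 0 else float("nan")
```

A ratio well below 1 would flag a triplet, quartet or pentat as under-represented relative to this simple null model; a ratio of 0 means it never occurs in the supplied sequences.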

8.
In this work, we discovered a fundamental connection between selection for protein stability and emergence of preferred structures of proteins. Using a standard exact three-dimensional lattice model we evolve sequences starting from random ones and determine the exact native structure after each mutation. Acceptance of mutations is biased to select for stable proteins. We found that certain structures, "wonderfolds", are independently discovered numerous times as native states of stable proteins in many unrelated runs of selection. The strong dependence of lattice fold usage on the structural determinant of designability quantitatively reproduces uneven fold usage in natural proteins. Diversity of sequences that fold into wonderfold structures gives rise to superfamilies, i.e. sets of dissimilar sequences that fold into the same or very similar structures. The present work establishes a model of pre-biotic structure selection, which identifies dominant structural patterns emerging upon optimization of proteins for survival in a hot environment. Convergently discovered pre-biotic initial superfamilies with wonderfold structures could have served as a seed for subsequent biological evolution involving gene duplications and divergence.

9.
Low-complexity sequences are extremely abundant in eukaryotic proteins for reasons that remain unclear. One hypothesis is that they contribute to the formation of novel coding sequences, facilitating the generation of novel protein functions. Here, we test this hypothesis by examining the content of low-complexity sequences in proteins of different age. We show that recently emerged proteins contain more low-complexity sequences than older proteins and that these sequences often form functional domains. These data are consistent with the idea that low-complexity sequences may play a key role in the emergence of novel genes.

10.
Membrane fusion requires restructuring of lipid bilayers mediated by fusogenic membrane proteins. Peptides that correspond to natural transmembrane sequences or that have been designed to mimic them, such as low-complexity “Leu-Val” (LV) peptide sequences, can drive membrane fusion, presumably by disturbing the lipid bilayer structure. Here, we assess how peptides of different fusogenicity affect membrane structure using solid state NMR techniques. We find that the more fusogenic variants induce an unaligned lipid phase component and a large degree of phase separation as observed in 31P 2D spectra. The data support the idea that fusogenic peptides accumulate PE in a non-bilayer phase which may be critical for the induction of fusion.

11.
Numerous short peptides have been shown to form β‐sheet amyloid aggregates in vitro. Proteins that contain such sequences are likely to be problematic for a cell, due to their potential to aggregate into toxic structures. We investigated the structures of 30 proteins containing 45 sequences known to form amyloid, to see how the proteins cope with the presence of these potentially toxic sequences, studying secondary structure, hydrogen‐bonding, solvent accessible surface area and hydrophobicity. We identified two mechanisms by which proteins avoid aggregation. Firstly, amyloidogenic sequences are often found within helices, despite their inherent preference to form β structure. Helices may offer a selective advantage, since in order to form amyloid the sequence will presumably have to first unfold and then refold into a β structure. Secondly, amyloidogenic sequences that are found in β structure are usually buried within the protein. Surface exposed amyloidogenic sequences are not tolerated in strands, presumably because they lead to protein aggregation via assembly of the amyloidogenic regions. The use of α‐helices, where amyloidogenic sequences are forced into helix, despite their intrinsic preference for β structure, is thus a widespread mechanism to avoid protein aggregation.

12.
Mechanisms leading to gene variations are responsible for the diversity of species and are important components of the theory of evolution. One constraint on gene evolution is that of protein foldability; the three-dimensional shapes of proteins must be thermodynamically stable. We explore the impact of this constraint and calculate properties of foldable sequences using 3660 structures from the Protein Data Bank. We seek a selection function that receives sequences as input, and outputs survival probability based on sequence fitness to structure. We compute the number of sequences that match a particular protein structure with energy lower than the native sequence, the density of the number of sequences, the entropy, and the "selection" temperature. The mechanism of structure selection for sequences longer than 200 amino acids is approximately universal. For shorter sequences, it is not. We speculate on concrete evolutionary mechanisms that show this behavior.

13.
Distinctive properties of signal sequences from bacterial lipoproteins   (cited 10 times: 0 self-citations, 10 citations by others)
We have compared a number of attributes (hydrophobicity, amino acid size, charge and secondary structure propensities) of signal sequences from bacterial lipoproteins with the same attributes of signal peptides from other prokaryotic proteins (non-lipoproteins). Lipoprotein leader sequences tend to be shorter, more hydrophobic and bulky, and they have stronger conformational preferences, the most conspicuous being a predicted beta-turn comprising positions 2 or 3 of the mature protein. Another distinctive feature is a maximum in the local energy profile between positions -1 and +2. With one exception (beta-lactamase III), the lipoproteins do not have Pro in their signal peptides, and they tend to have fewer Ser and Thr but more Gly than non-lipoproteins. Lipoproteins also lack a net negative charge in the N-terminal regions of the mature proteins. The signal peptides of the bacteriocin plasmid-coded lysis proteins appear to be unique in that they have all the ascribed features of lipoprotein signals; these characteristics can be used to guide signal peptide mutagenesis experiments and to construct new secretion vehicles.

14.
Alignment of protein sequences is a key step in most computational methods for prediction of protein function and homology-based modeling of three-dimensional (3D)-structure. We investigated correspondence between "gold standard" alignments of 3D protein structures and the sequence alignments produced by the Smith-Waterman algorithm, currently the most sensitive method for pair-wise alignment of sequences. The results of this analysis enabled development of a novel method to align a pair of protein sequences. The comparison of the Smith-Waterman and structure alignments focused on their inner structure and especially on the continuous ungapped alignment segments, "islands" between gaps. Approximately one third of the islands in the gold standard alignments have negative or low positive score, and their recognition is below the sensitivity limit of the Smith-Waterman algorithm. From the alignment accuracy perspective, the time spent by the algorithm while working in these unalignable regions is unnecessary. We considered features of the standard similarity scoring function responsible for this phenomenon and suggested an alternative hierarchical algorithm, which explicitly addresses high scoring regions. This algorithm is considerably faster than the Smith-Waterman algorithm, whereas the resulting alignments are on average of the same quality with respect to the gold standard. This finding shows that the decrease of alignment accuracy is not necessarily a price for the computational efficiency.
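
The "islands" described in the abstract above, continuous ungapped segments between gaps, can be extracted from a gapped pairwise alignment with a short routine. This is a sketch under stated assumptions: gaps are written as "-", and the toy +1/-1 scoring stands in for a real substitution matrix, which the abstract does not name. `ungapped_islands` and `island_score` are hypothetical helper names.

```python
def ungapped_islands(aligned_a: str, aligned_b: str, gap: str = "-") -> list[tuple[str, str]]:
    """Split a gapped pairwise alignment into its ungapped segments ("islands")."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    islands, current_a, current_b = [], [], []
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            # A gap column closes the island collected so far, if any.
            if current_a:
                islands.append(("".join(current_a), "".join(current_b)))
                current_a, current_b = [], []
        else:
            current_a.append(x)
            current_b.append(y)
    if current_a:
        islands.append(("".join(current_a), "".join(current_b)))
    return islands


def island_score(island: tuple[str, str], match: int = 1, mismatch: int = -1) -> int:
    """Toy score of one island; a real analysis would use a substitution matrix."""
    a, b = island
    return sum(match if x == y else mismatch for x, y in zip(a, b))
```

Scoring each island separately makes it easy to see which segments of a reference alignment carry negative or low positive scores, the case the abstract identifies as falling below the sensitivity limit of the Smith-Waterman algorithm.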

15.
The molecular evolution of signal peptides   (cited 5 times: 0 self-citations, 5 citations by others)
Williams EJ, Pal C, Hurst LD. Gene, 2000, 253(2): 313-322
Signal peptides direct mature peptides to their appropriate cellular location, after which they are cleaved off. Very many random alternatives can serve the same function. Of all coding sequences, therefore, signal peptides might come closest to being neutrally evolving. Here we consider this issue by examining the molecular evolution of 76 mouse-rat orthologues, each with defined signal peptides. Although they do evolve rapidly, they evolve about half as fast as neutral sequences. This indicates that a substantial proportion of mutations must be under stabilizing selection. A few putative signal sequences lack a hydrophobic core and these tend to be more slowly evolving than others, indicating even stronger stabilizing selection. However, closer scrutiny suggests that some of these represent mis-annotations in GenBank. It is also likely that some of the substitutions are not neutral. We find, for example, that the rate of protein evolution correlates with that of the mature peptide. This may be a result of compensatory evolution. We also find that signal peptides of immune genes tend to be faster evolving than the average, which suggests an association with antagonistic co-evolution. Previous reports also indicated that the signal peptide of the imprinted gene, Igf2r, is also unusually fast evolving. This, it was hypothesized, might also be indicative of antagonistic co-evolution. Comparison of Igf2r's signal peptide evolution shows that, although it is not an outlier, its rate of evolution is comparable to that of many of the faster evolving immune system signal sequences and 5/6 of the amino acid changes do not conserve hydrophobicity. This is at least suggestive that there is something unusual about Igf2r's signal sequence.

16.
Expected rates and modes of evolution of enhancer sequences   (cited 11 times: 1 self-citation, 10 citations by others)

17.
When protein sequences divergently evolve under functional constraints, some individual amino acid replacements that reverse the charge (e.g. Lys to Asp) may be compensated by a replacement at a second position that reverses the charge in the opposite direction (e.g. Glu to Arg). When these side-chains are near in space (proximal), such double replacements might be driven by natural selection, if either is selectively disadvantageous, but both together restore fully the ability of the protein to contribute to fitness (are together "neutral"). Accordingly, many have sought to identify pairs of positions in a protein sequence that suffer compensatory replacements, often as a way to identify positions near in space in the folded structure. A "charge compensatory signal" might manifest itself in two ways. First, proximal charge compensatory replacements may occur more frequently than predicted from the product of the probabilities of individual positions suffering charge reversing replacements independently. Conversely, charge compensatory pairs of changes may be observed to occur more frequently in proximal pairs of sites than in the average pair. Normally, charge compensatory covariation is detected by comparing the sequences of extant proteins at the "leaves" of phylogenetic trees. We show here that the charge compensatory signal is more evident when it is sought by examining individual branches in the tree between reconstructed ancestral sequences at nodes in the tree. Here, we find that the signal is especially strong when the position pairs are in a single secondary structural unit (e.g. alpha helix or beta strand) that brings the side-chains suffering charge compensatory covariation near in space, and may be useful in secondary structure prediction. Also, "node-node" and "node-leaf" compensatory covariation may be useful to identify the better of two equally parsimonious trees, in a way that is independent of the mathematical formalism used to construct the tree itself. Further, compensatory covariation may provide a signal that indicates whether an episode of sequence evolution contains more or less divergence in functional behavior. Compensatory covariation analysis on reconstructed evolutionary trees may become a valuable tool to analyze genome sequences, and use these analyses to extract biomedically useful information from proteome databases.
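
Detecting charge compensatory replacements along a single branch, as described in the abstract above, might be sketched as follows. This is an illustrative simplification rather than the authors' procedure: it assumes the ancestral and descendant sequences are aligned without gaps, treats only Lys/Arg as positive and Asp/Glu as negative (His is ignored), and reports zero-based position pairs. The names `CHARGE`, `charge_reversal` and `compensatory_pairs` are hypothetical.

```python
# Assumed side-chain charges; histidine is deliberately left out.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def charge_reversal(ancestral: str, derived: str) -> int:
    """Return +1 or -1 (the new charge) if the replacement reverses charge, else 0."""
    before, after = CHARGE.get(ancestral, 0), CHARGE.get(derived, 0)
    return after if before * after == -1 else 0


def compensatory_pairs(ancestor: str, descendant: str) -> list[tuple[int, int]]:
    """Zero-based position pairs whose charge reversals on this branch cancel out."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    reversals = {
        i: charge_reversal(a, d)
        for i, (a, d) in enumerate(zip(ancestor, descendant))
    }
    sites = [i for i, change in reversals.items() if change != 0]
    return [
        (i, j)
        for n, i in enumerate(sites)
        for j in sites[n + 1:]
        if reversals[i] + reversals[j] == 0
    ]
```

Running this on a node-to-node or node-to-leaf branch yields candidate pairs only; whether they are proximal in the folded structure would still need to be checked against coordinates or secondary structure assignments.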

18.
19.
Mutational robustness describes the extent to which a phenotype remains unchanged in the face of mutations. Theory predicts that the strength of direct selection for mutational robustness is at most the magnitude of the rate of deleterious mutation. As far as nucleic acid sequences are concerned, only long sequences in organisms with high deleterious mutation rates and large population sizes are expected to evolve mutational robustness. Surprisingly, recent studies have concluded that molecules that meet none of these conditions--the microRNA precursors (pre-miRNAs) of multicellular eukaryotes--show signs of selection for mutational and/or environmental robustness. To resolve the apparent disagreement between theory and these studies, we have reconstructed the evolutionary history of Drosophila pre-miRNAs and compared the robustness of each sequence to that of its reconstructed ancestor. In addition, we "replayed the tape" of pre-miRNA evolution via simulation under different evolutionary assumptions and compared these alternative histories with the actual one. We found that Drosophila pre-miRNAs have evolved under strong purifying selection against changes in secondary structure. Contrary to earlier claims, there is no evidence that these RNAs have been shaped by either direct or congruent selection for any kind of robustness. Instead, the high robustness of Drosophila pre-miRNAs appears to be mostly intrinsic and likely a consequence of selection for functional structures.

20.
As a step toward selecting folded proteins from libraries of randomized sequences, we have designed a 'loop entropy reduction'-based phage-display method. The basic premise is that insertion of a long disordered sequence into a loop of a host protein will substantially destabilize the host because of the entropic cost of closing a loop in a disordered chain. If the inserted sequence spontaneously folds into a stable structure with the N and C termini close in space, however, this entropic cost is diminished. The host protein function can, therefore, be used to select folded inserted sequences without relying on specific properties of the inserted sequence. This principle is tested using the IgG binding domain of protein L and the lck SH2 domain as host proteins. The results indicate that the loop entropy reduction screen is capable of discriminating folded from unfolded sequences when the proper host protein and insertion point are chosen.
