Similar literature
Found 20 similar documents (search time: 31 ms)
1.
MOTIVATION: We consider the problem of identifying low-complexity regions (LCRs) in a protein sequence. LCRs are regions of biased composition, normally consisting of different kinds of repeats. RESULTS: We define new complexity measures to compute the complexity of a sequence based on a given scoring matrix, such as BLOSUM 62. Our complexity measures also consider the order of amino acids in the sequence and the sequence length. We develop a novel graph-based algorithm called GBA to identify LCRs in a protein sequence. In the graph constructed for the sequence, each vertex corresponds to a pair of similar amino acids. Each edge connects two pairs of amino acids that can be grouped together to form a longer repeat. GBA finds short subsequences as LCR candidates by traversing this graph. It then extends them to find longer subsequences that may contain full repeats with low complexities. Extended subsequences are then post-processed to refine repeats to LCRs. Our experiments on real data show that GBA has significantly higher recall compared to existing algorithms, including 0j.py, CARD, and SEG. AVAILABILITY: The program is available on request.
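As a rough illustration of what "low complexity" means in practice, the sketch below scores sliding windows of a sequence by the Shannon entropy of their amino acid composition and merges the low-scoring windows. This is not GBA: it ignores the scoring matrix, residue order, and graph traversal described in the abstract. The function names, the window size, and the threshold are all assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_entropy_windows(sequence, window=12, threshold=2.2):
    """Return merged (start, end) index pairs of windows whose entropy is
    at or below the threshold. Both parameters are illustrative defaults,
    not values taken from the abstract."""
    regions = []
    for start in range(len(sequence) - window + 1):
        end = start + window
        if shannon_entropy(sequence[start:end]) <= threshold:
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

A composition-only score like this cannot tell a shuffled repeat from an ordered one, which is one reason measures that also account for residue order and similarity, as the abstract proposes, may behave differently.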

2.
3.
We propose a model that explains the hierarchical organization of proteins in fold families. The model, which is based on the evolutionary selection of proteins by their native state stability, reproduces patterns of amino acids conserved across protein families. Due to its dynamic nature, the model sheds light on the evolutionary time-scales. By studying the relaxation of the correlation function between consecutive mutations at a given position in proteins, we observe separation of the evolutionary time-scales: at short time intervals families of proteins with similar sequences and structures are formed, while at long time intervals the families of structurally similar proteins that have low sequence similarity are formed. We discuss the evolutionary implications of our model. We provide a "profile" solution to our model and find agreement between predicted patterns of conserved amino acids and those actually observed in nature.

4.
The metabolic cycle of Saccharomyces cerevisiae consists of alternating oxidative (respiration) and reductive (glycolysis) energy-yielding reactions. The intracellular concentrations of amino acid precursors generated by these reactions oscillate accordingly, attaining maximal concentration during the middle of their respective yeast metabolic cycle phases. Typically, the amino acids themselves are most abundant at the end of their precursor’s phase. We show that this metabolic cycling has likely biased the amino acid composition of proteins across the S. cerevisiae genome. In particular, we observed that the metabolic source of amino acids is the single most important source of variation in the amino acid compositions of functionally related proteins and that this signal appears only in (facultative) organisms using both oxidative and reductive metabolism. Periodically expressed proteins are enriched for amino acids generated in the preceding phase of the metabolic cycle. Proteins expressed during the oxidative phase contain more glycolysis-derived amino acids, whereas proteins expressed during the reductive phase contain more respiration-derived amino acids. Rare amino acids (e.g., tryptophan) are greatly overrepresented or underrepresented, relative to the proteomic average, in periodically expressed proteins, whereas common amino acids vary by a few percent. Genome-wide, we infer that 20,000 to 60,000 residues have been modified by this previously unappreciated pressure. This trend is strongest in ancient proteins, suggesting that oscillating endogenous amino acid availability exerted genome-wide selective pressure on protein sequences across evolutionary time. Benjamin L. de Bivort and Ethan O. Perlstein have contributed equally to this work.

5.
Tillier ER, Biro L, Li G, Tillo D. Proteins 2006, 63(4):822-831
Approaches for the determination of interacting partners from different protein families (such as ligands and their receptors) have made use of the property that interacting proteins follow similar patterns and relative rates of evolution. Interacting protein partners can then be predicted from the similarity of their phylogenetic trees or evolutionary distance matrices. We present a novel method called Codep, for the determination of interacting protein partners by maximizing co-evolutionary signals. The order of sequences in the multiple sequence alignments from two protein families is determined in such a manner as to maximize the similarity of substitution patterns at amino acid sites in the two alignments and, thus, phylogenetic congruency. This is achieved by maximizing the total number of interdependencies of amino acid sites between the alignments. Once ordered, the corresponding sequences in the two alignments indicate the predicted interacting partners. We demonstrate the efficacy of this approach with computer simulations and in analyses of several protein families. A program implementing our method, Codep, is freely available to academic users from our website: http://www.uhnresearch.ca/labs/tillier/.

6.
Information about conformational properties of a protein is contained in the hydrophobicity values of the amino acids in its primary sequence. We have investigated the possibility of extracting meaningful evolutionary information from the comparison of the hydrophobicity values of the corresponding amino acids in the sequences of homologous proteins. Distance matrices for six families of homologous proteins were made on the basis of the differences in hydrophobicity values of the amino acids. The phylogenetic trees constructed from such matrices were at least as good (as judged from their faithful reflection of evolutionary relationships) as trees constructed from the usual minimum mutation distance matrix.
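A hydrophobicity-based distance of the kind described can be sketched as follows. The abstract names neither the hydrophobicity scale nor the exact distance formula, so two assumptions are made here: the Kyte-Doolittle hydropathy scale stands in for the scale, and the distance is the mean absolute difference over aligned, ungapped columns. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values. The abstract does not name a scale,
# so using this one is an assumption made for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_distance(seq_a, seq_b):
    """Mean absolute hydropathy difference over aligned, ungapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = [
        abs(KYTE_DOOLITTLE[a] - KYTE_DOOLITTLE[b])
        for a, b in zip(seq_a, seq_b)
        if a != "-" and b != "-"  # skip columns containing a gap
    ]
    if not diffs:
        raise ValueError("no ungapped columns to compare")
    return sum(diffs) / len(diffs)


def distance_matrix(aligned):
    """Pairwise distances for a dict mapping names to aligned sequences."""
    names = list(aligned)
    return {
        (p, q): hydrophobicity_distance(aligned[p], aligned[q])
        for p in names
        for q in names
    }
```

Under this measure a substitution such as isoleucine to leucine contributes little, while isoleucine to arginine contributes a lot, which a plain mismatch count would treat identically. The resulting matrix could be passed to any distance-based tree-building method.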

7.
Selection on running capacity has created rat phenotypes of high-capacity runners (HCRs) that have enhanced cardiac function and low-capacity runners (LCRs) that exhibit risk factors of metabolic syndrome. We analysed hearts of HCRs and LCRs from generation 22 of selection using DIGE and identified proteins from MS database searches. The running capacity of HCRs was six-fold greater than LCRs. DIGE resolved 957 spots and proteins were unambiguously identified in 369 spots. Protein expression profiling detected 67 statistically significant (p<0.05; false discovery rate <10%, calculated using q-values) differences between HCRs and LCRs. Hearts of HCR rats exhibited robust increases in the abundance of each enzyme of the β-oxidation pathway. In contrast, LCR hearts were characterised by the modulation of enzymes associated with ketone body or amino acid metabolism. LCRs also exhibited enhanced expression of antioxidant enzymes such as catalase and greater phosphorylation of αB-crystallin at serine 59, which is a common point of convergence in cardiac stress signalling. Thus, proteomic analysis revealed selection on low running capacity is associated with perturbations in cardiac energy metabolism and provided the first evidence that the LCR cardiac proteome is exposed to greater oxidative stress.

8.
During evolution, organisms have gained functional complexity mainly by modifying and improving existing functioning systems rather than creating new ones ab initio. Here we explore the interplay between two processes which during evolution have had major roles in the acquisition of new functions: gene duplication and protein domain rearrangements. We consider four possible evolutionary scenarios: gene families that have undergone none of these event types; only gene duplication; only domain rearrangement, or both events. We characterize each of the four evolutionary scenarios by functional attributes. Our analysis of ten fungal genomes indicates that at least for the fungi clade, species significantly appear to gain complexity by gene duplication accompanied by the expansion of existing domain architectures via rearrangements. We show that paralogs gaining new domain architectures via duplication tend to adopt new functions compared to paralogs that preserve their domain architectures. We conclude that evolution of protein families through gene duplication and domain rearrangement is correlated with their functional properties. We suggest that in general, new functions are acquired via the integration of gene duplication and domain rearrangements rather than each process acting independently.

9.
10.
Tanaka J, Yanagawa H, Doi N. PLoS ONE 2011, 6(3):e18034
Although modern proteins consist of 20 different amino acids, it has been proposed that primordial proteins consisted of a small set of amino acids, and additional amino acids have gradually been recruited into the genetic code. This hypothesis has recently been supported by comparative genome sequence analysis, but no direct experimental approach has been reported. Here, we utilized a novel experimental approach to test a hypothesis that native-like globular proteins might be more easily simplified by a set of putative primitive amino acids, with retention of their structure and function, than by a set of putative new amino acids. We performed in vitro selection of a functional SH3 domain as a model from partially randomized libraries with different sets of amino acids using mRNA display. Consequently, a library rich in putative primitive amino acids included a larger number of functional SH3 sequences than a library rich in putative new amino acids. Further, the functional SH3 sequences were enriched from the primitive library slightly earlier than from a randomized library with the full set of amino acids, while the function and structure of the selected SH3 proteins with the primitive alphabet were comparable with those from the 20 amino acid alphabet. Application of this approach to various combinations of codons in protein sequences may be useful not only for clarifying the precise order of the amino acid expansion in the early stages of protein evolution but also for efficiently creating novel functional proteins in the laboratory.

11.
12.
Genome sequencing revealed an extreme AT-rich genome and a profusion of asparagine repeats associated with low complexity regions (LCRs) in proteins of the malarial parasite Plasmodium falciparum. Despite their abundance, the function of these LCRs remains unclear. Because they occur in almost all families of plasmodial proteins, the occurrence of LCRs cannot be associated with any specific metabolic pathway; yet their accumulation must have given selective advantages to the parasite. Translation of these asparagine-rich LCRs demands extraordinarily high amounts of asparaginylated tRNA(Asn). However, unlike other organisms, Plasmodium codon bias is not correlated to tRNA gene copy number. Here, we studied tRNA(Asn) accumulation as well as the catalytic capacities of the asparaginyl-tRNA synthetase of the parasite in vitro. We observed that asparaginylation in this parasite can be considered standard, which is expected to limit the availability of asparaginylated tRNA(Asn) in the cell and, in turn, slow down the ribosomal translation rate when decoding asparagine repeats. This observation strengthens our earlier hypothesis considering that asparagine-rich sequences act as “tRNA sponges” and help cotranslational folding of parasite proteins. However, it also raises many questions about the mechanistic aspects of the synthesis of asparagine repeats and about their implications in the global control of protein expression throughout the Plasmodium life cycle.

13.
14.
Lantibiotic synthetases are remarkable biocatalysts generating conformationally constrained peptides with a variety of biological activities by repeatedly utilizing two simple posttranslational modification reactions: dehydration of Ser/Thr residues and intramolecular addition of Cys thiols to the resulting dehydro amino acids. Since previously reported lantibiotic synthetases show no apparent homology with any other known protein families, the molecular mechanisms and evolutionary origin of these enzymes are unknown. In this study, we present a novel class of lanthionine synthetases, termed LanL, that consist of three distinct catalytic domains and demonstrate in vitro enzyme activity of a family member from Streptomyces venezuelae. Analysis of individually expressed and purified domains shows that LanL enzymes install dehydroamino acids via phosphorylation of Ser/Thr residues by a protein kinase domain and subsequent elimination of the phosphate by a phosphoSer/Thr lyase domain. The latter has sequence homology with the phosphothreonine lyases found in various pathogenic bacteria that inactivate host mitogen activated protein kinases. A LanC-like cyclase domain then catalyzes the addition of Cys residues to the dehydro amino acids to form the characteristic thioether rings. We propose that LanL enzymes have evolved from stand-alone protein Ser/Thr kinases, phosphoSer/Thr lyases, and enzymes catalyzing thiol alkylation. We also demonstrate that the genes for all three pathways to lanthionine-containing peptides are widespread in Nature. Given the remarkable efficiency of formation of lanthionine-containing polycyclic peptides and the latter's high degree of specificity for their cognate cellular targets, it is perhaps not surprising that (at least) three distinct families of polypeptide sequences have evolved to access this structurally and functionally diverse class of compounds.

15.
Most eubacteria, and all eukaryotes examined thus far, encode homologs of the DNA mismatch repair protein MutS. Although eubacteria encode only one or two MutS-like proteins, eukaryotes encode at least six distinct MutS homolog (MSH) proteins, corresponding to conserved (orthologous) gene families. This suggests evolution of individual gene family lines of descent by several duplication/specialization events. Using quantitative phylogenetic analyses (RASA, or relative apparent synapomorphy analysis), we demonstrate that comparison of complete MutS protein sequences, rather than highly conserved C-terminal domains only, maximizes information about evolutionary relationships. We identify a novel, highly conserved middle domain, as well as clearly delineate an N-terminal domain, previously implicated in mismatch recognition, that shows family-specific patterns of aromatic and charged amino acids. Our final analysis, in contrast to previous analyses of MutS-like sequences, yields a stable phylogenetic tree consistent with the known biochemical functions of MutS/MSH proteins, that now assigns all known eukaryotic MSH proteins to a monophyletic group, whose branches correspond to the respective specialized gene families. The rooted phylogenetic tree suggests their derivation from a mitochondrial MSH1-like protein, itself the descendant of the MutS of a symbiont in a primitive eukaryotic precursor.

16.
The trypsin family of serine proteases is one of the most studied protein families, with a wealth of amino acid sequence information available in public databases. Since trypsin-like enzymes are widely distributed in living organisms in nature, likely evolutionary scenarios have been proposed. A novel methodology for Fourier transformation of biological sequences (FOTOBIS) is presented. The methodology is well suited for the identification of the size and extent of short repeats in protein sequences. In the present paper the trypsin family of enzymes is analyzed with FOTOBIS and strong evidence for tandem gene duplication is found. A likely evolutionary path for the development of present-day trypsins involved an intrinsic extensive tandem gene duplication of a small DNA fragment of 15–18 nucleotides, corresponding to five or six amino acids. This ancestral trypsin gene was subsequently duplicated, leading to the earliest version of a full-sized trypsin, from which the contemporary trypsins have developed. Received: 22 November 1997 / Accepted: 26 January 1998
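The general idea of using a Fourier transform to detect short repeats can be sketched as below. This is not FOTOBIS itself, whose sequence encoding and scoring are not given in the abstract. As an assumption, the sequence is encoded as a 0/1 indicator for one chosen residue, and the repeat length is read from the strongest frequency of a plain discrete Fourier transform. The function names are hypothetical.

```python
import cmath


def power_spectrum(signal):
    """Power at each DFT frequency k = 1 .. n//2 of a numeric signal."""
    n = len(signal)
    if n == 0:
        raise ValueError("signal must not be empty")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant component
    spectrum = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum[k] = abs(coeff) ** 2
    return spectrum


def dominant_period(sequence, residue):
    """Estimate a repeat length (in residues) from the strongest frequency
    of the indicator signal for one residue. The indicator encoding is an
    assumption made for this sketch."""
    signal = [1.0 if ch == residue else 0.0 for ch in sequence]
    spectrum = power_spectrum(signal)
    best_k = max(spectrum, key=spectrum.get)
    return len(sequence) / best_k
```

For a sequence built from six copies of a five-residue unit, such as `"GGVLS" * 6` with residue `"G"`, the strongest frequency corresponds to a period of five residues. Because only the single strongest frequency is reported, a harmonic can win for very sparse signals, so a real analysis would inspect the whole spectrum.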

17.
The evolutionary expansion of CAG repeats in human triplet expansion disease genes is intriguing because of their deleterious phenotype. In the past, this expansion has been suggested to reflect a broad genomewide expansion of repeats, which would imply that mutational and evolutionary processes acting on repeats differ between species. Here, we tested this hypothesis by analyzing repeat- and flanking-sequence evolution in 28 repeat-containing genes that had been sequenced in humans and mice and by considering overall lengths and distributions of CAG repeats in the two species. We found no evidence that these repeats were longer in humans than in mice. We also found no evidence for preferential accumulation of CAG repeats in the human genome relative to mice from an analysis of the lengths of repeats identified in sequence databases. We then investigated whether sequence properties, such as base and amino acid composition and base substitution rates, showed any relationship to repeat evolution. We found that repeat-containing genes were enriched in certain amino acids, presumably as the result of selection, but that this did not reflect underlying biases in base composition. We also found that regions near repeats showed higher nonsynonymous substitution rates than the remainder of the gene and lower nonsynonymous rates in genes that contained a repeat in both the human and the mouse. Higher rates of nonsynonymous mutation in the neighborhood of repeats presumably reflect weaker purifying selection acting in these regions of the proteins, while the very low rate of nonsynonymous mutation in proteins containing a CAG repeat in both species presumably reflects a high level of purifying selection. Based on these observations, we propose that the mutational processes giving rise to polyglutamine repeats in human and murine proteins do not differ. Instead, we propose that the evolution of polyglutamine repeats in proteins results from an interplay between mutational processes and selection.

18.
19.

Background  

Regions of protein sequences with biased amino acid composition (so-called Low-Complexity Regions (LCRs)) are abundant in the protein universe. A number of studies have revealed that i) these regions show significant divergence across protein families; ii) the genetic mechanisms from which they arise lend them remarkable degrees of compositional plasticity. They have therefore proved difficult to compare using conventional sequence analysis techniques, and functions remain to be elucidated for most of them. Here we undertake a systematic investigation of LCRs in order to explore their possible functional significance, placed in the particular context of Protein-Protein Interaction (PPI) networks and Gene Ontology (GO)-term analysis.

20.
In sinoatrial node cells of the heart, beating rate is controlled, in part, by local Ca2+ releases (LCRs) from the sarcoplasmic reticulum, which couple to the action potential via electrogenic Na+/Ca2+ exchange. We observed persisting, roughly periodic LCRs in depolarized rabbit sinoatrial node cells (SANCs). The features of these LCRs were reproduced by a numerical model consisting of a two-dimensional array of stochastic, diffusively coupled Ca2+ release units (CRUs) with fixed refractory period. Because previous experimental studies showed that β-adrenergic receptor stimulation increases the rate of Ca2+ release through each CRU (dubbed I_spark), we explored the link between LCRs and I_spark in our model. Increasing the CRU release current I_spark facilitated Ca2+-induced-Ca2+ release and local recruitment of neighboring CRUs to fire more synchronously. This resulted in a progression in simulated LCR size (from sparks to wavelets to global waves), LCR rhythmicity, and decrease of LCR period that parallels the changes observed experimentally with β-adrenergic receptor stimulation. The transition in LCR characteristics was steeply nonlinear over a narrow range of I_spark, resembling a phase transition. We conclude that the (partial) periodicity and rate regulation of the "Calcium clock" in SANCs are emergent properties of the diffusive coupling of an ensemble of interacting stochastic CRUs. The variation in LCR period and size with I_spark is sufficient to account for β-adrenergic regulation of SANC beating rate.
