Similar articles
 20 similar articles found (search time: 46 ms)
1.
The advent of full genome sequences provides exceptionally rich data sets to explore molecular and evolutionary mechanisms that shape divergence among and within genomes. In this study, we use multivariate analysis to determine the processes driving genome-wide patterns of amino acid usage in the obligate endosymbiont Buchnera and its close free-living relative Escherichia coli. In the AT-rich Buchnera genome, the primary source of variation in amino acid usage differentiates high- and low-expression genes. Amino acids of high-expression Buchnera genes are generally less aromatic and use relatively GC-rich codons, suggesting that selection against aromatic amino acids and against amino acids with AT-rich codons is stronger in high-expression genes. Selection to maintain hydrophobic amino acids in integral membrane proteins is a primary factor driving protein evolution in E. coli but is a secondary factor in Buchnera. In E. coli, gene expression is a secondary force driving amino acid usage, and a correlation with tRNA abundance suggests that translational selection contributes to this effect. Although this and previous studies demonstrate that AT mutational bias and genetic drift influence amino acid usage in Buchnera, this genome-wide analysis argues that selection is sufficient to affect the amino acid content of proteins with different expression and hydropathy levels.

2.
Automatic comparison of compositionally biased genomes, such as that of the malarial causative agent Plasmodium falciparum (82% adenosine + thymidine), with genomes of average composition, is currently limited. Indeed, popular tools such as BLAST require that amino acid distributions be similar in aligned sequences. However, the P. falciparum genome is so biased that six amino acids account for more than 50% of the protein composition. One reason these comparison methods fail lies in the compositional difference between the query and the subject proteomes, which is not taken into account in the amino acid substitution matrices. This paper introduces a method to derive substitution matrices, in particular BLOSUM 62, within the framework of information theory. It allows the construction of non-symmetrical matrices, taking into account the non-symmetric amino acid distributions. The dirAtPf family of matrices, which allows the comparison of P. falciparum and A. thaliana, is given as an example. This paper further provides an analysis of the obtained matrices within the framework of information theory, supporting the discrimination advantage they bring.
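
The abstract above does not give its formula, so the following is only a minimal sketch of how a non-symmetric matrix can arise. It assumes the standard log-odds form, score(a, b) = log2(q_ab / (p_a * r_b)), with separate background distributions for the query (p) and the subject (r). The function name `log_odds_matrix` and the toy two-letter alphabet are hypothetical and are not the dirAtPf construction.

```python
import math

def log_odds_matrix(pair_freq, query_bg, subject_bg):
    """Log-odds scores in bits: log2(q_ab / (p_a * r_b)).

    pair_freq  -- observed frequency of residue a (query) aligned to b (subject)
    query_bg   -- background residue frequencies in the query proteome
    subject_bg -- background residue frequencies in the subject proteome
    Using two different backgrounds is the assumption that makes the
    resulting matrix non-symmetric.
    """
    return {
        (a, b): math.log2(q_ab / (query_bg[a] * subject_bg[b]))
        for (a, b), q_ab in pair_freq.items()
    }

# Toy data (invented for illustration): a two-letter alphabet.
pair_freq = {("K", "K"): 0.30, ("K", "A"): 0.10,
             ("A", "K"): 0.20, ("A", "A"): 0.40}
query_bg = {"K": 0.4, "A": 0.6}    # row marginals of pair_freq
subject_bg = {"K": 0.5, "A": 0.5}  # column marginals of pair_freq

scores = log_odds_matrix(pair_freq, query_bg, subject_bg)
# scores[("K", "A")] and scores[("A", "K")] differ, so the matrix is asymmetric.
```

With identical backgrounds and symmetric pair frequencies the same function reduces to the usual symmetric log-odds matrix.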

3.
Correlations between genomic GC content and amino acid frequencies were studied in the homologous sequences of 12 eubacterial genomes. Results show that the frequencies of amino acids encoded by GC-rich codons increase significantly with genomic GC content, whereas the opposite trend was observed for amino acids encoded by GC-poor codons. Further analysis shows that not all amino acids change in the direction predicted by genomic GC pressure, suggesting that protein evolution is not entirely dictated by nucleotide frequencies. An amino acid substitution matrix calculated among hydrophobic, amphipathic and hydrophilic amino acid groups shows that amphipathic and hydrophilic amino acids are more frequently substituted by hydrophobic amino acids than hydrophobic amino acids are by hydrophilic or amphipathic ones. This indicates that nucleotide bias induces directional changes in proteome composition that strongly alter hydropathy values. In fact, significant increases in hydrophobicity values have also been observed with increasing genomic GC content. Correlations between GC content and amino acid composition in three different predicted protein secondary structures show that hydropathy values increase significantly with GC content in aperiodic and helix structures, whereas strand structures remain insensitive to genomic GC levels. The relative importance of mutation and selection in the evolution of proteins is discussed on the basis of these results.

4.
The Selective Advantage of Synonymous Codon Usage Bias in Salmonella   Cited by: 1 (self-citations: 0, citations by others: 1)
The genetic code in mRNA is redundant, with 61 sense codons translated into 20 different amino acids. Individual amino acids are encoded by up to six different codons, but within codon families some are used more frequently than others. This phenomenon is referred to as synonymous codon usage bias. The genomes of free-living unicellular organisms such as bacteria have an extreme codon usage bias, and the degree of bias differs between genes within the same genome. The strong positive correlation between codon usage bias and gene expression levels in many microorganisms is attributed to selection for translational efficiency. However, this putative selective advantage has never been measured in bacteria and theoretical estimates vary widely. By systematically exchanging optimal codons for synonymous codons in the tuf genes, we quantified the selective advantage of biased codon usage in highly expressed genes to be in the range 0.2–4.2 × 10⁻⁴ per codon per generation. These data quantify for the first time the potential for selection on synonymous codon choice to drive genome-wide sequence evolution in bacteria, and in particular to optimize the sequences of highly expressed genes. This quantification may have predictive applications in the design of synthetic genes and for heterologous gene expression in biotechnology.
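
The redundancy figures quoted in the abstract above (61 sense codons, 20 amino acids, families of up to six codons) can be checked directly against the standard genetic code. The sketch below is illustrative only: the helper `codon_usage` is a hypothetical name, it assumes an in-frame coding sequence, and it simply tallies synonymous codons rather than reproducing the study's fitness measurements.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Sense codons and the size of each synonymous codon family.
sense = {codon: aa for codon, aa in CODON_TABLE.items() if aa != "*"}
family_sizes = Counter(sense.values())
six_fold = sorted(aa for aa, n in family_sizes.items() if n == 6)

def codon_usage(cds):
    """Count codons of an in-frame coding sequence, grouped by amino acid.

    Assumption: `cds` starts in frame; trailing partial codons, stop codons
    and codons with unknown bases are skipped.
    """
    usage = {}
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3].upper().replace("U", "T")
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*":
            continue
        usage.setdefault(aa, Counter())[codon] += 1
    return usage

print(len(sense), len(family_sizes), six_fold)  # 61 20 ['L', 'R', 'S']
```

For example, `codon_usage("CTGCTGCTATAA")` reports two CTG codons and one CTA codon for leucine, which is the kind of within-family imbalance that codon usage bias describes.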

5.
Delineation of the complement of proteins comprising the zygote and ookinete, the early developmental stages of Plasmodium within the mosquito midgut, is fundamental to understanding initial molecular parasite-vector interactions. The published proteome of Plasmodium falciparum does not include analysis of the zygote/ookinete stages, nor does that of P. berghei include the zygote stage or secreted proteins. P. gallinaceum zygote, ookinete, and ookinete-secreted/released protein samples were prepared and subjected to multidimensional protein identification technology (MudPIT). Peptides of P. gallinaceum zygote, ookinete, and ookinete-secreted proteins were identified by MS/MS, mapped to ORFs (> 50 amino acids) in the extant P. gallinaceum whole genome sequence, and then matched to homologous ORFs in P. falciparum. A total of 966 P. falciparum ORFs encoding orthologous proteins were identified; just over 40% of these predicted proteins were found to be hypothetical. A majority of putative proteins with predicted secretory signal peptides or transmembrane domains were hypothetical proteins. This analysis provides a more comprehensive view of the hitherto unknown proteome of the early mosquito midgut stages of P. falciparum. The results underpin more robust study of Plasmodium-mosquito midgut interactions, fundamental to the development of novel strategies of blocking malaria transmission.

6.
The cDNA clone RXF12, which encodes a xylanase (EC 3.2.1.8), was isolated from Arabidopsis thaliana. The C-terminal half of the amino acid sequence of the deduced protein, named AtXyn1, showed similarity with the catalytic domain of barley xylanase X-1. The N-terminal half of AtXyn1 also contained three regions with sequences similar to cellulose-binding domains (CBDs). A xylanase assay revealed that transgenic A. thaliana plants expressing exogenous AtXyn1 fused with enhanced green fluorescent protein (EGFP) possessed approximately twice as much xylanase activity as wild-type plants. Observation by fluorescence microscopy of transgenic A. thaliana plants expressing a fusion protein of AtXyn1 and EGFP suggested that AtXyn1 is a cell wall protein. Analysis of the localization of beta-glucuronidase (GUS) activity in transgenic A. thaliana plants containing a chimeric gene with the upstream sequence of the AtXyn1 gene and the GUS gene demonstrated that the AtXyn1 gene is predominantly expressed in vascular bundles, but not in vessel cells. These data suggest that AtXyn1 is involved in the secondary cell wall metabolism of vascular bundle cells. A database search revealed that four putative xylanase genes exist in the A. thaliana genome, besides the AtXyn1 gene. Of these, two also contain several regions with sequences similar to CBDs in their N-terminal regions. Comparison of the amino acid sequences of the five xylanases suggests a possible process for their molecular evolution.

7.
Anamika, Srinivasan N, Krupa A. Proteins 2005, 58(1):180-189
Protein kinases are central to regulation of cellular signaling in the eukaryotes. Well-conserved and lineage-specific protein kinases have previously been identified from various completely sequenced genomes of eukaryotes. The current work describes a genome-wide analysis for protein kinases encoded in the Plasmodium falciparum genome. Using a few different profile matching methods, we have identified 99 protein kinases or related proteins in the parasite genome. We have classified these kinases into subfamilies and analyzed them in the context of noncatalytic domains that occur in these catalytic kinase domain-containing proteins. Compared to most eukaryotic protein kinases, these sequences vary significantly in terms of their lengths, inserts in catalytic domains, and co-occurring domains. Catalytic and noncatalytic domains contain long stretches of repeats of positively charged and other polar amino acids. Various components of the cell cycle, including 4 cyclin-dependent kinase (CDK) homologues, 2 cyclins, 1 CDK regulatory subunit, and 1 kinase-associated phosphatase, are identified. Identification of a putative mitogen-activated protein (MAP) kinase and MAP kinase kinase of P. falciparum suggests a new paradigm in the highly conserved signaling pathway of eukaryotes. The calcium-dependent kinase family, well represented in P. falciparum, shows varying domain combinations with EF-hands and pleckstrin homology domains. The analysis reveals a new subfamily of protein kinases having limited sequence similarity with previously known subfamilies. A new transmembrane kinase with 6 membrane-spanning regions is identified. Putative apicoplast targeting sequences have been detected in some of these protein kinases, suggesting their export to the apicoplast.

8.
The evolutionary potential of a gene is constrained not only by the amino acid sequence of its product, but by its DNA sequence as well. The topology of the genetic code is such that half of the amino acids exhibit synonymous codons that can reach different subsets of amino acids from each other through single mutation. Thus, synonymous DNA sequences should access different regions of the protein sequence space through a limited number of mutations, and this may deeply influence the evolution of natural proteins. Here, we demonstrate that this feature can be of value for manipulating protein evolvability. We designed an algorithm that, starting from an input gene, constructs a synonymous sequence that systematically includes the codons with the most different evolutionary perspectives; i.e., codons that maximize accessibility to amino acids previously unreachable from the template by point mutation. A synonymous version of a bacterial antibiotic resistance gene was computed and synthesized. When concurrently submitted to identical directed evolution protocols, both the wild type and the recoded sequence led to the isolation of specific, advantageous phenotypic variants. Simulations based on a mutation isolated only from the synthetic gene libraries were conducted to assess the impact of sub-functional selective constraints, such as codon usage, on natural adaptation. Our data demonstrate that rational design of synonymous synthetic genes stands as an affordable improvement to any directed evolution protocol. We show that using two synonymous DNA sequences improves the overall yield of the procedure by increasing the diversity of mutants generated. These results provide conclusive evidence that synonymous coding sequences do experience different areas of the corresponding protein adaptive landscape, and that a sequence's codon usage effectively constrains the evolution of the encoded protein.
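
The recoding idea described in the abstract above can be sketched as follows. This is a simplified, per-codon greedy reading of that description, not the authors' algorithm: the names `reachable` and `recode` are hypothetical, the input is assumed to be uppercase DNA whose length is a multiple of three, and ties are resolved by keeping the original codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def reachable(codon):
    """Amino acids one nucleotide substitution away from `codon`.

    The codon's own amino acid and stop codons are excluded.
    """
    own = CODON_TABLE[codon]
    found = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            aa = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
            if aa not in (own, "*"):
                found.add(aa)
    return found

def recode(cds):
    """Replace each codon by the synonym that opens up the most new amino acids.

    Assumption: "new" means not reachable from the template codon at that site.
    """
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        template_set = reachable(codon)
        # List the original codon first so it wins ties (max keeps the first maximum).
        synonyms = [codon] + [c for c, aa in CODON_TABLE.items()
                              if aa == CODON_TABLE[codon] and c != codon]
        out.append(max(synonyms, key=lambda c: len(reachable(c) - template_set)))
    return "".join(out)

print(sorted(reachable("CGA")), sorted(reachable("AGA")))
# ['G', 'L', 'P', 'Q'] ['G', 'I', 'K', 'S', 'T']
print(recode("ATGCGA"))  # ATGAGG
```

Both CGA and AGA encode arginine, yet their single-mutation neighbourhoods overlap only in glycine, which is the property the recoding exploits; methionine has a single codon and is left unchanged.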

9.
Rare codons in E. coli and S. typhimurium signal sequences   Cited by: 8 (self-citations: 0, citations by others: 8)
D M Burns, I R Beacham. FEBS Letters 1985, 189(2):318-324
Codon usage has been examined in the signal sequences of 27 genes encoding proteins which possess leader peptides, and are inner-membrane located or exported. The results have been compared with codon usage in the corresponding coding sequences of most of the mature proteins. A bias is observed in the usage of rare codons for two of the three hydrophobic amino acids for which there are rare codons. Since hydrophobic residues are predominant in leader peptides, we suggest that a resulting concentration of rare codons in the signal sequence may play a role (or have played a role in the evolutionary past) in the secretion process by delaying translation.

10.
We analyzed the nucleotide contents of several completely sequenced genomes, and we show that nucleotide bias can have a dramatic effect on the amino acid composition of the encoded proteins. By surveying the genes in 21 completely sequenced eubacterial and archaeal genomes, along with the entire Saccharomyces cerevisiae genome and two Plasmodium falciparum chromosomes, we show that biased DNA encodes biased proteins on a genomewide scale. The predicted bias affects virtually all genes within the genome, and it could be clearly seen even when we limited the analysis to sets of homologous gene sequences. Parallel patterns of compositional bias were found within the archaea and the eubacteria. We also found a positive correlation between the degree of amino acid bias and the magnitude of protein sequence divergence. We conclude that mutational bias can have a major effect on the molecular evolution of proteins. These results could have important implications for the interpretation of protein-based molecular phylogenies and for the inference of functional protein adaptation from comparative sequence data.

11.
A segment of 1160 nucleotides of the FMDV genome has been sequenced using three overlapping fragments of cloned cDNA from FMDV strain O1K. This sequence contains the coding sequence for the viral capsid protein VP1, as shown by its homology to known and newly determined amino acid sequences from this main antigenic polypeptide of the FMDV virion. The structural gene for VP1 comprises 639 nucleotides, which specify a sequence of 213 amino acids for the VP1 protein. The coding sequence is not flanked by start and stop codons, which is consistent with the mode of biosynthesis of VP1 by post-translational processing of a polyprotein precursor.

12.
13.
Pyrrolysine and selenocysteine use dissimilar decoding strategies   Cited by: 1 (self-citations: 0, citations by others: 1)
Selenocysteine (Sec) and pyrrolysine (Pyl) are known as the 21st and 22nd amino acids in proteins. Both are encoded by codons that normally function as stop signals. Sec specification by UGA codons requires the presence of a cis-acting selenocysteine insertion sequence (SECIS) element. Similarly, it is thought that Pyl is inserted by UAG codons with the help of a putative pyrrolysine insertion sequence (PYLIS) element. Herein, we analyzed the occurrence of Pyl-utilizing organisms, Pyl-associated genes, and Pyl-containing proteins. The Pyl trait is restricted to several microbes, and only one organism has both Pyl and Sec. We found that methanogenic archaea that utilize Pyl have few genes that contain in-frame UAG codons, and many of these are followed by nearby UAA or UGA codons. In addition, unambiguous UAG stop signals could not be identified. This bias was not observed in Sec-utilizing organisms and non-Pyl-utilizing archaea, as well as with other stop codons. These observations as well as analyses of the coding potential of UAG codons, overlapping genes, and release factor sequences suggest that UAG is not a typical stop signal in Pyl-utilizing archaea. On the other hand, searches for conserved Pyl-containing proteins revealed only four protein families, including methylamine methyltransferases and transposases. Only methylamine methyltransferases matched the Pyl trait and had conserved Pyl, suggesting that this amino acid is used primarily by these enzymes. These findings are best explained by a model wherein UAG codons may have ambiguous meaning and Pyl insertion can effectively compete with translation termination for UAG codons, obviating the need for a specific PYLIS structure. Thus, Sec and Pyl follow dissimilar decoding and evolutionary strategies.

14.
Different codons encoding the same amino acid are not used equally in protein-coding sequences. In bacteria, there is a bias towards codons with high translation rates. This bias is most pronounced in highly expressed proteins, but a recent study of synthetic GFP-coding sequences did not find a correlation between codon usage and GFP expression, suggesting that such correlation in natural sequences is not a simple property of translational mechanisms. Here, we investigate the effect of evolutionary forces on codon usage. The relation between codon bias and protein abundance is quantitatively analyzed based on the hypothesis that codon bias evolved to ensure the efficient usage of ribosomes, a precious commodity for fast-growing cells. An explicit fitness landscape is formulated based on bacterial growth laws to relate protein abundance and ribosomal load. The model leads to a quantitative relation between codon bias and protein abundance, which accounts for a substantial part of the observed bias for E. coli. Moreover, by providing an evolutionary link, the ribosome load model resolves the apparent conflict between the observed relation of protein abundance and codon bias in natural sequences and the lack of such dependence in a synthetic gfp library. Finally, we show that the relation between codon usage and protein abundance can be used to predict protein abundance from genomic sequence data alone without adjustable parameters.

15.
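Total DNA was extracted from fresh young leaves of Brassica napus and used as a template. Primers were designed from the coding region sequence of the Arabidopsis thaliana Toc33 gene, and PCR was used to amplify Toc33, a gene encoding a component of the protein translocation machinery of the chloroplast outer membrane in B. napus. Two amplification bands were obtained; sequencing showed that the two cloned fragments were 1370 bp and 1490 bp long, and they were named Bn Toc33-1 and Bn Toc33-2, respectively. Sequence comparison showed 78% homology between them, with 96% homology in the exons but only 60% in the introns. To investigate the functional relationship between Toc33 and Toc34, a gene of the same family, the Toc33 and Toc34 protein sequences of Arabidopsis thaliana, B. napus, Orychophragmus violaceus and other plants were compared and a molecular phylogenetic tree was constructed.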

16.
17.
We show that in animal mitochondria homologous genes that differ in guanine plus cytosine (G + C) content code for proteins differing in amino acid content in a manner that relates to the G + C content of the codons. DNA sequences were analyzed using square plots, a new method that combines graphical visualization and statistical analysis of compositional differences in both DNA and protein. Square plots divide codons into four groups based on first and second position A + T (adenine plus thymine) and G + C content and indicate differences in amino acid content when comparing sequences that differ in G + C content. When sequences are compared using these plots, the amino acid content is shown to correlate with the nucleotide bias of the genes. This amino acid effect is shown in all protein-coding genes in the mitochondrial genome, including cox I, cox II, and cyt b, mitochondrial genes which are commonly used for phylogenetic studies. Furthermore, nucleotide content differences are shown to affect the content of all amino acids with A + T- and G + C-rich codons. We speculate that phylogenetic analysis of genes so affected may tend erroneously to indicate relatedness (or lack thereof) based only on amino acid content. Received: 3 July 1996 / Accepted: 6 November 1996
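
The four-way codon grouping that underlies the square plots described above can be illustrated with a short sketch. It covers only the grouping step, not the plots or the statistics. The labels are an assumption made here for illustration: "W" for A or T and "S" for G or C (following IUPAC nucleotide codes), applied to the first and second codon positions. The function names are hypothetical, and the input is assumed to be a non-empty, in-frame, uppercase DNA sequence.

```python
from collections import Counter

def codon_group(codon):
    """Label a codon by its first two positions: 'W' = A/T, 'S' = G/C."""
    return "".join("W" if base in "AT" else "S" for base in codon[:2])

def group_fractions(cds):
    """Fraction of codons in each of the four groups WW, WS, SW and SS.

    Assumption: `cds` is in frame; a trailing partial codon is ignored.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(codon_group(c) for c in codons)
    total = sum(counts.values())
    return {group: counts[group] / total for group in ("WW", "WS", "SW", "SS")}

print(group_fractions("ATGGCAAGTCAT"))
# {'WW': 0.25, 'WS': 0.25, 'SW': 0.25, 'SS': 0.25}
```

Comparing these fractions between two homologous genes with different G + C content gives a numerical view of the shift that the plots visualize.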

18.
The genomic era has seen a remarkable increase in the number of genomes being sequenced and annotated. Nonetheless, annotation remains a serious challenge for compositionally biased genomes. For the preliminary annotation, popular nucleotide and protein comparison methods such as BLAST are widely employed. These methods use matrices, such as amino acid substitution matrices, to score alignments. Since a nucleotide bias leads to an overall bias in the amino acid composition of proteins, it is possible that a genome with nucleotide bias may have introduced atypical amino acid substitutions in its proteome. Consequently, standard matrices fail to perform well in sequence analysis of these genomes. To address this issue, we examined amino acid substitution in the AT-rich genome of Plasmodium falciparum, chosen as a reference, and reconstituted a substitution matrix in the genome's context. The matrix was used to generate protein sequence alignments for the parasite proteins that improved across the functional regions. We attribute this to the consistency that may have been achieved between the target and background frequencies, which were calculated exclusively in our study. This study has important implications for the annotation of proteins that are of experimental interest but give poor sequence alignments with standard matrices.

19.
Many malarial antigens contain extensive arrays of tandemly repeated short amino acid sequences, and much of the antibody response induced by malaria infections is directed against these repeats. Indeed, it has been hypothesized that these repeats function to elicit a relatively ineffective T-cell-independent antibody response by the host. In order to test this hypothesis, tandem repeats of Plasmodium species were examined for a bias in composition favoring amino acids likely to form epitopes for the antibody. The genome of Plasmodium is very A+T-rich, and nucleotide compositional bias will, in itself, lead to a high proportion of hydrophilic amino acids. When this bias was controlled for, Plasmodium antigens did not show a higher proportion of hydrophilic amino acids than expected, but there was a significant reduction in the proportion of hydrophobic amino acids in the repeats of the antigens. The amino acid composition of the repeats was thus strikingly different from those seen both in the remainder of the antigens and in a sample of Plasmodium falciparum housekeeping genes.

20.
1. A procedure is described for the detection and assessment of informational complementarity in an amino acid sequence; it is based on possible autocomplementarity in the mRNA, and involves codon-to-codon matching. 2. This procedure was applied to myelin basic protein, a variety of protamines, histone IV, silk fibroin, rat skin collagen alpha1 chain and a sheep keratin. A multiplicity of extensive low-probability informational symmetries, based on codon-to-codon matching, were detected. 3. These low-probability orderings, which are independent of the actual mRNA codons, are rationalized in terms of the evolutionary ordering of the amino acid sequences concerned, in such a way that constraints on the secondary structure of the coding polynucleotides were satisfied. This possible interpretation is supported by a number of significant common properties of the protein sequences analysed.
