Similar literature
Found 20 similar documents; search took 0 ms
1.
Correspondence analysis of amino acid frequencies was applied to 75 complete coding sequences from the unicellular parasite Giardia lamblia, and it was found that three major factors influence the variability of the amino acid composition of proteins. The first trend strongly correlated with (a) the cysteine content and (b) the mean weight of the amino acids used in each protein. The second trend correlated with the global levels of hydropathy and aromaticity of each protein. Both axes might be related to the defense of the parasite against oxygen free radicals. Finally, the third trend correlated with the expressivity of each gene, indicating that in G. lamblia highly expressed sequences display a tendency to preferentially use a subset of the total amino acids.

2.
Summary: The genomes of the human viruses herpes simplex 1 (HSV1) and varicella zoster (VZV), although similar in biology, largely concordant in gene order, and identical in many amino acid segments, differ widely in their genomic G+C (abbreviated S) content, which is high in HSV1 (68%) and low in VZV (46%). This paper analyzes several striking codon usage contrasts. The S difference in coding regions is dramatically large in codon site 3, S3, about 42%. The large difference in S3 is maintained at the same level in a subset of closely similar genes and even in corresponding identical amino acid blocks. A similar difference in S levels in silent site 1 (S1) is found in leucine and arginine. The difference in S3 levels occurs in every gene and in every multicodon amino acid form. The S difference also exists in amino acid usage, with HSV1 using significantly more codon types SSN, while VZV uses more codon types WWN (where W stands for A or T). The nonoverlapping and narrow histograms of S3 gene frequencies in both viruses suggest that the difference has arisen and been maintained by a process of selective rather than nonselective effects. This is in sharp contrast to the relatively large variance seen for highly similar genes in the human versus yeast analysis. Interpretations and hypotheses to explain the HSV1 vs VZV codon usage disparity relate to virus-host interactions, to the role of viral genes in DNA metabolism, to availability of molecular resources (molecular Gause exclusion principle), and to differences in genomic structure.
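
The S3 and SSN/WWN quantities in this abstract can be illustrated with a minimal sketch. It assumes an in-frame DNA coding sequence given as a plain string; the function names (`split_codons`, `s3_fraction`, `codon_type_counts`) are hypothetical and are not taken from the paper.

```python
from collections import Counter

STRONG = set("GC")  # S: G or C
WEAK = set("AT")    # W: A or T

def split_codons(cds):
    """Split an in-frame coding sequence into complete codons (trailing bases are dropped)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]

def s3_fraction(cds):
    """Fraction of codons whose third position is G or C (the S3 level)."""
    codons = split_codons(cds)
    if not codons:
        return 0.0
    return sum(codon[2] in STRONG for codon in codons) / len(codons)

def codon_type_counts(cds):
    """Count codons of type SSN, WWN, or other, based on the first two positions."""
    counts = Counter()
    for codon in split_codons(cds):
        first_two = codon[:2]
        if all(base in STRONG for base in first_two):
            counts["SSN"] += 1
        elif all(base in WEAK for base in first_two):
            counts["WWN"] += 1
        else:
            counts["other"] += 1
    return counts

# Toy sequence (not real viral data): codons GCC, GCG, AAT
example = "GCCGCGAAT"
print(s3_fraction(example))        # two of three codons end in G or C
print(codon_type_counts(example))  # two SSN codons, one WWN codon
```

Comparing `s3_fraction` across all genes of two genomes would give the kind of per-gene S3 histograms the abstract refers to.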

3.
The minimal bacterial gene set comprises the genetic elements needed for survival of an engineered bacterium on a rich medium. This set is estimated to include 300–350 protein-coding genes. One way of simplifying an organism with such a minimal genome even further is to constrain the amino acid content of its proteins. In this study, comparative genomics approaches and the results of gene knockout experiments were used to extrapolate the minimal gene set of mollicutes, and bioinformatics combined with the knowledge-based analysis of the structure-function relationships in these proteins and their orthologs, paralogs and analogs was applied to examine the challenges of completely replacing the rarest residue, cysteine. Among several known functions of cysteine residues, their roles in the active centers of the enzymes responsible for deoxyribonucleoside synthesis and transfer RNA modification appear to be crucial, as no alternative chemistry is known for these reactions. Thus, drastic reduction of the content of the rarest amino acid in a minimal proteome appears to be possible, but its complete elimination is challenging.

4.
Summary: We have analyzed the correlation that exists between the GC levels of third and first or second codon positions for about 1400 human coding sequences. The linear relationship that was found indicates that the large differences in GC level of third codon positions of human genes are paralleled by smaller differences in GC levels of first and second codon positions. Whereas third codon position differences correspond to very large differences in codon usage within the human genome, the first and second codon position differences correspond to smaller, yet very remarkable, differences in the amino acid composition of encoded proteins. Because GC levels of codon positions are linearly correlated with the GC levels of the isochores harboring the corresponding genes, both codon usage and amino acid composition are different for proteins encoded by genes located in isochores of different GC levels. Furthermore, we have also shown that a linear relationship with a unity slope and a correlation coefficient of 0.77 exists between GC levels of introns and exons from the 238 human genes currently available for this analysis. Introns are, however, about 5% lower in GC, on average, than exons from the same genes.
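
As a rough illustration of the analysis described here, the sketch below computes GC levels at each codon position and fits an ordinary least-squares line between two sets of per-gene values. The function names and the toy numbers are assumptions made for illustration; the abstract does not specify the authors' actual procedure.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3): GC fraction at codon positions 1, 2 and 3."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    n_codons = usable // 3
    return tuple(
        sum(cds[i] in "GC" for i in range(pos, usable, 3)) / n_codons
        for pos in range(3)
    )

def least_squares(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Toy values only, standing in for per-gene GC3 (x) and GC1 (y) levels.
gc3 = [0.40, 0.60, 0.80]
gc1 = [0.50, 0.55, 0.60]
slope, intercept, r = least_squares(gc3, gc1)
print(slope, intercept, r)
```

A slope well below 1 for GC1 or GC2 against GC3 would correspond to the "smaller differences" at the first and second positions that the abstract reports.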

5.
Quantifying the distribution of fitness effects among newly arising mutations in the human genome is key to resolving important debates in medical and evolutionary genetics. Here, we present a method for inferring this distribution using Single Nucleotide Polymorphism (SNP) data from a population with non-stationary demographic history (such as that of modern humans). Application of our method to 47,576 coding SNPs found by direct resequencing of 11,404 protein-coding genes in 35 individuals (20 European Americans and 15 African Americans) allows us to assess the relative contribution of demographic and selective effects to patterning amino acid variation in the human genome. We find evidence of an ancient population expansion in the sample with African ancestry and a relatively recent bottleneck in the sample with European ancestry. After accounting for these demographic effects, we find strong evidence for great variability in the selective effects of new amino acid-replacing mutations. In both populations, the patterns of variation are consistent with a leptokurtic distribution of selection coefficients (e.g., gamma or log-normal) peaked near neutrality. Specifically, we predict 27–29% of amino acid-changing (nonsynonymous) mutations are neutral or nearly neutral (|s|<0.01%), 30–42% are moderately deleterious (0.01%<|s|<1%), and nearly all the remainder are highly deleterious or lethal (|s|>1%). Our results are consistent with 10–20% of amino acid differences between humans and chimpanzees having been fixed by positive selection with the remainder of differences being neutral or nearly neutral. Our analysis also predicts that many of the alleles identified via whole-genome association mapping may be selectively neutral or (formerly) positively selected, implying that deleterious genetic variation affecting disease phenotype may be missed by this widely used approach for mapping genes underlying complex traits.

6.
F Yamao, Y Andachi, A Muto, T Ikemura, S Osawa. Nucleic Acids Research, 1991, 19(22):6119-6122
Transfer RNAs of Mycoplasma capricolum were separated by two-dimensional polyacrylamide gel electrophoresis, and the relative abundance of each of the 28 known tRNA species was measured. There was a correlation between the relative amount of isoacceptor tRNAs and the frequency of use of the synonymous codons that could be translated by those isoacceptors. Furthermore, it was observed that the total amount of tRNAs for a particular amino acid was paralleled by the composition of the amino acid in ribosomal proteins. A similar relationship was obtained from reexamination of the previous data on Escherichia coli tRNAs, suggesting that the amount of tRNAs for an amino acid is affected by the usage of the amino acid in proteins.

7.
Biased usage of synonymous codons has long been explained in terms of cellular tRNA abundance. Taking advantage of publicly available gene expression data for Saccharomyces cerevisiae, a systematic analysis of the codon and amino acid usages in two different coding regions, corresponding to the regular (helix and strand) and the irregular (coil) protein secondary structures, has been performed. Our analyses suggest that apart from tRNA abundance, mRNA folding stability is another major evolutionary force in shaping the codon and amino acid usage differences between the highly and lowly expressed genes in the S. cerevisiae genome, and surprisingly it depends on the coding regions corresponding to the secondary structures of the encoded proteins. This is a new paradigm in understanding codon usage in S. cerevisiae. Differential amino acid usage between highly and lowly expressed genes in the regions coding for the irregular protein secondary structure in S. cerevisiae is explained by the stability of the mRNA folded structure. Irrespective of the protein secondary structural type, the highly expressed genes always tend to encode cheaper amino acids in order to reduce the overall biosynthetic cost of production of the corresponding protein. This study supports the hypothesis that tRNA abundance is a consequence of, and not a reason for, the biased usage of amino acids between highly and lowly expressed genes.

8.
Highly expressed genes in any species differ in the usage frequency of synonymous codons. The relative recurrence of the favored codon pair (amino acid pair) varies between genes and genomes due to varying gene expression and different base composition. Here we propose a new measure for predicting the gene expression level, the codon plus amino bias index (CABI). Our approach is based on the relative bias of the favored codon pair inclination among the genes, illustrated by analyzing the CABI score of the Medicago truncatula genes. CABI showed strong correlation with all other widely used measures (CAI, RCBS, SCUO) for gene expression analysis. Surprisingly, CABI outperforms all other measures by showing better correlation with the wet-lab data. This emphasizes the importance of the neighboring codons of the favored codon in a synonymous group when estimating the expression level of a gene.

9.
Keratinocytes are the main cell type of the epidermis. They secrete a variety of proteins and peptides that have diverse roles in epidermal physiology. In this report, we present purification and partial amino acid sequence of LEKTI, a serine proteinase inhibitor, and DAN (NO3) zinc-finger protein, a tumor suppressor protein of neuroblastoma, from human keratinocyte conditioned medium. Epidermal keratinocytes were isolated from human foreskin and serially passaged in a defined medium (MSBM). At confluence of the fourth passage, MSBM medium was replaced with protein-free Dulbecco's modified Eagle medium/F12 (DMEM:F12) 3:1 base medium and collected every 24 h for 4 days. Medium was pooled and concentrated using a stirred cell concentrator. Concentrated medium was diluted 1:1 in 50 mM sodium phosphate, pH 8 buffer, and loaded onto a preparative heparin affinity column. Proteins/peptides were purified from heparin column passthrough by the combination of preparative and analytical FPLC-based gel filtration chromatography and reversed-phase HPLC. Samples electroblotted onto a PVDF support were sequenced by Edman degradation in a gas-phase sequencing system.

10.
The definition of a typical sec-dependent bacterial signal peptide contains a positive charge at the N-terminus, thought to be required for membrane association. In this study the amino acid distribution of all Escherichia coli secretory proteins was analysed. This revealed that there was a statistically significant bias for lysine at the second codon position (P2), consistent with a role for the positive charge in secretion. Removal of the positively charged residue at P2 in two different model systems revealed that a positive charge is not required for protein export. A well-characterized feature of large amino acids like lysine at P2 is inhibition of N-terminal methionine removal by methionyl amino-peptidase (MAP). Substitution of lysine at P2 for other large or small amino acids did not affect protein export. Analysis of codon usage revealed that there was a bias for the AAA lysine codon at P2, suggesting that a non-coding function for the AAA codon may be responsible for the strong bias for lysine at P2 of secretory signal sequences. We conclude that selection for high translation initiation efficiency may be the selective pressure that has led to the codon and consequent amino acid usage at P2 of secretory proteins.

11.
Monoclonal antibodies (mAb) specific for mercuric ions were isolated from BALB/c mice injected with a mercury-containing, hapten-carrier complex. The antibodies reacted by enzyme-linked immunosorbent assay with bovine serum albumin-glutathione-mercuric chloride (BSA-GSH-HgCl) but not with BSA-GSH without mercury. Nucleotide sequences from polymerase chain reaction products encoding six of the antibody heavy-chain variable regions and seven light-chain variable regions revealed that all the antibodies contained an unpaired cysteine residue in one hypervariable region, which is unusual for murine antibodies. Mutagenesis of the cysteine to either tyrosine or serine in one of the Hg-binding antibodies, mAb 4A10, eliminated mercury binding. However, of two influenza-specific antibodies that contain cysteine residues at the same position as mAb 4A10, one reacted with mercury, although not so strongly as 4A10, whereas the other did not react at all. These results suggested that, in addition to an unpaired cysteine, there are other structural features, not yet identified, that are important for creating an appropriate environment for mercury binding. The antibodies described here could be useful for investigating mechanisms of metal-protein interactions and for characterizing antibody responses to structurally simple haptens.

12.

Background  

Understanding the molecular details of protein-DNA interactions is critical for deciphering the mechanisms of gene regulation. We present a machine learning approach for the identification of amino acid residues involved in protein-DNA interactions.

13.
When the amino acid usage of all completely sequenced prokaryotes is studied by multivariate analysis (MVA), it is known that the genomic molar content of guanine plus cytosine (GC) and optimal growth temperature (Topt) have a dominant effect. Furthermore, these two factors are associated with the first two axes of different MVAs and are thus nearly independent of each other. However, it was recently shown that for several families of prokaryotes there are significant and positive correlations between GC and Topt. This trend is particularly clear within Bacillaceae, where there are species displaying a broad range of variation for these two factors. In this paper we report that (a) Topt and genomic GC are the main factors shaping amino acid usage but are not independent of each other, (b) the usage of cysteine is the second source of variability, and finally (c) the global hydrophobicity of the encoded proteins of each species is the third main factor.

14.
G Funatsu, M R Islam, Y Minami, K Sung-Sil, M Kimura. Biochimie, 1991, 73(7-8):1157-1161
The amino acid sequences of eleven RIPs sequenced to date have been compared in the expectation that this would be useful in locating functionally and/or structurally important sites of these molecules. In addition to several highly conserved hydrophobic amino acids, thirteen absolutely conserved residues have been found in ricin A-chain: Tyr21, Phe24, Arg29, Tyr80, Tyr123, Gly140, Ala165, Glu177, Ala178, Arg180, Glu208, Asn209 and Trp211. The roles of these residues and of the C-terminal region are discussed based on the results of chemical and enzymatic modifications, site-directed mutagenesis, and deletion studies.

15.
16.
Genes involved in the symbiotic interactions between the nitrogen-fixing endosymbiont Bradyrhizobium japonicum and its leguminous host are mostly clustered in a symbiotic island (SI), acquired by the bacterium through a process of horizontal transfer. A comparative analysis of the codon and amino acid usage in core and SI genes/proteins of B. japonicum has been carried out in the present study. Mutational bias, translational selection, and gene length are found to be the major sources of variation in synonymous codon usage in the core genome as well as in the SI, the strength of translational selection being higher in core genes than in the SI. In core proteins, hydrophobicity is the main source of variation in amino acid usage, expressivity and aromaticity being the second and third most important sources. But in SI proteins, aromaticity is the chief source of variation, followed by expressivity and hydrophobicity. In SI proteins, both the mean molecular weight and mean aromaticity of individual proteins exhibit significant positive correlation with gene expressivity, which violates the cost-minimization hypothesis. Investigation of nucleotide substitution patterns in B. japonicum and Mesorhizobium loti orthologous genes reveals that both synonymous and non-synonymous sites of highly expressed genes are more conserved than their lowly expressed counterparts, and this conservation is more pronounced in the genes present in the core genome than in the SI.

17.
A comparative study of the compositional properties of various protein sets from both cellular and viral organisms is presented. Invariants and contrasts of amino acid usages have been discerned for different protein function classes and for different species using robust statistical methods based on quantile distributions and stochastic ordering relationships. In addition, a quantitative criterion to assess amino acid compositional extremes relative to a reference protein set is proposed and applied. Invariants of amino acid usage relate mainly to the central range of quantile distributions, whereas contrasts occur mainly in the tails of the distributions, especially contrasts between eukaryote and prokaryote species. Influences from genomic constraint are evident, for example, in the arginine:lysine ratios and the usage frequencies of residues encoded by G + C-rich versus A + T-rich codon types. The structurally similar amino acids, glutamate versus aspartate and phenylalanine versus tyrosine, show stochastic dominance relationships for most species protein sets favoring glutamate and phenylalanine respectively. The quantile distribution of hydrophobic amino acid usages in prokaryote data dominates the corresponding quantile distribution in human data. In contrast, glutamate, cysteine, proline and serine usages in human proteins dominate the corresponding quantile distributions in Escherichia coli. E. coli dominates human in the use of basic residues, but no dominance ordering applies to acidic residues. The discussion centers on commonalities and anomalies of the amino acid compositional spectrum in relation to species, function, cellular localization, biochemical and steric attributes, complexity of the amino acid biosynthetic pathway, amino acid relative abundances and founder effects.
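
The quantile-based comparison described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions: the abstract does not give the authors' exact statistics, so the weak quantile-by-quantile dominance test, the function names, and the toy protein strings are all hypothetical.

```python
import math
from statistics import quantiles

def usage(protein, residues):
    """Fraction of positions in `protein` occupied by any one-letter residue in `residues`."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty protein sequence")
    return sum(aa in residues for aa in protein) / len(protein)

def arg_lys_ratio(protein):
    """Arginine:lysine count ratio; infinite when the protein contains no lysine."""
    protein = protein.upper()
    n_arg, n_lys = protein.count("R"), protein.count("K")
    return n_arg / n_lys if n_lys else math.inf

def usage_quantiles(proteins, residues, n=10):
    """Cut points dividing the per-protein usage values into n quantile groups."""
    return quantiles([usage(p, residues) for p in proteins], n=n)

def dominates(set_a, set_b, residues, n=10):
    """True if every usage quantile of set_a is at least the matching quantile of set_b."""
    qa = usage_quantiles(set_a, residues, n)
    qb = usage_quantiles(set_b, residues, n)
    return all(a >= b for a, b in zip(qa, qb))

# Toy protein sets (not real data), compared on lysine usage.
set_a = ["KKKK", "KKAA", "KAAA"]
set_b = ["AAAA", "KAAA", "AAAK"]
print(dominates(set_a, set_b, "K", n=4))
print(arg_lys_ratio("RRKA"))
```

Because the test uses `>=`, two identical protein sets would each dominate the other; a stricter criterion would also require at least one strict inequality. Each protein set needs at least two sequences for `statistics.quantiles` to work on all supported Python versions.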

18.
Prediction of RNA binding sites in proteins from amino acid sequence   (Total citations: 3; self-citations: 0; citations by others: 3)
RNA-protein interactions are vitally important in a wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses. We have developed a computational tool for predicting which amino acids of an RNA binding protein participate in RNA-protein interactions, using only the protein sequence as input. RNABindR was developed using machine learning on a validated nonredundant data set of interfaces from known RNA-protein complexes in the Protein Data Bank. It generates a classifier that captures primary sequence signals sufficient for predicting which amino acids in a given protein are located in the RNA-protein interface. In leave-one-out cross-validation experiments, RNABindR identifies interface residues with >85% overall accuracy. It can be calibrated by the user to obtain either high specificity or high sensitivity for interface residues. RNABindR, implementing a Naive Bayes classifier, performs as well as a more complex neural network classifier (to our knowledge, the only previously published sequence-based method for RNA binding site prediction) and offers the advantages of speed, simplicity and interpretability of results. RNABindR predictions on the human telomerase protein hTERT are in good agreement with experimental data. The availability of computational tools for predicting which residues in an RNA binding protein are likely to contact RNA should facilitate design of experiments to directly test RNA binding function and contribute to our understanding of the diversity, mechanisms, and regulation of RNA-protein complexes in biological systems. (RNABindR is available as a Web tool from http://bindr.gdcb.iastate.edu.)
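
To make the idea of a sequence-window Naive Bayes classifier concrete, here is a minimal sketch. It is not RNABindR's implementation: the window width, the padding character, the Laplace pseudocount, the class name `WindowNaiveBayes`, and the toy training data are all assumptions chosen for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical padding symbol for positions beyond the sequence ends

def windows(seq, half_width):
    """One window of length 2*half_width+1 centered on each residue of seq."""
    padded = PAD * half_width + seq.upper() + PAD * half_width
    return [padded[i:i + 2 * half_width + 1] for i in range(len(seq))]

class WindowNaiveBayes:
    """Naive Bayes over (window offset, residue) features; class 1 = interface residue."""

    def __init__(self, half_width=1, pseudocount=1.0):
        self.half_width = half_width
        self.pseudocount = pseudocount
        self.class_counts = {0: 0, 1: 0}
        self.feature_counts = {0: Counter(), 1: Counter()}

    def fit(self, sequences, labels):
        """labels: one string of '0'/'1' characters per sequence, one character per residue."""
        for seq, lab in zip(sequences, labels):
            for window, y in zip(windows(seq, self.half_width), lab):
                y = int(y)
                self.class_counts[y] += 1
                for offset, residue in enumerate(window):
                    self.feature_counts[y][(offset, residue)] += 1
        return self

    def log_odds(self, window):
        """log P(class 1 | window) - log P(class 0 | window), with Laplace smoothing."""
        pc = self.pseudocount
        alphabet_size = len(AMINO_ACIDS) + 1  # 20 residues plus the padding symbol
        total = sum(self.class_counts.values())
        log_post = {}
        for y in (0, 1):
            lp = math.log((self.class_counts[y] + pc) / (total + 2 * pc))
            for offset, residue in enumerate(window):
                count = self.feature_counts[y].get((offset, residue), 0)
                lp += math.log((count + pc) / (self.class_counts[y] + pc * alphabet_size))
            log_post[y] = lp
        return log_post[1] - log_post[0]

    def predict(self, seq, threshold=0.0):
        """Label each residue 1 if its window's log-odds exceeds the threshold."""
        return [int(self.log_odds(w) > threshold) for w in windows(seq, self.half_width)]

# Toy training data (invented): lysine/arginine-rich stretches marked as interface.
model = WindowNaiveBayes(half_width=1).fit(["GGKKGG", "AAKRAA"], ["001100", "001100"])
print(model.predict("GGKKGG"))
print(model.predict("GGKKGG", threshold=5.0))
```

Raising `threshold` trades sensitivity for specificity, which mirrors the user calibration the abstract mentions; the value at which to set it would have to be chosen on held-out data.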

19.
20.
Myristoylation by the myristoyl-CoA:protein N-myristoyltransferase (NMT) is an important lipid anchor modification of eukaryotic and viral proteins. Automated prediction of N-terminal N-myristoylation from the substrate protein sequence alone is necessary for large-scale sequence annotation projects but it requires a low rate of false positive hits in addition to a sufficient sensitivity. Our previous analysis of substrate protein sequence variability, NMT sequences and 3D structures has revealed motif properties in addition to the known PROSITE motif that are utilized in a new predictor described here. The composite prediction function (with separate ad hoc parameterization (a) for queries from non-fungal eukaryotes and their viruses and (b) for sequences from fungal species) consists of terms evaluating amino acid type preferences at sequence positions close to the N terminus as well as terms penalizing deviations from the physical property pattern of amino acid side-chains encoded in multi-residue correlation within the motif sequence. The algorithm has been validated with a self-consistency and two jack-knife tests for the learning set as well as with kinetic data for model substrates. The sensitivity in recognizing documented NMT substrates is above 95 % for both taxon-specific versions. The corresponding rate of false positive prediction (for sequences with an N-terminal glycine residue) is close to 0.5 %; thus, the technique is applicable for large-scale automated sequence database annotation. The predictor is available as a public WWW server at the URL http://mendel.imp.univie.ac.at/myristate/.
Additionally, we propose a version of the predictor that identifies a number of proteolytic protein processing sites at internal glycine residues and that evaluates possible N-terminal myristoylation of the protein fragments. A scan of public protein databases revealed new potential NMT targets for which the myristoyl modification may be of critical importance for biological function. Among others, the list includes kinases, phosphatases, proteasomal regulatory subunit 4, kinase interacting proteins KIP1/KIP2, protozoan flagellar proteins, homologues of mitochondrial translocase TOM40, of the neuronal calcium sensor NCS-1 and of the cytochrome c-type heme lyase CCHL. Analyses of complete eukaryote genomes indicate that about 0.5 % of all encoded proteins are apparent NMT substrates except for a higher fraction in Arabidopsis thaliana (approximately 0.8 %).
