Similar literature
 20 similar articles found (search time: 15 ms)
1.
Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.
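
As a rough baseline for the chance-match frequency mentioned in this abstract, the sketch below computes the expected number of exact matches to a probe under the naive model in which the four nucleotides are independent and equally frequent. The abstract reports that real gene banks match more often than this model predicts, so the figure is only a starting point. The function name and its parameters are hypothetical and do not come from the paper.

```python
def expected_chance_matches(target_length, probe_length, strands=1):
    """Expected number of exact probe matches in a random target.

    Assumes each position is A, C, G or T with probability 1/4,
    independently of its neighbours (a simplification the abstract
    itself says does not hold for real gene banks).
    """
    positions = target_length - probe_length + 1  # possible start sites
    if positions <= 0:
        return 0.0
    return strands * positions * 0.25 ** probe_length


# A 17-mer against a 3-billion-base target, searching both strands.
print(expected_chance_matches(3_000_000_000, 17, strands=2))
```

Under this model each extra probe nucleotide cuts the expected background fourfold, which is one way to see why longer probes are more specific.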

2.
The similarity of two nucleotide sequences is often expressed in terms of evolutionary distance, a measure of the amount of change needed to transform one sequence into the other. Given two sequences with a small distance between them, can their similarity be explained by their base composition alone? The nucleotide order of these sequences contributes to their similarity if the distance is much smaller than their average permutation distance, which is obtained by calculating the distances for many random permutations of these sequences. To determine whether their similarity can be explained by their dinucleotide and codon usage, random sequences must be chosen from the set of permuted sequences that preserve dinucleotide and codon usage. The problem of choosing random dinucleotide and codon-preserving permutations can be expressed in the language of graph theory as the problem of generating random Eulerian walks on a directed multigraph. An efficient algorithm for generating such walks is described. This algorithm can be used to choose random sequence permutations that preserve (1) dinucleotide usage, (2) dinucleotide and trinucleotide usage, or (3) dinucleotide and codon usage. For example, the similarity of two 60-nucleotide DNA segments from the human beta-1 interferon gene (nucleotides 196-255 and 499-558) is not just the result of their nonrandom dinucleotide and codon usage.
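
The Eulerian-walk idea in this abstract can be illustrated with a minimal sketch that covers only case (1), the preservation of dinucleotide usage. It treats each nucleotide as a vertex and each adjacent pair as a directed edge, picks a random "last exit" edge for every vertex until those edges all lead to the final nucleotide, shuffles the remaining edges, and then walks the graph. Whether this matches the paper's algorithm in detail is an assumption, and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def dinucleotide_counts(seq):
    """Count each adjacent pair of symbols in seq."""
    return Counter(zip(seq, seq[1:]))


def _reaches(vertex, target, last_edge):
    """Follow last-exit edges from vertex; True if they lead to target."""
    seen = set()
    while vertex != target:
        if vertex in seen:  # cycle that never reaches the target
            return False
        seen.add(vertex)
        vertex = last_edge[vertex]
    return True


def dinucleotide_shuffle(seq, rng=None):
    """Return a random permutation of seq with the same dinucleotide counts."""
    rng = rng or random.Random()
    if len(seq) < 3:
        return seq
    edges = defaultdict(list)  # vertex -> list of successor symbols
    for a, b in zip(seq, seq[1:]):
        edges[a].append(b)
    first, last = seq[0], seq[-1]
    vertices = [v for v in edges if v != last]
    # Choose last-exit edges until they form a tree rooted at the final symbol.
    while True:
        last_edge = {v: rng.choice(edges[v]) for v in vertices}
        if all(_reaches(v, last, last_edge) for v in vertices):
            break
    # Shuffle the other edges and put the chosen last exit at the end.
    order = {}
    for v, targets in edges.items():
        rest = list(targets)
        if v in last_edge:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        if v in last_edge:
            rest.append(last_edge[v])
        order[v] = rest
    # Walk the graph, using each vertex's edges in the chosen order.
    out, used, current = [first], Counter(), first
    for _ in range(len(seq) - 1):
        nxt = order[current][used[current]]
        used[current] += 1
        out.append(nxt)
        current = nxt
    return "".join(out)


original = "ACGTTGCAACGGTCATTAGC"
shuffled = dinucleotide_shuffle(original, random.Random(1))
print(shuffled, dinucleotide_counts(shuffled) == dinucleotide_counts(original))
```

The shuffled sequence keeps the same first and last symbols as the original, which follows from walking every edge of the same multigraph exactly once.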

3.
4.
A computer program (PINCERS) is described for use in the design of synthetic genes and mixed-probe DNA sequences. A protein sequence is reverse translated with generation of synonymous codons at each position, producing a degenerate sequence. In order to locate potential restriction enzyme sites, the degenerate sequence is searched with a library of restriction enzymes for sites that utilize any combination of synonymous codons. These sites are indicated in a map so that they may be incorporated into the synthetic gene sequence. The program allows the user to select the appropriate codon usage table for the organism of interest and then to set a threshold usage frequency below which codons are not generated. PINCERS may also be used to assist in planning the synthesis of mixed-probe DNA sequences for cross-hybridization experiments. It can identify regions of specified length within the protein sequence that have the least overall degeneracy, thereby minimizing the number of probes to be synthesized and, therefore, maximizing the concentration of a given probe sequence.
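
The least-degeneracy search described in this abstract can be sketched as follows. This is not the PINCERS code: it simply assumes the standard genetic code, multiplies the number of synonymous codons for each residue in a window, and reports the window with the smallest product. The names `CODON_COUNT` and `least_degenerate_window` are hypothetical, and the usage-frequency threshold mentioned in the abstract is not modelled.

```python
import math

# Number of synonymous codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def least_degenerate_window(protein, width):
    """Return (start, degeneracy, peptide) for the least degenerate window.

    Degeneracy is the number of distinct DNA sequences that could encode
    the peptide, i.e. the size of a fully mixed probe pool.
    """
    best = None
    for start in range(len(protein) - width + 1):
        peptide = protein[start:start + width]
        degeneracy = math.prod(CODON_COUNT[aa] for aa in peptide)
        if best is None or degeneracy < best[1]:
            best = (start, degeneracy, peptide)
    return best


print(least_degenerate_window("LSRMWMKDA", 3))  # (3, 1, 'MWM')
```

Windows rich in methionine and tryptophan score lowest because each has a single codon, which is why such stretches are attractive targets for mixed probes.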

5.
Lavner Y, Kotlar D. Gene, 2005, 345(1): 127-138
We study the interrelations between tRNA gene copy numbers, gene expression levels and measures of codon bias in the human genome. First, we show that isoaccepting tRNA gene copy numbers correlate positively with expression-weighted frequencies of amino acids and codons. Using expression data of more than 14,000 human genes, we show a weak positive correlation between gene expression level and frequency of optimal codons (codons with highest tRNA gene copy number). Interestingly, contrary to non-mammalian eukaryotes, codon bias tends to be high in both highly expressed genes and lowly expressed genes. We suggest that selection may act on codon bias, not only to increase elongation rate by favoring optimal codons in highly expressed genes, but also to reduce elongation rate by favoring non-optimal codons in lowly expressed genes. We also show that the frequency of optimal codons is in positive correlation with estimates of protein biosynthetic cost, and suggest another possible action of selection on codon bias: preference of optimal codons as production cost rises, to reduce the rate of amino acid misincorporation. In the analyses of this work, we introduce a new measure of frequency of optimal codons (FOP'), which is unaffected by amino acid composition and is corrected for background nucleotide content; we also introduce a new method for computing expected codon frequencies, based on the dinucleotide composition of the introns and the non-coding regions surrounding a gene.

6.
Adaptive codon usage provides evidence of natural selection in one of its most subtle forms: a fitness benefit of one synonymous codon relative to another. Codon usage bias is evident in the coding sequences of a broad array of taxa, reflecting selection for translational efficiency and/or accuracy as well as mutational biases. Here, we quantify the magnitude of selection acting on alternative codons in genes of the nematode Caenorhabditis remanei, an outcrossing relative of the model organism C. elegans, by fitting the expected mutation-selection-drift equilibrium frequency distribution of preferred and unpreferred codon variants to the empirical distribution. This method estimates the intensity of selection on synonymous codons in genes with high codon bias as N(e)s = 0.17, a value significantly greater than zero. In addition, we demonstrate for the first time that estimates of ongoing selection on codon usage among genes, inferred from nucleotide polymorphism data, correlate strongly with long-term patterns of codon usage bias, as measured by the frequency of optimal codons in a gene. From the pattern of polymorphisms in introns, we also infer that these findings do not result from the operation of biased gene conversion toward G or C nucleotides. We therefore conclude that coincident patterns of current and ancient selection are responsible for shaping biased codon usage in the C. remanei genome.

7.
A computer program, which runs on MS-DOS personal computers, is described that assists in the design of synthetic genes coding for proteins. The goal of the program is the design of a gene which (i) contains as many unique restriction sites as possible and (ii) uses a specific codon usage. The gene designed according to the criteria above is (i) suitable for 'modular mutagenesis' experiments and (ii) optimized for expression. The program 'reverse-translates' protein sequences into degenerated DNA sequences, generates a map of potential restriction sites and locates sequence positions where unique restriction sites can be accommodated. The nucleic acid sequence is then 'refined' according to a specific codon usage to remove any degeneration. Unique restriction sites, if potentially present, can be 'forced' into the degenerated nucleic acid sequence by using 'priority codes' assigned to different restriction sequences.

8.
Although non-coding RNA (ncRNA) genes do not encode proteins, they play vital roles in cells by producing functionally important RNAs. In this paper, we present a novel method for predicting ncRNA genes based on compositional features extracted directly from gene sequences. Our method consists of two Support Vector Machine (SVM) models: a Codon model, which uses codon usage features derived from ncRNA genes and protein-coding genes, and a Kmer model, which uses nucleotide and dinucleotide frequency features extracted from ncRNA genes and from randomly chosen genome sequences, respectively. The 10-fold cross-validation accuracy for the two models is found to be 92% and 91%, respectively. Thus, we could make an automatic prediction of ncRNA genes in one genome without manual filtration of protein-coding genes. After applying our method to the Sulfolobus solfataricus genome, 25 prediction results were generated according to 25 cut-off pairs. We also applied the approach to E. coli and found our results comparable to those of previous studies. In general, our method enables automatic identification of ncRNA genes in newly sequenced prokaryotic genomes.

9.
Isolation and structure of a rat cytochrome c gene   Cited by: 18 (self-citations: 0, citations by others: 18)
We screened a Charon 4A-rat genomic library using the cloned iso-1 cytochrome c gene from Saccharomyces cerevisiae as a specific hybridization probe. Eight different recombinant phages homologous to a coding region subfragment of the yeast gene were isolated. Nucleotide sequence analysis of a 0.96-kilobase portion of one of these established the existence of a gene coding for a cytochrome c identical in amino acid sequence with that of mouse. The rat polypeptide chain sequence had not previously been determined. In contrast to the yeast iso-1 and iso-2 cytochrome c genes, neither of which has introns, the rat gene contains a single 105-base pair intervening sequence interrupting glycine codon 56. The overall nucleotide sequence homology between cytochrome c genes of yeast and rat is about 62%, with areas of greater homology coinciding with four regions of functionally constrained amino acid sequences. Two of these regions displayed 85-90% DNA sequence homology, including the longest consecutive homologous stretch of 14 nucleotides, corresponding to amino acids 47-52 of the rat protein. Somewhat less homology was observed in the DNA specifying amino acids 70-80, which are invariant residues in most known cytochrome c molecules. Thermal dissociation of the yeast probe from the homologous rat DNA was at about 58 °C in 0.39 M Na+. These results establish that cytochrome c genes may be isolated by interspecies hybridization between widely divergent organisms.

10.
Summary: Based on the rates of synonymous substitution in 42 protein-coding gene pairs from rat and human, a correlation is shown to exist between the frequency of the nucleotides in all positions of the codon and the synonymous substitution rate. The correlation coefficients were positive for A and T and negative for C and G. This means that AT-rich genes accumulate more synonymous substitutions than GC-rich genes. Biased patterns of mutation could not account for this phenomenon. Thus, the variation in synonymous substitution rates and the resulting unequal codon usage must be the consequence of selection against A and T in synonymous positions. Most of the variation in rates of synonymous substitution can be explained by the nucleotide composition in synonymous positions. Codon-anticodon interactions, dinucleotide frequencies, and contextual factors influence neither the rates of synonymous substitution nor codon usage. Interestingly, the nucleotide in the second position of codons (always a nonsynonymous position) was found to affect the rate of synonymous substitution. This finding links the rate of nonsynonymous substitution with the synonymous rate. Consequently, highly conservative proteins are expected to be encoded by genes that evolve slowly in terms of synonymous substitutions, and are consequently highly biased in their codon usage.

11.
Summary: The nucleic acid sequences coding for 23 H3 histone genes from a variety of species have been analyzed using a computer-assisted alignment and analysis program. Although these histones are highly conserved within and between highly divergent species, they represent various classes of histones whose patterns of expression are distinctively regulated. Surprisingly, in dendrograms derived from these comparisons, H3 sequences cluster according to their modes of regulation rather than phylogenetically. These clusters are generated from highly distinctive patterns of codon usage within the functional gene classes. We suggest that one factor involved in specifying the differing codon usage patterns between functional classes is a difference in requirements for rapid translation of mRNA. In addition, the data presented here, together with structural and sequence information, suggest a heterodox evolutionary model in which genes related to the intron-bearing, basally expressed H3.3 vertebrate genes are the ancestors of the intronless H3.1 class of genes of higher eukaryotes. The H3.1 class must have arisen, therefore, following duplication of a primitive H3.3 gene, but prior to the plant-animal divergence. Implications of the data presented are discussed with regard to functional and evolutionary relationships.

12.
Different mechanisms regulate the expression level of tissue-specific genes in humans. Here we report some compositional features, such as codon usage bias, amino acid usage bias, codon frequency, and base composition, which may be related to the mRNA amount of tissue-specific tumor suppressor genes. Our findings support the possibility that structural elements in genes and proteins may play an important role in the regulation of tumor suppressor genes, development, and tumorigenesis. The data presented here can open broad vistas in the understanding and treatment of a variety of human malignancies.

13.
14.
We present a simple method to detect pathogenicity islands and anomalous gene clusters in bacterial genomes. The method uses iterative discriminant analysis to define genomic regions that deviate most from the rest of the genome in three compositional criteria: G+C content, dinucleotide frequency and codon usage. Using this method, we identify many virulence-related gene islands, e.g. encoding protein secretion systems, adhesins, toxins, and other anomalous gene clusters, such as prophages. The program and the whole dataset, including the catalogs of genes in the detected anomalous segments, are publicly available at http://compbio.sibsnet.org/projects/pai-ida/. This program can be used in searching for virulence-related factors in newly sequenced bacterial genomes.
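
To make the first of the three compositional criteria concrete, the sketch below scores fixed-size windows by how far their G+C content departs from the genome-wide value. It is only an illustration of the idea of compositional deviation: it does not implement the iterative discriminant analysis described in the abstract, it ignores dinucleotide frequency and codon usage, and the function names and window parameters are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_deviation_windows(genome, window, step):
    """Return (start, deviation) pairs for windows along the genome.

    deviation = window G+C content minus genome-wide G+C content.
    """
    overall = gc_content(genome)
    return [
        (start, gc_content(genome[start:start + window]) - overall)
        for start in range(0, len(genome) - window + 1, step)
    ]


# Toy genome: an AT-only half followed by a GC-only half.
print(gc_deviation_windows("ATATATATGCGCGCGC", window=8, step=8))
# [(0, -0.5), (8, 0.5)]
```

Windows with large absolute deviations are the kind of region a compositional method would flag for closer inspection.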

15.
Summary: This paper reports on the relationship between the number of silent differences and the codon usage changes in the lineages leading to human and rat. Examination of 102 pairs of homologous genes gives rise to four main conclusions: (1) We have previously demonstrated the existence of a codon usage change (called the minor shift) between human and rat; this was confirmed here with a larger sample. For genes with extreme C+G frequencies, the C+G level in the third codon position is less extreme in rat than in human. (2) Protein similarity and percentage of positive differences are the two main factors that discriminate homologous genes when characterized by differences between rat and human. By definition, positive differences result from silent changes between A or T and C or G with a direction implying a C+G content variation in the same direction as the overall gene variation. (3) For genes showing both codon usage change and low protein similarity, a majority of amino acid replacements contributes to C+G level variation in positions I and II in the same direction as the variation in position III. This is thus a new example of protein evolution due to constraints acting at the DNA level. (4) In heavy isochores (high C+G content) no direct correlation exists between codon usage change (measured by the dissymmetry of differences) and silent dissimilarity. In light isochores the opposite situation is observed: modification of codon usage is associated with a high synonymous dissimilarity. This result shows that, in some cases, modification of constraints acting at the DNA level could accelerate divergence between genomes.

16.
Role of the code redundancy in determining cotranslational protein folding   Cited by: 1 (self-citations: 0, citations by others: 1)
It has been demonstrated earlier in our laboratory that rare codon clusters can determine the boundaries of the polypeptide chain fragments of the same secondary structure type during co-translational protein folding. According to these data, co-translational protein folding can occur under the condition of a correlation between the frequency of codon choice in mRNAs and the relative abundance of their isoaccepting tRNAs. Alterations in the spectrum and concentrations of the isoaccepting tRNAs in different cells have been demonstrated by many authors. The existence of a mechanism for the coordinate regulation of the levels (activities) of the isoaccepting tRNAs, the corresponding aminoacyl-tRNA synthetases and the mRNAs predominantly translated at a given moment can be suggested. Such a mechanism can ensure the needed accuracy of the protein folding process. Analysis of gene sequences of various pro- and eukaryotic organisms carried out in the present work revealed that the codon usage frequency spectra of simultaneously synthesized proteins are similar. The relative occurrence of the rarest and most frequent codons in the investigated gene sequences displays a high degree of conservation. It has also been found that structurally homologous proteins from different organisms (cytochromes c, myoglobins) have very similar codon frequency distribution profiles. This property is retained despite significant variations in the codon usage spectra of the investigated gene sequences. The data obtained indicate that the codon distribution in mRNAs, whose diversity is mainly conditioned by the redundancy of the genetic code, is a program that determines translational rates of different mRNA parts, thus controlling the spatial folding of the synthesized peptide chain.

17.
MOTIVATION: Analysis of the functions of microorganisms and their dynamics in the environment is essential for understanding microbial ecology. For analysis of highly similar sequences of a functional gene family using microarrays, the previous long oligonucleotide probe design strategies have not been useful in generating probes. RESULTS: We developed a Hierarchical Probe Design (HPD) program that designs both sequence-specific probes and hierarchical cluster-specific probes from sequences of a conserved functional gene based on the clustering tree of the genes, specifically for analyses of functional gene diversity in environmental samples. HPD was tested on datasets for the nirS and pmoA genes. Our results showed that HPD generated more sequence-specific probes than several popular oligonucleotide design programs. With a combination of sequence-specific and cluster-specific probes, HPD generated a probe set covering all the sequences of each test set. AVAILABILITY: http://brcapp.kribb.re.kr/HPD/

18.
A novel bias in codon third-letter usage was found in Escherichia coli genes with low fractions of "optimal codons", by comparing intact sequences with control random sequences. Third-letter usage has been found to be biased according to preference in codon usage and to doublet preference from the following first letter. The present study examines third-letter usage in the context of the nucleotide sequence when these preferences are considered. In order to exclude any influence by these factors, the random sequences were generated such that the amino acid sequence, codon usage, and the doublet frequency in each gene were all preserved. Comparison of intact sequences with these randomly generated sequences reveals that third letters of codons show a strong preference for the purine/pyrimidine pattern of the next codons: purine (R) is preferred to pyrimidine (Y) at the third site when followed by an R-Y-R codon, and pyrimidine is preferred when followed by an R-R-Y, an R-Y-Y or a Y-R-Y codon. This bias is probably related to interactions of tRNA molecules in the ribosome.

19.
Codon usage in bacteria: correlation with gene expressivity   Cited by: 153 (self-citations: 53, citations by others: 100)
The nucleic acid sequence bank now contains over 600 protein coding genes of which 107 are from prokaryotic organisms. Codon frequencies in each new prokaryotic gene are given. Analysis of genetic code usage in the 83 sequenced genes of the Escherichia coli genome (chromosome, transposons and plasmids) is presented, taking into account new data on gene expressivity and regulation as well as iso-tRNA specificity and cellular concentration. The codon composition of each gene is summarized using two indexes: one is based on the differential usage of iso-tRNA species during gene translation, the other on choice between Cytosine and Uracil for third base. A strong relationship between codon composition and mRNA expressivity is confirmed, even for genes transcribed in the same operon. The influence of codon use on peptide elongation rate and protein yield is discussed. Finally, the evolutionary aspect of codon selection in mRNA sequences is studied.

20.
As shown in the accompanying paper (5), the oligonucleotide composition of the E. coli genome is highly asymmetric for sequences up to 6 bp in length when ranked from highest to lowest abundance. We show here that this largely reflects codon usage because heavily used codons were found in the highly abundant oligomers whereas rarely used codons, with some exceptions, occurred in sequences in low abundance. Furthermore, linear regression analysis revealed a strong correlation between the frequencies of each trinucleotide and its usage as a codon. Dinucleotides are also not randomly distributed across each codon position and the dinucleotide composition of genes that are transcribed but not translated (rRNA and tRNA genes) was highly related to that seen in genes encoding polypeptides. However, 45 tetra-, 8 penta-, and 6 hexanucleotides were significantly over- or underabundant by Markov chain analysis and could not be accounted for by codon usage. Of these underrepresented sequences, many were palindromes, including the Dam methylation site.
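
One common way to judge whether an oligonucleotide is over- or underabundant is to compare its observed count with the count expected from its two overlapping sub-words, using a maximal-order Markov model. The abstract does not state which Markov order or significance test the authors used, so the sketch below is an assumption-laden illustration rather than their procedure, and the function names are hypothetical.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_expected(seq, word):
    """Expected count of word from its (k-1)-mer prefix and suffix.

    E(w1..wk) = N(w1..wk-1) * N(w2..wk) / N(w2..wk-1)
    """
    k = len(word)
    if k < 3:
        raise ValueError("word must have at least 3 symbols")
    shorter = count_kmers(seq, k - 1)
    middle = count_kmers(seq, k - 2)[word[1:-1]]
    if middle == 0:
        return 0.0
    return shorter[word[:-1]] * shorter[word[1:]] / middle


seq = "ACGACGACGT"
observed = count_kmers(seq, 3)["ACG"]
print(observed, markov_expected(seq, "ACG"))  # 3 3.0
```

A ratio of observed to expected well below 1 would mark a word as underrepresented, as the abstract reports for many palindromes.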
