首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
In genetic language a peculiar arrangement of biological information is provided by overlapping genes in which the same region of DNA can code for functionally unrelated messages. In this work, the informational content of overlapping genes belonging to prokaryotic and eukaryotic viruses was analyzed. Using information theory indices, we identified in the regions of overlap a first pattern, exhibiting a more uniform base composition and more severe constraints in base ordering with respect to the nonoverlapping regions. This pattern was found to be peculiar to coliphage, avian hepatitis B virus, human lentivirus, and plant luteovirus families. A second pattern, characterized by the occurrence of similar compositional constraints in both types of coding regions, was found to be limited to plant tymoviruses. At the level of codon usage, a low degree of correlation between overlapping and nonoverlapping coding regions characterized the first pattern, whereas a close link was found in tymoviruses, indicating a fine adaptation of the overlapping frame to the original codon choice of the virus. As a result of codon usage correlation analysis, deductions concerning the origin and evolution of several overlapping frames were also proposed. Comparison of amino acid composition revealed an increased frequency of amino acid residues with a high level of degeneracy (arginine, leucine, and serine) in the proteins encoded by overlapping genes; this peculiar feature of overlapping genes can be viewed as a way with which they may expand their coding ability and gain new, specialized functions. Received: 28 October 1996 / Accepted: 29 January 1997  相似文献   

2.
We have studied the statistical constraints on synonymous codon choice to evaluate various proposals regarding the origin of the bias in synonymous codon usage observed by Fiers et al. (1975), Air et al. (1976), Grantham et al. (1980) and others. We have determined the statistical dependence of the degenerate third base on either of its nearest neighbors in mitochondrial, prokaryotic, and eukaryotic coding sequences. We noted an increasing dependence of the third base on its nearest neighbors in moving from mitochrondria to prokaryotes to eukaryotes.A statistical model assuming random equiprobable selection of synonymous codons was found grossly adequate for the mitochondria, but totally indequate for prokaryotes and eukaryotes. A model assuming selection of synonymous codons reflecting a genomic strategy, i.e. the genome hypothesis of Grantham et al. (1980), gave a good approximation of the mitochondrial sequences. A statistical model which exactly maintains codon frequency, but allows the position of corresponding synonymous codons to vary was only grossly adequate for prokaryotes and totally inadequate for eukaryotes. The results of these simulations are consistent with the measures on experimental sequences and suggest that a “frequency constraint” model such as that of Grantham et al. (1980) may be an adequate explanation of the codon usage in mitochondria. However, in addition to this frequency constraint, there may be constraints on synonymous codon choice in prokaryotes due to codon context. Furthermore, any proposal to explain codon usage in eukaryotes must involve a constraint on the context of a codon in the sequence.  相似文献   

3.
MOTIVATION: Viral genomes tend to code in overlapping reading frames to maximize informational content. This may result in atypical codon bias and particular evolutionary constraints. Due to the fast mutation rate of viruses, there is additional strong evidence for varying selection between intra- and intergenomic regions. The presence of multiple coding regions complicates the concept of K(a)/K(s) ratio, and thus begs for an alternative approach when investigating selection strengths. Building on the paper by McCauley and Hein, we develop a method for annotating a viral genome coding in overlapping reading frames. We introduce an evolutionary model capable of accounting for varying levels of selection along the genome, and incorporate it into our prior single sequence HMM methodology, extending it now to a phylogenetic HMM. Given an alignment of several homologous viruses to a reference sequence, we may thus achieve an annotation both of coding regions as well as selection strengths, allowing us to investigate different selection patterns and hypotheses. RESULTS: We illustrate our method by applying it to a multiple alignment of four HIV2 sequences, as well as of three Hepatitis B sequences. We obtain an annotation of the coding regions, as well as a posterior probability for each site of the strength of selection acting on it. From this we may deduce the average posterior selection acting on the different genes. Whilst we are encouraged to see in HIV2, that the known to be conserved genes gag and pol are indeed annotated as such, we also discover several sites of less stringent negative selection within the env gene. To the best of our knowledge, we are the first to subsequently provide a full selection annotation of the Hepatitis B genome by explicitly modelling the evolution within overlapping reading frames, and not relying on simple K(a)/K(s) ratios.  相似文献   

4.
5.
The large open reading frames of insertion sequences from Escherichia coli were examined for their spatial pattern of codon usage bias and distribution of rarely used codons. There is a bias in codon usage that is generally lower toward the terminal ends of the coding regions, which is reflected in the occurrence of an excess of nonpreferred codons in the 3 portions of the coding regions as compared with the 5 portions. In contrast, typical chromosomal genes have a lower codon usage bias toward the 5 ends of the coding regions. These results imply that the selective forces reflected in codon usage bias may differ according to position within the coding sequence. In addition, these constraints apparently differ in important ways between genes contained in insertion sequences and those in the chromosome.  相似文献   

6.
MOTIVATION: Overlapping gene coding sequences (CDSs) are particularly common in viruses but also occur in more complex genomes. Detecting such genes with conventional gene-finding algorithms can be difficult for several reasons. If an overlapping CDS is on the same read-strand as a known CDS, then there may not be a distinct promoter or mRNA. Furthermore, the constraints imposed by double-coding can result in atypical codon biases. However, these same constraints lead to particular mutation patterns that may be detectable in sequence alignments. RESULTS: In this paper, we investigate several statistics for detecting double-coding sequences with pairwise alignments--including a new maximum-likelihood method. We also develop a model for double-coding sequence evolution. Using simulated sequences generated with the model, we characterize the distribution of each statistic as a function of sequence composition, length, divergence time and double-coding frame. Using these results, we develop several algorithms for detecting overlapping CDSs. The algorithms were tested on known overlapping CDSs and other overlapping open reading frames (ORFs) in the hepatitis B virus (HBV), Escherichia coli and Salmonella typhimurium genomes. The algorithms should prove useful for detecting novel overlapping genes--especially short coding ORFs in viruses. AVAILABILITY: Programs may be obtained from the authors. SUPPLEMENTARY INFORMATION: http://biochem.otago.ac.nz/double.html.  相似文献   

7.
Summary Doublet preference analysis was carried out on coding and noncoding regions ofEscherichia coli, Saccharomyces cerevisiae, and human mitochondrial and nuclear DNA. The preference pattern in 1–2 and 2–3 doublets inE. coli andS. cerevisiae correlated with that in noncoding regions. The 3-1 doublet preference inE. coli genes with low optimal codon frequency and inS. cerevisiae genes also showed a correlation with each of their noncoding doublet preference. A mechanism to explain these double preference correlations in doublet preference is presented: mutational biases, the origin of the noncoding region doublet preference, evolved so as to maintain the 1–2 and 2–3 doublet preference, which is determined by codon usage. These biases then acted on the 3-1 doublet, which was almost free of coding constraints, resulting in a similar preference in this doublet.  相似文献   

8.
Purifying and directional selection in overlapping prokaryotic genes   总被引:4,自引:0,他引:4  
In overlapping genes, the same DNA sequence codes for two proteins using different reading frames. Analysis of overlapping genes can help in understanding the mode of evolution of a coding region from noncoding DNA. We identified 71 pairs of convergent genes, with overlapping 3' ends longer than 15 nucleotides, that are conserved in at least two prokaryotic genomes. Among the overlap regions, we observed a statistically significant bias towards the 123:132 phase (i.e. the second codon base in one gene facing the degenerate third position in the second gene). This phase ensures the least mutual constraint on nonconservative amino acid replacements in both overlapping coding sequences. The excess of this phase is compatible with directional (positive) selection acting on the overlapping coding regions. This could be a general evolutionary mode for genes emerging from noncoding sequences, in which the protein sequence has not been subject to selection.  相似文献   

9.
A method for measuring the non-random bias of a codon usage table   总被引:7,自引:3,他引:4       下载免费PDF全文
We describe a new statistical method for measuring bias in the codon usage table of a gene. The test is based on the multinomial and Poisson distributions. The method is used to scan DNA sequences and measure the strength of codon preference. For E. Coli we show that the strength of codon preference is related to levels of gene expression. The method can also be used to compare base triplet frequencies with those expected from the base composition. This second type of codon bias test is useful for distinguishing coding from non-coding regions.  相似文献   

10.
ABSTRACT: BACKGROUND: Gene finding is a complicated procedure that encapsulates algorithms for coding sequence modeling, identification of promoter regions, issues concerning overlapping genes and more. In the present study we focus on coding sequence modeling algorithms; that is, algorithms for identification and prediction of the actual coding sequences from genomic DNA. In this respect, we promote a novel multivariate method known as Canonical Powered Partial Least Squares (CPPLS) as an alternative to the commonly used Interpolated Markov model (IMM). Comparisons between the methods were performed on DNA, codon and protein sequences with highly conserved genes taken from several species with different genomic properties. RESULTS: The multivariate CPPLS approach classified coding sequence substantially better than the commonly used IMM on the same set of sequences. We also found that the use of CPPLS with codon representation gave significantly better classification results than both IMM with protein (p < 0.001) and with DNA (p < 0.001). Further, although the mean performance was similar, the variation of CPPLS performance on codon representation was significantly smaller than for IMM (p < 0.001). CONCLUSIONS: The performance of coding sequence modeling can be substantially improved by using an algorithm based on the multivariate CPPLS method applied to codon or DNA frequencies.  相似文献   

11.
12.
The hepatitis C virus (HCV) infects at least 3 % of people worldwide, and continues to cause substantial morbidity and mortality. To better understand the phylodynamics and molecular evolution of the HCV 1a, a phylogenetic analysis of 186 full-length genomic sequences isolated from five countries between 1977 and 2009 was conducted in this study. Nucleotide substitution rates and molecular epidemiology were assessed by Bayesian coalescent analysis using time-stamped entire coding sequences. We showed that the substitution rates of ten genomic regions are diverse and higher than those of previously estimated. The coalescent analysis indicated that the transmission of subtype 1a probably started in the second half of the twentieth century and an explosion of epidemics occurred between 1960s and 1980s. Selection analysis suggested that the HCV 1a evolves under purifying selection. However, a total of 58 positively selected sites were detected and further analysis suggested that these sites may play an important role in adaptive evolution of HCV 1a strains. In addition, the codon usage and the factors accounting for shaping the codon usage pattern of HCV 1a were investigated to evaluate the dynamics of the virus evolution. Surveys of codon usage variation showed that mutational pressure and selection pressure account for HCV 1a codon usage pattern.  相似文献   

13.

Background  

Detecting new coding sequences (CDSs) in viral genomes can be difficult for several reasons. The typically compact genomes often contain a number of overlapping coding and non-coding functional elements, which can result in unusual patterns of codon usage; conservation between related sequences can be difficult to interpret – especially within overlapping genes; and viruses often employ non-canonical translational mechanisms – e.g. frameshifting, stop codon read-through, leaky-scanning and internal ribosome entry sites – which can conceal potentially coding open reading frames (ORFs).  相似文献   

14.
Behura SK  Severson DW 《Gene》2012,504(2):226-232
We present a detailed genome-scale comparative analysis of simple sequence repeats within protein coding regions among 25 insect genomes. The repetitive sequences in the coding regions primarily represented single codon repeats and codon pair repeats. The CAG triplet is highly repetitive in the coding regions of insect genomes. It is frequently paired with the synonymous codon CAA to code for polyglutamine repeats. The codon pairs that are least repetitive code for polyalanine repeats. The frequency of hexanucleotide and dinucleotide motifs of codon pair repeats is significantly (p<0.001) different in the Drosophila species compared to the non-Drosophila species. However, the frequency of synonymous and non-synonymous codon pair repeats varies in a correlated manner (r(2)=0.79) among all the species. Results further show that perfect and imperfect repeats have significant association with the trinucleotide and hexanucleotide coding repeats in most of these insects. However, only select species show significant association between the numbers of perfect/imperfect hexamers and repeat coding for single amino acid/amino acid pair runs. Our data further suggests that genes containing simple sequence coding repeats may be under negative selection as they tend to be poorly conserved across species. The sequences of coding repeats of orthologous genes vary according to the known phylogeny among the species. In conclusion, the study shows that simple sequence coding repeats are important features of genome diversity among insects.  相似文献   

15.
This paper describes a computer method that uses codon preference to help find protein coding regions in long DNA sequences. The method can distinguish between introns and exons and can help to detect sequencing errors.  相似文献   

16.
Codon preference in Dictyostelium discoideum.   总被引:15,自引:5,他引:10       下载免费PDF全文
Dictyostelium discoideum is of increasing interest as a model eukaryotic cell because its many attributes have recently been expanded to include improved genetic and biochemical manipulability. The ability to transform Dictyostelium using drug resistance as a selectable marker (1) and to gene target by high frequency homologous integration (2) makes this organism particularly useful for molecular genetic approaches to cell structure and function. Given this background, it becomes important to analyze the codon preference used in this organism. Dictyostelium displays a strong and unique overall codon preference. This preference varies between different coding regions and even varies between coding regions from the same gene family. The degree of codon preference may be correlated with expression levels but not with the developmental time of expression of the gene product. The strong codon preference can be applied to identify coding regions in Dictyostelium DNA and aid in the design of oligonucleotide probes for cloning Dictyostelium genes.  相似文献   

17.
The trpB and trpA coding regions of the polycistronic trp mRNA of Escherichia coli are separated by overlapping translation stop and start codons. Efficient translation of the trpA coding region is subject to translational coupling, i.e., maximal trpA expression is dependent on prior translation of the trpB coding region. Previous studies demonstrated that the trpA Shine-Dalgarno sequence (within trpB) and/or the location of the trpB stop codon influenced trpA expression. To examine the effect of stop codon location specifically, we constructed plasmids in which different nucleotide sequences preceding the trpA start codon were retained, and only the reading frame was changed. When trpB translation proceeded in the wild type reading frame and terminated at the normal trpB stop codon, trpA polypeptide levels were elevated over the levels observed when translation stopped before or after the natural trpB stop codon. The proximity of the trpB stop codon to the trpA start codon therefore markedly influences trpA expression.  相似文献   

18.
MOTIVATION: Detecting genes in viral genomes is a complex task. Due to the biological necessity of them being constrained in length, RNA viruses in particular tend to code in overlapping reading frames. Since one amino acid is encoded by a triplet of nucleic acids, up to three genes may be coded for simultaneously in one direction. Conventional hidden Markov model (HMM)-based gene-finding algorithms may typically find it difficult to identify multiple coding regions, since in general their topologies do not allow for the presence of overlapping or nested genes. Comparative methods have therefore been restricted to likelihood ratio tests on potential regions as to being double or single coding, using the fact that the constrictions forced upon multiple-coding nucleotides will result in atypical sequence evolution. Exploiting these same constraints, we present an HMM based gene-finding program, which allows for coding in unidirectional nested and overlapping reading frames, to annotate two homologous aligned viral genomes. Our method does not insist on conserved gene structure between the two sequences, thus making it applicable for the pairwise comparison of more distantly related sequences. RESULTS: We apply our method to 15 pairwise alignments of six different HIV2 genomes. Given sufficient evolutionary distance between the two sequences, we achieve sensitivity of approximately 84-89% and specificity of approximately 97-99.9%. We additionally annotate three pairwise alignments of the more distantly related HIV1 and HIV2, as well as of two different hepatitis viruses, attaining results of approximately 87% sensitivity and approximately 98.5% specificity. We subsequently incorporate prior knowledge by 'knowing' the gene structure of one sequence and annotating the other conditional on it. Boosting accuracy close to perfect we demonstrate that conservation of gene structure on top of nucleotide sequence is a valuable source of information, especially in distantly related genomes. AVAILABILITY: The Java code is available from the authors.  相似文献   

19.
20.
The aim of this study was to analyze patterns of nucleotidic composition and codon usage in the pea aphid genome (Acyrthosiphon pisum). A collection of 60,000 expressed sequence tags (ESTs) in the pea aphid has been used to automatically reconstruct 5809 coding sequences (CDSs), based on similarity with known proteins and on coding style recognition. Reconstructions were manually checked for ribosomal proteins, leading to tentatively reconstruct the nea-complete set of this category. Pea aphid coding sequences showed a shift toward AT (especially at the third codon position) compared to drosophila homologues. Genes with a putative high level of expression (ribosomal and other genes with high EST support) remained more GC3-rich and had a distinct codon usage from bulk sequences: they exhibited a preference for C-ending codons and CGT (for arginine), which thus appeared optimal for translation. However, the discrimination was not as strong as in drosophila, suggesting a reduced degree of translational selection. The space of variation in codon usage for A. pisum appeared to be larger than in drosophila, with a substantial fraction of genes that remained GC3-rich. Some of those (in particular some structural proteins) also showed high levels of codon bias and a very strong preference for C-ending codons, which could be explained either by strong translational selection or by other mechanisms. Finally, genomic traces were analyzed to build 206 fragments containing a full CDS, which allowed studying the correlations between GC contents of coding and those of noncoding (flanking and introns) sequences.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号