Similar literature
 20 similar articles found
1.
The 3-base periodicity, identified as a pronounced peak at the frequency N/3 (N is the length of the DNA sequence) of the Fourier power spectrum of protein coding regions, is used as a marker in gene-finding algorithms to distinguish protein coding regions (exons) and noncoding regions (introns) of genomes. In this paper, we show that this phenomenon results from a nonuniform distribution of nucleotides in the three codon positions. There is a linear correlation between the nucleotide distributions in the three codon positions and the power spectrum at the frequency N/3. Furthermore, this study indicates the relationship between the length of a DNA sequence and the variance of nucleotide distributions and the average Fourier power spectrum, which is the noise signal in gene-finding methods. The results presented in this paper provide an efficient way to compute the Fourier power spectrum at N/3 and the noise signal in gene-finding methods by calculating the nucleotide distributions in the three codon positions.
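
The link between codon-position counts and the N/3 peak can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes the usual four binary indicator sequences (one per base) and defines the power at N/3 as the sum of their squared DFT magnitudes. The function names are hypothetical. Grouping the DFT terms by position modulo 3 gives, for each base with counts f1, f2, f3 at the three codon positions, the closed form f1² + f2² + f3² − f1·f2 − f1·f3 − f2·f3.

```python
import cmath
from collections import Counter

BASES = "ACGT"

def power_at_one_third_direct(seq):
    """Sum over the four bases of |U_x(N/3)|^2, evaluated term by term.

    U_x(N/3) is the DFT of the 0/1 indicator sequence of base x,
    whose n-th term carries the phase exp(-2*pi*i*n/3).
    """
    total = 0.0
    for base in BASES:
        u = sum(cmath.exp(-2j * cmath.pi * n / 3)
                for n, s in enumerate(seq) if s == base)
        total += abs(u) ** 2
    return total

def power_at_one_third_from_counts(seq):
    """Same quantity, computed only from per-codon-position base counts."""
    counts = Counter((s, n % 3) for n, s in enumerate(seq))
    total = 0.0
    for base in BASES:
        f1, f2, f3 = (counts[(base, p)] for p in range(3))
        total += f1 * f1 + f2 * f2 + f3 * f3 - f1 * f2 - f1 * f3 - f2 * f3
    return total
```

A sequence whose bases are spread evenly over the three positions gives zero power at N/3, while a strictly period-3 sequence gives the maximum, which is the nonuniformity argument of the abstract in miniature.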

2.
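A preliminary study of coding sequences lacking 3-base periodicity   Total citations: 4, self-citations: 0, citations by others: 4
Fourier spectrum analysis of 120 relatively short coding sequences (<1,200 bp) shows that 3-base periodicity is not always present in short coding sequences. Statistical analysis suggests that whether a coding sequence shows 3-base periodicity is related to the base composition and distribution of the sequence, to the choice and order of amino acids in the encoded protein, and to synonymous codon usage. In general, A+U content exceeds G+C content in non-period-3 sequences, whereas the opposite holds in period-3 sequences; bases are distributed more evenly over the three codon positions in non-period-3 sequences than in period-3 sequences; and the codon and amino acid usage biases of non-period-3 sequences are weaker than those of period-3 sequences. These phenomena should be taken fully into account when Fourier analysis is used to predict genes and exons in DNA sequences.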

3.
The complete nucleotide sequence of the mitochondrial DNA of the amphioxus Branchiostoma lanceolatum has been determined. This mitochondrial genome is small (15,076 bp) because of the short size of the two rRNA genes and the tRNA genes. In addition, this genome contains a very short non-coding region (57 bp) with no sequence reminiscent of a control region. The organisation of the coding genes, as well as of the two rRNA genes, is identical to that of the sea lamprey. Some differences in the arrangement of the tRNA genes occur relative to the lamprey. The mitochondrial codon usage of the amphioxus is reminiscent of that of urochordates since the AGA codon is read as a glycine and not as a stop codon as in vertebrates. Moreover, the base composition at the wobble positions of the codon is strongly biased toward guanine. Altogether, these data clearly emphasise the close relationships between amphioxus and vertebrates, and reinforce the notion that prochordates may be viewed as the sister group of vertebrates.

4.
It is well known that, owing to the degeneracy of the genetic code, most silent substitutions appear in the third codon position, so the mutation frequency of the third codon position is much higher than that of the first two positions. However, it remains unknown whether the directionality of point mutation is similar in the three codon positions. In this paper, through analyzing 15 sets of orthologous genes, it is revealed that most of the substitution types are significantly different between any two codon positions, especially between the 2nd and the 3rd phases. Furthermore, the average frequencies of each type of substitution calculated from the fifteen sets of orthologous genes are similar to those identified in single nucleotide polymorphisms (SNPs) of the human and mouse genomes. The present analyses suggest that nucleotide substitution in protein-coding sequences is not only context-dependent (the so-called neighboring-nucleotide effects), but also phase-dependent, which is of significance for improving the prevalent nucleotide-evolution models.

5.
The hepatitis B virus (HBV) has a circular DNA genome of about 3,200 base pairs. Economical use of the genome with overlapping reading frames may have led to severe constraints on nucleotide substitutions along the genome and to highly variable rates of substitution among nucleotide sites. Nucleotide sequences from 13 complete HBV genomes were compared to examine such variability of substitution rates among sites and to examine the phylogenetic relationships among the HBV variants. The maximum likelihood method was employed to fit models of DNA sequence evolution that can account for the complexity of the pattern of nucleotide substitution. Comparison of the models suggests that the rates of substitution are different in different genes and codon positions; for example, the third codon position changes at a rate over ten times higher than the second position. Furthermore, substantial variation of substitution rates was detected even after the effects of genes and codon positions were corrected; that is, rates are different at different sites of the same gene or at the same codon position. Such rates after the correction were also found to be positively correlated at adjacent sites, which indicated the existence of conserved and variable domains in the proteins encoded by the viral genome. A multiparameter model validates the earlier finding that the variation in nucleotide conservation is not random around the HBV genome. The test for the existence of a molecular clock suggests that substitution rates are more or less constant among lineages. The phylogenetic relationships among the viral variants were examined. Although the data do not seem to contain sufficient information to resolve the details of the phylogeny, it appears quite certain that the serotypes of the viral variants do not reflect their genetic relatedness. Correspondence to: Z. Yang

6.
《Genomics》2020,112(6):4657-4665
Given the high therapeutic value of staphylococcal phages, the genome co-evolution of the phage and the host has gained great attention. Although the genome-wide AT richness of staphylococcal phages has been well studied in terms of nucleotide usage bias, here we show, using an information entropy formula, that host factor, lifestyle and taxonomy are also important factors in understanding the phage nucleotide usage bias. This correlation is especially prominent in the synonymous codon usage of staphylococcal phages, despite the overall scattered codon usage pattern revealed by principal component analysis. This strong relationship is explained by nucleotide skew, which indicates that the nucleotide usage biases at different codon positions act on synonymous codons. Therefore, our study reveals a hidden relationship of genome evolution with host limitation and phage phenotype, providing new insight into phage genome evolution at the genetic level.

7.
Flanking regulatory long terminal repeats (LTRs) in human endogenous retroviruses (HERVs) are a typical kind of DNA repeat that is widespread in the human genome. Many algorithms have been developed to detect the latent periodicity of a wide range of DNA repeats. However, no such attempt has been made for HERV LTRs. The present study focused on the investigation of the possible sequence periodic patterns in the HERV LTRs and their regulatory mechanisms. We calculated the sequence periods of 5′, 3′ and combined LTRs in HERVs with our devised matrix simulation algorithm. Interestingly, 5′ and 3′ LTRs have the same period of 7, and combined LTRs have a period of 9. These results indicated that HERV LTRs have predominant periodic patterns. Based on the obtained sequence periodicity, we constructed periodic consensus sequences of 5′, 3′ and combined LTRs. For the 5′ and 3′ LTRs, which share the period of 7, we manually scanned the nucleotide bases in the corresponding positions of their periodic consensus sequences, and found that some positions, such as the 1st, 5th and 7th, have an unchanged nucleotide base. These conserved nucleotide base positions represent critical binding sites of regulatory LTRs, and may be indicative of conserved regulatory mechanisms in LTR-participating regulatory networks.

8.
The complete mitochondrial genome sequence of the parasitic nematode Strongyloides stercoralis was determined, and its organisation and structure compared with other nematodes for which complete mitochondrial sequence data were available. The mitochondrial genome of S. stercoralis is 13,758 bp in size and contains 36 genes (all transcribed in the clockwise direction) but lacks the atp8 gene. This genome has a high T content (55.9%) and a low C content (8.3%). Corresponding to this T content, there are 16 (poly-T) tracts of ≥12 Ts distributed across the genome. In protein-coding genes, the T bias is greatest (76.4%) at the third codon position compared with the first and second codon positions. Also, the C content is higher at the first (9.3%) and second (13.4%) codon positions than at the third (2%) position. These nucleotide biases have a significant effect on predicted codon usage patterns and, hence, on amino acid compositions of the mitochondrial proteins. Interestingly, six of the 12 protein-coding genes are predicted to employ a unique initiation codon (TTT), which has not yet been reported for any other animal mitochondrial genome. The secondary structures predicted for the 22 transfer RNA (trn) genes and the two ribosomal RNA (rrn) genes are similar to those of other nematodes. In contrast, the gene arrangement in the mitochondrial genome of S. stercoralis is different from all other nematodes studied to date, revealing only a limited number of shared gene boundaries (atp6-nad2 and cox2-rrnL). Evolutionary analyses of mitochondrial nucleotide and amino acid sequence data sets for S. stercoralis and seven other nematodes demonstrate that the mitochondrial genome provides a rich source of phylogenetically informative characters. In conclusion, the S. stercoralis mitochondrial genome, with its unique gene order and characteristics, should provide a resource for comparative mitochondrial genomics and systematics studies of parasitic nematodes.

9.
10.
It has been reported earlier that the relative di-nucleotide frequency (RDF) in different parts of a genome is similar while the frequency is variable among different genomes. So RDF is termed as genome signature in bacteria. It is not known if the constancy in RDF is governed by genome wide mutational bias or by selection. Here we did comparative analysis of RDF between the inter-genic and the coding sequences in seventeen bacterial genomes, whose gene expression data was available. The constraint on di-nucleotides was found to be higher in the coding sequences than that in the inter-genic regions and the constraint at the 2nd codon position was more than that in the 3rd position within a genome. Further analysis revealed that the constraint on di-nucleotides at the 2nd codon position is greater in the high expression genes (HEG) than that in the whole genomes as well as in the low expression genes (LEG). We analyzed RDF at the 2nd and the 3rd codon positions in simulated coding sequences that were computationally generated by keeping the codon usage bias (CUB) according to genome G+C composition and the sequence of amino acids unaltered. In the simulated coding sequences, the constraint observed was significantly low and no significant difference was observed between the HEG and the LEG in terms of di-nucleotide constraint. This indicated that the greater constraint on di-nucleotides in the HEG was due to the stronger selection on CUB in these genes in comparison to the LEG within a genome. Further, we did comparative analyses of the RDF in the HEG rpoB and rpoC of 199 bacteria, which revealed a common pattern of constraints on di-nucleotides at the 2nd codon position across these bacteria. To validate the role of CUB on di-nucleotide constraint, we analyzed RDF at the 2nd and the 3rd codon positions in simulated rpoB/rpoC sequences. 
The analysis revealed that selection on CUB is an important attribute for the constraint on di-nucleotides at these positions in bacterial genomes. We believe that this study provides major findings on the role of CUB in di-nucleotide constraint in bacterial genomes.
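
The abstract does not state how RDF is computed. The sketch below assumes the common odds-ratio form, observed dinucleotide frequency divided by the product of the two single-base frequencies, and treats "dinucleotides at the 2nd codon position" as dinucleotides that start at that position. Both choices, and the function name, are assumptions made for illustration rather than the authors' procedure.

```python
from collections import Counter

def relative_dinucleotide_frequency(seq, codon_position=None):
    """Odds ratio f_xy / (f_x * f_y) for each dinucleotide observed in seq.

    codon_position (1, 2 or 3) restricts counting to dinucleotides that
    start at that codon position; None counts every overlapping pair.
    Base frequencies are always taken from the whole sequence (assumption).
    """
    seq = seq.upper()
    n = len(seq)
    base_freq = {b: c / n for b, c in Counter(seq).items()}
    if codon_position is None:
        starts = range(n - 1)
    else:
        starts = range(codon_position - 1, n - 1, 3)
    pairs = Counter(seq[i:i + 2] for i in starts)
    total = sum(pairs.values())
    return {pair: (count / total) / (base_freq[pair[0]] * base_freq[pair[1]])
            for pair, count in pairs.items()}
```

Values near 1 mean a dinucleotide occurs about as often as its base composition predicts; comparing the spread of these ratios between coding and inter-genic sequence is one way to express the constraint the abstract describes.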

11.
《Gene》1998,215(2):405-413
Biases in the codon usage and base compositions at three codon sites in different genes of the A+T-rich Gram-negative bacterium Haemophilus influenzae and the G+C-rich Gram-positive bacterium Mycobacterium tuberculosis have been examined to address the following questions: (1) whether the synonymous codon usage in organisms having highly skewed base compositions is totally dictated by the mutational bias as reported previously (Sharp, P.M., Devine, K.M., 1989. Codon usage and gene expression level in Dictyostelium discoideum: highly expressed genes do 'prefer' optimal codons. Nucleic Acids Res. 17, 5029–5039), or is also controlled by translational selection; (2) whether preference of G in the first codon positions by highly expressed genes, as reported in Escherichia coli (Gutierrez, G., Marquez, L., Marin, A., 1996. Preference for guanosine at first codon position in highly expressed Escherichia coli genes. A relationship with translational efficiency. Nucleic Acids Res. 24, 2525–2527), is true in other bacteria; and (3) whether the usage of bases in three codon positions is species-specific. Results presented here show that even in organisms with high mutational bias, translational selection plays an important role in dictating the synonymous codon usage, though the set of optimal codons is chosen in accordance with the mutational pressure. The frequencies of G-starting codons are positively correlated to the level of expression of genes, as estimated by their Codon Adaptation Index (CAI) values, in M. tuberculosis as well as in H. influenzae, in spite of the latter having an A+T-rich genome. The present study on the codon preferences of two organisms with oppositely skewed base compositions thus suggests that the preference of G-starting codons by highly expressed genes might be a general feature of bacteria, irrespective of their overall G+C contents. The ranges of variations in the frequencies of individual bases at the first and second codon positions of genes of both H. influenzae and M. tuberculosis are similar to those of E. coli, implying that though the composition of all three codon positions is governed by a selection-mutation balance, the mutational pressure has little influence in the choice of bases at the first two codon positions, even in organisms with highly biased base compositions.

12.
A repeating unit of the histone gene cluster from Drosophila simulans containing the H1, H2A, H2B and H4 genes (the H3 gene region has already been analyzed) was cloned and analyzed. A nucleotide sequence of about 4.6 kbp was determined to study the nucleotide divergence and molecular evolution of the histone gene cluster. Comparison of the structure and nucleotide sequence with those of Drosophila melanogaster showed that the four histone genes were located at identical positions and in the same directions. The proportion of different nucleotide sites was 6.3% in total. The amino acid sequence of H1 was divergent, with a 5.1% difference. However, no amino acid change has been observed for the other three histone proteins. Analysis of the GC contents and the base substitution patterns in the two lineages, D. melanogaster and D. simulans, with a common ancestor showed the following. 1) A strong negative correlation was found between the GC content and the nucleotide divergence in the whole repeating unit. 2) The mode of molecular evolution previously found for the H3 gene was also observed for the whole repeating unit of histone genes; the nucleotide substitutions were stationary in the 3' and spacer regions, and there was a directional change of the codon usage to the AT-rich codons. 3) No distinct difference in the mode or pattern of molecular evolution was detected for the histone gene repeating unit in the D. melanogaster and D. simulans lineages. These results suggest that selective pressure on the coding regions of histones, which eliminates A and T, is less effective in the D. melanogaster and D. simulans lineages than in the other GC-rich species.

13.
We have isolated and analysed a 2 kb region of the mitochondrial genome of Arabidopsis thaliana (Columbia) showing a high level of nucleotide identity with the mitochondrial (mt) rps14 small-subunit ribosomal protein gene from Oenothera berteriana and Vicia faba, as well as with an open reading frame (ORF) located upstream of the nad3 locus in O. berteriana. The rps14 locus is present as a single copy in the A. thaliana mt genome and has a translational stop codon located near the initiation codon, as well as a deletion of one nucleotide that disturbs the coding sequence. The cloning and sequencing of nine amplified mt rps14 cDNAs clearly demonstrated that this gene is transcribed and that the mRNA precursors are edited at three positions, all involving C-to-U conversions. No editing events changing the stop codon and restoring the correct coding sequence were observed within the nine individual cDNA clones. Therefore, we conclude that the single rps14 sequence of the mitochondrial genome from A. thaliana is in fact a pseudogene that is transcribed and edited but not translated.

14.
With the exponential growth of genomic sequences, there is an increasing demand to accurately identify protein coding regions (exons) from genomic sequences. Although much progress has been made in identifying protein coding regions by computational methods during the last two decades, the performance and efficiency of the prediction methods still need to be improved. In addition, it is indispensable to develop different prediction methods, since combining different methods may greatly improve the prediction accuracy. A new method to predict protein coding regions is developed in this paper based on the fact that most exon sequences have a 3-base periodicity, while intron sequences do not have this feature. The method computes the 3-base periodicity and the background noise of the stepwise DNA segments of the target DNA sequences using nucleotide distributions in the three codon positions of the DNA sequences. Exon and intron sequences can be identified from trends of the ratio of the 3-base periodicity to the background noise in the DNA sequences. Case studies on genes from different organisms show that this method is an effective approach for exon prediction.
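
A stepwise scan of this kind might look like the sketch below, which is a hedged illustration and not the paper's algorithm. The signal is the power at N/3 obtained from the codon-position counts of each window. The noise is assumed here to be the mean power over all nonzero DFT frequencies, which follows from Parseval's theorem without computing a full transform; the paper's own noise definition may differ. The window and step sizes and both function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def periodicity_to_noise(segment):
    """Ratio of the power at N/3 to the mean power over nonzero frequencies."""
    n = len(segment)
    counts = Counter((s, i % 3) for i, s in enumerate(segment))
    signal = 0.0   # power at frequency N/3, summed over the four bases
    dc = 0.0       # power at frequency 0, i.e. sum of squared base counts
    total = 0      # number of A/C/G/T symbols in the window
    for base in BASES:
        f1, f2, f3 = (counts[(base, p)] for p in range(3))
        signal += f1 * f1 + f2 * f2 + f3 * f3 - f1 * f2 - f1 * f3 - f2 * f3
        dc += (f1 + f2 + f3) ** 2
        total += f1 + f2 + f3
    if n < 2:
        return 0.0
    # Parseval: the power summed over all n frequencies equals n * total.
    noise = (n * total - dc) / (n - 1)
    return signal / noise if noise > 0 else 0.0

def scan(seq, window=351, step=3):
    """Return (start, ratio) for each window; window should be a multiple of 3."""
    return [(start, periodicity_to_noise(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]
```

Plotting the ratios returned by `scan` against the start position gives the kind of trend the abstract refers to: sustained high values suggest coding sequence, values near the baseline suggest non-coding sequence.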

15.
J L Weber 《Gene》1987,52(1):103-109
The genome of the human malaria parasite Plasmodium falciparum has an A + T content of about 82%, higher than any other organism whose DNA has been characterized. Computer analysis of 36 kb of available nucleotide sequences from this species showed that the coding regions, with an A + T content of 69.0%, are flanked by more A + T-rich regions of 86.0% A + T. Within the coding sequences, the A/T ratio was 1.68 in the mRNA sense strand, and overall A + T content in the three codon positions increased in the order 1st-2nd-3rd position. Codons with T or especially A in the third position were strongly preferred. Codon usage among individual parasite genes was very similar compared to genes from other species. Dinucleotide frequencies for the parasite DNA were close to those expected for a random sequence with the known base composition, except that the CpG frequency in the coding sequences was low.

16.
Variation in GC content, GC skew and AT skew along genomic regions was examined at third codon positions in completely sequenced prokaryotes. Eight out of nine eubacteria studied show GC and AT skews that change sign at the origin of replication. The leading strand in DNA replication is G-T rich at codon position 3 in six eubacteria, but C-T rich in two Mycoplasma species. In M. genitalium the AT and GC skews are symmetrical around the origin and terminus of replication, whereas its GC content variation has been shown to have a centre of symmetry elsewhere in the genome. Borrelia burgdorferi and Treponema pallidum show extraordinary extents of base composition skew correlated with direction of DNA replication. Base composition skews measured at third codon positions probably reflect mutational biases, whereas those measured over all bases in a sequence (or at codon positions 1 and 2) can be strongly affected by protein considerations due to the tendency in some bacteria for genes to be transcribed in the same direction that they are replicated. Consequently in some species the direction of skew for total genomic DNA is opposite to that for codon position 3. Received: 2 February 1998 / Accepted: 15 June 1998
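
As a small illustration of the quantities named in this abstract, the sketch below computes GC skew, AT skew and GC content from the third codon positions of a single in-frame coding sequence. The abstract does not give formulas, so the conventional definitions (G − C)/(G + C) and (A − T)/(A + T) are assumed; the function name is hypothetical, and the input is assumed to start at the first base of a codon.

```python
from collections import Counter

def third_position_skews(cds):
    """Return (gc_skew, at_skew, gc_content) at third codon positions.

    cds is assumed to be in frame, so every third base starting at
    index 2 is a third codon position. Skews are 0.0 when undefined.
    """
    third = Counter(cds.upper()[2::3])
    g, c, a, t = (third[b] for b in "GCAT")
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    at_skew = (a - t) / (a + t) if a + t else 0.0
    acgt = g + c + a + t
    gc_content = (g + c) / acgt if acgt else 0.0
    return gc_skew, at_skew, gc_content
```

Applying this gene by gene along a chromosome, after orienting each gene consistently with respect to one strand, would give the kind of profile in which a sign change can be looked for near the replication origin.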

17.
Wada and colleagues have shown that, whether prokaryotic or eukaryotic, each gene has a "homostabilising propensity" to adopt a relatively uniform GC percentage (GC%). Accordingly, each gene can be viewed as a "microisochore" occupying a discrete GC% niche of relatively uniform base composition amongst its fellow genes. Although first, second and third codon positions usually differ in GC%, each position tends to maintain a uniform, gene-specific GC% value. Thus, within a genome, genic GC% values can cover a wide range. This is most evident at third codon positions, which are least constrained by amino acid encoding needs. In 1991, Wada and colleagues further noted that, within a phylogenetic group, genomic GC% values can also cover a wide range. This is again most evident at third codon positions. Thus, the dispersion of GC% values among genes within a genome matches the dispersion of GC% values among genomes within a phylogenetic group. Wada described the context-independence of plots of different codon position GC% values against total GC% as a "universal" characteristic. Several studies relate this to recombination. We have confirmed that third codon positions usually relate more to the genes that contain them than to the species. However, in genomes with extreme GC% values (low or high), third codon positions tend to maintain a constant GC%, thus relating more to the species than to the genes that contain them. Genes in an extreme-GC% genome collectively span a smaller GC% range, and mainly rely on first and second codon positions for differentiation as "microisochores". Our results are consistent with the view that differences in GC% serve to recombinationally isolate both genome sectors (facilitating gene duplication) and genomes (facilitating genome duplication, e.g. speciation). In intermediate-GC% genomes, conflict between the needs of the species and the needs of individual genes within that species is minimal. 
However, in extreme-GC% genomes there is a conflict, which is settled in favour of the species (i.e. group selection) rather than in favour of the gene (genic selection).

18.
This paper analyzes the nucleotide sequences of three viruses: Kunjin, West Nile, and yellow fever. Each virus has one long open reading frame of greater than 10,200 nucleotides that codes for four structural and seven nonstructural genes. The Kunjin and West Nile viruses are the most closely related pair, when assessed on the basis of matches between their nucleotide sequences. As would be expected, the matching is least for bases at third-position codon sites and is greatest for second-position sites. Statistics are presented for the numbers of mismatches that are transitions or transversions. Nucleotide base usage is also reported. To each of the 33 virus-gene segments, nonhomogeneous Markov chain models have been fitted to describe the sequences of nucleotide bases. The models allow for different transition probabilities ("transition" is used in the mathematical sense here) and for different degrees of dependency, at the three sites in the codons. Reasonably satisfactory fits can be obtained for many of the genes by using models that are first order for both first- and second-position sites in the codon but that are second order for third-position sites. One consequence of such a model is that the correlation between one amino acid and the next is limited to the correlation of the last base of the former with the first base of the latter. Other consequences are that the model can (and does) prohibit the occurrence of stop codons within a gene and that subsequences of only first-position bases, or only third-position bases, are also first-order Markov chains. In theory, second-position subsequences may not be Markov chains at all. In practice, the data suggest that each of these subsequences is effectively a zero-order Markov chain, i.e., bases spaced three apart are statistically independent. Stationarity of nucleotide base distributions can be interpreted in either of two ways: (1) spatially along the sites or (2) temporally at each site. These interpretations must often be inconsistent, when the former allows for Markov dependence between adjacent sites whereas the latter assumes independence between sites. The inconsistency can be overcome, for these viruses, if subsequences at different codon positions are analyzed separately.

19.

Background  

The order Tetraodontiformes consists of approximately 429 species of fishes in nine families. Members of the order exhibit striking morphological diversity and radiated into various habitats such as freshwater, brackish and coastal waters, open seas, and deep waters along continental shelves and slopes. Despite extensive studies based on both morphology and molecules, there has been no clear resolution except for monophyly of each family and sister-group relationships of Diodontidae + Tetraodontidae and Balistidae + Monacanthidae. To address phylogenetic questions of tetraodontiform fishes, we used whole mitochondrial genome (mitogenome) sequences from 27 selected species (data for 11 species were newly determined during this study) that fully represent all families and subfamilies of Tetraodontiformes (except for Hollardinae of the Triacanthodidae). Partitioned maximum likelihood (ML) and Bayesian analyses were performed on two data sets comprising concatenated nucleotide sequences from 13 protein-coding genes (all positions included; third codon positions converted into purine [R] and pyrimidine [Y]), 22 transfer RNA and two ribosomal RNA genes (total positions = 15,084).

20.
Singer GA  Hickey DA 《Gene》2003,317(1-2):39-47
A number of recent studies have shown that thermophilic prokaryotes have distinguishable patterns of both synonymous codon usage and amino acid composition, indicating the action of natural selection related to thermophily. On the other hand, several other studies of whole genomes have illustrated that nucleotide bias can have dramatic effects on synonymous codon usage and also on the amino acid composition of the encoded proteins. This raises the possibility that the thermophile-specific patterns observed at both the codon and protein levels are merely reflections of a single underlying effect at the level of nucleotide composition. Moreover, such an effect at the nucleotide level might be due entirely to mutational bias. In this study, we have compared the genomes of thermophiles and mesophiles at three levels: nucleotide content, codon usage and amino acid composition. Our results indicate that the genomes of thermophiles are distinguishable from mesophiles at all three levels and that the codon and amino acid frequency differences cannot be explained simply by the patterns of nucleotide composition. At the nucleotide level, we see a consistent tendency for the frequency of adenine to increase at all codon positions within the thermophiles. Thermophiles are also distinguished by their pattern of synonymous codon usage for several amino acids, particularly arginine and isoleucine. At the protein level, the most dramatic effect is a two-fold decrease in the frequency of glutamine residues among thermophiles. These results indicate that adaptation to growth at high temperature requires a coordinated set of evolutionary changes affecting (i) mRNA thermostability, (ii) stability of codon-anticodon interactions and (iii) increased thermostability of the protein products. We conclude that elevated growth temperature imposes selective constraints at all three molecular levels: nucleotide content, codon usage and amino acid composition. 
In addition to these multiple selective effects, however, the genomes of both thermophiles and mesophiles are often subject to superimposed large changes in composition due to mutational bias.
