Similar literature
1.
Survey of plant short tandem DNA repeats   Total citations: 46, self-citations: 0, citations by others: 46
Length variations in simple sequence tandem repeats are being given increased attention in plant genetics. Some short tandem repeats (STRs) from a few plant species, mainly those at the dinucleotide level, have been demonstrated to show polymorphisms and Mendelian inheritance. In the study reported here, a search for all possible STRs ranging from mononucleotide up to tetranucleotide repeats was carried out on EMBL and GenBank DNA sequence databases covering 3026 kb of nuclear DNA and 1268 kb of organelle DNA in 54 and 28 plant species (plus algae), respectively. STRs were extremely rare in organelle DNA (4 STRs in 1268 kb) compared with nuclear DNA sequences. In nuclear DNA sequences, (AT)n sequences were the most abundant, followed by (A)n · (T)n, (AG)n · (CT)n, (AAT)n · (ATT)n, (AAC)n · (GTT)n, (AGC)n · (GCT)n, (AAG)n · (CTT)n, (AATT)n · (TTAA)n, (AAAT)n · (ATTT)n and (AC)n · (GT)n sequences. A total of 130 STRs were found, including 49 (AT)n sequences in 31 species, giving an average of 1 STR every 23.3 kb and 1 (AT)n STR every 62 kb. The tri- and tetranucleotide repeats together were about as abundant as the dinucleotide repeats. On average, there was 1 STR every 64.6 kb of DNA in monocotyledons versus 1 every 21.2 kb in dicotyledons. The fraction of STRs that contained G-C base pairs increased as G+C content rose from dicotyledons through monocotyledons to algae. While STRs of mono-, di- and tetranucleotide repeats were all located in noncoding regions, 57% of the trinucleotide STRs containing G-C base pairs resided in coding regions.
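
A minimal sketch of the kind of search this abstract describes, assuming perfect (uninterrupted) repeats and a hypothetical minimum tract length of 12 bp. The abstract does not state the thresholds or software actually used, so `find_strs`, `kb_per_str`, `max_unit` and `min_len` are illustrative names, not the authors' method.

```python
import re


def find_strs(seq, max_unit=4, min_len=12):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    max_unit and min_len are assumed thresholds chosen for illustration.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # Smallest copy number whose tract reaches min_len bases (ceiling division).
        min_copies = max(2, -(-min_len // k))
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are themselves repeats of a shorter motif,
            # e.g. "AAAA" (already reported as A) or "ATAT" (reported as AT).
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


def kb_per_str(total_kb, n_strs):
    """Average spacing between STRs, in kb per STR."""
    return total_kb / n_strs
```

With the figures reported in the abstract, `kb_per_str(3026, 130)` gives roughly 23.3 kb per STR, which agrees with the stated average.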

2.
De novo origin of coding sequence remains an obscure issue in molecular evolution. One of the possible paths for addition (subtraction) of DNA segments to (from) a gene is stop codon shift. Single nucleotide substitutions can destroy the existing stop codon, leading to uninterrupted translation up to the next stop codon in the gene’s reading frame, or create a premature stop codon via a nonsense mutation. Furthermore, frameshifts caused by short indels near a gene’s end may lead to premature stop codons or to translation past the existing stop codon. Here, we describe the evolution of the length of the coding sequence of prokaryotic genes through changes in the positions of stop codons. We observed cases of addition of regions of 3′UTR to genes due to mutations at the existing stop codon, and cases of subtraction of C-terminal coding segments due to nonsense mutations upstream of the stop codon. Many of the observed stop codon shifts cannot be attributed to sequencing errors or rare deleterious variants segregating within bacterial populations. The additions of regions of 3′UTR tend to occur in those genes in which they are facilitated by nearby downstream in-frame triplets which may serve as new stop codons. Conversely, subtractions of coding sequence often give rise to in-frame stop codons located nearby. The amino acid composition of the added region is significantly biased compared to the overall amino acid composition of the genes. Our results show that in prokaryotes, shift of the stop codon is an underappreciated contributor to functional evolution of gene length.

3.
Sorimachi K  Okayasu T 《Amino acids》2008,34(4):661-668
When nucleotide (G, C, T and A) contents were plotted against each nucleotide, their relationships were clearly expressed by a linear formula, y = αx + β, in the coding and non-coding regions. This linear relationship was obtained from the complete single-stranded DNA. Similarly, nucleotide contents at all three codon positions were expressed by linear regression lines based on the content of each nucleotide. In addition, the usages of the 64 codons were also expressed by linear formulas against nucleotide content. Thus, the nucleotide content not only in coding sequence but also in non-coding sequence can be expressed by a linear formula, y = αx + β, in 145 organisms (112 bacteria, 15 archaea and 18 eukaryotes). Based on these results, from the ratio C/T, G/T, C/A or G/A one can essentially estimate all four nucleotide contents in the complete single-stranded DNA, and the determination of any ratio of two kinds of nucleotides can essentially estimate the four nucleotide contents, the nucleotide contents at the three different codon positions and the codon distributions at 64 codons in the coding region. The maximum and minimum values of G content were ∼0.35 and ∼0.15, respectively, among the various organisms examined. Codon evolution occurs according to linear formulas between these two values.

4.
5.
Background: The mitochondrial ND genes encode subunits of NADH dehydrogenase, the first enzyme of the mitochondrial electron transport chain. Leigh syndrome, a neurodegenerative disease caused by mutation in the ND2 gene (T4681C), is associated with bilateral symmetric lesions in basal ganglia and subcortical brain regions. Therefore, it is of interest to analyze mitochondrial DNA to glean information about evolutionary relationships. This study analyzes the compositional dynamics and selection pressure shaping the codon usage patterns in the coding sequence of the MT-ND2 gene across pisces, aves and mammals, using bioinformatics measures such as the effective number of codons (ENC), codon adaptation index (CAI) and relative synonymous codon usage (RSCU). Results: We observed a low codon usage bias, as reflected by high ENC values, in the MT-ND2 gene among pisces, aves and mammals. The most frequently used codons ended with A/C at the third codon position, and the gene was AT rich in all three classes. The codons TCA, CTA, CGA and TGA were over-represented in all three classes. The F1 axis of the correspondence analysis showed a significant positive correlation with G, T3 and CAI, while the F2 axis showed a significant negative correlation with A and T but a significant positive correlation with G, C, G3, C3, ENC, GC, GC1, GC2 and GC3. Conclusions: The codon usage bias in the MT-ND2 gene is not associated with expression level. Mutation pressure and natural selection affect the codon usage pattern in the MT-ND2 gene.
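
As an illustration of one of the measures named in this abstract, the sketch below computes relative synonymous codon usage (RSCU): the observed count of a codon divided by the mean count of all codons in its synonymous family. It assumes the standard genetic code for simplicity; a mitochondrial gene such as MT-ND2 would need the vertebrate mitochondrial table passed in instead. The names `CODON_TABLE` and `rscu` are hypothetical and do not come from the study.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (an assumption; swap in another table as needed).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds, table=CODON_TABLE):
    """Return {codon: RSCU} for every synonymous family observed in cds."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families = {}
    for codon, aa in table.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in family:
            # observed count divided by the family mean (total / family size)
            values[codon] = counts[codon] * len(family) / total
    return values
```

A value above 1 means the codon is used more often than expected under equal usage within its family, which is the sense in which codons are called over-represented.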

6.
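Analysis of codon usage bias in the chloroplast genome of Cinnamomum camphora   Total citations: 3, self-citations: 0, citations by others: 3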
秦政  郑永杰  桂丽静  谢谷艾  伍艳芳 《广西植物》2018,38(10):1346-1355
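To analyze the codon usage bias pattern of the chloroplast genome of the camphor tree (Cinnamomum camphora), this study used CodonW, EMBOSS, R and other software to systematically analyze codon usage patterns and bias in 53 coding sequences of the C. camphora chloroplast genome. The results showed that the effective number of codons (ENC) of the chloroplast genes ranged from 36.82 to 59.30, indicating weak codon bias. Relative synonymous codon usage (RSCU) analysis identified 32 codons with RSCU > 1, of which 28 ended in A or U, indicating a preference for A and U at the third codon position. Neutrality plot analysis found no significant correlation between GC3 and GC12, with a regression slope of 0.049, suggesting that codon bias is mainly affected by natural selection. ENC-plot analysis found that most genes fell below the expected curve, likewise indicating that selection is the main factor affecting codon bias. In total, nine codons (UUU, CUU, UCA, ACA, UAU, AAU, GAU, UGA, GGA) were identified as optimal codons of the C. camphora chloroplast genome.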
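
The neutrality plot mentioned in this abstract regresses GC12 (the mean GC content of codon positions 1 and 2) on GC3 across genes; a slope near 0 is usually read as selection dominating, a slope near 1 as mutation pressure dominating. The sketch below is a minimal illustration under the assumption that each input is an in-frame coding sequence of at least one codon. The function names are hypothetical, and the study itself used CodonW, EMBOSS and R rather than this code.

```python
def gc_by_position(cds):
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)


def neutrality_slope(genes):
    """Least-squares slope of GC12 on GC3 over a list of coding sequences.

    Requires at least two genes with different GC3 values.
    """
    points = []
    for cds in genes:
        gc1, gc2, gc3 = gc_by_position(cds)
        points.append((gc3, (gc1 + gc2) / 2))
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx
```

A real analysis would also report the correlation coefficient and its significance, since the abstract's conclusion rests on both the slope and the lack of a significant correlation.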

7.
A frequently used approach for detecting potential coding regions is to search for stop codons. In the standard genetic code 3 out of 64 trinucleotides are stop codons. Hence, in random or non-coding DNA one can expect every 21st trinucleotide to have the same sequence as a stop codon. In contrast, the open reading frames (ORFs) of most protein-coding genes are considerably longer. Thus, the stop codon frequency in coding sequences deviates from the background frequency of the corresponding trinucleotides. This has been utilized for gene prediction, in particular in detecting protein-coding ORFs. Traditional methods based on stop codon frequency rest on the assumption that the GC content is about 50%. However, many genomes show significant deviations from that value. With the presented method we can describe the effects of GC content on the selection of appropriate length thresholds for potentially coding ORFs. Conversely, for a given length threshold, we can calculate the probability of observing it in a random sequence. Thus, we can derive the maximum GC content for which ORF length is practicable as a feature for gene prediction methods, and the resulting false positive rates. A rough estimate for an upper limit is a GC content of 80%. This estimate can be made more precise by including further parameters and by taking into account start codons as well. We demonstrate the feasibility of this method by applying it to the genomes of the bacteria Rickettsia prowazekii, Escherichia coli and Caulobacter crescentus, exemplifying the effect of GC content variations according to our predictions. We have adapted the method for predicting coding ORFs by stop codon frequency to the case of GC contents different from 50%. Usually, several methods for gene finding need to be combined. Thus, our results concern a specific part within a package of methods. Interestingly, for genomes with low GC content such as that of R. prowazekii, the presented method provides remarkably good results even when applied alone.
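
The dependence on GC content described in this abstract can be illustrated with a deliberately simple background model, assuming independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 − gc)/2. This is a sketch under that assumption only; it is not the authors' full method, which also considers further parameters and start codons. All function names are hypothetical.

```python
import math


def stop_probability(gc):
    """Probability that a random trinucleotide is TAA, TAG or TGA."""
    at = (1 - gc) / 2  # probability of A, and of T
    g = gc / 2         # probability of G, and of C
    # TAA = at^3, TAG = at^2 * g, TGA = at^2 * g
    return at ** 3 + 2 * at ** 2 * g


def prob_orf_at_least(n_codons, gc):
    """Chance that n_codons consecutive random codons contain no stop codon."""
    return (1 - stop_probability(gc)) ** n_codons


def length_threshold(alpha, gc):
    """Smallest codon count whose chance of arising at random is at most alpha.

    Requires 0 <= gc < 1; at gc = 1 no stop codon can occur at all.
    """
    return math.ceil(math.log(alpha) / math.log(1 - stop_probability(gc)))
```

At gc = 0.5 the stop probability is 3/64, matching the "every 21st trinucleotide" figure in the abstract. As gc rises the stop probability falls, so the ORF length needed to reach a given false positive rate grows, which is why ORF length becomes a weak feature in very GC-rich genomes.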

8.
王艳  赵懿琛  赵德刚 《广西植物》2021,41(2):274-282
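To understand the codon usage pattern of Eucommia ulmoides genes, this study took the codons of the E. ulmoides genome as its subject and used CodonW to analyze 320 protein-coding genes, applying relative synonymous codon usage (RSCU) analysis, ENC-GC3s association analysis of the ENC values of the coding genes, and PR2-plot bias analysis of base usage frequencies in the codons of the coding genes, and used the CUSP software together with the Codon Usage Database...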

9.
Using all currently predicted coding regions in the honeybee genome, a novel form of synonymous codon bias is presented that affects the usage of particular codons dependent on the surrounding nucleotides in the coding region. Nucleotides at the third codon site are correlated, dependent on their weak (adenine [A] or thymine [T]) versus strong (guanine [G] or cytosine [C]) status, to nucleotides on the first codon site which are dependent on their purine (A/G) versus pyrimidine (C/T) status. In particular, for adjacent third and first site nucleotides, weak–pyrimidine and strong–purine nucleotide combinations occur much more frequently than the underabundant weak–purine and strong–pyrimidine nucleotide combinations. Since a similar effect is also found in the noncoding regions, but is present for all adjacent nucleotides, this coding effect is most likely due to a genome-wide context-dependent mutation error correcting mechanism in combination with selective constraints on adjacent first and second nucleotide pairs within codons. The position-dependent relationship of synonymous codon usage is evidence for a novel form of codon position bias which utilizes the redundancy in the genetic code to minimize the effect of nucleotide mutations within coding regions. [Reviewing Editor: Dr. Brian Morton]

10.
Behura SK  Severson DW 《Gene》2012,504(2):226-232
We present a detailed genome-scale comparative analysis of simple sequence repeats within protein coding regions among 25 insect genomes. The repetitive sequences in the coding regions primarily represented single codon repeats and codon pair repeats. The CAG triplet is highly repetitive in the coding regions of insect genomes. It is frequently paired with the synonymous codon CAA to code for polyglutamine repeats. The codon pairs that are least repetitive code for polyalanine repeats. The frequency of hexanucleotide and dinucleotide motifs of codon pair repeats is significantly (p < 0.001) different in the Drosophila species compared to the non-Drosophila species. However, the frequency of synonymous and non-synonymous codon pair repeats varies in a correlated manner (r² = 0.79) among all the species. Results further show that perfect and imperfect repeats have significant association with the trinucleotide and hexanucleotide coding repeats in most of these insects. However, only select species show significant association between the numbers of perfect/imperfect hexamers and repeats coding for single amino acid/amino acid pair runs. Our data further suggest that genes containing simple sequence coding repeats may be under negative selection, as they tend to be poorly conserved across species. The sequences of coding repeats of orthologous genes vary according to the known phylogeny among the species. In conclusion, the study shows that simple sequence coding repeats are important features of genome diversity among insects.

11.
Biased codon usage is common in eukaryotic and prokaryotic genes. Evidence from Escherichia, Saccharomyces, and Drosophila indicates that it favors translational efficiency and accuracy. However, to date no functional advantages have been identified in the codon–anticodon interactions involving the most frequently used (preferred) codons. Here we present evidence that forces not related to the individual codon–anticodon interaction may be involved in determining which synonymous codons are preferred or avoided. We show that the “off-frame” trinucleotide motif preferences inferrable from Drosophila coding regions are often in the same direction as Drosophila's “in-frame” codon preferences, i.e., its codon usage. The off-frame preferences were inferred from the nonrandomness of the location of confamilial synonymous codons along coding regions, a pattern often described as a context dependence of nucleotide choice at synonymous positions or as codon-pair bias. We relied on randomizations of the location of confamilial codons that do not alter, and cannot be influenced by, the encoded amino acid sequences, codon usage, or base composition of the genes examined. The statistically significant congruency of in-frame and off-frame trinucleotide preferences suggests that the same kind of reading-frame-independent force(s) may also influence synonymous codon choice. These forces may have produced biases in codon usage that then led to the evolution of the translational advantages of these motifs as preferred codons. Under this scenario, tRNA pool size differences between preferred and nonpreferred codons initially were evolved to track the default overrepresentation of codons with preferred motifs. The motif preference hypothesis can explain the structuring of codon preferences and the similarities in the codon usages of distantly related organisms. Received: 10 November 1998 / Accepted: 23 February 1999

12.
Synonymous codon usage of 53 protein coding genes in the chloroplast genome of Coffea arabica was analyzed for the first time to identify the factors that may contribute to codon bias. All preferred synonymous codons were found to end in A/T, as chloroplast genomes are rich in AT. No difference in preference for preferred codons was observed between the two strands, viz., the leading and lagging strands. Complex correlations between total base compositions (A, T, G, C, GC) and silent base contents (A3, T3, G3, C3, GC3) revealed that compositional constraints played a crucial role in shaping the codon usage pattern of the C. arabica chloroplast genome. The ENC vs. GC3 plot placed the majority of the analyzed genes on or just below the left side of the expected GC3 curve, indicating the influence of base compositional constraints in regulating codon usage. However, some of the genes lay far below the continuous curve, confirming the influence of other factors on codon usage in those genes. The influence of compositional constraints was further confirmed by correspondence analysis, as axes 1 and 3 had significant correlations with silent base contents. Correlations of ENC with axes 1 and 4, and of CAI with axes 1 and 2, indicated a minor influence of natural selection, but no exact separation of highly and lowly expressed genes could be seen. From the present study, we concluded that mutational pressure combined with weak selection influenced the pattern of synonymous codon usage across the genes in the chloroplast genome of C. arabica.

13.
A very powerful method for detecting functional constraints operative in biological macromolecules is presented. This method entails performing a base permanence analysis of protein coding genes at each codon position simultaneously in different species. It calculates the degree of permanence of subregions of the gene by dividing it into segments, c codons long, and counting how many sites remain unchanged in each segment among all species compared. By comparing the base permanence among several sequences with the expectations based on a stochastic evolutionary process, gene regions showing different degrees of conservation can be selected. This means that wherever the permanence deviates significantly from the expected value generated by the simulation, the corresponding regions are considered “constrained” or “hypervariable”. The constrained regions are of two types: α and β. The α regions result from constraints at the amino acid level, whereas the β regions are those probably involved in “control” processing. The method has been applied to mitochondrial genes coding for subunit 6 of the ATPase and subunit 1 of the cytochrome oxidase in four mammalian species: human, rat, mouse, and cow. In the two mitochondrial genes a few regions that are highly conserved in all codon positions have been identified. Among these regions a sequence, common to both genes, that is complementary to a strongly conserved region of 12S rRNA has been found. This method can also be of great help in studying molecular evolution mechanisms.
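
The counting step of the base permanence analysis can be sketched as follows, assuming the sequences are already aligned, in frame and free of gaps. Only the per-segment count of unchanged sites at each codon position is shown; the comparison against a simulated stochastic expectation, which the abstract relies on to call a region constrained or hypervariable, is not implemented here. `base_permanence` is a hypothetical name.

```python
def base_permanence(aligned, c):
    """Count invariant sites per codon position in consecutive segments.

    aligned: list of equal-frame, gap-free aligned coding sequences.
    c: segment length in codons.
    Returns one (pos1, pos2, pos3) tuple of unchanged-site counts per segment.
    """
    seqs = [s.upper() for s in aligned]
    length = min(len(s) for s in seqs)
    segment = 3 * c
    result = []
    # Walk over complete segments only; a trailing partial segment is dropped.
    for start in range(0, length - length % segment, segment):
        counts = [0, 0, 0]
        for i in range(start, start + segment):
            if len({s[i] for s in seqs}) == 1:  # same base in every species
                counts[i % 3] += 1
        result.append(tuple(counts))
    return result
```

Each count can range from 0 to c, so segments in which all three positions approach c are the candidates for regions conserved at every codon position.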

14.
15.
16.
The TransTerm database of termination codon contexts has been extended to include sense codon usage and initiation codon contexts. The database was constructed from 23,721 coding sequences from 93 organisms. The database contains: a) the sequence around the termination codon (-10, +10); b) the sequence around the initiation codon (-20, +10); c) the length, 'G+C%' of the third position of codons (GC3), the 'codon adaptation index' (CAI) and the 'effective number of codons' statistic (Nc); d) summary tables for each organism including total codon usage, stop codon and tetranucleotide stop-signal usage, and matrices tallying base frequencies at each position around the initiation and termination codons. The data are arranged to facilitate investigation of the relationships between the three phases of protein synthesis. The database is available electronically from EMBL.

17.
I have analysed the coding regions of 96 eukaryotic genes for their use of iso-coding codons. Specific codons occur more frequently in specific positions in all members of some gene families than would be expected if codon choice was determined solely by the frequency of codon usage. In the absence of evidence a priori for selection for particular codons at particular positions, I term such co-occurring codons “coincident codons”. Coincident codons are not confined to particular regions of genes, and their occurrence is not detectably linked with the location of introns in the genomic sequence. Their presence is partly but not completely explained by the exchange of sequence between similar functional genes within a species: homologous genes from different organisms also possess the same codons at some sites with greater than expected frequencies. The relative excess of coincident codons correlates well with the overall length of the genes analysed, but not with the length of mRNA or coding regions, or with qualitative features of gene structure or expression. This, and the unusual sequence environment of coincident codons, suggests that they are a feature of the overall secondary structure of the heterogeneous nuclear RNA. Such considerations suggest approaches for optimizing the expression of exogenous genes in eukaryotic systems, and for predicting the structure of genes for which only partial sequence data is available.

18.
Oligonucleotide usage in archaeal and bacterial genomes can be linked to a number of properties, including codon usage (trinucleotides), DNA base-stacking energy (dinucleotides), and DNA structural conformation (di- to tetranucleotides). We wanted to assess the statistical information potential of different DNA ‘word-sizes’ and explore how oligonucleotide frequencies differ in coding and non-coding regions. In addition, we used oligonucleotide frequencies to investigate DNA composition and how DNA sequence patterns change within and between prokaryotic organisms. Among the results found was that prokaryotic chromosomes can be described by hexanucleotide frequencies, suggesting that prokaryotic DNA is predominantly short range correlated, i.e., information in prokaryotic genomes is encoded in short oligonucleotides. Oligonucleotide usage varied more within AT-rich and host-associated genomes than in GC-rich and free-living genomes, and this variation was mainly located in non-coding regions. Bias (selectional pressure) in tetranucleotide usage correlated with GC content, and coding regions were more biased than non-coding regions. Non-coding regions were also found to be approximately 5.5% more AT-rich than coding regions, on average, in the 402 chromosomes examined. Pronounced DNA compositional differences were found both within and between AT-rich and GC-rich genomes. GC-rich genomes were more similar and biased in terms of tetranucleotide usage in non-coding regions than AT-rich genomes. The differences found between AT-rich and GC-rich genomes may possibly be attributed to lifestyle, since tetranucleotide usage within host-associated bacteria was, on average, more dissimilar and less biased than free-living archaea and bacteria.
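
The basic quantity behind this analysis, the relative frequency of each DNA "word" of size k, can be sketched as below. This is an illustrative single-strand, overlapping-window count; the abstract does not specify how the authors counted words or measured bias, so `kmer_frequencies` and `at_content` are hypothetical helpers rather than their pipeline.

```python
from collections import Counter

DNA = set("ACGT")


def kmer_frequencies(seq, k):
    """Relative frequency of each overlapping k-mer made only of A, C, G, T."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= DNA)  # skip N and other codes
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def at_content(seq):
    """Fraction of A and T among the unambiguous bases of seq."""
    bases = [b for b in seq.upper() if b in DNA]
    return sum(b in "AT" for b in bases) / len(bases)
```

Running `kmer_frequencies` with k = 2, 4 and 6 on coding and non-coding regions separately would give the dinucleotide, tetranucleotide and hexanucleotide profiles that the abstract compares.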

19.
The complete nucleotide sequence of complementary DNA coding for a variant surface glycoprotein (VSG 117) of Trypanosoma brucei has been determined and compared with amino acid sequence data for the mature protein. This has revealed several interesting and novel features about the synthesis and processing of VSG 117: (1) the primary translation product of the VSG 117 gene includes hydrophobic extensions at both the NH2 and COOH termini that are not found on mature VSG 117; (2) the glycosylated residue at the mature COOH terminus is aspartate, a residue that is not known to be glycosylated in any other system; (3) the nucleotide sequence shows an unusual dinucleotide frequency and codon usage for the gene.

20.
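In this study, young leaves from tender shoots of two wild Elaeagnus angustifolia Linn. plants, grown hydroponically in a greenhouse, were used as material. Total DNA was extracted from each by the CTAB method and sequenced de novo using second-generation sequencing; assembly yielded the complete chloroplast genome sequences of the two plants. Codon usage bias in the protein-coding genes and its causes were analyzed in detail, laying a foundation for chloroplast genetic engineering and molecular phylogenetic studies of E. angustifolia. The results showed that: (1) the assembled chloroplast genome was 150 546 bp in length, consisting of a large single-copy (LSC) region of 81 113 bp, a small single-copy (SSC) region of 25 494 bp, and a pair of inverted repeats (IRs) of 18 445 bp separating them; annotation yielded 132 genes, including 86 protein-coding genes, 38 tRNA genes and 8 rRNA genes. (2) The GC content at the third codon position (GC3) of the protein-coding genes was 28.47%, clearly lower than the GC content of the whole chloroplast genome (37%) and also lower than the GC content at the first (GC1) and second (GC2) positions, indicating a preference for codons ending in A or T; UCU, CCU, UGU, GCU, CUU, GAU, UCA and UAA were the optimal codons. (3) Relative synonymous codon usage (RSCU) analysis showed that no single factor determines the codon usage pattern: codon bias is jointly influenced by mutation, selection and other factors, and sequence differences arising from natural selection affect codon bias more significantly than mutation does; neutrality plot, effective number of codons (ENC-plot) and parity rule 2 (PR2-plot) analyses indicated that codon usage bias in the E. angustifolia chloroplast genome is more strongly affected by selection. (4) Phylogenetic trees constructed from the chloroplast gene sequences of six Elaeagnaceae species and one jujube species by maximum likelihood, maximum parsimony and Bayesian methods were consistent with the clustering based on their codon usage bias, indicating that codon usage bias in the chloroplast genome is related to the phylogenetic relationships among species.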
