Similar articles
 20 similar articles found
1.
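[Objective] To identify 16S rRNA genes in complete prokaryotic genomes. [Methods] A three-layer filtering model for identifying 16S rRNA genes in complete prokaryotic genomes was built from three properties of gene sequences: GC content, 3-base periodicity, and Markov-chain characteristics. [Results] In testing, the model's specificity, sensitivity, and Matthews correlation coefficient were 99.58%, 91.60%, and 91.49%, respectively. [Conclusion] The results indicate that the proposed method identifies 16S rRNA genes efficiently and accurately.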
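The three evaluation measures quoted in this abstract have standard definitions in terms of confusion-matrix counts. The sketch below only illustrates those definitions; the function name `classifier_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def classifier_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts.

    Assumes at least one positive and one negative example, so the two
    ratios are defined. MCC is reported as 0.0 when its denominator is 0.
    """
    sensitivity = tp / (tp + fn)   # fraction of true genes that were found
    specificity = tn / (tn + fp)   # fraction of non-genes correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts, for illustration only.
print(classifier_metrics(tp=25, fp=25, tn=25, fn=25))
```

A perfect classifier gives an MCC of 1, and one that is no better than chance gives an MCC of 0, which is why the abstract reports MCC alongside sensitivity and specificity.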

2.
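Using bioinformatics methods, the completely sequenced genomes of 62 bacterial species were analysed by (1) comparing the frequencies of different bases at the same codon position and (2) comparing the frequencies of the same base at different codon positions. The results show that (1) base frequencies differ significantly among the three codon positions and among the four bases, and (2) the frequencies of the four bases at the three codon positions are significantly correlated. These findings suggest that, during bacterial evolution, the distribution of any base at any codon position may be influenced by that codon position and by the distributions of the other three bases.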

3.
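A study of a characteristic parameter for coding regions of the yeast genome   Total citations: 1, self-citations: 0, citations by others: 1
A parameter d was defined from the base-composition deviation D and used as a characteristic parameter of yeast coding regions. Statistics over yeast ORFs (open reading frames) of classes 1, 2, and 3 gave a characteristic interval for d. Using this interval as the criterion, the six classes of yeast ORFs were tested together with non-coding sequences such as 5′ caps, 3′ tails, introns, and random sequences of matching composition. The results show that d is a workable characteristic parameter for coding regions and separates coding from non-coding sequences well. In addition, the relationship between d and gene expression level (measured by the CAI value) was examined: d correlates positively and strongly with expression level, and the distribution of certain bases at the first and second codon positions is related to expression level.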

4.
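Based on codon usage frequencies normalized per 1000 codons, several statistical features of codon usage in hosts and viruses at different evolutionary levels, and the patterns of their coevolution, were studied in both the vertical and horizontal directions. The results show that codon and amino acid diversity is generally higher in viruses than in the corresponding hosts; codon usage frequencies match well for bacteria and bacteriophages, fungi and mycoviruses, and invertebrates and invertebrate viruses, while amino acid abundances match well for bacteria and bacteriophages and for invertebrates and invertebrate viruses; the AU content and AU3s content of viral genes are generally higher than those of the corresponding hosts; at the first codon position, base content follows GA > CU in both viruses and hosts; at the second codon position, it follows A > U > C > G in both; and the relative molecular mass and relative π-electron resonance energy of coding-region bases in viruses track those of their hosts fairly clearly. Across codon usage frequency and several other statistical features, viral and host genes show both a tendency toward coevolution and clear divergence.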

5.
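A preliminary study of the relationship between the prolactin receptor gene and reproductive performance in alpaca   Total citations: 4, self-citations: 0, citations by others: 4
Genomic DNA was prepared from alpaca blood by the chloroform/isoamyl alcohol method, and the exon8-exon9 sequence of the alpaca prolactin receptor gene (PRLR) was amplified by PCR for the first time (GenBank accession number DQ198164); the fragment is 622 bp long. Comparison with NCBI BLAST (http://www.ncbi.nlm.nih.gov/BLAST/) showed that the sequence comprises 82 bp of exon8, the complete 472 bp of intron8, and 68 bp of exon9. Homology comparison showed that the exon8 and exon9 nucleotide sequences of alpaca PRLR are highly homologous to the corresponding regions in other mammals, all ≥92%. The 19th base after the alpaca exon8 primer is G, whereas it is A in the other mammals (except pig); in pig, the G-to-A change occurs instead at the 34th base after the alpaca exon8 primer. Analysis of the deduced amino acid sequence showed that, because of this single-base mutation, leucine replaces isoleucine at this site in alpaca relative to other mammals. An A-G substitution also occurs at the 22nd base before the alpaca exon9 primer, but it falls on the third base of the codon, and the encoded amino acid is proline in all cases. Among these animals only the alpaca bears single offspring; whether the A-G substitution in the alpaca exon8 nucleotide sequence, and the resulting change in the encoded amino acid sequence, is related to alpaca reproductive performance remains to be studied.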

6.
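To analyse differences in codon usage between the mitochondrial genomes of cultivated and wild soybean, this paper takes their mitochondrial coding sequences as the object of study and compares the factors shaping codon bias and its evolution. The results show that: (1) The GC content of the coding regions of the cultivated and wild soybean mitochondrial genomes is 44.56% and 44.58%, respectively, indicating that mitochondrial coding genes in both are rich in A/T bases. (2) In both cultivated and wild soybean, the mean GC content of the first and second codon positions correlates highly significantly with the GC content of the third position, indicating that the role of mutation in shaping codon bias cannot be neglected; PR2-plot analysis shows that purines are used less often than pyrimidines at the third position of synonymous codons; in Nc-plot analysis, genes with an Nc ratio in the interval -0.1 to 0.2 account for more than 95% of all genes; multiple factors, including mutation and selection, jointly shape the codon usage bias of soybean mitochondrial coding sequences. (3) Twenty and 21 codons were identified as optimal codons for the mitochondrial coding sequences of cultivated and wild soybean, respectively; all end in A or T except the serine codon TCC. Taken together, these results suggest that selection has a greater influence on the formation of mitochondrial codon bias in cultivated soybean than in wild soybean, which may be a consequence of the long-term domestication of cultivated soybean from wild soybean.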

7.
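Comparison of base usage frequencies on the two DNA strands of 76 bacterial species and its significance   Total citations: 1, self-citations: 0, citations by others: 1
Using bioinformatics methods, the completely sequenced genomes of 76 bacterial species were compared to analyse base usage frequencies in coding regions and at codon positions. The results show that (1) base usage frequencies in coding regions do not differ appreciably between the leading and lagging strands and are significantly positively correlated; and (2) base usage frequencies at the first, second, and third codon positions are essentially the same on the leading and lagging strands and are significantly positively correlated. These results indicate that selective pressure and spontaneous mutation affect the overall base distribution of the two DNA strands equally.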

8.
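Using the genomes of 7 archaea, 46 bacteria, and 10 eukaryotes as samples, and taking into account both short-range and long-range correlations between bases, dinucleotide frequencies at different sites were obtained for codon pairs in coding sequences and for triplet pairs in intergenic sequences, and phylogenetic relationships were constructed from the coding sequences and from the intergenic sequences. Whether clustering was based on coding-sequence or intergenic-sequence information, the archaea and the eukaryotes each clustered on a single branch, indicating that the choice of clustering parameters was appropriate. Pairwise comparison with the phylogeny built from amino acid sequences showed that, for most Firmicutes, evolution differs substantially between coding and intergenic sequences and between coding and amino acid sequences. The analysis suggests that a more natural phylogeny can be obtained only by combining the evolutionary information from all three types of sequence.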

9.
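To determine the main factors shaping codon usage bias in the chloroplast genome of the smooth-shelled macadamia (Macadamia integrifolia Maiden & Betche), this study systematically analysed codon usage patterns and characteristics using 51 protein-coding sequences from its chloroplast genome. Analysis of codon bias parameters showed that the GC content at the three codon positions of chloroplast genes follows the order GC1 > GC2 > GC3; the effective...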

10.
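Chen Fanguo, Xia Guangmin. 《遗传》 (Yi Chuan), 2005, 27(6): 941-947
A sequence homologous to an ω-gliadin gene (ω1236) was cloned from the wheat cultivar Jinan 177 by genomic PCR. The sequence includes partial 5′ and 3′ flanking sequences and the entire putative coding sequence. It has no introns, but contains stop codons at amino acid residues 87, 117, 125, 160, 198, 313, 357, and 365, all eight of which arose through base transitions. Comparative analysis showed that the sequence is 98% homologous to an ω-gliadin gene sequence (AB059812). The deduced amino acid sequence shows that the gene has the characteristics of cereal prolamins. Phylogenetic analysis showed that the sequence is closely related to wheat ω-gliadins and more distantly related to α-, β-, and γ-gliadin genes. The deduced amino acid sequence of ω1236 encodes a putative protein of 47.2 kDa. After transformation into Escherichia coli, small peptides were produced during the first 2 h following IPTG induction, suggesting that stop codons may be present in the gene sequence, which is consistent with the sequencing results. This study provides material for cloning ω-gliadin genes by PCR and for further study of their structure and function.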

11.
12.
The frequencies of occurrence of the four bases in the first, second and third codon positions and in the total coding sequences have been calculated from the codon usage table published in 1990 by Ikemura et al. The distributions of the frequencies are further analysed in detail by a graphic technique presented recently by us. Formulas expressing the frequencies of the four bases in the first and second codon positions in terms of the frequencies of amino acids are given. The graphic analysis shows that, for 90 species, purine bases are dominant in the first codon position and in most cases G is the most dominant base. In the second codon position A is the most dominant base, while G is the least dominant. In the third codon position the G + C content varies from 0.1 to 0.9, keeping the A + C content approximately equal to 1/2 and the G content approximately equal to that of C. If the frequencies of bases A, C, G and U in the total coding sequences are denoted by a, c, g and u, respectively, the inequality a² + c² + g² + u² < 1/3 is found to hold for each of the 90 species, including human and E. coli.
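The quantities in this abstract can be computed directly from a coding sequence. The following is a minimal sketch, assuming an in-frame sequence given as a plain string. The function names and the toy sequence are hypothetical, and the paper itself worked from a published codon usage table, not from raw sequences.

```python
from collections import Counter


def codon_position_frequencies(cds):
    """Base frequencies at codon positions 1-3 and over the whole sequence.

    Assumes `cds` is non-empty and starts in frame; trailing bases that do
    not complete a codon are dropped. T is rewritten as U to match the
    abstract's notation.
    """
    cds = cds.upper().replace("T", "U")
    cds = cds[: len(cds) - len(cds) % 3]
    by_position = []
    for pos in range(3):
        counts = Counter(cds[pos::3])          # every third base from pos
        total = sum(counts.values())
        by_position.append({b: counts[b] / total for b in "ACGU"})
    overall = Counter(cds)
    total_freq = {b: overall[b] / len(cds) for b in "ACGU"}
    return by_position, total_freq


def sum_of_squares(freqs):
    """a^2 + c^2 + g^2 + u^2 for a dict of base frequencies."""
    return sum(f * f for f in freqs.values())


# Toy sequence, for illustration only.
positions, total = codon_position_frequencies("ATGGCA")
print(positions[0], sum_of_squares(total))
```

The sum of squares equals 1/4 when the four bases are equally frequent and approaches 1 when one base dominates, so the reported bound of 1/3 says that overall base composition in coding sequences stays fairly close to uniform.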

13.
The amino acid sequences of some fiber proteins possibly have a periodic structure. This periodicity can be analyzed using the Fourier transform of the mathematical image of the symbol sequence of amino acid residues in proteins. One of several possible methods of Fourier transform has been chosen as optimal for the given study. This optimal Fourier transform has been used to analyze the periodic structures in several fiber proteins of bacteriophage T4. Amino acids from some groups form sequences of alternating elements with a relatively small period (T=15); those from other groups form sequences with other small periods (T=10 and T=8). Relatively large periods of amino acid arrangement, with the entire amino acid sequence of the protein being divided between them into four or six equal parts, are a new finding. The data on protein structural periodicity make it possible to align the amino acid sequences according to the periodic structures of both types. The results obtained agree with the results of previous crystallographic and electron microscopic studies. Translated from Molekulyarnaya Biologiya, Vol. 39, No. 2, 2005, pp. 321–329. Original Russian Text Copyright © 2005 by Simakova, Simakov.

14.
With the exponential growth of genomic sequences, there is an increasing demand to accurately identify protein coding regions (exons) from genomic sequences. Despite much progress in the computational identification of protein coding regions during the last two decades, the performance and efficiency of the prediction methods still need to be improved. In addition, it is important to develop different prediction methods, since combining them may greatly improve prediction accuracy. A new method to predict protein coding regions is developed in this paper based on the fact that most exon sequences have a 3-base periodicity, while intron sequences do not have this feature. The method computes the 3-base periodicity and the background noise of stepwise DNA segments of the target DNA sequences using nucleotide distributions in the three codon positions of the DNA sequences. Exon and intron sequences can be identified from trends of the ratio of the 3-base periodicity to the background noise in the DNA sequences. Case studies on genes from different organisms show that this method is an effective approach for exon prediction.
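One way to obtain a 3-base periodicity score from nucleotide distributions in the three codon positions is sketched below. It relies on a standard identity: if a base occurs F1, F2 and F3 times at positions 0, 1 and 2 modulo 3, the squared magnitude of its period-3 Fourier component is F1² + F2² + F3² − F1·F2 − F1·F3 − F2·F3. The window size, step, and function names are assumptions for illustration, and the paper's background-noise term and ratio are not reproduced here.

```python
def three_base_periodicity(segment):
    """Period-3 Fourier power of a DNA segment, from per-position base counts."""
    total = 0.0
    for base in "ACGT":
        # f[k] = number of times `base` occurs at positions k, k+3, k+6, ...
        f = [segment[pos::3].count(base) for pos in range(3)]
        total += (f[0] ** 2 + f[1] ** 2 + f[2] ** 2
                  - f[0] * f[1] - f[0] * f[2] - f[1] * f[2])
    return total


def sliding_periodicity(seq, window=60, step=3):
    """(start, score) pairs for stepwise segments of `seq`.

    `window` and `step` are illustrative defaults, not values from the paper.
    """
    seq = seq.upper()
    return [(start, three_base_periodicity(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


# A perfectly periodic toy sequence scores high; a homopolymer scores zero.
print(three_base_periodicity("ATG" * 20), three_base_periodicity("A" * 60))
```

Working from position counts avoids evaluating a full Fourier transform for every window, which matters when the score is recomputed along a whole chromosome.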

15.
Accurate identification of protein-coding regions (exons) in DNA sequences has been a challenging task in bioinformatics. In particular, coding regions have a 3-base periodicity, which forms the basis of all exon identification methods. Many signal processing tools and techniques have been applied successfully to the identification task, but further improvement is still needed. In this paper, we introduce a new, promising, model-independent time-frequency filtering technique based on the S-transform for accurate identification of the coding regions. The S-transform is a powerful linear time-frequency representation useful for filtering in the time-frequency domain. The potential of the proposed technique has been assessed through a simulation study, and the results obtained have been compared with those of existing methods using standard datasets. The comparative study demonstrates that the proposed method outperforms its counterparts in identifying the coding regions.

16.
Base composition, codon usage and amino acid usage have been analyzed using 529 orthologous sequences of Aquifex aeolicus and Bacillus subtilis, which have different optimal growth temperatures. These two bacteria do not differ significantly in overall GC composition, but their GC(1+2) and GC3 levels were found to vary significantly. Significant increases in purine content and GC3 composition were observed in the coding sequences of Aquifex aeolicus compared with their Bacillus subtilis counterparts. Correspondence analyses of codon and amino acid usage reveal that variation in base composition influences codon and amino acid usage. Two selection pressures acting at the nucleotide level (GC3 and purine enrichment) cause amino acid usage to vary differently in different protein secondary structures. Our results suggest that the adaptation of amino acid usage in the coil structure of Aquifex aeolicus proteins is under the control of both purine increment and GC3 composition, whereas the adaptation of amino acids in the helical region of thermophilic bacteria is strongly influenced by purine content. Evolutionary perspectives on the temperature adaptation of the DNA and protein molecules of these two bacteria are discussed on the basis of these results.

17.
Periodicity in DNA coding sequences: implications in gene evolution   Total citations: 2, self-citations: 0, citations by others: 2
In this paper we have employed Fourier analysis of DNA coding and non-coding sequences in an attempt to identify possible patterns in gene sequences. It was found that while intronic sequences show a rather random pattern, coding sequences show periodicities and in particular a periodicity of 3. We were able to reconstruct such patterns by assuming a gene having one codon occurring in about 40% of the sequence. This could indicate that the predominant presence of codons all starting from the same base could confer the observed periodicities. Indeed, it was found that proteins do obey this rule. Implications of this finding in gene evolution are discussed.
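A common way to apply Fourier analysis to a DNA string is to turn each base into a 0/1 indicator sequence and sum the spectral power of the four indicators. The sketch below evaluates that power at a single chosen period. The indicator encoding and the name `power_at_period` are assumptions; the abstract does not state which numerical mapping the authors used.

```python
import cmath


def power_at_period(seq, period):
    """Summed Fourier power of the four base-indicator sequences at one period."""
    seq = seq.upper()
    power = 0.0
    for base in "ACGT":
        # Fourier component of the indicator sequence for `base`
        # at frequency 1/period.
        component = sum(cmath.exp(-2j * cmath.pi * n / period)
                        for n, ch in enumerate(seq) if ch == base)
        power += abs(component) ** 2
    return power


# Toy sequence built from one repeated codon: strong at period 3 only.
toy = "ATG" * 20
print(power_at_period(toy, 3), power_at_period(toy, 4))
```

The toy input mirrors the abstract's reconstruction argument in its most extreme form: when a single codon fills the whole sequence, all of the power concentrates at period 3 and essentially none appears at period 4.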

18.
The amino acid sequences of some fiber proteins may have a periodic structure. To analyze this periodicity, the Fourier transform of a mathematical image of the symbolic sequence of amino acids in a protein is sometimes used. In this work we employed one particular way (out of a few possible) of performing the Fourier transform, as the most straightforward and optimal. Using this optimal Fourier transform method, we analyzed the periodicity of fiber proteins in bacteriophage T4 and confirmed that a certain periodicity exists in the investigated proteins. For a number of proteins, elements of the same group alternate in the amino acid sequence with a rather small period T = 15, whereas for some other proteins the alternations have small periods of 10 and 8. The new result is the discovery of relatively large periods of amino acid alternation, which divide the amino acid sequence of the protein into 4 or 6 equal parts. These data on amino acid periodicity allowed us to align amino acid sequences in accordance with the established periods of both types, in agreement with certain results obtained in X-ray crystallography and electron microscopy experiments.

19.
20.
A simple model is put forward to explain the long-known three-base periodicity in coding DNA. We propose the concept of same-phase triplet clustering, i.e. a condition wherein a triplet appears several times in one phase without interruption by the two other possible phases. For instance, in the sequence (i): NTT_GNN_NTT_GNN_NTT_GNN_NNN_NTT_GNN (where N is any nucleotide but combinations producing TTG are excluded) there would be clustering of same-phase TTG because this triplet appears uninterruptedly in phase 2. In contrast, in the sequence (ii): TTG_NTT_GNN_NNT_TGN_NNN_NTT_GNN there is no same-phase clustering because neighboring TTGs are all in different phases. Observe also that in sequence (i) TTG triplets are separated by 3, 3 and 6 nucleotides (3n distances), while in sequence (ii) they are separated by 1, 4 and 5 nucleotides (non-3n distances). In this work, we demonstrate that in coding DNA the 3n distances generated by (i)-type sequences proportionally outnumber the non-3n distances generated by (ii)-type sequences; this condition would be the basis of three-base periodicity. Randomized sequences also had (i)- and (ii)-type sequences, but their clustering was statistically different. To prove our model we generated (i)-type sequences in a randomized sequence by inducing clustering of same-phase triplets. In agreement with the model, this sequence displayed three-base periodicity. Furthermore, two- and four-base periodicities could also be induced by artificially inducing clustering of duplets and tetraplets.
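The 3n versus non-3n distances in the two example sequences can be checked with a short script. This is a minimal sketch: the function names are hypothetical, and every N is replaced by A, which is one choice that creates no additional TTG.

```python
def triplet_gaps(seq, triplet):
    """Number of nucleotides separating successive occurrences of `triplet`.

    Self-overlapping triplets (e.g. AAA) can give negative gaps in this
    sketch; TTG cannot overlap itself, so the example is unaffected.
    """
    seq, triplet = seq.upper(), triplet.upper()
    starts = [i for i in range(len(seq) - 2) if seq[i:i + 3] == triplet]
    return [b - (a + 3) for a, b in zip(starts, starts[1:])]


def count_3n(gaps):
    """Return (number of 3n gaps, number of non-3n gaps)."""
    in_phase = sum(1 for g in gaps if g % 3 == 0)
    return in_phase, len(gaps) - in_phase


def fill(pattern):
    """Drop the codon separators and replace each N by A (an assumption)."""
    return pattern.replace("_", "").replace("N", "A")


seq_i = fill("NTT_GNN_NTT_GNN_NTT_GNN_NNN_NTT_GNN")
seq_ii = fill("TTG_NTT_GNN_NNT_TGN_NNN_NTT_GNN")
print(triplet_gaps(seq_i, "TTG"), triplet_gaps(seq_ii, "TTG"))
```

With this filling, the script reproduces the separations stated in the abstract: 3, 3 and 6 nucleotides for sequence (i), and 1, 4 and 5 for sequence (ii).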
