Similar articles
 20 similar articles found
1.
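Study of information parameters of specific sequences in Escherichia coli and yeast genes   Cited: 7, self-citations: 1, citations by others: 7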
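A matrix representation of nucleic acid sequences is proposed, and biologically meaningful information parameters M1(l), M2(l) and M3(l) are defined position by position. The study focuses on base correlation and base conservation in the Shine-Dalgarno (SD) sequences of Escherichia coli (E. coli) genes with different expression levels, and in the nucleotide sequences flanking the start and stop codons of E. coli and yeast genes. The eigenvalues of the corresponding matrices are computed, and the relationship between the information parameters and gene expression level is given. The information parameters reveal marked differences between the initiation regions of prokaryotes and eukaryotes, and both single-base conservation and base correlation in the eukaryotic initiation region are stronger than in prokaryotes.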
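The abstract does not give the formula for M1(l). As a rough illustration of a per-position single-base conservation measure, the Python sketch below computes the relative entropy (in bits) of the base distribution at each aligned position against a uniform background. The use of relative entropy, the function name `position_conservation` and the toy sites are assumptions for illustration, not the authors' definitions.

```python
import math
from collections import Counter


def position_conservation(aligned_seqs, background=None):
    """Relative entropy (bits) of the base distribution at each aligned
    position against a background distribution (uniform by default).
    Hypothetical stand-in for a per-position conservation parameter such
    as M1(l); the paper's own definition may differ."""
    if background is None:
        background = {b: 0.25 for b in "ACGT"}
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for pos in range(length):
        # Count only unambiguous bases at this position.
        counts = Counter(s[pos].upper() for s in aligned_seqs
                         if s[pos].upper() in background)
        total = sum(counts.values())
        if total == 0:
            scores.append(0.0)
            continue
        score = sum((n / total) * math.log2((n / total) / background[b])
                    for b, n in counts.items())
        scores.append(score)
    return scores


# Toy SD-like sites (invented data, for illustration only).
sites = ["AGGAGG", "AGGAGA", "AGGTGG", "CGGAGG"]
print([round(x, 3) for x in position_conservation(sites)])
```

A fully conserved position scores 2 bits (log2 of 4) under a uniform background; a position with no preference scores 0.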

2.
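The relationship between codon usage frequency and gene expression level was studied in the nucleic acid sequences of Escherichia coli (115 genes) and Saccharomyces yeast (97 genes). Based on their usage frequencies, synonymous codons were divided into three classes: optimal codons (H), non-optimal codons (L) and rare codons (R). For the coding region of each gene, the probabilities of occurrence P(H), P(L) and P(R) were computed. Using P(H) and P(R) as indices and clustering with a graph-theoretic method, the highly and lowly expressed genes of each organism were clearly separated, and gene expression was divided into four levels: very highly expressed genes (VH), highly expressed genes (H), moderately low expressed genes (LM) and low expressed genes (LL). The expression level of each class correlates well with experimental results and agrees closely with existing data for E. coli and yeast.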
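A minimal sketch of the P(H), P(L), P(R) computation, assuming a codon-to-class table is already available. The table below is hypothetical and covers only two amino acids; codons absent from it (for example Met, Trp and stop codons) are simply skipped, so the three probabilities sum to 1 over classified codons.

```python
def codon_class_probabilities(cds, codon_class):
    """Return P(H), P(L), P(R) for an in-frame coding sequence.

    codon_class maps a codon to 'H' (optimal), 'L' (non-optimal) or
    'R' (rare); codons missing from the map are ignored."""
    counts = {"H": 0, "L": 0, "R": 0}
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    for i in range(0, usable, 3):
        cls = codon_class.get(cds[i:i + 3].upper())
        if cls in counts:
            counts[cls] += 1
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}


# Hypothetical class table (not the paper's table).
classes = {"CTG": "H", "CTC": "L", "CTA": "R", "AAA": "H", "AAG": "L"}
probs = codon_class_probabilities("ATGCTGCTGAAACTAAAGTAA", classes)
print(probs)
```

Each gene then becomes a point (P(H), P(R)) that can be fed to whatever clustering method is preferred.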

3.
Cloning and expression of tumor necrosis factor α (TNFα) cDNA from red sea bream   Cited: 2, self-citations: 0, citations by others: 2
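Using homology cloning and rapid amplification of cDNA ends (RACE), a 1264 bp full-length TNFα cDNA sequence (GenBank accession no. AY314010) was cloned from red sea bream. The sequence comprises a 666 bp open reading frame (ORF), an 85 bp 5′ untranslated region (UTR) and a 514 bp 3′ UTR. The 3′ UTR contains five mRNA instability motifs and three endotoxin-responsive sequences, but no polyadenylation signal was found. The deduced polypeptide contains a clear transmembrane domain, the TNF family signature, the TNF2 family profile, a mucopolysaccharide-binding site and a cell-attachment site. Phylogenetic analysis shows that red sea bream TNF is highly similar to mammalian TNFα and TNFβ and shares a common ancestor with them; structural and expression analyses, however, show that it is TNFα rather than TNFβ. Expression studies show that red sea bream TNFα has both constitutive and inducible expression: TNFα is expressed in some tissues of both stimulated and unstimulated fish, but the number of tissues expressing the gene increases markedly in stimulated fish.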

4.
Cloning and sequence analysis of the yak α-lactalbumin gene   Cited: 13, self-citations: 0, citations by others: 13
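Primers were designed from the dairy cattle α-lactalbumin gene sequence, and the complete α-lactalbumin gene of the yak (Poephagus grunniens) was amplified by PCR and cloned. The gene contains four exons and three introns between positions 671 and 2689 bp, and encodes 142 amino acids, of which residues 1-19 form the signal peptide. The relatively high expression of the yak α-lactalbumin gene may be related to the loss of a palindromic structure caused by base mutations in non-coding sequences within the gene. The 5′ flanking sequence of the gene is structurally almost identical in yak and cattle, differing only slightly at the MGF recognition site. The yak sequence matches more closely the consensus for this factor's recognition site summarized by Groenen et al. (1994), so the 5′ regulatory region of the yak gene may be better suited for producing transgenic animals with tissue-specific expression.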
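The abstract links the yak gene's higher expression to the loss of a palindromic structure. As a sketch of how such structures can be located, the Python below scans for perfect reverse-complement palindromes (stretches that read the same on both strands). The length limits and the function names are hypothetical; imperfect palindromes and hairpins with loops are not handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_palindromes(seq, min_len=6, max_len=12):
    """Return (start, subsequence) for every perfect reverse-complement
    palindrome with length between min_len and max_len.

    Only even lengths are tested: in an odd-length stretch the middle base
    would have to equal its own complement, which is impossible."""
    seq = seq.upper()
    start_len = min_len + (min_len % 2)
    hits = []
    for length in range(start_len, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            sub = seq[start:start + length]
            if sub == revcomp(sub):
                hits.append((start, sub))
    return hits


print(find_palindromes("TTGAATTCAA"))
```

Comparing the hits in homologous yak and cattle sequences would show where a palindrome present in one species is broken in the other.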

5.
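Using 5′/3′ RACE PCR, the full-length cDNA pacs of ACC synthase, a key enzyme of plant ethylene biosynthesis, was cloned from peach (Prunus persica (L.) Batsch) fruit. Complete sequencing showed that the cDNA is 1848 bases long, with a 1449-base coding region, a 177-base 5′ non-coding region and a 219-base 3′ non-coding region (not counting the stop codon TAA). The pacs coding region encodes 483 amino acids, giving a protein of 54 kD with an isoelectric point of 6.43. The deduced amino acid sequence of pacs shares 65%, 70%, 75% and 90% homology with the ACC synthases of tomato (S19677), Japanese apricot (AB031026), papaya (U68216) and apple (AB034993), respectively, and contains the conserved active-site sequence SLSKDMGFPGFR found in these enzymes. RT-PCR combined with hybridization showed that pacs and the peach ACC synthase cDNA pacs12 (AF467782) that we cloned previously have essentially the same expression patterns in leaves and flowers; both wounding and IAA induce expression of pacs and pacs12 in leaves, but pacs is expressed at a higher level than pacs12 in wounded leaves. Expression of the two genes differs in fruit: pacs is expressed in both mature-green and ripe fruit, whereas pacs12 is essentially undetectable in mature-green fruit and is expressed only in ripe fruit. Expression of both genes in fruit is lower than in wounded or IAA-treated leaves and in flowers.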
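The reported pacs lengths can be cross-checked with simple arithmetic. The sketch below assumes, as the abstract states, that the 1449-base coding length excludes the 3-base stop codon; the helper name is hypothetical.

```python
def check_cdna_layout(total_len, utr5_len, cds_len, utr3_len, stop_len=3):
    """Return (lengths_consistent, n_amino_acids) for a cDNA whose reported
    coding length excludes the stop codon."""
    if cds_len % 3 != 0:
        raise ValueError("coding length is not a multiple of 3")
    consistent = utr5_len + cds_len + stop_len + utr3_len == total_len
    return consistent, cds_len // 3


# pacs figures from the abstract: 177 + 1449 + 3 + 219 = 1848
print(check_cdna_layout(1848, 177, 1449, 219))  # (True, 483)
```

With the common rule of thumb of roughly 110 Da per residue, 483 residues give about 53 kDa, in line with the reported 54 kD.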

6.
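Using a redefined relative pattern content of base fragments, the preferred trinucleotide patterns of Escherichia coli and yeast gene sequences were studied. In the leader region, the relative content of the start-codon pattern is markedly lower than in other regions, and in the trailer region and the coding termination region of E. coli genes, "SSS"-type patterns have a higher average relative content. The relationship between gene expression level and the frequency of preferred or forbidden patterns in the leader and trailer regions was also studied. In the leader region, the frequency of preferred patterns correlates positively with expression level, whereas the frequency of forbidden patterns correlates negatively; in the trailer region, the relationship between these frequencies and expression level differs between the two organisms.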
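The abstract does not define its "relative pattern content". One common normalization, assumed here, divides the observed frequency of each overlapping trinucleotide by the frequency expected from the sequence's own base composition. Class averages such as "SSS" (S = G or C in IUPAC notation) can then be taken over the matching trinucleotides. Function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Subset of IUPAC nucleotide codes used for pattern classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "S": "GC", "W": "AT", "N": "ACGT"}


def relative_trinucleotide_content(seq):
    """Observed/expected ratio for every trinucleotide, with the expectation
    taken as the product of mononucleotide frequencies (overlapping windows)."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in "ACGT")
    n_mono = sum(mono.values())
    tri = Counter(seq[i:i + 3] for i in range(len(seq) - 2)
                  if all(b in "ACGT" for b in seq[i:i + 3]))
    n_tri = sum(tri.values())
    ratios = {}
    if n_tri == 0:
        return ratios
    for xyz in ("".join(p) for p in product("ACGT", repeat=3)):
        expected = 1.0
        for b in xyz:
            expected *= mono[b] / n_mono
        if expected > 0:
            ratios[xyz] = (tri[xyz] / n_tri) / expected
    return ratios


def class_mean(ratios, pattern):
    """Mean ratio over all trinucleotides matching an IUPAC pattern."""
    members = ("".join(p) for p in product(*(IUPAC[c] for c in pattern)))
    values = [ratios[m] for m in members if m in ratios]
    return sum(values) / len(values) if values else 0.0


r = relative_trinucleotide_content("ACGT" * 10)
print(round(r["ACG"], 3), class_mean(r, "SSS"))
```

Ratios above 1 mark over-represented ("preferred") patterns and ratios well below 1 mark under-represented ("forbidden") ones under this assumed definition.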

7.
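To improve the accuracy of splice-site recognition in untranslated regions, a method combining statistical probabilities with a support vector machine is proposed. The method has two stages. In the first stage, untranslated region (UTR) sequences are described statistically: features such as the correlation between bases, position specificity and conservation are expressed as probabilities, and these probability parameters serve as the input vector for the second stage. In the second stage, a support vector machine (SVM) with a polynomial kernel recognizes the splice sites. Tests on a human 5′ UTR splice-site dataset show that the method recognizes splice sites in untranslated regions well.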
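A minimal sketch of a first-stage feature extractor, assuming position-specific base probabilities (with a pseudocount) are estimated separately from true and false splice-site windows, and each candidate window is encoded as per-position log-odds. This captures position specificity and conservation only; modelling base-to-base correlation would need extra terms (for example first-order Markov probabilities). The toy windows and function names are hypothetical.

```python
import math
from collections import Counter


def position_probabilities(sites, pseudocount=1.0):
    """Per-position base probabilities from aligned ACGT windows."""
    length = len(sites[0])
    probs = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in sites)
        total = sum(counts[b] for b in "ACGT") + 4 * pseudocount
        probs.append({b: (counts[b] + pseudocount) / total for b in "ACGT"})
    return probs


def log_odds_features(window, true_probs, false_probs):
    """Feature vector: log(P_true(base) / P_false(base)) at each position."""
    if len(window) != len(true_probs):
        raise ValueError("window length must match the model length")
    return [math.log(tp[b] / fp[b])
            for b, tp, fp in zip(window.upper(), true_probs, false_probs)]


true_sites = ["GTAAGT", "GTGAGT", "GTAAGG"]   # invented positives
false_sites = ["ACGTAC", "TTGCAA", "CAGTCC"]  # invented negatives
features = log_odds_features("GTAAGT",
                             position_probabilities(true_sites),
                             position_probabilities(false_sites))
print([round(f, 3) for f in features])
```

For the second stage, vectors like these could be passed to a polynomial-kernel classifier such as scikit-learn's `SVC(kernel="poly")`; which kernel degree the authors used is not stated in the abstract.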

8.
Correlation analysis of local GC levels in human protein-coding genes   Cited: 2, self-citations: 0, citations by others: 2
Chen Xianggui, Hu Jun, Yang Xiao. Hereditas (遗传), 2008, 30(9): 1169-1174
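GC content is an important feature of the base composition of genomic DNA and carries information about gene structure, function and evolution. Using 7,992 non-redundant human protein-coding gene DNA sequences extracted from public databases, we analyzed the local GC content of different gene regions and the correlations among them. Local GC content within genes is heterogeneous: the 5′ untranslated region has the highest GC level (62.56%) and the 3′ untranslated region the lowest (43.97%). The GC content of the 3′ flanking sequence represents the GC level of the long DNA segment containing the gene well. Although the GC content of the open reading frame is higher than that of introns, the 3′ untranslated region and the 3′ flanking sequence, the GC contents of these four regions are all highly correlated. The average GC content at the third codon position (GC3) is 58.09%, significantly higher than at the first and second positions, and is highly correlated with the GC level of the open reading frame (correlation coefficient 0.91). GC3 is also well correlated with the GC levels of introns, the 3′ untranslated region and the 3′ flanking sequence; the slope of the linear regression of GC3 on the GC content of the 3′ flanking sequence is 1.25. GC3 can therefore serve as a sensitive indicator of changes in the GC level of the region containing the gene. By contrast, the GC levels of the first and second codon positions, the 5′ flanking sequence and the 5′ untranslated region are only weakly correlated with those of other gene regions. These results suggest that bases at the third codon position of the protein-coding region, in introns, in the 3′ untranslated region and in the 3′ flanking sequence may have undergone similar evolutionary processes, whereas the first and second codon positions, the 5′ flanking sequence and the 5′ untranslated region have undergone different mutation and selection because of functional requirements.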
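The quantities in this abstract (GC1, GC2, GC3, correlation coefficients and a regression slope) can be computed as sketched below. The function names are hypothetical, and the regression follows the abstract's direction: GC3 (y) regressed on 3′ flanking GC (x).

```python
import math


def gc_fraction(seq):
    """Fraction of G+C among unambiguous bases."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in acgt) / len(acgt) if acgt else 0.0


def codon_position_gc(cds):
    """(GC1, GC2, GC3) of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3
    return tuple(gc_fraction(cds[k:usable:3]) for k in range(3))


def pearson_and_slope(x, y):
    """Pearson r and least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables need non-zero variance")
    return sxy / math.sqrt(sxx * syy), sxy / sxx


print(codon_position_gc("ATGGCCAAG"))
# x: flank GC per gene, y: GC3 per gene (invented values)
print(pearson_and_slope([1, 2, 3, 4], [4.25, 5.5, 6.75, 8.0]))
```

A slope above 1, as reported, means GC3 changes faster than the surrounding GC level, which is why it works as a sensitive indicator.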

9.
Statistical analysis of characteristic parameters of Escherichia coli promoters   Cited: 1, self-citations: 0, citations by others: 1
Lin Hao. Chinese Journal of Bioinformatics (生物信息学), 2009, 7(1): 37-39, 43
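First, the single-base frequency at each position of 683 E. coli σ70 promoter sequences was counted, and the M1(l) value, which reflects single-base conservation at each position, was computed together with its fluctuation limit, identifying multiple conserved positions whose values exceed the fluctuation limit. Second, the distance from the transcription start site to the translation start site in E. coli was analyzed; it ranges from 0 to 1000 bp. E. coli promoters are also located in certain intergenic regions and in coding regions, namely in DIV intergenic regions, in 55% of TAN intergenic regions and in 6% of coding regions. These promoter features are important parameters for promoter recognition.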

10.
Li Xin, Chen Hong, Wang Wen. Zoological Research (动物学研究), 2005, 26(3): 225-229
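Non-coding sequences play important roles in regulating gene expression, but whether they are subject to selection during evolution has been difficult to detect. Some recent studies use the ratio (ω) of the average nucleotide substitution rate to the substitution rate of neutral sequences as an indicator of overall selection on non-coding regions; for non-coding regions, however, it is more meaningful to know which specific nucleotides are under selection. Following the approach of Nielsen & Yang (1998) for detecting selection at individual amino acid sites, we propose a maximum-likelihood method for detecting natural selection at the level of individual nucleotide sites. The method can detect nucleotide sites, in both coding and non-coding regions, that have contributed significantly to functional divergence during evolution. Applied to a protein-coding gene known to be under positive selection (the coding region of the HIV-1 envelope protein gene), it detected all the known positively selected nucleotide (codon) sites, indicating that it can effectively detect selection at the nucleotide-site level; applied to a non-coding region (the 5′ UTR of the CTGF gene), it also gave good results.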
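For contrast with the site-level method, the region-average ω mentioned at the start of the abstract can be sketched as a ratio of two pairwise distances. This swaps in Jukes-Cantor corrected p-distances between two aligned sequences as a simple stand-in for the substitution rates; it is not the authors' maximum-likelihood method, and the helper names are hypothetical.

```python
import math


def p_distance(s1, s2):
    """Proportion of differing sites among aligned, unambiguous positions."""
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected substitutions per site."""
    if p >= 0.75:
        raise ValueError("distance saturated")
    return -0.75 * math.log(1 - 4 * p / 3)


def omega_ratio(region_a, region_b, neutral_a, neutral_b):
    """Region divergence relative to neutral divergence (omega < 1 suggests
    constraint, omega > 1 suggests accelerated evolution)."""
    d_neutral = jukes_cantor(p_distance(neutral_a, neutral_b))
    if d_neutral == 0:
        raise ValueError("neutral sequences are identical")
    return jukes_cantor(p_distance(region_a, region_b)) / d_neutral


# Invented alignments: 1/10 differences in the region, 2/10 at neutral sites.
print(round(omega_ratio("ACGTACGTAC", "ACGTACGTAA",
                        "ACGTACGTAC", "ACGTACGTTA"), 3))
```

A single region-wide ratio like this can hide a few selected sites among many neutral ones, which is the limitation the site-level method is meant to address.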

11.
This study examines the relationship between DNA sequence variation and level of gene expression in four metallothionein genes from the wild rice Oryza rufipogon. Nucleotide diversity ranged from 0.0028 to 0.0117 over the entire coding and non-coding regions, and it was negatively correlated with gene expression for three type 2 metallothionein genes. In contrast, codon bias and the percentage of preferred codons correlated positively with gene expression. These results indicate that the intensity of natural selection depends on the level of gene expression, which in turn shapes the level of nucleotide polymorphism. In addition, significant linkage disequilibrium was frequently observed between the metallothionein genes, although significance was not confirmed after multiple-test correction. This result suggests that metallothionein genes expressed at different levels are epistatic with respect to fitness, and that gene expression is an important factor determining the level of DNA polymorphism.
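Nucleotide diversity (π) of the kind reported above is the average number of pairwise differences per site. A minimal sketch, assuming gap-free aligned sequences of equal length; the function name and toy alignment are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site (pi) for gap-free aligned sequences."""
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be non-empty and equally long")
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# 4 sequences, 6 pairs, 7 differences in total, 4 sites: 7 / 24
print(nucleotide_diversity(["AAAA", "AAAT", "AATT", "AAAA"]))
```

Computing π separately for each gene and pairing it with an expression measure gives the data for a correlation like the one described.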

12.
The context-dependent expression of genes is central to biological activity, and considerable attention has been given to identifying the factors that contribute to gene expression at the genomic scale. So far, however, such analyses have focused either on the relationship between mRNA expression and non-coding sequence features such as upstream regulatory motifs, or on the correlation between mRNA abundance and non-random features of coding sequences (e.g., codon usage and amino acid usage). In this study, multiple regression analyses of mRNA abundance against all sequence information in Desulfovibrio vulgaris were performed, with the goal of investigating how much coding and non-coding sequence features contribute to the variation in mRNA expression, and how they act together. Using the AlignACE program, 442 over-represented motifs were identified in the 100 bp upstream regions of 293 genes located in known regulons. Regression of mRNA expression data against measures of coding and non-coding sequence features indicated that 54.1% of the variation in mRNA abundance can be explained by the presence of upstream motifs, while coding sequences alone account for 29.7%. Interestingly, most of the contribution from coding sequences overlaps with that from upstream motifs, so that a total of 60.3% of the variation in mRNA abundance is explained when both coding and non-coding information are included. This result demonstrates that upstream regulatory motifs and coding sequence information contribute to overall mRNA expression in a combinatorial rather than an additive manner.
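The "combinatorial rather than additive" conclusion rests on comparing R² values: 54.1% plus 29.7% would be 83.8% if the two feature sets were independent, yet together they explain only 60.3%. The sketch below reproduces that kind of comparison with a small ordinary least-squares R² in pure Python. The predictors and responses are invented and deliberately overlapping, and the function name is hypothetical.

```python
def ols_r2(predictors, y):
    """R^2 of an ordinary least-squares fit of y on the predictor columns
    (an intercept is added). Solves the normal equations by Gauss-Jordan
    elimination with partial pivoting."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


# Invented, strongly overlapping predictors.
motif_score = [1, 2, 3, 4, 5, 6]
codon_score = [1, 2, 3, 4, 5, 7]
expression = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]
r2_motif = ols_r2([motif_score], expression)
r2_codon = ols_r2([codon_score], expression)
r2_both = ols_r2([motif_score, codon_score], expression)
print(round(r2_motif, 3), round(r2_codon, 3), round(r2_both, 3))
```

When the joint R² is well below the sum of the single-set R² values, the feature sets explain largely the same variance, which is the situation the abstract describes.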

13.
ORF organization and gene recognition in the yeast genome   Cited: 3, self-citations: 0, citations by others: 3
Several rules for gene recognition and ORF organization in the Saccharomyces cerevisiae genome are demonstrated by statistical analyses of sequence data. This study includes: (a) the random frame rule: the six reading frames W1, W2, W3, C1, C2 and C3 of the double-stranded genome are randomly occupied by ORFs (related phenomena of ORF overlapping are also discussed); (b) the inhomogeneity rule: coding and non-coding ORFs differ in the inhomogeneity of base composition across the three codon positions. Using the inhomogeneity index (IHI), one can distinguish between coding (IHI > 14) and non-coding (IHI
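Enumerating ORFs in the six reading frames is the starting point for statistics like the random frame rule. A minimal sketch, assuming an ORF runs from an ATG to the first in-frame stop codon and that frames are labelled W1-W3 by offset on the given strand and C1-C3 by offset on its reverse complement. This labelling convention, the minimum length and the function names are assumptions; C-frame coordinates refer to the reverse-complement sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def six_frame_orfs(genome, min_codons=100):
    """Return (frame_label, start, end) for ATG-to-stop ORFs with at least
    min_codons sense codons; end includes the stop codon."""
    genome = genome.upper()
    results = []
    for strand, seq in (("W", genome), ("C", revcomp(genome))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        results.append((f"{strand}{frame + 1}", start, i + 3))
                    start = None
    return results


print(six_frame_orfs("ATGAAATAA", min_codons=2))
```

Counting how many ORFs fall into each frame label across a genome gives the frame-occupancy distribution that the random frame rule describes.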

14.
A method for measuring the non-random bias of a codon usage table   Cited: 7, self-citations: 3, citations by others: 4
We describe a new statistical method for measuring bias in the codon usage table of a gene. The test is based on the multinomial and Poisson distributions. The method is used to scan DNA sequences and measure the strength of codon preference. For E. coli we show that the strength of codon preference is related to the level of gene expression. The method can also be used to compare base-triplet frequencies with those expected from the base composition; this second type of codon bias test is useful for distinguishing coding from non-coding regions.
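As a simplified illustration of testing a codon usage table for non-random bias, the sketch below swaps in a multinomial likelihood-ratio (G) statistic: within each amino acid, observed codon counts are compared with equal use of all synonymous codons, and the statistics are summed across amino acids. This is not the paper's exact test, and the function name is hypothetical; Met, Trp and stop codons are excluded.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
SYNONYMS = defaultdict(list)
for codon, aa in CODE.items():
    SYNONYMS[aa].append(codon)


def codon_bias_g(cds):
    """Return (G, degrees_of_freedom) for deviation from uniform
    synonymous codon usage in an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3].upper() for i in range(0, usable, 3))
    g, df = 0.0, 0
    for aa, syn in SYNONYMS.items():
        if aa == "*" or len(syn) < 2:
            continue
        n = sum(counts[c] for c in syn)
        if n == 0:
            continue
        expected = n / len(syn)
        g += 2 * sum(counts[c] * math.log(counts[c] / expected)
                     for c in syn if counts[c] > 0)
        df += len(syn) - 1
    return g, df


print(codon_bias_g("CTG" * 6))  # all six Leu codons could be used, only CTG is
```

G can be referred to a chi-square distribution with the returned degrees of freedom (for example via `scipy.stats.chi2.sf`), although that approximation is poor for the small counts typical of single genes.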

15.
Effects of transposable elements on synonymous codon usage bias in rice   Cited: 1, self-citations: 0, citations by others: 1
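Using 635 japonica rice CDS sequences containing complete transposable element (TE) insertions, we analyzed in detail how TEs affect the base composition of coding regions and the expression levels of genes, and thereby synonymous codon usage bias. TE insertion affects synonymous codon usage in coding regions highly significantly, but it is not the only factor. TEs have multiple effects on the expression levels of different genes, suppressing some and enhancing others; overall, however, they reduce the influence of gene expression level on synonymous codon usage.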
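Synonymous codon usage bias is commonly summarized by relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons of that amino acid were used equally. The abstract does not say which index the authors used, so the sketch below is only an illustration of RSCU under the standard genetic code; the function name is hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS = defaultdict(list)
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            SYNONYMS[AMINO[16 * i + 4 * j + k]].append(a + b + c)


def rscu(cds):
    """RSCU for every codon of each amino acid observed in the CDS.
    1.0 means no bias; values above 1 mark preferred codons."""
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3].upper() for i in range(0, usable, 3))
    values = {}
    for aa, syn in SYNONYMS.items():
        if aa == "*":
            continue  # stop codons are not scored
        total = sum(counts[c] for c in syn)
        if total == 0:
            continue
        for c in syn:
            values[c] = counts[c] * len(syn) / total
    return values


v = rscu("CTG" * 3 + "CTT")
print(v["CTG"], v["CTT"], v["TTA"])
```

Comparing RSCU profiles of genes with and without TE insertions would be one way to quantify the shift in codon usage the abstract reports.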

16.
The nucleotide sequence of a cDNA clone (pML10) for chicken cardiac myosin light chain is described. The cDNA insert contains 613 nucleotides, representing the entire coding sequence except for the nine NH2-terminal amino acids, together with the full 3'-non-coding region of 146 nucleotides. The missing 5' terminus of the mRNA, not represented in clone pML10, was obtained by extending the cDNA with a 43-nucleotide internal EcoRI fragment as primer. The non-coding region contains several direct and inverted repeats and the polyadenylation signal sequence AATAAA. The coding portion shows non-random usage of synonymous codons, with a strong bias toward codons ending in G or C.

17.
Zhou Xueping, Liu Yong. Chinese Journal of Virology (病毒学报), 1997, 13(3): 240-246
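Primers were synthesized based on the sequence of tobacco mosaic virus (TMV) strain U1. cDNA was synthesized by reverse transcription (RT), and the coat protein gene and 3′ non-coding region of the TMV broad bean strain were then amplified by PCR and cloned. DNA sequencing showed that the coat protein gene is 480 bases long and encodes 158 amino acids, and that the 3′ non-coding region is 204 bases long; the sequences share 100% homology with strain TMV-U1.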

18.
Sauvage C, Bierne N, Lapègue S, Boudry P. Gene, 2007, 406(1-2): 13-22
DNA sequence polymorphism and codon usage bias were investigated in a set of 41 nuclear loci in the Pacific oyster Crassostrea gigas. Our results revealed a very high level of DNA polymorphism in oysters, of the same order of magnitude as the highest levels reported in animals to date. A total of 290 single nucleotide polymorphisms (SNPs) were detected, 76 in exons and 214 in non-coding regions. Average SNP density was estimated at one SNP every 60 bp in coding regions and one every 40 bp in non-coding regions. Non-synonymous substitutions contributed substantially to the polymorphism observed in coding regions. The ratio of non-synonymous to silent diversity was 0.16 on average, considerably higher than the ratios reported in other invertebrate species known to have large population sizes. Purifying selection therefore does not appear to be as strong as might be expected for a species with a large effective population size. The level of non-synonymous diversity varied greatly among genes, in accordance with varying selective constraints. We examined codon usage bias and its relationship with DNA polymorphism. The table of optimal codons was deduced from an EST dataset, using EST counts as a rough proxy for gene expression. As recently observed in some other taxa, we found a strong and significant negative relationship between codon bias and non-synonymous diversity, suggesting correlated selective constraints on synonymous and non-synonymous substitutions. Codon bias, as measured by the frequency of codons optimal for expression, might therefore provide a useful indicator of the level of constraint on proteins in the oyster genome.

19.
The nucleotide sequence of the gene encoding the major outer-shell glycoprotein of UK bovine rotavirus has been determined. The dsRNA genome segment encoding this protein was converted into ds cDNA and cloned into pBR322 for sequencing. The gene is 1062 base pairs long and contains a single long open reading frame capable of coding for a protein of 326 amino acids, leaving 5' and 3' non-coding regions of 48 and 36 nucleotides in the mRNA. The predicted amino acid sequence contains three potential glycosylation sites of the type Asn-X-Ser/Thr and an extremely hydrophobic N-terminal region. The sequence is discussed in the light of the known properties and functions of the protein.
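Potential N-glycosylation sites of the Asn-X-Ser/Thr type can be located with a regular expression. The sketch below follows the usual convention that X may be any residue except proline, and uses a lookahead so that overlapping sites are all reported. A sequon match only marks a candidate site, not confirmed glycosylation. The function name and test peptide are hypothetical.

```python
import re

# Asn, any residue except Pro, then Ser or Thr; lookahead allows overlaps.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein):
    """0-based positions of Asn residues starting an N-X-S/T sequon (X != P)."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


print(find_sequons("MNASNPTQNGT"))  # N-P-T at position 4 is skipped
```

Applied to the deduced 326-residue sequence, a scan like this would be expected to return the three candidate sites the abstract mentions.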

20.
A frequently used approach for detecting potential coding regions is to search for stop codons. In the standard genetic code, 3 of the 64 trinucleotides are stop codons, so in random or non-coding DNA roughly every 21st trinucleotide can be expected to match a stop codon. In contrast, the open reading frames (ORFs) of most protein-coding genes are considerably longer. Thus, the stop codon frequency in coding sequences deviates from the background frequency of the corresponding trinucleotides. This has been exploited for gene prediction, in particular for detecting protein-coding ORFs. Traditional methods based on stop codon frequency assume a GC content of about 50%. However, many genomes deviate significantly from that value. With the presented method we can describe the effect of GC content on the choice of appropriate length thresholds for potentially coding ORFs. Conversely, for a given length threshold, we can calculate the probability of observing an ORF of that length in a random sequence. Thus, we can derive the maximum GC content for which ORF length is practicable as a feature for gene prediction methods, and the resulting false positive rates. A rough estimate of this upper limit is a GC content of 80%. The estimate can be made more precise by including further parameters and by also taking start codons into account. We demonstrate the feasibility of the method by applying it to the genomes of the bacteria Rickettsia prowazekii, Escherichia coli and Caulobacter crescentus, illustrating the effect of GC content variation as predicted. We have adapted the method of predicting coding ORFs by stop codon frequency to GC contents different from 50%. Usually several gene-finding methods need to be combined, so our results concern one specific component within a package of methods. Interestingly, for genomes with low GC content, such as that of R. prowazekii, the presented method gives remarkably good results even when applied alone.
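The GC dependence described above follows from simple probabilities. Assuming a random sequence with independent bases, where G and C each occur with probability GC/2 and A and T each with (1 - GC)/2, the probability that a triplet is a stop codon is P(TAA) + P(TAG) + P(TGA). At 50% GC this is 3/64, the "every 21st trinucleotide" above. The sketch computes that probability and the shortest ORF length (in codons) that a random single frame reaches with probability below a chosen alpha. It illustrates the principle only and is not the authors' full method; function names and alpha are hypothetical.

```python
import math


def stop_codon_probability(gc):
    """P(random triplet is TAA, TAG or TGA) for a given GC fraction."""
    at = (1.0 - gc) / 2  # probability of A (or of T)
    s = gc / 2           # probability of G (or of C)
    # TAA has three A/T bases; TAG and TGA each have two A/T bases and one G.
    return at ** 3 + 2 * at * at * s


def min_orf_codons(gc, alpha=0.01):
    """Smallest n such that P(no stop in n random codons) < alpha."""
    p = stop_codon_probability(gc)
    if p <= 0:
        raise ValueError("stop codons cannot occur at this GC content")
    return math.floor(math.log(alpha) / math.log(1.0 - p)) + 1


for gc in (0.3, 0.5, 0.7):
    print(gc, round(stop_codon_probability(gc), 4), min_orf_codons(gc))
```

Because stop codons are AT-rich, the threshold grows quickly with GC content, so long random ORFs become common and ORF length loses discriminating power in GC-rich genomes.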
