Similar articles
Found 20 similar articles (search time: 906 ms)
1.
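Short CpG islands of 200-500 bp play a very important role in the regulation of gene expression, yet no fully satisfactory method for finding them currently exists. Based on a permutation-and-combination algorithm for the random distribution of bases on a DNA fragment of given length, we define a new method for calculating the CpG observed-to-expected ratio. Combining two parameters, DNA fragment length and GC content, the method yields predictions of the distribution of CpG islands on human chromosomes 21 and 22. From a comparative analysis of CpG islands against gene functional regions, Alu repeats, and the UCSC CpG islands, this study proposes new criteria for identifying CpG islands: (1) length of at least 200 bp; (2) GC content of at least 50%; (3) CpG observed-to-expected ratio of at least 1.4. Comparison with the Takai method shows that the new method markedly excludes the influence of Alu repeats on CpG island prediction and accurately predicts the distribution of shorter CpG islands on DNA fragments. Analysis of the transcription start sites of multiple genes indicates that short CpG islands are the core components of the UCSC CpG islands and are core elements involved in the regulation of gene expression. This study provides a necessary tool for predicting and analyzing the role of short CpG islands in human gene regulation.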
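The three criteria in the abstract above can be illustrated with a minimal Python sketch. It assumes the conventional observed-to-expected formula, (CpG count × length) / (C count × G count), rather than the paper's permutation-based ratio, which the abstract does not spell out. The function names are hypothetical, and because the 1.4 cutoff was calibrated for the paper's own ratio, the cutoff is left as a caller-supplied parameter.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Conventional CpG observed/expected ratio (an assumption, not the paper's formula)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)


def meets_criteria(seq: str, min_oe: float,
                   min_len: int = 200, min_gc: float = 0.5) -> bool:
    """Check length, GC fraction, and observed/expected ratio against the cutoffs."""
    return (len(seq) >= min_len
            and gc_fraction(seq) >= min_gc
            and cpg_obs_exp(seq) >= min_oe)
```

A candidate window would be tested with, for example, `meets_criteria(window, min_oe=1.4)`; whether 1.4 is a sensible cutoff for the conventional ratio is not something the abstract settles.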

2.
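Objective: To investigate the sequence characteristics of the promoter region of the human Toll-like receptor 9 (TLR9) gene. Methods: Bioinformatics techniques were used to predict the promoter region, transcription factor binding sites, and CpG island distribution of the human TLR9 gene. Results: The promoter region of the human TLR9 gene contains 1434 transcription factor binding sites; 23 shared transcription factor binding sites lie within regions conserved between human and mouse; and the promoter region contains a 572 bp CpG island. Conclusion: Bioinformatic study of the human TLR9 gene promoter region improves the efficiency of promoter research and provides important information for predicting the function of gene promoters.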

3.
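In human and mouse, ASCL2 is a maternally expressed imprinted gene that plays an important role in early embryonic and placental development. The imprinting status of the bovine ASCL2 gene and the molecular mechanism of its imprinting have not yet been studied. This study used bioinformatic methods to analyze and predict the molecular evolution, promoter and CpG island regions, and higher-order protein structure of the bovine ASCL2 gene, laying a foundation for further elucidating the gene's biological function and molecular regulatory mechanism. Evolutionary analysis of ASCL2 mRNA sequences from 21 mammalian species showed that the genetic distances among these species were below 0.536 and that the distance between cattle and pig was the smallest, at 0.106, consistent with the gene phylogenetic tree. Online CpG island prediction software showed that, in cattle, the 5 kb sequence upstream of the gene contains three CpG islands. Combining online promoter prediction with transcription factor analysis indicated that the promoter most likely lies within the CpG island region 4725-4775 bp upstream of the gene's 5' end; this region contains numerous potential transcription factor binding sites, and a TATA box is present at 4734 bp. Online protein analysis showed that ASCL2 encodes a helix-loop-helix transcription factor with three types of secondary structure: α-helix, β-turn, and random coil.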

4.
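Objective: To investigate the sequence characteristics of the 5' regulatory region of the human Daintain gene. Methods: The online tools BLAST, Neural Network Promoter Prediction, Promoter 2.0, Promoter SCAN, EMBOSS, CpG Island Searcher, and TFSEARCH were used to predict the promoter region, CpG island distribution, and transcription factor binding sites of the human Daintain gene. Results: The 5' regulatory region of the human Daintain gene contains one CAAT box. The Daintain gene may have six promoter sites, and a CpG island may lie in a 216 bp interval (23-238 bp). The sequence contains 251 possible transcription factor binding sites at scores of 85 and above, 70 at scores of 90 and above, 16 at scores of 95 and above, and 7 at scores of 100 and above; the transcription factors that bind these sites are mostly associated with immune cell proliferation or sex determination. Conclusion: Bioinformatic study of the 5' regulatory region of the human Daintain gene indicates that its transcription is regulated by methylation and by multiple transcription factors, providing a theoretical basis for studying the function of the Daintain gene promoter.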

5.
6.
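Objective: To analyze the characteristics of the sequence and expression product of the serine/arginine-rich splicing factor 2 (SRSF2) gene. Methods: Bioinformatics software was used to analyze and predict the homologous segments, open reading frames, promoter regions, transcription factor binding sites, and CpG island distribution of the human and mouse SRSF2 genes, and to predict the functional domains of the mouse SRSF2 protein product and its interactions with other proteins. Results: The human and mouse SRSF2 genes share 3 homologous segments, 19 open reading frames, and 4 identical transcription factor binding sites; the CpG island parameters of the two genes are essentially the same; and the mouse SRSF2 protein interacts with at least 10 other protein factors. Conclusion: Bioinformatic analysis of the SRSF2 gene and its protein product provides an important informational basis for related research.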

7.
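A bidirectional promoter is a stretch of DNA located between two adjacent genes that are transcribed in opposite directions. The mechanism of bidirectional transcription may be that two RNA polymerases assemble simultaneously at the boundaries of the nucleosome-free region and then initiate transcription in both directions. Bidirectional promoters are widespread in eukaryotic genomes; most lack a TATA box but have a high GC content and abundant CpG islands. This article reviews recent research on bidirectional transcription initiation at bidirectional promoters and discusses in detail their roles in the co-expression and stable expression regulation of bidirectionally transcribed gene pairs, as well as their applications.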

8.
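Computational localization of transcription start sites (TSSs) is an important part of research on transcriptional regulation, but existing methods have low recognition performance. Building on an existing prokaryotic promoter recognition algorithm, the authors propose a sliding-window method for computationally locating prokaryotic TSSs, which predicts the TSS position by sliding a scan across the sequence within a reasonably restricted search range. First, quadratic discriminant classifiers are built from the overlapping composition features of the window sequence and from other promoter features, and are used to compute a likelihood score for each position; the likelihood scores are then corrected using the empirical distribution of the distance between the TSS and the translation initiation site; finally, a threshold-based localization algorithm determines the predicted position from the distribution of the likelihood scores. Tests on real Escherichia coli sequence data show that the algorithm effectively predicts true TSS positions: compared with existing algorithms, at a sensitivity of about 0.85, specificity rises from 0.20 to 0.65, improving localization accuracy by about 20 percentage points.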
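The scan-score-correct-threshold pipeline described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: `score_window` stands in for their quadratic discriminant classifiers, `distance_weight` stands in for the empirical TSS-to-translation-start distance correction, and the final step simply takes the best-scoring position at or above the threshold, which is a simplification of their threshold localization algorithm. All names are hypothetical.

```python
from typing import Callable, Optional


def locate_tss(seq: str,
               tis_pos: int,
               window: int,
               search_span: int,
               score_window: Callable[[str], float],
               distance_weight: Callable[[int], float],
               threshold: float) -> Optional[int]:
    """Return the best-scoring candidate TSS upstream of tis_pos, or None."""
    best_pos: Optional[int] = None
    best_score: Optional[float] = None
    start = max(0, tis_pos - search_span)  # restrict the search range
    for pos in range(start, tis_pos):
        lo = pos - window // 2
        hi = lo + window
        if lo < 0 or hi > len(seq):
            continue  # window would run off the sequence
        # Raw window score, corrected by the distance to the translation start.
        score = score_window(seq[lo:hi]) * distance_weight(tis_pos - pos)
        if score >= threshold and (best_score is None or score > best_score):
            best_pos, best_score = pos, score
    return best_pos


def at_fraction(win: str) -> float:
    """Toy placeholder score: fraction of A/T bases in the window."""
    return sum(base in "AT" for base in win) / len(win)
```

With the toy `at_fraction` score and a flat `distance_weight`, the scan picks out the centre of an AT-rich hexamer embedded in G-rich sequence; a real application would substitute trained classifiers and an empirical distance distribution.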

9.
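pi-hit-1 is a novel rice gene identified in our laboratory through space mutagenesis. To study the structure and function of the pi-hit-1 promoter, the gene's transcriptional regulatory sequence was first analyzed with plant promoter analysis databases (PlantProm DB-TSSP, TFSEARCH, PLACE, and PlantCARE). The results showed that the upstream regulatory region contains multiple cis-elements, concentrated mainly in the 300 bp preceding the translation initiation site; the transcription start site lies 100 bp upstream of the translation initiation site, and a TATA box element is present 132 bp upstream of the transcription start site. Electrophoretic mobility shift assays (EMSA) found specific transcription factor binding sites about 300 bp upstream of the translation initiation site, which constitute the gene's core promoter, consistent with the prediction. Using a systems biology approach to study the promoter structure of the novel rice gene pi-hit-1, we identified the gene's core promoter elements, providing an important basis for studying how the space environment affects the transcriptional regulation of genes.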

10.
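Aberrant methylation of CpG islands in promoter regions is one of the important markers for identifying cancer. The existing approaches and methods for predicting CpG islands each have their own shortcomings. A prediction approach based on fuzzy theory judges CpG islands in terms of degree of closeness and can more easily find biologically meaningful CpG islands that earlier methods overlooked. This study constructs a membership function for the set of CpG islands, computes the membership degree of each candidate sequence, and identifies all CpG islands with an acceptable membership degree. Applying the method to a sequence from the UCSC database (hg18.chr1.31618510.31623510.-1000) showed improved prediction accuracy. Applying fuzzy theory to CpG island prediction therefore appears feasible, and choosing different cut sets can yield more precise CpG islands.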
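The idea of a membership function plus a cut set can be sketched in a few lines of Python. The abstract does not give the actual membership function, so the one below is purely an assumption: linear ramps over GC fraction and CpG observed-to-expected ratio, combined with a minimum. The ramp endpoints and all names are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def ramp(x: float, lo: float, hi: float) -> float:
    """Linear ramp: 0 at or below lo, 1 at or above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def membership(gc: float, obs_exp: float) -> float:
    """Hypothetical degree of membership in the fuzzy set of CpG islands."""
    # Fuzzy AND (minimum) of two assumed ramps; the endpoints are illustrative.
    return min(ramp(gc, 0.4, 0.6), ramp(obs_exp, 0.4, 0.8))


def alpha_cut(candidates: List[Tuple[str, float, float]],
              alpha: float) -> List[str]:
    """Names of candidates whose membership degree is at least alpha."""
    return [name for name, gc, oe in candidates if membership(gc, oe) >= alpha]
```

Raising `alpha` shrinks the accepted set toward the most island-like candidates, which is one way to read the abstract's remark that different cut sets give more precise CpG islands.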

11.
MOTIVATION: Translation initiation sites (TISs) of genes are the key points of protein synthesis. Exact recognition of TISs in eukaryotic genes is one of the most important tasks in gene-finding algorithms. However, the task has not been satisfactorily fulfilled up to the present. Here, we propose a cooperatively scanning model for recognizing TISs and the first exons of eukaryotic genes on the basis of the structural characteristics of multi-exon genes. RESULTS: The model was employed to cooperatively scan the TISs and 3' splicing sites in eukaryotic genes, and the TISs and the first exons of 132 mammalian gene sequences were identified to evaluate the model. The accuracy of exactly recognizing the TISs and the first exons was found to be 64.4% and 51.5%, respectively. We believe that the model will be a useful tool for genome annotation and that it can be easily incorporated into other algorithms to achieve higher accuracy in recognizing TISs and the first exons. AVAILABILITY: The program is available upon request.

12.
With the rapid growth of DNA databases for human and other eukaryotic model organisms, a great number of genes need to be identified in these databases. Exact recognition of translation initiation sites (TISs) of eukaryotic genes is very important for understanding the translation initiation process, predicting the detailed structure of eukaryotic genes, and annotating uncharacterized sequences. The problem has not been solved satisfactorily, especially for recognizing the TISs of eukaryotic genes with shorter first exons. Extracting new features and finding powerful new algorithms for recognizing the TISs of eukaryotic genes is therefore an important task. In this paper, the important characteristics of shorter flanking fragments around TISs are extracted and an expectation-maximization (EM) algorithm based on incomplete data is used to recognize TISs of eukaryotic genes. The accuracy is up to 87.8% over a six-fold cross-validation test. The result shows that the identification variables are effectively extracted and that the EM algorithm is a powerful tool for predicting the TISs of eukaryotic genes. The algorithm can also be applied to other classification or clustering tasks in bioinformatics.

13.
CpG islands, genes and isochores in the genomes of vertebrates   Cited by: 6 (self-citations: 0, citations by others: 6)
B Aïssani, G Bernardi. Gene, 1991, 106(2): 185-195
We have shown that human genes associated with CpG islands increase in number as they increase in % of guanine + cytosine (GC) levels, and that most genes associated with CpG islands are located in the GC-richest compartment of the human genome. This is an independent confirmation of the concentration gradient of CpG islands (detected as HpaII tiny fragments, or HTF) which was demonstrated in the genome of warm-blooded vertebrates [Aïssani and Bernardi, Gene 106 (1991) 173-183]. We then reassessed the location of CpG islands using the data currently available and confirmed that CpG islands are most frequently located in the 5'-flanking sequences of genes and that they overlap genes to variable extents. We have shown that such extents increase with the increasing GC levels of genes, the GC-richest genes being completely included in CpG islands. Under such circumstances, we have investigated the properties of the 'extragenic' CpG islands located in the 5'-flanking segments of homologous genes from both warm- and cold-blooded vertebrates. We have confirmed that, in cold-blooded vertebrates, CpG islands are often absent; when present, they have lower GC and CpG levels; the latter attain, however, statistically expected values. Finally, we have shown that CpG doublets increase with the increasing GC of exons, introns and intergenic sequences (including 'extragenic' CpG islands) in the genomes from both warm- and cold-blooded vertebrates. The correlations found are the same for both classes of vertebrates, and are similar for exons, introns and intergenic sequences (including 'extragenic' CpG islands). The findings just outlined indicate that the origin and evolution of CpG islands in the vertebrate genome are associated with compositional transitions (GC increases) in genes and isochores.

14.
We screened plant genome sequences, primarily from rice and Arabidopsis thaliana, for CpG islands, and identified DNA segments rich in CpG dinucleotides within these sequences. These CpG-rich clusters appeared in the analysed sequences as discrete peaks and occurred at the frequencies of one per 4.7 kb in rice and one per 4.0 kb in A. thaliana. In rice and A. thaliana, most of the CpG-rich clusters were associated with genes, which suggests that these clusters are useful landmarks in genome sequences for identifying genes in plants with small genomes. In contrast, in plants with larger genomes, only a few of the clusters were associated with genes. These plant CpG-rich clusters satisfied the criteria used for identifying human CpG islands, which suggests that these CpG clusters may be regarded as plant CpG islands. The position of each island relative to the 5'-end of its associated gene varied considerably. Genes in the analysed sequences were grouped into five classes according to the position of the CpG islands within their associated genes. A large proportion of the genes belonged to one of two classes, in which a CpG island occurred near the 5'-end of the gene or covered the whole gene region. The position of a plant CpG island within its associated gene appeared to be related to the extent of tissue-specific expression of the gene; the CpG islands of most of the widely expressed rice genes occurred near the 5'-end of the genes.

15.
A common practice in computational genomic analysis is to use a set of ‘background’ sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such ‘background’ sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by ‘shuffling’ real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/.
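For context, the 'shuffling' baseline that the abstract above contrasts with can be sketched as below. This is only the naive control, not the authors' generator: it permutes the bases of a real sequence, which preserves single-base composition but discards dinucleotide structure, local complexity and interspersed repeats. The function names and the seed parameter are assumptions for illustration.

```python
import random
from collections import Counter
from typing import Optional


def shuffle_sequence(seq: str, seed: Optional[int] = None) -> str:
    """Return a random permutation of seq (same base counts, scrambled order)."""
    rng = random.Random(seed)  # local generator so results are reproducible
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def same_composition(a: str, b: str) -> bool:
    """True if both sequences contain exactly the same base counts."""
    return Counter(a) == Counter(b)
```

A shuffled control satisfies `same_composition(original, shuffled)` by construction, yet it carries none of the repeat content or higher-order structure of real intergenic sequence, which may help explain why the abstract reports that such controls can underestimate false-positive rates.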

16.
Given their genomic abundance and susceptibility to DNA methylation, interspersed repetitive sequences in the human genome can be exploited as valuable resources in genome-wide methylation studies. To learn about the relationships between DNA methylation and repeat sequences, we performed a global measurement of CpG dinucleotide frequencies for interspersed repetitive sequences and inferred germline methylation patterns in the human genome. Although extensive CpG depletion was observed for most repeat sequences, those in proximity to CpG islands have been relatively removed from germline methylation, being the potential source of germline activation. We also investigated the CpG depletion patterns of Alu pairs to see whether they might play an active role in germline methylation. Two kinds of Alu pairs, direct or inverted pairs classified according to their orientation, showed contrasting CpG depletion patterns with respect to the distance separating the Alus, i.e., as two Alu elements are more closely spaced in a pair, a higher extent of CpG depletion was observed in inverted orientation and vice versa for directly repetitive Alu pairs. This suggests that specific organization of repetitive sequences, such as inverted Alu pairs, might play a role in triggering DNA methylation, consistent with a homology-dependent methylation hypothesis.

17.
18.
Isolation of CpG islands from large genomic clones   Cited by: 4 (self-citations: 0, citations by others: 4)

19.
Tandem repeats in the CpG islands of imprinted genes   Cited by: 4 (self-citations: 0, citations by others: 4)
Hutter B, Helms V, Paulsen M. Genomics, 2006, 88(3): 323-332

20.