Similar articles
 Found 18 similar articles; search time 265 ms
1.
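Protein structure analysis of the human Rh blood group system based on amino acid characteristic sequences   Total citations: 1; self-citations: 0; citations by others: 1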
高雷  朱平 《生物信息学》2009,7(4):248-251
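Drawing on the idea of homomorphism from algebra, the idea of "coarse-graining" from physics, and the HP model, and classifying a, t, c and g by chemical structure, the authors propose the concept of characteristic sequences of a DNA sequence (σ-, τ-, σ∩τ-) and extend it to protein sequences. This gives a numerical characterization that reduces a protein sequence to a (0,1) sequence. Building on this construction, and using the relationship between amino acid molecular weight and degeneracy, they propose a second characteristic sequence concept for DNA sequences (-) and extend it to protein sequences as well, giving another numerical characterization that reduces a protein sequence to a (0,1,2) sequence. Comparing the numerical characterization plots of the characteristic sequences of the RHD and RHCE genes shows that both genes prefer amino acids with low molecular weight and high degeneracy.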
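
The reduction of a protein sequence to a (0,1) sequence described above can be sketched as follows. This is a minimal illustration, not the authors' construction: the abstract does not list which residues fall in which class, so the hydrophobic set `HYDROPHOBIC`, the convention that 1 means hydrophobic, and the helper names `hp_sequence` and `hydrophobic_fraction` are all assumptions.

```python
# Assumed hydrophobic (H) residues, one-letter codes. All other standard
# residues are treated as polar (P). This split is illustrative only; the
# source abstract does not state its classification.
HYDROPHOBIC = frozenset("AILMFPWV")
STANDARD = frozenset("ACDEFGHIKLMNPQRSTVWY")


def hp_sequence(protein):
    """Map a protein string to a list of 0/1 values (1 = hydrophobic, assumed)."""
    encoded = []
    for residue in protein.upper():
        if residue not in STANDARD:
            raise ValueError(f"unknown residue: {residue!r}")
        encoded.append(1 if residue in HYDROPHOBIC else 0)
    return encoded


def hydrophobic_fraction(protein):
    """Fraction of residues mapped to 1; returns 0.0 for an empty sequence."""
    encoded = hp_sequence(protein)
    return sum(encoded) / len(encoded) if encoded else 0.0
```

Calling `hp_sequence` on two sequences of equal length gives two binary vectors that can be compared position by position; a (0,1,2) variant would need the molecular weight and degeneracy classes, which the abstract does not give.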

2.
A time series model of DNA sequences based on CGR (in English)   Total citations: 1; self-citations: 0; citations by others: 1
高洁  蒋丽丽  徐振源 《生物信息学》2010,8(2):156-160,164
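Using the chaos game representation (CGR) of DNA sequences, the authors propose a method for converting the 2-D DNA map into a corresponding spectrum-like format. The method provides a good visual representation and also converts a DNA sequence into a time series. CGR coordinates are used to convert DNA sequences into CGR radian sequences, and a long-memory ARFIMA(p,d,q) model is introduced to fit them. These sequences show significant long-range correlation, and the model fits them well.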

3.
刘娟  高洁 《生物信息学》2011,9(2):97-101
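Time series models are used to analyze influenza B and influenza C viruses, giving a new time series representation of their DNA sequences, the CGR radian sequence. CGR coordinates are used to convert influenza B and influenza C virus DNA sequences into CGR radian sequences, and a long-memory ARFIMA model is introduced to fit the two classes of sequences. Ten randomly selected influenza B sequences and ten influenza C sequences all show long-range correlation and are fitted well. The two kinds of virus sequences can also tentatively be distinguished by different ARFIMA models, namely ARFIMA(0,d,4) and ARFIMA(1,d,1).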

4.
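Identification of nine novel alternatively spliced mRNA isoforms of the RHD gene   Total citations: 1; self-citations: 0; citations by others: 1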
许先国  吴俊杰  洪小珍  朱发明  严力行 《遗传》2006,28(10):1213-1218
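To study the gene structure of the various alternatively spliced RHD mRNA isoforms, reverse transcription polymerase chain reaction (RT-PCR) was used to detect RHD mRNA in cord blood samples from normal individuals. RHD cDNA was TA-cloned and sequenced, the splice sites of each isoform were analyzed by DNA sequencing, and RHD mRNA was subjected to expressed sequence tag (EST) analysis. Among 28 positive clones, 12 RHD splice isoforms (including 9 novel ones) were detected in addition to full-length RHD cDNA. Three splicing patterns were found: exon skipping, 5′ splice site variation and 3′ splice site variation, involving exons 2 to 9. Six of the novel isoforms also showed RHD and RHCE homologous hybridization. EST analysis additionally retrieved isoforms with intron retention. The study shows that RHD mRNA has a complex alternative splicing mechanism: beyond the isoforms already reported, 9 novel RHD splice isoforms were detected, and alternative splicing was found to coexist with homologous hybridization.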

5.
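The role of nuclear factors in the regulation of gene expression   Total citations: 7; self-citations: 0; citations by others: 7
The regulation of gene expression has been studied extensively and in depth in viral and animal systems. Cis-acting regulatory sequences have been discovered; sequence-specific DNA-binding proteins have been identified, along with the protein domains that act in DNA-protein recognition and binding and in protein-protein interaction; and the genes of regulatory proteins have been cloned and sequenced. The field has been further supplemented by progress on the mechanisms of plant gene regulation. This article focuses on DNA-protein interactions in plant genes, the isolation of plant regulatory protein genes, and future research directions and prospects in this field.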

6.
张堃  赵静静  唐旭清 《生命科学研究》2011,15(2):101-106,124
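Based on the classical HP model, a new method for comparing protein sequences is proposed, using the matrix graphical representation (MGR) of protein sequences and the idea of numerical characterization. By examining the numerical characterization plots of protein sequences and computing the Euclidean distance d between two protein sequences, the authors carry out a similarity analysis of protein sequences from two xylanase families. Sequences assigned to the same xylanase family are found to be more similar to one another, and the degree of similarity between protein sequences is related to molecular size, structure and molecular evolution.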

7.
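Research on a data mining algorithm based on DNA sequences   Total citations: 1; self-citations: 0; citations by others: 1
Data mining techniques are introduced to study the intrinsic regularities of DNA sequence data, and an algorithm for the DNA sequence classification problem is given. Taking into account both the occurrence probabilities of base groups and the relationships between adjacent amino acids, the method extracts features of DNA sequence fragments of known class from the codon distribution density in individual fragments. Stepwise discriminant analysis is then applied to eliminate variables without significant discriminating power, yielding a discriminant function for DNA sequence classification. Simulation results show that the algorithm has a simple classification formula and gives accurate classification results.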

8.
Formation of the RHD-CE(2-9)-D allele through cis gene exchange   Total citations: 5; self-citations: 0; citations by others: 5
邵超鹏  李桢  熊文  周一炎  李雪梅 《遗传》2005,27(4):561-565
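Previous genomic DNA analyses found that a small number of Rh-negative individuals, among both Caucasians and Chinese, carry exons 1 and 10 of the RHD gene, but the specific molecular mechanism by which this allele forms is still debated. In this study, a pair of specific primers was designed against the 5′- and 3′-untranslated regions of RHD mRNA. Reverse transcription PCR (RT-PCR) and cDNA sequencing were used to analyze the full-length mRNA/cDNA sequences of 2 individuals who were RHD gene positive (carrying exons 1 and 10) but negative for the D antigen phenotype, with 1 normal Rh-positive individual (CcDDee) as a control. The normal Rh-positive individual had normal RHD mRNA. In both Rh-negative individuals carrying the RHD gene, an mRNA was detected with the same length and the same exon composition as the normal RHD or RHCE transcript. Exons 1 and 10 and the 3′-untranslated region of this transcript matched the RHD gene, whereas the entire sequence of exons 2 to 9 was identical to RHCE(e) mRNA. This indicates that both individuals carry the RHD-CE(2~9)-D fusion RHD allele, in which exons 2 to 9 of the RHD gene have been replaced by the homologous RHCE(e) gene. As a result, normal RhD protein cannot be encoded, which produces the D-antigen-negative phenotype.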

9.
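Protein engineering is a new field under development in biotechnology. Because it is a technical discipline that produces new proteins by starting from changes to genes, methods for altering genes have become one of its main components. Over the past ten years, with the success of recombinant gene and DNA sequence analysis methods, many scientists have turned their attention to the relationship between the sequence structure of DNA coding regions and their function, and have developed a variety of in vivo and in vitro mutagenesis methods. These methods alter the DNA structure of a specific region in order to determine the function of that region.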

10.
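With the rapid development of high-throughput DNA sequencing, the genomes of more and more species have been sequenced. Locating protein-coding genes and determining their structure are basic tasks of genome annotation, yet earlier annotation methods relied mainly on DNA and RNA sequence information. To interpret sequenced genomes more accurately, multiple types of omics data need to be integrated into genome annotation. In recent years, proteomics based on tandem mass spectrometry has matured and now achieves high coverage of the proteome, which makes it possible to use tandem mass spectrometry data for genome annotation. Such data can verify the expression of genes that are already annotated, and can also correct existing gene annotations and reveal new genes, so that the genome sequence is re-annotated. This is the subject of proteogenomics, a field that is currently advancing quickly. Using this approach to annotate sequenced genomes systematically has become an important complement to genome interpretation. This article reviews the main research topics and methods of proteogenomics and looks ahead to its future development.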

11.
Chaos Game Representation (CGR) can recognize patterns in the nucleotide sequences, obtained from databases, of a class of genes using the techniques of fractal structures and by considering DNA sequences as strings composed of four units, G, A, T and C. Such recognition of patterns relies only on visual identification and no mathematical characterization of CGR is known. The present report describes two algorithms that can predict the presence or absence of a stretch of nucleotides in any gene family. The first algorithm can be used to generate DNA sequences represented by any point in the CGR. The second algorithm can simulate known CGR patterns for different gene families by setting the probabilities of occurrence of different di- or trinucleotides by a trial and error process using some guidelines and approximate rules-of-thumb. The validity of the second algorithm has been tested by simulating sequences that can mimic the CGRs of vertebrate non-oncogenes, proto-oncogenes and oncogenes. These algorithms can provide a mathematical basis of the CGR patterns obtained using nucleotide sequences from databases.

12.
Similar to the chaos game representation (CGR) of DNA sequences proposed by Jeffrey (Nucleic Acid Res. 18 (1990) 2163), a new CGR of protein sequences based on the detailed HP model is proposed. Multifractal and correlation analyses of the measures based on the CGR of protein sequences from complete genomes are performed. The Dq spectra of all organisms studied are multifractal-like and sufficiently smooth for the Cq curves to be meaningful. The Cq curves of bacteria resemble a classical phase transition at a critical point. The correlation distance of the difference between the measure based on the CGR of protein sequences and its fractal background is also proposed to construct a more precise phylogenetic tree of bacteria.

13.
Analysis of genomic sequences by Chaos Game Representation   Total citations: 4; self-citations: 0; citations by others: 4
MOTIVATION: Chaos Game Representation (CGR) is an iterative mapping technique that processes sequences of units, such as nucleotides in a DNA sequence or amino acids in a protein, in order to find the coordinates for their position in a continuous space. This distribution of positions has two properties: it is unique, and the source sequence can be recovered from the coordinates such that distance between positions measures similarity between the corresponding sequences. The possibility of using the latter property to identify succession schemes has been entirely overlooked in previous studies, which raises the possibility that CGR may be upgraded from a mere representation technique to a sequence modeling tool. RESULTS: The distribution of positions in the CGR plane was shown to be a generalization of Markov chain probability tables that accommodates non-integer orders. Therefore, Markov models are particular cases of CGR models rather than the reverse, as currently accepted. In addition, the CGR generalization has both practical (computational efficiency) and fundamental (scale independence) advantages. These results are illustrated by using Escherichia coli K-12 as a test data-set, in particular, the genes thrA, thrB and thrC of the threonine operon.
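
The iterative mapping and the recoverability property mentioned in this abstract can be illustrated with a short sketch. It assumes the common unit-square form of CGR in which each step moves halfway toward the corner of the current base. The corner assignment in `CORNERS`, the starting point (0.5, 0.5) and the function names are assumptions for illustration, and published CGRs differ in which base sits at which corner.

```python
# Assumed corner layout on the unit square; other layouts are in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}
BASE_AT = {corner: base for base, corner in CORNERS.items()}


def cgr_points(seq, start=(0.5, 0.5)):
    """Return one CGR point per base: each step halves the distance to a corner."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def decode_last_bases(point, n):
    """Recover the last n bases that led to `point` by undoing the halving step."""
    x, y = point
    bases = []
    for _ in range(n):
        # The half of the square containing the point identifies the last corner.
        cx = 1.0 if x >= 0.5 else 0.0
        cy = 1.0 if y >= 0.5 else 0.0
        bases.append(BASE_AT[(cx, cy)])
        x, y = 2.0 * x - cx, 2.0 * y - cy
    return "".join(reversed(bases))
```

Because every step halves the coordinates, floating-point precision limits exact decoding to a few dozen bases; longer sequences would need exact arithmetic such as `fractions.Fraction`.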

14.
Chaos game representation of gene structure.   Total citations: 21; self-citations: 2; citations by others: 19
This paper presents a new method for representing DNA sequences. It permits the representation and investigation of patterns in sequences, visually revealing previously unknown structures. Based on a technique from chaotic dynamics, the method produces a picture of a gene sequence which displays both local and global patterns. The pictures have a complex structure which varies depending on the sequence. The method is termed Chaos Game Representation (CGR). CGR raises a new set of questions about the structure of DNA sequences, and is a new tool for investigating gene structure.

15.
The abundance of genes encoding aromatic ring-hydroxylating dioxygenases (RHDs) in the groundwater at an aromatic hydrocarbon-contaminated landfill near Sydney, Australia, was determined by quantitative DNA-DNA hybridization using class II RHD genes as probes. There were marked differences in hybridization signal intensity against DNA extracted from the groundwater at seven different locations across this heterogeneous site. This was interpreted as indicating variation in RHD gene abundance. Clone libraries of polymerase chain reaction (PCR)-amplified RHD gene fragments were constructed from DNA from each of the groundwater samples. The libraries from the samples with greater RHD gene abundance were dominated by a group of bacterial class II RHD genes, designated the S-cluster, that has yet to be found in cultured isolates. These groundwater samples contained no detectable petroleum hydrocarbons. A second group of class II RHD gene sequences, designated the T-cluster, dominated RHD gene clone libraries prepared from groundwater samples that contained detectable levels of total petroleum and aromatic hydrocarbons but lower RHD gene abundance. The hosts and in situ expression of these novel genes, and the substrates of the enzymes they encode, remain unknown. The scarcity of genes from known aromatic hydrocarbon-degrading bacteria and the numerical dominance of the novel genes suggest that the hosts of these novel genes may play an important role in aromatic hydrocarbon degradation at this site.

16.
A new method to determine entropic profiles in DNA sequences is presented. It is based on the chaos-game representation (CGR) of gene structure, a technique which produces a fractal-like picture of DNA sequences. First, the CGR image was divided into squares 4^(-m) in size (m being the desired resolution), and the point density counted. Second, appropriate intervals were adjusted, and then a histogram of densities was prepared. Third, Shannon's formula was applied to the probability-distribution histogram, thus obtaining a new entropic estimate for DNA sequences, the histogram entropy, a measurement that goes with the level of constraints on the DNA sequence. Lastly, the entropic profile for the sequence was drawn, by considering the entropies at each resolution level, thus providing a way to summarize the complexity of large genomic regions or even entire genomes at different resolution levels. The application of the method to DNA sequences reveals that entropic profiles obtained in this way, as opposed to previously published ones, clearly discriminate between random and natural DNA sequences. Entropic profiles also show a different degree of variability within and between genomes. The results of these analyses are discussed in relation both to the genome compartmentalization in vertebrates and to the differential action of compositional and/or functional constraints on DNA sequences.
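
The four steps listed in this abstract can be sketched roughly as below. This is one reading of the abstract under stated assumptions, not the published procedure: the CGR corner layout, the equal-width binning controlled by the hypothetical parameter `n_bins`, the use of raw counts as densities and the base-2 logarithm are all choices made for illustration.

```python
import math
from collections import Counter

# Assumed corner layout on the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq, start=(0.5, 0.5)):
    """CGR points: each base moves the point halfway toward its corner."""
    x, y = start
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def histogram_entropy(seq, m, n_bins=10):
    """Shannon entropy (bits) of the histogram of per-square point counts."""
    side = 2 ** m  # 2^m squares per axis, so each square has area 4^(-m)
    counts = Counter()
    for x, y in cgr_points(seq):
        col = min(int(x * side), side - 1)
        row = min(int(y * side), side - 1)
        counts[(row, col)] += 1
    # Step 1: density of every square, including empty ones.
    densities = [counts.get((r, c), 0) for r in range(side) for c in range(side)]
    max_density = max(densities)
    if max_density == 0:
        return 0.0
    # Step 2: equal-width intervals over [0, max_density] (assumed binning).
    hist = Counter(
        min(int(d * n_bins / max_density), n_bins - 1) for d in densities
    )
    # Step 3: Shannon's formula over the fraction of squares in each interval.
    total = len(densities)
    return -sum((k / total) * math.log2(k / total) for k in hist.values())


def entropic_profile(seq, max_m, n_bins=10):
    """Step 4: histogram entropy at each resolution level m = 1..max_m."""
    return [histogram_entropy(seq, m, n_bins) for m in range(1, max_m + 1)]
```

The number of squares grows as 4^m, so this direct version is only practical for small m; a sparse treatment of empty squares would be needed at higher resolutions.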

17.
Summary: Chaos game representation (CGR) is a novel holistic approach that provides a visual image of a DNA sequence quite different from the traditional linear arrangement of nucleotides. Although it is known that CGR patterns depict base composition and sequentiality, the biological significance of the specific features of each pattern is not understood. To systematically examine these features, we have examined the coding sequences of 7 human globin genes and 29 relatively conserved alcohol dehydrogenase (Adh) genes from phylogenetically divergent species. The CGRs of human globin cDNAs were similar to one another and to the entire human globin gene complex. Interestingly, human globin CGRs were also strikingly similar to human Adh CGRs. Adh CGRs were similar for genes of the same or closely related species but were different for relatively conserved Adh genes from distantly related species. Dinucleotide frequencies may account for the self-similar pattern that is characteristic of vertebrate CGRs and the genome-specific features of CGR patterns. Mutational frequencies of dinucleotides may vary among genome types. The special features of CG dinucleotides of vertebrates represent such an example. The CGR patterns examined thus far suggest that the evolution of a gene and its coding sequence should not be examined in isolation. Consideration should be given to genome-specific differential mutation rates for different dinucleotides or specific oligonucleotides. Offprint requests to: S. M. Singh

18.
Many studies have demonstrated the presence of scale invariance and long-range correlation in animal and human neuronal spike trains. The methodologies to extract the fractal or scale-invariant properties, however, do not address the issue as to the existence within the train of fine temporal structures embedded in the global fractal organisation. The present study addresses this question in human spike trains by the chaos game representation (CGR) approach, a graphical analysis with which specific temporal sequences reveal themselves as geometric structures in the graphical representation. The neuronal spike train data were obtained from patients whilst undergoing pallidotomy. Using this approach, we observed highly structured regions in the representation, indicating the presence of specific preferred sequences of interspike intervals within the train. Furthermore, we observed that for a given spike train, the higher the magnitude of its scaling exponent, the more pronounced the geometric patterns in the representation and, hence, higher probability of occurrence of specific subsequences. Given its ability to detect and specify in detail the preferred sequences of interspike intervals, we believe that CGR is a useful adjunct to the existing set of methodologies for spike train analysis.
