Similar Articles
1.
Using chaos game representation (CGR), we introduce a novel and straightforward method for identifying similarities/dissimilarities between DNA sequences of the same type from different organisms. A matrix is associated with each CGR pattern, and the similarities result from comparing the matrices of the sequences of interest. Three different methods of analyzing the resulting difference matrix are considered: a 3-dimensional representation giving both local and global information, a numerical characterization by defining an n-letter word similarity measure, and a statistical evaluation. The method is illustrated by applying it to albumin nucleotide sequences from eight mammal species, taking human albumin as the reference.
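
The matrix comparison described above can be sketched in Python under stated assumptions. We assume the matrix attached to each CGR pattern is the frequency CGR (FCGR) grid: the unit square is divided into 2^k × 2^k cells, and each cell holds the relative frequency of one k-mer. We also assume Jeffrey's corner convention A=(0,0), C=(0,1), G=(1,1), T=(1,0). The helper names `fcgr`, `difference_matrix` and `total_abs_difference` are hypothetical. The scalar summary is only an illustration, not the authors' n-letter word similarity measure.

```python
# Assumed corner convention (Jeffrey 1990): A=(0,0), C=(0,1), G=(1,1), T=(1,0)
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq: str, k: int) -> list[list[float]]:
    """Hypothetical helper: 2^k x 2^k matrix of k-mer frequencies laid out as in the CGR square."""
    n = 2 ** k
    mat = [[0.0] * n for _ in range(n)]
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(b not in CORNER for b in kmer):
            continue  # skip windows containing N or other ambiguity codes
        row = col = 0
        for j, base in enumerate(kmer):
            cx, cy = CORNER[base]
            # The most recent base moves the CGR point furthest, so it sets the highest-order bit.
            col |= cx << j
            row |= cy << j
        mat[row][col] += 1
        total += 1
    if total:
        mat = [[v / total for v in r] for r in mat]
    return mat


def difference_matrix(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Element-wise difference between two FCGR matrices of the same size."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def total_abs_difference(d: list[list[float]]) -> float:
    """Illustrative scalar summary: 0 for identical k-mer profiles, at most 2."""
    return sum(abs(v) for row in d for v in row)
```

Calling `difference_matrix(fcgr(reference, k), fcgr(other, k))` yields a grid where positive and negative cells mark k-mers that are over- or under-represented relative to the reference. Here `reference` and `other` are placeholder sequences.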

2.
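Analysis of the RHD Gene of the Rh Blood Group System Based on the Chaos Game Representation Method   Times cited: 3 (self-citations: 0, by others: 3)
Gao Lei  Qi Bin  Zhu Ping 《Life Science Research》2009,13(5):408-412
Using the chaos game representation (CGR) of protein sequences based on the classical HP model, we present the CGR plot of the protein sequence of the RHD gene. This plot can be regarded as a characteristic map describing the secondary structure of the protein sequence and has some reference value for clinical blood-group typing. In addition, using the CGR method for depicting DNA sequences proposed by Jeffrey in 1990, we present the CGR plot of the DNA sequence of the RHD gene. From this CGR plot we computed the corresponding two-step Markov transition probability matrix of the RHD gene. The probability table shows that the RHD gene has a usage preference for the third base of amino-acid-coding triplets (codons).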
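
The two-step Markov transition probabilities mentioned above can also be estimated directly from a DNA sequence. This minimal sketch assumes the matrix is P(base at position i+2 | base at position i), estimated from raw counts. The paper derives the matrix from the CGR plot, which is not reproduced here, and the function name `two_step_transition_matrix` is hypothetical.

```python
BASES = "ACGT"


def two_step_transition_matrix(seq: str) -> dict[str, dict[str, float]]:
    """Estimate P(X[i+2] = b | X[i] = a) from counts of bases two positions apart."""
    seq = seq.upper()
    counts = {a: {b: 0 for b in BASES} for a in BASES}
    for i in range(len(seq) - 2):
        a, b = seq[i], seq[i + 2]
        if a in counts and b in counts[a]:  # ignore pairs involving non-ACGT symbols
            counts[a][b] += 1
    matrix = {}
    for a in BASES:
        row_total = sum(counts[a].values())
        # Rows with no observations are left as all zeros rather than guessed.
        matrix[a] = {b: (counts[a][b] / row_total if row_total else 0.0) for b in BASES}
    return matrix
```

Each non-empty row is a probability distribution over the base found two positions downstream.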

3.
Abraham TM  Loeb DD 《Journal of Virology》2007,81(21):11577-11584
Previous analysis of hepatitis B virus (HBV) indicated that base pairing between two cis-acting sequences, the 5' half of the upper stem of epsilon and phi, contributes to the synthesis of minus-strand DNA. Our goal was to identify other cis-acting sequences on the pregenomic RNA (pgRNA) involved in the synthesis of minus-strand DNA. We found that large portions of the pgRNA could be deleted or substituted without an appreciable decrease in the level of minus-strand DNA synthesized. This indicates that most of the pgRNA is dispensable and that a specific size of the pgRNA is not required for this process. Our results indicated that the cis-acting sequences for the synthesis of minus-strand DNA are present near the 5' and 3' ends of the pgRNA. In addition, we found that the first-strand template switch could be directed to a new location when a 72-nucleotide (nt) fragment containing the cis-acting sequences present near the 3' end of the pgRNA was introduced at that location. Within this 72-nt region, we uncovered two new cis-acting sequences, which flank the acceptor site. We show that one of these sequences, named omega and located 3' of the acceptor site, base pairs with phi to contribute to the synthesis of minus-strand DNA. Thus, base pairing between three cis-acting elements (5' half of the upper stem of epsilon, phi, and omega) is necessary for the synthesis of HBV minus-strand DNA. We propose that this topology of pgRNA facilitates first-strand template switch and/or the initiation of synthesis of minus-strand DNA.

4.
Abstract

For high-accuracy classification of DNA sequences with Convolutional Neural Networks (CNNs), it is essential to use an efficient sequence representation that can accelerate similarity comparison between DNA sequences. In addition, CNNs can be improved by avoiding the dimensionality problem associated with multi-layer CNN features. This paper presents a new approach for classifying bacterial DNA sequences based on a custom layer. A CNN is applied to the Frequency Chaos Game Representation (FCGR) of DNA. FCGR is adopted as the sequence representation, with a suitable choice of the length k of the words whose frequencies of occurrence are counted in the DNA sequences. Mapping a DNA sequence with FCGR produces an image of the gene sequence that displays both local and global patterns. A pre-trained CNN is built for image classification. First, the image is converted to feature maps through convolutional layers. This is sometimes followed by a down-sampling operation, performed by pooling layers, that reduces the spatial size of the feature map and removes redundant spatial information. In place of the pooling layers, Random Projection (RP) with an activation function is proposed; it retains a good variety of the data while introducing some randomness. Feature reduction is achieved while maintaining high accuracy in classifying bacteria at taxonomic levels. The simulation results show that the proposed RP-based CNN offers a trade-off between accuracy and processing time.
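
The following sketch replaces a pooling layer with a random projection plus activation. It is written in plain Python so it runs without a deep-learning framework. It makes three assumptions: the feature maps have already been flattened to a vector, the projection matrix is Gaussian and fixed after initialization, and ReLU is the activation. The class name `RandomProjectionLayer` and its parameters are hypothetical, and the paper's custom layer may differ.

```python
import math
import random


class RandomProjectionLayer:
    """Fixed (non-trainable) Gaussian random projection followed by a ReLU activation."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(out_dim)
        # Entries drawn from N(0, 1/out_dim), so squared norms are preserved on average.
        self.weights = [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.in_dim = in_dim

    def __call__(self, features: list[float]) -> list[float]:
        if len(features) != self.in_dim:
            raise ValueError(f"expected {self.in_dim} features, got {len(features)}")
        projected = [sum(w * x for w, x in zip(row, features)) for row in self.weights]
        return [max(0.0, v) for v in projected]  # ReLU activation


# Example: reduce a flattened 8 x 8 x 4 feature map (256 values) to 32 features.
layer = RandomProjectionLayer(in_dim=256, out_dim=32, seed=42)
reduced = layer([0.1] * 256)
```

Unlike pooling, the projection mixes all spatial positions. Because the matrix is drawn once and then frozen, the layer adds no trainable parameters.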

6.
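To complete the theory of DNA sequence symmetry, this paper extends the DNA group of order 12 (D group) to the DNA full symmetry group of order 24 (Dd group). The DNA full symmetry group is defined as a special permutation group (S4) whose permuted elements are the four bases of a DNA sequence. The DNA group is isomorphic to the tetrahedral group (T group), and the DNA full symmetry group is isomorphic to the full symmetry group of the regular tetrahedron (Td group). The D group is a subgroup of the Dd group. The paper also derives the matrix representations of the 12 new elements of the Dd group and the multiplication table of the Dd group. It also obtains the transformation table of the four bases A, C, G and T under the operations of the Dd group.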
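
The group structure described above can be checked by brute force. S4 has exactly one subgroup of order 12: the alternating group A4, which is isomorphic to the tetrahedral group T. The 12-element D group therefore corresponds to the even permutations of the four bases. This sketch enumerates both groups and checks closure. It also shows that Watson-Crick complementation (A<->T, C<->G) is one of the even permutations. The names `Dd`, `D`, `parity`, `compose` and `relabel` are hypothetical.

```python
from itertools import permutations

BASES = ("A", "C", "G", "T")


def parity(perm: tuple[int, ...]) -> int:
    """+1 for an even permutation, -1 for an odd one, via cycle decomposition."""
    sign = 1
    seen = [False] * len(perm)
    for start in range(len(perm)):
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length and length % 2 == 0:  # a cycle of even length flips the sign
            sign = -sign
    return sign


def compose(p: tuple[int, ...], q: tuple[int, ...]) -> tuple[int, ...]:
    """(p o q)(i) = p[q[i]]: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def relabel(perm: tuple[int, ...], seq: str) -> str:
    """Replace each base BASES[i] in seq by BASES[perm[i]]."""
    index = {b: i for i, b in enumerate(BASES)}
    return "".join(BASES[perm[index[b]]] for b in seq)


Dd = list(permutations(range(4)))       # all 24 relabellings of A, C, G, T
D = [p for p in Dd if parity(p) == 1]   # the 12 even permutations (A4)
COMPLEMENT = (3, 2, 1, 0)               # A<->T, C<->G
```

Composing any two elements of `D` stays inside `D`, which is the closure property that makes it a subgroup.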

7.
ABSTRACT: BACKGROUND: Gene finding is a complicated procedure that encompasses algorithms for coding sequence modeling, identification of promoter regions, handling of overlapping genes and more. In the present study we focus on coding sequence modeling algorithms, that is, algorithms for identifying and predicting the actual coding sequences from genomic DNA. In this respect, we propose a novel multivariate method known as Canonical Powered Partial Least Squares (CPPLS) as an alternative to the commonly used Interpolated Markov Model (IMM). The methods were compared on DNA, codon and protein sequences of highly conserved genes taken from several species with different genomic properties. RESULTS: The multivariate CPPLS approach classified coding sequences substantially better than the commonly used IMM on the same set of sequences. We also found that CPPLS with codon representation gave significantly better classification results than IMM with either protein (p < 0.001) or DNA (p < 0.001) representation. Further, although the mean performance was similar, the variation of CPPLS performance on codon representation was significantly smaller than that of IMM (p < 0.001). CONCLUSIONS: The performance of coding sequence modeling can be substantially improved by using an algorithm based on the multivariate CPPLS method applied to codon or DNA frequencies.
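
CPPLS and IMM themselves are not reproduced here. This sketch only shows how the codon representation could become a fixed-length numeric vector suitable as input to a multivariate classifier. It assumes 64 codon frequencies, read in frame from the first base, with incomplete trailing codons ignored. The function name `codon_frequencies` is hypothetical.

```python
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # 64 codons in a fixed order


def codon_frequencies(cds: str) -> list[float]:
    """Relative frequency of each of the 64 codons, reading in frame from position 0."""
    cds = cds.upper()
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in counts:  # skip codons containing N or other ambiguity codes
            counts[codon] += 1
            total += 1
    return [counts[c] / total if total else 0.0 for c in CODONS]
```

Stacking one such vector per sequence gives a matrix X with 64 columns. A multivariate classifier could then be trained on X against coding/non-coding labels.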

8.
We have developed a method for identifying consensus patterns in a set of unaligned DNA sequences known to bind a common protein or to have some other common biochemical function. The method is based on a matrix representation of binding site patterns. Each row of the matrix represents one of the four possible bases, each column represents one of the positions of the binding site, and each element is determined by the frequency with which the indicated base occurs at the indicated position. The goal of the method is to find the most significant matrix, i.e. the one with the lowest probability of occurring by chance, out of all the matrices that can be formed from the set of related sequences. The reliability of the method improves with the number of sequences, while the time required increases only linearly with the number of sequences. To test this method, we analysed 11 DNA sequences containing promoters regulated by the Escherichia coli LexA protein. The matrices we found were consistent with the known consensus sequence and could distinguish the generally accepted LexA binding sites from other DNA sequences. Received on November 6, 1989; accepted on December 20, 1989
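
The matrix representation described above can be sketched as a 4 × L count matrix built from sites that are already aligned. The original method scores a matrix by its probability of occurring by chance. As a closely related and commonly used stand-in, this sketch scores it by information content relative to a uniform background. The search over unaligned sequences is not shown, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def count_matrix(sites: list[str]) -> list[list[int]]:
    """4 x L matrix: rows are bases A, C, G, T; columns are positions in the site."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    mat = [[0] * length for _ in BASES]
    for s in sites:
        for pos, base in enumerate(s.upper()):
            mat[BASES.index(base)][pos] += 1
    return mat


def information_content(mat: list[list[int]], background=None) -> float:
    """Sum over positions of f(b, pos) * log2(f(b, pos) / p(b)), in bits."""
    background = background or {b: 0.25 for b in BASES}
    n_sites = sum(row[0] for row in mat)  # column 0 counts every site once
    total = 0.0
    for pos in range(len(mat[0])):
        for i, base in enumerate(BASES):
            f = mat[i][pos] / n_sites
            if f > 0:
                total += f * math.log2(f / background[base])
    return total
```

A fully conserved position contributes 2 bits against a uniform background, and a position where all four bases are equally frequent contributes 0.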

11.
DNA sequences seen in the usual character-based representation appear to be a formidable mixture of the four nucleotides without any apparent order. Nucleotide frequencies and distributions in sequences have been studied extensively ever since Chargaff, more than half a century ago, gave the simple rule equating the total number of purines with that of pyrimidines in duplex DNA. While it is difficult to trace any relationship between the bases from the character representation of a DNA sequence, graphical representations may provide a clue. These representations have been useful for giving an overview of the base distribution and composition of sequences and for revealing many hidden structures. We report here our observation, based on a graphical representation, that the intra-purine and intra-pyrimidine differences in sequences of conserved genes generally follow a quadratic relationship. We show that this may have arisen from mutations in the sequences over evolutionary time scales. From this hitherto undescribed relationship in the gene sequences considered in this report, we hypothesize that such relationships may be characteristic of these sequences. They could therefore act as a barrier to large-scale sequence alterations that override such characteristics, perhaps through some monitoring process inbuilt in the DNA sequences. Such a relationship also raises the possibility that intron sequences play an important role in maintaining these characteristics, and could be indicative of intron-late phenomena.
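
This small sketch shows the quantities involved. It assumes the intra-purine difference is the count of A minus the count of G, and the intra-pyrimidine difference is C minus T. The paper's exact definitions and normalization may differ, and its quadratic relationship is not reproduced. The sketch also checks Chargaff's rule on a duplex formed from a strand and its reverse complement. All function names are hypothetical.

```python
def base_counts(seq: str) -> dict[str, int]:
    """Counts of A, C, G and T (other symbols are ignored)."""
    seq = seq.upper()
    return {b: seq.count(b) for b in "ACGT"}


def reverse_complement(seq: str) -> str:
    """Watson-Crick complement of seq, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def intra_differences(seq: str) -> tuple[int, int]:
    """(A - G, C - T): assumed intra-purine and intra-pyrimidine differences."""
    c = base_counts(seq)
    return c["A"] - c["G"], c["C"] - c["T"]


def chargaff_duplex_holds(seq: str) -> bool:
    """In a duplex (strand plus its complement), purines A+G equal pyrimidines C+T."""
    c = base_counts(seq + reverse_complement(seq))
    return c["A"] + c["G"] == c["C"] + c["T"]
```

A single strand such as "AAAA" has four purines and no pyrimidines, yet its duplex still satisfies the rule. This is why Chargaff's equality is a property of double-stranded DNA.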

12.
We introduce a novel 2D graphical representation of DNA sequences based on the pairs of neighboring nucleotides (PNNs). We then obtain the distributions of the PNNs and a y-M. The construction of the PNN-curve has two important advantages: (1) it avoids loss of information, and the PNN-curve representing a DNA sequence does not overlap or intersect itself; (2) the novel 2D representation is more sensitive. The utility of the method is illustrated by examining similarities/dissimilarities among the coding sequences of the first exon of the beta-globin gene of eleven different species (Table 2).
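
The first step, tallying the pairs of neighboring nucleotides, can be sketched as follows. This covers only the PNN distribution. How the paper turns it into the PNN-curve is not specified in the abstract and is not reproduced. Overlapping windows and relative frequencies are assumptions, and the function name `pnn_distribution` is hypothetical.

```python
from itertools import product

PNNS = ["".join(p) for p in product("ACGT", repeat=2)]  # the 16 possible neighbor pairs


def pnn_distribution(seq: str) -> dict[str, float]:
    """Relative frequency of each pair of neighboring nucleotides (overlapping windows)."""
    seq = seq.upper()
    counts = dict.fromkeys(PNNS, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # ignore pairs containing N or other non-ACGT symbols
            counts[pair] += 1
    total = sum(counts.values())
    return {p: (c / total if total else 0.0) for p, c in counts.items()}
```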

13.
Mapping nucleotide sequences onto a "DNA walk" produces a novel representation of DNA that can then be studied quantitatively using techniques derived from fractal landscape analysis. We used this method to analyze 11 complete genomic and cDNA myosin heavy chain (MHC) sequences belonging to 8 different species. Our analysis suggests that the fractal complexity of MHC genes increases with evolution, in the order vertebrate > invertebrate > yeast. The increase in complexity is measured by the presence of long-range power-law correlations, which are quantified by the scaling exponent alpha. We develop a simple iterative model, based on known properties of polymeric sequences, that generates long-range nucleotide correlations from an initially uncorrelated coding region. Both this new model and the DNA walk analysis support the intron-late theory of gene evolution.
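
The DNA-walk construction and a scaling exponent can be sketched in a few lines, under two assumptions. First, the walk steps +1 for a purine (A, G) and -1 for a pyrimidine (C, T), a common convention in this literature. Second, alpha is estimated as the least-squares slope of log F(l) against log l. Here F(l) is the root-mean-square fluctuation of the walk's displacement over windows of length l. This is a simplified stand-in for the paper's fractal landscape analysis; the function names and window sizes are hypothetical. For an uncorrelated sequence alpha is about 0.5, while alpha > 0.5 indicates long-range power-law correlations.

```python
import math
import random


def dna_walk(seq: str) -> list[int]:
    """Cumulative displacement: +1 per purine (A, G), -1 per pyrimidine (C, T); other symbols skipped."""
    y = [0]
    for base in seq.upper():
        if base in "AG":
            y.append(y[-1] + 1)
        elif base in "CT":
            y.append(y[-1] - 1)
    return y


def fluctuation(y: list[int], window: int) -> float:
    """Root-mean-square fluctuation F(l) of the walk displacement over all windows of length l."""
    diffs = [y[i + window] - y[i] for i in range(len(y) - window)]
    mean = sum(diffs) / len(diffs)
    return math.sqrt(sum((d - mean) ** 2 for d in diffs) / len(diffs))


def scaling_exponent(seq: str, windows=(8, 16, 32, 64, 128, 256)) -> float:
    """Least-squares slope of log F(l) against log l (alpha ~ 0.5 means no long-range correlation)."""
    y = dna_walk(seq)
    if len(y) <= max(windows):
        raise ValueError("sequence is too short for the largest window")
    fs = [fluctuation(y, w) for w in windows]
    if min(fs) == 0:
        raise ValueError("walk shows no fluctuation at some window size")
    xs = [math.log(w) for w in windows]
    ys = [math.log(f) for f in fs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


if __name__ == "__main__":
    rng = random.Random(1)
    random_seq = "".join(rng.choice("ACGT") for _ in range(20000))
    print(f"alpha for an uncorrelated random sequence: {scaling_exponent(random_seq):.2f}")
```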

14.
Three different mammalian origins of DNA replication, 343, S3 and X24, have been cloned into the 15.8 kb circular yeast vector pYACneo. Subsequent transfection into HeLa cells resulted in the isolation of several stably maintained clones. Two cell lines, C343e2 and CS3e1, were found to maintain the sequences as episomes in long-term culture, with a stability per generation of approximately 80%. Both episomes also contain matrix attachment region (MAR) sequences, which mediate the binding of DNA to the nuclear skeleton and are thought to play a role in DNA replication. Using high-salt extraction of the nucleus and fluorescence in situ hybridization, we were able to demonstrate an association of the 343 episome with the nuclear matrix. This association most probably occurs through functional MAR sequences that link the episome to the nuclear matrix and to associated regions containing essential replication proteins. The presence of functional MARs in small episomal sequences may facilitate the replication and maintenance of transfected DNA as an episome. It may also improve the utility of small episomal constructs as potential microchromosomes. J. Cell. Biochem. 67:439–450, 1997. © 1997 Wiley-Liss, Inc.

15.
The pseudo-fourfold homotetrameric synapse formed by Cre protein and target DNA restricts site-specific recombination to sequences containing dyad-symmetric Cre-binding repeats. Mixtures of engineered altered-specificity Cre monomers can form heterotetramers that recombine nonidentical asymmetric sequences, allowing greater flexibility for target site selection in the genome of interest. However, the variety of tetramers allowed by random subunit association increases the chances of unintended reactivity at nontarget sites. This problem can be circumvented by specifying a unique spatial arrangement of heterotetramer subunits. By reconfiguring intersubunit protein-protein contacts, we directed the assembly of two different Cre monomers, each having a distinct DNA sequence specificity, in an alternating (ABAB) configuration. This designed heterotetramer preferentially recombined a particular pair of asymmetric Lox sites over other pairs, whereas a mixture of freely associating subunits showed little bias. Alone, the engineered monomers had reduced reactivity towards both dyad-symmetric and asymmetric sites. Specificity arose because the organization of Cre-binding repeats of the preferred substrate matched the programmed arrangement of the subunits in the heterotetrameric synapse. When this “spatial matching” principle is applied, Cre-mediated recombination can be directed to asymmetric DNA sequences with greater fidelity.
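
A toy calculation shows why free subunit association dilutes the intended arrangement. It rests on hypothetical assumptions: two monomer types A and B at equal concentration occupy the four positions of the synapse independently. Arrangements related by rotation are treated as the same tetramer, and reflections are ignored. Under these assumptions, only 2 of the 16 ordered arrangements are alternating, so an ABAB-type synapse forms 1/8 of the time among 6 distinct tetramer types. This is an illustration, not a model of real Cre assembly; the names below are hypothetical.

```python
from fractions import Fraction
from itertools import product


def canonical(ring: str) -> str:
    """Lexicographically smallest rotation, so rotated copies of a ring compare equal."""
    return min(ring[i:] + ring[:i] for i in range(len(ring)))


arrangements = ["".join(p) for p in product("AB", repeat=4)]  # 16 ordered tetramers
distinct_tetramers = sorted({canonical(a) for a in arrangements})
alternating = [a for a in arrangements if canonical(a) == "ABAB"]
p_alternating = Fraction(len(alternating), len(arrangements))
```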

16.
Multiplexed amplification of specific DNA sequences, by PCR or by strand-displacement amplification, is an intrinsically biased process. The relative abundance of amplified DNA can be altered significantly from the original representation and, in extreme cases, allele dropout can occur. In this paper, we present a method of linear amplification of DNA that relies on the cooperative, sequence-dependent functioning of the DNA mismatch-repair enzyme endonuclease V (EndoV) from Thermotoga maritima (Tma) and Bacillus stearothermophilus (Bst) DNA polymerase. Tma EndoV can nick one strand of unmodified duplex DNA, allowing extension by Bst polymerase. By controlling the bases surrounding a mismatch and the mismatch itself, the efficiency of nicking by EndoV and extension by Bst polymerase can be controlled. The method currently allows 100-fold multiplexed amplification of target molecules to be performed isothermally, with an average change of <1.3-fold in their original representation. Because only a single primer is necessary, primer artefacts and nonspecific amplification products are minimized.
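
One simple way to quantify the change in representation reported above is sketched below. It assumes the change is measured as the symmetric fold change in each target's share of the total before and after amplification; the abstract does not state this. The function names are hypothetical. A target that drops out entirely is reported as an infinite fold change, corresponding to allele dropout.

```python
import math


def fold_changes(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Symmetric fold change (>= 1) in each target's share of the total, before vs after."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    result = {}
    for target, amount in before.items():
        share_before = amount / total_before
        share_after = after[target] / total_after
        ratio = share_after / share_before
        # A 2x gain and a 2x loss both count as a 2-fold change; dropout is infinite.
        result[target] = math.inf if ratio == 0 else max(ratio, 1 / ratio)
    return result


def mean_fold_change(before: dict[str, float], after: dict[str, float]) -> float:
    """Average of the per-target symmetric fold changes."""
    changes = fold_changes(before, after)
    return sum(changes.values()) / len(changes)
```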

18.
Mixed-phase (heterogeneous) and single-phase (homogeneous) DNA subtraction-hybridization methods were used to isolate specific DNA probes for closely related Rhizobium loti strains. In the heterogeneous method, DNA from the prospective probe strain was repeatedly hybridized to a mixture of DNA from cross-hybridizing strains (subtracter DNA), which was immobilized on an epoxy-activated cellulose matrix. Probe strain sequences that shared homology with the matrix-bound subtracter DNA hybridized to it, leaving unique probe strain sequences in the mobile phase. In the homogeneous method, probe strain sequences were hybridized in solution to biotinylated, mercurated subtracter DNA. The biotinylated, mercurated subtracter DNA and the probe strain sequences hybridized to it were removed by two-step affinity chromatography on streptavidin-agarose and thiol-Sepharose. The specificity of the sequences remaining after subtraction hybridization by both methods was assessed and compared by colony hybridization with R. loti strains. Both methods allowed the rapid isolation of strain-specific DNA fragments that were suitable for use as probes.

20.
Localization of SV40 genes within supercoiled loop domains   Times cited: 18 (self-citations: 4, by others: 14)
Recent studies indicate that eukaryotic DNA is organized into supercoiled loop domains. These loops appear to be anchored at their bases to an insoluble nuclear skeleton or matrix. Most of the DNA in the loops can be released from the matrix by nuclease digestion. The residual DNA remaining with the nuclear matrix represents sequences at the base of the loops, and possibly other sequences that are intimately associated with the nuclear matrix for other reasons. Using a quantitative application of the Southern blotting technique, we found this residual DNA from SV40-infected 3T3 cells to be enriched in SV40 sequences, indicating that they reside near matrix-DNA attachment points. An enrichment of 3- to 7-fold relative to total cellular DNA was found in each of three different lines of SV40-infected 3T3 cells. Control experiments with globin genes showed no such enrichment in this residual matrix DNA. This sequence specificity suggests that the spatial organization of DNA sequences within loops may be related to the function of these sequences within the cell.
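
The fold enrichment quoted above can be written as a ratio of fractions. This is the standard definition, assumed here; the abstract does not give the exact normalization of the Southern blot signals.

```latex
E = \frac{f_{\text{matrix}}}{f_{\text{total}}},
\qquad
f_{\text{matrix}} = \frac{\text{SV40 DNA in residual matrix DNA}}{\text{residual matrix DNA}},
\qquad
f_{\text{total}} = \frac{\text{SV40 DNA in total cellular DNA}}{\text{total cellular DNA}}
```

Under this definition, the reported result corresponds to 3 ≤ E ≤ 7 for SV40 in the three infected lines, while the globin control corresponds to E ≈ 1.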

