Similar literature
 20 similar articles found (search time: 15 ms)
1.
In this paper we show that restriction DNA fragments can prime DNA synthesis of a homologous supercoiled plasmid DNA. Using the dideoxyribonucleotide chain terminator method, newly synthesized truncated chains can be detached from the primers by restriction enzyme digestion. Therefore, by choosing DNA fragments flanked by two different restriction enzyme sites, nucleotide sequence information can be obtained simultaneously on both regions of the DNA surrounding the restriction fragment. The advantage of this sequencing approach over current methods is that no prior knowledge of the primary sequence is needed to find the nucleotide sequence of a given DNA fragment. Thus, synthetic primers are not required, and internal sequences of a given clone can be easily accessed without fragmenting the original construct. The method has been used with rapid plasmid preparations, so considerable time and effort can be saved in gathering nucleotide sequence information.

2.
3.
The chaos game representation (CGR) is a scatter plot derived from a DNA sequence, with each point of the plot corresponding to one base of the sequence. If the DNA sequence were a random collection of bases, the CGR would be a uniformly filled square; conversely, any patterns visible in the CGR represent some pattern (information) in the DNA sequence. In this paper, patterns previously observed in a variety of DNA sequences are explained solely in terms of nucleotide, dinucleotide and trinucleotide frequencies.
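
The construction described in this abstract can be illustrated with a minimal Python sketch. The corner assignment (A, C, G, T at the four corners of the unit square) and the starting point at the centre are assumptions based on a common convention; the abstract itself does not state them, and the function name `cgr_points` is hypothetical.

```python
# Corner assignment is an assumption (a commonly used convention);
# the abstract does not specify which base sits at which corner.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence, start=(0.5, 0.5)):
    """Return one (x, y) point per base.

    Each point is the midpoint between the previous point and the
    corner assigned to the current base.
    """
    x, y = start
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]  # KeyError for symbols other than A, C, G, T
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points
```

Plotting the returned points for a long sequence gives the scatter plot; under these assumptions, a sequence with no dinucleotide or trinucleotide bias would fill the square roughly evenly.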

4.
A new set of DNA base-nucleic acid codes and their hypercomplex number representation have been introduced to take the probability of each nucleotide fully into account. A new scoring system has been proposed to suit the hypercomplex number representation of the DNA base-nucleic acid codes and has been incorporated into dot matrix analysis and various sequence alignment algorithms. The problem of DNA sequence alignment can thus be processed in much the same way as pairwise alignment of protein sequences.

5.
CpG dinucleotide clusters, also referred to as CpG islands (CGIs), are usually located in the promoter regions of genes in a deoxyribonucleic acid (DNA) sequence. CGIs play a crucial role in gene expression and cell differentiation; as such, they are normally used as gene markers. Earlier CGI identification methods used the rich CpG dinucleotide content of CGIs as a characteristic measure to locate them. Some recent methods exploit the fact that the probability of nucleotide G following nucleotide C is greater in a CGI than in a non-CGI. These methods use the difference in transition probabilities between subsequent nucleotides to distinguish a CGI from a non-CGI. These transition probabilities vary with the data being analyzed, and several sets of them have been reported in the literature, sometimes leading to contradictory results. In this article, we propose a new and efficient scheme for identification of CGIs using statistically optimal null filters. We formulate a new CGI identification characteristic, devoid of any ambiguities, to reliably and efficiently identify CGIs in a given DNA sequence. Our proposed scheme combines maximum signal-to-noise ratio and least squares optimization criteria to estimate the CGI identification characteristic in the DNA sequence. The proposed scheme is tested on a number of DNA sequences taken from human chromosomes 21 and 22, and proved to be highly reliable as well as efficient in identifying the CGIs.
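
The transition probability that this abstract attributes to earlier methods, the probability of G following C, can be estimated from a sequence as in the rough sketch below. This is only an illustration of that quantity, not the null-filter scheme the article proposes; the function name `transition_probability` is hypothetical.

```python
def transition_probability(sequence, first="C", second="G"):
    """Estimate P(second | first) from a DNA string.

    Counts every occurrence of `first` that has a successor and returns
    the fraction of those successors equal to `second`.
    """
    seq = sequence.upper()
    followers = [seq[i + 1] for i in range(len(seq) - 1) if seq[i] == first]
    if not followers:
        return 0.0  # `first` never occurs with a successor
    return followers.count(second) / len(followers)
```

Computed over a window inside a CGI and over a window outside one, this estimate would be expected to be larger inside the CGI, which is the property such methods rely on.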

6.
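High-dimensional-space digital coding of the genetic code and DNA sequences   Total citations: 13, self-citations: 7, citations by others: 6
Binary digital coding is the most basic coding scheme in information science. When the four bases (C, T, A, G) are binary-coded with the four digits 0 (00), 1 (01), 2 (10) and 3 (11), there are 24 possible coding combinations, of which 8 satisfy the base-complementarity rule; these 8 are topologically equivalent. The coding format ordered by the molecular weight of the bases, 0123/CTAG, is the most ideal coding format. Coding the character sequence of DNA with binary numbers has the following advantages: 1) it compresses information redundancy and improves coding efficiency; 2) the structure and functional groups of the bases can be …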

7.
Genomics is increasingly considered a global enterprise – the fact that biological information can flow rapidly around the planet is taken to be important to what genomics is and what it can achieve. However, the large-scale international circulation of nucleotide sequence information did not begin with the Human Genome Project. Efforts to formalize and institutionalize the circulation of sequence information emerged concurrently with the development of centralized facilities for collecting that information. That is, the very first databases built for collecting and sharing DNA sequence information were, from their outset, international collaborative enterprises. This paper describes the origins of the International Nucleotide Sequence Database Collaboration between GenBank in the United States, the European Molecular Biology Laboratory Databank, and the DNA Database of Japan. The technical and social groundwork for the international exchange of nucleotide sequences created the conditions of possibility for imagining nucleotide sequences (and subsequently genomes) as “global” objects. The “transnationalism” of nucleotide sequences was critical to their ontology – what DNA sequences came to be during the Human Genome Project was deeply influenced by international exchange.

8.
Two computer programs for the IBM personal computer are described for rapid and accurate entry of DNA sequence data. The DNA sequence files produced can be used directly by the DNA sequence manipulation programs by R. Staden (the DataBase system), the University of Wisconsin Genetics Computer Group, DNASTAR, or D. Mount. The first program, DIGISEQ, utilizes a sonic digitizer for semi-automation of sequence entry. To enter the DNA sequence each band of a gel reading is touched by the stylus of the sonic digitizer. DIGISEQ corrects for both changes in lane width and lane curvature. The algorithm is extremely efficient and rarely requires re-entering the centers of the lanes. The second program, TYPESEQ, uses only the keyboard for input. The keyboard is reconfigured to place nucleotides and ambiguity codes under the fingers of one hand, corresponding to the order of the nucleotides on the gel defined by the user. Both programs produce individual tones for each nucleotide, and certain ambiguity codes. This verifies input of the correct nucleotide or ambiguity code, and thus eliminates the need to visually check the screen display during sequence entry. Received on November 16, 1986; accepted on June 16, 1987

9.
We extracted phosphorus atom coordinates from the database of DNA crystal structures and calculated the geometrical parameters needed to reproduce the crystal structures in the phosphorus atom representation. Using these parameters, we wrote a piece of software that assigns phosphorus atom coordinates to DNA of any nucleotide sequence. The software demonstrates a non-negligible influence of the primary structure on DNA helicity, which may underlie the heteronomous double helices of poly(dA).poly(dT) and poly(dG).poly(dC). In addition, the software is so simple that it makes it possible to simulate the "crystal" structures of not only viral DNAs, but also the whole genome of Saccharomyces cerevisiae as well as the DNA of human chromosome 22, which is dozens of megabases in length.

10.
《Genomics》2020,112(2):1847-1852
A novel method is proposed to detect the acceptor and donor splice sites using chaos game representation and artificial neural network. In order to achieve high accuracy, inputs to the neural network, or feature vector, shall reflect the true nature of the DNA segments. Therefore it is important to have one-to-one numerical representation, i.e. a feature vector should be able to represent the original data. Chaos game representation (CGR) is an iterative mapping technique that assigns each nucleotide in a DNA sequence to a respective position on the plane in a one-to-one manner. Using CGR, a DNA sequence can be mapped to a numerical sequence that reflects the true nature of the original sequence. In this research, we propose to use CGR as feature input to a neural network to detect splice sites on the NN269 dataset. Computational experiments indicate that this approach gives good accuracy while being simpler than other methods in the literature, with only one neural network component. The code and data for our method can be accessed from this link: https://github.com/thoang3/portfolio/tree/SpliceSites_ANN_CGR.

11.
A single nucleotide polymorphism (SNP) is a difference in DNA sequence between individuals and provides abundant information about genetic variation. Large-scale discovery of high-frequency SNPs is being undertaken using various methods. However, the publicly available SNP data sometimes need to be verified. If only a particular gene locus is concerned, locus-specific polymerase chain reaction amplification may be useful. A problem with this method is that the secondary peak has to be measured. We have analyzed trace data from conventional sequencing equipment and found an applicable rule to discern SNPs from noise. The rule is applied to multiply aligned sequences with traces, and the peak heights of the traces are compared between samples. We have developed software that integrates this function to automatically identify SNPs. The software works accurately for high-quality sequences and can also detect SNPs in low-quality sequences. Further, it can determine allele frequency, display this information as a bar graph and assign corresponding nucleotide combinations. It is also designed so that a person can easily verify and edit sequences on the screen. It is very useful for identifying de novo SNPs in a DNA fragment of interest.

12.
The problem of subclone identification for DNA fragments of a known nucleotide sequence has been considered. We suggest a strategy for rapid identification of a large number of subclones based on: (i) partial sequencing of the subclone DNA (single nucleotide track); (ii) representation of the result in the form of a numeric code showing the distribution of the chosen nucleotide along the sequence; and (iii) identification of the subclone sequence using this code in a catalogue compiled and printed for a whole DNA sequence. The same approach is applicable when the subclones are expected to have homology with known sequences. Received on January 24, 1986; accepted on September 11, 1986

13.
In this paper, a novel 3D graphical representation of DNA sequences based on codons is proposed. Because no information is lost through overlaps or loops in the curve, this representation is useful for comparing different DNA sequences, and it is especially convenient for comparing DNA mutations. We then give a numerical characterization of DNA sequences based on the new 3D curve. This characterization facilitates quantitative, codon-based comparison of the similarities and dissimilarities of DNA sequences.

14.
Initiation of adenovirus DNA replication is dependent on a complex of the precursor of the terminal protein and the adenovirus-coded DNA polymerase (pTP-pol complex). This complex catalyzes the formation of a covalent linkage between dCMP and pTP in the presence of a functional origin of DNA replication residing in the terminal nucleotide sequence of adenovirus DNA. We have purified the pTP-pol complex of adenovirus type 5 and studied its binding to double-stranded DNA. Using DNA-cellulose chromatography it could be shown that the pTP-pol complex has a higher affinity for adenovirus DNA than for calf thymus or pBR322 DNA. From the differential binding of the pTP-pol complex to plasmids containing adenovirus terminal sequences with different deletions, it has been concluded that a sequence of 14 nucleotide pairs at positions 9-22 plays a crucial role in the binding of pTP-pol to adenovirus DNA. This region is conserved in the DNAs of all human adenovirus serotypes and is obviously an important structural element of the adenovirus origin of DNA replication. Comparative binding studies with adenovirus DNA polymerase and pTP-pol indicated that pTP is responsible for the binding. The nature of the binding of pTP-pol to the conserved sequence will be discussed.

15.
We have developed a program for the graphic representation and manipulation of DNA sequences. The program (named CARTE from the French for ‘map’) is intended as a tool in the planning and analysis of recombinant DNA experiments. DNA sequences are represented as standard restriction maps, using any desired combination of restriction enzymes. Features of interest, such as promoters or coding sequences, can be highlighted. The sequence can be manipulated to mimic cloning, using deletions, insertions or replacements at specified sites. This process is facilitated by the simultaneous display of a graphic map of the entire sequence, a detailed picture of the work in progress, and a menu of functions. Received on November 17, 1986; accepted on March 12, 1987

16.
Structural and functional features within genomic sequences are best described by their position within the genomic structure. Cleavage sites can be conveniently described by single positions but genomic domains require the position of their two boundaries. The handling of these positions simultaneously with sequence manipulations in computer simulations of recombinant DNA procedures greatly improves the understanding of the resulting recombinant constructs. In addition, the algorithms describing the fate of domain boundaries can be used for the handling of nucleotide sequences in dynamic database environments handled by languages like Prolog which are particularly suitable for artificial intelligence implementations. This communication describes a set of algorithms for the automatic updating of single sites and double domain boundaries in linear and circular models for computer simulation of recombinant DNA procedures. Received on August 18, 1987; accepted on October 13, 1987
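
The kind of position bookkeeping this abstract describes can be sketched for the simplest case, a single site on a linear sequence. The 0-based coordinates, the half-open deletion interval and both function names are assumptions made for illustration; the abstract does not give the published algorithms, and circular models and domain boundaries are not covered here.

```python
def shift_after_insertion(site, insert_at, insert_len):
    """New position of a single site after inserting `insert_len` bases
    immediately before 0-based position `insert_at` (linear sequence)."""
    return site + insert_len if site >= insert_at else site


def shift_after_deletion(site, del_start, del_end):
    """New position of a single site after deleting the half-open interval
    [del_start, del_end); returns None if the site itself was deleted."""
    if site < del_start:
        return site
    if site < del_end:
        return None
    return site - (del_end - del_start)
```

A domain with two boundaries could be handled by applying the same functions to each boundary and deciding separately what to do when only one boundary survives a deletion.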

17.
The pseudo oligonucleotide composition, or pseudo K-tuple nucleotide composition (PseKNC), can be used to represent a DNA or RNA sequence with a discrete model or vector yet still keep considerable sequence order information, particularly the global or long-range sequence order information, via the physicochemical properties of its constituent oligonucleotides. Therefore, the PseKNC approach may hold very high potential for enhancing the power in dealing with many problems in computational genomics and genome sequence analysis. However, dealing with different DNA or RNA problems may need different kinds of PseKNC. Here, we present a flexible and user-friendly web server for PseKNC (at http://lin.uestc.edu.cn/pseknc/default.aspx) by which users can easily generate many different modes of PseKNC according to their need by selecting various parameters and physicochemical properties. Furthermore, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the current web server to generate their desired PseKNC without the need to follow the complicated mathematical equations, which are presented in this article just for the integrity of PseKNC formulation and its development. It is anticipated that the PseKNC web server will become a very useful tool in computational genomics and genome sequence analysis.
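
As background to this abstract, the plain K-tuple nucleotide composition on which PseKNC builds can be sketched as below. This covers only the basic occurrence frequencies; the pseudo components derived from physicochemical properties are not reproduced, since the abstract does not give their equations. The function name and the choice to skip windows containing non-ACGT symbols are assumptions.

```python
from itertools import product


def ktuple_composition(sequence, k=2):
    """Return the normalized frequencies of all 4**k k-tuples, in
    lexicographic A, C, G, T order, using overlapping windows."""
    seq = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # assumption: skip windows with ambiguity codes such as N
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]
```

For `k=2` this yields a 16-dimensional vector and for `k=3` a 64-dimensional one; sequence-order information is lost in this plain form, which is the limitation the pseudo components are meant to address.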

18.
Feng ZP 《Biopolymers》2001,58(5):491-499
A new representation of protein sequence is presented in this paper, in which each protein can be represented by a 20-dimensional (20D) vector of unit length. Inspired by the principle of superposition of state in quantum mechanics, the squares of the 20 components of the vector correspond to the amino acid composition. Using the new representation of the primary sequence and the Bayes Discriminant Algorithm, the subcellular location of prokaryotic proteins was predicted. The overall predictive accuracy in the jackknife test can be 3% higher than the result of using amino acid composition directly for the database in which sequence identity is less than 90%, and 5% higher when sequence identity is less than 80%. The higher predictive accuracy indicates that the current measure of extracting the information from the primary sequence is efficient. Since subcellular location restricts a protein's possible function, the present method should also be a useful measure for the systematic analysis of genome data. The program used in this paper is available on request.
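
The representation described in this abstract can be illustrated with a minimal sketch in which each component is the square root of the corresponding amino acid fraction, so that the squares reproduce the composition and the vector has unit length. Taking the positive square root, the alphabetical ordering of the 20 residues and the function name are assumptions; the abstract does not specify them.

```python
import math

# Standard 20 amino acids in one-letter code; the ordering is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sqrt_composition_vector(protein):
    """Return a 20D unit vector whose squared components equal the
    amino acid composition of `protein`."""
    seq = [aa for aa in protein.upper() if aa in AMINO_ACIDS]
    if not seq:
        raise ValueError("no standard amino acids in sequence")
    n = len(seq)
    return [math.sqrt(seq.count(aa) / n) for aa in AMINO_ACIDS]
```

Because the fractions sum to 1, the sum of the squared components is 1 as well, which is what makes the vector unit length.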

19.

Background

DNA clustering is an important technique for automatically finding the inherent relationships among large numbers of DNA sequences, but clustering quality can still be improved greatly. The DNA sequence similarity metric is one of the key points of clustering. The alignment-free methodology is a very popular way to calculate DNA sequence similarity. It normally converts a sequence into a feature space based on the probability distribution of words rather than directly matching strings. Existing alignment-free models, e.g. k-tuple, merely employ word frequency information and ignore many types of useful information contained in the DNA sequence, such as the classification of nucleotide bases, position and the like. It is believed that better data mining results can be achieved with compounded information. Therefore, we present a new alignment-free model that employs compounded information to improve the DNA clustering quality.

Results

This paper proposes a Category-Position-Frequency (CPF) model, which utilizes the word frequency, position and classification information of nucleotide bases from DNA sequences. The CPF model converts a DNA sequence into three sequences according to the categories of nucleotide bases, and then yields a 12-dimensional feature vector. The feature values are computed by an entropy-based model that takes both local word frequency and position information into account. We conduct DNA clustering experiments on several datasets and compare the results with some mainstream alignment-free models, including k-tuple, DMk, TSM, AMI and CV. The experiments show that the CPF model is superior to the other models in terms of the clustering results and optimal settings.

Conclusions

The following conclusions can be drawn from the experiments. (1) The hybrid information model is better than the model based on word frequency only. (2) For DNA sequences of no more than 5000 characters, the preferred sliding window size for CPF is two, which provides a great advantage in system performance. (3) The CPF model is able to obtain efficient, stable performance and broad generalization.

20.
Xu YH  Manoharan HT  Pitot HC 《BioTechniques》2007,43(3):334, 336-340, 342
The bisulfite genomic sequencing technique is one of the most widely used techniques to study sequence-specific DNA methylation because of its unambiguous ability to reveal DNA methylation status to the order of a single nucleotide. One characteristic feature of the bisulfite genomic sequencing technique is that a number of sample sequence files will be produced from a single DNA sample. The PCR products of bisulfite-treated DNA samples cannot be sequenced directly because they are heterogeneous in nature; therefore they should be cloned into suitable plasmids and then sequenced. This procedure generates an enormous number of sample DNA sequence files and adds extra bases belonging to the plasmids to the sequence, which will cause problems in the final sequence comparison. Finding the methylation status for each CpG in each sample sequence is not an easy job. CpG PatternFinder was developed for this purpose. The main functions of CpG PatternFinder are: (i) to analyze the reference sequence to obtain CpG and non-CpG-C residue position information; (ii) to tailor sample sequence files (delete insertions and mark deletions from the sample sequence files) based on a configuration of ClustalW multiple alignment; (iii) to align sample sequence files with a reference file to obtain bisulfite conversion efficiency and CpG methylation status; and (iv) to produce graphics, highlighted aligned sequence text and a summary report which can be easily exported to the Microsoft Office suite. CpG PatternFinder is designed to operate cooperatively with BioEdit, a freeware program available on the internet. It can handle up to 100 files of sample DNA sequences simultaneously, and the total CpG pattern analysis process can be finished in minutes. CpG PatternFinder is an ideal software tool for DNA methylation studies to determine the differential methylation pattern in a large number of individuals in a population.
Previously we developed the CpG Analyzer program; CpG PatternFinder is our further effort to create software tools for DNA methylation studies.
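
The comparison underlying functions (i) and (iii) in this abstract can be illustrated with a rough Python sketch. It is not the CpG PatternFinder code: it assumes a reference and a sample that are already aligned, gap-free, of equal length and read on the same strand, and the function name `methylation_calls` is hypothetical. Positions are 0-based.

```python
def methylation_calls(reference, sample):
    """Compare a bisulfite-converted sample read with its reference.

    Returns (cpg_status, efficiency): cpg_status maps the 0-based position
    of each CpG cytosine to 'methylated' (still C), 'unmethylated' (read
    as T) or 'undetermined'; efficiency is the fraction of non-CpG
    cytosines read as T, or None if the reference has none.
    """
    ref, smp = reference.upper(), sample.upper()
    if len(ref) != len(smp):
        raise ValueError("sequences must be pre-aligned to equal length")
    cpg_status = {}
    converted = non_cpg_c = 0
    for i, base in enumerate(ref):
        if base != "C":
            continue
        is_cpg = i + 1 < len(ref) and ref[i + 1] == "G"
        if is_cpg:
            if smp[i] == "C":
                cpg_status[i] = "methylated"
            elif smp[i] == "T":
                cpg_status[i] = "unmethylated"
            else:
                cpg_status[i] = "undetermined"
        else:
            non_cpg_c += 1
            if smp[i] == "T":
                converted += 1
    efficiency = converted / non_cpg_c if non_cpg_c else None
    return cpg_status, efficiency
```

The sketch relies on the standard reading of bisulfite data, in which unmethylated cytosines are converted and read as T while methylated cytosines remain C; real clone sequences would first need the plasmid bases trimmed and insertions or deletions resolved, as the abstract notes.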
