Similar articles
Found 20 similar articles; search time: 15 ms
1.
2.
We describe a computer tool to aid the discovery of new motifs in nucleic acid sequences. A typical use would be to analyse a set of upstream regions from a family of related genes in order to find possible control sequences. The heart of the method is the creation of dictionaries of related subsequences. These dictionaries can then be analysed to look for the commonest or best-defined subsequences, those that occur in the highest number of different sequences, or for those in equivalent positions within the family. We show the application of the method to a set of E. coli promoter sequences. Received on May 9, 1989; accepted on July 27, 1989
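
The dictionary idea in this abstract can be illustrated with a minimal Python sketch. It is a simplification and not the authors' tool: it indexes exact fixed-length subsequences only, whereas the abstract speaks of related subsequences. The function names, the window length `k`, and the toy upstream regions are all assumptions made for illustration.

```python
from collections import defaultdict


def build_dictionary(sequences, k):
    """Map each length-k subsequence to the set of sequence indices containing it."""
    index = defaultdict(set)
    for i, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            index[seq[start:start + k]].add(i)
    return index


def most_widespread(index, top=3):
    """Return the subsequences found in the largest number of distinct sequences."""
    # Sort by descending number of sequences, then alphabetically for stable ties.
    ranked = sorted(index.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(word, len(hits)) for word, hits in ranked[:top]]


# Toy upstream regions, invented for illustration (not real promoter data).
upstream = ["GGTATAATGC", "CCTATAATAA", "ATTATAATCG"]
index = build_dictionary(upstream, 6)
print(most_widespread(index, top=1))  # [('TATAAT', 3)]
```

Ranking by the number of distinct sequences, rather than by raw occurrence count, mirrors the abstract's criterion of subsequences that occur in the highest number of different sequences.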

3.
‘The GenBank’* nucleic acid sequence database is a computer-based collection of all published DNA and RNA sequences; it contains over five million bases in close to six thousand sequence entries drawn from four thousand five hundred published articles. Each sequence is accompanied by relevant biological annotation. The database is available either on magnetic tape, on floppy diskettes, on-line or in hardcopy form. We discuss the structure of the database, the extent of the data and the implications of the database for research on nucleic acids.

4.
Visualization of nucleic acid sequence structural information   Total citations: 3, self-citations: 0, citations by others: 3
Several interactive Pascal programs have been written for the analysis and display of structural information in nucleic acid sequences. Layout procedures were developed to display the homology and repeat matrices of a sequence and to predict and display the secondary structure of RNA/DNA molecules free of overlap and to predict and display internal repeats. No special plotting devices are required because the output is adapted to line printers. Sequences from several DNA database systems can be used as input. These programs are part of a general nucleic acid sequence analysis package. Received on December 9, 1984; accepted on January 11, 1985

5.
Watson-Crick base pairing is a natural molecular recognition process that has been exploited in molecular biology and universally adopted in many fields. An additional mode of nucleic acid sequence recognition that could be used in combination with normal base pairing would add an extra dimension to nucleic acid interactions and open up many new applications. In principle the triplex approach could provide this if developed to recognize any DNA sequence. To this end modified nucleosides have been incorporated into triple-helix-forming oligonucleotides (TFOs) and used to recognize mixed sequence DNA with high selectivity and affinity at neutral pH. Continuing developments are directed towards improving TFO affinity at high pH and increasing triplex association kinetics. A number of applications of triplexes are currently being explored.

6.
The complete cDNA nucleic acid sequence of preproapolipoprotein (apo) A-II, a major protein constituent of high density lipoproteins, has been determined on clones from a human liver ds-cDNA library. Clones containing ds-cDNA for apoA-II were identified in the human liver ds-cDNA library using synthetic oligonucleotides as probes. Of 3200 clones screened, 4 reacted with the oligonucleotide probes. The DNA sequence coding for amino acids -17 to +17 of apoA-II was determined by Maxam-Gilbert sequence analysis of restriction fragments isolated from one of these clones, pMDB2049. The remainder of the cDNA sequence was established by sequence analysis of a primer extension product synthesized utilizing a restriction fragment near the 5'-end of clone pMDB2049 as primer with total liver mRNA. The apoA-II mRNA encodes a 100 amino acid protein, preproapoA-II, that has an 18 amino acid prepeptide and a 5 amino acid propeptide terminating with a basic dipeptide (Arg-Arg) at the cleavage site to mature apoA-II.

7.
Computer programs are described which help during the collection and analysis of nucleic acid sequence data. They are written in FORTRAN and have been implemented on a PDP 11/60 computer.

8.
System analysis and nucleic acid sequence banks   Total citations: 2, self-citations: 0, citations by others: 2
M Gouy, C Gautier, F Milleret. Biochimie, 1985, 67(5):433-436
The mass of published nucleic acid sequence data has required the design of several computerized data bases. We show that this activity is related to the methodology of System Analysis and that data bases are a means of modeling biological knowledge. As an example, the ACNUC data base we have created is presented.

9.
Non-parametric statistics for nucleic acid sequence study   Total citations: 2, self-citations: 0, citations by others: 2
C Gautier, M Gouy, S Louail. Biochimie, 1985, 67(5):449-453
The use of non-parametric statistics for nucleic acid sequence studies is illustrated by some examples. This method is highly flexible and allows design of specific tests for detecting sequence structure. Tests devoted to local repetitivity, codon nearest neighbors, and dinucleotide avoidance are discussed in detail. An appendix indicates all computations required to use these tests.
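
One common non-parametric way to probe dinucleotide avoidance is a shuffling (permutation) test, sketched below in Python. This is an assumed stand-in and not necessarily the test defined in the paper's appendix: the sequence is shuffled, which preserves base composition, and the observed dinucleotide count is compared with the counts in the shuffled copies. The function names, the number of shuffles, and the toy sequence are hypothetical.

```python
import random


def count_dinucleotide(seq, dinuc):
    """Count (possibly overlapping) occurrences of a two-base word in seq."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def avoidance_p_value(seq, dinuc, n_shuffles=1000, seed=0):
    """Estimate how often a shuffled sequence has as few copies of dinuc as observed.

    A small value suggests the dinucleotide is rarer than base composition
    alone would predict.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = count_dinucleotide(seq, dinuc)
    bases = list(seq)
    at_most = 0
    for _ in range(n_shuffles):
        rng.shuffle(bases)
        if count_dinucleotide("".join(bases), dinuc) <= observed:
            at_most += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (at_most + 1) / (n_shuffles + 1)


# Toy sequence, invented for illustration: G never directly precedes C.
toy = "C" * 10 + "G" * 10
observed, p = avoidance_p_value(toy, "GC")
print(observed, p)
```

Shuffling single bases tests avoidance against base composition only; a test that also controls for codon structure would need a different null model.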

10.
This paper describes the application of text compression methods to machine-readable files of nucleic acid and protein sequence data. Two main methods are used to reduce the storage requirements of such files, these being n-gram coding and run-length coding. A Pascal program combining both of these techniques resulted in a compression figure of 74.6% for the GenBank database and a program that used only n-gram coding gave a compression figure of 42.8% for the Protein Identification Resource database. Received on November 29, 1985; accepted on February 24, 1986
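
Of the two methods named in this abstract, run-length coding is the simpler to show. The Python sketch below is an illustration under assumed conventions (a list of base/count pairs), not the Pascal program the paper describes, and it leaves out n-gram coding. The function names and the toy sequence are hypothetical.

```python
from itertools import groupby


def rle_encode(seq):
    """Collapse each run of identical symbols into a (symbol, run length) pair."""
    return [(base, len(list(run))) for base, run in groupby(seq)]


def rle_decode(pairs):
    """Expand (symbol, run length) pairs back into the original string."""
    return "".join(base * count for base, count in pairs)


seq = "AAAATTTGCC"
encoded = rle_encode(seq)
print(encoded)  # [('A', 4), ('T', 3), ('G', 1), ('C', 2)]
assert rle_decode(encoded) == seq
```

Run-length coding only pays off where long runs of one symbol occur, which may be one reason the abstract reports it in combination with n-gram coding.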

11.
Statistical characterization of nucleic acid sequence functional domains   Total citations: 20, self-citations: 14, citations by others: 6
It has long been recognized that various genome classes were distinguishable on the basis of base composition and nearest neighbor frequencies. In addition Grantham et al. (8) have recently presented evidence that these distinctions are preserved at the level of codon usage. As discussed in this report it is now clear that these and related statistics can uniquely characterize the various functional domains of the genome. In particular peptide coding, intervening segments, structural RNA coding and mitochondrial domains of the vertebrate genome are uniquely characterizable. The statistical measures not only reflect understood functional differences among these domains but suggest others. The ability of these simple statistics of nucleic acid sequences to reflect so much of the encoded complex pattern information and/or effects of selective constraints is somewhat surprising. Here, we investigated the statistical measures most distinctive of the various domains and then linked them to our current understanding insofar as possible.

12.
Nucleic acid-based biochemical assays are crucial to modern biology. Key applications, such as detection of bacterial, viral and fungal pathogens, require detailed knowledge of assay sensitivity and specificity to obtain reliable results. Improved methods to predict assay performance are needed for exploiting the exponentially growing amount of DNA sequence data and for reducing the experimental effort required to develop robust detection assays. Toward this goal, we present an algorithm for the calculation of sequence similarity based on DNA thermodynamics. In our approach, search queries consist of one to three oligonucleotide sequences representing either a hybridization probe, a pair of Padlock probes or a pair of PCR primers with an optional TaqMan™ probe (i.e. in silico or 'virtual' PCR). Matches are reported if the query and target satisfy both the thermodynamics of the assay (binding at a specified hybridization temperature and/or change in free energy) and the relevant biological constraints (assay sequences binding to the correct target duplex strands in the required orientations). The sensitivity and specificity of our method are evaluated by comparing predicted to known sequence tagged sites in the human genome. Free energy is shown to be a more sensitive and specific match criterion than hybridization temperature.

13.
Structure prediction of non-canonical motifs such as mismatches, extra unmatched nucleotides or internal and hairpin loop structures in nucleic acids is of great importance for understanding the function and design of nucleic acid structures. Systematic conformational analysis of such motifs typically involves the generation of many possible combinations of backbone dihedral torsion angles for a given motif and subsequent energy minimization (EM) and evaluation. Such an approach is limited because the number of dihedral angle combinations grows very rapidly with the size of the motif. Two conformational search approaches have been developed that allow an effective crossing of barriers during conformational searches and whose computational demand grows much more slowly with system size than that of search methods that explore all combinations of backbone dihedral torsion angles. In the first search protocol single torsion angles are flipped into favorable states using constrained EM and subsequent relaxation without constraints. The approach is repeated in an iterative manner along the backbone of the structural motif until no further energy improvement is obtained. For two test systems, a DNA trinucleotide loop (sequence: GCA) and an RNA tetraloop (sequence: UUCG), the approach successfully identified low energy states close to experiment for two out of five start structures. In the second method randomly selected combinations of up to six backbone torsion angles are simultaneously flipped into preset ranges by a short constrained EM followed by unconstrained EM and acceptance according to a Metropolis acceptance criterion. This combined stochastic/EM search was even more effective than the single torsion flip approach and selected low energy states for the two test cases in two to four out of five start structures.

14.
15.
Recently we have constructed a database, the Enzyme-Reaction Database, which links a chemical structure to amino acid sequences of enzymes that recognize the chemical structure as their ligand. The total number of enzymes registered in the database is 1103 with 6668 NBRF-PIR entry codes and 1756 chemical compounds. The chemical structures and chemical names for 842 compounds are registered in the Chemical-Structure Database on the MACCS system. For each enzyme, the sequences were divided into clusters, and multiply aligned in each cluster to extract a conserved sequence. A total of 158 781 five-residue-long fragments were constructed from 433 conserved sequences and compared among different clusters of different enzymes. One of the motifs shared by different enzymes was S-G-G-L-D. The motif was conserved in both argininosuccinate synthase (EC 6.3.4.5) and asparagine synthase (glutamine-hydrolysing) (EC 6.3.5.4). This result showed that the database was useful for the analysis of the relationship between chemical structures and amino acid sequence motifs.

16.
A novel method for nucleic acid sequence determination   Total citations: 30, self-citations: 0, citations by others: 30
We describe a novel sequencing methodology which should be readily and completely automated. The method relies on fragmentation of a nucleotide or deoxynucleotide sequence into short fragments, and subsequent quantitation of the fragments by hybridization to oligo-deoxynucleotides on a solid support. The original sequence may be reconstructed from the resulting table of fragment frequencies. We present a specific protocol which would allow practical implementation of this approach.

17.
Comparative studies of sequence motifs in the RNA polymerases and nucleic acid helicases of positive-sense RNA plant viruses have provided a new scheme for the classification of these pathogens. We propose a new luteovirus supergroup which should be added to the already described Sindbisvirus-like and picornavirus-like supergroups. Sequence motifs of nucleic acid helicases and RNA polymerases which previously were considered to be specific for each of the two supergroups now occur together within this new supergroup. We propose that this new viral supergroup provides an evolutionary link between the other two supergroups.

18.
19.

Background  

High quality sequence alignments of RNA and DNA sequences are an important prerequisite for the comparative analysis of genomic sequence data. Nucleic acid sequences, however, exhibit a much larger sequence heterogeneity compared to their encoded protein sequences due to the redundancy of the genetic code. It is desirable, therefore, to make use of the amino acid sequence when aligning coding nucleic acid sequences. In many cases, however, only a part of the sequence of interest is translated. On the other hand, overlapping reading frames may encode multiple alternative proteins, possibly with intermittent non-coding parts. Examples are, in particular, RNA virus genomes.

20.