Similar articles
 Found 20 similar articles; search took 15 ms
1.
A program for template matching of protein sequences   Total citations: 1, self-citations: 0, citations by others: 1
The matching of a template to a protein sequence is simplified by treating it as a special case of sequence alignment. Restriction of the distances between motifs in the template controls against spurious matches within very long sequences. The program using this algorithm is fast enough to be used in scanning large databases for sequences matching a complex template. Received on August 17, 1987; accepted on January 11, 1988

2.
Given two sequences, a pattern of length m and a text of length n, and a positive integer k, we give two algorithms. The first finds all occurrences of the pattern in the text as long as these do not differ from each other by more than k differences. It runs in O(nk) time. The second algorithm finds all subsequence alignments between the pattern and the text with at most k differences. This algorithm runs in O(nmk) time, is very simple and easy to program. Received on August 12, 1987; accepted on December 31, 1987
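
The k-differences problem in this abstract can be illustrated with the classic dynamic-programming recurrence for approximate matching. The sketch below is not either of the two algorithms the abstract describes: it runs in O(nm) time and only reports where matches end. The function name `k_difference_ends` is hypothetical.

```python
def k_difference_ends(pattern: str, text: str, k: int) -> list[int]:
    """Return 1-based end positions in `text` where some substring lies
    within k edits (substitution, insertion, deletion) of `pattern`."""
    m = len(pattern)
    # prev[i]: fewest edits turning pattern[:i] into a substring of the
    # text that ends at the previous text position.
    prev = list(range(m + 1))
    ends = []
    for j, symbol in enumerate(text, start=1):
        curr = [0] * (m + 1)  # curr[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == symbol else 1
            curr[i] = min(prev[i - 1] + cost,  # match or substitution
                          prev[i] + 1,         # extra symbol in the text
                          curr[i - 1] + 1)     # pattern symbol left out
        if curr[m] <= k:
            ends.append(j)
        prev = curr
    return ends


print(k_difference_ends("abc", "xxabcxx", 1))  # prints [4, 5, 6]
```

With k = 0 the same call reduces to exact matching and reports only position 5.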

3.
We present an algorithm to detect distances between oligonucleotides in large collections of nucleic acid sequences. The ratios of actual frequencies of occurrence of short oligonucleotides at a given distance to the corresponding expected frequencies were analyzed in four categories of DNA sequences (eukaryotic exons, bacterial genes, introns and non-Alu repeated DNAs). Three base periodic occurrences (independent of the reading frame) of all combinations of mononucleotides and repeats of all dinucleotides were characteristic for protein coding regions. This was also the case with the majority of trinucleotides (including translational stop signals) in these regions. Mirror-symmetric trinucleotides (except GCG and CGC) displayed a strong tendency to be two base periodically repeated in introns. Some two and three base periodic motifs were also observed in repeated DNAs. The possible biological implications of outstanding three base periodicities in bacterial genes and eukaryotic exons are discussed. Received on March 2, 1987; accepted on May 5, 1987
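
The observed-to-expected ratio mentioned in this abstract can be sketched as follows. The abstract does not say how the expected frequency is computed, so this sketch assumes the two oligonucleotides occur independently of each other; the function name `distance_ratio` and the equal-length restriction are also assumptions made for illustration.

```python
def distance_ratio(seq: str, a: str, b: str, d: int) -> float:
    """Observed / expected number of positions i where oligonucleotide `a`
    starts at i and `b` starts at i + d (both of the same length)."""
    w = len(a)
    if len(b) != w:
        raise ValueError("this sketch assumes oligonucleotides of equal length")
    n = len(seq) - d - w + 1  # start positions where both windows fit
    if n <= 0:
        return float("nan")
    count_a = sum(seq.startswith(a, i) for i in range(n))
    count_b = sum(seq.startswith(b, i + d) for i in range(n))
    observed = sum(seq.startswith(a, i) and seq.startswith(b, i + d)
                   for i in range(n))
    # Assumed independence model: expected = n * (count_a/n) * (count_b/n)
    if count_a == 0 or count_b == 0:
        return float("nan")
    return observed * n / (count_a * count_b)


# A perfectly three-base periodic toy sequence
print(distance_ratio("ATGATGATGATG", "A", "A", 3))  # prints 3.0
print(distance_ratio("ATGATGATGATG", "A", "A", 1))  # prints 0.0
```

A ratio well above 1 at distances 3, 6, 9 and so on is the kind of signal the abstract calls three base periodicity.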

4.
We describe software for aligning protein or nucleic acid sequences based on the concept of match density. This method is especially useful for locating regions of short similarity between two longer sequences which may be largely dissimilar (e.g. locating active site regions in distantly related proteins). Our software is able to identify biologically interesting similarities between two sub-regions because it allows the user to control the matching parameters and the manner in which local alignments are selected for display. Furthermore, the collection and ranking of alignments for display uses a novel, highly efficient algorithm. We illustrate these features with several examples. In addition, we show that this tool can be used to find a new conserved sequence in several viral DNA polymerases, which, we suggest, occurs at a functionally important enzymatic site. Received on August 17, 1987; accepted on November 17, 1987

5.
The antigenic index: a novel algorithm for predicting antigenic determinants   Total citations: 39, self-citations: 0, citations by others: 39
In this paper, we introduce a computer algorithm which can be used to predict the topological features of a protein directly from its primary amino acid sequence. The computer program generates values for surface accessibility parameters and combines these values with those obtained for regional backbone flexibility and predicted secondary structure. The output of this algorithm, the antigenic index, is used to create a linear surface contour profile of the protein. Because most, if not all, antigenic sites are located within surface-exposed regions of a protein, the program offers a reliable means of predicting potential antigenic determinants. We have tested the ability of this program to generate accurate surface contour profiles and predict antigenic sites from the linear amino acid sequences of well-characterized proteins and found a strong correlation between the predictions of the antigenic index and known structural and biological data. Received on August 17, 1987; accepted on December 31, 1987

6.
Multiple sequence alignment by a pairwise algorithm   Total citations: 1, self-citations: 0, citations by others: 1
An algorithm is described that processes the results of a conventional pairwise sequence alignment program to automatically produce an unambiguous multiple alignment of many sequences. Unlike other, more complex, multiple alignment programs, the method described here is fast enough to be used on almost any multiple sequence alignment problem. Received on September 25, 1986; accepted on January 29, 1987

7.
Stochastic models for heterogeneous DNA sequences   Total citations: 10, self-citations: 0, citations by others: 10
The composition of naturally occurring DNA sequences is often strikingly heterogeneous. In this paper, the DNA sequence is viewed as a stochastic process with local compositional properties determined by the states of a hidden Markov chain. The model used is a discrete-state, discrete-outcome version of a general model for non-stationary time series proposed by Kitagawa (1987). A smoothing algorithm is described which can be used to reconstruct the hidden process and produce graphic displays of the compositional structure of a sequence. The problem of parameter estimation is approached using likelihood methods and an EM algorithm for approximating the maximum likelihood estimate is derived. The methods are applied to sequences from yeast mitochondrial DNA, human and mouse mitochondrial DNAs, a human X chromosomal fragment and the complete genome of bacteriophage lambda.
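
Reconstructing the hidden process can be illustrated with the standard forward-backward smoother for a hidden Markov chain. This is a generic textbook sketch and may differ from the recursion in the paper; parameter estimation by EM is not shown. The function name `smooth`, the two-state example and all its probabilities are assumptions chosen for illustration, and every observed symbol is assumed to have non-zero emission probability.

```python
def smooth(obs, init, trans, emit):
    """Posterior probabilities P(state at t | whole sequence).

    init[s]: initial probability of state s; trans[r][s]: P(r -> s);
    emit[s][symbol]: probability that state s emits `symbol`."""
    n, k = len(obs), len(init)
    fwd, scale = [], []
    for t, o in enumerate(obs):
        if t == 0:
            row = [init[s] * emit[s][o] for s in range(k)]
        else:
            row = [emit[s][o] * sum(fwd[-1][r] * trans[r][s] for r in range(k))
                   for s in range(k)]
        z = sum(row)  # normalise each step to avoid numerical underflow
        fwd.append([x / z for x in row])
        scale.append(z)
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        o = obs[t + 1]
        bwd[t] = [sum(trans[s][r] * emit[r][o] * bwd[t + 1][r]
                      for r in range(k)) / scale[t + 1]
                  for s in range(k)]
    post = []
    for t in range(n):
        row = [fwd[t][s] * bwd[t][s] for s in range(k)]
        z = sum(row)
        post.append([x / z for x in row])
    return post


# Hypothetical two-state model: state 0 is AT-rich, state 1 is GC-rich.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [{"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
        {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}]
posterior = smooth("AAAATTTTGGGGCCCC", init, trans, emit)
```

Plotting `posterior[t][0]` against position t would give a compositional profile of the kind the abstract refers to as a graphic display.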

8.
A program has been developed for the modelling of modifications in DNA ends, for the construction of ligated junctions, and for the analysis in these junctions of new restriction enzyme recognition sequences. This program allows the analysis of restriction enzyme specificities in ligated junctions of cohesive or blunt DNA ends. Cohesive ends are considered in their natural configuration or after modification by possible blunt-ending procedures. The program also allows the modelling of partial filling-in for 5'-single-stranded ends. This program has proven useful for the design of sequences with new restriction sites or to predict or confirm the sequence of junctions created by the ligation of modified ends. Received on October 28, 1987; accepted on November 23, 1987

9.
An interface program has been developed for users of MS-DOS computers and the GenBank(R) gene sequence files in their diskette format. With the program a user is able to produce keyword, author and entry name listings of GenBank items or to select GenBank sequences for viewing, printing or decoding. The decode option uncompresses sequence data and yields a character file which has the format used on GenBank magnetic tapes. Program options are chosen by selecting items from command menus. While the program is designed primarily for hard disk operation, it also allows users of diskette-based computers to work with GenBank files. Received on July 15, 1987; accepted on July 15, 1987

10.
Discriminant analysis of promoter regions in Escherichia coli sequences   Total citations: 2, self-citations: 0, citations by others: 2
We have previously developed a general method based on the statistical technique of discriminant analysis to predict splice junctions in eukaryotic mRNA sequences [Nakata, K., Kanehisa, M. and DeLisi, C. (1985) Nucleic Acids Res., 13, 5327–5340]. In order to evaluate further applicability of this method, we now analyze the promoter region of Escherichia coli sequences. The attributes used for discrimination include the accuracy of consensus sequence patterns measured by the perceptron algorithm, the thermal stability map, the base composition and the Calladine-Dickerson rules for helical twist angle, roll angle, torsion angle and propeller twist angle. When applied to selected E. coli sequences in the GenBank database, the method correctly identifies 75% of the true promoter regions. Received on May 15, 1987; accepted on April 17, 1988

11.
A fixed-point alignment analysis technique is presented which is designed to locate common sequence motifs in collections of proteins or nucleic acids. Initially a program aligns a collection of sequences by a common sequence pattern or known biological feature. The common pattern or feature (fixed-point) may be a user-specified sequence string or a known sequence position like mRNA start site, which may be taken directly from the annotated feature table of GenBank. Once all alignment markers are located, the sequences are scanned for occurrences of given oligomers within a specified span both upstream and downstream of the fixed-point. The occurrences may then be plotted as a function of the position relative to the fixed-point, displayed as an actual sequence alignment or selectively summarized via various program options. Applications of the technique are discussed. Received on August 17, 1987; accepted on November 17, 1987
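
The scanning step in this abstract can be sketched roughly as below, assuming the fixed-point is a user-specified string and taking its first occurrence in each sequence as the anchor. The function name `oligomer_offsets`, its parameters and the toy sequences are hypothetical and are not taken from the program described.

```python
from collections import Counter


def oligomer_offsets(seqs, marker, oligo, span):
    """Count occurrences of `oligo` by offset from the first occurrence of
    `marker`, looking at most `span` positions upstream and downstream."""
    counts = Counter()
    for s in seqs:
        anchor = s.find(marker)
        if anchor < 0:
            continue  # this sequence lacks the fixed-point; skip it
        first = max(0, anchor - span)
        last = min(len(s) - len(oligo), anchor + span)
        for i in range(first, last + 1):
            if s.startswith(oligo, i):
                counts[i - anchor] += 1  # negative offsets are upstream
    return counts


seqs = ["GGTATAATCCATGAA", "CCTATAATCCATGCC", "CCCCCC"]
print(dict(oligomer_offsets(seqs, "ATG", "TATAAT", 10)))  # prints {-8: 2}
```

The returned counter maps each relative position to a count, which is the quantity the abstract describes plotting as a function of position relative to the fixed-point.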

12.
Algorithms for identifying local molecular sequence features   Total citations: 1, self-citations: 0, citations by others: 1
Efficient algorithms are described for identifying local molecular sequence features including repeats, dyad symmetry pairings and aligned matches between sequences, while allowing for errors. Specific applications are given to the genomic sequences of the Epstein-Barr virus, Varicella-Zoster virus and the bacteriophages and T7. Received on October 6, 1987; accepted on December 13, 1987

13.
A flexible method to align large numbers of biological sequences   Total citations: 5, self-citations: 0, citations by others: 5
Summary: A method for the alignment of two or more biological sequences is described. The method is a direct extension of the method of Taylor (1987) incorporating a consensus sequence approach and allows considerable freedom in the control of the clustering of the sequences. At one extreme this is equivalent to the earlier method (Taylor 1987), whereas at the other, the clustering approaches the binary method of Feng and Doolittle (1987). Such freedom allows the program to be adapted to particular problems, which has the important advantage of resulting in considerable savings in computer time, allowing very large problems to be tackled. Besides a detailed analysis of the alignment of the cytochrome c superfamily, the clustering and alignment of the PIR sequence data bank (3500 sequences approx.) is described.

14.
Definition and identification of homology domains   Total citations: 3, self-citations: 0, citations by others: 3
A method is described for identifying and evaluating regions of significant similarity between two sequences. The notion of a ‘homology domain’ is employed which defines the boundaries of a region of sequence homology containing no insertions or deletions. The relative significance of different potential homology domains is evaluated using a non-linear similarity score related to the probability of finding the observed level of similarity in the region by chance. The sensitivity of the method is demonstrated by simulating the evolution of homology domains and applying the method to their detection. Several examples of the use of homology domain identification are given. Received on July 29, 1987; accepted on November 15, 1987

15.
16.
17.
A computer program is described, which constructs maps of restriction endonuclease cleavage sites in linear or circular DNA molecules, given the fragment lengths in single and double digestions with two enzymes. The algorithm is based upon a partition method and a very simple rule to chain fragments. The program is written in Prolog II. Received on July 28, 1987; accepted on December 31, 1987

18.
A new algorithm is proposed to determine the type-II restriction endonucleases' recognition site knowing the digested DNA sequence and fragment lengths in an actual case. The algorithm is implemented for the Commodore 64 microcomputer. Received on January 6, 1987; accepted on June 19, 1987

19.
We have tracked the early years of the evolution of the human immunodeficiency virus type 1 (HIV-1) epidemic in a rural district of central east Africa from the first documented introductions of subtypes A, D, and C to the present predominance of subtype C. The earliest subtype C sequences ever reported are described. Blood samples were collected on filter papers from 1981 to 1984 and from 1987 to 1989 from more than 44,000 individuals living in two areas of Karonga District, Malawi. These samples included HIV-1-positive samples from 200 people. In 1982 to 1984, HIV-1 subtypes A, C, and D were all present, though in small numbers. By 1987 to 1989, 152 (90%) of a total of 168 sequences were subtype C and AC, AD, and DC recombinants had emerged. Four of the subtype C sequences from 1983 to 1984 were closely related and were found at the base of a large cluster of low diversity that by the late 1980s accounted for 40% of C sequences. The other two early C sequences fell into a separate and more diverse cluster. Three other clusters containing sequences from the late 1980s were identified. Each cluster contained at least one sample from a person who had recently arrived in the district. From 18 HIV-1-positive spouse pairs, 12 very closely related pairs of sequences were identified. We conclude that there were multiple introductions of HIV-1 with limited spread, followed by explosive growth of a subtype C cluster, probably arising from a single introduction in or before 1983.

20.
Alignments of DNA and protein sequences containing frameshift errors   Total citations: 1, self-citations: 0, citations by others: 1
Molecular sequences, like all experimental data, are subject to error. Many current DNA sequencing protocols have very significant error rates and often generate artefactual insertions and deletions of bases (indels) which corrupt the translation of sequences and compromise the detection of protein homologies. The impact of these errors on the utility of molecular sequence data is dependent on the analytic technique used to interpret the data. In the presence of frameshift errors, standard algorithms using six-frame translation can miss important homologies because only subfragments of the correct translation are available in any given frame. We present a new algorithm which can detect and correct frameshift errors in DNA sequences during comparison of translated sequences with protein sequences in the databases. This algorithm can recognize homologous proteins sharing 30% identity even in the presence of a 7% frameshift error rate. Our algorithm uses dynamic programming, producing a guaranteed optimal alignment in the presence of frameshifts, and has a sensitivity equivalent to Smith-Waterman. The computational efficiency of the algorithm is O(nm) where n and m are the sizes of two sequences being compared. The algorithm does not rely on prior knowledge or heuristic rules and performs significantly better than any previously reported method.

