Similar articles
1.
This paper describes a generic algorithm for finding restriction sites within DNA sequences. The ‘genericity’ of the algorithm is made possible through the use of set theory. Basic elements of DNA sequences, i.e. nucleotides (bases), are represented in sets, and DNA sequences, whether specific, ambiguous or even protein-coding, are represented as sequences of those sets. The set intersection operation demonstrates its ability to perform pattern-matching correctly on various DNA sequences. The performance analysis showed that the degree of complexity of the pattern matching is reduced from exponential to linear. An example is given to show the actual and potential restriction sites, derived by the generic algorithm, in the DNA sequence template coding for a synthetic calmodulin. Received on October 2, 1990; accepted on December 18, 1990
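The set-based matching described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `find_sites` is hypothetical, the table uses the standard IUPAC ambiguity codes, and a position is assumed to match when the two base sets share at least one element.

```python
# Standard IUPAC nucleotide codes, each represented as a set of bases.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def find_sites(sequence, site):
    """Return 0-based start positions where `site` can match `sequence`.

    Hypothetical helper: a position matches when the base sets of the
    sequence symbol and the site symbol have a non-empty intersection.
    """
    seq_sets = [IUPAC[ch] for ch in sequence.upper()]
    site_sets = [IUPAC[ch] for ch in site.upper()]
    hits = []
    for start in range(len(seq_sets) - len(site_sets) + 1):
        if all(seq_sets[start + i] & site_sets[i] for i in range(len(site_sets))):
            hits.append(start)
    return hits
```

For example, `find_sites("AAGAATTCGG", "GAATTC")` reports the EcoRI recognition sequence at position 2, while an ambiguous template such as `"GANTTC"` yields a potential site at position 0 because `N` intersects every base set.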

2.
3.
An algorithm, ‘phylogenetic scanning’, is described for mapping gene conversion events where comparative DNA sequence data are available from different species. In this algorithm, sets of hypothetical phylogenetic trees are constructed that describe possible sequence relationships due to gene conversions in different species lineages; these trees are then evaluated by the principle of parsimony at intervals in the sequence alignment. When used to map gene conversion events that occurred between the pair of -globin genes of higher primates, the algorithm gives results nearly identical to those obtained using a tedious manual approach. Suggestions are also provided for adaptation of this procedure to the analysis of other recombination events. Received on July 3, 1990; accepted on November 8, 1990

4.
A novel interactive method for generating multiple protein sequence alignments is described. The program has no internal limit to the number or length of sequences it can handle and is designed for use with DEC VAX processors running the VMS operating system. The approach used is essentially one of manual sequence manipulation, aided by built-in symbolic displays of identities and similarities, and strict and ‘fuzzy’ (ambiguous) pattern-matching facilities. Additional flexibility is provided by means of an interface to a publicly available automatic alignment system and to a comprehensive sequence analysis package. Received on August 28, 1990; accepted on November 20, 1990

5.
A package for the creation and processing of multiple sequence alignment is described. There is no limit on the lengths of the processed nucleotide or amino acid sequences, and the number of sequences in the alignment is also unlimited. The main groups of functions are: a semi-automatic alignment editor; a wide set of functions for technical processing of alignments; nucleotide alignment mapping and translation; and similarity search functions. A user-friendly interface and a set of generally used file actions provide a special operational subsystem for everyday tasks.

6.
An improved sequence handling package that runs on the Apple Macintosh   Cited by: 4 (self-citations: 0, by others: 4)
We report improvements to our sequence analysis package and adaptation to run on the Apple Macintosh range of machines. The ‘standard’ version of the programs, which run on a VAX, has been given a new user interface that makes the programs very much easier to work with and has facilitated the move to the Macintosh. The reorganization of the code should simplify moves to other systems that offer WIMP user interfaces. In addition to a large number of small but useful extra features, some important new analytical functions have been devised. These include sequence and contig editors; optimal alignment and comparison methods; and a new method for comparing the observed and expected frequencies of selected oligonucleotides. Received on February 12, 1990; accepted on April 19, 1990

7.
In hidden Markov models, the probability of observing a set of strings can be computed using recursion relations. We construct a sufficient condition for simplifying the recursion relations for a certain class of hidden Markov models. If the condition is satisfied, then one can construct a reduced recursion where the dependence on Markov states completely disappears. We discuss a specific example, namely statistical multiple alignment based on the TKF model, in which the sufficient condition is satisfied.
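As background for the recursion relations mentioned in this abstract, the sketch below shows the standard forward recursion for the probability of a single observed string under a hidden Markov model. It is the generic textbook recursion, not the reduced recursion constructed in the paper; the function name and the dictionary-based model layout are assumptions made for illustration, and the observation string is assumed to be non-empty.

```python
def forward_probability(obs, states, start_p, trans_p, emit_p):
    """Probability of the observation string `obs` under an HMM.

    Hypothetical layout: start_p[s], trans_p[r][s] and emit_p[s][symbol]
    are plain dictionaries of probabilities. `obs` must be non-empty.
    """
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())
```

Each step still sums over every state, which is the dependence on Markov states that the abstract says disappears when its sufficient condition holds.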

8.

Background  

The performance of alignment programs is traditionally tested on sets of protein sequences for which a reference alignment is known. Conclusions drawn from such protein benchmarks do not necessarily hold for the RNA alignment problem, as was demonstrated in the first RNA alignment benchmark published so far. For example, the twilight zone (the similarity range where alignment quality drops drastically) starts at 60% for RNAs in comparison to 20% for proteins. In this study we enhance the previous benchmark.

9.
Optimal sequence alignment allowing for long gaps   Cited by: 7 (self-citations: 0, by others: 7)
A new algorithm for optimal sequence alignment allowing for long insertions and deletions is developed. The algorithm requires O((L+C)MN) computational steps, O(LN) primary memory and O(MN) secondary memory storage, where M and N (M ≥ N) are sequence lengths, L (typically L ≤ 3) is the number of segments specifying the gap weighting function, and C is a constant. We have also modified our earlier traceback algorithm so that it finds all and only the optimal alignments in a compact form of a directed graph. The current versions accept a set of aligned sequences as input, which facilitates multiple sequence alignment by some iterative procedures. Dedicated to Professor Akiyoshi Wada on the occasion of his 60th birthday.
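To make the role of the gap weighting function concrete, here is a minimal sketch of the simplest case, a single linear segment (L = 1), using the well-known three-matrix affine-gap recursion. It is not the algorithm of the paper: the scoring values, the function name `affine_gap_score` and the full O(MN) in-memory tables are assumptions for illustration, and the paper's memory-saving scheme and traceback graph are not reproduced.

```python
NEG_INF = float("-inf")


def affine_gap_score(a, b, match=1, mismatch=-1, gap_open=2, gap_extend=1):
    """Optimal global alignment score when a gap of length k costs
    gap_open + gap_extend * k (one linear segment, i.e. L = 1)."""
    m, n = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] opposite a gap; Y: b[j-1] opposite a gap.
    M = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    X = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    Y = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0
    for i in range(1, m + 1):
        X[i][0] = -gap_open - gap_extend * i
    for j in range(1, n + 1):
        Y[0][j] = -gap_open - gap_extend * j
    start = gap_open + gap_extend  # cost of the first position of a new gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - start, X[i - 1][j] - gap_extend, Y[i - 1][j] - start)
            Y[i][j] = max(M[i][j - 1] - start, Y[i][j - 1] - gap_extend, X[i][j - 1] - start)
    return max(M[m][n], X[m][n], Y[m][n])
```

With these default scores, aligning `"ACGT"` against `"AT"` prefers one gap of length two over two separate gaps, which is the behaviour a gap weighting function that favours long gaps is meant to produce.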

10.
MOTIVATION: The quality of a model structure derived from a comparative modeling procedure is dictated by the accuracy of the predicted sequence-template alignment. As the sequence-template pairs become increasingly remote in sequence relationship, the prediction of the sequence-template alignments becomes increasingly problematic with sequence alignment methods. Structural information of the template, used in connection with the sequence relationship of the sequence-template pair, could significantly improve the accuracy of the sequence-template alignment. In this paper, we describe a sequence-template alignment method that integrates sequence and structural information to enhance the accuracy of sequence-template alignments for distantly related protein pairs. RESULTS: The structure-dependent sequence alignment (SDSA) procedure was optimized for coverage and accuracy on a training set of 412 protein pairs; the structures for each of the training pairs are similar (RMSD < approximately 4 Å) but the sequence relationship is undetectable (average pair-wise sequence identity = 8%). The optimized SDSA procedure was then applied to extend PSI-BLAST local alignments by calculating the global alignments under the constraint of the residue pairs in the local alignments. This composite alignment procedure was assessed with a testing set of 1421 protein pairs, of which the pair-wise structures are similar (RMSD < approximately 4 Å) but the sequences are marginally related at best in each pair (average pair-wise sequence identity = 13%). The assessment showed that the composite alignment procedure predicted more aligned residue pairs, with an average increase of 27% in correctly aligned residues over the standard PSI-BLAST alignments for the protein pairs in the testing set.

11.
The programs described herein function as part of a suite of programs designed for pairwise alignment, multiple alignment, generation of randomized sequences, production of alignment scores and a sorting routine for analysis of the alignments produced. The sequence alignment programs penalize gaps (absences of residues) within regions of protein secondary structure and have the added option of ‘fingerprinting’ structurally or functionally important protein residues. The multiple alignment program is based upon the sequence alignment method of Needleman and Wunsch and the multiple alignment extension of Barton and Sternberg. Our application includes the feature of optionally weighting active site, monomer-monomer, ligand contact or other important template residues to bias the alignment toward matching these residues. A sum-score for the alignments is introduced, which is independent of gap penalties. This score more adequately reflects the character of the alignments for a given scoring matrix than the gap-penalty-dependent total score described previously in the literature. In addition, individual amino acid similarity scores at each residue position in the alignments are printed with the alignment output to enable immediate quantitative assessment of homology at key sections of the aligned chains.

12.
Sequence alignment has been an invaluable tool for finding homologous sequences. The significance of the homology found is often quantified statistically by p-values. Theory for computing p-values exists for gapless alignments [Karlin, S., Altschul, S.F., 1990. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA 87, 2264–2268; Karlin, S., Dembo, A., 1992. Limit distributions of maximal segmental score among Markov-dependent partial sums. Adv. Appl. Probab. 24, 13–140], but a full generalization to alignments with gaps is not yet complete. We present a unified statistical analysis of two common sequence comparison algorithms: maximum-score (Smith-Waterman) alignments and their generalized probabilistic counterparts, including maximum-likelihood alignments and hidden Markov models. The most important statistical characteristic of these algorithms is the distribution function of the maximum score S_max, or respectively the maximum free energy F_max, for mutually uncorrelated random sequences. This distribution is known empirically to be of the Gumbel form with an exponential tail P(S_max > x) ∼ exp(−λx) for maximum-score alignment and P(F_max > x) ∼ exp(−λx) for some classes of probabilistic alignment. We derive an exact expression for λ for particular probabilistic alignments. This result is then used to obtain accurate λ values for generic probabilistic and maximum-score alignments. Although the result demonstrated uses a simple match-mismatch scoring system, it is expected to be a good starting point for more general scoring functions.

13.
Efficient methods for multiple sequence alignment with guaranteed error bounds   Cited by: 11 (self-citations: 0, by others: 11)
Multiple string (sequence) alignment is a difficult and important problem in computational biology, where it is central in two related tasks: finding highly conserved subregions or embedded patterns of a set of biological sequences (strings of DNA, RNA or amino acids), and inferring the evolutionary history of a set of taxa from their associated biological sequences. Several precise measures have been proposed for evaluating the goodness of a multiple alignment, but no efficient methods are known which compute the optimal alignment for any of these measures in any but small cases. In this paper, we consider two previously proposed measures, and give two computationally efficient multiple alignment methods (one for each measure) whose deviation from the optimal value is guaranteed to be less than a factor of two. This is the novel feature of these methods, but the methods have additional virtues as well. For both methods, the guaranteed bounds are much smaller than two when the number of strings is small (1.33 for three strings of any length); for one of the methods we give a related randomized method which is much faster and which gives, with high probability, multiple alignments with fairly small error bounds; and for the other measure, the method given yields a non-obvious lower bound on the value of the optimal alignment.
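A common way to obtain a factor-of-two guarantee of this kind is a "center star" construction, in which one input string is chosen as the center and every other string is aligned to it. The sketch below covers only the center-selection step and rests on assumptions: the abstract does not spell out its methods, unit-cost edit distance stands in for the pairwise alignment cost, and both function names are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost edit distance computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # match or substitute
        prev = curr
    return prev[-1]


def choose_center(strings):
    """Return (index, total) for the string with the smallest summed
    distance to all the others; ties go to the earliest string."""
    totals = [sum(edit_distance(s, t) for t in strings) for s in strings]
    best = min(range(len(strings)), key=totals.__getitem__)
    return best, totals[best]
```

For `["ACGT", "ACGA", "TCGT"]` the first string is selected, since its distances to the other two sum to 2 while the others sum to 3. Merging the pairwise alignments against the center into one multiple alignment is a separate step not shown here.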

14.

Background  

In this paper, we introduce a progressive corner-cutting method called Reticular Alignment for multiple sequence alignment. Unlike previous corner-cutting methods, our approach does not define a compact part of the dynamic programming table. Instead, it defines a set of optimal and suboptimal alignments at each step during the progressive alignment. The set of alignments is represented as a network so that the alignments can be stored and used efficiently during the progressive alignment. The program contains a threshold parameter on which the size of the network depends. The larger the threshold parameter, and thus the network, the deeper the search in the alignment space for better-scoring alignments.

15.
In previous work, we have shown that a set of characteristics, defined as (code frequency) pairs, can be derived from a protein family by the use of a signal-processing method. This method enables the location and extraction of sequence patterns by taking into account each (code frequency) pair individually. In the present paper, we propose to extend this method in order to detect and visualize patterns by taking into account several pairs simultaneously. Two ‘multifrequency’ methods are described. The first one is based on a rewriting of the sequences with new symbols which summarize the frequency information. The second method is based on a clustering of the patterns associated with each pair. Both methods lead to the definition of significant consensus sequences. Some results obtained with calcium-binding proteins and serine proteases are also discussed. Received on March 6, 1990; accepted on September 24, 1990

16.
When two sequences are aligned with a single set of alignment parameters, or when mutation parameters are estimated on the basis of a single ‘optimal’ sequence alignment, the variability of both the alignment and the estimated parameters can be seriously underestimated. To obtain a more realistic impression of the actual uncertainty, we propose sampling sequence alignments and mutation parameters simultaneously from their joint posterior distribution given the two original sequences. We illustrate our method with human and orangutan sequences from the hypervariable region I and with gene–pseudogene pairs. Received: 16 November 2000 / Accepted: 15 May 2001

17.
Pairwise optimal alignments between three or more sequences are not necessarily consistent as a whole, but consistent and inconsistent residues are usually distributed in clusters. An efficient method has been developed for locating consistent regions when each pairwise alignment is given in the form of a “skeletal representation” (Bull. math. Biol. 52, 359–373). This method is further extended so that the combination of pairwise alignments that gives the greatest consistency is found when possibly many alignments are equally optimal for each pairwise comparison. A method for acceleration of simultaneous multiple sequence alignment is proposed in which consistent regions serve as “anchor points”, limiting application of direct multi-way alignment to the remaining “inconsistent” regions. Dedicated to Prof. Akiyoshi Wada on the occasion of his 60th birthday.

18.
Summary: Various measures of sequence dissimilarity have been evaluated by how well the additive least squares estimation of edges (branch lengths) of an unrooted evolutionary tree fit the observed pairwise dissimilarity measures and by how consistent the trees are for different data sets derived from the same set of sequences. This evaluation provided sensitive discrimination among dissimilarity measures and among possible trees. Dissimilarity measures not requiring prior sequence alignment did about as well as did the traditional mismatch counts requiring prior sequence alignment. Application of Jukes-Cantor correction to singlet mismatch counts worsened the results. Measures not requiring alignment had the advantage of being applicable to sequences too different to be critically alignable. Two different measures of pairwise dissimilarity not requiring alignment have been used: (1) multiplet distribution distance (MDD), the square of the Euclidean distance between vectors of the fractions of base singlets (or doublets, or triplets, or…) in the respective sequences, and (2) complements of long words (CLW), the count of bases not occurring in significantly long common words. MDD was applicable to sequences more different than was CLW (noncoding), but the latter often gave better results where both measures were available (coding). MDD results were improved by using longer multiplets and, if the sequences were coding, by using the larger amino acid and codon alphabets rather than the nucleotide alphabet. The additive least squares method could be used to provide a reasonable consensus of different trees for the same set of species (or related genes).
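The multiplet distribution distance defined in this abstract translates directly into a few lines of code. The sketch below follows the stated definition (squared Euclidean distance between vectors of multiplet fractions); counting overlapping multiplets and the function names are assumptions, since the abstract does not say how multiplets are enumerated.

```python
from collections import Counter


def multiplet_fractions(seq, k):
    """Fractions of each length-k word in `seq`, counting overlapping words
    (the overlap convention is an assumption)."""
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {word: n / total for word, n in counts.items()}


def mdd(seq_a, seq_b, k=1):
    """Multiplet distribution distance: squared Euclidean distance between
    the two vectors of k-multiplet fractions."""
    fa = multiplet_fractions(seq_a, k)
    fb = multiplet_fractions(seq_b, k)
    return sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in set(fa) | set(fb))
```

With `k=1` the measure compares base composition only; larger `k` compares doublet or triplet usage, in line with the abstract's remark that longer multiplets improved the results. No alignment of the two sequences is needed at any point.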

19.
The technique of single-particle electron cryomicroscopy is currently making possible the 3D structure determination of large macromolecular complexes at constantly increasing levels of resolution. Work at the resolutions now attainable requires many thousands of individual images to be processed computationally. The most time-consuming step of the image-processing procedure is usually the iterative alignment of individual particle images against a set of reference images derived from a preliminary 3D structure. We have developed an improved multireference alignment procedure based on interpolated cross-correlation images (corrims) that results in an approximately 8-fold acceleration of the iterative alignment steps. These corrims can be used to restrict the number of image-alignment calculations by narrowing down the set of reference images. Another improvement in alignment speed has been achieved by optimising the software and its implementation on many parallel processors. This new corrim-based refinement has been found to work well with two different alignment algorithms, the commonly used "fast alignment by separate translational/rotational searches" and "exhaustive alignment by polar coordinates."

20.
Automated generation of heuristics for biological sequence comparison   Cited by: 1 (self-citations: 0, by others: 1)

Background  

Exhaustive methods of sequence alignment are accurate but slow, whereas heuristic approaches run quickly, but their complexity makes them more difficult to implement. We introduce bounded sparse dynamic programming (BSDP) to allow rapid approximation to exhaustive alignment. This is used within a framework whereby the alignment algorithms are described in terms of their underlying model, to allow automated development of efficient heuristic implementations which may be applied to a general set of sequence comparison problems.
