Similar articles
Found 20 similar articles (search time: 0 ms)
1.
The algorithm of Waterman et al. (1976) for matching biological sequences was modified, under some limitations, to run in essentially $MN$ steps instead of the $M^2N$ steps required by the original algorithm. The limitations do not seriously reduce the generality of the original method, and the present method is suitable for most practical uses. The algorithm can be executed on a small computer with a limited capacity of core memory.
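
A minimal sketch of how an alignment score can be computed with on the order of MN cell updates, assuming that the "limitation" is an affine gap cost (a gap of length k scores `gap_open + (k - 1) * gap_extend`). That assumption, the function name `affine_gap_score`, and all scoring values are illustrative and are not taken from the abstract.

```python
def affine_gap_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Best global alignment score of a and b under an affine gap cost.

    Three tables are filled, each cell in constant time, so the work is
    proportional to len(a) * len(b). All scoring values are assumptions.
    """
    neg = float("-inf")
    m, n = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap;
    # Y: b[j-1] aligned to a gap.
    M = [[neg] * (n + 1) for _ in range(m + 1)]
    X = [[neg] * (n + 1) for _ in range(m + 1)]
    Y = [[neg] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0
    for i in range(1, m + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, n + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Opening a gap costs gap_open; extending an existing one costs gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[m][n], X[m][n], Y[m][n])
```

For example, `affine_gap_score("ACGT", "AGT")` scores three matches and one opened gap. If only the score is needed, each row depends on the previous row alone, so memory could be reduced to a few rows.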

2.

Background  

We present a complete re-implementation of the segment-based approach to multiple protein alignment that contains a number of improvements compared to the previous version 2.2 of DIALIGN. This previous version is superior to Needleman-Wunsch-based multi-alignment programs on locally related sequence sets. However, it is often outperformed by these methods on data sets with global but weak similarity at the primary-sequence level.

3.
The principle of heterotachy states that the substitution rate of sites in a gene can change through time. In this article, we propose a powerful statistical test to detect sites that evolve according to the process of heterotachy. We apply this test to an alignment of 1289 eukaryotic rRNA molecules to 1) determine how widespread the phenomenon of heterotachy is in ribosomal RNA, 2) test whether these heterotachous sites are nonrandomly distributed, that is, linked to secondary structure features of ribosomal RNA, and 3) determine the impact of heterotachous sites on the bootstrap support of monophyletic groupings. Our study revealed that with 21 monophyletic taxa, approximately two-thirds of the sites in the considered set of sequences are heterotachous. Although the detected heterotachous sites do not appear bound to specific structural features of the small subunit rRNA, their presence is shown to have a large beneficial influence on the bootstrap support of monophyletic groups. Using extensive testing, we show that this may not be due to heterotachy itself but merely due to the increased substitution rate at the detected heterotachous sites.

4.
We present an efficient algorithm for statistical multiple alignment based on the TKF91 model of Thorne, Kishino, and Felsenstein (1991) on an arbitrary $k$-leaved phylogenetic tree. The existing algorithms use a hidden Markov model approach, which requires at least $O(\sqrt{5^k})$ states and leads to a time complexity of $O(5^k L^k)$, where $L$ is the geometric mean sequence length. Using a combinatorial technique reminiscent of inclusion/exclusion, we are able to sum away the states, thus improving the time complexity to $O(2^k L^k)$ and considerably reducing memory requirements. This makes statistical multiple alignment under the TKF91 model a definite practical possibility in the case of a phylogenetic tree with a modest number of leaves.
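
As a rough worked comparison of the two time bounds quoted in this abstract, and assuming (this is not stated in the source) that the constants hidden by the O-notation are comparable, the ratio depends only on the number of leaves $k$ and not on the sequence length $L$:

$$\frac{5^k L^k}{2^k L^k} = \left(\frac{5}{2}\right)^k, \qquad \left(\frac{5}{2}\right)^4 \approx 39, \qquad \left(\frac{5}{2}\right)^8 \approx 1526.$$

Both bounds remain exponential in $k$ through the factor $L^k$, which is consistent with the abstract's restriction to trees with a modest number of leaves.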

5.
Sequence alignment is a standard method to infer evolutionary, structural, and functional relationships among sequences. The quality of alignments depends on the substitution matrix used. Here we derive matrices based on superimpositions from protein pairs of similar structure, but of low or no sequence similarity. In a performance test the matrices are compared with 12 other previously published matrices. It is found that the structure-derived matrices are applicable for comparisons of distantly related sequences. We investigate the influence of evolutionary relationships of protein pairs on the alignment accuracy.

6.
We introduce a new approach to the problem of DNA sequence alignment. The method consists of three parts: (i) a simple alignment algorithm, (ii) an extension algorithm for the largest common substring, and (iii) a graphical simple alignment tree (GSA tree). The approach first obtains a graphical representation of the scores of DNA sequences by the scoring equation R0*RS0*ST0*(a+bk). Then a GSA tree is constructed to facilitate solving the problem of global alignment of two DNA sequences. Finally, we give several practical examples to illustrate the utility and practicality of the approach.

7.
MOTIVATION: Multiple alignment of highly divergent sequences is a challenging problem for which available programs tend to show poor performance. Generally, this is due to a scoring function that does not describe biological reality accurately enough or a heuristic that cannot explore solution space efficiently enough. In this respect, we present a new program, Align-m, that uses a non-progressive local approach to guide a global alignment. RESULTS: Two large test sets were used that represent the entire SCOP classification and cover sequence similarities between 0 and 50% identity. Performance was compared with the publicly available algorithms ClustalW, T-Coffee and DiAlign. In general, Align-m has comparable or slightly higher accuracy in terms of correctly aligned residues, especially for distantly related sequences. Importantly, it aligns far fewer residues incorrectly, with average differences of over 15% compared with some of the other algorithms. AVAILABILITY: Align-m and the test sets are available at http://bioinformatics.vub.ac.be

8.
9.
SUMMARY: Improving and ascertaining the quality of a multiple sequence alignment is a very challenging step in protein sequence analysis. This is particularly the case when dealing with sequences in the 'twilight zone', i.e. sharing < 30% identity. Here we describe INTERALIGN, a dedicated user-friendly alignment editor including a view of secondary structures and a synchronized display of carbon alpha traces of corresponding protein structures. Profile alignment, using CLUSTALW, is implemented to improve the alignment of a sequence of unknown structure with the visually optimized structural alignment as compared with a standard multiple sequence alignment. Tree-based ordering further helps in identifying the structure closest to a given sequence.

10.
An algorithm for linear metabolic pathway alignment   (cited by 1; self-citations: 0; citations by others: 1)
Metabolic pathway alignment represents one of the most powerful tools for comparative analysis of metabolism. It involves recognition of metabolites common to a set of functionally-related metabolic pathways, interpretation of biological evolution processes and determination of alternative metabolic pathways. Moreover, it is of assistance in function prediction and metabolism modeling. Although research on genomic sequence alignment is extensive, the problem of aligning metabolic pathways has received less attention. We are motivated to develop an algorithm for metabolic pathway alignment to reveal the similarities between metabolic pathways. A new definition of the metabolic pathway is introduced. The algorithm has been implemented in the PathAligner system; its web-based interface is available at http://bibiserv.techfak.uni-bielefeld.de/pathaligner/.

11.
Weights for data related by a tree   (cited by 8; self-citations: 0; citations by others: 8)
How can one characterize a set of data collected from different biological species, or indeed any set of data related by an evolutionary tree? The structure imposed by the tree implies that the data are not independent, and for most applications this should be taken into account. We describe strategies for weighting the data that circumvent some of the problems of dependency.

12.
Summary: A phylogenetic tree was constructed from 245 globin amino acid sequences. Of the six plant globins, five represented the Leguminosae and one the Ulmaceae. Among the invertebrate sequences, 7 represented the phylum Annelida, 13 represented Insecta and Crustacea of the phylum Arthropoda, and 6 represented the phylum Mollusca. Of the vertebrate globins, 4 represented the Agnatha and 209 represented the Gnathostomata. A common alignment was achieved for the 245 sequences using the parsimony principle, and a matrix of minimum mutational distances was constructed. The most parsimonious phylogenetic tree, i.e., the one having the lowest number of nucleotide substitutions that cause amino acid replacements, was obtained employing clustering and branch-swapping algorithms. Based on the available fossil record, the earliest split in the ancestral metazoan lineage was placed at 680 million years before present (Myr BP), the origin of vertebrates was placed at 510 Myr BP, and the separation of the Chondrichthyes and the Osteichthyes was placed at 425 Myr BP. Local molecular clock calculations were used to date the branch points on the descending branches of the various lineages within the plant and invertebrate portions of the tree. The tree divided the 245 sequences into five distinct clades that corresponded exactly to the five groups: plants, annelids, arthropods, molluscs, and vertebrates. Furthermore, the maximum parsimony tree, in contrast to the unweighted pair group and distance Wagner trees, was consistent with the available fossil record and supported the hypotheses that the primitive hemoglobin of metazoans was monomeric and that the multisubunit extracellular hemoglobins found among the Annelida and the Arthropoda represent independently derived states.

13.
An approximate nested tandem repeat (NTR) in a string T is a complex repetitive structure consisting of many approximate copies of two substrings x and X ("motifs") interspersed with one another. NTRs fall into a class of repetitive structures broadly known as subrepeats. NTRs have been found in real DNA sequences and are expected to be important in evolutionary biology, both in understanding evolution of the ribosomal DNA (where NTRs can occur), and as a potential marker in population genetic and phylogenetic studies. This article describes an alignment algorithm for the verification phase of the software tool NTRFinder developed for database searches for NTRs. When the search algorithm has located a subsequence containing a possible NTR, with motifs X and x, a verification step aligns this subsequence against an exact NTR built from the templates X and x, to determine whether the subsequence contains an approximate NTR and its extent. This article describes an algorithm to solve this alignment problem in O(|T|(|X| + |x|)) space and time. The algorithm is based on Fischetti et al.'s wrap-around dynamic programming.
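
The wrap-around idea named at the end of this abstract can be sketched for the simpler case of a single motif. This is an illustration only: it aligns a string against whole tandem copies of one motif and does not handle the nested two-motif (X and x) structure that the article addresses. The function name `wraparound_distance` and the unit edit costs are assumptions.

```python
def wraparound_distance(text, motif, indel=1, mismatch=1):
    """Edit distance between text and the closest string motif * c, c >= 0.

    State j means "j characters of the current motif copy are consumed".
    Column j - 1 wraps from 0 back to len(motif) - 1, which is why every
    row is swept twice. Costs are illustrative assumptions.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    p = len(motif)
    # Row 0: empty text against the first j motif characters.
    prev = [j * indel for j in range(p)]
    for ch in text:
        cur = [0] * p
        # First sweep: all moves except the in-row move across the wrap.
        for j in range(p):
            k = (j - 1) % p  # state before consuming motif[k]
            sub = 0 if ch == motif[k] else mismatch
            best = min(prev[j] + indel,   # text character left unmatched
                       prev[k] + sub)     # text character aligned to motif[k]
            if j > 0:
                best = min(best, cur[j - 1] + indel)  # motif character skipped
            cur[j] = best
        # Second sweep: let skipped motif characters cross the wrap point.
        for j in range(p):
            cur[j] = min(cur[j], cur[(j - 1) % p] + indel)
        prev = cur
    return prev[0]  # state 0: a whole number of motif copies was used
```

For instance, `wraparound_distance("ACGAGACG", "ACG")` finds that one edit (a missing `C` in the middle copy) separates the string from an exact tandem repeat. The sketch keeps two rows, so it uses time proportional to `len(text) * len(motif)` and memory proportional to `len(motif)`; recovering the alignment itself would require storing the full table.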

14.
An evolutionary model for maximum likelihood alignment of DNA sequences   (cited by 16; self-citations: 0; citations by others: 16)
Summary: Most algorithms for the alignment of biological sequences are not derived from an evolutionary model. Consequently, these alignment algorithms lack a strong statistical basis. A maximum likelihood method for the alignment of two DNA sequences is presented. This method is based upon a statistical model of DNA sequence evolution for which we have obtained explicit transition probabilities. The evolutionary model can also be used as the basis of procedures that estimate the evolutionary parameters relevant to a pair of unaligned DNA sequences. A parameter-estimation approach which takes into account all possible alignments between two sequences is introduced; the danger of estimating evolutionary parameters from a single alignment is discussed.

15.
Multiple sequence alignment by a pairwise algorithm   (cited by 1; self-citations: 0; citations by others: 1)
An algorithm is described that processes the results of a conventional pairwise sequence alignment program to automatically produce an unambiguous multiple alignment of many sequences. Unlike other, more complex, multiple alignment programs, the method described here is fast enough to be used on almost any multiple sequence alignment problem. Received on September 25, 1986; accepted on January 29, 1987

16.
Han Si, Lee SG, Kim KH, Choi CJ, Kim YH, Hwang KS. Bio Systems 2006, 84(3):175-182
Most multiple gene sequence alignment methods rely on conventions regarding the score of a multiple alignment in pairwise fashion. Therefore, as the number of sequences increases, the runtime of sequencing expands exponentially. In order to solve the problem, this paper presents a multiple sequence alignment method using a linear-time suffix tree algorithm to cluster similar sequences at one time without pairwise alignment. After searching for common subsequences, cross-matching common subsequences were generated, and sometimes inexact matching was found. So, a procedure aimed at masking the inexact cross-matching pairs was suggested here. In addition, BLAST was combined with a clustering tool in order to annotate the clusters generated by suffix tree clustering. The proposed method for clustering and annotating genes consists of the following steps: (1) construction of a suffix tree; (2) searching and overlapping common subsequences; (3) grouping subsequence pairs; (4) masking cross-matching pairs; (5) clustering gene sequences; (6) annotating gene clusters by the BLAST search. The performance of the proposed system, CLAGen, was successfully evaluated with 42 gene sequences in the TCA cycle (citrate cycle) of bacteria. The system generated 11 clusters and found the longest subsequences of each cluster, which are biologically significant.

17.
A method for generating protein backbone models from backbone-only NMR data is presented, which is based on molecular fragment replacement (MFR). In a first step, the PDB database is mined for homologous peptide fragments using experimental backbone-only data, i.e., backbone chemical shifts (CS) and residual dipolar couplings (RDC). Second, this fragment library is refined against the experimental restraints. Finally, the fragments are assembled into a protein backbone fold by a rigid-body docking algorithm that uses the RDCs as restraints. For improved performance, backbone nuclear Overhauser effects (NOEs) may be included at that stage. Compared to previous implementations of MFR-derived structure determination protocols, this model-building algorithm offers improved stability and reliability. Furthermore, relative to CS-ROSETTA based methods, it provides faster performance and straightforward implementation with the option to easily include further types of restraints and additional energy terms.

18.
An algorithm for selection of functional siRNA sequences   (cited by 33; self-citations: 0; citations by others: 33)
Randomly designed siRNA targeting different positions within the same mRNA display widely differing activities. We have performed a statistical analysis of 46 siRNA, identifying various features of the 19 bp duplex that correlate significantly with functionality at the 70% knockdown level, and verified these results against an independent data set of 34 siRNA recently reported by others. Features that consistently correlated positively with functionality across the two data sets included an asymmetry in the stability of the duplex ends (measured as the A/U differential of the three terminal basepairs at either end of the duplex) and the motifs S1, A6, and W19. The presence of the motifs U1 or G19 was associated with lack of functionality. A selection algorithm based on these findings strongly differentiated between the two functional groups of siRNA in both data sets and proved highly effective when used to design siRNA targeting new endogenous human genes.
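
The features listed in this abstract can be combined into a toy score as follows. This is a hedged sketch, not the published selection algorithm: it assumes that positions are numbered 1 to 19 from the 5' end of the sense strand, that S means G or C and W means A or U (standard IUPAC codes), and that every feature carries an equal weight of one. The function name `sirna_feature_score` is hypothetical.

```python
def sirna_feature_score(sense):
    """Toy feature score for a 19-nt siRNA sense strand (higher is better).

    Position numbering, strand choice, and unit weights are assumptions
    made for illustration; they are not taken from the abstract.
    """
    s = sense.upper().replace("T", "U")
    if len(s) != 19 or set(s) - set("ACGU"):
        raise ValueError("expected a 19-nt sequence over A, C, G, U/T")

    def au_count(segment):
        return sum(base in "AU" for base in segment)

    # A/U differential between the three terminal positions at either end.
    score = au_count(s[16:19]) - au_count(s[0:3])
    score += s[0] in "GC"    # motif S1
    score += s[5] == "A"     # motif A6
    score += s[18] in "AU"   # motif W19
    score -= s[0] == "U"     # motif U1, associated with lack of function
    score -= s[18] == "G"    # motif G19, associated with lack of function
    return score
```

Under these assumptions a candidate such as `"GCCGGAGGGGGGGGGGAUA"` collects every positive feature, while `"UAAGGCGGGGGGGGGGGCG"` collects every negative one. Candidates could then be ranked by score, with any cutoff chosen empirically.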

19.
An algorithm was developed to compare simultaneously several DNA, RNA or protein sequences. With the algorithm, conserved regions of one sequence are located by doing pairwise comparisons with other sequences, which is advantageous in planning site-directed mutagenesis studies. The observation matrices filled with scores of comparisons are superimposed and added together and those points having values greater than or equal to stringency are accepted. The predicted secondary structural features can also be compared. Received on August 21, 1987; accepted on November 20, 1987

20.
MOTIVATION: This paper is concerned with algorithms for aligning two whole genomes so as to identify regions that possibly contain conserved genes. Motivated by existing heuristic-based software tools, we initiate the study of an optimization problem that attempts to uncover conserved genes with a global concern. Another interesting feature in our formulation is the tolerance of noise, which also complicates the optimization problem. A brute-force approach takes time exponential in the noise level. RESULTS: We show how an insight into the optimization structure can lead to a drastic improvement in the time and space requirement [precisely, to $O(k^2n^2)$ and $O(k^2n)$, respectively, where $n$ is the size of the input and $k$ is the noise level]. The reduced space requirement allows us to implement the new algorithm, called MaxMinCluster, on a PC. It is exciting to see that when tested with different real data sets, MaxMinCluster consistently uncovers a high percentage of conserved genes that have been published by GenBank. Its performance indeed compares favorably with that of MUMmer (perhaps the most popular software tool for uncovering conserved genes on a whole-genome scale). AVAILABILITY: The source code is available from the website http://www.csis.hku.hk/~colly/maxmincluster/; detailed proofs of the propositions can also be found there.
