Found 20 similar articles; search time: 0 ms
1.
Ribonucleic acid (RNA) structures can be viewed as a special kind of string in which characters can bond with each other. The problem of aligning two RNA structures has been studied for some time, and there are several successful algorithms based on different models. In this paper, adopting the model introduced by Wang and Zhang (19), we propose two algorithms for aligning multiple RNA structures. Our methods reduce the multiple RNA structure alignment problem to the problem of aligning two RNA structure alignments. We also show that the framework of the center star algorithm for sequence alignment can be applied to multiple RNA structure alignment and that, if the scoring matrix satisfies the triangle inequality, the approximation ratio of the algorithm remains 2 - 2/n, where n is the total number of structures.
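The center star idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: plain strings stand in for RNA structures, and a unit-cost edit distance (an assumption chosen because it satisfies the triangle inequality) stands in for the structure alignment score. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance; a metric, so the triangle inequality holds."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def choose_center(seqs):
    """Return (index, total) for the input with the smallest summed distance to all others."""
    totals = [sum(edit_distance(s, t) for t in seqs) for s in seqs]
    best = min(range(len(seqs)), key=totals.__getitem__)
    return best, totals[best]


def approximation_ratio(n: int) -> float:
    """The 2 - 2/n bound quoted in the abstract, for n input structures."""
    return 2 - 2 / n


if __name__ == "__main__":
    toy = ["ACGU", "ACGA", "ACGU", "GCGU"]  # hypothetical toy inputs
    print(choose_center(toy), approximation_ratio(len(toy)))
```

In a full center star procedure, every other input would then be aligned pairwise against the chosen center and the pairwise alignments merged; that merging step is omitted here.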
2.
Multiple flexible structure alignment using partial order graphs  Total citations: 2, self-citations: 0, citations by others: 2
MOTIVATION: Existing comparisons of protein structures are not able to describe structural divergence and flexibility in the structures being compared because they focus on identifying a common invariant core and ignore parts of the structures outside this core. Understanding the structural divergence and flexibility is critical for studying the evolution of functions and specificities of proteins. RESULTS: A new method of multiple protein structure alignment, POSA (Partial Order Structure Alignment), was developed using a partial order graph representation of multiple alignments. POSA has two unique features: (1) it identifies and classifies regions that are conserved only in a subset of input structures and (2) it allows internal rearrangements in protein structures. POSA outperforms other programs in the cases where structural flexibilities exist and provides new insights by visualizing the mosaic nature of multiple structural alignments. POSA is an ideal tool for studying the variation of protein structures within diverse structural families. AVAILABILITY: POSA is freely available for academic users on a Web server at http://fatcat.burnham.org/POSA
3.
Background
An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus (pseudo) protein which captures common substructures present in the given proteins. The algorithm represents each protein as a sequence of triples of coordinates of the alpha-carbon atoms along the backbone. It then computes iteratively a sequence of transformation matrices (i.e., translations and rotations) to align the proteins in space and generate the consensus. The algorithm is a heuristic in that it computes an approximation to the optimal alignment that minimizes the sum of the pairwise distances between the consensus and the transformed proteins.
4.
Multiple sequence alignment  Total citations: 13, self-citations: 0, citations by others: 13
A method has been developed for aligning segments of several sequences at once. The number of search steps depends only polynomially on the number of sequences, instead of exponentially, because most alignments are rejected without being evaluated explicitly. A data structure herein called the "heap" facilitates this process. For a set of n sequence segments, the overall similarity is taken to be the sum of all the constituent segment pair similarities, which are in turn sums of corresponding residue similarity scores from a table. The statistical models that test alignments for significance make it possible to group sequences objectively, even when most or all of the interrelationships are weak. These tests are very sensitive, while remaining quite conservative, and discourage the addition of "misfit" sequences to an existing set. The new techniques are applied to a set of five DNA-binding proteins, to a group of three enzymes that employ the coenzyme FAD, and to a control set. The alignment previously proposed for the DNA-binding proteins on the basis of structural comparisons and inspection of sequences is supported quite dramatically, and a highly significant alignment is found for the FAD-binding proteins.
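The overall similarity described in this abstract (a sum over all segment pairs, each of which is a sum of per-residue table scores) can be sketched as follows. The similarity table and its values are invented for illustration and are not taken from the paper; the function names are likewise hypothetical.

```python
from itertools import combinations

# Hypothetical toy residue similarity table, keyed by alphabetically sorted residue pairs.
TOY_TABLE = {
    ("A", "A"): 2, ("K", "K"): 2, ("L", "L"): 2,
    ("A", "L"): 1, ("A", "K"): -1, ("K", "L"): -1,
}


def residue_similarity(a: str, b: str) -> int:
    """Symmetric lookup of a residue pair in the toy table."""
    return TOY_TABLE[tuple(sorted((a, b)))]


def pair_similarity(s: str, t: str) -> int:
    """Sum of residue scores over corresponding positions of two equal-length segments."""
    if len(s) != len(t):
        raise ValueError("segments must have equal length")
    return sum(residue_similarity(a, b) for a, b in zip(s, t))


def overall_similarity(segments) -> int:
    """Sum of pair similarities over all unordered pairs of segments."""
    return sum(pair_similarity(s, t) for s, t in combinations(segments, 2))


if __name__ == "__main__":
    print(overall_similarity(["ALK", "ALK", "LAK"]))
```

For n segments this evaluates n(n-1)/2 pairs; the search strategy and the "heap" data structure that the abstract credits for avoiding explicit evaluation of most alignments are not modelled here.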
5.
Morgenstern B Prohaska SJ Pöhler D Stadler PF 《Algorithms for molecular biology : AMB》2006,1(1):6-12
Background
Automated software tools for multiple alignment often fail to produce biologically meaningful results. In such situations, expert knowledge can help to improve the quality of alignments.
6.
Pei J 《Current opinion in structural biology》2008,18(3):382-386
Multiple sequence alignments are essential in computational analysis of protein sequences and structures, with applications in structure modeling, functional site prediction, phylogenetic analysis and sequence database searching. Constructing accurate multiple alignments for divergent protein sequences remains a difficult computational task, and alignment speed becomes an issue for large sequence datasets. Here, I review methodologies and recent advances in the multiple protein sequence alignment field, with emphasis on the use of additional sequence and structural information to improve alignment quality.
7.
Morgenstern B Werner N Prohaska SJ Steinkamp R Schneider I Subramanian AR Stadler PF Weyer-Menkhoff J 《Bioinformatics (Oxford, England)》2005,21(7):1271-1273
Most multi-alignment methods are fully automated, i.e. they are based on a fixed set of mathematical rules. For various reasons, such methods may fail to produce biologically meaningful alignments. Herein, we describe a semi-automatic approach to multiple sequence alignment where biological expert knowledge can be used to influence the alignment procedure. The user can specify parts of the sequences that are biologically related to each other; our software program uses these sites as anchor points and creates a multiple alignment respecting these user-defined constraints. By using known functionally, structurally or evolutionarily related positions of the input sequences as anchor points, our method can produce alignments that reflect the true biological relationships among the input sequences more accurately than fully automated procedures can.
8.
MOTIVATION: Multiple sequence alignment is a fundamental task in bioinformatics. Current tools typically form an initial alignment by merging subalignments, and then polish this alignment by repeated splitting and merging of subalignments to obtain an improved final alignment. In general this form-and-polish strategy consists of several stages, and a profusion of methods have been tried at every stage. We carefully investigate: (1) how to utilize a new algorithm for aligning alignments that optimally solves the common subproblem of merging subalignments, and (2) what is the best choice of method for each stage to obtain the highest quality alignment. RESULTS: We study six stages in the form-and-polish strategy for multiple alignment: parameter choice, distance estimation, merge-tree construction, sequence-pair weighting, alignment merging, and polishing. For each stage, we consider novel approaches as well as standard ones. Interestingly, the greatest gains in alignment quality come from (i) estimating distances by a new approach using normalized alignment costs, and (ii) polishing by a new approach using 3-cuts. Experiments with a parameter-value oracle suggest large gains in quality may be possible through an input-dependent choice of alignment parameters, and we present a promising approach for building such an oracle. Combining the best approaches to each stage yields a new tool we call Opal that on benchmark alignments matches the quality of the top tools, without employing alignment consistency or hydrophobic gap penalties. AVAILABILITY: Opal, a multiple alignment tool that implements the best methods in our study, is freely available at http://opal.cs.arizona.edu.
9.
Protein structure alignment  Total citations: 22, self-citations: 0, citations by others: 22
A new method of comparing protein structures is described, based on distance plot analysis. It is relatively insensitive to insertions and deletions in sequence and is tolerant of the displacement of equivalent substructures between the two molecules being compared. When presented with the co-ordinate sets of two structures, the method will produce automatically an alignment of their sequences based on structural criteria. The method uses the dynamic programming optimization technique, which is widely used in the comparison of protein sequences and thus unifies the techniques of protein structure and sequence comparison. Typical structure comparison problems were examined and the results of the new method compared to the published results obtained using conventional methods. In most examples, the new method produced a result that was equivalent, and in some cases superior, to those reported in the literature.
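The dynamic programming technique this abstract refers to can be illustrated with a minimal global alignment scorer in the Needleman-Wunsch style. This is only a sketch of the general technique for sequences: the match, mismatch and gap values are arbitrary assumptions, and the structure-based scoring derived from distance plots in the paper is not reproduced.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b under a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(
                diag,               # align ca with cb
                prev[j] + gap,      # ca against a gap
                curr[j - 1] + gap,  # cb against a gap
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the table are kept, which is enough to obtain the score; recovering the alignment itself would require storing the full table or traceback pointers.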
10.
MOTIVATION: We introduce a novel approach to multiple alignment that is based on an algorithm for rapidly checking whether single matches are consistent with a partial multiple alignment. This leads to a sequence annealing algorithm, which is an incremental method for building multiple sequence alignments one match at a time. Our approach improves significantly on the standard progressive alignment approach to multiple alignment. RESULTS: The sequence annealing algorithm performs well on benchmark test sets of protein sequences. It is not only sensitive, but also specific, drastically reducing the number of incorrectly aligned residues in comparison to other programs. The method allows for adjustment of the sensitivity/specificity tradeoff and can be used to reliably identify homologous regions among protein sequences. AVAILABILITY: An implementation of the sequence annealing algorithm is available at http://bio.math.berkeley.edu/amap/
11.
Ikuo Uchiyama 《BMC genomics》2008,9(1):1-22
Background
LEA (late embryogenesis abundant) proteins were first described about 25 years ago as accumulating late in plant seed development. They were later found in vegetative plant tissues following environmental stress and also in desiccation tolerant bacteria and invertebrates. Although they are widely assumed to play crucial roles in cellular dehydration tolerance, their physiological and biochemical functions are largely unknown.
Results
We present a genome-wide analysis of LEA proteins and their encoding genes in Arabidopsis thaliana. We identified 51 LEA protein encoding genes in the Arabidopsis genome that could be classified into nine distinct groups. Expression studies were performed on all genes at different developmental stages, in different plant organs and under different stress and hormone treatments using quantitative RT-PCR. We found evidence of expression for all 51 genes. There was only little overlap between genes expressed in vegetative tissues and in seeds and expression levels were generally higher in seeds. Most genes encoding LEA proteins had abscisic acid response (ABRE) and/or low temperature response (LTRE) elements in their promoters and many genes containing the respective promoter elements were induced by abscisic acid, cold or drought. We also found that 33% of all Arabidopsis LEA protein encoding genes are arranged in tandem repeats and that 43% are part of homeologous pairs. The majority of LEA proteins were predicted to be highly hydrophilic and natively unstructured, but some were predicted to be folded.
Conclusion
The analyses indicate a wide range of sequence diversity, intracellular localizations, and expression patterns. The high fraction of retained duplicate genes and the inferred functional diversification indicate that they confer an evolutionary advantage for an organism under varying stressful environmental conditions. This comprehensive analysis will be an important starting point for future efforts to elucidate the functional role of these enigmatic proteins.
12.
Multiple sequence alignment with the Clustal series of programs  Total citations: 2, self-citations: 0, citations by others: 2
Chenna R Sugawara H Koike T Lopez R Gibson TJ Higgins DG Thompson JD 《Nucleic acids research》2003,31(13):3497-3500
The Clustal series of programs are widely used in molecular biology for the multiple alignment of both nucleic acid and protein sequences and for preparing phylogenetic trees. The popularity of the programs depends on a number of factors, including not only the accuracy of the results, but also the robustness, portability and user-friendliness of the programs. New features include NEXUS and FASTA format output, printing range numbers and faster tree calculation. Although Clustal was originally developed to run on a local computer, numerous Web servers have been set up, notably at the EBI (European Bioinformatics Institute) (http://www.ebi.ac.uk/clustalw/).
13.
Multiple sequence alignments are the usual starting point for analyses of protein structure and evolution. For proteins with repeated, shuffled and missing domains, however, traditional multiple sequence alignment algorithms fail to provide an accurate view of homology between related proteins, because they either assume that the input sequences are globally alignable or require locally alignable regions to appear in the same order in all sequences. In this paper, we present ProDA, a novel system for automated detection and alignment of homologous regions in collections of proteins with arbitrary domain architectures. Given an input set of unaligned sequences, ProDA identifies all homologous regions appearing in one or more sequences, and returns a collection of local multiple alignments for these regions. On a subset of the BAliBASE benchmarking suite containing curated alignments of proteins with complicated domain architectures, ProDA performs well in detecting conserved domain boundaries and clustering domain segments, achieving the highest accuracy to date for this task. We conclude that ProDA is a practical tool for automated alignment of protein sequences with repeats and rearrangements in their domain architecture.
14.
Multiple sequence alignment is discussed in light of homology assessments in phylogenetic research. Pairwise and multiple alignment methods are reviewed as exact and heuristic procedures. Since the object of alignment is to create the most efficient statement of initial homology, methods that minimize nonhomology are to be favored. Therefore, among all possible alignments, the one that satisfies the phylogenetic optimality criterion the best should be considered the best alignment. Since all homology statements are subject to testing and explanation this way, consistency of optimality criteria is desirable. This consistency is based on the treatment of alignment gaps as character information and the consistent use of a cost function (e.g., insertion-deletion, transversion, and transition) through analysis from alignment to phylogeny reconstruction. Cost functions are not subject to testing via inspection; hence the assumptions they make should be examined by varying the assumed values in a sensitivity analysis context to test for the robustness of results. Agreement among data may be used to choose an optimal solution set from all of those examined through parameter variation. This idea of consistency between assumption and analysis through alignment and cladogram reconstruction is not limited to parsimony analysis and could and should be applied to other forms of analysis such as maximum likelihood.
15.
M S Waterman 《Nucleic acids research》1986,14(22):9095-9102
An algorithm for multiple sequence alignment is given that matches words of length and degree of mismatch chosen by the user. The alignment maximizes an alignment scoring function. The method is based on a novel extension of our consensus sequence methods. The algorithm works for both DNA and protein sequences, and from earlier work on consensus sequences, it is possible to estimate statistical significance.
16.
Structural alignment is useful in identifying members of ncRNA families. Existing tools are all based on the secondary structures of the molecules. There is evidence showing that tertiary interactions (the interaction between a single-stranded nucleotide and a base-pair) in triple helix structures are critical in some functions of ncRNAs. In this article, we address the problem of structural alignment of RNAs with the triple helix. We provide a formal definition to capture a simplified model of a triple helix structure, then develop an algorithm with O(mn^3) running time to align a query sequence (of length m) with a known triple helix structure to a target sequence (of length n) with an unknown structure. The resulting algorithm is shown to be useful in identifying ncRNA members in a simulated genome.
17.
18.
Multiple sequence alignment using partial order graphs  Total citations: 14, self-citations: 0, citations by others: 14
MOTIVATION: Progressive Multiple Sequence Alignment (MSA) methods depend on reducing an MSA to a linear profile for each alignment step. However, this leads to loss of information needed for accurate alignment, and gap scoring artifacts. RESULTS: We present a graph representation of an MSA that can itself be aligned directly by pairwise dynamic programming, eliminating the need to reduce the MSA to a profile. This enables our algorithm (Partial Order Alignment (POA)) to guarantee that the optimal alignment of each new sequence versus each sequence in the MSA will be considered. Moreover, this algorithm introduces a new edit operator, homologous recombination, important for multidomain sequences. The algorithm has improved speed (linear time complexity) over existing MSA algorithms, enabling construction of massive and complex alignments (e.g. an alignment of 5000 sequences in 4 h on a Pentium II). We demonstrate the utility of this algorithm on a family of multidomain SH2 proteins, and on EST assemblies containing alternative splicing and polymorphism. AVAILABILITY: The partial order alignment program POA is available at http://www.bioinformatics.ucla.edu/poa.
19.
Multiple sequence alignment by a pairwise algorithm  Total citations: 1, self-citations: 0, citations by others: 1
An algorithm is described that processes the results of a conventional pairwise sequence alignment program to automatically produce an unambiguous multiple alignment of many sequences. Unlike other, more complex, multiple alignment programs, the method described here is fast enough to be used on almost any multiple sequence alignment problem.
Received on September 25, 1986; accepted on January 29, 1987
20.
Phylogenies are often thought to depend more on the specifics of the sequence alignment than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian methods, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.