首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
MOTIVATION: Consensus sequence generation is important in many kinds of sequence analysis ranging from sequence assembly to profile-based iterative search methods. However, how can a consensus be constructed when its inherent assumption-that the aligned sequences form a single linear consensus-is not true? RESULTS: Partial Order Alignment (POA) enables construction and analysis of multiple sequence alignments as directed acyclic graphs containing complex branching structure. Here we present a dynamic programming algorithm (heaviest_bundle) for generating multiple consensus sequences from such complex alignments. The number and relationships of these consensus sequences reveals the degree of structural complexity of the source alignment. This is a powerful and general approach for analyzing and visualizing complex alignment structures, and can be applied to any alignment. We illustrate its value for analyzing expressed sequence alignments to detect alternative splicing, reconstruct full length mRNA isoform sequences from EST fragments, and separate paralog mixtures that can cause incorrect SNP predictions. AVAILABILITY: The heaviest_bundle source code is available at http://www.bioinformatics.ucla.edu/poa  相似文献   

2.
A multiple alignment program for protein sequences   总被引:1,自引:0,他引:1  
A program for the multiple alignment of protein sequences ispresented. The program is an extension of the fast alignmentprogram by Wilbur et al. (1984) into higher dimensions. Theuse of hash procedures on fragments of the protein sequencesincreases the speed of calculation. Thereby we also take intoaccount fragments which are present in some, but not in all,sequences considered. The results of some multiple alignmentsare given. Received on September 11, 1986; accepted on March 18, 1987  相似文献   

3.
A flexible multiple sequence alignment program   总被引:12,自引:3,他引:12       下载免费PDF全文
The 'regions' method for multisequence alignment used in the previously reported program MALIGN has been generalized to include recursive refinement so that unaligned portions between two regions at the current level of resolution can be handled with increased resolution. Additionally, there is incorporated a limiting of the number of regions to be used at any level of resolution from which to abstract an alignment. This provides a significant increase in speed over the unlimited version. The program GENALIGN uses this improved regions method to execute fast pairwise alignments in the framework of Taylor's multisequence alignment procedure using clustered pairwise alignments. Pairwise alignments by dynamic programming are also provided in the program.  相似文献   

4.
A multiple sequence alignment program.   总被引:16,自引:7,他引:16       下载免费PDF全文
A program is described for simultaneously aligning two or more molecular sequences which is based on first finding common segments above a specified length and then piecing these together to maximize an alignment scoring function. Optimal as well as near-optimal alignments are found, and there is also provided a means for randomizing the given sequences for testing the statistical significance of an alignment. Alignments may be made in the original alphabets of the sequences or in user-specified alternate ones to take advantage of chemical similarities (such as hydrophobic-hydrophilic).  相似文献   

5.
SUMMARY: We have developed Look-Align, an interactive web-based viewer to display pre-computed multiple sequence alignments. Although initially developed to support the visualization needs of the maize diversity website Panzea (http://www.panzea.org), the viewer is a generic stand-alone tool that can be easily integrated into other websites. AVAILABILITY: Look-Align is written in Perl using open-source components and is available under an open-source license. Live installation and download information can be found at the Panzea website (http://www.panzea.org/software/alignment_viewer.html). CONTACT: ware@cshl.edu SUPPLEMENTARY INFORMATION: The Supplementary information includes sample lists of multiple sequence alignment software and sample screenshots of the viewer.  相似文献   

6.

Background  

Alignment and comparison of related genome sequences is a powerful method to identify regions likely to contain functional elements. Such analyses are data intensive, requiring the inclusion of genomic multiple sequence alignments, sequence annotations, and scores describing regional attributes of columns in the alignment. Visualization and browsing of results can be difficult, and there are currently limited software options for performing this task.  相似文献   

7.
A novel interactive method for generating multiple protein sequencealignments is described. The program has no internal limit tothe number or length of sequences it can handle and is designedfor use with DEC VAX processors running the VMS operating system.The approach used is essentially one of manual sequence manipulation,aided by built-in symbolic displays of identities and similarities,and strict and ‘fuzzy’ (ambiguous) pattern-matchingfacilities. Additional flexibility is provided by means of aninterface to a publicly available automatic alignment systemand to a comprehensive sequence analysis package. Received on August 28, 1990; accepted on November 20, 1990  相似文献   

8.
Four algorithms, A–D, were developed to align two groupsof biological sequences. Algorithm A is equivalent to the conventionaldynamic programming method widely used for aligning ordinarysequences, whereas algorithms B – D are designed to evaluatethe cost for a deletion/insertion more accurately when internalgaps are present in either or both groups of sequences. Rigorousoptimization of the ‘sum of pairs’ (SP) score isachieved by algorithm D, whose average performance is closeto O(MNL2) where M and N are numbers of sequences included inthe two groups and L is the mean length of the sequences. AlgorithmB uses some app mximations to cope with profile-based operations,whereas algorithm C is a simpler variant of algorithm D. Thesegroup-to-group alignment algorithms were applied to multiplesequence alignment with two iterative strategies: a progressivemethod based on a given binary tree and a randomized grouping-realignmentmethod. The advantages and disadvantages of the four algorithmsare discussed on the basis of the results of exatninations ofseveral protein families.  相似文献   

9.
Multiple sequence alignment (MSA) is a cornerstone of modern molecular biology and represents a unique means of investigating the patterns of conservation and diversity in complex biological systems. Many different algorithms have been developed to construct MSAs, but previous studies have shown that no single aligner consistently outperforms the rest. This has led to the development of a number of ‘meta-methods’ that systematically run several aligners and merge the output into one single solution. Although these methods generally produce more accurate alignments, they are inefficient because all the aligners need to be run first and the choice of the best solution is made a posteriori. Here, we describe the development of a new expert system, AlexSys, for the multiple alignment of protein sequences. AlexSys incorporates an intelligent inference engine to automatically select an appropriate aligner a priori, depending only on the nature of the input sequences. The inference engine was trained on a large set of reference multiple alignments, using a novel machine learning approach. Applying AlexSys to a test set of 178 alignments, we show that the expert system represents a good compromise between alignment quality and running time, making it suitable for high throughput projects. AlexSys is freely available from http://alnitak.u-strasbg.fr/∼aniba/alexsys.  相似文献   

10.
Recent developments in the MAFFT multiple sequence alignment program   总被引:4,自引:0,他引:4  
The accuracy and scalability of multiple sequence alignment (MSA) of DNAs and proteins have long been and are still important issues in bioinformatics. To rapidly construct a reasonable MSA, we developed the initial version of the MAFFT program in 2002. MSA software is now facing greater challenges in both scalability and accuracy than those of 5 years ago. As increasing amounts of sequence data are being generated by large-scale sequencing projects, scalability is now critical in many situations. The requirement of accuracy has also entered a new stage since the discovery of functional noncoding RNAs (ncRNAs); the secondary structure should be considered for constructing a high-quality alignment of distantly related ncRNAs. To deal with these problems, in 2007, we updated MAFFT to Version 6 with two new techniques: the PartTree algorithm and the Four-way consistency objective function. The former improved the scalability of progressive alignment and the latter improved the accuracy of ncRNA alignment. We review these and other techniques that MAFFT uses and suggest possible future directions of MSA software as a basis of comparative analyses. MAFFT is available at http://align.bmr.kyushu-u.ac.jp/mafft/software/.  相似文献   

11.
Although molecular biologists often calculate consensus sequencesfrom aligned DNA or protein sequences, relatively little isknown about the properties of many of the consensus methodsbeing used. Consequently, we wrote a program, CONSENSUS, toanalyze and compare methods of calculating a consensus result(a base, an ambiguity code or a subset of codes) at a positionin an aligned set of molecular sequences. The program supportsalphabets of up to four symbols (e.g. (R, Y) or A, C, G, T).The program's output makes it suitable for exploratory dataanalysis or for selecting values of thresholds or confidencelevels in consensus methods having such parameters.  相似文献   

12.
SUMMARY: Improving and ascertaining the quality of a multiple sequence alignment is a very challenging step in protein sequence analysis. This is particularly the case when dealing with sequences in the 'twilight zone', i.e. sharing < 30% identity. Here we describe INTERALIGN, a dedicated user-friendly alignment editor including a view of secondary structures and a synchronized display of carbon alpha traces of corresponding protein structures. Profile alignment, using CLUSTALW, is implemented to improve the alignment of a sequence of unknown structure with the visually optimized structural alignment as compared with a standard multiple sequence alignment. Tree-based ordering further helps in identifying the structure closest to a given sequence.  相似文献   

13.
Nicholas HB  Ropelewski AJ  Deerfield DW 《BioTechniques》2002,32(3):572-4, 576, 578 passim
We present an overview of multiple sequence alignments to outline the practical consequences for the choices among different techniques and parameters. We begin with a discussion of the scoring methods for quantifying the quality of a multiple sequence alignment, followed by a discussion of the algorithms implemented within a variety of multiple sequence alignment programs. We also discuss additional alignment details such as gap penalty and distance metrics. The paper concludes with a discussion on how to improve alignment quality and the limitations of the techniques described in this paper  相似文献   

14.

Background  

The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N 2 for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments.  相似文献   

15.
MOTIVATION: Due to the importance of considering secondary structures in aligning functional RNAs, several pairwise sequence-structure alignment methods have been developed. They use extended alignment scores that evaluate secondary structure information in addition to sequence information. However, two problems for the multiple alignment step remain. First, how to combine pairwise sequence-structure alignments into a multiple alignment and second, how to generate secondary structure information for sequences whose explicit structural information is missing. RESULTS: We describe a novel approach for multiple alignment of RNAs (MARNA) taking into consideration both the primary and the secondary structures. It is based on pairwise sequence-structure comparisons of RNAs. From these sequence-structure alignments, libraries of weighted alignment edges are generated. The weights reflect the sequential and structural conservation. For sequences whose secondary structures are missing, the libraries are generated by sampling low energy conformations. The libraries are then processed by the T-Coffee system, which is a consistency based multiple alignment method. Furthermore, we are able to extract a consensus-sequence and -structure from a multiple alignment. We have successfully tested MARNA on several datasets taken from the Rfam database.  相似文献   

16.

Background  

In this paper, we introduce a progressive corner cutting method called Reticular Alignment for multiple sequence alignment. Unlike previous corner-cutting methods, our approach does not define a compact part of the dynamic programming table. Instead, it defines a set of optimal and suboptimal alignments at each step during the progressive alignment. The set of alignments are represented with a network to store them and use them during the progressive alignment in an efficient way. The program contains a threshold parameter on which the size of the network depends. The larger the threshold parameter and thus the network, the deeper the search in the alignment space for better scored alignments.  相似文献   

17.
Gap costs for multiple sequence alignment   总被引:6,自引:0,他引:6  
Standard methods for aligning pairs of biological sequences charge for the most common mutations, which are substitutions, deletions and insertions. Because a single mutation may insert or delete several nucleotides, gap costs that are not directly proportional to gap length are usually the most effective. How to extend such gap costs to alignments of three or more sequences is not immediately obvious, and a variety of approaches have been taken. This paper argues that, since gap and substitution costs together specify optimal alignments, they should be defined using a common rationale. Specifically, a new definition of gap costs for multiple alignments is proposed and compared with previous ones. Since the new definition links a multiple alignment's cost to that of its pairwise projections, it allows knowledge gained about two-sequence alignments to bear on the multiple alignment problem. Also, such linkage is a key element of recent algorithms that have rendered practical the simultaneous alignment of as many as six sequences.  相似文献   

18.

Background

Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by the current software packages are highly dependent on the parameters setting, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may provide an undesirable bias in further steps of the analysis and give too simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives to the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment.

Methods

We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments, which are representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments.

Results and conclusions

The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method in terms of correctly aligned pairs ratio and columns correctly aligned ratio with respect to reference alignments. Experimental results show that our approaches can obtain better results than TCoffee and Clustal Omega in terms of the first ratio.
  相似文献   

19.
MOTIVATION: A tool that simultaneously aligns multiple protein sequences, automatically utilizes information about protein domains, and has a good compromise between speed and accuracy will have practical advantages over current tools. RESULTS: We describe COBALT, a constraint based alignment tool that implements a general framework for multiple alignment of protein sequences. COBALT finds a collection of pairwise constraints derived from database searches, sequence similarity and user input, combines these pairwise constraints, and then incorporates them into a progressive multiple alignment. We show that using constraints derived from the conserved domain database (CDD) and PROSITE protein-motif database improves COBALT's alignment quality. We also show that COBALT has reasonable runtime performance and alignment accuracy comparable to or exceeding that of other tools for a broad range of problems. AVAILABILITY: COBALT is included in the NCBI C++ toolkit. A Linux executable for COBALT, and CDD and PROSITE data used is available at: ftp://ftp.ncbi.nlm.nih.gov/pub/agarwala/cobalt  相似文献   

20.
Accurate tools for multiple sequence alignment (MSA) are essential for comparative studies of the function and structure of biological sequences. However, it is very challenging to develop a computationally efficient algorithm that can consistently predict accurate alignments for various types of sequence sets. In this article, we introduce PicXAA (Probabilistic Maximum Accuracy Alignment), a probabilistic non-progressive alignment algorithm that aims to find protein alignments with maximum expected accuracy. PicXAA greedily builds up the multiple alignment from sequence regions with high local similarities, thereby yielding an accurate global alignment that effectively grasps the local similarities among sequences. Evaluations on several widely used benchmark sets show that PicXAA constantly yields accurate alignment results on a wide range of reference sets, with especially remarkable improvements over other leading algorithms on sequence sets with local similarities. PicXAA source code is freely available at: http://www.ece.tamu.edu/∼bjyoon/picxaa/.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号