Similar literature
1.
CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W. The new system is easy to use, providing an integrated system for performing multiple sequence and profile alignments and analysing the results. CLUSTAL X displays the sequence alignment in a window on the screen. A versatile sequence colouring scheme allows the user to highlight conserved features in the alignment. Pull-down menus provide all the options required for traditional multiple sequence and profile alignment. New features include: the ability to cut-and-paste sequences to change the order of the alignment, selection of a subset of the sequences to be realigned, and selection of a sub-range of the alignment to be realigned and inserted back into the original alignment. Alignment quality analysis can be performed and low-scoring segments or exceptional residues can be highlighted. Quality analysis and realignment of selected residue ranges provide the user with a powerful tool to improve and refine difficult alignments and to trap errors in input sequences. CLUSTAL X has been compiled on SUN Solaris, IRIX5.3 on Silicon Graphics, Digital UNIX on DECstations, Microsoft Windows (32 bit) for PCs, Linux ELF for x86 PCs, and Macintosh PowerMac.

2.
SUMMARY: POAVIZ creates a visualization of a multiple sequence alignment that makes clear the overall structure of how sequences match and diverge in the alignment. POAVIZ can construct visualizations from any multiple sequence alignment source (e.g. PIR and CLUSTAL formats), and is valuable for revealing complex branching structure (such as domains, large-scale insertions / deletions or recombinations), especially in partnership with the Partial Order Alignment (POA) multiple sequence alignment program. AVAILABILITY: The Partial Order multiple sequence Alignment Visualizer (POAVIZ) program is available at http://www.bioinformatics.ucla.edu/poa

3.
The sensitivity of the commonly used progressive multiple sequence alignment method has been greatly improved for the alignment of divergent protein sequences. Firstly, individual weights are assigned to each sequence in a partial alignment in order to down-weight near-duplicate sequences and up-weight the most divergent ones. Secondly, amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned. Thirdly, residue-specific gap penalties and locally reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure. Fourthly, positions in early alignments where gaps have been opened receive locally reduced gap penalties to encourage the opening up of new gaps at these positions. These modifications are incorporated into a new program, CLUSTAL W, which is freely available.
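The third modification above (locally reduced gap penalties in hydrophilic regions) can be illustrated with a minimal Python sketch. The residue set, the minimum run length, the base penalty and the reduction factor below are illustrative assumptions, not values taken from the abstract, and the function name `local_gap_open_penalties` is hypothetical rather than part of CLUSTAL W.

```python
# Assumed set of hydrophilic residues (illustrative only).
HYDROPHILIC = set("DEGKNPQRS")


def local_gap_open_penalties(seq, base_gop=10.0, min_run=5, factor=0.5):
    """Return one gap-opening penalty per residue of `seq`.

    Residues inside a run of at least `min_run` consecutive hydrophilic
    residues get `base_gop * factor`; all others keep `base_gop`.
    All numeric defaults are assumptions chosen for illustration.
    """
    penalties = [base_gop] * len(seq)
    run_start = None
    # A sentinel character that is never hydrophilic closes any final run.
    for i, residue in enumerate(seq + "#"):
        if residue in HYDROPHILIC:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                for j in range(run_start, i):
                    penalties[j] = base_gop * factor
            run_start = None
    return penalties


if __name__ == "__main__":
    # The five-residue stretch DEKNQ is long enough to lower the penalty.
    print(local_gap_open_penalties("LLLDEKNQLLL"))
```

Lowering the penalty only inside such stretches makes it cheaper for the aligner to open a gap where a loop is likely, while leaving the cost unchanged elsewhere.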

4.
The performances of five global multiple-sequence alignment programs (CLUSTAL W, Divide and Conquer, Malign, PileUp, and TreeAlign) were evaluated using part of the animal mitochondrial small subunit (12S) rRNA molecule. Conserved sequence motifs derived from an alignment based on secondary structural information were used to score how well each program aligned a data set of five vertebrate and five invertebrate taxa over a range of parameter values. All of the programs could align the motifs with reasonable accuracy for at least one set of parameter conditions, although if the whole sequence was considered, similarity to the structural alignment was only 25%-34%. Use of small gap costs generally gave more accurate results, although Malign and TreeAlign generated longer alignments when gap costs were low. The programs differed in the consistency of the alignments when gap cost was varied; CLUSTAL W, Divide and Conquer, and TreeAlign were the most accurate and robust, while PileUp performed poorly as gap cost values increased, and the accuracy of Malign fluctuated. Default settings for the programs did not give the best results, and attempting to select similar parameter values in different programs did not always result in more similar alignments. Poor alignment of even well-conserved motifs can occur if these are near sites with insertions or deletions. Since there is no a priori way to determine gap costs and because such costs can vary over the gene, alignment of rRNA sequences, particularly the less well conserved regions, should be treated carefully and aided by secondary structure and conserved motifs. Some motifs are single bases and so are often invisible to alignment programs. Our tests involved the most conserved regions of the 12S rRNA gene, and alignment of less well conserved regions will be more problematical. 
None of the alignments we examined produced a fully resolved phylogeny for the data set, indicating that this portion of 12S rRNA is insufficient for resolution of distant evolutionary relationships.

5.
In a case study of fungi of the class Sordariomycetes, we evaluated the effect of multiple sequence alignment (MSA) on the reliability of the phylogenetic trees, topology and confidence of major phylogenetic clades. We compared two main approaches for constructing MSA based on (1) the knowledge of the secondary (2D) structure of ribosomal RNA (rRNA) genes, and (2) automatic construction of MSA by four alignment programs characterized by different algorithms and evaluation methods, CLUSTAL, MAFFT, MUSCLE, and SAM. In the primary fungal sequences of the two functional rRNA genes, the nuclear small and large ribosomal subunits (18S and 28S), we identified four and six, respectively, highly variable regions, which correspond mainly to hairpin loops in the 2D structure. These loops are often positioned in expansion segments, which are missing or are not completely developed in the Archaeal and Eubacterial kingdoms. Proper sorting of these sites was a key for constructing an accurate MSA. We utilized DNA sequences from 28S as an example for one-gene analysis. Five different MSAs were created and analyzed with maximum parsimony and maximum likelihood methods. The phylogenies inferred from the alignments improved with 2D structure with identified homologous segments, and those constructed using the MAFFT alignment program, with all highly variable regions included, provided the most reliable phylograms with higher bootstrap support for the majority of clades. We illustrate and provide examples demonstrating that re-evaluating ambiguous positions in the consensus sequences using 2D structure and covariance is a promising means of improving the quality and reliability of sequence alignments.

6.
Amino acid sequences of E. coli glutamate decarboxylase (GADalpha) and those of 36 GADs of different origin were compared by pairwise alignment using the computer program CLUSTAL. GADalpha and plant enzymes showed 59.8-67.8% subunit homology, GADalpha and other bacterial GADs showed 49.8-77.6%, whereas GADalpha and animal enzymes showed 13.9-58.8%. Two PLP domains exhibited higher homology compared with that of the whole subunit in the case of GAD67, plant (68.4-73.9%), and bacterial (46.7-83.2%) enzymes. The alignment of PLP-domains of 37 GADs, three group II decarboxylases, and two pyridoxal enzymes with known 3D structures (bacterial ORD and mAAT from chicken heart) allowed us to reveal conserved residues of the active sites. Their functional role is discussed. Modelling of the PLP-binding sites in active centers for GADalpha and human brain GAD67 was done using the Swiss-PdbViewer homology modelling program. Although the homology between GADalpha and GAD67 is rather low, structural similarity of their active sites allows us to consider here a functional convergence. Thus, glutamate decarboxylation by GADalpha may be helpful for understanding the general mechanism of this reaction.
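Percentages of the kind quoted above are usually derived from a pairwise alignment by counting matching positions. The Python sketch below shows one common way to do this. It is an assumption that identity is computed over columns where neither sequence has a gap; the abstract does not state which denominator was used, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Columns in which either row holds a gap are skipped (an assumed
    convention; other denominators, such as alignment length, exist).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped column: not counted
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Five ungapped columns, four of them identical.
    print(percent_identity("MKT-LV", "MRTALV"))
```

Because the choice of denominator changes the result, values from different studies are only comparable when the same convention is used.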

7.
MAFFT version 5: improvement in accuracy of multiple sequence alignment   Total citations: 44, self-citations: 3, citations by others: 41

8.
MOTIVATION: Partial order alignment (POA) has been proposed as a new approach to multiple sequence alignment (MSA), which can be combined with existing methods such as progressive alignment. This is important for addressing problems both in the original version of POA (such as order sensitivity) and in standard progressive alignment programs (such as information loss in complex alignments, especially surrounding gap regions). RESULTS: We have developed a new Partial Order-Partial Order alignment algorithm that optimally aligns a pair of MSAs and which therefore can be applied directly to progressive alignment methods such as CLUSTAL. Using this algorithm, we show the combined Progressive POA alignment method yields results comparable with the best available MSA programs (CLUSTALW, DIALIGN2, T-COFFEE) but is far faster. For example, depending on the level of sequence similarity, aligning 1000 sequences, each 500 amino acids long, took 15 min (at 90% average identity) to 44 min (at 30% identity) on a standard PC. For large alignments, Progressive POA was 10-30 times faster than the fastest of the three previous methods (CLUSTALW). These data suggest that POA-based methods can scale to much larger alignment problems than possible for previous methods. AVAILABILITY: The POA source code is available at http://www.bioinformatics.ucla.edu/poa

9.
SUMMARY: A fast dynamic programming algorithm for the spatial superposition of protein structure without prior knowledge of an initial alignment has been developed. The program was applied to serine proteases, hemoglobins, cytochromes C, small copper-binding proteins, and lysozymes. In most cases the existing structural homology could be detected in a completely unbiased way. The results of the method presented are in general agreement with other studies. Applying our method, the different alignment results obtained by other authors for serine proteases and cytochromes C can be classified in terms of different alignment parameters such as gap penalties or cut-off length. Limitations of the method are discussed.

10.
A workbench for multiple alignment construction and analysis   Total citations: 126, self-citations: 0, citations by others: 126
Multiple sequence alignment can be a useful technique for studying molecular evolution, as well as for analyzing relationships between structure or function and primary sequence. We have developed for this purpose an interactive program, MACAW (Multiple Alignment Construction and Analysis Workbench), that allows the user to construct multiple alignments by locating, analyzing, editing, and combining "blocks" of aligned sequence segments. MACAW incorporates several novel features. (1) Regions of local similarity are located by a new search algorithm that avoids many of the limitations of previous techniques. (2) The statistical significance of blocks of similarity is evaluated using a recently developed mathematical theory. (3) Candidate blocks may be evaluated for potential inclusion in a multiple alignment using a variety of visualization tools. (4) A user interface permits each block to be edited by moving its boundaries or by eliminating particular segments, and blocks may be linked to form a composite multiple alignment. No completely automatic program is likely to deal effectively with all the complexities of the multiple alignment problem; by combining a powerful similarity search algorithm with flexible editing, analysis and display tools, MACAW allows the alignment strategy to be tailored to the problem at hand.

11.
Elongation factor-1alpha (EF-1alpha) is a highly conserved nuclear coding gene that can be used to investigate recent divergences due to the presence of rapidly evolving introns. However, a universal feature of intron sequences is that even closely related species exhibit insertion and deletion events, which cause variation in the lengths of the sequences. Indels are frequently rich in evolutionary information, but most investigators ignore sites that fall within these variable regions, largely because the analytical tools and theory are not well developed. We examined this problem in the taxonomically problematic parasitoid wasp genus Pauesia (Hymenoptera: Braconidae: Aphidiinae) using congruence as a criterion for assessing a range of methods for aligning such variable-length EF-1alpha intron sequences. These methods included distance- and parsimony-based multiple-alignment programs (CLUSTAL W and MALIGN), direct optimization (POY), and two "by eye" alignment strategies. Furthermore, with one method (CLUSTAL W) we explored in detail the robustness of results to changes in the gap cost parameters. Phenetic-based alignments ("by eye" and CLUSTAL W) appeared, under our criterion, to perform as well as more readily defensible, but computationally more demanding, methods. In general, all of our alignment and tree-building strategies recovered the same basic topological structure, which means that an underlying phylogenetic signal remained regardless of the strategy chosen. However, several relationships between clades were sensitive both to alignment and to tree-building protocol. Further alignments, considering only sequences belonging to the same group, allowed us to infer a range of phylogenetic relationships that were highly robust to tree-building protocol. By comparing these topologies with those obtained by varying the CLUSTAL parameters, we generated the distribution area of congruence and taxonomic compatibility. 
Finally, we present the first robust estimate of the European Pauesia phylogeny by using two EF-1alpha introns and 38 taxa (plus 3 outgroups). This estimate conflicts markedly with the traditional subgeneric classification. We recommend that this classification be abandoned, and we propose a series of monophyletic species groups.

12.
PhyloBLAST is an internet-accessed application based on CGI/Perl programming that compares a user's protein sequence to a SwissProt/TREMBL database using BLAST2 and then allows phylogenetic analyses to be performed on selected sequences from the BLAST output. Flexible features, such as the ability to input your own multiple sequence alignment and use PHYLIP program options, provide additional web-based phylogenetic analysis functionality beyond the analysis of a BLAST result.

13.
gff2aplot: Plotting sequence comparisons   Total citations: 1, self-citations: 0, citations by others: 1
SUMMARY: gff2aplot is a program to visualize the alignment of two sequences together with their annotations. Input for the program consists of single or multiple files in GFF-format which specify the alignment coordinates and annotation features of both sequences. Output is in PostScript format of any size. The features to be displayed are highly customizable to meet user-specific needs. The program serves to generate print-quality images for comparative genome sequence analysis. AVAILABILITY: gff2aplot is freely available under the GNU software licence and can be downloaded from the address specified below. Supplementary information: http://genome.imim.es/software/gfftools/GFF2APLOT.html
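To make the GFF input mentioned above concrete, the following Python sketch reads one line of the general tab-separated GFF layout (seqname, source, feature, start, end, score, strand, frame, optional attributes). It only illustrates the generic format; how gff2aplot itself encodes alignment coordinates in these fields is not described in the abstract, and the names `GffRecord` and `parse_gff_line` as well as the sample line are hypothetical.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: Optional[float]  # None when the score column is "."
    strand: str
    frame: str
    attributes: str


def parse_gff_line(line: str) -> Optional[GffRecord]:
    """Parse one GFF line; return None for blank lines and comments."""
    line = line.rstrip("\r\n")
    if not line or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) < 8:
        raise ValueError("a GFF record needs at least 8 tab-separated fields")
    seqname, source, feature, start, end, score, strand, frame = fields[:8]
    attributes = fields[8] if len(fields) > 8 else ""
    return GffRecord(
        seqname, source, feature, int(start), int(end),
        None if score == "." else float(score),
        strand, frame, attributes,
    )


if __name__ == "__main__":
    # Made-up example line, not taken from any real data set.
    print(parse_gff_line("seq1\tblast\thsp\t100\t250\t87.5\t+\t.\tTarget seq2"))
```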

14.
Comparative ITS2 sequencing in the plant genus Aeschynanthus (Gesneriaceae) reveals an insertion/deletion (indel) hot spot in the ITS2 sequences that is difficult to align. Examination of other Gesneriaceae sequences shows that this is a widespread phenomenon in this plant family. Minimum free-energy secondary structure analyses localize the hot spot to the terminal part of arm 1. Arm 1 is twice as long in Gesneriaceae as in other asterids. In addition, the pattern of indels is consistent with this secondary structure model. The high variability of the extended terminal part of arm 1 in Gesneriaceae and the fact that it can be deleted altogether imply that it is functionally superfluous. In contrast, the base of arm 1 is relatively conserved and may function as an exonuclease recognition site. This study illustrates how comparative secondary structure analyses can be helpful in fine-scale alignment. Alignment based on secondary structure conflicts with our initial manual alignment and, to a lesser extent, with a CLUSTAL X alignment with default parameters.

15.
MOTIVATION: SAM-T99 is an iterative hidden Markov model-based method for finding proteins similar to a single target sequence and aligning them. One of its main uses is to produce multiple alignments of homologs of the target sequence. Previous tests of SAM-T99 and its predecessors have concentrated on the quality of the searches performed, not on the quality of the multiple alignment. In this paper we report on tests of multiple alignment quality, comparing SAM-T99 to the standard multiple aligner, CLUSTALW. RESULTS: The paper evaluates the multiple-alignment aspect of the SAM-T99 protocol, using the BAliBASE benchmark alignment database. On these benchmarks, SAM-T99 is comparable in accuracy with ClustalW. AVAILABILITY: The SAM-T99 protocol can be run on the web at http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-query.html and the alignment tune-up option described here can be run at http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-tuneup.html. The protocol is also part of the standard SAM suite of tools. http://www.cse.ucsc.edu/research/compbio/sam/

16.
MOTIVATION: Homologous sequences are sometimes similar over some regions but different over other regions. Homologous sequences have a much lower global similarity if the different regions are much longer than the similar regions. RESULTS: We present a generalized global alignment algorithm for comparing sequences with intermittent similarities, an ordered list of similar regions separated by different regions. A generalized global alignment model is defined to handle sequences with intermittent similarities. A dynamic programming algorithm is designed to compute an optimal general alignment in time proportional to the product of sequence lengths and in space proportional to the sum of sequence lengths. The algorithm is implemented as a computer program named GAP3 (Global Alignment Program Version 3). The generalized global alignment model is validated by experimental results produced with GAP3 on both DNA and protein sequences. The GAP3 program extends the ability of standard global alignment programs to recognize homologous sequences of lower similarity. AVAILABILITY: The GAP3 program is freely available for academic use at http://bioinformatics.iastate.edu/aat/align/align.html.
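The time and space bounds quoted above can be illustrated with a standard global alignment score computed row by row. The Python sketch below is not the GAP3 generalized model: it is ordinary Needleman-Wunsch scoring with a linear gap penalty, and the scoring values and the function name `global_alignment_score` are assumptions chosen for illustration. It shows why time grows with the product of the sequence lengths while memory stays linear when only the previous row is kept.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score with a linear gap penalty.

    Time is O(len(a) * len(b)); memory is O(min(len(a), len(b)))
    because only two rows of the dynamic programming table are stored.
    Scoring values are illustrative assumptions.
    """
    if len(b) > len(a):
        a, b = b, a  # keep the shorter sequence along the row
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "ACT"))
```

Recovering the alignment itself in linear space needs an extra divide-and-conquer step (Hirschberg's technique), which this score-only sketch omits.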

17.
The number of nuclear small subunit (SSU) ribosomal RNA (rRNA) sequences for Nematoda has increased dramatically in recent years, and although their use in constructing phylogenies has also increased, relatively little attention has been given to their alignment. Here we examined the sensitivity of the nematode SSU data set to different alignment parameters and to the removal of alignment ambiguous regions. Ten alignments were created with CLUSTAL W using different sets of alignment parameters (10 full alignments), and each alignment was examined by eye and alignment ambiguous regions were removed (creating 10 reduced alignments). These alignment ambiguous regions were analyzed as a third type of data set, culled alignments. Maximum parsimony, neighbor-joining, and parsimony bootstrap analyses were performed. The resulting phylogenies were compared to each other by the symmetric difference distance tree comparison metric (SymD). The correlation of the phylogenies with the alignment parameters was tested by comparing matrices from SymD with corresponding matrices of Manhattan distances representing the alignment parameters. Differences among individual parsimony trees from the full alignments were frequently correlated with the differences among alignment parameters (580/1000 tests), as were trees from the culled alignments (403/1000 tests). Differences among individual parsimony trees from the reduced alignments were less frequently correlated with the differences among alignment parameters (230/1000 tests). Differences among majority-rule consensus trees (50%) from the parsimony analysis of the full alignments were significantly correlated with the differences among alignment parameters, whereas consensus trees from the reduced and culled analyses were not correlated with the alignment parameters. 
These patterns of correlation confirm that choice of alignment parameters has the potential to bias the resultant phylogenies for the nematode SSU data set, and suggest that the removal of alignment ambiguous regions reduces this effect. Finally, we discuss the implications of conservative phylogenetic hypotheses for Nematoda produced by exploring alignment space and removing alignment ambiguous regions for SSU rDNA.
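The two distances compared in this study can be sketched in a few lines of Python. It is an assumption here that each tree is represented as a set of clades, each clade being a frozenset of taxon names, and that each alignment run is described by a tuple of numeric parameters. The function names and the example values are hypothetical and do not come from the abstract.

```python
def symmetric_difference_distance(clades_a, clades_b):
    """Number of clades found in exactly one of the two trees."""
    return len(set(clades_a) ^ set(clades_b))


def manhattan_distance(params_a, params_b):
    """Sum of absolute differences between two parameter vectors."""
    if len(params_a) != len(params_b):
        raise ValueError("parameter vectors must have equal length")
    return sum(abs(x - y) for x, y in zip(params_a, params_b))


if __name__ == "__main__":
    # Two small made-up trees sharing the clade {A, B}.
    tree_1 = {frozenset("AB"), frozenset("ABC")}
    tree_2 = {frozenset("AB"), frozenset("ABD")}
    print(symmetric_difference_distance(tree_1, tree_2))
    # Made-up (gap opening, gap extension) settings for two alignments.
    print(manhattan_distance((10, 5), (15, 2)))
```

Computing both distances for every pair of alignments gives two matrices of the same shape, which is what a matrix correlation test of the kind described above takes as input.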

18.
The MUST package is a phylogenetically oriented set of programs for data management and display, allowing one to handle both raw data (sequences) and results (trees, number of steps, bootstrap proportions). It is complementary to the main available software for phylogenetic analysis (PHYLIP, PAUP, HENNIG86, CLUSTAL) with which it is fully compatible. The first part of MUST consists of the acquisition of new sequences, their storage, modification, and checking of sequence integrity in files of aligned sequences. In order to improve alignment, an editor function for aligned sequences offers numerous options, such as selection of subsets of sequences, display of consensus sequences, and search for similarities over small sequence fragments. For phylogenetic reconstruction, the choice of species and portions of sequences to be analyzed is easy and very rapid, permitting fast testing of numerous combinations of sequences and taxa. The resulting files can be formatted for most programs of tree construction. An interactive tree-display program recovers the output of all these programs. Finally, various modules allow an in-depth analysis of results, such as comparison of distance matrices, variation of bootstrap proportions with respect to various parameters or comparison of the number of steps per position. All presently available complete sequences of 28S rRNA are furnished aligned in the package. MUST therefore allows the management of all the operations required for phylogenetic reconstructions.

19.
20.
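A total of 323 urogenital specimens suspected of Chlamydia trachomatis (Ct) infection were collected from different cities in China. Fragments of the Ct omp1 gene (including its four variable regions) were amplified by nested PCR, and the omp1 sequences of 96 positive specimens were determined, genotyped by homology, and analyzed for polymorphic sites. A phylogenetic tree was constructed from the amino acid sequences using the Mega 3 software to analyze the relationships between the clinical strains and the corresponding reference strains. Among the 96 Ct-positive specimens, 28 genetic variants were detected, with genotype E the most common. The omp1 genes of Ct genotypes E and F were found to be highly conserved, whereas all other genotypes showed some variability. Phylogenetic analysis showed that the genetic distances between the clinical strains and their corresponding reference strains were small. The results indicate that the Ct omp1 gene is highly polymorphic, which may provide an important experimental basis for vaccine development and for the prevention and treatment of infection.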
