Similar literature
 20 similar articles found (search time: 15 ms)
1.
We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions, as well as those with a family-dependent conservation pattern, in multiple sequence alignments. Among other things, the program can run different methods for extracting these positions, combine the results and visualize them in protein 3D structures and sequence spaces. Availability and implementation: JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet. The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available.

2.
AltAVisT: comparing alternative multiple sequence alignments   Total citations: 2, self-citations: 0, citations by others: 2
We introduce a WWW-based tool that compares two alternative multiple alignments of a given sequence set. Regions where both alignments coincide are color-coded to visualize the local agreement between the two alignments and to identify those regions that can be considered reliably aligned. AVAILABILITY: http://bibiserv.techfak.uni-bielefeld.de/altavist/.
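
The idea of marking regions where two alignments coincide can be illustrated with a minimal Python sketch. This is not AltAVisT's actual algorithm: it assumes, for illustration only, that a column "coincides" when the other alignment contains a column placing exactly the same residues of every sequence together. The function names `column_signatures` and `agreement_mask` are hypothetical.

```python
def column_signatures(alignment):
    """Describe each column by the residue index it holds per sequence (None for a gap)."""
    counters = [0] * len(alignment)  # next residue index for each sequence
    signatures = []
    for col in range(len(alignment[0])):
        signature = []
        for row, seq in enumerate(alignment):
            if seq[col] == "-":
                signature.append(None)
            else:
                signature.append(counters[row])
                counters[row] += 1
        signatures.append(tuple(signature))
    return signatures


def agreement_mask(aln_a, aln_b):
    """One boolean per column of aln_a: True if aln_b contains the identical column."""
    columns_b = set(column_signatures(aln_b))
    return [sig in columns_b for sig in column_signatures(aln_a)]


# Two alternative alignments of the same three sequences (toy data).
aln_a = ["AC-GT", "ACTGT", "A--GT"]
aln_b = ["ACG-T", "ACTGT", "A-G-T"]
print(agreement_mask(aln_a, aln_b))  # [True, True, False, False, True]
```

Columns flagged `True` are the ones a viewer could color as locally agreed; a real tool would likely use a softer, pairwise notion of agreement rather than exact column identity.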

3.
Homology-derived secondary structure of proteins (HSSP) is a well-known database of multiple sequence alignments (MSAs) which merges information on protein sequences and their three-dimensional structures. It is available for all proteins whose structure is deposited in the PDB. It is also used by STING and (Java)Protein Dossier to calculate and present relative entropy as a measure of the degree of conservation for each residue of proteins whose structure has been solved and deposited in the PDB. However, if STING and (Java)Protein Dossier are to support the analysis of protein structures that are modeled computationally, or are being solved experimentally but are not yet deposited in the PDB, then we need a new method for building alignments with the flavor of HSSP alignments (myMSAr). The present study describes a new method and its corresponding databank (SH2QS--database of sequences homologue to the query [structure-having] sequence). Our main interest in making myMSAr was to measure the degree of residue conservation for a given query sequence, regardless of whether it has a corresponding structure deposited in the PDB. In this study, we compare the measurements of residue conservation provided by the corresponding alignments produced by HSSP and SH2QS. As a case study, we also present two biologically relevant examples, the first highlighting the equivalence of the analysis of the degree of residue conservation using HSSP or SH2QS alignments, and the second presenting the degree of residue conservation for a computationally modeled structure, which, as a consequence, does not have an alignment reported by HSSP.
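
As a rough illustration of relative entropy as a per-residue conservation measure, the following Python sketch scores each alignment column against a background distribution. It is a sketch under stated assumptions, not the STING or HSSP implementation: the uniform background, the decision to ignore gaps, and the names `relative_entropy` and `conservation_profile` are all illustrative choices.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def relative_entropy(column, background=None):
    """Kullback-Leibler divergence (bits) of a column's residue frequencies from a background."""
    residues = [r for r in column if r in AMINO_ACIDS]  # gaps and unknowns are skipped
    if not residues:
        return 0.0
    if background is None:
        # Assumption: uniform background; real tools typically use database frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    total = len(residues)
    return sum(
        (n / total) * math.log2((n / total) / background[aa])
        for aa, n in Counter(residues).items()
    )


def conservation_profile(alignment):
    """Relative entropy for every column of an alignment given as equal-length strings."""
    return [relative_entropy(column) for column in zip(*alignment)]


# Toy alignment: higher values mean stronger conservation.
profile = conservation_profile(["MKVLA", "MKILA", "MRVLG"])
```

With a uniform background a fully conserved column scores log2(20), about 4.32 bits, and a column containing every amino acid equally often scores 0.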

4.
COMPAM is a tool for visualizing relationships among multiple whole genomes by combining all pairwise genome alignments. It displays shared conserved regions (blocks) and where these blocks occur (edges) as block relation graphs which can be explored interactively. An unannotated genome, for example, can then be explored using information from well-annotated genomes, COG-based genome annotation and genes. COMPAM can run either as a stand-alone application or through an applet that is provided as a service to PLATCOM, a toolset for whole genome comparative analysis, where a wide variety of genomes can be easily selected. Features provided by COMPAM include the ability to export genome relationship information into file formats that can be used by other existing tools. AVAILABILITY: http://bio.informatics.indiana.edu/projects/compam/

5.
6.

Background  

The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors.

7.
Evaluation measures of multiple sequence alignments.   Total citations: 1, self-citations: 0, citations by others: 1
Multiple sequence alignments (MSAs) are frequently used in the study of families of protein sequences or DNA/RNA sequences. They are a fundamental tool for the understanding of the structure, functionality and, ultimately, the evolution of proteins. A new algorithm, the Circular Sum (CS) method, is presented for formally evaluating the quality of an MSA. It is based on the use of a solution to the Traveling Salesman Problem, which identifies a circular tour through an evolutionary tree connecting the sequences in a protein family. With this approach, the calculation of an evolutionary tree and the errors that it would introduce can be avoided altogether. The algorithm gives an upper bound, the best score that can possibly be achieved by any MSA for a given set of protein sequences. Alternatively, if presented with a specific MSA, the algorithm provides a formal score for the MSA, which serves as an absolute measure of the quality of the MSA. The CS measure yields a direct connection between an MSA and the associated evolutionary tree. The measure can be used as a tool for evaluating different methods for producing MSAs. A brief example of the last application is provided. Because it weights all evolutionary events on a tree identically, but does not require the reconstruction of a tree, the CS algorithm has advantages over the frequently used sum-of-pairs measures for scoring MSAs, which weight some evolutionary events more strongly than others. Compared to other weighted sum-of-pairs measures, it has the advantage that no evolutionary tree must be constructed, because we can find a circular tour without knowing the tree.
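
For context, the sum-of-pairs measure that the abstract contrasts with the CS method can be sketched in a few lines of Python. This shows only the baseline measure, not the Circular Sum algorithm itself, and the scoring values (match +1, mismatch -1, residue against gap -2, gap against gap 0) are illustrative assumptions rather than values from the paper.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one aligned pair of symbols under a simple, assumed scoring scheme."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sum_of_pairs(alignment, **scores):
    """Add the scores of every pair of rows in every column of the alignment."""
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b, **scores)
    return total


msa = ["AC-GT", "ACTGT", "A--GT"]
print(sum_of_pairs(msa))  # 2
```

Because every pair of sequences contributes equally, a substitution on a deep branch of the tree is counted once for each pair of sequences it separates, which is the uneven weighting of evolutionary events that the abstract refers to.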

8.
9.
MOTIVATION: Most multiple sequence alignment programs use heuristics that sometimes introduce errors into the alignment. The most commonly used methods to correct these errors use iterative techniques to maximize an objective function. We present here an alternative, knowledge-based approach that combines a number of recently developed methods into a two-step refinement process. The alignment is divided horizontally and vertically to form a 'lattice' in which well aligned regions can be differentiated. Alignment correction is then restricted to the less reliable regions, leading to a more reliable and efficient refinement strategy. RESULTS: The accuracy and reliability of RASCAL is demonstrated using: (i) alignments from the BAliBASE benchmark database, where significant improvements were often observed, with no deterioration of the existing high-quality regions; (ii) a large scale study involving 946 alignments from the ProDom protein domain database, where alignment quality was increased in 68% of the cases; and (iii) an automatic pipeline to obtain a high-quality alignment of 695 full-length nuclear receptor proteins, which took 11 min on a DEC Alpha 6100 computer. AVAILABILITY: RASCAL is available at ftp://ftp-igbmc.u-strasbg.fr/pub/RASCAL. SUPPLEMENTARY INFORMATION: http://bioinfo-igbmc.u-strasbourg.fr/BioInfo/RASCAL/paper/rascal_supp.html

10.
Multiple sequence alignment (MSA) accuracy is important, but there is no widely accepted method of judging the accuracy that different alignment algorithms give. We present a simple approach to detecting two types of error, namely block shifts and the misplacement of residues within a gap. Given an MSA, subsets of very similar sequences are generated through the use of a redundancy filter, typically using a 70–90% sequence identity cut-off. Subsets thus produced are typically small and degenerate, and errors can be easily detected even by manual examination. The errors, albeit minor, are inevitably associated with gaps in the alignment, and so the procedure is particularly relevant to homology modelling of protein loop regions. The usefulness of the approach is illustrated in the context of the universal but little known [K/R]KLH motif that occurs in intracellular loop 1 of G protein coupled receptors (GPCR); other issues relevant to GPCR modelling are also discussed.
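
The subset-generation step can be sketched as follows. This is a minimal illustration assuming a greedy grouping against the first member of each subset and an identity computed over columns where both sequences have a residue; the abstract does not specify either choice, and the names `percent_identity` and `similar_subsets` are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def similar_subsets(alignment, cutoff=80.0):
    """Greedily group sequences whose identity to a subset's first member is >= cutoff."""
    subsets = []
    for name, seq in alignment.items():
        for subset in subsets:
            representative = alignment[subset[0]]
            if percent_identity(seq, representative) >= cutoff:
                subset.append(name)
                break
        else:
            subsets.append([name])  # no similar subset found: start a new one
    return subsets


# Toy aligned sequences; the cutoff sits inside the 70-90% range from the abstract.
msa = {
    "s1": "ACDEFGHIKL",
    "s2": "ACDEFGHIKV",
    "s3": "ACD-FGHIKL",
    "s4": "WYWYWYWYWY",
}
print(similar_subsets(msa, cutoff=80.0))  # [['s1', 's2', 's3'], ['s4']]
```

Within a subset such as `s1`, `s2`, `s3`, the sequences are nearly identical, so a residue left on the wrong side of a gap is easy to see by eye.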

11.

Background  

Accurate multiple sequence alignments of proteins are very important in computational biology today. Despite the numerous efforts made in this field, all alignment strategies have certain shortcomings, resulting in alignments that are not always correct. Refining existing alignments can prove to be an intelligent choice considering the increasing importance of high-quality alignments in large-scale high-throughput analysis.

12.
There is a lack of programs available that focus on providing an overview of an aligned set of sequences such that the comparison of homologous sites becomes comprehensible and intuitive. Being able to identify similarities, differences, and patterns within a multiple sequence alignment is biologically valuable because it permits visualization of the distribution of a particular feature and inferences about the structure, function, and evolution of the sequences in question. We have therefore created a web server, fingerprint, which combines the characteristics of existing programs that represent identity, variability, charge, hydrophobicity, solvent accessibility, and structure along with new visualizations based on composition, heterogeneity, heterozygosity, dN/dS and nucleotide diversity. fingerprint is easy to use and globally accessible through any computer using any major browser. fingerprint is available at http://evol.mcmaster.ca/fingerprint/.

13.

Motivation  

Sequence-based methods for phylogenetic reconstruction from (nucleic acid) sequence data are notoriously plagued by two effects: homoplasies and alignment errors. Large evolutionary distances imply a large number of homoplastic sites. As most protein-coding genes show dramatic variations in substitution rates that are not uncorrelated across the sequence, this often leads to a patchwork pattern of (i) phylogenetically informative and (ii) effectively randomized regions. Furthermore, in highly variable regions alignment errors accumulate, sometimes resulting in misleading signals in phylogenetic reconstruction.

14.
MOTIVATION: The program ESPript (Easy Sequencing in PostScript) allows the rapid visualization, via PostScript output, of sequences aligned with popular programs such as CLUSTAL-W or GCG PILEUP. It can read secondary structure files (such as those created by the program DSSP) to produce a synthesis of both sequence and structural information. RESULTS: ESPript can be run via a command file or a friendly html-based user interface. The program calculates a homology score by columns of residues and can sort this calculation by groups of sequences. It offers a palette of markers to highlight important regions in the alignment. ESPript can also paste information on residue conservation into coordinate files, for subsequent visualization with a graphics program. AVAILABILITY: ESPript can be accessed on its Web site at http://www.ipbs.fr/ESPript. Sources and helpfiles can be downloaded via anonymous ftp from ftp.ipbs.fr. A tar file is held in the directory pub/ESPript.

15.
SUMMARY: Genalyzer is a software tool designed for the interactive visualization of sequence matches between DNA or protein sequences. It provides visualizations on different levels of granularity, from complete overviews via zoomed regions to alignments of particular matching substrings. Genalyzer can efficiently handle very large datasets, displaying tens of thousands of matches between sequences of tens of millions of bases. AVAILABILITY: Genalyzer is available free of charge for non-commercial research institutions. For more details, see http://www.genalyzer.de

16.

Background  

Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics, the practice of comparing genomic sequences from different species, plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To offset the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program.

17.
ABSTRACT: BACKGROUND: A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. On the other hand, methods of simulating the DNA MSA directly from the transition matrices do not exist. Moreover, existing software is restricted to time-reversible models and is not optimized to generate nonhomogeneous data (i.e. placing distinct substitution rates at different lineages). RESULTS: We present the first package designed to generate MSAs evolving under discrete-time Markov processes on phylogenetic trees, directly from probability substitution matrices. Based on the input model and a phylogenetic tree in the Newick format (with branch lengths measured as the expected number of substitutions per site), the algorithm produces DNA alignments of desired length. GenNon-h is publicly available for download. CONCLUSION: The software presented here is an efficient tool to generate DNA MSAs on a given phylogenetic tree. GenNon-h provides the user with nonstationary or nonhomogeneous phylogenetic data that are well suited for testing complex biological hypotheses, exploring the limits of reconstruction algorithms and their robustness to such models.
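
Generating an alignment directly from substitution matrices can be sketched as a walk down the tree that applies one matrix per edge. This is not GenNon-h's implementation: the nested-dictionary tree (instead of Newick input), the hand-written matrices, and the names `evolve` and `simulate` are assumptions made for illustration, and the sketch produces gap-free sequences only.

```python
import random

BASES = "ACGT"


def evolve(parent_seq, matrix, rng):
    """Draw a child sequence: base x becomes base y with probability matrix[x][y]."""
    return "".join(
        rng.choices(BASES, weights=matrix[BASES.index(x)])[0] for x in parent_seq
    )


def simulate(tree, root_dist, length, seed=0):
    """Return {leaf name: sequence}; tree nodes are {"name": ..., "children": [(matrix, node), ...]}."""
    rng = random.Random(seed)
    root_seq = "".join(rng.choices(BASES, weights=root_dist, k=length))
    leaves = {}

    def walk(node, seq):
        children = node.get("children")
        if not children:
            leaves[node["name"]] = seq
            return
        for matrix, child in children:
            walk(child, evolve(seq, matrix, rng))

    walk(tree, root_seq)
    return leaves


# Each edge carries its own matrix, so lineages can differ (nonhomogeneous),
# and the matrices need not be symmetric or time-reversible.
slow = [[0.97, 0.01, 0.01, 0.01], [0.01, 0.97, 0.01, 0.01],
        [0.01, 0.01, 0.97, 0.01], [0.01, 0.01, 0.01, 0.97]]
fast = [[0.70, 0.20, 0.05, 0.05], [0.05, 0.70, 0.20, 0.05],
        [0.05, 0.05, 0.70, 0.20], [0.20, 0.05, 0.05, 0.70]]
tree = {"name": "root", "children": [
    (slow, {"name": "leaf_a"}),
    (fast, {"name": "inner", "children": [
        (slow, {"name": "leaf_b"}),
        (fast, {"name": "leaf_c"}),
    ]}),
]}
alignment = simulate(tree, root_dist=[0.4, 0.1, 0.1, 0.4], length=60, seed=42)
```

The leaf sequences all have the requested length and share site positions, so they can be written out row by row as an alignment.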

18.
19.
VISTA: visualizing global DNA sequence alignments of arbitrary length   Total citations: 31, self-citations: 0, citations by others: 31
Summary: VISTA is a program for visualizing global DNA sequence alignments of arbitrary length. It has a clean output, allowing for easy identification of similarity, and is easily configurable, enabling the visualization of alignments of various lengths at different levels of resolution. It is currently available on the web, thus allowing for easy access by all researchers. Availability: VISTA server is available on the web at http://www-gsd.lbl.gov/vista. The source code is available upon request. Contact: vista@lbl.gov

20.
Pfam contains multiple alignments and hidden Markov model based profiles (HMM-profiles) of complete protein domains. The definition of domain boundaries, family members and alignment is done semi-automatically based on expert knowledge, sequence similarity, other protein family databases and the ability of HMM-profiles to correctly identify and align the members. Release 2.0 of Pfam contains 527 manually verified families which are available for browsing and on-line searching via the World Wide Web in the UK at http://www.sanger.ac.uk/Pfam/ and in the US at http://genome.wustl.edu/Pfam/. Pfam 2.0 matches one or more domains in 50% of Swissprot-34 sequences, and 25% of a large sample of predicted proteins from the Caenorhabditis elegans genome.
