Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Pairwise optimal alignments between three or more sequences are not necessarily consistent as a whole, but consistent and inconsistent residues are usually distributed in clusters. An efficient method has been developed for locating consistent regions when each pairwise alignment is given in the form of a “skeletal representation” (Bull. math. Biol. 52, 359–373). This method is further extended so that the combination of pairwise alignments that gives the greatest consistency is found when possibly many alignments are equally optimal for each pairwise comparison. A method for acceleration of simultaneous multiple sequence alignment is proposed in which consistent regions serve as “anchor points” limiting application of direct multi-way alignment to the rest of “inconsistent” regions. Dedicated to Prof. Akiyoshi Wada on the occasion of his 60th birthday.
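The idea of consistency among pairwise alignments of three sequences can be sketched as follows. This is only an illustration under simplifying assumptions: each pairwise alignment is given as a set of matched residue index pairs (not the "skeletal representation" used in the cited work), and the function name `consistent_triples` is hypothetical.

```python
def consistent_triples(ab, bc, ac):
    """Return (i, j, k) triples for which A[i]~B[j], B[j]~C[k] and A[i]~C[k] all hold.

    Each argument is a set of (index, index) pairs describing one pairwise
    alignment; this input format is an assumption made for illustration.
    """
    b_to_c = dict(bc)  # residue j in B -> residue k in C
    a_to_c = dict(ac)  # residue i in A -> residue k in C
    triples = []
    for i, j in sorted(ab):
        k = b_to_c.get(j)
        # The triple is consistent only if the A-C alignment agrees with
        # the path A[i] -> B[j] -> C[k].
        if k is not None and a_to_c.get(i) == k:
            triples.append((i, j, k))
    return triples


ab = {(0, 0), (1, 1), (2, 3)}
bc = {(0, 0), (1, 2), (3, 3)}
ac = {(0, 0), (1, 1), (2, 3)}
print(consistent_triples(ab, bc, ac))
```

In this toy input, residue 1 of A reaches residue 2 of C through B but residue 1 of C directly, so it is excluded; the remaining triples could play the role of the "anchor points" described above.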

2.
Most phylogenetic-tree building applications use multiple sequence alignments as a starting point. A recent meta-level methodology, called Heads or Tails, aims to reveal the quality of multiple sequence alignments by comparing alignments taken in the forward direction with the alignments of the same sequences when the sequences are reversed. Through an examination of a special case for multiple sequence alignment – pair-wise alignments, where an optimal algorithm exists – and the use of a modified global-alignment application, it is shown that the forward and reverse alignments, even when they are the same, do not capture all the possible variations in the alignments and when the forward and reverse alignments differ there may be other alignments that remain unaccounted for. The implication is that comparing just the forward and (biologically irrelevant) reverse alignments is not sufficient to capture the variability in multiple sequence alignments, and the Heads or Tails methodology is therefore not suitable as a method for investigating multiple sequence alignment accuracy. Part of the reason is the inability of individual multiple sequence alignment applications to adequately sample the space of possible alignments. A further implication is that the Hall [Hall, B.G., 2008. Mol. Biol. Evol. 25, 1576–1580] methodology may create optimal synthetic multiple sequence alignments that extant aligners will be unable to completely recover ab initio due to alternative alignments being possible at particular sites. In general, it is shown that more divergent sequences will give rise to an increased number of alternative alignments, so sequence sets with a higher degree of similarity are preferable to sets with lower similarity as the starting point for phylogenetic tree building. © The Willi Hennig Society 2009.

3.
For applications such as comparative modelling one major issue is the reliability of sequence alignments. Reliable regions in alignments can be predicted using sub-optimal alignments of the same pair of sequences. Here we show that reliable regions in alignments can also be predicted from multiple sequence profile information alone. Alignments were created for a set of remotely related pairs of proteins using five different test methods. Structural alignments were used to assess the quality of the alignments and the aligned positions were scored using information from the observed frequencies of amino acid residues in sequence profiles pre-generated for each template structure. High-scoring regions of these profile-derived alignment scores were a good predictor of reliably aligned regions. These profile-derived alignment scores are easy to obtain and are applicable to any alignment method. They can be used to detect those regions of alignments that are reliably aligned and to help predict the quality of an alignment. For those residues within secondary structure elements, the regions predicted as reliably aligned agreed with the structural alignments for between 92% and 97.4% of the residues. In loop regions just under 92% of the residues predicted to be reliable agreed with the structural alignments. The percentage of residues predicted as reliable ranged from 32.1% for helix residues to 52.8% for strand residues. This information could also be used to help predict conserved binding sites from sequence alignments. Residues in the template that were identified as binding sites, that aligned to an identical amino acid residue and where the sequence alignment agreed with the structural alignment were in highly conserved, high scoring regions over 80% of the time. 
This suggests that many binding sites that are present in both target and template sequences are in sequence-conserved regions and that there is the possibility of translating reliability to binding site prediction.

4.
DNA/protein sequence comparison is a popular computational tool for molecular biologists. Finding a good alignment implies an evolutionary and/or functional relationship between proteins or genomic loci. Sequence similarity between two proteins indicates structural resemblance, providing a practical approach to structural modeling when the structure of one of the proteins is known. The first step in homology modeling is the construction of an accurate sequence alignment. The commonly used alignment algorithms do not adequately treat structurally mismatched residues in locally dissimilar regions. We propose a simple modification of the existing alignment algorithm that treats these regions properly and demonstrate how this modification improves sequence alignments in real proteins.

5.
MOTIVATION: The pairwise alignment of biological sequences obtained from an algorithm will in general contain both correct and incorrect parts. Hence, to allow for a valid interpretation of the alignment, the local trustworthiness of the alignment has to be quantified. RESULTS: We present a novel approach that attributes a reliability index to every pair of residues, including gapped regions, in the optimal alignment of two protein sequences. The method is based on a fuzzy recast of the dynamic programming algorithm for sequence alignment in terms of mean field annealing. An extensive evaluation with structural reference alignments not only shows that the probability for a pair of residues to be correctly aligned grows consistently with increasing reliability index, but moreover demonstrates that the value of the reliability index can directly be translated into an estimate of the probability for a correct alignment.

6.
Dickson RJ, Gloor GB. PloS one, 2012, 7(6): e37645
The use of sequence alignments to understand protein families is ubiquitous in molecular biology. High quality alignments are difficult to build and protein alignment remains one of the largest open problems in computational biology. Misalignments can lead to inferential errors about protein structure, folding, function, phylogeny, and residue importance. Identifying alignment errors is difficult because alignments are built and validated on the same primary criteria: sequence conservation. Local covariation identifies systematic misalignments and is independent of conservation. We demonstrate an alignment curation tool, LoCo, that integrates local covariation scores with the Jalview alignment editor. Using LoCo, we illustrate how local covariation is capable of identifying alignment errors due to the reduction of positional independence in the region of misalignment. We highlight three alignments from the benchmark database, BAliBASE 3, that contain regions of high local covariation, and investigate the causes to illustrate these types of scenarios. Two alignments contain sequential and structural shifts that cause elevated local covariation. Realignment of these misaligned segments reduces local covariation; these alternative alignments are supported with structural evidence. We also show that local covariation identifies active site residues in a validated alignment of paralogous structures. LoCo is available at https://sourceforge.net/projects/locoprotein/files/.

7.
A molecular sequence alignment algorithm based on dynamic programming has been extended to allow the computation of all pairs of residues that can be part of optimal and suboptimal sequence alignments. The uncertainties inherent in sequence alignment can be displayed using a new form of dot plot. The method allows the qualitative assessment of whether or not two sequences are related, and can reveal what parts of the alignment are better determined than others. It also permits the computation of representative optimal and suboptimal alignments. The relation between alignment reliability and alignment parameters is discussed. Other applications are to cyclical permutations of sequences and the detection of self-similarity. An application to multiple sequence alignment is noted.
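One common way to find every residue pair that lies on some optimal or near-optimal alignment is to combine a forward and a backward dynamic programming pass. The sketch below illustrates that general idea and is not necessarily the method of the cited paper; the scoring values (`match`, `mismatch`, a linear `gap` penalty), the tolerance `delta` and the function names are all assumptions chosen for illustration.

```python
def nw_matrix(a, b, match=1, mismatch=-1, gap=-2):
    """Global-alignment DP matrix: m[i][j] is the best score for a[:i] vs b[:j]."""
    m = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        m[i][0] = i * gap
    for j in range(1, len(b) + 1):
        m[0][j] = j * gap
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            m[i][j] = max(m[i - 1][j - 1] + s, m[i - 1][j] + gap, m[i][j - 1] + gap)
    return m


def near_optimal_pairs(a, b, delta=0, match=1, mismatch=-1, gap=-2):
    """List (i, j) such that pairing a[i] with b[j] occurs in an alignment
    scoring within `delta` of the optimum (delta=0 keeps optimal ones only)."""
    fwd = nw_matrix(a, b, match, mismatch, gap)
    # Running the same DP on the reversed sequences gives best suffix scores:
    # bwd[p][q] scores the last p residues of a against the last q of b.
    bwd = nw_matrix(a[::-1], b[::-1], match, mismatch, gap)
    best = fwd[len(a)][len(b)]
    pairs = []
    for i in range(len(a)):
        for j in range(len(b)):
            s = match if a[i] == b[j] else mismatch
            # Best score of any alignment forced to pair a[i] with b[j].
            through = fwd[i][j] + s + bwd[len(a) - i - 1][len(b) - j - 1]
            if through >= best - delta:
                pairs.append((i, j))
    return pairs


print(near_optimal_pairs("AA", "A"))
print(near_optimal_pairs("ACGT", "ACGT", delta=6))
```

Plotting the returned pairs on an i-by-j grid gives a dot plot of the kind described above: regions where the dots form a single thin diagonal are well determined, while broad clouds mark uncertain regions.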

8.
The Intronerator (http://www.cse.ucsc.edu/~kent/intronerator/) is a set of web-based tools for exploring RNA splicing and gene structure in Caenorhabditis elegans. It includes a display of cDNA alignments with the genomic sequence, a catalog of alternatively spliced genes and a database of introns. The cDNA alignments include >100 000 ESTs and almost 1000 full-length cDNAs. ESTs from embryos and mixed stage animals as well as full-length cDNAs can be compared in the alignment display with each other and with predicted genes. The alt-splicing catalog includes 844 open reading frames for which there is evidence of alternative splicing of pre-mRNA. The intron database includes 28 478 introns, and can be searched for patterns near the splice junctions.

9.
A major problem in predicting protein structure by homology modelling is that the sequence alignment from which the model is built may not be the best one in terms of the correct equivalencing of residues assessed by structural or functional criteria. A useful strategy is to generate and examine a number of suboptimal alignments, as better alignments can often be found away from the optimal. A procedure to rapidly filter suboptimal alignments based on measurement of core volumes and packing pair potentials is investigated. The approach is benchmarked on three pairs of sequences which are non-trivial to align correctly, namely two immunoglobulin domains, plastocyanin with azurin and two distant globin sequences. It is shown to be useful to reduce a large ensemble of possible alignments down to a few which correspond more closely to the correct (structure based) alignment.

10.
Based on the observation that a single mutational event can delete or insert multiple residues, affine gap costs for sequence alignment charge a penalty for the existence of a gap, and a further length-dependent penalty. From structural or multiple alignments of distantly related proteins, it has been observed that conserved residues frequently fall into ungapped blocks separated by relatively nonconserved regions. To take advantage of this structure, a simple generalization of affine gap costs is proposed that allows nonconserved regions to be effectively ignored. The distribution of scores from local alignments using these generalized gap costs is shown empirically to follow an extreme value distribution. Examples are presented for which generalized affine gap costs yield superior alignments from the standpoints both of statistical significance and of alignment accuracy. Guidelines for selecting generalized affine gap costs are discussed, as is their possible application to multiple alignment. Proteins 32:88–96, 1998. Published 1998 Wiley-Liss, Inc.
  • 1 This article is a US government work and, as such, is in the public domain in the United States of America.
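For reference, ordinary affine gap costs (the starting point that the abstract above generalizes) can be sketched with a three-state dynamic program. This is a minimal illustration of standard affine costs only, not of the generalized scheme proposed in the paper; the parameter values and the function name `affine_score` are assumptions.

```python
NEG = float("-inf")


def affine_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best global alignment score when a gap of length k costs
    gap_open + k * gap_extend (a penalty for existence plus one for length)."""
    n, m = len(a), len(b)
    # M: alignment ends with a[i-1] paired to b[j-1]
    # X: alignment ends with a[i-1] opposite a gap
    # Y: alignment ends with b[j-1] opposite a gap
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Opening a new gap pays gap_open + gap_extend; extending pays gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_score("ACGT", "AT"))
```

With these example values a single gap of length two costs 5, whereas two separate gaps of length one would cost 8, which is how affine costs favour one multi-residue insertion or deletion over several scattered ones.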

    11.
    SUMMARY: We introduce an algorithm that uses the information gained from simultaneous consideration of an entire group of related proteins to create multiple structure alignments (MSTAs). Consistency-based alignment (CBA) first harnesses the information contained within regions that are consistently aligned among a set of pairwise superpositions in order to realign pairs of proteins through both global and local refinement methods. It then constructs a multiple alignment that is maximally consistent with the improved pairwise alignments. We validate CBA's alignments by assessing their accuracy in regions where at least two of the aligned structures contain the same conserved sequence motif. RESULTS: CBA correctly aligns well over 90% of motif residues in superpositions of proteins belonging to the same family or superfamily, and it outperforms a number of previously reported MSTA algorithms.

    12.
    Lin HN, Notredame C, Chang JM, Sung TY, Hsu WL. PloS one, 2011, 6(12): e27872
    Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or, more recently, consistency-based approaches. In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words found across a sizeable fraction of the considered dataset, and supported by evolutionary analysis. These words are then used to define a position specific substitution matrix that better reflects the biological significance of local similarity. The experimental results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non-similar proteins. We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity. 
SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.

    13.
    MOTIVATION: Multiple sequence alignment is an important tool to understand and analyze functions of homologous proteins. However, the logic of residue conservation/variation is usually apparent only in three-dimensional (3D) space, not on a primary sequence level. Thus, in a traditional multiple alignment it is often difficult to directly visualize and analyze key residues because they are masked by other residues along the alignment. Here we present an integrated multiple alignment and 3D structure visualization program that can (1) map and highlight residues from a 1D alignment onto a 3D structure and vice versa and (2) display only the alignment of preselected, key residues. This program, called Visualize Structure Sequence Alignment, also has many other built-in tools that can help analyze multiple sequence alignments. AVAILABILITY: http://bioinformatics.burnham.org/liwz/vissa CONTACT: liwz@burnham.org.

    14.
    MOTIVATION: Protein structure comparison (PSC) has been used widely in studies of structural and functional genomics. However, PSC is computationally expensive and as a result almost all of the PSC methods currently in use look only for the optimal alignment and ignore many alternative alignments that are statistically significant and that may provide insight into protein evolution or folding. RESULTS: We have developed a new PSC method with efficiency to detect potentially viable alternative alignments in all-against-all database comparisons. The efficiency of the new PSC method derives from the ability to directly home in on a limited number of viable and ranked alignment solutions based on intuitively derived SSE (secondary structure element)-matching probabilities.

    15.
    We describe EnteriX, a suite of three web-based visualization tools for graphically portraying alignment information from comparisons among several fixed and user-supplied sequences from related enterobacterial species, anchored on a reference genome (http://bio.cse.psu.edu/). The first visualization, Enteric, displays stacked pairwise alignments between a reference genome and each of the related bacteria, represented schematically as PIPs (Percent Identity Plots). Encoded in the views are large-scale genomic rearrangement events and functional landmarks. The second visualization, Menteric, computes and displays 1 Kb views of nucleotide-level multiple alignments of the sequences, together with annotations of genes, regulatory sites and conserved regions. The third, a Java-based tool named Maj, displays alignment information in two formats, corresponding roughly to the Enteric and Menteric views, and adds zoom-in capabilities. The uses of such tools are diverse, from examining the multiple sequence alignment to infer conserved sites with potential regulatory roles, to scrutinizing the commonalities and differences between the genomes for pathogenicity or phylogenetic studies. The EnteriX suite currently includes >15 enterobacterial genomes, generates views centered on four different anchor genomes and provides support for including user sequences in the alignments.

    16.
    The programs described herein function as part of a suite of programs designed for pairwise alignment, multiple alignment, generation of randomized sequences, production of alignment scores and a sorting routine for analysis of the alignments produced. The sequence alignment programs penalize gaps (absences of residues) within regions of protein secondary structure and have the added option of ‘fingerprinting’ structurally or functionally important protein residues. The multiple alignment program is based upon the sequence alignment method of Needleman and Wunsch and the multiple alignment extension of Barton and Sternberg. Our application includes the feature of optionally weighting active site, monomer-monomer, ligand contact or other important template residues to bias the alignment toward matching these residues. A sum-score for the alignments is introduced, which is independent of gap penalties. This score more adequately reflects the character of the alignments for a given scoring matrix than the gap-penalty-dependent total score described previously in the literature. In addition, individual amino acid similarity scores at each residue position in the alignments are printed with the alignment output to enable immediate quantitative assessment of homology at key sections of the aligned chains.

    17.
    MOTIVATION: Amino acid sequence alignments are widely used in the analysis of protein structure, function and evolutionary relationships. Proteins within a superfamily usually share the same fold and possess related functions. These structural and functional constraints are reflected in the alignment conservation patterns. Positions of functional and/or structural importance tend to be more conserved. Conserved positions are usually clustered in distinct motifs surrounded by sequence segments of low conservation. Poorly conserved regions might also arise from the imperfections in multiple alignment algorithms and thus indicate possible alignment errors. Quantification of conservation by attributing a conservation index to each aligned position makes motif detection more convenient. Mapping these conservation indices onto a protein spatial structure helps to visualize spatial conservation features of the molecule and to predict functionally and/or structurally important sites. Analysis of conservation indices could be a useful tool in detection of potentially misaligned regions and will aid in improvement of multiple alignments. RESULTS: We developed a program to calculate a conservation index at each position in a multiple sequence alignment using several methods. Namely, amino acid frequencies at each position are estimated and the conservation index is calculated from these frequencies. We utilize both unweighted frequencies and frequencies weighted using two different strategies. Three conceptually different approaches (entropy-based, variance-based and matrix score-based) are implemented in the algorithm to define the conservation index. Calculating conservation indices for 35522 positions in 284 alignments from SMART database we demonstrate that different methods result in highly correlated (correlation coefficient more than 0.85) conservation indices. 
Conservation indices show statistically significant correlation between sequentially adjacent positions i and i + j, where j < 13, and averaging of the indices over the window of three positions is optimal for motif detection. Positions with gaps display substantially lower conservation properties. We compare conservation properties of the SMART alignments or FSSP structural alignments to those of the ClustalW alignments. The results suggest that conservation indices should be a valuable tool of alignment quality assessment and might be used as an objective function for refinement of multiple alignments. AVAILABILITY: The C code of the AL2CO program and its pre-compiled versions for several platforms as well as the details of the analysis are freely available at ftp://iole.swmed.edu/pub/al2co/.
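An entropy-based conservation index with unweighted frequencies, followed by averaging over a window of three positions, might look like the sketch below. It is an illustration in Python, not the AL2CO C code: the handling of gaps (ignored when estimating frequencies), the use of base-2 logarithms, the truncated windows at the alignment ends and all function names are assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column.
    Gap characters are ignored here, which is a simplifying assumption."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(residues).values())


def conservation_indices(alignment):
    """Negated entropy per column, so that higher values mean more conserved."""
    return [-column_entropy(col) for col in zip(*alignment)]


def smooth(values, window=3):
    """Average each index over a centred window (shorter at the ends)."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


alignment = ["ACDE", "ACEE", "AGEE", "A-EE"]
raw = conservation_indices(alignment)
print(raw)
print(smooth(raw))
```

Runs of high smoothed values would be candidate motifs, while stretches of low values could be either genuinely variable regions or potential misalignments, as the abstract suggests.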

    18.
    Alignment of protein sequences is a key step in most computational methods for prediction of protein function and homology-based modeling of three-dimensional (3D)-structure. We investigated correspondence between "gold standard" alignments of 3D protein structures and the sequence alignments produced by the Smith-Waterman algorithm, currently the most sensitive method for pair-wise alignment of sequences. The results of this analysis enabled development of a novel method to align a pair of protein sequences. The comparison of the Smith-Waterman and structure alignments focused on their inner structure and especially on the continuous ungapped alignment segments, "islands" between gaps. Approximately one third of the islands in the gold standard alignments have negative or low positive score, and their recognition is below the sensitivity limit of the Smith-Waterman algorithm. From the alignment accuracy perspective, the time spent by the algorithm while working in these unalignable regions is unnecessary. We considered features of the standard similarity scoring function responsible for this phenomenon and suggested an alternative hierarchical algorithm, which explicitly addresses high scoring regions. This algorithm is considerably faster than the Smith-Waterman algorithm, whereas resulting alignments are on average of the same quality with respect to the gold standard. This finding shows that the decrease of alignment accuracy is not necessarily a price for the computational efficiency.
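The "islands" mentioned above, meaning maximal ungapped segments between gaps, can be extracted from a gapped pairwise alignment with a short routine. The sketch below is illustrative only: it assumes the alignment is given as two equal-length strings with `-` for gaps, and it scores islands with a toy match/mismatch scheme rather than the substitution matrix a real analysis would use; the function names are hypothetical.

```python
def islands(row_a, row_b):
    """Split a gapped pairwise alignment into maximal ungapped segments.

    Returns a list of (segment_a, segment_b) string pairs.
    """
    segments, cur_a, cur_b = [], [], []
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            # A gap column closes the current island, if any.
            if cur_a:
                segments.append(("".join(cur_a), "".join(cur_b)))
                cur_a, cur_b = [], []
        else:
            cur_a.append(x)
            cur_b.append(y)
    if cur_a:
        segments.append(("".join(cur_a), "".join(cur_b)))
    return segments


def island_score(seg_a, seg_b, match=1, mismatch=-1):
    """Toy similarity score of one island (assumed scoring, not a real matrix)."""
    return sum(match if x == y else mismatch for x, y in zip(seg_a, seg_b))


for seg_a, seg_b in islands("ACG-TT", "A-GCTA"):
    print(seg_a, seg_b, island_score(seg_a, seg_b))
```

Islands whose score is negative or only slightly positive correspond to the regions that, according to the abstract, fall below the sensitivity limit of score-based alignment.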

    19.
    VISTAS is a suite of programs for protein sequence and structure analysis. The system allows the simultaneous display, in separate windows, of multiple sequence alignments, of known or model 3D structures, and of 2D graphic representations of sequence and/or alignment properties. The displays are fully integrated, and therefore manipulations in one window can be reflected in each of the others. Beyond its display facilities, VISTAS brings together a number of existing tools under a single, user-friendly umbrella: these include a fully functional interactive color alignment procedure, conserved motif selection, a range of database-scanning routines, and interactive access to the OWL composite sequence database and to the PRINTS protein fingerprint database. Exploration of the sequence database is thus straightforward, and predefined structural motifs from the fingerprint database may be readily visualized. Of particular note is the ability to calculate conservation criteria from sequence alignments and to display the information in a 3D context: this renders VISTAS a powerful tool for aiding mutagenesis studies and for facilitating refinement of molecular models.

    20.
    When aligning biological sequences, the choice of parameter values for the alignment scoring function is critical. Small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to compute parameter values that are appropriate for aligning biological sequences is through inverse parametric sequence alignment. Given a collection of examples of biologically correct alignments, this is the problem of finding parameter values that make the scores of the example alignments close to those of optimal alignments for their sequences. We extend prior work on inverse parametric alignment to partial examples, which contain regions where the alignment is left unspecified, and to an improved formulation based on minimizing the average error between the score of an example and the score of an optimal alignment. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and that boost the accuracy of multiple sequence alignment by as much as 25%.


    Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP License No. 09084417