首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 8 毫秒
1.
Multiple sequence alignments are essential in computational analysis of protein sequences and structures, with applications in structure modeling, functional site prediction, phylogenetic analysis and sequence database searching. Constructing accurate multiple alignments for divergent protein sequences remains a difficult computational task, and alignment speed becomes an issue for large sequence datasets. Here, I review methodologies and recent advances in the multiple protein sequence alignment field, with emphasis on the use of additional sequence and structural information to improve alignment quality.  相似文献   

2.
Elofsson A 《Proteins》2002,46(3):330-339
One of the most central methods in bioinformatics is the alignment of two protein or DNA sequences. However, so far large-scale benchmarks examining the quality of these alignments are scarce. On the other hand, recently several large-scale studies of the capacity of different methods to identify related sequences has led to new insights about the performance of fold recognition methods. To increase our understanding about fold recognition methods, we present a large-scale benchmark of alignment quality. We compare alignments from several different alignment methods, including sequence alignments, hidden Markov models, PSI-BLAST, CLUSTALW, and threading methods. For most methods, the alignment quality increases significantly at about 20% sequence identity. The difference in alignment quality between different methods is quite small, and the main difference can be seen at the exact positioning of the sharp rise in alignment quality, that is, around 15-20% sequence identity. The alignments are improved by using structural information. In general, the best alignments are obtained by methods that use predicted secondary structure information and sequence profiles obtained from PSI-BLAST. One interesting observation is that for different pairs many different methods create the best alignments. This finding implies that if a method that could select the best alignment method for each pair existed, a significant improvement of the alignment quality could be gained.  相似文献   

3.
4.
Protein multiple sequence alignment is an important bioinformatics tool. It has important applications in biological evolution analysis and protein structure prediction. A variety of alignment algorithms in this field have achieved great success. However, each algorithm has its own inherent deficiencies. In this paper, permutation similarity is proposed to evaluate several protein multiple sequence alignment algorithms that are widely used currently. As the permutation similarity method only concerns the relative order of different protein evolutionary distances, without taking into account the slight difference between the evolutionary distances, it can get more robust evaluations. The longest common subsequence method is adopted to define the similarity between different permutations. Using these methods, we assessed Dialign, Tcoffee, ClustalW and Muscle and made comparisons among them.  相似文献   

5.
MOTIVATION: Protein sequence alignment plays a critical role in computational biology as it is an integral part in many analysis tasks designed to solve problems in comparative genomics, structure and function prediction, and homology modeling. METHODS: We have developed novel sequence alignment algorithms that compute the alignment between a pair of sequences based on short fixed- or variable-length high-scoring subsequences. Our algorithms build the alignments by repeatedly selecting the highest scoring pairs of subsequences and using them to construct small portions of the final alignment. We utilize PSI-BLAST generated sequence profiles and employ a profile-to-profile scoring scheme derived from PICASSO. RESULTS: We evaluated the performance of the computed alignments on two recently published benchmark datasets and compared them against the alignments computed by existing state-of-the-art dynamic programming-based profile-to-profile local and global sequence alignment algorithms. Our results show that the new algorithms achieve alignments that are comparable with or better than those achieved by existing algorithms. Moreover, our results also showed that these algorithms can be used to provide better information as to which of the aligned positions are more reliable--a critical piece of information for comparative modeling applications.  相似文献   

6.
MicroRNA identification based on sequence and structure alignment   总被引:20,自引:0,他引:20  
MOTIVATION: MicroRNAs (miRNA) are approximately 22 nt long non-coding RNAs that are derived from larger hairpin RNA precursors and play important regulatory roles in both animals and plants. The short length of the miRNA sequences and relatively low conservation of pre-miRNA sequences restrict the conventional sequence-alignment-based methods to finding only relatively close homologs. On the other hand, it has been reported that miRNA genes are more conserved in the secondary structure rather than in primary sequences. Therefore, secondary structural features should be more fully exploited in the homologue search for new miRNA genes. RESULTS: In this paper, we present a novel genome-wide computational approach to detect miRNAs in animals based on both sequence and structure alignment. Experiments show this approach has higher sensitivity and comparable specificity than other reported homologue searching methods. We applied this method on Anopheles gambiae and detected 59 new miRNA genes. AVAILABILITY: This program is available at http://bioinfo.au.tsinghua.edu.cn/miralign. SUPPLEMENTARY INFORMATION: Supplementary information is available at http://bioinfo.au.tsinghua.edu.cn/miralign/supplementary.htm.  相似文献   

7.
A structure-based method for protein sequence alignment   总被引:1,自引:0,他引:1  
MOTIVATION: With the continuing rapid growth of protein sequence data, protein sequence comparison methods have become the most widely used tools of bioinformatics. Among these methods are those that use position-specific scoring matrices (PSSMs) to describe protein families. PSSMs can capture information about conserved patterns within families, which can be used to increase the sensitivity of searches for related sequences. Certain types of structural information, however, are not generally captured by PSSM search methods. Here we introduce a program, Structure-based ALignment TOol (SALTO), that aligns protein query sequences to PSSMs using rules for placing and scoring gaps that are consistent with the conserved regions of domain alignments from NCBI's Conserved Domain Database. RESULTS: In most cases, the alignment scores obtained using the local alignment version follow an extreme value distribution. SALTO's performance in finding related sequences and producing accurate alignments is similar to or better than that of IMPALA; one advantage of SALTO is that it imposes an explicit gapping model on each protein family. AVAILABILITY: A stand-alone version of the program that can generate global or local alignments is available by ftp distribution (ftp://ftp.ncbi.nih.gov/pub/SALTO/), and has been incorporated to Cn3D structure/alignment viewer. CONTACT: bryant@ncbi.nlm.nih.gov.  相似文献   

8.

Background  

While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins - proteins that share a common ancestor and a similar structure; pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate.  相似文献   

9.
While most of the recent improvements in multiple sequence alignment accuracy are due to better use of vertical information, which include the incorporation of consistency-based pairwise alignments and the use of profile alignments, we observe that it is possible to further improve accuracy by taking into account alignment of neighboring residues when aligning two residues, thus making better use of horizontal information. By modifying existing multiple alignment algorithms to make use of horizontal information, we show that this strategy is able to consistently improve over existing algorithms on a few sets of benchmark alignments that are commonly used to measure alignment accuracy, and the average improvements in accuracy can be as much as 1–3% on protein sequence alignment and 5–10% on DNA/RNA sequence alignment. Unlike previous algorithms, consistent average improvements can be obtained across all identity levels.  相似文献   

10.
The accuracy of current signal peptide predictors is outstanding. The most successful predictors are based on neural networks and hidden Markov models, reaching a sensitivity of 99% and an accuracy of 95%. Here, we demonstrate that the popular BLASTP alignment tool can be tuned for signal peptide prediction reaching the same high level of prediction success. Alignment-based techniques provide additional benefits. In spite of high success rates signal peptide predictors yield false predictions. Simple sequences like polyvaline, for example, are predicted as signal peptides. The general architecture of learning systems makes it difficult to trace the cause of such problems. This kind of false predictions can be recognized or avoided altogether by using sequence comparison techniques. Based on these results we have implemented a public web service, called Signal-BLAST. Predictions returned by Signal-BLAST are transparent and easy to analyze. AVAILABILITY: Signal-BLAST is available online at http://sigpep.services.came.sbg.ac.at/signalblast.html.  相似文献   

11.
Multiple sequence alignment is typically the first step in estimating phylogenetic trees, with the assumption being that as alignments improve, so will phylogenetic reconstructions. Over the last decade or so, new multiple sequence alignment methods have been developed to improve comparative analyses of protein structure, but these new methods have not been typically used in phylogenetic analyses. In this paper, we report on a simulation study that we performed to evaluate the consequences of using these new multiple sequence alignment methods in terms of the resultant phylogenetic reconstruction. We find that while alignment accuracy is positively correlated with phylogenetic accuracy, the amount of improvement in phylogenetic estimation that results from an improved alignment can range from quite small to substantial. We observe that phylogenetic accuracy is most highly correlated with alignment accuracy when sequences are most difficult to align, and that variation in alignment accuracy can have little impact on phylogenetic accuracy when alignment error rates are generally low. We discuss these observations and implications for future work.  相似文献   

12.
The Smith-Waterman (SW) algorithm is a typical technique for local sequence alignment in computational biology. However, the SW algorithm does not consider the local behaviours of the amino acids, which may result in loss of some useful information. Inspired by the success of Markov Edit Distance (MED) method, this paper therefore proposes a novel Markov pairwise protein sequence alignment (MPPSA) method that takes the local context dependencies into consideration. The numerical results have shown its superiority to the SW for pairwise protein sequence comparison.  相似文献   

13.
Based on the observation that a single mutational event can delete or insert multiple residues, affine gap costs for sequence alignment charge a penalty for the existence of a gap, and a further length-dependent penalty. From structural or multiple alignments of distantly related proteins, it has been observed that conserved residues frequently fall into ungapped blocks separated by relatively nonconserved regions. To take advantage of this structure, a simple generalization of affine gap costs is proposed that allows nonconserved regions to be effectively ignored. The distribution of scores from local alignments using these generalized gap costs is shown empirically to follow an extreme value distribution. Examples are presented for which generalized affine gap costs yield superior alignments from the standpoints both of statistical significance and of alignment accuracy. Guidelines for selecting generalized affine gap costs are discussed, as is their possible application to multiple alignment. Proteins 32:88–96, 1998. Published 1998 Wiley-Liss, Inc.
  • 1 This article is a US government work and, as such, is in the public domain in the United States of America.
  •   相似文献   

    14.
    Extracting protein alignment models from the sequence database.   总被引:16,自引:2,他引:14       下载免费PDF全文
    Biologists often gain structural and functional insights into a protein sequence by constructing a multiple alignment model of the family. Here a program called Probe fully automates this process of model construction starting from a single sequence. Central to this program is a powerful new method to locate and align only those, often subtly, conserved patterns essential to the family as a whole. When applied to randomly chosen proteins, Probe found on average about four times as many relationships as a pairwise search and yielded many new discoveries. These include: an obscure subfamily of globins in the roundworm Caenorhabditis elegans ; two new superfamilies of metallohydrolases; a lipoyl/biotin swinging arm domain in bacterial membrane fusion proteins; and a DH domain in the yeast Bud3 and Fus2 proteins. By identifying distant relationships and merging families into superfamilies in this way, this analysis further confirms the notion that proteins evolved from relatively few ancient sequences. Moreover, this method automatically generates models of these ancient conserved regions for rapid and sensitive screening of sequences.  相似文献   

    15.
    Identification of protein sequence homology by consensus template alignment   总被引:26,自引:0,他引:26  
    A pattern-matching procedure is described, based on fitting templates to the sequence, which allows general structural constraints to be imposed on the patterns identified. The templates correspond to structurally conserved regions of the sequence and were initially derived from a small number of related sequences whose tertiary structures are known. The templates were then made more representative by aligning other sequences of unknown structure. Two alignments were built up containing 100 immunoglobulin variable domain sequences and 85 constant domain sequences, respectively. From each of these extended alignments, templates were generated to represent features conserved in all the sequences. These consisted mainly of patterns of hydrophobicity associated with beta-structure. For structurally conserved beta-strands with no conserved features, templates based on general secondary structure prediction principles were used to identify their possible locations. The specificity of the templates was demonstrated by their ability to identify the conserved features in known immunoglobulin and immunoglobulin-related sequences but not in other non-immunoglobulin sequences.  相似文献   

    16.
    Clustal W—蛋白质与核酸序列分析软件   总被引:2,自引:1,他引:2  
    蛋白质与核酸的序列分析在现代生物学和生物信息学中发挥着重要作用,新的算法和软件层出不穷,本文介绍一个可运行在PC机上的完全免费的多序列比较软件-ClustalW,它不但可以进行蛋白质与核酸的多序列比较,分析不同序列之间的相似性关系,还可以绘制进化树。由于其灵活的输入输出格式、方便的参数设定和选择、详尽的在线帮助以及良好的可移植性,使得ClustalW在蛋白质与核酸的序列分析中得到了广泛应用。  相似文献   

    17.
    Multiple sequence alignment (MSA) is one of the most fundamental problems in computational molecular biology. The running time of the best known scheme for finding an optimal alignment, based on dynamic programming, increases exponentially with the number of input sequences. Hence, many heuristics were suggested for the problem. We consider a version of the MSA problem where the goal is to find an optimal alignment in which matches are restricted to positions in predefined matching segments. We present several techniques for making the dynamic programming algorithm more efficient, while still finding an optimal solution under these restrictions. We prove that it suffices to find an optimal alignment of the predefined sequence segments, rather than single letters, thereby reducing the input size and thus improving the running time. We also identify "shortcuts" that expedite the dynamic programming scheme. Empirical study shows that, taken together, these observations lead to an improved running time over the basic dynamic programming algorithm by 4 to 12 orders of magnitude, while still obtaining an optimal solution. Under the additional assumption that matches between segments are transitive, we further improve the running time for finding the optimal solution by restricting the search space of the dynamic programming algorithm  相似文献   

    18.
    Motivation: The nucleotide sequencing process produces not onlythe sequence of nucleotides, but also associated quality values.Quality values provide valuable information, but are primarilyused only for trimming sequences and generally ignored in subsequentanalyses. Results: This article describes how the scoring schemes of standardalignment algorithms can be modified to take into account qualityvalues to produce improved alignments and statistically moreaccurate scores. A prototype implementation is also provided,and used to post-process a set of BLAST results. Quality-adjustedalignment is a natural extension of standard alignment methods,and can be implemented with only a small constant factor performancepenalty. The method can also be applied to related methods includingheuristic search algorithms like BLAST and FASTA. Availability: Software is available at http://malde.org/~ketil/qaa. Contact: ketil.malde{at}imr.no Supplementary information: Supplementary data are availableat Bioinformatics online. Associate Editor: Limsoon Wong  相似文献   

    19.
    This paper describes a multiple alignment method using a workstationand supercomputer. The method is based on the alignment of aset of aligned sequences with the new sequence, and uses a recursiveprocedure of such alignment. The alignment is executed in areasonable computation time on diverse levels from a workstationto a supercomputer, from the viewpoint of alignment resultsand computational speed by parallel processing. The applicationof the algorithm is illustrated by several examples of multiplealignment of 12 amino acid and DNA sequences of HIV (human immunodeficiencyvirus) env genes. Colour graphic programs on a workstation andparallel processing on a supercomputer are discussed. Received on April 26, 1988; accepted on July 7, 1988  相似文献   

    20.
    The most popular algorithms employed in the pairwise alignment of protein primary structures (Smith-Watermann (SW) algorithm, FASTA, BLAST, etc.) only analyze the amino acid sequence. The SW algorithm is the most accurate, yielding alignments that agree best with superimpositions of the corresponding spatial structures of proteins. However, even the SW algorithm fails to reproduce the spatial structure alignment when the sequence identity is lower than 30%. The objective of this work was to develop a new and more accurate algorithm taking the secondary structure of proteins into account. The alignments generated by this algorithm and having the maximal weight with the secondary structure considered proved to be more accurate than SW alignments. With sequences having less than 30% identity, the accuracy (i.e., the portion of reproduced positions of a reference alignment obtained by superimposing the protein spatial structures) of the new algorithm is 58 vs. 35% of the SW algorithm. The accuracy of the new algorithm is much the same with secondary structures established experimentally or predicted theoretically. Hence, the algorithm is applicable to proteins with unknown spatial structures. The program is available at ftp://194.149.64.196/STRUSWER/.  相似文献   

    设为首页 | 免责声明 | 关于勤云 | 加入收藏

    Copyright©北京勤云科技发展有限公司  京ICP备09084417号