首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
All popular algorithms of pair-wise alignment of protein primary structures (e.g. Smith-Waterman (SW), FASTA, BLAST, et al.) utilize only amino acid sequences. The SW-algorithm is the most accurate among them, i.e. it produces alignments that are most similar to the alignments obtained by superposition of protein 3D-structures. But even the SW-algorithm is unable to restore the 3D-based alignment if similarity of amino acid sequences (%id) is below 30%. We have proposed a novel alignment method that explicitly takes into account the secondary structure of the compared proteins. We have shown that it creates significantly more accurate alignments compared to SW-algorithm. In particular, for sequences with %id < 30% the average accuracy of the new method is 58% compared to 35% for SW-algorithm (the accuracy of an algorithmic sequence alignment is the part of restored position of a "golden standard" alignment obtained by superposition of corresponding 3D-structures). The accuracy of the proposed method is approximately identical both for experimental, and for theoretically predicted secondary structures. Thus the method can be applied for alignment of protein sequences even if protein 3D-structure is unknown. The program is available at ftp://194.149.64.196/STRUSWER/.  相似文献   

2.
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain.  相似文献   

3.
The accuracy of the global Smith-Waterman alignments and Pareto-optimal alignments depending on the degree of sequence similarity (percent of coincidence, % id, and the number of remote fragments NGap) has been examined. An algorithm for constructing a set of three to six alignments has been developed of which the accuracy of the best alignment exceeds on the average the accuracy of the best alignment that can be constructed using the Smith-Waterman algorithm. For weakly homologous sequences (% id 15, NGap 20), the increase in the accuracy is on the average about 8%, with the average accuracy of the global Smith-Waterman alignments being about 38% (the accuracy was estimated on model test sets).  相似文献   

4.
For applications such as comparative modelling one major issue is the reliability of sequence alignments. Reliable regions in alignments can be predicted using sub-optimal alignments of the same pair of sequences. Here we show that reliable regions in alignments can also be predicted from multiple sequence profile information alone.Alignments were created for a set of remotely related pairs of proteins using five different test methods. Structural alignments were used to assess the quality of the alignments and the aligned positions were scored using information from the observed frequencies of amino acid residues in sequence profiles pre-generated for each template structure. High-scoring regions of these profile-derived alignment scores were a good predictor of reliably aligned regions.These profile-derived alignment scores are easy to obtain and are applicable to any alignment method. They can be used to detect those regions of alignments that are reliably aligned and to help predict the quality of an alignment. For those residues within secondary structure elements, the regions predicted as reliably aligned agreed with the structural alignments for between 92% and 97.4% of the residues. In loop regions just under 92% of the residues predicted to be reliable agreed with the structural alignments. The percentage of residues predicted as reliable ranged from 32.1% for helix residues to 52.8% for strand residues.This information could also be used to help predict conserved binding sites from sequence alignments. Residues in the template that were identified as binding sites, that aligned to an identical amino acid residue and where the sequence alignment agreed with the structural alignment were in highly conserved, high scoring regions over 80% of the time. This suggests that many binding sites that are present in both target and template sequences are in sequence-conserved regions and that there is the possibility of translating reliability to binding site prediction.  相似文献   

5.
Qiu J  Elber R 《Proteins》2006,62(4):881-891
In template-based modeling of protein structures, the generation of the alignment between the target and the template is a critical step that significantly affects the accuracy of the final model. This paper proposes an alignment algorithm SSALN that learns substitution matrices and position-specific gap penalties from a database of structurally aligned protein pairs. In addition to the amino acid sequence information, secondary structure and solvent accessibility information of a position are used to derive substitution scores and position-specific gap penalties. In a test set of CASP5 targets, SSALN outperforms sequence alignment methods such as a Smith-Waterman algorithm with BLOSUM50 and PSI_BLAST. SSALN also generates better alignments than PSI_BLAST in the CASP6 test set. LOOPP server prediction based on an SSALN alignment is ranked the best for target T0280_1 in CASP6. SSALN is also compared with several threading methods and sequence alignment methods on the ProSup benchmark. SSALN has the highest alignment accuracy among the methods compared. On the Fischer's benchmark, SSALN performs better than CLUSTALW and GenTHREADER, and generates more alignments with accuracy >50%, >60% or >70% than FUGUE, but fewer alignments with accuracy >80% than FUGUE. All the supplemental materials can be found at http://www.cs.cornell.edu/ approximately jianq/research.htm.  相似文献   

6.
The most popular algorithms employed in the pairwise alignment of protein primary structures (Smith-Watermann (SW) algorithm, FASTA, BLAST, etc.) only analyze the amino acid sequence. The SW algorithm is the most accurate, yielding alignments that agree best with superimpositions of the corresponding spatial structures of proteins. However, even the SW algorithm fails to reproduce the spatial structure alignment when the sequence identity is lower than 30%. The objective of this work was to develop a new and more accurate algorithm taking the secondary structure of proteins into account. The alignments generated by this algorithm and having the maximal weight with the secondary structure considered proved to be more accurate than SW alignments. With sequences having less than 30% identity, the accuracy (i.e., the portion of reproduced positions of a reference alignment obtained by superimposing the protein spatial structures) of the new algorithm is 58 vs. 35% of the SW algorithm. The accuracy of the new algorithm is much the same with secondary structures established experimentally or predicted theoretically. Hence, the algorithm is applicable to proteins with unknown spatial structures. The program is available at ftp://194.149.64.196/STRUSWER/.  相似文献   

7.
The accuracy of global Smith-Waterman alignments and Pareto-optimal alignments depending on the degree of sequence similarity (percent of coincidence, %id, and the number of removed fragments NGap) has been examined. An algorithm for constructing a set of three to six alignments has been developed of which the best alignment on the average exceeds in accuracy the best alignment that can be constructed using the Smith-Waterman algorithm. For weakly homologous sequences (%id 15, NGap 20), the increase in accuracy is on the average about 8%, with the average accuracy of the global Smith-Waterman alignments being about 38% (the accuracy was estimated on model test sets).  相似文献   

8.
Profile search methods based on protein domain alignments have proven to be useful tools in comparative sequence analysis. Domain alignments used by currently available search methods have been computed by sequence comparison. With the growth of the protein structure database, however, alignments of many domain pairs have also been computed by structure comparison. Here, we examine the extent to which information from these two sources agrees. We measure agreement with respect to identification of homologous regions in each protein, that is, with respect to the location of domain boundaries. We also measure agreement with respect to identification of homologous residue sites by comparing alignments and assessing the accuracy of the molecular models they predict. We find that domain alignments in publicly available collections based on sequence and structure comparison are largely consistent. However, the homologous regions identified by sequence comparison are often shorter than those identified by 3D structure comparison. In addition, when overall sequence similarity is low alignments from sequence comparison produce less accurate molecular models, suggesting that they less accurately identify homologous sites. These observations suggest that structure comparison results might be used to improve the overall accuracy of domain alignment collections and the performance of profile search methods based on them.  相似文献   

9.
STRUCTFAST is a novel profile-profile alignment algorithm capable of detecting weak similarities between protein sequences. The increased sensitivity and accuracy of the STRUCTFAST method are achieved through several unique features. First, the algorithm utilizes a novel dynamic programming engine capable of incorporating important information from a structural family directly into the alignment process. Second, the algorithm employs a rigorous analytical formula for profile-profile scoring to overcome the limitations of ad hoc scoring functions that require adjustable parameter training. Third, the algorithm employs Convergent Island Statistics (CIS) to compute the statistical significance of alignment scores independently for each pair of sequences. STRUCTFAST routinely produces alignments that meet or exceed the quality obtained by an expert human homology modeler, as evidenced by its performance in the latest CAFASP4 and CASP6 blind prediction benchmark experiments.  相似文献   

10.
Sequence comparison methods based on position-specific score matrices (PSSMs) have proven a useful tool for recognition of the divergent members of a protein family and for annotation of functional sites. Here we investigate one of the factors that affects overall performance of PSSMs in a PSI-BLAST search, the algorithm used to construct the seed alignment upon which the PSSM is based. We compare PSSMs based on alignments constructed by global sequence similarity (ClustalW and ClustalW-pairwise), local sequence similarity (BLAST), and local structure similarity (VAST). To assess performance with respect to identification of conserved functional or structural sites, we examine the accuracy of the three-dimensional molecular models predicted by PSSM-sequence alignments. Using the known structures of those sequences as the standard of truth, we find that model accuracy varies with the algorithm used for seed alignment construction in the pattern local-structure (VAST) > local-sequence (BLAST) > global-sequence (ClustalW). Using structural similarity of query and database proteins as the standard of truth, we find that PSSM recognition sensitivity depends primarily on the diversity of the sequences included in the alignment, with an optimum around 30-50% average pairwise identity. We discuss these observations, and suggest a strategy for constructing seed alignments that optimize PSSM-sequence alignment accuracy and recognition sensitivity.  相似文献   

11.
Most bioinformatics analyses require the assembly of a multiple sequence alignment. It has long been suspected that structural information can help to improve the quality of these alignments, yet the effect of combining sequences and structures has not been evaluated systematically. We developed 3DCoffee, a novel method for combining protein sequences and structures in order to generate high-quality multiple sequence alignments. 3DCoffee is based on TCoffee version 2.00, and uses a mixture of pairwise sequence alignments and pairwise structure comparison methods to generate multiple sequence alignments. We benchmarked 3DCoffee using a subset of HOMSTRAD, the collection of reference structural alignments. We found that combining TCoffee with the threading program Fugue makes it possible to improve the accuracy of our HOMSTRAD dataset by four percentage points when using one structure only per dataset. Using two structures yields an improvement of ten percentage points. The measures carried out on HOM39, a HOMSTRAD subset composed of distantly related sequences, show a linear correlation between multiple sequence alignment accuracy and the ratio of number of provided structure to total number of sequences. Our results suggest that in the case of distantly related sequences, a single structure may not be enough for computing an accurate multiple sequence alignment.  相似文献   

12.
The review considers the original works on the primary structure of biopolymers, which were carried out from 1983 to 2003. Most works were supported by the Russian program Human Genome and earlier similar Russian programs. Little-known publications of 1983-1993 and recent unpublished results are described in detail. In the field of genome comparisons, these concern the OWEN hierarchic algorithm aligning syntenic regions of two genome sequences. The resulting global alignment is obtained as an ordered chain of local similarities. Alignment of sequences sized about 10(6) nucleotides takes several minutes. The concept of local similarity conflicts is generalized to multiple comparisons. New algorithms aligning protein sequences are described and compared with the Smith-Waterman algorithm, which is now most accurate. The ANCHOR hierarchic algorithm generates alignments of much the same accuracy and is twice as rapid as the Smith-Waterman one. The STRSWer algorithm takes an account of the secondary structures of proteins under study. With the secondary structures predicted using the PSI-PRED software for pairs of proteins having 10-30% similarity, the average accuracy of alignments generated by STRSWer is 15% higher than that achieved with the Smith-Waterman algorithm.  相似文献   

13.
Wang J  Feng JA 《Proteins》2005,58(3):628-637
Sequence alignment has become one of the essential bioinformatics tools in biomedical research. Existing sequence alignment methods can produce reliable alignments for homologous proteins sharing a high percentage of sequence identity. The performance of these methods deteriorates sharply for the sequence pairs sharing less than 25% sequence identity. We report here a new method, NdPASA, for pairwise sequence alignment. This method employs neighbor-dependent propensities of amino acids as a unique parameter for alignment. The values of neighbor-dependent propensity measure the preference of an amino acid pair adopting a particular secondary structure conformation. NdPASA optimizes alignment by evaluating the likelihood of a residue pair in the query sequence matching against a corresponding residue pair adopting a particular secondary structure in the template sequence. Using superpositions of homologous proteins derived from the PSI-BLAST analysis and the Structural Classification of Proteins (SCOP) classification of a nonredundant Protein Data Bank (PDB) database as a gold standard, we show that NdPASA has improved pairwise alignment. Statistical analyses of the performance of NdPASA indicate that the introduction of sequence patterns of secondary structure derived from neighbor-dependent sequence analysis clearly improves alignment performance for sequence pairs sharing less than 20% sequence identity. For sequence pairs sharing 13-21% sequence identity, NdPASA improves the accuracy of alignment over the conventional global alignment (GA) algorithm using the BLOSUM62 by an average of 8.6%. NdPASA is most effective for aligning query sequences with template sequences whose structure is known. NdPASA can be accessed online at http://astro.temple.edu/feng/Servers/BioinformaticServers.htm.  相似文献   

14.
The PSI-BLAST algorithm has been acknowledged as one of the most powerful tools for detecting remote evolutionary relationships by sequence considerations only. This has been demonstrated by its ability to recognize remote structural homologues and by the greatest coverage it enables in annotation of a complete genome. Although recognizing the correct fold of a sequence is of major importance, the accuracy of the alignment is crucial for the success of modeling one sequence by the structure of its remote homologue. Here we assess the accuracy of PSI-BLAST alignments on a stringent database of 123 structurally similar, sequence-dissimilar pairs of proteins, by comparing them to the alignments defined on a structural basis. Each protein sequence is compared to a nonredundant database of the protein sequences by PSI-BLAST. Whenever a pair member detects its pair-mate, the positions that are aligned both in the sequential and structural alignments are determined, and the alignment sensitivity is expressed as the percentage of these positions out of the structural alignment. Fifty-two sequences detected their pair-mates (for 16 pairs the success was bi-directional when either pair member was used as a query). The average percentage of correctly aligned residues per structural alignment was 43.5+/-2.2%. Other properties of the alignments were also examined, such as the sensitivity vs. specificity and the change in these parameters over consecutive iterations. Notably, there is an improvement in alignment sensitivity over consecutive iterations, reaching an average of 50.9+/-2.5% within the five iterations tested in the current study.  相似文献   

15.
Rai BK  Fiser A 《Proteins》2006,63(3):644-661
A major bottleneck in comparative protein structure modeling is the quality of input alignment between the target sequence and the template structure. A number of alignment methods are available, but none of these techniques produce consistently good solutions for all cases. Alignments produced by alternative methods may be superior in certain segments but inferior in others when compared to each other; therefore, an accurate solution often requires an optimal combination of them. To address this problem, we have developed a new approach, Multiple Mapping Method (MMM). The algorithm first identifies the alternatively aligned regions from a set of input alignments. These alternatively aligned segments are scored using a composite scoring function, which determines their fitness within the structural environment of the template. The best scoring regions from a set of alternative segments are combined with the core part of the alignments to produce the final MMM alignment. The algorithm was tested on a dataset of 1400 protein pairs using 11 combinations of two to four alignment methods. In all cases MMM showed statistically significant improvement by reducing alignment errors in the range of 3 to 17%. MMM also compared favorably over two alignment meta-servers. The algorithm is computationally efficient; therefore, it is a suitable tool for genome scale modeling studies.  相似文献   

16.
Peng J  Xu J 《Proteins》2011,79(6):1930-1939
Most threading methods predict the structure of a protein using only a single template. Due to the increasing number of solved structures, a protein without solved structure is very likely to have more than one similar template structures. Therefore, a natural question to ask is if we can improve modeling accuracy using multiple templates. This article describes a new multiple-template threading method to answer this question. At the heart of this multiple-template threading method is a novel probabilistic-consistency algorithm that can accurately align a single protein sequence simultaneously to multiple templates. Experimental results indicate that our multiple-template method can improve pairwise sequence-template alignment accuracy and generate models with better quality than single-template models even if they are built from the best single templates (P-value <10(-6)) while many popular multiple sequence/structure alignment tools fail to do so. The underlying reason is that our probabilistic-consistency algorithm can generate accurate multiple sequence/template alignments. In another word, without an accurate multiple sequence/template alignment, the modeling accuracy cannot be improved by simply using multiple templates to increase alignment coverage. Blindly tested on the CASP9 targets with more than one good template structures, our method outperforms all other CASP9 servers except two (Zhang-Server and QUARK of the same group). Our probabilistic-consistency algorithm can possibly be extended to align multiple protein/RNA sequences and structures.  相似文献   

17.
An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian‐weighted RMSD (wRMSD) tool with a sequence aligner and seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD overlay. SE can generate a corrected sequence alignment from the improved structural superposition obtained by wRMSD. HwRMSD's robust performance and its superiority over standard RMSD are demonstrated over a range of homologous proteins. Its better overlay results in corrected sequence alignments with good agreement to HOMSTRAD. Finally, HwRMSD is compared to established structural alignment methods: FATCAT, secondary‐structure matching, combinatorial extension, and Dalilite. Most methods are comparable at placing residue pairs within 2 Å, but HwRMSD places many more residue pairs within 1 Å, providing a clear advantage. Such high accuracy is essential in drug design, where small distances can have a large impact on computational predictions. This level of accuracy is also needed to correct sequence alignments in an automated fashion, especially for omics‐scale analysis. HwRMSD can align homologs with low‐sequence identity and large conformational differences, cases where both sequence‐based and structural‐based methods may fail. The HwRMSD pipeline overcomes the dependency of structural overlays on initial sequence pairing and removes the need to determine the best sequence‐alignment method, substitution matrix, and gap parameters for each unique pair of homologs. Proteins 2012. © 2012 Wiley Periodicals, Inc.  相似文献   

18.
Multiple sequence alignments are essential in computational analysis of protein sequences and structures, with applications in structure modeling, functional site prediction, phylogenetic analysis and sequence database searching. Constructing accurate multiple alignments for divergent protein sequences remains a difficult computational task, and alignment speed becomes an issue for large sequence datasets. Here, I review methodologies and recent advances in the multiple protein sequence alignment field, with emphasis on the use of additional sequence and structural information to improve alignment quality.  相似文献   

19.
A structure-based method for protein sequence alignment   总被引:1,自引:0,他引:1  
MOTIVATION: With the continuing rapid growth of protein sequence data, protein sequence comparison methods have become the most widely used tools of bioinformatics. Among these methods are those that use position-specific scoring matrices (PSSMs) to describe protein families. PSSMs can capture information about conserved patterns within families, which can be used to increase the sensitivity of searches for related sequences. Certain types of structural information, however, are not generally captured by PSSM search methods. Here we introduce a program, Structure-based ALignment TOol (SALTO), that aligns protein query sequences to PSSMs using rules for placing and scoring gaps that are consistent with the conserved regions of domain alignments from NCBI's Conserved Domain Database. RESULTS: In most cases, the alignment scores obtained using the local alignment version follow an extreme value distribution. SALTO's performance in finding related sequences and producing accurate alignments is similar to or better than that of IMPALA; one advantage of SALTO is that it imposes an explicit gapping model on each protein family. AVAILABILITY: A stand-alone version of the program that can generate global or local alignments is available by ftp distribution (ftp://ftp.ncbi.nih.gov/pub/SALTO/), and has been incorporated to Cn3D structure/alignment viewer. CONTACT: bryant@ncbi.nlm.nih.gov.  相似文献   

20.
Alignment of protein sequences by their profiles   总被引:7,自引:0,他引:7  
The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号