首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 78 毫秒
1.
Two new methods for the visualization of structural similarity in proteins with known three-dimensional structures are presented. They are based on the degree of equivalence of α-carbon pairs in two proteins. The quantitative measure for residue equivalence is the comparison score generated using the sequence and structure alignment method of Taylor and Orengo, which is based on the comparison of interatomic distances (and other properties that can be defined on a residue basis).The first method uses information on corresponding α-carbon positions to display vectors joining these structurally equivalent residues. These vectors can be defined as target constraints, and their minimization “bends” the two proteins toward a common average structure. In the average structure the corresponding residues virtually superpose, while insertions and deletions become clearly visible.The second method uses the comparison scores to perform a weighted least-squares fit of the two structures. It is further used to color code the two structures according to the score value, i.e., their similarity, on a continuous scale from red to blue. Examples of the methods for the comparison of flavodoxin, chemotaxis Y protein and L-arabinose-binding protein are given.  相似文献   

2.
We present a comprehensive evaluation of a new structure mining method called PB-ALIGN. It is based on the encoding of protein structure as 1D sequence of a combination of 16 short structural motifs or protein blocks (PBs). PBs are short motifs capable of representing most of the local structural features of a protein backbone. Using derived PB substitution matrix and simple dynamic programming algorithm, PB sequences are aligned the same way amino acid sequences to yield structure alignment. PBs are short motifs capable of representing most of the local structural features of a protein backbone. Alignment of these local features as sequence of symbols enables fast detection of structural similarities between two proteins. Ability of the method to characterize and align regions beyond regular secondary structures, for example, N and C caps of helix and loops connecting regular structures, puts it a step ahead of existing methods, which strongly rely on secondary structure elements. PB-ALIGN achieved efficiency of 85% in extracting true fold from a large database of 7259 SCOP domains and was successful in 82% cases to identify true super-family members. On comparison to 13 existing structure comparison/mining methods, PB-ALIGN emerged as the best on general ability test dataset and was at par with methods like YAKUSA and CE on nontrivial test dataset. Furthermore, the proposed method performed well when compared to flexible structure alignment method like FATCAT and outperforms in processing speed (less than 45 s per database scan). This work also establishes a reliable cut-off value for the demarcation of similar folds. It finally shows that global alignment scores of unrelated structures using PBs follow an extreme value distribution. PB-ALIGN is freely available on web server called Protein Block Expert (PBE) at http://bioinformatics.univ-reunion.fr/PBE/.  相似文献   

3.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two lettered code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-lettered code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (greater than 45%), STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity of 16% was improved from 0 to 37% 'correct' alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.  相似文献   

4.
Most bioinformatics analyses require the assembly of a multiple sequence alignment. It has long been suspected that structural information can help to improve the quality of these alignments, yet the effect of combining sequences and structures has not been evaluated systematically. We developed 3DCoffee, a novel method for combining protein sequences and structures in order to generate high-quality multiple sequence alignments. 3DCoffee is based on TCoffee version 2.00, and uses a mixture of pairwise sequence alignments and pairwise structure comparison methods to generate multiple sequence alignments. We benchmarked 3DCoffee using a subset of HOMSTRAD, the collection of reference structural alignments. We found that combining TCoffee with the threading program Fugue makes it possible to improve the accuracy of our HOMSTRAD dataset by four percentage points when using one structure only per dataset. Using two structures yields an improvement of ten percentage points. The measures carried out on HOM39, a HOMSTRAD subset composed of distantly related sequences, show a linear correlation between multiple sequence alignment accuracy and the ratio of number of provided structure to total number of sequences. Our results suggest that in the case of distantly related sequences, a single structure may not be enough for computing an accurate multiple sequence alignment.  相似文献   

5.
Kifer I  Nussinov R  Wolfson HJ 《Proteins》2008,73(2):380-394
How a one-dimensional protein sequence folds into a specific 3D structure remains a difficult challenge in structural biology. Many computational methods have been developed in an attempt to predict the tertiary structure of the protein; most of these employ approaches that are based on the accumulated knowledge of solved protein structures. Here we introduce a novel and fully automated approach for predicting the 3D structure of a protein that is based on the well accepted notion that protein folding is a hierarchical process. Our algorithm follows the hierarchical model by employing two stages: the first aims to find a match between the sequences of short independently-folding structural entities and parts of the target sequence and assigns the respective structures. The second assembles these local structural parts into a complete 3D structure, allowing for long-range interactions between them. We present the results of applying our method to a subset of the targets from CASP6 and CASP7. Our results indicate that for targets with a significant sequence similarity to known structures we are often able to provide predictions that are better than those achieved by two leading servers, and that the most significant improvements in comparison with these methods occur in regions of a gapped structural alignment between the native structure and the closest available structural template. We conclude that in addition to performing well for targets with known homologous structures, our method shows great promise for addressing the more general category of comparative modeling targets, which is our next goal.  相似文献   

6.
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality.  相似文献   

7.
The profile method, for detecting distantly related proteins by sequence comparison, has been extended to incorporate secondary structure information from known X-ray structures. The sequence of a known structure is aligned to sequences of other members of a given folding class. From the known structure, the secondary structure (alpha-helix, beta-strand or "other") is assigned to each position of the aligned sequences. As in the standard profile method, a position-dependent scoring table, termed a profile, is calculated from the aligned sequences. However, rather than using the standard Dayhoff mutation table in calculating the profile, we use distinct amino acid mutation tables for residues in alpha-helices, beta-strands or other secondary structures to calculate the profile. In addition, we also distinguish between internal and external residues. With this new secondary structure-based profile method, we created a profile for eight-stranded, antiparallel beta barrels of the insecticyanin folding class. It is based on the sequences of retinol-binding protein, insecticyanin and beta-lactoglobulin. Scanning the sequence database with this profile, it was possible to detect the sequence of avidin. The structure of streptavidin is known, and it appears to be distantly related to the antiparallel beta barrels. Also detected is the sequence of complement component C8, which we therefore predict to be a member of this folding class.  相似文献   

8.
We present a new method for predicting the secondary structure of globular proteins based on non-linear neural network models. Network models learn from existing protein structures how to predict the secondary structure of local sequences of amino acids. The average success rate of our method on a testing set of proteins non-homologous with the corresponding training set was 64.3% on three types of secondary structure (alpha-helix, beta-sheet, and coil), with correlation coefficients of C alpha = 0.41, C beta = 0.31 and Ccoil = 0.41. These quality indices are all higher than those of previous methods. The prediction accuracy for the first 25 residues of the N-terminal sequence was significantly better. We conclude from computational experiments on real and artificial structures that no method based solely on local information in the protein sequence is likely to produce significantly better results for non-homologous proteins. The performance of our method of homologous proteins is much better than for non-homologous proteins, but is not as good as simply assuming that homologous sequences have identical structures.  相似文献   

9.
Finding structural similarities between proteins often helps reveal shared functionality, which otherwise might not be detected by native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure using a sequence of motifs derived from a structural alphabet. Using this approach, structure comparison is performed by comparing the corresponding motif sequences or structural sequences. In this article, we measure the performance of such alphabets in the context of the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the best matching fragment to the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain that matches the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs or protein fragments having four residues. We show that the global structural sequences approximate well the native structures of proteins, with an average coordinate root mean square of 0.69 Å over 2225 test proteins. The approximation is best for all α-proteins, while relatively poorer for all β-proteins. We then test the performance of four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library) with different classifiers in their ability to classify proteins that belong to five distinct folds of CATH. Without surprise, the primary sequence alone performs poorly as a structure classifier. We show that addition of either secondary-structure information or local information from the structural sequence considerably improves the classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence but not well enough at this stage to be a viable alternative to more computationally intensive methods based on protein structure alignment.  相似文献   

10.
C Sander  R Schneider 《Proteins》1991,9(1):56-68
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.  相似文献   

11.
An algorithm is presented for the multiple alignment of protein sequences that is both accurate and rapid computationally. The approach is based on the conventional dynamic-programming method of pairwise alignment. Initially, two sequences are aligned, then the third sequence is aligned against the alignment of both sequences one and two. Similarly, the fourth sequence is aligned against one, two and three. This is repeated until all sequences have been aligned. Iteration is then performed to yield a final alignment. The accuracy of sequence alignment is evaluated from alignment of the secondary structures in a family of proteins. For the globins, the multiple alignment was on average 99% accurate compared to 90% for pairwise comparison of sequences. For the alignment of immunoglobulin constant and variable domains, the use of many sequences yielded an alignment of 63% average accuracy compared to 41% average for individual variable/constant alignments. The multiple alignment algorithm yields an assignment of disulphide connectivity in mammalian serotransferrin that is consistent with crystallographic data, whereas pairwise alignments give an alternative assignment.  相似文献   

12.
Summary A structure-based scoring matrix MDPRE was derived from amino acid spatial preferences in protein structures. Sequence alignment and evolutionary studies by using MDPRE matrix gave similar results as those from ordinary sequence and structure alignments. It is interesting that a matrix derived from structure data solely could give comparable alignment results, strongly indicating the intimate connection between protein sequences and structures. The branch order and length from this approach were close to those obtained by a structure comparison method. Thus, by applying this structure-based matrix, the trees obtained should reflect evolutionary characteristics of protein structure. This approach takes advantage over a direct structure comparison in that (1) only a sequence and MDPRE matrix are needed, making it simple and widely applicable (especially in the absence of 3-dimensional protein structure data); (2) an established algorithm for sequence alignment and tree building could be employed, providing opportunities for direct comparison between matrices from different methodologies. One of the most striking features of this method is its capability to detect protein structure homologies when the sequence identities are low. This was well reflected in the given examples of the alignment of dinucleotidebinding domains.  相似文献   

13.
Protein fold recognition using sequence-derived predictions.   总被引:18,自引:9,他引:9       下载免费PDF全文
In protein fold recognition, one assigns a probe amino acid sequence of unknown structure to one of a library of target 3D structures. Correct assignment depends on effective scoring of the probe sequence for its compatibility with each of the target structures. Here we show that, in addition to the amino acid sequence of the probe, sequence-derived properties of the probe sequence (such as the predicted secondary structure) are useful in fold assignment. The additional measure of compatibility between probe and target is the level of agreement between the predicted secondary structure of the probe and the known secondary structure of the target fold. That is, we recommend a sequence-structure compatibility function that combines previously developed compatibility functions (such as the 3D-1D scores of Bowie et al. [1991] or sequence-sequence replacement tables) with the predicted secondary structure of the probe sequence. The effect on fold assignment of adding predicted secondary structure is evaluated here by using a benchmark set of proteins (Fischer et al., 1996a). The 3D structures of the probe sequences of the benchmark are actually known, but are ignored by our method. The results show that the inclusion of the predicted secondary structure improves fold assignment by about 25%. The results also show that, if the true secondary structure of the probe were known, correct fold assignment would increase by an additional 8-32%. We conclude that incorporating sequence-derived predictions significantly improves assignment of sequences to known 3D folds. Finally, we apply the new method to assign folds to sequences in the SWISSPROT database; six fold assignments are given that are not detectable by standard sequence-sequence comparison methods; for two of these, the fold is known from X-ray crystallography and the fold assignment is correct.  相似文献   

14.
Yau SS  Yu C  He R 《DNA and cell biology》2008,27(5):241-250
Graphical representation of gene sequences provides a simple way of viewing, sorting, and comparing various gene structures. Here we first report a two-dimensional graphical representation for protein sequences. With this method, we constructed the moment vectors for protein sequences, and mathematically proved that the correspondence between moment vectors and protein sequences is one-to-one. Therefore, each protein sequence can be represented as a point in a map, which we call protein map, and cluster analysis can be used for comparison between the points. Sixty-six proteins from five protein families were analyzed using this method. Our data showed that for proteins in the same family, their corresponding points in the map are close to each other. We also illustrate the efficiency of this approach by performing an extensive cluster analysis of the protein kinase C family. These results indicate that this protein map could be used to mathematically specify the similarity of two proteins and predict properties of an unknown protein based on its amino acid sequence.  相似文献   

15.
Classification of newly determined protein structures is important in understanding their function and mechanism of action. Currently available methods employ a global structure alignment strategy and are computationally expensive. We propose a two-step methodology with a quick screen to significantly reduce the number of candidate structures followed by global structure alignment of the query structure with the reduced set. We represent a protein structure as a sequence of local structures, codified in the form of geometric invariants. Geometric invariants are quantities that remain unchanged under transformations such as translation and rotation. Protein structures represented as multi-attribute sequences are aligned via dynamic programming to identify close neighbors of the query structure. The query structure is then compared with this reduced dataset using conventional structure comparison methods to predict its functional class. For a typical protein structure, the screening method was able to reduce the protein data bank to mere 200 proteins while preserving structurally closest neighbor in the reduced set. This has resulted in 30 to 60 fold improvement in the execution time. We present the results of leave-one-out classification experiment on ASTRAL-95 domains and comparison with SCOP classification hierarchy.  相似文献   

16.
We propose a new method for detecting conserved RNA secondary structures in a family of related RNA sequences. Our method is based on a combination of thermodynamic structure prediction and phylogenetic comparison. In contrast to purely phylogenetic methods, our algorithm can be used for small data sets of approximately 10 sequences, efficiently exploiting the information contained in the sequence variability. The procedure constructs a prediction only for those parts of sequences that are consistent with a single conserved structure. Our implementation produces reasonable consensus structures without user interference. As an example we have analysed the complete HIV-1 and hepatitis C virus (HCV) genomes as well as the small segment of hantavirus. Our method confirms the known structures in HIV-1 and predicts previously unknown conserved RNA secondary structures in HCV.  相似文献   

17.
Dai Q  Li L  Liu X  Yao Y  Zhao F  Zhang M 《PloS one》2011,6(11):e26779
Word-based models have achieved promising results in sequence comparison. However, as the important statistical properties of words in biological sequence, how to use the overlapping structures and background information of the words to improve sequence comparison is still a problem. This paper proposed a new statistical method that integrates the overlapping structures and the background information of the words in biological sequences. To assess the effectiveness of this integration for sequence comparison, two sets of evaluation experiments were taken to test the proposed model. The first one, performed via receiver operating curve analysis, is the application of proposed method in discrimination between functionally related regulatory sequences and unrelated sequences, intron and exon. The second experiment is to evaluate the performance of the proposed method with f-measure for clustering Hepatitis E virus genotypes. It was demonstrated that the proposed method integrating the overlapping structures and the background information of words significantly improves biological sequence comparison and outperforms the existing models.  相似文献   

18.
MOTIVATION: The functions of non-coding RNAs are strongly related to their secondary structures, but it is known that a secondary structure prediction of a single sequence is not reliable. Therefore, we have to collect similar RNA sequences with a common secondary structure for the analyses of a new non-coding RNA without knowing the exact secondary structure itself. Therefore, the sequence comparison in searching similar RNAs should consider not only their sequence similarities but also their potential secondary structures. Sankoff's algorithm predicts the common secondary structures of the sequences, but it is computationally too expensive to apply to large-scale analyses. Because we often want to compare a large number of cDNA sequences or to search similar RNAs in the whole genome sequences, much faster algorithms are required. RESULTS: We propose a new method of comparing RNA sequences based on the structural alignments of the fixed-length fragments of the stem candidates. The implemented software, SCARNA (Stem Candidate Aligner for RNAs), is fast enough to apply to the long sequences in the large-scale analyses. The accuracy of the alignments is better or comparable with the much slower existing algorithms. AVAILABILITY: The web server of SCARNA with graphical structural alignment viewer is available at http://www.scarna.org/.  相似文献   

19.
Alignment of protein sequences by their profiles   总被引:7,自引:0,他引:7  
The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.  相似文献   

20.
Evaluation and improvements in the automatic alignment of protein sequences   总被引:6,自引:0,他引:6  
The accuracy of protein sequence alignment obtained by applying a commonly used global sequence comparison algorithm is assessed. Alignments based on the superposition of the three-dimensional structures are used as a standard for testing the automatic, sequence-based methods. Alignments obtained from the global comparison of five pairs of homologous protein sequences studied gave 54% agreement overall for residues in secondary structures. The inclusion of information about the secondary structure of one of the proteins in order to limit the number of gaps inserted in regions of secondary structure, improved this figure to 68%. A similarity score of greater than six standard deviation units suggests that an alignment which is greater than 75% correct within secondary structural regions can be obtained automatically for the pair of sequences.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号