Similar literature
20 similar documents found.
1.
Abstract

Current methods for comparative analysis of protein sequences rely on 1D alignments of amino acid sequences, based on the maximization of amino acid identity (homology) and the prediction of secondary structure elements. This approach has a major drawback once amino acid identity drops below 20–25%, since maximizing a homology score does not take any structural information into account. A new technique called Hydrophobic Cluster Analysis (HCA) has been developed by Lemesle-Varloot et al. (Biochimie 72, 555–574, 1990). It consists of comparing several sequences simultaneously and combining homology detection with secondary structure analysis.

HCA is primarily based on the detection and comparison of structural segments constituting the hydrophobic core of globular protein domains, with or without transmembrane domains. We have applied HCA to the analysis of different families of G-protein coupled receptors, such as catecholamine receptors as well as peptide hormone receptors. Using HCA, the thrombin receptor, a new and as yet unique member of the family of G-protein coupled receptors, can be clearly classified as being closely related to the family of neuropeptide receptors rather than to the catecholamine receptors, for which the shape of the hydrophobic clusters and the length of the third cytoplasmic loop are very different. Furthermore, the potential of HCA to predict relationships between new putative and already characterized members of this family of receptors will be presented.

2.
Hydrophobic cluster analysis (HCA) is a protein sequence comparison method based on alpha-helical representations of the sequences where the size, shape and orientation of the clusters of hydrophobic residues are primarily compared. The effectiveness of HCA has been suggested to originate from its potential ability to focus on the residues forming the hydrophobic core of globular proteins. We have addressed the robustness of the bidimensional representation used for HCA in its ability to detect the regular secondary structure elements of proteins. Various parameters have been studied such as those governing cluster size and limits, the hydrophobic residues constituting the clusters as well as the potential shift of the cluster positions with respect to the position of the regular secondary structure elements. The following results have been found to support the alpha-helical bidimensional representation used in HCA: (i) there is a positive correlation (clearly above background noise) between the hydrophobic clusters and the regular secondary structure elements in proteins; (ii) the hydrophobic clusters are centred on the regular secondary structure elements; (iii) the pitch of the helical representation which gives the best correspondence is that of an alpha-helix. The correspondence between hydrophobic clusters and regular secondary structure elements suggests a way to implement variable gap penalties during the automatic alignment of protein sequences.

3.

Background  

Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need for user expertise. To help with the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species that are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet.
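The hydrophobic/non-hydrophobic dichotomy described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hydrophobic alphabet (VILFMYW), the rule that a cluster is closed by a proline or by a run of at least four non-hydrophobic residues, and the function names `binary_code` and `hydrophobic_clusters` are all assumptions made for illustration, and the sketch works on the linear sequence rather than on the 2D helical plot.

```python
# Assumed hydrophobic alphabet; the abstract does not list the residues.
HYDROPHOBIC = set("VILFMYW")


def binary_code(seq):
    """Encode a sequence as 1 (hydrophobic) / 0 (non-hydrophobic)."""
    return "".join("1" if aa in HYDROPHOBIC else "0" for aa in seq.upper())


def hydrophobic_clusters(seq, min_gap=4):
    """Return (start, end, pattern) for each hydrophobic cluster.

    Assumed rule: a cluster is closed by a proline or by a run of
    at least `min_gap` non-hydrophobic residues. Indices are 0-based
    and inclusive.
    """
    seq = seq.upper()
    spans = []
    start = last = None
    for i, aa in enumerate(seq):
        if aa == "P":
            # A proline closes any open cluster.
            if start is not None:
                spans.append((start, last))
                start = last = None
        elif aa in HYDROPHOBIC:
            # A long enough non-hydrophobic run closes the open cluster.
            if start is not None and i - last - 1 >= min_gap:
                spans.append((start, last))
                start = None
            if start is None:
                start = i
            last = i
    if start is not None:
        spans.append((start, last))
    code = binary_code(seq)
    return [(s, e, code[s:e + 1]) for s, e in spans]
```

Each returned pattern (for example `11001`) is one candidate "cluster species" in the sense used above; the structural preferences reported in the paper are not reproduced here.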

4.
A new technique of protein sequence analysis, Hydrophobic Cluster Analysis (HCA), has been used to align and compare the sequences of proteins belonging to the receptor superfamily (steroid, thyroid hormone and retinoic acid receptors) and the serpin superfamily (corticosteroid binding globulin (CBG) and alpha 1-antitrypsin (alpha 1-AT)). By matching up clusters of hydrophobic amino acids that most often correspond to identifiable secondary structures (alpha-helices, beta-strands etc.), it has been possible to deduce the following information on the secondary structures of these proteins: CBG is structurally related to alpha 1-AT (HCA score greater than 80%); the structures of the hormone-binding domains of the steroid receptors that bind 3-keto-delta 4-steroids are closely interrelated (greater than 80%) but less closely related to that of the estrogen receptor (ER) (approximately 75%); vitamin D, retinoic acid and thyroid hormone receptors are structurally closely related (greater than or equal to 80%), and their secondary structures are also related to that of the steroid receptors (approximately 70%); and a high degree of analogy exists between the structures of serpins and of the hormone-binding domains of members of the steroid superfamily (60-70%). HCA has clearly shown that a previous local sequence alignment of the estrogen receptor with other steroid receptors and cytochromes P450 has to be reconsidered. The published consensus steroid binding sequence previously identified in cytochromes is in fact 80 amino acids upstream from its previously defined position. Other regions of contiguous sequence identity have also been identified which may be involved in the hydrophobic core of the protein or in steroid binding. Their positions have been indicated using the crystal structure of alpha 1-AT as a model.

5.
MOTIVATION: Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task. RESULTS: We developed PROMALS, a multiple alignment method that shows promising results for protein homologs with sequence identity below 10%, aligning close to half of the amino acid residues correctly on average. This is about three times more accurate than traditional pairwise sequence alignment methods. The PROMALS algorithm derives its strength from several sources: (i) sequence database searches to retrieve additional homologs; (ii) accurate secondary structure prediction; (iii) a hidden Markov model that uses a novel combined scoring of amino acids and secondary structures; (iv) probabilistic consistency-based scoring applied to progressive alignment of profiles. Compared to the best alignment methods that do not use secondary structure prediction and database searches (e.g. MUMMALS, ProbCons and MAFFT), PROMALS is up to 30% more accurate, with the improvement being most prominent for highly divergent homologs. Compared to SPEM and HHalign, which also employ database searches and secondary structure prediction, PROMALS shows an accuracy improvement of several percent. AVAILABILITY: The PROMALS web server is available at: http://prodata.swmed.edu/promals/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

6.
A novel method is presented for predicting the common secondary structures and alignment of two homologous RNA sequences by sampling the ‘structural alignment’ space, i.e. the joint space of their alignments and common secondary structures. The structural alignment space is sampled according to a pseudo-Boltzmann distribution based on a pseudo-free energy change that combines base pairing probabilities from a thermodynamic model and alignment probabilities from a hidden Markov model. By virtue of the implicit comparative analysis between the two sequences, the method offers an improvement over single sequence sampling of the Boltzmann ensemble. A cluster analysis shows that the samples obtained from joint sampling of the structural alignment space cluster more closely than samples generated by the single sequence method. On average, the representative (centroid) structure and alignment of the most populated cluster in the sample of structures and alignments generated by joint sampling are more accurate than single sequence sampling and alignment based on sequence alone, respectively. The ‘best’ centroid structure that is closest to the known structure among all the centroids is, on average, more accurate than structure predictions of other methods. Additionally, cluster analysis identifies, on average, a few clusters, whose centroids can be presented as alternative candidates. The source code for the proposed method can be downloaded at http://rna.urmc.rochester.edu.

7.
It is at present difficult to accurately position gaps in sequence alignment and to determine substructural homology in structure alignment when reconstructing phylogenies based on highly divergent sequences. Therefore, we have developed a new strategy for inferring phylogenies based on highly divergent sequences. In this new strategy, the whole secondary structure presented as a string in bracket notation is used as phylogenetic characters to infer phylogenetic relationships. It is no longer necessary to decompose the secondary structure into homologous substructural components. In this study, reliable phylogenetic relationships of eight species in Pectinidae were inferred from the structure alignment, but not from sequence alignment, even with the aid of structural information. The results suggest that this new strategy should be useful for inferring phylogenetic relationships based on highly divergent sequences. Moreover, the structural evolution of ITS1 in Pectinidae was also investigated. The whole ITS1 structure could be divided into four structural domains. Compensatory changes were found in all four structural domains. Structural motifs in these domains were identified further. These motifs, especially those in D2 and D3, may have important functions in the maturation of rRNAs.
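For readers unfamiliar with bracket notation, the following Python sketch shows how a dot-bracket string encodes base pairs. It assumes the common convention in which `(` and `)` mark paired positions and `.` marks unpaired ones; the abstract does not specify the exact alphabet used, and the function name `parse_dot_bracket` is hypothetical.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs in a dot-bracket string.

    Assumed alphabet: '(' opens a pair, ')' closes it, '.' is unpaired.
    Positions are 0-based.
    """
    stack = []
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

In the strategy described above, the characters of the aligned structure strings themselves serve as the phylogenetic characters; the parser only makes the pairing they encode explicit.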

8.
Classification of proteins is a major challenge in bioinformatics. Here an approach is presented that unifies different existing classifications of protein structures and sequences. Protein structural domains are represented as nodes in a hypergraph. Shared memberships in sequence families result in hyperedges in the graph. The presented method partitions the hypergraph into clusters of structural domains. Each computed cluster is based on a set of shared sequence family memberships. Thus, the clusters put existing protein sequence families into the context of structural family hierarchies. Conversely, structural domains are related to their sequence family memberships, which can be used to gain further knowledge about the respective structural families.
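A small Python sketch may help to picture the hypergraph described above. It builds one hyperedge per sequence family and then groups domains that have exactly the same set of family memberships. This grouping is a deliberately simple stand-in chosen for illustration, not the partitioning method of the paper, and the function names are hypothetical.

```python
from collections import defaultdict


def hyperedges(memberships):
    """Map each sequence family to the set of domains (nodes) it contains."""
    edges = defaultdict(set)
    for domain, families in memberships.items():
        for family in families:
            edges[family].add(domain)
    return dict(edges)


def cluster_by_shared_families(memberships):
    """Group domains that share exactly the same set of family memberships.

    Simplifying assumption: the paper's partitioning algorithm is not
    described in the abstract and is not reproduced here.
    """
    clusters = defaultdict(set)
    for domain, families in memberships.items():
        clusters[frozenset(families)].add(domain)
    return dict(clusters)
```

With input such as `{"d1": {"famA", "famB"}, "d2": {"famA", "famB"}, "d3": {"famA"}}` (placeholder names), `d1` and `d2` fall into one cluster defined by the shared memberships `famA` and `famB`, while `d3` forms its own.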

9.
A total of 48 full-length protein sequences of pectin lyases from different source organisms available in NCBI were subjected to multiple sequence alignment, domain analysis, and phylogenetic tree construction. A phylogenetic tree constructed on the basis of the protein sequences revealed two distinct clusters representing pectin lyases from bacterial and fungal sources. Similarly, the multiple accessions of different source organisms representing bacterial and fungal pectin lyases also formed distinct clusters, showing sequence-level homology. The sequence-level similarities among different groups of pectinase enzymes, viz. pectin lyase, pectate lyase, polygalacturonase, and pectin esterase, were also analyzed by subjecting a single protein sequence from each group with a common source organism to tree construction. Four distinct clusters representing different groups of pectinases with common source organisms were observed, indicating sequence-level similarity among them. Multiple sequence alignment of pectin lyase protein sequences of different source organisms along with pectinases with common source organisms revealed a conserved region, indicating homology at the sequence level. A conserved domain, Pec_Lyase_C, was frequently observed in the protein sequences of pectin lyases and pectate lyases, while Glyco_hydro_28 domains and the Pectate lyase-like β-helix clan domain were frequently observed in polygalacturonases and pectin esterases, respectively. The signature amino acid sequence of 41 amino acids, i.e. TYDNAGVLPITVN-SNKSLIGEGSKGVIKGKGLRIVSGAKNI, associated with Pec_Lyase_C is frequently observed in pectin lyase protein sequences and might be related to their structure and enzymatic function.

10.
Modern computational methods for protein structure prediction have been used to study the structure of the 33 kDa extrinsic membrane protein associated with the oxygen evolving complex of photosynthetic organisms. A multiple alignment of 14 sequences of this protein from cyanobacteria, algae and plants is presented. The alignment allows the identification of fully conserved residues and the recognition of one deletion and one insertion present in the plant sequences but not in cyanobacteria. A tree of similarity, deduced from pair-wise comparison and cluster analysis of the sequences, is also presented. The alignment and the derived consensus sequence are used for predicting the secondary structure of the protein. This prediction indicates that it is a mainly-beta protein (25–38% of β-strands) with no more than 4% of α-helix. Fold recognition by threading is applied to obtain a topological 2D model of the protein. In this model the secondary structure elements are located, including several highly conserved loops. Some of these conserved loops are suggested to be important for the binding of the 33 kDa protein to Photosystem II and for the stability of the manganese cluster. These structural predictions are in good agreement with experimental data reported by several authors.

11.
In this study we present an accurate secondary structure prediction procedure by using a query and related sequences. The most novel aspect of our approach is its reliance on local pairwise alignment of the sequence to be predicted with each related sequence rather than utilization of a multiple alignment. The residue-by-residue accuracy of the method is 75% in three structural states after jack-knife tests. The gain in prediction accuracy compared with the existing techniques, which are at best 72%, is achieved by secondary structure propensities based on both local and long-range effects, utilization of similar sequence information in the form of carefully selected pairwise alignment fragments, and reliance on a large collection of known protein primary structures. The method is especially appropriate for large-scale sequence analysis efforts such as genome characterization, where precise and significant multiple sequence alignments are not available or achievable. Proteins 27:329–335, 1997. © 1997 Wiley-Liss, Inc.

12.
13.
We identified key residues from the structural alignment of families of protein domains from SCOP which we represented in the form of sparse protein signatures. A signature-generating algorithm (SigGen) was developed and used to automatically identify key residues based on several structural and sequence-based criteria. The capacity of the signatures to detect related sequences from the SWISSPROT database was assessed by receiver operator characteristic (ROC) analysis and jack-knife testing. Test signatures for families from each of the main SCOP classes are described in relation to the quality of the structural alignments, the SigGen parameters used, and their diagnostic performance. We show that automatically generated signatures are potently diagnostic for their family (ROC50 scores typically >0.8), consistently outperform random signatures, and can identify sequence relationships in the "twilight zone" of protein sequence similarity (<40%). Signatures based on 15%-30% of alignment positions occurred most frequently among the best-performing signatures. When alignment quality is poor, sparser signatures perform better, whereas signatures generated from higher-quality alignments of fewer structures require more positions to be diagnostic. Our validation of signatures from the Globin family shows that when sequences from the structural alignment are removed and new signatures generated, the omitted sequences are still detected. The positions highlighted by the signature often correspond (alignment specificity >0.7) to the key positions in the original (non-jack-knifed) alignment. We discuss potential applications of sparse signatures in sequence annotation and homology modeling.

14.
SUMMARY: Improving and ascertaining the quality of a multiple sequence alignment is a very challenging step in protein sequence analysis. This is particularly the case when dealing with sequences in the 'twilight zone', i.e. sharing < 30% identity. Here we describe INTERALIGN, a dedicated user-friendly alignment editor including a view of secondary structures and a synchronized display of carbon alpha traces of corresponding protein structures. Profile alignment, using CLUSTALW, is implemented to improve the alignment of a sequence of unknown structure with the visually optimized structural alignment as compared with a standard multiple sequence alignment. Tree-based ordering further helps in identifying the structure closest to a given sequence.
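The "< 30% identity" threshold mentioned above depends on how identity is computed. The Python sketch below uses one common definition, assumed here for illustration: identical residues divided by the number of columns in which neither sequence has a gap. Other definitions (for example dividing by the shorter sequence length) give different values, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of an alignment.

    Assumed definition: matches / columns where both rows have a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)
```

Under this definition, a pair would fall in the twilight zone quoted above when `percent_identity(a, b) < 30`.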

15.
GeMMA (Genome Modelling and Model Annotation) is a new approach to automatic functional subfamily classification within families and superfamilies of protein sequences. A major advantage of GeMMA is its ability to subclassify very large and diverse superfamilies with tens of thousands of members, without the need for an initial multiple sequence alignment. Its performance is shown to be comparable to the established high-performance method SCI-PHY. GeMMA follows an agglomerative clustering protocol that uses existing software for sensitive and accurate multiple sequence alignment and profile–profile comparison. The produced subfamilies are shown to be equivalent in quality whether whole protein sequences are used or just the sequences of component predicted structural domains. A faster, heuristic version of GeMMA that also uses distributed computing is shown to maintain the performance levels of the original implementation. The use of GeMMA to increase the functional annotation coverage of functionally diverse Pfam families is demonstrated. It is further shown how GeMMA clusters can help to predict the impact of experimentally determining a protein domain structure on comparative protein modelling coverage, in the context of structural genomics.

16.
A new method for comparing and aligning protein sequences is described. This method, hydrophobic cluster analysis (HCA), relies upon a two-dimensional (2D) representation of the sequences. Hydrophobic clusters are determined in this 2D pattern and then used for the sequence comparisons. The method does not require powerful computer resources and can deal with distantly related proteins, even if no 3D data are available. This is illustrated in the present report by a comparison of human haemoglobin with leghaemoglobin, a comparison of the two domains of liver rhodanese (thiosulphate sulphurtransferase) and a comparison of plastocyanin and azurin.

17.
MOTIVATION: Computationally identifying non-coding RNA regions on the genome has much scope for investigation and is essentially harder than gene-finding problems for protein-coding regions. Since comparative sequence analysis is effective for non-coding RNA detection, efficient computational methods are expected for structural alignments of RNA sequences. On the other hand, Hidden Markov Models (HMMs) have played important roles in modeling and analysing biological sequences. In particular, the concept of Pair HMMs (PHMMs) has been examined extensively as a mathematical model for alignments and gene finding. RESULTS: We propose pair HMMs on tree structures (PHMMTSs), an extension of PHMMs defined on alignments of trees that provides a unifying framework and an automata-theoretic model for alignments of trees, structural alignments and pair stochastic context-free grammars. By structural alignment, we mean a pairwise alignment that aligns an unfolded RNA sequence to an RNA sequence of known secondary structure. First, we extend the notion of PHMMs defined on alignments of 'linear' sequences to pair stochastic tree automata, called PHMMTSs, defined on alignments of 'trees'. The PHMMTSs provide various types of alignments of trees such as affine-gap alignments of trees and an automata-theoretic model for alignment of trees. Second, based on the observation that a secondary structure of RNA can be represented by a tree, we apply PHMMTSs to the problem of structural alignments of RNAs. We modify PHMMTSs so that they take as input a pair consisting of a 'linear' sequence and a 'tree' representing a secondary structure of RNA to produce a structural alignment. Further, PHMMTSs with a pair of two linear sequences as input are mathematically equal to pair stochastic context-free grammars. We present computational experiments to show the effectiveness of our method for structural alignments, and discuss a complexity issue of PHMMTSs.

18.
Structure comparison is widely used to quantify protein relationships. Although there are several approaches to calculate structural similarity, specifying significance thresholds for similarity metrics is difficult due to the inherent likeness of common secondary structure elements. In this study, metal co‐factor location is used to assess the biological relevance of structural alignments. The distance between the centroids of bound co‐factors adds a chemical and function‐relevant constraint to the structural superimposition of two proteins. This additional dimension can be used to define cut‐off values for discriminating valid and spurious alignments in large alignment sets. The hypothesis underlying our approach is that metal coordination sites constrain structural evolution, thus revealing functional relationships between distantly related proteins. A comparison of three related nitrogenases shows the sequence and fold constraints imposed on the protein structures up to 18 Å away from the centers of their bound metal clusters. Proteins 2014; 82:648–656. © 2013 Wiley Periodicals, Inc.
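The centroid-distance constraint described above reduces to a short calculation once two structures have been superimposed. The Python sketch below assumes that the co-factor atom coordinates are already in the common, superimposed frame and are given as (x, y, z) tuples in Å; the superimposition itself and any cut-off value are outside its scope, and the function names are hypothetical.

```python
import math


def centroid(coords):
    """Unweighted centroid of a non-empty list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[k] for c in coords) / n for k in range(3))


def cofactor_distance(cofactor_a, cofactor_b):
    """Distance between the centroids of two bound co-factors.

    Assumes both coordinate sets are in the same superimposed frame.
    """
    return math.dist(centroid(cofactor_a), centroid(cofactor_b))
```

A small distance after superimposition suggests that the alignment places the metal sites in equivalent positions; where to set the cut-off is a choice that the abstract leaves to the full paper.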

19.
The primary sequences were compared among several proteins: gene product 5 protein (GP5) from phage M13; PIKE from phage Ike; gene product 32 protein (GP32) from phage T4; RecA, SSB and SSF from Escherichia coli. These proteins bind strongly and cooperatively to single-stranded DNA with no sequence specificity. GP5 is the smallest in this group and its three-dimensional structure is well-characterized. Using the entire sequence of GP5 as a template we searched for the regions in other single-stranded DNA binding proteins yielding the best alignment of aromatic and basic residues. The identified domains show alignment of five aromatic and four charged residues in these proteins. The domains in PIKE, GP32 and RecA exhibit statistically significant sequence homology with GP5. These observations strongly favor the hypothesis that the protein-single-stranded DNA complex in this class of proteins is stabilized by the stacking interaction of the aromatic residues with the bases of the DNA, and by the electrostatic interaction of the basic residues with the phosphate groups of the DNA. We also find that the DNA binding domains of these proteins have similar secondary structural preferences, mainly beta structures. The triple-stranded beta-sheet may be a common motif in the DNA binding domains of these proteins.

20.