Similar literature
 20 similar articles found (search time: 46 ms)
1.
Profile comparison methods have been shown to be very powerful in creating accurate alignments of protein sequences, especially in the case of remotely related proteins (RRP). These methods take advantage of the observation that hydrophobic profiles are more conserved than the corresponding amino acid sequences. Here, we present the PROFALIGN algorithm, which allows one to perform a detailed comparative analysis of two protein sequence profiles at both local and global levels. The user can either choose among four different hydrophobic scales (Miyazawa-Jernigan, Eisenberg, Engelman-Steitz, and Kyte-Doolittle) or add a personal scale. The interface is designed for a wide range of users, including those who are not involved in protein research. It allows one to vary the alignment parameters (such as gap penalties, embedding, and profile smoothness). Secondary structure propensity is added as an optional alignment filter. Similar segments of two proteins are singled out on the basis of score. We have tested the algorithm with different Src homology 3 (SH3) domain fragments sharing low sequence homology but very similar three-dimensional (3D) structures. By using the Miyazawa-Jernigan hydrophobic scale, PROFALIGN was able to detect the strong correlation between the regions that are known to be crucial for SH3 transition state topology. PROFALIGN seems able to identify most of the mutual alignment of structures on the basis of their hydrophobic profiles, delimiting the regions containing the key determinants of folding. Therefore, the present methodology may be useful for the detection of the most structurally relevant positions inside remotely related proteins.
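To make the idea of a hydrophobic profile concrete, the following is a minimal sketch and not the PROFALIGN implementation. It assumes the Kyte-Doolittle scale, a simple sliding-window mean as the smoothing step, and a Pearson correlation as the similarity measure; the function names `hydropathy_profile` and `pearson` are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=3):
    """Return the sliding-window mean hydropathy, one value per full window.

    `window` plays the role of a smoothing parameter (an assumption here,
    not the smoothing scheme of any particular tool).
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)
```

Two equal-length segments with different residues but similar hydropathy patterns would give a correlation close to 1, which is the sense in which profiles can be more conserved than the sequences themselves.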

2.

Background  

Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures increases, a greater variety of studies can be conducted with increasing efficiency, among which is the design of protein structural alphabets. Structural alphabets allow us to characterize local structures of proteins and describe the global folding structure of a protein using a one-dimensional (1D) sequence. Thus, 1D sequences can be used to identify structural similarities among proteins using standard sequence alignment tools such as BLAST or FASTA.

3.
Pectate lyases are plant virulence factors that degrade the pectate component of the plant cell wall. The enzymes share considerable sequence homology with plant pollen and style proteins, suggesting a shared structural topology and possibly functional relationships as well. The three-dimensional structures of two Erwinia chrysanthemi pectate lyases, C and E, have been superimposed and the structurally conserved amino acids have been identified. There are 232 amino acids that superimpose with a root-mean-square deviation of 3 Å or less. These amino acids have been used to correct the primary sequence alignment derived from evolution-based techniques. Subsequently, multiple alignment techniques have allowed the realignment of other extracellular pectate lyases as well as all sequence homologs, including pectin lyases and the plant pollen and style proteins. The new multiple sequence alignment reveals amino acids likely to participate in the parallel beta helix motif, those involved in binding Ca2+, and those invariant amino acids with potential catalytic properties. The latter amino acids cluster in two well-separated regions on the pectate lyase structures, suggesting two distinct enzymatic functions for extracellular pectate lyases and their sequence homologs.

4.
Summary: Adenovirus E1A and c-myc genes are known to be capable of transforming primary rat cells when they occur in combination with either polyoma middle-T or T24 Harvey-ras 1 genes. There was a low level of amino acid sequence homology between the nuclear adenovirus-12 (Ad12) E1A protein product (289 amino acids) and the c-myc protein based on optimal alignment and percentage identity. In contrast to others [Ralston R, Bishop JM (1983) Nature 306:803–806], we concluded that this low level of amino acid sequence homology was not significant, since rabies glycoprotein (RGP), which has no transforming function and localizes to the cell surface, had a similarly low level of amino acid sequence homology to the c-myc protein. Furthermore, dot-matrix analysis, when used to test the overall level of amino acid sequence homology, showed no significant homology between c-myc and Ad12 E1A, E1B, or RGP. Thus, low levels of amino acid sequence homology between two proteins may not be sufficient to reliably predict structural and functional similarities between them, even if the two proteins appear to share a common function.
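The two measures named in this abstract, percentage identity and dot-matrix analysis, can be sketched as follows. This is an illustrative sketch only, assuming identity scoring and a fixed window. The function names and the `window` and `min_matches` parameters are hypothetical and are not taken from the paper.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over aligned columns without gaps ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def dot_matrix(seq_a, seq_b, window=3, min_matches=3):
    """Return (i, j) positions where windows starting at seq_a[i] and
    seq_b[j] share at least `min_matches` identical residues."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= min_matches:
                dots.append((i, j))
    return dots
```

In a dot plot, genuinely related sequences produce long diagonal runs of dots, whereas unrelated sequences produce only scattered points. That distinction is what a dot-matrix test of overall homology relies on.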

5.
Cathepsin L is a cysteine protease which degrades connective tissue proteins including collagen, elastin, and fibronectin. In this study, five well-characterized cathepsin L proteins from different arthropods were used as query sequences for the Drosophila genome database. The search yielded 10 cathepsin L-like sequences, of which eight putatively represent novel cathepsin L-like proteins. To understand the phylogenetic relationship among these cathepsin L-like proteins, a phylogenetic tree was constructed based on their sequences. In addition, models of the tertiary structures of cathepsin L were constructed using homology modeling methods and subjected to molecular dynamics simulations to obtain reasonable structures and to understand their dynamical behavior. Our findings demonstrate that all of the potential Drosophila cathepsin L-like proteins contain at least one cathepsin propeptide inhibitor domain. Multiple sequence alignment and homology models clearly highlight the conservation of active site residues, disulfide bonds, and amino acid residues critical for inhibitor binding. Furthermore, comparative modeling indicates that the sequence/structure/function profiles and active site architectures are conserved.

6.
Remote homology detection refers to the detection of structure homology in evolutionarily related proteins with low sequence similarity. Supervised learning algorithms such as support vector machines (SVMs) are currently the most accurate methods. In most of these SVM-based methods, efforts have been dedicated to developing new kernels to better use the pairwise alignment scores or sequence profiles. Moreover, amino acids’ physicochemical properties are not generally used in the feature representation of protein sequences. In this article, we present a remote homology detection method that incorporates two novel features: (1) a protein's primary sequence is represented using amino acids' physicochemical properties and (2) the similarity between two proteins is measured using recurrence quantification analysis (RQA). An optimization scheme was developed to select different amino acid indices (up to 10 for a protein family) that best characterize the given protein family. The selected amino acid indices may enable us to draw a better biological explanation of the protein family classification problem than other alignment-based methods. An SVM-based classifier then works on the space described by the RQA metrics. The classification scheme is named SVM-RQA. Experiments at the superfamily level of the SCOP1.53 dataset show that, without using alignment or sequence profile information, the features generated from amino acid indices are able to produce results that are comparable to those obtained by the published state-of-the-art SVM kernels. In the future, better prediction accuracies can be expected by combining the alignment-based features with our amino acid property-based features. Supplementary information including the raw dataset, the best-performing amino acid indices for each protein family and the computed RQA metrics for all protein sequences can be downloaded from http://ym151113.ym.edu.tw/svm-rqa.
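As a rough illustration of the two ideas in this abstract, the sketch below encodes a sequence with a caller-supplied amino acid index and computes one simple RQA-style quantity, the cross-recurrence rate between two numeric series. It is a simplified assumption, not the SVM-RQA feature set: it uses an embedding dimension of 1 and a fixed threshold `eps`, and the function names are hypothetical.

```python
def encode(seq, index):
    """Map a protein sequence to numbers using an amino acid index
    (a dict from one-letter residue code to a physicochemical value)."""
    return [index[aa] for aa in seq]


def cross_recurrence_rate(x, y, eps):
    """Fraction of point pairs (x_i, y_j) closer than `eps`.

    This is the density of the cross-recurrence matrix
    R[i][j] = 1 if |x_i - y_j| <= eps else 0.
    """
    if not x or not y:
        raise ValueError("both series must be non-empty")
    hits = sum(1 for a in x for b in y if abs(a - b) <= eps)
    return hits / (len(x) * len(y))
```

A full RQA would derive further metrics from the same matrix, for example from its diagonal line structures. Such metrics, computed for several amino acid indices, are the kind of feature vector an SVM could then be trained on.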

7.
Zhu M  Li M 《Molecular BioSystems》2012,8(6):1686-1693
G-protein coupled receptors (GPCRs) are recognized to constitute the largest family of membrane proteins. Because far fewer crystal structures than amino acid sequences are available, homology modeling offers a reasonable and feasible route to theoretical GPCR coordinates. With new crystal structures recently solved, we examine how to select them as templates for homology modeling with respect to four aspects: (1) various sequence alignment methods; (2) protein weight matrix; (3) different sets of multiple templates; (4) active and inactive state of templates. The accuracy of the models was evaluated by comparing the similarity of stereo conformation and molecular docking results between the models and the experimental structure of the Meleagris gallopavo β1-adrenergic receptor (Mg_Adrb1), which we used as an example. Our results suggest that: (1) Cobalt and MAFFT, two sequence alignment algorithms, are suitable for single- and multiple-template modeling, respectively; (2) Blosum30 is applicable for aligning sequences in the case of low sequence identity; (3) multiple-template modeling is not always better than single-template modeling; (4) the state of the template is also an influential factor in simulating GPCR structures.

8.
Computational methods such as sequence alignment and motif construction are useful in grouping related proteins into families, as well as helping to annotate new proteins of unknown function. These methods identify conserved amino acids in protein sequences, but cannot determine the specific functional or structural roles of conserved amino acids without additional study. In this work, we present 3MATRIX (http://3matrix.stanford.edu) and 3MOTIF (http://3motif.stanford.edu), a web-based sequence motif visualization system that displays sequence motif information in its appropriate three-dimensional (3D) context. This system is flexible in that users can enter sequences, keywords, structures or sequence motifs to generate visualizations. In 3MOTIF, users can search using discrete sequence motifs such as PROSITE patterns, eMOTIFs, or any other regular expression-like motif. Similarly, 3MATRIX accepts an eMATRIX position-specific scoring matrix, or will convert a multiple sequence alignment block into an eMATRIX for visualization. Each query motif is used to search the protein structure database for matches, in which the motif is then visually highlighted in three dimensions. Important properties of motifs such as sequence conservation and solvent accessible surface area are also displayed in the visualizations, using carefully chosen color shading schemes.
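Since PROSITE patterns are described here as regular expression-like motifs, a small sketch can show how such a pattern might be turned into an ordinary regular expression and searched against a sequence. This is not how 3MOTIF works internally. The converter covers only the common PROSITE syntax elements: `x`, `[...]`, `{...}`, `(n)` or `(n,m)`, and the `<` and `>` anchors outside brackets. The function names are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern to a Python regular expression.

    Assumes upper-case residues, lower-case 'x' for any residue, and
    anchors '<' / '>' appearing only outside square brackets.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # forbidden set
        element = element.replace("(", "{").replace(")", "}")   # repetition
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # termini
        parts.append(element)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]
```

For example, the N-glycosylation pattern `N-{P}-[ST]-{P}` becomes `N[^P][ST][^P]`. The match positions returned by `find_motif` are what a viewer would then map onto residues of a 3D structure.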

9.
Although multiple sequence alignments (MSAs) are essential for a wide range of applications from structure modeling to prediction of functional sites, construction of accurate MSAs for distantly related proteins remains a largely unsolved problem. The rapidly increasing database of spatial structures is a valuable source to improve alignment quality. We explore the use of 3D structural information to guide sequence alignments constructed by our MSA program PROMALS. The resulting tool, PROMALS3D, automatically identifies homologs with known 3D structures for the input sequences, derives structural constraints through structure-based alignments and combines them with sequence constraints to construct consistency-based multiple sequence alignments. The output is a consensus alignment that brings together sequence and structural information about input proteins and their homologs. PROMALS3D can also align sequences of multiple input structures, with the output representing a multiple structure-based alignment refined in combination with sequence constraints. The advantage of PROMALS3D is that it gives researchers an easy way to produce high-quality alignments consistent with both sequences and structures of proteins. PROMALS3D outperforms a number of existing methods for constructing multiple sequence or structural alignments using both reference-dependent and reference-independent evaluation methods.

10.
This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these proteins. We find that threading methods are capable of identifying the correct fold in many cases, but not reliably enough as yet. Every team correctly predicts a different set of targets, with virtually all targets predicted correctly by at least one team. Also, common folds such as TIM barrels are recognized more readily than folds with only a few known examples. However, quite surprisingly, the quality of the sequence-structure alignments, corresponding to correctly recognized folds, is generally very poor, as judged by comparison with the corresponding 3D structure alignments. Thus, threading cannot presently be relied upon to derive a detailed 3D model from the amino acid sequence. This raises a very intriguing question: how is fold recognition achieved? Our analysis suggests that it may be achieved because threading procedures maximize hydrophobic interactions in the protein core, and are reasonably good at recognizing local secondary structure. © 1995 Wiley-Liss, Inc.

11.
Amino acid residues that are involved in functional interactions in proteins have strong evolutionary pressure to remain unchanged and consequently their substitution patterns are different from those that are noninteracting. To characterize and quantify the differences between amino acid substitution patterns due to structural restraints and those under functional restraints, we have made a comparative analysis of families of homologous proteins. Residues classified as having the same amino acid type, secondary structure, accessibility, and side-chain hydrogen bonds are shown to be better conserved if they are close to the active site. We have focused on enzyme families for this analysis since they have functional sites that are easily defined by their catalytic residues. We have derived new sets of environment-specific substitution tables, which we term function-dependent environment-specific substitution tables, where amino acid residues are classified according to their distance from the functional sites. The residues that are within a distance of 9 Å from the active site have distinct amino acid substitution patterns when compared to the other sites. The function-dependent environment-specific substitution tables have been tested using the sequence-structure homology recognition program FUGUE and the results compared with the recognition performance obtained using the standard environment-specific substitution tables. Significant improvements are obtained in both recognition performance and alignment accuracy using the function-dependent environment-specific substitution tables (P-value = 0.02, according to the Wilcoxon signed rank test for alignment accuracy). The alignments near the active site are greatly improved with pronounced improvements at lower percentage identities (less than 30%).

12.
Cells can switch the functional states of extracellular matrix proteins by stretching them while exerting mechanical force. Using steered molecular dynamics, we investigated how the mechanical stability of FnIII modules from the cell adhesion protein fibronectin is affected by natural variations in their amino acid sequences. Despite remarkably similar tertiary structures, FnIII modules share low sequence homology. Conversely, the sequence homology for the same FnIII module across multiple species is notably higher, suggesting that sequence variability is functionally significant. Our studies find that the mechanical stability of FnIII modules can be tuned through substitutions of just a few key amino acids by altering access of water molecules to hydrogen bonds that break early in the unfolding pathway. Furthermore, the FnIII hierarchy of mechanical unfolding can be changed by environmental conditions, such as pH for FnIII10, or by forming complexes with other molecules, such as heparin binding to FnIII13.

13.
Estrada E 《Proteins》2004,54(4):727-737
The folding degree index (Estrada, Bioinformatics 2002;18:697-704) is extended to account for the contribution of amino acids to folding. First, the mathematical formalism for extending the folding degree index is presented. Then, the amino acid contributions to the folding degree of several proteins are used to analyze its relation to secondary structure. The possibilities of using these contributions in helping or checking the assignment of secondary structure to amino acids are also introduced. The influence of external factors on the amino acid contributions to folding degree is studied through the temperature effect on ribonuclease A. Finally, the analysis of 3D protein similarity through the use of amino acid contributions to folding degree is studied by selecting a series of lysozymes. These results are compared to those obtained by sequence alignment (2D similarity) and 3D superposition of the structures, showing the uniqueness of the current approach.

14.
15.
MOTIVATION: Sequence alignment techniques have been developed into extremely powerful tools for identifying the folding families and function of proteins in newly sequenced genomes. For a sufficiently low sequence identity it is necessary to incorporate additional structural information to positively detect homologous proteins. We have carried out an extensive analysis of the effectiveness of incorporating secondary structure information directly into the alignments for fold recognition and identification of distant protein homologs. A secondary structure similarity matrix based on a database of three-dimensionally aligned proteins was first constructed. An iterative application of dynamic programming was used which incorporates linear combinations of amino acid and secondary structure sequence similarity scores. Initially, only primary sequence information is used. Subsequently contributions from secondary structure are phased in and new homologous proteins are positively identified if their scores are consistent with the predetermined error rate. RESULTS: We used the SCOP40 database, where only PDB sequences that have 40% homology or less are included, to calibrate homology detection by the combined amino acid and secondary structure sequence alignments. Combining predicted secondary structure with sequence information results in an 8-15% increase in homology detection within SCOP40 relative to the pairwise alignments using only amino acid sequence data at an error rate of 0.01 errors per query; a 35% increase is observed when the actual secondary structure sequences are used. Incorporating predicted secondary structure information in the analysis of six small genomes yields an improvement in the homology detection of approximately 20% over SSEARCH pairwise alignments, but no improvement in the total number of homologs detected over PSI-BLAST, at an error rate of 0.01 errors per query.
However, because the pairwise alignments based on combinations of amino acid and secondary structure similarity are different from those produced by PSI-BLAST and the error rates can be calibrated, it is possible to combine the results of both searches. An additional 25% relative improvement in the number of genes identified at an error rate of 0.01 is observed when the data are pooled in this way. Similarly for the SCOP40 dataset, PSI-BLAST detected 15% of all possible homologs, whereas the pooled results increased the total number of homologs detected to 19%. These results are compared with recent reports of homology detection using sequence profiling methods. AVAILABILITY: Secondary structure alignment homepage at http://lutece.rutgers.edu/ssas CONTACT: anders@rutchem.rutgers.edu; ronlevy@lutece.rutgers.edu Supplementary Information: Genome sequence/structure alignment results at http://lutece.rutgers.edu/ss_fold_predictions.
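The core idea of scoring an alignment with a linear combination of amino acid and secondary structure similarity can be sketched with a small global dynamic programming routine. This is an illustration under simplifying assumptions and not the authors' method. Both similarity terms are plain match/mismatch scores rather than substitution matrices, the gap penalty is linear, and the weight `w_ss` and the function names are hypothetical.

```python
def combined_score(aa1, ss1, aa2, ss2, w_ss=0.5):
    """Linear combination of amino acid and secondary structure similarity.

    Assumed toy scores: +1 for a match, -1 for a mismatch, in both terms.
    """
    aa_term = 1.0 if aa1 == aa2 else -1.0
    ss_term = 1.0 if ss1 == ss2 else -1.0
    return (1.0 - w_ss) * aa_term + w_ss * ss_term


def global_alignment_score(seq_a, ss_a, seq_b, ss_b, w_ss=0.5, gap=-2.0):
    """Needleman-Wunsch score using the combined per-position score.

    seq_* are amino acid strings; ss_* are secondary structure strings of
    the same length (for example H, E, C per residue).
    """
    n, m = len(seq_a), len(seq_b)
    prev = [j * gap for j in range(m + 1)]  # row 0: only gaps in seq_a
    for i in range(1, n + 1):
        cur = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            diag = prev[j - 1] + combined_score(
                seq_a[i - 1], ss_a[i - 1], seq_b[j - 1], ss_b[j - 1], w_ss)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m]
```

Setting `w_ss=0.0` gives a purely sequence-based score. Rerunning the alignment with gradually larger `w_ss` values corresponds loosely to phasing in the secondary structure contribution over successive iterations.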

16.
An open question in protein homology modeling is, how well do current modeling packages satisfy the dual criteria of quality of results and practical ease of use? To address this question objectively, we examined homology-built models of a variety of therapeutically relevant proteins. The sequence identities across these proteins range from 19% to 76%. A novel metric, the difference alignment index (DAI), is developed to aid in quantifying the quality of local sequence alignments. The DAI is also used to construct the relative sequence alignment (RSA), a new representation of global sequence alignment that facilitates comparison of sequence alignments from different methods. Comparisons of the sequence alignments in terms of the RSA and alignment methodologies are made to better understand the advantages and caveats of each method. All sequence alignments and corresponding 3D models are compared to their respective structure-based alignments and crystal structures. A variety of protein modeling software was used. We find that at sequence identities >40%, all packages give similar (and satisfactory) results; at lower sequence identities (<25%), the sequence alignments generated by Profit and Prime, which incorporate structural information in their sequence alignment, stand out from the rest. Moreover, the model generated by Prime in this low sequence identity region is noted to be superior to the rest. Additionally, we note that DSModeler and MOE, which generate reasonable models for sequence identities >25%, are significantly more functional and easier to use when compared with the other structure-building software.

17.
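Zhang Qiang  Tang Qing  Li Hao  Wang Huanyu  Liang Guodong 《病毒学报》(Chinese Journal of Virology) 2007,23(2):115-120
To characterize the sequence and structural features of the M and P genes of rabies viruses in China, the target gene fragments were obtained by RT-PCR and sequenced, and the nucleotide and amino acid sequences and the structures of their functional sites were analyzed computationally. Among the four virus strains, the M gene showed nucleotide and amino acid sequence homologies of 83.9% to 99.5% and 93.1% to 99%, respectively. In the M protein of all four strains, the amino acid residue at position 58, which regulates viral RNA transcription and replication, was a glutamine residue (E), and the PPxY motif that interacts with the WW domain of specific cellular proteins was the conserved sequence PPEY. The P gene showed nucleotide and amino acid sequence homologies of 83.6% to 99.8% and 87.2% to 99%, respectively. The P protein sequence that interacts with the cytoplasmic dynein light chain LC8 lies at amino acid residues 143 to 148 and was DKSTQT in all four strains, and the sites at which the P gene product interacts with the L and N proteins showed no variation that would affect their biological function. These results support the inference that the structures of these two proteins play an important role in viral pathogenicity.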

18.
In this study, we investigate the extent to which techniques for homology modeling that were developed for water-soluble proteins are appropriate for membrane proteins as well. To this end we present an assessment of current strategies for homology modeling of membrane proteins and introduce a benchmark data set of homologous membrane protein structures, called HOMEP. First, we use HOMEP to reveal the relationship between sequence identity and structural similarity in membrane proteins. This analysis indicates that homology modeling is at least as applicable to membrane proteins as it is to water-soluble proteins and that acceptable models (with Cα-RMSD values to the native of 2 Å or less in the transmembrane regions) may be obtained for template sequence identities of 30% or higher if an accurate alignment of the sequences is used. Second, we show that secondary-structure prediction algorithms that were developed for water-soluble proteins perform approximately as well for membrane proteins. Third, we provide a comparison of a set of commonly used sequence alignment algorithms as applied to membrane proteins. We find that high-accuracy alignments of membrane protein sequences can be obtained using state-of-the-art profile-to-profile methods that were developed for water-soluble proteins. Improvements are observed when weights derived from the secondary structure of the query and the template are used in the scoring of the alignment, a result which relies on the accuracy of the secondary-structure prediction of the query sequence. The most accurate alignments were obtained using template profiles constructed with the aid of structural alignments. In contrast, a simple sequence-to-sequence alignment algorithm, using a membrane protein-specific substitution matrix, shows no improvement in alignment accuracy. We suggest that profile-to-profile alignment methods should be adopted to maximize the accuracy of homology models of membrane proteins.
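The Cα-RMSD acceptance criterion mentioned above can be made concrete with a short sketch. It assumes that the two structures have already been superimposed and that the coordinate lists contain matched Cα atoms in the same order. The optimal superposition step, for example the Kabsch algorithm, is deliberately left out, and the function name is hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched, pre-superimposed
    C-alpha coordinates, each given as an (x, y, z) tuple in angstroms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Under the criterion quoted in the abstract, a model would count as acceptable if this value, computed over the transmembrane residues, were 2 Å or less.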

19.
A method has been developed for representing an amino acid sequence as a graph of the interaction energies between parts of the spatial structure. The method makes it possible to relate each point of a polypeptide chain to the van der Waals interaction energy of the regions of the native globule adjacent to that amino acid residue. We carried out an exhaustive analysis of a set of proteins and identified the boundaries of domain and module structures. Different parts of a sequence were found to contribute unequally to the stabilization of the spatial structure of the protein macromolecule. The contribution of each part of the sequence to this stabilization is defined on the basis of the number of energy levels needed to identify all independent parts of the globule. Thus, the sequence of amino acid residues can be put in correspondence with a sequence of numerical values, which can in turn be used in formal procedures such as alignment, consensus search, and recognition of compositional peculiarities. An example comparing proteins with various sequence identities is presented to demonstrate the scheme of the alignment procedure.

20.
S Guida  A Heguy  M Melli 《Gene》1992,111(2):239-243
The evolutionary conservation of a sequence or part of it can help to identify the essential functional and structural domains within a protein. We have cloned and characterised a cDNA coding for the type-I interleukin-1 receptor (IL-1R) of chick (ch) embryo fibroblasts. Comparison of the amino acid (aa) sequence of the avian receptor with those of the murine (m) and human (h) IL-1Rs shows 60% homology. The intracellular domain is the most conserved region of the chIL-1R, showing 76% and 79% homology to the murine and human sequences, respectively. The striking conservation of the cytoplasmic region of the receptor is confirmed by its homology with the Toll receptor protein of Drosophila melanogaster. The alignment between the chicken and D. melanogaster proteins shows the presence of four aa blocks with more than 80% homology. The possible functional significance of this homology is discussed. The extracellular binding region of the receptor has a clearly recognisable immunoglobulin-like structure although the sequence divergence is higher than in the cytoplasmic domain.
