Similar articles
 20 similar articles found
1.
2.
3.
In this paper, we propose a nongraphical representation for protein secondary structures. By counting the frequency of occurrence of all possible four-tuples (i.e., four-letter words) of a protein secondary structure sequence, we construct a set of 3x3 matrices for the corresponding protein secondary structure sequence. Furthermore, the leading eigenvalues of these matrices are computed and considered as invariants for the protein secondary structure sequences. To illustrate the utility of our approach, we apply it to a set of real data to distinguish protein structural classes. The result indicates that it can be used to complement the classification of protein secondary structures.
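
The counting and eigenvalue steps described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the four-tuple frequencies are arranged into 3x3 matrices, so the layout used here (one matrix per two-letter prefix, with rows and columns indexed by the third and fourth letters) is hypothetical, as are the three-letter alphabet `HEC` and all function names. The leading eigenvalue is taken to be the largest real eigenvalue and is estimated by power iteration on the matrix shifted by the identity.

```python
from itertools import product

ALPHABET = "HEC"  # assumed alphabet: helix, strand, coil


def four_tuple_frequencies(ss):
    """Relative frequency of every overlapping four-letter word in ss."""
    counts = {"".join(p): 0 for p in product(ALPHABET, repeat=4)}
    total = len(ss) - 3
    for i in range(total):
        word = ss[i:i + 4]
        if word in counts:
            counts[word] += 1
    return {w: (c / total if total > 0 else 0.0) for w, c in counts.items()}


def matrices_from_frequencies(freq):
    """Hypothetical layout: one 3x3 matrix per two-letter prefix."""
    mats = {}
    for a, b in product(ALPHABET, repeat=2):
        mats[a + b] = [[freq[a + b + c + d] for d in ALPHABET]
                       for c in ALPHABET]
    return mats


def leading_eigenvalue(m, iterations=500):
    """Largest real eigenvalue of a nonnegative square matrix.

    Power iteration runs on m + I so that the iteration cannot
    oscillate or hit a zero vector; the shift is removed at the end.
    """
    n = len(m)
    shifted = [[m[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    v = [1.0] * n
    growth = 1.0
    for _ in range(iterations):
        w = [sum(shifted[i][j] * v[j] for j in range(n)) for i in range(n)]
        growth = max(w)  # v has max-norm 1, so this is the growth factor
        v = [x / growth for x in w]
    return growth - 1.0


def invariants(ss):
    """Map each two-letter prefix to the leading eigenvalue of its matrix."""
    mats = matrices_from_frequencies(four_tuple_frequencies(ss))
    return {prefix: leading_eigenvalue(m) for prefix, m in mats.items()}
```

Calling `invariants` on a secondary structure string yields nine numbers that could serve as a fixed-length descriptor for comparing sequences of different lengths.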

4.
5.
Most recent protein secondary structure prediction methods use sequence alignments to improve the prediction quality. We investigate the relationship between the location of secondary structural elements, gaps, and variable residue positions in multiple sequence alignments. We further investigate how these relationships compare with those found in structurally aligned protein families. We show how such associations may be used to improve the quality of prediction of the secondary structure elements, using the Quadratic-Logistic method with profiles. Furthermore, we analyze the extent to which the number of homologous sequences influences the quality of prediction. The analysis of variable residue positions shows that surprisingly, helical regions exhibit greater variability than do coil regions, which are generally thought to be the most common secondary structure elements in loops. However, the correlation between variability and the presence of helices does not significantly improve prediction quality. Gaps are a distinct signal for coil regions. Increasing the coil propensity for those residues occurring in gap regions enhances the overall prediction quality. Prediction accuracy increases initially with the number of homologues, but changes negligibly as the number of homologues exceeds about 14. The alignment quality affects the prediction more than other factors, hence a careful selection and alignment of even a small number of homologues can lead to significant improvements in prediction accuracy.
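
The idea of raising the coil propensity in gap regions can be illustrated with a minimal sketch. It does not reproduce the Quadratic-Logistic method; the per-column score dictionaries, the `bonus` weight, and the function names are all assumptions made for illustration.

```python
def boost_coil_in_gaps(propensities, alignment, bonus=0.5):
    """Raise the coil score of each column in proportion to its gap fraction.

    propensities: one dict per alignment column with scores for 'H', 'E', 'C'
    alignment:    equal-length aligned sequences, '-' marks a gap
    bonus:        hypothetical weight, not a value from the paper
    """
    n_seqs = len(alignment)
    adjusted = []
    for col, scores in enumerate(propensities):
        gap_fraction = sum(seq[col] == "-" for seq in alignment) / n_seqs
        new_scores = dict(scores)
        new_scores["C"] += bonus * gap_fraction
        adjusted.append(new_scores)
    return adjusted


def predict_states(scores_per_column):
    """Pick the highest-scoring state in every column."""
    return "".join(max(s, key=s.get) for s in scores_per_column)
```

With a uniform starting score of H 0.5, E 0.2, C 0.3, only columns where gaps are frequent enough flip from helix to coil, which is the qualitative effect the abstract describes.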

6.
7.
Homaeian L, Kurgan LA, Ruan J, Cios KJ, Chen K. Proteins, 2007, 69(3): 486-498
Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. The significant majority of successful methods for secondary structure prediction are based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, when it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict the amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. PSSC-core is shown to provide statistically significantly better results than the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups such as exchange groups, chemical groups of the side chains and hydrophobic groups, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods. At the same time, it also provides useful insight into the design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure.

8.
MOTIVATION: Structural alignments of superfamily members often exhibit insertions and deletions of secondary structure elements (SSEs), yet conserved subsets of SSEs appear to be important for maintaining the fold and facilitating common functionalities. RESULTS: A database of aligned SSEs was constructed from the structure-based alignments of protein superfamily members in the CAMPASS database. SSEs were classified into several types on the basis of their length and solvent accessibility and counts were made for the replacements of SSEs in different types at structurally aligned positions. The results, summarized as log-odds substitution matrices, can be used for two types of comparisons: (1) structure against structure, both with secondary structure assignments; and (2) structure against sequence with predicted secondary structures. The conservation of SSEs at each alignment position was defined as the deviation of observed SSE frequencies from the uniform distribution. This offers a useful resource to define and examine the core of superfamily folds. Even when the structure of only a single member of a superfamily is known, the extended method can be used to predict the conservation of SSEs. Such information will be useful when modelling the structure of other members of a superfamily or identifying structurally and functionally important positions in the fold.
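
How replacement counts turn into a log-odds substitution matrix can be sketched as below. This is the generic log-odds construction (observed pair frequency over the frequency expected from the marginals), not necessarily the exact procedure used with CAMPASS; the SSE type labels and the function name are hypothetical.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs):
    """Log2 odds of observing two SSE types at an aligned position.

    aligned_pairs: iterable of (type_a, type_b) tuples, one per
    structurally aligned position in a pair of superfamily members.
    """
    pair_counts = Counter()
    for a, b in aligned_pairs:
        # Count both orders so the resulting matrix is symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
    total = sum(pair_counts.values())

    # Marginal frequency of each type, derived from the pair counts.
    marginal = Counter()
    for (a, _), count in pair_counts.items():
        marginal[a] += count

    scores = {}
    for (a, b), count in pair_counts.items():
        observed = count / total
        expected = (marginal[a] / total) * (marginal[b] / total)
        scores[(a, b)] = math.log2(observed / expected)
    return scores
```

Positive scores mark replacements seen more often than chance would predict, negative scores mark replacements that are avoided; pairs that never occur are simply absent from the returned dictionary.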

9.
10.
All popular algorithms for pair-wise alignment of protein primary structures (e.g. Smith-Waterman (SW), FASTA, BLAST, etc.) utilize only amino acid sequences. The SW algorithm is the most accurate among them, i.e. it produces alignments that are most similar to the alignments obtained by superposition of protein 3D structures. But even the SW algorithm is unable to restore the 3D-based alignment if the similarity of the amino acid sequences (%id) is below 30%. We have proposed a novel alignment method that explicitly takes into account the secondary structure of the compared proteins. We have shown that it creates significantly more accurate alignments than the SW algorithm. In particular, for sequences with %id < 30% the average accuracy of the new method is 58% compared to 35% for the SW algorithm (the accuracy of an algorithmic sequence alignment is the fraction of positions restored from a "gold standard" alignment obtained by superposition of the corresponding 3D structures). The accuracy of the proposed method is approximately the same for experimental and for theoretically predicted secondary structures. Thus the method can be applied to the alignment of protein sequences even if the protein 3D structure is unknown. The program is available at ftp://194.149.64.196/STRUSWER/.
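
The accuracy measure defined in parentheses in this abstract can be written down directly. The sketch assumes that each alignment is represented as a collection of aligned index pairs `(i, j)`; that representation and the function name are illustrative choices, not part of the published program.

```python
def alignment_accuracy(test_pairs, reference_pairs):
    """Fraction of reference (structure-based) aligned pairs that the
    tested sequence alignment reproduces.

    Both arguments are iterables of (i, j) tuples, where residue i of
    the first protein is aligned with residue j of the second.
    """
    reference = set(reference_pairs)
    if not reference:
        raise ValueError("reference alignment has no aligned positions")
    restored = reference & set(test_pairs)
    return len(restored) / len(reference)
```

For example, if the structure-based alignment contains four aligned pairs and the sequence alignment recovers three of them, the accuracy is 0.75.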

11.
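Unicellular microorganisms with complete genomes were selected from the NCBI COG database, and the proteins among them with known three-dimensional structures were taken as the study set. We investigated how the content and length of different types of secondary structure affect the thermostability of archaeal and bacterial proteins. The results show that thermostable archaeal proteins contain a considerable number of short 3₁₀ helices, whereas thermostable bacterial proteins contain shorter loops. This indicates not only that secondary structure has an important influence on protein thermostability, but also that this influence differs between archaeal and bacterial proteins.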

12.
Thiamine pyrophosphate (TPP) is an essential cofactor for all forms of life. In Salmonella enterica, the thiH gene product is required for the synthesis of the 4-methyl-5-beta hydroxyethyl-thiazole monophosphate moiety of TPP. ThiH is a member of the radical S-adenosylmethionine (AdoMet) superfamily of proteins that is characterized by the presence of oxygen labile [Fe-S] clusters. Lack of an in vitro activity assay for ThiH has hampered the analysis of this interesting enzyme. We circumvented this problem by using an in vivo activity assay for ThiH. Random and directed mutagenesis of the thiH gene was performed. Analysis of auxotrophic thiH mutants defined two classes, those that required thiazole to make TPP (null mutants) and those with thiamine auxotrophy that was corrected by either L-tyrosine or thiazole (ThiH* mutants). Increased levels of AdoMet also corrected the thiamine requirement of members of the latter class. Residues required for in vivo function were identified and are discussed in the context of structures available for AdoMet enzymes.

13.
Syntaxins and Sec1/munc18 proteins are central to intracellular membrane fusion. All syntaxins comprise a variable N-terminal region, a conserved SNARE motif that is critical for SNARE complex formation, and a transmembrane region. The N-terminal region of neuronal syntaxin 1A contains a three-helix domain that folds back onto the SNARE motif forming a 'closed' conformation; this conformation is required for munc18-1 binding. We have examined the generality of the structural properties of syntaxins by NMR analysis of Vam3p, a yeast syntaxin essential for vacuolar fusion. Surprisingly, Vam3p also has an N-terminal three-helical domain despite lacking apparent sequence homology with syntaxin 1A in this region. However, Vam3p does not form a closed conformation and its N-terminal domain is not required for binding to the Sec1/munc18 protein Vps33p, suggesting that critical distinctions exist in the mechanisms used by syntaxins to govern different types of membrane fusion.

14.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two-lettered code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-lettered code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (greater than 45%), the STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity (16%) was improved from 0 to 37% 'correct' alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.
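
The two-lettered code and the combined similarity value lend themselves to a short sketch. The code below is not STRALIGN itself: it uses plain global (Needleman-Wunsch style) dynamic programming with a linear gap penalty, and it replaces the similarity matrices of the paper with simple match/mismatch values. The weights `w_aa` and `w_ss`, the gap penalty, and all function names are assumptions.

```python
def annotate(sequence, structure):
    """Build the two-lettered code: one (amino acid, state) pair per residue."""
    return list(zip(sequence, structure))


def pair_score(x, y, w_aa=1.0, w_ss=1.0):
    """Linear combination of amino acid and secondary structure similarity.

    Identity-based stand-ins (+1 match, -1 mismatch) are used here in
    place of real similarity matrices.
    """
    aa_sim = 1.0 if x[0] == y[0] else -1.0
    ss_sim = 1.0 if x[1] == y[1] else -1.0
    return w_aa * aa_sim + w_ss * ss_sim


def align(a, b, gap=-2.0, w_aa=1.0, w_ss=1.0):
    """Global alignment of two annotated sequences.

    Returns the optimal score and the list of aligned index pairs.
    """
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1], w_aa, w_ss),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        diagonal = score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1], w_aa, w_ss)
        if score[i][j] == diagonal:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

Setting `w_ss` to zero reduces the sketch to a sequence-only alignment, which makes it easy to compare alignments with and without the secondary structure annotation.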

15.
The nucleotide sequences of chicken, pheasant, duck and Tetrahymena pyriformis U5 RNAs, as well as those of new mammalian variant U5 RNAs, were determined and compared to those of rat and HeLa cell U5 RNAs. Primary structure conservation is about 95% between rat and human cells, 82% between mammals and birds and 57% between the Protozoan and mammals. The same model of secondary structure, a free single-stranded region flanked by two hairpins, can be constructed from all RNAs and is identical to the model previously proposed for mammalian U5 RNA on an experimental basis (1). Thus, this model is confirmed and is likely to be that of an ancestral U5 RNA. The 3' region of the U5 RNA molecule constitutes domain A, and is common to U1, U2, U4 and U5 RNAs (2). The characteristic nucleotide sequences of domain A are highly conserved throughout the phylogenetic evolution of U5 RNA, suggesting that they are important elements in the function of the four small RNAs. Another region of high evolutionary conservation is the top part of the 5' side hairpin, whose conserved sequence is specific to U5 RNA. It might participate in the particular function of U5 RNA.

16.
Evolutionary conservation of kinetochore protein sequences in plants   Cited by: 5 (self-citations: 0, citations by others: 5)
The evolutionary conservation of structural/functional kinetochore proteins has been studied on isolated nuclei and pro-/metaphase chromosomes of mono- and dicot plants. The cross-reactivities of antibodies against human CENPC, CENPE and CENPF, and against maize CENPCa with the centromeric regions of mitotic chromosomes of Vicia faba and/or Hordeum vulgare are shown. Putative homologs of the kinetochore protein SKP1 (suppressor of kinetochore protein 1p of yeast) were found in both species and of CBF5p (centromere binding factor 5 of yeast) in barley. Antibodies against synthetic peptides derived from partial sequences encoding these proteins were produced and recognized the centromeric regions on mitotic chromosomes as detected by indirect immunofluorescence.

17.
18.
The amino acid sequences of enzymes like alcohol dehydrogenase and glyceraldehyde-3-phosphate dehydrogenase are strongly conserved across all phyla. We suggest that the amino acid conservation of such enzymes might be a result of the fact that they function as part of a multi-enzyme complex. The specific interactions between the proteins involved would hinder evolutionary change of their surfaces.

19.
The prediction of protein secondary structure (alpha-helices, beta-sheets and coil) is improved by 9%, to 66%, using the information available from a family of homologous sequences. The approach is based both on averaging the Garnier et al. (1978) secondary structure propensities for aligned residues and on the observation that insertions and high sequence variability tend to occur in loop regions between secondary structures. Accordingly, an algorithm first aligns a family of sequences and a value for the extent of sequence conservation at each position is obtained. This value modifies a Garnier et al. prediction on the averaged sequence to yield the improved prediction. In addition, from the sequence conservation and the predicted secondary structure, many active site regions of enzymes can be located (26 out of 43) with limited over-prediction (8 extra). The entire algorithm is fully automatic and is applicable to all structural classes of globular proteins.
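
The two ingredients named in this abstract, propensities averaged over aligned residues and a per-position conservation value that modifies the prediction, can be sketched as follows. The sketch makes several assumptions: conservation is taken as the fraction of sequences sharing the most common residue in a column, low conservation adds a bonus to the coil score, and the propensity table passed in is supplied by the caller (the Garnier et al. values are not reproduced here). The `loop_weight` parameter and function names are hypothetical.

```python
from collections import Counter

STATES = "HEC"  # helix, strand, coil


def column_conservation(alignment):
    """Fraction of sequences carrying the most common residue per column.

    Gaps never count as the common residue, so gapped columns score lower.
    """
    n = len(alignment)
    values = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        values.append(top / n)
    return values


def averaged_propensities(alignment, table):
    """Average the per-residue state propensities over each column."""
    averaged = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        column = {}
        for state in STATES:
            scores = [table[r][state] for r in residues]
            column[state] = sum(scores) / len(scores) if scores else 0.0
        averaged.append(column)
    return averaged


def predict(alignment, table, loop_weight=1.0):
    """Predict one state per column, favouring coil where conservation is low."""
    conservation = column_conservation(alignment)
    states = []
    for column, cons in zip(averaged_propensities(alignment, table), conservation):
        adjusted = dict(column)
        adjusted["C"] += loop_weight * (1.0 - cons)
        states.append(max(adjusted, key=adjusted.get))
    return "".join(states)
```

Fully conserved columns keep their averaged prediction unchanged, while variable or gapped columns are pushed toward coil, mirroring the observation that insertions and variability concentrate in loops.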

20.
Protein structural class prediction is one of the challenging problems in bioinformatics. Previous methods directly based on the similarity of amino acid (AA) sequences have been shown to be insufficient for low-similarity protein data-sets. To improve the prediction accuracy for such low-similarity proteins, different methods have recently been proposed that explore novel feature sets based on predicted secondary structure propensities. In this paper, we focus on protein structural class prediction using combinations of novel features, including secondary structure propensities as well as functional domain (FD) features extracted from the InterPro signature database. Our comprehensive experimental results on several benchmark data-sets show that the integration of the new FD features substantially improves the accuracy of structural class prediction for low-similarity proteins, as they capture meaningful relationships among AA residues that are far apart in the protein sequence. The proposed prediction method has also been tested on predicting structural classes for partially disordered proteins, with reasonable prediction accuracy; this is a more difficult problem than structural class prediction for commonly used benchmark data-sets and, to the best of our knowledge, has never been done before. In addition, to avoid overfitting with a large number of features, feature selection is applied to select discriminating features that contribute to high prediction accuracy. The selected features have been shown to achieve stable prediction performance across different benchmark data-sets.
