首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 255 毫秒
1.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two lettered code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-lettered code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (greater than 45%), STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity of 16% was improved from 0 to 37% 'correct' alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.  相似文献   

2.
Efforts to predict protein secondary structure have been hampered by the apparent structural plasticity of local amino acid sequences. Kabsch and Sander (1984, Proc. Natl. Acad. Sci. USA 81, 1075–1078) articulated this problem by demonstrating that identical pentapeptide sequences can adopt distinct structures in different proteins. With the increased size of the protein structure database and the availability of new methods to characterize structural environments, we revisit this observation of structural plasticity. Within a set of proteins with less than 50% sequence identity, 59 pairs of identical hexapeptide sequences were identified. These local structures were compared and their surrounding structural environments examined. Within a protein structural class (α/α, β/β, α/β, α + β), the structural similarity of sequentially identical hexapeptides usually is preserved. This study finds eight pairs of identical hexapeptide sequences that adopt β-strand structure in one protein and α-helical structure in the other. In none of the eight cases do the members of these sequence pairs come from proteins within the same folding class. These results have implications for class dependent secondary structure prediction algorithms.  相似文献   

3.
Bilateral similarity function is designed for analyzing the similarities of biological sequences such as DNA, RNA secondary structure or protein in this paper. The defined function can perform comprehensive comparison between sequences remarkably well, both in terms of the Hamming distance of two compared sequences and the corresponding location difference. Compared with the existing methods for similarity analysis, the examination of similarities/dissimilarities illustrates that the proposed method with the computational complexity of O(N) is effective for these three kinds of biological sequences, and bears the universality for them.  相似文献   

4.
Abstract

In this paper, we propose a nongraphical representation for protein secondary structures. By counting the frequency of occurrence of all possible four-tuples (i.e., four-letter words) of a protein secondary structure sequence, we construct a set of 3 × 3 matrices for the corresponding protein secondary structure sequence. Furthermore, the leading eigenvalues of these matrices are computed and considered as invariants for the protein secondary structure sequences. To illustrate the utility of our approach, we apply it to a set of real data to distinguish protein structural classes. The result indicates that it can be used to complement the classification of protein secondary structures.  相似文献   

5.
Background

A commonly recurring problem in structural protein studies, is the determination of all heavy atom positions from the knowledge of the central α-carbon coordinates.

Results

We employ advances in virtual reality to address the problem. The outcome is a 3D visualisation based technique where all the heavy backbone and side chain atoms are treated on equal footing, in terms of the Cα coordinates. Each heavy atom is visualised on the surfaces of a different two-sphere, that is centered at another heavy backbone and side chain atoms. In particular, the rotamers are visible as clusters, that display a clear and strong dependence on the underlying backbone secondary structure.

Conclusions

We demonstrate that there is a clear interdependence between rotameric states and secondary structure. Our method easily detects those atoms in a crystallographic protein structure which are either outliers or have been likely misplaced, possibly due to radiation damage. Our approach forms a basis for the development of a new generation, visualization based side chain construction, validation and refinement tools. The heavy atom positions are identified in a manner which accounts for the secondary structure environment, leading to improved accuracy.

  相似文献   

6.
Abstract

Current methods for comparative analyses of protein sequences are 1D-alignments of amino acid sequences based on the maximization of amino acid identity (homology) and the prediction of secondary structure elements. This method has a major drawback once the amino acid identity drops below 20–25 %, since maximization of a homology score does not take into account any structural information. A new technique called Hydrophobic Cluster Analysis (HCA) has been developed by Lemesle-Varloot et al. (Biochimie 72, 555–574), 1990). This consists of comparing several sequences simultaneously and combining homology detection with secondary structure analysis.

HCA is primarily based on the detection and comparison of structural segments constituting the hydrophobic core of globular protein domains, with or without transmembrane domains. We have applied HCA to the analysis of different families of G-protein coupled receptors, such as catecholamine receptors as well as peptide hormone receptors. Utilizing HCA the thrombin receptor, a new and as yet unique member of the family of G-protein coupled receptors, can be clearly classified as being closely related to the family of neuropeptide receptors rather than to the catecholamine receptors for which the shape of the hydrophobic clusters and the length of their third cytoplasmic loop are very different. Furthermore, the potential of HCA to predict relationships between new putative and already characterized members of this family of receptors will be presented.  相似文献   

7.
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality.  相似文献   

8.
BackgroundSimilarity based computational methods are a useful tool for predicting protein functions from protein–protein interaction (PPI) datasets. Although various similarity-based prediction algorithms have been proposed, unsatisfactory prediction results have occurred on many occasions. The purpose of this type of algorithm is to predict functions of an unannotated protein from the functions of those proteins that are similar to the unannotated protein. Therefore, the prediction quality largely depends on how to select a set of proper proteins (i.e., a prediction domain) from which the functions of an unannotated protein are predicted, and how to measure the similarity between proteins. Another issue with existing algorithms is they only believe the function prediction is a one-off procedure, ignoring the fact that interactions amongst proteins are mutual and dynamic in terms of similarity when predicting functions. How to resolve these major issues to increase prediction quality remains a challenge in computational biology.ResultsIn this paper, we propose an innovative approach to predict protein functions of unannotated proteins iteratively from a PPI dataset. The iterative approach takes into account the mutual and dynamic features of protein interactions when predicting functions, and addresses the issues of protein similarity measurement and prediction domain selection by introducing into the prediction algorithm a new semantic protein similarity and a method of selecting the multi-layer prediction domain. The new protein similarity is based on the multi-layered information carried by protein functions. The evaluations conducted on real protein interaction datasets demonstrated that the proposed iterative function prediction method outperformed other similar or non-iterative methods, and provided better prediction results.ConclusionsThe new protein similarity derived from multi-layered information of protein functions more reasonably reflects the intrinsic relationships among proteins, and significant improvement to the prediction quality can occur through incorporation of mutual and dynamic features of protein interactions into the prediction algorithm.  相似文献   

9.
Abstract

Measuring the (dis)similarity between RNA secondary structures is critical for the study of RNA secondary structures and has implications to RNA functional characterization. Although a number of methods have been developed for comparing RNA structural similarities, their applications have been limited by the complexity of the required computation. In this paper, we present a novel method for comparing the similarity of RNA secondary structures generated from the same RNA sequence, i.e., a secondary structure ensemble, using a matrix representation of the RNA structures. Relevant features of the RNA secondary structures can be easily extracted through singular value decomposition (SVD) of the representing matrices. We have mapped the feature vectors of the singular values to a kernel space, where (dis)similarities among the mapped feature vectors become more evident, making clustering of RNA secondary structures easier to handle. The pair-wise comparison of RNA structures is achieved through computing the distance between the singular value vectors in the kernel space. We have applied a fuzzy kernel clustering method, using this similarity metric, to cluster the RNA secondary structure ensembles. Our application results suggest that our fuzzy kernel clustering method is highly promising for classifications of RNA structure ensembles, because of its low computational complexity and high clustering accuracy.  相似文献   

10.
A novel method for predicting the secondary structures of proteins from amino acid sequence has been presented. The protein secondary structure seqlets that are analogous to the words in natural language have been extracted. These seqlets will capture the relationship between amino acid sequence and the secondary structures of proteins and further form the protein secondary structure dictionary. To be elaborate, the dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part of speech tagging problem. The word-lattice is used to represent the results of the word segmentation and the maximum entropy model is used to calculate the probability of a seqlet tagged as a certain secondary structure type. The method is markovian in the seqlets, permitting efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags by viterbi algorithm. The optimal segmentations and their tags are computed as the results of protein secondary structure prediction. The method is applied to predict the secondary structures of proteins of four organisms respectively and compared with the PHD method. The results show that the performance of this method is higher than that of PHD by about 3.9% Q3 accuracy and 4.6% SOV accuracy. Combining with the local similarity protein sequences that are obtained by BLAST can give better prediction. The method is also tested on the 50 CASP5 target proteins with Q3 accuracy 78.9% and SOV accuracy 77.1%. A web server for protein secondary structure prediction has been constructed which is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.  相似文献   

11.
Abstract

Protein sequences are treated as stochastic processes on the basis of a reduced amino acid alphabet of 10 types of amino acids. The realization of a stochastic process is described by associated transition probability matrix that corresponds to the process uniquely. Then new distances between transition probability matrices are defined for sequences similarity analysis. Two separate datasets are prepared and tested to identify the validity of the method. The results demonstrate the new method is powerful and efficient.  相似文献   

12.
Abstract

Structures and functions of proteins play various essential roles in biological processes. The functions of newly discovered proteins can be predicted by comparing their structures with that of known-functional proteins. Many approaches have been proposed for measuring the protein structure similarity, such as the template-modeling (TM)-score method, GRaphlet (GR)-Align method as well as the commonly used root-mean-square deviation (RMSD) measures. However, the alignment comparisons between the similarity of protein structure cost much time on large dataset, and the accuracy still have room to improve. In this study, we introduce a new three-dimensional (3D) Yau–Hausdorff distance between any two 3D objects. The (3D) Yau–Hausdorff distance can be used in particular to measure the similarity/dissimilarity of two proteins of any size and does not need aligning and superimposing two structures. We apply structural similarity to study function similarity and perform phylogenetic analysis on several datasets. The results show that (3D) Yau–Hausdorff distance could serve as a more precise and effective method to discover biological relationships between proteins than other methods on structure comparison.

Communicated by Ramaswamy H. Sarma  相似文献   

13.
Abstract

This paper develops mathematical methods for describing and analyzing RNA secondary structures. It was motivated by the need to develop rigorous yet efficient methods to treat transitions from one secondary structure to another, which we propose here may occur as motions of loops within RNAs having appropriate sequences. In this approach a molecular sequence is described as a vector of the appropriate length. The concept of symmetries between nucleic acid sequences is developed, and the 48 possible different types of symmetries are described. Each secondary structure possible for a particular nucleotide sequence determines a symmetric, signed permutation matrix. The collection of all possible secondary structures is comprised of all matrices of this type whose left multiplication with the sequence vector leaves that vector unchanged. A transition between two secondary structures is given by the product of the two corresponding structure matrices. This formalism provides an efficient method for describing nucleic acid sequences that allows questions relating to secondary structures and transitions to be addressed using the powerful methods of abstract algebra. In particular, it facilitates the determination of possible secondary structures, including those containing pseudoknots. Although this paper concentrates on RNA structure, this formalism also can be applied to DNA  相似文献   

14.

Background  

The chemical property and biological function of a protein is a direct consequence of its primary structure. Several algorithms have been developed which determine alignment and similarity of primary protein sequences. However, character based similarity cannot provide insight into the structural aspects of a protein. We present a method based on spectral similarity to compare subsequences of amino acids that behave similarly but are not aligned well by considering amino acids as mere characters. This approach finds a similarity score between sequences based on any given attribute, like hydrophobicity of amino acids, on the basis of spectral information after partial conversion to the frequency domain.  相似文献   

15.
A protein sequence database (PFDB) containing about 11,000 entries is available for Macintosh computers. The PFDB can be easily updated by importing sequences from the PIR collection through the internet. The most important feature of the database is its organization in families of closely related sequences, each family being characterized by its average dipeptide composition [Petrilli (1993), Comput. Appl. Biosci. 2, 89–93]. This allows one to perform a rapid and sensitive protein similarity search by comparing the precalculated family dipeptide composition with that of the query sequence by a linear correlation coefficient. An example of an application in which a new protein was classsified by using a sequence of a fragment just 19 residues long is reported.  相似文献   

16.
C Sander  R Schneider 《Proteins》1991,9(1):56-68
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.  相似文献   

17.

Background

Similarity search in protein databases is one of the most essential issues in computational proteomics. With the growing number of experimentally resolved protein structures, the focus shifted from sequences to structures. The area of structure similarity forms a big challenge since even no standard definition of optimal structure similarity exists in the field.

Results

We propose a protein structure similarity measure called SProt. SProt concentrates on high-quality modeling of local similarity in the process of feature extraction. SProt’s features are based on spherical spatial neighborhood of amino acids where similarity can be well-defined. On top of the partial local similarities, global measure assessing similarity to a pair of protein structures is built. Finally, indexing is applied making the search process by an order of magnitude faster.

Conclusions

The proposed method outperforms other methods in classification accuracy on SCOP superfamily and fold level, while it is at least comparable to the best existing solutions in terms of precision-recall or quality of alignment.
  相似文献   

18.
All popular algorithms of pair-wise alignment of protein primary structures (e.g. Smith-Waterman (SW), FASTA, BLAST, et al.) utilize only amino acid sequences. The SW-algorithm is the most accurate among them, i.e. it produces alignments that are most similar to the alignments obtained by superposition of protein 3D-structures. But even the SW-algorithm is unable to restore the 3D-based alignment if similarity of amino acid sequences (%id) is below 30%. We have proposed a novel alignment method that explicitly takes into account the secondary structure of the compared proteins. We have shown that it creates significantly more accurate alignments compared to SW-algorithm. In particular, for sequences with %id < 30% the average accuracy of the new method is 58% compared to 35% for SW-algorithm (the accuracy of an algorithmic sequence alignment is the part of restored position of a "golden standard" alignment obtained by superposition of corresponding 3D-structures). The accuracy of the proposed method is approximately identical both for experimental, and for theoretically predicted secondary structures. Thus the method can be applied for alignment of protein sequences even if protein 3D-structure is unknown. The program is available at ftp://194.149.64.196/STRUSWER/.  相似文献   

19.
Abstract

The transient secondary structure and dynamics of an intrinsically unstructured linker domain from the 70 kDa subunit of human replication protein A was investigated using solution state NMR. Stable secondary structure, inferred from large secondary chemical shifts, was observed for a segment of the intrinsically unstructured linker domain when it is attached to an N-terminal protein interaction domain. Results from NMR relaxation experiments showed the rotational diffusion for this segment of the intrinsically unstructured linker domain to be correlated with the N-terminal protein interaction domain. When the N-terminal domain is removed, the stable secondary structure is lost and faster rotational diffusion is observed. The large secondary chemical shifts were used to calculate phi and psidihedral angles and these dihedral angles were used to build a backbone structural model. Restrained molecular dynamics were performed on this new structure using the chemical shift based dihedral angles and a single NOE distance as restraints. In the resulting family of structures a large, solvent exposed loop was observed for the segment of the intrinsically unstructured linker domain that had large secondary chemical shifts.  相似文献   

20.
Recent advances in gene sequencing and rational drug design have re-emphasized the need for new methods for protein analysis, classification, and structure and function prediction. In this article, we introduce a new method for analyzing protein sequences based on Sammon''s non-linear mapping algorithm. When applied to a family of homologous sequences, the method is able to capture the essential features of the similarity matrix, and provides a faithful representation of chemical or evolutionary distance in a simple and intuitive way. The merits of the new algorithm are demonstrated using examples from the protein kinase family.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号