首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 78 毫秒
1.
A new method is proposed to identify whether a query protein is singleplex or multiplex for improving the quality of protein subcellular localization prediction. Based on the transductive learning technique, this approach utilizes the information from the both query proteins and known proteins to estimate the subcellular location number of every query protein so that the singleplex and multiplex proteins can be recognized and distinguished. Each query protein is then dealt with by a targeted single-label or multi-label predictor to achieve a high-accuracy prediction result. We assess the performance of the proposed approach by applying it to three groups of protein sequences datasets. Simulation experiments show that the proposed approach can effectively identify the singleplex and multiplex proteins. Through a comparison, the reliably of this method for enhancing the power of predicting protein subcellular localization can also be verified.  相似文献   

2.
Classification of newly determined protein structures is important in understanding their function and mechanism of action. Currently available methods employ a global structure alignment strategy and are computationally expensive. We propose a two-step methodology with a quick screen to significantly reduce the number of candidate structures followed by global structure alignment of the query structure with the reduced set. We represent a protein structure as a sequence of local structures, codified in the form of geometric invariants. Geometric invariants are quantities that remain unchanged under transformations such as translation and rotation. Protein structures represented as multi-attribute sequences are aligned via dynamic programming to identify close neighbors of the query structure. The query structure is then compared with this reduced dataset using conventional structure comparison methods to predict its functional class. For a typical protein structure, the screening method was able to reduce the protein data bank to mere 200 proteins while preserving structurally closest neighbor in the reduced set. This has resulted in 30 to 60 fold improvement in the execution time. We present the results of leave-one-out classification experiment on ASTRAL-95 domains and comparison with SCOP classification hierarchy.  相似文献   

3.
Kim D  Xu D  Guo JT  Ellrott K  Xu Y 《Protein engineering》2003,16(9):641-650
A new method for fold recognition is developed and added to the general protein structure prediction package PROSPECT (http://compbio.ornl.gov/PROSPECT/). The new method (PROSPECT II) has four key features. (i) We have developed an efficient way to utilize the evolutionary information for evaluating the threading potentials including singleton and pairwise energies. (ii) We have developed a two-stage threading strategy: (a) threading using dynamic programming without considering the pairwise energy and (b) fold recognition considering all the energy terms, including the pairwise energy calculated from the dynamic programming threading alignments. (iii) We have developed a combined z-score scheme for fold recognition, which takes into consideration the z-scores of each energy term. (iv) Based on the z-scores, we have developed a confidence index, which measures the reliability of a prediction and a possible structure-function relationship based on a statistical analysis of a large data set consisting of threadings of 600 query proteins against the entire FSSP templates. Tests on several benchmark sets indicate that the evolutionary information and other new features of PROSPECT II greatly improve the alignment accuracy. We also demonstrate that the performance of PROSPECT II on fold recognition is significantly better than any other method available at all levels of similarity. Improvement in the sensitivity of the fold recognition, especially at the superfamily and fold levels, makes PROSPECT II a reliable and fully automated protein structure and function prediction program for genome-scale applications.  相似文献   

4.
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.  相似文献   

5.
We present a novel method for structural comparison of protein structures. The approach consists of two main phases: 1) an initial search phase where, starting from aligned pairs of secondary structure elements, the space of 3D transformations is searched for similarities and 2) a subsequent refinement phase where interim solutions are subjected to parallel, local, iterative dynamic programming in the areas of possible improvement. The proposed method combines dynamic programming for finding alignments but does not restrict solutions to be sequential. In addition, to deal with the problem of nonuniqueness of optimal similarities, we introduce a consensus scoring method in selecting the preferred similarity and provide a list of top-ranked solutions. The method, called FASE (flexible alignment of secondary structure elements), was tested on well-known data and various standard problems from the literature. The results show that FASE is able to find remote and weak similarities consistently using a reasonable run time. The method was tested (using the SCOP database) on its ability to discriminate interfold pairs from intrafold pairs at the level of the best existing methods. The method was then applied to the problem of finding circular permutations in proteins.  相似文献   

6.
Tobi D 《Proteins》2012,80(4):1167-1176
A novel methodology for comparison of protein dynamics is presented. Protein dynamics is calculated using the Gaussian network model and the modes of motion are globally aligned using the dynamic programming algorithm of Needleman and Wunsch, commonly used for sequence alignment. The alignment is fast and can be used to analyze large sets of proteins. The methodology is applied to the four major classes of the SCOP database: "all alpha proteins," "all beta proteins," "alpha and beta proteins," and "alpha/beta proteins". We show that different domains may have similar global dynamics. In addition, we report that the dynamics of "all alpha proteins" domains are less specific to structural variations within a given fold or superfamily compared with the other classes. We report that domain pairs with the most similar and the least similar global dynamics tend to be of similar length. The significance of the methodology is that it suggests a new and efficient way of mapping between the global structural features of protein families/subfamilies and their encoded dynamics.  相似文献   

7.
Homology detection and protein structure prediction are central themes in bioinformatics. Establishment of relationship between protein sequences or prediction of their structure by sequence comparison methods finds limitations when there is low sequence similarity. Recent works demonstrate that the use of profiles improves homology detection and protein structure prediction. Profiles can be inferred from protein multiple alignments using different approaches. The "Conservatism-of-Conservatism" is an effective profile analysis method to identify structural features between proteins having the same fold but no detectable sequence similarity. The information obtained from protein multiple alignments varies according to the amino acid classification employed to calculate the profile. In this work, we calculated entropy profiles from PSI-BLAST-derived multiple alignments and used different amino acid classifications summarizing almost 500 different attributes. These entropy profiles were converted into pseudocodes which were compared using the FASTA program with an ad-hoc matrix. We tested the performance of our method to identify relationships between proteins with similar fold using a nonredundant subset of sequences having less than 40% of identity. We then compared our results using Coverage Versus Error per query curves, to those obtained by methods like PSI-BLAST, COMPASS and HHSEARCH. Our method, named HIP (Homology Identification with Profiles) presented higher accuracy detecting relationships between proteins with the same fold. The use of different amino acid classifications reflecting a large number of amino acid attributes, improved the recognition of distantly related folds. We propose the use of pseudocodes representing profile information as a fast and powerful tool for homology detection, fold assignment and analysis of evolutionary information enclosed in protein profiles.  相似文献   

8.
Searching a database for a local alignment to a query under a typical scoring scheme, such as PAM120 or BLOSUM62 with affine gap costs, is a computation that has resisted algorithmic improvement due to its basis in dynamic programming and the weak nature of the signals being searched for. In a query preprocessing step, a set of tables can be built that permit one to (a) eliminate a large fraction of the dynamic programming matrix from consideration and (b) to compute several steps of the remainder with a single table lookup. While this result is not an asymptotic improvement over the original Smith-Waterman algorithm, its complexity is characterized in terms of some sparse features of the matrix and it yields the fastest software implementation to date for such searches.  相似文献   

9.
Searching for protein structure-function relationships using three-dimensional (3D) structural coordinates represents a fundamental approach for determining the function of proteins with unknown functions. Since protein structure databases are rapidly growing in size, the development of a fast search method to find similar protein substructures by comparison of protein 3D structures is essential. In this article, we present a novel protein 3D structure search method to find all substructures with root mean square deviations (RMSDs) to the query structure that are lower than a given threshold value. Our new algorithm runs in O(m + N/m(0.5)) time, after O(N log N) preprocessing, where N is the database size and m is the query length. The new method is 1.8-41.6 times faster than the practically best known O(N) algorithm, according to computational experiments using a huge database (i.e., >20,000,000 C-alpha coordinates).  相似文献   

10.
Electron cryo-microscopy is a fast advancing biophysical technique to derive three-dimensional structures of large protein complexes. Using this technique, many density maps have been generated at intermediate resolution such as 6-10 ? resolution. Although it is challenging to derive the backbone of the protein directly from such density maps, secondary structure elements such as helices and β-sheets can be computationally detected. Our work in this paper provides an approach to enumerate the top-ranked possible topologies instead of enumerating the entire population of the topologies. This approach is particularly practical for large proteins. We developed a directed weighted graph, the topology graph, to represent the secondary structure assignment problem. We prove that the problem of finding the valid topology with the minimum cost is NP hard. We developed an O(N(2)2(N)) dynamic programming algorithm to identify the topology with the minimum cost. The test of 15 proteins suggests that our dynamic programming approach is feasible to work with proteins of much larger size than we could before. The largest protein in the test contains 18 helical sticks detected from the density map out of 33 helices in the protein.  相似文献   

11.
Our algorithm predicts short linear functional motifs in proteins using only sequence information. Statistical models for short linear functional motifs in proteins are built using the database of short sequence fragments taken from proteins in the current release of the Swiss-Prot database. Those segments are confirmed by experiments to have single-residue post-translational modification. The sensitivities of the classification for various types of short linear motifs are in the range of 70%. The query protein sequence is dissected into short overlapping fragments. All segments are represented as vectors. Each vector is then classified by a machine learning algorithm (Support Vector Machine) as potentially modifiable or not. The resulting list of plausible post-translational sites in the query protein is returned to the user. We also present a study of the human protein kinase C family as a biological application of our method.  相似文献   

12.
13.
Comparison and classification of folding patterns from a database of protein structures is crucial to understand the principles of protein architecture, evolution and function. Current search methods for proteins with similar folding patterns are slow and computationally intensive. The sharp growth in the number of known protein structures poses severe challenges for methods of structural comparison. There is a need for methods that can search the database of structures accurately and rapidly. We provide several methods to search for similar folding patterns using a concise tableau representation of proteins that encodes the relative geometry of secondary structural elements. Our first approach allows the extraction of identical and very closely-related protein folding patterns in constant-time (per hit). Next, we address the hard computational problem of extraction of maximally-similar subtableaux, when comparing two tableaux. We solve the problem using Quadratic and Linear integer programming formulations and demonstrate their power to identify subtle structural similarities, especially when protein structures significantly diverge. Finally, we describe a rapid and accurate method for comparing a query structure against a database of protein domains, TableauSearch. TableauSearch is rapid enough to search the entire structural database in seconds on a standard desktop computer. Our analysis of TableauSearch on many queries shows that the method is very accurate in identifying similarities of folding patterns, even between distantly related proteins. AVAILABILITY: A web server implementing the TableauSearch is available from http://hollywood.bx.psu.edu/TabSearch.  相似文献   

14.
15.
A method for comparison of protein sequences based on their primary and secondary structure is described. Protein sequences are annotated with predicted secondary structures (using a modified Chou and Fasman method). Two lettered code sequences are generated (Xx, where X is the amino acid and x is its annotated secondary structure). Sequences are compared with a dynamic programming method (STRALIGN) that includes a similarity matrix for both the amino acids and secondary structures. The similarity value for each paired two-lettered code is a linear combination of similarity values for the paired amino acids and their annotated secondary structures. The method has been applied to eight globin proteins (28 pairs) for which the X-ray structure is known. For protein pairs with high primary sequence similarity (greater than 45%), STRALIGN alignment is identical to that obtained by a dynamic programming method using only primary sequence information. However, alignment of protein pairs with lower primary sequence similarity improves significantly with the addition of secondary structure annotation. Alignment of the pair with the least primary sequence similarity of 16% was improved from 0 to 37% 'correct' alignment using this method. In addition, STRALIGN was successfully applied to seven pairs of distantly related cytochrome c proteins, and three pairs of distantly related picornavirus proteins.  相似文献   

16.
We present a new method for conducting protein structure similarity searches, which improves on the efficiency of some existing techniques. Our method is grounded in the theory of differential geometry on 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. The invariancy of the shape signatures allows us to improve similarity searching efficiency by adopting a hierarchical coarse-to-fine strategy. We index the shape signatures using an efficient hashing-based technique. With the help of this technique we screen out unlikely candidates and perform detailed pairwise alignments only for a small number of candidates that survive the screening process. Contrary to other hashing based techniques, our technique employs domain specific information (not just geometric information) in constructing the hash key, and hence, is more tuned to the domain of biology. Furthermore, the invariancy, localization, and compactness of the shape signatures allow us to utilize a well-known local sequence alignment algorithm for aligning two protein structures. One measure of the efficacy of the proposed technique is that we were able to perform structure alignment queries 36 times faster (on the average) than a well-known method while keeping the quality of the query results at an approximately similar level.  相似文献   

17.
Ohlson T  Wallner B  Elofsson A 《Proteins》2004,57(1):188-197
To improve the detection of related proteins, it is often useful to include evolutionary information for both the query and target proteins. One method to include this information is by the use of profile-profile alignments, where a profile from the query protein is compared with the profiles from the target proteins. Profile-profile alignments can be implemented in several fundamentally different ways. The similarity between two positions can be calculated using a dot-product, a probabilistic model, or an information theoretical measure. Here, we present a large-scale comparison of different profile-profile alignment methods. We show that the profile-profile methods perform at least 30% better than standard sequence-profile methods both in their ability to recognize superfamily-related proteins and in the quality of the obtained alignments. Although the performance of all methods is quite similar, profile-profile methods that use a probabilistic scoring function have an advantage as they can create good alignments and show a good fold recognition capacity using the same gap-penalties, while the other methods need to use different parameters to obtain comparable performances.  相似文献   

18.
Mostafavi S  Morris Q 《Proteomics》2012,12(10):1687-1696
In this article, we review how interaction networks can be used alone or in combination in an automated fashion to provide insight into gene and protein function. We describe the concept of a "gene-recommender system" that can be applied to any large collection of interaction networks to make predictions about gene or protein function based on a query list of proteins that share a function of interest. We discuss these systems in general and focus on one specific system, GeneMANIA, that has unique features and uses different algorithms from the majority of other systems.  相似文献   

19.
An approach to predicting folding nuclei in globular proteins with known three-dimensional structures is proposed. This approach is based on the pinpointing of the lowest saddle points on the barrier between the unfolded state and native structure on the free-energy landscape of a protein chain; the proposed technique uses the dynamic programming method. A comparison of calculation results with experimental data on the folding nuclei of 21 proteins shows that the model provides good Φ value predictions for protein structures determined by X-ray analysis and, less successfully, in structures determined by nuclear magnetic resonance. Consideration of the whole ensemble of transition states provides a better prediction of folding nuclei than consideration of only transition states with lowest free energies. In addition, we predict the location of folding nuclei in three-dimensional structures of some proteins whose folding kinetics is being studied, but there is no experimental evidence concerning their folding nuclei.  相似文献   

20.
We apply a simple method for aligning protein sequences on the basis of a 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least-squares fitting to determine an alignment minimizing coordinate difference. Because of simplicity, our method can be readily modified to take into account additional features of protein structure such as the orientation of side chains or the location-dependent cost of opening a gap. Our basic method, augmented by such modifications, can find reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific protein structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on what aspects of protein structure are involved (e.g., depending on whether or not consideration of side-chain orientation is necessary for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We have compared these alignments in detail with corresponding manual ones culled from the literature. We find good agreement (to within 95% for the core regions), and detailed comparison highlights how particular protein structural features (such as certain strands) are problematical to align, giving somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号