20 similar articles found (search time: 0 ms)
1.
Kristensen DM Chen BY Fofanov VY Ward RM Lisewski AM Kimmel M Kavraki LE Lichtarge O 《Protein science : a publication of the Protein Society》2006,15(6):1530-1536
The annotation of protein function has not kept pace with the exponential growth of raw sequence and structure data. An emerging solution to this problem is to identify 3D motifs or templates in protein structures that are necessary and sufficient determinants of function. Here, we demonstrate the recurrent use of evolutionary trace information to construct such 3D templates for enzymes, search for them in other structures, and distinguish true from spurious matches. Serine protease templates built from evolutionarily important residues distinguish between proteases and other proteins nearly as well as the classic Ser-His-Asp catalytic triad. In 53 enzymes spanning 33 distinct functions, an automated pipeline identifies functionally related proteins with an average positive predictive power of 62%, including correct matches to proteins with the same function but with low sequence identity (the average identity for some templates is only 17%). Although these template building, searching, and match classification strategies are not yet optimized, their sequential implementation demonstrates a functional annotation pipeline which does not require experimental information, but only local molecular mimicry among a small number of evolutionarily important residues.
2.
3.
A loop closure-based sequential algorithm, PRODA_MATCH, was developed to match catalytic residues onto a scaffold for enzyme design in silico. The computational complexity of this algorithm is polynomial with respect to the number of active sites, the number of catalytic residues, and the maximal iteration number of cyclic coordinate descent steps. This matching algorithm does not depend on a rotamer library, which allows the catalytic residue to take any required conformation along the reaction coordinate. The catalytic geometric parameters defined between functional groups of the transition state (TS) and the catalytic residues are continuously optimized to identify the accurate position of the TS. Pseudo-spheres are introduced for surrounding residues, so that the algorithm takes binding into account as early as the matching process. Recapitulation of native catalytic residue sites was used as a benchmark to evaluate the novel algorithm. The calculation results for the test set show that the native catalytic residue sites were successfully identified and ranked within the top 10 designs for 7 of the 10 chemical reactions. This indicates that the matching algorithm has the potential to be used for designing industrial enzymes for desired reactions.
4.
We present a fast algorithm to produce a graphic matrix representation of sequence homology. The algorithm is based on lexicographical ordering of fragments. It preserves most of the options of a simple naive algorithm with a significant increase in speed. This algorithm was the basis for a program, called DNAMAT, that has been extensively tested during the last three years at the Weizmann Institute of Science and has proven to be very useful. In addition we suggest a way to extend our approach to analyse a series of related DNA or RNA sequences, in order to determine certain common structural features. The analysis is done by summing a set of dot-matrices to produce an overall matrix that displays structural elements common to most of the sequences. We give an example of this procedure by analysing tRNA sequences.
Received on June 26, 1986; accepted on September 28, 1986
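The idea of building a dot matrix from lexicographically ordered fragments can be sketched as follows. This is a minimal illustration, not the DNAMAT implementation: the function names `dot_matrix` and `summed_matrix`, the fragment length `k`, and the choice to sum matrices computed against a common reference sequence are all assumptions made for the example.

```python
from collections import Counter
from itertools import groupby


def dot_matrix(seq_a, seq_b, k=3):
    """Return the set of (i, j) with seq_a[i:i+k] == seq_b[j:j+k]."""
    # Tag every length-k fragment with its source sequence and start position.
    fragments = [(seq_a[i:i + k], 0, i) for i in range(len(seq_a) - k + 1)]
    fragments += [(seq_b[j:j + k], 1, j) for j in range(len(seq_b) - k + 1)]
    # Lexicographic sorting places identical fragments next to each other,
    # so matches are found without comparing every pair of positions.
    fragments.sort()
    dots = set()
    for _, group in groupby(fragments, key=lambda f: f[0]):
        group = list(group)
        rows = [pos for _, src, pos in group if src == 0]
        cols = [pos for _, src, pos in group if src == 1]
        dots.update((i, j) for i in rows for j in cols)
    return dots


def summed_matrix(reference, others, k=3):
    """Count, per dot, how many sequences in `others` share it with `reference`.

    Hypothetical summing scheme: the abstract does not say which matrices
    are summed, so a common reference sequence is assumed here.
    """
    total = Counter()
    for seq in others:
        total.update(dot_matrix(reference, seq, k))
    return total
```

Dots with a high count in the summed matrix are those shared by most sequences, which is the kind of common feature the abstract describes.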
5.
MINRMS: an efficient algorithm for determining protein structure similarity using root-mean-squared-distance (cited 4 times: 0 self-citations, 4 by others)
MOTIVATION: Existing algorithms for automated protein structure alignment generate contradictory results and are difficult to interpret. An algorithm which can provide a context for interpreting the alignment and uses a simple method to characterize protein structure similarity is needed. RESULTS: We describe a heuristic for limiting the search space for structure alignment comparisons between two proteins, and an algorithm for finding minimal root-mean-squared-distance (RMSD) alignments as a function of the number of matching residue pairs within this limited search space. Our alignment algorithm uses coordinates of alpha-carbon atoms to represent each amino acid residue and requires a total computation time of O(m^3 n^2), where m and n denote the lengths of the protein sequences. This makes our method fast enough for comparisons of moderate-size proteins (fewer than approximately 800 residues) on current workstation-class computers and therefore addresses the need for a systematic analysis of multiple plausible shape similarities between two proteins using a widely accepted comparison metric.
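For reference, the RMSD metric itself is simple to compute once residue pairs have been matched. The sketch below assumes the two lists of alpha-carbon coordinates are already paired and superposed; it does not perform the alignment search or the superposition that MINRMS carries out, and the function name `rmsd` is chosen for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-squared distance between paired 3D points.

    Assumes coords_a[i] is matched to coords_b[i] and that both
    structures are already superposed in the same frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between one matched residue pair.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

Reporting the minimal RMSD as a function of the number of matched pairs, as the abstract describes, would mean evaluating this quantity over many candidate pairings of different sizes.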
6.
One key element in understanding the molecular machinery of the cell is to understand the structure and function of each protein encoded in the genome. A very successful means of inferring the structure or function of a previously unannotated protein is via sequence similarity with one or more proteins whose structure or function is already known. Toward this end, we propose a means of representing proteins using pairwise sequence similarity scores. This representation, combined with a discriminative classification algorithm known as the support vector machine (SVM), provides a powerful means of detecting subtle structural and evolutionary relationships among proteins. The algorithm, called SVM-pairwise, when tested on its ability to recognize previously unseen families from the SCOP database, yields significantly better performance than SVM-Fisher, profile HMMs, and PSI-BLAST.
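The representation step, in which a protein becomes a vector of similarity scores against a fixed set of training sequences, can be sketched as below. The scoring function `toy_similarity` (a count of shared k-mers) is a stand-in assumed for illustration only; the abstract speaks of pairwise sequence similarity scores without specifying them here, and no SVM is trained in this sketch.

```python
def kmer_set(seq, k=2):
    """All distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_similarity(a, b, k=2):
    """Stand-in similarity score: number of k-mers the sequences share."""
    return len(kmer_set(a, k) & kmer_set(b, k))


def pairwise_features(query, training_seqs, score=toy_similarity):
    """Represent `query` by its similarity score to each training sequence."""
    return [score(query, t) for t in training_seqs]
```

Each protein's feature vector has one entry per training sequence, so the vectors for all proteins have the same length and could be passed to any discriminative classifier.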
7.
Fahad Saeed Trairak Pisitkun Jason D Hoffert Sara Rashidian Guanghui Wang Marjan Gucek Mark A Knepper 《Proteome science》2013,11(Z1):S14
Phosphorylation site assignment of high throughput tandem mass spectrometry (LC-MS/MS) data is one of the most common and critical aspects of phosphoproteomics. Correctly assigning phosphorylated residues helps us understand their biological significance. The design of common search algorithms (such as Sequest, Mascot etc.) does not incorporate site assignment; therefore additional algorithms are essential to assign phosphorylation sites for mass spectrometry data. The main contribution of this study is the design and implementation of a linear time and space dynamic programming strategy for phosphorylation site assignment referred to as PhosSA. The proposed algorithm uses summation of peak intensities associated with theoretical spectra as an objective function. Quality control of the assigned sites is achieved using a post-processing redundancy criterion that indicates the signal-to-noise ratio properties of the fragmented spectra. The quality of the algorithm was assessed on experimentally generated data sets from synthetic peptides for which phosphorylation sites were known. We report that PhosSA was able to achieve a high degree of accuracy and sensitivity with all the experimentally generated mass spectrometry data sets. The implemented algorithm is shown to be extremely fast and scalable with increasing number of spectra (we report up to 0.5 million spectra/hour on a moderate workstation). The algorithm is designed to accept results from both Sequest and Mascot search engines. An executable is freely available at http://helixweb.nih.gov/ESBL/PhosSA/ for academic research purposes.
8.
9.
MOTIVATION: The rapidly growing protein structure repositories have opened up new opportunities for discovery and analysis of functional and evolutionary relationships among proteins. Detecting conserved structural sites that are unique to a protein family is of great value in identification of functionally important atoms and residues. Currently available methods are computationally expensive and fail to detect biologically significant local features. RESULTS: We propose Local Feature Mining in Proteins (LFM-Pro) as a framework for automatically discovering family-specific local sites and the features associated with these sites. Our method uses the distance field to backbone atoms to detect geometrically significant structural centers of the protein. A feature vector is generated from the geometrical and biochemical environment around these centers. These features are then scored, using a statistical measure, for their ability to distinguish a family of proteins from a background set of unrelated proteins, and successful features are combined into a representative set for the protein family. The utility and success of LFM-Pro are demonstrated on the trypsin-like serine protease family of proteins and on a challenging classification dataset via comparison with DALI. The results verify that our method is successful both in identifying the distinctive sites of a given family of proteins, and in classifying proteins using the extracted features. AVAILABILITY: The software and the datasets are freely available for academic research use at http://bioinfo.ceng.metu.edu.tr/Pub/LFMPro.
10.
Molecular mimicry: quantitative methods to study structural similarity between protein and RNA
With rapidly increasing availability of three-dimensional structures, one major challenge for the post-genome era is to infer the functions of biological molecules based on their structural similarity. While quantitative studies of structural similarity between the same type of biological molecules (e.g., protein vs. protein) have been carried out intensively, the comparable study of structural similarity between different types of biological molecules (e.g., protein vs. RNA) remains unexplored. Here we have developed a new bioinformatics approach to quantitatively study the structural similarity between two different types of biopolymers--proteins and RNA--based on the spatial distribution of conserved elements. We applied it to two previously proposed tRNA-protein mimicry pairs whose functional relatedness between two molecules has been recently determined experimentally. Our method detected the biologically meaningful signals, which are consistent with experimental evidence.
11.
Background
Protein function is often dependent on subsets of solvent-exposed residues that may exist in a similar three-dimensional configuration in non-homologous proteins, thus having different order and/or spacing in the sequence. Hence, functional annotation by means of sequence or fold similarity is not adequate for such cases.
12.
Antanas A. Glema 《Journal of molecular recognition : JMR》1990,3(3):137-141
In this review the results of the interaction of the active dyes used in the USSR textile industry with microbial enzymes and blood serum proteins are discussed. The complexity of dye/protein interaction and the dependence of this interaction on different factors is demonstrated. Some practical aspects of the use of dye containing sorbents are presented and discussed. Their suitability for RNA ligase and DNA ligase, acetate kinase, alcohol dehydrogenase, lactate dehydrogenase and glucose-6-phosphate dehydrogenase purification and blood serum protein fractionation is demonstrated.
13.
Sequence similarity between dopamine beta-hydroxylase and peptide alpha-amidating enzyme: evidence for a conserved catalytic domain (cited 1 time: 0 self-citations, 1 by others)
A comparison of human dopamine beta-hydroxylase (EC 1.14.17.1) with bovine peptide C-terminal alpha-amidating enzyme (EC 1.14.17.3) revealed a 28% identity extending throughout a common catalytic domain of approximately 270 residues. The shared biochemical properties of these two enzymes from neurosecretory granules suggest that the sequence similarity reflects a genuine homology and provides a structural basis for a new family of copper type II, ascorbate-dependent monooxygenases.
14.
A novel tool for computer-aided design of single-site mutations in proteins and peptides is presented. It proceeds by performing in silico all possible point mutations in a given protein or protein region and estimating the stability changes with linear combinations of database-derived potentials, whose coefficients depend on the solvent accessibility of the mutated residues. Upon completion, it yields a list of the most stabilizing, destabilizing or neutral mutations. This tool is applied to mouse, hamster and human prion proteins to identify the point mutations that are the most likely to stabilize their cellular form. The selected mutations are essentially located in the second helix, which presents an intrinsic preference to form beta-structures, with the best mutations being T183-->F, T192-->A and Q186-->A. The T183 mutation is predicted to be by far the most stabilizing one, but should be considered with care as it blocks the glycosylation of N181 and this blockade is known to favor the cellular to scrapie conversion. Furthermore, following the hypothesis that the first helix might induce the formation of hydrophilic beta-aggregates, several mutations that are neutral with respect to the structure's stability but improve the helix hydrophobicity are selected, among which is E146-->L. These mutations are intended as good candidates to undergo experimental tests.
15.
A yeast vacuolar protease, carboxypeptidase Y (CPY), is known to be involved in the C-terminal processing of peptides and proteins; however, its real function remains unclear. The CPY biosynthetic pathway has been used as a model system for protein sorting in eukaryotes. CPY is synthesized as a prepro-form that travels through the ER and Golgi to its final destination in vacuoles. In the course of studies on the transport mechanism of CPY, various post-translational events have been identified, e.g. carbohydrate modification and cleavage of the pre-segments. In addition, sorting signals and various sorting vehicles, similar to those found in higher eukaryotic cells, have been found. The catalytic triad in the active site of CPY makes this enzyme a serine protease. A unique feature distinguishing CPY from other serine proteases is its wide pH optimum, in particular its high activity at acidic pH. Several structural properties which might contribute to this unique feature exist such as a conserved free cysteine residue in the S1 substrate binding pocket, a recognition site for a C-terminal carboxyl group, and a disulfide zipper motif. The structural bases in CPY functions are discussed in this article.
16.
Selenocysteine is the 21st amino acid, which occurs in all kingdoms of life. Selenocysteine is encoded by the STOP-codon UGA. For its insertion, it requires a specific mRNA sequence downstream of the UGA-codon that forms a hairpin-like structure (called the Sec insertion sequence (SECIS)). We consider the computational problem of generating new amino acid sequences containing selenocysteine. This requires finding an mRNA sequence that is similar to the SECIS-consensus, is able to form the secondary structure required for selenocysteine insertion, and whose translation is maximally similar to the original amino acid sequence. We show that the problem can be solved in linear time when considering the hairpin-like SECIS-structure (and, more generally, when considering a structure that does not contain pseudoknots).
17.
A heuristic graph comparison algorithm and its application to detect functionally related enzyme clusters (cited 9 times: 1 self-citation, 9 by others)
The availability of computerized knowledge on biochemical pathways in the KEGG database opens new opportunities for developing computational methods to characterize and understand higher level functions of complete genomes. Our approach is based on the concept of graphs; for example, the genome is a graph with genes as nodes and the pathway is another graph with gene products as nodes. We have developed a simple method for graph comparison to identify local similarities, termed correlated clusters, between two graphs, which allows gaps and mismatches of nodes and edges and is especially suitable for detecting biological features. The method was applied to a comparison of the complete genomes of 10 microorganisms and the KEGG metabolic pathways, which revealed, not surprisingly, a tendency for formation of correlated clusters called FRECs (functionally related enzyme clusters). However, this tendency varied considerably depending on the organism. The relative number of enzymes in FRECs was close to 50% for Bacillus subtilis and Escherichia coli, but was <10% for Synechocystis and Saccharomyces cerevisiae. The FRECs collection is reorganized into a collection of ortholog group tables in KEGG, which represents conserved pathway motifs with the information about gene clusters in all the completely sequenced genomes.
18.