Similar literature
 Found 20 similar articles; search took 31 ms
1.

Background  

We previously developed EFICAz, an enzyme function inference approach that combines predictions from non-completely overlapping component methods. Two of the four components in the original EFICAz are based on the detection of functionally discriminating residues (FDRs). FDRs distinguish between members of an enzyme family that are homofunctional (classified under the EC number of interest) and those that are heterofunctional (annotated with another EC number or lacking enzymatic activity). Each of the two FDR-based components is associated with one of two specific kinds of enzyme families. EFICAz exhibits high precision, except when the maximal test to training sequence identity (MTTSI) is lower than 30%. To improve EFICAz's performance in this regime, we: i) increased the number of predictive components and ii) took advantage of consensual information from the different components to make the final EC number assignment.

2.
Enzyme function conservation has been used to derive the threshold of sequence identity necessary to transfer function from a protein of known function to an unknown protein. Using pairwise sequence comparison, several studies suggested that when the sequence identity is above 40%, enzyme function is well conserved. In contrast, Rost argued that because of database bias, the results from such simple pairwise comparisons might be misleading. Thus, by grouping enzyme sequences into families based on sequence similarity and selecting representative sequences for comparison, he showed that enzyme function starts to diverge quickly when the sequence identity is below 70%. Here, we employ a strategy similar to Rost's to reduce the database bias; however, we classify enzyme families based not only on sequence similarity, but also on functional similarity, i.e. sequences in each family must have the same four digits or the same first three digits of the enzyme commission (EC) number. Furthermore, instead of selecting representative sequences for comparison, we calculate the function conservation of each enzyme family and then average the degree of enzyme function conservation across all enzyme families. Our analysis suggests that for functional transferability, 40% sequence identity can still be used as a confident threshold to transfer the first three digits of an EC number; however, to transfer all four digits of an EC number, above 60% sequence identity is needed to have at least 90% accuracy. Moreover, when PSI-BLAST is used, the magnitude of the E-value is found to be weakly correlated with the extent of enzyme function conservation in the third iteration of PSI-BLAST. As a result, functional annotation based on the E-values from PSI-BLAST should be used with caution. 
We also show that by employing an enzyme family-specific sequence identity threshold above which 100% functional conservation is required, functional inference of unknown sequences can be accurately accomplished. However, this comes at a cost: those true positive sequences below this threshold cannot be uniquely identified.

3.
We investigate the conservation of amino acid residue sequences in 21 DNA-binding protein families and study the effects that mutations have on DNA-sequence recognition. The observations are best understood by assigning each protein family to one of three classes: (i) non-specific, where binding is independent of DNA sequence; (ii) highly specific, where binding is specific and all members of the family target the same DNA sequence; and (iii) multi-specific, where binding is also specific, but individual family members target different DNA sequences. Overall, protein residues in contact with the DNA are better conserved than the rest of the protein surface, but there is a complex underlying trend of conservation for individual residue positions. Amino acid residues that interact with the DNA backbone are well conserved across all protein families and provide a core of stabilising contacts for homologous protein-DNA complexes. In contrast, amino acid residues that interact with DNA bases have variable levels of conservation depending on the family classification. In non-specific families, base-contacting residues are well conserved and interactions are always found in the minor groove where there is little discrimination between base types. In highly specific families, base-contacting residues are highly conserved and allow member proteins to recognise the same target sequence. In multi-specific families, base-contacting residues undergo frequent mutations and enable different proteins to recognise distinct target sequences. Finally, we report that interactions with bases in the target sequence often, though not always, follow a universal code of amino acid-base recognition and the effects of amino acid mutations can be most easily understood for these interactions.

4.
Alignments grow, secondary structure prediction improves.   Cited by: 12 (self-citations: 0; citations by others: 12)
Using information from sequence alignments significantly improves protein secondary structure prediction. Typically, more divergent profiles yield better predictions. Recently, various groups have shown that accuracy can be improved significantly by using PSI-BLAST profiles to develop new prediction methods. Here, we focused on the influences of various alignment strategies on two 8-year-old PHD methods. The following results stood out. (i) PHD using pairwise alignments predicts about 72% of all residues correctly in one of the three states: helix, strand, and other. Using larger databases and PSI-BLAST raised accuracy to 75%. (ii) More than 60% of the improvement originated from the growth of current sequence databases; about 20% resulted from detailed changes in the alignment procedure (substitution matrix, thresholds, and gap penalties). Another 20% of the improvement resulted from carefully using iterated PSI-BLAST searches. (iii) It is of interest that we failed to improve prediction accuracy further when attempting to refine the alignment by dynamic programming (MaxHom and ClustalW). (iv) Improvement through family growth appears to saturate at some point. However, most families have not reached this saturation. Hence, we anticipate that prediction accuracy will continue to rise with database growth.

5.
Correlated mutation analyses (CMA) on multiple sequence alignments are widely used for the prediction of the function of amino acids. The accuracy of CMA‐based predictions is mainly determined by the number of sequences, by their evolutionary distances, and by the quality of the alignments. These criteria are best met in structure‐based sequence alignments of large super‐families. So far, CMA techniques have mainly been employed to study receptor interactions. The present work shows how a novel CMA tool, called Comulator, can be used to determine networks of functionally related residues in enzymes. These analyses provide leads for protein engineering studies that are directed towards modification of enzyme specificity or activity. As proof of concept, Comulator has been applied to four enzyme super‐families: the isocitrate lyase/phosphoenol‐pyruvate mutase super‐family, the hexokinase super‐family, the RmlC‐like cupin super‐family, and the FAD‐linked oxidases super‐family. In each of those cases, networks of functionally related residue positions were discovered that upon mutation influenced enzyme specificity and/or activity as predicted. We conclude that CMA is a powerful tool for redesigning enzyme activity and selectivity. Proteins 2009. © 2009 Wiley‐Liss, Inc.

6.
7.
Protein co-evolution under structural and functional constraints necessitates the preservation of important interactions. Identifying functionally important regions poses many obstacles in protein engineering efforts. In this paper, we present a bioinformatics-inspired approach (residue correlation analysis, RCA) for predicting functionally important domains from protein family sequence data. RCA comprises two major steps: (i) identifying pairs of residue positions that mutate in a coordinated manner, and (ii) using these results to identify protein regions that interact with an uncommonly high number of other residues. We hypothesize that strongly correlated pairs result not only from contacting pairs, but also from residues that participate in conformational changes during catalysis or in important interactions necessary for retaining functionality. The results show that highly mobile loops that assist in ligand association/dissociation tend to exhibit high correlation. RCA results exhibit good agreement with the findings of experimental and molecular dynamics studies for the three protein families that are analyzed: (i) DHFR (dihydrofolate reductase), (ii) cyclophilin, and (iii) formyl-transferase. Specifically, the specificity (percentage of correct predictions) in all three cases is substantially higher than that obtained by entropic measures or contacting residue pairs. In addition, we use our approach in a predictive fashion to identify important regions of a transmembrane amino acid transporter protein for which there is limited structural and functional information available.

8.
Protein threading using PROSPECT: design and evaluation   Cited by: 14 (self-citations: 0; citations by others: 14)
Xu Y  Xu D 《Proteins》2000,40(3):343-354
The computer system PROSPECT for protein fold recognition using the threading method is described and evaluated in this article. For a given target protein sequence and a template structure, PROSPECT guarantees to find a globally optimal threading alignment between the two. The scoring function for a threading alignment employed in PROSPECT consists of four additive terms: i) a mutation term, ii) a singleton fitness term, iii) a pairwise-contact potential term, and iv) alignment gap penalties. The current version of PROSPECT considers pair contacts only between core (alpha-helix or beta-strand) residues and alignment gaps only in loop regions. PROSPECT finds a globally optimal threading efficiently when pairwise contacts are considered only between residues that are spatially close (7 Å or less between the C(beta) atoms in the current implementation). On a test set consisting of 137 pairs of target-template proteins, each pair being from the same superfamily and having sequence identity

9.
Only a minority of currently known protein families is characterized structurally. This makes homology-based structure modeling an essential instrument that can be viewed as the first approximation to experimental determination of protein structure. Using sequence similarity searches, we detected a distant similarity between a family of uncharacterized hypothetical proteins, COG4849, and the family of tRNA nucleotidyltransferases. The suggested remote homology between the N-terminal domain of COG4849 and the catalytic domain of tRNA nucleotidyltransferase was further supported by comparison of sequence profiles, methods for fold recognition and structure modeling. The combined multiple alignment of the two families reveals shared conservation of functionally important motifs and suggests the similarity in catalytic mechanisms of the performed reactions. Our results suggest that (i) the N-terminal domain of proteins from COG4849 shares structural similarity with the catalytic domain of tRNA nucleotidyltransferase, and (ii) this domain catalyzes the nucleotidyl transfer reaction involving two metal ions.

10.
Vicatos S  Reddy BV  Kaznessis Y 《Proteins》2005,58(4):935-949
In this work we present a novel correlated mutations analysis (CMA) method that is significantly more accurate than previously reported CMA methods. Calculation of correlation coefficients is based on physicochemical properties of residues (predictors) and not on substitution matrices. This results in reliable prediction of pairs of residues that are distant in protein sequence but proximal in its three-dimensional tertiary structure. Multiple sequence alignments (MSA) containing a sequence of known structure for 127 families from the PFAM database have been selected so that all major protein architectures described in the CATH classification database are represented. Protein sequences in the selected families were filtered so that only those evolutionarily close to the target protein remain in the MSA. The average accuracy obtained for the alpha beta class of proteins was 26.8% of predicted proximal pairs with average improvement over random accuracy (IOR) of 6.41. Average accuracy is 20.6% for the mainly beta class and 14.4% for the mainly alpha class. The optimum correlation coefficient cutoff (cc cutoff) was found to be around 0.65. The first predictor, which correlates to hydrophobicity, provides the most reliable results. The other two predictors give good predictions which can be used in conjunction with those of the first one. When a stricter cc cutoff is chosen, the average accuracy increases significantly (38.76% for alpha beta class), but the trade-off is a smaller number of predictions. The use of solvent accessible area estimations for filtering false positives out of the predictions is promising.
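The idea of correlating a physicochemical property between alignment columns can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Kyte-Doolittle hydropathy scale, the plain Pearson coefficient, and the names `pearson` and `correlated_pairs` are assumptions introduced here; only the cc cutoff of about 0.65 comes from the abstract.

```python
import math
from itertools import combinations

# Kyte-Doolittle hydropathy values. Using this scale is an assumption:
# the abstract only says the first predictor correlates with hydrophobicity.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either column is (nearly) constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx < 1e-12 or syy < 1e-12:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(msa, cutoff=0.65):
    """Return (i, j, cc) for column pairs whose property correlation is >= cutoff.

    msa is a list of equal-length, gap-free sequences (hypothetical input format).
    """
    n_cols = len(msa[0])
    cols = [[KYTE_DOOLITTLE[seq[i]] for seq in msa] for i in range(n_cols)]
    pairs = []
    for i, j in combinations(range(n_cols), 2):
        cc = pearson(cols[i], cols[j])
        if cc >= cutoff:
            pairs.append((i, j, cc))
    return pairs

# Toy alignment: columns 0 and 2 switch between hydrophobic and charged
# residues together, while column 1 is invariant.
toy_msa = ["IAV", "LAI", "DAK", "EAR"]
pairs = correlated_pairs(toy_msa)
```

On the toy alignment only the column pair (0, 2) passes the cutoff; the invariant column carries no correlation signal. A real analysis would also need gap handling and sequence filtering, which the sketch omits.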

11.
The pyridoxal-5'-phosphate-dependent enzymes (B6 enzymes) are grouped into three main families named alpha, beta, and gamma. Proteins in the alpha and gamma families share the same fold and might be distantly related, while those in the beta family exhibit specific structural features. The rat aromatic L-amino acid decarboxylase (AADC; EC 4.1.1.28) catalyzes the synthesis of two important neurotransmitters: dopamine and serotonin. It binds the cofactor pyridoxal-5'-phosphate and belongs to the alpha family. Despite the low level of sequence identity (approximately 10%) shared by the rat AADC and the sequences of the enzymes belonging to the B6 enzymes family, including the known three-dimensional structures, a multiple sequence alignment was deduced. A model was built using segments belonging to seven of the eleven known structures. By homology, and based on knowledge of the biochemistry of the aspartate aminotransferase, structurally and functionally important residues were identified in the rat AADC. Site-directed mutagenesis of the conserved residues D271, T246, and C311 was carried out in order to confirm our predictions and highlight their functional role. Mutation of D271A and D271N resulted in complete loss of enzyme activity, while the D271E mutant exhibited 2% of the wild-type activity. Substitution of T246A resulted in 5% of the wild-type activity while the C311A mutant conserved 42% of the wild-type activity. A functional model of the AADC is discussed in view of the structural model and the complementary mutagenesis and labelling studies.

12.
MS‐based proteomics characterizes protein contents of biological samples. The most common approach is to first match observed MS/MS peptide spectra against theoretical spectra from a protein sequence database and then to score these matches. The false discovery rate (FDR) can be estimated as a function of the score by searching together the protein sequence database and its randomized version and comparing the score distributions of the randomized versus nonrandomized matches. This work introduces a straightforward isotonic regression‐based method to estimate the cumulative FDRs and local FDRs (LFDRs) of peptide identification. Our isotonic method not only performed as well as other methods used for comparison, but also has the advantages of being: (i) monotonic in the score, (ii) computationally simple, and (iii) not dependent on assumptions about score distributions. We demonstrate the flexibility of our approach by using it to estimate FDRs and LFDRs for protein identification using summaries of the peptide spectra scores. We reconfirmed that several of these methods were superior to a two‐peptide rule. Finally, by estimating both the FDRs and LFDRs, we showed that, for both peptide and protein identification, moderate FDR values (5%) corresponded to large LFDR values (53 and 60%).
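A target-decoy FDR estimate with a monotone fit can be sketched as below. This is a hedged illustration under stated assumptions, not necessarily the estimator used in the paper: it fits a non-increasing decoy probability with the pool-adjacent-violators algorithm and converts it to a local FDR as p/(1 - p), capped at 1. The function names and the `(score, is_decoy)` input format are hypothetical.

```python
def pava_nonincreasing(values):
    """Least-squares non-increasing fit via pool-adjacent-violators."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean is below the current one,
        # since that would violate the non-increasing constraint.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def cumulative_fdr(matches, threshold):
    """Decoys over targets among matches scoring at or above the threshold."""
    decoys = sum(1 for s, d in matches if s >= threshold and d)
    targets = sum(1 for s, d in matches if s >= threshold and not d)
    # Convention chosen for this sketch: report 0.0 when no target passes.
    return decoys / targets if targets else 0.0

def local_fdr(matches):
    """Return (score, lfdr) in ascending score order.

    matches is a list of (score, is_decoy) from a combined target + decoy
    search. The decoy probability p(score) is fitted as non-increasing in
    the score, and lfdr is taken as p / (1 - p), capped at 1 (an assumption).
    """
    ordered = sorted(matches, key=lambda m: m[0])
    p_decoy = pava_nonincreasing([1.0 if d else 0.0 for _, d in ordered])
    result = []
    for (score, _), p in zip(ordered, p_decoy):
        lfdr = 1.0 if p >= 0.5 else p / (1.0 - p)
        result.append((score, lfdr))
    return result

# Toy data: higher scores are mostly targets, lower scores mostly decoys.
toy_matches = [(1, True), (2, True), (3, False), (4, True), (5, False), (6, False)]
toy_lfdr = local_fdr(toy_matches)
toy_fdr = cumulative_fdr(toy_matches, threshold=3)
```

The monotone fit is what guarantees property (i) in the abstract: a higher score never receives a larger estimated error rate than a lower one.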

13.
Comparing and classifying the three-dimensional (3D) structures of proteins is of crucial importance to molecular biology, from helping to determine the function of a protein to determining its evolutionary relationships. Traditionally, 3D structures are classified into groups of families that closely resemble the grouping according to their primary sequence. However, significant structural similarities exist at multiple levels between proteins that belong to these different structural families. In this study, we propose a new algorithm, CLICK, to capture such similarities. The method optimally superimposes a pair of protein structures independent of topology. Amino acid residues are represented by the Cartesian coordinates of a representative point (usually the C(α) atom), side chain solvent accessibility, and secondary structure. Structural comparison is effected by matching cliques of points. CLICK was extensively benchmarked for alignment accuracy on four different sets: (i) 9537 pair-wise alignments between two structures with the same topology; (ii) 64 alignments from set (i) that were considered to constitute difficult alignment cases; (iii) 199 pair-wise alignments between proteins with similar structure but different topology; and (iv) 1275 pair-wise alignments of RNA structures. The accuracy of CLICK alignments was measured by the average structure overlap score and compared with other alignment methods, including HOMSTRAD, MUSTANG, Geometric Hashing, SALIGN, DALI, GANGSTA(+), FATCAT, ARTS and SARA. On average, CLICK produces pair-wise alignments that are either comparable or statistically significantly more accurate than all of these other methods. We have used CLICK to uncover relationships between (previously) unrelated proteins. 
These new biological insights include: (i) detecting hinge regions in proteins where domain or sub-domains show flexibility; (ii) discovering similar small molecule binding sites from proteins of different folds and (iii) discovering topological variants of known structural/sequence motifs. Our method can generally be applied to compare any pair of molecular structures represented in Cartesian coordinates as exemplified by the RNA structure superimposition benchmark.

14.
Culture independent PCR: an alternative enzyme discovery strategy   Cited by: 1 (self-citations: 0; citations by others: 1)
Degenerate primers were designed for use in a culture-independent PCR screening of DNA from composite fungal communities inhabiting residues of corn stovers and leaves. According to similarity searches and alignments, amplified clone sequences affiliated with glycosyl hydrolase family 7 and glycosyl hydrolase family 45, though significant sequence divergence was observed. Glycosyl hydrolases from families 7 and 45 play a crucial role in biomass conversion to fuel ethanol. Research in this renewable energy source has two objectives: (i) to contribute to development of a renewable alternative to the world's limited crude fossil oil reserves and (ii) to reduce air pollution. Amplification with 18S rDNA-specific primers revealed species within the ascomycetous orders Sordariales and Hypocreales as well as the basidiomycetous order Agaricales to be present in these communities. Our study documents the value of culture-independent PCR in microbial diversity studies and could add to development of a new enzyme screening technology.

15.
Position-specific substitution matrices, known as profiles, derived from multiple sequence alignments are currently used to search sequence databases for distantly related members of protein families. The performance of the database searches is enhanced by using (i) a sequence weighting scheme which assigns higher weights to more distantly related sequences based on branch lengths derived from phylogenetic trees, (ii) exclusion of positions with mainly padding characters at sites of insertions or deletions and (iii) the BLOSUM62 residue comparison matrix. A natural consequence of these modifications is an improvement in the alignment of new sequences to the profiles. However, the accuracy of the alignments can be further increased by employing a similarity residue comparison matrix. These developments are implemented in a program called PROFILEWEIGHT which runs on Unix and Vax computers. The only input required by the program is the multiple sequence alignment. The output from PROFILEWEIGHT is a profile designed to be used by existing searching and alignment programs. Test results from database searches with four different families of proteins show the improved sensitivity of the weighted profiles.

16.
A gene encoding a copper/zinc superoxide dismutase (Cu/Zn-SOD) of a filarial nematode, Brugia malayi, has been isolated and the biochemical properties of a functionally expressed recombinant enzyme were investigated. The cloned complementary DNA contained a single open reading frame of 477 bp encoding 158 amino acids (aa), which conserved metal-binding residues as well as residues specific for Cu/Zn-SODs. Comparison of the deduced aa sequence of the enzyme with that of other helminth species, including filarial worms, showed a high degree of similarity (49-98%). Recombinant enzyme of 32 kDa had an isoelectric point of 6.6 and was shown to consist of 2 subunits linked by interchain disulfide bonds. Enzyme activity of the recombinant protein was inhibited by potassium cyanide and hydrogen peroxide but not by sodium azide. It showed a wide range of pH optima, i.e., 7.0-11.0, and was highly resistant to heat inactivation.

17.
Enzyme function is much less conserved than anticipated, i.e., the sequence similarity required to imply similarity in enzymatic function is much higher than that required to imply similarity in protein structure. This is because the function of an enzyme is an extremely complicated problem that may involve very subtle structural details as well as many other physicochemical factors. Accordingly, an approach based on sequence similarity alone would hardly achieve a decent success rate in predicting enzyme sub-class, even for a dataset consisting of samples with 50% sequence identity. To cope with this situation, the GO-PseAA predictor was adopted to identify the sub-class for each of the six main enzyme families. It has been observed that, even for the much more stringent datasets in which none of the enzymes has 25% sequence identity to any others, the overall success rates are 73-95%, suggesting that the GO-PseAA predictor can catch the core features of the statistical samples concerned and may become a useful high throughput tool in proteomics and bioinformatics.

18.
Lee Y  Mick J  Furdui C  Beamer LJ 《PloS one》2012,7(6):e38114
Coevolution analyses identify residues that co-vary with each other during evolution, revealing sequence relationships unobservable from traditional multiple sequence alignments. Here we describe a coevolutionary analysis of phosphomannomutase/phosphoglucomutase (PMM/PGM), a widespread and diverse enzyme family involved in carbohydrate biosynthesis. Mutual information and graph theory were utilized to identify a network of highly connected residues with high significance. An examination of the most tightly connected regions of the coevolutionary network reveals that most of the involved residues are localized near an interdomain interface of this enzyme, known to be the site of a functionally important conformational change. The roles of four interface residues found in this network were examined via site-directed mutagenesis and kinetic characterization. For three of these residues, mutation to alanine reduces enzyme specificity to ~10% or less of wild-type, while the other has ~45% activity of wild-type enzyme. An additional mutant of an interface residue that is not densely connected in the coevolutionary network was also characterized, and shows no change in activity relative to wild-type enzyme. The results of these studies are interpreted in the context of structural and functional data on PMM/PGM. Together, they demonstrate that a network of coevolving residues links the highly conserved active site with the interdomain conformational change necessary for the multi-step catalytic reaction. This work adds to our understanding of the functional roles of coevolving residue networks, and has implications for the definition of catalytically important residues.
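The combination of mutual information and a residue network can be sketched as follows. This is a minimal illustration under assumptions, not the pipeline used in the study: it uses uncorrected mutual information in bits, a fixed threshold, and node degree as the measure of connectivity. The function names and the gap-free toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_joint = count / n
        # p_joint / (p_a * p_b), written with counts to limit rounding.
        mi += p_joint * math.log2(count * n / (count_a[a] * count_b[b]))
    return mi

def coevolution_edges(msa, threshold):
    """Return (i, j, mi) for column pairs whose mutual information >= threshold."""
    columns = [[seq[i] for seq in msa] for i in range(len(msa[0]))]
    edges = []
    for i, j in combinations(range(len(columns)), 2):
        mi = mutual_information(columns[i], columns[j])
        if mi >= threshold:
            edges.append((i, j, mi))
    return edges

def node_degrees(edges):
    """Count how many coevolution edges touch each alignment position."""
    degrees = Counter()
    for i, j, _ in edges:
        degrees[i] += 1
        degrees[j] += 1
    return degrees

# Toy alignment: positions 0 and 1 change together; position 2 is invariant;
# position 3 varies independently of the others.
toy_msa = ["AKLD", "AKLE", "GRLD", "GRLE"]
edges = coevolution_edges(toy_msa, threshold=0.5)
degrees = node_degrees(edges)
```

Positions with high degree in such a network are the "highly connected" candidates; a real analysis would additionally correct for phylogenetic bias and assess significance, which this sketch does not attempt.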

19.
In order to bridge the gap between proteins with three-dimensional (3-D) structural information and those without 3-D structures, extensive experimental and computational efforts for structure recognition are being invested. One of the rapid and simple computational approaches for structure recognition makes use of sequence profiles with sensitive profile matching procedures to identify remotely related homologous families. While adopting this approach we used profiles that are generated from structure-based sequence alignment of homologous protein domains of known structures integrated with sequence homologues. We present an assessment of this fast and simple approach. About one year ago, using this approach, we had identified structural homologues for 315 sequence families, which were not known to have any 3-D structural information. The subsequent experimental structure determination for at least one of the members in 110 of 315 sequence families allowed a retrospective assessment of the correctness of structure recognition. We demonstrate that correct folds are detected with an accuracy of 96.4% (106/110). Most (81/106) of the associations are made correctly to the specific structural family. For 23/106, the structure associations are valid at the superfamily level. Thus, profiles of protein families of known structure, when used with a sensitive profile-based search procedure, result in structure association of high confidence. Further assignment at the level of superfamily or family would provide clues to probable functions of new proteins. Importantly, the public availability of these profiles from us could enable one to perform genome-wide structure assignment on a local machine in a fast and accurate manner.

20.
Family 28 is one of the largest families of glycoside hydrolases. It covers several enzyme specificities of bacterial, fungal, plant and insect origins. This study deals with all available amino acid sequences of family 28 members. First, it focuses on the detailed analysis of 115 sequences of polygalacturonases yielding their evolutionary tree. The large data set made it possible to modify some of the existing family 28 sequence characteristics and to identify the sequence features specific to bacterial and fungal exopolygalacturonases that discriminate them from the endopolygalacturonases. The evolutionary tree reflects both taxonomy and specificity, so that bacterial, fungal and plant enzymes form their own clusters, with the endo- and exo-modes of action also being respected. The only insect (animal) representative is most closely related to fungal endopolygalacturonases. The present study further provides: (i) an analysis of available rhamnogalacturonase sequences; (ii) an elucidation of the relatedness between the recently added member, the endo-xylogalacturonan hydrolase, and the rest of the family; and (iii) the sequence features characteristic of the individual enzyme specificities and the evolutionary relationships within the entire family 28. The disulfides common to the individual enzyme groups were also proposed. With regard to functionally important residues of polygalacturonases, xylogalacturonan hydrolase possesses all of them, while the rhamnogalacturonases, known to lack the histidine residue (His223; Aspergillus niger polygalacturonase II numbering), have a further tyrosine (Tyr291) replaced by a conserved tryptophan. Evolutionarily, the xylogalacturonan hydrolase is most closely related to fungal exopolygalacturonases and the rhamnogalacturonases form their own cluster on the adjacent branch.
