Similar articles
 20 similar articles found
1.
Protein domain prediction is often the preliminary step in both experimental and computational protein research. Here we present a new method to predict the domain boundaries of a multidomain protein from its amino acid sequence using a fuzzy mean operator. Using the nr-sequence database together with a reference protein set (RPS) containing known domain boundaries, the operator assigns each residue of the query sequence a likelihood of belonging to a domain boundary. This procedure robustly identifies contiguous boundary regions. For a dataset with a maximum sequence identity of 30%, the average domain prediction accuracy of our method is 97% for single-domain proteins and 58% for multidomain proteins. The presented model can use new sequence/structure information without re-parameterization after each RPS update. When tested on a current database using a four-year-old RPS, and on a database that contains different domain definitions from those used to train the models, our method consistently yielded the same accuracy while two other published methods did not. A comparison with other domain prediction methods used in the CASP7 competition indicates that our method performs better than existing sequence-based methods.

2.
The sequencing of the Mycobacterium tuberculosis (MTB) H37Rv genome has facilitated deeper insights into the biology of MTB, yet the functions of many MTB proteins are unknown. We have used sensitive profile-based search procedures to assign functional and structural domains and thereby infer the functions of gene products encoded in MTB. These domain assignments have been made using a compendium of sequence and structural domain families. Functions are predicted for 78% of the encoded gene products. For 69% of these, functions can be inferred from domain assignments. The functions of the rest are deduced from their homology to proteins of known function. Superfamily relationships between families of unknown and known structures have increased structural information by ∼11%. Remote similarity detection methods have enabled domain assignments for 1325 ‘hypothetical proteins’. The most populated families in MTB are involved in lipid metabolism and in the entry and survival of the bacillus in the host. Interestingly, for 353 proteins, which we refer to as MTB-specific, no homologues have been identified. Numerous previously unannotated hypothetical proteins have been assigned domains, and some of these could be possible chemotherapeutic targets. MTB-specific proteins might include factors responsible for virulence. Importantly, these assignments could be valuable for experimental endeavors. The detailed results are publicly available at http://hodgkin.mbu.iisc.ernet.in/∼dots. An erratum to this article is available at .

3.
Summary: We have identified a cDNA clone encoding BMP receptor-associated molecule 1 (BRAM1) from the zebrafish expressed sequence tag (EST) database. The 2606 bp full-length bram1 cDNA was cloned and confirmed by nucleotide sequencing. The zebrafish sequence encodes a protein of 195 amino acids with an evolutionarily conserved MYND domain, which displays ∼98% homology with human and mouse BRAM1, and ∼64% homology with C. elegans BRA-1 and BRA-2. The bram1 gene, composed of five exons and four introns, spans ∼14 kb on linkage group 14 of the zebrafish genome. RT-PCR and whole-mount in situ hybridization analyses disclosed that zebrafish BRAM1 is a maternal factor. The protein interacts directly with zebrafish BMP Receptor type IA, as observed in GST pull-down and co-immunoprecipitation assays. Furthermore, cotransfection of zebrafish BRAM1 with the corresponding BMP receptor resulted in down-regulation of BMP-mediated signaling. Our results collectively indicate that BRAM1 plays a biological role during zebrafish development.

4.
We describe a method to identify protein domain boundaries from sequence information alone based on the assumption that hydrophobic residues cluster together in space. SnapDRAGON is a suite of programs developed to predict domain boundaries based on the consistency observed in a set of alternative ab initio three-dimensional (3D) models generated for a given protein multiple sequence alignment. This is achieved by running a distance geometry-based folding technique in conjunction with a 3D-domain assignment algorithm. The overall accuracy of our method in predicting the number of domains for a non-redundant data set of 414 multiple alignments, representing 185 single and 231 multiple-domain proteins, is 72.4%. Using domain linker regions observed in the tertiary structures associated with each query alignment as the standard of truth, inter-domain boundary positions are delineated with an accuracy of 63.9% for proteins comprising continuous domains only, and 35.4% for proteins with discontinuous domains. Overall, domain boundaries are delineated with an accuracy of 51.8%. The prediction accuracy values are independent of the pair-wise sequence similarities within each of the alignments. These results demonstrate the capability of our method to delineate domains in protein sequences associated with a wide variety of structural domain organisation.

5.
Kikuchi T 《Amino acids》2008,35(3):541-549
It is well known that the IgG-binding domain from staphylococcal protein A folds into a 3α helix bundle structure, while the IgG-binding domain of streptococcal protein G forms an (α + β) structure. Recently, He et al. (Biochemistry 44:14055–14061, 2005) made mutants of these proteins from the wild types of protein A and protein G strains. These mutants are referred to as protein A219 and protein G311, and it was shown that they have different 3D structures, i.e., the 3α helix bundle structure and the (α + β) structure, respectively, despite their high sequence identity (59%). The purpose of our study was to clarify how such 3D structural differences are coded in sequences with high homology. To address this problem, we introduce a predicted contact map constructed from interresidue average-distance statistics for predicting the folding properties of a protein. We refer to this map as an average distance map (ADM). Furthermore, the statistics of interresidue distances can be converted to an effective interresidue potential. We calculated the contact frequency of each residue of a protein in random conformations with this effective interresidue potential, and obtained values similar to ϕ values. We refer to this contact frequency of each residue as a p(μ) value. Comparison of the p(μ) values with the ϕ values for a protein suggests that p(μ) values reveal information on the folding initiation site. Using these techniques, in the present work we try to extract the information on the difference in the 3D structures of protein A219 and protein G311 that is coded in their amino acid sequences. The results show that the ADM analyses and the p(μ) value analyses predict the folding initiation sites, which can be used to detect the 3D difference between the two proteins.

6.
TonB is a protein prevalent in a large number of Gram-negative bacteria that is believed to be responsible for the energy transduction component in the import of ferric iron complexes and vitamin B12 across the outer membrane. We have analyzed all the TonB proteins currently contained in the Entrez database and have identified nine different clusters based on their conserved 90-residue C-terminal domain amino acid sequence. The vast majority of the proteins contained a single predicted cytoplasmic transmembrane domain; however, nine of the TonB proteins encompass a ∼290 amino acid N-terminal extension homologous to the MecR1 protein, which is composed of three additional predicted transmembrane helices. The periplasmic linker region, which is located between the N-terminal domain and the C-terminal domain, is extremely variable both in length (22–283 amino acids) and in proline content, indicating that a Pro-rich domain is not a required feature for all TonB proteins. The secondary structure of the C-terminal domain is found to be well preserved across all families, with the most variable region being between the second α-helix and the third β-strand of the antiparallel β-sheet. The fourth β-strand found in the solution structure of the Escherichia coli TonB C-terminal domain is not a well-conserved feature of TonB proteins in most of the clusters. Interestingly, several of the TonB proteins contained two C-terminal domains in series. This analysis provides a framework for future structure-function studies of TonB, and it draws attention to the unusual features of several TonB proteins. Byron C. H. Chu and R. Sean Peacock contributed equally to this work.

7.
8.
We have found certain conserved motifs and secondary structural patterns present in the vicinity of interior domain boundary points (dbps) by a data-driven approach without any a priori constraint on the type and number of such features, and without any requirement of sequence homology. We have used these motifs and patterns to rerank the solutions obtained by the well-known domain guess by size (DGS) algorithm. We predict, overall, five solutions. Our method [domain boundary prediction using conserved patterns (DPCP)] has improved the average accuracy of the top five solutions of DGS from 71.74 to 82.88% in the case of two-continuous-domain proteins, and from 21.38 to 80.56% for two-discontinuous-domain proteins. Considering only the top solution, the gains in accuracy are from 0 to 72.74% for two-continuous-domain proteins with chain lengths up to 300 residues, and from 0 to 62.85% for those with up to 400 residues. In the case of discontinuous domains, top_min solutions (the minimum number of solutions required for predicting all dbps of a protein) of DPCP improve the average accuracy of DGS prediction from 12.5 to 76.3% in proteins with chain lengths up to 300 residues, and from 13.33 to 70.84% for proteins with up to 400 residues. In our validation experiments, the performance of DPCP was also found to be superior to that of domain identification from secondary structure element alignment (DomSSEA), the best method reported so far for efficient prediction of domain boundaries using predicted secondary structure. The average accuracies of the topmost solution of DomSSEA are 61 and 52% for proteins with up to 300 and 400 residues, respectively, in the case of continuous domains; the corresponding accuracies for the discontinuous case are 28 and 21%.

9.
The interaction between fibrinogen and magnetite nanoparticles in solution has been studied by spin labeling, ferromagnetic resonance, and dynamic and Rayleigh light scattering. It is shown that protein molecules adsorb on the surface of nanoparticles to form multilayer protein covers. The number of molecules adsorbed on one nanoparticle amounts to ∼65 and the thickness of the adsorption layer amounts to ∼27 nm. Separate nanoparticles with fibrinogen covers (clusters) form aggregates due to interactions of the end D domains of fibrinogen. Under the influence of a direct magnetic field, nanoparticles with adsorbed proteins form linear aggregates parallel to the force lines. It is shown that the rate of protein coagulation during the formation of fibrin gel under the action of thrombin on fibrinogen decreases ∼2 times in the presence of magnetite nanoparticles, and the average fiber mass/length ratio grows.

10.
Altered glycosylation of plasma proteins has been directly implicated in the pathogenesis of rheumatoid arthritis (RA). The present study investigated the changes in Concanavalin-A (Con-A)-bound plasma proteins in RA patients compared with healthy controls. Two proteins (MW ∼32 kDa and ∼62 kDa) showed altered expression, while an altered monosaccharide profile (high mannose) was observed in the ∼62 kDa protein in the samples collected from RA patients. Two-dimensional polyacrylamide gel electrophoresis analysis of the Con-A-bound plasma samples showed a large number of protein spots, a few of which were differentially expressed in the RA patients. Some unidentified proteins were detected in the RA patients that were absent in the control samples. The present study therefore points to the role of carbohydrates, as well as that of the acute phase response, in the disease pathogenesis.

11.
Surface-enhanced laser desorption/ionization time-of-flight analysis was used to monitor both the kinetics and heterogeneity of product formation during the biotinylation of a number of model proteins and peptide targets. The selected molecules were the IgG-binding protein, protein A, human serum albumin, and a synthetic peptide corresponding to the N terminus of a streptococcal M1 protein. The extent of biotinylation was determined by kinetic analysis of the shift in molecular mass from the native material. Each residue modified by reaction with N-hydroxysuccinimide biotin resulted in an addition of ∼341 amu to the native protein or polypeptide. The novelty of the method was in the ability to determine the molecular mass shift, without first separating the targeted molecule from the biotinylating reagent. The analysis was rapid, simple, and provided information on the average number of biotin molecules added and the homogeneity of the resulting product.
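The arithmetic behind the mass-shift readout can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name and the example masses are hypothetical, and the only figure taken from the abstract is the approximate shift of 341 amu per modified residue.

```python
# Approximate mass added per biotinylated residue, as quoted in the abstract.
BIOTIN_SHIFT_AMU = 341.0


def average_biotin_count(native_mass, observed_mass, shift=BIOTIN_SHIFT_AMU):
    """Estimate the average number of biotin groups from a mass shift.

    Hypothetical helper: divides the observed increase in molecular mass
    by the per-residue shift.
    """
    if observed_mass < native_mass:
        raise ValueError("observed mass is below the native mass")
    return (observed_mass - native_mass) / shift


# Made-up masses chosen only to illustrate the calculation.
native = 66500.0
labelled = 68546.0
print(round(average_biotin_count(native, labelled), 1))
```

A non-integer result would be read as an average over a heterogeneous population of labelled molecules.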

12.
The purification and functional characterization of the protein kinase A catalytic subunit (PKAcat) from bovine lens cytosol are described. Purification to homogeneity was achieved by 100 kDa cut-off membrane filtration followed by Sephacryl S-300 chromatography and finally fractionation on a High Q anion exchange column. The purified protein migrates as a single band of molecular mass ∼41 kDa on 12.5% SDS-PAGE. Proteomic data from ion trap LC-MS, when analyzed through the NCBI BLAST program, reveal significant homology (52%) with bovine zeta-crystallin and also some homology with pig casein kinase I alpha chain (38%) and SLA-DR1 beta 1 domain (38%). The search does not indicate homology with any known catalytic subunit of PKA. In spite of the significant homology with zeta-crystallin, our protein differs from it in molecular mass. The pI value of the kinase (5.3) obtained from 2D analysis is also different from that of zeta-crystallin (8.5). The protein is found to contain 17% α-helix, 26.5% β-sheet, 21.4% turn and 34.7% random coil. The active catalytic subunit of the bovine lens cAMP-dependent kinase belongs to the Type I Cα subtype. The enzyme shows maximum activity at 30 min incubation in the presence of 5 mM MgCl2 and 50 μM ATP. The kinase shows broad substrate specificity. It prefers Ser over Thr as the phosphorylated residue. Phosphorylation by the kinase of crystallin proteins, the major protein fraction of bovine lens, and of the chaperone protein α-crystallin suggests that the kinase plays a crucial role in the regulation of chaperone function within the lens.

13.
Preparations of Na,K-ATPase from the outer medulla of rabbit kidney, purified in accordance with the method of P. L. Jorgensen, were shown to contain, as an admixture, a protease that moves with the α-subunit (∼100 kDa) as a single protein band during one-dimensional SDS-PAGE. Electro-elution of the proteins of this band from the polyacrylamide gel results in the appearance of two protein fragments (∼67 and 55 kDa) that are stained with polyclonal antibodies against the Na,K-ATPase α-subunit. Liquid chromatography/tandem mass spectrometry (LC/MS/MS) analysis showed that the neutral membrane-bound endopeptidase neprilysin is located in one protein band together with the Na,K-ATPase α-subunit. Addition of thiorphan, a specific inhibitor of neutral endopeptidase, eliminates proteolysis of the α-subunit. The data demonstrate that the Na,K-ATPase α-subunit may be a natural target for neprilysin.

14.
Lee S  Lee BC  Kim D 《Proteins》2006,62(4):1107-1114
Knowing protein structure and inferring function from it is one of the main issues of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure content. Previous attempts assumed that the secondary structure content of a protein can be predicted successfully from its amino acid composition. Recent methods achieved remarkable prediction accuracy by using expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even with the simple amino acid composition alone, it is possible to improve the prediction accuracy significantly if evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structures. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment produced by running PSI-BLAST, those 20 numbers are learned by a multiple linear regression, an artificial neural network and a support vector regression. The overall average error of the support vector regression method is 3.3%. It is remarkable that we obtain comparable accuracy without utilizing expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our idea can be applied to many areas of protein bioinformatics where amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
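The feature described above, a vector of 20 homolog-averaged amino acid frequencies, can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the homologs are assumed to be already extracted from an alignment as plain strings, and an unweighted mean over homologs is assumed because the abstract does not state the weighting.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20 amino acid frequencies of one sequence.

    Gap characters and non-standard letters are ignored.
    """
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def homolog_averaged_composition(seqs):
    """Average the composition vectors of a query and its homologs.

    Assumption: every homolog is weighted equally.
    """
    if not seqs:
        raise ValueError("at least one sequence is required")
    comps = [composition(s) for s in seqs]
    return [sum(col) / len(comps) for col in zip(*comps)]


# Toy aligned sequences (hypothetical), with '-' marking alignment gaps.
features = homolog_averaged_composition(["MKV-LA", "MKIALA", "MRV-LG"])
print(len(features))
```

The resulting 20-element list is the kind of input that would then be passed to a regression model; the regression step itself is not sketched here.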

15.
16.
Protein SHA-D of the SH3-Bergerac chimeric protein family was constructed by substituting the β-turn N47-D48 in the spectrin SH3 domain with the KATANDKTYE amino acid sequence. The structural and dynamic properties of SHA-D in solution were studied by high-resolution NMR spectroscopy. The extension of the SHA-D polypeptide chain compared with the wild-type protein WT-SH3 (∼17%) hardly affects the overall topology of the molecule. The spatial structure of SHA-D is nearly identical to those of the other proteins of the SH3-Bergerac family; however, there are some differences in the dynamic characteristics in the region of the insertion. The G52D substitution in the SHA-D protein results in the destabilization of the insertion region, where the conditions for conformational exchange appear. The destabilization further affects the entire SHA-D molecule, making its structure more labile.

17.

Background  

The kelch motif is an ancient and evolutionarily widespread sequence motif of 44–56 amino acids in length. It occurs as five to seven repeats that form a β-propeller tertiary structure. Over 28 kelch-repeat proteins have been sequenced and functionally characterised from diverse organisms ranging from viruses, plants and fungi to mammals, and it is evident from expressed sequence tag, domain and genome databases that many additional hypothetical proteins contain kelch repeats. In general, kelch-repeat β-propellers are involved in protein-protein interactions; however, the modest sequence identity between kelch motifs, the diversity of domain architectures, and the partial information on this protein family in any single species all present difficulties for developing a coherent view of the kelch-repeat domain and the kelch-repeat protein superfamily. To understand the complexity of this superfamily of proteins, we have used bioinformatics to analyse the complement of kelch-repeat proteins encoded in the human genome and have made comparisons with the kelch-repeat proteins encoded in other sequenced genomes.

18.
Li ZC  Zhou XB  Dai Z  Zou XY 《Amino acids》2009,37(2):415-425
Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate computational determination of protein structural class is very important in protein science. One of the keys to a computational method is accurate representation of the protein sample. Here, based on the concept of Chou’s pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246–255, 2001), a novel feature extraction method that combines the continuous wavelet transform (CWT) with principal component analysis (PCA) is introduced for the prediction of protein structural classes. Firstly, a digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was used to extract a new feature vector based on the wavelet power spectrum (WPS), which contains more abundant sequence-order information in the frequency domain and time domain, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was formed to represent the primary sequence by coupling the AAC vector with a set of new WPS feature vectors in an orthogonal space obtained by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality was improved, and that the current approach to protein representation may serve as a useful complementary vehicle for classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein type and protein secondary structure.

19.
Using pSXIVVI+X3 as an expression vector, an occluded recombinant Trichoplusia ni nuclear polyhedrosis virus carrying the cDNA encoding plasminogen activator inhibitor-2 (PAI-2) under the control of the Syn and XIV promoters has been constructed. SDS-PAGE and immunoblot analysis revealed that the virus-mediated PAI-2, with a molecular weight of ∼45 kDa, was synthesized in the Sf cells at a level of ∼16% of total intracellular protein and in the supernatant phase at a level of ∼64% of total extracellular protein secreted into the hemolymph of infected larvae. The expressed protein was similar to its authentic counterpart in terms of immunoreactivity and bioactivity. Received 5 May 1998 / Accepted in revised form 15 July 1998

20.
Cui J  Han LY  Lin HH  Tang ZQ  Jiang L  Cao ZW  Chen YZ 《Immunogenetics》2006,58(8):607-613
Major histocompatibility complex (MHC)-binding peptides are essential for antigen recognition by T-cell receptors and are being explored for vaccine design. Computational methods have been developed for predicting MHC-binding peptides of fixed lengths, based on the training of relatively few non-binders. It is desirable to introduce methods applicable for peptides of flexible lengths and trained by using more diverse sets of non-binders. MHC-BPS is a web-based MHC-binder prediction server that uses support vector machines for predicting peptide binders of flexible lengths for 18 MHC class I and 12 class II alleles from sequence-derived physicochemical properties, which were trained by using 4,208∼3,252 binders and 234,333∼168,793 non-binders, and evaluated by an independent set of 545∼476 binders and 110,564∼84,430 non-binders. The binder prediction accuracies are 86∼99% for 25 and 70∼80% for five alleles, and the non-binder accuracies are 96∼99% for 30 alleles. A screening of HIV-1 genome identifies 0.01∼5% and 5∼8% of the constituent peptides as binders for 24 and 6 alleles, respectively, including 75∼100% of the known epitopes. This method correctly predicts 73.3% of the 15 newly published epitopes in the last 4 months of 2005. MHC-BPS is available at . Electronic Supplementary Material: Supplementary material is available for this article at and is accessible for authorized users.
