Found 20 similar articles; search took 31 ms
1.
Protein domain prediction is often the preliminary step in both experimental and computational protein research. Here we present a new method to predict the domain boundaries of a multidomain protein from its amino acid sequence using a fuzzy mean operator. Using the nr-sequence database together with a reference protein set (RPS) containing known domain boundaries, the operator is used to assign a likelihood value for each residue of the query sequence as belonging to a domain boundary. This procedure robustly identifies contiguous boundary regions. For a dataset with a maximum sequence identity of 30%, the average domain prediction accuracy of our method is 97% for single-domain proteins and 58% for multidomain proteins. The presented model is capable of using new sequence/structure information without re-parameterization after each RPS update. When tested on a current database using a four-year-old RPS and on a database that contains different domain definitions than those used to train the models, our method consistently yielded the same accuracy while two other published methods did not. A comparison with other domain prediction methods used in the CASP7 competition indicates that our method performs better than existing sequence-based methods.
2.
The sequencing of the Mycobacterium tuberculosis (MTB) H37Rv genome has facilitated deeper insights into the biology of MTB, yet the functions of many MTB proteins are unknown.
We have used sensitive profile-based search procedures to assign functional and structural domains to infer functions of gene
products encoded in MTB. These domain assignments have been made using a compendium of sequence and structural domain families.
Functions are predicted for 78% of the encoded gene products. For 69% of these, functions can be inferred by domain assignments.
The functions for the rest are deduced from their homology to proteins of known function. Superfamily relationships between
families of unknown and known structures have increased structural information by ∼ 11%. Remote similarity detection methods
have enabled domain assignments for 1325 ‘hypothetical proteins’. The most populated families in MTB are involved in lipid
metabolism, entry and survival of the bacillus in host. Interestingly, for 353 proteins, which we refer to as MTB-specific,
no homologues have been identified. Numerous previously unannotated hypothetical proteins have been assigned domains, and
some of these could be possible chemotherapeutic targets. MTB-specific proteins might include factors responsible
for virulence. Importantly, these assignments could be valuable for experimental endeavors. The detailed results are publicly
available at http://hodgkin.mbu.iisc.ernet.in/∼dots.
An erratum to this article is available at .
3.
Summary We have identified a cDNA clone encoding BMP receptor-associated molecule 1 (BRAM1) from the zebrafish expressed sequence tag (EST) database. The 2606 bp full-length bram1 cDNA was cloned, and further confirmed by nucleotide sequencing. The zebrafish sequence encodes a protein of 195 amino acids with an evolutionarily conserved MYND domain, which displays ∼98% homology with human and mouse BRAM1, and ∼64% homology with C. elegans BRA-1 and BRA-2. The bram1 gene, composed of five exons and four introns, spans ∼14 kb on linkage group 14 of the zebrafish genome. RT-PCR and whole mount in situ hybridization analyses disclosed that zebrafish BRAM1 is a maternal factor. The protein interacts directly with zebrafish BMP Receptor type IA, as observed from GST-pull down and co-immunoprecipitation assays. Furthermore, cotransfection of zebrafish BRAM1 with the corresponding BMP receptor resulted in down-regulation of BMP-mediated signaling. Our results collectively indicate that BRAM1 plays a biological role during zebrafish development.
4.
We describe a method to identify protein domain boundaries from sequence information alone based on the assumption that hydrophobic residues cluster together in space. SnapDRAGON is a suite of programs developed to predict domain boundaries based on the consistency observed in a set of alternative ab initio three-dimensional (3D) models generated for a given protein multiple sequence alignment. This is achieved by running a distance geometry-based folding technique in conjunction with a 3D-domain assignment algorithm. The overall accuracy of our method in predicting the number of domains for a non-redundant data set of 414 multiple alignments, representing 185 single and 231 multiple-domain proteins, is 72.4 %. Using domain linker regions observed in the tertiary structures associated with each query alignment as the standard of truth, inter-domain boundary positions are delineated with an accuracy of 63.9 % for proteins comprising continuous domains only, and 35.4 % for proteins with discontinuous domains. Overall, domain boundaries are delineated with an accuracy of 51.8 %. The prediction accuracy values are independent of the pair-wise sequence similarities within each of the alignments. These results demonstrate the capability of our method to delineate domains in protein sequences associated with a wide variety of structural domain organisation.
5.
Kikuchi T 《Amino acids》2008,35(3):541-549
It is well-known that the IgG-binding domain from staphylococcal protein A folds into a 3α helix bundle structure, while the
IgG-binding domain of streptococcal protein G forms an (α + β) structure. Recently, He et al. (Biochemistry 44:14055–14061,
2005) made mutants of these proteins from the wild types of protein A and protein G strains. These mutants are referred to as
protein A219 and protein G311, and it was shown that these two mutants have different 3D structures, i.e., the 3α helix bundle
structure and the (α + β) structure, respectively, despite the high sequence identity (59%). The purpose of our study was
to clarify how such 3D structural differences are coded in the sequences with high homology. To address this problem, we introduce
a predicted contact map constructed based on the interresidue average-distance statistics for prediction of folding properties
of a protein. We refer to this map as an average distance map (ADM). Furthermore, the statistics of interresidue distances
can be converted to an effective interresidue potential. We calculated the contact frequency of each residue of a protein
in random conformations with this effective interresidue potential, and then we obtained values similar to ϕ values. We refer
to this contact frequency of each residue as a p(μ) value. The comparison of the p(μ) values to the ϕ values for a protein suggests that p(μ) values reveal the information on the folding initiation site. Using these techniques, we try to extract the information
on the difference in the 3D structures of protein A219 and protein G311 coded in their amino acid sequences in the present
work. The results show that the ADM analyses and the p(μ) value analyses predict the information of folding initiation sites, which can be used to detect the 3D difference in both
proteins.
6.
TonB is a protein prevalent in a large number of Gram-negative bacteria that is believed to be responsible for the energy
transduction component in the import of ferric iron complexes and vitamin B12 across the outer membrane. We have analyzed all the TonB proteins that are currently contained in the Entrez database and
have identified nine different clusters based on their conserved 90-residue C-terminal domain amino acid sequences. The vast
majority of the proteins contained a single predicted cytoplasmic transmembrane domain; however, nine of the TonB proteins
encompass a ∼290 amino acid N-terminal extension homologous to the MecR1 protein, which is composed of three additional predicted
transmembrane helices. The periplasmic linker region, which is located between the N-terminal domain and the C-terminal domain,
is extremely variable both in length (22–283 amino acids) and in proline content, indicating that a Pro-rich domain is not
a required feature for all TonB proteins. The secondary structure of the C-terminal domain is found to be well preserved across
all families, with the most variable region being between the second α-helix and the third β-strand of the antiparallel β-sheet.
The fourth β-strand found in the solution structure of the Escherichia coli TonB C-terminal domain is not a well conserved feature in TonB proteins in most of the clusters. Interestingly, several of
the TonB proteins contained two C-terminal domains in series. This analysis provides a framework for future structure-function
studies of TonB, and it draws attention to the unusual features of several TonB proteins.
Byron C. H. Chu and R. Sean Peacock contributed equally to this work.
7.
8.
We have found certain conserved motifs and secondary structural patterns present in the vicinity of interior domain boundary points (dbps) by a data-driven approach without any a priori constraint on the type and number of such features, and without any requirement of sequence homology. We have used these motifs and patterns to rerank the solutions obtained by the well-known domain guess by size (DGS) algorithm. We predict, overall, five solutions. Overall (i.e., for the top five predictions), our method [domain boundary prediction using conserved patterns (DPCP)] improves the average accuracy of the top five solutions of DGS from 71.74 to 82.88 % in the case of two-continuous-domain proteins, and from 21.38 to 80.56 % for two-discontinuous-domain proteins. Considering only the top solution, the gains in accuracy are from 0 to 72.74 % for two-continuous-domain proteins with chain lengths up to 300 residues, and from 0 to 62.85 % for those with up to 400 residues. In the case of discontinuous domains, top_min solutions (the minimum number of solutions required for predicting all dbps of a protein) of DPCP improve the average accuracy of DGS prediction from 12.5 to 76.3 % in proteins with chain lengths up to 300 residues, and from 13.33 to 70.84 % for proteins with up to 400 residues. In our validation experiments, the performance of DPCP was also found to be superior to that of domain identification from secondary structure element alignment (DomSSEA), the best method reported so far for efficient prediction of domain boundaries using predicted secondary structure. The average accuracies of the topmost solution of DomSSEA are 61 and 52 % for proteins with up to 300 and 400 residues, respectively, in the case of continuous domains; the corresponding accuracies for the discontinuous case are 28 and 21 %.
9.
A. V. Bychkova, O. N. Sorokina, A. L. Kovarski, A. B. Shapiro, V. B. Leonova, M. A. Rozenfel’d 《Biophysics》2010,55(4):544-549
The interaction between fibrinogen and magnetite nanoparticles in solution has been studied by the methods of spin labeling,
ferromagnetic resonance, dynamic and Rayleigh light scattering. It is shown that protein molecules adsorb on the surface of
nanoparticles to form multilayer protein covers. The number of molecules adsorbed on one nanoparticle amounts to ∼65 and the
thickness of the adsorption layer amounts to ∼27 nm. Separate nanoparticles with fibrinogen covers (clusters) form aggregates
due to interactions of the end D domains of fibrinogen. Under the influence of direct magnetic field, nanoparticles with adsorbed
proteins form linear aggregates parallel to the force lines. It is shown that the rate of protein coagulation during the formation
of fibrin gel under the action of thrombin on fibrinogen decreases ∼2 times in the presence of magnetite nanoparticles, and
the magnitude of the average fiber mass/length ratio grows.
10.
Raghav SK, Gupta B, Agrawal C, Saroha A, Das RH, Chaturvedi VP, Das HR 《Glycoconjugate journal》2006,23(3-4):167-173
Altered glycosylation of plasma proteins has been directly implicated in the pathogenesis of rheumatoid arthritis (RA). The
present study investigated the changes in the Concanavalin-A (Con-A)-bound plasma proteins in the RA patients in comparison to that of the healthy controls. Two proteins (MW ∼32 kDa
and ∼62 kDa) showed an alteration in expression while an altered monosaccharide profile (high mannose) was observed in the
∼62 kDa protein in the samples collected from RA patients. The 2-dimensional polyacrylamide gel electrophoresis analysis of
the Con-A-bound plasma samples showed a large number of protein spots, a few of which were differentially expressed in the
RA patients. Some unidentified proteins were detected in the RA patients which were absent in the control samples. The present
study, therefore, enunciates the role of carbohydrates as well as that of the acute phase response in the disease pathogenesis.
11.
Surface-enhanced laser desorption/ionization time-of-flight analysis was used to monitor both the kinetics and heterogeneity
of product formation during the biotinylation of a number of model proteins and peptide targets. The selected molecules were
the IgG-binding protein, protein A, human serum albumin, and a synthetic peptide corresponding to the N terminus of a streptococcal
M1 protein. The extent of biotinylation was determined by kinetic analysis of the shift in molecular mass from the native
material. Each residue modified by reaction with N-hydroxysuccinimide biotin resulted in an addition of ∼341 amu to the native protein or polypeptide. The novelty of the method
was in the ability to determine the molecular mass shift, without first separating the targeted molecule from the biotinylating
reagent. The analysis was rapid, simple, and provided information on the average number of biotin molecules added and the
homogeneity of the resulting product.
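The mass-shift arithmetic described in this abstract can be illustrated with a minimal Python sketch. It assumes that each observed peak is given as a (mass, intensity) pair and that each biotin adds roughly 341 amu, as the abstract states. The function names, the peak format, and the rounding of each shift to the nearest whole number of biotins are hypothetical choices made for illustration, not the authors' procedure.

```python
# Approximate mass added per modified residue, as stated in the abstract.
BIOTIN_SHIFT_AMU = 341.0


def biotin_fractions(native_mass, peaks, shift=BIOTIN_SHIFT_AMU):
    """Assign each (mass, intensity) peak an integer biotin count.

    Returns a dict mapping biotin count -> fraction of total intensity.
    Rounding to the nearest integer count is an illustrative assumption.
    """
    total = sum(intensity for _, intensity in peaks)
    if total <= 0:
        raise ValueError("peak intensities must sum to a positive value")
    fractions = {}
    for mass, intensity in peaks:
        count = round((mass - native_mass) / shift)
        fractions[count] = fractions.get(count, 0.0) + intensity / total
    return fractions


def average_biotin_number(fractions):
    """Intensity-weighted mean number of biotins per molecule."""
    return sum(count * frac for count, frac in fractions.items())


# Hypothetical spectrum: native species plus singly and doubly labelled forms.
example_peaks = [(10000.0, 1.0), (10341.0, 2.0), (10682.0, 1.0)]
example_fractions = biotin_fractions(10000.0, example_peaks)
example_average = average_biotin_number(example_fractions)
```

The spread of the fractions across counts gives a rough picture of product heterogeneity, while the weighted mean corresponds to the average number of biotin molecules added.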
12.
The purification and functional characterization of protein kinase A catalytic subunit (PKAcat) from bovine lens cytosol has
been described. Purification to homogeneity has been achieved by using 100 kDa cut-off membrane filtration followed by Sephacryl
S-300 chromatography and finally fractionating on High Q anion exchange column. The purified protein migrates as a single
band of molecular mass ∼41 kDa on 12.5% SDS-PAGE. Proteomic data from ion trap LC-MS when analyzed through NCBI blast program
reveals significant homology (52%) with bovine zeta-crystallin and also some homology with pig casein kinase I alpha chain
(38%) and SLA-DR1 beta 1 domain (38%). The search does not indicate homology with any known catalytic subunit of PKA. In spite
of the significant homology with zeta-crystallin, our protein differs from it in molecular mass. The pI value
of the kinase (5.3) obtained from 2D analysis is also different from zeta-crystallin (8.5). The protein is found to contain
17% α-helix, 26.5% β-sheet, 21.4% turn and 34.7% random coil. The active catalytic subunit of the bovine lens cAMP-dependent
kinase belongs to Type I Cα subtype. The enzyme shows maximum activity at 30 min incubation in presence of 5 mM MgCl2 and 50 μM ATP. The kinase shows broad substrate specificity. It prefers Ser over Thr as phosphorylating residue. Phosphorylation
of crystallin proteins, major protein fraction of bovine lens and phosphorylation of chaperone protein α crystallin by the
kinase suggests that the kinase plays a crucial role in the regulation of chaperone function within the lens.
13.
Groubman MA, Kamanina YV, Petrushanko IIu, Rubtsov AM, Lopina OD 《Biochemistry. Biokhimiia》2010,75(10):1281-1284
Preparations of Na,K-ATPase from outer medulla of rabbit kidney purified in accordance with the method of P. L. Jorgensen
were shown to contain as admixture a protease that moves with α-subunit (∼100 kDa) as a single protein band during one-dimensional
SDS-PAGE. The electro-elution of proteins of this band from polyacrylamide gel results in the appearance of two protein fragments
(∼67 and 55 kDa) that are stained with polyclonal antibodies against Na,K-ATPase α-subunit. Liquid chromatography/tandem mass
spectrometry (LC/MS/MS) analysis showed that the neutral membrane-bound endopeptidase neprilysin is located in one protein
band together with the Na,K-ATPase α-subunit. Addition of thiorphan, a specific inhibitor of neutral endopeptidase, eliminates
proteolysis of the α-subunit. The data demonstrate that Na,K-ATPase α-subunit may be a natural target for neprilysin.
14.
Knowing protein structure and inferring its function from the structure is one of the main issues of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure contents. Previous attempts assumed that the content of protein secondary structure can be predicted successfully using the information on the amino acid composition of a protein. Recent methods achieved remarkable prediction accuracy by using the expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even if we use only the simple amino acid composition information, it is possible to improve the prediction accuracy significantly if the evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structures. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment by running PSI-BLAST, those 20 numbers are learned by a multiple linear regression, an artificial neural network and a support vector regression. The overall average error of the support vector regression method is 3.3%. It is remarkable that we obtain comparable accuracy without utilizing the expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our novel idea can be applied to many areas of protein bioinformatics where the amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
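The homolog-averaged amino acid composition mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the homologs are already available as aligned strings (for example, extracted from a PSI-BLAST alignment), that gap and non-standard characters are simply ignored, and that every homolog is weighted equally. The function names are hypothetical.

```python
# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20 amino acid fractions of one sequence.

    Characters outside the 20 standard residues (gaps, 'X', etc.)
    are skipped; this is an illustrative assumption.
    """
    residues = [c for c in sequence.upper() if c in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    n = len(residues)
    return [residues.count(aa) / n for aa in AMINO_ACIDS]


def homolog_averaged_composition(aligned_sequences):
    """Average the composition vectors of a query and its homologs.

    Each sequence contributes equally (unweighted mean), which is an
    assumption; the original work may weight homologs differently.
    """
    vectors = [composition(seq) for seq in aligned_sequences]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Hypothetical two-sequence alignment with gap characters.
features = homolog_averaged_composition(["AAAA", "AC--"])
```

The resulting 20 numbers would then serve as the input features for a regression model (linear regression, a neural network, or support vector regression, as the abstract lists); the regression step itself is not shown here.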
15.
16.
V. S. Khristoforov, D. A. Prokhorov, M. A. Timchenko, Yu. A. Kudrevatykh, L. V. Gushchina, V. V. Filimonov, V. P. Kutyshenko 《Russian Journal of Bioorganic Chemistry》2010,36(4):468-476
Protein SHA-D of the SH3-Bergerac chimeric proteins family was constructed by the substitution of the β-turn N47-D48 in the
spectrin SH3 domain by the KATANDKTYE amino acid sequence. The structural and dynamic properties of SHA-D in the solution were studied by means of high-resolution
NMR spectroscopy. The extension of the SHA-D polypeptide chain in comparison with the wild type of protein WT-SH3 (∼17%) almost
does not affect the overall molecule topology. The spatial structure of SHA-D is nearly identical to those of the proteins
of the SH3-Bergerac family; however, there are some differences in the dynamic characteristics in the region of the insertion.
The G52D substitution in the SHA-D protein results in the destabilization of the insertion region, where the conditions for
the conformational exchange appear. The destabilization further affects the entire SHA-D molecule, making its structure more
labile.
17.
Background
The kelch motif is an ancient and evolutionarily-widespread sequence motif of 44–56 amino acids in length. It occurs as five to seven repeats that form a β-propeller tertiary structure. Over 28 kelch-repeat proteins have been sequenced and functionally characterised from diverse organisms spanning from viruses, plants and fungi to mammals and it is evident from expressed sequence tag, domain and genome databases that many additional hypothetical proteins contain kelch-repeats. In general, kelch-repeat β-propellers are involved in protein-protein interactions; however, the modest sequence identity between kelch motifs, the diversity of domain architectures, and the partial information on this protein family in any single species all present difficulties to developing a coherent view of the kelch-repeat domain and the kelch-repeat protein superfamily. To understand the complexity of this superfamily of proteins, we have analysed by bioinformatics the complement of kelch-repeat proteins encoded in the human genome and have made comparisons to the kelch-repeat proteins encoded in other sequenced genomes.
18.
Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and
accurate determination of protein structural class by computational methods is very important in protein science. One
key to such computational methods is accurate protein sample representation. Here, based on the concept of Chou’s pseudo-amino
acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246–255, 2001), a novel method of feature extraction
that combined continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction
of protein structural classes. Firstly, the digital signal was obtained by mapping each amino acid according to various physicochemical
properties. Secondly, CWT was utilized to extract new feature vector based on wavelet power spectrum (WPS), which contains
more abundant information of sequence order in frequency domain and time domain, and PCA was then used to reorganize the feature
vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector
was further formed to represent primary sequence by coupling AAC vector with a set of new feature vector of WPS in an orthogonal
space by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results
indicated that prediction quality has been improved, and the current approach of protein representation may serve as a useful
complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization,
membrane protein types and protein secondary structure, etc.
19.
X-W Zhang, T Sun, X-F Zhou, X-Y Zeng, X Liu, D-X Gu 《Journal of industrial microbiology & biotechnology》1998,21(4-5):175-177
Using pSXIVVI+X3 as an expression vector, an occluded recombinant Trichoplusia ni nuclear polyhedrosis virus carrying the cDNA encoding plasminogen activator inhibitor-2 (PAI-2) under the control of the
Syn and XIV promoters, has been constructed. SDS-PAGE and immunoblot analysis revealed that the virus-mediated PAI-2, with
a molecular weight of ∼45 kDa, was synthesized in the Sf cells at a level of ∼16% of total intracellular protein and in the
supernatant phase at a level of ∼64% of total extracellular protein secreted into the hemolymph of infected larvae. The expressed
protein was similar to its authentic counterpart in terms of immunoreactivity and bioactivity.
Received 5 May 1998 / Accepted in revised form 15 July 1998
20.
MHC-BPS: MHC-binder prediction server for identifying peptides of flexible lengths from sequence-derived physicochemical properties Total citations: 1, self-citations: 1, citations by others: 0
Major histocompatibility complex (MHC)-binding peptides are essential for antigen recognition by T-cell receptors and are being explored for vaccine design. Computational methods have been developed for predicting MHC-binding peptides of fixed lengths, based on the training of relatively few non-binders. It is desirable to introduce methods applicable for peptides of flexible lengths and trained by using more diverse sets of non-binders. MHC-BPS is a web-based MHC-binder prediction server that uses support vector machines for predicting peptide binders of flexible lengths for 18 MHC class I and 12 class II alleles from sequence-derived physicochemical properties, which were trained by using 4,208∼3,252 binders and 234,333∼168,793 non-binders, and evaluated by an independent set of 545∼476 binders and 110,564∼84,430 non-binders. The binder prediction accuracies are 86∼99% for 25 and 70∼80% for five alleles, and the non-binder accuracies are 96∼99% for 30 alleles. A screening of HIV-1 genome identifies 0.01∼5% and 5∼8% of the constituent peptides as binders for 24 and 6 alleles, respectively, including 75∼100% of the known epitopes. This method correctly predicts 73.3% of the 15 newly published epitopes in the last 4 months of 2005. MHC-BPS is available at . Electronic Supplementary Material: Supplementary material is available for this article at and is accessible for authorized users.