Similar literature
 20 similar documents found; search time: 515 ms
1.
Protein sequence comparison based on the wavelet transform approach   Total citations: 4, self-citations: 0, citations by others: 4
A protein's chemical properties, chain conformation, function and species specificity are determined by the information contained in its amino acid sequence. Proteins with similar functions share, at some level, identical stretches of amino acid sequence. The closer the phylogenetic relationship, the more similar the sequences. Finding the similarities between two or more protein sequences is therefore of great importance for protein sequence analysis, and the differences in the amino acid sequences permit the construction of an evolutionary family tree. In this work, a comparison method was devised that is capable of analysing a protein sequence 'hierarchically', i.e. it can examine a protein sequence at different spatial resolutions. Based on a wavelet decomposition of protein sequences and a cross-correlation study, a sequence-scale similarity concept is proposed for generating a similarity vector, which makes it feasible to compare two sequences at different spatial resolutions (scales). This new similarity concept extends the conventional notion of sequence similarity, which takes into account only local pairwise amino acid matches and ignores the information contained in coarser spatial resolutions.
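The sequence-scale similarity idea in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract names neither the numerical encoding nor the wavelet, so the sketch assumes the Kyte-Doolittle hydropathy scale, a Haar wavelet, and a zero-lag Pearson correlation standing in for the cross-correlation study. All function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values. Assumed encoding: the abstract does not name one.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map a protein sequence to a numerical signal (KeyError on non-standard residues)."""
    return [KD[aa] for aa in seq]


def haar_approximations(signal, levels):
    """Return Haar approximation coefficients; index 0 is the raw signal."""
    out = [list(signal)]
    current = list(signal)
    for _ in range(levels):
        if len(current) < 2:
            break
        if len(current) % 2:
            current = current + [current[-1]]  # pad odd lengths by repeating the last value
        current = [
            (current[i] + current[i + 1]) / math.sqrt(2)
            for i in range(0, len(current), 2)
        ]
        out.append(current)
    return out


def pearson(x, y):
    """Zero-lag Pearson correlation over the common prefix of x and y."""
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant signal; report 0
    return sxy / math.sqrt(sxx * syy)


def similarity_vector(seq_a, seq_b, levels=3):
    """One similarity value per scale, from finest (raw signal) to coarsest."""
    a = haar_approximations(encode(seq_a), levels)
    b = haar_approximations(encode(seq_b), levels)
    return [pearson(x, y) for x, y in zip(a, b)]
```

Calling `similarity_vector(seq_a, seq_b, levels=3)` returns up to four values, one per scale. Two sequences that differ residue by residue but share a coarse hydropathy profile would score low at scale 0 and higher at the coarser scales, which is the kind of information a purely local pairwise match ignores.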

2.
3.
S Inouye  A Nakazawa  T Nakazawa 《Gene》1986,44(2-3):235-242
The xylS gene is a regulatory gene that positively controls expression of the TOL plasmid genes encoding the enzymes that degrade benzoate or m-toluate in Pseudomonas putida. Cloning of the gene in Escherichia coli and determination of the nucleotide sequence revealed an open reading frame of 963 bp, which corresponds to a protein with an Mr of 36,502. The xylS gene was recloned onto a tac-promoter vector, and the product was identified by the maxicell procedure as a protein with an approximate Mr of 37,000. The predicted amino acid sequence of the XylS protein showed a basic character and contained a region similar to those in other DNA-binding proteins.

4.
An optimized design of the rabies virus glycoprotein (G protein) for use in DNA vaccines has been suggested. The design is a territorially adapted antigen constructed by taking into account the glycoprotein amino acid sequences of the rabies viruses registered in the Russian Federation and of the Vnukovo-32 vaccine strain. Based on the resulting consensus amino acid sequence, a codon-optimized nucleotide sequence of this modified glycoprotein was obtained and cloned into the pVAX1 plasmid (a latest-generation vector used to create DNA vaccines). A twofold increase in expression of this gene, compared with expression of the Vnukovo-32 strain viral glycoprotein gene in a similar vector, was registered in transfected cell culture. It has been demonstrated that accumulation of the modified G protein exceeds that of the control protein, synthesized from the plasmid carrying the Vnukovo-32 strain viral glycoprotein gene, by a factor of 20. Thus, the modified rabies virus glycoprotein can be considered a promising DNA vaccine antigen.

5.
The nucleotide sequence of the gene encoding the cellulose-binding protein A (CBPA) of Eubacterium cellulosolvens 5 was determined. The gene consists of an open reading frame of 3453 nucleotides and encodes a protein of 1151 amino acids with a molecular mass of 126408 Da. The deduced amino acid sequence of CBPA contained one domain highly similar to a catalytic domain of family 9 glycosyl hydrolases, two linker-like domains and four domains of unknown function. Among the four domains of unknown function, the region comprising domains 1 and 2 showed significant amino acid sequence homology with the cellulose-binding domains of the family 9 glycosyl hydrolases. The cloned gene was inserted into an expression vector, pBAD-TOPO, and expressed in Escherichia coli as a fusion protein. The fusion protein was detected by immunoblotting using antiserum against CBPA.

6.
A set of programs was developed for searching nucleic acid and protein sequence databases for sequences similar to a given sequence. The programs, written in FORTRAN 77, were optimized for vector processing on a Hitachi S810-20 supercomputer. A search of a 500-residue protein sequence against the entire PIR database Ver. 1.0 (1) (0.5 M residues) is carried out in a CPU time of 45 sec. About 4 min is required for an exhaustive search of a 1500-base nucleotide sequence against all mammalian sequences (1.2 M bases) in Genbank Ver. 29.0. The CPU time is reduced to about a quarter with a faster version.

7.
As a preliminary step toward understanding the function of the highly conserved Escherichia coli heat shock protein HtpG, the protein was purified and partially characterized. The htpG gene was subcloned into the inducible expression vector pT7-6. Upon induction, the HtpG protein accumulated to approximately 30% of the total protein in the cell. A purification scheme was devised which involved column chromatography on DEAE-cellulose, hydroxylapatite, and Sephacryl S-200. The amino acid composition of the purified protein corresponded closely with the predicted amino acid composition derived from the DNA sequence, and the sequence of the 8 amino-terminal residues matched the predicted sequence exactly. The molecular weight of the denatured protein is 65,500 and the native molecular weight is 144,620, as calculated by using both the Stokes radius and the sedimentation coefficient. As the molecular weight predicted from the DNA sequence is 71,429, this indicates that the HtpG protein is a dimer. The HtpG protein was found to be a phosphoprotein. Thus, HtpG is structurally similar to its eukaryotic homologue, hsp83, which is also a phosphoprotein and a dimer.

8.
Pseudomonas sp. A-01, isolated as a strain with chitosan-degrading activity, produced a 28 kDa chitosanase. Following purification of the chitosanase (Cto1) and determination of its N-terminal amino acid sequence, the corresponding gene (cto1) was cloned by a reverse-genetic technique. The gene encoded a protein of 266 amino acids, including a putative signal sequence (1-28), whose amino acid sequence was similar to those of known family-46 chitosanases. Cto1 was successfully overproduced and secreted by a Brevibacillus choshinensis transformant carrying the cto1 gene on the expression plasmid vector pNCMO2. The purified recombinant Cto1 protein was stable at pH 5-8 and showed its highest chitosan-hydrolyzing activity at pH 5. Replacement of two acidic amino acid residues, Glu23 and Asp41, which correspond to previously identified active centers in Streptomyces sp. N174 chitosanase, with Gln and Asn respectively caused a defect in the hydrolyzing activity of the enzyme.

9.
Catalase is a characteristic enzyme of peroxisomes. To study the molecular mechanisms of the biogenesis of peroxisomes and catalase in a less complex system than rat liver cells, we expressed recombinant rat catalase in Escherichia coli, which has no peroxisomes. The concentration of recombinant catalase produced in E. coli transformed with the expression vector carrying the complete coding region of rat catalase cDNA was about 0.1% of the total soluble protein. The recombinant catalase was purified by DEAE-cellulose column chromatography followed by acidic ethanol precipitations. Rat liver catalase and the recombinant enzyme were similar with respect to molecular mass, catalytic properties, absorption spectra, and iron content. The NH2-terminal amino acid sequence of the purified recombinant catalase, as determined by Edman degradation, was in complete agreement with the amino acid sequence predicted from the nucleotide sequence of rat catalase cDNA, except that the initiator methionine was not detected. The COOH-terminal amino acid sequence was determined by carboxypeptidase A digestion, and the sequence, -Ala-Asn-Leu-OH, matched the predicted COOH-terminal amino acid sequence of rat catalase. Recombinant rat catalase gave almost the same multiple protein bands on native polyacrylamide gel isoelectric focusing as observed with authentic rat liver catalase.

10.
11.
Lee S  Lee BC  Kim D 《Proteins》2006,62(4):1107-1114
Determining protein structure and inferring function from it is one of the main problems of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure content. Earlier attempts assumed that secondary structure content can be predicted successfully from the amino acid composition of a protein. Recent methods achieved remarkable prediction accuracy by using expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even with the simple amino acid composition alone, it is possible to improve the prediction accuracy significantly if evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share similar structures. The homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment produced by running PSI-BLAST, is calculated, and those 20 numbers are then learned by a multiple linear regression, an artificial neural network and a support vector regression. The overall average error of the support vector regression method is 3.3%. It is remarkable that we obtain comparable accuracy without using expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our novel idea can be applied to many areas of protein bioinformatics where amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
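The homolog-averaged amino acid composition described in the abstract above could be computed along these lines. This is a hedged sketch, not the authors' code: it assumes the homologous sequences have already been collected (for example from a PSI-BLAST alignment), it ignores gap characters and non-standard residues, and it uses an unweighted mean because the abstract does not say how homologs are weighted. The function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed order


def composition(seq):
    """Fraction of each standard amino acid; gaps and other symbols are skipped."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for aa in seq.upper():
        if aa in counts:
            counts[aa] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def homolog_averaged_composition(homologs):
    """Unweighted mean of the per-sequence compositions (the weighting is an assumption)."""
    comps = [composition(s) for s in homologs]
    if not comps:
        raise ValueError("at least one sequence is required")
    n = len(comps)
    return [sum(c[i] for c in comps) / n for i in range(len(AMINO_ACIDS))]
```

The resulting 20-element vector is the kind of input the abstract says is passed to the regression models; the regression step itself is not sketched here.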

12.
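Cloning of an AP2/ERF-B4 transcription factor from rapeseed and construction of expression vectors   Total citations: 1, self-citations: 0, citations by others: 1
Using the rapeseed UniGene database, with a conserved sequence of Arabidopsis transcription factors as the probe, the UniGene cluster Bna.17538 was isolated by in silico cloning. Further sequence assembly yielded BnaERFB4-1, a 672-bp rapeseed transcription factor of the AP2/ERF-B4 subfamily, which was then subjected to bioinformatic analysis. The results showed that BnaERFB4-1 is a hydrophilic protein whose tertiary structure closely resembles that of Arabidopsis RAP2.6L and whose degree of disorder is greater than that of Arabidopsis RAP2.6L. Primers were designed, and the BnaERFB4-1 gene was isolated by PCR and RT-PCR from the DNA and cDNA, respectively, of seedlings of the Brassica napus cultivar Huyou 15; the gene was named BnaERFB4-1-Hy15. Sequencing and analysis showed that BnaERFB4-1-Hy15 from Huyou 15 differs very little from the in silico cloned sequence: three amino acid positions differ, and the gene contains one intron. The BnaERFB4-1-Hy15 gene was digested with BamHI and SacI and inserted into the corresponding sites of the yeast expression vector YK3302 and the plant binary expression vector pYF1404, giving constructs for in vivo binding assays in yeast and for plant transformation. These constructs lay the groundwork for further study of the gene's role in regulating stress resistance in rapeseed.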

13.
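Structural modification of the silkworm antibacterial peptide CMIV gene and study of its expression product   Total citations: 20, self-citations: 0, citations by others: 20
Starting from the amino acid sequence of the natural antibacterial peptide CMIV component, nearly 50% of the sequence was altered, and an antibacterial peptide gene fragment was designed using codons preferred by Escherichia coli and then synthesized. The synthetic CMIV-like antibacterial peptide gene was first cloned into the sequencing vector pUC118; sequence analysis showed that the fragment cloned in pUC118 was identical to the designed sequence. The fragment was then cloned into the expression vector pET28(a), from which the antibacterial peptide was expressed as a fusion protein. The fusion protein was purified by nickel metal-ion affinity chromatography and then cleaved with CNBr; the final product had the same biological activity as the natural antibacterial peptide.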

14.
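Expression, purification, and activity analysis of a consensus interferon mutant in Pichia pastoris   Total citations: 1, self-citations: 0, citations by others: 1
A consensus interferon mutant gene was synthesized according to the codon bias of Pichia pastoris and cloned into the secretory yeast expression vector pMEX9K. The recombinant vector pMEX-CIFNm was linearized with SacI and transformed into Pichia pastoris GS115. After induction of the transformants, a protein with antiviral activity was produced in the culture supernatant. Three chromatographic steps (ion exchange, hydrophobic interaction, and gel filtration) yielded the recombinant consensus interferon mutant at a purity greater than 95%. N-terminal amino acid sequence analysis showed that the N-terminal sequence of the protein agreed with the theoretical sequence, and the molecular mass measured by mass spectrometry, 19.3 kD, also agreed with the theoretical value. Activity was measured by the cytopathic effect inhibition assay; combined with protein quantification by the Lowry method, the specific activity was calculated to be 6 × 10^8 IU/mg, comparable to that of consensus interferon.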

15.
16.
The third hypervariable domain, V3, of the human immunodeficiency virus type 1 gp120 envelope glycoprotein contains neutralizing epitopes and plays an important role in the diagnosis of HIV infection. Neutralizing antibodies bind to a conserved epitope with the sequence GPG in the V3 loop. The effect of sequence variation on the antigenic properties of the V3 epitope of gp120 was studied using five synthetic peptides. The amino acid sequence of the peptide corresponding to the V3 region of gp120 of HIV-1 subtype C showed the highest immunoreactivity. The DNA fragment encoding the V3-C region of gp120 was synthesized by polymerase chain reaction and cloned into the pET41b vector. The recombinant plasmid was expressed in E. coli cells, and the recombinant protein was purified using glutathione-S sepharose affinity chromatography. The serological activity of the recombinant protein was tested using ELISA and compared with the activity of a similar synthetic peptide. The results of this study showed that the most immunoreactive agent was the amino acid sequence of the V3 region of gp120 of HIV-1 subtype C. The recombinant antigen comprising this sequence was more antigenic than the synthetic peptide with the same sequence. The evaluation of this antigen shows that the protein is a good candidate for immunoassay development.

17.
18.
Du C  Niu R  Chu E  Zhang P  Lin X 《Journal of biochemistry》2006,139(5):913-920
Thymidylate synthase (TS), an important target for many anticancer drugs, has been cloned from different species. However, the cDNA properties and function of TS in zebrafish are not well documented. In order to use zebrafish as an animal model for screening novel anticancer agents, we isolated TS cDNA from zebrafish and compared its sequence with those from other species. The open reading frame (ORF) of the zebrafish TS cDNA sequence was 954 nucleotides, encoding a 318-amino acid protein with a calculated molecular mass of 36.15 kDa. The deduced amino acid sequence of zebrafish TS was similar to those from other organisms, including rat, mouse and humans. The zebrafish TS protein was expressed in Escherichia coli and purified to homogeneity. The purified zebrafish TS showed maximal activity at 28 °C, with a K(m) value similar to that of human TS. Western immunoblot assay confirmed that TS was expressed in all the developmental stages of zebrafish, with a high level of expression at the 1-4 cell stages. To study the function of TS in zebrafish embryo development, a short hairpin RNA (shRNA) expression vector, pSilencer 4.1-CMV/TS, was constructed which targeted the protein-coding region of zebrafish TS mRNA. Significant changes in the development of the tail and in epiboly were found in zebrafish embryos microinjected with the pSilencer 4.1-CMV/TS siRNA expression vector.

19.
The DNA encoding the elastase of Pseudomonas aeruginosa IFO 3455 was cloned, and its complete nucleotide sequence was determined. When the cloned gene was ligated to pUC18, the Escherichia coli expression vector, bacteria carrying the gene exhibited high levels of both elastase activity and elastase antigens. The amino acid sequence, deduced from the nucleotide sequence, revealed that the mature elastase consisted of 301 amino acids with a relative molecular mass of 32,926 daltons. The amino acid composition predicted from the DNA sequence was quite similar to the chemically determined composition of purified elastase reported previously. We also observed a nucleotide sequence encoding a signal peptide and a "pro" sequence consisting of 197 amino acids upstream from the mature elastase protein gene. The amino acid sequence analysis revealed that both the N-terminal sequence of the purified elastase and the N-terminal side sequences of the C-terminal tryptic peptide as well as the internal lysyl peptide fragment were completely identical to the deduced amino acid sequences. The pattern of identity of amino acid sequences was quite evident in the regions that include structurally and functionally important residues of Bacillus subtilis thermolysin.

20.
Li ZC  Zhou XB  Dai Z  Zou XY 《Amino acids》2009,37(2):415-425
Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate computational determination of structural class is very important in protein science. One key to a computational method is an accurate representation of the protein sample. Here, based on the concept of Chou’s pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246–255, 2001), a novel feature extraction method that combines the continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. Firstly, a digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was used to extract a new feature vector based on the wavelet power spectrum (WPS), which contains more abundant sequence-order information in the frequency and time domains, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was formed to represent the primary sequence by coupling the AAC vector with the set of new WPS feature vectors in an orthogonal space obtained by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality was improved, and the current approach to protein representation may serve as a useful complementary vehicle for classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein type and protein secondary structure.
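The wavelet power spectrum step in the abstract above can be illustrated with a small sketch. It is an illustration under assumptions rather than the authors' pipeline: the abstract does not name the mother wavelet, so a Mexican hat wavelet is assumed; the input is taken to be a numerical signal already obtained from some physicochemical mapping; the mean power per scale is an assumed way of summarizing the spectrum; and the PCA step is omitted. The function names are hypothetical.

```python
import math


def mexican_hat(t):
    """Unnormalized Mexican hat (Ricker) wavelet. Assumed choice of mother wavelet."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt_power(signal, scales):
    """Wavelet power |W(s, b)|^2 for every scale s and every position b."""
    if not signal:
        raise ValueError("signal must not be empty")
    n = len(signal)
    power = []
    for s in scales:
        row = []
        for b in range(n):
            # Direct discrete sum of signal[k] * psi((k - b) / s), scaled by 1 / sqrt(s).
            w = sum(signal[k] * mexican_hat((k - b) / s) for k in range(n)) / math.sqrt(s)
            row.append(w * w)
        power.append(row)
    return power


def scale_features(signal, scales):
    """Mean wavelet power per scale (an assumed summary of the spectrum)."""
    return [sum(row) / len(row) for row in cwt_power(signal, scales)]
```

The direct sum costs O(n²) operations per scale, which is acceptable for single proteins; an FFT-based convolution would be the usual choice for larger datasets.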
