Similar literature
 Found 20 similar articles; search took 15 ms
1.
A simple approach for rapidly scanning a large protein sequence database for homology is described. The approach depends strictly on the organization of the database. A database has been compiled in which protein sequences are grouped into families of closely related proteins, each family being characterized by its average dipeptide composition. A new entry can be allocated to a family by comparing its dipeptide composition with the average dipeptide composition of each family.
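The allocation step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how compositions are compared, so the Euclidean distance used here is an assumption, and the function names `dipeptide_composition`, `family_average`, and `allocate` are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of the 20 standard amino acids.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return the 400 dipeptide frequencies of seq as fractions."""
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DIPEPTIDES)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def family_average(sequences):
    """Average the dipeptide compositions of all members of one family."""
    vectors = [dipeptide_composition(s) for s in sequences]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def allocate(seq, family_profiles):
    """Return the family whose average profile is closest to seq.

    Closeness is Euclidean distance, which is an assumption of this sketch.
    """
    query = dipeptide_composition(seq)

    def distance(profile):
        return sum((q - p) ** 2 for q, p in zip(query, profile)) ** 0.5

    return min(family_profiles, key=lambda name: distance(family_profiles[name]))
```

With toy families such as `{"gly_rich": family_average([...]), "lys_rich": family_average([...])}`, a new sequence is compared only against one averaged vector per family, not against every database entry, which is what makes the scan fast.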

2.
Protein tyrosine phosphorylation is an important regulatory mechanism in cell physiology. While the protein tyrosine kinase (PTKase) family has been extensively studied, only six protein tyrosine phosphatases (PTPases) have been described. By Southern blot analysis, genomic DNA from several different phyla was found to cross-hybridize with a cDNA probe encoding the human leukocyte-common antigen (LCA; CD45) PTPase domains. To pursue this observation further, total mRNA from the protochordate Styela plicata was used as a template to copy and amplify PTPase domains using polymerase chain reaction (PCR) technology. Twenty-seven distinct sequences were identified that contain hallmark residues of PTPases; two of these are similar to described mammalian PTPases. Southern blot analysis indicates that at least one other Styela sequence is highly conserved in a variety of phyla. Seven of the Styela domains have significant similarity to each other, indicating a subfamily of PTPases. However, most of the sequences are disparate. A comparison of the 27 Styela sequences with the ten known PTPase domain sequences reveals that only three residues are absolutely conserved and identifies regions that are highly divergent. The data indicate that the PTPase family will be as large and diverse as the PTKase family. The extent and diversity of the PTPase family suggest that these enzymes are, in their own right, important regulators of cell behavior. The nucleotide sequence data reported in this paper have been submitted to the GenBank nucleotide sequence database and have been assigned the accession numbers M37986-M38041.

3.
Abstract

Conserved protein sequence segments are commonly believed to correspond to functional sites in the protein sequence. A novel approach is proposed to profile the changing degree of conservation along the protein sequence, by evaluating the occurrence frequencies of all short oligopeptides of the given sequence in a large proteome database. Thus, a protein sequence conservation profile can be plotted for every protein. The profile indicates where along the sequences the potential functional (conserved) sites are located. The corresponding oligopeptides belonging to the sites are very frequent across many prokaryotic species. Analysis of a representative set of such profiles reveals a common feature of all examined proteins: they consist of sequence modules represented by the peaks of conservation. The typical size of the modules (peak-to-peak distance) is 25–30 amino acid residues.
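A conservation profile of the kind described in this abstract can be sketched by counting how often each short oligopeptide of a query sequence occurs in a reference collection. This is only an illustration under stated assumptions: the oligopeptide length `k`, the use of raw counts with no normalization, and the names `kmer_counts` and `conservation_profile` are all choices of this sketch, not details given in the abstract.

```python
from collections import Counter


def kmer_counts(database, k):
    """Count every oligopeptide of length k across all database sequences."""
    counts = Counter()
    for seq in database:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def conservation_profile(seq, counts, k):
    """Return one value per position: the database count of the k-mer starting there."""
    return [counts[seq[i:i + k]] for i in range(len(seq) - k + 1)]
```

Plotting the returned list against position gives the profile; peaks mark oligopeptides that recur across the database.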

4.
MOTIVATION: The PFDB (Protein Family Database) is a new database designed to integrate protein family-related data with relevant functional and genomic data. It currently manages biological data for three projects: the CATH protein domain database (Orengo et al., 1997; Pearl et al., 2001), the VIDA virus domains database (Albà et al., 2001) and the Gene3D database (Buchan et al., 2001). The PFDB has been designed to accommodate protein families identified by a variety of sequence-based or structure-based protocols and provides a generic resource for biological research by enabling mapping between different protein families and diverse biochemical and genetic data, including complete genomes. RESULTS: A characteristic feature of the PFDB is that it has a number of meta-level entities (for example aggregation, collection and inclusion) represented as base tables in the final design. The explicit representation of relationships at the meta-level has a number of advantages, including flexibility, both in terms of the range of queries that can be formulated and the ability to integrate new biological entities within the existing design. A potential drawback of this approach, poor performance caused by the number of joins across meta-level tables, is avoided by implementing the PFDB with materialized views using the mature relational database technology of Oracle 8i. The resultant database is both fast and flexible. This paper presents the principles on which the database has been designed and implemented, and describes the current status of the database and the query facilities supported.

5.
6.
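舒为, 田晓玉, 赵洪伟. 《微生物学报》 (Acta Microbiologica Sinica), 2020, 60(9): 1999-2011
[Objective] Haikou, Hainan, has abundant hot spring resources. Studying the microbial diversity of these hot springs will help further develop and utilize the microbial resources of Hainan's hot springs. [Methods] Illumina HiSeq high-throughput sequencing was used to sequence the microbial ITS sequences and the V3-V4 region of the 16S rRNA gene in water samples from three hot springs in Haikou [Haidian Island Rongyu hot spring (S1), Volcano Crater Happy Farm hot spring (S2), and West Coast Haichangliu hot spring (S3)], followed by bioinformatic analysis, to explore the fungal and bacterial diversity of hot springs in three different areas of Haikou. [Results] (1) α-diversity analysis showed that in the fungal communities S3 > S1 > S2, whereas in the bacterial communities S2 > S1 > S3. β-diversity analysis showed that the composition of both the fungal and the bacterial communities differed significantly among the three hot springs. (2) Taxonomic analysis showed that the dominant fungal phyla were Ascomycota and Basidiomycota, and the dominant bacterial phyla were Proteobacteria, Bacteroidetes, Thermi, Nitrospirae, Chlorobi, Firmicutes, Chloroflexi, and Actinobacteria. (3) Canonical correspondence analysis (CCA) showed that the main factor influencing the fungal communities of the three hot springs was temperature, whereas the main factor influencing the bacterial communities was total phosphorus. [Conclusion] The hot springs of Haikou, Hainan Province, contain abundant microbial resources. Their microbial community composition is influenced by multiple environmental factors, and the main environmental factors affecting fungi and bacteria differ.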

7.
Abstract

The long-wavelength circular dichroism (CD) changes induced by binding of fd gene 5 protein to the alternating DNA sequences poly[d(A-C)] and poly[d(C-T)] were similar to those induced by the protein complexed with the homopolymers poly[d(A)], poly[d(C)], and poly[d(T)]. The fd gene 5 protein showed different binding affinities for the various polymers. The affinity for the alternating sequences was not compositionally weighted with respect to the affinities for the homopolymers, indicating that both base composition and base sequence of the template are important for the binding of fd gene 5 protein.

8.
Abstract

In this paper, we propose a nongraphical representation for protein secondary structures. By counting the frequency of occurrence of all possible four-tuples (i.e., four-letter words) of a protein secondary structure sequence, we construct a set of 3 × 3 matrices for the corresponding protein secondary structure sequence. Furthermore, the leading eigenvalues of these matrices are computed and considered as invariants for the protein secondary structure sequences. To illustrate the utility of our approach, we apply it to a set of real data to distinguish protein structural classes. The result indicates that it can be used to complement the classification of protein secondary structures.

9.
Employing whole-genome analysis, we have characterized a large family of genes coding for calpain-related proteins in three kinetoplastid parasites. We have defined a total of 18 calpain-like sequences in Trypanosoma brucei, 27 in Leishmania major, and 24 in Trypanosoma cruzi. Sequence characterization revealed a well-conserved protease domain in most proteins, although residues critical for catalytic activity were frequently altered. Many of the proteins contain a novel N-terminal sequence motif unique to kinetoplastids. Furthermore, 24 of the sequences contain N-terminal fatty acid acylation motifs, indicating association of these proteins with intracellular membranes. This extended family of proteins also includes a group of sequences that completely lack a protease domain but are specifically related to other kinetoplastid calpain-related proteins by a highly conserved N-terminal domain and by genomic organization. All sequences lack the C-terminal calmodulin-related calcium-binding domain typical of most mammalian calpains. Our analysis emphasizes the highly modular structure of calpains and calpain-like proteins, suggesting that they are involved in diverse cellular functions. The discovery of this surprisingly large family of calpain-like proteins in lower eukaryotes, which combines novel and conserved sequence modules, contributes to our understanding of the evolution of this abundant protein family. Electronic supplementary material is available for this article and accessible to authorised users. [Reviewing Editor: Dr. John Oakeshott]

10.
Naveed M  Khan A  Khan AU 《Amino acids》2012,42(5):1809-1823
G protein-coupled receptors (GPCRs) are transmembrane proteins that transduce signals from extracellular ligands to intracellular G proteins. Automatic classification of GPCRs can provide important information for the development of novel drugs in the pharmaceutical industry. In this paper, we propose an evolutionary approach, GPCR-MPredictor, which combines individual classifiers for predicting GPCRs. GPCR-MPredictor is a web predictor that can efficiently predict GPCRs at five levels. The first level determines whether a protein sequence is a GPCR or a non-GPCR. If the predicted sequence is a GPCR, it is further classified at the family, subfamily, sub-subfamily, and subtype levels. In this work, our aim is to analyze the discriminative power of different feature extraction and classification strategies for GPCR prediction and then to use an evolutionary ensemble approach for enhanced prediction performance. Features are extracted using amino acid composition, pseudo amino acid composition, and dipeptide composition of protein sequences. Different classification approaches, such as k-nearest neighbor (KNN), support vector machine (SVM), probabilistic neural networks (PNN), J48, Adaboost, and naive Bayes, have been used to classify GPCRs. The proposed hierarchical GA-based ensemble classifier exploits the prediction results of SVM, KNN, PNN, and J48 at each level. The GA-based ensemble yields accuracies of 99.75, 92.45, 87.80, 83.57, and 96.17% at the five levels on the first dataset. We further perform predictions on a dataset consisting of 8,000 GPCRs at the family, subfamily, and sub-subfamily levels, and on two other datasets of 365 and 167 GPCRs at the second and fourth levels, respectively. In comparison with existing methods, the results demonstrate the effectiveness of the proposed GPCR-MPredictor in classifying GPCR families. It is accessible at .

11.
Abstract

We present here the results obtained by applying several different methods to quantitatively measure regularities in protein sequences based on pair-preferences. We studied the distribution of amino acid residues, singly as well as in pairs, in a large database. We confirmed the existence of well-defined pair-preferences in proteins, which were remarkably absent in simulated random sequences of similar amino acid distribution. Simple statistical tests, Fourier analysis, fractal analysis, and statistical thermodynamic tests were applied to sequences from the SWISS-PROT database to derive parameters that define a natural sequence. As a consequence of the existence of pair-preferences, parameters such as fractal dimension (D), spectral exponent (β), scaling parameter (H), and statistical entropy were found to be characteristic of natural sequences. As a reference state we chose a randomised state devoid of any pair-preference. The pair-preferences qualified well as quantitative measures of regularities in protein sequences.
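One simple way to quantify a pair-preference is the ratio of the observed frequency of an adjacent residue pair to the frequency expected if residues were placed independently. The sketch below illustrates that idea only; the abstract does not give the authors' definition, so this observed/expected ratio and the name `pair_preferences` are assumptions.

```python
from collections import Counter


def pair_preferences(sequences):
    """Return observed/expected frequency ratios for adjacent residue pairs.

    A ratio near 1 means no preference; the expected value assumes
    independent placement of residues (an assumption of this sketch).
    """
    single = Counter()
    pairs = Counter()
    for seq in sequences:
        single.update(seq)
        pairs.update(seq[i:i + 2] for i in range(len(seq) - 1))
    n_single = sum(single.values())
    n_pairs = sum(pairs.values())
    prefs = {}
    for pair, count in pairs.items():
        expected = (single[pair[0]] / n_single) * (single[pair[1]] / n_single)
        prefs[pair] = (count / n_pairs) / expected
    return prefs
```

Shuffling each sequence before calling the function gives the kind of randomised reference state the abstract mentions, in which the ratios should scatter around 1.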

12.
Databases of multiple sequence alignments are a valuable aid to protein sequence classification and analysis. One of the main challenges when constructing such a database is to simultaneously satisfy the conflicting demands of completeness on the one hand and quality of alignment and domain definitions on the other. The latter properties are best dealt with by manual approaches, whereas completeness in practice is only amenable to automatic methods. Herein we present a database based on hidden Markov model profiles (HMMs), which combines high quality and completeness. Our database, Pfam, consists of parts A and B. Pfam-A is curated and contains well-characterized protein domain families with high quality alignments, which are maintained by using manually checked seed alignments and HMMs to find and align all members. Pfam-B contains sequence families that were generated automatically by applying the Domainer algorithm to cluster and align the remaining protein sequences after removal of Pfam-A domains. By using Pfam, a large number of previously unannotated proteins from the Caenorhabditis elegans genome project were classified. We have also identified many novel family memberships in known proteins, including new kazal, Fibronectin type III, and response regulator receiver domains. Pfam-A families have permanent accession numbers and form a library of HMMs available for searching and automatic annotation of new protein sequences. Proteins: 28:405–420, 1997. © 1997 Wiley-Liss, Inc.

13.
Functional annotation of protein sequences with low similarity to well-characterized protein sequences is a major challenge of computational biology in the post-genomic era. The cyclin protein family is one such important family: its members have low sequence similarity, which makes discovering novel cyclins and establishing orthologous relationships among cyclins a difficult task. The currently identified cyclin motifs and cyclin-associated domains do not represent all of the identified and characterized cyclin sequences. We describe a Support Vector Machine (SVM) based classifier, CyclinPred, which can predict cyclin sequences with high efficiency. The SVM classifier was trained with features of selected cyclin and non-cyclin protein sequences. The training features include amino acid composition, dipeptide composition, secondary structure composition, and PSI-BLAST generated Position Specific Scoring Matrix (PSSM) profiles. Results obtained from leave-one-out cross-validation (jackknife test), self-consistency, and holdout tests show that the SVM classifier trained with PSSM profile features was more accurate than the classifiers based on any of the other features alone or on hybrids of these features. A cyclin prediction server, CyclinPred, has been set up based on the SVM model trained with PSSM profiles. CyclinPred prediction results indicate that the method may be used as a cyclin prediction tool, complementing conventional cyclin prediction methods.

14.
It is now possible to obtain sequence information from gel-separated proteins by mass spectrometry at levels too low for conventional approaches. Usually these tandem mass spectrometric data are used for database searches with the aim of identifying the corresponding gene. Recently it has been shown that long and accurate amino acid sequences can be obtained which are sufficient for PCR-based strategies to clone the corresponding gene [Wilm et al. (1996), Nature 379, 466–469]. More than eight proteins have now been cloned based on that method. In many more cases the sequence information identified homologous proteins. Issues involved in cloning by mass spectrometric sequence information are discussed, as are two case studies. These results clearly establish mass spectrometry as a viable tool not only for the database identification of proteins, but also for the de novo sequencing of gel-separated proteins at the low-picomole to femtomole level.

15.
Background

Strawberry crinkle virus (SCV) is a member of the genus Cytorhabdovirus, family Rhabdoviridae, order Mononegavirales. SCV affects the production of various strawberry cultivars. In this study we investigated the genetic diversity of SCV in strawberry fields based on the P3 (movement protein) gene.

Methods and results

The samples were collected from strawberry fields in Kurdistan Province, Iran. The P3 gene from 20 SCV isolates, representing 18 nucleic acid haplotypes, is composed of 729 nucleotides and encodes a protein of 243 amino acids. SCV-P3 sequences shared 98.77%–99.86% nucleotide and 97.5%–100% amino acid sequence identity. Phylogenetic analyses of the new P3 sequences together with two previously published SCV-P3 sequences from the Czech Republic showed that there are two major phylogroups (I and II) and three minor phylogroups (I-1, I-2, and II-1) in the body of the phylogeny. Comparisons of P3 gene sequences revealed a mutational bias, with more of the differences being transitions than transversions. The ratio of non-synonymous to synonymous nucleotide changes was < 1, indicating that the SCV-P3 gene is under predominantly negative selection.

Conclusions

Phylogenetic and sequence identity analyses showed that SCV isolates from Iran are closely related and have not diverged more than 2% based on P3 gene despite geographical separation and strawberry cultivar. This is the first report of the genetic diversity of SCV worldwide.
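The transition/transversion comparison and the percent identity figures reported in this abstract can be illustrated with a short sketch for two pre-aligned nucleotide sequences. This is a minimal example under assumptions: the sequences are already aligned and of equal length, positions with gaps or ambiguous bases are skipped when counting substitutions, and the function names are hypothetical. It does not reproduce the authors' analysis.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_substitutions(seq_a, seq_b):
    """Return (transitions, transversions) between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical site, gap, or ambiguous base
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return transitions, transversions


def percent_identity(seq_a, seq_b):
    """Return the percentage of aligned positions that are identical."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)
```

Applied to every pair of isolates, these two functions yield the kind of identity range and transition bias the abstract summarizes.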


16.
An Intriguing Controversy over Protein Structural Class Prediction   Total citations: 9, self-citations: 0, citations by others: 9
A recent report by Bahar et al. [(1997), Proteins 29, 172–185] indicates that the coupling effects among different amino acid components, as originally formulated by K. C. Chou [(1995), Proteins 21, 319–344], are important for improving the prediction of protein structural classes. These authors have further proposed a compact lattice model to illuminate the physical insight contained in the component-coupled algorithm. However, Eisenhaber et al. [(1996), Proteins 25, 169–179] reached the completely opposite conclusion, using a different dataset constructed according to their own definition. To address this intriguing controversy, tests were conducted by various approaches on datasets from an objective database, the SCOP database [Murzin et al. (1995), J. Mol. Biol. 247, 536–540]. The results obtained by both self-consistency and jackknife tests indicate that the overall rates of correct prediction by the algorithm incorporating the coupling effect among different amino acid components are significantly higher than those by algorithms that do not account for this effect. This is fully consistent with the physical reality that the folding of a protein is the result of a collective interaction among its constituent amino acid residues, and hence the coupling effects of different amino acid components must be incorporated in order to improve the prediction quality. On revisiting the calculation procedures of Eisenhaber et al., it was found that there was a conceptual mistake in constructing the structural class datasets and a systematic mistake in applying the component-coupled algorithm. These findings are informative for understanding and utilizing the component-coupled algorithm to study the structural classes of proteins.

17.
Cells of Helicobacter pylori were suspended in medium containing 35S-methionine. After heat shock of the cells at 42 °C for 5, 10, and 30 min, protein production was analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and autoradiography. Of the many proteins produced by the cells, only production of the 66 kDa protein was dramatically increased by heat treatment. The N-terminal amino acid sequence of the 66 kDa protein was quite similar to that of the 62 kDa and 54 kDa proteins previously suggested to be heat shock proteins (HSPs) of H. pylori on the basis of their reaction with polyclonal and monoclonal antibodies against HSP 60 family proteins produced by other bacteria. Therefore, it was concluded that H. pylori produces the 66 kDa protein as its major heat shock protein, which belongs to the HSP 60 family.

18.
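Rapid identification of proteins based on their amino acid composition   Total citations: 1, self-citations: 0, citations by others: 1
The conventional method of identifying a protein is to determine its amino acid sequence, which requires a protein sequencer, demands high protein purity, and is time-consuming and costly. By comparison, the amino acid composition and molecular weight of a protein are easy to determine experimentally. This paper describes a method for rapidly identifying a protein from its composition and molecular weight. The basic idea is that computing the amino acid composition and molecular weight of every sequence in a protein sequence database yields a database of protein length, composition, and molecular weight. Comparing the composition and related data of a target protein against this database retrieves the proteins whose composition and molecular weight are close to it, giving a preliminary identification of the protein. In some cases it is even possible to establish quite accurately that the target protein is related to one or more proteins in the database. Based on this principle, a program was designed to search the protein composition database by amino acid composition. Composition analysis of several proteins, including proinsulin, cellular tumor antigen P53, and ubiquitin, confirmed that proteins can be identified well from their amino acid composition.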
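The composition-based lookup described in this abstract can be sketched as follows: precompute length, molecular weight, and amino acid composition for every database sequence, then rank the entries closest to a target. This is an illustration under assumptions, not the program from the paper: the residue masses are rounded average values, and the 10% molecular weight window, the sum of absolute composition differences used as a score, and all function names are choices of this sketch.

```python
# Approximate average residue masses in daltons (rounded values).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_MASS = 18.02
AMINO_ACIDS = sorted(RESIDUE_MASS)


def composition(seq):
    """Return the fraction of each of the 20 standard amino acids in seq."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def molecular_weight(seq):
    """Return the approximate molecular weight of a peptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def build_index(database):
    """Map each entry name to (length, molecular weight, composition)."""
    return {
        name: (len(seq), molecular_weight(seq), composition(seq))
        for name, seq in database.items()
    }


def search(target_comp, target_mw, index, mw_tolerance=0.1, top=5):
    """Return up to `top` (name, score) pairs, best match first.

    Entries outside the molecular weight window are skipped; the score is
    the sum of absolute composition differences (lower is closer).
    """
    hits = []
    for name, (_, mw, comp) in index.items():
        if abs(mw - target_mw) > mw_tolerance * target_mw:
            continue
        score = sum(abs(t - c) for t, c in zip(target_comp, comp))
        hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1])[:top]
```

In practice the target composition and molecular weight would come from experiment rather than from a known sequence, so measurement error would have to be reflected in the tolerance and in how the score is interpreted.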

19.
The human T-cell receptor (Tcr) Vb6 family has been scrutinized for polymorphisms, in coding as well as in intronic sequences, by polymerase chain reaction (PCR), subsequent multiple electroblot hybridizations, and sequence analysis. Multiplex PCR is an efficient means of screening for Tcr variability. Four novel loci could be distinguished and several new alleles are described, including two pseudogenes. The Vb6 family is characterized by an intronic stretch of simple repetitive (gt)n sequences. These elements are hypervariable, especially in the Vb6.7 subfamily, where they are particularly long. The unexpected persistence of simple repetitive sequences in Tcr and major histocompatibility complex (MHC) class II genes over extended periods of the vertebrate evolutionary history can be interpreted in parallel terms in both gene families. The nucleotide sequence data reported in this paper have been submitted to the nucleotide sequence database GenBank and have been assigned the accession numbers M97503–97505.

20.
Secretion of recombinant proteins aims to reproduce the correct posttranslational modifications of the expressed protein while simplifying its recovery. In this study, secretion signal sequences from an abundantly secreted 34-kDa protein (P34) from Pseudozyma flocculosa were cloned. The efficiency of these sequences in the secretion of recombinant green fluorescent protein (GFP) was investigated in two Pseudozyma species and compared with other secretion signal sequences, from S. cerevisiae and Pseudozyma spp. The results indicate that various secretion signal sequences were functional and that the P34 signal peptide was the most effective secretion signal sequence in both P. flocculosa and P. antarctica. The cells correctly processed the secretion signal sequences, including P34 signal peptide, and mature GFP was recovered from the culture medium. This is the first report of functional secretion signal sequences in P. flocculosa. These sequences can be used to test the secretion of other recombinant proteins and for studying the secretion pathway in P. flocculosa and P. antarctica.
