Similar articles
 20 similar articles found
1.
In this article, we provide a comprehensive study of the content of the Universal Protein Resource (UniProt) protein data sets for human and mouse. The tryptic search spaces of the UniProtKB (UniProt knowledgebase) complete proteome sets were compared with other data sets from UniProtKB and with the corresponding International Protein Index, reference sequence, Ensembl, and UniRef100 (where UniRef is UniProt reference clusters) organism‐specific data sets. All protein forms annotated in UniProtKB (both the canonical sequences and isoforms) were evaluated in this study. In addition, natural and disease‐associated amino acid variants annotated in UniProtKB were included in the evaluation. The peptide unicity was also evaluated for each data set. Furthermore, the peptide information in the UniProtKB data sets was compared against the available peptide‐level identifications in the main MS‐based proteomics repositories. The peptides observed in these repositories are an important source of information for protein databases, as they provide supporting evidence for the existence of otherwise predicted proteins. Likewise, the repositories could use the information available in UniProtKB to direct reprocessing efforts on specific sets of peptides/proteins of interest. In summary, we provide comprehensive information about the different organism‐specific sequence data sets available from UniProt, together with the pros and cons of each, in terms of search space for MS‐based bottom‐up proteomics workflows. The aim of the analysis is to provide a clear view of the tryptic search space of UniProt and other protein databases to enable scientists to select those most appropriate for their purposes.
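The notions of a tryptic search space and of peptide unicity can be illustrated with a small sketch. It is not the authors' pipeline: the cleavage rule (cut after K or R unless the next residue is P), the minimum peptide length of 6, the function names and the two toy sequences are all assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def tryptic_peptides(sequence, min_len=6):
    """In-silico digest: cleave after K or R unless followed by P (assumed rule)."""
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    # Drop short fragments and the empty piece left after a C-terminal K/R.
    return [p for p in pieces if len(p) >= min_len]


def peptide_unicity(proteins, min_len=6):
    """Map each peptide to the proteins containing it; return the map and unique peptides."""
    owners = defaultdict(set)
    for name, seq in proteins.items():
        for pep in tryptic_peptides(seq, min_len):
            owners[pep].add(name)
    unique = {pep for pep, names in owners.items() if len(names) == 1}
    return owners, unique


# Hypothetical toy "database" of two proteins sharing one tryptic peptide.
toy = {
    "P1": "MKTAYIAKQRQISFVKSHFSR",
    "P2": "MKTAYIAKGGGLLLNNNK",
}
owners, unique = peptide_unicity(toy)
print(sorted(unique))
```

Comparing two sequence collections then reduces to comparing the peptide sets they produce, and the unique peptides are the ones that can support a single protein entry unambiguously.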

2.
DisProt: a database of protein disorder   Cited by: 1 (self-citations: 0, citations by others: 1)
The Database of Protein Disorder (DisProt) is a curated database that provides structure and function information about proteins that lack a fixed three-dimensional (3D) structure under putatively native conditions, either in their entirety or in part. Starting from the central premise that intrinsic disorder is an important structural class of protein, and in order to meet the increasing interest in it, DisProt aims to become a central repository of disorder-related information. For each disordered protein, the database includes the name of the protein, various aliases, accession codes, amino acid sequence, location of the disordered region(s), and methods used for structural (disorder) characterization. Where applicable, most entries also list the biological function(s) of each disordered region and how each region of disorder is used for function, and provide links to PubMed abstracts and major protein databases. AVAILABILITY: www.disprot.org

3.
Fertilization of the sea urchin egg is accompanied by activation of one or more protein tyrosine kinases which have been shown to phosphorylate a restricted set of egg proteins in vitro. In order to characterize these tyrosine kinase substrates, we have used an antibody specific for phosphotyrosine to prepare an immunoaffinity column capable of binding phosphotyrosine containing proteins. This column bound five 32P-labelled proteins from detergent extracts of embryo membranes phosphorylated in vitro. All were very tightly membrane associated, requiring detergent solubilization. Phosphoamino acid analysis revealed that each of these proteins was phosphorylated exclusively on tyrosine, indicating that they do not act as substrates for other classes of protein kinases.

4.
Rahul Kaushik, Kam Y. J. Zhang. Proteins, 2020, 88(10): 1271-1284
The infinitesimally small sequence space naturally scouted in the millions of years of evolution suggests that natural proteins are constrained by some functional prerequisites and should differ from randomly generated sequences. We have developed a protein sequence fitness scoring function that implements sequence and corresponding secondary structural information at the tripeptide level to differentiate natural and nonnatural proteins. The proposed fitness function is extensively validated on a dataset of about 210 000 natural and nonnatural protein sequences and benchmarked against existing methods for differentiating natural and nonnatural proteins. The high sensitivity, specificity, and percentage accuracy (0.81, 0.95, and 91%, respectively) of the fitness function demonstrate its potential application for sampling protein sequences with a higher probability of mimicking natural proteins. Moreover, the four major classes of proteins (α proteins, β proteins, α/β proteins, and α + β proteins) are separately analyzed, and β proteins are found to score slightly lower than the other classes. Further, an analysis of about 250 designed proteins (adopted from previously reported cases) helped to define the boundaries for sampling the ideal protein sequences. The protein sequence characterization aided by the proposed fitness function could facilitate the exploration of new perspectives in the design of novel functional proteins.
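For readers less familiar with the three reported measures, the sketch below shows how sensitivity, specificity and accuracy are conventionally derived from a confusion matrix. Sensitivity and specificity are fractions between 0 and 1. The counts are hypothetical and were chosen only so that the three values are mutually consistent; they are not the study's data, and the function name is an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of natural proteins recognised
    specificity = tn / (tn + fp)   # fraction of nonnatural proteins rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 200 natural and 500 nonnatural sequences.
sens, spec, acc = classification_metrics(tp=162, fn=38, tn=475, fp=25)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.0%}")
```

Accuracy depends on the class balance as well as on the two per-class rates, which is why it cannot be inferred from sensitivity and specificity alone.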

5.
Cocoa seed storage proteins play an important role in flavour development as aroma precursors are formed from their degradation during fermentation. Major proteins in the beans of Theobroma cacao are the storage proteins belonging to the vicilin and albumin classes. Although both these classes of proteins have been extensively characterized, there is still limited information on the expression and abundance of other proteins present in cocoa beans. This work is the first attempt to characterize the whole cocoa bean proteome by nano‐UHPLC‐ESI MS/MS analysis using tryptic digests of cocoa bean protein extracts. The results of this analysis show that >1000 proteins could be identified using a species‐specific Theobroma cacao database. The majority of the identified proteins were involved with metabolism and energy. Additionally, a significant number of the identified proteins were linked to protein synthesis and processing. Several proteins were also involved with plant response to stress conditions and defence. Albumin and vicilin storage proteins showed the highest intensity values among all detected proteins, although only seven entries were identified as storage proteins. A comparison of MS/MS data searches carried out against larger non‐specific databases confirmed that using a species‐specific database can increase the number of identified proteins, and at the same time reduce the number of false positives. The results of this work will be useful in developing tools that can allow the comparison of the proteomic profile of cocoa beans from different genotypes and geographic origins. Data are available via ProteomeXchange with identifier PXD005586.

6.
Protein-translocating outer membrane porins of Gram-negative bacteria   Cited by: 1 (self-citations: 0, citations by others: 1)
Five families of outer membrane porins that function in protein secretion in Gram-negative bacteria are currently recognized. In this report, these five porin families are analyzed from structural and phylogenetic standpoints. They are the fimbrial usher protein (FUP), outer membrane factor (OMF), autotransporter (AT), two-partner secretion (TPS) and outer membrane secretin (Secretin) families. All members of these families in the current databases were identified, and all full-length homologues were multiply aligned for structural and phylogenetic analyses. The organismal distribution of homologues in each family proved to be unique with some families being restricted to proteobacteria and others being widespread in other bacterial kingdoms as well as eukaryotes. The compositions of and size differences between subfamilies provide evidence for specific orthologous relationships, which agree with available functional information and intra-subfamily phylogeny. The results reveal that horizontal transfer of genes encoding these proteins between phylogenetically distant organisms has been exceptionally rare although transfer within select bacterial kingdoms may have occurred. The resultant in silico analyses are correlated with available experimental evidence to formulate models relevant to the structures and evolutionary origins of these proteins.

7.
SUMMARY: Disease processes often involve crosstalk between proteins in different pathways. Different proteins have been used as separate therapeutic targets for the same disease. Synergetic targeting of multiple targets has been explored in combination therapy of a number of diseases. Potentially harmful interactions of multiple targeting have also been closely studied. To facilitate mechanistic study of drug actions and a more comprehensive understanding of the relationship between different targets of the same disease, it is useful to develop a database of known therapeutically relevant multiple pathways (TRMPs). Information about non-target proteins and natural small molecules involved in these pathways also provides useful hints for searching for new therapeutic targets and facilitates the understanding of how therapeutic targets interact with other molecules in performing specific tasks. The TRMPs database is designed to provide information about such multiple pathways along with related therapeutic targets, corresponding drugs/ligands, targeted disease conditions, constituent individual pathways, and structural and functional information about each protein in the pathways. Cross links to other databases are also provided to facilitate access to information about individual pathways and proteins. AVAILABILITY: This database can be accessed at http://bidd.nus.edu.sg/group/trmp/trmp.asp and it currently contains 11 entries of multiple pathways, 97 entries of individual pathways, and 120 targets covering 72 disease conditions, together with 120 sets of drugs directed at each of these targets. Each entry can be retrieved through multiple methods including multiple pathway name, individual pathway name and disease name. SUPPLEMENTARY INFORMATION: http://bidd.nus.edu.sg/group/trmp/sm.pdf

8.
Integrins are transmembrane proteins regulating cellular shape, mobility and the cell cycle. A highly conserved signature motif in the cytoplasmic tail of the integrin α‐subunit, KXGFFKR, plays a critical role in regulating integrin function. To date, six proteins have been identified that target this motif of the platelet‐specific integrin αIIbβ3. We employ peptide‐affinity chromatography followed by LC‐MS/MS analysis, as well as protein chips, to identify new potential regulators of integrin function in platelets and put them into their biological context using information from protein:protein interaction (PPI) databases. In total, 44 platelet proteins bind with high affinity to an immobilized LAMWKVGFFKR‐peptide. Of these, seven have been reported in the PPI literature as interactors with integrin α‐subunits. Sixty‐eight recombinant human proteins expressed on the protein chip specifically bind with high affinity to biotin‐tagged α‐integrin cytoplasmic peptides. Two of these proteins are also identified in the peptide‐affinity experiments, one is also found in the PPI databases and a further one is common to all three approaches. Finally, novel short linear interaction motifs are common to a number of the proteins identified.
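The signature motif KXGFFKR, with X standing for any residue, can be expressed as a regular expression. The sketch below is an illustration only: the function name is an assumption, and the search is run on the bait peptide quoted in the abstract rather than on real integrin sequences.

```python
import re

# X in KXGFFKR is taken to mean any of the 20 standard amino acids.
MOTIF = re.compile(r"K[ACDEFGHIKLMNPQRSTVWY]GFFKR")


def find_motif(sequence):
    """Return (0-based start, matched text) for every KXGFFKR occurrence."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(sequence)]


# The immobilized bait peptide from the abstract contains the motif with X = V.
print(find_motif("LAMWKVGFFKR"))
```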

9.
The keratin proteins from wool can be divided into two classes: the intermediate filament proteins (IFPs) and the matrix proteins. Using peptide mass spectral fingerprinting it was possible to match spots to the known theoretical sequences of some IFPs in web-based databases, as enzyme digestion generated sufficient numbers of peptides from each spot to achieve this. In contrast, it was more difficult to obtain good matches for some of the lower molecular weight matrix proteins. Relatively few peaks were generated from tryptic digests of high-sulfur proteins because of their lower molecular weight and the absence of basic residues in the first two-thirds of the sequence. Their high sequence homology also means that generally only a few of these peptides could be considered to be unique identifiers for each protein. Nevertheless, it was still possible to uniquely identify some of these proteins, while the presence of two peptides in the matrix-assisted laser desorption/ionization time-of-flight mass spectrum allowed classification of other protein spots as being members of this family. Only one major peptide peak was generated by the high-glycine tyrosine proteins (HGTPs) and there were relatively few sequences available in web-based databases, limiting their identification to one HGTP family.

10.
Proteomics is a powerful tool to analyze the differences in gene expression of bacterial strains. Staphylococcus aureus has long been recognized as an important pathogen in human disease. In order to investigate this pathogen, the proteome of a clinical methicillin-resistant S. aureus (MRSA) strain of the sequence type ST398 was determined using 2-DE. Using 2-DE, we obtained a total of 105 spots from the MRSA strain. Furthermore, correlation with bioinformatic databases allowed accurate identification and characterization of proteins, resulting in 227 identified proteins. We found proteins related to basic functions of the cell, proteins related to virulence, such as catalase, which is specific to the S. aureus species, and proteins related to antibiotic resistance. Proteins associated with antibiotic resistance or virulence factors were related to genomic databases. The most abundant classes identified involved glycolysis, energy production, one-carbon metabolism, and oxidation-reduction processes, all of which reflect an active metabolism. These results highlight the importance of proteomics for deepening knowledge of protein expression in the MRSA strain of lineage ST398, a microorganism with diverse and important resistance mechanisms. This proteome map is an essential tool for better understanding this pathogen and provides new data for protein databases. This article is part of a Special Issue entitled: Proteomics: The clinical link.

11.
Izrailev S, Farnum MA. Proteins, 2004, 57(4): 711-724
The problem of assigning a biochemical function to newly discovered proteins has traditionally been approached by expert enzymological analysis, sequence analysis, and structural modeling. In recent years, the appearance of databases containing protein-ligand interaction data for large numbers of protein classes and chemical compounds has provided new ways of investigating proteins for which the biochemical function is not completely understood. In this work, we introduce a method that utilizes ligand-binding data for functional classification of enzymes. The method makes use of the existing Enzyme Commission (EC) classification scheme and the data on interactions of small molecules with enzymes from the BRENDA database. A set of ligands that binds to an enzyme with unknown biochemical function serves as a query to search a protein-ligand interaction database for enzyme classes that are known to interact with a similar set of ligands. These classes provide hypotheses of the query enzyme's function and complement other computational annotations that take advantage of sequence and structural information. Similarity between sets of ligands is computed using point set similarity measures based upon similarity between individual compounds. We present the statistics of classification of the enzymes in the database by a cross-validation procedure and illustrate the application of the method on several examples.
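A set-to-set similarity built from compound-to-compound similarities might look like the sketch below. The abstract does not say which measures were used, so everything here is an assumption for illustration: compounds are represented as sets of integer feature identifiers, compound similarity is the Tanimoto coefficient, set similarity is a symmetric mean of best matches, and the class labels and data are invented.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two compounds given as feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def set_similarity(query, reference):
    """Mean best-match similarity between two ligand sets, averaged over both directions."""
    def directed(xs, ys):
        return sum(max(tanimoto(x, y) for y in ys) for x in xs) / len(xs)
    return 0.5 * (directed(query, reference) + directed(reference, query))


def rank_classes(query_ligands, class_ligands):
    """Rank enzyme classes by similarity of their known ligands to the query set."""
    scores = {name: set_similarity(query_ligands, ligands)
              for name, ligands in class_ligands.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: ligands as feature sets, two invented enzyme classes.
query = [{1, 2, 3}, {4, 5}]
classes = {
    "class_A": [{1, 2, 3}, {4, 5, 6}],
    "class_B": [{7, 8}, {9}],
}
ranked = rank_classes(query, classes)
print(ranked)
```

The top-ranked classes play the role of functional hypotheses for the query enzyme; a cross-validation would hold out each annotated enzyme in turn and check whether its own class is ranked first.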

12.
The substrates of the cdc2 kinase.   Cited by: 17 (self-citations: 0, citations by others: 17)
The eukaryotic cell cycle is characterized by two major events, DNA replication (S phase) and mitosis (M phase). According to the current paradigm of the cell cycle as a cdc2 cycle, both of these events are driven by serine-threonine specific protein kinases encoded by functional homologs of the fission yeast cdc2 gene. To understand how cdc2 kinases function, it is necessary to identify their physiological substrates and to determine how phosphorylation of these substrates promotes cell cycle progression. Definitive information about substrates relevant to early stages of the cell cycle (G1 and S phases) remains scarce, but several likely physiological targets of the mitotic cdc2 kinase have recently been identified. Current evidence indicates that cdc2 kinase may trigger entry of cells into mitosis not only by initiating important regulatory pathways but also by direct phosphorylation of abundant structural proteins.

13.
Human tissues have distinct biological functions. Many proteins/enzymes are known to be expressed only in specific tissues, and therefore the metabolic networks in various tissues are different. Though high quality global human metabolic networks and metabolic networks for certain tissues such as liver have already been studied, a systematic study of tissue specific metabolic networks for all main tissues is still missing. In this work, we reconstruct the tissue specific metabolic networks for 15 main human tissues based on the previously reconstructed Edinburgh Human Metabolic Network (EHMN). The tissue information is first obtained for enzymes from the Human Protein Reference Database (HPRD) and UniProtKB databases and then transferred to reactions through the enzyme-reaction relationships in EHMN. As our knowledge of the tissue distribution of proteins is still very limited, we supplement the tissue information of the metabolic network based on network connectivity analysis and thorough examination of the literature. Finally, about 80% of proteins and reactions in EHMN are determined to be in at least one of the 15 tissues. To validate the quality of the tissue specific network, the brain specific metabolic network is taken as an example for functional module analysis, and the results reveal that the function of the brain metabolic network is closely related to the brain's role as the centre of the human nervous system. The tissue specific human metabolic networks are available at .
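The step of transferring tissue annotations from enzymes to reactions can be sketched as a simple set union over the enzyme-reaction relationships. This is a minimal illustration, not the EHMN procedure itself: the identifiers, the dictionary layout and the function names are hypothetical, and the later gap-filling by connectivity analysis and literature review is not modelled.

```python
def tissues_for_reactions(enzyme_tissues, reaction_enzymes):
    """Assign to each reaction the union of the tissues of its catalysing enzymes."""
    return {
        reaction: set().union(*(enzyme_tissues.get(e, set()) for e in enzymes))
        for reaction, enzymes in reaction_enzymes.items()
    }


def coverage(reaction_tissues):
    """Fraction of reactions placed in at least one tissue."""
    placed = sum(1 for tissues in reaction_tissues.values() if tissues)
    return placed / len(reaction_tissues)


# Hypothetical annotations: E3 has no tissue information, so R2 stays unplaced.
enzyme_tissues = {"E1": {"liver"}, "E2": {"brain", "liver"}}
reaction_enzymes = {"R1": ["E1", "E2"], "R2": ["E3"], "R3": ["E2"]}

reaction_tissues = tissues_for_reactions(enzyme_tissues, reaction_enzymes)
print(reaction_tissues, coverage(reaction_tissues))
```

A tissue specific network is then the subset of reactions whose tissue set contains the tissue of interest.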

14.
The amino acid composition and architecture of all beta-barrel membrane proteins of known three-dimensional structure have been examined to generate information that will be useful in identifying beta-barrels in genome databases. The database consists of 15 nonredundant structures, including several novel, recent structures. Known structures include monomeric, dimeric, and trimeric beta-barrels with between 8 and 22 membrane-spanning beta-strands each. For this analysis the membrane-interacting surfaces of the beta-barrels were identified with an experimentally derived, whole-residue hydrophobicity scale, and then the barrels were aligned normal to the bilayer and the position of the bilayer midplane was determined for each protein from the hydrophobicity profile. The abundance of each amino acid, relative to the genomic abundance, was calculated for the barrel exterior and interior. The architecture and diversity of known beta-barrels was also examined. For example, the distribution of rise-per-residue values perpendicular to the bilayer plane was found to be 2.7 ± 0.25 Å per residue, or about 10 ± 1 residues across the membrane. Also, as noted by other authors, nearly every known membrane-spanning beta-barrel strand was found to have a short loop of seven residues or less connecting it to at least one adjacent strand. Using this information we have begun to generate rapid screening algorithms for the identification of beta-barrel membrane proteins in genomic databases. Application of one algorithm to the genomes of Escherichia coli and Pseudomonas aeruginosa confirms its ability to identify beta-barrels, and reveals dozens of unidentified open reading frames that potentially code for beta-barrel outer membrane proteins.
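The step from a rise per residue to a number of residues across the membrane is a single division, sketched below. The membrane thickness of 27 Å is an assumption, chosen because it reproduces the quoted central value of about 10 residues; the abstract does not state the thickness used.

```python
def residues_to_span(thickness_angstrom, rise_per_residue):
    """Number of residues needed for a strand to cross a layer of given thickness."""
    return thickness_angstrom / rise_per_residue


ASSUMED_THICKNESS = 27.0        # in angstroms; an assumption, not from the abstract
MEAN_RISE, SPREAD = 2.7, 0.25   # rise per residue and its spread, in angstroms

central = residues_to_span(ASSUMED_THICKNESS, MEAN_RISE)
fewest = residues_to_span(ASSUMED_THICKNESS, MEAN_RISE + SPREAD)
most = residues_to_span(ASSUMED_THICKNESS, MEAN_RISE - SPREAD)
print(f"{fewest:.1f} to {most:.1f} residues, centred on {central:.1f}")
```

Under this assumption the range is roughly 9 to 11 residues, in line with the quoted figure of about 10 with a spread of 1.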

15.
Successful genome mining is dependent on accurate prediction of protein function from sequence. This often involves dividing protein families into functional subtypes (e.g., with different substrates). In many cases, there are only a small number of known functional subtypes, but in the case of the adenylation domains of nonribosomal peptide synthetases (NRPS), there are >500 known substrates. Latent semantic indexing (LSI) was originally developed for text processing but has also been used to assign proteins to families. Proteins are treated as “documents” and it is necessary to encode properties of the amino acid sequence as “terms” in order to construct a term-document matrix, which counts the terms in each document. This matrix is then processed to produce a document-concept matrix, where each protein is represented as a row vector. A standard measure of the closeness of vectors to each other (cosines of the angle between them) provides a measure of protein similarity. Previous work encoded proteins as oligopeptide terms, i.e. counted oligopeptides, but used no information regarding location of oligopeptides in the proteins. A novel tokenization method was developed to analyze information from multiple alignments. LSI successfully distinguished between two functional subtypes in five well-characterized families. Visualization of different “concept” dimensions allows exploration of the structure of protein families. LSI was also used to predict the amino acid substrate of adenylation domains of NRPS. Better results were obtained when selected residues from multiple alignments were used rather than the total sequence of the adenylation domains. Using ten residues from the substrate binding pocket performed better than using 34 residues within 8 Å of the active site. Prediction efficiency was somewhat better than that of the best published method using a support vector machine.
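The term-counting and cosine steps described above can be sketched in a few lines. This covers only the earlier oligopeptide encoding, with proteins as documents and overlapping tripeptides as terms. It deliberately omits the dimensionality reduction that turns the term-document matrix into a document-concept matrix (typically a truncated singular value decomposition), so it is not full LSI. The sequences and function names are invented for illustration.

```python
import math
from collections import Counter


def oligopeptide_terms(sequence, k=3):
    """Count overlapping k-residue oligopeptides; one column of a term-document matrix."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine(u, v):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy sequences: the first two differ at one position.
docs = {name: oligopeptide_terms(seq) for name, seq in {
    "prot_a": "ACDEFGHIK",
    "prot_b": "ACDEFGHLK",
    "prot_c": "WWYYPPQQN",
}.items()}

print(cosine(docs["prot_a"], docs["prot_b"]), cosine(docs["prot_a"], docs["prot_c"]))
```

In a full LSI pipeline the same cosine would be taken between the reduced row vectors, and the alignment-based tokenization would replace the plain oligopeptide counts.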

16.
Knowledge of a protein's structural class can provide important information about its folding patterns. Many approaches have been developed for the prediction of protein structural classes. However, the information used by these approaches is primarily based on amino acid sequences. In this study, a novel method is presented to predict protein structural classes using chemical shift (CS) information derived from nuclear magnetic resonance spectra. First, a dataset of 399 non-homologous proteins (about 15% identity) was constructed to investigate the distribution of averaged CS values of six nuclei (¹³CO, ¹³Cα, ¹³Cβ, ¹HN, ¹Hα and ¹⁵N) in three protein structural classes. Subsequently, a support vector machine was used to predict the three protein structural classes from the averaged CS information of the six nuclei. The overall accuracy in jackknife cross-validation reached 87.0%. Finally, a feature selection technique was applied to exclude redundant information and find an optimized feature set. Results show that the overall accuracy increased to 88.0% when using the averaged CSs of ¹³CO, ¹Hα and ¹⁵N. The proposed approach outperformed other state-of-the-art methods in terms of predictive accuracy, in particular for low-similarity protein data. We expect that our proposed approach will be an excellent alternative to traditional methods for protein structural class prediction.

17.
Analysis of cellular protein patterns by computer-aided 2-dimensional gel electrophoresis together with recent advances in protein sequence analysis have made possible the establishment of comprehensive 2-dimensional gel protein databases that may link protein and DNA information and that offer a global approach to the study of the cell. Using the integrated approach offered by 2-dimensional gel protein databases it is now possible to reveal phenotype specific protein (or proteins), to microsequence them, to search for homology with previously identified proteins, to clone the cDNAs, to assign partial protein sequence to genes for which the full DNA sequence and the chromosome location is known, and to study the regulatory properties and function of groups of proteins that are coordinately expressed in a given biological process. Human 2-dimensional gel protein databases are becoming increasingly important in view of the concerted effort to map and sequence the entire genome.

18.
庄永龙, 周敏, 李衍达, 沈岩. 《遗传》, 2004, 26(4): 514-518
With the completion of the draft human genome sequence, research on genomic mutation has become increasingly important, and the growing body of mutation data has led to the creation of a variety of mutation databases. This review classifies the existing human mutation database resources by function into general mutation databases, single nucleotide polymorphism (SNP) databases, disease-related mutation databases, databases on the effects of mutations on proteins, mutation maps, and databases of mutations in specific genes. It also discusses how to make appropriate use of these genetic mutation resources and the problems with current mutation databases.

19.
The Protein Circular Dichroism Data Bank (PCDDB) [https://pcddb.cryst.bbk.ac.uk] is an established resource for the biological, biophysical, chemical, bioinformatics, and molecular biology communities. It is a freely accessible repository of validated protein circular dichroism (CD) spectra and associated sample and metadata, with entries having links to other bioinformatics resources including, amongst others, structure (PDB), AlphaFold, and sequence (UniProt) databases, as well as to published papers which produced the data and cite the database entries. It includes primary (unprocessed) and final (processed) spectral data, which are available in both text and pictorial formats, as well as detailed sample and validation information produced for each of the entries. Recently, the metadata content associated with each of the entries, as well as the number and structural breadth of the protein components included, have been expanded. The PCDDB includes data on both wild-type and mutant proteins, and because CD studies primarily examine proteins in solution, it also contains examples of the effects of different environments on their structures, plus thermal unfolding/folding series. Methods for both sequence and spectral comparisons are included. The data included in the PCDDB complement results from crystal, cryo-electron microscopy, NMR spectroscopy, bioinformatics characterisations and classifications, and other structural information available for the proteins via links to other databases. The entries in the PCDDB have been used for the development of new analytical methodologies, for interpreting spectral and other biophysical data, and for providing insight into structures and functions of individual soluble and membrane proteins and protein complexes.

20.
Background: Membrane proteins play important roles in cell survival and cell communication, as they function as transporters, receptors, anchors and enzymes. They are also potential targets for drugs that block receptors or inhibit enzymes related to diseases. Although the number of known structures of membrane proteins is still small relative to the size of the proteome as a whole, many new membrane protein structures have been determined recently. Scope of the article: We compared and analyzed the widely used membrane protein databases, mpstruc, Orientations of Proteins in Membranes (OPM), and PDBTM, as well as the extended dataset of mpstruc based on sequence similarity, the PDB structures whose classification field indicates that they are “membrane proteins” and the proteins with Structural Classification of Proteins (SCOP) class-f domains. We evaluated the relationships between these databases or datasets based on the overlap in their contents and the degree of consistency in the structural, topological, and functional classifications and in the transmembrane domain assignment. Major conclusions: The membrane databases differ from each other in their coverage, and in the criteria that they use for annotation and classification. To ensure the efficient use of these databases, it is important to understand their differences and similarities. The establishment of more detailed and consistent annotations for the sequence, structure, membrane association, and function of membrane proteins is still required. General significance: Considering the recent growth of experimentally determined structures, a broad survey and cumulative analysis of the sum of knowledge as presented in the membrane protein structure databases can be helpful to elucidate structures and functions of membrane proteins. We also aim to provide a framework for future research and classification of membrane proteins.
