Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Discrimination of outer membrane proteins using support vector machines   Total citations: 3 (self-citations: 0; citations by others: 3)
MOTIVATION: Discriminating outer membrane proteins from other folding types of globular and membrane proteins is an important task both for dissecting outer membrane proteins (OMPs) from genomic sequences and for the successful prediction of their secondary and tertiary structures. RESULTS: We have developed a method based on support vector machines using amino acid composition and residue pair information. Our approach with amino acid composition has correctly predicted the OMPs with a cross-validated accuracy of 94% in a set of 208 proteins. Further, this method has successfully excluded 633 of 673 globular proteins and 191 of 206 alpha-helical membrane proteins. We obtained an overall accuracy of 92% for correctly picking up the OMPs from a dataset of 1087 proteins belonging to all different types of globular and membrane proteins. Furthermore, residue pair information improved the accuracy from 92 to 94%. This accuracy of discriminating OMPs is higher than that of other methods in the literature, which could be used for dissecting OMPs from genomic sequences. AVAILABILITY: Discrimination results are available at http://tmbeta-svm.cbrc.jp.
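
The two feature types named in this abstract can be illustrated with a minimal sketch. It assumes that "residue pair information" means the frequencies of adjacent residue pairs, which the abstract does not state, and the function names are hypothetical. The resulting 420-value vector is the kind of input a support vector machine could be trained on; the classifier itself is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return 20 residue fractions, ignoring non-standard characters."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def residue_pair_composition(seq):
    """Return 400 fractions of adjacent residue pairs (an assumed definition)."""
    clean = "".join(r for r in seq.upper() if r in AMINO_ACIDS)
    pairs = [clean[i:i + 2] for i in range(len(clean) - 1)]
    n = len(pairs)
    return [pairs.count(a + b) / n if n else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]


def feature_vector(seq):
    """Concatenate both descriptors into one 420-dimensional vector."""
    return amino_acid_composition(seq) + residue_pair_composition(seq)
```

Composition-based features have a fixed length regardless of protein size, which is what allows sequences of different lengths to be compared by a single classifier.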

2.
3.
Mono-ADP-ribosylation is one of the posttranslational protein modifications regulating cellular metabolism, e.g., nitrogen fixation, in prokaryotes. Several bacterial toxins mono-ADP-ribosylate and inactivate specific proteins in their animal hosts. Recently, two mammalian GPI-anchored cell surface enzymes with similar activities were cloned (designated ART1 and ART2). We have now identified six related expressed sequence tags (ESTs) in the public database and cloned the two novel human genes from which these are derived (designated ART3 and ART4). The deduced amino acid sequences of the predicted gene products show 28% sequence identity to one another and 32–41% identity vs the muscle and T cell enzymes. They contain signal peptide sequences characteristic of GPI anchorage. Southern Zoo blot analyses suggest the presence of related genes in other mammalian species. By PCR screening of somatic cell hybrids and by in situ hybridization, we have mapped the two genes to human chromosomes 4p14–p15.1 and 12q13.2–q13.3. Northern blot analyses show that these genes are specifically expressed in testis and spleen, respectively. Comparison of genomic and cDNA sequences reveals a conserved exon/intron structure, with an unusually large exon encoding the predicted mature membrane proteins. Secondary structure prediction analyses indicate conserved motifs and amino acid residues consistent with a common ancestry of this emerging mammalian enzyme family and bacterial mono(ADP-ribosyl)transferases. It is possible that the four human gene family members identified so far represent the “tip of an iceberg,” i.e., a larger family of enzymes that influences the function of target proteins via mono-ADP-ribosylation.

4.
5.
We present here the 2.6 Å resolution crystal structure of the pT26‐6p protein, which is encoded by an ORF of the plasmid pT26‐2, recently isolated from the hyperthermophilic archaeon, Thermococcus sp. 26,2. This large protein is present in all members of a new family of mobile elements that, besides pT26‐2, include several virus‐like elements integrated in the genomes of several Thermococcales and Methanococcales (phylum Euryarchaeota). Phylogenetic analysis suggested that this protein, together with its nearest neighbor (organized as an operon), has coevolved for a long time with the cellular hosts of the encoding mobile element. As the sequences of the N and C‐terminal regions suggested a possible membrane association, a deletion construct (739 amino acids) was used for structural analysis. The structure consists of two very similar β‐sheet domains with a new topology and a five helical bundle C‐terminal domain. Each of these domains corresponds to a unique fold that has presently not been found in cellular proteins. This result supports the idea that proteins encoded by plasmids and viruses that have no cellular homologues could be a reservoir of new folds for structural genomic studies.

6.
The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence‐search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino‐acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as “Protein Blocks” (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence‐search methods or explicit incorporation of hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z‐score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales‐up well and offers promising perspectives for structural annotations at genomic level. It has been implemented in the form of a web‐server that is freely available at http://www.bo‐protscience.fr/forsa .
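
The scoring idea in this abstract can be sketched in a much simplified form. The sketch below assumes a single log-odds value per (protein block, amino acid) pair and an ungapped, position-by-position alignment; the published occurrence matrices and threading procedure are not described here in enough detail to reproduce, and all function names and the shuffling-based Z-score are illustrative assumptions.

```python
import math
import random
import statistics


def log_odds_table(pairs, pseudocount=1.0):
    """Build log-odds scores from observed (protein block, amino acid) pairs."""
    pbs = sorted({pb for pb, _ in pairs})
    aas = sorted({aa for _, aa in pairs})
    counts = {(pb, aa): pseudocount for pb in pbs for aa in aas}
    for pb, aa in pairs:
        counts[(pb, aa)] += 1
    total = sum(counts.values())
    pb_total = {pb: sum(counts[(pb, aa)] for aa in aas) for pb in pbs}
    aa_total = {aa: sum(counts[(pb, aa)] for pb in pbs) for aa in aas}
    # log of observed joint frequency over the product of marginal frequencies
    return {(pb, aa): math.log(c * total / (pb_total[pb] * aa_total[aa]))
            for (pb, aa), c in counts.items()}


def threading_score(seq, pb_seq, table):
    """Sum log-odds over aligned positions; unseen pairs contribute zero."""
    return sum(table.get((pb, aa), 0.0) for aa, pb in zip(seq, pb_seq))


def z_score(seq, pb_seq, table, n_shuffles=200, seed=0):
    """Compare the observed score with scores of shuffled query sequences."""
    rng = random.Random(seed)
    observed = threading_score(seq, pb_seq, table)
    residues = list(seq)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        background.append(threading_score(residues, pb_seq, table))
    sd = statistics.pstdev(background)
    return (observed - statistics.mean(background)) / sd if sd > 0 else 0.0
```

A hit would then be accepted when its Z-score exceeds a cutoff calibrated on known non-matching folds, which is how a fixed specificity such as 95% could be obtained.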

7.
Polyglycine hydrolases are secreted fungal proteases that cleave glycine–glycine peptide bonds in the inter‐domain linker region of specific plant defense chitinases. Previously, we reported the catalytic activity of polyglycine hydrolases from the phytopathogens Epicoccum sorghi (Es‐cmp) and Cochliobolus carbonum (Bz‐cmp). Here we report the identity of their encoding genes and the primary amino acid sequences of the proteins responsible for these activities. Peptides from a tryptic digest of Es‐cmp were analyzed by LC‐MS/MS and the spectra obtained were matched to a draft genome sequence of E. sorghi. From this analysis, a 642 amino acid protein containing a predicted β‐lactamase catalytic region of 280 amino acids was identified. Heterologous strains of the yeast Pichia pastoris were created to express this protein and its homolog from C. carbonum from their cDNAs. Both strains produced recombinant proteins with polyglycine hydrolase activity as shown by SDS‐PAGE and MALDI‐MS based assays. Site directed mutagenesis was used to mutate the predicted catalytic serine of Es‐cmp to glycine, resulting in loss of catalytic activity. BLAST searching of publicly available fungal genomes identified full‐length homologous proteins in 11 other fungi of the class Dothideomycetes, and in three fungi of the related class Sordariomycetes while significant BLAST hits extended into the phylum Basidiomycota. Multiple sequence alignment led to the identification of a network of seven conserved tryptophans that surround the β‐lactamase‐like region. This is the first report of a predicted β‐lactamase that is an endoprotease.

8.
Two homologous 29 amino acid-long highly hydrophobic membrane miniproteins were identified in the Bligh–Dyer lipid extracts of Escherichia coli and Salmonella typhimurium using liquid chromatography/tandem mass spectrometry (LC/MS/MS). The amino acid sequences of the proteins were determined by collision-induced dissociation tandem mass spectrometry, in conjunction with a translating BLAST (tBLASTn) search, i.e., comparing the MS/MS-determined protein query sequence against the six-frame translations of the nucleotide sequences of the E. coli and S. typhimurium genomes. Further MS characterization revealed that both proteins retain the N-terminal initiating formyl-methionines. The methodologies described here may be amenable for detecting and characterizing small hydrophobic proteins in other organisms that are difficult to annotate or analyze by conventional methods.
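
The six-frame translation that underlies a tBLASTn-style search can be sketched as follows. This is only an exact-substring illustration using the standard genetic code, not the BLAST algorithm itself, which scores approximate local alignments; the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Return translations for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def find_peptide(peptide, dna):
    """List the frames whose translation contains the peptide exactly."""
    return [name for name, protein in six_frame_translations(dna).items()
            if peptide in protein]
```

For example, `find_peptide("MAK", "ATGGCCAAATAA")` reports frame `+1`, because that frame translates to `MAK*`. Searching translated nucleotide sequence rather than annotated proteins is what lets very short, unannotated open reading frames be found.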

9.
Vampirovibrio chlorellavorus is recognized as a pathogen of commercially‐relevant Chlorella species. Algal infection and total loss of productivity (biomass) often occurs when susceptible algal hosts are cultivated in outdoor open pond systems. The pathogenic life cycle of this bacterium has been inferred from laboratory and field observations, and corroborated in part by the genomic analyses for two Arizona isolates recovered from an open algal reactor. V. chlorellavorus predation has been reported to occur in geographically‐ and environmentally‐diverse conditions. Genomic analyses of these and additional field isolates are expected to reveal new information about the extent of ecological diversity and genes involved in host‐pathogen interactions. The draft genome sequences for two isolates of the predatory V. chlorellavorus (Cyanobacteria; Ca. Melainabacteria) from an outdoor cultivation system located in the Arizona Sonoran Desert were assembled and annotated. The genomes were sequenced and analyzed to identify genes (proteins) with predicted involvement in predation, infection, and cell death of Chlorella host species prioritized for biofuel production at sites identified as highly suitable for algal production in the southwestern USA. Genomic analyses identified several predicted genes encoding secreted proteins that are potentially involved in pathogenicity, and at least three apparently complete sets of virulence (Vir) genes, characteristic of the VirB‐VirD type system encoding the canonical VirB1‐11 and VirD4 proteins, respectively. Additional protein functions were predicted suggesting their involvement in quorum sensing and motility. The genomes of two previously uncharacterized V. chlorellavorus isolates reveal nucleotide and protein level divergence from each other and from a previously sequenced V. chlorellavorus genome. This new knowledge will enhance the fundamental understanding of trans‐kingdom interactions between a unique cosmopolitan cyanobacterial pathogen and its green microalgal host, of broad interest as a source of harvestable biomass for biofuels or bioproducts.

10.
We have performed an amino acid composition (AAC) analysis of the complete sequences for 235 secondary transport proteins from Escherichia coli, which have functions in the uptake and export of organic and inorganic metabolites, efflux of drugs and in controlling membrane potential. This revealed the trends in content for specific amino acid types and for combinations of amino acids with similar physicochemical properties. In certain proteins or groups of proteins, the so-called spikes of high content for a specific amino acid type or combination of amino acids were identified and confirmed statistically, which in some cases could be directly related to function and ligand specificity. This was prevalent in proteins with a function of multidrug or metal ion efflux. Any tool that can help in identifying bacterial multidrug efflux proteins is important for a better understanding of this mechanism of antibiotic resistance. Phylogenetic analysis based on sequence alignments and comparison of sequences at the N- and C-terminal ends confirmed transporter Family classification. Locations of specific amino acid types in some of the proteins that have crystal structures (EmrE, LacY, AcrB) were also considered to help link amino acid content with protein function. Though there are limitations, this work has demonstrated that a basic analysis of AAC is a useful tool to use in combination with other computational and experimental methods for classifying and investigating function and ligand specificity in a large group of transport or other membrane proteins, including those that are molecular targets for development of new drugs.

11.
Nine proteins secreted in the saliva of the pea aphid Acyrthosiphon pisum were identified by a proteomics approach using GE‐LC‐MS/MS and LC‐MS/MS, with reference to EST and genomic sequence data for A. pisum. Four proteins were identified by their sequences: a homolog of angiotensin‐converting enzyme (an M2 metalloprotease), an M1 zinc‐dependent metalloprotease, a glucose‐methanol‐choline (GMC)‐oxidoreductase and a homolog to regucalcin (also known as senescence marker protein 30). The other five proteins are not homologous to any previously described sequence and included an abundant salivary protein (represented by ACYPI009881), with a predicted length of 1161 amino acids and high serine, tyrosine and cysteine content. A. pisum feeds on plant phloem sap and the metalloproteases and regucalcin (a putative calcium‐binding protein) are predicted determinants of sustained feeding, by inactivation of plant protein defences and inhibition of calcium‐mediated occlusion of phloem sieve elements, respectively. The amino acid composition of ACYPI009881 suggests a role in the aphid salivary sheath that protects the aphid mouthparts from plant defences, and the oxidoreductase may promote gelling of the sheath protein or mediate oxidative detoxification of plant allelochemicals. Further salivary proteins are expected to be identified as more sensitive MS technologies are developed.

12.
We sequenced nearly the entire mitochondrial genome of Argyroneta aquatica, a wholly underwater‐living spider, thereby enhancing the available genomic information for Arachnida. The confirmed sequences contained the complete set of known genes present in other metazoan mitochondrial genomes. However, the mitochondrial gene order of A. aquatica was distinctly different from that of the most distant Chelicerata Limulus polyphemus (Xiphosura), probably because of a series of gene translocations and/or inversions. Comparison of arachnid mitochondrial gene orders for the purpose of phylogenetic inference is only minimally useful, but provides a strong signal in closely related lineages. To test the basal relationships and the evolutionary pattern of tRNA gene rearrangements among Arachnida, phylogenetic analyses using amino acid sequences of the 13 protein‐coding genes were performed. An interesting feature, five 135‐bp tandem repeats and two 363‐bp tandem repeats, was identified in the putative control region. Although control region tandem repeats have been reported in many other arachnid and metazoan species, this is the first time they have been described in spiders.

13.
As a result of remarkable progress in DNA sequencing technology, vast quantities of genomic sequences have been decoded. Homology search for amino acid sequences, such as BLAST, has become a basic tool for assigning functions of genes/proteins when genomic sequences are decoded. Although the homology search has clearly been a powerful and irreplaceable method, the functions of only 50% or fewer of genes can be predicted when a novel genome is decoded. A prediction method independent of the homology search is urgently needed. By analyzing oligonucleotide compositions in genomic sequences, we previously developed a modified Self-Organizing Map ‘BLSOM’ that clustered genomic fragments according to phylotype with no advance knowledge of phylotype. Using BLSOM for di-, tri- and tetrapeptide compositions, we developed a system to enable separation (self-organization) of proteins by function. Analyzing oligopeptide frequencies in proteins previously classified into COGs (clusters of orthologous groups of proteins), BLSOMs could faithfully reproduce the COG classifications. This indicated that proteins, whose functions are unknown because of lack of significant sequence similarity with function-known proteins, can be related to function-known proteins based on similarity in oligopeptide composition. BLSOM was applied to predict functions of vast quantities of proteins derived from mixed genomes in environmental samples.

14.
While micro‐organisms actively mediate and participate in freshwater ecosystem services, we know little about freshwater microbial genetic diversity. Genome sequences are available for many bacteria from the human microbiome and the ocean (over 800 and 200, respectively), but only two freshwater genomes are currently available: the streamlined genomes of Polynucleobacter necessarius ssp. asymbioticus and the Actinobacterium AcI‐B1. Here, we sequenced and analysed draft genomes of eight phylogenetically diverse freshwater bacteria exhibiting a range of lifestyle characteristics. Comparative genomics of these bacteria reveals putative freshwater bacterial lifestyles based on differences in predicted growth rate, capability to respond to environmental stimuli and diversity of useable carbon substrates. Our conceptual model based on these genomic characteristics provides a foundation on which further ecophysiological and genomic studies can be built. In addition, these genomes greatly expand the diversity of existing genomic context for future studies on the ecology and genetics of freshwater bacteria.

15.
The proteome of Tropheryma whipplei, the intracellular bacterium responsible for Whipple's disease (WD), was analyzed using two complementary approaches: 2‐DE coupled with MALDI‐TOF and SDS‐PAGE with nanoLC‐MS/MS. This strategy led to the identification of 206 proteins of 808 predicted ORFs, resolving some questions raised by the genomic sequence of this bacterium. We successfully identified antibiotic targets and proteins with predicted N‐terminal signal sequences. Additionally, we identified a family of surface proteins (known as T. whipplei surface proteins (WiSPs)), which are encoded by a unique group of species‐specific genes and serve as both coding regions and DNA repeats that promote genomic recombination. Comparison of the protein expression profiles of the intracellular facultative host‐associated WD bacterium with other host‐associated, intracellular obligate, and environmental bacteria revealed that T. whipplei shares a proteomic expression profile with other host‐associated facultative intracellular bacteria. In summary, this study describes the global protein expression pattern of T. whipplei and reveals some specific features of the T. whipplei proteome.

16.
Proper protein localization is essential for critical cellular processes, including vesicle‐mediated transport and protein translocation. Tail‐anchored (TA) proteins are integrated into organellar membranes via the C‐terminus, orienting the N‐terminus towards the cytosol. Localization of TA proteins occurs posttranslationally and is governed by the C‐terminus, which contains the integral transmembrane domain (TMD) and targeting sequence. Targeting of TA proteins is dependent on the hydrophobicity of the TMD as well as the length and composition of flanking amino acid sequences. We previously identified an unusual homologue of elongator protein, Elp3, in the apicomplexan parasite Toxoplasma gondii as a TA protein targeting the outer mitochondrial membrane. We sought to gain further insight into TA proteins and their targeting mechanisms using this early‐branching eukaryote as a model. Our bioinformatics analysis uncovered 59 predicted TA proteins in Toxoplasma, 9 of which were selected for follow‐up analyses based on representative features. We identified novel TA proteins that traffic to specific organelles in Toxoplasma, including the parasite endoplasmic reticulum, mitochondrion, and Golgi apparatus. Domain swap experiments elucidated that targeting of TA proteins to these specific organelles was strongly influenced by the TMD sequence, including charge of the flanking C‐terminal sequence.

17.
Free energy of transferring amino acid side‐chains from aqueous environment into lipid bilayers, known as transfer free energy (TFE), provides important information on the thermodynamic stability of membrane proteins. In this study, we derived a TFE profile named General Transfer Free Energy Profile (GeTFEP) based on computation of the TFEs of 58 β‐barrel membrane proteins (βMPs). The GeTFEP agrees well with experimentally measured and computationally derived TFEs. Analysis based on the GeTFEP shows that residues in different regions of the transmembrane (TM) segments of βMPs have different roles during the membrane insertion process. Results further reveal the importance of the sequence pattern of TM strands in stabilizing βMPs in the membrane environment. In addition, we show that GeTFEP can be used to predict the positioning and the orientation of βMPs in the membrane. We also show that GeTFEP can be used to identify structurally or functionally important amino acid residue sites of βMPs. Furthermore, the TM segments of α‐helical membrane proteins can be accurately predicted with GeTFEP, suggesting that the GeTFEP is of general applicability in studying membrane proteins.

18.
Three cDNA clones that hybridize to a partial rice cDNA that shows similarity to bovine mitochondrial 2-oxoglutarate/malate translocator were isolated from leaves of Panicum miliaceum L. (proso millet), an NAD-malic enzyme-type C4 plant. The nucleotide sequences of the clones resemble each other, and some of the isolated cDNAs contained extra sequences that seemed to be introns. The predicted proteins encoded by the cDNAs have 302 amino acids and molecular weights of 32211 and 32150. The hydrophobic profile of the amino acid sequence predicted the existence of six transmembrane α-helices, which is a common property of members of the mitochondrial transporter family. The predicted amino acid sequence showed the highest similarity with that of the 2-oxoglutarate/malate translocator from mammalian mitochondria. An expression plasmid containing the coding region of the cDNAs was used to over-express recombinant protein with a C-terminal histidine tag in Escherichia coli, which was affinity-purified. The antibody against the recombinant protein cross-reacted with proteins of 31–32 kDa in the membrane fraction from P. miliaceum mitochondria, but not with the chloroplast fraction. The recombinant protein reconstituted in liposomes efficiently transported malate, citrate, and 2-oxoglutarate.
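
A hydrophobicity profile of the kind mentioned in this abstract is commonly computed as a sliding-window average over a hydropathy scale. The sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a 1.6 threshold, a widely used heuristic; the abstract does not say which scale or parameters the authors used, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each window; raises KeyError on non-standard residues."""
    scores = [KD[r] for r in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return half-open (start, end) residue ranges covered by windows above threshold."""
    profile = hydropathy_profile(seq, window)
    segments, start = [], None
    for i, value in enumerate(profile):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            # last passing window was i - 1, which ends at residue i - 1 + window
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(seq)))
    return segments
```

Counting the segments returned for a full-length sequence gives a rough estimate of the number of membrane-spanning helices, although neighbouring helices separated by short loops can merge into one segment with this simple approach.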

19.
In plant genomes, the incorporation of DNA segments is not a common method of artificial gene transfer. Nevertheless, various segments of pararetroviruses have been found in plant genomes in recent decades. The rice genome contains a number of segments of endogenous rice tungro bacilliform virus‐like sequences (ERTBVs), many of which are present between AT dinucleotide repeats (ATrs). Comparison of genomic sequences between two closely related rice subspecies, japonica and indica, allowed us to verify the preferential insertion of ERTBVs into ATrs. In addition to ERTBVs, the comparative analyses showed that ATrs occasionally incorporate repeat sequences including transposable elements, and a wide range of other sequences. Besides the known genomic sequences, the insertion sequences also represented DNAs of unclear origins together with ERTBVs, suggesting that ATrs have integrated episomal DNAs that would have been suspended in the nucleus. Such insertion DNAs might be trapped by ATrs in the genome in a host‐dependent manner. Conversely, other simple mono‐ and dinucleotide sequence repeats (SSR) were less frequently involved in insertion events relative to ATrs. Therefore, ATrs could be regarded as hot spots of double‐strand breaks that induce non‐homologous end joining. The insertions within ATrs occasionally generated new gene‐related sequences or involved structural modifications of existing genes. Likewise, in a comparison between Arabidopsis thaliana and Arabidopsis lyrata, the insertions preferred ATrs to other SSRs. Therefore ATrs in plant genomes could be considered as genomic dumping sites that have trapped various DNA molecules and may have exerted a powerful evolutionary force.

20.
Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time‐consuming. Sequential and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from a set of the 544 properties in AAindex1, an amino acid index database. Each feature was utilized to train a classification model with a novel encoding schema for hot spot prediction by the IBk algorithm, an extension of the K‐nearest neighbor algorithm. The combinations of the individual classifiers were explored and the classifiers that appeared frequently in the top performing combinations were selected. The hot spot predictor was built as an ensemble of these classifiers that works in a voting manner. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state‐of‐the‐art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx . Proteins 2013; 81:1351–1362 © 2013 Wiley Periodicals, Inc.
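
The ensemble idea in this abstract, one nearest-neighbor classifier per feature combined by voting, can be sketched as below. It assumes one numeric value per feature and plain unweighted majority voting; the authors' encoding schema, the IBk implementation and their classifier selection procedure are not reproduced, and the function names are hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify one value from (value, label) pairs by its k nearest neighbors."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def ensemble_predict(train_rows, labels, query_row, feature_ids, k=3):
    """Train one single-feature k-NN per selected feature and take a majority vote."""
    votes = []
    for f in feature_ids:
        train = [(row[f], label) for row, label in zip(train_rows, labels)]
        votes.append(knn_predict(train, query_row[f], k))
    return Counter(votes).most_common(1)[0][0]
```

Using an odd `k` and an odd number of selected features avoids tied votes for a two-class problem such as hot spot versus non-hot spot.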
