1.

tsORFdb: Theoretical Small Open Reading Frames (ORFs) database and massProphet: Peptide Mass Fingerprinting (PMF) tool for unknown small functional ORFs

Hyoung-Sam Heo Sanghyuk Lee Yeon Ja Choi S. June Oh 《Biochemical and biophysical research communications》2010,397(1):120-126

Peptide mass fingerprinting (PMF) has become one of the most widely used methods for rapid identification of proteins in proteomics research. Many peaks, however, remain unassigned after PMF analysis, partly because of post-translational modification and the limited scope of protein sequences. Almost all PMF tools employ only known or predicted protein sequences and do not include open reading frames (ORFs) in the genome, which eliminates the chance of finding novel functional peptides. Unlike most tools that search protein sequences from known coding sequences, the tool we developed uses a database for theoretical small ORFs (tsORFs) and a PMF application using a tsORFs database (tsORFdb). The tsORFdb is a database for ORFeome that encompasses all potential tsORFs derived from whole genome sequences as well as the predicted ones. The massProphet system tries to extend the search scope to include the ORFeome using the tsORFdb. The tsORFdb and massProphet should be useful for proteomics research to give information about unknown small ORFs as well as predicted and registered proteins.
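The idea in the abstract above, enumerating every theoretical small ORF in a genome and computing the peptide masses a PMF search would match against, can be sketched as follows. This is a minimal illustration and not the tsORFdb or massProphet implementation: the function names, the ATG-to-stop ORF definition, the length bounds, and the simple trypsin rule (cleave after K or R unless followed by P) are all assumptions made for the example. The residue masses are standard monoisotopic values.

```python
from itertools import product

# Standard genetic code, built from the canonical TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses in daltons; one water is added per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def small_orfs(dna, min_aa=10, max_aa=100):
    """Yield (strand, frame, peptide) for ATG..stop ORFs in all six frames.

    The length bounds are hypothetical defaults, not tsORFdb's criteria.
    """
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
            protein = "".join(CODON_TABLE.get(c, "X") for c in codons)
            # Every segment except the last one ended at a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start == -1:
                    continue
                peptide = segment[start:]
                if min_aa <= len(peptide) <= max_aa:
                    yield strand, frame, peptide


def tryptic_peptides(protein):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic neutral mass, or None if a residue is unknown."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    except KeyError:
        return None


if __name__ == "__main__":
    # Toy sequence; a real search would scan whole chromosomes.
    for strand, frame, orf in small_orfs("ATGGCTAAAGGTTAA", min_aa=1):
        masses = [peptide_mass(p) for p in tryptic_peptides(orf)]
        print(strand, frame, orf, masses)
```

Observed peak masses would then be compared against these theoretical masses within an instrument tolerance; that matching step, and any handling of modifications or missed cleavages, is left out of the sketch.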

2.

hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes

Lamesch P Li N Milstein S Fan C Hao T Szabo G Hu Z Venkatesan K Bethel G Martin P Rogers J Lawlor S McLaren S Dricot A Borick H Cusick ME Vandenhaute J Dunham I Hill DE Vidal M 《Genomics》2007,89(3):307-315

Complete sets of cloned protein-encoding open reading frames (ORFs), or ORFeomes, are essential tools for large-scale proteomics and systems biology studies. Here we describe human ORFeome version 3.1 (hORFeome v3.1), currently the largest publicly available resource of full-length human ORFs (available at ). Generated by Gateway recombinational cloning, this collection contains 12,212 ORFs, representing 10,214 human genes, and corresponds to a 51% expansion of the original hORFeome v1.1. An online human ORFeome database, hORFDB, was built and serves as the central repository for all cloned human ORFs (http://horfdb.dfci.harvard.edu). This expansion of the original ORFeome resource greatly increases the potential experimental search space for large-scale proteomics studies, which will lead to the generation of more comprehensive datasets.

3.

ORFeome cloning and global analysis of protein localization in the fission yeast Schizosaccharomyces pombe

Matsuyama A Arai R Yashiroda Y Shirai A Kamata A Sekido S Kobayashi Y Hashimoto A Hamamoto M Hiraoka Y Horinouchi S Yoshida M 《Nature biotechnology》2006,24(7):841-847

Cloning of the entire set of an organism's protein-coding open reading frames (ORFs), or 'ORFeome', is a means of connecting the genome to downstream 'omics' applications. Here we report a proteome-scale study of the fission yeast Schizosaccharomyces pombe based on cloning of the ORFeome. Taking advantage of a recombination-based cloning system, we obtained 4,910 ORFs in a form that is readily usable in various analyses. First, we evaluated ORF prediction in the fission yeast genome project by expressing each ORF tagged at the 3' terminus. Next, we determined the localization of 4,431 proteins, corresponding to approximately 90% of the fission yeast proteome, by tagging each ORF with the yellow fluorescent protein. Furthermore, using leptomycin B, an inhibitor of the nuclear export protein Crm1, we identified 285 proteins whose localization is regulated by Crm1.

4.

New insights into chemical biology from ORFeome libraries (total citations: 1; self-citations: 0; citations by others: 1)

Yashiroda Y Matsuyama A Yoshida M 《Current opinion in chemical biology》2008,12(1):55-59

As the genomes of many organisms have been sequenced, a variety of global analyses, called 'omics,' have been initiated. Cloning of the set of all open reading frames encoded by the genome (ORFeome) of an organism is a major challenge, which serves as an indispensable provision before one launches into the ocean of the postgenomic world. A suitable strategy for high-throughput cloning and expression of thousands of genes is crucial to success. Recently developed systems employing site-specific or homologous recombination have made it feasible to manipulate thousands of ORFs en masse. Using these technologies, several recent studies have successfully fished biologically active small molecules and target proteins out of this bountiful ocean.

5.

Genome-wide ORFeome cloning and analysis of Arabidopsis transcription factor genes (total citations: 13; self-citations: 0; citations by others: 13)

Gong W Shen YP Ma LG Pan Y Du YL Wang DH Yang JY Hu LD Liu XF Dong CX Ma L Chen YH Yang XY Gao Y Zhu D Tan X Mu JY Zhang DB Liu YL Dinesh-Kumar SP Li Y Wang XP Gu HY Qu LJ Bai SN Lu YT Li JY Zhao JD Zuo J Huang H Deng XW Zhu YX 《Plant physiology》2004,135(2):773-782

6.

ORFeome projects: gateway between genomics and omics

Rual JF Hill DE Vidal M 《Current opinion in chemical biology》2004,8(1):20-25

The availability of entire genome sequences is expected to revolutionize the way in which biology and medicine are conducted for years to come. However, achieving this promise still requires significant effort in the areas of gene annotation, cloning and expression of thousands of known and heretofore unknown protein-encoding genes. Traditional technologies of manipulating genes are too cumbersome and inefficient when one is dealing with more than a few genes at a time. Entire libraries composed of all protein-encoding open reading frames (ORFs) cloned in highly flexible vectors will be needed to take full advantage of the information found in any genome sequence. The creation of such ORFeome resources using novel technologies for cloning and expressing entire proteomes constitutes an effective gateway from whole genome sequencing efforts to downstream 'omics' applications.

7.

Discovering neuropeptides in Caenorhabditis elegans by two dimensional liquid chromatography and mass spectrometry

Husson SJ Clynen E Baggerman G De Loof A Schoofs L 《Biochemical and biophysical research communications》2005,335(1):76-86

Completion of the Caenorhabditis elegans genome sequencing project in 1998 has provided more insight into the complexity of nematode neuropeptide signaling. Several C. elegans neuropeptide precursor genes, coding for approximately 250 peptides, have been predicted from the genomic database. One cannot, however, deduce whether all these peptides are actually expressed, nor is it possible to predict all post-translational modifications. Using two-dimensional nanoscale liquid chromatography combined with tandem mass spectrometry and database mining, we analyzed a mixed-stage C. elegans extract. This peptidomic setup yielded 21 peptides derived from formerly predicted neuropeptide-like protein (NLP) precursors and 28 predicted FMRFamide-related peptides. In addition, we were able to sequence 11 entirely novel peptides derived from nine peptide precursors that were not predicted or identified in any way previously. Some of the identified peptides display profound sequence similarities with neuropeptides from other invertebrates, indicating that these peptides have a long evolutionary history.

8.

High-throughput expression, purification, and characterization of recombinant Caenorhabditis elegans proteins

Huang RY Boulton SJ Vidal M Almo SC Bresnick AR Chance MR 《Biochemical and biophysical research communications》2003,307(4):928-934

Modern proteomics approaches include techniques to examine the expression, localization, modifications, and complex formation of proteins in cells. In order to address issues of protein function in vitro using classical biochemical and biophysical approaches, high-throughput methods of cloning the appropriate reading frames, and expressing and purifying proteins efficiently are an important goal of modern proteomics approaches. This process becomes more difficult as functional proteomics efforts focus on the proteins from higher organisms, since issues of correctly identifying intron-exon boundaries and efficiently expressing and solubilizing the (often) multi-domain proteins from higher eukaryotes are challenging. Recently, 12,000 open-reading-frame (ORF) sequences from Caenorhabditis elegans have become available for functional proteomics studies [Nat. Gen. 34 (2003) 35]. We have implemented a high-throughput screening procedure to express, purify, and analyze by mass spectrometry hexa-histidine-tagged C. elegans ORFs in Escherichia coli using metal affinity ZipTips. We find that over 65% of the expressed proteins are of the correct mass as analyzed by matrix-assisted laser desorption MS. Many of the remaining proteins indicated to be "incorrect" can be explained by high-throughput cloning or genome database annotation errors. This provides a general understanding of the expected error rates in such high-throughput cloning projects. The ZipTip purified proteins can be further analyzed under both native and denaturing conditions for functional proteomics efforts.

9.

Pooled ORF expression technology (POET): using proteomics to screen pools of open reading frames for protein expression

Gillette WK Esposito D Frank PH Zhou M Yu LR Jozwik C Zhang X McGowan B Jacobowitz DM Pollard HB Hao T Hill DE Vidal M Conrads TP Veenstra TD Hartley JL 《Molecular & cellular proteomics : MCP》2005,4(11):1647-1652

We have developed a pooled ORF expression technology, POET, that uses recombinational cloning and proteomic methods (two-dimensional gel electrophoresis and mass spectrometry) to identify ORFs that when expressed are likely to yield high levels of soluble, purified protein. Because the method works on pools of ORFs, the procedures needed to subclone, express, purify, and assay protein expression for hundreds of clones are greatly simplified. Small scale expression and purification of 12 positive clones identified by POET from a pool of 688 Caenorhabditis elegans ORFs expressed in Escherichia coli yielded on average 6 times as much protein as 12 negative clones. Larger scale expression and purification of six of the positive clones yielded 47-374 mg of purified protein/liter. Using POET, pools of ORFs can be constructed, and the pools of the resulting proteins can be analyzed and manipulated to rapidly acquire information about the attributes of hundreds of proteins simultaneously.

10.

Coverage of whole proteome by structural genomics observed through protein homology modeling database

Yura K Yamaguchi A Go M 《Journal of structural and functional genomics》2006,7(2):65-76

We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total, and the current FAMSBASE contains protein 3D structures for approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers the whole of an ORF product are rare. When the portion of each ORF covered by a 3D structure is compared across the three kingdoms of life, approximately 60% of the ORFs in archaebacteria and eubacteria have modeled 3D structures covering almost the entire amino acid sequence, whereas the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structures are calculated, the fraction of modeled 3D structures of soluble proteins has increased by 5% for archaebacteria and by 7% for eubacteria in the last 3 years. Assuming that this rate is maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes without the putative disordered regions will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structures of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting spatial arrangements of structural domains in an ORF will then be a coming issue of structural genomics.

11.

Comparative analysis of various gene finders specific to Caenorhabditis elegans genome

Kashyap L Tabish M 《Bioinformation》2006,1(6):203-207

Computational gene prediction and the identification of alternatively spliced isoforms have always been challenging tasks. In this paper, we describe the performance of three gene/exon finding programmes, namely Fex, Gen view2 and Gene builder, capable of predicting open reading frames or exons for a given set of sequences from the C. elegans genome. The predicted exons were compared with the exons identified by the 'sequencing consortium', and the degree of consensus among them is discussed. We found that exon prediction by Fex was closer to the consortium prediction than were the Gen view2 and Gene builder results. Interestingly, some exons (six exons in five genes) predicted only by Fex and not by the 'sequencing consortium' are found in the C. elegans EST database. This data is critical for further debate and discussion on gene finding in C. elegans.

12.

Identity of the F52F12.1 gene product in Caenorhabditis elegans as an organic cation transporter

Wu X Fei YJ Huang W Chancy C Leibach FH Ganapathy V 《Biochimica et biophysica acta》1999,1418(1):239-244

We describe here the cloning and functional characterization of an organic cation transporter from Caenorhabditis elegans (CeOCT1). The CeOCT1 cDNA is 1826 bp long and codes for a protein of 568 amino acids. The oct1 gene is approximately 3.2 kb in size and consists of 12 exons. The location of this gene corresponds to the F52F12.1 gene locus on chromosome I. The predicted protein contains 12 putative transmembrane domains. It exhibits significant homology to mammalian OCTs. When expressed in mammalian cells, CeOCT1 induces the transport of the prototypical organic cation tetraethylammonium. The Michaelis-Menten constant for this substrate is 80 ± 16 µM. The substrate specificity of CeOCT1 is broad. This represents the first report on the cloning and functional characteristics of an organic cation transporter from C. elegans.

13.

A public genome-scale lentiviral expression library of human ORFs

Yang X Boehm JS Yang X Salehi-Ashtiani K Hao T Shen Y Lubonja R Thomas SR Alkan O Bhimdi T Green TM Johannessen CM Silver SJ Nguyen C Murray RR Hieronymus H Balcha D Fan C Lin C Ghamsari L Vidal M Hahn WC Hill DE Root DE 《Nature methods》2011,8(8):659-661

Functional characterization of the human genome requires tools for systematically modulating gene expression in both loss-of-function and gain-of-function experiments. We describe the production of a sequence-confirmed, clonal collection of over 16,100 human open-reading frames (ORFs) encoded in a versatile Gateway vector system. Using this ORFeome resource, we created a genome-scale expression collection in a lentiviral vector, thereby enabling both targeted experiments and high-throughput screens in diverse cell types.

14.

Articles selected by Faculty of 1000: Neurospora crassa genome sequence; human c-Myc target genes; prokaryote genome annotation database; diversity of marine eukaryotes; transcriptional regulation by a pseudogene

《Genome biology》2003,4(6):1-3

A selection of evaluations from Faculty of 1000 covering the synthesis of sugar arrays, C. elegans ORFeome v1.1, a novel method for identifying tRNA genes, a step towards analysis of the human serum proteome and rates of molecular divergence in rearranged chromosomes.

15.

Novel identification of expressed genes and functional classification of hypothetical proteins from Neisseria meningitidis serogroup A

Bernardini G Arena S Braconi D Scaloni A Santucci A 《Proteomics》2007,7(18):3342-3347

To implement the 2-DE database of serogroup A Neisseria meningitidis (MenA) and improve its potential of investigation in bacterial biology, cell extracts were separated by tricine-SDS-PAGE and 131 novel proteins were identified by microLC-ESI-IT-MS/MS. These identifications extended to 404 the number of MenA gene expression products characterized at the proteome level, covering approximately 20% of the total ORFs predicted from the genome sequence. This technical approach was particularly useful in ascertaining expression of ribosomal as well as hypothetical proteins. Particular attention was paid to functional characterization of hypothetical proteins by means of software analyses and database searches.

16.

Identification of an endo-beta-N-acetylglucosaminidase gene in Caenorhabditis elegans and its expression in Escherichia coli

Kato T Fujita K Takeuchi M Kobayashi K Natsuka S Ikura K Kumagai H Yamamoto K 《Glycobiology》2002,12(10):581-587

We report the identification, molecular cloning, and characterization of an endo-beta-N-acetylglucosaminidase from the nematode Caenorhabditis elegans. A search of the C. elegans genome database revealed the existence of a gene exhibiting 34% identity to Mucor hiemalis (a fungus) endo-beta-N-acetylglucosaminidase (Endo-M). Indeed, the C. elegans extract contained endo-beta-N-acetylglucosaminidase activity. The putative cDNA for the C. elegans endo-beta-N-acetylglucosaminidase (Endo-CE) was amplified by polymerase chain reaction from the Uni-ZAP XR library, cloned, and sequenced. The recombinant Endo-CE expressed in Escherichia coli exhibited substrate specificity mainly for high-mannose type oligosaccharides. Man(8)GlcNAc(2) was the best substrate for Endo-CE, and Man(3)GlcNAc(2) was also hydrolyzed. Biantennary complex type oligosaccharides were poor substrates, and triantennary complex substrates were not hydrolyzed. Its substrate specificity was similar to those of Endo-M and endo-beta-N-acetylglucosaminidase from hen oviduct. Endo-CE was confirmed to exhibit transglycosylation activity, as seen for some microbial endo-beta-N-acetylglucosaminidases. This is the first report of the molecular cloning of an endo-beta-N-acetylglucosaminidase gene from a multicellular organism, which shows the possibility of using this well-characterized nematode as a model system for elucidating the role of this enzyme.

17.

Isoform discovery by targeted cloning, 'deep-well' pooling and parallel sequencing

Salehi-Ashtiani K Yang X Derti A Tian W Hao T Lin C Makowski K Shen L Murray RR Szeto D Tusneem N Smith DR Cusick ME Hill DE Roth FP Vidal M 《Nature methods》2008,5(7):597-600

18.

Construction and expression of sugar kinase transcriptional gene fusions by using the Sinorhizobium meliloti ORFeome

Humann JL Schroeder BK Mortimer MW House BL Yurgel SN Maloney SC Ward KL Fallquist HM Ziemkiewicz HT Kahn ML 《Applied and environmental microbiology》2008,74(21):6756-6765

19.

Intrinsic and extrinsic approaches for detecting genes in a bacterial genome (total citations: 10; self-citations: 2; citations by others: 10)

M Borodovsky K E Rudd E V Koonin 《Nucleic acids research》1994,22(22):4756-4767

The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins.
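The percentages quoted in the abstract above follow directly from the reported counts. A short check, using only numbers stated in the abstract (the variable and function names are illustrative):

```python
# Counts as reported in the abstract.
genemark_hits = 354      # putative expressed ORFs predicted by GeneMark
blast_hits = 208         # ORFs with significant BLAST similarity
supported_by_both = 182  # ORFs supported by GeneMark and BLAST
conserved_family = 73    # putative new genes in ancient conserved families


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    print(percent(supported_by_both, genemark_hits))  # share of GeneMark hits
    print(percent(supported_by_both, blast_hits))     # share of BLAST hits
    print(percent(conserved_family, genemark_hits))   # conserved-family share
```

The three values reproduce the 51.4%, 87.5%, and 20.6% figures given in the text.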

20.

The Hansenula polymorpha (strain CBS4732) genome sequencing and analysis (total citations: 4; self-citations: 0; citations by others: 4)

Ramezani-Rad M Hollenberg CP Lauber J Wedler H Griess E Wagner C Albermann K Hani J Piontek M Dahlems U Gellissen G 《FEMS yeast research》2003,4(2):207-215

The methylotrophic yeast Hansenula polymorpha is a recognised model system for investigation of peroxisomal function, of special metabolic pathways such as methanol metabolism, of nitrate assimilation, and of thermostability. Strain RB11, an odc1 derivative of the particular H. polymorpha isolate CBS4732 (synonymous to ATCC34438, NRRL-Y-5445, CCY38-22-2), has been developed as a platform for heterologous gene expression. The scientific and industrial significance of this organism is now being met by the characterisation of its entire genome. The H. polymorpha RB11 genome consists of approximately 9.5 Mb and is organised as six chromosomes ranging in size from 0.9 to 2.2 Mb. Over 90% of the genome was sequenced with concomitant high accuracy and assembled into 48 contigs organised on eight scaffolds (supercontigs). After manual annotation, 4767 out of 5933 open reading frames (ORFs) with significant homologies to a non-redundant protein database were predicted. The remaining 1166 ORFs showed no significant similarity to known proteins. The number of ORFs is comparable to that of other sequenced budding yeasts of similar genome size.