Similar literature
 20 similar documents found
1.
Peptide mass fingerprinting (PMF) has become one of the most widely used methods for rapid identification of proteins in proteomics research. Many peaks, however, remain unassigned after PMF analysis, partly because of post-translational modification and the limited scope of protein sequences. Almost all PMF tools employ only known or predicted protein sequences and do not include open reading frames (ORFs) in the genome, which eliminates the chance of finding novel functional peptides. Unlike most tools, which search protein sequences from known coding sequences, we developed a database of theoretical small ORFs (tsORFs), called tsORFdb, and a PMF application that uses it. The tsORFdb is an ORFeome database that encompasses all potential tsORFs derived from whole genome sequences as well as the predicted ones. The massProphet system tries to extend the search scope to include the ORFeome using the tsORFdb. The tsORFdb and massProphet should be useful in proteomics research for providing information about unknown small ORFs as well as predicted and registered proteins.
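The idea of searching peaks against theoretical small ORFs rather than annotated proteins can be illustrated with a minimal Python sketch. It is not the tsORFdb or massProphet implementation. All function names are hypothetical, and several choices are assumptions not stated in the abstract: ORFs run from ATG to the first in-frame stop, digestion follows the common trypsin rule (cleave after K or R unless followed by P), observed peaks are neutral monoisotopic masses, and the input is uppercase A/C/G/T only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def revcomp(dna):
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def small_orfs(dna, min_aa=2, max_aa=100):
    """Yield (strand, start, protein) for ATG..stop ORFs in all six frames.

    The length bounds are illustrative assumptions, not values from the paper.
    """
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for frame in range(3):
            start, residues = None, []
            for i in range(frame, len(seq) - 2, 3):
                aa = CODON_TABLE[seq[i:i + 3]]
                if start is None:
                    if aa == "M":          # first ATG opens an ORF
                        start, residues = i, ["M"]
                elif aa == "*":            # stop codon closes it
                    if min_aa <= len(residues) <= max_aa:
                        yield strand, start, "".join(residues)
                    start = None
                else:
                    residues.append(aa)


def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (assumed rule)."""
    peptides, current = [], ""
    for i, aa in enumerate(protein):
        current += aa
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


def match_peaks(peaks, dna, tol=0.01):
    """Return (peak, peptide, strand, start) for peaks within tol Da."""
    hits = []
    for strand, start, protein in small_orfs(dna):
        for pep in tryptic_peptides(protein):
            mass = peptide_mass(pep)
            hits.extend((p, pep, strand, start) for p in peaks
                        if abs(p - mass) <= tol)
    return hits
```

A call such as `match_peaks([348.183], "ATGGCTAAATAA")` would report the peptide MAK from the forward strand. A real search would also need to handle modifications, missed cleavages, and charge states, which this sketch leaves out.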

2.
Complete sets of cloned protein-encoding open reading frames (ORFs), or ORFeomes, are essential tools for large-scale proteomics and systems biology studies. Here we describe human ORFeome version 3.1 (hORFeome v3.1), currently the largest publicly available resource of full-length human ORFs (available at ). Generated by Gateway recombinational cloning, this collection contains 12,212 ORFs, representing 10,214 human genes, and corresponds to a 51% expansion of the original hORFeome v1.1. An online human ORFeome database, hORFDB, was built and serves as the central repository for all cloned human ORFs (http://horfdb.dfci.harvard.edu). This expansion of the original ORFeome resource greatly increases the potential experimental search space for large-scale proteomics studies, which will lead to the generation of more comprehensive datasets.

3.
Cloning of the entire set of an organism's protein-coding open reading frames (ORFs), or 'ORFeome', is a means of connecting the genome to downstream 'omics' applications. Here we report a proteome-scale study of the fission yeast Schizosaccharomyces pombe based on cloning of the ORFeome. Taking advantage of a recombination-based cloning system, we obtained 4,910 ORFs in a form that is readily usable in various analyses. First, we evaluated ORF prediction in the fission yeast genome project by expressing each ORF tagged at the 3' terminus. Next, we determined the localization of 4,431 proteins, corresponding to approximately 90% of the fission yeast proteome, by tagging each ORF with the yellow fluorescent protein. Furthermore, using leptomycin B, an inhibitor of the nuclear export protein Crm1, we identified 285 proteins whose localization is regulated by Crm1.

4.
New insights into chemical biology from ORFeome libraries   (total citations: 1; self-citations: 0; citations by others: 1)
As the genomes of many organisms have been sequenced, a variety of global analyses, called 'omics,' have been initiated. Cloning of the set of all open reading frames encoded by the genome (ORFeome) of an organism is a major challenge, which serves as an indispensable provision before one launches into the ocean of the postgenomic world. A suitable strategy for high-throughput cloning and expression of thousands of genes is crucial to success. Recently developed systems employing site-specific or homologous recombination have made it feasible to manipulate thousands of ORFs en masse. Using these technologies, several recent studies have successfully fished biologically active small molecules and target proteins out of this bountiful ocean.

5.
6.
The availability of entire genome sequences is expected to revolutionize the way in which biology and medicine are conducted for years to come. However, achieving this promise still requires significant effort in the areas of gene annotation, cloning and expression of thousands of known and heretofore unknown protein-encoding genes. Traditional technologies for manipulating genes are too cumbersome and inefficient when one is dealing with more than a few genes at a time. Entire libraries composed of all protein-encoding open reading frames (ORFs) cloned in highly flexible vectors will be needed to take full advantage of the information found in any genome sequence. The creation of such ORFeome resources using novel technologies for cloning and expressing entire proteomes constitutes an effective gateway from whole genome sequencing efforts to downstream 'omics' applications.

7.
Completion of the Caenorhabditis elegans genome sequencing project in 1998 has provided more insight into the complexity of nematode neuropeptide signaling. Several C. elegans neuropeptide precursor genes, coding for approximately 250 peptides, have been predicted from the genomic database. One cannot, however, deduce whether all these peptides are actually expressed, nor is it possible to predict all post-translational modifications. Using two-dimensional nanoscale liquid chromatography combined with tandem mass spectrometry and database mining, we analyzed a mixed-stage C. elegans extract. This peptidomic setup yielded 21 peptides derived from formerly predicted neuropeptide-like protein (NLP) precursors and 28 predicted FMRFamide-related peptides. In addition, we were able to sequence 11 entirely novel peptides derived from nine peptide precursors that were not predicted or identified in any way previously. Some of the identified peptides display profound sequence similarities with neuropeptides from other invertebrates, indicating that these peptides have a long evolutionary history.

8.
Modern proteomics approaches include techniques to examine the expression, localization, modifications, and complex formation of proteins in cells. In order to address issues of protein function in vitro using classical biochemical and biophysical approaches, high-throughput methods of cloning the appropriate reading frames, and expressing and purifying proteins efficiently are an important goal of modern proteomics approaches. This process becomes more difficult as functional proteomics efforts focus on the proteins from higher organisms, since issues of correctly identifying intron-exon boundaries and efficiently expressing and solubilizing the (often) multi-domain proteins from higher eukaryotes are challenging. Recently, 12,000 open-reading-frame (ORF) sequences from Caenorhabditis elegans have become available for functional proteomics studies [Nat. Gen. 34 (2003) 35]. We have implemented a high-throughput screening procedure to express, purify, and analyze by mass spectrometry hexa-histidine-tagged C. elegans ORFs in Escherichia coli using metal affinity ZipTips. We find that over 65% of the expressed proteins are of the correct mass as analyzed by matrix-assisted laser desorption MS. Many of the remaining proteins indicated to be "incorrect" can be explained by high-throughput cloning or genome database annotation errors. This provides a general understanding of the expected error rates in such high-throughput cloning projects. The ZipTip purified proteins can be further analyzed under both native and denaturing conditions for functional proteomics efforts.

9.
We have developed a pooled ORF expression technology, POET, that uses recombinational cloning and proteomic methods (two-dimensional gel electrophoresis and mass spectrometry) to identify ORFs that when expressed are likely to yield high levels of soluble, purified protein. Because the method works on pools of ORFs, the procedures needed to subclone, express, purify, and assay protein expression for hundreds of clones are greatly simplified. Small scale expression and purification of 12 positive clones identified by POET from a pool of 688 Caenorhabditis elegans ORFs expressed in Escherichia coli yielded on average 6 times as much protein as 12 negative clones. Larger scale expression and purification of six of the positive clones yielded 47-374 mg of purified protein/liter. Using POET, pools of ORFs can be constructed, and the pools of the resulting proteins can be analyzed and manipulated to rapidly acquire information about the attributes of hundreds of proteins simultaneously.

10.
We have been developing FAMSBASE, a protein homology-modeling database of whole ORFs predicted from genome sequences. The latest update of FAMSBASE (), which is based on the protein three-dimensional (3D) structures released by November 2003, contains modeled 3D structures for 368,724 open reading frames (ORFs) derived from the genomes of 276 species, namely 17 archaebacterial, 130 eubacterial, 18 eukaryotic and 111 phage genomes. Those 276 genomes are predicted to have 734,193 ORFs in total, and the current FAMSBASE contains protein 3D structures for approximately 50% of the ORF products. However, cases in which a modeled 3D structure covers an entire ORF product are rare. When the portion of an ORF covered by a 3D structure is compared across the three kingdoms of life, approximately 60% of the ORFs in archaebacteria and eubacteria have modeled 3D structures covering almost the entire amino acid sequence, whereas the percentage falls to about 30% in eukaryotes. When annual differences in the number of ORFs with modeled 3D structures are calculated, the fraction of modeled 3D structures of soluble proteins has increased by 5% for archaebacteria and by 7% for eubacteria over the last 3 years. Assuming that this rate is maintained and that determination of 3D structures for predicted disordered regions is unattainable, whole soluble protein model structures of prokaryotes, excluding the putative disordered regions, will be in hand within 15 years. For eukaryotic proteins, they will be in hand within 25 years. The 3D structures we will have at those times are not the 3D structures of the entire proteins encoded in single ORFs, but the 3D structures of separate structural domains. Measuring or predicting the spatial arrangement of structural domains in an ORF will then become the next issue for structural genomics.

11.
Computational gene prediction and the identification of alternatively spliced isoforms have always been challenging tasks. In this paper, we describe the performance of three gene/exon finding programmes, namely Fex, Gen view2 and Gene builder, which are capable of predicting open reading frames or exons for a given set of sequences from the C. elegans genome. The predicted exons were compared with the exons identified by the 'sequencing consortium', and the degree of consensus among them is discussed. We found that exon prediction by Fex was closer to the consortium prediction than the Gen view2 and Gene builder results were. Interestingly, some exons (six exons in five genes) predicted only by Fex and not by the 'sequencing consortium' are found in the C. elegans EST database. These data are critical for further debate and discussion on gene finding in C. elegans.

12.
We describe here the cloning and functional characterization of an organic cation transporter from Caenorhabditis elegans (CeOCT1). The CeOCT1 cDNA is 1826 bp long and codes for a protein of 568 amino acids. The oct1 gene is approximately 3.2 kb in size and consists of 12 exons. The location of this gene corresponds to the F52F12.1 gene locus on chromosome I. The predicted protein contains 12 putative transmembrane domains. It exhibits significant homology to mammalian OCTs. When expressed in mammalian cells, CeOCT1 induces the transport of the prototypical organic cation tetraethylammonium. The Michaelis-Menten constant for this substrate is 80 ± 16 µM. The substrate specificity of CeOCT1 is broad. This represents the first report on the cloning and functional characteristics of an organic cation transporter from C. elegans.

13.
Functional characterization of the human genome requires tools for systematically modulating gene expression in both loss-of-function and gain-of-function experiments. We describe the production of a sequence-confirmed, clonal collection of over 16,100 human open-reading frames (ORFs) encoded in a versatile Gateway vector system. Using this ORFeome resource, we created a genome-scale expression collection in a lentiviral vector, thereby enabling both targeted experiments and high-throughput screens in diverse cell types.

14.
Genome Biology, 2003, 4(6): 1-3
A selection of evaluations from Faculty of 1000 covering the synthesis of sugar arrays, C. elegans ORFeome v1.1, a novel method for identifying tRNA genes, a step towards analysis of the human serum proteome and rates of molecular divergence in rearranged chromosomes.

15.
To implement the 2-DE database of serogroup A Neisseria meningitidis (MenA) and improve its potential for investigating bacterial biology, cell extracts were separated by tricine-SDS-PAGE and 131 novel proteins were identified by microLC-ESI-IT-MS/MS. These identifications extended to 404 the number of MenA gene expression products characterized at the proteome level, covering approximately 20% of the total ORFs predicted from the genome sequence. This technical approach was particularly useful in ascertaining expression of ribosomal as well as hypothetical proteins. Particular attention was paid to functional characterization of hypothetical proteins by means of software analyses and database searches.

16.
We report the identification, molecular cloning, and characterization of an endo-beta-N-acetylglucosaminidase from the nematode Caenorhabditis elegans. A search of the C. elegans genome database revealed the existence of a gene exhibiting 34% identity to the endo-beta-N-acetylglucosaminidase (Endo-M) of the fungus Mucor hiemalis. Indeed, the C. elegans extract contained endo-beta-N-acetylglucosaminidase activity. The putative cDNA for the C. elegans endo-beta-N-acetylglucosaminidase (Endo-CE) was amplified by polymerase chain reaction from the Uni-ZAP XR library, cloned, and sequenced. The recombinant Endo-CE expressed in Escherichia coli exhibited substrate specificity mainly for high-mannose type oligosaccharides. Man(8)GlcNAc(2) was the best substrate for Endo-CE, and Man(3)GlcNAc(2) was also hydrolyzed. Biantennary complex type oligosaccharides were poor substrates, and triantennary complex substrates were not hydrolyzed. Its substrate specificity was similar to those of Endo-M and endo-beta-N-acetylglucosaminidase from hen oviduct. Endo-CE was confirmed to exhibit transglycosylation activity, as seen for some microbial endo-beta-N-acetylglucosaminidases. This is the first report of the molecular cloning of an endo-beta-N-acetylglucosaminidase gene from a multicellular organism, which shows the possibility of using this well-characterized nematode as a model system for elucidating the role of this enzyme.

17.
18.
19.
The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins.
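The percentages quoted in this abstract can be reproduced from the reported counts. The short Python sketch below only restates that arithmetic. The variable names are illustrative, and the derived counts (ORFs supported by one method only) are computed here rather than stated in the abstract.

```python
# Counts reported in the abstract.
genemark_orfs = 354   # putative expressed ORFs predicted by GeneMark
blast_orfs = 208      # ORFs with significant BLASTX/TBLASTN similarity
both = 182            # ORFs supported by both GeneMark and BLAST
conserved = 73        # putative new genes in ancient conserved families


def pct(part, whole):
    """Percentage of whole represented by part, rounded to one decimal."""
    return round(100 * part / whole, 1)


overlap_of_genemark = pct(both, genemark_orfs)    # share of GeneMark hits
overlap_of_blast = pct(both, blast_orfs)          # share of BLAST hits
conserved_of_genemark = pct(conserved, genemark_orfs)

# Derived here, not stated in the abstract: ORFs found by one method only.
genemark_only = genemark_orfs - both
blast_only = blast_orfs - both

print(overlap_of_genemark, overlap_of_blast, conserved_of_genemark)
print(genemark_only, blast_only)
```

The three percentages agree with the 51.4%, 87.5% and 20.6% given in the text.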

20.
The Hansenula polymorpha (strain CBS4732) genome sequencing and analysis   (total citations: 4; self-citations: 0; citations by others: 4)
The methylotrophic yeast Hansenula polymorpha is a recognised model system for investigating peroxisomal function, special metabolic pathways such as methanol metabolism, nitrate assimilation, and thermostability. Strain RB11, an odc1 derivative of the particular H. polymorpha isolate CBS4732 (synonymous with ATCC34438, NRRL-Y-5445, CCY38-22-2), has been developed as a platform for heterologous gene expression. The scientific and industrial significance of this organism is now being met by the characterisation of its entire genome. The H. polymorpha RB11 genome consists of approximately 9.5 Mb and is organised as six chromosomes ranging in size from 0.9 to 2.2 Mb. Over 90% of the genome was sequenced with concomitant high accuracy and assembled into 48 contigs organised on eight scaffolds (supercontigs). After manual annotation, 4767 out of 5933 open reading frames (ORFs) with significant homologies to a non-redundant protein database were predicted. The remaining 1166 ORFs showed no significant similarity to known proteins. The number of ORFs is comparable to that of other sequenced budding yeasts of similar genome size.
