Similar literature
1.
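Small open reading frames (sORFs) are widespread in the genomes of diverse organisms. Because they are short, and because their products, small proteins (also called microproteins or miniproteins), are difficult to detect, sORFs have long been insufficiently annotated and studied. In recent years, advances in high-throughput sequencing, translatomics and mass spectrometry have revealed large numbers of new sORFs in different organisms, and the small proteins they encode and the translational regulation they mediate have been applied in drug development and in studies of plant disease-resistance mechanisms. However, research on and applications of sORFs in microorganisms remain relatively limited. This review summarizes recent progress in the discovery and identification of sORF-encoded small proteins and in the regulation of mRNA translation by upstream open reading frames (uORFs), with a focus on the identification and functional characterization of sORFs in microbial genomes. It aims to provide a reference for a deeper understanding of the functions and mechanisms of sORFs in microorganisms, and for related research on small proteins and translational regulation in plants, animals and other higher organisms.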
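To make the idea of a uORF concrete, here is a minimal sketch that lists ATG-initiated ORFs beginning in the 5' UTR of a transcript, given the position of the annotated main start codon. It is an illustration under stated assumptions, not a method from the review: only ATG starts are considered, the standard stop codons TAA/TAG/TGA are assumed, and the function name `find_uorfs` and the example transcript are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(mrna: str, cds_start: int) -> list[tuple[int, int, bool]]:
    """List ATG-initiated upstream ORFs that start in the 5' UTR.

    mrna:      transcript sequence (DNA or RNA alphabet)
    cds_start: 0-based index of the annotated main start codon
    Returns (start, end, overlaps_cds) tuples: start is the uORF's ATG,
    end is the exclusive end of its stop codon (or len(mrna) if no in-frame
    stop exists), and overlaps_cds is True when the uORF runs past cds_start.
    """
    seq = mrna.upper().replace("U", "T")
    uorfs = []
    for start in range(cds_start):
        if seq[start:start + 3] != "ATG":
            continue
        end = len(seq)  # treated as open to the end unless an in-frame stop is found
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                break
        uorfs.append((start, end, end > cds_start))
    return uorfs


# Hypothetical transcript: uORF ATG-CCC-TAA ends before the main ATG at index 13.
print(find_uorfs("GGATGCCCTAAGGATGAAATTTTAA", 13))  # [(2, 11, False)]
```

The `overlaps_cds` flag separates uORFs that terminate inside the 5' UTR from those whose reading continues past the main start codon, two situations that are often discussed separately when considering translational regulation.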

2.
Peng J, Yang J, Jin Q. PLoS ONE 2011, 6(4): e18509

Background

The completion of numerous genome sequences introduced an era of whole-genome study. However, many genes are missed during genome annotation, including small RNAs (sRNAs) and small open reading frames (sORFs). In order to improve genome annotation, we aimed to identify novel sRNAs and sORFs in Shigella, the principal etiologic agents of bacillary dysentery.

Methodology/Principal Findings

Based on sequence conservation, we identified 64 sRNAs in Shigella that correspond to sRNAs experimentally validated in other bacteria. We used computer-based and tiling array-based methods to search for sRNAs, followed by RT-PCR and northern blots, to identify nine sRNAs in Shigella flexneri strain 301 (Sf301) and 256 regions containing possible sRNA genes. Using bioinformatic prediction, array hybridization and RT-PCR verification, we found 29 candidate sORFs. We experimentally validated 557 (57.9%) of the DOOR operon predictions in the Sf301 chromosome and 46 (76.7%) in the virulence plasmid. We also found 40 additional co-expressed gene pairs that were not predicted by DOOR.

Conclusions/Significance

We provide an updated and comprehensive annotation of the Shigella genome. Our study increased the expected numbers of sORFs and sRNAs, which will inform future functional genomics and proteomics studies. Our method can be used for large-scale reannotation of sRNAs and sORFs in any microbe with a known genome sequence.

3.
Small proteins play essential roles in bacterial physiology and virulence; however, automated genome annotation algorithms are often still unable to predict the corresponding genes accurately. The accuracy and reliability of genome annotations, particularly for small open reading frames (sORFs), can be significantly improved by integrating protein evidence from experimental approaches. Here we present a highly optimized and flexible bioinformatics workflow for bacterial proteogenomics covering all steps from (i) generation of protein databases, (ii) database searches and (iii) peptide-to-genome mapping to (iv) visualization of results. We used the workflow to identify high-quality peptide spectrum matches (PSMs) for small proteins (≤ 100 aa, SP100) in Staphylococcus aureus Newman. Protein extracts from S. aureus were subjected to different experimental workflows for protein digestion and prefractionation and measured with highly sensitive mass spectrometers. In total, 175 proteins with up to 100 aa (SP100) were identified. Of these, 24 (ranging from 9 to 99 aa) were novel and not contained in the genome annotation used. A total of 144 SP100 are highly conserved and were found in at least 50% of the publicly available S. aureus genomes, while 127 are additionally conserved in other staphylococci. Almost half of the identified SP100 were basic, suggesting a role in binding to more acidic molecules such as nucleic acids or phospholipids.
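Step (iii), peptide-to-genome mapping, can be illustrated with a small sketch: translate the genome in all six reading frames and report where an identified peptide occurs, in forward-strand coordinates. This is not the authors' workflow; it assumes the standard genetic code (NCBI translation table 1), exact string matching, and hypothetical function names (`map_peptide`, `translate`).

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (translation table 1); stop codons appear as "*".
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def translate(seq: str) -> str:
    """Translate codon by codon; codons containing non-ACGT characters become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def map_peptide(genome: str, peptide: str) -> list[tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for every six-frame hit of peptide.

    start/end are 0-based, end-exclusive coordinates on the forward strand.
    """
    genome = genome.upper()
    n = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            idx = protein.find(peptide)
            while idx != -1:
                s = frame + 3 * idx           # nucleotide start on this strand
                e = s + 3 * len(peptide)      # nucleotide end on this strand
                if strand == "-":
                    s, e = n - e, n - s       # convert to forward-strand coordinates
                hits.append((strand, frame, s, e))
                idx = protein.find(peptide, idx + 1)
    return hits


print(map_peptide("ATGAAACCCGGGTAA", "KPG"))  # [('+', 0, 3, 12)]
```

Real proteogenomic pipelines also have to handle isobaric leucine/isoleucine and peptides that map to several loci; this sketch simply reports every exact match.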

4.
The genome sequence of Manduca sexta was recently determined using 454 technology. Cufflinks and MAKER2 were used to establish gene models in the genome assembly based on the RNA-Seq data and other species' sequences. Aided by the extensive RNA-Seq data from 50 tissue samples at various life stages, annotators around the world (including the present authors) have manually confirmed and improved a small percentage of the models after months of effort. While such collaborative efforts are highly commendable, many of the predicted genes still have problems that may hamper future research on this insect species. As a biochemical model representing lepidopteran pests, M. sexta has been used extensively to study insect physiological processes for over five decades. In this work, we generated the Manduca transcript assemblies Cufflinks 3.0, Trinity 4.0 and Oases 4.0 to assist the manual annotation efforts and the development of Official Gene Set (OGS) 2.0. To further improve annotation quality, we developed methods to evaluate gene models in the MAKER2, Cufflinks, Oases and Trinity assemblies and, after thorough cross-checking, selected the best ones to constitute MCOT 1.0. MCOT 1.0 has 18,089 genes encoding 31,666 proteins: 32.8% match OGS 2.0 models perfectly or near perfectly, 11,747 differ considerably, and 29.5% are absent from OGS 2.0. Future automation of this process is anticipated to greatly reduce the human effort needed to generate comprehensive, reliable models of structural genes in other genome projects where extensive RNA-Seq data are available.

5.
Accurate cDNA data is useful to validate gene structures in a genome. We sequenced 35,189 expressed sequence tags (ESTs) obtained from the highly destructive rice blast fungus, Magnaporthe grisea. Our custom-made computational programs mapped these ESTs on the M. grisea genome sequence, and reconstructed gene structures as well as protein-coding regions. As a result, we predicted 4480 protein-coding sequences, which were more accurate than ab initio predictions. Moreover, cross-species comparisons suggested that our predicted proteins were nearly complete. The cDNA clones obtained in this study will be important for further experimental studies. Our genome annotation is available at http://www.mg.dna.affrc.go.jp/.

6.
In bacteriophage λ, the overlapping open reading frames G and T are expressed by a programmed translational frameshift similar to that of the gag-pol genes of many retroviruses to produce the proteins gpG and gpGT. An analogous frameshift is widely conserved among other dsDNA tailed phages in their corresponding “G” and “GT” tail genes even in the absence of detectable sequence homology. The longer protein gpGT is known to be essential for tail assembly, but the requirement for the shorter gpG remained unclear because mutations in gene G affect both proteins. A plasmid system that can direct the efficient synthesis of tails was created and used to show that gpG and gpGT are both essential for correct tail assembly. Phage complementation assays under conditions where levels of plasmid-expressed gpG or gpGT could be altered independently revealed that the correct molar ratio of these two related proteins, normally determined by the efficiency of the frameshift, is also crucial for efficient assembly of functional tails. Finally, the physical connection between the G and T domains of gpGT, a consequence of the frameshift mechanism of protein expression, appears to be important for efficient tail assembly.
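The link between frameshift efficiency and the gpG:gpGT molar ratio can be written as a one-line relation. The sketch below assumes, as a simplification not stated in the abstract, that every ribosome reaching the shift site either terminates at the end of G (making gpG) or shifts and continues through T (making gpGT), with no ribosome drop-off and equal protein stability. The efficiency values used are hypothetical.

```python
def g_to_gt_ratio(frameshift_efficiency: float) -> float:
    """Molar ratio gpG : gpGT implied by a frameshift efficiency f (0 < f < 1).

    Under the simplifying assumptions above, a fraction f of ribosomes make gpGT
    and 1 - f make gpG, so the ratio is (1 - f) / f.
    """
    if not 0 < frameshift_efficiency < 1:
        raise ValueError("frameshift efficiency must lie strictly between 0 and 1")
    return (1 - frameshift_efficiency) / frameshift_efficiency


# Hypothetical efficiencies, for illustration only.
for f in (0.05, 0.2, 0.5):
    print(f, round(g_to_gt_ratio(f), 2))
```

Because the ratio depends steeply on f when f is small, even modest changes in shift efficiency would shift the gpG:gpGT balance considerably, which is the kind of perturbation the complementation assays examined.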

7.
This study investigates the role of translational coupling in the expression and function of DrrA and DrrB proteins, which form an efflux pump for the export of anticancer drugs doxorubicin and daunorubicin in the producer organism Streptomyces peucetius. Interest in studying the role of translational coupling came from the initial observation that DrrA and DrrB proteins confer doxorubicin resistance only when they are expressed in cis. Because of the presence of overlapping stop and start codons in the intergenic region between drrA and drrB, it has been assumed that the translation of drrB is coupled to the translation of the upstream gene drrA even though direct evidence for coupling has been lacking. In this study, we show that the expression of drrB is indeed coupled to translation of drrA. We also show that the introduction of non-coding sequences between the stop codon of drrA and the start of drrB prevents formation of a functional complex, although both proteins are still produced at normal levels, thus suggesting that translational coupling also plays a crucial role in proper assembly. Interestingly, replacement of drrA with an unrelated gene was found to result in very high drrB expression, which becomes severely growth inhibitory. This indicates that an additional mechanism within drrA may optimize expression of drrB. Based on the observations reported here, it is proposed that the production and assembly of DrrA and DrrB are tightly linked. Furthermore, we propose that the key to assembly of the DrrAB complex lies in co-folding of the two proteins, which requires that the genes be maintained in cis in a translationally coupled manner.
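The sequence signature mentioned above, a stop codon of one gene overlapping the start codon of the next, can be checked mechanically. The sketch below is hypothetical (the function name and the choice of ATG/GTG/TTG as bacterial start codons are assumptions) and does not reproduce the drrA/drrB junction; it only classifies a junction from given codon positions. Finding such an overlap does not by itself prove coupling, which the study established experimentally.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed set of bacterial start codons


def junction_overlap(seq: str, upstream_stop: int, downstream_start: int) -> int:
    """Nucleotide overlap between an upstream stop codon and a downstream start codon.

    upstream_stop / downstream_start: 0-based index of the first base of each codon.
    Returns a positive overlap (4 for ATGA, 1 for TAATG), 0 if the genes abut,
    or a negative number whose magnitude is the intergenic spacer length.
    """
    if seq[upstream_stop:upstream_stop + 3] not in STOP_CODONS:
        raise ValueError("no stop codon at upstream_stop")
    if seq[downstream_start:downstream_start + 3] not in START_CODONS:
        raise ValueError("no start codon at downstream_start")
    # The upstream gene ends after its stop codon; the downstream gene begins at its start.
    return (upstream_stop + 3) - downstream_start


print(junction_overlap("ATGA", upstream_stop=1, downstream_start=0))      # 4
print(junction_overlap("TAAGGATG", upstream_stop=0, downstream_start=5))  # -2
```

Inserting non-coding sequence between the two genes, as in the experiment described, would turn a positive value into a negative one.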

8.
The PROSITE database, its status in 2002
PROSITE [Bairoch and Bucher (1994) Nucleic Acids Res., 22, 3583–3589; Hofmann et al. (1999) Nucleic Acids Res., 27, 215–219] is a method of identifying the functions of uncharacterized proteins translated from genomic or cDNA sequences. The PROSITE database (http://www.expasy.org/prosite/) consists of biologically significant patterns and profiles designed in such a way that with appropriate computational tools it can rapidly and reliably help to determine to which known family of proteins (if any) a new sequence belongs, or which known domain(s) it contains.
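As an example of applying PROSITE patterns with "appropriate computational tools", a pattern can be translated into a regular expression. The sketch below covers the common pattern elements: `x` (any residue), `[..]` (allowed residues), `{..}` (forbidden residues), repeat counts `(n)` or `(n,m)`, elements separated by `-`, and `<`/`>` anchors at the N/C terminus. It is a hypothetical helper, not the ScanProsite implementation, and it rejects rarer forms such as `>` inside brackets. The example is the N-glycosylation site pattern `N-{P}-[ST]-{P}`.

```python
import re

ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern to a Python regular expression."""
    regex = ""
    for element in pattern.strip().rstrip(".").split("-"):
        at_n_term = element.startswith("<")
        at_c_term = element.endswith(">")
        element = element.strip("<>")
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                            # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"        # any residue except those listed
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        regex += ("^" if at_n_term else "") + core + ("$" if at_c_term else "")
    return regex


glyco = prosite_to_regex("N-{P}-[ST]-{P}.")
print(glyco)                              # N[^P][ST][^P]
print(re.search(glyco, "MNGTA").start())  # 1
```

A regular expression finds only pattern matches; PROSITE profiles, the database's other entry type, use position-specific scoring and cannot be expressed this way.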

11.
Abstract

Molecular biology, genomics and proteomics methods have been used to reveal a non-annotated class of endogenous polypeptides (small proteins and peptides) encoded by short open reading frames (sORFs), also called small open reading frames (smORFs). We refer to these polypeptides as s(m)ORF-encoded polypeptides, or SEPs. The early SEPs were identified via genetic screens, and many of the RNAs that contain s(m)ORFs were originally considered to be non-coding; however, elegant work in bacteria and flies demonstrated that these s(m)ORFs code for functional polypeptides as small as 11 amino acids. The discovery of these initial SEPs led to searches for these molecules using methods such as ribosome profiling and proteomics, which have revealed the existence of many SEPs, including novel human SEPs. Unlike screens, omics methods do not necessarily link a SEP to a cellular or biological function, but functional genomic and proteomic strategies have demonstrated that at least some of these newly discovered SEPs have biochemical and cellular functions. Here, we provide an overview of these results and discuss future directions in this emerging field.

12.
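Small open reading frames (sORFs) generally refer to open reading frames in a genome that can encode short peptides of about 100 amino acids or fewer. They are widespread in plant genomes, yet are often overlooked in genome annotation because they encode short peptides. With the development of translatomics and proteomics sequencing technologies, translationally active sORFs have been shown to be widespread in plant genomes and to participate in regulating important processes such as plant growth and development. This article summarizes recent research progress on plant sORFs, including their origins and classification, bioinformatic prediction methods and biological functions, and on this basis offers an outlook on future directions for plant sORF research.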
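The simplest form of the bioinformatic prediction mentioned above is a length-bounded ORF scan. The sketch below reports ATG-initiated ORFs of 10 to 100 codons on both strands. Both bounds are hypothetical defaults (the upper one follows the "about 100 amino acids" definition); real predictors add evidence such as conservation, coding potential or ribosome profiling, none of which is modeled here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def find_sorfs(genome: str, min_codons: int = 10, max_codons: int = 100):
    """Return (strand, start, end, n_codons) for ATG-initiated ORFs of bounded length.

    Coordinates are 0-based, end-exclusive, on the forward strand; end includes the
    stop codon, n_codons excludes it. For each stop codon only the longest ORF
    (first in-frame ATG after the previous stop) is reported.
    """
    genome = genome.upper()
    n = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            start = None  # first ATG seen since the last in-frame stop
            for pos in range(frame, n - 2, 3):
                codon = seq[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and codon in STOP_CODONS:
                    n_codons = (pos - start) // 3
                    if min_codons <= n_codons <= max_codons:
                        s, e = start, pos + 3
                        if strand == "-":
                            s, e = n - e, n - s  # map back to forward-strand coordinates
                        hits.append((strand, s, e, n_codons))
                    start = None
    return hits


toy = "ATG" + "GCT" * 11 + "TAA"  # hypothetical 12-codon ORF
print(find_sorfs(toy))            # [('+', 0, 39, 12)]
```

A lower bound is needed in practice because very short ATG-to-stop stretches occur by chance throughout any genome; where to set it is a trade-off between sensitivity and false positives.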

13.
Nucleic acid polymers selected from random sequence space constitute an enormous array of catalytic, diagnostic and therapeutic molecules. Despite the fact that proteins are robust polymers with far greater chemical and physical diversity, success in unlocking protein sequence space remains elusive. We have devised a combinatorial strategy for accessing nucleic acid sequence space corresponding to proteins comprising selected amino acid alphabets. Using the SynthOMIC approach (synthesis of ORFs by multimerizing in-frame codons), representative libraries comprising four amino acid alphabets were fused in-frame to the lambda repressor DNA-binding domain to provide an in vivo selection for self-interacting proteins that re-constitute lambda repressor function. The frequency of self-interactors as a function of amino acid composition ranged over five orders of magnitude, from ∼6% of clones in a library comprising the amino acid residues LARE to ∼0.6 in 10⁶ in the MASH library. Sequence motifs were evident by inspection in many cases, and individual clones from each library presented substantial sequence identity with translated proteins by BLAST analysis. We posit that the SynthOMIC approach represents a powerful strategy for creating combinatorial libraries of open reading frames that distils protein sequence space on the basis of three inherent properties: it supports the use of selected amino acid alphabets, eliminates redundant sequences and locally constrains amino acids.
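The key property of building ORFs from in-frame codons of a restricted alphabet is that no in-frame stop codon can arise. The sketch below simulates only the sequence content of such a library for the LARE alphabet; it is not the published SynthOMIC laboratory protocol. The codon assigned to each amino acid, the library size and the ORF length are all hypothetical choices.

```python
import random

# Hypothetical codon assignment: one sense codon per residue of the LARE alphabet,
# so any in-frame concatenation stays open (no stop codon can appear in frame).
LARE_CODONS = {"L": "CTG", "A": "GCG", "R": "CGT", "E": "GAA"}


def random_orf(codons: dict[str, str], n_codons: int, rng: random.Random) -> str:
    """Concatenate n_codons codons drawn uniformly from the alphabet, in frame."""
    residues = [rng.choice(sorted(codons)) for _ in range(n_codons)]
    return "".join(codons[aa] for aa in residues)


def build_library(codons: dict[str, str], n_codons: int, size: int, seed: int = 0) -> list[str]:
    """Draw random ORFs until the library holds `size` distinct members."""
    if size > len(codons) ** n_codons:
        raise ValueError("requested library is larger than the sequence space")
    rng = random.Random(seed)
    library = set()  # the set discards repeated draws
    while len(library) < size:
        library.add(random_orf(codons, n_codons, rng))
    return sorted(library)


library = build_library(LARE_CODONS, n_codons=8, size=50, seed=1)
print(len(library), library[0])
```

In a real selection each member would then be fused in frame to a reporter, such as the lambda repressor DNA-binding domain used in the study.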

15.
Although they can encode various kinds of bioactive peptides, small open reading frames (sORFs) are poorly annotated in many genomic datasets. The present study evaluated the potential of sORFs to encode antimicrobial peptides (AMPs) in the basal chordate model Ciona intestinalis. About 4.8 Mb of genomic sequence was first retrieved and mined for sORFs with the program sORFfinder; the sORFs were then translated into amino acid sequences for AMP prediction via the CAMP server, and ten putative AMPs were selected for expression and validation of antimicrobial activity. In total, over 180 peptides deduced from the sORFs were predicted to be AMPs. Of the ten tested peptides, six had significant expressed sequence tag (EST) matches, providing strong evidence for gene expression, and five were shown to be active against the bacterial strains tested. These results indicate that many sORFs in the C. intestinalis genome contain AMP information. This work can serve as an important initial step in investigating the role of sORFs in the innate defense of C. intestinalis. Copyright © 2013 European Peptide Society and John Wiley & Sons, Ltd.
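The study relied on the CAMP server for AMP prediction, which is not reproduced here. As a rough illustration of the kind of sequence features such screening draws on, the sketch below applies a toy prefilter to translated sORF products: many AMPs are cationic and fairly hydrophobic. The charge rule (K/R minus D/E, ignoring His and termini), the hydrophobic residue set and both thresholds are assumptions chosen for illustration, not CAMP's criteria.

```python
HYDROPHOBIC = set("AILMFVW")  # assumed hydrophobic residue set for this toy filter


def net_charge(peptide: str) -> int:
    """Crude net charge near neutral pH: +1 per K/R, -1 per D/E (His and termini ignored)."""
    p = peptide.upper()
    return sum(p.count(aa) for aa in "KR") - sum(p.count(aa) for aa in "DE")


def hydrophobic_fraction(peptide: str) -> float:
    p = peptide.upper()
    return sum(aa in HYDROPHOBIC for aa in p) / len(p) if p else 0.0


def looks_like_amp_candidate(peptide: str, min_charge: int = 2,
                             min_hydrophobic: float = 0.3) -> bool:
    """Hypothetical prefilter: keep peptides that are cationic and fairly hydrophobic."""
    return (net_charge(peptide) >= min_charge
            and hydrophobic_fraction(peptide) >= min_hydrophobic)


for pep in ("KWKLFKKIGAVLKVL", "DEEDGSSG"):
    print(pep, net_charge(pep), round(hydrophobic_fraction(pep), 2),
          looks_like_amp_candidate(pep))
```

A filter like this could only narrow a candidate list before a trained predictor and wet-lab testing, the two steps the study actually used.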

17.
The kinetics of accumulation of RNA labeled with uridine and the time course of change in the specific activity of the UTP pool were used to estimate the rate constants for synthesis and decay of RNA synthesized in unfertilized eggs of the sea urchin Lytechinus pictus. The rate of synthesis per haploid genome is similar to that in embryos. Most of the RNA is turning over with a half-life of about 5 hr, and an average of 11 pg of newly synthesized RNA accumulates at steady state. About 3.7% of the RNA in the polysomes of the egg is newly synthesized and this RNA has the heterogeneous size distribution expected for mRNA. Thus most, probably all, of the mRNA translated in the egg is also synthesized in the egg. Little, if any, of the RNA synthesized in the egg enters polysomes following fertilization. Thus the egg synthesizes a population of mRNA which is unstable and translated, but it also contains a more stable, untranslated population of previously synthesized, stored mRNA, which is translated only after fertilization. Since the two populations of mRNA code for the same abundant proteins (Brandhorst, B. P. (1976). Develop. Biol., 52, 310–317), there is a temporal separation in the metabolism and function of coexisting mRNA molecules of identical coding sequence. Among the mRNAs synthesized and translated in the egg are histone mRNAs having the same electrophoretic mobilities and rates of synthesis per genome as those synthesized in rapidly cleaving embryos. Thus the synthesis, entry into the cytoplasm, and translation of histone mRNA are not restricted to the S phase of the cell cycle or the period of cell division.
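The half-life and steady-state figures above are linked by a standard first-order synthesis/decay model, dR/dt = k_s − k_d·R, whose solution from R(0) = 0 is R(t) = (k_s/k_d)(1 − e^(−k_d·t)), with t½ = ln 2 / k_d and steady state R_ss = k_s/k_d. The sketch below back-calculates k_d and k_s from the abstract's 5 h half-life and 11 pg steady state. It assumes a constant precursor specific activity, whereas the study measured and corrected for the changing UTP pool, so these numbers are illustrative rather than the authors' estimates.

```python
import math


def decay_constant(half_life_h: float) -> float:
    """First-order decay constant k_d (per hour) from a half-life in hours."""
    return math.log(2) / half_life_h


def synthesis_rate(steady_state: float, half_life_h: float) -> float:
    """k_s from R_ss = k_s / k_d, in the units of steady_state per hour."""
    return steady_state * decay_constant(half_life_h)


def accumulated(t_h: float, k_s: float, k_d: float) -> float:
    """R(t) = (k_s / k_d) * (1 - exp(-k_d * t)) for RNA made from t = 0 onward."""
    return (k_s / k_d) * (1.0 - math.exp(-k_d * t_h))


k_d = decay_constant(5.0)        # about 0.139 per hour
k_s = synthesis_rate(11.0, 5.0)  # about 1.52 pg per hour
print(round(k_d, 3), round(k_s, 2), round(accumulated(5.0, k_s, k_d), 2))
```

One consequence of the model: after one half-life of labeling, newly made RNA reaches half of its steady-state amount (5.5 pg here), which is why accumulation curves of this kind level off within a few half-lives.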

18.
The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an ‘RNA-first’ or ‘ORF-first’ pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations.

19.
Translating ribosomes often stall during elongation. The stalled ribosomes are known to be recycled by tmRNA (SsrA)-mediated trans-translation. Another process that recycles the stalled ribosomes is characterized by peptidyl-tRNA release. However, the mechanism of peptidyl-tRNA release from the stalled ribosomes is not well understood. We used a defined system of an AGA-minigene containing a small open reading frame (ATG AGA AGA). Translation of the AGA-minigene mRNA is toxic to Escherichia coli because it stalls ribosomes during elongation and sequesters tRNAArg4 as a short-chain peptidyl-tRNAArg4 in the ribosomal P-site. We show that a ribosome recycling factor (RRF)-mediated process rescues the host from the AGA-minigene toxicity by releasing the peptidyl-tRNAArg4 from the ribosomes. The growth phenotypes of E. coli strains harboring mutant alleles of RRF and initiation factor 3 (IF3) genes and their consequences on λimmP22 phage replication upon AGA-minigene expression reveal that IF3 facilitates the RRF-mediated processing of the stalled ribosomes. Additionally, we have designed a uracil DNA glycosylase gene construct, ung-stopless, whose expression is toxic to E. coli. We show that the RRF-mediated process also alleviates the ung-stopless construct-mediated toxicity to the host by releasing the ung mRNA from the ribosomes harboring long-chain peptidyl-tRNAs.

20.
EXProt is a non-redundant protein database containing a selection of entries from genome annotation projects and public databases, aimed at including only proteins with an experimentally verified function. In EXProt release 2.0 we have collected entries from the Pseudomonas aeruginosa community annotation project (PseudoCAP), the Escherichia coli genome and proteome database (GenProtEC) and the translated coding sequences from the Prokaryotes division of EMBL nucleotide sequence database, which are described as having an experimentally verified function. Each entry in EXProt has a unique ID number and contains information about the species, amino acid sequence, functional annotation and, in most cases, links to references in MEDLINE/PubMed and to the entry in the original database. EXProt is indexed in SRS at CMBI (http://www.cmbi.kun.nl/srs/) and can be searched with BLAST and FASTA through the EXProt web page (http://www.cmbi.kun.nl/EXProt/).
