Similar articles
Found 20 similar articles; search time 31 ms
1.
2.
uORFs, reinitiation and alternative translation start sites in human mRNAs   Total citations: 1 (self-citations: 0, citations by others: 1)
It is known that eukaryotic ribosomes are able to translate small ORFs and reinitiate translation at downstream start codons. However, this mechanism is widely considered to be inefficient and is not commonly taken into account. We compiled a sample of human mRNAs containing small upstream ORFs that overlap annotated protein-coding sequences. Statistical analysis supported the hypothesis of translation reinitiation at downstream AUG codons and the functional significance of potential alternative ORFs. It may be assumed that some upstream ORFs located in the 5'UTR can deliver ribosomes to alternative translation starts, and they should be taken into consideration when predicting the coding potential of human mRNAs.
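The kind of upstream ORF described above (starting in the 5'UTR and ending past the annotated start codon, in a different frame) can be located with a simple scan. The following is a minimal sketch, not the authors' procedure; the function name `find_overlapping_uorfs` and the 0-based coordinate convention are assumptions made for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_overlapping_uorfs(mrna, cds_start):
    """Return (start, end) pairs for ORFs that begin at an ATG in the 5'UTR,
    are out of frame with the annotated CDS, and end beyond cds_start.
    Coordinates are 0-based; end is exclusive and includes the stop codon."""
    mrna = mrna.upper().replace("U", "T")
    hits = []
    for start in range(cds_start):
        if mrna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for pos in range(start + 3, len(mrna) - 2, 3):
            if mrna[pos:pos + 3] in STOPS:
                end = pos + 3
                out_of_frame = (cds_start - start) % 3 != 0
                if end > cds_start and out_of_frame:
                    hits.append((start, end))
                break
    return hits

# Toy transcript: upstream ATG at index 1, annotated start codon at index 6.
print(find_overlapping_uorfs("CATGGCATGAAATAA", 6))  # [(1, 10)]
```

A real analysis would also need transcript annotations and some measure of start-codon context, which this sketch ignores.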

3.
A frequently used approach for detecting potential coding regions is to search for stop codons. In the standard genetic code, 3 out of 64 trinucleotides are stop codons. Hence, in random or non-coding DNA one can expect roughly every 21st trinucleotide to have the same sequence as a stop codon. In contrast, the open reading frames (ORFs) of most protein-coding genes are considerably longer. Thus, the stop codon frequency in coding sequences deviates from the background frequency of the corresponding trinucleotides. This has been utilized for gene prediction, in particular for detecting protein-coding ORFs. Traditional methods based on stop codon frequency assume that the GC content is about 50%. However, many genomes show significant deviations from that value. With the presented method we can describe the effects of GC content on the selection of appropriate length thresholds for potentially coding ORFs. Conversely, for a given length threshold, we can calculate the probability of observing an ORF of that length in a random sequence. Thus, we can derive the maximum GC content for which ORF length is practicable as a feature for gene prediction methods, along with the resulting false positive rates. A rough estimate for an upper limit is a GC content of 80%. This estimate can be made more precise by including further parameters and by taking start codons into account as well. We demonstrate the feasibility of this method by applying it to the genomes of the bacteria Rickettsia prowazekii, Escherichia coli and Caulobacter crescentus, exemplifying the effect of GC content variations according to our predictions. We have adapted the method for predicting coding ORFs by stop codon frequency to the case of GC contents different from 50%. Usually, several methods for gene finding need to be combined. Thus, our results concern a specific part within a package of methods. Interestingly, for genomes with low GC content such as that of R. prowazekii, the presented method provides remarkably good results even when applied alone.
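The dependence of stop codon frequency on GC content can be illustrated with a simple background model. This is a sketch under a stated assumption, not the published method: bases are treated as independent, with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2. The function names and the default false positive rate `alpha` are hypothetical.

```python
import math

def stop_codon_probability(gc):
    """Probability that a random trinucleotide is TAA, TAG or TGA under the
    independent-base model described above."""
    at = (1.0 - gc) / 2.0   # P(A) = P(T)
    g = gc / 2.0            # P(G) = P(C)
    return at * at * at + 2.0 * at * at * g   # TAA + (TAG and TGA)

def prob_stop_free(n_codons, gc):
    """Chance that n_codons consecutive random codons contain no stop."""
    return (1.0 - stop_codon_probability(gc)) ** n_codons

def length_threshold(gc, alpha=0.01):
    """Smallest codon count whose stop-free probability is at most alpha.
    Requires gc < 1, otherwise stop codons never occur in the model."""
    p = stop_codon_probability(gc)
    return math.ceil(math.log(alpha) / math.log(1.0 - p))

print(stop_codon_probability(0.5))   # 0.046875, i.e. 3/64
for gc in (0.3, 0.5, 0.8):
    print(gc, length_threshold(gc))
```

At 50% GC the model reproduces the 3/64 background rate quoted in the abstract, and the required length threshold grows as GC content rises because stop codons are AT-rich.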

4.
5.
Emerging evidence places small proteins (≤50 amino acids) more centrally in physiological processes. Yet their functional identification and the systematic genome annotation of their cognate small open reading frames (smORFs) remain challenging both experimentally and computationally. Ribosome profiling, or Ribo-Seq (deep sequencing of ribosome-protected fragments), enables detection of actively translated open reading frames (ORFs) and empirical annotation of coding sequences (CDSs) using the in-register translation pattern that is characteristic of genuinely translating ribosomes. Multiple ORF identifiers that use the 3-nt periodicity in Ribo-Seq data sets have been successful in eukaryotic smORF annotation. However, they have difficulty evaluating prokaryotic genomes because of their unique architecture (e.g. polycistronic messages, overlapping ORFs, leaderless translation, non-canonical initiation, etc.). Here, we present a new algorithm, smORFer, which detects putative smORFs in prokaryotic organisms with high accuracy. The unique feature of smORFer is that it uses an integrated approach: it considers structural features of the genetic sequence along with in-frame translation, and uses the Fourier transform to convert these parameters into a measurable score to faithfully select smORFs. The algorithm is executed in a modular way, and depending on the data available for a particular organism, different modules can be selected for the smORF search.
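To show how a Fourier transform can turn 3-nt periodicity into a single number, here is a minimal sketch. It is not smORFer's implementation or scoring scheme; the function name `period3_power_fraction` and the normalization (fraction of non-constant signal power at period 3) are assumptions chosen for illustration. The input is a hypothetical per-nucleotide count of read positions along one ORF.

```python
import cmath

def period3_power_fraction(coverage):
    """Fraction of the mean-centered signal's power that sits at a period of
    3 nt. Returns a value between 0 (no periodicity) and 1 (pure period 3)."""
    n = len(coverage) - len(coverage) % 3   # trim to a multiple of 3
    if n == 0:
        return 0.0
    x = coverage[:n]
    mean = sum(x) / n
    centered = [v - mean for v in x]
    energy = sum(v * v for v in centered)
    if energy == 0:
        return 0.0   # flat profile: nothing to measure
    # Discrete Fourier coefficient at frequency 1/3 cycles per nucleotide.
    coeff = sum(v * cmath.exp(-2j * cmath.pi * i / 3)
                for i, v in enumerate(centered))
    # Factor 2 accounts for the conjugate bin at frequency 2/3 (real signal).
    return 2 * abs(coeff) ** 2 / (n * energy)

in_frame = [1, 0, 0] * 10    # reads pile up on one codon position
two_periodic = [1, 0] * 15   # periodic, but not at 3 nt
print(round(period3_power_fraction(in_frame), 6))      # 1.0
print(round(period3_power_fraction(two_periodic), 6))  # 0.0
```

A score like this only captures the periodicity signal; the abstract states that the tool combines it with structural sequence features, which are not modeled here.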

6.
The complete sequence of the genome of an aerobic hyper-thermophilic crenarchaeon, Aeropyrum pernix K1, which optimally grows at 95°C, has been determined by the whole genome shotgun method with some modifications. The entire length of the genome was 1,669,695 bp. The authenticity of the entire sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA. As the potential protein-coding regions, a total of 2,694 open reading frames (ORFs) were assigned. By similarity search against public databases, 633 (23.5%) of the ORFs were related to genes with putative function and 523 (19.4%) to the sequences registered but with unknown function. All the genes in the TCA cycle except for that of alpha-ketoglutarate dehydrogenase were included, and instead of the alpha-ketoglutarate dehydrogenase gene, the genes coding for the two subunits of 2-oxoacid:ferredoxin oxidoreductase were identified. The remaining 1,538 ORFs (57.1%) did not show any significant similarity to the sequences in the databases. Sequence comparison among the assigned ORFs suggested that a considerable number of ORFs were generated by sequence duplication. The RNA genes identified were a single 16S–23S rRNA operon, two 5S rRNA genes and 47 tRNA genes including 14 genes with intron structures. All the assigned ORFs and RNA coding regions occupied 89.12% of the whole genome. The data presented in this paper are available on the internet homepage (http://www.mild.nite.go.jp).
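The ORF counts and percentages quoted above are internally consistent, which a short check makes explicit. This is only an arithmetic illustration using the numbers in the abstract; the dictionary keys are informal labels, not terms from the paper.

```python
total_orfs = 2694
categories = {
    "putative function": 633,
    "registered, unknown function": 523,
    "no significant similarity": 1538,
}

# The three categories should partition the full ORF set.
assert sum(categories.values()) == total_orfs

for label, count in categories.items():
    print(f"{label}: {count} ({100 * count / total_orfs:.1f}%)")
```

The printed percentages are 23.5%, 19.4% and 57.1%, matching the values in the text.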

7.
A fully mature mRNA is usually associated with a reference open reading frame encoding a single protein. Yet mature mRNAs contain unconventional alternative open reading frames (AltORFs) located in untranslated regions (UTRs) or overlapping the reference ORFs (RefORFs) in non-canonical +2 and +3 reading frames. Although recent ribosome profiling and footprinting approaches have suggested the significant use of unconventional translation initiation sites in mammals, direct evidence of large-scale alternative protein expression at the proteome level is still lacking. To determine the contribution of alternative proteins to the human proteome, we generated a database of predicted human AltORFs, revealing a new proteome mainly composed of small proteins with a median length of 57 amino acids, compared to 344 amino acids for the reference proteome. We experimentally detected a total of 1,259 alternative proteins by mass spectrometry analyses of human cell lines, tissues and fluids. In plasma and serum, alternative proteins represent up to 55% of the proteome and may be an unsuspected new source of biomarkers. We observed constitutive co-expression of RefORFs and AltORFs from endogenous genes and from transfected cDNAs, including tumor suppressor p53, and provide evidence that out-of-frame clones representing AltORFs are mistakenly rejected as false positives in cDNA screening assays. The functional importance of alternative proteins is strongly supported by significant evolutionary conservation in vertebrates, invertebrates, and yeast. Our results imply that coding of multiple proteins in a single gene through the use of AltORFs may be a common feature in eukaryotes, and confirm that translation of unconventional ORFs generates an as yet unexplored proteome.

8.
9.
10.
11.
12.
Species identification using DNA sequences is the basis for DNA taxonomy. In this study, we sequenced the ribosomal large-subunit RNA genes (3,037-3,061 bp in length) of 13 Chinese Theileria stocks that were infective to cattle and sheep. The complete 28S rRNA gene is relatively difficult to amplify and its conserved region is not important for phylogenetic study. Therefore, we selected the D2-D3 region from the complete 28S rRNA sequences for phylogenetic analysis. Our analyses of 28S rRNA gene sequences showed that the 28S rRNA gene was useful as a phylogenetic marker for analyzing the relationships among Theileria spp. in ruminants. In addition, the D2-D3 region was a short segment that could be used instead of the whole 28S rRNA sequence in the phylogenetic analysis of Theileria, and it may be an ideal DNA barcode.

13.
J S Lopes, M Arenas, D Posada, M A Beaumont. Heredity, 2014, 112(3): 255-264
The estimation of parameters in molecular evolution may be biased when some processes are not considered. For example, the estimation of selection at the molecular level using codon-substitution models can have an upward bias when recombination is ignored. Here we address the joint estimation of recombination, molecular adaptation and substitution rates from coding sequences using approximate Bayesian computation (ABC). We describe the implementation of a regression-based strategy for choosing subsets of summary statistics for coding data, and show that this approach can accurately infer recombination (allowing for intracodon recombination breakpoints), molecular adaptation and codon substitution rates. We demonstrate that our ABC approach can outperform other analytical methods under a variety of evolutionary scenarios. We also show that although the choice of the codon-substitution model is important, our inferences are robust to a moderate degree of model misspecification. In addition, we demonstrate that our approach can accurately choose the evolutionary model that best fits the data, providing an alternative when the use of full-likelihood methods is impracticable. Finally, we applied our ABC method to co-estimate recombination, substitution and molecular adaptation rates from 24 published human immunodeficiency virus 1 coding data sets.

14.
In eukaryotes, it is generally assumed that translation initiation occurs at the AUG codon closest to the messenger RNA 5' cap. However, in certain cases, initiation can occur at codons differing from AUG by a single nucleotide, especially the codons CUG, UUG, GUG, ACG, AUA and AUU. While non-AUG initiation has been experimentally verified for a handful of human genes, the full extent to which this phenomenon is utilized, both for increased coding capacity and potentially also for novel regulatory mechanisms, remains unclear. To address this issue, and hence to improve the quality of existing coding sequence annotations, we developed a methodology based on phylogenetic analysis of predicted 5' untranslated regions from orthologous genes. We use evolutionary signatures of protein-coding sequences as an indicator of translation initiation upstream of annotated coding sequences. Our search identified novel conserved potential non-AUG-initiated N-terminal extensions in 42 human genes including VANGL2, FGFR1, KCNN4, TRPV6, HDGF, CITED2, EIF4G3 and NTF3, and also affirmed the conservation of known non-AUG-initiated extensions in 17 other genes. In several instances, we have been able to obtain independent experimental evidence of the expression of non-AUG-initiated products from the previously published literature and ribosome profiling data.
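The set of codons "differing from AUG by a single nucleotide" is small enough to enumerate directly. The sketch below is illustrative only; the function name `near_cognate_codons` is hypothetical and the code makes no claim about which of these codons actually initiate translation.

```python
def near_cognate_codons(start="AUG", alphabet="ACGU"):
    """All codons that differ from `start` at exactly one position."""
    variants = []
    for i, base in enumerate(start):
        for alt in alphabet:
            if alt != base:
                variants.append(start[:i] + alt + start[i + 1:])
    return sorted(variants)

codons = near_cognate_codons()
print(codons)
# ['AAG', 'ACG', 'AGG', 'AUA', 'AUC', 'AUU', 'CUG', 'GUG', 'UUG']

# The six codons singled out in the abstract are a subset of the nine.
highlighted = {"CUG", "UUG", "GUG", "ACG", "AUA", "AUU"}
print(sorted(set(codons) - highlighted))  # ['AAG', 'AGG', 'AUC']
```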

15.
Understanding how complex networks of genes integrate to produce dividing cells is an important goal that is limited by the difficulty of defining the function of individual genes. Current resources for the systematic identification of gene function, such as siRNA libraries and collections of deletion strains, are costly and organism-specific. We describe here integration profiling, a novel approach to identify the function of eukaryotic genes based upon dense maps of transposon integration. As a proof of concept, we used the transposon Hermes to generate a library of 360,513 insertions in the genome of Schizosaccharomyces pombe. On average, we obtained one insertion for every 29 bp of the genome. Hermes integrated more often into nucleosome-free sites, and 33% of the insertions occurred in ORFs. We found that ORFs with low integration densities successfully identified the genes that are essential for cell division. Importantly, the nonessential ORFs with intermediate levels of insertion correlated with the nonessential genes that have functions required for colonies to reach full size. This finding indicates that integration profiles can measure the contribution of nonessential genes to cell division. While integration profiling succeeded in identifying genes necessary for propagation, it also has the potential to identify genes important for many other functions such as DNA repair, stress response, and meiosis.

16.
Origin and properties of non-coding ORFs in the yeast genome.   Total citations: 4 (self-citations: 0, citations by others: 4)
In a recent paper we estimated the total number of protein-coding open reading frames (ORFs) in the Saccharomyces cerevisiae genome, based on their properties, at about 4800. This number is much smaller than the widely accepted 5800-6000. In this paper we analyse differences between the set of ORFs with known phenotypes annotated in the Munich Information Centre for Protein Sequences (MIPS) database and ORFs for which the probability of coding, as calculated by us, is very low. We have found that many of the latter ORFs have properties of antisense sequences of coding ORFs, which suggests that they could have been generated by duplication of coding sequences. Since coding sequences generate ORFs inside themselves, with especially high frequency in the antisense sequences, we have looked for homology between known proteins and hypothetical polypeptides generated by the ORFs under consideration in all six phases. For many ORFs we have found paralogues and orthologues in phases different from the phase that had been assumed to be coding in the MIPS database.
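Searching "all six phases" means scanning the three reading frames of the given strand and the three frames of its reverse complement. The following minimal sketch shows one way to list ATG-to-stop ORFs in each phase; it is not the authors' pipeline, and the function names, the `min_codons` parameter and the coordinate convention (0-based, relative to the scanned strand) are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_six_phases(seq, min_codons=1):
    """List (strand, frame, start, end) for every ATG...stop ORF.
    Coordinates refer to the scanned strand; end includes the stop codon."""
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and codon in STOPS:
                    # Count codons before the stop, start codon included.
                    if (pos - start) // 3 >= min_codons:
                        results.append((strand, frame, start, pos + 3))
                    start = None
    return results

print(orfs_in_six_phases("ATGAAATAA"))  # [('+', 0, 0, 9)]
print(orfs_in_six_phases("TTATTTCAT"))  # [('-', 0, 0, 9)]
```

The second call shows an ORF that is only visible on the antisense strand, the situation the abstract discusses for ORFs embedded in coding sequences.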

17.
18.
Modification of ribosomal RNA is ubiquitous among living organisms. Its functional role is well established for only a limited number of modified nucleotides. There are examples of rRNA modification being involved in the regulation of gene expression in the cell. Analysis of large data sets is needed in the search for potential functional partners of rRNA modification. In this study, we extracted phylogenetic profile, genome neighbourhood, co-expression, phenotype profile and co-purification data regarding Escherichia coli rRNA modification enzymes from public databases. Results were visualized as graphs using Cytoscape and analysed. The majority of linked genes/proteins belong to the translation apparatus. Among the co-purification partners of rRNA modification enzymes are several candidates for experimental validation. Phylogenetic profiling revealed links of pseudouridine synthetases with RF2, of RsmH with translation factors IF2, RF1 and LepA, and of RlmM with RdgC. Genome neighbourhood connections revealed several putative functionally linked genes, e.g. rlmH with genes coding for cell wall biosynthetic proteins. Comparative analysis of expression profiles (Gene Expression Omnibus) revealed two main associations: a group of genes expressed during fast growth, and an association of rrmJ with heat shock genes. This study might be used as a roadmap for further experimental verification of predicted functional interactions.

19.
20.