Predicting coding potential from genome sequence: application to betaherpesviruses infecting rats and mice |
| |
Authors: | Brocchieri, Luciano; Kledal, Thomas N.; Karlin, Samuel; Mocarski, Edward S. |
| |
Affiliation: | Department of Mathematics, Stanford University, Stanford, CA 94305-2125, USA. luciano@stanford.edu |
| |
Abstract: | Prediction of protein-coding regions and other features of primary DNA sequence has greatly contributed to experimental biology. Significant challenges remain in genome annotation methods, including the identification of small or overlapping genes and the assessment of mRNA splicing or unconventional translation signals in expression. We have employed a combined analysis of compositional biases and conservation, together with frame-specific G+C representation, to reevaluate and annotate the genome sequences of mouse and rat cytomegaloviruses. Our analysis predicts that there are at least 34 protein-coding regions in these genomes that were not apparent in earlier annotation efforts. These include 17 single-exon genes, three new exons of previously identified genes, a newly identified four-exon gene for a lectin-like protein (in rat cytomegalovirus), and 10 probable frameshift extensions of previously annotated genes. This expanded set of candidate genes provides an additional basis for investigation in cytomegalovirus biology and pathogenesis. |
| |
Keywords: | |
This article is indexed in PubMed and other databases. |
|
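
The abstract mentions frame-specific G+C representation without giving details. As a rough illustration only, the sketch below computes the G+C fraction at each of the three codon positions for a chosen reading frame. The function name `gc_by_codon_position` and its parameters are hypothetical and are not taken from the paper; the authors' actual method may differ in windowing, statistics, and thresholds.

```python
def gc_by_codon_position(seq, frame=0):
    """Return the G+C fraction at codon positions 1, 2 and 3.

    Hypothetical helper for illustration; not the authors' implementation.
    `frame` is the 0-based offset (0, 1 or 2) at which the first codon starts.
    Only complete codons are counted, and ambiguous bases (e.g. N) are skipped.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")

    seq = seq.upper()
    gc_counts = [0, 0, 0]   # G or C seen at codon position 1, 2, 3
    totals = [0, 0, 0]      # unambiguous bases seen at each codon position

    # Stop at the last complete codon in this frame.
    end = frame + ((len(seq) - frame) // 3) * 3
    for i in range(frame, end):
        pos = (i - frame) % 3
        base = seq[i]
        if base in "ACGT":
            totals[pos] += 1
            if base in "GC":
                gc_counts[pos] += 1

    return tuple(g / t if t else 0.0 for g, t in zip(gc_counts, totals))


if __name__ == "__main__":
    example = "GCAGCAGCA"  # toy sequence, not real viral data
    for f in range(3):
        print(f, gc_by_codon_position(example, frame=f))
```

Comparing the three per-position fractions across the candidate frames of a region is one simple way to see whether a frame shows the position-dependent G+C pattern that coding sequence often carries; any cutoff for calling a region coding would have to come from the paper itself.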