Similar literature
12.
Sridhar J, Rafi ZA. Bioinformation, 2008, 2(7): 284-295
One of the key challenges in computational genomics is annotating coding genes and identifying regulatory RNAs in complete genomes. This study uses the regulatory RNA locations and their conserved flanking genes, identified within the genomic backbone of a template genome, to search for similar RNA locations in query genomes. The search is based on the recently reported coexistence of small RNAs and their conserved flanking genes in related genomes. Based on our study, 54 additional sRNA locations and the functions of 96 uncharacterized genes are predicted in two draft genomes, viz. Serratia marcescens Db1 and Yersinia enterocolitica 8081. Although most of the identified additional small RNA regions and their corresponding flanking genes are homologous, the proposed anchoring technique also identified four non-homologous small RNA regions in the Y. enterocolitica genome. The KEGG Orthology (KO) based automated functional predictions confirm the predicted functions of 65 flanking genes having defined KO numbers, out of the total 96 predictions made by this method. This coexistence-based method shows more sensitivity than controlled vocabularies in locating orthologous gene pairs, even in the absence of defined Orthology numbers. All functional predictions made by this study in Y. enterocolitica 8081 were confirmed by the recently published complete genome sequence and annotations. This study also reports the possible regions of gene rearrangements in these two genomes; further characterization of such RNA regions could shed more light on their possible role in genome evolution.
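The anchoring idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Gene` record, the `predict_srna_region` function, the `orthologs` mapping, and the `max_gap` parameter are all hypothetical names introduced here, and the sketch assumes that orthologs of the template flanking genes are already known and that the query genome is a single linear contig.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record: name plus 1-based inclusive coordinates."""
    name: str
    start: int
    end: int


def predict_srna_region(
    left_flank: str,
    right_flank: str,
    orthologs: Dict[str, str],
    query_genes: List[Gene],
    max_gap: int = 1,
) -> Optional[Tuple[int, int]]:
    """Return the intergenic interval in the query genome that lies between
    the orthologs of the two template flanking genes, or None if the flanks
    are missing or are no longer neighbours (a possible rearrangement)."""
    q_left = orthologs.get(left_flank)
    q_right = orthologs.get(right_flank)
    if q_left is None or q_right is None:
        return None  # at least one flanking gene has no known ortholog

    # Order query genes by position and record each gene's rank.
    ordered = sorted(query_genes, key=lambda g: g.start)
    rank = {g.name: i for i, g in enumerate(ordered)}
    if q_left not in rank or q_right not in rank:
        return None

    i, j = sorted((rank[q_left], rank[q_right]))
    if j - i > max_gap:
        return None  # flanks are separated by other genes in the query

    upstream, downstream = ordered[i], ordered[j]
    if downstream.start <= upstream.end:
        return None  # overlapping or identical genes: no intergenic space
    return (upstream.end + 1, downstream.start - 1)


# Toy data, invented purely for illustration.
query = [Gene("qA", 100, 500), Gene("qB", 800, 1200), Gene("qC", 1500, 2000)]
ortho = {"tA": "qA", "tB": "qB", "tC": "qC"}
candidate = predict_srna_region("tA", "tB", ortho, query)
```

Returning `None` when the flanks are not adjacent is a deliberate simplification; such cases might instead be reported as candidate rearrangement regions of the kind the abstract mentions.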

15.
Proteins encoded by newly-emerged genes (‘orphan genes’) share no sequence similarity with proteins in any other species. They provide organisms with a reservoir of genetic elements to quickly respond to changing selection pressures. Here, we systematically assess the ability of five gene prediction pipelines to accurately predict genes in genomes according to phylostratal origin. BRAKER and MAKER are existing, popular ab initio tools that infer gene structures by machine learning. Direct Inference is an evidence-based pipeline we developed to predict gene structures from alignments of RNA-Seq data. The BIND pipeline integrates ab initio predictions of BRAKER and Direct Inference; MIND combines Direct Inference and MAKER predictions. We use highly-curated Arabidopsis and yeast annotations as gold-standard benchmarks, and cross-validate in rice. Each pipeline under-predicts orphan genes (as few as 11 percent, under one prediction scenario). Increasing RNA-Seq diversity greatly improves prediction efficacy. The combined methods (BIND and MIND) yield the best predictions overall, with BIND identifying 68% of annotated orphan genes and 99% of ancient genes, and giving the highest sensitivity score regardless of dataset in Arabidopsis. We provide a lightweight, flexible, reproducible, and well-documented solution to improve gene prediction.
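The per-phylostratum comparison described in this abstract amounts to computing sensitivity separately for each age class of genes. The sketch below shows that calculation under a strong simplifying assumption: a gene counts as recovered if its identifier appears in the predicted set, whereas a real benchmark would compare predicted and annotated gene structures. The function name `sensitivity_by_stratum` and the toy data are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from typing import Dict, Set


def sensitivity_by_stratum(
    annotated: Dict[str, str], predicted: Set[str]
) -> Dict[str, float]:
    """Fraction of annotated genes recovered by a pipeline, per phylostratum.

    annotated maps gene ID -> phylostratum label (e.g. "orphan", "ancient");
    predicted is the set of gene IDs the pipeline is taken to have recovered.
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for gene_id, stratum in annotated.items():
        totals[stratum] += 1
        if gene_id in predicted:
            hits[stratum] += 1
    return {stratum: hits[stratum] / totals[stratum] for stratum in totals}


# Toy data, invented purely for illustration.
gold = {"g1": "orphan", "g2": "orphan", "g3": "ancient", "g4": "ancient"}
scores = sensitivity_by_stratum(gold, {"g1", "g3", "g4"})
```

Reporting sensitivity per stratum rather than as one pooled number keeps a small class such as orphan genes from being masked by the much larger set of ancient genes.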

18.
We previously reported close physical linkage between Pax9 and Nkx2-9 in the human, mouse, and pufferfish (Fugu rubripes) genomes. In this study, we analyzed cis-regulatory elements of the two genes by comparative sequencing in the three species and by transgenesis in the mouse. We identified two regions including conserved noncoding sequences that possessed specific enhancer activities for expression of Pax9 in the medial nasal process and of Nkx2-9 in the ventral neural tube. Remarkably, the latter contained the consensus Gli-binding motif. Interestingly, the identified Pax9 cis-regulatory sequences were located in an intron of the neighboring gene Slc25a21. Close examination of an extended genomic interval around Pax9 revealed the presence of strong synteny conservation in the human, mouse, and Fugu genomes. We propose such an intersecting organization of cis-regulatory sequences in multigenic regions as a possible mechanism that maintains evolutionarily conserved synteny.

20.
The first sequenced plant genome, from the small mustard plant Arabidopsis thaliana, was published at the end of 2000. The sequencing of the rice genome is well under way. The sizes of plant genomes vary by a factor of up to 1000, and many important crop plants have genomes that are several times larger than the human genome. To gain insight into the gene toolbox of plant species, numerous large-scale EST sequencing projects have been launched successfully, and analysis procedures are constantly being refined to add maximum value to the sequence data. In addition, an alternative approach to exclude repetitive noncoding DNA and to enrich sequence libraries for gene-containing genomic regions has been developed. This strategy has the potential to deliver information about both genes and regulatory regions outside the transcribed regions.
