1.

Gene structure prediction by spliced alignment of genomic DNA with protein sequences: increased accuracy by differential splice site scoring

Usuka J Brendel V 《Journal of molecular biology》2000,297(5):1075-1085

Gene identification in genomic DNA from eukaryotes is complicated by the vast combinatorial possibilities of potential exon assemblies. If the gene encodes a protein that is closely related to known proteins, gene identification is aided by matching similarity of potential translation products to those target proteins. The genomic DNA and protein sequences can be aligned directly by scoring the implied residues of in-frame nucleotide triplets against the protein residues in conventional ways, while allowing for long gaps in the alignment corresponding to introns in the genomic DNA. We describe a novel method for such spliced alignment. The method derives an optimal alignment based on scoring for both sequence similarity of the predicted gene product to the protein sequence and intrinsic splice site strength of the predicted introns. Application of the method to a representative set of 50 known genes from Arabidopsis thaliana showed significant improvement in prediction accuracy compared to previous spliced alignment methods. The method is also more accurate than ab initio gene prediction methods, provided sufficiently close target proteins are available. In view of the fast growth of public sequence repositories, we argue that close targets will be available for the majority of novel genes, making spliced alignment an excellent practical tool for high-throughput automated genome annotation.

2.

Tiling Assembly: a new tool for reference annotation-independent transcript assembly and novel gene identification by RNA-sequencing

Kenneth A. Watanabe Arielle Homayouni Tara Tufano Jennifer Lopez Patricia Ringler Paul Rushton Qingxi J. Shen 《DNA research》2015,22(5):319-329

3.

Genome assembly has a major impact on gene content: a comparison of annotation in two Bos taurus assemblies

Florea L Souvorov A Kalbfleisch TS Salzberg SL 《PloS one》2011,6(6):e21400

Gene and SNP annotation are among the first and most important steps in analyzing a genome. As the number of sequenced genomes continues to grow, a key question is: how does the quality of the assembled sequence affect the annotations? We compared the gene and SNP annotations for two different Bos taurus genome assemblies built from the same data but with significant improvements in the later assembly. The same annotation software was used for annotating both sequences. While some annotation differences are expected even between high-quality assemblies such as these, we found that a staggering 40% of the genes (>9,500) varied significantly between assemblies, due in part to the availability of new gene evidence but primarily to genome mis-assembly events and local sequence variations. For instance, although the later assembly is generally superior, 660 protein coding genes in the earlier assembly are entirely missing from the later genome's annotation, and approximately 3,600 (15%) of the genes have complex structural differences between the two assemblies. In addition, 12–20% of the predicted proteins in both assemblies have relatively large sequence differences when compared to their RefSeq models, and 6–15% of bovine dbSNP records are unrecoverable in the two assemblies. Our findings highlight the consequences of genome assembly quality on gene and SNP annotation and argue for continued improvements in any draft genome sequence. We also found that tracking a gene between different assemblies of the same genome is surprisingly difficult, due to the numerous changes, both small and large, that occur in some genes. As a side benefit, our analyses helped us identify many specific loci for improvement in the Bos taurus genome assembly.

4.

Multi-omics network-based functional annotation of unknown Arabidopsis genes

Thomas Depuydt Klaas Vandepoele 《The Plant journal : for cell and molecular biology》2021,108(4):1193-1212

5.

Functional annotation of the Arabidopsis genome using controlled vocabularies (total citations: 1; self-citations: 0; citations by others: 1)

Berardini TZ Mundodi S Reiser L Huala E Garcia-Hernandez M Zhang P Mueller LA Yoon J Doyle A Lander G Moseyko N Yoo D Xu I Zoeckler B Montoya M Miller N Weems D Rhee SY 《Plant physiology》2004,135(2):745-755

6.

Improving the Annotation of Arabidopsis lyrata Using RNA-Seq Data

Vimal Rawat Ahmed Abdelsamad Björn Pietzenuk Danelle K. Seymour Daniel Koenig Detlef Weigel Ales Pecinka Korbinian Schneeberger 《PloS one》2015,10(9)

7.

Consistent over-estimation of gene number in complex plant genomes

Bennetzen JL Coleman C Liu R Ma J Ramakrishna W 《Current opinion in plant biology》2004,7(6):732-736

The first comprehensive comparison of gene content between higher plant species provided the unexpected conclusions that rice contained about twice as many genes as Arabidopsis, and that about half of the rice genes had no obvious homologs in any other organism. Our subsequent analyses indicate that most of these "extra, novel" rice genes are mis-annotated segments of transposable elements, especially retrotransposons. Aggressive annotation of a randomly selected subset of the rice genome suggests that the gene number is less than 40,000. The five fantasies of automated plant gene discovery are described and a protocol is provided to minimize (or at least predict) the inaccuracy of future plant genome annotations.

8.

Araport11: a complete reannotation of the Arabidopsis thaliana reference genome

Chia‐Yi Cheng Vivek Krishnakumar Agnes P. Chan Françoise Thibaud‐Nissen Seth Schobel Christopher D. Town 《The Plant journal : for cell and molecular biology》2017,89(4):789-804

9.

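Technical implementation of "StdTransDb", a Web service system for a standard transcript database of human and model organisms

Wang Xiaolei Zhao Dongsheng Li Zhifeng Hang Xingyi Luo Zhigang Zhang Chenggang 《生物信息学 (Chinese Journal of Bioinformatics)》2007,5(4):163-166

Using the RefSeq database and sequenced genomes as templates, a "standard transcript database" representing information at each level of transcription was derived by large-scale computation, and a Web service system for the standard transcript datasets of human and model organisms was built with Common Gateway Interface (CGI) technology. By submitting a RefSeq accession number or free-text annotation terms, users can retrieve all information for a sequence and run an online analysis of its gene structure. The system currently covers six species (human, Arabidopsis, rice, rat, mouse, and zebrafish) and holds more than 180,000 data records. It provides an important tool for in-depth study of the transcriptomes of human and other species, and a solid data foundation for further analysis of alternative splicing patterns in eukaryotic genes.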

10.

New gene models and alternative splicing in the maize pathogen Colletotrichum graminicola revealed by RNA-Seq analysis

Ivo Schliebner Rayko Becher Marcus Hempel Holger B Deising Ralf Horbach 《BMC genomics》2014,15(1)

11.

Discover hidden splicing variations by mapping personal transcriptomes to personal genomes

Shayna Stein Zhi-xiang Lu Emad Bahrami-Samani Juw Won Park Yi Xing 《Nucleic acids research》2015,43(22):10612-10622

12.

Large‐scale automated function prediction of protein sequences and an experimental case study validation on PTEN transcript variants

Ahmet Sureyya Rifaioglu Tunca Doğan Ömer Sinan Saraç Tulin Ersahin Rabie Saidi Mehmet Volkan Atalay Maria Jesus Martin Rengul Cetin‐Atalay 《Proteins》2018,86(2):135-151

13.

Has the yo-yo stopped? An assessment of human protein-coding gene number

Southan C 《Proteomics》2004,4(6):1712-1726

14.

Gene structure prediction from consensus spliced alignment of multiple ESTs matching the same genomic locus (total citations: 4; self-citations: 0; citations by others: 4)

Brendel V Xing L Zhu W 《Bioinformatics (Oxford, England)》2004,20(7):1157-1169

MOTIVATION: Accurate gene structure annotation is a challenging computational problem in genomics. The best results are achieved with spliced alignment of full-length cDNAs or multiple expressed sequence tags (ESTs) with sufficient overlap to cover the entire gene. For most species, cDNA and EST collections are far from comprehensive. We sought to overcome this bottleneck by exploring the possibility of using combined EST resources from fairly diverged species that still share a common gene space. Previous spliced alignment tools were found inadequate for this task because they rely on very high sequence similarity between the ESTs and the genomic DNA. RESULTS: We have developed a computer program, GeneSeqer, which is capable of aligning thousands of ESTs with a long genomic sequence in a reasonable amount of time. The algorithm is uniquely designed to tolerate a high percentage of mismatches and insertions or deletions in the EST relative to the genomic template. This feature allows use of non-cognate ESTs for gene structure prediction, including ESTs derived from duplicated genes and homologous genes from related species. The increased gene prediction sensitivity results in part from novel splice site prediction models that are also available as a stand-alone splice site prediction tool. We assessed GeneSeqer performance relative to a standard Arabidopsis thaliana gene set and demonstrate its utility for plant genome annotation. In particular, we propose that this method provides a timely tool for the annotation of the rice genome, using abundant ESTs from other cereals and plants. AVAILABILITY: The source code is available for download at http://bioinformatics.iastate.edu/bioinformatics2go/gs/download.html. Web servers for Arabidopsis and other plant species are accessible at http://www.plantgdb.org/cgi-bin/AtGeneSeqer.cgi and http://www.plantgdb.org/cgi-bin/GeneSeqer.cgi, respectively. For non-plant species, use http://bioinformatics.iastate.edu/cgi-bin/gs.cgi. 
The splice site prediction tool (SplicePredictor) is distributed with the GeneSeqer code. A SplicePredictor web server is available at http://bioinformatics.iastate.edu/cgi-bin/sp.cgi

15.

CodingQuarry: highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts

Alison C Testa James K Hane Simon R Ellwood Richard P Oliver 《BMC genomics》2015,16(1)

16.

ANEXdb: an integrated animal ANnotation and microarray EXpression database

Oliver Couture Keith Callenberg Neeraj Koul Sushain Pandit Remy Younes Zhi-Liang Hu Jack Dekkers James Reecy Vasant Honavar Christopher Tuggle 《Mammalian genome》2009,20(11-12):768-777

17.

The Institute for Genomic Research Osa1 rice genome annotation database (total citations: 22; self-citations: 0; citations by others: 22)

Yuan Q Ouyang S Wang A Zhu W Maiti R Lin H Hamilton J Haas B Sultana R Cheung F Wortman J Buell CR 《Plant physiology》2005,139(1):18-26

We have developed a rice (Oryza sativa) genome annotation database (Osa1) that provides structural and functional annotation for this emerging model species. Using the sequence of O. sativa subsp. japonica cv Nipponbare from the International Rice Genome Sequencing Project, pseudomolecules, or virtual contigs, of the 12 rice chromosomes were constructed. Our most recent release, version 3, represents our third build of the pseudomolecules and is composed of 98% finished sequence. Genes were identified using a series of computational methods developed for Arabidopsis (Arabidopsis thaliana) that were modified for use with the rice genome. In release 3 of our annotation, we identified 57,915 genes, of which 14,196 are related to transposable elements. Of these 43,719 non-transposable element-related genes, 18,545 (42.4%) were annotated with a putative function, 5,777 (13.2%) were annotated as encoding an expressed protein with no known function, and the remaining 19,397 (44.4%) were annotated as encoding a hypothetical protein. Multiple splice forms (5,873) were detected for 2,538 genes, resulting in a total of 61,250 gene models in the rice genome. We incorporated experimental evidence into 18,252 gene models to improve the quality of the structural annotation. A series of functional data types has been annotated for the rice genome that includes alignment with genetic markers, assignment of gene ontologies, identification of flanking sequence tags, alignment with homologs from related species, and syntenic mapping with other cereal species. All structural and functional annotation data are available through interactive search and display windows as well as through download of flat files. To integrate the data with other genome projects, the annotation data are available through a Distributed Annotation System and a Genome Browser. All data can be obtained through the project Web pages at http://rice.tigr.org.
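
The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are invented here, and the gene-model formula rests on the assumption (not stated in the abstract) that the 5,873 splice forms include the primary model of each of the 2,538 alternatively spliced genes, so that each such gene adds its extra forms to the total.

```python
# Counts quoted in the Osa1 release 3 abstract.
total_genes = 57_915
te_related = 14_196
non_te = total_genes - te_related  # genes not related to transposable elements

# Annotation categories of the non-TE genes, as quoted in the abstract.
categories = {
    "putative function": 18_545,
    "expressed protein, no known function": 5_777,
    "hypothetical protein": 19_397,
}


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


shares = {name: percent(count, non_te) for name, count in categories.items()}

# Assumption (our reading, not the authors' stated method): the splice forms
# count includes one primary model per alternatively spliced gene, so only
# the surplus forms are added to the gene count.
splice_forms = 5_873
genes_with_splice_forms = 2_538
gene_models = total_genes + (splice_forms - genes_with_splice_forms)

print(non_te, shares, gene_models)
```

Under that assumption the figures are mutually consistent: the three categories sum to the non-TE gene count, and the computed number of gene models matches the quoted total.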

18.

Refined ab initio gene predictions of Heterorhabditis bacteriophora using RNA-seq

Jonathan Vadnal Olivia G. Granger Ramesh Ratnappan Ioannis Eleftherianos Damien M. O'Halloran John M. Hawdon 《International journal for parasitology》2018,48(8):585-590

19.

Plant Gene and Alternatively Spliced Variant Annotator. A plant genome annotation pipeline for rice gene and alternatively spliced variant identification with cross-species expressed sequence tag conservation from seven plant species (total citations: 2; self-citations: 0; citations by others: 2)

Chen FC Wang SS Chaw SM Huang YT Chuang TJ 《Plant physiology》2007,143(3):1086-1095

20.

Genome Update of Botrytis cinerea Strains B05.10 and T4

Martijn Staats Jan A. L. van Kan 《Eukaryotic cell》2012,11(11):1413-1414

We report here an update of the Botrytis cinerea strains B05.10 and T4 genomes, as well as an automated preliminary gene structure annotation. High-coverage de novo assemblies and reference-based alignments led to a correction of wrong base calls, elimination of sequence gaps, and improved joining of contigs. The new assemblies have substantially lower numbers of scaffolds and a concomitant increase in the N₅₀. The list of protein-coding genes was generated using the evidence-driven gene predictor Augustus, with expressed sequence tag evidence and RNA-Seq data as input.
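
For readers unfamiliar with the N₅₀ statistic mentioned in this abstract, a minimal sketch of the usual definition follows: the length L such that scaffolds of length at least L together cover at least half of the assembly. The function name and the scaffold lengths are hypothetical and chosen only for illustration; they are not data from the Botrytis cinerea assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running total past half of the
        # assembly size defines the N50.
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical lengths: the same 160 units of sequence, before and after
# joining scaffolds into fewer, longer pieces.
fragmented = [50, 40, 30, 20, 10, 10]
joined = [90, 50, 20]

print(n50(fragmented), n50(joined))
```

With the total sequence held fixed, joining scaffolds lowers the scaffold count and raises the N₅₀, which is the pattern the abstract reports.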