首页 | 本学科首页   官方微博 | 高级检索  
   检索      


JESAM: CORBA software components to create and publish EST alignments and clusters
Authors:Parsons J D  Rodriguez-Tomé P
Institution:EMBL-Outstation, The European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. jparsons@ebi.ac.uk
Abstract:MOTIVATION: Expressed Sequence Tags (ESTs) are cheap, easy and quick to obtain relative to full genomic sequencing and currently sample more eukaryotic genes than any other data source. They are particularly useful for developing Sequence Tag Sites (STSs for mapping), polymorphism discovery, disease gene hunting, mass spectrometer proteomics, and most ironically for finding genes and predicting gene structure after the great effort of genomic sequencing. However, ESTs have many problems and the public EST databases contain all the errors and high redundancy intrinsic to the submitted data so it is often found that derived database views, which reduce both errors and redundancy, are more effective starting points for research than the original raw submissions. Existing derived views such as EST cluster databases and consensus databases have never published supporting evidence or intermediary results leading to difficulties trusting, correcting, and customizing the final published database. These difficulties have lead many groups to wastefully repeat the complex intermediary work of others in order to offer slightly different final views. A better approach might be to discover the most expensive common calculations used by all the approaches and then publish all intermediary results. Given a globally accessible database with a suitable component interface, like the JESAM software described in this paper, the creation of customized EST-derived databases could be achieved with minimum effort. RESULTS: Databases of EST and full-length mRNA sequences for four model organisms have been self-compared by searching for overlaps consistent with contiguity. The sequence comparisons are performed in parallel using a PVM process farm and previous results are stored to allow incremental updates with minimal effort. The overlap databases have been published with CORBA interfaces to enable flexible global access as demonstrated by example Java applet browsers. Simple cDNA supercluster databases built as alignment database clients are themselves published via CORBA interfaces browsable with prototypical applets. A comparison with UniGene Mouse and Rat databases revealed undesirable features in both and the advantages of contrasting perspectives on complex data. AVAILABILITY: The software is packaged as two Jar files available from: URL: http://corba.ebi.ac.uk/EST/jesam/jesam. html. One jar contains all the Java source code, and the other contains all the C, C++ and IDL code. Links to working examples of the alignment and cluster viewers (if remote firewall permits) can be found at http://corba.ebi.ac.uk/EST. All the Washington University mouse EST traces are available for browsing at the same URL.
Keywords:
本文献已被 PubMed 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号