PromFD 1.0: a computer program that predicts eukaryotic pol II promoters using strings and IMD matrices
Authors: Chen, Qing K.; Hertz, Gerald Z.; Stormo, Gary D.
Affiliation: Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO 80309–0347, USA
Abstract:
Motivation: A large number of new DNA sequences with virtually unknown functions are generated as the Human Genome Project progresses. Therefore, it is essential to develop computer algorithms that can predict the functionality of DNA segments according to their primary sequences, including algorithms that can predict promoters. Although several promoter-predicting algorithms are available, they have high false-positive rates, and their rate of promoter detection needs to be improved further.
Results: In this research, PromFD, a computer program to recognize vertebrate RNA polymerase II promoters, has been developed. Both vertebrate promoters and non-promoter sequences are used in the analysis. The promoters are obtained from the Eukaryotic Promoter Database and are divided into a training set and a test set. Non-promoter sequences are obtained from the GenBank sequence databank and are also divided into a training set and a test set. The first step is to search, among all possible permutations, for patterns of strings 5–10 bp long that are significantly over-represented in the promoter set. The program also searches for IMD (Information Matrix Database) matrices that have a significantly higher presence in the promoter set. The results of the searches are stored in the PromFD database, and the program PromFD scores input DNA sequences according to their content of the database entries. PromFD predicts promoters, reporting their locations and the locations of potential TATA boxes, if found. The program can detect 71% of promoters in the training set with a false-positive rate of under 1 in every 13 000 bp, and 47% of promoters in the test set with a false-positive rate of under 1 in every 9800 bp. PromFD uses a new approach, and its false-positive rate is better than that of other available promoter recognition algorithms. The source code for PromFD is written in C++.
Availability: PromFD is available for Unix platforms by anonymous ftp to beagle.colorado.edu: cd pub, get promFD.tar. A Java version of the program is also available for Netscape 2.0 at http://beagle.colorado.edu/~chenq.
Contact: E-mail: chenq{at}beagle.colorado.edu
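The string-search and scoring steps described under Results can be illustrated with a minimal Python sketch. It is not the PromFD implementation (which is in C++) and does not reproduce its significance test or its use of IMD matrices. The function names, the pseudocounted ratio used as the over-representation measure, the `min_ratio` and `min_count` thresholds, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter

def kmer_presence(seqs, k_min=5, k_max=10):
    """For each string of length k_min..k_max, count how many sequences contain it."""
    counts = Counter()
    for seq in seqs:
        seen = set()
        for k in range(k_min, k_max + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        counts.update(seen)  # each sequence contributes at most 1 per string
    return counts

def overrepresented(promoters, background, min_ratio=3.0, min_count=2):
    """Select strings found in a larger fraction of promoters than of background.

    Assumption: a pseudocounted ratio of fractions stands in for the
    significance test, which the abstract does not specify.
    """
    pos = kmer_presence(promoters)
    neg = kmer_presence(background)
    selected = {}
    for kmer, n_pos in pos.items():
        if n_pos < min_count:
            continue
        # (n_pos + 1)/(P + 2) divided by (n_neg + 1)/(B + 2)
        ratio = ((n_pos + 1) * (len(background) + 2)) / (
            (neg.get(kmer, 0) + 1) * (len(promoters) + 2))
        if ratio >= min_ratio:
            selected[kmer] = ratio
    return selected

def score(seq, selected):
    """Hypothetical score: sum the weights of the selected strings present in seq."""
    return sum(w for kmer, w in selected.items() if kmer in seq)

# Toy data, invented for this example only.
promoters = ["GGTATAAAGG", "CCTATAAACC", "AATATAAATT"]
background = ["GGGCCCGGGC", "ACGTACGTAC", "CCGGCCGGCC"]
selected = overrepresented(promoters, background)
```

With these toy inputs, only strings shared by the promoter sequences are selected, so a sequence containing the shared motif scores higher than one that lacks it. A real run would use the training sets drawn from the Eukaryotic Promoter Database and GenBank, and a proper statistical test in place of the ratio threshold.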
This article is indexed by Oxford and other databases.