The performances of the chi-square test and complexity measures for signal recognition in biological sequences
Authors: Pirhaji Leila, Kargar Mehdi, Sheari Armita, Poormohammadi Hadi, Sadeghi Mehdi, Pezeshk Hamid, Eslahchi Changiz
Institution:a Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran
b Computer Engineering Department, Sharif University of Technology, Tehran, Iran
c Bioinformatics Group, School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran
d National Institute of Genetic Engineering and Biotechnology, Tehran-Karaj Highway, Tehran, Iran
e Faculty of Mathematics, Shahid-Beheshti University, Tehran, Iran
f Center of Excellence in Biomathematics, School of Mathematics, Statistics and Computer Science, College of Science, University of Tehran, Tehran, Iran
Abstract: With large amounts of experimental data available, modern molecular biology needs appropriate methods for analyzing biological sequences. In this work, we apply a statistical method (Pearson's chi-square test) to recognize the signals that appear in the whole genome of Escherichia coli. To show the effectiveness of the method, we compare Pearson's chi-square test with linguistic complexity on the complete genome of E. coli. The results suggest that Pearson's chi-square test is an efficient method for distinguishing genes (coding regions) from pseudogenes (noncoding regions). The performance of linguistic complexity, by contrast, is much lower than that of the chi-square test. We also use Pearson's chi-square test to determine which parts of the Open Reading Frame (ORF) have a significant effect on discriminating genes from pseudogenes. Moreover, different complexity measures and Pearson's chi-square test are applied to the genes with a high value of Pearson's chi-square statistic. We also compute the measures on homologs of these genes. The results illustrate that there is a region near the start codon with a high chi-square statistic and low complexity that is conserved between homologous genes.
Keywords: Low complexity zone; Linguistic complexity; Open Reading Frame
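
The two measures compared in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which expected frequencies or word lengths were used, so the uniform nucleotide expectation, the product-of-vocabulary-usage form of linguistic complexity, and the function names `chi_square_statistic` and `linguistic_complexity` are all assumptions made for illustration.

```python
from collections import Counter


def chi_square_statistic(seq, alphabet="ACGT"):
    """Pearson's chi-square statistic of symbol counts in a non-empty sequence.

    Assumption: the expected count of every symbol is uniform, len(seq) / 4.
    """
    counts = Counter(seq)
    expected = len(seq) / len(alphabet)
    return sum((counts.get(s, 0) - expected) ** 2 / expected for s in alphabet)


def linguistic_complexity(seq, alphabet_size=4):
    """One common formulation of linguistic complexity (an assumption here).

    For each word length k, take the ratio of distinct k-mers observed to the
    maximum number possible in a sequence of this length, then multiply the
    ratios. Values lie in (0, 1]; lower values mean more repetitive sequences.
    Quadratic in sequence length, so intended for short windows only.
    """
    n = len(seq)
    lc = 1.0
    for k in range(1, n + 1):
        observed = len({seq[i:i + k] for i in range(n - k + 1)})
        possible = min(alphabet_size ** k, n - k + 1)
        lc *= observed / possible
    return lc
```

Under these assumptions, a window dominated by one nucleotide gives a high chi-square statistic and a low complexity value, which is the kind of pattern the abstract reports near the start codon.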
This article is indexed in ScienceDirect, PubMed, and other databases.