A protein alignment scoring system sensitive at all evolutionary distances
Authors: Stephen F. Altschul
Institution: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Abstract: Protein sequence alignments generally are constructed with the aid of a "substitution matrix" that specifies a score for aligning each pair of amino acids. Assuming a simple random protein model, it can be shown that any such matrix, when used for evaluating variable-length local alignments, is implicitly a "log-odds" matrix, with a specific probability distribution for amino acid pairs to which it is uniquely tailored. Given a model of protein evolution from which such distributions may be derived, a substitution matrix adapted to detecting relationships at any chosen evolutionary distance can be constructed. Because in a database search it generally is not known a priori what evolutionary distances will characterize the similarities found, it is necessary to employ an appropriate range of matrices in order not to overlook potential homologies. This paper formalizes this concept by defining a scoring system that is sensitive at all detectable evolutionary distances. The statistical behavior of this scoring system is analyzed, and it is shown that for a typical protein database search, estimating the originally unknown evolutionary distance appropriate to each alignment costs slightly over two bits of information, or somewhat less than a factor of five in statistical significance. A much greater cost may be incurred, however, if only a single substitution matrix, corresponding to the wrong evolutionary distance, is employed.
Keywords: Homology; Sequence comparison; Statistical significance; Alignment algorithms; Pattern recognition
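
The abstract's statement that any substitution matrix is implicitly a log-odds matrix can be illustrated with a small sketch. It relies on the standard result for local alignment scoring that, given background frequencies p_i, there is a unique positive scale lambda with sum over i,j of p_i * p_j * exp(lambda * s_ij) = 1, and that the implied pair distribution is q_ij = p_i * p_j * exp(lambda * s_ij). The two-letter alphabet, the scores, and the function names below are illustrative assumptions and do not come from the paper. The last helper only converts a cost in bits into a factor in statistical significance (2 bits is a factor of 4, and a factor of 5 is about 2.32 bits), which is consistent with the figures quoted in the abstract.

```python
import math


def solve_lambda(scores, bg):
    """Find the positive lambda with sum_ij p_i p_j exp(lambda * s_ij) = 1.

    Assumes the expected score is negative and at least one score is
    positive; otherwise no positive solution exists.
    """
    letters = list(bg)

    def excess(lam):
        total = sum(bg[a] * bg[b] * math.exp(lam * scores[a][b])
                    for a in letters for b in letters)
        return total - 1.0

    expected = sum(bg[a] * bg[b] * scores[a][b]
                   for a in letters for b in letters)
    has_positive = any(scores[a][b] > 0 for a in letters for b in letters)
    if expected >= 0 or not has_positive:
        raise ValueError("need negative expected score and a positive score")

    # excess() is convex, zero at lambda = 0, negative just above 0, and
    # eventually positive, so bracket the root and bisect.
    lo, hi = 1e-6, 1.0
    while excess(hi) <= 0:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


def target_frequencies(scores, bg):
    """Return (lambda, q) where q[(a, b)] is the implied pair distribution."""
    lam = solve_lambda(scores, bg)
    q = {(a, b): bg[a] * bg[b] * math.exp(lam * scores[a][b])
         for a in bg for b in bg}
    return lam, q


def bits_to_factor(bits):
    """Convert a cost in bits into a multiplicative factor."""
    return 2.0 ** bits


# Toy example (hypothetical data): two letters, match +1, mismatch -2.
toy_bg = {"A": 0.5, "B": 0.5}
toy_scores = {"A": {"A": 1, "B": -2}, "B": {"A": -2, "B": 1}}
toy_lambda, toy_q = target_frequencies(toy_scores, toy_bg)
print(toy_lambda, toy_q)
```

Dividing log(q_ij / (p_i * p_j)) by lambda recovers the original scores, which is the sense in which the toy matrix is a scaled log-odds matrix for the distribution q.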
This article is indexed in SpringerLink and other databases.