Tests for the statistical significance of protein sequence similarities in data-bank searches

Authors:	R F Mott, T B Kirkwood, R N Curnow

Affiliation:	Laboratory of Mathematical Biology, National Institute for Medical Research, UK.

Abstract:	A suite of tests to evaluate the statistical significance of protein sequence similarities is developed for use in data bank searches. The tests are based on the Wilbur-Lipman word-search algorithm, and take into account the sequence lengths and compositions, and optionally the weighting of amino acid matches. The method is extended to allow for the existence of a sequence insertion/deletion within the region of similarity. The accuracy of statistical distributions underlying the tests is validated using randomly generated sequences and real sequences selected at random from the data banks. A computer program to perform the tests is briefly described.
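As a rough illustration of the validation idea mentioned in the abstract (comparing an observed similarity against randomly generated sequences), the sketch below estimates an empirical p-value by shuffling one sequence, which preserves its length and composition. This is a Monte Carlo stand-in and not the authors' tests or program: the word-count score is only a loose analogue of a word-search score, and the names `shared_words` and `shuffle_p_value`, the word length `k`, and the trial count are all assumptions made for illustration.

```python
import random


def shared_words(a, b, k=2):
    """Count distinct k-letter words that occur in both sequences."""
    words_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    words_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(words_a & words_b)


def shuffle_p_value(a, b, k=2, trials=1000, seed=0):
    """Empirical p-value for the shared-word count of a and b.

    The null model shuffles b, so every random sequence has the same
    length and amino acid composition as the real one (an assumption
    chosen for this sketch, not taken from the paper).
    """
    rng = random.Random(seed)
    observed = shared_words(a, b, k)
    letters = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if shared_words(a, "".join(letters), k) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)
```

Calling `shuffle_p_value(query, candidate)` on two protein strings returns a value in (0, 1]; smaller values suggest the shared-word count is unlikely under this shuffling null model. An analytic distribution, as the abstract describes, would avoid the cost of repeated shuffling in a full data-bank search.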
