Inference of haplotypes from samples of diploid populations: complexity and algorithms.

Authors:	D Gusfield

Institution:	Department of Computer Science, University of California, Davis, CA, USA. gusfield@cs.ucdavis.edu

Abstract:	The next phase of human genomics will involve large-scale screens of populations for significant DNA polymorphisms, notably single nucleotide polymorphisms (SNPs). Dense human SNP maps are currently under construction. However, the utility of those maps and screens will be limited by the fact that humans are diploid and it is presently difficult to get separate data on the two "copies." Hence, genotype (blended) SNP data will be collected, and the desired haplotype (partitioned) data must then be (partially) inferred. A particular nondeterministic inference algorithm was proposed and studied by Clark (1990) and extensively used by Clark et al. (1998). In this paper, we more closely examine that inference method and the question of whether we can obtain an efficient, deterministic variant to optimize the obtained inferences. We show that the problem is NP-hard and, in fact, Max-SNP complete; that the reduction creates problem instances conforming to a severe restriction believed to hold in real data (Clark, 1990); and that even if we first use a natural exponential-time operation, the remaining optimization problem is NP-hard. However, we also develop, implement, and test an approach based on that operation and (integer) linear programming. The approach works quickly and correctly on simulated data.
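The abstract does not spell out the inference method it studies, so the following is only a minimal sketch of the rule as Clark's method is commonly described: a haplotype already known to be in the population is used to explain an ambiguous genotype, and its complementary "mate" is then added to the known set. The encoding (0/1 for homozygous sites, 2 for heterozygous sites), the function names, and the fixed greedy order are all assumptions made for illustration. They are not taken from the paper, and this is not the (integer) linear programming approach the abstract reports.

```python
from typing import Dict, List, Optional, Set, Tuple

Genotype = Tuple[int, ...]   # assumed encoding: 0/1 = homozygous allele, 2 = heterozygous
Haplotype = Tuple[int, ...]  # 0/1 at every site


def seed_haplotypes(genotypes: List[Genotype]) -> Set[Haplotype]:
    """Collect haplotypes from genotypes with at most one heterozygous site."""
    known: Set[Haplotype] = set()
    for g in genotypes:
        het = [i for i, s in enumerate(g) if s == 2]
        if len(het) == 0:
            known.add(g)  # both copies are identical to the genotype
        elif len(het) == 1:
            i = het[0]
            known.add(g[:i] + (0,) + g[i + 1:])
            known.add(g[:i] + (1,) + g[i + 1:])
    return known


def complement(h: Haplotype, g: Genotype) -> Optional[Haplotype]:
    """Return the mate h' such that (h, h') explains g, or None if h cannot."""
    mate = []
    for hs, gs in zip(h, g):
        if gs == 2:
            mate.append(1 - hs)   # heterozygous site: mate carries the other allele
        elif gs == hs:
            mate.append(hs)       # homozygous site: both copies agree
        else:
            return None           # h conflicts with g at a homozygous site
    return tuple(mate)


def greedy_resolve(
    genotypes: List[Genotype],
) -> Tuple[Dict[Genotype, Tuple[Haplotype, Haplotype]], List[Genotype]]:
    """Apply the inference rule in one fixed order until nothing more resolves."""
    known = seed_haplotypes(genotypes)
    unresolved = [g for g in genotypes if g.count(2) >= 2]
    resolved: Dict[Genotype, Tuple[Haplotype, Haplotype]] = {}
    progress = True
    while progress:
        progress = False
        for g in list(unresolved):
            for h in sorted(known):  # fixed order; a different order may resolve more or fewer
                mate = complement(h, g)
                if mate is not None:
                    resolved[g] = (h, mate)
                    known.add(mate)
                    unresolved.remove(g)
                    progress = True
                    break
    return resolved, unresolved


if __name__ == "__main__":
    sample = [(0, 0, 0), (2, 2, 0), (1, 2, 2)]
    pairs, orphans = greedy_resolve(sample)
    print(pairs, orphans)
```

Because each application of the rule enlarges the set of known haplotypes, the choice of which haplotype explains which genotype, and in what order, can change how many genotypes end up resolved. That order dependence is the nondeterminism the abstract refers to. The sketch simply commits to one order and makes no attempt to maximize the number of resolutions.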

