Population Structure With Localized Haplotype Clusters
Authors: Sharon R. Browning, Bruce S. Weir
Affiliation: Department of Statistics, University of Auckland, Auckland 1142, New Zealand and Department of Biostatistics, University of Washington, Seattle, Washington 98195
Abstract: We propose a multilocus version of FST and a measure of haplotype diversity using localized haplotype clusters. Specifically, we use haplotype clusters identified with BEAGLE, which is a program implementing a hidden Markov model for localized haplotype clustering and performing several functions including inference of haplotype phase. We apply this methodology to HapMap phase 3 data. With this haplotype-cluster approach, African populations have highest diversity and lowest divergence from the ancestral population, East Asian populations have lowest diversity and highest divergence, and other populations (European, Indian, and Mexican) have intermediate levels of diversity and divergence. These relationships accord with expectation based on other studies and accepted models of human history. In contrast, the population-specific FST estimates obtained directly from single-nucleotide polymorphisms (SNPs) do not reflect such expected relationships. We show that ascertainment bias of SNPs has less impact on the proposed haplotype-cluster-based FST than on the SNP-based version, which provides a potential explanation for these results. Thus, these new measures of FST and haplotype-cluster diversity provide an important new tool for population genetic analysis of high-density SNP data.

Genome-wide data sets from worldwide panels of individuals provide an outstanding opportunity to investigate the genetic structure of human populations (Conrad et al. 2006; International HapMap Consortium 2007; Jakobsson et al. 2008; Auton et al. 2009). Populations around the globe form a continuum rather than discrete units (Serre and Paabo 2004; Weiss and Long 2009). However, notions of discrete populations can be appropriate when, for example, ancestral populations were separated by geographic distance or barriers such that little gene flow occurred.

FST (Wright 1951; Weir and Cockerham 1984; Holsinger and Weir 2009) is a measure of population divergence. It measures variation between populations relative to variation within populations. One can calculate a global measure, assuming that all populations are equally diverged from an ancestral population, or one can calculate FST for specific populations or for pairs of populations while utilizing data from all populations (Weir and Hill 2002). One use of FST is to test for signatures of selection (reviewed in Oleksyk et al. 2010).

FST may be calculated for single genetic markers. For multiallelic markers, such as microsatellites, this is useful, but single-nucleotide polymorphisms (SNPs) contain much less information when taken one at a time, and thus it is advantageous to calculate averages over windows of markers (Weir et al. 2005) or even over the whole genome. The advantage of windowed FST is that it can be used to find regions of the genome that show different patterns of divergence, indicative of selective forces at work during human history.

Another measure of human evolutionary history is haplotype diversity. Haplotype diversity may be measured using a count of the number of observed haplotypes in a region or by the expected haplotype heterozygosity based on haplotype frequencies in a region. Application of this regional measure to chromosomal data can be achieved by a haplotype block strategy (Patil et al. 2001) or by windowing (Conrad et al. 2006; Auton et al. 2009).

One problem with the analysis of population structure based on genome-wide panels of SNPs is that a large proportion of the SNPs were ascertained in Caucasians, potentially biasing the results of the analyses. Analysis based on haplotypes is less susceptible to such bias (Conrad et al. 2006). This is because haplotypes can be represented by multiple patterns of SNPs; thus lack of ascertainment of a particular SNP does not usually prevent observation of the haplotype. On a chromosome-wide scale, one cannot directly use entire haplotypes, because all the haplotypes in the sample will almost certainly be unique, thus providing no information on population structure. Instead one can use haplotypes on a local basis, either by using windows of adjacent markers or by using localized haplotype clusters, for example those obtained from fastPHASE (Scheet and Stephens 2006) or BEAGLE (Browning 2006; Browning and Browning 2007a).

Localized haplotype clusters are a clustering of haplotypes on a localized basis. At the position of each genetic marker, haplotypes are clustered according to their similarity in the vicinity of the position. Both fastPHASE and BEAGLE use hidden Markov modeling to perform the clustering, although the specific models used by the two programs differ.

Localized haplotype clusters derived from fastPHASE have been used to investigate haplotype diversity, to create neighbor-joining trees of populations, and to create multidimensional scaling (MDS) plots (Jakobsson et al. 2008). It was found that haplotype clusters showed patterns of diversity different from those of SNPs, while the neighbor-joining and MDS plots were similar between haplotype clusters and SNPs.

In this work, we apply windowed FST methods to localized haplotype clusters derived from the BEAGLE program (Browning and Browning 2007a,b, 2009). We consider population-average, population-specific, and pairwise FST estimates (Weir and Hill 2002). Population-average FST's either assume that all the populations are equally diverged from a common ancestor, which is not realistic, or represent the average of a set of population-specific values. This can be convenient in that the results are summarized by a single statistic; however, information is lost. A common procedure is to calculate FST for each pair of populations, and these values reflect the degree of divergence between the two populations. Different levels of divergence are allowed for each pair of populations but each estimate uses data from only that pair of populations. On the other hand, population-specific FST's allow unequal levels of divergence in a single analysis that makes use of all the data.

We compare results from the localized haplotype clusters to those using SNPs directly. The results of applying localized haplotype clusters to population-specific FST estimation are very striking, showing better separation of populations and a more realistic pattern of divergence than for population-specific FST estimation using SNPs directly. We also use BEAGLE's haplotype clusters in a haplotype diversity measure and investigate the relationship between this measure of haplotype-cluster diversity and the recombination rate.
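
As a rough illustration of the two quantities discussed above, the following Python sketch computes expected heterozygosity from haplotype-cluster labels at one marker position, and a simple heterozygosity-based divergence ratio (H_T - H_S) / H_T across populations. This is not the Weir and Hill (2002) estimator used in the paper, and it does not read BEAGLE output; the function names, the list-of-labels input format, and the cluster labels "c1" and "c2" are assumptions made only for this example.

```python
from collections import Counter


def expected_heterozygosity(labels):
    """Expected heterozygosity 1 - sum(p_i^2) for a list of cluster labels.

    `labels` is a hypothetical input: one cluster label per sampled
    haplotype at a single marker position.
    """
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def simple_fst(populations):
    """Heterozygosity-based ratio (H_T - H_S) / H_T (illustrative only).

    H_S is the unweighted mean of within-population heterozygosities;
    H_T is computed from the unweighted mean cluster frequencies.
    This is a simplified stand-in, not the paper's estimator.
    """
    k = len(populations)
    h_s = sum(expected_heterozygosity(p) for p in populations) / k
    all_labels = set().union(*populations)
    mean_freqs = [
        sum(p.count(label) / len(p) for p in populations) / k
        for label in all_labels
    ]
    h_t = 1.0 - sum(f ** 2 for f in mean_freqs)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Toy data: one diverse population and one fixed for a single cluster.
pop_a = ["c1", "c1", "c2", "c2"]
pop_b = ["c1", "c1", "c1", "c1"]
print(expected_heterozygosity(pop_a))
print(simple_fst([pop_a, pop_b]))
```

In this toy setting, a population carrying several clusters at similar frequencies has higher expected heterozygosity than one fixed for a single cluster, and the ratio grows as the populations' cluster frequencies diverge. A windowed version would average such quantities over adjacent marker positions.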