
1.

Linkage disequilibrium grouping of single nucleotide polymorphisms (SNPs) reflecting haplotype phylogeny for efficient selection of tag SNPs (total citations: 1; self-citations: 0; citations by others: 1)

Takeuchi F Yanai K Morii T Ishinaga Y Taniguchi-Yanai K Nagano S Kato N 《Genetics》2005,170(1):291-304

Single nucleotide polymorphisms (SNPs) have been proposed to be grouped into haplotype blocks harboring a limited number of haplotypes. Within each block, the portion of haplotypes is expected to be tagged by a selected subset of SNPs; however, none of the proposed selection algorithms have been definitive. To address this issue, we developed a tag SNP selection algorithm based on grouping of SNPs by the linkage disequilibrium (LD) coefficient r² and examined five genes in three ethnic populations: the Japanese, African Americans, and Caucasians. Additionally, we investigated ethnic diversity by characterizing 979 SNPs distributed throughout the genome. Our algorithm could spare 60% of SNPs required for genotyping and limit the imprecision in allele-frequency estimation of nontag SNPs to 2% on average. We discovered the presence of a mosaic pattern of LD plots within a conventionally inferred haplotype block. This emerged because multiple groups of SNPs with strong intragroup LD were mingled in their physical positions. The pattern of LD plots showed some similarity, but the details of tag SNPs were not entirely concordant among three populations. Consequently, our algorithm utilizing LD grouping allows selection of a more faithful set of tag SNPs than do previous algorithms utilizing haplotype blocks.
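The idea of grouping SNPs by pairwise r² can be illustrated with a minimal Python sketch. This is not the authors' algorithm, whose details the abstract does not give; it is a generic greedy grouping under stated assumptions: phased haplotypes coded 0/1, a hypothetical threshold of r² ≥ 0.8, and made-up SNP names and data. The function names `r_squared` and `greedy_ld_groups` are illustrative only.

```python
def r_squared(a, b):
    """LD coefficient r^2 between two biallelic SNPs.

    a, b: equal-length lists of 0/1 alleles, one entry per phased haplotype.
    """
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # assumption: a monomorphic SNP is treated as r^2 = 0
    d = pab - pa * pb  # the classical disequilibrium coefficient D
    return d * d / denom


def greedy_ld_groups(haplotypes, threshold=0.8):
    """Greedily group SNPs whose r^2 with a chosen tag SNP meets the threshold.

    haplotypes: dict mapping SNP name -> list of 0/1 alleles.
    Returns a list of (tag_snp, sorted_group_members) tuples.
    """
    remaining = set(haplotypes)
    groups = []
    while remaining:
        best, best_partners = None, set()
        for s in sorted(remaining):  # sorted for deterministic tie-breaking
            partners = {
                t for t in remaining
                if t != s and r_squared(haplotypes[s], haplotypes[t]) >= threshold
            }
            if best is None or len(partners) > len(best_partners):
                best, best_partners = s, partners
        members = best_partners | {best}
        groups.append((best, sorted(members)))
        remaining -= members
    return groups


# Hypothetical data: snpA and snpC are in perfect LD, snpB lies between them.
haps = {
    "snpA": [0, 0, 1, 1, 0, 1, 0, 1],
    "snpB": [0, 1, 0, 1, 0, 1, 0, 1],
    "snpC": [0, 0, 1, 1, 0, 1, 0, 1],
}
print(greedy_ld_groups(haps))
# [('snpA', ['snpA', 'snpC']), ('snpB', ['snpB'])]
```

Because the grouping ignores physical order, snpA and snpC end up together even though snpB sits between them, which is the kind of interleaving the abstract describes as a mosaic pattern.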

2.

Haplotype‐based genotyping‐by‐sequencing in oat genome research

Wubishet A. Bekele Charlene P. Wight Shiaoman Chao Catherine J. Howarth Nicholas A. Tinker 《Plant biotechnology journal》2018,16(8):1452-1463

In a de novo genotyping‐by‐sequencing (GBS) analysis of short, 64‐base tag‐level haplotypes in 4657 accessions of cultivated oat, we discovered 164741 tag‐level (TL) genetic variants containing 241224 SNPs. From this, the marker density of an oat consensus map was increased by the addition of more than 70000 loci. The mapped TL genotypes of a 635‐line diversity panel were used to infer chromosome‐level (CL) haplotype maps. These maps revealed differences in the number and size of haplotype blocks, as well as differences in haplotype diversity between chromosomes and subsets of the diversity panel. We then explored potential benefits of SNP vs. TL vs. CL GBS variants for mapping, high‐resolution genome analysis and genomic selection in oats. A combined genome‐wide association study (GWAS) of heading date from multiple locations using both TL haplotypes and individual SNP markers identified 184 significant associations. A comparative GWAS using TL haplotypes, CL haplotype blocks and their combinations demonstrated the superiority of using TL haplotype markers. Using a principal component‐based genome‐wide scan, genomic regions containing signatures of selection were identified. These regions may contain genes that are responsible for the local adaptation of oats to Northern American conditions. Genomic selection for heading date using TL haplotypes or SNP markers gave comparable and promising prediction accuracies of up to r = 0.74. Genomic selection carried out in an independent calibration and test population for heading date gave promising prediction accuracies that ranged between r = 0.42 and 0.67. In conclusion, TL haplotype GBS‐derived markers facilitate genome analysis and genomic selection in oat.

3.

Low concordance of short‐term and long‐term selection responses in experimental Drosophila populations

Anna Maria Langmüller Christian Schlötterer 《Molecular ecology》2020,29(18):3466-3475

Experimental evolution is becoming a popular approach to study the genomic selection response of evolving populations. Computer simulation studies suggest that the accuracy of the signature increases with the duration of the experiment. Since some assumptions of the computer simulations may be violated, it is important to scrutinize the influence of the experimental duration with real data. Here, we use a highly replicated Evolve and Resequence study in Drosophila simulans to compare the selection targets inferred at different time points. At each time point, approximately the same number of SNPs deviates from neutral expectations, but only 10% of the selected haplotype blocks identified from the full data set can be detected after 20 generations. Those haplotype blocks that emerge already after 20 generations differ from the others by being strongly selected at the beginning of the experiment and display a more parallel selection response. Consistent with previous computer simulations, our results demonstrate that only Evolve and Resequence experiments with a sufficient number of generations can characterize complex adaptive architectures.

4.

Pool‐hmm: a Python program for estimating the allele frequency spectrum and detecting selective sweeps from next generation sequencing of pooled samples

Simon Boitard Robert Kofler Pierre Françoise David Robelin Christian Schlötterer Andreas Futschik 《Molecular ecology resources》2013,13(2):337-340

Due to its cost effectiveness, next generation sequencing of pools of individuals (Pool‐Seq) is becoming a popular strategy for genome‐wide estimation of allele frequencies in population samples. As the allele frequency spectrum provides information about past episodes of selection, Pool‐Seq is also a promising design for genomic scans for selection. However, no software tool has yet been developed for selection scans based on Pool‐Seq data. We introduce Pool‐hmm, a Python program for the estimation of allele frequencies and the detection of selective sweeps in a Pool‐Seq sample. Pool‐hmm includes several options that allow a flexible analysis of Pool‐Seq data, and can be run in parallel on several processors. Source code and documentation for Pool‐hmm are freely available at https://qgsp.jouy.inra.fr/ .

5.

Performance of single nucleotide polymorphisms versus haplotypes for genome-wide association analysis in barley

Lorenz AJ Hamblin MT Jannink JL 《PloS one》2010,5(11):e14079

Genome-wide association studies (GWAS) may benefit from utilizing haplotype information for making marker-phenotype associations. Several rationales for grouping single nucleotide polymorphisms (SNPs) into haplotype blocks exist, but any advantage may depend on such factors as genetic architecture of traits, patterns of linkage disequilibrium in the study population, and marker density. The objective of this study was to explore the utility of haplotypes for GWAS in barley (Hordeum vulgare) to offer a first detailed look at this approach for identifying agronomically important genes in crops. To accomplish this, we used genotype and phenotype data from the Barley Coordinated Agricultural Project and constructed haplotypes using three different methods. Marker-trait associations were tested by the efficient mixed-model association algorithm (EMMA). When QTL were simulated using single SNPs dropped from the marker dataset, a simple sliding window performed as well as or better than single SNPs or the more sophisticated methods of blocking SNPs into haplotypes. Moreover, the haplotype analyses performed better 1) when QTL were simulated as polymorphisms that arose subsequent to marker variants, and 2) in analysis of empirical heading date data. These results demonstrate that the information content of haplotypes is dependent on the particular mutational and recombinational history of the QTL and nearby markers. Analysis of the empirical data also confirmed our intuition that the distribution of QTL alleles in nature is often unlike the distribution of marker variants, and hence utilizing haplotype information could capture associations that would elude single SNPs. We recommend routine use of both single SNP and haplotype markers for GWAS to take advantage of the full information content of the genotype data.

6.

Large-scale single-nucleotide polymorphism (SNP) and haplotype analyses, using dense SNP maps, of 199 drug-related genes in 752 subjects: the analysis of the association between uncommon SNPs within haplotype blocks and the haplotypes constructed with haplotype-tagging SNPs (total citations: 13; self-citations: 0; citations by others: 13)

Kamatani N Sekine A Kitamoto T Iida A Saito S Kogame A Inoue E Kawamoto M Harigai M Nakamura Y 《American journal of human genetics》2004,75(2):190-203

To optimize the strategies for population-based pharmacogenetic studies, we extensively analyzed single-nucleotide polymorphisms (SNPs) and haplotypes in 199 drug-related genes, through use of 4,190 SNPs in 752 control subjects. Drug-related genes, like other genes, have a haplotype-block structure, and a few haplotype-tagging SNPs (htSNPs) could represent most of the major haplotypes constructed with common SNPs in a block. Because our data included 860 uncommon (frequency <0.1) SNPs with frequencies that were accurately estimated, we analyzed the relationship between haplotypes and uncommon SNPs within the blocks (549 SNPs). We inferred haplotype frequencies through use of the data from all htSNPs and one of the uncommon SNPs within a block and calculated four joint probabilities for the haplotypes. We show that, irrespective of the minor-allele frequency of an uncommon SNP, the majority (mean ± SD frequency 0.943 ± 0.117) of the minor alleles were assigned to a single haplotype tagged by htSNPs if the uncommon SNP was within the block. These results support the hypothesis that recombinations occur only infrequently within blocks. The proportion of a single haplotype tagged by htSNPs to which the minor alleles of an uncommon SNP were assigned was positively correlated with the minor-allele frequency when the frequency was <0.03 (P<.000001; n=233 [Spearman's rank correlation coefficient]). The results of simulation studies suggested that haplotype analysis using htSNPs may be useful in the detection of uncommon SNPs associated with phenotypes if the frequencies of the SNPs are higher in affected than in control populations, the SNPs are within the blocks, and the frequencies of the SNPs are >0.03.

7.

Accounting for linkage disequilibrium in genome scans for selection without individual genotypes: The local score approach

María Inés Fariello Simon Boitard Sabine Mercier David Robelin Thomas Faraut Cécile Arnould Julien Recoquillay Olivier Bouchez Gérald Salin Patrice Dehais David Gourichon Sophie Leroux Frédérique Pitel Christine Leterrier Magali SanCristobal 《Molecular ecology》2017,26(14):3700-3714

Detecting genomic footprints of selection is an important step in the understanding of evolution. Accounting for linkage disequilibrium in genome scans increases detection power, but haplotype‐based methods require individual genotypes and are not applicable on pool‐sequenced samples. We propose to take advantage of the local score approach to account for linkage disequilibrium in genome scans for selection, cumulating (possibly small) signals from single markers over a genomic segment, to clearly pinpoint a selection signal. Using computer simulations, we demonstrate that this approach detects selection with higher power than several state‐of‐the‐art single‐marker, windowing or haplotype‐based approaches. We illustrate this on two benchmark data sets including individual genotypes, for which we obtain similar results with the local score and one haplotype‐based approach. Finally, we apply the local score approach to Pool‐Seq data obtained from a divergent selection experiment on behaviour in quail and obtain precise and biologically coherent selection signals: while competing methods fail to highlight any clear selection signature, our method detects several regions involving genes known to act on social responsiveness or autistic traits. Although we focus here on the detection of positive selection from multiple population data, the local score approach is general and can be applied to other genome scans for selection or other genomewide analyses such as GWAS.
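The cumulation of small single-marker signals over a segment can be sketched as follows. This is a hedged illustration of a local score computed with a Lindley-type recursion, assuming each marker is scored as −log10(p) − ξ; the value ξ = 1, the function name `local_score`, and the example p-values are assumptions, and the significance thresholds needed to interpret the score are not covered here.

```python
import math


def local_score(p_values, xi=1.0):
    """Return (best_score, start_index, end_index) of the highest-scoring segment.

    Each marker contributes -log10(p) - xi, so only p-values below 10**-xi
    push the running score up. The running score is reset to zero whenever
    it would become negative (Lindley recursion). All p-values must be > 0.
    """
    h = 0.0                      # running score
    start = 0                    # start of the current excursion above zero
    best = 0.0
    best_start = best_end = None
    for i, p in enumerate(p_values):
        if h == 0.0:
            start = i            # a new excursion can only begin here
        h = max(0.0, h + (-math.log10(p) - xi))
        if h > best:
            best, best_start, best_end = h, start, i
    return best, best_start, best_end


# Hypothetical p-values along a chromosome: two adjacent moderate signals
# (indices 1 and 2) accumulate into one clear peak.
score, first, last = local_score([0.5, 0.01, 0.001, 0.5, 0.2, 0.9], xi=1.0)
print(round(score, 3), first, last)
```

With these made-up inputs the peak spans markers 1 to 2; neither marker alone would reach the combined score, which is the point of accumulating over a segment.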

8.

Selection of haplotype variables from a high-density marker map for genomic prediction

Beatriz CD Cuyabano Guosheng Su Mogens S. Lund 《Genetics Selection Evolution》2015,47(1)

Background

Using haplotype blocks as predictors rather than individual single nucleotide polymorphisms (SNPs) may improve genomic predictions, since haplotypes are in stronger linkage disequilibrium with the quantitative trait loci than are individual SNPs. It has also been hypothesized that an appropriate selection of a subset of haplotype blocks can result in similar or better predictive ability than when using the whole set of haplotype blocks. This study investigated genomic prediction using a set of haplotype blocks that contained the SNPs with large effects estimated from an individual SNP prediction model. We analyzed protein yield, fertility and mastitis of Nordic Holstein cattle, and used high-density markers (about 770k SNPs). To reach an optimum number of haplotype variables for genomic prediction, predictions were performed using subsets of haplotype blocks that contained a range of 1000 to 50 000 main SNPs.

Results

The use of haplotype blocks improved the prediction reliabilities, even when selection focused on only a group of haplotype blocks. In this case, the use of haplotype blocks that contained the 20 000 to 50 000 SNPs with the highest effect was sufficient to outperform the model that used all individual SNPs as predictors (up to 1.3 % improvement in prediction reliability for mastitis, compared to individual SNP approach), and the achieved reliabilities were similar to those using all haplotype blocks available in the genome data (from 0.6 % lower to 0.8 % higher reliability).

Conclusions

Haplotype blocks used as predictors can improve the reliability of genomic prediction compared to the individual SNP model. Furthermore, the use of a subset of haplotype blocks that contains the main SNP effects from genomic data could be a feasible approach to genomic prediction in dairy cattle, given an increase in density of genotype data available. The predictive ability of the models that use a subset of haplotype blocks was similar to that obtained using either all haplotype blocks or all individual SNPs, with the benefit of having a much lower computational demand.

9.

Haplotype block structure and its applications to association studies: power and study designs (total citations: 21; self-citations: 0; citations by others: 21)

Zhang K Calabrese P Nordborg M Sun F 《American journal of human genetics》2002,71(6):1386-1394

Recent studies have shown that the human genome has a haplotype block structure, such that it can be divided into discrete blocks of limited haplotype diversity. In each block, a small fraction of single-nucleotide polymorphisms (SNPs), referred to as "tag SNPs," can be used to distinguish a large fraction of the haplotypes. These tag SNPs can potentially be extremely useful for association studies, in that it may not be necessary to genotype all SNPs; however, this depends on how much power is lost. Here we develop a simulation study to quantitatively assess the power loss for a variety of study designs, including case-control designs and case-parental control designs. First, a number of data sets containing case-parental or case-control samples are generated on the basis of a disease model. Second, a small fraction of case and control individuals in each data set are genotyped at all the loci, and a dynamic programming algorithm is used to determine the haplotype blocks and the tag SNPs based on the genotypes of the sampled individuals. Third, the statistical power of tests was evaluated on the basis of three kinds of data: (1) all of the SNPs and the corresponding haplotypes, (2) the tag SNPs and the corresponding haplotypes, and (3) the same number of randomly chosen SNPs as the number of tag SNPs and the corresponding haplotypes. We study the power of different association tests with a variety of disease models and block-partitioning criteria. Our study indicates that the genotyping efforts can be significantly reduced by the tag SNPs, without much loss of power. Depending on the specific haplotype block-partitioning algorithm and the disease model, when the identified tag SNPs are only 25% of all the SNPs, the power is reduced by only 4%, on average, compared with a power loss of approximately 12% when the same number of randomly chosen SNPs is used in a two-locus haplotype analysis. When the identified tag SNPs are approximately 14% of all the SNPs, the power is reduced by approximately 9%, compared with a power loss of approximately 21% when the same number of randomly chosen SNPs is used in a two-locus haplotype analysis. Our study also indicates that haplotype-based analysis can be much more powerful than marker-by-marker analysis.

10.

Inference of chromosomal inversion dynamics from Pool‐Seq data in natural and laboratory populations of Drosophila melanogaster

Martin Kapun Hester van Schalkwyk Bryant McAllister Thomas Flatt Christian Schlötterer 《Molecular ecology》2014,23(7):1813-1827

Sequencing of pools of individuals (Pool‐Seq) represents a reliable and cost‐effective approach for estimating genome‐wide SNP and transposable element insertion frequencies. However, Pool‐Seq does not provide direct information on haplotypes so that, for example, obtaining inversion frequencies has not been possible until now. Here, we have developed a new set of diagnostic marker SNPs for seven cosmopolitan inversions in Drosophila melanogaster that can be used to infer inversion frequencies from Pool‐Seq data. We applied our novel marker set to Pool‐Seq data from an experimental evolution study and from North American and Australian latitudinal clines. In the experimental evolution data, we find evidence that positive selection has driven the frequencies of In(3R)C and In(3R)Mo to increase over time. In the clinal data, we confirm the existence of frequency clines for In(2L)t, In(3L)P and In(3R)Payne in both North America and Australia and detect a previously unknown latitudinal cline for In(3R)Mo in North America. The inversion markers developed here provide a versatile and robust tool for characterizing inversion frequencies and their dynamics in Pool‐Seq data from diverse D. melanogaster populations.

11.

Bayesian logistic regression using a perfect phylogeny (total citations: 1; self-citations: 0; citations by others: 1)

Clark TG De Iorio M Griffiths RC 《Biostatistics (Oxford, England)》2007,8(1):32-52

Haplotype data capture the genetic variation among individuals in a population and among populations. An understanding of this variation and the ancestral history of haplotypes is important in genetic association studies of complex disease. We introduce a method for detecting associations between disease and haplotypes in a candidate gene region or candidate block with little or no recombination. A perfect phylogeny demonstrates the evolutionary relationship between single-nucleotide polymorphisms (SNPs) in the haplotype blocks. Our approach extends the logic regression technique of Ruczinski and others (2003) to a Bayesian framework, and constrains the model space to that of a perfect phylogeny. Environmental factors, as well as their interactions with SNPs, may be incorporated into the regression framework. We demonstrate our method on simulated data from a coalescent model, as well as data from a candidate gene study of sarcoidosis.

12.

Shifting the paradigm in Evolve and Resequence studies: From analysis of single nucleotide polymorphisms to selected haplotype blocks

Neda Barghi Christian Schlötterer 《Molecular ecology》2019,28(3):521-524

For almost a decade the combination of whole genome sequencing with experimental evolution (Evolve and Resequence, E&R; Turner, Stewart, Fields, Rice, & Tarone, 2011) has been used to study adaptation in outcrossing organisms. However, complications caused by inversions and hitchhiking variants have prevented this powerful approach from living up to its potential. In this issue of Molecular Ecology, Michalak, Kang, Schou, Garner, and Loeschcke (2018) provide an important step ahead by using a population of Drosophila melanogaster devoid of segregating inversions to identify the genetic basis of resistance to five environmental stressors. They further address the challenge of hitchhiking variants by reconstructing selected haplotype blocks. While it is apparent that the haplotype block reconstruction needs further refinements, their work underpins the potential of E&R studies in Drosophila to address fundamental questions in evolutionary biology.

13.

A double classification tree search algorithm for index SNP selection

Peisen Zhang Huitao Sheng Ryuhei Uehara 《BMC bioinformatics》2004,5(1):89

Background

In population-based studies, it is generally recognized that single nucleotide polymorphism (SNP) markers are not independent. Rather, they are carried by haplotypes, groups of SNPs that tend to be coinherited. It is thus possible to choose a much smaller number of SNPs to use as indices for identifying haplotypes or haplotype blocks in genetic association studies. We refer to these characteristic SNPs as index SNPs. In order to reduce costs and work, a minimum number of index SNPs that can distinguish all SNP and haplotype patterns should be chosen. Unfortunately, this is an NP-complete problem, requiring brute force algorithms that are not feasible for large data sets.
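To make the brute-force formulation concrete, here is a minimal Python sketch that searches for the smallest set of SNP columns still distinguishing every distinct haplotype. It is not the double classification tree search of the paper; it only illustrates the exhaustive baseline whose cost grows exponentially with the number of SNPs. It also simplifies the stated problem by distinguishing haplotype patterns only. The function name `minimal_index_snps` and the example haplotypes are hypothetical.

```python
from itertools import combinations


def minimal_index_snps(haplotypes):
    """Exhaustively find a smallest set of SNP column indices that keeps
    all distinct haplotypes distinguishable.

    haplotypes: list of equal-length tuples of alleles (e.g. 0/1).
    Subsets are tried in order of increasing size, so the first hit is minimal.
    """
    distinct = set(haplotypes)
    n_snps = len(haplotypes[0])
    for k in range(n_snps + 1):
        for cols in combinations(range(n_snps), k):
            projected = {tuple(h[c] for c in cols) for h in distinct}
            if len(projected) == len(distinct):
                return cols
    return tuple(range(n_snps))  # not reached: the full set always works


# Hypothetical block of four SNPs in which columns 0/1 and 2/3 are redundant.
haps = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 0), (1, 1, 1, 1)]
print(minimal_index_snps(haps))  # (0, 2)
```

The nested loop visits up to 2^n column subsets for n SNPs, which is why exhaustive search stops being practical beyond small blocks.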

14.

The variable number of tandem repeats element in DAT1 regulates in vitro dopamine transporter density

Sidney H VanNess Michael J Owens Clinton D Kilts 《BMC genetics》2005,6(1):1-11

Background

The selection of markers in association studies can be informed through the use of haplotype blocks. Recent reports have determined the genomic architecture of chromosomal segments through different haplotype block definitions based on linkage disequilibrium (LD) measures or haplotype diversity criteria. The relative applicability of distinct block definitions to association studies, however, remains unclear. We compared different block definitions in 6.1 Mb of chromosome 17q in 189 unrelated healthy individuals. Using 137 single nucleotide polymorphisms (SNPs), at a median spacing of 15.5 kb, we constructed haplotype block maps using published methods and additional methods we have developed. Haplotype tagging SNPs (htSNPs) were identified for each map.

Results

Blocks were found to be shorter and coverage of the region limited with methods based on LD measures, compared to the method based on haplotype diversity. Although the distribution of blocks was highly variable, the number of SNPs that needed to be typed in order to capture the maximum number of haplotypes was consistent.

Conclusion

For the marker spacing used in this study, choice of block definition is not important when used as an initial screen of the region to identify htSNPs. However, choice of block definition has consequences for the downstream interpretation of association study results.

15.

A shared 336 kb haplotype associated with the belt pattern in three divergent cattle breeds

C. Drögemüller S. Demmel M. Engensteiner S. Rieder T. Leeb 《Animal genetics》2010,41(3):304-307

16.

The effect of single-nucleotide polymorphism marker selection on patterns of haplotype blocks and haplotype frequency estimates

Nothnagel M Rohde K 《American journal of human genetics》2005,77(6):988-998

The definition of haplotype blocks of single-nucleotide polymorphisms (SNPs) has been proposed so that the haplotypes can be used as markers in association studies and to efficiently describe human genetic variation. The International Haplotype Map (HapMap) project to construct a comprehensive catalog of haplotypic variation in humans is underway. However, a number of factors have already been shown to influence the definition of blocks, including the population studied and the sample SNP density. Here, we examine the effect that marker selection has on the definition of blocks and the pattern of haplotypes by using comparable but complementary SNP sets and a number of block definition methods in various genomic regions and populations that were provided by the Encyclopedia of DNA Elements (ENCODE) project. We find that the chosen SNP set has a profound effect on the block-covered sequence and block borders, even at high marker densities. Our results question the very concept of discrete haplotype blocks and the possibility of generalizing block findings from the HapMap project. We comparatively apply the block-free tagging-SNP approach and discuss both the haplotype approach and the tagging-SNP approach as means to efficiently catalog genetic variation.

17.

Haplotype reconstruction from genotype data using Imperfect Phylogeny (total citations: 13; self-citations: 0; citations by others: 13)

Halperin E Eskin E 《Bioinformatics (Oxford, England)》2004,20(12):1842-1849

Critical to the understanding of the genetic basis for complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs) which are mutations at a single nucleotide position. To characterize the genetic variation between different people, we must determine an individual's haplotype or which nucleotide base occurs at each position of these common SNPs for each chromosome. In this paper, we present results for a highly accurate method for haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes that shows that SNPs are organized in highly correlated 'blocks'. In a few recent studies, considerable parts of the human genome were partitioned into blocks, such that the majority of the sequenced genotypes have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks, and for each block, we predict the common haplotypes and each individual's haplotype. We evaluate our method over biological data. Our method predicts the common haplotypes perfectly and has a very low error rate (<2% over the data) when taking into account the predictions for the uncommon haplotypes. Our method is extremely efficient compared with previous methods such as PHASE and HAPLOTYPER. Its efficiency allows us to find the block partition of the haplotypes, to cope with missing data and to work with large datasets. AVAILABILITY: The algorithm is available via a Web server at http://www.calit2.net/compbio/hap/

18.

Exploring a Pool‐seq‐only approach for gaining population genomic insights in nonmodel species

Sara Kurland Christopher W. Wheat Maria de la Paz Celorio Mancera Verena E. Kutschera Jason Hill Anastasia Andersson Carl‐Johan Rubin Leif Andersson Nils Ryman Linda Laikre 《Ecology and evolution》2019,9(19):11448-11463

Developing genomic insights is challenging in nonmodel species for which resources are often scarce and prohibitively costly. Here, we explore the potential of a recently established approach using Pool‐seq data to generate a de novo genome assembly for mining exons, upon which Pool‐seq data are used to estimate population divergence and diversity. We do this for two pairs of sympatric populations of brown trout (Salmo trutta): one naturally sympatric set of populations and another pair of populations introduced to a common environment. We validate our approach by comparing the results to those from markers previously used to describe the populations (allozymes and individual‐based single nucleotide polymorphisms [SNPs]) and from mapping the Pool‐seq data to a reference genome of the closely related Atlantic salmon (Salmo salar). We find that genomic differentiation (F_ST) between the two introduced populations exceeds that of the naturally sympatric populations (F_ST = 0.13 and 0.03 between the introduced and the naturally sympatric populations, respectively), in concordance with estimates from the previously used SNPs. The same level of population divergence is found for the two genome assemblies, but estimates of average nucleotide diversity differ (≈ 0.002 and ≈ 0.001 when mapping to S. trutta and S. salar, respectively), although the relationships between population values are largely consistent. This discrepancy might be attributed to biases when mapping to a haploid condensed assembly made of highly fragmented read data compared to using a high‐quality reference assembly from a divergent species. We conclude that the Pool‐seq‐only approach can be suitable for detecting and quantifying genome‐wide population differentiation, and for comparing genomic diversity in populations of nonmodel species where reference genomes are lacking.
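For readers unfamiliar with F_ST, the quantity can be illustrated with a small Python sketch for a single biallelic site and two equally weighted populations, using the heterozygosity-based definition F_ST = (H_T − H_S) / H_T. This is an assumption for illustration: the abstract does not state which estimator the authors used, and the sketch ignores the pool-size and read-depth corrections that real Pool‐seq analyses apply. The function name and the input frequencies are hypothetical.

```python
def fst_biallelic(p1, p2):
    """F_ST at one biallelic site from allele frequencies in two populations.

    H_T is the expected heterozygosity of the pooled population;
    H_S is the mean expected heterozygosity within populations.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_t == 0:
        return 0.0  # assumption: a site fixed for the same allele contributes 0
    return (h_t - h_s) / h_t


# Identical frequencies give no differentiation; opposite fixation gives 1.
print(fst_biallelic(0.5, 0.5))  # 0.0
print(fst_biallelic(0.0, 1.0))  # 1.0
```

Genome-wide values such as those quoted in the abstract are summaries over many sites, not single-site values like these.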

19.

HAPLOPOOL: improving haplotype frequency estimation through DNA pools and phylogenetic modeling

Kirkpatrick B Armendariz CS Karp RM Halperin E 《Bioinformatics (Oxford, England)》2007,23(22):3048-3055

MOTIVATION: The search for genetic variants that are linked to complex diseases such as cancer, Parkinson's, or Alzheimer's disease may lead to better treatments. Since haplotypes can serve as proxies for hidden variants, one method of finding the linked variants is to look for case-control associations between the haplotypes and disease. Finding these associations requires a high-quality estimation of the haplotype frequencies in the population. To this end, we present HaploPool, a method of estimating haplotype frequencies from blocks of consecutive SNPs. RESULTS: HaploPool leverages the efficiency of DNA pools and estimates the population haplotype frequencies from pools of disjoint sets, each containing two or three unrelated individuals. We study the trade-off between pooling efficiency and accuracy of haplotype frequency estimates. For a fixed genotyping budget, HaploPool performs favorably on pools of two individuals as compared with a state-of-the-art non-pooled phasing method, PHASE. Of independent interest, HaploPool can be used to phase non-pooled genotype data with an accuracy approaching that of PHASE. We compared our algorithm to three programs that estimate haplotype frequencies from pooled data. HaploPool is an order of magnitude more efficient (at least six times faster), and considerably more accurate than previous methods. In contrast to previous methods, HaploPool performs well with missing data, genotyping errors and long haplotype blocks (of between 5 and 25 SNPs).

20.

The impact of missing and erroneous genotypes on tagging SNP selection and power of subsequent association tests

Liu W Zhao W Chase GA 《Human heredity》2006,61(1):31-44

OBJECTIVE: Single nucleotide polymorphisms (SNPs) serve as effective markers for localizing disease susceptibility genes, but current genotyping technologies are inadequate for genotyping all available SNP markers in a typical linkage/association study. Much attention has recently been paid to methods for selecting the minimal informative subset of SNPs in identifying haplotypes, but there has been little investigation of the effect of missing or erroneous genotypes on the performance of these SNP selection algorithms and subsequent association tests using the selected tagging SNPs. The purpose of this study is to explore the effect of missing genotype or genotyping error on tagging SNP selection and subsequent single marker and haplotype association tests using the selected tagging SNPs. METHODS: Through two sets of simulations, we evaluated the performance of three tagging SNP selection programs in the presence of missing or erroneous genotypes: Clayton's diversity based program htstep, Carlson's linkage disequilibrium (LD) based program ldSelect, and Stram's coefficient of determination based program tagsnp.exe. RESULTS: When randomly selected known loci were relabeled as 'missing', we found that the average number of tagging SNPs selected by all three algorithms changed very little and the power of subsequent single marker and haplotype association tests using the selected tagging SNPs remained close to the power of these tests in the absence of missing genotype. When random genotyping errors were introduced, we found that the average number of tagging SNPs selected by all three algorithms increased. In data sets simulated according to the haplotype frequencies in the CYP19 region, Stram's program had larger increase than Carlson's and Clayton's programs. In data sets simulated under the coalescent model, Carlson's program had the largest increase and Clayton's program had the smallest increase. In both sets of simulations, with the presence of genotyping errors, the power of the haplotype tests from all three programs decreased quickly, but there was not much reduction in power of the single marker tests. CONCLUSIONS: Missing genotypes do not seem to have much impact on tagging SNP selection and subsequent single marker and haplotype association tests. In contrast, genotyping errors could have severe impact on tagging SNP selection and haplotype tests, but not on single marker tests.