1.

Accuracy of haplotype estimation in a region of low linkage disequilibrium

Avery CL Martin LJ Williams JT North KE 《BMC genetics》2005,6(Z1):S80

We compared the accuracy of haplotype inferences at a 6 Mb region on chromosome 7 where significant linkage between a brain oscillation phenotype and a cholinergic muscarinic receptor gene was previously reported. Individual haplotype assignments and haplotype frequencies were estimated using 5, 10, and 14 consecutive Illumina single-nucleotide polymorphisms (SNPs) within the 1-LOD unit support interval of the chromosome 7 linkage peak. Initially, haplotypes were constructed incorporating phase information provided by relatives using the pedigree analysis package MERLIN. Population-based haplotypes were inferred using the haplotype estimation software HAPLO.STATS and PHASE, using unrelated individuals. The 14 SNPs within this region exhibited markedly low linkage disequilibrium, and the average D' estimate between SNPs was 0.18 (range: 0.01-0.97). In comparison to the family-based haplotypes calculated in MERLIN, the computational inferences of individual haplotype assignments were most accurate when considering 5 consecutive SNPs, but decayed dramatically when considering 10 or 14 SNPs in both PHASE and HAPLO.STATS. When comparing the two haplotype inference methods, both PHASE and HAPLO.STATS performed poorly. These analyses underscore the difficulties of haplotype estimation in the presence of low linkage disequilibrium and stress the importance of careful consideration of confidence measures when using estimated haplotype frequencies and individual assignments in biomedical research.
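
For readers unfamiliar with the D' statistic quoted in this abstract, the following is a minimal Python sketch of how pairwise linkage disequilibrium is conventionally computed from haplotype and allele frequencies. The function name `ld_measures` and its argument names are illustrative assumptions; they are not part of MERLIN, HAPLO.STATS, or PHASE, and the abstract does not say which software produced its D' estimates.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    (All names are hypothetical; any consistent allele labelling works.)
    """
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b

    # Largest |D| attainable given the allele frequencies; depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # Squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

As a worked case, `ld_measures(0.3, 0.5, 0.5)` gives D = 0.05, D' = 0.2 and r² = 0.04 (up to floating-point rounding), which is close to the average D' of 0.18 reported for this region.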

2.

Effectiveness of computational methods in haplotype prediction (Cited by: 11; self-citations: 0; citations by others: 11)

Xu CF Lewis K Cantone KL Khan P Donnelly C White N Crocker N Boyd PR Zaykin DV Purvis IJ 《Human genetics》2002,110(2):148-156

Haplotype analysis has been used for narrowing down the location of disease-susceptibility genes and for investigating many population processes. Computational algorithms have been developed to estimate haplotype frequencies and to predict haplotype phases from genotype data for unrelated individuals. However, the accuracy of such computational methods needs to be evaluated before their applications can be advocated. We have experimentally determined the haplotypes at two loci, the N-acetyltransferase 2 gene (NAT2, 850 bp, n=81) and a 140-kb region on chromosome X (n=77), each consisting of five single nucleotide polymorphisms (SNPs). We empirically evaluated and compared the accuracy of the subtraction method, the expectation-maximization (EM) method, and the PHASE method in haplotype frequency estimation and in haplotype phase prediction. Where there was near complete linkage disequilibrium (LD) between SNPs (the NAT2 gene), all three methods provided effective and accurate estimates for haplotype frequencies and individual haplotype phases. For a genomic region in which marked LD was not maintained (the chromosome X locus), the computational methods were adequate in estimating overall haplotype frequencies. However, none of the methods was accurate in predicting individual haplotype phases. The EM and the PHASE methods provided better estimates for overall haplotype frequencies than the subtraction method for both genomic regions.
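
The EM method named in this abstract can be illustrated with the textbook two-SNP case, in which only double heterozygotes have ambiguous phase. The sketch below is an assumption-laden illustration and not the implementation evaluated in the paper: the function name `em_two_snp`, the 0/1/2 genotype coding, the uniform starting frequencies, and the fixed iteration count are all hypothetical choices.

```python
from collections import Counter

# The four possible haplotypes over two biallelic SNPs (alleles coded 0 and 1).
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def em_two_snp(genotypes, n_iter=100):
    """Estimate two-SNP haplotype frequencies from unphased genotypes by EM.

    genotypes : list of (g1, g2); each g is the count (0, 1 or 2) of allele 1.
    Returns a dict mapping each haplotype in HAPS to its estimated frequency.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    freqs = {h: 0.25 for h in HAPS}      # assumed uniform starting point
    n_chrom = 2 * len(genotypes)

    for _ in range(n_iter):
        counts = Counter()
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step for a double heterozygote: split it between the
                # cis (00 + 11) and trans (01 + 10) phasings in proportion
                # to their probability under the current frequencies.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # At least one SNP is homozygous, so the phase is unambiguous.
                counts[(g1 // 2, g2 // 2)] += 1
                counts[(g1 - g1 // 2, g2 - g2 // 2)] += 1
        # M-step: expected haplotype counts become the new frequencies.
        freqs = {h: counts[h] / n_chrom for h in HAPS}
    return freqs
```

On the toy sample `[(0, 0), (2, 2), (1, 1)]` the estimates move toward roughly 0.5 each for haplotypes 00 and 11: the single ambiguous individual is assigned the phasing already supported by the unambiguous ones. That behaviour is consistent with the abstract's finding that frequency estimates can be adequate even where individual phase calls are unreliable.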

3.

Haplotype inference in crossbred populations without pedigree information

Albart Coster Henri CM Heuven Rohan L Fernando Jack CM Dekkers 《Genetics Selection Evolution》2009,41(1):40

Background

Current methods for haplotype inference without pedigree information assume random mating populations. In animal and plant breeding, however, mating is often not random. A particular form of nonrandom mating occurs when parental individuals of opposite sex originate from distinct populations. In animal breeding this is called crossbreeding, and in plant breeding it is called hybridization. In these situations, association between marker and putative gene alleles might differ between the founding populations, and the origin of alleles should be accounted for in studies that estimate breeding values with marker data. The sequence of alleles from one parent constitutes one haplotype of an individual. Haplotypes thus reveal allele origin in data of crossbred individuals.

Results

We introduce a new method for haplotype inference without pedigree that allows nonrandom mating and that can use genotype data of the parental populations and of a crossbred population. The aim of the method is to estimate line origin of alleles. The method has a Bayesian setup with a Dirichlet Process (DP) as prior for the haplotypes in the two parental populations. The basic idea is that only a subset of the complete set of possible haplotypes is present in the population.

Conclusion

Line origin of approximately 95% of the alleles at heterozygous sites was assessed correctly in both simulated and real data. Comparing accuracy of haplotype frequencies inferred with the new algorithm to the accuracy of haplotype frequencies inferred with PHASE, an existing algorithm for haplotype inference, showed that the DP algorithm outperformed PHASE in situations of crossbreeding and that PHASE performed better in situations of random mating.

4.

Haplotype reconstruction from genotype data using Imperfect Phylogeny (Cited by: 13; self-citations: 0; citations by others: 13)

Halperin E Eskin E 《Bioinformatics (Oxford, England)》2004,20(12):1842-1849

Critical to the understanding of the genetic basis for complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs) which are mutations at a single nucleotide position. To characterize the genetic variation between different people, we must determine an individual's haplotype or which nucleotide base occurs at each position of these common SNPs for each chromosome. In this paper, we present results for a highly accurate method for haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes that shows that SNPs are organized in highly correlated 'blocks'. In a few recent studies, considerable parts of the human genome were partitioned into blocks, such that the majority of the sequenced genotypes have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks, and for each block, we predict the common haplotypes and each individual's haplotype. We evaluate our method over biological data. Our method predicts the common haplotypes perfectly and has a very low error rate (<2% over the data) when taking into account the predictions for the uncommon haplotypes. Our method is extremely efficient compared with previous methods such as PHASE and HAPLOTYPER. Its efficiency allows us to find the block partition of the haplotypes, to cope with missing data and to work with large datasets. AVAILABILITY: The algorithm is available via a Web server at http://www.calit2.net/compbio/hap/

5.

An efficient haplotyping method with DNA pools (Cited by: 1; self-citations: 1; citations by others: 0)

Inbar E Yakir B Darvasi A 《Nucleic acids research》2002,30(15):e76

Determination of haplotype frequencies (the joint distribution of genetic markers) in large population samples is a powerful tool for association studies. This is due to their greater extent of polymorphism since any two bi-allelic single nucleotide polymorphisms (SNPs) generate a potential four-allele genetic marker. Therefore, a haplotype may capture a given functional polymorphism with higher statistical power than its SNP components. The statistical estimation of haplotype frequencies, usually employed in linkage disequilibrium studies, requires individual genotyping for each SNP in the haplotype, thus making it an expensive process. In this study, we describe a new method for direct measurement of haplotype frequencies in DNA pools by allele-specific, long-range haplotype amplification. The proposed method allows the efficient determination of haplotypes composed of two SNPs in close vicinity (up to 20 kb).

6.

HAPLOFREQ--estimating haplotype frequencies efficiently.

Eran Halperin Elad Hazan 《Journal of computational biology》2006,13(2):481-500

A commonly used tool in disease association studies is the search for discrepancies between the haplotype distribution in the case and control populations. In order to find this discrepancy, the haplotype frequency in each of the populations is estimated from the genotypes. We present a new method HAPLOFREQ to estimate haplotype frequencies over a short genomic region given the genotypes or haplotypes with missing data or sequencing errors. Our approach incorporates a maximum likelihood model based on a simple random generative model which assumes that the genotypes are independently sampled from the population. We first show that if the phased haplotypes are given, possibly with missing data, we can estimate the frequency of the haplotypes in the population by finding the global optimum of the likelihood function in polynomial time. If the haplotypes are not phased, finding the maximum value of the likelihood function is NP-hard. In this case, we define an alternative likelihood function which can be thought of as a relaxed likelihood function. We show that the maximum relaxed likelihood can be found in polynomial time and that the optimal solution of the relaxed likelihood asymptotically approaches the haplotype frequencies in the population. In contrast to previous approaches, our algorithms are guaranteed to converge in polynomial time to a global maximum of the different likelihood functions. We compared the performance of our algorithm to the widely used program PHASE, and we found that our estimates are at least 10% more accurate than PHASE and about ten times faster than PHASE. Our techniques involve new algorithms in convex optimization. These algorithms may be of independent interest. Particularly, they may be helpful in other maximum likelihood problems arising from survey sampling.

7.

Performance of single nucleotide polymorphisms versus haplotypes for genome-wide association analysis in barley

Lorenz AJ Hamblin MT Jannink JL 《PloS one》2010,5(11):e14079

Genome-wide association studies (GWAS) may benefit from utilizing haplotype information for making marker-phenotype associations. Several rationales for grouping single nucleotide polymorphisms (SNPs) into haplotype blocks exist, but any advantage may depend on such factors as genetic architecture of traits, patterns of linkage disequilibrium in the study population, and marker density. The objective of this study was to explore the utility of haplotypes for GWAS in barley (Hordeum vulgare) to offer a first detailed look at this approach for identifying agronomically important genes in crops. To accomplish this, we used genotype and phenotype data from the Barley Coordinated Agricultural Project and constructed haplotypes using three different methods. Marker-trait associations were tested by the efficient mixed-model association algorithm (EMMA). When QTL were simulated using single SNPs dropped from the marker dataset, a simple sliding window performed as well as or better than single SNPs or the more sophisticated methods of blocking SNPs into haplotypes. Moreover, the haplotype analyses performed better 1) when QTL were simulated as polymorphisms that arose subsequent to marker variants, and 2) in analysis of empirical heading date data. These results demonstrate that the information content of haplotypes is dependent on the particular mutational and recombinational history of the QTL and nearby markers. Analysis of the empirical data also confirmed our intuition that the distribution of QTL alleles in nature is often unlike the distribution of marker variants, and hence utilizing haplotype information could capture associations that would elude single SNPs. We recommend routine use of both single SNP and haplotype markers for GWAS to take advantage of the full information content of the genotype data.
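
The "simple sliding window" mentioned in this abstract can be pictured with a short sketch. The abstract gives neither the window size nor the step used in the study, so `window` and `step` below are assumed parameters, and the function name `sliding_window_haplotypes` is hypothetical. The input is taken to be one already-phased chromosome written as a string of SNP alleles.

```python
def sliding_window_haplotypes(haplotype: str, window: int, step: int = 1) -> list[str]:
    """Cut one phased chromosome into overlapping multi-SNP haplotype alleles.

    haplotype : string with one character per SNP allele, in map order
    window    : number of consecutive SNPs per haplotype marker (assumed)
    step      : how many SNPs to advance between windows (assumed)
    """
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    # Each slice is treated as one allele of a multi-allelic haplotype marker.
    last_start = len(haplotype) - window
    return [haplotype[i:i + window] for i in range(0, last_start + 1, step)]
```

Each returned substring would then be tested against the phenotype in place of a single SNP. A chromosome shorter than the window yields an empty list.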

8.

A locus-wide approach to assessing variation in the avian MHC: the B-locus of the wild turkey

Chaves LD Faile GM Hendrickson JA Mock KE Reed KM 《Heredity》2011,107(1):40-49

Studies of major histocompatibility complex (MHC) diversity in non-model vertebrates typically focus on structure and sequence variation in the antigen-presenting loci: the highly variable and polymorphic class I and class IIB genes. Although these studies provide estimates of the number of genes and alleles/locus, they often overlook variation in functionally related and co-inherited genes important in the immune response. This study utilizes the sequence of the MHC B-locus derived from a commercial turkey to investigate MHC variation in wild birds. Sequences were obtained for nine interspersed MHC amplicons (non-class I/II) from each of 40 birds representing 3 subspecies of wild turkey (Meleagris gallopavo). Analysis of aligned sequences identified 238 single-nucleotide variants, approximately one-third of which had minor allele frequencies >0.2 in the sampled birds. PHASE analysis identified 70 prospective MHC haplotypes in the wild turkeys, whereas a combined analysis with commercial birds identified almost 100 haplotypes in the species. Denaturing gradient gel electrophoresis (DGGE) of the class IIB loci was used to test the efficacy of single-nucleotide polymorphism (SNP) haplotyping to capture locus-wide variation. Diversity in SNP haplotypes and haplotype sharing among individuals was directly reflected in the DGGE patterns. Utilization of a reference haplotype to sequence interspersed regions of the MHC has significant advantages over other methods of surveying diversity while identifying high-frequency SNPs for genotyping. SNP haplotyping provides a means to identify both divergent haplotypes and homozygous individuals for assessment of immunological variation in wild and domestic populations.

9.

Haplotype inference for present-absent genotype data using previously identified haplotypes and haplotype patterns (Cited by: 1; self-citations: 0; citations by others: 1)

Yoo YJ Tang J Kaslow RA Zhang K 《Bioinformatics (Oxford, England)》2007,23(18):2399-2406

MOTIVATION: Killer immunoglobulin-like receptor (KIR) genes vary considerably in their presence or absence on a specific regional haplotype. Because presence or absence of these genes is largely detected using locus-specific genotyping technology, the distinction between homozygosity and hemizygosity is often ambiguous. The performance of methods for haplotype inference (e.g. PL-EM, PHASE) for KIR genes may be compromised due to the large portion of ambiguous data. At the same time, many haplotypes or partial haplotype patterns have been previously identified and can be incorporated to facilitate haplotype inference for unphased genotype data. To accommodate the increased ambiguity of present-absent genotyping of KIR genes, we developed a hybrid approach combining a greedy algorithm with the Expectation-Maximization (EM) method for haplotype inference based on previously identified haplotypes and haplotype patterns. RESULTS: We implemented this algorithm in a software package named HAPLO-IHP (Haplotype inference using identified haplotype patterns) and compared its performance with that of HAPLORE and PHASE on simulated KIR genotypes. We compared five measures in order to evaluate the reliability of haplotype assignments and the accuracy in estimating haplotype frequency. Our method outperformed the two existing techniques by all five measures when either 60% or 25% of previously identified haplotypes were incorporated into the analyses. AVAILABILITY: The HAPLO-IHP is available at http://www.soph.uab.edu/Statgenetics/People/KZhang/HAPLO-IHP/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

10.

HAPLORE: a program for haplotype reconstruction in general pedigrees without recombination (Cited by: 4; self-citations: 0; citations by others: 4)

Zhang K Sun F Zhao H 《Bioinformatics (Oxford, England)》2005,21(1):90-103

MOTIVATION: Haplotype reconstruction is an essential step in genetic linkage and association studies. Although many methods have been developed to estimate haplotype frequencies and reconstruct haplotypes for a sample of unrelated individuals, haplotype reconstruction in large pedigrees with a large number of genetic markers remains a challenging problem. METHODS: We have developed an efficient computer program, HAPLORE (HAPLOtype REconstruction), to identify all haplotype sets that are compatible with the observed genotypes in a pedigree for tightly linked genetic markers. HAPLORE consists of three steps that can serve different needs in applications. In the first step, a set of logic rules is used to reduce the number of compatible haplotypes of each individual in the pedigree as much as possible. After this step, the haplotypes of all individuals in the pedigree can be completely or partially determined. These logic rules are applicable to completely linked markers and they can be used to impute missing data and check genotyping errors. In the second step, a haplotype-elimination algorithm similar to the genotype-elimination algorithms used in linkage analysis is applied to delete incompatible haplotypes derived from the first step. All superfluous haplotypes of the pedigree members will be excluded after this step. In the third step, the expectation-maximization (EM) algorithm combined with the partition and ligation technique is used to estimate haplotype frequencies based on the inferred haplotype configurations through the first two steps. Only compatible haplotype configurations with haplotypes having frequencies greater than a threshold are retained. RESULTS: We test the effectiveness and the efficiency of HAPLORE using both simulated and real datasets. Our results show that the rule-based algorithm is very efficient for completely genotyped pedigrees. In this case, almost all of the families have one unique haplotype configuration. In the presence of missing data, the number of compatible haplotypes can be substantially reduced by HAPLORE, and the program will provide all possible haplotype configurations of a pedigree under different circumstances, if such multiple configurations exist. These inferred haplotype configurations, as well as the haplotype frequencies estimated by the EM algorithm, can be used in genetic linkage and association studies. AVAILABILITY: The program can be downloaded from http://bioinformatics.med.yale.edu.
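
To make "haplotypes compatible with the observed genotypes" concrete, the sketch below enumerates every unordered haplotype pair consistent with one individual's unphased genotype. This is only the starting set that pedigree logic would then prune; it is not HAPLORE's rule-based or elimination algorithm, and the function name `compatible_pairs` and the 0/1/2 genotype coding are assumptions made for illustration.

```python
from itertools import product


def compatible_pairs(genotype):
    """Return all unordered haplotype pairs consistent with an unphased genotype.

    genotype : sequence of 0/1/2, the count of allele 1 at each SNP.
    Each haplotype is a tuple of 0/1 alleles; each pair is a sorted 2-tuple.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are fixed on both haplotypes (0 -> allele 0, 2 -> allele 1).
    base = [g // 2 for g in genotype]

    pairs = set()
    # Every assignment of alleles at heterozygous sites gives one phasing;
    # sorting the pair removes the mirror-image duplicates.
    for bits in product((0, 1), repeat=len(het_sites)):
        h1, h2 = list(base), list(base)
        for site, allele in zip(het_sites, bits):
            h1[site], h2[site] = allele, 1 - allele
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return pairs
```

An individual with k heterozygous sites has 2^(k-1) compatible pairs (one pair when k is 0), which is why pruning with family information matters as the number of markers grows.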

11.

Large-scale single-nucleotide polymorphism (SNP) and haplotype analyses, using dense SNP Maps, of 199 drug-related genes in 752 subjects: the analysis of the association between uncommon SNPs within haplotype blocks and the haplotypes constructed with haplotype-tagging SNPs (Cited by: 13; self-citations: 0; citations by others: 13)

Kamatani N Sekine A Kitamoto T Iida A Saito S Kogame A Inoue E Kawamoto M Harigai M Nakamura Y 《American journal of human genetics》2004,75(2):190-203

To optimize the strategies for population-based pharmacogenetic studies, we extensively analyzed single-nucleotide polymorphisms (SNPs) and haplotypes in 199 drug-related genes, through use of 4,190 SNPs in 752 control subjects. Drug-related genes, like other genes, have a haplotype-block structure, and a few haplotype-tagging SNPs (htSNPs) could represent most of the major haplotypes constructed with common SNPs in a block. Because our data included 860 uncommon (frequency <0.1) SNPs with frequencies that were accurately estimated, we analyzed the relationship between haplotypes and uncommon SNPs within the blocks (549 SNPs). We inferred haplotype frequencies through use of the data from all htSNPs and one of the uncommon SNPs within a block and calculated four joint probabilities for the haplotypes. We show that, irrespective of the minor-allele frequency of an uncommon SNP, the majority (mean ± SD frequency 0.943 ± 0.117) of the minor alleles were assigned to a single haplotype tagged by htSNPs if the uncommon SNP was within the block. These results support the hypothesis that recombinations occur only infrequently within blocks. The proportion of a single haplotype tagged by htSNPs to which the minor alleles of an uncommon SNP were assigned was positively correlated with the minor-allele frequency when the frequency was <0.03 (P<.000001; n=233 [Spearman's rank correlation coefficient]). The results of simulation studies suggested that haplotype analysis using htSNPs may be useful in the detection of uncommon SNPs associated with phenotypes if the frequencies of the SNPs are higher in affected than in control populations, the SNPs are within the blocks, and the frequencies of the SNPs are >0.03.

12.

Systematic analysis of sequence variability of the endothelin-1 gene: a prerequisite for association studies

Diefenbach K Arjomand-Nahad F Meisel C Fietze I Stangl K Roots I Köpke K 《Genetic testing》2006,10(3):163-168

We analyzed allele frequencies and pairwise linkage disequilibria of 13 variants in the EDN1 gene of 298 young males, the majority of German ancestry. Our analysis comprises all common variants in the five exons and flanking intronic regions, as well as known polymorphisms in the promoter sequence. In addition to previously analyzed polymorphisms, our haplotype reconstruction included five recently described variants and was done by using three different algorithms to allow inference of result stability. More than 30 haplotypes were predicted. All haplotypes with frequencies ≥1% were inferred by all three methods and can be described by seven haplotype tagging single-nucleotide polymorphisms (htSNPs), reducing the genotyping load to 65%. Three of these haplotypes with frequencies of about 11%, 9%, and 4% had been mistaken for one haplotype in the previous analysis, which included only six polymorphisms, some of them not being htSNPs. Systematic analysis of sequence variability and comprehensive haplotype analysis of the EDN1 gene determined a substantial part of its genetic variability for further association studies and helped to reduce the genotyping load for common phenotypes.

13.

High-resolution SNP scan of chromosome 6p21 in pooled samples from patients with complex diseases (Cited by: 6; self-citations: 0; citations by others: 6)

Herbon N Werner M Braig C Gohlke H Dütsch G Illig T Altmüller J Hampe J Lantermann A Schreiber S Bonifacio E Ziegler A Schwab S Wildenauer D van den Boom D Braun A Knapp M Reitmeir P Wjst M 《Genomics》2003,81(5):510-518

We apply a high-throughput protocol of chip-based mass spectrometry (matrix-assisted laser desorption/ionization time-of-flight; MALDI-TOF) as a method of screening for differences in single-nucleotide polymorphism (SNP) allele frequencies. Using pooled DNA from individuals with asthma, Crohn's disease (CD), schizophrenia, type 1 diabetes (T1D), and controls, we selected 534 SNPs from an initial set of 1435 SNPs spanning a 25-Mb region on chromosome 6p21. The standard deviations of measurements of time of flight at different dots, from different PCRs, and from different pools indicate reliable results on each analysis step. In 90% of the disease-control comparisons we found allelic differences of <10%. Of the T1D samples, which served as a positive control, 10 SNPs with significant differences were observed after taking into account multiple testing. Of these 10 SNPs, 5 are located between DQB1 and DRB1, confirming the known association with the DR3 and DR4 haplotypes whereas two additional SNPs also reproduced known associations of T1D with DOB and LTA. In the CD pool also, two earlier described associations were found with SNPs close to DRB1 and MICA. Additional associations were found in the schizophrenia and asthma pools. They should be confirmed in individual samples or can be used to develop further quality criteria for accepting true differences between pools. The determination of SNP allele frequencies in pooled DNA appears to be of value in assigning further genotyping priorities also in large linkage regions.

14.

Using DNA pools for genotyping trios

Beckman KB Abel KJ Braun A Halperin E 《Nucleic acids research》2006,34(19):e129

The genotyping of mother–father–child trios is a very useful tool in disease association studies, as trios eliminate population stratification effects and increase the accuracy of haplotype inference. Unfortunately, the use of trios for association studies may reduce power, since it requires the genotyping of three individuals where only four independent haplotypes are involved. We describe here a method for genotyping a trio using two DNA pools, thus reducing the cost of genotyping trios to that of genotyping two individuals. Furthermore, we present extensions to the method that exploit the linkage disequilibrium structure to compensate for missing data and genotyping errors. We evaluated our method on trios from CEPH pedigree 66 of the Coriell Institute. We demonstrate that the error rates in the genotype calls of the proposed protocol are comparable to those of standard genotyping techniques, although the cost is reduced considerably. The approach described is generic and it can be applied to any genotyping platform that achieves a reasonable precision of allele frequency estimates from pools of two individuals. Using this approach, future trio-based association studies may be able to increase the sample size by 50% for the same cost and thereby increase the power to detect associations.

15.

Haplotype block structure and its applications to association studies: power and study designs (Cited by: 21; self-citations: 0; citations by others: 21)

Zhang K Calabrese P Nordborg M Sun F 《American journal of human genetics》2002,71(6):1386-1394

Recent studies have shown that the human genome has a haplotype block structure, such that it can be divided into discrete blocks of limited haplotype diversity. In each block, a small fraction of single-nucleotide polymorphisms (SNPs), referred to as "tag SNPs," can be used to distinguish a large fraction of the haplotypes. These tag SNPs can potentially be extremely useful for association studies, in that it may not be necessary to genotype all SNPs; however, this depends on how much power is lost. Here we develop a simulation study to quantitatively assess the power loss for a variety of study designs, including case-control designs and case-parental control designs. First, a number of data sets containing case-parental or case-control samples are generated on the basis of a disease model. Second, a small fraction of case and control individuals in each data set are genotyped at all the loci, and a dynamic programming algorithm is used to determine the haplotype blocks and the tag SNPs based on the genotypes of the sampled individuals. Third, the statistical power of tests was evaluated on the basis of three kinds of data: (1) all of the SNPs and the corresponding haplotypes, (2) the tag SNPs and the corresponding haplotypes, and (3) the same number of randomly chosen SNPs as the number of tag SNPs and the corresponding haplotypes. We study the power of different association tests with a variety of disease models and block-partitioning criteria. Our study indicates that the genotyping efforts can be significantly reduced by the tag SNPs, without much loss of power. Depending on the specific haplotype block-partitioning algorithm and the disease model, when the identified tag SNPs are only 25% of all the SNPs, the power is reduced by only 4%, on average, compared with a power loss of approximately 12% when the same number of randomly chosen SNPs is used in a two-locus haplotype analysis. 
When the identified tag SNPs are approximately 14% of all the SNPs, the power is reduced by approximately 9%, compared with a power loss of approximately 21% when the same number of randomly chosen SNPs is used in a two-locus haplotype analysis. Our study also indicates that haplotype-based analysis can be much more powerful than marker-by-marker analysis.

16.

A comparison of phasing algorithms for trios and unrelated individuals

Marchini J Cutler D Patterson N Stephens M Eskin E Halperin E Lin S Qin ZS Munro HM Abecasis GR Donnelly P;International HapMap Consortium 《American journal of human genetics》2006,78(3):437-450

Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they have focused principally on inference from unrelated individuals, and comparisons between methods have been rather limited. Here, we describe the extension of five leading algorithms for phase inference for handling father-mother-child trios. We performed a comprehensive assessment of the methods applied to both trios and to unrelated individuals, with a focus on genomic-scale problems, using both simulated data and data from the HapMap project. The most accurate algorithm was PHASE (v2.1). For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this work had comparable but slightly worse error rates. The error rates for trios are similar to the levels of genotyping error and missing data expected. We thus conclude that all the methods considered will provide highly accurate estimates of haplotypes when applied to trio data sets. Running times differ substantially between methods. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million-SNP HapMap data set. Finally, we evaluated methods of estimating the value of r² between a pair of SNPs and concluded that all methods estimated r² well when the estimated value was ≥0.8.
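
The much lower error rates for trios than for unrelated individuals follow from a simple Mendelian rule: at most sites, the parents' genotypes fix which allele the child received from each parent. The sketch below shows that per-site rule only. It is not any of the five algorithms compared in the paper, and the function names and 0/1/2 genotype coding are illustrative assumptions.

```python
from typing import Optional, Tuple


def can_transmit(parent: int, allele: int) -> bool:
    """True if a parent with genotype 0/1/2 (count of allele 1) carries `allele`."""
    return parent != 2 * (1 - allele)


def phase_child_site(father: int, mother: int, child: int) -> Optional[Tuple[int, int]]:
    """Phase the child at one SNP as (paternal_allele, maternal_allele).

    Returns None when all three genotypes are heterozygous, the one case
    in which trio data alone cannot resolve phase at a single site.
    Raises ValueError if the genotypes are not Mendelian-consistent.
    """
    if child == 0:
        phase = (0, 0)
    elif child == 2:
        phase = (1, 1)
    elif father != 1:
        # A homozygous father can only have transmitted his one allele type.
        phase = (father // 2, 1 - father // 2)
    elif mother != 1:
        # Likewise for a homozygous mother.
        phase = (1 - mother // 2, mother // 2)
    else:
        return None
    if not (can_transmit(father, phase[0]) and can_transmit(mother, phase[1])):
        raise ValueError("genotypes are not Mendelian-consistent")
    return phase
```

Sites that return `None` are the ones statistical phasing methods still have to resolve from population information.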

17.

Comparison of haplotype inference methods using genotypic data from unrelated individuals

Xu H Wu X Spitz MR Shete S 《Human heredity》2004,58(2):63-68

OBJECTIVE: Haplotypes are gaining popularity in studies of human genetics because they contain more information than does a single gene locus. However, current high-throughput genotyping techniques cannot produce haplotype information. Several statistical methods have recently been proposed to infer haplotypes based on unphased genotypes at several loci. The accuracy, efficiency, and computational time of these methods have been under intense scrutiny. In this report, our aim was to evaluate haplotype inference methods for genotypic data from unrelated individuals. METHODS: We compared the performance of three haplotype inference methods that are currently in use--HAPLOTYPER, hap, and PHASE--by applying them to a large data set from unrelated individuals with known haplotypes. We also applied these methods to coalescent-based simulation studies using both constant size and exponential growth models. The performance of these methods, along with that of the expectation-maximization algorithm, was further compared in the context of an association study. RESULTS: While the algorithm implemented in the software PHASE was found to be the most accurate in both real and simulated data comparisons, all four methods produced good results in the association study.

18.

Using blocks of linked single nucleotide polymorphisms as highly polymorphic genetic markers for parentage analysis

Jones B Walsh D Werner L Fiumera A 《Molecular ecology resources》2009,9(2):487-497

Single nucleotide polymorphisms (SNPs) are plentiful in most genomes and amenable to high throughput genotyping, but they are not yet popular for parentage or paternity analysis. The markers are bi-allelic, so individually they contain little information about parentage, and in nonmodel organisms the process of identifying large numbers of unlinked SNPs can be daunting. We explore the possibility of using blocks of between three and 26 linked SNPs as highly polymorphic molecular markers for reconstructing male genotypes in polyandrous organisms with moderate (five offspring) to large (25 offspring) clutches of offspring. Haplotypes are inferred for each block of linked SNPs using the programs Haplore and Phase 2.1. Each multi-SNP haplotype is then treated as a separate allele, producing a highly polymorphic, 'microsatellite-like' marker. A simulation study is performed using haplotype frequencies derived from empirical data sets from Drosophila melanogaster and Mus musculus populations. We find that the markers produced are competitive with microsatellite loci in terms of single parent exclusion probabilities, particularly when using six or more linked SNPs to form a haplotype. These markers contain only modest rates of missing data and genotyping or phasing errors and thus should be seriously considered as molecular markers for parentage analysis, particularly when the study is interested in the functional significance of polymorphisms across the genome.

19.

Linkage disequilibrium grouping of single nucleotide polymorphisms (SNPs) reflecting haplotype phylogeny for efficient selection of tag SNPs (total citations: 1; self-citations: 0; citations by others: 1)

Takeuchi F Yanai K Morii T Ishinaga Y Taniguchi-Yanai K Nagano S Kato N 《Genetics》2005,170(1):291-304

Single nucleotide polymorphisms (SNPs) have been proposed to be grouped into haplotype blocks harboring a limited number of haplotypes. Within each block, the portion of haplotypes is expected to be tagged by a selected subset of SNPs; however, none of the proposed selection algorithms have been definitive. To address this issue, we developed a tag SNP selection algorithm based on grouping of SNPs by the linkage disequilibrium (LD) coefficient r² and examined five genes in three ethnic populations--the Japanese, African Americans, and Caucasians. Additionally, we investigated ethnic diversity by characterizing 979 SNPs distributed throughout the genome. Our algorithm could spare 60% of SNPs required for genotyping and limit the imprecision in allele-frequency estimation of nontag SNPs to 2% on average. We discovered the presence of a mosaic pattern of LD plots within a conventionally inferred haplotype block. This emerged because multiple groups of SNPs with strong intragroup LD were mingled in their physical positions. The pattern of LD plots showed some similarity, but the details of tag SNPs were not entirely concordant among three populations. Consequently, our algorithm utilizing LD grouping allows selection of a more faithful set of tag SNPs than do previous algorithms utilizing haplotype blocks.
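
As a rough illustration of grouping SNPs by the LD coefficient, the sketch below computes the squared correlation between two bi-allelic SNPs from phased 0/1 haplotypes and then greedily groups SNPs whose pairwise value meets a threshold. This is not the authors' algorithm: the greedy rule, the 0.8 threshold, the choice of the first SNP in each group as its tag, and all function names are assumptions made for the example, and the data are invented.

```python
def r_squared(haps, i, j):
    """LD coefficient r^2 between SNPs i and j, from phased haplotypes coded 0/1."""
    n = len(haps)
    p_i = sum(h[i] for h in haps) / n          # frequency of allele 1 at SNP i
    p_j = sum(h[j] for h in haps) / n          # frequency of allele 1 at SNP j
    p_ij = sum(1 for h in haps if h[i] == 1 and h[j] == 1) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:                             # a monomorphic SNP carries no LD information
        return 0.0
    d = p_ij - p_i * p_j                       # disequilibrium coefficient D
    return d * d / denom

def group_snps(haps, threshold=0.8):
    """Greedy grouping (an assumption): a SNP joins the first group in which
    it has r^2 >= threshold with every member; otherwise it starts a new group."""
    groups = []
    for snp in range(len(haps[0])):
        for group in groups:
            if all(r_squared(haps, snp, member) >= threshold for member in group):
                group.append(snp)
                break
        else:
            groups.append([snp])
    return groups

def pick_tags(groups):
    """Take the first SNP of each group as its tag (an arbitrary choice)."""
    return [group[0] for group in groups]

# Toy data (invented): four phased haplotypes over four SNPs.
haps = [(0, 0, 0, 1), (0, 0, 1, 0), (1, 1, 0, 1), (1, 1, 1, 0)]
groups = group_snps(haps)
tags = pick_tags(groups)
print(groups, tags)
```

Because grouping depends only on pairwise LD and not on physical position, SNPs from different groups can be interleaved along the chromosome, which is consistent with the mosaic pattern the abstract describes.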

20.

Haplotype‐based genotyping‐by‐sequencing in oat genome research

Wubishet A. Bekele Charlene P. Wight Shiaoman Chao Catherine J. Howarth Nicholas A. Tinker 《Plant biotechnology journal》2018,16(8):1452-1463

In a de novo genotyping‐by‐sequencing (GBS) analysis of short, 64‐base tag‐level haplotypes in 4657 accessions of cultivated oat, we discovered 164741 tag‐level (TL) genetic variants containing 241224 SNPs. From this, the marker density of an oat consensus map was increased by the addition of more than 70000 loci. The mapped TL genotypes of a 635‐line diversity panel were used to infer chromosome‐level (CL) haplotype maps. These maps revealed differences in the number and size of haplotype blocks, as well as differences in haplotype diversity between chromosomes and subsets of the diversity panel. We then explored potential benefits of SNP vs. TL vs. CL GBS variants for mapping, high‐resolution genome analysis and genomic selection in oats. A combined genome‐wide association study (GWAS) of heading date from multiple locations using both TL haplotypes and individual SNP markers identified 184 significant associations. A comparative GWAS using TL haplotypes, CL haplotype blocks and their combinations demonstrated the superiority of using TL haplotype markers. Using a principal component‐based genome‐wide scan, genomic regions containing signatures of selection were identified. These regions may contain genes that are responsible for the local adaptation of oats to Northern American conditions. Genomic selection for heading date using TL haplotypes or SNP markers gave comparable and promising prediction accuracies of up to r = 0.74. Genomic selection carried out in an independent calibration and test population for heading date gave promising prediction accuracies that ranged between r = 0.42 and 0.67. In conclusion, TL haplotype GBS‐derived markers facilitate genome analysis and genomic selection in oat.