1.

Internal algorithm variability and among-algorithm discordance in statistical haplotype reconstruction

Zu-Shi Huang Ya-Jie Ji De-Xing Zhang 《Molecular ecology》2009,18(8):1556-1559

The potential effectiveness of statistical haplotype inference has made it an area of active exploration over the last decade. Statistical inference faces several complications: the same algorithm can produce different solutions for the same data set, which reflects internal algorithm variability; different algorithms can give different solutions for the same data set, reflecting discordance among algorithms; and the algorithms per se cannot evaluate the reliability of their solutions even when those solutions are unique, a general limitation of all inference methods. To increase confidence in the results of statistical inference, a consensus strategy appears to be an effective way to deal with these problems. Several authors have explored this with different emphases. Here we discuss two recent studies that examine internal algorithm variability and among-algorithm discordance, respectively, and evaluate the different outcomes of these analyses in light of Orzack's (2009) comment. Until better methods are developed, a combination of these two approaches should provide a practical way to increase the confidence of statistical haplotyping results.
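The consensus strategy described above can be illustrated with a minimal majority-vote sketch in Python. It assumes, hypothetically, that each phasing run (a repeat of one algorithm or a different algorithm) yields a dict mapping an individual ID to a pair of haplotype strings. The function name `consensus_phase` and the agreement fraction it reports are illustrative choices, not the procedure used in either of the studies discussed.

```python
from collections import Counter


def consensus_phase(runs):
    """Majority-vote consensus over several phasing solutions (illustrative sketch).

    runs: list of dicts, each mapping individual ID -> (hap1, hap2).
    Returns a dict mapping ID -> (consensus pair, fraction of runs that agree).
    """
    consensus = {}
    for ind in runs[0]:
        # A diploid solution is an unordered pair, so (A, B) and (B, A) are the same vote.
        votes = Counter(tuple(sorted(run[ind])) for run in runs)
        # most_common() keeps the first-seen pair when counts are tied.
        pair, count = votes.most_common(1)[0]
        consensus[ind] = (pair, count / len(runs))
    return consensus


runs = [
    {"ind1": ("0101", "1010"), "ind2": ("0011", "1100")},
    {"ind1": ("1010", "0101"), "ind2": ("0110", "1001")},
    {"ind1": ("0101", "1010"), "ind2": ("0011", "1100")},
]
for ind, (pair, agreement) in consensus_phase(runs).items():
    print(ind, pair, round(agreement, 2))
```

A low agreement fraction (here `ind2`, where only 2 of 3 runs agree) flags individuals whose inferred phase is unstable across runs or algorithms.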

2.

Systematic analysis of sequence variability of the endothelin-1 gene: a prerequisite for association studies

Diefenbach K Arjomand-Nahad F Meisel C Fietze I Stangl K Roots I Köpke K 《Genetic testing》2006,10(3):163-168

We analyzed allele frequencies and pairwise linkage disequilibria of 13 variants in the EDN1 gene in 298 young males, most of German ancestry. Our analysis covers all common variants in the five exons and flanking intronic regions, as well as known polymorphisms in the promoter sequence. In addition to previously analyzed polymorphisms, our haplotype reconstruction included five recently described variants and was performed with three different algorithms to assess the stability of the results. More than 30 haplotypes were predicted. All haplotypes with frequencies ≥ 1% were inferred by all three methods and can be described by seven haplotype-tagging single-nucleotide polymorphisms (htSNPs), reducing the genotyping load to 65%. Three of these haplotypes, with frequencies of about 11%, 9%, and 4%, had been mistaken for a single haplotype in the previous analysis, which included only six polymorphisms, some of which are not htSNPs. Systematic analysis of sequence variability and comprehensive haplotype analysis of the EDN1 gene captured a substantial part of its genetic variability for further association studies and helped to reduce the genotyping load for common phenotypes.

3.

Haplotype association analysis of human disease traits using genotype data of unrelated individuals

Tan Q Christiansen L Christensen K Bathum L Li S Zhao JH Kruse TA 《Genetical research》2005,86(3):223-231

Haplotype inference has become an important part of human genetic data analysis because of its functional and statistical advantages over the single-locus approach in linkage disequilibrium mapping. Different statistical methods have been proposed for detecting haplotype-disease associations using unphased multi-locus genotype data, ranging from the early simple gene-counting method to recent work using the generalized linear model. However, these methods are either confined to the case-control design or unable to yield unbiased point and interval estimates of haplotype effects. Based on the popular logistic regression model, we present a new approach for haplotype association analysis of human disease traits. Using a haplotype-based parameterization, our model estimates the effects of specific haplotypes (point estimation) and constructs confidence intervals for haplotype risks (interval estimation). Based on the estimated parameters, the model calculates haplotype frequencies conditional on the trait value for both discrete and continuous traits. Moreover, our model provides an overall significance level for the association between the disease trait and a group of haplotypes or all of them. Because haplotype estimation uses direct maximization, our method also facilitates a computer simulation approach for correcting the significance levels of individual haplotypes for multiple testing. By applying the model to an empirical data set, we show that our method, based on the well-known logistic regression model, is a useful tool for haplotype association analysis of human disease traits.

4.

An anomalous haplotype distribution of the arrestin domain-containing 4 gene (ARRDC4) haplotypes in Caucasians

Knoll B Goldammer M Wojewoda A Flügge J Johne A Mrozikiewicz PM Roots I Köpke K 《Genetic testing》2008,12(1):147-152

Little was known about the sequence variability of the human arrestin domain-containing 4 gene (ARRDC4). We sequenced its DNA from exon 2 to exon 8 in a sample of 92 Russians. Seven variants were identified, one of which had not been described before; it causes an amino acid change from Thr to Met. The identified variants were genotyped in the complete sample of 253 unrelated men and women to analyze the haplotype distribution. Fifteen haplotypes were inferred, nine of which had estimated frequencies > 1%. Ninety-five percent of all haplotypes were determined by five haplotype-tagging single-nucleotide polymorphisms. The haplotypes form two clades, and the two most common haplotypes account for 76% of all haplotypes. The certainty of the haplotype reconstruction does not depend on the haplotype-inference algorithm; it is a result of the anomalous haplotype distribution of ARRDC4, which makes this gene a suitable candidate for haplotype association studies. Interestingly, there is a great evolutionary distance between the two most common haplotypes, which could suggest a more complicated coalescent process involving past gene flow, selection, or bottlenecks.

5.

Inference on haplotype effects in case-control studies using unphased genotype data (total citations: 9; self-citations: 0; citations by others: 9)

Epstein MP Satten GA 《American journal of human genetics》2003,73(6):1316-1329

A variety of statistical methods exist for detecting haplotype-disease association through use of genetic data from a case-control study. Since such data often consist of unphased genotypes (resulting in haplotype ambiguity), such statistical methods typically apply the expectation-maximization (EM) algorithm for inference. However, the majority of these methods fail to perform inference on the effect of particular haplotypes or haplotype features on disease risk. Since such inference is valuable, we develop a retrospective likelihood for estimating and testing the effects of specific features of single-nucleotide polymorphism (SNP)-based haplotypes on disease risk using unphased genotype data from a case-control study. Our proposed method has a flexible structure that allows, among other choices, modeling of multiplicative, dominant, and recessive effects of specific haplotype features on disease risk. In addition, our method relaxes the requirement of Hardy-Weinberg equilibrium of haplotype frequencies in case subjects, which is typically required of EM-based haplotype methods. Also, our method easily accommodates missing SNP information. Finally, our method allows for asymptotic, permutation-based, or bootstrap inference. We apply our method to case-control SNP genotype data from the Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus (FUSION) Genetics study and identify two haplotypes that appear to be significantly associated with type 2 diabetes. Using the FUSION data, we assess the accuracy of asymptotic P values by comparing them with P values obtained from a permutation procedure. We also assess the accuracy of asymptotic confidence intervals for relative-risk parameters for haplotype effects, by a simulation study based on the FUSION data.
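For readers unfamiliar with the EM step that such methods build on, the following is a minimal Python sketch of the textbook EM estimate of haplotype frequencies from unphased SNP genotypes. It assumes biallelic SNPs coded 0/1/2 (copies of allele 1), Hardy-Weinberg equilibrium in the whole sample, and brute-force enumeration of phase configurations, so it is only practical for a handful of SNPs. It is not Epstein and Satten's retrospective likelihood, which among other things relaxes the Hardy-Weinberg assumption in cases; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with a 0/1/2-coded genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    # Fix allele 1 at the first heterozygous site on hap1 so each pair is listed once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for site, b in zip(het, (1,) + bits):
            h1[site], h2[site] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_freqs(genotypes, max_iter=200, tol=1e-10):
    """EM estimate of haplotype frequencies under Hardy-Weinberg equilibrium."""
    all_pairs = [compatible_pairs(g) for g in genotypes]
    haps = {h for pairs in all_pairs for pair in pairs for h in pair}
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    n_chrom = 2 * len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in all_pairs:
            # E-step: P(pair | genotype) is proportional to p(h1) * p(h2). The HWE
            # factor of 2 for h1 != h2 is the same for all pairs of one individual,
            # so it cancels in the normalisation.
            weights = [freqs[a] * freqs[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: new frequency = expected haplotype count / number of chromosomes.
        new = {h: counts[h] / n_chrom for h in haps}
        converged = max(abs(new[h] - freqs[h]) for h in haps) < tol
        freqs = new
        if converged:
            break
    return freqs


genotypes = [(0, 0), (0, 0), (2, 2), (1, 1)]  # the last individual is phase-ambiguous
for hap, f in sorted(em_haplotype_freqs(genotypes).items()):
    print(hap, round(f, 4))
```

In this toy sample the unambiguous (0, 0) and (1, 1) haplotypes pull the double heterozygote toward the (0, 0)/(1, 1) resolution, and the frequencies of (0, 1) and (1, 0) shrink toward zero.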

6.

HapScope: a software system for automated and visual analysis of functionally annotated haplotypes

Zhang J Rowe WL Struewing JP Buetow KH 《Nucleic acids research》2002,30(23):5213-5221

We have developed a software analysis package, HapScope, which includes a comprehensive analysis pipeline and a sophisticated visualization tool for analyzing functionally annotated haplotypes. The HapScope analysis pipeline supports: (i) computational haplotype construction with an expectation-maximization or Bayesian statistical algorithm; (ii) SNP classification by protein coding change, homology to model organisms or putative regulatory regions; and (iii) minimum SNP subset selection by either a Brute Force Algorithm or a Greedy Partition Algorithm. The HapScope viewer displays genomic structure with haplotype information in an integrated environment, providing eight alternative views for assessing genetic and functional correlation. It has a user-friendly interface for: (i) haplotype block visualization; (ii) SNP subset selection; (iii) haplotype consolidation with subset SNP markers; (iv) incorporation of both experimentally determined haplotypes and computational results; and (v) data export for additional analysis. Comparison of haplotypes constructed by the statistical algorithms with those determined experimentally shows variation in haplotype prediction accuracies in genomic regions with different levels of nucleotide diversity. We have applied HapScope in analyzing haplotypes for candidate genes and genomic regions with extensive SNP and genotype data. We envision that the systematic approach of integrating functional genomic analysis with population haplotypes, supported by HapScope, will greatly facilitate current genetic disease research.
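HapScope's own Greedy Partition Algorithm is not described in this abstract, so the following Python sketch only illustrates the general idea of greedy SNP subset selection under a stated assumption: the goal is taken to be a set of SNP columns that still distinguishes every distinct haplotype, and at each step the SNP that splits the haplotypes into the most groups is added. The function name is hypothetical, and unlike a brute-force search the greedy result is not guaranteed to be minimal.

```python
def greedy_tag_snps(haplotypes):
    """Greedily pick SNP columns until every distinct haplotype has a unique pattern.

    haplotypes: equal-length strings (or tuples) of alleles, one per chromosome.
    Returns the chosen column indices in the order they were selected.
    """
    haps = sorted(set(haplotypes))
    n_sites = len(haps[0])
    chosen = []

    def n_groups(cols):
        # Number of distinct allele patterns when only `cols` are genotyped.
        return len({tuple(h[c] for c in cols) for h in haps})

    while n_groups(chosen) < len(haps):
        remaining = [c for c in range(n_sites) if c not in chosen]
        # max() keeps the first (lowest-index) column among equally good candidates.
        best = max(remaining, key=lambda c: n_groups(chosen + [c]))
        chosen.append(best)
    return chosen


print(greedy_tag_snps(["0011", "0101", "1001", "0011"]))  # prints [0, 1]
```

Because distinct haplotypes always differ somewhere, the loop is guaranteed to stop once enough columns are chosen; in the worst case it selects every column.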

7.

Accurate haplotype inference for multiple linked single-nucleotide polymorphisms using sibship data (total citations: 1; self-citations: 0; citations by others: 1)

Liu PY Lu Y Deng HW 《Genetics》2006,174(1):499-509

Sibships are commonly used in the genetic dissection of complex diseases, particularly late-onset diseases. Haplotype-based association studies have been advocated as powerful tools for fine mapping and positional cloning of complex disease genes. Existing methods for haplotype inference using data from relatives were originally developed for pedigree data. In this study, we propose a new statistical method for haplotype inference for multiple tightly linked single-nucleotide polymorphisms (SNPs) that is tailored to the large amount of sibship data that has been accumulated. The method is implemented via an expectation-maximization (EM) algorithm without the usual assumption of linkage equilibrium among markers. Our EM algorithm incurs no extra computational burden for haplotype inference using sibship data compared with using unrelated parental data, and its computational efficiency is not affected by increasing sibship size. We examined the robustness and statistical performance of the new method on simulated data created from an empirical haplotype data set of the human growth hormone gene 1. The utility of our method is illustrated by an application to the analysis of haplotypes of three candidate genes for osteoporosis.

8.

Molecular phylogeography, reticulation, and lineage sorting in Mediterranean Senecio sect. Senecio (Asteraceae) (total citations: 1; self-citations: 0; citations by others: 1)

Hans Peter Comes Richard J. Abbott 《Evolution; international journal of organic evolution》2001,55(10):1943-1962

The Mediterranean species complex of Senecio serves to illustrate evolutionary processes that are likely to confound phylogenetic inference, including rapid diversification, gene tree‐species tree discordance, reticulation, interlocus concerted evolution, and lack of complete lineage sorting. Phylogeographic patterns of chloroplast DNA (cpDNA) haplotype variation were studied by sampling 156 populations (502 individuals) across 18 species of the complex, and a species phylogeny was reconstructed based on sequences from the internal transcribed spacer (ITS) regions of nuclear ribosomal DNA. For a subset of species, randomly amplified polymorphic DNAs (RAPDs) provided reference points for comparison with the cpDNA and ITS datasets. Two classes of cpDNA haplotypes were identified, with each predominating in certain parts of the Mediterranean region. However, with the exception of S. gallicus, intraspecific phylogeographic structure is limited, and only a few haplotypes detected were species‐specific. Nuclear sequence divergence is low, and several unresolved phylogenetic groupings are suggestive of near simultaneous diversification. Two well‐supported ITS clades contain the majority of species, amongst which there is a pronounced sharing of cpDNA haplotypes. Our data are not capable of diagnosing the relative impact of reticulation versus insufficient lineage sorting for the entire complex. However, there is firm evidence that S. flavus subsp. breviflorus and S. rupestris have acquired cpDNA haplotypes and ITS sequences from co‐occurring species by reticulation. In contrast, insufficient lineage sorting is a viable hypothesis for cpDNA haplotypes shared between S. gallicus and its close relatives. We estimated the minimum coalescent times for these haplotypes by utilizing the inferred species phylogeny and associated divergence times. Our data suggest that ancestral cpDNA polymorphisms may have survived for ca. 0.4–1.0 million years, depending on molecular clock calibrations.

9.

Accuracy of haplotype estimation in a region of low linkage disequilibrium

Avery CL Martin LJ Williams JT North KE 《BMC genetics》2005,6(Suppl 1):S80

We compared the accuracy of haplotype inference in a 6-Mb region on chromosome 7 where significant linkage between a brain oscillation phenotype and a cholinergic muscarinic receptor gene was previously reported. Individual haplotype assignments and haplotype frequencies were estimated using 5, 10, and 14 consecutive Illumina single-nucleotide polymorphisms (SNPs) within the 1-LOD unit support interval of the chromosome 7 linkage peak. First, haplotypes were constructed using the phase information provided by relatives in the pedigree analysis package MERLIN. Population-based haplotypes were then inferred from unrelated individuals using the haplotype estimation software HAPLO.STATS and PHASE. The 14 SNPs in this region exhibited markedly low linkage disequilibrium: the average D' between SNPs was 0.18 (range: 0.01-0.97). Compared with the family-based haplotypes calculated in MERLIN, the computationally inferred individual haplotype assignments were most accurate with 5 consecutive SNPs but degraded dramatically with 10 or 14 SNPs, in both PHASE and HAPLO.STATS. Both inference methods, PHASE and HAPLO.STATS, performed poorly. These analyses underscore the difficulty of haplotype estimation in the presence of low linkage disequilibrium and stress the importance of carefully considering confidence measures when using estimated haplotype frequencies and individual assignments in biomedical research.
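The D' statistic quoted above (average 0.18) measures pairwise linkage disequilibrium normalised by its maximum possible value given the allele frequencies. As a minimal Python sketch, assuming two biallelic SNPs and phased haplotypes coded as (allele at SNP 1, allele at SNP 2) with alleles 0/1, it can be computed as follows. The function name is hypothetical, and averages of D' are often reported on the absolute value |D'|.

```python
def d_prime(haplotypes):
    """Lewontin's D' for two biallelic SNPs from phased 0/1 haplotypes.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples.
    Returns the signed D' in [-1, 1], or NaN if either SNP is monomorphic.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n       # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n       # frequency of allele 1 at SNP 2
    p_ab = sum(a * b for a, b in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d / d_max if d_max > 0 else float("nan")


print(d_prime([(1, 1), (0, 0), (1, 1), (0, 0)]))  # complete LD: 1.0
print(d_prime([(1, 1), (1, 0), (0, 1), (0, 0)]))  # no LD: 0.0
```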

10.

Accuracy assessment of diploid consensus sequences

Kim JH Waterman MS Li LM 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2007,4(1):88-97

If the chromosome of origin of every fragment were known in a genome sequencing project, it would be straightforward to reconstruct diploid consensus sequences. In reality, however, this information is not available. Although methods have been proposed to reconstruct haplotypes from genome sequencing projects, an accuracy assessment is required to evaluate the confidence of the estimated diploid consensus sequences. In this paper, we define a confidence score for diploid consensus sequences. It requires calculating the likelihood of an assembly, for which we propose an algorithm that runs in time linear in the number of polymorphic sites. The likelihood calculation and confidence score are used to improve haplotype estimation in two directions. First, low-scoring phases are disconnected. Second, instead of using the nominal frequency of 1/2, the haplotype frequency is estimated to reflect the actual contribution of each haplotype. Our method was evaluated on simulated data whose polymorphism rate (1.2 percent) was based on Ciona intestinalis. The results indicated that our algorithm is highly accurate: the true positive rate of the haplotype estimation was greater than 97 percent.

11.

Haplotype inference in general pedigrees using the cluster variation method (total citations: 1; self-citations: 0; citations by others: 1)

Albers CA Heskes T Kappen HJ 《Genetics》2007,177(2):1101-1116

We present CVMHAPLO, a probabilistic method for haplotyping in general pedigrees with many markers. CVMHAPLO reconstructs the haplotypes by assigning, in every iteration, a fixed number of the ordered genotypes with the highest marginal probability, conditioned on the marker data and the ordered genotypes assigned in previous iterations. CVMHAPLO uses the cluster variation method (CVM) to estimate the marginal probabilities efficiently. We focused on single-nucleotide polymorphism (SNP) markers in the evaluation of our approach. In simulated data sets where exact computation was feasible, we found that the accuracy of CVMHAPLO was high and similar to that of maximum-likelihood methods. In simulated data sets where exact computation of the maximum-likelihood haplotype configuration was not feasible, the accuracy of CVMHAPLO was similar to that of state-of-the-art Markov chain Monte Carlo (MCMC) maximum-likelihood approximations when all ordered genotypes were assigned, and higher when only a subset of the ordered genotypes was assigned. CVMHAPLO was faster than the MCMC approach and provided more detailed information about the uncertainty in the inferred haplotypes. We conclude that CVMHAPLO is a practical tool for the inference of haplotypes in large complex pedigrees.

12.

Analysis and exploration of the use of rule-based algorithms and consensus methods for the inferral of haplotypes (total citations: 2; self-citations: 0; citations by others: 2)

Orzack SH Gusfield D Olson J Nesbitt S Subrahmanyan L Stanton VP 《Genetics》2003,165(2):915-928

The difficulty of experimental determination of haplotypes from phase-unknown genotypes has stimulated the development of nonexperimental inferral methods. One well-known approach for a group of unrelated individuals involves using the trivially deducible haplotypes (those found in individuals with zero or one heterozygous sites) and a set of rules to infer the haplotypes underlying ambiguous genotypes (those with two or more heterozygous sites). Neither the manner in which this "rule-based" approach should be implemented nor the accuracy of this approach has been adequately assessed. We implemented eight variations of this approach that differed in how a reference list of haplotypes was derived and in the rules for the analysis of ambiguous genotypes. We assessed the accuracy of these variations by comparing predicted and experimentally determined haplotypes involving nine polymorphic sites in the human apolipoprotein E (APOE) locus. The eight variations resulted in substantial differences in the average number of correctly inferred haplotype pairs. More than one set of inferred haplotype pairs was found for each of the variations we analyzed, implying that the rule-based approach is not sufficient by itself for haplotype inferral, despite its appealing simplicity. Accordingly, we explored consensus methods in which multiple inferrals for a given ambiguous genotype are combined to generate a single inferral; we show that the set of these "consensus" inferrals for all ambiguous genotypes is more accurate than the typical single set of inferrals chosen at random. We also use a consensus prediction to divide ambiguous genotypes into those whose algorithmic inferral is certain or almost certain and those whose less certain inferral makes molecular inferral preferable.
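The basic rule-based approach described above (often associated with Clark's 1990 algorithm) can be sketched in Python as follows. Assumptions: biallelic SNPs coded 0/1/2; the reference list is seeded from genotypes with zero or one heterozygous site; and each ambiguous genotype is resolved with the first compatible haplotype in the list. This is only one of many possible variants, and the function names are hypothetical. Because the outcome depends on list and processing order, changing the input can change the inferred pairs, which is the kind of variability the abstract reports.

```python
def complement(genotype, hap):
    """Return the haplotype that pairs with `hap` to give `genotype`, or None."""
    other = tuple(g - a for g, a in zip(genotype, hap))
    return other if all(x in (0, 1) for x in other) else None


def rule_based_inference(genotypes):
    """Clark-style inference: returns {index: (hap1, hap2)}; unresolved genotypes are omitted."""
    known, result = [], {}

    def remember(h):
        if h not in known:
            known.append(h)

    # Seed: genotypes with 0 or 1 heterozygous sites have only one possible phase.
    for i, g in enumerate(genotypes):
        if sum(1 for x in g if x == 1) <= 1:
            h1 = tuple(1 if x == 2 else 0 for x in g)
            h2 = complement(g, h1)
            result[i] = (h1, h2)
            remember(h1)
            remember(h2)

    # Repeatedly resolve ambiguous genotypes with the first compatible known haplotype.
    changed = True
    while changed:
        changed = False
        for i, g in enumerate(genotypes):
            if i in result:
                continue
            for h in known:
                other = complement(g, h)
                if other is not None:
                    result[i] = (h, other)
                    remember(other)  # safe: we stop iterating over `known` right after
                    changed = True
                    break
    return result


print(rule_based_inference([(0, 0, 0), (1, 0, 0), (2, 1, 0), (1, 1, 1)]))
```

With this input the ambiguous genotype (1, 1, 1) is resolved as (0, 0, 0)/(1, 1, 1). If only (2, 1, 0) and (1, 1, 1) are supplied, the reference list starts with (1, 0, 0) instead, and the same genotype is resolved as (1, 0, 0)/(0, 1, 1).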

13.

Haplotype reconstruction from SNP alignment (total citations: 4; self-citations: 0; citations by others: 4)

Lei M Li Jong Hyun Kim Michael S Waterman 《Journal of computational biology》2004,11(2-3):505-516

In this paper, we describe a method for statistical reconstruction of haplotypes from a set of aligned SNP fragments. We consider the case of a pair of homologous human chromosomes, one from the mother and the other from the father. After fragment assembly, we wish to reconstruct the two parental haplotypes. Given a set of potential SNP sites inferred from the assembly alignment, we wish to divide the fragment set into two subsets, each of which represents one chromosome. Our method is based on a statistical model of sequencing errors, compositional information, and haplotype memberships. We calculate the probabilities of different haplotypes conditional on the alignment. Because of computational complexity, we first determine the phases of neighboring SNPs and then connect them to construct haplotype segments. We also compute the accuracy, or confidence, of the reconstructed haplotypes. We discuss other issues, such as alternative methods, parameter estimation, computational efficiency, and relaxation of assumptions.

14.

Extended MHC haplotypes and CYP21/C4 gene organisation in Irish 21-hydroxylase deficiency families

Paul J. Sinnott Colm Costigan Philip A. Dyer Rodney Harris Tom Strachan 《Human genetics》1991,87(3):361-366

We have analysed fifteen classical 21-hydroxylase deficiency families from throughout Southern Ireland and report the serologically defined HLA-A, HLA-B, HLA-Cw, HLA-DR, C4A and C4B polymorphisms that characterize the inferred disease haplotypes. Additionally, we have used a combination of short- and long-range restriction mapping procedures to characterize the CYP21/C4 gene organization associated with individual serologically defined haplotypes. The results indicate that disease haplotypes are characterized by a high frequency (33%) of CYP21B gene deletion, and 8 of the 10 deletion haplotypes are represented by the extended haplotype HLA-DR1, C4BQo, C4A3, HLA-B40(w60), HLA-Cw3, HLA-A3. Large-scale length polymorphism in the CYP21/C4 gene cluster was found to conform strictly to a variable number of tandem repeats model, with 4 alleles detected. Disease haplotypes in which defective CYP21B gene expression is inferred to result from pathological point mutations show extensive diversity of associated HLA markers and include two examples of the extended haplotype HLA-DR3, B8, Cw7, A1, which has previously been reported to be negatively associated with 21-hydroxylase deficiency. One unusual disease haplotype has two CYP21 + C4 units, both of which appear to contain CYP21B-like genes.

15.

A statistical framework for haplotype block inference

Yuan A Chen G Rotimi C Bonney GE 《Journal of bioinformatics and computational biology》2005,3(5):1021-1038

The existence of haplotype blocks transmitted from parents to offspring has recently been suggested, which has created interest in inferring block structure and length. The motivation is that well-characterized haplotype blocks should make it easier to map the genes underlying human diseases quickly. To study haplotype block inference systematically, we propose a statistical framework in which optimal haplotype block partitioning is formulated as a statistical model selection problem; missing data can be handled in a standard statistical way; population strata can be incorporated; block structure inference and hypothesis testing can be performed; and prior knowledge, if available, can be incorporated to perform Bayesian inference. The algorithm is linear in the number of loci, rather than NP-hard as for many such algorithms. We illustrate the application of our method to both simulated and real data sets.

16.

Using an uncertainty-coding matrix in Bayesian regression models for haplotype-specific risk detection in family association studies

Huang YH Lee MH Chen WJ Hsiao CK 《PloS one》2011,6(7):e21890

Haplotype association studies based on family genotype data can provide more biological information than single-marker association studies. Difficulties arise, however, in inferring haplotype phase and haplotype transmission/non-transmission status. Incorporating the uncertainty associated with haplotype inference into regression models requires special care, and the task becomes even more complicated when the genetic region contains a large number of haplotypes. To avoid the curse of dimensionality, we employ a clustering algorithm based on the evolutionary relationships among haplotypes and retain for regression analysis only the ancestral core haplotypes it identifies. To integrate the three sources of variation (phase ambiguity, transmission status, and ancestral uncertainty), we propose an uncertainty-coding matrix that combines these three types of variability simultaneously. We then evaluate haplotype risk using this matrix in a Bayesian conditional logistic regression model. Simulation studies and one application, a schizophrenia multiplex family study, are presented, and the results are compared with those from other family-based analysis tools such as FBAT. Our proposed method (Bayesian regression using an uncertainty-coding matrix, BRUCM) is shown to perform better, and its implementation in R is freely available.

17.

Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals (total citations: 46; self-citations: 0; citations by others: 46)

Zaykin DV Westfall PH Young SS Karnoub MA Wagner MJ Ehm MG 《Human heredity》2002,53(2):79-91

There have been increasing efforts to relate drug efficacy and disease predisposition to genetic polymorphisms. We present statistical tests for association of haplotype frequencies with discrete and continuous traits in samples of unrelated individuals. Haplotype frequencies are estimated with the expectation-maximization algorithm, and each individual in the sample is expanded into all possible haplotype configurations with corresponding probabilities, conditional on their genotype. A regression-based approach is then used to relate the inferred haplotype probabilities to the response. The relationship of this technique to commonly used approaches developed for case-control data is discussed. We confirm the proper size of the test under H0 and, by comparing test results using inferred haplotypes with single-marker tests on simulated data, find an increase in power under the alternative. More importantly, analysis of real data composed of a dense map of single-nucleotide polymorphisms spaced along a 12-cM chromosomal region allows us to confirm the utility of the haplotype approach as well as the validity and usefulness of the proposed statistical technique. The method appears to be successful in relating data from multiple, correlated markers to the response.
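The step in which each individual is expanded into all haplotype configurations weighted by their conditional probabilities can be sketched in Python as below. The sketch assumes Hardy-Weinberg equilibrium, 0/1/2-coded biallelic SNPs, and haplotype frequencies that have already been estimated (for example, by EM). It reduces each individual's configurations to an expected copy number of one target haplotype, which could then enter an ordinary regression as a covariate. The function names and this particular coding are illustrative assumptions, not necessarily the parameterization used by Zaykin and colleagues.

```python
from itertools import product


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with a 0/1/2-coded genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    # Fix allele 1 at the first heterozygous site on hap1 so each pair is listed once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for site, b in zip(het, (1,) + bits):
            h1[site], h2[site] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def expected_dosage(genotype, freqs, target):
    """Expected copies of `target` given the genotype, with P(pair) proportional to p(h1) * p(h2)."""
    pairs = compatible_pairs(genotype)
    weights = [freqs.get(a, 0.0) * freqs.get(b, 0.0) for a, b in pairs]
    total = sum(weights)
    if total == 0:
        raise ValueError("no compatible configuration has non-zero frequency")
    copies = [(a == target) + (b == target) for a, b in pairs]
    return sum(w * c for w, c in zip(weights, copies)) / total


freqs = {(0, 0): 0.4, (1, 1): 0.3, (0, 1): 0.2, (1, 0): 0.1}  # hypothetical EM output
for g in [(0, 0), (1, 1), (2, 2), (1, 2)]:
    print(g, round(expected_dosage(g, freqs, target=(1, 1)), 3))
```

Unambiguous genotypes get an integer dosage, while the double heterozygote (1, 1) gets a fractional value that reflects the relative frequencies of its two possible phase configurations.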

18.

A comparison of phasing algorithms for trios and unrelated individuals

Marchini J Cutler D Patterson N Stephens M Eskin E Halperin E Lin S Qin ZS Munro HM Abecasis GR Donnelly P; International HapMap Consortium 《American journal of human genetics》2006,78(3):437-450

Knowledge of haplotype phase is valuable for many analysis methods in the study of disease, population, and evolutionary genetics. Considerable research effort has been devoted to the development of statistical and computational methods that infer haplotype phase from genotype data. Although a substantial number of such methods have been developed, they have focused principally on inference from unrelated individuals, and comparisons between methods have been rather limited. Here, we describe the extension of five leading algorithms for phase inference for handling father-mother-child trios. We performed a comprehensive assessment of the methods applied to both trios and to unrelated individuals, with a focus on genomic-scale problems, using both simulated data and data from the HapMap project. The most accurate algorithm was PHASE (v2.1). For this method, the percentages of genotypes whose phase was incorrectly inferred were 0.12%, 0.05%, and 0.16% for trios from simulated data, HapMap Centre d'Etude du Polymorphisme Humain (CEPH) trios, and HapMap Yoruban trios, respectively, and 5.2% and 5.9% for unrelated individuals in simulated data and the HapMap CEPH data, respectively. The other methods considered in this work had comparable but slightly worse error rates. The error rates for trios are similar to the levels of genotyping error and missing data expected. We thus conclude that all the methods considered will provide highly accurate estimates of haplotypes when applied to trio data sets. Running times differ substantially between methods. Although it is one of the slowest methods, PHASE (v2.1) was used to infer haplotypes for the 1 million-SNP HapMap data set. Finally, we evaluated methods of estimating the value of r² between a pair of SNPs and concluded that all methods estimated r² well when the estimated value was ≥0.8.
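The r² quantity evaluated above is the squared correlation between the alleles carried at two SNPs on the same haplotype; the methods estimate it from genotypes, whereas the sketch below computes it directly from phased haplotypes. This minimal Python sketch assumes two biallelic SNPs coded as (allele at SNP 1, allele at SNP 2) with alleles 0/1; the function name is hypothetical.

```python
def r_squared(haplotypes):
    """r^2 = D^2 / (pA(1 - pA) pB(1 - pB)) for two biallelic SNPs, from phased 0/1 haplotypes."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n       # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n       # frequency of allele 1 at SNP 2
    p_ab = sum(a * b for a, b in haplotypes) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either SNP is monomorphic.
    return d * d / denom if denom > 0 else float("nan")


print(r_squared([(1, 1), (1, 0), (0, 0), (0, 0)]))
```

Unlike D', r² also penalises differences in allele frequency between the two SNPs: in this example D' equals 1, yet r² is only 1/3.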

19.

HAPLORE: a program for haplotype reconstruction in general pedigrees without recombination (total citations: 4; self-citations: 0; citations by others: 4)

Zhang K Sun F Zhao H 《Bioinformatics (Oxford, England)》2005,21(1):90-103

MOTIVATION: Haplotype reconstruction is an essential step in genetic linkage and association studies. Although many methods have been developed to estimate haplotype frequencies and reconstruct haplotypes for a sample of unrelated individuals, haplotype reconstruction in large pedigrees with a large number of genetic markers remains a challenging problem. METHODS: We have developed an efficient computer program, HAPLORE (HAPLOtype REconstruction), to identify all haplotype sets that are compatible with the observed genotypes in a pedigree for tightly linked genetic markers. HAPLORE consists of three steps that can serve different needs in applications. In the first step, a set of logic rules is used to reduce the number of compatible haplotypes of each individual in the pedigree as much as possible. After this step, the haplotypes of all individuals in the pedigree can be completely or partially determined. These logic rules are applicable to completely linked markers, and they can be used to impute missing data and check genotyping errors. In the second step, a haplotype-elimination algorithm similar to the genotype-elimination algorithms used in linkage analysis is applied to delete incompatible haplotypes derived from the first step. All superfluous haplotypes of the pedigree members are excluded after this step. In the third step, the expectation-maximization (EM) algorithm combined with the partition and ligation technique is used to estimate haplotype frequencies based on the haplotype configurations inferred in the first two steps. Only compatible haplotype configurations with haplotypes having frequencies greater than a threshold are retained. RESULTS: We test the effectiveness and efficiency of HAPLORE using both simulated and real datasets. Our results show that the rule-based algorithm is very efficient for completely genotyped pedigrees. In this case, almost all of the families have one unique haplotype configuration.
In the presence of missing data, the number of compatible haplotypes can be substantially reduced by HAPLORE, and the program will provide all possible haplotype configurations of a pedigree under different circumstances, if such multiple configurations exist. These inferred haplotype configurations, as well as the haplotype frequencies estimated by the EM algorithm, can be used in genetic linkage and association studies. AVAILABILITY: The program can be downloaded from http://bioinformatics.med.yale.edu.

20.

Evaluation of Haplotype Inference Using Definitive Haplotype Data Obtained from Complete Hydatidiform Moles, and Its Significance for the Analyses of Positively Selected Regions

Koichiro Higasa Yoji Kukita Kiyoko Kato Norio Wake Tomoko Tahira Kenshi Hayashi 《PLoS genetics》2009,5(5)

The haplotype map constructed by the HapMap Project is a valuable resource for genetic studies of disease genes, population structure, and evolution. In the Project, Caucasian and African haplotypes are inferred fairly accurately, mainly from the rules of Mendelian inheritance applied to trio genotypes. However, the Asian haplotypes are inferred from the genotypes of unrelated individuals using population-genetic methods and are less accurate. Thus, the effects of this inaccuracy on downstream analyses need to be assessed. We determined true Japanese haplotypes by genotyping 100 complete hydatidiform moles (CHM), each carrying a genome derived from a single sperm, using Affymetrix 500 K Arrays. We then assessed how inferred haplotypes can differ from true haplotypes by phasing pseudo-individualized true haplotypes with the programs PHASE, fastPHASE, and Beagle. We found that, at various genomic regions, especially the MHC locus, the expansion of extended haplotype homozygosity (EHH), a measure of positive selection, is obscured when inferred Asian haplotype data are used to detect it. We then mapped the genome using a new statistic, XDiHH, which directly detects the difference between true and inferred haplotypes in the determination of EHH expansion. We also show that the true haplotype data presented here are useful for assessing and improving the accuracy of phasing Asian genotypes.
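Extended haplotype homozygosity, as commonly defined, is the probability that two randomly chosen chromosomes carrying a given core allele are identical over the whole interval from the core SNP to a given marker. The minimal Python sketch below assumes phased haplotypes stored as equal-length strings and a single core SNP; the function name and arguments are hypothetical. The paper's XDiHH statistic is not reproduced, because its formula is not given in the abstract.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, end):
    """EHH between SNP index `core` and SNP index `end` for chromosomes carrying `allele` at `core`."""
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # undefined with fewer than two carriers
    lo, hi = min(core, end), max(core, end)
    # Group carriers by their allele string over the whole interval [lo, hi].
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # Fraction of carrier pairs that are identical over the interval.
    return sum(comb(k, 2) for k in counts.values()) / comb(n, 2)


haps = ["0000", "0001", "0011", "1111"]
for end in range(4):
    print(end, round(ehh(haps, core=0, allele="0", end=end), 3))
```

EHH starts at 1 at the core and decays as the interval grows. Phasing errors that split otherwise identical haplotypes lower it, which is one way inferred haplotypes could obscure an EHH expansion.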