Similar Articles
20 similar articles found (search time: 15 ms)
1.
Assigning Linkage Haplotypes from Parent and Progeny Genotypes   Total citations: 2 (self-citations: 1, by others: 1)
A. Nejati-Javaremi, C. Smith. Genetics, 1996, 142(4): 1363-1367
Given the genotypes of parents and progeny, their haplotypes over several or many linked loci can easily be assigned by listing, at each locus, the allele known to have come from each parent. Usually only a small number of progeny (5-10) per family is needed to assign the parental and progeny haplotypes. Any gaps left in the haplotypes may be filled in from the assigned haplotypes of relatives. The process is easier when the loci have multiple alleles, when more linked loci are included in the haplotype, and when the mating produces more progeny. Crossover haplotypes in the progeny can be identified because they are unique or uncommon, and the crossover point can often be located if the linkage map order of the loci is known. The haplotyping method applies to outbreeding populations of plants, animals and humans, as well as to traditional experimental crosses of inbred lines. It also applies to half-sib families, whether or not the genotypes of the mates are known. The haplotyping procedure is already used in linkage analysis but does not seem to have been published. It should be useful in teaching and in genetic applications of haplotypes.
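The listing procedure can be sketched in Python for a single parent-offspring trio. This is a minimal illustration, not the authors' code. It assumes genotypes are given as unordered allele pairs per locus in map order and that no crossover occurs in the segment. Wherever the parental origin of the child's alleles is ambiguous (for example, when both parents and the child share the same heterozygous genotype), it leaves a gap (`None`). The function name `assign_progeny_haplotypes` is hypothetical.

```python
from typing import List, Optional, Tuple

Genotype = Tuple[str, str]  # unordered allele pair at one locus


def assign_progeny_haplotypes(
    sire: List[Genotype], dam: List[Genotype], child: List[Genotype]
) -> Tuple[List[Optional[str]], List[Optional[str]]]:
    """Return (paternal, maternal) allele lists for the child; None marks a gap."""
    paternal: List[Optional[str]] = []
    maternal: List[Optional[str]] = []
    for s, d, (a, b) in zip(sire, dam, child):
        # Collect every orientation of the child's alleles that Mendel allows.
        options = set()
        if a in s and b in d:
            options.add((a, b))
        if b in s and a in d:
            options.add((b, a))
        if len(options) == 1:
            p, m = options.pop()
        else:
            # Ambiguous origin, or a Mendelian inconsistency: leave a gap.
            p, m = None, None
        paternal.append(p)
        maternal.append(m)
    return paternal, maternal


sire = [("A", "B"), ("1", "2"), ("x", "y")]
dam = [("A", "C"), ("3", "4"), ("x", "y")]
child = [("B", "C"), ("2", "3"), ("x", "y")]
print(assign_progeny_haplotypes(sire, dam, child))
# (['B', '2', None], ['C', '3', None])
```

Barring crossovers, the paternal list is also one of the sire's two haplotypes. Scoring several progeny this way therefore resolves the parental haplotypes and fills the remaining gaps.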

2.
A key step toward the discovery of a gene related to a trait is the finding of an association between the trait and one or more haplotypes. Haplotype analyses can also provide critical information regarding the function of a gene; however, when unrelated subjects are sampled, haplotypes are often ambiguous because of unknown linkage phase of the measured sites along a chromosome. A popular method of accounting for this ambiguity in case-control studies uses a likelihood that depends on haplotype frequencies, so that the haplotype frequencies can be compared between the cases and controls; however, this traditional method is limited to a binary trait (case vs. control), and it does not provide a method of testing the statistical significance of specific haplotypes. To address these limitations, we developed new methods of testing the statistical association between haplotypes and a wide variety of traits, including binary, ordinal, and quantitative traits. Our methods allow adjustment for nongenetic covariates, which may be critical when analyzing genetically complex traits. Furthermore, our methods provide several different global tests for association, as well as haplotype-specific tests, which give a meaningful advantage in attempts to understand the roles of many different haplotypes. The statistics can be computed rapidly, making it feasible to evaluate the associations between many haplotypes and a trait. To illustrate the use of our new methods, they are applied to a study of the association of haplotypes (composed of genes from the human-leukocyte-antigen complex) with humoral immune response to measles vaccination. Limited simulations are also presented to demonstrate the validity of our methods, as well as to provide guidelines on how our methods could be used.

3.

Background  

Mitochondrial DNA (mtDNA) haplotypes have become popular tools for tracing maternal ancestry, and several companies offer this service to the general public. Numerous studies have demonstrated that human mtDNA haplotypes can be used with confidence to identify the continent where the haplotype originated. Ideally, mtDNA haplotypes could also be used to identify a particular country or ethnic group from which the maternal ancestor emanated. However, the geographic distribution of mtDNA haplotypes is greatly influenced by the movement of both individuals and population groups. Consequently, common mtDNA haplotypes are shared among multiple ethnic groups. We have studied the distribution of mtDNA haplotypes among West African ethnic groups to determine how often mtDNA haplotypes can be used to reconnect Americans of African descent to a country or ethnic group of a maternal African ancestor. The nucleotide sequence of the mtDNA hypervariable segment I (HVS-I) usually provides sufficient information to assign a particular mtDNA to the proper haplogroup, and it contains most of the variation that is available to distinguish a particular mtDNA haplotype from closely related haplotypes. In this study, samples of general African-American and specific Gullah/Geechee HVS-I haplotypes were compared with two databases of HVS-I haplotypes from sub-Saharan Africa, and the incidence of perfect matches was recorded for each sample.
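The matching step can be sketched as a set lookup. As a hypothetical encoding, each HVS-I haplotype is represented by the set of its differences from a reference sequence (e.g. `"16223T"`). The database maps each population group to the haplotypes observed in it. The group names, variants and function names below are made up for illustration. The sketch counts samples with at least one perfect match and, of those, how many match only a single group, which is the case that would point to one ethnic group.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical encoding: a haplotype is the set of its HVS-I differences from a reference.
Haplotype = FrozenSet[str]


def matching_groups(sample: Haplotype, database: Dict[str, Set[Haplotype]]) -> Set[str]:
    """Groups in which the sample haplotype occurs as a perfect (exact) match."""
    return {group for group, haps in database.items() if sample in haps}


def match_summary(samples: List[Haplotype], database: Dict[str, Set[Haplotype]]) -> Tuple[int, int]:
    """Return (samples with any perfect match, samples matching exactly one group)."""
    hits = [matching_groups(s, database) for s in samples]
    any_match = sum(1 for groups in hits if groups)
    one_group = sum(1 for groups in hits if len(groups) == 1)
    return any_match, one_group


database = {
    "GroupA": {frozenset({"16223T", "16278T"}), frozenset({"16129A"})},
    "GroupB": {frozenset({"16223T", "16278T"})},
}
samples = [frozenset({"16223T", "16278T"}), frozenset({"16129A"}), frozenset({"16311C"})]
print(match_summary(samples, database))  # (2, 1)
```

In this toy example the first sample matches perfectly, but the match is shared by two groups. This illustrates why common haplotypes rarely identify a single ethnic group.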

4.
Case-control studies are used to map loci associated with a genetic disease. The usual case-control study tests for significant differences in allele frequencies at marker loci. In this paper, we consider the problem of comparing two or more marker loci simultaneously and testing for significant differences in haplotype rather than allele frequencies. We consider two situations. In the first, genotypes at marker loci are resolved into haplotypes by biochemical methods or by genotyping family members. In the second, genotypes at marker loci are not resolved into haplotypes, but, assuming random mating, haplotypes can be inferred using a likelihood method such as the expectation-maximization (EM) algorithm. We assume that a causative locus has two alleles with a multiplicative effect on disease penetrance, with one allele increasing the penetrance by a factor π. For small values of π - 1 and large sample sizes, we obtain asymptotic results that predict the statistical power of a test for significant differences in haplotype frequencies between cases and a random sample of the population, both when haplotypes can be resolved and when they have to be inferred. The gain in power from resolving haplotypes can be expressed as a ratio R: the factor by which the sample size must be increased when haplotypes are not resolved in order to achieve the same power as when they are resolved. In general, R depends on the pattern of linkage disequilibrium between the causative allele and the marker haplotypes. It is independent of the frequency of the causative allele and, to a first approximation, independent of π. For the special case of two di-allelic marker loci, we obtain a simple expression for R and its upper bound.
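The EM inference mentioned above can be sketched for biallelic markers under random mating (Hardy-Weinberg proportions). This is a generic textbook EM, not the paper's power calculation. Genotypes are assumed to be coded as the count of allele `1` at each locus (0, 1 or 2), and the function names are hypothetical. Only individuals heterozygous at two or more loci are phase-ambiguous. For each of them, the E-step splits the individual among its compatible haplotype pairs in proportion to the current frequencies.

```python
from itertools import combinations_with_replacement, product
from typing import Dict, List, Tuple

Hap = Tuple[int, ...]


def compatible_pairs(genotype: Tuple[int, ...]) -> List[Tuple[Hap, Hap]]:
    """Unordered haplotype pairs whose allele-1 counts reproduce the genotype."""
    haps = list(product((0, 1), repeat=len(genotype)))
    return [
        (h1, h2)
        for h1, h2 in combinations_with_replacement(haps, 2)
        if all(a + b == g for a, b, g in zip(h1, h2, genotype))
    ]


def em_haplotype_freqs(
    genotypes: List[Tuple[int, ...]], max_iter: int = 500, tol: float = 1e-10
) -> Dict[Hap, float]:
    haps = list(product((0, 1), repeat=len(genotypes[0])))
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform start
    pair_lists = [compatible_pairs(g) for g in genotypes]
    for _ in range(max_iter):
        counts = dict.fromkeys(haps, 0.0)
        for pairs in pair_lists:
            # E-step: Hardy-Weinberg probability of each compatible pair.
            weights = [freqs[a] * freqs[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the 2N sampled haplotypes.
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        converged = max(abs(new[h] - freqs[h]) for h in haps) < tol
        freqs = new
        if converged:
            break
    return freqs


freqs = em_haplotype_freqs([(2, 2), (0, 0), (1, 1)])
print({h: round(f, 3) for h, f in freqs.items()})
# {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
```

In this example the double heterozygote `(1, 1)` is attributed almost entirely to the 00/11 pair, because those haplotypes are already seen in the homozygous individuals.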

5.
A commonly used tool in disease association studies is the search for discrepancies between the haplotype distributions of the case and control populations. To find such discrepancies, the haplotype frequencies in each population are estimated from the genotypes. We present a new method, HAPLOFREQ, that estimates haplotype frequencies over a short genomic region from genotypes or haplotypes with missing data or sequencing errors. Our approach uses a maximum-likelihood model based on a simple random generative model, which assumes that the genotypes are sampled independently from the population. We first show that if phased haplotypes are given, possibly with missing data, the population haplotype frequencies can be estimated by finding the global optimum of the likelihood function in polynomial time. If the haplotypes are not phased, finding the maximum of the likelihood function is NP-hard. In this case, we define an alternative, relaxed likelihood function. We show that the maximum of the relaxed likelihood can be found in polynomial time and that its optimal solution asymptotically approaches the population haplotype frequencies. In contrast to previous approaches, our algorithms are guaranteed to converge in polynomial time to a global maximum of the respective likelihood functions. We compared the performance of our algorithm with the widely used program PHASE. Our estimates are at least 10% more accurate than those of PHASE and are obtained about ten times faster. Our techniques involve new algorithms in convex optimization, which may be of independent interest. In particular, they may be helpful in other maximum-likelihood problems arising from survey sampling.
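For the phased case with missing data, a plain EM iteration illustrates the likelihood being maximized. Each observed haplotype with missing alleles (written `?` here, an assumed encoding) contributes the summed frequency of all complete haplotypes consistent with it. This is a swapped-in technique, not HAPLOFREQ's convex-optimization algorithm. It also enumerates all 2^n binary haplotypes, so it suits only short regions. Because this log-likelihood is concave in the frequencies, EM started from strictly positive frequencies should approach a global maximum, though possibly slowly. Function names are hypothetical.

```python
from itertools import product
from typing import Dict, List


def consistent(pattern: str, hap: str) -> bool:
    """True if the complete haplotype agrees with every observed (non-'?') position."""
    return all(p == "?" or p == h for p, h in zip(pattern, hap))


def em_missing(patterns: List[str], max_iter: int = 1000, tol: float = 1e-12) -> Dict[str, float]:
    n = len(patterns[0])
    # Candidate complete haplotypes: all 0/1 strings consistent with some observation.
    candidates = [
        h for h in ("".join(t) for t in product("01", repeat=n))
        if any(consistent(p, h) for p in patterns)
    ]
    compat = [[h for h in candidates if consistent(p, h)] for p in patterns]
    freqs = {h: 1.0 / len(candidates) for h in candidates}
    for _ in range(max_iter):
        counts = dict.fromkeys(candidates, 0.0)
        for hs in compat:
            # E-step: split each observation among its possible completions.
            total = sum(freqs[h] for h in hs)
            for h in hs:
                counts[h] += freqs[h] / total
        # M-step: expected counts divided by the number of observed haplotypes.
        new = {h: c / len(patterns) for h, c in counts.items()}
        converged = max(abs(new[h] - freqs[h]) for h in candidates) < tol
        freqs = new
        if converged:
            break
    return freqs


print(em_missing(["00", "00", "11", "0?"]))
# approximately {'00': 0.75, '01': 0.0, '11': 0.25}
```

Here the partially observed `0?` is absorbed by `00`, which is the completion already supported by fully observed data.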

6.
Current routine genotyping methods typically do not provide haplotype information, which is essential for many analyses of fine-scale molecular-genetics data. Haplotypes can be obtained experimentally, at considerable cost, or (partially) by genotyping additional family members. Alternatively, a statistical method can be used to infer phase and reconstruct haplotypes. We present a new statistical method, applicable to genotype data at linked loci from a population sample, that improves substantially on current algorithms: error rates are often reduced by more than 50% relative to its nearest competitor. Furthermore, our algorithm performs well in absolute terms, suggesting that reconstructing haplotypes experimentally or by genotyping additional family members may be an inefficient use of resources.

7.
Environmental DNA (eDNA) analysis has recently been used as a new tool for estimating intraspecific diversity. However, whether known haplotypes contained in a sample can be detected correctly using eDNA-based methods has so far been examined only in an aquarium experiment. Here, we tested whether the haplotypes of Ayu fish (Plecoglossus altivelis altivelis) detected in a capture survey could also be detected from a field-derived eDNA sample. Such a sample contains various haplotypes at low concentrations together with foreign substances. A water sample and Ayu specimens collected from a river on the same day were analysed by eDNA analysis and Sanger sequencing, respectively. The 10 L water sample was divided among 20 filters, and 15 PCR replicates were performed for each filter. After high-throughput sequencing, denoising was performed with two of the most widely used denoising packages, unoise3 and dada2. Of the 42 haplotypes obtained by Sanger sequencing of 96 specimens, 38 (unoise3) and 41 (dada2) were detected by eDNA analysis. When dada2 was used, every haplotype carried by at least two specimens, except one, was detected in all filter replicates. eDNA-based methods have limitations and carry some risk of false positives and false negatives. Nevertheless, this study showed that eDNA analysis of intraspecific genetic diversity gives results comparable to those of large-scale, capture-based conventional methods. Our results suggest that eDNA-based methods could become a more efficient way to survey intraspecific genetic diversity in the field.
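The comparison between capture-based and eDNA results can be expressed with simple set operations. The sketch below assumes one haplotype label per Sanger-sequenced specimen and one set of detected haplotype labels per filter after denoising. The labels and the function name are hypothetical. It reports four things:

- how many capture haplotypes eDNA recovered;
- which haplotypes carried by at least two specimens were missed by some filter;
- which eDNA haplotypes have no captured specimen (candidate false positives or uncaptured diversity).

```python
from collections import Counter
from typing import List, Set, Tuple


def detection_summary(
    specimen_haps: List[str], filter_detections: List[Set[str]]
) -> Tuple[int, int, Set[str], Set[str]]:
    """Compare capture-based (Sanger) haplotypes with per-filter eDNA detections.

    Returns (distinct capture haplotypes, how many of them eDNA detected,
    haplotypes carried by >= 2 specimens but missed by at least one filter,
    eDNA haplotypes never seen among captured specimens).
    """
    counts = Counter(specimen_haps)
    detected_any = set().union(*filter_detections)
    recovered = sum(1 for h in counts if h in detected_any)
    shared = {h for h, c in counts.items() if c >= 2}
    missed_in_some_filter = {h for h in shared if not all(h in f for f in filter_detections)}
    edna_only = detected_any - set(counts)
    return len(counts), recovered, missed_in_some_filter, edna_only


specimens = ["H1", "H1", "H2", "H3", "H3", "H4"]
filters = [{"H1", "H3", "H9"}, {"H1", "H2"}]
print(detection_summary(specimens, filters))  # (4, 3, {'H3'}, {'H9'})
```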

8.
The haplotype block structure of SNP variation in human DNA has been demonstrated by several recent studies. The presence of haplotype blocks can be used to dramatically increase the statistical power of genetic mapping. Several criteria have already been proposed for identifying these blocks, all of which require haplotypes as input. We propose a comprehensive statistical model of haplotype block variation and show how the parameters of this model can be learned from haplotypes and/or unphased genotype data. Using real-world SNP data, we demonstrate that our approach can be used to resolve genotypes into their constituent haplotypes with greater accuracy than previously known methods.

9.
The rough draft of the human genome map has been used to identify most of the functional genes in the human genome, as well as nucleotide variations in these genes known as single-nucleotide polymorphisms (SNPs). Using advanced biotechnologies, researchers are beginning to genotype thousands of SNPs from biological samples. Among the many possible applications is the study of associations between SNPs and complex human diseases, such as cancers or coronary heart disease, using a case-control design. By also collecting environmental and other lifestyle risk factors, such a study can be used effectively to investigate gene-environment interactions in their association with the disease phenotype. We previously developed a method, based on estimating-equation techniques, to statistically construct individuals' haplotypes and to estimate the distribution of haplotypes of multiple SNPs in a defined population. Extending this idea, we describe here an analytic method for assessing the association of the constructed haplotypes, together with environmental factors, with the disease phenotype. The method is robust to model assumptions and scales to a large number of SNPs. Asymptotic properties of the estimators are proved theoretically and tested for finite sample sizes by simulation. To demonstrate the method, we applied it to a case-control data set to assess the possible association between apolipoprotein CIII (six coding SNPs) and restenosis. Our analysis revealed two haplotypes that may reduce the risk of restenosis.

10.
MOTIVATION: Haplotype reconstruction is an essential step in genetic linkage and association studies. Many methods have been developed to estimate haplotype frequencies and reconstruct haplotypes for a sample of unrelated individuals. Haplotype reconstruction in large pedigrees with many genetic markers, however, remains a challenging problem. METHODS: We have developed an efficient computer program, HAPLORE (HAPLOtype REconstruction), to identify all haplotype sets compatible with the observed genotypes in a pedigree for tightly linked genetic markers. HAPLORE consists of three steps that can serve different needs in applications.

- Step 1: A set of logic rules is used to reduce the number of compatible haplotypes of each individual in the pedigree as much as possible. After this step, the haplotypes of all individuals in the pedigree can be completely or partially determined. These logic rules apply to completely linked markers and can be used to impute missing data and check for genotyping errors.
- Step 2: A haplotype-elimination algorithm, similar to the genotype-elimination algorithms used in linkage analysis, is applied to delete incompatible haplotypes derived from the first step. After this step, all superfluous haplotypes of the pedigree members are excluded.
- Step 3: The expectation-maximization (EM) algorithm, combined with the partition-ligation technique, is used to estimate haplotype frequencies from the haplotype configurations inferred in the first two steps. Only compatible haplotype configurations whose haplotypes have frequencies above a threshold are retained.

RESULTS: We tested the effectiveness and efficiency of HAPLORE using both simulated and real datasets. Our results show that the rule-based algorithm is very efficient for completely genotyped pedigrees; in this case, almost all families have a unique haplotype configuration. In the presence of missing data, HAPLORE can substantially reduce the number of compatible haplotypes. When multiple haplotype configurations of a pedigree exist, the program reports all of them. These inferred haplotype configurations, as well as the haplotype frequencies estimated by the EM algorithm, can be used in genetic linkage and association studies. AVAILABILITY: The program can be downloaded from http://bioinformatics.med.yale.edu.
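The idea of listing compatible haplotype pairs and then eliminating those inconsistent with transmission can be illustrated on a single trio. This is a brute-force sketch, not HAPLORE's logic rules or its elimination algorithm. It assumes completely linked markers (no recombination) and genotypes given as allele pairs per locus. The function names are hypothetical. A genotype with k heterozygous loci has 2^(k-1) phase configurations (one if k = 0). A trio configuration is kept only if the child's pair consists of one haplotype from each parent's pair.

```python
from itertools import product
from typing import List, Set, Tuple

Genotype = List[Tuple[str, str]]  # allele pair per locus, in map order
Hap = Tuple[str, ...]
Pair = Tuple[Hap, Hap]


def haplotype_pairs(g: Genotype) -> Set[Pair]:
    """All phase configurations of an unphased multilocus genotype."""
    pairs = set()
    for flips in product((False, True), repeat=len(g)):
        h1 = tuple(b if f else a for (a, b), f in zip(g, flips))
        h2 = tuple(a if f else b for (a, b), f in zip(g, flips))
        pairs.add(tuple(sorted((h1, h2))))  # store as an unordered pair
    return pairs


def consistent_trio_configs(
    father: Genotype, mother: Genotype, child: Genotype
) -> List[Tuple[Pair, Pair, Pair]]:
    """Keep (father, mother, child) configurations compatible with transmission,
    assuming no recombination between the markers."""
    configs = []
    for fp in haplotype_pairs(father):
        for mp in haplotype_pairs(mother):
            for cp in haplotype_pairs(child):
                a, b = cp
                if (a in fp and b in mp) or (b in fp and a in mp):
                    configs.append((fp, mp, cp))
    return configs


father = [("1", "2"), ("A", "B")]
mother = [("1", "1"), ("A", "A")]
child = [("1", "2"), ("A", "B")]
print(len(haplotype_pairs(father)))  # 2
# A single configuration survives: father and child are both phased as 1-A / 2-B.
print(consistent_trio_configs(father, mother, child))
```

Brute force grows exponentially with the number of loci and family members. Rule-based pruning and elimination, as described in the abstract, are what make large pedigrees tractable.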

11.
Inference of haplotypes is important for many genetic approaches, including the assignment of a phenotype to a genetic region. Usually, the population frequencies of haplotypes, as well as the diplotype configuration of each subject, are estimated from the genotypes of subjects sampled from the population. We have developed an algorithm that uses pooled DNA data to infer haplotype frequencies and the combination of haplotype copies in each pool. The input data are the genotypes of pooled DNA samples, each containing quantitative genotype data from one to six subjects. Using an expectation-maximization algorithm, the algorithm obtains maximum-likelihood estimates of both the population haplotype frequencies and the combination of haplotype copies in each pool. It was implemented in the computer program LDPooled. We also used the bootstrap method to calculate standard errors of the estimated haplotype frequencies. Using this program, we analyzed published genotype data for the SAA (n=156), MTHFR (n=80), and NAT2 (n=116) genes, as well as the smoothelin gene (n=102). Our study showed that the frequencies of major haplotypes (frequency >0.1 in a population) can be inferred fairly accurately from pooled DNA data by the maximum-likelihood method, although with some limitations. The estimated D and D' values varied widely except when |D| was >0.1. We also estimated the linkage-disequilibrium measure ρ² for 36 linked loci of the smoothelin gene under one- and two-subject pool protocols. The results suggested that the gross pattern of the distribution of this measure can be reproduced from two-subject pool data.
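For reference, the pairwise measures mentioned here can be computed from two-locus haplotype frequencies. Let p_A and p_B be the allele frequencies. Then D = p_AB - p_A p_B, and D' divides D by its maximum attainable magnitude given the allele frequencies. The squared correlation is r² = D² / (p_A(1 - p_A) p_B(1 - p_B)). We assume the abstract's ρ² is this squared allelic correlation. The function name is illustrative, and D' is returned with its sign.

```python
from typing import Tuple


def ld_measures(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> Tuple[float, float, float]:
    """Return (D, D', r^2) from the four two-locus haplotype frequencies."""
    if abs(p_AB + p_Ab + p_aB + p_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # frequency of allele A at locus 1
    p_B = p_AB + p_aB  # frequency of allele B at locus 2
    D = p_AB - p_A * p_B
    # D_max depends on the sign of D (Lewontin's normalisation).
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = D * D / denom if denom > 0 else 0.0
    return D, d_prime, r2


print(ld_measures(0.4, 0.2, 0.1, 0.3))  # approximately (0.1, 0.5, 0.1667)
```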

12.
We present the results of a simulation study indicating that true haplotypes at multiple, tightly linked loci often provide little extra information for linkage-disequilibrium fine mapping compared with the corresponding genotypes, provided that an appropriate statistical analysis method is used. In contrast, a two-stage approach to analyzing genotype data, in which haplotypes are inferred and then analyzed as if they were true haplotypes, can lead to a substantial loss of information. The study uses our COLDMAP software for fine mapping, which implements a Markov chain Monte Carlo algorithm based on the shattered coalescent model of genetic heterogeneity at a disease locus. We applied COLDMAP to 100 replicate data sets simulated under each of 18 disease models. Each data set consists of haplotype pairs (diplotypes) for 20 SNPs typed at equal 50-kb intervals in a 950-kb candidate region that includes a single disease locus located at random. The data sets were analyzed in three formats: (1) as true haplotypes; (2) as haplotypes inferred from genotypes using an expectation-maximization algorithm; and (3) as unphased genotypes. Efficiency is defined in terms of the root mean integrated square error of the location of the disease locus. On average, true haplotypes gave a 6% gain in efficiency compared with unphased genotypes, whereas inferring haplotypes from genotypes led to a 20% loss of efficiency. Furthermore, treating inferred haplotypes as if they were true haplotypes leads to considerable overconfidence in estimates, with nominal 50% credibility intervals achieving, on average, only 19% coverage. We conclude that (1) given appropriate statistical analyses, the cost of directly measuring haplotypes will rarely be justified by a gain in the efficiency of fine mapping, and (2) a two-stage approach of inferring haplotypes followed by a haplotype-based analysis can be very inefficient for fine mapping compared with an analysis based directly on the genotypes.

13.
In the study of complex traits, linkage analysis and single-marker association tests can be of limited use to researchers attempting to elucidate the complex interplay between a gene and environmental covariates. For these purposes, tests of gene-environment interactions are needed. In addition, recent studies have indicated that haplotypes, which are specific combinations of nucleotides on the same chromosome, may be more suitable than single genetic markers as the unit of analysis for statistical tests. The difficulty with this approach is that, in standard laboratory genotyping, haplotypes are often not directly observable; instead, unphased marker phenotypes are collected. In this article, we present a method for estimating and testing haplotype-environment interactions when linkage phase is potentially ambiguous. The method builds on the work of Schaid et al. [2002] and is applicable to any trait that can be placed in the generalized linear model framework. Simulations were run to illustrate the salient features of the method. In addition, the method was used to test for haplotype-smoking exposure interaction with data from the Childhood Asthma Management Program.

14.
15.
Dense genotype data can be used to detect chromosome fragments inherited from a common ancestor in apparently unrelated individuals. A disease-causing mutation inherited from a common founder may thus be detected by searching for a common haplotype signature in a sample population of patients. We present here FounderTracker, a computational method for the genome-wide detection of founder mutations in cancer using dense tumor SNP profiles. Our method is based on two assumptions. First, the wild-type allele frequently undergoes loss of heterozygosity (LOH) in the tumors of germline mutation carriers. Second, the overlap between the ancestral chromosome fragments inherited from a common founder will define a minimal haplotype conserved in each patient carrying the founder mutation. Our approach thus relies on the detection of haplotypes with significant identity by descent (IBD) sharing within recurrent regions of LOH to highlight genomic loci likely to harbor a founder mutation. We validated this approach by analyzing two real cancer data sets in which we successfully identified founder mutations of well-characterized tumor suppressor genes. We then used simulated data to evaluate the ability of our method to detect IBD tracts as a function of their size and frequency. We show that FounderTracker can detect haplotypes of low prevalence with high power and specificity, significantly outperforming existing methods. FounderTracker is thus a powerful tool for discovering unknown founder mutations that may explain part of the "missing" heritability in cancer. This method is freely available and can be used online at the FounderTracker website.
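Where LOH leaves the tumor effectively homozygous across a region, the retained haplotype can be read from the tumor genotype calls. The idea of a minimal conserved haplotype can then be sketched as the longest run around a focal SNP on which all patients' retained haplotypes agree. This naive identity-by-state scan is only an illustration under stated assumptions, not FounderTracker's IBD statistic. Haplotypes are assumed to be equal-length allele strings over the same SNPs, the data are made up, and the function name is hypothetical.

```python
from typing import List, Tuple


def shared_segment(haplotypes: List[str], focus: int) -> Tuple[int, int]:
    """Half-open interval [start, end) of SNPs around `focus` on which all
    haplotypes carry the same allele; (focus, focus) if they differ at `focus`."""
    def agree(i: int) -> bool:
        return len({h[i] for h in haplotypes}) == 1

    if not agree(focus):
        return focus, focus
    start, end = focus, focus + 1
    while start > 0 and agree(start - 1):  # extend to the left
        start -= 1
    while end < len(haplotypes[0]) and agree(end):  # extend to the right
        end += 1
    return start, end


retained = ["AABBA", "CABBC", "AABBC"]  # made-up LOH haplotypes from three patients
print(shared_segment(retained, 2))  # (1, 4)
```

Identity by state over a short run can arise by chance. The abstract's test for significant IBD sharing is what separates true founder haplotypes from such coincidental agreement.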

16.
Cystic fibrosis (CF) patients with the A455E mutation, in both the French Canadian and the Dutch population, share a common haplotype over distances of up to 25 cM. French Canadian patients with the 621+1G→T mutation share a common haplotype of more than 14 cM. In contrast, haplotypes containing the ΔF508 mutation show haplotype identity over a much shorter genomic distance within and between populations, probably because of the multiple introduction of this most common mutation. Haplotype analysis for specific mutations in CF or in other recessive diseases can be used as a model for studying the occurrence of genetic drift conditional on gene frequencies. Moreover, from our results, it can be inferred that analysis of shared haplotypes is a suitable method for genetic mapping in general. Received: 30 November 1995 / Revised: 11 April 1996

17.
Many methods exist for genotyping, that is, revealing which alleles an individual carries at different genetic loci. A harder problem is haplotyping: determining which alleles lie on each of the two homologous chromosomes in a diploid individual. Conventional approaches to haplotyping either use several generations to reconstruct haplotypes within a pedigree or use statistical methods to estimate the prevalence of different haplotypes in a population. Several molecular haplotyping methods have been proposed, but they have been limited to small numbers of loci, usually over short distances. Here we demonstrate a method that allows rapid molecular haplotyping of many loci over long distances. The method requires no more genotypings than pedigree methods, but requires no family material. It relies on a procedure to identify and genotype single DNA molecules, and on reconstruction of long haplotypes by a 'tiling' approach. We demonstrate this by resolving haplotypes in two regions of the human genome harbouring 20 and 105 single-nucleotide polymorphisms, respectively. The method can be extended to reconstruct haplotypes of arbitrary complexity and length, and can make use of a variety of genotyping platforms. We also argue that this method is applicable in situations that are intractable to conventional approaches.
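The 'tiling' step can be sketched as a greedy merge of overlapping fragments, each read from a single molecule. The sketch rests on three assumptions:

- only heterozygous SNPs coded 0/1 are typed, so the second haplotype is the complement of the first;
- fragments are given as `{locus: allele}` maps;
- successive fragments overlap.

A fragment that disagrees with the growing haplotype at every overlapping locus is taken to come from the other homolog. Partial agreement is flagged as an error. Names are hypothetical, and this is not the authors' procedure.

```python
from typing import Dict, List, Optional

Fragment = Dict[int, int]  # locus index -> allele (0/1) read from one molecule


def tile_haplotype(fragments: List[Fragment], n_loci: int) -> List[Optional[int]]:
    """Greedily tile overlapping single-molecule fragments into one haplotype.

    Assumes every typed locus is a heterozygous 0/1 SNP, so the homologous
    haplotype is the complement of the returned one."""
    hap: List[Optional[int]] = [None] * n_loci
    for frag in sorted(fragments, key=min):  # order by leftmost locus
        overlap = [i for i in frag if hap[i] is not None]
        if not overlap:
            if any(a is not None for a in hap):
                raise ValueError("fragment does not overlap the tiled haplotype")
            flip = False
        else:
            agreements = sum(frag[i] == hap[i] for i in overlap)
            if agreements == len(overlap):
                flip = False  # same homolog as the tiled haplotype
            elif agreements == 0:
                flip = True  # molecule came from the other homolog
            else:
                raise ValueError("fragment is inconsistent with the tiled haplotype")
        for i, allele in frag.items():
            hap[i] = 1 - allele if flip else allele
    return hap


fragments = [{0: 0, 1: 1, 2: 0}, {2: 1, 3: 1, 4: 0}, {4: 0, 5: 1}]
hap = tile_haplotype(fragments, 6)
print(hap)                   # [0, 1, 0, 0, 1, 0]
print([1 - a for a in hap])  # [1, 0, 1, 1, 0, 1]
```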

18.
Several simple methods of DNA preparation from plant tissues were evaluated for PCR-RFLP analyses of SLG and SRK alleles. These analyses can be used to identify the S haplotypes of breeding lines in broccoli and cabbage (Brassica oleracea L.) and in purity tests of F1 hybrid seeds. Of the five methods tested, the NaI method was found to be the most suitable for amplifying the SLG and SRK alleles. This method allows a single seed to be used as testing material. Using this method, we identified the S haplotypes of 31 broccoli and 31 cabbage cultivars. Ninety-four percent of the broccoli cultivars and 97% of the cabbage cultivars were single-cross F1 hybrids. Nine and 15 S haplotypes were found in broccoli and cabbage, respectively. The small number of S haplotypes in broccoli suggests the importance of incorporating new S haplotypes into the breeding program. Received: 18 February 1999 / Revision received: 4 May 1999 / Accepted: 14 May 1999

19.
Molecular haplotyping at high throughput   Total citations: 4 (self-citations: 2, by others: 2)
Reconstruction of haplotypes, or the allelic phase, of single nucleotide polymorphisms (SNPs) is a key component of studies aimed at the identification and dissection of genetic factors involved in complex genetic traits. In humans, this often involves investigation of SNPs in case/control or other cohorts in which the haplotypes can only be partially inferred from genotypes by statistical approaches with resulting loss of power. Moreover, alternative statistical methodologies can lead to different evaluations of the most probable haplotypes present, and different haplotype frequency estimates when data are ambiguous. Given the cost and complexity of SNP studies, a robust and easy-to-use molecular technique that allows haplotypes to be determined directly from individual DNA samples would have wide applicability. Here, we present a reliable, automated and high-throughput method for molecular haplotyping in 2 kb, and potentially longer, sequence segments that is based on the physical determination of the phase of SNP alleles on either of the individual paternal haploids. We demonstrate that molecular haplotyping with this technique is not more complicated than SNP genotyping when implemented by matrix-assisted laser desorption/ionisation mass spectrometry, and we also show that the method can be applied using other DNA variation detection platforms. Molecular haplotyping is illustrated on the well-described β2-adrenergic receptor gene.

20.
We recently described a method for linkage disequilibrium (LD) mapping using cladistic analysis of phased single-nucleotide polymorphism (SNP) haplotypes in a logistic regression framework. However, haplotypes are often not available and cannot be deduced with certainty from unphased genotypes. One possible two-stage approach is to infer the phase of multilocus genotype data and analyze the resulting haplotypes as if known. Here, haplotypes are inferred using the expectation-maximization (EM) algorithm, and the best-guess phase assignment for each individual is analyzed. However, inferring haplotypes from phase-unknown data is prone to error, and this should be taken into account in the subsequent analysis. An alternative approach is to analyze the phase-unknown multilocus genotypes themselves. Here we present a generalization of the method for phase-known haplotype data to the case of unphased SNP genotypes. Our approach is designed for high-density SNP data, so we opted to analyze the simulated dataset. The marker spacing in the initial screen was too large for our method to be effective, so we used the answers provided to request further data in regions around the disease loci and in null regions. Power to detect the disease loci, accuracy in localizing the true site of the locus, and false-positive error rates are reported for the inferred-haplotype and unphased-genotype methods. For these data, analysis of inferred haplotypes outperformed analysis of genotypes. As expected, our results suggest that when there is little or no LD between a disease locus and the flanking region, there is no chance of detecting it unless the disease variant itself is genotyped.
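The best-guess phase assignment, and the uncertainty it discards, can be sketched as follows. The sketch handles biallelic SNPs coded as allele-1 counts per locus. Haplotype frequencies are assumed to be already estimated (for example by EM). The function returns the most probable compatible haplotype pair under Hardy-Weinberg proportions, together with its posterior probability. A posterior well below 1 is exactly the phase uncertainty that a two-stage analysis ignores. The function name and the frequency-dictionary format are assumptions.

```python
from itertools import combinations_with_replacement, product
from typing import Dict, Optional, Tuple

Hap = Tuple[int, ...]


def best_guess_phase(
    genotype: Tuple[int, ...], freqs: Dict[Hap, float]
) -> Tuple[Optional[Tuple[Hap, Hap]], float]:
    """Most probable compatible haplotype pair and its posterior probability."""
    haps = list(product((0, 1), repeat=len(genotype)))
    best, best_w, total = None, 0.0, 0.0
    for a, b in combinations_with_replacement(haps, 2):
        if all(x + y == g for x, y, g in zip(a, b, genotype)):
            # Hardy-Weinberg weight of the unordered pair {a, b}.
            w = freqs.get(a, 0.0) * freqs.get(b, 0.0) * (1 if a == b else 2)
            total += w
            if best is None or w > best_w:
                best, best_w = (a, b), w
    posterior = best_w / total if total > 0 else float("nan")
    return best, posterior


freqs = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}
pair, post = best_guess_phase((1, 1), freqs)
print(pair, round(post, 3))  # ((0, 0), (1, 1)) 0.988
```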
