Similar articles
 20 similar articles found
1.
Statistical estimation and pedigree analysis of CCR2-CCR5 haplotypes   Total citations: 4, self-citations: 0, citations by others: 4
As more SNP marker data becomes available, researchers have used haplotypes of markers, rather than individual polymorphisms, for association analysis of candidate genes. In order to perform haplotype analysis in a population-based case-control study, haplotypes must be determined by estimation in the absence of family information or laboratory methods for establishing phase. Here, we test the accuracy of the Expectation-Maximization (EM) algorithm for estimating haplotype state and frequency in the CCR2-CCR5 gene region by comparison with haplotype state and frequency determined by pedigree analysis. To do this, we have characterized haplotypes comprising alleles at seven biallelic loci in the CCR2-CCR5 chemokine receptor gene region, a span of 20 kb on chromosome 3p21. Three-generation CEPH families (n=40), totaling 489 individuals, were genotyped by the 5' nuclease assay (TaqMan). Haplotype states and frequencies were compared in 103 grandparents who were assumed to have mated at random. Both pedigree analysis and the EM algorithm yielded the same small number of haplotypes for which linkage disequilibrium was nearly maximal. The haplotype frequencies generated by the two methods were nearly identical. These results suggest that the EM algorithm estimation of haplotype states, frequency, and linkage disequilibrium analysis will be an effective strategy in the CCR2-CCR5 gene region. For genetic epidemiology studies, CCR2-CCR5 allele and haplotype frequencies were determined in African-American (n=30), Hispanic (n=24) and European-American (n=34) populations.
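
The EM estimation of haplotype frequencies mentioned in the abstract above can be illustrated with a minimal sketch. It is not the software used in the study: it assumes only two biallelic loci (the study used seven), random mating, no missing data, and a hypothetical genotype coding in which each locus is given as the number of copies (0, 1 or 2) of allele "1". The function name `em_two_locus` is an illustrative choice. With two loci, the double heterozygote is the only genotype whose phase is ambiguous.

```python
from collections import Counter

HAPLOTYPES = ("00", "01", "10", "11")
ALLELES = {0: "00", 1: "01", 2: "11"}  # genotype code -> the two alleles carried


def em_two_locus(genotypes, max_iter=200, tol=1e-12):
    """Estimate two-locus haplotype frequencies from unphased genotypes.

    genotypes: iterable of (g1, g2), where each g is the number of copies
    (0, 1 or 2) of allele "1" at that locus. Assumes random mating and
    no missing data (assumptions of this sketch).
    """
    counts = Counter(genotypes)
    n_chrom = 2 * sum(counts.values())
    if n_chrom == 0:
        raise ValueError("at least one genotype is required")
    n_double_het = counts.pop((1, 1), 0)

    # Every genotype except the double heterozygote has one possible phase.
    fixed = dict.fromkeys(HAPLOTYPES, 0.0)
    for (g1, g2), n in counts.items():
        a, b = ALLELES[g1], ALLELES[g2]
        fixed[a[0] + b[0]] += n
        fixed[a[1] + b[1]] += n

    freq = dict.fromkeys(HAPLOTYPES, 0.25)
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote carries 00/11
        # rather than 01/10, given the current frequency estimates.
        cis = freq["00"] * freq["11"]
        trans = freq["01"] * freq["10"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        expected = dict(fixed)
        expected["00"] += n_double_het * w
        expected["11"] += n_double_het * w
        expected["01"] += n_double_het * (1 - w)
        expected["10"] += n_double_het * (1 - w)
        # M-step: new frequencies are expected counts per chromosome.
        new_freq = {h: expected[h] / n_chrom for h in HAPLOTYPES}
        converged = max(abs(new_freq[h] - freq[h]) for h in HAPLOTYPES) < tol
        freq = new_freq
        if converged:
            break
    return freq
```

Calling `em_two_locus([(0, 0), (1, 1), (2, 2)])` returns a dictionary of the four haplotype frequencies. With more loci the number of ambiguous phase configurations grows quickly, which is one reason dedicated programs are normally used.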

2.
Analysis of haplotypes based on multiple single-nucleotide polymorphisms (SNP) is becoming common for both candidate gene and fine-mapping studies. Before embarking on studies of haplotypes from genetically distinct populations, however, it is important to consider variation both in linkage disequilibrium (LD) and in haplotype frequencies within and across populations, as both vary. Such diversity will influence the choice of "tagging" SNPs for candidate gene or whole-genome association studies because some markers will not be polymorphic in all samples and some haplotypes will be poorly represented or completely absent. Here we analyze 11 genes, originally chosen as candidate genes for oral clefts, where multiple markers were genotyped on individuals from four populations. Estimated haplotype frequencies, measures of pairwise LD, and genetic diversity were computed for 135 European-Americans, 57 Chinese-Singaporeans, 45 Malay-Singaporeans, and 46 Indian-Singaporeans. Patterns of pairwise LD were compared across these four populations and haplotype frequencies were used to assess genetic variation. Although these populations are fairly similar in allele frequencies and overall patterns of LD, both haplotype frequencies and genetic diversity varied significantly across populations. Such haplotype diversity has implications for designing studies of association involving samples from genetically distinct populations.
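
The abstract above refers to measures of pairwise LD without naming them. As a hedged illustration, the sketch below computes three widely used measures (D, |D'| and r²) from two-locus haplotype frequencies. The function name `pairwise_ld` and the dictionary keys "00", "01", "10", "11" are assumptions made for this example, and nothing here is claimed to be the procedure used in the study.

```python
def pairwise_ld(freq):
    """Return (D, |D'|, r^2) from two-locus haplotype frequencies.

    freq maps the haplotypes "00", "01", "10", "11" to frequencies
    that sum to 1 (a hypothetical encoding for this sketch).
    """
    p_a = freq["10"] + freq["11"]  # frequency of allele "1" at locus 1
    p_b = freq["01"] + freq["11"]  # frequency of allele "1" at locus 2
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("LD is undefined when a locus is monomorphic")

    # D: departure of the "11" haplotype frequency from independence.
    d = freq["11"] - p_a * p_b

    # D' scales D by the largest magnitude it could take given the
    # allele frequencies; the absolute value is returned here.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max

    # r^2: squared correlation between the alleles at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2
```

The monomorphic-locus check reflects the point made in the abstract that some markers are not polymorphic in every sample, in which case these measures cannot be computed for that population.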

3.
Case-control studies are used to map loci associated with a genetic disease. The usual case-control study tests for significant differences in frequencies of alleles at marker loci. In this paper, we consider the problem of comparing two or more marker loci simultaneously and testing for significant differences in haplotype rather than allele frequencies. We consider two situations. In the first, genotypes at marker loci are resolved into haplotypes by making use of biochemical methods or by genotyping family members. In the second, genotypes at marker loci are not resolved into haplotypes, but, by assuming random mating, haplotypes can be inferred using a likelihood method such as the expectation-maximization (EM) algorithm. We assume that a causative locus has two alleles with a multiplicative effect on the penetrance of a disease, with one allele increasing the penetrance by a factor π. We find, for small values of π − 1 and large sample sizes, asymptotic results that predict the statistical power of a test for significant differences in haplotype frequencies between cases and a random sample of the population, both when haplotypes can be resolved and when haplotypes have to be inferred. The increase in power when haplotypes can be resolved can be expressed as a ratio R, which is the increase in sample size needed to achieve the same power when haplotypes are resolved over when they are not resolved. In general, R depends on the pattern of linkage disequilibrium between the causative allele and the marker haplotypes but is independent of the frequency of the causative allele and, to a first approximation, is independent of π. For the special situation of two di-allelic marker loci, we obtain a simple expression for R and its upper bound.

4.
MOTIVATION: The search for genetic variants that are linked to complex diseases such as cancer, Parkinson's, or Alzheimer's disease may lead to better treatments. Since haplotypes can serve as proxies for hidden variants, one method of finding the linked variants is to look for case-control associations between the haplotypes and disease. Finding these associations requires a high-quality estimation of the haplotype frequencies in the population. To this end, we present HaploPool, a method of estimating haplotype frequencies from blocks of consecutive SNPs. RESULTS: HaploPool leverages the efficiency of DNA pools and estimates the population haplotype frequencies from pools of disjoint sets, each containing two or three unrelated individuals. We study the trade-off between pooling efficiency and accuracy of haplotype frequency estimates. For a fixed genotyping budget, HaploPool performs favorably on pools of two individuals as compared with a state-of-the-art non-pooled phasing method, PHASE. Of independent interest, HaploPool can be used to phase non-pooled genotype data with an accuracy approaching that of PHASE. We compared our algorithm to three programs that estimate haplotype frequencies from pooled data. HaploPool is an order of magnitude more efficient (at least six times faster), and considerably more accurate than previous methods. In contrast to previous methods, HaploPool performs well with missing data, genotyping errors and long haplotype blocks (of between 5 and 25 SNPs).

5.
In genetic association studies, linkage disequilibrium (LD) within a region can be exploited to select a subset of single-nucleotide polymorphisms (SNPs) to genotype with minimal loss of information. A novel entropy-based method for selecting SNPs is proposed and compared to an existing method based on the coefficient of determination (R²) using simulated data from Genetic Analysis Workshop 14. The effect of the size of the sample used to investigate LD (by estimating haplotype frequencies) and hence select the SNPs is also investigated for both measures. It is found that the novel method and the established method select SNP subsets that do not differ greatly. The entropy-based measure may thus have value because it is easier to compute than R². Increasing the sample size used to estimate haplotype frequencies improves the predictive power of the subset of SNPs selected. A smaller subset of SNPs chosen using a large initial sample to estimate LD can in some instances be more informative than a larger subset chosen based on poor estimates of LD (using a small initial sample). An initial sample size of 50 individuals is sufficient in most situations investigated, which involved selection from a set of 7 SNPs, although to select a larger number of SNPs, a larger initial sample size may be required.

6.
MOTIVATION: Haplotype reconstruction is an essential step in genetic linkage and association studies. Although many methods have been developed to estimate haplotype frequencies and reconstruct haplotypes for a sample of unrelated individuals, haplotype reconstruction in large pedigrees with a large number of genetic markers remains a challenging problem. METHODS: We have developed an efficient computer program, HAPLORE (HAPLOtype REconstruction), to identify all haplotype sets that are compatible with the observed genotypes in a pedigree for tightly linked genetic markers. HAPLORE consists of three steps that can serve different needs in applications. In the first step, a set of logic rules is used to reduce the number of compatible haplotypes of each individual in the pedigree as much as possible. After this step, the haplotypes of all individuals in the pedigree can be completely or partially determined. These logic rules are applicable to completely linked markers and they can be used to impute missing data and check genotyping errors. In the second step, a haplotype-elimination algorithm similar to the genotype-elimination algorithms used in linkage analysis is applied to delete incompatible haplotypes derived from the first step. All superfluous haplotypes of the pedigree members will be excluded after this step. In the third step, the expectation-maximization (EM) algorithm combined with the partition and ligation technique is used to estimate haplotype frequencies based on the inferred haplotype configurations through the first two steps. Only compatible haplotype configurations with haplotypes having frequencies greater than a threshold are retained. RESULTS: We test the effectiveness and the efficiency of HAPLORE using both simulated and real datasets. Our results show that the rule-based algorithm is very efficient for completely genotyped pedigrees. In this case, almost all of the families have one unique haplotype configuration. In the presence of missing data, the number of compatible haplotypes can be substantially reduced by HAPLORE, and the program will provide all possible haplotype configurations of a pedigree under different circumstances, if such multiple configurations exist. These inferred haplotype configurations, as well as the haplotype frequencies estimated by the EM algorithm, can be used in genetic linkage and association studies. AVAILABILITY: The program can be downloaded from http://bioinformatics.med.yale.edu.
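
To make concrete what "compatible haplotypes" means in the abstract above, the sketch below enumerates every unordered haplotype pair consistent with one individual's unphased genotype. It is not HAPLORE and does not implement its logic rules or pedigree-based elimination: it assumes a single unrelated individual, biallelic markers coded as 0, 1 or 2 copies of allele "1" (a hypothetical encoding), and no missing data. The function name `compatible_pairs` is illustrative.

```python
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with a genotype.

    genotype: sequence of 0/1/2 values, the number of copies of allele "1"
    at each marker. Returns a list of (hap1, hap2) strings of '0' and '1'.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    # Homozygous sites are identical on both haplotypes; heterozygous
    # sites get a placeholder 0 that is overwritten below.
    base = [1 if g == 2 else 0 for g in genotype]

    if not het_sites:
        hap = "".join(map(str, base))
        return [(hap, hap)]

    pairs = []
    # Fix the first heterozygous site to allele 0 on hap1 so that each
    # unordered pair is produced once, not twice.
    for bits in product((0, 1), repeat=len(het_sites) - 1):
        hap1, hap2 = base[:], base[:]
        for site, allele in zip(het_sites, (0,) + bits):
            hap1[site], hap2[site] = allele, 1 - allele
        pairs.append(("".join(map(str, hap1)), "".join(map(str, hap2))))
    return pairs
```

An individual with k heterozygous markers has 2^(k-1) compatible pairs (one pair when k is 0), which is why family information that rules out pairs, as described in the abstract, reduces the search space so sharply.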

7.
Little was known about the sequence variability of the human Arrestin domain-containing 4 gene (ARRDC4). We sequenced its DNA from exon 2 to exon 8 in a sample of 92 Russians. Seven variants were identified; one of them has not been described yet. It causes an amino acid change from Thr to Met. Identified variants were genotyped in the complete sample of 253 unrelated men and women to analyze haplotype distribution. Fifteen haplotypes were inferred. Nine haplotypes had estimated frequencies > 1%. Ninety-five percent of all haplotypes were determined by five haplotype-tagging single nucleotide polymorphisms. Haplotypes form two clades. The two most common haplotypes cover 76% of all haplotypes. The certainty of the haplotype reconstruction does not depend on the haplotype-inferring algorithms, but is a result of the anomalous haplotype distribution of ARRDC4, which makes this gene a suitable candidate gene for haplotype association studies. Interestingly, there is a great evolutionary distance between the two most common haplotypes, which could suggest a more complicated coalescent process with either past gene flow, selections, or bottlenecks.

8.
RFLP haplotypes at the alpha-globin gene complex have been examined in 190 individuals from the Niokolo Mandenka population of Senegal: haplotypes were assigned unambiguously for 210 chromosomes. The Mandenka share with other African populations a sample size-independent haplotype diversity that is much greater than that in any non-African population: the number of haplotypes observed in the Mandenka is typically twice that seen in the non-African populations sampled to date. Of these haplotypes, 17.3% had not been observed in any previous surveys, and a further 19.1% have previously been reported only in African populations. The haplotype distribution shows clear differences between African and non-African peoples, but this is on the basis of population-specific haplotypes combined with haplotypes common to all. The relationship of the newly reported haplotypes to those previously recorded suggests that several mutation processes, particularly recombination as homologous exchange or gene conversion, have been involved in their production. A computer program based on the expectation-maximization (EM) algorithm was used to obtain maximum-likelihood estimates of haplotype frequencies for the entire data set: good concordance between the unambiguous and EM-derived sets was seen for the overall haplotype frequencies. Some of the low-frequency haplotypes reported by the estimation algorithm differ greatly, in structure, from those haplotypes known to be present in human populations, and they may not represent haplotypes actually present in the sample.
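
The abstract above mentions a sample size-independent haplotype diversity but does not say how it was calculated. One common estimator with a small-sample correction is Nei's gene diversity, h = n/(n-1) × (1 − Σ pᵢ²), sketched below. Whether the study used this estimator is an assumption, and the function name `haplotype_diversity` and the list-of-labels input are illustrative choices.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's gene diversity with the n/(n-1) small-sample correction.

    haplotypes: list of hashable haplotype labels, one per sampled
    chromosome. Returns a value between 0 (one haplotype only) and
    1 (every sampled chromosome carries a different haplotype).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two chromosomes are required")
    counts = Counter(haplotypes)
    # Sum of squared sample frequencies: the chance that two draws
    # with replacement carry the same haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)
```

For example, `haplotype_diversity(["a", "a", "b", "b"])` weighs two equally frequent haplotypes across four chromosomes. The n/(n-1) factor is what keeps the statistic comparable between samples of different sizes.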

9.
Leblois R, Slatkin M. Molecular Ecology, 2007, 16(11): 2237-2245
We consider an isolated population founded by a small number of individuals randomly chosen from a source population of known genetic composition at a known time in the past. We develop a Monte-Carlo maximum-likelihood method for estimating the number of founding individuals from the haplotype frequencies at several SNP (single nucleotide polymorphism) loci in a sample. We assume the isolated population was founded recently enough that mutation can be ignored and that haplotype frequencies in the source population have not changed. We apply the method to simulated data and show that it is unbiased. With a reasonable number of individuals sampled, it is possible to estimate the number of founders within a factor of 2. We show that the performance of the method is not degraded substantially if the frequencies of the rare haplotypes in the source are not known precisely and if there is some recombination. We illustrate the use of our method by applying it to a previously published data set from a recently founded population of wolves (Canis lupus) in Scandinavia.

10.
Power and sample size calculations are critical parts of any research design for genetic association. We present a method that utilizes haplotype frequency information and average marker-marker linkage disequilibrium on SNPs typed in and around all genes on a chromosome. The test statistic used is the classic likelihood ratio test applied to haplotypes in case/control populations. Haplotype frequencies are computed through specification of genetic model parameters. Power is determined by computation of the test's non-centrality parameter. Power per gene is computed as a weighted average of the power assuming each haplotype is associated with the trait. We apply our method to genotype data from dense SNP maps across three entire chromosomes (6, 21, and 22) for three different human populations (African-American, Caucasian, Chinese), three different models of disease (additive, dominant, and multiplicative) and two trait allele frequencies (rare, common). We perform a regression analysis using these factors, average marker-marker disequilibrium, and the haplotype diversity across the gene region to determine which factors most significantly affect average power for a gene in our data. Also, as a 'proof of principle' calculation, we perform power and sample size calculations for all genes within 100 kb of the PSORS1 locus (chromosome 6) for a previously published association study of psoriasis. Results of our regression analysis indicate that four highly significant factors that determine average power to detect association are: disease model, average marker-marker disequilibrium, haplotype diversity, and the trait allele frequency. These findings may have important implications for the design of well-powered candidate gene association studies. Our power and sample size calculations for the PSORS1 gene appear consistent with published findings, namely that there is substantial power (>0.99) for most genes within 100 kb of the PSORS1 locus at the 0.01 significance level.

11.
Variation in gene expression may give rise to a significant fraction of inter-individual phenotypic variation. Studies searching for the underlying genetic controls for such variation have been conducted in model organisms and humans in recent years. In our previous effort of assessing conserved underlying haplotype patterns across ethnic populations, we constructed common haplotypes using SNPs having conserved linkage disequilibrium (LD) across ethnic populations. These common haplotypes cluster into a simple evolutionary structure based on their frequencies, defining only up to three conserved clusters termed 'haplotype frameworks'. One intriguing preliminary finding was that a significant portion of reported variants strongly associated with cis-regulation tags these globally conserved haplotype frameworks. Here we expand the investigation by collecting genes showing stringently determined cis-association between genotypes and expression phenotypes from major studies. We conducted phylogenetic analysis of current major haplotypes along with the corresponding haplotypes derived from chimpanzee reference sequences. Our analysis reveals that, for the vast majority of such cis-regulatory genes, the tagging SNPs showing the strongest association also tag the haplotype lineages directly separated from ancestry, inferred from either chimpanzee reference sequences or the allele frequency-derived haplotype frameworks, suggesting that the differentially expressed phenotypes evolved relatively early in human history. Such evolutionary signatures provide keys for a more effective identification of globally conserved candidate regulatory haplotypes across human genes in future epidemiologic and pharmacogenetic studies.

12.
The general availability of reliable and affordable genotyping technology has enabled genetic association studies to move beyond small case-control studies to large prospective studies. For prospective studies, genetic information can be integrated into the analysis via haplotypes, with focus on their association with a censored survival outcome. We develop non-iterative, regression-based methods to estimate associations between common haplotypes and a censored survival outcome in large cohort studies. Our non-iterative methods (weighted estimation and weighted haplotype combination) are both based on the Cox regression model, but differ in how the imputed haplotypes are integrated into the model. Our approaches enable haplotype imputation to be performed once as a simple data-processing step, and thus avoid implementation based on sophisticated algorithms that iterate between haplotype imputation and risk estimation. We show that non-iterative weighted estimation and weighted haplotype combination provide valid tests for genetic associations and reliable estimates of moderate associations between common haplotypes and a censored survival outcome, and are straightforward to implement in standard statistical software. We apply the methods to an analysis of HSPB7-CLCNKA haplotypes and risk of adverse outcomes in a prospective cohort study of outpatients with chronic heart failure.

13.
Genetic variants in a gene on 6p22.3, dysbindin, have been shown recently to be associated with schizophrenia (Straub et al. 2002a). There is no doubt that replication in other independent samples would enhance the significance of this finding considerably. Since the gene is located in the center of the linkage peak on chromosome 6p that we reported earlier, we decided to test six of the most positive DNA polymorphisms in a sib-pair sample and in an independently ascertained sample of triads comprising 203 families, including the families for which we detected linkage on chromosome 6p. Evidence for association was observed in the two samples separately as well as in the combined sample (P=.00068 for SNP rs760761). Multilocus haplotype analysis increased the significance further to .00002 for a two-locus haplotype and to .00001 for a three-locus haplotype. Estimation of frequencies for six-locus haplotypes revealed one common haplotype with a frequency of 73.4% in transmitted, and only 57.6% in nontransmitted, parental haplotypes. All other six-locus haplotypes occurring at a frequency of >1% were less often transmitted than nontransmitted. Our results represent a first successful replication of linkage disequilibrium in psychiatric genetics detected in a region with previous evidence of linkage and will encourage the search for causes of schizophrenia by the genetic approach.

14.
Y chromosome haplotype analysis in purebred dogs   Total citations: 3, self-citations: 0, citations by others: 3
In order to evaluate the genetic structure of purebred dogs, six Y chromosome microsatellite markers were used to analyze DNA samples from 824 unrelated dogs from 50 recognized breeds. A relatively small number of haplotypes (67) were identified in this large sample set due to extensive sharing of haplotypes between breeds and low haplotype diversity within breeds. Fifteen breeds were characterized by a single Y chromosome haplotype. Breed-specific haplotypes were identified for 26 of the 50 breeds, and haplotype sharing between some breeds indicated a common history. A molecular variance analysis (AMOVA) demonstrated significant genetic variation across breeds (63.7%) and with geographic origin of the breeds (11.5%). A network analysis of the haplotypes revealed further relationships between the breeds as well as deep rooting of many of the breed-specific haplotypes, particularly among breeds of African origin. Michael J. Bannasch and Jeanne R. Ryun contributed equally to this work.

15.

Background  

Increasingly, researchers are turning to the use of haplotype analysis as a tool in population studies, the investigation of linkage disequilibrium, and candidate gene analysis. When the phase of the data is unknown, computational methods, in particular those employing the Expectation-Maximisation (EM) algorithm, are frequently used for estimating the phase and frequency of the underlying haplotypes. These methods have proved very successful, predicting the phase-known frequencies from data for which the phase is unknown with a high degree of accuracy. Recently there has been much speculation as to the effect of unknown, or missing, allelic data (a common phenomenon even with modern automated DNA analysis techniques) on the performance of EM-based methods. To this end an EM-based program, modified to accommodate missing data, has been developed, incorporating non-parametric bootstrapping for the calculation of accurate confidence intervals.

16.

Background

Evidence regarding the association of variation within ADRB2, the gene encoding the beta-adrenergic receptor 2 (ADRB2) with obesity and hypertension is exceedingly ambiguous. Despite negative reports, functional impacts of individual genetic variants have been reported. Also, functional haplotypes as well as haplotype combinations affecting expression levels in vivo of ADRB2 mRNA and protein as well as receptor sensitivity have been reported. The aim of the present study was therefore to evaluate if variations within ADRB2 as haplotypes or as haplotype combinations confer an increased prevalence of obesity and hypertension among adults.

Methodology/Principal Findings

We genotyped five variants required to capture common variation in a region including the ADRB2 locus in a population-based study of 6,514 unrelated, middle-aged Danes. Phases of the genotypes were estimated in silico. The variations were then investigated for their combined association with obesity, hypertension and related quantitative traits. The present study did not find consistent evidence for an association of ADRB2 variants with either obesity or hypertension when variations were analysed in a case-control study. The same lack of impact was also seen in the quantitative trait analyses, apart from nominal differences on waist-to-hip ratio and systolic blood pressure between specific haplotype combinations.

Conclusions/Significance

In a population-based sample of 6,514 Danes we found no consistent associations between five common variants which tag the ADRB2 locus and prevalence of obesity or hypertension, whether analysed as individual haplotypes or as haplotype pairs.

17.

Background

Current methods for haplotype inference without pedigree information assume random mating populations. In animal and plant breeding, however, mating is often not random. A particular form of nonrandom mating occurs when parental individuals of opposite sex originate from distinct populations. This is called crossbreeding in animal breeding and hybridization in plant breeding. In these situations, association between marker and putative gene alleles might differ between the founding populations, and origin of alleles should be accounted for in studies which estimate breeding values with marker data. The sequence of alleles from one parent constitutes one haplotype of an individual. Haplotypes thus reveal allele origin in data of crossbred individuals.

Results

We introduce a new method for haplotype inference without pedigree that allows nonrandom mating and that can use genotype data of the parental populations and of a crossbred population. The aim of the method is to estimate line origin of alleles. The method has a Bayesian setup with a Dirichlet Process (DP) as prior for the haplotypes in the two parental populations. The basic idea is that only a subset of the complete set of possible haplotypes is present in the population.

Conclusion

Line origin of approximately 95% of the alleles at heterozygous sites was assessed correctly in both simulated and real data. Comparing accuracy of haplotype frequencies inferred with the new algorithm to the accuracy of haplotype frequencies inferred with PHASE, an existing algorithm for haplotype inference, showed that the DP algorithm outperformed PHASE in situations of crossbreeding and that PHASE performed better in situations of random mating.

18.
MOTIVATION: With the availability of large-scale, high-density single-nucleotide polymorphism markers and information on haplotype structures and frequencies, a great challenge is how to take advantage of haplotype information in the association mapping of complex diseases in case-control studies. RESULTS: We present a novel approach for association mapping based on directly mining haplotypes (i.e. phased genotype pairs) produced from case-control data or case-parent data via a density-based clustering algorithm, which can be applied to whole-genome screens as well as candidate-gene studies in small genomic regions. The method directly explores the sharing of haplotype segments in affected individuals that are rarely present in normal individuals. The measure of sharing between two haplotypes is defined by a new similarity metric that combines the length of the shared segments and the number of common alleles around any marker position of the haplotypes, which is robust against recent mutations/genotype errors and recombination events. The effectiveness of the approach is demonstrated by using both simulated datasets and real datasets. The results show that the algorithm is accurate for different population models and for different disease models, even for genes with small effects, and it outperforms some recently developed methods.

19.
Bayesian spatial modeling of haplotype associations   Total citations: 9, self-citations: 0, citations by others: 9
We review methods for relating the risk of disease to a collection of single nucleotide polymorphisms (SNPs) within a small region. Association studies using case-control designs with unrelated individuals could be used either to test for a direct effect of a candidate gene and characterize the responsible variant(s), or to fine map an unknown gene by exploiting the pattern of linkage disequilibrium (LD). We consider a flexible class of logistic penetrance models based on haplotypes and compare them with an alternative formulation based on unphased multilocus genotypes. The likelihood for haplotype-based models requires summation over all possible haplotype assignments consistent with the observed genotype data, and can be fitted using either Expectation-Maximization (E-M) or Markov chain Monte Carlo (MCMC) methods. Subtleties involving ascertainment correction for case-control studies are discussed. There has been great interest in methods for LD mapping based on the coalescent or ancestral recombination graphs as well as methods based on haplotype sharing, both of which we review briefly. Because of their computational complexity, we propose some alternative empirical modeling approaches using techniques borrowed from the Bayesian spatial statistics literature. Here, space is interpreted in terms of a distance metric describing the similarity of any pair of haplotypes to each other, and hence their presumed common ancestry. Specifically, we discuss the conditional autoregressive model and two spatial clustering models: Potts and Voronoi. We conclude with a discussion of the implications of these methods for modeling cryptic relatedness, haplotype blocks, and haplotype tagging SNPs, and suggest a Bayesian framework for the HapMap project.

20.
Association-based linkage disequilibrium (LD) mapping is an increasingly important tool for localizing genes that show potential influence on human aging and longevity. As haplotypes contain more LD information than single markers, a haplotype-based LD approach can have increased power in detecting associations as well as increased robustness in statistical testing. In this paper, we develop a new statistical model to estimate haplotype relative risks (HRRs) on human survival using unphased multilocus genotype data from unrelated individuals in cross-sectional studies. Based on the proportional hazard assumption, the model can estimate haplotype risk and frequency parameters, incorporate observed covariates, assess interactions between haplotypes and the covariates, and investigate the modes of gene function. By introducing population survival information available from population statistics, we are able to develop a procedure that carries out the parameter estimation using a nonparametric baseline hazard function and estimates sex-specific HRRs to infer gene-sex interaction. We also evaluate the haplotype effects on human survival while taking into account individual heterogeneity in the unobserved genetic and nongenetic factors or frailty by introducing the gamma-distributed frailty into the survival function. After model validation by computer simulation, we apply our method to an empirical data set to measure haplotype effects on human survival and to estimate haplotype frequencies at birth and over the observed ages. Results from both simulation and model application indicate that our survival analysis model is an efficient method for inferring haplotype effects on human survival in population-based association studies.
