Similar articles
20 similar articles found (search time: 31 ms)
1.
Design and analysis methods are presented for studying the association of a candidate gene with a disease by using parental data in place of nonrelated controls. This alternative design eliminates spurious differences in allele frequencies between cases and nonrelated controls resulting from different ethnic origins and population stratification for these two groups. We present analysis methods which are based on two genetic relative risks: (1) the relative risk of disease for homozygotes with two copies of the candidate gene versus homozygotes without the candidate gene and (2) the relative risk for heterozygotes with one copy of the candidate gene versus homozygotes without the candidate gene. In addition to estimating the magnitude of these relative risks, likelihood methods allow specific hypotheses to be tested, namely, a test for overall association of the candidate gene with disease, as well as specific genetic hypotheses, such as dominant or recessive inheritance. Two likelihood methods are presented: (1) a likelihood method appropriate when Hardy-Weinberg equilibrium holds and (2) a likelihood method in which we condition on parental genotype data when Hardy-Weinberg equilibrium does not hold. The results for the relative efficiency of these two methods suggest that the conditional approach may at times be preferable, even when equilibrium holds. Sample-size and power calculations are presented for a multitiered design. The purpose of tier 1 is to detect the presence of an abnormal sequence for a postulated candidate gene among a small group of cases. The purpose of tier 2 is to test for association of the abnormal variant with disease, such as by the likelihood methods presented. The purpose of tier 3 is to confirm positive results from tier 2. 
Results indicate that required sample sizes are smaller when expression of disease is recessive, rather than dominant, and that, for recessive disease and large relative risks, necessary sample sizes may be feasible, even if only a small percentage of the disease can be attributed to the candidate gene.
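
As a rough illustration of the conditional approach described in entry 1, the sketch below evaluates a log-likelihood for case-parent trios conditional on parental genotypes. It is not taken from the paper: the function names, the 0/1/2 genotype coding (copies of the candidate allele), and the parameter names `psi1` and `psi2` (relative risks for one and two copies versus none) are assumptions made for this example, as is Mendelian transmission without genotyping errors.

```python
import math


def mendelian_probs(g_mother, g_father):
    """P(child carries 0, 1, 2 copies | parental genotypes), Mendelian transmission."""
    t_m = g_mother / 2.0  # probability the mother transmits the candidate allele
    t_f = g_father / 2.0  # probability the father transmits the candidate allele
    return [
        (1 - t_m) * (1 - t_f),
        t_m * (1 - t_f) + (1 - t_m) * t_f,
        t_m * t_f,
    ]


def cpg_log_likelihood(trios, psi1, psi2):
    """Log-likelihood of affected children's genotypes given parental genotypes.

    trios: iterable of (g_mother, g_father, g_child), each coded 0, 1 or 2.
    psi1, psi2: hypothetical names for the heterozygote and homozygote
    relative risks (reference: zero copies, relative risk 1).
    Assumes every trio is Mendelian-consistent; otherwise log(0) is raised.
    """
    risks = [1.0, psi1, psi2]
    total = 0.0
    for g_mother, g_father, g_child in trios:
        probs = mendelian_probs(g_mother, g_father)
        # Normalising constant: average risk over the possible child genotypes.
        denom = sum(r * p for r, p in zip(risks, probs))
        total += math.log(risks[g_child] * probs[g_child] / denom)
    return total
```

Maximising this function over `psi1` and `psi2` (for example with a grid search) would give estimates of the two relative risks under these assumptions; trios with two homozygous parents contribute zero to the log-likelihood because the child's genotype is then fixed.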

2.
Selecting a control group that is perfectly matched for ethnic ancestry with a group of affected individuals is a major problem in studying the association of a candidate gene with a disease. This problem can be avoided by a design that uses parental data in place of nonrelated controls. Schaid and Sommer presented two new methods for the statistical analysis using this approach: (1) a likelihood method (Hardy-Weinberg equilibrium [HWE] method), which rests on the assumption that HWE holds, and (2) a conditional likelihood method (conditional on parental genotype [CPG] method) appropriate when HWE is absent. Schaid and Sommer claimed that the CPG method can be more efficient than the HWE method, even when equilibrium holds. It can be shown, however, that in the equilibrium situation the HWE method is always more efficient than the CPG method. For a dominant disease, the differences are slim. But for a recessive disease, the CPG method requires a much larger sample size to achieve a prescribed power than the HWE method. Additionally, we show how the relative risks for the various candidate-gene genotypes can be estimated without relying on iterative methods. For the CPG method, we present an asymptotic power approximation that is sufficiently precise for planning the sample size of an association study.

3.
The genetic basis of many common human diseases is expected to be highly heterogeneous, with multiple causative loci and multiple alleles at some of the causative loci. Analyzing the association of disease with one genetic marker at a time can have weak power, because of relatively small genetic effects and the need to correct for multiple testing. Testing the simultaneous effects of multiple markers by multivariate statistics might improve power, but they too will not be very powerful when there are many markers, because of the many degrees of freedom. To overcome some of the limitations of current statistical methods for case-control studies of candidate genes, we develop a new class of nonparametric statistics that can simultaneously test the association of multiple markers with disease, with only a single degree of freedom. Our approach, which is based on U-statistics, first measures a score over all markers for pairs of subjects and then compares the averages of these scores between cases and controls. Genetic scoring for a pair of subjects is measured by a "kernel" function, which we allow to be fairly general. However, we provide guidelines on how to choose a kernel for different types of genetic effects. Our global statistic has the advantage of having only one degree of freedom and achieves its greatest power advantage when the contrasts of average genotype scores between cases and controls are in the same direction across multiple markers. Simulations illustrate that our proposed methods have the anticipated type I-error rate and that they can be more powerful than standard methods. Application of our methods to a study of candidate genes for prostate cancer illustrates their potential merits, and offers guidelines for interpretation.

4.
Chen J, Rodriguez C. Biometrics, 2007, 63(4): 1099-1107
Genetic epidemiologists routinely assess disease susceptibility in relation to haplotypes, that is, combinations of alleles on a single chromosome. We study statistical methods for inferring haplotype-related disease risk using single nucleotide polymorphism (SNP) genotype data from matched case-control studies, where controls are individually matched to cases on some selected factors. Assuming a logistic regression model for haplotype-disease association, we propose two conditional likelihood approaches that address the issue that haplotypes cannot be inferred with certainty from SNP genotype data (phase ambiguity). One approach is based on the likelihood of disease status conditioned on the total number of cases, genotypes, and other covariates within each matching stratum, and the other is based on the joint likelihood of disease status and genotypes conditioned only on the total number of cases and other covariates. The joint-likelihood approach is generally more efficient, particularly for assessing haplotype-environment interactions. Simulation studies demonstrated that the first approach was more robust to model assumptions on the diplotype distribution conditioned on environmental risk variables and matching factors in the control population. We applied the two methods to analyze a matched case-control study of prostate cancer.

5.
Han F, Pan W. Biometrics, 2012, 68(1): 307-315
Many statistical tests have been proposed for case-control data to detect disease association with multiple single nucleotide polymorphisms (SNPs) in linkage disequilibrium. The main reason for the existence of so many tests is that each test aims to detect one or two aspects of many possible distributional differences between cases and controls, largely due to the lack of a general and yet simple model for discrete genotype data. Here we propose a latent variable model to represent SNP data: the observed SNP data are assumed to be obtained by discretizing a latent multivariate Gaussian variate. Because the latent variate is multivariate Gaussian, its distribution is completely characterized by its mean vector and covariance matrix, in contrast to much more complex forms of a general distribution for discrete multivariate SNP data. We propose a composite likelihood approach for parameter estimation. A direct application of this latent variable model is to association testing with multiple SNPs in a candidate gene or region. In contrast to many existing tests that aim to detect only one or two aspects of many possible distributional differences of discrete SNP data, we can exclusively focus on testing the mean and covariance parameters of the latent Gaussian distributions for cases and controls. Our simulation results demonstrate potential power gains of the proposed approach over some existing methods.

6.
A significant proportion of the variation between individuals in gene expression levels is genetic, and it is likely that these differences correlate with phenotypic differences or with risk of disease. Cis-acting polymorphisms are important in determining interindividual differences in gene expression that lead to allelic expression imbalance, which is the unequal expression of homologous alleles in individuals heterozygous for such a polymorphism. This expression imbalance can be detected using a transcribed polymorphism, and, once it is established, the next step is to identify the polymorphisms that are responsible for or predictive of allelic expression levels. We present an expectation-maximization algorithm for such analyses, providing a formal statistical framework to test whether a candidate polymorphism is associated with allelic expression differences.

7.
A key step toward the discovery of a gene related to a trait is the finding of an association between the trait and one or more haplotypes. Haplotype analyses can also provide critical information regarding the function of a gene; however, when unrelated subjects are sampled, haplotypes are often ambiguous because of unknown linkage phase of the measured sites along a chromosome. A popular method of accounting for this ambiguity in case-control studies uses a likelihood that depends on haplotype frequencies, so that the haplotype frequencies can be compared between the cases and controls; however, this traditional method is limited to a binary trait (case vs. control), and it does not provide a method of testing the statistical significance of specific haplotypes. To address these limitations, we developed new methods of testing the statistical association between haplotypes and a wide variety of traits, including binary, ordinal, and quantitative traits. Our methods allow adjustment for nongenetic covariates, which may be critical when analyzing genetically complex traits. Furthermore, our methods provide several different global tests for association, as well as haplotype-specific tests, which give a meaningful advantage in attempts to understand the roles of many different haplotypes. The statistics can be computed rapidly, making it feasible to evaluate the associations between many haplotypes and a trait. To illustrate the use of our new methods, we apply them to a study of the association of haplotypes (composed of genes from the human-leukocyte-antigen complex) with humoral immune response to measles vaccination. Limited simulations are also presented to demonstrate the validity of our methods, as well as to provide guidelines on how our methods could be used.

8.
Traditional case-control studies provide a powerful and efficient method for evaluation of association between candidate genes and disease. The sampling of cases from multiplex pedigrees, rather than from a catchment area, can increase the likelihood that genetic cases are selected. However, use of all the related cases without accounting for their biological relationship can increase the type I error rate of the statistical test. To overcome this problem, we present an analysis method that is used to compare genotype frequencies between cases and controls, according to a trend in proportions as the dosage of the risk allele increases. This method uses the appropriate variance to account for the correlated family data, thus maintaining the correct type I error rate. The magnitude of the association is estimated by the odds ratio, with the variance of the odds ratio also accounting for the correlated data. Our method makes efficient use of data collected from multiplex families and should prove useful for the analysis of candidate genes among families sampled for linkage studies. An application of our method, to family data from a prostate cancer study, is presented to illustrate the method's utility.
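
For orientation, the trend in proportions mentioned in entry 8 can be sketched with the classical Cochran-Armitage test for independent subjects. This is deliberately the uncorrected version: it does not include the family-correlation variance that the abstract describes, so it would be expected to inflate type I error when related cases are used. The function name and the default dosage scores `(0, 1, 2)` are assumptions made for this example.

```python
import math


def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for independent cases and controls.

    cases[i], controls[i]: genotype counts for risk-allele dosage scores[i].
    Returns (chi-square statistic with 1 df, p-value).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n_all = n_cases + n_controls
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * n for x, n in zip(scores, totals))
    sum_x2n = sum(x * x * n for x, n in zip(scores, totals))
    # Numerator: contrast between observed and expected score totals in cases.
    t = n_all * sum_xr - n_cases * sum_xn
    denom = n_cases * n_controls * (n_all * sum_x2n - sum_xn ** 2)
    if denom == 0:
        raise ValueError("trend test undefined: no variation in dosage or status")
    chi2 = n_all * t * t / denom
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With hypothetical counts `cases=[10, 20, 30]` and `controls=[30, 20, 10]`, the statistic works out to 20.0 by hand; identical case and control counts give a statistic of 0.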

9.
Complex diseases are multifactorial in nature and can involve multiple loci with gene × gene and gene × environment interactions. Research on methods to uncover the interactions between those genes that confer susceptibility to disease has been extensive, but many of these methods have only been developed for sibling pairs or sibships. In this report, we assess the performance of two methods for finding gene × gene interactions that are applicable to arbitrarily sized pedigrees, one based on correlation in per-family nonparametric linkage scores and another that incorporates candidate loci genotypes as covariates into an affected relative pair linkage analysis. The power and type I error rate of both of these methods were assessed using the simulated Genetic Analysis Workshop 14 data. In general, we found detection of the interacting loci to be a difficult problem, and though we experienced some modest success there is a clear need to continue developing new methods and approaches to the problem.

10.
We studied HLA DQB1 allele frequencies and the relative risk (RR) of various genotypes in 72 type 1 diabetic patients and 40 control individuals in Uruguay. This is a tri-racial (Caucasian, Black and Indo-American) mixed population. The products of the polymerase chain reaction amplifications were hybridized with oligonucleotides by allele-specific oligonucleotide reverse or dot blot methods. Significant differences between these two groups were observed only for allele DQB1*0302 (35%, RR = 7.34, P<0.001). The frequency of the alleles carrying a non-aspartic acid residue at position 57 was significantly higher in the diabetic patients (85 vs 53%, P<0.001). In contrast, the frequency of Asp alleles was negatively associated with type 1 diabetes (RR = 0.20, P<0.001). The genotype DQB1*0302/DQB1*0201 (33%, RR = 5.41, P<0.05) was positively associated with this disease. The genotype frequencies associated with type 1 diabetes in our population differed significantly from those known for Caucasian and Black populations, as well as from those of another admixed population, from Chile.

11.
Two-stage designs in case-control association analysis (total citations: 1; self-citations: 0; citations by others: 1)
Zuo Y, Zou G, Zhao H. Genetics, 2006, 173(3): 1747-1760
DNA pooling is a cost-effective approach for collecting information on marker allele frequency in genetic studies. It is often suggested as a screening tool to identify a subset of candidate markers from a very large number of markers to be followed up by more accurate and informative individual genotyping. In this article, we investigate several statistical properties and design issues related to this two-stage design, including the selection of the candidate markers for second-stage analysis, statistical power of this design, and the probability that truly disease-associated markers are ranked among the top after second-stage analysis. We have derived analytical results on the proportion of markers to be selected for second-stage analysis. For example, to detect disease-associated markers with an allele frequency difference of 0.05 between the cases and controls through an initial sample of 1000 cases and 1000 controls, our results suggest that when the measurement errors are small (0.005), approximately 3% of the markers should be selected. For the statistical power to identify disease-associated markers, we find that the measurement errors associated with DNA pooling have little effect on its power. This is in contrast to the one-stage pooling scheme where measurement errors may have large effect on statistical power. As for the probability that the disease-associated markers are ranked among the top in the second stage, we show that there is a high probability that at least one disease-associated marker is ranked among the top when the allele frequency differences between the cases and controls are not <0.05 for reasonably large sample sizes, even though the errors associated with DNA pooling in the first stage are not small. Therefore, the two-stage design with DNA pooling as a screening tool offers an efficient strategy in genomewide association studies, even when the measurement errors associated with DNA pooling are nonnegligible. 
For any disease model, we find that all the statistical results essentially depend on the population allele frequency and the allele frequency differences between the cases and controls at the disease-associated markers. The general conclusions hold whether the second stage uses an entirely independent sample or includes both the samples used in the first stage and an independent set of samples.

12.
One major problem in studying an association between a marker locus and a disease is the selection of an appropriate group of controls. However, this problem of population stratification can be circumvented in a quite elegant manner by family-based methods. The haplotype-relative-risk (HRR) method, which samples nuclear families with a single affected child and uses the parental haplotypes not transmitted to that child as a control individual, represents such a method for estimating the relative risk of a marker phenotype. In the special case of a recessive disease, it was already known that the equivalence of the HRR method with the classical relative risk (RR) obtained from independent samples holds only if the probability theta of a recombination between marker and disease locus is zero. We extend this result to an arbitrary mode of inheritance. Furthermore, we compare the distribution of the estimators for HRR and RR and show that, in the case of a positive linkage disequilibrium between a marker and disease allele, the distribution of the estimator for HRR is (stochastically) smaller than that for RR, irrespective of the recombination fraction. The practical implication of this result is that, for the HRR method, there is no tendency to give unduly high risk estimators, even for theta > 0. Finally, we give an expression for the standard error of the estimator for HRR by taking into account the nonindependence of transmitted and nontransmitted parental marker alleles in the case of theta > 0.

13.
14.
Once genetic linkage has been identified for a complex disease, the next step is often association analysis, in which single-nucleotide polymorphisms (SNPs) within the linkage region are genotyped and tested for association with the disease. If a SNP shows evidence of association, it is useful to know whether the linkage result can be explained, in part or in full, by the candidate SNP. We propose a novel approach that quantifies the degree of linkage disequilibrium (LD) between the candidate SNP and the putative disease locus through joint modeling of linkage and association. We describe a simple likelihood of the marker data conditional on the trait data for a sample of affected sib pairs, with disease penetrances and disease-SNP haplotype frequencies as parameters. We estimate model parameters by maximum likelihood and propose two likelihood-ratio tests to characterize the relationship of the candidate SNP and the disease locus. The first test assesses whether the candidate SNP and the disease locus are in linkage equilibrium so that the SNP plays no causal role in the linkage signal. The second test assesses whether the candidate SNP and the disease locus are in complete LD so that the SNP or a marker in complete LD with it may account fully for the linkage signal. Our method also yields a genetic model that includes parameter estimates for disease-SNP haplotype frequencies and the degree of disease-SNP LD. Our method provides a new tool for detecting linkage and association and can be extended to study designs that include unaffected family members.

15.
Where recent admixture has occurred between two populations that have different disease rates for genetic reasons, family-based association studies can be used to map the genes underlying these differences, if the ancestry of the alleles at each locus examined can be assigned to one of the two founding populations. This article explores the statistical power and design requirements of this approach. Markers suitable for assigning the ancestry of genomic regions could be defined by grouping alleles at closely spaced microsatellite loci into haplotypes, or generated by representational difference analysis. For a given relative risk between populations, the sample size required to detect a disease locus that accounts for this relative risk by linkage-disequilibrium mapping in an admixed population is not critically dependent on assumptions about genotype penetrances or allele frequencies. Using the transmission-disequilibrium test to search the genome for a locus that accounts for a relative risk of between 2 and 3 in a high-risk population, compared with a low-risk population, generally requires between 150 and 800 case-parent pairs of mixed descent. The optimal strategy is to conduct an initial study using markers spaced at ≤ 10 cM with cases from the second and third generations of mixed descent, and then to map the disease loci more accurately in a subsequent study of a population with a longer history of admixture. This approach has greater statistical power than allele-sharing designs and has obvious applications to the genetics of hypertension, non-insulin-dependent diabetes, and obesity.
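
The transmission-disequilibrium test named in entry 15 reduces, in its basic biallelic form, to a McNemar-type statistic on transmissions from heterozygous parents. The sketch below shows only that basic form; it is not the power calculation from the article, and the function and argument names are assumptions made for this example.

```python
import math


def tdt(transmitted, not_transmitted):
    """Basic transmission-disequilibrium test.

    transmitted: number of heterozygous parents who passed the allele of
        interest (here, for instance, the allele of high-risk ancestry)
        to an affected child.
    not_transmitted: number of heterozygous parents who passed the other allele.
    Returns (chi-square statistic with 1 df, p-value).
    """
    b, c = transmitted, not_transmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For hypothetical counts of 60 transmissions against 40 non-transmissions, the statistic is (60 − 40)² / 100 = 4.0. Homozygous parents are uninformative and are simply not counted.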

16.
Case-control studies are used to map loci associated with a genetic disease. The usual case-control study tests for significant differences in frequencies of alleles at marker loci. In this paper, we consider the problem of comparing two or more marker loci simultaneously and testing for significant differences in haplotype rather than allele frequencies. We consider two situations. In the first, genotypes at marker loci are resolved into haplotypes by making use of biochemical methods or by genotyping family members. In the second, genotypes at marker loci are not resolved into haplotypes, but, by assuming random mating, haplotypes can be inferred using a likelihood method such as the expectation-maximization (EM) algorithm. We assume that a causative locus has two alleles with a multiplicative effect on the penetrance of a disease, with one allele increasing the penetrance by a factor pi. We find, for small values of pi-1 and large sample sizes, asymptotic results that predict the statistical power of a test for significant differences in haplotype frequencies between cases and a random sample of the population, both when haplotypes can be resolved and when haplotypes have to be inferred. The increase in power when haplotypes can be resolved can be expressed as a ratio R, which is the increase in sample size needed to achieve the same power when haplotypes are resolved over when they are not resolved. In general, R depends on the pattern of linkage disequilibrium between the causative allele and the marker haplotypes but is independent of the frequency of the causative allele and, to a first approximation, is independent of pi.  For the special situation of two di-allelic marker loci, we obtain a simple expression for R and its upper bound.
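
To make the unresolved-haplotype situation in entry 16 concrete, here is a minimal sketch of the standard EM algorithm for two di-allelic loci under random mating, where only double heterozygotes have ambiguous phase. The allele labels (A/a, B/b), the 3 × 3 count layout, and the function name are assumptions made for this example; the sketch estimates haplotype frequencies only and does not reproduce the power results or the ratio R from the article.

```python
def em_haplotype_freqs(counts, n_iter=100):
    """Estimate haplotype frequencies for two di-allelic loci by EM.

    counts[i][j]: number of subjects with i copies of allele A at locus 1
    and j copies of allele B at locus 2 (i, j in 0..2).
    Returns a dict with frequencies of haplotypes "AB", "Ab", "aB", "ab".
    """
    n = counts
    # Haplotype counts from genotypes whose phase is unambiguous.
    known = {
        "AB": 2 * n[2][2] + n[2][1] + n[1][2],
        "Ab": 2 * n[2][0] + n[2][1] + n[1][0],
        "aB": 2 * n[0][2] + n[1][2] + n[0][1],
        "ab": 2 * n[0][0] + n[1][0] + n[0][1],
    }
    double_het = n[1][1]  # phase is AB/ab or Ab/aB
    total = 2 * sum(sum(row) for row in n)
    freqs = {h: 0.25 for h in known}  # arbitrary starting values
    for _ in range(n_iter):
        # E-step: expected share of double heterozygotes with phase AB/ab.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        frac_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        expected = dict(known)
        expected["AB"] += double_het * frac_cis
        expected["ab"] += double_het * frac_cis
        expected["Ab"] += double_het * (1 - frac_cis)
        expected["aB"] += double_het * (1 - frac_cis)
        # M-step: haplotype frequencies from the expected counts.
        freqs = {h: c / total for h, c in expected.items()}
    return freqs
```

A fixed number of iterations is used for simplicity; a convergence check on the change in frequencies would be a natural refinement. In a case-control comparison, frequencies would be estimated separately in each group before testing for differences.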

17.
We describe a computer program, Epistat, which combines statistical methods and color-graphic displays to facilitate the analysis of interactions between pairs of quantitative trait loci (QTLs). Epistat organizes genetic-mapping data and quantitative-trait values into graphic displays which illustrate the individual effects of single loci as well as the interactions between any two loci. Keyboard commands allow the user to search the data set for individual QTLs and to test for interactions between QTLs. For a given trait, the program displays the effects of the alleles at each of two loci on the quantitative-trait value, as well as the effects of the interactions between these alleles. Loglikelihood ratios are used to compare the likelihood of explaining the effects by null, additive, or epistatic models. Examples of interactions in soybean are presented for near-infrared transmittance (NIT), seed number, and reproductive period. Epistat has been used to find numerous interactions between QTLs in soybean in which trait variation at one locus is conditional upon a specific allele at another. Received: 16 January 1996 / Accepted: 27 September 1996

18.
The objective of the study was to identify interacting genes contributing to rheumatoid arthritis (RA) susceptibility and identify single nucleotide polymorphisms (SNPs) that discriminate between RA patients who were anti-cyclic citrullinated protein positive and healthy controls. We analyzed two independent cohorts from the North American Rheumatoid Arthritis Consortium. A cohort of 908 RA cases and 1,260 controls was used to discover pairwise interactions among SNPs and to identify a set of SNPs that predict RA status, and a second cohort of 952 cases and 1,760 controls was used to validate the findings. After adjusting for HLA-shared epitope alleles, we identified and replicated seven SNP pairs within the HLA class II locus with significant interaction effects. We failed to replicate significant pairwise interactions among non-HLA SNPs. The machine learning approach “random forest” applied to a set of SNPs selected from single-SNP and pairwise interaction tests identified 93 SNPs that distinguish RA cases from controls with 70% accuracy. HLA SNPs provide the most classification information, and inclusion of non-HLA SNPs improved classification. While specific gene–gene interactions are difficult to validate using genome-wide SNP data, a stepwise approach combining association and classification methods identifies candidate interacting SNPs that distinguish RA cases from healthy controls.

19.
Approximate Bayesian computation in population genetics (total citations: 23; self-citations: 0; citations by others: 23)
Beaumont MA, Zhang W, Balding DJ. Genetics, 2002, 162(4): 2025-2035
We propose a new method for approximate Bayesian statistical inference on the basis of summary statistics. The method is suited to complex problems that arise in population genetics, extending ideas developed in this setting by earlier authors. Properties of the posterior distribution of a parameter, such as its mean or density curve, are approximated without explicit likelihood calculations. This is achieved by fitting a local-linear regression of simulated parameter values on simulated summary statistics, and then substituting the observed summary statistics into the regression equation. The method combines many of the advantages of Bayesian statistical inference with the computational efficiency of methods based on summary statistics. A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty. Simulation results indicate computational and statistical efficiency that compares favorably with those of alternative methods previously proposed in the literature. We also compare the relative efficiency of inferences obtained using methods based on summary statistics with those obtained directly from the data using MCMC.
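
The regression step summarised in entry 19 can be sketched on a toy problem with one parameter and one summary statistic. Everything concrete here is an assumption made for illustration rather than a detail from the abstract: the normal toy model, the uniform prior, the Epanechnikov-style weights, keeping the closest `n_keep` simulations, and all function names.

```python
import random


def abc_local_linear(s_obs, prior_draw, simulate, n_sims=5000, n_keep=500, seed=0):
    """Approximate Bayesian computation with a local-linear regression adjustment.

    Returns (posterior mean estimate, regression slope, adjusted draws, weights).
    """
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        theta = prior_draw(rng)
        s = simulate(theta, rng)
        sims.append((abs(s - s_obs), theta, s))
    sims.sort()  # closest simulated summaries first
    kept = sims[:n_keep]
    delta = kept[-1][0]  # tolerance: distance of the farthest kept simulation
    weights = [1.0 - (d / delta) ** 2 if delta > 0 else 1.0 for d, _, _ in kept]
    xs = [s - s_obs for _, _, s in kept]
    ys = [theta for _, theta, _ in kept]
    # Weighted least squares of theta on (s - s_obs).
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar  # fitted value at the observed summary
    # Substitute the observed summary: shift each draw along the fitted line.
    adjusted = [y - beta * x for x, y in zip(xs, ys)]
    return alpha, beta, adjusted, weights


def prior_draw(rng):
    """Toy prior: theta uniform on [0, 10]."""
    return rng.uniform(0.0, 10.0)


def simulate_mean(theta, rng, n=20):
    """Toy model: summary statistic is the mean of n normal(theta, 1) draws."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


posterior_mean, slope, adjusted, weights = abc_local_linear(4.0, prior_draw, simulate_mean)
```

The weighted mean of `adjusted` equals `posterior_mean` by construction, and the weighted spread of `adjusted` gives a rough picture of the posterior. In this toy setting the estimate should land near the observed summary of 4.0, since the prior is flat around it.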

20.
The genetic mapping of complex traits has been challenging and has required new statistical methods that are robust to misspecified models. Liang et al. proposed a robust multipoint method that can be used to simultaneously estimate, on the basis of sib-pair linkage data, both the position of a trait locus on a chromosome and its effect on disease status. The advantage of their method is that it does not require specification of an underlying genetic model, so estimation of the position of a trait locus on a specified chromosome and of its standard error is robust to a wide variety of genetic mechanisms. If multiple loci influence the trait, the method models the marginal effect of a locus on a specified chromosome. The main critical assumption is that there is only one trait locus on the chromosome of interest. We extend this method to different types of affected relative pairs (ARPs) by two approaches. One approach is to estimate the position of a trait locus yet allow unconstrained trait-locus effects across different types of ARPs. This robust approach allows for differences in sharing alleles identical-by-descent across different types of ARPs. Some examples for which an unconstrained model would apply are differences due to secular changes in diagnostic methods that can change the frequency of phenocopies among different types of relative pairs, environmental factors that modify the genetic effect, epistasis, and variation in marker-information content. However, this unconstrained model requires a parameter for each type of relative pair. To reduce the number of parameters, we propose a second approach that models the marginal effect of a susceptibility locus. This constrained model is robust for a trait caused by either a single locus or by multiple loci without epistasis. To evaluate the adequacy of the constrained model, we developed a robust score statistic. 
These methods are applied to a prostate cancer-linkage study, which emphasizes their potential advantages and limitations.
