Similar articles
20 similar articles found
1.
Missing data occur in genetic association studies for several reasons including missing family members and uncertain haplotype phase. Maximum likelihood is a commonly used approach to accommodate missing data, but it can be difficult to apply to family-based association studies, because of possible loss of robustness to confounding by population stratification. Here a novel likelihood for nuclear families is proposed, in which distinct sets of association parameters are used to model the parental genotypes and the offspring genotypes. This approach is robust to population structure when the data are complete, and has only minor loss of robustness when there are missing data. It also allows a novel conditioning step that gives valid analysis for multiple offspring in the presence of linkage. Unrelated subjects are included by regarding them as the children of two missing parents. Simulations and theory indicate similar operating characteristics to TRANSMIT, but with no bias with missing data in the presence of linkage. In comparison with FBAT and PCPH, the proposed model is slightly less robust to population structure but has greater power to detect strong effects. In comparison to APL and MITDT, the model is more robust to stratification and can accommodate sibships of any size. The methods are implemented for binary and continuous traits in software, UNPHASED, available from the author.

2.
Tests for linkage and association in nuclear families. (Cited 12 times: 4 self-citations, 8 by others)
The transmission/disequilibrium test (TDT) originally was introduced to test for linkage between a genetic marker and a disease-susceptibility locus, in the presence of association. Recently, the TDT has been used to test for association in the presence of linkage. The motivation for this is that linkage analysis typically identifies large candidate regions, and further refinement is necessary before a search for the disease gene is begun on the molecular level. Evidence of association and linkage may indicate which markers in the region are closest to a disease locus. As a test of linkage, transmissions from heterozygous parents to all of their affected children can be included in the TDT; however, the TDT is a valid χ² test of association only if transmissions to unrelated affected children are used in the analysis. If the sample contains independent nuclear families with multiple affected children, then one procedure that has been used to test for association is to select randomly a single affected child from each sibship and to apply the TDT to those data. As an alternative, we propose two statistics that use data from all of the affected children. The statistics give valid χ² tests of the null hypothesis of no association or no linkage and generally are more powerful than the TDT with a single, randomly chosen, affected child from each family.
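As a hedged illustration of the basic TDT that this abstract builds on (not the two new statistics the authors propose, which the abstract does not spell out), the sketch below computes the usual McNemar-type statistic (b − c)²/(b + c) from the number of times heterozygous parents transmit (b) or do not transmit (c) the candidate allele to an affected child. The function names and any counts passed to them are hypothetical.

```python
import math


def tdt_chi2(transmitted: int, untransmitted: int) -> float:
    """Basic TDT statistic from heterozygous-parent transmission counts.

    transmitted   -- times the candidate allele was passed to an affected child
    untransmitted -- times the other allele was passed instead
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (transmitted - untransmitted) ** 2 / total


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom.

    For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))
```

For example, `tdt_chi2(30, 10)` evaluates (30 − 10)² / 40. As the abstract notes, this reference distribution is only appropriate as a test of association when each transmission comes from an unrelated affected child.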

3.
One major problem in studying an association between a marker locus and a disease is the selection of an appropriate group of controls. However, this problem of population stratification can be circumvented in a quite elegant manner by family-based methods. The haplotype-relative-risk (HRR) method, which samples nuclear families with a single affected child and uses the parental haplotypes not transmitted to that child as a control individual, represents such a method for estimating the relative risk of a marker phenotype. In the special case of a recessive disease, it was already known that the equivalence of the HRR method with the classical relative risk (RR) obtained from independent samples holds only if the probability θ of a recombination between marker and disease locus is zero. We extend this result to an arbitrary mode of inheritance. Furthermore, we compare the distribution of the estimators for HRR and RR and show that, in the case of a positive linkage disequilibrium between a marker and disease allele, the distribution of the estimator for HRR is (stochastically) smaller than that for RR, irrespective of the recombination fraction. The practical implication of this result is that, for the HRR method, there is no tendency to give unduly high risk estimators, even for θ > 0. Finally, we give an expression for the standard error of the estimator for HRR by taking into account the nonindependence of transmitted and nontransmitted parental marker alleles in the case of θ > 0.

4.
High-resolution mapping is an important step in the identification of complex disease genes. In outbred populations, linkage disequilibrium is expected to operate over short distances and could provide a powerful fine-mapping tool. Here we build on recently developed methods for linkage-disequilibrium mapping of quantitative traits to construct a general approach that can accommodate nuclear families of any size, with or without parental information. Variance components are used to construct a test that utilizes information from all available offspring but that is not biased in the presence of linkage or familiality. A permutation test is described for situations in which maximum-likelihood estimates of the variance components are biased. Simulation studies are used to investigate power and error rates of this approach and to highlight situations in which violations of multivariate normality assumptions warrant the permutation test. The relationship between power and the level of linkage disequilibrium for this test suggests that the method is well suited to the analysis of dense maps. The relationship between power and family structure is investigated, and these results are applicable to study design in complex disease, especially for late-onset conditions for which parents are usually not available. When parental genotypes are available, power does not depend greatly on the number of offspring in each family. Power decreases when parental genotypes are not available, but the loss in power is negligible when four or more offspring per family are genotyped. Finally, it is shown that, when siblings are available, the total number of genotypes required in order to achieve comparable power is smaller if parents are not genotyped.

5.
Two-stage designs in case-control association analysis (Cited 1 time: 0 self-citations, 1 by others)
Zuo Y  Zou G  Zhao H 《Genetics》2006,173(3):1747-1760
DNA pooling is a cost-effective approach for collecting information on marker allele frequency in genetic studies. It is often suggested as a screening tool to identify a subset of candidate markers from a very large number of markers to be followed up by more accurate and informative individual genotyping. In this article, we investigate several statistical properties and design issues related to this two-stage design, including the selection of the candidate markers for second-stage analysis, statistical power of this design, and the probability that truly disease-associated markers are ranked among the top after second-stage analysis. We have derived analytical results on the proportion of markers to be selected for second-stage analysis. For example, to detect disease-associated markers with an allele frequency difference of 0.05 between the cases and controls through an initial sample of 1000 cases and 1000 controls, our results suggest that when the measurement errors are small (0.005), approximately 3% of the markers should be selected. For the statistical power to identify disease-associated markers, we find that the measurement errors associated with DNA pooling have little effect on its power. This is in contrast to the one-stage pooling scheme where measurement errors may have a large effect on statistical power. As for the probability that the disease-associated markers are ranked among the top in the second stage, we show that there is a high probability that at least one disease-associated marker is ranked among the top when the allele frequency differences between the cases and controls are at least 0.05 for reasonably large sample sizes, even though the errors associated with DNA pooling in the first stage are not small. Therefore, the two-stage design with DNA pooling as a screening tool offers an efficient strategy in genomewide association studies, even when the measurement errors associated with DNA pooling are nonnegligible. For any disease model, we find that all the statistical results essentially depend on the population allele frequency and the allele frequency differences between the cases and controls at the disease-associated markers. The general conclusions hold whether the second stage uses an entirely independent sample or includes both the samples used in the first stage and an independent set of samples.

6.
The transmission disequilibrium test (TDT) has been utilized to test the linkage and association between a genetic trait locus and a marker. Spielman et al. (1993) introduced TDT to test linkage between a qualitative trait and a marker in the presence of association. In the presence of linkage, TDT can be applied to test for association for fine mapping (Martin et al., 1997; Spielman and Ewens, 1996). In recent years, extensive research has been carried out on the TDT between a quantitative trait and a marker locus (Allison, 1997; Fan et al., 2002; George et al., 1999; Rabinowitz, 1997; Xiong et al., 1998; Zhu and Elston, 2000, 2001). The original TDT for both qualitative and quantitative traits requires unrelated offspring of heterozygous parents for analysis, and much research has been carried out to extend it to different settings. For nuclear families with multiple offspring, one approach is to treat each child independently for analysis. Obviously, this may not be a valid method since offspring of one family are related to each other. Another approach is to select one offspring randomly from each family for analysis. However, with this method much information may be lost. Martin et al. (1997, 2000) constructed useful statistical tests to analyse the data for qualitative traits. In this paper, we propose to use mixed models to analyse sample data of nuclear families with multiple offspring for quantitative traits according to the models in Amos (1994). The method uses data of all offspring by taking into account their trait mean and variance-covariance structures, which contain all the effects of major gene locus, polygenic loci and environment. A test statistic based on mixed models is shown to be more powerful than the test statistic proposed by George et al. (1999) under moderate disequilibrium for nuclear families. Moreover, it has higher power than the TDT statistic which is constructed by randomly choosing a single offspring from each nuclear family.

7.
8.
Three lectures on case-control genetic association analysis (Cited 1 time: 0 self-citations, 1 by others)
The purpose of this review is to focus on the three most important themes in genetic association studies using randomly selected patients (case, affected) and normal samples (control, unaffected), so that students and researchers alike who are new to this field may quickly grasp the key issues and command basic analysis methods. These three themes are: elementary categorical analysis; disease mutation as an unobserved entity; and the importance of homogeneity in genetic association analysis.

9.
The Cochran-Armitage trend test (CATT) is well suited for testing association between a marker and a disease in case-control studies. When the underlying genetic model for the disease is known, the CATT optimal for the genetic model is used. For complex diseases, however, the genetic models of the true disease loci are unknown. In this situation, robust tests are preferable. We propose a two-phase analysis with model selection for the case-control design. In the first phase, we use the difference of Hardy-Weinberg disequilibrium coefficients between the cases and the controls for model selection. Then, an optimal CATT corresponding to the selected model is used for testing association. The correlation of the statistics used for selection and the test for association is derived to adjust the two-phase analysis with control of the Type-I error rate. The simulation studies show that this new approach has greater efficiency robustness than the existing methods.
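To make the CATT concrete, here is a minimal sketch of the standard trend statistic for a 2 × 3 genotype table. It covers only the plain CATT for a fixed set of genotype scores, not the model-selection step or the correlation adjustment described in the abstract. The function name is hypothetical, and the score conventions (0, 0, 1) for recessive, (0, 0.5, 1) for additive and (0, 1, 1) for dominant models are the ones commonly used in the literature, assumed here rather than taken from the abstract.

```python
def catt_chi2(cases, controls, scores=(0.0, 0.5, 1.0)):
    """Cochran-Armitage trend statistic (1 df) for a 2 x 3 genotype table.

    cases, controls -- genotype counts ordered by number of risk alleles (0, 1, 2)
    scores          -- genotype scores; (0, 0.5, 1) is the additive coding
    """
    r, s = list(cases), list(controls)
    if not (len(r) == len(s) == len(scores)):
        raise ValueError("cases, controls and scores must have equal length")
    n = [ri + si for ri, si in zip(r, s)]  # genotype totals
    R, S = sum(r), sum(s)                  # number of cases and controls
    N = R + S

    # Score-weighted difference between case and control genotype counts.
    T = sum(x * (S * ri - R * si) for x, ri, si in zip(scores, r, s))

    # Variance of T under the null hypothesis of no association.
    sum_x2n = sum(x * x * ni for x, ni in zip(scores, n))
    sum_xn = sum(x * ni for x, ni in zip(scores, n))
    var_T = (R * S / N) * (N * sum_x2n - sum_xn ** 2)
    if var_T == 0:
        raise ValueError("statistic undefined: no variation in scores or status")
    return T * T / var_T
```

Under the null hypothesis the returned value is referred to a χ² distribution with 1 degree of freedom. Swapping the `scores` argument is how one would move between the recessive, additive and dominant versions of the test.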

10.
Chen J  Rodriguez C 《Biometrics》2007,63(4):1099-1107
Genetic epidemiologists routinely assess disease susceptibility in relation to haplotypes, that is, combinations of alleles on a single chromosome. We study statistical methods for inferring haplotype-related disease risk using single nucleotide polymorphism (SNP) genotype data from matched case-control studies, where controls are individually matched to cases on some selected factors. Assuming a logistic regression model for haplotype-disease association, we propose two conditional likelihood approaches that address the issue that haplotypes cannot be inferred with certainty from SNP genotype data (phase ambiguity). One approach is based on the likelihood of disease status conditioned on the total number of cases, genotypes, and other covariates within each matching stratum, and the other is based on the joint likelihood of disease status and genotypes conditioned only on the total number of cases and other covariates. The joint-likelihood approach is generally more efficient, particularly for assessing haplotype-environment interactions. Simulation studies demonstrated that the first approach was more robust to model assumptions on the diplotype distribution conditioned on environmental risk variables and matching factors in the control population. We applied the two methods to analyze a matched case-control study of prostate cancer.

11.
Ott J 《Human heredity》2004,58(3-4):171-174
Several sources of errors are discussed. While genotyping errors have little effect on power in case-control association studies, they tend to strongly increase false positive results in TDT type tests unless occurrence of errors is allowed for in the analysis (e.g., TDTae test). Disregarding non-genetic risk factors is shown to lead to a form of hidden heterogeneity, which can strongly reduce power. Stratification of data into more homogeneous subgroups is advocated as a simple solution to allowing for non-genetic risk factors such as socio-economic status and food preferences.

12.
Case-control association studies are widely used in the search for genetic variants that contribute to human diseases. It has long been known that such studies may suffer from high rates of false positives if there is unrecognized population structure. It is perhaps less widely appreciated that so-called “cryptic relatedness” (i.e., kinship among the cases or controls that is not known to the investigator) might also potentially inflate the false positive rate. Until now there has been little work to assess how serious this problem is likely to be in practice. In this paper, we develop a formal model of cryptic relatedness, and study its impact on association studies. We provide simple expressions that predict the extent of confounding due to cryptic relatedness. Surprisingly, these expressions are functions of directly observable parameters. Our analytical results show that, for well-designed studies in outbred populations, the degree of confounding due to cryptic relatedness will usually be negligible. However, in contrast, studies where there is a sampling bias toward collecting relatives may indeed suffer from excessive rates of false positives. Furthermore, cryptic relatedness may be a serious concern in founder populations that have grown rapidly and recently from a small size. As an example, we analyze the impact of excess relatedness among cases for six phenotypes measured in the Hutterite population.

13.
Summary: A new method is given to test for phenotypic association using related individuals in pedigree analysis. It is also shown how an extension of this method allows analyses of genetic linkage in the presence of epistatic associations. A published pedigree with strong evidence for linkage between Lp and ESD is reanalyzed, resulting in a considerable drop of the lod score for linkage. Dr. Falk is supported by a grant from the National Institutes of Health (GM 29177).

14.
Zhang H  Wang X  Ye Y 《Genetics》2006,172(1):693-699
There is growing interest in genomewide association analysis using single-nucleotide polymorphisms (SNPs), because traditional linkage studies are not as powerful in identifying genes for common, complex diseases. Tests for linkage disequilibrium have been developed for binary and quantitative traits. However, since many human conditions and diseases are measured on an ordinal scale, methods need to be developed to investigate the association of genes and ordinal traits. Thus, in the current report we propose and derive a score test statistic that identifies genes that are associated with ordinal traits when gametic disequilibrium between a marker and trait loci exists. Through simulation, the performance of this new test is examined for both ordinal traits and quantitative traits. The proposed statistic not only accommodates and is more powerful for ordinal traits, but also has similar power to that of existing tests when the trait is quantitative. Therefore, our proposed statistic has the potential to serve as a unified approach to identifying genes that are associated with any trait, regardless of how the trait is measured. We further demonstrated the advantage of our test by revealing a significant association (P = 0.00067) between alcohol dependence and a SNP in the growth-associated protein 43.

15.
Zheng G  Song K  Elston RC 《Human heredity》2007,63(3-4):175-186
We study a two-stage analysis of genetic association for case-control studies. In the first stage, we compare Hardy-Weinberg disequilibrium coefficients between cases and controls and, in the second stage, we apply the Cochran-Armitage trend test. The two analyses are statistically independent when Hardy-Weinberg equilibrium holds in the population, so all the samples are used in both stages. The significance level in the first stage is adaptively determined based on its conditional power. Given the level in the first stage, the level for the second stage analysis is determined with the overall Type I error being asymptotically controlled. For finite sample sizes, a parametric bootstrap method is used to control the overall Type I error rate. This two-stage analysis is often more powerful than the Cochran-Armitage trend test alone for a large association study. The new approach is applied to SNPs from a real study.
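The first-stage quantity can be sketched as follows, assuming the usual definition of the Hardy-Weinberg disequilibrium coefficient for an allele A, Δ = P(AA) − p_A², where p_A is the frequency of A. This shows only the point estimate of the case-control difference; the standardisation, the adaptive significance levels and the bootstrap described in the abstract are not reproduced. Function names are hypothetical.

```python
def hwd_coefficient(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Estimated Hardy-Weinberg disequilibrium coefficient for allele A.

    n_aa, n_ab, n_bb -- counts of genotypes AA, AB and BB in one sample.
    Returns P(AA) - p_A**2, which is 0 under Hardy-Weinberg equilibrium.
    """
    total = n_aa + n_ab + n_bb
    if total == 0:
        raise ValueError("empty sample")
    p_aa = n_aa / total
    p_a = (2 * n_aa + n_ab) / (2 * total)  # allele frequency of A
    return p_aa - p_a ** 2


def hwd_difference(case_counts, control_counts) -> float:
    """Difference of the coefficient between cases and controls."""
    return hwd_coefficient(*case_counts) - hwd_coefficient(*control_counts)
```

A sample with genotype counts (25, 50, 25) is exactly in Hardy-Weinberg proportions, so its coefficient is zero. An excess of homozygotes, as in (40, 20, 40), gives a positive value.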

16.
17.
Chen J  Chatterjee N 《Biometrics》2006,62(1):28-35
Genetic epidemiologic studies often collect genotype data at multiple loci within a genomic region of interest from a sample of unrelated individuals. One popular method for analyzing such data is to assess whether haplotypes, i.e., the arrangements of alleles along individual chromosomes, are associated with the disease phenotype or not. For many study subjects, however, the exact haplotype configuration on the pair of homologous chromosomes cannot be derived with certainty from the available locus-specific genotype data (phase ambiguity). In this article, we consider estimating haplotype-specific association parameters in the Cox proportional hazards model, using genotype, environmental exposure, and the disease endpoint data collected from cohort or nested case-control studies. We study alternative Expectation-Maximization algorithms for estimating haplotype frequencies from cohort and nested case-control studies. Based on a hazard function of the disease derived from the observed genotype data, we then propose a semiparametric method for joint estimation of relative-risk parameters and the cumulative baseline hazard function. The method is greatly simplified under a rare disease assumption, for which an asymptotic variance estimator is also proposed. The performance of the proposed estimators is assessed via simulation studies. An application of the proposed method is presented, using data from the Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study.
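Phase ambiguity and its resolution by Expectation-Maximization can be illustrated with the textbook two-locus case. The sketch below is not the authors' algorithm for cohort or nested case-control sampling. It assumes a simple random sample, Hardy-Weinberg equilibrium and two biallelic loci, where only double heterozygotes have unknown phase. The function name and the genotype coding are hypothetical.

```python
def em_two_locus_haplotypes(genotype_counts, n_iter=200, tol=1e-10):
    """EM estimate of haplotype frequencies for two biallelic loci.

    genotype_counts maps (g1, g2) -> number of individuals, where g1 is the
    number of copies of allele A at locus 1 and g2 the number of copies of
    allele B at locus 2 (each 0, 1 or 2). Returns frequencies of AB, Ab, aB, ab.
    """
    haps = ("AB", "Ab", "aB", "ab")
    known = dict.fromkeys(haps, 0.0)  # haplotype counts with unambiguous phase
    n_double_het = 0
    total = 0
    for (g1, g2), count in genotype_counts.items():
        total += count
        if g1 == 1 and g2 == 1:
            n_double_het += count     # phase unknown: AB/ab or Ab/aB
            continue
        # At least one locus is homozygous, so any pairing of alleles
        # across the two loci yields the same two haplotypes.
        locus1 = ["A"] * g1 + ["a"] * (2 - g1)
        locus2 = ["B"] * g2 + ["b"] * (2 - g2)
        for x, y in zip(locus1, locus2):
            known[x + y] += count
    if total == 0:
        raise ValueError("no genotypes supplied")

    freqs = {h: 0.25 for h in haps}   # uniform starting values
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries AB/ab.
        cis = freqs["AB"] * freqs["ab"]
        trans = freqs["Ab"] * freqs["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: add expected counts from double heterozygotes, renormalise.
        counts = dict(known)
        counts["AB"] += n_double_het * w
        counts["ab"] += n_double_het * w
        counts["Ab"] += n_double_het * (1 - w)
        counts["aB"] += n_double_het * (1 - w)
        new_freqs = {h: counts[h] / (2 * total) for h in haps}
        change = max(abs(new_freqs[h] - freqs[h]) for h in haps)
        freqs = new_freqs
        if change < tol:
            break
    return freqs
```

The uniform starting point is a convenience. Because EM can converge to a local maximum, trying several starting values is a reasonable precaution in practice.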

18.
Common heritable diseases ("complex traits") are assumed to be due to multiple underlying susceptibility genes. While genetic mapping methods for Mendelian disorders have been very successful, the search for genes underlying complex traits has been difficult and often disappointing. One of the reasons may be that most current gene-mapping approaches are still based on conventional methodology of testing one or a few SNPs at a time. Here, we demonstrate a simple strategy that allows for the joint analysis of multiple disease-associated SNPs in different genomic regions. Our set-association method combines information over SNPs by forming sums of relevant single-marker statistics. As previously hypothesized, we show here that this approach successfully addresses the "curse of dimensionality" problem: too many variables should be estimated with a comparatively small number of observations. We also report results of simulation studies showing that our method furnishes unbiased and accurate significance levels. Power calculations demonstrate good power even in the presence of large numbers of nondisease associated SNPs. We extended our method to microarray expression data, where expression levels for large numbers of genes should be compared between two tissue types. In applications to such data, our approach turned out to be highly efficient.

19.
Lin J  Liu KY 《BMC genetics》2005,6(Z1):S25
Several simulation studies have suggested that a high-density single-nucleotide polymorphism (SNP) marker set may be as useful as a traditional microsatellite (MS) marker set in performing whole-genome linkage analysis. However, very few studies have directly tested the SNP-based genome-wide scan. In the present study, we compared the linkage results from the SNP-based scan with a map density of 3-cM spacing with those from the MS scan using a 10-cM marker set among 300 nuclear families each from the Aipotu (AI), Danacaa (DA), and Karangar (KA) populations from the simulated Genetic Analysis Workshop 14 Problem 2 data. We found that the information content obtained from the SNP scan was somewhat lower than that from the MS scan. However, the linkage results obtained from the two scans showed a high degree of similarity. Both scans identified a similar number of chromosomal regions attaining nominal significance (p < 0.05). Specifically, both scans detected confirmed evidence for linkage (NPL ≥ 4.07, p = 2 × 10^-5) to chromosome 1 in the AI families, chromosomes 1 and 3 in the DA families, and chromosomes 3, 5, and 9 in the KA families. An additional confirmed linkage to chromosome 5 in the AI families was detected only by the MS scan. We also observed slightly wider 1-LOD intervals for more of the SNP peaks than for the MS peaks, which is likely due to the lower information content of the SNPs. Subsequent fine-mapping association analysis further identified 2 to 3 markers significantly associated with disease status in each population: B03T3056, B03T3058, and B05T4139 in the AI population, B03T3056 and B03T3058 in the KA population, and B03T3056, B03T3057, and B03T3058 in the DA population. Among the four markers, three were chosen based on results obtained from the two scans, but one was solely from the SNP scan. In summary, our findings suggest that the SNP-based genome scan has the potential to be as powerful as the traditional MS-based scan and offers good identification of peak location for further fine-mapped association analysis.

20.
Recent studies have indicated that linkage disequilibrium (LD) between single nucleotide polymorphism (SNP) markers can be used to derive a reduced set of tagging SNPs (tSNPs) for genetic association studies. Previous strategies for identifying tSNPs have focused on LD measures or haplotype diversity, but the statistical power to detect disease-associated variants using tSNPs in genetic studies has not been fully characterized. We propose a new approach of selecting tSNPs based on determining the set of SNPs with the highest power to detect association. Two-locus genotype frequencies are used in the power calculations. To show utility, we applied this power method to a large number of SNPs that had been genotyped in Caucasian samples. We demonstrate that a significant reduction in genotyping efforts can be achieved although the reduction depends on genotypic relative risk, inheritance mode and the prevalence of disease in the human population. The tSNP sets identified by our method are remarkably robust to changes in the disease model when small relative risk and additive mode of inheritance are employed. We have also evaluated the ability of the method to detect unidentified SNPs. Our findings have important implications in applying tSNPs from different data sources in association studies.
