Similar literature
 20 similar articles found (search time: 46 ms)
1.
Linkage disequilibrium testing when linkage phase is unknown   Total citations: 2, self-citations: 0, citations by others: 2
Schaid DJ. Genetics 2004, 166(1):505-512
Linkage disequilibrium, the nonrandom association of alleles from different loci, can provide valuable information on the structure of haplotypes in the human genome and is often the basis for evaluating the association of genomic variation with human traits among unrelated subjects. However, the linkage phase of genetic markers measured on unrelated subjects is typically unknown, so measuring linkage disequilibrium, and testing whether it differs significantly from the null value of zero, requires statistical methods that can account for the ambiguity of unobserved haplotypes. A common method to test whether linkage disequilibrium differs significantly from zero is the likelihood-ratio statistic, which assumes Hardy-Weinberg equilibrium of the marker phenotype proportions. We show, by simulations, that this approach can be grossly biased, with either extremely conservative or liberal type I error rates. In contrast, we use simulations to show that a composite statistic, proposed by Weir and Cockerham, maintains the correct type I error rates and, when comparisons are appropriate, has power similar to that of the likelihood-ratio statistic. We extend the composite statistic to allow for more than two alleles per locus, providing a global composite statistic, which is a strong competitor to the usual likelihood-ratio statistic.
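
As a rough illustration of a composite test for two biallelic loci, the sketch below works directly from unphased genotypes. It assumes each genotype is coded as an allele count (0, 1 or 2) and uses the fact that the covariance of these counts is twice the composite disequilibrium coefficient, so the composite correlation is the Pearson correlation of the counts. The function name, the coding, and the use of n times r squared as a 1-df chi-square statistic are illustrative assumptions; small-sample corrections and the multiallelic extension described in the abstract are not covered.

```python
import math


def composite_ld_test(x, y):
    """Composite LD test for two biallelic loci (illustrative sketch).

    x, y: allele counts (0, 1 or 2) at locus A and locus B for the same
    unrelated subjects. No phase information or Hardy-Weinberg assumption
    is needed, because only genotype counts are used.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("a locus is monomorphic in this sample")
    delta = sxy / (2 * n)            # composite disequilibrium coefficient
    r = sxy / math.sqrt(sxx * syy)   # composite correlation
    chi2 = n * r * r                 # compared with chi-square, 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, chi-square 1 df
    return delta, r, chi2, p_value


if __name__ == "__main__":
    locus_a = [0, 1, 2, 0, 1, 2, 1, 1]
    locus_b = [0, 1, 2, 1, 1, 2, 0, 1]
    print(composite_ld_test(locus_a, locus_b))
```

Working from allele counts keeps the sketch free of any haplotype reconstruction step, which is the point of the composite approach.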

2.
In studies of complex diseases, a common paradigm is to conduct association analysis at markers in regions identified by linkage analysis, to attempt to narrow the region of interest. Family-based tests for association based on parental transmissions to affected offspring are often used in fine-mapping studies. However, for diseases with late onset, parental genotypes are often missing. Without parental genotypes, family-based tests either compare allele frequencies in affected individuals with those in their unaffected siblings or use siblings to infer missing parental genotypes. An example of the latter approach is the score test implemented in the computer program TRANSMIT. The inference of missing parental genotypes in TRANSMIT assumes that transmissions from parents to affected siblings are independent, which is appropriate when there is no linkage. However, using computer simulations, we show that, when the marker and disease locus are linked and the data set consists of families with multiple affected siblings, this assumption leads to a bias in the score statistic under the null hypothesis of no association between the marker and disease alleles. This bias leads to an inflated type I error rate for the score test in regions of linkage. We present a novel test for association in the presence of linkage (APL) that correctly infers missing parental genotypes in regions of linkage by estimating identity-by-descent parameters, to adjust for correlation between parental transmissions to affected siblings. In simulated data, we demonstrate the validity of the APL test under the null hypothesis of no association and show that the test can be more powerful than the pedigree disequilibrium test and family-based association test. As an example, we compare the performance of the tests in a candidate-gene study in families with Parkinson disease.

3.
B Haubold, M Travisano, P B Rainey, R R Hudson. Genetics 1998, 150(4):1341-1348
The distribution of the number of pairwise differences calculated from comparisons between n haploid genomes has frequently been used as a starting point for testing the hypothesis of linkage equilibrium. For this purpose the variance of the pairwise differences, VD, is used as a test statistic to evaluate the null hypothesis that all loci are in linkage equilibrium. The problem is to determine the critical value of the distribution of VD. This critical value can be estimated either by Monte Carlo simulation or by assuming that VD is distributed normally and calculating a one-tailed 95% critical value for VD, L, as L = E(VD) + 1.645 sqrt(Var(VD)), where E(VD) is the expectation of VD, and Var(VD) is the variance of VD. If VD (observed) > L, the null hypothesis of linkage equilibrium is rejected. Using Monte Carlo simulation we show that the formula currently available for Var(VD) is incorrect, especially for genetically highly diverse data. This has implications for hypothesis testing in bacterial populations, which are often genetically highly diverse. For this reason we derive a new, exact formula for Var(VD). The distribution of VD is examined and shown to approach normality as the sample size increases. This makes the new formula a useful tool in the investigation of large data sets, where testing for linkage using Monte Carlo simulation can be very time consuming. Application of the new formula, in conjunction with Monte Carlo simulation, to populations of Bradyrhizobium japonicum, Rhizobium leguminosarum, and Bacillus subtilis reveals linkage disequilibrium where linkage equilibrium has previously been reported.
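
The two routes to a critical value described above can be sketched as follows. This is a minimal illustration under stated assumptions: genomes are given as equal-length tuples of allele labels, VD is computed with a divide-by-count variance, and the Monte Carlo null is generated by shuffling alleles independently within each locus. All function names are hypothetical, and the abstract's exact formulas for E(VD) and Var(VD) are not reproduced, so the normal-approximation helper takes them as inputs.

```python
import math
import random
from itertools import combinations


def variance_of_pairwise_differences(genomes):
    """VD: variance of the number of differing loci over all genome pairs."""
    dists = [sum(a != b for a, b in zip(g, h))
             for g, h in combinations(genomes, 2)]
    mean = sum(dists) / len(dists)
    return sum((d - mean) ** 2 for d in dists) / len(dists)


def monte_carlo_critical_value(genomes, n_resamples=1000, quantile=0.95, seed=0):
    """Estimate the critical value L by breaking associations between loci.

    Alleles are shuffled within each locus, which keeps allele frequencies
    fixed while imitating linkage equilibrium (an assumed resampling scheme).
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*genomes)]
    null_values = []
    for _ in range(n_resamples):
        for col in columns:
            rng.shuffle(col)
        shuffled = list(zip(*columns))
        null_values.append(variance_of_pairwise_differences(shuffled))
    null_values.sort()
    index = min(len(null_values) - 1, int(quantile * len(null_values)))
    return null_values[index]


def normal_critical_value(expected_vd, variance_vd):
    """One-tailed 95% critical value: L = E(VD) + 1.645 sqrt(Var(VD))."""
    return expected_vd + 1.645 * math.sqrt(variance_vd)


if __name__ == "__main__":
    sample = [(0, 0, 0), (1, 1, 1), (0, 0, 0), (1, 1, 1), (0, 1, 0)]
    observed = variance_of_pairwise_differences(sample)
    critical = monte_carlo_critical_value(sample, n_resamples=500)
    print(observed, critical, observed > critical)
```

The null hypothesis of linkage equilibrium would be rejected when the observed VD exceeds the chosen critical value.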

4.
A statistic is proposed for testing the hypothesis of equality of the means of a bivariate normal distribution with unknown common variance and correlation coefficient when observations are missing on one of the variates. The distribution of the statistic is approximated by a normal distribution under the null hypothesis. The empirical powers of the statistic are computed and compared with those of the conventional paired t and the other known statistics. The power comparisons support the use of the proposed test.

5.
Family-based tests of association in the presence of linkage   Total citations: 21, self-citations: 0, citations by others: 21
Linkage analysis may not provide the necessary resolution for identification of the genes underlying phenotypic variation. This is especially true for gene-mapping studies that focus on complex diseases that do not exhibit Mendelian inheritance patterns. One positional genomic strategy involves application of association methodology to areas of identified linkage. Detection of association in the presence of linkage localizes the gene(s) of interest to more-refined regions in the genome than is possible through linkage analysis alone. This strategy introduces a statistical complexity when family-based association tests are used: the marker genotypes among siblings are correlated in linked regions. Ignoring this correlation will compromise the size of the statistical hypothesis test, thus clouding the interpretation of test results. We present a method for computing the expectation of a wide range of association test statistics under the null hypothesis that there is linkage but no association. To standardize the test statistic, an empirical variance-covariance estimator that is robust to the sibling marker-genotype correlation is used. This method is widely applicable: any type of phenotypic measure or family configuration can be used. For example, we analyze a deletion in the A2M gene at the 5' splice site of "exon II" of the bait region in Alzheimer disease (AD) discordant sibships. Since the A2M gene lies in a chromosomal region (chromosome 12p) that consistently has been linked to AD, association tests should be conducted under the null hypothesis that there is linkage but no association.

6.
Incorporating genotypes of relatives into a test of linkage disequilibrium.   Total citations: 3, self-citations: 0, citations by others: 3
Genetic data from autosomal loci in diploids generally consist of genotype data for which no phase information is available, making it difficult to implement a test of linkage disequilibrium. In this paper, we describe a test of linkage disequilibrium based on an empirical null distribution of the likelihood of a sample. Information on the genotypes of related individuals is explicitly used to help reconstruct the gametic phase of the independent individuals. Simulation studies show that the present approach improves on estimates of linkage disequilibrium gathered from samples of completely independent individuals but only if some offspring are sampled together with their parents. The failure to incorporate some parents sharply decreases the sensitivity and accuracy of the test. Simulations also show that for multiallelic data (more than two alleles) our testing procedure is not as powerful as an exact test based on known haplotype frequencies, owing to the interaction between departure from Hardy-Weinberg equilibrium and linkage disequilibrium.

7.
Genome-wide association (GWA) studies are currently one of the most powerful tools for identifying disease-associated genes or variants. In typical GWA studies, single-nucleotide polymorphisms (SNPs) are often used as genetic markers. Therefore, it is critical to estimate the percentage of genetic variation that can be covered by SNPs through linkage disequilibrium (LD). In this study, we use the concept of haplotype blocks to evaluate the coverage of five SNP sets, including the HapMap and four commercial arrays, for every exon in the human genome. We show that although some chips can reach coverage similar to that of the HapMap, only about 50% of exons are completely covered by haplotype blocks of HapMap SNPs. We suggest that further high-resolution genotyping methods are required to provide adequate genome-wide power for identifying variants.

8.
The analysis of the haplotype-phenotype relationship has become increasingly important. We have developed an algorithm that uses individual genotypes at linked loci, together with quantitative phenotypes, to estimate the parameters of the phenotype distribution for subjects with and without a particular haplotype by an expectation-maximization (EM) algorithm. We assumed that the phenotype for a diplotype configuration follows a normal distribution. The algorithm simultaneously calculates the maximum likelihood (L0max) under the null hypothesis (i.e., no association between the haplotype and phenotype) and the maximum likelihood (Lmax) under the alternative hypothesis (i.e., association between the haplotype and phenotype). We then tested the association between the haplotype and the phenotype using the test statistic -2 log(L0max/Lmax). The above algorithm, along with some extensions for different modes of inheritance, was implemented as a computer program, QTLHAPLO. Simulation studies using single-nucleotide polymorphism (SNP) genotypes showed that the estimation was very accurate when the linkage disequilibrium between linked loci was rather high. Empirical power using the simulated data was high enough. We applied QTLHAPLO to the analysis of real genotype data at the calpain 10 gene obtained from diabetic and control subjects in various laboratories.

9.
The sibship disequilibrium test (SDT) is designed to detect both linkage in the presence of association and association in the presence of linkage (linkage disequilibrium). The test does not require parental data but requires discordant sibships with at least one affected and one unaffected sibling. The SDT has many desirable properties: it uses all the siblings in the sibship; it remains valid if there are misclassifications of the affectation status; it does not detect spurious associations due to population stratification; asymptotically it has a χ2 distribution under the null hypothesis; and exact P values can be easily computed for a biallelic marker. We show how to extend the SDT to markers with multiple alleles and how to combine families with parents and data from discordant sibships. We discuss the power of the test by presenting sample-size calculations involving a complex disease model, and we present formulas for the asymptotic relative efficiency (which is approximately the ratio of sample sizes) between SDT and the transmission/disequilibrium test (TDT) for special family structures. For sib pairs, we compare the SDT to a test proposed both by Curtis and, independently, by Spielman and Ewens. We show that, for discordant sib pairs, the SDT has good power for testing linkage disequilibrium relative both to Curtis's tests and to the TDT using trios comprising an affected sib and its parents. With additional sibs, we show that the SDT can be more powerful than the TDT for testing linkage disequilibrium, especially for disease prevalence > 0.3.
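
The abstract does not spell out the statistic, so the following is only a sketch of a sign-test form that is commonly given for the biallelic SDT: within each discordant sibship, the mean count of a chosen allele among affected sibs is compared with the mean among unaffected sibs, and the sibships with a positive difference (b) are set against those with a negative difference (c). The function name, the input layout, and the two-sided exact binomial P value are illustrative assumptions.

```python
from math import comb


def sibship_disequilibrium_test(sibships):
    """Sign-test sketch of the SDT for a biallelic marker.

    sibships: list of (affected_counts, unaffected_counts) pairs, where each
    element is a list of allele counts (0, 1 or 2) for the sibs in one
    discordant sibship.
    """
    b = c = 0
    for affected, unaffected in sibships:
        if not affected or not unaffected:
            raise ValueError("each sibship needs affected and unaffected sibs")
        diff = sum(affected) / len(affected) - sum(unaffected) / len(unaffected)
        if diff > 0:
            b += 1
        elif diff < 0:
            c += 1
        # sibships with diff == 0 are uninformative and are skipped
    m = b + c
    if m == 0:
        raise ValueError("no informative sibships")
    chi2 = (b - c) ** 2 / m  # asymptotically chi-square with 1 df
    k = max(b, c)
    upper_tail = sum(comb(m, i) for i in range(k, m + 1)) / 2 ** m
    exact_p = min(1.0, 2 * upper_tail)  # two-sided exact binomial P value
    return b, c, chi2, exact_p


if __name__ == "__main__":
    data = [([2], [1]), ([1, 2], [0]), ([1], [1]), ([0], [1]), ([2], [0, 1])]
    print(sibship_disequilibrium_test(data))
```

Because only the sign of the within-sibship difference is used, every sibship contributes at most one observation regardless of its size.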

10.
A statistic, derived from the combination of two dependent tests, is proposed for testing the hypothesis of equality of the means of a bivariate normal distribution with unknown common variance and correlation coefficient when observations are missing on one or both variates. The null distribution of the statistic is approximated by a well-known distribution. The empirical powers of the statistic are computed and compared with some of the known statistics. The comparisons support the use of the proposed test.

11.
Disease association with a genetic marker is often taken as a preliminary indication of linkage with disease susceptibility. However, population subdivision and admixture may lead to disease association even in the absence of linkage. In a previous paper, we described a test for linkage (and linkage disequilibrium) between a genetic marker and disease susceptibility; linkage is detected by this test only if association is also present. This transmission/disequilibrium test (TDT) is carried out with data on transmission of marker alleles from parents heterozygous for the marker to affected offspring. The TDT is a valid test for linkage and association, even when the association is caused by population subdivision and admixture. In the previous paper, we did not explicitly consider the effect of recent history on population structure. Here we extend the previous results by examining in detail the effects of subdivision and admixture, viewed as processes in population history. We describe two models for these processes. For both models, we analyze the properties of (a) the TDT as a test for linkage (and association) between marker and disease and (b) the conventional contingency statistic used with family data to test for population association. We show that the contingency test statistic does not have a χ2 distribution if subdivision or admixture is present. In contrast, the TDT remains a valid χ2 statistic for the linkage hypothesis, regardless of population history.
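
For a biallelic marker, the TDT is usually written as a McNemar-type statistic on transmissions from heterozygous parents. The sketch below assumes the transmissions have already been counted: b is the number of times allele 1 was transmitted to an affected offspring and c the number of times allele 2 was transmitted. The function name and argument names are illustrative.

```python
import math


def transmission_disequilibrium_test(b, c):
    """TDT for a biallelic marker from transmission counts.

    b: heterozygous parents transmitting allele 1 to an affected offspring.
    c: heterozygous parents transmitting allele 2 to an affected offspring.
    Returns the statistic (b - c)^2 / (b + c) and its chi-square (1 df) P value.
    """
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts with at least one transmission")
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, chi-square 1 df
    return chi2, p_value


if __name__ == "__main__":
    print(transmission_disequilibrium_test(30, 10))
```

Only heterozygous parents enter the counts, which is why the statistic is unaffected by allele-frequency differences between subpopulations.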

12.
Genome-wide association studies have usually been analyzed in a univariate manner. The commonly used univariate tests have one degree of freedom and assume an additive mode of inheritance. The experiment-wise significance of these univariate statistics is obtained by adjusting for multiple testing. Next-generation sequencing studies, which assay 10-20 million variants, are beginning to come online. For these studies, the strategy of additive univariate testing and multiple testing adjustment is likely to result in a loss of power due to (1) the substantial multiple testing burden and (2) the possibility of a non-additive causal mode of inheritance. To reduce the power loss, we propose a new method (1) to summarize in a single statistic the strength of the association signals coming from all not-very-rare variants in a linkage disequilibrium block and (2) to incorporate, in any linkage disequilibrium block statistic, the strength of the association signals under multiple modes of inheritance. The proposed linkage disequilibrium block test consists of the sum of squares of nominally significant univariate statistics. We compare the performance of this method to the performance of existing linkage disequilibrium block/gene-based methods. Simulations show that (1) extending methods to combine testing for multiple modes of inheritance leads to substantial power gains, especially for a recessive mode of inheritance, and (2) the proposed method has a good overall performance. Based on simulation results, we provide practical advice on choosing suitable methods for applied analyses.
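
The block statistic described above, a sum of squares of nominally significant univariate statistics, can be sketched in a few lines. The sketch assumes the univariate statistics are z-scores and that "nominally significant" means an absolute z of at least 1.96. The abstract does not say how significance of the block statistic is assessed, so the permutation-based empirical P value is an added assumption, and both function names are hypothetical.

```python
def ld_block_statistic(z_scores, z_threshold=1.96):
    """Sum of squared z-scores over nominally significant variants in a block."""
    return sum(z * z for z in z_scores if abs(z) >= z_threshold)


def empirical_p_value(observed, permuted_statistics):
    """Empirical P value with the usual add-one correction.

    permuted_statistics: block statistics recomputed on data sets in which
    phenotypes were permuted (generating them is outside this sketch).
    """
    exceed = sum(1 for s in permuted_statistics if s >= observed)
    return (1 + exceed) / (1 + len(permuted_statistics))


if __name__ == "__main__":
    block = [0.5, 2.0, -3.0, 1.0]
    stat = ld_block_statistic(block)
    print(stat, empirical_p_value(stat, [1.0, 14.0, 0.0]))
```

To cover multiple modes of inheritance, the same function could be fed z-scores from additive, dominant and recessive tests of each variant, again as an illustrative choice rather than the published procedure.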

13.
Although high-density SNP genotyping platforms generate a momentum for detailed genome-wide association (GWA) studies, an offshoot is a new insight into population genetics. Here, we present an example in one of the best-known founder populations by scrutinizing ten distinct Finnish early- and late-settlement subpopulations. By determining genetic distances, homozygosity, and patterns of linkage disequilibrium, we demonstrate that population substructure, and even individual ancestry, is detectable at a very high resolution and supports the concept of multiple historical bottlenecks resulting from consecutive founder effects. Given that genetic studies are currently aiming at identifying smaller and smaller genetic effects, recognizing and controlling for population substructure even at this fine level becomes imperative to avoid confounding and spurious associations. This study provides an example of the power of GWA data sets to demonstrate stratification caused by population history even within a seemingly homogeneous population, like the Finns. Further, the results provide interesting lessons concerning the impact of population history on the genome landscape of humans, as well as approaches to identify rare variants enriched in these subpopulations.

14.
The problem of testing the separability of a covariance matrix against an unstructured variance-covariance matrix is studied in the context of multivariate repeated measures data using Rao's score test (RST). The RST statistic is developed with the first component of the separable structure as a first-order autoregressive (AR(1)) correlation matrix or an unstructured (UN) covariance matrix under the assumption of multivariate normality. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Monte Carlo simulations are then used to study the comparative behavior of the null distribution of the RST statistic, as well as that of the LRT statistic, in terms of sample size considerations, and for the estimation of the empirical percentiles. Our findings are compared with existing results where the first component of the separable structure is a compound symmetry (CS) correlation matrix. It is also shown by simulations that the empirical null distribution of the RST statistic converges faster than the empirical null distribution of the LRT statistic to the limiting χ2 distribution. The tests are implemented on a real dataset from medical studies.

15.
A recent study by Cheung et al. demonstrates how to identify expression quantitative trait loci (eQTLs) underlying gene expression phenotypes through a combination of genome-wide linkage analysis and subsequent fine mapping or by genome-wide association (GWA) analysis. This study emphasizes the complexity of human traits, highlighting the challenges faced by investigators. In particular, insufficient linkage disequilibrium between the trait and marker variant, genetic heterogeneity, and the need to correct for multiple testing all adversely affect the power to detect loci by association. These issues must be considered carefully if the GWA approach is to succeed in mapping complex phenotypes.

16.
Generalized T2 test for genome association studies   Total citations: 4, self-citations: 0, citations by others: 4
Recent progress in the development of single-nucleotide polymorphism (SNP) maps within genes and across the genome provides a valuable tool for fine-mapping and has led to the suggestion of genomewide association studies to search for susceptibility loci for complex traits. Test statistics for genome association studies that consider a single marker at a time, ignoring the linkage disequilibrium between markers, are inefficient. In this study, we present a generalized T2 statistic for association studies of complex traits, which can utilize multiple SNP markers simultaneously and considers the effects of multiple disease-susceptibility loci. This generalized T2 statistic is a corollary to that originally developed for multivariate analysis and has a close relationship to discriminant analysis and common measures of genetic distance. We evaluate the power of the generalized T2 statistic and show it to be greater than or equal to that of the traditional χ2 test of association and of a similar haplotype-test statistic. Finally, examples are given to evaluate the performance of the proposed T2 statistic for association studies using simulated and real data.

17.
With the widespread availability of SNP genotype data, there is great interest in analyzing pedigree haplotype data. Intermarker linkage disequilibrium for microsatellite markers is usually low due to their physical distance; however, for dense maps of SNP markers, there can be strong linkage disequilibrium between marker loci. Linkage analysis (parametric and nonparametric) and family-based association studies are currently being carried out using dense maps of SNP marker loci. Monte Carlo methods are often used for both linkage and association studies; however, to date there are no programs available that can generate haplotype and/or genotype data consisting of a large number of loci for pedigree structures. SimPed is a program that quickly generates haplotype and/or genotype data for pedigrees of virtually any size and complexity. Marker data either in linkage disequilibrium or equilibrium can be generated for more than 20,000 diallelic or multiallelic marker loci. Haplotypes and/or genotypes are generated for pedigree structures using specified genetic map distances and haplotype and/or allele frequencies. The simulated data generated by SimPed are useful for a variety of purposes, including evaluating methods that estimate haplotype frequencies for pedigree data, evaluating type I error due to intermarker linkage disequilibrium, and estimating empirical p values for linkage and family-based association studies.

18.
We extend the methodology for family-based tests of association and linkage to allow for both variation in the phenotypes of subjects and incorporation of covariates into general-score tests of association. We use standard association models for a phenotype and any number of predictors. We then construct a score statistic, using likelihoods for the distribution of phenotype, given genotype. The distribution of the score is computed as a function of offspring genotypes, conditional on parental genotypes and trait values for offspring and parents. This approach provides a natural extension of the transmission/disequilibrium test to any phenotype and to multiple genes or environmental factors and allows the study of gene-gene and gene-environment interaction. When the trait varies among subjects or when covariates are included in the association model, the score statistic depends on one or more nuisance parameters. We suggest two approaches for obtaining parameter estimates: (1) choosing the estimate that minimizes the variance of the test statistic and (2) maximizing the statistic over a nuisance parameter and using a corrected P value. We apply our methods to a sample of families with attention-deficit/hyperactivity disorder and provide examples of how covariates and gene-environment and gene-gene interactions can be incorporated.

19.
Zhao J, Boerwinkle E, Xiong M. Human Genetics 2007, 121(3-4):357-367
The availability of a large collection of single nucleotide polymorphisms (SNPs) and efficient genotyping methods enables the extension of linkage and association studies for complex diseases from small genomic regions to the whole genome. Establishing global significance for linkage or association requires small P-values of the test. The original TDT statistic compares the difference in linear functions of the number of transmitted and nontransmitted alleles or haplotypes. In this report, we introduce a novel TDT statistic, which uses Shannon entropy as a nonlinear transformation of the frequencies of the transmitted or nontransmitted alleles (or haplotypes), to amplify the difference in the number of transmitted and nontransmitted alleles or haplotypes in order to increase statistical power with a large number of marker loci. The null distribution of the entropy-based TDT statistic and the type I error rates in both homogeneous and admixture populations are validated using a series of simulation studies. By analytical methods, we show that the power of the entropy-based TDT statistic is higher than that of the original TDT, and this difference increases with the number of marker loci. Finally, the new entropy-based TDT statistic is applied to two real data sets to test the association of the RET gene with Hirschsprung disease and the Fcγ receptor genes with systemic lupus erythematosus. Results show that the entropy-based TDT statistic can reach P-values that are small enough to establish genome-wide linkage or association.

20.
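Genetic analysis of quantitative traits can be carried out by "selective genotyping". This paper proposes a statistic, T, that uses extreme samples for association analysis of quantitative trait loci (QTL). The statistic T compares the difference in trait values among individuals with homozygous marker genotypes in the upper extreme sample. Computer simulations were used to examine the distribution of T and its type I error rate in the absence of association; the results show that, under various sample-selection strategies, the distribution of T is approximately a χ2 distribution and the type I error rate is close to the nominal significance level. The effects of heritability, sample size, and sample-selection threshold on the statistical power of T were also examined under various genetic models; the results show that the power of T increases with the strength of linkage disequilibrium between the marker and the QTL and with increasing heritability and sample size, and that power is also greater when the sample-selection threshold is more stringent.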
