Similar articles
Found 20 similar articles (search time: 15 ms)
1.
A power calculation is crucial in planning genetic studies. In genetic association studies, the power is often calculated using the expected number of individuals with each genotype, derived from an assumed allele frequency under Hardy-Weinberg equilibrium. Since the allele frequency is often unknown, the number of individuals with each genotype is random, and so a power calculation assuming a known allele frequency may be incorrect. Ambrosius et al. recently showed that ignoring this randomness may lead to studies with insufficient power and proposed averaging the power over the randomness. We extend the method of averaging power in two directions. First, for testing association in case-control studies, we use the Cochran-Armitage trend test and find that the time needed for calculating the averaged power is much reduced compared to the chi-square test with two degrees of freedom studied by Ambrosius et al. A real study is used to illustrate the method. Second, we extend the method to linkage analysis, where the number of identical-by-descent alleles shared by siblings is random. The distribution of identical-by-descent numbers depends on the underlying genetic model rather than the allele frequency. The robust test for linkage analysis is also examined using the averaged powers. We also recommend a sensitivity analysis when the true allele frequency or the number of identical-by-descent alleles is unknown.
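The idea of letting genotype counts vary at random, rather than fixing them at their expected values, can be sketched with a small simulation. The code below is a minimal illustration, not the authors' averaging procedure: it draws genotype counts for cases and controls, applies the Cochran-Armitage trend test with additive scores, and reports the fraction of significant replicates. The multiplicative relative-risk model, the assumption that controls follow Hardy-Weinberg proportions, and all function names and parameter values are assumptions made for this example.

```python
import random

CHI2_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 degree of freedom


def hwe_probs(p):
    """Genotype probabilities (0, 1, 2 copies of the allele) under HWE."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def trend_statistic(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic from two rows of three genotype counts."""
    r, s = sum(cases), sum(controls)
    n = r + s
    col = [c + d for c, d in zip(cases, controls)]
    u = sum(x * (s * c - r * d) for x, c, d in zip(scores, cases, controls))
    var = (r * s / n) * (
        n * sum(x * x * m for x, m in zip(scores, col))
        - sum(x * m for x, m in zip(scores, col)) ** 2
    )
    return u * u / var if var > 0 else 0.0


def draw_counts(probs, size, rng):
    """Draw random genotype counts for `size` individuals."""
    counts = [0, 0, 0]
    for g in rng.choices((0, 1, 2), weights=probs, k=size):
        counts[g] += 1
    return counts


def simulated_power(p, rel_risk, n_cases, n_controls, reps=400, seed=1):
    """Fraction of replicates in which the trend test rejects at the 5% level."""
    rng = random.Random(seed)
    control_probs = hwe_probs(p)  # assumption: controls are in HWE
    # Assumption: risk is multiplied by rel_risk for each copy of the allele.
    weights = [f * rel_risk ** g for g, f in enumerate(control_probs)]
    total = sum(weights)
    case_probs = [w / total for w in weights]
    hits = 0
    for _ in range(reps):
        cases = draw_counts(case_probs, n_cases, rng)
        controls = draw_counts(control_probs, n_controls, rng)
        if trend_statistic(cases, controls) > CHI2_1DF_5PCT:
            hits += 1
    return hits / reps
```

Calling `simulated_power(0.3, 1.5, 500, 500)` for several plausible allele frequencies gives a rough version of the sensitivity analysis the abstract recommends.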

2.

Background

Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost-effective approach is to use medium- or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies are associated with substantial statistical uncertainty because of varying coverage and high error rates.

Results

We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data.

Conclusions

Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
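Estimating an allele frequency by integrating over genotype uncertainty, instead of calling genotypes first, can be illustrated with a short expectation-maximization loop. This is a generic sketch under stated assumptions, not necessarily the method evaluated in the abstract: genotypes are assumed to follow Hardy-Weinberg proportions, each individual is described by three genotype likelihoods (for 0, 1, and 2 copies of the allele), and the function name, starting value, and tolerance are hypothetical.

```python
def em_allele_frequency(genotype_likelihoods, start=0.2, tol=1e-10, max_iter=1000):
    """Maximum likelihood allele frequency from per-individual genotype likelihoods.

    genotype_likelihoods: list of (L0, L1, L2) tuples, the likelihood of the
    read data given 0, 1, or 2 copies of the allele.
    Assumption: genotype frequencies follow Hardy-Weinberg proportions.
    """
    eps = 1e-12
    p = start
    n = len(genotype_likelihoods)
    for _ in range(max_iter):
        p = min(max(p, eps), 1.0 - eps)  # keep the prior strictly inside (0, 1)
        expected_copies = 0.0
        for l0, l1, l2 in genotype_likelihoods:
            # E-step: posterior weight of each genotype for this individual.
            w0 = l0 * (1.0 - p) ** 2
            w1 = l1 * 2.0 * p * (1.0 - p)
            w2 = l2 * p * p
            expected_copies += (w1 + 2.0 * w2) / (w0 + w1 + w2)
        # M-step: the new frequency is the expected allele count per chromosome.
        new_p = expected_copies / (2.0 * n)
        if abs(new_p - p) < tol:
            return new_p
        p = new_p
    return p


# Hypothetical likelihoods: two confident individuals and two ambiguous ones.
example = [(0.98, 0.02, 0.0), (0.0, 0.9, 0.1), (0.4, 0.6, 0.0), (0.5, 0.5, 0.0)]
estimate = em_allele_frequency(example)
```

When every likelihood vector is concentrated on a single genotype, the estimate reduces to simple allele counting; with ambiguous likelihoods, each individual contributes a fractional allele count instead of a hard call.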

3.
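Hardy-Weinberg disequilibrium (HWD) indices that use closely spaced marker loci and are independent of marker allele frequencies have been used for fine mapping of quantitative trait loci (QTL). This paper discusses the properties of HWD indices in the presence of genotyping errors. It shows that, when genotyping errors are present, the two HWD indices used when marker allele frequencies in the population are known are affected by the errors but remain valid, whereas the two HWD indices used when marker allele frequencies are known only in the extreme samples depend on both the genotyping errors and the marker allele frequencies. Computer simulations show that the latter two indices produce bias in fine mapping and are therefore unsuitable for it.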

4.
Normally, the ability to digest milk sugar (lactose) is present in every child, but not in every adult. The decrease in lactase synthesis (hypolactasia) results in the inability to digest whole milk. Recent studies of the Finnish population have associated lactase persistence in adults with allele T of the C/T-13910 polymorphism located upstream of the lactase gene; a 100% correlation of primary hypolactasia with genotype C/C has been proved. In this study, the allele and genotype frequencies of C/T-13910 were determined in populations of Russia. The frequencies of genotype C/C, varying from 36.6% in Russians to 88.2% in Chukchi, were close to the published medical and epidemiological data on the hypolactasia frequencies in these populations. Genotyping was carried out by three different methods to determine the optimal one. Genotype C/C proved to be the key determinant of primary hypolactasia. It was assumed that DNA diagnosis of genotype C/C provides a predictive test to detect primary hypolactasia long before its clinical manifestation.

5.
Allele frequencies are most often reported from small convenience samples of unknown demographics and limited generalizability. We determined the distribution of apolipoprotein E genotype (APOE) and allele frequencies for a large, well-defined, representative, rural, population-based sample (n = 4450) aged 55-95 years in Ballabgarh, in the northern Indian state of Haryana. The overall APOE E*2, E*3, and E*4 allele frequencies were 0.039, 0.887, and 0.073, respectively; frequencies are also reported by age, sex, and religious/caste groups. The APOE*4 frequency is among the lowest reported anywhere in the world. APOE allele frequencies did not vary significantly by age or sex in this study. To our knowledge, this is the largest Indian sample ever genotyped for the APOE polymorphism. The representativeness of the sample and its known demographics provide a much-needed normative background for studies of gene-disease associations.
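As a worked illustration of how reported allele frequencies translate into genotype frequencies, the sketch below computes the six APOE genotype frequencies that would be expected if the sample were in Hardy-Weinberg equilibrium. The equilibrium assumption and the function name are introduced here for illustration only; the abstract reports allele frequencies, not these genotype values.

```python
from itertools import combinations_with_replacement


def hwe_genotype_freqs(allele_freqs):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    allele_freqs: dict mapping allele name to frequency.
    Returns a dict keyed by (allele, allele) pairs in sorted order.
    """
    expected = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        # Homozygotes: p^2; heterozygotes: 2pq.
        expected[(a, b)] = pa * pa if a == b else 2.0 * pa * pb
    return expected


# Allele frequencies as reported in the abstract (they sum to 0.999 after rounding).
apoe = {"E*2": 0.039, "E*3": 0.887, "E*4": 0.073}
expected = hwe_genotype_freqs(apoe)

for genotype, freq in expected.items():
    print(genotype, round(freq, 4))
```

Comparing such expected values with observed genotype counts is the usual starting point for a Hardy-Weinberg test.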

6.
Moskvina V, Schmidt KM. Biometrics, 2006, 62(4): 1116-1123
With the availability of fast genotyping methods and genomic databases, the search for statistical association of single nucleotide polymorphisms with a complex trait has become an important methodology in medical genetics. However, even fairly rare errors occurring during the genotyping process can lead to spurious association results and decrease in statistical power. We develop a systematic approach to study how genotyping errors change the genotype distribution in a sample. The general M-marker case is reduced to that of a single-marker locus by recognizing the underlying tensor-product structure of the error matrix. Both method and general conclusions apply to the general error model; we give detailed results for allele-based errors of size depending both on the marker locus and the allele present. Multiple errors are treated in terms of the associated diffusion process on the space of genotype distributions. We find that certain genotype and haplotype distributions remain unchanged under genotyping errors, and that genotyping errors generally render the distribution more similar to the stable one. In case-control association studies, this will lead to loss of statistical power for nondifferential genotyping errors and increase in type I error for differential genotyping errors. Moreover, we show that allele-based genotyping errors do not disturb Hardy-Weinberg equilibrium in the genotype distribution. In this setting we also identify maximally affected distributions. As they correspond to situations with rare alleles and marker loci in high linkage disequilibrium, careful checking for genotyping errors is advisable when significant association based on such alleles/haplotypes is observed in association studies.
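The claim that allele-based errors leave Hardy-Weinberg equilibrium intact can be checked numerically for a single biallelic marker. The sketch below is a simplified illustration, not the paper's general error model: it assumes each allele of a genotype is misread independently, with an error probability that depends only on the allele present. The function names and error rates are hypothetical.

```python
from itertools import product


def observed_genotype_dist(p, err_A, err_a):
    """Observed genotype distribution (0, 1, 2 copies of A) after allele-based errors.

    p: true frequency of allele A; true genotypes are in Hardy-Weinberg equilibrium.
    err_A: probability that an A allele is read as a.
    err_a: probability that an a allele is read as A.
    Assumption: the two alleles of a genotype are misread independently.
    """
    allele_prob = {"A": p, "a": 1.0 - p}
    flip_prob = {"A": err_A, "a": err_a}
    other = {"A": "a", "a": "A"}
    dist = [0.0, 0.0, 0.0]
    for a1, a2 in product("Aa", repeat=2):
        for f1, f2 in product((False, True), repeat=2):
            prob = allele_prob[a1] * allele_prob[a2]
            prob *= flip_prob[a1] if f1 else 1.0 - flip_prob[a1]
            prob *= flip_prob[a2] if f2 else 1.0 - flip_prob[a2]
            o1 = other[a1] if f1 else a1
            o2 = other[a2] if f2 else a2
            dist[(o1 == "A") + (o2 == "A")] += prob
    return dist


def observed_allele_freq(p, err_A, err_a):
    """Frequency of A among observed alleles."""
    return p * (1.0 - err_A) + (1.0 - p) * err_a


dist = observed_genotype_dist(0.3, 0.02, 0.05)
p_obs = observed_allele_freq(0.3, 0.02, 0.05)
# The observed distribution is again in HWE, at the shifted frequency p_obs.
hwe_at_p_obs = [(1 - p_obs) ** 2, 2 * p_obs * (1 - p_obs), p_obs ** 2]
```

In this simple setting the errors shift the apparent allele frequency but introduce no excess or deficit of heterozygotes.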

7.
Populations may become differentiated from one another as a result of genetic drift. The amounts and patterns of differentiation at neutral loci are determined by local population sizes, migration rates among populations, and mutation rates. We provide exact analytical expressions for the mean, variance, and covariance of a stochastic model for hierarchically structured populations subject to migration, mutation, and drift. In addition to the expected correlation in allele frequencies among populations in the same geographic region, we demonstrate that there is a substantial correlation in allele frequencies among regions at the top level of the hierarchy. We propose a hierarchical Bayesian model for inference of Wright's F-statistics in a two-level hierarchy in which we estimate the among-region correlation in allele frequencies by substituting replication across loci for replication across time. We illustrate the approach through an analysis of human microsatellite data, and we show that approaches ignoring the among-region correlation in allele frequencies underestimate the amount of genetic differentiation among major geographic population groups by approximately 30%. Finally, we discuss the implications of these results for the use and interpretation of F-statistics in evolutionary studies.

8.
Thanks to genome‐scale diversity data, present‐day studies can provide a detailed view of how natural and cultivated species adapt to their environment and particularly to environmental gradients. However, due to their sensitivity, up‐to‐date studies might be more sensitive to undocumented demographic effects such as the pattern of migration and the reproduction regime. In this study, we provide guidelines for the use of popular or recently developed statistical methods to detect footprints of selection. We simulated 100 populations along a selective gradient and explored different migration models, sampling schemes and rates of self‐fertilization. We investigated the power and robustness of eight methods to detect loci potentially under selection: three designed to detect genotype–environment correlations and five designed to detect adaptive differentiation (based on FST or similar measures). We show that genotype–environment correlation methods have substantially more power to detect selection than differentiation‐based methods but that they generally suffer from high rates of false positives. This effect is exacerbated whenever allele frequencies are correlated, either between populations or within populations. Our results suggest that, when the underlying genetic structure of the data is unknown, a number of robust methods are preferable. Moreover, in the simulated scenario we used, sampling many populations led to better results than sampling many individuals per population. Finally, care should be taken when using methods to identify genotype–environment correlations without correcting for allele frequency autocorrelation because of the risk of spurious signals due to allele frequency correlations between populations.

9.
Deng HW, Li YM, Li MX, Liu PY. Human Heredity, 2003, 56(4): 160-165
Hardy-Weinberg disequilibrium (HWD) measures have been proposed using dense markers to fine map a quantitative trait locus (QTL) to regions smaller than approximately 1 cM. Earlier HWD measures may introduce bias in the fine mapping because they are dependent on marker allele frequencies across loci. Hence, HWD indices that do not depend on marker allele frequencies are desired for fine mapping. Based on our earlier work, here we present four new HWD indices that do not depend on marker allele frequencies. Two are for use when marker allele frequencies in a study population are known, and two are for use when marker allele frequencies are known only in the extreme samples. The new measures are a function of the genetic distance between the marker locus and a QTL. Through simulations, we investigated and compared the fine mapping performance of the new HWD measures with that of the earlier ones. Our results show that when marker allele frequencies vary across loci, the new measures presented here are more robust and powerful.
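For orientation, the classical Hardy-Weinberg disequilibrium coefficient at a biallelic marker, D = P(AA) - p_A^2, can be computed directly from genotype counts. The sketch below shows only this textbook quantity, whose attainable range depends on the allele frequency (the kind of dependence the abstract's indices are designed to avoid); it is not one of the four indices proposed by the authors, and the function name is hypothetical.

```python
def hwd_coefficient(n_AA, n_Aa, n_aa):
    """Classical HWD coefficient D = P(AA) - p_A**2 from genotype counts."""
    n = n_AA + n_Aa + n_aa
    p_A = (2 * n_AA + n_Aa) / (2 * n)  # sample frequency of allele A
    return n_AA / n - p_A * p_A


# Counts in exact Hardy-Weinberg proportions give D = 0;
# an excess of homozygotes gives D > 0.
d_equilibrium = hwd_coefficient(25, 50, 25)
d_excess_homozygotes = hwd_coefficient(40, 20, 40)
```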

10.
Both theoretical calculations and simulation studies have been used to compare and contrast the statistical power of methods for mapping quantitative trait loci (QTLs) in simple and complex pedigrees. A widely used approach in such studies is to derive or simulate the expected mean test statistic under the alternative hypothesis of a segregating QTL and to equate a larger mean test statistic with larger power. In the present study, we show that, even when the test statistic under the null hypothesis of no linkage follows a known asymptotic distribution (the standard being χ²), it cannot be assumed that the distribution under the alternative hypothesis is noncentral χ². Hence, mean test statistics cannot be used to indicate power differences, and a comparison between methods that are based on simulated average test statistics may lead to the wrong conclusion. We illustrate this important finding, through simulations and analytical derivations, for a recently proposed new regression method for the analysis of general pedigrees to map quantitative trait loci. We show that this regression method is not necessarily more powerful or computationally more efficient than a maximum-likelihood variance-component approach. We advocate the use of empirical power to compare trait-mapping methods.

11.
Restriction‐site associated DNA sequencing (RADSeq) facilitates rapid generation of thousands of genetic markers at relatively low cost; however, several sources of error specific to RADSeq methods often lead to biased estimates of allele frequencies and thereby to erroneous population genetic inference. Estimating the distribution of sample allele frequencies without calling genotypes was shown to improve population inference from whole genome sequencing data, but the ability of this approach to account for RADSeq‐specific biases remains unexplored. Here we assess to what extent genotype‐free methods of allele frequency estimation affect demographic inference from empirical RADSeq data. Using the well‐studied pied flycatcher (Ficedula hypoleuca) as a study system, we compare allele frequency estimation and demographic inference from whole genome sequencing data with that from RADSeq data matched for samples, using both genotype‐based and genotype‐free methods. The demographic history of pied flycatchers as inferred from RADSeq data was highly congruent with that inferred from whole genome resequencing (WGS) data when allele frequencies were estimated directly from the read data. In contrast, when allele frequencies were derived from called genotypes, RADSeq‐based estimates of most model parameters fell outside the 95% confidence interval of estimates derived from WGS data. Notably, more stringent filtering of the genotype calls tended to increase the discrepancy between parameter estimates from WGS and RADSeq data. The results from this study demonstrate the ability of genotype‐free methods to improve allele frequency spectrum‐ (AFS‐) based demographic inference from empirical RADSeq data and highlight the need to account for uncertainty in NGS data regardless of sequencing method.

12.
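Studies in many populations have shown that the SstI polymorphism of the apolipoprotein C3 gene (APOC3), located in the APOA1/C3/A4/A5 gene cluster, is closely associated with hypertriglyceridaemia (HTG), and a high triglyceride level is an independent risk factor for coronary heart disease and diabetes. To investigate the association of the APOC3 SstI single-nucleotide polymorphism with coronary atherosclerotic heart disease (CHD) accompanied by HTG and with type II diabetes (non-insulin-dependent diabetes mellitus, NIDDM) accompanied by HTG in the Chinese population, polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) analysis was used to examine the SstI site (S1/S2) polymorphism of APOC3 in 267 CHD patients, 246 NIDDM patients, and 491 healthy controls. The S2 allele frequencies at the APOC3 SstI site in the CHD, NIDDM, and control groups were 0.301, 0.307, and 0.286, respectively; the genotype and allele frequency distributions of the patient groups did not differ significantly from those of the control group (P > 0.05). When the CHD and NIDDM groups were divided into normal-triglyceride (NTG) and high-triglyceride (HTG) subgroups using TG > 1.90 mmol/L as the criterion, the S1S2 genotype frequency in CHD patients was significantly higher in the HTG subgroup than in the NTG subgroup (0.542 vs. 0.357, χ² = 8.77, P = 0.0124), and the S2S2 genotype frequency in NIDDM patients was significantly higher in the HTG subgroup than in the NTG subgroup (0.200 vs. 0.055, χ² = 20.21, P = 0.0000); between the two subgroups, the allele frequencies …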

13.
Population-based genetic association studies, popularly known as case-control studies, have continued to be the most preferred method for deciphering the genetic basis of various complex diseases, even in the post-human genome sequencing era. However, interpopulation differences in allele, genotype, and haplotype frequencies and linkage disequilibrium patterns lead to inconsistent results in candidate gene association studies. Therefore, for any meaningful disease association study, knowledge of the normative genetic background of the baseline population is a prerequisite. In addition, such genetic variation data also provide a ready-made menu of allele frequencies and linkage disequilibrium patterns of various polymorphisms in specific candidate genes in a particular population, which is a useful reference for further genetic association studies. Such genetic variation data are lacking for the Indian population, which represents about one-sixth of the world's population. In the present study we have reported the allele, genotype, and haplotype frequencies, Hardy-Weinberg equilibrium status, and linkage disequilibrium patterns of 12 polymorphisms in six candidate genes from the renin-angiotensin-aldosterone system among Indians. Because of their different history of origin, the Indian population is broadly divided into two subpopulations: North Indians (Caucasian Europeans) and South Indians (Dravidians). Considering this well-documented difference in gene pools, we have presented a comparative account of the normative genetic data of North Indian and South Indian populations with at least four individuals of urban and suburban origin from each of the representative states of northern and southern India.

14.
The angiotensin-converting enzyme gene (ACE) insertion/deletion polymorphism was determined in 211 healthy Mexican individuals belonging to different Mexican ethnic groups (98 Mestizos, 64 Teenek, and 49 Nahuas). ACE polymorphism differed among Mexicans with a high frequency of the D allele and the D/D genotype in Mexican Mestizos. The D/D genotype was absent in Teenek and present in only one Nahua individual (2.0%). When comparisons were made, we observed that Caucasian, African, and Asian populations presented the highest frequencies of the D allele, whereas Amerindian (Teenek and Pima) and Australian Aboriginals showed the highest frequencies of the I allele. The distribution of I/D genotype was heterogeneous in all populations: Australian Aboriginals presented the lowest frequency (4.9%), whereas Nahuas presented the highest (73.4%). The present study shows the frequencies of a polymorphism not analyzed previously in Mexican populations and establishes that this polymorphism distinguishes the Amerindian populations from other groups. On the other hand, since ACE alleles have been associated with genetic susceptibility to developing cardiovascular diseases and hypertension, knowledge of the distribution of these alleles could help to define the true significance of ACE polymorphism as a genetic susceptibility marker in the Amerindian populations.

15.
An entropy-based statistic for genomewide association studies
Efficient genotyping methods and the availability of a large collection of single-nucleotide polymorphisms provide valuable tools for genetic studies of human disease. The standard χ² statistic for case-control studies, which uses a linear function of allele frequencies, has limited power when the number of marker loci is large. We introduce a novel test statistic for genetic association studies that uses Shannon entropy and a nonlinear function of allele frequencies to amplify the differences in allele and haplotype frequencies to maintain statistical power with large numbers of marker loci. We investigate the relationship between the entropy-based test statistic and the standard χ² statistic and show that, in most cases, the power of the entropy-based statistic is greater than that of the standard χ² statistic. The distribution of the entropy-based statistic and the type I error rates are validated using simulation studies. Finally, we apply the new entropy-based test statistic to two real data sets, one for the COMT gene and schizophrenia and one for the MMP-2 gene and esophageal carcinoma, to evaluate the performance of the new method for genetic association studies. The results show that the entropy-based statistic obtained smaller P values than did the standard χ² statistic.

16.
The highly polymorphic D1S80 locus has no known genetic function. However, this variable number of tandem repeats (VNTR) locus has been highly valuable in forensic identification. In this study we report the allele and genotype frequencies of five African populations (Benin, Cameroon, Egypt, Kenya, and Rwanda), which can be used as databases to help characterize populations and identify individuals. The allele frequencies were used to infer genetic associations through phylogenetic, principal component, and G test statistical analyses. Compliance with Hardy-Weinberg equilibrium expectations was determined as were F_ST estimates, theta p values, and power of discrimination assessment for each population. Our analyses of 28 additional populations demonstrate that the D1S80 locus alone can be used to discriminate geographic and ethnic groups. We have generated databases useful for human identification and phylogenetic studies.

17.
Apolipoproteins (lipid-free) are lipid-binding proteins that circulate in the plasma of human blood and are responsible for the clearance of lipoproteins. Apolipoprotein E (ApoE) is one of the several classes of this protein family. It acts as a ligand for the low-density lipid (LDL) receptors and is important for the clearance of very low-density lipid (VLDL) and chylomicron remnants. The APOE gene locus is polymorphic, with three major known alleles, APOE*3, *4, and *2. We investigated the distribution of the allele frequency of the APOE gene locus and describe here the genetic variation in four Kuwaiti subpopulations: Arab origin (Arabian peninsula), Arab Bedouin tribes, Iranian origin, and the heterogeneous population. We also describe the use of Spreadex gels in resolving the amplified and digested products of the APOE gene locus. DNA was extracted from whole blood and subjected to PCR and then to RFLP analysis. Allele and genotype frequencies were estimated for the total population and for each subpopulation. Statistical analysis showed no difference in the allele frequencies between the four groups. The frequency of APOE*3 in the Kuwaiti population was highest (88.4%) followed by the frequency of APOE*4 (6.5%) and APOE*2 (5.1%). The genotype and allele frequencies obtained for the Kuwaiti population fell within the reported worldwide distribution for the APOE gene locus. Moreover, the results obtained in this study showed no statistical difference (p > 0.05) between the APOE allele and genotype frequencies between the subgroups for all six genotypes and three alleles, supporting the assumption of admixture in the Kuwaiti population and that the obtained frequencies were in Hardy-Weinberg equilibrium. Finally, we found that the distribution of the APOE alleles in Kuwait differs somewhat from those reported in other Arab populations, suggesting that the Arabs originating from the Arabian peninsula are different from those of Lebanon, Morocco, and Sudan.  

18.
19.
Microsatellite null alleles are found to a varying degree across all taxa. They are problematic as they may inflate measures of genetic differentiation and create false homozygotes. Although several methods exist for correcting allele frequencies for null alleles and enabling estimation of F_ST, much less is known about how null alleles affect assignment testing. Data presented here, based on simulations, show that the percentage of correctly assigned individuals in model-based clustering and Bayesian assignment methods was slightly, though significantly, reduced in the presence of null alleles (frequency range from 0.000 to 0.913). The bias in assignment tests caused by null alleles led to a slight reduction in the power to correctly assign individuals (0.2 and 1.0 percent units for STRUCTURE- and 2.4 percent units for GENECLASS-based assignment tests). Further, the presence of null alleles caused a small but significant overestimation of F_ST. Consequently, microsatellite loci affected by null alleles would probably not alter the overall outcome of assignment testing and could therefore be included in these types of studies. Nevertheless, loci prone to null alleles should be used with caution as they lower the power of assignment tests and alter the accuracy of F_ST, and loci less prone to null alleles should always be preferred.

20.
Allele and genotype frequencies at the HLA-DQ alpha locus have been determined by the use of polymerase chain reaction (PCR) amplification and nonradioactive oligonucleotide probes. The probes define six alleles and 21 genotypes in a dot-blot format. A total of over 1,400 individuals from 11 populations has been typed by two different laboratories using this method. In contrast to some variable-number-of-tandem-repeat markers that have been used for identity determination, DQ alpha genotype frequencies do not deviate significantly from Hardy-Weinberg equilibrium in all populations studied. The distribution of alleles varies significantly between most of these populations. In Caucasians, the allele frequencies range from 4.3% to 28.5%. In this population, the power of discrimination is 0.94, and, for paternity determination, the power of exclusion is 0.642. These population data will allow the use of the HLA-DQ alpha marker in paternity determination, the analysis of individual identity in forensic samples, and anthropological studies.
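Two of the quantities mentioned above follow from short formulas: k alleles define k(k+1)/2 unordered genotypes (21 for six alleles), and the power of discrimination is commonly computed as one minus the sum of squared genotype frequencies. The sketch below assumes Hardy-Weinberg genotype frequencies and uses six equally frequent alleles purely as a hypothetical input; it does not use the study's allele frequencies and is not meant to reproduce its reported value.

```python
from itertools import combinations_with_replacement


def n_genotypes(n_alleles):
    """Number of distinct unordered genotypes for a locus with n_alleles alleles."""
    return n_alleles * (n_alleles + 1) // 2


def power_of_discrimination(allele_freqs):
    """One minus the probability that two random individuals share a genotype.

    Assumption: genotype frequencies follow Hardy-Weinberg proportions.
    """
    freqs = list(allele_freqs)
    match_prob = 0.0
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        g = freqs[i] ** 2 if i == j else 2.0 * freqs[i] * freqs[j]
        match_prob += g * g
    return 1.0 - match_prob


# Hypothetical input: six equally frequent alleles.
uniform_six = [1.0 / 6.0] * 6
pd_uniform = power_of_discrimination(uniform_six)
```

Unequal allele frequencies lower the power of discrimination relative to the equal-frequency case, which is why population-specific frequency data matter for forensic use.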
