Similar articles
 20 similar articles found (search time: 312 ms)
1.
Moskvina V, Schmidt KM. Biometrics 2006, 62(4): 1116-1123
With the availability of fast genotyping methods and genomic databases, the search for statistical association of single nucleotide polymorphisms with a complex trait has become an important methodology in medical genetics. However, even fairly rare errors occurring during the genotyping process can lead to spurious association results and decrease in statistical power. We develop a systematic approach to study how genotyping errors change the genotype distribution in a sample. The general M-marker case is reduced to that of a single-marker locus by recognizing the underlying tensor-product structure of the error matrix. Both method and general conclusions apply to the general error model; we give detailed results for allele-based errors of size depending both on the marker locus and the allele present. Multiple errors are treated in terms of the associated diffusion process on the space of genotype distributions. We find that certain genotype and haplotype distributions remain unchanged under genotyping errors, and that genotyping errors generally render the distribution more similar to the stable one. In case-control association studies, this will lead to loss of statistical power for nondifferential genotyping errors and increase in type I error for differential genotyping errors. Moreover, we show that allele-based genotyping errors do not disturb Hardy-Weinberg equilibrium in the genotype distribution. In this setting we also identify maximally affected distributions. As they correspond to situations with rare alleles and marker loci in high linkage disequilibrium, careful checking for genotyping errors is advisable when significant association based on such alleles/haplotypes is observed in association studies.
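The claim that allele-based errors leave Hardy-Weinberg equilibrium intact can be checked numerically with a small sketch. It assumes, as an illustration only, a two-allele locus, errors acting independently on each of the two alleles of an ordered genotype, and hypothetical error rates `e_A` and `e_a`; the helper names are not taken from the paper. Under these assumptions the genotype error matrix is the Kronecker (tensor) product of the allele error matrix with itself.

```python
from itertools import product


def kron_vec(u, v):
    """Tensor (Kronecker) product of two probability vectors."""
    return [a * b for a, b in product(u, v)]


def kron_mat(A, B):
    """Kronecker product of two row-stochastic matrices."""
    return [[a * b for a, b in product(ra, rb)] for ra, rb in product(A, B)]


def apply_errors(dist, M):
    """Observed distribution = true distribution (row vector) times error matrix."""
    return [sum(dist[i] * M[i][j] for i in range(len(dist)))
            for j in range(len(M[0]))]


def allele_error_matrix(e_A, e_a):
    """Rows: true allele (A, a); columns: observed allele (A, a)."""
    return [[1 - e_A, e_A], [e_a, 1 - e_a]]


p = 0.3                                   # hypothetical frequency of allele A
alleles = [p, 1 - p]
E = allele_error_matrix(0.02, 0.05)       # hypothetical allele-specific error rates

# Ordered genotypes (AA, Aa, aA, aa) in Hardy-Weinberg proportions.
true_geno = kron_vec(alleles, alleles)

# Genotype-level error matrix built from the allele-level one.
obs_geno = apply_errors(true_geno, kron_mat(E, E))

# HWE proportions computed from the observed (error-shifted) allele frequencies.
obs_alleles = apply_errors(alleles, E)
hwe_from_obs = kron_vec(obs_alleles, obs_alleles)

print(obs_geno)
print(hwe_from_obs)
```

The two printed vectors agree because (u ⊗ u)(E ⊗ E) = (uE) ⊗ (uE): the observed genotypes are still in Hardy-Weinberg proportions, only at shifted allele frequencies.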

2.
Liu W, Zhao W, Chase GA. Human Heredity 2006, 61(1): 31-44
OBJECTIVE: Single nucleotide polymorphisms (SNPs) serve as effective markers for localizing disease susceptibility genes, but current genotyping technologies are inadequate for genotyping all available SNP markers in a typical linkage/association study. Much attention has recently been paid to methods for selecting the minimal informative subset of SNPs in identifying haplotypes, but there has been little investigation of the effect of missing or erroneous genotypes on the performance of these SNP selection algorithms and subsequent association tests using the selected tagging SNPs. The purpose of this study is to explore the effect of missing genotype or genotyping error on tagging SNP selection and subsequent single marker and haplotype association tests using the selected tagging SNPs. METHODS: Through two sets of simulations, we evaluated the performance of three tagging SNP selection programs in the presence of missing or erroneous genotypes: Clayton's diversity based program htstep, Carlson's linkage disequilibrium (LD) based program ldSelect, and Stram's coefficient of determination based program tagsnp.exe. RESULTS: When randomly selected known loci were relabeled as 'missing', we found that the average number of tagging SNPs selected by all three algorithms changed very little and the power of subsequent single marker and haplotype association tests using the selected tagging SNPs remained close to the power of these tests in the absence of missing genotype. When random genotyping errors were introduced, we found that the average number of tagging SNPs selected by all three algorithms increased. In data sets simulated according to the haplotype frequencies in the CYP19 region, Stram's program had larger increase than Carlson's and Clayton's programs. In data sets simulated under the coalescent model, Carlson's program had the largest increase and Clayton's program had the smallest increase.
In both sets of simulations, with the presence of genotyping errors, the power of the haplotype tests from all three programs decreased quickly, but there was not much reduction in power of the single marker tests. CONCLUSIONS: Missing genotypes do not seem to have much impact on tagging SNP selection and subsequent single marker and haplotype association tests. In contrast, genotyping errors could have severe impact on tagging SNP selection and haplotype tests, but not on single marker tests.

3.
The purpose of this work is to quantify the effects that errors in genotyping have on power and the sample size necessary to maintain constant asymptotic Type I and Type II error rates (SSN) for case-control genetic association studies between a disease phenotype and a di-allelic marker locus, for example a single nucleotide polymorphism (SNP) locus. We consider the effects of three published models of genotyping errors on the chi-square test for independence in the 2 x 3 table. After specifying genotype frequencies for the marker locus conditional on disease status and error model in both a genetic model-based and a genetic model-free framework, we compute the asymptotic power to detect association through specification of the test's non-centrality parameter. This parameter determines the functional dependence of SSN on the genotyping error rates. Additionally, we study the dependence of SSN on linkage disequilibrium (LD), marker allele frequencies, and genotyping error rates for a dominant disease model. Increased genotyping error rate requires a larger SSN. Every 1% increase in sum of genotyping error rates requires that both case and control SSN be increased by 2-8%, with the extent of increase dependent upon the error model. For the dominant disease model, SSN is a nonlinear function of LD and genotyping error rate, with greater SSN for lower LD and higher genotyping error rate. The combination of lower LD and higher genotyping error rates requires a larger SSN than the sum of the SSN for the lower LD and for the higher genotyping error rate.
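The link between the non-centrality parameter and the required sample size can be illustrated with a rough sketch. It assumes equal numbers of cases and controls, so that the non-centrality parameter of the 2 x 3 chi-square test is n times the sum over genotypes of (p_i - q_i)^2 / (p_i + q_i), where p and q are the case and control genotype frequencies. The uniform error model and all numerical values below are hypothetical and are not one of the three published models the abstract refers to.

```python
def misclassify(freqs, M):
    """Observed genotype frequencies = true frequencies times error matrix M."""
    return [sum(freqs[i] * M[i][j] for i in range(3)) for j in range(3)]


def ncp_per_subject(case, control):
    """Non-centrality parameter per case, assuming equally many cases and controls."""
    return sum((p - q) ** 2 / (p + q) for p, q in zip(case, control) if p + q > 0)


def uniform_error(eps):
    """Hypothetical model: a genotype is misread as each other genotype with prob eps/2."""
    return [[1 - eps if i == j else eps / 2 for j in range(3)] for i in range(3)]


# Hypothetical true genotype frequencies (AA, Aa, aa) in cases and controls.
case = [0.30, 0.50, 0.20]
control = [0.25, 0.50, 0.25]

M = uniform_error(0.02)  # same error model in both groups (nondifferential)
lam_true = ncp_per_subject(case, control)
lam_obs = ncp_per_subject(misclassify(case, M), misclassify(control, M))

# For fixed significance level and power the total non-centrality n * lambda must stay
# constant, so the required sample size scales with the ratio below.
inflation = lam_true / lam_obs
print(f"sample size must grow by a factor of about {inflation:.3f}")
```

Because the error is nondifferential, it pulls the case and control distributions toward each other, which shrinks the non-centrality parameter and raises the sample size needed for the same power.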

4.
Errors while genotyping are inevitable and can reduce the power to detect linkage. However, does genotyping error have the same impact on linkage results for single-nucleotide polymorphism (SNP) and microsatellite (MS) marker maps? To evaluate this question we detected genotyping errors that are consistent with Mendelian inheritance using large changes in multipoint identity-by-descent sharing in neighboring markers. Only a small fraction of Mendelian consistent errors were detectable (e.g., 18% of MS and 2.4% of SNP genotyping errors). More SNP genotyping errors are Mendelian consistent compared to MS genotyping errors, so genotyping error may have a greater impact on linkage results using SNP marker maps. We also evaluated the effect of genotyping error on the power and type I error rate using simulated nuclear families with missing parents under 0, 0.14, and 2.8% genotyping error rates. In the presence of genotyping error, we found that the power to detect a true linkage signal was greater for SNP (75%) than MS (67%) marker maps, although there were also slightly more false-positive signals using SNP marker maps (5 compared with 3 for MS). Finally, we evaluated the usefulness of accounting for genotyping error in the SNP data using a likelihood-based approach, which restores some of the power that is lost when genotyping error is introduced.

5.
A study including eight microsatellite loci for 1,014 trees from seven mapped stands of the partially clonal Populus euphratica was used to demonstrate how genotyping errors influence estimates of clonality. With a threshold of 0 (identical multilocus genotypes constitute one clone) we identified 602 genotypes. A threshold of 1 (compensating for an error in one allele) lowered this number to 563. Genotyping errors can seemingly merge (type 1 error), split really existing clones (type 2), or convert a unique genotype into another unique genotype (type 3). We used context information (sex and spatial position) to estimate the type 1 error. For thresholds of 0 and 1 the estimate was below 0.021, suggesting a high resolution for the marker system. The rate of genotyping errors was estimated by repeated genotyping for a cohort of 41 trees drawn at random (0.158), and a second cohort of 40 trees deviating in one allele from another tree (0.368). For the latter cohort, most of these deviations turned out to be errors, but 8 out of 602 obtained multilocus genotypes may represent somatic mutations, corresponding to a mutation rate of 0.013. A simulation of genotyping errors for populations with varying clonality and evenness showed the number of genotypes always to be overestimated for a system with high resolution, and this mistake increases with increasing clonality and evenness. Allowing a threshold of 1 compensates for most genotyping errors and leads to much more precise estimates of clonality compared with a threshold of 0. This lowers the resolution of the marker system, but comparison with context information can help to check if the resolution is sufficient to apply a higher threshold. We recommend simulation procedures to investigate the behavior of a marker system for different thresholds and error rates to obtain the best estimate of clonality.
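The effect of the threshold on the number of inferred clones can be sketched as follows. The abstract does not say how genotypes within the threshold are grouped, so this example assumes single-linkage grouping (any chain of pairs within the threshold ends up in one clone). The genotype encoding, the function names and the four example trees are hypothetical.

```python
from collections import Counter


def allele_distance(g1, g2):
    """Number of mismatching alleles between two multilocus genotypes.

    Each genotype is a tuple of loci; each locus is a tuple of two allele sizes.
    """
    d = 0
    for loc1, loc2 in zip(g1, g2):
        shared = sum((Counter(loc1) & Counter(loc2)).values())
        d += len(loc1) - shared
    return d


def count_clones(genotypes, threshold):
    """Count clones, merging genotypes that differ in at most `threshold` alleles.

    Assumption: single-linkage grouping, implemented with a small union-find.
    """
    parent = list(range(len(genotypes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(genotypes)):
        for j in range(i + 1, len(genotypes)):
            if allele_distance(genotypes[i], genotypes[j]) <= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(genotypes))})


# Four hypothetical trees typed at two loci.
trees = [
    ((100, 104), (200, 200)),
    ((100, 104), (200, 200)),  # identical to the first tree
    ((100, 106), (200, 200)),  # one allele differs, possibly a genotyping error
    ((110, 112), (210, 214)),  # clearly distinct
]

print(count_clones(trees, threshold=0))
print(count_clones(trees, threshold=1))
```

Raising the threshold can only keep or reduce the clone count, which mirrors the drop from 602 to 563 genotypes reported in the abstract.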

6.
Cheng KF, Chen JH. Human Heredity 2007, 64(2): 114-122
The transmission/disequilibrium test (TDT), a family based test of linkage and association, is a popular test for studies of complex inheritance, as it is nonparametric and robust against spurious conclusions induced by hidden genetic structure, such as stratification or admixture. However, the TDT may be biased by genotyping errors. Undetected genotyping errors may be contributing to an inflated type I error rate among reported TDT-derived associations. To adjust for bias, a popular approach is to assume a genotype error model for describing the pattern of errors and propose association tests using a likelihood method. However, all model-based approaches tend to perform unsatisfactorily if the related genotyping error rates are not identical across all families. In this paper, we propose a TDT-type association test which is not only simple, robust against population stratification (and hence the assumption of Hardy-Weinberg equilibrium is not required), but also robust against genotyping error with error rates varying across families. Simulation studies confirm that the new test has very reasonable performance.
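For context, the classical TDT (not the error-robust variant this abstract proposes) is a McNemar-type statistic. With b heterozygous parents transmitting the allele of interest and c transmitting the other allele, the statistic is (b - c)^2 / (b + c), compared with a chi-square distribution with one degree of freedom. The counts below are hypothetical.

```python
from math import erfc, sqrt


def tdt_statistic(b, c):
    """Classical TDT statistic from transmission counts of heterozygous parents."""
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    return (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(x):
    """Upper tail of the chi-square distribution with 1 degree of freedom."""
    return erfc(sqrt(x / 2))


b, c = 60, 40  # hypothetical transmissions / non-transmissions of the allele
stat = tdt_statistic(b, c)
print(stat, chi2_1df_pvalue(stat))
```

A genotyping error that flips a small number of transmissions changes b and c directly, which is one way to see why undetected errors can distort this statistic.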

7.
Genotyping errors are present in almost all genetic data and can affect biological conclusions of a study, particularly for studies based on individual identification and parentage. Many statistical approaches can incorporate genotyping errors, but usually need accurate estimates of error rates. Here, we used a new microsatellite data set developed for brown rockfish (Sebastes auriculatus) to estimate genotyping error using three approaches: (i) repeat genotyping 5% of samples, (ii) comparing unintentionally recaptured individuals and (iii) Mendelian inheritance error checking for known parent–offspring pairs. In each data set, we quantified genotyping error rate per allele due to allele drop‐out and false alleles. Genotyping error rate per locus revealed an average overall genotyping error rate by direct count of 0.3%, 1.5% and 1.7% (0.002, 0.007 and 0.008 per allele error rate) from replicate genotypes, known parent–offspring pairs and unintentionally recaptured individuals, respectively. By direct‐count error estimates, the recapture and known parent–offspring data sets revealed an error rate four times greater than estimated using repeat genotypes. There was no evidence of correlation between error rates and locus variability for all three data sets, and errors appeared to occur randomly over loci in the repeat genotypes, but not in recaptures and parent–offspring comparisons. Furthermore, there was no correlation in locus‐specific error rates between any two of the three data sets. Our data suggest that repeat genotyping may underestimate true error rates and may not estimate locus‐specific error rates accurately. We therefore suggest using methods for error estimation that correspond to the overall aim of the study (e.g. known parent–offspring comparisons in parentage studies).

8.
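An effective test for mapping complex disease genes   Cited: 1 in total (self-citations: 0; citations by others: 1)
Linkage disequilibrium (LD) is used to map the genes underlying some complex diseases, and many LD mapping methods have been developed in recent years. Apart from the TDT, most LD mapping methods must first assume that there is no population admixture; admixture can increase the probability of a type I error in disease gene mapping and produce invalid results. The method presented here uses LD to test for linkage (in the presence of linkage disequilibrium) and association (in the presence of linkage) between a marker locus and a disease susceptibility locus (DSL). The analysis uses unrelated individuals whose parental genotypes are known and who have at least one heterozygous parent. The random sample is classified by genotype, and powerful statistical methods are then applied to the data from the different classes, both separately and jointly. This LD mapping method applies to both affected and unaffected individuals, and it effectively removes the influence of population admixture when mapping with samples classified by parental genotype. Analytical and simulation results also indicate that the method resolves the problem of population admixture when testing for linkage and association between a marker locus and a DSL. Compared with the TDT, however, when the tested locus is the DSL, this method 丙能 [sic; the source text is garbled here, so it is unclear whether "can" or "cannot" is meant] make effective and full use of the corrected data; when the tested locus is not the DSL, this method and the TDT can complement each other to detect a linked DSL more effectively.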

9.
OBJECTIVE: In affected sib pair studies without genotyped parents the effect of genotyping error is generally to reduce the type I error rate and power of tests for linkage. The effect of genotyping error when parents have been genotyped is unknown. We investigated the type I error rate of the single-point Mean test for studies in which genotypes of both parents are available. METHODS: Datasets were simulated assuming no linkage and one of five models for genotyping error. In each dataset, Mendelian-inconsistent families were either excluded or regenotyped, and then the Mean test applied. RESULTS: We found that genotyping errors lead to an inflated type I error rate when inconsistent families are excluded. Depending on the genotyping-error model assumed, regenotyping inconsistent families has one of several effects. It may produce the same type I error rate as if inconsistent families are excluded; it may reduce the type I error, but still leave an anti-conservative test; or it may give a conservative test. Departures of the type I error rate from its nominal level increase with both the genotyping error rate and sample size. CONCLUSION: We recommend that markers with high error rates either be excluded from the analysis or be regenotyped in all families.

10.
11.
To test whether plucked hairs are a reliable source of DNA for genotyping microsatellite loci, we carried out experiments using one, three, or 10 hairs per extract for 50 alpine marmots. For each extract, seven independent genotypings were performed for the same locus (multiple-tubes approach). Two types of genotyping errors were recorded: a false homozygote defined as the detection of only one allele of a true heterozygote, and a false allele defined as a PCR-generated allele that was not one of the alleles of the true genotype. Using DNA extracted from one, three, or 10 hairs, the overall error rate was 14.00%, 4.86%, and 0.29%, respectively. Based on our results, we conclude that 10 hairs should be used to obtain consistently reliable genotypings using the single-tube approach, and that a single plucked hair could represent a reliable source of DNA if the multiple-tubes approach is used. For future studies of dinucleotide repeat diversity using DNA extracted from one to three shed or plucked hairs, we strongly recommend initiating an appropriate pilot study to quantify the error rate and to determine the reliability of the single-tube approach.
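The two error types defined in this abstract translate directly into a small classifier. The sketch below assumes the true genotype is known and that each genotype is a pair of allele sizes; the function names and the seven example replicates are hypothetical, and the error rate is simply the share of replicate genotypings that differ from the true genotype.

```python
from collections import Counter


def classify(true_gt, observed_gt):
    """Classify one replicate genotyping against the known true genotype."""
    if sorted(true_gt) == sorted(observed_gt):
        return "correct"
    if set(observed_gt) - set(true_gt):
        # An allele appears that is not part of the true genotype.
        return "false_allele"
    # Only true alleles were seen but the genotype differs:
    # one allele of a true heterozygote was missed.
    return "false_homozygote"


def error_summary(true_gt, replicates):
    """Count outcomes over replicates and return (counts, overall error rate)."""
    counts = Counter(classify(true_gt, obs) for obs in replicates)
    errors = len(replicates) - counts["correct"]
    return counts, errors / len(replicates)


# Hypothetical example: seven independent genotypings of one extract at one locus.
true_genotype = (180, 184)
replicates = [(180, 184)] * 5 + [(180, 180), (180, 186)]

counts, rate = error_summary(true_genotype, replicates)
print(dict(counts), rate)
```

In a multiple-tubes design the true genotype is itself inferred from the replicates, so in practice a consensus step would come before this classification.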

12.
Microsatellite genotyping errors will be present in all but the smallest data sets and have the potential to undermine the conclusions of most downstream analyses. Despite this, little rigorous effort has been made to quantify the size of the problem and to identify the commonest sources of error. Here, we use a large data set comprising almost 2000 Antarctic fur seals Arctocephalus gazella genotyped at nine hypervariable microsatellite loci to explore error detection methods, common sources of error and the consequences of errors on paternal exclusion. We found good concordance among a range of contrasting approaches to error-rate estimation, our range being 0.0013 to 0.0074 per single locus PCR (polymerase chain reaction). The best approach probably involves blind repeat-genotyping, but this is also the most labour-intensive. We show that several other approaches are also effective at detecting errors, although the most convenient alternative, namely mother-offspring comparisons, yielded the lowest estimate of the error rate. In total, we found 75 errors, emphasizing their ubiquitous presence. The most common errors involved the misinterpretation of allele banding patterns (n = 60, 80%) and of these, over a third (n = 22, 36.7%) were due to confusion between homozygote and adjacent allele heterozygote genotypes. A specific test for whether a data set contains the expected number of adjacent allele heterozygotes could provide a useful tool with which workers can assess the likely size of the problem. Error rates are also positively correlated with both locus polymorphism and product size, again indicating aspects where extra effort at error reduction should be directed. Finally, we conducted simulations to explore the potential impact of genotyping errors on paternity exclusion. Error rates as low as 0.01 per allele resulted in a rate of false paternity exclusion exceeding 20%. 
Errors also led to reduced estimates of male reproductive skew and increases in the numbers of pups that matched more than one candidate male. Because even modest error rates can be strongly influential, we recommend that error rates should be routinely published and that researchers make an attempt to calculate how robust their analyses are to errors.

13.
Next-generation sequencing of DNA provides an unprecedented opportunity to discover rare genetic variants associated with complex diseases and traits. However, the common practice of first calling underlying genotypes and then treating the called values as known is prone to false positive findings, especially when genotyping errors are systematically different between cases and controls. This happens whenever cases and controls are sequenced at different depths, on different platforms, or in different batches. In this article, we provide a likelihood-based approach to testing rare variant associations that directly models sequencing reads without calling genotypes. We consider the (weighted) burden test statistic, which is the (weighted) sum of the score statistic for assessing effects of individual variants on the trait of interest. Because variant locations are unknown, we develop a simple, computationally efficient screening algorithm to estimate the loci that are variants. Because our burden statistic may not have mean zero after screening, we develop a novel bootstrap procedure for assessing the significance of the burden statistic. We demonstrate through extensive simulation studies that the proposed tests are robust to a wide range of differential sequencing qualities between cases and controls, and are at least as powerful as the standard genotype calling approach when the latter controls type I error. An application to the UK10K data reveals novel rare variants in gene BTBD18 associated with childhood onset obesity. The relevant software is freely available.

14.
Family-based association studies have been widely used to identify association between diseases and genetic markers. It is known that genotyping uncertainty is inherent in both directly genotyped or sequenced DNA variations and imputed data in silico. The uncertainty can lead to genotyping errors and missingness and can negatively impact the power and Type I error rates of family-based association studies even if the uncertainty is independent of disease status. Compared with studies using unrelated subjects, there are very few methods that address the issue of genotyping uncertainty for family-based designs. The limited attempts have mostly been made to correct the bias caused by genotyping errors. Without properly addressing the issue, the conventional testing strategy, i.e. family-based association tests using called genotypes, can yield invalid statistical inferences. Here, we propose a new test to address the challenges in analyzing case-parents data by using calls with high accuracy and modeling genotype-specific call rates. Our simulations show that compared with the conventional strategy and an alternative test, our new test has an improved performance in the presence of substantial uncertainty and has a similar performance when the uncertainty level is low. We also demonstrate the advantages of our new method by applying it to imputed markers from a genome-wide case-parents association study.

15.
By testing DNA pools rather than single samples the number of tests for a case-control association study can be decreased to only two for each marker: one on the patient and one on the control pool. A fundamental requirement is that each pool represents the frequency of the markers in the corresponding population beyond the influence of experimental errors. Consequently the latter must be carefully determined. To this aim, we prepared pools of different size (49-402 individuals) with accurately quantified DNAs, estimated the allelic frequencies in the pools of two SNPs by primer extension genotyping followed by DHPLC analysis and compared them with the real frequencies determined in the single samples. Our data show that (1) the method is highly reproducible: the standard deviation of repeated determinations was +/-0.014; (2) the experimental error (i.e., the discrepancy between the estimated and real frequencies) was +/-0.013 (95% C.I.: 0.0098-0.0165). The magnitude of this error was not correlated to the pool size or to the type of SNP. The effect of the observed experimental error on the power of the association test was evaluated. We conclude that this method constitutes an efficient tool for high-throughput association screenings provided that the experimental error is low. We therefore recommend that before a pool is used for extensive association studies, its quality, i.e., the experimental error, is verified by determining the difference between estimated and real frequencies for at least one marker.

16.
Summary: Deficiency of mitochondrial aldehyde dehydrogenase (ALDH I) is an inborn error of metabolism that is responsible for acute alcohol sensitivity (flushing response) observed only in Orientals of Mongoloid origin. Our previous studies using electrophoretic enzyme detection have shown that this deficiency is prevalent among Japanese, Chinese, and other Orientals. We report here the genotyping of ALDH I locus in blood samples of 218 South Korean individuals by means of hybridization analysis with allele-specific oligonucleotide probes and enzymatically amplified human genomic DNA. The results of genotyping are compared with the phenotype analysis in hair roots of the same individuals. Among 62 apparently deficient phenotypes, 58 heterozygote and 4 homozygote deficient genotypes were observed.

17.
SNP arrays are widely used in genetic research and agricultural genomics applications, and the quality of SNP genotyping data is of paramount importance. In the present study, SNP genotyping concordance and discordance were evaluated for commercial bovine SNP arrays based on two types of quality assurance (QA) samples provided by Neogen GeneSeek. The genotyping discordance rates (GDRs) between chips were on average between 0.06% and 0.37% based on the QA type I data and between 0.05% and 0.15% based on the QA type II data. The average genotyping error rate (GER) pertaining to single SNP chips, based on the QA type II data, varied between 0.02% and 0.08% per SNP and between 0.01% and 0.06% per sample. These results indicate that genotyping concordance rate was high (i.e. from 99.63% to 99.99%). Nevertheless, mitochondrial and Y chromosome SNPs had considerably elevated GDRs and GERs compared to the SNPs on the 29 autosomes and X chromosome. The majority of genotyping errors resulted from single allotyping errors, which also included the opposite instances for allele ‘dropout’ (i.e. from AB to AA or BB). Simultaneous allotyping errors on both alleles (e.g. mistaking AA for BB or vice versa) were relatively rare. Finally, a list of SNPs with a GER greater than 1% is provided. Interpretation of association effects of these SNPs, for example in genome‐wide association studies, needs to be taken with caution. The genotyping concordance information needs to be considered in the optimal design of future bovine SNP arrays.
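A discordance rate of the kind reported here, split into single-allele and double-allele differences, could be computed along the following lines. This is a minimal sketch: it assumes genotypes are coded as the strings "AA", "AB" and "BB", that missing calls are `None` and are excluded from the comparison, and the two call lists are invented for illustration.

```python
def allele_differences(g1, g2):
    """Number of alleles that differ between two unordered biallelic calls."""
    return abs(g1.count("B") - g2.count("B"))


def discordance_summary(calls_1, calls_2):
    """Compare the same SNPs typed on two chips, skipping missing calls."""
    compared = single = double = 0
    for g1, g2 in zip(calls_1, calls_2):
        if g1 is None or g2 is None:
            continue
        compared += 1
        d = allele_differences(g1, g2)
        if d == 1:
            single += 1   # e.g. AB called as AA or BB, or the reverse
        elif d == 2:
            double += 1   # e.g. AA called as BB
    rate = (single + double) / compared if compared else float("nan")
    return {"compared": compared, "single": single, "double": double, "rate": rate}


# Hypothetical calls for six SNPs of one sample on two chips.
chip_1 = ["AA", "AB", "BB", "AB", None, "AA"]
chip_2 = ["AA", "AA", "BB", "AB", "AB", "BB"]

summary = discordance_summary(chip_1, chip_2)
print(summary)
```

Discordance between two chips is an upper-level signal only: it does not say which of the two calls is wrong, which is why the abstract distinguishes discordance rates from per-chip error rates.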

18.
Genome-wide association studies (GWAS) have been successful in identifying common genetic variation reproducibly associated with disease. However, most associated variants confer very small risk and after meta-analysis of large cohorts a large fraction of expected heritability still remains unexplained. A possible explanation is that rare variants currently undetected by GWAS with SNP arrays could contribute a large fraction of risk when present in cases. This concept has spurred great interest in exploring the role of rare variants in disease. As the cost of sequencing continues to plummet, it is becoming feasible to directly sequence case-control samples for testing disease association including rare variants. We have developed a test statistic that allows for association testing among cases and controls using data directly from sequencing reads. In addition, our method allows for random errors in reads. We determine the probability of a true genotype call based on the observed base pair reads using the expectation-maximization algorithm. We apply the SumStat procedure to obtain a single statistic for a group of multiple rare variant loci. We document the validity of our method through simulations. Our results suggest that our statistic maintains the correct type I error rate, even in the presence of differential misclassification for sequence reads, and that it has good power under a number of scenarios. Finally, our SumStat results show power at least as good as the maximum single locus results.
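How a genotype probability can be derived from read counts with an expectation-maximization step may be easier to see in a toy version. The sketch below is not the authors' model: it assumes a biallelic site, a binomial read model with a symmetric per-read error rate `eps`, and a Hardy-Weinberg prior whose alternate-allele frequency `q` is re-estimated by EM. All names and read counts are hypothetical.

```python
from math import comb


def genotype_likelihoods(alt, depth, eps):
    """P(alt reads | genotype) for genotypes with 0, 1, 2 alternate alleles."""
    thetas = (eps, 0.5, 1 - eps)  # chance that a single read shows the alternate allele
    return [comb(depth, alt) * t ** alt * (1 - t) ** (depth - alt) for t in thetas]


def posterior(alt, depth, q, eps):
    """Posterior genotype probabilities under a Hardy-Weinberg prior with frequency q."""
    prior = [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]
    weights = [l * p for l, p in zip(genotype_likelihoods(alt, depth, eps), prior)]
    total = sum(weights)
    return [w / total for w in weights]


def em_allele_frequency(reads, eps=0.01, q=0.1, iterations=50):
    """EM estimate of q from (alt_reads, depth) pairs, one pair per individual."""
    for _ in range(iterations):
        # E-step: expected alternate-allele count per individual.
        dosage = sum(post[1] + 2 * post[2]
                     for post in (posterior(a, d, q, eps) for a, d in reads))
        # M-step: update the allele frequency.
        q = dosage / (2 * len(reads))
    return q


# Hypothetical data: eight individuals with no alternate reads, two with half.
reads = [(0, 20)] * 8 + [(10, 20)] * 2
q_hat = em_allele_frequency(reads)
print(q_hat, posterior(10, 20, q_hat, 0.01))
```

Working with these posterior probabilities instead of hard genotype calls is what lets a downstream test absorb read-level uncertainty, including depth differences between cases and controls.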

19.
The immunomodulatory role of 1,25-dihydroxyvitamin D3 is well known. An association between vitamin D receptor (VDR) gene BsmI polymorphisms and systemic lupus erythematosus (SLE) has been reported. To examine the characteristics of VDR gene BsmI polymorphisms in patients with SLE and the relationship of polymorphisms to the susceptibility and clinical manifestations of SLE, VDR genotypings of 101 Thai patients with SLE and 194 healthy controls were performed based on polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP). The relationship between VDR gene BsmI polymorphisms and clinical manifestations of SLE was evaluated. The distribution of VDR genotyping in patients with SLE was 1.9% for BB (non-excisable allele homozygote), 21.78% for Bb (heterozygote), and 76.23% for bb (excisable allele homozygote). The distribution of VDR genotyping in the control group was 1.03% for BB, 15.98% for Bb, and 82.99% for bb. There was no statistically significant difference between the two groups (p = 0.357). The allelic distribution of B and b was similar within the groups (p = 0.173). The relationship between VDR genotype and clinical manifestation or laboratory profiles of SLE also cannot be statistically demonstrated. In conclusion, we cannot verify any association between VDR gene BsmI polymorphism and SLE. A larger study examining other VDR gene polymorphisms is proposed.

20.
OBJECTIVES: This is the first of two articles discussing the effect of population stratification on the type I error rate (i.e., false positive rate). This paper focuses on the confounding risk ratio (CRR). It is accepted that population stratification (PS) can produce false positive results in case-control genetic association. However, which values of population parameters lead to an increase in type I error rate is unknown. Some believe PS does not represent a serious concern, whereas others believe that PS may contribute to contradictory findings in genetic association. We used computer simulations to estimate the effect of PS on type I error rate over a wide range of disease frequencies and marker allele frequencies, and we compared the observed type I error rate to the magnitude of the confounding risk ratio. METHODS: We simulated two populations and mixed them to produce a combined population, specifying 160 different combinations of input parameters (disease prevalences and marker allele frequencies in the two populations). From the combined populations, we selected 5000 case-control datasets, each with either 50, 100, or 300 cases and controls, and determined the type I error rate. In all simulations, the marker allele and disease were independent (i.e., no association). RESULTS: The type I error rate is not substantially affected by changes in the disease prevalence per se. We found that the CRR provides a relatively poor indicator of the magnitude of the increase in type I error rate. We also derived a simple mathematical quantity, Delta, that is highly correlated with the type I error rate. In the companion article (part II, in this issue), we extend this work to multiple subpopulations and unequal sampling proportions. CONCLUSION: Based on these results, realistic combinations of disease prevalences and marker allele frequencies can substantially increase the probability of finding false evidence of marker disease associations. 
Furthermore, the CRR does not indicate when this will occur.
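Why mixing populations can create a spurious marker-disease association, even with no association inside either population, can be shown with a few lines of arithmetic. The sketch below computes the expected marker allele frequency among cases and among controls in a mixed population. It does not reproduce the paper's CRR or Delta quantities, and the mixing weights, prevalences and allele frequencies are hypothetical.

```python
def case_control_allele_freqs(weights, prevalences, allele_freqs):
    """Expected marker allele frequency in cases and in controls of a mixed population.

    Within each subpopulation the marker and the disease are independent.
    """
    case_mass = sum(w * d for w, d in zip(weights, prevalences))
    ctrl_mass = sum(w * (1 - d) for w, d in zip(weights, prevalences))
    case_freq = sum(w * d * f
                    for w, d, f in zip(weights, prevalences, allele_freqs)) / case_mass
    ctrl_freq = sum(w * (1 - d) * f
                    for w, d, f in zip(weights, prevalences, allele_freqs)) / ctrl_mass
    return case_freq, ctrl_freq


weights = (0.5, 0.5)  # hypothetical mixing proportions of the two populations

# Prevalence and allele frequency both differ between populations: spurious signal.
stratified = case_control_allele_freqs(weights, (0.05, 0.20), (0.10, 0.40))

# Same prevalence in both populations: no spurious signal.
equal_prevalence = case_control_allele_freqs(weights, (0.10, 0.10), (0.10, 0.40))

# Same allele frequency in both populations: no spurious signal.
equal_allele_freq = case_control_allele_freqs(weights, (0.05, 0.20), (0.25, 0.25))

print(stratified, equal_prevalence, equal_allele_freq)
```

Cases are drawn disproportionately from the high-prevalence population, so they inherit its allele frequency; the gap between the case and control frequencies vanishes as soon as either the prevalences or the allele frequencies are equal across populations.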
