Similar literature
 20 similar articles found
1.
Genotyping errors are present in almost all genetic data and can affect biological conclusions of a study, particularly for studies based on individual identification and parentage. Many statistical approaches can incorporate genotyping errors, but usually need accurate estimates of error rates. Here, we used a new microsatellite data set developed for brown rockfish (Sebastes auriculatus) to estimate genotyping error using three approaches: (i) repeat genotyping 5% of samples, (ii) comparing unintentionally recaptured individuals and (iii) Mendelian inheritance error checking for known parent–offspring pairs. In each data set, we quantified genotyping error rate per allele due to allele drop‐out and false alleles. By direct count, the average overall genotyping error rate per locus was 0.3%, 1.5% and 1.7% (0.002, 0.007 and 0.008 per allele) from replicate genotypes, known parent–offspring pairs and unintentionally recaptured individuals, respectively. By direct‐count error estimates, the recapture and known parent–offspring data sets revealed an error rate four times greater than estimated using repeat genotypes. There was no evidence of correlation between error rates and locus variability for all three data sets, and errors appeared to occur randomly over loci in the repeat genotypes, but not in recaptures and parent–offspring comparisons. Furthermore, there was no correlation in locus‐specific error rates between any two of the three data sets. Our data suggest that repeat genotyping may underestimate true error rates and may not estimate locus‐specific error rates accurately. We therefore suggest using methods for error estimation that correspond to the overall aim of the study (e.g. known parent–offspring comparisons in parentage studies).
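
The direct-count estimate from repeat genotyping described in this abstract can be illustrated with a minimal sketch. The function name, the data layout and the sample values are assumptions made for illustration and do not come from the study; the sketch also treats any disagreement between two passes as an error without deciding which pass is wrong.

```python
from collections import Counter


def direct_count_error_rates(first_pass, repeat_pass):
    """Return (per-locus, per-allele) mismatch rates between two genotyping passes.

    Each pass maps (sample_id, locus) to a genotype given as a pair of allele sizes.
    Only sample-locus combinations present in both passes are compared.
    """
    loci_compared = loci_mismatched = alleles_mismatched = 0
    for key, first in first_pass.items():
        repeat = repeat_pass.get(key)
        if repeat is None:
            continue  # this sample-locus combination was not re-genotyped
        # Alleles shared by the two calls, counted as a multiset intersection.
        shared = sum((Counter(first) & Counter(repeat)).values())
        differing = 2 - shared
        loci_compared += 1
        loci_mismatched += differing > 0
        alleles_mismatched += differing
    if loci_compared == 0:
        raise ValueError("no overlapping genotypes to compare")
    return loci_mismatched / loci_compared, alleles_mismatched / (2 * loci_compared)


# Hypothetical data: four repeated genotypes, one showing apparent allele drop-out.
first = {("s1", "L1"): (100, 104), ("s1", "L2"): (210, 210),
         ("s2", "L1"): (100, 102), ("s2", "L2"): (210, 214)}
repeat = {("s1", "L1"): (100, 104), ("s1", "L2"): (210, 210),
          ("s2", "L1"): (100, 102), ("s2", "L2"): (214, 214)}
per_locus, per_allele = direct_count_error_rates(first, repeat)
print(per_locus, per_allele)
```

With one mismatching allele in four compared genotypes, the per-allele rate is half the per-locus rate, which is roughly the relationship between the two sets of figures quoted in the abstract.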

2.
Population size information is critical for managing endangered or harvested populations. Population size can now be estimated from non-invasive genetic sampling. However, pitfalls remain, such as genotyping errors (allele dropout and false alleles at microsatellite loci). To evaluate the feasibility of non-invasive sampling (e.g., for population size estimation), a pilot study is required. Here, we present a pilot study consisting of (i) a genetic step to test locus amplification and to estimate allele frequencies and genotyping error rates when using faecal DNA, and (ii) a simulation step to quantify and minimise the effects of errors on estimates of population size. The pilot study was conducted on a population of red deer in a fenced natural area of 5440 ha in France. Twelve microsatellite loci were tested for amplification and genotyping errors. The genotyping error rates for microsatellite loci were 0–0.83 (mean=0.2) for allele dropout rates and 0–0.14 (mean=0.02) for false allele rates, comparable to rates encountered in other non-invasive studies. Simulation results suggest we must conduct 6 PCR amplifications per sample (per locus) to achieve approximately 97% correct genotypes. The 3% error rate appears to have little influence on the accuracy and precision of population size estimation. This paper illustrates the importance of conducting a pilot study (including genotyping and simulations) when using non-invasive sampling to study threatened or managed populations.

3.
Moskvina V, Schmidt KM. Biometrics 2006;62(4):1116-1123
With the availability of fast genotyping methods and genomic databases, the search for statistical association of single nucleotide polymorphisms with a complex trait has become an important methodology in medical genetics. However, even fairly rare errors occurring during the genotyping process can lead to spurious association results and a decrease in statistical power. We develop a systematic approach to study how genotyping errors change the genotype distribution in a sample. The general M-marker case is reduced to that of a single-marker locus by recognizing the underlying tensor-product structure of the error matrix. Both method and general conclusions apply to the general error model; we give detailed results for allele-based errors of size depending both on the marker locus and the allele present. Multiple errors are treated in terms of the associated diffusion process on the space of genotype distributions. We find that certain genotype and haplotype distributions remain unchanged under genotyping errors, and that genotyping errors generally render the distribution more similar to the stable one. In case-control association studies, this will lead to loss of statistical power for nondifferential genotyping errors and an increase in type I error for differential genotyping errors. Moreover, we show that allele-based genotyping errors do not disturb Hardy-Weinberg equilibrium in the genotype distribution. In this setting we also identify maximally affected distributions. As they correspond to situations with rare alleles and marker loci in high linkage disequilibrium, careful checking for genotyping errors is advisable when significant association based on such alleles/haplotypes is observed in association studies.
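
The claim that allele-based errors leave Hardy-Weinberg equilibrium intact can be checked numerically for the simplest case. The sketch below assumes a single biallelic locus where each allele is misread independently, with allele A read as a with probability `e1` and a read as A with probability `e2`; the function name and the parameter values are illustrative and are not taken from the paper.

```python
from itertools import product


def observed_genotype_freqs(p, e1, e2):
    """Observed genotype frequencies when true genotypes are in HWE.

    p  : true frequency of allele "A"
    e1 : probability that a true "A" is read as "a" (assumed error model)
    e2 : probability that a true "a" is read as "A" (assumed error model)
    """
    true_allele = {"A": p, "a": 1.0 - p}
    read = {"A": {"A": 1.0 - e1, "a": e1},
            "a": {"A": e2, "a": 1.0 - e2}}
    freqs = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    # Enumerate both true alleles and both observed alleles of a genotype.
    for t1, t2 in product("Aa", repeat=2):
        for o1, o2 in product("Aa", repeat=2):
            prob = true_allele[t1] * true_allele[t2] * read[t1][o1] * read[t2][o2]
            # Sorting puts "A" before "a", so heterozygotes are always keyed "Aa".
            freqs["".join(sorted((o1, o2)))] += prob
    return freqs


obs = observed_genotype_freqs(p=0.3, e1=0.02, e2=0.05)
p_obs = obs["AA"] + obs["Aa"] / 2  # observed frequency of allele "A"
print(obs, p_obs)
```

The observed allele frequency shifts to p(1 - e1) + (1 - p)e2, but the observed genotype frequencies still equal the Hardy-Weinberg proportions for that shifted frequency.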

4.
The purpose of this work is to quantify the effects that errors in genotyping have on power and the sample size necessary to maintain constant asymptotic Type I and Type II error rates (SSN) for case-control genetic association studies between a disease phenotype and a di-allelic marker locus, for example a single nucleotide polymorphism (SNP) locus. We consider the effects of three published models of genotyping errors on the chi-square test for independence in the 2 × 3 table. After specifying genotype frequencies for the marker locus conditional on disease status and error model in both a genetic model-based and a genetic model-free framework, we compute the asymptotic power to detect association through specification of the test's non-centrality parameter. This parameter determines the functional dependence of SSN on the genotyping error rates. Additionally, we study the dependence of SSN on linkage disequilibrium (LD), marker allele frequencies, and genotyping error rates for a dominant disease model. Increased genotyping error rate requires a larger SSN. Every 1% increase in sum of genotyping error rates requires that both case and control SSN be increased by 2-8%, with the extent of increase dependent upon the error model. For the dominant disease model, SSN is a nonlinear function of LD and genotyping error rate, with greater SSN for lower LD and higher genotyping error rate. The combination of lower LD and higher genotyping error rates requires a larger SSN than the sum of the SSN for the lower LD and for the higher genotyping error rate.
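
The link between the non-centrality parameter and the required sample size can be sketched as follows. This is not one of the three published error models examined in the paper: the genotype frequencies and the misclassification matrix are hypothetical, the design is assumed to have equal numbers of cases and controls, and errors are assumed nondifferential. Because the non-centrality parameter is proportional to the sample size, the ratio of the error-free value to the value under errors gives the factor by which the sample size must grow to keep power constant.

```python
def noncentrality(n_per_group, case_freqs, control_freqs):
    """Non-centrality of the 2 x 3 chi-square test with n cases and n controls."""
    return n_per_group * sum(
        (p1 - p0) ** 2 / (p1 + p0)
        for p1, p0 in zip(case_freqs, control_freqs)
        if p1 + p0 > 0
    )


def apply_errors(freqs, error_matrix):
    """Observed genotype frequencies; error_matrix[i][j] = P(observed j | true i)."""
    return [
        sum(freqs[i] * error_matrix[i][j] for i in range(3))
        for j in range(3)
    ]


def sample_size_inflation(case_freqs, control_freqs, error_matrix):
    """Factor by which n must increase to restore the error-free non-centrality."""
    true_ncp = noncentrality(1, case_freqs, control_freqs)
    observed_ncp = noncentrality(
        1,
        apply_errors(case_freqs, error_matrix),
        apply_errors(control_freqs, error_matrix),
    )
    return true_ncp / observed_ncp


# Hypothetical genotype frequencies (AA, AB, BB) and a 1%-per-cell error matrix.
cases = [0.30, 0.50, 0.20]
controls = [0.25, 0.50, 0.25]
errors = [[0.98, 0.01, 0.01],
          [0.01, 0.98, 0.01],
          [0.01, 0.01, 0.98]]
print(sample_size_inflation(cases, controls, errors))
```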

5.
Use of non-invasive sources of DNA, such as hair or scat, to obtain a genetic mark for population estimates is becoming commonplace. Unfortunately, with such marks, potentials for genotyping errors and for the shadow effect have resulted in use of many loci and amplification of each specimen many times at each locus, drastically increasing time and cost of obtaining a population estimate. We proposed a method, the Genotyping Uncertainty Added Variance Adjustment (GUAVA), which statistically adjusts for genotyping errors and the shadow effect, thereby allowing use of fewer loci and one amplification of each specimen per locus. Using allele frequencies and estimates of genotyping error rates, we determined, for each pair of specimens, the probability that the pair was obtained from the same individual, whether or not their observed genotypes match. Using these probabilities, we reconstructed possible capture history matrices and used this distribution to obtain a population estimate. With simulated data, we consistently found our estimates had lower bias and smaller variance than estimates based on single amplifications in which genotyping error was ignored and that were comparable to estimates based on data free of genotyping errors. We also demonstrated the method on a fecal DNA data set from a population of red wolves (Canis rufus). The GUAVA estimate based on single-amplification genotypes compares favorably to the estimate based on consensus genotypes. A program to conduct the analysis is available from the first author for UNIX or Windows platforms. Application of GUAVA may allow for increased accuracy in population estimates at reduced cost.

6.
Genotypes are frequently used to identify parentage. Such analysis is notoriously vulnerable to genotyping error, and there is ongoing debate regarding how to solve this problem. Many scientists have used the computer program cervus to estimate parentage, and have taken advantage of its option to allow for genotyping error. In this study, we show that the likelihood equations used by versions 1.0 and 2.0 of cervus to accommodate genotyping error miscalculate the probability of observing an erroneous genotype. Computer simulation and reanalysis of paternity in Rum red deer show that correcting this error increases success in paternity assignment, and that there is a clear benefit to accommodating genotyping errors when errors are present. A new version of cervus (3.0) implementing the corrected likelihood equations is available at http://www.fieldgenetics.com.

7.
Molecular ecologists must be vigilant in detecting and accounting for genotyping error, yet potential errors stemming from dye-induced mobility shift (dye shift) may be frequently neglected and largely unknown to researchers who employ 3-primer systems with automated genotyping. When left uncorrected, dye shift can lead to mis-scoring alleles and even to falsely calling new alleles if different dyes are used to genotype the same locus in subsequent reactions. When we used four different fluorophore labels from a standard dye set to genotype the same set of loci, differences in the resulting size estimates for a single allele ranged from 2.07 bp to 3.68 bp. The strongest effects were associated with the fluorophore PET, and relative degree of dye shift was inversely related to locus size. We found little evidence in the literature that dye shift is regularly accounted for in 3-primer studies, despite knowledge of this phenomenon existing for over a decade. However, we did find some references to erroneous standard correction factors for the same set of dyes that we tested. We thus reiterate the need for strict quality control when attempting to reduce possible sources of genotyping error, and in cases where different dyes are applied to a single locus, perhaps mistakenly, we strongly discourage researchers from assuming generic correction patterns.

8.
Microsatellite genotyping from samples with varying quality can result in an uneven distribution of errors. Previous studies reporting error rates have focused on estimating the effects of both randomly distributed and locus‐specific errors. Sample‐specific errors, however, can also significantly affect results in population studies despite a large sample size. From two studies including six microsatellite markers genotyped from 272 sperm whale DNA samples, and 33 microsatellites genotyped from 213 bowhead whales, we investigated the effects of sample‐ and locus‐specific errors on calculations of Hardy–Weinberg equilibrium. The results of a jackknife analysis in these two studies identified seven individuals that were highly influential on estimates of Hardy–Weinberg equilibrium for six different markers. In each case, the influential individual was homozygous for a rare allele. Our results demonstrate that Hardy–Weinberg P values are very sensitive to homozygosity in rare alleles for single individuals, and that >50% of these cases involved genotype errors likely due to low sample quality. This raises the possibility that even small, normal levels of laboratory errors can result in an overestimate of the degree to which markers are out of Hardy–Weinberg equilibrium and hence overestimate population structure. To avoid such bias, we recommend routine identification of influential individuals and multiple replications of those samples.

9.
Incorrect paternity assignment in cattle can have a major effect on rates of genetic gain. Of the 576 Israeli Holstein bulls genotyped by the BovineSNP50 BeadChip, there were 204 bulls for which the father was also genotyped. The results of 38 828 valid single nucleotide polymorphisms (SNPs) were used to validate paternity, determine the genotyping error rates and determine criteria enabling deletion of defective SNPs from further analysis. Based on the criterion of >2% conflicts between the genotype of the putative sire and son, paternity was rejected for seven bulls (3.5%). The remaining bulls had fewer conflicts by one or two orders of magnitude. Excluding these seven bulls, all other discrepancies between sire and son genotypes are assumed to be caused by genotyping mistakes. The frequency of discrepancies was >0.07 for nine SNPs, and >0.025 for 81 SNPs. The overall frequency of discrepancies was reduced from 0.00017 to 0.00010 after deletion of these 81 SNPs, and the total expected fraction of genotyping errors was estimated to be 0.05%. Paternity of bulls that are genotyped for genomic selection may be verified or traced against candidate sires at virtually no additional cost.
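
The >2% conflict criterion can be sketched in a few lines. The abstract does not define a conflict; the sketch assumes it means opposing homozygotes (sire AA and son BB, or the reverse), which is the only sire-son combination that is Mendelian-incompatible without the dam's genotype. The 0/1/2 genotype coding, the function names and the data are illustrative.

```python
def conflict_rate(sire, son):
    """Fraction of jointly genotyped SNPs where sire and son are opposing homozygotes.

    Genotypes are coded as the number of copies of allele B (0, 1 or 2);
    None marks a missing call and is skipped.
    """
    compared = conflicts = 0
    for s, o in zip(sire, son):
        if s is None or o is None:
            continue
        compared += 1
        if {s, o} == {0, 2}:  # AA versus BB: no allele can have been transmitted
            conflicts += 1
    return conflicts / compared if compared else 0.0


def paternity_rejected(sire, son, threshold=0.02):
    """Apply the >2% conflict criterion quoted in the abstract."""
    return conflict_rate(sire, son) > threshold


# Hypothetical genotypes at ten SNPs.
sire = [0, 1, 2, 2, 0, 1, None, 2, 0, 1]
true_son = [0, 1, 1, 2, 1, 2, 1, 2, 0, 0]
unrelated = [2, 1, 0, 2, 1, 2, 1, 0, 0, 0]
print(conflict_rate(sire, true_son), conflict_rate(sire, unrelated))
```

On real data the rate would be computed over tens of thousands of SNPs, so a handful of genotyping errors would leave a true sire-son pair well below the threshold.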

10.
Zou G, Pan D, Zhao H. Genetics 2003;164(3):1161-1173
The identification of genotyping errors is an important issue in mapping complex disease genes. Although it is common practice to genotype multiple markers in a candidate region in genetic studies, the potential benefit of jointly analyzing multiple markers to detect genotyping errors has not been investigated. In this article, we discuss genotyping error detection for a set of tightly linked markers in nuclear families, and the objective is to identify families likely to have genotyping errors at one or more markers. We make use of the fact that recombination is a very unlikely event among these markers. We first show that, with family trios, no extra information can be gained by jointly analyzing markers if no phase information is available, and error detection rates are usually low if Mendelian consistency is used as the only standard for checking errors. However, for nuclear families with more than one child, error detection rates can be greatly increased with the consideration of more markers. Error detection rates also increase with the number of children in each family. Because families displaying Mendelian consistency may still have genotyping errors, we calculate the probability that a family displaying Mendelian consistency has correct genotypes. These probabilities can help identify families that, although showing Mendelian consistency, may have genotyping errors. In addition, we examine the benefit of available haplotype frequencies in the general population on genotyping error detection. We show that both error detection rates and the probability that an observed family displaying Mendelian consistency has correct genotypes can be greatly increased when such additional information is available.
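
A single-marker Mendelian consistency check for a trio shows why consistency alone detects few errors. This is a minimal sketch with hypothetical genotypes; it does not implement the multi-marker or haplotype-based methods of the paper.

```python
def trio_is_consistent(father, mother, child):
    """True if the child's genotype can be formed from one allele of each parent.

    Each genotype is an unordered pair of allele labels at a single marker.
    """
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


father = (1, 2)
mother = (1, 2)
true_child = (1, 1)

# A miscalled child genotype that is still Mendelian-consistent: the error is invisible.
undetected = trio_is_consistent(father, mother, (1, 2))
# A miscalled child genotype carrying an allele absent from both parents: detected.
detected = not trio_is_consistent(father, mother, (1, 3))
print(trio_is_consistent(father, mother, true_child), undetected, detected)
```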

11.
It is well known that genotyping errors lead to loss of power in gene-mapping studies and underestimation of the strength of correlations between trait- and marker-locus genotypes. In two-point linkage analysis, these errors can be absorbed in an inflated recombination-fraction estimate, leaving the test statistic quite robust. In multipoint analysis, however, genotyping errors can easily result in false exclusion of the true location of a disease-predisposing gene. In a companion article, we described a "complex-valued" extension of the recombination fraction to accommodate errors in the assignment of trait-locus genotypes, leading to a multipoint LOD score with the same robustness to errors in trait-locus genotypes that is seen with the conventional two-point LOD score. Here, a further extension of this model to "hypercomplex-valued" recombination fractions (hereafter referred to as "hypercomplex recombination fractions") is presented, to handle random and systematic sources of marker-locus genotyping errors. This leads to a multipoint method (either "model-based" or "model-free") with the same robustness to marker-locus genotyping errors that is seen with conventional two-point analysis but with the advantage that multiple marker loci can be used jointly to increase meiotic informativeness. The cost of this increased robustness is a decrease in fine-scale resolution of the estimated map location of the trait locus, in comparison with traditional multipoint analysis. This probability model further leads to algorithms for the estimation of the lower bounds for the error rates for genomewide and locus-specific genotyping, based on the null-hypothesis distribution of the LOD-score statistic in the presence of such errors. It is argued that those genome scans in which the LOD score is 0 for >50% of the genome are likely to be characterized by high rates of genotyping errors in general.

12.
In noninvasive genetic sampling, when genotyping error rates are high and recapture rates are low, misidentification of individuals can lead to overestimation of population size. Thus, estimating genotyping errors is imperative. Nonetheless, conducting multiple polymerase chain reactions (PCRs) at multiple loci is time-consuming and costly. To address the controversy regarding the minimum number of PCRs required for obtaining a consensus genotype, we compared, consumer-style, the performance of two genotyping protocols (multiple-tubes and 'comparative method') with respect to genotyping success and error rates. Our results from 48 faecal samples of river otters (Lontra canadensis) collected in Wyoming in 2003, and from blood samples of five captive river otters amplified with four different primers, suggest that use of the comparative genotyping protocol can minimize the number of PCRs per locus. For all but five samples at one locus, the same consensus genotypes were reached with fewer PCRs and with reduced error rates with this protocol compared to the multiple-tubes method. This finding is reassuring because genotyping errors can occur at relatively high rates even in tissues such as blood and hair. In addition, we found that loci that amplify readily and yield consensus genotypes may still exhibit high error rates (7-32%) and that amplification with different primers resulted in different types and rates of error. Thus, assigning a genotype based on a single PCR for several loci could result in misidentification of individuals. We recommend that programs designed to statistically assign consensus genotypes should be modified to allow the different treatment of heterozygotes and homozygotes intrinsic to the comparative method.

13.
Allelic dropouts (ADO) are an important source of genotyping error and, because of their negative impact on non-invasive sampling techniques, have become the focus of considerable attention. Previous studies have noted that ADO rates are greater with increasing allele size and in tetranucleotides. It has also been suggested, but not tested, that ADO rates may be higher in studies using cross-species microsatellites and that mutations may play a role in ADO rates. Here we examine the relationship between ADO rates and the evolutionary distance (divergence time) between the species for which the microsatellite was designed and the species on which it was used, and how this may interact with median allele size. In addition, as the adenosine (A) and thymine (T) content of the primer may increase mutation rates, we also included total % AT content of the primer in the analyses. Finally, we examined whether other commonly associated causes of ADO (i.e. repeat motif length, median allele size and allele number) co-varied. We found that ADO rates were positively associated with divergence time and median allele size. Repeat motif length, median allele size and allele number positively covaried, suggesting a link between mutability and these parameters. Results from previous studies that did not correct for co-variation among these parameters may have been confounded. AT content of the primer was positively associated with ADO rates. The best linear regression model contained divergence time, median allele size and total % AT content, explaining 21% of the variation in ADO rates. The available evidence suggests that mutations partly cause ADO and that studies using cross-species microsatellites may be at higher risk of ADO. Based on our results we highlight some important considerations in the selection of microsatellites for all conservation genetic studies.

14.
As genotyping methods move ever closer to full automation, care must be taken to ensure that there is no equivalent rise in allele‐calling error rates. One clear source of error lies with how raw allele lengths are converted into allele classes, a process referred to as binning. Standard automated approaches usually assume collinearity between expected and measured fragment length. Unfortunately, such collinearity is often only approximate, with the consequence that alleles do not conform to a perfect 2‐, 3‐ or 4‐base‐pair periodicity. To account for these problems, we introduce a method that allows repeat units to be fractionally shorter or longer than their theoretical value. Tested on a large human data set, our algorithm performs well over a wide range of dinucleotide repeat loci. The size of the problem caused by sticking to whole numbers of bases is indicated by the fact that the effective repeat length was within 5% of the assumed length only 68.3% of the time.
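
The idea of letting the repeat unit be fractionally shorter or longer than its nominal length can be illustrated with a simple grid search. This is not the authors' algorithm: the search range, the step size, the squared-deviation criterion and the choice to anchor bins at the shortest observed fragment are all assumptions made for this sketch, and the fragment lengths are synthetic.

```python
def binning_residual(lengths, repeat_length):
    """Sum of squared distances (in repeat units) from each length to its nearest bin."""
    origin = min(lengths)  # assumption: the shortest fragment defines bin 0
    total = 0.0
    for x in lengths:
        steps = (x - origin) / repeat_length
        total += (steps - round(steps)) ** 2
    return total


def estimate_repeat_length(lengths, nominal, tolerance=0.10, step=0.01):
    """Grid-search the effective repeat length within +/- tolerance of nominal."""
    n_steps = int(round(2 * tolerance * nominal / step))
    candidates = [round(nominal * (1 - tolerance) + i * step, 4)
                  for i in range(n_steps + 1)]
    return min(candidates, key=lambda r: binning_residual(lengths, r))


def assign_bins(lengths, repeat_length):
    """Allele class of each fragment, as a whole number of repeats above the shortest."""
    origin = min(lengths)
    return [round((x - origin) / repeat_length) for x in lengths]


# Synthetic dinucleotide locus whose effective repeat length is 1.95 bp, not 2 bp.
lengths = [100 + k * 1.95 for k in (0, 1, 2, 3, 5, 8)]
effective = estimate_repeat_length(lengths, nominal=2.0)
print(effective, assign_bins(lengths, effective))
```

Real data would carry measurement noise, so a practical version would need a more robust criterion and a check that the shortest fragment is itself sized reliably.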

15.
16.
Errors in genotype calling can have perverse effects on genetic analyses, confounding association studies, and obscuring rare variants. Analyses now routinely incorporate error rates to control for spurious findings. However, reliable estimates of the error rate can be difficult to obtain because of their variance between studies. Most studies also report only a single estimate of the error rate even though genotypes can be miscalled in more than one way. Here, we report a method for estimating the rates at which different types of genotyping errors occur at biallelic loci using pedigree information. Our method identifies potential genotyping errors by exploiting instances where the haplotypic phase has not been faithfully transmitted. The expected frequency of inconsistent phase depends on the combination of genotypes in a pedigree and the probability of miscalling each genotype. We develop a model that uses the differences in these frequencies to estimate rates for different types of genotype error. Simulations show that our method accurately estimates these error rates in a variety of scenarios. We apply this method to a dataset from the whole-genome sequencing of owl monkeys (Aotus nancymaae) in three-generation pedigrees. We find significant differences between estimates for different types of genotyping error, with the most common being homozygous reference sites miscalled as heterozygous and vice versa. The approach we describe is applicable to any set of genotypes where haplotypic phase can reliably be called and should prove useful in helping to control for false discoveries.

17.
In spite of more than a decade of research on noninvasive genetic sampling, the low quality and quantity of DNA in noninvasive studies continue to plague researchers. Effects of locus size on error have been documented but are still poorly understood. Further, sources of error other than allelic dropout have been described but are often not well quantified. Here we analyse the effects of locus size on allelic dropout, amplification success and error rates in noninvasive genotyping studies of three species, and quantify error other than allelic dropout.

18.
Tung L, Gordon D, Finch SJ. Human Heredity 2007;63(2):101-110
This paper extends gene-environment (G × E) interaction study designs in which the gene (G) is known and the environmental variable (E) is specified to the analysis of 'time-to-event' data, using Cox proportional hazards (PH) modeling. The objectives are to assess whether a random sample of subjects can be used to detect a specific G × E interaction and to study the sensitivity of the power of PH modeling to genotype misclassification. We find that a random sample of 2,100 is sufficient to detect a moderate G × E interaction. The increase in sample size necessary (SSN) to maintain Type I and Type II error rates is calculated for each of the 6 genotyping errors for both dominant and recessive modes of inheritance (MOI). The increase in SSN required is relatively small when each genotyping error rate is less than 1% and the disease allele frequency is between 0.2 and 0.5. The genotyping errors that require the greatest increase in SSN are any misclassification of a subject without the at-risk genotype as having the at-risk genotype. Such errors require an indefinitely large increase in SSN as the disease allele frequency approaches 0, suggesting that it is especially important that subjects recorded as having the at-risk genotype be correctly genotyped. Additionally, for a dominant MOI, large increases in SSN can occur with large disease allele frequency.

19.
Cheng KF, Chen JH. Human Heredity 2007;64(2):114-122
The transmission/disequilibrium test (TDT), a family-based test of linkage and association, is a popular test for studies of complex inheritance, as it is nonparametric and robust against spurious conclusions induced by hidden genetic structure, such as stratification or admixture. However, the TDT may be biased by genotyping errors. Undetected genotyping errors may be contributing to an inflated type I error rate among reported TDT-derived associations. To adjust for bias, a popular approach is to assume a genotype error model describing the pattern of errors and to propose association tests using likelihood methods. However, all model-based approaches tend to perform unsatisfactorily if the related genotyping error rates are not identical across all families. In this paper, we propose a TDT-type association test which is not only simple and robust against population stratification (so the assumption of Hardy-Weinberg equilibrium is not required), but also robust against genotyping error with error rates varying across families. Simulation studies confirm that the new test has very reasonable performance.

20.
In parentage assignment by exclusion, using multiple and very polymorphic loci, genotyping errors are a major cause of non‐assignment. Using stochastic simulations, we tested whether allowing mismatches at one or more alleles could recover assignment power. This was very efficient provided the set of loci used had a high assignment power (> 99%) and the error rate was not too high (below 3–4%). In these cases, most of the theoretical assignment power could be recovered. We also showed the efficiency of the method in a practical experiment with rainbow trout.
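
Exclusion with a tolerated number of mismatching alleles can be sketched as below. The function names, the rule that an assignment is made only when exactly one candidate pair is compatible, and the genotypes are assumptions for illustration, not the procedure used in the study.

```python
def allele_mismatches(parent1, parent2, offspring):
    """Minimum number of offspring alleles, summed over loci, that the pair cannot explain.

    Each argument is a list of genotypes (one pair of alleles per locus).
    """
    total = 0
    for p1, p2, (c1, c2) in zip(parent1, parent2, offspring):
        # Try both ways of attributing the offspring's alleles to the two parents.
        option_a = (c1 not in p1) + (c2 not in p2)
        option_b = (c2 not in p1) + (c1 not in p2)
        total += min(option_a, option_b)
    return total


def assign_parents(offspring, candidate_pairs, max_mismatches=0):
    """Name of the only compatible parent pair, or None if zero or several are compatible."""
    compatible = [
        name for name, (p1, p2) in candidate_pairs.items()
        if allele_mismatches(p1, p2, offspring) <= max_mismatches
    ]
    return compatible[0] if len(compatible) == 1 else None


# Hypothetical three-locus example; the offspring has one miscalled allele (13) at locus 3.
pairs = {
    "pair_A": ([(1, 2), (3, 4), (5, 6)], [(7, 8), (9, 10), (11, 12)]),
    "pair_B": ([(1, 1), (20, 21), (22, 23)], [(7, 7), (24, 25), (26, 27)]),
}
offspring = [(1, 7), (3, 9), (5, 13)]
print(assign_parents(offspring, pairs), assign_parents(offspring, pairs, max_mismatches=1))
```

Strict exclusion leaves the offspring unassigned because of the single erroneous allele, while tolerating one mismatch recovers the true pair without admitting the unrelated one. Raising the tolerance further would eventually admit false pairs, which is consistent with the abstract's condition that the marker set must have high assignment power.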
