Similar articles
1.
Genotyping errors occur when the genotype determined after molecular analysis does not correspond to the real genotype of the individual under consideration. Virtually every genetic data set includes some erroneous genotypes, but genotyping errors remain a taboo subject in population genetics, even though they might greatly bias the final conclusions, especially for studies based on individual identification. Here, we consider four case studies representing a large variety of population genetics investigations differing in their sampling strategies (noninvasive or traditional), in the type of organism studied (plant or animal) and the molecular markers used [microsatellites or amplified fragment length polymorphisms (AFLPs)]. In these data sets, the estimated genotyping error rate ranges from 0.8% for microsatellite loci from bear tissues to 2.6% for AFLP loci from dwarf birch leaves. Main sources of errors were allelic dropouts for microsatellites and differences in peak intensities for AFLPs, but in both cases human factors were non-negligible error generators. Therefore, tracking genotyping errors and identifying their causes are necessary to clean up the data sets and validate the final results according to the precision required. In addition, we propose the outline of a protocol designed to limit and quantify genotyping errors at each step of the genotyping process. In particular, we recommend (i) several efficient precautions to prevent contamination and technical artefacts; (ii) systematic use of blind samples and automation; (iii) experience and rigor for laboratory work and scoring; and (iv) systematic reporting of the error rate in population genetics studies.
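
One common way to report an error rate of the kind this abstract calls for is to re-genotype a subset of samples blind and count allele mismatches between the two runs. The sketch below is a minimal illustration under that assumption; the function name, the data layout (sample, then locus, then a pair of allele sizes) and the example values are hypothetical and not taken from the study.

```python
from collections import Counter


def allelic_error_rate(reference, replicate):
    """Fraction of allele calls that differ between two genotyping runs.

    Both arguments map sample -> locus -> (allele1, allele2).
    Only sample/locus pairs present in both runs are compared.
    """
    mismatches = 0
    compared = 0
    for sample, ref_loci in reference.items():
        rep_loci = replicate.get(sample, {})
        for locus, ref_gt in ref_loci.items():
            rep_gt = rep_loci.get(locus)
            if rep_gt is None:
                continue  # locus not scored in the replicate run
            # Multiset intersection: alleles shared regardless of order.
            shared = sum((Counter(ref_gt) & Counter(rep_gt)).values())
            mismatches += len(ref_gt) - shared
            compared += len(ref_gt)
    if compared == 0:
        raise ValueError("no overlapping genotype calls to compare")
    return mismatches / compared


# Hypothetical example: one allelic dropout (154 missing) out of 8 allele calls.
first_run = {
    "s1": {"locA": (150, 154), "locB": (200, 200)},
    "s2": {"locA": (150, 150), "locB": (200, 204)},
}
second_run = {
    "s1": {"locA": (150, 150), "locB": (200, 200)},
    "s2": {"locA": (150, 150), "locB": (204, 200)},
}
rate = allelic_error_rate(first_run, second_run)
```

Counting per allele rather than per genotype is a design choice: a dropout that turns a heterozygote into a homozygote then contributes one mismatch, not two.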

2.
Despite much discussion of the importance of quantifying and reporting genotyping error in molecular studies, it is still not standard practice in the literature. This is particularly a concern for amplified fragment length polymorphism (AFLP) studies, where differences in laboratory, peak‐calling and locus‐selection protocols can generate data sets varying widely in genotyping error rate, the number of loci used and potentially estimates of genetic diversity or differentiation. In our experience, papers rarely provide adequate information on AFLP reproducibility, making meaningful comparisons among studies difficult. To quantify the extent of this problem, we reviewed the current molecular ecology literature (470 recent AFLP articles) to determine the proportion of studies that report an error rate and follow established guidelines for assessing error. Fifty‐four per cent of recent articles do not report any assessment of data set reproducibility. Of those studies that do claim to have assessed reproducibility, the majority (~90%) either do not report a specific error rate or do not provide sufficient details to allow the reader to judge whether error was assessed correctly. Even of the papers that do report an error rate and provide details, many (≥23%) do not follow recommended standards for quantifying error. These issues also exist for other marker types such as microsatellites, and next‐generation sequencing techniques, particularly those which use restriction enzymes for fragment generation. Therefore, we urge all researchers conducting genotyping studies to estimate and more transparently report genotyping error using existing guidelines and encourage journals to enforce stricter standards for the publication of genotyping studies.

3.
Genotyping‐by‐sequencing (GBS) and related methods are increasingly used for studies of non‐model organisms from population genetic to phylogenetic scales. We present GIbPSs, a new genotyping toolkit for the analysis of data from various protocols such as RAD, double‐digest RAD, GBS, and two‐enzyme GBS without a reference genome. GIbPSs can handle paired‐end GBS data and is able to assign reads from both strands of a restriction fragment to the same locus. GIbPSs is most suitable for population genetic and phylogeographic analyses. It avoids genotyping errors due to indel variation by identifying and discarding affected loci. GIbPSs creates a genotype database that offers rich functionality for data filtering and export in numerous formats. We performed comparative analyses of simulated and real GBS data with GIbPSs and another program, pyRAD. This program accounts for indel variation by aligning homologous sequences. GIbPSs performed better than pyRAD in several aspects. It required much less computation time and displayed higher genotyping accuracy. GIbPSs retained smaller numbers of loci overall in analyses of real GBS data. It nevertheless delivered more complete genotype matrices with greater locus overlap between individuals and greater numbers of loci sampled in all individuals.

4.
Errors in genotype calling can have perverse effects on genetic analyses, confounding association studies and obscuring rare variants. Analyses now routinely incorporate error rates to control for spurious findings. However, reliable estimates of the error rate can be difficult to obtain because of their variance between studies. Most studies also report only a single estimate of the error rate even though genotypes can be miscalled in more than one way. Here, we report a method for estimating the rates at which different types of genotyping errors occur at biallelic loci using pedigree information. Our method identifies potential genotyping errors by exploiting instances where the haplotypic phase has not been faithfully transmitted. The expected frequency of inconsistent phase depends on the combination of genotypes in a pedigree and the probability of miscalling each genotype. We develop a model that uses the differences in these frequencies to estimate rates for different types of genotype error. Simulations show that our method accurately estimates these error rates in a variety of scenarios. We apply this method to a dataset from the whole-genome sequencing of owl monkeys (Aotus nancymaae) in three-generation pedigrees. We find significant differences between estimates for different types of genotyping error, with the most common being homozygous reference sites miscalled as heterozygous and vice versa. The approach we describe is applicable to any set of genotypes where haplotypic phase can reliably be called and should prove useful in helping to control for false discoveries.
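
The method in this abstract relies on haplotypic phase across three-generation pedigrees, which is beyond a short example. As a much simpler stand-in for the general idea of using pedigree information to flag miscalls at biallelic loci, the sketch below checks mother-father-child trios for Mendelian inconsistency. Genotypes are coded as the number of alternate alleles (0, 1 or 2); the function names and the coding are assumptions made for illustration, and this check is not the authors' estimator.

```python
def possible_gametes(genotype):
    """Alleles (0 = reference, 1 = alternate) a parent can transmit.

    `genotype` is the alternate-allele count at a biallelic site: 0, 1 or 2.
    """
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(mother, father, child):
    """True if the child's genotype can be formed from one gamete per parent."""
    return any(
        m + f == child
        for m in possible_gametes(mother)
        for f in possible_gametes(father)
    )


def inconsistency_rate(trios):
    """Fraction of (mother, father, child) genotype triples that violate
    Mendelian inheritance. This flags only a subset of genotyping errors."""
    trios = list(trios)
    if not trios:
        raise ValueError("no trios supplied")
    bad = sum(not is_mendelian_consistent(*trio) for trio in trios)
    return bad / len(trios)
```

Many miscalls remain Mendelian-consistent (for example, a heterozygous child of two heterozygous parents miscalled as homozygous), so a raw inconsistency rate underestimates the true error rate. That is one reason model-based approaches such as the one described above are needed.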

5.
Zhang H, Hare MP. Heredity 2012, 108(6): 616-625
Phylogeographic inferences about gene flow are strengthened through comparison of co-distributed taxa, but also depend on adequate genomic sampling. Amplified fragment length polymorphisms (AFLPs) provide a rapid and inexpensive source of multilocus allele frequency data for making genomically robust inferences. Every AFLP study initially generates markers with a range of locus-specific genotyping error rates and applies criteria to select a subset for analysis. However, there has been very little empirical evaluation of the best tradeoff between culling all but the lowest-error loci to minimize overall genotyping error versus the potential for increasing population genetic signal by retaining more loci. Here, we used AFLPs to compare population structure in co-distributed broadcast spawning (Crassostrea virginica) and brooding (Ostrea equestris) oyster species. Using existing methods for almost entirely automated marker selection and scoring, genotyping error tradeoffs were evaluated by comparing results across a nested series of data sets with mean mismatch errors of 0, 1, 2, 3, 4 and >4%. Artifactual population structure was diagnosed in high-error data sets and we assessed the low-error point at which expected population substructure signal was lost. In both species, we identified substructure patterns deemed to be inaccurate at average mismatch error rates 2 and >4%. In the species comparison, the optimum data sets showed higher gene flow for the brooding oyster with more oceanic salinity tolerances. AFLP tradeoffs may differ among studies, but our results suggest that important signal may be lost in the pursuit of 'acceptable' error levels and our procedures provide a general method for empirically exploring these tradeoffs.

6.
Using amplified fragment length polymorphism (AFLP) fingerprinting, selective genotyping was performed to determine if this method was effective for selecting superior breeding stock. Forty-eight cows with extreme genetic merit for beef marbling score (BMS) were selected from a population of Japanese Black cattle (n = 4462), including 25 with the highest predicted breeding value (PBV) and 23 with the lowest. Sixteen AFLP fragments were selected for further analysis based on fragment frequency differences between the high and low groups. A linear discriminant analysis using these AFLP fragments was applied in order to derive a discriminant function that classified the cows into high and low groups. Seven of the 16 fragments were included in the resulting function and the discriminant scores (general genetic values, GGV) of the 48 cows were calculated using the function. These cows were clearly separated into high and low groups by GGV with a correlation ratio of 0.91 (discriminative error of 2.1%). The same function was then applied to 121 additional cows that were randomly selected from the original population. A significant regression coefficient of GGV on BMS-PBV (R² = 0.45) was obtained, which indicates that the GGV can be used as a selection criterion for BMS in this population. These results suggest that AFLP fingerprinting can be used for animal breeding without identifying the underlying genes affecting the trait of interest.

7.
Genetic mapping and the selection of closely linked molecular markers for important agronomic traits require efficient, large-scale genotyping methods. A semi-automated multifluorophore technique was applied for genotyping AFLP marker loci in barley and wheat. In comparison to conventional ³³P-based AFLP analysis the technique showed a higher resolution of amplicons, thus increasing the number of distinguishable fragments. Automated sizing of the same fragment in different lanes or different gels showed high conformity, allowing subsequent unambiguous allele-typing. Simultaneous electrophoresis of different AFLP samples in one lane (multimixing), as well as simultaneous amplification of AFLP fragments with different primer combinations in one reaction (multiplexing), displayed consistent results with respect to fragment number, polymorphic peaks and correct size-calling. The accuracy of semi-automated co-dominant analysis for hemizygous AFLP markers in an F2 population was too low, suggesting that dominant allele-typing defaults should be used. Nevertheless, the efficiency of genetic mapping, especially of complex plant genomes, will be accelerated by combining the presented genotyping procedures. Received: 10 April 1999 / Accepted: 11 May 1999

8.
Piepho HP, Koch G. Genetics 2000, 155(3): 1459-1468
Amplified fragment length polymorphisms (AFLPs) currently are among the most widely used marker systems. In many studies, AFLPs are analyzed on the basis of the presence or absence of a band on an electrophoretic gel. As a result, dominant homozygous individuals are not distinguished from heterozygous individuals, resulting in a considerable loss of information. This article shows how codominant information can be obtained if the amount of PCR products is quantified. Due to measurement variation, genotyping on the basis of such information is not error-free. We propose use of normal mixture distributions to determine the most likely genotype, given the data. The method is exemplified using AFLP data from sugar beet.
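
The mixture idea in this abstract can be sketched as follows: once a normal component (mixing weight, mean, standard deviation) has been fitted for each genotype class, the posterior probability of each class given a quantified band intensity follows from Bayes' rule. The parameter values below are invented for illustration (a 2:1 heterozygote-to-homozygote ratio among band-present individuals, as expected in an F2 cross); in practice they would be estimated from the data, for example by an EM algorithm, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def genotype_posteriors(x, components):
    """Posterior probability of each genotype class given intensity x.

    `components` maps genotype -> (mixing_weight, mean, sd).
    """
    weighted = {
        name: weight * normal_pdf(x, mean, sd)
        for name, (weight, mean, sd) in components.items()
    }
    total = sum(weighted.values())
    return {name: value / total for name, value in weighted.items()}


def most_likely_genotype(x, components):
    """Genotype class with the highest posterior probability."""
    posteriors = genotype_posteriors(x, components)
    return max(posteriors, key=posteriors.get)


# Assumed, illustrative parameters: relative band intensity for one and
# two copies of the band-present allele.
example_components = {
    "Aa": (2 / 3, 1.0, 0.2),
    "AA": (1 / 3, 2.0, 0.3),
}
```

Intensities that fall between the two component means receive intermediate posteriors, which is where the genotyping error mentioned in the abstract arises; a threshold on the posterior can be used to leave such calls unscored.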

9.
Many plants and animals of polyploid origin are currently enjoying a genomics explosion enabled by modern sequencing and genotyping technologies. However, routine filtering of duplicated loci in most studies using genotyping by sequencing introduces an unacceptable, but often overlooked, bias when detecting selection. Retained duplicates from ancient whole‐genome duplications (WGDs) may be found throughout genomes, whereas retained duplicates from recent WGDs are concentrated at distal ends of some chromosome arms. Additionally, segmental duplicates can be found at distal ends or nearly anywhere in a genome. Evidence shows that these duplications facilitate adaptation through one of two pathways: neo‐functionalization or increased gene expression. Filtering duplicates removes distal ends of some chromosomes, and distal ends are especially known to harbour adaptively important genes. Thus, filtering of duplicated loci impoverishes the interpretation of genomic data as signals from contiguous duplicated genes are ignored. We review existing strategies to genotype and map duplicated loci; we focus in detail on an overlooked strategy of using gynogenetic haploids (1N) as a part of new genotyping by sequencing studies. We provide guidelines on how to use this haploid strategy for studies on polyploid‐origin vertebrates including how it can be used to screen duplicated loci in natural populations. We conclude by discussing areas of research that will benefit from better inclusion of polyploid loci; we particularly stress the sometimes overlooked fact that basing genomic studies on dense maps provides value added in the form of locating and annotating outlier loci or colocating outliers into islands of divergence.

10.
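Amplified fragment length polymorphism (AFLP) analysis was used to genotype Neisseria gonorrhoeae strains. Genomic DNA from 26 clinical isolates of N. gonorrhoeae was digested with EcoRI and MesI and subjected to AFLP analysis. Considerable DNA polymorphism was found among isolates from the same region. AFLP is a useful and sensitive genotyping technique for differentiating clinical isolates of N. gonorrhoeae, and it helps clarify the origin of circulating strains, the clonal relatedness among them, and the spread of antibiotic-resistant strains.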

11.
As genotyping methods move ever closer to full automation, care must be taken to ensure that there is no equivalent rise in allele‐calling error rates. One clear source of error lies with how raw allele lengths are converted into allele classes, a process referred to as binning. Standard automated approaches usually assume collinearity between expected and measured fragment length. Unfortunately, such collinearity is often only approximate, with the consequence that alleles do not conform to a perfect 2‐, 3‐ or 4‐base‐pair periodicity. To account for these problems, we introduce a method that allows repeat units to be fractionally shorter or longer than their theoretical value. Tested on a large human data set, our algorithm performs well over a wide range of dinucleotide repeat loci. The size of the problem caused by sticking to whole numbers of bases is indicated by the fact that the effective repeat length was within 5% of the assumed length only 68.3% of the time.
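
To illustrate why a fractional repeat length matters for binning, the sketch below bins measured fragment lengths by the number of repeat units separating them from an anchor allele, and picks an effective repeat length by a simple grid search that minimises the distance of each allele from a whole number of units. This is only a toy version of the idea; the grid search, the function names, the anchor assumption (the anchor is taken to be a correctly sized allele) and the example values are not from the paper.

```python
def bin_alleles(lengths, anchor, repeat_length):
    """Assign each measured length to a whole number of repeat units
    away from the anchor allele."""
    return [round((x - anchor) / repeat_length) for x in lengths]


def binning_residual(lengths, anchor, repeat_length):
    """Sum of squared distances from the nearest whole number of units."""
    total = 0.0
    for x in lengths:
        units = (x - anchor) / repeat_length
        total += (units - round(units)) ** 2
    return total


def estimate_repeat_length(lengths, anchor, nominal, tolerance=0.1, steps=201):
    """Grid-search an effective repeat length within +/- tolerance
    (as a fraction) of the nominal value."""
    best_score, best_length = None, None
    for i in range(steps):
        fraction = 1 - tolerance + 2 * tolerance * i / (steps - 1)
        candidate = nominal * fraction
        score = binning_residual(lengths, anchor, candidate)
        if best_score is None or score < best_score:
            best_score, best_length = score, candidate
    return best_length


# Hypothetical dinucleotide locus whose repeats migrate as 1.9 bp, not 2 bp.
measured = [100.0 + 1.9 * k for k in (0, 1, 3, 5, 8, 12)]
effective = estimate_repeat_length(measured, anchor=100.0, nominal=2.0)
```

With the nominal 2 bp period, the longest allele in this example (12 repeats from the anchor, measured at 122.8 bp) falls 11.4 units away and is binned as 11 repeats. With the estimated effective length it is binned correctly. The drift grows with distance from the anchor, so long alleles are the most exposed to this kind of miscall.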

12.
In this study we developed eight quantitative PCR (qPCR) assays to evaluate the starting copy number of nuclear and mitochondrial DNA fragments ranging from 75 to 350 base-pairs in DNA extracts from Chinook salmon tissues with varying quality. Samples were genotyped with 13 microsatellite and 29 SNP assays and average genotyping success for good, intermediate, and poor quality samples was 96%, 24%, and 24% for microsatellite loci, and 98%, 97%, and 79% for SNPs, respectively. As measured by qPCR, good quality samples had a consistently high number of starting copies across all fragment sizes with little change between the smallest and largest size. In contrast, the intermediate and poor quality samples displayed decreases in starting copy number as fragment size increased, a decline that was most pronounced in the poor samples. Logistic regression of genotyping success by starting copy number indicated that in order to achieve at least 90% genotyping success, approximately 1,000 starting copies of nuclear DNA are necessary for microsatellite loci, and as few as 14 starting copies for SNP assays (but we recommend at least 50 copies to reduce genotyping error). While these guidelines apply specifically to Chinook salmon and the genetic markers included in this study, the principles are transferable to other species and markers due to the underlying process associated with template quantity and PCR amplification.

13.
Noninvasive DNA sampling allows studies of natural populations without disturbing the target animals. Unfortunately, high genotyping error rates often make noninvasive studies difficult. We report low error rates (0.0–7.5%/locus) when genotyping 18 microsatellite loci in only 4 multiplex polymerase chain reaction amplifications using fecal DNA from bighorn sheep (Ovis canadensis). The average locus-specific error rates varied significantly between the 2 populations (0.13% vs. 1.6%; P < 0.001), as did multi-locus genotype error rates (2.3% vs. 14.1%; P < 0.007). This illustrates the importance of quantifying error rates in each study population (and for each season and sample preservation method) before initiating a noninvasive study. Our error rates are among the lowest reported for fecal samples collected noninvasively in the field. This and other recent studies suggest that noninvasive fecal samples can be used in species with pellet-form feces for nearly any study (e.g., of population structure, gene flow, dispersal, parentage, and even genome-wide studies to detect local adaptation) that previously required high-quality blood or tissue samples.

14.
Moskvina V, Schmidt KM. Biometrics 2006, 62(4): 1116-1123
With the availability of fast genotyping methods and genomic databases, the search for statistical association of single nucleotide polymorphisms with a complex trait has become an important methodology in medical genetics. However, even fairly rare errors occurring during the genotyping process can lead to spurious association results and decrease in statistical power. We develop a systematic approach to study how genotyping errors change the genotype distribution in a sample. The general M-marker case is reduced to that of a single-marker locus by recognizing the underlying tensor-product structure of the error matrix. Both method and general conclusions apply to the general error model; we give detailed results for allele-based errors of size depending both on the marker locus and the allele present. Multiple errors are treated in terms of the associated diffusion process on the space of genotype distributions. We find that certain genotype and haplotype distributions remain unchanged under genotyping errors, and that genotyping errors generally render the distribution more similar to the stable one. In case-control association studies, this will lead to loss of statistical power for nondifferential genotyping errors and increase in type I error for differential genotyping errors. Moreover, we show that allele-based genotyping errors do not disturb Hardy-Weinberg equilibrium in the genotype distribution. In this setting we also identify maximally affected distributions. As they correspond to situations with rare alleles and marker loci in high linkage disequilibrium, careful checking for genotyping errors is advisable when significant association based on such alleles/haplotypes is observed in association studies.
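
The statement that allele-based errors leave Hardy-Weinberg equilibrium intact can be checked numerically for a single biallelic marker. The sketch below assumes each of the two alleles in a genotype is misread independently (A read as a with probability `e_A`, a read as A with probability `e_a`); the function names and error-rate values are illustrative assumptions, not the authors' notation.

```python
def hwe_distribution(p):
    """Genotype frequencies under Hardy-Weinberg equilibrium, with
    frequency p for allele A."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


def apply_allele_errors(distribution, e_A, e_a):
    """Observed genotype frequencies when each allele is misread
    independently: A -> a with probability e_A, a -> A with probability e_a."""
    read = {
        ("A", "A"): 1.0 - e_A, ("A", "a"): e_A,
        ("a", "A"): e_a, ("a", "a"): 1.0 - e_a,
    }
    observed = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    for genotype, prob in distribution.items():
        first, second = genotype[0], genotype[1]
        for o1 in "Aa":
            for o2 in "Aa":
                # sorted() puts "A" before "a", giving "AA", "Aa" or "aa".
                label = "".join(sorted(o1 + o2))
                observed[label] += prob * read[(first, o1)] * read[(second, o2)]
    return observed


def hwe_deviation(distribution):
    """Observed heterozygote frequency minus its HWE expectation."""
    p = distribution["AA"] + 0.5 * distribution["Aa"]
    return distribution["Aa"] - 2.0 * p * (1.0 - p)


true_dist = hwe_distribution(0.3)
seen_dist = apply_allele_errors(true_dist, e_A=0.02, e_a=0.05)
```

Under these assumptions the observed distribution is again in equilibrium, at the shifted allele frequency p(1 - e_A) + (1 - p)e_a, so a Hardy-Weinberg test would not detect this kind of error. Errors that depend on the whole genotype, such as heterozygotes miscalled as homozygotes, behave differently.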

15.
In noninvasive genetic sampling, when genotyping error rates are high and recapture rates are low, misidentification of individuals can lead to overestimation of population size. Thus, estimating genotyping errors is imperative. Nonetheless, conducting multiple polymerase chain reactions (PCRs) at multiple loci is time-consuming and costly. To address the controversy regarding the minimum number of PCRs required for obtaining a consensus genotype, we compared, consumer-style, the performance of two genotyping protocols (multiple-tubes and 'comparative method') with respect to genotyping success and error rates. Our results from 48 faecal samples of river otters (Lontra canadensis) collected in Wyoming in 2003, and from blood samples of five captive river otters amplified with four different primers, suggest that use of the comparative genotyping protocol can minimize the number of PCRs per locus. For all but five samples at one locus, the same consensus genotypes were reached with fewer PCRs and with reduced error rates with this protocol compared to the multiple-tubes method. This finding is reassuring because genotyping errors can occur at relatively high rates even in tissues such as blood and hair. In addition, we found that loci that amplify readily and yield consensus genotypes may still exhibit high error rates (7-32%) and that amplification with different primers resulted in different types and rates of error. Thus, assigning a genotype based on a single PCR for several loci could result in misidentification of individuals. We recommend that programs designed to statistically assign consensus genotypes should be modified to allow the different treatment of heterozygotes and homozygotes intrinsic to the comparative method.
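
The abstract does not spell out the decision rules of either protocol, so the sketch below shows only a generic threshold rule for building a consensus genotype from replicate PCRs, one that treats heterozygotes and homozygotes differently. The thresholds (an allele must be seen in at least two PCRs to be accepted; a homozygote needs at least three concordant PCRs and no other allele) and the function name are assumptions chosen for illustration.

```python
from collections import Counter


def consensus_genotype(replicates, min_allele_obs=2, min_hom_obs=3):
    """Consensus genotype from replicate PCR calls at one locus.

    `replicates` is a list of (allele1, allele2) calls, one per PCR.
    Returns a sorted (allele1, allele2) tuple, or None when the replicates
    do not yet support a call and more PCRs would be needed.
    """
    counts = Counter()
    for genotype in replicates:
        counts.update(set(genotype))  # count each allele once per PCR
    confirmed = sorted(a for a, n in counts.items() if n >= min_allele_obs)
    if len(confirmed) == 2:
        return (confirmed[0], confirmed[1])  # heterozygote
    if (
        len(confirmed) == 1
        and len(counts) == 1
        and counts[confirmed[0]] >= min_hom_obs
    ):
        return (confirmed[0], confirmed[0])  # homozygote
    return None  # unresolved: stray allele, too few PCRs, or >2 alleles
```

The stricter homozygote threshold reflects allelic dropout: a heterozygote that loses one allele in a given PCR looks homozygous, so apparent homozygotes need more replicates before they are accepted.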

16.
megasat is software that enables genotyping of microsatellite loci using next‐generation sequencing data. Microsatellites are amplified in large multiplexes, and then sequenced in pooled amplicons. megasat reads sequence files and automatically scores microsatellite genotypes. It uses fuzzy matches to allow for sequencing errors and applies decision rules to account for amplification artefacts, including nontarget amplification products, replication slippage during PCR (amplification stutter) and differential amplification of alleles. An important feature of megasat is the generation of histograms of the length–frequency distributions of amplification products for each locus and each individual. These histograms, analogous to electropherograms traditionally used to score microsatellite genotypes, enable rapid evaluation and editing of automatically scored genotypes. megasat is written in Perl, runs on Windows, Mac OS X and Linux systems, and includes a simple graphical user interface. We demonstrate megasat using data from guppy, Poecilia reticulata. We genotype 1024 guppies at 43 microsatellites per run on an Illumina MiSeq sequencer. We evaluated the accuracy of automatically called genotypes using two methods, based on pedigree and repeat genotyping data, and obtained estimates of mean genotyping error rates of 0.021 and 0.012. In both estimates, three loci accounted for a disproportionate fraction of genotyping errors; conversely, 26 loci were scored with 0–1 detected error (error rate ≤0.007). Our results show that with appropriate selection of loci, automated genotyping of microsatellite loci can be achieved with very high throughput, low genotyping error and very low genotyping costs.

17.

Background  

Amplified fragment length polymorphism (AFLP) is a PCR-based technique that involves restriction of genomic DNA followed by ligation of adaptors to the fragments generated and selective PCR amplification of a subset of these fragments. The amplified fragments are separated on a sequencing gel and visualized by autoradiography or fluorescent sequencing equipment. AFLP allows high-resolution genotyping but the lack of a format for databasing and comparison of AFLP fingerprint profiles limits its wider applications in profiling large numbers of biological samples.

18.
Current routine genotyping methods typically do not provide haplotype information, which is essential for many analyses of fine-scale molecular-genetics data. Haplotypes can be obtained, at considerable cost, experimentally or (partially) through genotyping of additional family members. Alternatively, a statistical method can be used to infer phase and to reconstruct haplotypes. We present a new statistical method, applicable to genotype data at linked loci from a population sample, that improves substantially on current algorithms; often, error rates are reduced by > 50%, relative to its nearest competitor. Furthermore, our algorithm performs well in absolute terms, suggesting that reconstructing haplotypes experimentally or by genotyping additional family members may be an inefficient use of resources.

19.
Inference of haplotypes is important in genetic epidemiology studies. However, all large genotype data sets have errors due to the use of inexpensive genotyping machines that are fallible and shortcomings in genotype-scoring software, which can have an enormous impact on haplotype inference. In this article, we propose two novel strategies to reduce the impact of genotyping errors on haplotype inference. The first method makes use of double sampling. For each individual, a “GenoSpectrum” consisting of all possible genotypes and their corresponding likelihoods is computed. The second method is a genotype clustering algorithm based on multi‐genotyping data, which also assigns a “GenoSpectrum” to each individual. We then describe two hybrid EM algorithms (called DS‐EM and MG‐EM) that perform haplotype inference based on the “GenoSpectrum” of each individual obtained by double sampling and multi‐genotyping data. Both simulated data sets and a quasi real‐data set demonstrate that our proposed methods perform well in different situations and outperform the conventional EM algorithm and the HMM algorithm proposed by Sun, Greenwood, and Neal (2007, Genetic Epidemiology 31, 937–948) when the genotype data sets have errors.

20.
In non‐model organisms, evolutionary questions are frequently addressed using reduced representation sequencing techniques due to their low cost, ease of use, and because they do not require genomic resources such as a reference genome. However, evidence is accumulating that such techniques may be affected by specific biases, questioning the accuracy of obtained genotypes, and as a consequence, their usefulness in evolutionary studies. Here, we introduce three strategies to estimate genotyping error rates from such data: through the comparison to high quality genotypes obtained with a different technique, from individual replicates, or from a population sample when assuming Hardy‐Weinberg equilibrium. Applying these strategies to data obtained with Restriction site Associated DNA sequencing (RAD‐seq), arguably the most popular reduced representation sequencing technique, revealed per‐allele genotyping error rates that were much higher than sequencing error rates, particularly at heterozygous sites that were wrongly inferred as homozygous. As we exemplify through the inference of genome‐wide and local ancestry of well characterized hybrids of two Eurasian poplar (Populus) species, such high error rates may lead to wrong biological conclusions. By properly accounting for these error rates in downstream analyses, either by incorporating genotyping errors directly or by recalibrating genotype likelihoods, we were nevertheless able to use the RAD‐seq data to support biologically meaningful and robust inferences of ancestry among Populus hybrids. Based on these findings, we strongly recommend carefully assessing genotyping error rates in reduced representation sequencing experiments and properly accounting for them in downstream analyses, for instance using the tools presented here.
