Similar literature
Found 20 similar documents.
1.
Errors while genotyping are inevitable and can reduce the power to detect linkage. However, does genotyping error have the same impact on linkage results for single-nucleotide polymorphism (SNP) and microsatellite (MS) marker maps? To evaluate this question we detected genotyping errors that are consistent with Mendelian inheritance using large changes in multipoint identity-by-descent sharing in neighboring markers. Only a small fraction of Mendelian consistent errors were detectable (e.g., 18% of MS and 2.4% of SNP genotyping errors). More SNP genotyping errors are Mendelian consistent compared to MS genotyping errors, so genotyping error may have a greater impact on linkage results using SNP marker maps. We also evaluated the effect of genotyping error on the power and type I error rate using simulated nuclear families with missing parents under 0, 0.14, and 2.8% genotyping error rates. In the presence of genotyping error, we found that the power to detect a true linkage signal was greater for SNP (75%) than MS (67%) marker maps, although there were also slightly more false-positive signals using SNP marker maps (5 compared with 3 for MS). Finally, we evaluated the usefulness of accounting for genotyping error in the SNP data using a likelihood-based approach, which restores some of the power that is lost when genotyping error is introduced.

2.
SNP arrays are widely used in genetic research and agricultural genomics applications, and the quality of SNP genotyping data is of paramount importance. In the present study, SNP genotyping concordance and discordance were evaluated for commercial bovine SNP arrays based on two types of quality assurance (QA) samples provided by Neogen GeneSeek. The genotyping discordance rates (GDRs) between chips were on average between 0.06% and 0.37% based on the QA type I data and between 0.05% and 0.15% based on the QA type II data. The average genotyping error rate (GER) pertaining to single SNP chips, based on the QA type II data, varied between 0.02% and 0.08% per SNP and between 0.01% and 0.06% per sample. These results indicate that genotyping concordance rate was high (i.e. from 99.63% to 99.99%). Nevertheless, mitochondrial and Y chromosome SNPs had considerably elevated GDRs and GERs compared to the SNPs on the 29 autosomes and X chromosome. The majority of genotyping errors resulted from single allotyping errors, which also included the opposite instances for allele ‘dropout’ (i.e. from AB to AA or BB). Simultaneous allotyping errors on both alleles (e.g. mistaking AA for BB or vice versa) were relatively rare. Finally, a list of SNPs with a GER greater than 1% is provided. Interpretation of association effects of these SNPs, for example in genome‐wide association studies, needs to be taken with caution. The genotyping concordance information needs to be considered in the optimal design of future bovine SNP arrays.

3.
Moskvina V, Schmidt KM. Biometrics, 2006, 62(4): 1116-1123
With the availability of fast genotyping methods and genomic databases, the search for statistical association of single nucleotide polymorphisms with a complex trait has become an important methodology in medical genetics. However, even fairly rare errors occurring during the genotyping process can lead to spurious association results and decrease in statistical power. We develop a systematic approach to study how genotyping errors change the genotype distribution in a sample. The general M-marker case is reduced to that of a single-marker locus by recognizing the underlying tensor-product structure of the error matrix. Both method and general conclusions apply to the general error model; we give detailed results for allele-based errors of size depending both on the marker locus and the allele present. Multiple errors are treated in terms of the associated diffusion process on the space of genotype distributions. We find that certain genotype and haplotype distributions remain unchanged under genotyping errors, and that genotyping errors generally render the distribution more similar to the stable one. In case-control association studies, this will lead to loss of statistical power for nondifferential genotyping errors and increase in type I error for differential genotyping errors. Moreover, we show that allele-based genotyping errors do not disturb Hardy-Weinberg equilibrium in the genotype distribution. In this setting we also identify maximally affected distributions. As they correspond to situations with rare alleles and marker loci in high linkage disequilibrium, careful checking for genotyping errors is advisable when significant association based on such alleles/haplotypes is observed in association studies.
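The claim that allele-based errors leave Hardy-Weinberg equilibrium intact can be checked numerically for a single biallelic marker. The sketch below is only an illustration under assumed conditions, not the authors' derivation: each of the two alleles is misread independently, with hypothetical rates `e_ab` (A read as B) and `e_ba` (B read as A), and the function name is invented for this example.

```python
from itertools import product


def observed_genotype_freqs(p, e_ab, e_ba):
    """Observed genotype frequencies at a biallelic marker.

    Assumptions (illustrative only): true genotypes are in Hardy-Weinberg
    equilibrium with frequency p for allele A, and each allele is misread
    independently (A->B with rate e_ab, B->A with rate e_ba).
    """
    allele_freq = {"A": p, "B": 1.0 - p}
    # misread[true_allele][observed_allele]
    misread = {
        "A": {"A": 1.0 - e_ab, "B": e_ab},
        "B": {"A": e_ba, "B": 1.0 - e_ba},
    }
    observed = {"AA": 0.0, "AB": 0.0, "BB": 0.0}
    # Sum over every (true pair, observed pair) combination.
    for t1, t2, o1, o2 in product("AB", repeat=4):
        prob = (allele_freq[t1] * allele_freq[t2]
                * misread[t1][o1] * misread[t2][o2])
        observed["".join(sorted(o1 + o2))] += prob
    return observed


p, e_ab, e_ba = 0.2, 0.03, 0.01
obs = observed_genotype_freqs(p, e_ab, e_ba)
# Allele frequency after errors; HWE on this frequency should match obs.
p_obs = p * (1.0 - e_ab) + (1.0 - p) * e_ba
print(obs, p_obs ** 2, 2 * p_obs * (1 - p_obs), (1 - p_obs) ** 2)
```

Under these assumptions the observed distribution is again in Hardy-Weinberg proportions, but at the shifted allele frequency `p_obs`, so a Hardy-Weinberg test alone would not flag this kind of error.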

4.
The identification of genes contributing to complex diseases and quantitative traits requires genetic data of high fidelity, because undetected errors and mutations can profoundly affect linkage information. The recent emphasis on the use of the sibling-pair design eliminates or decreases the likelihood of detection of genotyping errors and marker mutations through apparent Mendelian incompatibilities or close double recombinants. In this article, we describe a hidden Markov method for detecting genotyping errors and mutations in multilocus linkage data. Specifically, we calculate the posterior probability of genotyping error or mutation for each sibling-pair-marker combination, conditional on all marker data and an assumed genotype-error rate. The method is designed for use with sibling-pair data when parental genotypes are unavailable. Through Monte Carlo simulation, we explore the effects of map density, marker-allele frequencies, marker position, and genotype-error rate on the accuracy of our error-detection method. In addition, we examine the impact of genotyping errors and error detection and correction on multipoint linkage information. We illustrate that even moderate error rates can result in substantial loss of linkage information, given efforts to fine-map a putative disease locus. Although simulations suggest that our method detects

5.
Genotyping errors are present in almost all genetic data and can affect biological conclusions of a study, particularly for studies based on individual identification and parentage. Many statistical approaches can incorporate genotyping errors, but usually need accurate estimates of error rates. Here, we used a new microsatellite data set developed for brown rockfish (Sebastes auriculatus) to estimate genotyping error using three approaches: (i) repeat genotyping 5% of samples, (ii) comparing unintentionally recaptured individuals and (iii) Mendelian inheritance error checking for known parent–offspring pairs. In each data set, we quantified genotyping error rate per allele due to allele drop‐out and false alleles. Genotyping error rate per locus revealed an average overall genotyping error rate by direct count of 0.3%, 1.5% and 1.7% (0.002, 0.007 and 0.008 per allele error rate) from replicate genotypes, known parent–offspring pairs and unintentionally recaptured individuals, respectively. By direct‐count error estimates, the recapture and known parent–offspring data sets revealed an error rate four times greater than estimated using repeat genotypes. There was no evidence of correlation between error rates and locus variability for all three data sets, and errors appeared to occur randomly over loci in the repeat genotypes, but not in recaptures and parent–offspring comparisons. Furthermore, there was no correlation in locus‐specific error rates between any two of the three data sets. Our data suggest that repeat genotyping may underestimate true error rates and may not estimate locus‐specific error rates accurately. We therefore suggest using methods for error estimation that correspond to the overall aim of the study (e.g. known parent–offspring comparisons in parentage studies).
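A direct-count estimate from repeat genotyping, as mentioned above, can be sketched as follows. This is a minimal illustration and not the study's actual procedure: the function name and the toy allele sizes are hypothetical, genotypes are assumed to be unordered pairs of allele sizes at one locus, and missing calls are not handled.

```python
from collections import Counter


def direct_count_error_rates(first_calls, repeat_calls):
    """Return (per-genotype, per-allele) mismatch rates between replicates.

    A mismatch only shows that the two calls disagree; which of the two
    calls is wrong is not determined by this count.
    """
    if len(first_calls) != len(repeat_calls) or not first_calls:
        raise ValueError("need two non-empty call lists of equal length")
    genotype_mismatches = 0
    allele_mismatches = 0
    for g1, g2 in zip(first_calls, repeat_calls):
        # Alleles shared between the two calls, respecting multiplicity.
        shared = sum((Counter(g1) & Counter(g2)).values())
        allele_mismatches += 2 - shared
        genotype_mismatches += 1 if shared < 2 else 0
    n = len(first_calls)
    return genotype_mismatches / n, allele_mismatches / (2 * n)


# Toy data: the second sample looks like a dropout or false allele in one call.
first = [(120, 124), (120, 120), (118, 124), (122, 126)]
repeat = [(124, 120), (120, 124), (118, 124), (122, 126)]
per_genotype, per_allele = direct_count_error_rates(first, repeat)
print(per_genotype, per_allele)  # 0.25 0.125
```

The per-allele rate is half the per-genotype rate here because the single discordant genotype differs at only one of its two alleles.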

6.
Microsatellite genotyping is a common DNA characterization technique in population, ecological and evolutionary genetics research. Since different alleles are sized relative to internal size-standards, different laboratories must calibrate and standardize allelic designations when exchanging data. This interchange of microsatellite data can often prove problematic. Here, 16 microsatellite loci were calibrated and standardized for the Atlantic salmon, Salmo salar, across 12 laboratories. Although inconsistencies were observed, particularly due to differences between migration of DNA fragments and actual allelic size ('size shifts'), inter-laboratory calibration was successful. Standardization also allowed an assessment of the degree and partitioning of genotyping error. Notably, the global allelic error rate was reduced from 0.05 ± 0.01 prior to calibration to 0.01 ± 0.002 post-calibration. Most errors were found to occur during analysis (i.e. when size-calling alleles; the mean proportion of all errors that were analytical errors across loci was 0.58 after calibration). No evidence was found of an association between the degree of error and allelic size range of a locus, number of alleles, nor repeat type, nor was there evidence that genotyping errors were more prevalent when a laboratory analyzed samples outside of the usual geographic area they encounter. The microsatellite calibration between laboratories presented here will be especially important for genetic assignment of marine-caught Atlantic salmon, enabling analysis of marine mortality, a major factor in the observed declines of this highly valued species.

7.
The transmission/disequilibrium test (TDT), a family-based test of linkage and association, is a popular and intuitive statistical test for studies of complex inheritance, as it is nonparametric and robust to population stratification. We carried out a literature search and located 79 significant TDT-derived associations between a microsatellite marker allele and a disease. Among these, there were 31 (39%) in which the most common allele was found to exhibit distorted transmission to affected offspring, implying that the allele may be associated with either susceptibility to or protection from a disease. In 27 of these 31 studies (87%), the most common allele appeared to be overtransmitted to affected offspring (a risk factor), and, in the remaining 4 studies, the most common allele appeared to be undertransmitted (a protective factor). In a second literature search, we identified 92 case-control studies in which a microsatellite marker allele was found to have significantly different frequencies in case and control groups. Of these, there were 37 instances (40%) in which the most common allele was involved. In 12 of these 37 studies (32%), the most common allele was enriched in cases relative to controls (a risk factor), and, in the remaining 25 studies, the most common allele was enriched in controls (a protective factor). Thus, the most common allele appears to be a risk factor when identified through the TDT, and it appears to be protective when identified through case-control analysis. To understand this phenomenon, we incorporated an error model into the calculation of the TDT statistic. We show that undetected genotyping error can cause apparent transmission distortion at markers with alleles of unequal frequency. We demonstrate that this distortion is in the direction of overtransmission for common alleles. Therefore, we conclude that undetected genotyping errors may be contributing to an inflated false-positive rate among reported TDT-derived associations and that genotyping fidelity must be increased.
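For readers unfamiliar with the TDT, the standard biallelic form of the statistic compares how often heterozygous parents transmit an allele versus fail to transmit it. The sketch below shows that textbook statistic only; it does not include the error model described in the abstract, and the function name and the counts are hypothetical.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """McNemar-type TDT statistic for one allele.

    transmitted / untransmitted: number of times heterozygous parents
    passed on, or did not pass on, the allele to an affected child.
    Returns (chi-square statistic with 1 df, p-value).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = tdt_statistic(60, 40)
print(stat, round(p_value, 4))  # 4.0 0.0455
```

Because the statistic depends only on the imbalance between the two counts, any process that shifts apparent transmissions toward one allele, such as the undetected errors discussed above, would inflate it in the same way a true association would.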

8.
Several programs are currently available for the detection of genotyping errors, whether or not those errors produce Mendelian inconsistencies. However, no systematic study exists that evaluates their performance under varying pedigree structures and sizes, marker spacing, and allele frequencies. Our simulation study compares four multipoint methods: Merlin, Mendel4, SimWalk2, and Sibmed. We look at empirical thresholds, power, and false-positive rates on 7 small pedigree structures that included sibships with and without genotyped parents, and a three-generation pedigree, using 11 microsatellite markers with 3 different map spacings. Simulated data includes 5,000 replicates of each pedigree structure and marker map, with random genotyping errors in about 4% of the middle marker's genotypes. We found that the default thresholds used by these programs provide low power (47-72%). Power is improved more by adding genotyped siblings than by using more closely spaced markers. Some mistyping methods are sensitive to the frequencies of the observed alleles. Siblings of mistyped individuals have elevated false-positive rates, as do markers close to the mistyped marker. We conclude that thresholds should be decided based on the pedigree and marker data and that greater focus should be placed on modeling genotyping error when computing likelihoods, rather than on detecting and eliminating genotyping errors.

9.
Gene-mapping studies routinely rely on checking for Mendelian transmission of marker alleles in a pedigree, as a means of screening for genotyping errors and mutations, with the implicit assumption that, if a pedigree is consistent with Mendel's laws of inheritance, then there are no genotyping errors. However, the occurrence of inheritance inconsistencies alone is an inadequate measure of the number of genotyping errors, since the rate of occurrence depends on the number and relationships of genotyped pedigree members, the type of errors, and the distribution of marker-allele frequencies. In this article, we calculate the expected probability of detection of a genotyping error or mutation as an inheritance inconsistency in nuclear-family data, as a function of both the number of genotyped parents and offspring and the marker-allele frequency distribution. Through computer simulation, we explore the sensitivity of our analytic calculations to the underlying error model. Under a random-allele-error model, we find that detection rates are 51%-77% for multiallelic markers and 13%-75% for biallelic markers; detection rates are generally lower when the error occurs in a parent than in an offspring, unless a large number of offspring are genotyped. Errors are especially difficult to detect for biallelic markers with equally frequent alleles, even when both parents are genotyped; in this case, the maximum detection rate is 34% for four-person nuclear families. Error detection in families in which parents are not genotyped is limited, even with multiallelic markers. Given these results, we recommend that additional error checking (e.g., on the basis of multipoint analysis) be performed, beyond routine checking for Mendelian consistency. Furthermore, our results permit assessment of the plausibility of an observed number of inheritance inconsistencies for a family, allowing the detection of likely pedigree errors, rather than genotyping errors, in the early stages of a genome scan. Such early assessments are valuable in either the targeting of families for resampling or discontinued genotyping.
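The routine Mendelian consistency check discussed above, and the reason it misses many errors, can be illustrated with a parent-offspring trio. This is a minimal sketch with a hypothetical function name, assuming an autosomal marker, both parents genotyped, and no mutation; it is not the analytic calculation from the article.

```python
def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype could arise from the parents' genotypes.

    Each genotype is a pair of allele labels; one child allele must come
    from the father and the other from the mother.
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


father, mother = ("A", "B"), ("A", "B")

# True child genotype AA mistyped as AB: still consistent, so undetected.
print(is_mendelian_consistent(father, mother, ("A", "B")))  # True

# A mistyping that introduces an allele neither parent carries is caught.
print(is_mendelian_consistent(father, mother, ("A", "C")))  # False
```

With two heterozygous parents at a biallelic marker, every possible child genotype is consistent, which is one intuition for the low detection rates reported for biallelic markers.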

10.
We report 22 new polymorphic microsatellites for the Ivory gull (Pagophila eburnea), and we describe how they can be efficiently co-amplified using multiplexed polymerase chain reactions. In addition, we report DNA concentration, amplification success, rates of genotyping errors and the number of genotyping repetitions required to obtain reliable data with three types of noninvasive or nondestructive samples: shed feathers collected in colonies, feathers plucked from living individuals and buccal swabs. In two populations from Greenland (n=21) and Russia (Severnaya Zemlya Archipelago, n=21), the number of alleles per locus varied between 2 and 17, and expected heterozygosity per population ranged from 0.18 to 0.92. Twenty of the markers conformed to Hardy-Weinberg and linkage equilibrium expectations. Most markers were easily amplified and highly reliable when analysed from buccal swabs and plucked feathers, showing that buccal swabbing is a very efficient approach allowing good quality DNA retrieval. Although DNA amplification success using single shed feathers was generally high, the genotypes obtained from this type of samples were prone to error and thus need to be amplified several times. The set of microsatellite markers described here together with multiplex amplification conditions and genotyping error rates will be useful for population genetic studies of the Ivory gull.

11.
The use of noninvasive genetic sampling (NGS) for surveying wild populations is increasing rapidly. Currently, only a limited number of studies have evaluated potential biases associated with NGS. This paper evaluates the potential errors associated with analysing mixed samples drawn from multiple animals. Most NGS studies assume that mixed samples will be identified and removed during the genotyping process. We evaluated this assumption by creating 128 mixed samples of extracted DNA from brown bear (Ursus arctos) hair samples. These mixed samples were genotyped and screened for errors at six microsatellite loci according to protocols consistent with those used in other NGS studies. Five mixed samples produced acceptable genotypes after the first screening. However, all mixed samples produced multiple alleles at one or more loci, amplified as only one of the source samples, or yielded inconsistent electropherograms by the final stage of the error-checking process. These processes could potentially reduce the number of individuals observed in NGS studies, but errors should be conservative within demographic estimates. Researchers should be aware of the potential for mixed samples and carefully design gel analysis criteria and error checking protocols to detect mixed samples.

12.
Novel algorithm for automated genotyping of microsatellites
Microsatellites or short tandem repeats (STRs) are abundant in the human genome with easily assayed polymorphisms, providing powerful genetic tools for mapping both Mendelian and complex traits. Microsatellite genotyping requires detection of the products of polymerase chain reaction (PCR) amplification by electrophoresis, and analysis of the peak data for discrimination of the true allele. A high-throughput genotyping approach requires computer-based automation at both the detection and analysis phases. In order to achieve this, complicated peak patterns from individual alleles must be interpreted in order to assign alleles. Previous methods consider limited types of noise peaks and cannot provide enough accuracy. By pattern recognition of various types of noise peaks, such as stutter peaks and additional peaks, we have achieved an overall average accuracy of 94% for allele calling in our actual data. Our algorithm is crucial for a high-throughput genotyping system for microsatellite markers by reducing manual editing and human errors.

13.
Redundant duplication among putative Nordic spring barley material held at 12 gene banks worldwide was studied using 35 microsatellite primer pairs covering the entire barley genome. These microsatellite markers revealed an average of 7.1 alleles per locus, and a range of 1 to 17 different alleles per locus. Similarity of accession name was initially used to partition the 174 repatriated accessions into 36 potential duplicate groups, and one group containing 36 apparently unique or unrelated accessions. This partitioning was efficient to produce a distribution of mainly small average genetic distances within potential duplicate groups compared to distances from the group of unique accessions. However, comparisons within potential duplicate groups still contained large genetic distances of the same size as distances between unique accessions indicating classification errors. A bootstrap approach based on re-sampling of both microsatellite markers and alleles within marker loci was used to test for homogeneity within potential duplicate groups. The test was used in each group for sequential elimination of accessions with a significantly large average genetic distance to identify a homogeneous group. Such genetically homogeneous groups of two or more accessions were identified in 22 among the 36 potential duplicate groups studied. Results from the genetic analysis of some potential duplicate groups supported previous conclusions based on passport data through inclusion of the historically most-original accession in the genetically homogeneous group. In other potential duplicate groups the apparently most-original accession according to passport data was not included in the homogeneous set of accessions, indicating that this most-original accession does not have duplicate accessions in the group. During the present study the largest average genetic distance accepted in any homogeneous group was smaller than the smallest distance declared significant in any group, with a threshold average genetic distance of approximately 0.14. The results are discussed with respect to the identification of duplicate accessions within potential duplicate groups, as well as the elimination of genetic off types in such groups. Furthermore, large barley gene bank collections may be screened for potential duplicates with genetic distances below the suggested threshold of 0.14.

14.
Errors in genotyping data have been shown to have a significant effect on the estimation of recombination fractions in high-resolution genetic maps. Previous estimates of errors in existing databases have been limited to the analysis of relatively few markers and have suggested rates in the range 0.5%-1.5%. The present study capitalizes on the fact that within the Centre d'Etude du Polymorphisme Humain (CEPH) collection of reference families, 21 individuals are members of more than one family, with separate DNA samples provided by CEPH for each appearance of these individuals. By comparing the genotypes of these individuals in each of the families in which they occur, an estimated error rate of 1.4% was calculated for all loci in the version 4.0 CEPH database. Removing those individuals who were clearly identified by CEPH as appearing in more than one family resulted in a 3.0% error rate for the remaining samples, suggesting that some error checking of the identified repeated individuals may occur prior to data submission. An error rate of 3.0% for version 4.0 data was also obtained for four chromosome 5 markers that were retyped through the entire CEPH collection. The effects of these errors on a multipoint map were significant, with a total sex-averaged length of 36.09 cM with the errors, and 19.47 cM with the errors corrected. Several statistical approaches to detect and allow for errors during linkage analysis are presented. One method, which identified families containing possible errors on the basis of the impact on the maximum lod score, showed particular promise, especially when combined with the limited retyping of the identified families. The impact of the demonstrated error rate in an established genotype database on high-resolution mapping is significant, raising the question of the overall value of incorporating such existing data into new genetic maps.

15.
Johnson PC, Haydon DT. Genetics, 2007, 175(2): 827-842
The importance of quantifying and accounting for stochastic genotyping errors when analyzing microsatellite data is increasingly being recognized. This awareness is motivating the development of data analysis methods that not only take errors into consideration but also recognize the difference between two distinct classes of error, allelic dropout and false alleles. Currently methods to estimate rates of allelic dropout and false alleles depend upon the availability of error-free reference genotypes or reliable pedigree data, which are often not available. We have developed a maximum-likelihood-based method for estimating these error rates from a single replication of a sample of genotypes. Simulations show it to be both accurate and robust to modest violations of its underlying assumptions. We have applied the method to estimating error rates in two microsatellite data sets. It is implemented in a computer program, Pedant, which estimates allelic dropout and false allele error rates with 95% confidence regions from microsatellite genotype data and performs power analysis. Pedant is freely available at http://www.stats.gla.ac.uk/~paulj/pedant.html.

16.
Noninvasive genetics based on microsatellite markers has become an indispensable tool for wildlife monitoring and conservation research over the past decades. However, microsatellites have several drawbacks, such as the lack of standardisation between laboratories and high error rates. Here, we propose an alternative single‐nucleotide polymorphism (SNP)‐based marker system for noninvasively collected samples, which promises to solve these problems. Using nanofluidic SNP genotyping technology (Fluidigm), we genotyped 158 wolf samples (tissue, scats, hairs, urine) for 192 SNP loci selected from the Affymetrix v2 Canine SNP Array. We carefully selected an optimised final set of 96 SNPs (and discarded the worse half), based on assay performance and reliability. We found rates of missing data in this SNP set of <10% and genotyping error of ~1%, which improves genotyping accuracy by nearly an order of magnitude when compared to published data for other marker types. Our approach provides a tool for rapid and cost‐effective genotyping of noninvasively collected wildlife samples. The ability to standardise genotype scoring combined with low error rates promises to constitute a major technological advancement and could establish SNPs as a standard marker for future wildlife monitoring.

17.
Chinese sea perch (Lateolabrax maculates) is one of the most important commercial species of mariculture in China. In this study, we constructed a repeat-enriched genomic DNA library of L. maculates. Eighteen dinucleotide microsatellite markers were characterized by genotyping 32 samples. The number of alleles ranged from three to nine, and the observed and expected heterozygosities ranged from 0.4516 to 1.0000 and from 0.4045 to 0.8676, respectively. Significant deviations from Hardy–Weinberg expectations were detected at four loci and linkage disequilibrium between two loci was significant after applying Bonferroni correction. The 18 highly polymorphic microsatellite markers should provide sufficient level of genetic diversity to investigate the population structure and evaluate the breeding strategy in L. maculates.

18.
The rapid development of a dense single-nucleotide-polymorphism marker map has stimulated numerous studies attempting to characterize the magnitude and distribution of background linkage disequilibrium (LD) within and between human populations. Although genotyping errors are an inherent problem in all LD studies, there have been few systematic investigations documenting their consequences on estimates of background LD. Therefore, we derived simple deterministic formulas to investigate the effect that genotyping errors have on four commonly used LD measures (D', r, Q, and d) in studies of background LD. We have found that genotyping error rates as small as 3% can have serious effects on these LD measures, depending on the allele frequencies and the assumed error model. Furthermore, we compared the robustness of D', r, Q, and d, in the presence of genotyping errors. In general, Q and d are more robust than D' and r, although exceptions do exist. Finally, through stochastic simulations, we illustrate how genotyping errors can lead to erroneous inferences when measures of LD between two samples are compared.
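Two of the LD measures named above, D' and r, have standard definitions in terms of two-locus haplotype frequencies, sketched below. The function name and example frequencies are hypothetical, the code shows only the error-free definitions rather than the article's error formulas, and Q and d are left out because the abstract does not define them.

```python
import math


def ld_measures(f_AB, f_Ab, f_aB, f_ab):
    """Return (D, D', r) from the four two-locus haplotype frequencies."""
    total = f_AB + f_Ab + f_aB + f_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab  # frequency of allele A at the first locus
    p_B = f_AB + f_aB  # frequency of allele B at the second locus
    if min(p_A, 1 - p_A, p_B, 1 - p_B) <= 0:
        raise ValueError("both loci must be polymorphic")
    d = f_AB - p_A * p_B
    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max
    r = d / math.sqrt(p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r


print(ld_measures(0.4, 0.1, 0.1, 0.4))
```

Feeding the function haplotype frequencies estimated before and after injecting simulated genotyping errors would be one simple way to see how each measure responds, in the spirit of the comparison described above.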

19.
megasat is software that enables genotyping of microsatellite loci using next‐generation sequencing data. Microsatellites are amplified in large multiplexes, and then sequenced in pooled amplicons. megasat reads sequence files and automatically scores microsatellite genotypes. It uses fuzzy matches to allow for sequencing errors and applies decision rules to account for amplification artefacts, including nontarget amplification products, replication slippage during PCR (amplification stutter) and differential amplification of alleles. An important feature of megasat is the generation of histograms of the length–frequency distributions of amplification products for each locus and each individual. These histograms, analogous to electropherograms traditionally used to score microsatellite genotypes, enable rapid evaluation and editing of automatically scored genotypes. megasat is written in Perl, runs on Windows, Mac OS X and Linux systems, and includes a simple graphical user interface. We demonstrate megasat using data from guppy, Poecilia reticulata. We genotype 1024 guppies at 43 microsatellites per run on an Illumina MiSeq sequencer. We evaluated the accuracy of automatically called genotypes using two methods, based on pedigree and repeat genotyping data, and obtained estimates of mean genotyping error rates of 0.021 and 0.012. In both estimates, three loci accounted for a disproportionate fraction of genotyping errors; conversely, 26 loci were scored with 0–1 detected error (error rate ≤0.007). Our results show that with appropriate selection of loci, automated genotyping of microsatellite loci can be achieved with very high throughput, low genotyping error and very low genotyping costs.

20.
The red panda (Ailurus fulgens) is an endangered species distributed in the Himalaya and Hengduan Mountains and extremely difficult to monitor because it is elusive, wary and nocturnal. However, recent advances in noninvasive genetics are allowing conservationists to indirectly estimate population size of this animal. Here, we present a pilot study of individual identification of wild red pandas using DNA extracted from faeces. A chain of optimal steps in noninvasive studies were used to maximize genotyping success and minimize error rate across sampling, selection of microsatellite loci, DNA extraction and amplification and data checking. As a result, 18 individual red pandas were identified successfully from 33 faecal samples collected in the field using nine red panda-specific microsatellite loci with a low probability of identity of 1.249 × 10⁻³ for full siblings. Multiple methods of tracking genotyping error showed that the faecal genetic profiles possessed very few genotyping errors, with an overall error rate of 1.12 × 10⁻⁵. Our findings demonstrate the feasibility and reliability of using faeces as an effective source of DNA for estimating and monitoring wild red panda populations.
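The probability of identity for full siblings quoted above is commonly computed per locus from allele frequencies and then multiplied across loci. The sketch below uses the widely cited sibling formula, 0.25 + 0.5 Σp² + 0.5 (Σp²)² − 0.25 Σp⁴; whether the study used exactly this formula is an assumption, and the function names and allele frequencies are hypothetical.

```python
def pi_sib_locus(allele_freqs):
    """Probability that two full siblings share a genotype at one locus."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    sum_p2 = sum(p ** 2 for p in allele_freqs)
    sum_p4 = sum(p ** 4 for p in allele_freqs)
    return 0.25 + 0.5 * sum_p2 + 0.5 * sum_p2 ** 2 - 0.25 * sum_p4


def pi_sib_multilocus(loci):
    """Multiply per-locus values, assuming the loci are independent."""
    result = 1.0
    for allele_freqs in loci:
        result *= pi_sib_locus(allele_freqs)
    return result


# Hypothetical allele frequencies at three loci.
loci = [[0.5, 0.5], [0.4, 0.3, 0.3], [0.25, 0.25, 0.25, 0.25]]
print(pi_sib_locus(loci[0]))  # 0.59375
print(pi_sib_multilocus(loci))
```

Even for highly variable loci the per-locus sibling value cannot fall below 0.25, which is why several loci are needed before the product becomes small.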
