1.

Quantifying the percent increase in minimum sample size for SNP genotyping errors in genetic model-based association studies

Kang SJ, Finch SJ, Haynes C, Gordon D. Human Heredity, 2004, 58(3-4):139-144

Kang et al. [Genet Epidemiol 2004;26:132-141] addressed the question of which genotype misclassification errors are most costly, in terms of the minimum percentage increase in sample size necessary (%MSSN) to maintain constant asymptotic power and significance level, when performing case/control studies of genetic association in a genetic model-free setting. They answered the question for single nucleotide polymorphisms (SNPs) using the 2 × 3 χ² test of independence. We address the same question here for a genetic model-based framework. The genetic model parameters considered are: disease model (dominant, recessive), genotypic relative risk, SNP (marker) and disease allele frequency, and linkage disequilibrium. %MSSN coefficients of each of the six possible error rates are determined by expanding the non-centrality parameter of the asymptotic distribution of the 2 × 3 χ² test under a specified alternative hypothesis to approximate %MSSN using a linear Taylor series in the error rates. In this work we assume errors misclassifying one homozygote as another homozygote are 0, since these errors are thought to rarely occur in practice. Our findings are that there are settings of the genetic model parameters that lead to large total %MSSN for both dominant and recessive models. As the SNP minor allele frequency approaches 0, total %MSSN increases without bound, independent of other genetic model parameters. In general, %MSSN is a complex function of the genetic model parameters. Use of SNPs with small minor allele frequency requires careful attention to the frequency of genotyping errors to ensure that power specifications are met. Software to perform these calculations for study design is available, and an example of its use to study a disease is given.

2.

Quantitative Analysis of Single Nucleotide Polymorphisms within Copy Number Variation

Soohyun Lee, Simon Kasif, Zhiping Weng, Charles R. Cantor. PLoS ONE, 2008, 3(12)

Background

Single nucleotide polymorphisms (SNPs) have been used extensively in genetics and epidemiology studies. Traditionally, SNPs that did not pass the Hardy-Weinberg equilibrium (HWE) test were excluded from these analyses. Many investigators have addressed possible causes for departure from HWE, including genotyping errors, population admixture and segmental duplication. Recent large-scale surveys have revealed abundant structural variations in the human genome, including copy number variations (CNVs). This suggests that a significant number of SNPs must be within these regions, which may cause deviation from HWE.

Results

We performed a Bayesian analysis on the potential effect of copy number variation, segmental duplication and genotyping errors on the behavior of SNPs. Our results suggest that copy number variation is a major cause of HWE violation for SNPs with a small minor allele frequency, when the sample size is large and the genotyping error rate is 0–1%.

Conclusions

Our study provides the posterior probability that a SNP falls in a CNV or a segmental duplication, given the observed allele frequency of the SNP, sample size and the significance level of HWE testing.
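
For orientation, the HWE test that this abstract refers to can be sketched in a few lines of Python. This is a minimal illustration of the standard 1-df chi-square goodness-of-fit test on genotype counts, not the Bayesian analysis of the paper; the function name hwe_chi_square and the example counts are assumptions made for this sketch.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi2, p_value) for a 1-df chi-square test of Hardy-Weinberg equilibrium.

    Hypothetical helper for illustration. Assumes a biallelic SNP with both
    alleles present (a monomorphic SNP would give zero expected counts).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Genotype counts in exact HWE proportions give chi2 = 0.
    print(hwe_chi_square(25, 50, 25))
    # A complete heterozygote deficit is a strong departure from HWE.
    print(hwe_chi_square(50, 0, 50))
```

A SNP would typically be flagged when p_value falls below the chosen significance level, which is the quantity the posterior probability in the abstract is conditioned on.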

3.

When can noninvasive samples provide sufficient information in conservation genetics studies?

O. Smith, J. Wang. Molecular Ecology Resources, 2014, 14(5):1011-1023

Noninvasive sampling, of faeces and hair for example, has enabled many genetic studies of wildlife populations. However, two prevailing problems common to these studies are small sample sizes and high genotyping errors. The first problem stems from the difficulty in collecting noninvasive samples, particularly from populations of rare or elusive species, and the second is caused by the low quantity and quality of DNA extracted from a noninvasive sample. A common question is therefore whether noninvasive sampling provides sufficient information for the analyses commonly conducted in conservation genetics studies. Here, we conducted a simulation study to investigate the effect of small sample sizes and genotyping errors on the precision and accuracy of the most commonly estimated genetic parameters. Our results indicate that small sample sizes cause little bias in measures of expected heterozygosity, pairwise F_ST and population structure, but a large downward bias in estimates of allelic diversity. Allelic dropouts and false alleles had a much smaller effect than missing data, which effectively reduces sample size further. Overall, reasonable estimates of genetic variation and population subdivision are obtainable from noninvasive samples as long as error rates are kept below a frequency of 0.2. Similarly, unbiased estimates of population clustering can be made with genotyping error rates below 0.5 when the populations are highly differentiated. These results provide a useful guide for researchers faced with studying the conservation genetics of small, endangered populations from noninvasive samples.

4.

Estimation of genotyping error rate from repeat genotyping, unintentional recaptures and known parent–offspring comparisons in 16 microsatellite loci for brown rockfish (Sebastes auriculatus)

Maureen A. Hess, James G. Rhydderch, Larry L. LeClair, Raymond M. Buckley, Mitsuhiro Kawase, Lorenz Hauser. Molecular Ecology Resources, 2012, 12(6):1114-1123

Genotyping errors are present in almost all genetic data and can affect biological conclusions of a study, particularly for studies based on individual identification and parentage. Many statistical approaches can incorporate genotyping errors, but usually need accurate estimates of error rates. Here, we used a new microsatellite data set developed for brown rockfish (Sebastes auriculatus) to estimate genotyping error using three approaches: (i) repeat genotyping 5% of samples, (ii) comparing unintentionally recaptured individuals and (iii) Mendelian inheritance error checking for known parent–offspring pairs. In each data set, we quantified genotyping error rate per allele due to allele drop‐out and false alleles. Genotyping error rate per locus revealed an average overall genotyping error rate by direct count of 0.3%, 1.5% and 1.7% (0.002, 0.007 and 0.008 per allele error rate) from replicate genotypes, known parent–offspring pairs and unintentionally recaptured individuals, respectively. By direct‐count error estimates, the recapture and known parent–offspring data sets revealed an error rate four times greater than estimated using repeat genotypes. There was no evidence of correlation between error rates and locus variability for all three data sets, and errors appeared to occur randomly over loci in the repeat genotypes, but not in recaptures and parent–offspring comparisons. Furthermore, there was no correlation in locus‐specific error rates between any two of the three data sets. Our data suggest that repeat genotyping may underestimate true error rates and may not estimate locus‐specific error rates accurately. We therefore suggest using methods for error estimation that correspond to the overall aim of the study (e.g. known parent–offspring comparisons in parentage studies).
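
A direct-count estimate from repeat genotyping, as in approach (i), can be sketched as follows. This is an illustrative Python sketch, not the authors' procedure: the function names and the toy allele sizes are assumptions, and a mismatch between two replicates only shows that at least one of them is wrong, not which one.

```python
from collections import Counter


def allele_mismatches(g1, g2):
    """Number of alleles (0, 1 or 2) that differ between two diploid genotypes.

    Genotypes are 2-tuples of allele labels; allele order is ignored.
    """
    shared = sum((Counter(g1) & Counter(g2)).values())
    return 2 - shared


def direct_count_error_rates(replicate_pairs):
    """Return (per_allele_rate, per_genotype_rate) from replicate genotype pairs.

    replicate_pairs: iterable of (genotype_a, genotype_b) for the same
    individual and locus, typed twice (hypothetical input layout).
    """
    pairs = list(replicate_pairs)
    mismatched_alleles = sum(allele_mismatches(a, b) for a, b in pairs)
    mismatched_genotypes = sum(1 for a, b in pairs if allele_mismatches(a, b) > 0)
    per_allele = mismatched_alleles / (2 * len(pairs))
    per_genotype = mismatched_genotypes / len(pairs)
    return per_allele, per_genotype


if __name__ == "__main__":
    pairs = [
        ((100, 104), (100, 104)),  # identical calls
        ((100, 104), (100, 100)),  # one allele differs, e.g. a dropout
        ((98, 98), (98, 98)),      # identical calls
        ((102, 106), (106, 102)),  # same genotype, alleles listed in reverse
    ]
    print(direct_count_error_rates(pairs))
```

The per-allele and per-genotype rates differ by roughly a factor of two when most mismatches involve a single allele, which matches the relation between the percentage and per-allele figures quoted in the abstract.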

5.

Importance of a pilot study for non-invasive genetic sampling: genotyping errors and population size estimation in red deer

Nathaniel Valière, Christophe Bonenfant, Carole Toïgo, Gordon Luikart, Jean-Michel Gaillard, François Klein. Conservation Genetics, 2007, 8(1):69-78

Population size information is critical for managing endangered or harvested populations. Population size can now be estimated from non-invasive genetic sampling. However, pitfalls remain such as genotyping errors (allele dropout and false alleles at microsatellite loci). To evaluate the feasibility of non-invasive sampling (e.g., for population size estimation), a pilot study is required. Here, we present a pilot study consisting of (i) a genetic step to test loci amplification and to estimate allele frequencies and genotyping error rates when using faecal DNA, and (ii) a simulation step to quantify and minimise the effects of errors on estimates of population size. The pilot study was conducted on a population of red deer in a fenced natural area of 5440 ha, in France. Twelve microsatellite loci were tested for amplification and genotyping errors. The genotyping error rates for microsatellite loci were 0–0.83 (mean=0.2) for allele dropout rates and 0–0.14 (mean=0.02) for false allele rates, comparable to rates encountered in other non-invasive studies. Simulation results suggest we must conduct 6 PCR amplifications per sample (per locus) to achieve approximately 97% correct genotypes. The 3% error rate appears to have little influence on the accuracy and precision of population size estimation. This paper illustrates the importance of conducting a pilot study (including genotyping and simulations) when using non-invasive sampling to study threatened or managed populations.

6.

The effect that genotyping errors have on the robustness of common linkage-disequilibrium measures

Akey JM, Zhang K, Xiong M, Doris P, Jin L. American Journal of Human Genetics, 2001, 68(6):1447-1456

The rapid development of a dense single-nucleotide-polymorphism marker map has stimulated numerous studies attempting to characterize the magnitude and distribution of background linkage disequilibrium (LD) within and between human populations. Although genotyping errors are an inherent problem in all LD studies, there have been few systematic investigations documenting their consequences on estimates of background LD. Therefore, we derived simple deterministic formulas to investigate the effect that genotyping errors have on four commonly used LD measures (D', r, Q, and d) in studies of background LD. We have found that genotyping error rates as small as 3% can have serious effects on these LD measures, depending on the allele frequencies and the assumed error model. Furthermore, we compared the robustness of D', r, Q, and d in the presence of genotyping errors. In general, Q and d are more robust than D' and r, although exceptions do exist. Finally, through stochastic simulations, we illustrate how genotyping errors can lead to erroneous inferences when measures of LD between two samples are compared.
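
Two of the four measures, D' and r, have widely used textbook definitions in terms of haplotype and allele frequencies, sketched below in Python. The sketch does not reproduce the paper's error-model formulas, and Q and d are left out because the abstract does not define them; the function name ld_measures and the example frequencies are assumptions.

```python
from math import sqrt


def ld_measures(p_ab, p_a, p_b):
    """Return (D, |D'|, r) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of alleles A and B (both strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    # D is normalised by its largest attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r = d / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r


if __name__ == "__main__":
    # Complete LD: allele A always travels with allele B.
    print(ld_measures(0.5, 0.5, 0.5))
    # Linkage equilibrium: the haplotype frequency is the product of allele frequencies.
    print(ld_measures(0.25, 0.5, 0.5))
```

Feeding such a function with haplotype frequencies perturbed by a chosen misclassification model is one simple way to explore the sensitivity that the abstract describes.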

7.

Detection rates for genotyping errors in SNPs using the trio design

Geller F, Ziegler A. Human Heredity, 2002, 54(3):111-117

One well-known approach for the analysis of transmission-disequilibrium is the investigation of single nucleotide polymorphisms (SNPs) in trios consisting of an affected child and its parents. Results may be biased by erroneously given genotypes. Various causes, among them sample swaps or a wrong pedigree structure, are possible sources of biased results. As these can be partly ruled out by good study conditions together with checks for correct pedigree structure by a series of independent markers, the remaining main cause for errors is genotyping errors. Some of the errors can be detected by Mendelian checks whilst others are compatible with the pedigree structure. The extent of genotyping errors can be estimated by investigating the rate of detected genotyping errors by Mendelian checks. In many studies only one SNP of a specific genomic region is investigated by TDT, which leaves Mendelian checks as the only tool to control genotyping errors. From the rate of detected errors the true error rate can be estimated. Gordon et al. [Hum Hered 1999;49:65-70] considered the case of genotyping errors that occur randomly and independently with some fixed probability for the wrong ascertainment of an allele. In practice, instead of single alleles, SNP genotypes are determined. Therefore, we study the proportion of detected errors (detection rate) based on genotypes. In contrast to Gordon et al., who reported detection rates between 25 and 30%, we obtain higher detection rates ranging from 39 up to 61% considering likely error structures in the data. We conclude that detection rates are probably substantially higher than those reported by Gordon et al.
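
The Mendelian check on a trio can be made concrete with a short Python sketch. Genotypes are coded here as the number of copies of one allele (0, 1 or 2), which is an encoding chosen for this illustration; the function names are likewise assumptions, and the sketch only flags inconsistent trios without modelling the error structures studied in the paper.

```python
def transmittable(genotype):
    """Alleles (0 = A, 1 = B) that a parent with this B-allele count can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from one allele of each parent."""
    return any(
        f + m == child
        for f in transmittable(father)
        for m in transmittable(mother)
    )


def flagged_fraction(trios):
    """Fraction of (father, mother, child) trios that fail the Mendelian check."""
    trios = list(trios)
    flagged = sum(1 for f, m, c in trios if not is_mendelian_consistent(f, m, c))
    return flagged / len(trios)


if __name__ == "__main__":
    trios = [
        (0, 0, 1),  # AA x AA cannot produce AB: detected
        (0, 2, 1),  # AA x BB must produce AB: consistent
        (1, 1, 2),  # AB x AB can produce BB: consistent
        (0, 2, 0),  # AA x BB cannot produce AA: detected
    ]
    print(flagged_fraction(trios))
```

An error that turns a child's true genotype into another genotype still compatible with the parents, such as AB read as BB when both parents are AB, passes this check, which is why the detection rate stays well below 100%.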

8.

Evaluation of genotyping concordance for commercial bovine SNP arrays using quality‐assurance samples

X.‐L. Wu, J. Xu, H. Li, R. Ferretti, J. He, J. Qiu, Q. Xiao, B. Simpson, T. Michell, S. D. Kachman, R. G. Tait, S. Bauck. Animal Genetics, 2019, 50(4):367-371

SNP arrays are widely used in genetic research and agricultural genomics applications, and the quality of SNP genotyping data is of paramount importance. In the present study, SNP genotyping concordance and discordance were evaluated for commercial bovine SNP arrays based on two types of quality assurance (QA) samples provided by Neogen GeneSeek. The genotyping discordance rates (GDRs) between chips were on average between 0.06% and 0.37% based on the QA type I data and between 0.05% and 0.15% based on the QA type II data. The average genotyping error rate (GER) pertaining to single SNP chips, based on the QA type II data, varied between 0.02% and 0.08% per SNP and between 0.01% and 0.06% per sample. These results indicate that genotyping concordance rate was high (i.e. from 99.63% to 99.99%). Nevertheless, mitochondrial and Y chromosome SNPs had considerably elevated GDRs and GERs compared to the SNPs on the 29 autosomes and X chromosome. The majority of genotyping errors resulted from single allotyping errors, which also included the opposite instances for allele ‘dropout’ (i.e. from AB to AA or BB). Simultaneous allotyping errors on both alleles (e.g. mistaking AA for BB or vice versa) were relatively rare. Finally, a list of SNPs with a GER greater than 1% is provided. Interpretation of association effects of these SNPs, for example in genome‐wide association studies, needs to be taken with caution. The genotyping concordance information needs to be considered in the optimal design of future bovine SNP arrays.
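
A discordance rate of the kind reported here can be computed by comparing the calls for one sample on two chips. The Python sketch below assumes genotype calls given as strings such as "AA", "AB", "BB", with None for a no-call; the function name, the input layout and the choice to skip no-calls are illustrative assumptions and not the procedure of the study.

```python
def discordance_rate(calls_a, calls_b):
    """Fraction of SNPs called on both chips whose genotype calls disagree.

    calls_a, calls_b: equal-length sequences of genotype calls for the same
    sample and the same SNPs; None marks a missing call and is skipped.
    """
    compared = 0
    discordant = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue  # SNPs missing on either chip carry no information
        compared += 1
        if a != b:
            discordant += 1
    if compared == 0:
        raise ValueError("no SNPs were called on both chips")
    return discordant / compared


if __name__ == "__main__":
    chip_1 = ["AA", "AB", "BB", None, "AB"]
    chip_2 = ["AA", "AA", "BB", "AB", "AB"]
    gdr = discordance_rate(chip_1, chip_2)
    print(f"discordance {gdr:.2%}, concordance {1 - gdr:.2%}")
```

Concordance is simply one minus the discordance rate, so a GDR of 0.37% corresponds to the 99.63% concordance quoted in the abstract.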

9.

Effect of genotyping error in model-free linkage analysis using microsatellite or single-nucleotide polymorphism marker maps

Thompson CL, Baechle D, Lu Q, Mathew G, Song Y, Iyengar SK, Gray-McGuire C, Goddard KA. BMC Genetics, 2005, 6(Z1):S153

Errors while genotyping are inevitable and can reduce the power to detect linkage. However, does genotyping error have the same impact on linkage results for single-nucleotide polymorphism (SNP) and microsatellite (MS) marker maps? To evaluate this question we detected genotyping errors that are consistent with Mendelian inheritance using large changes in multipoint identity-by-descent sharing in neighboring markers. Only a small fraction of Mendelian consistent errors were detectable (e.g., 18% of MS and 2.4% of SNP genotyping errors). More SNP genotyping errors are Mendelian consistent compared to MS genotyping errors, so genotyping error may have a greater impact on linkage results using SNP marker maps. We also evaluated the effect of genotyping error on the power and type I error rate using simulated nuclear families with missing parents under 0, 0.14, and 2.8% genotyping error rates. In the presence of genotyping error, we found that the power to detect a true linkage signal was greater for SNP (75%) than MS (67%) marker maps, although there were also slightly more false-positive signals using SNP marker maps (5 compared with 3 for MS). Finally, we evaluated the usefulness of accounting for genotyping error in the SNP data using a likelihood-based approach, which restores some of the power that is lost when genotyping error is introduced.

10.

Quality control genotyping for assessment of genetic identity and purity in diverse tropical maize inbred lines

Kassa Semagn, Yoseph Beyene, Dan Makumbi, Stephen Mugo, B. M. Prasanna, Cosmos Magorokosho, Gary Atlin. TAG. Theoretical and Applied Genetics. Theoretische und angewandte Genetik, 2012, 125(7):1487-1501

Quality control (QC) genotyping is an important component in breeding, but to our knowledge there are no well-established protocols for its implementation in practical breeding programs. The objectives of our study were to (a) ascertain genetic identity among 2–4 seed sources of the same inbred line, (b) evaluate the extent of genetic homogeneity within inbred lines, and (c) identify a subset of highly informative single-nucleotide polymorphism (SNP) markers for routine and low-cost QC genotyping and suggest guidelines for data interpretation. We used a total of 28 maize inbred lines to study genetic identity among different seed sources by genotyping them with 532 and 1,065 SNPs using the KASPar and GoldenGate platforms, respectively. An additional set of 544 inbred lines was used for studying genetic homogeneity. The proportion of alleles that differed between seed sources of the same inbred line varied from 0.1 to 42.3%. Seed sources exhibiting high levels of genetic distance are mis-labeled, while those with lower levels of difference are contaminated or still segregating. Genetic homogeneity varied from 68.7 to 100%, with 71.3% of the inbred lines considered to be homogeneous. Based on the data sets obtained for a wide range of sample sizes and diverse genetic backgrounds, we recommended a subset of 50–100 SNPs for routine and low-cost QC genotyping, verified them in a different set of double haploid and inbred lines, and outlined a protocol that could be used to minimize errors in genetic analyses and breeding.

11.

Simple and efficient analysis of disease association with missing genotype data

Lin DY, Hu Y, Huang BE. American Journal of Human Genetics, 2008, 82(2):444-452

Missing genotype data arise in association studies when the single-nucleotide polymorphisms (SNPs) on the genotyping platform are not assayed successfully, when the SNPs of interest are not on the platform, or when total sequence variation is determined only on a small fraction of individuals. We present a simple and flexible likelihood framework to study SNP-disease associations with such missing genotype data. Our likelihood makes full use of all available data in case-control studies and reference panels (e.g., the HapMap), and it properly accounts for the biased nature of the case-control sampling as well as the uncertainty in inferring unknown variants. The corresponding maximum-likelihood estimators for genetic effects and gene-environment interactions are unbiased and statistically efficient. We developed fast and stable numerical algorithms to calculate the maximum-likelihood estimators and their variances, and we implemented these algorithms in a freely available computer program. Simulation studies demonstrated that the new approach is more powerful than existing methods while providing accurate control of the type I error. An application to a case-control study on rheumatoid arthritis revealed several loci that deserve further investigations.

12.

Finding the right coverage: the impact of coverage and sequence quality on single nucleotide polymorphism genotyping error rates

Emily D. Fountain, Jonathan N. Pauli, Brendan N. Reid, Per J. Palsbøll, M. Zachariah Peery. Molecular Ecology Resources, 2016, 16(4):966-978

Restriction‐enzyme‐based sequencing methods enable the genotyping of thousands of single nucleotide polymorphism (SNP) loci in nonmodel organisms. However, in contrast to traditional genetic markers, genotyping error rates in SNPs derived from restriction‐enzyme‐based methods remain largely unknown. Here, we estimated genotyping error rates in SNPs genotyped with double digest RAD sequencing from Mendelian incompatibilities in known mother–offspring dyads of Hoffman's two‐toed sloth (Choloepus hoffmanni) across a range of coverage and sequence quality criteria, for both reference‐aligned and de novo‐assembled data sets. Genotyping error rates were more sensitive to coverage than sequence quality and low coverage yielded high error rates, particularly in de novo‐assembled data sets. For example, coverage ≥5 yielded median genotyping error rates of ≥0.03 and ≥0.11 in reference‐aligned and de novo‐assembled data sets, respectively. Genotyping error rates declined to ≤0.01 in reference‐aligned data sets with a coverage ≥30, but remained ≥0.04 in the de novo‐assembled data sets. We observed approximately 10‐ and 13‐fold declines in the number of loci sampled in the reference‐aligned and de novo‐assembled data sets, respectively, when coverage was increased from ≥5 to ≥30 at quality score ≥30. Finally, we assessed the effects of genotyping coverage on a common population genetic application, parentage assignments, and showed that the proportion of incorrectly assigned maternities was relatively high at low coverage. Overall, our results suggest that the trade‐off between sample size and genotyping error rates be considered prior to building sequencing libraries, that reporting genotyping error rates become standard practice, and that effects of genotyping errors on inference be evaluated in restriction‐enzyme‐based SNP studies.

13.

Estimation of genotype error rate using samples with pedigree information--an application on the GeneChip Mapping 10K array

Hao K, Li C, Rosenow C, Hung Wong W. Genomics, 2004, 84(4):623-630

Currently, most analytical methods assume all observed genotypes are correct; however, it is clear that errors may reduce statistical power or bias inference in genetic studies. We propose procedures for estimating error rate in genetic analysis and apply them to study the GeneChip Mapping 10K array, which is a technology that has recently become available and allows researchers to survey over 10,000 SNPs in a single assay. We employed a strategy to estimate the genotype error rate in pedigree data. First, the "dose-response" reference curve between error rate and the observable error number was derived by simulation, conditional on given pedigree structures and genotypes. Second, the error rate was estimated by calibrating the number of observed errors in real data to the reference curve. We evaluated the performance of this method by simulation study and applied it to a data set of 30 pedigrees genotyped using the GeneChip Mapping 10K array. This method performed favorably in all scenarios we surveyed. The dose-response reference curve was monotone and almost linear with a large slope. The method was able to estimate accurately the error rate under various pedigree structures and error models and under heterogeneous error rates. Using this method, we found that the average genotyping error rate of the GeneChip Mapping 10K array was about 0.1%. Our method provides a quick and unbiased solution to address the genotype error rate in pedigree data. It behaves well in a wide range of settings and can be easily applied in other genetic projects. The robust estimation of genotyping error rate allows us to estimate power and sample size and conduct unbiased genetic tests. The GeneChip Mapping 10K array has a low overall error rate, which is consistent with the results obtained from alternative genotyping assays.

14.

Comparison of F(ST) outlier tests for SNP loci under selection

Narum SR, Hess JE. Molecular Ecology Resources, 2011, 11(Z1):184-194

Genome scans with many genetic markers provide the opportunity to investigate local adaptation in natural populations and identify candidate genes under selection. In particular, SNPs are dense throughout the genome of most organisms and are commonly observed in functional genes making them ideal markers to study adaptive molecular variation. This approach has become commonly employed in ecological and population genetics studies to detect outlier loci that are putatively under selection. However, there are several challenges to address with outlier approaches including genotyping errors, underlying population structure and false positives, variation in mutation rate and limited sensitivity (false negatives). In this study, we evaluated multiple outlier tests and their type I (false positive) and type II (false negative) error rates in a series of simulated data sets. Comparisons included simulation procedures (FDIST2, ARLEQUIN v.3.5 and BAYESCAN) as well as more conventional tools such as global F(ST) histograms. Of the three simulation methods, FDIST2 and BAYESCAN typically had the lowest type II error, BAYESCAN had the least type I error and Arlequin had highest type I and II error. High error rates in Arlequin with a hierarchical approach were partially because of confounding scenarios where patterns of adaptive variation were contrary to neutral structure; however, Arlequin consistently had highest type I and type II error in all four simulation scenarios tested in this study. Given the results provided here, it is important that outlier loci are interpreted cautiously and error rates of various methods are taken into consideration in studies of adaptive molecular variation, especially when hierarchical structure is included.

15.

Power and sample size calculations for case-control genetic association tests when errors are present: application to single nucleotide polymorphisms

Gordon D, Finch SJ, Nothnagel M, Ott J. Human Heredity, 2002, 54(1):22-33

The purpose of this work is to quantify the effects that errors in genotyping have on power and the sample size necessary to maintain constant asymptotic Type I and Type II error rates (SSN) for case-control genetic association studies between a disease phenotype and a di-allelic marker locus, for example a single nucleotide polymorphism (SNP) locus. We consider the effects of three published models of genotyping errors on the chi-square test for independence in the 2 × 3 table. After specifying genotype frequencies for the marker locus conditional on disease status and error model in both a genetic model-based and a genetic model-free framework, we compute the asymptotic power to detect association through specification of the test's non-centrality parameter. This parameter determines the functional dependence of SSN on the genotyping error rates. Additionally, we study the dependence of SSN on linkage disequilibrium (LD), marker allele frequencies, and genotyping error rates for a dominant disease model. Increased genotyping error rate requires a larger SSN. Every 1% increase in the sum of genotyping error rates requires that both case and control SSN be increased by 2-8%, with the extent of increase dependent upon the error model. For the dominant disease model, SSN is a nonlinear function of LD and genotyping error rate, with greater SSN for lower LD and higher genotyping error rate. The combination of lower LD and higher genotyping error rates requires a larger SSN than the sum of the SSN for the lower LD and for the higher genotyping error rate.
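
The rule of thumb in this abstract (a 2-8% larger sample for every 1% in summed genotyping error rates) lends itself to a quick back-of-the-envelope calculation, sketched below in Python. The sketch assumes the increase scales linearly with the error sum, which is an extrapolation made for illustration and not a result stated by the authors; the function names and the example numbers are also assumptions.

```python
from math import ceil


def inflated_sample_size(n_error_free, error_sum, pct_increase_per_pct_error):
    """Sample size after a linear inflation for genotyping error (assumed model).

    n_error_free: sample size that would suffice with error-free genotypes
    error_sum: sum of genotyping error rates as a fraction (0.02 means 2%)
    pct_increase_per_pct_error: % increase in sample size per 1% of error
    """
    factor = 1 + pct_increase_per_pct_error * error_sum
    # round() guards against floating-point noise before rounding up
    return ceil(round(n_error_free * factor, 6))


def inflation_range(n_error_free, error_sum, low=2.0, high=8.0):
    """Bracket the required sample size using the 2-8% range from the abstract."""
    return (
        inflated_sample_size(n_error_free, error_sum, low),
        inflated_sample_size(n_error_free, error_sum, high),
    )


if __name__ == "__main__":
    # 1000 cases needed without error, 2% summed genotyping error
    print(inflation_range(1000, 0.02))
```

The same inflation would apply to the control group, since the abstract states that both case and control SSN must be increased.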

16.

Rapid SNP allele frequency determination in genomic DNA pools by pyrosequencing

Neve B, Froguel P, Corset L, Vaillant E, Vatin V, Boutin P. BioTechniques, 2002, 32(5):1138-1142

Individual genotyping of single nucleotide polymorphisms (SNPs) remains expensive, especially for linkage disequilibrium mapping strategies involving high-throughput SNP genotyping. On one hand, current methods may suit scientific and laboratory needs in regard to accuracy, reproducibility/robustness, and large-scale application. On the other hand, a cheaper and less time-consuming alternative to individual genotyping is the use of SNP allele frequencies determined in DNA pools. We have developed an accurate and reproducible protocol for allele frequency determination using Pyrosequencing technology in large genomic DNA pools (374 individuals). The measured correlation (R²) in large DNA pools was 0.980. In the context of disease-associated SNP studies, we compared the allele frequencies between the disease (e.g., type 2 diabetes and obesity) and control groups detected by either individual genotyping or Pyrosequencing of DNA pools. In large pools, the variation between the two methods was 1.5 ± 0.9%. It may be concluded that the allele frequency determination protocol could reliably detect over 4% differences between populations. The method is economical in regard to amounts of DNA, PCR, and primer extension reagents required. Furthermore, it allows the rapid determination of allele frequency differences in case/control groups for association studies and susceptibility gene discovery in complex diseases.

17.

A 34K SNP genotyping array for Populus trichocarpa: Design, application to the study of natural populations and transferability to other Populus species

A. Geraldes, S. P. DiFazio, G. T. Slavov, P. Ranjan, W. Muchero, J. Hannemann, L. E. Gunter, A. M. Wymore, C. J. Grassa, N. Farzaneh, I. Porth, A. D. McKown, O. Skyba, E. Li, M. Fujita, J. Klápště, J. Martin, W. Schackwitz, C. Pennacchio, D. Rokhsar, M. C. Friedmann, G. O. Wasteneys, R. D. Guy, Y. A. El‐Kassaby, S. D. Mansfield, Q. C. B. Cronk, J. Ehlting, C. J. Douglas, G. A. Tuskan. Molecular Ecology Resources, 2013, 13(2):306-323

Genetic mapping of quantitative traits requires genotypic data for large numbers of markers in many individuals. For such studies, the use of large single nucleotide polymorphism (SNP) genotyping arrays still offers the most cost‐effective solution. Herein we report on the design and performance of a SNP genotyping array for Populus trichocarpa (black cottonwood). This genotyping array was designed with SNPs pre‐ascertained in 34 wild accessions covering most of the species' latitudinal range. We adopted a candidate gene approach to the array design that resulted in the selection of 34 131 SNPs, the majority of which are located in, or within 2 kb of, 3543 candidate genes. A subset of the SNPs on the array (539) was selected based on patterns of variation among the SNP discovery accessions. We show that more than 95% of the loci produce high quality genotypes and that the genotyping error rate for these is likely below 2%. We demonstrate that even among small numbers of samples (n = 10) from local populations over 84% of loci are polymorphic. We also tested the applicability of the array to other species in the genus and found that the number of polymorphic loci decreases rapidly with genetic distance, with the largest numbers detected in other species in section Tacamahaca. Finally, we provide evidence for the utility of the array to address evolutionary questions such as intraspecific studies of genetic differentiation, species assignment and the detection of natural hybrids.

18.

GEL: a novel genotype calling algorithm using empirical likelihood

Nicolae DL, Wu X, Miyake K, Cox NJ. Bioinformatics (Oxford, England), 2006, 22(16):1942-1947

MOTIVATION: Preliminary results on the data produced using the Affymetrix large-scale genotyping platforms show that it is necessary to construct improved genotype calling algorithms. There is evidence that some of the existing algorithms lead to an increased error rate in heterozygous genotypes, and a disproportionately large rate of heterozygotes with missing genotypes. Non-random errors and missing data can lead to an increase in the number of false discoveries in genetic association studies. Therefore, the factors that need to be evaluated in assessing the performance of an algorithm are not only the missing data (call) and error rates, but also the heterozygous proportions in missing data and errors. RESULTS: We introduce a novel genotype calling algorithm (GEL) for the Affymetrix GeneChip arrays. The algorithm uses likelihood calculations that are based on distributions inferred from the observed data. A key ingredient in accurate genotype calling is weighting the information that comes from each probe quartet according to the quality/reliability of the data in the quartet, and prior information on the performance of the quartet. AVAILABILITY: The GEL software is implemented in R and is available by request from the corresponding author at nicolae@galton.uchicago.edu.

19.

High fidelity of whole-genome amplified DNA on high-density single nucleotide polymorphism arrays

Xing J, Watkins WS, Zhang Y, Witherspoon DJ, Jorde LB. Genomics, 2008, 92(6):452-456

Current microarray technology allows researchers to genotype a large number of SNPs with relatively small amounts of DNA. Nevertheless, researchers and clinicians still frequently face the problem of acquiring enough high-quality DNA for analysis. Whole-genome amplification (WGA) methods offer a solution for this problem, and earlier studies have shown that WGA samples perform reasonably well in small-scale genetic analyses (e.g. Affymetrix 10K array). To determine the performance of WGA products on a large-scale genotyping array, we compared the Affymetrix 250K array genotyping results of genomic DNA and their WGA products from four individuals. Our results indicate that WGA product performs well on the 250K array compared to genomic DNA, especially when using the BRLMM calling algorithm. WGA samples have high call rates (97.5% on average, compared to 99.4% for genomic DNA) and excellent concordance rates with their corresponding genomic DNA samples (98.7% on average). In addition, no apparent systematic genomic amplification bias can be detected. This study demonstrates that, although there is a slight decrease in the total call rates, WGA methods provide a reliable approach for increasing the amount of DNA samples for use with a common SNP genotyping array.

20.

Paternity validation and estimation of genotyping error rate for the BovineSNP50 BeadChip

J. I. Weller, G. Glick, E. Ezra, Y. Zeron, E. Seroussi, M. Ron. Animal Genetics, 2010, 41(5):551-553

Incorrect paternity assignment in cattle can have a major effect on rates of genetic gain. Of the 576 Israeli Holstein bulls genotyped by the BovineSNP50 BeadChip, there were 204 bulls for which the father was also genotyped. The results of 38 828 valid single nucleotide polymorphisms (SNPs) were used to validate paternity, determine the genotyping error rates and determine criteria enabling deletion of defective SNPs from further analysis. Based on the criterion of >2% conflicts between the genotype of the putative sire and son, paternity was rejected for seven bulls (3.5%). The remaining bulls had fewer conflicts by one or two orders of magnitude. Excluding these seven bulls, all other discrepancies between sire and son genotypes are assumed to be caused by genotyping mistakes. The frequency of discrepancies was >0.07 for nine SNPs, and >0.025 for 81 SNPs. The overall frequency of discrepancies was reduced from 0.00017 to 0.00010 after deletion of these 81 SNPs, and the total expected fraction of genotyping errors was estimated to be 0.05%.  Paternity of bulls that are genotyped for genomic selection may be verified or traced against candidate sires at virtually no additional cost.
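
The sire-son conflict criterion can be illustrated with a short Python sketch. It assumes that a conflict means opposite homozygotes (sire AA with son BB, or the reverse), which is the only incompatibility detectable when the dam is not genotyped; the abstract does not spell out its definition, so this reading, the 0/1/2 genotype coding and the function names are all assumptions. The 2% threshold is the one quoted in the abstract.

```python
def conflict_rate(sire, son):
    """Fraction of jointly called SNPs where sire and son are opposite homozygotes.

    sire, son: equal-length sequences of genotypes coded as the number of
    copies of one allele (0, 1 or 2); None marks a missing call.
    """
    compared = 0
    conflicts = 0
    for s, o in zip(sire, son):
        if s is None or o is None:
            continue
        compared += 1
        if {s, o} == {0, 2}:  # the son cannot have inherited an allele from this sire
            conflicts += 1
    if compared == 0:
        raise ValueError("no SNPs were called in both animals")
    return conflicts / compared


def paternity_rejected(sire, son, threshold=0.02):
    """Reject paternity when the conflict rate exceeds the threshold (2% in the abstract)."""
    return conflict_rate(sire, son) > threshold


if __name__ == "__main__":
    sire = [0, 2, 1, 0, 2]
    son = [2, 2, 1, 1, None]
    print(conflict_rate(sire, son), paternity_rejected(sire, son))
```

For accepted sire-son pairs, the remaining conflicts per SNP are what the abstract treats as genotyping mistakes, so the same counts tallied by SNP rather than by pair would flag defective markers.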