Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Few independent data are available about how well single nucleotide polymorphism (SNP) genotyping technologies perform in the typical molecular genetics laboratory. We evaluated the utility and accuracy of a widely used technology, template-directed dye-terminator incorporation with fluorescence-polarization detection (FP-TDI), in a sample of 177 SNPs selected solely on the basis of map location. Genotypes were generated without optimization using standard protocols. Overall, 81% of the SNPs we studied generated readable genotypes by FP-TDI. Thirty-two SNPs were genotyped in duplicate by PCR-RFLP or fluorescent dye-terminator sequencing. Out of a total of 631 duplicate genotypes, no true discrepancies were detected. The true error rate has a 95% chance of lying between 0 and 6 out of 1000 genotypes. We also tested for deviations from Hardy-Weinberg Equilibrium in 33 SNPs genotyped in 50 unrelated individuals, and no significant deviations were detected. Our FP-TDI data were readily adaptable to automated genotype calling using our own method of cluster analysis, which assigns a probability score to each genotype call. We conclude that FP-TDI is both efficient and accurate. The method can easily fill the needs of SNP genotyping projects at the scale typically used for regional or candidate-gene association studies.
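
The bound of 6 per 1000 quoted in this abstract can be reproduced with a short calculation. The sketch below assumes an exact binomial argument for zero observed errors among n independent duplicate genotypes; the abstract does not say which interval method the authors used, so this is an illustration rather than their procedure, and the function name is hypothetical.

```python
def upper_bound_zero_errors(n, tail_prob):
    """Largest error rate p still compatible with seeing 0 errors in n trials.

    Solves (1 - p) ** n == tail_prob for p (exact binomial, zero events).
    """
    return 1.0 - tail_prob ** (1.0 / n)


n_duplicates = 631  # duplicate genotypes reported in the abstract

# Upper end of a two-sided 95% interval (2.5% in the upper tail).
two_sided_upper = upper_bound_zero_errors(n_duplicates, 0.025)

# One-sided 95% bound, shown for comparison.
one_sided_upper = upper_bound_zero_errors(n_duplicates, 0.05)

print(f"two-sided upper bound: {two_sided_upper * 1000:.1f} per 1000")
print(f"one-sided upper bound: {one_sided_upper * 1000:.1f} per 1000")
```

Under these assumptions the two-sided bound comes to about 5.8 errors per 1000 genotypes, which rounds to the 6 per 1000 stated above.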

2.
This investigation was undertaken to assess the sensitivity and specificity of the genotyping error detection function of the computer program SIMWALK2. We chose to examine chromosome 22, which had 7 microsatellite markers, from a single simulated replicate (330 pedigrees with a pattern of missing genotype data similar to the Framingham families). We created genotype errors at five overall frequencies (0.0, 0.025, 0.050, 0.075, and 0.100) and applied SIMWALK2 to each of these five data sets, respectively assuming that the total error rate (specified in the program) was at each of these same five levels. In this data set, up to an assumed error rate of 10%, only 50% of the Mendelian-consistent mistypings were found under any level of true errors. Because as many as 70% of the errors detected were false positives, blanking suspect genotypes (at any error probability) will result in a reduction of statistical power due to the concomitant blanking of correctly typed alleles. This work supports the conclusion that allowing for genotyping errors within likelihood calculations during statistical analysis may be preferable to choosing an arbitrary cut-off.

3.
A statistical framework for quantitative trait mapping   (total citations: 39; self-citations: 0; citations by others: 39)
Sen S  Churchill GA 《Genetics》2001,159(1):371-387
We describe a general statistical framework for the genetic analysis of quantitative trait data in inbred line crosses. Our main result is based on the observation that, by conditioning on the unobserved QTL genotypes, the problem can be split into two statistically independent and manageable parts. The first part involves only the relationship between the QTL and the phenotype. The second part involves only the location of the QTL in the genome. We developed a simple Monte Carlo algorithm to implement Bayesian QTL analysis. This algorithm simulates multiple versions of complete genotype information on a genomewide grid of locations using information in the marker genotype data. Weights are assigned to the simulated genotypes to capture information in the phenotype data. The weighted complete genotypes are used to approximate quantities needed for statistical inference of QTL locations and effect sizes. One advantage of this approach is that only the weights are recomputed as the analyst considers different candidate models. This device allows the analyst to focus on modeling and model comparisons. The proposed framework can accommodate multiple interacting QTL, nonnormal and multivariate phenotypes, covariates, missing genotype data, and genotyping errors in any type of inbred line cross. A software tool implementing this procedure is available. We demonstrate our approach to QTL analysis using data from a mouse backcross population that is segregating multiple interacting QTL associated with salt-induced hypertension.

4.
Universal SNP genotyping assay with fluorescence polarization detection   (total citations: 42; self-citations: 0; citations by others: 42)
Hsu TM  Chen X  Duan S  Miller RD  Kwok PY 《BioTechniques》2001,31(3):560, 562, 564-560,8, passim
The degree of fluorescence polarization (FP) of a fluorescent molecule is a reflection of its molecular weight (Mr). FP is therefore a useful detection method for homogeneous assays in which the starting reagents and products differ significantly in Mr. We have previously shown that FP is a good detection method for the single-base extension and the 5'-nuclease assays. In this report, we describe a universal, optimized single-base extension assay for genotyping single nucleotide polymorphisms (SNPs). This assay, which we named the template-directed dye-terminator incorporation assay with fluorescence polarization detection (FP-TDI), uses four spectrally distinct dye terminators to achieve universal assay conditions. Even without optimization, approximately 70% of all SNP markers tested yielded robust assays. The addition of an E. coli ssDNA-binding protein just before the FP reading significantly increased FP values of the products and brought the success rate of FP-TDI assays up to 90%. Increasing the amount of dye terminators and reducing the number of thermal cycles in the single-base extension step of the assay increased the separation of the FP values between the products corresponding to different genotypes and improved the success rate of the assay to 100%. In this study the genomic DNA samples of 90 individuals were typed for a total of 38 FP-TDI assays (using both the sense and antisense TDI primers for 19 SNP markers). With the previously described modifications, the FP-TDI assay gave unambiguous genotyping data for all the samples tested in the 38 FP-TDI assays. When the genotypes determined by the FP-TDI and 5'-nuclease assays were compared, they were in 100% concordance for all experiments (a total of 3420 genotypes). The four-dye-terminator master mixture described here can be used for assaying any SNP marker and greatly simplifies the SNP genotyping assay design.

5.
High throughput SNP genotyping with two mini-sequencing assays   (total citations: 4; self-citations: 0; citations by others: 4)
Single nucleotide polymorphisms (SNPs) are very important markers that can be used in many areas such as evolutionary genetics [1], disease-susceptibility genes [2,3], personalized medicine and forensics. Only about 20% of human polymorphisms are length polymorphisms, whereas about 80% of human polymorphisms are SNPs. Kruglyak et al. [4] reported that there were about 11,000,000 SNPs in the world population. There are many kinds of SNP genotyping technology [5,6]: some are only suitable to low …

6.
MOTIVATION: Preliminary results on the data produced using the Affymetrix large-scale genotyping platforms show that it is necessary to construct improved genotype calling algorithms. There is evidence that some of the existing algorithms lead to an increased error rate in heterozygous genotypes, and a disproportionately large rate of heterozygotes with missing genotypes. Non-random errors and missing data can lead to an increase in the number of false discoveries in genetic association studies. Therefore, the factors that need to be evaluated in assessing the performance of an algorithm are not only the missing data (call) and error rates, but also the heterozygous proportions in missing data and errors. RESULTS: We introduce a novel genotype calling algorithm (GEL) for the Affymetrix GeneChip arrays. The algorithm uses likelihood calculations that are based on distributions inferred from the observed data. A key ingredient in accurate genotype calling is weighting the information that comes from each probe quartet according to the quality/reliability of the data in the quartet, and prior information on the performance of the quartet. AVAILABILITY: The GEL software is implemented in R and is available by request from the corresponding author at nicolae@galton.uchicago.edu.

7.
In genetic association studies, tests for Hardy-Weinberg proportions are often employed as a quality control checking procedure. Missing genotypes are typically discarded prior to testing. In this paper we show that inference for Hardy-Weinberg proportions can be biased when missing values are discarded. We propose to use multiple imputation of missing values in order to improve inference for Hardy-Weinberg proportions. For imputation we employ a multinomial logit model that uses information from allele intensities and/or neighbouring markers. Analysis of an empirical data set of single nucleotide polymorphisms possibly related to colon cancer reveals that missing genotypes are not missing completely at random. Deviation from Hardy-Weinberg proportions is mostly due to a lack of heterozygotes. Inbreeding coefficients estimated by multiple imputation of the missing values are typically lowered with respect to inbreeding coefficients estimated by discarding the missing values. Accounting for missing values by multiple imputation qualitatively changed the results of 10 to 17% of the statistical tests performed. Estimates of inbreeding coefficients obtained by multiple imputation showed high correlation with estimates obtained by single imputation using an external reference panel. Our conclusion is that imputation of missing data leads to improved statistical inference for Hardy-Weinberg proportions.
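
For readers unfamiliar with the quantities named in this abstract, the following minimal sketch computes a Pearson chi-square statistic for Hardy-Weinberg proportions and a simple inbreeding coefficient from genotype counts at one biallelic marker. It assumes complete data with both alleles present and does not implement the multiple-imputation procedure the abstract proposes; the function name and the example counts are hypothetical.

```python
def hwe_summary(n_aa, n_ab, n_bb):
    """Return (allele frequency p, chi-square statistic, inbreeding coefficient F).

    Assumes a biallelic marker with both alleles observed, so that all
    expected genotype counts are positive.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # F > 0 indicates fewer heterozygotes than expected under Hardy-Weinberg.
    f = 1.0 - n_ab / expected[1]
    return p, chi2, f


# Hypothetical counts with a heterozygote deficit.
p, chi2, f = hwe_summary(40, 20, 40)
print(p, chi2, f)
```

The chi-square statistic has one degree of freedom here, so values above roughly 3.84 would be declared significant at the 5% level.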

8.
Inferring the haplotypes of the members of a pedigree from their genotypes has been extensively studied. However, most studies do not consider genotyping errors and de novo mutations. In this paper, we study how to infer haplotypes from genotype data that may contain genotyping errors, de novo mutations, and missing alleles. We assume that there are no recombinants in the genotype data, which is usually true for tightly linked markers. We introduce a combinatorial optimization problem, called haplotype configuration with mutations and errors (HCME), which calls for haplotype configurations consistent with the given genotypes that incur no recombinants and require the minimum number of mutations and errors. HCME is NP-hard. To solve the problem, we propose a heuristic algorithm, the core of which is an integer linear program (ILP) using the system of linear equations over Galois field GF(2). Our algorithm can detect and locate genotyping errors that cannot be detected by simply checking the Mendelian law of inheritance. The algorithm also offers error correction in genotypes/haplotypes rather than just detecting inconsistencies and deleting the involved loci. Our experimental results show that the algorithm can infer haplotypes with a very high accuracy and recover 65%-94% of genotyping errors depending on the pedigree topology.

9.
Single nucleotide polymorphisms (SNPs) represent the most common form of DNA sequence variation in mammalian livestock genomes. While the past decade has witnessed major advances in SNP genotyping technologies, genotyping errors caused, in part, by the biochemistry underlying the genotyping platform used, can occur. These errors can distort project results and conclusions and can result in incorrect decisions in animal management and breeding programs; hence, SNP genotype calls must be accurate and reliable. In this study, 263 Bos spp. samples were genotyped commercially for a total of 16 SNPs. Of the total possible 4,208 SNP genotypes, 4,179 SNP genotypes were generated, yielding a genotype call rate of 99.31% (standard deviation ± 0.93%). Between 110 and 263 samples were subsequently re-genotyped by us for all 16 markers using a custom-designed SNP genotyping platform, and of the possible 3,819 genotypes a total of 3,768 genotypes were generated (98.70% genotype call rate, SD ± 1.89%). A total of 3,744 duplicate genotypes were generated for both genotyping platforms, and comparison of the genotype calls for both methods revealed 3,741 concordant SNP genotype calls (99.92% SNP genotype concordance rate). These data indicate that both genotyping methods used can provide livestock geneticists with reliable, reproducible SNP genotypic data for in-depth statistical analysis.

10.
A variety of statistical methods exist for detecting haplotype-disease association through use of genetic data from a case-control study. Since such data often consist of unphased genotypes (resulting in haplotype ambiguity), such statistical methods typically apply the expectation-maximization (EM) algorithm for inference. However, the majority of these methods fail to perform inference on the effect of particular haplotypes or haplotype features on disease risk. Since such inference is valuable, we develop a retrospective likelihood for estimating and testing the effects of specific features of single-nucleotide polymorphism (SNP)-based haplotypes on disease risk using unphased genotype data from a case-control study. Our proposed method has a flexible structure that allows, among other choices, modeling of multiplicative, dominant, and recessive effects of specific haplotype features on disease risk. In addition, our method relaxes the requirement of Hardy-Weinberg equilibrium of haplotype frequencies in case subjects, which is typically required of EM-based haplotype methods. Also, our method easily accommodates missing SNP information. Finally, our method allows for asymptotic, permutation-based, or bootstrap inference. We apply our method to case-control SNP genotype data from the Finland-United States Investigation of Non-Insulin-Dependent Diabetes Mellitus (FUSION) Genetics study and identify two haplotypes that appear to be significantly associated with type 2 diabetes. Using the FUSION data, we assess the accuracy of asymptotic P values by comparing them with P values obtained from a permutation procedure. We also assess the accuracy of asymptotic confidence intervals for relative-risk parameters for haplotype effects, by a simulation study based on the FUSION data.

11.
Becker T  Knapp M 《Human heredity》2005,59(4):185-189
In the context of haplotype association analysis of unphased genotype data, methods based on Monte-Carlo simulations are often used to compensate for missing or inappropriate asymptotic theory. Moreover, such methods are an indispensable means to deal with multiple testing problems. We want to call attention to a potential trap in this usually useful approach: The simulation approach may lead to strongly inflated type I errors in the presence of different missing rates between cases and controls, depending on the chosen test statistic. Here, we consider four different testing strategies for haplotype analysis of case-control data. We recommend interpreting results for data sets with non-comparable distributions of missing genotypes with special caution when the test statistic is based on inferred haplotypes per individual. Moreover, our results are important for the conduct and interpretation of genome-wide association studies.

12.
We obtained fresh dung samples from 202 (133 mother-offspring pairs) savannah elephants (Loxodonta africana) in Samburu, Kenya, and genotyped them at 20 microsatellite loci to assess genotyping success and errors. A total of 98.6% consensus genotypes was successfully obtained, with allelic dropout and false allele rates at 1.6% (n = 46) and 0.9% (n = 37) of heterozygous and total consensus genotypes, respectively, and an overall genotyping error rate of 2.5% based on repeat typing. Mendelian analysis revealed consistent inheritance in all but 38 allelic pairs from mother-offspring, giving an average mismatch error rate of 2.06%, a possible result of null alleles, mutations, genotyping errors, or inaccuracy in maternity assignment. We detected no evidence for large allele dropout, stuttering, or scoring error in the dataset and significant Hardy-Weinberg deviations at only two loci due to heterozygosity deficiency. Across loci, null allele frequencies were low (range: 0.000-0.042) and below the 0.20 threshold that would significantly bias individual-based studies. The high genotyping success and low errors observed in this study demonstrate reliability of the method employed and underscore the application of simple pedigrees in noninvasive studies. Since none of the sires were included in this study, the error rates presented are just estimates.
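
The Mendelian check on mother-offspring pairs mentioned in this abstract can be illustrated as follows. This is a minimal sketch, assuming the basic rule that an offspring must share at least one allele with its mother at every locus; the data layout, function names, locus names, and allele sizes are hypothetical and are not taken from the study.

```python
def shares_allele(mother_gt, offspring_gt):
    """True if the two diploid genotypes have at least one allele in common."""
    return bool(set(mother_gt) & set(offspring_gt))


def count_mismatches(pairs):
    """Count Mendelian mismatches over all mother-offspring pairs and loci.

    pairs: list of (mother, offspring), each a dict mapping a locus name to
    a tuple of two allele sizes, or to None when the genotype is missing.
    Returns (mismatches, comparisons); missing genotypes are skipped.
    """
    mismatches = comparisons = 0
    for mother, offspring in pairs:
        for locus, mother_gt in mother.items():
            offspring_gt = offspring.get(locus)
            if mother_gt is None or offspring_gt is None:
                continue
            comparisons += 1
            if not shares_allele(mother_gt, offspring_gt):
                mismatches += 1
    return mismatches, comparisons


# Hypothetical toy data: one pair typed at three loci.
toy_pairs = [
    (
        {"L1": (150, 154), "L2": (200, 200), "L3": (98, 102)},
        {"L1": (154, 158), "L2": (204, 208), "L3": None},
    )
]
mismatches, comparisons = count_mismatches(toy_pairs)
print(f"{mismatches} mismatches in {comparisons} comparisons")
```

A mismatch flagged this way does not identify its cause; as the abstract notes, null alleles, mutations, genotyping errors, and incorrect maternity assignment can all produce one.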

13.
Sebastiani P  Abad MM  Alpargu G  Ramoni MF 《Genetics》2004,168(4):2329-2337
Several solutions have been proposed to extend the transmission disequilibrium test (TDT) to include cases with missing parental genotype. However, completion of the missing parental genotype may bias the test if the underlying missing data mechanism is informative. Furthermore, all these solutions resolve the problem of missing parental genotype, while offspring with missing genotypes are typically ignored. We propose here an extension to the TDT, called robust TDT (rTDT), able to handle incomplete genotypes on both parents and children and that does not rest on any assumption about the missing data mechanism. rTDT returns minimum and maximum values of TDT that are consistent with all the possible completions of the missing data. We also show that, in some situations, rTDT can achieve both greater power and greater significance than the popular TDT analysis of incomplete data. rTDT is applied to a database of markers of susceptibility to Crohn's disease and it shows that only 2 of the 11 markers originally associated with the phenotype do not depend on assumptions about the missing data mechanism.

14.
Single nucleotide polymorphisms (SNPs) represent the most common form of DNA sequence variation in mammalian livestock genomes. While the past decade has witnessed major advances in SNP genotyping technologies, genotyping errors caused, in part, by the biochemistry underlying the genotyping platform used, can occur. These errors can distort project results and conclusions and can result in incorrect decisions in animal management and breeding programs; hence, SNP genotype calls must be accurate and reliable. In this study, 263 Bos spp. samples were genotyped commercially for a total of 16 SNPs. Of the total possible 4,208 SNP genotypes, 4,179 SNP genotypes were generated, yielding a genotype call rate of 99.31% (standard deviation ± 0.93%). Between 110 and 263 samples were subsequently re-genotyped by us for all 16 markers using a custom-designed SNP genotyping platform, and of the possible 3,819 genotypes a total of 3,768 genotypes were generated (98.70% genotype call rate, SD ± 1.89%). A total of 3,744 duplicate genotypes were generated for both genotyping platforms, and comparison of the genotype calls for both methods revealed 3,741 concordant SNP genotype calls (99.92% SNP genotype concordance rate). These data indicate that both genotyping methods used can provide livestock geneticists with reliable, reproducible SNP genotypic data for in-depth statistical analysis.

15.
We consider the effect of informative missingness on association tests that use parental genotypes as controls and that allow for missing parental data. Parental data can be informatively missing when the probability of a parent being available for study is related to that parent's genotype; when this occurs, the distribution of genotypes among observed parents is not representative of the distribution of genotypes among the missing parents. Many previously proposed procedures that allow for missing parental data assume that these distributions are the same. We propose association tests that behave well when parental data are informatively missing, under the assumption that, for a given trio of paternal, maternal, and affected offspring genotypes, the genotypes of the parents and the sex of the missing parents, but not the genotype of the affected offspring, can affect parental missingness. (This same assumption is required for validity of an analysis that ignores incomplete parent-offspring trios.) We use simulations to compare our approach with previously proposed procedures, and we show that if even small amounts of informative missingness are not taken into account, they can have large, deleterious effects on the performance of tests.

16.
Related individuals share potentially long chromosome segments that trace to a common ancestor. We describe a phasing algorithm (ChromoPhase) that utilizes this characteristic of finite populations to phase large sections of a chromosome. In addition to phasing, our method imputes missing genotypes in individuals genotyped at lower marker density when more densely genotyped relatives are available. ChromoPhase uses a pedigree to collect an individual's (the proband) surrogate parents and offspring and uses genotypic similarity to identify its genomic surrogates. The algorithm then cycles through the relatives and genomic surrogates one at a time to find shared chromosome segments. Once a segment has been identified, any missing information in the proband is filled in with information from the relative. We tested ChromoPhase in a simulated population consisting of 400 individuals at a marker density of 1500/M, which is approximately equivalent to a 50K bovine single nucleotide polymorphism chip. In simulated data, 99.9% of loci were correctly phased and, when imputing from 100 to 1500 markers, more than 87% of missing genotypes were correctly imputed. Performance increased when the number of generations available in the pedigree increased, but was reduced when the sparse genotype contained fewer loci. However, in simulated data, ChromoPhase correctly imputed at least 12% more genotypes than fastPHASE, depending on sparse marker density. We also tested the algorithm in a real Holstein cattle data set to impute 50K genotypes in animals with a sparse 3K genotype. In these data 92% of genotypes were correctly imputed in animals with a genotyped sire. We evaluated the accuracy of genomic predictions with the dense, sparse, and imputed simulated data sets and show that the reduction in genomic evaluation accuracy is modest even with imperfectly imputed genotype data. Our results demonstrate that imputation of missing genotypes, and potentially full genome sequence, using long-range phasing is feasible.

17.
Noninvasive faecal DNA sampling has the potential to provide a wealth of information necessary for monitoring and managing endangered species while eliminating the need to capture, handle or observe rare individuals. However, scoring problems, and subsequent genotyping errors, associated with this monitoring method remain a great concern as they can lead to misidentification of individuals and biased estimates. We examined a kit fox scat data set (353 scats; 80 genotypes) for genotyping errors using both genetic and GIS analyses, and evaluated the feasibility of combining both approaches to assess reliability of the faecal DNA results. We further checked the appropriateness of using faecal genotypes to study kit fox populations by describing information about foxes that we could deduce from the 'acceptable' scat genotypes, and comparing it to information gathered with traditional field techniques. Overall, genetic tests indicated that our data set had a low rate of genotyping error. Furthermore, examination of distributions of scat locations confirmed our data set was relatively error free. We found that analysing information on sex primer consistency and scat locations provided a useful assessment of scat genotype error, and greatly limited the amount of additional laboratory work that was needed to identify potentially 'false' scores. 'Acceptable' scat genotypes revealed information on sex ratio, relatedness, fox movement patterns, latrine use, and size of home range. Results from genetic and field data were consistent, supporting the conclusion that our data set had a very low rate of genotyping error and that this noninvasive method is a reliable approach for monitoring kit foxes.

18.
Inference of haplotypes is important in genetic epidemiology studies. However, all large genotype data sets have errors due to the use of inexpensive genotyping machines that are fallible and shortcomings in genotype-scoring software, which can have an enormous impact on haplotype inference. In this article, we propose two novel strategies to reduce the impact induced by genotyping errors in haplotype inference. The first method makes use of double sampling. For each individual, the “GenoSpectrum” that consists of all possible genotypes and their corresponding likelihoods is computed. The second method is a genotype clustering algorithm based on multi-genotyping data, which also assigns a “GenoSpectrum” to each individual. We then describe two hybrid EM algorithms (called DS-EM and MG-EM) that perform haplotype inference based on the “GenoSpectrum” of each individual obtained by double sampling and multi-genotyping data. Both simulated data sets and a quasi real-data set demonstrate that our proposed methods perform well in different situations and outperform the conventional EM algorithm and the HMM algorithm proposed by Sun, Greenwood, and Neal (2007, Genetic Epidemiology 31, 937-948) when the genotype data sets have errors.

19.
Hao K  Li C  Rosenow C  Hung Wong W 《Genomics》2004,84(4):623-630
Currently, most analytical methods assume all observed genotypes are correct; however, it is clear that errors may reduce statistical power or bias inference in genetic studies. We propose procedures for estimating error rate in genetic analysis and apply them to study the GeneChip Mapping 10K array, which is a technology that has recently become available and allows researchers to survey over 10,000 SNPs in a single assay. We employed a strategy to estimate the genotype error rate in pedigree data. First, the "dose-response" reference curve between error rate and the observable error number were derived by simulation, conditional on given pedigree structures and genotypes. Second, the error rate was estimated by calibrating the number of observed errors in real data to the reference curve. We evaluated the performance of this method by simulation study and applied it to a data set of 30 pedigrees genotyped using the GeneChip Mapping 10K array. This method performed favorably in all scenarios we surveyed. The dose-response reference curve was monotone and almost linear with a large slope. The method was able to estimate accurately the error rate under various pedigree structures and error models and under heterogeneous error rates. Using this method, we found that the average genotyping error rate of the GeneChip Mapping 10K array was about 0.1%. Our method provides a quick and unbiased solution to address the genotype error rate in pedigree data. It behaves well in a wide range of settings and can be easily applied in other genetic projects. The robust estimation of genotyping error rate allows us to estimate power and sample size and conduct unbiased genetic tests. The GeneChip Mapping 10K array has a low overall error rate, which is consistent with the results obtained from alternative genotyping assays.

20.
Several high-throughput statistical methods were evaluated for processing data generated by two-dimensional polyacrylamide gel electrophoresis, including how to handle missing data, normalization, and statistical analysis of data obtained from 2-D gels. Quantile normalization combined with a nonparametric permutation test based on minimizing false discovery rates gave the highest yield of proteins that changed with genotype and detected the anticipated 50% decrease in Mn-superoxide dismutase (MnSOD) protein levels in mitochondrial extracts obtained from MnSOD-deficient mice.
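
Quantile normalization, named in this abstract, forces every sample to share the same distribution of values. The sketch below shows the standard textbook form in plain Python, assuming complete data (no missing spots) and equal-length samples; it is not the authors' pipeline, the function name and toy intensities are hypothetical, and ties are simply broken by position.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of equal-length samples (one list per gel).

    The k-th smallest value in each sample is replaced by the mean of the
    k-th smallest values across all samples.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th order statistics.
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)
    ]
    normalized = []
    for sample in samples:
        # Indices of the sample's values from smallest to largest.
        order = sorted(range(n), key=lambda i: sample[i])
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical spot intensities for three gels, four spots each.
gels = [[5, 2, 3, 4], [4, 1, 3, 2], [3, 4, 6, 8]]
normalized_gels = quantile_normalize(gels)
for gel in normalized_gels:
    print(gel)
```

After normalization each gel contains the same set of values, so remaining differences between gels reflect only the rank of each spot within its gel.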
