首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
A genotype calling algorithm for the Illumina BeadArray platform   总被引:2,自引:0,他引:2  
MOTIVATION: Large-scale genotyping relies on the use of unsupervised automated calling algorithms to assign genotypes to hybridization data. A number of such calling algorithms have been recently established for the Affymetrix GeneChip genotyping technology. Here, we present a fast and accurate genotype calling algorithm for the Illumina BeadArray genotyping platforms. As the technology moves towards assaying millions of genetic polymorphisms simultaneously, there is a need for an integrated and easy-to-use software for calling genotypes. RESULTS: We have introduced a model-based genotype calling algorithm which does not rely on having prior training data or require computationally intensive procedures. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously and pools information across multiple individuals to improve the calling. The method can accommodate variations in hybridization intensities which result in dramatic shifts of the position of the genotype clouds by identifying the optimal coordinates to initialize the algorithm. By incorporating the process of perturbation analysis, we can obtain a quality metric measuring the stability of the assigned genotype calls. We show that this quality metric can be used to identify SNPs with low call rates and accuracy. AVAILABILITY: The C++ executable for the algorithm described here is available by request from the authors.  相似文献   

2.
Dou J  Zhao X  Fu X  Jiao W  Wang N  Zhang L  Hu X  Wang S  Bao Z 《Biology direct》2012,7(1):17-9
ABSTRACT: BACKGROUND: Single nucleotide polymorphisms (SNPs) are the most abundant type of genetic variation in eukaryotic genomes and have recently become the marker of choice in a wide variety of ecological and evolutionary studies. The advent of next-generation sequencing (NGS) technologies has made it possible to efficiently genotype a large number of SNPs in the non-model organisms with no or limited genomic resources. Most NGS-based genotyping methods require a reference genome to perform accurate SNP calling. Little effort, however, has yet been devoted to developing or improving algorithms for accurate SNP calling in the absence of a reference genome. RESULTS: Here we describe an improved maximum likelihood (ML) algorithm called iML, which can achieve high genotyping accuracy for SNP calling in the non-model organisms without a reference genome. The iML algorithm incorporates the mixed Poisson/normal model to detect composite read clusters and can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions. Through analysis of simulation and real sequencing datasets, we demonstrate that in comparison with ML or a threshold approach, iML can remarkably improve the accuracy of de novo SNP genotyping and is especially powerful for the reference-free genotyping in diploid genomes with high repeat contents. CONCLUSIONS: The iML algorithm can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions, and thus outperforms the original ML algorithm by achieving much higher genotyping accuracy. Our algorithm is therefore very useful for accurate de novo SNP genotyping in the non-model organisms without a reference genome.  相似文献   

3.
Multiple algorithms have been developed for the purpose of calling single nucleotide polymorphisms (SNPs) from Affymetrix microarrays. We extend and validate the algorithm CRLMM, which incorporates HapMap information within an empirical Bayes framework. We find CRLMM to be more accurate than the Affymetrix default programs (BRLMM and Birdseed). Also, we tie our call confidence metric to percent accuracy. We intend that our validation datasets and methods, refered to as SNPaffycomp, serve as standard benchmarks for future SNP calling algorithms.  相似文献   

4.
Liu W  Zhao W  Chase GA 《Human heredity》2006,61(1):31-44
OBJECTIVE: Single nucleotide polymorphisms (SNPs) serve as effective markers for localizing disease susceptibility genes, but current genotyping technologies are inadequate for genotyping all available SNP markers in a typical linkage/association study. Much attention has recently been paid to methods for selecting the minimal informative subset of SNPs in identifying haplotypes, but there has been little investigation of the effect of missing or erroneous genotypes on the performance of these SNP selection algorithms and subsequent association tests using the selected tagging SNPs. The purpose of this study is to explore the effect of missing genotype or genotyping error on tagging SNP selection and subsequent single marker and haplotype association tests using the selected tagging SNPs. METHODS: Through two sets of simulations, we evaluated the performance of three tagging SNP selection programs in the presence of missing or erroneous genotypes: Clayton's diversity based program htstep, Carlson's linkage disequilibrium (LD) based program ldSelect, and Stram's coefficient of determination based program tagsnp.exe. RESULTS: When randomly selected known loci were relabeled as 'missing', we found that the average number of tagging SNPs selected by all three algorithms changed very little and the power of subsequent single marker and haplotype association tests using the selected tagging SNPs remained close to the power of these tests in the absence of missing genotype. When random genotyping errors were introduced, we found that the average number of tagging SNPs selected by all three algorithms increased. In data sets simulated according to the haplotype frequecies in the CYP19 region, Stram's program had larger increase than Carlson's and Clayton's programs. In data sets simulated under the coalescent model, Carlson's program had the largest increase and Clayton's program had the smallest increase. In both sets of simulations, with the presence of genotyping errors, the power of the haplotype tests from all three programs decreased quickly, but there was not much reduction in power of the single marker tests. CONCLUSIONS: Missing genotypes do not seem to have much impact on tagging SNP selection and subsequent single marker and haplotype association tests. In contrast, genotyping errors could have severe impact on tagging SNP selection and haplotype tests, but not on single marker tests.  相似文献   

5.
Single nucleotide polymorphisms (SNPs) are often determined using TaqMan real-time PCR assays (Applied Biosystems) and commercial software that assigns genotypes based on reporter probe signals at the end of amplification. Limitations to the large-scale application of this approach include the need for positive controls or operator intervention to set signal thresholds when one allele is rare. In the interest of optimizing real-time PCR genotyping, we developed an algorithm for automatic genotype calling based on the full course of real-time PCR data. Best cycle genotyping algorithm (BCGA), written in the open source language R, is based on the assumptions that classification depends on the time (cycle) of amplification and that it is possible to identify a best discriminating cycle for each SNP assay. The algorithm is unique in that it classifies samples according to the behavior of blanks (no DNA samples), which cluster with heterozygous samples. This method of classification eliminates the need for positive controls and permits accurate genotyping even in the absence of a genotype class, for example when one allele is rare. Here, we describe the algorithm and test its validity, compared to the standard end-point method and to DNA sequencing.  相似文献   

6.
MOTIVATION: Modern strategies for mapping disease loci require efficient genotyping of a large number of known polymorphic sites in the genome. The sensitive and high-throughput nature of hybridization-based DNA microarray technology provides an ideal platform for such an application by interrogating up to hundreds of thousands of single nucleotide polymorphisms (SNPs) in a single assay. Similar to the development of expression arrays, these genotyping arrays pose many data analytic challenges that are often platform specific. Affymetrix SNP arrays, e.g. use multiple sets of short oligonucleotide probes for each known SNP, and require effective statistical methods to combine these probe intensities in order to generate reliable and accurate genotype calls. RESULTS: We developed an integrated multi-SNP, multi-array genotype calling algorithm for Affymetrix SNP arrays, MAMS, that combines single-array multi-SNP (SAMS) and multi-array, single-SNP (MASS) calls to improve the accuracy of genotype calls, without the need for training data or computation-intensive normalization procedures as in other multi-array methods. The algorithm uses resampling techniques and model-based clustering to derive single array based genotype calls, which are subsequently refined by competitive genotype calls based on (MASS) clustering. The resampling scheme caps computation for single-array analysis and hence is readily scalable, important in view of expanding numbers of SNPs per array. The MASS update is designed to improve calls for atypical SNPs, harboring allele-imbalanced binding affinities, that are difficult to genotype without information from other arrays. Using a publicly available data set of HapMap samples from Affymetrix, and independent calls by alternative genotyping methods from the HapMap project, we show that our approach performs competitively to existing methods. AVAILABILITY: R functions are available upon request from the authors.  相似文献   

7.

Background  

Single nucleotide polymorphisms (SNPs) are DNA sequence variations, occurring when a single nucleotide – adenine (A), thymine (T), cytosine (C) or guanine (G) – is altered. Arguably, SNPs account for more than 90% of human genetic variation. Our laboratory has developed a highly redundant SNP genotyping assay consisting of multiple probes with signals from multiple channels for a single SNP, based on arrayed primer extension (APEX). This mini-sequencing method is a powerful combination of a highly parallel microarray with distinctive Sanger-based dideoxy terminator sequencing chemistry. Using this microarray platform, our current genotype calling system (known as SNP Chart) is capable of calling single SNP genotypes by manual inspection of the APEX data, which is time-consuming and exposed to user subjectivity bias.  相似文献   

8.
Johnson VJ  Yucesoy B  Luster MI 《Cytokine》2004,27(6):135-141
Single nucleotide polymorphisms (SNPs), particularly those within regulatory regions of genes that code for cytokines often impact expression levels and can be disease modifiers. Investigating associations between cytokine genotype and disease outcome provides valuable insight into disease etiology and potential therapeutic intervention. Traditionally, genotyping for cytokine SNPs has been conducted using polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP), a low throughput technique not amenable for use in large-scale cytokine SNP association studies. Recently, Taqman real-time PCR chemistry has been adapted for use in allelic discrimination assays. The present study validated the accuracy and utility of real-time PCR technology for a number of commonly studied cytokine polymorphisms known to influence chronic inflammatory diseases. We show that this technique is amenable to high-throughput genotyping and overcomes many of the problematic features associated with PCR-RFLP including post-PCR manipulation, non-standardized assay conditions, manual allelic identification and false allelic identification due to incomplete enzyme digestion. The real-time PCR assays are highly accurate with an error rate in the present study of <1% and concordance rate with PCR-RFLP genotyping of 99.4%. The public databases of cytokine polymorphisms and validated genotyping assays highlighted in the present study will greatly benefit this important field of research.  相似文献   

9.
High‐density single nucleotide polymorphism (SNP) genotyping arrays are a powerful tool for studying genomic patterns of diversity, inferring ancestral relationships between individuals in populations and studying marker–trait associations in mapping experiments. We developed a genotyping array including about 90 000 gene‐associated SNPs and used it to characterize genetic variation in allohexaploid and allotetraploid wheat populations. The array includes a significant fraction of common genome‐wide distributed SNPs that are represented in populations of diverse geographical origin. We used density‐based spatial clustering algorithms to enable high‐throughput genotype calling in complex data sets obtained for polyploid wheat. We show that these model‐free clustering algorithms provide accurate genotype calling in the presence of multiple clusters including clusters with low signal intensity resulting from significant sequence divergence at the target SNP site or gene deletions. Assays that detect low‐intensity clusters can provide insight into the distribution of presence–absence variation (PAV) in wheat populations. A total of 46 977 SNPs from the wheat 90K array were genetically mapped using a combination of eight mapping populations. The developed array and cluster identification algorithms provide an opportunity to infer detailed haplotype structure in polyploid wheat and will serve as an invaluable resource for diversity studies and investigating the genetic basis of trait variation in wheat.  相似文献   

10.
High-throughput SNP genotyping platforms use automated genotype calling algorithms to assign genotypes. While these algorithms work efficiently for individual platforms, they are not compatible with other platforms, and have individual biases that result in missed genotype calls. Here we present data on the use of a second complementary SNP genotype clustering algorithm. The algorithm was originally designed for individual fluorescent SNP genotyping assays, and has been optimized to permit the clustering of large datasets generated from custom-designed Affymetrix SNP panels. In an analysis of data from a 3K array genotyped on 1,560 samples, the additional analysis increased the overall number of genotypes by over 45,000, significantly improving the completeness of the experimental data. This analysis suggests that the use of multiple genotype calling algorithms may be advisable in high-throughput SNP genotyping experiments. The software is written in Perl and is available from the corresponding author.  相似文献   

11.
High-throughput SNP genotyping platforms use automated genotype calling algo- rithms to assign genotypes. While these algorithms work efficiently for individual platforms, they are not compatible with other platforms, and have individual biases that result in missed genotype calls. Here we present data on the use of a second complementary SNP genotype clustering algorithm. The algorithm was originally designed for individual fluorescent SNP genotyping assays, and has been opti- mized to permit the clustering of large datasets generated from custom-designed Affymetrix SNP panels. In an analysis of data from a 3K array genotyped on 1,560 samples, the additional analysis increased the overall number of genotypes by over 45,000, significantly improving the completeness of the experimental data. This analysis suggests that the use of multiple genotype calling algorithms may be ad- visable in high-throughput SNP genotyping experiments. The software is written in Perl and is available from the corresponding author.  相似文献   

12.
High-throughput SNP genotyping platforms use automated genotype calling algo- rithms to assign genotypes. While these algorithms work efficiently for individual platforms, they are not compatible with other platforms, and have individual biases that result in missed genotype calls. Here we present data on the use of a second complementary SNP genotype clustering algorithm. The algorithm was originally designed for individual fluorescent SNP genotyping assays, and has been opti- mized to permit the clustering of large datasets generated from custom-designed Affymetrix SNP panels. In an analysis of data from a 3K array genotyped on 1,560 samples, the additional analysis increased the overall number of genotypes by over 45,000, significantly improving the completeness of the experimental data. This analysis suggests that the use of multiple genotype calling algorithms may be ad- visable in high-throughput SNP genotyping experiments. The software is written in Perl and is available from the corresponding author.  相似文献   

13.
Single nucleotide polymorphisms (SNPs), due to their abundance and low mutation rate, are very useful genetic markers for genetic association studies. However, the current genotyping technology cannot afford to genotype all common SNPs in all the genes. By making use of linkage disequilibrium, we can reduce the experiment cost by genotyping a subset of SNPs, called Tag SNPs, which have a strong association with the ungenotyped SNPs, while are as independent from each other as possible. The problem of selecting Tag SNPs is NP-complete; when there are large number of SNPs, in order to avoid extremely long computational time, most of the existing Tag SNP selection methods first partition the SNPs into blocks based on certain block definitions, then Tag SNPs are selected in each block by brute-force search. The size of the Tag SNP set obtained in this way may usually be reduced further due to the inter-dependency among blocks. This paper proposes two algorithms, TSSA and TSSD, to tackle the block-independent Tag SNP selection problem. TSSA is based on A* search algorithm, and TSSD is a heuristic algorithm. Experiments show that TSSA can find the optimal solutions for medium-sized problems in reasonable time, while TSSD can handle very large problems and report approximate solutions very close to the optimal ones.  相似文献   

14.
Single nucleotide polymorphisms (SNPs) are the most commonly used polymorphic markers in genetics studies. Among the different platforms for SNP genotyping, Luminex is one of the less exploited mainly due to the lack of a robust (semi-automated and replicable) freely available genotype calling software. Here we describe a clustering algorithm that provides automated SNP calls for Luminex genotyping assays. We genotyped 3 SNPs in a cohort of 330 childhood leukemia patients, 200 parents of patient and 325 healthy individuals and used the Automated Luminex Genotyping (ALG) algorithm for SNP calling. ALG genotypes were called twice to test for reproducibility and were compared to sequencing data to test for accuracy. Globally, this analysis demonstrates the accuracy (99.6%) of the method, its reproducibility (99.8%) and the low level of no genotyping calls (3.4%). The high efficiency of the method proves that ALG is a suitable alternative to the current commercial software. ALG is semi-automated, and provides numerical measures of confidence for each SNP called, as well as an effective graphical plot. Moreover ALG can be used either through a graphical user interface, requiring no specific informatics knowledge, or through command line with access to the open source code. The ALG software has been implemented in R and is freely available for non-commercial use either at http://alg.sourceforge.net or by request to mathieu.bourgey@umontreal.ca.  相似文献   

15.
Genome-wide association studies (GWAS) examine the entire human genome with the goal of identifying genetic variants (usually single nucleotide polymorphisms (SNPs)) that are associated with phenotypic traits such as disease status and drug response. The discordance of significantly associated SNPs for the same disease identified from different GWAS indicates that false associations exist in such results. In addition to the possible sources of spurious associations that have been investigated and discussed intensively, such as sample size and population stratification, an accurate and reproducible genotype calling algorithm is required for concordant GWAS results from different studies. However, variations of genotype calling of an algorithm and their effects on significantly associated SNPs identified in downstream association analyses have not been systematically investigated. In this paper, the variations of genotype calling using the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM) algorithm and the resulting influence on the lists of significantly associated SNPs were evaluated using the raw data of 270 HapMap samples analysed with the Affymetrix Human Mapping 500K Array Set (Affy500K) by changing algorithmic parameters. Modified were the Dynamic Model (DM) call confidence threshold (threshold) and the number of randomly selected SNPs (size). Comparative analysis of the calling results and the corresponding lists of significantly associated SNPs identified through association analysis revealed that algorithmic parameters used in BRLMM affected the genotype calls and the significantly associated SNPs. Both the threshold and the size affected the called genotypes and the lists of significantly associated SNPs in association analysis. The effect of the threshold was much larger than the effect of the size. Moreover, the heterozygous calls had lower consistency compared to the homozygous calls.  相似文献   

16.
The search for the association between complex diseases and single nucleotide polymorphisms (SNPs) or haplotypes has recently received great attention. For these studies, it is essential to use a small subset of informative SNPs accurately representing the rest of the SNPs. Informative SNP selection can achieve (1) considerable budget savings by genotyping only a limited number of SNPs and computationally inferring all other SNPs or (2) necessary reduction of the huge SNP sets (obtained, e.g. from Affymetrix) for further fine haplotype analysis. A novel informative SNP selection method for unphased genotype data based on multiple linear regression (MLR) is implemented in the software package MLR-tagging. This software can be used for informative SNP (tag) selection and genotype prediction. The stepwise tag selection algorithm (STSA) selects positions of the given number of informative SNPs based on a genotype sample population. The MLR SNP prediction algorithm predicts a complete genotype based on the values of its informative SNPs, their positions among all SNPs, and a sample of complete genotypes. An extensive experimental study on various datasets including 10 regions from HapMap shows that the MLR prediction combined with stepwise tag selection uses fewer tags than the state-of-the-art method of Halperin et al. (2005). AVAILABILITY: MLR-Tagging software package is publicly available at http://alla.cs.gsu.edu/~software/tagging/tagging.html  相似文献   

17.
Single nucleotide polymorphisms (SNPs) represent the most common form of DNA sequence variation in mammalian livestock genomes. While the past decade has witnessed major advances in SNP genotyping technologies, genotyping errors caused, in part, by the biochemistry underlying the genotyping platform used, can occur. These errors can distort project results and conclusions and can result in incorrect decisions in animal management and breeding programs; hence, SNP genotype calls must be accurate and reliable. In this study, 263 Bos spp. samples were genotyped commercially for a total of 16 SNPs. Of the total possible 4,208 SNP genotypes, 4,179 SNP genotypes were generated, yielding a genotype call rate of 99.31% (standard deviation?±?0.93%). Between 110 and 263 samples were subsequently re-genotyped by us for all 16 markers using a custom-designed SNP genotyping platform, and of the possible 3,819 genotypes a total of 3,768 genotypes were generated (98.70% genotype call rate, SD?±?1.89%). A total of 3,744 duplicate genotypes were generated for both genotyping platforms, and comparison of the genotype calls for both methods revealed 3,741 concordant SNP genotype call rates (99.92% SNP genotype concordance rate). These data indicate that both genotyping methods used can provide livestock geneticists with reliable, reproducible SNP genotypic data for in-depth statistical analysis.  相似文献   

18.
Single nucleotide polymorphisms (SNPs) represent the most common form of DNA sequence variation in mammalian livestock genomes. While the past decade has witnessed major advances in SNP genotyping technologies, genotyping errors caused, in part, by the biochemistry underlying the genotyping platform used, can occur. These errors can distort project results and conclusions and can result in incorrect decisions in animal management and breeding programs; hence, SNP genotype calls must be accurate and reliable. In this study, 263 Bos spp. samples were genotyped commercially for a total of 16 SNPs. Of the total possible 4,208 SNP genotypes, 4,179 SNP genotypes were generated, yielding a genotype call rate of 99.31% (standard deviation ± 0.93%). Between 110 and 263 samples were subsequently re-genotyped by us for all 16 markers using a custom-designed SNP genotyping platform, and of the possible 3,819 genotypes a total of 3,768 genotypes were generated (98.70% genotype call rate, SD ± 1.89%). A total of 3,744 duplicate genotypes were generated for both genotyping platforms, and comparison of the genotype calls for both methods revealed 3,741 concordant SNP genotype call rates (99.92% SNP genotype concordance rate). These data indicate that both genotyping methods used can provide livestock geneticists with reliable, reproducible SNP genotypic data for in-depth statistical analysis.  相似文献   

19.
MOTIVATION: Preliminary results on the data produced using the Affymetrix large-scale genotyping platforms show that it is necessary to construct improved genotype calling algorithms. There is evidence that some of the existing algorithms lead to an increased error rate in heterozygous genotypes, and a disproportionately large rate of heterozygotes with missing genotypes. Non-random errors and missing data can lead to an increase in the number of false discoveries in genetic association studies. Therefore, the factors that need to be evaluated in assessing the performance of an algorithm are the missing data (call) and error rates, but also the heterozygous proportions in missing data and errors. RESULTS: We introduce a novel genotype calling algorithm (GEL) for the Affymetrix GeneChip arrays. The algorithm uses likelihood calculations that are based on distributions inferred from the observed data. A key ingredient in accurate genotype calling is weighting the information that comes from each probe quartet according to the quality/reliability of the data in the quartet, and prior information on the performance of the quartet. AVAILABILITY: The GEL software is implemented in R and is available by request from the corresponding author at nicolae@galton.uchicago.edu.  相似文献   

20.

Background  

The risk of common diseases is likely determined by the complex interplay between environmental and genetic factors, including single nucleotide polymorphisms (SNPs). Traditional methods of data analysis are poorly suited for detecting complex interactions due to sparseness of data in high dimensions, which often occurs when data are available for a large number of SNPs for a relatively small number of samples. Validation of associations observed using multiple methods should be implemented to minimize likelihood of false-positive associations. Moreover, high-throughput genotyping methods allow investigators to genotype thousands of SNPs at one time. Investigating associations for each individual SNP or interactions between SNPs using traditional approaches is inefficient and prone to false positives.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号