Similar articles
Found 20 similar articles; search took 31 ms
1.
High-throughput SNP genotyping platforms use automated genotype calling algorithms to assign genotypes. While these algorithms work efficiently for individual platforms, they are not compatible with other platforms, and have individual biases that result in missed genotype calls. Here we present data on the use of a second complementary SNP genotype clustering algorithm. The algorithm was originally designed for individual fluorescent SNP genotyping assays, and has been optimized to permit the clustering of large datasets generated from custom-designed Affymetrix SNP panels. In an analysis of data from a 3K array genotyped on 1,560 samples, the additional analysis increased the overall number of genotypes by over 45,000, significantly improving the completeness of the experimental data. This analysis suggests that the use of multiple genotype calling algorithms may be advisable in high-throughput SNP genotyping experiments. The software is written in Perl and is available from the corresponding author.

2.
High-throughput SNP genotyping platforms use automated genotype calling algorithms to assign genotypes. While these algorithms work efficiently for individual platforms, they are not compatible with other platforms, and have individual biases that result in missed genotype calls. Here we present data on the use of a second complementary SNP genotype clustering algorithm. The algorithm was originally designed for individual fluorescent SNP genotyping assays, and has been optimized to permit the clustering of large datasets generated from custom-designed Affymetrix SNP panels. In an analysis of data from a 3K array genotyped on 1,560 samples, the additional analysis increased the overall number of genotypes by over 45,000, significantly improving the completeness of the experimental data. This analysis suggests that the use of multiple genotype calling algorithms may be advisable in high-throughput SNP genotyping experiments. The software is written in Perl and is available from the corresponding author.

3.
A genotype calling algorithm for the Illumina BeadArray platform   Total citations: 2, self-citations: 0, citations by others: 2
MOTIVATION: Large-scale genotyping relies on the use of unsupervised automated calling algorithms to assign genotypes to hybridization data. A number of such calling algorithms have been recently established for the Affymetrix GeneChip genotyping technology. Here, we present a fast and accurate genotype calling algorithm for the Illumina BeadArray genotyping platforms. As the technology moves towards assaying millions of genetic polymorphisms simultaneously, there is a need for integrated and easy-to-use software for calling genotypes. RESULTS: We have introduced a model-based genotype calling algorithm which does not rely on having prior training data or require computationally intensive procedures. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously and pools information across multiple individuals to improve the calling. The method can accommodate variations in hybridization intensities which result in dramatic shifts of the position of the genotype clouds by identifying the optimal coordinates to initialize the algorithm. By incorporating the process of perturbation analysis, we can obtain a quality metric measuring the stability of the assigned genotype calls. We show that this quality metric can be used to identify SNPs with low call rates and accuracy. AVAILABILITY: The C++ executable for the algorithm described here is available by request from the authors.

4.
MOTIVATION: The technology to genotype single nucleotide polymorphisms (SNPs) at extremely high densities provides for hypothesis-free genome-wide scans for common polymorphisms associated with complex disease. However, we find that some errors introduced by commonly employed genotyping algorithms may lead to inflation of false associations between markers and phenotype. RESULTS: We have developed a novel SNP genotype calling program, SNiPer-High Density (SNiPer-HD), for highly accurate genotype calling across hundreds of thousands of SNPs. The program employs an expectation-maximization (EM) algorithm with parameters based on a training sample set. The algorithm choice allows for highly accurate genotyping for most SNPs. Also, we introduce a quality control metric for each assayed SNP, such that poorly behaving SNPs can be filtered using a metric correlating to genotype class separation in the calling algorithm. SNiPer-HD is superior to the standard dynamic modeling algorithm and is complementary and non-redundant to other algorithms, such as BRLMM. Implementing multiple algorithms together may provide highly accurate genotyping calls, without inflation of false positives due to systematically miscalled SNPs. A reliable and accurate set of SNP genotypes for increasingly dense panels will eliminate some false association signals and false negative signals, allowing for rapid identification of disease susceptibility loci for complex traits. AVAILABILITY: SNiPer-HD is available at TGen's website: http://www.tgen.org/neurogenomics/data.

5.
Dou J, Zhao X, Fu X, Jiao W, Wang N, Zhang L, Hu X, Wang S, Bao Z. Biology Direct, 2012, 7(1):17-9
BACKGROUND: Single nucleotide polymorphisms (SNPs) are the most abundant type of genetic variation in eukaryotic genomes and have recently become the marker of choice in a wide variety of ecological and evolutionary studies. The advent of next-generation sequencing (NGS) technologies has made it possible to efficiently genotype a large number of SNPs in non-model organisms with no or limited genomic resources. Most NGS-based genotyping methods require a reference genome to perform accurate SNP calling. Little effort, however, has yet been devoted to developing or improving algorithms for accurate SNP calling in the absence of a reference genome. RESULTS: Here we describe an improved maximum likelihood (ML) algorithm called iML, which can achieve high genotyping accuracy for SNP calling in non-model organisms without a reference genome. The iML algorithm incorporates the mixed Poisson/normal model to detect composite read clusters and can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions. Through analysis of simulation and real sequencing datasets, we demonstrate that in comparison with ML or a threshold approach, iML can remarkably improve the accuracy of de novo SNP genotyping and is especially powerful for reference-free genotyping in diploid genomes with high repeat contents. CONCLUSIONS: The iML algorithm can efficiently prevent incorrect SNP calls resulting from repetitive genomic regions, and thus outperforms the original ML algorithm by achieving much higher genotyping accuracy. Our algorithm is therefore very useful for accurate de novo SNP genotyping in non-model organisms without a reference genome.

6.
The problem of genotyping polyploids is extremely important for the creation of genetic maps and assembly of complex plant genomes. Despite its significance, polyploid genotyping still remains largely unsolved and suffers from a lack of statistical formality. In this paper a graphical Bayesian model for SNP genotyping data is introduced. This model can infer genotypes even when the ploidy of the population is unknown. We also introduce an algorithm for finding the exact maximum a posteriori genotype configuration with this model. This algorithm is implemented in a freely available web-based software package, SuperMASSA. We demonstrate the utility, efficiency, and flexibility of the model and algorithm by applying them to two different platforms, each of which is applied to a polyploid data set: Illumina GoldenGate data from potato and Sequenom MassARRAY data from sugarcane. Our method achieves state-of-the-art performance on both data sets and can be trivially adapted to use models that utilize prior information about any platform or species.

7.
High-throughput SNP genotyping   Total citations: 5, self-citations: 0, citations by others: 5
Whole genome approaches using single nucleotide polymorphism (SNP) markers have the potential to transform complex disease genetics and expedite pharmacogenetics research. This has led to a requirement for high-throughput SNP genotyping platforms. Development of a successful high-throughput genotyping platform depends on coupling reliable assay chemistry with an appropriate detection system to maximise efficiency with respect to accuracy, speed and cost. Current technology platforms are able to deliver throughputs in excess of 100 000 genotypes per day, with an accuracy of >99%, at a cost of 20-30 cents per genotype. In order to meet the demands of the coming years, however, genotyping platforms need to deliver throughputs in the order of one million genotypes per day at a cost of only a few cents per genotype. In addition, DNA template requirements must be minimised such that hundreds of thousands of SNPs can be interrogated using a relatively small amount of genomic DNA. As such, it is predicted that the next generation of high-throughput genotyping platforms will exploit large-scale multiplex reactions and solid phase assay detection systems.

8.
MOTIVATION: Preliminary results on the data produced using the Affymetrix large-scale genotyping platforms show that it is necessary to construct improved genotype calling algorithms. There is evidence that some of the existing algorithms lead to an increased error rate in heterozygous genotypes, and a disproportionately large rate of heterozygotes with missing genotypes. Non-random errors and missing data can lead to an increase in the number of false discoveries in genetic association studies. Therefore, the factors that need to be evaluated in assessing the performance of an algorithm are the missing data (call) and error rates, but also the heterozygous proportions in missing data and errors. RESULTS: We introduce a novel genotype calling algorithm (GEL) for the Affymetrix GeneChip arrays. The algorithm uses likelihood calculations that are based on distributions inferred from the observed data. A key ingredient in accurate genotype calling is weighting the information that comes from each probe quartet according to the quality/reliability of the data in the quartet, and prior information on the performance of the quartet. AVAILABILITY: The GEL software is implemented in R and is available by request from the corresponding author at nicolae@galton.uchicago.edu.

9.

Background  

Microarray measurements are susceptible to a variety of experimental artifacts, some of which give rise to systematic biases that are spatially dependent in a unique way on each chip. It is likely that such artifacts affect many SNP arrays, but the normalization methods used in currently available genotyping algorithms make no attempt at spatial bias correction. Here, we propose an effective single-chip spatial bias removal procedure for Affymetrix 6.0 SNP arrays or platforms with similar design features. This procedure deals with both extreme and subtle biases and is intended to be applied before standard genotype calling algorithms.

10.
MOTIVATION: Modern strategies for mapping disease loci require efficient genotyping of a large number of known polymorphic sites in the genome. The sensitive and high-throughput nature of hybridization-based DNA microarray technology provides an ideal platform for such an application by interrogating up to hundreds of thousands of single nucleotide polymorphisms (SNPs) in a single assay. Similar to the development of expression arrays, these genotyping arrays pose many data analytic challenges that are often platform specific. Affymetrix SNP arrays, for example, use multiple sets of short oligonucleotide probes for each known SNP, and require effective statistical methods to combine these probe intensities in order to generate reliable and accurate genotype calls. RESULTS: We developed an integrated multi-SNP, multi-array genotype calling algorithm for Affymetrix SNP arrays, MAMS, that combines single-array multi-SNP (SAMS) and multi-array, single-SNP (MASS) calls to improve the accuracy of genotype calls, without the need for training data or computation-intensive normalization procedures as in other multi-array methods. The algorithm uses resampling techniques and model-based clustering to derive single-array-based genotype calls, which are subsequently refined by competitive genotype calls based on MASS clustering. The resampling scheme caps computation for single-array analysis and hence is readily scalable, which is important in view of expanding numbers of SNPs per array. The MASS update is designed to improve calls for atypical SNPs, harboring allele-imbalanced binding affinities, that are difficult to genotype without information from other arrays. Using a publicly available data set of HapMap samples from Affymetrix, and independent calls by alternative genotyping methods from the HapMap project, we show that our approach is competitive with existing methods. AVAILABILITY: R functions are available upon request from the authors.

11.
Single nucleotide polymorphisms (SNPs) are the most widely used marker in studies to assess associations between genetic variants and complex traits or diseases. They are also becoming increasingly important in the study of the evolution and history of humans and other species. The analysis and processing of SNPs obtained with high-throughput technologies involve the time-consuming and costly use of different, complex and usually format-incompatible software. SNPator is a user-friendly web-based SNP data analysis suite that integrates, among many other algorithms, the most common steps of a SNP association study. It frees the user from the need to have large computer facilities and an in-depth knowledge of genetic software installation and management. Genotype data is directly read from the output files of the usual genotyping platforms. Phenotypic data on the samples can also be easily uploaded. Many different quality control and analysis procedures can be performed either by using built-in SNPator algorithms or by calling standard genetic software. AVAILABILITY: Access is granted from the SNPator webpage http://www.snpator.org.

12.
A genotype calling algorithm for Affymetrix SNP arrays   Total citations: 11, self-citations: 0, citations by others: 11
MOTIVATION: A classification algorithm, based on a multi-chip, multi-SNP approach is proposed for Affymetrix SNP arrays. Current procedures for calling genotypes on SNP arrays process all the features associated with one chip and one SNP at a time. Using a large training sample where the genotype labels are known, we develop a supervised learning algorithm to obtain more accurate classification results on new data. The method we propose, RLMM, is based on a robustly fitted, linear model and uses the Mahalanobis distance for classification. The chip-to-chip non-biological variance is reduced through normalization. This model-based algorithm captures the similarities across genotype groups and probes, as well as across thousands of SNPs for accurate classification. In this paper, we apply RLMM to Affymetrix 100 K SNP array data, present classification results and compare them with genotype calls obtained from the Affymetrix procedure DM, as well as to the publicly available genotype calls from the HapMap project.
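
The classification step named in the abstract above, assigning a sample to the genotype group at the smallest Mahalanobis distance, can be illustrated with a minimal sketch. This is not the RLMM implementation: the robust linear model fit and the normalization are omitted, and the cluster means and covariances in `EXAMPLE_CLUSTERS` are hypothetical values chosen only for illustration.

```python
# Minimal sketch of Mahalanobis-distance genotype classification in a
# 2-D (allele A signal, allele B signal) space. All numbers are hypothetical.

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from a cluster.

    cov is a 2x2 covariance matrix given as ((a, b), (c, d)).
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def call_genotype(x, clusters):
    """Return the genotype label whose cluster is nearest to x."""
    return min(clusters, key=lambda g: mahalanobis_sq(x, *clusters[g]))


# Hypothetical per-genotype (mean, covariance) pairs, as if estimated
# from a labelled training set.
EXAMPLE_CLUSTERS = {
    "AA": ((2.0, 0.2), ((0.04, 0.0), (0.0, 0.01))),
    "AB": ((1.0, 1.0), ((0.04, 0.01), (0.01, 0.04))),
    "BB": ((0.2, 2.0), ((0.01, 0.0), (0.0, 0.04))),
}

if __name__ == "__main__":
    for point in [(1.9, 0.3), (1.1, 0.9), (0.1, 2.1)]:
        print(point, call_genotype(point, EXAMPLE_CLUSTERS))
```

Unlike plain Euclidean distance, the Mahalanobis distance scales each direction by the cluster's own spread, so an elongated genotype cloud does not lose samples along its long axis to a tighter neighbouring cloud.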

13.
Single nucleotide polymorphisms (SNPs) are the most commonly used polymorphic markers in genetics studies. Among the different platforms for SNP genotyping, Luminex is one of the less exploited, mainly due to the lack of robust (semi-automated and replicable), freely available genotype calling software. Here we describe a clustering algorithm that provides automated SNP calls for Luminex genotyping assays. We genotyped 3 SNPs in a cohort of 330 childhood leukemia patients, 200 parents of patients and 325 healthy individuals and used the Automated Luminex Genotyping (ALG) algorithm for SNP calling. ALG genotypes were called twice to test for reproducibility and were compared to sequencing data to test for accuracy. Globally, this analysis demonstrates the accuracy (99.6%) of the method, its reproducibility (99.8%) and the low rate of missing genotype calls (3.4%). The high efficiency of the method shows that ALG is a suitable alternative to the current commercial software. ALG is semi-automated, and provides numerical measures of confidence for each SNP called, as well as an effective graphical plot. Moreover, ALG can be used either through a graphical user interface, requiring no specific informatics knowledge, or through the command line with access to the open source code. The ALG software has been implemented in R and is freely available for non-commercial use either at http://alg.sourceforge.net or by request to mathieu.bourgey@umontreal.ca.

14.
BACKGROUND: Human diversity, namely single nucleotide polymorphisms (SNPs), is becoming a focus of biomedical research. Despite the binary nature of SNP determination, the majority of genotyping assay data need a critical evaluation for genotype calling. We applied statistical models to improve the automated analysis of 2-dimensional SNP data. METHODS: We derived several quantities in the framework of Gaussian mixture models that provide figures of merit to objectively measure the data quality. The accuracy of individual observations is scored as the probability of belonging to a certain genotype cluster, while the assay quality is measured by the overlap between the genotype clusters. RESULTS: The approach was extensively tested with a dataset of 438 nonredundant SNP assays comprising >150,000 datapoints. The performance of our automatic scoring method was compared with manual assignments. The agreement for the overall assay quality is remarkably good, and individual observations were scored differently by man and machine in 2.6% of cases, when applying stringent probability threshold values. CONCLUSION: Our definition of bounds for the accuracy for complete assays in terms of misclassification probabilities goes beyond other proposed analysis methods. We expect the scoring method to minimise human intervention and provide a more objective error estimate in genotype calling.
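
Scoring an observation by its probability of belonging to a genotype cluster, as described in the abstract above, can be sketched with a small Gaussian mixture. The sketch below is an illustration under stated assumptions, not the authors' method: it assumes axis-aligned (diagonal-covariance) components, and the mixture weights, means, standard deviations and the 0.99 threshold in the example are hypothetical.

```python
import math

# Minimal sketch: posterior genotype membership under a 2-D Gaussian mixture.
# Components are (weight, mean, per-axis standard deviations); all values
# used here are hypothetical.

def gaussian_pdf_2d(x, mean, sd):
    """Density of an axis-aligned 2-D Gaussian."""
    zx = (x[0] - mean[0]) / sd[0]
    zy = (x[1] - mean[1]) / sd[1]
    return math.exp(-0.5 * (zx * zx + zy * zy)) / (2 * math.pi * sd[0] * sd[1])


def posteriors(x, components):
    """Posterior probability of each genotype for observation x."""
    weighted = {g: w * gaussian_pdf_2d(x, m, s) for g, (w, m, s) in components.items()}
    total = sum(weighted.values())
    if total == 0.0:
        # The point is too far from every cluster to be scored numerically.
        return {g: 0.0 for g in components}
    return {g: v / total for g, v in weighted.items()}


def score_call(x, components, threshold=0.99):
    """Return (genotype or None, best posterior); None means no call."""
    post = posteriors(x, components)
    best = max(post, key=post.get)
    return (best if post[best] >= threshold else None), post[best]


EXAMPLE_COMPONENTS = {
    "AA": (0.25, (2.0, 0.2), (0.2, 0.1)),
    "AB": (0.50, (1.0, 1.0), (0.2, 0.2)),
    "BB": (0.25, (0.2, 2.0), (0.1, 0.2)),
}

if __name__ == "__main__":
    print(score_call((2.0, 0.2), EXAMPLE_COMPONENTS))  # near the AA centre
    print(score_call((1.5, 0.5), EXAMPLE_COMPONENTS))  # between AA and AB
```

A stringent threshold trades call rate for accuracy: points lying between two clusters receive a split posterior and are left uncalled rather than forced into the nearer cluster.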

15.
High-throughput single nucleotide polymorphism (SNP) genotyping systems provide two kinds of fluorescent signals detected from different alleles. In current technologies, the process of genotype discrimination requires subjective judgments by expert operators, even when using clustering algorithms. Here, we propose two evaluation measures to manage fluorescent scatter data with unclear plot aggregation. The first is the marker ranking measure, which provides a ranking system for the SNP markers based on the distance between the scatter plot distribution and a user-defined ideal distribution. The second measure, called individual genotype membership, uses the membership probability of each genotype related to an individual plot in the scatter data. In verification experiments, the marker ranking measure produced a ranking of SNP markers that correlated with the subjective order judged by an expert operator. The experiment using the individual genotype membership measure showed that the total number of unclassified individuals was remarkably reduced compared to the number left unclassified manually. These two evaluation measures were implemented as the GTAssist software. GTAssist provides objective standards and avoids subjective biases in SNP genotyping workflows.

16.
Current genotype-calling methods such as Robust Linear Model with Mahalanobis Distance Classifier (RLMM) and Corrected Robust Linear Model with Maximum Likelihood Classification (CRLMM) provide accurate calling results for Affymetrix Single Nucleotide Polymorphism (SNP) chips. However, these methods are computationally expensive as they employ preprocessing procedures, including chip data normalization and other sophisticated statistical techniques. In the small sample case the accuracy rate may drop significantly. We develop a new genotype calling method for Affymetrix 100 k and 500 k SNP chips. A two-stage classification scheme is proposed to obtain a fast genotype calling algorithm. The first stage uses unsupervised classification to quickly discriminate genotypes with high accuracy for the majority of the SNPs. The second stage employs a supervised classification method to incorporate allele frequency information either from the HapMap data or from a self-training scheme. A confidence score is provided for every genotype call. The overall performance is shown to be comparable to that of CRLMM as verified by the known gold standard HapMap data and is superior in small sample cases. The new algorithm is computationally simple and standalone in the sense that a self-training scheme can be used without employing any other training data. A package implementing the calling algorithm is freely available at http://www.sfs.ecnu.edu.cn/teachers/xuj_en.html.

17.

Background

High-throughput sequencing has opened up exciting possibilities in population and conservation genetics by enabling the assessment of genetic variation at genome-wide scales. One approach to reduce genome complexity, i.e. investigating only parts of the genome, is reduced-representation library (RRL) sequencing. Like similar approaches, RRL sequencing reduces ascertainment bias due to simultaneous discovery and genotyping of single-nucleotide polymorphisms (SNPs) and does not require reference genomes. Yet, generating such datasets remains challenging due to laboratory and bioinformatical issues. In the laboratory, current protocols require improvements with regards to sequencing homologous fragments to reduce the number of missing genotypes. From the bioinformatical perspective, the reliance of most studies on a single SNP caller disregards the possibility that different algorithms may produce disparate SNP datasets.

Results

We present an improved RRL (iRRL) protocol that maximizes the generation of homologous DNA sequences, thus achieving improved genotyping-by-sequencing efficiency. Our modifications facilitate generation of single-sample libraries, enabling individual genotype assignments instead of pooled-sample analysis. We sequenced ~1% of the orangutan genome with 41-fold median coverage in 31 wild-born individuals from two populations. SNPs and genotypes were called using three different algorithms. We obtained substantially different SNP datasets depending on the SNP caller. Genotype validations revealed that the Unified Genotyper of the Genome Analysis Toolkit and SAMtools performed significantly better than a caller from CLC Genomics Workbench (CLC). Of all conflicting genotype calls, CLC was only correct in 17% of the cases. Furthermore, conflicting genotypes between two algorithms showed a systematic bias in that one caller almost exclusively assigned heterozygotes, while the other one almost exclusively assigned homozygotes.

Conclusions

Our enhanced iRRL approach greatly facilitates genotyping-by-sequencing and thus direct estimates of allele frequencies. Our direct comparison of three commonly used SNP callers emphasizes the need to question the accuracy of SNP and genotype calling, as we obtained considerably different SNP datasets depending on caller algorithms, sequencing depths and filtering criteria. These differences affected scans for signatures of natural selection, but will also exert undue influences on demographic inferences. This study presents the first effort to generate a population genomic dataset for wild-born orangutans with known population provenance.

18.
Current genotyping algorithms typically call genotypes by clustering allele-specific intensity data on a single nucleotide polymorphism (SNP) by SNP basis. This approach assumes the availability of a large number of control samples that have been sampled on the same array and platform. We have developed a SNP genotyping algorithm for the Illumina Infinium SNP genotyping assay that is entirely within-sample and requires neither a population of control samples nor parameters derived from such a population. Our algorithm exhibits high concordance with current methods and >99% call accuracy on HapMap samples. The ability to call genotypes using only within-sample information makes the method computationally light and practical for studies involving small sample sizes and provides a valuable independent quality control metric for other population-based approaches. AVAILABILITY: http://www.stats.ox.ac.uk/~giannoul/GenoSNP/.

19.
Genotype imputation is potentially a zero-cost method for bridging gaps in coverage and power between genotyping platforms. Here, we quantify these gains in power and coverage by using 1,376 population controls that are from the 1958 British Birth Cohort and were genotyped by the Wellcome Trust Case-Control Consortium with the Illumina HumanHap 550 and Affymetrix SNP Array 5.0 platforms. Approximately 50% of genotypes at single-nucleotide polymorphisms (SNPs) exclusively on the HumanHap 550 can be accurately imputed from direct genotypes on the SNP Array 5.0 or Illumina HumanHap 300. This roughly halves differences in coverage and power between the platforms. When the relative cost of currently available genome-wide SNP platforms is accounted for, and finances are limited but sample size is not, the highest-powered strategy in European populations is to genotype a larger number of individuals with the HumanHap 300 platform and carry out imputation. Platforms consisting of around 1 million SNPs offer poor cost efficiency for SNP association in European populations.

20.
Single nucleotide polymorphisms (SNPs) represent the most common form of DNA sequence variation in mammalian livestock genomes. While the past decade has witnessed major advances in SNP genotyping technologies, genotyping errors caused, in part, by the biochemistry underlying the genotyping platform used, can occur. These errors can distort project results and conclusions and can result in incorrect decisions in animal management and breeding programs; hence, SNP genotype calls must be accurate and reliable. In this study, 263 Bos spp. samples were genotyped commercially for a total of 16 SNPs. Of the total possible 4,208 SNP genotypes, 4,179 SNP genotypes were generated, yielding a genotype call rate of 99.31% (standard deviation ± 0.93%). Between 110 and 263 samples were subsequently re-genotyped by us for all 16 markers using a custom-designed SNP genotyping platform, and of the possible 3,819 genotypes a total of 3,768 genotypes were generated (98.70% genotype call rate, SD ± 1.89%). A total of 3,744 duplicate genotypes were generated for both genotyping platforms, and comparison of the genotype calls for both methods revealed 3,741 concordant SNP genotype calls (99.92% SNP genotype concordance rate). These data indicate that both genotyping methods used can provide livestock geneticists with reliable, reproducible SNP genotypic data for in-depth statistical analysis.
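
The call rate and concordance figures quoted in the abstract above are simple ratios, and the arithmetic can be checked with a short sketch. The function name `rate_percent` is an illustrative choice; the counts are taken from the abstract. The 98.70% call rate reported for the custom platform is not reproduced here: the pooled ratio 3,768/3,819 is about 98.66%, so the reported value is presumably an average of per-marker call rates over markers with different sample counts, which is an assumption on our part.

```python
# Minimal sketch: checking the call rate and concordance arithmetic
# using the counts quoted in the abstract.

def rate_percent(observed, possible):
    """Return observed / possible as a percentage."""
    if possible <= 0:
        raise ValueError("possible must be positive")
    return 100.0 * observed / possible


# 263 samples x 16 SNPs gives the 4,208 possible commercial genotypes.
possible_commercial = 263 * 16

commercial_call_rate = rate_percent(4179, possible_commercial)
concordance_rate = rate_percent(3741, 3744)

if __name__ == "__main__":
    print(f"commercial call rate: {commercial_call_rate:.2f}%")  # 99.31%
    print(f"concordance rate:     {concordance_rate:.2f}%")      # 99.92%
```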
