Similar Articles
1.
A highly reliable and efficient technology has been developed for high-throughput DNA polymorphism screening and large-scale genotyping. Photolithographic synthesis has been used to generate miniaturized, high-density oligonucleotide arrays. Dedicated instrumentation and software have been developed for array hybridization, fluorescent detection, and data acquisition and analysis. Specific oligonucleotide probe arrays have been designed to rapidly screen human STSs, known genes and full-length cDNAs. This has led to the identification of several thousand biallelic single-nucleotide polymorphisms (SNPs). Meanwhile, a rapid and robust method has been developed for genotyping these SNPs using oligonucleotide arrays. Each allele of an SNP marker is represented on the array by a set of perfect match and mismatch probes. Prototype genotyping chips have been produced to detect 400, 600 and 3000 of these SNPs. Based on the preliminary results, using oligonucleotide arrays to genotype several thousand polymorphic loci simultaneously appears feasible.
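A minimal sketch of how per-allele perfect match (PM) and mismatch (MM) probe intensities might be reduced to a genotype call, assuming a simple relative-allele-signal ratio. The functions and the 0.3/0.7 cutoffs are hypothetical and are not the calling rule used in this work.

```python
from statistics import mean


def relative_allele_signal(pm_a, mm_a, pm_b, mm_b):
    """Combine PM and MM intensities of both alleles of one SNP into a ratio in [0, 1]."""
    # Mean PM - MM difference per allele; clipped at 0 because MM > PM
    # carries no evidence for that allele.
    s_a = max(0.0, mean(p - m for p, m in zip(pm_a, mm_a)))
    s_b = max(0.0, mean(p - m for p, m in zip(pm_b, mm_b)))
    if s_a + s_b == 0:
        return None  # no usable hybridization signal
    return s_a / (s_a + s_b)


def call_genotype(ras, low=0.3, high=0.7):
    """Map the ratio to a call; the cutoffs are illustrative assumptions."""
    if ras is None:
        return "NoCall"
    if ras >= high:
        return "AA"
    if ras <= low:
        return "BB"
    return "AB"


ras = relative_allele_signal([900, 850, 920], [100, 120, 90], [110, 130, 100], [100, 120, 95])
print(call_genotype(ras))  # AA
```

Real arrays use several probe sets per allele and more careful normalization; the point here is only that each allele's evidence is the PM signal in excess of its MM control.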

2.
Association studies in populations that are genetically heterogeneous can yield large numbers of spurious associations if population subgroups are unequally represented among cases and controls. This problem is particularly acute for studies involving pooled genotyping of very large numbers of single-nucleotide-polymorphism (SNP) markers, because most methods for analysis of association in structured populations require individual genotyping data. In this study, we present several strategies for matching case and control pools to have similar genetic compositions, based on ancestry information inferred from genotype data for approximately 300 SNPs tiled on an oligonucleotide-based genotyping array. We also discuss methods for measuring the impact of population stratification on an association study. Results for an admixed population and a phenotype strongly confounded with ancestry show that these simple matching strategies can effectively mitigate the impact of population stratification.
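The abstract does not spell out its matching strategies. As one simple possibility, assuming each individual has a single inferred ancestry proportion (for example, the fraction of ancestry from one parental population), controls for the control pool could be chosen by greedy nearest-neighbour matching. The function name is hypothetical.

```python
def match_controls(case_ancestry, control_ancestry):
    """Greedily pair each case with the unused control whose inferred ancestry
    proportion is closest; returns the indices of the selected controls."""
    if len(control_ancestry) < len(case_ancestry):
        raise ValueError("need at least as many controls as cases")
    available = set(range(len(control_ancestry)))
    chosen = []
    for a in case_ancestry:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = min(sorted(available), key=lambda j: abs(control_ancestry[j] - a))
        chosen.append(best)
        available.remove(best)
    return chosen


cases = [0.9, 0.7, 0.25]
controls = [0.1, 0.75, 0.5, 0.95, 0.3]
picked = match_controls(cases, controls)
print(picked)  # [3, 1, 4]
```

A greedy pass depends on the order of the cases; an optimal assignment (for example, the Hungarian algorithm) would instead minimize the total ancestry mismatch.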

3.
High‐throughput high‐density genotyping arrays continue to be a fast, accurate, and cost‐effective method for genotyping thousands of polymorphisms in large numbers of individuals. Here, we have developed a new high‐density SNP genotyping array (103,270 SNPs) for honey bees, one of the most ecologically and economically important pollinators worldwide. SNPs were detected by whole‐genome resequencing of 61 honey bee drones (haploid males) from throughout Europe. SNPs for the chip were selected in multiple steps using several criteria. The majority of SNPs were selected based on their location within known candidate regions or genes underlying a range of honey bee traits, including hygienic behavior against pathogens, foraging, and subspecies. Additionally, markers from a GWAS of hygienic behavior against the major honey bee parasite Varroa destructor were carried over. The chip also includes SNPs associated with each of three major breeding objectives: honey yield, gentleness, and Varroa resistance. We validated the chip and derived recommendations for its use by determining error rates in repeat genotypings, examining the genotyping performance of different tissues, and testing how well different sample types represent the queen's genotype. The latter is a key test because it is highly beneficial to be able to determine the queen's genotype by nonlethal means. The array is now publicly available, and we suggest it will be a useful tool for genomic selection and honey bee breeding, as well as for GWAS of different traits and for questions of population genomics, adaptation, and conservation.

4.
Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation among species. With genome‐wide SNP discovery, many genome‐wide association studies are likely to identify multiple genetic variants associated with complex diseases. However, genotyping all existing SNPs in a large number of samples is still challenging, even though SNP arrays have been developed to facilitate the task. It is therefore essential to select only informative SNPs that represent the original SNP distribution in the genome (tag SNP selection) for genome‐wide association studies. These SNPs are usually chosen from haplotypes and are called haplotype tag SNPs (htSNPs). Accordingly, the scale and cost of genotyping are expected to be greatly reduced. We introduce binary particle swarm optimization (BPSO) with local search capability to improve the prediction accuracy of STAMPA. The proposed method does not rely on block partitioning of the genomic region. We compared the prediction accuracy and time complexity of BPSO with those of STAMPA and an SVM‐based method (SVM/STSA) using publicly available data sets. BPSO consistently identified tag SNPs with higher prediction accuracy than either STAMPA or SVM/STSA, on both smaller and larger data sets. These results demonstrate that the BPSO method selects tag SNPs with higher accuracy regardless of the scale of the data set. © 2009 American Institute of Chemical Engineers Biotechnol. Prog., 2010
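A minimal sketch of binary particle swarm optimization over 0/1 SNP-inclusion vectors, using the standard sigmoid-of-velocity update. It omits the local-search step described above, and the toy fitness function (a reward for selecting SNPs in a hypothetical informative set, a penalty for any other selected SNP) stands in for the real objective, which would be the accuracy of predicting untagged SNPs from the chosen tags. All parameter values are illustrative.

```python
import math
import random


def bpso_select(n_snps, fitness, n_particles=20, n_iter=50,
                w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO over 0/1 SNP-inclusion vectors; `fitness` is maximized."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_snps)] for _ in range(n_particles)]
    vel = [[0.0] * n_snps for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # best vector seen by each particle
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # best vector seen by the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_snps):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp velocity
                # The sigmoid of the velocity is the probability that bit d is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


TARGET = {1, 4, 7}  # hypothetical set of informative SNP indices


def toy_fitness(bits):
    """Stand-in objective: +1 per selected target SNP, -0.5 per other selected SNP."""
    hits = sum(1 for i, b in enumerate(bits) if b and i in TARGET)
    return hits - 0.5 * (sum(bits) - hits)


best, score = bpso_select(10, toy_fitness)
print(best, score)
```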

5.
MOTIVATION: Modern strategies for mapping disease loci require efficient genotyping of a large number of known polymorphic sites in the genome. The sensitive and high-throughput nature of hybridization-based DNA microarray technology provides an ideal platform for such an application by interrogating up to hundreds of thousands of single nucleotide polymorphisms (SNPs) in a single assay. As with expression arrays, these genotyping arrays pose many data analytic challenges that are often platform specific. Affymetrix SNP arrays, for example, use multiple sets of short oligonucleotide probes for each known SNP and require effective statistical methods to combine these probe intensities into reliable and accurate genotype calls. RESULTS: We developed MAMS, an integrated multi-SNP, multi-array genotype calling algorithm for Affymetrix SNP arrays that combines single-array, multi-SNP (SAMS) and multi-array, single-SNP (MASS) calls to improve the accuracy of genotype calls, without the need for training data or the computation-intensive normalization procedures used in other multi-array methods. The algorithm uses resampling techniques and model-based clustering to derive single-array genotype calls, which are subsequently refined by competitive genotype calls based on MASS clustering. The resampling scheme caps the computation for single-array analysis and hence is readily scalable, which is important given the expanding numbers of SNPs per array. The MASS update is designed to improve calls for atypical SNPs with allele-imbalanced binding affinities, which are difficult to genotype without information from other arrays. Using a publicly available data set of HapMap samples from Affymetrix, and independent calls by alternative genotyping methods from the HapMap project, we show that our approach performs competitively with existing methods. AVAILABILITY: R functions are available upon request from the authors.

6.
Genetics, 2015, 200(4): 1051-1060
The Kaiser Permanente (KP) Research Program on Genes, Environment and Health (RPGEH), in collaboration with the University of California—San Francisco, undertook genome-wide genotyping of >100,000 subjects that constitute the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. The project, which generated >70 billion genotypes, represents the first large-scale use of the Affymetrix Axiom Genotyping Solution. Because genotyping took place over a short 14-month period, creating a near-real-time analysis pipeline for experimental assay quality control and final optimized analyses was critical. Because of the multi-ethnic nature of the cohort, four different ethnic-specific arrays were employed to enhance genome-wide coverage. All assays were performed on DNA extracted from saliva samples. To improve sample call rates and significantly increase genotype concordance, we partitioned the cohort into disjoint packages of plates with similar assay contexts. Using strict QC criteria, the overall genotyping success rate was 103,067 of 109,837 samples assayed (93.8%), with a range of 92.1–95.4% for the four different arrays. Similarly, the SNP genotyping success rate ranged from 98.1 to 99.4% across the four arrays, the variation depending mostly on how many SNPs were included as single copy vs. double copy on a particular array. The high quality and large scale of genotype data created on this cohort, in conjunction with comprehensive longitudinal data from the KP electronic health records of participants, will enable a broad range of highly powered genome-wide association studies on a diversity of traits and conditions.
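As a quick arithmetic check of the reported overall sample success rate (the counts are taken from the text above; the helper name is illustrative):

```python
def success_rate(passed, assayed):
    """Percentage of assayed samples that passed QC."""
    return 100.0 * passed / assayed


passed, assayed = 103_067, 109_837
print(f"{success_rate(passed, assayed):.1f}%")  # 93.8%
print(assayed - passed)                          # 6770 samples failed QC
```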

7.
MOTIVATION: A high density of single nucleotide polymorphism (SNP) coverage on the genome is desirable and often an essential requirement for population genetics studies. Region-specific or chromosome-specific linkage studies also benefit from the availability of as many high-quality SNPs as possible. The availability of millions of SNPs from both Perlegen and the public domain, and the development of an efficient microarray-based assay for genotyping SNPs, have raised some interesting analytical challenges. Effective methods for selecting optimal subsets of SNPs spanning the genome and for accurately calling genotypes from probe hybridization patterns have enabled the development of a new microarray-based system for robustly genotyping over 100,000 SNPs per sample. RESULTS: We introduce a new dynamic model-based algorithm (DM) for screening over 3 million SNPs and genotyping over 100,000 SNPs. The model is based on four possible underlying states for each probe quartet: Null, A, AB and B. We calculate a probe-level log likelihood for each model and then select among the four competing models by an SNP-level statistical aggregation across multiple probe quartets, providing a high-quality genotype call along with a quality measure of the call. We assess performance with HapMap reference genotypes, informative Mendelian inheritance relationships in families, and consistency between DM and another genotype classification method. At a call rate of 95.91%, the concordance with reference genotypes from the HapMap Project is 99.81% based on over 1.5 million genotypes, the Mendelian error rate is 0.018% based on 10 trios, and the consistency between DM and MPAM is 99.90% at a comparable call rate of 97.18%. We also develop methods for SNP selection and optimal probe selection. AVAILABILITY: The DM algorithm is available in Affymetrix's Genotyping Tools software package and in Affymetrix's GDAS software package.
See http://www.affymetrix.com for further information. 10 K and 100 K mapping array data are available on the Affymetrix website.
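The four-model selection can be illustrated as follows. This is a minimal sketch, not Affymetrix's DM implementation: it assumes each probe quartet has already been summarized as a pair of normalized allele-A and allele-B discrimination values, that these are Gaussian around hypothetical expected values for each state, and it uses the log-likelihood margin between the two best states as a crude quality measure.

```python
import math

# Hypothetical expected (allele-A, allele-B) discrimination for one probe
# quartet under each of the four states.
EXPECTED = {"Null": (0.0, 0.0), "A": (1.0, 0.0), "AB": (0.5, 0.5), "B": (0.0, 1.0)}


def quartet_loglik(obs, mean, sigma):
    """Log likelihood of one quartet's (dA, dB) under independent Gaussians."""
    return sum(-0.5 * ((o - m) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
               for o, m in zip(obs, mean))


def call_snp(quartets, sigma=0.2):
    """Aggregate quartet log likelihoods per state, pick the best state, and
    report its margin over the runner-up as a crude quality score."""
    totals = {state: sum(quartet_loglik(q, mu, sigma) for q in quartets)
              for state, mu in EXPECTED.items()}
    ranked = sorted(totals, key=totals.get, reverse=True)
    best, second = ranked[0], ranked[1]
    return best, totals[best] - totals[second]


state, margin = call_snp([(0.95, 0.05), (0.9, 0.1), (1.05, -0.02)])
print(state)  # A
```

Summing log likelihoods across quartets is what lets several weakly informative probes combine into one confident SNP-level call.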

8.
SNP arrays are widely used in genetic research and agricultural genomics applications, and the quality of SNP genotyping data is of paramount importance. In the present study, SNP genotyping concordance and discordance were evaluated for commercial bovine SNP arrays based on two types of quality assurance (QA) samples provided by Neogen GeneSeek. The genotyping discordance rates (GDRs) between chips averaged between 0.06% and 0.37% based on the QA type I data and between 0.05% and 0.15% based on the QA type II data. The average genotyping error rate (GER) pertaining to single SNP chips, based on the QA type II data, varied between 0.02% and 0.08% per SNP and between 0.01% and 0.06% per sample. These results indicate that the genotyping concordance rate was high (i.e. from 99.63% to 99.99%). Nevertheless, mitochondrial and Y chromosome SNPs had considerably elevated GDRs and GERs compared with the SNPs on the 29 autosomes and the X chromosome. The majority of genotyping errors resulted from single allotyping errors, which included both allele ‘dropout’ (i.e. from AB to AA or BB) and the opposite instances. Simultaneous allotyping errors on both alleles (e.g. mistaking AA for BB or vice versa) were relatively rare. Finally, a list of SNPs with a GER greater than 1% is provided. Association effects of these SNPs, for example in genome‐wide association studies, should be interpreted with caution. The genotyping concordance information needs to be considered in the optimal design of future bovine SNP arrays.
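The exact GDR and GER definitions used in the study are not given here. As a minimal sketch under a common convention (discordant calls divided by the SNPs called on both chips, skipping no-calls), discordance between two replicate genotypings can be computed and split into single-allele and both-allele errors by comparing B-allele counts. The function names are hypothetical.

```python
from collections import Counter


def b_allele_count(genotype):
    """Number of B alleles in a genotype string such as 'AB'."""
    return genotype.count("B")


def compare_replicates(calls1, calls2):
    """Discordance rate between two genotypings of the same sample, plus a
    tally of single-allele vs both-allele differences. None marks a no-call."""
    compared = 0
    kinds = Counter()
    for g1, g2 in zip(calls1, calls2):
        if g1 is None or g2 is None:
            continue  # only SNPs called on both chips are compared
        compared += 1
        diff = abs(b_allele_count(g1) - b_allele_count(g2))
        if diff == 1:
            kinds["single_allele"] += 1   # e.g. AB vs AA (allele dropout)
        elif diff == 2:
            kinds["both_alleles"] += 1    # e.g. AA vs BB
    discordant = kinds["single_allele"] + kinds["both_alleles"]
    return (discordant / compared if compared else float("nan")), kinds


rate, kinds = compare_replicates(["AA", "AB", "BB", "AA", None, "AB"],
                                 ["AA", "AA", "BB", "BB", "AB", "AB"])
print(rate, dict(kinds))  # 0.4 {'single_allele': 1, 'both_alleles': 1}
```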

9.
Genotyping and annotation of Affymetrix SNP arrays
In this paper we develop a new method for genotyping Affymetrix single nucleotide polymorphism (SNP) arrays. The method is based on (i) using multiple arrays at the same time to determine the genotypes and (ii) a model that relates intensities of individual SNPs to each other. The latter point allows us to annotate SNPs that perform poorly, either because of poor experimental conditions or because the probes for one of the alleles do not behave in a dose–response manner. Generally, our method agrees well with a method developed by Affymetrix. When both methods make a call, they agree in 99.25% of cases (using standard settings), based on a sample of 113 Affymetrix 10k SNP arrays. In the majority of cases where the two methods disagree, our method makes a genotype call whereas the Affymetrix method makes a no call, i.e. the genotype of the SNP is not determined. Visual inspection indicates that our method is likely to be correct in the majority of these cases. In addition, we demonstrate that our method produces more SNPs that are consistent with Hardy–Weinberg equilibrium than the Affymetrix method. Finally, we have validated our method on HapMap data and shown that its performance is comparable to that of other methods.
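The abstract does not state which Hardy–Weinberg test was used. A common choice is the one-degree-of-freedom chi-square goodness-of-fit test sketched below; for rare alleles an exact test is generally preferred. The function name is illustrative.

```python
import math


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square (1 df) goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(hwe_chi2(25, 50, 25))  # (0.0, 1.0)
print(hwe_chi2(50, 0, 50))   # heterozygote deficit: chi2 = 100, tiny p-value
```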

10.
A genotype calling algorithm for the Illumina BeadArray platform
MOTIVATION: Large-scale genotyping relies on the use of unsupervised automated calling algorithms to assign genotypes to hybridization data. A number of such calling algorithms have recently been established for the Affymetrix GeneChip genotyping technology. Here, we present a fast and accurate genotype calling algorithm for the Illumina BeadArray genotyping platforms. As the technology moves towards assaying millions of genetic polymorphisms simultaneously, there is a need for integrated, easy-to-use software for calling genotypes. RESULTS: We have introduced a model-based genotype calling algorithm that does not rely on prior training data or require computationally intensive procedures. The algorithm can assign genotypes to hybridization data from thousands of individuals simultaneously and pools information across multiple individuals to improve the calling. The method can accommodate variations in hybridization intensities, which result in dramatic shifts in the positions of the genotype clouds, by identifying the optimal coordinates to initialize the algorithm. By incorporating perturbation analysis, we can obtain a quality metric measuring the stability of the assigned genotype calls. We show that this quality metric can be used to identify SNPs with low call rates and accuracy. AVAILABILITY: The C++ executable for the algorithm described here is available by request from the authors.

11.

Background

High-density genotyping data are indispensable for genomic analyses of complex traits in animal and crop species. Maize is one of the most important crop plants worldwide; however, a high-density SNP genotyping array for analysis of its large and highly dynamic genome has not been available so far.

Results

We developed a high density maize SNP array composed of 616,201 variants (SNPs and small indels). Initially, 57 M variants were discovered by sequencing 30 representative temperate maize lines and then stringently filtered for sequence quality scores and predicted conversion performance on the array resulting in the selection of 1.2 M polymorphic variants assayed on two screening arrays. To identify high-confidence variants, 285 DNA samples from a broad genetic diversity panel of worldwide maize lines including the samples used for sequencing, important founder lines for European maize breeding, hybrids, and proprietary samples with European, US, semi-tropical, and tropical origin were used for experimental validation. We selected 616 k variants according to their performance during validation, support of genotype calls through sequencing data, and physical distribution for further analysis and for the design of the commercially available Affymetrix® Axiom® Maize Genotyping Array. This array is composed of 609,442 SNPs and 6,759 indels. Among these are 116,224 variants in coding regions and 45,655 SNPs of the Illumina® MaizeSNP50 BeadChip for study comparison. In a subset of 45,974 variants, apart from the target SNP additional off-target variants are detected, which show only a minor bias towards intermediate allele frequencies. We performed principal coordinate and admixture analyses to determine the ability of the array to detect and resolve population structure and investigated the extent of LD within a worldwide validation panel.

Conclusions

The high-density Affymetrix® Axiom® Maize Genotyping Array is optimized for European and American temperate maize and was developed based on a diverse sample panel by applying stringent quality filter criteria to ensure its suitability for a broad range of applications. With 600 k variants, it is currently the largest publicly available genotyping array for a crop species.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-823) contains supplementary material, which is available to authorized users.

12.
Robust assessment of genetic effects on quantitative traits or complex-disease risk requires synthesis of evidence from multiple studies. Frequently, studies have genotyped partially overlapping sets of SNPs within a gene or region of interest, hampering attempts to combine all the available data. By using the example of C-reactive protein (CRP) as a quantitative trait, we show how linkage disequilibrium in and around its gene facilitates use of Bayesian hierarchical models to integrate informative data from all available genetic association studies of this trait, irrespective of the SNPs typed. A variable selection scheme, followed by contextualization of SNPs exhibiting independent associations within the haplotype structure of the gene, enhanced our ability to infer likely causal variants in this region with population-scale data. This strategy, based on data from a literature-based systematic review and substantial new genotyping, facilitated the most comprehensive evaluation to date of the role of variants governing CRP levels, providing important information on the minimal subset of SNPs necessary for comprehensive evaluation of the likely causal relevance of elevated CRP levels for coronary-heart-disease risk by Mendelian randomization. The same method could be applied to evidence synthesis of other quantitative traits, whenever the typed SNPs vary among studies, and to assist fine mapping of causal variants.

13.
Saunders IW, Brohede J, Hannan GN. Genomics, 2007, 90(3): 291-296
A simple method of inferring the genotyping error rate of SNP arrays and similar high-throughput genotyping methods from Mendelian errors is described. Application to genotypes from small families using the Affymetrix GeneChip Human Mapping 50 k Array indicates an error rate of about 0.1%. This rate can be reduced by raising the quality criterion for calls, but at the cost of a lower genotype call rate, which limits the benefit available. Simulated data are used to show that the number of SNPs on this array is sufficient for such a low error rate to have little impact on identity-by-descent-based inference for disease linkage in sib-pair studies.
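Detecting a Mendelian error in a trio reduces to asking whether the child's genotype can be formed from one allele of each parent. A minimal sketch follows (function names hypothetical). Note that the observed Mendelian error rate understates the genotyping error rate, because many miscalls remain Mendelian-consistent; converting one into the other, which is the subject of the paper, needs an additional model not shown here.

```python
from itertools import product


def mendelian_consistent(father, mother, child):
    """True if the child's genotype can be formed from one allele of each parent.
    Genotypes are two-character strings such as 'AB'."""
    possible = {"".join(sorted(a + b)) for a, b in product(father, mother)}
    return "".join(sorted(child)) in possible


def mendelian_error_rate(trios):
    """Fraction of fully called (father, mother, child) SNP genotypes that are
    Mendelian-inconsistent; trios with a no-call (None) are skipped."""
    checked = errors = 0
    for f, m, c in trios:
        if None in (f, m, c):
            continue
        checked += 1
        errors += int(not mendelian_consistent(f, m, c))
    return errors / checked if checked else float("nan")


trios = [("AA", "BB", "AB"), ("AA", "AA", "AB"), ("AB", "AB", "BB"), (None, "AA", "AA")]
print(mendelian_error_rate(trios))  # 0.3333333333333333
```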

14.
The dog is a valuable model species for the genetic analysis of complex traits, and the use of genotype imputation in dogs will be an important tool for future studies. It is of particular interest to analyse the effect of factors like single nucleotide polymorphism (SNP) density of genotyping arrays and relatedness between dogs on imputation accuracy due to the acknowledged genetic and pedigree structure of dog breeds. In this study, we simulated different genotyping strategies based on data from 1179 Labrador Retriever dogs. The study involved 5826 SNPs on chromosome 1 representing the high density (HighD) array; the low‐density (LowD) array was simulated by masking different proportions of SNPs on the HighD array. The correlations between true and imputed genotypes for a realistic masking level of 87.5% ranged from 0.92 to 0.97, depending on the scenario used. A correlation of 0.92 was found for a likely scenario (10% of dogs genotyped using HighD, 87.5% of HighD SNPs masked in the LowD array), which indicates that genotype imputation in Labrador Retrievers can be a valuable tool to reduce experimental costs while increasing sample size. Furthermore, we show that genotype imputation can be performed successfully even without pedigree information and with low relatedness between dogs in the reference and validation sets. Based on these results, the impact of genotype imputation was evaluated in a genome‐wide association analysis and genomic prediction in Labrador Retrievers.
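The masking design and the accuracy measure described above can be sketched as follows, assuming genotypes coded as 0/1/2 allele dosages. The helper names and the uniform random masking scheme are illustrative assumptions; the imputation step itself is not shown.

```python
import math
import random


def mask_snps(n_snps, fraction, seed=0):
    """Indices of high-density SNPs hidden to simulate a low-density array."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_snps), round(n_snps * fraction)))


def pearson(x, y):
    """Correlation between true and imputed genotype dosages."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


masked = mask_snps(5826, 0.875)
print(len(masked))  # 5098 of the 5826 chromosome-1 SNPs are hidden

true = [0, 1, 2, 2, 1, 0]
imputed = [0.1, 0.9, 1.8, 2.0, 1.2, 0.0]
print(round(pearson(true, imputed), 3))  # 0.989
```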

15.
A great promise of publicly sharing genome-wide association data is the potential to create composite sets of controls. However, studies often use different genotyping arrays, and imputation to a common set of SNPs has shown substantial bias: a problem which has no broadly applicable solution. Based on the idea that using differing genotyped SNP sets as inputs creates differential imputation errors and thus bias in the composite set of controls, we examined the degree to which each of the following occurs: (1) imputation based on the union of genotyped SNPs (i.e., SNPs available on one or more arrays) results in bias, as evidenced by spurious associations (type 1 error) between imputed genotypes and arbitrarily assigned case/control status; (2) imputation based on the intersection of genotyped SNPs (i.e., SNPs available on all arrays) does not evidence such bias; and (3) imputation quality varies by the size of the intersection of genotyped SNP sets. Imputations were conducted in European Americans and African Americans with reference to HapMap phase II and III data. Imputation based on the union of genotyped SNPs across the Illumina 1M and 550v3 arrays showed spurious associations for 0.2% of SNPs: ~2,000 false positives per million SNPs imputed. Biases remained problematic for very similar arrays (550v1 vs. 550v3) and were substantial for dissimilar arrays (Illumina 1M vs. Affymetrix 6.0). In all instances, imputing based on the intersection of genotyped SNPs (as few as 30% of the total SNPs genotyped) eliminated such bias while still achieving good imputation quality.
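The two input choices compared above amount to simple set operations on the SNP IDs genotyped by each array. A minimal sketch with made-up array contents (the array names and SNP IDs are hypothetical):

```python
def imputation_inputs(arrays):
    """Given array name -> set of genotyped SNP IDs, return the union and the
    intersection of the sets and the fraction of the union the intersection keeps."""
    sets = list(arrays.values())
    union = set().union(*sets)
    common = set.intersection(*sets)
    return union, common, len(common) / len(union)


arrays = {  # hypothetical arrays and SNP IDs
    "array_1": {"rs1", "rs2", "rs3", "rs4"},
    "array_2": {"rs3", "rs4", "rs5"},
}
union, common, kept = imputation_inputs(arrays)
print(sorted(common), kept)  # ['rs3', 'rs4'] 0.4
```

Only the intersection gives every study an identical set of imputation inputs, which is the property the abstract links to the absence of bias.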

16.
The success of genome-wide association studies has paralleled the development of efficient genotyping technologies. We describe the development of a next-generation microarray based on the new highly-efficient Affymetrix Axiom genotyping technology that we are using to genotype individuals of European ancestry from the Kaiser Permanente Research Program on Genes, Environment and Health (RPGEH). The array contains 674,517 SNPs, and provides excellent genome-wide as well as gene-based and candidate-SNP coverage. Coverage was calculated using an approach based on imputation and cross validation. Preliminary results for the first 80,301 saliva-derived DNA samples from the RPGEH demonstrate very high quality genotypes, with sample success rates above 94% and over 98% of successful samples having SNP call rates exceeding 98%. At steady state, we have produced 462 million genotypes per week for each Axiom system. The new array provides a valuable addition to the repertoire of tools for large scale genome-wide association studies.

17.
Hao K, Li C, Rosenow C, Hung Wong W. Genomics, 2004, 84(4): 623-630
Currently, most analytical methods assume all observed genotypes are correct; however, it is clear that errors may reduce statistical power or bias inference in genetic studies. We propose procedures for estimating error rate in genetic analysis and apply them to study the GeneChip Mapping 10K array, which is a technology that has recently become available and allows researchers to survey over 10,000 SNPs in a single assay. We employed a strategy to estimate the genotype error rate in pedigree data. First, the "dose-response" reference curve between error rate and the observable error number was derived by simulation, conditional on given pedigree structures and genotypes. Second, the error rate was estimated by calibrating the number of observed errors in real data to the reference curve. We evaluated the performance of this method by simulation study and applied it to a data set of 30 pedigrees genotyped using the GeneChip Mapping 10K array. This method performed favorably in all scenarios we surveyed. The dose-response reference curve was monotone and almost linear with a large slope. The method was able to estimate accurately the error rate under various pedigree structures and error models and under heterogeneous error rates. Using this method, we found that the average genotyping error rate of the GeneChip Mapping 10K array was about 0.1%. Our method provides a quick and unbiased solution to address the genotype error rate in pedigree data. It behaves well in a wide range of settings and can be easily applied in other genetic projects. The robust estimation of genotyping error rate allows us to estimate power and sample size and conduct unbiased genetic tests. The GeneChip Mapping 10K array has a low overall error rate, which is consistent with the results obtained from alternative genotyping assays.
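The second step, calibrating an observed error count against the simulated dose-response curve, can be sketched as inverse linear interpolation on a monotone curve. The curve values below are invented for illustration; in the method described, they would come from simulation on the actual pedigree structures and genotypes, which is not shown.

```python
def calibrate_error_rate(curve, observed_errors):
    """Invert a monotone increasing reference curve by linear interpolation.

    curve: (error_rate, expected_observed_errors) pairs sorted by error rate.
    """
    for (r0, e0), (r1, e1) in zip(curve, curve[1:]):
        if e0 <= observed_errors <= e1:
            if e1 == e0:
                return r0
            return r0 + (observed_errors - e0) * (r1 - r0) / (e1 - e0)
    raise ValueError("observed error count lies outside the simulated curve")


# Invented reference curve for illustration only.
curve = [(0.0, 0.0), (0.001, 120.0), (0.002, 240.0), (0.005, 600.0)]
print(calibrate_error_rate(curve, 180.0))  # approximately 0.0015
```

Because the abstract reports the curve as nearly linear with a large slope, small uncertainties in the observed count translate into small uncertainties in the calibrated rate.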

18.
Loss of heterozygosity (LOH) of chromosomal regions bearing tumor suppressors is a key event in the evolution of epithelial and mesenchymal tumors. Identification of these regions usually relies on genotyping tumor and counterpart normal DNA and noting regions where heterozygous alleles in the normal DNA become homozygous in the tumor. However, paired normal samples for tumors and cell lines are often not available. With the advent of oligonucleotide arrays that simultaneously assay thousands of single-nucleotide polymorphism (SNP) markers, genotyping can now be done at high enough resolution to allow identification of LOH events by the absence of heterozygous loci, without comparison to normal controls. Here we describe a hidden Markov model-based method to identify LOH from unpaired tumor samples, taking into account SNP intermarker distances, SNP-specific heterozygosity rates, and the haplotype structure of the human genome. When we applied the method to data genotyped on 100 K arrays, we correctly identified 99% of SNP markers as either retention or loss. We also correctly identified 81% of the regions of LOH, including 98% of regions greater than 3 megabases. By integrating copy number analysis into the method, we were able to distinguish LOH from allelic imbalance. Application of this method to data from a set of prostate samples without paired normals identified known regions of prevalent LOH. We have developed a method for analyzing high-density oligonucleotide SNP array data to accurately identify regions of LOH and retention in tumors without the need for paired normal samples.
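A minimal two-state Viterbi sketch of this idea follows. It keeps the SNP-specific heterozygosity rates and lets the switch probability grow with intermarker distance, but it omits the haplotype structure and copy-number integration described above. The genotyping-error rate `eps`, the segment-length scale `seg_len`, and the prior `prior_loss` are hypothetical parameters.

```python
import math


def viterbi_loh(calls, het_rates, positions, eps=0.01, seg_len=5e6, prior_loss=0.1):
    """Most likely retention ('RET') / loss ('LOH') state for each SNP.

    calls: 'het' or 'hom' per SNP; het_rates: SNP-specific heterozygosity
    in the population; positions: base-pair coordinates in ascending order.
    """
    states = ("RET", "LOH")

    def emit(state, call, h):
        # Retained regions show hets at the SNP's population rate; lost
        # regions show hets only through genotyping error (eps).
        p_het = h if state == "RET" else eps
        return math.log(p_het if call == "het" else 1.0 - p_het)

    score = {"RET": math.log(1.0 - prior_loss) + emit("RET", calls[0], het_rates[0]),
             "LOH": math.log(prior_loss) + emit("LOH", calls[0], het_rates[0])}
    back = []
    for i in range(1, len(calls)):
        d = positions[i] - positions[i - 1]
        # Switching is more likely between distant markers; clamped to avoid log(0).
        switch = min(0.5, max(1e-9, 1.0 - math.exp(-d / seg_len)))
        stay, move = math.log(1.0 - switch), math.log(switch)
        new_score, ptr = {}, {}
        for s in states:
            cands = {t: score[t] + (stay if t == s else move) for t in states}
            prev = max(cands, key=cands.get)
            new_score[s] = cands[prev] + emit(s, calls[i], het_rates[i])
            ptr[s] = prev
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(score, key=score.get)
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


calls = ["het", "hom"] * 3 + ["hom"] * 20 + ["het", "hom"] * 3
positions = [i * 100_000 for i in range(len(calls))]
path = viterbi_loh(calls, [0.4] * len(calls), positions)
print("".join("L" if s == "LOH" else "." for s in path))
```

A single homozygous call is weak evidence on its own; only a long run of them outweighs the cost of switching into the loss state, which is why the model needs no paired normal sample.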

19.
High‐density SNP genotyping arrays can be designed for any species given sufficient sequence information of high quality. Two high‐density SNP arrays relying on the Infinium iSelect technology (Illumina) were designed for use in the conifer white spruce (Picea glauca). One array contained 7338 segregating SNPs representative of 2814 genes of various molecular functional classes for main uses in genetic association and population genetics studies. The other one contained 9559 segregating SNPs representative of 9543 genes for main uses in population genetics, linkage mapping of the genome and genomic prediction. The SNPs assayed were discovered from various sources of gene resequencing data. SNPs predicted from high‐quality sequences derived from genomic DNA reached a genotyping success rate of 64.7%. Nonsingleton in silico SNPs (i.e. a sequence polymorphism present in at least two reads) predicted from expressed sequenced tags obtained with the Roche 454 technology and Illumina GAII analyser resulted in a similar genotyping success rate of 71.6% when the deepest alignment was used and the most favourable SNP probe per gene was selected. A variable proportion of these SNPs was shared by other Nordic and subtropical spruce species from North America and Europe. The number of shared SNPs was inversely proportional to phylogenetic divergence and standing genetic variation in the recipient species, but positively related to allele frequency in P. glauca natural populations. These validated SNP resources should open up new avenues for population genetics and comparative genetic mapping at a genomic scale in spruce species.

20.
Single-nucleotide polymorphisms (SNPs) are considered useful polymorphic markers for genetic studies of polygenic traits. A new practical approach to high-throughput genotyping of SNPs in a large number of individuals is needed for association studies and other studies of the relationships between genes and diseases. We have developed an accurate and high-throughput method for determining allele frequencies by pooling DNA samples and applying DNA microarray hybridization analysis. In this method, the combination of the microarray, DNA pooling, probe pair hybridization, and fluorescent ratio analysis solves the dual problems of parallel analysis of multiple samples and parallel multiplex SNP genotyping for association studies. Multiple DNA samples are immobilized on a slide, and a single hybridization is performed with a pool of allele-specific oligonucleotide probes. The results of this study show that microarray hybridization of pooled DNA samples can accurately estimate absolute allele frequencies in a sample pool. This method can also be used to identify differences in allele frequencies between distinct populations. It is amenable to automation and is suitable for immediate use in high-throughput SNP genotyping.
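The abstract does not give the estimator used. One common way to turn the two allele-specific fluorescent signals from a pool into an allele frequency is a ratio corrected by a factor k, the mean A/B signal ratio observed in known heterozygotes (where both alleles are equally abundant), which compensates for unequal hybridization efficiency of the two probes. A minimal sketch under that assumption, with hypothetical function names:

```python
from statistics import mean


def k_correction(het_a, het_b):
    """Mean A/B signal ratio in known heterozygotes; corrects for unequal
    hybridization efficiency of the two allele-specific probes."""
    return mean(a / b for a, b in zip(het_a, het_b))


def pooled_allele_freq(signal_a, signal_b, k=1.0):
    """Estimate the frequency of allele A in a DNA pool from the two
    allele-specific fluorescent signals."""
    return signal_a / (signal_a + k * signal_b)


# Simulated example: probe A hybridizes 1.5x more efficiently than probe B,
# and the true frequency of allele A in the pool is 0.3.
k = k_correction([0.75], [0.5])
print(round(pooled_allele_freq(0.45, 0.7, k), 3))  # 0.3
```

Without the correction (k = 1), the same signals would give 0.45 / 1.15, about 0.39, overstating the frequency of the more efficiently hybridizing allele.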
