Found 20 similar articles (search time: 9 ms)
1.
Highly cost-efficient genome-wide association studies using DNA pools and dense SNP arrays (cited by: 3; self-citations: 0; citations by others: 3)
Macgregor S Zhao ZZ Henders A Nicholas MG Montgomery GW Visscher PM 《Nucleic acids research》2008,36(6):e35
Genome-wide association (GWA) studies to map genes for complex traits are powerful yet costly. DNA-pooling strategies have the potential to dramatically reduce the cost of GWA studies. Pooling using Affymetrix arrays has been proposed and used but the efficiency of these arrays has not been quantified. We compared and contrasted Affymetrix Genechip HindIII and Illumina HumanHap300 arrays on the same DNA pools and showed that the HumanHap300 arrays are substantially more efficient. In terms of effective sample size, HumanHap300-based pooling extracts >80% of the information available with individual genotyping (IG). In contrast, Genechip HindIII-based pooling only extracts ~30% of the available information. With HumanHap300 arrays concordance with IG data is excellent. Guidance is given on best study design and it is shown that even after taking into account pooling error, one stage scans can be performed for >100-fold reduced cost compared with IG. With appropriately designed two stage studies, IG can provide confirmation of pooling results whilst still providing ~20-fold reduction in total cost compared with IG-based alternatives. The large cost savings with Illumina HumanHap300-based pooling imply that future studies need only be limited by the availability of samples and not cost.
2.
A genome-wide detection of copy number variation using SNP genotyping arrays in Beijing-You chickens
Wei Zhou Ranran Liu Jingjing Zhang Maiqing Zheng Peng Li Guobin Chang Jie Wen Guiping Zhao 《Genetica》2014,142(5):441-450
Copy number variation (CNV) has been recently examined in many species and is recognized as being a source of genetic variability, especially for disease-related phenotypes. In this study, the PennCNV software was used for genome-wide CNV detection on 60 K SNP BeadChip data from a total of 1,310 Beijing-You chickens (a Chinese local breed). After quality control, 137 high-confidence CNVRs were identified, covering 27.31 Mb and corresponding to 2.61 % of the whole chicken genome. These regions contained 131 known genes or coding sequences. Q-PCR was applied to verify some of the genes related to disease development. Results showed that the copy number of genes such as phosphatidylinositol-5-phosphate 4-kinase II alpha, PHD finger protein 14, RHACD8 (a CD8α-like messenger RNA), MHC B-G, zinc finger protein, sarcosine dehydrogenase and ficolin 2 varied between individual chickens, which also supports the reliability of chip-based detection of the CNVs. As one source of genomic variation, CNVs may provide new insight into the relationship between the genome and phenotypic characteristics.
3.
Artur Gurgul Ewelina Semik Klaudia Pawlina Tomasz Szmatoła Igor Jasielczuk Monika Bugno-Poniewierska 《Journal of applied genetics》2014,55(2):197-208
Animal genomics is currently undergoing dynamic development, which is driven by the flourishing of high-throughput genome analysis methods. Recently, a large number of animals have been genotyped with the use of whole-genome genotyping assays in the course of genomic selection programmes. The results of such genotyping can also be used for studies on different aspects of livestock genome functioning and diversity. In this article, we review the recent literature concentrating on various aspects of animal genomics, including studies on linkage disequilibrium, runs of homozygosity, selection signatures, copy number variation and genetic differentiation of animal populations. Our work is aimed at providing insight into certain achievements of animal genomics and at arousing interest in basic research on the complexity and structure of the genomes of livestock.
4.
Eric O. Johnson Dana B. Hancock Joshua L. Levy Nathan C. Gaddis Nancy L. Saccone Laura J. Bierut Grier P. Page 《Human genetics》2013,132(5):509-522
A great promise of publicly sharing genome-wide association data is the potential to create composite sets of controls. However, studies often use different genotyping arrays, and imputation to a common set of SNPs has shown substantial bias: a problem which has no broadly applicable solution. Based on the idea that using differing genotyped SNP sets as inputs creates differential imputation errors and thus bias in the composite set of controls, we examined the degree to which each of the following occurs: (1) imputation based on the union of genotyped SNPs (i.e., SNPs available on one or more arrays) results in bias, as evidenced by spurious associations (type 1 error) between imputed genotypes and arbitrarily assigned case/control status; (2) imputation based on the intersection of genotyped SNPs (i.e., SNPs available on all arrays) does not evidence such bias; and (3) imputation quality varies by the size of the intersection of genotyped SNP sets. Imputations were conducted in European Americans and African Americans with reference to HapMap phase II and III data. Imputation based on the union of genotyped SNPs across the Illumina 1M and 550v3 arrays showed spurious associations for 0.2 % of SNPs: ~2,000 false positives per million SNPs imputed. Biases remained problematic for very similar arrays (550v1 vs. 550v3) and were substantial for dissimilar arrays (Illumina 1M vs. Affymetrix 6.0). In all instances, imputing based on the intersection of genotyped SNPs (as few as 30 % of the total SNPs genotyped) eliminated such bias while still achieving good imputation quality.
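The distinction this abstract draws between the union and the intersection of genotyped SNP sets can be illustrated with a minimal sketch. The function name, array labels and SNP IDs below are hypothetical and are not taken from the study; the sketch only shows how the two candidate input sets for imputation would be formed.

```python
def union_and_intersection(array_snps):
    """Return (union, intersection) of SNP IDs across genotyping arrays.

    array_snps maps an array name to the SNP IDs genotyped on it.
    """
    snp_sets = [set(snps) for snps in array_snps.values()]
    union = set.union(*snp_sets)                # SNPs on one or more arrays
    intersection = set.intersection(*snp_sets)  # SNPs on all arrays
    return union, intersection


# Hypothetical SNP IDs, for illustration only.
arrays = {
    "array_A": ["rs1", "rs2", "rs3", "rs4"],
    "array_B": ["rs2", "rs3", "rs5"],
}
union, intersection = union_and_intersection(arrays)
print(sorted(union))
print(sorted(intersection))
# Share of all genotyped SNPs that survive in the intersection.
print(len(intersection) / len(union))
```

Imputing from the intersection gives every sample the same set of input SNPs, which is the property the abstract credits with removing the bias.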
5.
Wollstein A Herrmann A Wittig M Nothnagel M Franke A Nürnberg P Schreiber S Krawczak M Hampe J 《Nucleic acids research》2007,35(17):e113
The power of a genome-wide disease association study depends critically upon the properties of the marker set used, particularly the number and physical spacing of markers, and the level of inter-marker association due to linkage disequilibrium. Extending our previously devised theoretical framework for the entropy-based selection of genetic markers, we have developed a local measure of the efficacy of a marker set, relative to including a maximally polymorphic single nucleotide polymorphism (SNP) at the map position of interest. Using this quantitative criterion, we evaluated five currently available SNP sets, namely Affymetrix 100K and 500K, and Illumina 100K, 300K and 550K in the CEU, YRI and JPT + CHB HapMap populations. At 50% relative efficacy, the commercial marker sets cover between 19 and 68% of the human genome, depending upon the population under study. An optimal technology-independent 500K marker set constructed from HapMap for Caucasians, in contrast, would achieve 73% coverage at the same relative efficacy.
6.
Grant SF Steinlicht S Nentwich U Kern R Burwinkel B Tolle R 《Nucleic acids research》2002,30(22):e125
With the increasing demand for higher throughput single nucleotide polymorphism (SNP) genotyping, the quantity of genomic DNA often falls short of the number of assays required. We investigated the use of degenerate oligonucleotide primed polymerase chain reaction (DOP-PCR) to generate a template for our SNP genotyping methodology of fluorescence polarization template-directed dye-terminator incorporation detection. DOP-PCR employs a degenerate primer (5′-CCGACTCGAGNNNNNNATGTGG-3′) to produce non-specific uniform amplification of DNA. This approach has been successfully applied to microsatellite genotyping. We compared genotyping of DOP-PCR-amplified genomic DNA to genomic DNA as a template. Results were analyzed with respect to feasibility, loss of alleles, genotyping accuracy and storage conditions in a high-throughput genotyping environment. DOP-PCR yielded overall satisfactory results, with a certain loss in accuracy and quality of the genotype assignments. Accuracy and quality of genotypes generated from the DOP-PCR template also depended on storage conditions. Adding carrier DNA to a final concentration of 10 ng/µl improved results. In conclusion, we have successfully used DOP-PCR to amplify our genomic DNA collection for subsequent SNP genotyping as a standard process.
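To make the degenerate primer quoted above concrete, the following sketch counts how many distinct sequences it represents and checks whether a given oligonucleotide is one of them. The helper names and the test sequence are hypothetical, and the sketch assumes that N is the only ambiguity code present (other IUPAC codes are not handled).

```python
import re

DOP_PRIMER = "CCGACTCGAGNNNNNNATGTGG"  # primer sequence quoted in the abstract


def degeneracy(primer):
    """Number of distinct sequences represented when each N is A, C, G or T."""
    return 4 ** primer.count("N")


def primer_regex(primer):
    """Compile a regex in which each N matches any single base."""
    return re.compile(primer.replace("N", "[ACGT]"))


print(degeneracy(DOP_PRIMER))  # six N positions -> 4**6 = 4096

pattern = primer_regex(DOP_PRIMER)
# Hypothetical oligo: the fixed 5' and 3' ends with ACGTAC in the N positions.
print(bool(pattern.fullmatch("CCGACTCGAGACGTACATGTGG")))
```

The six fully degenerate positions are what allow the primer to anneal at many sites across the genome, which is the basis of the non-specific amplification described in the abstract.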
7.
To facilitate whole-genome association studies (WGAS), several high-density SNP genotyping arrays have been developed. Genetic coverage and statistical power are the primary benchmark metrics in evaluating the performance of SNP arrays. Ideally, such evaluations would be done on a SNP set and a cohort of individuals that are both independently sampled from the original SNPs and individuals used in developing the arrays. Without utilization of an independent test set, previous estimates of genetic coverage and statistical power may be subject to an overfitting bias. Additionally, the SNP arrays' statistical power in WGAS has not been systematically assessed on real traits. One robust setting for doing so is to evaluate statistical power on thousands of traits measured from a single set of individuals. In this study, 359 newly sampled Americans of European descent were genotyped using both Affymetrix 500K (Affx500K) and Illumina 650Y (Ilmn650K) SNP arrays. From these data, we were able to obtain estimates of genetic coverage, which are robust to overfitting, by constructing an independent test set from among these genotypes and individuals. Furthermore, we collected liver tissue RNA from the participants and profiled these samples on a comprehensive gene expression microarray. The RNA levels were used as a large-scale set of quantitative traits to calibrate the relative statistical power of the commercial arrays. Our genetic coverage estimates are lower than previous reports, providing evidence that previous estimates may be inflated due to overfitting. The Ilmn650K platform showed reasonable power (50% or greater) to detect SNPs associated with quantitative traits when the signal-to-noise ratio (SNR) is greater than or equal to 0.5 and the causal SNP's minor allele frequency (MAF) is greater than or equal to 20% (N=359). 
In testing each of the more than 40,000 gene expression traits for association to each of the SNPs on the Ilmn650K and Affx500K arrays, we found that the Ilmn650K yielded 15% more discoveries than the Affx500K at the same false discovery rate (FDR) level.
8.
Martin W Ganal Andreas Polley Eva-Maria Graner Joerg Plieske Ralf Wieseke Hartmut Luerssen Gregor Durstewitz 《Journal of biosciences》2012,37(5):821-828
Genotyping with large numbers of molecular markers is now an indispensable tool within plant genetics and breeding. Especially through the identification of large numbers of single nucleotide polymorphism (SNP) markers using the novel high-throughput sequencing technologies, it is now possible to reliably identify many thousands of SNPs at many different loci in a given plant genome. For a number of important crop plants, SNP markers are now being used to design genotyping arrays containing thousands of markers spread over the entire genome and to analyse large numbers of samples. In this article, we discuss aspects that should be considered during the design of such large genotyping arrays and the analysis of individuals. The fact that crop plants are also often autopolyploid or allopolyploid is given due consideration. Furthermore, we outline some potential applications of large genotyping arrays including high-density genetic mapping, characterization (fingerprinting) of genetic material and breeding-related aspects such as association studies and genomic selection.
9.
High-throughput SNP genotyping on universal bead arrays (cited by: 16; self-citations: 0; citations by others: 16)
Shen R Fan JB Campbell D Chang W Chen J Doucet D Yeakley J Bibikova M Wickham Garcia E McBride C Steemers F Garcia F Kermani BG Gunderson K Oliphant A 《Mutation research》2005,573(1-2):70-82
We have developed a flexible, accurate and highly multiplexed SNP genotyping assay for high-throughput genetic analysis of large populations on a bead array platform. The novel genotyping system combines high assay conversion rate and data quality with >1500 multiplexing, and Array of Arrays formats. Genotyping assay oligos corresponding to specific SNP sequences are each linked to a unique sequence (address) that can hybridize to its complementary strand on universal arrays. The arrays are made of beads located in microwells of optical fiber bundles (Sentrix Array Matrix) or silicon slides (Sentrix BeadChip). The optical fiber bundles are further organized into a matrix that matches a 96-well microtiter plate. The arrays on the silicon slides are multi-channel pipette compatible for loading multiple samples onto a single silicon slide. These formats allow many samples to be processed in parallel. This genotyping system enables investigators to generate approximately 300,000 genotypes per day with minimal equipment requirements and greater than 1.6 million genotypes per day in a robotics-assisted process. With a streamlined and comprehensive assay, this system brings a new level of flexibility, throughput, and affordability to genetic research.
10.
11.
Background
The goal of DNA barcoding is to develop a species-specific sequence library for all eukaryotes. A 650 bp fragment of the cytochrome c oxidase 1 (CO1) gene has been used successfully for species-level identification in several animal groups. It may be difficult in practice, however, to retrieve a 650 bp fragment from archival specimens (because of DNA degradation) or from environmental samples (where universal primers are needed).
Results
We performed a bioinformatics analysis of all CO1 barcode sequences from GenBank and calculated the probability of having species-specific barcodes for fragments of varied size. This analysis established the potential of much smaller fragments, mini-barcodes, for identifying unknown specimens. We then developed a universal primer set for the amplification of mini-barcodes. We further successfully tested the utility of this primer set on a comprehensive set of taxa from all major eukaryotic groups as well as archival specimens.
Conclusion
In this study we address the important issue of the minimum amount of sequence information required for identifying species in DNA barcoding. We establish a novel approach based on a much shorter barcode sequence and demonstrate its effectiveness in archival specimens. This approach will significantly broaden the application of DNA barcoding in biodiversity studies.
12.
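SNPs (single nucleotide polymorphisms) are extremely widely distributed in the pig genome, with an average spacing of 300 to 400 bp, and the relevant databases now hold about 550,000 entries. Pig genome sequencing has made substantial progress, large-scale searches for SNPs in genomic and EST (expressed sequence tag) sequences are under way, and SNP chips for genome-wide use in pigs have been established. On this basis, genetic map construction, QTL (quantitative trait loci) mapping, genetic diversity assessment and genome-wide association analysis based on pig SNP markers have all followed.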
13.
Simultaneous discovery and testing of deletions for disease association in SNP genotyping studies (cited by: 1; self-citations: 0; citations by others: 1)
Copy-number variation (CNV), and deletions in particular, can play a crucial, causative role in rare disorders. The extent to which CNV contributes to common, complex disease etiology, however, is largely unknown. Current techniques to detect CNV are relatively expensive and time consuming, making it difficult to conduct the necessary large-scale genetic studies. SNP genotyping technologies, on the other hand, are relatively cheap, thereby facilitating large study designs. We have developed a computational tool capable of harnessing the information in SNP genotype data to detect deletions. Our approach not only detects deletions with high power but also returns accurate estimates of both the population frequency and the transmission frequency. This tool, therefore, lends itself to the discovery of deletions in large familial SNP genotype data sets and to simultaneous testing of the discovered deletion for association, with the use of both frequency-based and transmission/disequilibrium test-based designs. We demonstrate the effectiveness of our computer program (microdel), available for download at no cost, with both simulated and real data. Here, we report 693 deletions in the HapMap 16c collection, with each deletion assigned a population frequency.
14.
The power of genome-wide SNP association studies is limited, among other factors, by the large number of false positive test results. To provide a remedy, we combined SNP association analysis with the pathway-driven gene set enrichment analysis (GSEA), recently developed to facilitate handling of genome-wide gene expression data. The resulting GSEA-SNP method rests on the assumption that SNPs underlying a disease phenotype are enriched in genes constituting a signaling pathway or those with a common regulation. Besides improving power for association mapping, GSEA-SNP may facilitate the identification of disease-associated SNPs and pathways, as well as the understanding of the underlying biological mechanisms. GSEA-SNP may also help to identify markers with weak effects, undetectable in association studies without pathway consideration. The program is freely available and can be downloaded from our website.
15.
Multimarker Transmission/Disequilibrium Tests (TDTs) are association tests that are very robust to population admixture and structure and may be used to identify susceptibility loci in genome-wide association studies. Multimarker TDTs using several markers may increase power by capturing high-degree associations. However, there is also a risk of spurious associations and power reduction due to the increase in degrees of freedom. In this study we show that associations found by tests built on simple null hypotheses are highly reproducible in a second independent data set regardless of the number of markers. As a test exhibiting this feature to its maximum, we introduce the multimarker 2-Groups TDT (mTDT(2G)), a test which, under the hypothesis of no linkage, asymptotically follows a χ² distribution with 1 degree of freedom regardless of the number of markers. The statistic requires the division of parental haplotypes into two groups: disease susceptibility and disease protective haplotype groups. We assessed the test behavior by performing an extensive simulation study as well as a real-data study using several data sets of two complex diseases. We show that the mTDT(2G) test is highly efficient and achieves the highest power among all the tests used, even when the null hypothesis is tested in a second independent data set. Therefore, mTDT(2G) turns out to be a very promising multimarker TDT for genome-wide searches for disease susceptibility loci, and may be used as a preprocessing step in the construction of more accurate genetic models to predict individual susceptibility to complex diseases.
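For orientation, the classical single-marker TDT also yields a χ² statistic with 1 degree of freedom. The sketch below implements that classical McNemar-type statistic only; it is not the mTDT(2G) statistic introduced in this abstract, whose haplotype-grouping step is not described here. The function name and the transmission counts are hypothetical.

```python
import math


def classical_tdt(transmitted, untransmitted):
    """Classical biallelic TDT for heterozygous parents.

    transmitted:   times the candidate allele was passed to an affected child
    untransmitted: times the other allele was passed instead
    Returns (chi-square statistic with 1 df, asymptotic p-value).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, for illustration only.
stat, p_value = classical_tdt(60, 40)
print(stat)               # (60 - 40)**2 / 100 = 4.0
print(round(p_value, 4))  # 0.0455
```

Because the degrees of freedom stay at 1, the critical value does not grow with the number of markers, which is the property the abstract emphasizes for the two-group construction.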
16.
Genome-wide SNP arrays have generated unprecedented quantities of data that allow the study of human evolutionary history, and dense genome-wide data also enable the identification of distant ancestry among individuals or ethnic groups. To explain wider aspects of the genetic structure of Koreans and the East Asian population, we analyzed 79 individuals from the Korean HapMap project at 555,352 common single-nucleotide polymorphism loci, and compared these data with worldwide population groups represented by the 53 ethnic groups of the Human Genome Diversity Panel (HGDP-CEPH). Population differentiation (FST), Principal Component Analyses, STRUCTURE and ADMIXTURE were examined. In general, all the individual samples studied here were classified into subsets of ethnic groups according to their geographical origins. Korean HapMap individuals were grouped together with East Asian populations from the HGDP panel. Recently, a sub-population structure within the Korean population has been reported. Our result, however, revealed the genetic homogeneity of the Korean population. The ADMIXTURE analysis showed that, overall, the Korean populations derive 79 % of their genomic ancestry from southern Asia and have relatively little northern Asian ancestry (21 %). The present work, therefore, provides evidence that the male-biased southern-to-northern migration influenced not only the genetic make-up of the Y chromosome in the Korean population but also its autosomal composition.
17.
18.
Matthews AG Haynes C Liu C Ott J 《Statistical applications in genetics and molecular biology》2008,7(1):Article23
Genome-wide association studies are now widely used tools to identify genes and/or regions which may contribute to the development of various diseases. With case-control data a 2x3 contingency table can be constructed for each SNP to perform genotype-based tests of association. An increasingly common technique to increase the power to detect an association is to collapse each 2x3 table into a table assuming either a dominant or recessive mode of inheritance (2x2 table). We consider three different methods of determining which genetic model to choose and show that each of these methods of collapsing genotypes increases the type I error rate (i.e., the rate of false positives). However, one of these methods does lead to an increase in power compared with the usual genotype- and allele-based tests for most genetic models.
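The collapsing step described in this abstract can be sketched as follows. This shows only how a 2x3 genotype table is reduced to a 2x2 table under a dominant or recessive model and tested with an uncorrected Pearson χ²; it does not reproduce the three model-selection methods the authors compare. Function names and genotype counts are hypothetical, and allele a is assumed to be the putative risk allele.

```python
def collapse_genotypes(counts, model):
    """Collapse (n_AA, n_Aa, n_aa) into (non-risk group, risk group)."""
    n_AA, n_Aa, n_aa = counts
    if model == "dominant":      # one copy of a is enough: AA vs Aa + aa
        return (n_AA, n_Aa + n_aa)
    if model == "recessive":     # two copies of a are needed: AA + Aa vs aa
        return (n_AA + n_Aa, n_aa)
    raise ValueError("model must be 'dominant' or 'recessive'")


def chi_square_2x2(row1, row2):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    a, b = row1
    c, d = row2
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical genotype counts (AA, Aa, aa), for illustration only.
cases = (30, 50, 20)
controls = (50, 40, 10)

for model in ("dominant", "recessive"):
    stat = chi_square_2x2(collapse_genotypes(cases, model),
                          collapse_genotypes(controls, model))
    print(model, round(stat, 3))
```

Running both models and reporting whichever statistic is larger is a form of multiple testing, which is consistent with the inflated type I error rate the abstract reports for data-driven model choice.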
19.
The pressure to publish novel genetic associations has meant that meta-analysis has been applied to genome-wide association studies without the time for a careful consideration of the methods that are used. This review distinguishes between the use of meta-analysis to validate previously reported genetic associations and its use for gene discovery, and advocates viewing gene discovery as an exploratory screen that requires independent replication instead of treating it as the application of hundreds of thousands of statistical tests. The review considers the use of fixed and random effects meta-analyses, the investigation of between-study heterogeneity, adjustment for confounding, assessing the combined evidence and genomic control, and comments on alternative approaches that have been used in the literature.
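The fixed and random effects meta-analyses and the heterogeneity investigation mentioned in this abstract are commonly computed with inverse-variance weights, Cochran's Q and the DerSimonian-Laird estimator. The sketch below uses those standard textbook formulas as an assumption; the review itself may discuss other variants. The function names and the per-study estimates are hypothetical, and at least two studies are required.

```python
import math


def fixed_effect(betas, ses):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def cochran_q(betas, ses):
    """Cochran's Q statistic for between-study heterogeneity."""
    beta_fixed, _ = fixed_effect(betas, ses)
    return sum((b - beta_fixed) ** 2 / se ** 2 for b, se in zip(betas, ses))


def dersimonian_laird_tau2(betas, ses):
    """DerSimonian-Laird estimate of the between-study variance."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    c = total - sum(w ** 2 for w in weights) / total
    q = cochran_q(betas, ses)
    return max(0.0, (q - (len(betas) - 1)) / c)


def random_effects(betas, ses):
    """Random-effects pooled effect using the DerSimonian-Laird tau^2."""
    tau2 = dersimonian_laird_tau2(betas, ses)
    weights = [1.0 / (se ** 2 + tau2) for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


# Hypothetical per-study effect estimates and standard errors.
betas = [0.0, 0.4, 0.2]
ses = [0.1, 0.1, 0.1]
print(fixed_effect(betas, ses))
print(cochran_q(betas, ses), dersimonian_laird_tau2(betas, ses))
print(random_effects(betas, ses))
```

In this example both models give the same pooled estimate, but the random-effects standard error is larger because the between-study variance is added to each study's sampling variance.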
20.
Meta-analysis is an increasingly popular tool for combining multiple genome-wide association studies in a single analysis to identify associations with small effect sizes. The effect sizes between studies in a meta-analysis may differ and these differences, or heterogeneity, can be caused by many factors. If heterogeneity is observed in the results of a meta-analysis, interpreting the cause of heterogeneity is important because the correct interpretation can lead to a better understanding of the disease and a more effective design of a replication study. However, interpreting heterogeneous results is difficult. The standard approach of examining the association p-values of the studies does not effectively predict if the effect exists in each study. In this paper, we propose a framework facilitating the interpretation of the results of a meta-analysis. Our framework is based on a new statistic representing the posterior probability that the effect exists in each study, which is estimated utilizing cross-study information. Simulations and application to the real data show that our framework can effectively segregate the studies predicted to have an effect, the studies predicted to not have an effect, and the ambiguous studies that are underpowered. In addition to helping interpretation, the new framework also allows us to develop a new association testing procedure taking into account the existence of effect.