首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 9 毫秒
1.
High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms (MAF < 5%), when low coverage sequence reads are added to dense genome-wide SNP arrays--the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome wide association studies and supports development of improved methods for genotype calling.  相似文献   

2.
The success of genome-wide association studies has paralleled the development of efficient genotyping technologies. We describe the development of a next-generation microarray based on the new highly-efficient Affymetrix Axiom genotyping technology that we are using to genotype individuals of European ancestry from the Kaiser Permanente Research Program on Genes, Environment and Health (RPGEH). The array contains 674,517 SNPs, and provides excellent genome-wide as well as gene-based and candidate-SNP coverage. Coverage was calculated using an approach based on imputation and cross validation. Preliminary results for the first 80,301 saliva-derived DNA samples from the RPGEH demonstrate very high quality genotypes, with sample success rates above 94% and over 98% of successful samples having SNP call rates exceeding 98%. At steady state, we have produced 462 million genotypes per week for each Axiom system. The new array provides a valuable addition to the repertoire of tools for large scale genome-wide association studies.  相似文献   

3.
Li J  Guo YF  Pei Y  Deng HW 《PloS one》2012,7(4):e34486
Genotype imputation is often used in the meta-analysis of genome-wide association studies (GWAS), for combining data from different studies and/or genotyping platforms, in order to improve the ability for detecting disease variants with small to moderate effects. However, how genotype imputation affects the performance of the meta-analysis of GWAS is largely unknown. In this study, we investigated the effects of genotype imputation on the performance of meta-analysis through simulations based on empirical data from the Framingham Heart Study. We found that when fix-effects models were used, considerable between-study heterogeneity was detected when causal variants were typed in only some but not all individual studies, resulting in up to ~25% reduction of detection power. For certain situations, the power of the meta-analysis can be even less than that of individual studies. Additional analyses showed that the detection power was slightly improved when between-study heterogeneity was partially controlled through the random-effects model, relative to that of the fixed-effects model. Our study may aid in the planning, data analysis, and interpretation of GWAS meta-analysis results when genotype imputation is necessary.  相似文献   

4.
Genome-wide association studies are now widely used tools to identify genes and/or regions which may contribute to the development of various diseases. With case-control data a 2x3 contingency table can be constructed for each SNP to perform genotype-based tests of association. An increasingly common technique to increase the power to detect an association is to collapse each 2x3 table into a table assuming either a dominant or recessive mode of inheritance (2x2 table). We consider three different methods of determining which genetic model to choose and show that each of these methods of collapsing genotypes increases the type I error rate (i.e., the rate of false positives). However, one of these methods does lead to an increase in power compared with the usual genotype- and allele-based tests for most genetic models.  相似文献   

5.
SA Stanhope  AD Skol 《PloS one》2012,7(9):e42367
In a two stage genome-wide association study (2S-GWAS), a sample of cases and controls is allocated into two groups, and genetic markers are analyzed sequentially with respect to these groups. For such studies, experimental design considerations have primarily focused on minimizing study cost as a function of the allocation of cases and controls to stages, subject to a constraint on the power to detect an associated marker. However, most treatments of this problem implicitly restrict the set of feasible designs to only those that allocate the same proportions of cases and controls to each stage. In this paper, we demonstrate that removing this restriction can improve the cost advantages demonstrated by previous 2S-GWAS designs by up to 40%. Additionally, we consider designs that maximize study power with respect to a cost constraint, and show that recalculated power maximizing designs can recover a substantial amount of the planned study power that might otherwise be lost if study funding is reduced. We provide open source software for calculating cost minimizing or power maximizing 2S-GWAS designs.  相似文献   

6.
The aim of this study was to compare iterative and direct solvers for estimation of marker effects in genomic selection. One iterative and two direct methods were used: Gauss-Seidel with Residual Update, Cholesky Decomposition and Gentleman-Givens rotations. For resembling different scenarios with respect to number of markers and of genotyped animals, a simulated data set divided into 25 subsets was used. Number of markers ranged from 1,200 to 5,925 and number of animals ranged from 1,200 to 5,865. Methods were also applied to real data comprising 3081 individuals genotyped for 45181 SNPs. Results from simulated data showed that the iterative solver was substantially faster than direct methods for larger numbers of markers. Use of a direct solver may allow for computing (co)variances of SNP effects. When applied to real data, performance of the iterative method varied substantially, depending on the level of ill-conditioning of the coefficient matrix. From results with real data, Gentleman-Givens rotations would be the method of choice in this particular application as it provided an exact solution within a fairly reasonable time frame (less than two hours). It would indeed be the preferred method whenever computer resources allow its use.  相似文献   

7.
For many genome-wide association (GWA) studies individually genotyping one million or more SNPs provides a marginal increase in coverage at a substantial cost. Much of the information gained is redundant due to the correlation structure inherent in the human genome. Pooling-based GWA studies could benefit significantly by utilizing this redundancy to reduce noise, improve the accuracy of the observations and increase genomic coverage. We introduce a measure of correlation between individual genotyping and pooling, under the same framework that r(2) provides a measure of linkage disequilibrium (LD) between pairs of SNPs. We then report a new non-haplotype multimarker multi-loci method that leverages the correlation structure between SNPs in the human genome to increase the efficacy of pooling-based GWA studies. We first give a theoretical framework and derivation of our multimarker method. Next, we evaluate simulations using this multimarker approach in comparison to single marker analysis. Finally, we experimentally evaluate our method using different pools of HapMap individuals on the Illumina 450S Duo, Illumina 550K and Affymetrix 5.0 platforms for a combined total of 1 333 631 SNPs. Our results show that use of multimarker analysis reduces noise specific to pooling-based studies, allows for efficient integration of multiple microarray platforms and provides more accurate measures of significance than single marker analysis. Additionally, this approach can be extended to allow for imputing the association significance for SNPs not directly observed using neighboring SNPs in LD. This multimarker method can now be used to cost-effectively complete pooling-based GWA studies with multiple platforms across over one million SNPs and to impute neighboring SNPs weighted for the loss of information due to pooling.  相似文献   

8.

Background

Identifying recombination events and the chromosomal segments that constitute a gamete is useful for a number of applications in genomic analyses. In livestock, genotypic data are commonly available for half-sib families. We propose a straightforward but computationally efficient method to use single nucleotide polymorphism marker genotypes on half-sibs to reconstruct the recombination and segregation events that occurred during meiosis in a sire to form the haplotypes observed in its offspring. These meiosis events determine a block structure in paternal haplotypes of the progeny and this can be used to phase the genotypes of individuals in single half-sib families, to impute haplotypes of the sire if they are not genotyped or to impute the paternal strand of the offspring’s sequence based on sequence data of the sire.

Methods

The hsphase algorithm exploits information from opposing homozygotes among half-sibs to identify recombination events, and the chromosomal regions from the paternal and maternal strands of the sire (blocks) that were inherited by its progeny. This information is then used to impute the sire’s genotype, which, in turn, is used to phase the half-sib family. Accuracy (defined as R2) and performance of this approach were evaluated by using simulated and real datasets. Phasing results for the half-sibs were benchmarked to other commonly used phasing programs – AlphaPhase, BEAGLE and PedPhase 3.

Results

Using a simulated dataset with 20 markers per cM, and for a half-sib family size of 4 and 40, the accuracy of block detection, was 0.58 and 0.96, respectively. The accuracy of inferring sire genotypes was 0.75 and 1.00 and the accuracy of phasing was around 0.97, respectively. hsphase was more robust to genotyping errors than PedPhase 3, AlphaPhase and BEAGLE. Computationally, hsphase was much faster than AlphaPhase and BEAGLE.

Conclusions

In half-sib families of size 8 and above, hsphase can accurately detect block structure of paternal haplotypes, impute genotypes of ungenotyped sires and reconstruct haplotypes in progeny. The method is much faster and more accurate than other widely used population-based phasing programs. A program implementing the method is freely available as an R package (hsphase).  相似文献   

9.
In this review we describe the principles, protocols, and applications of two commercially available SNP genotyping platforms, the TaqMan SNP Genotyping Assays and the SNPlex Genotyping System. Combined, these two technologies meet the requirements of multiple SNP applications in genetics research and pharmacogenetics. We also describe a set of SNP selection tools and validated assay resources which we developed to accelerate the cycle of experimentation on these platforms. Criteria for selecting the more appropriate of these two genotyping technologies are presented: the genetic architecture of the trait of interest, the throughput required, and the number of SNPs and samples needed for a successful study. Overall, the TaqMan assay format is suitable for low- to mid-throughput applications in which a high assay conversion rate, simple assay workflow, and low cost of automation are desirable. The SNPlex Genotyping System, on the other hand, is well suited for SNP applications in which throughput and cost-efficiency are essential, e.g., applications requiring either the testing of large numbers of SNPs and samples, or the flexibility to select various SNP subsets.  相似文献   

10.
Genome-wide SNP arrays have generated unprecedented quantities of data allow the detection of human evolutionary history and dense genome-wide data also enable the identification of distance ancestry among individuals or ethnic groups. To explain wider aspects of the genetic structure of Koreans and the East Asian population, we analyzed 79 individuals from the Korean HapMap project at 555,352 common single-nucleotide polymorphism loci, and compared this data with the worldwide population groups with the 53 ethnic groups from Human Genome Diversity Panel (HGDP-CEPH). Population differentiation (FST), Principal Component Analyses, STRUCTURE and ADMIXTURE are examined. In general, all the individual samples studies here were classified into subset of ethnic groups according to their geographical origins. Korean HapMap individuals were grouped together with East Asian populations from HGDP panel. Recently, a sub-population structure within Korean population has been reported. Our result, however, revealed the genetic homogeneity of Korean population. The ADMIXTURE analysis showed that, overall the Korean populations derive 79 % of their genomic ancestry from southern Asia and have relatively little northern Asian ancestry (21 %). The present work, therefore, provide the evidence that the male-biased southern-to-northern migration influenced not only for the genetic make up of the Y chromosome in the Korean population but also, its autosomal composition.  相似文献   

11.
12.
Four custom Axiom genotyping arrays were designed for a genome-wide association (GWA) study of 100,000 participants from the Kaiser Permanente Research Program on Genes, Environment and Health. The array optimized for individuals of European race/ethnicity was previously described. Here we detail the development of three additional microarrays optimized for individuals of East Asian, African American, and Latino race/ethnicity. For these arrays, we decreased redundancy of high-performing SNPs to increase SNP capacity. The East Asian array was designed using greedy pairwise SNP selection. However, removing SNPs from the target set based on imputation coverage is more efficient than pairwise tagging. Therefore, we developed a novel hybrid SNP selection method for the African American and Latino arrays utilizing rounds of greedy pairwise SNP selection, followed by removal from the target set of SNPs covered by imputation. The arrays provide excellent genome-wide coverage and are valuable additions for large-scale GWA studies.  相似文献   

13.
Animal genomics is currently undergoing dynamic development, which is driven by the flourishing of high-throughput genome analysis methods. Recently, a large number of animals has been genotyped with the use of whole-genome genotyping assays in the course of genomic selection programmes. The results of such genotyping can also be used for studies on different aspects of livestock genome functioning and diversity. In this article, we review the recent literature concentrating on various aspects of animal genomics, including studies on linkage disequilibrium, runs of homozygosity, selection signatures, copy number variation and genetic differentiation of animal populations. Our work is aimed at providing insight into certain achievements of animal genomics and to arouse interest in basic research on the complexity and structure of the genomes of livestock.  相似文献   

14.

Background  

Genome-wide association studies (GWAS) using single nucleotide polymorphism (SNP) markers provide opportunities to detect epistatic SNPs associated with quantitative traits and to detect the exact mode of an epistasis effect. Computational difficulty is the main bottleneck for epistasis testing in large scale GWAS.  相似文献   

15.
Objective: Our goal was to evaluate the influence of quality control (QC) decisions using two genotype calling algorithms, CRLMM and Birdseed, designed for the Affymetrix SNP Array 6.0. Methods: Various QC options were tried using the two algorithms and comparisons were made on subject and call rate and on association results using two data sets. Results: For Birdseed, we recommend using the contrast QC instead of QC call rate for sample QC. For CRLMM, we recommend using the signal-to-noise rate ≥4 for sample QC and a posterior probability of 90% for genotype accuracy. For both algorithms, we recommend calling the genotype separately for each plate, and dropping SNPs with a lower call rate (<95%) before evaluating samples with lower call rates. To investigate whether the genotype calls from the two algorithms impacted the genome-wide association results, we performed association analysis using data from the GENOA cohort; we observed that the number of significant SNPs were similar using either CRLMM or Birdseed. Conclusions: Using our suggested workflow both algorithms performed similarly; however, fewer samples were removed and CRLMM took half the time to run our 854 study samples (4.2 h) compared to Birdseed (8.4 h).  相似文献   

16.
Ma L  Han S  Yang J  Da Y 《PloS one》2010,5(11):e15006
Complex diseases or phenotypes may involve multiple genetic variants and interactions between genetic, environmental and other factors. Current genome-wide association studies (GWAS) mostly used single-locus analysis and had identified genetic effects with multiple confirmations. Such confirmed single-nucleotide polymorphism (SNP) effects were likely to be true genetic effects and ignoring this information in testing new effects of the same phenotype results in decreased statistical power due to increased residual variance that has a component of the omitted effects. In this study, a multi-locus association test (MLT) was proposed for GWAS analysis conditional on SNPs with confirmed effects to improve statistical power. Analytical formulae for statistical power were derived and were verified by simulation for MLT accounting for confirmed SNPs and for single-locus test (SLT) without accounting for confirmed SNPs. Statistical power of the two methods was compared by case studies with simulated and the Framingham Heart Study (FHS) GWAS data. Results showed that the MLT method had increased statistical power over SLT. In the GWAS case study on four cholesterol phenotypes and serum metabolites, the MLT method improved statistical power by 5% to 38% depending on the number and effect sizes of the conditional SNPs. For the analysis of HDL cholesterol (HDL-C) and total cholesterol (TC) of the FHS data, the MLT method conditional on confirmed SNPs from GWAS catalog and NCBI had considerably more significant results than SLT.  相似文献   

17.
18.
Simple, defensible sample sizes based on cost efficiency   总被引:1,自引:0,他引:1  
Summary .   The conventional approach of choosing sample size to provide 80% or greater power ignores the cost implications of different sample size choices. Costs, however, are often impossible for investigators and funders to ignore in actual practice. Here, we propose and justify a new approach for choosing sample size based on cost efficiency, the ratio of a study's projected scientific and/or practical value to its total cost. By showing that a study's projected value exhibits diminishing marginal returns as a function of increasing sample size for a wide variety of definitions of study value, we are able to develop two simple choices that can be defended as more cost efficient than any larger sample size. The first is to choose the sample size that minimizes the average cost per subject. The second is to choose sample size to minimize total cost divided by the square root of sample size. This latter method is theoretically more justifiable for innovative studies, but also performs reasonably well and has some justification in other cases. For example, if projected study value is assumed to be proportional to power at a specific alternative and total cost is a linear function of sample size, then this approach is guaranteed either to produce more than 90% power or to be more cost efficient than any sample size that does. These methods are easy to implement, based on reliable inputs, and well justified, so they should be regarded as acceptable alternatives to current conventional approaches.  相似文献   

19.
The genetic analysis of quantitative traits in humans is changing as a result of the availability of whole-genome SNP data. Heritability analysis can make use of actual genetic sharing between pairs of individuals estimated from the genotype data, rather than the expected genetic sharing implied by their family relationship. This could provide more accurate heritability estimates and help to overcome the equal environment assumption. Quantitative trait locus (QTL) linkage mapping can make use of local genetic sharing inferred from very dense local genotype data from pedigree members or individuals not previously known to be related. This approach may be particularly suited for detecting loci that contain rare variants with major effect on the phenotype. Finally, whole-genome SNP data can be used to measure the genetic similarity between individuals to provide matched sets for association studies, in order to avoid spurious association from population stratification.  相似文献   

20.
The uptake of genomic selection (GS) by the swine industry is still limited by the costs of genotyping. A feasible alternative to overcome this challenge is to genotype animals using an affordable low-density (LD) single nucleotide polymorphism (SNP) chip panel followed by accurate imputation to a high-density panel. Therefore, the main objective of this study was to screen incremental densities of LD panels in order to systematically identify one that balances the tradeoffs among imputation accuracy, prediction accuracy of genomic estimated breeding values (GEBVs), and genotype density (directly associated with genotyping costs). Genotypes using the Illumina Porcine60K BeadChip were available for 1378 Duroc (DU), 2361 Landrace (LA) and 3192 Yorkshire (YO) pigs. In addition, pseudo-phenotypes (de-regressed estimated breeding values) for five economically important traits were provided for the analysis. The reference population for genotyping imputation consisted of 931 DU, 1631 LA and 2103 YO animals and the remainder individuals were included in the validation population of each breed. A LD panel of 3000 evenly spaced SNPs (LD3K) yielded high imputation accuracy rates: 93.78% (DU), 97.07% (LA) and 97.00% (YO) and high correlations (>0.97) between the predicted GEBVs using the actual 60 K SNP genotypes and the imputed 60 K SNP genotypes for all traits and breeds. The imputation accuracy was influenced by the reference population size as well as the amount of parental genotype information available in the reference population. However, parental genotype information became less important when the LD panel had at least 3000 SNPs. The correlation of the GEBVs directly increased with an increase in imputation accuracy. When genotype information for both parents was available, a panel of 300 SNPs (imputed to 60 K) yielded GEBV predictions highly correlated (⩾0.90) with genomic predictions obtained based on the true 60 K panel, for all traits and breeds. For a small reference population size with no parents on reference population, it is recommended the use of a panel at least as dense as the LD3K and, when there are two parents in the reference population, a panel as small as the LD300 might be a feasible option. These findings are of great importance for the development of LD panels for swine in order to reduce genotyping costs, increase the uptake of GS and, therefore, optimize the profitability of the swine industry.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号