Similar articles
1.

Background  

The success achieved by genome-wide association (GWA) studies in the identification of candidate loci for complex diseases has been accompanied by an inability to explain the bulk of heritability. Here, we describe the algorithm V-Bay, a variational Bayes algorithm for multiple locus GWA analysis, which is designed to identify weaker associations that may contribute to this missing heritability.

2.

Background  

Over the last few years, genome-wide association (GWA) studies have become a tool of choice for the identification of loci associated with complex traits. Currently, imputed single nucleotide polymorphism (SNP) data are frequently used in GWA analyses. Correct analysis of imputed data calls for the implementation of specific methods which take genotype imputation uncertainty into account.

3.

Background  

Since the introduction of large-scale genotyping methods that can be utilized in genome-wide association (GWA) studies for deciphering complex diseases, statistical genetics has faced the tremendous challenge of how to most appropriately analyze such data. A plethora of advanced model-based methods for genetic mapping of traits has been available for more than 10 years in animal and plant breeding. However, most such methods are computationally intractable in the context of genome-wide studies. Therefore, it is hardly surprising that GWA analyses have in practice been dominated by simple statistical tests concerned with a single marker locus at a time, while the more advanced approaches have appeared only relatively recently in the biomedical and statistical literature.

4.

Background

Genome-wide association (GWA) study has recently become a powerful approach for detecting genetic variants for common diseases without prior knowledge of the variant's location or function. Generally, in GWA studies, the most significant single-nucleotide polymorphisms (SNPs) associated with top-ranked p values are selected in stage one, with follow-up in stage two. The value of selecting SNPs based on statistically significant p values is obvious. However, when minor allele frequencies (MAFs) are relatively low, less-significant p values can still correspond to higher odds ratios (ORs), which might be more useful for prediction of disease status. Therefore, if SNPs are selected using an approach based only on significant p values, some important genetic variants might be missed. We proposed a hybrid approach for selecting candidate SNPs from the discovery stage of GWA study, based on both p values and ORs, and conducted a simulation study to demonstrate the performance of our approach.

Results

The simulation results showed that our hybrid ranking approach was more powerful than the existing ranked p value approach for identifying relatively less-common SNPs. Meanwhile, the type I error probabilities of the hybrid approach are well controlled at the end of the second stage of the two-stage GWA study.

Conclusions

In GWA studies, SNPs should be considered for inclusion based not only on ranked p values but also on ranked ORs.
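The abstract in entry 4 does not spell out how p values and ORs are combined, so the following is only a minimal sketch under one assumed rule: carry forward the union of the top-k SNPs ranked by p value and the top-k ranked by effect size (|log OR|). The function name `hybrid_select`, the union rule, and the toy data are all hypothetical illustrations, not the authors' procedure.

```python
import math

def hybrid_select(snps, k):
    """Select SNPs for stage two from stage-one results.

    snps maps a SNP name to (p_value, odds_ratio).
    Assumed rule (not taken from the abstract): union of the k smallest
    p values and the k largest effect sizes measured as |log(OR)|.
    """
    by_p = sorted(snps, key=lambda s: snps[s][0])[:k]
    by_or = sorted(snps, key=lambda s: abs(math.log(snps[s][1])), reverse=True)[:k]
    return set(by_p) | set(by_or)

# Toy stage-one results (made-up numbers, for illustration only).
toy = {
    "snp_a": (1e-8, 1.20),  # very significant, modest OR
    "snp_b": (5e-6, 1.15),
    "snp_c": (2e-4, 2.50),  # less significant, large OR (e.g. low MAF)
    "snp_d": (3e-2, 1.05),
}

selected = hybrid_select(toy, k=1)
print(sorted(selected))
```

With k = 1, a p-value-only ranking would keep just `snp_a`; the assumed hybrid rule also keeps `snp_c`, the kind of low-frequency, high-OR variant the abstract says a p-value-only ranking may miss.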

5.

Background

Genome-wide association (GWA) is gaining popularity as a means to study the architecture of complex quantitative traits, partially due to the improvement of high-throughput low-cost genotyping and phenotyping technologies. Glucosinolate (GSL) secondary metabolites within Arabidopsis spp. can serve as a model system to understand the genomic architecture of adaptive quantitative traits. GSL are key anti-herbivory defenses that impart adaptive advantages within field trials. While little is known about how variation in the external or internal environment of an organism may influence the efficiency of GWA, GSL variation is known to be highly dependent upon the external stresses and developmental processes of the plant, making it an excellent model for studying conditional GWA.

Methodology/Principal Findings

To understand how development and environment can influence GWA, we conducted a study using 96 Arabidopsis thaliana accessions, >40 GSL phenotypes across three conditions (one developmental comparison and one environmental comparison) and ∼230,000 SNPs. Developmental stage had dramatic effects on the outcome of GWA, with each stage identifying different loci associated with GSL traits. Further, while the molecular bases of numerous quantitative trait loci (QTL) controlling GSL traits have been identified, there is currently no estimate of how many additional genes may control natural variation in these traits. We developed a novel co-expression network approach to prioritize the thousands of GWA candidates and successfully validated a large number of these genes as influencing GSL accumulation within A. thaliana using single gene isogenic lines.

Conclusions/Significance

Together, these results suggest that complex traits imparting environmentally contingent adaptive advantages are likely influenced by up to thousands of loci that are sensitive to fluctuations in the environment or developmental state of the organism. Additionally, while GWA is highly conditional upon genetics, the use of additional genomic information can rapidly identify causal loci en masse.

6.

Background

Obesity is a major health problem. Although heritability is substantial, genetic mechanisms predisposing to obesity are not very well understood. We have performed a genome wide association study (GWA) for early onset (extreme) obesity.

Methodology/Principal Findings

a) GWA (Genome-Wide Human SNP Array 5.0 comprising 440,794 single nucleotide polymorphisms) for early onset extreme obesity based on 487 extremely obese young German individuals and 442 healthy lean German controls; b) confirmatory analyses on 644 independent families with at least one obese offspring and both parents. We aimed to identify and subsequently confirm the 15 SNPs (minor allele frequency ≥10%) with the lowest p-values of the GWA by four genetic models: additive, recessive, dominant and allelic. Six single nucleotide polymorphisms (SNPs) in FTO (fat mass and obesity associated gene) within one linkage disequilibrium (LD) block including the GWA SNP rendering the lowest p-value (rs1121980; log-additive model: nominal p = 1.13 × 10^-7, corrected p = 0.0494; odds ratio (OR) OR_CT 1.67, 95% confidence interval (CI) 1.22–2.27; OR_TT 2.76, 95% CI 1.88–4.03) belonged to the 15 SNPs showing the strongest evidence for association with obesity. For confirmation we genotyped 11 of these in the 644 independent families (of the six FTO SNPs we chose only two representing the LD block). For both FTO SNPs the initial association was confirmed (both Bonferroni corrected p < 0.01). However, none of the nine non-FTO SNPs revealed significant transmission disequilibrium.

Conclusions/Significance

Our GWA for extreme early onset obesity substantiates that variation in FTO strongly contributes to early onset obesity. This is a further proof of concept for GWA to detect genes relevant for highly complex phenotypes. We concurrently show that nine additional SNPs with initially low p-values in the GWA were not confirmed in our family study, thus suggesting that of the best 15 SNPs in the GWA only the FTO SNPs represent true positive findings.

7.

Background

In designing genome-wide association (GWA) studies it is important to calculate statistical power. General statistical power calculation procedures for quantitative measures often require information concerning summary statistics of distributions such as mean and variance. However, with genetic studies, the effect size of quantitative traits is traditionally expressed as heritability, a quantity defined as the amount of phenotypic variation in the population that can be ascribed to the genetic variants among individuals. Heritability is hard to transform into summary statistics. Therefore, general power calculation procedures cannot be used directly in GWA studies. The development of appropriate statistical methods and a user-friendly software package to address this problem would be welcomed.

Results

This paper presents GWAPower, a statistical software package of power calculation designed for GWA studies with quantitative traits, where genetic effect is defined as heritability. Based on several popular one-degree-of-freedom genetic models, this method avoids the need to specify the non-centrality parameter of the F-distribution under the alternative hypothesis. Therefore, it can use heritability information directly without approximation. In GWAPower, the power calculation can be easily adjusted for adding covariates and linkage disequilibrium information. An example is provided to illustrate GWAPower, followed by discussions.

Conclusions

GWAPower is a user-friendly free software package for calculating statistical power based on heritability in GWA studies with quantitative traits. The software is freely available at: http://dl.dropbox.com/u/10502931/GWAPower.zip

8.

Background  

Genome wide association (GWA) studies are now being widely undertaken aiming to find the link between genetic variations and common diseases. Ideally, a well-powered GWA study will involve the measurement of hundreds of thousands of single nucleotide polymorphisms (SNPs) in thousands of individuals. The sheer volume of data generated by these experiments creates very high analytical demands. There are a number of important steps during the analysis of such data, many of which may present severe bottlenecks. The data need to be imported and reviewed to perform initial quality control (QC) before proceeding to association testing. Evaluation of results may involve further statistical analysis, such as permutation testing, or further QC of associated markers, for example, reviewing raw genotyping intensities. Finally, significant associations need to be prioritised using functional and biological interpretation methods, browsing available biological annotation, pathway information and patterns of linkage disequilibrium (LD).

9.

Background

The genome-wide association (GWA) approach represents an alternative to biparental linkage mapping for determining the genetic basis of trait variation. Both approaches rely on recombination to re-arrange the genome, and seek to establish correlations between phenotype and genotype. The major advantages of GWA lie in being able to sample a much wider range of the phenotypic and genotypic variation present, in being able to exploit multiple rounds of historical recombination in many different lineages and to include multiple accessions of direct relevance to crop improvement.

Results

An eggplant (Solanum melongena L.) association panel of 191 accessions, comprising a mixture of breeding lines, old varieties and landrace selections originating from Asia and the Mediterranean Basin, was SNP genotyped and scored for anthocyanin pigmentation and fruit color at two locations over two years. The panel formed two major clusters, reflecting geographical provenance and fruit type. The global level of linkage disequilibrium was 3.4 cM. A mixed linear model appeared to be the most appropriate for GWA. A set of 56 SNP locus/phenotype associations was identified and the genomic regions harboring these loci were distributed over nine of the 12 eggplant chromosomes. The associations were compared with the location of known QTL for the same traits.

Conclusion

The GWA mapping approach was effective in validating a number of established QTL and, thanks to the wide diversity captured by the panel, was able to detect a series of novel marker/trait associations.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-896) contains supplementary material, which is available to authorized users.

10.

Background

Genome wide association (GWA) studies provide the opportunity to develop new kinds of analysis. Analysing pairs of markers from separate regions might lead to the detection of allelic association which might indicate an interaction between nearby genes.

Methods

396,591 markers typed in 541 subjects were studied. 7.8 × 10^10 pairs of markers were screened and those showing initial evidence for allelic association were subjected to more thorough investigation along with 10 flanking markers on either side.

Results

No evidence was detected for interaction. However, 6 markers appeared to have an incorrect map position according to NCBI Build 35. One of these was corrected in Build 36 and 2 were dropped. The remaining 3 were left with map positions inconsistent with their allelic association relationships.

Discussion

Although no interaction effects were detected, the method was successful in identifying markers with probably incorrect map positions.

Conclusion

The study of allelic association can supplement other methods for assigning markers to particular map positions. Analyses of this type may usefully be applied to data from future GWA studies.
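The number of marker pairs quoted in the Methods of entry 10 follows from the marker count alone: screening every unordered pair of n markers gives n(n-1)/2 tests. The short check below reproduces that figure; it assumes all pairs were counted, which the abstract does not state explicitly.

```python
import math

n_markers = 396_591  # markers typed, as stated in the abstract

# Unordered pairs of distinct markers: n * (n - 1) / 2.
n_pairs = math.comb(n_markers, 2)

print(f"{n_pairs:,} pairs (about {n_pairs:.2e})")
```

The result is on the order of 7.8 × 10^10, which is why a pairwise screen of this size needs a cheap first-pass statistic before any more thorough investigation.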

11.

Background

Both genome-wide association (GWA) studies and genomic selection depend on the level of non-random association of alleles at different loci, i.e. linkage disequilibrium (LD), across the genome. Therefore, characterizing LD is of fundamental importance to implement both approaches. In this study, using a 60K single nucleotide polymorphism (SNP) panel, we estimated LD and haplotype structure in crossbred broiler chickens and their component pure lines (one male and two female lines) and calculated the consistency of LD between these populations.

Results

The average level of LD (measured by r²) between adjacent SNPs across the chicken autosomes studied here ranged from 0.34 to 0.40 in the pure lines but was only 0.24 in the crossbred populations, with 28.4% of adjacent SNP pairs having an r² higher than 0.3. Compared with the pure lines, the crossbred populations consistently showed a lower level of LD, smaller haploblock sizes and lower haplotype homozygosity on macro-, intermediate and micro-chromosomes. Furthermore, correlations of LD between markers at short distances (0 to 10 kb) were high between crossbred and pure lines (0.83 to 0.94).

Conclusions

Our results suggest that using crossbred populations instead of pure lines can be advantageous for high-resolution QTL (quantitative trait loci) mapping in GWA studies and to achieve good persistence of accuracy of genomic breeding values over generations in genomic selection. These results also provide useful information for the design and implementation of GWA studies and genomic selection using crossbred populations.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-015-0098-4) contains supplementary material, which is available to authorized users.
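Entry 11 reports LD as r² between adjacent SNPs. As a rough illustration of that quantity, the sketch below computes r² for one pair of biallelic SNPs from phased haplotype counts, using the standard definition r² = D² / (p_A(1-p_A) p_B(1-p_B)) with D = p_AB - p_A p_B. The abstract does not say which estimator or software the authors used, and the function name `ld_r2` and the counts are made up for the example.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic SNPs from phased haplotype counts.

    n_AB is the number of haplotypes carrying allele A at the first SNP
    and allele B at the second, and so on for the other three counts.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n          # frequency of allele A at SNP 1
    p_B = (n_AB + n_aB) / n          # frequency of allele B at SNP 2
    d = p_AB - p_A * p_B             # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom

# Illustrative counts only (not data from the study).
r2_partial = ld_r2(40, 10, 10, 40)   # alleles partly associated
r2_none = ld_r2(25, 25, 25, 25)      # alleles independent
print(r2_partial, r2_none)
```

Averaging this value over adjacent SNP pairs, and counting the share of pairs above a threshold such as 0.3, gives summaries of the kind quoted in the Results.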

12.

Background

High-throughput genotype (HTG) data has been used primarily in genome-wide association (GWA) studies; however, GWA results explain only a limited part of the complete genetic variation of traits. In systems genetics, network approaches have been shown to be able to identify pathways and their underlying causal genes to unravel the biological and genetic background of complex diseases and traits, e.g., the Weighted Gene Co-expression Network Analysis (WGCNA) method based on microarray gene expression data. The main objective of this study was to develop a scale-free weighted genetic interaction network method using whole genome HTG data in order to detect biologically relevant pathways and potential genetic biomarkers for complex diseases and traits.

Results

We developed the Weighted Interaction SNP Hub (WISH) network method that uses HTG data to detect genome-wide interactions between single nucleotide polymorphisms (SNPs) and their relationship with complex traits. Data dimensionality reduction was achieved by selecting SNPs based on their: 1) degree of genome-wide significance and 2) degree of genetic variation in a population. Network construction was based on pairwise Pearson's correlation between SNP genotypes or the epistatic interaction effect between SNP pairs. To identify modules, the Topological Overlap Measure (TOM) was calculated, reflecting the degree of overlap in shared neighbours between SNP pairs. Modules, clusters of highly interconnected SNPs, were defined using a tree-cutting algorithm on the SNP dendrogram created from the dissimilarity TOM (1-TOM). Modules were selected for functional annotation based on their association with the trait of interest, defined by the Genome-wide Module Association Test (GMAT). We successfully tested the established WISH network method using simulated and real SNP interaction data and GWA study results for carcass weight in a pig resource population; this resulted in detecting modules and key functional and biological pathways related to carcass weight.

Conclusions

We developed the WISH network method which is a novel 'systems genetics' approach to study genetic networks underlying complex trait variation. The WISH network method reduces data dimensionality and statistical complexity in associating genotypes with phenotypes in GWA studies and enables researchers to identify biologically relevant pathways and potential genetic biomarkers for any complex trait of interest.
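The Topological Overlap Measure mentioned in entry 12 can be illustrated with the unsigned TOM formula commonly used in weighted co-expression network analysis: TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where a_ij is the adjacency, l_ij sums a_iu·a_uj over the other nodes u, and k_i is the connectivity of node i. Whether WISH uses exactly this variant is an assumption here; the function name and the 3-node adjacency matrix are hypothetical.

```python
def topological_overlap(adj):
    """Unsigned TOM for a symmetric adjacency matrix with entries in [0, 1].

    adj is a list of lists; the diagonal is ignored.
    Returns a matrix of the same shape with 1.0 on the diagonal.
    """
    n = len(adj)
    # Connectivity k_i: sum of adjacencies to all other nodes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # l_ij: weight of neighbours shared by nodes i and j.
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom

# Toy network of three SNPs, each pair connected with adjacency 0.5.
adj = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
tom = topological_overlap(adj)

# Dissimilarity (1 - TOM) is what a clustering / tree-cutting step would use.
dissimilarity = [[1.0 - t for t in row] for row in tom]
print(tom[0][1], dissimilarity[0][1])
```

In this toy case each pair has TOM = (0.25 + 0.5) / (1 + 1 - 0.5) = 0.5, so the dissimilarity fed to the dendrogram is also 0.5.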

13.

Background

Longitudinal phenotypic data provides a rich potential resource for genetic studies which may allow for greater understanding of variants and their covariates over time. Herein, we review 3 longitudinal analytical approaches from the Genetic Analysis Workshop 19 (GAW19). These contributions investigated both genome-wide association (GWA) and whole genome sequence (WGS) data from odd numbered chromosomes on up to 4 time points for blood pressure–related phenotypes. The statistical models used included generalized estimating equations (GEEs), latent class growth modeling (LCGM), linear mixed-effect (LME), and variance components (VC). The goal of these analyses was to test statistical approaches that use repeat measurements to increase genetic signal for variant identification.

Results

Two analytical methods were applied to the GAW19 GWA data using real phenotypic data, and one approach was applied to WGS data using 200 simulated replicates. The first GWA approach applied a GEE-based model to identify gene-based associations with 4 derived hypertension phenotypes. This GEE model identified 1 significant locus, GRM7, which passed multiple test corrections for 2 hypertension-derived traits. The second GWA approach employed the LME to estimate genetic associations with systolic blood pressure (SBP) change trajectories identified using LCGM. This LCGM method identified 5 SBP trajectories and association analyses identified a genome-wide significant locus, near ATOX1 (p = 1.0E-8). Finally, a third VC-based model using WGS and simulated SBP phenotypes that constrained the β coefficient for a genetic variant across each time point was calculated and compared to an unconstrained approach. This constrained VC approach demonstrated increased power for WGS variants of moderate effect, but when larger genetic effects were present, averaging across time points was as effective.

Conclusion

In this paper, we summarize 3 GAW19 contributions applying novel statistical methods and testing previously proposed techniques under alternative conditions for longitudinal genetic association. We conclude that these approaches, when appropriately applied, have the potential to: (a) increase statistical power; (b) decrease trait heterogeneity and standard error; (c) decrease computational burden in WGS; and (d) identify genetic variants influencing subphenotypes important for understanding disease progression.

14.

Background

The recent completion of the swine genome sequencing project and the development of a high density porcine SNP array have made genome-wide association (GWA) studies feasible in pigs.

Methodology/Principal Findings

Using Illumina's PorcineSNP60 BeadChip, we performed a pilot GWA study in 820 commercial female pigs phenotyped for backfat, loin muscle area, body conformation in addition to feet and leg (FL) structural soundness traits. A total of 51,385 SNPs were jointly fitted using Bayesian techniques as random effects in a mixture model that assumed a known large proportion (99.5%) of SNPs had zero effect. SNP annotations were implemented through the Sus scrofa Build 9 available from pig Ensembl. We discovered a number of candidate chromosomal regions, and some of them corresponded to QTL regions previously reported. We not only identified some well-known candidate genes for the traits of interest, such as MC4R (for backfat) and IGF2 (for loin muscle area), but also obtained novel promising genes, including CHCHD3 (for backfat), BMP2 (for loin muscle area, body size and several FL structure traits), and some HOXA family genes (for overall leg action). The candidate regions responsible for body conformation and FL structure soundness did not overlap greatly, which implied that these traits were controlled by different genes. Functional clustering analyses classified the genes into categories related to bone and cartilage development, muscle growth and development or the insulin pathway, suggesting the traits are regulated by common pathways or gene networks that exert roles at different spatial and temporal stages.

Conclusions/Significance

This study is one of the earliest GWA reports on important quantitative traits in pigs, and the findings will contribute to the further biological function analysis of the identified candidate genes and potential utilization of them in marker assisted selection.

15.

Objectives

Brain-derived neurotrophic factor (BDNF) plays important roles in neuronal survival and differentiation; however, the effects of BDNF on mood disorders remain unclear. We investigated BDNF from the perspective of various aspects of systems biology, including its molecular evolution, genomic studies, protein functions, and pathway analysis.

Methods

We conducted analyses examining sequences, multiple alignments, phylogenetic trees and positive selection across 12 species and several human populations. We summarized the results of previous genomic and functional studies of pro-BDNF and mature-BDNF (m-BDNF) found in a literature review. We identified proteins that interact with BDNF and performed pathway-based analysis using large genome-wide association (GWA) datasets obtained for mood disorders.

Results

BDNF is encoded by a highly conserved gene. The chordate BDNF genes exhibit an average of 75% identity with the human gene, while vertebrate orthologues are 85.9%-100% identical to human BDNF. No signs of recent positive selection were found. Associations between BDNF and mood disorders were not significant in most of the genomic studies (e.g., linkage, association, gene expression, GWA), while relationships between serum/plasma BDNF level and mood disorders were consistently reported. Pro-BDNF is important in the response to stress; the literature review suggests the necessity of studying both pro- and m-BDNF with regard to mood disorders. In addition to conventional pathway analysis, we further considered proteins that interact with BDNF (I-Genes) and identified several biological pathways involved with BDNF or I-Genes to be significantly associated with mood disorders.

Conclusions

Systematically examining the features and biological pathways of BDNF may provide opportunities to deepen our understanding of the mechanisms underlying mood disorders.

16.
Kao CF, Fang YS, Zhao Z, Kuo PH. PLoS ONE 2011, 6(4):e18696

Background

Large scale and individual genetic studies have suggested numerous susceptibility genes for depression in the past decade without conclusive results. There is a strong need to review and integrate multi-dimensional data for follow-up validation. The present study aimed to apply prioritization procedures to build an evidence-based candidate gene dataset for depression.

Methods

Depression candidate genes were collected in human and animal studies across various data resources. Each gene was scored according to its magnitude of evidence related to depression and was multiplied by a source-specific weight to form a combined score measure. All genes were evaluated through a prioritization system to obtain an optimal weight matrix to rank their relative importance with depression using the combined scores. The resulting candidate gene list for depression (DEPgenes) was further evaluated by a genome-wide association (GWA) dataset and microarray gene expression in human tissues.

Results

A total of 5,055 candidate genes (4,850 genes from human and 387 genes from animal studies, with 182 overlapping) were included from seven data sources. Through the prioritization procedures, we identified 169 DEPgenes, which were highly likely to be associated with depression in the GWA dataset (Wilcoxon rank-sum test, p = 0.00005). Additionally, a higher percentage of DEPgenes than non-DEPgenes were expressed in human brain or nerve related tissues, supporting the neurotransmitter and neuroplasticity theories in depression.

Conclusions

With comprehensive data collection and curation and the application of an integrative approach, we successfully generated DEPgenes through an effective gene prioritization system. The prioritized DEPgenes are promising for future biological experiments or replication efforts to discover the underlying molecular mechanisms for depression.

17.

Background

Characterizing the genetic determinants of complex diseases can be further augmented by incorporating knowledge of underlying structure or classifications of the genome, such as newly developed mappings of protein-coding genes, epigenetic marks, enhancer elements and non-coding RNAs.

Methods

We apply a simple class-level testing framework, termed Genetic Class Association Testing (GenCAT), to identify protein-coding gene association with 14 cardiometabolic (CMD) related traits across 6 publicly available genome wide association (GWA) meta-analysis data resources. GenCAT uses SNP-level meta-analysis test statistics across all SNPs within a class of elements, as well as the size of the class and its unique correlation structure, to determine if the class is statistically meaningful. The novelty of findings is evaluated through investigation of regional signals. A subset of findings are validated using recently updated, larger meta-analysis resources. A simulation study is presented to characterize overall performance with respect to power, control of family-wise error and computational efficiency. All analysis is performed using the GenCAT package, R version 3.2.1.

Results

We demonstrate that class-level testing complements the common first stage minP approach that involves individual SNP-level testing followed by post-hoc ascribing of statistically significant SNPs to genes and loci. GenCAT suggests 54 protein-coding genes at 41 distinct loci for the 13 CMD traits investigated in the discovery analysis, that are beyond the discoveries of minP alone. An additional application to biological pathways demonstrates flexibility in defining genetic classes.

Conclusions

We conclude that it would be prudent to include class-level testing as standard practice in GWA analysis. GenCAT, for example, can be used as a simple, complementary and efficient strategy for class-level testing that leverages existing data resources, requires only summary level data in the form of test statistics, and adds significant value with respect to its potential for identifying multiple novel and clinically relevant trait associations.

18.

Background

Drug resistance remains a chief concern for malaria control. In order to determine the genetic markers of drug resistant parasites, we tested the genome-wide associations (GWA) of sequence-based genotypes from 35 Kenyan P. falciparum parasites with the activities of 22 antimalarial drugs.

Methods and Principal Findings

Parasites isolated from children with acute febrile malaria were adapted to culture, and sensitivity was determined by in vitro growth in the presence of anti-malarial drugs. Parasites were genotyped using whole genome sequencing techniques. Associations between 6250 single nucleotide polymorphisms (SNPs) and resistance to individual anti-malarial agents were determined, with false discovery rate adjustment for multiple hypothesis testing. We identified expected associations in the pfcrt region with chloroquine (CQ) activity, and other novel loci associated with amodiaquine, quinazoline, and quinine activities. Signals for CQ and primaquine (PQ) overlap in and around pfcrt, and interestingly the phenotypes are inversely related for these two drugs. We catalog the variation in dhfr, dhps, mdr1, nhe, and crt, including novel SNPs, and confirm the presence of a dhfr-164L quadruple mutant in coastal Kenya. Mutations implicated in sulfadoxine-pyrimethamine resistance are at or near fixation in this sample set.

Conclusions/Significance

Sequence-based GWA studies are powerful tools for phenotypic association tests. Using this approach on falciparum parasites from coastal Kenya, we identified known and previously unreported genes associated with phenotypic resistance to anti-malarial drugs, and observed in high-resolution haplotype visualizations a possible signature of an inverse selective relationship between CQ and PQ.

19.

Background

Vulnerabilities to dependence on addictive substances are substantially heritable complex disorders whose underlying genetic architecture is likely to be polygenic, with modest contributions from variants in many individual genes. “Nontemplate” genome wide association (GWA) approaches can identify groups of chromosomal regions and genes that, taken together, are much more likely to contain allelic variants that alter vulnerability to substance dependence than expected by chance.

Methodology/Principal Findings

We report pooled “nontemplate” genome-wide association studies of two independent samples of substance dependent vs control research volunteers (n = 1620), one European-American and the other African-American using 1 million SNP (single nucleotide polymorphism) Affymetrix genotyping arrays. We assess convergence between results from these two samples using two related methods that seek clustering of nominally-positive results and assess significance levels with Monte Carlo and permutation approaches. Both “converge then cluster” and “cluster then converge” analyses document convergence between the results obtained from these two independent datasets in ways that are virtually never found by chance. The genes identified in this fashion are also identified by individually-genotyped dbGAP data that compare allele frequencies in cocaine dependent vs control individuals.

Conclusions/Significance

These overlapping results identify small chromosomal regions that are also identified by genome wide data from studies of other relevant samples to extents much greater than chance. These chromosomal regions contain more genes related to “cell adhesion” processes than expected by chance. They also contain a number of genes that encode potential targets for anti-addiction pharmacotherapeutics. “Nontemplate” GWA approaches that seek chromosomal regions in which nominally-positive associations are found in multiple independent samples are likely to complement classical, “template” GWA approaches in which “genome wide” levels of significance are sought for SNP data from single case vs control comparisons.

20.
Dong C, Qian Z, Jia P, Wang Y, Huang W, Li Y. PLoS ONE 2007, 2(12):e1262

Background

High-throughput genotyping chips have contributed greatly to genome-wide association (GWA) studies to identify novel disease susceptibility single nucleotide polymorphisms (SNPs). The high-density chips are designed using two different SNP selection approaches: the direct gene-centric approach, and the indirect quasi-random SNPs or linkage disequilibrium (LD)-based tagSNPs approaches. Although all these approaches can provide high genome coverage and ascertain variants in genes, it is not clear to what extent these approaches can capture the common genic variants. It is also important to characterize and compare the differences between these approaches.

Methodology/Principal Findings

In our study, by using both the Phase II HapMap data and the disease variants extracted from OMIM, a gene-centric evaluation was first performed to evaluate the ability of the approaches in capturing the disease variants in the Caucasian population. Then the distribution patterns of SNPs were also characterized in genic regions, evolutionarily conserved introns and nongenic regions, ontologies and pathways. The results show that, no matter which SNP selection approach is used, the current high-density SNP chips provide very high coverage in genic regions and can capture most of known common disease variants under HapMap frame. The results also show that the differences between the direct and the indirect approaches are relatively small. Both have similar SNP distribution patterns in these gene-centric characteristics.

Conclusions/Significance

This study suggests that the indirect approaches not only have the advantage of high coverage but also are useful for studies focusing on various functional SNPs either in genes or in the conserved regions that the direct approach supports. The study and the annotation of characteristics will be helpful for designing and analyzing GWA studies that aim to identify genetic risk factors involved in common diseases, especially variants in genes and conserved regions.
