Similar articles (20 found)
1.
Genomewide association (GWA) studies assay hundreds of thousands of single nucleotide polymorphisms (SNPs) simultaneously across the entire genome and associate them with diseases and other biological or clinical traits. The association analysis usually tests each SNP as an independent entity and ignores biological information such as linkage disequilibrium. Although the Bonferroni correction and other approaches have been proposed to address the issue of multiple comparisons that arises from testing many SNPs, there is a lack of understanding of the distribution of an association test statistic when an entire genome is considered together. In other words, there are extensive efforts in hypothesis testing, and almost no attempt at estimating the density under the null hypothesis. By estimating the true null distribution, we can apply the result directly to hypothesis testing; better assess the existing approaches to multiple comparisons; and evaluate the impact of linkage disequilibrium on GWA studies. To this end, we estimate the empirical null distribution of an association test statistic in GWA studies using simulated population data. We further propose a convenient and accurate method based on adaptive splines to estimate the empirical value in GWA studies and validate our findings using a real data set. Our method enables us to fully characterize the null distribution of an association test, which not only can be used to test the null hypothesis of no association, but also provides important information about the impact of the density of the genetic markers on the significance of the tests. Our method does not require users to perform computationally intensive permutations, and hence provides a timely solution to an important and difficult problem in GWA studies.
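
As a point of reference for the multiple-comparison problem described in this abstract, the Bonferroni correction divides the family-wise significance level by the number of tests. The sketch below is illustrative only: the function name and the figure of 500,000 SNPs are assumptions, not values taken from the abstract.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Hypothetical GWA study: 500,000 SNPs tested at a family-wise level of 0.05.
threshold = bonferroni_threshold(0.05, 500_000)
print(f"per-SNP threshold: {threshold:.1e}")
```

The bound holds under any dependence between tests, but it becomes conservative when SNPs are correlated through linkage disequilibrium, which is why an estimate of the actual null distribution is of interest.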

2.

Background  

Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. Acquiring the programming skills needed to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files is arduous.

3.

Background

Characterizing the genetic determinants of complex diseases can be further augmented by incorporating knowledge of underlying structure or classifications of the genome, such as newly developed mappings of protein-coding genes, epigenetic marks, enhancer elements and non-coding RNAs.

Methods

We apply a simple class-level testing framework, termed Genetic Class Association Testing (GenCAT), to identify protein-coding gene association with 14 cardiometabolic (CMD) related traits across 6 publicly available genome wide association (GWA) meta-analysis data resources. GenCAT uses SNP-level meta-analysis test statistics across all SNPs within a class of elements, as well as the size of the class and its unique correlation structure, to determine if the class is statistically meaningful. The novelty of findings is evaluated through investigation of regional signals. A subset of findings are validated using recently updated, larger meta-analysis resources. A simulation study is presented to characterize overall performance with respect to power, control of family-wise error and computational efficiency. All analysis is performed using the GenCAT package, R version 3.2.1.

Results

We demonstrate that class-level testing complements the common first stage minP approach that involves individual SNP-level testing followed by post-hoc ascribing of statistically significant SNPs to genes and loci. GenCAT suggests 54 protein-coding genes at 41 distinct loci for the 13 CMD traits investigated in the discovery analysis, that are beyond the discoveries of minP alone. An additional application to biological pathways demonstrates flexibility in defining genetic classes.

Conclusions

We conclude that it would be prudent to include class-level testing as standard practice in GWA analysis. GenCAT, for example, can be used as a simple, complementary and efficient strategy for class-level testing that leverages existing data resources, requires only summary level data in the form of test statistics, and adds significant value with respect to its potential for identifying multiple novel and clinically relevant trait associations.

4.

Background  

Over the last few years, genome-wide association (GWA) studies have become a tool of choice for the identification of loci associated with complex traits. Currently, imputed single nucleotide polymorphism (SNP) data are frequently used in GWA analyses. Correct analysis of imputed data calls for the implementation of specific methods which take genotype imputation uncertainty into account.

5.

Background  

Modern biology has shifted from "one gene" approaches to methods for genomic-scale analysis like microarray technology, which allow simultaneous measurement of thousands of genes. This has created a need for tools facilitating interpretation of biological data in "batch" mode. However, such tools often leave the investigator with large volumes of apparently unorganized information. To meet this interpretation challenge, gene-set, or cluster testing has become a popular analytical tool. Many gene-set testing methods and software packages are now available, most of which use a variety of statistical tests to assess the genes in a set for biological information. However, the field is still evolving, and there is a great need for "integrated" solutions.

6.

Background  

The success achieved by genome-wide association (GWA) studies in the identification of candidate loci for complex diseases has been accompanied by an inability to explain the bulk of heritability. Here, we describe the algorithm V-Bay, a variational Bayes algorithm for multiple locus GWA analysis, which is designed to identify weaker associations that may contribute to this missing heritability.

7.

Background

Genome-wide association (GWA) is gaining popularity as a means to study the architecture of complex quantitative traits, partially due to the improvement of high-throughput low-cost genotyping and phenotyping technologies. Glucosinolate (GSL) secondary metabolites within Arabidopsis spp. can serve as a model system to understand the genomic architecture of adaptive quantitative traits. GSL are key anti-herbivory defenses that impart adaptive advantages within field trials. While little is known about how variation in the external or internal environment of an organism may influence the efficiency of GWA, GSL variation is known to be highly dependent upon the external stresses and developmental processes of the plant, making it an excellent model for studying conditional GWA.

Methodology/Principal Findings

To understand how development and environment can influence GWA, we conducted a study using 96 Arabidopsis thaliana accessions, >40 GSL phenotypes across three conditions (one developmental comparison and one environmental comparison) and ∼230,000 SNPs. Developmental stage had dramatic effects on the outcome of GWA, with each stage identifying different loci associated with GSL traits. Further, while the molecular bases of numerous quantitative trait loci (QTL) controlling GSL traits have been identified, there is currently no estimate of how many additional genes may control natural variation in these traits. We developed a novel co-expression network approach to prioritize the thousands of GWA candidates and successfully validated a large number of these genes as influencing GSL accumulation within A. thaliana using single gene isogenic lines.

Conclusions/Significance

Together, these results suggest that complex traits imparting environmentally contingent adaptive advantages are likely influenced by up to thousands of loci that are sensitive to fluctuations in the environment or developmental state of the organism. Additionally, while GWA is highly conditional upon genetics, the use of additional genomic information can rapidly identify causal loci en masse.

8.

Background  

Since the introduction of large-scale genotyping methods that can be utilized in genome-wide association (GWA) studies for deciphering complex diseases, statistical genetics has faced the tremendous challenge of how best to analyze such data. A plethora of advanced model-based methods for genetic mapping of traits has been available for more than 10 years in animal and plant breeding. However, most such methods are computationally intractable in the context of genome-wide studies. Therefore, it is hardly surprising that GWA analyses have in practice been dominated by simple statistical tests concerned with a single marker locus at a time, while the more advanced approaches have appeared only relatively recently in the biomedical and statistical literature.

9.

Background  

The allele frequencies of single-nucleotide polymorphisms (SNPs) are needed to select an optimal subset of common SNPs for use in association studies. Sequence-based methods for finding SNPs with allele frequencies may need to handle thousands of sequences from the same genome location (sequences of deep coverage).

10.

Background  

The development of high-throughput genotyping technologies, which enable the simultaneous genotyping of hundreds of thousands of single nucleotide polymorphisms (SNPs), has the potential to increase the benefits of genetic epidemiology studies. Although the enhanced resolution of these platforms increases the chance of interrogating functional SNPs that are themselves causative or in linkage disequilibrium with causal SNPs, commonly used single SNP-association approaches suffer from serious multiple hypothesis testing problems and provide limited insights into combinations of loci that may contribute to complex diseases. Drawing inspiration from Gene Set Enrichment Analysis developed for gene expression data, we have developed a method, named GLOSSI (Gene-loci Set Analysis), that integrates prior biological knowledge into the statistical analysis of genotyping data to test the association of a group of SNPs (loci-set) with complex disease phenotypes. The most significant loci-sets can be used to formulate hypotheses from a functional viewpoint that can be validated experimentally.
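
The abstract does not say how SNP-level evidence is combined within a loci-set. One common choice for set-level testing is Fisher's combination of p-values, sketched below as an illustration rather than as the GLOSSI procedure. It assumes independent tests, an assumption that linkage disequilibrium between SNPs can violate; the function name is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: combine independent p-values into one set-level p-value."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # X = -2 * sum(ln p) follows a chi-square distribution with 2k degrees of freedom.
    half_x = -sum(math.log(p) for p in p_values)
    # For even degrees of freedom the chi-square survival function has a closed form:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical loci-set of three SNP-level p-values.
print(fisher_combined_p([0.04, 0.20, 0.01]))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed-form tail.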

11.

Background  

Microarray technology is generating huge amounts of data about the expression level of thousands of genes, or even whole genomes, across different experimental conditions. To extract biological knowledge, and to fully understand such datasets, it is essential to include external biological information about genes and gene products in the analysis of expression data. However, most current approaches to analyzing microarray datasets focus mainly on the experimental data, and external biological information is incorporated only as a subsequent step.

12.

Objectives

Brain-derived neurotrophic factor (BDNF) plays important roles in neuronal survival and differentiation; however, the effects of BDNF on mood disorders remain unclear. We investigated BDNF from the perspective of various aspects of systems biology, including its molecular evolution, genomic studies, protein functions, and pathway analysis.

Methods

We conducted analyses examining sequences, multiple alignments, phylogenetic trees and positive selection across 12 species and several human populations. We summarized the results of previous genomic and functional studies of pro-BDNF and mature-BDNF (m-BDNF) found in a literature review. We identified proteins that interact with BDNF and performed pathway-based analysis using large genome-wide association (GWA) datasets obtained for mood disorders.

Results

BDNF is encoded by a highly conserved gene. The chordate BDNF genes exhibit an average of 75% identity with the human gene, while vertebrate orthologues are 85.9%-100% identical to human BDNF. No signs of recent positive selection were found. Associations between BDNF and mood disorders were not significant in most of the genomic studies (e.g., linkage, association, gene expression, GWA), while relationships between serum/plasma BDNF level and mood disorders were consistently reported. Pro-BDNF is important in the response to stress; the literature review suggests the necessity of studying both pro- and m-BDNF with regard to mood disorders. In addition to conventional pathway analysis, we further considered proteins that interact with BDNF (I-Genes) and identified several biological pathways involved with BDNF or I-Genes to be significantly associated with mood disorders.

Conclusions

Systematically examining the features and biological pathways of BDNF may provide opportunities to deepen our understanding of the mechanisms underlying mood disorders.

13.
Bipolar disorder (BPD) is a complex psychiatric trait with high heritability. Despite efforts through genome-wide association (GWA) studies, success in identifying susceptibility loci for BPD has been limited, which is partially attributed to the complex nature of its pathogenesis. Pathway-based analysis is a powerful tool to explore the joint effects of gene sets within specific biological pathways. Additionally, incorporating other aspects of genomic data into pathway analysis may further enhance our understanding of the underlying mechanisms of BPD. Patterns of DNA methylation play important roles in regulating gene expression and function. A commonly observed phenomenon, allele-specific methylation (ASM), describes the associations between genetic variants and DNA methylation patterns. The present study aimed to identify biological pathways that are involved in the pathogenesis of BPD while incorporating brain-specific ASM information in pathway analysis using two large-scale GWA datasets in Caucasian populations. A weighting scheme was adopted to take ASM information into consideration for each pathway. After multiple testing corrections, we identified 88 and 15 enriched pathways for their biological relevance for BPD in the Genetic Association Information Network (GAIN) and the Wellcome Trust Case Control Consortium dataset, respectively. Many of these pathways were significant only when applying the weighting scheme. Three ion channel related pathways were consistently identified in both datasets. Results in the GAIN dataset also suggest a role for the extracellular matrix in the brain in BPD. Findings from Gene Ontology (GO) analysis exhibited functional enrichment among genes of non-GO pathways in activity of gated channel, transporter, and neurotransmitter receptor.
We demonstrated that integrating different data sources with pathway analysis provides an avenue to identify promising and novel biological pathways for exploring the underlying molecular mechanisms of bipolar disorder. Further basic research can be conducted to target the biological mechanisms of the identified genes and pathways.

14.

Background  

Large-scale genetic association studies can test hundreds of thousands of genetic markers for association with a trait. Since the genetic markers may be correlated, a Bonferroni correction is typically too stringent a correction for multiple testing. Permutation testing is a standard statistical technique for determining statistical significance when performing multiple correlated tests for genetic association. However, permutation testing for large-scale genetic association studies is computationally demanding and calls for optimized algorithms and software. PRESTO is a new software package for genetic association studies that performs fast computation of multiple-testing adjusted P-values via permutation of the trait.
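
The idea of multiple-testing adjusted P-values obtained by permuting the trait can be sketched with a plain maximum-statistic permutation test. This is a minimal illustration, not PRESTO's optimized algorithm: the absolute Pearson correlation as test statistic, the function names, and the toy data are all assumptions.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def max_stat_adjusted_p(genotypes, trait, n_perm=1000, seed=0):
    """Adjusted p-values from the permutation null of the maximum statistic."""
    rng = random.Random(seed)
    observed = [abs_corr(g, trait) for g in genotypes]
    shuffled = list(trait)
    exceed = [0] * len(genotypes)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute the trait; correlation among markers is preserved
        max_null = max(abs_corr(g, shuffled) for g in genotypes)
        for j, obs in enumerate(observed):
            if max_null >= obs:
                exceed[j] += 1
    # Add-one correction keeps the estimated p-values strictly positive.
    return [(c + 1) / (n_perm + 1) for c in exceed]


# Toy data (hypothetical): two markers coded 0/1/2 in twelve individuals.
genotypes = [
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
    [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2],
]
trait = [0.1, 1.0, 2.1, 0.0, 1.1, 1.9, 0.2, 0.9, 2.0, -0.1, 1.0, 2.2]
print(max_stat_adjusted_p(genotypes, trait, n_perm=500))
```

Because every permutation recomputes the statistic for all markers, the cost grows with the number of markers times the number of permutations, which is the computational burden the abstract refers to.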

15.
Kao CF, Fang YS, Zhao Z, Kuo PH. PLoS ONE, 2011, 6(4): e18696

Background

Large scale and individual genetic studies have suggested numerous susceptibility genes for depression in the past decade without conclusive results. There is a strong need to review and integrate multi-dimensional data for follow-up validation. The present study aimed to apply prioritization procedures to build up an evidence-based candidate gene dataset for depression.

Methods

Depression candidate genes were collected in human and animal studies across various data resources. Each gene was scored according to its magnitude of evidence related to depression and was multiplied by a source-specific weight to form a combined score measure. All genes were evaluated through a prioritization system to obtain an optimal weight matrix to rank their relative importance with depression using the combined scores. The resulting candidate gene list for depression (DEPgenes) was further evaluated by a genome-wide association (GWA) dataset and microarray gene expression in human tissues.
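
The combined score described here (a per-source evidence score multiplied by a source-specific weight, then summed) can be sketched as follows. The source names, weights, scores, and gene labels are hypothetical, and the search for an optimal weight matrix mentioned in the abstract is not reproduced.

```python
def combined_score(evidence, weights):
    """Sum of per-source evidence scores, each multiplied by its source weight."""
    return sum(weights[source] * score for source, score in evidence.items())


def rank_genes(gene_evidence, weights):
    """Gene names sorted from highest to lowest combined score."""
    return sorted(
        gene_evidence,
        key=lambda gene: combined_score(gene_evidence[gene], weights),
        reverse=True,
    )


# Hypothetical weights for three data sources and evidence for three genes.
weights = {"association": 2.0, "linkage": 1.0, "expression": 0.5}
gene_evidence = {
    "GENE_A": {"association": 3, "expression": 2},
    "GENE_B": {"linkage": 4},
    "GENE_C": {"association": 1, "linkage": 1, "expression": 1},
}
print(rank_genes(gene_evidence, weights))
```

A gene absent from a source simply contributes nothing for that source, so genes supported by several heavily weighted sources rise to the top of the ranking.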

Results

A total of 5,055 candidate genes (4,850 genes from human and 387 genes from animal studies, with 182 overlapping) were included from seven data sources. Through the prioritization procedures, we identified 169 DEPgenes, which showed a high chance of being associated with depression in the GWA dataset (Wilcoxon rank-sum test, p = 0.00005). Additionally, a higher percentage of DEPgenes than non-DEPgenes were expressed in human brain or nerve related tissues, supporting the neurotransmitter and neuroplasticity theories of depression.

Conclusions

With comprehensive data collection and curation and the application of an integrative approach, we successfully generated DEPgenes through an effective gene prioritization system. The prioritized DEPgenes are promising for future biological experiments or replication efforts to discover the underlying molecular mechanisms of depression.

16.

Background  

Gene expression profiling using microarrays has become an important genetic tool. Spotted arrays prepared in academic labs have the advantage of low cost and high design and content flexibility, but are often limited by their susceptibility to quality control (QC) issues. Previously, we have reported a novel 3-color microarray technology that enabled array fabrication QC. In this report we further investigated its advantage in spot-level data QC.

17.

Background

Genome wide association (GWA) studies provide the opportunity to develop new kinds of analysis. Analysing pairs of markers from separate regions might lead to the detection of allelic association which might indicate an interaction between nearby genes.

Methods

396,591 markers typed in 541 subjects were studied. 7.8 × 10^10 pairs of markers were screened and those showing initial evidence for allelic association were subjected to more thorough investigation along with 10 flanking markers on either side.
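
The scale of this pairwise screen follows from the marker count: n markers give n(n - 1)/2 unordered pairs. The short check below computes that upper bound; since the abstract restricts attention to pairs from separate regions, the number actually screened may be somewhat lower, which is consistent with the reported figure of about 7.8 × 10^10.

```python
import math

n_markers = 396_591  # number of typed markers reported in the abstract
n_pairs = math.comb(n_markers, 2)  # unordered pairs: n * (n - 1) / 2
print(f"{n_pairs:,} possible pairs, about {n_pairs:.2e}")
```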

Results

No evidence was detected for interaction. However 6 markers appeared to have an incorrect map position according to NCBI Build 35. One of these was corrected in Build 36 and 2 were dropped. The remaining 3 were left with map positions inconsistent with their allelic association relationships.

Discussion

Although no interaction effects were detected the method was successful in identifying markers with probably incorrect map positions.

Conclusion

The study of allelic association can supplement other methods for assigning markers to particular map positions. Analyses of this type may usefully be applied to data from future GWA studies.

18.

Background

The use of biological annotation such as genes and pathways in the analysis of gene expression data has aided the identification of genes for follow-up studies and suggested functional information to uncharacterized genes. Several studies have applied similar methods to genome wide association studies and identified a number of disease related pathways. However, many questions remain on how to best approach this problem, such as whether there is a need to obtain a score to summarize association evidence at the gene level, and whether a pathway, dominated by just a few highly significant genes, is of interest.

Methods

We evaluated the performance of two pathway-based methods (Random Set, and Binomial approximation to the hypergeometric test) based on their applications to three data sets of Crohn's disease. We consider both the disease status as a phenotype as well as the residuals after conditioning on IL23R, a known Crohn's related gene, as a phenotype.
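
For readers unfamiliar with the second method named here, the sketch below contrasts an exact hypergeometric enrichment tail with its binomial approximation. It is an illustration under assumed inputs (a list of significant genes and a pathway gene set); how the study defines gene-level significance is not stated in the abstract, and the function names and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_genes, n_pathway, n_significant, n_overlap):
    """P(overlap >= n_overlap) when the significant genes are drawn without replacement."""
    upper = min(n_pathway, n_significant)
    total = comb(n_genes, n_significant)
    tail = sum(
        comb(n_pathway, k) * comb(n_genes - n_pathway, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def binomial_enrichment_p(n_genes, n_pathway, n_significant, n_overlap):
    """Binomial approximation: each significant gene is in the pathway with probability n_pathway / n_genes."""
    p = n_pathway / n_genes
    return sum(
        comb(n_significant, k) * p**k * (1 - p) ** (n_significant - k)
        for k in range(n_overlap, n_significant + 1)
    )


# Hypothetical counts: 20,000 genes, a 100-gene pathway, 200 significant genes, 5 in the pathway.
print(hypergeom_enrichment_p(20_000, 100, 200, 5))
print(binomial_enrichment_p(20_000, 100, 200, 5))
```

The two values are close when the pathway and the significant list are both small relative to the genome, which is the regime where sampling with and without replacement barely differ.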

Results

Our results show that the Random Set method has the most power to identify disease related pathways. We confirm previously reported disease related pathways and provide evidence for IL-2 Receptor Beta Chain in T cell Activation and IL-9 signaling as Crohn's disease associated pathways.

Conclusions

Our results highlight the need to apply powerful gene score methods prior to pathway enrichment tests, and show that controlling for genes that attain genome wide significance enables further biological insight.

19.

Background

High-throughput genotype (HTG) data has been used primarily in genome-wide association (GWA) studies; however, GWA results explain only a limited part of the complete genetic variation of traits. In systems genetics, network approaches have been shown to be able to identify pathways and their underlying causal genes to unravel the biological and genetic background of complex diseases and traits, e.g., the Weighted Gene Co-expression Network Analysis (WGCNA) method based on microarray gene expression data. The main objective of this study was to develop a scale-free weighted genetic interaction network method using whole genome HTG data in order to detect biologically relevant pathways and potential genetic biomarkers for complex diseases and traits.

Results

We developed the Weighted Interaction SNP Hub (WISH) network method that uses HTG data to detect genome-wide interactions between single nucleotide polymorphisms (SNPs) and their relationship with complex traits. Data dimensionality reduction was achieved by selecting SNPs based on their: 1) degree of genome-wide significance and 2) degree of genetic variation in a population. Network construction was based on pairwise Pearson's correlation between SNP genotypes or the epistatic interaction effect between SNP pairs. To identify modules the Topological Overlap Measure (TOM) was calculated, reflecting the degree of overlap in shared neighbours between SNP pairs. Modules, clusters of highly interconnected SNPs, were defined using a tree-cutting algorithm on the SNP dendrogram created from the dissimilarity TOM (1-TOM). Modules were selected for functional annotation based on their association with the trait of interest, defined by the Genome-wide Module Association Test (GMAT). We successfully tested the established WISH network method using simulated and real SNP interaction data and GWA study results for carcass weight in a pig resource population; this resulted in detecting modules and key functional and biological pathways related to carcass weight.
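
The Topological Overlap Measure can be illustrated with the definition commonly used in weighted network analysis (as in WGCNA): for nodes i and j, TOM = (sum over shared neighbours u of a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij), where k is node connectivity. The sketch below assumes this standard form and a small hypothetical adjacency matrix; the exact variant used by WISH is not given in the abstract.

```python
def topological_overlap(adj):
    """Topological Overlap Measure for a symmetric adjacency matrix with entries in [0, 1]."""
    n = len(adj)
    # Connectivity of each node, excluding any self-connection.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Connection strength routed through shared neighbours u.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


# Hypothetical three-SNP chain: SNP 0 and SNP 2 are linked only through SNP 1.
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
tom = topological_overlap(adjacency)
dissimilarity = [[1.0 - v for v in row] for row in tom]  # the 1-TOM input to clustering
print(tom[0][2], dissimilarity[0][2])
```

In the chain example, SNPs 0 and 2 have no direct connection but still receive a non-zero overlap through their shared neighbour, which is what lets TOM group indirectly related SNPs into the same module.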

Conclusions

We developed the WISH network method which is a novel 'systems genetics' approach to study genetic networks underlying complex trait variation. The WISH network method reduces data dimensionality and statistical complexity in associating genotypes with phenotypes in GWA studies and enables researchers to identify biologically relevant pathways and potential genetic biomarkers for any complex trait of interest.

20.

Background

The recent completion of the swine genome sequencing project and development of a high density porcine SNP array has made genome-wide association (GWA) studies feasible in pigs.

Methodology/Principal Findings

Using Illumina's PorcineSNP60 BeadChip, we performed a pilot GWA study in 820 commercial female pigs phenotyped for backfat, loin muscle area, and body conformation, in addition to feet and leg (FL) structural soundness traits. A total of 51,385 SNPs were jointly fitted using Bayesian techniques as random effects in a mixture model that assumed a known large proportion (99.5%) of SNPs had zero effect. SNP annotations were implemented through the Sus scrofa Build 9 available from pig Ensembl. We discovered a number of candidate chromosomal regions, and some of them corresponded to QTL regions previously reported. We not only identified some well-known candidate genes for the traits of interest, such as MC4R (for backfat) and IGF2 (for loin muscle area), but also obtained novel promising genes, including CHCHD3 (for backfat), BMP2 (for loin muscle area, body size and several FL structure traits), and some HOXA family genes (for overall leg action). The candidate regions responsible for body conformation and FL structure soundness did not overlap greatly, which implied that these traits were controlled by different genes. Functional clustering analyses classified the genes into categories related to bone and cartilage development, muscle growth and development, or the insulin pathway, suggesting the traits are regulated by common pathways or gene networks that exert roles at different spatial and temporal stages.

Conclusions/Significance

This study is one of the earliest GWA reports on important quantitative traits in pigs, and the findings will contribute to the further biological function analysis of the identified candidate genes and potential utilization of them in marker assisted selection.
