共查询到20条相似文献,搜索用时 0 毫秒
1.
Probabilistic graphical models have been widely recognized as a powerful formalism in the bioinformatics field, especially in gene expression studies and linkage analysis. Although less well known in association genetics, many successful methods have recently emerged to dissect the genetic architecture of complex diseases. In this review article, we cover the applications of these models to the population association studies' context, such as linkage disequilibrium modeling, fine mapping and candidate gene studies, and genome-scale association studies. Significant breakthroughs of the corresponding methods are highlighted, but emphasis is also given to their current limitations, in particular, to the issue of scalability. Finally, we give promising directions for future research in this field. 相似文献
2.
As millions of single-nucleotide polymorphisms (SNPs) have been identified and high-throughput genotyping technologies have been rapidly developed, large-scale genomewide association studies are soon within reach. However, since a genomewide association study involves a large number of SNPs it is therefore nearly impossible to ensure a genomewide significance level of 0.05 using the available statistics, although the multiple-test problems can be alleviated, but not sufficiently, by the use of tagging SNPs. One strategy to circumvent the multiple-test problem associated with genome-wide association tests is to develop novel test statistics with high power. In this report, we introduce several nonlinear tests, which are based on nonlinear transformation of allele or haplotype frequencies. We investigate the power of the nonlinear test statistics and demonstrate that under certain conditions, some nonlinear test statistics have much higher power than the standard chi2-test statistic. Type I error rates of the nonlinear tests are validated using simulation studies. We also show that a class of similarity measure-based test statistics is based on the quadratic function of allele or haplotype frequencies, and thus they belong to nonlinear tests. To evaluate their performance, the nonlinear test statistics are also applied to three real data sets. Our study shows that nonlinear test statistics have great potential in association studies of complex diseases. 相似文献
3.
Efficient genotyping methods and the availability of a large collection of single-nucleotide polymorphisms provide valuable tools for genetic studies of human disease. The standard chi2 statistic for case-control studies, which uses a linear function of allele frequencies, has limited power when the number of marker loci is large. We introduce a novel test statistic for genetic association studies that uses Shannon entropy and a nonlinear function of allele frequencies to amplify the differences in allele and haplotype frequencies to maintain statistical power with large numbers of marker loci. We investigate the relationship between the entropy-based test statistic and the standard chi2 statistic and show that, in most cases, the power of the entropy-based statistic is greater than that of the standard chi2 statistic. The distribution of the entropy-based statistic and the type I error rates are validated using simulation studies. Finally, we apply the new entropy-based test statistic to two real data sets, one for the COMT gene and schizophrenia and one for the MMP-2 gene and esophageal carcinoma, to evaluate the performance of the new method for genetic association studies. The results show that the entropy-based statistic obtained smaller P values than did the standard chi2 statistic. 相似文献
4.
5.
Published genomewide association (GWA) studies typically analyze and report single-nucleotide polymorphisms (SNPs) and their neighboring genes with the strongest evidence of association (the “most-significant SNPs/genes” approach), while paying little attention to the rest. Borrowing ideas from microarray data analysis, we demonstrate that pathway-based approaches, which jointly consider multiple contributing factors in the same pathway, might complement the most-significant SNPs/genes approach and provide additional insights into interpretation of GWA data on complex diseases. 相似文献
6.
We present Bayesian analyses of matrix-variate normal data with conditional independencies induced by graphical model structuring of the characterizing covariance matrix parameters. This framework of matrix normal graphical models includes prior specifications, posterior computation using Markov chain Monte Carlo methods, evaluation of graphical model uncertainty and model structure search. Extensions to matrix-variate time series embed matrix normal graphs in dynamic models. Examples highlight questions of graphical model uncertainty, search and comparison in matrix data contexts. These models may be applied in a number of areas of multivariate analysis, time series and also spatial modelling. 相似文献
7.
The ability of genomewide association studies to decipher genetic traits is driven in part by how well the measured single-nucleotide polymorphisms "cover" the unmeasured causal variants. Estimates of coverage based on standard linkage-disequilibrium measures, such as the average maximum squared correlation coefficient (r2), can lead to inaccurate and inflated estimates of the power of genomewide association studies. In contrast, use of the "cumulative r2 adjusted power" measure presented here gives more-accurate estimates of power for genomewide association studies. 相似文献
8.
Lin DY 《American journal of human genetics》2006,78(3):505-509
Genomewide association studies are being conducted to unravel the genetic etiology of complex human diseases. Because of cost constraints, these studies typically employ a two-stage design, under which a large panel of markers is examined in a subsample of subjects, and the most-promising markers are then examined in all subjects. This report describes a simple and efficient method to evaluate statistical significance for such genome studies. The proposed method, which properly accounts for the correlated nature of polymorphism data, provides accurate control of the overall false-positive rate and is substantially more powerful than the standard Bonferroni correction, especially when the markers are in strong linkage disequilibrium. 相似文献
9.
With millions of single-nucleotide polymorphisms (SNPs) identified and characterized, genomewide association studies have begun to identify susceptibility genes for complex traits and diseases. These studies involve the characterization and analysis of very-high-resolution SNP genotype data for hundreds or thousands of individuals. We describe a computationally efficient approach to testing association between SNPs and quantitative phenotypes, which can be applied to whole-genome association scans. In addition to observed genotypes, our approach allows estimation of missing genotypes, resulting in substantial increases in power when genotyping resources are limited. We estimate missing genotypes probabilistically using the Lander-Green or Elston-Stewart algorithms and combine high-resolution SNP genotypes for a subset of individuals in each pedigree with sparser marker data for the remaining individuals. We show that power is increased whenever phenotype information for ungenotyped individuals is included in analyses and that high-density genotyping of just three carefully selected individuals in a nuclear family can recover >90% of the information available if every individual were genotyped, for a fraction of the cost and experimental effort. To aid in study design, we evaluate the power of strategies that genotype different subsets of individuals in each pedigree and make recommendations about which individuals should be genotyped at a high density. To illustrate our method, we performed genomewide association analysis for 27 gene-expression phenotypes in 3-generation families (Centre d'Etude du Polymorphisme Humain pedigrees), in which genotypes for ~860,000 SNPs in 90 grandparents and parents are complemented by genotypes for ~6,700 SNPs in a total of 168 individuals. In addition to increasing the evidence of association at 15 previously identified cis-acting associated alleles, our genotype-inference algorithm allowed us to identify associated alleles at 4 cis-acting loci that were missed when analysis was restricted to individuals with the high-density SNP data. Our genotype-inference algorithm and the proposed association tests are implemented in software that is available for free. 相似文献
10.
A weighted-Holm procedure accounting for allele frequencies in genomewide association studies
下载免费PDF全文

In the context of genomewide association studies where hundreds of thousand of polymorphisms are tested, stringent thresholds on the raw association test P-values are generally used to limit false-positive results. Instead of using thresholds based on raw P-values as in Bonferroni and sequential Sidak (SidakSD) corrections, we propose here to use a weighted-Holm procedure with weights depending on allele frequency of the polymorphisms. This method is shown to substantially improve the power to detect associations, in particular by favoring the detection of rare variants with high genetic effects over more frequent ones with lower effects. 相似文献
11.
Yijun Zuo 《Journal of theoretical biology》2010,262(4):576-583
Genomewide association studies (GWAS) are being conducted to unravel the genetic etiology of complex diseases, in which complex epistasis may play an important role. One-stage method in which interactions are tested using all samples at one time may be computationally problematic, may have low power as the number of markers tested increases and may not be cost-efficient. A common two-stage method may be a reasonable and powerful approach for detecting interacting genes using all samples in both two stages. In this study, we introduce an alternative two-stage method, in which some promising markers are selected using a proportion of samples in the first stage and interactions are then tested using the remaining samples in the second stage. This two-stage method is called mixed two-stage method. We then investigate the power of both one-stage method and mixed two-stage method to detect interacting disease loci for a range of two-locus epistatic models in a case-control study design. Our results suggest that mixed two-stage method may be more powerful than one-stage method if we choose about 30% of samples for single-locus tests in the first stage, and identify less than and equal to 1% of markers for follow-up interaction tests. In addition, we compare both two-stage methods and find that our two-stage method will lose power because we only use part of samples in both two stages. 相似文献
12.
The Bayesian lasso for genome-wide association studies 总被引:1,自引:0,他引:1
13.
Enriching the analysis of genomewide association studies with hierarchical modeling 总被引:1,自引:0,他引:1
下载免费PDF全文

Genomewide association studies (GWAs) initially investigate hundreds of thousands of single-nucleotide polymorphisms (SNPs), and the most promising SNPs are further evaluated with additional subjects, for replication or a joint analysis. Deciding which SNPs merit follow-up is one of the most crucial aspects of these studies. We present here an approach for selecting the most-promising SNPs that incorporates into a hierarchical model both conventional results and other existing information about the SNPs. The model is developed for general use, its potential value is shown by application, and tools are provided for undertaking hierarchical modeling. By quantitatively harnessing all available information in GWAs, hierarchical modeling may more clearly distinguish true causal variants from noise. 相似文献
14.
Poon AF Lewis FI Frost SD Kosakovsky Pond SL 《Bioinformatics (Oxford, England)》2008,24(17):1949-1950
Spidermonkey is a new component of the Datamonkey suite of phylogenetic tools that provides methods for detecting coevolving sites from a multiple alignment of homologous nucleotide or amino acid sequences. It reconstructs the substitution history of the alignment by maximum likelihood-based phylogenetic methods, and then analyzes the joint distribution of substitution events using Bayesian graphical models to identify significant associations among sites. AVAILABILITY: Spidermonkey is publicly available both as a web application at http://www.data-monkey.org and as a stand-alone component of the phylogenetic software package HyPhy, which is freely distributed on the web (http://www.hyphy.org) as precompiled binaries and open source. 相似文献
15.
Steffen Uebe Francesca Pasutto Mandy Krumbiegel Denny Schanze Arif B Ekici André Reis 《BMC bioinformatics》2010,11(1):472
Background
Most software packages for whole genome association studies are non-graphical, purely text based programs originally designed to run with UNIX-like operating systems. Graphical output is often not intended or supposed to be performed with other command line tools, e.g. gnuplot. 相似文献16.
Haplotype association studies based on family genotype data can provide more biological information than single marker association studies. Difficulties arise, however, in the inference of haplotype phase determination and in haplotype transmission/non-transmission status. Incorporation of the uncertainty associated with haplotype inference into regression models requires special care. This task can get even more complicated when the genetic region contains a large number of haplotypes. To avoid the curse of dimensionality, we employ a clustering algorithm based on the evolutionary relationship among haplotypes and retain for regression analysis only the ancestral core haplotypes identified by it. To integrate the three sources of variation, phase ambiguity, transmission status and ancestral uncertainty, we propose an uncertainty-coding matrix which combines these three types of variability simultaneously. Next we evaluate haplotype risk with the use of such a matrix in a Bayesian conditional logistic regression model. Simulation studies and one application, a schizophrenia multiplex family study, are presented and the results are compared with those from other family based analysis tools such as FBAT. Our proposed method (Bayesian regression using uncertainty-coding matrix, BRUCM) is shown to perform better and the implementation in R is freely available. 相似文献
17.
18.
A Bayesian multilocus association method: allowing for higher-order interaction in association studies
下载免费PDF全文

For most common diseases with heritable components, not a single or a few single-nucleotide polymorphisms (SNPs) explain most of the variance for these disorders. Instead, much of the variance may be caused by interactions (epistasis) among multiple SNPs or interactions with environmental conditions. We present a new powerful statistical model for analyzing and interpreting genomic data that influence multifactorial phenotypic traits with a complex and likely polygenic inheritance. The new method is based on Markov chain Monte Carlo (MCMC) and allows for identification of sets of SNPs and environmental factors that when combined increase disease risk or change the distribution of a quantitative trait. Using simulations, we show that the MCMC method can detect disease association when multiple, interacting SNPs are present in the data. When applying the method on real large-scale data from a Danish population-based cohort, multiple interactions are identified that severely affect serum triglyceride levels in the study individuals. The method is designed for quantitative traits but can also be applied on qualitative traits. It is computationally feasible even for a large number of possible interactions and differs fundamentally from most previous approaches by entertaining nonlinear interactions and by directly addressing the multiple-testing problem. 相似文献
19.
Background
Graphical Gaussian models are popular tools for the estimation of (undirected) gene association networks from microarray data. A key issue when the number of variables greatly exceeds the number of samples is the estimation of the matrix of partial correlations. Since the (Moore-Penrose) inverse of the sample covariance matrix leads to poor estimates in this scenario, standard methods are inappropriate and adequate regularization techniques are needed. Popular approaches include biased estimates of the covariance matrix and high-dimensional regression schemes, such as the Lasso and Partial Least Squares. 相似文献20.