首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
Conventional genome-wide association studies (GWAS) have been proven to be a successful strategy for identifying genetic variants associated with complex human traits. However, there is still a large heritability gap between GWAS and transitional family studies. The “missing heritability” has been suggested to be due to lack of studies focused on epistasis, also called gene–gene interactions, because individual trials have often had insufficient sample size. Meta-analysis is a common method for increasing statistical power. However, sufficient detailed information is difficult to obtain. A previous study employed a meta-regression-based method to detect epistasis, but it faced the challenge of inconsistent estimates. Here, we describe a Markov chain Monte Carlo-based method, called “Epistasis Test in Meta-Analysis” (ETMA), which uses genotype summary data to obtain consistent estimates of epistasis effects in meta-analysis. We defined a series of conditions to generate simulation data and tested the power and type I error rates in ETMA, individual data analysis and conventional meta-regression-based method. ETMA not only successfully facilitated consistency of evidence but also yielded acceptable type I error and higher power than conventional meta-regression. We applied ETMA to three real meta-analysis data sets. We found significant gene–gene interactions in the renin–angiotensin system and the polycyclic aromatic hydrocarbon metabolism pathway, with strong supporting evidence. In addition, glutathione S-transferase (GST) mu 1 and theta 1 were confirmed to exert independent effects on cancer. We concluded that the application of ETMA to real meta-analysis data was successful. Finally, we developed an R package, etma, for the detection of epistasis in meta-analysis [etma is available via the Comprehensive R Archive Network (CRAN) at https://cran.r-project.org/web/packages/etma/index.html].  相似文献   

2.
Specific selection pressures often lead to specifically mutated genomes. The open source software SeqFeatR has been developed to identify associations between mutation patterns in biological sequences and specific selection pressures (“features”). For instance, SeqFeatR has been used to discover in viral protein sequences new T cell epitopes for hosts of given HLA types. SeqFeatR supports frequentist and Bayesian methods for the discovery of statistical sequence-feature associations. Moreover, it offers novel ways to visualize results of the statistical analyses and to relate them to further properties. In this article we demonstrate various functions of SeqFeatR with real data. The most frequently used set of functions is also provided by a web server. SeqFeatR is implemented as R package and freely available from the R archive CRAN (http://cran.r-project.org/web/packages/SeqFeatR/index.html). The package includes a tutorial vignette. The software is distributed under the GNU General Public License (version 3 or later). The web server URL is https://seqfeatr.zmb.uni-due.de.  相似文献   

3.
Survival prediction from a large number of covariates is a current focus of statistical and medical research. In this paper, we study a methodology known as the compound covariate prediction performed under univariate Cox proportional hazard models. We demonstrate via simulations and real data analysis that the compound covariate method generally competes well with ridge regression and Lasso methods, both already well-studied methods for predicting survival outcomes with a large number of covariates. Furthermore, we develop a refinement of the compound covariate method by incorporating likelihood information from multivariate Cox models. The new proposal is an adaptive method that borrows information contained in both the univariate and multivariate Cox regression estimators. We show that the new proposal has a theoretical justification from a statistical large sample theory and is naturally interpreted as a shrinkage-type estimator, a popular class of estimators in statistical literature. Two datasets, the primary biliary cirrhosis of the liver data and the non-small-cell lung cancer data, are used for illustration. The proposed method is implemented in R package “compound.Cox” available in CRAN at http://cran.r-project.org/.  相似文献   

4.
Reverse engineering approaches to constructing gene regulatory networks (GRNs) based on genome-wide mRNA expression data have led to significant biological findings, such as the discovery of novel drug targets. However, the reliability of the reconstructed GRNs needs to be improved. Here, we propose an ensemble-based network aggregation approach to improving the accuracy of network topologies constructed from mRNA expression data. To evaluate the performances of different approaches, we created dozens of simulated networks from combinations of gene-set sizes and sample sizes and also tested our methods on three Escherichia coli datasets. We demonstrate that the ensemble-based network aggregation approach can be used to effectively integrate GRNs constructed from different studies – producing more accurate networks. We also apply this approach to building a network from epithelial mesenchymal transition (EMT) signature microarray data and identify hub genes that might be potential drug targets. The R code used to perform all of the analyses is available in an R package entitled “ENA”, accessible on CRAN (http://cran.r-project.org/web/packages/ENA/).  相似文献   

5.
Chemical graph generators are software packages to generate computer representations of chemical structures adhering to certain boundary conditions. Their development is a research topic of cheminformatics. Chemical graph generators are used in areas such as virtual library generation in drug design, in molecular design with specified properties, called inverse QSAR/QSPR, as well as in organic synthesis design, retrosynthesis or in systems for computer-assisted structure elucidation (CASE). CASE systems again have regained interest for the structure elucidation of unknowns in computational metabolomics, a current area of computational biology.  相似文献   

6.
Algorithms for active module identification (AMI) are central to analysis of omics data. Such algorithms receive a gene network and nodes'' activity scores as input and report subnetworks that show significant over‐representation of accrued activity signal (“active modules”), thus representing biological processes that presumably play key roles in the analyzed conditions. Here, we systematically evaluated six popular AMI methods on gene expression and GWAS data. We observed that GO terms enriched in modules detected on the real data were often also enriched on modules found on randomly permuted data. This indicated that AMI methods frequently report modules that are not specific to the biological context measured by the analyzed omics dataset. To tackle this bias, we designed a permutation‐based method that empirically evaluates GO terms reported by AMI methods. We used the method to fashion five novel AMI performance criteria. Last, we developed DOMINO, a novel AMI algorithm, that outperformed the other six algorithms in extensive testing on GE and GWAS data. Software is available at https://github.com/Shamir‐Lab.  相似文献   

7.
Interest in gene drive technology has continued to grow as promising new drive systems have been developed in the lab and discussions are moving towards implementing field trials. The prospect of field trials requires models that incorporate a significant degree of ecological detail, including parameters that change over time in response to environmental data such as temperature and rainfall, leading to seasonal patterns in mosquito population density. Epidemiological outcomes are also of growing importance, as: i) the suitability of a gene drive construct for release will depend on its expected impact on disease transmission, and ii) initial field trials are expected to have a measured entomological outcome and a modeled epidemiological outcome. We present MGDrivE 2 (Mosquito Gene Drive Explorer 2): a significant development from the MGDrivE 1 simulation framework that investigates the population dynamics of a variety of gene drive architectures and their spread through spatially-explicit mosquito populations. Key strengths and fundamental improvements of the MGDrivE 2 framework are: i) the ability of parameters to vary with time and induce seasonal population dynamics, ii) an epidemiological module accommodating reciprocal pathogen transmission between humans and mosquitoes, and iii) an implementation framework based on stochastic Petri nets that enables efficient model formulation and flexible implementation. Example MGDrivE 2 simulations are presented to demonstrate the application of the framework to a CRISPR-based split gene drive system intended to drive a disease-refractory gene into a population in a confinable and reversible manner, incorporating time-varying temperature and rainfall data. The simulations also evaluate impact on human disease incidence and prevalence. Further documentation and use examples are provided in vignettes at the project’s CRAN repository. MGDrivE 2 is freely available as an open-source R package on CRAN (https://CRAN.R-project.org/package=MGDrivE2). We intend the package to provide a flexible tool capable of modeling gene drive constructs as they move closer to field application and to infer their expected impact on disease transmission.  相似文献   

8.
Dependence measures and tests for independence have recently attracted a lot of attention, because they are the cornerstone of algorithms for network inference in probabilistic graphical models. Pearson''s product moment correlation coefficient is still by far the most widely used statistic yet it is largely constrained to detecting linear relationships. In this work we provide an exact formula for the th nearest neighbor distance distribution of rank-transformed data. Based on that, we propose two novel tests for independence. An implementation of these tests, together with a general benchmark framework for independence testing, are freely available as a CRAN software package (http://cran.r-project.org/web/packages/knnIndep). In this paper we have benchmarked Pearson''s correlation, Hoeffding''s , dcor, Kraskov''s estimator for mutual information, maximal information criterion and our two tests. We conclude that no particular method is generally superior to all other methods. However, dcor and Hoeffding''s are the most powerful tests for many different types of dependence.  相似文献   

9.
10.
We propose permutation tests based on the pairwise distances between microarrays to compare location, variability, or equivalence of gene expression between two populations. For these tests the entire microarray or some pre-specified subset of genes is the unit of analysis. The pairwise distances only have to be computed once so the procedure is not computationally intensive despite the high dimensionality of the data. An R software package, permtest, implementing the method is freely available from the Comprehensive R Archive Network at http://cran.r-project.org.  相似文献   

11.
Identification of biopolymer motifs represents a key step in the analysis of biological sequences. The MEME Suite is a widely used toolkit for comprehensive analysis of biopolymer motifs; however, these tools are poorly integrated within popular analysis frameworks like the R/Bioconductor project, creating barriers to their use. Here we present memes, an R package that provides a seamless R interface to a selection of popular MEME Suite tools. memes provides a novel “data aware” interface to these tools, enabling rapid and complex discriminative motif analysis workflows. In addition to interfacing with popular MEME Suite tools, memes leverages existing R/Bioconductor data structures to store the multidimensional data returned by MEME Suite tools for rapid data access and manipulation. Finally, memes provides data visualization capabilities to facilitate communication of results. memes is available as a Bioconductor package at https://bioconductor.org/packages/memes, and the source code can be found at github.com/snystrom/memes.  相似文献   

12.
Genome sequencing is an increasingly common component of infectious disease outbreak investigations. However, the relationship between pathogen transmission and observed genetic data is complex, and dependent on several uncertain factors. As such, simulation of pathogen dynamics is an important tool for interpreting observed genomic data in an infectious disease outbreak setting, in order to test hypotheses and to explore the range of outcomes consistent with a given set of parameters. We introduce ‘seedy’, an R package for the simulation of evolutionary and epidemiological dynamics (http://cran.r-project.org/web/packages/seedy/). Our software implements stochastic models for the accumulation of mutations within hosts, as well as individual-level disease transmission. By allowing variables such as the transmission bottleneck size, within-host effective population size and population mixing rates to be specified by the user, our package offers a flexible framework to investigate evolutionary dynamics during disease outbreaks. Furthermore, our software provides theoretical pairwise genetic distance distributions to provide a likelihood of person-to-person transmission based on genomic observations, and using this framework, implements transmission route assessment for genomic data collected during an outbreak. Our open source software provides an accessible platform for users to explore pathogen evolution and outbreak dynamics via simulation, and offers tools to assess observed genomic data in this context.  相似文献   

13.
Expression quantitative trait loci (eQTL) studies are used to understand the regulatory function of non-coding genome-wide association study (GWAS) risk loci, but colocalization alone does not demonstrate a causal relationship of gene expression affecting a trait. Evidence for mediation, that perturbation of gene expression in a given tissue or developmental context will induce a change in the downstream GWAS trait, can be provided by two-sample Mendelian Randomization (MR). Here, we introduce a new statistical method, MRLocus, for Bayesian estimation of the gene-to-trait effect from eQTL and GWAS summary data for loci with evidence of allelic heterogeneity, that is, containing multiple causal variants. MRLocus makes use of a colocalization step applied to each nearly-LD-independent eQTL, followed by an MR analysis step across eQTLs. Additionally, our method involves estimation of the extent of allelic heterogeneity through a dispersion parameter, indicating variable mediation effects from each individual eQTL on the downstream trait. Our method is evaluated against other state-of-the-art methods for estimation of the gene-to-trait mediation effect, using an existing simulation framework. In simulation, MRLocus often has the highest accuracy among competing methods, and in each case provides more accurate estimation of uncertainty as assessed through interval coverage. MRLocus is then applied to five candidate causal genes for mediation of particular GWAS traits, where gene-to-trait effects are concordant with those previously reported. We find that MRLocus’s estimation of the causal effect across eQTLs within a locus provides useful information for determining how perturbation of gene expression or individual regulatory elements will affect downstream traits. The MRLocus method is implemented as an R package available at https://mikelove.github.io/mrlocus.  相似文献   

14.
Supervised machine learning is an essential but difficult to use approach in biomedical data analysis. The Galaxy-ML toolkit (https://galaxyproject.org/community/machine-learning/) makes supervised machine learning more accessible to biomedical scientists by enabling them to perform end-to-end reproducible machine learning analyses at large scale using only a web browser. Galaxy-ML extends Galaxy (https://galaxyproject.org), a biomedical computational workbench used by tens of thousands of scientists across the world, with a suite of tools for all aspects of supervised machine learning.

This is a PLOS Computational Biology Software paper.
  相似文献   

15.
16.
Contemporary genetic studies are revealing the genetic complexity of many traits in humans and model organisms. Two hallmarks of this complexity are epistasis, meaning gene-gene interaction, and pleiotropy, in which one gene affects multiple phenotypes. Understanding the genetic architecture of complex traits requires addressing these phenomena, but interpreting the biological significance of epistasis and pleiotropy is often difficult. While epistasis reveals dependencies between genetic variants, it is often unclear how the activity of one variant is specifically modifying the other. Epistasis found in one phenotypic context may disappear in another context, rendering the genetic interaction ambiguous. Pleiotropy can suggest either redundant phenotype measures or gene variants that affect multiple biological processes. Here we present an R package, R/cape, which addresses these interpretation ambiguities by implementing a novel method to generate predictive and interpretable genetic networks that influence quantitative phenotypes. R/cape integrates information from multiple related phenotypes to constrain models of epistasis, thereby enhancing the detection of interactions that simultaneously describe all phenotypes. The networks inferred by R/cape are readily interpretable in terms of directed influences that indicate suppressive and enhancing effects of individual genetic variants on other variants, which in turn account for the variance in quantitative traits. We demonstrate the utility of R/cape by analyzing a mouse backcross, thereby discovering novel epistatic interactions influencing phenotypes related to obesity and diabetes. R/cape is an easy-to-use, platform-independent R package and can be applied to data from both genetic screens and a variety of segregating populations including backcrosses, intercrosses, and natural populations. The package is freely available under the GPL-3 license at http://cran.r-project.org/web/packages/cape.
This is a PLOS Computational Biology Software Article
  相似文献   

17.
18.
Endothelial nitric oxide synthase (eNOS) and receptor-type vascular endothelial protein tyrosine phosphatase (VE-PTP) are one of the majors signaling pathways related to endothelial health in diabetes. Several reports have shown that the inhibition of VE-PTP can lead the nitric oxide production, although repeated studies showed that VE-PTP regulated the eNOS exclusive at Ser1177 in indirect-manner. A recent, exciting paper (Siragusa et al. in Cardiovasc Res, 2020. https://doi.org/10.1093/cvr/cvaa213), showing that VE-PTP regulates eNOS in a direct-manner, dephosphorylating eNOS at Tyr81 and indirect at Ser1177 and the effects of a VE-PTP inhibitor, AKB-9778, in the blood pressure from diabetic patients.  相似文献   

19.
dadi is a popular but computationally intensive program for inferring models of demographic history and natural selection from population genetic data. I show that running dadi on a Graphics Processing Unit can dramatically speed computation compared with the CPU implementation, with minimal user burden. Motivated by this speed increase, I also extended dadi to four- and five-population models. This functionality is available in dadi version 2.1.0, https://bitbucket.org/gutenkunstlab/dadi/.  相似文献   

20.
One of the most accurate multi-class protein classification systems continues to be the profile-based SVM kernel introduced by the Leslie group. Unfortunately, its CPU requirements render it too slow for practical applications of large-scale classification tasks. Here, we introduce several software improvements that enable significant acceleration. Using various non-redundant data sets, we demonstrate that our new implementation reaches a maximal speed-up as high as 14-fold for calculating the same kernel matrix. Some predictions are over 200 times faster and render the kernel as possibly the top contender in a low ratio of speed/performance. Additionally, we explain how to parallelize various computations and provide an integrative program that reduces creating a production-quality classifier to a single program call. The new implementation is available as a Debian package under a free academic license and does not depend on commercial software. For non-Debian based distributions, the source package ships with a traditional Makefile-based installer. Download and installation instructions can be found at https://rostlab.org/owiki/index.php/Fast_Profile_Kernel. Bugs and other issues may be reported at https://rostlab.org/bugzilla3/enter_bug.cgi?product=fastprofkernel.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号