Similar articles
20 similar articles found (search took 15 ms)
1.
Cancer has long been understood as a somatic evolutionary process, but many details of tumor progression remain elusive. Here, we present BitPhylogeny, a probabilistic framework to reconstruct intra-tumor evolutionary pathways. Using a full Bayesian approach, we jointly estimate the number and composition of clones in the sample as well as the most likely tree connecting them. We validate our approach in the controlled setting of a simulation study and compare it against several competing methods. In two case studies, we demonstrate how BitPhylogeny reconstructs tumor phylogenies from methylation patterns in colon cancer and from single-cell exomes in myeloproliferative neoplasm.
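The clone-and-tree idea can be illustrated with a deliberately simplified sketch: greedy parsimony on binary methylation patterns, not BitPhylogeny's Bayesian model. All data below are made up.

```python
from collections import Counter

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def build_clone_tree(patterns, root):
    """Group identical patterns into clones, then attach each clone to the
    nearest already-placed clone by Hamming distance (greedy parsimony)."""
    clones = Counter(patterns)                    # clone genotype -> cell count
    tree, placed = {}, [root]
    for clone in sorted(clones, key=lambda p: hamming(p, root)):
        if clone == root:
            continue
        tree[clone] = min(placed, key=lambda p: hamming(clone, p))
        placed.append(clone)
    return clones, tree

# Toy binary methylation patterns from seven cells (three distinct clones).
cells = ["0000", "0000", "1000", "1000", "1100", "1100", "1100"]
clones, tree = build_clone_tree(cells, root="0000")
```

The toy tree recovers the linear accumulation of methylation changes (0000 → 1000 → 1100); BitPhylogeny instead samples clone number, composition and tree structure jointly from a posterior.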

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-015-0592-6) contains supplementary material, which is available to authorized users.

2.
Methods to interpret personal genome sequences are increasingly required. Here, we report a novel framework (EvoTol) to identify disease-causing genes using patient sequence data from within protein-coding regions. EvoTol quantifies a gene's intolerance to mutation using evolutionary conservation of protein sequences and can incorporate tissue-specific gene expression data. We apply this framework to the analysis of whole-exome sequence data in epilepsy and congenital heart disease, and demonstrate that EvoTol's ability to identify known disease-causing genes is unmatched by competing methods. Application of EvoTol to the human interactome revealed networks enriched for genes intolerant to protein sequence variation, informing novel polygenic contributions to human disease.
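A minimal sketch of the conservation-weighted intolerance idea (the formula and names here are illustrative, not the published EvoTol statistic): a gene scores high when observed variants avoid its conserved sites more than a uniform distribution would predict.

```python
def intolerance_score(variants_per_site, conservation_per_site, expression_weight=1.0):
    """Toy intolerance score: compare the conservation-weighted burden of
    observed variants against the burden expected if variants fell
    uniformly across sites. Positive values mean variants avoid conserved
    sites, i.e. the gene appears intolerant to mutation there."""
    n = len(variants_per_site)
    observed = sum(v * c for v, c in zip(variants_per_site, conservation_per_site))
    expected = sum(variants_per_site) * sum(conservation_per_site) / n
    return expression_weight * (expected - observed) / max(expected, 1e-9)

# Variants avoid the two conserved sites (conservation 1.0) in one gene,
# but are spread uniformly in the other.
conserved_gene = intolerance_score([0, 0, 3], [1.0, 1.0, 0.1])
average_gene = intolerance_score([1, 1, 1], [1.0, 1.0, 0.1])
```

The `expression_weight` hook stands in for the tissue-specific expression scaling the abstract mentions.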

3.
Due to their large size and complex nature, few large macromolecular complexes have been solved to atomic resolution. This has led to an under-representation of these structures, which are composed of novel and/or homologous folds, in the library of known structures and folds. While it is often difficult to achieve a high-resolution model for these structures, X-ray crystallography and electron cryomicroscopy are capable of determining structures of large assemblies at low to intermediate resolutions. To aid in the interpretation and analysis of such structures, we have developed two programs: helixhunter and foldhunter. Helixhunter is capable of reliably identifying helix position, orientation and length using a five-dimensional cross-correlation search of a three-dimensional density map followed by feature extraction. Helixhunter's results can in turn be used to probe a library of secondary structure elements derived from the structures in the Protein Data Bank (PDB). From this analysis, it is then possible to identify potential homologous folds or suggest novel folds based on the arrangement of alpha helix elements, resulting in a structure-based recognition of folds containing alpha helices. Foldhunter uses a six-dimensional cross-correlation search allowing a probe structure to be fitted within a region or component of a target structure. The structural fitting therefore provides a quantitative means to further examine the architecture and organization of large, complex assemblies. These two methods have been successfully tested with simulated structures modeled from the PDB at resolutions between 6 and 12 Å. With the integration of helixhunter and foldhunter into sequence and structural informatics techniques, we have the potential to deduce or confirm known or novel folds in domains or components within large complexes.
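The translational core of such a search can be sketched with FFT-based cross-correlation (only the 3 translational degrees of freedom; helixhunter and foldhunter add 2-3 orientation dimensions on top of this). The map and probe below are synthetic.

```python
import numpy as np

def cross_correlate(map3d, probe):
    """FFT-based circular cross-correlation of a small probe density against
    a larger 3-D map; the peak position is the best translational fit."""
    padded = np.zeros_like(map3d)
    s = probe.shape
    padded[:s[0], :s[1], :s[2]] = probe
    return np.fft.ifftn(np.fft.fftn(map3d) * np.conj(np.fft.fftn(padded))).real

# Synthetic 16^3 density map with a cubic blob placed at (5, 6, 7).
map3d = np.zeros((16, 16, 16))
map3d[5:8, 6:9, 7:10] = 1.0
probe = np.ones((3, 3, 3))
cc = cross_correlate(map3d, probe)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(cc), cc.shape))
```

A full orientation search repeats this for each sampled probe rotation and keeps the global peak.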

4.
Computational protein design procedures were applied to the redesign of the entire sequence of a 51 amino acid residue protein, Drosophila melanogaster engrailed homeodomain. Various sequence optimization algorithms were compared and two resulting designed sequences were experimentally evaluated. The two sequences differ by 11 mutations and share 22% and 24% sequence identity with the wild-type protein. Both computationally designed proteins were considerably more stable than the naturally occurring protein, with midpoints of thermal denaturation greater than 99 degrees C. The solution structure was determined for one of the two sequences using multidimensional heteronuclear NMR spectroscopy, and the structure was found to closely match the original design template scaffold.

5.
6.
Deng X, Geng H, Ali H. Bio Systems 2005, 81(2):125-136
Reverse-engineering of gene networks using linear models often results in an underdetermined system because of excessive unknown parameters. In addition, the practical utility of linear models has remained unclear. We address these problems by developing an improved method, EXpression Array MINing Engine (EXAMINE), to infer gene regulatory networks from time-series gene expression data sets. EXAMINE takes advantage of sparse graph theory to overcome the excessive-parameter problem with an adaptive-connectivity model and fitting algorithm. EXAMINE also guarantees that the most parsimonious network structure will be found with its incremental adaptive fitting process. Compared to previous linear models, where a fully connected model is used, EXAMINE reduces the number of parameters by O(N), thereby increasing the chance of recovering the underlying regulatory network. The fitting algorithm increments the connectivity during the fitting process until a satisfactory fit is obtained. We performed a systematic study to explore the data mining ability of linear models. A guideline for using linear models is provided: if the system is small (3-20 elements), more than 90% of the regulation pathways can be determined correctly; for a large-scale system, either clustering is needed or it is necessary to integrate information beyond expression profiles. Coupled with the clustering method, we applied EXAMINE to rat central nervous system (CNS) development data with 112 genes. We were able to efficiently generate regulatory networks with statistically significant pathways that have been predicted previously.
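The adaptive-connectivity idea can be sketched for a single target gene: add regulators one at a time, keeping whichever most reduces the least-squares error of the linear model dx_i/dt = Σ_j w_ij x_j, and stop once the fit is satisfactory. This is a toy on synthetic data, not the published algorithm.

```python
import numpy as np

def fit_sparse_row(X, dxdt, tol=1e-6):
    """Greedy incremental fit: grow the regulator set for one target gene
    until dx/dt is explained within tolerance, so connectivity stays sparse."""
    n_genes = X.shape[0]
    chosen, weights = [], None
    for _ in range(n_genes):
        best = None
        for j in range(n_genes):
            if j in chosen:
                continue
            A = X[chosen + [j]].T                 # time points x regulators
            w, *_ = np.linalg.lstsq(A, dxdt, rcond=None)
            err = np.sum((A @ w - dxdt) ** 2)
            if best is None or err < best[0]:
                best = (err, j, w)
        err, j, weights = best
        chosen.append(j)
        if err < tol:
            break
    return chosen, weights

# Synthetic data: gene 0 activates and gene 2 represses the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 10))                      # 3 genes x 10 time points
target = 2.0 * X[0] - 1.0 * X[2]
regs, w = fit_sparse_row(X, target)
```

With N genes and a fully connected model there are N parameters per row; the greedy fit typically stops after a handful, which is the O(N) parameter reduction the abstract refers to.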

7.
8.
The "Virtual Cell" provides a general system for testing cell biological mechanisms and creates a framework for encapsulating the burgeoning knowledge base comprising the distribution and dynamics of intracellular biochemical processes. It approaches the problem by associating biochemical and electrophysiological data describing individual reactions with experimental microscopic image data describing their subcellular localizations. Individual processes are collected within a physical and computational infrastructure that accommodates any molecular mechanism expressible as rate equations or membrane fluxes. An illustration of the method is provided by a dynamic simulation of IP3-mediated Ca2+ release from endoplasmic reticulum in a neuronal cell. The results can be directly compared to experimental observations and provide insight into the role of experimentally inaccessible components of the overall mechanism.
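The rate-equation style of model the framework accommodates can be sketched as a two-compartment ODE: IP3-gated release of Ca2+ from the ER into the cytosol, balanced by a SERCA-like pump. Rate constants and the gating form are illustrative, not taken from the Virtual Cell.

```python
def simulate_ca_release(ip3, steps=2000, dt=0.001):
    """Forward-Euler integration of a toy two-pool Ca2+ model:
    flux = (release + leak) * (ER - cytosol gradient) - pump * cytosolic Ca2+."""
    ca_cyt, ca_er = 0.1, 10.0                 # uM, initial concentrations
    k_release, k_pump, k_leak = 5.0, 2.0, 0.05
    gate = ip3 / (ip3 + 1.0)                  # saturable IP3 gating
    for _ in range(steps):
        flux = (k_release * gate + k_leak) * (ca_er - ca_cyt) - k_pump * ca_cyt
        ca_cyt += dt * flux                   # Ca2+ enters the cytosol...
        ca_er -= dt * flux                    # ...and leaves the ER
    return ca_cyt, ca_er

low, _ = simulate_ca_release(ip3=0.0)         # resting: leak vs pump only
high, _ = simulate_ca_release(ip3=5.0)        # IP3 stimulus opens the channel
```

Raising IP3 shifts the steady state toward a much higher cytosolic Ca2+ level, the qualitative behavior of IP3-mediated release.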

9.
Comprehensive discovery of structural variation (SV) from whole genome sequencing data requires multiple detection signals including read-pair, split-read, read-depth and prior knowledge. Owing to technical challenges, extant SV discovery algorithms either use one signal in isolation, or at best use two sequentially. We present LUMPY, a novel SV discovery framework that naturally integrates multiple SV signals jointly across multiple samples. We show that LUMPY yields improved sensitivity, especially when SV signal is reduced owing to either low coverage data or low intra-sample variant allele frequency. We also report a set of 4,564 validated breakpoints from the NA12878 human genome. LUMPY is available at https://github.com/arq5x/lumpy-sv.
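The signal-integration idea can be sketched as interval clustering: each signal type nominates an interval of candidate breakpoint positions, overlapping intervals are merged, and clusters supported by more than one signal type are retained. This is a one-chromosome toy, not LUMPY's probabilistic model.

```python
def merge_breakpoint_evidence(intervals):
    """Cluster (start, end, signal) evidence intervals on one chromosome and
    keep clusters corroborated by at least two different signal types."""
    clusters = []
    for start, end, signal in sorted(intervals):
        if clusters and start <= clusters[-1]["end"]:
            clusters[-1]["end"] = max(clusters[-1]["end"], end)
            clusters[-1]["signals"].add(signal)
        else:
            clusters.append({"start": start, "end": end, "signals": {signal}})
    return [c for c in clusters if len(c["signals"]) > 1]

evidence = [
    (100, 150, "read-pair"),
    (140, 160, "split-read"),    # overlaps the read-pair interval
    (400, 450, "read-pair"),     # no corroborating second signal
]
calls = merge_breakpoint_evidence(evidence)
```

Only the breakpoint near position 100-160, supported by both read-pair and split-read evidence, survives the filter.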

10.
The IUPS Physiome Project is an internationally collaborative open-source project to provide a public domain framework for computational physiology, including the development of modelling standards, computational tools and web-accessible databases of models of structure and function at all spatial scales. A number of papers in this volume deal with the development of specific mathematical models of physiological processes. This paper stands back from the detail of individual models and reviews the current state of the IUPS Physiome Project including organ and organ system continuum models, the interpretation of constitutive law parameters in terms of micro-structural models, and markup languages for standardizing cellular processes. Some current practical applications of the physiome models are given and some of the challenges for the next 5 years of the Physiome Project at the level of organs, cells and proteins are proposed.

11.
Non-coding variants have long been recognized as important contributors to common disease risks, but with the expansion of clinical whole genome sequencing, examples of rare, high-impact non-coding variants are also accumulating. Despite recent advances in the study of regulatory elements and the availability of specialized data collections, the systematic annotation of non-coding variants from genome sequencing remains challenging. Here, we propose a new framework for the prioritization of non-coding regulatory variants that integrates information about regulatory regions with prediction scores and HPO-based prioritization. Firstly, we created a comprehensive collection of annotations for regulatory regions including a database of 2.4 million regulatory elements (GREEN-DB) annotated with controlled gene(s), tissue(s) and associated phenotype(s) where available. Secondly, we calculated a variation constraint metric and showed that constrained regulatory regions associate with disease-associated genes and essential genes from mouse knock-outs. Thirdly, we compared 19 non-coding impact prediction scores providing suggestions for variant prioritization. Finally, we developed a VCF annotation tool (GREEN-VARAN) that can integrate all these elements to annotate variants for their potential regulatory impact. In our evaluation, we show that GREEN-DB can capture previously published disease-associated non-coding variants as well as identify additional candidate disease genes in trio analyses.
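The annotation step reduces to an interval lookup: for each variant position, find any overlapping regulatory region and attach its controlled gene(s). A minimal single-chromosome sketch (region coordinates and gene names are made up; a real implementation would parse VCF and use an interval tree):

```python
import bisect

def annotate_variants(positions, regions):
    """Annotate variant positions with the controlled gene(s) of any
    overlapping regulatory region. `regions` is a sorted, non-overlapping
    list of (start, end, genes) tuples on one chromosome."""
    starts = [r[0] for r in regions]
    out = []
    for pos in positions:
        i = bisect.bisect_right(starts, pos) - 1          # rightmost start <= pos
        hit = i >= 0 and pos <= regions[i][1]             # does pos fall inside?
        out.append((pos, regions[i][2] if hit else None))
    return out

regions = [(1000, 1500, ["GENE_A"]), (2000, 2600, ["GENE_B", "GENE_C"])]
result = annotate_variants([1200, 1800, 2500], regions)
```

A variant landing between regions gets no regulatory annotation and would fall back on the impact-prediction scores alone.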

12.
Inferring drug-drug interactions (DDIs) is an essential step in drug development and drug administration. Most computational inference methods focus on modeling drug pharmacokinetics, aiming at interactions that result from a common metabolizing enzyme (CYP). Here, we introduce a novel prediction method, INDI (INferring Drug Interactions), allowing the inference of both pharmacokinetic, CYP-related DDIs (along with their associated CYPs) and pharmacodynamic, non-CYP associated ones. On cross validation, it obtains high specificity and sensitivity levels (area under the receiver-operating characteristic curve (AUC) of 0.93). In application to the FDA adverse event reporting system, 53% of the drug events could potentially be connected to known (41%) or predicted (12%) DDIs. Additionally, INDI predicts the severity level of each DDI upon co-administration of the involved drugs, suggesting that severe interactions are abundant in the clinical practice. Examining regularly taken medications by hospitalized patients, 18% of the patients receive known or predicted severely interacting drugs and are hospitalized more frequently. Access to INDI and its predictions is provided via a web tool at http://www.cs.tau.ac.il/~bnet/software/INDI, facilitating the inference and exploration of drug interactions and providing important leads for physicians and pharmaceutical companies alike.
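A common way to frame such inference, sketched here as a toy (the scoring rule and all similarity values are illustrative, not INDI's actual features or classifier): score a candidate pair by its best match to a known interacting pair, taking the weaker of the two drug-drug similarities.

```python
def predict_interaction(drug_a, drug_b, known_pairs, similarity):
    """Score a candidate drug pair against known interacting pairs.
    `similarity` maps frozenset drug pairs to values in [0, 1]."""
    def sim(x, y):
        return 1.0 if x == y else similarity.get(frozenset((x, y)), 0.0)
    best = 0.0
    for a, b in known_pairs:                     # try both pair orientations
        best = max(best,
                   min(sim(drug_a, a), sim(drug_b, b)),
                   min(sim(drug_a, b), sim(drug_b, a)))
    return best

known_pairs = [("warfarin", "fluconazole")]      # one known DDI
similarity = {frozenset(("aspirin", "warfarin")): 0.8,       # illustrative values
              frozenset(("miconazole", "fluconazole")): 0.9}
score = predict_interaction("aspirin", "miconazole", known_pairs, similarity)
```

A pair resembling a known interacting pair on both sides scores high; a pair with no similar precedent scores zero.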

13.
14.

Background  

An increasing number of scientific research projects require access to large-scale computational resources. This is particularly true in the biological field, whether to facilitate the analysis of large high-throughput data sets, or to perform large numbers of complex simulations – a characteristic of the emerging field of systems biology.

15.
Small silencing RNAs, including microRNAs, endogenous small interfering RNAs (endo-siRNAs) and Piwi-interacting RNAs (piRNAs), have been shown to play important roles in fine-tuning gene expression, defending against viruses and controlling transposons. Loss of small silencing RNAs or components in their pathways often leads to severe developmental defects, including lethality and sterility. Recently, non-templated addition of nucleotides to the 3′ end, namely tailing, was found to associate with the processing and stability of small silencing RNAs. Next Generation Sequencing has made it possible to detect such modifications at nucleotide resolution and at unprecedented throughput. Unfortunately, detecting such events from millions of short reads confounded by sequencing errors and RNA editing remains a challenging problem. Here, we developed a computational framework, Tailor, driven by an efficient and accurate aligner specifically designed for capturing tailing events directly from the alignments without extensive post-processing. The performance of Tailor was fully tested and compared favorably with other general-purpose aligners using both simulated and real datasets for tailing analysis. Moreover, to show the broad utility of Tailor, we used it to reanalyze published datasets and revealed novel findings worth further experimental validation. The source code and the executable binaries are freely available at https://github.com/jhhung/Tailor.
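The core operation can be sketched in a few lines (a naive scan, not Tailor's FM-index aligner; the sequences are hypothetical): find the longest read prefix that matches the reference exactly, and report the remaining 3′ nucleotides as the non-templated tail.

```python
def split_tail(read, reference):
    """Return (templated prefix, non-templated 3' tail) of a small RNA read,
    using the longest exact prefix match anywhere in the reference."""
    best = 0
    for i in range(len(reference)):
        k = 0
        while (k < len(read) and i + k < len(reference)
               and read[k] == reference[i + k]):
            k += 1
        best = max(best, k)
    return read[:best], read[best:]

# A miRNA-like read whose last two nucleotides (UU) are absent from the locus.
reference = "ACGGGCUUAGCUAUCGGAAGGACUU"   # hypothetical genomic locus
read = "UUAGCUAUCGGUU"
prefix, tail = split_tail(read, reference)
```

In practice sequencing errors and RNA editing can mimic tails, which is why Tailor works from full alignments rather than exact matching alone.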

16.
17.
18.
Genomic data analysis across multiple cloud platforms is an ongoing challenge, especially when large amounts of data are involved. Here, we present Swarm, a framework for federated computation that promotes minimal data motion and facilitates crosstalk between genomic datasets stored on various cloud platforms. We demonstrate its utility via common inquiries of genomic variants across BigQuery in the Google Cloud Platform (GCP), Athena in the Amazon Web Services (AWS), Apache Presto and MySQL. Compared to single-cloud platforms, the Swarm framework significantly reduced computational costs, run-time delays and risks of security breach and privacy violation.
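The minimal-data-motion principle can be sketched abstractly (plain in-memory lists stand in for BigQuery/Athena tables; this is not Swarm's API): push the filter to each site and move only small per-site summaries between clouds, never the variant data itself.

```python
def federated_count(datasets, predicate):
    """Evaluate `predicate` locally at each site and return per-site counts
    plus their federated total; only the counts cross site boundaries."""
    per_site = {site: sum(1 for row in rows if predicate(row))
                for site, rows in datasets.items()}
    return per_site, sum(per_site.values())

# Hypothetical variant tables held at two cloud sites.
datasets = {
    "gcp_bigquery": [{"chrom": "chr1", "af": 0.02}, {"chrom": "chr2", "af": 0.40}],
    "aws_athena":   [{"chrom": "chr1", "af": 0.01}],
}
# Federated query: how many rare variants (allele frequency < 5%) overall?
per_site, total = federated_count(datasets, lambda v: v["af"] < 0.05)
```

Keeping raw genotypes at their home site is also what reduces the breach and privacy risks the abstract mentions.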

19.
20.