Similar articles
Found 20 similar articles (search time: 15 ms)
1.
2.
Identifying novel therapeutic targets for the treatment of disease is challenging. To this end, we developed a genome-wide approach to candidate gene prioritization. We independently collated sets of genes that were implicated in rheumatoid arthritis (RA) pathogenicity through three genome-wide assays: (i) genome-wide association studies (GWAS), (ii) differential expression in RA fibroblast-like synoviocytes (FLS), and (iii) differential methylation in RA FLS. Integrated analysis of these complementary data sets identified a significant enrichment of multi-evidence genes (MEGs) within pathways relating to RA pathogenicity. One MEG is Engulfment and Cell Motility Protein-1 (ELMO1), a gene not previously considered as a therapeutic target in RA FLS. We demonstrated in RA FLS that ELMO1 (i) is expressed, (ii) promotes cell migration and invasion, and (iii) regulates Rac1 activity. Thus, we created links between ELMO1 and RA pathogenicity, which in turn validates ELMO1 as a potential RA therapeutic target. This study illustrates the power of MEG-based approaches for therapeutic target identification.
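
The multi-evidence idea in the abstract above can be illustrated with a few lines of Python. This is only a minimal sketch: the abstract does not state how many sources a gene needs in order to count as an MEG, so the `min_sources` threshold, the function name, and the toy gene identifiers are all assumptions made for illustration.

```python
from collections import Counter


def multi_evidence_genes(evidence_sets, min_sources=2):
    """Return {gene: number of supporting sources} for genes that appear in
    at least `min_sources` of the evidence sets (threshold is an assumption)."""
    counts = Counter()
    for genes in evidence_sets.values():
        counts.update(set(genes))  # each source counts at most once per gene
    return {gene: n for gene, n in counts.items() if n >= min_sources}


# Hypothetical placeholder gene lists, not data from the study.
evidence = {
    "gwas": {"GENE_X", "GENE_A", "GENE_B"},
    "expression": {"GENE_X", "GENE_B", "GENE_C"},
    "methylation": {"GENE_X", "GENE_D"},
}
megs = multi_evidence_genes(evidence)
```

Counting sources per gene, rather than taking a strict three-way intersection, keeps the threshold adjustable.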

3.
4.
Module-based analysis (MBA) aims to evaluate the effect of a group of biological elements sharing common features, such as SNPs in the same gene or metabolites in the same pathway, and has become an attractive alternative to traditional single bio-element approaches. Because bio-elements regulate and interact with each other as part of a network, incorporating network structure information can model the biological effects more precisely, enhance the ability to detect true associations, and facilitate our understanding of the underlying biological mechanisms. However, most MBA methods ignore the network structure information, which depicts the interaction and regulation relationships among basic functional units in biological systems. We construct the connectivity kernel and the topology kernel to capture the relationships among bio-elements in a module, and use a kernel machine framework to evaluate the joint effect of bio-elements. Our proposed kernel machine approach directly incorporates network structure so as to enhance study efficiency; it can assess interactions among modules, account for covariates, and is computationally efficient. Through simulation studies and a real data application, we demonstrate that the proposed network-based methods can have markedly better power than approaches that ignore network information under a range of scenarios.
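
The abstract above does not define its connectivity or topology kernels, so the following is only a generic stand-in showing how network structure can enter a kernel. It assumes each subject's measurements are smoothed over network neighbours before a linear kernel is taken, and it uses a covariate-free variance-component score statistic Q = rᵀKr. All function names and toy values are hypothetical.

```python
def neighbour_smoothed(Z, A):
    """Add to each element's value the values of its network neighbours.
    Z: subjects x elements matrix; A: symmetric 0/1 adjacency matrix."""
    p = len(A)
    return [[row[k] + sum(A[k][j] * row[j] for j in range(p)) for k in range(p)]
            for row in Z]


def network_kernel(Z, A):
    """Linear kernel on neighbour-smoothed features; positive semidefinite
    by construction because it is a Gram matrix."""
    S = neighbour_smoothed(Z, A)
    return [[sum(a * b for a, b in zip(si, sm)) for sm in S] for si in S]


def score_statistic(y, K):
    """Q = r^T K r with r = y - mean(y); assumes no covariates."""
    mean_y = sum(y) / len(y)
    r = [v - mean_y for v in y]
    n = len(y)
    return sum(r[i] * K[i][m] * r[m] for i in range(n) for m in range(n))


# Toy module: elements 0 and 1 are connected, element 2 is isolated.
A = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
Z = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [1.0, 1.0, 0.0]
K = network_kernel(Z, A)
Q = score_statistic(y, K)
```

In this toy case, subjects 0 and 1 carry different but connected elements, so the network kernel treats them as similar, whereas a plain linear kernel would give them zero similarity.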

5.
Omics data integration is becoming necessary to investigate the genomic mechanisms involved in complex diseases. During the integration process, many challenges arise, such as data heterogeneity, the small number of individuals relative to the number of parameters, multicollinearity, and the interpretation and validation of results, which are hampered by their complexity and by the lack of knowledge about biological processes. To overcome some of these issues, innovative statistical approaches are being developed. In this work, we propose a permutation-based method to concomitantly assess significance and correct for multiple testing with the MaxT algorithm. This was applied with penalized regression methods (LASSO and ENET) when exploring relationships between common genetic variants, DNA methylation, and gene expression measured in bladder tumor samples. The overall analysis flow consisted of three steps: (1) SNPs/CpGs were selected for each gene probe within a 1 Mb window upstream and downstream of the gene; (2) LASSO and ENET were applied to assess the association between each expression probe and the selected SNPs/CpGs in three multivariable models (SNP, CpG, and Global models, the latter integrating SNPs and CpGs); and (3) the significance of each model was assessed using the permutation-based MaxT method. We identified 48 genes whose expression levels were significantly associated with both SNPs and CpGs. Importantly, 36 (75%) of them were replicated in an independent data set (TCGA), and the performance of the proposed method was checked with a simulation study. We further support our results with a biological interpretation based on an enrichment analysis. The approach we propose reduces computational time and is flexible and easy to implement when analyzing several types of omics data. Our results highlight the importance of integrating omics data by applying appropriate statistical strategies to discover new insights into the complex genetic mechanisms involved in disease conditions.
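
The permutation-based MaxT step mentioned in the abstract above can be sketched as follows. This is a generic single-step maxT procedure, not the authors' pipeline. As a stand-in for their penalized-regression model statistics, it uses the absolute Pearson correlation between each feature and the outcome, and the function names and defaults are assumptions.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def maxt_adjusted_pvalues(features, outcome, n_perm=1000, seed=0):
    """Single-step maxT: compare each observed statistic with the permutation
    distribution of the maximum statistic across all features."""
    rng = random.Random(seed)
    observed = [abs_corr(f, outcome) for f in features]
    exceed = [0] * len(features)
    shuffled = list(outcome)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the feature/outcome link
        max_stat = max(abs_corr(f, shuffled) for f in features)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # The +1 terms count the observed data as one permutation.
    return [(c + 1) / (n_perm + 1) for c in exceed]
```

Because every feature is compared with the same per-permutation maximum, the returned p-values are already adjusted for multiple testing, which is how significance assessment and correction can be done in one pass.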

6.
7.
8.
9.
10.
11.
Huo Zhiguang, Zhu Li, Ma Tianzhou, Liu Hongcheng, Han Song, Liao Daiqing, Zhao Jinying, Tseng George. Statistics in Biosciences, 2020, 12(1): 1-22

Disease subtype discovery is an essential step in delivering personalized medicine. Disease subtyping via omics data has become a common approach for this purpose. With the advancement of technology and the lower price for generating omics data, multi-level and multi-cohort omics data are prevalent in the public domain, providing unprecedented opportunities to decrypt disease mechanisms. How to fully utilize multi-level/multi-cohort omics data and incorporate established biological knowledge toward disease subtyping remains a challenging problem. In this paper, we propose a meta-analytic integrative sparse Kmeans (MISKmeans) algorithm for integrating multi-cohort/multi-level omics data and prior biological knowledge. Compared with previous methods, MISKmeans shows better clustering accuracy and feature selection relevancy. An efficient R package, “MIS-Kmeans”, calling C++ is freely available on GitHub (https://github.com/Caleb-Huo/MIS-Kmeans).
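
As its name suggests, MISKmeans extends sparse K-means. The sketch below shows only the feature-weight update of standard sparse K-means (Witten and Tibshirani), not the MISKmeans algorithm or the API of the R package. Given cluster labels, it computes each feature's between-cluster sum of squares (BCSS), soft-thresholds it, and rescales to unit L2 norm, choosing the threshold by bisection so that the L1 norm stays within a bound. All names and toy data are hypothetical.

```python
def bcss_per_feature(X, labels):
    """Between-cluster sum of squares for each column of X."""
    n, p = len(X), len(X[0])
    out = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        total = sum((v - mean) ** 2 for v in col)
        within = 0.0
        for c in set(labels):
            grp = [v for v, lab in zip(col, labels) if lab == c]
            gm = sum(grp) / len(grp)
            within += sum((v - gm) ** 2 for v in grp)
        out.append(total - within)
    return out


def soft_threshold(values, delta):
    return [max(v - delta, 0.0) for v in values]


def l2_normalize(values):
    norm = sum(v * v for v in values) ** 0.5
    return [v / norm for v in values] if norm > 0 else list(values)


def sparse_weights(bcss, l1_bound, iters=60):
    """Weights w >= 0 with ||w||_2 = 1 and ||w||_1 <= l1_bound (needs l1_bound >= 1)."""
    w = l2_normalize(soft_threshold(bcss, 0.0))
    if sum(w) <= l1_bound:
        return w
    lo, hi = 0.0, max(bcss)
    for _ in range(iters):  # bisection on the threshold
        mid = (lo + hi) / 2
        w = l2_normalize(soft_threshold(bcss, mid))
        if sum(w) > l1_bound:
            lo = mid
        else:
            hi = mid
    return l2_normalize(soft_threshold(bcss, hi))


# Toy data: feature 0 separates the two clusters, feature 1 is noise.
X = [[0.0, 1.0], [0.2, 3.0], [5.0, 2.0], [5.2, 2.1]]
labels = [0, 0, 1, 1]
bcss = bcss_per_feature(X, labels)
weights = sparse_weights(bcss, l1_bound=1.0)
```

A tighter L1 bound pushes the weights of uninformative features to zero, which is the feature selection part of the method.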


12.
The vast array of in silico resources and high-throughput profiling data currently available in life sciences research offers the possibility of aiding the cancer gene and drug discovery process. Here we propose to take advantage of these resources to develop a tool, TARGETgene, for efficiently identifying mutation drivers, possible therapeutic targets, and drug candidates in cancer. The simple graphical user interface enables rapid, intuitive mapping and analysis at the systems level. Users can find, select, and explore identified target genes and compounds of interest (e.g., novel cancer genes and their enriched biological processes), and validate predictions using user-defined benchmark genes (e.g., target genes detected in RNAi screens) and curated cancer genes via TARGETgene. The high-level capabilities of TARGETgene are also demonstrated through two applications in this paper. The predictions in these two applications were then satisfactorily validated in several ways, including against known cancer genes, results of RNAi screens, gene function annotations, and target genes of drugs that have been used or are in clinical trials for cancer treatment. TARGETgene is freely available from the Biomedical Simulations Resource web site (http://bmsr.usc.edu/Software/TARGET/TARGET.html).

13.
Knowledge of the protein interaction network is useful to assist molecular mechanism studies. Several major repositories have been established to collect and organize reported protein interactions. Many interactions have been reported in several model organisms, yet a very limited number of plant interactions can thus far be found in these major databases. Computational identification of potential plant interactions, therefore, is desired to facilitate relevant research. In this work, we constructed a support vector machine model to predict potential Arabidopsis (Arabidopsis thaliana) protein interactions based on a variety of indirect evidence. In a 100-iteration bootstrap evaluation, the confidence of our predicted interactions was estimated to be 48.67%, and these interactions were expected to cover 29.02% of the entire interactome. The sensitivity of our model was validated with an independent evaluation data set consisting of newly reported interactions that did not overlap with the examples used in model training and testing. Results showed that our model successfully recognized 28.91% of the new interactions, similar to its expected sensitivity (29.02%). Applying this model to all possible Arabidopsis protein pairs resulted in 224,206 potential interactions, which is the largest and most accurate set of predicted Arabidopsis interactions at present. In order to facilitate the use of our results, we present the Predicted Arabidopsis Interactome Resource, with detailed annotations and more specific per interaction confidence measurements. This database and related documents are freely accessible at http://www.cls.zju.edu.cn/pair/.

The complex cellular functions of an organism rely on physical interactions between proteins. Deciphering the protein-protein interaction network to understand higher level phenotypes and their regulations is always a major focus of both experimental biologists and computational biologists.
A number of high-throughput (HTP) assays have been developed to identify in vitro protein interactions from several model organisms (Uetz et al., 2000; Giot et al., 2003; Li et al., 2004). A number of initiatives, such as IntAct (Kerrien et al., 2006), Molecular INTeraction database (Chatr-aryamontri et al., 2007), the Database of Interacting Proteins (Salwinski et al., 2004), Biomolecular Interaction Network Database (BIND; Alfarano et al., 2005), and BioGRID (Stark et al., 2006), have been established to systematically collect and organize the interaction data reported by both proteome-scale HTP experiments and traditional low-throughput studies focusing on individual proteins or pathways.

Arabidopsis (Arabidopsis thaliana) has long been studied as a model organism to investigate the physiology, biochemistry, growth, development, and metabolism of a flowering plant at the molecular level. The molecular mechanism studies of various phenotypes and their regulations in Arabidopsis may be facilitated by a comprehensive reference protein interaction network, based on which working hypotheses could be formulated with more guidance and confidence. However, due to technological limitations, most experimentally reported protein interactions in available databases were from other organisms. A very limited number of plant interactions could be found in these databases. Therefore, an accurate prediction of the Arabidopsis interactome would be valuable to assist relevant research.

Studies on the computational identification of potential interactions started along with the advent of HTP interaction-detection technologies, which often produced a large number of false positives (Deane et al., 2002). Indirect evidence of protein interaction (e.g. protein colocalization and relevance in function) was hence introduced to boost the confidence of HTP results (Jansen et al., 2003).
Further investigations demonstrated that direct inference of protein interactions from such indirect evidence alone was possible (Scott and Barton, 2007). The accuracy and effectiveness of using indirect evidence to predict interactions have also been thoroughly assessed (Qi et al., 2006; Suthram et al., 2006). These works offered precious insights into how protein interactions may be predicted accurately on a proteomic scale. In other organisms such as Homo sapiens, the prediction of an entire interactome has already been proven applicable and useful (Rhodes et al., 2005).

On the other hand, several efforts have been made to collect and organize a comprehensive map of Arabidopsis molecular interactions. For instance, around 20,000 interactions were inferred by homology to known interactions in other organisms (Geisler-Lee et al., 2007). Another work predicted 23,396 interactions based on multiple indirect data and curated 4,666 interactions from the literature and enzyme complexes (Cui et al., 2008). The Arabidopsis reactome database was established describing the functions of 2,195 proteins with 8,269 reactions in 318 superpathways (Tsesmetzis et al., 2008). In addition, a general interaction database, IntAct (Kerrien et al., 2006), had allocated a special unit actively curating all plant protein interactions from literature and submitted data sets, which now contains 2,649 Arabidopsis interactions. However, in yeast, approximately 18,000 protein-protein interactions had been estimated for approximately 6,000 genes (Yu et al., 2008). Assuming the same rate of interaction, approximately 200,000 protein interactions would be expected for approximately 20,000 Arabidopsis genes. Therefore, the current collection of Arabidopsis interactions is still significantly limited.
Moreover, most previous prediction works did not provide rigorous confidence measurements for their predicted interactions, which further limited their scope of applications.

Recent advances in statistical learning presented a powerful algorithm, support vector machine (SVM), which may be used to predict interactions based on multiple indirect data. Although the basis of SVM had been laid in the 1960s, the idea of SVM was only officially proposed in the 1990s by Vapnik (1998, 2000). Then, research on its theoretical and application aspects thrived. It has been applied in a wide range of problems, including text categorization (de Vel et al., 2001; Kim et al., 2001), image classification and object detection (Ben-Yacoub et al., 1999; Karlsen et al., 2000), flood stage forecasting (Liong and Sivapragasam, 2002), microarray gene expression data analysis (Brown et al., 2000), drug design (Zhao et al., 2006a, 2006b), protein solvent accessibility prediction (Yuan et al., 2002), and protein fold prediction (Ding and Dubchak, 2001; Hua and Sun, 2001). Many studies have demonstrated that SVM was consistently superior to other supervised learning methods (Brown et al., 2000; Burbidge et al., 2001; Cai et al., 2003).

In this work, with careful preparation of example data and selection of indirect evidence, we constructed an SVM model to predict potential Arabidopsis interactions. False positives were tightly controlled. With the high-confidence model, we identified altogether 224,206 potential interactions, which were expected to be 48.67% accurate and to cover 29.02% of the entire Arabidopsis interactome. More specific confidence measurements were also assigned on a per interaction basis. To facilitate the use of our results, we present the Predicted Arabidopsis Interactome Resource (PAIR; http://www.cls.zju.edu.cn/pair/), featuring detailed annotations and a friendly user interface.
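
The yeast-to-Arabidopsis extrapolation in the entry above can be checked with a short calculation. The text does not say what "the same rate of interaction" means. The sketch assumes a constant interaction probability per protein pair, which is one reading that lands near the quoted figure of approximately 200,000, and the function name is hypothetical. It also shows how many of the 224,206 predictions would be expected to be correct at the reported 48.67% confidence.

```python
from math import comb


def extrapolate_interactions(known_interactions, known_genes, target_genes):
    """Scale an interaction count by the number of possible gene pairs,
    assuming a constant per-pair interaction probability (an assumption)."""
    per_pair_rate = known_interactions / comb(known_genes, 2)
    return per_pair_rate * comb(target_genes, 2)


# Figures quoted in the text: ~18,000 interactions for ~6,000 yeast genes,
# extrapolated to ~20,000 Arabidopsis genes.
expected_arabidopsis = extrapolate_interactions(18_000, 6_000, 20_000)

# Expected number of correct predictions at the reported confidence.
predicted = 224_206
confidence = 0.4867
expected_true_positives = predicted * confidence
```

Under the per-pair assumption the estimate grows roughly with the square of the gene count, which is why about 3.3 times as many genes gives about 11 times as many interactions.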

14.

Background

Infections caused by Salmonella enterica, a Gram-negative, facultatively anaerobic bacterium belonging to the family Enterobacteriaceae, are major threats to the health of humans and animals. The recent availability of complete genome data for pathogenic strains of S. enterica opens new avenues for the identification of drug targets and drug candidates. We have used the genomic and metabolic pathway data to identify pathways and proteins essential to the pathogen and absent from the host.

Methods

We took the whole-proteome sequence data of 42 strains of S. enterica and of Homo sapiens, along with KEGG-annotated metabolic pathway data. We clustered protein sequences using CD-HIT, identified essential genes using the DEG database, discarded S. enterica homologs of human proteins in unique metabolic pathways (UMPs), and characterized hypothetical proteins with SVM-prot and InterProScan. Through this core proteomic analysis, we identified enzymes essential to the pathogen.

Results

The main strength of the current study is the identification of 73 enzymes common to all 42 strains of S. enterica. We propose all 73 unexplored enzymes as potential drug targets against infections caused by S. enterica. The study is comprehensive with respect to S. enterica and simultaneously considers every possible pathogenic strain of S. enterica. This comprehensiveness makes the study significant because, to the best of our knowledge, it is the first subtractive core proteomic analysis of unique metabolic pathways applied to any pathogen for the identification of drug targets. We applied extensive computational methods to shortlist a few potential drug targets according to druggability criteria, e.g., being non-homologous to the human host, essential to the pathogen, and playing a significant role in essential metabolic pathways of the pathogen (i.e., S. enterica). In the current study, subtractive proteomics was applied through a novel approach, i.e., by considering only proteins of the unique metabolic pathways of the pathogen and mining the proteomic data of all completely sequenced strains of the pathogen, thus improving the quality and applicability of the results. We believe that sharing the knowledge from this study will eventually lead to novel and unique therapeutic regimens against infections caused by S. enterica.
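
The filtering logic described in the Methods and Results above reduces to a handful of set operations. The sketch below is a simplified illustration, not the authors' workflow. It assumes that sequence clustering, essentiality lookup, and homology search have already produced sets of protein-cluster identifiers, and every identifier and function name in it is hypothetical.

```python
def core_drug_targets(strain_proteomes, essential, human_homologs, unique_pathway):
    """Proteins present in every strain, essential, in a pathogen-unique
    metabolic pathway, and without a human homolog."""
    # Core proteome: protein clusters shared by all strains.
    core = set.intersection(*(set(p) for p in strain_proteomes.values()))
    candidates = core & set(essential) & set(unique_pathway)
    # Subtractive step: drop anything homologous to a host protein.
    return sorted(candidates - set(human_homologs))


# Hypothetical toy identifiers standing in for protein clusters.
strains = {
    "strain_1": {"P1", "P2", "P3", "P4"},
    "strain_2": {"P1", "P2", "P3", "P5"},
    "strain_3": {"P1", "P2", "P3"},
}
targets = core_drug_targets(
    strains,
    essential={"P1", "P2", "P5"},
    human_homologs={"P2"},
    unique_pathway={"P1", "P2", "P3"},
)
```

Intersecting across strains first restricts the candidate list to proteins conserved in every genome considered, in line with the abstract's emphasis on enzymes common to all 42 strains.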

15.
16.
Recent advances in reconstruction and analytical methods for signaling networks have spurred the development of large-scale models that incorporate fully functional and biologically relevant features. An extended reconstruction of the human Toll-like receptor (TLR) signaling network is presented herein. This reconstruction contains an extensive complement of kinases, phosphatases, and other associated proteins that mediate the signaling cascade along with a delineation of their associated chemical reactions. A computational framework based on the methods of large-scale convex analysis was developed and applied to this network to characterize input–output relationships. The input–output relationships enabled significant modularization of the network into ten pathways. The analysis identified potential candidates for inhibitory mediation of TLR signaling with respect to their specificity and potency. Subsequently, we were able to identify eight novel inhibition targets through constraint-based modeling methods. The results of this study are expected to yield meaningful avenues for further research into mediating the Toll-like receptor signaling network and its effects.

17.

18.
Data from the electronic medical record comprise numerous structured but uncoded elements, which are not linked to standard terminologies. Reuse of such data for secondary research purposes has gained in importance recently. However, the identification of relevant data elements and the creation of database jobs for extraction, transformation and loading (ETL) are challenging: With current methods such as data warehousing, it is not feasible to efficiently maintain and reuse semantically complex data extraction and transformation routines. We present an ontology-supported approach to overcome this challenge by making use of abstraction: Instead of defining ETL procedures at the database level, we use ontologies to organize and describe the medical concepts of both the source system and the target system. Instead of using unique, specifically developed SQL statements or ETL jobs, we define declarative transformation rules within ontologies and illustrate how these constructs can then be used to automatically generate SQL code to perform the desired ETL procedures. This demonstrates how a suitable level of abstraction may not only aid the interpretation of clinical data, but can also foster the reutilization of methods for unlocking it.
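
To make the idea of generating SQL from declarative rules concrete, here is a minimal sketch. Plain Python dataclasses stand in for the ontology-encoded transformation rules described in the abstract above; the rule format, table names, and column names are all hypothetical, and a real system would need to validate identifiers before emitting SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRule:
    """One declarative rule: how a source column maps to a target column."""
    source_table: str
    source_column: str
    target_column: str
    expression: str = "{col}"  # optional transformation template


def generate_etl_sql(target_table, rules):
    """Render an INSERT ... SELECT statement from rules that share a source table."""
    sources = {r.source_table for r in rules}
    if len(sources) != 1:
        raise ValueError("this sketch supports exactly one source table")
    select = ",\n    ".join(
        f"{r.expression.format(col=r.source_column)} AS {r.target_column}"
        for r in rules
    )
    columns = ", ".join(r.target_column for r in rules)
    return (
        f"INSERT INTO {target_table} ({columns})\n"
        f"SELECT\n    {select}\n"
        f"FROM {sources.pop()};"
    )


# Hypothetical rules: copy a patient identifier and convert grams to kilograms.
rules = [
    MappingRule("emr_vitals", "pat_id", "patient_id"),
    MappingRule("emr_vitals", "weight_g", "weight_kg", "{col} / 1000.0"),
]
sql = generate_etl_sql("research_vitals", rules)
```

Keeping the rules as data means the same rule set can be reused to regenerate the ETL job when the target schema changes, without hand-editing SQL.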

19.
20.
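The development of high-throughput experimental methods has produced large volumes of omics data, including genomic, transcriptomic, and metabolomic data, and integrating these data makes a comprehensive understanding of biological systems possible. However, owing to the limitations of current experimental techniques, most high-throughput omics data carry systematic bias, and data types and reliability vary, which makes integration difficult. Focusing on the transcriptome, proteome, and metabolome, this article reviews recent progress in omics data integration, including new integration methods and analysis platforms. Although existing statistical and network analysis methods help reveal associations between different omics data sets, deeper, biologically meaningful data integration still awaits comprehensive advances across biology, mathematics, computer science, and other fields.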
