Similar literature
 Found 20 similar articles (search time: 46 ms)
1.
Identifying latent structure in high-dimensional genomic data is essential for exploring biological processes. Here, we consider recovering gene co-expression networks from gene expression data, where each network encodes relationships between genes that are co-regulated by shared biological mechanisms. To do this, we develop a Bayesian statistical model for biclustering to infer subsets of co-regulated genes that covary in all of the samples or in only a subset of the samples. Our biclustering method, BicMix, allows overcomplete representations of the data, computational tractability, and joint modeling of unknown confounders and biological signals. Compared with state-of-the-art biclustering methods, BicMix recovers latent structure with higher precision across diverse simulation scenarios. Further, we develop a principled method to recover context-specific gene co-expression networks from the estimated sparse biclustering matrices. We apply BicMix to breast cancer gene expression data and to gene expression data from a cardiovascular study cohort, and we recover gene co-expression networks that are differential across ER+ and ER- samples and across male and female samples. We apply BicMix to the Genotype-Tissue Expression (GTEx) pilot data, and we find tissue-specific gene networks. We validate these findings by using our tissue-specific networks to identify trans-eQTLs specific to one of four primary tissues.

2.
3.
4.
5.
Systemic analysis of available large-scale biological/biomedical data is critical for studying biological mechanisms and for developing novel and effective treatment approaches against diseases. However, different layers of the available data are produced using different technologies and scattered across individual computational resources without any explicit connections to each other, which hinders extensive and integrative multi-omics-based analysis. We aimed to address this issue by developing a new data integration/representation methodology and applying it to construct a biological data resource. CROssBAR is a comprehensive system that integrates large-scale biological/biomedical data from various resources and stores them in a NoSQL database. CROssBAR is enriched with the deep-learning-based prediction of relationships between numerous data entries, which is followed by the rigorous analysis of the enriched data to obtain biologically meaningful modules. These complex sets of entities and relationships are displayed to users via easy-to-interpret, interactive knowledge graphs within an open-access service. CROssBAR knowledge graphs incorporate relevant genes-proteins, molecular interactions, pathways, phenotypes, diseases, as well as known/predicted drugs and bioactive compounds, and they are constructed on-the-fly based on simple non-programmatic user queries. These intensely processed heterogeneous networks are expected to aid systems-level research, especially to infer biological mechanisms in relation to genes, proteins, their ligands, and diseases.

6.
A key challenge in genetics is identifying the functional roles of genes in pathways. Numerous functional genomics techniques (e.g. machine learning) that predict protein function have been developed to address this question. These methods generally build from existing annotations of genes to pathways and thus are often unable to identify additional genes participating in processes that are not already well studied. Many of these processes are well studied in some organism, but not necessarily in an investigator's organism of interest. Sequence-based search methods (e.g. BLAST) have been used to transfer such annotation information between organisms. We demonstrate that functional genomics can complement traditional sequence similarity to improve the transfer of gene annotations between organisms. Our method transfers annotations only when functionally appropriate as determined by genomic data and can be used with any prediction algorithm to combine transferred gene function knowledge with organism-specific high-throughput data to enable accurate function prediction. We show that diverse state-of-the-art machine learning algorithms leveraging functional knowledge transfer (FKT) dramatically improve their accuracy in predicting gene-pathway membership, particularly for processes with little experimental knowledge in an organism. We also show that our method compares favorably to annotation transfer by sequence similarity. Next, we deploy FKT with a state-of-the-art SVM classifier to predict novel genes for 11,000 biological processes across six diverse organisms and expand the coverage of accurate function predictions to processes that are often ignored because of a dearth of annotated genes in an organism. Finally, we perform in vivo experimental investigation in Danio rerio and confirm the regulatory role of our top predicted novel gene, wnt5b, in leftward cell migration during heart development.
FKT is immediately applicable to many bioinformatics techniques and will help biologists systematically integrate prior knowledge from diverse systems to direct targeted experiments in their organism of study.

7.
Context-sensitive data integration and prediction of biological networks   Cited by: 4 (self-citations: 0; cited by others: 4)
MOTIVATION: Several recent methods have addressed the problem of heterogeneous data integration and network prediction by modeling the noise inherent in high-throughput genomic datasets, which can dramatically improve specificity and sensitivity and allow the robust integration of datasets with heterogeneous properties. However, experimental technologies capture different biological processes with varying degrees of success, and thus, each source of genomic data can vary in relevance depending on the biological process one is interested in predicting. Accounting for this variation can significantly improve network prediction, but to our knowledge, no previous approaches have explicitly leveraged this critical information about biological context. RESULTS: We confirm the presence of context-dependent variation in functional genomic data and propose a Bayesian approach for context-sensitive integration and query-based recovery of biological process-specific networks. By applying this method to Saccharomyces cerevisiae, we demonstrate that leveraging contextual information can significantly improve the precision of network predictions, including assignment for uncharacterized genes. We expect that this general context-sensitive approach can be applied to other organisms and prediction scenarios. AVAILABILITY: A software implementation of our approach is available on request from the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at http://avis.princeton.edu/contextPIXIE/

8.
The majority of the heritability of coronary artery disease (CAD) remains unexplained, despite recent successes of genome-wide association studies (GWAS) in identifying novel susceptibility loci. Integrating functional genomic data from a variety of sources with a large-scale meta-analysis of CAD GWAS may facilitate the identification of novel biological processes and genes involved in CAD, as well as clarify the causal relationships of established processes. Towards this end, we integrated 14 GWAS from the CARDIoGRAM Consortium and two additional GWAS from the Ottawa Heart Institute (25,491 cases and 66,819 controls) with 1) genetics of gene expression studies of CAD-relevant tissues in humans, 2) metabolic and signaling pathways from public databases, and 3) data-driven, tissue-specific gene networks from a multitude of human and mouse experiments. We not only detected CAD-associated gene networks of lipid metabolism, coagulation, immunity, and additional networks with no clear functional annotation, but also revealed key driver genes for each CAD network based on the topology of the gene regulatory networks. In particular, we found a gene network involved in antigen processing to be strongly associated with CAD. The key driver genes of this network included glyoxalase I (GLO1) and peptidylprolyl isomerase I (PPIL1), which we verified as regulatory by siRNA experiments in human aortic endothelial cells. Our results suggest genetic influences on a diverse set of both known and novel biological processes that contribute to CAD risk. The key driver genes for these networks highlight potential novel targets for further mechanistic studies and therapeutic interventions.

9.
Immunogenic cell death (ICD) is one of the mechanisms regulating cell death, which activates adaptive immunity in immunocompetent hosts and is associated with tumor progression, prognosis and therapeutic response. Endometrial cancer (EC) is one of the most common malignancies of the female genital tract, and the potential role of immunogenic cell death-related genes (IRGs) in the tumor microenvironment (TME) remains unclear. We describe the variation of IRGs and assess the expression patterns in EC samples from The Cancer Genome Atlas and Gene Expression Omnibus cohorts. Based on the expression of 34 IRGs, we identified two different ICD-related clusters; the genes differentially expressed between these two clusters were then used to identify two ICD gene clusters. We found that alterations in the multilayer IRG were associated with patient prognosis and TME cell infiltration characteristics. On this basis, ICD risk scores were calculated, and ICD signatures were constructed and validated for their predictive power in EC patients. To help clinicians better apply the ICD signature, an accurate nomogram was constructed. The low ICD risk group was characterized by high microsatellite instability, high tumor mutational load, high IPS score and stronger immune activation. Our comprehensive analysis of IRGs in EC patients suggested a potential role in the tumor immune interstitial microenvironment, clinicopathological features and prognosis. These findings may improve our understanding of the role of ICD, and provide a new basis for assessing prognosis and developing more effective immunotherapeutic strategies in EC.

10.
DNA microarray gene expression and microarray-based comparative genomic hybridization (aCGH) have been widely used for biomedical discovery. Because of the large number of genes and the complex nature of biological networks, various analysis methods have been proposed. One such method is "gene shaving," a procedure which identifies subsets of the genes with coherent expression patterns and large variation across samples. Since combining genomic information from multiple sources can improve classification and prediction of diseases, in this paper we propose a new method, "ICA gene shaving" (ICA, independent component analysis), for jointly analyzing gene expression and copy number data. First, we used ICA to analyze joint measurements, gene expression and copy number, of a biological system and project the data onto statistically independent biological processes. Next, we used these results to identify patterns of variation in the data and then applied an iterative shaving method. We investigated the properties of our proposed method by analyzing both simulated and real data. We demonstrated the robustness of our method to noise using simulated data. Using breast cancer data, we showed that our method is superior to the Generalized Singular Value Decomposition (GSVD) gene shaving method for identifying genes associated with breast cancer.

11.
New genes originate frequently across diverse taxa. Given that genetic networks are typically comprised of robust, co-evolved interactions, the emergence of new genes raises an intriguing question: how do new genes interact with pre-existing genes? Here, we show that a recently originated gene rapidly evolved new gene networks and impacted sex-biased gene expression in Drosophila. This 4–6 million-year-old factor, named Zeus for its role in male fecundity, originated through retroposition of a highly conserved housekeeping gene, Caf40. Zeus acquired male reproductive organ expression patterns and phenotypes. Comparative expression profiling of mutants and closely related species revealed that Zeus has recruited a new set of downstream genes and shaped the evolution of gene expression in the germline. Comparative ChIP-chip revealed that the genomic binding profile of Zeus diverged rapidly from that of Caf40. These data demonstrate, for the first time, how a new gene quickly evolved novel networks governing essential biological processes at the genomic level.

12.
Renal clear cell carcinoma (ccRCC) is the most common type of renal cell carcinoma and is strongly immunogenic. A comprehensive study of the role of immune-related genes (IRGs) in ccRCC is of great significance in finding ccRCC treatment targets and improving patient prognosis. In this study, we comprehensively analyzed the expression of IRGs in ccRCC based on The Cancer Genome Atlas datasets. The mechanism of differentially expressed IRGs in ccRCC was analyzed by bioinformatics. In addition, Cox regression analysis was used to screen prognosis-related IRGs from the differentially expressed IRGs. We also identified a four-IRG signature (CXCL2, SEMA3G, PDGFD, and UCN) through lasso regression and multivariate Cox regression analysis. Further analysis showed that the four-IRG signature could effectively predict the prognosis of patients with ccRCC, and that its predictive power is independent of other clinical factors. In addition, the correlation analysis of immune cell infiltration showed that this four-IRG signature could effectively reflect the level of immune cell infiltration in ccRCC. We also found that the expression of the immune checkpoint genes CTLA-4, LAG3, and PD-1 was higher in the high-risk group than in the low-risk group. Our research revealed the role of IRGs in ccRCC and developed a four-IRG signature that could be used to evaluate the prognosis of patients with ccRCC, which will help to develop personalized treatment strategies for these patients and improve their prognosis. In addition, these four IRGs may be effective therapeutic targets for ccRCC.

13.
Unlike other important Solanaceae crops such as tomato, potato, chili pepper, and tobacco, all of which originated in South America and are cultivated worldwide, eggplant (Solanum melongena L.) is indigenous to the Old World and in this respect it is phylogenetically unique. To broaden our knowledge of the genomic nature of solanaceous plants further, we dissected the eggplant genome and built a draft genome dataset with 33,873 scaffolds termed SME_r2.5.1 that covers 833.1 Mb, ca. 74% of the eggplant genome. Approximately 90% of the gene space was estimated to be covered by SME_r2.5.1 and 85,446 genes were predicted in the genome. Clustering analysis of the predicted genes of eggplant along with the genes of three other solanaceous plants as well as Arabidopsis thaliana revealed that, of the 35,000 clusters generated, 4,018 were exclusively composed of eggplant genes that would perhaps confer eggplant-specific traits. Between eggplant and tomato, 16,573 pairs of genes were deduced to be orthologous, and 9,489 eggplant scaffolds could be mapped onto the tomato genome. Furthermore, 56 conserved synteny blocks were identified between the two species. The detailed comparative analysis of the eggplant and tomato genomes will facilitate our understanding of the genomic architecture of solanaceous plants, which will contribute to cultivation and further utilization of these crops.

14.
Information Flow Analysis of Interactome Networks   Cited by: 1 (self-citations: 0; cited by others: 1)
Recent studies of cellular networks have revealed modular organizations of genes and proteins. For example, in interactome networks, a module refers to a group of interacting proteins that form molecular complexes and/or biochemical pathways and together mediate a biological process. However, it is still poorly understood how biological information is transmitted between different modules. We have developed information flow analysis, a new computational approach that identifies proteins central to the transmission of biological information throughout the network. In the information flow analysis, we represent an interactome network as an electrical circuit, where interactions are modeled as resistors and proteins as interconnecting junctions. Construing the propagation of biological signals as flow of electrical current, our method calculates an information flow score for every protein. Unlike previous metrics of network centrality such as degree or betweenness that only consider topological features, our approach incorporates confidence scores of protein–protein interactions and automatically considers all possible paths in a network when evaluating the importance of each protein. We apply our method to the interactome networks of Saccharomyces cerevisiae and Caenorhabditis elegans. We find that the likelihood of observing lethality and pleiotropy when a protein is eliminated is positively correlated with the protein's information flow score. Even among proteins of low degree or low betweenness, high information flow scores serve as a strong predictor of loss-of-function lethality or pleiotropy. The correlation between information flow scores and phenotypes supports our hypothesis that the proteins of high information flow reside in central positions in interactome networks. We also show that the ranks of information flow scores are more consistent than those of betweenness when a large amount of noisy data is added to an interactome.
Finally, we combine gene expression data with interaction data in C. elegans and construct an interactome network for muscle-specific genes. We find that genes that rank high in terms of information flow in the muscle interactome network but not in the entire network tend to play important roles in muscle function. This framework for studying tissue-specific networks by the information flow model can be applied to other tissues and other organisms as well.
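The circuit picture in the abstract above can be made concrete with a small sketch. It is not the authors' implementation: it assumes that each interaction's confidence score is used directly as a conductance, that a unit current is injected for every source/sink pair of proteins, and that a protein's score is the total current passing through it summed over all pairs. The function names `solve` and `information_flow` are hypothetical, and the network is assumed to be connected.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def information_flow(nodes, edges):
    """Sum, over all source/sink pairs, the current passing through each node.

    nodes: list of protein names.
    edges: {(u, v): conductance}; assumption: conductance = interaction confidence.
    The graph must be connected, otherwise the linear system is singular.
    """
    idx = {p: i for i, p in enumerate(nodes)}
    n = len(nodes)
    cond = [[0.0] * n for _ in range(n)]
    for (u, v), w in edges.items():
        cond[idx[u]][idx[v]] += w
        cond[idx[v]][idx[u]] += w
    score = {p: 0.0 for p in nodes}
    for s, t in combinations(range(n), 2):
        keep = [i for i in range(n) if i != t]  # ground the sink at 0 volts
        lap = [[sum(cond[i]) if i == j else -cond[i][j] for j in keep] for i in keep]
        rhs = [1.0 if i == s else 0.0 for i in keep]  # unit current enters at s
        volt = [0.0] * n
        for i, val in zip(keep, solve(lap, rhs)):
            volt[i] = val
        for i in range(n):
            # Half the sum of absolute edge currents equals the node's throughput.
            through = 0.5 * sum(cond[i][j] * abs(volt[i] - volt[j]) for j in range(n))
            if i in (s, t):
                through += 0.5  # account for the external unit current
            score[nodes[i]] += through
    return score


scores = information_flow(["A", "B", "C"], {("A", "B"): 1.0, ("B", "C"): 1.0})
```

On this three-protein chain with equal confidences, the middle protein B receives the highest score (3, versus 2 for A and C), because every pair's current has to pass through it. Raising one edge's conductance shifts current toward that edge wherever alternative paths exist, which is how confidence scores enter the ranking under these assumptions.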

15.
This study aimed to identify key microRNAs (miRNAs) associated with hepatocellular carcinoma (HCC) using small RNA-seq data. Small RNA-seq data for two HCC samples and two normal samples were downloaded from NCBI Gene Expression Omnibus. MiRNAs were identified through database search. Differentially expressed miRNAs were screened out with a t-test and their target genes were retrieved. Functional enrichment analysis was performed to uncover their biological functions. Regulatory networks and core metabolic networks were also constructed to present the global patterns. In addition, new miRNAs and their target genes were predicted. A total of 59 differentially expressed miRNAs were obtained, 12 up-regulated and 47 down-regulated. A total of 3,306 target genes were retrieved for eight miRNAs. Pathway enrichment analysis for the target genes showed that “pathways in cancer” and “MAPK signaling pathway” were significantly over-represented. Functional enrichment analysis found that “biological regulation” and “macromolecule modification” were significantly related to the target genes. Two regulatory networks were constructed for up- and down-regulated differentially expressed miRNAs with information from the Ingenuity Pathway Analysis database. Two metabolic networks were also established based upon “pathways in cancer” and “MAPK signaling pathway”, consisting of miRNAs, target genes, compounds and other genes. Moreover, a number of new miRNAs and relevant target genes were predicted. Our study discloses a number of miRNAs as well as genes which may be involved in the development of HCC, and these findings are beneficial in guiding future research.

16.
We have generated and made publicly available two very large networks of molecular interactions: 49,493 mouse-specific and 52,518 human-specific interactions. These networks were generated through automated analysis of 368,331 full-text research articles and 8,039,972 article abstracts from the PubMed database, using the GeneWays system. Our networks cover a wide spectrum of molecular interactions, such as bind, phosphorylate, glycosylate, and activate; 207 of these interaction types occur more than 1,000 times in our unfiltered, multi-species data set. Because mouse and human genes are linked through an orthological relationship, human and mouse networks are amenable to straightforward, joint computational analysis. Using our newly generated networks and known associations between mouse genes and cerebellar malformation phenotypes, we predicted a number of new associations between genes and five cerebellar phenotypes (small cerebellum, absent cerebellum, cerebellar degeneration, abnormal foliation, and abnormal vermis). Using a battery of statistical tests, we showed that genes that are associated with cerebellar phenotypes tend to form compact network clusters. Further, we observed that cerebellar malformation phenotypes tend to be associated with highly connected genes. This tendency was stronger for developmental phenotypes and weaker for cerebellar degeneration.

17.
18.
19.
The Red Queen hypothesis proposes that there is an evolutionary arms race between host and pathogen. One possible example of such a phenomenon could be the recently discovered interaction between host defense proteins known as immunity-related GTPases (IRGs) and a family of rhoptry pseudokinases (ROP5) expressed by the protozoan parasite, Toxoplasma gondii. Mouse IRGs are encoded by an extensive and rapidly evolving family of over 20 genes. Similarly, the ROP5 family is highly polymorphic and consists of 4–10 genes, depending on the strain of Toxoplasma. IRGs are known to be avidly bound and functionally inactivated by ROP5 proteins, but the molecular basis of this interaction/inactivation has not previously been known. Here we show that ROP5 uses a highly polymorphic surface to bind adjacent to the nucleotide-binding domain of an IRG and that this produces a profound allosteric change in the IRG structure. This has two dramatic effects: 1) it prevents oligomerization of the IRG, and 2) it alters the orientation of two threonine residues that are targeted by the Toxoplasma Ser/Thr kinases, ROP17 and ROP18. ROP5s are highly specific in the IRGs that they will bind, and the fact that it is the most highly polymorphic surface of ROP5 that binds the IRG strongly supports the notion that these two protein families are co-evolving in a way predicted by the Red Queen hypothesis.

20.
Closing gaps in our current knowledge about biological pathways is a fundamental challenge. The development of novel computational methods, together with high-throughput experimental data, promises to help meet this challenge. We present an algorithm called MORPH (for module-guided ranking of candidate pathway genes) for revealing unknown genes in biological pathways. The method receives as input a set of known genes from the target pathway, a collection of expression profiles, and interaction and metabolic networks. Using machine learning techniques, MORPH selects the best combination of data and analysis method and outputs a ranking of candidate genes predicted to belong to the target pathway. We tested MORPH on 230 known pathways in Arabidopsis thaliana and 93 known pathways in tomato (Solanum lycopersicum) and obtained high-quality cross-validation results. In the photosynthesis light reactions, homogalacturonan biosynthesis, and chlorophyll biosynthetic pathways of Arabidopsis, genes ranked highly by MORPH were recently verified to be associated with these pathways. MORPH candidates ranked for the carotenoid pathway from Arabidopsis and tomato are derived from pathways that compete for common precursors or from pathways that are coregulated with or regulate the carotenoid biosynthetic pathway.
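The general idea of ranking candidate genes against a set of known pathway genes can be illustrated with a much simpler stand-in. The sketch below is not MORPH: it skips the module partitioning and the learned selection of data set and analysis method, and simply scores each candidate by its mean Pearson correlation with the known genes across one expression data set. The function names and the toy gene identifiers are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_candidates(expression, known_genes):
    """Rank genes outside the known set by mean correlation with the known genes.

    expression: {gene: [expression values across samples]}.
    known_genes: genes already assigned to the target pathway.
    """
    known = [g for g in known_genes if g in expression]
    if not known:
        raise ValueError("none of the known genes have expression profiles")
    scores = {}
    for gene, profile in expression.items():
        if gene in known:
            continue
        total = sum(pearson(profile, expression[k]) for k in known)
        scores[gene] = total / len(known)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: two known pathway genes and two candidates (made-up values).
toy = {
    "known_1": [1.0, 2.0, 3.0, 4.0],
    "known_2": [2.0, 4.0, 6.0, 9.0],
    "cand_up": [1.0, 2.0, 3.0, 5.0],
    "cand_down": [4.0, 3.0, 2.0, 1.0],
}
ranking = rank_candidates(toy, ["known_1", "known_2"])
```

Here `cand_up`, which rises with both known genes, is ranked ahead of `cand_down`. A cross-validation step that hides some known genes and checks how highly they are recovered would be the natural way to compare data sets or scoring methods, which is the role the abstract assigns to the learning component.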


Copyright©北京勤云科技发展有限公司  京ICP备09084417号