1.

Pathway networks generated from human disease phenome

Ann G. Cirincione Kaylyn L. Clark Maricel G. Kann 《BMC medical genomics》2018,11(3):75

Background

Understanding the effect of human genetic variations on disease can provide insight into phenotype-genotype relationships, and has great potential for improving the effectiveness of personalized medicine. While some genetic markers linked to disease susceptibility have been identified, a large number are still unknown. In this paper, we propose a pathway-based approach to extend disease-variant associations and find new molecular connections between genetic mutations and diseases.

Methods

We used a compilation of over 80,000 human genetic variants with known disease associations from databases including the Online Mendelian Inheritance in Man (OMIM), Clinical Variance database (ClinVar), Universal Protein Resource (UniProt), and Human Gene Mutation Database (HGMD). Furthermore, we used the Unified Medical Language System (UMLS) to normalize variant phenotype terminologies, mapping 87% of unique genetic variants to phenotypic disorder concepts. Lastly, variants were grouped by UMLS Medical Subject Heading (MeSH) identifiers to determine pathway enrichment in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.
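
The pathway-enrichment step can be illustrated with a one-sided hypergeometric test that scores one disease group against one pathway. The abstract does not say which enrichment statistic the authors used, so the choice of test, the function name `enrichment_p_value`, and the toy counts below are all assumptions made for illustration only.

```python
from math import comb


def enrichment_p_value(overlap, pathway_size, group_size, background_size):
    """Return P(X >= overlap) under a hypergeometric null.

    background_size: number of genes (or variants) considered overall
    pathway_size:    how many of those belong to the pathway
    group_size:      how many belong to the disease group being tested
    overlap:         how many belong to both the pathway and the group
    """
    total = comb(background_size, group_size)
    upper = min(pathway_size, group_size)
    # Sum the probability of every overlap at least as large as the observed one.
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, group_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy counts: 10 background genes, 5 in the pathway,
# 3 in the disease group, all 3 of which fall in the pathway.
p_value = enrichment_p_value(overlap=3, pathway_size=5, group_size=3, background_size=10)
```

In practice one such p-value would be computed per disease-pathway pair and then corrected for multiple testing before any pathway is called enriched.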

Results

By linking KEGG pathways through underlying variant associations, we elucidated connections between the human genetic variant-based disease phenome and metabolic pathways, finding novel disease connections not otherwise detected through gene-level analysis. When looking at broader disease categories, our network analysis showed that large complex diseases, such as cancers, are highly linked by their common pathways. In addition, we found Cardiovascular Diseases and Skin and Connective Tissue Diseases to have the highest number of common pathways, among 35 significant main disease category (MeSH) pairings.

Conclusions

This study constitutes an important contribution to extending disease-variant connections and new molecular links between diseases. Novel disease connections were made by disease-pathway associations not otherwise detected through single-gene analysis. For instance, we found that mutations in different genes associated to Noonan Syndrome and Essential Hypertension share a common pathway. This analysis also provides the foundation to build novel disease-drug networks through their underlying common metabolic pathways, thus enabling new diagnostic and therapeutic interventions.

2.

Nanodiscs and SILAC-based mass spectrometry to identify a membrane protein interactome

Zhang XX Chan CS Bao H Fang Y Foster LJ Duong F 《Journal of proteome research》2012,11(2):1454-1459

Integral membrane proteins are challenging to work with biochemically given their insoluble nature; the nanodisc circumvents the difficulty by stabilizing them in small patches of lipid bilayer. Here, we show that nanodiscs combined with SILAC-based quantitative proteomics can be used to identify the soluble interacting partners of virtually any membrane protein. As a proof of principle, we applied the method to the bacterial SecYEG protein-conducting channel, the maltose transporter MalFGK(2) and the membrane integrase YidC. In contrast to detergent micelles, which tend to destabilize interactions, the nanodisc was able to capture the proteins SecA, Syd, and MalE from a complex whole-cell extract with a high degree of confidence and specificity. The method was sensitive enough to isolate these interactors as a function of the lipid composition in the disc and the culture conditions. In agreement with a previous photo-cross-linking analysis, YidC did not show any high-affinity interactions with cytosolic or periplasmic proteins. These three examples illustrate the utility of nanoscale lipid bilayers to identify the soluble peripheral partners of proteins integrated in the lipid bilayer.

3.

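Identifying non-coding genes associated with gastric cancer progression and prognosis by constructing co-expression networks based on known disease genes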

Ma Qingzhu Ji Kun Wang Yan 《生物信息学 (Chinese Journal of Bioinformatics)》2023,21(3):226-232

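Building on a summary of existing gastric cancer research, this study aims to further uncover the key roles that non-coding genes play in the progression and prognosis of gastric cancer. Using expression data for coding and non-coding genes from gastric cancer patients, together with known gastric cancer causative genes, we computed the co-expression between coding and non-coding genes, identified miRNA-mediated lncRNAs that interact with known gastric cancer genes, and extracted modules of three-way (mRNA-miRNA-lncRNA) interaction. The modules were then screened, and survival analysis was performed on the genes in the significant disease-related modules. In addition to known gastric cancer-related genes, the study also used differentially expressed gastric cancer genes. These genes were significantly enriched in biological processes such as cell proliferation, cell adhesion, muscle contraction, vascular remodeling, cell division, and chromosome segregation, all of which are closely related to the basic functions of the stomach and to the onset and development of gastric cancer. BGN, the core gene of the highest-scoring triplet module, was significantly overexpressed in gastric cancer patients and was significantly associated with their prognosis. The miRNAs in this module, has-miRNA-153-5p and has-miRNA-5001-5p, were both found to be confirmed gastric cancer-related genes. The aberrant expression of the mRNAs and miRNAs in the module may be caused by aberrant expression of the lncRNAs significantly correlated with them. This study opens a new avenue for research on the aberrant expression of known gastric cancer causative genes; the discovery of potential gastric cancer-related non-coding genes provides new targets for the prevention and treatment of gastric cancer and a basis for future clinical application.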

4.

Algorithm to find gene expression profiles of deregulation and identify families of disease-altered genes

Prieto C Rivas MJ Sánchez JM López-Fidalgo J De Las Rivas J 《Bioinformatics (Oxford, England)》2006,22(9):1103-1110

MOTIVATION: Alteration of gene expression often results in up- or down-regulated genes and the most common analysis strategies look for such differentially expressed genes. However, molecular disease mechanisms typically constitute abnormalities in the regulation of genes producing strong alterations in the expression levels. The search for such deregulation states in the genomic expression profiles will help to identify disease-altered genes better. RESULTS: We have developed an algorithm that searches for the genes which present a significant alteration in the variability of their expression profiles, by comparing an altered state with a control state. The algorithm provides groups of genes and assigns a statistical measure of significance to each group of genes selected. The method also includes a prefilter tool to select genes with a threshold of differential expression that can be set by the user ad casum. The method is evaluated using an experimental set of microarrays of human control and cancer samples from patients with acute promyelocytic leukemia.

5.

Mutations of 60 known causative genes in 157 families with retinitis pigmentosa based on exome sequencing

Yan Xu Liping Guan Tao Shen Jianguo Zhang Xueshan Xiao Hui Jiang Shiqiang Li Jianhua Yang Xiaoyun Jia Ye Yin Xiangming Guo Jun Wang Qingjiong Zhang 《Human genetics》2014,133(10):1255-1271

Retinitis pigmentosa (RP) is the most common and highly heterogeneous form of hereditary retinal degeneration. This study aimed to identify mutations in the 60 genes known to be associated with RP in 157 unrelated Chinese families with RP. Genomic DNA from probands was initially analyzed by whole exome sequencing. Sanger sequencing was used to confirm potential candidate variants affecting the encoded residues in the 60 genes, including heterozygous variants from genes that are related to autosomal dominant RP, homozygous or compound heterozygous variants from genes that are related to autosomal recessive RP, and hemizygous variants from genes that are related to X-linked RP. Synonymous and intronic variants were also examined to confirm whether they could affect splicing. A total of 244 candidate variants were detected by exome sequencing. Sanger sequencing confirmed 240 variants out of the 244 candidates. Informatics and segregation analyses suggested 110 potential pathogenic mutations in 28 out of the 60 genes involving 79 of the 157 (50%) families, including 31 (39%, 31/79) families with heterozygous mutations in autosomal dominant genes, 37 (47%, 37/79) families with homozygous (9) or compound heterozygous (28) mutations in autosomal recessive genes, and 11 (14%, 11/79) families with hemizygous mutations in X-linked genes. Of the 110 identified variants, 74 (67%) were novel. The genetic defects in approximately half of the 157 studied families were detected by exome sequencing. A comprehensive analysis of the 60 known genes not only expanded the mutation spectrum and frequency of the 60 genes in Chinese patients with RP, but also provided an overview of the molecular etiology of RP in Chinese patients. The analysis of the known genes also supplied the foundation and clues for discovering novel causative RP genes.

6.

Prediction of missing common genes for disease pairs using network based module separation on incomplete human interactome

Pakeeza Akram Li Liao 《BMC genomics》2017,18(10):902

Background

Identification of common genes associated with comorbid diseases can be critical in understanding their pathobiological mechanism. This work presents a novel method to predict missing common genes associated with a disease pair. Searching for missing common genes is formulated as an optimization problem to minimize network based module separation from two subgraphs produced by mapping genes associated with disease onto the interactome.

Results

Using cross-validation on more than 600 disease pairs, our method achieves a significantly higher average receiver operating characteristic (ROC) score of 0.95, compared to a baseline ROC score of 0.60 using randomized data.
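
For readers unfamiliar with the ROC score reported here, the sketch below computes the area under the ROC curve as the fraction of (positive, negative) pairs that a scoring method orders correctly, counting ties as half. It is a generic illustration, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are assumptions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a true common gene (positive), 0 otherwise (negative)
    scores: predicted score for each candidate, higher means more likely positive
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    # A pair counts 1 if the positive outranks the negative, 0.5 on a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

A score of 0.5 corresponds to random ordering and 1.0 to perfect separation, which is the scale on which the 0.95 and 0.60 figures above should be read.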

Conclusion

Prediction of missing common genes aims to complete the gene set associated with comorbid diseases for a better understanding of biological intervention. It will also be useful for gene-targeted therapeutics related to comorbid diseases. This method can be further considered for predicting missing edges to complete the subgraph associated with a disease pair.

7.

Using human microarrays to identify differentially expressed genes associated with increased steroidogenesis in boars

Stewart JD Lou Y Squires EJ Coussens PM 《Animal biotechnology》2005,16(2):139-151

Human microarrays are readily available, and it would be advantageous if they could be used to study gene expression in other species, such as pigs. The objectives of this research were to validate the use of human microarrays in the analysis of porcine gene expression, to assess the variability of the data generated, and to compare gene expression in boars with different levels of steroidogenesis. Cytochrome b5 (CYB5) expression was used to assess array detection sensitivity. Samples having high or low CYB5 RNA levels were hybridized to microarrays to determine if the known expression difference could be detected. Six hybridizations were conducted using human microarrays containing 3840 total spots representing 1718 characterized human ESTs. To analyze gene expression in boars with different levels of steroidogenesis, testis RNA from four boars with high levels of plasma estrone sulphate was hybridized to testis RNA from four boars with lower levels. Eight microarray hybridizations were conducted including fluor-flips. Self-self hybridizations were also conducted to assess the variability of array experiments. The Cy5 and Cy3 intensity values for each array were normalized using a locally weighted linear regression (LOESS). Statistical significance was assessed using a Student's t-test followed by the Benjamini and Hochberg multiple testing correction procedure. Quantitative real-time PCR (Q-RT-PCR) was used to verify select gene expression differences. The results show that CYB5 was significantly overexpressed in the high CYB5 sample by 1.8 fold (P < 0.05), verifying the known expression difference. The average log2 ratio of the majority of genes (1643) falls within one standard deviation of the mean, indicating the data were reproducible. In the high versus low steroidogenesis experiment, seven genes were significantly overexpressed in the high group (P < 0.05). 
Quantitative real-time PCR was used to validate five genes with the highest fold change, and the results corroborated those found by the microarray experiments. The results of the self-self hybridizations showed that no genes were significantly differentially expressed following the application of the Benjamini and Hochberg multiple testing correction procedure. The results presented in this report show that human arrays can be used for gene expression analysis in pigs.
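
The Benjamini and Hochberg correction mentioned in this abstract can be sketched as follows. This is the standard step-up adjustment written in plain Python; the function name `benjamini_hochberg` and the example p-values are assumptions for illustration, not values from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical per-gene p-values from t-tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```

A gene is called significant when its adjusted value falls below the chosen false discovery rate, which matches the abstract's use of the procedure after per-gene t-tests.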

8.

Whole transcriptome analysis of human lung tissue to identify COPD-associated genes

《Genomics》2020,112(5):3135-3141

9.

Genomic convergence and network analysis approach to identify candidate genes in Alzheimer's disease

Puneet Talwar Yumnam Silla Sandeep Grover Meenal Gupta Rachna Agarwal Suman Kushwaha Ritushree Kukreti 《BMC genomics》2014,15(1)

Background

Alzheimer’s disease (AD) is one of the leading genetically complex and heterogeneous disorders and is influenced by both genetic and environmental factors. The underlying risk factors for this heterogeneous disorder remain largely unclear. In recent years, high-throughput methodologies, such as genome-wide linkage analysis (GWL), genome-wide association (GWA) studies, and genome-wide expression profiling (GWE), have led to the identification of several candidate genes associated with AD. However, due to a lack of consistency among their findings, an integrative approach is warranted. Here, we have designed a rank-based gene prioritization approach involving convergent analysis of multi-dimensional data and protein-protein interaction (PPI) network modelling.

Results

Our approach integrates three different AD datasets (GWL, GWA and GWE) to identify overlapping candidate genes, which are ranked using a novel cumulative rank score (S_R) based method and then prioritized using clusters derived from a PPI network. S_R for each gene is calculated by adding the ranks assigned to that gene, based on either p value or score, in the three datasets. This analysis yielded 108 plausible AD genes. Network modelling by creating a PPI network from the proteins encoded by these genes and their direct interactors resulted in a layered network of 640 proteins. Clustering of these proteins further helped us identify 6 significant clusters, with 7 proteins (EGFR, ACTB, CDC2, IRAK1, APOE, ABCA1 and AMPH) forming the central hub nodes. Functional annotation of the 108 genes revealed their role in several biological activities, such as neurogenesis, regulation of MAP kinase activity, response to calcium ion, and endocytosis, paralleling the AD-specific attributes. Finally, 3 potential biochemical biomarkers were found from the overlap of the 108 AD proteins with proteins from the CSF and plasma proteomes. EGFR and ACTB were found to be the two most significant AD risk genes.
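
The cumulative rank score described above can be sketched as a sum of per-dataset ranks over the genes present in all datasets. The abstract does not specify tie handling or rank direction, so this sketch assumes rank 1 is the strongest evidence, that p values rank ascending and scores descending, and that ties keep input order. The function names and the gene identifiers `GENE_A` to `GENE_D` are hypothetical.

```python
def rank_genes(values, ascending=True):
    """Map each gene to its 1-based rank within one dataset (1 = strongest)."""
    ordered = sorted(values, key=values.get, reverse=not ascending)
    return {gene: position for position, gene in enumerate(ordered, start=1)}


def cumulative_rank_score(datasets):
    """Sum per-dataset ranks for genes found in every dataset.

    datasets: list of (values, ascending) pairs, where values maps gene -> p value
              (ascending=True) or gene -> score (ascending=False).
    """
    rankings = [rank_genes(values, ascending) for values, ascending in datasets]
    shared = set(rankings[0]).intersection(*rankings[1:])
    return {gene: sum(ranking[gene] for ranking in rankings) for gene in shared}


# Hypothetical toy inputs standing in for the GWL, GWA and GWE datasets.
gwl = {"GENE_A": 0.01, "GENE_B": 0.20, "GENE_C": 0.05}
gwa = {"GENE_A": 0.03, "GENE_B": 0.001, "GENE_C": 0.50, "GENE_D": 0.20}
gwe = {"GENE_A": 5.0, "GENE_B": 1.0, "GENE_C": 3.0}  # expression scores, higher is stronger

s_r = cumulative_rank_score([(gwl, True), (gwa, True), (gwe, False)])
prioritized = sorted(s_r, key=s_r.get)  # lowest cumulative rank first
```

Under these assumptions a lower S_R means more consistent support across platforms; `GENE_D` is dropped because it appears in only one dataset.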

Conclusions

On the assumption that common genetic signals obtained from different methodological platforms might serve as more robust AD risk markers than candidates identified using a single-dimension approach, we demonstrated an integrated genomic convergence approach for prioritizing disease candidate genes from heterogeneous data sources linked to AD.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2164-15-199) contains supplementary material, which is available to authorized users.

10.

Combining genome-wide association mapping and transcriptional networks to identify novel genes controlling glucosinolates in Arabidopsis thaliana

Chan EK Rowe HC Corwin JA Joseph B Kliebenstein DJ 《PLoS biology》2011,9(8):e1001125

Background

Genome-wide association (GWA) is gaining popularity as a means to study the architecture of complex quantitative traits, partially due to the improvement of high-throughput low-cost genotyping and phenotyping technologies. Glucosinolate (GSL) secondary metabolites within Arabidopsis spp. can serve as a model system to understand the genomic architecture of adaptive quantitative traits. GSL are key anti-herbivory defenses that impart adaptive advantages within field trials. While little is known about how variation in the external or internal environment of an organism may influence the efficiency of GWA, GSL variation is known to be highly dependent upon the external stresses and developmental processes of the plant lending it to be an excellent model for studying conditional GWA.

Methodology/Principal Findings

To understand how development and environment can influence GWA, we conducted a study using 96 Arabidopsis thaliana accessions, >40 GSL phenotypes across three conditions (one developmental comparison and one environmental comparison) and ∼230,000 SNPs. Developmental stage had dramatic effects on the outcome of GWA, with each stage identifying different loci associated with GSL traits. Further, while the molecular bases of numerous quantitative trait loci (QTL) controlling GSL traits have been identified, there is currently no estimate of how many additional genes may control natural variation in these traits. We developed a novel co-expression network approach to prioritize the thousands of GWA candidates and successfully validated a large number of these genes as influencing GSL accumulation within A. thaliana using single gene isogenic lines.

Conclusions/Significance

Together, these results suggest that complex traits imparting environmentally contingent adaptive advantages are likely influenced by up to thousands of loci that are sensitive to fluctuations in the environment or developmental state of the organism. Additionally, while GWA is highly conditional upon genetics, the use of additional genomic information can rapidly identify causal loci en masse.

11.

Targeted genomic capture and massively parallel sequencing to identify genes for hereditary hearing loss in middle eastern families

Brownstein Z Friedman LM Shahin H Oron-Karni V Kol N Abu Rayyan A Parzefall T Lev D Shalev S Frydman M Davidov B Shohat M Rahile M Lieberman S Levy-Lahad E Lee MK Shomron N King MC Walsh T Kanaan M Avraham KB 《Genome biology》2011,12(9):R89-11

Background

Identification of genes responsible for medically important traits is a major challenge in human genetics. Due to the genetic heterogeneity of hearing loss, targeted DNA capture and massively parallel sequencing are ideal tools to address this challenge. Our subjects for genome analysis are Israeli Jewish and Palestinian Arab families with hearing loss that varies in mode of inheritance and severity.

Results

A custom 1.46 MB design of cRNA oligonucleotides was constructed containing 246 genes responsible for either human or mouse deafness. Paired-end libraries were prepared from 11 probands and bar-coded multiplexed samples were sequenced to high depth of coverage. Rare single base pair and indel variants were identified by filtering sequence reads against polymorphisms in dbSNP132 and the 1000 Genomes Project. We identified deleterious mutations in CDH23, MYO15A, TECTA, TMC1, and WFS1. Critical mutations of the probands co-segregated with hearing loss. Screening of additional families in a relevant population was performed. TMC1 p.S647P proved to be a founder allele, contributing to 34% of genetic hearing loss in the Moroccan Jewish population.

Conclusions

Critical mutations were identified in 6 of the 11 original probands and their families, leading to the identification of causative alleles in 20 additional probands and their families. The integration of genomic analysis into early clinical diagnosis of hearing loss will enable prediction of related phenotypes and enhance rehabilitation. Characterization of the proteins encoded by these genes will enable an understanding of the biological mechanisms involved in hearing loss.

12.

An evolutionary genomic approach to identify genes involved in human birth timing

Plunkett J Doniger S Orabona G Morgan T Haataja R Hallman M Puttonen H Menon R Kuczynski E Norwitz E Snegovskikh V Palotie A Peltonen L Fellman V DeFranco EA Chaudhari BP McGregor TL McElroy JJ Oetjens MT Teramo K Borecki I Fay J Muglia L 《PLoS genetics》2011,7(4):e1001365

Coordination of fetal maturation with birth timing is essential for mammalian reproduction. In humans, preterm birth is a disorder of profound global health significance. The signals initiating parturition in humans have remained elusive, due to divergence in physiological mechanisms between humans and model organisms typically studied. Because of relatively large human head size and narrow birth canal cross-sectional area compared to other primates, we hypothesized that genes involved in parturition would display accelerated evolution along the human and/or higher primate phylogenetic lineages to decrease the length of gestation and promote delivery of a smaller fetus that transits the birth canal more readily. Further, we tested whether current variation in such accelerated genes contributes to preterm birth risk. Evidence from allometric scaling of gestational age suggests human gestation has been shortened relative to other primates. Consistent with our hypothesis, many genes involved in reproduction show human acceleration in their coding or adjacent noncoding regions. We screened >8,400 SNPs in 150 human accelerated genes in 165 Finnish preterm and 163 control mothers for association with preterm birth. In this cohort, the most significant association was in FSHR, and 8 of the 10 most significant SNPs were in this gene. Further evidence for association of a linkage disequilibrium block of SNPs in FSHR, rs11686474, rs11680730, rs12473870, and rs1247381 was found in African Americans. By considering human acceleration, we identified a novel gene that may be associated with preterm birth, FSHR. We anticipate other human accelerated genes will similarly be associated with preterm birth risk and elucidate essential pathways for human parturition.

13.

Interactome networks and human disease

Vidal M Cusick ME Barabási AL 《Cell》2011,144(6):986-998

Complex biological systems and cellular networks may underlie most genotype to phenotype relationships. Here, we review basic concepts in network biology, discussing different types of interactome networks and the insights that can come from analyzing them. We elaborate on why interactome networks are important to consider in biology, how they can be mapped and integrated with each other, what global properties are starting to emerge from interactome network models, and how these properties may relate to human disease.

14.

Combining protein-protein interactions information with support vector machine to identify chronic obstructive pulmonary disease related genes

Lin Hua Ping Zhou 《Molecular Biology》2014,48(2):287-296

Chronic obstructive pulmonary disease (COPD) is a complex human disease with a high mortality rate. So far, studies of COPD have not been well organized despite the well-documented role of cigarette smoking in the genesis of COPD. In recent years, microarray analyses have helped to identify some potential disease-related genes. However, the low reproducibility of many published gene signatures has been criticized. It has therefore been suggested that incorporating network or pathway information into prognostic biomarker discovery might improve prediction performance. In this analysis, we combined protein-protein interaction (PPI) information with the support vector machine (SVM) method to identify potential COPD-related genes that would allow one to accurately distinguish severe emphysema from non-/mildly emphysematous lung tissue. We identified 8 COPD-related feature genes. Compared with another SVM method that did not use the prior PPI information, the prediction accuracy was significantly enhanced (AUC increased from 0.513 to 0.909). Based on the results obtained, one can suppose that incorporating prior network knowledge into gene selection methods significantly improves classification accuracy. Consequently, the gene expression profiles from human emphysematous lung tissue may provide insight into the pathogenesis, and a good classification prediction algorithm based on prior biological knowledge can further strengthen this performance.

15.

Genomic approaches for the elucidation of genes and gene networks underlying cardiovascular traits

M. E. Adriaens C. R. Bezzina 《Biophysical reviews》2018,10(4):1053-1060

16.

Forkhead genes and human disease

Erickson RP 《Journal of applied genetics》2001,42(2):211-221

17.

Combining phylogenetic data with co-regulated genes to identify regulatory motifs

Wang T Stormo GD 《Bioinformatics (Oxford, England)》2003,19(18):2369-2380

MOTIVATION: Discovery of regulatory motifs in unaligned DNA sequences remains a fundamental problem in computational biology. Two categories of algorithms have been developed to identify common motifs from a set of DNA sequences. The first can be called a 'multiple genes, single species' approach. It proposes that a degenerate motif is embedded in some or all of the otherwise unrelated input sequences and tries to describe a consensus motif and identify its occurrences. It is often used for co-regulated genes identified through experimental approaches. The second approach can be called 'single gene, multiple species'. It requires orthologous input sequences and tries to identify unusually well conserved regions by phylogenetic footprinting. Both approaches perform well, but each has some limitations. It is tempting to combine the knowledge of co-regulation among different genes and conservation among orthologous genes to improve our ability to identify motifs. RESULTS: Based on the Consensus algorithm previously established by our group, we introduce a new algorithm called PhyloCon (Phylogenetic Consensus) that takes into account both conservation among orthologous genes and co-regulation of genes within a species. This algorithm first aligns conserved regions of orthologous sequences into multiple sequence alignments, or profiles, then compares profiles representing non-orthologous sequences. Motifs emerge as common regions in these profiles. Here we present a novel statistic to compare profiles of DNA sequences and a greedy approach to search for common subprofiles. We demonstrate that PhyloCon performs well on both synthetic and biological data. AVAILABILITY: Software available upon request from the authors. http://ural.wustl.edu/softwares.html

18.

Integrative strategies to identify candidate genes in rodent models of human alcoholism.

Julie A Treadwell 《Génome》2006,49(1):1-7

The search for genes underlying alcohol-related behaviours in rodent models of human alcoholism has been ongoing for many years with only limited success. Recently, new strategies that integrate several of the traditional approaches have provided new insights into the molecular mechanisms underlying ethanol's actions in the brain. We have used alcohol-preferring C57BL/6J (B6) and alcohol-avoiding DBA/2J (D2) genetic strains of mice in an integrative strategy combining high-throughput gene expression screening, genetic segregation analysis, and mapping to previously published quantitative trait loci to uncover candidate genes for the ethanol-preference phenotype. In our study, 2 genes, retinaldehyde binding protein 1 (Rlbp1) and syntaxin 12 (Stx12), were found to be strong candidates for ethanol preference. Such experimental approaches have the power and the potential to greatly speed up the laborious process of identifying candidate genes for the animal models of human alcoholism.

19.

Genetic approaches to identify disease genes for birth defects with cleft lip/palate as a model

Lidral AC Murray JC 《Birth defects research. Part A, Clinical and molecular teratology》2004,70(12):893-901

BACKGROUND: Understanding the etiology of birth defects is an important step toward developing improved treatment and preventive strategies. Most birth defects have an underlying genetic basis, ranging from single genes playing dominant or recessive roles in Mendelian disorders to a mixture of contributions from multiple genes and environmental triggers in complex traits. The purpose of this article is to provide an overview of genetic approaches to identifying disease genes for genetically complex birth defects. METHODS: A review of the literature describing successes and limitations for identifying disease genes for complex traits was conducted. RESULTS: Cleft lip and cleft palate are common congenital anomalies with significant medical, psychological, social, and economic ramifications. The Online Mendelian Inheritance in Man catalog (OMIM; http://www3.ncbi.nlm.nih.gov/Omim) lists more than 400 single-gene causes of clefts of the lip and/or palate. Genetic causes of clefting also include chromosomal rearrangements, genetic susceptibility to teratogenic exposures, and complex genetic contributions of multiple genes. CONCLUSIONS: Genetic causes of birth defects can be identified using an increasingly powerful combination of careful sample collection, molecular analytic methods, and statistical evaluations. We will describe a range of approaches to search for genetic factors of birth defects and use our own work with cleft lip and palate as a model.

20.

The identification of similarities between biological networks: application to the metabolome and interactome

Cootes AP Muggleton SH Sternberg MJ 《Journal of molecular biology》2007,369(4):1126-1139

The increasing interest in systems biology has resulted in extensive experimental data describing networks of interactions (or associations) between molecules in metabolism, protein-protein interactions and gene regulation. Comparative analysis of these networks is central to understanding biological systems. We report a novel method (PHUNKEE: Pairing subgrapHs Using NetworK Environment Equivalence) by which similar subgraphs in a pair of networks can be identified. Like other methods, PHUNKEE explicitly considers the graphical form of the data and allows for gaps. However, it is novel in that it includes information about the context of the subgraph within the adjacent network. We also explore a new approach to quantifying the statistical significance of matching subgraphs. We report similar subgraphs in metabolic pathways and in protein-protein interaction networks. The most similar metabolic subgraphs were generally found to occur in processes central to all life, such as purine, pyrimidine and amino acid metabolism. The most similar pairs of subgraphs found in the protein-protein interaction networks of Drosophila melanogaster and Saccharomyces cerevisiae also include central processes such as cell division but, interestingly, also include protein sub-networks involved in pre-mRNA processing. The inclusion of network context information in the comparison of protein interaction networks increased the number of similar subgraphs found consisting of proteins involved in the same functional process. This could have implications for the prediction of protein function.