Similar articles
 Found 20 similar articles; search took 31 ms
1.
2.
Background: Phenotypic features associated with genes and diseases play an important role in disease-related studies, but most of the available methods focus solely on the Online Mendelian Inheritance in Man (OMIM) database without considering the controlled vocabulary. The Human Phenotype Ontology (HPO) provides a standardized and controlled vocabulary covering phenotypic abnormalities in human diseases, and has become a comprehensive resource for computational analysis of human disease phenotypes. Most of the existing HPO-based software tools cannot be used offline and provide only a few similarity measures. Therefore, there is a critical need for a comprehensive offline software tool for phenotypic feature similarity based on HPO. Results: HPOSim is an R package for analyzing phenotypic similarity for genes and diseases based on HPO data. Seven commonly used semantic similarity measures are implemented in HPOSim. Enrichment analyses of gene sets and disease sets are also implemented, including hypergeometric enrichment analysis and network ontology analysis (NOA). Conclusions: HPOSim can be used to predict disease genes and explore disease-related functions of gene modules. HPOSim is open source and freely available at SourceForge (https://sourceforge.net/p/hposim/).
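
The hypergeometric enrichment analysis mentioned above can be illustrated with a minimal Python sketch. This is not HPOSim's code (HPOSim is an R package); the function name `hypergeom_enrichment_p` and its parameters are hypothetical, and the sketch assumes the usual one-sided test for over-representation of a term in a gene set.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: number of genes in the background (hypothetical input)
    annotated:  background genes annotated with the term
    sample:     size of the gene set being tested
    hits:       genes in the set that carry the annotation
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail of the hypergeometric distribution exactly.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Illustrative numbers only: 5 of 50 selected genes carry a term
# that annotates 100 of 20000 background genes.
p_value = hypergeom_enrichment_p(20000, 100, 50, 5)
```

Exact integer arithmetic avoids rounding in the tail sum; in a real analysis the p-values would normally also be corrected for multiple testing across terms.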

3.
There are many thousands of hereditary diseases in humans, each of which has a specific combination of phenotypic features, but computational analysis of phenotypic data has been hampered by lack of adequate computational data structures. Therefore, we have developed a Human Phenotype Ontology (HPO) with over 8000 terms representing individual phenotypic anomalies and have annotated all clinical entries in Online Mendelian Inheritance in Man with the terms of the HPO. We show that the HPO is able to capture phenotypic similarities between diseases in a useful and highly significant fashion.

4.
The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we have developed a concept-recognition procedure that analyzes the frequencies of HPO disease annotations as identified in over five million PubMed abstracts by employing an iterative procedure to optimize precision and recall of the identified terms. We derived disease models for 3,145 common human diseases comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000 rare and common diseases and can be used for examining the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available.

5.
6.
Zhang S  Chang Z  Li Z  DuanMu H  Li Z  Li K  Liu Y  Qiu F  Xu Y 《Gene》2012,497(1):58-65
Phenotypic similarity is correlated with a number of measures of gene function, such as relatedness at the level of direct protein-protein interaction. The phenotypic effect of a deleted or mutated gene, which is one part of gene annotation, has attracted broad attention. However, few measures have been available for studying phenotypic similarity with data from the Human Phenotype Ontology (HPO) database, so more such measures should be developed and investigated. We used five semantic similarity-based measures (Jiang and Conrath, Lin, Schlicker, Yu and Wu) to calculate the human phenotypic similarity between genes (PSG) with data from the HPO database, and evaluated their accuracy with information on protein-protein interaction, protein complex, protein family, gene function or DNA sequence. Compared with randomly selected gene pairs, the results of these methods were statistically significant (all P<0.001). Furthermore, we assessed the performance of these five measures by receiver operating characteristic (ROC) curve analysis, and found that most of them performed better than previous methods. This work shows that these semantic similarity-based measures for calculating PSG are effective for hierarchically structured data. Our study contributes to the development and optimization of novel algorithms for PSG calculation and provides researchers with more alternative methods, as well as tools and directions for PSG study.
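
Two of the measures named above, Lin and Jiang and Conrath, can be sketched on a toy ontology. This is a minimal illustration using the standard textbook definitions based on information content, not the authors' implementation; the term names, the `PARENTS` graph and the `PROB` annotation probabilities are all hypothetical and are not real HPO data.

```python
import math

# Hypothetical toy ontology: term -> set of direct parents (not real HPO terms).
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}

# Hypothetical annotation probabilities, already propagated to ancestors.
PROB = {"root": 1.0, "A": 0.5, "B": 0.5, "A1": 0.25, "A2": 0.25}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic(term):
    """Information content: rarer terms are more informative."""
    return -math.log(PROB[term])


def mica_ic(a, b):
    """Information content of the most informative common ancestor."""
    return max(ic(t) for t in ancestors(a) & ancestors(b))


def lin(a, b):
    """Lin similarity in [0, 1]; defined as 1.0 when both terms have IC 0."""
    denom = ic(a) + ic(b)
    return 2 * mica_ic(a, b) / denom if denom > 0 else 1.0


def jiang_conrath_distance(a, b):
    """Jiang and Conrath distance; 0 for identical terms."""
    return ic(a) + ic(b) - 2 * mica_ic(a, b)
```

Gene-level similarity would then combine term-level scores over the two genes' annotation sets (for example by best-match averaging); that combination step is left out of this sketch.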

7.
The molecular complexity of genetic diseases requires novel approaches to break it down into coherent biological modules. For this purpose, many disease network models have been created and analyzed. We highlight two of them, “the human diseases networks” (HDN) and “the orphan disease networks” (ODN). However, in these models, each single node represents one disease or an ambiguous group of diseases. In these cases, the notion of diseases as unique entities reduces the usefulness of network-based methods. We hypothesize that using the clinical features (pathophenotypes) to define pathophenotypic connections between disease-causing genes improves our understanding of the molecular events originated by genetic disturbances. For this, we have built a pathophenotypic similarity gene network (PSGN) and compared it with the unipartite projections (based on gene-to-gene edges) similar to those used in previous network models (HDN and ODN). Unlike these disease network models, the PSGN uses semantic similarities. This pathophenotypic similarity has been calculated by comparing pathophenotypic annotations of genes (human abnormalities of HPO terms) in the “Human Phenotype Ontology”. The resulting network contains 1075 genes (nodes) and 26197 significant pathophenotypic similarities (edges). A global analysis of this network reveals: unnoticed pairs of genes showing significant pathophenotypic similarity, a biologically meaningful re-arrangement of the pathological relationships between genes, correlations of biochemical interactions with higher similarity scores and functional biases in metabolic and essential genes toward the pathophenotypic specificity and the pleiotropy, respectively.
Additionally, pathophenotypic similarities and metabolic interactions of genes associated with maple syrup urine disease (MSUD) have been used to merge them into a coherent pathological module. Our results indicate that pathophenotypes help identify underlying co-dependencies among disease-causing genes that are useful to describe disease modularity.

8.
Recently, many long non-coding RNAs (lncRNAs) have been identified and their biological function has been characterized; however, our understanding of their underlying molecular mechanisms related to disease is still limited. To overcome the limitation in experimentally identifying disease–lncRNA associations, computational methods have been proposed as a powerful tool to predict such associations. These methods are usually based on the similarities between diseases or lncRNAs since it was reported that similar diseases are associated with functionally similar lncRNAs. Therefore, prediction performance is highly dependent on how well the similarities can be captured. Previous studies have calculated the similarity between two diseases by mapping each disease to exactly one Disease Ontology (DO) term, and then using a semantic similarity measure to calculate the similarity between them. However, the problem with this approach is that a disease can be described by more than one DO term. To date, DO term annotations are available for genes but not for diseases. In contrast, Human Phenotype Ontology (HPO) is designed to fully annotate human disease phenotypes. Therefore, in this study, we constructed disease similarity networks/matrices using HPO instead of DO. Then, we used these networks/matrices as inputs of two representative machine learning-based and network-based ranking algorithms, that is, regularized least square and heterogeneous graph-based inference, respectively. The results showed that the prediction performance of the two algorithms on HPO-based networks/matrices is better than that on DO-based ones. In addition, our method can predict 11 novel cancer-associated lncRNAs, which are supported by literature evidence.

9.
MOTIVATION: The inference of genes that are truly associated with inherited human diseases from a set of candidates resulting from genetic linkage studies has been one of the most challenging tasks in human genetics. Although several computational approaches have been proposed to prioritize candidate genes relying on protein-protein interaction (PPI) networks, these methods can usually cover less than half of known human genes. RESULTS: We propose to rely on the biological process domain of the gene ontology to construct a gene semantic similarity network and then use the network to infer disease genes. We show that the constructed network covers about 50% more genes than a typical PPI network. By analyzing the gene semantic similarity network with the PPI network, we show that gene pairs tend to have higher semantic similarity scores if the corresponding proteins are closer to each other in the PPI network. By analyzing the gene semantic similarity network with a phenotype similarity network, we show that semantic similarity scores of genes associated with similar diseases are significantly different from those of genes selected at random, and that genes with higher semantic similarity scores tend to be associated with diseases with higher phenotype similarity scores. We further use the gene semantic similarity network with a random walk with restart model to infer disease genes. Through a series of large-scale leave-one-out cross-validation experiments, we show that the gene semantic similarity network can achieve not only higher coverage but also higher accuracy than the PPI network in the inference of disease genes.
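
The random walk with restart model mentioned above can be sketched in plain Python. This is a generic illustration of the technique on a small weighted graph, not the authors' code; the function name, the restart probability of 0.7 and the toy path graph are assumptions made for the example.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Score nodes by proximity to the seed nodes.

    adj:   symmetric weight matrix as a list of lists (hypothetical input)
    seeds: set of node indices for known disease genes
    Iterates p <- (1 - restart) * W p + restart * p0 until convergence,
    where W is the column-normalised adjacency matrix.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path, {0})
```

Candidate genes are then ranked by their steady-state score; in the toy graph the scores fall off with distance from the seed.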

10.
Diagnosing mitochondrial disorders remains challenging. This is partly because the clinical phenotypes of patients overlap with those of other sporadic and inherited disorders. Although the widespread availability of genetic testing has increased the rate of diagnosis, the combination of phenotypic and genetic heterogeneity still makes it difficult to reach a timely molecular diagnosis with confidence. An objective, systematic method for describing the phenotypic spectra for each variant provides a potential solution to this problem. We curated the clinical phenotypes of 6688 published individuals with 89 pathogenic mitochondrial DNA (mtDNA) mutations, collating 26 348 human phenotype ontology (HPO) terms to establish the MitoPhen database. This enabled a hypothesis-free definition of mtDNA clinical syndromes, an overview of heteroplasmy-phenotype relationships, the identification of under-recognized phenotypes, and provides a publicly available reference dataset for objective clinical comparison with new patients using the HPO. Studying 77 patients with independently confirmed positive mtDNA diagnoses and 1083 confirmed rare disease cases with a non-mitochondrial nuclear genetic diagnosis, we show that HPO-based phenotype similarity scores can distinguish these two classes of rare disease patients with a false discovery rate <10% at a sensitivity of 80%. Enriching the MitoPhen database with more patients will improve predictions for increasingly rare variants.

11.
Rare genetic disorders, which can now be studied systematically with affordable genome sequencing, are often caused by high-penetrance rare variants. Such disorders are often heterogeneous and characterized by abnormalities spanning multiple organ systems ascertained with variable clinical precision. Existing methods for identifying genes with variants responsible for rare diseases summarize phenotypes with unstructured binary or quantitative variables. The Human Phenotype Ontology (HPO) allows composite phenotypes to be represented systematically but association methods accounting for the ontological relationship between HPO terms do not exist. We present a Bayesian method to model the association between an HPO-coded patient phenotype and genotype. Our method estimates the probability of an association together with an HPO-coded phenotype characteristic of the disease. We thus formalize a clinical approach to phenotyping that is lacking in standard regression techniques for rare disease research. We demonstrate the power of our method by uncovering a number of true associations in a large collection of genome-sequenced and HPO-coded cases with rare diseases.

12.
Kanbe T 《Mycopathologia》2008,166(5-6):307-317
Dermatophytosis is one of the most common infectious diseases in the world and can be caused by several dermatophyte species. These species are closely related in genetic structure in spite of different phenotypic and ecological features. The morphological similarity, variability, and polymorphism of dermatophytes have meant that species identification for dermatophytes is time consuming and requires a significant degree of knowledge and technological expertise. Molecular biology-based techniques have solved problems concerning the morphology-based identification of dermatophytes and have improved our knowledge on the epidemiology of dermatophytosis. Further development of molecular diagnosis of dermatophytosis requires the investigation of additional molecular markers for diagnostic tools targeting multiple loci as well as the improvement of techniques.

13.
《Genomics》2020,112(2):1335-1342
Circular RNAs (circRNAs) are a new kind of endogenous non-coding RNA, and more of them continue to be discovered. A growing number of studies have shown that circRNAs are related to the occurrence and development of human diseases. Identifying circRNAs associated with diseases can contribute to understanding the pathogenesis, diagnosis and treatment of diseases. However, experimental methods for identifying disease-related circRNAs remain expensive and time-consuming. Therefore, novel computational methods for the prediction of circRNA-disease associations are urgently needed. In this study, we develop a computational method called LLCDC that integrates the known circRNA-disease associations, circRNA semantic similarity network, disease semantic similarity network, reconstructed circRNA similarity network, and reconstructed disease similarity network to predict circRNAs related to human diseases. Specifically, the reconstructed similarity networks are obtained by using Locality-Constrained Linear Coding (LLC) on the known association matrix and the cosine similarities of circRNAs and diseases. Then, the label propagation method is applied to the similarity networks, and four relevant score matrices are respectively obtained. Finally, we use 5-fold cross validation (5-fold CV) to evaluate the performance of LLCDC, and the AUC value of the method is 0.9177, indicating that our method performs better than the other three methods. In addition, case studies on gastric cancer, breast cancer and papillary thyroid carcinoma further verify the reliability of our method in predicting disease-associated circRNAs.
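
The cosine similarity step mentioned above can be sketched as follows. This only illustrates cosine similarity between rows of a binary association matrix; it is not the LLCDC implementation, it omits the LLC reconstruction and label propagation steps, and the toy matrix and function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two profiles; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(rows):
    """Pairwise cosine similarity between all rows of an association matrix."""
    return [[cosine_similarity(r, s) for s in rows] for r in rows]


# Hypothetical circRNA-by-disease association matrix (1 = known association).
associations = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
circ_sim = similarity_matrix(associations)
```

Similarity between diseases would use the columns of the same matrix, that is, the rows of its transpose.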

14.
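If a drug is known to treat a given disease, it may also be effective against other diseases with similar phenotypes. Large-scale computation of disease phenotype similarity can therefore help discover new treatments for diseases. We downloaded phenotype information for 3,742 diseases from OMIM and 13,721 annotation terms related to anatomy and disease symptoms from the MeSH thesaurus. We searched for each of these MeSH terms in the phenotype text of the 3,742 diseases to obtain the list of MeSH terms associated with each disease, and then systematically computed a pairwise disease phenotype similarity matrix using a semantic analysis method. We found that the biological pathways associated with the most diseases include cancer pathways, the insulin signaling pathway, the hypertrophic cardiomyopathy pathway, and the cell adhesion pathway. As the phenotypic similarity of a disease pair increases, so does the probability that the two diseases involve the same KEGG pathway, which supports the reliability of the method. Disease phenotype similarity can complement gene-level disease similarity and may offer a new route for drug discovery research.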

15.
One of the obstacles hindering a better understanding of cancer is its heterogeneity. However, computational approaches to model cancer heterogeneity have lagged behind. To bridge this gap, we have developed a new probabilistic approach that models individual cancer cases as mixtures of subtypes. Our approach can be seen as a meta-model that summarizes the results of a large number of alternative models. It does not assume predefined subtypes nor does it assume that such subtypes have to be sharply defined. Instead, given a measure of phenotypic similarity between patients and a list of potential explanatory features, such as mutations, copy number variation, microRNA levels, etc., it explains phenotypic similarities with the help of these features. We applied our approach to Glioblastoma Multiforme (GBM). The resulting model, Prob_GBM, not only correctly inferred known relationships but also identified new properties underlying phenotypic similarities. The proposed probabilistic framework can be applied to model relations between similarity of gene expression and a broad spectrum of potential genetic causes.

16.
Zhao J  Yang TH  Huang Y  Holme P 《PloS one》2011,6(9):e24306
Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important to understand the mechanistic and evolutionary aspects of pathogenesis, improve diagnosis and treatment of the disease, and aid in drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases. But picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data of gene expression level, protein-protein interaction strength and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed in cases and controls, or that it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as identify pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but is also a way to discover phenotypic interdependency, co-occurrence and shared pathophysiology between different disorders.

17.
Biomedical research is increasingly collaborative, and successful collaborations often produce high impact work. Computational approaches can be developed for automatically predicting biomedical research collaborations. Previous work on collaboration prediction mainly explored the topological structures of research collaboration networks, leaving out rich semantic information from the publications themselves. In this paper, we propose supervised machine learning approaches to predict research collaborations in the biomedical field. We explored both the semantic features extracted from author research interest profiles and the author network topological features. We found that the most informative semantic features for author collaborations are related to research interest, including similarity of out-citing citations and similarity of abstracts. Of the four supervised machine learning models (naïve Bayes, naïve Bayes multinomial, SVMs, and logistic regression), the best performing model is logistic regression with an ROC ranging from 0.766 to 0.980 on different datasets. To our knowledge we are the first to study in depth how research interest and productivities can be used for collaboration prediction. Our approach is computationally efficient, scalable and yet simple to implement. The datasets of this study are available at https://github.com/qingzhanggithub/medline-collaboration-datasets.

18.
BACKGROUND: Infantile cartilaginous hamartoma of the rib is a rare condition occurring in newborn infants, with an incidence of 1 in 3,000 (0.03%) among primary bone tumor cases. Reports of this condition so far have presented the clinical, radiologic and histopathologic features. To the best of our knowledge, reports of the cytopathologic features have not been documented. In the present case report, clinical, radiologic and cytopathologic features and differential diagnosis are enumerated. CASE: A 1-month-old, male infant presented with a chest wall mass with a clinical diagnosis of osteochondroma. On fine needle aspiration cytology, a diagnosis of infantile cartilaginous hamartoma of the rib was suggested; it was supplemented by the clinical history and radiologic findings. CONCLUSION: Although rare, this condition ought to be kept in mind while dealing with infantile chest wall masses to avoid an erroneous diagnosis of malignancy, owing to its ominous cytopathologic features.

19.
Wang Z  Zhang XC  Le MH  Xu D  Stacey G  Cheng J 《PloS one》2011,6(3):e17906
Protein Domain Co-occurrence Network (DCN) is a biological network that has not been fully studied. We analyzed the properties of the DCNs of H. sapiens, S. cerevisiae, C. elegans, D. melanogaster, and 15 plant genomes. These DCNs have the hallmark features of scale-free networks. We investigated the possibility of using DCNs to predict protein and domain functions. Based on our experiment conducted on 66 randomly selected proteins, the best of top 3 predictions made by our DCN-based aggregated neighbor-counting method achieved a semantic similarity score of 0.81 to the actual Gene Ontology terms of the proteins. Moreover, the top 3 predictions using neighbor-counting, χ², and an SVM-based method achieved an accuracy of 66%, 59%, and 61%, respectively, when used to predict specific Gene Ontology terms of human target domains. These predictions on average had a semantic similarity score of 0.82, 0.80, and 0.79 to the actual Gene Ontology terms, respectively. We also used DCNs to predict whether a domain is an enzyme domain, and our SVM-based and neighbor-inference methods correctly classified 79% and 77% of the target domains, respectively. When using DCNs to classify a target domain into one of the six enzyme classes, we found that, as long as there is one EC number available in the neighboring domains, our SVM-based and neighbor-counting methods correctly classified 92.4% and 91.9% of the target domains, respectively. Furthermore, we benchmarked the performance of using DCNs to infer species phylogenies on six different combinations of 398 single-chromosome prokaryotic genomes. The phylogenetic tree of 54 prokaryotic taxa generated by our DCNs-alignment-based method achieved a 93.45% similarity score compared to Bergey's taxonomy. In summary, our studies show that genome-wide DCNs contain rich information that can be effectively used to decipher protein function and reveal the evolutionary relationship among species.

20.