Similar articles
20 similar articles found.
1.

Background

Genetic and genomic data analyses produce large sets of genes. Functional comparison of these gene sets is a key part of the analysis, as it identifies both their shared functions and the functions that distinguish each set. The Gene Ontology (GO) initiative provides a unified reference for analyzing genes' molecular functions, biological processes and cellular components. Numerous semantic similarity measures have been developed to systematically quantify the weight of the GO terms shared by two genes. We studied how gene set comparisons can be improved by considering gene set particularity in addition to gene set similarity.

Results

We propose a new approach to computing gene set particularities based on the information conveyed by GO terms. A GO term's informativeness can be computed either from its information content, based on the term's frequency in a corpus, or as a function of the term's distance to the root. We defined the semantic particularity of a set of GO terms Sg1 compared with another set of GO terms Sg2, and combined our particularity measure with a similarity measure to compare gene sets. We demonstrated that combining semantic similarity and semantic particularity measures identifies genes with particular functions among otherwise similar genes, a distinction that a semantic similarity measure alone did not capture.
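To make the information-content idea concrete, here is a minimal Python sketch under stated assumptions: a toy five-term DAG stands in for GO, IC(t) = -log p(t) with p(t) the fraction of corpus annotations falling on t or its descendants, and the particularity score (the share of Sg1's ancestor-closed IC that Sg2 does not cover) is a hypothetical illustration, not the authors' published formula. `PARENTS`, `information_content` and `particularity` are illustrative names.

```python
import math

# Toy ontology (hypothetical terms, not real GO IDs): term -> set of parents.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in PARENTS."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotations):
    """IC(t) = -log p(t), where p(t) is the fraction of corpus annotations
    that fall on t or any of its descendants (propagated to ancestors)."""
    counts = {t: 0 for t in PARENTS}
    for term in annotations:
        for a in ancestors(term):
            counts[a] += 1
    total = counts["root"]
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


def particularity(s1, s2, ic):
    """Hypothetical score: share of the IC of s1's ancestor-closed terms
    that is not covered by s2's ancestor-closed terms (0 = none, 1 = all)."""
    closed1 = set().union(*(ancestors(t) for t in s1))
    closed2 = set().union(*(ancestors(t) for t in s2))
    # Terms never seen in the corpus get IC 0 here (a simplifying assumption).
    total = sum(ic.get(t, 0.0) for t in closed1)
    specific = sum(ic.get(t, 0.0) for t in closed1 - closed2)
    return specific / total if total else 0.0


ic = information_content(["A1", "A1", "A2", "B"])
print(round(particularity({"A1"}, {"A2"}, ic), 3))  # prints 0.707
```

With this corpus, `particularity({"A1"}, {"B"}, ic)` is 1.0, because the two sets share only the root, whose IC is 0.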

Conclusion

Semantic particularity should be used in conjunction with semantic similarity to perform functional analysis of GO-annotated gene sets. The principle is generalizable to other ontologies.

2.
3.
Chromosomal abnormalities have clinical utility in the diagnosis and treatment of hematologic malignancies, and may predict malignant transformation in individuals without apparent clinical presentation of a hematologic cancer. To confirm previous reports of an association between clonal mosaicism and incident hematologic cancer, we applied the anomDetectBAF algorithm to call chromosomal anomalies in genotype data from previously conducted genome-wide association studies (GWAS). The genotypes were originally obtained from DNA derived from the peripheral blood of 12,176 participants in the Group Health electronic Medical Records and Genomics study (eMERGE) and the Women’s Health Initiative (WHI). We detected clonal mosaicism in 169 individuals (1.4%) and large clonal mosaic events (>2 Mb) in 117 individuals (1.0%). Although only 9.5% of clonal mosaic carriers had an incident diagnosis of hematologic cancer (multiple myeloma, myelodysplastic syndrome, lymphoma, or leukemia), carriers had a 5.5-fold increased risk (95% CI: 3.3–9.3; p-value = 7.5×10⁻¹¹) of subsequently developing these cancers. Carriers of large mosaic anomalies showed a particularly pronounced risk of subsequent leukemia (HR = 19.2, 95% CI: 8.9–41.6; p-value = 7.3×10⁻¹⁴). We thus independently confirm the association between detectable clonal mosaicism and hematologic cancer reported in two recent publications.

4.
MOTIVATION: Inferring the genes that are truly associated with inherited human diseases from a set of candidates produced by genetic linkage studies has been one of the most challenging tasks in human genetics. Although several computational approaches have been proposed to prioritize candidate genes using protein-protein interaction (PPI) networks, these methods usually cover fewer than half of known human genes. RESULTS: We propose using the biological process domain of the Gene Ontology to construct a gene semantic similarity network and then using this network to infer disease genes. We show that the constructed network covers about 50% more genes than a typical PPI network. By comparing the gene semantic similarity network with the PPI network, we show that gene pairs tend to have higher semantic similarity scores when the corresponding proteins are closer to each other in the PPI network. By comparing the gene semantic similarity network with a phenotype similarity network, we show that the semantic similarity scores of genes associated with similar diseases differ significantly from those of randomly selected genes, and that genes with higher semantic similarity scores tend to be associated with diseases with higher phenotype similarity scores. We further combine the gene semantic similarity network with a random walk with restart model to infer disease genes. Through a series of large-scale leave-one-out cross-validation experiments, we show that the gene semantic similarity network achieves not only higher coverage but also higher accuracy than the PPI network in inferring disease genes.
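A random walk with restart over a gene similarity network can be sketched in a few lines of NumPy. This is a generic illustration, not the authors' code, and it assumes a symmetric non-negative weight matrix `W` (for example, pairwise semantic similarity scores), a hypothetical restart probability of 0.7, and uniform restart over the seed genes.

```python
import numpy as np


def random_walk_with_restart(W, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Score every node by its RWR proximity to the seed nodes.

    W       : symmetric non-negative weight matrix (e.g. gene similarities)
    seeds   : indices of known disease genes
    restart : probability of jumping back to the seed set (assumed value)
    """
    W = np.asarray(W, dtype=float)
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0       # avoid division by zero for isolated nodes
    P = W / col_sums                    # column-normalised transition matrix
    p0 = np.zeros(W.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)  # uniform restart distribution over seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * P @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Toy path graph 0-1-2-3 with gene 0 as the only seed.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scores = random_walk_with_restart(W, seeds=[0])
print(np.argsort(-scores))  # genes ranked by proximity to the seed: 0, 1, 2, 3
```

Ranking candidate genes by the returned vector gives the prioritization; in a leave-one-out experiment, one known disease gene is removed from the seeds and its rank among the candidates is recorded.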

5.

Background

Commonalities between large sets of genes obtained from high-throughput experiments are often identified by searching for enrichment of genes with the same Gene Ontology (GO) annotations. The GO analysis tools used for these enrichment analyses assume that GO terms are independent and that the semantic distances between all parent–child terms are identical, which is not true in a biological sense. In addition, these tools output lists of GO terms that are often redundant or overly specific and are difficult to interpret in the context of the biological question investigated by the user. There is therefore a demand for a robust and reliable method for gene categorization and enrichment analysis.

Results

We have developed Categorizer, a tool that classifies genes into user-defined groups (categories) and calculates p-values for the enrichment of the categories. Categorizer identifies the biologically best-fit category for each gene by taking advantage of a specialized semantic similarity measure for GO terms. We demonstrate that Categorizer provides improved categorization and enrichment results of genetic modifiers of Huntington’s disease compared to a classical GO Slim-based approach or categorizations using other semantic similarity measures.
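Categorizer's own similarity measure is not described here, so the following Python sketch only illustrates the general idea of best-fit assignment, under assumptions: each category is a hypothetical set of defining terms, `term_sim` is any pairwise term similarity supplied by the caller, and a gene is assigned to the category with the highest best-match score.

```python
def best_fit_category(gene_terms, categories, term_sim):
    """Return (category, score) for the category whose defining terms best
    match the gene's annotations.

    gene_terms : iterable of GO term IDs annotated to the gene
    categories : dict of category name -> set of defining GO term IDs
    term_sim   : callable (term_a, term_b) -> similarity score
    """
    gene_terms = list(gene_terms)
    best_name, best_score = None, float("-inf")
    for name, cat_terms in categories.items():
        # Best single match between any gene term and any category term.
        score = max(term_sim(g, c) for g in gene_terms for c in cat_terms)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical similarity table between placeholder term IDs.
SIM = {("t1", "autophagy"): 0.9, ("t1", "transport"): 0.2,
       ("t2", "transport"): 0.4}


def toy_sim(a, b):
    return SIM.get((a, b), 0.0)


categories = {"Autophagy": {"autophagy"}, "Transport": {"transport"}}
print(best_fit_category(["t1", "t2"], categories, toy_sim))  # ('Autophagy', 0.9)
```

Once every gene has a category, enrichment of each category can be tested with a standard over-representation test.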

Conclusion

Categorizer enables more accurate categorization of genes than currently available methods. This new tool will help experimental and computational biologists analyze genomic and proteomic data more reliably according to their specific needs.

6.
Reproductive Biology, 2020, 20(2): 259-263
Klinefelter syndrome (KS) is the most common chromosomal syndrome, causing infertility in men and leading to non-obstructive azoospermia. Previous studies of mosaicism have reported contradictory results regarding its correlation with both serum hormone levels and the presence of spermatozoa in the ejaculate of KS, KS-like, and non-KS-like infertile patients. The present study was therefore designed to detect low-grade mosaicism in the peripheral blood lymphocytes and buccal mucosa cells of 14 KS and 8 KS-like patients using fluorescence in situ hybridization (FISH), and to investigate its correlation with luteinizing hormone (LH), follicle-stimulating hormone (FSH), and testosterone (T) levels, testicular volume, and semen analysis, in comparison with 10 normal healthy fertile men. Our results indicated that mosaicism was found in only 42.9% of the KS patients and was completely absent in all KS-like patients. Moreover, mosaicism resulted in complete azoospermia, while differences in both hormone levels and testicular volume between mosaic and non-mosaic KS patients were non-significant. All KS patients showed significant differences in both hormone levels and testicular volume compared with normal men. Compared with KS-like patients, however, they showed non-significant differences in hormone levels but significant differences in testicular volume. Additionally, the KS-like patients exhibited non-significant variations in LH and FSH levels and significant variations in T level and testicular volume compared with normal men. All KS-like patients had azoospermia, except for one patient who showed oligozoospermia. Therefore, no correlations were found between mosaicism and serum hormone levels, testicular volume, or semen analysis.

7.
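Research into the etiology and pathogenesis of prostate cancer helps its prevention and treatment. Biochemical experimental approaches to studying prostate cancer are costly and time-consuming, while network-based computational methods are easily constrained by incomplete and noisy gene expression profile data and by small sample sizes. To address this, this paper proposes a dual-constraint algorithm based on node-module confidence and local modularity (named NMCOM) to mine candidate disease modules for prostate cancer. NMCOM does not depend on gene expression profile data. It selects seed nodes using a fused ranking strategy that combines the consistency score between candidate genes and the disease phenotype with the semantic similarity score between candidate genes and known disease genes, and then mines candidate prostate cancer disease modules under the dual constraint of node-module confidence and local modularity. Enrichment analysis of the mined modules yielded 18 significant candidate disease gene modules. Compared with single-score ranking methods and random walk with restart, the NMCOM fused ranking strategy achieves a smaller average rank ratio and a larger AUC, and the modules it mines clearly outperform those of other module-mining algorithms, with significant biological meaning. NMCOM not only mines candidate prostate cancer disease modules accurately and effectively, but can also be extended to mine candidate modules for other diseases.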
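The dual-constraint growth step can be illustrated with a small greedy sketch. Everything here is an assumption rather than NMCOM's published definition: local modularity is taken as internal edges divided by boundary edges (one common form), and "node-module confidence" is approximated by the fraction of a candidate's neighbours already inside the module, with a hypothetical minimum of 0.5.

```python
def local_modularity(adj, module):
    """Internal edges divided by boundary edges (one common local-modularity form)."""
    internal = boundary = 0
    for u in module:
        for v in adj[u]:
            if v in module:
                internal += 1
            else:
                boundary += 1
    internal //= 2  # each internal edge was counted from both endpoints
    return internal / boundary if boundary else float("inf")


def grow_module(adj, seed, min_confidence=0.5):
    """Greedily add the neighbour that most increases local modularity,
    considering only candidates whose hypothetical node-module confidence
    (share of their neighbours already in the module) is high enough."""
    module = {seed}
    while True:
        best_node, best_m = None, local_modularity(adj, module)
        frontier = {v for u in module for v in adj[u]} - module
        for v in sorted(frontier):
            confidence = len(adj[v] & module) / len(adj[v])
            if confidence < min_confidence:
                continue
            m = local_modularity(adj, module | {v})
            if m > best_m:
                best_node, best_m = v, m
        if best_node is None:
            return module
        module.add(best_node)


# Two triangles joined by the bridge 2-3 (toy interaction network).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(grow_module(adj, seed=0))  # {0, 1, 2}
```

The confidence check stops the module from crossing the bridge: node 3 has only one of its three neighbours inside the module.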

8.
Bacterial whole genome sequencing holds promise as a disruptive technology in clinical microbiology, but it has not yet been applied systematically or comprehensively within a clinical context. Here, over the course of one year, we performed prospective collection and whole genome sequencing of nearly all bacterial isolates obtained from a tertiary care hospital’s intensive care units (ICUs). This unbiased collection of 1,229 bacterial genomes from 391 patients enables detailed exploration of several features of clinical pathogens. A sizable fraction of isolates identified as clinically relevant corresponded to previously undescribed species: 12% of isolates assigned a species-level classification by conventional methods actually qualified as distinct, novel genomospecies on the basis of genomic similarity. Pan-genome analysis of the most frequently encountered pathogens in the collection revealed substantial variation in pan-genome size (1,420 to 20,432 genes) and in the rate of gene discovery (1 to 152 genes per isolate sequenced). Surprisingly, although potential nosocomial transmission of actively surveilled pathogens was rare, 8.7% of isolates belonged to genomically related clonal lineages that were present among multiple patients, usually with overlapping hospital admissions, and were associated with clinically significant infection in 62% of the patients from whom they were recovered. Multi-patient clonal lineages were particularly evident in the neonatal care unit, where seven separate Staphylococcus epidermidis clonal lineages were identified, including one lineage associated with bacteremia in 5/9 neonates. Our study highlights key differences in the information made available by conventional microbiological practices versus whole genome sequencing, and motivates the further integration of microbial genome sequencing into routine clinical care.
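Pan-genome size and the per-isolate rate of gene discovery can be computed from gene-family presence sets. This minimal sketch assumes genes have already been clustered into families (the family identifiers are hypothetical) and processes isolates in one fixed order; published pan-genome curves are usually averaged over many random orderings.

```python
def pan_genome_curve(isolates):
    """Return cumulative pan-genome sizes and the number of new gene
    families contributed by each isolate, in the order given."""
    pan, sizes, new_counts = set(), [], []
    for genes in isolates:
        new = set(genes) - pan
        new_counts.append(len(new))
        pan |= new
        sizes.append(len(pan))
    return sizes, new_counts


# Hypothetical gene-family sets for three isolates of one species.
isolates = [{"fam1", "fam2"}, {"fam2", "fam3"}, {"fam1", "fam2", "fam3"}]
print(pan_genome_curve(isolates))  # ([2, 3, 3], [2, 1, 0])
```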

9.
A new method to measure the semantic similarity of GO terms (total citations: 4; self-citations: 0; citations by others: 4)

10.
Genomic mosaicism arising from post-zygotic mutation has recently been demonstrated to occur in normal tissue of individuals ascertained with varied phenotypes, indicating that detectable mosaicism may be less an exception than a rule in the general population. A challenge to comprehensive cataloging of mosaic mutations and their consequences is the presence of heterogeneous mixtures of cells, rendering low-frequency clones difficult to discern. Here we applied a computational method using estimated haplotypes to characterize mosaic megabase-scale structural mutations in 31,100 GWA study subjects. We provide in silico validation of 293 previously identified somatic mutations and identify an additional 794 novel mutations, most of which exist at lower aberrant cell fractions than have been demonstrated in previous surveys. These mutations occurred across the genome but in a nonrandom manner, and several chromosomes and loci showed unusual levels of mutation. Our analysis supports recent findings about the relationship between clonal mosaicism and old age. Finally, our results, in which we demonstrate a nearly 3-fold higher rate of clonal mosaicism, suggest that SNP-based population surveys of mosaic structural mutations should be conducted with haplotypes for optimal discovery.

11.
Somatic transposon mutagenesis in mice is an efficient strategy to investigate the genetic mechanisms of tumorigenesis. Identifying tumor-driving transposon insertions traditionally requires generating large tumor cohorts to obtain information about common insertion sites. Tumor-driving insertions are also characterized by their clonal expansion in tumor tissue, a phenomenon facilitated by the slow, evolving transformation process of transposon mutagenesis. We describe here an improved approach for detecting tumor-driving insertions that assesses the clonal expansion of insertions by quantifying the relative proportion of sequence reads obtained in individual tumors. To this end, we developed a protocol for insertion site sequencing that uses acoustic shearing of tumor DNA and Illumina sequencing. We analyzed various solid tumors generated by PiggyBac mutagenesis, and for each tumor >10⁶ reads corresponding to >10⁴ insertion sites were obtained. In each tumor, 9 to 25 insertions stood out by their enriched sequence read frequencies when compared with frequencies obtained from tail DNA controls. These enriched insertions are potential clonally expanded tumor-driving insertions, and thus identify candidate cancer genes. The candidate cancer genes in our study comprised many established cancer genes, but also novel candidate genes such as Mastermind-like1 (Mamld1) and Diacylglycerol kinase delta (Dgkd). We show that clonal expansion analysis by high-throughput sequencing is a robust approach for identifying candidate cancer genes in insertional mutagenesis screens at the level of individual tumors.
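The comparison of read frequencies against tail DNA controls could look roughly like this. The sketch assumes per-site read counts are already available and uses a hypothetical cut-off: a tumor insertion is flagged when its relative read frequency exceeds `fold` times the highest relative frequency observed in the control. The study's actual criterion is not stated in the abstract.

```python
def enriched_insertions(tumor_reads, control_reads, fold=10.0):
    """Return insertion sites whose relative read frequency in the tumor
    exceeds `fold` times the highest relative frequency in control DNA,
    ordered by decreasing tumor read count.

    tumor_reads, control_reads : dict of insertion site -> read count
    fold : hypothetical enrichment cut-off
    """
    tumor_total = sum(tumor_reads.values())
    control_total = sum(control_reads.values())
    control_max = max(n / control_total for n in control_reads.values())
    threshold = fold * control_max
    hits = [s for s, n in tumor_reads.items() if n / tumor_total > threshold]
    return sorted(hits, key=lambda s: tumor_reads[s], reverse=True)


# Toy data: 100 control sites with one read each; four tumor sites.
control = {f"ctrl{i}": 1 for i in range(100)}
tumor = {"s1": 500, "s2": 200, "s3": 50, "s4": 250}
print(enriched_insertions(tumor, control))  # ['s1', 's4', 's2']
```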

12.
It is crucial to understand the differences across papillary thyroid cancer (PTC) stages in order to provide a basis for individualized treatment. Here, we performed a comprehensive functional characterization of PTC stage-related genes and developed a new prognostic signature for patients with advanced disease. Two gene modules were confirmed to be closely associated with PTC stage, and six hub genes were further identified that yield excellent diagnostic efficiency in distinguishing tumour from normal tissue. Genetic alteration analysis indicates that these genes are highly conserved, as DNA mutations in them rarely occur, but analysis of DNA methylation shows that 12 DNA methylation sites are significantly associated with the expression of their corresponding genes. Testing on a validation data set also suggests that these six stage-related hub genes may be potential biomarkers for distinguishing the four stages. Subsequently, a 21-mRNA prognostic risk model was constructed for patients with stage III/IV PTC, and it effectively predicted patient survival with strong prognostic ability. Functional analysis shows that genes differentially expressed between high- and low-risk patients may promote PTC progression to some extent. Moreover, ESTIMATE analysis suggests that the tumour microenvironment (TME) of high-risk patients may be more conducive to tumour growth.
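The abstract does not give the form of the 21-mRNA model. A common construction, assumed here, is a linear risk score (the sum of regression coefficients times gene expression, e.g. coefficients from a Cox model) with patients split at the median score into high- and low-risk groups. The two-gene coefficients below are hypothetical.

```python
import numpy as np


def risk_scores(expr, coefs):
    """Linear risk score per patient: sum over genes of coef * expression.

    expr  : array of shape (n_patients, n_genes)
    coefs : array of shape (n_genes,), e.g. Cox regression coefficients (assumed)
    """
    return np.asarray(expr, dtype=float) @ np.asarray(coefs, dtype=float)


def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    return np.where(scores > np.median(scores), "high", "low")


# Hypothetical two-gene model and four patients.
expr = [[1, 0], [0, 1], [2, 2], [3, 0]]
scores = risk_scores(expr, coefs=[0.5, -1.0])
print(scores, split_by_median(scores))  # labels: high, low, low, high
```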

13.
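NOP56 is a nucleolar protein closely associated with oncogene expression. Through differential expression analysis of online data, this study found that NOP56 is highly expressed in breast cancer tissue. Using high versus low NOP56 expression as the phenotype, we analyzed differences in clinical prognosis between the phenotypes; the results show that high NOP56 expression is closely associated with unfavorable clinicopathological parameters and poor prognosis in breast cancer. Enrichment analysis was used to obtain the NOP56 protein-protein interaction network, and the semantic similarity of its co-expressed genes was calculated. Finally, online databases identified actinomycin D (dactinomycin) as a clinical targeted drug for NOP56 and its co-expressed genes. These results provide potential new predictive markers for breast cancer prevention and treatment, refine the molecular mechanisms underlying the use of clinical targeted drugs, and offer evidence and leads for the clinical use of targeted drugs.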

14.
Background

Phenotypic features associated with genes and diseases play an important role in disease-related studies, and most available methods rely solely on the Online Mendelian Inheritance in Man (OMIM) database without considering a controlled vocabulary. The Human Phenotype Ontology (HPO) provides a standardized, controlled vocabulary covering phenotypic abnormalities in human diseases, and has become a comprehensive resource for the computational analysis of human disease phenotypes. Most existing HPO-based software tools cannot be used offline and provide only a few similarity measures. There is therefore a critical need for comprehensive offline software that computes phenotypic similarity based on HPO.

Results

HPOSim is an R package for analyzing phenotypic similarity between genes and diseases based on HPO data. Seven commonly used semantic similarity measures are implemented in HPOSim. Enrichment analysis of gene sets and disease sets is also implemented, including hypergeometric enrichment analysis and network ontology analysis (NOA).

Conclusions

HPOSim can be used to predict disease genes and explore the disease-related functions of gene modules. HPOSim is open source and freely available at SourceForge (https://sourceforge.net/p/hposim/).
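HPOSim itself is an R package; the following Python sketch only illustrates the standard one-sided hypergeometric test that underlies this kind of enrichment analysis. The function and parameter names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N : genes in the background
    K : background genes annotated with the term
    n : genes in the query set
    k : query genes annotated with the term
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


print(hypergeom_enrichment_p(N=10, K=3, n=3, k=3))  # 1/120, about 0.00833
```

With 10 background genes, 3 of them annotated, and a 3-gene query in which all 3 are annotated, only one of the 120 possible 3-gene draws is as extreme, hence p = 1/120.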

15.
16.
Yang Yang, Xu Zhuangdi, Song Dandan. BMC Bioinformatics, 2016, 17(1): 109-116
Missing values are common in microarray data profiles. Rather than discarding genes or samples with incomplete expression levels, missing values should be properly imputed for accurate data analysis. Imputation methods can be roughly categorized as expression level-based or domain knowledge-based. The first type relies only on expression data without help from external data sources, whereas the second incorporates available domain knowledge into the expression data to improve imputation accuracy. In recent years, microRNA (miRNA) microarrays have been widely developed and used to identify miRNA biomarkers in studies of complex human diseases. As with mRNA profiles, miRNA expression profiles with missing values can be treated with existing imputation methods. However, domain knowledge-based methods are hard to apply because miRNAs lack direct functional annotation. With the rapid accumulation of miRNA microarray data, there is a growing need for domain knowledge-based imputation algorithms specific to miRNA expression profiles to improve the quality of miRNA data analysis. We connect miRNAs to Gene Ontology (GO) domain knowledge via their target genes, and define miRNA functional similarity based on the semantic similarity of GO terms in GO graphs. A new measure combining miRNA functional similarity and expression similarity is used to impute missing values. The new measure was tested on two miRNA microarray datasets from breast cancer research and achieved improved performance over the expression-based method on both datasets. The experimental results demonstrate that biological domain knowledge can benefit the estimation of missing values in miRNA profiles as well as in mRNA profiles.
In particular, functional similarity defined by the GO terms annotated to the target genes of miRNAs can provide useful complementary information to the expression-based method, improving the imputation accuracy of miRNA array data. Our method and data are available to the public upon request.
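A k-nearest-neighbour imputation that mixes the two kinds of similarity might be sketched as follows. The mixing weight `alpha`, the expression similarity 1 / (1 + root-mean-square distance), and the similarity-weighted average used to fill each gap are assumptions for illustration; `func_sim` stands for a precomputed GO-based miRNA functional similarity matrix.

```python
import numpy as np


def impute_combined_knn(X, func_sim, alpha=0.5, k=3):
    """Fill NaNs in a (miRNA x sample) matrix from the k most similar miRNAs.

    Similarity(i, j) = alpha * func_sim[i][j] + (1 - alpha) * expr_sim(i, j),
    where expr_sim = 1 / (1 + RMS distance over jointly observed samples).
    alpha and k are hypothetical settings.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    n = X.shape[0]
    for i in range(n):
        missing = np.isnan(X[i])
        if not missing.any():
            continue
        neighbours = []
        for j in range(n):
            both = ~np.isnan(X[i]) & ~np.isnan(X[j])
            if j == i or not both.any():
                continue
            rms = np.sqrt(np.mean((X[i, both] - X[j, both]) ** 2))
            sim = alpha * func_sim[i][j] + (1 - alpha) / (1 + rms)
            neighbours.append((sim, j))
        neighbours.sort(reverse=True)  # most similar miRNAs first
        for col in np.flatnonzero(missing):
            donors = [(s, j) for s, j in neighbours if not np.isnan(X[j, col])][:k]
            if donors:
                w = np.array([s for s, _ in donors])
                v = np.array([X[j, col] for _, j in donors])
                out[i, col] = w @ v / w.sum()  # similarity-weighted average
    return out


nan = float("nan")
X = [[1.0, 2.0, nan], [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]]
F = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
print(impute_combined_knn(X, F, k=1)[0, 2])  # about 3.0, copied from miRNA 1
```

Both neighbours have identical expression on the observed samples, so the functional similarity alone decides that miRNA 1 is the better donor; with `k=2` the fill becomes a weighted mix of 3.0 and 5.0.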

17.
Hu Jialu, Gao Yiqun, Li Jing, Zheng Yan, Wang Jingru, Shang Xuequn. BMC Bioinformatics, 2019, 20(18): 1-12
Background

Identifying cancer genes is an urgent task: it helps us understand the mechanisms of biochemical processes at the biomolecular level and facilitates the development of bioinformatics. Although many methods for identifying cancer genes have been proposed in recent years, most of them still use relatively little biological data, which reflects insufficient consideration of the relationships between genes and diseases across a variety of factors.

Results

In this paper, we propose a two-round random walk algorithm for identifying cancer genes based on multiple biological data sources (TRWR-MB), including a protein-protein interaction (PPI) network, a pathway network, a microRNA similarity network, an lncRNA similarity network, a cancer similarity network and protein complexes. In the first-round random walk, all cancer nodes, together with the cancer-related genes, microRNAs and lncRNAs associated with any cancer, are used as seed nodes, and a random walker moves on a four-layer heterogeneous network constructed from the multiple biological data sources. The first round aims to select the top-k scoring potential cancer genes. In the second-round random walk, the genes, microRNAs and lncRNAs associated with a specific cancer in the corresponding cancer class are used as seed nodes, and the walker moves on a new four-layer heterogeneous network constructed from lncRNAs, microRNAs, cancers and the selected potential cancer genes. After both walks finish, we combine the results of the two RWR rounds into a ranking score for experimental analysis. As a result, a higher area under the receiver operating characteristic curve (AUC) is obtained. In addition, case studies identifying new cancer genes are presented in the corresponding section.
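The AUC can be computed without plotting a ROC curve by using its rank interpretation: the probability that a randomly chosen known cancer gene scores above a randomly chosen non-cancer gene, with ties counted as one half. The sketch below is a generic illustration, not the authors' evaluation code.

```python
def auc(scores, labels):
    """Rank-based AUC: probability that a random positive (label 1) outscores
    a random negative (label 0), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical ranking scores for four genes; 1 = known cancer gene.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic, which is fine for illustration; for genome-scale rankings the same value follows from the Mann-Whitney U statistic computed on sorted ranks.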

Conclusion

In summary, TRWR-MB integrates multiple biological data sources to identify cancer genes by analyzing the relationships between genes and cancer from a variety of biomolecular perspectives.


18.
MOTIVATION: In microarray studies, numerous tools are available for functional enrichment analysis based on GO categories. Because most of these tools require a prior threshold for designating genes as differentially expressed genes (DEGs), they are categorized as threshold-dependent methods, and they are often criticized because their results change with different thresholds. RESULTS: In the present article, taking into account the inherent correlation structure of GO categories, we propose a continuous measure based on the semantic similarity of GO categories to investigate the functional consistency (or stability) of threshold-dependent methods. Results from several datasets show that, when overlapping categories between two groups are simply counted, the significant category groups selected under different DEG thresholds appear very different. However, based on the semantic similarity measure proposed in this article, the results are largely functionally consistent across a wide range of DEG thresholds. Moreover, we find that the functional consistency of gene lists ranked by the SAM metric is relatively robust to changing DEG thresholds. AVAILABILITY: Source code in R is available on request from the authors.
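The contrast between counting overlapping categories and measuring their semantic similarity can be illustrated with two toy category lists. The sketch assumes a caller-supplied pairwise term similarity `term_sim` and uses a best-match average, one common way to aggregate term similarities; the article's own measure may differ, and the category names are placeholders.

```python
def jaccard(a, b):
    """Overlap of two category lists as |A & B| / |A | B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def best_match_average(a, b, term_sim):
    """Symmetric best-match average of pairwise term similarities."""
    a, b = list(a), list(b)
    best_a = [max(term_sim(x, y) for y in b) for x in a]
    best_b = [max(term_sim(x, y) for x in a) for y in b]
    return (sum(best_a) + sum(best_b)) / (len(a) + len(b))


# Hypothetical symmetric similarities between placeholder GO categories.
SIM = {frozenset({"c1", "c1b"}): 0.9, frozenset({"c2", "c2b"}): 0.8}


def toy_sim(x, y):
    return 1.0 if x == y else SIM.get(frozenset({x, y}), 0.0)


# Significant categories at two hypothetical DEG thresholds.
cats_t1, cats_t2 = ["c1", "c2"], ["c1b", "c2b"]
print(jaccard(cats_t1, cats_t2), best_match_average(cats_t1, cats_t2, toy_sim))
```

Here the two lists share no category, so the overlap count gives 0.0, yet the best-match average is about 0.85 because each category has a close semantic counterpart in the other list.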

19.
Cancer progression is due to the accumulation of recurrent genomic alterations that induce growth advantage and clonal expansion. Most of these genomic changes can be detected using the array comparative genomic hybridization (CGH) technique. The accurate classification of these genomic alterations is expected to have an important impact on translational and basic research. Here we review recent advances in CGH technology used in the characterization of different features of breast cancer. First, we present bioinformatics methods that have been developed for the analysis of CGH arrays; next, we discuss the use of array CGH technology to classify tumor stages and to identify and stratify subgroups of patients with different prognoses and clinical behaviors. We finish our review with a discussion of how CGH arrays are being used to identify oncogenes, tumor suppressor genes, and breast cancer susceptibility genes.

20.

Background  

The availability of various high-throughput experimental and computational methods allows biologists to rapidly infer functional relationships between genes. It is often necessary to evaluate these predictions computationally, a task that requires a reference database for functional relatedness. One such reference is the Gene Ontology (GO). A number of groups have suggested that the semantic similarity of the GO annotations of genes can serve as a proxy for functional relatedness. Here we evaluate a simple measure of semantic similarity, term overlap (TO).
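Term overlap is commonly defined as the number of GO terms shared by two genes after each gene's annotations are propagated to all ancestor terms, with a normalized variant that divides by the size of the smaller propagated annotation set. The sketch below follows that definition on a hypothetical toy ontology; the function names are illustrative.

```python
def propagate(terms, parents):
    """Close a set of GO terms under the true-path rule (add all ancestors)."""
    closed, stack = set(terms), list(terms)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in closed:
                closed.add(p)
                stack.append(p)
    return closed


def term_overlap(g1_terms, g2_terms, parents, normalized=False):
    """TO = number of shared propagated terms; the normalized form divides
    by the size of the smaller propagated annotation set."""
    t1, t2 = propagate(g1_terms, parents), propagate(g2_terms, parents)
    shared = len(t1 & t2)
    return shared / min(len(t1), len(t2)) if normalized else shared


# Hypothetical toy ontology: term -> list of parent terms.
parents = {"A1": ["A"], "A2": ["A"], "A": ["root"], "B": ["root"]}
print(term_overlap(["A1"], ["A2"], parents))                   # 2 (A and root)
print(term_overlap(["A1"], ["A2"], parents, normalized=True))  # 2/3
```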
