Similar literature
 20 similar articles found (search time: 15 ms)
1.
Disease-causing aberrations in the normal function of a gene define that gene as a disease gene. Proving a causal link between a gene and a disease experimentally is expensive and time-consuming. Comprehensive prioritization of candidate genes prior to experimental testing drastically reduces the associated costs. Computational gene prioritization is based on various pieces of correlative evidence that associate each gene with the given disease and suggest possible causal links. A fair amount of this evidence comes from high-throughput experimentation. Thus, well-developed methods are necessary to reliably deal with the quantity of information at hand. Existing gene prioritization techniques already significantly improve the outcomes of targeted experimental studies. Faster and more reliable techniques that account for novel data types are necessary for the development of new diagnostics, treatments, and cures for many diseases.
This article is part of the “Translational Bioinformatics” collection for PLOS Computational Biology.

What to Learn in This Chapter

  • Identification of specific disease genes is complicated by gene pleiotropy, polygenic nature of many diseases, varied influence of environmental factors, and overlying genome variation.
  • Gene prioritization is the process of assigning likelihood of gene involvement in generating a disease phenotype. This approach narrows down, and arranges in the order of likelihood in disease involvement, the set of genes to be tested experimentally.
  • The gene “priority” in disease is assigned by considering a set of relevant features such as gene expression and function, pathway involvement, and mutation effects.
  • In general, disease genes tend to 1) interact with other disease genes, 2) harbor functionally deleterious mutations, 3) code for proteins localizing to the affected biological compartment (pathway, cellular space, or tissue), 4) have distinct sequence properties such as longer length and a higher number of exons, 5) have more orthologues and fewer paralogues.
  • Data sources (directly experimental, extracted from knowledge-bases, or text-mining based) and mathematical/computational models used for gene prioritization vary widely.

2.
Based on the hypothesis that the neighbors of disease genes tend to cause similar diseases, network-based methods for disease prediction have received increasing attention. Because they take full advantage of network structure, global distance measurements generally outperform local distance measurements. However, global distance measurements have problems of their own. For example, they may mistake non-disease hub proteins that have dense interactions with known disease proteins for potential disease proteins. To find a method that avoids this problem, we analyzed the differences between disease proteins and other proteins by using essential proteins (proteins encoded by essential genes) as references. We found that disease proteins are not well connected with essential proteins in protein interaction networks. Based on this finding, we proposed a novel strategy for gene prioritization based on protein interaction networks. We allocated positive flow to disease genes and negative flow to essential genes, and adopted network propagation for gene prioritization. Experimental results on 110 diseases verified the effectiveness and potential of the proposed method.
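The signed-flow idea in this abstract can be illustrated with a minimal sketch. It assumes a plain random walk with restart as the propagation scheme, which the abstract does not specify; the function name `propagate`, the restart value, and the toy network are hypothetical and are not taken from the paper.

```python
def propagate(adj, seeds, restart=0.5, iters=100):
    """Random walk with restart over an undirected graph (illustrative only).

    adj   -- dict mapping each node to the set of its neighbours
    seeds -- dict mapping seed nodes to their initial flow
             (positive for disease genes, negative for essential genes)
    """
    p0 = {n: seeds.get(n, 0.0) for n in adj}
    p = dict(p0)
    for _ in range(iters):
        # Each node collects degree-normalised flow from its neighbours,
        # then mixes it with its own initial flow.
        p = {
            n: (1 - restart) * sum(p[m] / len(adj[m]) for m in adj[n])
            + restart * p0[n]
            for n in adj
        }
    return p


# Hypothetical toy network: hub H touches both the disease gene D1 and the
# essential gene E1; candidate C touches only D1.
adj = {
    "D1": {"C", "H"},
    "C": {"D1"},
    "H": {"D1", "E1", "X"},
    "E1": {"H"},
    "X": {"H"},
}
positive_only = propagate(adj, {"D1": 1.0})
signed = propagate(adj, {"D1": 1.0, "E1": -1.0})
```

In this toy case the hub outranks the candidate when only positive flow is used, and the ordering flips once the essential gene contributes negative flow. The example only shows the mechanism and says nothing about performance on real interaction networks.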

3.
A network-based approach has proven useful for the identification of novel genes associated with complex phenotypes, including human diseases. Because network-based gene prioritization algorithms propagate information about known phenotype-associated genes through networks, the pathway structure of each phenotype might significantly affect their effectiveness. We systematically compared two popular network algorithms with distinct mechanisms – direct neighborhood, which propagates information only to direct network neighbors, and network diffusion, which diffuses information throughout the entire network – in the prioritization of genes for worm and human phenotypes. Previous studies reported that network diffusion generally outperforms direct neighborhood for human diseases. Although prioritization power is generally measured over all ranked genes, only the top candidates matter for subsequent functional analysis. We found that high prioritizing power of a network algorithm over all genes cannot guarantee successful prioritization of the top-ranked candidates for a given phenotype. Indeed, for the majority of the phenotypes that were more efficiently prioritized by network diffusion, direct neighborhood showed higher prioritizing power for the top candidates. We also found that connectivity among pathway genes for each phenotype largely determines which network algorithm is more effective, suggesting that the network algorithm used for each phenotype should be chosen with consideration of pathway gene connectivity.
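The distinction between whole-ranking performance and top-candidate performance can be made concrete with a small sketch. The two rankings below are invented for illustration and are not outputs of either algorithm; `auc` and `top_k_hits` are hypothetical helper names.

```python
def auc(ranking, positives):
    """Fraction of (positive, negative) pairs in which the positive ranks higher."""
    n_pos = sum(1 for g in ranking if g in positives)
    n_neg = len(ranking) - n_pos
    positives_above = 0
    correct_pairs = 0
    for gene in ranking:
        if gene in positives:
            positives_above += 1
        else:
            correct_pairs += positives_above
    return correct_pairs / (n_pos * n_neg)


def top_k_hits(ranking, positives, k):
    """Number of true genes among the first k candidates."""
    return sum(1 for g in ranking[:k] if g in positives)


positives = {"p1", "p2", "p3"}
# Hypothetical ranking A: good overall, but the very top is occupied by negatives.
ranking_a = ["n1", "n2", "p1", "p2", "p3", "n3", "n4", "n5", "n6", "n7"]
# Hypothetical ranking B: worse overall, but a true gene is ranked first.
ranking_b = ["p1", "n1", "n2", "n3", "p2", "n4", "n5", "p3", "n6", "n7"]

auc_a, auc_b = auc(ranking_a, positives), auc(ranking_b, positives)
hits_a, hits_b = top_k_hits(ranking_a, positives, 2), top_k_hits(ranking_b, positives, 2)
```

Ranking A has the higher AUC (15/21 against 13/21) yet recovers no true gene in the top two, while ranking B recovers one. This is the kind of disagreement between the two measures that the abstract describes.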

4.
5.
Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have successfully exploited this concept by capturing the interaction of genes or proteins in a score. Nonetheless, most current approaches suffer from at least some of the following limitations: (1) networks comprise only curated physical interactions, leading to poor genome coverage and density and to bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs, considering the full topology of weighted gene association networks that integrate heterogeneous sources, should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion, and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, increasing the gap to neighborhood and shortest-path scores mostly on single-source networks. Heterogeneous associations consistently delivered superior performance over single-source data across the majority of methods. Results on the contribution of confidence weights were inconclusive.
Finally, the best Interactogeneous strategy, heat diffusion ranking with associations from the STRING database, was used to prioritize genes for Parkinson’s disease. This method effectively recovered known genes and uncovered interesting candidates which could be linked to pathogenic mechanisms of the disease.
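As a rough illustration of heat diffusion ranking on a confidence-weighted network, the sketch below integrates dh/dt = -L h (with L the weighted graph Laplacian) by explicit Euler steps. This is an assumed textbook formulation, not the authors' implementation; the function name, the diffusion time `t`, and the edge weights are hypothetical.

```python
def heat_diffusion(weights, seeds, t=1.0, steps=1000):
    """Approximate exp(-t L) h0 by explicit Euler integration.

    weights -- dict {(u, v): confidence} describing undirected weighted edges
    seeds   -- dict {node: initial heat}, e.g. 1.0 on known disease genes
    """
    nodes = sorted({n for edge in weights for n in edge})
    nbrs = {n: {} for n in nodes}
    for (u, v), w in weights.items():
        nbrs[u][v] = w
        nbrs[v][u] = w
    h = {n: seeds.get(n, 0.0) for n in nodes}
    dt = t / steps
    for _ in range(steps):
        # Heat flows along each edge in proportion to its confidence weight.
        h = {
            n: h[n] - dt * sum(w * (h[n] - h[m]) for m, w in nbrs[n].items())
            for n in nodes
        }
    return h


# Hypothetical network: candidate A has a high-confidence link to seed S,
# candidate B a low-confidence one.
weights = {("S", "A"): 0.9, ("S", "B"): 0.2}
heat = heat_diffusion(weights, {"S": 1.0})
ranked = sorted((n for n in heat if n != "S"), key=heat.get, reverse=True)
```

Total heat is conserved, and the candidate joined to the seed by the higher-confidence edge receives more of it, so it ranks first. On a real network the step size would need to be small relative to the largest weighted degree for the Euler scheme to stay stable.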

6.
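Structure and function of the mitochondrial fusion proteins Mfn1/2   Total citations: 1, self-citations: 0, citations by others: 1
The mitofusin gene (Mfn) encodes two proteins in mammals, Mfn1 and Mfn2. They play important roles in mitochondrial fusion, fission, and apoptosis, and regulate dynamic changes in mitochondrial morphology. In addition, Mfn1/2 participate in mitochondrial energy metabolism and are closely associated with the development of related diseases.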

7.
Sarcomas are relatively rare malignancies and include a large number of histological subgroups. Based on morphology alone, the differential diagnosis of sarcoma subtypes can be challenging, but identifying specific fusion genes aids correct diagnosis. The presence of individual fusion products is routinely investigated in pathology labs. However, the methods used are time-consuming and rely on prior knowledge of the expected fusion gene and often the most likely breakpoint. In this study, 16 sarcoma samples, representing seven different sarcoma subtypes with known fusion gene status from a diagnostic setting, were investigated using a fusion gene microarray. The microarray was designed to detect all possible exon-exon breakpoints between all known fusion genes in a single analysis. An automated scoring of the microarray data from the 38 known sarcoma-related fusion genes identified the correct fusion gene among the top three hits in 11 of the samples. The analytical sensitivity may be further optimised, but we conclude that a sarcoma fusion gene microarray is suitable as a time-saving screening tool to identify the majority of the correct fusion genes.

8.
9.
Selecting relevant features is a common task in most OMICs data analyses, where the aim is to identify a small set of key features to be used as biomarkers. To this end, two alternative but equally valid methods are mainly available, namely the univariate (filter) and the multivariate (wrapper) approach. The stability of the selected lists of features is an often neglected but very important requirement. If the same features are selected in multiple independent iterations, they are more likely to be reliable biomarkers. In this study, we developed and evaluated the performance of a novel method for feature selection and prioritization, aiming at generating robust and stable sets of features with high predictive power. The proposed method uses fuzzy logic for a first unbiased feature selection and a Random Forest built from conditional inference trees to prioritize the candidate discriminant features. Analyzing several multi-class gene expression microarray data sets, we demonstrate that our technique provides equal or better classification performance and greater stability compared to other Random Forest-based feature selection methods.
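One common way to quantify the stability mentioned in this abstract is to compare the feature lists obtained from repeated runs. The sketch below uses selection frequency and mean pairwise Jaccard similarity. Both are generic measures chosen here for illustration, not necessarily those used in the study, and the feature lists are invented.

```python
from collections import Counter
from itertools import combinations


def selection_frequency(runs):
    """Fraction of runs in which each feature was selected."""
    counts = Counter(f for run in runs for f in set(run))
    return {f: c / len(runs) for f, c in counts.items()}


def mean_jaccard(runs):
    """Mean pairwise Jaccard similarity between the selected feature sets.

    Assumes at least two runs and non-empty selections.
    """
    pairs = list(combinations([set(r) for r in runs], 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Hypothetical selections from three independent resampling iterations.
runs = [["g1", "g2", "g3"], ["g1", "g2", "g4"], ["g1", "g3", "g4"]]
freq = selection_frequency(runs)
stability = mean_jaccard(runs)
```

Here `g1` is selected in every run (frequency 1.0), while the overall stability is 0.5 because each pair of runs shares two of four distinct features.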

10.
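Living cells rely on numerous transcriptional regulatory modules to carry out complex biological functions, and identifying these modules is important for a deeper understanding of cell function and transcriptional mechanisms. Combining yeast gene expression data with ChIP-chip data, this paper proposes an algorithm for identifying transcriptional regulatory modules. The algorithm applies different P-value thresholds to obtain a core set and a rough set, then discriminates between the core set and the rough set, and finally extends the genes to obtain the transcriptional regulatory modules. Applied to two yeast gene expression data sets, the algorithm yielded a number of transcriptional regulatory modules of significant biological relevance. Compared with other algorithms, it can identify modules containing larger numbers of genes as well as some modules that other algorithms fail to identify. The identified modules have distinct biological functions and help to further understand the transcriptional regulatory mechanisms of yeast.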

11.
12.
Identification of functional sets of genes associated with conditions of interest from omics data was first reported in 1999, and since then a plethora of enrichment methods have been published for systematic analysis of gene set collections including Gene Ontology and biological pathways. Despite their widespread use in reducing the complexity of omics experiment results, their performance is poorly understood. Leveraging the existence of disease-specific gene sets in the KEGG and Metacore® databases, we compared the performance of sixteen methods under relaxed assumptions using 42 real datasets (over 1,400 samples). Most of the methods ranked the gene sets designed for specific diseases highly whenever samples from affected individuals were compared against controls via microarrays. The top methods for gene set prioritization were different from the top ones in terms of sensitivity, and four of the sixteen methods had high false positive rates, as assessed by permuting the phenotype of the samples. The best overall methods among those that generated reasonably low false positive rates when permuting phenotypes were PLAGE, GLOBALTEST, and PADOG. The best method in the category that generated higher than expected false positives was MRGSE.
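Permuting phenotype labels, as used in this abstract to assess false positives, can be sketched as follows. The gene set statistic (mean case-control difference) is a deliberately simple stand-in and is not one of the sixteen methods compared; the expression values, function names, and permutation count are hypothetical.

```python
import random


def set_score(expr, labels, gene_set):
    """Mean over the set's genes of (mean in cases - mean in controls)."""
    cases = [i for i, lab in enumerate(labels) if lab == 1]
    ctrls = [i for i, lab in enumerate(labels) if lab == 0]
    diffs = []
    for gene in gene_set:
        row = expr[gene]
        diffs.append(
            sum(row[i] for i in cases) / len(cases)
            - sum(row[i] for i in ctrls) / len(ctrls)
        )
    return sum(diffs) / len(diffs)


def permutation_p(expr, labels, gene_set, n_perm=500, seed=0):
    """One-sided p-value: how often shuffled labels score at least as high."""
    rng = random.Random(seed)
    observed = set_score(expr, labels, gene_set)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if set_score(expr, shuffled, gene_set) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: five cases followed by five controls.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
expr = {
    "a": [5.1, 5.3, 4.9, 5.2, 5.0, 1.0, 1.2, 0.9, 1.1, 1.0],
    "b": [4.8, 5.0, 5.1, 4.9, 5.2, 1.1, 0.8, 1.0, 1.2, 0.9],
}
p_value = permutation_p(expr, labels, ["a", "b"])
```

Applying the same procedure to data with no real phenotype effect and counting how often a method still reports significance gives an empirical false positive rate, which is the quantity the comparison above refers to.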

13.
Fusion genes formed by chromosomal rearrangements are common drivers of cancer. Recent innovations in the field of next-generation sequencing (NGS) have seen a dynamic shift from traditional fusion detection approaches, such as visual characterization by fluorescence, to more precise multiplexed methods. There are many different NGS-based approaches to fusion gene detection, and deciding on the most appropriate method can be difficult. Beyond the experimental approach, consideration needs to be given to factors such as the ease of implementation, processing time, associated costs, and the level of expertise required for data analysis. Here, the different NGS-based methods for fusion gene detection, the basic principles underlying the techniques, and the benefits and limitations of each approach are reviewed. This article concludes with a discussion of how NGS will impact fusion gene detection in a clinical context and of where the next innovations are emerging.

14.
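Fusion of the α and β1 toxin genes of Clostridium perfringens type C   Total citations: 3, self-citations: 0, citations by others: 3
Using PCR, the α and β1 toxin genes were amplified from the chromosomal DNA of Clostridium perfringens type C. Through isolation, purification, restriction enzyme digestion, ligation, and transformation, the recombinant strain BL21(DE3)(pETXAB1), carrying an expression plasmid containing the αβ1 fusion gene, was constructed. Restriction analysis and nucleotide sequencing confirmed that the recombinant plasmid pETXAB1 contains the αβ1 fusion gene and that both the gene sequence and the reading frame are correct. ELISA showed that the αβ1 fusion protein expressed by the recombinant strain is recognized by antibodies against the α and β1 toxins. Immunization experiments showed that mice immunized with the αβ1 fusion protein resisted challenge with 1 MLD of toxin from C. perfringens type C strain C5944, indicating that the recombinant strain is a candidate strain for a genetically engineered subunit vaccine against red dysentery in piglets.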

15.
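A Musca domestica cecropin-human lysozyme (Mdc-hly) fusion gene was constructed and expressed in Escherichia coli. The mature peptide coding sequences of M. domestica cecropin and human lysozyme were amplified separately by RT-PCR, and the fusion gene was assembled using the Gene-SOEing technique. The fusion gene was cloned into the pET32a expression vector and transformed into E. coli BL21(DE3); IPTG induction gave high-level expression of a fusion protein with a molecular weight of about 38 kD. Western blotting confirmed the antigenic activity of the expressed protein. The fusion gene was successfully constructed and expressed in a prokaryotic system, laying the foundation for further study of its biological activity.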

16.

Background

Candidate gene prioritization aims to identify promising new genes associated with a disease or a biological process from a larger set of candidate genes. In recent years, network-based methods – which utilize a knowledge network derived from biological knowledge – have been utilized for gene prioritization. Biological knowledge can be encoded either through the network's links or nodes. Current network-based methods can only encode knowledge through links. This paper describes a new network-based method that can encode knowledge in links as well as in nodes.

Results

We developed a new network inference algorithm called the Knowledge Network Gene Prioritization (KNGP) algorithm which can incorporate both link and node knowledge. The performance of the KNGP algorithm was evaluated on both synthetic networks and on networks incorporating biological knowledge. The results showed that the combination of link knowledge and node knowledge provided a significant benefit across 19 experimental diseases over using link knowledge alone or node knowledge alone.

Conclusions

The KNGP algorithm provides an advance over current network-based algorithms, because the algorithm can encode both link and node knowledge. We hope the algorithm will aid researchers with gene prioritization.

17.
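In this study, an improved SOE-PCR technique was used to construct a fusion gene of the hepatic-targeting cell-penetrating peptide (HTPP) and Musca domestica cecropin (MDC), and its molecular characteristics were predicted and analyzed. HTPP and MDC were successfully fused, and the recombinant cloning plasmid HTPP-MDC/pMD20-T carrying the HTPP-MDC fusion gene was constructed. PCR and KpnI/HindIII double digestion yielded a fragment of the expected size, and sequencing showed that the gene sequence carried no mutations and matched the expected sequence exactly. Molecular characterization showed that the fusion gene encodes 60 amino acids, with a molecular weight of 6516.2 Da and a theoretical isoelectric point of 9.31; the secondary structure consists mainly of α-helices, random coils, extended strands, and β-turns. These results lay the foundation for subsequent functional studies of HTPP-MDC and provide a useful reference for constructing fusion genes by SOE-PCR.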

18.
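Auxilin induces the binding of Hsp70c to clathrin and plays an important role in the uncoating of coated vesicles in eukaryotic cells. Through integrated analysis of existing EST, STS, and other databases, we mapped the human auxilin gene to 1p31, between markers D1S515 and D1S198. Twenty-six ESTs formed five contigs covering about 2.3 kb of partial cDNA sequence of the gene, of which the coding region is 501 bp long; the resulting sequence shows very high homology to the bovine auxilin gene. The EST data indicate that auxilin is expressed in a variety of human embryonic tissues, and also in adult brain and epidermal tissue.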

19.
Genes involved in the same function tend to have similar evolutionary histories, in that their rates of evolution covary over time. This coevolutionary signature, termed Evolutionary Rate Covariation (ERC), is calculated using only gene sequences from a set of closely related species and has demonstrated potential as a computational tool for inferring functional relationships between genes. To further define applications of ERC, we first established that roughly 55% of genetic diseases possess an ERC signature between their contributing genes. At a false discovery rate of 5% we report 40 such diseases, including cancers, developmental disorders, and mitochondrial diseases. Given these coevolutionary signatures between disease genes, we then assessed ERC's ability to prioritize known disease genes out of a list of unrelated candidates. We found that in the presence of an ERC signature, the true disease gene is effectively prioritized to the top 6% of candidates on average. We then applied this strategy to a melanoma-associated region on chromosome 1 and identified MCL1 as a potential causative gene. Furthermore, to gain global insight into disease mechanisms, we used ERC to predict molecular connections between 310 nominally distinct diseases. The resulting “disease map” network associates several diseases with related pathogenic mechanisms and unveils many novel relationships between clinically distinct diseases, such as between Hirschsprung's disease and melanoma. Taken together, these results demonstrate the utility of molecular evolution as a gene discovery platform and show that evolutionary signatures can be used to build informative gene-based networks.
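The core of an ERC-style score is a correlation between the branch-specific evolutionary rates of two genes. The sketch below assumes those relative rates have already been estimated for each branch of a shared species tree, a step the abstract does not detail. The rate values, gene names, and the `rank_candidates` helper are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length rate vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_candidates(rates, known, candidates):
    """Rank candidates by their mean rate correlation with known disease genes."""
    scores = {
        c: sum(pearson(rates[c], rates[k]) for k in known) / len(known)
        for c in candidates
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical relative rates on the same five branches of a species tree.
rates = {
    "geneA": [1.2, 0.8, 1.5, 0.6, 1.0],  # known disease gene
    "geneB": [1.1, 0.9, 1.4, 0.7, 1.0],  # covaries with geneA
    "geneC": [0.7, 1.3, 0.6, 1.4, 1.0],  # varies in the opposite direction
}
ranking = rank_candidates(rates, known=["geneA"], candidates=["geneB", "geneC"])
```

In this toy case `geneB` ranks first because its rates rise and fall on the same branches as those of the known gene. A real analysis would also need to control for genome-wide rate variation and assess significance.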

20.
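A novel combined gene vector, pcDNA3.1(-)VEGF-siRNA/yCDglyTK, was constructed, and its expression and cytotoxic effect in the human gastric cancer cell line SGC7901 were studied. The interference plasmid pGenesil-VEGF-siRNA, targeting vascular endothelial growth factor (VEGF), was constructed; the siRNA expression cassette (including the U6 promoter) was amplified from it by PCR and subcloned into the double suicide gene vector pcDNA3.1(-)CV-yCDglyTK to give the combined gene plasmid pcDNA3.1(-)VEGF-siRNA/yCDglyTK. The recombinant plasmid was verified by restriction digestion and sequencing. Using calcium phosphate nanoparticles as the carrier, the interference plasmid, the double suicide gene plasmid, and the combined gene plasmid were transfected into SGC7901 cells. Target gene expression was verified by RT-PCR and Western blot, and the sensitivity of the transfected cells to 5-fluorocytosine (5-FC) was measured by MTT assay. The results showed that restriction digestion and sequencing confirmed successful construction of the combined gene vector pcDNA3.1(-)VEGF-siRNA/yCDglyTK; after transfection of SGC7901 cells with the combined gene plasmid, RT-PCR and Western blot confirmed expression of the fused suicide gene and down-regulation of VEGF expression; in the presence of the prodrug 5-FC, cells transfected with the combined gene plasmid had the lowest survival rate, a statistically significant difference from the other groups. Successfully constructed the combined gene vector pcDNA3.1(-)VEGF-si...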
