Similar literature
20 similar documents found
1.
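An integrated decision method for mining complex-disease-related genes from DNA microarray data   Total citations: 11, self-citations: 2, citations by others: 9
The rapid development of DNA microarray technology makes it possible to measure the expression profiles of thousands of genes simultaneously, giving life scientists a new angle from which to elucidate the nature of life. At present, most work on gene expression profile analysis concentrates on classifying diseases such as cancer and identifying disease subtypes. Mining from these profiles the genes that reflect the essential characteristics of a disease is a more challenging research problem in the post-genomic era, yet gene mining has been neglected for lack of suitable data mining techniques. We propose a novel integrated decision method for feature gene mining that aims to address three important biological problems: biological classification and disease subtyping, in-depth mining of genes related to complex diseases, and goal-driven gene network construction. We successfully applied the integrated decision method to a colon cancer DNA expression profile dataset. The results show that this feature gene mining technique is highly valuable for analyzing DNA microarray data and mining genes related to complex diseases.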

2.
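A method for mining disease subtype feature genes based on gene expression profiles   Total citations: 1, self-citations: 0, citations by others: 1
This study proposes a method for mining feature genes of disease subtypes from gene expression profiles. Working on filtered expression profiles, the method combines unsupervised clustering for identifying disease subtypes with a proposed pattern quality measure that gauges how well a feature gene discriminates between subtypes, and performs feature gene mining in an embedded manner. The method was applied to experimental expression data for 2000 genes in 40 colon cancer tissue samples and 22 normal colon tissue samples. The results show that the method is feasible for mining disease subtype feature genes, and that its advantage is that subtype partitioning and feature gene identification can be carried out in parallel.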

3.
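Research on the etiology and pathogenesis of prostate cancer aids its prevention and treatment. At present, biochemical experimental approaches to prostate cancer are costly and time-consuming, while network-based computational methods are constrained by incomplete and noisy gene expression data and small numbers of experimental samples. This paper therefore proposes an algorithm with dual constraints based on node-module confidence and local modularity (named NMCOM) for mining candidate disease modules for prostate cancer. NMCOM does not rely on gene expression data. It selects starting nodes using a fused ranking strategy that combines a consistency score between candidate genes and disease phenotypes with a semantic similarity score between candidate genes and known disease genes, and then mines candidate disease modules under the dual constraints of node-module confidence and local modularity. Enrichment analysis of the mined modules yielded 18 significant candidate disease gene modules. Compared with single-score ranking methods and random walk with restart, the NMCOM fused ranking strategy gives a smaller average rank ratio and a larger AUC, its results are clearly better than those of other module mining algorithms, and the modules are biologically significant. NMCOM not only mines prostate cancer candidate disease modules accurately and effectively but can also be extended to mine candidate modules for other diseases.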

4.
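Since the idea of the genome-wide association study (GWAS) was proposed, GWAS has been widely used in association studies of human complex diseases and rice agronomic traits. However, as a typical single-marker approach, GWAS cannot detect genetic variants with small effects, while the joint effects of rare variants are often closely related to phenotype; GWAS results therefore need further in-depth data mining. Pathway-based analysis (PBA) is a method for secondary mining of GWAS results that draws on information such as gene function and biological metabolic pathways. It can extract from GWAS results the pathways associated with traits and diseases, as well as sets of genes sharing the same function, thereby yielding more genetic information. This paper briefly reviews the emergence of PBA, its computational methods, and related software, as a reference for researchers carrying out pathway analysis.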

5.
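The onset and progression of complex diseases are closely linked to the dysfunction of biological pathways in the body, so using computational methods to study the relationship between diseases and pathways from high-throughput data is of great importance. This paper proposes a new network-based global method for pathway identification. The method uses protein interaction information and the gene set composition of pathways to build a complex protein-pathway network. Then, based on expression profile data, it uses a random walk algorithm to optimize disease risk pathways at the global level. Finally, it identifies statistically significant risk pathways by perturbation. Applied to colorectal cancer risk pathway identification, the approach identified 15 pathways significantly associated with the onset and progression of colorectal cancer. Compared with other pathway identification methods (hypergeometric test, SPIA), it identifies disease-related risk pathways more effectively.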
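
The random walk step in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes a generic random walk with restart on an unweighted, undirected protein-pathway graph. The node names, the seed weights (standing in for expression-derived scores), and the `restart` parameter are all hypothetical.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-12, max_iter=10000):
    """Return the stationary visiting probability of every node.

    adj   : dict mapping node -> set of neighbours (every node has >= 1)
    seeds : dict mapping seed node -> non-negative weight; seeds must be
            nodes of adj (assumed input, e.g. expression-derived scores)
    """
    nodes = sorted(adj)
    total = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}  # restart distribution
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Otherwise it moves to a uniformly chosen neighbour.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: proteins P1..P3, pathways W1 and W2.
toy_adj = build_adjacency([
    ("P1", "P2"), ("P2", "P3"),   # protein-protein interactions
    ("P1", "W1"), ("P2", "W1"),   # pathway membership
    ("P3", "W2"),
])
toy_scores = random_walk_with_restart(toy_adj, {"P1": 2.0, "P2": 1.0})
pathway_ranking = sorted(["W1", "W2"], key=toy_scores.get, reverse=True)
```

In this toy case the pathway whose members carry the seed weights ranks first. Assessing significance, as the abstract describes, would additionally require comparing the scores against scores from perturbed inputs, which is not shown here.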

6.
李霞  姜伟  张帆 《生物物理学报》2007,23(4):296-306
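Identifying target genes related to complex diseases, constructing disease-driven networks of related genes, and studying disease mechanisms are very important scientific problems in functional genomics. From the viewpoint of computational systems biology and from a three-dimensional perspective, this article reviews methods for identifying complex disease target genes based on biological profiles (SNP genetic profiles, microarray expression profiles, 2D-PAGE protein profiles, etc.), methods for the reverse reconstruction of genetic networks at multiple levels (virtual SNP networks, gene regulatory networks, protein interaction networks, etc.), and the vertical mapping relationships, in both biology and topology, between networks at different levels. It also offers an outlook on future research in computational systems biology methods for complex disease target gene identification and its relationship to networks.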

7.
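The field of genetics currently faces a major challenge in distinguishing complex traits. Many methods have been used to meet this challenge; molecular markers, QTL mapping, and sequence analysis are the main strategies for distinguishing the genes that control complex traits. Measuring complex traits is important for studying biodiversity and is also a key route to further understanding how genes control traits. However, existing methods are neither mature nor complete, which makes it difficult to distinguish complex traits effectively. In recent years, because growth curves can describe complex traits effectively, distinguishing complex traits on the basis of growth curves has become a commonly used approach, and Functional Mapping (FM) is a representative method of this kind. Over the past decade FM has been the most effective method for distinguishing complex traits, but it cannot handle non-monotonic growth curves effectively. The Earliness index (E-index) method solves the problem that non-monotonic curves could not be identified effectively: it can describe the developmental process of a complex trait of any biological type as a growth curve and distinguish it. Based on the principle of the E-index method, an analysis tool called Eindex Application (EIA) was developed. The tool integrates the E-index method, uses biological data visualization to draw growth curves dynamically, and includes data acquisition, data processing, and result output functions, providing a good platform for geneticists. Simulation results show that the EIA tool is efficient, real-time, and accurate, and is a powerful tool for distinguishing complex traits.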

8.
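Glioma is a serious intracranial tumor disease characterized by a high recurrence rate, high mortality, and a low cure rate. Using gene microarray data to identify feature genes related to glioma can provide a useful reference for clinical diagnosis and biomedical research on the disease. For glioma data, the authors propose an ensemble random-forest-like feature gene selection method. First, supervised singular value decomposition is applied to reduce the dimensionality of the data and coarsely select genes; second, a random-forest-like feature selection method is applied to select feature genes. Experimental results show that the method adapts well to different classifiers and has a clear advantage in classification rate over other methods. More importantly, 39 of the top 50 selected feature genes are closely linked to glioma or to biological processes of tumor cells, confirming that the method not only maintains a high classification rate but also ensures that the selected feature genes are biologically meaningful, and that it is feasible and practical.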

9.
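Abstract. Objective: Rheumatoid arthritis is a systemic chronic inflammatory disease that can affect many tissues and organs and mainly attacks flexible joints. About 1% of the world's population has rheumatoid arthritis. Some genes have been confirmed to be associated with rheumatoid arthritis, but they explain only a small part of the genetic risk, so new strategies and methods are needed. Methods: An expression quantitative trait locus (eQTL) is a genomic locus that regulates the expression of a gene or protein. This paper uses eQTL data to construct gene-gene networks and mine candidate risk genes for rheumatoid arthritis. Results: First, using eQTL data, we built gene-gene networks based on the co-regulation coefficients between genes, at five different thresholds (0, 0.2, 0.4, 0.6, and 0.8). Next, we searched the OMIM and GAD databases for 186 genes already confirmed to be associated with rheumatoid arthritis. Finally, we placed these 186 genes into each of the five networks and used gene-gene correlations to mine candidate risk genes that may be associated with rheumatoid arthritis. Conclusion: This paper constructs eQTL-based gene-gene networks and combines them with known rheumatoid arthritis risk genes to mine unknown risk genes. The good results obtained demonstrate the effectiveness of the method, which is valuable for research on the pathogenesis of rheumatoid arthritis. The method can also be extended to other complex diseases, so it has considerable theoretical and practical value for research on human complex diseases.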
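
The thresholded network construction in this abstract can be sketched roughly as follows. The abstract does not say how candidates are scored or whether the threshold comparison is strict, so this sketch assumes an edge is kept when its co-regulation coefficient is strictly greater than the threshold, and ranks candidates by how many known risk genes they neighbour. The gene names and coefficients are hypothetical toy values.

```python
def build_network(coreg, threshold):
    """Keep gene pairs whose co-regulation coefficient exceeds the threshold.

    coreg: dict mapping (gene_a, gene_b) -> coefficient (assumed input format).
    Returns an undirected adjacency map.
    """
    adj = {}
    for (a, b), weight in coreg.items():
        if weight > threshold:  # strict comparison is an assumption
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def candidate_genes(adj, known):
    """Count, for each non-known gene, how many known risk genes it neighbours."""
    counts = {}
    for gene in known:
        for neighbour in adj.get(gene, ()):
            if neighbour not in known:
                counts[neighbour] = counts.get(neighbour, 0) + 1
    return counts


# Hypothetical toy coefficients and known risk genes.
toy_coreg = {
    ("G1", "G2"): 0.9,
    ("G1", "G3"): 0.5,
    ("G2", "G4"): 0.3,
    ("G3", "G4"): 0.1,
}
toy_known = {"G1", "G2"}

# One candidate table per threshold used in the abstract.
candidates_by_threshold = {
    t: candidate_genes(build_network(toy_coreg, t), toy_known)
    for t in (0, 0.2, 0.4, 0.6, 0.8)
}
```

Raising the threshold prunes weaker edges, so the candidate set shrinks; comparing the tables across thresholds shows which candidates depend only on weak co-regulation.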

10.
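Predicting coronary heart disease genes based on functional consistency   Total citations: 1, self-citations: 0, citations by others: 1
Objective: To mine potential disease-causing genes based on functional consistency, in order to understand disease pathogenesis and improve clinical treatment. Methods: Based on the co-localization of functionally consistent genes, combined with the topology of the protein interaction network, a set of disease candidate genes was obtained and further screened by GO and KEGG functional enrichment analysis to predict new disease-causing genes. Results: For most of the 59 coronary heart disease genes obtained, the literature confirms a link with the onset and progression of the disease. Conclusion: The method is feasible and gives researchers a sound basis for studying disease pathogenesis.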
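
The abstract does not say which statistic underlies the GO and KEGG enrichment step. A common choice for this kind of screening is the one-sided hypergeometric test, sketched below as an assumption rather than as the authors' procedure. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population : number of background genes
    annotated  : background genes carrying the term (e.g. a GO term)
    drawn      : size of the candidate gene set
    overlap    : candidate genes carrying the term
    """
    upper = min(annotated, drawn)
    # Sum the probability mass of every overlap at least as large as observed.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(population, drawn)


# Hypothetical example: 3 of 10 background genes carry a term, and all 3
# genes in the candidate set carry it.
example_p = hypergeom_enrichment_p(population=10, annotated=3, drawn=3, overlap=3)
```

In practice the p-values for many terms would need a multiple-testing correction before candidate genes are filtered on them.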

11.
Chung RH  Chen YE 《PloS one》2012,7(5):e36662
Pathway analysis provides a powerful approach for identifying the joint effect of genes grouped into biologically-based pathways on disease. Pathway analysis is also an attractive approach for a secondary analysis of genome-wide association study (GWAS) data that may still yield new results from these valuable datasets. Most current pathway analysis methods focus on testing the cumulative main effects of genes in a pathway. However, for complex diseases, gene-gene interactions are expected to play a critical role in disease etiology. We extended a random forest-based method for pathway analysis by incorporating a two-stage design. We used simulations to verify that the proposed method has the correct type I error rates. We also used simulations to show that the method is more powerful than the original random forest-based pathway approach and the set-based test implemented in PLINK in the presence of gene-gene interactions. Finally, we applied the method to a breast cancer GWAS dataset and a lung cancer GWAS dataset, and identified interesting pathways that have implications for breast and lung cancers.

12.
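Advances in haplotype analysis techniques   Total citations: 1, self-citations: 0, citations by others: 1
A haplotype is a combination of genetic variant sites that coexist on a single chromosome; each chromosome has its own distinct haplotype. Haplotype analysis, a commonly used data analysis approach, is an effective way to find heterozygous SNP variant sites on a single chromosome, and also plays an important role in mining disease-causing genes and finding new treatments. It mainly comprises indirect inference methods and direct experimental methods. This article introduces the various haplotype analysis methods and their applications, with particular detail on the single-molecule dilution method and contiguity-preserving transposition sequencing, and discusses the prospects for applying haplotype analysis.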

13.
14.
Pathway analysis has led to a new era in genomic research by providing more information on biological processes than traditional single-gene analysis. Besides this advantage, pathway analysis poses some challenges to researchers, one of which is the quality of the pathway data itself. Pathway data are usually defined without reference to a biological context; in a specific biological context (e.g. lung cancer), typically only several genes within a pathway are responsible for the corresponding cellular process. Some pathways may also include uninformative genes or exclude informative ones. Moreover, many pathway analysis algorithms neglect these limitations by treating all the genes within a pathway as significant. In a previous study, a hybrid of support vector machines and smoothly clipped absolute deviation with group-specific tuning parameters (gSVM-SCAD) was proposed to identify and select the informative genes before the pathway evaluation process. However, gSVM-SCAD showed a limitation in classification accuracy. To deal with this limitation, we enhanced the tuning parameter method for gSVM-SCAD by applying the B-type generalized approximate cross validation (BGACV). Experimental analyses using one simulated dataset and two gene expression datasets have shown that the proposed method obtains significant results in identifying biologically significant genes and pathways, and in classification accuracy.

15.

Background  

Single Nucleotide Polymorphism (SNP) analysis only captures a small proportion of associated genetic variants in Genome-Wide Association Studies (GWAS), partly due to small marginal effects. Pathway-level analysis incorporating prior biological information offers another way to analyze GWAS of complex diseases, and promises to reveal the mechanisms leading to complex diseases. Biologically defined pathways typically comprise numerous genes. If only a subset of genes in a pathway is associated with disease, then a joint analysis including all individual genes would result in a loss of power. To address this issue, we propose a pathway-based method that allows us to test for joint effects by using a pre-selected gene subset. In the proposed approach, each gene is considered as the basic unit, which reduces the number of genetic variants considered and hence reduces the degrees of freedom in the joint analysis. The proposed approach can also be used to investigate the joint effect of several genes in a candidate gene study.

16.
Since the protein products of genes associated with similar diseases/disorders show an increased tendency to interact with each other through protein-protein interactions (PPI), clustering analysis is an efficient technique that can readily be used to predict human disease-related gene clusters/subnetworks. First, we used three clustering algorithms, the Markov cluster algorithm (MCL), Molecular complex detection (MCODE), and the Clique percolation method (CPM), to decompose the human PPI network into dense clusters as candidate disease-related clusters, and then proposed a log likelihood model that integrates multiple biological evidences to score these dense clusters. Finally, we identified the dense clusters with higher scores as disease-related clusters. The efficiency was evaluated by a leave-one-out cross validation procedure. Our method achieved a success rate of 98.59% and recovered the hidden disease-related clusters in 34.04% of cases when one known disease gene and all its gene-disease associations were removed. We found that the clusters decomposed by CPM outperformed those from MCL and MCODE as candidate disease-related clusters, with well-supported biological significance in the biological process, molecular function, and cellular component categories of Gene Ontology (GO) and in expression across human tissues. We also found that most of the disease-related clusters consisted of tissue-specific genes that were highly expressed in only one or several tissues, and a few were composed of housekeeping genes (maintenance genes) that were ubiquitously expressed in most tissues.

17.
DNA microarray gene expression and microarray-based comparative genomic hybridization (aCGH) have been widely used for biomedical discovery. Because of the large number of genes and the complex nature of biological networks, various analysis methods have been proposed. One such method is "gene shaving," a procedure which identifies subsets of the genes with coherent expression patterns and large variation across samples. Since combining genomic information from multiple sources can improve classification and prediction of diseases, in this paper we propose a new method, "ICA gene shaving" (ICA, independent component analysis), for jointly analyzing gene expression and copy number data. First, we used ICA to analyze joint measurements, gene expression and copy number, of a biological system and project the data onto statistically independent biological processes. Next, we used these results to identify patterns of variation in the data and then applied an iterative shaving method. We investigated the properties of our proposed method by analyzing both simulated and real data. We demonstrated the robustness of our method to noise using simulated data. Using breast cancer data, we showed that our method is superior to the Generalized Singular Value Decomposition (GSVD) gene shaving method for identifying genes associated with breast cancer.

18.
Based on the hypothesis that the neighbors of disease genes tend to cause similar diseases, network-based methods for disease prediction have received increasing attention. Because they take full advantage of network structure, global distance measurements generally perform better than local distance measurements. However, global distance measurements have some problems. For example, they may mistake non-disease hub proteins that have dense interactions with known disease proteins for potential disease proteins. To find a new method that avoids this problem, we analyzed the differences between disease proteins and other proteins by using essential proteins (proteins encoded by essential genes) as references. We found that disease proteins are not well connected with essential proteins in protein interaction networks. Based on this finding, we proposed a novel strategy for gene prioritization based on protein interaction networks. We allocated positive flow to disease genes and negative flow to essential genes, and adopted network propagation for gene prioritization. Experimental results on 110 diseases verified the effectiveness and potential of the proposed method.
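
The idea of propagating positive flow from disease genes and negative flow from essential genes can be sketched as below. The abstract does not specify the propagation formula, so this assumes the commonly used iteration F = alpha * W' * F + (1 - alpha) * Y with a symmetrically normalized adjacency matrix W'. The value of `alpha`, the iteration count, and the toy gene names are hypothetical.

```python
import math


def propagate(adj, disease, essential, alpha=0.5, iters=200):
    """Score every node by propagating signed prior flow over the network.

    adj       : dict mapping node -> set of neighbours (undirected)
    disease   : set of known disease genes, prior flow +1
    essential : set of essential genes, prior flow -1
    """
    nodes = sorted(adj)
    degree = {n: len(adj[n]) for n in nodes}
    prior = {
        n: 1.0 if n in disease else -1.0 if n in essential else 0.0
        for n in nodes
    }
    score = dict(prior)
    for _ in range(iters):
        # Each node receives flow from its neighbours, scaled by
        # 1 / sqrt(deg(n) * deg(m)), and retains part of its prior.
        score = {
            n: alpha * sum(score[m] / math.sqrt(degree[n] * degree[m])
                           for m in adj[n])
            + (1.0 - alpha) * prior[n]
            for n in nodes
        }
    return score


# Hypothetical toy network: D is a disease gene, E an essential gene.
# A neighbours only D; B neighbours both D and E.
toy_network = {
    "D": {"A", "B"},
    "A": {"D"},
    "B": {"D", "E"},
    "E": {"B"},
}
toy_ranking_scores = propagate(toy_network, disease={"D"}, essential={"E"})
```

In the toy network, candidate B is penalized for its link to the essential gene and ranks below candidate A, which is the behaviour the abstract motivates for hub proteins close to essential proteins.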

19.
The present study proposed a two-step drug repositioning method based on a protein-protein interaction (PPI) network of two diseases and the similarity of the drugs prescribed for one of the two. In the proposed method, lists of disease-related genes were first obtained from a meta-database called Genotator. Then genes shared by a pair of diseases were sought. At the first step of the method, if a drug had its target(s) in the PPI network, it was deemed a repositioning candidate. Because the targets of many drugs are still unknown, the similarities between the drugs prescribed for a specific disease were used to infer repositioning candidates at the second step. As a first attempt, we applied the proposed method to four different types of diseases: hypertension, diabetes mellitus, Crohn disease, and autism. Some repositioning candidates were found at both the first and second steps.

20.
We consider identifying differentially expressed genes between two patient groups using a microarray experiment. We propose a sample size calculation method for a specified number of true rejections while controlling the false discovery rate at a desired level. Input parameters for the sample size calculation include the allocation proportion in each group, the number of genes in each array, the number of differentially expressed genes, and the effect sizes among the differentially expressed genes. We have a closed-form sample size formula if the projected effect sizes are equal among differentially expressed genes. Otherwise, our method requires a numerical method to solve an equation. Simulation studies are conducted to show that the calculated sample sizes are accurate in practical settings. The proposed method is demonstrated with a real study.
