Similar literature
1.
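A method for mining disease-subtype feature genes based on gene expression profiles   Total citations: 1 (self-citations: 0, citations by others: 1)
In this study, we propose a method for mining disease-subtype feature genes from gene expression profiles. Working on a filtered gene expression profile, the method combines unsupervised clustering for identifying disease subtypes with a proposed pattern-quality measure that scores how well a feature gene discriminates between disease subtypes, and it performs feature gene mining in an embedded manner. The method was then applied to experimental expression profile data for 2000 genes in 40 colon cancer tissue samples and 22 normal colon tissue samples. The results show that the proposed method is a feasible way to mine disease-subtype feature genes, and that its advantage is that disease subtype partitioning and feature gene identification can be carried out in parallel.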

2.
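This study analyzed thyroid cancer gene expression profiles to screen for disease-related gene markers. Thyroid cancer gene expression data from The Cancer Genome Atlas (TCGA) database were processed and statistically analyzed on the R/Bioconductor platform. The edgeR and limma algorithms were each applied to select genes with a fold change > 2 and P < 0.05 between tumor tissue and controls as differentially expressed genes; receiver operating characteristic (ROC) curve analysis was then performed with the MedCalc statistical software to identify gene markers with potential value as diagnostic markers. The two methods identified 1,945 differentially expressed genes in thyroid cancer tissue (1,033 upregulated and 912 downregulated); based on fold change, 11 genes were further identified as upregulated in tumor tissue and useful for distinguishing the tumor group from the control group. By analyzing the thyroid cancer expression profile data in TCGA, this study identified differentially expressed genes significantly associated with diagnosis of the disease, providing a basis for exploring the mechanisms of disease onset and progression and for finding new molecular markers.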
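The screening logic in this abstract (a fold-change cutoff followed by ROC analysis) can be illustrated with a minimal sketch. This is not the edgeR, limma, or MedCalc procedure: the P-value step is omitted, only upregulated genes are kept, and the gene names, toy values, and `auc_cutoff` parameter are assumptions made for illustration.

```python
from statistics import mean

def fold_change(tumor, control):
    """Ratio of mean expression in tumor samples to mean expression in controls."""
    return mean(tumor) / mean(control)

def auc(tumor, control):
    """Area under the ROC curve, computed as the fraction of (tumor, control)
    pairs in which the tumor value is higher (ties count as half)."""
    wins = 0.0
    for t in tumor:
        for c in control:
            if t > c:
                wins += 1.0
            elif t == c:
                wins += 0.5
    return wins / (len(tumor) * len(control))

def screen(expr, fc_cutoff=2.0, auc_cutoff=0.9):
    """expr maps a gene name to (tumor values, control values).
    auc_cutoff is a hypothetical threshold; the abstract does not state one."""
    hits = []
    for gene, (tumor, control) in expr.items():
        fc = fold_change(tumor, control)
        a = auc(tumor, control)
        if fc > fc_cutoff and a >= auc_cutoff:
            hits.append((gene, fc, a))
    # Strongest fold change first
    return sorted(hits, key=lambda h: h[1], reverse=True)

# Toy data (hypothetical): GENE_A is upregulated in tumor, GENE_B is not.
toy = {
    "GENE_A": ([8.0, 9.0, 10.0, 9.0], [2.0, 3.0, 2.5, 2.5]),
    "GENE_B": ([5.0, 4.0, 6.0, 5.0], [5.0, 5.5, 4.5, 5.0]),
}
result = screen(toy)
```

On the toy data only GENE_A passes both filters. A real analysis would also need a statistical test and multiple-testing control, which this sketch leaves out.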

3.
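An integrated decision method for mining complex-disease-related genes from DNA microarray data   Total citations: 11 (self-citations: 2, citations by others: 9)
The rapid development of DNA microarray technology makes it possible to measure the expression profiles of tens of thousands of genes simultaneously, giving life scientists the opportunity to elucidate the nature of life from an entirely new angle. At present, most gene expression profile analysis concentrates on classifying diseases such as cancer and identifying disease subtypes. Mining from these profiles the relevant genes that reflect the essential characteristics of a disease is a more challenging research problem in the post-genomic era, yet gene mining has been neglected for lack of suitable data mining techniques. We propose a novel integrated decision method for feature gene mining that aims to address three important biological problems: biological classification and disease subtyping, in-depth mining of genes related to complex diseases, and target-driven construction of gene networks. We successfully applied this integrated decision method to a set of colon cancer DNA expression profile data. The results show that this novel feature gene mining technique is highly valuable for analyzing DNA microarray data and mining genes related to complex diseases.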

4.
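Extraction of human tumor informative genes based on SVM and mean impact value   Total citations: 1 (self-citations: 0, citations by others: 1)
Selecting informative genes for tumor classification from gene expression profiles is an important means of discovering tumor-specific expressed genes and exploring tumor gene expression patterns. Tumor diagnosis using classification information derived from gene expression profiles is an important research direction in bioinformatics and is expected to become a fast and effective method of molecular tumor diagnosis in clinical medicine. Because tumor gene expression profile data are high-dimensional, small in sample size, and noisy, we propose an algorithm that combines a support vector machine (SVM) with the mean impact value to find tumor informative genes. Its advantage is that it can find multiple informative gene subsets that contain as few genes as possible while classifying as well as possible. Binary-class tumor data sets were used to verify the feasibility and effectiveness of the algorithm; for the colon cancer sample set, only 3 genes were needed to achieve 100% recognition accuracy under leave-one-out cross-validation. To avoid the influence of different partitions of the sample set on classification performance, full-fold cross-validation was further used to evaluate the classification performance of each informative gene subset and to select more reliable subsets. Compared with other tumor classification methods, the experimental results show clear advantages in both the number of informative genes and classification performance.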

5.
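Analyzing gene expression profile data to advance tumor diagnosis and treatment is becoming a hot topic in biomedicine. This work therefore proposes a method that combines entropy-based information processing with principal component analysis (PCA). First, entropy information is used for a coarse selection from the ultra-high-dimensional gene expression profile data, yielding a subset of feature genes. Because the genes in this subset are still correlated, PCA is then used to remove the remaining redundancy. Finally, experiments on real data are carried out with the resulting non-redundant, orthogonal gene features. The experimental results show that the method effectively removes irrelevant and redundant information from tumor samples while retaining as much tumor classification information as possible. Compared with other tumor classification methods, it has a clear advantage in accuracy, which confirms that the method is effective and feasible.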

6.
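Random forests: an important method for selecting tumor feature genes   Total citations: 2 (self-citations: 0, citations by others: 2)
Feature selection techniques have been widely applied in bioinformatics, and random forests (RF) is one important feature selection method. RF was used to select feature genes from 5 gene expression profile data sets, including gastric cancer, colon cancer, and lung cancer. The selected genes were combined with a support vector machine (SVM) to classify the original data sets, and the feature gene selection and classification results were given a preliminary analysis. RF was also compared with significance analysis of microarrays (SAM) and ReliefF. The results show that the feature genes selected by random forests contain more classification information and give higher classification accuracy. Given the method's many inherent advantages in classification, random forests can be widely used as a reliable means of analyzing gene expression profile data.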

7.
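Relationship between survivin mRNA and protein expression and clinicopathological features in gastric cancer   Total citations: 6 (self-citations: 0, citations by others: 6)
Objective: To study the mRNA and protein expression of the survivin gene in gastric cancer tissue and its relationship with clinicopathological parameters. Methods: In situ hybridization and SP immunohistochemistry were used to detect survivin mRNA and protein expression in 54 gastric cancer specimens, 34 benign gastric lesion specimens, and 20 normal gastric tissue specimens. The relationships between expression and clinicopathological factors, and between mRNA and protein expression, were analyzed. Results: The positive rate of survivin protein expression in gastric cancer, benign lesions, and normal tissue was 87.0% (47/54), 23.5% (8/34), and 15% (3/20), respectively; the positive rate of survivin mRNA was 79.6% (43/54), 23.5% (8/34), and 20% (4/20). The rates in the gastric cancer group were far higher than in normal gastric tissue and the benign disease group, whereas the difference between normal gastric tissue and the benign disease group was not significant. Survivin mRNA and protein expression in gastric cancer were positively correlated (rs = 0.679, P < 0.05). The positive rates were unrelated to sex, age, or histological type but were related to lymph node metastasis, TNM stage, and histological grade. Conclusion: Survivin mRNA and protein are highly expressed in gastric cancer and are related to lymph node metastasis, TNM stage, and histological grade; survivin may serve as a biological indicator for assessing the biological behavior of gastric cancer and judging prognosis.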

8.
龚辉成  周毅波  焦粤龙  于锋 《生物磁学》2009,(14):2702-2704,2684
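Objective: To establish a tissue-specific gene expression profile of nasopharyngeal carcinoma and to screen for signal-transduction-related genes in nasopharyngeal carcinoma. Methods: A glass-slide-based gene chip from 深圳微芯公司 containing 8046 human genes was used to examine 7 nasopharyngeal carcinoma tissue samples and 1 nasopharyngitis tissue sample, giving a preliminary set of aberrantly expressed genes in nasopharyngeal carcinoma. Signal-transduction-related genes were screened from the aberrantly expressed genes using GO classification, and the Biocarta signaling pathway database was queried for information on the signaling pathways associated with the screened genes. Results: 1241 aberrantly expressed genes were obtained in nasopharyngeal carcinoma tissue, of which 871 were highly expressed and 343 were lowly expressed. 28 differentially expressed genes were found to be related to cellular signal transduction, of which 21 were upregulated and 7 were downregulated. Conclusion: A tissue-specific gene expression profile of nasopharyngeal carcinoma was successfully established, and signal-transduction-related genes in nasopharyngeal carcinoma were preliminarily identified.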

9.
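Objective: To predict the set of driver genes of glioblastoma multiforme by analyzing cancer gene expression data. Methods: Based on principal component analysis and neural networks, we propose a systems biology model for predicting driver genes of glioblastoma multiforme. First, the raw expression profile data of the experimental samples are pre-cleaned, filtering out expression data that are uninformative or do not meet the experimental requirements, and the tumor expression profile data are normalized. Then the genes are partitioned, with genes of similar mutation rates assigned to the same block. Finally, a neural network is trained to construct a regulatory network of cancer-related genes and obtain a predicted set of driver genes. Results: We applied the model to predict driver genes of glioblastoma multiforme (GBM). A large body of published experimental results indicates that most of the driver genes we predicted play important roles in GBM. Conclusion: We propose a new method for analyzing GBM expression profile data that can predict the driver genes of the disease with high accuracy; the model can also be applied to expression profile data of other diseases.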

10.
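GA/WV: a feature gene selection algorithm for mining cancer gene expression profiles   Total citations: 1 (self-citations: 0, citations by others: 1)
Identifying the feature gene set of a cancer expression profile can advance research on cancer type classification and may also give patients better clinical diagnoses. Although some methods have succeeded in gene expression profile analysis, using gene expression profile data for cancer classification remains a great challenge, mainly because a general and reliable method for assessing gene importance is lacking. GA/WV is a new method for assessing the importance of genes for classification from complex biological expression data. The feature gene set obtained by combining a genetic algorithm (GA) with the weighted voting (WV) classification algorithm is suitable not only for the WV classifier but also for other classifiers. Validation of GA/WV on cancer gene expression profile data sets shows that it is a successful and reliable feature gene selection method.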

11.
Most conventional feature selection algorithms have a drawback: a weakly ranked gene that could perform well in terms of classification accuracy when combined with an appropriate subset of genes will be left out of the selection. To address this shortcoming, we propose a feature selection algorithm for sample classification in gene expression data analysis. The proposed algorithm first divides genes into relatively small subsets (roughly of size h), then selects informative smaller subsets of genes (of size r < h) from a subset and merges the chosen genes with another gene subset (of size r) to update the gene subset. We repeat this process until all subsets are merged into one informative subset. We illustrate the effectiveness of the proposed algorithm by analyzing three distinct gene expression data sets. Our method shows promising classification accuracy for all the test data sets. We also show the relevance of the selected genes in terms of their biological functions.
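The divide, select, and merge loop described in this abstract can be sketched as follows. The abstract does not say how a subset is judged informative, so this sketch assumes an exhaustive search over size-r subsets scored by the leave-one-out accuracy of a nearest-centroid classifier; it also merges the kept genes with the whole next block. The function names and toy data are hypothetical.

```python
from itertools import combinations

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier using only `genes`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [sum(r[g] for r in rows) / len(rows) for g in genes]

        def dist(label):
            return sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroids[label]))

        pred = min(sorted(centroids), key=dist)
        correct += pred == y[i]
    return correct / len(X)

def best_subset(X, y, pool, r):
    """Pick the size-r subset of `pool` with the best joint score (assumed criterion)."""
    r = min(r, len(pool))
    return max(combinations(pool, r), key=lambda s: loo_accuracy(X, y, s))

def merge_select(X, y, h=3, r=2):
    """Split genes into blocks of size h, keep r genes, merge with the next block, repeat."""
    genes = list(range(len(X[0])))
    blocks = [genes[i:i + h] for i in range(0, len(genes), h)]
    kept = list(best_subset(X, y, blocks[0], r))
    for block in blocks[1:]:
        kept = list(best_subset(X, y, kept + block, r))
    return kept

# Toy data (hypothetical): 6 samples x 6 genes; only gene 4 separates the classes.
X = [
    [1, 3, 2, 1, 0.0, 2],
    [2, 1, 3, 3, 0.1, 1],
    [3, 2, 1, 2, 0.2, 3],
    [1, 3, 2, 1, 5.0, 2],
    [2, 1, 3, 3, 5.1, 1],
    [3, 2, 1, 2, 5.2, 3],
]
y = [0, 0, 0, 1, 1, 1]
result = merge_select(X, y, h=3, r=2)
```

Because subsets are scored jointly rather than gene by gene, a gene that ranks poorly on its own can still survive a merge if it helps the subset it is in. The exhaustive search is only practical for small h.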

12.
Pathway‐based feature selection algorithms, which utilize biological information contained in pathways to guide which features/genes should be selected, have evolved quickly and become widespread in the field of bioinformatics. Based on how the pathway information is incorporated, we classify pathway‐based feature selection algorithms into three major categories: penalty, stepwise forward, and weighting. Compared to the first two categories, the weighting methods have been underutilized even though they are usually the simplest ones. In this article, we constructed three different weights for each gene based on gene connectivity information and then conducted feature selection on the resulting weighted gene expression profiles. Using both simulations and a real‐world application, we have demonstrated that when the data‐driven connectivity information constructed from the data of the specific disease under study is considered, the resulting weighted gene expression profiles slightly outperform the original expression profiles. In summary, a big challenge faced by the weighting method is how to estimate pathway knowledge‐based weights more accurately and precisely. Only when this issue is resolved will wide utilization of the weighting methods be possible.

13.
Feature selection from DNA microarray data is a major challenge because expression data are high-dimensional. The number of samples in a microarray data set is much smaller than the number of genes, so the raw data are poorly suited for use as the training set of a classifier, and it is important to select features before training the classifier. Only a small subset of genes in the data set exhibits a strong correlation with the class, and finding these relevant genes is often non-trivial. Thus there is a need to develop robust and reliable methods for gene finding in expression data. We describe the use of several hybrid feature selection approaches for gene finding in expression data. These approaches include a filtering phase (filtering out the best genes from the data set) and a wrapper phase (finding the best subset of genes from the data set). The methods use information gain (IG) and Pearson Product Moment Correlation (PPMC) as the filtering parameters and biogeography based optimization (BBO) as the wrapper approach. The K nearest neighbour algorithm (KNN) and a back propagation neural network are used for evaluating the fitness of gene subsets during feature selection. Our analysis shows that the IG-BBO-KNN combination performs impressively on different data sets, with high accuracy (>90%) and a low error rate.
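The information gain (IG) filter mentioned in this abstract can be illustrated with a short sketch. It covers only the filtering phase; the BBO wrapper and the KNN fitness evaluation are not shown. Splitting each gene's values at its median is an assumption made here, since the abstract does not say how expression values are discretized, and the gene names and values are hypothetical.

```python
from math import log2
from statistics import median

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def information_gain(values, labels):
    """Information gain of the class labels after splitting one gene at its median
    (the median split is an assumed discretization)."""
    cut = median(values)
    low = [l for v, l in zip(values, labels) if v <= cut]
    high = [l for v, l in zip(values, labels) if v > cut]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (low, high) if part)
    return entropy(labels) - remainder

def top_genes(expr, labels, k):
    """Return the k gene names with the highest information gain."""
    scores = {gene: information_gain(values, labels) for gene, values in expr.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy data (hypothetical): GENE_A tracks the class, GENE_B does not.
labels = ["T", "T", "T", "N", "N", "N"]
expr = {
    "GENE_A": [9.0, 8.0, 7.0, 1.0, 2.0, 3.0],
    "GENE_B": [1.0, 9.0, 2.0, 8.0, 3.0, 7.0],
}
ranked = top_genes(expr, labels, 1)
```

In a hybrid scheme like the one described, the genes that pass such a filter would then be handed to the wrapper phase, which searches for the best subset among them.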

14.
15.
This paper introduces a novel generic approach for classification problems with the objective of achieving maximum classification accuracy with a minimum number of selected features. The method is illustrated with several case studies of gene expression data. Our approach integrates filter and wrapper gene selection methods with an added objective of selecting a small set of non-redundant genes that are most relevant for classification, with the provision of bins for genes to be swapped in the search for their biological relevance. It is capable of selecting relatively few marker genes while giving comparable or better leave-one-out cross-validation accuracy when compared with gene ranking selection approaches. Additionally, gene profiles can be extracted from the evolving connectionist system, which provides a set of rules that can be further developed into expert systems. The approach uses an integration of Pearson correlation coefficient and signal-to-noise ratio methods with an adaptive evolving classifier applied through the leave-one-out method for validation. Datasets of gene expression from four case studies are used to illustrate the method. The results show that the proposed approach improves the feature selection process by reducing the number of variables required and increasing classification accuracy.

16.
Minimum redundancy feature selection from microarray gene expression data   总被引:7,自引:0,他引:7  
Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expressions among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have certain redundancy and study methods to minimize it. We propose a minimum redundancy - maximum relevance (MRMR) feature selection framework. Genes selected via MRMR provide a more balanced coverage of the space and capture broader characteristics of phenotypes. They lead to significantly improved class predictions in extensive experiments on 6 gene expression data sets: NCI, Lymphoma, Lung, Child Leukemia, Leukemia, and Colon. Improvements are observed consistently among 4 classification methods: Naive Bayes, Linear discriminant analysis, Logistic regression, and Support vector machines. SUPPLEMENTARY: The top 60 MRMR genes for each of the datasets are listed in http://crd.lbl.gov/~cding/MRMR/. More information related to MRMR methods can be found at http://www.hpeng.net/.
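The idea of trading relevance against redundancy can be sketched as a greedy selection. This is an illustration under stated assumptions, not the published MRMR criteria: absolute Pearson correlation stands in for both the relevance measure (gene versus class label) and the redundancy measure (gene versus already selected genes), and the score is their difference. Gene names and values are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va and vb else 0.0

def mrmr(expr, labels, k):
    """Greedy selection: start with the most relevant gene, then repeatedly add the
    gene maximizing relevance minus mean redundancy with the genes already chosen."""
    relevance = {g: abs(pearson(v, labels)) for g, v in expr.items()}
    selected = [max(relevance, key=relevance.get)]
    while len(selected) < min(k, len(expr)):
        def score(g):
            redundancy = sum(abs(pearson(expr[g], expr[s])) for s in selected) / len(selected)
            return relevance[g] - redundancy
        candidates = [g for g in expr if g not in selected]
        selected.append(max(candidates, key=score))
    return selected

# Toy data (hypothetical): GENE_A_COPY duplicates GENE_A; GENE_C is weaker but
# carries information that GENE_A does not.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "GENE_A": [9.0, 5.0, 9.0, 1.0, 5.0, 1.0],
    "GENE_A_COPY": [18.0, 10.0, 18.0, 2.0, 10.0, 2.0],
    "GENE_C": [5.0, 9.0, 5.0, 5.0, 1.0, 5.0],
}
selected = mrmr(expr, labels, 2)
```

Ranking by relevance alone would pick GENE_A and its copy; the redundancy penalty makes the second pick GENE_C instead, which is the behavior the abstract describes as more balanced coverage.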

17.
Microarrays have been useful in understanding various biological processes by allowing the simultaneous study of the expression of thousands of genes. However, the analysis of microarray data is a challenging task. One of the key problems in microarray analysis is the classification of unknown expression profiles. Specifically, the often large number of non-informative genes on the microarray adversely affects the performance and efficiency of classification algorithms. Furthermore, the skewed ratio of samples to variables poses a risk of overfitting. Thus, in this context, feature selection methods become crucial to select relevant genes and, hence, improve classification accuracy. In this study, we investigated feature selection methods based on gene expression profiles and protein interactions. We found that in our setup, the addition of protein interaction information did not contribute to any significant improvement of the classification results. Furthermore, we developed a novel feature selection method that relies exclusively on observed gene expression changes in microarray experiments, which we call “relative Signal-to-Noise ratio” (rSNR). More precisely, the rSNR ranks genes based on their specificity to an experimental condition, by comparing intrinsic variation, i.e. variation in gene expression within an experimental condition, with extrinsic variation, i.e. variation in gene expression across experimental conditions. Genes with low variation within an experimental condition of interest and high variation across experimental conditions are ranked higher, and help in improving classification accuracy. We compared different feature selection methods on two time-series microarray datasets and one static microarray dataset. We found that the rSNR performed generally better than the other methods.
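The ranking principle behind the rSNR (low variation within the condition of interest, high variation across conditions) can be illustrated with a small sketch. The abstract does not give the rSNR formula, so the ratio used here is an assumption: the standard deviation of the per-condition means divided by the standard deviation of the replicates in the target condition. The condition names, gene names, and values are hypothetical.

```python
from statistics import mean, pstdev

def specificity_score(by_condition, target, eps=1e-9):
    """Assumed score for one gene: extrinsic variation (spread of condition means)
    divided by intrinsic variation (spread of replicates within `target`)."""
    intrinsic = pstdev(by_condition[target])
    extrinsic = pstdev([mean(values) for values in by_condition.values()])
    return extrinsic / (intrinsic + eps)  # eps avoids division by zero

def rank_genes(data, target):
    """data maps gene -> {condition -> replicate values}; highest score first."""
    return sorted(data, key=lambda g: specificity_score(data[g], target), reverse=True)

# Toy data (hypothetical): GENE_A is stable within "heat" and differs across
# conditions; GENE_B is noisy within "heat" and has identical condition means.
data = {
    "GENE_A": {
        "heat": [10.0, 10.2, 9.8],
        "cold": [2.0, 2.5, 1.5],
        "control": [2.0, 2.2, 1.8],
    },
    "GENE_B": {
        "heat": [5.0, 9.0, 1.0],
        "cold": [5.0, 5.0, 5.0],
        "control": [5.0, 5.0, 5.0],
    },
}
ranking = rank_genes(data, "heat")
```

GENE_A ranks first because its replicates agree within the target condition while its mean shifts strongly between conditions.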

18.
《Genomics》2020,112(1):114-126
Gene expression data are expected to make a great contribution to efficient cancer diagnosis and prognosis. Gene expression data consist of a large number of measured genes, only a few of which carry valuable information for distinguishing different classes of samples. Recently, several researchers have proposed gene selection methods based on metaheuristic algorithms for analysing and interpreting gene expression data. However, because of the large number of genes, the limited number of patient samples, and the complex interactions between genes, many gene selection methods have struggled to find the most relevant and reliable genes. Hence, in this paper, a hybrid filter/wrapper method called rMRMR-MBA is proposed for the gene selection problem. In this method, robust Minimum Redundancy Maximum Relevancy (rMRMR) is used as a filter to select the most promising genes, and a modified bat algorithm (MBA) is used as the search engine in a wrapper approach to identify a small set of informative genes. The performance of the proposed method has been evaluated using ten gene expression datasets. For performance evaluation, the convergence behaviour of MBA was studied with and without TRIZ optimisation operators. For comparative evaluation, the results of the proposed rMRMR-MBA were compared against ten state-of-the-art methods using the same datasets. The comparative study demonstrates that the proposed method produced better results in terms of classification accuracy and number of selected genes on two out of ten datasets and competitive results on the remaining datasets. In a nutshell, the proposed method produces very promising results with high classification accuracy and can be considered a promising contribution to the gene selection domain.

19.
MOTIVATION: The increasing use of DNA microarray-based tumor gene expression profiles for cancer diagnosis requires mathematical methods with high accuracy for solving clustering, feature selection and classification problems of gene expression data. RESULTS: New algorithms are developed for solving clustering, feature selection and classification problems of gene expression data. The clustering algorithm is based on optimization techniques and allows the calculation of clusters step-by-step. This approach allows us to find as many clusters as a data set contains with respect to some tolerance. Feature selection is crucial for a gene expression database. Our feature selection algorithm is based on calculating overlaps of different genes. The database used contains over 16 000 genes, and this number is considerably reduced by feature selection. We propose a classification algorithm where each tissue sample is considered as the center of a cluster which is a ball. The results of numerical experiments confirm that the classification algorithm in combination with the feature selection algorithm performs slightly better than the published results for multi-class classifiers based on support vector machines for this data set. AVAILABILITY: Available on request from the authors.

20.
Li L  Jiang W  Li X  Moser KL  Guo Z  Du L  Wang Q  Topol EJ  Wang Q  Rao S 《Genomics》2005,85(1):16-23
Development of a robust and efficient approach for extracting useful information from microarray data continues to be a significant and challenging task. Microarray data are characterized by a high dimension, high signal-to-noise ratio, and high correlations between genes, but with a relatively small sample size. Current methods for dimensional reduction can be further improved for the scenario in which a single (or a few) highly influential gene(s) are present and their effect in the feature subset would prohibit inclusion of other important genes. We have formalized a robust gene selection approach based on a hybrid between a genetic algorithm and a support vector machine. The major goal of this hybridization was to exploit fully their respective merits (e.g., robustness to the size of the solution space and capability of handling a very large dimension of feature genes) for identification of key feature genes (or molecular signatures) for a complex biological phenotype. We have applied the approach to the microarray data of diffuse large B cell lymphoma to demonstrate its behaviors and properties for mining the high-dimension data of genome-wide gene expression profiles. The resulting classifier(s) (the optimal gene subset(s)) achieved the highest accuracy (99%) for prediction of independent microarray samples in comparison with marginal filters and a hybrid between a genetic algorithm and K nearest neighbors.
