1.

Finding edging genes from microarray data (total citations: 1; self-citations: 0; citations by others: 1)

An J Chen YP 《Journal of biotechnology》2008,135(3):233-240

MOTIVATION: A set of genes and their expression levels are used to classify disease and normal tissues. Because of the massive number of genes in a microarray, there are a large number of edges that divide different classes of genes in microarray space. The edging genes (EGs) can be co-regulated genes; they can also be on the same pathway or deregulated by the same non-coding genes, such as siRNA or miRNA. Every gene in EGs is vital for identifying a tissue's class. A change in one EG's expression may cause a tissue to alter from normal to disease and vice versa. Finding EGs is therefore of biological importance. In this work, we propose an algorithm to effectively find these EGs. RESULT: We tested our algorithm with five microarray datasets. The results are compared with those of the border-based algorithm, which was used to find gene groups and subsequently divide different classes of tissues. Our algorithm finds a significantly larger number of EGs than the border-based algorithm does. As our algorithm prunes irrelevant patterns at earlier stages, its time and space complexities are much lower than those of the border-based algorithm. AVAILABILITY: The proposed algorithm is implemented in C++ on the Linux platform. The EGs in five microarray datasets are calculated. The preprocessed datasets and the discovered EGs are available at http://www3.it.deakin.edu.au/~phoebe/microarray.html.

2.

Identifying periodically expressed genes from microarray time-series expression data

周到 何东 周艳红 《生物信息学》2008,6(2):68-70

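Based on the periodicity and peak characteristics of periodically expressed genes, we propose a method that divides microarray time-series expression data into several gene expression cycles and evaluates the peak characteristics within each cycle to identify periodically expressed genes; the method effectively reduces the influence of noise in microarray experiments. The method was tested on three widely used time-series expression datasets and a reliable set of periodically expressed genes, and its performance was compared with that of three typical methods for identifying periodically expressed genes. The method can effectively identify periodically expressed genes from a variety of microarray time-series expression data.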

3.

Large-scale integration of cancer microarray data identifies a robust common cancer signature

Lei Xu Donald Geman Raimond L Winslow 《BMC bioinformatics》2007,8(1):275

Background

There is a continuing need to develop molecular diagnostic tools which complement histopathologic examination to increase the accuracy of cancer diagnosis. DNA microarrays provide a means for measuring gene expression signatures which can then be used as components of genomic-based diagnostic tests to determine the presence of cancer.

4.

Joint analysis of two microarray gene-expression data sets to select lung adenocarcinoma marker genes

Hongying Jiang Youping Deng Huann-Sheng Chen Lin Tao Qiuying Sha Jun Chen Chung-Jui Tsai Shuanglin Zhang 《BMC bioinformatics》2004,5(1):81

Background

Due to the high cost and low reproducibility of many microarray experiments, it is not surprising to find a limited number of patient samples in each study, and very few commonly identified marker genes among different studies involving patients with the same disease. It is therefore of great interest, and a challenge, to merge data sets from multiple studies to increase the sample size, which may in turn increase the power of statistical inferences. In this study, we combined two lung cancer studies using microarray GeneChip®, and employed two gene shaving methods and a two-step survival test to identify genes with expression patterns that can distinguish diseased from normal samples and indicate patient survival, respectively.

Results

In addition to common data transformation and normalization procedures, we applied a distribution transformation method to integrate the two data sets. Gene shaving (GS) methods based on Random Forests (RF) and Fisher's Linear Discrimination (FLD) were then applied separately to the joint data set for cancer gene selection. The two methods discovered 13 and 10 marker genes (5 in common), respectively, with expression patterns differentiating diseased from normal samples. Among these marker genes, 8 and 7 were found to be cancer-related in other published reports. Furthermore, based on these marker genes, the classifiers we built from one data set predicted the other data set with more than 98% accuracy. Using the univariate Cox proportional hazard regression model, the expression patterns of 36 genes were found to be significantly correlated with patient survival (p < 0.05). Twenty-six of these 36 genes were reported as survival-related genes from the literature, including 7 known tumor-suppressor genes and 9 oncogenes. Additional principal component regression analysis further reduced the gene list from 36 to 16.

Conclusion

This study provided a valuable method of integrating microarray data sets with different origins, and new methods of selecting a minimum number of marker genes to aid in cancer diagnosis. After careful data integration, the classification method developed from one data set can be applied to the other with high prediction accuracy.

5.

Mining abiotic stress-related genes from Arabidopsis microarray data

束永俊 李勇 柏锡 才华 纪巍 朱延明 《生物信息学》2009,7(3):168-170,177

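Using analysis of variance, abiotic stress-related genes were mined from an Arabidopsis microarray expression profile database and annotated by GO analysis to reveal the biological significance of abiotic stress. Abiotic stress was found to mainly affect transcriptional regulation in plant gene expression and phosphorylation in signal transduction. The upstream promoter sequences of these genes were also analyzed to mine the transcription factors involved in the regulation of, and adaptation to, abiotic stress responses; the abiotic stress process in plants was found to be regulated mainly by bHLH-ZIP and ZN-FINGER C2H2-type transcription factors.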

6.

DNA microarray analysis reveals metastasis-associated genes in rat prostate cancer cell lines

Reyes I Tiwari R Geliebter J Reyes N 《Biomédica : revista del Instituto Nacional de Salud》2007,27(2):190-203

7.

MGFM: a novel tool for detection of tissue and cell specific marker genes from microarray gene expression data

Khadija El Amrani Harald Stachelscheid Fritz Lekschas Andreas Kurtz Miguel A. Andrade-Navarro 《BMC genomics》2015,16(1)

8.

A network flow approach to predict drug targets from microarray data, disease genes and interactome network - case study on prostate cancer

Shih-Heng Yeh Hsiang-Yuan Yeh Von-Wun Soo 《Journal of clinical bioinformatics》2012,2(1):1

Background

A systematic approach to drug discovery is an emerging discipline in systems biology research. It aims at integrating interaction data and experimental data to elucidate diseases, and it also raises new issues in drug discovery for cancer treatment. However, drug target discovery is still at a trial-and-error experimental stage, and it is a challenging task to develop a prediction model that can systematically detect possible drug targets for complex diseases.

Methods

We integrate gene expression, disease genes and interaction networks to identify effective drug targets that have a strong influence on disease genes, using a network flow approach. In the experiments, we use a microarray dataset containing 62 prostate cancer samples and 41 normal samples, 108 known prostate cancer genes, and 322 approved human drug targets extracted from the DrugBank database as candidate proteins for our test data. Using our method, we prioritize the candidate proteins and validate them against the known prostate cancer drug targets.

Results

We successfully identify potential drug targets that are strongly related to well-known drugs for prostate cancer treatment, and we also discover further potential drug targets that are currently attracting the attention of biologists. We note that it is hard to discover drug targets based only on differential expression changes, because genes that serve as drug targets may not always show significant expression changes. Compared with previous methods that depend on network topology attributes, it turns out that genes with potential as drug targets are only weakly correlated with critical points in a network. In comparison with previous methods, our results have the highest mean average precision and also rank the true drug targets higher. This verifies the effectiveness of our method.

Conclusions

Our method does not know the real ideal routes in the disease network, but it tries to find a feasible flow that strongly influences the disease genes through possible paths. We successfully formulate drug target prediction as a maximum flow problem on biological networks and discover potential drug targets in an accurate manner.
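
As a rough illustration of the maximum flow formulation mentioned in this abstract, the sketch below runs the textbook Edmonds-Karp algorithm on a tiny hand-made network. It is not the authors' implementation: the node names (`T` for a candidate target, `A` and `B` for intermediate proteins, `D` for a disease gene) and the edge capacities are hypothetical, and the abstract does not say how the real network or its capacities are built.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    # Build residual capacities, adding reverse edges with zero capacity.
    residual = {source: {}, sink: {}}
    for u, neighbors in capacity.items():
        for v, cap in neighbors.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck amount of flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical toy network: capacities are illustrative only.
toy_network = {
    "T": {"A": 3, "B": 2},
    "A": {"B": 1, "D": 2},
    "B": {"D": 3},
}
flow_to_disease_gene = max_flow(toy_network, "T", "D")
print(flow_to_disease_gene)  # 5
```

In a setting like the one described, one could imagine ranking candidate targets by the flow each can send toward the disease genes, but that ranking rule is an assumption here rather than something the abstract specifies.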

9.

Exon level integration of proteomics and microarray data

Danny A Bitton Michał J Okoniewski Yvonne Connolly Crispin J Miller 《BMC bioinformatics》2008,9(1):118

Background

Previous studies comparing quantitative proteomics and microarray data have generally found poor correspondence between the two. We hypothesised that this might in part be because the different assays were targeting different parts of the expressed genome and might therefore be subjected to confounding effects from processes such as alternative splicing.

10.

Zipf's law in importance of genes for cancer classification using microarray data

Li W Yang Y 《Journal of theoretical biology》2002,219(4):539-551

Using a measure of how differentially expressed a gene is in two biochemically/phenotypically different conditions, we can rank all genes in a microarray dataset. We have shown that the falling-off of this measure (normalized maximum likelihood in a classification model such as logistic regression) as a function of the rank is typically a power-law function. This power-law function in other similar ranked plots is known as Zipf's law, observed in many natural and social phenomena. The presence of this power-law function prevents an intrinsic cutoff point between the "important" genes and "irrelevant" genes. We have shown that similar power-law functions are also present in permuted datasets, and provide an explanation from the well-known χ² distribution of likelihood ratios. We discuss the implication of this Zipf's law on gene selection in a microarray data analysis, as well as other characterizations of the ranked likelihood plots such as the rate of fall-off of the likelihood.
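
To make the rank-versus-score power law concrete, here is a minimal sketch that estimates the exponent by ordinary least squares in log-log space. It assumes the scores are positive and already sorted in descending order; the function name `fit_power_law` and the synthetic scores are illustrative and do not come from the paper, which uses a likelihood-based measure.

```python
import math


def fit_power_law(values):
    """Fit values[r-1] ~ C * r**(-alpha) by least squares on log-log axes.

    Returns (alpha, C). Assumes every value is positive and that
    values are ordered by rank (rank 1 first).
    """
    xs = [math.log(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic scores that follow an exact 1/rank law (illustrative only).
scores = [10.0 / rank for rank in range(1, 51)]
alpha, scale = fit_power_law(scores)
print(round(alpha, 6), round(scale, 6))  # 1.0 10.0
```

A smooth straight line on log-log axes has no natural break, which is one way to read the abstract's point that the ranking offers no intrinsic cutoff between "important" and "irrelevant" genes.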

11.

Robust imputation method for missing values in microarray data

Yoon D Lee EK Park T 《BMC bioinformatics》2007,8(Z2):S6

12.

Linking the genes: inferring quantitative gene networks from microarray data

de la Fuente A Brazhnik P Mendes P 《Trends in genetics : TIG》2002,18(8):395-398

Modern microarray technology is capable of providing data about the expression of thousands of genes, and even of whole genomes. An important question is how this technology can be used most effectively to unravel the workings of cellular machinery. Here, we propose a method to infer genetic networks on the basis of data from appropriately designed microarray experiments. In addition to identifying the genes that affect a specific other gene directly, this method also estimates the strength of such effects. We will discuss both the experimental setup and the theoretical background.

13.

Fuzzy J-Means and VNS methods for clustering genes from microarray data (total citations: 4; self-citations: 0; citations by others: 4)

Belacel N Cuperlović-Culf M Laflamme M Ouellette R 《Bioinformatics (Oxford, England)》2004,20(11):1690-1701

MOTIVATION: In the interpretation of gene expression data from a group of microarray experiments that include samples from either different patients or conditions, special consideration must be given to the pleiotropic and epistatic roles of genes, as observed in the variation of gene coexpression patterns. Crisp clustering methods assign each gene to one cluster, thereby omitting information about the multiple roles of genes. RESULTS: Here, we present the application of a local search heuristic, Fuzzy J-Means, embedded into the variable neighborhood search metaheuristic for the clustering of microarray gene expression data. We show that for all the datasets studied this algorithm outperforms the standard Fuzzy C-Means heuristic. Different methods for the utilization of cluster membership information in determining gene coregulation are presented. The clustering and data analyses were performed on simulated datasets as well as experimental cDNA microarray data for breast cancer and human blood from the Stanford Microarray Database. AVAILABILITY: The source code of the clustering software (C programming language) is freely available from Nabil.Belacel@nrc-cnrc.gc.ca
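
The contrast between crisp and fuzzy assignment can be shown with the standard Fuzzy C-Means membership formula, the baseline named in this abstract. The sketch below covers only that membership step for fixed cluster centers; it is not Fuzzy J-Means or the variable neighborhood search, and the point, centers and fuzzifier `m=2.0` are illustrative assumptions.

```python
import math


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy C-Means membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the
    Euclidean distance from the point to center i and m > 1.
    """
    dists = [math.dist(point, center) for center in centers]
    # A point sitting exactly on one or more centers belongs only to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


# Hypothetical expression profile and two cluster centers.
memberships = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
print([round(u, 3) for u in memberships])  # [0.8, 0.2]
```

A crisp method would assign this gene to the first cluster only; the fuzzy memberships retain the 0.2 share in the second cluster, which is the kind of multiple-role information the abstract says crisp clustering omits.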

14.

Rapid divergence in expression between duplicate genes inferred from microarray data (total citations: 15; self-citations: 0; citations by others: 15)

Gu Z Nicolae D Lu HH Li WH 《Trends in genetics : TIG》2002,18(12):609-613

For more than 30 years, expression divergence has been considered as a major reason for retaining duplicated genes in a genome, but how often and how fast duplicate genes diverge in expression has not been studied at the genomic level. Using yeast microarray data, we show that expression divergence between duplicate genes is significantly correlated with their synonymous divergence (K_S) and also with their nonsynonymous divergence (K_A) if K_A ≤ 0.3. Thus, expression divergence increases with evolutionary time, and K_A is initially coupled with expression divergence. More interestingly, a large proportion of duplicate genes have diverged quickly in expression and the vast majority of gene pairs eventually become divergent in expression. Indeed, more than 40% of gene pairs show expression divergence even when K_S is ≤ 0.10, and this proportion becomes >80% for K_S > 1.5. Only a small fraction of ancient gene pairs do not show expression divergence.

15.

CIT: identification of differentially expressed clusters of genes from microarray data (total citations: 3; self-citations: 0; citations by others: 3)

Rhodes DR Miller JC Haab BB Furge KA 《Bioinformatics (Oxford, England)》2002,18(1):205-206

Cluster Identification Tool (CIT) is a microarray analysis program that identifies differentially expressed genes. Following division of experimental samples based on a parameter of interest, CIT uses a statistical discrimination metric and permutation analysis to identify clusters of genes or individual genes that best differentiate between the experimental groups. CIT integrates with the freely available CLUSTER and TREEVIEW programs to form a more complete microarray analysis package.

16.

Density based pruning for identification of differentially expressed genes from microarray data

Hu J Xu J 《BMC genomics》2010,11(Z2):S3

Motivation

Identification of differentially expressed genes from microarray datasets is one of the most important analyses for microarray data mining. Popular algorithms such as the statistical t-test rank genes based on a single statistic. The false positive rate of these methods can be improved by considering other features of differentially expressed genes.

Results

We proposed a pattern recognition strategy for identifying differentially expressed genes. Genes are mapped to a two-dimensional feature space composed of the average difference of gene expression and the average expression level. A density-based pruning algorithm (DB pruning) is developed to screen out potential differentially expressed genes, which are usually located in the sparse boundary region. Biases of popular algorithms for identifying differentially expressed genes are visually characterized. Experiments on 17 datasets from the Gene Expression Omnibus (GEO) database with experimentally verified differentially expressed genes showed that DB pruning can significantly improve the prediction accuracy of popular identification algorithms such as t-test, rank product, and fold change.

Conclusions

Density based pruning of non-differentially expressed genes is an effective method for enhancing statistical testing based algorithms for identifying differentially expressed genes. It improves t-test, rank product, and fold change by 11% to 50% in the numbers of identified true differentially expressed genes. The source code of DB pruning is freely available on our website http://mleg.cse.sc.edu/degprune
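
The idea of pruning genes that sit in dense regions of a two-dimensional feature space can be sketched as follows. This is only a simplified stand-in, not the published DB pruning algorithm: it counts neighbors within a fixed radius, and the names `feature_point` and `prune_dense`, the thresholds and the expression values are all hypothetical.

```python
def feature_point(control, treated):
    """Map one gene to (average expression level, average difference)."""
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    return ((mean_control + mean_treated) / 2.0, mean_treated - mean_control)


def prune_dense(points, radius, max_neighbors):
    """Keep genes with at most max_neighbors other genes within radius."""
    kept = []
    for name, (x, y) in points.items():
        neighbors = sum(
            1
            for other, (ox, oy) in points.items()
            if other != name and (x - ox) ** 2 + (y - oy) ** 2 <= radius ** 2
        )
        if neighbors <= max_neighbors:
            kept.append(name)
    return kept


# Hypothetical log-expression replicates: (control, treated) per gene.
expression = {
    "g1": ([5.0, 5.1], [5.0, 5.1]),
    "g2": ([5.1, 5.1], [5.2, 5.2]),
    "g3": ([4.9, 5.0], [4.9, 5.0]),
    "g4": ([5.0, 5.0], [4.9, 4.9]),
    "g5": ([3.0, 3.2], [6.0, 6.2]),
}
points = {gene: feature_point(c, t) for gene, (c, t) in expression.items()}
candidates = prune_dense(points, radius=0.5, max_neighbors=1)
print(candidates)  # ['g5']
```

Genes g1 to g4 cluster near zero difference and are pruned, while g5 lies alone in the sparse region and survives as a candidate that a downstream test such as a t-test or fold change could then score.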

17.

Robust feature selection for microarray data based on multicriterion fusion (total citations: 1; self-citations: 0; citations by others: 1)

Yang F Mao KZ 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2011,8(4):1080-1092

Feature selection often aims to select a compact feature subset to build a pattern classifier with reduced complexity, so as to achieve improved classification performance. From the perspective of pattern analysis, producing a stable or robust solution is also a desired property of a feature selection algorithm. However, the issue of robustness is often overlooked in feature selection. In this study, we analyze the robustness issue in feature selection for high-dimensional, small-sample gene-expression data, and propose to improve the robustness of feature selection algorithms by using multiple feature selection evaluation criteria. Based on this idea, a multicriterion fusion-based recursive feature elimination (MCF-RFE) algorithm is developed with the goal of improving both classification performance and stability of feature selection results. Experimental studies on five gene-expression data sets show that the MCF-RFE algorithm outperforms the commonly used benchmark feature selection algorithm SVM-RFE.

18.

HykGene: a hybrid approach for selecting marker genes for phenotype classification using microarray gene expression data (total citations: 4; self-citations: 0; citations by others: 4)

Wang Y Makedon FS Ford JC Pearlman J 《Bioinformatics (Oxford, England)》2005,21(8):1530-1537

MOTIVATION: Recent studies have shown that microarray gene expression data are useful for phenotype classification of many diseases. A major problem in this classification is that the number of features (genes) greatly exceeds the number of instances (tissue samples). It has been shown that selecting a small set of informative genes can lead to improved classification accuracy. Many approaches have been proposed for this gene selection problem. Most of the previous gene ranking methods typically select 50-200 top-ranked genes and these genes are often highly correlated. Our goal is to select a small set of non-redundant marker genes that are most relevant for the classification task. RESULTS: To achieve this goal, we developed a novel hybrid approach that combines gene ranking and clustering analysis. In this approach, we first applied feature filtering algorithms to select a set of top-ranked genes, and then applied hierarchical clustering on these genes to generate a dendrogram. Finally, the dendrogram was analyzed by a sweep-line algorithm and marker genes are selected by collapsing dense clusters. Empirical study using three public datasets shows that our approach is capable of selecting relatively few marker genes while offering the same or better leave-one-out cross-validation accuracy compared with approaches that use top-ranked genes directly for classification. AVAILABILITY: The HykGene software is freely available at http://www.cs.dartmouth.edu/~wyh/software.htm CONTACT: wyh@cs.dartmouth.edu SUPPLEMENTARY INFORMATION: Supplementary material is available from http://www.cs.dartmouth.edu/~wyh/hykgene/supplement/index.htm.
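
The "rank first, then remove redundancy" idea can be illustrated with a much simpler procedure than the one the abstract describes. The sketch below ranks genes by the absolute difference in class means and then greedily drops any gene that is highly correlated with an already selected one. This greedy correlation filter is a swapped-in simplification, not HykGene's hierarchical clustering and sweep-line step, and every name, threshold and expression value here is hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rank_score(values, labels):
    """Filter score: absolute difference between the two class means."""
    class0 = [v for v, lab in zip(values, labels) if lab == 0]
    class1 = [v for v, lab in zip(values, labels) if lab == 1]
    return abs(sum(class1) / len(class1) - sum(class0) / len(class0))


def select_markers(expr, labels, top_k, max_corr):
    """Take the top_k ranked genes, then keep only non-redundant ones."""
    ranked = sorted(
        expr, key=lambda g: rank_score(expr[g], labels), reverse=True
    )[:top_k]
    markers = []
    for gene in ranked:
        # Keep the gene only if it is not redundant with a chosen marker.
        if all(abs(pearson(expr[gene], expr[m])) < max_corr for m in markers):
            markers.append(gene)
    return markers


labels = [0, 0, 0, 1, 1, 1]
expr = {
    "geneA": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "geneB": [1.0, 1.1, 0.95, 2.0, 2.05, 1.95],  # linear copy of geneA
    "geneC": [2.0, 1.0, 3.0, 3.5, 2.5, 4.5],
    "geneD": [2.0, 2.1, 1.9, 2.0, 2.1, 2.0],     # uninformative
}
markers = select_markers(expr, labels, top_k=3, max_corr=0.9)
print(markers)  # ['geneA', 'geneC']
```

Here geneD is removed by the ranking step and geneB by the redundancy step, leaving two markers that carry different information.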

19.

Robust classification modeling on microarray data using misclassification penalized posterior

Soukup M Cho H Lee JK 《Bioinformatics (Oxford, England)》2005,21(Z1):i423-i430

20.

Testing for differentially expressed genes with microarray data

Tsai CA Chen YJ Chen JJ 《Nucleic acids research》2003,31(9):e52

This paper compares the type I error and power of the one- and two-sample t-tests, and the one- and two-sample permutation tests for detecting differences in gene expression between two microarray samples with replicates using Monte Carlo simulations. When data are generated from a normal distribution, type I errors and powers of the one-sample parametric t-test and one-sample permutation test are very close, as are the two-sample t-test and two-sample permutation test, provided that the number of replicates is adequate. When data are generated from a t-distribution, the permutation tests outperform the corresponding parametric tests if the number of replicates is at least five. For data from a two-color dye swap experiment, the one-sample test appears to perform better than the two-sample test since expression measurements for control and treatment samples from the same spot are correlated. For data from independent samples, such as the one-channel array or two-channel array experiment using reference design, the two-sample t-tests appear more powerful than the one-sample t-tests.
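
For readers unfamiliar with the two-sample permutation test compared in this abstract, the following is a minimal sketch of an exact version that enumerates every relabeling of the pooled replicates. The test statistic (absolute difference in means), the function name and the toy measurements are assumptions made for illustration; the paper's own simulation setup is not reproduced here.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided two-sample permutation test on the mean difference."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    n_splits = 0
    n_extreme = 0
    # Enumerate every way of choosing which replicates are labeled "a".
    for indices in combinations(range(len(pooled)), n_a):
        n_splits += 1
        relabeled_sum = sum(pooled[i] for i in indices)
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(relabeled_sum) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_splits


# Hypothetical expression values for one gene, three replicates per group.
control = [1, 2, 3]
treated = [10, 11, 12]
p_value = permutation_p_value(control, treated)
print(p_value)  # 0.1
```

With three replicates per group there are only 20 relabelings, so the smallest attainable two-sided p-value is 2/20 = 0.1. This shows why such tests need an adequate number of replicates, consistent with the abstract's remark that they do well with at least five.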