20 similar documents found (search time: 11 ms)
1.
Genesis: cluster analysis of microarray data (total citations: 26; self-citations: 0; citations by others: 26)
2.
MOTIVATION: A variance stabilizing transformation for microarray data was recently introduced independently by several research groups. This transformation has sometimes been called the generalized logarithm or glog transformation. In this paper, we derive several alternative approximate variance stabilizing transformations that may be easier to use in some applications. RESULTS: We demonstrate that the started-log and the log-linear-hybrid transformation families can produce approximate variance stabilizing transformations for microarray data that are nearly as good as the generalized logarithm (glog) transformation. These transformations may be more convenient in some applications.
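Below is a minimal sketch of the two alternative families named above. It assumes their simplest parametrizations. The first is a started log, ln(x + c). The second is a log-linear hybrid that is linear below a cut-off k and logarithmic above it, with value and slope matched at k. The parameter names `c` and `k` are illustrative. The paper's exact parametrization, and how it fits these parameters to data, may differ.

```python
import math


def started_log(x: float, c: float) -> float:
    """Started log ln(x + c); the start c > 0 keeps values near background finite."""
    if x + c <= 0:
        raise ValueError("started log requires x + c > 0")
    return math.log(x + c)


def log_linear_hybrid(x: float, k: float) -> float:
    """Linear for x <= k, natural log for x > k (hypothetical simple parametrization).

    The linear piece x/k + ln(k) - 1 equals ln(k) at x = k and has slope 1/k,
    so value and first derivative match ln(x) at the cut-off.
    """
    if k <= 0:
        raise ValueError("cut-off k must be positive")
    if x <= k:
        return x / k + math.log(k) - 1.0
    return math.log(x)


if __name__ == "__main__":
    for value in (-20.0, 0.0, 50.0, 5000.0):
        print(value, round(started_log(value, 25.0), 3), round(log_linear_hybrid(value, 100.0), 3))
```

Both functions accept background-corrected values at or below zero, which plain ln(x) cannot. The hybrid does this without shifting the high-intensity range.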
3.
MOTIVATION: Standard statistical techniques often assume that data are normally distributed, with constant variance not depending on the mean of the data. Data that violate these assumptions can often be brought in line with the assumptions by application of a transformation. Gene-expression microarray data have a complicated error structure, with a variance that changes with the mean in a non-linear fashion. Log transformations, which are often applied to microarray data, can inflate the variance of observations near background. RESULTS: We introduce a transformation that stabilizes the variance of microarray data across the full range of expression. Simulation studies also suggest that this transformation approximately symmetrizes microarray data.
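The abstract does not state the formula. The sketch below assumes the transformation takes the generalized-logarithm (glog) form mentioned in the previous abstract: glog(y) = ln(y + sqrt(y² + c)).

- For large intensities it behaves like ln(2y).
- Near zero, and for negative background-corrected values, it stays finite.

The sketch also assumes the data are already offset-corrected. The constant `c` is supplied by the caller rather than estimated. In the two-component error model it is usually tied to the ratio of additive to multiplicative noise variance.

```python
import math


def glog(y: float, c: float) -> float:
    """Generalized log ln(y + sqrt(y^2 + c)), computed via asinh for numerical stability.

    Uses the identity ln(y + sqrt(y^2 + c)) = asinh(y / sqrt(c)) + ln(c) / 2,
    which avoids cancellation when y is large and negative.
    """
    if c <= 0:
        raise ValueError("c must be positive")
    return math.asinh(y / math.sqrt(c)) + 0.5 * math.log(c)


if __name__ == "__main__":
    c = 100.0
    for y in (-10.0, 0.0, 10.0, 1e3, 1e5):
        # ln(y) is undefined for y <= 0; glog is defined everywhere and tends to ln(2y)
        natural = math.log(y) if y > 0 else float("nan")
        print(f"{y:>10}: glog={glog(y, c):8.4f}  ln={natural:8.4f}")
```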
4.
A variational Bayesian mixture modelling framework for cluster analysis of gene-expression data (total citations: 1; self-citations: 0; citations by others: 1)
Teschendorff AE, Wang Y, Barbosa-Morais NL, Brenton JD, Caldas C. Bioinformatics (Oxford, England), 2005, 21(13):3025-3033
MOTIVATION: Accurate subcategorization of tumour types through gene-expression profiling requires analytical techniques that estimate the number of categories or clusters rigorously and reliably. Parametric mixture modelling provides a natural setting to address this problem. RESULTS: We compare a criterion for model selection that is derived from a variational Bayesian framework with a popular alternative based on the Bayesian information criterion. Using simulated data, we show that the variational Bayesian method is more accurate in finding the true number of clusters in situations that are relevant to current and future microarray studies. We also compare the two criteria using freely available tumour microarray datasets and show that the variational Bayesian method is more sensitive in capturing biologically relevant structure.
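The BIC baseline that the paper compares against can be sketched for one-dimensional data. The sketch fits a Gaussian mixture by plain EM and scores it with the standard criterion BIC = p ln n - 2 ln L, where p = 3K - 1 free parameters for K univariate components.

It does not implement the variational Bayesian criterion. Two choices are assumptions made to keep the example reproducible: the deterministic quantile initialization and the fixed iteration count.

```python
import math
import random
import statistics


def _log_components(x, weights, means, variances):
    # log(w_j) + log N(x | mu_j, var_j) for every component j
    return [
        math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(weights, means, variances)
    ]


def _logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm_1d(xs, k, iters=200, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM; return the final log-likelihood."""
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # spread initial means over quantiles
    variances = [statistics.pvariance(xs)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in xs:
            logs = _log_components(x, weights, means, variances)
            norm = _logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, var_floor)
    return sum(_logsumexp(_log_components(x, weights, means, variances)) for x in xs)


def bic(log_likelihood, k, n):
    """BIC = p ln n - 2 ln L with p = (k - 1) weights + k means + k variances."""
    return (3 * k - 1) * math.log(n) - 2.0 * log_likelihood


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(100)] + [rng.gauss(8.0, 1.0) for _ in range(100)]
    scores = {k: bic(fit_gmm_1d(data, k), k, len(data)) for k in (1, 2, 3)}
    print(scores, "-> chosen k =", min(scores, key=scores.get))
```

The number of clusters is chosen as the K with the smallest BIC. The paper's point is that this rule can be less accurate than the variational Bayesian alternative.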
5.
Variance stabilization is a step in the preprocessing of microarray data that can greatly benefit the performance of subsequent statistical modeling and inference. Due to the often limited number of technical replicates for Affymetrix and cDNA arrays, achieving variance stabilization can be difficult. Although the Illumina microarray platform provides a larger number of technical replicates on each array (usually over 30 randomly distributed beads per probe), these replicates have not been leveraged in the current log2 data transformation process. We devised a variance-stabilizing transformation (VST) method that takes advantage of the technical replicates available on an Illumina microarray. We have compared VST with log2 and Variance-stabilizing normalization (VSN) by using the Kruglyak bead-level data (2006) and Barnes titration data (2005). The results of the Kruglyak data suggest that VST stabilizes variances of bead-replicates within an array. The results of the Barnes data show that VST can improve the detection of differentially expressed genes and reduce false-positive identifications. We conclude that although both VST and VSN are built upon the same model of measurement noise, VST stabilizes the variance better and more efficiently for the Illumina platform by leveraging the availability of a larger number of within-array replicates. The algorithms and Supplementary Data are included in the lumi package of Bioconductor, available at: www.bioconductor.org.
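The idea of using within-array replicates can be sketched under a simplifying assumption: the standard deviation of each probe's bead replicates grows linearly with its mean, sd ≈ a + b·mean.

1. Fit a and b by least squares across probes.
2. Apply the delta-method transform h(u) = ln(a + b·u)/b.

The resulting transform has a slope that, multiplied by the fitted SD, equals 1 at every intensity. This illustrates the principle only. It is not the formula implemented in the lumi package, and the function names are hypothetical.

```python
import math
import statistics


def fit_mean_sd(replicate_groups):
    """Least-squares fit of sd = a + b * mean, using one (mean, sd) pair per probe."""
    means = [statistics.fmean(group) for group in replicate_groups]
    sds = [statistics.stdev(group) for group in replicate_groups]  # needs >= 2 beads per probe
    mean_x, mean_y = statistics.fmean(means), statistics.fmean(sds)
    sxx = sum((m - mean_x) ** 2 for m in means)
    sxy = sum((m - mean_x) * (s - mean_y) for m, s in zip(means, sds))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def vst(u, a, b):
    """Delta-method VST for sd(u) = a + b*u; requires b > 0 and a + b*u > 0.

    Its derivative is 1 / (a + b*u), so derivative times SD is 1 at every u.
    """
    return math.log(a + b * u) / b
```

In practice each inner list of `replicate_groups` would hold one probe's bead-level intensities. If the fitted intercept is negative, the transform is only defined where a + b·u stays positive.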
6.
7.
Hongying Jiang, Youping Deng, Huann-Sheng Chen, Lin Tao, Qiuying Sha, Jun Chen, Chung-Jui Tsai, Shuanglin Zhang
Background
Because many microarray experiments are costly and poorly reproducible, each study typically includes only a limited number of patient samples. As a result, different studies of patients with the same disease share very few identified marker genes. Merging data sets from multiple studies to increase the sample size, and thereby the power of statistical inference, is therefore of great interest but also challenging. In this study, we combined two lung cancer studies that used microarray GeneChip® data. We employed two gene shaving methods to identify genes whose expression patterns distinguish diseased from normal samples, and a two-step survival test to identify genes whose expression patterns indicate patient survival.
Results
In addition to common data transformation and normalization procedures, we applied a distribution transformation method to integrate the two data sets. Gene shaving (GS) methods based on Random Forests (RF) and Fisher's Linear Discrimination (FLD) were then applied separately to the joint data set for cancer gene selection. The two methods discovered 13 and 10 marker genes (5 in common), respectively, with expression patterns differentiating diseased from normal samples. Of these marker genes, 8 and 7, respectively, were found to be cancer-related in other published reports. Furthermore, classifiers built on these marker genes from one data set predicted the other data set with more than 98% accuracy. Using the univariate Cox proportional hazards regression model, the expression patterns of 36 genes were found to be significantly correlated with patient survival (p < 0.05). Twenty-six of these 36 genes have been reported as survival-related genes in the literature, including 7 known tumor-suppressor genes and 9 oncogenes. Additional principal component regression analysis further reduced the gene list from 36 to 16.
Conclusion
This study provided a valuable method of integrating microarray data sets with different origins, and new methods of selecting a minimum number of marker genes to aid in cancer diagnosis. After careful data integration, the classification method developed from one data set can be applied to the other with high prediction accuracy.
8.
Inoue M, Nishimura S, Hori G, Nakahara H, Saito M, Yoshihara Y, Amari S. Journal of bioinformatics and computational biology, 2004, 2(4):669-679
A gene-expression microarray datum is modeled as an exponential expression signal (log-normal distribution) plus additive noise. A variance-stabilizing transformation based on this model is useful for improving the uniformity of variance, which conventional statistical analysis methods often assume. However, the existing method of estimating the transformation parameters may be unreliable because it handles outliers poorly. By employing an information normalization technique, we have developed an improved parameter estimation method. It allows statistically more straightforward outlier exclusion and works well even with small sample sizes. Validation with experimental data suggests that this method is superior to the conventional one.
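The measurement model in this abstract, a log-normal signal plus additive noise, implies a specific mean-variance relationship. Take y = exp(m + η) + ε, with η ~ N(0, s²) and ε ~ N(0, σ²) independent. Then Var(y) = (e^{s²} - 1)·E[signal]² + σ².

So the variance is roughly constant near background and grows with the square of the mean at high intensity. A VST built on this model has to flatten exactly that relationship. The sketch below checks the formula by simulation. The parameter values are arbitrary, and it does not implement the paper's information-normalization estimator.

```python
import math
import random
import statistics


def model_variance(m, s, sigma):
    """Var(y) for y = exp(m + eta) + eps, eta ~ N(0, s^2), eps ~ N(0, sigma^2), independent."""
    mean_signal = math.exp(m + s * s / 2.0)  # mean of the log-normal signal
    return (math.exp(s * s) - 1.0) * mean_signal ** 2 + sigma ** 2


def simulate(m, s, sigma, n, seed=0):
    """Draw n observations from the multiplicative-plus-additive error model."""
    rng = random.Random(seed)
    return [math.exp(m + rng.gauss(0.0, s)) + rng.gauss(0.0, sigma) for _ in range(n)]


if __name__ == "__main__":
    for m in (0.0, 2.0, 6.0):
        ys = simulate(m, 0.3, 5.0, 50_000)
        print(f"m={m}: simulated var={statistics.pvariance(ys):10.1f}  model var={model_variance(m, 0.3, 5.0):10.1f}")
```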
9.
The qualitative dimension of gene expression data and its heterogeneous nature in cancerous specimens can be accounted for by phylogenetic modeling. Such modeling incorporates the directionality of altered gene expressions, complex patterns of expression among a group of specimens, and data-based rather than specimen-based gene linkage. Our phylogenetic modeling approach is a two-stage algorithmic technique. It begins with polarity assessment, which brings out the qualitative value of the data, followed by maximum parsimony analysis, which is best suited to the heterogeneity of cancer gene expression data. We demonstrate that polarity assessment of expression values into derived and ancestral states, via outgroup comparison:
- reduces experimental noise;
- reveals dichotomously expressed asynchronous genes;
- allows data pooling as well as comparability within and between platforms.

Parsimony phylogenetic analysis of the polarized values produces a multidimensional classification of specimens into clades that reveal shared derived gene expressions (the synapomorphies). In addition, it:
- provides better assessment of ontogenic pathways and phyletic relatedness of specimens;
- efficiently uses dichotomously expressed genes;
- produces highly predictive class recognition;
- illustrates gene linkage and multiple developmental pathways;
- provides higher concordance between gene lists;
- projects the direction of change among specimens.

A further implication of this phylogenetic approach is that it may turn microarrays into diagnostic, prognostic, and predictive tools.
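The two stages can be sketched in miniature under two simplifying assumptions.

- Polarity: a specimen's value for a gene is coded ancestral (0) if it lies within the range observed in the outgroup (for example, normal tissue) and derived (1) otherwise.
- Parsimony: the parsimony stage is Fitch's small-parsimony count of state changes on a fixed rooted binary tree.

The specimen names, tree, and values are hypothetical. The authors' method also searches over candidate trees, which this sketch does not.

```python
def polarize(values, outgroup):
    """Code each specimen 0 (ancestral: inside the outgroup range) or 1 (derived: outside it)."""
    low, high = min(outgroup), max(outgroup)
    return {name: 0 if low <= v <= high else 1 for name, v in values.items()}


def fitch(tree, states):
    """Fitch small parsimony on a rooted binary tree given as nested 2-tuples of leaf names.

    Returns (candidate root states, minimum number of state changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # no shared state: keep the union and count one change
    return left_set | right_set, left_cost + right_cost + 1


def tree_score(tree, genes, outgroups):
    """Total parsimony score of a tree, summed over genes (lower is more parsimonious)."""
    return sum(fitch(tree, polarize(genes[g], outgroups[g]))[1] for g in genes)
```

Comparing `tree_score` across candidate trees and keeping the lowest is the essence of the maximum parsimony criterion. Real tree search explores far more candidates than a hand-picked few.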
10.
The microarray technique has become a standard means of simultaneously examining the expression of all genes under different conditions. Microarray data typically have high-dimensional features and a small number of samples. Feature selection therefore needs to be incorporated to identify a subset of genes that are meaningful for biological interpretation and account for the sample variation. In this article, we present a simple yet effective feature selection framework suitable for two-dimensional microarray data. Our correlation-based, nonparametric approach allows compact representation of class-specific properties with a small number of genes. We evaluated our method using publicly available experimental data and obtained favorable results.
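The abstract does not spell out the criterion, so the following is only one plausible reading. It is a nonparametric, correlation-based filter: genes are ranked by the absolute Spearman rank correlation between their expression and the class label, and the top k are kept. The gene names and data in the usage lines are hypothetical.

```python
import math
import statistics


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no class information
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def select_genes(expression, labels, top_k):
    """Keep the top_k genes by |Spearman correlation| between expression and class label."""
    scores = {gene: abs(spearman(values, labels)) for gene, values in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

For example, `select_genes({"g1": [1, 2, 3, 10, 11, 12], "g2": [5, 1, 4, 2, 6, 3]}, [0, 0, 0, 1, 1, 1], 1)` keeps the gene whose ranks track the labels. Because only ranks enter the score, the filter is insensitive to monotone rescaling such as a log transform.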
11.
Ontologizing gene-expression microarray data: characterizing clusters with Gene Ontology (total citations: 2; self-citations: 0; citations by others: 2)
An XML-based Java application is described that provides a function-oriented overview of the results of cluster analysis of gene-expression microarray data based on Gene Ontology terms and associations. The application generates one HTML page with listings of the frequencies of explicit and implicit Gene Ontology annotations for each cluster, and separate, linked pages with listings of explicit annotations for each gene in a cluster.
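The explicit/implicit distinction can be sketched as follows. The sketch assumes a common convention that the abstract does not state: an implicit annotation is any ancestor term reached by walking up the ontology's parent links from a gene's explicit terms.

The term and gene identifiers are made-up placeholders. The sketch only computes the per-cluster frequencies, not the XML/HTML output of the Java application.

```python
from collections import Counter


def ancestors(term, parents):
    """All terms reachable from `term` by following parent links (the term itself excluded)."""
    found = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def cluster_term_counts(cluster_genes, annotations, parents):
    """Count, per term, how many genes in the cluster carry it explicitly or only implicitly."""
    explicit, implicit = Counter(), Counter()
    for gene in cluster_genes:
        direct = set(annotations.get(gene, ()))
        inherited = set()
        for term in direct:
            inherited |= ancestors(term, parents)
        explicit.update(direct)
        implicit.update(inherited - direct)  # each inherited term counted once per gene
    return explicit, implicit
```

Because the ontology is a directed acyclic graph rather than a tree, a term can be inherited along several paths. Collecting ancestors into a set ensures each gene contributes at most once per term.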
12.
Supervised cluster analysis for microarray data based on multivariate Gaussian mixture (total citations: 7; self-citations: 0; citations by others: 7)
MOTIVATION: Grouping genes that have similar expression patterns is called gene clustering. It has proved to be a useful tool for extracting the underlying biological information in gene expression data. Many clustering procedures have been applied successfully to microarray gene clustering; most of them belong to the family of heuristic clustering algorithms. Model-based algorithms are an alternative. They assume that the whole set of microarray data is a finite mixture of a certain type of distribution with different parameters. Application of model-based algorithms to unsupervised clustering has been reported. Here, for the first time, we demonstrate the use of a model-based algorithm in supervised clustering of microarray data. RESULTS: We applied the proposed methods to real gene expression data and simulated data. The supervised model-based algorithm was superior to both the unsupervised method and the support vector machines (SVM) method. AVAILABILITY: The program written in the SAS language implementing methods I-III in this report is available upon request. The software of SVMs is available in the website http://svm.sdsc.edu/cgi-bin/nph-SVMsubmit.cgi
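Supervised model-based clustering can be illustrated with its simplest variant. First fit one multivariate Gaussian per known class from the training samples. Then assign a new sample to the class with the highest prior-weighted likelihood.

For brevity this sketch assumes diagonal covariance matrices, so it is essentially Gaussian naive Bayes. It is not a reconstruction of the paper's methods I-III or their SAS program.

```python
import math
import statistics


def fit_class_gaussians(samples, labels, var_floor=1e-6):
    """Estimate prior, per-feature mean and per-feature variance for each class."""
    model = {}
    for cls in sorted(set(labels)):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        columns = list(zip(*rows))
        means = [statistics.fmean(col) for col in columns]
        variances = [max(statistics.pvariance(col), var_floor) for col in columns]
        model[cls] = (len(rows) / len(samples), means, variances)
    return model


def log_score(x, prior, means, variances):
    """log prior + log N(x | means, diag(variances))."""
    return math.log(prior) + sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def classify(x, model):
    """Assign x to the class with the highest posterior (up to a shared constant)."""
    return max(model, key=lambda cls: log_score(x, *model[cls]))
```

With thousands of genes and few arrays per class, a full covariance matrix cannot be estimated reliably. That is one reason to prefer diagonal or otherwise constrained covariances in practice.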
13.
Background
Hepatocellular carcinoma (HCC) is a leading cause of death worldwide. Frequent cytogenetic abnormalities that occur in HCC suggest that tumor-modifying genes (oncogenes or tumor suppressors) may be driving selection for amplification or deletion of these particular genetic regions. In many cases, however, the gene(s) that drive the selection are unknown. Although techniques such as comparative genomic hybridization (CGH) have traditionally been used to identify cytogenetic aberrations, it might also be possible to identify them indirectly from gene-expression studies. A technique we have called comparative genomic microarray analysis (CGMA) predicts regions of cytogenetic change by searching for regional gene-expression biases. CGMA was applied to HCC gene-expression profiles to identify regions of frequent cytogenetic change and to identify genes whose expression is misregulated within these regions.
14.
Mario Huerta, Juan Cedano, Dario Peña, Antonio Rodriguez, Enrique Querol. BMC bioinformatics, 2009, 10(1):138-8
Background
Microarray technology is so expensive and powerful that it is essential to extract maximum value from microarray data, especially from large-sample-series microarrays. Our web tools aim to meet these researchers' needs by making it possible to formulate and test anything from a single hypothesis to entire models from a holistic point of view.
15.
16.
Culhane AC, Perrière G, Considine EC, Cotter TG, Higgins DG. Bioinformatics (Oxford, England), 2002, 18(12):1600-1608
MOTIVATION: Most supervised classification methods are limited by the requirement for more cases than variables. In microarray data the number of variables (genes) far exceeds the number of cases (arrays), and thus filtering and pre-selection of genes is required. We describe the application of Between Group Analysis (BGA) to the analysis of microarray data. A feature of BGA is that it can be used when the number of variables (genes) exceeds the number of cases (arrays). BGA is based on carrying out an ordination of groups of samples, using a standard method such as Correspondence Analysis (COA), rather than an ordination of the individual microarray samples. As such, it can be viewed as a method of carrying out COA with grouped data. RESULTS: We illustrate the power of the method using two cancer data sets. In both cases, we can quickly and accurately classify test samples from any number of specified a priori groups and identify the genes which characterize these groups. We obtained very high rates of correct classification, as determined by jack-knife or validation experiments with training and test sets. The results are comparable to those from other methods in terms of accuracy but the power and flexibility of BGA make it an especially attractive method for the analysis of microarray cancer data.
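The core of BGA as described is an ordination of group means rather than of individual arrays. It can be sketched with correspondence analysis computed from the SVD of standardized residuals. A new array is projected as a supplementary column and assigned to the nearest group centroid on the first (number of groups - 1) axes.

This is a simplified sketch. It assumes a non-negative genes × arrays matrix with no all-zero rows. It is not the authors' implementation, and the nearest-centroid rule is an assumption.

```python
import numpy as np


def correspondence_analysis(table):
    """CA of a non-negative table via SVD of the standardized residuals."""
    P = table / table.sum()
    r = P.sum(axis=1)                                    # row masses (genes)
    c = P.sum(axis=0)                                    # column masses (groups)
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_standard = U / np.sqrt(r)[:, None]               # standard coordinates of genes
    col_principal = (Vt.T * sv) / np.sqrt(c)[:, None]    # principal coordinates of groups
    return row_standard, col_principal, sv


def between_group_analysis(X, groups):
    """Ordinate the genes x groups table of group means instead of the individual arrays."""
    labels = sorted(set(groups))
    group_means = np.column_stack([
        X[:, [i for i, g in enumerate(groups) if g == label]].mean(axis=1)
        for label in labels
    ])
    row_standard, group_coords, _ = correspondence_analysis(group_means)
    n_axes = len(labels) - 1                             # k groups give at most k - 1 non-trivial axes
    return labels, row_standard[:, :n_axes], group_coords[:, :n_axes]


def assign(x, labels, row_standard, group_coords):
    """Project a new array as a supplementary column and return the nearest group."""
    profile = x / x.sum()
    coords = profile @ row_standard                      # transition formula for a supplementary column
    distances = np.linalg.norm(group_coords - coords, axis=1)
    return labels[int(np.argmin(distances))]
```

Because only one column per group enters the SVD, the decomposition stays small however many genes there are. This reflects the abstract's point that BGA works when genes far outnumber arrays.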
17.
Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems. Cluster analysis has proven to be a very useful tool for investigating the structure of microarray data. This paper presents a program for clustering microarray data based on the so-called path-distance. At each step the algorithm partitions the data into two clusters, and no prior assumptions about the structure of the clusters are required. It assigns each object (gene or sample) to exactly one cluster and finds the global optimum of the function that quantifies the adequacy of a given partition of the sample into k clusters. The program was tested on experimental data sets, which showed the robustness of the algorithm.
18.
Microarray technology is increasingly being applied in biological and medical research to address a wide range of problems. Cluster analysis has proven to be a very useful tool for investigating the structure of microarray data. This paper presents a program for clustering microarray data based on the so-called path-distance. At each step the algorithm partitions the data into two clusters, and no prior assumptions about the structure of the clusters are required. It assigns each object (gene or sample) to exactly one cluster and finds the global optimum of the function that quantifies the adequacy of a given partition of the sample into k clusters. The program was tested on experimental data sets, which showed the robustness of the algorithm.
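The sketch below assumes that "path-distance" means the minimax path distance: over all paths joining two objects, the smallest possible value of the largest step along the path. Under that reading, deleting the longest edge of a minimum spanning tree gives the two-cluster partition that maximizes the smallest path distance between the two clusters.

The sketch builds the tree with Prim's algorithm on Euclidean distances and performs one such split. Applying it recursively yields further clusters. The paper's objective function may differ from this reading.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns edges as (weight, i, j)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def split_in_two(points):
    """Cut the longest MST edge; return a 0/1 cluster label for every point."""
    edges = minimum_spanning_tree(points)
    longest = max(edges)
    neighbours = {i: [] for i in range(len(points))}
    for edge in edges:
        if edge is not longest:
            _, i, j = edge
            neighbours[i].append(j)
            neighbours[j].append(i)
    # everything still reachable from one end of the removed edge forms cluster 0
    start = longest[1]
    seen = {start}
    stack = [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return [0 if i in seen else 1 for i in range(len(points))]
```

No cluster shapes are assumed: the split depends only on the chain of nearest-neighbour steps. This matches the abstract's claim that no prior assumptions on cluster structure are needed.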
19.
Computational analysis of microarray data (total citations: 1; self-citations: 0; citations by others: 1)
Quackenbush J. Nature reviews. Genetics, 2001, 2(6):418-427
Microarray experiments are providing unprecedented quantities of genome-wide data on gene-expression patterns. Although this technique has been enthusiastically developed and applied in many biological contexts, the management and analysis of the millions of data points that result from these experiments has received less attention. Sophisticated computational tools are available, but the methods that are used to analyse the data can have a profound influence on the interpretation of the results. A basic understanding of these computational tools is therefore required for optimal experimental design and meaningful data analysis.
20.
Robust cluster analysis of microarray gene expression data with the number of clusters determined biologically (total citations: 6; self-citations: 0; citations by others: 6)
Bickel DR. Bioinformatics (Oxford, England), 2003, 19(7):818-824
MOTIVATION: The success of each method of cluster analysis depends on how well its underlying model describes the patterns of expression. Outlier-resistant and distribution-insensitive methods of clustering genes are robust against violations of model assumptions. RESULTS: A measure of dissimilarity that combines advantages of the Euclidean distance and the correlation coefficient is introduced. The measure can be made robust using a rank order correlation coefficient. A robust graphical method of summarizing the results of cluster analysis and a biological method of determining the number of clusters are also presented. These methods are applied to a public data set, showing that rank-based methods perform better than log-based methods. AVAILABILITY: Software is available from http://www.davidbickel.com.
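One standard way to see how Euclidean distance and correlation relate is shown below. It is not necessarily Bickel's exact measure. Standardize each profile to mean 0 and population standard deviation 1. For two profiles of length n, the squared Euclidean distance between them then equals 2n(1 - r), where r is their Pearson correlation.

Replacing values by ranks before standardizing turns r into Spearman's rank correlation. This makes the distance insensitive to a single extreme value. The rank helper assumes no tied values.

```python
import math
import statistics


def standardize(values):
    """Center to mean 0 and scale to population standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def ranks(values):
    """Ranks 1..n (assumes no tied values)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def standardized_distance(x, y, robust=False):
    """Euclidean distance between standardized profiles; squared, it equals 2n(1 - r).

    With robust=True the profiles are replaced by ranks first, so r becomes Spearman's rho.
    """
    if robust:
        x, y = ranks(x), ranks(y)
    return math.dist(standardize(x), standardize(y))
```

For example, the profiles `[1, 2, 3, 4, 5]` and `[2, 4, 6, 8, 100]` are at distance zero in the rank-based version, because their orderings agree. In the plain version, the single extreme value pulls them apart.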