首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.

Background  

Various statistical and machine learning methods have been successfully applied to the classification of DNA microarray data. Simple instance-based classifiers such as nearest neighbor (NN) approaches perform remarkably well in comparison to more complex models, and are currently experiencing a renaissance in the analysis of data sets from biology and biotechnology. While binary classification of microarray data has been extensively investigated, studies involving multiclass data are rare. The question remains open whether there exists a significant difference in performance between NN approaches and more complex multiclass methods. Comparative studies in this field commonly assess different models based on their classification accuracy only; however, this approach lacks the rigor needed to draw reliable conclusions and is inadequate for testing the null hypothesis of equal performance. Comparing novel classification models to existing approaches requires focusing on the significance of differences in performance.  相似文献   

2.
Normalization of expression levels applied to microarray data can help in reducing measurement error. Different methods, including cyclic loess, quantile normalization and median or mean normalization, have been utilized to normalize microarray data. Although there is considerable literature regarding normalization techniques for mRNA microarray data, there are no publications comparing normalization techniques for microRNA (miRNA) microarray data, which are subject to similar sources of measurement error. In this paper, we compare the performance of cyclic loess, quantile normalization, median normalization and no normalization for a single-color microRNA microarray dataset. We show that the quantile normalization method works best in reducing differences in miRNA expression values for replicate tissue samples. By showing that the total mean squared error are lowest across almost all 36 investigated tissue samples, we are assured that the bias correction provided by quantile normalization is not outweighed by additional error variance that can arise from a more complex normalization method. Furthermore, we show that quantile normalization does not achieve these results by compression of scale.  相似文献   

3.
Regression approaches for microarray data analysis.   总被引:6,自引:0,他引:6  
A variety of new procedures have been devised to handle the two-sample comparison (e.g., tumor versus normal tissue) of gene expression values as measured with microarrays. Such new methods are required in part because of some defining characteristics of microarray-based studies: (i) the very large number of genes contributing expression measures which far exceeds the number of samples (observations) available and (ii) the fact that by virtue of pathway/network relationships, the gene expression measures tend to be highly correlated. These concerns are exacerbated in the regression setting, where the objective is to relate gene expression, simultaneously for multiple genes, to some external outcome or phenotype. Correspondingly, several methods have been recently proposed for addressing these issues. We briefly critique some of these methods prior to a detailed evaluation of gene harvesting. This reveals that gene harvesting, without additional constraints, can yield artifactual solutions. Results obtained employing such constraints motivate the use of regularized regression procedures such as the lasso, least angle regression, and support vector machines. Model selection and solution multiplicity issues are also discussed. The methods are evaluated using a microarray-based study of cardiomyopathy in transgenic mice.  相似文献   

4.
Multivariate exploratory tools for microarray data analysis   总被引:2,自引:0,他引:2  
The ultimate success of microarray technology in basic and applied biological sciences depends critically on the development of statistical methods for gene expression data analysis. The most widely used tests for differential expression of genes are essentially univariate. Such tests disregard the multidimensional structure of microarray data. Multivariate methods are needed to utilize the information hidden in gene interactions and hence to provide more powerful and biologically meaningful methods for finding subsets of differentially expressed genes. The objective of this paper is to develop methods of multidimensional search for biologically significant genes, considering expression signals as mutually dependent random variables. To attain these ends, we consider the utility of a pertinent distance between random vectors and its empirical counterpart constructed from gene expression data. The distance furnishes exploratory procedures aimed at finding a target subset of differentially expressed genes. To determine the size of the target subset, we resort to successive elimination of smaller subsets resulting from each step of a random search algorithm based on maximization of the proposed distance. Different stopping rules associated with this procedure are evaluated. The usefulness of the proposed approach is illustrated with an application to the analysis of two sets of gene expression data.  相似文献   

5.
Using ANOVA to analyze microarray data   总被引:6,自引:0,他引:6  
Churchill GA 《BioTechniques》2004,37(2):173-5, 177
ANOVA provides a general approach to the analysis of single and multiple factor experiments on both one- and two-color microarray platforms. Mixed model ANOVA is important because in many microarray experiments there are multiple sources of variation that must be taken into consideration when constructing tests for differential expression of a gene. The genome is large, and the signals of expression change can be small, so we must rely on rigorous statistical methods to distinguish signal from noise. We apply statistical tests to ensure that we are not just making up stories based on seeing patterns where there may be none.  相似文献   

6.
Clustering is a major tool for microarray gene expression data analysis. The existing clustering methods fall mainly into two categories: parametric and nonparametric. The parametric methods generally assume a mixture of parametric subdistributions. When the mixture distribution approximately fits the true data generating mechanism, the parametric methods perform well, but not so when there is nonnegligible deviation between them. On the other hand, the nonparametric methods, which usually do not make distributional assumptions, are robust but pay the price for efficiency loss. In an attempt to utilize the known mixture form to increase efficiency, and to free assumptions about the unknown subdistributions to enhance robustness, we propose a semiparametric method for clustering. The proposed approach possesses the form of parametric mixture, with no assumptions to the subdistributions. The subdistributions are estimated nonparametrically, with constraints just being imposed on the modes. An expectation-maximization (EM) algorithm along with a classification step is invoked to cluster the data, and a modified Bayesian information criterion (BIC) is employed to guide the determination of the optimal number of clusters. Simulation studies are conducted to assess the performance and the robustness of the proposed method. The results show that the proposed method yields reasonable partition of the data. As an illustration, the proposed method is applied to a real microarray data set to cluster genes.  相似文献   

7.
MOTIVATION: Most supervised classification methods are limited by the requirement for more cases than variables. In microarray data the number of variables (genes) far exceeds the number of cases (arrays), and thus filtering and pre-selection of genes is required. We describe the application of Between Group Analysis (BGA) to the analysis of microarray data. A feature of BGA is that it can be used when the number of variables (genes) exceeds the number of cases (arrays). BGA is based on carrying out an ordination of groups of samples, using a standard method such as Correspondence Analysis (COA), rather than an ordination of the individual microarray samples. As such, it can be viewed as a method of carrying out COA with grouped data. RESULTS: We illustrate the power of the method using two cancer data sets. In both cases, we can quickly and accurately classify test samples from any number of specified a priori groups and identify the genes which characterize these groups. We obtained very high rates of correct classification, as determined by jack-knife or validation experiments with training and test sets. The results are comparable to those from other methods in terms of accuracy but the power and flexibility of BGA make it an especially attractive method for the analysis of microarray cancer data.  相似文献   

8.
Mayday is a workbench for visualization, analysis and storage of microarray data. It features a graphical user interface and supports the development and integration of existing and new analysis methods. Besides the infrastructural core functionality, Mayday offers a variety of plug-ins, such as various interactive viewers, a connection to the R statistical environment, a connection to SQL-based databases and different data mining methods, including WEKA-library based methods for classification and various clustering methods. In addition, so-called meta information objects are provided for annotation of the microarray data allowing integration of data from different sources, which is a feature that, for instance, is employed in the enhanced heatmap visualization. Supplementary information: The software and more detailed information including screenshots and a user guide as well as test data can be found on the Mayday home page http://www.zbit.uni-tuebingen.de/pas/mayday. The core is published under the GPL (GNU Public License) and the associated plug-ins under the LGPL (Lesser GNU Public License).  相似文献   

9.
Computational analysis of microarray data   总被引:1,自引:0,他引:1  
Microarray experiments are providing unprecedented quantities of genome-wide data on gene-expression patterns. Although this technique has been enthusiastically developed and applied in many biological contexts, the management and analysis of the millions of data points that result from these experiments has received less attention. Sophisticated computational tools are available, but the methods that are used to analyse the data can have a profound influence on the interpretation of the results. A basic understanding of these computational tools is therefore required for optimal experimental design and meaningful data analysis.  相似文献   

10.
11.
Journal of Mathematical Biology - The 3D microarrays, generally known as gene-sample-time microarrays, couple the information on different time points collected by 2D microarrays that measure gene...  相似文献   

12.
Microarray data contains a large number of genes (usually more than 1000) and a relatively small number of samples (usually fewer than 100). This presents problems to discriminant analysis of microarray data. One way to alleviate the problem is to reduce dimensionality of data by selecting important genes to the discriminant problem. Gene selection can be cast as a feature selection problem in the context of pattern classification. Feature selection approaches are broadly grouped into filter methods and wrapper methods. The wrapper method outperforms the filter method but at the cost of more intensive computation. In the present study, we proposed a wrapper-like gene selection algorithm based on the Regularization Network. Compared with classical wrapper method, the computational costs in our gene selection algorithm is significantly reduced, because the evaluation criterion we proposed does not demand repeated training in the leave-one-out procedure.  相似文献   

13.
MGraph: graphical models for microarray data analysis   总被引:2,自引:0,他引:2  
  相似文献   

14.

Background  

Genome-wide expression signatures are emerging as potential marker for overall survival and disease recurrence risk as evidenced by recent commercialization of gene expression based biomarkers in breast cancer. Similar predictions have recently been carried out using genome-wide copy number alterations and microRNAs. Existing software packages for microarray data analysis provide functions to define expression-based survival gene signatures. However, there is no software that can perform survival analysis using SNP array data or draw survival curves interactively for expression-based sample clusters.  相似文献   

15.
16.
This paper will give a complete methodological approach to the processing of oligonucleotide microarray data from postmortem tissue, particularly brain matter. Attention will be drawn to each of the important stages in the process; specifically the quality control, gene expression value calculation, multiple hypothesis testing and correlation analyses. We shall initially discuss the theoretical foundations of each individual method and subsequently apply the ensemble to a sample data set to illustrate and visualise important points.  相似文献   

17.
We consider identifying differentially expressing genes between two patient groups using microarray experiment. We propose a sample size calculation method for a specified number of true rejections while controlling the false discovery rate at a desired level. Input parameters for the sample size calculation include the allocation proportion in each group, the number of genes in each array, the number of differentially expressing genes and the effect sizes among the differentially expressing genes. We have a closed-form sample size formula if the projected effect sizes are equal among differentially expressing genes. Otherwise, our method requires a numerical method to solve an equation. Simulation studies are conducted to show that the calculated sample sizes are accurate in practical settings. The proposed method is demonstrated with a real study.  相似文献   

18.
Data analysis and management represent a major challenge for gene expression studies using microarrays. Here, we compare different methods of analysis and demonstrate the utility of a personal microarray database. Gene expression during HIV infection of cell lines was studied using Affymetrix U-133 A and B chips. The data were analyzed using Affymetrix Microarray Suite and Data Mining Tool, Silicon Genetics GeneSpring, and dChip from Harvard School of Public Health. A small-scale database was established with FileMaker Pro Developer to manage and analyze the data. There was great variability among the programs in the lists of significantly changed genes constructed from the same data. Similarly choices of different parameters for normalization, comparison, and standardization greatly affected the outcome. As many probe sets on the U133 chip target the same Unigene clusters, the Unigene information can be used as an internal control to confirm and interpret the probe set results. Algorithms used for the determination of changes in gene expression require further refinement and standardization. The use of a personal database powered with Unigene information can enhance the analysis of gene expression data.  相似文献   

19.

Background  

DNA microarrays, which determine the expression levels of tens of thousands of genes from a sample, are an important research tool. However, the volume of data they produce can be an obstacle to interpretation of the results. Clustering the genes on the basis of similarity of their expression profiles can simplify the data, and potentially provides an important source of biological inference, but these methods have not been tested systematically on datasets from complex human tissues. In this paper, four clustering methods, CRC, k-means, ISA and memISA, are used upon three brain expression datasets. The results are compared on speed, gene coverage and GO enrichment. The effects of combining the clusters produced by each method are also assessed.  相似文献   

20.
Microarray technology is associated with many sources of experimentaluncertainty. In this review we discuss a number of approachesfor dealing with this uncertainty in the processing of datafrom microarray experiments. We focus here on the analysis ofhigh-density oligonucleotide arrays, such as the popular AffymetrixGeneChip® array, which contain multiple probes for eachtarget. This set of probes can be used to determine an estimatefor the target concentration and can also be used to determinethe experimental uncertainty associated with this measurement.This measurement uncertainty can then be propagated throughthe downstream analysis using probabilistic methods. We giveexamples showing how these credibility intervals can be usedto help identify differential expression, to combine informationfrom replicated experiments and to improve the performance ofprincipal component analysis.   相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号