共查询到20条相似文献,搜索用时 0 毫秒
1.
Identifying differentially expressed genes from microarray experiments via statistic synthesis 总被引:1,自引:0,他引:1
MOTIVATION: A common objective of microarray experiments is the detection of differential gene expression between samples obtained under different conditions. The task of identifying differentially expressed genes consists of two aspects: ranking and selection. Numerous statistics have been proposed to rank genes in order of evidence for differential expression. However, no one statistic is universally optimal and there is seldom any basis or guidance that can direct toward a particular statistic of choice. RESULTS: Our new approach, which addresses both ranking and selection of differentially expressed genes, integrates differing statistics via a distance synthesis scheme. Using a set of (Affymetrix) spike-in datasets, in which differentially expressed genes are known, we demonstrate that our method compares favorably with the best individual statistics, while achieving robustness properties lacked by the individual statistics. We further evaluate performance on one other microarray study. 相似文献
2.
3.
Zhang C Lu X Zhang X 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2006,3(3):312-320
Many methods for classification and gene selection with microarray data have been developed. These methods usually give a ranking of genes. Evaluating the statistical significance of the gene ranking is important for understanding the results and for further biological investigations, but this question has not been well addressed for machine learning methods in existing works. Here, we address this problem by formulating it in the framework of hypothesis testing and propose a solution based on resampling. The proposed r-test methods convert gene ranking results into position p-values to evaluate the significance of genes. The methods are tested on three real microarray data sets and three simulation data sets with support vector machines as the method of classification and gene selection. The obtained position p-values help to determine the number of genes to be selected and enable scientists to analyze selection results by sophisticated multivariate methods under the same statistical inference paradigm as for simple hypothesis testing methods. 相似文献
4.
MOTIVATION: An important application of microarray experiments is to identify differentially expressed genes. Because microarray data are often not distributed according to a normal distribution nonparametric methods were suggested for their statistical analysis. Here, the Baumgartner-Weiss-Schindler test, a novel and powerful test based on ranks, is investigated and compared with the parametric t-test as well as with two other nonparametric tests (Wilcoxon rank sum test, Fisher-Pitman permutation test) recently recommended for the analysis of gene expression data. RESULTS: Simulation studies show that an exact permutation test based on the Baumgartner-Weiss-Schindler statistic B is preferable to the other three tests. It is less conservative than the Wilcoxon test and more powerful, in particular in case of asymmetric or heavily tailed distributions. When the underlying distribution is symmetric the differences in power between the tests are relatively small. Thus, the Baumgartner-Weiss-Schindler is recommended for the usual situation that the underlying distribution is a priori unknown. AVAILABILITY: SAS code available on request from the authors. 相似文献
5.
Statistical methods for ranking differentially expressed genes 总被引:2,自引:1,他引:2
Broberg P 《Genome biology》2003,4(6):R41
In the analysis of microarray data the identification of differential expression is paramount. Here I outline a method for finding an optimal test statistic with which to rank genes with respect to differential expression. Tests of the method show that it allows generation of top gene lists that give few false positives and few false negatives. Estimation of the false-negative as well as the false-positive rate lies at the heart of the method. 相似文献
6.
Efficient genotyping methods and the availability of a large collection of single-nucleotide polymorphisms provide valuable tools for genetic studies of human disease. The standard chi2 statistic for case-control studies, which uses a linear function of allele frequencies, has limited power when the number of marker loci is large. We introduce a novel test statistic for genetic association studies that uses Shannon entropy and a nonlinear function of allele frequencies to amplify the differences in allele and haplotype frequencies to maintain statistical power with large numbers of marker loci. We investigate the relationship between the entropy-based test statistic and the standard chi2 statistic and show that, in most cases, the power of the entropy-based statistic is greater than that of the standard chi2 statistic. The distribution of the entropy-based statistic and the type I error rates are validated using simulation studies. Finally, we apply the new entropy-based test statistic to two real data sets, one for the COMT gene and schizophrenia and one for the MMP-2 gene and esophageal carcinoma, to evaluate the performance of the new method for genetic association studies. The results show that the entropy-based statistic obtained smaller P values than did the standard chi2 statistic. 相似文献
7.
Empirical evaluation of data transformations and ranking statistics for microarray analysis 总被引:4,自引:1,他引:4
Qin LX Kerr KF;Contributing Members of the Toxicogenomics Research Consortium 《Nucleic acids research》2004,32(18):5471-5479
There are many options in handling microarray data that can affect study conclusions, sometimes drastically. Working with a two-color platform, this study uses ten spike-in microarray experiments to evaluate the relative effectiveness of some of these options for the experimental goal of detecting differential expression. We consider two data transformations, background subtraction and intensity normalization, as well as six different statistics for detecting differentially expressed genes. Findings support the use of an intensity-based normalization procedure and also indicate that local background subtraction can be detrimental for effectively detecting differential expression. We also verify that robust statistics outperform t-statistics in identifying differentially expressed genes when there are few replicates. Finally, we find that choice of image analysis software can also substantially influence experimental conclusions. 相似文献
8.
9.
PURPOSE OF REVIEW: To highlight the development in microarray data analysis for the identification of differentially expressed genes, particularly via control of false discovery rate. RECENT FINDINGS: The emergence of high-throughput technology such as microarrays raises two fundamental statistical issues: multiplicity and sensitivity. We focus on the biological problem of identifying differentially expressed genes. First, multiplicity arises due to testing tens of thousands of hypotheses, rendering the standard P value meaningless. Second, known optimal single-test procedures such as the t-test perform poorly in the context of highly multiple tests. The standard approach of dealing with multiplicity is too conservative in the microarray context. The false discovery rate concept is fast becoming the key statistical assessment tool replacing the P value. We review the false discovery rate approach and argue that it is more sensible for microarray data. We also discuss some methods to take into account additional information from the microarrays to improve the false discovery rate. SUMMARY: There is growing consensus on how to analyse microarray data using the false discovery rate framework in place of the classical P value. Further research is needed on the preprocessing of the raw data, such as the normalization step and filtering, and on finding the most sensitive test procedure. 相似文献
10.
11.
12.
Sample size determination for case-control studies of chronic disease are often based on the simple 2 X 2 tabular cross-classification of exposure and disease, thereby ignoring stratification which may be considered in the analysis. One consequence of this approach is that the sample size may be inadequate to attain a specified power and size when performing a statistical analysis on J 2 X 2 tables using Cochran's (1954, Biometrics 10, 417-451) statistic or the Mantel-Haenszel (1959, Journal of the National Cancer Institute 22, 719-748) statistic. A sample size formula is derived from Cochran's statistic and it is compared with the corresponding one derived when the data are treated as unstratified, and also with two other formulas proposed for stratified data analysis. The formula developed yields values slightly higher than one recently proposed by Mu?oz and Rosner (1984, Biometrics 40, 995-1004), which assumes that both margins of each 2 X 2 table are fixed, while the present study considers only the case-control margin to be fixed. 相似文献
13.
14.
Mugdha Gadgil 《BMC bioinformatics》2008,9(1):380
Background
DNA microarrays are used to investigate differences in gene expression between two or more classes of samples. Most currently used approaches compare mean expression levels between classes and are not geared to find genes whose expression is significantly different in only a subset of samples in a class. However, biological variability can lead to situations where key genes are differentially expressed in only a subset of samples. To facilitate the identification of such genes, a new method is reported. 相似文献15.
Background
Genomic instability in cancer leads to abnormal genome copy number alterations (CNA) as a mechanism underlying tumorigenesis. Using microarrays and other technologies, tumor CNA are detected by comparing tumor sample CN to normal reference sample CN. While advances in microarray technology have improved detection of copy number alterations, the increase in the number of measured signals, noise from array probes, variations in signal-to-noise ratio across batches and disparity across laboratories leads to significant limitations for the accurate identification of CNA regions when comparing tumor and normal samples.Methods
To address these limitations, we designed a novel "Virtual Normal" algorithm (VN), which allowed for construction of an unbiased reference signal directly from test samples within an experiment using any publicly available normal reference set as a baseline thus eliminating the need for an in-lab normal reference set.Results
The algorithm was tested using an optimal, paired tumor/normal data set as well as previously uncharacterized pediatric malignant gliomas for which a normal reference set was not available. Using Affymetrix 250K Sty microarrays, we demonstrated improved signal-to-noise ratio and detected significant copy number alterations using the VN algorithm that were validated by independent PCR analysis of the target CNA regions.Conclusions
We developed and validated an algorithm to provide a virtual normal reference signal directly from tumor samples and minimize noise in the derivation of the raw CN signal. The algorithm reduces the variability of assays performed across different reagent and array batches, methods of sample preservation, multiple personnel, and among different laboratories. This approach may be valuable when matched normal samples are unavailable or the paired normal specimens have been subjected to variations in methods of preservation. 相似文献16.
Background
In a time-course microarray experiment, the expression level for each gene is observed across a number of time-points in order to characterize the temporal trajectories of the gene-expression profiles. For many of these experiments, the scientific aim is the identification of genes for which the trajectories depend on an experimental or phenotypic factor. There is an extensive recent body of literature on statistical methodology for addressing this analytical problem. Most of the existing methods are based on estimating the time-course trajectories using parametric or non-parametric mean regression methods. The sensitivity of these regression methods to outliers, an issue that is well documented in the statistical literature, should be of concern when analyzing microarray data. 相似文献17.
Background
We describe the E-RFE method for gene ranking, which is useful for the identification of markers in the predictive classification of array data. The method supports a practical modeling scheme designed to avoid the construction of classification rules based on the selection of too small gene subsets (an effect known as the selection bias, in which the estimated predictive errors are too optimistic due to testing on samples already considered in the feature selection process).Results
With E-RFE, we speed up the recursive feature elimination (RFE) with SVM classifiers by eliminating chunks of uninteresting genes using an entropy measure of the SVM weights distribution. An optimal subset of genes is selected according to a two-strata model evaluation procedure: modeling is replicated by an external stratified-partition resampling scheme, and, within each run, an internal K-fold cross-validation is used for E-RFE ranking. Also, the optimal number of genes can be estimated according to the saturation of Zipf's law profiles.Conclusions
Without a decrease of classification accuracy, E-RFE allows a speed-up factor of 100 with respect to standard RFE, while improving on alternative parametric RFE reduction strategies. Thus, a process for gene selection and error estimation is made practical, ensuring control of the selection bias, and providing additional diagnostic indicators of gene importance.18.
MOTIVATION: Microarray technology emerges as a powerful tool in life science. One major application of microarray technology is to identify differentially expressed genes under various conditions. Currently, the statistical methods to analyze microarray data are generally unsatisfactory, mainly due to the lack of understanding of the distribution and error structure of microarray data. RESULTS: We develop a generalized likelihood ratio (GLR) test based on the two-component model proposed by Rocke and Durbin to identify differentially expressed genes from microarray data. Simulation studies show that the GLR test is more powerful than commonly used methods, like the fold-change method and the two-sample t-test. When applied to microarray data, the GLR test identifies more differentially expressed genes than the t-test, has a lower false discovery rate and shows more consistency over independently repeated experiments. AVAILABILITY: The approach is implemented in software called GLR, which is freely available for downloading at http://www.cc.utah.edu/~jw27c60 相似文献
19.
Improving identification of differentially expressed genes in microarray studies using information from public databases
下载免费PDF全文

We demonstrate that the process of identifying differentially expressed genes in microarray studies with small sample sizes can be substantially improved by extracting information from a large number of datasets accumulated in public databases. The improvement comes from more reliable estimates of gene-specific variances based on other datasets. For a two-group comparison with two arrays in each group, for example, the result of our method was comparable to that of a t-test analysis with five samples in each group or to that of a regularized t-test analysis with three samples in each group. Our results are further improved by weighting the results of our approach with the regularized t-test results in a hybrid method. 相似文献
20.
McNemar's (1947, Psychometrika 12, 153-157) test of marginal homogeneity is generalized to a two-sample situation where the hypothesis of interest is that the marginal changes in each of two independently sampled tables are equal. This situation is especially applicable to two cohorts (a control and an intervention cohort), each measured at baseline and after the intervention on a binary outcome variable. Some assumptions often realistic in this situation simplify the calculation of sample size. The calculation of sample size in a study designed to increase utilization of breast cancer screening is demonstrated. 相似文献