Similar Articles
20 similar articles found.
1.
MOTIVATION: We present statistical methods for determining the number of per-gene replicate spots required in microarray experiments. The purpose of these methods is to estimate the sampling variability present in microarray data and to determine the number of replicate spots required to achieve a high probability of detecting a significant fold change in gene expression while maintaining a low error rate. Our approach is based on data from control microarrays and uses standard statistical estimation techniques. RESULTS: After analyzing two experimental data sets containing control array data, we were able to determine the statistical power available for detecting significant differential expression at different levels of replication. Including replicate spots on microarrays not only allows more accurate estimation of the variability present in an experiment but, more importantly, increases the probability of detecting genes undergoing significant fold changes in expression, while substantially decreasing the probability of observing fold changes due to chance rather than true differential expression.
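The power calculation described above can be illustrated with a minimal sketch. It assumes that replicate log2 ratios for a gene are independent and normally distributed with a per-spot standard deviation sigma estimated from control arrays, and it applies a two-sided z-test to their mean. The function names, the error rate alpha = 0.001 and the example values are hypothetical, not taken from the study.

```python
import math
from statistics import NormalDist


def power_to_detect(log2_fc: float, sigma: float, n_reps: int, alpha: float = 0.001) -> float:
    """Approximate power of a two-sided z-test on the mean log2 ratio of n replicate spots.

    Assumes replicate log2 ratios are independent and normal with a known per-spot SD
    `sigma` (e.g. estimated from control arrays, where the true log ratio is 0).
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)                    # critical value for the chosen error rate
    shift = abs(log2_fc) * math.sqrt(n_reps) / sigma   # standardized effect for the mean of n spots
    return z.cdf(shift - crit) + z.cdf(-shift - crit)


def replicates_needed(log2_fc: float, sigma: float, target_power: float = 0.9,
                      alpha: float = 0.001, max_reps: int = 50):
    """Smallest number of replicate spots reaching `target_power`, or None if none up to max_reps."""
    for n in range(1, max_reps + 1):
        if power_to_detect(log2_fc, sigma, n, alpha) >= target_power:
            return n
    return None


# Hypothetical values: per-spot SD of 0.3 (log2 units) and a 2-fold change (log2 = 1).
print(replicates_needed(1.0, 0.3))  # prints 2
```

Under these assumptions the required number of spots grows roughly with (sigma / log2 fold change) squared, so noisier arrays or smaller effects demand disproportionately more replication.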

2.
Little consideration has been given to the effect of different segmentation methods on the variability of data derived from microarray images. Previous work has suggested that the most significant source of variability in microarray image analysis is the estimation of local background. In this study, we used analysis of variance (ANOVA) models to investigate the effect of segmentation method on the precision of measurements obtained from replicate microarray experiments. We used four different methods of spot segmentation (adaptive, fixed circle, histogram and GenePix) to analyse a total of 156,172 spots from 12 microarray experiments. Using a two-way ANOVA model and the coefficient of repeatability, we show that the method of segmentation significantly affects the precision of the microarray data. Of the four methods, the histogram method gave the lowest variability across replicate spots and had the lowest pixel-to-pixel variability within spots. This effect on precision was independent of background subtraction. We show that these findings have direct, practical implications, because the differences in precision among the four methods resulted in different numbers of genes being identified as differentially expressed. Segmentation method is an important source of variability in microarray data that directly affects precision and the identification of differentially expressed genes.
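The coefficient of repeatability mentioned above is commonly defined (following Bland and Altman) as 1.96 * sqrt(2) * s_w, where s_w is the within-subject standard deviation across replicates. The sketch below computes it from replicate spot measurements pooled over genes so that segmentation methods can be compared. It assumes log2 ratios as input, the method names and data are hypothetical, and the study's exact computation may differ.

```python
import math
from statistics import mean


def within_sd(replicate_sets):
    """Pooled within-gene SD from replicate spot measurements (one inner list per gene)."""
    ss = sum((v - mean(vals)) ** 2 for vals in replicate_sets for v in vals)
    df = sum(len(vals) - 1 for vals in replicate_sets)
    return math.sqrt(ss / df)


def coefficient_of_repeatability(replicate_sets):
    """Bland-Altman coefficient of repeatability: 1.96 * sqrt(2) * within-subject SD (about 2.77 s_w)."""
    return 1.96 * math.sqrt(2) * within_sd(replicate_sets)


# Hypothetical duplicate log2 ratios for two genes, quantified with two segmentation methods.
by_method = {
    "method_1": [[1.00, 1.20], [2.00, 1.80]],
    "method_2": [[1.00, 1.50], [2.00, 1.40]],
}
for name, reps in by_method.items():
    print(name, round(coefficient_of_repeatability(reps), 3))
```

A lower coefficient means replicate spots agree more closely, so the method with the smallest value is the most precise under this measure.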

3.
MOTIVATION: Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. The usual practice is to average the duplicate or triplicate results for each probe before assessing differential expression. This results in the loss of valuable information about genewise variability. RESULTS: A method is proposed for extracting more information from within-array replicate spots in microarray experiments by estimating the strength of the correlation between them. The method involves fitting separate linear models to the expression data for each gene but with a common value for the between-replicate correlation. The method greatly improves the precision with which the genewise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes. The method may be combined with empirical Bayes methods for moderating the genewise variances between genes. The method is validated using data from a microarray experiment involving calibration and ratio control spots in conjunction with spiked-in RNA. Comparing results for calibration and ratio control spots shows that the common correlation method results in substantially better discrimination of differentially expressed genes from those which are not. The spike-in experiment also confirms that the results may be further improved by empirical Bayes smoothing of the variances when the sample size is small. AVAILABILITY: The methodology is implemented in the limma software package for R, available from the CRAN repository http://www.r-project.org
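A simplified sketch of the common-correlation idea follows. limma's duplicateCorrelation() is documented as fitting a mixed model for each gene and averaging the per-gene correlations on the atanh (Fisher z) scale. Here, as an assumption made for brevity, the per-gene mixed-model fit is replaced by a one-way ANOVA intraclass correlation, and the clipping bound is a hypothetical choice that keeps atanh finite.

```python
import math
from statistics import mean


def duplicate_icc(spots_by_array):
    """One-way ANOVA intraclass correlation between within-array replicate spots of one gene.

    spots_by_array: one inner list of k replicate log ratios per array (k >= 2, >= 2 arrays).
    """
    n, k = len(spots_by_array), len(spots_by_array[0])
    array_means = [mean(s) for s in spots_by_array]
    grand = mean(array_means)
    msb = k * sum((m - grand) ** 2 for m in array_means) / (n - 1)  # between-array mean square
    msw = sum((y - m) ** 2 for s, m in zip(spots_by_array, array_means) for y in s) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    return 0.0 if denom == 0 else (msb - msw) / denom


def consensus_correlation(genes, clip=0.99):
    """Average per-gene correlations on the Fisher z (atanh) scale and back-transform."""
    z = [math.atanh(max(-clip, min(clip, duplicate_icc(g)))) for g in genes]
    return math.tanh(mean(z))


genes = [
    [[1.0, 1.1], [2.0, 2.1], [0.5, 0.4]],    # hypothetical gene: duplicates track each other
    [[0.2, -0.3], [0.1, 0.4], [-0.2, 0.1]],  # hypothetical gene: duplicates disagree
]
print(round(consensus_correlation(genes), 3))
```

The single consensus value would then be used as the between-duplicate correlation in every gene's linear model, which is what lets duplicate spots contribute information without simply being averaged.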

4.
5.
We present Bayesian hierarchical models for the analysis of Affymetrix GeneChip data. Our approach differs from other available approaches in two fundamental respects. First, we aim to integrate all processing steps of the raw data in a common, statistically coherent framework, allowing all components, and thus their associated errors, to be considered simultaneously. Second, inference is based on the full posterior distribution of gene expression indices and derived quantities, such as fold changes or ranks, rather than on single point estimates. Measures of uncertainty on these quantities are thus available. The models presented represent the first building block for integrated Bayesian analysis of Affymetrix GeneChip data: the models take into account additive as well as multiplicative error, and gene expression levels are estimated using perfect match and a fraction of mismatch probes and are modeled on the log scale. Background correction is incorporated by modeling true signal and cross-hybridization explicitly, and the need for further normalization is considerably reduced by allowing for array-specific distributions of nonspecific hybridization. When replicate arrays are available for a condition, posterior distributions of condition-specific gene expression indices are estimated directly, by simultaneous consideration of replicate probe sets, avoiding averaging over estimates obtained from individual replicate arrays. The performance of the Bayesian model is compared with that of standard point estimate methods on subsets of the well-known GeneLogic and Affymetrix spike-in data. The Bayesian model is found to perform well, and the integrated procedure presented appears to hold considerable promise for further development.

6.
7.
8.
MOTIVATION: Two important questions for the analysis of gene expression measurements from different sample classes are (1) how to classify samples and (2) how to identify meaningful gene signatures (ranked gene lists) exhibiting the differences between classes and sample subsets. Solutions to both questions have immediate biological and biomedical applications. To achieve optimal classification performance, a suitable combination of classifier and gene selection method needs to be specifically selected for a given dataset. The selected gene signatures can be unstable and the resulting classification accuracy unreliable, particularly when considering different subsets of samples. Both unstable gene signatures and overestimated classification accuracy can impair biological conclusions. METHODS: We address these two issues by repeatedly evaluating the classification performance of all models, i.e. pairwise combinations of various gene selection and classification methods, for random subsets of arrays (sampling). A model score is used to select the most appropriate model for the given dataset. Consensus gene signatures are constructed by extracting those genes frequently selected over many samplings. Sampling additionally permits measurement of the stability of the classification performance for each model, which serves as a measure of model reliability. RESULTS: We analyzed a large gene expression dataset with 78 measurements of four different cartilage sample classes. Classifiers trained on subsets of measurements frequently produce models with highly variable performance. Our approach provides reliable classification performance estimates via sampling. In addition to reliable classification performance, we determined stable consensus signatures (i.e. gene lists) for sample classes. Manual literature screening showed that these genes are highly relevant to our gene expression experiment with osteoarthritic cartilage. 
We compared our approach with others on a publicly available breast cancer dataset. AVAILABILITY: R package at http://www.bio.ifi.lmu.de/~davis/edaprakt
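The sampling scheme for consensus signatures can be sketched as follows. The sketch covers only the gene-selection half of the approach: it repeatedly draws random subsets of arrays from each class, ranks genes by Welch's t-statistic (an assumed stand-in for the gene selection methods the study compares), and keeps genes that land in the top_k in at least min_freq of the samplings. All function names and parameter defaults are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, stdev


def welch_t(a, b):
    """Magnitude of Welch's t-statistic for one gene (hypothetical choice of selection score)."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / (se or 1e-12)


def consensus_signature(expr, labels, top_k=2, n_samplings=200, frac=0.7, min_freq=0.8, seed=0):
    """Select top_k genes on many random subsets of arrays; keep those chosen in >= min_freq of runs.

    expr: dict gene -> list of values (one per array); labels: list of 0/1 class labels per array.
    """
    rng = random.Random(seed)
    idx = {c: [i for i, y in enumerate(labels) if y == c] for c in (0, 1)}
    counts = Counter()
    for _ in range(n_samplings):
        # Draw a random subset of arrays from each class (sampling without replacement).
        sub = {c: rng.sample(idx[c], max(2, int(frac * len(idx[c])))) for c in (0, 1)}
        scores = {g: welch_t([v[i] for i in sub[0]], [v[i] for i in sub[1]]) for g, v in expr.items()}
        counts.update(sorted(scores, key=scores.get, reverse=True)[:top_k])
    signature = sorted(g for g, c in counts.items() if c / n_samplings >= min_freq)
    return signature, counts


# Hypothetical data: 8 arrays per class; genes A and B differ between classes, C does not.
labels = [0] * 8 + [1] * 8
base = [0.0, 0.1, 0.2, 0.1, 0.0, 0.2, 0.1, 0.05]
expr = {"A": base + [x + 3 for x in base], "B": base + [x + 2 for x in base], "C": [0.0, 1.0] * 8}
print(consensus_signature(expr, labels)[0])  # prints ['A', 'B']
```

The selection counts also give a rough stability measure: a gene picked in nearly every sampling is less likely to be an artifact of one particular split of the arrays.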

9.
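Bioinformatics methods were used to screen and analyze miRNAs related to triple-negative breast cancer (TNBC) and their target genes, in order to provide potential molecular targets for TNBC research. GEO2R was used to analyze a TNBC-related miRNA microarray dataset, and the 5 upregulated and 5 downregulated miRNAs with the largest fold changes were selected. Target genes were predicted with miRWalk, TargetScan and miRDB, and their intersection was taken by Venn analysis. DAVID was used for GO enrichment analysis and KEGG pathway analysis of the target genes. A protein-protein interaction network was constructed with the STRING database, and a miRNA-target gene regulatory network was built with Cytoscape to identify key miRNAs and their key target genes. Survival analysis of the target genes was performed using the GEPIA2 database. GEO2R identified 486 differentially expressed miRNAs, of which 298 were upregulated and 188 were downregulated. Enrichment analysis of the target genes of the 5 most upregulated and 5 most downregulated miRNAs showed that they are mainly involved in the ErbB signaling pathway, transcriptional misregulation in cancer, and the cGMP-PKG signaling pathway. The miRNA-target gene regulatory network showed that the key upregulated miRNA was miR-611, whose key target genes were CDC27, UBE2D2, UBR1, SPSB1, HERC2 and RLIM; the key downregulated miRNA was miR-1205, whose key target genes were WSB1, FBXL8, UBE2W, PTPN11, ARF6, DNAJC6 and COPS2. Survival analysis showed that upregulated expression of UBR1 (P=0.0072) and PTPN11 (P=0.029) significantly reduced the overall survival of TNBC patients. The key miRNAs and key target genes identified by this screen may serve as potential molecular markers for the early diagnosis of TNBC, the selection of therapeutic targets and prognostic assessment, and provide a reference for further research.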
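The Venn-analysis step amounts to a set intersection over the target lists returned by the three prediction tools. The sketch assumes each tool's output has already been reduced to a set of gene symbols for one miRNA; the function name and gene names below are placeholders, not results from the study.

```python
def consensus_targets(*predicted_sets):
    """Genes predicted by every tool, i.e. the central region of the Venn diagram."""
    sets = [set(s) for s in predicted_sets]
    return set.intersection(*sets) if sets else set()


# Hypothetical predictions for one miRNA from three tools.
mirwalk = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
targetscan = {"GENE_B", "GENE_C", "GENE_E"}
mirdb = {"GENE_C", "GENE_B", "GENE_F"}
print(sorted(consensus_targets(mirwalk, targetscan, mirdb)))  # prints ['GENE_B', 'GENE_C']
```

Requiring agreement from all three tools trades sensitivity for specificity; a looser rule (for example, at least two of three) would keep more candidates.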

10.
11.
Microarrays and high-throughput sequencing methods can be used to measure the expression of thousands of genes in a biological sample in a few days, whereas PCR-based methods can be used to measure the expression of a few genes in thousands of samples in about the same amount of time. These methods become more costly as the number of biological samples increases or as the number of genes of interest increases, respectively, and these factors constrain experimental design. To address these issues, we introduced ‘vertical arrays’ in which RNA from each biological sample is converted into multiple, overlapping cDNA subsets and spotted on glass slides. These vertical arrays can be queried with single gene probes to assess the expression behavior in thousands of biological samples in a single hybridization reaction. The spotted subsets are less complex than the original RNA from which they derive, which improves signal-to-noise ratios. Here, we demonstrate the quantitative capabilities of vertical arrays, including the sensitivity and accuracy of the method and the number of subsets needed to achieve this accuracy for most expressed genes.

12.
13.
S Wang, X Li, J Fang. BMC Bioinformatics, 2012, 13(1): 178-26
ABSTRACT: BACKGROUND: Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task, because these genes might serve as tumor biomarkers, which benefits not only tumor molecular diagnosis but also drug development. RESULTS: This paper proposes a novel gene selection method with rich biomedical meaning, based on a Heuristic Breadth-first Search Algorithm (HBSA), to find as many optimal gene subsets as possible. Because of the curse of dimensionality, this type of method could suffer from over-fitting and selection bias. To address these potential problems, an HBSA-based ensemble classifier is constructed by majority voting over individual classifiers built from the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of each gene through its occurrence frequency in the selected gene subsets. Experimental results on nine tumor datasets, including three pairs of cross-platform datasets, indicate that the proposed method not only obtains better generalization performance but also finds many important tumor-related genes. CONCLUSIONS: The frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnostic biomarkers. Moreover, the top-ranked genes that lead to very high prediction accuracy are closely related to specific tumor subtypes, and some are even hub genes. Compared with other related methods, the proposed method achieves higher prediction accuracy with fewer genes. These findings are further supported by analyzing the top-ranked genes in the context of individual gene function, biological pathways, and the protein-protein interaction network.
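The two HBSA-derived components, majority-vote ensembling and frequency-based gene ranking, can be sketched independently of the search itself, which is not reproduced here. Members are assumed to be callables that return a class label; the threshold rules, function names and gene names are hypothetical.

```python
from collections import Counter


def ensemble_predict(classifiers, sample):
    """Majority vote over member classifiers; a tie goes to the label counted first."""
    votes = Counter(clf(sample) for clf in classifiers)
    return votes.most_common(1)[0][0]


def rank_genes_by_frequency(gene_subsets):
    """Rank genes by how many selected subsets contain them (most frequent first)."""
    return Counter(g for subset in gene_subsets for g in set(subset)).most_common()


# Hypothetical members, each a simple threshold rule on one gene from its own subset.
members = [
    lambda s: "tumor" if s["g1"] > 1.0 else "normal",
    lambda s: "tumor" if s["g2"] > 0.5 else "normal",
    lambda s: "tumor" if s["g3"] > 2.0 else "normal",
]
print(ensemble_predict(members, {"g1": 1.5, "g2": 0.7, "g3": 0.1}))          # prints tumor
print(rank_genes_by_frequency([["g1", "g2"], ["g1", "g3"], ["g1", "g2"]]))  # prints [('g1', 3), ('g2', 2), ('g3', 1)]
```

Using an odd number of members avoids ties in two-class problems; with an even number, the tie-breaking rule above is arbitrary and worth replacing.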

14.
High-throughput technologies, such as gene expression arrays and protein mass spectrometry, allow one to simultaneously evaluate thousands of potential biomarkers that could distinguish different tissue types. Of particular interest here is distinguishing between cancerous and normal organ tissues. We consider statistical methods to rank genes (or proteins) with regard to differential expression between tissues. Various statistical measures are considered, and we argue that two measures related to the Receiver Operating Characteristic Curve are particularly suitable for this purpose. We also propose that sampling variability in the gene rankings be quantified, and suggest using the "selection probability function," the probability distribution of rankings for each gene. This is estimated via the bootstrap. A real dataset, derived from gene expression arrays of 23 normal and 30 ovarian cancer tissues, is analyzed. Simulation studies are also used to assess the relative performance of different statistical gene ranking measures and our quantification of sampling variability. Our approach leads naturally to a procedure for sample-size calculations, appropriate for exploratory studies that seek to identify differentially expressed genes.
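A minimal sketch of ROC-based ranking with a bootstrap selection probability follows. It uses the empirical AUC (Mann-Whitney form) as the ranking measure and scores each gene by max(AUC, 1 - AUC) so that under-expression also counts; this scoring choice is an assumption, since the two ROC-related measures are not specified here. The probability that each gene ranks in the top_k is estimated by resampling tissues with replacement within each group. Function names and data are hypothetical.

```python
import random
from collections import defaultdict


def auc(cases, controls):
    """Empirical ROC AUC (Mann-Whitney form): P(case > control) + 0.5 * P(tie)."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in cases for y in controls)
    return wins / (len(cases) * len(controls))


def gene_ranks(cancer, normal):
    """Rank genes by max(AUC, 1 - AUC) so both up- and down-regulation count; 1 = best."""
    score = {}
    for g in cancer:
        a = auc(cancer[g], normal[g])
        score[g] = max(a, 1 - a)
    ordered = sorted(score, key=score.get, reverse=True)
    return {g: r for r, g in enumerate(ordered, start=1)}


def selection_probability(cancer, normal, top_k=1, n_boot=200, seed=0):
    """Bootstrap estimate of P(gene ranks in the top_k), resampling tissues within each group."""
    rng = random.Random(seed)
    n_c = len(next(iter(cancer.values())))
    n_n = len(next(iter(normal.values())))
    hits = defaultdict(int)
    for _ in range(n_boot):
        # Resample whole tissues (columns), so all genes share the same bootstrap sample.
        ic = [rng.randrange(n_c) for _ in range(n_c)]
        inn = [rng.randrange(n_n) for _ in range(n_n)]
        ranks = gene_ranks({g: [v[i] for i in ic] for g, v in cancer.items()},
                           {g: [v[i] for i in inn] for g, v in normal.items()})
        for g, r in ranks.items():
            if r <= top_k:
                hits[g] += 1
    return {g: hits[g] / n_boot for g in cancer}


# Hypothetical expression values: gene A separates the groups, gene B does not.
cancer = {"A": [5, 6, 7, 8], "B": [1, 2, 3, 4]}
normal = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 4]}
print(selection_probability(cancer, normal))
```

Resampling whole tissues, rather than each gene separately, keeps the correlation between genes intact, which matters when genes compete for the same top ranks.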

15.
We investigated the possible molecular pathogenesis of retinal ganglion cell apoptosis following transient high intraocular pressure. Changes in the gene expression profiles of the retina were detected using gene chips. Twelve New Zealand white rabbits were randomly assigned to control and 3-min negative pressure suction groups. The control group was treated only with a laser, and the experimental group was also treated with suction for 3 min using a negative pressure generator. Total RNA was then extracted from the retinal tissue at different recovery stages, and gene expression profiles were analyzed using the Agilent rabbit one-way gene chip. The groups were then compared. Immediately after negative pressure suction, 704 genes were differentially expressed, of which 485 were upregulated and 219 were downregulated. Expression of CRYAA, CRYAB, and TLR3, which are involved in apoptosis, was elevated, whereas expression of KRT18, which is also involved in apoptosis, was reduced. Seven days after negative pressure suction, 482 genes were differentially expressed, of which 178 were upregulated and 304 were downregulated. Expression of CRYAB, IL1-BETA and IL1R1, which are involved in apoptosis, was upregulated. Ten days after negative pressure suction, 402 genes were differentially expressed, of which 213 were upregulated and 189 were downregulated. The apoptosis-related genes CRYAB, CRYBA3, CRYBB2, IL1-BETA, and IL1R1 showed higher expression levels. We conclude that negative pressure suction for long periods of time (for example, 3 min) results in changes in gene expression. Genes with higher fold changes help protect retinal ganglion cells from apoptosis. We suggest that promoting the expression of these genes should be considered as a new means of treating ischemic-hypoxic retinopathy.

16.
Identifying differentially expressed genes in cDNA microarray experiments. (Total citations: 1; self-citations: 0; citations by others: 1)
A major goal of microarray experiments is to determine which genes are differentially expressed between samples. Differential expression has been assessed by taking ratios of expression levels of different samples at a spot on the array and flagging spots (genes) where the magnitude of the fold difference exceeds some threshold. More recent work has attempted to incorporate the fact that the variability of these ratios is not constant. Most methods are variants of Student's t-test. These variants standardize the ratios by dividing by an estimate of the standard deviation of that ratio; spots with large standardized values are flagged. Estimating these standard deviations requires replication of the measurements, either within a slide or between slides, or the use of a model describing what the standard deviation should be. Starting from considerations of the kinetics driving microarray hybridization, we derive models for the intensity of a replicated spot, when replication is performed within and between arrays. Replication within slides leads to a beta-binomial model, and replication between slides leads to a gamma-Poisson model. These models predict how the variance of a log ratio changes with the total intensity of the signal at the spot, independent of the identity of the gene. Ratios for genes with a small amount of total signal are highly variable, whereas ratios for genes with a large amount of total signal are fairly stable. Log ratios are scaled by the standard deviations given by these functions, giving model-based versions of Studentization. An example is given.
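To illustrate how such a model makes the variance of a log ratio depend on intensity, the sketch below assumes gamma-Poisson (negative binomial) spot intensities with mean mu and shape parameter `shape`, applies the delta method, and divides the observed log ratio by the resulting standard deviation. This is a generic approximation chosen for illustration, not the paper's derivation from hybridization kinetics; the beta-binomial within-slide case is not covered, and the function names and shape value are hypothetical.

```python
import math


def log_ratio_sd(mu1, mu2, shape):
    """Approximate SD of ln(X1 / X2) for independent gamma-Poisson spot intensities.

    With Var(X) = mu + mu**2 / shape, the delta method gives Var(ln X) ~= 1/mu + 1/shape,
    so the variance shrinks toward the floor 2/shape as total intensity grows.
    """
    return math.sqrt(1 / mu1 + 1 / mu2 + 2 / shape)


def studentized_log_ratio(x1, x2, shape):
    """Observed log ratio divided by its model SD, plugging the observed intensities in for the means."""
    return math.log(x1 / x2) / log_ratio_sd(x1, x2, shape)


# Same 2-fold ratio at low and high intensity (hypothetical shape parameter of 50).
print(round(studentized_log_ratio(8, 4, 50), 2), round(studentized_log_ratio(8000, 4000, 50), 2))
```

The same 2-fold ratio receives a much larger standardized value at high intensity, matching the qualitative behavior described above: weak spots give unstable ratios, bright spots give stable ones.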

17.
A critical step in DNA array analysis is data filtration, which can reduce thousands of detected signals to limited sets of genes. Commonly accepted rules for such filtration are still lacking. We present a rational approach based on thresholding intensities with cutoff levels estimated by receiver operating characteristic (ROC) analysis. The technique compares test results with known distributions of positive and negative signals. We apply the method to Atlas cDNA arrays, GeneFilters, and Affymetrix GeneChip. ROC analysis demonstrates similarities in the distribution of false and true positive data for these different systems. We illustrate the estimation of an optimal cutoff level for intensity-based filtration, providing the highest ratio of true to false signals. For GeneChip arrays, we derived filtration thresholds consistent with the reported data based on replicate hybridizations. Intensity-based filtration optimized with ROC analysis, combined with other types of filtration (for example, based on the significance of differences and/or ratios), should improve DNA array analysis. ROC methodology is also demonstrated for comparing the performance of different types of arrays, imagers, and analysis software.
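ROC-based threshold selection can be sketched as a scan over candidate intensity cutoffs, computing true- and false-positive rates against spots whose status is known. The abstract's criterion (the highest ratio of true to false signals) is not spelled out precisely, so the sketch substitutes Youden's J statistic (TPR - FPR), a common choice. Function names and intensities are hypothetical.

```python
def roc_points(pos, neg):
    """(threshold, TPR, FPR) for the rule 'call a spot present if intensity >= threshold'."""
    points = []
    for t in sorted(set(pos) | set(neg)):
        tpr = sum(x >= t for x in pos) / len(pos)  # known-positive spots passing the filter
        fpr = sum(x >= t for x in neg) / len(neg)  # known-negative spots passing the filter
        points.append((t, tpr, fpr))
    return points


def youden_cutoff(pos, neg):
    """Threshold maximizing Youden's J = TPR - FPR (the lowest such threshold on ties)."""
    return max(roc_points(pos, neg), key=lambda p: p[1] - p[2])[0]


# Hypothetical intensities of spots with known present (pos) and absent (neg) signal.
print(youden_cutoff([5, 6, 7, 8], [1, 2, 3, 6]))  # prints 5
```

Other criteria, such as a fixed maximum false-positive rate, can be plugged into the same scan by changing the key function.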

18.
Li C, Hung Wong W. Genome Biology, 2001, 2(8): research0032.1-research0032.11

Background

A model-based analysis of oligonucleotide expression arrays that we developed previously uses a probe-sensitivity index to capture the response characteristic of a specific probe pair and calculates model-based expression indexes (MBEI). Each MBEI has a standard error attached to it as a measure of accuracy. Here we investigate the stability of the probe-sensitivity index across different tissue types, the reproducibility of results in replicate experiments, and the use of MBEI with perfect match (PM)-only arrays.
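In Li and Wong's formulation, the model-based expression index is part of a multiplicative model y_ij = theta_i * phi_j + error for array i and probe pair j, where the probe-sensitivity indexes phi_j are constrained so that their squares sum to the number of probe pairs J. The sketch below fits this model by alternating least squares. It omits dChip's outlier detection, and its per-array standard error is a simplified assumption rather than dChip's exact formula.

```python
import math


def fit_li_wong(y, n_iter=50):
    """Fit y[i][j] ~ theta[i] * phi[j] by alternating least squares.

    y: one row per array, one column per probe pair (e.g. PM - MM differences).
    phi (probe-sensitivity indexes) is scaled so that sum(phi_j ** 2) == J.
    Returns (theta, phi, se), where se is a simplified per-array standard error.
    """
    n_arrays, n_probes = len(y), len(y[0])
    phi = [1.0] * n_probes
    for _ in range(n_iter):
        # Expression indexes given probe effects (sum(phi**2) == n_probes after scaling).
        theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / n_probes for i in range(n_arrays)]
        ss_theta = sum(t * t for t in theta)
        if ss_theta == 0:
            break
        # Probe effects given expression indexes, then rescale to satisfy the constraint.
        phi = [sum(y[i][j] * theta[i] for i in range(n_arrays)) / ss_theta for j in range(n_probes)]
        scale = math.sqrt(n_probes / sum(p * p for p in phi))
        phi = [p * scale for p in phi]
    theta = [sum(y[i][j] * phi[j] for j in range(n_probes)) / n_probes for i in range(n_arrays)]
    # Simplified SE: residual SD within each array divided by sqrt(sum(phi**2)).
    se = []
    for i in range(n_arrays):
        rss = sum((y[i][j] - theta[i] * phi[j]) ** 2 for j in range(n_probes))
        se.append(math.sqrt(rss / (n_probes - 1)) / math.sqrt(n_probes))
    return theta, phi, se


# Hypothetical noise-free data: three arrays, three probe pairs.
y = [[t * p for p in (0.5, 1.0, 1.5)] for t in (1.0, 2.0, 3.0)]
theta, phi, se = fit_li_wong(y)
print([round(t / theta[0], 3) for t in theta])  # prints [1.0, 2.0, 3.0]
```

Because phi is shared across arrays, a probe that responds weakly on every array is down-weighted automatically, which is the intuition behind estimating probe sensitivities from many arrays at once.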

Results

Probe-sensitivity indexes are stable across tissue types. The presence of the target gene in many arrays of an array set allows the probe-sensitivity index to be estimated accurately. We extended the model to obtain expression values for PM-only arrays and found that the 20-probe PM-only model is comparable to the 10-probe PM/MM difference model in terms of expression correlations with the original 20-probe PM/MM difference model. The MBEI method extends the reliable detection limit of expression to lower mRNA concentrations. The standard errors of MBEI can be used to construct confidence intervals for fold changes, and the lower confidence bound of the fold change is a better ranking statistic for filtering genes. We can assign reliability indexes to genes in a specific cluster of interest in hierarchical clustering by resampling clustering trees. The software dChip, which implements many of these analysis methods, is made available.

Conclusions

The model-based approach reduces the variability of low expression estimates, and provides a natural method of calculating expression values for PM-only arrays. The standard errors attached to expression values can be used to assess the reliability of downstream analysis.

19.
This study aims to determine the effects of somatic cell nuclear transfer (SCNT) on the cardiac development of SCNT pigs using proteomic methods. Heart proteins from three adult SCNT pigs and two normally reproduced Bama miniature pigs were extracted, separated, and identified via comparative proteomic methods, including two-dimensional gel electrophoresis, mass spectrometry, and Western blot. Eleven protein spots were identified as differentially expressed: five spots were upregulated proteins, such as cardiac myosin heavy chain, cathepsin D, and heat shock protein beta-1 (HSP27), whereas six spots were downregulated proteins, such as alpha skeletal muscle actin. The results also indicate that nuclear transfer might cause abnormal expression of some important proteins in the hearts of SCNT pigs and affect cardiac development and survival in SCNT pigs.

20.
Developing effective methods for analyzing array-CGH data to detect chromosomal aberrations is very important for the diagnosis of cancer and other diseases and for the study of their pathogenesis. Current analysis methods, which are largely based on smoothing and/or segmentation, cannot accurately detect both the aberration regions and the boundary break points. Furthermore, when evaluating the accuracy of an algorithm for analyzing array-CGH data, it is commonly assumed that noise in the data follows a normal distribution. A fundamental question is whether noise in array-CGH data is indeed Gaussian and, if not, whether one can exploit the characteristics of the noise to develop novel analysis methods that accurately detect the aberration regions and the boundary break points simultaneously. By analyzing bacterial artificial chromosome (BAC) arrays with an average resolution of 1 Mb, 19K oligo arrays with an average probe spacing of <100 kb, and 385K oligo arrays with an average probe spacing of about 6 kb, we show that when aberrations are present, the noise in all three types of arrays is highly non-Gaussian and possesses long-range spatial correlations, and that such noise makes existing aberration-detection methods perform worse than in the Gaussian noise case. We further develop a novel method that optimally exploits the character of the noise and identifies both aberration regions and boundary break points very accurately. Finally, we propose a new concept, the a posteriori signal-to-noise ratio (p-SNR), to assign a confidence level to each detected aberration region and boundary.
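The two noise properties examined here, departure from Gaussianity and long-range spatial correlation, can be checked with simple diagnostics on residual log2 ratios ordered along the genome (for example, observed values minus fitted segment means). The sketch computes a lag-k autocorrelation and a moment-based excess kurtosis; these are generic checks assumed for illustration, not the method or the p-SNR measure proposed in the study, and the function names are hypothetical.

```python
from statistics import mean


def autocorrelation(x, lag):
    """Sample autocorrelation of a residual series at a given lag (in probes along the genome)."""
    m = mean(x)
    num = sum((x[i] - m) * (x[i + lag] - m) for i in range(len(x) - lag))
    return num / sum((v - m) ** 2 for v in x)


def excess_kurtosis(x):
    """Moment-based excess kurtosis: about 0 for Gaussian noise, positive for heavy tails."""
    m, n = mean(x), len(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0
```

Autocorrelations that stay clearly above zero at large lags point to long-range correlation, and an excess kurtosis well above zero points to heavier-than-Gaussian tails; either finding would undercut the Gaussian assumption described above.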


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417