Similar articles
1.
Cancer-derived microarray data sets are routinely produced by various platforms that are either commercially available or manufactured by academic groups. The fundamental difference in their probe selection strategies holds the promise that identical observations produced by more than one platform prove to be more robust when validated by biology. However, cross-platform comparison requires matching corresponding probe sets. We introduce here sequence-based matching of probes instead of gene identifier-based matching. We analyzed breast cancer cell line-derived RNA aliquots using Agilent cDNA and Affymetrix oligonucleotide microarray platforms to assess the advantage of this method. We show that at different levels of the analysis, including gene expression ratios and difference calls, cross-platform consistency is significantly improved by sequence-based matching. We also present evidence that sequence-based probe matching produces more consistent results when comparing similar biological data sets obtained by different microarray platforms. This strategy allowed a more efficient transfer of classification of breast cancer samples between data sets produced by cDNA microarray and Affymetrix gene-chip platforms.

2.
There have been several reports about the potential for predicting the prognosis of neuroblastoma patients using microarray gene expression profiling of the tumors. However, these studies have revealed an apparent diversity in the identity of the genes in their predictive signatures. To test the contribution of the platform to this discrepancy, we applied the z-scoring method to minimize the impact of platform and combine gene expression profiles of neuroblastoma (NB) tumors from two different platforms, cDNA and Affymetrix. A total of 12442 genes were common to both cDNA and Affymetrix arrays in our data set. Two-way ANOVA was applied to the combined data set to assess the relative effect of prognosis and platform on gene expression. We found that 26.6% (3307) of the genes had a significant impact on survival. There was no significant impact of microarray platform on expression after application of the z-scoring standardization procedure. Artificial neural network (ANN) analysis of the combined data set in a leave-one-out prediction strategy correctly predicted the outcome for 90% of the samples. Hierarchical clustering analysis using the top-ranked 160 genes showed clear separation of two clusters, and the majority of matched samples from the different platforms were clustered next to each other. The ANN classifier trained with our combined cross-platform data for these 160 genes could predict the prognosis of 102 independent test samples with 71% accuracy. Furthermore, it correctly predicted the outcome for 85/102 (83%) NB patients through the leave-one-out cross-validation approach. Our study showed that gene expression studies performed on different platforms could be integrated for prognosis analysis after removing variation resulting from different platforms.
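The z-scoring step mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not say along which axis the z-scores were computed, so standardizing each gene within each platform is an assumption here; the function names and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of numbers to mean 0 and standard deviation 1."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation
    if s == 0:
        # A constant gene carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def standardize_by_platform(data):
    """data maps platform -> gene -> list of expression values across samples.

    Assumption: each gene is standardized separately within each platform,
    so platform-specific scale and offset are removed before pooling.
    """
    return {
        platform: {gene: zscore(vals) for gene, vals in genes.items()}
        for platform, genes in data.items()
    }


# Toy values only: the same gene measured on two very different scales.
toy = {
    "cDNA": {"MYCN": [1.0, 2.0, 3.0]},
    "Affymetrix": {"MYCN": [100.0, 200.0, 300.0]},
}
standardized = standardize_by_platform(toy)
```

After standardization the two platforms report the same relative pattern for the gene, which is the property that makes pooled analysis possible.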

3.
Wang B  Howel P  Bruheim S  Ju J  Owen LB  Fodstad O  Xi Y 《PloS one》2011,6(2):e17167

Background

A number of gene-profiling methodologies have been applied to microRNA research. The diversity of the platforms and analytical methods makes the comparison and integration of cross-platform microRNA profiling data challenging. In this study, we systematically analyze three representative microRNA profiling platforms: Locked Nucleic Acid (LNA) microarray, beads array, and TaqMan quantitative real-time PCR Low Density Array (TLDA).

Methodology/Principal Findings

The microRNA profiles of 40 human osteosarcoma xenograft samples were generated by LNA array, beads array, and TLDA. Results show that each of the three platforms performs similarly regarding intra-platform reproducibility (reproducibility of data within one platform), while LNA array and TLDA had the best inter-platform reproducibility (reproducibility of data across platforms). The stability of the endogenous controls/probes contained in each platform was observed under different treatments/environments; those included in TLDA have the best performance, with minimal coefficients of variation. Importantly, we identify that the proper selection of normalization methods is critical for improving inter-platform reproducibility, as evidenced by the application of two non-linear normalization methods (loess and quantile) that substantially elevated the sensitivity and specificity of the statistical data assessment.

Conclusions

Each platform is relatively stable in terms of its own intra-platform microRNA profiling reproducibility; however, the inter-platform reproducibility among different platforms is low. More microRNA-specific normalization methods are needed for cross-platform microRNA microarray data integration and comparison, which will improve the reproducibility and consistency between platforms.
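Of the two non-linear normalization methods named in this abstract, quantile normalization is simple enough to sketch in a few lines of Python. This is a generic textbook version, not the authors' pipeline; the function name and the toy input are hypothetical, and ties are broken by position rather than averaged.

```python
def quantile_normalize(columns):
    """Force every array (column) to share the same empirical distribution.

    columns: list of equal-length lists, one list of intensities per array.
    Each value is replaced by the mean, across arrays, of the values that
    hold the same rank. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all arrays.
    rank_means = [
        sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy intensities for two arrays measuring the same three probes.
result = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
# result == [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Within each array the ordering of probes is preserved; only the values attached to each rank change.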

4.
To validate and extend the findings of the MicroArray Quality Control (MAQC) project, a biologically relevant toxicogenomics data set was generated using 36 RNA samples from rats treated with three chemicals (aristolochic acid, riddelliine and comfrey) and each sample was hybridized to four microarray platforms. The MAQC project assessed concordance in intersite and cross-platform comparisons and the impact of gene selection methods on the reproducibility of profiling data in terms of differentially expressed genes using distinct reference RNA samples. The real-world toxicogenomic data set reported here showed high concordance in intersite and cross-platform comparisons. Further, gene lists generated by fold-change ranking were more reproducible than those obtained by t-test P value or Significance Analysis of Microarrays. Finally, gene lists generated by fold-change ranking with a nonstringent P-value cutoff showed increased consistency in Gene Ontology terms and pathways, and hence the biological impact of chemical exposure could be reliably deduced from all platforms analyzed.

5.

Background

The acceptance of microarray technology in regulatory decision-making is being challenged by the existence of various platforms and data analysis methods. A recent report (E. Marshall, Science, 306, 630–631, 2004), by extensively citing the study of Tan et al. (Nucleic Acids Res., 31, 5676–5684, 2003), portrays a disturbingly negative picture of the cross-platform comparability, and, hence, the reliability of microarray technology.

Results

We reanalyzed Tan's dataset and found that the intra-platform consistency was low, indicating a problem in experimental procedures from which the dataset was generated. Furthermore, by using three gene selection methods (i.e., p-value ranking, fold-change ranking, and Significance Analysis of Microarrays (SAM)) on the same dataset we found that p-value ranking (the method emphasized by Tan et al.) results in much lower cross-platform concordance compared to fold-change ranking or SAM. Therefore, the low cross-platform concordance reported in Tan's study appears to be mainly due to a combination of low intra-platform consistency and a poor choice of data analysis procedures, instead of inherent technical differences among different platforms, as suggested by Tan et al. and Marshall.

Conclusion

Our results illustrate the importance of establishing calibrated RNA samples and reference datasets to objectively assess the performance of different microarray platforms and the proficiency of individual laboratories as well as the merits of various data analysis procedures. Thus, we are progressively coordinating the MAQC project, a community-wide effort for microarray quality control.

8.
Shao L  Fan X  Cheng N  Wu L  Xiong H  Fang H  Ding D  Shi L  Cheng Y  Tong W 《PloS one》2012,7(1):e29534
The era of personalized medicine for cancer therapeutics has taken an important step forward in making accurate prognoses for individual patients with the adoption of high-throughput microarray technology. However, microarray technology in cancer diagnosis or prognosis has been primarily used for the statistical evaluation of patient populations, and thus excludes inter-individual variability and patient-specific predictions. Here we propose a metric called clinical confidence that serves as a measure of prognostic reliability to facilitate the shift from population-wide to personalized cancer prognosis using microarray-based predictive models. The performance of sample-based models predicted with different clinical confidences was evaluated and compared systematically using three large clinical datasets studying the following cancers: breast cancer, multiple myeloma, and neuroblastoma. Survival curves for patients with different confidences were also delineated. The results show that the clinical confidence metric separates patients with different prediction accuracies and survival times. Samples with high clinical confidence were likely to have accurate prognoses from predictive models. Moreover, patients with high clinical confidence would be expected to live for a notably longer or shorter time if their prognosis was good or grim based on the models, respectively. We conclude that clinical confidence could serve as a beneficial metric for personalized cancer prognosis prediction utilizing microarrays. Ascribing a confidence level to prognosis with the clinical confidence metric provides the clinician with an objective, personalized basis for decisions, such as choosing the severity of the treatment.

9.
The successful application of genomic selection (GS) approaches is dependent on genetic markers derived from high-throughput and low-cost genotyping methods. Recent GS studies in trees have predominantly relied on SNP arrays as the source of genotyping, though this technology has a high entry cost. The recent development of alternative genotyping platforms, tailored to specific species and with low entry cost, has become possible due to advances in next-generation sequencing and genome complexity reduction methods such as sequence capture. However, the performance of these new platforms in GS models has not yet been evaluated, or compared to models developed from SNP arrays. Here, we evaluate the impact of these genotyping technologies on the development of GS prediction models for a Eucalyptus breeding population composed of 739 trees phenotyped for 13 wood quality and growth traits. Genotyping data obtained with both methods were compared for linkage disequilibrium, minor allele frequency, and missing data. Phenotypic prediction methods RR-BLUP and BayesB were employed, while predictive ability using cross validation was used to evaluate the performance of GS models derived from the different genotyping platforms. Differences in linkage disequilibrium patterns, minor allele frequency, missing data, and marker distribution were detected between sequence capture and SNP arrays. However, RR-BLUP and BayesB GS models resulted in similar predictive abilities. These results demonstrate that both genotyping methods are equivalent for genomic prediction of the traits evaluated. Sequence capture offers an alternative for species where SNP arrays are not available, or for when the initial development cost is too high.

10.

Background  

Independently derived expression profiles of the same biological condition often have few genes in common. In this study, we created populations of expression profiles from publicly available microarray datasets of cancer (breast, lymphoma and renal) samples linked to clinical information with an iterative machine learning algorithm. ROC curves were used to assess the prediction error of each profile for classification. We compared the prediction error of profiles correlated with molecular phenotype against profiles correlated with relapse-free status. The prediction error of profiles identified with supervised univariate feature selection algorithms was compared to that of profiles selected randomly from a) all genes on the microarray platform and b) a list of known disease-related genes (a priori selection). We also determined the relevance of expression profiles on test arrays from independent datasets, measured on either the same or different microarray platforms.

11.
Tsai YS  Aguan K  Pal NR  Chung IF 《PloS one》2011,6(9):e24259
Informative genes from microarray data can be used to construct prediction models and investigate biological mechanisms. Differentially expressed genes, the main targets of most gene selection methods, can be classified as single- and multiple-class specific signature genes. Here, we present a novel gene selection algorithm based on a Group Marker Index (GMI), which is intuitive, of low computational complexity, and efficient in identifying both types of genes. Most gene selection methods identify only single-class specific signature genes and cannot identify multiple-class specific signature genes easily. Our algorithm can detect de novo certain conditions of multiple-class specificity of a gene and makes use of a novel non-parametric indicator to assess the discrimination ability between classes. Our method is effective even when the sample size is small as well as when the class sizes are significantly different. To compare effectiveness and robustness, we formulate an intuitive template-based method and use four well-known datasets. We demonstrate that our algorithm outperforms the template-based method in difficult cases with unbalanced distribution. Moreover, the multiple-class specific genes are good biomarkers and play important roles in biological pathways. Our literature survey supports that the proposed method identifies unique multiple-class specific marker genes (not reported earlier to be related to cancer) in the Central Nervous System data. It also discovers unique biomarkers indicating the intrinsic difference between subtypes of lung cancer. We also associate the pathway information with the multiple-class specific signature genes and cross-reference to published studies. We find that the identified genes participate in the pathways directly involved in cancer development in leukemia data.
Our method offers a promising way to find genes that may be involved in pathways of multiple diseases and hence opens up the possibility of using an existing drug on other diseases as well as designing a single drug for multiple diseases.

13.

Background

Gene expression microarray has been the primary biomarker platform ubiquitously applied in biomedical research, resulting in enormous data, predictive models, and biomarkers accrued. Recently, RNA-seq has looked likely to replace microarrays, but there will be a period where both technologies co-exist. This raises two important questions: Can microarray-based models and biomarkers be directly applied to RNA-seq data? Can future RNA-seq-based predictive models and biomarkers be applied to microarray data to leverage past investment?

Results

We systematically evaluated the transferability of predictive models and signature genes between microarray and RNA-seq using two large clinical data sets. The complexity of cross-platform sequence correspondence was considered in the analysis and examined using three human and two rat data sets, and three levels of mapping complexity were revealed. Three algorithms representing different modeling complexity were applied to the three levels of mappings for each of the eight binary endpoints and Cox regression was used to model survival times with expression data. In total, 240,096 predictive models were examined.

Conclusions

Signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development, and microarray-based models can accurately predict RNA-seq-profiled samples; while RNA-seq-based models are less accurate in predicting microarray-profiled samples and are affected both by the choice of modeling algorithm and the gene mapping complexity. The results suggest continued usefulness of legacy microarray data and established microarray biomarkers and predictive models in the forthcoming RNA-seq era.

Electronic supplementary material

The online version of this article (doi:10.1186/s13059-014-0523-y) contains supplementary material, which is available to authorized users.

14.
Li W  Wang R  Yan Z  Bai L  Sun Z 《PloS one》2012,7(3):e33653
A considerable portion of patients with colorectal cancer have a high risk of disease recurrence after surgery. These patients can be identified by analyzing the expression profiles of signature genes in tumors. But there is no consensus on which genes should be used, and the performance of a specific set of signature genes varies greatly with different datasets, impeding their implementation in routine clinical application. Instead of using individual genes, here we identified functional multi-gene modules with significant expression changes between recurrent and recurrence-free tumors, and used them as signatures for predicting colorectal cancer recurrence in multiple datasets that were collected independently and profiled on different microarray platforms. The multi-gene modules we identified have a significant enrichment of known genes and biological processes relevant to cancer development, including genes from the chemokine pathway. Most strikingly, they recruited a significant enrichment of somatic mutations found in colorectal cancer. These results confirmed the functional relevance of these modules for colorectal cancer development. Further, these functional modules from different datasets overlapped significantly. Finally, we demonstrated that, leveraging the above information on these modules, our module-based classifier avoided arbitrarily fitting the classifier function and screening the signatures using the training data, and achieved more consistency in prognosis prediction across three independent datasets, which holds even when using very small training sets of tumors.

15.

Background  

Despite the widespread use of microarrays, much ambiguity regarding data analysis, interpretation and correlation of the different technologies exists. There is a considerable amount of interest in correlating results obtained between different microarray platforms. To date, only a few cross-platform evaluations have been published and, unfortunately, no guidelines have been established on the best methods of making such correlations. To address this issue, we conducted a thorough evaluation of two commercial microarray platforms to determine an appropriate methodology for making cross-platform correlations.

16.
The growing body of DNA microarray data has the potential to advance our understanding of the molecular basis of disease. However, annotating microarray datasets with clinically useful information is not always possible, as this often requires access to detailed patient records. In this study we introduce GLAD, a new Semi-Supervised Learning (SSL) method for combining independent annotated datasets and unannotated datasets with the aim of identifying more robust sample classifiers. In our method, independent models are developed using subsets of genes for the annotated and unannotated datasets. These models are evaluated according to a scoring function that incorporates terms for classification accuracy on annotated data, and relative cluster separation in unannotated data. Improved models are iteratively generated using a genetic algorithm feature selection technique. Our results show that the addition of unannotated data into training significantly improves classifier robustness.

17.
MOTIVATION: DNA microarray data analysis has been used previously to identify marker genes which discriminate cancer from normal samples. However, due to the limited sample size of each study, there are few common markers among different studies of the same cancer. With the rapid accumulation of microarray data, it is of great interest to integrate inter-study microarray data to increase sample size, which could lead to the discovery of more reliable markers. RESULTS: We present a novel, simple method of integrating different microarray datasets to identify marker genes and apply the method to prostate cancer datasets. In this study, by applying a new statistical method, referred to as the top-scoring pair (TSP) classifier, we have identified a pair of robust marker genes (HPN and STAT6) by integrating microarray datasets from three different prostate cancer studies. Cross-platform validation shows that the TSP classifier built from the marker gene pair, which simply compares relative expression values, achieves high accuracy, sensitivity and specificity on independent datasets generated using various array platforms. Our findings suggest a new model for the discovery of marker genes from accumulated microarray data and demonstrate how the great wealth of microarray data can be exploited to increase the power of statistical analysis. CONTACT: leixu@jhu.edu.
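A classifier that "simply compares relative expression values" of a gene pair can be sketched as follows. This is a minimal illustration of the general top-scoring pair idea, not the authors' code: the pair is chosen to maximize the difference between classes in how often one gene is expressed below the other. The function names and all expression values are hypothetical toy data, and the sketch assumes both classes are present in the training set.

```python
from itertools import combinations


def top_scoring_pair(samples, labels):
    """Pick the gene pair whose ordering differs most between two classes.

    samples: list of dicts mapping gene name -> expression value.
    labels: list of 0/1 class labels (both classes must occur).
    Returns (score, gene_a, gene_b, class1_when_a_lt_b).
    """
    genes = sorted(samples[0])

    def freq_a_lt_b(a, b, cls):
        group = [s for s, y in zip(samples, labels) if y == cls]
        return sum(s[a] < s[b] for s in group) / len(group)

    best = None
    for a, b in combinations(genes, 2):
        p0 = freq_a_lt_b(a, b, 0)
        p1 = freq_a_lt_b(a, b, 1)
        score = abs(p0 - p1)
        if best is None or score > best[0]:
            best = (score, a, b, p1 > p0)
    return best


def predict(sample, pair):
    """Classify one sample from the within-sample ordering of the pair."""
    _, a, b, class1_when_a_lt_b = pair
    return int((sample[a] < sample[b]) == class1_when_a_lt_b)


# Toy data only: label 0 = normal, label 1 = tumor.
train = [
    {"HPN": 1.0, "STAT6": 3.0, "ACTB": 5.0},
    {"HPN": 2.0, "STAT6": 4.0, "ACTB": 1.5},
    {"HPN": 6.0, "STAT6": 2.0, "ACTB": 5.0},
    {"HPN": 5.0, "STAT6": 3.0, "ACTB": 5.5},
]
pair = top_scoring_pair(train, [0, 0, 1, 1])
```

Because the decision depends only on the ordering of two values within a single sample, it is unaffected by any monotone, sample-wide rescaling, which is one reason such rules can carry over between array platforms.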

18.
The utility of previously generated microarray data is severely limited owing to small study size, leading to under-powered analysis, and failure of replication. Multiplicity of platforms and various sources of systematic noise limit the ability to compile existing data from similar studies. We present a model for transformation of data across different generations of Affymetrix arrays, developed using previously published datasets describing technical replicates performed with two generations of arrays. The transformation is based upon a probe set-specific regression model, generated from replicate measurements across platforms, performed using correlation coefficients. The model, when applied to the expression intensities of 5069 shared, sequence-matched probe sets in three different generations of Affymetrix Human oligonucleotide arrays, showed significant improvement in inter-generation correlations between sample-wide means and individual probe set pairs. The approach was further validated by an observed reduction in Euclidean distance between signal intensities across generations for the predicted values. Finally, application of the model to independent, but related datasets resulted in improved clustering of samples based upon their biological, as opposed to technical, attributes. Our results suggest that this transformation method is a valuable tool for integrating microarray datasets from different generations of arrays.
