Similar literature
 20 similar articles found
1.
2.
ABSTRACT: BACKGROUND: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kinds of tissue samples or samples submitted under experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments which involve molecular classification of tumors it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of subsample mixtures. Consequently, there can be genes differentially expressed on sample subgroups which are missed if usual statistical approaches are used. In this paper we propose a new graphical tool which not only identifies genes with up and down regulations, but also genes with differential expression in different subclasses, which are usually missed if current statistical methods are used. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The methodology proposed here was implemented in the open-source R software. RESULTS: This method was applied to a publicly available dataset, as well as to a simulated dataset. We compared our results with the ones obtained using some of the standard methods for detecting differentially expressed genes, namely Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC). On both datasets all differentially expressed genes with bimodal or multimodal distributions were not selected by all standard selection procedures. We also compared our results with (i) area between ROC curve and rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions and different variances can be considered on both samples. Another advantage of our method is that we can analyze graphically the behavior of different kinds of differentially expressed genes. CONCLUSION: Our results indicate that the arrow plot represents a new flexible and useful tool for the analysis of gene expression profiles from microarrays.
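
The two distance measures named in this abstract can be illustrated for a single gene. The following is a minimal sketch, not the authors' R implementation: it assumes a simple histogram estimate of the OVL (the abstract does not say how the densities are estimated) and the usual pairwise-comparison estimate of the AUC. The function names, the bin count, and the toy expression values are all hypothetical.

```python
# Illustrative sketch only: histogram-based OVL and pairwise AUC for one gene.
# The estimators and all names here are assumptions, not the paper's code.

def auc(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) over all sample pairs."""
    wins = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                wins += 1.0
            elif xi == yj:
                wins += 0.5
    return wins / (len(x) * len(y))


def ovl(x, y, bins=10):
    """Histogram estimate of the overlapping coefficient of two samples."""
    lo = min(min(x), min(y))
    hi = max(max(x), max(y))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return [c / len(values) for c in counts]

    p, q = proportions(x), proportions(y)
    # Shared probability mass: sum of the bin-wise minima.
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical toy data: a unimodal control group and a bimodal case group.
control = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8]
bimodal_case = [0.0, 0.1, -0.1, 2.0, 2.1, 1.9]
gene_auc = auc(control, bimodal_case)
gene_ovl = ovl(control, bimodal_case)
```

In this toy case the two groups barely share any probability mass, yet half of the case values lie below the controls and half above, so a rank-based summary alone would not separate them. That is the kind of gene the abstract describes as missed by standard methods.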

3.
A critical challenge in prostate cancer (PCa) clinical management is posed by the inadequacy of currently used biomarkers for disease screening, diagnosis, prognosis and treatment. In recent years, microRNAs (miRNAs) have emerged as promising alternate biomarkers for prostate cancer diagnosis and prognosis. However, the development of miRNAs as effective biomarkers for prostate cancer heavily relies on their accurate detection in clinical tissues. miRNA analysis in prostate cancer clinical specimens is often challenging owing to tumor heterogeneity, sampling errors, stromal contamination, and similar factors. The goal of this article is to describe a simplified workflow for miRNA analysis in archived FFPE or fresh frozen prostate cancer clinical specimens using a combination of quantitative real-time PCR (RT-PCR) and in situ hybridization (ISH). Within this workflow, we optimize the existing methodologies for miRNA extraction from FFPE and frozen prostate tissues and expression analysis by TaqMan probe-based miRNA RT-PCR. In addition, we describe an optimized method for ISH analysis for miRNA detection in prostate tissues using locked nucleic acid (LNA)-based probes. Our optimized miRNA ISH protocol can be applied to prostate cancer tissue slides or prostate cancer tissue microarrays (TMA).

4.
5.
Clinico-pathological factors fail to consistently predict the outcome after pancreatic resection for pancreatic ductal adenocarcinoma (PDAC). PDACs show a high level of inter- and intra-tumor genetic heterogeneity. A molecular classification should help sort patients into less heterogeneous and more appropriate groups regarding the metastatic risk and the therapeutic response, with the consequences of better predicting evolution and better orienting the treatment. PDAC can be classified based on mutational subtypes and 18gene alterations. Whole-genome sequencing identified mutational signatures, mutational burden and hyper-mutated tumors with specific DNA repair defects. Their overlap/similarities allow the definition of molecular subtypes. DNA and RNA classifications can be used in prognosis assessment. They are useful in therapeutic choice because they allow the design of approaches that can predict the respective drug sensitivity of each molecular subtype. This review provides a comprehensive analysis of available molecular classifications in PDAC and how they can help guide clinical decisions.

6.
7.
In the decade since their invention, spotted microarrays have been undergoing technical advances that have increased the utility, scope and precision of their ability to measure gene expression. At the same time, more researchers are taking advantage of the fundamentally quantitative nature of these tools with refined experimental designs and sophisticated statistical analyses. These new approaches utilise the power of microarrays to estimate differences in gene expression levels, rather than just categorising genes as up- or down-regulated, and allow the comparison of expression data across multiple samples. In this review, some of the technical aspects of spotted microarrays that can affect statistical inference are highlighted, and a discussion is provided of how several methods for estimating gene expression level across multiple samples deal with these challenges. The focus is on a Bayesian analysis method, BAGEL, which is easy to implement and produces easily interpreted results.

8.
Hong H, Tong W, Perkins R, Fang H, Xie Q, Shi L. DNA and Cell Biology, 2004, 23(10): 685-694
The wealth of knowledge embedded in gene expression data from DNA microarrays portends rapid advances in both research and the clinic. Turning the prodigious and noisy data into knowledge is a challenge to the field of bioinformatics, and development of classifiers using supervised learning techniques is the primary methodological approach for clinical application using gene expression data. In this paper, we present a novel classification method, multiclass Decision Forest (DF), that is the direct extension of the two-class DF previously developed in our lab. Central to DF is the synergistic combining of multiple heterogeneous but comparable decision trees to reach a more accurate and robust classification model. The computationally inexpensive multiclass DF algorithm integrates gene selection and model development, and thus eliminates the bias of gene preselection in crossvalidation. Importantly, the method provides several statistical means for assessment of prediction accuracy, prediction confidence, and diagnostic capability. We demonstrate the method by application to gene expression data for 83 small round blue-cell tumor (SRBCT) samples, each belonging to one of four different classes. Based on 500 runs of 10-fold crossvalidation, tumor prediction accuracy was approximately 97%, sensitivity was approximately 95%, diagnostic sensitivity was approximately 91%, and diagnostic accuracy was approximately 99.5%. Among 25 genes selected to distinguish tumor class, 12 have functional information in the literature implicating their involvement in cancer. The four types of SRBCT samples are also distinguishable in a clustering analysis based on the expression profiles of these 25 genes. The results demonstrated that the multiclass DF is an effective classification method for analysis of gene expression data for the purpose of molecular diagnostics.

9.
Retinitis Pigmentosa (RP) is a heterogeneous group of inherited retinal dystrophies characterised ultimately by the loss of photoreceptor cells. RP is the leading cause of visual loss in individuals younger than 60 years, with a prevalence of about 1 in 4000. The molecular genetic diagnosis of autosomal recessive RP (arRP) is challenging due to the large genetic and clinical heterogeneity. Traditional methods for sequencing arRP genes are often laborious and not easily available, and a screening technique that enables the rapid detection of the genetic cause would be very helpful in clinical practice. The goal of this study was to develop and apply microarray-based resequencing technology capable of detecting both known and novel mutations on a single high-throughput platform. Hence, the coding regions and exon/intron boundaries of 16 arRP genes were resequenced using microarrays in 102 Spanish patients with a clinical diagnosis of arRP. All the detected variations were confirmed by direct sequencing, and potential pathogenicity was assessed by functional predictions and frequency in controls. For validation purposes, 4 positive controls for variants consisting of previously identified changes were hybridized on the array. As a result of the screening, we detected 44 variants, of which 15 are very likely pathogenic, detected in 14 arRP families (14%). Finally, the design of this array can easily be transformed into an equivalent diagnostic system based on targeted enrichment followed by next generation sequencing.

10.
Contemporary high dimensional biological assays, such as mRNA expression microarrays, regularly involve multiple data processing steps, such as experimental processing, computational processing, sample selection, or feature selection (i.e. gene selection), prior to deriving any biological conclusions. These steps can dramatically change the interpretation of an experiment. Evaluation of processing steps has received limited attention in the literature. It is not straightforward to evaluate different processing methods and investigators are often unsure of the best method. We present a simple statistical tool, Standardized WithIn class Sum of Squares (SWISS), that allows investigators to compare alternate data processing methods, such as different experimental methods, normalizations, or technologies, on a dataset in terms of how well they cluster a priori biological classes. SWISS uses Euclidean distance to determine which method does a better job of clustering the data elements based on a priori classifications. We apply SWISS to three different gene expression applications. The first application uses four different datasets to compare different experimental methods, normalizations, and gene sets. The second application, using data from the MicroArray Quality Control (MAQC) project, compares different microarray platforms. The third application compares different technologies: a single Agilent two-color microarray versus one lane of RNA-Seq. These applications give an indication of the variety of problems that SWISS can be helpful in solving. The SWISS analysis of one-color versus two-color microarrays provides investigators who use two-color arrays the opportunity to review their results in light of a single-channel analysis, with all of the associated benefits offered by this design. Analysis of the MAQC data shows differential intersite reproducibility by array platform. SWISS also shows that one lane of RNA-Seq clusters data by biological phenotypes as well as a single Agilent two-color microarray.
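
A score of this kind can be sketched in a few lines. The abstract gives only the name and the use of Euclidean distance, so the formula below is an assumption read off the name: the within-class sum of squared Euclidean distances to each class centroid, divided by the total sum of squared distances to the overall centroid. The function name `swiss`, the sample layout (one row per sample, one column per gene), and the toy data are hypothetical.

```python
# Illustrative sketch only: within-class sum of squares over total sum of
# squares. The exact definition used by the SWISS authors is assumed here.

def swiss(samples, labels):
    """Return within-class SS / total SS; lower means tighter class clusters."""
    n_features = len(samples[0])

    def centroid(rows):
        return [sum(r[k] for r in rows) / len(rows) for k in range(n_features)]

    def sq_dist(a, b):
        # Squared Euclidean distance between two expression vectors.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    overall = centroid(samples)
    total_ss = sum(sq_dist(s, overall) for s in samples)

    within_ss = 0.0
    for cls in set(labels):
        members = [s for s, lab in zip(samples, labels) if lab == cls]
        center = centroid(members)
        within_ss += sum(sq_dist(s, center) for s in members)

    return within_ss / total_ss


# Hypothetical toy data: four samples measured on two genes.
data = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
good_grouping = swiss(data, ["a", "a", "b", "b"])
poor_grouping = swiss(data, ["a", "b", "a", "b"])
```

Under this reading, two processing methods applied to the same samples would be compared by computing the score for each processed dataset with the same a priori class labels and preferring the lower value.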

11.
The widespread use of DNA microarrays has led to the discovery of many genes whose expression profile may have significant clinical relevance. The translation of these data to the bedside requires that gene expression be validated as protein expression, and that annotated clinical samples be available for correlative and quantitative studies to assess clinical context and usefulness of putative biomarkers. We review two microarray platforms developed to facilitate the clinical validation of candidate biomarkers: tissue microarrays and reverse-phase protein microarrays. Tissue microarrays are arrays of core biopsies obtained from paraffin-embedded tissues, which can be assayed for histologically-specific protein expression by immunohistochemistry. Reverse-phase protein microarrays consist of arrays of cell lysates or, more recently, plasma or serum samples, which can be assayed for protein quantity and for the presence of post-translational modifications such as phosphorylation. Although these platforms are limited by the availability of validated antibodies, both enable the preservation of precious clinical samples as well as experimental standardization in a high-throughput manner proper to microarray technologies. While tissue microarrays are rapidly becoming a mainstay of translational research, reverse-phase protein microarrays require further technical refinements and validation prior to their widespread adoption by research laboratories.

12.
13.
Combining results from gene microarrays, clinical chemistry, and quantitative tissue histomorphology in an integrated bioinformatics setting enables prioritization of gene families as well as individual genes in a type II diabetes animal study. This new methodology takes advantage of a time-controlled mouse study as the animals progress from a normal phenotype to that of type II diabetes. Profiles from different levels of the biological hierarchy of unpooled entities provide an encompassing, system-wide view of biological changes. Here, phenotypic changes on the tissue-structural and physiological level are used as statistical covariates to enrich the gene expression analysis, suggesting correlative processes between gene expression and phenotype unlocked by multi-sample comparisons. We apply correlative and gene set enrichment procedures and compare the results to differential analysis to identify molecular markers. Evaluation based on ontological classifications demonstrates changes in prioritization of disease-related genes that would have been overlooked by conventional gene expression analysis strategies.

14.
15.
MOTIVATION: Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. RESULTS: Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors.

16.
Multiple myeloma (MM) is a cancer of antibody-making plasma cells. It frequently harbors alterations in DNA and chromosome copy numbers, and can be divided into two major subtypes, hyperdiploid (HMM) and non-hyperdiploid multiple myeloma (NHMM). The two subtypes have different survival prognoses, possibly due to different but converging paths to oncogenesis. Existing methods for identifying the two subtypes are fluorescence in situ hybridization (FISH) and copy number microarrays, with increased cost and sample requirements. We hypothesize that chromosome alterations have their imprint in gene expression through dosage effect. Using five MM expression datasets that have HMM status measured by FISH and copy number microarrays, we have developed and validated a K-nearest-neighbor method to classify MM into HMM and NHMM based on gene expression profiles. Classification accuracy for test datasets ranges from 0.83 to 0.88. This classification will enable researchers to study differences and commonalities of the two MM subtypes in disease biology and prognosis using expression datasets without need for additional subtype measurements. Our study also supports the advantages of using cancer specific characteristics in feature design and pooling multiple rounds of classification results to improve accuracy. We provide R source code and processed datasets at www.ChengLiLab.org/software.
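
The core of a K-nearest-neighbor classifier is short enough to sketch. This is a generic majority-vote version with Euclidean distance, not the authors' R code: their feature design and the pooling of multiple classification rounds mentioned in the abstract are not reproduced. The function name `knn_predict`, the choice of `k`, and the toy expression values are hypothetical.

```python
# Illustrative sketch only: plain k-nearest-neighbor majority vote.
# The authors' feature design and pooling of rounds are NOT reproduced here.
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query profile by majority vote among its k nearest samples."""
    # Sort training samples by Euclidean distance to the query profile.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy profiles: two features per sample, invented values.
profiles = [
    [1.0, 1.0], [1.2, 0.9], [0.9, 1.1],   # labelled NHMM in this toy example
    [3.0, 3.1], [3.2, 2.9], [2.9, 3.0],   # labelled HMM in this toy example
]
subtypes = ["NHMM", "NHMM", "NHMM", "HMM", "HMM", "HMM"]
low_call = knn_predict(profiles, subtypes, [1.1, 1.0])
high_call = knn_predict(profiles, subtypes, [3.0, 3.0])
```

In practice the training labels would come from the FISH or copy number measurements described above, and accuracy would be checked on held-out datasets.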

17.
Tissue microarrays have become an essential tool in translational pathology. They are used to confirm results from other experimental platforms, such as expression microarrays, as well as a primary tool to explore the expression profile of proteins by immunohistochemical analysis. Tissue microarrays are routinely used in molecular epidemiology, drug development and determining the diagnostic, prognostic and predictive value of new biomarkers. By applying traditional protein-based assays, as well as novel assays, to the platform, tissue microarrays have gained a new utility as a proteomic tool for both basic science and clinical investigation. This article will explore the new approaches that are being applied to tissue microarrays to characterize the human proteome, and new technologies that allow tissue microarrays to function as a protein array. The U.S. Government's right to retain a non-exclusive, royalty-free license in and to any copyright is acknowledged.

18.
MOTIVATION: Gene expression data have become an instrumental resource in describing the molecular state associated with various cellular phenotypes and responses to environmental perturbations. The utility of expression profiling has been demonstrated in partitioning clinical states, predicting the class of unknown samples and in assigning putative functional roles to previously uncharacterized genes based on profile similarity. However, gene expression profiling has had only limited success in identifying therapeutic targets. This is partly due to the fact that current methods based on fold-change focus only on single genes in isolation, and thus cannot convey causal information. In this paper, we present a technique for analysis of expression data in a graph-theoretic framework that relies on associations between genes. We describe the global organization of these networks and biological correlates of their structure. We go on to present a novel technique for the molecular characterization of disparate cellular states that adds a new dimension to the fold-based methods and conclude with an example application to a human medulloblastoma dataset. RESULTS: We have shown that expression networks generated from large model-organism expression datasets are scale-free and that the average clustering coefficient of these networks is several orders of magnitude higher than would be expected for similarly sized scale-free networks, suggesting an inherent hierarchical modularity similar to that previously identified in other biological networks. Furthermore, we have shown that these properties are robust with respect to the parameters of network construction. We have demonstrated an enrichment of genes having lethal knockout phenotypes in the high-degree (i.e. hub) nodes in networks generated from aggregate condition datasets; using process-focused Saccharomyces cerevisiae datasets we have demonstrated additional high-degree enrichments of condition-specific genes encoding proteins known to be involved in or important for the processes interrogated by the microarrays. These results demonstrate the utility of network analysis applied to expression data in identifying genes that are regulated in a state-specific manner. We concluded by showing that a sample application to a human clinical dataset prominently identified a known therapeutic target. AVAILABILITY: Software implementing the methods for network generation presented in this paper is available for academic use by request from the authors in the form of compiled Linux binary executables.
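
The two network quantities this abstract relies on, node degree and the average clustering coefficient, can be computed from an edge list as follows. This is a minimal sketch using the standard definitions. How the authors decide that two genes are associated is not stated in the abstract, so the edges are simply given as input here, and the gene names are hypothetical.

```python
# Illustrative sketch only: degree and clustering coefficient of an
# undirected gene association network supplied as an edge list.
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def local_clustering(graph, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(graph):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(graph, n) for n in graph) / len(graph)


# Hypothetical toy network of four genes.
network = build_graph([("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g1", "g4")])
degrees = {gene: len(nbrs) for gene, nbrs in network.items()}
hub = max(degrees, key=degrees.get)
mean_cc = average_clustering(network)
```

High-degree nodes such as `hub` correspond to what the abstract calls hub nodes; the enrichment analyses it reports would then ask whether genes of interest are over-represented among them.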

19.
Both microRNA (miRNA) and mRNA expression profiles are important methods for cancer type classification. A comparative study of their classification performance will be helpful in choosing the means of classification. Here we evaluated the classification performance of miRNA and mRNA profiles using a new data mining approach based on a novel SVM (Support Vector Machine)-based recursive feature elimination (nRFE) algorithm. Computational experiments showed that information encoded in miRNAs is not sufficient to classify cancers; gut-derived samples cluster more accurately when using mRNA expression profiles than when using miRNA profiles; and poorly differentiated tumors (PDT) could be classified by mRNA expression profiles with an accuracy of 100% versus 93.8% when using miRNA profiles. Furthermore, we showed that mRNA expression profiles have a higher capacity for normal tissue classification than miRNA profiles. We concluded that classification performance using mRNA profiles is superior to that of miRNA profiles in multiple-class cancer classifications.

20.