Similar articles
 Found 20 similar articles; search time 0 ms
1.
MOTIVATION: The diverse microarray datasets that have become available over the past several years represent a rich opportunity and challenge for biological data mining. Many supervised and unsupervised methods have been developed for the analysis of individual microarray datasets. However, integrated analysis of multiple datasets can provide a broader insight into genetic regulation of specific biological pathways under a variety of conditions. RESULTS: To aid in the analysis of such large compendia of microarray experiments, we present Microarray Experiment Functional Integration Technology (MEFIT), a scalable Bayesian framework for predicting functional relationships from integrated microarray datasets. Furthermore, MEFIT predicts these functional relationships within the context of specific biological processes. All results are provided in the context of one or more specific biological functions, which can be provided by a biologist or drawn automatically from catalogs such as the Gene Ontology (GO). Using MEFIT, we integrated 40 Saccharomyces cerevisiae microarray datasets spanning 712 unique conditions. In tests based on 110 biological functions drawn from the GO biological process ontology, MEFIT provided a 5% or greater performance increase for 54 functions, with a 5% or more decrease in performance in only two functions.

2.
3.

Background  

The small sample sizes often used for microarray experiments result in poor estimates of variance if each gene is considered independently. Yet accurately estimating variability of gene expression measurements in microarray experiments is essential for correctly identifying differentially expressed genes. Several recently developed methods for testing differential expression of genes utilize hierarchical Bayesian models to "pool" information from multiple genes. We have developed a statistical testing procedure that further improves upon current methods by incorporating the well-documented relationship between the absolute gene expression level and the variance of gene expression measurements into the general empirical Bayes framework.
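As a rough illustration of what "pooling" information across genes can look like, a common empirical Bayes form shrinks each gene's sample variance toward a prior value. The symbols below (s_g^2 and d_g for the sample variance and degrees of freedom of gene g, s_0^2 and d_0 for the prior variance and prior degrees of freedom, A_g for the average expression level) are notation assumed here, not taken from the abstract, and the authors' exact model may differ.

```latex
% Moderated (shrunken) variance for gene g
\tilde{s}_g^{2} = \frac{d_0\, s_0^{2} + d_g\, s_g^{2}}{d_0 + d_g}

% Letting the prior variance depend on expression level A_g
\tilde{s}_g^{2} = \frac{d_0\, s_0^{2}(A_g) + d_g\, s_g^{2}}{d_0 + d_g}
```

With few replicates (small d_g), the estimate leans on the prior; with many, it approaches the gene's own sample variance.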

4.
5.
6.
Microarray technology is rapidly emerging for genome-wide screening of differentially expressed genes between clinical subtypes or different conditions of human diseases. Traditional statistical testing approaches, such as the two-sample t-test or Wilcoxon test, are frequently used for evaluating statistical significance of informative expressions but require adjustment for large-scale multiplicity. Due to its simplicity, Bonferroni adjustment has been widely used to circumvent this problem. It is well known, however, that the standard Bonferroni test is often very conservative. In the present paper, we compare three multiple testing procedures in the microarray context: the original Bonferroni method, a Bonferroni-type improved single-step method and a step-down method. The latter two methods are based on nonparametric resampling, by which the null distribution can be derived with the dependency structure among gene expressions preserved and the family-wise error rate accurately controlled at the desired level. We also present a sample size calculation method for designing microarray studies. Through simulations and data analyses, we find that the proposed methods for testing and sample size calculation are computationally fast and control error and power precisely.
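The contrast between Bonferroni adjustment and a resampling-based single-step procedure can be sketched as follows. This is a generic permutation "maxT"-style sketch, not the authors' implementation: the function names, the choice of an absolute Welch t-statistic, and the permutation count are all assumptions made for illustration. Sample labels are shuffled once per permutation for all genes together, which is what keeps the dependency structure among genes intact.

```python
import random
from statistics import mean, variance


def welch_t(x, y):
    """Absolute Welch t-statistic for two samples (0.0 if the standard error is zero)."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return abs(mean(x) - mean(y)) / se if se > 0 else 0.0


def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: min(1, m * p) for m tests."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def single_step_maxt(data, labels, n_perm=1000, seed=0):
    """Single-step maxT adjusted p-values from label permutations.

    data:   list of genes, each a list of expression values (one per sample)
    labels: list of 0/1 group labels, one per sample
    """
    rng = random.Random(seed)

    def gene_stats(lab):
        # One statistic per gene under the given labelling.
        out = []
        for gene in data:
            x = [v for v, g in zip(gene, lab) if g == 0]
            y = [v for v, g in zip(gene, lab) if g == 1]
            out.append(welch_t(x, y))
        return out

    observed = gene_stats(labels)
    exceed = [0] * len(data)
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # same shuffled labels for every gene
        max_t = max(gene_stats(perm))
        for j, t in enumerate(observed):
            if max_t >= t:
                exceed[j] += 1
    # Add-one correction keeps adjusted p-values strictly positive.
    return [(c + 1) / (n_perm + 1) for c in exceed]
```

A step-down variant would compare each gene against the maximum over only the genes ranked at or below it, which is why it can be less conservative than the single-step version.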

7.
MADGene is a software environment comprising a web-based database and a Java application. This platform aims at unifying gene identifiers (IDs) and performing gene set analysis. MADGene allows the user to perform inter-conversion of clone and gene IDs over a large range of nomenclatures covering 17 species. We propose a set of 23 functions to facilitate the analysis of gene sets and we give two microarray applications to show how MADGene can be used to conduct meta-analyses. AVAILABILITY: The MADGene resources are freely available online from http://www.madtools.org, a website dedicated to the analysis and annotation of DNA microarray data.

8.
9.

Background

The acceptance of microarray technology in regulatory decision-making is being challenged by the existence of various platforms and data analysis methods. A recent report (E. Marshall, Science, 306, 630–631, 2004), by extensively citing the study of Tan et al. (Nucleic Acids Res., 31, 5676–5684, 2003), portrays a disturbingly negative picture of the cross-platform comparability, and, hence, the reliability of microarray technology.

Results

We reanalyzed Tan's dataset and found that the intra-platform consistency was low, indicating a problem in experimental procedures from which the dataset was generated. Furthermore, by using three gene selection methods (i.e., p-value ranking, fold-change ranking, and Significance Analysis of Microarrays (SAM)) on the same dataset we found that p-value ranking (the method emphasized by Tan et al.) results in much lower cross-platform concordance compared to fold-change ranking or SAM. Therefore, the low cross-platform concordance reported in Tan's study appears to be mainly due to a combination of low intra-platform consistency and a poor choice of data analysis procedures, instead of inherent technical differences among different platforms, as suggested by Tan et al. and Marshall.

Conclusion

Our results illustrate the importance of establishing calibrated RNA samples and reference datasets to objectively assess the performance of different microarray platforms and the proficiency of individual laboratories as well as the merits of various data analysis procedures. Thus, we are progressively coordinating the MAQC project, a community-wide effort for microarray quality control.
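Cross-platform concordance of a gene selection method can be illustrated with a small sketch: select the top k genes on each platform and measure how much the two selections overlap. The function names and the overlap measure below are assumptions for illustration; the abstract does not state how concordance was computed.

```python
def top_k(scores, k, largest=True):
    """Return the set of k gene ids with the largest (or smallest) scores.

    Use largest=True for absolute fold changes, largest=False for p-values.
    """
    ranked = sorted(scores, key=scores.get, reverse=largest)
    return set(ranked[:k])


def concordance(selected_a, selected_b):
    """Fraction of genes shared by two selections, relative to the smaller one."""
    if not selected_a or not selected_b:
        return 0.0
    shared = len(selected_a & selected_b)
    return shared / min(len(selected_a), len(selected_b))
```

Running the same comparison once with p-values and once with fold changes as the score shows how strongly the ranking criterion alone can change the apparent agreement between platforms.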

10.
MOTIVATION: Human clinical projects typically require a priori statistical power analyses. Towards this end, we sought to build a flexible and interactive power analysis tool for microarray studies integrated into our public domain HCE 3.5 software package. We then sought to determine if probe set algorithms or organism type strongly influenced power analysis results. RESULTS: The HCE 3.5 power analysis tool was designed to import any pre-existing Affymetrix microarray project, and interactively test the effects of user-defined values of alpha (significance), beta (1-power), sample size and effect size. The tool generates a filter for all probe sets or more focused ontology-based subsets, with or without noise filters that can be used to limit analyses of a future project to appropriately powered probe sets. We studied projects from three organisms (Arabidopsis, rat, human), and three probe set algorithms (MAS5.0, RMA, dChip PM/MM). We found large differences in power results based on probe set algorithm selection and noise filters. RMA provided high sensitivity for low numbers of arrays, but this came at a cost of high false positive results (24% false positive in the human project studied). Our data suggest that a priori power calculations are important for both experimental design in hypothesis testing and hypothesis generation, as well as for the selection of optimized data analysis parameters. AVAILABILITY: The Hierarchical Clustering Explorer 3.5 with the interactive power analysis functions is available at www.cs.umd.edu/hcil/hce or www.cnmcresearch.org/bioinformatics. CONTACT: jseo@cnmcresearch.org
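How alpha, beta, sample size and effect size trade off against each other can be sketched for a single probe set with the textbook normal approximation for a two-sided, two-group comparison. This is not the HCE 3.5 computation, which the abstract does not describe; the function names are hypothetical, the effect size is assumed to be a standardized mean difference, and no multiple-testing adjustment is applied.

```python
import math
from statistics import NormalDist


def samples_per_group(alpha, beta, effect_size):
    """Approximate arrays needed per group for a two-sided two-group test.

    alpha:       significance level
    beta:        1 - power
    effect_size: standardized mean difference (difference / standard deviation)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(1 - beta)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def achieved_power(alpha, n_per_group, effect_size):
    """Approximate power for a given number of arrays per group."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * math.sqrt(n_per_group / 2) - z_alpha)
```

For example, `samples_per_group(0.05, 0.2, 1.0)` gives 16 arrays per group under this approximation; an exact t-based calculation would ask for slightly more.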

11.

Background

Gene expression analysis has been intensively researched for more than a decade. Recently, interest has grown in integrating microarray data analysis with other types of biological knowledge in a holistic analytical approach. We propose a methodology for pathway-based microarray data analysis, based on the observation that a substantial proportion of the genes present in biochemical pathway databases are members of several distinct pathways. Our methodology aims to establish the state of individual pathways by identifying those truly affected by the experimental conditions, based on the behaviour of such genes. For that purpose, it considers all the pathways in which a gene participates and the general census of gene expression per pathway.

Results

We utilise hill climbing, simulated annealing and a genetic algorithm, and analyse the consistency of the results they produce by applying fuzzy adjusted Rand indexes and Hamming distance. All algorithms produce highly consistent gene-to-pathway allocations, revealing the contribution of genes to pathway functionality, in agreement with current pathway state visualisation techniques, with the simulated annealing search proving slightly superior in terms of efficiency.

Conclusions

We show that the expression values of genes, which are members of a number of biochemical pathways or modules, are the net effect of the contribution of each gene to these biochemical processes. We show that by manipulating the pathway and module contribution of such genes to follow underlying trends we can interpret microarray results centred on the behaviour of these genes.

12.

Background

Public data integration may help overcome challenges in clinical implementation of microarray profiles. We integrated several ovarian cancer datasets to identify a reproducible predictor of survival.

Methodology/Principal Findings

Four microarray datasets from different institutions comprising 265 advanced stage tumors were uniformly reprocessed into a single training dataset, also adjusting for inter-laboratory variation (“batch-effect”). Supervised principal component survival analysis was employed to identify prognostic models. Models were independently validated in a 61-patient cohort using a custom array genechip and a publicly available 229-array dataset. Molecular correspondence of high- and low-risk outcome groups between training and validation datasets was demonstrated using Subclass Mapping. Previously established molecular phenotypes in the 2nd validation set were correlated with high and low-risk outcome groups. Functional representational and pathway analysis was used to explore gene networks associated with high and low risk phenotypes. A 19-gene model showed optimal performance in the training set (median OS 31 and 78 months, p<0.01), 1st validation set (median OS 32 months versus not-yet-reached, p = 0.026) and 2nd validation set (median OS 43 versus 61 months, p = 0.013) maintaining independent prognostic power in multivariate analysis. There was strong molecular correspondence of the respective high- and low-risk tumors between training and 1st validation set. Low and high-risk tumors were enriched for favorable and unfavorable molecular subtypes and pathways, previously defined in the public 2nd validation set.

Conclusions/Significance

Integration of previously generated cancer microarray datasets may lead to robust and widely applicable survival predictors. These predictors are not simply a compilation of prognostic genes but appear to track true molecular phenotypes of good and poor outcome.

13.
Quantitative information about the nucleic acids hybridization reaction on microarrays is fundamental to designing optimized assays for molecular diagnostics. This study presents the kinetic, equilibrium, and thermodynamic analyses of DNA hybridization in a microarray system designed for fast molecular testing of pathogenic bacteria. Our microarray setup uses a porous, nylon membrane for probe immobilization and flowthrough incubation. The Langmuir model was used to determine the reaction rate constants of hybridization with antisense targets specific to Staphylococcus epidermidis and Staphylococcus aureus strains. The kinetic analysis revealed a sequence-dependent reaction rate, with association rate constants on the order of 10⁵ M⁻¹ s⁻¹ and dissociation rate constants of 10⁻⁴ s⁻¹. We found that by increasing the probe surface density from 10¹¹ to 10¹² molecules/cm², the hybridization rate and efficiency are suppressed while the melting temperature of the DNA duplex increases. The maximum fraction of hybridized capture probes at equilibrium did not exceed 50% for hybridization with antisense sequences and was below 6% for hybridization with long targets obtained from PCR. The van’t Hoff analysis of the temperature denaturation data showed that the DNA hybridization in our porous, flowthrough microarray is thermodynamically less favorable than the hybridization of the same sequences in solution.
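For orientation, the standard Langmuir kinetic model and the van 't Hoff relation can be written as below. The abstract gives no equations, so the symbols are assumptions: θ is the fraction of hybridized probes, C the target concentration (taken as constant), k_a and k_d the association and dissociation rate constants, and θ(0) = 0. The equilibrium constant shown is only the order of magnitude implied by the quoted rate constants.

```latex
% Langmuir kinetics and its solution for \theta(0) = 0
\frac{d\theta}{dt} = k_a C\,(1-\theta) - k_d\,\theta,
\qquad
\theta(t) = \theta_{eq}\left(1 - e^{-(k_a C + k_d)\,t}\right)

% Equilibrium coverage and binding constant
\theta_{eq} = \frac{K C}{1 + K C},
\qquad
K = \frac{k_a}{k_d} \approx \frac{10^{5}\ \mathrm{M^{-1}\,s^{-1}}}{10^{-4}\ \mathrm{s^{-1}}} = 10^{9}\ \mathrm{M^{-1}}

% van 't Hoff relation used to extract \Delta H^{\circ} and \Delta S^{\circ}
\ln K = -\frac{\Delta H^{\circ}}{R\,T} + \frac{\Delta S^{\circ}}{R}
```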

14.
15.
16.
17.
In the past 15 years, the quantitative trait locus (QTL) mapping approach has been applied to crosses between different inbred mouse strains to identify genetic loci associated with plasma HDL cholesterol levels. Although successful, a disadvantage of this method is low mapping resolution, as often several hundred candidate genes fall within the confidence interval for each locus. Methods have been developed to narrow these loci by combining the data from the different crosses, but they rely on the accurate mapping of the QTL and the treatment of the data in a consistent manner. We collected 23 raw datasets used for the mapping of previously published HDL QTL and reanalyzed the data from each cross using a consistent method and the latest mouse genetic map. By utilizing this approach, we identified novel QTL and QTL that were mapped to the wrong part of chromosomes. Our new HDL QTL map allows for reliable combining of QTL data and candidate gene analysis, which we demonstrate by identifying Grin3a and Etv6 as candidate genes for QTL on chromosomes 4 and 6, respectively. In addition, we were able to narrow a QTL on Chr 19 to five candidates.

18.
Data analysis and management represent a major challenge for gene expression studies using microarrays. Here, we compare different methods of analysis and demonstrate the utility of a personal microarray database. Gene expression during HIV infection of cell lines was studied using Affymetrix U-133 A and B chips. The data were analyzed using Affymetrix Microarray Suite and Data Mining Tool, Silicon Genetics GeneSpring, and dChip from Harvard School of Public Health. A small-scale database was established with FileMaker Pro Developer to manage and analyze the data. There was great variability among the programs in the lists of significantly changed genes constructed from the same data. Similarly, choices of different parameters for normalization, comparison, and standardization greatly affected the outcome. As many probe sets on the U133 chip target the same Unigene clusters, the Unigene information can be used as an internal control to confirm and interpret the probe set results. Algorithms used for the determination of changes in gene expression require further refinement and standardization. The use of a personal database powered with Unigene information can enhance the analysis of gene expression data.

19.
20.