Similar articles
 Found 20 similar articles (search time: 827 ms)
1.
The low reproducibility of differential expression of individual genes in microarray experiments has led to the suggestion that experiments be analyzed in terms of gene characteristics, such as GO categories or pathways, in order to enhance the robustness of the results. An implicit assumption of this approach is that the different experiments in effect randomly sample the genes participating in an active process. We argue that by the same rationale it is possible to perform this higher-level analysis on the aggregation of genes that are differentially-expressed in different expression-based studies, even if the experiments used different platforms. The aggregation increases the reliability of the results, it has the potential for uncovering signals that are liable to escape detection in the individual experiments, and it enables a more thorough mining of the ever more plentiful microarray data. We present here a proof-of-concept study of these ideas, using ten studies describing the changes in expression profiles of human host genes in response to infection by Retroviridae or Herpesviridae viral families. We supply a tool (accessible at www.cs.bgu.ac.il/~waytogo) which enables the user to learn about genes and processes of interest in this study.  相似文献   
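The aggregation idea described in this abstract can be sketched as follows. This is a minimal illustration only, not the authors' tool: it assumes each study is given as a set of differentially expressed gene identifiers, pools them by set union, and scores a category with a one-sided hypergeometric test. The function names `aggregate_de_genes` and `hypergeom_enrichment_p` and the example gene identifiers are hypothetical.

```python
from math import comb


def aggregate_de_genes(studies):
    """Pool the differentially expressed genes of several studies (set union)."""
    aggregated = set()
    for genes in studies:
        aggregated |= set(genes)
    return aggregated


def hypergeom_enrichment_p(de_genes, category_genes, background):
    """One-sided hypergeometric p-value P(X >= k) for category enrichment.

    N = background size, K = category genes in the background,
    n = pooled DE genes in the background, k = DE genes in the category.
    """
    background = set(background)
    de = set(de_genes) & background
    category = set(category_genes) & background
    N, K, n = len(background), len(category), len(de)
    k = len(de & category)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical example: three studies, a 20-gene background, one category.
background = {f"g{i}" for i in range(20)}
studies = [{"g0", "g1"}, {"g1", "g2"}, {"g3"}]
category = {"g0", "g1", "g2", "g3", "g4"}
pooled = aggregate_de_genes(studies)
print(sorted(pooled), hypergeom_enrichment_p(pooled, category, background))
```

A plain union treats every study equally; weighting genes by the number of studies that report them would be an alternative design that this sketch does not attempt.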

2.
Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% of genes were selected at random under two types of missingness mechanism, ignorable and non-ignorable, with various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlaps between significant genes detected from original data and those detected from the imputed datasets. Additionally, the significant genes were tested for their enrichment in important pathways, using ConsensusPathDB. Our results showed that almost all the significant genes and pathways of the original dataset can be detected in all imputed datasets, indicating that there is no significant difference in the performance of the various imputation methods tested. The source code and selected datasets are available at http://profiles.bs.ipm.ir/softwares/imputation_methods/.
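As a rough illustration of the chi-squared comparison of overlap proportions mentioned above, the sketch below tests whether two imputation methods recover the same fraction of the originally significant genes. It is an assumption-laden example, not the authors' code: the function name `chi2_two_proportions` and the counts are hypothetical, and the p-value uses the identity that a chi-squared variable with one degree of freedom has upper tail `erfc(sqrt(x / 2))`.

```python
from math import erfc, sqrt


def chi2_two_proportions(hits_a, total_a, hits_b, total_b):
    """Pearson chi-squared test (1 df, no continuity correction) on a 2x2 table.

    hits_* = significant genes recovered after imputation,
    total_* = significant genes in the original data.
    """
    table = [[hits_a, total_a - hits_a], [hits_b, total_b - hits_b]]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    if 0 in row or 0 in col:
        raise ValueError("a row or column total is zero; the test is undefined")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return chi2, p_value


# Hypothetical counts: method A recovers 90 of 100 genes, method B 88 of 100.
print(chi2_two_proportions(90, 100, 88, 100))
```

With more than two imputation methods the same Pearson statistic extends to a k-by-2 table, but the closed-form tail used here applies only to one degree of freedom.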

3.
MOTIVATION: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes. RESULTS: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is considered too. AVAILABILITY: A Fortran program called EMMIX-WIRE (EM-based MIXture analysis WIth Random Effects) is available on request from the corresponding author.

4.
“MiRNA‐218 regulates osteoclast differentiation and inflammation response in periodontitis rats through MMP9”, Cell. Microbiol. 2019;21:e12979, by Jie Guo, Xuemin Zeng, Jie Miao, Chunpeng Liu, Fulan Wei, Dongxu Liu, Zhong Zheng, Kang Ting, Chunling Wang, and Yi Liu. The Editors of Cellular Microbiology and the publisher John Wiley & Sons agree to publish an Expression of Concern regarding the above article, published online in Cellular Microbiology on November 16, 2018, in Wiley Online Library ( https://onlinelibrary.wiley.com/doi/full/10.1111/cmi.12979 ). In September 2019, the journal was contacted regarding concerns about the data presented in Figures 6 and 7 because of high level of similarities in the graphs presented in these figures. The different bars in the graphs show identical height. The standard deviation bars are also of identical length. Although one graph expresses the number of TRAP‐positive cells (Figure 6b) and the other graphs express the relative mRNA expression of different osteoclast‐related genes (Figure 6c‐g), all graphs are identical. The bars in the graphs in Figure 7 that represent 5 different osteoclast genes show the same height. Figures 6 and 7 show identical mRNA expression for a series of different genes: V‐ATPase, NFATc1, CTSK, DC‐STAMP and TRAP. In December 2019, the journal requested the authors to provide the raw data of the experiments presented in the article and for explanations of the similarities. The authors responded that the similarities were due to unintentional errors and provided Excel spread sheets containing processed data in March 2020. The data provided in the Excel sheets that were sent by the authors were analyzed and it was concluded that the calculations as shown in the Excel sheets are correct. However, the concerns raised regarding similarities in the heights of bars representing different parameters and narrow range of standard deviations presented in Figures 6–7 remained. 
The authors disagree with the concerns raised. In addition, the editors were concerned by manipulations of western blot images to represent single bands instead of doublets for COL1 in Figures 5 and 8 and for MMP9 in Figures 3A and C, 4C, 5A and D, and 8A. The first issue concerning COL1 bands has been addressed and corrected during the peer‐review process. The latter has been clarified after publication and following a request from the editors for the raw data of all figures in the article. In the published article, western blots of MMP‐9 in Figures 3A and C, 4C, 5A and D, and 8A show active‐MMP‐9 only and do not include pro‐MMP‐9 bands that were present in the original western blot experiments. The authors explained that on the original blots that were provided during the peer‐review process, MMP‐9 show doublets that represent pro‐MMP‐9 and active‐MMP‐9. As no significant difference was found for pro‐MMP‐9, the authors only presented single bands for active‐MMP‐9 in the publication version. The authors’ institution, Shandong University, did not respond to a request from the Publisher and the Editor‐in‐Chief to investigate whether the data arose from the originally reported experiments, are unmodified, and are suitable for publication. As a result, the journal is issuing this expression of concern to readers.  相似文献   

5.
MOTIVATION: Microarray technology allows the monitoring of expression levels for thousands of genes simultaneously. In time-course experiments in which gene expression is monitored over time, we are interested in testing gene expression profiles for different experimental groups. However, no sophisticated analytic methods have yet been proposed to handle time-course experiment data. RESULTS: We propose a statistical test procedure based on the ANOVA model to identify genes that have different gene expression profiles among experimental groups in time-course experiments. In particular, we propose a permutation test that does not require the normality assumption. For this test, we use residuals from the ANOVA model with time effects only. Using this test, we detect genes that have different gene expression profiles among experimental groups. The proposed model is illustrated using cDNA microarrays of 3840 genes obtained in an experiment to search for changes in gene expression profiles during neuronal differentiation of cortical stem cells.
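A permutation test on residuals from a time-effects-only model might look like the sketch below for a single gene. The abstract does not give the test statistic or the permutation scheme, so both are assumptions here: the statistic is the sum over group-by-time cells of the cell size times the squared mean residual, and group labels are shuffled within each time point. All function names are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean


def time_only_residuals(obs):
    """obs: list of (group, time, value). Subtract the per-time mean,
    i.e. the fit of an ANOVA model that contains time effects only."""
    by_time = defaultdict(list)
    for _group, t, y in obs:
        by_time[t].append(y)
    time_mean = {t: mean(values) for t, values in by_time.items()}
    return [(g, t, y - time_mean[t]) for g, t, y in obs]


def group_statistic(resid):
    """Assumed statistic: sum over (group, time) cells of n * mean(residual)^2."""
    cells = defaultdict(list)
    for g, t, r in resid:
        cells[(g, t)].append(r)
    return sum(len(v) * mean(v) ** 2 for v in cells.values())


def permutation_p_value(obs, n_perm=999, seed=0):
    """Permutation p-value; group labels are shuffled within each time point."""
    rng = random.Random(seed)
    resid = time_only_residuals(obs)
    observed = group_statistic(resid)
    index_by_time = defaultdict(list)
    for i, (_g, t, _r) in enumerate(resid):
        index_by_time[t].append(i)
    exceed = 0
    for _ in range(n_perm):
        permuted = list(resid)
        for idx in index_by_time.values():
            labels = [resid[i][0] for i in idx]
            rng.shuffle(labels)
            for i, g in zip(idx, labels):
                permuted[i] = (g, resid[i][1], resid[i][2])
        if group_statistic(permuted) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Counting the observed arrangement in both numerator and denominator keeps the p-value strictly positive, a common convention for Monte Carlo permutation tests. In a genome-wide setting the function would be applied per gene, followed by a multiple-testing correction that this sketch omits.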

6.
7.
Developmental mutants with defects in fruiting body formation are excellent resources for the identification of genetic components that control cellular differentiation processes in filamentous fungi. The mutant pro4 of the ascomycete Sordaria macrospora is characterized by a developmental arrest during the sexual life cycle. This mutant generates only pre-fruiting bodies (protoperithecia), and is unable to form ascospores. Besides being sterile, pro4 is auxotrophic for leucine. Ascospore analysis revealed that the two phenotypes are genetically linked. After isolation of the wild-type leu1 gene from S. macrospora, complementation experiments demonstrated that the gene was able to restore both prototrophy and fertility in pro4. To investigate the control of leu1 expression, other genes involved in leucine biosynthesis specifically and in the general control of amino acid biosynthesis (“cross-pathway control”) have been analysed using Northern hybridization and quantitative RT-PCR. These analyses demonstrated that genes of leucine biosynthesis are transcribed at higher levels under conditions of amino acid starvation. In addition, the expression data for the cpc1 and cpc2 genes indicate that cross-pathway control is superimposed on leucine-specific regulation of fruiting body development in the leu1 mutant. This was further substantiated by growth experiments in which the wild-type strain was found to show a sterile phenotype when grown on a medium containing the amino acid analogue 5-methyl-tryptophan. Taken together, these data show that pro4 represents a novel mutant type in S. macrospora, in which amino acid starvation acts as a signal that interrupts the development of the fruiting body. Electronic Supplementary Material Supplementary material is available for this article at http://dx.doi.org/10.1007/s00438-005-0021-8  相似文献   

8.
MOTIVATION: When analyzing expression experiments, researchers are often interested in identifying the set of biological processes that are up- or down-regulated under the experimental condition studied. Current approaches, including clustering expression profiles and averaging the expression profiles of genes known to participate in specific processes, fail to provide an accurate estimate of the activity levels of many biological processes. RESULTS: We introduce a probabilistic continuous hidden process model (CHPM) for time series expression data. CHPM can simultaneously determine the most probable assignment of genes to processes and the level of activation of these processes over time. To estimate model parameters, CHPM uses multiple time series datasets and incorporates prior biological knowledge. Applying CHPM to yeast expression data, we show that our algorithm produces more accurate functional assignments for genes compared to other expression analysis methods. The inferred process activity levels can be used to study the relationships between biological processes. We also report new biological experiments confirming some of the process activity levels predicted by CHPM. AVAILABILITY: A Java implementation is available at http://www.cs.cmu.edu/~yanxins/chpm. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

9.
MOTIVATION: Multi-series time-course microarray experiments are useful approaches for exploring biological processes. In this type of experiment, the researcher is frequently interested in studying gene expression changes over time and in evaluating trend differences between the various experimental groups. The large amount of data, the multiplicity of experimental conditions and the dynamic nature of the experiments pose great challenges to data analysis. RESULTS: In this work, we propose a statistical procedure to identify genes that show different gene expression profiles across analytical groups in time-course experiments. The method is a two-step regression approach in which the experimental groups are identified by dummy variables. The procedure first fits a global regression model with all the defined variables to identify differentially expressed genes; second, a variable selection strategy is applied to study differences between groups and to find statistically significant different profiles. The methodology is illustrated on both a real and a simulated microarray dataset.

10.
MOTIVATION: Characterizing the dynamic regulation of gene expression by time course experiments is becoming more and more important. A common problem is to identify differentially expressed genes between the treatment and control time course. It is often difficult to compare expression patterns of a gene between two time courses for the following reasons: (1) the number of sampling time points may be different or hard to be aligned between the treatment and the control time courses; (2) estimation of the function that describes the expression of a gene in a time course is difficult and error-prone due to the limited number of time points. We propose a novel method to identify the differentially expressed genes between two time courses, which avoids direct comparison of gene expression patterns between the two time courses. RESULTS: Instead of attempting to 'align' and compare the two time courses directly, we first convert the treatment and control time courses into neighborhood systems that reflect the underlying relationships between genes. We then identify the differentially expressed genes by comparing the two gene relationship networks. To verify our method, we apply it to two treatment-control time course datasets. The results are consistent with the previous results and also give some new biologically meaningful findings. AVAILABILITY: The algorithm in this paper is coded in C++ and is available from http://leili-lab.cmb.usc.edu/yeastaging/projects/MARD/  相似文献   

11.
Time course microarray experiments designed to characterize the dynamic regulation of gene expression in biological systems are becoming increasingly important. One critical issue that arises when examining time course microarray data is the identification of genes that show different temporal expression patterns among biological conditions. Here we propose a Bayesian hierarchical model to incorporate important experimental factors and to account for correlated gene expression measurements over time and over different genes. A new gene selection algorithm is also presented with the model to simultaneously identify genes that show changes in expression among biological conditions, in response to time and other experimental factors of interest. The algorithm performs well in terms of the false positive and false negative rates in simulation studies. The methodology is applied to a mouse model time course experiment to correlate temporal changes in azoxymethane-induced gene expression profiles with colorectal cancer susceptibility.  相似文献   

12.
The moderately halophilic, chloride-dependent bacterium Halobacillus halophilus switches its osmolyte strategy with the salinity in its environment by the production of different compatible solutes. Ectoine is produced predominantly at very high salinities, along with proline. Interestingly, ectoine production is growth-phase dependent, which led to a more than 1000-fold change in the ectoine:proline ratio from 0.04 in exponential to 27.4 in late stationary phase cultures. The genes encoding the ectoine biosynthesis pathway were identified on the chromosome in the order ectABC. They form an operon that is expressed in a salinity-dependent manner with low-level expression below 1.5 M NaCl but 10-fold and 23-fold increased expression at 2.5 and 3.0 M NaCl respectively. The temporal expression of genes involved in osmoresponse is different, with gdh/gln and pro genes being first, followed by ect genes. Chloride had no effect on expression of ect genes, but stimulated cellular EctC synthesis as well as ectoine production. These data demonstrate, for the first time, a growth-phase dependent switch in osmolyte strategy in a moderate halophile and, additionally, represent another piece of the chloride regulon of H. halophilus.

13.
The ultimate goal of functional genomics is to define the function of all the genes in the genome of an organism. A large body of information on the biological roles of genes has been accumulated and aggregated in the past decades of research, both from traditional experiments detailing the role of individual genes and proteins, and from newer experimental strategies that aim to characterize gene function on a genomic scale. It is clear that the goal of functional genomics can only be achieved by integrating information and data sources from the variety of these different experiments. Integration of different data is thus an important challenge for bioinformatics. The integration of different data sources often helps to uncover non-obvious relationships between genes, but there are also two further benefits. First, it is likely that whenever information from multiple independent sources agrees, it should be more valid and reliable. Secondly, by looking at the union of multiple sources, one can cover larger parts of the genome. This is obvious for integrating results from multiple single gene or protein experiments, but also necessary for many of the results from genome-wide experiments since they are often confined to certain (although sizable) subsets of the genome. In this paper, we explore an example of such a data integration procedure. We focus on the prediction of membership in protein complexes for individual genes. For this, we recruit six different data sources that include expression profiles, interaction data, essentiality and localization information. Each of these data sources individually contains some weakly predictive information with respect to protein complexes, but we show how this prediction can be improved by combining all of them. Supplementary information is available at http://bioinfo.mbb.yale.edu/integrate/interactions/. Abbreviations: TP: true positive; TN: true negative; FP: false positive; FN: false negative; Y2H: yeast two-hybrid.

14.
15.
MOTIVATION: There is a very large and growing level of effort toward improving the platforms, experiment designs, and data analysis methods for microarray expression profiling. Along with a growing richness in the approaches there is a growing confusion among most scientists as to how to make objective comparisons and choices between them for different applications. There is a need for a standard framework for the microarray community to compare and improve analytical and statistical methods. RESULTS: We report on a microarray data set comprising 204 in-situ synthesized oligonucleotide arrays, each hybridized with two-color cDNA samples derived from 20 different human tissues and cell lines. Design of the approximately 24 000 60mer oligonucleotides that report approximately 2500 known genes on the arrays, and design of the hybridization experiments, were carried out in a way that supports the performance assessment of alternative data processing approaches and of alternative experiment and array designs. We also propose standard figures of merit for success in detecting individual differential expression changes or expression levels, and for detecting similarities and differences in expression patterns across genes and experiments. We expect this data set and the proposed figures of merit will provide a standard framework for much of the microarray community to compare and improve many analytical and statistical methods relevant to microarray data analysis, including image processing, normalization, error modeling, combining of multiple reporters per gene, use of replicate experiments, and sample referencing schemes in measurements based on expression change. AVAILABILITY/SUPPLEMENTARY INFORMATION: Expression data and supplementary information are available at http://www.rii.com/publications/2003/HE_SDS.htm  相似文献   

16.
MOTIVATION: Time-course microarray experiments are designed to study biological processes in a temporal fashion. Longitudinal gene expression data arise when biological samples taken from the same subject at different time points are used to measure the gene expression levels. It has been observed that the gene expression patterns of samples of a given tumor measured at different time points are likely to be much more similar to each other than are the expression patterns of tumor samples of the same type taken from different subjects. In statistics, this phenomenon is called the within-subject correlation of repeated measurements on the same subject, and the resulting data are called longitudinal data. It is well known in other applications that valid statistical analyses have to appropriately take account of the possible within-subject correlation in longitudinal data. RESULTS: We apply estimating equation techniques to construct a robust statistic, which is a variant of the robust Wald statistic and accounts for the potential within-subject correlation of longitudinal gene expression data, to detect genes with temporal changes in expression. We associate significance levels to the proposed statistic by either incorporating the idea of the significance analysis of microarrays method or using the mixture model method to identify significant genes. The utility of the statistic is demonstrated by applying it to an important study of osteoblast lineage-specific differentiation. Using simulated data, we also show pitfalls in drawing statistical inference when the within-subject correlation in longitudinal gene expression data is ignored.  相似文献   

17.
Although microarray data have been successfully used for gene clustering and classification, the use of time series microarray data for constructing gene regulatory networks remains a particularly difficult task. The challenge lies in reliably inferring regulatory relationships from datasets that normally possess a large number of genes and a limited number of time points. In addition to the numerical challenge, the enormous complexity and dynamic properties of gene expression regulation also impede the progress of inferring gene regulatory relationships. Based on the accepted model of the relationship between regulator and target genes, we developed a new approach for inferring gene regulatory relationships by combining target-target pattern recognition and examination of regulator-specific binding sites in the promoter regions of putative target genes. Pattern recognition was accomplished in two steps: A first algorithm was used to search for the genes that share expression profile similarities with known target genes (KTGs) of each investigated regulator. The selected genes were further filtered by examining for the presence of regulator-specific binding sites in their promoter regions. As we implemented our approach to 18 yeast regulator genes and their known target genes, we discovered 267 new regulatory relationships, among which 15% are rediscovered, experimentally validated ones. Of the discovered target genes, 36.1% have the same or similar functions to a KTG of the regulator. An even larger number of inferred genes fall in the biological context and regulatory scope of their regulators. Since the regulatory relationships are inferred from pattern recognition between target-target genes, the method we present is especially suitable for inferring gene regulatory relationships in which there is a time delay between the expression of regulating and target genes.  相似文献   

18.
MOTIVATION: The technology of hybridization to DNA arrays is used to obtain the expression levels of many different genes simultaneously. It enables searching for genes that are expressed specifically under certain conditions. However, the technology produces large amounts of data demanding computational methods for their analysis. It is necessary to find ways to compare data from different experiments and to consider the quality and reproducibility of the data. RESULTS: Data analyzed in this paper have been generated by hybridization of radioactively labeled targets to DNA arrays spotted on nylon membranes. We introduce methods to compare the intensity values of several hybridization experiments. This is essential to find differentially expressed genes or to do pattern analysis. We also discuss possibilities for quality control of the acquired data. AVAILABILITY: http://www.dkfz.de/tbi CONTACT: M.Vingron@dkfz-heidelberg.de  相似文献   

19.
Mining gene expression databases for association rules   (Total citations: 16; self-citations: 0; citations by others: 16)

20.
New platforms allow quantification of gene expression from large, replicated experiments, but current sampling protocols for plant tissue using immediate flash freezing in liquid nitrogen are a barrier to these high-throughput studies. In this study, we compared four sampling methods for RNA extraction for gene expression analysis: (A) the standard sampling method of flash freezing whole leaves in liquid nitrogen immediately upon removal from the plant; (B) incubation of excised leaf disks for 2 min at field temperature followed by flash freezing; (C) incubation of excised leaf disks for 1 h on ice followed by flash freezing; and (D) incubation of excised leaf disks for 1 h at field temperature followed by flash freezing. Gene expression analysis was done for 23 genes using nCounter, and normalization of the data was done using the geometric mean of five housekeeping genes. Quality of RNA was highest for protocol A and lowest for protocol D. Despite some differences in RNA quality, gene expression was not significantly different among protocols A, B, and C for any of the 23 genes. Expression of some genes was significantly different between protocol D and the other protocols. This study demonstrates that when sampling leaf disks for gene expression analysis, the time between tissue removal from the plant and flash freezing in liquid nitrogen can be extended. This increase in time allowable during sampling provides greater flexibility in sampling large replicated field experiments for statistical analysis of gene expression data.
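Normalization against the geometric mean of housekeeping genes can be sketched as below. This is a simplified illustration rather than the study's actual pipeline: it assumes raw counts per sample are held in a dictionary and simply divides each target gene by that sample's housekeeping geometric mean. The gene names and the functions `geometric_mean` and `normalize_sample` are hypothetical.

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean of strictly positive numbers, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return exp(sum(log(v) for v in values) / len(values))


def normalize_sample(counts, housekeeping):
    """Divide each non-housekeeping count by the sample's housekeeping geometric mean."""
    reference = geometric_mean([counts[g] for g in housekeeping])
    return {g: c / reference for g, c in counts.items() if g not in housekeeping}


# Hypothetical sample with two housekeeping genes and one target gene.
sample = {"HK1": 100, "HK2": 400, "GENE_A": 50}
print(normalize_sample(sample, ["HK1", "HK2"]))
```

The geometric mean is less sensitive than the arithmetic mean to a single highly expressed housekeeping gene, which is one common reason for preferring it as a reference.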
