Similar articles
 20 similar articles found (search time: 15 ms)
1.
2.
MOTIVATION: Cellular processes cause changes over time. Observing and measuring those changes over time allows insights into the how and why of regulation. The experimental platform for doing the appropriate large-scale experiments to obtain time-courses of expression levels is provided by microarray technology. However, the proper way of analyzing the resulting time course data is still very much an issue under investigation. The inherent time dependencies in the data suggest that clustering techniques which reflect those dependencies yield improved performance. RESULTS: We propose to use Hidden Markov Models (HMMs) to account for the horizontal dependencies along the time axis in time course data and to cope with the prevalent errors and missing values. The HMMs are used within a model-based clustering framework. We are given a number of clusters, each represented by one Hidden Markov Model from a finite collection encompassing typical qualitative behavior. Then, in an iterative procedure, our method finds cluster models and an assignment of data points to these models that maximize the joint likelihood of clustering and models. Partially supervised learning (adding groups of labeled data to the initial collection of clusters) is supported. A graphical user interface allows querying an expression profile dataset for time courses similar to a prototype graphically defined as a sequence of levels and durations. We also propose a heuristic approach to automate determination of the number of clusters. We evaluate the method on published yeast cell cycle and fibroblast serum response datasets, and compare it, with favorable results, to the autoregressive curves method.
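
The assignment step of HMM-based clustering described in this abstract can be illustrated with a minimal sketch. It assumes discretized expression symbols (0 = down, 1 = up) and two hand-written left-to-right HMMs; the function names `log_likelihood` and `assign` and all probabilities are hypothetical, not taken from the paper. The model re-estimation step of the iterative procedure is deliberately left out.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step along the time axis, then weight by the emission.
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                     for s in range(n)]
        total = sum(alpha)
        if total == 0.0:
            return float("-inf")
        loglik += math.log(total)
        alpha = [a / total for a in alpha]  # rescale to avoid underflow
    return loglik

def assign(profiles, models):
    """Assign each profile to the index of the model with the highest likelihood."""
    return [max(range(len(models)), key=lambda k: log_likelihood(p, *models[k]))
            for p in profiles]

# Hypothetical cluster models: (start, transition, emission) for two states.
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
up_then_down = ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]])
down_then_up = ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]])

profiles = [[1, 1, 0, 0], [0, 0, 1, 1]]
labels = assign(profiles, [up_then_down, down_then_up])
```

In a full procedure along the lines the abstract describes, this assignment would alternate with re-training each cluster's HMM on its assigned profiles until the assignment stops changing.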

3.
Using ANOVA to analyze microarray data   Cited by: 6 (self-citations: 0, citations by others: 6)
Churchill GA. BioTechniques, 2004, 37(2): 173-5, 177
ANOVA provides a general approach to the analysis of single and multiple factor experiments on both one- and two-color microarray platforms. Mixed model ANOVA is important because in many microarray experiments there are multiple sources of variation that must be taken into consideration when constructing tests for differential expression of a gene. The genome is large, and the signals of expression change can be small, so we must rely on rigorous statistical methods to distinguish signal from noise. We apply statistical tests to ensure that we are not just making up stories based on seeing patterns where there may be none.
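
As a small worked illustration of the idea, the sketch below computes the F statistic of a fixed-effects one-way ANOVA for a single gene measured under several conditions. It is a simplification: the mixed-model ANOVA emphasized in the abstract additionally models random effects such as array or dye, which this sketch does not attempt. The function name `one_way_f` and the example values are hypothetical.

```python
def one_way_f(groups):
    """F statistic of a fixed-effects one-way ANOVA.

    groups: list of lists, one list of measurements per condition.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by condition versus residual variation within conditions.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical log-expression values for one gene in two conditions.
f_stat = one_way_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```

Here the between-group sum of squares is 1.5 on 1 degree of freedom and the within-group sum of squares is 4 on 4 degrees of freedom, so the statistic is 1.5, a weak signal relative to the noise.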

4.
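Su Wen (苏文). Acta Ecologica Sinica (《生态学报》), 2019, 39(13): 5005-5013
Based on the CNKI database, this study uses bibliometric and knowledge-mapping methods to analyze publications that apply long-term fixed-site observation data from ecosystem observation and research networks. It examines the application fields, specific uses, and user characteristics of the long-term observation data, as well as how data from different ecological stations are used and which research themes they support, with the aim of providing a reference for improving the data-sharing service capacity of ecosystem observation and research networks and for realizing the full value of long-term observation data. The results show that long-term observation data from these networks are attracting attention from a growing number of scholars. The disciplines applying the data are dominated by forestry and basic agricultural science, while continually expanding into other disciplines and becoming increasingly diverse. The data mainly contribute to research on ecosystem services, model simulation, plantation forests, water pollution, biodiversity, wheat and maize, and soil moisture. The main user groups are universities and research institutes, and different institutions emphasize different topics when using the data. Long-term observation data from each ecological station provide important support for revealing how ecosystem structure and function, energy flow, and nutrient cycling change in the ecoregion and ecosystem type that the station represents, and for analyzing the current status, dynamics, and driving mechanisms of major eco-environmental problems. Finally, several recommendations are offered on the application of long-term observation data from ecosystem observation and research networks: (1) improve the data citation mechanism and establish corresponding standards for citing and cataloguing scientific data; (2) exploit the strengths of the networks' long-term observation data by producing thematic data products, thereby fully developing the data's potential value; (3) increase and stabilize funding for ecological stations, raise their observation capacity and standards, and at the same time improve and optimize the layout of stations.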

5.
6.
MOTIVATION: There is currently much interest in reverse-engineering regulatory relationships between genes from microarray expression data. We propose a new algorithmic method for inferring such interactions between genes using data from gene knockout experiments. The algorithm we use is the Sparse Bayesian regression algorithm of Tipping and Faul. This method is highly suited to this problem as it does not require the data to be discretized, overcomes the need for an explicit topology search and, most importantly, requires no heuristic thresholding of the discovered connections. RESULTS: Using simulated expression data, we are able to show that this algorithm outperforms a recently published correlation-based approach. Crucially, it does this without the need to set any ad hoc threshold on possible connections.

7.
8.
The complete nucleotide sequence of chloroplast DNA from a liverwort, Marchantia polymorpha has made clear the entire gene organization of the chloroplast genome. Quite a few genes encoding components of photosynthesis and protein synthesis machinery have been identified by comparative computer analysis. Other genes involved in photosynthesis, respiratory electron transport, and membrane-associated transport in chloroplasts were predicted by the amino acid sequence homology and secondary structure of gene products. Thirty-three open reading frames in the liverwort chloroplast genome remain unidentified. However, most of these open reading frames are also conserved in the chloroplast genomes of two species, a liverwort, Marchantia polymorpha, and tobacco, Nicotiana tabacum, indicating their active functions in chloroplasts. Abbreviations: bp, base pair; kDa, kilodalton; IR, inverted repeat; ORF, open reading frame; DALA, δ-aminolevulinate

9.
Habitat suitability index (HSI) models rarely characterize the uncertainty associated with their estimates of habitat quality despite the fact that uncertainty can have important management implications. The purpose of this paper was to explore the use of Bayesian belief networks (BBNs) for representing and propagating 3 types of uncertainty in HSI models: uncertainty in the suitability index relationships, the parameters of the HSI equation, and measurement of habitat variables (i.e., model inputs). I constructed a BBN–HSI model, based on an existing HSI model, using Netica™ software. I parameterized the BBN's conditional probability tables via Monte Carlo methods, and developed a discretization scheme that met specifications for numerical error. I applied the model to both real and dummy sites in order to demonstrate the utility of the BBN–HSI model for 1) determining whether sites with different habitat types had statistically significant differences in HSI, and 2) making decisions based on rules that reflect different attitudes toward risk: maximum expected value, maximin, and maximax. I also examined effects of uncertainty in the habitat variables on the model's output. Some sites with different habitat types had different values for E[HSI], the expected value of HSI, but habitat suitability was not significantly different based on the overlap of 90% confidence intervals for E[HSI]. The different decision rules resulted in different rankings of sites, and hence, different decisions based on risk. As measurement uncertainty in habitat variables increased, sites with significantly different (α = 0.1) E[HSI] became statistically more similar. Incorporating uncertainty in HSI models enables explicit consideration of risk and more robust habitat management decisions. © 2012 The Wildlife Society.
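
The three decision rules named in this abstract can be sketched on a toy example. The sketch assumes each site's HSI is summarized as a discrete distribution of (value, probability) pairs, which is how a discretized belief-network output node might look; the site names, numbers, and the function `rank_sites` are hypothetical and not taken from the paper.

```python
def rank_sites(outcomes, rule):
    """Rank sites from best to worst under a decision rule.

    outcomes: {site: [(hsi_value, probability), ...]}
    rule: "expected" (maximum expected value), "maximin", or "maximax".
    """
    scorers = {
        "expected": lambda dist: sum(v * p for v, p in dist),
        # Risk-averse: judge each site by its worst possible HSI.
        "maximin": lambda dist: min(v for v, p in dist if p > 0),
        # Risk-seeking: judge each site by its best possible HSI.
        "maximax": lambda dist: max(v for v, p in dist if p > 0),
    }
    score = scorers[rule]
    return sorted(outcomes, key=lambda site: score(outcomes[site]), reverse=True)

# Hypothetical HSI distributions for three sites.
sites = {
    "A": [(0.2, 0.5), (0.9, 0.5)],
    "B": [(0.5, 0.5), (0.7, 0.5)],
    "C": [(0.4, 0.9), (1.0, 0.1)],
}
by_expected = rank_sites(sites, "expected")
by_maximin = rank_sites(sites, "maximin")
by_maximax = rank_sites(sites, "maximax")
```

With these made-up numbers the three rules produce three different orderings, which mirrors the abstract's observation that the choice of rule changes the ranking of sites.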

10.
The investigation of the interplay between genes, proteins, metabolites and diseases plays a central role in molecular and cellular biology. Whole genome sequencing has made it possible to examine the behavior of all the genes in a genome by high-throughput experimental techniques and to pinpoint molecular interactions on a genome-wide scale, which form the backbone of systems biology. In particular, the Bayesian network (BN) is a powerful tool for the ab initio identification of causal and non-causal relationships between biological factors directly from experimental data. However, scalability is a crucial issue when we try to apply BNs to infer such interactions. In this paper, we not only introduce the Bayesian network formalism and its applications in systems biology, but also review recent technical developments for scaling up or speeding up the structural learning of BNs, which is important for the discovery of causal knowledge from large-scale biological datasets. Specifically, we highlight the basic idea and the relative pros and cons of each technique, and discuss possible ways to combine different algorithms towards making BN learning more accurate and much faster.

11.
12.
Gene overlap occurs when two or more genes are encoded by the same nucleotides. This phenomenon is found in all taxonomic domains, but is particularly common in viruses, where it may increase the information content of compact genomes or influence the creation of new genes. Here we report a global comparative study of overlapping open reading frames (OvRFs) of 12,609 virus reference genomes in the NCBI database. We retrieved metadata associated with all annotated open reading frames (ORFs) in each genome record to calculate the number, length, and frameshift of OvRFs. Our results show that while the number of OvRFs increases with genome length, they tend to be shorter in longer genomes. The majority of overlaps involve +2 frameshifts, predominantly found in dsDNA viruses. Antisense overlaps in which one of the ORFs was encoded in the same frame on the opposite strand (−0) tend to be longer. Next, we develop a new graph-based representation of the distribution of overlaps among the ORFs of genomes in a given virus family. In the absence of an unambiguous partition of ORFs by homology at this taxonomic level, we used an alignment-free k-mer based approach to cluster protein coding sequences by similarity. We connect these clusters with two types of directed edges to indicate (1) that constituent ORFs are adjacent in one or more genomes, and (2) that these ORFs overlap. These adjacency graphs not only provide a natural visualization scheme, but also a novel statistical framework for analyzing the effects of gene- and genome-level attributes on the frequencies of overlaps.
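
The per-pair quantities mentioned in this abstract (overlap length and frame offset) can be sketched as follows. The sketch assumes ORFs are given as (start, end, strand) with 1-based inclusive genome coordinates; the function `orf_overlap` is hypothetical. The offset is computed only for same-strand pairs, as the difference of left coordinates modulo 3, which is an illustrative convention and not necessarily the paper's +1/+2 or antisense labelling.

```python
def orf_overlap(a, b):
    """Return (overlap_length, offset) for two ORFs, or None if they do not overlap.

    a, b: (start, end, strand) with 1-based inclusive coordinates, start <= end.
    offset: (b.start - a.start) % 3 for same-strand pairs, None for
    opposite-strand pairs (no antisense convention is assumed here).
    """
    (start_a, end_a, strand_a), (start_b, end_b, strand_b) = a, b
    length = min(end_a, end_b) - max(start_a, start_b) + 1
    if length <= 0:
        return None
    offset = (start_b - start_a) % 3 if strand_a == strand_b else None
    return length, offset

# Hypothetical annotations.
same_strand = orf_overlap((1, 300, "+"), (290, 500, "+"))
antisense = orf_overlap((1, 300, "+"), (250, 400, "-"))
disjoint = orf_overlap((1, 100, "+"), (101, 200, "+"))
```

Applied to every pair of annotated ORFs in a genome record, a function of this kind would yield the counts and lengths that the study aggregates across virus families.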

13.
A Bayesian network classification methodology for gene expression data.   Cited by: 5 (self-citations: 0, citations by others: 5)
We present new techniques for the application of a Bayesian network learning framework to the problem of classifying gene expression data. The focus on classification permits us to develop techniques that address in several ways the complexities of learning Bayesian nets. Our classification model reduces the Bayesian network learning problem to the problem of learning multiple subnetworks, each consisting of a class label node and its set of parent genes. We argue that this classification model is more appropriate for the gene expression domain than are other structurally similar Bayesian network classification models, such as Naive Bayes and Tree Augmented Naive Bayes (TAN), because our model is consistent with prior domain experience suggesting that a relatively small number of genes, taken in different combinations, is required to predict most clinical classes of interest. Within this framework, we consider two different approaches to identifying parent sets which are supported by the gene expression observations and any other currently available evidence. One approach employs a simple greedy algorithm to search the universe of all genes; the second approach develops and applies a gene selection algorithm whose results are incorporated as a prior to enable an exhaustive search for parent sets over a restricted universe of genes. Two other significant contributions are the construction of classifiers from multiple, competing Bayesian network hypotheses and algorithmic methods for normalizing and binning gene expression data in the absence of prior expert knowledge. Our classifiers are developed under a cross validation regimen and then validated on corresponding out-of-sample test sets. The classifiers attain a classification rate in excess of 90% on out-of-sample test sets for two publicly available datasets. We present an extensive compilation of results reported in the literature for other classification methods run against these same two datasets. 
Our results are comparable to, or better than, any we have found reported for these two sets, when a train-test protocol as stringent as ours is followed.

14.
Commonly accepted intensity-dependent normalization in spotted microarray studies takes account of measurement errors in the differential expression ratio but ignores measurement errors in the total intensity, although the definitions imply the same measurement error components are involved in both statistics. Furthermore, identification of differentially expressed genes is usually considered separately following normalization, which is statistically problematic. By incorporating the measurement errors in both total intensities and differential expression ratios, we propose a measurement-error model for intensity-dependent normalization and identification of differentially expressed genes. This model is also flexible enough to incorporate intra-array and inter-array effects. A Bayesian framework is proposed for the analysis of the proposed measurement-error model to avoid the potential risk of using the common two-step procedure. We also propose a Bayesian identification of differentially expressed genes to control the false discovery rate instead of the ad hoc thresholding of the posterior odds ratio. The simulation study and an application to real microarray data demonstrate promising results.

15.
Dynamic Bayesian networks (DBNs) are considered a promising model for inferring gene networks from time series microarray data. DBNs have overtaken Bayesian networks (BNs) because DBNs can construct cyclic regulations using time delay information. In this paper, a general framework for DBN modelling is outlined. Both discrete and continuous DBN models are constructed systematically and criteria for learning network structures are introduced from a Bayesian statistical viewpoint. This paper reviews the applications of DBNs over the past years. Real data applications for Saccharomyces cerevisiae time series gene expression data are also shown.

16.

Background  

A central goal of Systems Biology is to model and analyze biological signaling pathways that interact with one another to form complex networks. Here we introduce Qualitative networks, an extension of Boolean networks. With this framework, we use formal verification methods to check whether a model is consistent with the laboratory experimental observations on which it is based. If the model does not conform to the data, we suggest a revised model and the new hypotheses are tested in-silico.
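
For readers unfamiliar with the Boolean networks that Qualitative networks extend, the sketch below simulates a toy synchronous Boolean network and checks whether an observed pattern occurs along the simulated trajectory. This is plain simulation from one initial state, not the formal verification the abstract refers to, and it uses two-valued nodes rather than multi-level ones. The pathway, its rules, and all function names are hypothetical.

```python
def step(state, rules):
    """Synchronous update: every node's next value is computed from the current state."""
    return {node: rule(state) for node, rule in rules.items()}

def trajectory(state, rules, n_steps):
    """Return the initial state followed by n_steps successive states."""
    states = [state]
    for _ in range(n_steps):
        state = step(state, rules)
        states.append(state)
    return states

def consistent(states, observation):
    """True if some simulated state matches every observed node value."""
    return any(all(s[node] == value for node, value in observation.items())
               for s in states)

# Hypothetical pathway: a signal activates a kinase, the kinase activates a
# target unless an inhibitor is present, and the target induces the inhibitor.
rules = {
    "signal": lambda s: s["signal"],
    "kinase": lambda s: s["signal"],
    "target": lambda s: s["kinase"] and not s["inhibitor"],
    "inhibitor": lambda s: s["target"],
}
initial = {"signal": True, "kinase": False, "target": False, "inhibitor": False}
states = trajectory(initial, rules, 4)

feedback_seen = consistent(states, {"target": True, "inhibitor": True})
impossible_seen = consistent(states, {"kinase": False, "target": True})
```

An observation that never appears in any reachable state, like the second one here, is the kind of mismatch that would prompt revising the model.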

17.
18.
19.
MOTIVATION: Network inference algorithms are powerful computational tools for identifying putative causal interactions among variables from observational data. Bayesian network inference algorithms hold particular promise in that they can capture linear, non-linear, combinatorial, stochastic and other types of relationships among variables across multiple levels of biological organization. However, challenges remain when applying these algorithms to limited quantities of experimental data collected from biological systems. Here, we use a simulation approach to make advances in our dynamic Bayesian network (DBN) inference algorithm, especially in the context of limited quantities of biological data. RESULTS: We test a range of scoring metrics and search heuristics to find an effective algorithm configuration for evaluating our methodological advances. We also identify sampling intervals and levels of data discretization that allow the best recovery of the simulated networks. We develop a novel influence score for DBNs that attempts to estimate both the sign (activation or repression) and relative magnitude of interactions among variables. When faced with limited quantities of observational data, combining our influence score with moderate data interpolation reduces a significant portion of false positive interactions in the recovered networks. Together, our advances allow DBN inference algorithms to be more effective in recovering biological networks from experimentally collected data. AVAILABILITY: Source code and simulated data are available upon request. SUPPLEMENTARY INFORMATION: http://www.jarvislab.net/Bioinformatics/BNAdvances/

20.
MicroRNAs (miRNAs) regulate a large proportion of mammalian genes by hybridizing to targeted messenger RNAs (mRNAs) and down-regulating their translation into protein. Although much work has been done in the genome-wide computational prediction of miRNA genes and their target mRNAs, an open question is how to efficiently obtain functional miRNA targets from a large number of candidate miRNA targets predicted by existing computational algorithms. In this paper, we propose a novel Bayesian model and learning algorithm, GenMiR++ (Generative model for miRNA regulation), that accounts for patterns of gene expression using miRNA expression data and a set of candidate miRNA targets. A set of high-confidence functional miRNA targets are then obtained from the data using a Bayesian learning algorithm. Our model scores 467 high-confidence miRNA targets out of 1,770 targets obtained from TargetScanS in mouse at a false detection rate of 2.5%: several confirmed miRNA targets appear in our high-confidence set, such as the interactions between miR-92 and the signal transduction gene MAP2K4, as well as the relationship between miR-16 and BCL2, an anti-apoptotic gene which has been implicated in chronic lymphocytic leukemia. We present results on the robustness of our model showing that our learning algorithm is not sensitive to various perturbations of the data. Our high-confidence targets represent a significant increase in the number of miRNA targets and represent a starting point for a global understanding of gene regulation.
