Similar literature
 Found 20 similar documents (search time: 203 ms)
1.
Clustering of microarray gene expression data is performed routinely, for genes as well as for samples. Clustering of genes can exhibit functional relationships between genes; clustering of samples on the other hand is important for finding e.g. disease subtypes, relevant patient groups for stratification or related treatments. Usually this is done by first filtering the genes for high variance under the assumption that they carry most of the information needed for separating different sample groups. If this assumption is violated, important groupings in the data might be lost. Furthermore, classical clustering methods do not facilitate the biological interpretation of the results. Therefore, we propose to methodologically integrate the clustering algorithm with prior biological information. This is different from other approaches as knowledge about classes of genes can be directly used to ease the interpretation of the results and possibly boost clustering performance. Our approach computes dendrograms that resemble decision trees, with gene classes used to split the data at each node, which can help to find biologically meaningful differences between the sample groups. We have tested the proposed method on both simulated and real data and conclude that it is useful as a complementary method, especially when the assumptions of few differentially expressed genes and an informative mapping of genes to different classes are met.

2.
Nagy A, Wu J, Berland KM. Biophysical Journal 2005, 89(3): 2077-2090
Fluorescence fluctuation spectroscopy has become an important measurement tool for investigating molecular dynamics, molecular interactions, and chemical kinetics in biological systems. Although the basic theory of fluctuation spectroscopy is well established, it is not widely recognized that saturation of the fluorescence excitation can dramatically alter the size and profile of the fluorescence observation volume from which fluorescence fluctuations are measured, even at relatively modest excitation levels. A precise model for these changes is needed for accurate analysis and interpretation of fluctuation spectroscopy data. We here introduce a combined analytical and computational approach to characterize the observation volume under saturating conditions and demonstrate how the variation in the volume is important in two-photon fluorescence correlation spectroscopy. We introduce a simple approach for analysis of fluorescence correlation spectroscopy data that can fully account for the effects of saturation, and demonstrate its success for characterizing the observed changes in both the amplitude and relaxation timescale of measured correlation curves. We also discuss how a quantitative model for the observed phenomena may be of broader importance in fluorescence fluctuation spectroscopy.

3.
Variation, selection and evolution of function-valued traits   Total citations: 9 (self-citations: 0; citations by others: 9)
We describe an emerging framework for understanding variation, selection and evolution of phenotypic traits that are mathematical functions. We use one specific empirical example – thermal performance curves (TPCs) for growth rates of caterpillars – to demonstrate how models for function-valued traits are natural extensions of more familiar, multivariate models for correlated, quantitative traits. We emphasize three main points. First, because function-valued traits are continuous functions, there are important constraints on their patterns of variation that are not captured by multivariate models. Phenotypic and genetic variation in function-valued traits can be quantified in terms of variance-covariance functions and their associated eigenfunctions: we illustrate how these are estimated as well as their biological interpretations for TPCs. Second, selection on a function-valued trait is itself a function, defined in terms of selection gradient functions. For TPCs, the selection gradient describes how the relationship between an organism's performance and its fitness varies as a function of its temperature. We show how the form of the selection gradient function for TPCs relates to the frequency distribution of environmental states (caterpillar temperatures) during selection. Third, we can predict evolutionary responses of function-valued traits in terms of the genetic variance-covariance and the selection gradient functions. We illustrate how non-linear evolutionary responses of TPCs may occur even when the mean phenotype and the selection gradient are themselves linear functions of temperature. Finally, we discuss some of the methodological and empirical challenges for future studies of the evolution of function-valued traits.

4.
Inference of protein functions is one of the most important aims of modern biology. To fully exploit the large volumes of genomic data typically produced in modern-day genomic experiments, automated computational methods for protein function prediction are urgently needed. Established methods use sequence or structure similarity to infer functions but those types of data do not suffice to determine the biological context in which proteins act. Current high-throughput biological experiments produce large amounts of data on the interactions between proteins. Such data can be used to infer interaction networks and to predict the biological process that the protein is involved in. Here, we develop a probabilistic approach for protein function prediction using network data, such as protein-protein interaction measurements. We take a Bayesian approach to an existing Markov Random Field method by performing simultaneous estimation of the model parameters and prediction of protein functions. We use an adaptive Markov Chain Monte Carlo algorithm that leads to more accurate parameter estimates and consequently to improved prediction performance compared to the standard Markov Random Fields method. We tested our method using a high-quality S. cerevisiae validation network with 1622 proteins against 90 Gene Ontology terms of different levels of abstraction. Compared to three other protein function prediction methods, our approach shows very good prediction performance. Our method can be directly applied to protein-protein interaction or coexpression networks, but also can be extended to use multiple data sources. We apply our method to physical protein interaction data from S. cerevisiae and provide novel predictions, using 340 Gene Ontology terms, for 1170 unannotated proteins and we evaluate the predictions using the available literature.

5.
Gillis J, Pavlidis P. PLoS ONE 2011, 6(2): e17258
Many previous studies have shown that by using variants of "guilt-by-association", gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the "associations" in the data (e.g., protein interaction partners) of a gene are necessary in establishing "guilt". In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that can be used to provide more meaningful control when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
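The idea in the abstract above, that a gene's degree of multifunctionality can by itself act as a function predictor, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the gene names, the annotation table, and the use of a plain annotation count as the multifunctionality score are hypothetical and are not taken from the paper, which may define and evaluate the score differently (for example under cross-validation).

```python
from typing import Dict, Set

# Hypothetical toy annotation table: gene -> set of function labels.
annotations: Dict[str, Set[str]] = {
    "geneA": {"f1", "f2", "f3"},
    "geneB": {"f1", "f2"},
    "geneC": {"f3"},
    "geneD": {"f1"},
    "geneE": set(),
}

# Assumed proxy for multifunctionality: the number of annotations per gene.
multifunctionality: Dict[str, int] = {
    gene: len(funcs) for gene, funcs in annotations.items()
}


def auroc(scores: Dict[str, float], positives: Set[str]) -> float:
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def multifunctionality_auroc(function: str) -> float:
    """Score every gene by annotation count alone, ignoring any network."""
    positives = {g for g, funcs in annotations.items() if function in funcs}
    return auroc(multifunctionality, positives)


for f in ("f1", "f2", "f3"):
    print(f, round(multifunctionality_auroc(f), 3))
```

Note that the same ranking is reused for every function and no interaction data is consulted. The scores in this toy are also computed from the very annotations being predicted, so it only shows the mechanics, not a fair performance estimate.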

6.
7.
Structural models of biological macromolecules can be tested by comparing calculated and experimental solution scattering curves. We have developed an approach for computing scattering shape functions at medium resolution from models proposed on the basis of other techniques such as electron microscopy. We present the results obtained with the 50S ribosomal subunit from Escherichia coli; two models are considered, one proposed by Lake (1976), the other one by Tischendorf et al. (1975). Although the two models are similar in many respects, their scattering shape functions are significantly different. The comparison with the experimental scattering curve allows us to check the scale of the models and, after scaling, to quantitate the agreement between the observed and the calculated curves. Finally, it can provide a starting point for the structural interpretation of the X-ray data.

8.
It has been increasingly recognized that incorporating prior knowledge into cluster analysis can result in more reliable and meaningful clusters. In contrast to the standard model-based clustering with a global mixture model, which does not use any prior information, a stratified mixture model was recently proposed to incorporate gene functions or biological pathways as priors in model-based clustering of gene expression profiles: various gene functional groups form the strata in a stratified mixture model. Albeit useful, the stratified method may be less efficient than the global analysis if the strata are non-informative to clustering. We propose a weighted method that aims to strike a balance between a stratified analysis and a global analysis: it weights between the clustering results of the stratified analysis and those of the global analysis; the weight is determined by data. More generally, the weighted method can take advantage of the hierarchical structure of most existing gene functional annotation systems, such as MIPS and Gene Ontology (GO), and facilitate choosing appropriate gene functional groups as priors. We use simulated data and real data to demonstrate the feasibility and advantages of the proposed method.

9.
DNA methylation is an important epigenetic modification involved in many biological processes and diseases. Recent developments in whole genome bisulfite sequencing (WGBS) technology have enabled genome-wide measurements of DNA methylation at single base pair resolution. Many experiments have been conducted to compare DNA methylation profiles under different biological contexts, with the goal of identifying differentially methylated regions (DMRs). Due to the high cost of WGBS experiments, many studies are still conducted without biological replicates. Methods and tools available for analyzing such data are very limited. We develop a statistical method, DSS-single, for detecting DMRs from WGBS data without replicates. We characterize the count data using a rigorous model that accounts for the spatial correlation of methylation levels, sequence depth and biological variation. We demonstrate that using information from neighboring CG sites, biological variation can be estimated accurately even without replicates. DMR detection is then carried out via a Wald test procedure. Simulations demonstrate that DSS-single has greater sensitivity and accuracy than existing methods, and an analysis of H1 versus IMR90 cell lines suggests that it also yields the most biologically meaningful results. DSS-single is implemented in the Bioconductor package DSS.

10.
Thermal performance curves have provided a common framework to study the impact of temperature in biological systems. However, few generalities have emerged to date. Here, we combine an experimental approach with theoretical analyses to demonstrate that performance curves are expected to vary predictably with the levels of biological organization. We measured rates of enzymatic reactions, organismal performance and population viability in Drosophila acclimated to different thermal conditions and show that performance curves become narrower with thermal optima shifting towards lower temperatures at higher levels of organization. We then explain these results on theoretical grounds, showing that this pattern reflects the cumulative impact of asymmetric thermal effects that pile up with complexity. These results and the proposed framework are important to understand how organisms, populations and ecological communities might respond to changing thermal conditions.

11.
We present MultiGO, a web-enabled tool for the identification of biologically relevant gene sets from hierarchically clustered gene expression trees (http://ekhidna.biocenter.helsinki.fi/poxo/multigo). High-throughput gene expression measuring techniques, such as microarrays, are nowadays often used to monitor the expression of thousands of genes. Since these experiments can produce overwhelming amounts of data, computational methods that assist the data analysis and interpretation are essential. MultiGO is a tool that automatically extracts the biological information for multiple clusters and determines their biological relevance, and hence facilitates the interpretation of the data. Since the entire expression tree is analysed, MultiGO is guaranteed to report all clusters that share a common enriched biological function, as defined by Gene Ontology annotations. The tool also identifies a plausible cluster set, which represents the key biological functions affected by the experiment. The performance is demonstrated by analysing drought-, cold- and abscisic acid-related expression data sets from Arabidopsis thaliana. The analysis not only identified known biological functions, but also brought into focus the less established connections to defense-related gene clusters. Thus, in comparison to analyses of manually selected gene lists, the systematic analysis of every cluster can reveal unexpected biological phenomena and produce much more comprehensive biological insights into the experiment of interest.

12.
MOTIVATION: A measurement of cluster quality is needed to choose potential clusters of genes that contain biologically relevant patterns of gene expression. This is strongly desirable when a large number of gene expression profiles have to be analyzed and proper clusters of genes need to be identified for further analysis, such as the search for meaningful patterns, identification of gene functions or gene response analysis. RESULTS: We propose a new cluster quality method, called stability, by which unsupervised learning of gene expression data can be performed efficiently. The method takes into account a cluster's stability on partition. We evaluate this method and demonstrate its performance using four independent, real gene expression and three simulated datasets. We demonstrate that our method outperforms other techniques listed in the literature. The method has applications in evaluating clustering validity as well as identifying stable clusters. AVAILABILITY: Please contact the first author.

13.
14.
MOTIVATION: The result of a typical microarray experiment is a long list of genes with corresponding expression measurements. This list is only the starting point for a meaningful biological interpretation. Modern methods identify relevant biological processes or functions from gene expression data by scoring the statistical significance of predefined functional gene groups, e.g. based on Gene Ontology (GO). We develop methods that increase the explanatory power of this approach by integrating knowledge about relationships between the GO terms into the calculation of the statistical significance. RESULTS: We present two novel algorithms that improve GO group scoring using the underlying GO graph topology. The algorithms are evaluated on real and simulated gene expression data. We show that both methods eliminate local dependencies between GO terms and point to relevant areas in the GO graph that remain undetected with state-of-the-art algorithms for scoring functional terms. A simulation study demonstrates that the new methods are better at detecting relevant biological terms than competing methods.

15.
16.
The root of a phylogenetic tree is fundamental to its biological interpretation, but standard substitution models do not provide any information on its position. Here, we describe two recently developed models that relax the usual assumptions of stationarity and reversibility, thereby facilitating root inference without the need for an outgroup. We compare the performance of these models on a classic test case for phylogenetic methods, before considering two highly topical questions in evolutionary biology: the deep structure of the tree of life and the root of the archaeal radiation. We show that all three alignments contain meaningful rooting information that can be harnessed by these new models, thus complementing and extending previous work based on outgroup rooting. In particular, our analyses exclude the root of the tree of life from the eukaryotes or Archaea, placing it on the bacterial stem or within the Bacteria. They also exclude the root of the archaeal radiation from several major clades, consistent with analyses using other rooting methods. Overall, our results demonstrate the utility of non-reversible and non-stationary models for rooting phylogenetic trees, and identify areas where further progress can be made.

17.
Abstract. Objectives: A class of sigmoid functions designated generalized von Bertalanffy, Gompertzian and generalized Logistic has been used to fit tumour growth data. Various models have been proposed to explain the biological significance and foundations of these functions. However, no model has been found to fully explain all three or the relationships between them. Materials and Methods: We propose a simple cancer cell population dynamics model that provides a biological interpretation for these sigmoids' ability to represent tumour growth. Results and Conclusions: We show that the three sigmoids can be derived from the model and are in fact a single solution subject to the continuous variation of parameters describing the decay of the proliferation fraction and/or cell quiescence. We use the model to generate proliferation fraction profiles for each sigmoid and comment on the significance of the differences relative to cell cycle-specific and non-cell cycle-specific therapies.
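As a rough illustration of how the sigmoids named in the abstract above can sit in one family, the sketch below uses a common textbook parametrization, dN/dt = (a/nu) N (1 - (N/K)^nu). This parametrization, the function names and the parameter values are assumptions made here; they are not the cell population model proposed in the paper. Under this assumption, nu = 1 gives the ordinary logistic curve, the limit nu -> 0 gives the Gompertz curve, and negative nu corresponds to a generalized von Bertalanffy form with exponent 1 + nu.

```python
import math


def gompertz(t: float, n0: float, k: float, a: float) -> float:
    """Closed-form solution of dN/dt = -a * N * ln(N / K), with N(0) = n0."""
    return k * math.exp(math.log(n0 / k) * math.exp(-a * t))


def generalized_logistic(t: float, n0: float, k: float, a: float, nu: float) -> float:
    """Closed-form solution of dN/dt = (a / nu) * N * (1 - (N / K)**nu), nu != 0."""
    base = 1.0 + ((k / n0) ** nu - 1.0) * math.exp(-a * t)
    return k * base ** (-1.0 / nu)


def logistic(t: float, n0: float, k: float, a: float) -> float:
    """Ordinary logistic growth: the nu = 1 member of the family."""
    return generalized_logistic(t, n0, k, a, 1.0)


# Hypothetical parameter values, chosen only for illustration.
N0, K, A = 1.0e6, 1.0e9, 0.5

for t in (0.0, 3.0, 10.0):
    print(
        t,
        logistic(t, N0, K, A),
        generalized_logistic(t, N0, K, A, 1e-6),  # near the Gompertz limit
        gompertz(t, N0, K, A),
    )
```

Varying nu continuously moves between the curves, which is one simple way to see how a single solution can cover several named sigmoids; the paper's own parameters describe the proliferation fraction and quiescence instead.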

18.
The essential enzymatic cofactor NAD+ can be synthesized in many eukaryotes, including Saccharomyces cerevisiae and mammals, using tryptophan as a starting material. Metabolites along the pathway or on branches have important biological functions. For example, kynurenic acid can act as an NMDA antagonist, thereby functioning as a neuroprotectant in a wide range of pathological states. N-Formyl kynurenine formamidase (FKF) catalyzes the second step of the NAD+ biosynthetic pathway by hydrolyzing N-formyl kynurenine to produce kynurenine and formate. The S. cerevisiae FKF had been reported to be a pyridoxal phosphate-dependent enzyme encoded by BNA3. We used combined crystallographic, bioinformatic and biochemical methods to demonstrate that Bna3p is not an FKF but rather is most likely the yeast kynurenine aminotransferase, which converts kynurenine to kynurenic acid. Additionally, we identify YDR428C, a yeast ORF coding for an alpha/beta hydrolase with no previously assigned function, as the FKF. We predicted its function based on our interpretation of prior structural genomics results and on its sequence homology to known FKFs. Biochemical, bioinformatics, genetic and in vivo metabolite data derived from LC-MS demonstrate that YDR428C, which we have designated BNA7, is the yeast FKF.

19.
The constantly increasing volume and complexity of available biological data require new methods for their management and analysis. An important challenge is the integration of information from different sources in order to discover possible hidden relations between already known data. In this paper we introduce a data mining approach which relates biological ontologies by mining cross and intra-ontology pairwise generalized association rules. Its advantage is sensitivity to rare associations, which are important for biologists. We propose a new class of interestingness measures designed for hierarchically organized rules. These measures allow one to select the most important rules and to take into account rare cases. They favor rules with an actual interestingness value that exceeds the expected value. The latter is calculated taking into account the parent rule. We demonstrate this approach by applying it to the analysis of data from Gene Ontology and GPCR databases. Our objective is to discover interesting relations between two different ontologies or parts of a single ontology. The association rules that are thus discovered can provide the user with new knowledge about underlying biological processes or help improve annotation consistency. The obtained results show that the produced rules represent meaningful and quite reliable associations.

20.