Similar articles
1.
Clustering time-course gene expression data (gene trajectories) is an important step towards solving the complex problem of gene regulatory network modeling and discovery, as it significantly reduces the dimensionality of the gene space required for analysis. Traditional clustering methods that perform hill-climbing from randomly initialized cluster centers are prone to produce inconsistent and sub-optimal cluster solutions over different runs. This paper introduces a novel method that hybridizes a genetic algorithm (GA) with the expectation-maximization (EM) algorithm for clustering gene trajectories with mixtures of multiple linear regression models (MLRs), with the objective of improving the global optimality and consistency of the clustering performance. The proposed method is applied to cluster human fibroblast and yeast time-course gene expression data based on their trajectory similarities. It significantly outperforms the standard EM method in terms of both clustering accuracy and consistency. The biological implications of the improved clustering performance are demonstrated.

2.
MOTIVATION: Cellular processes cause changes over time. Observing and measuring those changes over time allows insights into the how and why of regulation. The experimental platform for doing the appropriate large-scale experiments to obtain time-courses of expression levels is provided by microarray technology. However, the proper way of analyzing the resulting time course data is still very much an issue under investigation. The inherent time dependencies in the data suggest that clustering techniques which reflect those dependencies yield improved performance. RESULTS: We propose to use Hidden Markov Models (HMMs) to account for the horizontal dependencies along the time axis in time course data and to cope with the prevalent errors and missing values. The HMMs are used within a model-based clustering framework. We are given a number of clusters, each represented by one Hidden Markov Model from a finite collection encompassing typical qualitative behavior. Then, in an iterative procedure, our method finds cluster models and an assignment of data points to these models that maximizes the joint likelihood of clustering and models. Partially supervised learning--adding groups of labeled data to the initial collection of clusters--is supported. A graphical user interface allows querying an expression profile dataset for time courses similar to a prototype graphically defined as a sequence of levels and durations. We also propose a heuristic approach to automate determination of the number of clusters. We evaluate the method on published yeast cell cycle and fibroblasts serum response datasets, and compare it, with favorable results, to the autoregressive curves method.

3.
All organisms have evolved to cope with changes in environmental conditions, ensuring the optimal combination of proliferation and survival. In yeast, exposure to a mild stress leads to an increased tolerance for other stresses. This suggests that yeast uses information from the environment to prepare for future threats. We used the yeast knockout collection to systematically investigate the genes and functions involved in severe stress survival and in the acquisition of stress (cross-) tolerance. Besides genes and functions relevant for survival of heat, acid, and oxidative stress, we found an inverse correlation between mutant growth rate and stress survival. Using chemostat cultures, we confirmed that growth rate governs stress tolerance, with higher growth efficiency at low growth rates liberating the energy for these investments. Cellular functions required for stress tolerance acquisition, independent of the reduction in growth rate, were involved in vesicular transport, the Rpd3 histone deacetylase complex, and the mitotic cell cycle. Stress resistance and acquired stress tolerance in Saccharomyces cerevisiae are governed by a combination of stress-specific and general processes. The reduction of growth rate, irrespective of the cause of this reduction, leads to redistribution of resources toward stress tolerance functions, thus preparing the cells for impending change.

4.
MOTIVATION: This paper introduces the application of a novel clustering method to microarray expression data. Its first stage involves compression of dimensions that can be achieved by applying SVD to the gene-sample matrix in microarray problems. Thus the data (samples or genes) can be represented by vectors in a truncated space of low dimensionality, 4 and 5 in the examples studied here. We find it preferable to project all vectors onto the unit sphere before applying a clustering algorithm. The clustering algorithm used here is the quantum clustering method that has one free scale parameter. Although the method is not hierarchical, it can be modified to allow hierarchy in terms of this scale parameter. RESULTS: We apply our method to three data sets. The results are very promising. On cancer cell data we obtain a dendrogram that reflects correct groupings of cells. In an AML/ALL data set we obtain very good clustering of samples into four classes of the data. Finally, in clustering of genes in yeast cell cycle data we obtain four groups in a problem that is estimated to contain five families. AVAILABILITY: Software is available as Matlab programs at http://neuron.tau.ac.il/~horn/QC.htm.
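
The unit-sphere projection mentioned in this abstract can be illustrated with a minimal sketch. It assumes the SVD truncation has already been done elsewhere and only shows the normalization step. The function name `project_to_unit_sphere` and the sample coordinates are hypothetical and are not taken from the authors' Matlab software.

```python
import math


def project_to_unit_sphere(vectors):
    """Scale each vector to Euclidean length 1; zero vectors are returned unchanged."""
    projected = []
    for vec in vectors:
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            # A zero vector has no direction, so it cannot be projected.
            projected.append(list(vec))
        else:
            projected.append([x / norm for x in vec])
    return projected


# Hypothetical low-dimensional coordinates (e.g. rows of a truncated SVD).
reduced = [[3.0, 4.0], [0.5, 0.0], [-2.0, 2.0]]
unit_vectors = project_to_unit_sphere(reduced)
```

After this step only the direction of each vector matters, so a clustering algorithm applied afterwards compares profiles by shape rather than by magnitude.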

5.
6.
Wang D  Harper JF  Gribskov M 《Plant physiology》2003,132(4):2152-2165
The genome of the budding yeast (Saccharomyces cerevisiae) provides an important paradigm for transgenomic comparisons with other eukaryotic species. Here, we report a systematic comparison of the protein kinases of yeast (119 kinases) and a reference plant Arabidopsis (1,019 kinases). Using a whole-protein-based, hierarchical clustering approach, the complete set of protein kinases from both species was clustered. We validated our clustering by three observations: (a) clustering pattern of functional orthologs proven in genetic complementation experiments, (b) consistency with reported classifications of yeast kinases, and (c) consistency with the biochemical properties of those Arabidopsis kinases already experimentally characterized. The clustering pattern identified no overlap between yeast kinases and the receptor-like kinases (RLKs) of Arabidopsis. Ten more kinase families were found to be specific for one of the two species. Among them, the calcium-dependent protein kinase and phosphoenolpyruvate carboxylase kinase families are specific for plants, whereas the Ca2+/calmodulin-dependent protein kinase and provirus insertion in mouse-like kinase families were found only in yeast and animals. Three yeast kinase families, nitrogen permease reactivator/halotolerance-5, polyamine transport kinase, and negative regulator of sexual conjugation and meiosis, are absent in both plants and animals. The majority of yeast kinase families (21 of 26) display Arabidopsis counterparts, and all are mapped into Arabidopsis families of intracellular kinases that are not related to RLKs. Representatives from 11 of the common families (54 kinases from Arabidopsis and 17 from yeast) share an extremely high degree of similarity (blast E value < 10^-80), suggesting the likelihood of orthologous functions. Selective expansion of yeast kinase families was observed in Arabidopsis. This is most evident for yeast genes CBK1, HRR25, and SNF1 and the kinase family S6K. Reduction of kinase families was also observed, as in the case of the NEK-like family. The distinguishing features between the two sets of kinases are the selective expansion of yeast families and the generation of a limited number of new kinase families for new functionality in Arabidopsis, most notably, the Arabidopsis RLKs that constitute important components of plant intercellular communication apparatus.

7.
MOTIVATION: In haploinsufficiency profiling data, pleiotropic genes are often misclassified by clustering algorithms that impose the constraint that a gene or experiment belong to only one cluster. We have developed a general probabilistic model that clusters genes and experiments without requiring that a given gene or drug only appear in one cluster. The model also incorporates the functional annotation of known genes to guide the clustering procedure. RESULTS: We applied our model to the clustering of 79 chemogenomic experiments in yeast. Known pleiotropic genes PDR5 and MAL11 are more accurately represented by the model than by a clustering procedure that requires genes to belong to a single cluster. Drugs such as miconazole and fenpropimorph that have different targets but similar off-target genes are clustered more accurately by the model-based framework. We show that this model is useful for summarizing the relationship among treatments and genes affected by those treatments in a compendium of microarray profiles. AVAILABILITY: Supplementary information and computer code at http://genomics.lbl.gov/llda.

8.
High-throughput genomic measurements, interpreted as co-occurring data samples from multiple sources, open up a fresh problem for machine learning: What is in common in the different data sets, that is, what kind of statistical dependencies are there between the paired samples from the different sets? We introduce a clustering algorithm for exploring the dependencies. Samples within each data set are grouped such that the dependencies between groups of different sets capture as much of the pairwise dependencies between the samples as possible. We formalize this problem in a novel probabilistic way, as optimization of a Bayes factor. The method is applied to reveal commonalities and exceptions in gene expression between organisms and to suggest regulatory interactions in the form of dependencies between gene expression profiles and regulator binding patterns.

9.
Feiglin A  Moult J  Lee B  Ofran Y  Unger R 《PloS one》2012,7(6):e39662
The yeast protein-protein interaction network has been shown to have distinct topological features such as a scale free degree distribution and a high level of clustering. Here we analyze an additional feature which is called Neighbor Overlap. This feature reflects the number of shared neighbors between a pair of proteins. We show that Neighbor Overlap is enriched in the yeast protein-protein interaction network compared with control networks carefully designed to match the characteristics of the yeast network in terms of degree distribution and clustering coefficient. Our analysis also reveals that pairs of proteins with high Neighbor Overlap have higher sequence similarity, more similar GO annotations and stronger genetic interactions than pairs with low ones. Finally, we demonstrate that pairs of proteins with redundant functions tend to have high Neighbor Overlap. We suggest that a combination of three mechanisms is the basis for this feature: The abundance of protein complexes, selection for backup of function, and the need to allow functional variation.
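
As a rough illustration of the Neighbor Overlap feature described in this abstract, the sketch below counts the neighbors shared by each pair of nodes in a small undirected graph. It assumes the plain, unnormalized count; the abstract does not say whether the authors normalize it. The protein labels `P1` to `P5` and the edge list are invented for the example.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors} from an edge list."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def neighbor_overlap(adjacency, u, v):
    """Count the neighbors shared by u and v, excluding u and v themselves."""
    shared = adjacency.get(u, set()) & adjacency.get(v, set())
    return len(shared - {u, v})


# Hypothetical toy interaction network.
edges = [
    ("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
    ("P2", "P4"), ("P1", "P4"), ("P4", "P5"),
]
adjacency = build_adjacency(edges)
overlaps = {
    pair: neighbor_overlap(adjacency, *pair)
    for pair in combinations(sorted(adjacency), 2)
}
```

Here `P1` and `P2` share two neighbors (`P3` and `P4`), while `P3` and `P5` share none. An enrichment analysis like the one the abstract describes would compare such counts against those from degree-matched control networks, which this sketch does not attempt.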

10.
Summary: Multivariate analysis of plant community data has three goals: summarization of redundancy, identification of outliers, and elucidation of relationships. The first two are handled conveniently by initial fast clustering, and the third by subsequent ordination and hierarchical clustering, and perhaps table arrangement. Initial clustering algorithms should achieve within-cluster homogeneity and require minimal computer resources. However, algorithmic uniqueness and a hierarchy are not needed. Computing time should be proportional to the amount of data, with no higher dependencies on the number of samples. A method is presented here meeting these requirements, called composite clustering and implemented in a FORTRAN program called COMPCLUS. The computer time required for COMPCLUS clustering is on the order of the time required merely to read the data, regardless of the number of samples. Several large field data sets were analyzed effectively by using COMPCLUS to reduce redundancy and identify outliers, and then ordinating the resulting composite clusters by detrended correspondence analysis (DECORANA). Various clusterings of the same data set can be compared using a percent mutual matches (PMM) index, and a matrix of such values can be ordinated for simultaneous comparison of a number of clusterings. This paper benefited at many points from discussions with Mark O. Hill and Robert H. Whittaker. Mark Hill suggested condensed data storage. This work was done under a National Science Foundation grant to Robert Whittaker. I also appreciate technical assistance from Timothy F. Mason and Steven B. Singer.

11.
The use of mutual information as a similarity measure in agglomerative hierarchical clustering (AHC) raises an important issue: some correction needs to be applied for the dimensionality of variables. In this work, we formulate the decision of merging dependent multivariate normal variables in an AHC procedure as a Bayesian model comparison. We found that the Bayesian formulation naturally shrinks the empirical covariance matrix towards a matrix set a priori (e.g., the identity), provides an automated stopping rule, and corrects for dimensionality using a term that scales up the measure as a function of the dimensionality of the variables. Also, the resulting log Bayes factor is asymptotically proportional to the plug-in estimate of mutual information, with an additive correction for dimensionality in agreement with the Bayesian information criterion. We investigated the behavior of these Bayesian alternatives (in exact and asymptotic forms) to mutual information on simulated and real data. An encouraging result was first derived on simulations: the hierarchical clustering based on the log Bayes factor outperformed off-the-shelf clustering techniques as well as raw and normalized mutual information in terms of classification accuracy. On a toy example, we found that the Bayesian approaches led to results that were similar to those of mutual information clustering techniques, with the advantage of an automated thresholding. On real functional magnetic resonance imaging (fMRI) datasets measuring brain activity, it identified clusters consistent with the established outcome of standard procedures. On this application, normalized mutual information had a highly atypical behavior, in the sense that it systematically favored very large clusters. These initial experiments suggest that the proposed Bayesian alternatives to mutual information are a useful new tool for hierarchical clustering.
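
For readers unfamiliar with the plug-in estimate of mutual information that this abstract refers to, the following sketch covers only the simplest special case: two scalar variables assumed jointly Gaussian, for which the mutual information is -0.5 * ln(1 - r^2), with r the sample correlation. It does not implement the paper's multivariate Bayes factor or its dimensionality correction, whose exact form is not given here. The function names and data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mutual_information(xs, ys):
    """Plug-in mutual information (in nats) under a bivariate Gaussian assumption.

    Raises ValueError when |r| == 1, where the estimate diverges.
    """
    r = pearson_r(xs, ys)
    return -0.5 * math.log(1.0 - r * r)


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_dependent = [2.0, 1.0, 4.0, 3.0, 5.0]
ys_uncorrelated = [1.0, -1.0, 0.0, -1.0, 1.0]

mi_dependent = gaussian_mutual_information(xs, ys_dependent)
mi_uncorrelated = gaussian_mutual_information(xs, ys_uncorrelated)
```

A raw estimate like this grows with the number of dimensions being compared, which is the motivation the abstract gives for adding a dimensionality correction before using it as a merge criterion.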

12.
Biological processes exhibit different behavior depending on the influent loads, temperature, microorganism activity, and so on. It has been shown that a combination of several models can provide a suitable approach to model such processes. In the present study, we developed a multiple statistical model approach for the monitoring of biological batch processes. The proposed method consists of four main components: (1) multiway principal component analysis (MPCA) to reduce the dimensionality of data and to remove collinearity; (2) multiple models with a posterior probability for modeling different operating regions; (3) local batch monitoring by the T²- and Q-statistics of the specific local model; and (4) a new discrimination measure (DM) to identify when the system has shifted to a new operating condition. Under this approach, local monitoring by multiple models divides the entire historical data set into separate regions, which are then modeled separately. Then, these local regions can be supervised separately, leading to more effective batch monitoring. The proposed method is applied to a pilot-scale 80-L sequencing batch reactor (SBR) for biological wastewater treatment. This SBR is characterized by nonstationary, batchwise, and multiple operation modes. The results obtained for the pilot-scale SBR indicate that the proposed method has the ability to model multiple operating conditions, to identify various operating regions, and also to determine whether the biosystem has shifted to a new operating condition. Our findings show that the local monitoring approach can give more reliable and higher resolution monitoring results than the global model.

13.
Our cognition relies on the ability of the brain to segment hierarchically structured events on multiple scales. Recent evidence suggests that the brain performs this event segmentation based on the structure of state-transition graphs behind sequential experiences. However, the underlying circuit mechanisms are poorly understood. In this paper we propose an extended attractor network model for graph-based hierarchical computation which we call the Laplacian associative memory. This model generates multiscale representations for communities (clusters) of associative links between memory items, and the scale is regulated by the heterogeneous modulation of inhibitory circuits. We analytically and numerically show that these representations correspond to graph Laplacian eigenvectors, a popular method for graph segmentation and dimensionality reduction. Finally, we demonstrate that our model exhibits chunked sequential activity patterns resembling hippocampal theta sequences. Our model connects graph theory and attractor dynamics to provide a biologically plausible mechanism for abstraction in the brain.

14.
During industrial production processes using yeast, cells are exposed to stress due to the accumulation of ethanol, which affects cell growth activity and the productivity of target products; thus, ethanol stress-tolerant yeast strains are highly desired. To identify the target gene(s) for constructing ethanol stress-tolerant yeast strains, we obtained the gene expression profiles of two strains of Saccharomyces cerevisiae, namely, a laboratory strain and a strain used for brewing Japanese rice wine (sake), in the presence of 5% (v/v) ethanol, using DNA microarrays. For the selection of target genes for breeding ethanol stress-tolerant strains, clustering of the DNA microarray data was performed. For further selection, the ethanol sensitivity of knockout mutants, in each of which a gene selected by DNA microarray analysis is deleted, was also investigated. The integration of the DNA microarray data and the ethanol sensitivity data of knockout strains suggests that the enhancement of expression of genes related to tryptophan biosynthesis might confer ethanol stress tolerance to yeast cells. Indeed, the strains overexpressing tryptophan biosynthesis genes showed stress tolerance to 5% ethanol. Moreover, the addition of tryptophan to the culture medium and overexpression of the tryptophan permease gene conferred ethanol stress tolerance to yeast cells. These results indicate that overexpression of the genes for tryptophan biosynthesis increases ethanol stress tolerance. Tryptophan supplementation of the culture and overexpression of the tryptophan permease gene are also effective in increasing ethanol stress tolerance. Our methodology for the selection of target genes for constructing ethanol stress-tolerant strains, based on DNA microarray data and the phenotypes of knockout mutants, was validated.

15.
Environmental stress (nutritive, chemical, electromagnetic and thermal) has been shown to disrupt central nervous system (CNS) development in every model system studied to date. However, empirical linkages between stress, specific targets in the brain, and consequences for behavior have rarely been established. The present study experimentally demonstrates one such linkage by examining the effects of ecologically-relevant thermal stress on development of the Drosophila melanogaster mushroom body (MB), a conserved sensory integration and associative center in the insect brain. We show that a daily hyperthermic episode throughout larval and pupal development (1) severely disrupts MB anatomy by reducing intrinsic Kenyon cell (KC) neuron numbers but has little effect on other brain structures or general anatomy, and (2) greatly impairs associative odor learning in adults, despite having little effect on memory or sensory acuity. Hence, heat stress of ecologically relevant duration and intensity can impair brain development and learning potential.

16.
Signal transduction networks are crucial for inter- and intra-cellular signaling. Signals are often transmitted via covalent modification of protein structure, with phosphorylation/dephosphorylation as the primary example. In this paper, we apply a recently described method of computational algebra to the modeling of signaling networks, based on time-course protein modification data. Computational algebraic techniques are employed to construct next-state functions. A Monte Carlo method is used to approximate the Deegan-Packel Index of Power corresponding to the respective variables. The Deegan-Packel Index of Power is used to conjecture dependencies in the cellular signaling networks. We apply this method to two examples of protein modification time-course data available in the literature. These experiments identified protein carbonylation upon exposure of cells to sub-lethal concentrations of copper. We demonstrate that this method can identify protein dependencies that might correspond to regulatory mechanisms to shut down glycolysis in a reverse, step-wise fashion in response to copper-induced oxidative stress in yeast. These examples show that the computational algebra approach can identify dependencies that may outline signaling networks involved in the response of glycolytic enzymes to the oxidative stress caused by copper.
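
The Deegan-Packel Index of Power named in this abstract comes from cooperative game theory: each minimal winning coalition is weighted equally, and its weight is split equally among its members. The sketch below computes the index exactly by enumeration for a toy weighted voting game. This is an assumption-laden illustration only: the abstract states that the authors approximate the index by Monte Carlo, and it does not explain how variables of the next-state functions are mapped to players, so the voting game, the weights, and all names here are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def minimal_winning_coalitions(players, is_winning):
    """Return all winning coalitions that contain no smaller winning coalition."""
    winning = [
        frozenset(combo)
        for size in range(1, len(players) + 1)
        for combo in combinations(players, size)
        if is_winning(frozenset(combo))
    ]
    return [s for s in winning if not any(t < s for t in winning)]


def deegan_packel(players, is_winning):
    """Exact Deegan-Packel index; assumes at least one winning coalition exists."""
    minimal = minimal_winning_coalitions(players, is_winning)
    return {
        p: sum((Fraction(1, len(s)) for s in minimal if p in s), Fraction(0))
        / len(minimal)
        for p in players
    }


# Hypothetical weighted voting game: a coalition wins if its weight reaches the quota.
weights = {"A": 2, "B": 1, "C": 1}
quota = 3
index = deegan_packel(
    list(weights),
    lambda coalition: sum(weights[p] for p in coalition) >= quota,
)
```

In this toy game the minimal winning coalitions are {A, B} and {A, C}, so A receives 1/2 and B and C receive 1/4 each. Exhaustive enumeration grows exponentially with the number of players, which is a plausible reason for using a sampling approximation on larger systems.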

17.
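To address the nonlinear and time-varying characteristics of fermentation processes, a real-time dynamic MPCA method is proposed. It replaces the single-model linearized structure of conventional MPCA with a multi-model nonlinear structure, which overcomes the inability of conventional MPCA to handle nonlinear processes and real-time operation and avoids the errors introduced by predicting future measurements when MPCA is applied online. This improves the accuracy of performance monitoring and fault diagnosis for fermentation processes. A pseudo-online simulation study of the cephalosporin C fermentation process verified the effectiveness of statistical process monitoring based on dynamic MPCA.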

18.
MZ Ding  X Wang  W Liu  JS Cheng  Y Yang  YJ Yuan 《PloS one》2012,7(8):e43474
The tolerance mechanism of yeast to the combination of three inhibitors (furfural, phenol and acetic acid) was investigated using 2-DE combined with MALDI-TOF/TOF-MS. The stress response and detoxification related proteins (e.g., Ahp1p, Hsp26p) were expressed at higher levels in the tolerant yeast than in the parental yeast. The expression of most nitrogen metabolism related proteins (e.g., Gdh1p, Met1p) was higher in the parental yeast, indicating that the tolerant yeast decreases its nitrogen metabolism rate to reserve energy, and possesses high resistance to the stress of combined inhibitors. Furthermore, upon exposure to the inhibitors, the proteins related to protein folding, degradation and translation (e.g., Ssc1p, Ubp14p, Efb1p) were all significantly affected, and the oxidative stress related proteins (e.g., Ahp1p, Grx1p) were increased. Knockdown of genes related to the oxidative stress and unfolded protein response (Grx1, Gre2, Asc1) significantly decreased the tolerance of yeast to inhibitors, which further suggested that yeast responded to the inhibitors mainly by inducing the unfolded protein response. This study reveals that increasing detoxification and oxidative stress tolerance, and/or decreasing nitrogen metabolism, would be promising strategies for developing strains more tolerant to the multiple inhibitors in lignocellulose hydrolysates.

19.
Summary: This study examined the water relations and growth responses of Uniola paniculata (sea oats) to (1) three watering regimes and (2) four controlled water-table depths. Uniola paniculata is frequently the dominant foredune grass along much of the southeastern Atlantic and Gulf coasts of the United States, but its distribution is limited in Louisiana. Throughout most of its range, U. paniculata tends to dominate and be well adapted to the most exposed areas of the dune where soil moisture is low. Dune elevations in Louisiana, however, rarely exceed 2 m, and as a result the depth to the water table is generally shallow. We hypothesized that if U. paniculata grows very near the water table, as it may in Louisiana, it will display signs of waterlogging stress. This study demonstrated that excessive soil moisture resulting from inundation or shallow water-table depth has a greater negative effect on plant growth than do low soil moisture conditions. Uniola paniculata's initial response to either drought or inundation was a reduction of leaf (stomatal) conductance and a concomitant decrease in leaf elongation. However, plants could recover from drought-induced leaf xylem pressures of less than -3.3 MPa, but prolonged inundation killed the plants. Waterlogging stress (manifested in significantly reduced leaf stomatal conductances and reduced biomass production) was observed in plants grown at 0.3 m above the water table. This stress was relieved, however, at an elevation of 0.9 m above the water table. As the elevation was increased from 0.9 to 2.7 m, there were no signs of drought stress nor a stimulation in growth due to lower soil moisture. We concluded that although U. paniculata's moisture-conserving traits adapt it well to the dune environment, this species can grow very well at an elevation of only 0.9 m above the water table. Field measurements of water-table depth in three Louisiana populations averaged about 1.3 m. Therefore, the observed limited distribution of U. paniculata along the Louisiana coast apparently cannot be explained by waterlogging stress induced by the low dune elevations and the corresponding shallow water-table depth.

20.
The metabolic responses of parental and inhibitor-tolerant yeasts in the presence of a combination of three inhibitors (furfural, phenol and acetic acid) during ethanol fermentation were investigated by comparative metabolic profiling. Samples of parental and tolerant yeasts with and without the three inhibitors in the fermentation medium represented significantly different metabolic states. Further investigation of the specific responses of the two strains revealed that the levels of most amino acids, inositol, and phenethylamine were dramatically increased in the presence of inhibitors in the parental yeast, while they remained relatively stable in the tolerant yeast. This suggested that protein degradation was increased and oxygen stress was induced by the combined inhibitors in the parental yeast. In addition, carbon metabolism (glycolysis and TCA) and the pyrimidine ribonucleotide pathway (uracil and cytosine) were reduced in both strains in the presence of the combined inhibitors, which was considered a general stress response. Higher levels of pyrimidines in the tolerant yeast suggested that they were responsible for counteracting the stress of the combined inhibitors. These findings provided new insights into the mechanisms underlying yeast resistance to the synergistic effects of inhibitors in lignocellulose hydrolysates.
