Similar literature
 20 similar documents found (search time: 0 ms)
1.

Background  

Time series gene expression data analysis is used widely to study the dynamics of various cell processes. Most time series data available today consist of only a few time points, which makes standard clustering techniques difficult to apply.

2.
Nowadays, remote sensing technologies produce huge amounts of satellite images that can be helpful to monitor geographical areas over time. A satellite image time series (SITS) usually contains spatio-temporal phenomena that are complex and difficult to understand. Conceiving new data mining tools for SITS analysis is challenging since we need to manage the spatial and the temporal dimensions simultaneously. In this work, we propose a new clustering framework specifically designed for SITS data. Our method first detects spatio-temporal entities, then characterizes their evolutions by means of a graph-based representation, and finally produces clusters of spatio-temporal entities sharing similar temporal behaviors. Unlike previous approaches, which mainly work at pixel-level, our framework exploits a purely object-based representation to perform the clustering task. Object-based analysis involves a segmentation step where segments (objects) are extracted from an image and constitute the element of analysis. We experimentally validate our method on two real world SITS datasets by comparing it with standard techniques employed in remote sensing analysis. We also use a qualitative analysis to highlight the interpretability of the results obtained.

3.
In this paper we test a method to estimate the tree and grass vegetation cover over Australia from satellite-derived normalized difference vegetation index (NDVI) time series observations (monthly, 1981–91, ≈5 km pixels). The evergreen cover is assumed to track along the base of the NDVI time series, which is assumed to be equivalent to the woody vegetation cover. The base of the NDVI time series is estimated using modifications to a classical econometric model (i.e. the time series is the sum of trend, seasonal and random components). Estimates of the average evergreen component during 1982–85 and 1986–89 were generally consistent with known vegetation distributions. Changes in evergreen cover were largely restricted to the south-west and south-east of Australia. Those changes were largely the result of differences in rainfall between the two periods. The proposed method for estimating woody vegetation cover is found to be generally robust. However, there are some regions where the grass (or pasture) is mostly evergreen. Some possible refinements are proposed to handle such cases.
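The classical model mentioned in this abstract (series = trend + seasonal + random) can be illustrated with a minimal sketch. It shows only the unmodified additive decomposition for a monthly series, not the authors' modified procedure for estimating the NDVI base; the function name `classical_decompose` and the use of a centered moving average for the trend are assumptions made for illustration.

```python
from statistics import mean


def classical_decompose(series, period=12):
    """Additive decomposition: series[t] = trend[t] + seasonal[t] + random[t].

    Hypothetical helper for illustration. Assumes an even period (e.g. 12
    for monthly data) and at least two full cycles of observations.
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    # Centered 2 x period moving average; undefined at the series edges.
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        weighted = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = weighted / period
    # Seasonal component: average detrended value at each phase of the cycle.
    by_phase = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            by_phase[t % period].append(series[t] - trend[t])
    raw = [mean(values) for values in by_phase]
    centre = mean(raw)  # force the seasonal cycle to average to zero
    cycle = [value - centre for value in raw]
    seasonal = [cycle[t % period] for t in range(n)]
    # Random component: whatever trend and seasonality leave unexplained.
    random_part = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, random_part
```

Applied per pixel, a decomposition of this kind separates the slowly varying level from the annual cycle; how the base of the series is then derived from these components is specific to the paper and is not reproduced here.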

4.
5.
6.

Background

Co-evolution is the process in which two (or more) sets of orthologs exhibit a similar or correlative pattern of evolution. Co-evolution is a powerful way to learn about the functional interdependencies between sets of genes and cellular functions and to predict physical interactions. More generally, it can be used for answering fundamental questions about the evolution of biological systems. Orthologs that exhibit a strong signal of co-evolution in a certain part of the evolutionary tree may show a mild signal of co-evolution in other branches of the tree. The major reasons for this phenomenon are noise in the biological input, genes that gain or lose functions, and the fact that some measures of co-evolution relate to rare events such as positive selection. Previous publications in the field dealt with the problem of finding sets of genes that co-evolved along an entire underlying phylogenetic tree, without considering the fact that often co-evolution is local.

Results

In this work, we describe a new set of biological problems that are related to finding patterns of local co-evolution. We discuss their computational complexity and design algorithms for solving them. These algorithms outperform other bi-clustering methods as they are designed specifically for solving the set of problems mentioned above. We use our approach to trace the co-evolution of fungal, eukaryotic, and mammalian genes at high resolution across the different parts of the corresponding phylogenetic trees. Specifically, we discover regions in the fungi tree that are enriched with positive evolution. We show that metabolic genes exhibit a remarkable level of co-evolution and different patterns of co-evolution in various biological datasets. In addition, we find that protein complexes that are related to gene expression exhibit non-homogenous levels of co-evolution across different parts of the fungi evolutionary line. In the case of mammalian evolution, signaling pathways that are related to neurotransmission exhibit a relatively higher level of co-evolution along the primate subtree.

Conclusions

We show that finding local patterns of co-evolution is a computationally challenging task and we offer novel algorithms that allow us to solve this problem, thus opening a new approach for analyzing the evolution of biological systems.

7.
MOTIVATION: Although there are several databases storing protein-protein interactions, most such data still exist only in the scientific literature. They are scattered in scientific literature written in natural languages, defying data mining efforts. Much time and labor have to be spent on extracting protein pathways from literature. Our aim is to develop a robust and powerful methodology to mine protein-protein interactions from biomedical texts. RESULTS: We present a novel and robust approach for extracting protein-protein interactions from literature. Our method uses a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and key verbs that describe protein interactions. A matching algorithm is designed to extract the interactions between proteins. Equipped only with a dictionary of protein names, our system achieves a recall rate of 80.0% and precision rate of 80.5%. AVAILABILITY: The program is available on request from the authors.

8.
Discovering statistically significant biclusters in gene expression data   Total citations: 1, self-citations: 0, citations by others: 1
In gene expression data, a bicluster is a subset of the genes exhibiting consistent patterns over a subset of the conditions. We propose a new method to detect significant biclusters in large expression datasets. Our approach is graph theoretic coupled with statistical modelling of the data. Under plausible assumptions, our algorithm is polynomial and is guaranteed to find the most significant biclusters. We tested our method on a collection of yeast expression profiles and on a human cancer dataset. Cross validation results show high specificity in assigning function to genes based on their biclusters, and we are able to annotate in this way 196 uncharacterized yeast genes. We also demonstrate how the biclusters lead to detecting new concrete biological associations. In cancer data we are able to detect and relate finer tissue types than was previously possible. We also show that the method outperforms the biclustering algorithm of Cheng and Church (2000).

9.
As custodians of deep time, palaeontologists have an obligation to seek the causes and consequences of long‐term evolutionary trajectories and the processes of ecosystem assembly and collapse. Building explicit process models on the relevant scales can be fraught with difficulties, and causal inference is typically limited to patterns of association. In this review, we discuss some of the ways in which causal connections can be extracted from palaeontological time series and provide an overview of three recently developed analytical frameworks that have been applied to palaeontological questions, namely linear stochastic differential equations, convergent cross mapping and transfer entropy. We outline how these methods differ conceptually, and in practice, and point to available software and worked examples. We end by discussing why a paradigm of dynamical causality is needed to decipher the messages encrypted in palaeontological patterns.

10.
The Poincaré plot is a popular two-dimensional, time series analysis tool because of its intuitive display of dynamic system behavior. Poincaré plots have been used to visualize heart rate and respiratory pattern variabilities. However, conventional quantitative analysis relies primarily on statistical measurements of the cumulative distribution of points, making it difficult to interpret irregular or complex plots. Moreover, the plots are constructed to reflect highly correlated regions of the time series, reducing the amount of nonlinear information that is presented and thereby hiding potentially relevant features. We propose temporal Poincaré variability (TPV), a novel analysis methodology that uses standard techniques to quantify the temporal distribution of points and to detect nonlinear sources responsible for physiological variability. In addition, the analysis is applied across multiple time delays, yielding a richer insight into system dynamics than the traditional circle return plot. The method is applied to data sets of R-R intervals and to synthetic point process data extracted from the Lorenz time series. The results demonstrate that TPV complements the traditional analysis, can be applied more generally (including to Poincaré plots with multiple clusters) and more consistently than the conventional measures, and can address questions regarding potential structure underlying the variability of a data set.
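For readers unfamiliar with the construction, the following minimal sketch shows how Poincaré plot points are formed at an arbitrary time delay and how the conventional SD1/SD2 dispersion measures are computed from them. It illustrates the conventional analysis that the abstract contrasts with TPV, not TPV itself; the function names are hypothetical, and the population standard deviation is an arbitrary choice made here.

```python
from math import sqrt
from statistics import pstdev


def poincare_points(series, delay=1):
    """Pair each sample x[n] with the sample `delay` steps later, x[n + delay]."""
    if delay < 1 or delay >= len(series):
        raise ValueError("delay must satisfy 1 <= delay < len(series)")
    return list(zip(series[:-delay], series[delay:]))


def sd1_sd2(series, delay=1):
    """Conventional dispersion measures of a (lagged) Poincaré plot.

    SD1: spread perpendicular to the line of identity (short-term variability).
    SD2: spread along the line of identity (longer-term variability).
    """
    points = poincare_points(series, delay)
    sd1 = pstdev([(y - x) / sqrt(2) for x, y in points])
    sd2 = pstdev([(y + x) / sqrt(2) for x, y in points])
    return sd1, sd2


# Illustrative R-R intervals in seconds (made-up values, not from the paper).
rr = [0.80, 0.82, 0.79, 0.85, 0.81, 0.83, 0.78, 0.84]
for k in (1, 2, 3):
    print(k, sd1_sd2(rr, delay=k))
```

Because SD1 and SD2 summarize only the overall cloud of points, they discard the order in which points were visited, which is the limitation the abstract attributes to conventional measures.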

11.
SarkOne is a genus-specific satellite-DNA family, isolated from the genomes of the species of the genus Sarcocapnos. This satellite DNA is composed of repeats with a consensus length of 855 bp and a mean G+C content of 52.5%. We have sequenced a total of 189 SarkOne monomeric repeats belonging to a total of seven species of the genus Sarcocapnos. The comparative analysis of these sequences at both the intraspecific and the interspecific levels has revealed that divergence patterns between species are proportional to between-species divergence according to the phylogeny of the genus. Our study demonstrates that the molecular drive leading to the concerted-evolution pattern of this satellite DNA is a time-dependent process by which new mutations are spreading through genomes and populations at a gradual pace. However, time is a limiting factor in the observation of concerted evolution in some pairwise comparisons. Thus, pairwise comparisons of species sharing a recent common ancestor did not reveal nucleotide sites in transitional stages higher than stage III according to Strachan's model. By contrast, there was a gradation in the percentage of upper transition stages (IV, V, VI) the more phylogenetically distant the species were. In addition, closely related species shared a high number of polymorphic sites, but these types of sites were not common when comparing more distant species. All these data are discussed in the light of current life-cycle models of satellite-DNA evolution.

12.
13.
14.
15.

Background  

The ability to monitor the change in expression patterns over time, and to observe the emergence of coherent temporal responses using gene expression time series, obtained from microarray experiments, is critical to advance our understanding of complex biological processes. In this context, biclustering algorithms have been recognized as an important tool for the discovery of local expression patterns, which are crucial to unravel potential regulatory mechanisms. Although most formulations of the biclustering problem are NP-hard, when working with time series expression data the interesting biclusters can be restricted to those with contiguous columns. This restriction leads to a tractable problem and enables the design of efficient biclustering algorithms able to identify all maximal contiguous column coherent biclusters.
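To see why the contiguous-column restriction makes the problem tractable, consider the naive sketch below. It assumes each gene's expression has already been discretized into one symbol per time point (for example U, D, N for up, down, no change), which is an assumption of this example, and it simply enumerates every window of consecutive time points. There are only O(m²) such windows for m time points, so the search is polynomial. This is not the efficient algorithm the abstract refers to, and it reports all coherent groups rather than only maximal ones; the function name is hypothetical.

```python
from collections import defaultdict


def contiguous_biclusters(patterns, min_genes=2, min_cols=2):
    """Naively enumerate biclusters over contiguous time-point windows.

    patterns: dict mapping gene name -> string with one discretized symbol
    per time point (all strings of equal length).
    Returns a list of (start, end, pattern, genes) tuples, where columns
    start..end-1 form the window and `genes` share `pattern` over it.
    """
    if not patterns:
        return []
    n_cols = len(next(iter(patterns.values())))
    found = []
    for start in range(n_cols):
        for end in range(start + min_cols, n_cols + 1):
            # Group genes by the symbols they show inside this window.
            groups = defaultdict(list)
            for gene, symbols in patterns.items():
                groups[symbols[start:end]].append(gene)
            for pattern, genes in groups.items():
                if len(genes) >= min_genes:
                    found.append((start, end, pattern, sorted(genes)))
    return found


# Made-up discretized profiles for three genes over four time points.
example = {"g1": "UDUN", "g2": "UDUD", "g3": "NDUD"}
for bicluster in contiguous_biclusters(example):
    print(bicluster)
```

Without the contiguity restriction the column subsets to examine would number 2^m rather than O(m²), which is where the general problem becomes hard.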

16.
17.
We describe a method based on time series analysis that divided the rabies enzootic area of southern Ontario into 13 regions using data collected at the township level, the smallest available geographical unit for Ontario (Canada). The intent was to discover ecogeographic patterns if such existed. For the period 1957-89, the quarterly time series of fox rabies cases for each of the 423 townships in the study area was correlated with the time series of its adjacent neighbors. Townships were then linked to adjacent townships provided the pair-wise correlations had significant correlation coefficients. This procedure produced 13 clusters that remained stable when additional lead/lag relationships between townships were examined. Furthermore, those clusters, which we then termed "rabies units," had different behaviors in terms of species distribution, persistence, and periodicity. Time series in adjacent units were not synchronous. We discuss how our findings influenced the rabies control program in Ontario, how they relate to recent findings about the distribution of fox rabies virus subtypes, and how they lend support to the role of metapopulation structure in persistence of disease.
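The linking procedure described in this abstract (correlate each unit with its adjacent neighbors, link significantly correlated pairs, read off the resulting clusters) might be sketched as follows. The fixed threshold `min_r` is a stand-in assumption for the significance test, which the abstract does not specify, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def link_regions(series_by_unit, adjacency, min_r=0.5):
    """Cluster units by linking adjacent pairs whose series correlate.

    series_by_unit: dict unit -> list of case counts per period.
    adjacency: iterable of (unit, unit) pairs that share a border.
    min_r: assumed correlation threshold standing in for a significance test.
    """
    parent = {unit: unit for unit in series_by_unit}

    def find(unit):
        # Union-find root lookup with path halving.
        while parent[unit] != unit:
            parent[unit] = parent[parent[unit]]
            unit = parent[unit]
        return unit

    for u, v in adjacency:
        if pearson(series_by_unit[u], series_by_unit[v]) >= min_r:
            parent[find(u)] = find(v)

    clusters = {}
    for unit in series_by_unit:
        clusters.setdefault(find(unit), []).append(unit)
    return sorted(sorted(members) for members in clusters.values())
```

Because only adjacent pairs are ever tested, every resulting cluster is spatially connected, which is what allows the clusters to be read as geographic regions.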

18.

Background  

In recent years, a considerable amount of research effort has been directed to the analysis of biological networks with the availability of genome-scale networks of genes and/or proteins of an increasing number of organisms. A protein-protein interaction (PPI) network is a particular biological network which represents physical interactions between pairs of proteins of an organism. Major research on PPI networks has focused on understanding the topological organization of PPI networks, evolution of PPI networks and identification of conserved subnetworks across different species, discovery of modules of interaction, use of PPI networks for functional annotation of uncharacterized proteins, and improvement of the accuracy of currently available networks.

19.
20.
The Partial Directed Coherence (PDC) and its generalized formulation (gPDC) are popular tools for investigating, in the frequency domain, the concept of Granger causality among multivariate (MV) time series. PDC and gPDC are formalized in terms of the coefficients of an MV autoregressive (MVAR) model which describes only the lagged effects among the time series and forsakes instantaneous effects. However, instantaneous effects are known to affect linear parametric modeling, and are likely to occur in experimental time series. In this study, we investigate the impact on the assessment of frequency domain causality of excluding instantaneous effects from the model underlying PDC evaluation. Moreover, we propose the utilization of an extended MVAR model including both instantaneous and lagged effects. This model is used to assess PDC either in accordance with the definition of Granger causality when considering only lagged effects (iPDC), or with an extended form of causality, when we consider both instantaneous and lagged effects (ePDC). The approach is first evaluated on three theoretical examples of MVAR processes, which show that the presence of instantaneous correlations may produce misleading profiles of PDC and gPDC, while ePDC and iPDC derived from the extended model provide here a correct interpretation of extended and lagged causality. It is then applied to representative examples of cardiorespiratory and EEG MV time series. These applications suggest that ePDC and iPDC are more readily interpretable than PDC and gPDC in terms of the known cardiovascular and neural physiologies.
