Similar Literature
20 similar articles found (search time: 31 ms)
1.
To facilitate analysis and understanding of biological systems, large-scale data are often integrated into models using a variety of mathematical and computational approaches. Such models describe the dynamics of the biological system and can be used to study the changes in the state of the system over time. For many model classes, such as discrete or continuous dynamical systems, there exist appropriate frameworks and tools for analyzing system dynamics. However, the heterogeneous information that encodes and bridges molecular and cellular dynamics, inherent to fine-grained molecular simulation models, presents significant challenges to the study of system dynamics. In this paper, we present an approach based on algorithmic information theory for the analysis and interpretation of the dynamics of such executable models of biological systems. We apply a normalized compression distance (NCD) analysis to the state representations of a model that simulates immune decision making and immune cell behavior. We show that this analysis successfully captures the essential information in the dynamics of the system, which results from a variety of events including proliferation, differentiation, and perturbations such as gene knock-outs. We demonstrate that this approach can be used for the analysis of executable models, regardless of the modeling framework, and for making experimentally quantifiable predictions.
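The abstract's key quantity has a simple closed form: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the length of a string after compression by a real-world compressor. Below is a minimal sketch using bzip2 as the compressor; the serialized model states are hypothetical stand-ins, since the paper's exact state encoding is not given here.

```python
import bz2, os

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = len(bz2.compress(x)), len(bz2.compress(y))
    cxy = len(bz2.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical serialized model states: two similar immune-cell states
# and one unrelated byte string.
s1 = b"TCELL:ACTIVE;CYTOKINE:IL2;" * 50
s2 = b"TCELL:ACTIVE;CYTOKINE:IL2;" * 49 + b"TCELL:ANERGIC;CYTOKINE:NONE;"
s3 = os.urandom(len(s1))
print(ncd(s1, s2))  # near 0: dynamics barely diverged
print(ncd(s1, s3))  # near 1: unrelated states
```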

2.
Comparative analyses of cellular interaction networks enable understanding of the cell's modular organization through identification of functional modules and complexes. These techniques often rely on topological features such as connectedness and density, based on the premise that functionally related proteins are likely to interact densely and that these interactions follow similar evolutionary trajectories. Significant recent work has focused on efficient algorithms for identification of such functional modules and their conservation. In spite of algorithmic advances, development of a comprehensive infrastructure for interaction databases is in relative infancy compared to corresponding sequence analysis tools. One critical, and as yet unresolved, aspect of this infrastructure is a measure of the statistical significance of a match or of a dense subcomponent. In the absence of analytical measures, conventional methods rely on computationally expensive simulations based on ad hoc models for quantifying significance. In this paper, we present techniques for analytically quantifying the statistical significance of dense components in reference model graphs. We consider two reference models: a G(n, p) model in which each pair of nodes in a graph has an identical likelihood, p, of sharing an edge, and a two-level G(n, p) model, which accounts for the high-degree hub nodes generally observed in interaction networks. Experiments performed on a rich collection of protein-protein interaction (PPI) networks show that the proposed model provides a reliable means of evaluating the statistical significance of dense patterns in these networks. We also adapt existing state-of-the-art network clustering algorithms by using our statistical significance measure as an optimization criterion. Comparison of the resulting module identification algorithm, SIDES, with existing methods shows that SIDES outperforms existing algorithms in terms of sensitivity and specificity of identified clusters with respect to available GO annotations.
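Under the first null model, the edge count among any fixed set of k nodes is Binomial(k(k-1)/2, p), so the tail probability of a dense subcomponent can be computed analytically. The sketch below illustrates that idea with a simple union bound over candidate node sets; it is an illustration of the general approach, not the paper's exact derivation.

```python
from math import comb, log

def edge_tail_logp(k: int, m: int, p: float) -> float:
    """log P[Bin(k*(k-1)/2, p) >= m]: the chance that a fixed set of k
    nodes spans at least m edges in G(n, p)."""
    n_pairs = k * (k - 1) // 2
    tail = sum(comb(n_pairs, i) * p**i * (1 - p)**(n_pairs - i)
               for i in range(m, n_pairs + 1))
    return log(tail) if tail > 0 else float("-inf")

def dense_subgraph_logp(n: int, k: int, m: int, p: float) -> float:
    """Union bound over all C(n, k) candidate node sets: if this is very
    negative, no k-node, m-edge subgraph is expected by chance."""
    return log(comb(n, k)) + edge_tail_logp(k, m, p)

# A 10-node subgraph with 30 internal edges inside a sparse 1000-node network:
n, avg_deg = 1000, 8
p = avg_deg / (n - 1)
print(dense_subgraph_logp(n, 10, 30, p))  # strongly negative => significant
```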

3.
Finding statistically significant communities in networks
Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a great need for multi-purpose techniques able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks while accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with the tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure for partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method performs comparably to the best existing algorithms on artificial benchmark graphs. Several applications to real networks are shown as well. OSLOM is implemented in freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks.
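OSLOM's fitness function scores how unlikely a node's links into a cluster are under a random null model, then aggregates such scores with order statistics. The toy below shows only the single-node ingredient, with a binomial null standing in (as a simplifying assumption) for OSLOM's exact configuration-model calculation.

```python
from math import comb

def node_community_pvalue(d: int, k: int, comm_degree: int, total_degree: int) -> float:
    """P[a node of degree d has >= k neighbours inside the community] under a
    null where each of its d stubs lands in the community with probability
    comm_degree / total_degree (a simplified configuration-model surrogate)."""
    p = comm_degree / total_degree
    return sum(comb(d, i) * p**i * (1 - p)**(d - i) for i in range(k, d + 1))

# A node of degree 12 with 7 neighbours inside a community that holds
# 200 of the network's 10000 edge endpoints:
print(node_community_pvalue(12, 7, 200, 10000))  # tiny: unlikely by chance
```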

4.
Bayesian statistical methods based on simulation techniques have recently been shown to provide powerful tools for the analysis of genetic population structure. We have previously developed a Markov chain Monte Carlo (MCMC) algorithm for characterizing genetically divergent groups based on molecular markers and the geographical sampling design of the dataset. However, for large-scale datasets such algorithms may get stuck in local maxima in the parameter space. Therefore, we have modified our earlier algorithm to support multiple parallel MCMC chains, with enhanced features that enable considerably faster and more reliable estimation than the earlier version of the algorithm. We also consider a hierarchical tree representation, from which a Bayesian model-averaged structure estimate can be extracted. The algorithm is implemented in a computer program that features a user-friendly interface and built-in graphics. The enhanced features are illustrated by analyses of simulated data and an extensive human molecular dataset. AVAILABILITY: Freely available at http://www.rni.helsinki.fi/~jic/bapspage.html.
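The key enhancement is running multiple parallel chains from dispersed starting points, so that disagreement between chains exposes entrapment in a local mode. BAPS itself fits a full genetic mixture model; the one-parameter toy below (a hypothetical allele-frequency posterior) illustrates only the multiple-chain strategy.

```python
import math, random

def log_post(theta, k, n):
    """Log posterior of an allele frequency theta with a uniform prior
    and k successes in n binomial draws."""
    if not 0 < theta < 1:
        return float("-inf")
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

def metropolis_chain(k, n, start, steps=5000, step_size=0.05, seed=0):
    """Random-walk Metropolis sampler for the toy posterior above."""
    rng = random.Random(seed)
    theta, samples = start, []
    for _ in range(steps):
        prop = theta + rng.gauss(0, step_size)
        if math.log(rng.random()) < log_post(prop, k, n) - log_post(theta, k, n):
            theta = prop
        samples.append(theta)
    return samples

# Several chains from dispersed starting points guard against a single
# chain stalling in a local mode (the problem the upgrade targets).
chains = [metropolis_chain(k=37, n=100, start=s, seed=i)
          for i, s in enumerate((0.1, 0.5, 0.9))]
for c in chains:
    print(sum(c[1000:]) / len(c[1000:]))  # post-burn-in means should agree
```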

5.
MOTIVATION: Metabolic flux analysis of biochemical reaction networks using isotope tracers requires software tools that can analyze the dynamics of isotopic isomer (isotopomer) accumulation in metabolites and reveal the underlying kinetic mechanisms of metabolic regulation. Since existing tools are restricted to the isotopic steady state and remain disconnected from the underlying kinetic mechanisms, we have recently developed a novel approach for the analysis of tracer-based metabolomic data that meets these requirements. The present contribution describes the last step of this development: implementation of (i) the algorithms for the determination of the kinetic parameters and respective metabolic fluxes consistent with the experimental data and (ii) statistical analysis of both fluxes and parameters, thereby making the approach practically applicable. RESULTS: The C++ applications package for dynamic isotopomer distribution data analysis was supplemented by (i) five distinct methods for resolving a large system of differential equations; (ii) a 'simulated annealing' algorithm adapted to estimate the set of parameters and metabolic fluxes corresponding to the global minimum of the difference between the computed and measured isotopomer distributions; and (iii) algorithms for statistical analysis of the estimated parameters and fluxes, using covariance matrix evaluation as well as Monte Carlo simulations. An example of using this tool for the analysis of 13C distribution in the metabolites of glucose degradation pathways demonstrated the evaluation of an optimal set of parameters and fluxes consistent with the experimental pattern, their ranges and statistical significance, and the advantages of using the dynamic rather than the usual steady-state method of analysis. AVAILABILITY: The software is available free from http://www.bq.ub.es/bioqint/selivanov.htm
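Simulated annealing accepts uphill moves with probability exp(-delta/T) under a decreasing temperature T, which is what lets the fit escape local minima of the discrepancy between computed and measured isotopomer distributions. A generic sketch follows; the quadratic toy objective is a hypothetical stand-in for the real isotopomer-distribution mismatch.

```python
import math, random

def simulated_annealing(objective, x0, step=0.1, t0=1.0, cooling=0.995,
                        iters=20000, seed=0):
    """Minimise `objective` over a parameter vector; uphill moves are
    accepted with probability exp(-delta / T) so the search can escape
    local minima while T decays geometrically."""
    rng = random.Random(seed)
    x, fx = list(x0), objective(x0)
    best, fbest, t = list(x), fx, t0
    for _ in range(iters):
        cand = [xi + rng.gauss(0, step) for xi in x]
        fc = objective(cand)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx
        t *= cooling
    return best, fbest

# Hypothetical stand-in for the isotopomer fit: recover two "fluxes" from
# target values (the real objective compares simulated and measured
# 13C isotopomer distributions).
target = (2.5, 0.8)
obj = lambda v: (v[0] - target[0])**2 + (v[1] - target[1])**2
print(simulated_annealing(obj, [0.0, 0.0]))
```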

6.
7.
We present a new method for inferring hidden Markov models from noisy time sequences without the need to assume a model architecture, thus allowing for the detection of degenerate states. It is based on the statistical prediction techniques developed by Crutchfield et al. and generates so-called causal-state models, equivalent in structure to hidden Markov models. The new method is applicable to any continuous data which cluster around discrete values and exhibit multiple transitions between these values, such as tethered-particle-motion data or Fluorescence Resonance Energy Transfer (FRET) spectra. The algorithms developed have been shown to perform well on simulated data, demonstrating the ability to recover the model used to generate the data under high-noise, sparse-data conditions, and the ability to infer the existence of degenerate states. They have also been applied to new experimental FRET data of Holliday junction dynamics, extracting the expected two-state model and providing values for the transition rates in good agreement both with previous results and with results obtained using existing maximum-likelihood-based methods. The method differs markedly from previous Markov-model reconstructions in being able to uncover truly hidden states.
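The causal-state machinery goes further than this (it merges histories with identical predictive distributions), but the first steps on FRET-like data, assigning noisy samples to discrete levels and estimating empirical transition probabilities, can be sketched directly; the trace below is simulated under assumed two-state dynamics.

```python
import random

def discretize(trace, levels):
    """Assign each noisy sample to the nearest discrete level (cluster centre)."""
    return [min(range(len(levels)), key=lambda i: abs(x - levels[i]))
            for x in trace]

def transition_matrix(states, n_states):
    """Empirical transition probabilities between discretized states."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return [[c / max(sum(row), 1) for c in row] for row in counts]

# Simulated two-state FRET-like trace: levels 0.2 / 0.8, 5% switching chance.
rng = random.Random(1)
levels, state, trace = (0.2, 0.8), 0, []
for _ in range(5000):
    if rng.random() < 0.05:
        state = 1 - state
    trace.append(rng.gauss(levels[state], 0.08))

states = discretize(trace, levels)
print(transition_matrix(states, 2))  # off-diagonals recover the ~0.05 rate
```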

8.
Despite the importance of polyploidy and the increasing availability of new genomic data, there remain important gaps in our knowledge of polyploid population genetics. These gaps arise from the complex nature of polyploid data (e.g. multiple alleles and loci, mixed inheritance patterns, and associations between ploidy and mating-system variation). Furthermore, many of the standard tools of population genetics that have been developed for diploids are often not feasible for polyploids. This review aims to provide an overview of the state-of-the-art in polyploid population genetics and to identify the main areas where further development of molecular techniques and statistical theory is required. We review commonly used molecular tools (amplified fragment length polymorphism, microsatellites, Sanger sequencing, next-generation sequencing and derived technologies) and the challenges associated with their use in polyploid populations: allele dosage determination, null alleles, difficulty distinguishing orthologues from paralogues, and copy number variation. In addition, we review the approaches that have been used for population genetic analysis in polyploids and their specific problems. These problems are in most cases directly associated with dosage uncertainty, the problem of inferring allele frequencies, and assumptions regarding inheritance. This leads us to conclude that, to advance the field of polyploid population genetics, priority should be given to the development of new molecular approaches that allow efficient dosage determination, and to further development of analytical approaches that circumvent dosage uncertainty and accommodate 'flexible' modes of inheritance. In addition, there is a need for more simulation-based studies that test what kinds of biases could result from both existing and novel approaches.

9.
Over the past decade, technological advances in experimental and animal tracking techniques have motivated a renewed theoretical interest in animal collective motion and, in particular, locust swarming. This review offers a comprehensive biological background followed by a comparative analysis of recent models of locust collective motion, in particular locust marching, their settings, and underlying assumptions. We describe a wide range of recent modeling and simulation approaches, from discrete agent-based models of self-propelled particles to continuous models of integro-differential equations, aimed at describing and analyzing the fascinating phenomenon of locust collective motion. These modeling efforts play a dual role. The first views locusts as a quintessential example of animal collective motion; such models aim at abstraction and coarse-graining, often utilizing the tools of statistical physics. The second, which originates from a more biological perspective, views locust swarming as a scientific problem of exceptional merit in its own right; here the main goal is the analysis and prediction of natural swarm dynamics. We discuss the properties of swarm dynamics using the tools of statistical physics, as well as the implications for laboratory experiments and natural swarms. Finally, we stress the importance of a combined interdisciplinary, biological-theoretical effort in successfully confronting the challenges that locusts pose at both the theoretical and practical levels.
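As a flavor of the discrete agent-based end of this modeling spectrum, here is a minimal one-dimensional self-propelled-particle sketch in the spirit of Czirók-type marching models (a generic toy, not any specific model from the review): each agent adopts the average heading of its neighbors plus noise, and an order parameter tracks the onset of collective marching.

```python
import random

def step(pos, vel, box=10.0, radius=0.5, noise=0.3, speed=0.05, rng=random):
    """One update of a 1-D self-propelled-particle model on a periodic ring:
    each agent takes the sign of its neighbours' mean heading, plus noise."""
    new_vel = []
    for x in pos:
        nbrs = [vel[j] for j, y in enumerate(pos)
                if min(abs(x - y), box - abs(x - y)) < radius]  # includes self
        avg = sum(nbrs) / len(nbrs)
        new_vel.append((1 if avg >= 0 else -1) + rng.uniform(-noise, noise))
    new_pos = [(x + speed * v) % box for x, v in zip(pos, new_vel)]
    return new_pos, new_vel

rng = random.Random(0)
pos = [rng.uniform(0, 10) for _ in range(100)]
vel = [rng.choice((-1.0, 1.0)) for _ in range(100)]
for _ in range(500):
    pos, vel = step(pos, vel, rng=rng)
# Order parameter near 1 signals coherent marching; near 0, disorder.
print(abs(sum(v / abs(v) for v in vel)) / len(vel))
```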

10.

Background

The use of novel algorithmic techniques is pivotal to many important problems in the life sciences. For example, the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use.

Results

To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use with two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is both efficient and versatile.
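SeqAn itself is a C++ library, so the sketch below does not use its API; it merely mirrors, in Python, the kind of head-to-head comparison of exact string matching algorithms described in the first example (naive scanning versus Boyer-Moore-Horspool on a simulated DNA text).

```python
import random, time

def naive_find(text, pat):
    """Check every window: O(n*m) worst case."""
    n, m = len(text), len(pat)
    return [i for i in range(n - m + 1) if text[i:i + m] == pat]

def horspool_find(text, pat):
    """Boyer-Moore-Horspool: skip ahead using a last-character shift table."""
    n, m = len(text), len(pat)
    shift = {c: m - j - 1 for j, c in enumerate(pat[:-1])}
    hits, i = [], 0
    while i <= n - m:
        if text[i:i + m] == pat:
            hits.append(i)
        i += shift.get(text[i + m - 1], m)
    return hits

rng = random.Random(0)
text = "".join(rng.choice("ACGT") for _ in range(200_000))
pat = "ACGTACGTGG"
for fn in (naive_find, horspool_find):
    t0 = time.perf_counter()
    hits = fn(text, pat)
    print(fn.__name__, len(hits), f"{time.perf_counter() - t0:.4f}s")
```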

Conclusion

We anticipate that SeqAn will greatly simplify the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components that are fundamental to the field of sequence analysis. This not only eases the implementation of new algorithms, but also enables sound analysis and comparison of existing ones.

11.
Developments in the design of small peptides that mimic proteins in complexity, recent advances in nanosecond time-resolved spectroscopy methods for studying peptides, and the development of modern, highly parallel simulation algorithms have come together to give us a detailed picture of peptide folding dynamics. Two newly implemented simulation techniques, parallel replica dynamics and replica exchange molecular dynamics, can now describe directly from simulation the kinetics and thermodynamics of peptide formation, respectively. Given these developments, the simulation community now has the tools to verify and validate simulation protocols and models (forcefields).
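In replica exchange molecular dynamics, configurations at neighboring temperatures are swapped with the Metropolis probability min(1, exp[(beta_i - beta_j)(E_i - E_j)]), letting barrier-crossing events sampled at high temperature propagate down the ladder. A sketch of just the exchange step, with hypothetical energies and temperatures:

```python
import math, random

def swap_probability(beta_i, beta_j, e_i, e_j):
    """Metropolis acceptance for exchanging the configurations of two
    replicas at inverse temperatures beta_i and beta_j."""
    x = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if x >= 0 else math.exp(x)

def attempt_swaps(betas, energies, rng=random):
    """Try exchanges between neighbouring rungs of the temperature ladder."""
    for k in range(len(betas) - 1):
        if rng.random() < swap_probability(betas[k], betas[k + 1],
                                           energies[k], energies[k + 1]):
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return energies

# Hypothetical four-replica ladder (1/kT in arbitrary units): swaps let
# configurations that crossed barriers while hot reach the cold replica.
betas = [1.0 / t for t in (300.0, 350.0, 410.0, 480.0)]
energies = [-120.0, -95.0, -80.0, -60.0]
print(attempt_swaps(betas, energies))
```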

12.
This article describes the application of a change-point algorithm to the analysis of stochastic signals in biological systems whose underlying state dynamics consist of transitions between discrete states. Applications of this analysis include molecular-motor stepping, fluorophore bleaching, electrophysiology, particle and cell tracking, detection of copy-number variation by sequencing, and tethered-particle motion. We present a unified approach to the analysis of processes whose noise can be modeled by Gaussian, Wiener, or Ornstein-Uhlenbeck processes. To fit the model, we exploit explicit, closed-form algebraic expressions for maximum-likelihood estimators of model parameters and the estimated information loss of the generalized noise model, which can be computed extremely efficiently. We implement change-point detection using the frequentist information criterion (to our knowledge, a new information criterion). The frequentist information criterion specifies a single, information-based statistical test that is free from ad hoc parameters and requires no prior probability distribution. We demonstrate this information-based approach in the analysis of simulated and experimental tethered-particle-motion data.
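For the Gaussian case, the maximum-likelihood location of a single mean shift can be found by scanning all split points, which conveys the flavor of the approach; the paper's frequentist information criterion additionally decides how many change points to accept, which this sketch omits.

```python
import math, random

def split_log_likelihood(x, k):
    """Maximized Gaussian log-likelihood of x with a mean shift after index k
    (shared variance, all parameters estimated by maximum likelihood)."""
    n = len(x)
    m1, m2 = sum(x[:k]) / k, sum(x[k:]) / (n - k)
    rss = (sum((v - m1) ** 2 for v in x[:k]) +
           sum((v - m2) ** 2 for v in x[k:]))
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)

def best_change_point(x):
    """Scan all admissible split points for the maximum-likelihood location."""
    return max(range(2, len(x) - 1), key=lambda k: split_log_likelihood(x, k))

rng = random.Random(3)
trace = ([rng.gauss(0.0, 1.0) for _ in range(150)] +
         [rng.gauss(1.5, 1.0) for _ in range(150)])
print(best_change_point(trace))  # expected near the true change at 150
```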

13.
Understanding the basis for intracellular motion is critical as the field moves toward a deeper understanding of the relation between Brownian forces, molecular crowding, and anisotropic (or isotropic) energetic forcing. Effective forces and other parameters used to summarize molecular motion change over time in live cells due to latent state changes, e.g., changes induced by dynamic micro-environments, photobleaching, and other heterogeneity inherent in biological processes. This study discusses limitations of currently popular analysis methods (e.g., mean-square-displacement-based analyses) and shows how new techniques can be used to systematically analyze Single Particle Tracking (SPT) data experiencing abrupt state changes in time or space. The approach is to track GFP-tagged chromatids at metaphase in live yeast cells and quantitatively probe the effective forces resulting from dynamic interactions that reflect the sum of a number of physical phenomena. State changes can be induced by various sources, including microtubule dynamics exerting force through the centromere, thermal polymer fluctuations, and DNA-based molecular machines such as polymerases and protein-exchange complexes (e.g., chaperones and chromatin-remodeling complexes). Simulations aiming to show the relevance of the approach to more general SPT data analyses are also studied. Refined force estimates are obtained by adopting and modifying a nonparametric Bayesian modeling technique, the Hierarchical Dirichlet Process Switching Linear Dynamical System (HDP-SLDS), for SPT applications. The HDP-SLDS method shows promise in systematically identifying dynamical regime changes induced by unobserved state changes when the number of underlying states is not known in advance (a common situation in SPT applications). We expand on the relevance of the HDP-SLDS approach, review the relevant background on Hierarchical Dirichlet Processes, show how to map discrete-time HDP-SLDS models to classic SPT models, and discuss limitations of the approach. In addition, we demonstrate new computational techniques for tuning hyperparameters and for checking the statistical consistency of model assumptions directly against individual experimental trajectories; these techniques circumvent the need for “ground truth” and/or subjective information.
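To make the critique concrete, here is a plain time-averaged MSD computed over a trajectory that switches latent states halfway through (a hypothetical example); the single pooled curve gives no hint of where, or whether, the switch occurred.

```python
import random

def time_averaged_msd(traj, max_lag):
    """Time-averaged mean square displacement of a 2-D trajectory. A single
    MSD curve pools all time points, so it blurs segments governed by
    different latent states (the failure mode the paper highlights)."""
    msd = []
    for lag in range(1, max_lag + 1):
        disp = [(traj[i + lag][0] - traj[i][0]) ** 2 +
                (traj[i + lag][1] - traj[i][1]) ** 2
                for i in range(len(traj) - lag)]
        msd.append(sum(disp) / len(disp))
    return msd

# Hypothetical trajectory with a latent state change halfway through:
rng = random.Random(7)
traj, x, y = [], 0.0, 0.0
for t in range(1000):
    sigma = 0.10 if t < 500 else 0.02  # free diffusion, then confinement
    x += rng.gauss(0, sigma)
    y += rng.gauss(0, sigma)
    traj.append((x, y))
print(time_averaged_msd(traj, 5))  # one pooled curve hides the switch
```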

14.
15.
Understanding how plant community dynamics are impacted by altered disturbance regimes is a pressing challenge for restoration ecology. Most assessments of community dynamics involve computationally intensive statistical techniques, while management often defers to derived, qualitative “state-and-transition” models. Here, we demonstrate an intermediate approach to tracking and predicting community resilience, diversifying the tools available to assess ecosystem change. First, we develop indices of sagebrush-steppe community structure in permanent monitoring plots based on plant functional types and our conceptual understanding of the ecosystem. The indices define a bivariate space within which the trajectories of permanent monitoring plots can be tracked. Second, we quantify two metrics of community resilience: resistance (overall change during the time period) and stability (average amount of movement per monitoring period). Plots dominated by obligate-seeder shrubs displayed low resilience relative to those dominated by grasses and forbs or by resprouting shrubs. Resilience was strongly related to initial plant-functional-type composition and elevation. Our results suggest restoration objectives should consider how plant traits control ecosystem responses to disturbance. We suggest that the approach developed here can help assess longer-term resilience, evaluate restoration success, and identify communities at risk of state transitions.
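The two metrics have direct trajectory-based definitions: resistance is the overall change in index space over the study period (less change = more resistant), and stability is the average movement per monitoring interval. A sketch with hypothetical plot trajectories, since the construction of the functional-type indices themselves is not reproduced here:

```python
from math import dist

def resilience_metrics(traj):
    """Resistance: overall displacement between first and last survey
    (smaller = more resistant). Stability: mean displacement per monitoring
    interval. `traj` is a list of (index1, index2) points in the bivariate
    community-structure space."""
    resistance = dist(traj[0], traj[-1])
    steps = [dist(a, b) for a, b in zip(traj, traj[1:])]
    stability = sum(steps) / len(steps)
    return resistance, stability

# Hypothetical plot trajectories in the bivariate index space:
grass_plot = [(0.80, 0.30), (0.78, 0.33), (0.81, 0.31), (0.79, 0.30)]
shrub_plot = [(0.20, 0.90), (0.35, 0.70), (0.50, 0.45), (0.60, 0.30)]
print(resilience_metrics(grass_plot))  # small values: resilient
print(resilience_metrics(shrub_plot))  # large values: possible state shift
```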

16.
Colocalization of differently labeled biomolecules is a valuable tool in fluorescence microscopy and can provide information on biomolecular interactions. With the advent of super-resolution microscopy, colocalization analysis is getting closer to molecular resolution, bridging the gap to other technologies such as fluorescence resonance energy transfer. Among these novel microscopic techniques, single-molecule localization-based super-resolution methods offer the advantage of providing single-molecule coordinates that can be used for colocalization analysis in place of intensity information. This requires adapting the existing mathematical algorithms to localization microscopy data. Here, we introduce an algorithm for coordinate-based colocalization analysis suited to single-molecule super-resolution data. In addition, we present an experimental configuration for simultaneous dual-color imaging, together with a robust approach to correcting for optical aberrations with an accuracy of a few nanometers. We demonstrate the potential of our approach for cellular structures and for two proteins binding actin filaments.
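A per-molecule colocalization score can be built by comparing how the neighbor counts from each channel grow with radius around each localization. The sketch below is loosely modeled on that idea and omits refinements (such as density normalization and distance weighting) that the published algorithm includes; coordinates and radii are hypothetical.

```python
from math import dist

def cbc_scores(chan_a, chan_b, r_max=100.0, n_rings=5):
    """Per-localization score: correlate how neighbour counts in channels
    A and B grow with radius around each molecule in A (coordinates in nm)."""
    radii = [r_max * (i + 1) / n_rings for i in range(n_rings)]
    scores = []
    for p in chan_a:
        na = [sum(dist(p, q) <= r for q in chan_a) - 1 for r in radii]  # minus self
        nb = [sum(dist(p, q) <= r for q in chan_b) for r in radii]
        ma, mb = sum(na) / n_rings, sum(nb) / n_rings
        cov = sum((a - ma) * (b - mb) for a, b in zip(na, nb))
        va = sum((a - ma) ** 2 for a in na) ** 0.5
        vb = sum((b - mb) ** 2 for b in nb) ** 0.5
        scores.append(cov / (va * vb) if va and vb else 0.0)
    return scores

# A second channel that is a slightly shifted copy of the first: high scores.
a = [(10.0 * i, 0.0) for i in range(20)]
b = [(10.0 * i + 3.0, 2.0) for i in range(20)]
print(sum(cbc_scores(a, b)) / len(a))  # near +1 for colocalized channels
```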

17.
Intron-containing and intronless genes have different biological properties and statistical characteristics. Here we propose a new computational method to distinguish between intron-containing and intronless gene sequences. Seven feature parameters based on detrended fluctuation analysis (DFA) are used, so that a 7-dimensional feature vector can be computed for any gene sequence to be discriminated. A support vector machine (SVM) classifier with a Gaussian radial basis kernel function is then applied in this feature space to classify genes as intron-containing or intronless. We investigate the performance of the proposed method in comparison with other state-of-the-art algorithms on biological datasets. The experimental results show that our new method significantly improves accuracy over existing techniques.
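Although the seven specific DFA-derived parameters are not listed here, the core DFA computation, the scaling exponent of detrended fluctuations of the integrated sequence signal, can be sketched; the purine/pyrimidine encoding below is a hypothetical choice of numeric mapping.

```python
import numpy as np

def dfa_exponent(signal, scales=(4, 8, 16, 32, 64)):
    """DFA scaling exponent: slope of log F(s) versus log s, where F(s) is
    the RMS of the linearly detrended integrated profile in windows of size s."""
    profile = np.cumsum(np.asarray(signal, dtype=float) - np.mean(signal))
    flucts = []
    for s in scales:
        rms = []
        for w in range(len(profile) // s):
            seg = profile[w * s:(w + 1) * s]
            t = np.arange(s)
            trend = np.polyval(np.polyfit(t, seg, 1), t)
            rms.append(np.sqrt(np.mean((seg - trend) ** 2)))
        flucts.append(np.mean(rms))
    slope, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return slope

# Hypothetical purine/pyrimidine encoding of a gene sequence; several such
# DFA-derived features per gene would then feed a Gaussian-RBF SVM,
# e.g. sklearn.svm.SVC(kernel="rbf").
seq = "ATGGCGTACGTTAGC" * 40
signal = [1.0 if c in "AG" else -1.0 for c in seq]
print(dfa_exponent(signal))
```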

18.
19.
There has been a huge effort to advance analytical techniques for molecular biological data over the past decade. This has led to many novel algorithms specialized for data associated with biological phenomena, such as gene expression and protein interactions. In contrast, ecological data analysis has remained focused to some degree on off-the-shelf statistical techniques, though this is starting to change with the adoption of state-of-the-art methods where few assumptions can be made about the data and a more explorative approach is required, for example through the use of Bayesian networks. In this paper, some novel bioinformatics tools for microarray data are discussed along with their 'crossover potential', with an application to fisheries data. In particular, we focus on the development of models that identify functionally equivalent species in different fish communities, with the aim of predicting functional collapse.

20.
Sequence elements at all levels (DNA, RNA and protein) play a central role in mediating molecular recognition, and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching for sequence motifs. State-of-the-art motif searching tools are limited in their coverage and in their ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching for variable-gap motifs (two half-sites with a flexible-length gap in between) and searching for long motifs over large alphabets. We used our approach to analyze several high-throughput measurement datasets and report validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, where we observe gaps other than three nucleotides that also serve as significant recognition sites, as well as a variable-length motif related to potential tyrosine phosphorylation.
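The abstract does not spell out the enrichment statistic, so as one standard way to score a motif against a ranked list, here is a minimum-hypergeometric-style scan; a real implementation would correct the minimum over cutoffs for multiple testing, and the suffix tree (which enumerates candidate motifs efficiently over large spaces) is not shown.

```python
from math import comb

def hypergeom_tail(N, K, n, k):
    """P[X >= k] for X ~ Hypergeometric(N, K, n): k or more motif-bearing
    sequences among the top n of a list of N containing K in total."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

def min_hypergeometric(occurrences):
    """Scan all prefixes of a ranked 0/1 occurrence vector (top rank first)
    and return the smallest tail p-value and the cutoff achieving it."""
    N, K = len(occurrences), sum(occurrences)
    best_p, best_n, k = 1.0, 0, 0
    for n in range(1, N):
        k += occurrences[n - 1]
        p = hypergeom_tail(N, K, n, k)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n

# A motif concentrated at the top of the ranking scores a small p-value:
occ = [1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(min_hypergeometric(occ))
```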

