Similar literature
20 similar articles found (search time: 875 ms)
1.
The most commonly used models for analysing local dependencies in DNA sequences are (high-order) Markov chains. Incorporating knowledge about possible groupings of the nucleotides makes it possible to define dedicated sub-classes of Markov chains. The problem of formulating lumpability hypotheses for a Markov chain is therefore addressed. In the classical approach to lumpability, this problem can be formulated as the determination of an appropriate state space (smaller than the original state space) such that the lumped chain defined on this state space retains the Markov property. We propose a different perspective on lumpability where the state space is fixed and the partitioning of this state space is represented by a one-to-many probabilistic function within a two-level stochastic process. Three nested classes of lumped processes can be defined in this way as sub-classes of first-order Markov chains. These lumped processes enable parsimonious reparameterizations of Markov chains that help to reveal relevant partitions of the state space. Characterizations of the lumped processes on the original transition probability matrix are derived. Different model selection methods relying either on hypothesis testing or on penalized log-likelihood criteria are presented, as well as extensions to lumped processes constructed from high-order Markov chains. The relevance of the proposed approach to lumpability is illustrated by the analysis of DNA sequences. In particular, the use of lumped processes makes it possible to highlight differences between intronic sequences and gene untranslated region sequences.
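
The classical notion of lumpability mentioned in this abstract can be illustrated with a minimal sketch. It checks the standard strong-lumpability condition (every state in a block must have the same total probability of jumping into each block), not the two-level process the authors propose. The function name `is_strongly_lumpable`, the transition matrix, and the purine/pyrimidine grouping are assumptions chosen for illustration only.

```python
from math import isclose


def is_strongly_lumpable(P, partition, tol=1e-9):
    """Check the classical strong-lumpability condition.

    P is a row-stochastic transition matrix (list of lists) and
    `partition` is a list of blocks, each a list of state indices.
    """
    for block in partition:
        for target in partition:
            # Probability of moving from each state of `block` into `target`.
            sums = [sum(P[i][j] for j in target) for i in block]
            # All states of the block must agree on this probability.
            if any(not isclose(s, sums[0], abs_tol=tol) for s in sums):
                return False
    return True


# Hypothetical transition matrix over the nucleotides A, C, G, T (indices 0..3).
P = [
    [0.4, 0.1, 0.3, 0.2],  # from A
    [0.2, 0.5, 0.1, 0.2],  # from C
    [0.5, 0.2, 0.2, 0.1],  # from G
    [0.1, 0.3, 0.2, 0.4],  # from T
]

purine_pyrimidine = [[0, 2], [1, 3]]  # {A, G} versus {C, T}
other_grouping = [[0, 1], [2, 3]]     # {A, C} versus {G, T}

print(is_strongly_lumpable(P, purine_pyrimidine))
print(is_strongly_lumpable(P, other_grouping))
```

With this made-up matrix, the purine/pyrimidine grouping satisfies the condition and the other grouping does not, which is the kind of distinction a lumpability hypothesis is meant to capture.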

2.
Baigent S. Bio Systems, 2003, 68(2-3): 213-222
We study the steady state and dynamical properties of a pair of cells coupled by a voltage-dependent gap junction. The cells have linear membrane properties, and the gap junction is modelled using a simple Markov chain with a voltage-dependent transition matrix. We first show that the voltage-independent case is globally convergent using energy dissipation as a Lyapunov function for the cells, and standard results on the convergence of homogeneous Markov chains for the junction. For the voltage-dependent case, we use the difference in cell and gap junction time scales to reduce the coupled equations for cells and the gap junction to a single equation for the gap junction, but with a transition matrix that depends upon the current gap junction state. We identify cooperativity as the key property behind the global convergence of Markov chains and investigate convergence of the voltage-dependent system by establishing some conditions under which cooperativity is preserved.

3.
Coalescent process with fluctuating population size and its effective size   Total citations: 3 (self-citations: 0, citations by others: 3)
We consider a Wright-Fisher model whose population size is a finite Markov chain. We introduce a sequence of two-dimensional discrete time Markov chains whose components describe the coalescent process and the fluctuation of population size. For the limiting process of the sequence of Markov chains, the relationship of the expectation of coalescence time to the harmonic and the arithmetic means of population sizes is shown, and the Laplace transform of the distribution of coalescence time is calculated. We define the coalescence effective population size (cEPS) by the expectation of coalescence time. We show that cEPS is strictly larger (resp. smaller) than the harmonic (resp. arithmetic) mean. As the population size fluctuates more quickly (resp. slowly), cEPS is closer to the harmonic (resp. arithmetic) mean. For the case of a two-valued Markov chain, we give the explicit expression of cEPS and its dependency on the sample size.
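
As a small numerical illustration of the two bounds named in this abstract, the sketch below computes the harmonic and arithmetic means of a list of population sizes. It does not compute cEPS itself, which depends on the fluctuation dynamics; the function name `effective_size_bounds` and the sizes are assumptions made for illustration.

```python
from statistics import harmonic_mean, mean


def effective_size_bounds(sizes):
    """Return (harmonic mean, arithmetic mean) of the population sizes.

    According to the abstract, cEPS lies strictly between these two
    values when the population size fluctuates.
    """
    return harmonic_mean(sizes), mean(sizes)


# Hypothetical population alternating between a small and a large size.
sizes = [100, 1000, 100, 1000]
lower, upper = effective_size_bounds(sizes)
print(f"harmonic mean = {lower:.1f}, arithmetic mean = {upper:.1f}")
```

The harmonic mean is dominated by the small population sizes, which is why fast fluctuations pull the effective size toward it.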

4.
The coalescent with recombination process was initially formulated backwards in time, but simulation algorithms and inference procedures often apply along sequences. It is therefore of major interest to approximate the coalescent with recombination process by a Markov chain along sequences. We consider the finite loci case and two or more sequences. We formulate a natural Markovian approximation for the tree building process along the sequences, and derive simple and analytically tractable formulae for the distribution of the tree at the next locus conditioned on the tree at the present locus. We compare our Markov approximation to other sequential Markov chains and discuss various applications.

5.
We find the general form of the proper Gaussian Markov chains of second order and give an example of them. Compared with Gaussian Markov processes, they can be used as an improved growth model.

6.
Biogeography is primarily concerned with the spatial distribution of biodiversity, including performing scenarios in a changing environment. The efforts deployed to develop species distribution models have resulted in predictive tools, but have mostly remained correlative and have largely ignored biotic interactions. Here we build upon the theory of island biogeography as a first approximation to the assembly dynamics of local communities embedded within a metacommunity context. We include all types of interactions and introduce environmental constraints on colonization and extinction dynamics. We develop a probabilistic framework based on Markov chains and derive probabilities for the realization of species assemblages, rather than single species occurrences. We consider the expected distribution of species richness under different types of ecological interactions. We also illustrate the potential of our framework by studying the interplay between different ecological requirements, interactions and the distribution of biodiversity along an environmental gradient. Our framework supports the idea that future research in biogeography requires a coherent integration of several ecological concepts into a single theory in order to achieve conceptual and methodological innovations, such as the switch from single‐species distribution to community distribution.

7.
Aggregated Markov processes related by similarity transformation are equivalent in that they cannot be distinguished by steady-state experiments. We derive an explicit formula for the set of all detailed-balance preserving similarity transformations between such continuous time Markov chains with N states. The matrices that define the allowed similarity transformations are found to be a simple non-linear function applied to almost any element of the special orthogonal group in N dimensions. Since a model is identifiable only if there are no similarity transformations to an equivalent model, we expect this result to prove useful in the theory of identification of aggregated Markov chains, an enterprise of growing importance as more and more single molecules yield to observation.

8.
Predicting connectivity, or how landscapes alter movement, is essential for understanding the scope for species persistence with environmental change. Although it is well known that movement is risky, connectivity modelling often conflates behavioural responses to the matrix through which animals disperse with mortality risk. We derive new connectivity models using random walk theory, based on the concept of spatial absorbing Markov chains. These models decompose the role of matrix on movement behaviour and mortality risk, can incorporate species distribution to predict the amount of flow, and provide both short‐ and long‐term analytical solutions for multiple connectivity metrics. We validate the framework using data on movement of an insect herbivore in 15 experimental landscapes. Our results demonstrate that disentangling the roles of movement behaviour and mortality risk is fundamental to accurately interpreting landscape connectivity, and that spatial absorbing Markov chains provide a generalisable and powerful framework with which to do so.
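
The absorbing Markov chain machinery behind this abstract can be sketched with a generic textbook calculation: given the transient-to-transient block Q of the transition matrix, the expected number of steps before absorption t solves (I - Q) t = 1. The sketch below is not the authors' connectivity model; the helper names `solve` and `expected_steps_to_absorption` and the one-dimensional random walk are assumptions for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b]; rows are copied so the inputs stay untouched.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def expected_steps_to_absorption(Q):
    """Expected number of steps before absorption from each transient state."""
    n = len(Q)
    i_minus_q = [
        [(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)]
        for i in range(n)
    ]
    return solve(i_minus_q, [1.0] * n)


# Hypothetical landscape: cells 0..4 in a row, with cells 0 and 4 absorbing
# and a symmetric random walk on the transient cells 1, 2, 3.
Q = [
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.0],
]
print(expected_steps_to_absorption(Q))
```

In a spatial model, lowering the row sums of Q for some cells would represent extra absorption there (for example mortality), which shortens the expected time before absorption.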

9.
10.
Probabilistic Boolean networks (PBNs) have recently been introduced as a promising class of models of genetic regulatory networks. The dynamic behaviour of PBNs can be analysed in the context of Markov chains. A key goal is the determination of the steady-state (long-run) behaviour of a PBN by analysing the corresponding Markov chain. This allows one to compute the long-term influence of a gene on another gene or determine the long-term joint probabilistic behaviour of a few selected genes. Because matrix-based methods quickly become prohibitive for large networks, we propose the use of Monte Carlo methods. However, the rate of convergence to the stationary distribution becomes a central issue. We discuss several approaches for determining the number of iterations necessary to achieve convergence of the Markov chain corresponding to a PBN. Using a recently introduced method based on the theory of two-state Markov chains, we illustrate the approach on a sub-network designed from human glioma gene expression data and determine the joint steady-state probabilities for several groups of genes.

11.
12.
We present the software library marathon, which is designed to support the analysis of sampling algorithms that are based on the Markov-Chain Monte Carlo principle. The main application of this library is the computation of properties of so-called state graphs, which represent the structure of Markov chains. We demonstrate applications and the usefulness of marathon by investigating the quality of several bounding methods on four well-known Markov chains for sampling perfect matchings and bipartite graphs. In a set of experiments, we compute the total mixing time and several of its bounds for a large number of input instances. We find that the upper bound gained by the famous canonical path method is often several orders of magnitude larger than the total mixing time and deteriorates with growing input size. In contrast, the spectral bound is found to be a precise approximation of the total mixing time.

13.
We present two algorithms to perform computations over Markov chains. The first one determines whether the sequence of powers of the transition matrix of a Markov chain converges or not to a limit matrix. If it does converge, the second algorithm enables us to estimate this limit. The combination of these algorithms allows the computation of a limit using DNA computing. In this sense, we have encoded the states and the transition probabilities using strands of DNA for generating paths of the Markov chain.
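
The question this abstract addresses, whether the powers of a transition matrix converge and to what limit, can be explored with a conventional numerical heuristic. The sketch below is ordinary floating-point iteration, not the DNA-computing algorithms of the paper; the names `mat_mul` and `limit_of_powers`, the tolerance, and the example matrices are assumptions.

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def limit_of_powers(P, tol=1e-12, max_iter=10_000):
    """Return an estimate of lim P^n, or None if no convergence is detected.

    Heuristic: stop when two successive powers differ by less than `tol`
    in every entry; give up after `max_iter` multiplications.
    """
    current = P
    for _ in range(max_iter):
        nxt = mat_mul(current, P)
        diff = max(
            abs(x - y)
            for row_new, row_old in zip(nxt, current)
            for x, y in zip(row_new, row_old)
        )
        if diff < tol:
            return nxt
        current = nxt
    return None


ergodic = [[0.9, 0.1], [0.5, 0.5]]   # aperiodic and irreducible
periodic = [[0.0, 1.0], [1.0, 0.0]]  # powers alternate between two matrices

print(limit_of_powers(ergodic))
print(limit_of_powers(periodic))
```

For the ergodic example every row of the limit approaches the stationary distribution (5/6, 1/6), while the periodic chain never settles and the function returns `None`.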

14.
Statistics on Markov chains are widely used for the study of patterns in biological sequences. Statistics on these models can be computed through several approaches. Central limit theorems (CLT) producing Gaussian approximations are among the most popular. Unfortunately, in order to find a pattern of interest, these methods have to deal with tail distribution events, where the CLT approximation is especially poor. In this paper, we propose a new approach based on large deviations theory to assess pattern statistics. We first recall theoretical results for empirical mean (level 1) as well as empirical distribution (level 2) large deviations on Markov chains. Then, we present the applications of these results, focusing on numerical issues. LD-SPatt is GPL software implementing these algorithms. We compare this approach to several existing ones in terms of complexity and reliability and show that the large deviations are more reliable than the Gaussian approximations in absolute values as well as in terms of ranking, and are at least as reliable as compound Poisson approximations. Finally, we discuss some further possible improvements and applications of this new method.

15.
An exact expression for the variance of random frequency that a given word has in text generated by a Markov chain is presented. The result is applied to periodic Markov chains, which describe the protein-coding DNA sequences better than simple Markov chains. A new solution to the problem of word overlap is proposed. It was found that the expected frequency and overlapping properties determine most of the variance. The expectation and variance of counts for triplets are compared with experimental counts in Escherichia coli coding sequences.

16.
We describe a Bayesian method for investigating correlated evolution of discrete binary traits on phylogenetic trees. The method fits a continuous-time Markov model to a pair of traits, seeking the best fitting models that describe their joint evolution on a phylogeny. We employ the methodology of reversible-jump (RJ) Markov chain Monte Carlo to search among the large number of possible models, some of which conform to independent evolution of the two traits, others to correlated evolution. The RJ Markov chain visits these models in proportion to their posterior probabilities, thereby directly estimating the support for the hypothesis of correlated evolution. In addition, the RJ Markov chain simultaneously estimates the posterior distributions of the rate parameters of the model of trait evolution. These posterior distributions can be used to test among alternative evolutionary scenarios to explain the observed data. All results are integrated over a sample of phylogenetic trees to account for phylogenetic uncertainty. We implement the method in a program called RJ Discrete and illustrate it by analyzing the question of whether mating system and advertisement of estrus by females have coevolved in the Old World monkeys and great apes.

17.
In this paper we review some recent results on the evolution of probability measures under cellular automata acting on a fullshift. In particular we discuss the crucial role of the attractiveness of maximal measures. We enlarge the context of the results of a previous study of topological Markov chains that are Abelian groups; the shift map is an automorphism of this group. This is carried out by studying the dynamics of Markov measures under a particular additive cellular automaton. Many of these topics were within the focus of Francisco Varela's mathematical interests.

18.

Background

Alignment-free sequence comparison using counts of word patterns (grams, k-tuples) has become an active research topic due to the large amount of sequence data from the new sequencing technologies. Genome sequences are frequently modelled by Markov chains and the likelihood ratio test or the corresponding approximate χ²-statistic has been suggested to compare two sequences. However, it is not known how to best choose the word length k in such studies.

Results

We develop an optimal strategy to choose k by maximizing the statistical power of detecting differences between two sequences. Let the orders of the Markov chains for the two sequences be r₁ and r₂, respectively. We show through both simulations and theoretical studies that the optimal k = max(r₁, r₂) + 1 for both long sequences and next generation sequencing (NGS) read data. The orders of the Markov chains may be unknown and several methods have been developed to estimate the orders of Markov chains based on both long sequences and NGS reads. We study the power loss of the statistics when the estimated orders are used. It is shown that the power loss is minimal for some of the estimators of the orders of Markov chains.

Conclusion

Our studies provide guidelines on choosing the optimal word length for the comparison of Markov sequences.
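
The rule stated in the Results, k = max(r1, r2) + 1, is simple to apply once the Markov orders are known or estimated. The sketch below only picks k and counts overlapping words; it does not compute the likelihood ratio or χ² statistics from the paper. The names `optimal_word_length` and `count_words` and the example sequence are assumptions for illustration.

```python
from collections import Counter


def optimal_word_length(r1, r2):
    """Word length suggested by the abstract for Markov orders r1 and r2."""
    return max(r1, r2) + 1


def count_words(sequence, k):
    """Count all overlapping words (k-tuples) of length k in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Hypothetical example: sequences modelled by chains of order 1 and 2.
k = optimal_word_length(1, 2)
counts = count_words("ACGACG", k)
print(k, dict(counts))
```

The resulting word-count vectors for the two sequences are the inputs that an alignment-free comparison statistic would then work with.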

19.
In this article, we introduce the drifting Markov models (DMMs), which are inhomogeneous Markov models designed for modeling the heterogeneities of sequences (in our case DNA or protein sequences) in a more flexible way than homogeneous Markov chains or even hidden Markov models (HMMs). We focus here on the polynomial drift: the transition matrix varies in a polynomial way. To show the reliability of our models on DNA, we exhibit high similarities between the probability distributions of nucleotides obtained by our models and the frequencies of these nucleotides computed by using a sliding window. In a further step, these DMMs can be used as the states of an HMM: on each of its segments, the observed process can be modeled by a drifting Markov model. Searching for rare words in DNA sequences remains possible with DMMs and, according to the fits provided, DMMs turn out to be a powerful tool for this purpose. The software is available on request from the author. It will soon be integrated into the seq++ library (http://stat.genopole.cnrs.fr/seqpp/).

20.
Statistical analysis of nucleotide sequences.   Total citations: 5 (self-citations: 4, citations by others: 1)
In order to scan nucleic acid databases for potentially relevant but as yet unknown signals, we have developed an improved statistical model for pattern analysis of nucleic acid sequences by modifying previous methods based on Markov chains. We demonstrate the importance of selecting the appropriate parameters in order for the method to function at all. The model allows the simultaneous analysis of several short sequences with unequal base frequencies and Markov order k not equal to 0 as is usually the case in databases. As a test of these modifications, we show that in E. coli sequences there is a bias against palindromic hexamers which correspond to known restriction enzyme recognition sites.
