Similar documents
20 similar documents found
1.
Several Markov chain models (up to fourth order) have been fitted to the sequences of the seven DNAs presented in Fuchs et al. (1980). Two methods for determining the order of a Markov chain are applied to the data. The two methods lead to different conclusions, and we discuss these discrepancies. When the distribution of nucleotides in a DNA sequence is investigated, we suggest that the study of the order of the Markov model be supplemented with additional analysis.
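The abstract does not name its two order-selection methods. As an illustration only, the sketch below assumes two common criteria, AIC and BIC, and applies them to maximum-likelihood Markov chains of order 0 to 4 on a DNA string. Both criteria are scored on the same set of predicted positions so that orders are comparable; the function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def log_likelihood(seq, k, start):
    """Maximum-likelihood log-likelihood of seq[start:] under an order-k chain.

    Every symbol from position `start` on is predicted from its k predecessors,
    so models of different order are scored on the same observations.
    """
    pair_counts = Counter()
    ctx_counts = Counter()
    for i in range(start, len(seq)):
        ctx = seq[i - k:i]              # empty string when k == 0
        pair_counts[(ctx, seq[i])] += 1
        ctx_counts[ctx] += 1
    # ML estimate p(x | ctx) = n(ctx, x) / n(ctx); log-likelihood = sum n * log p
    return sum(n * math.log(n / ctx_counts[ctx])
               for (ctx, _), n in pair_counts.items())

def select_order(seq, max_order=4):
    """Return (order chosen by AIC, order chosen by BIC, {k: (AIC, BIC)})."""
    n_obs = len(seq) - max_order
    s = len(ALPHABET)
    scores = {}
    for k in range(max_order + 1):
        ll = log_likelihood(seq, k, start=max_order)
        n_params = s ** k * (s - 1)     # free transition probabilities
        aic = -2 * ll + 2 * n_params
        bic = -2 * ll + n_params * math.log(n_obs)
        scores[k] = (aic, bic)
    best_aic = min(scores, key=lambda k: scores[k][0])
    best_bic = min(scores, key=lambda k: scores[k][1])
    return best_aic, best_bic, scores
```

Because BIC penalizes each extra parameter by log(n) rather than 2, it tends to pick a lower order than AIC on long sequences, which is one way two order-selection methods can reach different conclusions on the same data.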

2.
Liu L  Ho YK  Yau S 《DNA and Cell Biology》2007,26(7):477-483
The inhomogeneous Markov chain model is used to discriminate acceptor and donor sites in genomic DNA sequences. It outperforms statistical methods such as the homogeneous, higher-order and interpolated Markov chain models, as well as machine-learning methods such as k-nearest neighbors and support vector machines. Besides its high accuracy, another advantage of the inhomogeneous Markov chain model is its computational simplicity. For the three-state system (acceptor, donor, and neither), the inhomogeneous Markov chain model is combined with a three-layer feed-forward neural network. Tested on 3,175 primate splice-junction gene sequences, this combined system achieved a prediction accuracy greater than 98%.
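A minimal sketch of the core idea, assuming a first-order chain whose transition probabilities depend on the position within an aligned window (that is what makes it "inhomogeneous"). One model is trained on true sites and one on decoys, and a window is scored by the log-likelihood ratio. The pseudocount, window length, and function names are assumptions; the paper's neural-network stage is not reproduced.

```python
import math

ALPHABET = "ACGT"

def train_inhomogeneous(windows, pseudocount=1.0):
    """Fit a first-order inhomogeneous Markov chain to aligned windows of
    equal length: one transition table per adjacent pair of positions."""
    length = len(windows[0])
    init = {a: pseudocount for a in ALPHABET}
    trans = [{(a, b): pseudocount for a in ALPHABET for b in ALPHABET}
             for _ in range(length - 1)]
    for w in windows:
        init[w[0]] += 1
        for i in range(1, length):
            trans[i - 1][(w[i - 1], w[i])] += 1
    total = sum(init.values())
    init = {a: c / total for a, c in init.items()}
    for table in trans:                  # normalise each row to P(b | a) at this position
        for a in ALPHABET:
            row = sum(table[(a, b)] for b in ALPHABET)
            for b in ALPHABET:
                table[(a, b)] /= row
    return init, trans

def log_prob(window, model):
    init, trans = model
    lp = math.log(init[window[0]])
    for i in range(1, len(window)):
        lp += math.log(trans[i - 1][(window[i - 1], window[i])])
    return lp

def score(window, site_model, decoy_model):
    """Log-likelihood ratio; positive values favour a true splice site."""
    return log_prob(window, site_model) - log_prob(window, decoy_model)
```

Computation is one table lookup per position, which is the "simplicity in computation" the abstract refers to; a homogeneous chain would share a single table across all positions.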

3.
Several statistical methods were tested for accuracy in predicting observed frequencies of di- through hexanucleotides in 74,444 bp of E. coli DNA. A Markov chain was the most accurate overall, whereas other methods, including a random model based on mononucleotide frequencies, were very inaccurate. When ranked from highest to lowest abundance, the observed frequencies of oligonucleotides up to six bases long in E. coli DNA were highly asymmetric. All ordered abundance plots had a wide linear range containing the majority of the oligomers, and deviated sharply at the high and low ends of the curves. In general, values predicted by a Markov chain closely followed the overall shape of the ordered abundance curves. A simple equation was derived by which the frequency of any oligonucleotide longer than four bases in the E. coli genome (or any genome) can be estimated relatively accurately from the nested set of component tri- and tetranucleotides by serial application of a 3rd-order Markov chain. The equation yielded a mean ratio of 1.03 ± 0.94 for the observed-to-expected frequencies of the 4,096 hexanucleotides. Hence, the method is a relatively accurate but not perfect predictor of the distance in nucleotides between hexanucleotide sites. Higher accuracy can be achieved using a 4th-order Markov chain and larger data sets. The high asymmetry in oligonucleotide abundance means that, in the E. coli genome of 4.2 × 10^6 bp, many relatively short sequences of 7-9 bp are very rare or absent.
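The abstract does not print its equation. A standard maximum-likelihood form for serially applying a 3rd-order chain is E[N(w)] = N(w1..w4) × Π N(wi..wi+3) / N(wi..wi+2), chaining overlapping tetranucleotides through their shared trinucleotides. The sketch below assumes that form, which may differ in detail from the paper's, and computes the expected count of a word from k-mer counts of the same sequence.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def expected_count(word, seq, order=3):
    """Expected occurrences of `word` under an order-`order` Markov chain fitted
    to `seq`: N(first (m+1)-mer) * prod N(next (m+1)-mer) / N(shared m-mer)."""
    m = order
    if len(word) < m + 1:
        raise ValueError("word must be at least order + 1 bases long")
    big = kmer_counts(seq, m + 1)      # e.g. tetranucleotide counts for m = 3
    small = kmer_counts(seq, m)        # e.g. trinucleotide counts
    expected = float(big[word[:m + 1]])
    for i in range(1, len(word) - m):
        denom = small[word[i:i + m]]
        if denom == 0:                 # overlapping m-mer never seen
            return 0.0
        expected *= big[word[i:i + m + 1]] / denom
    return expected
```

The observed-to-expected ratio reported in the abstract would then be `kmer_counts(seq, 6)[w] / expected_count(w, seq)` for each hexanucleotide `w` with a nonzero expectation.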

4.
One of the most frequently used models for understanding human navigation on the Web is the Markov chain model, in which Web pages are represented as states and hyperlinks as probabilities of navigating from one page to another. Human navigation on the Web has predominantly been thought to satisfy the memoryless Markov property, which states that the next page a user visits depends only on the current page and not on previously visited ones. This idea has found its way into numerous applications such as Google's PageRank algorithm, among others. Recently, new studies have suggested that human navigation may be better modeled using higher-order Markov chain models, i.e., models in which the next page depends on a longer history of past clicks. Yet this finding is preliminary and does not account for the higher complexity of higher-order Markov chain models, which is why the memoryless model is still widely used. In this work we present a diverse array of advanced inference methods for determining the appropriate Markov chain order. We highlight the strengths and weaknesses of each method and apply them to investigate the memory and structure of human navigation on the Web. Our experiments reveal that the complexity of higher-order models grows faster than their utility, and thus we confirm that the memoryless model is a quite practical model for human navigation at the page level. However, when we expand our analysis to a topical level, where we abstract away from specific page transitions to transitions between topics, we find that the memoryless assumption is violated and specific regularities can be observed. We report results from experiments with two types of navigational datasets (goal-oriented vs. free-form) and observe interesting structural differences that make a strong argument for more contextual studies of human navigation in future work.

5.
Deformable template models and Markov chain Monte Carlo methods are used for analysing a space-time process of intracoronary ultrasound images in order to detect the artery contour and various other characteristics as a function of time.

6.
Maximum likelihood and Bayesian approaches are presented for analyzing hierarchical statistical models of natural selection operating on DNA polymorphism within a panmictic population. For analyzing Bayesian models, we present Markov chain Monte Carlo (MCMC) methods for sampling from the joint posterior distribution of parameters. For frequentist analysis, an Expectation-Maximization (EM) algorithm is presented for finding the maximum likelihood estimate of the genome-wide mean and variance in selection intensity among classes of mutations. The framework presented here provides an ideal setting for modeling mutations dispersed through the genome and, in particular, for analyzing how natural selection operates on different classes of single nucleotide polymorphisms (SNPs).

7.
Markov chain Monte Carlo (MCMC) methods have been proposed to overcome computational problems in linkage and segregation analyses. This approach involves sampling genotypes at the marker and trait loci. Among MCMC methods, scalar-Gibbs is the easiest to implement, and it is used in genetics. However, the Markov chain that corresponds to scalar-Gibbs may not be irreducible when the marker locus has more than two alleles, and even when the chain is irreducible, mixing has been observed to be slow. Joint sampling of genotypes has been proposed as a strategy to overcome these problems. An algorithm that combines the Elston-Stewart algorithm and iterative peeling (ESIP sampler) to sample genotypes jointly from the entire pedigree is used in this study. Here, it is shown that the ESIP sampler yields an irreducible Markov chain, regardless of the number of alleles at a locus. Further, results obtained by ESIP sampler are compared with other methods in the literature. Of the methods that are guaranteed to be irreducible, ESIP was the most efficient.

8.
A Bayesian approach to DNA sequence segmentation
Boys RJ  Henderson DA 《Biometrics》2004,60(3):573-581
Many deoxyribonucleic acid (DNA) sequences display compositional heterogeneity in the form of segments of similar structure. This article describes a Bayesian method that identifies such segments by using a Markov chain governed by a hidden Markov model. Markov chain Monte Carlo (MCMC) techniques are employed to compute all posterior quantities of interest and, in particular, allow inferences to be made regarding the number of segment types and the order of Markov dependence in the DNA sequence. The method is applied to the segmentation of the bacteriophage lambda genome, a common benchmark sequence used for the comparison of statistical segmentation algorithms.

9.
Markov chain Monte Carlo (MCMC) methods have been widely used to overcome computational problems in linkage and segregation analyses. Many variants of this approach exist and are practiced; among the most popular is the Gibbs sampler. The Gibbs sampler is simple to implement but has (in its simplest form) mixing and reducibility problems; furthermore in order to initiate a Gibbs sampling chain we need a starting genotypic or allelic configuration which is consistent with the marker data in the pedigree and which has suitable weight in the joint distribution. We outline a procedure for finding such a configuration in pedigrees which have too many loci to allow for exact peeling. We also explain how this technique could be used to implement a blocking Gibbs sampler.

10.
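Han L 《Chinese Journal of Bioinformatics (生物信息学)》2004,2(2):27-28
The modified inhomogeneous model is a Markov model for protein-coding regions, proposed on the basis of the homogeneous and inhomogeneous models. It can be used to analyze species evolution and gene mutation: the Markov order of the model is associated with the level of sequence evolution, and its transition matrices are associated with gene mutation. By comparing the first-order Markov chain content of seven different classes of species, this paper verifies that species evolution is reflected in characteristic features of codon usage; by computing transition matrices between codon positions, it analyzes how likely gene mutations are to occur at different codon positions.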
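The abstract does not define the modified model's equations. A minimal sketch of the underlying idea, assuming an in-frame coding sequence and a first-order chain with a separate transition matrix for each codon position (a three-periodic inhomogeneous chain), follows; the function name is hypothetical.

```python
from collections import defaultdict

ALPHABET = "ACGT"

def codon_position_transitions(cds):
    """Three first-order transition matrices for an in-frame coding sequence.

    matrices[p][a][b] estimates P(next base = b | previous base = a) for
    transitions landing on codon position p (0 = first base of a codon).
    Rows with no observations are left as all zeros.
    """
    counts = [defaultdict(int) for _ in range(3)]
    for i in range(1, len(cds)):
        counts[i % 3][(cds[i - 1], cds[i])] += 1
    matrices = []
    for c in counts:
        m = {}
        for a in ALPHABET:
            row = sum(c[(a, b)] for b in ALPHABET)
            m[a] = {b: (c[(a, b)] / row if row else 0.0) for b in ALPHABET}
        matrices.append(m)
    return matrices
```

Comparing the three matrices shows how base-to-base dependence differs by codon position; for example, the third position is where synonymous substitutions typically concentrate.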

11.
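Analysis of the statistical properties of Markov chains in land-use/landscape ecology research
Markov chains are widely used in land-use and landscape ecology research. Applications usually assume that land-use change follows a first-order, time-homogeneous Markov chain satisfying the Markov property, but whether these statistical properties actually hold is rarely tested. Using land-use change monitoring data from Beijing as a case study, this paper proposes Pearson χ² goodness-of-fit tests for the statistical properties of Markov chains. The results show that the time homogeneity and the Markov property (first order) commonly assumed in land-use studies do not hold statistically; that is, land-use change in Beijing follows a time-inhomogeneous, higher-order Markov chain. Whereas the likelihood-ratio test of Markov properties requires transition probabilities to be greater than zero, the Pearson χ² test is less demanding and allows zero transition probabilities, so it is more widely applicable than the likelihood-ratio test.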
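As an illustration of testing the first-order assumption, here is a sketch of the classic Pearson χ² test of a first-order against a second-order chain (Anderson-Goodman style), assuming a single sequence of observed land-use states. The paper's exact statistics, including its time-homogeneity test, are not reproduced. Zero observed transition counts contribute (0 − e)²/e without any logarithm, which reflects the robustness to zeros the abstract describes. Counting degrees of freedom only over observed rows and columns is an assumption.

```python
from collections import Counter

def markov_order1_chi2(states):
    """Pearson chi-square test of H0: first-order Markov vs a second-order chain.

    Compares triple counts n(i, j, k) with the first-order expectation
    n(i, j, .) * n(., j, k) / n(., j, .). Returns (statistic, degrees of freedom).
    """
    triples = Counter(zip(states, states[1:], states[2:]))
    n_ij, n_jk, n_j = Counter(), Counter(), Counter()
    for (i, j, k), n in triples.items():
        n_ij[(i, j)] += n
        n_jk[(j, k)] += n
        n_j[j] += n
    stat = 0.0
    for (i, j), nij in n_ij.items():
        for (j2, k), njk in n_jk.items():
            if j2 != j:
                continue
            expected = nij * njk / n_j[j]      # > 0 by construction
            observed = triples[(i, j, k)]      # Counter returns 0 if unseen
            stat += (observed - expected) ** 2 / expected
    df = 0
    for j in n_j:
        rows = sum(1 for (_, jj) in n_ij if jj == j)
        cols = sum(1 for (jj, _) in n_jk if jj == j)
        df += (rows - 1) * (cols - 1)
    return stat, df
```

A p-value can then be obtained from the χ² survival function, for example `scipy.stats.chi2.sf(stat, df)`; a small p-value rejects the first-order assumption.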

12.
Kozumi H 《Biometrics》2000,56(4):1002-1006
This paper considers discrete survival data from a Bayesian point of view. A sequence of baseline hazard functions, which plays an important role in the discrete hazard function, is modeled with a hidden Markov chain. The paper explains how the resulting model is implemented via Markov chain Monte Carlo methods and illustrates the model with an application to real data.

13.
Hal Caswell 《Oikos》2009,118(12):1763-1782
Demography is the study of the population consequences of the fates of individuals. Individuals are differentiated on the basis of age or, in general, life cycle stages. The movement of an individual through its life cycle is a random process, and although the eventual destination (death) is certain, the pathways taken to that destination are stochastic and will differ even between identical individuals; this is individual stochasticity. A stage-classified demographic model contains implicit age-specific information, which can be analyzed using Markov chain methods. The living stages in the life cycle are transient states in an absorbing Markov chain; death is an absorbing state. This paper presents Markov chain methods for computing the mean and variance of the lifetime number of visits to any transient state, the mean and variance of longevity, the net reproductive rate R0, and the cohort generation time. It presents the matrix calculus methods needed to calculate the sensitivity and elasticity of all these indices to any life history parameters. These sensitivities have many uses, including the calculation of selection gradients. It is shown that the use of R0 as a measure of fitness or an invasion exponent gives erroneous results except when R0 = λ = 1. The Markov chain approach is then generalized to variable environments (deterministic environmental sequences, periodic environments, iid random environments, Markovian environments). Variable environments are analyzed using the vec-permutation method to create a model that classifies individuals jointly by stage and environmental condition. Throughout, examples are presented using the North Atlantic right whale (Eubalaena glacialis) and an endangered prairie plant (Lomatium bradshawii) in a stochastic fire environment.
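The basic absorbing-chain quantities in this abstract follow from the fundamental matrix N = (I − U)⁻¹. The sketch below assumes the column convention of matrix population models, where U[i, j] is the probability of surviving from stage j into stage i, and computes the mean and variance of visits, the mean and variance of longevity, and R0 as the dominant eigenvalue of F N. The sensitivity calculations and variable-environment extensions are not shown, and any stage values used with it are hypothetical.

```python
import numpy as np

def absorbing_chain_stats(U, F=None):
    """Lifetime statistics for a stage-classified model viewed as an absorbing
    Markov chain (death = absorbing state).

    U[i, j]: probability that an individual in stage j survives and is in
    stage i one time step later (column sums < 1; remainder = death).
    F[i, j]: expected number of stage-i offspring per stage-j individual.
    """
    s = U.shape[0]
    I = np.eye(s)
    N = np.linalg.inv(I - U)             # N[i, j] = mean visits to stage i starting in j
    N_dg = np.diag(np.diag(N))
    var_visits = (2 * N_dg - I) @ N - N * N   # N * N is elementwise here
    ones = np.ones(s)
    eta = ones @ N                       # mean longevity (time steps) from each stage
    var_eta = ones @ N @ (2 * N - I) - eta * eta
    R0 = None
    if F is not None:
        # Net reproductive rate: dominant eigenvalue of the next-generation matrix F N.
        R0 = max(abs(np.linalg.eigvals(F @ N)))
    return N, var_visits, eta, var_eta, R0
```

For a hypothetical two-stage life cycle with `U = [[0.2, 0], [0.5, 0.8]]`, an adult (stage 2) survives each step with probability 0.8, so its longevity is geometric with mean 5 and variance 20, which the formulas reproduce.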

14.
Probabilistic Boolean networks (PBNs) have recently been introduced as a promising class of models of genetic regulatory networks. The dynamic behaviour of PBNs can be analysed in the context of Markov chains. A key goal is the determination of the steady-state (long-run) behaviour of a PBN by analysing the corresponding Markov chain. This allows one to compute the long-term influence of a gene on another gene or determine the long-term joint probabilistic behaviour of a few selected genes. Because matrix-based methods quickly become prohibitive for large networks, we propose the use of Monte Carlo methods. However, the rate of convergence to the stationary distribution becomes a central issue. We discuss several approaches for determining the number of iterations necessary to achieve convergence of the Markov chain corresponding to a PBN. Using a recently introduced method based on the theory of two-state Markov chains, we illustrate the approach on a sub-network designed from human glioma gene expression data and determine the joint steady-state probabilities for several groups of genes.
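To contrast the matrix-based and Monte Carlo routes to a PBN's steady state, here is a sketch on a hypothetical two-gene PBN with independently selected predictor functions and a small per-gene perturbation probability, which makes the chain ergodic. The predictor functions, probabilities, burn-in, and sample size are all assumptions. The paper's convergence criterion based on two-state Markov chains is not implemented; fixed run lengths are used instead.

```python
import itertools
import random

# Hypothetical two-gene PBN: each gene has candidate predictor functions
# with selection probabilities (all values here are illustrative).
PREDICTORS = [
    [(lambda x: x[1], 0.7), (lambda x: 1 - x[0], 0.3)],     # gene 0
    [(lambda x: x[0] & x[1], 0.6), (lambda x: x[0], 0.4)],  # gene 1
]
P_PERTURB = 0.01   # per-gene random flip probability (makes the chain ergodic)
STATES = list(itertools.product((0, 1), repeat=len(PREDICTORS)))

def transition_probs(x):
    """Exact next-state distribution from state x."""
    n = len(x)
    probs = {y: 0.0 for y in STATES}
    # Perturbation: genes flip independently; used whenever at least one flips.
    for y in STATES:
        flips = sum(a != b for a, b in zip(x, y))
        if flips:
            probs[y] += P_PERTURB ** flips * (1 - P_PERTURB) ** (n - flips)
    # No flip: each gene applies one independently chosen predictor.
    for choice in itertools.product(*PREDICTORS):
        w = (1 - P_PERTURB) ** n
        for _, c in choice:
            w *= c
        probs[tuple(f(x) for f, _ in choice)] += w
    return probs

def stationary_exact(iters=2000):
    """Matrix-style reference: power iteration over all 2^n states."""
    table = {x: transition_probs(x) for x in STATES}
    pi = {s: 1 / len(STATES) for s in STATES}
    for _ in range(iters):
        new = {s: 0.0 for s in STATES}
        for x, px in pi.items():
            for y, pxy in table[x].items():
                new[y] += px * pxy
        pi = new
    return pi

def step(x, rng):
    """Simulate one PBN update from state x."""
    flips = [rng.random() < P_PERTURB for _ in x]
    if any(flips):
        return tuple(b ^ f for b, f in zip(x, flips))
    new = []
    for preds in PREDICTORS:
        funcs, weights = zip(*preds)
        new.append(rng.choices(funcs, weights=weights)[0](x))
    return tuple(new)

def stationary_monte_carlo(n_steps=100_000, burn_in=5_000, seed=1):
    """Estimate long-run state probabilities from one simulated trajectory."""
    rng = random.Random(seed)
    x = STATES[0]
    counts = {s: 0 for s in STATES}
    for t in range(burn_in + n_steps):
        x = step(x, rng)
        if t >= burn_in:
            counts[x] += 1
    return {s: c / n_steps for s, c in counts.items()}
```

The exact route enumerates all 2^n states and so becomes infeasible as genes are added, while the simulation's cost depends only on the run length. Choosing that run length is the convergence question the abstract addresses.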

15.
In order to search for probable conformations of the peptide, the amino acid side chain, and the carbohydrate linkage in glycoproteins, conformational energy surfaces of glycopeptide model compounds were studied by Monte Carlo methods using the Metropolis algorithm. The potential energies were composed of empirical energy functions which include nonbonded interactions, electrostatics, hydrogen bonding, and torsional energies specified by parameters which have been used for peptides. Calculations were performed on 1-N-acetyl-2-acetamido-beta-D-glucopyranosyl amine and the glycosylated dipeptide N-acetyl-delta-N-(2-acetamido-beta-D-glucopyranosyl)-L-asparaginyl-N'-methyl amide as models for N-glycosylated peptides and on methyl-2-acetamido-alpha-D-galactopyranoside as well as the glycosylated dipeptides N-acetyl-gamma-O-(2-acetamido-alpha-D-galactopyranosyl)-L-threonyl-N'-methyl amide and its seryl analog as models for O-glycosylated glycoproteins. The probable conformations of these compounds were analyzed by single-angle probability tables and by two-dimensional conformation density maps projected from the Markov chains which contained up to six independently varied conformational dihedral angles. The presence of high barriers to rotation required the use of search strategies which resulted in a rather low acceptance rate for new conformations in the Metropolis algorithm in order to avoid trapping of the Markov chain in local energy minima. This problem contributed to the failure of these calculations to attain complete convergence to the thermodynamic limit for the glycosylated dipeptide models in which six dihedral angles were independently varied. Analysis of the results shows that the conformational space available to the highly crowded axial glycosides of the alpha-O-GalNAc type is much more restricted than that for the N-asparaginyl glycopeptides. 
The most probable conformation for the O-glycosylated peptides is a beta-turn, while N-glycosylated peptides may adopt either a beta-turn or an extended conformation.
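The Metropolis trade-off described above (large trial moves cross barriers but lower the acceptance rate) can be seen on a toy problem. The sketch below samples one dihedral angle under an assumed threefold torsional term (V/2)(1 + cos 3φ). The barrier height, kT value, and step sizes are hypothetical and are not the paper's force field.

```python
import math
import random

KT = 0.592          # kcal/mol, roughly room temperature (assumed)
BARRIER = 3.0       # hypothetical torsional barrier height, kcal/mol

def torsion_energy(phi):
    # Threefold torsional term (V/2)(1 + cos 3*phi): minima at pi/3, pi, -pi/3.
    return 0.5 * BARRIER * (1 + math.cos(3 * phi))

def wrap(phi):
    """Map an angle into [-pi, pi)."""
    return (phi + math.pi) % (2 * math.pi) - math.pi

def metropolis(n_steps, step_size, seed=0):
    """Metropolis sampling of one dihedral; returns (samples, acceptance rate)."""
    rng = random.Random(seed)
    phi = wrap(math.pi)
    e = torsion_energy(phi)
    accepted = 0
    samples = []
    for _ in range(n_steps):
        trial = wrap(phi + rng.uniform(-step_size, step_size))
        e_trial = torsion_energy(trial)
        # Accept downhill moves always, uphill moves with probability exp(-dE/kT).
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / KT):
            phi, e = trial, e_trial
            accepted += 1
        samples.append(phi)            # rejected moves repeat the current state
    return samples, accepted / n_steps
```

With small steps the acceptance rate is high, but the chain lingers in one well. With steps comparable to the well spacing, the chain visits all three minima at the cost of many rejections, which mirrors the low acceptance rates the abstract reports for avoiding trapping.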

16.
Nearly all current Bayesian phylogenetic applications rely on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution of trees and other model parameters. These approximations are reliable only if the Markov chains adequately converge and sample from the joint posterior distribution. Although several studies of phylogenetic MCMC convergence exist, they have focused on simulated data sets or selected empirical examples. Therefore, much of what is considered common knowledge about MCMC in empirical systems derives from a relatively small family of analyses under ideal conditions. To address this, we present an overview of commonly applied phylogenetic MCMC diagnostics and an assessment of patterns in these diagnostics across more than 18,000 empirical analyses. Many analyses appeared to perform well, and failures in convergence were most likely to be detected using the average standard deviation of split frequencies, a diagnostic that compares topologies among independent chains. Different diagnostics yielded different information about failed convergence, demonstrating that multiple diagnostics must be employed to reliably detect problems. The number of taxa and the average branch lengths in analyses have clear impacts on MCMC performance, with more taxa and shorter branches leading to more difficult convergence. We show that using models that include both Γ-distributed among-site rate variation and a proportion of invariable sites is not broadly problematic for MCMC convergence, but is also unnecessary. Changes to heating and the use of model-averaged substitution models can both improve convergence in some cases, but neither is a panacea.
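The average standard deviation of split frequencies (ASDSF) can be sketched as follows. For each split (bipartition) seen in any chain, take the standard deviation of its sampled frequency across independent chains, then average over splits. This sketch assumes the sample standard deviation, treats a split missing from a chain as frequency 0, and ignores splits below a minimum frequency in every chain. Programs differ in these conventions, so the threshold and split labels here are assumptions.

```python
from statistics import mean, stdev

def asdsf(split_freqs_per_chain, min_freq=0.1):
    """Average standard deviation of split frequencies across >= 2 chains.

    split_freqs_per_chain: list of dicts mapping a split label (any hashable)
    to its sampled frequency in that chain; a missing split counts as 0.
    Splits whose frequency is below `min_freq` in every chain are ignored.
    """
    splits = set().union(*split_freqs_per_chain)
    sds = []
    for split in splits:
        freqs = [chain.get(split, 0.0) for chain in split_freqs_per_chain]
        if max(freqs) < min_freq:
            continue
        sds.append(stdev(freqs))       # sample SD across chains
    return mean(sds)
```

Values near zero mean the independent chains agree on which topologies they sample. This is why the diagnostic catches topological convergence failures that per-parameter diagnostics can miss.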

17.
Boolean networks are a simple but efficient model for describing gene regulatory systems. A number of algorithms have been proposed to infer Boolean networks. However, these methods do not fully account for the effects of noise and model uncertainty. In this paper, we propose a full Bayesian approach to infer Boolean genetic networks. Markov chain Monte Carlo algorithms are used to obtain posterior samples of both the network structure and the related parameters. In addition to regular link addition and removal moves, which can guarantee the irreducibility of the Markov chain for traversing the whole network space, carefully constructed mixture proposals are used to improve Markov chain Monte Carlo convergence. Both simulations and a real application to cell-cycle data show that our method is more powerful than existing methods for inferring both the topology and the logic relations of the Boolean network from observed data.

18.
An improved Markov chain model has been developed for forecasting sugarcane yields, in which growth indices of biometrical characters based on data from two stages are used simultaneously. Comparisons were also made with the models currently in use, namely the regression model and the first-order Markov chain model.

19.
Statistical methods have been developed for finding local patterns, also called motifs, in multiple protein sequences. The aligned segments may indicate functional or structural core regions. However, existing methods often have difficulty aligning multiple proteins when sequence residue identities are low (e.g., less than 25%). In this article, we develop a Bayesian model and Markov chain Monte Carlo (MCMC) methods for identifying subtle motifs in protein sequences. Specifically, a motif is defined not only in terms of specific sites characterized by amino acid frequency vectors, but also as a combination of secondary characteristics such as hydrophobicity and polarity. MCMC methods are proposed to search for a motif pattern with high posterior probability under the new model. A special MCMC algorithm is developed, involving transitions between state spaces of different dimensions. The proposed methods were supported by a simulation study and then tested on two real datasets: a group of helix-turn-helix proteins and a set from the CATH Protein Structure Classification Database. Statistical comparisons showed that the new approach worked better than a typical Gibbs sampling approach based only on an amino acid model.

20.
MOTIVATION: Bayesian analysis is one of the most popular methods in phylogenetic inference. The most commonly used methods fix a single multiple alignment and consider only substitutions as phylogenetically informative mutations, although alignments and phylogenies should be inferred jointly, since insertions and deletions also carry informative signals. Methods addressing these issues have been developed only recently, and so far there has been no user-friendly program with a graphical interface that implements them. RESULTS: We have developed an extendable software package in the Java programming language that samples from the joint posterior distribution of phylogenies, alignments and evolutionary parameters by applying the Markov chain Monte Carlo method. The package also offers tools for efficient on-the-fly summarization of the results. It has a graphical interface to configure, start and supervise the analysis, to track the status of the Markov chain and to save the results. The background model for insertions and deletions can be combined with any substitution model. New substitution models can easily be added to the package as plugins. The samples from the Markov chain can be summarized in several ways, and new postprocessing plugins may also be installed.
