首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
In order to cross a street without being run over, we need to be able to extract very fast hidden causes of dynamically changing multi-modal sensory stimuli, and to predict their future evolution. We show here that a generic cortical microcircuit motif, pyramidal cells with lateral excitation and inhibition, provides the basis for this difficult but all-important information processing capability. This capability emerges in the presence of noise automatically through effects of STDP on connections between pyramidal cells in Winner-Take-All circuits with lateral excitation. In fact, one can show that these motifs endow cortical microcircuits with functional properties of a hidden Markov model, a generic model for solving such tasks through probabilistic inference. Whereas in engineering applications this model is adapted to specific tasks through offline learning, we show here that a major portion of the functionality of hidden Markov models arises already from online applications of STDP, without any supervision or rewards. We demonstrate the emergent computing capabilities of the model through several computer simulations. The full power of hidden Markov model learning can be attained through reward-gated STDP. This is due to the fact that these mechanisms enable a rejection sampling approximation to theoretically optimal learning. We investigate the possible performance gain that can be achieved with this more accurate learning method for an artificial grammar task.  相似文献   

2.
The classical multiple testing model remains an important practical area of statistics with new approaches still being developed. In this paper we develop a new multiple testing procedure inspired by a method sometimes used in a problem with a different focus. Namely, the inference after model selection problem. We note that solutions to that problem are often accomplished by making use of a penalized likelihood function. A classic example is the Bayesian information criterion (BIC) method. In this paper we construct a generalized BIC method and evaluate its properties as a multiple testing procedure. The procedure is applicable to a wide variety of statistical models including regression, contrasts, treatment versus control, change point, and others. Numerical work indicates that, in particular, for sparse models the new generalized BIC would be preferred over existing multiple testing procedures.  相似文献   

3.
Currently available methods for model selection used in phylogenetic analysis are based on an initial fixed-tree topology. Once a model is picked based on this topology, a rigorous search of the tree space is run under that model to find the maximum-likelihood estimate of the tree (topology and branch lengths) and the maximum-likelihood estimates of the model parameters. In this paper, we propose two extensions to the decision-theoretic (DT) approach that relax the fixed-topology restriction. We also relax the fixed-topology restriction for the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) methods. We compare the performance of the different methods (the relaxed, restricted, and the likelihood-ratio test [LRT]) using simulated data. This comparison is done by evaluating the relative complexity of the models resulting from each method and by comparing the performance of the chosen models in estimating the true tree. We also compare the methods relative to one another by measuring the closeness of the estimated trees corresponding to the different chosen models under these methods. We show that varying the topology does not have a major impact on model choice. We also show that the outcome of the two proposed extensions is identical and is comparable to that of the BIC, Extended-BIC, and DT. Hence, using the simpler methods in choosing a model for analyzing the data is more computationally feasible, with results comparable to the more computationally intensive methods. Another outcome of this study is that earlier conclusions about the DT approach are reinforced. That is, LRT, Extended-AIC, and AIC result in more complicated models that do not contribute to the performance of the phylogenetic inference, yet cause a significant increase in the time required for data analysis.  相似文献   

4.
基于混合效应模型的人工红松节子属性   总被引:1,自引:0,他引:1  
基于黑龙江省孟家岗林场60株人工红松1534个节子数据,利用SAS软件中的NLMIXED和GLIMMIX模块构建人工红松节子属性因子(基径、健全节长度、死亡年龄、角度)的混合效应预测模型.采用赤池信息准则(AIC)、贝叶斯信息准则(BIC)、对数似然值(-2LL)和似然比检验(LRT)评价指标对所构建模型的精度进行比较.结果表明:考虑树木效应的混合模型模拟精度均高于传统回归模型.含有b_1、b_2随机参数组合的节子基径模型是最优混合效应模型;含有b_1、b_3随机参数组合的节子健全节长度模型是最优混合效应模型;含有节子基径随机参数的广义线性混合模型为节子死亡年龄的最优模型;含有截距、节子基径、健全节长度3种随机效应参数组合的广义线性混合模型为节子角度的最优模型.混合效应模型比传统回归模型更能有效地描述节子属性.红松是东北主要的用材树种,利用节子属性预测模型结合合理的整枝方案可以提高木材质量.  相似文献   

5.
A finite-context (Markov) model of order k yields the probability distribution of the next symbol in a sequence of symbols, given the recent past up to depth k. Markov modeling has long been applied to DNA sequences, for example to find gene-coding regions. With the first studies came the discovery that DNA sequences are non-stationary: distinct regions require distinct model orders. Since then, Markov and hidden Markov models have been extensively used to describe the gene structure of prokaryotes and eukaryotes. However, to our knowledge, a comprehensive study about the potential of Markov models to describe complete genomes is still lacking. We address this gap in this paper. Our approach relies on (i) multiple competing Markov models of different orders (ii) careful programming techniques that allow orders as large as sixteen (iii) adequate inverted repeat handling (iv) probability estimates suited to the wide range of context depths used. To measure how well a model fits the data at a particular position in the sequence we use the negative logarithm of the probability estimate at that position. The measure yields information profiles of the sequence, which are of independent interest. The average over the entire sequence, which amounts to the average number of bits per base needed to describe the sequence, is used as a global performance measure. Our main conclusion is that, from the probabilistic or information theoretic point of view and according to this performance measure, multiple competing Markov models explain entire genomes almost as well or even better than state-of-the-art DNA compression methods, such as XM, which rely on very different statistical models. This is surprising, because Markov models are local (short-range), contrasting with the statistical models underlying other methods, where the extensive data repetitions in DNA sequences is explored, and therefore have a non-local character.  相似文献   

6.
Discrete Markovian models can be used to characterize patterns in sequences of values and have many applications in biological sequence analysis, including gene prediction, CpG island detection, alignment, and protein profiling. We present ToPS, a computational framework that can be used to implement different applications in bioinformatics analysis by combining eight kinds of models: (i) independent and identically distributed process; (ii) variable-length Markov chain; (iii) inhomogeneous Markov chain; (iv) hidden Markov model; (v) profile hidden Markov model; (vi) pair hidden Markov model; (vii) generalized hidden Markov model; and (viii) similarity based sequence weighting. The framework includes functionality for training, simulation and decoding of the models. Additionally, it provides two methods to help parameter setting: Akaike and Bayesian information criteria (AIC and BIC). The models can be used stand-alone, combined in Bayesian classifiers, or included in more complex, multi-model, probabilistic architectures using GHMMs. In particular the framework provides a novel, flexible, implementation of decoding in GHMMs that detects when the architecture can be traversed efficiently.
This is a PLOS Computational Biology Software Article.
  相似文献   

7.
In order to have confidence in model-based phylogenetic analysis, the model of nucleotide substitution adopted must be selected in a statistically rigorous manner. Several model-selection methods are applicable to maximum likelihood (ML) analysis, including the hierarchical likelihood-ratio test (hLRT), Akaike information criterion (AIC), Bayesian information criterion (BIC), and decision theory (DT), but their performance relative to empirical data has not been investigated thoroughly. In this study, we use 250 phylogenetic data sets obtained from TreeBASE to examine the effects that choice in model selection has on ML estimation of phylogeny, with an emphasis on optimal topology, bootstrap support, and hypothesis testing. We show that the use of different methods leads to the selection of two or more models for approximately 80% of the data sets and that the AIC typically selects more complex models than alternative approaches. Although ML estimation with different best-fit models results in incongruent tree topologies approximately 50% of the time, these differences are primarily attributable to alternative resolutions of poorly supported nodes. Furthermore, topologies and bootstrap values estimated with ML using alternative statistically supported models are more similar to each other than to topologies and bootstrap values estimated with ML under the Kimura two-parameter (K2P) model or maximum parsimony (MP). In addition, Swofford-Olsen-Waddell-Hillis (SOWH) tests indicate that ML trees estimated with alternative best-fit models are usually not significantly different from each other when evaluated with the same model. However, ML trees estimated with statistically supported models are often significantly suboptimal to ML trees made with the K2P model when both are evaluated with K2P, indicating that not all models perform in an equivalent manner. Nevertheless, the use of alternative statistically supported models generally does not affect tests of monophyletic relationships under either the Shimodaira-Hasegawa (S-H) or SOWH methods. Our results suggest that although choice in model selection has a strong impact on optimal tree topology, it rarely affects evolutionary inferences drawn from the data because differences are mainly confined to poorly supported nodes. Moreover, since ML with alternative best-fit models tends to produce more similar estimates of phylogeny than ML under the K2P model or MP, the use of any statistically based model-selection method is vastly preferable to forgoing the model-selection process altogether.  相似文献   

8.
Membrane proteins move in heterogeneous environments with spatially (sometimes temporally) varying friction and with biochemical interactions with various partners. It is important to reliably distinguish different modes of motion to improve our knowledge of the membrane architecture and to understand the nature of interactions between membrane proteins and their environments. Here, we present an analysis technique for single molecule tracking (SMT) trajectories that can determine the preferred model of motion that best matches observed trajectories. The method is based on Bayesian inference to calculate the posteriori probability of an observed trajectory according to a certain model. Information theory criteria, such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and modified AIC (AICc), are used to select the preferred model. The considered group of models includes free Brownian motion, and confined motion in 2nd or 4th order potentials. We determine the best information criteria for classifying trajectories. We tested its limits through simulations matching large sets of experimental conditions and we built a decision tree. This decision tree first uses the BIC to distinguish between free Brownian motion and confined motion. In a second step, it classifies the confining potential further using the AIC. We apply the method to experimental Clostridium Perfingens -toxin (CPT) receptor trajectories to show that these receptors are confined by a spring-like potential. An adaptation of this technique was applied on a sliding window in the temporal dimension along the trajectory. We applied this adaptation to experimental CPT trajectories that lose confinement due to disaggregation of confining domains. This new technique adds another dimension to the discussion of SMT data. The mode of motion of a receptor might hold more biologically relevant information than the diffusion coefficient or domain size and may be a better tool to classify and compare different SMT experiments.  相似文献   

9.
Liu L  Ho YK  Yau S 《DNA and cell biology》2007,26(7):477-483
The inhomogeneous Markov chain model is used to discriminate acceptor and donor sites in genomic DNA sequences. It outperforms statistical methods such as homogeneous Markov chain model, higher order Markov chain and interpolated Markov chain models, and machine-learning methods such as k-nearest neighbor and support vector machine as well. Besides its high accuracy, another advantage of inhomogeneous Markov chain model is its simplicity in computation. In the three states system (acceptor, donor, and neither), the inhomogeneous Markov chain model is combined with a three-layer feed forward neural network. Using this combined system 3175 primate splice-junction gene sequences have been tested, with a prediction accuracy of greater than 98%.  相似文献   

10.
Models of sequence evolution play an important role in molecular evolutionary studies. The use of inappropriate models of evolution may bias the results of the analysis and lead to erroneous conclusions. Several procedures for selecting the best-fit model of evolution for the data at hand have been proposed, like the likelihood ratio test (LRT) and the Akaike (AIC) and Bayesian (BIC) information criteria. The relative performance of these model-selecting algorithms has not yet been studied under a range of different model trees. In this study, the influence of branch length variation upon model selection is characterized. This is done by simulating sequence alignments under a known model of nucleotide substitution, and recording how often this true model is recovered by different model-fitting strategies. Results of this study agree with previous simulations and suggest that model selection is reasonably accurate. However, different model selection methods showed distinct levels of accuracy. Some LRT approaches showed better performance than the AIC or BIC information criteria. Within the LRTs, model selection is affected by the complexity of the initial model selected for the comparisons, and only slightly by the order in which different parameters are added to the model. A specific hierarchy of LRTs, which starts from a simple model of evolution, performed overall better than other possible LRT hierarchies, or than the AIC or BIC. Received: 2 October 2000 / Accepted: 4 January 2001  相似文献   

11.
A common problem in molecular phylogenetics is choosing a model of DNA substitution that does a good job of explaining the DNA sequence alignment without introducing superfluous parameters. A number of methods have been used to choose among a small set of candidate substitution models, such as the likelihood ratio test, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and Bayes factors. Current implementations of any of these criteria suffer from the limitation that only a small set of models are examined, or that the test does not allow easy comparison of non-nested models. In this article, we expand the pool of candidate substitution models to include all possible time-reversible models. This set includes seven models that have already been described. We show how Bayes factors can be calculated for these models using reversible jump Markov chain Monte Carlo, and apply the method to 16 DNA sequence alignments. For each data set, we compare the model with the best Bayes factor to the best models chosen using AIC and BIC. We find that the best model under any of these criteria is not necessarily the most complicated one; models with an intermediate number of substitution types typically do best. Moreover, almost all of the models that are chosen as best do not constrain a transition rate to be the same as a transversion rate, suggesting that it is the transition/transversion rate bias that plays the largest role in determining which models are selected. Importantly, the reversible jump Markov chain Monte Carlo algorithm described here allows estimation of phylogeny (and other phylogenetic model parameters) to be performed while accounting for uncertainty in the model of DNA substitution.  相似文献   

12.
Most existing statistical methods for mapping quantitative trait loci (QTL) are not suitable for analyzing survival traits with a skewed distribution and censoring mechanism. As a result, researchers incorporate parametric and semi-parametric models of survival analysis into the framework of the interval mapping for QTL controlling survival traits. In survival analysis, accelerated failure time (AFT) model is considered as a de facto standard and fundamental model for data analysis. Based on AFT model, we propose a parametric approach for mapping survival traits using the EM algorithm to obtain the maximum likelihood estimates of the parameters. Also, with Bayesian information criterion (BIC) as a model selection criterion, an optimal mapping model is constructed by choosing specific error distributions with maximum likelihood and parsimonious parameters. Two real datasets were analyzed by our proposed method for illustration. The results show that among the five commonly used survival distributions, Weibull distribution is the optimal survival function for mapping of heading time in rice, while Log-logistic distribution is the optimal one for hyperoxic acute lung injury.  相似文献   

13.
基于黑龙江省孟家岗林场、东京城林业局和林口林业局不同林分条件下103株人工红松解析木的2977个圆盘数据,结合林业研究中常见的Kozak(1988)、Muhairwe(1999)、Lee(2003)、Kozak(2004)可变指数削度方程,构建带皮直径、心材直径、边材宽度、树皮厚度的削度方程,并对比选出最优基础模型;采用SAS软件PROC MODEL模块中似乎不相关回归(SUR),建立带皮直径、心材直径、边材宽度和树皮厚度削度方程的可加性模型系统,同时将区域作为哑变量引入模型中,通过调整确定系数(Radj2)、均方根误差(RMSE)、赤池信息准则(AIC)、贝叶斯信息准则(BIC)等模型评价指标,对模型进行综合评价。结果表明: 带皮直径、心材直径、边材宽度和树皮厚度最优基础模型均为Kozak(2004);可加性模型系统在满足各分量与总量可加性的基础上,也得到较好的预测效果,预估精度均达到98%以上,引入哑变量后,可加性模型系统预测能力均有不同程度的提升,尤其心材直径和边材宽度预测能力提升更显著;不同区域带皮直径和树皮厚度削度差异较小,而心材直径、边材宽度的削度存在明显差异。本研究构建的包含哑变量可加性模型系统,不但模型预测精度较高,还满足带皮直径、心材直径、边材宽度和树皮厚度之间的可加性逻辑,为人工红松心边材及树皮材积的准确估测提供了基础。  相似文献   

14.
One of the most frequently used models for understanding human navigation on the Web is the Markov chain model, where Web pages are represented as states and hyperlinks as probabilities of navigating from one page to another. Predominantly, human navigation on the Web has been thought to satisfy the memoryless Markov property stating that the next page a user visits only depends on her current page and not on previously visited ones. This idea has found its way in numerous applications such as Google''s PageRank algorithm and others. Recently, new studies suggested that human navigation may better be modeled using higher order Markov chain models, i.e., the next page depends on a longer history of past clicks. Yet, this finding is preliminary and does not account for the higher complexity of higher order Markov chain models which is why the memoryless model is still widely used. In this work we thoroughly present a diverse array of advanced inference methods for determining the appropriate Markov chain order. We highlight strengths and weaknesses of each method and apply them for investigating memory and structure of human navigation on the Web. Our experiments reveal that the complexity of higher order models grows faster than their utility, and thus we confirm that the memoryless model represents a quite practical model for human navigation on a page level. However, when we expand our analysis to a topical level, where we abstract away from specific page transitions to transitions between topics, we find that the memoryless assumption is violated and specific regularities can be observed. We report results from experiments with two types of navigational datasets (goal-oriented vs. free form) and observe interesting structural differences that make a strong argument for more contextual studies of human navigation in future work.  相似文献   

15.
Genetic growth curves of boars in a test station were predicted on daily weight records collected by automated weighing scales. The data contained 121 865 observations from 1477 Norwegian Landrace boars and 108 589 observations from 1300 Norwegian Duroc boars. Random regression models using Legendre polynomials up to second order for weight at different ages were compared for best predicting ability and Bayesian information criterion (BIC) for both breeds. The model with second-order polynomials had best predictive ability and BIC. The heritability for weight, based on this model, was found to vary along the growth trajectory between 0.32-0.35 for Duroc and 0.17-0.25 for Landrace. By varying test length possibility to use shorter test time and pre-selection was tested. Test length was varied and compared with average termination at 100 kg, termination of the test at 90 kg gives, e.g. 2% reduction in accuracy of estimated breeding values (EBV) for both breeds and termination at 80 kg gives 5% reduction in accuracy of EBVs for Landrace and 3% for Duroc. A shorter test period can decrease test costs per boar, but also gives possibilities to increase selection intensity as there will be room for testing more boars.  相似文献   

16.
Bogdan M  Ghosh JK  Doerge RW 《Genetics》2004,167(2):989-999
The problem of locating multiple interacting quantitative trait loci (QTL) can be addressed as a multiple regression problem, with marker genotypes being the regressor variables. An important and difficult part in fitting such a regression model is the estimation of the QTL number and respective interactions. Among the many model selection criteria that can be used to estimate the number of regressor variables, none are used to estimate the number of interactions. Our simulations demonstrate that epistatic terms appearing in a model without the related main effects cause the standard model selection criteria to have a strong tendency to overestimate the number of interactions, and so the QTL number. With this as our motivation we investigate the behavior of the Schwarz Bayesian information criterion (BIC) by explaining the phenomenon of the overestimation and proposing a novel modification of BIC that allows the detection of main effects and pairwise interactions in a backcross population. Results of an extensive simulation study demonstrate that our modified version of BIC performs very well in practice. Our methodology can be extended to general populations and higher-order interactions.  相似文献   

17.
For time series of count data, correlated measurements, clustering as well as excessive zeros occur simultaneously in biomedical applications. Ignoring such effects might contribute to misleading treatment outcomes. A generalized mixture Poisson geometric process (GMPGP) model and a zero‐altered mixture Poisson geometric process (ZMPGP) model are developed from the geometric process model, which was originally developed for modelling positive continuous data and was extended to handle count data. These models are motivated by evaluating the trend development of new tumour counts for bladder cancer patients as well as by identifying useful covariates which affect the count level. The models are implemented using Bayesian method with Markov chain Monte Carlo (MCMC) algorithms and are assessed using deviance information criterion (DIC).  相似文献   

18.
Wang X 《Genomics》2012,99(2):90-95
Two-gene classifiers have attracted a broad interest for their simplicity and practicality. Most existing two-gene classification algorithms were involved in exhaustive search that led to their low time-efficiencies. In this study, we proposed two new two-gene classification algorithms which used simple univariate gene selection strategy and constructed simple classification rules based on optimal cut-points for two genes selected. We detected the optimal cut-point with the information entropy principle. We applied the two-gene classification models to eleven cancer gene expression datasets and compared their classification performance to that of some established two-gene classification models like the top-scoring pairs model and the greedy pairs model, as well as standard methods including Diagonal Linear Discriminant Analysis, k-Nearest Neighbor, Support Vector Machine and Random Forest. These comparisons indicated that the performance of our two-gene classifiers was comparable to or better than that of compared models.  相似文献   

19.
An important task in the application of Markov models to the analysis of ion channel data is the determination of the correct gating scheme of the ion channel under investigation. Some prior knowledge from other experiments can reduce significantly the number of possible models. If these models are standard statistical procedures nested like likelihood ratio testing, provide reliable selection methods. In the case of non-nested models, information criteria like AIC, BIC, etc., are used. However, it is not known if any of these criteria provide a reliable selection method and which is the best one in the context of ion channel gating. We provide an alternative approach to model selection in the case of non-nested models with an equal number of open and closed states. The models to choose from are embedded in a properly defined general model. Therefore, we circumvent the problems of model selection in the non-nested case and can apply model selection procedures for nested models.  相似文献   

20.
It is known that the Kimura 3ST model of sequence evolution on phylogenetic trees can be extended quite naturally to arbitrary split systems. However, this extension relies heavily on mathematical peculiarities of the associated Hadamard transformation, and providing an analogous augmentation of the general Markov model has thus far been elusive. In this paper, we rectify this shortcoming by showing how to extend the general Markov model on trees to include incompatible edges; and even further to more general network models. This is achieved by exploring the algebra of the generators of the continuous-time Markov chain together with the “splitting” operator that generates the branching process on phylogenetic trees. For simplicity, we proceed by discussing the two state case and then show that our results are easily extended to more states with little complication. Intriguingly, upon restriction of the two state general Markov model to the parameter space of the binary symmetric model, our extension is indistinguishable from the Hadamard approach only on trees; as soon as any incompatible splits are introduced the two approaches give rise to differing probability distributions with disparate structure. Through exploration of a simple example, we give an argument that our extension to more general networks has desirable properties that the previous approaches do not share. In particular, our construction allows for convergent evolution of previously divergent lineages; a property that is of significant interest for biological applications.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号