Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
Shannon entropy H and related measures are increasingly used in molecular ecology and population genetics because (1) unlike measures based on heterozygosity or allele number, these measures weigh alleles in proportion to their population fraction, thus capturing a previously ignored aspect of allele frequency distributions that may be important in many applications; (2) these measures connect directly to the rich predictive mathematics of information theory; (3) Shannon entropy is completely additive and has an explicitly hierarchical nature; and (4) Shannon entropy-based differentiation measures obey strong monotonicity properties that heterozygosity-based measures lack. We derive simple new expressions for the expected values of the Shannon entropy of the equilibrium allele distribution at a neutral locus in a single isolated population under two models of mutation: the infinite allele model and the stepwise mutation model. Surprisingly, this complex stochastic system for each model has an entropy expressible as a simple combination of well-known mathematical functions. Moreover, entropy- and heterozygosity-based measures for each model are linked by simple relationships that are shown by simulations to be approximately valid even far from equilibrium. We also identify a bridge between the two models of mutation. We apply our approach to subdivided populations which follow the finite island model, obtaining the Shannon entropy of the equilibrium allele distributions of the subpopulations and of the total population. We also derive the expected mutual information and normalized mutual information (“Shannon differentiation”) between subpopulations at equilibrium, and identify the model parameters that determine them. We apply our measures to data from the common starling (Sturnus vulgaris) in Australia. Our measures provide a test for neutrality that is robust to violations of equilibrium assumptions, as verified on real-world data from starlings.
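
To make the contrast between entropy-based and heterozygosity-based measures in the abstract above concrete, here is a minimal sketch that computes both from a list of allele frequencies. The function names and the two example frequency vectors are illustrative assumptions and are not taken from the paper; the paper's expected-value formulas under the mutation models are not reproduced here.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy H = -sum(p * ln p), in nats; absent alleles contribute nothing."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def expected_heterozygosity(freqs):
    """Gene diversity: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in freqs)

# Hypothetical allele frequencies at one locus (illustrative only).
even = [0.25, 0.25, 0.25, 0.25]
skewed = [0.85, 0.05, 0.05, 0.05]

for name, freqs in (("even", even), ("skewed", skewed)):
    print(name,
          round(shannon_entropy(freqs), 4),
          round(expected_heterozygosity(freqs), 4))
```

Both measures are largest for the even distribution, but they weight rare alleles differently: entropy weights each allele by p ln p, heterozygosity by p squared.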

2.
Commonly observed patterns typically follow a few distinct families of probability distributions. Over one hundred years ago, Karl Pearson provided a systematic derivation and classification of the common continuous distributions. His approach was phenomenological: a differential equation that generated common distributions without any underlying conceptual basis for why common distributions have particular forms and what explains the familial relations. Pearson's system and its descendants remain the most popular systematic classification of probability distributions. Here, we unify the disparate forms of common distributions into a single system based on two meaningful and justifiable propositions. First, distributions follow maximum entropy subject to constraints, where maximum entropy is equivalent to minimum information. Second, different problems associate magnitude to information in different ways, an association we describe in terms of the relation between information invariance and measurement scale. Our framework relates the different continuous probability distributions through the variations in measurement scale that change each family of maximum entropy distributions into a distinct family. From our framework, future work in biology can consider the genesis of common patterns in a new and more general way. Particular biological processes set the relation between the information in observations and magnitude, the basis for information invariance, symmetry and measurement scale. The measurement scale, in turn, determines the most likely probability distributions and observed patterns associated with particular processes. This view presents a fundamentally derived alternative to the largely unproductive debates about neutrality in ecology and evolution.

3.
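The maximum information entropy principle and population genetic equilibrium   Total citations: 29 (self: 0, others: 29)
A unified mathematical model is established for deriving the law of population genetic equilibrium from the maximum information entropy principle, and a unified solution of the model is given. This solution is exactly the set of genotype frequencies that the Hardy-Weinberg law gives for an equilibrium population, which shows that when the information entropy of a population reaches its maximum, the genotype frequencies no longer change; that is, the population has reached "equilibrium". This proves that the maximum entropy distribution is the Hardy-Weinberg equilibrium distribution. The intrinsic consistency between the Hardy-Weinberg law and the maximum entropy principle indicates that hybridization and random mating are irreversible processes that increase the genotype information entropy of a population and increase its disorder, whereas selection and inbreeding decrease the information entropy and increase order; breeding is in effect a process of regulating the information entropy of a population. Information entropy expresses the uncertainty of a probability distribution, and the maximum entropy principle means choosing, under given constraints, the distribution with the greatest uncertainty, which is therefore the most random. The maximum entropy principle has been applied widely and successfully in the natural and social sciences, including information science, engineering, astronomy, geography, image processing, and pattern recognition; this paper demonstrates its general applicability from the perspective of population genetics. Entropy is a function describing the state of a system, while the maximum entropy principle indicates the direction in which a system evolves: the final state of the system must be the state in which entropy has increased to its maximum, and this holds for any system. Therefore, the equilibrium laws of population genetic systems can be judged and described uniformly by the maximum entropy principle; the genotype information entropy of any population tends to increase as it is transmitted across generations under random mating; and a population is said to have reached genetic equilibrium when its genotype information entropy reaches its maximum under given constraints. This paper applies information theory to population genetics, reveals the biological meaning of gene information entropy, and shows that the principles and methods of information science and cybernetics can be used to study problems in population genetics.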
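
The link between maximum entropy and Hardy-Weinberg proportions described in the abstract above can be checked numerically for one biallelic locus. The sketch below is an illustration under stated assumptions, not the paper's model: genotypes are treated as ordered pairs of alleles (AA, Aa, aA, aa), the allele frequency `p` is held fixed, and a hypothetical parameter `d` measures the deviation from Hardy-Weinberg proportions. A grid search then looks for the `d` that maximizes the entropy.

```python
import math

def ordered_genotype_entropy(p, d):
    """Entropy (nats) of the ordered genotype distribution with allele
    frequency p and deviation d from Hardy-Weinberg proportions."""
    q = 1.0 - p
    probs = [p * p + d, p * q - d, p * q - d, q * q + d]  # AA, Aa, aA, aa
    return -sum(x * math.log(x) for x in probs if x > 0)

p = 0.3          # hypothetical allele frequency
q = 1.0 - p
# Admissible deviations keep every genotype probability non-negative.
d_min, d_max = -min(p * p, q * q), p * q
steps = 2000
grid = [d_min + (d_max - d_min) * i / steps for i in range(steps + 1)]

best_d = max(grid, key=lambda d: ordered_genotype_entropy(p, d))
print(best_d)
```

Every value of `d` leaves the allele frequency unchanged, so the search compares genotype distributions that share the same constraint. The entropy peaks at `d` close to zero, which is the Hardy-Weinberg case.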

4.
A theory of thermal fluctuations in DNA miniplasmids.   Total citations: 2 (self: 0, others: 2)
I Tobias. Biophysical Journal, 1998, 74(5): 2545-2553
A recent analysis of the normal modes of vibration of a ring formed by bringing together and sealing, with or without the addition of twist, the ends of rods that are straight when stress free is taken as the basis for a theory of the statistical thermodynamics of a canonical ensemble of DNA minicircles with specified linking number difference ΔLk and number N of base pairs. It is assumed that N corresponds to a circumference in the range of one or two persistence lengths. For such an ensemble, the theory yields an expression for the average writhe ⟨Wr⟩, which can be employed to calculate the free energy, entropy, and enthalpy of supercoiling, ΔGsc, ΔSsc, and ΔHsc. The results obtained for the dependence of ΔGsc on ΔLk and N are in accord with experimental observations of equilibrium distributions of topoisomers of plasmids with N ≈ 200 bp.

5.
Techniques have recently become available to label protein subunits with fluorescent probes at predetermined orientation relative to the protein coordinates. The known local orientation enables quantitative interpretation of fluorescence polarization experiments in terms of orientation and motions of the protein within a larger macromolecular assembly. Combining data obtained from probes placed at several distinct orientations relative to the protein structure reveals functionally relevant information about the axial and azimuthal orientation of the labeled protein segment relative to its surroundings. Here we present an analytical method to determine the protein orientational distribution from such data. The method produces the broadest distribution compatible with the data by maximizing its informational entropy. The key advantages of this approach are that no a priori assumptions are required about the shape of the distribution and that a unique, exact fit to the data is obtained. The relative orientations of the probes used for the experiments have great influence on information content of the maximum entropy distribution. Therefore, the choice of probe orientations is crucial. In particular, the probes must access independent aspects of the protein orientation, and two-fold rotational symmetries must be avoided. For a set of probes, a "figure of merit" is proposed, based on the independence among the probe orientations. With simulated fluorescence polarization data, we tested the capacity of maximum entropy analysis to recover specific protein orientational distributions and found that it is capable of recovering orientational distributions with one and two peaks. The similarity between the maximum entropy distribution and the test distribution improves gradually as the number of independent probe orientations increases. 
As a practical example, ME distributions were determined with experimental data from muscle fibers labeled with bifunctional rhodamine at known orientations with respect to the myosin regulatory light chain (RLC). These distributions show a complex relationship between the axial orientation of the RLC relative to the fiber axis and the azimuthal orientation of the RLC about its own axis. Maximum entropy analysis reveals limitations in available experimental data and supports the design of further probe angles to resolve details of the orientational distribution.

6.
A statistical explanation of MaxEnt for ecologists   Total citations: 9 (self: 0, others: 9)
MaxEnt is a program for modelling species distributions from presence-only species records. This paper is written for ecologists and describes the MaxEnt model from a statistical perspective, making explicit links between the structure of the model, decisions required in producing a modelled distribution, and knowledge about the species and the data that might affect those decisions. To begin we discuss the characteristics of presence-only data, highlighting implications for modelling distributions. We particularly focus on the problems of sample bias and lack of information on species prevalence. The keystone of the paper is a new statistical explanation of MaxEnt which shows that the model minimizes the relative entropy between two probability densities (one estimated from the presence data and one from the landscape) defined in covariate space. For many users, this viewpoint is likely to be a more accessible way to understand the model than previous ones that rely on machine learning concepts. We then step through a detailed explanation of MaxEnt describing key components (e.g. covariates and features, and definition of the landscape extent), the mechanics of model fitting (e.g. feature selection, constraints and regularization) and outputs. Using case studies for a Banksia species native to south-west Australia and a riverine fish, we fit models and interpret them, exploring why certain choices affect the result and what this means. The fish example illustrates use of the model with vector data for linear river segments rather than raster (gridded) data. Appropriate treatments for survey bias, unprojected data, locally restricted species, and predicting to environments outside the range of the training data are demonstrated, and new capabilities discussed. Online appendices include additional details of the model and the mathematical links between previous explanations and this one, example code and data, and further information on the case studies.

7.
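On the consistency between the maximum information entropy principle and population genetic equilibrium   Total citations: 16 (self: 1, others: 15)
Zhang Hongli (张宏礼), Zhang Hongyan (张鸿雁). Hereditas (Beijing) (《遗传》), 2006, 28(3): 324-328
Wang Xiaolong (汪小龙) et al. established a unified mathematical model that derives population genetic equilibrium at a single locus from the maximum information entropy principle, and gave the maximizing solution of the model, which is exactly the set of genotype frequencies given by the Hardy-Weinberg equilibrium law. This shows that when the genotype information entropy of a population is at its maximum, the genotype frequencies no longer change and the population is in equilibrium, which proves that the maximum information entropy principle is consistent with the Hardy-Weinberg equilibrium law. They also stated that this conclusion can be extended to populations with migration, mutation, selection, genetic drift, and inbreeding, and to the case of multiple loci. In short: the maximum information entropy principle is consistent with population genetic equilibrium. However, they proved this consistency only for the Hardy-Weinberg law at a single locus. Within that scope, this paper extends the result to multiple loci, each with multiple alleles. As for whether the maximum information entropy principle is consistent with other forms of population genetic equilibrium, their conclusion is only a conjecture and was not rigorously derived. In fact, extending this consistency to populations with migration, mutation, random drift, and inbreeding is not necessarily correct.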

8.
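Based on the distribution of stop codons and their reverse-complement stop codons in a DNA sequence, a new method for constructing DNA sequence vectors is presented. Using Shannon entropy theory, the Jensen-Shannon divergence and the Kullback-Leibler (KL) divergence are modified and compared. Experiments show that the method predicts the boundaries between coding and non-coding regions of DNA sequences with an efficiency of 86%, significantly higher than the 70% reported by Bernaola et al.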
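
As a rough illustration of how a Jensen-Shannon divergence can locate a boundary between two regions of a sequence, the sketch below scans every cut point and keeps the one where the two sides differ most in composition. It is a simplified stand-in, not the method of the abstract above: it uses plain nucleotide composition instead of the stop-codon vectors described there, applies no modification to the divergence, and the function names, the `margin` parameter, and the toy sequence are all assumptions.

```python
import math
from collections import Counter

def entropy(counts, total):
    """Shannon entropy (bits) of a symbol count table."""
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

def js_divergence(left, right):
    """Length-weighted Jensen-Shannon divergence between the symbol
    compositions of two segments."""
    n1, n2 = len(left), len(right)
    n = n1 + n2
    c1, c2 = Counter(left), Counter(right)
    return (entropy(c1 + c2, n)
            - (n1 / n) * entropy(c1, n1)
            - (n2 / n) * entropy(c2, n2))

def best_cut(seq, margin=5):
    """Cut position that maximizes the divergence between the two sides."""
    return max(range(margin, len(seq) - margin),
               key=lambda i: js_divergence(seq[:i], seq[i:]))

# Toy sequence whose composition changes at position 60.
seq = "AT" * 30 + "GC" * 30
print(best_cut(seq))
```

The `margin` keeps the scan away from the sequence ends, where one side would be too short for its composition to be estimated.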

9.
Amino acid background distribution is an important factor for entropy-based methods which extract sequence conservation information from protein multiple sequence alignments (MSAs). However, MSAs are usually not large enough to yield a reliable observed background distribution. In this paper, we propose two new estimations of background distribution. One is an integration of the observed background distribution and the position-specific residue distribution, and the other is a normalized square root of observed background frequency. To validate these new background distributions, they are applied to the relative entropy model to find catalytic sites and ligand binding sites from protein MSAs. Experimental results show that they are superior to the observed background distribution in predicting functionally important residues.
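
The relative entropy score and the square-root background mentioned in the abstract above can be sketched as follows. This is a minimal reading of the abstract, not the authors' implementation: the toy alignment, the function names, and the absence of pseudocounts are assumptions, and the first proposed background (the one integrating position-specific distributions) is not attempted because the abstract does not specify it.

```python
import math
from collections import Counter

def frequencies(residues):
    """Relative frequency of each residue in a string."""
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}

def sqrt_background(background):
    """Normalized square root of the observed background frequencies."""
    roots = {aa: math.sqrt(f) for aa, f in background.items()}
    norm = sum(roots.values())
    return {aa: r / norm for aa, r in roots.items()}

def relative_entropy(column, background):
    """KL divergence (bits) of a column's residue distribution from the background."""
    p = frequencies(column)
    return sum(f * math.log2(f / background[aa]) for aa, f in p.items())

# Toy alignment: rows are sequences, columns are positions (hypothetical data).
msa = ["ACDA", "ACEA", "ACDK", "GCEA"]
background = frequencies("".join(msa))
columns = ["".join(row[i] for row in msa) for i in range(len(msa[0]))]

for i, col in enumerate(columns):
    print(i, col,
          round(relative_entropy(col, background), 3),
          round(relative_entropy(col, sqrt_background(background)), 3))
```

A higher score marks a column whose residue distribution departs more from the background, which is the sense in which such scores are used to flag conserved positions.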

10.
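Gene regulatory network models provide a new research framework and platform for a deeper understanding of the nature of life. One such model, the mutual information relevance network, uses entropy and mutual information to describe associations between genes. This paper describes a method for measuring gene expression similarity with mutual information, proposes a bootstrap-based algorithm for estimating mutual information, and proposes a strategy to correct the bias that arises. Experimental results show that the improved mutual information estimator effectively increases the accuracy of gene expression similarity estimates.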
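
A bootstrap estimate of mutual information between two expression profiles might look like the sketch below. It is only an illustration of the general idea: the equal-width binning, the plug-in estimator, the number of replicates, and all function names are assumptions, and the bias-correction strategy of the abstract above is not reproduced because the abstract does not describe it.

```python
import math
import random
from collections import Counter

def discretize(values, bins=3):
    """Equal-width binning of expression values into integer labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(xs, ys):
    """Plug-in mutual information (bits) between two label sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def bootstrap_mi(xs, ys, replicates=200, seed=0):
    """Mean and standard deviation of MI over resamples of paired observations."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(mutual_information([xs[i] for i in idx],
                                            [ys[i] for i in idx]))
    mean = sum(estimates) / replicates
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (replicates - 1))
    return mean, sd

# Synthetic expression profiles (hypothetical data): gene_b tracks gene_a.
rng = random.Random(1)
gene_a = [rng.gauss(0, 1) for _ in range(100)]
gene_b = [a + rng.gauss(0, 0.5) for a in gene_a]
mean_mi, sd_mi = bootstrap_mi(discretize(gene_a), discretize(gene_b))
print(round(mean_mi, 3), round(sd_mi, 3))
```

Resampling observation pairs together, instead of resampling each gene separately, preserves the dependence between the two profiles that the estimate is meant to measure.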

11.
Naganathan AN, Doshi U, Fung A, Sadqi M, Muñoz V. Biochemistry, 2006, 45(28): 8466-8475
For many decades, protein folding experimentalists have worked with no information about the time scales of relevant protein folding motions and without methods for estimating the height of folding barriers. Protein folding experiments have been interpreted using chemical models in which the folding process is characterized as a series of equilibria between two or more distinct states that interconvert with activated kinetics. Accordingly, the information to be extracted from experiments was circumscribed to apparent equilibrium constants and relative folding rates. Recent developments are changing this situation dramatically. The combination of fast-folding experiments with the development of analytical methods more closely connected to physical theory reveals that folding barriers in native conditions range from minimally high (approximately 14RT for the very slow folder AcP) to nonexistent. While slow-folding (i.e., ≥ 1 ms) single-domain proteins are expected to fold in a two-state fashion, microsecond-folding proteins should exhibit complex behavior arising from crossing marginal or negligible folding barriers. This realization opens a realm of exciting opportunities for experimentalists. The free energy surface of a protein with a marginal (or no) barrier can be mapped using equilibrium experiments, which could resolve energetic factors from structural factors in folding. Kinetic experiments on these proteins provide the unique opportunity to measure folding dynamics directly. Furthermore, the complex distributions of time-dependent folding behaviors expected for these proteins might be accessible to single-molecule measurements. Here, we discuss some of these recent developments in protein folding, emphasizing aspects that can serve as a guide for experimentalists interested in exploiting this new avenue of research.

12.
Rhodes CJ, Demetrius L. PLoS ONE, 2010, 5(9): e12951

Background

Standard epidemiological theory claims that in structured populations competition between multiple pathogen strains is a deterministic process which is mediated by the basic reproduction number (R0) of the individual strains. A new theory based on analysis, simulation and empirical study challenges this predictor of success.

Principal Findings

We show that the quantity R0 is a valid predictor in structured populations only when population size is infinite. In this article we show that when population size is finite the dynamics of infection by multi-strain pathogens is a stochastic process whose outcome can be predicted by evolutionary entropy, S, an information theoretic measure which describes the uncertainty in the infectious age of an infected parent of a randomly chosen new infective. Evolutionary entropy characterises the demographic stability or robustness of the population of infectives. This statistical parameter determines the duration of infection and thus provides a quantitative index of the pathogenicity of a strain. Standard epidemiological theory based on R0 as a measure of selective advantage is the limit as the population size tends to infinity of the entropic selection theory. The standard model is an approximation to the entropic selection theory whose validity increases with population size.

Conclusion

An epidemiological analysis based on entropy is shown to explain empirical observations regarding the emergence of less pathogenic strains of human influenza during the antigenic drift phase. Furthermore, we exploit the entropy perspective to discuss certain epidemiological patterns of the current H1N1 swine flu outbreak.

13.
Information theory allows us to investigate information processing in neural systems in terms of information transfer, storage and modification. Especially the measure of information transfer, transfer entropy, has seen a dramatic surge of interest in neuroscience. Estimating transfer entropy from two processes requires the observation of multiple realizations of these processes to estimate associated probability density functions. To obtain these necessary observations, available estimators typically assume stationarity of processes to allow pooling of observations over time. This assumption, however, is a major obstacle to the application of these estimators in neuroscience as observed processes are often non-stationary. As a solution, Gomez-Herrero and colleagues theoretically showed that the stationarity assumption may be avoided by estimating transfer entropy from an ensemble of realizations. Such an ensemble of realizations is often readily available in neuroscience experiments in the form of experimental trials. Thus, in this work we combine the ensemble method with a recently proposed transfer entropy estimator to make transfer entropy estimation applicable to non-stationary time series. We present an efficient implementation of the approach that is suitable for the increased computational demand of the ensemble method's practical application. In particular, we use a massively parallel implementation for a graphics processing unit to handle the most computationally heavy aspects of the ensemble method for transfer entropy estimation. We test the performance and robustness of our implementation on data from numerical simulations of stochastic processes. We also demonstrate the applicability of the ensemble method to magnetoencephalographic data.
While we mainly evaluate the proposed method for neuroscience data, we expect it to be applicable in a variety of fields that are concerned with the analysis of information transfer in complex biological, social, and artificial systems.

14.
Weinberger ED. Bio Systems, 2002, 66(3): 105-119
'Standard' information theory says nothing about the semantic content of information. Nevertheless, applications such as evolutionary theory demand consideration of precisely this aspect of information, a need that has motivated a largely unsuccessful search for a suitable measure of an 'amount of meaning'. This paper represents an attempt to move beyond this impasse, based on the observation that the meaning of a message can only be understood relative to its receiver. Positing that the semantic value of information is its usefulness in making an informed decision, we define pragmatic information as the information gain in the probability distributions of the receiver's actions, both before and after receipt of a message in some pre-defined ensemble. We then prove rigorously that our definition is the only one that satisfies obvious desiderata, such as the additivity of information from logically independent messages. This definition, when applied to the information 'learned' by the time evolution of a process, defies the intuitions of the few previous researchers thinking along these lines by being monotonic in the uncertainty that remains after receipt of the message, but non-monotonic in the Shannon entropy of the input ensemble. It also follows that the pragmatic information of the genetic 'messages' in an evolving population is a global Lyapunov function for Eigen's quasi-species model of biological evolution. A concluding section argues that a theory such as ours must explicitly acknowledge purposeful action, or 'agency', in such diverse fields as evolutionary theory and finance.

15.
Episodes of population growth and decline leave characteristic signatures in the distribution of nucleotide (or restriction) site differences between pairs of individuals. These signatures appear in histograms showing the relative frequencies of pairs of individuals who differ by i sites, where i = 0, 1, .... In this distribution an episode of growth generates a wave that travels to the right, traversing 1 unit of the horizontal axis every 1/(2u) generations, where u is the mutation rate. The smaller the initial population, the steeper will be the leading face of the wave. The larger the increase in population size, the smaller will be the distribution's vertical intercept. The implications of continued exponential growth are indistinguishable from those of a sudden burst of population growth. Bottlenecks in population size also generate waves similar to those produced by a sudden expansion, but with elevated upper-tail probabilities. Reductions in population size initially generate L-shaped distributions with high probability of identity, but these converge rapidly to a new equilibrium. In equilibrium populations the theoretical curves are free of waves. However, computer simulations of such populations generate empirical distributions with many peaks and little resemblance to the theory. On the other hand, agreement is better in the transient (nonequilibrium) case, where simulated empirical distributions typically exhibit waves very similar to those predicted by theory. Thus, waves in empirical distributions may be rich in information about the history of population dynamics.
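
The histogram described in the abstract above, the relative frequency of pairs of individuals that differ at i sites, can be computed directly from aligned sequences. The sketch below shows one way to do it; the function name and the four toy sequences are illustrative assumptions, and equal-length aligned sequences are assumed.

```python
from collections import Counter
from itertools import combinations

def mismatch_distribution(sequences):
    """Relative frequency of sequence pairs that differ at i sites, i = 0, 1, ..."""
    diffs = Counter(sum(a != b for a, b in zip(s, t))
                    for s, t in combinations(sequences, 2))
    pairs = sum(diffs.values())
    return [diffs.get(i, 0) / pairs for i in range(max(diffs) + 1)]

# Toy sample of aligned sequences (hypothetical data).
sample = ["AAAAAA", "AAAAAT", "AAAATT", "AAAAAA"]
print(mismatch_distribution(sample))
```

Entry i of the returned list is the height of the histogram at i differences, so entry 0 is the vertical intercept the abstract refers to.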

16.
In contracting muscle, individual myosin molecules function as part of a large ensemble, hydrolyzing ATP to power the relative sliding of actin filaments. The technological advances that have enabled direct observation and manipulation of single molecules, including recent experiments that have explored myosin's force-dependent properties, provide detailed insight into the kinetics of myosin's mechanochemical interaction with actin. However, it has been difficult to reconcile these single-molecule observations with the behavior of myosin in an ensemble. Here, using a combination of simulations and theory, we show that the kinetic mechanism derived from single-molecule experiments describes ensemble behavior; but the connection between single molecule and ensemble is complex. In particular, even in the absence of external force, internal forces generated between myosin molecules in a large ensemble accelerate ADP release and increase how far actin moves during a single myosin attachment. These myosin-induced changes in strong binding lifetime and attachment distance cause measurable properties, such as actin speed in the motility assay, to vary depending on the number of myosin molecules interacting with an actin filament. This ensemble-size effect challenges the simple detachment-limited model of motility, because even when motility speed is limited by ADP release, increasing attachment rate can increase motility speed.

17.
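A Shannon information method for measuring gene variation in non-equilibrium populations   Total citations: 14 (self: 2, others: 14)
Based on Shannon information, the concepts of relative genotype information S′(G), relative homozygote information S′J(G), and relative heterozygote information S′H(G) are established for non-equilibrium populations and given genetic interpretations. A theoretical comparison with gene identity J and gene diversity D shows that the two sets of measures agree well in their quantitative behavior but are relatively independent index systems, and that the relative information measures carry additional meaning. S′(G) characterizes gene variation and also reflects genetic variation at the genotype level; S′J(G) mainly reflects genetic variation among homozygotes, and S′H(G) mainly reflects genetic variation among heterozygotes. Each relative information measure can reflect the degree of genetic variation within a population and can also be used to compare the degree of genetic variation between loci.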

18.
Quantifying distributed information processing is crucial to understanding collective motion in animal groups. Recent studies have begun to apply rigorous methods based on information theory to quantify such distributed computation. Following this perspective, we use transfer entropy to quantify dynamic information flows locally in space and time across a school of fish during directional changes around a circular tank, i.e., U-turns. This analysis reveals peaks in information flows during collective U-turns and identifies two different flows: an informative flow (positive transfer entropy) from fish that have already turned to fish that are turning, and a misinformative flow (negative transfer entropy) from fish that have not turned yet to fish that are turning. We also reveal that the information flows are related to relative position and alignment between fish and identify spatial patterns of information and misinformation cascades. This study offers several methodological contributions and we expect further application of these methodologies to reveal intricacies of self-organisation in other animal groups and active matter in general.

19.
20.
A methodology using biosensor technology for combined kinetic and thermodynamic analysis of biomolecular interactions is described. Rate and affinity constants are determined with BIAcore. Thermodynamic parameters, changes in free energy, enthalpy and entropy, are evaluated from equilibrium data and by using rate constants and transition state theory. The methodology using van't Hoff theory gives complementary information to microcalorimetry, since only the direct binding is measured with BIAcore whereas microcalorimetry measures all components, including e.g. hydration effects. Furthermore, BIAcore gives possibilities to gain new information by thermodynamic analysis of the rate constants.
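
The van't Hoff step mentioned in the abstract above amounts to a straight-line fit of ln K against 1/T, using ln K = -ΔH/(RT) + ΔS/R. The sketch below shows that fit on synthetic data. It assumes that ΔH and ΔS do not vary with temperature, and the temperatures, the "true" parameter values, and the function names are invented for illustration; no BIAcore data or software interface is involved.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)

def vant_hoff_fit(temps_k, affinities):
    """Least-squares fit of ln K = -dH/(R*T) + dS/R.
    Returns (dH in J/mol, dS in J/(mol*K))."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in affinities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return -slope * R, intercept * R

def free_energy(d_h, d_s, temp_k):
    """Gibbs free energy change: dG = dH - T * dS."""
    return d_h - temp_k * d_s

# Synthetic affinity constants from assumed dH = -50 kJ/mol, dS = -40 J/(mol*K).
temps = [278.15, 288.15, 298.15, 308.15]
true_dh, true_ds = -50_000.0, -40.0
ks = [math.exp(-(true_dh - t * true_ds) / (R * t)) for t in temps]

d_h, d_s = vant_hoff_fit(temps, ks)
print(round(d_h), round(d_s, 2), round(free_energy(d_h, d_s, 298.15)))
```

With real measurements, curvature in the plot of ln K against 1/T would signal a temperature-dependent ΔH, in which case the linear form used here no longer applies.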
