Similar articles
20 similar articles found (search time: 31 ms)
1.
The Pareto distribution, in which the probability density at large enough $x$ behaves as $\rho(x) \sim x^{-\alpha}$ ($\alpha \ge 2$), is important in both basic and practical respects. Its essential difference from the normal (Gaussian) distribution is that the probability of large deviations is much higher. Belief in the universal applicability of the normal distribution law remains common despite the lack of objective proof in many applied areas. Here we consider how a Pareto distribution arises in a dynamic system exposed to a noise field, and discuss the simplest one-dimensional models in which the system response can be approximated accurately enough by such a distribution over a broad range of the variable.
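
To make the contrast in large-deviation probabilities concrete, the following sketch compares the tail of a density proportional to $x^{-\alpha}$ with the tail of a standard normal. It is an illustration only: the lower cutoff `x_min = 1`, the choice `alpha = 3`, and the function names are assumptions made for this example and do not come from the abstract.

```python
import math

def pareto_tail(x, alpha, x_min=1.0):
    """P(X > x) for the density rho(x) = (alpha - 1) * x_min**(alpha - 1) * x**(-alpha), x >= x_min.

    x_min is an assumed lower cutoff needed to normalize the density.
    """
    if x <= x_min:
        return 1.0
    return (x_min / x) ** (alpha - 1)

def gaussian_tail(x, mu=0.0, sigma=1.0):
    """P(X > x) for a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

# Print both tail probabilities at a few thresholds (illustrative values only).
for x in (2.0, 5.0, 10.0):
    print(x, pareto_tail(x, alpha=3.0), gaussian_tail(x))
```

The power-law tail decays polynomially while the Gaussian tail decays faster than exponentially, so the gap between the two widens by many orders of magnitude as the threshold grows.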

2.
Based on a large repertoire of chromosomal rearrangement operations, the genomic distance $d$ between two genomes with $\chi_r$ and $\chi_b$ linear chromosomes, respectively, both containing the same (or orthologous) $n$ genes or markers, is $d = n + \max(\chi_r, \chi_b) - c$, where $c$ is the number of cycles in the breakpoint graph of the two genomes. In this paper, we study the exact probability distribution of $c$. We derive the expectation and variance, and show that, in the limit, the expectation of $d$ is $n - \frac{2\chi_r\chi_b}{2\chi_r + 2\chi_b - 1} - \frac{1}{2}\ln\frac{n + \max(\chi_r, \chi_b)}{\chi_r + \chi_b}$.

3.

Background  

The size distribution of gene families in a broad range of genomes is well approximated by a generalized Pareto function. Evolution of ensembles of gene families can be described with Birth, Death, and Innovation Models (BDIMs). Analysis of the properties of different versions of BDIMs has the potential of revealing important features of genome evolution.

4.
In the framework of reaction-diffusion theory we deal with the problem of pattern regulation in morphogenesis. A generic model is proposed where the kinetic terms follow constraints imposed by scale invariance considerations. These constraints allow a class of kinetic schemes to be formulated so that, starting with an initially homogeneous morphogen distribution in the field, a stable gradient is established of the form $S(\chi, L) = L^{p} f(\chi/L)$. Here $L$ is the length of the morphogenetic field, $\chi$ is the position variable and $f(\chi/L)$ is some monotonic function of the relative distance. With this distribution a scale-invariant gradient can be constructed which leads to pattern regulation. A linear stability analysis of the model permits the definition of the parameter values enabling the system to abandon the homogeneous state spontaneously. Simulations of the evolution of the system towards its final stable state result in approximate pattern invariance for different field lengths. The accuracy of this invariance is in agreement with some recent quantitative experimental findings in both developing and regenerating systems.

5.
We present a Bayesian statistical analysis of the conformations of side chains in proteins from the Protein Data Bank. This is an extension of the backbone-dependent rotamer library, and includes rotamer populations and average $\chi$ angles for a full range of $\phi$, $\psi$ values. The Bayesian analysis used here provides a rigorous statistical method for taking account of varying amounts of data. Bayesian statistics requires the assumption of a prior distribution for parameters over their range of possible values. This prior distribution can be derived from previous data or from pooling some of the present data. The prior distribution is combined with the data to form the posterior distribution, which is a compromise between the prior distribution and the data. For the $\chi_2$, $\chi_3$, and $\chi_4$ rotamer prior distributions, we assume that the probability of each rotamer type is dependent only on the previous $\chi$ rotamer in the chain. For the backbone-dependence of the $\chi_1$ rotamers, we derive prior distributions from the product of the $\phi$-dependent and $\psi$-dependent probabilities. Molecular mechanics calculations with the CHARMM22 potential show a strong similarity with the experimental distributions, indicating that proteins attain their lowest energy rotamers with respect to local backbone-side-chain interactions. The new library is suitable for use in homology modeling, protein folding simulations, and the refinement of X-ray and NMR structures.

6.
Employing extensive co-conversion data for selected and unselected sites of known molecular location in the rosy locus of Drosophila melanogaster, we determine the parameters of meiotic gene conversion tract length distribution. The tract length distribution for gene conversion events can be approximated by the equation $P(L \ge n) = \phi^{n}$, where $P$ is the probability that tract length ($L$) is greater than or equal to a specified number of nucleotides ($n$). From the co-conversion data, a maximum likelihood estimate with standard error for $\phi$ is $0.99717 \pm 0.00026$, corresponding to a mean conversion tract length of 352 base pairs. (Thus, gene conversion tract lengths are sufficiently small to allow for extensive shuffling of DNA sequence polymorphisms within a gene.) For selected site conversions there is a bias towards recovery of longer tracts. The distribution of conversion tract lengths associated with selected sites can be approximated by the equation $P(L \ge n \mid \text{selected}) = \phi^{n}(1 - n + n/\phi)$, where $P$ is now the probability that a selected site tract length ($L$) is greater than or equal to a specified number of nucleotides ($n$). For the optimal value of $\phi$ determined from the co-conversion analysis, the mean conversion tract length for selected sites is 706 base pairs. We discuss, in the light of this and other studies, the relationship between meiotic gene conversion and P element excision induced gap repair and determine that they are distinct processes defined by different parameters and, possibly, mechanisms.
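
The two mean tract lengths quoted in this abstract can be checked numerically from the survival functions. The sketch below assumes that $L$ takes non-negative integer values, so that its mean equals the sum of $P(L \ge n)$ over $n \ge 1$; the function names and the truncation tolerance are choices made for this illustration, not part of the original analysis.

```python
def survival_unselected(n, phi):
    """P(L >= n) = phi**n for unselected conversion tracts."""
    return phi ** n

def survival_selected(n, phi):
    """P(L >= n | selected) = phi**n * (1 - n + n/phi) for selected-site tracts."""
    return phi ** n * (1 - n + n / phi)

def mean_from_survival(survival, phi, tol=1e-12, max_n=10_000_000):
    """Mean of an integer-valued length, computed as the sum over n >= 1 of P(L >= n).

    The sum is truncated once a term falls below tol (an assumed cutoff).
    """
    total = 0.0
    for n in range(1, max_n):
        term = survival(n, phi)
        total += term
        if term < tol:
            break
    return total

PHI = 0.99717  # maximum likelihood estimate quoted in the abstract

mean_unselected = mean_from_survival(survival_unselected, PHI)
mean_selected = mean_from_survival(survival_selected, PHI)
print(round(mean_unselected), round(mean_selected))
```

Under the same assumption the sums have closed forms, $\phi/(1-\phi)$ and $(1+\phi)/(1-\phi)$, which round to the 352 and 706 base pairs reported.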

7.
The germination responses of a nondormant fraction of a seed population of Taraxacum officinale Weber at constant temperatures in the range 7–34°C were analysed through a time-course study. Maximal percentage germination (approximately 90%) was attained at temperatures 10–18°C, where simple linear relationships were observed between the temperature and the germination rates, i.e. the reciprocals of the time taken to germinate by subpopulations with 20–80% germination. There was a variation in the required ‘thermal times’ ($\theta$) which characterized the linear relationships, the distribution of which could be approximated for the seed population by the following distribution function: where $m$ is the median of the distribution, and $A$ is a shape parameter characterizing the pattern of the distribution. Final percentage germination decreased with increasing temperature from 20 to 32°C, where the final percentage germination vs. temperature plotted on a normal probability scale yielded a straight line, indicating the normality of the distribution of the upper limit temperature in the seed population. The estimated mean and standard deviation were 27.25 ± 3.75°C. The rate of germination for the subpopulation with 20–80% germination also decreased with increases in the temperature from 22 to 30°C. If the relationships between the temperature within this range and the rate for the subpopulations with 20–80% germination were approximated by the regression lines, the negative ‘thermal time’ characterizing the yielded linear relationship would have a distribution which could be approximated by the same function with the required thermal time for the relationship of suboptimal range. The parameters $m$ and $A$ for the negative ‘thermal time’ were determined to be 2870 K h and $1.7 \times 10^{-10}$ K$^{-3}$ h$^{-3}$.

8.
9.
A relatively simple method is proposed for the estimation of parameters of stage-structured populations from sample data for situations where (a) unit time survival rates may vary with time, and (b) the distribution of entry times to stage 1 is too complicated to be fitted with a simple parametric model such as a normal or gamma distribution. The key aspects of this model are that the entry time distribution is approximated by an exponential function with $p$ parameters, the unit time survival rates in stages are approximated by an $r$-parameter exponential polynomial in the stage number, and the durations of stages are assumed to be the same for all individuals. The new method is applied to four zooplankton data sets, with parametric bootstrapping used to assess the bias and variation in estimates. It is concluded that good estimates of demographic parameters from stage-frequency data from natural populations will usually only be possible if extra information such as the durations of stages is known.

10.
Consider a haploid population and, within its genome, a gene whose presence is vital for the survival of any individual. Each copy of this gene is subject to mutations which destroy its function. Suppose one member of the population somehow acquires a duplicate copy of the gene, where the duplicate is fully linked to the original gene’s locus. Preservation is said to occur if eventually the entire population consists of individuals descended from this one which initially carried the duplicate. The system is modelled by a finite state-space Markov process which in turn is approximated by a diffusion process, whence an explicit expression for the probability of preservation is derived. The event of preservation can be compared to the fixation of a selectively neutral gene variant initially present in a single individual, the probability of which is the reciprocal of the population size. For very weak mutation, this and the probability of preservation are equal, while as mutation becomes stronger, the preservation probability tends to double this reciprocal. This is in excellent agreement with simulation studies.

11.
The limiting conditional probability distribution (LCD) has been much studied in the field of mathematical biology, particularly in the context of epidemiology and the persistence of epidemics. However, it has not yet been applied to the immune system. One of the characteristic features of the T cell repertoire is its diversity. This diversity declines in old age, whence the concepts of extinction and persistence are also relevant to the immune system. In this paper we model T cell repertoire maintenance by means of a continuous-time birth and death process on the positive integers, where the origin is an absorbing state. We show that eventual extinction is guaranteed. The late-time behaviour of the process before extinction takes place is modelled by the LCD, which we prove always exists for the process studied here. In most cases, analytic expressions for the LCD cannot be computed but the probability distribution may be approximated by means of the stationary probability distributions of two related processes. We show how these approximations are related to the LCD of the original process and use them to study the LCD in two special cases. We also make use of the large N expansion to derive a further approximation to the LCD. The accuracy of the various approximations is then analysed.

12.
A Markov process with absorbing boundaries may be made recurrent by returning the process to the interior whenever a boundary is reached. The age of such a process may be defined as the length of time since the last return event. Examples drawn from two-allele genetic models are discussed, in which reversibility of the return process means that the age of an allele, whose present frequency in the population is known, has the same probability distribution as its future extinction time. Some discrete models are not reversible, yet if approximated by diffusion processes, the (approximate) age distribution is the same as the future extinction time distribution. Various results in the literature are unified by this viewpoint.

13.
It is well known that the asymptotic null distribution of the homogeneity lod score (LOD) does not depend on the genetic model specified in the analysis. When appropriately rescaled, the LOD is asymptotically distributed as $0.5\chi^2_0 + 0.5\chi^2_1$, regardless of the assumed trait model. However, because locus heterogeneity is a common phenomenon, the heterogeneity lod score (HLOD), rather than the LOD itself, is often used in gene mapping studies. We show here that, in contrast with the LOD, the asymptotic null distribution of the HLOD does depend upon the genetic model assumed in the analysis. In affected sib pair (ASP) data, this distribution can be worked out explicitly as $(0.5 - c)\chi^2_0 + 0.5\chi^2_1 + c\chi^2_2$, where $c$ depends on the assumed trait model. For example, for a simple dominant model (HLOD/D), $c$ is a function of the disease allele frequency $p$: for $p = 0.01$, $c = 0.0006$; while for $p = 0.1$, $c = 0.059$. For a simple recessive model (HLOD/R), $c = 0.098$ independently of $p$. This latter (recessive) distribution turns out to be the same as the asymptotic distribution of the MLS statistic under the possible triangle constraint, which is asymptotically equivalent to the HLOD/R. The null distribution of the HLOD/D is close to that of the LOD, because the weight $c$ on the $\chi^2_2$ component is small. These results mean that the cutoff value for a test of size $\alpha$ will tend to be smaller for the HLOD/D than the HLOD/R. For example, the $\alpha = 0.0001$ cutoff (on the lod scale) for the HLOD/D with $p = 0.05$ is 3.01, while for the LOD it is 3.00, and for the HLOD/R it is 3.27. For general pedigrees, explicit analytical expression of the null HLOD distribution does not appear possible, but it will still depend on the assumed genetic model.
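
The dependence of the cutoff on the mixture weight $c$ can be explored with a short numerical sketch. It assumes that "appropriately rescaled" means multiplying the lod score by $2\ln 10$ to obtain the likelihood-ratio statistic, and it uses the closed-form tails $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$ and $P(\chi^2_2 > x) = e^{-x/2}$. The function names and the bisection bracket are choices made for this example, not taken from the paper.

```python
import math

LN10 = math.log(10.0)

def mixture_tail(x, c):
    """P(T > x) for x > 0, where T ~ (0.5 - c)*chi2_0 + 0.5*chi2_1 + c*chi2_2.

    The chi2_0 component is a point mass at zero and contributes nothing for x > 0.
    """
    tail_chi2_1 = math.erfc(math.sqrt(x / 2.0))  # P(chi2_1 > x)
    tail_chi2_2 = math.exp(-x / 2.0)             # P(chi2_2 > x)
    return 0.5 * tail_chi2_1 + c * tail_chi2_2

def lod_cutoff(alpha, c):
    """Cutoff on the lod scale for a test of size alpha (alpha < 0.5), found by bisection."""
    lo, hi = 0.0, 200.0  # assumed bracket on the likelihood-ratio scale
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mixture_tail(mid, c) > alpha:
            lo = mid
        else:
            hi = mid
    # Assumed rescaling: likelihood-ratio statistic = 2 * ln(10) * lod
    return 0.5 * (lo + hi) / (2.0 * LN10)

# c = 0 corresponds to the plain LOD; c = 0.098 is the recessive-model weight quoted above.
for c in (0.0, 0.0006, 0.059, 0.098):
    print(c, round(lod_cutoff(1e-4, c), 2))
```

With $c = 0$ the sketch gives a cutoff very close to the 3.00 quoted for the LOD, and the cutoff rises with $c$, consistent with the ordering LOD < HLOD/D < HLOD/R described in the abstract. Small differences from the published values are possible, since the published cutoffs may rest on unrounded values of $c$.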

14.
A specimen of intestinal glycoprotein isolated from the pig and two samples of dextran, all of which are polydisperse (that is, the preparations may be regarded as consisting of a continuous distribution of molecular weights), have been examined in the ultracentrifuge under meniscus-depletion conditions at equilibrium. They are compared with each other and with a glycoprotein from Cysticercus tenuicollis cyst fluid which is almost monodisperse. The quantity $c^{-1/3}$ ($c$ = concentration) is plotted against $\xi$ (the reduced radius); this plot is linear when the molecular-weight distribution approximates to the 'most probable', i.e. when $M_n : M_w : M_z : M_{z+1} : \ldots$ is as 1:2:3:4: etc. The use of this plot, and related procedures, to evaluate qualitatively and semi-quantitatively molecular-weight distribution functions where they can be realistically approximated to Schulz distributions is discussed. The theoretical basis is given in an Appendix.

15.
Prediction of multilocus identity-by-descent   Cited by: 2 (self-citations: 1, citations by others: 1)
Hill WG, Hernández-Sánchez J. Genetics 2007, 176(4): 2307-2315
Previous studies have enabled exact prediction of probabilities of identity-by-descent (IBD) in random-mating populations for a few loci (up to four or so), with extension to more using approximate regression methods. Here we present a precise predictor of multiple-locus IBD using simple formulas based on exact results for two loci. In particular, the probability of non-IBD $X_{ABC}$ at each of ordered loci A, B, and C can be well approximated by $X_{ABC} = X_{AB}X_{BC}/X_{B}$ and generalizes to $X_{123\ldots k} = X_{12}X_{23}\cdots X_{k-1,k}/X^{k-2}$, where $X$ is the probability of non-IBD at each locus. Predictions from this chain rule are very precise with population bottlenecks and migration, but are rather poorer in the presence of mutation. From these coefficients, the probabilities of multilocus IBD and non-IBD can also be computed for genomic regions as functions of population size, time, and map distances. An approximate but simple recurrence formula is also developed, which generally is less accurate than the chain rule but is more robust with mutation. Used together with the chain rule it leads to explicit equations for non-IBD in a region. The results can be applied to detection of quantitative trait loci (QTL) by computing the probability of IBD at candidate loci in terms of identity-by-state at neighboring markers.
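
The chain rule in this abstract reduces to a product of adjacent two-locus terms divided by a power of the single-locus term. A minimal sketch follows; the function name and the input layout (a list of adjacent pairwise non-IBD probabilities plus one shared single-locus value) are hypothetical conveniences for illustration, and how the two-locus values themselves are obtained is outside its scope.

```python
from math import prod

def chain_rule_non_ibd(pairwise, single):
    """Approximate X_{12...k} as X_12 * X_23 * ... * X_{k-1,k} / X**(k-2).

    pairwise: non-IBD probabilities for adjacent locus pairs, in map order (length k - 1)
    single:   single-locus non-IBD probability X, assumed equal at every locus
    """
    if not pairwise:
        raise ValueError("need at least one adjacent pair (k >= 2)")
    k = len(pairwise) + 1
    return prod(pairwise) / single ** (k - 2)

# Three loci with hypothetical inputs: X_AB = X_BC = 0.85 and X = 0.9
print(chain_rule_non_ibd([0.85, 0.85], 0.9))
```

As a sanity check on the formula, if the loci were independent so that every pairwise term equals $X^2$, the expression collapses to $X^k$, as it should.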

16.
The decision whether a measured distribution complies with an equidistribution is a central element of many biostatistical methods. High-throughput differential expression measurements, for instance, require judging possible over-representation of genes. The reliability of this judgement, however, is strongly affected when rarely expressed genes are pooled. We propose a method that can be applied to frequency-ranked distributions and that yields a simple but efficient criterion to assess the hypothesis of equiprobable expression levels. By applying our technique to surrogate data we exemplify how the decision criterion can differentiate between a true equidistribution and a triangular distribution. The distinction succeeds even for small sample sizes where standard tests of significance (e.g. $\chi^2$) fail. Our method will have a major impact on several problems of computational biology where rare events baffle a reliable assessment of frequency distributions. The program package is available upon request from the authors.

17.
18.
The standard marriage model is evaluated with respect to its applicability in Bangladesh, so that reliable and consistent estimates of mean marriage age for females in Bangladesh can be made. The standard marriage model proposes that a person enters the marriage market and waits until marriage occurs. The distribution of age at entry into the marriage market is generally normal. The delays until marriage occurs are modelled as negative exponential distributions. In a population where marriage is universal, the standard schedule of first marriage frequencies developed by Coale and McNeil is a close approximation to the convolution of a normal curve and several exponential distributions, with G(x) denoting the cumulative probability of marriage at age x. Since the standard distribution of age at first marriage is closely approximated by the convolution of a normal curve and several negative exponential distributions, the age at entry to the marriage market for females, and whether this is normally distributed, should be examined. One cross-sectional study in Bangladesh concludes that onset of menarche determines entry into the marriage market. The proportion of ever-married females by single year of age, which is available from cross-sectional demographic surveys, can be fitted to the Coale-McNeil model. Marriages in the rural areas of Bangladesh seem to follow the pattern of entering the marriage market at puberty, then waiting until actual marriage takes place. This model of entries and delays can also be fitted to cross-sectional data from rural Bangladesh. The use of the Coale-McNeil marriage model in rural Bangladesh is appropriate for estimating the mean age of marriage.

19.
The goal of protein engineering and design is to identify sequences that adopt three-dimensional structures of desired function. Often, this is treated as a single-objective optimization problem, identifying the sequence–structure solution with the lowest computed free energy of folding. However, many design problems are multi-state, multi-specificity, or otherwise require concurrent optimization of multiple objectives. There may be tradeoffs among objectives, where improving one feature requires compromising another. The challenge lies in determining solutions that are part of the Pareto optimal set: designs where no further improvement can be achieved in any of the objectives without degrading one of the others. Pareto optimality problems are found in all areas of study, from economics to engineering to biology, and computational methods have been developed specifically to identify the Pareto frontier. We review progress in multi-objective protein design, the development of Pareto optimization methods, and present a specific case study using multi-objective optimization methods to model the tradeoff between three parameters, stability, specificity, and complexity, of a set of interacting synthetic collagen peptides.
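
The definition of the Pareto optimal set given in this abstract can be written directly as a dominance filter. The sketch below is a generic brute-force version, not the method used in the review. It assumes every objective is to be minimized, and the score tuples are made-up placeholders rather than real design data.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points that no other point dominates (O(n^2) pairwise comparison)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical two-objective scores for four candidate designs (lower is better in both)
designs = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = pareto_front(designs)
print(front)
```

Here `(3.0, 3.0)` is dropped because `(2.0, 2.0)` is better in both objectives, while the remaining three designs each represent a different tradeoff. The quadratic scan is adequate for small candidate sets; larger design spaces would call for a dedicated multi-objective optimizer.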

20.
We revisit statistical tests for branches of evolutionary trees reconstructed upon molecular data. A new, fast, approximate likelihood-ratio test (aLRT) for branches is presented here as a competitive alternative to nonparametric bootstrap and Bayesian estimation of branch support. The aLRT is based on the idea of the conventional LRT, with the null hypothesis corresponding to the assumption that the inferred branch has length 0. We show that the LRT statistic is asymptotically distributed as a maximum of three random variables drawn from the $\tfrac{1}{2}\chi^2_0 + \tfrac{1}{2}\chi^2_1$ distribution. The new aLRT of interior branch uses this distribution for significance testing, but the test statistic is approximated in a slightly conservative but practical way as $2(l_1 - l_2)$, i.e., double the difference between the maximum log-likelihood values corresponding to the best tree and the second best topological arrangement around the branch of interest. Such a test is fast because the log-likelihood value $l_2$ is computed by optimizing only over the branch of interest and the four adjacent branches, whereas other parameters are fixed at their optimal values corresponding to the best ML tree. The performance of the new test was studied on simulated 4-, 12-, and 100-taxon data sets with sequences of different lengths. The aLRT is shown to be accurate, powerful, and robust to certain violations of model assumptions. The aLRT is implemented within the algorithm used by the recent fast maximum likelihood tree estimation program PHYML (Guindon and Gascuel, 2003).
