Similar documents
20 similar documents found
1.
Structures are currently available for only a small fraction of known protein sequences, so it is important to identify the key features of known protein sequences on the basis of the structures that have been solved. Here, we report a study of the size distribution of protein families within different types of folds. The fold of a protein is the global arrangement of its main secondary structures, in terms of both their relative orientations and their topological connections, which specifies certain biochemical and biophysical properties. We first search the protein families in the structural database SCOP against the sequence-based database Pfam and obtain a pool of corresponding Pfam families whose structures can be regarded as known. We call this pool of Pfam families the sample space for short. We then obtain the size distributions of protein families in the sample space, the Pfam database and the SCOP database. The results indicate that the size distributions of protein families under different kinds of folds follow similar power laws. In particular, the largest families are scattered evenly across different kinds of folds. This may help clarify the relationship among protein sequence, structure and function. We also show that the set of proteins with known structures can be considered a random sample from the whole space of protein sequences, an assumption that is essential but unsettled for related predictions, such as estimating the number of protein folds in nature. Finally, using a simple method, we estimate that about 2957 folds are needed to cover all Pfam families.
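One common way to check a power-law claim like the one above is to estimate the exponent from the family sizes. The sketch below uses the continuous maximum-likelihood estimator alpha = 1 + n / sum(ln(x_i / x_min)); this is an illustrative assumption, not necessarily the procedure used in the study, and the cutoff `x_min` and the toy family sizes are hypothetical.

```python
import math
from typing import Sequence


def power_law_alpha(sizes: Sequence[float], x_min: float) -> float:
    """Continuous maximum-likelihood estimate of alpha in p(x) ~ x**(-alpha), x >= x_min.

    Sizes below x_min are discarded. For integer family sizes this continuous
    formula is only an approximation to a discrete estimator.
    """
    tail = [x for x in sizes if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("need at least one size strictly above x_min")
    return 1.0 + len(tail) / log_sum


# Hypothetical family sizes (member sequences per family); not data from the study.
family_sizes = [1, 1, 1, 1, 2, 2, 3, 5, 8, 20, 150]
print(round(power_law_alpha(family_sizes, x_min=1), 3))
```

Comparing the estimated exponent across fold classes would be one way to probe the "similar power law" statement, subject to the choice of `x_min`.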

2.
The asymptotic final size distribution of a multitype Reed-Frost process, a chain-binomial model for the spread of infection in a finite, closed multitype population, is derived for the case of a reducible contact pattern between types. The results are obtained using techniques developed for the irreducible case.
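The chain-binomial mechanism can be simulated directly to see what a "final size" is. In this Monte Carlo sketch, `p[a][b]` is an assumed probability that one infective of type b infects a given susceptible of type a during one generation. Zero entries give a reducible contact pattern, for example type 0 infecting type 1 but not the reverse. This only illustrates the process; the paper derives the asymptotic distribution analytically.

```python
import random
from typing import List, Sequence


def reed_frost_final_size(
    susceptible: Sequence[int],
    infected: Sequence[int],
    p: Sequence[Sequence[float]],
    rng: random.Random,
) -> List[int]:
    """Simulate one multitype Reed-Frost epidemic; return total ever infected per type."""
    s = list(susceptible)
    i = list(infected)
    total = list(infected)
    k = len(s)
    while any(i):
        new = []
        for a in range(k):
            # A type-a susceptible escapes only if it avoids every current infective.
            escape = 1.0
            for b in range(k):
                escape *= (1.0 - p[a][b]) ** i[b]
            q = 1.0 - escape
            # Binomial(s[a], q) draw as a sum of Bernoulli trials.
            new.append(sum(1 for _ in range(s[a]) if rng.random() < q))
        for a in range(k):
            s[a] -= new[a]
            total[a] += new[a]
        i = new  # infectives remain infectious for exactly one generation
    return total


# Reducible pattern: type 1 cannot infect type 0, but type 0 can infect type 1.
rng = random.Random(1)
p = [[0.05, 0.0],
     [0.02, 0.05]]
print(reed_frost_final_size([50, 50], [1, 0], p, rng))
```

Repeating the simulation many times for large populations gives an empirical final size distribution that could be compared with asymptotic results.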

4.
We introduce and analyse a simple probabilistic model of genome evolution based on three fundamental evolutionary events: gene loss, duplication and accumulated change. The model is motivated by previous work that fitted the available genomic data to so-called paralog distributions. The formalism is described by an infinite system of linear equations. We show that this system generates a semigroup of linear operators on the space ℓ¹. We prove that the size distribution of paralogous gene families in a genome converges to an equilibrium as time goes to infinity. Moreover, we show that when the probabilities of gene removal and duplication are close to each other, the resulting distribution is close to a logarithmic distribution. Some empirical results for yeast genomes are presented.
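The logarithmic (log-series) distribution mentioned above has the probability mass function P(K = k) = -theta^k / (k ln(1 - theta)) for k >= 1 and 0 < theta < 1. The sketch below evaluates it. Here theta is a free parameter; it does not reproduce the paper's mapping from duplication and loss probabilities to theta.

```python
import math


def log_series_pmf(k: int, theta: float) -> float:
    """P(K = k) for the logarithmic (log-series) distribution, k >= 1, 0 < theta < 1."""
    if k < 1 or not 0.0 < theta < 1.0:
        raise ValueError("require k >= 1 and 0 < theta < 1")
    # -(theta**k) / (k * log(1 - theta)); the log is negative, so the result is positive.
    return -theta ** k / (k * math.log(1.0 - theta))


# Hypothetical theta: relative frequency of gene families with 1..5 members.
for k in range(1, 6):
    print(k, round(log_series_pmf(k, 0.9), 4))
```

The heavy tail (slow decay in k when theta is near 1) is what makes this family a natural candidate for paralog family-size data.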

6.
Early in development, one X chromosome in each cell of the female embryo is inactivated. Knowing the number of cells of a given human tissue at the time of X-inactivation can improve our understanding of diseases such as cancer and genetic disorders, as well as of cellular development. However, the moment of X-inactivation in humans is difficult to observe directly. In this study, we developed a mathematical model, based on branching processes and an asymptotic normal approximation, that more accurately relates the number of cells at X-inactivation to the proportion of one allele found in normal heterozygous adult females. We then conducted computer simulations to show the adequacy of this model. Finally, we used the model to estimate more accurately the number of hemopoietic stem cells at X-inactivation from a real-life data set.
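A rough illustration of why the adult allele proportion carries information about the number of cells relies on a simplifying assumption that is much cruder than the branching-process model described above. Suppose each of n cells independently inactivates either X with probability 1/2, and all n cells contribute equally to adult tissue. The adult proportion is then Binomial(n, 1/2)/n, with variance 1/(4n). A method-of-moments estimate is therefore n_hat = 1/(4 s^2), where s^2 is the sample variance of the proportions across females. Unequal expansion of the founding cells and measurement noise add variance, so this naive estimate tends to understate n. The proportions below are hypothetical.

```python
import statistics
from typing import Sequence


def cells_at_x_inactivation(proportions: Sequence[float]) -> float:
    """Method-of-moments estimate of n under the binomial model Var(p) = 1 / (4 n).

    This is a simplified model, not the branching-process model of the study.
    """
    var = statistics.variance(proportions)  # sample variance; needs >= 2 values
    if var == 0:
        raise ValueError("zero variance: n is not identifiable")
    return 1.0 / (4.0 * var)


# Hypothetical allele proportions measured in five heterozygous females.
print(round(cells_at_x_inactivation([0.45, 0.62, 0.51, 0.38, 0.55]), 1))
```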

7.
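Research on a new mathematical model of precipitation distribution and its application   Times cited: 3 (self-citations: 0, citations by others: 3)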
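In distributed hydrological models, the precipitation input to each grid cell is a key factor in accurately simulating hydrological processes, and finding ways to generate distributed precipitation data is an active topic in hydrological modelling research. Based on an analysis of precipitation models developed in China and elsewhere, we consider the actual precipitation distribution over a watershed to be the combined result of weather-system precipitation and the influence of the underlying terrain. In the absence of topographic effects, the isohyets of weather-system precipitation are distributed in plan approximately as a set of concentric ellipses. On this principle, we built a new mathematical model of precipitation distribution that simulates the weather-system precipitation distribution and corrects the simulated results for topographic effects using Newton interpolation; we also propose a way to simulate the location of the precipitation centre and the precipitation amount at the centre. The model was tested with observed data from the Xichuan River basin on the Loess Plateau, and the results show that it is relatively accurate. Because the model is conceptually simple and clear and can identify the location of the precipitation centre and its central precipitation amount, it is of value for watershed rainstorm analysis and flood forecasting.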
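The model has two ingredients: elliptical isohyets around a storm centre and a Newton-interpolation correction for terrain. Both are sketched below. The exponential decay with elliptical distance, the parameter names and the elevation/correction-factor calibration points are all hypothetical, because the abstract does not give the model's actual functional form.

```python
import math
from typing import List, Sequence


def synoptic_precip(x: float, y: float, x0: float, y0: float, p0: float,
                    a: float, b: float, angle: float) -> float:
    """Weather-system precipitation only (assumed form p0 * exp(-r)).

    Isohyets (constant r) are concentric ellipses centred at (x0, y0) with
    semi-axis ratio a:b, rotated by `angle` radians; p0 is the central amount.
    """
    dx, dy = x - x0, y - y0
    u = dx * math.cos(angle) + dy * math.sin(angle)
    v = -dx * math.sin(angle) + dy * math.cos(angle)
    r = math.hypot(u / a, v / b)  # elliptical distance from the centre
    return p0 * math.exp(-r)


def newton_coefficients(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Divided-difference coefficients of the Newton interpolating polynomial."""
    coef = list(ys)
    n = len(xs)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef


def newton_eval(xs: Sequence[float], coef: Sequence[float], x: float) -> float:
    """Evaluate the Newton polynomial at x by nested multiplication."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result


# Hypothetical calibration: terrain correction factor versus elevation (m).
elev = [800.0, 1200.0, 1600.0]
factor = [0.95, 1.05, 1.20]
coef = newton_coefficients(elev, factor)

base = synoptic_precip(3.0, 1.0, x0=0.0, y0=0.0, p0=80.0, a=10.0, b=5.0, angle=0.3)
print(round(base * newton_eval(elev, coef, 1400.0), 2))
```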

8.
A branching process method is employed to study the survival probability of a slightly advantageous mutant gene with a general distribution of progeny size in a large population. A counter-example to a classic proposition is given, and a somewhat weaker result is proved.
Supported in part by NIH Grant 5R01 GM10452-18.
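For a Galton-Watson branching process with offspring probability generating function f, the extinction probability is the smallest root of q = f(q) in [0, 1]. Iterating q <- f(q) from q = 0 converges to that root. The sketch below computes the survival probability 1 - q for any user-supplied pgf. The Poisson offspring distribution and the value of s are illustrative assumptions; the printout compares the result with the classic approximation of about 2s for Poisson offspring with mean 1 + s.

```python
import math
from typing import Callable


def survival_probability(pgf: Callable[[float], float], tol: float = 1e-13,
                         max_iter: int = 1_000_000) -> float:
    """1 - q, where q is the smallest fixed point of the offspring pgf in [0, 1].

    Iterating q <- pgf(q) from q = 0 increases monotonically to the
    extinction probability of the Galton-Watson process.
    """
    q = 0.0
    for _ in range(max_iter):
        q_next = pgf(q)
        if abs(q_next - q) < tol:
            return 1.0 - q_next
        q = q_next
    return 1.0 - q


def poisson_pgf(mean: float) -> Callable[[float], float]:
    """Generating function of a Poisson(mean) offspring distribution."""
    return lambda q: math.exp(mean * (q - 1.0))


s = 0.01  # hypothetical selective advantage: mean offspring number 1 + s
print(survival_probability(poisson_pgf(1.0 + s)), 2 * s)
```

Convergence slows as the mean approaches 1, so the tolerance and iteration cap matter for nearly neutral mutants.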

10.

Background and Aims

The distribution of photosynthetic enzymes, or nitrogen, through the canopy affects canopy photosynthesis, as well as plant quality and nitrogen demand. Most canopy photosynthesis models assume an exponential distribution of nitrogen, or protein, through the canopy, although this is rarely consistent with experimental observation. Previous optimization schemes to derive the nitrogen distribution through the canopy generally focus on the distribution of a fixed amount of total nitrogen, which fails to account for the variation in both the actual quantity of nitrogen in response to environmental conditions and the interaction of photosynthesis and respiration at similar levels of complexity.

Model

A model of canopy photosynthesis is presented for C3 and C4 canopies that considers a balanced approach between photosynthesis and respiration as well as plant carbon partitioning. Protein distribution is related to irradiance in the canopy by a flexible equation for which the exponential distribution is a special case. The model is designed to be simple to parameterize for crop, pasture and ecosystem studies. The amount and distribution of protein that maximizes canopy net photosynthesis is calculated.

Key Results

The optimum protein distribution is not exponential, but is quite linear near the top of the canopy, which is consistent with experimental observations. The overall concentration within the canopy is dependent on environmental conditions, including the distribution of direct and diffuse components of irradiance.

Conclusions

The widely used exponential distribution of nitrogen or protein through the canopy is generally inappropriate. The model derives the optimum distribution with characteristics that are consistent with observation, so overcoming limitations of using the exponential distribution. Although canopies may not always operate at an optimum, optimization analysis provides valuable insight into plant acclimation to environmental conditions. Protein distribution has implications for the prediction of carbon assimilation, plant quality and nitrogen demand.
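For reference, the conventional exponential assumption that this study argues against can be written down directly. Nitrogen (or protein) per unit leaf area declines with cumulative leaf area index L as N(L) = N0 exp(-k_n L), in parallel with Beer's-law attenuation of irradiance. The canopy total down to a given LAI is therefore N0 (1 - exp(-k_n LAI)) / k_n. The coefficients below are placeholders, and the paper's own flexible irradiance-protein relation is not reproduced here.

```python
import math


def exponential_profile(n_top: float, k_n: float, depth: float) -> float:
    """Nitrogen per unit leaf area at cumulative leaf area index `depth` (exponential scheme)."""
    return n_top * math.exp(-k_n * depth)


def canopy_total(n_top: float, k_n: float, lai: float) -> float:
    """Closed-form integral of the exponential profile from the canopy top (0) to `lai`."""
    return n_top * (1.0 - math.exp(-k_n * lai)) / k_n


def canopy_total_numeric(n_top: float, k_n: float, lai: float, steps: int = 10_000) -> float:
    """Midpoint-rule check of the closed form."""
    h = lai / steps
    return h * sum(exponential_profile(n_top, k_n, (j + 0.5) * h) for j in range(steps))


# Placeholder values: 2 g N per m^2 leaf at the top, k_n = 0.5, LAI = 4.
print(round(canopy_total(2.0, 0.5, 4.0), 4), round(canopy_total_numeric(2.0, 0.5, 4.0), 4))
```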

12.
Liu X, Fan K, Wang W. Proteins, 2004, 54(3): 491-499
Currently, of the 10^6 known protein sequences, only about 10^4 structures have been solved. Based on homologies and similarities, proteins are grouped into families, each of which has a structural prototype, namely the fold, and some families share the same fold. However, the total numbers of folds and families in nature, and the distribution of folds over families, remain an enigma. Here, we report a study of the distribution of folds over families and of the total number of folds in nature, using a maximum probability principle and the moment method of estimation. A quadratic relation between the numbers of families and folds is found for numbers of families in the interval from 6000 to 30,000. For example, about 2700 folds are obtained for 23,100 families, among them about 33 superfolds that each include more than 100 families; the largest superfold comprises about 800 families. Our results suggest that although the majority of folds have only a single family per fold, a considerably larger number of folds include many more families each than in the database, and the distribution of folds over families in nature differs markedly from the sampled distribution. This article is the first to estimate the long tail of the fold distribution. The results fit the data for different versions of the Structural Classification of Proteins (SCOP) excellently, and goodness-of-fit tests strongly support them. In addition, the method of directly "enlarging" the sample to the population may be useful for inferring species distributions in other fields.

13.
The genealogies of samples of orthologous regions from multiple species can be classified by their shapes. Using a neutral coalescent model of two species, I give exact probabilities of each of four possible genealogical shapes: reciprocal monophyly, two types of paraphyly, and polyphyly. After the divergence that forms two species, each of which has population size N, polyphyly is the most likely genealogical shape for the lineages of the two species. At ∼1.300N generations after divergence, paraphyly becomes most likely, and reciprocal monophyly becomes most likely at ∼1.665N generations. For a given species, the time at which 99% of its loci acquire monophyletic genealogies is ∼5.298N generations, assuming all loci in its sister species are monophyletic. The probability that all lineages of two species are reciprocally monophyletic, given that a sample from the two species has a reciprocally monophyletic genealogy, increases rapidly with sample size, as does the probability that the most recent common ancestor (MRCA) of a sample is also the MRCA of all lineages from the two species. The results have potential applications for testing evolutionary hypotheses.
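A closely related but simpler quantity is the probability that n lineages sampled from a single constant-size population have all coalesced within time t. Under the standard Kingman coalescent, with time measured in coalescent units (generations scaled by population size, with conventions that differ by a factor of 2 between haploid and diploid models), an exact alternating-sum formula is known (given, for example, by Tavaré 1984). This sketch is not the two-species calculation described above. It ignores coalescence in the ancestral population, so it gives only a lower bound on the probability of monophyly.

```python
import math


def prob_mrca_by(n: int, t: float) -> float:
    """P(all n lineages have coalesced by coalescent time t), Kingman coalescent.

    Computes 1 + sum_{k=2}^{n} (-1)^(k-1) (2k-1) exp(-k(k-1)t/2) * n_[k] / n_(k),
    where n_[k] is the falling and n_(k) the rising factorial. The alternating
    sum can lose precision for very large n combined with very small t.
    """
    if n < 1 or t < 0:
        raise ValueError("require n >= 1 and t >= 0")
    total = 1.0
    falling = rising = 1.0
    for k in range(1, n + 1):
        falling *= n - k + 1  # n (n-1) ... (n-k+1)
        rising *= n + k - 1   # n (n+1) ... (n+k-1)
        if k >= 2:
            total += ((-1) ** (k - 1) * (2 * k - 1)
                      * math.exp(-k * (k - 1) * t / 2.0) * falling / rising)
    return total


# Hypothetical sample of 20 lineages at a few times (coalescent units).
for t in (1.0, 2.0, 5.0):
    print(t, round(prob_mrca_by(20, t), 4))
```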

14.
Effects of sample size on the performance of species distribution models   Times cited: 8 (self-citations: 0, citations by others: 8)
A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence–absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with the area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample sizes (n < 30), and this should encourage highly conservative use of predictions based on small sample sizes and restrict their use to exploratory modelling.
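The AUC used above as the evaluation metric has a simple probabilistic reading. It is the probability that a randomly chosen presence site receives a higher predicted suitability than a randomly chosen absence site, with ties counted as one half. That is the Mann-Whitney form of the statistic. The sketch below computes it directly; the scores are hypothetical model outputs.

```python
from typing import Sequence


def auc(presence_scores: Sequence[float], absence_scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise comparison.

    O(len(presence) * len(absence)); fine for modest evaluation sets.
    """
    if not presence_scores or not absence_scores:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical suitability scores at independent presence and absence sites.
print(auc([0.91, 0.75, 0.62, 0.40], [0.55, 0.30, 0.22, 0.10, 0.05]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of presences from absences.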

15.
The accelerated failure time (AFT) model is presented as an alternative to the proportional hazards model in the analysis of survival data. We investigate the effect of omitting covariates when a Weibull accelerated failure time model is applied. In an uncensored setting, the asymptotic bias of the treatment effect is theoretically zero when important covariates are omitted; however, the asymptotic variance estimator of the treatment effect can be biased, so that the size of the Wald test for the treatment effect is likely to exceed the nominal level. In some cases, the test size can be more than twice the nominal level. In a simulation study, in both censored and uncensored settings, the Type I error of the test of the treatment effect was likely to be inflated when prognostic covariates were omitted. This work cautions against careless use of the accelerated failure time model. We recommend the robust sandwich variance estimator to avoid inflation of the Type I error in the accelerated failure time model, although the robust variance is not commonly used in survival data analyses.
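A back-of-envelope calculation, not taken from the paper, shows how a biased variance estimator inflates the size of a Wald test. Assume the treatment-effect estimate is unbiased and asymptotically normal, and that the reported standard error equals r times the true one. Under the null hypothesis the Wald statistic is then approximately N(0, 1/r^2). A nominal two-sided level-alpha test therefore rejects with probability 2(1 - Phi(r z_{1-alpha/2})), and r of about 0.84 roughly doubles a nominal 5% level.

```python
from statistics import NormalDist


def actual_wald_size(r: float, alpha: float = 0.05) -> float:
    """Rejection rate under H0 when the reported SE is r times the true SE."""
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    return 2.0 * (1.0 - std.cdf(r * z_crit))


for r in (1.0, 0.9, 0.84, 0.7):
    print(r, round(actual_wald_size(r), 3))
```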

16.
1. The occurrence of unresolved complexes of cryptic species may hinder the identification of the main ecological drivers of biodiversity when different cryptic taxa have different ecological requirements. 2. We assessed factors influencing the occurrence of Synchaeta species (monogonont rotifers) in 17 waterbodies of the Trentino‐South Tyrol region in the Eastern Alps. To do so, we compared the results of using unresolved complexes of cryptic species, as is common practice in limnological studies based on morphological taxonomy, and having resolved cryptic complexes, made possible by DNA taxonomy. 3. To identify cryptic species, we used the generalised mixed Yule coalescent (GMYC) model. We investigated the relationship between the environment and the occurrence of Synchaeta spp. by multivariate ordination using two definitions of the units of diversity, namely (i) unresolved species complexes (morphospecies) and (ii) putative cryptic species (GMYC entities). Our expectation was that resolving complexes of cryptic species could provide more information than using morphospecies. 4. As expected, DNA taxonomy provided greater taxonomic resolution than morphological taxonomy. Further, environmental‐based multivariate ordination on cryptic species explained a significantly higher proportion of variance than that based on morphospecies. Occurrence of GMYC entities was related to total phosphorus (TP), whereas no relationship could be found between morphospecies and the environment. Moreover, different cryptic species within the same morphospecies showed different, and even opposite, preferences for TP. In addition, the wide geographical distribution of haplotypes and cryptic species indicated the absence of barriers to dispersal in Synchaeta.

18.
Probabilistic models of the cell cycle maintain that cell generation time is a random variable given by some distribution function, and that the probability of cell division per unit time is a function only of cell age (and not, for instance, of cell size). Given the probability density f(t) for the time spent in the random compartment of the cell cycle, we derive a recursion relation for the probability density of cell size at birth in a sample of cells in generation n. For the case of exponential cell growth, the recursion relation has no steady-state solution. For the case of linear cell growth, we show that there exists a unique, globally asymptotically stable, steady-state birth-size distribution. For the special case of the transition probability model, we display this steady-state distribution explicitly.
This work was supported by the National Science Foundation under grants MCS8301104 (to J.J.T.) and MCS8300559 (to K.B.H.), and by the National Institutes of Health under grant GM27629 (to J.J.T.).
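The contrast between linear and exponential growth can be seen by following birth size along one lineage. For illustration, and not as the paper's exact equations, assume that a cell born at size x grows for a random generation time T and divides into two equal daughters. With linear growth at rate v, the daughter's birth size is (x + vT)/2, a contraction that forgets the initial size. With exponential growth at rate mu, it is x exp(mu T)/2, a multiplicative random walk in log size with no stationary distribution. The generation-time law below (a fixed phase plus an exponentially distributed random phase) and all parameter values are hypothetical.

```python
import math
import random
from typing import Callable, List


def birth_sizes(x0: float, generations: int, growth: str, rate: float,
                sample_T: Callable[[], float]) -> List[float]:
    """Birth sizes along one lineage under symmetric division into two equal daughters."""
    sizes = [x0]
    x = x0
    for _ in range(generations):
        t = sample_T()
        if growth == "linear":
            x = (x + rate * t) / 2.0           # linear growth, then halve
        elif growth == "exponential":
            x = x * math.exp(rate * t) / 2.0   # exponential growth, then halve
        else:
            raise ValueError("growth must be 'linear' or 'exponential'")
        sizes.append(x)
    return sizes


rng = random.Random(42)


def draw_T() -> float:
    # Hypothetical: fixed phase of 0.5 plus exponential random phase with mean 0.5.
    return 0.5 + rng.expovariate(2.0)


lin = birth_sizes(1.0, 2000, "linear", rate=1.0, sample_T=draw_T)
tail = lin[100:]
print(sum(tail) / len(tail))  # should be near rate * E[T] = 1.0
```

Under linear growth the stationary mean birth size is v E[T], because E[x] = (E[x] + v E[T]) / 2.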

19.
In this study, the effect of the propagation coefficient on the molar distribution function in a modified shell model for micellar systems was examined. The sharpness of the micelle size distribution boundary was found to depend less on the degree of polymerization, n, than on the propagation coefficient, P. Although Kegeles (J. Phys. Chem. 83 (1979) 1728) reported a marked sharpening of the distribution boundary when P = 2.0, we found the boundary to be fairly broad at this value. However, as the propagation coefficient was increased from 3 to 10, the micelle distribution boundary became increasingly sharp. The possibility that such a change in the reaction boundary arises from a structural transition, accompanied by a change in the rate of dissociation of monomer from the shell, is also discussed.

20.
A note on estimation for gamma and stable processes   Times cited: 1 (self-citations: 0, citations by others: 1)
Basawa I. V., Brockwell P. J. Biometrika, 1980, 67(1): 234-236

