Comparison of protein structures is important for revealing the evolutionary relationship among proteins, predicting protein functions and predicting protein structures. Many methods have been developed in the past to align two or multiple protein structures. Despite the importance of this problem, rigorous mathematical or statistical frameworks have seldom been pursued for general protein structure comparison. One notable issue in this field is that with many different distances used to measure the similarity between protein structures, none of them are proper distances when protein structures of different sequences are compared. Statistical approaches based on those non-proper distances or similarity scores as random variables are thus not mathematically rigorous. In this work, we develop a mathematical framework for protein structure comparison by treating protein structures as three-dimensional curves. Using an elastic Riemannian metric on spaces of curves, geodesic distance, a proper distance on spaces of curves, can be computed for any two protein structures. In this framework, protein structures can be treated as random variables on the shape manifold, and means and covariance can be computed for populations of protein structures. Furthermore, these moments can be used to build Gaussian-type probability distributions of protein structures for use in hypothesis testing. The covariance of a population of protein structures can reveal the population-specific variations and be helpful in improving structure classification. With curves representing protein structures, the matching is performed using elastic shape analysis of curves, which can effectively model conformational changes and insertions/deletions. We show that our method performs comparably with commonly used methods in protein structure classification on a large manually annotated data set.  相似文献   

A versatile method for simultaneous analysis of families of curves (total citations: 4; self-citations: 0; citations by others: 4)
We have developed a versatile new approach to the simultaneous analysis of families of curves, which combines the simplicity of empirical methods with several of the advantages of mathematical modeling, including objective comparison of curves and statistical hypothesis testing. The method uses weighted smoothing cubic splines; the degree of smoothing is adjusted automatically to satisfy constraints on curve shape (monotonicity, number of inflection points). By simultaneous analysis of a family of curves, one can extract the shape common to all the curves. Up to four linear scaling parameters are used to match the shape to each curve, and to provide optimal superimposition of the several curves. By applying constraints to these scaling factors, one can test a variety of hypotheses concerning comparisons of curves (e.g., identity, parallelism, or similarity of shape of two or more curves), and thus evaluate the effects of experimental manipulation. By optimal pooling of data one can avoid the need for arbitrary selection of a typical experiment, and can detect subtle but reproducible effects that might otherwise be overlooked. This approach can facilitate the development of an appropriate model. The method has been implemented in a Turbo-Pascal program for IBM-PC compatible microcomputers, and in FORTRAN-77 for the DEC-10 mainframe, and has been utilized successfully in a wide variety of applications.

Static allometries, the scaling relationship between body and trait size, describe the shape of animals in a population or species, and are generated in response to variation in genetic or environmental regulators of size. In principle, allometries may vary with the different size regulators that generate them, which can be problematic since allometric differences are also used to infer patterns of selection on morphology. We test this hypothesis by examining the patterns of scaling in Drosophila melanogaster subjected to variation in three environmental regulators of size: nutrition, temperature and rearing density. Our data indicate that different environmental regulators of size do indeed generate different patterns of scaling. Consequently, flies that are ostensibly the same size may have very different body proportions. These data indicate that trait size is not simply a read-out of body size, but that different environmental factors may regulate body and trait size, and the relationship between the two, through different developmental mechanisms. It may therefore be difficult to infer selective pressures that shape scaling relationships in a wild population without first elucidating the environmental and genetic factors that generate size variation among members of the population.  相似文献   

Abstract. Dominance/diversity curves, displaying the relative abundances of the species within a community, have often been constructed from field data. Several ecological and statistical models of dominance/diversity have been proposed to explain the curves. Yet curves of different models have rarely been fitted to field data. In this paper the appropriate parameters and methods of curve fitting for plant communities are described for the General Lognormal, Canonical Lognormal, Geometric, Broken Stick, Zipf and Zipf-Mandelbrot models. A distinction is made between fixed and optimised parameters, to clarify parameterisation of the models. It is concluded that all should be fitted by minimising the deviance in a ranked-abundance plot. Statistical tests of goodness of fit are discussed. It is concluded that consistency of fit between replicate quadrats of a community provides the best test. Curves of all the models discussed are fitted to data from a species-rich Spanish hay meadow, and to data from a New Zealand intertidal algal community. The Spanish meadow data are best fitted by General Lognormal. The New Zealand algal data are best fitted by Geometric or General Lognormal. Goodness of fit for a sample is usually relatively good or poor for all models, since much of the deviance comes from steps in the curve which none of the models can fit closely.

Complications inherent in scaling the basal rate of metabolism in mammals (total citations: 19; self-citations: 0; citations by others: 19)
The scaling of the basal rate of metabolism in mammals is reexamined. Both the power and level of the scaling function are sensitive to various factors that interact with body mass and rate of metabolism, including the precision of temperature regulation, food habits, and activity level. This sensitivity implies that the rate of metabolism is a highly plastic character in the course of evolution. Consequently, the singular effect of mass on the rate of metabolism is most effectively analyzed in ecologically and physiologically uniform sets of species, rather than in taxonomically defined groups, which often are ecologically and physiologically diverse. Otherwise, all fitted curves for mammals integrate a variety of competing factors, thereby reflecting the species used and denying unique analytic significance to the power in scaling relations. Kleiber's eutherian curve may represent a relatively uniform set of data because all the species included were domesticated and because selection for high rates of production (and high rates of metabolism) occurred in the process of domestication. In the analysis of scaling relationships, the standard error of estimate (S_y.x) is a more valuable measure of the residual variation than is (1.0 - r^2) because r^2 is a non-linear measure of the conformation of data to the relation and because S_y.x, unlike r^2, is independent of the units used in the scaling relationship. At present the best estimate indicates that total rate of metabolism scales proportionally to approximately m^0.60 at small masses (less than 300 g), as long as small species do not enter torpor, and scales proportionally to approximately m^0.75 at large masses (greater than or equal to 300 g). Physiological properties other than metabolism are potentially sensitive to secondary factors, so their scaling functions also would be most clearly defined for physiologically uniform groups of species. This view suggests that insight into the significance of scaling relations can be obtained by examining the residual variation around a scaling function as well as by examining conformation to the function.
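
As a rough illustration of the two measures of residual variation discussed in the abstract above, the following sketch fits a power law in log-log space and reports the exponent, the standard error of estimate S_y.x, and r^2. The function name `fit_power_law` and the numbers are hypothetical and synthetic; they are not data from the paper.

```python
import math

def fit_power_law(masses, rates):
    """Least-squares fit of log10(rate) = log10(a) + b * log10(mass).

    Returns (a, b, s_yx, r2), where s_yx is the standard error of
    estimate in log10 units and r2 is the coefficient of determination.
    """
    xs = [math.log10(m) for m in masses]
    ys = [math.log10(r) for r in rates]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx                      # scaling exponent (the "power")
    log_a = my - b * mx                # level of the scaling function
    sse = sum((y - (log_a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))    # residual scatter around the line
    r2 = 1.0 - sse / syy
    return 10 ** log_a, b, s_yx, r2

# Synthetic example (assumed values): masses in grams, rates following
# roughly m^0.75 with a few percent of scatter.
masses = [500, 1000, 5000, 20000, 100000]
scatter = [1.05, 0.95, 1.02, 0.98, 1.00]
rates = [3.0 * m ** 0.75 * f for m, f in zip(masses, scatter)]
a, b, s_yx, r2 = fit_power_law(masses, rates)
```

Reporting `s_yx` alongside `r2` follows the abstract's suggestion to inspect residual variation directly rather than relying on r^2 alone.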

In livestock, many studies have reported the results of imputation to 50k single nucleotide polymorphism (SNP) genotypes for animals that are genotyped with low-density SNP panels. The objective of this paper is to review different measures of correctness of imputation, and to evaluate their utility depending on the purpose of the imputed genotypes. Across studies, imputation accuracy, computed as the correlation between true and imputed genotypes, and imputation error rate, which counts the number of incorrectly imputed alleles, are commonly used measures of imputation correctness. Based on the nature of both measures and results reported in the literature, imputation accuracy appears to be a more useful measure of the correctness of imputation than imputation error rate, because imputation accuracy does not depend on minor allele frequency (MAF), whereas imputation error rate does. Imputation accuracy can therefore be compared more readily across loci with different MAF. Imputation accuracy depends on the ability to identify the correct haplotype of a SNP, but many other factors have been identified as well, including the number of genotyped immediate ancestors, the number of animals with genotypes at the high-density panel, the SNP density on the low- and high-density panel, the MAF of the imputed SNP and whether imputed SNP are located at the end of a chromosome or not. Some of these factors directly contribute to the linkage disequilibrium between imputed SNP and SNP on the low-density panel. When imputation accuracy is assessed as a predictor for the accuracy of subsequent genomic prediction, we recommend that: (1) individual-specific imputation accuracies should be used that are computed after centring and scaling both true and imputed genotypes; and (2) imputation of gene dosage is preferred over imputation of the most likely genotype, as this increases accuracy and reduces bias of the imputed genotypes and the subsequent genomic predictions.
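
The two measures compared in the abstract above can be sketched as follows, assuming genotypes are coded as 0, 1 or 2 copies of one allele. The function names and the toy genotype vectors are hypothetical; the centring and scaling per locus that the authors recommend is not shown here.

```python
import math

def imputation_accuracy(true_g, imputed_g):
    """Pearson correlation between true and imputed genotypes."""
    n = len(true_g)
    mt, mi = sum(true_g) / n, sum(imputed_g) / n
    sxy = sum((t - mt) * (i - mi) for t, i in zip(true_g, imputed_g))
    sxx = sum((t - mt) ** 2 for t in true_g)
    syy = sum((i - mi) ** 2 for i in imputed_g)
    return sxy / math.sqrt(sxx * syy)

def allele_error_rate(true_g, imputed_g):
    """Fraction of incorrectly imputed alleles (two alleles per genotype).

    Assumes best-guess integer genotypes, so |true - imputed| is the
    number of wrong alleles at that genotype.
    """
    wrong = sum(abs(t - i) for t, i in zip(true_g, imputed_g))
    return wrong / (2 * len(true_g))

# Toy data (assumed): one genotype out of six is imputed as 1 instead of 2.
true_g = [0, 1, 2, 1, 0, 2]
imputed_g = [0, 1, 1, 1, 0, 2]
accuracy = imputation_accuracy(true_g, imputed_g)
error_rate = allele_error_rate(true_g, imputed_g)
```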

A growing number of oxygen equilibrium curves for hemoglobin (Hb) mutants, post-translational modifications, or the binding of potent new effectors of Hb cannot be fitted adequately with the two-state model. Examples are curves showing double maxima in the derivative of the Hill plot, or slopes of less than unity. We present such examples of modified hemoglobins and strong effectors in this study and calculate at which substate level the two-state model differs from the data. Analysis of hemoglobin oxygen equilibrium curves is reconsidered using the two-state model extended to allow variation of the individual substate probabilities. In this way the effect on the equilibrium due to perturbations in energy of each substate can be studied as a diagnostic tool.  相似文献   

The effects of productivity on the parameters of the species–area curve were investigated in this paper using two data sets on terrestrial plant communities: (1) one including 48 plots in 12 experimental sites on ploughed, formerly cultivated fields in the Siena region, Italy, and (2) one including 40 plots in hay meadows in the Bremen region, Germany. In both regions, species presence of vascular plants was recorded in nested plots ranging in size from 0.004 to 256 m² and 0.001 to 1000 m², respectively. Productivity was estimated as dry standing biomass. In the Siena data set, species richness showed a humped‐back relation to biomass in the plot sizes up to 1 m². For the larger plot sizes, no significant correlations were found. In the Bremen data set, a positive relation between species number and biomass was observed at the smallest spatial scale (0.001 m²), whereas the relation disappeared or tended to be negative for the larger plot sizes. In general, the slopes z of the log species–log area curves (SAC) were negatively related to biomass in both data sets, while the intercept c increased with biomass in the Siena data set and was unrelated to biomass in the Bremen data set. The relationship between c and z was negative in the Siena data set and positive in the Bremen data set. The above results differed somewhat depending on which plot sizes were considered for the calculation of the SAC. Literature data confirmed that there are no clear patterns in the inter‐correlations between productivity, small scale and large scale species richness. Sites differing in productivity and in the slopes and intercepts of SAC may thus give rise to different species richness–productivity relationships. There can be one relation between species richness and biomass at one spatial scale (e.g. humped‐back) and another type of relation, even an opposite one, at another spatial scale. This suggests that the properties of species–area curves do not respond in a uniform way to the changes in productivity, but depend on the type of habitat or plant community and its particular properties. The parameters of the SAC can therefore hardly be used as scale‐independent parameters to investigate the effects of ecological factors, such as productivity, on species richness. The lack of clear patterns in the relations between small scale and large scale species richness implies that the predictions of the species‐pool hypothesis may fail when applied to plot sizes as dealt with in this study.

The dynamics of aging is often described by survival curves that show the proportion of individuals surviving to a given age. The shape of the survival curve reflects the dependence of mortality on age, and it varies greatly for different organisms. In a recently published paper, Stroustrup and coauthors ((2016) Nature, 530, 103–107) showed that many factors affecting the lifespan of Caenorhabditis elegans do not change the shape of the survival curve, but only stretch or compress it in time. Apparently, this means that aging is a programmed process whose trajectory is difficult to change, although it is possible to speed it up or slow it down. More research is needed to clarify whether the “rule of temporal scaling” is applicable to other organisms. A good indicator of temporal scaling is the coefficient of lifespan variation: similar values of this coefficient for two samples indicate similar shape of the survival curves. Preliminary results of experiments on adaptation of Drosophila melanogaster to unfavorable food show that temporal scalability of survival curves is sometimes present in more complex organisms, although this is not a universal rule. Both evolutionary and environmental changes sometimes affect only the average lifespan without changing the coefficient of variation (in this case, temporal scaling is present), but often both parameters (i.e. both scale and shape of the survival curve) change simultaneously. In addition to the relative stability of the coefficient of variation, another possible argument in favor of genetic determination of the aging process is the relatively low variability of the time of death, which is sometimes of the same order of magnitude as the variability of timing of other ontogenetic events, such as the onset of sexual maturation.
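
The indicator mentioned in the abstract above can be illustrated with a short sketch: stretching every lifespan by a common factor (pure temporal scaling) leaves the coefficient of variation unchanged, whereas a change in curve shape alters it. The function name `lifespan_cv` and the lifespans are hypothetical, not data from the study.

```python
import statistics

def lifespan_cv(lifespans):
    """Coefficient of variation: standard deviation divided by the mean."""
    return statistics.pstdev(lifespans) / statistics.mean(lifespans)

# Made-up lifespans in days (assumed values).
control = [10, 12, 15, 18, 20, 22]

# Pure temporal scaling: every lifespan stretched by the same factor.
stretched = [1.5 * t for t in control]

# A shape change: every lifespan extended by the same number of days,
# which raises the mean but not the spread.
shifted = [t + 10 for t in control]

cv_control = lifespan_cv(control)
cv_stretched = lifespan_cv(stretched)
cv_shifted = lifespan_cv(shifted)
```

Here `cv_control` and `cv_stretched` agree up to floating-point error, while `cv_shifted` is smaller, so the coefficient of variation separates the two kinds of change in this toy case.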

Enzymes often act on more than one substrate, and the question then arises as to whether this can be attributed to the existence of two different enzymes that have not been separated or, more interesting, to the presence of two different active sites in the same enzyme. The competition plot is a kinetic method that allows us to test with little experimentation whether the two reactions occur at the same site or at different sites. It consists of making mixtures of the two substrates and plotting the total rate against a parameter p that defines the concentrations of the two substrates in terms of reference concentrations chosen to give the same rates at p = 0 and p = 1, i.e., when only one of the substrates is present. With a slight modification of the equations it can also be applied to enzymes that deviate from Michaelis-Menten kinetics. If the two substrates react at the same site, the competition plot gives a horizontal straight line; i.e., the total rate is independent of p. In contrast, if the two reactions occur at two separate and independent sites a curve with a maximum is obtained; separate reactions with cross-inhibition generate curves with either maxima or minima according to whether the Michaelis constants of the two substrates are smaller or larger than their inhibition constants in the other reactions. Strategies to avoid ambiguous results and to improve the sensitivity of the plot are described. A practical example is given to facilitate the experimental protocol for this plot.  相似文献   
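
A minimal numerical sketch of the competition plot described above, assuming plain Michaelis-Menten kinetics for both substrates and the standard rate law for two substrates competing for one site. All function names and parameter values are hypothetical; cross-inhibition and deviations from Michaelis-Menten kinetics are not covered.

```python
def reference_concentration(v0, vmax, km):
    """Substrate concentration that gives rate v0 when the substrate is alone."""
    if not 0 < v0 < vmax:
        raise ValueError("v0 must lie strictly between 0 and vmax")
    return v0 * km / (vmax - v0)

def same_site_rate(p, v1, k1, v2, k2, a0, b0):
    """Total rate when both substrates compete for a single active site."""
    a, b = (1 - p) * a0, p * b0
    return (v1 * a / k1 + v2 * b / k2) / (1 + a / k1 + b / k2)

def separate_sites_rate(p, v1, k1, v2, k2, a0, b0):
    """Total rate when the two reactions occur at independent sites."""
    a, b = (1 - p) * a0, p * b0
    return v1 * a / (k1 + a) + v2 * b / (k2 + b)

# Assumed kinetic constants (arbitrary units) and a common reference rate.
v1, k1, v2, k2, v0 = 10.0, 2.0, 6.0, 1.0, 3.0
a0 = reference_concentration(v0, v1, k1)
b0 = reference_concentration(v0, v2, k2)

ps = [i / 10 for i in range(11)]
same = [same_site_rate(p, v1, k1, v2, k2, a0, b0) for p in ps]
separate = [separate_sites_rate(p, v1, k1, v2, k2, a0, b0) for p in ps]
```

Under these assumptions `same` stays at `v0` for every p (the horizontal line), while `separate` equals `v0` only at p = 0 and p = 1 and rises in between (the curve with a maximum).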

A number of details required for the classification of 3 : 3 double reciprocal plots are provided. It is shown that the ν(S) plot for a 3 : 3 function can have at most four inflexions and at most two inflexions adjacent to a turning point. Using this information, a classification of 3 : 3 ν(S) plots into ten main varieties with several subclasses is reported. The problem of defining the probability with which a given mechanism can give rise to specific curve shape features is considered. Applying this technique, the probability with which four simple enzyme mechanisms can give rise to 3 : 3 curve shapes is computed. It is shown that a 3 : 3 saturation function can have no turning points, at most two inflexions and at most one inflexion in double reciprocal space. The probability with which the available 3 : 3 shapes can arise is also computed. It is concluded that, with realistic values for rate constants, chemically reasonable enzyme mechanisms leading to rate equations of degree n : n can generate most of the kinetic profiles available to a rational function of degree n : n with positive coefficients. The probability of obtaining specific curve shapes is not so characteristic of the particular mechanism for 3:3 rate equations as it is for 2:2 rate equations. The probability of obtaining highly complex curves with several turning points or inflexions is rather lower for the enzyme mechanisms than with general 3 : 3 rational functions. There is a high probability that 3 : 3 mechanisms will generate kinetic curves that are geometrically similar to those possible for degree 2 : 2 but this is not so for binding isotherms. Hence differentiating 3 : 3 from 2 : 2 rate equations from experimental kinetic data is more likely to be successful by non-linear regression to the whole data set than by demonstrating a specific 3 : 3 feature. 
Binding curves, on the other hand, for three or more sites should give Scatchard plots with inflexions, features not possible with second degree equations which are conic sections in this space.  相似文献   

To understand the state and trends in biodiversity beyond the scope of monitoring programs, biodiversity indicators must be comparable across inventories. Species richness (SR) is one of the most widely used biodiversity indicators. However, as SR increases with the size of the area sampled, inventories using different plot sizes are hardly comparable. This study aims at producing a methodological framework that enables SR comparisons across plot‐based inventories with differing plot sizes. We used National Forest Inventory (NFI) data from Norway, Slovakia, Spain, and Switzerland to build sample‐based rarefaction curves by randomly and incrementally aggregating plots, representing the relationship between SR and sampled area. As aggregated plots can be far apart and subject to different environmental conditions, we estimated the amount of environmental heterogeneity (EH) introduced in the aggregation process. By correcting for this EH, we produced adjusted rarefaction curves mimicking the sampling of environmentally homogeneous forest stands, thus reducing the effect of plot size and enabling reliable SR comparisons between inventories. Models were built using the Conway–Maxwell–Poisson distribution to account for the underdispersed SR data. Our method successfully corrected for the EH introduced during the aggregation process in all countries, with better performances in Norway and Switzerland. We further found that SR comparisons across countries based on the country‐specific NFI plot sizes are misleading, and that our approach offers an opportunity to harmonize pan‐European SR monitoring. Our method provides reliable and comparable SR estimates for inventories that use different plot sizes. Our approach can be applied to any plot‐based inventory and count data other than SR, thus allowing a more comprehensive assessment of biodiversity across various scales and ecosystems.

Heterogeneous species abundance models are models in which the dynamics differ between species, described by variation among parameters defining the dynamics. Using a dynamic and heterogeneous species abundance model generating the lognormal species abundance distribution it is first shown that different degrees of heterogeneity may result in equivalent species abundance distributions. An alternative to Preston's canonical lognormal model is defined by assuming that reduction in resources, for example reduction in available area, increases the density regulation of each species. This leads to species-individual curves and species-area curves that are approximately linear in a double logarithmic plot. Preston's canonical parameter gamma varies little along these curves and takes values in the neighborhood of one. Quite remarkably, the curves, which define the sensitivity of the community to area reductions, are independent of the heterogeneity among species for this model. As a consequence, the curves can be estimated from a single sample from the community using the Poisson lognormal distribution. It is shown how to perform sensitivity analysis with respect to over-dispersion in sampling relative to the Poisson distribution as well as sampling intensity, that is, the fraction of the community sampled. The method is exemplified by analyzing three simulated data sets.  相似文献   

There is an increasing worldwide concern about the problem of dealing with the waste electrical and electronic equipment (WEEE), given the high volume of appliances that are disposed of every day. In this article, an environmental evaluation of WEEE is performed that combines life cycle assessment (LCA) methodology and multivariate statistical techniques. Because LCA handles a large number of data in its different phases, when one is trying to uncover the structure of large multidimensional data sets, multivariate statistical techniques can provide useful information. In particular, principal‐component analysis and multidimensional scaling are two important dimension‐reducing tools that have been shown to be of help in understanding this type of complex multivariate data set. In this article, we use a variable selection method that reduces the number of categories for which the environmental impacts have to be computed; this step is especially useful when the number of impact categories or the number of products or processes to benchmark increases. We provide a detailed illustration showing how we have used the proposed approach to analyze and interpret the environmental impacts of different domestic appliances.  相似文献   

There is considerable evidence for trial to trial variability of the event related potentials (ERPs) within a given subject's recording. This variability influences the outcome of usual procedures in ERP analysis. Better results may be obtained if the sources of variability are explicitly taken into account in an appropriate model. This paper considers a probabilistic model, the random shift and scaling (RSS) model, where the response is modified by a random time shift and a random scale factor. In addition to this, an additional random scale factor which affects both the response and the background noise is taken into account. This time shift and these scale factors are handled as nuisance parameters. Maximum likelihood and least squares estimators of these parameters and the waveform of response are derived for the RSS model. It is shown that the Woody estimate of the ERP reported in earlier work can be derived by restricting the assumptions for the RSS model. Test statistics for hypotheses on means are obtained for the RSS model and a new type of discriminant function. The usefulness of the method is illustrated by means of simulation studies. Receiver operating characteristic (ROC) curves are used to demonstrate that the new type of discriminant performs better than the usual Fisher's Linear Discriminant.  相似文献   

The degree to which plant communities are vulnerable to invasion by alien species has often been assessed using the relationship between native and alien plant species richness (NAR). Variation in the direction and strength of the NAR tends to be negative for small plot sizes and study extents, but positive for large plots and extents. This invasion paradox has been attributed to different processes driving species richness at different spatial scales. However, the focus on plot size has drawn attention away from other factors influencing the NAR, in part because the influence of other factors may be obscured by or interact with plot size. Here, we test whether variation in the NAR can be explained by covariates linked to community susceptibility to invasion and whether these interact with plot size using a quantitative meta‐analysis drawn from 87 field studies that examined 161 NARs. While plot size explained most variation, the NAR was less positive in grassland habitats and in the Australasian region. Other covariates did not show strong relationships with the NAR even after accounting for interactions with plot size. Instead, much of the unexplained variation is associated with article or author specific differences, suggesting the NAR depends strongly on how different authors choose their study system or study design.  相似文献   

Allometric scaling relationships of the form Y = aX^b are widely utilized in many types of models and analyses of tree structure. They are often viewed as static relationships where both the scaling exponent (b) and the normalization constant (a) obtain empirical values that are fixed within a single set of data. Among different sets of data, their values can show environmental variability. However, there have been only a few attempts to give a mechanistic interpretation for this variability. We used field data to demonstrate how the scaling relationships in trees can be modified by ecological interactions. Moreover, we show how such processes can be incorporated into the scaling models to improve the fit and the information content of the scaling equations. When fixed theoretical scaling exponents were used instead of empirical exponents and when the effect of competitive interactions between trees was described by separate submodels that predicted the value of the normalisation constant in the scaling equations, it was possible to obtain 4–10% improvement in the model fit of three different structural scaling relationships. Our results suggest that unexplained variation in the values of the scaling parameters can be substituted by an identified effect of ecological factors on the value of the normalisation constant. This agrees with recent theoretical suggestions stating that ecological factors can directly influence the value of normalisation constants.
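
One simple way to picture the idea of fixing the exponent and letting only the normalisation constant vary is sketched below: with b held at a theoretical value, the least-squares estimate of log(a) is the mean of log(Y) - b*log(X). The function name, the exponent 2/3 and the data are assumptions for illustration; the authors' competition submodels are not reproduced.

```python
import math

def normalisation_constant(xs, ys, b):
    """Estimate a in Y = a * X**b for a fixed exponent b.

    In log space the model is log(Y) = log(a) + b*log(X); with b fixed,
    the least-squares estimate of log(a) is the mean residual offset.
    """
    offsets = [math.log(y) - b * math.log(x) for x, y in zip(xs, ys)]
    return math.exp(sum(offsets) / len(offsets))

# Synthetic measurements (assumed) generated with a = 3 and b = 2/3.
b_theory = 2 / 3
xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * x ** b_theory for x in xs]
a_hat = normalisation_constant(xs, ys, b_theory)
```

Estimating `a_hat` separately for groups of trees (for example, stands with different competition levels) would then show how much of the variation is carried by the normalisation constant alone.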

The need to identify “toxicologically equivalent” doses across different species is a major issue in toxicology and risk assessment. In this article, we describe an approach for establishing default cross-species extrapolation factors used to scale oral doses across species for non-carcinogenic endpoints. This work represents part of an on-going effort to harmonize the way animal data are evaluated for carcinogenic and non-carcinogenic endpoints. In addition to considering default scaling factors, we also discuss how chemical-specific data (e.g., metabolic or mechanistic data) can be incorporated into the dose extrapolation process. After first examining the required properties of a default scaling methodology, we consider scaling approaches based on empirical relationships observed for particular classes of compounds and also more theoretical approaches based on general physiological principles (i.e., allometry). The available data suggest that the empirical and allometric approaches each provide support for the idea that toxicological risks are approximately equal when daily oral doses are proportional to body weight raised to the 3/4-power. We also discuss specific challenges for dose scaling related to different routes of exposure, acute versus chronic toxicity, and extrapolations related to particular life stages (e.g., childhood).
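
The 3/4-power rule stated in the abstract above amounts to simple arithmetic, sketched here under the assumption that total daily doses (mg/day) are treated as equivalent when proportional to body weight to the power 0.75. The function names and body weights are illustrative only and do not come from the article.

```python
def scale_daily_dose(dose_mg_per_day, bw_source_kg, bw_target_kg, exponent=0.75):
    """Scale a total daily dose between species in proportion to BW**exponent."""
    return dose_mg_per_day * (bw_target_kg / bw_source_kg) ** exponent

def scale_dose_per_kg(dose_mg_per_kg_day, bw_source_kg, bw_target_kg, exponent=0.75):
    """Same rule expressed per kilogram of body weight.

    Dividing the total-dose rule by body weight gives a factor of
    (bw_source / bw_target) ** (1 - exponent).
    """
    return dose_mg_per_kg_day * (bw_source_kg / bw_target_kg) ** (1 - exponent)

# Assumed body weights: a 0.25 kg rat and a 70 kg human.
rat_bw, human_bw = 0.25, 70.0
rat_dose_per_kg = 10.0                                   # mg/kg/day, made up
human_dose_per_kg = scale_dose_per_kg(rat_dose_per_kg, rat_bw, human_bw)
human_total = scale_daily_dose(rat_dose_per_kg * rat_bw, rat_bw, human_bw)
```

The two functions are consistent with each other: `human_total` equals `human_dose_per_kg * human_bw`, and the per-kilogram dose for the larger species comes out lower than for the smaller one.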

Microarrays measure values that are approximately proportional to the numbers of copies of different mRNA molecules in samples. Due to technical difficulties, the constant of proportionality between the measured intensities and the numbers of mRNA copies per cell is unknown and may vary for different arrays. Usually, the data are normalized (i.e., array-wise multiplied by appropriate factors) in order to compensate for this effect and to enable informative comparisons between different experiments. Centralization is a new two-step method for the computation of such normalization factors that is both biologically better motivated and more robust than standard approaches. First, for each pair of arrays the quotient of the constants of proportionality is estimated. Second, from the resulting matrix of pairwise quotients an optimally consistent scaling of the samples is computed.  相似文献   
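
The second step described above, turning a matrix of pairwise quotients into one scaling factor per array, can be sketched with a log-space least-squares choice: if q[i][j] estimates c_i / c_j, then averaging log q[i][j] over j recovers log c_i up to a common constant. This is an assumed, simplified estimator for illustration and not necessarily the one used by the authors; the function name and the numbers are hypothetical.

```python
import math

def consistent_scaling(quotients):
    """Derive per-array scale factors from pairwise quotients q[i][j] ~ c_i / c_j.

    Returns factors normalised so that their geometric mean is 1, since the
    quotients determine the constants only up to a common multiplier.
    """
    n = len(quotients)
    log_c = [
        sum(math.log(quotients[i][j]) for j in range(n)) / n
        for i in range(n)
    ]
    return [math.exp(v) for v in log_c]

# Assumed constants of proportionality for three arrays.
true_c = [1.0, 2.0, 4.0]
q = [[ci / cj for cj in true_c] for ci in true_c]
estimated = consistent_scaling(q)

# Dividing each array's intensities by its factor puts them on a common scale.
normalization_factors = [1.0 / c for c in estimated]
```

With exactly consistent quotients the estimate reproduces the true constants divided by their geometric mean; with noisy quotients it gives a compromise across all pairs.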
