Similar Articles
20 similar articles found.
1.
We consider the problem of estimating the marginal mean of an incompletely observed variable and develop a multiple imputation approach. Using fully observed predictors, we first establish two working models: one predicts the missing outcome variable, and the other predicts the probability of missingness. The predictive scores from the two models are used to measure the similarity between the incomplete and observed cases. Based on the predictive scores, we construct a set of kernel weights for the observed cases, with higher weights indicating more similarity. Missing data are imputed by sampling from the observed cases with probability proportional to their kernel weights. The proposed approach can produce reasonable estimates for the marginal mean and has a double robustness property, provided that one of the two working models is correctly specified. It also shows some robustness against misspecification of both models. We demonstrate these patterns in a simulation study. In a real-data example, we analyze the total helicopter response time from injury in the Arizona emergency medical service data.
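The imputation scheme described above can be sketched in a few lines. This is a minimal illustration, assuming two simple linear working models and a Gaussian kernel on the two predictive scores; the function name, bandwidth, and model forms are illustrative choices, not the authors':

```python
import numpy as np

def kernel_hot_deck_impute(y, x, observed, bandwidth=0.5, rng=None):
    """Impute missing y by hot-deck sampling from observed cases with
    probability proportional to a Gaussian kernel on the distance
    between predictive scores. The two working models are simple
    linear fits here (an assumption; the paper's models may differ)."""
    rng = rng or np.random.default_rng(0)
    obs = np.where(observed)[0]
    mis = np.where(~observed)[0]
    # Working model 1: predict the outcome from the fully observed x.
    score_y = np.polyval(np.polyfit(x[obs], y[obs], 1), x)
    # Working model 2: predict the probability of being observed
    # (a linear-probability stand-in for, e.g., logistic regression).
    score_r = np.polyval(np.polyfit(x, observed.astype(float), 1), x)
    scores = np.column_stack([score_y, score_r])
    y_imp = y.copy()
    for i in mis:
        d2 = ((scores[obs] - scores[i]) ** 2).sum(axis=1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        y_imp[i] = rng.choice(y[obs], p=w / w.sum())  # kernel-weighted draw
    return y_imp
```

Because imputed values are real donor observations drawn near the incomplete case in score space, the marginal mean of the completed data tracks the true mean when either score is informative.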

2.
Spatial distributions of biological variables are often well-characterized with pairwise measures of spatial autocorrelation. In this article, the probability theory for products and covariances of join-count spatial autocorrelation measures is developed for spatial distributions of multiple nominal (e.g. species or genotypes) types. This more fully describes the joint distributions of pairwise measures in spatial distributions of multiple (i.e. more than two) types. An example is given of how the covariances can be used for finding standard errors of weighted averages of join-counts in spatial autocorrelation analysis of more than two types, as is typical for genetic data at multiallelic loci.
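For readers unfamiliar with join counts, the basic statistic is simply a tally over adjacent pairs of locations. The sketch below shows that tally for multiple nominal types on a given adjacency structure; the paper's covariance theory for these counts is not reproduced, and the edge list in the example is hypothetical:

```python
def join_counts(types, edges):
    """Tally join counts: for every adjacent pair of locations,
    count the unordered pair of types found at its two ends.
    `types[i]` is the nominal type at location i; `edges` lists
    (i, j) index pairs defining spatial adjacency."""
    counts = {}
    for i, j in edges:
        key = tuple(sorted((types[i], types[j])))
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Weighted averages of the like-type counts (e.g. summing the ('A','A'), ('B','B'), ... entries) are the quantities whose standard errors the article's covariance results serve.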

3.
BACKGROUND: Comparing distributions of data is an important goal in many applications. For example, determining whether two samples (e.g., a control and test sample) are statistically significantly different is useful to detect a response, or to provide feedback regarding instrument stability by detecting when collected data varies significantly over time. METHODS: We apply a variant of the chi-squared statistic to comparing univariate distributions. In this variant, a control distribution is divided such that an equal number of events fall into each of the divisions, or bins. This approach is thereby a mini-max algorithm, in that it minimizes the maximum expected variance for the control distribution. The control-derived bins are then applied to test sample distributions, and a normalized chi-squared value is computed. We term this algorithm Probability Binning. RESULTS: Using a Monte-Carlo simulation, we determined the distribution of chi-squared values obtained by comparing sets of events derived from the same distribution. Based on this distribution, we derive a conversion of any given chi-squared value into a metric that is analogous to a t-score, i.e., it can be used to estimate the probability that a test distribution is different from a control distribution. We demonstrate that this metric scales with the difference between two distributions, and can be used to rank samples according to similarity to a control. Finally, we demonstrate the applicability of this metric to ranking immunophenotyping distributions to suggest that it indeed can be used to objectively determine the relative distance of distributions compared to a single control. CONCLUSION: Probability Binning, as shown here, provides a useful metric for determining the probability that two or more flow cytometric data distributions are different. This metric can also be used to rank distributions to identify which are most similar or dissimilar. 
In addition, the algorithm can be used to quantitate contamination of even highly-overlapping populations. Finally, as demonstrated in an accompanying paper, Probability Binning can be used to gate on events that represent significantly different subsets from a control sample. Published 2001 Wiley-Liss, Inc.
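The core of Probability Binning, equal-frequency bins derived from the control and a normalized chi-squared on the test sample, can be sketched as follows. The normalization shown is one common form and may differ from the paper's exact definition; the Monte-Carlo conversion to a t-score-like metric is not implemented here:

```python
import numpy as np

def probability_binning_chi2(control, test, n_bins=10):
    """Divide the control sample into bins holding equal numbers of
    events (quantile bins), which minimizes the maximum expected
    variance for the control; then compare the test sample's bin
    occupancy to the control's with a normalized chi-squared."""
    # Interior quantile cut points define the control-derived bins.
    inner = np.quantile(control, np.linspace(0, 1, n_bins + 1))[1:-1]

    def bin_fractions(data):
        idx = np.searchsorted(inner, data, side="right")
        counts = np.bincount(idx, minlength=n_bins).astype(float)
        return counts / counts.sum()

    c = bin_fractions(control)
    t = bin_fractions(test)
    # One common normalized chi-squared form (assumption; the paper's
    # exact normalization may differ).
    return float(((t - c) ** 2 / (t + c)).sum())
```

Identical samples give a statistic of zero, and the statistic grows with the separation between the two distributions, which is what makes it usable for ranking samples against a control.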

4.
The generation interval is the interval between the time when an individual is infected by an infector and the time when this infector was infected. Its distribution underpins estimates of the reproductive number and hence informs public health strategies. Empirical generation-interval distributions are often derived from contact-tracing data. But linking observed generation intervals to the underlying generation interval required for modelling purposes is surprisingly not straightforward, and misspecifications can lead to incorrect estimates of the reproductive number, with the potential to misguide interventions to stop or slow an epidemic. Here, we clarify the theoretical framework for three conceptually different generation-interval distributions: the ‘intrinsic’ one typically used in mathematical models and the ‘forward’ and ‘backward’ ones typically observed from contact-tracing data, looking, respectively, forward or backward in time. We explain how the relationship between these distributions changes as an epidemic progresses and discuss how empirical generation-interval data can be used to correctly inform mathematical models.
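One concrete consequence of this framework can be checked numerically: during exponential growth at rate r, the backward generation-interval density is the intrinsic density tilted by exp(-r*tau), because recent infectors are over-represented, so observed backward intervals look shorter than intrinsic ones. A minimal sketch of that re-weighting (the input samples below are an arbitrary illustrative intrinsic distribution):

```python
import math

def backward_mean(intrinsic_samples, r):
    """Mean backward generation interval during exponential growth at
    rate r, obtained by re-weighting draws (or grid values) from the
    intrinsic interval distribution by exp(-r * tau)."""
    weights = [math.exp(-r * t) for t in intrinsic_samples]
    total = sum(weights)
    return sum(t * w for t, w in zip(intrinsic_samples, weights)) / total
```

At r = 0 (a flat epidemic) the backward and intrinsic means coincide; the faster the growth, the shorter the backward intervals, which is one of the biases the article warns against when fitting models to contact-tracing data.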

5.
Life cycle inventory data have multiple sources of uncertainty. These data uncertainties are often modeled using probability density functions, and in the ecoinvent database the lognormal distribution is used by default to model exchange uncertainty values. The aim of this article is to systematically measure the effect of this default distribution by changing from the lognormal to several other distribution functions and examining how this change affects the uncertainty of life cycle assessment results. Using the ecoinvent 2.2 inventory database, data uncertainty distributions are switched from the lognormal distribution to the normal, triangular, and gamma distributions. The effect of the distribution switching is assessed both for impact assessment results of individual product systems and for comparisons between product systems. Impact assessment results are generated using 5,000 Monte Carlo iterations for each product system, using the Intergovernmental Panel on Climate Change (IPCC) 2001 (100-year time frame) method. When comparing the lognormal distribution to the alternative default distributions, the difference in the resulting median and standard deviation values ranges from slight to significant, depending on the distributions used by default. However, the switch shows practically no effect on product system comparisons. Yet, impact assessment results are sensitive to how the data uncertainties are defined. In this article, we followed what we believe to be ecoinvent standard practice and preserved the “most representative” value. Practitioners should recognize that the most representative value can depart from the average of a probability distribution. Consistent default distribution choices are necessary when performing product system comparisons.
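The study design, swapping the default distribution while preserving a most-representative value and propagating with Monte Carlo, can be illustrated on a toy inventory. The product of two exchanges below stands in for an impact score, and all parameter values (representative value 1.0, spreads, 5,000 iterations) are illustrative assumptions, not ecoinvent values:

```python
import numpy as np

def mc_total(dist, n=5000, seed=0):
    """Monte Carlo 'impact score' of a toy product system: the product
    of two uncertain exchanges, each modelled by the named distribution
    around a most-representative value of 1.0 (illustrative spreads)."""
    rng = np.random.default_rng(seed)
    if dist == "lognormal":
        draw = lambda: rng.lognormal(mean=0.0, sigma=0.2, size=n)
    elif dist == "normal":
        draw = lambda: rng.normal(loc=1.0, scale=0.2, size=n)
    elif dist == "triangular":
        draw = lambda: rng.triangular(0.5, 1.0, 1.5, size=n)
    else:
        raise ValueError(dist)
    return draw() * draw()   # two independent exchanges per iteration
```

Running this for each distribution and comparing the medians and standard deviations of the 5,000 scores mirrors, in miniature, the article's comparison across default distribution choices: the medians stay anchored near the preserved representative value while the spreads differ.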

6.
Evaluating the relationship between a response variable and explanatory variables is important to establish better statistical models. Concordance probability is one measure of this relationship and is often used in biomedical research. Concordance probability can be seen as an extension of the area under the receiver operating characteristic curve. In this study, we propose estimators of concordance probability for time-to-event data subject to double censoring. A doubly censored time-to-event response is observed when either left or right censoring may occur. In the presence of double censoring, existing estimators of concordance probability lack desirable properties such as consistency and asymptotic normality. The proposed estimators use estimators of the left-censoring and right-censoring distributions as weights for each pair of cases, and reduce to the existing estimators in special cases. We show the statistical properties of the proposed estimators and evaluate their performance via numerical experiments.
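To fix ideas, the uncensored special case of concordance probability is the fraction of comparable pairs in which the higher risk score accompanies the shorter event time. A naive sketch of that estimator (the paper's censoring-distribution weights for doubly censored data are omitted):

```python
from itertools import combinations

def concordance(times, scores):
    """Naive concordance probability for fully observed time-to-event
    data: among pairs with distinct event times, the fraction where
    the shorter time carries the higher risk score (ties in score
    count one half)."""
    conc = ties = total = 0
    for (t1, s1), (t2, s2) in combinations(zip(times, scores), 2):
        if t1 == t2:
            continue                      # tied times are not comparable
        total += 1
        early, late = (s1, s2) if t1 < t2 else (s2, s1)
        if early > late:
            conc += 1
        elif early == late:
            ties += 1
    return (conc + 0.5 * ties) / total
```

A value of 1 means the score ranks event times perfectly, 0.5 means no better than chance; the doubly censored estimators in the article reduce to this form when no observation is censored.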

7.
Basket trials simultaneously evaluate the effect of one or more drugs on a defined biomarker, genetic alteration, or molecular target in a variety of disease subtypes, often called strata. A conventional approach for analyzing such trials is an independent analysis of each of the strata. This analysis is inefficient as it lacks the power to detect the effect of drugs in each stratum. To address these issues, various designs for basket trials have been proposed, centering on designs using Bayesian hierarchical models. In this article, we propose a novel Bayesian basket trial design that incorporates predictive sample size determination, early termination for inefficacy and efficacy, and the borrowing of information across strata. The borrowing of information is based on the similarity between the posterior distributions of the response probability. In general, Bayesian hierarchical models have many distributional assumptions along with multiple parameters. By contrast, our method has prior distributions for response probability and two parameters for similarity of distributions. The proposed design is easier to implement and less computationally demanding than other Bayesian basket designs. Through a simulation with various scenarios, our proposed design is compared with other designs including one that does not borrow information and one that uses a Bayesian hierarchical model.
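The abstract does not specify its similarity measure, so as one illustrative possibility only: with a Beta(1, 1) prior and binomial responses, each stratum's response probability has a Beta posterior, and the overlap coefficient between two strata's posteriors is a natural similarity on which borrowing could be based. A sketch under those assumptions:

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density via log-gamma, valid for non-integer a, b."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    lg = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
          + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))
    return math.exp(lg)

def posterior_overlap(resp1, n1, resp2, n2, grid=2000):
    """Overlap coefficient between the Beta(1+r, 1+n-r) posteriors of
    two strata's response probabilities: the integral of the pointwise
    minimum of the two densities (midpoint rule). One plausible
    similarity measure; the design's actual measure may differ."""
    a1, b1 = 1.0 + resp1, 1.0 + n1 - resp1
    a2, b2 = 1.0 + resp2, 1.0 + n2 - resp2
    total = 0.0
    for i in range(grid):
        x = (i + 0.5) / grid
        total += min(beta_pdf(x, a1, b1), beta_pdf(x, a2, b2))
    return total / grid
```

Strata with nearly identical response data score close to 1 and would share information heavily; strata with conflicting data score near 0 and would be analyzed almost independently.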

8.
Distribution characteristics of patch sizes in vegetation landscapes of the Beijing region   Cited by: 8 (0 self-citations, 8 by others)
The 1:200,000 vegetation map of the Beijing region was digitized with the GIS software ARC/INFO, and the area of each patch was extracted. The map contains 72 basic patch types, which belong to six broad categories: forest, shrubland, grassland, orchard, farmland, and water body, comprising 20, 28, 4, 7, 11, and 2 basic types, respectively. Several descriptive statistics (number of patches, total area, mean patch area, standard deviation, coefficient of variation, median, maximum patch area, minimum patch area, range, and skewness) and five probability distributions (gamma, lognormal, Weibull, exponential, and normal) were used to characterize the distribution of patch sizes. The results show that, except for a few basic types with very few patches, the patch size distributions of the remaining basic types and of all six broad categories are not symmetric but right-skewed. The ordinary normal distribution therefore cannot describe them; the other four probability distributions can each describe only some of the types, with the lognormal distribution fitting the most types and the negative exponential distribution the fewest.

9.
Evolutionary algorithms are, fundamentally, stochastic search procedures. Each next population is a probabilistic function of the current population. Various controls are available to adjust the probability mass function that is used to sample the space of candidate solutions at each generation. For example, the step size of a single-parent variation operator can be adjusted with a corresponding effect on the probability of finding improved solutions and the expected improvement that will be obtained. Examining these statistics as a function of the step size leads to a 'fitness distribution', a function that trades off the expected improvement at each iteration for the probability of that improvement. This paper analyzes the effects of adjusting the step size of Gaussian and Cauchy mutations, as well as a mutation that is a convolution of these two distributions. The results indicate that fitness distributions can be effective in identifying suitable parameter settings for these operators. Some comments on the utility of extending this protocol toward the general diagnosis of evolutionary algorithms are also offered.
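The trade-off a fitness distribution captures is easy to demonstrate on a toy problem. The sketch below estimates, for a Gaussian mutation of a given step size, both the probability of improvement and the expected improvement per trial; the 1-D sphere function, parent point, and step sizes are illustrative choices (the paper also treats Cauchy and convolved mutations, not shown):

```python
import random

def improvement_stats(step, n_trials=20000, seed=1):
    """Monte Carlo estimate of (probability of improvement, expected
    improvement per trial) for a Gaussian mutation of the given step
    size on the 1-D sphere function f(x) = x**2 at the parent x = 1.0."""
    rng = random.Random(seed)
    parent_fitness = 1.0                  # f(1.0)
    wins, gain = 0, 0.0
    for _ in range(n_trials):
        child = 1.0 + rng.gauss(0.0, step)
        if child * child < parent_fitness:
            wins += 1
            gain += parent_fitness - child * child
    return wins / n_trials, gain / n_trials
```

Tiny steps improve almost half the time but gain almost nothing; large steps improve less often but gain much more when they do. Plotting both quantities over a range of step sizes reproduces the kind of fitness distribution the paper uses to pick operator settings.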

10.
We prove that the generalized Poisson distribution GP(θ, η) (η ≥ 0) is a mixture of Poisson distributions; this is a new property for a distribution which is the topic of the book by Consul (1989). Because we find that the fits to count data of the generalized Poisson and negative binomial distributions are often similar, to understand their differences, we compare the probability mass functions and skewnesses of the generalized Poisson and negative binomial distributions with the first two moments fixed. They have slight differences in many situations, but their zero-inflated distributions, with masses at zero, means and variances fixed, can differ more. These probabilistic comparisons are helpful in selecting a better fitting distribution for modelling count data with long right tails. Through a real example of count data with a large zero fraction, we illustrate how the generalized Poisson and negative binomial distributions as well as their zero-inflated distributions can be discriminated.
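The moment matching the comparison rests on can be made concrete. A sketch using the standard pmfs (the paper's mixture proof is not reproduced; the GP parameterization assumes 0 ≤ η < 1):

```python
from math import exp, factorial, lgamma, log

def gen_poisson_pmf(k, theta, eta):
    """Generalized Poisson GP(theta, eta) pmf for 0 <= eta < 1:
    P(K=k) = theta * (theta + k*eta)**(k-1) * exp(-theta - k*eta) / k!"""
    return (theta * (theta + k * eta) ** (k - 1)
            * exp(-theta - k * eta) / factorial(k))

def neg_binomial_pmf(k, r, p):
    """Negative binomial pmf with (possibly non-integer) size r and
    success probability p, computed via log-gamma for stability."""
    return exp(lgamma(k + r) - lgamma(r) - lgamma(k + 1)
               + r * log(p) + k * log(1.0 - p))

def moment_matched_nb(theta, eta):
    """NB(r, p) sharing the first two moments of GP(theta, eta):
    mean = theta/(1-eta), variance = theta/(1-eta)**3."""
    mean = theta / (1.0 - eta)
    var = theta / (1.0 - eta) ** 3
    p = mean / var
    r = mean ** 2 / (var - mean)
    return r, p
```

With mean and variance fixed this way, plotting the two pmfs side by side (or their zero-inflated variants) is exactly the comparison the abstract describes for choosing between the two families.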

11.
Beisel CJ, Rokyta DR, Wichman HA, Joyce P. Genetics 2007, 176(4):2441-2449
In modeling evolutionary genetics, it is often assumed that mutational effects are assigned according to a continuous probability distribution, and multiple distributions have been used with varying degrees of justification. For mutations with beneficial effects, the distribution currently favored is the exponential distribution, in part because it can be justified in terms of extreme value theory, since beneficial mutations should have fitnesses in the extreme right tail of the fitness distribution. While the appeal to extreme value theory seems justified, the exponential distribution is but one of three possible limiting forms for tail distributions, with the other two loosely corresponding to distributions with right-truncated tails and those with heavy tails. We describe a likelihood-ratio framework for analyzing the fitness effects of beneficial mutations, focusing on testing the null hypothesis that the distribution is exponential. We also describe how to account for missing the smallest-effect mutations, which are often difficult to identify experimentally. This technique makes it possible to apply the test to gain-of-function mutations, where the ancestral genotype is unable to grow under the selective conditions. We also describe how to pool data across experiments, since we expect few possible beneficial mutations in any particular experiment.  
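The three limiting tail forms (exponential, right-truncated, heavy-tailed) are unified by the generalized Pareto distribution, whose shape parameter is zero in the exponential case, so the null test can be sketched as a likelihood-ratio comparison. The coarse grid search below is an illustrative stand-in for a proper maximization, and the null distribution of the statistic is not derived here:

```python
import math

def gpd_loglik(data, kappa, tau):
    """Log-likelihood of the generalized Pareto distribution with shape
    kappa and scale tau; kappa = 0 reduces to the exponential,
    kappa < 0 gives a right-truncated tail, kappa > 0 a heavy tail."""
    if tau <= 0:
        return -math.inf
    ll = 0.0
    for x in data:
        if abs(kappa) < 1e-12:
            ll += -math.log(tau) - x / tau
        else:
            z = 1.0 + kappa * x / tau
            if z <= 0:                      # x outside the support
                return -math.inf
            ll += -math.log(tau) - (1.0 / kappa + 1.0) * math.log(z)
    return ll

def exponential_lrt(data):
    """Likelihood-ratio statistic for H0: kappa = 0 (exponential) vs
    the generalized Pareto alternative, via a coarse grid search."""
    mean = sum(data) / len(data)
    ll0 = gpd_loglik(data, 0.0, mean)       # exponential MLE: tau = mean
    ll1 = ll0
    for ki in range(-20, 21):
        for ti in range(1, 41):
            ll1 = max(ll1, gpd_loglik(data, ki / 10.0, mean * ti / 10.0))
    return 2.0 * (ll1 - ll0)
```

Exponential-looking effect sizes yield a small statistic, while clearly truncated data (e.g. uniform effects, the kappa = -1 case) yield a large one, so the statistic separates the null from the truncated and heavy-tailed alternatives.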

12.
1. Intraspecific aggregation at a single spatial scale can promote the coexistence of competitors. This paper demonstrates how this same mechanism can be applied to the many systems that are patchy at two scales, with patches nested within 'superpatches'.
2. Data are presented from a field study showing that insects living in rotting fruits have aggregated distributions in the fruits under a single tree, and that the mean density and degree of aggregation varies significantly among trees. Observations in this system motivate the following models.
3. A model of competition has been developed between two species which explicitly represents spatial variation at two scales. By integrating the probability distributions for each scale, the marginal distributions of competitors over all patches can be found and used to calculate coexistence criteria. This model assumes global movement of the competitors.
4. Although spatial variation at a single scale may not be sufficient for coexistence, the total variation over all patches can allow coexistence. Variation in mean densities among superpatches and variation in the degree of aggregation among superpatches both promote coexistence, but act in different ways.
5. A second model of competition between two species is described which incorporates the effects of limited movement among superpatches. Limited movement among superpatches generally promotes coexistence, and also leads to correlations between aggregation and the mean densities of competitors.

13.
Words are built from smaller meaning-bearing parts, called morphemes. As one word can contain multiple morphemes, one morpheme can be present in different words. The number of distinct words a morpheme can be found in is its family size. Here we used Birth-Death-Innovation Models (BDIMs) to analyze the distribution of morpheme family sizes in English and German vocabulary over the last 200 years. Rather than just fitting to a probability distribution, these mechanistic models allow for the direct interpretation of identified parameters. Despite the complexity of language change, we indeed found that a specific variant of this pure stochastic model, the second-order linear balanced BDIM, significantly fitted the observed distributions. In this model, birth and death rates are increased for smaller morpheme families. This finding indicates an influence of morpheme family sizes on vocabulary changes. This could be an effect of word formation, perception, or both. On a more general level, we give an example of how mechanistic models can enable the identification of statistical trends in language change usually hidden by cultural influences.

14.
Bioclimatic models are the primary tools for simulating the impact of climate change on species distributions. Part of the uncertainty in the output of these models results from uncertainty in projections of future climates. To account for this, studies often simulate species responses to climates predicted by more than one climate model and/or emission scenario. One area of uncertainty, however, has remained unexplored: internal climate model variability. By running a single climate model multiple times, but each time perturbing the initial state of the model slightly, different but equally valid realizations of climate will be produced. In this paper, we identify how ongoing improvements in climate models can be used to provide guidance for impacts studies. In doing so we provide the first assessment of the extent to which this internal climate model variability generates uncertainty in projections of future species distributions, compared with variability between climate models. We obtained data on 13 realizations from three climate models (three from CSIRO Mark2 v3.0, four from GISS AOM, and six from MIROC v3.2) for two time periods: current (1985–1995) and future (2025–2035). Initially, we compared the simulated values for each climate variable (P, Tmax, Tmin, and Tmean) for the current period to observed climate data. This showed that climates simulated by realizations from the same climate model were more similar to each other than to realizations from other models. However, when projected into the future, these realizations followed different trajectories and the values of climate variables differed considerably within and among climate models. These had pronounced effects on the projected distributions of nine Australian butterfly species when modelled using the BIOCLIM component of DIVA-GIS. 
Our results show that internal climate model variability can lead to substantial differences in the extent to which the future distributions of species are projected to change. These can be greater than differences resulting from between-climate model variability. Further, different conclusions regarding the vulnerability of species to climate change can be reached due to internal model variability. Clearly, several climate models, each represented by multiple realizations, are required if we are to adequately capture the range of uncertainty associated with projecting species distributions in the future.

15.
Strug LJ, Hodge SE. Human Heredity 2006, 61(4):200-209
The 'multiple testing problem' currently bedevils the field of genetic epidemiology. Briefly stated, this problem arises with the performance of more than one statistical test and results in an increased probability of committing at least one Type I error. The accepted/conventional way of dealing with this problem is based on the classical Neyman-Pearson statistical paradigm and involves adjusting one's error probabilities. This adjustment is, however, problematic because in the process of doing that, one is also adjusting one's measure of evidence. Investigators have actually become wary of looking at their data, for fear of having to adjust the strength of the evidence they observed at a given locus on the genome every time they conduct an additional test. In a companion paper in this issue (Strug & Hodge I), we presented an alternative statistical paradigm, the 'evidential paradigm', to be used when planning and evaluating linkage studies. The evidential paradigm uses the lod score as the measure of evidence (as opposed to a p value), and provides new, alternatively defined error probabilities (alternative to Type I and Type II error rates). We showed how this paradigm separates or decouples the two concepts of error probabilities and strength of the evidence. In the current paper we apply the evidential paradigm to the multiple testing problem - specifically, multiple testing in the context of linkage analysis. We advocate using the lod score as the sole measure of the strength of evidence; we then derive the corresponding probabilities of being misled by the data under different multiple testing scenarios. We distinguish two situations: performing multiple tests of a single hypothesis, vs. performing a single test of multiple hypotheses. For the first situation the probability of being misled remains small regardless of the number of times one tests the single hypothesis, as we show. 
For the second situation, we provide a rigorous argument outlining how replication samples themselves (analyzed in conjunction with the original sample) constitute appropriate adjustments for conducting multiple hypothesis tests on a data set.
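For readers outside linkage analysis, the lod score the authors advocate as the sole evidence measure is just a base-10 log likelihood ratio. A minimal sketch for the simplest phase-known case (counts of recombinants among informative meioses; the paper's error-probability derivations are not reproduced):

```python
import math

def lod_score(recombinants, total, theta):
    """Lod score for simple phase-known linkage data: log10 of the
    likelihood ratio of recombination fraction theta (0 < theta < 1)
    against free recombination (theta = 0.5)."""
    r, n = recombinants, total

    def loglik(th):
        return r * math.log(th) + (n - r) * math.log(1.0 - th)

    return (loglik(theta) - loglik(0.5)) / math.log(10.0)
```

The evidential paradigm's point is that this number is the strength of evidence at a locus and, unlike a p value, it does not get adjusted each time another test is performed.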

16.
1.  Most species' surveys and biodiversity inventories are limited by time and money. Therefore, it would be extremely useful to develop predictive models of animal distributions based on habitat, and to use these models to estimate species' densities and range sizes in poorly sampled regions.
2.  In this study, two sets of data were collected. The first set consisted of over 2000 butterfly transect counts, which were used to determine the relative density of each species in 16 major habitat types in a 35-km2 area of fragmented landscape in north-west Wales. For the second set of data, the area was divided into 140 cells using a 500-m grid, and the extent of each habitat and the presence or absence of each butterfly and moth species was determined for each cell.
3.  Logistic regression was used to model the relationship between species' distribution and predicted density, based on habitat extent, in each grid square. The resultant models were used to predict butterfly distributions and occupancy at a range of spatial scales.
4.  Using a jack-knife procedure, our models successfully reclassified the presence or absence of species in a high percentage of grid squares (mean 83% agreement). There were highly significant relationships between the modelled probability of species occurring at regional and local scales and the number of grid squares occupied at those scales.
5.  We conclude that basic habitat data can be used to predict insect distributions and relative densities reasonably well within a fragmented landscape. It remains to be seen how accurate these predictions will be over a wider area.
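The modelling step in point 3 can be sketched with a tiny logistic regression of presence/absence on habitat extent. The gradient-ascent fitter and the one-variable synthetic data below are illustrative stand-ins for the study's models and survey data:

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Logistic regression (intercept + one coefficient per habitat
    variable) fitted by plain gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = yi - p                  # gradient of the log-likelihood
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / len(y) for wj, g in zip(w, grad)]
    return w

def predict_presence(w, xi):
    """Predicted probability that the species occupies a grid cell."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))
```

Reclassifying the training cells with the fitted model and counting the agreement mirrors, in simplified form, the jack-knife check reported in point 4.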

17.
This study compares two approaches for constructing diatom-based indices for monitoring river eutrophication. The first approach is based on weighted averaging of species indicator values with the underlying assumption that species have symmetrical unimodal distributions along the nutrient gradient, and their distributions are sufficiently described by a single indicator value per species. The second approach uses multiple indicator values for individual taxa and is based on the possibility that species have complex asymmetrical response curves. Multiple indicator values represent relative probabilities that a species would be found within certain ranges of nutrient concentration. We used 155 benthic diatom samples collected from rivers in the Northern Piedmont ecoregion (Northeastern U.S.A.) to construct two datasets: one used for developing models and indices, and another for testing them. To characterize the shape of species response curves we analyzed changes in the relative abundance of 118 diatom taxa common in this dataset along the total phosphorus (TP) gradient by fitting parametric and non-parametric regression models. We found that only 34 diatoms had a symmetrical unimodal response to TP. Among several indices that use a single indicator value for each species, the best was the weighted averaging partial least squares (WA-PLS) inference model. The correlation coefficient between observed and inferred TP in the test dataset was 0.67. The best index that employed multiple indicator values for each species had approximately the same predictive power as the WA-PLS based index, but in addition, this index provided a sample-specific measure of uncertainty for the TP estimation.
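The first approach, plain weighted averaging, is compact enough to sketch: each taxon's single indicator value is its abundance-weighted TP optimum, and a sample's inferred TP is the abundance-weighted mean of its taxa's optima. This omits the PLS components and deshrinking of WA-PLS, and the toy data are hypothetical:

```python
def wa_optima(abundances, env):
    """Weighted-averaging optimum (single indicator value) per taxon:
    the abundance-weighted mean of the environmental variable.
    `abundances[s][t]` is the abundance of taxon t in sample s."""
    n_taxa = len(abundances[0])
    optima = []
    for t in range(n_taxa):
        num = sum(a[t] * e for a, e in zip(abundances, env))
        den = sum(a[t] for a in abundances)
        optima.append(num / den)
    return optima

def wa_infer(sample, optima):
    """Infer a sample's environmental value as the abundance-weighted
    mean of its taxa's optima (plain WA, no deshrinking)."""
    return sum(a * o for a, o in zip(sample, optima)) / sum(sample)
```

The second approach in the article replaces each taxon's single optimum with a vector of relative probabilities over TP ranges, which is what buys the sample-specific uncertainty estimate.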

18.
Random trees and random characters can be used in null models for testing phylogenetic hypotheses. We consider three interpretations of random trees: first, that trees are selected from the set of all possible trees with equal probability; second, that trees are formed by random speciation or coalescence (equivalent); and third, that trees are formed by a series of random partitions of the taxa. We consider two interpretations of random characters: first, that the number of taxa with each state is held constant, but the states are randomly reshuffled among the taxa; and second, that the probability each taxon is assigned a particular state is constant from one taxon to the next. Under null models representing various combinations of randomizations of trees and characters, exact recursion equations are given to calculate the probability distribution of the number of character state changes required by a phylogenetic tree. Possible applications of these probability distributions are discussed. They can be used, for example, to test for a panmictic population structure within a species or to test phylogenetic inertia in a character's evolution. Whether and how a null model incorporates tree randomness makes little difference to the probability distribution in many but not all circumstances. The null model's sense of character randomness appears more critical. The difficult issue of choosing a null model is discussed.
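The quantity being distributed here, the minimum number of state changes a character requires on a tree, is computable by Fitch parsimony, and the first character-randomization (states reshuffled, state counts held constant) can be approximated by simulation in place of the paper's exact recursions. A sketch for a fixed rooted binary tree:

```python
import random

def fitch_changes(tree, states):
    """Minimum number of state changes on a rooted binary tree (Fitch
    parsimony). `tree` is a nested tuple of leaf indices; `states[i]`
    is the character state of leaf i."""
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, int):
            return {states[node]}
        left, right = visit(node[0]), visit(node[1])
        inter = left & right
        if inter:
            return inter
        changes += 1                      # disjoint sets force a change
        return left | right

    visit(tree)
    return changes

def null_distribution(tree, states, n=1000, seed=0):
    """Simulated null distribution of the change count under the first
    character randomization: states reshuffled among taxa, with the
    number of taxa in each state held constant."""
    rng = random.Random(seed)
    vals = list(states)
    counts = {}
    for _ in range(n):
        rng.shuffle(vals)
        c = fitch_changes(tree, vals)
        counts[c] = counts.get(c, 0) + 1
    return {c: counts[c] / n for c in sorted(counts)}
```

Comparing an observed change count to this null distribution is the kind of test the abstract describes, e.g. for phylogenetic inertia in a character's evolution.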

19.
The mechanisms by which excitatory and inhibitory input impulse sequences interact to change the spike probability of neurons are examined in two mathematical neuron models: a real-time neuron model that is close to physiological reality, and a stochastic automaton model for temporal pattern discrimination proposed in a previous paper (Tsukada et al., 1976), which is developed here into a neuron model for the interaction of excitatory and inhibitory input impulse sequences. The interval distributions of the output spike trains from these models tend to be multimodal and are compared with those reported by Bishop et al. (1964) for geniculate neuron activity and with the Poisson process deletion model analyzed by Ten Hoopen et al. (1966). Special attention is paid to how different forms of inhibitory input are transformed into output interval distributions through these neuron models. The results exhibit a clear correlation between inhibitory input form and output interval distribution. More detailed information on this mechanism is obtained by computing the recurrence time, under the stationary condition, for the active state to return to itself for the first time, which is influenced by the form of the inhibitory input. In addition, some resulting characteristics of the interval histogram and serial correlation are discussed in relation to physiological data from the literature.

20.
A rough guide to population change in exploited fish stocks   Cited by: 2 (0 self-citations, 2 by others)
R. Cook. Ecology Letters 2000, 3(5):394-398
Interpreting how populations will change in response to exploitation is essential to the sound management of fish stocks. While deterministic models can be of use in evaluating sustainable fishing rates, the inherent variability of fish populations limits their value. In this paper a probabilistic approach is investigated which avoids having to make strong assumptions about the functional relationship between spawning stock size and the annual number of young fish (recruits) produced. Empirical probability distributions for recruits are derived, conditioned on stock size, and used to indicate likely stock changes under different fishing mortality rates. The method is applied to cod (Gadus morhua) in the North Sea to illustrate how population change can be inferred and used by fishery managers to choose fishing mortality rates which are likely to achieve sustainable exploitation.
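The conditioning step, pooling historical recruitments from years with similar spawning stock and reading off an empirical distribution, can be sketched simply. Splitting stock sizes at the median is an assumption for illustration; the paper conditions on stock size more finely:

```python
import statistics

def recruitment_quantiles(stock, recruits, ssb, probs=(0.1, 0.5, 0.9)):
    """Empirical recruitment distribution conditioned on stock size:
    pool recruitments from years whose spawning stock biomass fell in
    the same half (below/above the median) as `ssb`, and return the
    requested empirical quantiles."""
    cut = statistics.median(stock)
    pool = sorted(r for s, r in zip(stock, recruits)
                  if (s <= cut) == (ssb <= cut))
    return [pool[min(len(pool) - 1, int(p * len(pool)))] for p in probs]
```

Feeding these conditional quantiles into a projection under a candidate fishing mortality rate gives the kind of probabilistic statement about stock change, without assuming a stock-recruitment curve, that the paper uses for management advice.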
