Similar Literature
20 similar records found.
1.
The power prior has been widely used to discount the amount of information borrowed from historical data in the design and analysis of clinical trials. It is realized by raising the likelihood function of the historical data to a power parameter $\delta \in [0, 1]$, which quantifies the heterogeneity between the historical and the new study. In a fully Bayesian approach, a natural extension is to assign a hyperprior to δ such that the posterior of δ can reflect the degree of similarity between the historical and current data. To comply with the likelihood principle, an extra normalizing factor needs to be calculated, and such a prior is known as the normalized power prior. However, the normalizing factor involves an integral of a prior multiplied by a fractional likelihood and needs to be computed repeatedly over different δ during the posterior sampling. This makes its use prohibitive in practice for most elaborate models. This work provides an efficient framework to implement the normalized power prior in clinical studies. It bypasses the aforementioned computation by sampling from the power prior with $\delta = 0$ and $\delta = 1$ only. Such a posterior sampling procedure can facilitate the use of a random δ with adaptive borrowing capability in general models. The numerical efficiency of the proposed method is illustrated via extensive simulation studies, a toxicological study, and an oncology study.
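In conjugate settings the normalizing factor has a closed form, which makes the idea easy to illustrate. Below is a minimal beta-binomial sketch of the normalized power prior posterior for δ evaluated on a grid; the data values and the uniform hyperprior are assumptions for illustration, and this is not the paper's sampling-based algorithm (which avoids computing the normalizing factor in general models).

```python
import numpy as np
from scipy.special import betaln

# Hypothetical numbers (not from the paper): historical and current binomial data
a, b = 1.0, 1.0            # initial Beta(a, b) prior for the response rate theta
y0, n0 = 14, 40            # historical trial: successes / sample size
y1, n1 = 11, 45            # current trial: successes / sample size

delta = np.linspace(0.001, 1.0, 200)   # grid for the power parameter

# log normalizing factor: integral of L(theta | D0)^delta * pi0(theta) d(theta),
# up to the constant B(a, b), which cancels in the posterior of delta
log_norm = betaln(a + delta * y0, b + delta * (n0 - y0))

# same integral with the current data added in
log_joint = betaln(a + delta * y0 + y1, b + delta * (n0 - y0) + (n1 - y1))

# posterior of delta under a uniform hyperprior on [0, 1] (normalized power prior)
log_post = log_joint - log_norm
post = np.exp(log_post - log_post.max())
post /= post.sum() * (delta[1] - delta[0])      # normalize on the grid
print("posterior mode of delta:", round(delta[post.argmax()], 3))
```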

2.
A recent method for estimating a lower bound of the population size in capture–recapture samples is studied. Specifically, some asymptotic properties, such as strong consistency and asymptotic normality, are provided. The introduced estimator is based on the empirical probability generating function (pgf) of the observed data, and it is consistent for count distributions having a log-convex pgf (-class). This is a large family that includes mixed and compound Poisson distributions, as well as their independent sums and finite mixtures. The finite-sample performance of the lower bound estimator is assessed via simulation, showing better behavior than some close competitors. Several examples of application are also analyzed and discussed.

3.
The decision curve plots the net benefit of a risk model for making decisions over a range of risk thresholds, corresponding to different ratios of misclassification costs. We discuss three methods to estimate the decision curve, together with corresponding methods of inference and methods to compare two risk models at a given risk threshold. One method uses risks (R) and a binary event indicator (Y) on the entire validation cohort. This method makes no assumptions about how well calibrated the risk model is or about the incidence of disease in the population, and it is comparatively robust to model miscalibration. If one assumes that the model is well calibrated, one can compute a much more precise estimate of the net benefit based on the risks R alone. However, if the risk model is miscalibrated, serious bias can result. Case–control data can also be used to estimate the net benefit if the incidence (or prevalence) of the event is known. This strategy has comparable efficiency to using the full data, and its efficiency is only modestly less than that for the full data if the incidence is estimated from the mean of Y. We estimate variances using influence functions and propose a bootstrap procedure to obtain simultaneous confidence bands around the decision curve for a range of thresholds. The influence function approach to estimating variances can also be applied to cohorts derived from complex survey samples instead of simple random samples.
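For concreteness, here is a minimal sketch of the cohort-based estimator using the standard decision-curve definition of net benefit at a threshold t (true positives credited, false positives penalized by the odds t/(1 − t)); the function and the simulated data are illustrative and not taken from the paper.

```python
import numpy as np

def net_benefit(risk, y, t):
    """Empirical net benefit of 'treat if risk >= t' on a validation cohort.

    risk : array of model-based risks R in [0, 1]
    y    : array of binary event indicators Y
    t    : risk threshold
    """
    risk, y = np.asarray(risk), np.asarray(y)
    n = len(y)
    tp = np.sum((risk >= t) & (y == 1)) / n      # true-positive proportion
    fp = np.sum((risk >= t) & (y == 0)) / n      # false-positive proportion
    return tp - fp * t / (1 - t)

# toy example with simulated risks and outcomes
rng = np.random.default_rng(1)
r = rng.uniform(size=1000)
y = rng.binomial(1, r)                            # well calibrated by construction
print([round(net_benefit(r, y, t), 3) for t in (0.1, 0.2, 0.3)])
```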

4.
Count phenotypes with excessive zeros are often observed in the biological world. Researchers have studied many statistical methods for mapping the quantitative trait loci (QTLs) of zero-inflated count phenotypes. However, most of the existing methods consist of finding the approximate positions of the QTLs on the chromosome by genome-wide scanning, and most use the EM algorithm for parameter estimation. In this paper, we propose a Bayesian interval mapping scheme of QTLs for zero-inflated count data. The method takes advantage of a zero-inflated generalized Poisson (ZIGP) regression model to study the influence of QTLs on the zero-inflated count phenotype. An MCMC algorithm is used to estimate the effects and position parameters of QTLs, and the Haldane map function is used to convert between recombination rate and map distance. Monte Carlo simulations are conducted to test the applicability and advantage of the proposed method. The effects of QTLs on the formation of mouse cholesterol gallstones are demonstrated by analyzing a mouse data set.
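A minimal sketch of the ZIGP probability mass function under one common parameterization (Consul's generalized Poisson with rate λ, dispersion ξ, and zero-inflation weight ω); the paper's exact parameterization and its regression/MCMC machinery are not reproduced here, and the parameter values are illustrative.

```python
import numpy as np
from scipy.special import gammaln

def gp_logpmf(y, lam, xi):
    """Generalized Poisson log-pmf:
    P(Y=y) = lam * (lam + y*xi)**(y-1) * exp(-lam - y*xi) / y!
    (valid for lam > 0 and 0 <= xi < 1)."""
    y = np.asarray(y)
    return (np.log(lam) + (y - 1) * np.log(lam + y * xi)
            - (lam + y * xi) - gammaln(y + 1))

def zigp_pmf(y, lam, xi, omega):
    """Zero-inflated GP: extra mass omega at zero, (1 - omega) times GP elsewhere."""
    y = np.asarray(y)
    p = (1 - omega) * np.exp(gp_logpmf(y, lam, xi))
    return np.where(y == 0, omega + p, p)

y = np.arange(0, 15)
print(zigp_pmf(y, lam=2.0, xi=0.3, omega=0.4).round(4))
print("total mass over 0..150:",
      zigp_pmf(np.arange(151), 2.0, 0.3, 0.4).sum().round(4))
```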

5.
When the objective is to administer the better of two treatments to an individual, it is necessary to know his or her individual treatment effect (ITE) and the correlation between the potential responses (PRs) under treatments 1 and 0. Data generated in a parallel-group RCT do not allow the ITE to be determined, because only two samples from the marginal distributions of these PRs are observed, and not the corresponding joint distribution. This is due to the "fundamental problem of causal inference." Here, we present a counterfactual approach for estimating the joint distribution of two normally distributed responses to two treatments. This joint distribution of the PRs can be estimated by assuming a bivariate normal distribution for the PRs and by using a normally distributed baseline biomarker functionally related to their sum. Such a functional relationship is plausible, since a biomarker and the sum of the PRs encode the same information in an RCT, namely the variation between subjects. The estimation of the joint trivariate distribution is subject to some constraints. These constraints can be framed in the context of linear regressions with regard to the proportions of variance explained in the responses and with regard to the residual variation. This provides new insights into the presence of treatment–biomarker interactions. We applied our approach to example data on exercise and heart rate and extended the approach to survival data.
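The identifiability issue can be illustrated with a small simulation: two bivariate normal models with very different within-pair correlations produce observationally identical parallel-group data. This is a generic sketch with assumed means and variances, not the paper's estimation method.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100000
mu1, mu0, sd = 2.0, 1.0, 1.0

for rho in (0.2, 0.8):                      # two very different joint distributions
    cov = [[sd**2, rho * sd**2], [rho * sd**2, sd**2]]
    y1, y0 = rng.multivariate_normal([mu1, mu0], cov, size=n).T
    arm = rng.integers(0, 2, size=n)        # 1:1 randomization
    obs_t = y1[arm == 1]                    # observed response under treatment 1
    obs_c = y0[arm == 0]                    # observed response under treatment 0
    print(f"rho={rho}: treated mean/sd = {obs_t.mean():.2f}/{obs_t.std():.2f}, "
          f"control mean/sd = {obs_c.mean():.2f}/{obs_c.std():.2f}")

# The observed marginal summaries are (statistically) identical for both rho values,
# so the distribution of the ITE is not identified without extra information
# such as a baseline biomarker related to the sum of the potential responses.
```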

6.
Regression modelling is a powerful statistical tool often used in biomedical and clinical research. It can be formulated as an inverse problem that measures the discrepancy between the target outcome and the data produced by a representation of the modelled predictors. This approach can simultaneously perform variable selection and coefficient estimation. We focus particularly on the linear regression setting, in which the parameter of interest collects the regression coefficients. The inverse problem seeks an estimate of this parameter, which is mapped by the linear operator (the design matrix) to the observed outcome data; equivalently, one looks for a solution in the corresponding affine subspace. However, in the presence of collinearity, high-dimensional data and a high condition number of the related covariance matrix, the solution may not be unique, so prior information must be introduced to reduce the solution set and regularize the inverse problem. Informed by Huber's robust statistics framework, we propose an optimal regularizer for the regression problem. We compare results of the proposed method and other penalized regression regularization methods (ridge, lasso, adaptive lasso and elastic net) under strong conditions such as a high condition number of the covariance matrix and high error amplitude, on both simulated and real data from the South London Stroke Register. The proposed approach can be extended to mixed regression models. Our inverse problem framework coupled with robust statistics methodology offers new insights into statistical regression and learning, and could open new directions for model fitting and learning.
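As background for the comparison reported above, a small sketch that fits the standard penalized baselines on an ill-conditioned simulated design with scikit-learn; the paper's Huber-informed optimal regularizer is not implemented here, and all numbers are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
n, p = 100, 20
# highly collinear design -> large condition number of X'X
base = rng.normal(size=(n, 1))
X = base + 0.01 * rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]
y = X @ beta + rng.normal(scale=2.0, size=n)     # large error amplitude

print("condition number of X'X:", round(np.linalg.cond(X.T @ X), 1))
for name, model in [("OLS", LinearRegression()),
                    ("ridge", Ridge(alpha=1.0)),
                    ("lasso", Lasso(alpha=0.1, max_iter=10000)),
                    ("elastic-net", ElasticNet(alpha=0.1, l1_ratio=0.5, max_iter=10000))]:
    coef = model.fit(X, y).coef_
    print(f"{name:12s} L2 error of coefficients: {np.linalg.norm(coef - beta):.2f}")
```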

7.
Count data sets are traditionally analyzed using the ordinary Poisson distribution. However, this model has limited applicability, as it can be too restrictive to handle specific data structures. In that case, the need arises for alternative models that accommodate, for example, (a) zero-modification (inflation or deflation of the frequency of zeros), (b) overdispersion, and (c) individual heterogeneity arising from clustering or repeated (correlated) measurements made on the same subject. Cases (a)–(b) and (b)–(c) are often treated together in the statistical literature, with several practical applications, but models supporting all three at once are less common. Hence, this paper's primary goal is to jointly address these issues by deriving a mixed-effects regression model based on the hurdle version of the Poisson–Lindley distribution. In this framework, zero-modification is incorporated by assuming that a binary probability model determines which outcomes are zero-valued, and a zero-truncated process is responsible for generating positive observations. Approximate posterior inferences for the model parameters were obtained from a fully Bayesian approach based on the Adaptive Metropolis algorithm. Intensive Monte Carlo simulation studies were performed to assess the empirical properties of the Bayesian estimators. The proposed model was applied to the analysis of a real data set, and its competitiveness with some well-established mixed-effects models for count data was evaluated. A sensitivity analysis to detect observations that may impact parameter estimates was performed based on standard divergence measures. The Bayesian p-value and randomized quantile residuals were used for model diagnostics.
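A minimal sketch of the hurdle Poisson–Lindley probability mass function, using the standard one-parameter Poisson–Lindley form; the mixed-effects regression structure and the Adaptive Metropolis sampler are not reproduced, and the parameter values are illustrative.

```python
import numpy as np

def pl_pmf(y, theta):
    """Poisson-Lindley pmf: theta^2 * (y + theta + 2) / (theta + 1)^(y + 3)."""
    y = np.asarray(y)
    return theta**2 * (y + theta + 2) / (theta + 1) ** (y + 3)

def hurdle_pl_pmf(y, pi0, theta):
    """Hurdle model: P(0) = pi0; positives follow the zero-truncated PL distribution."""
    y = np.asarray(y)
    trunc = pl_pmf(y, theta) / (1 - pl_pmf(0, theta))
    return np.where(y == 0, pi0, (1 - pi0) * trunc)

y = np.arange(0, 12)
print(hurdle_pl_pmf(y, pi0=0.35, theta=1.5).round(4))
```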

8.
The question of how individual patient data from cohort studies or historical clinical trials can be leveraged for designing more powerful, or smaller yet equally powerful, clinical trials becomes increasingly important in the era of digitalization. Today, traditional statistical analysis approaches may seem questionable to practitioners in light of ubiquitous historical prognostic information. Several methodological developments aim at incorporating historical information in the design and analysis of future clinical trials, most importantly Bayesian information borrowing, propensity score methods, stratification, and covariate adjustment. Adjusting the analysis with respect to a prognostic score obtained from some model applied to historical data has received renewed interest from a machine learning perspective, and we study the potential of this approach for randomized clinical trials. In an idealized situation of a normal outcome in a two-arm trial with 1:1 allocation, we derive a simple sample size reduction formula as a function of two criteria characterizing the prognostic score: (1) the coefficient of determination $R^2$ on historical data and (2) the correlation ρ between the estimated and the true unknown prognostic scores. While maintaining the same power, the original total sample size n planned for the unadjusted analysis reduces to $(1 - R^2 \rho^2) \times n$ in an adjusted analysis. Robustness in less ideal situations was assessed empirically. We conclude that there is potential for substantially more powerful or smaller trials, but only when prognostic scores can be accurately estimated.
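The sample size reduction formula lends itself to a quick worked example; the planning numbers below are hypothetical.

```python
import math

def adjusted_sample_size(n, r2, rho):
    """Total sample size when adjusting for a prognostic score,
    per the abstract's formula (1 - R^2 * rho^2) * n."""
    return math.ceil((1 - r2 * rho**2) * n)

# illustrative numbers (not from the paper): n = 300 planned for the unadjusted
# analysis, R^2 = 0.5 on historical data, rho = 0.9 between estimated and true scores
print(adjusted_sample_size(300, 0.5, 0.9))   # -> 179, roughly a 40% reduction
```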

9.
In scientific research, many hypotheses relate to the comparison of two independent groups. Usually, it is of interest to use a design (i.e., the allocation of sample sizes m and n for a fixed total m + n) that maximizes the power of the applied statistical test. It is known that the two-sample t-tests for homogeneous and heterogeneous variances may lose substantial power when variances are unequal but equally large samples are used. We demonstrate that this is not the case for the nonparametric Wilcoxon–Mann–Whitney test, whose application in biometrical research fields is motivated by two examples from cancer research. We prove the optimality of the balanced design (m = n) in the case of symmetric and identically shaped distributions using normal approximations, and show that this design generally offers power only negligibly lower than the optimal design for a wide range of distributions.
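A simulation along these lines can compare allocations under unequal variances using the Wilcoxon–Mann–Whitney test; the distributional settings below are assumptions for illustration, not the paper's scenarios.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def wmw_power(m, n, sd_x=1.0, sd_y=3.0, shift=0.8, reps=2000, alpha=0.05, seed=0):
    """Simulated power of the two-sided Wilcoxon-Mann-Whitney test for normal
    data with unequal variances and a location shift (illustrative values)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.normal(0.0, sd_x, m)
        y = rng.normal(shift, sd_y, n)
        if mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha:
            hits += 1
    return hits / reps

total = 100
for m in (30, 50, 70):                      # different allocations with m + n fixed
    print(m, total - m, round(wmw_power(m, total - m), 3))
```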

10.

Aim

Theoretically, woody biomass turnover time (τ) quantified using outflux (i.e., tree mortality) predicts biomass dynamics better than using influx (i.e., productivity). This study aims to use forest inventory data to empirically test the outflux approach and generate a spatially explicit understanding of woody τ in mature forests. We further compared woody τ estimates with dynamic global vegetation models (DGVMs) and with a data assimilation product of C stocks and fluxes (CARDAMOM).

Location

Continents.

Time Period

Historical data from 1951 to 2018.

Major Taxa Studied

Trees and forests.

Methods

We compared the approaches of using outflux versus influx for estimating woody τ and predicting biomass accumulation rates. We investigated abiotic and biotic drivers of spatial variation in woody τ and generated a spatially explicit map of woody τ at a 0.25-degree resolution across continents using machine learning. We further examined whether six DGVMs and CARDAMOM generally captured the observational pattern of woody τ.

Results

Woody τ quantified by the outflux approach predicted biomass accumulation rates better ($R^2$ of 0.4–0.5) than the influx approach did ($R^2$ of 0.1–0.4) across continents. We found large spatial variation in woody τ for mature forests, with the highest values in temperate forests (98.8 ± 2.6 y), followed by boreal forests (73.9 ± 3.6 y) and tropical forests. The map of woody τ extrapolated from plot data showed higher values in the wetter eastern and Pacific coast USA, Africa, and the eastern Amazon. Climate (temperature and aridity index) and vegetation structure (tree density and forest age) were the dominant drivers of woody τ across continents. The highest woody τ in temperate forests was not captured by either the DGVMs or CARDAMOM.

Main Conclusions

Our study empirically demonstrates that the outflux approach is preferable to the influx approach for estimating woody τ when predicting biomass accumulation rates. The spatially explicit map of woody τ and the underlying drivers provide valuable information to improve the representation of forest demography and carbon turnover processes in DGVMs.
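For reference, the two turnover-time definitions compared in this study amount to dividing the biomass stock by either the outflux or the influx; the numbers below are hypothetical.

```python
# Hypothetical mature-forest carbon pool and fluxes (illustrative units)
woody_biomass_c = 12.0        # kg C m^-2, standing woody biomass stock
mortality_flux  = 0.15        # kg C m^-2 yr^-1, outflux (tree mortality)
woody_npp       = 0.20        # kg C m^-2 yr^-1, influx (woody productivity)

tau_outflux = woody_biomass_c / mortality_flux   # turnover time from outflux
tau_influx  = woody_biomass_c / woody_npp        # turnover time from influx
print(f"tau (outflux) = {tau_outflux:.0f} y, tau (influx) = {tau_influx:.0f} y")
```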

11.
We study bias arising as a result of nonlinear transformations of random variables in random or mixed effects models and its effect on inference in group-level studies or in meta-analysis. The findings are illustrated on the example of overdispersed binomial distributions, where we demonstrate considerable biases arising from the standard log-odds and arcsine transformations of the estimated probability, both for single-group studies and when combining results from several groups or studies in meta-analysis. Our simulations confirm that these biases are linear in ρ for small values of ρ, the intracluster correlation coefficient. The biases do not depend on the sample sizes or the number of studies K in a meta-analysis, and they result in abysmal coverage of the combined effect for large K. We also propose a bias correction for the arcsine transformation. Our simulations demonstrate that this bias correction works well for small values of the intraclass correlation. The methods are applied to two examples of meta-analyses of prevalence.
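The clustering-induced bias of the arcsine transformation is easy to reproduce with a beta-binomial simulation; the settings below are illustrative, and the sketch does not implement the paper's bias correction.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n_clusters, cluster_size, reps = 0.1, 50, 20, 20000
true_val = np.arcsin(np.sqrt(p))

for rho in (0.05, 0.10, 0.20):                 # intracluster correlation
    a = p * (1 - rho) / rho                    # beta-binomial shape parameters
    b = (1 - p) * (1 - rho) / rho
    est = np.empty(reps)
    for i in range(reps):
        probs = rng.beta(a, b, n_clusters)     # cluster-level event probabilities
        x = rng.binomial(cluster_size, probs).sum()
        est[i] = np.arcsin(np.sqrt(x / (n_clusters * cluster_size)))
    print(f"rho={rho:.2f}  bias of arcsine-transformed estimate: {est.mean() - true_val:+.4f}")
```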

12.
K.O. Ekvall & M. Bottai, Biometrics, 2023, 79(3): 2286–2297
We propose a unified framework for likelihood-based regression modeling when the response variable has finite support. Our work is motivated by the fact that, in practice, observed data are discrete and bounded. The proposed methods assume a model which includes models previously considered for interval-censored variables with log-concave distributions as special cases. The resulting log-likelihood is concave, which we use to establish asymptotic normality of its maximizer as the number of observations n tends to infinity with the number of parameters d fixed, and rates of convergence of $L_1$-regularized estimators when the true parameter vector is sparse and d and n both tend to infinity with $\log(d)/n \rightarrow 0$. We consider an inexact proximal Newton algorithm for computing estimates and give theoretical guarantees for its convergence. The range of possible applications is wide, including but not limited to survival analysis in discrete time, the modeling of outcomes on scored surveys and questionnaires, and, more generally, interval-censored regression. The applicability and usefulness of the proposed methods are illustrated in simulations and data examples.
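As background, the L1 proximal operator (soft-thresholding) that underlies such regularized estimators, together with a plain proximal gradient step on a toy quadratic loss; the paper's inexact proximal Newton algorithm and its concave log-likelihood are not reproduced here.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_gradient_step(beta, grad_fn, step, lam):
    """One step of proximal gradient descent on loss(beta) + lam * ||beta||_1."""
    return soft_threshold(beta - step * grad_fn(beta), step * lam)

# toy quadratic "negative log-likelihood" 0.5 * ||X beta - y||^2
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 10)), rng.normal(size=50)
grad = lambda b: X.T @ (X @ b - y)

beta = np.zeros(10)
for _ in range(200):
    beta = proximal_gradient_step(beta, grad, step=1e-2, lam=5.0)
print(np.round(beta, 3))    # many coefficients shrunk exactly to zero
```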

13.
This paper is motivated by the GH-2000 biomarker test, though the discussion is applicable to other diagnostic tests. The GH-2000 biomarker test has been developed as a powerful technique to detect growth hormone misuse by athletes, based on the GH-2000 score. Decision limits on the GH-2000 score have been developed and incorporated into the guidelines of the World Anti-Doping Agency (WADA). These decision limits are constructed, however, under the assumption that the GH-2000 score follows a normal distribution. As it is difficult to affirm the normality of a distribution based on a finite sample, nonparametric decision limits, readily available in the statistical literature, are viable alternatives. In this paper, we compare the normal distribution–based and nonparametric decision limits. We show that the decision limit based on the normal distribution may deviate significantly from the nominal confidence level or nominal false-positive rate (FPR) when the distribution of the GH-2000 score departs only slightly from the normal distribution. While a nonparametric decision limit does not assume any specific distribution of the GH-2000 score and always guarantees the nominal confidence level and FPR, it requires a much larger sample size than the normal distribution–based decision limit. Due to the stringent FPR of the GH-2000 biomarker test used by WADA, the sample sizes currently available are much too small, and it will take many years of testing to reach the minimum sample size required to use the nonparametric decision limits. Large-sample theory for the normal distribution–based and nonparametric decision limits is also developed in this paper to help understand their behaviour when the sample size is large.
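The sample size issue can be illustrated with the standard distribution-free (order-statistic) calculation for an upper decision limit based on the sample maximum; the FPR and confidence values below are illustrative, not WADA's, and the paper's specific nonparametric limits may use other order statistics.

```python
import math

def min_n_nonparametric_upper_limit(fpr, confidence):
    """Smallest n such that the sample maximum exceeds the (1 - fpr) quantile
    with the required confidence: 1 - (1 - fpr)**n >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - fpr))

for fpr in (0.01, 0.001, 0.0001):           # illustrative false-positive rates
    print(fpr, min_n_nonparametric_upper_limit(fpr, confidence=0.95))
# roughly 300, 3,000 and 30,000 reference samples needed as the FPR tightens
```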

14.
Climate change leads to increasing temperature and more extreme hot and drought events. Ecosystem capability to cope with climate warming depends on the pace at which vegetation adjusts to temperature change. How environmental stresses impair this pace has not been carefully investigated. Here we show that dryness substantially dampens the pace at which vegetation in warm regions adjusts the optimal temperature of gross primary production (GPP) ($T^{\mathrm{opt}}_{\mathrm{GPP}}$) in response to changes in temperature over space and time. $T^{\mathrm{opt}}_{\mathrm{GPP}}$ spatially converges to an increase of 1.01°C (95% CI: 0.97, 1.05) per 1°C increase in the yearly maximum temperature (Tmax) across humid or cold sites worldwide (37°S–79°N), but only 0.59°C (95% CI: 0.46, 0.74) per 1°C increase in Tmax across dry and warm sites. $T^{\mathrm{opt}}_{\mathrm{GPP}}$ temporally changes by 0.81°C (95% CI: 0.75, 0.87) per 1°C interannual variation in Tmax at humid or cold sites and 0.42°C (95% CI: 0.17, 0.66) at dry and warm sites. Regardless of the water limitation, the maximum GPP (GPPmax) increases similarly, by 0.23 g C m−2 day−1 per 1°C increase in $T^{\mathrm{opt}}_{\mathrm{GPP}}$, in both humid and dry areas. Our results indicate that future climate warming will likely stimulate vegetation productivity more substantially in humid than in water-limited regions.

15.
Optimal experimental designs are often formal and specific, and not intuitively plausible to practical experimenters. However, even in theory, there are often many different possible design points providing identical or nearly identical information compared to the design points of a strictly optimal design. In practical applications, this can be used to find designs that are a compromise between mathematical optimality and practical requirements, including preferences of experimenters. For this purpose, we propose a derivative-based two-dimensional graphical representation of the design space that, given that an optimal design is already known, shows which areas of the design space are relevant for good designs and how these areas relate to each other. While existing equivalence theorems already allow such an illustration with regard to the relevance of design points only, our approach also shows whether different design points contribute the same kind of information, and thus allows tweaking of designs for practical applications, especially with regard to the splitting and combining of design points. We demonstrate the approach on a toxicological trial where an optimal design for a dose–response experiment modeled by a four-parameter log-logistic function was requested. As these designs require a prior estimate of the relevant parameters, which is difficult to obtain in a practical situation, we also discuss an adaptation of our representations to a Bayesian optimality criterion. While we focus on a single optimality criterion, the approach is in principle applicable to other optimality criteria as well, although much of the computational and graphical simplicity would be lost.
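As a generic illustration of the sensitivity (directional derivative) function that the equivalence theorems provide, consider the classical quadratic-regression example under D-optimality; the abstract leaves its specific optimality criterion unstated, and this sketch is not the paper's two-dimensional representation.

```python
import numpy as np

# Classical example: quadratic regression f(x) = (1, x, x^2) on [-1, 1].
# The D-optimal design puts weight 1/3 on each of x = -1, 0, 1.
f = lambda x: np.array([1.0, x, x**2])
support, weights = [-1.0, 0.0, 1.0], [1 / 3, 1 / 3, 1 / 3]

M = sum(w * np.outer(f(x), f(x)) for x, w in zip(support, weights))
Minv = np.linalg.inv(M)

xs = np.linspace(-1, 1, 201)
d = np.array([f(x) @ Minv @ f(x) for x in xs])   # sensitivity (variance) function

# Equivalence theorem for D-optimality: d(x) <= p (= 3 parameters) everywhere,
# with equality exactly at the support points of the optimal design.
print(round(d.max(), 3), xs[np.argmax(d)])
```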

16.

Aim

Understanding connections between environment and biodiversity is crucial for conservation, identifying causes of ecosystem stress, and predicting population responses to changing environments. Explaining biodiversity requires an understanding of how species richness and environment covary across scales. Here, we identify scales and locations at which biodiversity is generated and correlates with environment.

Location

Full latitudinal range per continent.

Time Period

Present day.

Major Taxa Studied

Terrestrial vertebrates: all mammals, carnivorans, bats, songbirds, hummingbirds, amphibians.

Methods

We describe the use of wavelet power spectra, cross-power and coherence for identifying scale-dependent trends across Earth's surface. Spectra reveal scale- and location-dependent coherence between species richness and topography (E), mean annual precipitation (Pn), temperature (Tm) and annual temperature range (ΔT).

Results

More than 97% of the species richness of the taxa studied is generated at large scales, that is, wavelengths ≥10³ km, with 30%–69% generated at scales ≥10⁴ km. At these scales, richness tends to be highly coherent and anti-correlated with E and ΔT, and positively correlated with Pn and Tm. Coherence between carnivoran richness and ΔT is low across scales, implying insensitivity to seasonal temperature variations. Conversely, amphibian richness is strongly anti-correlated with ΔT at large scales. At scales of 10³ km, the examined taxa, except carnivorans, show highest richness within the tropics. Terrestrial plateaux exhibit high coherence between carnivoran richness and E at scales of 10³ km, consistent with a contribution of large-scale tectonic processes to biodiversity. Results are similar across different continents and for global latitudinal averages. Spectral admittance permits derivation of rules of thumb relating long-wavelength environmental and species richness trends.

Main Conclusions

Sensitivities of mammal, bird and amphibian populations to environment are highly scale dependent. At large scales, carnivoran richness is largely independent of temperature and precipitation, whereas amphibian richness correlates strongly with precipitation and temperature, and anti-correlates with temperature range. These results pave the way for spectral-based calibration of models that predict biodiversity response to climate change scenarios.
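A Fourier-based magnitude-squared coherence between two synthetic spatial series gives a simplified analogue of this scale-dependent analysis (the study itself uses wavelet power spectra, cross-power and coherence); all signals and settings below are synthetic.

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(2)
dx = 25.0                                   # spatial sampling step in km
x = np.arange(0, 20000, dx)                 # a 20,000 km transect

# synthetic "richness" and "elevation" sharing a long-wavelength component
shared = np.sin(2 * np.pi * x / 1600.0)     # 1,600 km wavelength
richness = shared + 0.5 * rng.normal(size=x.size)
elevation = -shared + 0.5 * rng.normal(size=x.size)   # anti-correlated component

freq, coh = coherence(richness, elevation, fs=1.0 / dx, nperseg=256)
wavelength_km = np.where(freq > 0, 1.0 / np.maximum(freq, 1e-12), np.inf)
idx = np.argmin(np.abs(wavelength_km - 1600))
print(f"coherence near 1,600 km wavelength: {coh[idx]:.2f}")
```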

17.
Linda M. Haines, Biometrics, 2020, 76(2): 540–548
Multinomial N-mixture models are commonly used to fit data from a removal sampling protocol. If the mixing distribution is negative binomial, the distribution of the counts does not appear to have been identified, and practitioners approximate the requisite likelihood by placing an upper bound on the embedded infinite sum. In this paper, the distribution which underpins the multinomial N-mixture model with a negative binomial mixing distribution is shown to belong to the broad class of multivariate negative binomial distributions. Specifically, the likelihood can be expressed in closed form as the product of conditional and marginal likelihoods, and the information matrix is shown to be block diagonal. As a consequence, the nature of the maximum likelihood estimates of the unknown parameters and their attendant standard errors can be examined, and tests of the hypothesis of the Poisson against the negative binomial mixing distribution can be formulated. In addition, appropriate multinomial N-mixture models for data sets which include zero site totals can also be constructed. Two illustrative examples are provided.

18.
Genome-scale metabolic network models (GSMMs) based on enzyme constraints greatly improve on general metabolic models. The turnover number ($k_{\mathrm{cat}}$) of enzymes is used as a parameter to constrain reactions when extending a GSMM, and it therefore plays a crucial role in the prediction accuracy of cell metabolism. In this work, we proposed an enzyme-constrained GSMM parameter optimization method. First, sensitivity analysis of the parameters was carried out to select the parameters with the greatest influence on predicting the specific growth rate. Then, a differential evolution (DE) algorithm with an adaptive mutation strategy, which can dynamically select among five different mutation strategies, was adopted to optimize the parameters. Finally, the specific growth rate prediction, flux variability, and phase plane of the optimized model were analyzed to further evaluate the model. The enzyme-constrained GSMM of Saccharomyces cerevisiae, ecYeast8.3.4, was optimized. Results of the sensitivity analysis showed that the optimization variables can be divided into three groups based on sensitivity: most sensitive (149 $k_{\mathrm{cat}}$ values), highly sensitive (1759 $k_{\mathrm{cat}}$ values), and nonsensitive (2502 $k_{\mathrm{cat}}$ values). Six optimization strategies were developed based on the results of the sensitivity analysis. The results showed that DE with an adaptive mutation strategy can indeed improve the model by optimizing the highly sensitive parameters. Retaining all parameters while optimizing the highly sensitive ones is the recommended optimization strategy.
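As a generic stand-in for the parameter-fitting loop, SciPy's differential evolution with a fixed mutation strategy and a dithered mutation factor can be run on a surrogate objective; the adaptive five-strategy selection and the actual enzyme-constrained model evaluation from the paper are not reproduced here, and the objective and data are hypothetical.

```python
import numpy as np
from scipy.optimize import differential_evolution

observed_growth = 0.41                      # hypothetical measured specific growth rate

def objective(log10_kcats):
    """Stand-in objective: squared error between a surrogate growth prediction
    and the observed specific growth rate. A real application would evaluate the
    enzyme-constrained GSMM with these kcat values instead."""
    surrogate = 0.1 * np.log10(np.sum(10.0 ** log10_kcats))
    return (surrogate - observed_growth) ** 2

bounds = [(0.0, 4.0)] * 10                  # optimize 10 log10(kcat) values in [1, 1e4]
result = differential_evolution(objective, bounds,
                                strategy="best1bin",  # one fixed mutation strategy
                                mutation=(0.5, 1.0),  # dithered mutation factor
                                recombination=0.7, maxiter=200, seed=1, tol=1e-8)
print(result.fun, result.x.round(2)[:3])
```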

19.
The existence of a large-biomass carbon (C) sink in Northern Hemisphere extra-tropical ecosystems (NHee) is well established, but the relative contribution of different potential drivers remains highly uncertain. Here we isolated the historical role of carbon dioxide (CO2) fertilization by integrating estimates from 24 CO2-enrichment experiments, an ensemble of 10 dynamic global vegetation models (DGVMs) and two observation-based biomass datasets. Application of the emergent constraint technique revealed that DGVMs underestimated the historical response of plant biomass to increasing [CO2] in forests ($\beta_{\mathrm{Forest}}^{\mathrm{Mod}}$) but overestimated the response in grasslands ($\beta_{\mathrm{Grass}}^{\mathrm{Mod}}$) since the 1850s. Combining the constrained $\beta_{\mathrm{Forest}}^{\mathrm{Mod}}$ (0.86 ± 0.28 kg C m−2 [100 ppm]−1) with observed forest biomass changes derived from inventories and satellites, we identified that CO2 fertilization alone accounted for more than half (54 ± 18% and 64 ± 21%, respectively) of the increase in biomass C storage since the 1990s. Our results indicate that CO2 fertilization dominated the forest biomass C sink over the past decades, and provide an essential step toward better understanding the key role of forests in land-based policies for mitigating climate change.
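A minimal sketch of the emergent-constraint idea with made-up numbers: regress the quantity to be constrained on an observable across an ensemble of models, then evaluate the fitted relationship at the observed value of that observable.

```python
import numpy as np

rng = np.random.default_rng(5)

# hypothetical ensemble: each model's simulated observable x (something that can be
# checked against observations) and its simulated sensitivity beta
x_models = rng.normal(1.0, 0.3, 10)
beta_models = 0.5 + 0.4 * x_models + rng.normal(0, 0.05, 10)

# emergent relationship across models
slope, intercept = np.polyfit(x_models, beta_models, 1)

# an observation of x with uncertainty then constrains beta
x_obs, x_obs_sd = 0.9, 0.1
beta_constrained = intercept + slope * x_obs
beta_sd = abs(slope) * x_obs_sd            # ignoring regression-fit uncertainty
print(f"constrained beta: {beta_constrained:.2f} +/- {beta_sd:.2f}")
```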

20.
When establishing a treatment in clinical trials, it is important to evaluate both effectiveness and toxicity. In phase II clinical trials, multinomial data are collected in m-stage designs, especially two-stage (m = 2) designs. Exact tests on two proportions, the response rate and the nontoxicity rate, should be employed due to limited sample sizes. However, existing tests use certain parameter configurations at the boundary of the null hypothesis space to determine rejection regions, without showing that the maximum Type I error rate is achieved at the boundary of the null hypothesis. In this paper, we show that the power function for each test in a large family of tests is nondecreasing in both the response rate and the nontoxicity rate; identify the parameter configurations at which the maximum Type I error rate and the minimum power are achieved and derive level-α tests; provide optimal two-stage designs with the least expected total sample size, together with the optimization algorithm; and extend the results to the case of m > 2. R code is given in the Supporting Information.
