Similar Articles
20 similar articles were retrieved.
1.
In scientific research, many hypotheses relate to the comparison of two independent groups. Usually, it is of interest to use a design (i.e., the allocation of sample sizes m and n for a fixed total N = m + n) that maximizes the power of the applied statistical test. It is known that the two-sample t-tests for homogeneous and heterogeneous variances may lose substantial power when variances are unequal but equally large samples are used. We demonstrate that this is not the case for the nonparametric Wilcoxon–Mann–Whitney test, whose application in biometrical research fields is motivated by two examples from cancer research. We prove the optimality of the balanced design (m = n) in the case of symmetric and identically shaped distributions using normal approximations, and show that this design generally offers power only negligibly lower than the optimal design for a wide range of distributions.
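As a rough numerical illustration of the allocation question (a minimal sketch, not the authors' analysis), the following R code estimates the power of the Wilcoxon–Mann–Whitney test by simulation for a balanced and an unbalanced split of the same total sample size when variances are unequal; the total sample size, shift, and standard deviations are arbitrary choices.

## Simulated power of the Wilcoxon-Mann-Whitney test under two allocations of a
## fixed total sample size N = m + n = 60, with unequal variances (illustrative values).
set.seed(1)
power_wmw <- function(m, n, delta, sd1, sd2, nsim = 5000, alpha = 0.05) {
  rejections <- replicate(nsim, {
    x <- rnorm(m, mean = 0,     sd = sd1)
    y <- rnorm(n, mean = delta, sd = sd2)
    wilcox.test(x, y)$p.value < alpha
  })
  mean(rejections)
}
power_wmw(m = 30, n = 30, delta = 1, sd1 = 1, sd2 = 3)   # balanced design m = n
power_wmw(m = 15, n = 45, delta = 1, sd1 = 1, sd2 = 3)   # unbalanced allocation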

2.
When establishing a treatment in clinical trials, it is important to evaluate both effectiveness and toxicity. In phase II clinical trials, multinomial data are collected in m-stage designs, especially in the two-stage (m = 2) design. Exact tests on two proportions, one for the response rate and one for the nontoxicity rate, should be employed due to limited sample sizes. However, existing tests use certain parameter configurations at the boundary of the null hypothesis space to determine rejection regions without showing that the maximum Type I error rate is achieved at the boundary of the null hypothesis. In this paper, we show that the power function for each test in a large family of tests is nondecreasing in both the response rate and the nontoxicity rate; identify the parameter configurations at which the maximum Type I error rate and the minimum power are achieved and derive level-α tests; provide optimal two-stage designs with the least expected total sample size and the optimization algorithm; and extend the results to the case of more than two stages. Some R code is given in the Supporting Information.
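The exact calculations behind such designs can be illustrated, in a simplified single-endpoint setting rather than the paper's bivariate (response, nontoxicity) setting, by computing the rejection probability of a two-stage design directly from binomial probabilities; all design parameters below are made-up values.

## Exact rejection probability of a two-stage single-arm design with one binary endpoint:
## stop after stage 1 if responses <= r1, otherwise enroll n2 more patients and
## reject H0 if total responses exceed r.
reject_prob <- function(p, n1, r1, n2, r) {
  x1 <- (r1 + 1):n1                                        # stage-1 results that continue
  sum(dbinom(x1, n1, p) * pbinom(r - x1, n2, p, lower.tail = FALSE))
}
reject_prob(p = 0.20, n1 = 15, r1 = 3, n2 = 20, r = 10)    # Type I error at p0 = 0.20
reject_prob(p = 0.40, n1 = 15, r1 = 3, n2 = 20, r = 10)    # power at p1 = 0.40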

3.
Generalized linear models (GLMs) with a canonical logit link function are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness-of-fit (GOF) statistics exist for the logistic GLM. Their properties make the development of GOF statistics relatively straightforward, but it can be more difficult under noncanonical links. Although GOF tests for logistic GLMs with continuous covariates (GLMCC) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of the link function chosen. We generalize the Tsiatis GOF statistic, originally developed for logistic GLMCCs, so that it can be applied under any link function. Further, we show that the algebraically related Hosmer–Lemeshow and Pigeon–Heyse (J2) statistics can be applied directly. In a simulation study, the generalized Tsiatis, Hosmer–Lemeshow, and J2 statistics were used to evaluate the fit of probit, log–log, complementary log–log, and log models, all calculated with a common grouping method. The generalized Tsiatis statistic consistently maintained Type I error rates, while those of the Hosmer–Lemeshow and J2 statistics were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC. In this case, the generalized Tsiatis statistic had more power than the Hosmer–Lemeshow or J2 statistics.
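To make the grouping idea concrete, here is a minimal R sketch of a Hosmer–Lemeshow-type statistic computed for a probit GLM on simulated data; it is not the generalized Tsiatis statistic from the paper, and the decile grouping and sample size are arbitrary.

## Hosmer-Lemeshow-type goodness-of-fit check for a probit GLM (simulated data).
set.seed(2)
x <- rnorm(500)
y <- rbinom(500, 1, pnorm(-0.5 + x))             # data generated from a probit model
fit   <- glm(y ~ x, family = binomial(link = "probit"))
p_hat <- fitted(fit)
g <- cut(p_hat, breaks = quantile(p_hat, probs = seq(0, 1, 0.1)),
         include.lowest = TRUE)                   # 10 groups by deciles of fitted risk
o_g <- tapply(y, g, sum)                          # observed events per group
e_g <- tapply(p_hat, g, sum)                      # expected events per group
n_g <- tapply(y, g, length)
hl  <- sum((o_g - e_g)^2 / (e_g * (1 - e_g / n_g)))
pchisq(hl, df = length(o_g) - 2, lower.tail = FALSE)   # approximate p-value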

4.
Matched case-control paired data are commonly used to study the association between a disease and an exposure of interest. This work provides a consistent test for this association with respect to the conditional odds ratio, which is a measure of association that is also valid in prospective studies. We formulate the test from the maximum likelihood (ML) estimate of the conditional odds ratio by using data under inverse binomial sampling, in which individuals are selected sequentially to form matched pairs until, for the first time, one obtains either a prefixed number of index pairs with the case unexposed but the control exposed, or with the case exposed but the control unexposed. We discuss the situation of possible early stopping. We compare numerically the performance of our procedure with a competitor proposed by Lui (1996) in terms of Type I error rate, power, average sample number (ASN) and the corresponding standard error. Our numerical study shows a gain in sample size without loss in power as compared to the competitor. Finally, we use data taken from a case-control study on the use of X-rays and the risk of childhood acute myeloid leukemia for illustration.
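For orientation, the fixed-sample (non-sequential) ML estimate of the conditional odds ratio depends only on the numbers of discordant pairs, as in this small R sketch with made-up counts; the paper's inverse binomial sampling and early-stopping features are not reproduced here.

## Matched-pairs conditional odds ratio from discordant pairs (illustrative counts).
n10 <- 24                                      # pairs: case exposed, control unexposed
n01 <- 10                                      # pairs: case unexposed, control exposed
psi_hat <- n10 / n01                           # ML estimate of the conditional odds ratio
se_log  <- sqrt(1 / n10 + 1 / n01)             # large-sample SE on the log scale
exp(log(psi_hat) + c(-1, 1) * 1.96 * se_log)   # approximate 95% confidence interval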

5.
Biomarkers are subject to censoring whenever some measurements are not quantifiable given a laboratory detection limit. Methods for handling censoring have received less attention in genetic epidemiology, and censored data are still often replaced with a fixed value. We compared different strategies for handling a left-censored continuous biomarker in a family-based study, where the biomarker is tested for association with a genetic variant, adjusting for a covariate, X. Allowing different correlations between X and the biomarker, we compared simple substitution of censored observations with the detection limit followed by a linear mixed effects model (LMM), a Bayesian model with noninformative priors, a Tobit model with robust standard errors, and multiple imputation (MI) with and without the genetic variant in the imputation model, followed by an LMM. Our comparison was based on real and simulated data in which 20% and 40% censoring were artificially induced. The complete data were also analyzed with an LMM. In the MICROS study, the Bayesian model gave results closer to those obtained with the complete data. In the simulations, simple substitution was always the most biased method, the Tobit approach gave the least biased estimates at all censoring levels and correlation values, and the Bayesian model and both MI approaches gave slightly biased estimates but smaller root mean square errors. On the basis of these results, the Bayesian approach is highly recommended for candidate gene studies; however, the computationally simpler Tobit model and the MI without the genetic variant in the imputation model are both good options for genome-wide studies.
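As a minimal sketch of the Tobit idea (ignoring the family structure handled by the LMMs in the paper), left-censored Gaussian regression can be fitted in R with survival::survreg; the data, detection limit, and genotype coding below are simulated assumptions.

## Left-censored ("Tobit") regression for a biomarker with a detection limit.
library(survival)
set.seed(3)
n   <- 400
x   <- rnorm(n)                                  # covariate X
g   <- rbinom(n, 2, 0.3)                         # genetic variant coded 0/1/2
y   <- 1 + 0.5 * x + 0.3 * g + rnorm(n)          # latent biomarker value
lod <- quantile(y, 0.20)                         # detection limit giving ~20% censoring
obs   <- pmax(y, lod)                            # values below the limit reported at the limit
event <- as.numeric(y > lod)                     # 0 = left-censored at the detection limit
fit <- survreg(Surv(obs, event, type = "left") ~ x + g, dist = "gaussian")
summary(fit)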

6.
In this paper, a new class of models for autoradiographic hot-line data is proposed. The models, for which there is theoretical justification, are a linear combination of generalized Student's t-distributions and have as special cases all currently accepted line-spread models. The new models are used to analyse experimental hot-line data and compared with the fit of current models. The data are from a line source labelled with iodine-125 in a resin section of 0.6 μm in thickness. It will be shown that a significant improvement in goodness of fit, over that of previous models, can be achieved by choosing from this new class of models. A single model from this class will be proposed that has a simple form made up of only two components, but which fits experimental data significantly better than previous models. A short sensitivity analysis indicates that estimation is reliable. The modelling approach, although motivated by and applied to autoradiography, is appropriate for any mixture modelling situation.

7.
In recent years, accelerometers have become widely used to objectively assess physical activity. Usually, intensity ranges are assigned to the measured accelerometer counts by simple cut points, disregarding the underlying activity pattern. Under the assumption that physical activity can be seen as a distinct sequence of distinguishable activities, the use of hidden Markov models (HMMs) has been proposed to improve the modeling of accelerometer data. As a further improvement, we propose to use expectile regression utilizing a Whittaker smoother with an L0-penalty to better capture the intensity levels underlying the observed counts. Different expectile asymmetries beyond the mean allow the distinction of monotonous and more variable activities, as expectiles effectively model the complete distribution of the counts. This new approach is investigated in a simulation study, where we simulated 1,000 days of accelerometer data with 1 and 5 s epochs, based on collected labeled data to resemble real-life data as closely as possible. The expectile regression is compared to HMMs and the commonly used cut point method with regard to misclassification rate, number of identified bouts and identified levels, as well as the proportion of the estimate lying within a given range of the true activity level. In summary, expectile regression utilizing a Whittaker smoother with an L0-penalty outperforms HMMs and the cut point method and is hence a promising approach to model accelerometer data.
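Expectiles themselves are easy to compute by asymmetric least squares; the short R helper below illustrates this for a single vector of counts (it shows only the basic expectile idea, not the paper's Whittaker-smoothed, L0-penalized regression over time).

## tau-expectile of a vector via iteratively reweighted (asymmetric) least squares.
expectile <- function(y, tau = 0.75, tol = 1e-8, maxit = 100) {
  mu <- mean(y)
  for (i in seq_len(maxit)) {
    w      <- ifelse(y > mu, tau, 1 - tau)    # asymmetric weights
    mu_new <- sum(w * y) / sum(w)             # weighted mean solves the asymmetric LS problem
    if (abs(mu_new - mu) < tol) break
    mu <- mu_new
  }
  mu
}
set.seed(4)
counts <- rpois(600, lambda = 20)             # toy stand-in for accelerometer counts
sapply(c(0.25, 0.5, 0.75, 0.9), function(t) expectile(counts, t))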

8.
The decision curve plots the net benefit of a risk model for making decisions over a range of risk thresholds, corresponding to different ratios of misclassification costs. We discuss three methods to estimate the decision curve, together with corresponding methods of inference and methods to compare two risk models at a given risk threshold. One method uses risks (R) and a binary event indicator (Y) on the entire validation cohort. This method makes no assumptions on how well-calibrated the risk model is nor on the incidence of disease in the population, and is comparatively robust to model miscalibration. If one assumes that the model is well-calibrated, one can compute a much more precise estimate of the net benefit based on risks R alone. However, if the risk model is miscalibrated, serious bias can result. Case–control data can also be used to estimate the net benefit if the incidence (or prevalence) of the event is known. This strategy has comparable efficiency to using the full data, and its efficiency is only modestly less than that for the full data if the incidence is estimated from the mean of Y. We estimate variances using influence functions and propose a bootstrap procedure to obtain simultaneous confidence bands around the decision curve for a range of thresholds. The influence function approach to estimating variances can also be applied to cohorts derived from complex survey samples instead of simple random samples.
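The cohort-based estimator using risks R and event indicators Y has a simple closed form, sketched below in R on simulated data; the threshold grid and the model generating the risks are arbitrary choices.

## Net benefit of a risk model over a range of thresholds, estimated from (R, Y) on a cohort:
## NB(t) = TP/n - (t / (1 - t)) * FP/n, with the rule "treat if R >= t".
set.seed(5)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-2 + x))             # binary event indicator Y
r <- plogis(-2 + 0.9 * x)                     # risks R from a (slightly miscalibrated) model
net_benefit <- function(r, y, t) {
  treat <- r >= t
  mean(treat & y == 1) - mean(treat & y == 0) * t / (1 - t)
}
thresholds <- seq(0.05, 0.50, by = 0.05)
round(sapply(thresholds, function(t) net_benefit(r, y, t)), 4)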

9.
Few articles have been written on analyzing three-way interactions between drugs. It may seem straightforward to extend a statistical method from two drugs to three drugs. However, compared with a two-drug study, a three-drug study may exhibit a more complex nonlinear response surface of the interaction index, with local synergy and/or local antagonism interspersed in different regions of drug combinations. In addition, it is not possible to obtain a four-dimensional (4D) response surface plot for a three-drug study. We propose an analysis procedure to construct the dose combination regions of interest (say, the synergistic areas where the interaction index is below 1). First, we use the model robust regression method (MRR), a semiparametric method, to fit the entire response surface of the interaction index, which allows fitting a complex response surface with local synergy/antagonism. Second, we run a modified genetic algorithm (MGA), a stochastic optimization method, many times with different random seeds, to collect as many feasible points as possible that satisfy the estimated values of the interaction index. Last, all these feasible points are used to construct the approximate dose regions of interest in 3D. A case study with three anti-cancer drugs in an in vitro experiment is employed to illustrate how to find the dose regions of interest.

10.
In this paper, we introduce a new estimator of the percentile residual life function with censored data under a monotonicity constraint. Specifically, it is assumed that the percentile residual life is a decreasing function. This assumption is useful when estimating the percentile residual life of units that degenerate with age. We establish a law of the iterated logarithm for the proposed estimator and its asymptotic equivalence to the unrestricted estimator. The asymptotic normal distribution of the estimator and its strong approximation to a Gaussian process are also established. We investigate the finite sample performance of the monotone estimator in an extensive simulation study. Finally, data from a clinical trial in primary biliary cirrhosis of the liver are analyzed with the proposed methods. One of the conclusions of our work is that the restricted estimator may be much more efficient than the unrestricted one.

11.
We consider the problem treated by Simes of testing the overall null hypothesis formed by the intersection of a set of elementary null hypotheses, based on ordered p-values of the associated test statistics. The Simes test uses critical constants that do not need tabulation. Cai and Sarkar gave a method to compute generalized Simes critical constants which improve upon the power of the Simes test when more than a few hypotheses are false. The Simes constants can be viewed as the first-order constants (requiring solution of a linear equation) and the Cai–Sarkar constants as the second-order constants (requiring solution of a quadratic equation). We extend the method to third-order constants (requiring solution of a cubic equation) and also offer an extension to an arbitrary kth order. We show by simulation that the third-order constants are more powerful than the second-order constants for testing the overall null hypothesis in most cases. However, there are some drawbacks associated with these higher-order constants that limit their practical usefulness.
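The first-order (Simes) constants are simple to apply, as in the R sketch below; the higher-order Cai–Sarkar and cubic constants from the paper require solving the corresponding polynomial equations and are not reproduced here.

## Simes test of the overall (intersection) null hypothesis with first-order constants.
simes_test <- function(p, alpha = 0.05) {
  n <- length(p)
  crit <- seq_len(n) * alpha / n             # Simes critical constants i * alpha / n
  any(sort(p) <= crit)                       # reject the global null if any ordered p-value falls below
}
set.seed(6)
p_null <- runif(10)                              # all elementary nulls true
p_alt  <- c(runif(7), rbeta(3, 0.5, 10))         # three false nulls with small p-values
c(under_null = simes_test(p_null), under_alternative = simes_test(p_alt))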

12.
Shape analysis is of great importance in many fields of medical imaging and computational biology. In this paper, we consider the shape space as the set of smooth planar immersed curves (parameterized curves) and, using the property of being isometric to a classical manifold immersed in a Euclidean space, we introduce a new extrinsic sample mean and a new extrinsic variance for a finite set of shapes, which are not necessarily star-shaped. This is a fundamental tool in medical image analysis, for instance, to assess uncertainties that arise in locating anatomical structures such as the prostate and the bladder. We apply it to a dataset consisting of parallel planar axial CT sections of the human prostate, in order to study the variability between boundaries that have been manually delineated by several observers.

13.
We develop nonparametric maximum likelihood estimation for the parameters of an irreversible three-state Markov chain from observations with interval-censored times of the 0 → 1, 0 → 2 and 1 → 2 transitions. The distinguishing aspect of the data is that, in addition to all transition times being interval censored, the times of two events (the 0 → 1 and 1 → 2 transitions) can be censored into the same interval. This development was motivated by a common data structure in oral health research, here specifically illustrated by data from a prospective cohort study on the longevity of dental veneers. Using the self-consistency algorithm, we obtain the maximum likelihood estimators of the cumulative incidences of the times to events 1 and 2 and of the intensity of the 1 → 2 transition. This work generalizes previous results on estimation in an "illness-death" model from interval-censored observations.

14.
Count data sets are traditionally analyzed using the ordinary Poisson distribution. However, such a model has limited applicability, as it can be too restrictive to handle specific data structures. In this case, the need arises for alternative models that accommodate, for example, (a) zero-modification (inflation or deflation of the frequency of zeros), (b) overdispersion, and (c) individual heterogeneity arising from clustering or repeated (correlated) measurements made on the same subject. Cases (a)–(b) and (b)–(c) are often treated together in the statistical literature with several practical applications, but models supporting all at once are less common. Hence, this paper's primary goal was to jointly address these issues by deriving a mixed-effects regression model based on the hurdle version of the Poisson–Lindley distribution. In this framework, the zero-modification is incorporated by assuming that a binary probability model determines which outcomes are zero-valued, and a zero-truncated process is responsible for generating positive observations. Approximate posterior inferences for the model parameters were obtained from a fully Bayesian approach based on the Adaptive Metropolis algorithm. Intensive Monte Carlo simulation studies were performed to assess the empirical properties of the Bayesian estimators. The proposed model was considered for the analysis of a real data set, and its competitiveness regarding some well-established mixed-effects models for count data was evaluated. A sensitivity analysis to detect observations that may impact parameter estimates was performed based on standard divergence measures. The Bayesian p-value and randomized quantile residuals were considered for model diagnostics.
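The hurdle construction itself is straightforward; the R sketch below fits a plain hurdle Poisson model (no Poisson–Lindley mixing, no random effects, no covariates) by maximum likelihood on simulated data, purely to show the two-part structure described above.

## Minimal hurdle Poisson fit: a binary model for zeros plus a zero-truncated Poisson
## for the positive counts, estimated with optim (illustrative values).
set.seed(7)
n    <- 500
pi0  <- 0.35                                  # probability of a zero outcome
lam  <- 2.5                                   # mean of the untruncated Poisson part
y <- ifelse(runif(n) < pi0, 0,
            qpois(runif(n, ppois(0, lam), 1), lam))   # zero-truncated Poisson draws
negll <- function(par) {
  p <- plogis(par[1]); l <- exp(par[2])
  zero <- y == 0
  -sum(zero * log(p)) -
    sum((!zero) * (log1p(-p) + dpois(y, l, log = TRUE) - log1p(-dpois(0, l))))
}
fit <- optim(c(0, 0), negll)
c(pi_hat = plogis(fit$par[1]), lambda_hat = exp(fit$par[2]))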

15.
In this work we propose the use of functional data analysis (FDA) to deal with a very large dataset of atmospheric aerosol size distributions resolved in both space and time. Data come from a mobile measurement platform in the town of Perugia (Central Italy). An OPC (Optical Particle Counter) is integrated on a cabin of the Minimetrò, an urban transportation system that moves along a monorail on a line transect of the town. The OPC takes a sample of air every six seconds, counts the number of particles of urban aerosol with a diameter between 0.28 μm and 10 μm, and classifies such particles into 21 size bins according to their diameter. Here, we adopt a 2D functional data representation for each of the 21 spatiotemporal series. In fact, space is unidimensional, since it is measured as the distance along the monorail from the base station of the Minimetrò. FDA allows for a reduction of the dimensionality of each dataset and accounts for the high space-time resolution of the data. Functional cluster analysis is then performed to search for similarities among the 21 size channels in terms of their spatiotemporal pattern. Results provide a good classification of the 21 size bins into a relatively small number of groups (between three and four) according to the season of the year. Groups including coarser particles have more similar patterns, while those including finer particles behave more differently according to the period of the year. Such features are consistent with the physics of atmospheric aerosol, and the highlighted patterns provide very useful ground for prospective model-based studies.

16.
Two-stage designs that allow for early stopping if the treatment is ineffective are commonly used in phase II oncology trials. A limitation of current designs is that early stopping is only allowed at the end of the first stage, even if it becomes evident during the trial that a significant result is unlikely. One way to overcome this limitation is to implement stochastic curtailment procedures that enable stopping the trial whenever the conditional power is below a pre-specified threshold θ. In this paper, we present the results of implementing curtailment rules in either only the second stage or both stages of the design. In total, 102 scenarios with different parameter settings were investigated using conditional power thresholds θ between 0 and 1 in steps of 0.01. An increase in θ results not only in a decrease of the actual Type I error rate and power but also of the expected sample size. Therefore, a reasonable balance has to be found when selecting a specific threshold value in the planning phase of a curtailed two-stage design. Given that the effect of curtailment depends strongly on the underlying design parameters, no general recommendation for θ can be made. However, for thresholds up to a moderate value of θ, the loss in power was less than 5% for all investigated scenarios, while savings of up to 50% in expected sample size occurred. In general, curtailment is most appropriate when the outcome can be observed quickly or when accrual is slow, so that adequate information for making early and frequent decisions is available.
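For intuition, conditional power at an interim look can be computed from the usual independent-increments (Brownian motion) approximation, as in the R sketch below; this is a generic normal-approximation version, not the exact binomial calculations underlying the 102 scenarios in the paper, and all numbers are made up.

## Conditional power at information fraction t, given interim z-value z1 and an assumed
## drift; curtail (stop for futility) if it falls below a threshold theta_stop.
cond_power <- function(z1, t, drift, alpha = 0.025) {
  zc <- qnorm(1 - alpha)                                   # final critical value
  1 - pnorm((zc - z1 * sqrt(t) - drift * (1 - t)) / sqrt(1 - t))
}
drift <- qnorm(0.975) + qnorm(0.80)            # drift giving 80% unconditional power
cp <- cond_power(z1 = 0.4, t = 0.5, drift = drift)
theta_stop <- 0.2
c(conditional_power = round(cp, 3), stop_for_futility = cp < theta_stop)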

17.
In the risk analysis of sequential events, the successive gap times are often correlated, e.g. as a result of individual heterogeneity. Correlation is usually accounted for by using a shared gamma-frailty model, where the variance φ of the random individual effect quantifies the correlation between gap times. This method is known to yield satisfactory estimates of covariate effects, but underestimates φ, which could result in a lack of power of the test of independence. We propose a new test of independence between two sequential gap times, where the first is the time elapsed from the origin. The test is based on an approximation of the hazard of the second event given the first gap time in a frailty model, with a frailty distribution belonging to the power variance function family. Simulation results show an increased power of the new test compared with the test derived from the gamma-frailty model. In the realistic case where hazards are event specific, and using event-specific approaches, the proposed estimation of the variance of the frailty is less biased than the gamma-frailty-based estimation for a wide range of values of φ (with the set of parameters considered), and similar for higher values. As an illustration, the methods are applied to a previously analysed asthma prevention trial, with results showing a significant positive association between the successive times to asthmatic events. We also analyse data from a cohort of HIV-seropositive patients in order to assess the effect of risk factors on the occurrence of two successive markers of progression of HIV disease. The results demonstrate the ability of the proposed model to account for negative correlations between gap times.

18.
Regression modelling is a powerful statistical tool often used in biomedical and clinical research. It can be formulated as an inverse problem that measures the discrepancy between the target outcome and the data produced by a representation of the modelled predictors. This approach can simultaneously perform variable selection and coefficient estimation. We focus particularly on a linear regression problem in which the components of the parameter of interest are the regression coefficients. The inverse problem finds an estimate of this parameter, which is mapped by the linear operator to the observed outcome data. The problem can be conveyed as finding a solution in the corresponding affine subspace. However, in the presence of collinearity, high-dimensional data and a high condition number of the related covariance matrix, the solution may not be unique, so the introduction of prior information to reduce the solution set and regularize the inverse problem is needed. Informed by Huber's robust statistics framework, we propose an optimal regularizer for the regression problem. We compare results of the proposed method and other penalized regression regularization methods (ridge, lasso, adaptive-lasso and elastic-net) under strong conditions such as a high condition number of the covariance matrix and high error amplitude, on both simulated and real data from the South London Stroke Register. The proposed approach can be extended to mixed regression models. Our inverse problem framework coupled with robust statistics methodology offers new insights into statistical regression and learning, and could open new research directions for model fitting and learning.
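For comparison purposes only, the standard penalized competitors mentioned above can be fitted with the glmnet package, as in this sketch on simulated near-collinear data; the proposed Huber-informed regularizer is not part of glmnet and is not implemented here.

## Ridge, lasso and elastic-net fits on an ill-conditioned design (simulated data).
library(glmnet)
set.seed(8)
n <- 100; p <- 30
x <- matrix(rnorm(n * p), n, p)
x[, 2] <- x[, 1] + rnorm(n, sd = 0.01)         # near-collinear columns -> high condition number
beta <- c(2, 0, -1.5, rep(0, p - 3))
y <- drop(x %*% beta + rnorm(n))
fits <- list(ridge = cv.glmnet(x, y, alpha = 0),
             lasso = cv.glmnet(x, y, alpha = 1),
             enet  = cv.glmnet(x, y, alpha = 0.5))
sapply(fits, function(f) f$lambda.min)          # cross-validated penalty for each method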

19.
Many approaches for variable selection with multiply imputed data in the development of a prognostic model have been proposed. However, no method prevails as uniformly best. We conducted a simulation study with a binary outcome and a logistic regression model to compare two classes of variable selection methods in the presence of multiply imputed (MI) data: (I) model selection on bootstrap data, using backward elimination based on AIC or the lasso, and fitting the final model based on the most frequently selected variables over all MI and bootstrap data sets; (II) model selection on the original MI data, using the lasso. The final model is obtained by (i) averaging estimates of variables that were selected in any MI data set or (ii) in at least 50% of the MI data sets; (iii) performing lasso on the stacked MI data; or (iv) as in (iii) but using individual weights as determined by the fraction of missingness. In all lasso models, we used both the optimal penalty and the 1-se rule. We considered recalibrating models to correct for overshrinkage due to the suboptimal penalty by refitting the linear predictor or all individual variables. We applied the methods to a real dataset of 951 adult patients with tuberculous meningitis to predict mortality within nine months. Overall, applying lasso selection with the 1-se penalty showed the best performance, in both approach I and approach II. Stacking MI data is an attractive approach because it does not require choosing a selection threshold when combining results from separate MI data sets.
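The stacking idea in (iii) can be sketched in a few lines of R with the mice and glmnet packages; this toy version uses a small continuous-outcome example shipped with mice rather than the binary tuberculous meningitis outcome, and simply down-weights each stacked row by 1/M.

## Lasso on stacked multiply imputed data, each row weighted by 1/M.
library(mice); library(glmnet)
set.seed(9)
imp     <- mice(nhanes, m = 5, printFlag = FALSE)   # nhanes: example data in mice
stacked <- complete(imp, action = "long")           # all M completed data sets stacked
x <- as.matrix(stacked[, c("age", "hyp", "chl")])
y <- stacked$bmi
w <- rep(1 / imp$m, nrow(stacked))                  # each subject contributes total weight 1
fit <- cv.glmnet(x, y, weights = w)
coef(fit, s = "lambda.1se")                         # variables retained under the 1-se rule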

20.
We estimated local and metapopulation effective sizes (Ne and meta-Ne) for three coexisting salmonid species (Salmo salar, Salvelinus fontinalis, Salvelinus alpinus) inhabiting a freshwater system comprising seven interconnected lakes. First, we hypothesized that Ne might be inversely related to within-species population divergence, as reported in an earlier study (i.e., FST: S. salar > S. fontinalis > S. alpinus). Using the approximate Bayesian computation method implemented in ONeSAMP, we found significant differences in Ne between species, consistent with a hierarchy of adult population sizes. Using another method based on a measure of linkage disequilibrium (LDNE), we found more finite values of Ne for S. salar than for the other two salmonids, in line with the results above indicating that S. salar exhibits the lowest Ne among the three species. Considering subpopulations as open to migration (i.e., removing putative immigrants) led to only marginal and non-significant changes in Ne, suggesting that migration may be at equilibrium between genetically similar sources. Second, we hypothesized that meta-Ne might be significantly smaller than the sum of local Ne values (null model) if gene flow is asymmetric, varies among subpopulations, and is driven by common landscape features such as waterfalls. One 'bottom-up' or numerical approach that explicitly incorporates variable and asymmetric migration rates showed this very pattern, while a number of analytical models provided meta-Ne estimates that were not significantly different from the null model or from each other. Our study of three species inhabiting a shared environment highlights the importance and utility of differentiating species-specific and landscape effects, not only on dispersal but also in the demography of wild populations as assessed through local Ne and meta-Ne, and their relevance in ecology, evolution and conservation.

