Similar Articles (20 results)
1.
This paper proposes a novel approach to confidence interval estimation and hypothesis testing for the common mean of several log-normal populations, using the concept of a generalized variable. Simulation studies demonstrate that the proposed approach provides confidence intervals with satisfactory coverage probabilities and performs hypothesis testing with satisfactory type-I error control even at small sample sizes. Overall, it is superior to the large-sample approach. The proposed method is illustrated using two examples.
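The abstract does not reproduce the generalized-variable construction itself. As a hedged illustration, the sketch below implements the standard generalized pivotal quantity for the mean exp(mu + sigma^2/2) of a single log-normal population, the building block that common-mean procedures of this kind extend; the function name and Monte Carlo settings are my own, not the paper's.

```python
import numpy as np

def lognormal_mean_gpq_ci(x, alpha=0.05, n_mc=100_000, seed=0):
    """Generalized CI for E[X] = exp(mu + sigma^2/2) of a log-normal sample.

    Sketch of the standard generalized-pivotal-quantity construction;
    the paper extends this idea to the common mean of several populations.
    """
    rng = np.random.default_rng(seed)
    y = np.log(x)                            # log-scale data are normal
    n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

    z = rng.standard_normal(n_mc)            # Z ~ N(0, 1)
    u2 = rng.chisquare(n - 1, n_mc)          # U^2 ~ chi^2_{n-1}

    g_sigma2 = (n - 1) * s2 / u2                   # GPQ for sigma^2
    g_mu = ybar - z * np.sqrt(g_sigma2 / n)        # GPQ for mu
    g_mean = np.exp(g_mu + g_sigma2 / 2)           # GPQ for the log-normal mean

    lo, hi = np.quantile(g_mean, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Illustrative use with simulated data (mu=1, sigma=0.5 => true mean ~ 3.08)
x = np.random.default_rng(1).lognormal(mean=1.0, sigma=0.5, size=25)
print(lognormal_mean_gpq_ci(x))
```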

2.
The bootstrap error-estimation method is investigated in comparison with the known π-method and with a combined error estimation proposed by us, using simulated, normally distributed "populations" with 15 and 30 characters, respectively. For small sample sizes (below two to three times the number of characters per class), the estimates from the bootstrap method are on average too small and can no longer be accepted. Significantly better results (at substantially lower computational cost) are obtained with the π-method and the combined estimation. Variability is essentially the same for all three methods. This holds both for rather poorly separated and for very well separated populations. A bootstrap estimation modified by us also gives unsatisfactory results.
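The estimators under comparison are not spelled out in the abstract. As a minimal, assumption-laden sketch of bootstrap error estimation in this spirit, the code below computes an out-of-bag bootstrap estimate of the misclassification rate, with a simple nearest-mean rule standing in for the discriminant procedure; the π-method and the combined estimator are not reproduced, and all names are illustrative.

```python
import numpy as np

def nearest_mean_classify(X_train, y_train, X_test):
    """Minimal stand-in classifier: assign each case to the closer class mean."""
    m0 = X_train[y_train == 0].mean(axis=0)
    m1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - m0) ** 2).sum(axis=1)
    d1 = ((X_test - m1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

def bootstrap_error(X, y, n_boot=200, seed=0):
    """Out-of-bag bootstrap estimate of the classification error rate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)                 # bootstrap resample
        oob = np.setdiff1d(np.arange(n), idx)       # out-of-bag cases
        if oob.size == 0:
            continue
        pred = nearest_mean_classify(X[idx], y[idx], X[oob])
        errs.append(np.mean(pred != y[oob]))
    return float(np.mean(errs))

# Two simulated 15-character normal "populations", echoing the study's setup
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 15)), rng.normal(1, 1, (20, 15))])
y = np.repeat([0, 1], 20)
print(bootstrap_error(X, y))
```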

3.
Auxiliary covariate data are often collected in biomedical studies when the primary exposure variable is assessed on only a subset of the study subjects. In this study, we investigate a semiparametric estimated-likelihood method for generalized linear mixed models (GLMM) in the presence of a continuous auxiliary variable, using a kernel smoother to handle the continuous auxiliary data. The method can address missing or mismeasured covariate problems in a variety of applications when an auxiliary variable is available and cluster sizes are not too small. Simulation results show that the proposed method performs better than one that ignores the random effects in the GLMM and one that uses only the validation data set. We illustrate the proposed method with a real data set from a recent environmental epidemiology study of maternal serum 1,1-dichloro-2,2-bis(p-chlorophenyl)ethylene levels in relation to preterm birth.
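The estimated-likelihood machinery is not given in the abstract, so the sketch below illustrates only the kernel-smoothing ingredient: a Nadaraya-Watson estimate of E[X | W], fitted on the validation set (where both the exposure X and the auxiliary W are observed) and evaluated at the auxiliary values of subjects whose X is missing. The bandwidth and all names are illustrative assumptions.

```python
import numpy as np

def nadaraya_watson(w_valid, x_valid, w_new, bandwidth=0.5):
    """Kernel (Gaussian) estimate of E[X | W = w] from validation data."""
    # Pairwise kernel weights between new points and validation points
    diff = (w_new[:, None] - w_valid[None, :]) / bandwidth
    k = np.exp(-0.5 * diff ** 2)                 # Gaussian kernel
    return (k * x_valid).sum(axis=1) / k.sum(axis=1)

# Validation set: both exposure X and auxiliary W observed
rng = np.random.default_rng(0)
w_valid = rng.normal(size=100)
x_valid = 2.0 * w_valid + rng.normal(scale=0.3, size=100)  # X related to W

# Non-validation subjects: only W observed; smooth to get a surrogate for X
w_missing = rng.normal(size=5)
print(nadaraya_watson(w_valid, x_valid, w_missing))
```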

4.
The original intrinsic rank test is generalized so that the sizes of the k samples may now be arbitrary and the number of intrinsic rank intervals need not equal the number of samples. Furthermore, the sizes of these intervals can be made variable, subject only to relatively mild constraints. These generalizations permit the formulation and testing of more specific hypotheses concerning the commonality of the sample distributions. A generalized intrinsic rank function transforms the usual ordinal ranks, obtained from the combined samples, into intrinsic ranks. Original sample identity and intrinsic ranks are then cross-tabulated and evaluated as a two-way contingency table.

5.
Beerli P. Molecular Ecology 2004, 13(4): 827-836.
Current estimators of gene flow come in two varieties: those that estimate parameters assuming that the populations investigated are a small random sample from a large number of populations, and those that assume that all populations were sampled. Maximum likelihood or Bayesian approaches that estimate migration rates and population sizes directly using coalescent theory can easily accommodate data sets that contain a population with no data, a so-called 'ghost' population. This device allows us to explore the effects of missing populations on the estimation of population sizes and migration rates between two specific populations. The biases of the inferred population parameters depend on the magnitude of the migration rate from the unknown populations, and the effects on population sizes are larger than the effects on migration rates: the more immigrants from the unknown populations arrive in the sampled populations, the larger the estimated population sizes. Taking a ghost population into account improves, or at least does not harm, the estimation of population sizes. Estimates of the scaled migration rate M (the migration rate per generation divided by the mutation rate per generation) are fairly robust as long as migration rates from the unknown populations are not huge. Including a ghost population does not improve the estimation of M itself; when migration is instead estimated as the number of immigrants Nm, a ghost population does improve the estimates through its effect on population size estimation. For 'real world' analyses, populations should therefore be chosen carefully for sampling, but there is no need to sample every population in the neighbourhood of a population of interest.

6.
An essential basis of medical diagnosis is the biopotentials obtained from the body surface of patients. If these time functions are to serve for computer-aided diagnosis (using discrimination procedures), the known methods fail because of the small sample sizes relative to the large number of features (amplitudes). This paper presents a method that allows discrimination of biopotentials even at such an extreme ratio of the number of features to the sample size. Furthermore, univariate tests are provided that enable a decision on whether any distinctions found can be attributed to the mean values and/or the spectra of the underlying potentials. Clinical applications from the field of otological diagnostics demonstrate the usefulness of the described methods and, in particular, their superiority over classical linear discriminant analysis.

7.
This paper focuses on inferences about the overall treatment effect in meta-analysis with normally distributed responses, based on the concepts of generalized inference. A refined generalized pivotal quantity based on the t distribution is presented, and a simulation study shows that it provides confidence intervals with satisfactory coverage probabilities and performs hypothesis testing with satisfactory type-I error control at very small sample sizes.

8.
The criterion currently used for sample size calculation in a reference interval study is not well stated and leads to imprecise control of the ratio in question. We propose a generalization of the criterion used to determine a sufficient sample size in reference interval studies. The generalization allows better estimation of the required sample size when the reference interval is estimated using a power transformation or nonparametrically. Bootstrap methods are presented to estimate the sample sizes required by the generalized criterion. Simulations from several distributions, both symmetric and positively skewed, are presented to compare the sample size estimators. The new method is illustrated on a data set of plasma glucose values from a 50-g oral glucose tolerance test. The sample sizes calculated from the generalized criterion lead to more reliable control of the desired ratio.
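The paper's criterion is not stated in the abstract; the sketch below assumes, for illustration only, that the controlled ratio is the width of a bootstrap 90% confidence interval for the upper reference limit relative to the width of the nonparametric 95% reference interval, and searches for the smallest sample size meeting a target ratio. All thresholds and names are assumptions.

```python
import numpy as np

def limit_ci_ratio(sample, n, target_q=0.975, n_boot=500, seed=0):
    """Bootstrap the ratio: (90% CI width of the upper reference limit at
    size n) / (width of the nonparametric 95% reference interval)."""
    rng = np.random.default_rng(seed)
    limits = [np.quantile(rng.choice(sample, n, replace=True), target_q)
              for _ in range(n_boot)]
    ci_width = np.quantile(limits, 0.95) - np.quantile(limits, 0.05)
    ri_width = np.quantile(sample, 0.975) - np.quantile(sample, 0.025)
    return ci_width / ri_width

def required_n(sample, max_ratio=0.2, candidates=range(60, 601, 20)):
    """Smallest candidate n whose estimated ratio meets the criterion."""
    for n in candidates:
        if limit_ci_ratio(sample, n) <= max_ratio:
            return n
    return None

# Pilot data: positively skewed, like plasma glucose values
pilot = np.random.default_rng(1).lognormal(mean=4.6, sigma=0.15, size=400)
print(required_n(pilot))
```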

9.
Estimating the abundance of wildlife populations can be challenging and costly, especially for species that are difficult to detect and live at low densities, such as cougars (Puma concolor). Remote, motion-sensitive cameras are a relatively efficient monitoring tool, but most abundance estimation techniques using remote cameras rely on some or all of the population being uniquely identifiable. Recently developed methods estimate abundance from encounter rates with remote cameras and do not require identifiable individuals. We used 2 such methods, the time-to-event and space-to-event models, to estimate the density of 2 cougar populations in Idaho, USA, over 3 winters from 2016–2019. We concurrently estimated cougar density using the random encounter model (REM), an existing camera-based method for unmarked populations, and genetic spatial capture-recapture (SCR), an established method for monitoring cougar populations. In surveys for which we successfully estimated density using the SCR model, the time-to-event estimates were more precise and showed comparable variation between survey years, whereas the space-to-event estimates were less precise than the SCR estimates and more variable between survey years. Compared to REM, time-to-event was more precise and consistent, and space-to-event was less precise and consistent. Low sample sizes made the space-to-event and SCR models inconsistent from survey to survey, and non-random camera placement may have biased both camera-based estimators high. We show that camera-based estimators can perform comparably to existing methods for estimating abundance of unmarked species that live at low densities. With the time- and space-to-event models, managers could use remote cameras to monitor populations of multiple species at broader spatial and temporal scales than existing methods allow.

10.
In genome-based prediction there is considerable uncertainty about the statistical model and method required to maximize prediction accuracy. For traits influenced by a small number of quantitative trait loci (QTL), predictions are expected to benefit from methods performing variable selection [e.g., BayesB or the least absolute shrinkage and selection operator (LASSO)] compared to methods distributing effects across the genome [ridge regression best linear unbiased prediction (RR-BLUP)]. We investigate the assumptions underlying successful variable selection by combining computer simulations with large-scale experimental data sets from rice (Oryza sativa L.), wheat (Triticum aestivum L.), and Arabidopsis thaliana (L.). We demonstrate that variable selection can be successful when the number of phenotyped individuals is much larger than the number of causal mutations contributing to the trait, and we show that the sample size required for efficient variable selection increases dramatically with decreasing trait heritabilities and increasing extent of linkage disequilibrium (LD). We contrast and discuss contradictory results from simulation and experimental studies with respect to the superiority of variable selection methods over RR-BLUP. Our results demonstrate that, owing to long-range LD, medium heritabilities, and small sample sizes, superiority of variable selection methods cannot be expected in plant breeding populations even for traits like FRIGIDA gene expression in Arabidopsis and flowering time in rice, assumed to be influenced by a few major QTL. We extend our conclusions to the analysis of whole-genome sequence data and infer upper bounds for the number of causal mutations that can be identified by LASSO. Our results have a major impact on the choice of statistical method needed to make credible inferences about the genetic architecture and prediction accuracy of complex traits.
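A toy version of the simulation logic described here fits in a few lines: simulate markers with a handful of causal QTL at a chosen heritability, then compare the predictive correlation of a ridge penalty (RR-BLUP-like) against the LASSO as the training size grows. scikit-learn estimators stand in for genomic-prediction software, and every setting below is illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def predictive_accuracy(n_train, n_markers=500, n_qtl=10, h2=0.5, seed=0):
    """Predictive correlation of ridge vs. lasso on simulated marker data."""
    rng = np.random.default_rng(seed)
    n = n_train + 200                                   # train + test
    X = rng.choice([-1.0, 1.0], size=(n, n_markers))    # biallelic markers
    beta = np.zeros(n_markers)
    beta[rng.choice(n_markers, n_qtl, replace=False)] = rng.normal(size=n_qtl)
    g = X @ beta                                        # genetic values
    e_sd = np.sqrt(g.var() * (1 - h2) / h2)             # set heritability
    y = g + rng.normal(scale=e_sd, size=n)
    Xtr, ytr, Xte, yte = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
    out = {}
    for name, model in [("ridge", Ridge(alpha=50.0)),
                        ("lasso", Lasso(alpha=0.1, max_iter=10_000))]:
        model.fit(Xtr, ytr)
        out[name] = np.corrcoef(model.predict(Xte), yte)[0, 1]
    return out

# Variable selection should pull ahead as n grows well past the number of QTL
for n in (50, 200, 1000):
    print(n, predictive_accuracy(n))
```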

11.
In this paper, several different procedures for constructing confidence regions for the true evolutionary tree are evaluated in terms of both coverage and size, without considering model misspecification. The regions are constructed on the basis of hypothesis tests using six existing tests: Shimodaira-Hasegawa (SH), SOWH, the star form of SOWH (SSOWH), approximately unbiased (AU), likelihood weight (LW), and generalized least squares, plus two new tests proposed in this paper: the single distribution nonparametric bootstrap (SDNB) and the single distribution parametric bootstrap (SDPB). The procedures are evaluated on simulated trees with both small and large numbers of taxa. Overall, the SH, SSOWH, AU, and LW tests led to regions with higher coverage than the nominal level at the price of including large numbers of trees. Under the specified model, the SOWH test gave accurate coverage and relatively small regions. The SDNB and SDPB tests led to small regions with occasional undercoverage, and these two procedures have a substantial computational advantage over the SOWH test. Finally, the cutoff levels for the SDNB test are shown to be more variable than those for the SDPB test.

12.
Abundance is an important population state variable for monitoring restoration progress. Efficient sampling often proves difficult, however, when populations are sparse and patchily distributed, such as early after restoration planting. Adaptive cluster sampling (ACS) can help by concentrating search effort in high-density areas, improving the encounter rate and the ability to detect a population change over time. To illustrate the problem, I determined conventional-design sample sizes for estimating abundance of 12 natural populations and 24 recently planted populations (divided among two preserves) of Lupinus perennis L. (wild blue lupine). I then determined the variance efficiency of ACS relative to simple random sampling at fixed effort and cost for 10 additional planted populations in two habitats (field vs. shrubland). Conventional-design sample sizes to estimate lupine stem density with 10% or 20% margins of error were many times greater than the initial sample size and would require sampling at least 90% of the study area. Differences in effort requirements were negligible between the two preserves and between natural and planted populations. At fixed sample size, ACS equaled or outperformed simple random sampling in 40% of populations; this shifted to 50% after correcting for travel time among sample units. ACS appeared to be a better strategy for inter-seeded shrubland habitat than for planted field habitat. Restoration monitoring programs should consider adaptive sampling designs, especially when reliable abundance estimation under conventional designs proves elusive.
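For readers unfamiliar with ACS mechanics: an initial simple random sample is drawn, and whenever a sampled unit meets a condition (for example, contains any lupine stems), its neighbours are added, continuing until the whole "network" of qualifying units is included; a modified Hansen-Hurwitz estimator then averages the network means over the initial sample. The sketch below, with an illustrative grid population and condition, follows that textbook construction rather than anything specific to this study.

```python
import numpy as np

def acs_estimate(grid, n1, condition=lambda v: v > 0, seed=0):
    """Modified Hansen-Hurwitz estimate of the mean under adaptive cluster
    sampling on a rectangular grid with 4-neighbour adjacency."""
    rng = np.random.default_rng(seed)
    rows, cols = grid.shape
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    initial = [cells[i] for i in rng.choice(len(cells), n1, replace=False)]

    def network_mean(start):
        # A cell failing the condition is its own network (an "edge unit").
        if not condition(grid[start]):
            return grid[start]
        seen, stack = {start}, [start]
        while stack:                      # flood-fill the network
            r, c = stack.pop()
            for nr, nc in ((r-1, c), (r+1, c), (r, c-1), (r, c+1)):
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in seen and condition(grid[nr, nc])):
                    seen.add((nr, nc))
                    stack.append((nr, nc))
        return np.mean([grid[u] for u in seen])

    # Average of network means over the initial sample (design-unbiased)
    return np.mean([network_mean(u) for u in initial])

# Sparse, patchy "stem counts": mostly zeros with a few dense patches
rng = np.random.default_rng(1)
pop = np.zeros((30, 30))
for r, c in rng.integers(2, 28, size=(5, 2)):
    pop[r-1:r+2, c-1:c+2] = rng.poisson(8, (3, 3))      # a dense patch
print("true mean:", pop.mean(), "ACS estimate:", acs_estimate(pop, n1=40))
```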

13.
Qu A, Li R. Biometrics 2006, 62(2): 379-391.
Nonparametric smoothing methods are used to model longitudinal data, but the challenge remains to incorporate correlation into nonparametric estimation procedures. In this article, we propose an efficient estimation procedure for varying-coefficient models for longitudinal data. The proposed procedure can easily take correlation within subjects into account and deals directly with both continuous and discrete longitudinal responses under the framework of generalized linear models. The proposed approach yields a more efficient estimator than the generalized estimating equation approach when the working correlation is misspecified. For varying-coefficient models, it is often of interest to test whether coefficient functions are time-varying or time-invariant. We propose a unified and efficient nonparametric hypothesis testing procedure and demonstrate that the resulting test statistics have an asymptotic chi-squared distribution. In addition, a goodness-of-fit test is applied to check whether the model assumption is satisfied; the corresponding test is also useful for choosing basis functions and the number of knots for regression spline models in conjunction with the model selection criterion. We evaluate the finite-sample performance of the proposed procedures with Monte Carlo simulation studies. The methodology is illustrated by the analysis of an acquired immune deficiency syndrome (AIDS) data set.
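The paper's efficient estimator (and its handling of within-subject correlation) is beyond a short sketch, but the basic varying-coefficient idea can be shown compactly: expand beta(t) in a basis and fit y = beta(t)·x by least squares, ignoring the correlation the paper's procedure exploits. The polynomial basis, simulated data, and names below are all illustrative.

```python
import numpy as np

def fit_varying_coefficient(t, x, y, degree=3):
    """Least-squares fit of y = beta(t) * x with beta(t) a polynomial in t.

    Illustrative only: treats observations as independent, whereas the
    paper's estimator also accounts for within-subject correlation.
    """
    # Design matrix: columns are x * t^k, so the coefficients define beta(t)
    basis = np.vander(t, degree + 1, increasing=True)    # 1, t, t^2, ...
    design = basis * x[:, None]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda tt: np.vander(tt, degree + 1, increasing=True) @ coef

# Simulated longitudinal data with time-varying effect beta(t) = sin(2*pi*t)
rng = np.random.default_rng(0)
t = rng.uniform(0, 1, 400)
x = rng.normal(size=400)
y = np.sin(2 * np.pi * t) * x + rng.normal(scale=0.3, size=400)

beta_hat = fit_varying_coefficient(t, x, y)
print(beta_hat(np.array([0.25, 0.5, 0.75])))   # roughly [1, 0, -1]
```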

14.
The control of natural variation in cytosine methylation in Arabidopsis
Riddle NC, Richards EJ. Genetics 2002, 161(1): 355-363.
The distance of pollen movement is an important determinant of the neighborhood area of plant populations. In earlier studies, we designed a method for estimating the distance of pollen dispersal based on the analysis of differentiation among the pollen clouds of a sample of females spaced across the landscape. That method rested solely on an estimate of the global level of differentiation among the pollen clouds of the total array of sampled females. Here, we develop novel estimators based on the divergence of pollen clouds for all pairs of females, assuming that an independent estimate of adult population density is available. A simulation study shows that the estimators are all slightly biased, but most have enough precision to be useful, at least with adequate sample sizes. We show that one of the novel pairwise methods provides estimates slightly better than the best global estimate, especially when the markers used have low exclusion probability. The new method can also be generalized to the case where there is no prior information on the density of reproductive adults; in that case, we can jointly estimate the density itself and the pollen dispersal distance, given sufficient sample sizes. The bias of this last estimator is larger and its precision lower than for estimates based on independent density estimates, but it is still of interest, because a meaningful independent estimate of the density of reproducing individuals is difficult to obtain in most cases.

15.
This article applies a simple method to settings where one has clustered data but statistical methods are only available for independent data. We assume the statistical method provides a normally distributed estimate, theta, and an estimate of its variance, sigma². We randomly select a data point from each cluster and apply the statistical method to these independent data. We repeat this multiple times and use the average of the associated thetas as our estimate. An estimate of the variance is given by the average of the sigma²'s minus the sample variance of the thetas. We call this procedure multiple outputation, as all "excess" data within each cluster are thrown out multiple times. Hoffman, Sen, and Weinberg (2001, Biometrika 88, 1121-1134) introduced this approach for generalized linear models when cluster size is related to outcome. In this article, we demonstrate the broad applicability of the approach, with applications to angular data, p-values, vector parameters, Bayesian inference, genetics data, and random cluster sizes. In addition, asymptotic normality of estimates based on all possible outputations, as well as on a finite number of outputations, is proven under weak conditions. Multiple outputation provides a simple and broadly applicable method for analyzing clustered data. It is especially suited to settings where methods for clustered data are impractical, but it can also be applied generally as a quick and simple tool.
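The recipe in this abstract translates almost line for line into code. In the sketch below, the independent-data method is simply the sample mean with its usual variance estimate; the clustered data are simulated, and all names are illustrative.

```python
import numpy as np

def multiple_outputation(clusters, n_reps=1000, seed=0):
    """Multiple outputation: repeatedly keep one random observation per
    cluster, apply an independent-data method, then combine.

    Independent-data method here: theta = sample mean,
    sigma^2 = estimated variance of that mean.
    """
    rng = np.random.default_rng(seed)
    thetas, sigma2s = [], []
    for _ in range(n_reps):
        # Throw out all but one randomly chosen observation per cluster
        x = np.array([rng.choice(c) for c in clusters])
        thetas.append(x.mean())
        sigma2s.append(x.var(ddof=1) / len(x))
    thetas = np.array(thetas)
    est = thetas.mean()                        # average of the thetas
    # Variance: average sigma^2 minus the sample variance of the thetas
    var = np.mean(sigma2s) - thetas.var(ddof=1)
    return est, var

# Illustrative clustered data: 30 clusters of varying size, shared effects
rng = np.random.default_rng(1)
clusters = [rng.normal(loc=rng.normal(5, 1), size=rng.integers(2, 8))
            for _ in range(30)]
print(multiple_outputation(clusters))
```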

16.
The influence of methodologic aspects on cytomorphometric features was studied using preparations of hepatoma and/or mastocytoma cells. First, two preparation techniques (smear and oese) were compared. Second, four methods of selecting cells for cytomorphometric analysis (two conventional and two stratified) were tested for reproducibility. Third, heterogeneous cell populations were used to estimate the required sample size using the running coefficient of variation (CV), and the results were compared with the expected (theoretical) required sample sizes calculated from the standard error of the mean. The results showed significantly lower CVs for the smear preparation technique, and the stratified methods appeared superior to the conventional methods for selecting cells for measurement. The experimentally assessed sample sizes were considerably lower than the corresponding theoretical calculations. These findings suggest that morphometric assessments in cytologic smears should use a stratified cell selection method. While the experimentally assessed sample sizes are relatively small and therefore better suited to routine application, they may yield less reliable results in some cases. The need to test a sample for its reproducibility as well as its discriminatory power is emphasized.
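Under one plausible reading of the empirical criterion, sampling stops once the running CV of the mean (standard error divided by the mean) stays below a tolerance, while the theoretical sample size for estimating a mean to relative error eps is roughly n = (z·CV/eps)². The sketch below implements both; the stopping window, tolerance, and data are illustrative assumptions, not the paper's exact rules.

```python
import numpy as np

def running_cv_n(x, tol=0.05, window=10):
    """Smallest n at which the running CV of the mean (SEM / mean) stays
    below tol for `window` consecutive observations."""
    below = 0
    for n in range(2, len(x) + 1):
        sem_over_mean = x[:n].std(ddof=1) / np.sqrt(n) / x[:n].mean()
        below = below + 1 if sem_over_mean < tol else 0
        if below >= window:
            return n
    return None

def theoretical_n(cv, eps=0.05, z=1.96):
    """Sample size so the mean is within relative error eps (normal approx.)."""
    return int(np.ceil((z * cv / eps) ** 2))

# Illustrative cell measurements with CV of about 0.3
x = np.random.default_rng(0).normal(loc=10.0, scale=3.0, size=500)
print("running-CV n:", running_cv_n(x))
print("theoretical n:", theoretical_n(cv=0.3))
```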

17.
Heinze G, Gnant M, Schemper M. Biometrics 2003, 59(4): 1151-1157.
The asymptotic log-rank and generalized Wilcoxon tests are the standard procedures for comparing samples of possibly censored survival times. For comparing samples of very different sizes, an exact test is available that is based on a complete permutation of log-rank or Wilcoxon scores. While the asymptotic tests do not keep their nominal sizes if sample sizes differ substantially, the exact complete permutation test requires equal follow-up in the samples. We have therefore developed two new exact tests that are also suitable for unequal follow-up. The first is an exact analogue of the asymptotic log-rank test that conditions on observed risk sets, whereas the second permutes survival times while conditioning on the realized follow-up in each group. In an empirical study, we compare the new procedures with the asymptotic log-rank test, the exact complete permutation test, and an earlier approach that equalizes the follow-up distributions using artificial censoring. The results confirm highly satisfactory performance of the exact procedure conditioning on realized follow-up, particularly in the case of unequal follow-up. The advantage of this test over other options is finally exemplified in the analysis of a breast cancer study.
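The two new exact tests, conditioning on observed risk sets and on realized follow-up, are not reproduced here. As the baseline they improve on, the sketch below implements the complete-permutation idea in Monte Carlo form: the standardized log-rank statistic is recomputed under random reassignments of group labels, a procedure valid under equal follow-up, which is precisely the restriction the new tests remove. Sample sizes and settings are illustrative.

```python
import numpy as np

def logrank_stat(time, event, group):
    """Standardized log-rank statistic comparing group 1 vs. group 0."""
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):            # distinct event times
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()       # total events at t
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e / np.sqrt(var)

def permutation_logrank_p(time, event, group, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value by reshuffling group labels."""
    rng = np.random.default_rng(seed)
    obs = abs(logrank_stat(time, event, group))
    perm = [abs(logrank_stat(time, event, rng.permutation(group)))
            for _ in range(n_perm)]
    return (1 + np.sum(np.array(perm) >= obs)) / (1 + n_perm)

# Very unequal sample sizes, as in the settings the paper targets
rng = np.random.default_rng(1)
time = np.concatenate([rng.exponential(1.0, 200), rng.exponential(1.6, 12)])
event = (time < 2.0).astype(int)          # common administrative censoring
time = np.minimum(time, 2.0)
group = np.repeat([0, 1], [200, 12])
print(permutation_logrank_p(time, event, group))
```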

18.
Currently, among multiple comparison procedures for dependent groups, a bootstrap-t with a 20% trimmed mean performs relatively well in terms of both Type I error probabilities and power. However, trimmed means suffer from two general concerns described in the paper. Robust M-estimators address these concerns, but so far no method based on them has been found that gives good control over the probability of a Type I error when sample sizes are small. The paper suggests instead using a modified one-step M-estimator that retains the advantages of both trimmed means and robust M-estimators. Yet another concern is that the more successful methods for trimmed means can be too conservative in terms of Type I errors. Two methods for performing all pairwise multiple comparisons are considered. In simulations, both methods avoid a familywise error (FWE) rate larger than the nominal level. The method based on comparing measures of location associated with the marginal distributions can have an actual FWE well below the nominal level when variables are highly correlated. However, the method based on difference scores performs reasonably well with very small sample sizes and generally performs better than any of the methods studied in Wilcox (1997b).
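As orientation for the building block these procedures share, the sketch below computes a bootstrap-t confidence interval for a 20% trimmed mean of difference scores, with a Yuen-style standard error from the winsorized variance; the modified one-step M-estimator itself is not reproduced, and all settings are illustrative.

```python
import numpy as np

def trimmed_mean_se(x, gamma=0.2):
    """Gamma-trimmed mean and its Yuen-style SE via the winsorized variance."""
    n = len(x)
    xs = np.sort(x)
    g = int(np.floor(gamma * n))
    winsorized = np.clip(xs, xs[g], xs[n - g - 1])
    tm = xs[g:n - g].mean()
    se = winsorized.std(ddof=1) / ((1 - 2 * gamma) * np.sqrt(n))
    return tm, se

def bootstrap_t_ci(x, gamma=0.2, alpha=0.05, n_boot=2000, seed=0):
    """Bootstrap-t CI for the trimmed mean of (e.g.) difference scores."""
    rng = np.random.default_rng(seed)
    tm, se = trimmed_mean_se(x, gamma)
    t_star = []
    for _ in range(n_boot):
        xb = rng.choice(x, len(x), replace=True)
        tmb, seb = trimmed_mean_se(xb, gamma)
        t_star.append((tmb - tm) / seb)
    lo_q, hi_q = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    return tm - hi_q * se, tm - lo_q * se      # bootstrap-t interval

# Difference scores between two dependent conditions (illustrative, heavy-tailed)
d = np.random.default_rng(1).standard_t(df=3, size=30) + 0.5
print(bootstrap_t_ci(d))
```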

19.
Longitudinal studies are often used in biomedical research and clinical trials to evaluate treatment effects, and the association pattern within subjects must be considered in both sample size calculation and analysis. One of the most important approaches to analyzing such studies is the generalized estimating equation (GEE) method proposed by Liang and Zeger, in which a "working correlation structure" is introduced and the within-subject association pattern depends on a vector of association parameters denoted by ρ. Explicit sample size formulas for two-group comparisons in linear and logistic regression models were obtained by Liu and Liang based on the GEE method. For cluster randomized trials (CRTs), researchers have proposed optimal sample sizes at both the cluster and the individual level as functions of sampling costs and the intracluster correlation coefficient (ICC); in these approaches, the optimal sample sizes depend strongly on the ICC. However, the ICC is usually unknown for CRTs and multicenter trials. To overcome this shortcoming, Van Breukelen et al. consider a range of possible ICC values identified from literature reviews and present Maximin designs (MMDs) based on relative efficiency (RE) and efficiency under budget and cost constraints. In this paper, the optimal sample size and number of repeated measurements using GEE models with an exchangeable working correlation matrix are derived under a fixed budget, where "optimal" refers to maximum power for a given sampling budget. Equations for the sample size and number of repeated measurements are derived for a known association parameter ρ, and a straightforward algorithm is developed for unknown ρ. Applications in practice are discussed, as is the existence of the optimal design when an AR(1) working correlation matrix is assumed. The proposed method can be extended to scenarios in which the true and working correlation matrices differ.
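The paper's GEE derivations are not given in the abstract, but the core budget trade-off can be sketched: with exchangeable correlation ρ, the variance of a group mean based on n subjects with m repeated measurements scales as (1 + (m - 1)ρ)/(nm), and with subject-level cost c1 and per-measurement cost c2, a grid search over feasible (n, m) locates the minimum-variance (maximum-power) design. Costs, budget, and ρ below are illustrative.

```python
import numpy as np

def optimal_design(budget, c1, c2, rho, m_max=20):
    """Grid-search the (n, m) minimizing Var proportional to
    (1 + (m-1)*rho) / (n*m), subject to n * (c1 + m * c2) <= budget."""
    best = None
    for m in range(1, m_max + 1):
        n = int(budget // (c1 + m * c2))       # most subjects affordable
        if n < 2:
            continue
        var = (1 + (m - 1) * rho) / (n * m)    # up to a constant sigma^2
        if best is None or var < best[2]:
            best = (n, m, var)
    return best

# Recruiting a subject costs 10 units; each repeated measurement costs 1
for rho in (0.1, 0.4, 0.8):
    n, m, var = optimal_design(budget=1000, c1=10, c2=1, rho=rho)
    print(f"rho={rho}: n={n}, m={m}, relative variance={var:.5f}")
```

For this variance function the search reproduces the familiar closed form m* = sqrt(c1(1 - rho)/(c2 rho)): the stronger the within-subject correlation, the fewer repeated measurements and the more subjects the budget should buy.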

20.
Statistical models of the realized niche of species are increasingly used, but systematic comparisons of alternative methods are still limited. In particular, few studies have explored the effect of scale on model outputs. In this paper, we investigate the predictive ability of three statistical methods (generalized linear models, generalized additive models, and classification tree analysis) using species distribution data at three scales: fine (Catalonia), intermediate (Portugal), and coarse (Europe). Four Mediterranean tree species were modelled for comparison. The variables selected by the models were relatively consistent across scales, and the predictive accuracy of the models varied only slightly. However, there were slight differences in the performance of the methods. Classification tree analysis had lower accuracy than the generalized methods, especially at finer scales, and the performance of generalized linear models increased with scale. At the fine scale, GLM with linear terms showed better accuracy than GLM with quadratic and polynomial terms, probably because distributions at finer scales represent a linear sub-sample of the entire realized niches of species. In contrast to GLM, the performance of GAM was constant across scales, being more data-oriented. The predictive accuracy of GAM was always at least equal to that of the other techniques, suggesting that this modelling approach is more robust to variations of scale because it can accommodate any response shape.
