Similar Literature
20 similar records found.
1.
In individually matched case-control studies, when some covariates are incomplete, an analysis based on the complete data alone may result in a large loss of information from both the missing and the completely observed variables, typically causing bias and loss of efficiency. In this article, we propose a new method for handling missing covariate data based on a missing-data-induced intensity approach when the missingness mechanism does not depend on case-control status, and show that this leads to a generalization of the missing indicator method. We derive the asymptotic properties of the estimates from the proposed method and, using an extensive simulation study, assess the finite sample performance in terms of bias, efficiency, and 95% confidence coverage under several missing data scenarios. We also make comparisons with complete-case analysis (CCA) and some previously proposed missing data methods. Our results indicate that, under the assumption of predictable missingness, the suggested method provides valid estimation of parameters, is more efficient than CCA, and is competitive with other, more complex methods of analysis. A case-control study of multiple myeloma risk and a polymorphism in the interleukin-6 receptor (IL-6-α) is used to illustrate our findings.
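The classical missing indicator method that this proposal generalizes can be sketched as a simple data-preparation step. This is illustrative only (the field names are hypothetical, and the intensity-based generalization in the article is not shown): each missing covariate is zero-filled and paired with a 0/1 missingness indicator, so incomplete records are retained.

```python
def missing_indicator_expand(rows, covariate_keys):
    """Classical missing-indicator preparation: replace each missing
    covariate value with 0 and add a 0/1 indicator of missingness, so
    incomplete records stay in the analysis instead of being dropped."""
    expanded_rows = []
    for row in rows:
        new = dict(row)
        for key in covariate_keys:
            is_missing = row.get(key) is None
            new[key] = 0.0 if is_missing else row[key]
            new[key + "_missing"] = 1 if is_missing else 0
        expanded_rows.append(new)
    return expanded_rows

# One complete and one incomplete record (hypothetical fields)
data = [{"age": 61, "snp": 1}, {"age": 58, "snp": None}]
expanded = missing_indicator_expand(data, ["snp"])
```

Both the zero-filled value and the indicator would then enter the conditional logistic regression as covariates.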

2.
Cluster randomized trials in health care may involve three instead of two levels, for instance, in trials where different interventions to improve quality of care are compared. In such trials, the intervention is implemented in health care units (“clusters”) and aims at changing the behavior of health care professionals working in this unit (“subjects”), while the effects are measured at the patient level (“evaluations”). Within the generalized estimating equations approach, we derive a sample size formula that accounts for two levels of clustering: that of subjects within clusters and that of evaluations within subjects. The formula reveals that sample size is inflated, relative to a design with completely independent evaluations, by a multiplicative term that can be expressed as a product of two variance inflation factors, one that quantifies the impact of within-subject correlation of evaluations on the variance of subject-level means and the other that quantifies the impact of the correlation between subject-level means on the variance of the cluster means. Power levels as predicted by the sample size formula agreed well with the simulated power for more than 10 clusters in total, when data were analyzed using bias-corrected estimating equations for the correlation parameters in combination with the model-based covariance estimator or the sandwich estimator with a finite sample correction.
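The multiplicative inflation described above can be sketched numerically. This minimal illustration uses the standard variance inflation factor 1 + (m − 1)ρ at each level; the paper's exact parameterization of the two correlation parameters may differ, so treat the function names and arguments as assumptions.

```python
def variance_inflation(cluster_size: int, icc: float) -> float:
    """Standard variance inflation (design effect) factor 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * icc

def three_level_sample_size(n_independent: float, evals_per_subject: int,
                            subjects_per_cluster: int,
                            icc_evaluations: float, icc_subjects: float) -> float:
    """Inflate the number of evaluations needed under full independence by
    the product of two VIFs: one for evaluations within subjects, one for
    subject-level means within clusters."""
    vif_within_subject = variance_inflation(evals_per_subject, icc_evaluations)
    vif_within_cluster = variance_inflation(subjects_per_cluster, icc_subjects)
    return n_independent * vif_within_subject * vif_within_cluster

# 400 evaluations suffice under independence; with 20 evaluations per subject
# (correlation 0.05) and 5 subjects per cluster (correlation 0.10):
n = three_level_sample_size(400, 20, 5, 0.05, 0.10)   # 400 * 1.95 * 1.4
```

With either correlation set to zero, the corresponding factor collapses to 1 and the formula reduces to the familiar two-level (or independent) case.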

3.
Ye, Lin, and Taylor (2008, Biometrics 64, 1238–1246) proposed a joint model for longitudinal measurements and time-to-event data in which the longitudinal measurements are modeled with a semiparametric mixed model to allow for the complex patterns in longitudinal biomarker data. They proposed a two-stage regression calibration approach that is simpler to implement than a joint modeling approach. In the first stage of their approach, the mixed model is fit without regard to the time-to-event data. In the second stage, the posterior expectations of an individual's random effects from the mixed model are included as covariates in a Cox model. Although Ye et al. (2008) acknowledged that their regression calibration approach may cause bias due to the problems of informative dropout and measurement error, they argued that the bias is small relative to alternative methods. In this article, we show that this bias may be substantial. We show how to alleviate much of this bias with an alternative regression calibration approach that can be applied to both discrete and continuous time-to-event data. Through simulations, the proposed approach is shown to have substantially less bias than the regression calibration approach of Ye et al. (2008). In agreement with the methodology of Ye et al. (2008), an advantage of our proposed approach over joint modeling is that it can be implemented with standard statistical software and does not require complex estimation techniques.

4.
Multivariate recurrent event data are usually encountered in many clinical and longitudinal studies in which each study subject may experience multiple recurrent events. For the analysis of such data, most existing approaches have been proposed under the assumption that the censoring times are noninformative, which may not be true especially when the observation of recurrent events is terminated by a failure event. In this article, we consider regression analysis of multivariate recurrent event data with both time-dependent and time-independent covariates where the censoring times and the recurrent event process are allowed to be correlated via a frailty. The proposed joint model is flexible where both the distributions of censoring and frailty variables are left unspecified. We propose a pairwise pseudolikelihood approach and an estimating equation-based approach for estimating coefficients of time-dependent and time-independent covariates, respectively. The large sample properties of the proposed estimates are established, while the finite-sample properties are demonstrated by simulation studies. The proposed methods are applied to the analysis of a set of bivariate recurrent event data from a study of platelet transfusion reactions.

5.
Genotypes are frequently used to assess alternative reproductive strategies such as extra-pair paternity and conspecific brood parasitism in wild populations. However, such analyses are vulnerable to genotyping error or molecular artefacts that can bias results. For example, when using multilocus microsatellite data, a mismatch at a single locus, suggesting the offspring is not directly related to its putative parents, can occur quite commonly even when the offspring is truly related. Some recent studies have advocated an ad hoc rule that offspring must differ at more than one locus in order to conclude that they are not directly related. While this reduces the frequency with which true offspring are identified as not directly related, it also introduces bias in the opposite direction, wherein not directly related young are categorized as true offspring. More importantly, it ignores the additional information on allele frequencies that would reduce overall bias. In this study, we present a novel technique for assessing extra-pair paternity and conspecific brood parasitism using a likelihood-based approach in a new version of the program CERVUS. We test the suitability of the technique by applying it to a simulated data set and then present an example to demonstrate its influence on the estimation of alternative reproductive strategies.

6.
Finite mixtures of Gaussian distributions provide a flexible semiparametric methodology for density estimation when the continuous variables under investigation have no boundaries. However, in practical applications, variables may be partially bounded (e.g., taking nonnegative values) or completely bounded (e.g., taking values in the unit interval). In this case, the standard Gaussian finite mixture model assigns nonzero densities to any possible values, even to those outside the ranges where the variables are defined, hence resulting in potentially severe bias. In this paper, we propose a transformation-based approach for Gaussian mixture modeling in the case of bounded variables. The basic idea is to carry out density estimation not on the original data but on appropriately transformed data. Then, the density for the original data can be obtained by a change of variables. Both the transformation parameters and the parameters of the Gaussian mixture are jointly estimated by the expectation-maximization (EM) algorithm. The methodology for partially and completely bounded data is illustrated using both simulated data and real data applications.
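The change-of-variables idea can be illustrated with a logit transform for unit-interval data. This is a minimal sketch with hand-picked mixture parameters rather than EM estimates (the paper estimates both jointly): the mixture density is evaluated on the transformed scale and mapped back with the Jacobian, so the resulting density lives entirely on (0, 1).

```python
import math

def logit(x: float) -> float:
    """Map (0, 1) onto the real line."""
    return math.log(x / (1.0 - x))

def logit_jacobian(x: float) -> float:
    """|t'(x)| for the logit transform, used in the change of variables."""
    return 1.0 / (x * (1.0 - x))

def normal_pdf(y: float, mu: float, sigma: float) -> float:
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bounded_mixture_density(x: float, weights, means, sds) -> float:
    """Density on (0, 1): evaluate the Gaussian mixture on the logit scale,
    then map back via the change-of-variables formula."""
    y = logit(x)
    g = sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds))
    return g * logit_jacobian(x)

# Hand-picked two-component mixture on the logit scale
weights, means, sds = [0.4, 0.6], [-1.0, 1.5], [0.7, 0.9]

# The back-transformed density integrates to (approximately) 1 over (0, 1)
step = 1.0 / 10000
total = sum(bounded_mixture_density(i * step, weights, means, sds)
            for i in range(1, 10000)) * step
```

Unlike an untransformed Gaussian mixture fitted to the raw data, this density assigns no mass outside the unit interval by construction.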

7.
Species distribution models (SDMs) are often calibrated using presence-only datasets plagued with environmental sampling bias, which reduces model accuracy. To compensate for this bias, it has been suggested that background data (or pseudoabsences) should represent the area that has been sampled. However, spatially explicit knowledge of sampling effort is rarely available. In multi-species studies, sampling effort has been inferred following the target-group (TG) approach, where the aggregated occurrence of TG species informs the selection of background data. However, little is known about the species-specific response to this type of bias correction. The present study aims at evaluating the impacts of sampling bias and bias correction on SDM performance. To this end, we designed a realistic system of sampling bias and virtual species based on 92 terrestrial mammal species occurring in the Mediterranean basin. We manipulated presence and background data selection to calibrate four SDM types. Unbiased (unbiased presence data) and biased (biased presence data) SDMs were calibrated using randomly distributed background data. We used real and TG-estimated sampling efforts in background selection to correct for sampling bias in presence data. Overall, environmental sampling bias had a deleterious effect on SDM performance. Bias correction improved model accuracy, especially when based on spatially explicit knowledge of sampling effort. However, our results highlight important species-specific variation in susceptibility to sampling bias, largely explained by range size: widely distributed species were most vulnerable to sampling bias, and bias correction was even detrimental for narrow-ranging species. Furthermore, spatial discrepancies in SDM predictions suggest that bias correction effectively replaces an underestimation bias with an overestimation bias, particularly in areas of low sampling intensity. Thus, our results call for better estimation of sampling effort in multi-species systems and caution against the uninformed, automatic application of TG bias correction.

8.
When analyzing Poisson count data, a high frequency of extra zeros is sometimes observed. The zero-inflated Poisson (ZIP) model is a popular approach to handling zero-inflation. In this paper we generalize the ZIP model and its regression counterpart to accommodate the extent of individual exposure. Empirical evidence drawn from an occupational injury data set confirms that incorporating exposure information can exert a substantial impact on the model fit. Tests for zero-inflation are also considered, and their finite sample properties are examined in a Monte Carlo study.
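A minimal sketch of an exposure-adjusted ZIP probability mass function, assuming the common parameterization in which the Poisson mean is scaled by an individual exposure t (the paper's exact formulation may differ): a zero arises either structurally, with probability π, or from the Poisson component.

```python
import math

def zip_pmf(k: int, lam: float, pi: float, exposure: float = 1.0) -> float:
    """Zero-inflated Poisson pmf with exposure: a structural zero with
    probability pi, otherwise Poisson with mean lam * exposure."""
    mu = lam * exposure
    poisson_part = math.exp(-mu) * mu ** k / math.factorial(k)
    return pi * (k == 0) + (1.0 - pi) * poisson_part

# With lam = 2.0, pi = 0.3 and exposure t = 1.5, the mass sums to one
# and the mean is (1 - pi) * lam * t = 2.1
lam, pi, t = 2.0, 0.3, 1.5
total = sum(zip_pmf(k, lam, pi, t) for k in range(60))
mean = sum(k * zip_pmf(k, lam, pi, t) for k in range(60))
```

Setting exposure = 1 for every subject recovers the ordinary ZIP model, which is why the exposure-free model can understate risk for highly exposed individuals.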

9.
The three-state progressive model is a special multi-state model with important applications in survival analysis. It provides a suitable representation of an individual's history when an intermediate event (with a possible influence on the survival prognosis) is experienced before the main event of interest. Estimation of transition probabilities in this and other multi-state models is usually performed through the Aalen–Johansen estimator. However, the Aalen–Johansen estimator may be biased when the underlying process is not Markov. In this paper, we provide a new approach for testing Markovianity in the three-state progressive model. The new method is based on measuring the future–past association along time. This results in a deep inspection of the process that often reveals non-Markovian behaviour with different trends in the association measure. A test of significance for zero future–past association at each time point is introduced, and a significance trace is proposed accordingly. The finite sample performance of the test is investigated through simulations, and we illustrate the new method through a real data analysis.

10.
Simulation methods were used to generate paired data from a simulated population that included the age-based process of movement and the length-based process of gear selection. The age-based process caused bias in estimates of growth parameters obtained under the assumption of sampling random at length, even when relatively few age classes were affected. Methods that assumed sampling random at age were biased by the subsequent inclusion of the length-based process of gear selection. Additional knowledge of the age structure of the sampled area is needed to ensure an unbiased estimate of the growth parameters when using the length-conditional approach in the presence of age-based movement. Variability in the length-at-age relationship was better estimated with the length-conditional method than with the traditional method, even when the assumption of sampling random at length was violated. Including paired observations of length and associated age inside the population dynamics model may be the most appropriate way of estimating growth.

11.
This paper discusses two-sample comparison in the case of interval-censored failure time data. A common approach to this problem is to employ nonparametric test procedures, which usually give p-values but not a direct or exact quantitative measure of the survival or treatment difference of interest. In particular, these procedures cannot provide a hazard ratio estimate, which is commonly used to measure the difference between two treatments or samples. A few nonparametric test procedures have been developed for interval-censored data, but no procedure for hazard ratio estimation appears to exist. Accordingly, we present two procedures for nonparametric estimation of the hazard ratio of two samples with interval-censored data; they are generalizations of the corresponding procedures for right-censored failure time data. An extensive simulation study conducted to evaluate the performance of the two procedures indicates that they work reasonably well in practice. For illustration, they are applied to a set of interval-censored data arising from a breast cancer study.

12.
Species distribution modelling (SDM) has become an essential method in ecology and conservation. In the absence of survey data, the majority of SDMs are calibrated with opportunistic presence-only data, incurring substantial sampling bias. We address the challenge of correcting for sampling bias in data-sparse situations. We modelled the relative intensity of bat records across their entire range using three modelling algorithms under the point-process modelling framework (GLMs with subset selection, GLMs fitted with an elastic-net penalty, and Maxent). To correct for sampling bias, we applied model-based bias correction by incorporating spatial information on site accessibility or sampling effort. We evaluated the effect of bias correction on the models' predictive performance (AUC and TSS), calculated on spatial-block cross-validation and a holdout data set. When evaluated with independent, but also sampling-biased, test data, correction for sampling bias led to improved predictions. The predictive performance of the three modelling algorithms was very similar; elastic-net models had intermediate performance, with a slight advantage for GLMs on cross-validation and Maxent on holdout evaluation. Model-based bias correction is very useful in data-sparse situations where detailed data are not available to apply other bias correction methods. However, the success of bias correction depends on how well the selected bias variables describe the sources of bias. In this study, accessibility covariates described bias in our data better than the effort covariate, and their use led to larger changes in predictive performance. Objectively evaluating bias correction requires bias-free presence–absence test data; without them, the real improvement in describing a species' environmental niche cannot be assessed.

13.
Occupational, environmental, and nutritional epidemiologists are often interested in estimating the prospective effect of time-varying exposure variables such as cumulative exposure or cumulative updated average exposure, in relation to chronic disease endpoints such as cancer incidence and mortality. From exposure validation studies, it is apparent that many of the variables of interest are measured with moderate to substantial error. Although the ordinary regression calibration (ORC) approach is approximately valid and efficient for measurement error correction of relative risk estimates from the Cox model with time-independent point exposures when the disease is rare, it is not adaptable for use with time-varying exposures. By recalibrating the measurement error model within each risk set, a risk set regression calibration (RRC) method is proposed for this setting. An algorithm for a bias-corrected point estimate of the relative risk using an RRC approach is presented, followed by the derivation of an estimate of its variance, resulting in a sandwich estimator. Emphasis is on methods applicable to the main study/external validation study design, which arises in important applications. Simulation studies under several assumptions about the error model were carried out, which demonstrated the validity and efficiency of the method in finite samples. The method was applied to a study of diet and cancer from Harvard's Health Professionals Follow-up Study (HPFS).

14.
There is a great deal of recent interest in modeling right-censored clustered survival time data with a possible fraction of cured subjects who are nonsusceptible to the event of interest using marginal mixture cure models. In this paper, we consider a semiparametric marginal mixture cure model for such data and propose to extend an existing generalized estimating equation approach with a new unbiased estimating equation for the regression parameters in the latency part of the model. The large sample properties of the regression effect estimators in both the incidence and the latency parts are established, and the finite sample properties of the estimators are studied in simulation studies. The proposed method is illustrated with a bone marrow transplantation data set and a tonsil cancer data set.

15.
Gilbert, Rossini, and Shankarappa (2005, Biometrics 61, 106–117) present four U-statistic based tests to compare genetic diversity between different samples. The proposed tests improved upon previously used methods by accounting for the correlations in the data. We find, however, that the same correlations introduce an unacceptable bias in the sample estimators used for the variance and covariance of the inter-sequence genetic distances for modest sample sizes. Here, we compute unbiased estimators for these quantities and test the resulting improvement using simulated data. We also show that, contrary to the claims in Gilbert et al., it is not always possible to apply the Welch–Satterthwaite approximate t-test, and we provide explicit formulas for the degrees of freedom to be used when such an approximation is possible.
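For reference, the classical Welch–Satterthwaite approximate degrees of freedom can be computed as below. Note this is the textbook formula for independent observations, not the corrected degrees-of-freedom expressions derived in the paper for correlated genetic distances.

```python
def welch_satterthwaite_df(s1_sq: float, n1: int, s2_sq: float, n2: int) -> float:
    """Classical Welch-Satterthwaite approximate degrees of freedom for the
    two-sample t statistic with unequal variances: (v1 + v2)^2 divided by
    v1^2/(n1 - 1) + v2^2/(n2 - 1), where vi = si^2 / ni."""
    v1, v2 = s1_sq / n1, s2_sq / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

df = welch_satterthwaite_df(4.0, 10, 9.0, 15)   # just under 23
```

When the two variance-to-sample-size ratios are equal, the formula reduces to the pooled-test value n1 + n2 − 2, which is its upper bound.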

16.
Many methods for fitting demographic models to data sets of aligned sequences rely upon the assumption that the data have a branching coalescent history without recombination within regions or loci. To mitigate the effects of the failure of this assumption, a common approach is to filter data and sample regions that pass the four-gamete criterion for recombination, an approach that allows the analyses to run but that is expected to detect only a minority of recombination events. A series of empirical tests of this approach was conducted using computer simulations with and without recombination for a variety of isolation-with-migration (IM) models for two and three populations. Only the IMa3 program was used, but the general results should apply to related genealogy-sampling-based methods for IM models or subsets of IM models. It was found that the details of the sampling intervals that pass a four-gamete filter have a moderate effect, and that schemes that use the longest intervals, or that use overlapping intervals, gave poorer results. A simple approach of using a random nonoverlapping interval returned the smallest difference between results with and without recombination, with the mean difference between parameter estimates usually less than 20% of the true value (usually much less). However, the posterior probability distributions for migration rates were flatter with recombination, suggesting that filtering based on the four-gamete criterion, while necessary for methods like these, leads to reduced resolution on migration. A distinct, alternative approach of using a finite sites mutation model and not filtering the data performed quite poorly.
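The four-gamete criterion itself is easy to state in code: under an infinite-sites model, two biallelic sites are incompatible with a recombination-free history if all four allele combinations occur among the sampled haplotypes. A minimal sketch, assuming 0/1-coded biallelic sites (interval-selection schemes such as the random nonoverlapping rule discussed above are not shown):

```python
from itertools import combinations

def four_gamete_conflict(seqs, i, j) -> bool:
    """True if sites i and j jointly show all four gametes, which under an
    infinite-sites (no repeat mutation) model implies recombination."""
    gametes = {(s[i], s[j]) for s in seqs}
    return len(gametes) == 4

def passes_four_gamete(seqs) -> bool:
    """A region passes the filter if no pair of sites shows four gametes."""
    n_sites = len(seqs[0])
    return not any(four_gamete_conflict(seqs, i, j)
                   for i, j in combinations(range(n_sites), 2))

# Sites 0 and 1 carry all of 00, 01, 10, 11 -> recombination signal
recombinant = ["00", "01", "10", "11"]
compatible = ["00", "01", "11"]     # only three gametes
```

As the abstract notes, passing this filter is necessary but far from sufficient evidence of no recombination: many recombination events leave fewer than four gametes at every site pair.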

17.
Unit nonresponse is often a problem in sample surveys. It arises when the values of the survey variable cannot be recorded for some sampled units. In this paper, the use of nonresponse calibration weighting to treat nonresponse is considered in a complete design-based framework. Nonresponse is viewed as a fixed characteristic of the units. The approach is suitable in environmental and forest surveys when sampled sites cannot be reached by field crews. Approximate expressions of design-based bias and variance of the calibration estimator are derived and design-based consistency is investigated. Choice of auxiliary variables to perform calibration is discussed. Sen–Yates–Grundy, Horvitz–Thompson, and jackknife estimators of the sampling variance are proposed. Analytical and Monte Carlo results demonstrate the validity of the procedure when the relationship between survey and auxiliary variables is similar in respondent and nonrespondent strata. An application to a forest survey performed in Northeastern Italy is considered.
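The estimators mentioned above build on inverse-probability weighting. A minimal sketch of the Horvitz–Thompson total and a one-variable (ratio) calibration adjustment under assumed inclusion probabilities; the paper's calibration estimator for nonresponse is more general, so the function names and the single-auxiliary setup here are illustrative assumptions.

```python
def horvitz_thompson_total(y, inclusion_probs) -> float:
    """Horvitz-Thompson estimator of a population total: each sampled
    value is weighted by the inverse of its inclusion probability."""
    return sum(yi / pi for yi, pi in zip(y, inclusion_probs))

def ratio_calibrated_total(y, x, inclusion_probs, x_pop_total) -> float:
    """One-variable (ratio) calibration: rescale the HT weights so the
    weighted auxiliary total matches its known population total, then
    apply the rescaled weights to the survey variable."""
    g = x_pop_total / horvitz_thompson_total(x, inclusion_probs)
    return g * horvitz_thompson_total(y, inclusion_probs)

# Equal-probability sample of 4 units, each with inclusion probability 0.4
ht = horvitz_thompson_total([2.0, 4.0, 6.0, 8.0], [0.4] * 4)
cal = ratio_calibrated_total([2.0, 4.0], [1.0, 2.0], [0.5, 0.5], 8.0)
```

In the nonresponse setting, calibration on auxiliaries known for the whole sample plays the role of reweighting respondents to stand in for unreachable units, which is why the method's validity hinges on similar survey-auxiliary relationships in respondent and nonrespondent strata.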

18.
19.
We compared age and sex ratios among Eurasian Wigeon Anas penelope derived from Danish field observations and hunter-based shot samples throughout an entire winter. Sex ratios did not differ significantly between the two samples. Overall, first-year males were more than three times more likely than adult males to be represented in the hunter sample compared with field samples, and were 7–20 times overrepresented in the hunter sample at the beginning of the season. These results confirm the need to account for such bias and its temporal variation when using the results of hunting surveys to model population parameters. Hunter-shot age ratios may provide a long-term measure of the reproductive success of dabbling duck flyway populations given an understanding of such bias.
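The arithmetic of such a correction can be sketched under the simplifying assumption of a single known relative vulnerability factor (illustrative only, not the paper's estimator, and ignoring the seasonal variation the study emphasizes):

```python
def corrected_age_ratio(first_year: float, adult: float,
                        relative_vulnerability: float) -> float:
    """If first-year birds are k times more likely to be shot than adults,
    divide the hunter-sample first-year:adult ratio by k to recover the
    ratio in the field."""
    return (first_year / adult) / relative_vulnerability

# 150 first-year vs 50 adult males shot; first-years 3x more vulnerable
field_ratio = corrected_age_ratio(150, 50, 3.0)   # back to parity
```

In practice the vulnerability factor itself varies over the season (7–20x early on, in this study), so a single constant correction would misstate reproductive success unless the factor is estimated for each period.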

20.
Immobilized trypsin (IM) was recognized as an alternative to free trypsin (FT) for accelerating protein digestion 30 years ago. However, some questions about IM still need to be answered: how does the solid matrix of IM influence its preference for protein cleavage, and how well can IM perform for deep bottom-up proteomics compared to FT? By analyzing Escherichia coli proteome samples digested with amine- or carboxyl-functionalized magnetic bead-based IM (IM-N or IM-C) or FT, we observe that IM-N (with a nearly neutral solid matrix), IM-C (with a negatively charged solid matrix), and FT have similar cleavage preferences considering the microenvironment surrounding the cleavage sites. IM-N (15 min) and FT (12 h) both approach 9000 protein identifications (IDs) from a mouse brain proteome. Compared to FT, IM-N shows no bias in the digestion of proteins that are involved in various biological processes, are located in different components of cells, have diverse functions, and are expressed in varying abundance. A high-throughput bottom-up proteomics workflow comprising IM-N-based rapid protein cleavage and fast CZE-MS/MS enables protein sample preparation, CZE-MS/MS analysis, and data analysis to be completed in only 3 h, yielding 1000 protein IDs from the mouse brain proteome.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号