Similar Documents
20 similar documents retrieved.
1.
Bootstrap is a time-honoured distribution-free approach for attaching a standard error to any statistic of interest, but it has received little attention for data with missing values, especially when imputation techniques are used to replace them. We propose a proportional bootstrap method that allows effective use of imputation techniques for all bootstrap samples. Five deterministic imputation techniques are examined, with particular emphasis on the estimation of the standard error of the correlation coefficient. Some real data examples are presented. Other possible applications of the proposed bootstrap method are discussed.
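For context, the classical nonparametric bootstrap that this abstract builds on can be sketched for the correlation-coefficient case. This is a minimal complete-data illustration, not the proposed proportional bootstrap; the function name and defaults are our own:

```python
import numpy as np

def bootstrap_se_corr(x, y, n_boot=2000, seed=0):
    """Bootstrap standard error of the Pearson correlation coefficient:
    resample (x, y) pairs with replacement and take the standard
    deviation of the resampled correlations."""
    rng = np.random.default_rng(seed)
    n = len(x)
    r = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample row indices with replacement
        r[b] = np.corrcoef(x[idx], y[idx])[0, 1]
    return r.std(ddof=1)
```

With incomplete data, the abstract's point is that the imputation step must be repeated within each bootstrap sample, rather than imputing once and then resampling the completed data.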

2.
It is a common occurrence in plant breeding programs to observe missing values in three-way three-mode multi-environment trial (MET) data. We proposed modifications of models for estimating missing observations in these data arrays, and developed a novel approach based on hierarchical clustering. Multiple imputation (MI) was used in four ways: multiple agglomerative hierarchical clustering, a normal distribution model, a normal regression model, and predictive mean matching. The latter three models were fitted with both Bayesian and non-Bayesian analysis, while the first approach used a clustering procedure with randomly selected attributes and assigned real values from the nearest neighbour to the entry with missing observations. Different proportions of data entries in six complete datasets were randomly set to missing, and the MI methods were compared on the efficiency and accuracy of estimating those values. The results indicated that the models using Bayesian analysis were slightly more accurate than those using non-Bayesian analysis, but also more time-consuming. Overall, however, the novel multiple agglomerative hierarchical clustering approach performed best.
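A bare-bones sketch of the donor idea behind the nearest-neighbour step (assigning real observed values from the closest row) might look as follows. This is our own simplified single-donor version, not the paper's multiple agglomerative clustering procedure:

```python
import numpy as np

def nn_impute(X):
    """Fill missing entries in a numeric matrix with values from the
    nearest complete-case row, measured by Euclidean distance over the
    jointly observed columns. Simplified single-donor sketch."""
    X = X.astype(float).copy()
    complete = X[~np.isnan(X).any(axis=1)]          # candidate donor rows
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        obs = ~np.isnan(X[i])                       # columns observed in row i
        d = np.linalg.norm(complete[:, obs] - X[i, obs], axis=1)
        donor = complete[np.argmin(d)]              # closest complete row
        X[i, ~obs] = donor[~obs]                    # copy donor's real values
    return X
```

Repeating this with randomly selected attribute subsets, as the abstract describes, turns the single deterministic fill into multiple imputations.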

3.
Protein quantification based on mass spectrometry data has long been an important tool in high-throughput proteomics. However, owing to limitations of current mass spectrometry technology, large-scale protein quantification typically produces many missing values, which affects the accuracy of downstream analyses to some extent. Although many missing-value imputation methods have been proposed, the proteomics field still lacks a comprehensive evaluation of how well these methods perform under different conditions. Based on the distributional characteristics of real data, this study constructed...

4.
We consider genomic imputation for low-coverage genotyping-by-sequencing data with high levels of missing data. We compensate for this loss of information by utilizing family relationships in multiparental experimental crosses. This nearly quadruples the number of usable markers when applied to a large rice Multiparent Advanced Generation InterCross (MAGIC) study.

5.
In longitudinal randomised trials and observational studies within a medical context, a composite outcome—which is a function of several individual patient-specific outcomes—may be felt to best represent the outcome of interest. As in other contexts, missing data on patient outcome, due to patient drop-out or for other reasons, may pose a problem. Multiple imputation is a widely used method for handling missing data, but its use for composite outcomes has seldom been discussed. Whilst standard multiple imputation methodology can be used directly for the composite outcome, the distribution of a composite outcome may be of a complicated form and perhaps not amenable to statistical modelling. We compare direct multiple imputation of a composite outcome with separate imputation of the components of a composite outcome. We consider two imputation approaches. One approach involves modelling each component of a composite outcome using standard likelihood-based models. The other approach is to use linear increments methods. A linear increments approach can provide an appealing alternative, as its assumptions concerning both the missingness structure within the data and the imputation models differ from those of the standard likelihood-based approach. We compare both approaches using simulation studies and data from a randomised trial on early rheumatoid arthritis patients. Results suggest that both approaches are comparable and that, for each, separate imputation offers some improvement over direct imputation of a composite outcome.

6.

Background

Imputation techniques used to handle missing data are based on the principle of replacement. It is widely advocated that multiple imputation is superior to other imputation methods; however, studies have suggested that simple methods for filling in missing data can be just as accurate as complex ones. The objective of this study was to implement a number of simple and more complex imputation methods, and to assess the effect of these techniques on the performance of undiagnosed diabetes risk prediction models during external validation.

Methods

Data from the Cape Town Bellville-South cohort served as the basis for this study. Imputation methods and models were identified via recent systematic reviews. Models’ discrimination was assessed and compared using C-statistic and non-parametric methods, before and after recalibration through simple intercept adjustment.

Results

The study sample consisted of 1256 individuals, of whom 173 were excluded due to previously diagnosed diabetes. Of the final 1083 individuals, 329 (30.4%) had missing data. Family history had the highest proportion of missing data (25%). Imputation of the outcome, undiagnosed diabetes, was highest under stochastic regression imputation (163 individuals). Overall, deletion resulted in the lowest model performance, while simple imputation yielded the highest C-statistic for the Cambridge Diabetes Risk model, Kuwaiti Risk model, Omani Diabetes Risk model and Rotterdam Predictive model. Multiple imputation yielded the highest C-statistic only for the Rotterdam Predictive model, and this was matched by simpler imputation methods.

Conclusions

Deletion was confirmed as a poor technique for handling missing data. However, despite the oft-emphasized disadvantages of simpler imputation methods, this study showed that implementing them yields predictive utility for undiagnosed diabetes similar to that of multiple imputation.

7.
Kidney and cardiovascular disease are widespread among populations with high prevalence of diabetes, such as American Indians participating in the Strong Heart Study (SHS). Studying these conditions simultaneously in longitudinal studies is challenging, because the morbidity and mortality associated with these diseases result in missing data, and these data are likely not missing at random. When such data are merely excluded, study findings may be compromised. In this article, a subset of 2264 participants with complete renal function data from Strong Heart Exams 1 (1989–1991), 2 (1993–1995), and 3 (1998–1999) was used to examine the performance of five methods used to impute missing data: listwise deletion, mean of serial measures, adjacent value, multiple imputation, and pattern-mixture. Three missing at random models and one non-missing at random model were used to compare the performance of the imputation techniques on randomly and non-randomly missing data. The pattern-mixture method was found to perform best for imputing renal function data that were not missing at random. Determining whether data are missing at random or not can help in choosing the imputation method that will provide the most accurate results.

8.
Summary: Often a binary variable is generated by dichotomizing an underlying continuous variable measured at a specific time point according to a prespecified threshold value. In the event that the underlying continuous measurements are from a longitudinal study, one can use the repeated-measures model to impute missing data on responder status arising from subject dropout and apply the logistic regression model to the observed or otherwise imputed responder status. Standard Bayesian multiple imputation techniques (Rubin, 1987, Multiple Imputation for Nonresponse in Surveys), which draw the parameters for the imputation model from the posterior distribution and construct the variance of parameter estimates for the analysis model as a combination of within- and between-imputation variances, are found to be conservative. The frequentist multiple imputation approach, which fixes the parameters for the imputation model at the maximum likelihood estimates and constructs the variance of parameter estimates for the analysis model using the results of Robins and Wang (2000, Biometrika 87, 113–124), is shown to be more efficient. We propose applying the Kenward and Roger (1997, Biometrics 53, 983–997) degrees of freedom to account for the uncertainty associated with variance–covariance parameter estimates for the repeated-measures model.

9.
Summary: In a typical randomized clinical trial, a continuous variable of interest (e.g., bone density) is measured at baseline and at fixed postbaseline time points. The resulting longitudinal data, often incomplete due to dropouts and other reasons, are commonly analyzed using parametric likelihood-based methods that assume multivariate normality of the response vector. If the normality assumption is deemed untenable, then semiparametric methods such as (weighted) generalized estimating equations are considered. We propose an alternate approach in which the missing data problem is tackled using multiple imputation, and each imputed dataset is analyzed using robust regression (M-estimation; Huber, 1973, Annals of Statistics 1, 799–821) to protect against potential non-normality/outliers in the original or imputed dataset. The robust analysis results from each imputed dataset are combined for overall estimation and inference using either the simple Rubin (1987, Multiple Imputation for Nonresponse in Surveys, New York: Wiley) method or the more complex but potentially more accurate Robins and Wang (2000, Biometrika 87, 113–124) method. We use simulations to show that our proposed approach performs at least as well as the standard methods under normality, but is notably better under both elliptically symmetric and asymmetric non-normal distributions. A clinical trial example is used for illustration.
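The simple Rubin (1987) combination step referenced here can be sketched in a few lines. This generic helper (our own naming) pools per-imputation point estimates and variances into one estimate and total variance:

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Combine m per-imputation estimates and their variances using
    Rubin's (1987) rules: total variance = within + (1 + 1/m) * between."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    qbar = q.mean()                  # pooled point estimate
    ubar = u.mean()                  # within-imputation variance
    b = q.var(ddof=1)                # between-imputation variance
    t = ubar + (1 + 1 / m) * b       # total variance
    return qbar, t
```

The Robins and Wang (2000) alternative mentioned in this and the preceding abstract replaces this simple pooling with a sandwich-type variance and is not shown here.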

10.
Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIRIMP, a method for imputation of KIR copy number. We show that KIRIMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease.

11.

Background

In recent years, capabilities for genotyping large sets of single nucleotide polymorphisms (SNPs) have increased considerably, with the ability to genotype over 1 million SNP markers across the genome. This advance in technology has led to an increase in the number of genome-wide association studies (GWAS) for various complex traits. These GWAS have implicated over 1500 SNPs in disease traits. However, the SNPs identified from these GWAS are not necessarily the functional variants. Therefore, the next phase in GWAS will involve refining these putative loci.

Methodology

A next step for GWAS would be to catalog all variants, especially rarer variants, within the detected loci, followed by association analysis of the detected variants with the disease trait. However, sequencing a locus in a large number of subjects is still relatively expensive. A more cost-effective approach would be to sequence a portion of the individuals, followed by the application of genotype imputation methods to impute markers in the remaining individuals. A potentially attractive alternative would be to impute based on the 1000 Genomes Project; however, this has the drawback of using a reference population that does not necessarily match the disease status and LD pattern of the study population. We explored a variety of approaches for carrying out imputation with a reference panel consisting of sequence data for a fraction of the study participants, drawing on data from both a candidate-gene sequencing study and the 1000 Genomes Project.

Conclusions

Imputation of genetic variation based on a proportion of sequenced samples is feasible. Our results suggest the following sequencing study design guidelines, which take advantage of recent advances in genotype imputation methodology: select the largest and most diverse reference panel for sequencing, and genotype as many “anchor” markers as possible.

12.
A Comparison of Pressure-Volume Curve Data Analysis Techniques
Schulte, P. J. and Hinckley, T. M. 1985. A comparison of pressure-volume curve data analysis techniques.—J. exp. Bot. 36: 1590–1602. Computer-assisted analysis of data derived with the pressure-volume technique is currently feasible. In this study, various computer algorithms were used to analyse a variety of pressure-volume curve data sets. Comparisons were made with respect to estimates of osmotic potential, turgor loss point, symplastic fraction, and bulk modulus of elasticity. While osmotic potential estimation was fairly insensitive to the model used, estimates of the bulk modulus of elasticity appear to be highly dependent on the model used for analysis of the data. Key words: Pressure-volume, computer analysis, elasticity

13.
We propose a method to estimate the regression coefficients in a competing risks model where the cause-specific hazard for the cause of interest is related to covariates through a proportional hazards relationship and when cause of failure is missing for some individuals. We use multiple imputation procedures to impute missing cause of failure, where the probability that a missing cause is the cause of interest may depend on auxiliary covariates, and combine the maximum partial likelihood estimators computed from several imputed data sets into an estimator that is consistent and asymptotically normal. A consistent estimator for the asymptotic variance is also derived. Simulation results suggest the relevance of the theory in finite samples. Results are also illustrated with data from a breast cancer study.

14.

Background

Meta-analyses are considered the gold standard of evidence-based health care, and are used to guide clinical decisions and health policy. A major limitation of current meta-analysis techniques is their inability to pool ordinal data. Our objectives were to determine the extent of this problem in the context of neurological rating scales and to provide a solution.

Methods

Using an existing database of clinical trials of oral neuroprotective therapies, we identified the 6 most commonly used clinical rating scales and recorded how data from these scales were reported and analysed. We then identified systematic reviews of studies that used these scales (via the Cochrane database) and recorded the meta-analytic techniques used. Finally, we identified a statistical technique for calculating a common language effect size measure for ordinal data.

Results

We identified 103 studies, with 128 instances of the 6 clinical scales being reported. The majority (80%) reported means alone for central tendency, with only 13% reporting medians. In analysis, 40% of studies used parametric statistics alone, 34% employed non-parametric analysis, and 26% did not include or specify an analysis. Of the 60 systematic reviews identified that included meta-analysis, 88% used mean difference and 22% employed difference in proportions; none included rank-based analysis. We propose the use of a rank-based generalised odds ratio (WMW GenOR) as an assumption-free effect size measure that is easy to compute and can be readily combined in meta-analysis.

Conclusion

There is wide scope for improvement in the reporting and analysis of ordinal data in the literature. We hope that adoption of the WMW GenOR will have the dual effect of improving the reporting of data in individual studies while also increasing the inclusivity (and therefore validity) of meta-analyses.
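The abstract does not spell out the estimator, but a standard rank-based generalised odds ratio of the Agresti type (which the WMW GenOR resembles) can be computed as the ratio of pairwise wins to pairwise losses between two groups. The function below is our own illustrative sketch, not the authors' exact formulation:

```python
import numpy as np

def gen_odds_ratio(treatment, control):
    """Rank-based generalised odds ratio: the number of treatment-vs-control
    pairs favouring treatment divided by the number favouring control.
    Ties are ignored; assumes at least one pair favours control."""
    t = np.asarray(treatment)[:, None]   # shape (n_t, 1)
    c = np.asarray(control)[None, :]     # shape (1, n_c)
    wins = np.sum(t > c)                 # pairs where treatment outranks control
    losses = np.sum(t < c)               # pairs where control outranks treatment
    return wins / losses
```

Because it is built only on pairwise orderings, this measure needs no distributional assumptions about the ordinal scale, which is the property the authors emphasise.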

15.
A Comparative Study of Different Production Technologies for Natural Amino Acids
Based on a comparative study of three production technologies for 17–18 natural amino acids, protein hydrolysis was shown to be superior to fermentation and chemical synthesis, both in terms of the biological resources required and in terms of product sales and environmental protection.

16.
Summary: Often clinical studies periodically record information on disease progression as well as results from laboratory studies that are believed to reflect the progressing stages of the disease. A primary aim of such a study is to determine the relationship between the lab measurements and disease progression. If there were no missing or censored data, these analyses would be straightforward. However, patients often miss visits and return after their disease has progressed. In this case, not only is their progression time interval-censored, but their lab test series is also incomplete. In this article, we propose a simple test for the association between a longitudinal marker and an event time from incomplete data. We derive the test using a very intuitive technique: calculating the expected complete-data score conditional on the observed incomplete data (conditional expected score test, CEST). The problem was motivated by data from an observational study of patients with diabetes.

17.
18.
Drug combinations are highly efficient in systemic treatment of complex multigene diseases such as cancer, diabetes, arthritis and hypertension. Most currently used combinations were found in empirical ways, which limits the speed of discovery of new and more effective combinations. There is therefore a substantial need for efficient and fast computational methods. Here, we present a principle based on the assumption that perturbations generated by multiple pharmaceutical agents propagate through an interaction network and can cause unexpected amplification at targets not immediately affected by the original drugs. In order to capture this phenomenon, we introduce a novel Target Overlap Score (TOS), defined for two pharmaceutical agents as the number of jointly perturbed targets divided by the number of all targets potentially affected by the two agents. We show that this measure is correlated with the known effects of beneficial and deleterious drug combinations taken from the DCDB, TTD and Drugs.com databases. We demonstrate the utility of TOS by correlating the score to the outcome of recent clinical trials evaluating trastuzumab, an effective anticancer agent utilized in combination with anthracycline- and taxane-based systemic chemotherapy in HER2-receptor (erb-b2 receptor tyrosine kinase 2) positive breast cancer.
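As defined in the abstract, the Target Overlap Score for two agents is the number of jointly perturbed targets divided by the number of all targets potentially affected by the two agents. Reading the denominator as the union of the two target sets (our interpretation), a minimal sketch is:

```python
def target_overlap_score(targets_a, targets_b):
    """Target Overlap Score: jointly perturbed targets divided by all
    targets potentially affected by either agent (a Jaccard-style index,
    following the definition given in the abstract)."""
    a, b = set(targets_a), set(targets_b)
    if not a | b:
        return 0.0          # no targets at all: define the score as zero
    return len(a & b) / len(a | b)
```

For example, two agents sharing one of four distinct targets would score 0.25; the gene names in the test below are arbitrary placeholders.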

19.

Background

There are a number of evidence-based, in-person clinical interventions for problem drinkers, but most problem drinkers will never seek such treatments. Reaching the population of non-treatment-seeking problem drinkers will require a different approach. Accordingly, this randomized clinical trial evaluated an intervention that has been validated in clinical settings and then modified into an ultra-brief format suitable for use as an indicated public health intervention (i.e., targeting the population of non-treatment-seeking problem drinkers).

Methodology/Principal Findings

Problem drinkers (N = 1767) completed a baseline population telephone survey and then were randomized to one of three conditions – a personalized feedback pamphlet condition, a control pamphlet condition, or a no intervention control condition. In the week after the baseline survey, households in the two pamphlet conditions were sent their respective interventions by postal mail addressed to ‘Check Your Drinking.’ Changes in drinking were assessed post intervention at three-month and six-month follow-ups. The follow-up rate was 86% at three-months and 76% at six-months. There was a small effect (p = .04) in one of three outcome variables (reduction in AUDIT-C, a composite measure of quantity and frequency of drinking) observed for the personalized feedback pamphlet compared to the no intervention control. No significant differences (p>.05) between groups were observed for the other two outcome variables – number of drinks consumed in the past seven days and highest number of drinks on one occasion.

Conclusions/Significance

Based on the results of this study, we tentatively conclude that a brief intervention, modified to an ultra-brief, public health format can have a meaningful impact.

Trial Registration

ClinicalTrials.gov NCT00688584.

20.

Background

Clinical trial results registries may contain relevant unpublished information. Our main aim was to investigate the potential impact of the inclusion of reports from industry results registries on systematic reviews (SRs).

Methods

We identified a sample of 150 eligible SRs in PubMed via backward selection. Eligible SRs investigated randomized controlled trials of drugs and included at least 2 bibliographic databases (original search date: 11/2009). We checked whether results registries of manufacturers and/or industry associations had also been searched. If not, we searched these registries for additional trials not considered in the SRs, as well as for additional data on trials already considered. We reanalysed the primary outcome and harm outcomes reported in the SRs and determined whether results had changed. A “change” was defined as either a new relevant result or a change in the statistical significance of an existing result. We performed a search update in 8/2013 and identified a sample of 20 eligible SRs to determine whether mandatory results registration from 9/2008 onwards in the public trial and results registry ClinicalTrials.gov had led to its inclusion as a standard information source in SRs, and whether the inclusion rate of industry results registries had changed.

Results

133 of the 150 SRs (89%) in the original analysis did not search industry results registries. For 23 (17%) of these SRs we found 25 additional trials, as well as additional data on 31 trials already included in the SRs. This additional information was found for more than twice as many SRs of drugs approved from 2000 onwards as of drugs approved earlier. The inclusion of the additional trials and data yielded changes in existing results, or the addition of new results, for 6 of the 23 SRs. Of the 20 SRs retrieved in the search update, 8 considered ClinicalTrials.gov or a meta-registry linking to ClinicalTrials.gov, and 1 considered an industry results registry.

Conclusion

The inclusion of industry and public results registries as an information source in SRs is still insufficient and may result in publication and outcome reporting bias. In addition to an essential search in ClinicalTrials.gov, authors of SRs should consider searching industry results registries.

