Similar articles
20 similar articles found
1.
Andrea  Rotnitzky 《Biometrics》2009,65(1):326-328
Semiparametric Theory and Missing Data (A. A. Tsiatis), reviewed by Andrea Rotnitzky
Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis (M. J. Daniels and J. W. Hogan), reviewed by Daniel F. Heitjan
Bayesian Biostatistics and Diagnostic Medicine (L. D. Broemeling), reviewed by Paul Gustafson
Statistics in the Pharmaceutical Industry, 3rd edition (C. R. Buncher and J.-Y. Tsay, Editors), reviewed by Ralph B. D'Agostino Jr.
Introduction to Machine Learning and Bioinformatics (S. Mitra, S. Datta, T. Perkins, and G. Michailidis), reviewed by Yulan Liang
The Statistics of Gene Mapping (D. Siegmund and B. Yakir), reviewed by Hongyu Zhao
DNA Methylation Microarrays: Experimental Design and Statistical Analysis (S.-C. Wang and A. Petronis), reviewed by Kimberly D. Siegmund
Multiple Testing Procedures with Applications to Genomics (S. Dudoit and M. J. van der Laan), reviewed by Ruth Heller
The Statistical Analysis of Functional MRI Data (N. A. Lazar), reviewed by Wesley K. Thompson
Simulation and Inference for Stochastic Differential Equations with R Examples (S. M. Iacus), reviewed by Dave Campbell
Nonparametric Analysis of Univariate Heavy-Tailed Data: Research and Practice (N. Markovich), reviewed by M. Ivette Gomes
Time Series Analysis with Applications in R, 2nd edition (J. D. Cryer and K.-S. Chan), reviewed by Timothy D. Johnson
Brief Reports by the Editor:
Analysis of Variance and Covariance: How to Choose and Construct Models for the Life Sciences (C. P. Doncaster and A. J. H. Davey)
Computational Statistics Handbook with MATLAB®, 2nd edition (W. L. Martinez and A. R. Martinez)
Models for Probability and Statistical Inference: Theory and Applications (J. H. Stapleton)
Medical Biostatistics, 2nd edition (A. Indrayan)
Computational Methods in Biomedical Research (R. Khattree and D. N. Naik, Editors)

2.
In many practical applications we need to estimate the density function of a vector x, some components of which are discrete while the remaining ones are continuous. Among the many models that can be used in this case, the most useful are the location model and the kernel model. A problem arises when the observed data contain missing values, i.e. when some variables have not been observed on some individuals, with no particular pattern of missingness. Applying the EM algorithm allows us to estimate the parameters of the location model from incomplete data. The method is described in Section 2. Section 3 offers some suggestions on how to deal with incompleteness when the kernel model is used. Finally, Section 4 contains an example.

3.

Objectives

Participants with complete accelerometer data often represent a low proportion of the total sample and, in some cases, may be distinguishable from participants with incomplete data. Because traditional reliability methods characterize the consistency of complete data, little is known about reliability properties for an entire sample. This study employed Generalizability theory to report an index of reliability characterizing complete (7 days) and observable (1 to 7 days) accelerometer data.

Design

Cross-sectional.

Methods

Accelerometer data from the Study of Early Child Care and Youth Development were analyzed in this study. Missing value analyses were conducted to describe the pattern and mechanism of missing data. Generalizability coefficients were derived from variance components to report reliability parameters for complete data and also for the entire observable sample. Analyses were conducted separately by age (9, 11, 12, and 15 yrs) and daily wear time criteria (6, 8, 10, and 12 hrs).

Results

Participants with complete data were limited (<34%) and, most often, data were not considered to be missing completely at random. Across conditions, reliability coefficients for complete data were between 0.74 and 0.87. Relatively lower reliability properties were found across all observable data, ranging from 0.52 to 0.67. Sample variability increased with longer wear time criteria, but decreased with advanced age.

Conclusions

A reliability coefficient that includes all participants, not just those with complete data, provides a global perspective of reliability that could be used to further understand group level associations between activity and health outcomes.
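As a rough illustration of how a generalizability coefficient follows from variance components, the sketch below assumes a one-facet, fully crossed persons-by-days design. The abstract does not state the design or the formula the authors used, so the function name `g_coefficient` and its parameters are hypothetical.

```python
def g_coefficient(var_person: float, var_residual: float, n_days: int) -> float:
    """Relative generalizability coefficient for a persons-by-days design.

    ASSUMPTION: one-facet crossed design; not necessarily the design
    used in the study summarised above.
    var_person   -- between-person (true score) variance component
    var_residual -- person-by-day interaction plus error variance component
    n_days       -- number of monitoring days averaged per person
    """
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    if var_person < 0 or var_residual < 0:
        raise ValueError("variance components must be non-negative")
    # Averaging over n_days divides the error variance by n_days.
    return var_person / (var_person + var_residual / n_days)
```

With equal variance components, a single day gives 0.5 and seven days give 0.875, which shows why fewer observable days lower the coefficient.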

4.
Most statistical solutions to the problem of statistical inference with missing data involve integration or expectation. This can be done in many ways: directly or indirectly, analytically or numerically, deterministically or stochastically. Missing-data problems can be formulated in terms of latent random variables, so that hierarchical likelihood methods of Lee & Nelder (1996) can be applied to missing-value problems to provide one solution to the problem of integration of the likelihood. The resulting methods effectively use a Laplace approximation to the marginal likelihood with an additional adjustment to the measures of precision to accommodate the estimation of the fixed effects parameters. We first consider missing at random cases where problems are simpler to handle because the integration does not need to involve the missing-value mechanism and then consider missing not at random cases. We also study tobit regression and refit the missing not at random selection model to the antidepressant trial data analyzed in Diggle & Kenward (1994).
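For reference, the Laplace approximation to a marginal likelihood can be written as below. The notation is assumed rather than taken from the abstract: $h(\theta, v)$ is the joint log-likelihood of the fixed parameters $\theta$ and the latent variables $v$ (which here would include the missing values), $\hat v$ maximises $h$ for a given $\theta$, and the matrix inside the determinant is the negative Hessian of $h$ with respect to $v$.

```latex
\log L(\theta)
  = \log \int \exp\{h(\theta, v)\}\, dv
  \;\approx\;
  h(\theta, \hat v)
  - \frac{1}{2} \log \det\left\{
      \frac{1}{2\pi}
      \left( -\frac{\partial^2 h(\theta, v)}{\partial v\, \partial v^{\top}} \right)
    \right\}\Bigg|_{v = \hat v}
```

The integral over the latent variables is thus replaced by a maximisation plus a curvature correction.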

5.
Analyses of viral genetic linkage can provide insight into HIV transmission dynamics and the impact of prevention interventions. For example, such analyses have the potential to determine whether recently-infected individuals have acquired viruses circulating within or outside a given community. In addition, they have the potential to identify characteristics of chronically infected individuals that make their viruses likely to cluster with others circulating within a community. Such clustering can be related to the potential of such individuals to contribute to the spread of the virus, either directly through transmission to their partners or indirectly through further spread of HIV from those partners. Assessment of the extent to which individual (incident or prevalent) viruses are clustered within a community will be biased if only a subset of subjects are observed, especially if that subset is not representative of the entire HIV infected population. To address this concern, we develop a multiple imputation framework in which missing sequences are imputed based on a model for the diversification of viral genomes. The imputation method decreases the bias in clustering that arises from informative missingness. Data from a household survey conducted in a village in Botswana are used to illustrate these methods. We demonstrate that the multiple imputation approach reduces bias in the overall proportion of clustering due to the presence of missing observations.

6.
Functional diversity (FD) is an important component of biodiversity that quantifies the difference in functional traits between organisms. However, FD studies are often limited by the availability of trait data and FD indices are sensitive to data gaps. The distribution of species abundance and trait data, and its transformation, may further affect the accuracy of indices when data are incomplete. Using an existing approach, we simulated the effects of missing trait data by gradually removing data from a plant, an ant and a bird community dataset (12, 59, and 8 plots containing 62, 297 and 238 species respectively). We ranked plots by FD values calculated from the full datasets and then from our increasingly incomplete datasets, and compared the rankings between the original and virtually reduced datasets to assess the accuracy of FD indices as the proportion of missing data grows. Finally, we tested the accuracy of FD indices with and without data transformation, and the effect of removing trait data per plot or across the whole species pool. FD indices became less accurate as the amount of missing data increased, with the loss of accuracy depending on the index. However, where transformation improved the normality of the trait data, FD values from incomplete datasets were more accurate than before transformation. The distribution of data and its transformation are therefore as important as data completeness and can even mitigate the effect of missing data. Since the effect of missing trait values pool-wise or plot-wise depends on the data distribution, the method should be decided case by case. Data distribution and data transformation should be given more careful consideration when designing, analysing and interpreting FD studies, especially where trait data are missing. To this end, we provide the R package “traitor” to facilitate assessments of missing trait data.

7.
Noncancer risk assessments are generally forced to rely on animal bioassay data to estimate a Tolerable Daily Intake or Reference Dose, as a proxy for the threshold of human response. In cases where animal bioassays are missing from a complete data base, the critical NOAEL (no-observed-adverse-effect level) needs to be adjusted to account for the impact of the missing bioassay(s). This paper presents two approaches for making such adjustments. One is based on regression analysis and seeks to provide a point estimate of the adjustment needed. The other relies on non-parametric analysis and is intended to provide a distributional estimate of the needed adjustment. The adjustment needed is dependent on the definition of a complete data base, the number of bioassays missing, the specific bioassays which are missing, and the method used for interspecies scaling. The results from either approach can be used in conjunction with current practices for computing the TDI or RfD, or as an element of distributional approaches for estimating the human population threshold.

8.
A statistic, derived from the combination of two dependent tests, is proposed for testing the hypothesis of equality of the means of a bivariate normal distribution with unknown common variance and correlation coefficient when observations are missing on one or both variates. The null distribution of the statistic is approximated by a well-known distribution. The empirical powers of the statistic are computed and compared with some of the known statistics. The comparisons support the use of the proposed test.

9.
10.
11.
12.
In longitudinal randomised trials and observational studies within a medical context, a composite outcome, which is a function of several individual patient-specific outcomes, may be felt to best represent the outcome of interest. As in other contexts, missing data on patient outcome, due to patient drop-out or for other reasons, may pose a problem. Multiple imputation is a widely used method for handling missing data, but its use for composite outcomes has seldom been discussed. Whilst standard multiple imputation methodology can be used directly for the composite outcome, the distribution of a composite outcome may be of a complicated form and perhaps not amenable to statistical modelling. We compare direct multiple imputation of a composite outcome with separate imputation of the components of a composite outcome. We consider two imputation approaches. One approach involves modelling each component of a composite outcome using standard likelihood-based models. The other approach is to use linear increments methods. A linear increments approach can provide an appealing alternative, as its assumptions concerning both the missingness structure within the data and the imputation models differ from those of the standard likelihood-based approach. We compare both approaches using simulation studies and data from a randomised trial on early rheumatoid arthritis patients. Results suggest that both approaches are comparable and that for each, separate imputation offers some improvement on the direct imputation of a composite outcome.
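Whichever way the imputations are generated, the analyses of the completed datasets are usually combined with Rubin's rules. The abstract does not say how the authors pooled their results, so the sketch below is only the standard pooling step, with a hypothetical function name `pool_rubin`.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m multiply-imputed analyses with Rubin's rules.

    estimates -- point estimate from each completed dataset
    variances -- squared standard error from each completed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)      # pooled point estimate
    within = mean(variances)     # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total
```

The between-imputation term is what carries the extra uncertainty due to the missing outcomes; it applies equally to direct imputation of the composite and to separate imputation of its components.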

13.
14.
Ying Yuan  Guosheng Yin 《Biometrics》2010,66(1):105-114
Summary. We study quantile regression (QR) for longitudinal measurements with nonignorable intermittent missing data and dropout. Compared to conventional mean regression, quantile regression can characterize the entire conditional distribution of the outcome variable, and is more robust to outliers and misspecification of the error distribution. We account for the within-subject correlation by introducing an ℓ2 penalty in the usual QR check function to shrink the subject-specific intercepts and slopes toward the common population values. The informative missing data are assumed to be related to the longitudinal outcome process through the shared latent random effects. We assess the performance of the proposed method using simulation studies, and illustrate it with data from a pediatric AIDS clinical trial.
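For orientation, the usual QR check function and one plausible form of an ℓ2-penalised objective are sketched below. The notation is assumed, not taken from the paper: $y_{ij}$ is the $j$-th measurement on subject $i$, $x_{ij}$ and $z_{ij}$ are covariate vectors, $\beta$ holds the population coefficients, $b_i$ holds the subject-specific deviations in intercept and slope, and $\lambda \ge 0$ is a tuning parameter. The shared-random-effects model for the missing-data process is not shown.

```latex
\rho_\tau(u) = u\,\{\tau - I(u < 0)\}, \qquad 0 < \tau < 1,

\min_{\beta,\, b_1, \dots, b_n}\;
  \sum_{i=1}^{n} \sum_{j=1}^{n_i}
    \rho_\tau\!\left( y_{ij} - x_{ij}^{\top}\beta - z_{ij}^{\top} b_i \right)
  \;+\; \lambda \sum_{i=1}^{n} \lVert b_i \rVert_2^2
```

Larger values of $\lambda$ pull each $b_i$ toward zero, that is, pull each subject's intercept and slope toward the population values.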

15.

Background

Meta-analyses are considered the gold standard of evidence-based health care, and are used to guide clinical decisions and health policy. A major limitation of current meta-analysis techniques is their inability to pool ordinal data. Our objectives were to determine the extent of this problem in the context of neurological rating scales and to provide a solution.

Methods

Using an existing database of clinical trials of oral neuroprotective therapies, we identified the 6 most commonly used clinical rating scales and recorded how data from these scales were reported and analysed. We then identified systematic reviews of studies that used these scales (via the Cochrane database) and recorded the meta-analytic techniques used. Finally, we identified a statistical technique for calculating a common language effect size measure for ordinal data.

Results

We identified 103 studies, with 128 instances of the 6 clinical scales being reported. The majority (80%) reported means alone for central tendency, with only 13% reporting medians. In analysis, 40% of studies used parametric statistics alone, 34% of studies employed non-parametric analysis, and 26% did not include or specify analysis. Of the 60 systematic reviews identified that included meta-analysis, 88% used mean difference and 22% employed difference in proportions; none included rank-based analysis. We propose the use of a rank-based generalised odds ratio (WMW GenOR) as an assumption-free effect size measure that is easy to compute and can be readily combined in meta-analysis.

Conclusion

There is wide scope for improvement in the reporting and analysis of ordinal data in the literature. We hope that adoption of the WMW GenOR will have the dual effect of improving the reporting of data in individual studies while also increasing the inclusivity (and therefore validity) of meta-analyses.
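To give a sense of how little computation a rank-based generalised odds ratio needs, here is a minimal sketch. It assumes the common Wilcoxon-Mann-Whitney definition in which tied pairs are split evenly between the two groups; the abstract does not spell out the formula, so treat that definition and the function name `wmw_genor` as assumptions.

```python
def wmw_genor(treatment, control):
    """Generalised odds ratio from all treatment/control pairs.

    ASSUMPTION: ties are split evenly (Wilcoxon-Mann-Whitney style).
    Returns the odds that a randomly chosen treatment score is higher
    than a randomly chosen control score.
    """
    if not treatment or not control:
        raise ValueError("both groups must be non-empty")
    wins = ties = 0
    for x in treatment:
        for y in control:
            if x > y:
                wins += 1
            elif x == y:
                ties += 1
    losses = len(treatment) * len(control) - wins - ties
    denominator = losses + 0.5 * ties
    if denominator == 0:
        return float("inf")  # every pair favours the treatment group
    return (wins + 0.5 * ties) / denominator
```

Only the ordering of the scores is used, never their differences, so the measure is suitable for ordinal rating scales. If lower scores indicate a better outcome on a given scale, swap the two arguments.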

16.
In a recent analysis of the historical biogeography of Melastomataceae, Renner, Clausing, and Meyer (2001; American Journal of Botany 88(7): 1290-1300) rejected the hypothesis of a Gondwana origin. Using a fossil-calibrated chloroplast DNA (ndhF) phylogeny, they placed the early diversification of Melastomataceae in Laurasia at the Paleocene/Eocene boundary (ca. 55 Ma) and suggested that long-distance oceanic dispersals in the Oligocene and Miocene (34 to 5 Ma) account for its range expansion into South America, Africa, and Madagascar. Their critical assumption, that the oldest northern mid-latitude melastome fossils reflect tribal ages and their geographic origins, may be erroneous, however, because of the sparse fossil record in the tropics. We show that rates of synonymous nucleotide substitutions derived by the Renner et al. (2001) model are up to three times faster than most published rates. Under a Gondwana-origin model advocated here, which includes dispersals from Africa to Southeast Asia via the "Indian ark" and emphasizes filter rather than either sweepstakes dispersal or strict vicariance, rates of nucleotide substitution fall within the range of published rates. We suggest that biogeographic reconstructions need to consider the paucity of Gondwanan fossils and that frequently overlooked interplate dispersal routes provide alternatives to vicariance, boreotropical dispersal, and long-distance oceanic dispersal as explanations for the amphi-oceanic disjunctions of many tropical rain forest plants.

17.
18.

Background

Genome level analyses have enhanced our view of phylogenetics in many areas of the tree of life. With the production of whole genome DNA sequences of hundreds of organisms and large-scale EST databases, a large number of candidate genes for inclusion into phylogenetic analysis have become available. In this work, we exploit the burgeoning genomic data being generated for plant genomes to address one of the more important plant phylogenetic questions concerning the hierarchical relationships of the several major seed plant lineages (angiosperms, Cycadales, Ginkgoales, Gnetales, and Coniferales), which continues to be a work in progress, despite numerous studies using single, few or several genes and morphology datasets. Although most recent studies support the notion that gymnosperms and angiosperms are monophyletic and sister groups, they differ on the topological arrangements within each major group.

Methodology

We exploited the EST database to construct a supermatrix of DNA sequences (over 1,200 concatenated orthologous gene partitions for 17 taxa) to examine non-flowering seed plant relationships. This analysis employed programs that offer rapid and robust orthology determination of novel, short sequences from plant ESTs based on reference seed plant genomes. Our phylogenetic analysis retrieved an unbiased (with respect to gene choice), well-resolved and highly supported phylogenetic hypothesis that was robust to various outgroup combinations.

Conclusions

We evaluated character support and the relative contribution of numerous variables (e.g. gene number, missing data, partitioning schemes, taxon sampling and outgroup choice) on tree topology, stability and support metrics. Our results indicate that while missing characters and order of addition of genes to an analysis do not influence branch support, inadequate taxon sampling and limited choice of outgroup(s) can lead to spurious inference of phylogeny when dealing with phylogenomic scale data sets. As expected, support and resolution increase significantly as more informative characters are added, until reaching a threshold, beyond which support metrics stabilize, and the effect of adding conflicting characters is minimized.

19.
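The main problem facing three-dimensional protein structure superposition is that some amino acid residues of the target proteins are missing, yet most multiple-structure superposition methods require complete amino acid sequences. The common practice at present is to delete the missing amino acid sequence regions outright, which makes the superposition inaccurate. Because homologous proteins have similar structures, a region that is missing from one protein structure may be present in the structure of a homologous protein. On this basis, this paper proposes a new, simple and effective method for superposing protein structures with missing data (ITEMDM). The method computes the superposition using an iterative treatment of the missing data, and obtains the rotation matrix and translation vector with an optimized least-squares algorithm combined with matrix singular value decomposition (SVD). The method was used to successfully superpose proteins of the cytochrome C family and proteins from the standard Fischer's database (67 protein pairs), and was compared with other methods. Numerical experiments show that the algorithm has the following advantages: (1) compared with the THESEUS algorithm, it runs faster and needs fewer iterations; (2) compared with the PSSM algorithm, its results are more accurate and its running time is shorter. The results indicate that the method can better superpose three-dimensional protein structures with missing data.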
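The least-squares-plus-SVD step for finding a rotation matrix and translation vector is commonly implemented as the Kabsch algorithm. The sketch below shows that step only, on two complete coordinate sets with matched rows. It is not the ITEMDM method: the iterative handling of missing residues is not reproduced, and the function name `superpose` is hypothetical.

```python
import numpy as np


def superpose(mobile, target):
    """Least-squares rigid superposition of matched 3-D points (Kabsch).

    mobile, target -- arrays of shape (n, 3) with corresponding rows and
                      no missing coordinates (ASSUMPTION for this sketch).
    Returns (R, t) such that mobile @ R.T + t best matches target.
    """
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    if mobile.shape != target.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("inputs must both have shape (n, 3)")
    mobile_centroid = mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    # Cross-covariance of the centred coordinate sets.
    H = (mobile - mobile_centroid).T @ (target - target_centroid)
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target_centroid - R @ mobile_centroid
    return R, t
```

In a missing-data setting, one simple (and lossy) option is to pass only the rows observed in both structures, which is the deletion practice the abstract argues against.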

20.
Missing information in motion capture data caused by occlusion or detachment of markers is a common problem that is difficult to avoid entirely. The aim of this study was to develop and test an algorithm for reconstruction of corrupted marker trajectories in datasets representing human gait. The reconstruction was facilitated using information of marker inter-correlations obtained from a principal component analysis, combined with a novel weighting procedure. The method was completely data-driven, and did not require any training data. We tested the algorithm on datasets with movement patterns that can be considered both well suited (healthy subject walking on a treadmill) and less suited (transitioning from walking to running and the gait of a subject with cerebral palsy) to reconstruct. Specifically, we created 50 copies of each dataset, and corrupted them with gaps in multiple markers at random temporal and spatial positions. Reconstruction errors, quantified by the average Euclidean distance between predicted and measured marker positions, were ≤ 3 mm for the well suited dataset, even when there were gaps in up to 70% of all time frames. For the less suited datasets, median reconstruction errors were in the range 5–6 mm. However, a few reconstructions had substantially larger errors (up to 29 mm). Our results suggest that the proposed algorithm is a viable alternative both to conventional gap-filling algorithms and state-of-the-art reconstruction algorithms developed for motion capture systems. The strengths of the proposed algorithm are that it can fill gaps anywhere in the dataset, and that the gaps can be considerably longer than when using conventional interpolation techniques. Limitations are that it does not enforce musculoskeletal constraints, and that the reconstruction accuracy declines if applied to datasets with less predictable movement patterns.
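The general idea of using inter-marker correlations to fill gaps can be illustrated with a generic iterative PCA imputation. This is a substitute technique chosen for illustration, not the authors' algorithm: their weighting procedure is not described in the abstract and is not reproduced here, and the name `pca_fill` and its parameters are hypothetical.

```python
import numpy as np


def pca_fill(X, n_components=3, n_iter=200):
    """Fill NaN gaps by repeated low-rank (PCA) reconstruction.

    X            -- array of shape (frames, coordinates); NaN marks a gap
    n_components -- number of principal components kept (ASSUMED tuning choice)
    n_iter       -- number of refinement passes
    Observed values are never altered; only the gaps are updated.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from the column means of the observed values.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        k = n_components
        low_rank = (U[:, :k] * s[:k]) @ Vt[:k] + mu
        # Replace only the gap entries with their low-rank prediction.
        filled[missing] = low_rank[missing]
    return filled
```

Like the method summarised above, this uses only the recording itself and needs no training data, but it likewise enforces no musculoskeletal constraints and will do worse on movements with weak inter-marker correlation.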
