Similar Documents
20 similar documents found
1.
Estimation of false discovery proportion under general dependence
MOTIVATION: Wide-scale correlations between genes are commonly observed in gene expression data, due to both biological and technical reasons. These correlations increase the variability of the standard estimate of the false discovery rate (FDR). We highlight the false discovery proportion (FDP, instead of the FDR) as the suitable quantity for assessing differential expression in microarray data, demonstrate the deleterious effects of correlation on FDP estimation and propose an improved estimation method that accounts for the correlations. METHODS: We analyse the variation pattern of the distribution of test statistics under permutation using the singular value decomposition. The results suggest a latent FDR model that accounts for the effects of correlation, and is statistically closer to the FDP. We develop a procedure for estimating the latent FDR (ELF) based on a Poisson regression model. RESULTS: For simulated data based on the correlation structure of real datasets, we find that ELF performs substantially better than the standard FDR approach in estimating the FDP. We illustrate the use of ELF in the analysis of breast cancer and lymphoma data. AVAILABILITY: R code to perform ELF is available at http://www.meb.ki.se/~yudpaw.
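The FDR-versus-FDP distinction the abstract turns on can be seen in a small simulation: under a one-factor correlation structure, the realized FDP varies widely across replicates even though the plug-in FDR estimate is fixed. The sketch below is a minimal Python illustration with hypothetical parameters (m, m0, rho), not the authors' ELF procedure.

```python
# Minimal sketch (not ELF): realized FDP fluctuates under correlation
# even though the plug-in FDR estimate m0*alpha/R targets its average.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, m0, rho = 2000, 1800, 0.7      # tests, true nulls, latent correlation
alpha = 0.05                      # p-value rejection threshold

fdps = []
for _ in range(200):
    f = rng.standard_normal()     # shared latent factor -> correlated stats
    z = np.sqrt(rho) * f + np.sqrt(1 - rho) * rng.standard_normal(m)
    z[m0:] += 3.0                 # shift the non-null tests
    p = 2 * stats.norm.sf(np.abs(z))
    rejected = p <= alpha
    false_rej = rejected[:m0].sum()
    fdps.append(false_rej / max(rejected.sum(), 1))

print(f"mean FDP = {np.mean(fdps):.3f}, sd FDP = {np.std(fdps):.3f}")
```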

2.
Statistical methods have been developed and applied to estimating populations that are difficult or too costly to enumerate. In epidemiological settings these are known as multilist methods: individuals are matched across lists, and estimation of population size proceeds by modeling counts in incomplete multidimensional contingency tables (based on patterns of presence/absence on lists). As multilist methods typically assume that lists are compiled instantaneously, there are few options available for estimating the unknown size of a closed population based on continuously (longitudinally) compiled lists. However, in epidemiological settings, continuous-time lists are a routine byproduct of administrative functions. Existing methods are based on time-to-event analyses with a second step of estimating population size. We propose an alternative approach to address the twofold epidemiological problem of estimating population size and of identifying patient factors related to the duration (in days) between visits to a health care facility. A Bayesian framework is proposed to model interval lengths because, for many patients, the data are sparse; many patients were observed only once or twice. The proposed method is applied to the motivating data to illustrate its applicability. A small simulation study then explores the performance of the estimator under a variety of conditions. Finally, a short discussion suggests opportunities for continued methodological development in continuous-time population estimation.
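For readers unfamiliar with multilist estimation, the simplest two-list special case is the Lincoln-Petersen estimator (here with Chapman's correction), sketched below with hypothetical counts; the paper's Bayesian continuous-time model generalizes well beyond this.

```python
# Minimal two-list sketch of the multilist idea: estimate a closed
# population's size from the overlap between two lists.
# Chapman-corrected Lincoln-Petersen estimator; counts are hypothetical.
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """n1, n2: individuals on list 1 / list 2; m: matched on both."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# e.g. 150 patients on a clinic list, 120 on a registry, 45 matched
print(round(chapman_estimate(150, 120, 45)))  # ~ 396
```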

3.
Multiple imputation (MI) has emerged in the last two decades as a frequently used approach in dealing with incomplete data. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings that include a mix of continuous and discrete variables, the lack of flexible models for the joint distribution of different types of variables can make the specification of the imputation model a daunting task. The widespread availability of software packages that are capable of carrying out MI under the assumption of joint multivariate normality allows applied researchers to address this complication pragmatically by treating the discrete variables as continuous for imputation purposes and subsequently rounding the imputed values to the nearest observed category. In this article, we compare several rounding rules for binary variables based on simulated longitudinal data sets that have been used to illustrate other missing-data techniques. Using a combination of conditional and marginal data generation mechanisms and imputation models, we study the statistical properties of multiple-imputation-based estimates for various population quantities under different rounding rules from bias and coverage standpoints. We conclude that a good rule should be driven by borrowing information from other variables in the system rather than relying on the marginal characteristics and should be relatively insensitive to imputation model specifications that may potentially be incompatible with the observed data. We also urge researchers to consider the applied context and specific nature of the problem, to avoid uncritical and possibly inappropriate use of rounding in imputation models.
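A minimal sketch of the rounding problem: a binary variable imputed on a continuous scale must be mapped back to {0, 1}. The comparison below contrasts a fixed 0.5 cutoff with a cutoff calibrated to the observed marginal rate; the data and "imputation" mechanism are hypothetical, and the rules the paper actually favors additionally borrow information from other variables.

```python
# Minimal sketch of two rounding rules for a binary variable imputed on a
# continuous scale; illustration only, not the paper's simulation design.
import numpy as np

rng = np.random.default_rng(1)
true_p = 0.2
observed = rng.binomial(1, true_p, 500).astype(float)
# pretend these values were imputed on a continuous scale with noise
imputed = observed + rng.normal(0, 0.6, observed.size)

naive = (imputed >= 0.5).astype(int)        # fixed cutoff at 0.5
# calibrate the cutoff so the rounded marginal matches the observed rate
cutoff = np.quantile(imputed, 1 - observed.mean())
calibrated = (imputed >= cutoff).astype(int)

print(f"target rate {observed.mean():.3f}, "
      f"naive {naive.mean():.3f}, calibrated {calibrated.mean():.3f}")
```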

4.
We describe a nonparametric Bayesian approach for estimating the three-way ROC surface based on mixtures of finite Polya trees (MFPT) priors. Mixtures of finite Polya trees are robust models that can handle nonstandard features in the data. We address the difficulties in modeling continuous diagnostic data with skewness, multimodality, or other nonstandard features, and how parametric approaches can lead to misleading results in such cases. Robust, data-driven inference for the ROC surface and for the volume under the ROC surface is obtained. A simulation study is performed to assess the performance of the proposed method. Methods are applied to data from a magnetic resonance spectroscopy study on human immunodeficiency virus patients.
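The inferential target here is the volume under the three-way ROC surface, VUS = P(X < Y < Z) for measurements from the three ordered diagnostic groups. Below is the simple empirical (triple-counting) estimator on simulated data, sketched for orientation; it is not the mixtures-of-finite-Polya-trees model itself.

```python
# Empirical estimator of the volume under the three-way ROC surface:
# VUS = P(X < Y < Z), counted over all cross-group triples.
import numpy as np

def empirical_vus(x, y, z):
    x, y, z = map(np.asarray, (x, y, z))
    lt_xy = (x[:, None] < y[None, :]).astype(np.int64)   # n_x x n_y
    lt_yz = (y[:, None] < z[None, :]).astype(np.int64)   # n_y x n_z
    # (lt_xy @ lt_yz)[i, k] counts j with x_i < y_j < z_k
    count = (lt_xy @ lt_yz).sum()
    return count / (x.size * y.size * z.size)

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1, 60)   # e.g. healthy
y = rng.normal(1.0, 1, 60)   # intermediate
z = rng.normal(2.0, 1, 60)   # diseased
print(f"empirical VUS = {empirical_vus(x, y, z):.3f}")  # chance level is 1/6
```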

5.
M Mendoza, Biometrics, 1990, 46(4): 1059-1069
The analysis of biological assays has received the attention of statisticians for many years. However, when an indirect assay is considered and a continuous response variable is measured, the standard models lead to the problem of estimating a ratio, which has proved rather controversial when the statistical analysis is conducted under the classical approach. In this paper, within the Bayesian framework, the reference posterior distribution of the slope ratio is obtained. This is the parameter of interest in a large class of biological assays. The results avoid the drawbacks of the classical methods and generalize previous Bayesian analyses of the ratio of normal means.
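A minimal sketch of the estimand: the slope ratio's posterior can be simulated from posterior draws of the two slopes. The normal posteriors and their parameters below are hypothetical placeholders for a fitted assay model, not the reference posterior derived in the paper.

```python
# Monte Carlo illustration of a slope-ratio (relative potency) posterior,
# assuming approximately normal posteriors for the two slopes.
import numpy as np

rng = np.random.default_rng(3)
beta_test = rng.normal(2.1, 0.15, 100_000)   # hypothetical posterior draws
beta_std = rng.normal(1.4, 0.10, 100_000)
rho = beta_test / beta_std                   # posterior draws of the ratio

lo, med, hi = np.percentile(rho, [2.5, 50, 97.5])
print(f"posterior median {med:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```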

6.
An important issue in the phylogenetic analysis of nucleotide sequence data using the maximum likelihood (ML) method is the underlying evolutionary model employed. We consider the problem of simultaneously estimating the tree topology and the parameters in the underlying substitution model and of obtaining estimates of the standard errors of these parameter estimates. Given a fixed tree topology and corresponding set of branch lengths, the ML estimates of standard evolutionary model parameters are asymptotically efficient, in the sense that their joint distribution is asymptotically normal with the variance–covariance matrix given by the inverse of the Fisher information matrix. We propose a new estimate of this conditional variance based on estimation of the expected information using a Monte Carlo sampling (MCS) method. Simulations are used to compare this conditional variance estimate to the standard technique of using the observed information under a variety of experimental conditions. In the case in which one wishes to estimate simultaneously the tree and parameters, we provide a bootstrapping approach that can be used in conjunction with the MCS method to estimate the unconditional standard error. The methods developed are applied to a real data set consisting of 30 papillomavirus sequences. This overall method is easily incorporated into standard bootstrapping procedures to allow for proper variance estimation.
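The Monte Carlo sampling (MCS) idea, estimating the expected information by averaging the observed information over datasets simulated at the MLE, can be shown in a toy binomial model rather than a phylogenetic one. This sketch only demonstrates the simulation principle, not the paper's tree-based estimator.

```python
# MCS sketch: approximate the expected Fisher information at the MLE by
# simulating datasets at the MLE and averaging their observed information.
import numpy as np

rng = np.random.default_rng(4)
n, y = 100, 37
p_hat = y / n                                # MLE of the success probability

def observed_info(y, n, p):
    # negative second derivative of the binomial log-likelihood at p
    return y / p**2 + (n - y) / (1 - p)**2

y_sim = rng.binomial(n, p_hat, 50_000)       # parametric simulation at the MLE
mcs_info = observed_info(y_sim, n, p_hat).mean()
exact = n / (p_hat * (1 - p_hat))            # closed form for comparison
print(f"MCS estimate {mcs_info:.2f} vs closed form {exact:.2f}")
```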

7.
The receiver operating characteristic curve is a popular tool to characterize the capabilities of diagnostic tests with continuous or ordinal responses. One common design for assessing the accuracy of diagnostic tests involves multiple readers and multiple tests, in which all readers read all test results from the same patients. This design is most commonly used in a radiology setting, where the results of diagnostic tests depend on a radiologist's subjective interpretation. The most widely used approach for analyzing data from such a study is the Dorfman-Berbaum-Metz (DBM) method (Dorfman et al., 1992), which utilizes a standard analysis of variance (ANOVA) model for the jackknife pseudovalues of the areas under the ROC curves (AUCs). Although the DBM method has performed well in published simulation studies, there is no clear theoretical basis for this approach. In this paper, focusing on continuous outcomes, we investigate its theoretical basis. Our result indicates that the DBM method does not satisfy the regular assumptions for standard ANOVA models, and thus might lead to erroneous inference. We then propose a marginal model approach based on the AUCs which can adjust for covariates as well. Consistent and asymptotically normal estimators are derived for regression coefficients. We compare our approach with the DBM method via simulation and by an application to data from a breast cancer study. The simulation results show that both our method and the DBM method perform well when the accuracy of the tests under study is the same, and that our method outperforms the DBM method for inference on individual AUCs when the accuracy of the tests differs. The marginal model approach can be easily extended to ordinal outcomes.
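A minimal sketch of the jackknife pseudovalues the DBM method feeds into its ANOVA: the AUC is the Mann-Whitney statistic, and the i-th pseudovalue is n*AUC - (n-1)*AUC(-i). Data below are simulated, and the full readers-by-tests ANOVA is not reproduced.

```python
# Jackknife pseudovalues of the (Mann-Whitney) AUC, the raw material of
# the DBM analysis; simulated single-reader data for illustration.
import numpy as np

def auc(scores, labels):
    pos, neg = scores[labels == 1], scores[labels == 0]
    gt = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return gt + 0.5 * ties

def jackknife_pseudovalues(scores, labels):
    n = scores.size
    full = auc(scores, labels)
    idx = np.arange(n)
    loo = np.array([auc(scores[idx != i], labels[idx != i]) for i in idx])
    return n * full - (n - 1) * loo

rng = np.random.default_rng(5)
labels = rng.integers(0, 2, 80)
scores = rng.normal(labels * 1.0, 1.0)   # diseased cases score higher
pv = jackknife_pseudovalues(scores, labels)
print(f"AUC {auc(scores, labels):.3f}, pseudovalue mean {pv.mean():.3f}")
```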

8.
Quality Control System: an understanding of analytical error; synthetic QC material; a set of QC rules; a process to follow if the rules signal.
Quality Control (QC) Sera: reconstitution performed by trained staff; stability tested, both post-reconstitution and frozen.
QC Rules: rules documented, with the basis of their adoption; action to follow in case of failure documented; evidence of this procedure being used in practice; QC rules defined for both batch and continuous analysis, including how a 'run' is defined for a continuous analytical process; means and standard deviations (SDs) of controls based on sufficient data points and reflecting the true state of the system; evidence of staff training in the interpretation of QC rules; process documented; evidence of regular review of internal QC results.
Patient-based QC Procedures in place: if delta checks, anion gaps, or sample reruns are used, a documented procedure describing the process and evidence of its use; critical values documented, with evidence of use.
Action on QC Rule Failure: a documented process to follow with patient samples if control failure occurs; evidence that the procedure has been followed in instances of control failure.
External Quality Assessment (EQA) Program: integration of internal and external QC data. (A minimal sketch of two common control rules follows.)
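As a concrete illustration of the "set of QC rules" item, the sketch below checks control results against Westgard-style 1-3s and 2-2s rules; the checklist does not name these rules explicitly, so treat them as an assumed example, with the control mean and SD taken as already established from sufficient prior data.

```python
# Check control results against two common Westgard-style rules:
# 1-3s (one point beyond 3 SD) and 2-2s (two consecutive points beyond
# 2 SD on the same side of the mean).
import numpy as np

def qc_flags(values, mean, sd):
    """Indices of control results violating the 1-3s and 2-2s rules."""
    z = (np.asarray(values, dtype=float) - mean) / sd
    viol_13s = np.flatnonzero(np.abs(z) > 3)
    pair = (np.abs(z[1:]) > 2) & (np.abs(z[:-1]) > 2) \
           & (np.sign(z[1:]) == np.sign(z[:-1]))
    viol_22s = np.flatnonzero(pair) + 1
    return viol_13s, viol_22s

controls = [100.1, 101.9, 104.5, 104.8, 99.2, 92.0]  # hypothetical results
print(qc_flags(controls, mean=100.0, sd=2.0))        # -> (array([5]), array([3]))
```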

9.
A new design for estimating the distribution of time to pregnancy is proposed and investigated. The design is based on recording current durations in a cross-sectional sample of women, leading to statistical problems similar to estimating renewal time distributions from backward recurrence times. Non-parametric estimation is studied in some detail and a parametric approach is indicated. The results are illustrated on Monte Carlo simulations and on data from a recent European collaborative study. The role and applicability of this approach are discussed.
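A minimal simulation sketch of the design's key identity: the density g of an observed current (backward recurrence) duration satisfies g(y) = S(y)/mu, where S is the survivor function of the full duration and mu its mean, so S(y) can be read off as g(y)/g(0). The distribution and parameters below are hypothetical, and this crude histogram estimate is far simpler than the paper's estimators.

```python
# Current-duration sketch: recover S(y) from cross-sectional current
# durations via S(y) ~ g(y)/g(0). Simulation with hypothetical parameters.
import numpy as np

rng = np.random.default_rng(6)
full = rng.weibull(1.5, 200_000) * 12.0        # full durations, months
# cross-section: longer durations are more likely to be intersected
idx = rng.choice(full.size, size=50_000, p=full / full.sum())
current = rng.uniform(0, full[idx])            # elapsed (current) duration

g, edges = np.histogram(current, bins=60, range=(0, 30), density=True)
y = 0.5 * (edges[:-1] + edges[1:])
S_hat = g / g[0]                               # S(y) ~ g(y)/g(0)
S_true = np.exp(-((y / 12.0) ** 1.5))
print(np.column_stack([y[::12], S_hat[::12], S_true[::12]]).round(2))
```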

10.
This paper presents a method for analysing longitudinal data when there are dropouts. In particular, we develop a simple method based on generalized linear mixture models for handling nonignorable dropouts for a variety of discrete and continuous outcomes. Statistical inference for the model parameters is based on a generalized estimating equations (GEE) approach (Liang and Zeger, 1986). The proposed method yields estimates of the model parameters that are valid when nonresponse is nonignorable under a variety of assumptions concerning the dropout process. Furthermore, the proposed method can be implemented using widely available statistical software. Finally, an example using data from a clinical trial of contracepting women is used to illustrate the methodology.
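A minimal sketch of the GEE machinery the method builds on (Liang and Zeger, 1986), here via statsmodels on simulated longitudinal data. The paper's actual contribution, the mixture-model handling of nonignorable dropout, is not reproduced.

```python
# Plain GEE fit on simulated longitudinal data with exchangeable
# within-subject correlation; hypothetical data-generating values.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n, t = 200, 4
df = pd.DataFrame({
    "id": np.repeat(np.arange(n), t),
    "time": np.tile(np.arange(t), n),
})
subj = rng.normal(0, 1, n)                    # subject-level heterogeneity
df["y"] = 1.0 + 0.5 * df["time"] + subj[df["id"]] + rng.normal(0, 1, n * t)

model = smf.gee("y ~ time", groups="id", data=df,
                cov_struct=sm.cov_struct.Exchangeable(),
                family=sm.families.Gaussian())
print(model.fit().summary().tables[1])
```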

11.
Ten Have TR, Localio AR, Biometrics, 1999, 55(4): 1022-1029
We extend an approach for estimating random effects parameters under a random intercept and slope logistic regression model to include standard errors, and thereby confidence intervals. The procedure entails numerical integration to yield posterior empirical Bayes (EB) estimates of random effects parameters and their corresponding posterior standard errors. We incorporate an adjustment of the standard error due to Kass and Steffey (KS; 1989, Journal of the American Statistical Association 84, 717-726) to account for the variability in estimating the variance component of the random effects distribution. In assessing health care providers with respect to adult pneumonia mortality, comparisons are made with the penalized quasi-likelihood (PQL) approximation approach of Breslow and Clayton (1993, Journal of the American Statistical Association 88, 9-25) and a Bayesian approach. To make comparisons with an EB method previously reported in the literature, we apply these approaches to crossover trial data previously analyzed with the estimating equations EB approach of Waclawiw and Liang (1994, Statistics in Medicine 13, 541-551). We also perform simulations to compare the proposed KS and PQL approaches. These two approaches lead to EB estimates of random effects parameters with similar asymptotic bias. However, with many clusters of small size, the proposed KS approach does better than the PQL procedures in terms of coverage of nominal 95% confidence intervals for random effects estimates. With large cluster sizes and few clusters, the PQL approach performs better than the KS adjustment. These simulation results agree somewhat with those of the data analyses.

12.
In the last decade, interest has focused on human immunodeficiency virus (HIV) antibody assays and testing strategies that could distinguish recent from established infection in a single serum sample. Incidence estimates are obtained by using the relationship between prevalence, incidence, and duration of recent infection (the window period). However, recent work has demonstrated limitations of this approach due to its reliance on an estimated mean window period. We propose an alternative approach that consists of estimating the distribution of infection times based on serological marker values at the moment when the infection is first discovered. We propose a model for the marker trajectory based on repeated measurements of virological markers of seroconversion. The parameters of the model are estimated using data from a cohort of HIV-infected patients enrolled during primary infection. This model can be used for estimating the distribution of infection times for newly diagnosed HIV subjects reported in an HIV surveillance system. An approach is proposed for estimating HIV incidence from these results.

13.
In a typical randomized clinical trial, a continuous variable of interest (e.g., bone density) is measured at baseline and at fixed postbaseline time points. The resulting longitudinal data, often incomplete due to dropouts and other reasons, are commonly analyzed using parametric likelihood-based methods that assume multivariate normality of the response vector. If the normality assumption is deemed untenable, then semiparametric methods such as (weighted) generalized estimating equations are considered. We propose an alternate approach in which the missing data problem is tackled using multiple imputation, and each imputed dataset is analyzed using robust regression (M-estimation; Huber, 1973, Annals of Statistics 1, 799-821) to protect against potential non-normality/outliers in the original or imputed dataset. The robust analysis results from each imputed dataset are combined for overall estimation and inference using either the simple Rubin (1987, Multiple Imputation for Nonresponse in Surveys, New York: Wiley) method, or the more complex but potentially more accurate Robins and Wang (2000, Biometrika 87, 113-124) method. We use simulations to show that our proposed approach performs at least as well as the standard methods under normality, but is notably better under both elliptically symmetric and asymmetric non-normal distributions. A clinical trial example is used for illustration.
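A minimal end-to-end sketch of the proposed pipeline: multiply impute, fit a Huber M-estimation regression per imputed dataset, and pool with Rubin's rules. The imputation step below is a crude normal draw for illustration only; a proper MI engine would also propagate parameter uncertainty into the draws.

```python
# MI + robust regression + Rubin's rules, in miniature.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 300
x = rng.normal(0, 1, n)
y = 2.0 + 1.5 * x + rng.standard_t(3, n)       # heavy-tailed errors
y[rng.random(n) < 0.2] = np.nan                # 20% missing outcomes

M, ests, vars_ = 20, [], []
obs = ~np.isnan(y)
for _ in range(M):
    y_imp = y.copy()
    # crude imputation: normal draws around a least-squares fit
    b = np.polyfit(x[obs], y[obs], 1)
    resid_sd = np.std(y[obs] - np.polyval(b, x[obs]))
    y_imp[~obs] = np.polyval(b, x[~obs]) + rng.normal(0, resid_sd, (~obs).sum())
    fit = sm.RLM(y_imp, sm.add_constant(x), M=sm.robust.norms.HuberT()).fit()
    ests.append(fit.params[1]); vars_.append(fit.bse[1] ** 2)

# Rubin's rules: total variance = within + (1 + 1/M) * between
qbar = np.mean(ests)
T = np.mean(vars_) + (1 + 1 / M) * np.var(ests, ddof=1)
print(f"pooled slope {qbar:.3f} (SE {np.sqrt(T):.3f})")
```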

14.
In diagnostic medicine, estimating the diagnostic accuracy of a group of raters or medical tests relative to the gold standard is often the primary goal. When a gold standard is absent, latent class models in which the unknown gold standard test is treated as a latent variable are often used. However, these models have been criticized in the literature from both a conceptual and a robustness perspective. As an alternative, we propose an approach in which we exploit an imperfect reference standard with unknown diagnostic accuracy and conduct sensitivity analysis by varying this accuracy over scientifically reasonable ranges. In this article, a latent class model with crossed random effects is proposed for estimating the diagnostic accuracy of regional obstetrics and gynaecology (OB/GYN) physicians in diagnosing endometriosis. To avoid the pitfalls of models without a gold standard, we exploit the diagnostic results of a group of OB/GYN physicians with an international reputation for the diagnosis of endometriosis. We construct an ordinal reference standard based on the discordance among these international experts and propose a mechanism for conducting sensitivity analysis relative to the unknown diagnostic accuracy among them. A Monte Carlo EM algorithm is proposed for parameter estimation and a BIC-type model selection procedure is presented. Through simulations and data analysis we show that this new approach provides a useful alternative to traditional latent class modeling approaches used in this setting.

15.
GenMiner is an implementation of association rule discovery dedicated to the analysis of genomic data. It allows the analysis of datasets integrating multiple sources of biological data represented as both discrete values, such as gene annotations, and continuous values, such as gene expression measures. GenMiner implements the new NorDi (normal discretization) algorithm for normalizing and discretizing continuous values and takes advantage of the Close algorithm to efficiently generate minimal non-redundant association rules. Experiments show that GenMiner's execution time, memory usage, and number of extracted association rules are all significantly smaller than those of the standard Apriori-based approach. AVAILABILITY: The GenMiner software and supplementary materials are available at http://bioinfo.unice.fr/publications/genminer_article/ and http://keia.i3s.unice.fr/?Implementations:GenMiner SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
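A minimal sketch of the discretization step such a pipeline needs before rule mining: mapping each gene's continuous expression values to under/normal/over-expressed categories. The mean +/- k*SD cutoff below is a simple stand-in, not the actual NorDi algorithm, which additionally removes outliers and tests for normality.

```python
# Simple three-level discretization of continuous expression values;
# a stand-in for NorDi, for illustration only.
import numpy as np

def discretize(values, k=1.0):
    v = np.asarray(values, dtype=float)
    mu, sd = v.mean(), v.std(ddof=1)
    return np.where(v < mu - k * sd, "under",
           np.where(v > mu + k * sd, "over", "normal"))

expr = [0.1, -0.2, 2.4, 0.3, -2.1, 0.0, 0.4]   # hypothetical log-ratios
print(discretize(expr))   # -> over for 2.4, under for -2.1, rest normal
```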

16.
Many small cetacean, sirenian, and pinniped species aggregate in groups of large or variable size. Accurate estimation of group sizes is essential for estimating the abundance and distribution of these species, but is challenging as individuals are highly mobile and only partially visible. We developed a Bayesian approach for estimating group sizes using wide-angle aerial photographic or video imagery. Our approach accounts for both availability and perception bias, including a new method (analogous to distance sampling) for estimating perception bias due to small image size in wide-angle images. We demonstrate our approach through an application to aerial survey data for an endangered population of beluga whales (Delphinapterus leucas) in Cook Inlet, Alaska. Our results strengthen understanding of variation in group size estimates and allow for probabilistic statements about the size of detected groups. Aerial surveys are a standard tool for estimating the abundance and distribution of various marine mammal species. The role of aerial photographic and video data in wildlife assessment is expected to increase substantially with the widespread uptake of unmanned aerial vehicle technology. Key aspects of our approach are relevant to group size estimation for a broad range of marine mammal, seabird, other waterfowl, and terrestrial ungulate species.
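A minimal sketch of the counting model at the core of such approaches: if each animal in a group of unknown size N is detected in an image with probability p (availability times perception), the observed count is Binomial(N, p), and a discrete posterior for N follows from a flat prior. The detection probability below is assumed known; the paper's model additionally estimates the availability and perception components.

```python
# Discrete posterior for group size N given a binomial count with
# known detection probability; flat prior, hypothetical numbers.
import numpy as np
from scipy import stats

count, p = 12, 0.6                    # observed animals; assumed detection prob
N_grid = np.arange(count, 101)
like = stats.binom.pmf(count, N_grid, p)
post = like / like.sum()              # flat prior over N
mean_N = (N_grid * post).sum()
ci = N_grid[np.searchsorted(post.cumsum(), [0.025, 0.975])]
print(f"posterior mean N = {mean_N:.1f}, 95% interval {ci[0]}-{ci[1]}")
```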

17.
We review a model-based approach to estimate local population F_ST values that is based on the multinomial-Dirichlet distribution, the so-called F-model. As opposed to the standard method of estimating a single F_ST value, this approach takes into account the fact that in most if not all realistic situations, local populations differ in their effective sizes and migration rates. Therefore, the use of this approach can help better describe the genetic structure of populations. Despite this obvious advantage, this method has remained largely underutilized by molecular ecologists. Thus, the objective of this review is to foster its use for studying the genetic structure of metapopulations. We present the derivation of the Bayesian formulation for the estimation of population-specific F_ST values based on the multinomial-Dirichlet distribution. We describe several recent applications of the F-model and present the results of a small simulation study that explains how the F-model can help better describe the genetic structure of populations.
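A minimal sketch of the F-model for one population at biallelic loci: local allele frequencies are Beta distributed around the migrant-pool frequency pi with shapes pi(1-F_ST)/F_ST and (1-pi)(1-F_ST)/F_ST, so observed allele counts are beta-binomial. The grid likelihood below uses hypothetical counts and treats pi as known, which falls well short of the full Bayesian treatment reviewed.

```python
# Grid likelihood for a population-specific Fst under the F-model
# (beta-binomial allele counts); hypothetical data, known pi.
import numpy as np
from scipy import stats

pi = 0.3                              # migrant-pool (ancestral) frequency
counts = np.array([14, 9, 2, 11, 5])  # reference-allele copies at five loci
n = np.full(5, 20)                    # allele copies sampled per locus

fst_grid = np.linspace(0.01, 0.6, 600)
loglik = []
for fst in fst_grid:
    a = pi * (1 - fst) / fst
    b = (1 - pi) * (1 - fst) / fst
    loglik.append(stats.betabinom.logpmf(counts, n, a, b).sum())
print(f"Fst MLE ~ {fst_grid[np.argmax(loglik)]:.3f}")
```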

18.
Evolutionary biologists have adopted simple likelihood models for purposes of estimating ancestral states and evaluating character independence on specified phylogenies; however, for purposes of estimating phylogenies by using discrete morphological data, maximum parsimony remains the only option. This paper explores the possibility of using standard, well-behaved Markov models for estimating morphological phylogenies (including branch lengths) under the likelihood criterion. An important modification of standard Markov models involves making the likelihood conditional on characters being variable, because constant characters are absent in morphological data sets. Without this modification, branch lengths are often overestimated, resulting in potentially serious biases in tree topology selection. Several new avenues of research are opened by an explicitly model-based approach to phylogenetic analysis of discrete morphological data, including combined-data likelihood analyses (morphology + sequence data), likelihood ratio tests, and Bayesian analyses.  相似文献
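The key modification, conditioning the likelihood on characters being variable, can be written as L_v = L(pattern) / (1 - P(constant pattern)). Below is a minimal sketch for the 2-state symmetric (Mk-style) model on a 3-taxon star tree with hypothetical branch lengths; it illustrates the correction, not a full pruning-algorithm implementation.

```python
# Likelihood of a character pattern on a 3-taxon star tree under the
# 2-state symmetric model, with and without conditioning on variability.
import numpy as np

def p_transition(t):
    """2-state symmetric model transition matrix for branch length t."""
    same = 0.5 + 0.5 * np.exp(-2 * t)
    return np.array([[same, 1 - same], [1 - same, same]])

def pattern_prob(pattern, branch_lengths):
    P = [p_transition(t) for t in branch_lengths]
    # sum over the (uniform) root state at the star tree's center
    return sum(0.5 * np.prod([P[i][r, x] for i, x in enumerate(pattern)])
               for r in (0, 1))

t = [0.3, 0.3, 0.5]
p_const = pattern_prob((0, 0, 0), t) + pattern_prob((1, 1, 1), t)
raw = pattern_prob((0, 0, 1), t)
print(f"raw {raw:.4f}, conditioned on variability {raw / (1 - p_const):.4f}")
```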

19.
In response to the biopharmaceutical industry advancing from traditional batch operation to continuous operation, the Food and Drug Administration (FDA) has published draft guidance for continuous integrated biomanufacturing. This draft outlines the most important rules for establishing continuous integration, one of which is a thorough understanding of mass flows in the process. A computer simulation framework is developed for modeling the residence time distribution (RTD) of integrated continuous downstream processes based on a unit-by-unit modeling approach, in which unit operations are simulated one by one across the entire processing time and then combined into an integrated RTD model. The framework allows for easy addition or replacement of unit operations, as well as quick adjustment of process parameters during evaluation of the RTD model. With this RTD model, the start-up phase to reach steady state can be accelerated, the effects of process disturbances at any stage of the process can be calculated, and virtual tracking of a section of the inlet material throughout the process is possible. A hypothetical biomanufacturing process for an antibody was chosen to showcase the RTD modeling approach.
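A minimal sketch of the unit-by-unit RTD idea: give each unit operation its own residence time distribution and convolve them into the integrated process RTD. The tanks-in-series (gamma) RTDs and their parameters below are hypothetical, not taken from the paper's process.

```python
# Integrated process RTD as the convolution of per-unit RTDs;
# two hypothetical units modeled as tanks-in-series (gamma) RTDs.
import numpy as np
from scipy import stats

dt = 0.01                                  # time grid step (hours)
t = np.arange(0, 48, dt)
rtd1 = stats.gamma.pdf(t, a=4, scale=6 / 4)    # unit 1: mean 6 h
rtd2 = stats.gamma.pdf(t, a=3, scale=3 / 3)    # unit 2: mean 3 h
combined = np.convolve(rtd1, rtd2)[: t.size] * dt   # integrated RTD

mean_rt = (t * combined).sum() * dt
print(f"integrated mean residence time ~ {mean_rt:.1f} h (expect 6 + 3 = 9)")
```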

20.
The ROC (receiver operating characteristic) curve is the most commonly used statistical tool for describing the discriminatory accuracy of a diagnostic test. Classical estimation of the ROC curve relies on data from a simple random sample from the target population. In practice, estimation is often complicated because not all subjects undergo a definitive assessment of disease status (verification). Estimation of the ROC curve based only on data from subjects with verified disease status may be badly biased. In this work we investigate the properties of the doubly robust (DR) method for estimating the ROC curve under verification bias, originally developed by Rotnitzky, Faraggi and Schisterman (2006) for estimating the area under the ROC curve. The DR method can be applied to continuous-scale tests and allows for a non-ignorable process of selection to verification. We develop the estimator's asymptotic distribution and examine its finite sample properties via a simulation study. We exemplify the DR procedure for estimation of ROC curves with data collected on patients undergoing electron beam computed tomography, a diagnostic test for calcification of the arteries.
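A minimal sketch of one ingredient of the DR estimator: inverse probability weighting, where each verified subject is weighted by 1/P(verified | test result) so the verified subsample stands in for the full sample. The full DR method adds a disease model for double protection and handles non-ignorable selection; the simulation below assumes ignorable verification with a known verification model.

```python
# IPW correction of verification bias for sensitivity at a cutoff;
# hypothetical simulation with verification depending on the test score.
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(9)
n = 20_000
d = rng.binomial(1, 0.3, n)                 # true disease status
score = rng.normal(d * 1.2, 1.0)            # continuous test result
p_verify = expit(-0.5 + 1.0 * score)        # verification depends on score
v = rng.binomial(1, p_verify)               # 1 = disease status verified

c = 0.8                                     # cutoff defining test-positive
w = v / p_verify                            # IPW weights (0 if unverified)
sens_ipw = (w * d * (score > c)).sum() / (w * d).sum()
sens_naive = ((v == 1) & (d == 1) & (score > c)).sum() \
             / ((v == 1) & (d == 1)).sum()
sens_true = (d * (score > c)).sum() / d.sum()
print(f"true {sens_true:.3f}, IPW {sens_ipw:.3f}, verified-only {sens_naive:.3f}")
```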
