Similar documents
20 similar records found (search time: 20 ms)
1.
ABSTRACT Statistical inference is an important element of science, but these inferences are constrained within the framework established by the objectives and design of a study. The choice of approach to data analysis, while important, has far less consequence for scientific inference than claimed by Sleep et al. (2007). Their principal assertion—that when model selection is used as the approach to data analysis, all studies provide a reliable foundation for distinguishing among mechanistic explanatory hypotheses—is incorrect and encourages faulty inferences. Sleep et al. (2007) overlook the critical distinction between inferences that result from studies designed a priori to discriminate among a set of candidate explanations and inferences that result from post hoc exploration of data from studies originally designed to meet pattern-based objectives. No approach to data analysis, including model selection, can overcome the fundamental limitations on inference imposed by study design. The comments by Sleep et al. (2007) reinforce the need for scientists to understand clearly the inferential basis for their scientific claims, including the roles and limitations of data analysis.

2.
Missing data is a common issue in research using observational studies to investigate the effect of treatments on health outcomes. When missingness occurs only in the covariates, a simple approach is to use missing indicators to handle the partially observed covariates. The missing indicator approach has been criticized for giving biased results in outcome regression. However, recent papers have suggested that the missing indicator approach can provide unbiased results in propensity score analysis under certain assumptions. We consider assumptions under which the missing indicator approach can provide valid inferences, namely, (1) no unmeasured confounding within missingness patterns; either (2a) covariate values of patients with missing data were conditionally independent of treatment or (2b) these values were conditionally independent of outcome; and (3) the outcome model is correctly specified: specifically, the true outcome model does not include interactions between missing indicators and fully observed covariates. We prove that, under the assumptions above, the missing indicator approach with outcome regression can provide unbiased estimates of the average treatment effect. We use a simulation study to investigate the extent of bias in estimates of the treatment effect when the assumptions are violated and we illustrate our findings using data from electronic health records. In conclusion, the missing indicator approach can provide valid inferences for outcome regression, but the plausibility of its assumptions must first be considered carefully.
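As an illustration of the missing indicator approach with outcome regression, the following is a minimal NumPy sketch, assuming a single partially observed covariate, missingness completely at random, and a linear outcome model with no indicator-by-covariate interactions; all variable names and parameter values are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# One fully observed covariate, one partially observed covariate,
# a randomized binary treatment, and a continuous outcome.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
treat = rng.binomial(1, 0.5, size=n)
y = 1.0 * treat + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

# Make about 30% of x2 missing (completely at random in this sketch).
miss = rng.random(n) < 0.3
x2_filled = np.where(miss, 0.0, x2)  # fill missing values with a constant
indicator = miss.astype(float)       # missing indicator covariate

# Outcome regression including the missing indicator.
X = np.column_stack([np.ones(n), treat, x1, x2_filled, indicator])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ate_hat = beta[1]  # coefficient on treatment estimates the ATE (true value 1.0)
```

Because treatment is randomized here and the true outcome model has no interactions between the indicator and the observed covariates, the coefficient on `treat` recovers the average treatment effect; violating assumptions (1)-(3) above would bias it.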

3.
Statistical inferences in phylogeography (total citations: 2; self-citations: 0; citations by others: 2)
In conventional phylogeographic studies, historical demographic processes are elucidated from the geographical distribution of individuals represented on an inferred gene tree. However, the interpretation of gene trees in this context can be difficult because the same demographic/geographical process can randomly lead to many different genealogies; likewise, the same gene tree can arise under different demographic models. This problem has led to the emergence of many statistical methods for making phylogeographic inferences. A popular phylogeographic approach based on nested clade analysis is challenged by the fact that much of the interpretation of the data is left to the subjective choices of the user, and it has been argued that the method performs poorly in simulation studies. More rigorous statistical methods based on coalescent theory have been developed. However, these methods may also be hampered by computational problems or poor model choice. In this review, we describe the development of statistical methods in phylogeographic analysis and discuss some of the challenges facing these methods.

4.
In studying rates of occurrence and progression of lesions (or tumors), it is typically not possible to obtain exact onset times for each lesion. Instead, data consist of the number of lesions that reach a detectable size between screening examinations, along with measures of the size/severity of individual lesions at each exam time. This interval-censored data structure makes it difficult to properly adjust for the onset time distribution in assessing covariate effects on rates of lesion progression. This article proposes a joint model for the multiple lesion onset and progression process, motivated by cross-sectional data from a study of uterine leiomyoma tumors. By using a joint model, one can potentially obtain more precise inferences on rates of onset, while also performing onset time-adjusted inferences on lesion severity. Following a Bayesian approach, we propose a data augmentation Markov chain Monte Carlo algorithm for posterior computation.

5.
Missing data, measurement error, and misclassification are three important problems in many research fields, such as epidemiological studies. It is well known that missing data and measurement error in covariates may lead to biased estimation. Misclassification may be considered a special type of measurement error for categorical data. Nevertheless, we treat misclassification as a problem distinct from measurement error because the statistical models for the two are different. Indeed, in the literature, methods for these three problems were generally proposed separately, given that the statistical modeling for each is very different. The problem is more challenging in a longitudinal study with nonignorable missing data. In this article, we consider estimation in generalized linear models under these three incomplete data models. We propose a general approach based on expected estimating equations (EEEs) to solve these three incomplete data problems in a unified fashion. The EEE approach can be easily implemented and its asymptotic covariance can be obtained by sandwich estimation. Intensive simulation studies are performed under various incomplete data settings. The proposed method is applied to a longitudinal study of oral bone density in relation to body bone density.

6.
Sample data from a number of sub-populations are often investigated in order to integrate the findings of different research studies on a particular area. In the case of compositional samples, like the allele frequencies collected at a single locus in different surveys, the data are independent multinomial vectors. Each multinomial distribution depends on a specific probability vector, that is, the unknown relative composition of the sub-population. A Bayesian hierarchical approach is proposed here to model the variability of the sub-composition vectors around a common mean with possibly different scales. The common mean can be seen as the relative composition of the aggregated population. The scale parameters are well known in biology as Wright's inbreeding coefficients. The method presented here extends some previous work by assuming less prior knowledge on the subject and fewer constraints on the model. A relatively simple Monte Carlo algorithm is described to perform joint inferences on general and local compositions and inbreeding coefficients. The method is applied to two case studies. The first is based on DNA samples from ten Italian regions at the loci TH01 and FES, obtained from a database currently used for forensic identification, in which inbreeding assessments can be crucial. The second application is based on a set of colour-blindness sample rates in North-East Indian populations collected by Choudhury (1994), who found some controversial results from the classical test for comparing proportions. A clearer picture is instead obtained with the present Bayesian approach.

7.
1. Observations of different organisms can often be used to infer environmental conditions at a site. These inferences may be useful for diagnosing the causes of degradation in streams and rivers. 2. When used for diagnosis, biological inferences must not only provide accurate, unbiased predictions of environmental conditions, but also pairs of inferred environmental variables must covary no more strongly than actual measurements of those same environmental variables. 3. Mathematical analysis of the relationship between the measured and inferred values of different environmental variables provides an approach for comparing the covariance between measurements with the covariance between inferences. Then, simulated and field-collected data are used to assess the performance of weighted average and maximum likelihood inference methods. 4. Weighted average inferences became less accurate as covariance in the calibration data increased, whereas maximum likelihood inferences were unaffected by covariance in the calibration data. In contrast, the accuracy of weighted average inferences was unaffected by changes in measurement error, whilst the accuracy of maximum likelihood inferences decreased as measurement error increased. Weighted average inferences artificially increased the covariance of environmental variables beyond what was expected from measurements, whereas maximum likelihood inference methods more accurately reproduced the expected covariances. 5. Multivariate maximum likelihood inference methods can potentially provide more useful diagnostic information than single variable inference models.
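The two-step weighted-average (WA) inference discussed above can be sketched as follows, assuming Gaussian (unimodal) taxon responses along a single environmental gradient; the gradient range, tolerance, and sample sizes are invented for illustration and are not the study's data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sites, n_taxa = 60, 12

# Calibration data: a measured environmental gradient and taxon optima,
# with Gaussian (unimodal) abundance responses along the gradient.
env = rng.uniform(0.0, 10.0, size=n_sites)
optima = rng.uniform(0.0, 10.0, size=n_taxa)
tolerance = 1.5
abund = np.exp(-0.5 * ((env[:, None] - optima[None, :]) / tolerance) ** 2)

# Step 1: estimate each taxon's optimum as the abundance-weighted mean
# of the environmental variable over the calibration sites.
opt_hat = (abund * env[:, None]).sum(axis=0) / abund.sum(axis=0)

# Step 2: infer the environment at each site as the abundance-weighted
# mean of the estimated taxon optima.
env_hat = (abund * opt_hat[None, :]).sum(axis=1) / abund.sum(axis=1)

rmse = float(np.sqrt(np.mean((env_hat - env) ** 2)))
corr = float(np.corrcoef(env, env_hat)[0, 1])
```

Note that the WA estimates shrink toward the middle of the gradient, especially near its ends, unless a deshrinking step is added; a maximum likelihood alternative would instead maximize the response model's likelihood over candidate environmental values.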

8.
Over the last decade, many analytical methods and tools have been developed for microarray data. The detection of differentially expressed genes (DEGs) among different treatment groups is often a primary purpose of microarray data analysis. In addition, association studies investigating the relationship between genes and a phenotype of interest, such as survival time, are also popular in microarray data analysis. Phenotype association analysis provides a list of phenotype-associated genes (PAGs). However, it is sometimes necessary to identify genes that are both DEGs and PAGs. We consider the joint identification of DEGs and PAGs in microarray data analyses. The first approach is a naïve one that detects DEGs and PAGs separately and then takes the intersection of the two lists. The second is a hierarchical approach that detects DEGs first and then chooses PAGs from among the DEGs, or vice versa. In this study, we propose a new model-based approach for the joint identification of DEGs and PAGs. Unlike the previous two-step approaches, the proposed method identifies genes that are simultaneously DEGs and PAGs. It uses standard regression models but adopts a different null hypothesis from ordinary regression, which allows joint identification in one step. The proposed methods were evaluated on experimental data and in simulation studies: they were used to analyze a microarray experiment whose main interest lies in detecting genes that are both DEGs and PAGs, where DEGs are identified between two diet groups and PAGs are associated with four phenotypes reflecting the expression of leptin, adiponectin, insulin-like growth factor 1, and insulin. The model-based approach identified a larger number of genes that are both DEGs and PAGs than the naïve and hierarchical approaches, and simulation studies showed that it is more powerful. Since the approach is model-based, it is very flexible and can easily handle different types of covariates.

9.
A Bayesian approach to analysing data from family-based association studies is developed. This permits direct assessment of the range of possible values of model parameters, such as the recombination frequency and allelic associations, in the light of the data. In addition, sophisticated comparisons of different models may be handled easily, even when such models are not nested. The methodology is developed in such a way as to allow separate inferences to be made about linkage and association by including theta, the recombination fraction between the marker and disease susceptibility locus under study, explicitly in the model. The method is illustrated by application to a previously published data set. The data analysis raises some interesting issues, notably with regard to the weight of evidence necessary to convince us of linkage between a candidate locus and disease.

10.
J S Lopes, M Arenas, D Posada, M A Beaumont. Heredity 2014, 112(3): 255-264
The estimation of parameters in molecular evolution may be biased when some processes are not considered. For example, the estimation of selection at the molecular level using codon-substitution models can have an upward bias when recombination is ignored. Here we address the joint estimation of recombination, molecular adaptation and substitution rates from coding sequences using approximate Bayesian computation (ABC). We describe the implementation of a regression-based strategy for choosing subsets of summary statistics for coding data, and show that this approach can accurately infer recombination allowing for intracodon recombination breakpoints, molecular adaptation and codon substitution rates. We demonstrate that our ABC approach can outperform other analytical methods under a variety of evolutionary scenarios. We also show that although the choice of the codon-substitution model is important, our inferences are robust to a moderate degree of model misspecification. In addition, we demonstrate that our approach can accurately choose the evolutionary model that best fits the data, providing an alternative for when the use of full-likelihood methods is impracticable. Finally, we applied our ABC method to co-estimate recombination, substitution and molecular adaptation rates from 24 published human immunodeficiency virus 1 coding data sets.
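The logic of rejection ABC can be shown on a toy problem, estimating a Gaussian mean rather than the recombination and selection parameters treated in the paper; the prior range, tolerance, and the single summary statistic below are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(2)

# "Observed" data generated under a true parameter we try to recover.
theta_true = 3.0
obs = rng.normal(theta_true, 1.0, size=100)
s_obs = obs.mean()  # a single summary statistic (the sample mean)

# Rejection ABC: draw parameters from the prior, simulate data under
# each draw, and keep draws whose summary lies within eps of s_obs.
n_draws, eps = 50_000, 0.1
theta = rng.uniform(0.0, 10.0, size=n_draws)  # uniform prior
sims = rng.normal(theta[:, None], 1.0, size=(n_draws, 100))
keep = np.abs(sims.mean(axis=1) - s_obs) < eps
accepted = theta[keep]

theta_hat = float(accepted.mean())  # approximate posterior mean
```

In realistic settings the summary statistics are multivariate, and the regression-based selection of summaries described in the abstract is what keeps the tolerance region informative.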

11.
Longitudinal studies frequently incur outcome-related nonresponse. In this article, we discuss a likelihood-based method for analyzing repeated binary responses when the mechanism leading to missing response data depends on unobserved responses. We describe a pattern-mixture model for the joint distribution of the vector of binary responses and the indicators of nonresponse patterns. Specifically, we propose an extension of the multivariate logistic model to handle nonignorable nonresponse. This method yields estimates of the mean parameters under a variety of assumptions regarding the distribution of the unobserved responses. Because these models make unverifiable identifying assumptions, we recommend conducting sensitivity analyses that provide a range of inferences, each of which is valid under different assumptions for nonresponse. The methodology is illustrated using data from a longitudinal study of obesity in children.

12.
Dunson DB, Chen Z, Harry J. Biometrics 2003, 59(3): 521-530
In applications that involve clustered data, such as longitudinal studies and developmental toxicity experiments, the number of subunits within a cluster is often correlated with outcomes measured on the individual subunits. Analyses that ignore this dependency can produce biased inferences. This article proposes a Bayesian framework for jointly modeling cluster size and multiple categorical and continuous outcomes measured on each subunit. We use a continuation ratio probit model for the cluster size and underlying normal regression models for each of the subunit-specific outcomes. Dependency between cluster size and the different outcomes is accommodated through a latent variable structure. The form of the model facilitates posterior computation via a simple and computationally efficient Gibbs sampler. The approach is illustrated with an application to developmental toxicity data, and other applications, such as the joint modeling of longitudinal and event-time data, are discussed.

13.
Guo Y. Biometrics 2011, 67(4): 1532-1542
Independent component analysis (ICA) has become an important tool for analyzing data from functional magnetic resonance imaging (fMRI) studies. ICA has been successfully applied to single-subject fMRI data. The extension of ICA to group inferences in neuroimaging studies, however, is challenging due to the unavailability of a prespecified group design matrix and the uncertainty in between-subjects variability in fMRI data. We present a general probabilistic ICA (PICA) model that can accommodate varying group structures of multisubject spatiotemporal processes. An advantage of the proposed model is that it can flexibly model various types of group structures in different underlying neural source signals and under different experimental conditions in fMRI studies. A maximum likelihood (ML) method is used for estimating this general group ICA model. We propose two expectation-maximization (EM) algorithms to obtain the ML estimates. The first method is an exact EM algorithm, which provides an exact E-step and an explicit noniterative M-step. The second method is a variational approximation EM algorithm, which is computationally more efficient than the exact EM. In simulation studies, we first compare the performance of the proposed general group PICA model and the existing probabilistic group ICA approach. We then compare the two proposed EM algorithms and show the variational approximation EM achieves comparable accuracy to the exact EM with significantly less computation time. An fMRI data example is used to illustrate application of the proposed methods.

14.
In longitudinal studies and in clustered settings, binary and continuous response variables are often observed together and need to be modeled jointly. In a recent publication, Dunson, Chen, and Harry (2003, Biometrics 59, 521-530) (DCH) propose a Bayesian approach for joint modeling of cluster size and binary and continuous subunit-specific outcomes and illustrate this approach with a developmental toxicity data example. In this note we demonstrate how standard software (PROC NLMIXED in SAS) can be used to obtain maximum likelihood estimates in an alternative parameterization of the model with a single cluster-level factor considered by DCH for that example. We also suggest that a more general model with additional cluster-level random effects provides a better fit to the data set. An apparent discrepancy between the estimates obtained by DCH and the estimates obtained earlier by Catalano and Ryan (1992, Journal of the American Statistical Association 87, 651-658) is also resolved. The issue of bias in inferences concerning the dose effect when cluster size is ignored is discussed. The maximum likelihood approach considered herein is applicable to general situations with multiple clustered or longitudinally measured outcomes of different types and does not require prior specification or extensive programming.

15.
Sha Q, Zhang Z, Zhang S. PLoS ONE 2011, 6(7): e21957
In family-based data, association information can be partitioned into between-family information and within-family information. Based on this observation, Steen et al. (Nature Genetics. 2005, 683-691) proposed an interesting two-stage test for genome-wide association (GWA) studies under family-based designs that performs genomic screening and replication using the same data set. In the first stage, a screening test based on the between-family information is used to select markers. In the second stage, an association test based on the within-family information is used to test association at the selected markers. However, we learn from the results of case-control studies (Skol et al. Nature Genetics. 2006, 209-213) that this two-stage approach may not be optimal. In this article, we propose a novel two-stage joint analysis for GWA studies under family-based designs. For this joint analysis, we first propose a new screening test that is based on the between-family information and is robust to population stratification. This new screening test is used in the first stage to select markers. Then, a joint test that combines the between-family information and within-family information is used in the second stage to test association at the selected markers. Through extensive simulation studies, we demonstrate that the joint analysis always results in increased power to detect genetic association and is robust to population stratification.

16.
Adaptive diversification is driven by selection in ecologically different environments. In the absence of geographical barriers to dispersal, this adaptive divergence (AD) may be constrained by gene flow (GF). And yet the reverse may also be true, with AD constraining GF (i.e. 'ecological speciation'). Both of these causal effects have frequently been inferred from the presence of negative correlations between AD and GF in nature, yet the bidirectional causality warrants caution in such inferences. We discuss how the ability of correlative studies to infer causation might be improved through the simultaneous measurement of multiple ecological and evolutionary variables. On the one hand, inferences about the causal role of GF can be made by examining correlations between AD and the potential for dispersal. On the other hand, inferences about the causal role of AD can be made by examining correlations between GF and environmental differences. Experimental manipulations of dispersal and environmental differences are a particularly promising approach for inferring causation. At present, the best studies find strong evidence that GF constrains AD, and some studies also find the reverse. Improvements in empirical approaches promise to eventually allow general inferences about the relative strength of different causal interactions during adaptive diversification.

17.
Huang X, Zhang N. Biometrics 2008, 64(4): 1090-1099
In clinical studies, when censoring is caused by competing risks or patient withdrawal, there is always a concern about the validity of treatment effect estimates that are obtained under the assumption of independent censoring. Because dependent censoring is nonidentifiable without additional information, the best we can do is a sensitivity analysis to assess the changes of parameter estimates under different assumptions about the association between failure and censoring. This analysis is especially useful when knowledge about such association is available through literature review or expert opinions. In a regression analysis setting, the consequences of falsely assuming independent censoring on parameter estimates are not clear. Neither the direction nor the magnitude of the potential bias can be easily predicted. We provide an approach to do sensitivity analysis for the widely used Cox proportional hazards models. The joint distribution of the failure and censoring times is assumed to be a function of their marginal distributions. This function is called a copula. Under this assumption, we propose an iteration algorithm to estimate the regression parameters and marginal survival functions. Simulation studies show that this algorithm works well. We apply the proposed sensitivity analysis approach to the data from an AIDS clinical trial in which 27% of the patients withdrew due to toxicity or at the request of the patient or investigator.
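The copula assumption can be sketched by linking exponential failure and censoring times through a Gaussian copula; the marginal rates and copula correlation below are arbitrary illustration values, not those of the AIDS trial, and the sketch only generates dependent times rather than fitting the Cox model.

```python
import math

import numpy as np

rng = np.random.default_rng(3)
n = 50_000
# Standard normal CDF via math.erf, vectorized over arrays.
norm_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))

def dependent_times(rho):
    """Exponential failure/censoring times linked by a Gaussian copula."""
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    u = norm_cdf(z)  # correlated uniforms carrying the copula's dependence
    t = -np.log(1.0 - u[:, 0])        # failure times: Exp(rate 1), mean 1
    c = -2.0 * np.log(1.0 - u[:, 1])  # censoring times: Exp(rate 0.5), mean 2
    return t, c

t0, c0 = dependent_times(0.0)  # independent censoring
t1, c1 = dependent_times(0.7)  # positively dependent censoring

corr_indep = float(np.corrcoef(t0, c0)[0, 1])
corr_dep = float(np.corrcoef(t1, c1)[0, 1])
# The copula changes the association but leaves both margins intact.
```

In a sensitivity analysis, estimation would be repeated over a grid of copula association parameters to see how the treatment-effect estimates move as the assumed dependence varies.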

18.
A statistical framework for genomic data fusion (total citations: 8; self-citations: 0; citations by others: 8)
MOTIVATION: During the past decade, the new focus on genomics has highlighted a particular challenge: to integrate the different views of the genome that are provided by various types of experimental data. RESULTS: This paper describes a computational framework for integrating and drawing inferences from a collection of genome-wide measurements. Each dataset is represented via a kernel function, which defines generalized similarity relationships between pairs of entities, such as genes or proteins. The kernel representation is both flexible and efficient, and can be applied to many different types of data. Furthermore, kernel functions derived from different types of data can be combined in a straightforward fashion. Recent advances in the theory of kernel methods provide efficient algorithms to perform such combinations in a way that minimizes a statistical loss function. These methods exploit semidefinite programming techniques to reduce the problem of finding the optimal kernel combination to a convex optimization problem. Computational experiments performed using yeast genome-wide datasets, including amino acid sequences, hydropathy profiles, gene expression data and known protein-protein interactions, demonstrate the utility of this approach. A statistical learning algorithm trained on all of these data to recognize particular classes of proteins--membrane proteins and ribosomal proteins--performs significantly better than the same algorithm trained on any single type of data. AVAILABILITY: Supplementary data at http://noble.gs.washington.edu/proj/sdp-svm
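The kernel-combination idea can be sketched as a fixed convex combination of two Gram matrices built from different data types; the paper learns the combination weights by semidefinite programming, which is omitted here, and the feature dimensions and kernel choices below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d1, d2 = 30, 5, 8

# Two "views" of the same n entities, e.g. expression profiles and
# sequence-derived features; both are random placeholders here.
view1 = rng.normal(size=(n, d1))
view2 = rng.normal(size=(n, d2))

def linear_kernel(x):
    return x @ x.T

def rbf_kernel(x, gamma=0.1):
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# Convex combination of per-dataset kernels; the paper chooses the
# weights by semidefinite programming, here they are fixed.
mu = np.array([0.4, 0.6])
k_combined = mu[0] * linear_kernel(view1) + mu[1] * rbf_kernel(view2)

# Any convex combination of positive semidefinite kernels is itself a
# valid (symmetric, positive semidefinite) kernel.
min_eig = float(np.linalg.eigvalsh(k_combined).min())
```

The combined Gram matrix can then be passed to any kernel method (e.g. a support vector machine) in place of a single-dataset kernel.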

19.
Aim Quantifying and predicting change in large ecosystems is an important research objective for applied ecologists as human disturbance effects become increasingly evident at regional and global scales. However, studies used to make inferences about large-scale change are frequently of uneven quality and few in number, having been undertaken to study local, rather than global, change. Our aim is to improve the quality of inferences that can be made in meta-analyses of large-scale disturbance by integrating studies of varying quality in a unified modelling framework that is informative for both local and regional management. Innovation Here we improve conventionally structured meta-analysis methods by including imputation of unknown study variances and the use of Bayesian factor potentials. The approach is a coherent framework for integrating data of varying quality across multiple studies while facilitating belief statements about the uncertainty in parameter estimates and the probable outcome of future events. The approach is applied to a regional meta-analysis of the effects of loss of coral cover on species richness and the abundance of coral-dependent fishes in the western Indian Ocean (WIO) before and after a mass bleaching event in 1998. Main conclusions Our Bayesian approach to meta-analysis provided greater precision of parameter estimates than conventional weighted linear regression meta-analytical techniques, allowing us to integrate all available data from 66 available study locations in the WIO across multiple scales. The approach thereby: (1) estimated uncertainty in site-level estimates of change, (2) provided a regional estimate for future change at any given site in the WIO, and (3) provided a probabilistic belief framework for future management of reef resources at both local and regional scales.
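A classical inverse-variance analogue of the integration step, with crude mean-imputation of unknown study variances, gives the flavor of combining studies of uneven quality; the effect sizes and variances below are invented, and the paper's actual model is Bayesian with factor potentials rather than this fixed-effect pooling.

```python
import numpy as np

# Per-study effect estimates (e.g. change in fish species richness after
# bleaching) with some study variances unreported (NaN).
effects = np.array([-0.8, -0.5, -1.2, -0.3, -0.9, -0.6])
variances = np.array([0.04, 0.10, 0.08, np.nan, 0.05, np.nan])

# Impute unknown study variances with the mean of the reported ones,
# a crude stand-in for the model-based imputation in the paper.
v = np.where(np.isnan(variances), np.nanmean(variances), variances)

# Inverse-variance weighting: more precise studies count for more.
w = 1.0 / v
pooled = float((w * effects).sum() / w.sum())
pooled_se = float(np.sqrt(1.0 / w.sum()))
```

A Bayesian treatment would additionally propagate the uncertainty in the imputed variances into the pooled estimate rather than plugging in a point value.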

20.
Models in which two susceptibility loci jointly influence the risk of developing disease can be explored using logistic regression analysis. Comparison of the likelihoods of models incorporating different sets of disease model parameters allows inferences to be drawn regarding the nature of the joint effect of the loci. We have simulated case-control samples generated under different two-locus models and then analysed them using logistic regression. We show that this method is practicable and that, for the models we have used, it can be expected to allow useful inferences to be drawn from sample sizes consisting of hundreds of subjects. Interactions between loci can be explored, but interactive effects do not correspond exactly to classical definitions of epistasis. We have particularly examined the extent to which it is helpful to utilise information from a previously identified locus when investigating a second, unknown locus. We show that for some models conditional analysis can have substantially greater power, while for others unconditional analysis can be more powerful. Hence we conclude that in general both conditional and unconditional analyses should be performed when searching for additional loci.
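A sketch of the likelihood-ratio comparison for a two-locus interaction, using a small Newton-Raphson logistic fit; the simulated genotype frequencies and effect sizes are illustrative, and the data here are generated prospectively rather than by case-control sampling as in the paper.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000

# Genotype scores (0/1/2 minor-allele counts) at two unlinked loci.
g1 = rng.binomial(2, 0.3, size=n).astype(float)
g2 = rng.binomial(2, 0.3, size=n).astype(float)

# Disease risk with main effects at both loci plus an interaction term.
logit = -2.0 + 0.4 * g1 + 0.4 * g2 + 0.5 * g1 * g2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit))).astype(float)

def fit_logistic(X, y, iters=25):
    """Newton-Raphson logistic fit; returns coefficients and log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        beta += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return beta, float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

ones = np.ones(n)
_, ll_main = fit_logistic(np.column_stack([ones, g1, g2]), y)
beta_full, ll_full = fit_logistic(np.column_stack([ones, g1, g2, g1 * g2]), y)

# Likelihood-ratio statistic for the interaction (1 df chi-square under H0).
lrt = 2.0 * (ll_full - ll_main)
```

Comparing nested models in this way is how the abstract's "comparison of likelihoods" proceeds; a conditional analysis would instead fix the contribution of a known locus while testing the second.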


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号