Similar Documents
20 similar documents found (search time: 15 ms)
1.
Data analytic methods for matched case-control studies   (total citations: 3; self-citations: 0; citations by others: 3)
D Pregibon 《Biometrics》1984,40(3):639-651
The recent introduction of complex multivariate statistical models in matched case-control studies is a mixed blessing. Their use can lead to a better understanding of the way in which many variables contribute to the risk of disease. On the other hand, these powerful methods can obscure salient features in the data that might have been detected by other, less sophisticated methods. This shortcoming is due to a lack of support methodology for the routine use of these models. Satisfactory computation of estimated relative risks and their standard errors is not sufficient justification for the fitted model. Goodness of fit must be examined if inferences are to be trusted. This paper is concerned with the analysis of matched case-control studies with logistic models. Analogies of these models to linear regression models are emphasized. In particular, basic concepts such as analysis of variance, multiple correlation coefficient, one-degree-of-freedom tests, and residual analysis are discussed. The fairly new field of regression diagnostics is also introduced. All procedures are illustrated on a study of bladder cancer in males.

2.
CONFIDENCE LIMITS ON PHYLOGENIES: THE BOOTSTRAP REVISITED   (total citations: 4; self-citations: 0; citations by others: 4)
The bootstrap, a non-parametric statistical analysis, can be used to assess confidence limits on phylogenies. The method most widely used tests the monophyly of individual clades. This paper proposes additional applications of the bootstrap which provide useful information about phylogeny even when many clades are found not to be supported with confidence (as often occurs in practice). In such cases it is still possible to place a constraint on the phylogenetic position of taxa by examining the relative size of the smallest monophyletic groups that contain them. In addition, the taxonomic composition of these larger clades can be determined, as well as the relative likelihood of their occurrence. The distinction between hypotheses about membership in particular clades and hypotheses about entire topologies is also discussed. To investigate the latter, the bootstrap is used to estimate the sampling distribution of tree similarity indices. All methods are illustrated by reference to a large data set on the angiosperm family Asteraceae, selected from the literature.

3.
A nonparametric test to detect a pulse in monthly data is presented. This test is a maximum rank-sum test. The test statistic can be computed from frequencies or rates. The exact null distribution of the test statistic is tabulated for pulses that last 3, 4, 5, or 6 months. Estimates from a simulation study of the test's type I error rate and power are presented. The statistical modeling of the data is discussed. Several examples are given to illustrate the application of the test and the modeling procedures. Practical matters such as the treatment of tied observations, the effect of unequal lengths in the months, sample-size calculation, and post-test power analysis are discussed and illustrated with examples.
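The statistic described here (the maximum, over all windows of a given number of consecutive months, of the rank sum of the monthly values in that window) is easy to compute directly. The paper tabulates its exact null distribution; the sketch below instead approximates that null by randomly permuting the months, which is an assumed substitute for the tabulated distribution, and the monthly counts are invented.

```python
import numpy as np
from scipy.stats import rankdata

def max_rank_sum(values, window):
    """Maximum over all contiguous windows of `window` months of the
    rank sum of the monthly values (mid-ranks handle ties)."""
    ranks = rankdata(values)
    sums = [ranks[i:i + window].sum() for i in range(len(values) - window + 1)]
    return max(sums)

def permutation_pulse_test(values, window, n_perm=10000, seed=0):
    """Approximate p-value for a pulse lasting `window` months via permutation
    (the original paper tabulates the exact null distribution instead)."""
    rng = np.random.default_rng(seed)
    observed = max_rank_sum(values, window)
    null = np.array([max_rank_sum(rng.permutation(values), window)
                     for _ in range(n_perm)])
    return observed, (null >= observed).mean()

# Hypothetical monthly counts with an elevated 4-month stretch.
monthly = np.array([12, 10, 11, 13, 9, 24, 27, 25, 26, 11, 10, 12])
stat, pval = permutation_pulse_test(monthly, window=4)
print(f"max rank sum = {stat:.1f}, permutation p-value ~ {pval:.4f}")
```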

4.
Statistical inference for simultaneous clustering of gene expression data   (total citations: 1; self-citations: 0; citations by others: 1)
Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function θ = Φ(P) of the true data generating distribution P, and an estimate is obtained by applying this function to the empirical distribution P_n. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as parameters which are compositions of individual mappings for clustering patients and genes. This framework allows one to assess classical properties of clustering methods, such as consistency, and to formally study statistical inference regarding the clustering parameter. We present results of simulations designed to assess the asymptotic validity of different bootstrap methods for estimating the distribution of Φ(P_n). The method is illustrated on a publicly available data set.

5.
The classification accuracy of a continuous marker is typically evaluated with the receiver operating characteristic (ROC) curve. In this paper, we study an alternative conceptual framework, the "percentile value." In this framework, the controls only provide a reference distribution to standardize the marker. The analysis proceeds by analyzing the standardized marker in cases. The approach is shown to be equivalent to ROC analysis. Advantages are that it provides a framework familiar to a broad spectrum of biostatisticians and it opens up avenues for new statistical techniques in biomarker evaluation. We develop several new procedures based on this framework for comparing biomarkers and biomarker performance in different populations. We develop methods that adjust such comparisons for covariates. The methods are illustrated on data from 2 cancer biomarker studies.
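The claimed equivalence with ROC analysis can be checked numerically: the percentile value of each case is its percentile in the empirical control (reference) distribution, and the average of these percentile values equals the empirical AUC. The following is a minimal sketch with simulated marker values; the normal data and the use of scikit-learn's roc_auc_score are illustrative assumptions, not part of the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def percentile_values(cases, controls):
    """Percentile value of each case marker within the empirical control
    (reference) distribution, using the mid-rank convention for ties."""
    controls = np.asarray(controls)
    return np.array([(np.mean(controls < y) + np.mean(controls <= y)) / 2
                     for y in cases])

rng = np.random.default_rng(1)
controls = rng.normal(0.0, 1.0, size=200)   # reference (non-diseased) population
cases = rng.normal(1.0, 1.0, size=150)      # diseased population with shifted marker

pv = percentile_values(cases, controls)

# Empirical AUC from the pooled sample: P(case marker > control marker).
labels = np.r_[np.ones(len(cases)), np.zeros(len(controls))]
auc = roc_auc_score(labels, np.r_[cases, controls])

print(f"mean percentile value = {pv.mean():.4f}")   # matches the AUC
print(f"empirical AUC         = {auc:.4f}")
```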

6.
Z Li, J Möttönen, M J Sillanpää 《Heredity》2015,115(6):556-564
Linear regression-based quantitative trait loci/association mapping methods such as least squares commonly assume normality of residuals. In genetic studies of plants or animals, some quantitative traits may not follow a normal distribution because the data include outlying observations or are collected from multiple sources, and in such cases normal regression methods may lose statistical power to detect quantitative trait loci. In this work, we propose a robust multiple-locus regression approach for analyzing multiple quantitative traits without the normality assumption. In our method, the objective function is least absolute deviation (LAD), which corresponds to the assumption of multivariate Laplace distributed residual errors. This distribution has heavier tails than the normal distribution. In addition, we adopt a group LASSO penalty to produce shrinkage estimation of the marker effects and to describe the genetic correlation among phenotypes. Our LAD-LASSO approach is less sensitive to outliers and is more appropriate for the analysis of data with skewed phenotype distributions. Another application of our robust approach is to the missing-phenotype problem in multiple-trait analysis, where missing phenotype items can simply be filled with extreme values and treated as outliers. The efficiency of the LAD-LASSO approach is illustrated on both simulated and real data sets.
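As a rough illustration of the objective function, the sketch below fits a small multi-trait, multi-marker model by minimising the least-absolute-deviation loss plus a group-LASSO penalty, with one group per marker collecting its effects across traits. It uses a generic numerical optimiser rather than the authors' estimation algorithm, and the genotypes, effect sizes and penalty weight are all invented.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n, p, t = 100, 4, 2                              # individuals, markers, traits
X = rng.choice([0.0, 1.0, 2.0], size=(n, p))     # genotype codes
B_true = np.zeros((p, t))
B_true[1] = [1.0, 0.8]                           # a single causal marker
Y = X @ B_true + rng.laplace(scale=1.0, size=(n, t))   # heavy-tailed residuals

lam = 5.0                                        # illustrative penalty weight

def objective(beta_flat):
    B = beta_flat.reshape(p, t)
    lad = np.abs(Y - X @ B).sum()                       # least absolute deviations
    group_penalty = np.linalg.norm(B, axis=1).sum()     # one group per marker
    return lad + lam * group_penalty

fit = minimize(objective, x0=np.zeros(p * t), method="Powell")
B_hat = fit.x.reshape(p, t)
print(np.round(B_hat, 2))   # effects of non-causal markers should be pulled toward zero
```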

7.
Continuous proportional data are common in biomedical research, e.g., the pre-post therapy percent change in physiological and molecular variables such as glomerular filtration rate, gene expression level, or telomere length. As shown in Song and Tan (2000), such data require methods beyond the common generalised linear models. However, the original marginal simplex model of Song and Tan (2000) for longitudinal continuous proportional data assumes a constant dispersion parameter. This assumption of dispersion homogeneity is imposed mainly for mathematical convenience and may be violated in some situations. For example, the dispersion may vary across drug treatment cohorts or follow-up times. This paper extends their original model so that the heterogeneity of the dispersion parameter can be assessed and accounted for in order to conduct proper statistical inference for the model parameters. A simulation study demonstrates that statistical inference can be seriously affected by mistakenly assuming a varying dispersion parameter to be constant when applying the available GEE method. In addition, residual analysis is developed for checking various assumptions made in the modelling process, e.g., assumptions on the error distribution. The methods are illustrated with the same eye surgery data as in Song and Tan (2000) for ease of comparison.

8.
The usual analysis of quantal response data occurring in diverse fields such as economics, medicine, psychology and toxicology uses probit and logit models or their extensions, with generalized least squares or the principle of likelihood as the method of statistical inference. The symmetric alternative models lead to practically comparable results, and the choice of model or method is determined by considerations of familiarity and computational convenience. Recent attempts at improvement involve larger parametric families of tolerance distributions and employ the method of maximum likelihood in analysis. In this paper we consider models with tolerance distributions based upon the Tukey-lambda distributions, which are described in terms of their quantile functions. The likelihood methods for fitting the models and testing their adequacy are developed and illustrated using classical data due to Bliss (1935) and Ashford and Smith (1964).
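Because the binomial log-likelihood only needs the tolerance CDF evaluated at each dose, a Tukey-lambda tolerance distribution can be fitted to quantal data with a generic optimiser. The sketch below relies on scipy's tukeylambda distribution; the doses, counts and starting values are invented, and this is not the specific fitting procedure developed in the paper.

```python
import numpy as np
from scipy.stats import tukeylambda, binom
from scipy.optimize import minimize

# Hypothetical quantal assay: log-dose, number responding, number tested.
dose = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
r = np.array([2, 8, 15, 24, 29])
n = np.array([30, 30, 30, 30, 30])

def neg_loglik(params):
    lam, mu, log_sigma = params
    sigma = np.exp(log_sigma)                              # keep the scale positive
    p = tukeylambda.cdf(dose, lam, loc=mu, scale=sigma)    # P(tolerance <= dose)
    p = np.clip(p, 1e-10, 1 - 1e-10)
    return -binom.logpmf(r, n, p).sum()                    # binomial log-likelihood

fit = minimize(neg_loglik, x0=[0.1, 1.0, 0.0], method="Nelder-Mead")
lam_hat, mu_hat, sigma_hat = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(f"lambda = {lam_hat:.3f}, location = {mu_hat:.3f}, scale = {sigma_hat:.3f}")
```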

9.
The Korean guidelines developed by the Ministry of Environment for soil investigations do not seriously take into account the statistical characteristics of collected data or the statistical assumptions required by the methods applied. In this article, we point out the statistical omissions in the Korean guidelines and propose some supplements to them. Systematic sampling is recommended, since it raises sample representativeness and provides a more efficient allocation of resources, leading to cost savings. The type of statistical inference should be determined according to the objective of the investigation and the presence of normality. We provide a diagram for selecting an appropriate type of inference. We also introduce power transformation and propose a clustering-based stratification method for improving the accuracy of analysis and the normality of the data. Both methods are illustrated with real datasets collected from a northern region of South Korea. One of those non-normal datasets was normalized simply by applying a power transformation. The other needed to be clustered into two heterogeneous groups by our proposed method before transformation, which enabled applying normality-based methods to the data.
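The power-transformation step can be illustrated with a Box-Cox fit and a normality check before and after. The assumption that a Box-Cox-type family is used, and the simulated right-skewed "concentration" values below, are illustrative rather than taken from the article.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
conc = rng.lognormal(mean=1.0, sigma=0.8, size=120)   # skewed soil concentrations

# Box-Cox requires strictly positive data; lambda is chosen by maximum likelihood.
transformed, lam = stats.boxcox(conc)

p_raw = stats.shapiro(conc).pvalue
p_trans = stats.shapiro(transformed).pvalue
print(f"Box-Cox lambda = {lam:.3f}")
print(f"Shapiro-Wilk p-value: raw = {p_raw:.4f}, transformed = {p_trans:.4f}")
```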

10.
This paper provides a synopsis of statistical methods that can be used for the sequential analysis of possibly censored survival times in clinical trials. In particular, results on the asymptotic behaviour of the Breslow-Haug statistic and on the sequential version of the logrank statistic are presented in a standardized terminology. In addition, formulae for the explicit calculation of linear and square-root boundaries for sequential plans are given and illustrated by an example. Practical problems of applying these methods when monitoring a fixed-sample clinical trial, as well as group sequential methods and the calculation of P-values, are also discussed.

11.
This review focuses on the analysis of temporal beta diversity, which is the variation in community composition over time in a study area. Temporal beta diversity is measured by the variance of the multivariate community composition time series, and that variance can be partitioned using appropriate statistical methods. Some of these methods are classical, such as simple or canonical ordination, whereas others are recent, including the methods of temporal eigenfunction analysis developed for multiscale exploration (i.e. addressing several scales of variation) of univariate or multivariate response data, reviewed here, to our knowledge, for the first time. These methods are illustrated with ecological data from 13 years of benthic surveys in Chesapeake Bay, USA. The following methods are applied to the Chesapeake data: distance-based Moran's eigenvector maps, asymmetric eigenvector maps, scalogram, variation partitioning, multivariate correlogram, multivariate regression tree, and two-way MANOVA to study temporal and space–time variability. Local (temporal) contributions to beta diversity (LCBD indices) are computed and analysed graphically and by regression against environmental variables, and the role of species in determining the LCBD values is analysed by correlation analysis. A tutorial detailing the analyses in the R language is provided in an appendix.
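Local contributions to beta diversity (LCBD) follow from decomposing the total variance of the (transformed) community matrix: each sampling time's LCBD is its sum of squared deviations from the species means divided by the total sum of squares. The sketch below assumes a Hellinger pre-transformation and uses random counts in place of the Chesapeake Bay surveys; the paper's own tutorial is in R, whereas this is a Python stand-in.

```python
import numpy as np

def lcbd(community):
    """Local contributions to beta diversity from a (times x species)
    abundance matrix, after Hellinger transformation."""
    Y = np.asarray(community, dtype=float)
    hell = np.sqrt(Y / Y.sum(axis=1, keepdims=True))   # Hellinger transform
    centred = hell - hell.mean(axis=0)                 # centre each species column
    ss_i = (centred ** 2).sum(axis=1)                  # per-time sum of squares
    return ss_i / ss_i.sum()                           # LCBD_i values, summing to 1

rng = np.random.default_rng(4)
abund = rng.poisson(lam=5.0, size=(13, 20))            # 13 years x 20 taxa, invented
contrib = lcbd(abund)
print(np.round(contrib, 3), "sum =", contrib.sum())
```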

12.
Bayesian statistics for parasitologists   (total citations: 3; self-citations: 0; citations by others: 3)
Bayesian statistical methods are increasingly being used in the analysis of parasitological data. Here, the basis of the differences between the Bayesian method and the classical or frequentist approach to statistical inference is explained. This is illustrated with practical implications of Bayesian analyses, using prevalence estimation of strongyloidiasis and onchocerciasis as two relevant examples. The strongyloidiasis example addresses the problem of parasitological diagnosis in the absence of a gold standard, whereas the onchocerciasis case focuses on the identification of villages warranting priority mass ivermectin treatment. The advantages and challenges faced by users of the Bayesian approach are also discussed, and readers are pointed to further directions for a more in-depth exploration of the issues raised. We advocate collaboration between parasitologists and Bayesian statisticians as a fruitful and rewarding venture for advancing applied research in parasite epidemiology and the control of parasitic infections.
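One practical point, estimating prevalence from an imperfect diagnostic test, can be illustrated with a simple grid-based Bayesian update: the probability of a positive test is prevalence times sensitivity plus (1 minus prevalence) times (1 minus specificity), and a prior on prevalence is combined with the binomial likelihood. The sensitivity, specificity, counts and uniform prior below are invented, and this is not the model fitted in the strongyloidiasis or onchocerciasis examples.

```python
import numpy as np
from scipy.stats import beta, binom

# Hypothetical survey: 37 test-positives among 200 sampled individuals.
positives, n = 37, 200
sens, spec = 0.85, 0.95                     # assumed known test characteristics

grid = np.linspace(0.0, 1.0, 2001)          # grid of candidate prevalences
dgrid = grid[1] - grid[0]
prior = beta.pdf(grid, 1, 1)                # uniform Beta(1, 1) prior
p_positive = grid * sens + (1 - grid) * (1 - spec)   # apparent (test-positive) probability
likelihood = binom.pmf(positives, n, p_positive)

posterior = prior * likelihood
posterior /= posterior.sum() * dgrid        # normalise as a density on the grid

post_mean = (grid * posterior).sum() * dgrid
cdf = np.cumsum(posterior) * dgrid
lo, hi = grid[np.searchsorted(cdf, 0.025)], grid[np.searchsorted(cdf, 0.975)]
print(f"posterior mean prevalence = {post_mean:.3f}, 95% interval ~ ({lo:.3f}, {hi:.3f})")
```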

13.
Missing data are frequent in morphometric studies of both fossil and recent material. A common method of addressing the problem of missing data is to omit combinations of characters and specimens from subsequent analyses; however, omitting different subsets of characters and specimens can affect both the statistical robustness of the analyses and the resulting biological interpretations. We describe a method of examining all possible subsets of complete data and of scoring each subset by the 'condition' (ratio of first eigenvalue to second, or of second to first, depending on context) of the corresponding covariance or correlation matrix, and subsequently choosing the submatrix that either optimizes one of these criteria or matches the estimated condition of the original data matrix. We then describe an extension of this method that can be used to choose the 'best' characters and specimens for which some specified proportion of missing data can be estimated using standard imputation techniques such as the expectation-maximization algorithm or multiple imputation. The methods are illustrated with published and unpublished data sets on fossil and extant vertebrates. Although these problems and methods are discussed in the context of conventional morphometric data, they are applicable to many other kinds of data matrices. © 2006 The Linnean Society of London, Biological Journal of the Linnean Society, 2006, 88, 309–328.
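The 'condition' criterion, the ratio of the first to the second eigenvalue of the covariance (or correlation) matrix of a candidate complete submatrix, is straightforward to compute; the exhaustive search over subsets described in the paper is the hard part and is not attempted here. The sketch below scores one obvious candidate, the complete-case submatrix, on invented data with missing values.

```python
import numpy as np

def condition_ratio(X, use_correlation=False):
    """Ratio of first to second eigenvalue of the covariance (or
    correlation) matrix of a complete data matrix X (rows = specimens)."""
    M = np.corrcoef(X, rowvar=False) if use_correlation else np.cov(X, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(M))[::-1]          # eigenvalues, descending
    return eig[0] / eig[1]

rng = np.random.default_rng(5)
data = rng.normal(size=(40, 6))                         # 40 specimens x 6 measurements
data[rng.random(data.shape) < 0.15] = np.nan            # ~15% missing values

complete_rows = ~np.isnan(data).any(axis=1)             # one candidate subset:
subset = data[complete_rows]                            # fully measured specimens only
print(f"{complete_rows.sum()} complete specimens, "
      f"condition ratio = {condition_ratio(subset):.2f}")
```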

14.
In observational studies, subjects are often nested within clusters. In medical studies, patients are often treated by doctors and therefore patients are regarded as nested or clustered within doctors. A concern that arises with clustered data is that cluster-level characteristics (e.g., characteristics of the doctor) are associated with both treatment selection and patient outcomes, resulting in cluster-level confounding. Measuring and modeling cluster attributes can be difficult, and statistical methods exist to control for all unmeasured cluster characteristics. An assumption of these methods, however, is that characteristics of the cluster and the effects of those characteristics on the outcome (as well as the probability of treatment assignment when using covariate balancing methods) are constant over time. In this paper, we consider methods that relax this assumption and allow for estimation of treatment effects in the presence of unmeasured time-dependent cluster confounding. The methods are based on matching with the propensity score and incorporate unmeasured time-specific cluster effects by performing matching within clusters or using fixed- or random-cluster effects in the propensity score model. The methods are illustrated using data to compare the effectiveness of two total hip devices with respect to survival of the device, and a simulation study is performed that compares the proposed methods. One method that was found to perform well is matching within surgeon clusters partitioned by time. Considerations in implementing the proposed methods are discussed.

15.
Microbiome data are characterized by several aspects that make them challenging to analyse statistically: they are compositional, high dimensional and rich in zeros. A large array of statistical methods exists to analyse these data. Some are borrowed from other fields, such as ecology or RNA-sequencing, while others are custom-made for microbiome data. The large and continuously expanding range of available methods means that researchers have to invest considerable effort in choosing which method(s) to apply. In this paper we list 14 statistical methods or approaches that we think should generally be avoided. In several cases this is because we believe the assumptions behind the method are unlikely to be met for microbiome data; in other cases we see methods used in ways they are not intended to be used. We believe researchers would be helped by more critical evaluations of existing methods, as not all methods in use are suitable or have been sufficiently reviewed. We hope this paper contributes to a critical discussion on which methods are appropriate to use in the analysis of microbiome data.

16.
Methods are described for discovering whether a mixture of two poisons is as toxic as predicted on the hypothesis of independent joint action. These include χ² tests and a procedure for finding the maximum likelihood estimate of the coefficient of correlation in resistance to the poisons. The methods are illustrated using data from insecticidal tests.
In the insecticidal tests, flour beetles, Tribolium castaneum Herbst, were sprayed with, or exposed to films of, different insecticides in solution in Shell oil P 31. The insecticides were pyrethrins, D.D.T., and B.H.C., singly and in pairs. The statistical analysis of the results showed that pyrethrins and D.D.T. could have acted independently both as films and as direct sprays; D.D.T. and B.H.C. could have acted independently as films but not as direct sprays; and B.H.C. and pyrethrins could not have acted independently either as films or as direct sprays.
These findings are discussed, and it is concluded that independent action should be regarded as a special case of a more general type of joint action for which the term dissimilar is proposed. A general method of approach is suggested for the conception and development of hypotheses of the joint action of poisons.  相似文献
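Under independent joint action with uncorrelated tolerances, the predicted mortality of the mixture is 1 - (1 - P_A)(1 - P_B), and the observed kill can be compared with this prediction by a chi-square test. The counts below are invented, the single-group comparison treats the single-poison mortalities as known (ignoring their sampling error), and the estimation of the resistance correlation described in the paper is omitted.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical single-dose results (insects killed / treated).
kill_a, n_a = 42, 100          # poison A alone
kill_b, n_b = 30, 100          # poison B alone
kill_ab, n_ab = 68, 100        # mixture of A and B

p_a, p_b = kill_a / n_a, kill_b / n_b
p_expected = 1 - (1 - p_a) * (1 - p_b)     # independent action, zero correlation

# One-degree-of-freedom chi-square comparing observed and expected mortality,
# treating p_a and p_b as known (their sampling error is ignored in this sketch).
expected_kill = n_ab * p_expected
expected_survive = n_ab * (1 - p_expected)
x2 = ((kill_ab - expected_kill) ** 2 / expected_kill
      + (n_ab - kill_ab - expected_survive) ** 2 / expected_survive)
print(f"expected kill fraction = {p_expected:.3f}, "
      f"chi-square = {x2:.2f}, p = {chi2.sf(x2, df=1):.3f}")
```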

17.
With the rapid advances of various high-throughput technologies, generation of '-omics' data is commonplace in almost every biomedical field. Effective data management and analytical approaches are essential to fully decipher the biological knowledge contained in the tremendous amount of experimental data. Meta-analysis, a set of statistical tools for combining multiple studies of a related hypothesis, has become popular in genomic research. Here, we perform a systematic search from PubMed and manual collection to obtain 620 genomic meta-analysis papers, of which 333 microarray meta-analysis papers are summarized as the basis of this paper and the other 249 GWAS meta-analysis papers are discussed in the next companion paper. The review in the present paper focuses on various biological purposes of microarray meta-analysis, databases and software and related statistical procedures. Statistical considerations of such an analysis are further scrutinized and illustrated by a case study. Finally, several open questions are listed and discussed.
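As a minimal example of one classical combination procedure that appears in microarray meta-analysis, Fisher's method sums -2 log p over studies and refers the total to a chi-square distribution. The per-study p-values below are invented, and this stands in for, rather than reproduces, the procedures surveyed in the paper.

```python
from scipy.stats import combine_pvalues

# Hypothetical p-values for one gene from four independent microarray studies.
p_values = [0.04, 0.10, 0.03, 0.20]

# Fisher's method: statistic = -2 * sum(log p), chi-square with 2k degrees of freedom.
stat, p_combined = combine_pvalues(p_values, method="fisher")
print(f"Fisher chi-square = {stat:.2f} on {2 * len(p_values)} df, "
      f"combined p = {p_combined:.4f}")
```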

18.
Comparative studies tend to differ from optimality and functionality studies in how they treat adaptation. While the comparative approach focuses on the origin and change of traits, optimality studies assume that adaptations are maintained at an optimum by stabilizing selection. This paper presents a model of adaptive evolution on a macroevolutionary time scale that includes the maintenance of traits at adaptive optima by stabilizing selection as the dominant evolutionary force. Interspecific variation is treated as variation in the position of adaptive optima. The model illustrates how phylogenetic constraints not only lead to correlations between phylogenetically related species, but also to imperfect adaptations. From this model, a statistical comparative method is derived that can be used to estimate the effect of a selective factor on adaptive optima in a way that would be consistent with an optimality study of adaptation to this factor. The method is illustrated with an analysis of dental evolution in fossil horses. The use of comparative methods to study evolutionary trends is also discussed.
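Maintenance of a trait near an adaptive optimum by stabilising selection is commonly formalised as an Ornstein-Uhlenbeck process, dz = alpha (theta - z) dt + sigma dW. The sketch below is a simple Euler-Maruyama simulation of one lineage tracking a shifted optimum; the parameter values are invented and it illustrates the process itself, not the comparative estimation method derived in the paper.

```python
import numpy as np

def simulate_ou(z0, optimum, alpha, sigma, n_steps, dt=1.0, seed=6):
    """Euler-Maruyama simulation of dz = alpha*(theta - z)*dt + sigma*dW,
    where `optimum` gives theta at each time step (and may shift)."""
    rng = np.random.default_rng(seed)
    z = np.empty(n_steps + 1)
    z[0] = z0
    for t in range(n_steps):
        drift = alpha * (optimum[t] - z[t]) * dt           # pull toward the optimum
        diffusion = sigma * np.sqrt(dt) * rng.normal()     # random perturbation
        z[t + 1] = z[t] + drift + diffusion
    return z

steps = 200
theta = np.where(np.arange(steps) < 100, 0.0, 3.0)   # optimum jumps at step 100
trajectory = simulate_ou(z0=0.0, optimum=theta, alpha=0.1, sigma=0.3, n_steps=steps)
print(f"trait near old optimum: {trajectory[99]:.2f}, "
      f"after tracking the new optimum: {trajectory[-1]:.2f}")
```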

19.
Efron-type measures of prediction error for survival analysis   (total citations: 3; self-citations: 0; citations by others: 3)
Gerds TA  Schumacher M 《Biometrics》2007,63(4):1283-1287
Estimates of the prediction error play an important role in the development of statistical methods and models, and in their applications. We adapt the resampling tools of Efron and Tibshirani (1997, Journal of the American Statistical Association, 92, 548-560) to survival analysis with right-censored event times. We find that flexible rules, like artificial neural nets, classification and regression trees, or regression splines can be assessed, and compared to less flexible rules in the same data where they are developed. The methods are illustrated with data from a breast cancer trial.
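The Efron-Tibshirani ".632" idea combines the optimistic apparent error with the pessimistic out-of-bag bootstrap error. Extending it to right-censored survival times is the paper's contribution; the sketch below shows only the generic resampling skeleton for an uncensored binary outcome with a logistic classifier, so the survival-specific loss is deliberately left out, and the data and settings are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def bootstrap_632_error(X, y, n_boot=100, seed=7):
    """Generic .632 bootstrap estimate of misclassification error."""
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = np.mean(LogisticRegression().fit(X, y).predict(X) != y)   # training error

    oob_errors = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)               # bootstrap resample
        out = np.setdiff1d(np.arange(n), idx)          # out-of-bag observations
        if out.size == 0:
            continue
        m = LogisticRegression().fit(X[idx], y[idx])
        oob_errors.append(np.mean(m.predict(X[out]) != y[out]))
    oob = np.mean(oob_errors)
    return 0.368 * apparent + 0.632 * oob              # the .632 weighting

rng = np.random.default_rng(8)
X = rng.normal(size=(150, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=150) > 0).astype(int)
print(f".632 bootstrap error estimate = {bootstrap_632_error(X, y):.3f}")
```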

20.
The design and statistical analysis of mutagenicity experiments involving microorganisms and a single dose of mutagen are discussed. Test statistics are derived for use in determining the mutagenicity of a chemical when survival data are available and also when such data are not available. One's likelihood (power) of correctly concluding that a chemical is mutagenic is examined, and minimum total sample sizes required for 95% power are presented. It is found that one generally has greater power when survival data are available. Required precision in estimating survival is discussed in reference to type-1 and type-2 errors. The proper use of the formulae and figures presented is illustrated by examples.
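The sample-size reasoning can be illustrated with a standard normal-approximation calculation for comparing two proportions, here spontaneous versus induced mutant fractions. This is a textbook two-proportion formula with invented rates; it is not one of the test statistics derived in the paper.

```python
from math import ceil, sqrt
from scipy.stats import norm

def n_per_group(p_control, p_treated, alpha=0.05, power=0.95):
    """Approximate per-group sample size for a two-sided two-proportion
    z-test (normal approximation, equal group sizes)."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    p_bar = (p_control + p_treated) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))) ** 2
    return ceil(num / (p_control - p_treated) ** 2)

# Illustrative mutant fractions in control and treated cultures.
print(n_per_group(p_control=0.01, p_treated=0.03))   # plates needed per group for 95% power
```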

