Similar Documents
20 similar documents retrieved.
1.
In a preceding paper (Nurminen et al. 1981), we advocated the use of the sole referent series as the basis for estimating moments in the construction of test statistics for comparative studies. Three simple test statistics incorporating this principle, two metric approaches and one rank-based procedure, are introduced for small matched samples with ordinal outcome variables. Associated methods for computing an “exact” probability value are derived. The techniques are illustrated with real data from a study in the field of occupational health epidemiology.

2.
On the basis of the conditional distribution, given the marginal totals of non-cases fixed for each of the independent 2 × 2 tables under inverse sampling, this paper develops the conditional maximum likelihood estimator (CMLE) of the underlying common relative difference (RD) and its asymptotic conditional variance. It further provides an exact interval calculation procedure for the RD, whose coverage probability is always at least the desired confidence level, and an exact test procedure for whether the underlying common RD equals any specified value, whose Type I error is always at most the nominal α-level. These exact interval estimation and exact hypothesis testing procedures are especially useful when the number of index subjects in a study is small and asymptotically approximate methods may be inappropriate. The paper also notes the condition under which the CMLE of the RD uniquely exists and includes a simple example to illustrate use of these techniques.

3.
The great increase in the number of phylogenetic studies of a wide variety of organisms in recent decades has focused considerable attention on the balance of phylogenetic trees—the degree to which sister clades within a tree tend to be of equal size—for at least two reasons: (1) the degree of balance of a tree may affect the accuracy of estimates of it; (2) the degree of balance, or imbalance, of a tree may reveal something about the macroevolutionary processes that produced it. In particular, variation among lineages in rates of speciation or extinction is expected to produce trees that are less balanced than those that result from phylogenetic evolution in which each extant species of a group has the same probability of speciation or extinction. Several coefficients for measuring the balance or imbalance of phylogenetic trees have been proposed. I focused on Colless's coefficient of imbalance (I) for its mathematical tractability and ease of interpretation. Earlier work on this statistic produced exact methods only for calculating the expected value. In those studies, the variance and confidence limits, which are necessary for testing the departure of observed values of I from the expected, were estimated by Monte Carlo simulation. I developed recursion equations that allow exact calculation of the mean, variance, skewness, and complete probability distribution of I for two different probability-generating models for bifurcating tree shapes. The Equal-Rates Markov (ERM) model assumes that trees grow by the random speciation and extinction of extant species, with all species that are extant at a given time having the same probability of speciation or extinction. The Equal Probability (EP) model assumes that all possible labeled trees for a given number of terminal taxa have the same probability of occurring. Examples illustrate how these theoretically derived probabilities and parameters may be used to test whether the evolution of a monophyletic group or set of monophyletic groups has proceeded according to a Markov model with equal rates of speciation and extinction among species, that is, whether there has been significant variation among lineages in expected rates of speciation or extinction.
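To make the ERM null distribution concrete, here is a minimal Python sketch. It is not the paper's exact recursions (which give the distribution analytically); it grows trees under the Equal-Rates Markov model, computes Colless's I, and approximates tail probabilities by Monte Carlo. The tree size and the observed value I_obs are illustrative.

```python
import random

class Node:
    """Internal or leaf node of a strictly bifurcating tree."""
    __slots__ = ("left", "right")
    def __init__(self):
        self.left = self.right = None

def erm_tree(n_leaves, rng):
    """Grow a tree under the Equal-Rates Markov model: at every step each
    extant lineage is equally likely to be the one that speciates."""
    root = Node()
    leaves = [root]
    while len(leaves) < n_leaves:
        leaf = leaves.pop(rng.randrange(len(leaves)))
        leaf.left, leaf.right = Node(), Node()
        leaves.extend([leaf.left, leaf.right])
    return root

def colless(node):
    """Return (leaf count, I), where I sums |n_left - n_right| over the
    internal nodes below `node`."""
    if node.left is None:                     # leaf
        return 1, 0
    nl, il = colless(node.left)
    nr, ir = colless(node.right)
    return nl + nr, il + ir + abs(nl - nr)

# Monte Carlo null distribution of I under ERM for trees with 10 tips.
rng = random.Random(1)
sample = [colless(erm_tree(10, rng))[1] for _ in range(20000)]
I_obs = 24                                    # hypothetical observed value
p_upper = sum(s >= I_obs for s in sample) / len(sample)
print(f"E[I] under ERM ~ {sum(sample)/len(sample):.2f}, "
      f"P(I >= {I_obs}) ~ {p_upper:.4f}")
```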

4.
D. Y. Lin, L. J. Wei, D. L. DeMets. Biometrics 1991, 47(4):1399-1408.
This paper considers clinical trials comparing two treatments with dichotomous responses, where the data are examined periodically for early evidence of a treatment difference. The existing group sequential methods for such trials are based on the large-sample normal approximation to the joint distribution of the estimators of treatment difference over interim analyses. We demonstrate through extensive numerical studies that, for small and even moderate-sized trials, these approximate procedures may lead to tests with above-nominal size (mainly when unpooled estimators of variance are used) and confidence intervals with below-nominal coverage probability. We then study exact methods for group sequential testing, repeated interval estimation, and interval estimation following sequential testing. The new procedures can accommodate any treatment allocation rule. An example using real data is provided.

5.
Many environmental health and risk assessment techniques and models aim at estimating the fluctuations of selected biological endpoints through the time domain, as a means of assessing changes in the environment or the probability of a particular measurement level occurring. In either case, estimates of the sample variance and of the variance of the sample mean are crucial to making appropriate statistical inferences. The commonly employed statistical techniques for estimating both measures presume the data were generated by a covariance stationary process. In such cases, the observations are treated as independently and identically distributed and classical statistical testing methods are applied. However, if the assumption of covariance stationarity is violated, the resulting estimates of the sample variance and of the variance of the sample mean are biased. The bias compromises statistical testing procedures by increasing the probability of detecting significance in tests of mean and variance differences, which can lead to inappropriate decisions about the severity of environmental damage. Accordingly, it is argued that data sets be examined for correlation in the time domain and that appropriate adjustments be made to the required estimators before they are used in statistical hypothesis testing. Only then can credible and scientifically defensible decisions be made by environmental decision makers and regulators.
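As an illustration of the kind of adjustment argued for here, the sketch below implements the standard serial-correlation correction to the variance of the sample mean, Var(xbar) = (gamma0/n) * [1 + 2 * sum_k (1 - k/n) * rho_k]; the truncation lag and the AR(1) example are illustrative choices, not the paper's prescription.

```python
import numpy as np

def mean_variance_adjusted(x, max_lag=None):
    """Variance of the sample mean allowing for serial correlation:
    Var(xbar) = (gamma0/n) * (1 + 2 * sum_{k=1}^{K} (1 - k/n) * rho_k).
    With rho_k = 0 for all k this reduces to the iid formula gamma0/n."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if max_lag is None:
        max_lag = int(n ** 0.5)              # a common truncation heuristic
    xc = x - x.mean()
    gamma0 = np.dot(xc, xc) / n
    rho = [np.dot(xc[:-k], xc[k:]) / (n * gamma0)
           for k in range(1, max_lag + 1)]
    factor = 1 + 2 * sum((1 - k / n) * r for k, r in enumerate(rho, start=1))
    return gamma0 / n * factor

# AR(1) series: positive autocorrelation inflates Var(xbar) relative to iid.
rng = np.random.default_rng(0)
e = rng.normal(size=500)
x = np.empty(500)
x[0] = e[0]
for t in range(1, 500):
    x[t] = 0.6 * x[t - 1] + e[t]
print("iid formula:      ", x.var() / len(x))
print("autocorr-adjusted:", mean_variance_adjusted(x))
```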

6.
Many confidence intervals calculated in practice are potentially not exact, either because the requirements for the interval estimator to be exact are known to be violated, or because the (exact) distribution of the data is unknown. If a confidence interval is approximate, the crucial question is how well its true coverage probability approximates its intended coverage probability. In this paper we propose to use the bootstrap to calculate an empirical estimate for the (true) coverage probability of a confidence interval. In the first instance, the empirical coverage can be used to assess whether a given type of confidence interval is adequate for the data at hand. More generally, when planning the statistical analysis of future trials based on existing data pools, the empirical coverage can be used to study the coverage properties of confidence intervals as a function of type of data, sample size, and analysis scale, and thus inform the statistical analysis plan for the future trial. In this sense, the paper proposes an alternative to the problematic pretest of the data for normality, followed by selection of the analysis method based on the results of the pretest. We apply the methodology to a data pool of bioequivalence studies, and in the selection of covariance patterns for repeated measures data.
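A minimal sketch of the idea in Python, assuming a simple setting: a t-interval for a mean, with the pooled data standing in for the population. The pool, sample sizes, and interval method are placeholders for whatever a concrete analysis plan would consider.

```python
import numpy as np
from scipy import stats

def t_interval(sample, level=0.95):
    """Standard t-interval for the mean."""
    m = sample.mean()
    se = sample.std(ddof=1) / len(sample) ** 0.5
    q = stats.t.ppf(0.5 + level / 2, df=len(sample) - 1)
    return m - q * se, m + q * se

def empirical_coverage(pool, n, interval, n_boot=5000, seed=0):
    """Bootstrap estimate of the true coverage of an interval method:
    treat the pooled data as the population, redraw samples of size n,
    and count how often the interval catches the pool mean."""
    rng = np.random.default_rng(seed)
    truth = pool.mean()
    hits = 0
    for _ in range(n_boot):
        lo, hi = interval(rng.choice(pool, n, replace=True))
        hits += lo <= truth <= hi
    return hits / n_boot

# Skewed (lognormal) pool: the nominal 95% t-interval undercovers for small n.
pool = np.random.default_rng(42).lognormal(0.0, 1.0, size=2000)
for n in (10, 30, 100):
    print(n, empirical_coverage(pool, n, t_interval))
```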

7.
R. Brookmeyer, X. You. Biometrics 2006, 62(1):61-65.
The objective of this article is to develop a hypothesis-testing procedure to determine whether a common source outbreak has ended. We consider the case when neither the calendar date of exposure to the pathogen nor the exact incubation period distribution is known. The hypothesis-testing procedure is based on the spacings between ordered calendar dates of disease onset of the cases. A simulation study was performed to evaluate the robustness of the methods to various models for the incubation period of infectious diseases. We investigated the impact of multiple testing on the overall outbreak-wise type I error probability. We derive expressions for the outbreak-wise type I error probability and show that multiple testing has minimal effect on inflating that error probability. The results are discussed in the context of the 2001 U.S. anthrax outbreak.

8.
In diagnostic medicine, the volume under the receiver operating characteristic (ROC) surface (VUS) is a commonly used index to quantify the ability of a continuous diagnostic test to discriminate between three disease states. In practice, verification of the true disease status may be performed only for a subset of subjects under study since the verification procedure is invasive, risky, or expensive. The selection for disease examination might depend on the results of the diagnostic test and other clinical characteristics of the patients, which in turn can cause bias in estimates of the VUS. This bias is referred to as verification bias. Existing verification bias correction in three‐way ROC analysis focuses on ordinal tests. We propose verification bias‐correction methods to construct ROC surface and estimate the VUS for a continuous diagnostic test, based on inverse probability weighting. By applying U‐statistics theory, we develop asymptotic properties for the estimator. A Jackknife estimator of variance is also derived. Extensive simulation studies are performed to evaluate the performance of the new estimators in terms of bias correction and variance. The proposed methods are used to assess the ability of a biomarker to accurately identify stages of Alzheimer's disease.
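The following Python sketch illustrates the inverse-probability-weighting idea for the VUS. It is a plug-in triple-sum version, not the paper's estimator with its U-statistics variance theory, and the data and verification probabilities pi are simulated placeholders.

```python
import numpy as np

def vus_ipw(x, d, v, pi):
    """Inverse-probability-weighted VUS estimate, VUS = P(X1 < X2 < X3),
    when the true class d (1, 2, 3) is known only for verified subjects
    (v == 1), each selected for verification with probability pi.
    A sketch of the weighting idea, not the paper's exact estimator."""
    x, d, v, pi = map(np.asarray, (x, d, v, pi))
    w = np.where(v == 1, 1.0 / pi, 0.0)          # IPW weight, 0 if unverified
    idx = {k: np.where((v == 1) & (d == k))[0] for k in (1, 2, 3)}
    num = den = 0.0
    for i in idx[1]:
        for j in idx[2]:
            for k in idx[3]:
                wt = w[i] * w[j] * w[k]
                den += wt
                num += wt * (x[i] < x[j] < x[k])
    return num / den

# Check: with everyone verified (pi = 1) this reduces to the usual VUS.
rng = np.random.default_rng(1)
d = rng.integers(1, 4, size=60)
x = rng.normal(loc=d)                        # higher class -> larger values
print(vus_ipw(x, d, np.ones(60, int), np.ones(60)))  # chance level is 1/6
```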

9.
A class of generalized linear mixed models can be obtained by introducing random effects in the linear predictor of a generalized linear model, e.g. a split plot model for binary data or count data. Maximum likelihood estimation, for normally distributed random effects, involves high-dimensional numerical integration, with severe limitations on the number and structure of the additional random effects. An alternative estimation procedure, based on an extension of the iterative re-weighted least squares procedure for generalized linear models, is illustrated on a practical data set involving carcass classification of cattle. The data are analysed as overdispersed binomial proportions with fixed and random effects and associated components of variance on the logit scale. Estimates are obtained with standard software for normal data mixed models. Numerical restrictions pertain to the size of matrices to be inverted. This can be dealt with by absorption techniques familiar from, e.g., mixed models in animal breeding. The final model fitted to the classification data includes four components of variance and a multiplicative overdispersion factor. Basically the estimation procedure is a combination of iterated least squares procedures, and no full distributional assumptions are needed. A simulation study based on the classification data is presented. This includes a study of procedures for constructing confidence intervals and significance tests for fixed effects and components of variance. The simulation results increase confidence in the usefulness of the estimation procedure.

10.
A procedure is presented for constructing an exact confidence interval for the ratio of the two variance components in a possibly unbalanced mixed linear model that contains a single set of m random effects. This procedure can be used in animal and plant breeding problems to obtain an exact confidence interval for a heritability. The confidence interval can be defined in terms of the output of a least squares analysis. It can be computed by a graphical or iterative technique requiring the diagonalization of an m × m matrix or, alternatively, the inversion of a number of m × m matrices. Confidence intervals that are approximate can be obtained with much less computational burden, using either of two approaches. The various confidence interval procedures can be extended to some problems in which the mixed linear model contains more than one set of random effects. Corresponding to each interval procedure is a significance test and one or more estimators.
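For the balanced special case, the exact interval has a closed form based on the F distribution: theta = sigma_a^2/sigma_e^2 lies in ((F/F_upper - 1)/n, (F/F_lower - 1)/n). The sketch below shows this idea with illustrative half-sib data and the standard half-sib transform h^2 = 4*theta/(1 + theta) assumed for heritability; the paper's procedure also covers unbalanced models.

```python
import numpy as np
from scipy import stats

def variance_ratio_ci(y, alpha=0.05):
    """Exact CI for theta = sigma_a^2 / sigma_e^2 in a *balanced* one-way
    random-effects model y[i, j] = mu + a_i + e_ij.
    y: (a groups) x (n replicates) array."""
    a, n = y.shape
    msa = n * y.mean(axis=1).var(ddof=1)          # between-group mean square
    mse = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (a * (n - 1))
    F = msa / mse
    fl = stats.f.ppf(alpha / 2, a - 1, a * (n - 1))
    fu = stats.f.ppf(1 - alpha / 2, a - 1, a * (n - 1))
    lo, hi = (F / fu - 1) / n, (F / fl - 1) / n
    return max(lo, 0.0), max(hi, 0.0)             # truncate at zero

# Heritability from a simulated half-sib design: h2 = 4*theta/(1+theta) is a
# monotone transform of theta, so the exact interval carries over directly.
rng = np.random.default_rng(7)
sires, offspring = 30, 10
y = 10 + rng.normal(scale=0.5, size=(sires, 1)) \
       + rng.normal(size=(sires, offspring))
lo, hi = variance_ratio_ci(y)
h2 = lambda t: 4 * t / (1 + t)
print(f"theta in ({lo:.3f}, {hi:.3f}); h^2 in ({h2(lo):.3f}, {h2(hi):.3f})")
```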

11.
The mixed-model factorial analysis of variance has been used in many recent studies in evolutionary quantitative genetics. Two competing formulations of the mixed-model ANOVA are commonly used, the “Scheffé” model and the “SAS” model; these models differ in both their assumptions and in the way in which variance components due to the main effect of random factors are defined. The biological meanings of the two variance component definitions have often been unappreciated, however. A full understanding of these meanings leads to the conclusion that the mixed-model ANOVA could have been used to much greater effect by many recent authors. The variance component due to the random main effect under the two-way SAS model is the covariance in true means associated with a level of the random factor (e.g., families) across levels of the fixed factor (e.g., environments). Therefore the SAS model has a natural application for estimating the genetic correlation between a character expressed in different environments and testing whether it differs from zero. The variance component due to the random main effect under the two-way Scheffé model is the variance in marginal means (i.e., means over levels of the fixed factor) among levels of the random factor. Therefore the Scheffé model has a natural application for estimating genetic variances and heritabilities in populations using a defined mixture of environments. Procedures and assumptions necessary for these applications of the models are discussed. While exact significance tests under the SAS model require balanced data and the assumptions that family effects are normally distributed with equal variances in the different environments, the model can be useful even when these conditions are not met (e.g., for providing an unbiased estimate of the across-environment genetic covariance). Contrary to statements in a recent paper, exact significance tests regarding the variance in marginal means as well as unbiased estimates can be readily obtained from unbalanced designs with no restrictive assumptions about the distributions or variance-covariance structure of family effects.

12.
In many applications where it is necessary to test multiple hypotheses simultaneously, the data encountered are discrete. In such cases, it is important for multiplicity adjustment to take into account the discreteness of the distributions of the p‐values, to assure that the procedure is not overly conservative. In this paper, we review some known multiple testing procedures for discrete data that control the familywise error rate, the probability of making any false rejection. Taking advantage of the fact that the exact permutation or exact pairwise permutation distributions of the p‐values can often be determined when the sample size is small, we investigate procedures that incorporate the dependence structure through the exact permutation distribution and propose two new procedures that incorporate the exact pairwise permutation distributions. A step‐up procedure is also proposed that accounts for the discreteness of the data. The performance of the proposed procedures is investigated through simulation studies and two applications. The results show that by incorporating both discreteness and dependency of p‐value distributions, gains in power can be achieved.
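One classical way to exploit discreteness, shown below, is Tarone's modification of the Bonferroni procedure: only tests whose minimum attainable p-value can reach the adjusted threshold count toward the correction factor. This is background to, not one of, the paper's proposed procedures, and the binomial example counts are illustrative.

```python
from scipy.stats import binom

def tarone_bonferroni(p_values, min_attainable, alpha=0.05):
    """Tarone's modification of Bonferroni for discrete tests.
    min_attainable[i] is the smallest p-value test i can possibly produce;
    K is the smallest k with #{i : min_attainable[i] <= alpha/k} <= k."""
    m = len(p_values)
    K = next(k for k in range(1, m + 1)
             if sum(ma <= alpha / k for ma in min_attainable) <= k)
    return [i for i, p in enumerate(p_values) if p <= alpha / K]

# One-sided exact binomial tests of H0: p = 0.5; the smallest attainable
# p-value of a test with n_i trials is 0.5 ** n_i. Counts are illustrative.
ns = [4, 5, 8, 12, 20]
xs = [4, 5, 7, 11, 18]
pvals = [binom.sf(x - 1, n, 0.5) for x, n in zip(xs, ns)]  # P(X >= x)
min_att = [0.5 ** n for n in ns]
print(tarone_bonferroni(pvals, min_att))                   # -> [3, 4]
```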

13.
L. Excoffier, P. E. Smouse, J. M. Quattro. Genetics 1992, 131(2):479-491.
We present here a framework for the study of molecular variation within a single species. Information on DNA haplotype divergence is incorporated into an analysis of variance format, derived from a matrix of squared-distances among all pairs of haplotypes. This analysis of molecular variance (AMOVA) produces estimates of variance components and F-statistic analogs, designated here as phi-statistics, reflecting the correlation of haplotypic diversity at different levels of hierarchical subdivision. The method is flexible enough to accommodate several alternative input matrices, corresponding to different types of molecular data, as well as different types of evolutionary assumptions, without modifying the basic structure of the analysis. The significance of the variance components and phi-statistics is tested using a permutational approach, eliminating the normality assumption that is conventional for analysis of variance but inappropriate for molecular data. Application of AMOVA to human mitochondrial DNA haplotype data shows that population subdivisions are better resolved when some measure of molecular differences among haplotypes is introduced into the analysis. At the intraspecific level, however, the additional information provided by knowing the exact phylogenetic relations among haplotypes or by a nonlinear translation of restriction-site change into nucleotide diversity does not significantly modify the inferred population genetic structure. Monte Carlo studies show that site sampling does not fundamentally affect the significance of the molecular variance components. The AMOVA treatment is easily extended in several different directions and it constitutes a coherent and flexible framework for the statistical analysis of molecular data.
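A minimal one-level AMOVA sketch in Python, assuming a single grouping factor (the full framework is hierarchical and more general): variance components from a squared-distance matrix and a permutation test for the Phi-statistic.

```python
import numpy as np

def amova_phi_st(d2, groups, n_perm=999, seed=0):
    """One-level AMOVA from squared inter-haplotype distances: variance
    components within and among groups, a Phi_ST analogue, and a
    permutation p-value. A minimal sketch, not the full treatment."""
    d2, groups = np.asarray(d2, float), np.asarray(groups)
    N = len(groups)
    labels, counts = np.unique(groups, return_counts=True)
    G = len(labels)
    n0 = (N - (counts ** 2).sum() / N) / (G - 1)   # effective group size

    def components(g):
        ss_total = d2.sum() / (2 * N)              # sum_{i<j} d2_ij / N
        ss_within = sum(d2[np.ix_(g == lab, g == lab)].sum()
                        / (2 * (g == lab).sum()) for lab in labels)
        sigma_w = ss_within / (N - G)
        sigma_a = ((ss_total - ss_within) / (G - 1) - sigma_w) / n0
        return sigma_a, sigma_w

    sa, sw = components(groups)
    phi = sa / (sa + sw)
    rng = np.random.default_rng(seed)
    perm = []
    for _ in range(n_perm):                        # permute individuals
        a, w = components(rng.permutation(groups))
        perm.append(a / (a + w))
    p = (1 + sum(pp >= phi for pp in perm)) / (n_perm + 1)
    return phi, p

# Toy data: two populations of four binary haplotypes; for 0/1 sequence data
# the squared Euclidean distance equals the number of differing sites.
seqs = np.array([[0,0,0,0],[0,0,0,1],[0,0,1,1],[0,1,1,1],
                 [1,1,1,1],[1,1,1,0],[1,1,0,0],[1,0,0,0]])
d2 = (seqs[:, None, :] != seqs[None, :, :]).sum(-1).astype(float)
print(amova_phi_st(d2, np.repeat([0, 1], 4)))
```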

14.
A new approach to information is proposed with the intention of providing a conceptual tool adapted to biology, including a semantic value. Information involves a material support as well as a significance, adapted to the cognitive domain of the receiver and/or the transmitter. A message does not carry any information, only data. The receiver makes an identification by a procedure of recognition of the forms, which activate previously learned significance. This treatment leads to a new significance (or new knowledge). The notion of a probabilistic event is abandoned. The quantity of information is the product of the quantity of data (probability of recognition of the message) times the value of the significance, determined by its semantic level.

15.
Assuming a lognormally distributed measure of bioavailability, individual bioequivalence is defined as originally proposed by Anderson and Hauck (1990) and Wellek (1990; 1993). For the posterior probability of the associated statistical hypothesis with respect to a noninformative reference prior, a numerically efficient algorithm is constructed which serves as the building block of a procedure for computing exact rejection probabilities of the Bayesian test under arbitrary parameter constellations. By means of this tool, the Bayesian test can be shown to maintain the significance level without being over‐conservative and to yield gains in power of up to 30% as compared to the distribution‐free procedure which gained some popularity under the name TIER. Moreover, it is shown that the Bayesian construction also allows scaling of the probability‐based criterion with respect to the proportion of subjects exhibiting bioequivalent responses to repeated administrations of the reference formulation of the drug under study.

16.
The analysis of diallel crosses, including the components due to maternal effects and maternal interaction effects, is presented for Griffing's method 1 (random effects model) and Griffing's method 3 (fixed and random effects model). Wherever an exact test of significance is not possible, a testing procedure using the Satterthwaite (1946) approximation is presented.
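For reference, the Satterthwaite (1946) approximation treats a linear combination of mean squares, L = sum(c_i * MS_i), as approximately chi-square with df = L^2 / sum((c_i * MS_i)^2 / df_i). A sketch with illustrative mean squares:

```python
from scipy import stats

def satterthwaite(ms, df, coef):
    """Satterthwaite (1946) approximation: a linear combination
    L = sum(c_i * MS_i) of independent mean squares is treated as
    chi-square-like with df_L = L**2 / sum((c_i * MS_i)**2 / df_i)."""
    L = sum(c * m for c, m in zip(coef, ms))
    df_L = L ** 2 / sum((c * m) ** 2 / d for c, m, d in zip(coef, ms, df))
    return L, df_L

# Example: synthetic denominator MS1 + MS2 - MS3 for an F-test that has no
# exact denominator mean square; all numbers are illustrative.
num_ms, num_df = 40.0, 3
L, df_L = satterthwaite([12.0, 6.0, 4.0], [10, 20, 15], [1, 1, -1])
F = num_ms / L
p = stats.f.sf(F, num_df, df_L)
print(f"F = {F:.2f} on ({num_df}, {df_L:.1f}) df, p = {p:.4f}")
```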

17.
Many biological quantities cannot be measured directly but rather need to be estimated from models. Estimates from models are statistical objects with variance and, when derived simultaneously, covariance. It is well known that their variance–covariance (VC) matrix must be considered in subsequent analyses. Although it is always preferable to carry out the proposed analyses on the raw data themselves, a two‐step approach cannot always be avoided. This situation arises when the parameters of a multinomial must be regressed against a covariate. The Delta method is an appropriate and frequently recommended way of deriving variance approximations of transformed and correlated variables. Implementing the Delta method is not trivial, and there is a lack of detailed information on the procedure in the literature for complex situations such as those involved in constraining the parameters of a multinomial distribution. This paper proposes a how‐to guide for calculating the correct VC matrices of dependent estimates involved in multinomial distributions and how to use them for testing the effects of covariates in post hoc analyses when the integration of these analyses directly into a model is not possible. For illustrative purpose, we focus on variables calculated in capture–recapture models, but the same procedure can be applied to all analyses dealing with correlated estimates with multinomial distribution and their variances and covariances.
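A compact sketch of the core step, using a numerical Jacobian rather than analytic derivatives and plain multinomial proportions rather than capture–recapture parameters; the log-odds transform is an illustrative choice.

```python
import numpy as np

def multinomial_vcov(p, n):
    """VC matrix of multinomial proportions: Cov(p_i, p_j) = p_i(d_ij - p_j)/n."""
    p = np.asarray(p, float)
    return (np.diag(p) - np.outer(p, p)) / n

def delta_method(g, p, vcov, eps=1e-6):
    """Delta method with a numerical Jacobian: Var(g(p)) ~= J V J^T,
    where g maps the parameter vector to derived quantities."""
    p = np.asarray(p, float)
    g0 = np.atleast_1d(g(p))
    J = np.empty((len(g0), len(p)))
    for j in range(len(p)):
        dp = np.zeros_like(p)
        dp[j] = eps
        J[:, j] = (np.atleast_1d(g(p + dp))
                   - np.atleast_1d(g(p - dp))) / (2 * eps)
    return g0, J @ vcov @ J.T

# Derived quantities from a 3-cell multinomial: log-odds of each category.
p_hat, n = np.array([0.5, 0.3, 0.2]), 200
V = multinomial_vcov(p_hat, n)
logit = lambda p: np.log(p / (1 - p))
est, vc = delta_method(logit, p_hat, V)
print(est, np.sqrt(np.diag(vc)))       # estimates with delta-method SEs
```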

18.
J. Eriksson, D. Fenyö. Proteomics 2002, 2(3):262-270.
A rapid and accurate method for testing the significance of protein identities determined by mass spectrometric analysis of protein digests and genome database searching is presented. The method is based on direct computation using a statistical model of the random matching of measured and theoretical proteolytic peptide masses. Protein identification algorithms typically rank the proteins of a genome database according to a score based on the number of matches between the masses obtained by mass spectrometry analysis and the theoretical proteolytic peptide masses of a database protein. The random matching of experimental and theoretical masses can cause false results. A result is significant only if the score characterizing the result deviates significantly from the score expected from a false result. A distribution of the score (number of matches) for random (false) results is computed directly from our model of the random matching, which allows significance testing under any experimental and database search constraints. In order to mimic protein identification data quality in large-scale proteome projects, low-to-high quality proteolytic peptide mass data were generated in silico and subsequently submitted to a database search program designed to include significance testing based on direct computation. This simulation procedure demonstrates the usefulness of direct significance testing for automatically screening for samples that must be subjected to peptide sequence analysis by e.g. tandem mass spectrometry in order to determine the protein identity.
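The sketch below captures the spirit of the approach with a deliberately simple random-matching model (binomial matching of uniformly scattered masses), not the paper's exact statistical model; the tolerance, mass range, and counts are illustrative.

```python
from scipy.stats import binom

def match_significance(n_theoretical, n_matched, n_measured,
                       tol=0.2, mass_range=(800.0, 3500.0)):
    """Significance of a peptide-mass-fingerprint score under a crude
    random-matching model: each of the protein's n_theoretical peptide
    masses matches one of n_measured uniformly scattered measured masses
    with probability roughly p = n_measured * 2*tol / span, so the number
    of random matches is ~ Binomial(n_theoretical, p); the p-value is the
    upper tail at the observed number of matches."""
    span = mass_range[1] - mass_range[0]
    p = min(1.0, n_measured * 2 * tol / span)
    return binom.sf(n_matched - 1, n_theoretical, p)

# 30 theoretical tryptic masses, 25 measured peaks, 8 matches at +-0.2 Da:
print(match_significance(30, 8, 25))   # tiny value -> unlikely to be random
```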

19.
It is well known that Cornfield's confidence interval for the odds ratio with the continuity correction can mimic the performance of the exact method. Furthermore, because the calculation procedure for the former is much simpler than for the latter, Cornfield's confidence interval with the continuity correction is highly recommended by many publications. However, all the papers that draw this conclusion do so on the basis of examining the coverage probability exclusively; the efficiency of the resulting confidence intervals is completely ignored. This paper calculates and compares the coverage probability and the average length of Woolf's logit interval estimator, Gart's logit interval estimator of adding 0.50, Cornfield's interval estimator with the continuity correction, and Cornfield's interval estimator without the continuity correction in a variety of situations. This paper notes that Cornfield's interval estimator with the continuity correction is too conservative, while Cornfield's method without the continuity correction can improve efficiency without sacrificing the accuracy of the coverage probability. This paper further notes that when the sample size is small (say, 20 or 30 per group) and the probability of exposure in the control group is small (say, 0.10) or large (say, 0.90), using Cornfield's method without the continuity correction is likely preferable to all the other estimators considered here. When the sample size is large (say, 100 per group) or when the probability of exposure in the control group is moderate (say, 0.50), Gart's logit interval estimator is probably the best.
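Woolf's and Gart's intervals are simple enough to state in a few lines; the sketch below reproduces the flavor of the comparison by simulating small-sample coverage in the "small sample, rare exposure" scenario discussed above (Cornfield's iterative interval is omitted).

```python
import numpy as np
from scipy import stats

def woolf_ci(a, b, c, d, level=0.95):
    """Woolf's logit interval for the odds ratio of the 2x2 table [[a,b],[c,d]]."""
    z = stats.norm.ppf(0.5 + level / 2)
    log_or = np.log(a * d / (b * c))
    se = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return np.exp(log_or - z * se), np.exp(log_or + z * se)

def gart_ci(a, b, c, d, level=0.95):
    """Gart's interval: the logit interval after adding 0.5 to every cell,
    which stays defined when a cell count is zero."""
    return woolf_ci(a + 0.5, b + 0.5, c + 0.5, d + 0.5, level)

# Coverage check: n = 20 per group, exposure probability 0.10 among
# controls, true odds ratio 2 (parameters mirror the scenario above).
rng = np.random.default_rng(0)
p0, true_or, n, n_sim = 0.10, 2.0, 20, 10000
p1 = true_or * p0 / (1 - p0 + true_or * p0)   # exposure prob. among cases
hits = {"woolf": 0, "gart": 0}
ok_woolf = 0
for _ in range(n_sim):
    a, c = rng.binomial(n, p1), rng.binomial(n, p0)
    b, d = n - a, n - c
    if min(a, b, c, d) > 0:                   # Woolf undefined otherwise
        ok_woolf += 1
        lo, hi = woolf_ci(a, b, c, d)
        hits["woolf"] += lo <= true_or <= hi
    lo, hi = gart_ci(a, b, c, d)
    hits["gart"] += lo <= true_or <= hi
print("woolf:", hits["woolf"] / ok_woolf, "(non-degenerate tables only)")
print("gart: ", hits["gart"] / n_sim)
```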

20.
We present a Bayesian theory for testing model adequacy which provides exact results whether the model is linear or nonlinear in the parameters. We consider two cases: (a) replicated data exist, and (b) no replicated data exist but an external estimate of the variance is available. Furthermore, we derive a useful approximation for testing model adequacy when the model is nonlinear in the parameters.
