Similar articles
20 similar articles found (search time: 46 ms)
1.
BACKGROUND: Comparing distributions of data is an important goal in many applications. For example, determining whether two samples (e.g., a control and test sample) are statistically significantly different is useful to detect a response, or to provide feedback regarding instrument stability by detecting when collected data varies significantly over time. METHODS: We apply a variant of the chi-squared statistic to comparing univariate distributions. In this variant, a control distribution is divided such that an equal number of events fall into each of the divisions, or bins. This approach is thereby a mini-max algorithm, in that it minimizes the maximum expected variance for the control distribution. The control-derived bins are then applied to test sample distributions, and a normalized chi-squared value is computed. We term this algorithm Probability Binning. RESULTS: Using a Monte-Carlo simulation, we determined the distribution of chi-squared values obtained by comparing sets of events derived from the same distribution. Based on this distribution, we derive a conversion of any given chi-squared value into a metric that is analogous to a t-score, i.e., it can be used to estimate the probability that a test distribution is different from a control distribution. We demonstrate that this metric scales with the difference between two distributions, and can be used to rank samples according to similarity to a control. Finally, we demonstrate the applicability of this metric to ranking immunophenotyping distributions to suggest that it indeed can be used to objectively determine the relative distance of distributions compared to a single control. CONCLUSION: Probability Binning, as shown here, provides a useful metric for determining the probability that two or more flow cytometric data distributions are different. This metric can also be used to rank distributions to identify which are most similar or dissimilar. 
In addition, the algorithm can be used to quantify contamination of even highly overlapping populations. Finally, as demonstrated in an accompanying paper, Probability Binning can be used to gate on events that represent significantly different subsets from a control sample. Published 2001 Wiley-Liss, Inc.
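
The binning step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the samples are simulated, and the normalized statistic is assumed to be the sum over bins of (c - s)^2 / (c + s), where c and s are the control and test fractions in each bin. The conversion to a t-score-like metric is not reproduced because the abstract does not give its formula.

```python
import numpy as np


def probability_bin_edges(control, n_bins):
    """Edges that put roughly the same number of control events in each bin."""
    edges = np.quantile(control, np.linspace(0.0, 1.0, n_bins + 1))
    # Open the outer bins so test events beyond the control range are still counted.
    edges[0], edges[-1] = -np.inf, np.inf
    return edges


def bin_fractions(sample, edges):
    """Fraction of the sample's events that fall in each bin."""
    # Use interior edges only: index 0 is the lowest bin, len(edges) - 2 the highest.
    idx = np.searchsorted(edges[1:-1], sample, side="right")
    counts = np.bincount(idx, minlength=len(edges) - 1)
    return counts / counts.sum()


def pb_chi2(control, test, n_bins=20):
    """Assumed normalized statistic: sum over bins of (c - s)^2 / (c + s)."""
    edges = probability_bin_edges(control, n_bins)
    c = bin_fractions(control, edges)
    s = bin_fractions(test, edges)
    total = c + s
    keep = total > 0  # guards against empty bins when the control has many ties
    return float(np.sum((c[keep] - s[keep]) ** 2 / total[keep]))


rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, 5000)
same = rng.normal(0.0, 1.0, 5000)     # drawn from the control distribution
shifted = rng.normal(0.5, 1.0, 5000)  # hypothetical shifted test sample

chi2_same = pb_chi2(control, same)
chi2_shifted = pb_chi2(control, shifted)
print(f"same distribution: {chi2_same:.4f}, shifted: {chi2_shifted:.4f}")
```

Because the bins are derived from the control alone, the same edges can be reused for any number of test samples, which is what allows samples to be ranked against a single control.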

2.
BACKGROUND: While several algorithms for the comparison of univariate distributions arising from flow cytometric analyses have been developed and studied for many years, algorithms for comparing multivariate distributions remain elusive. Such algorithms could be useful for comparing differences between samples based on several independent measurements, rather than differences based on any single measurement. It is conceivable that distributions could be completely distinct in multivariate space, but unresolvable in any combination of univariate histograms. Multivariate comparisons could also be useful for providing feedback about instrument stability, when only subtle changes in measurements are occurring. METHODS: We apply a variant of Probability Binning, described in the accompanying article, to multidimensional data. In this approach, hyper-rectangles of n dimensions (where n is the number of measurements being compared) comprise the bins used for the chi-squared statistic. These hyper-dimensional bins are constructed such that the control sample has the same number of events in each bin; the bins are then applied to the test samples for chi-squared calculations. RESULTS: Using a Monte-Carlo simulation, we determined the distribution of chi-squared values obtained by comparing sets of events from the same distribution; this distribution of chi-squared values was identical to that obtained for the univariate algorithm. Hence, the same formulae can be used to construct a metric, analogous to a t-score, that estimates the probability with which distributions are distinct. As for univariate comparisons, this metric scales with the difference between two distributions, and can be used to rank samples according to similarity to a control. We apply the algorithm to multivariate immunophenotyping data, and demonstrate that it can be used to discriminate distinct samples and to rank samples according to a biologically meaningful difference.
CONCLUSION: Probability binning, as shown here, provides a useful metric for determining the probability with which two or more multivariate distributions represent distinct sets of data. The metric can be used to identify the similarity or dissimilarity of samples. Finally, as demonstrated in the accompanying paper, the algorithm can be used to gate on events in one sample that are different from a control sample, even if those events cannot be distinguished on the basis of any combination of univariate or bivariate displays. Published 2001 Wiley-Liss, Inc.
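
One way to build hyper-rectangular bins with equal control counts is sketched below. The abstract does not state how the rectangles are chosen, so the splitting rule here (cut at the median of the axis with the largest variance, recursively) is an assumption, as are the function names, the simulated data, and the form of the statistic. The example uses two samples with identical marginals but opposite correlation, the situation the abstract describes as unresolvable in univariate histograms.

```python
import numpy as np


def equal_count_boxes(control, depth):
    """Split the control into 2**depth hyper-rectangles with equal event counts.

    Assumed rule: cut at the median of the axis with the largest variance.
    """
    n_dims = control.shape[1]

    def recurse(points, lower, upper, level):
        if level == 0:
            return [(lower, upper)]
        axis = int(np.argmax(points.var(axis=0)))
        cut = float(np.median(points[:, axis]))
        below = points[points[:, axis] <= cut]
        above = points[points[:, axis] > cut]
        upper_below = upper.copy()
        upper_below[axis] = cut
        lower_above = lower.copy()
        lower_above[axis] = cut
        return (recurse(below, lower, upper_below, level - 1)
                + recurse(above, lower_above, upper, level - 1))

    return recurse(control, np.full(n_dims, -np.inf), np.full(n_dims, np.inf), depth)


def box_fractions(sample, boxes):
    """Fraction of events in each box (lower bound exclusive, upper bound inclusive)."""
    counts = [np.sum(np.all((sample > lo) & (sample <= hi), axis=1)) for lo, hi in boxes]
    return np.array(counts) / len(sample)


def pb_chi2(control, test, depth=4):
    """Assumed statistic: sum over boxes of (c - s)^2 / (c + s)."""
    boxes = equal_count_boxes(control, depth)
    c = box_fractions(control, boxes)
    s = box_fractions(test, boxes)
    return float(np.sum((c - s) ** 2 / (c + s)))


rng = np.random.default_rng(1)
cov_pos = [[1.0, 0.8], [0.8, 1.0]]
cov_neg = [[1.0, -0.8], [-0.8, 1.0]]
control = rng.multivariate_normal([0.0, 0.0], cov_pos, size=4096)
same = rng.multivariate_normal([0.0, 0.0], cov_pos, size=4096)
flipped = rng.multivariate_normal([0.0, 0.0], cov_neg, size=4096)  # same marginals

chi2_same = pb_chi2(control, same)
chi2_flipped = pb_chi2(control, flipped)
print(f"same distribution: {chi2_same:.4f}, flipped correlation: {chi2_flipped:.4f}")
```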

3.
Baggerly KA 《Cytometry》2001,45(2):141-150
BACKGROUND: A key problem in immunohistochemistry is assessing when two sample histograms are significantly different. One test that is commonly used for this purpose in the univariate case is the chi-squared test. Comparing multivariate distributions is qualitatively harder, as the "curse of dimensionality" means that the number of bins can grow exponentially. For the chi-squared test to be useful, data-dependent binning methods must be employed. An example of how this can be done is provided by the "probability binning" method of Roederer et al. (1,2,3). METHODS: We derive the theoretical distribution of the probability binning statistic, giving it a more rigorous foundation. We show that the null distribution is a scaled chi-square, and show how it can be related to the standard chi-squared statistic. RESULTS: A small simulation shows how the theoretical results can be used to (a) modify the probability binning statistic to make it more sensitive and (b) suggest variant statistics which, while still exploiting the data-dependent strengths of the probability binning procedure, may be easier to work with. CONCLUSIONS: The probability binning procedure effectively uses adaptive binning to locate structure in high-dimensional data. The derivation of a theoretical basis provides a more detailed interpretation of its behavior and renders the probability binning method more flexible.
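
The claim that the null distribution is a scaled chi-square can be explored numerically. The sketch below simulates the statistic for pairs of samples drawn from the same distribution and then fits a scaled chi-square a·χ²(ν) by matching the first two moments. The moment-matching step is an illustrative device chosen here, not the derivation in the paper, and the statistic is again assumed to be the sum over equal-count control bins of (c - s)^2 / (c + s). All names and parameter values are hypothetical.

```python
import numpy as np


def pb_chi2(control, test, n_bins):
    """Assumed statistic: sum over equal-count control bins of (c - s)^2 / (c + s)."""
    inner_edges = np.quantile(control, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]
    c = np.bincount(np.searchsorted(inner_edges, control, side="right"),
                    minlength=n_bins) / len(control)
    s = np.bincount(np.searchsorted(inner_edges, test, side="right"),
                    minlength=n_bins) / len(test)
    return float(np.sum((c - s) ** 2 / (c + s)))


def null_moment_fit(n_events=1000, n_bins=10, n_reps=400, seed=0):
    """Simulate the null statistic and moment-match a scaled chi-square to it."""
    rng = np.random.default_rng(seed)
    stats = np.array([
        pb_chi2(rng.normal(size=n_events), rng.normal(size=n_events), n_bins)
        for _ in range(n_reps)
    ])
    mean, var = stats.mean(), stats.var(ddof=1)
    # For a * chi2(nu): mean = a * nu and variance = 2 * a**2 * nu.
    scale = var / (2.0 * mean)
    dof = 2.0 * mean ** 2 / var
    return mean, scale, dof


mean, scale, dof = null_moment_fit()
print(f"null mean = {mean:.5f}, fitted scale = {scale:.6f}, fitted dof = {dof:.1f}")
```

Comparing the fitted degrees of freedom with the number of bins, for several sample sizes and bin counts, gives a quick empirical check on any closed-form null distribution one intends to use.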

4.
In this paper, we derive score test statistics to discriminate between proportional hazards and proportional odds models for grouped survival data. These models are embedded within a power family transformation in order to obtain the score tests. In simple cases, some small-sample results are obtained for the score statistics using Monte Carlo simulations. Score statistics have distributions well approximated by the chi-squared distribution. Real examples illustrate the proposed tests.

5.
Chen Y  Liang KY 《Biometrika》2010,97(3):603-620
This paper considers the asymptotic distribution of the likelihood ratio statistic T for testing a subset of the parameters of interest θ, θ = (γ, η), H₀ : γ = γ₀, based on the pseudolikelihood L(θ, φ̂), where φ̂ is a consistent estimator of φ, the nuisance parameter. We show that the asymptotic distribution of T under H₀ is a weighted sum of independent chi-squared variables. Some sufficient conditions are provided for the limiting distribution to be a chi-squared variable. When the true value of the parameter of interest, θ₀, or the true value of the nuisance parameter, φ₀, lies on the boundary of the parameter space, the problem is shown to be asymptotically equivalent to the problem of testing the restricted mean of a multivariate normal distribution based on one observation from a multivariate normal distribution with misspecified covariance matrix, or from a mixture of multivariate normal distributions. A variety of examples are provided for which the limiting distributions of T may be mixtures of chi-squared variables. We conducted simulation studies to examine the performance of the likelihood ratio test statistics in variance component models and teratological experiments.

6.
There are many epidemiologic studies and clinical trials in which we may wish to establish equivalence rather than detect a difference between the distributions of responses. In this paper, we develop test procedures to detect equivalence with respect to the tail marginal distributions and the marginal proportions when the underlying data are on an ordinal scale with matched pairs. We include a numerical example concerning the unaided distance vision of the two eyes of 7477 women to illustrate the practical usefulness of the proposed procedure. Finally, we include a brief discussion of the relation between the test procedures developed here and an asymptotic interval estimator proposed elsewhere for the simple difference in dichotomous data with matched pairs.

7.
A common testing problem for life table or survival data is to test the equality of two survival distributions when the data are both grouped and censored. Several tests have been proposed in the literature which require various assumptions about the censoring distributions. It is shown that if these conditions are relaxed then the tests may no longer have the stated properties. The maximum likelihood test of equality when no assumptions are made about the censoring marginal distributions is derived. The properties of the test are found and it is compared to the existing tests. The fact that no assumptions are required about the censoring distributions makes the test a useful initial testing procedure.

8.
This article discusses specific assumptions necessary for permutation multiple tests to control the Familywise Error Rate (FWER). At issue is that, in comparing parameters of the marginal distributions of two sets of multivariate observations, validity of permutation testing is affected by all the parameters in the joint distributions of the observations. We show the surprising fact that, in the case of a linear model with i.i.d. errors such as in the analysis of Quantitative Trait Loci (QTL), this issue has no impact on control of FWER, if the test statistic is of a particular form. On the other hand, in the analysis of gene expression levels or multiple safety endpoints, unless some assumption connecting the marginal distributions of the observations to their joint distributions is made, permutation multiple tests may not control FWER.

9.
A comparison has been made between the estimates obtained from maximum likelihood estimation of gamma, inverse normal, and normal distribution models for stage-frequency data. Results have been compared for six sets of test data, and for many sets of simulated data. It is concluded that (1) some estimates may differ substantially between the models, (2) estimates from the correct model have little bias, and estimated standard errors are generally close to theoretical values, (3) there are problems in determining degrees of freedom for chi-squared goodness of fit tests, so that it is best to compare test statistics with simulated distributions, and (4) goodness of fit tests may not discriminate well between the three models.

10.
Pang Z  Kuk AY 《Biometrics》2007,63(1):218-227
Exchangeable binary data are often collected in developmental toxicity and other studies, and a whole host of parametric distributions for fitting this kind of data have been proposed in the literature. While these distributions can be matched to have the same marginal probability and intra-cluster correlation, they can be quite different in terms of shape and higher-order quantities of interest such as the litter-level risk of having at least one malformed fetus. A sensible alternative is to fit a saturated model (Bowman and George, 1995, Journal of the American Statistical Association 90, 871-879) using the expectation-maximization (EM) algorithm proposed by Stefanescu and Turnbull (2003, Biometrics 59, 18-24). The assumption of compatibility of marginal distributions is often made to link up the distributions for different cluster sizes so that estimation can be based on the combined data. Stefanescu and Turnbull proposed a modified trend test to test this assumption. Their test, however, fails to take into account the variability of an estimated null expectation and as a result leads to inaccurate p-values. This drawback is rectified in this article. When the data are sparse, the probability function estimated using a saturated model can be very jagged and some kind of smoothing is needed. We extend the penalized likelihood method (Simonoff, 1983, Annals of Statistics 11, 208-218) to the present case of unequal cluster sizes and implement the method using an EM-type algorithm. In the presence of a covariate, we propose a penalized kernel method that performs smoothing in both the covariate and response space. The proposed methods are illustrated using several data sets, and the sampling and robustness properties of the resulting estimators are evaluated by simulations.

11.
Cheng J 《Biometrics》2009,65(1):96-103
Summary. This article considers the analysis of two-arm randomized trials with noncompliance, which have a multinomial outcome. We first define the causal effect in these trials as some function of outcome distributions of compliers with and without treatment (e.g., the complier average causal effect, the measure of stochastic superiority of treatment over control for compliers), then estimate the causal effect with the likelihood method. Next, based on the likelihood-ratio (LR) statistic, we test those functions of or the equality of the outcome distributions of compliers with and without treatment. Although the corresponding LR statistic follows a chi-squared (χ²) distribution asymptotically when the true values of parameters are in the interior of the parameter space under the null, its asymptotic distribution is not χ² when the true values of parameters are on the boundary of the parameter space under the null. Therefore, we propose a bootstrap/double bootstrap version of an LR test for the causal effect in these trials. The methods are illustrated by an analysis of data from a randomized trial of an encouragement intervention to improve adherence to prescribed depression treatments among depressed elderly patients in primary care practices.

12.
Modeling functional data with spatially heterogeneous shape characteristics (total citations: 1; self-citations: 0; citations by others: 1)
We propose a novel class of models for functional data exhibiting skewness or other shape characteristics that vary with spatial or temporal location. We use copulas so that the marginal distributions and the dependence structure can be modeled independently. Dependence is modeled with a Gaussian or t-copula, so that there is an underlying latent Gaussian process. We model the marginal distributions using the skew t family. The mean, variance, and shape parameters are modeled nonparametrically as functions of location. A computationally tractable inferential framework for estimating heterogeneous asymmetric or heavy-tailed marginal distributions is introduced. This framework provides a new set of tools for increasingly complex data collected in medical and public health studies. Our methods were motivated by and are illustrated with a state-of-the-art study of neuronal tracts in multiple sclerosis patients and healthy controls. Using the tools we have developed, we were able to find those locations along the tract most affected by the disease. However, our methods are general and highly relevant to many functional data sets. In addition to the application to one-dimensional tract profiles illustrated here, higher-dimensional extensions of the methodology could have direct applications to other biological data including functional and structural magnetic resonance imaging (MRI).

13.
Quantitative immunoelectron microscopy often involves determining the distributions of gold label in different intracellular compartments and then drawing comparisons between compartments in the same sample of cells or between experimental groups of cells. In the case of within-group comparisons, recent developments in the estimation of relative labelling index and labelling density make it possible to test whether or not particular compartments are preferentially labelled. These methods are ideally suited to analysing gold label restricted to volume (organelle) or surface (membrane) compartments but may be modified to analyse label localised in mixtures of both. Here, a simple and efficient approach to drawing between-group comparisons for label associated with organelles and/or membranes is presented. The method relies on multistage random sampling of specimens (via blocks and microscopic fields) followed by simply counting gold particles associated with different compartments. The distributions of raw gold counts in different groups are then compared by contingency table analysis with statistical degrees of freedom for chi-squared values being determined by the number of compartments and the number of experimental groups of cells. Compartmental chi-squared values making substantial contributions to the total chi-squared values then identify where the main between-group differences reside. The method requires no information about compartment size (for example, organelle profile area or membrane trace length) and does not even depend critically on standardising between-group magnification. Its application is illustrated using datasets from immunolabelling studies designed to localise the KDEL receptor, phosphatidyl-inositol 4,5-bisphosphate, GLUT4 and rab4 at the electron microscopic level.
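
The contingency table analysis described above can be illustrated with a short calculation. The gold counts, group labels, and compartment names below are hypothetical and chosen only to show the arithmetic: expected counts come from the row and column totals, each cell contributes (observed - expected)² / expected, and summing the contributions per compartment shows where the between-group difference resides.

```python
import numpy as np

# Hypothetical raw gold counts: rows are experimental groups, columns are compartments.
compartments = ["Golgi", "ER", "plasma membrane", "cytosol"]
observed = np.array([
    [120.0, 45.0, 30.0, 25.0],  # group A
    [60.0, 50.0, 70.0, 40.0],   # group B
])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

cell_chi2 = (observed - expected) ** 2 / expected
compartment_chi2 = cell_chi2.sum(axis=0)  # contribution of each compartment
total_chi2 = cell_chi2.sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

for name, part in zip(compartments, compartment_chi2):
    share = 100.0 * part / total_chi2
    print(f"{name:16s} chi2 = {part:6.2f} ({share:4.1f}% of total)")
print(f"total chi2 = {total_chi2:.2f} with {dof} degrees of freedom")
```

Only raw counts enter the calculation, which is why the approach needs no estimate of compartment size.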

14.
A modified Bonferroni method for discrete data (total citations: 5; self-citations: 1; citations by others: 4)
R E Tarone 《Biometrics》1990,46(2):515-522
The Bonferroni adjustment for multiple comparisons is a simple and useful method of controlling the overall false positive error rate when several significance tests are performed in the evaluation of an experiment. In situations with categorical data, the test statistics have discrete distributions. The discreteness of the null distributions can be exploited to reduce the number of significance tests taken into account in the Bonferroni procedure. This reduction is accomplished by using only the information contained in the marginal totals.

15.
Decady YJ  Thomas DR 《Biometrics》2000,56(3):893-896
Loughin and Scherer (1998, Biometrics 54, 630-637) investigated tests of association in two-way tables when one of the categorical variables allows for multiple-category responses from individual respondents. Standard chi-squared tests are invalid in this case, and they developed a bootstrap test procedure that provides good control of test levels under the null hypothesis. This procedure and some others that have been proposed are computationally involved and are based on techniques that are relatively unfamiliar to many practitioners. In this paper, the methods introduced by Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) for analyzing complex survey data are used to develop a simple test based on a corrected chi-squared statistic.

16.
Several authors have noted the dependence of kappa measures of inter-rater agreement on the marginal distributions of contingency tables displaying the joint ratings. This paper introduces a smoothed version of kappa computed after raking the table to achieve pre-specified marginal distributions. A comparison of kappa with raked kappa for various margins can indicate the extent of the dependence on the margins, and can indicate how much of the lack of agreement is due to marginal heterogeneity.
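
The idea of computing kappa after raking can be sketched with iterative proportional fitting, which rescales rows and columns in turn until the table has the requested margins. This is an illustration under stated assumptions: the ratings table is hypothetical, uniform target margins are an arbitrary choice, the function names are invented here, and the paper's own raking and smoothing details may differ.

```python
import numpy as np


def rake(table, row_target, col_target, n_iter=200, tol=1e-10):
    """Iterative proportional fitting of a positive table to target margins."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    row_target = np.asarray(row_target, dtype=float)
    col_target = np.asarray(col_target, dtype=float)
    for _ in range(n_iter):
        p = p * (row_target / p.sum(axis=1))[:, None]  # match row margins
        p = p * (col_target / p.sum(axis=0))[None, :]  # match column margins
        if np.allclose(p.sum(axis=1), row_target, atol=tol):
            break
    return p


def kappa(table):
    """Cohen's kappa for a square table of joint ratings."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    observed = np.trace(p)
    chance = np.sum(p.sum(axis=1) * p.sum(axis=0))
    return (observed - chance) / (1.0 - chance)


# Hypothetical joint ratings of two raters on a three-category scale.
ratings = np.array([
    [30, 5, 2],
    [6, 25, 4],
    [3, 7, 18],
])
uniform = np.full(3, 1.0 / 3.0)
raked = rake(ratings, uniform, uniform)

kappa_raw = kappa(ratings)
kappa_raked = kappa(raked)
print(f"kappa = {kappa_raw:.3f}, raked kappa (uniform margins) = {kappa_raked:.3f}")
```

Repeating the last step for several target margins shows how strongly the coefficient depends on the margins.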

17.
A fundamental problem in bioinformatics is to characterize the secondary structure of a protein, which has traditionally been carried out by examining a scatterplot (Ramachandran plot) of the conformational angles. We examine two natural bivariate von Mises distributions, referred to as the Sine and Cosine models, which have five parameters and, for concentrated data, tend to a bivariate normal distribution. These are analyzed and their main properties derived. Conditions on the parameters are established which result in bimodal behavior for the joint density and the marginal distribution, and we note an interesting situation in which the joint density is bimodal but the marginal distributions are unimodal. We carry out comparisons of the two models, and it is seen that the Cosine model may be preferred. Mixture distributions of the Cosine model are fitted to two representative protein datasets using the expectation maximization algorithm, which results in an objective partition of the scatterplot into a number of components. Our results are consistent with empirical observations; new insights are discussed.

18.
Quantitative immunoelectron microscopy of gold label in intracellular compartments often involves calculating labelling densities (LDs). These are related to antigen concentrations and usually refer gold particle counts to the sizes of compartments on sections (for example, golds per µm² of organelle profile area or per µm of membrane trace length). Here, we show how LD values can be estimated more simply (without estimating areas or lengths) and also how observed and expected LD values can be used to calculate a relative labelling index (RLI) for each compartment and then test statistically for preferential (non-random) labelling. For random labelling, RLI = 1. Compartment size is estimated stereologically by superimposing random test points (which hit organelle profiles in proportion to their area) or test lines (which intersect membrane traces in proportion to their length). By this means, the observed LD of a compartment (LD(obs)) can be expressed simply as golds per test point (organelles) or per intersection (membranes). Furthermore, the LD obtained by dividing total golds (on all compartments) by total points or intersections (on all compartments) is the value to be expected (LD(exp)) when compartments label randomly. For each compartment, RLI = LD(obs)/LD(exp). Statistical analysis is undertaken by comparing observed distributions of golds with predicted random distributions (calculated from point or intersection counts). A compartment is preferentially labelled if two criteria are met: (1) its RLI > 1 (i.e. LD(obs) is greater than LD(exp)) and (2) its partial chi-squared value makes a substantial contribution to total chi-squared value. This approach provides a simple and efficient way of comparing LDs in different compartments. Its utility is illustrated using data from VPARP and LAMP-1 labelling experiments.
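
A worked example may make the quantities above concrete. The gold and test-point counts below are hypothetical, as are the compartment names. The calculation follows the definitions in the abstract: LD(obs) is golds per point for each compartment, LD(exp) is total golds over total points, RLI is their ratio, and the expected gold count under random labelling distributes the total golds in proportion to the point counts.

```python
import numpy as np

# Hypothetical counts per compartment (organelles, so size is estimated by test points).
compartments = ["mitochondria", "nucleus", "cytosol"]
golds = np.array([90.0, 30.0, 80.0])
points = np.array([50.0, 100.0, 250.0])

ld_obs = golds / points              # observed labelling density per compartment
ld_exp = golds.sum() / points.sum()  # density expected under random labelling
rli = ld_obs / ld_exp                # relative labelling index

# Gold counts predicted if label were distributed in proportion to compartment size.
expected_golds = golds.sum() * points / points.sum()
partial_chi2 = (golds - expected_golds) ** 2 / expected_golds
total_chi2 = partial_chi2.sum()

for name, r, part in zip(compartments, rli, partial_chi2):
    share = 100.0 * part / total_chi2
    print(f"{name:13s} RLI = {r:4.2f}, partial chi2 = {part:6.1f} ({share:4.1f}% of total)")
```

In this made-up data set only the first compartment has RLI > 1 together with a dominant partial chi-squared value, so it is the only one that meets both criteria for preferential labelling.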

19.
The marginal Cox model approach is perhaps the most commonly used method in the analysis of correlated failure time data (Cai, 1999; Cai and Prentice, 1995; Lin, 1994; Wei, Lin and Weissfeld, 1989). It assumes that the marginal distributions for the correlated failure times can be described by the Cox model and leaves the dependence structure completely unspecified. This paper discusses the assessment of the marginal Cox model for correlated interval-censored data and presents a goodness-of-fit test for the problem. The method is applied to a set of correlated interval-censored data arising from an AIDS clinical trial.

20.
Agreement coefficients quantify how well a set of instruments agree in measuring some response on a population of interest. Many standard agreement coefficients (e.g. kappa for nominal, weighted kappa for ordinal, and the concordance correlation coefficient (CCC) for continuous responses) may indicate increasing agreement as the marginal distributions of the two instruments become more different even as the true cost of disagreement stays the same or increases. This problem has been described for the kappa coefficients; here we describe it for the CCC. We propose a solution for all types of responses in the form of random marginal agreement coefficients (RMACs), which use a different adjustment for chance than the standard agreement coefficients. Standard agreement coefficients model chance agreement using expected agreement between two independent random variables each distributed according to the marginal distribution of one of the instruments. RMACs adjust for chance by modeling two independent readings both from the mixture distribution that averages the two marginal distributions. In other words, both independent readings represent first a random choice of instrument, then a random draw from the marginal distribution of the chosen instrument. The advantage of the resulting RMAC is that differences between the two marginal distributions will not induce greater apparent agreement. As with the standard agreement coefficients, the RMACs do not require any assumptions about the bivariate distribution of the random variables associated with the two instruments. We describe the RMAC for nominal, ordinal and continuous data, and show through the delta method how to approximate the variances of some important special cases.
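
For nominal data, the two chance adjustments described above differ only in how the chance-agreement term is computed. The sketch below takes the description literally: kappa multiplies the two instruments' marginal proportions, whereas the random-marginal version squares the average of the two marginals. The function name and tables are hypothetical, and the estimator in the paper may include details (such as finite-sample corrections) that are not reproduced here.

```python
import numpy as np


def chance_corrected(table, random_marginal):
    """Chance-corrected agreement for a square table of nominal ratings.

    random_marginal=False gives Cohen's kappa; True uses the mixture of the
    two marginal distributions for the chance term (assumed RMAC form).
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    rows, cols = p.sum(axis=1), p.sum(axis=0)
    observed = np.trace(p)
    if random_marginal:
        mixture = (rows + cols) / 2.0
        chance = np.sum(mixture ** 2)
    else:
        chance = np.sum(rows * cols)
    return (observed - chance) / (1.0 - chance)


# Hypothetical tables: the second has clearly different row and column margins.
similar_margins = np.array([[40, 10], [10, 40]])
different_margins = np.array([[40, 25], [2, 33]])

for label, table in [("similar", similar_margins), ("different", different_margins)]:
    k = chance_corrected(table, random_marginal=False)
    r = chance_corrected(table, random_marginal=True)
    print(f"{label:9s} margins: kappa = {k:.3f}, random-marginal coefficient = {r:.3f}")
```

When the two margins coincide the coefficients are equal. When they differ, the mixture-based chance term is larger by one quarter of the summed squared differences between the margins, so the random-marginal coefficient cannot exceed kappa.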
