Similar literature
20 similar articles found; search time 31 ms
1.
The observation that haplotypes from a particular region of the genome differ between affected and unaffected individuals, or between chromosomes transmitted to affected individuals and those not transmitted, is sound evidence for a disease-liability mutation in the region. Tests for differentiation of haplotype distributions often take the form of either Pearson's χ² statistic or tests based on the similarity among haplotypes in the different populations. In this article, we show that many measures of haplotype similarity can be expressed in the same quadratic form, and we give the general form of the variance. As we describe, these methods can be applied to either phase-known or phase-unknown data. We investigate the performance of Pearson's χ² statistic and haplotype similarity tests through use of evolutionary simulations. We show that both approaches can be powerful, but under quite different conditions. Moreover, we show that the power of both approaches can be enhanced by clustering rare haplotypes from the distributions before performing a test.
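
As a rough illustration of the last point, the sketch below computes Pearson's χ² statistic for a 2 × k table of haplotype counts before and after rare haplotypes are pooled. It is not the article's method: the article clusters rare haplotypes, whereas this sketch simply merges every haplotype whose combined count falls below a hypothetical `min_count` into one "rare" class. The function names and the example counts are likewise assumptions made for illustration.

```python
from collections import Counter


def pool_rare(cases, controls, min_count=5):
    """Relabel haplotypes whose combined count is below min_count as 'rare'."""
    total = Counter(cases) + Counter(controls)

    def relabel(h):
        return h if total[h] >= min_count else "rare"

    return [relabel(h) for h in cases], [relabel(h) for h in controls]


def pearson_chi2(cases, controls):
    """Return (statistic, degrees of freedom) for the 2 x k haplotype table."""
    a, b = Counter(cases), Counter(controls)
    n_a, n_b = len(cases), len(controls)
    n = n_a + n_b
    haplotypes = sorted(set(a) | set(b))
    stat = 0.0
    for h in haplotypes:
        col = a[h] + b[h]
        for observed, row in ((a[h], n_a), (b[h], n_b)):
            expected = row * col / n
            stat += (observed - expected) ** 2 / expected
    return stat, len(haplotypes) - 1


# Hypothetical haplotype labels for affected and unaffected chromosomes.
example_cases = ["A"] * 30 + ["B"] * 10 + ["C"] * 2 + ["D"]
example_controls = ["A"] * 20 + ["B"] * 20 + ["E"] * 2 + ["F"]

raw = pearson_chi2(example_cases, example_controls)
pooled = pearson_chi2(*pool_rare(example_cases, example_controls))
print("unpooled (stat, df):", raw)
print("pooled   (stat, df):", pooled)
```

Pooling lowers the degrees of freedom, so the same signal carried by the common haplotypes is judged against a less diffuse reference distribution.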

2.
MOTIVATION: One of the recently developed statistics for identifying differentially expressed genetic networks is Hotelling's T² statistic, a quadratic form of the difference in linear functions of the mean gene expression levels between two types of tissue samples. Because such statistics are built from linear functions of the means, their power is limited. RESULTS: To improve the power of test statistics, a general statistical framework for construction of non-linear tests is presented, and two specific non-linear test statistics that use non-linear transformations of means are developed. Asymptotic distributions of the non-linear test statistics under the null and alternative hypotheses are derived. It is proved that, under some conditions, the power of the non-linear test statistics is higher than that of the T² statistic. To evaluate their performance in practice, the non-linear test statistics are also applied to two real datasets. The preliminary results demonstrate that the P-values of the non-linear statistics for testing differential expression of the genetic networks are much smaller than those of the T² statistic. Furthermore, simulations show that the Type I error rates of the non-linear statistics agree with the threshold used and that the statistics fit the χ² distribution. SUPPLEMENTARY INFORMATION: Supplementary data are available on Bioinformatics online.

3.
Lin DY, Wei LJ, Ying Z. Biometrics 2002, 58(1): 1-12
Residuals have long been used for graphical and numerical examinations of the adequacy of regression models. Conventional residual analysis based on the plots of raw residuals or their smoothed curves is highly subjective, whereas most numerical goodness-of-fit tests provide little information about the nature of model misspecification. In this paper, we develop objective and informative model-checking techniques by taking the cumulative sums of residuals over certain coordinates (e.g., covariates or fitted values) or by considering some related aggregates of residuals, such as moving sums and moving averages. For a variety of statistical models and data structures, including generalized linear models with independent or dependent observations, the distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be easily generated by computer simulation. Each observed process can then be compared, both graphically and numerically, with a number of realizations from the Gaussian process. Such comparisons enable one to assess objectively whether a trend seen in a residual plot reflects model misspecification or natural variation. The proposed techniques are particularly useful in checking the functional form of a covariate and the link function. Illustrations with several medical studies are provided.

4.
A popular design for clinical trials assessing targeted therapies is the two-stage adaptive enrichment design with recruitment in stage 2 limited to a biomarker-defined subgroup chosen based on data from stage 1. The data-dependent selection leads to statistical challenges if data from both stages are used to draw inference on treatment effects in the selected subgroup. If subgroups considered are nested, as when defined by a continuous biomarker, treatment effect estimates in different subgroups follow the same distribution as estimates in a group-sequential trial. This result is used to obtain tests controlling the familywise type I error rate (FWER) for six simple subgroup selection rules, one of which also controls the FWER for any selection rule. Two approaches are proposed: one based on multivariate normal distributions suitable if the number of possible subgroups, k, is small, and one based on Brownian motion approximations suitable for large k. The methods, applicable in the wide range of settings with asymptotically normal test statistics, are illustrated using survival data from a breast cancer trial.

5.
In linear mixed‐effects models, random effects are used to capture the heterogeneity and variability between individuals due to unmeasured covariates or unknown biological differences. Testing for the need of random effects is a nonstandard problem because it requires testing on the boundary of the parameter space, where the asymptotic chi‐squared distribution of classical tests such as the likelihood ratio and score tests is incorrect. Several tests have been proposed in the literature to overcome this difficulty; however, all of these tests rely on the restrictive assumption of i.i.d. measurement errors. The presence of correlated errors, which often occurs in practice, makes testing random effects much more difficult. In this paper, we propose a permutation test for random effects in the presence of serially correlated errors. The proposed test not only avoids issues with the boundary of the parameter space, but also can be used for testing multiple random effects and any subset of them. Our permutation procedure includes the permutation procedure in Drikvandi, Verbeke, Khodadadi, and Partovi Nia (2013) as a special case when errors are i.i.d., though the test statistics are different. We use simulations and a real data analysis to evaluate the performance of the proposed permutation test. We have found that random slopes for linear and quadratic time effects may not be significant when measurement errors are serially correlated.

6.
7.
Lee OE, Braun TM. Biometrics 2012, 68(2): 486-493
Inference regarding the inclusion or exclusion of random effects in linear mixed models is challenging because the variance components are located on the boundary of their parameter space under the usual null hypothesis. As a result, the asymptotic null distribution of the Wald, score, and likelihood ratio tests will not be the typical χ² distribution. Although it has been proved that the correct asymptotic distribution is a mixture of χ² distributions, the appropriate mixture distribution is rather cumbersome and nonintuitive when the null and alternative hypotheses differ by more than one random effect. As alternatives, we present two permutation tests, one that is based on the best linear unbiased predictors and one that is based on the restricted likelihood ratio test statistic. Both methods involve weighted residuals, with the weights determined by the among- and within-subject variance components. The null permutation distributions of our statistics are computed by permuting the residuals both within and among subjects and are valid both asymptotically and in small samples. We examine the size and power of our tests via simulation under a variety of settings and apply our test to a published data set of chronic myelogenous leukemia patients.

8.
Joint modeling of longitudinal data and survival data has been used widely for analyzing AIDS clinical trials, where a biological marker such as CD4 count can be an important predictor of survival. In most of these studies, a normal distribution is used for modeling longitudinal responses, which makes inference vulnerable to outliers in the longitudinal measurements. Normal/independent distributions are powerful tools for robust analysis; they include univariate and multivariate versions of the Student's t, the slash, and the contaminated normal distributions in addition to the normal. In this paper, a linear mixed‐effects model with normal/independent distributions for both random effects and residuals is used together with Cox's model for survival time. For estimation, a Bayesian approach using Markov chain Monte Carlo is adopted. Some simulation studies are performed to illustrate the proposed method. The method is also illustrated on a real AIDS data set, and the best model is selected using some criteria.

9.
Yin G, Li Y, Ji Y. Biometrics 2006, 62(3): 777-787
A Bayesian adaptive design is proposed for dose-finding in phase I/II clinical trials to incorporate the bivariate outcomes, toxicity and efficacy, of a new treatment. Without specifying any parametric functional form for the drug dose-response curve, we jointly model the bivariate binary data to account for the correlation between toxicity and efficacy. After observing all the responses of each cohort of patients, the dosage for the next cohort is escalated, de-escalated, or unchanged according to the proposed odds ratio criteria constructed from the posterior toxicity and efficacy probabilities. A novel class of prior distributions is proposed through logit transformations which implicitly imposes a monotonic constraint on dose toxicity probabilities and correlates the probabilities of the bivariate outcomes. We conduct simulation studies to evaluate the operating characteristics of the proposed method. Under various scenarios, the new Bayesian design based on the toxicity-efficacy odds ratio trade-offs exhibits good properties and treats most patients at the desirable dose levels. The method is illustrated with a real trial design for a breast medical oncology study.

10.
Summary: We consider the problem of testing mixture proportions using two‐sample data, one sample from group one and the other from a mixture of groups one and two with an unknown proportion, λ, of observations from group two. Various statistical applications, including microarray studies, infectious disease epidemiology studies, case–control studies with contaminated controls, clinical trials allowing “nonresponders,” genetic studies of gene mutation, and fishery applications, can be formulated in this setup. Under the assumption that the log ratio of the probability (density) functions of the two groups is linear in the observations, we propose a generalized score test statistic to test the mixture proportion. Under some regularity conditions, it is shown that this statistic converges to a weighted chi‐squared random variable under the null hypothesis λ = 0, where the weight depends only on the sampling fraction of both groups. The permutation method is used to provide a more reliable finite-sample approximation. Simulation results and two real data applications are presented.

11.
Case-control studies compare marker-allele distributions in affected and unaffected individuals, and significant results suggest linkage but may simply reflect population structure. For markers with m alleles (m ≥ 2), a McNemar-like statistic, I, estimates the level of population association between marker and disease loci. To test for linkage after significant case-control tests, within-family tests are performed. These operate on the contingency table whose (i, j)th element equals the number of parents that transmit marker allele Mi and do not transmit marker allele Mj to an affected offspring. The dimension of the table is the number of alleles at the marker locus. Three test statistics have recently been proposed in the literature: Tc compares symmetric pairs of cells (i, j) and (j, i), Tm compares row and column totals for the same marker allele, and a likelihood ratio statistic Tl uses all the cells in the table. In addition, we consider a new statistic, Tmhet, that uses only the heterozygous parents and is approximately χ² with (m - 1) df. We use a Monte Carlo test to guarantee valid tests and to demonstrate the inferiority of Tc and the equality of Tm and Tl in terms of power. The power of the Tmhet test is close but not always equal to the power of the Tm test. We also show that, under the alternative hypothesis of linkage, Tm is approximately noncentral χ² with (m - 1) df and noncentrality parameter 2N_T(1 - 2θ)²I*, when data on single affecteds in N_T families are used. If the disease has a low population frequency, then I* is estimated using the case-control statistic I. This offers a basis for choosing the sample size or a marker system.

12.
It is common in epidemiologic analyses to summarize continuous outcomes as falling above or below a threshold. With such a dichotomized outcome, the usual χ² statistics for association or trend can be used to test for equality of proportions across strata of the study population. However, if the threshold is chosen to maximize the test statistic, the nominal χ² reference distributions are incorrect. In this paper, the asymptotic distributions of maximally selected χ² statistics for association and for trend for the k × 2 table are derived. The methodology is illustrated with data from an AIDS clinical trial. The results of simulation experiments that assess the accuracy of the asymptotic distributions in moderate sample sizes are also reported.
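
The sketch below shows, for the simplest case of two strata, what a maximally selected χ² statistic is and why it needs its own reference distribution. Note the substitution: the paper derives asymptotic distributions, while this sketch uses a plain permutation reference distribution because it is short and easy to check. All function names, defaults, and the simulated data are assumptions for illustration.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]; 0 if a margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_chi2(groups, values):
    """Largest chi-square over every threshold that dichotomizes `values`."""
    best = 0.0
    for cut in sorted(set(values))[:-1]:  # outcome: above cut vs at or below cut
        a = sum(1 for g, v in zip(groups, values) if g == 0 and v > cut)
        b = sum(1 for g, v in zip(groups, values) if g == 0 and v <= cut)
        c = sum(1 for g, v in zip(groups, values) if g == 1 and v > cut)
        d = sum(1 for g, v in zip(groups, values) if g == 1 and v <= cut)
        best = max(best, chi2_2x2(a, b, c, d))
    return best


def permutation_p_value(groups, values, n_perm=500, seed=0):
    """Compare the observed maximum with maxima obtained under shuffled group labels."""
    rng = random.Random(seed)
    observed = max_chi2(groups, values)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_chi2(shuffled, values) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Simulated example with no true group difference (hypothetical data).
rng = random.Random(1)
groups = [0] * 20 + [1] * 20
values = [rng.gauss(0.0, 1.0) for _ in groups]
print(permutation_p_value(groups, values))
```

Because the maximum is taken over many correlated tables, comparing it with a χ² distribution on 1 df would overstate significance; the permutation step recomputes the maximum for each shuffle so that the selection is accounted for.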

13.
Differences in sensory acuity and hedonic reactions to products lead to latent groups in pooled ratings data. Manufacturing locations and time differences are also sources of rating heterogeneity. Intensity and hedonic ratings are ordered categorical data. Categorical responses follow a multinomial distribution, and this distribution can be applied to data pooled over trials if the multinomial probabilities are constant from trial to trial. The common test statistic used for comparing vectors of proportions or frequencies is the Pearson chi-square statistic. When ratings data are obtained from repeated ratings experiments or from a cluster sampling procedure, the covariance matrix for the vector of category proportions can differ dramatically from the one assumed for the multinomial model because of inter-trial variation. This effect is referred to as overdispersion. The standard multinomial model does not fit overdispersed multinomial data. The practical implication is that an inflated Type I error can result in a seriously erroneous conclusion. Another implication is that overdispersion is a measurable quantity that may be of interest because it can be used to signal the presence of latent segments. The Dirichlet-Multinomial (DM) model is introduced in this paper to fit overdispersed intensity and hedonic ratings data. Methods for estimating the parameters of the DM model, and test statistics based on them for testing against a specified vector or comparing vectors of proportions, are given. A novel theoretical contribution of this paper is a method for calculating the power of the tests. This method is useful both in evaluating the tests and in determining the sample size and the number of trials. A test for goodness of fit of the multinomial model against the DM model is also given. The DM model can be extended further to the Generalized Dirichlet-Multinomial (GDM) model, in which multiple sources of variation are considered. The GDM model and its applications are discussed in this paper. Applications of the DM and GDM models in sensory and consumer research are illustrated using numerical examples.
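
To make the overdispersion effect concrete, the following sketch simulates Dirichlet-multinomial ratings and reports how much their variance exceeds the multinomial value. It assumes the standard parametrization by a vector `alpha` with sum A, under which each category count from n ratings has variance n·p·(1 − p)·(n + A)/(1 + A); the abstract does not state its parametrization, and the function names and numbers here are hypothetical.

```python
import random


def dm_variance_inflation(n, alpha):
    """Ratio of DM to multinomial variance: (n + A) / (1 + A), with A = sum(alpha)."""
    a0 = sum(alpha)
    return (n + a0) / (1.0 + a0)


def sample_dirichlet_multinomial(n, alpha, rng):
    """One trial: draw category probabilities from Dirichlet(alpha), then n ratings."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    probs = [g / total for g in gammas]
    counts = [0] * len(alpha)
    for category in rng.choices(range(len(alpha)), weights=probs, k=n):
        counts[category] += 1
    return counts


# Hypothetical setting: 3 rating categories, 10 ratings per trial, 2000 trials.
rng = random.Random(0)
alpha, n = [1.0, 1.0, 1.0], 10
first = [sample_dirichlet_multinomial(n, alpha, rng)[0] for _ in range(2000)]
mean = sum(first) / len(first)
variance = sum((x - mean) ** 2 for x in first) / (len(first) - 1)
p = alpha[0] / sum(alpha)
print("empirical variance:  ", variance)
print("multinomial variance:", n * p * (1 - p))
print("inflation factor:    ", dm_variance_inflation(n, alpha))
```

A Pearson chi-square test that assumes the smaller multinomial variance would reject too often on such data, which is the inflated Type I error the abstract warns about.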

14.
The paper deals with quadratic invariant estimators of linear functions of variance components in the mixed linear model. The estimator with locally minimal mean square error with respect to a parameter ? is derived. Under the assumption of normality of the vector Y, the theoretical values of the MSE of several types of estimators are compared in two different mixed models; under different types of distributions, a simulation study of the behaviour of the derived estimators is carried out.

15.
Constraints arise naturally in many scientific experiments and studies, such as in epidemiology, biology, and toxicology, yet researchers often ignore such information when analyzing their data and use standard methods such as the analysis of variance (ANOVA). Such methods may not only result in a loss of power and efficiency in costs of experimentation but also may result in poor interpretation of the data. In this paper we discuss constrained statistical inference in the context of linear mixed effects models that arise naturally in many applications, such as repeated measurements designs, familial studies, and others. We introduce a novel methodology that is broadly applicable for a variety of constraints on the parameters. Since in many applications sample sizes are small and/or the data are not necessarily normally distributed, and furthermore error variances need not be homoscedastic (i.e., there is heterogeneity in the data), we use an empirical best linear unbiased predictor (EBLUP) type residual-based bootstrap methodology for deriving critical values of the proposed test. Our simulation studies suggest that the proposed procedure maintains the desired nominal Type I error while competing well with other tests in terms of power. We illustrate the proposed methodology by re-analyzing clinical trial data on blood mercury level. The methodology introduced in this paper can be easily extended to other settings such as nonlinear and generalized regression models.

16.
The Kolmogorov-Smirnov test determines the consistency of empirical data with a particular probability distribution. Often, parameters in the distribution are unknown and have to be estimated from the data. In this case, the Kolmogorov-Smirnov test depends on the form of the particular probability distribution under consideration, even when the estimated parameter values are used within the distribution. In the present work, we address a less specific problem: to determine the consistency of data with a given functional form of a probability distribution (for example, the normal distribution), without enquiring into values of unknown parameters in the distribution. For a wide class of distributions, we present a direct method for determining whether empirical data are consistent with a given functional form of the probability distribution. This utilizes a transformation of the data. If the data are from the class of distributions considered here, the transformation leads to an empirical distribution with no unknown parameters, which is hence amenable to a standard Kolmogorov-Smirnov test. We give some general analytical results for some of the distributions from the class considered here. The significance level and power of the tests introduced in this work are estimated from simulations. Some biological applications of the method are given.
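
For orientation, the sketch below computes the standard one-sample Kolmogorov-Smirnov statistic against a fully specified distribution, which is the step the abstract says becomes applicable once the data have been transformed. The abstract does not give its transformation, so none is attempted here; `normal_cdf`, `ks_statistic`, and the sample values are hypothetical names and data.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution with known mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of `data` and a fully specified `cdf`."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


sample = [-1.3, -0.4, 0.1, 0.2, 0.9, 1.7]  # hypothetical measurements
print(ks_statistic(sample, normal_cdf))
```

The tabulated critical values for this statistic assume that `cdf` contains no parameters estimated from the same data, which is why removing the unknown parameters first matters.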

17.
Summary: We propose a Bayesian chi‐squared model diagnostic for analysis of data subject to censoring. The test statistic has the form of Pearson's chi‐squared test statistic and is easy to calculate from standard output of Markov chain Monte Carlo algorithms. The key innovation of this diagnostic is that it is based only on observed failure times. Because it does not rely on the imputation of failure times for observations that have been censored, we show that under heavy censoring it can have higher power for detecting model departures than a comparable test based on the complete data. In a simulation study, we show that tests based on this diagnostic exhibit comparable power and better nominal Type I error rates than a commonly used alternative test proposed by Akritas (1988, Journal of the American Statistical Association 83, 222–230). An important advantage of the proposed diagnostic is that it can be applied to a broad class of censored data models, including generalized linear models and other models with nonidentically distributed and nonadditive error structures. We illustrate the proposed model diagnostic for testing the adequacy of two parametric survival models for Space Shuttle main engine failures.

18.
Liu D, Zhou XH. Biometrics 2011, 67(3): 906-916
Covariate-specific receiver operating characteristic (ROC) curves are often used to evaluate the classification accuracy of a medical diagnostic test or a biomarker when the accuracy of the test is associated with certain covariates. In many large-scale screening tests, the gold standard is subject to missingness due to high cost or harmfulness to the patient. In this article, we propose a semiparametric estimation of the covariate-specific ROC curves with a partially missing gold standard. A location-scale model is constructed for the test result to model the covariates' effect, but the residual distributions are left unspecified. Thus the baseline and link functions of the ROC curve both have flexible shapes. Under the assumption that the gold standard is missing at random (MAR), we consider weighted estimating equations for the location-scale parameters and weighted kernel estimating equations for the residual distributions. Three ROC curve estimators are proposed and compared, namely, imputation-based, inverse probability weighted, and doubly robust estimators. We derive the asymptotic normality of the estimated ROC curve, as well as the analytical form of the standard error estimator. The proposed method is motivated by and applied to data from an Alzheimer's disease study.

19.
Genetic modification of plants may result in unintended effects with potentially adverse consequences for the environment. A comparative safety assessment is therefore required by authorities, such as the European Food Safety Authority, in which the genetically modified plant is compared with its conventional counterpart. Part of the environmental risk assessment is a comparative field experiment in which the effect on non‐target organisms is compared. Statistical analyses of such trials come in two flavors: difference testing and equivalence testing. It is important to know the statistical properties of these, for example, the power to detect environmental change of a given magnitude, before the start of an experiment. Such prospective power analysis can best be studied by means of a statistical simulation model. This paper describes a general framework for simulating data typically encountered in environmental risk assessment of genetically modified plants. The simulation model, available as Supplementary Material, can be used to generate count data having different statistical distributions, possibly with excess zeros. In addition, the model employs completely randomized or randomized block experiments, can be used to simulate single or multiple trials across environments, enables genotype-by-environment interaction by adding random variety effects, and finally includes repeated measures in time following a constant, linear, or quadratic pattern in time, possibly with some form of autocorrelation. The model also allows a set of reference varieties to be added to the GM plant and its comparator to assess the natural variation, which can then be used to set limits of concern for equivalence testing. The different count distributions are described in some detail, and some examples of how to use the simulation model to study various aspects, including a prospective power analysis, are provided.
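
As a small illustration of one ingredient mentioned above, count data with excess zeros, the sketch below draws zero-inflated Poisson counts for the plots of a GM plant and its comparator. It is not the paper's simulation model, which is distributed as Supplementary Material and covers far more (blocks, multiple environments, repeated measures). The function names, the means, and the zero-inflation probability are all hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Poisson draw by Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_zip(mean, p_zero, rng):
    """Zero-inflated Poisson: a structural zero with probability p_zero, else Poisson."""
    return 0 if rng.random() < p_zero else sample_poisson(mean, rng)


def simulate_plots(mean, p_zero, n_plots, rng):
    """Counts of a non-target organism on n_plots plots of one variety."""
    return [sample_zip(mean, p_zero, rng) for _ in range(n_plots)]


rng = random.Random(42)
comparator = simulate_plots(mean=4.0, p_zero=0.3, n_plots=12, rng=rng)
gm_plant = simulate_plots(mean=4.0 * 0.5, p_zero=0.3, n_plots=12, rng=rng)  # assumed 50% reduction
print("comparator:", comparator)
print("GM plant:  ", gm_plant)
```

Repeating such draws many times and applying the planned difference or equivalence test to each simulated data set gives a Monte Carlo estimate of power before any field work begins.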

20.
Keith P. Lewis. Oikos 2004, 104(2): 305-315
Ecologists rely heavily upon statistics to make inferences concerning ecological phenomena and to make management recommendations. It is therefore important to use the statistical tests that are most appropriate for a given data set. However, inappropriate statistical tests are often used in the analysis of studies with categorical data (i.e., count data or binary data). Since many types of statistical tests have been used in artificial nest studies, a review and comparison of these tests provides an opportunity to demonstrate the importance of choosing the most appropriate statistical approach, both for conceptual reasons and with regard to type I and type II errors.
Artificial nests have routinely been used to study the influences of habitat fragmentation and habitat edges on nest predation. I review the variety of statistical tests used to analyze artificial nest data within the framework of the generalized linear model and argue that logistic regression is the most appropriate and flexible statistical test for analyzing binary data sets. Using artificial nest data from my own studies and an independent data set from the medical literature as examples, I tested equivalent data using a variety of statistical methods. I then compared the p-values and the statistical power of these tests. Results vary greatly among statistical methods. Methods inappropriate for analyzing binary data often fail to yield significant results even when differences between study groups appear large, while logistic regression finds these differences statistically significant. Statistical power is 2–3 times higher for logistic regression than for other tests. I recommend that logistic regression be used to analyze artificial nest data and other data sets with binary data.
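
A minimal sketch of the recommended analysis is given below: logistic regression of nest fate (1 = depredated) on a single habitat indicator, fitted by Newton-Raphson with only the standard library. The data are invented for illustration and are not from the article; in practice a statistics package would also be used to obtain standard errors and tests.

```python
import math


def fit_logistic(x, y, n_iter=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical nests: 20 at a habitat edge (x = 1), 20 in the interior (x = 0).
habitat = [1] * 20 + [0] * 20
depredated = [1] * 12 + [0] * 8 + [1] * 5 + [0] * 15

intercept, slope = fit_logistic(habitat, depredated)
print("intercept:", intercept)
print("log odds ratio (edge vs interior):", slope)
print("odds ratio:", math.exp(slope))
```

With a single binary predictor the fitted slope equals the log odds ratio of the 2 × 2 table, which gives a quick check on the fit; the value of the approach is that further covariates can be added to the same model.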
