Similar articles
Found 20 similar articles (search time: 15 ms)
1.
It is sometimes claimed that different types of size data in biology follow a power law. Here, a formal statistical test of the power law for discrete size data is described. The test is based on embedding the power law in the nonparametric family of distributions for which frequency is nonincreasing with size. A parametric bootstrap is used to assess significance. The test is applied to four data sets concerning the frequency of genera of different sizes. The power law is rejected in three out of four cases.

2.
Tao Sun, Yu Cheng, Ying Ding. Biometrics 2023, 79(3): 1713-1725
Copulas are a popular tool for modeling the dependence among marginal distributions in multivariate censored data. As many copula models are available, it is essential to check whether the chosen copula model fits the data well. Existing approaches to testing the fit of copula models are mainly for complete or right-censored data. No formal goodness-of-fit (GOF) test exists for interval-censored or recurrent events data. To address this research gap, we develop a general GOF test for copula-based survival models using the information ratio (IR). It can be applied to any copula family with a parametric form, such as the frequently used Archimedean, Gaussian, and D-vine families. The test statistic is easy to calculate, and the test procedure is straightforward to implement. We establish the asymptotic properties of the test statistic. The simulation results show that the proposed test controls the type-I error well and achieves adequate power when the dependence strength is moderate to high. Finally, we apply our method to test various copula models in analyzing multiple real datasets. Our method consistently separates different copula models for all these datasets in terms of model fit.

3.
Clustering is a major tool for microarray gene expression data analysis. The existing clustering methods fall mainly into two categories: parametric and nonparametric. The parametric methods generally assume a mixture of parametric subdistributions. When the mixture distribution approximately fits the true data generating mechanism, the parametric methods perform well, but not so when there is nonnegligible deviation between them. On the other hand, the nonparametric methods, which usually do not make distributional assumptions, are robust but pay for this with a loss of efficiency. In an attempt to utilize the known mixture form to increase efficiency, and to drop assumptions about the unknown subdistributions to enhance robustness, we propose a semiparametric method for clustering. The proposed approach possesses the form of a parametric mixture, with no assumptions about the subdistributions. The subdistributions are estimated nonparametrically, with constraints imposed only on the modes. An expectation-maximization (EM) algorithm along with a classification step is invoked to cluster the data, and a modified Bayesian information criterion (BIC) is employed to guide the determination of the optimal number of clusters. Simulation studies are conducted to assess the performance and the robustness of the proposed method. The results show that the proposed method yields a reasonable partition of the data. As an illustration, the proposed method is applied to a real microarray data set to cluster genes.

4.
A stochastic model to analyze clonal data on multi-type cell populations (citations: 1 total, 0 self, 1 by others)
This article presents a stochastic model designed to analyze experimental data on the development of cell clones composed of two (or more) distinct types of cells. The proposed model is an extension of the traditional multi-type Bellman-Harris branching stochastic process allowing for nonidentical time-to-transformation distributions defined for different cell types. A simulated pseudo likelihood method has been developed for the parametric statistical inference from experimental data on cell clones under the proposed model. The method uses simulation-based approximations of the means and the variance-covariance matrices of cell counts. The proposed estimator for the vector of unknown parameters is strongly consistent and asymptotically normal under mild regularity conditions, while its variance-covariance matrix is estimated by the parametric bootstrap. A Monte Carlo Wald test is proposed for the test of hypotheses. Finite sample properties of the estimator have been studied by computer simulations. The model and associated methods of parametric inference have been applied to the analysis of proliferation and differentiation of cultured O-2A progenitor cells that play a key role in the development of the central nervous system. It follows from this analysis that the time to division of the progenitor cell and the time to its differentiation (into an oligodendrocyte) are not identically distributed. This biological finding suggests that a molecular event determining the type of cell transformation is more likely to occur at the start rather than at the end of the mitotic cycle.

5.
Advances in sequencing technologies and bioinformatics tools have vastly improved our ability to collect and analyze data from complex microbial communities. A major goal of microbiome studies is to correlate the overall microbiome composition with clinical or environmental variables. La Rosa et al. recently proposed a parametric test for comparing microbiome populations between two or more groups of subjects. However, this method is not applicable for testing the association between the community composition and a continuous variable. Although multivariate nonparametric methods based on permutations are widely used in ecology studies, they lack interpretability and can be inefficient for analyzing microbiome data. We consider the problem of testing for independence between the microbial community composition and a continuous or many-valued variable. By partitioning the range of the variable into a few slices, we formulate the problem as a problem of comparing multiple groups of microbiome samples, with each group indexed by a slice. To model multivariate and over-dispersed count data, we use the Dirichlet-multinomial distribution. We propose an adaptive likelihood-ratio test by learning a good partition or slicing scheme from the data. A dynamic programming algorithm is developed for numerical optimization. We demonstrate the superiority of the proposed test by numerically comparing it with that of La Rosa et al. and other popular approaches on the same topic including PERMANOVA, the distance covariance test, and the microbiome regression-based kernel association test. We further apply it to test the association of gut microbiome with age in three geographically distinct populations and show how the learned partition facilitates differential abundance analysis.
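To make the Dirichlet-multinomial building block in this abstract concrete, here is a minimal sketch of its log-probability for a single sample of taxon counts. The function name `dirichlet_multinomial_logpmf` and the argument layout are illustrative assumptions; the adaptive slicing and the likelihood-ratio test described in the abstract are not reproduced here.

```python
import math

def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial.

    counts: non-negative integer counts, one per taxon.
    alpha:  positive Dirichlet parameters, one per taxon.
    (Hypothetical helper for illustration only.)
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a0 = sum(alpha)
    # log of n! * Gamma(a0) / Gamma(n + a0)
    logp = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(n + a0)
    for x, a in zip(counts, alpha):
        # log of Gamma(x + a) / (x! * Gamma(a)) for each taxon
        logp += math.lgamma(x + a) - math.lgamma(x + 1) - math.lgamma(a)
    return logp
```

A small sum of `alpha` corresponds to strong over-dispersion, while a very large sum makes the distribution approach an ordinary multinomial.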

6.
Tan M, Fang HB, Tian GL, Houghton PJ. Biometrics 2002, 58(3): 612-620
In cancer drug development, demonstrating activity in xenograft models, where mice are grafted with human cancer cells, is an important step in bringing a promising compound to humans. A key outcome variable is the tumor volume measured in a given period of time for groups of mice given different doses of a single or combination anticancer regimen. However, a mouse may die before the end of a study or may be sacrificed when its tumor volume quadruples, and its tumor may be suppressed for some time and then grow back. Thus, incomplete repeated measurements arise. The incompleteness or missingness is also caused by drastic tumor shrinkage (<0.01 cm^3) or random truncation. Because of the small sample sizes in these models, asymptotic inferences are usually not appropriate. We propose two parametric test procedures based on the EM algorithm and the Bayesian method to compare treatment effects among different groups while accounting for informative censoring. A real xenograft study on a new antitumor agent, temozolomide, combined with irinotecan is analyzed using the proposed methods.

7.
We present two tests for seasonal trend in monthly incidence data. The first approach uses a penalized likelihood to choose the number of harmonic terms to include in a parametric harmonic model (which includes time trends and autoregression as well as seasonal harmonic terms) and then tests for seasonality using a parametric bootstrap test. The second approach uses a semiparametric regression model to test for seasonal trend. In the semiparametric model, the seasonal pattern is modeled nonparametrically, parametric terms are included for autoregressive effects and a linear time trend, and a parametric bootstrap test is used to test for seasonality. For both procedures, a null distribution is generated under a null Poisson model with time trends and autoregression parameters. We apply the methods to skin melanoma incidence rates collected by the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute, and perform simulation studies to evaluate the type I error rate and power for the two procedures. These simulations suggest that both procedures are alpha-level procedures. In addition, the harmonic model/bootstrap test had similar or larger power than the semiparametric model/bootstrap test for a wide range of alternatives, and the harmonic model/bootstrap test is much easier to implement. Thus, we recommend the harmonic model/bootstrap test for the analysis of seasonal incidence data.

8.
Pang Z, Kuk AY. Biometrics 2007, 63(1): 218-227
Exchangeable binary data are often collected in developmental toxicity and other studies, and a whole host of parametric distributions for fitting this kind of data have been proposed in the literature. While these distributions can be matched to have the same marginal probability and intra-cluster correlation, they can be quite different in terms of shape and higher-order quantities of interest such as the litter-level risk of having at least one malformed fetus. A sensible alternative is to fit a saturated model (Bowman and George, 1995, Journal of the American Statistical Association 90, 871-879) using the expectation-maximization (EM) algorithm proposed by Stefanescu and Turnbull (2003, Biometrics 59, 18-24). The assumption of compatibility of marginal distributions is often made to link up the distributions for different cluster sizes so that estimation can be based on the combined data. Stefanescu and Turnbull proposed a modified trend test to test this assumption. Their test, however, fails to take into account the variability of an estimated null expectation and as a result leads to inaccurate p-values. This drawback is rectified in this article. When the data are sparse, the probability function estimated using a saturated model can be very jagged and some kind of smoothing is needed. We extend the penalized likelihood method (Simonoff, 1983, Annals of Statistics 11, 208-218) to the present case of unequal cluster sizes and implement the method using an EM-type algorithm. In the presence of a covariate, we propose a penalized kernel method that performs smoothing in both the covariate and response space. The proposed methods are illustrated using several data sets and the sampling and robustness properties of the resulting estimators are evaluated by simulations.

9.
The copula of a bivariate distribution, constructed by making marginal transformations of each component, captures all the information in the bivariate distribution about the dependence between two variables. For frailty models for bivariate data the choice of a family of distributions for the random frailty corresponds to the choice of a parametric family for the copula. A class of tests of the hypothesis that the copula is in a given parametric family, with unspecified association parameter, based on bivariate right censored data is proposed. These tests are based on first making marginal Kaplan-Meier transformations of the data and then comparing a non-parametric estimate of the copula to an estimate based on the assumed family of models. A number of options are available for choosing the scale and the distance measure for this comparison. Significance levels of the test are found by a modified bootstrap procedure. The procedure is used to check the appropriateness of a gamma or a positive stable frailty model in a set of survival data on Danish twins.

10.
Lin S. Human Heredity 2002, 53(2): 103-112
We have previously proposed a confidence set approach for finding tightly linked genomic regions under the setting of parametric linkage analysis. In this article, we extend the confidence set approach to nonparametric linkage analysis of affected sib pair (ASP) data based on their identity-by-descent (IBD) information. Two well-known statistics in nonparametric linkage analysis, the Two-IBD test (proportion of ASPs sharing two alleles IBD), and the Mean test (average number of alleles shared IBD in the ASPs), are used for constructing confidence sets. Some numerical analyses as well as a simulation study were carried out to demonstrate the utility of the methods. Our results show that the fundamental advantages of the confidence set approach in parametric linkage analysis are retained when the method is generalized to nonparametric analysis. Our study on the accuracy of confidence sets, in terms of choice of tests, underlying disease incidence data, and amount of data available, leads us to conclude, among other things, that the Mean test outperforms the Two-IBD test in most situations, with the reverse being true only for traits with small additive variance. Although we describe how to construct confidence sets based on only two familiar tests, one can construct confidence sets similarly using other allele sharing statistics.
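The two allele-sharing statistics named in this abstract can be sketched as simple z-scores. The sketch below assumes that the IBD count (0, 1, or 2) is known exactly for every affected sib pair and relies on the standard null sharing probabilities of 1/4, 1/2, and 1/4 at an unlinked locus. The function name `asp_linkage_z_scores` is hypothetical, and the confidence set construction itself is not shown.

```python
import math

def asp_linkage_z_scores(ibd_counts):
    """Return (z_mean, z_two_ibd) for a list of per-pair IBD counts (0, 1, or 2).

    Assumes fully informative IBD data; illustrative helper only.
    """
    n = len(ibd_counts)
    if n == 0:
        raise ValueError("need at least one affected sib pair")
    # Mean test: under no linkage the IBD count has mean 1 and variance 1/2.
    mean_ibd = sum(ibd_counts) / n
    z_mean = (mean_ibd - 1.0) / math.sqrt(0.5 / n)
    # Two-IBD test: under no linkage P(IBD = 2) = 1/4.
    p2 = sum(1 for k in ibd_counts if k == 2) / n
    z_two_ibd = (p2 - 0.25) / math.sqrt(0.25 * 0.75 / n)
    return z_mean, z_two_ibd
```

Large positive values of either score indicate excess allele sharing among affected pairs, which is the signal both tests look for.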

11.
Zhang D. Biometrics 2004, 60(1): 8-15
The routinely assumed parametric functional form in the linear predictor of a generalized linear mixed model for longitudinal data may be too restrictive to represent true underlying covariate effects. We relax this assumption by representing these covariate effects by smooth but otherwise arbitrary functions of time, with random effects used to model the correlation induced by among-subject and within-subject variation. Due to the usually intractable integration involved in evaluating the quasi-likelihood function, the double penalized quasi-likelihood (DPQL) approach of Lin and Zhang (1999, Journal of the Royal Statistical Society, Series B 61, 381-400) is used to estimate the varying coefficients and the variance components simultaneously by representing a nonparametric function by a linear combination of fixed effects and random effects. A scaled chi-squared test based on the mixed model representation of the proposed model is developed to test whether an underlying varying coefficient is a polynomial of a certain degree. We evaluate the performance of the procedures through simulation studies and illustrate their application with data on infectious disease in Indonesian children.

12.
Testing for differentially expressed genes with microarray data (citations: 1 total, 1 self, 0 by others)
This paper compares the type I error and power of the one- and two-sample t-tests, and the one- and two-sample permutation tests for detecting differences in gene expression between two microarray samples with replicates using Monte Carlo simulations. When data are generated from a normal distribution, type I errors and powers of the one-sample parametric t-test and one-sample permutation test are very close, as are the two-sample t-test and two-sample permutation test, provided that the number of replicates is adequate. When data are generated from a t-distribution, the permutation tests outperform the corresponding parametric tests if the number of replicates is at least five. For data from a two-color dye swap experiment, the one-sample test appears to perform better than the two-sample test since expression measurements for control and treatment samples from the same spot are correlated. For data from independent samples, such as the one-channel array or two-channel array experiment using reference design, the two-sample t-tests appear more powerful than the one-sample t-tests.
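As a rough illustration of the two-sample permutation test compared in this abstract, the sketch below estimates a two-sided p-value for a difference in mean expression by reshuffling group labels. The function name, the choice of the mean difference as test statistic, and the Monte Carlo settings are assumptions made for illustration; the paper's exact procedure may differ.

```python
import random
from statistics import mean

def two_sample_permutation_test(x, y, n_perm=10000, seed=0):
    """Monte Carlo two-sided permutation p-value for a difference in means.

    x, y: replicate measurements for the two samples.
    (Hypothetical helper; not taken from the paper.)
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

With only a handful of replicates per group the number of distinct relabelings is small, so the smallest attainable p-value is bounded away from zero, which is one reason the abstract's conclusions depend on the number of replicates.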

13.
In this paper, we introduce a new model for recurrent event data characterized by a fully parametric baseline rate function, which is based on the exponential-Poisson distribution. The model arises from a latent competing risk scenario, in the sense that there is no information about which cause was responsible for the event occurrence. Then, the time of each recurrence is given by the minimum lifetime value among all latent causes. The new model has a particular case, which is the classical homogeneous Poisson process. The properties of the proposed model are discussed, including its hazard rate function, survival function, and ordinary moments. The inferential procedure is based on the maximum likelihood approach. We consider an important issue of model selection between the proposed model and its particular case by the likelihood ratio test and score test. Goodness of fit of the recurrent event models is assessed using Cox-Snell residuals. A simulation study evaluates the performance of the estimation procedure for small and moderate sample sizes. Applications to two real data sets are provided to illustrate the proposed methodology. One of them, first analyzed by our team of researchers, considers the data concerning the recurrence of malaria, which is an infectious disease caused by a protozoan parasite that infects red blood cells.
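The latent competing risk idea in this abstract (an event time equal to the minimum lifetime over an unobserved number of causes) can be sketched with a small simulation. The parameterization below is an assumption, not taken from the abstract: the number of causes is Poisson with parameter `theta` conditioned on being at least 1, and each cause has an exponential lifetime with rate `lam`. Both function names are hypothetical.

```python
import math
import random

def ep_survival(t, theta, lam):
    """P(T > t) when T is the minimum of N exponential(lam) lifetimes
    and N is Poisson(theta) conditioned on N >= 1 (assumed setup)."""
    return (math.exp(theta * math.exp(-lam * t)) - 1.0) / (math.exp(theta) - 1.0)

def sample_latent_minimum(theta, lam, rng):
    """Draw one event time as the minimum over an unobserved number of causes."""
    # Zero-truncated Poisson draw: Knuth's multiplication method plus rejection of 0.
    n = 0
    while n == 0:
        limit, prod = math.exp(-theta), rng.random()
        while prod > limit:
            n += 1
            prod *= rng.random()
    # The minimum of n independent exponential(lam) times is exponential(n * lam).
    return rng.expovariate(n * lam)
```

Under this assumed setup the survival function tends to `exp(-lam * t)` as `theta` approaches 0, which appears consistent with the abstract's remark that the homogeneous Poisson process is a particular case.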

14.
We propose to analyze panel count data using a spline-based semiparametric projected generalized estimating equation (GEE) method with the proportional mean model $E\{N(t) \mid Z\} = \Lambda_0(t)\, e^{\beta_0^{T} Z}$. The natural logarithm of the baseline mean function, $\log \Lambda_0(t)$, is approximated by a monotone cubic B-spline function. The estimates of regression parameters and spline coefficients are obtained by projecting the GEE estimates into the feasible domain using a weighted isotonic regression (IR). The proposed method avoids assuming any parametric structure of the baseline mean function or any stochastic model for the underlying counting process. Selection of the working covariance matrix that accounts for overdispersion improves the estimation efficiency and leads to less biased variance estimations. Simulation studies are conducted using different working covariance matrices in the GEE to investigate finite sample performance of the proposed method, to compare the estimation efficiency, and to explore the performance of different variance estimates in the presence of overdispersion. Finally, the proposed method is applied to a real data set from a bladder tumor clinical trial.

15.
Dimension reduction methods have been proposed for regression analysis with predictors of high dimension, but have received little attention for problems with censored data. In this article, we present an iterative imputed spline approach based on principal Hessian directions (PHD) for censored survival data in order to reduce the dimension of predictors without requiring a prespecified parametric model. Our proposal is to replace the right-censored survival time with its conditional expectation for adjusting the censoring effect by using the Kaplan-Meier estimator and an adaptive polynomial spline regression in the residual imputation. A sparse estimation strategy is incorporated in our approach to enhance the interpretation of variable selection. This approach can be implemented in not only PHD, but also other methods developed for estimating the central mean subspace. Simulation studies with right-censored data are conducted for the imputed spline approach to PHD (IS-PHD) in comparison with two methods of sliced inverse regression, minimum average variance estimation, and naive PHD that ignores censoring. The results demonstrate that the proposed IS-PHD method is particularly useful for survival time responses approximating symmetric or bending structures. Illustrative applications to two real data sets are also presented.

16.
The cross-odds ratio is defined as the ratio of the conditional odds of the occurrence of one cause-specific event for one subject given the occurrence of the same or a different cause-specific event for another subject in the same cluster over the unconditional odds of occurrence of the cause-specific event. It is a measure of the association between the correlated cause-specific failure times within a cluster. The joint cumulative incidence function can be expressed as a function of the marginal cumulative incidence functions and the cross-odds ratio. Assuming that the marginal cumulative incidence functions follow a generalized semiparametric model, this paper studies the parametric regression modeling of the cross-odds ratio. A set of estimating equations is proposed for the unknown parameters and the asymptotic properties of the estimators are explored. Non-parametric estimation of the cross-odds ratio is also discussed. The proposed procedures are applied to the Danish twin data to model the associations between twins in their times to natural menopause and to investigate whether the association differs among monozygotic and dizygotic twins and how these associations have changed over time.

17.
Huggins R. Biometrics 2000, 56(2): 537-545
In the study of longitudinal twin and family data, interest is often in the covariance structure of the data and the decomposition of this covariance structure into genetic and environmental components rather than in estimating the mean function. Various parametric models for covariance structures have been proposed but, e.g., in studies of children where growth spurts occur at various ages, it is difficult to a priori determine an appropriate parametric model for the covariance structure. In particular, there is a general lack of the visualization procedures, such as lowess, that are invaluable in the initial stages of constructing a parametric model for a mean function. Here we use kernel smoothing to modify a cross-sectional approach based on the sample covariance matrices to obtain smoothed estimates of the genetic and environmental variances and correlations for longitudinal twin data. The methods are proposed to be exploratory as an aid to parametric modeling rather than inferential, although approximate asymptotic standard errors are derived in the Appendix.

18.
Pairwise dependence diagnostics for clustered failure-time data (citations: 1 total, 0 self, 1 by others)
Glidden, David V. Biometrika 2007, 94(2): 371-385
Frailty and copula models specify a parametric dependence structure for multivariate failure-time data. Estimation of some joint quantities can be highly sensitive to the assumed parametric form, and hence model fit is an important issue. This paper lays out a general diagnostic framework for evaluating and selecting frailty and copula models. The approach is based on the cumulative sum of residuals that are calculated in bivariate time. The residuals reflect the difference between the observed and expected bivariate association structures. The proposed model-checking process is interpretable with a limiting distribution which can be approximated using the bootstrap. Simulations and a data example illustrate the practical application of the method.

19.
The principal axis method is an age independent analysis method for dental attrition data which avoids many problems associated with earlier methods of analysis, correlation and regression. A principal axis equation is determined from the scatter of M1 on X and M2 on Y, and the slope of the equation can be used to indicate rate of wear. High slopes indicate rapid rates of wear. Because rate of wear rather than degree of wear is the parameter of interest, the procedure is age independent. Confidence regions can be calculated to test the distinctness of the slopes. Dental data from three Amerind skeletal samples, Indian Knoll, Hardin Site and Campbell Site, are used to illustrate the technique. Because least squares fits such as the principal axis solution are strongly influenced by the magnitude of variance in the data, two different methods of ordinal data collection are used in this test: Molnar's 1–8 and Scott's 4–40 scales. The Scott scoring system provides more satisfactory results when used in the principal axis analysis technique because of smaller confidence regions. The testing of the technique on other samples is urged.
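The slope of a principal (major) axis through a bivariate scatter can be computed directly from the sample variances and covariance. The sketch below uses the standard major axis formula, with M1 wear scores as `x` and M2 wear scores as `y`, following the abstract. The function name is hypothetical, and the confidence regions mentioned in the abstract are not computed.

```python
import math

def principal_axis_slope(x, y):
    """Return (slope, intercept) of the principal axis of paired scores.

    x: M1 wear scores, y: M2 wear scores (same length, at least two pairs).
    Illustrative helper only; assumes a nonzero sample covariance.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("covariance is zero; the axis direction is not determined here")
    # Direction of the leading eigenvector of the 2x2 covariance matrix.
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    # The principal axis passes through the centroid of the scatter.
    intercept = my - slope * mx
    return slope, intercept
```

Unlike an ordinary regression of `y` on `x`, this fit treats both variables symmetrically, which is why the resulting slope depends on the variance of both scoring scales.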

20.
When modeling survival data, it is common to assume that the (log-transformed) survival time (T) is conditionally independent of the (log-transformed) censoring time (C) given a set of covariates. There are numerous situations in which this assumption is not realistic, and a number of correction procedures have been developed for different models. However, in most cases, either some prior knowledge about the association between T and C is required, or some auxiliary information or data are assumed to be available. When this is not the case, the application of many existing methods turns out to be limited. The goal of this paper is to overcome this problem by developing a flexible parametric model, which is a type of transformed linear model. We show that the association between T and C is identifiable in this model. The performance of the proposed method is investigated both asymptotically and through finite sample simulations. We also develop a formal goodness-of-fit test approach to assess the quality of the fitted model. Finally, the approach is applied to data coming from a study on liver transplants.
