Similar articles (20 results)
1.
Flexible estimation of multiple conditional quantiles is of interest in numerous applications, such as studying the effect of pregnancy-related factors on low and high birth weight. We propose a Bayesian nonparametric method to simultaneously estimate noncrossing, nonlinear quantile curves. We expand the conditional distribution function of the response in I-spline basis functions where the covariate-dependent coefficients are modeled using neural networks. By leveraging the approximation power of splines and neural networks, our model can approximate any continuous quantile function. Compared to existing models, our model estimates all rather than a finite subset of quantiles, scales well to high dimensions, and accounts for estimation uncertainty. While the model is arbitrarily flexible, interpretable marginal quantile effects are estimated using accumulated local effects (ALE) plots and variable importance measures. A simulation study shows that our model can better recover quantiles of the response distribution when the data are sparse, and an analysis of birth weight data is presented.

2.
Quantile regression, which models the conditional quantiles of the response variable given covariates, usually assumes a linear model. However, such linearity is often unrealistic in practice. One situation in which linear quantile regression is not appropriate is when the response is piecewise linear but still continuous in the covariates. To analyze such data, we propose a bent line quantile regression model. We derive its parameter estimates, prove that they are asymptotically valid given the existence of a change-point, and discuss several methods for testing the existence of a change-point in bent line quantile regression, together with a power comparison by simulation. An example of the maximal running speeds of land mammals illustrates an application of bent line quantile regression in which the model is theoretically justified and its parameters are of direct biological interest.
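
A minimal Python sketch of the bent line model, assuming the common parameterization Q_τ(y | x) = β0 + β1·x + β2·(x − t)+ with change-point t. The helper profile_changepoint and the toy data are hypothetical: the helper only scans t over a grid with the other coefficients held fixed, whereas the article estimates all parameters jointly.

```python
# Illustrative sketch: bent line conditional quantile and its check-loss objective.
# The parameterization b0 + b1*x + b2*max(x - t, 0) is an assumed common form.


def bent_line(x, b0, b1, b2, t):
    """Continuous piecewise-linear quantile: slope b1 left of t, b1 + b2 right of t."""
    return b0 + b1 * x + b2 * max(x - t, 0.0)


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def objective(xs, ys, tau, b0, b1, b2, t):
    """Sum of check losses; quantile estimates minimize this over (b0, b1, b2, t)."""
    return sum(check_loss(y - bent_line(x, b0, b1, b2, t), tau) for x, y in zip(xs, ys))


def profile_changepoint(xs, ys, tau, b0, b1, b2, grid):
    """Hypothetical helper: choose t from a grid with the other coefficients fixed."""
    return min(grid, key=lambda t: objective(xs, ys, tau, b0, b1, b2, t))


# Toy data lying exactly on a bent line with change-point t = 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [bent_line(x, 1.0, 2.0, -3.0, 2.0) for x in xs]
print(profile_changepoint(xs, ys, 0.5, 1.0, 2.0, -3.0, [1.0, 2.0, 3.0]))  # 2.0
```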

3.
This paper addresses the problem of estimating an age-at-death distribution or paleodemographic profile from osteological data. It is demonstrated that the classical two-stage procedure whereby one first constructs estimates of age-at-death of individual skeletons and then uses these age estimates to obtain a paleodemographic profile is not a correct approach. This is a consequence of Bayes' theorem. Instead, we demonstrate a valid approach that proceeds from the opposite starting point: given skeletal age-at-death, one first estimates the probability of assigning the skeleton into a specific osteological age-indicator stage. We show that this leads to a statistically valid method for obtaining a paleodemographic profile, and moreover, that valid individual age estimation itself requires a demographic profile and therefore is done subsequent to its construction. Individual age estimation thus becomes the last rather than the first step in the estimation procedure. A central concept of our statistical approach is that of a weight function. A weight function is associated with each osteological age-indicator stage or category, and provides the probability that a specific age indicator stage is observed, given age-at-death of the individual. We recommend that weight functions be estimated nonparametrically from a reference data set. In their entirety, the weight functions characterize the relevant stochastic properties of a chosen age indicator. For actual estimation of the paleodemographic profile, a parametric age distribution in the target sample is assumed. The maximum likelihood method is used to identify the unknown parameters of this distribution. As some components are estimated nonparametrically, one then has a semiparametric model. We show how to obtain valid estimates of individual age-at-death, confidence regions, and goodness-of-fit tests. The methods are illustrated with both real and simulated data.
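
The weight-function idea can be sketched on a discrete grid of age classes (an assumption; the article works with age-at-death and nonparametrically estimated weight functions). All numbers are hypothetical. The probability of observing stage j is Σ_a w_j(a)·f(a), and an individual's age distribution given its stage follows from Bayes' theorem, which requires the profile f.

```python
# Sketch: discrete-age version of the weight-function likelihood and of
# individual age estimation via Bayes' theorem. Weights and profiles are hypothetical.
import math


def stage_probabilities(weights, age_dist):
    """P(stage j) = sum_a w_j(a) * f(a), where weights[j][a] = P(stage j | age class a)."""
    return [sum(w * f for w, f in zip(w_j, age_dist)) for w_j in weights]


def log_likelihood(stage_counts, weights, age_dist):
    """Multinomial log-likelihood (up to a constant) of the observed stage counts."""
    probs = stage_probabilities(weights, age_dist)
    return sum(n * math.log(p) for n, p in zip(stage_counts, probs) if n > 0)


def age_posterior(stage, weights, age_dist):
    """P(age class a | stage) is proportional to w_stage(a) * f(a)."""
    joint = [w * f for w, f in zip(weights[stage], age_dist)]
    total = sum(joint)
    return [j / total for j in joint]


# Two stages, three age classes; each column of `weights` sums to 1.
weights = [[0.9, 0.5, 0.1], [0.1, 0.5, 0.9]]
young_profile = [0.5, 0.3, 0.2]
old_profile = [0.2, 0.3, 0.5]
print(stage_probabilities(weights, young_profile))
print(age_posterior(0, weights, young_profile))
print(age_posterior(0, weights, old_profile))
```

Under the two hypothetical profiles, a stage-0 skeleton receives different age probabilities, which mirrors the point that individual age estimation comes after the profile is estimated. In a full analysis, f would come from a parametric family whose parameters are chosen to maximize log_likelihood.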

4.
Censored quantile regression models, which offer great flexibility in assessing covariate effects on event times, have attracted considerable research interest. In this study, we consider flexible estimation and inference procedures for competing risks quantile regression, which not only provides meaningful interpretations through cumulative incidence quantiles but also extends the conventional accelerated failure time model by relaxing some of its stringent assumptions, such as global linearity and unconditional independence. Current methods for censored quantile regression often involve minimizing an L1-type convex function or solving nonsmooth estimating equations. This approach can lead to multiple roots in practical settings, particularly with multiple covariates. Moreover, variance estimation involves an unknown error distribution, and most methods rely on computationally intensive resampling techniques such as bootstrapping. We extend the induced smoothing procedure for censored quantile regression to the competing risks setting. The proposed procedure permits fast and accurate computation of quantile regression parameter estimates and their standard errors using conventional numerical methods such as the Newton–Raphson algorithm. Numerical studies show that the proposed estimators perform well and the resulting inference is reliable in practical settings. The method is finally applied to data from a soft tissue sarcoma study.
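
A deliberately simplified sketch of the smoothing idea, assuming an intercept-only model, no censoring or competing risks, and a fixed bandwidth h (the article's procedure is more general). The indicator I(y_i < q) in the quantile estimating equation is replaced by a normal CDF, which makes the equation differentiable so that Newton–Raphson can be applied.

```python
# Simplified sketch of induced smoothing: intercept-only, uncensored data,
# fixed bandwidth h (all assumptions). I(y_i < q) is replaced by Phi((q - y_i) / h).
from statistics import NormalDist, median

STD_NORMAL = NormalDist()


def smoothed_score(q, ys, tau, h):
    """Smoothed estimating function: sum_i [tau - Phi((q - y_i) / h)]."""
    return sum(tau - STD_NORMAL.cdf((q - y) / h) for y in ys)


def smoothed_quantile(ys, tau, h=1.0, tol=1e-10, max_iter=100):
    """Solve smoothed_score(q) = 0 by Newton-Raphson, starting at the sample median."""
    q = median(ys)
    for _ in range(max_iter):
        score = smoothed_score(q, ys, tau, h)
        # Derivative of the score with respect to q (always negative).
        slope = -sum(STD_NORMAL.pdf((q - y) / h) for y in ys) / h
        step = score / slope
        q -= step
        if abs(step) < tol:
            break
    return q


ys = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(smoothed_quantile(ys, 0.5))  # approximately 0 for symmetric data
print(smoothed_quantile(ys, 0.9))
```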

5.
Holm's (1979) step-down multiple-testing procedure (MTP) is appealing for its flexibility, transparency, and general validity, but the derivation of corresponding simultaneous confidence regions has remained an unsolved problem. This article provides such confidence regions. In fact, simultaneous confidence regions are provided for any MTP in the class of short-cut consonant closed-testing procedures based on marginal p-values and weighted Bonferroni tests for intersection hypotheses considered by Hommel, Bretz and Maurer (2007). In addition to Holm's MTP, this class includes the fixed-sequence MTP, recently proposed gatekeeping MTPs, and the fallback MTP. The simultaneous confidence regions are generally valid if the underlying marginal p-values and corresponding marginal confidence regions (assumed to be available) are valid. The marginal confidence regions and estimated quantities are not assumed to be of any particular kind or dimension. Compared with the rejections made by the MTP for the family of null hypotheses H under consideration, the proposed confidence regions provide extra information at no cost. In particular, with Holm's MTP, such extra information is provided for all nonrejected Hs when not all Hs are rejected, or for certain (possibly all) Hs when all Hs are rejected. When not all Hs are rejected, no extra information is provided for the rejected Hs. This drawback, however, seems difficult to overcome. Illustrations concerning clinical studies are given.
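
Holm's step-down rule itself (not the article's confidence regions) can be sketched as follows, assuming equal Bonferroni weights: sort the m p-values, compare the i-th smallest with α/(m − i + 1), and stop at the first non-rejection.

```python
def holm(p_values, alpha=0.05):
    """Rejection flags (in input order) from Holm's step-down procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    reject = [False] * m
    for rank, i in enumerate(order):  # rank 0 compares with alpha / m
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; larger p-values are retained
    return reject


print(holm([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```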

6.
The question as to the role that correlated activity plays in the coding of information in the brain continues to be one of the most important in neuroscience. One approach to understanding this role is to formally model the ensemble responses as multivariate probability distributions. We have previously introduced alternatives to linear assumptions of multivariate Gaussian dependence for spike timing in neural ensembles using the probabilistic copula approach. In probability theory the copula "couples" marginal distributions to form flexible multivariate distribution functions for characterizing ensemble behavior. The parametric copula can be factored out of the joint probability density, and as such is independent and isolated from the marginal densities. This greatly simplifies the analysis, and allows a direct examination of the shape of the dependence independent of the marginals. The shape of the copula function goes beyond describing the dependence with a single summarizing statistic. In this review, we illustrate the construction of the copula, and how it contributes to the analysis of information conveyed by populations of neurons.
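
A short sketch of how a copula couples marginals, assuming a bivariate Gaussian copula with correlation ρ and hypothetical exponential marginals. The joint density factors as c(F1(y1), F2(y2))·f1(y1)·f2(y2), so the dependence term c can be examined separately from the marginals.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho) for u, v in (0, 1), |rho| < 1."""
    x, y = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    one_minus = 1.0 - rho * rho
    exponent = (2.0 * rho * x * y - rho * rho * (x * x + y * y)) / (2.0 * one_minus)
    return math.exp(exponent) / math.sqrt(one_minus)


def joint_density(y1, y2, marginal1, marginal2, rho):
    """f(y1, y2) = c(F1(y1), F2(y2)) * f1(y1) * f2(y2); marginals are (cdf, pdf) pairs."""
    (cdf1, pdf1), (cdf2, pdf2) = marginal1, marginal2
    return gaussian_copula_density(cdf1(y1), cdf2(y2), rho) * pdf1(y1) * pdf2(y2)


def exponential(rate):
    """(cdf, pdf) of an exponential distribution; a hypothetical marginal choice."""
    return (lambda t: 1.0 - math.exp(-rate * t)), (lambda t: rate * math.exp(-rate * t))


print(joint_density(0.5, 0.3, exponential(1.0), exponential(2.0), rho=0.6))
```

With ρ = 0 the copula density is identically 1 and the joint density reduces to the product of the marginals.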

7.
Royle JA. Biometrics, 2004, 60(1): 108-115
Spatial replication is a common theme in count surveys of animals. Such surveys often generate sparse count data from which it is difficult to estimate population size while formally accounting for detection probability. In this article, I describe a class of models (N-mixture models) that allow population size to be estimated from such data. The key idea is to view site-specific population sizes, N, as independent random variables distributed according to some mixing distribution (e.g., Poisson). Prior parameters are estimated from the marginal likelihood of the data, having integrated over the prior distribution for N. Carroll and Lombard (1985, Journal of the American Statistical Association 80, 423-426) proposed a class of estimators based on mixing over a prior distribution for detection probability. Their estimator can be applied in limited settings but is sensitive to prior parameter values that are fixed a priori. Spatial replication provides additional information about the parameters of the prior distribution on N, which the N-mixture models exploit and which leads to reasonable estimates of abundance from sparse data. A simulation study demonstrates superior operating characteristics (bias, confidence interval coverage) of the N-mixture estimator compared with the Carroll and Lombard estimator. Both estimators are applied to point count data on six species of birds, illustrating the sensitivity to the choice of prior on p and the substantially different abundance estimates that result.
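
A sketch of the N-mixture marginal likelihood under the Poisson mixing distribution mentioned above, with binomial detection at each visit. The truncation bound K for the sum over N and the example counts are our assumptions; terms are accumulated in log space to avoid overflow.

```python
# Sketch: N-mixture marginal likelihood with Poisson mixing and binomial detection,
# summing N up to a truncation bound K (K is an assumption, not from the article).
import math


def site_likelihood(counts, lam, p, K=100):
    """Sum over N = max(counts)..K of Poisson(N; lam) * prod_t Binomial(y_t; N, p)."""
    total = 0.0
    for n in range(max(counts), K + 1):
        log_term = n * math.log(lam) - lam - math.lgamma(n + 1)  # log Poisson prior
        for y in counts:
            log_term += (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
                         + y * math.log(p) + (n - y) * math.log1p(-p))
        total += math.exp(log_term)
    return total


def log_likelihood(sites, lam, p, K=100):
    """Marginal log-likelihood over independent sites; estimates maximize over (lam, p)."""
    return sum(math.log(site_likelihood(c, lam, p, K)) for c in sites)


# Hypothetical repeated counts at three sites (three visits each).
sites = [[2, 3, 1], [0, 1, 0], [4, 2, 3]]
print(log_likelihood(sites, lam=4.0, p=0.5))
```

With a single visit per site, summing out N yields a Poisson(λp) count, so only the product λp is identifiable from single counts; this also serves as a quick sanity check of the code.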

8.
This paper proposes a semiparametric methodology for modeling multivariate and conditional distributions. We first build a multivariate distribution whose dependence structure is induced by a Gaussian copula and whose marginal distributions are estimated nonparametrically via mixtures of B‐spline densities. The conditional distribution of a given variable is obtained in closed form from this multivariate distribution. We take a Bayesian approach, using Markov chain Monte Carlo methods for inference. We study the frequentist properties of the proposed methodology via simulation and apply the method to estimation of conditional densities of summary statistics, used for computing conditional local false discovery rates, from genetic association studies of schizophrenia and cardiovascular disease risk factors.
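
For a bivariate Gaussian copula, the conditional distribution has the closed form F(y1 | y2) = Φ((Φ⁻¹(F1(y1)) − ρ·Φ⁻¹(F2(y2))) / √(1 − ρ²)). The sketch below assumes two variables and accepts any marginal CDFs; the article's B-spline mixture marginals are replaced here by a hypothetical exponential CDF.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_cdf(y1, y2, cdf1, cdf2, rho):
    """P(Y1 <= y1 | Y2 = y2) under a bivariate Gaussian copula with correlation rho.

    cdf1 and cdf2 are the marginal CDFs; their values must lie strictly in (0, 1).
    """
    z1 = STD_NORMAL.inv_cdf(cdf1(y1))
    z2 = STD_NORMAL.inv_cdf(cdf2(y2))
    return STD_NORMAL.cdf((z1 - rho * z2) / math.sqrt(1.0 - rho * rho))


def exponential_cdf(t):
    """Hypothetical marginal CDF (unit-rate exponential)."""
    return 1.0 - math.exp(-t)


print(conditional_cdf(1.0, 2.0, exponential_cdf, exponential_cdf, 0.7))
```

With standard normal marginals the result is the familiar conditional normal N(ρ·y2, 1 − ρ²), and with ρ = 0 it collapses to the marginal CDF of Y1.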

9.
In the analysis of repeated measurements, multivariate regression models that account for the correlations among observations from the same subject are widely used. Like the usual univariate regression models, these multivariate regression models also need model diagnostic procedures. Although these models are widely used, few studies have addressed their diagnostics. In this paper, we propose simple residual plots to investigate the goodness of fit of models for repeated measures data. Here, we focus mainly on diagnostics for the mean model. The proposed residual plots are based on quantile-quantile (Q–Q) plots of a χ² distribution and a normal distribution. In particular, the proposed method is useful for comparing several models simultaneously. The proposed method is illustrated using two examples. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

10.
Species range shifts associated with environmental change or biological invasions are increasingly important study areas. However, quantifying range expansion rates may be heavily influenced by methodology and/or sampling bias. We compared expansion rate estimates of Roesel's bush-cricket (Metrioptera roeselii, Hagenbach 1822), a nonnative species currently expanding its range in south-central Sweden, from range statistic models based on distance measures (mean, median, 95th gamma quantile, marginal mean, maximum, and conditional maximum) and an area-based method (grid occupancy). We used sampling simulations to determine the sensitivity of the different methods to incomplete sampling across the species' range. For periods when we had comprehensive survey data, range expansion estimates clustered into two groups: (1) those calculated from range margin statistics (gamma, marginal mean, maximum, and conditional maximum: ~3 km/year), and (2) those calculated from the central tendency (mean and median) and the area-based method of grid occupancy (~1.5 km/year). Range statistic measures differed greatly in their sensitivity to sampling effort; the proportion of sampling required to achieve an estimate within 10% of the true value ranged from 0.17 to 0.9. Grid occupancy and median were most sensitive to sampling effort, and the maximum and gamma quantile the least. Including periods with incomplete sampling in the range expansion calculations generally lowered the estimates (by 16–72%), with the exception of the gamma quantile, which was slightly higher (by 6%). Care should be taken when interpreting range expansion estimates from data sampled from only a fraction of the full distribution. Methods based on the central tendency will give rates approximately half those of methods based on the range margin.
The gamma quantile method appears to be the most robust to incomplete sampling bias and should be considered the method of choice when sampling the entire distribution is not possible.
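
The distance-based range statistics can be turned into expansion rates by regressing each yearly statistic on year. The sketch below, with hypothetical distances, covers only the mean, median and maximum; the gamma quantile, marginal mean, conditional maximum and grid occupancy methods are not implemented.

```python
from statistics import mean, median


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def expansion_rates(records):
    """records maps year -> distances (km) of occupied sites from the range origin.

    Returns km/year for a hypothetical subset of range statistics.
    """
    years = sorted(records)
    statistics_by_name = {"mean": mean, "median": median, "maximum": max}
    return {name: ols_slope(years, [stat(records[y]) for y in years])
            for name, stat in statistics_by_name.items()}


records = {2000: [1, 2, 3], 2001: [2, 3, 7], 2002: [3, 4, 11]}
print(expansion_rates(records))  # {'mean': 2.0, 'median': 1.0, 'maximum': 4.0}
```

In this toy example the margin statistic (maximum) moves faster than the central-tendency statistics, the same qualitative split the abstract reports.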

11.
This article considers global tests of differences between paired vectors of binomial probabilities, based on data from two dependent multivariate binary samples. Difference is defined as either an inhomogeneity in the marginal distributions or asymmetry in the joint distribution. For detecting the first type of difference, we propose a multivariate extension of McNemar's test and show that it is a generalized score test under a generalized estimating equations (GEE) approach. Univariate features such as the relationship between the Wald and score tests and the dropout of pairs with the same response carry over to the multivariate case and the test does not depend on the working correlation assumption among the components of the multivariate response. For sparse or imbalanced data, such as occurs when the number of variables is large or the proportions are close to zero, the test is best implemented using a bootstrap, and if this is computationally too complex, a permutation distribution. We apply the test to safety data for a drug, in which two doses are evaluated by comparing multiple responses by the same subjects to each one of them.
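
A sketch of the multivariate McNemar statistic, assuming the generalized score statistic takes the quadratic form W = dᵀV⁻¹d, where d_i is subject i's vector of paired differences, d = Σ d_i and V = Σ d_i d_iᵀ. Pairs with identical responses contribute nothing to d or V, and with one variable W reduces to (b − c)²/(b + c), the usual McNemar statistic. The reference distribution (asymptotic χ², bootstrap, or permutation) is not computed here.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def multivariate_mcnemar(x1, x2):
    """x1[i], x2[i]: 0/1 response vectors of subject i under the two conditions."""
    diffs = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(x1, x2)]
    k = len(diffs[0])
    d = [sum(di[j] for di in diffs) for j in range(k)]
    v = [[sum(di[j] * di[l] for di in diffs) for l in range(k)] for j in range(k)]
    w = solve(v, d)  # w = V^{-1} d
    return sum(dj * wj for dj, wj in zip(d, w))


# Hypothetical bivariate binary responses for six subjects.
x1 = [[1, 0], [1, 0], [0, 0], [0, 1], [0, 1], [0, 1]]
x2 = [[0, 0], [0, 0], [1, 0], [0, 0], [0, 0], [0, 0]]
print(multivariate_mcnemar(x1, x2))
```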

12.
Determining the structure of data without prior knowledge of the number of clusters or any information about their composition is a problem of interest in many fields, such as image analysis, astrophysics and biology. A set of n patterns in a p-dimensional feature space must be partitioned such that patterns in a given cluster are more similar to each other than to the rest. As there are approximately K^n/K! possible ways of partitioning the patterns among K clusters, finding the best solution is very hard when n is large. The search space grows further when the number of partitions is not known a priori. Although the self-organizing feature map (SOM) can be used to visualize clusters, automating knowledge discovery with the SOM is a difficult task. This paper proposes region-based image processing methods to post-process the U-matrix obtained after the unsupervised learning performed by the SOM. Mathematical morphology is applied to identify regions of similar neurons. The number of regions and their labels are found automatically, and they are related to the number of clusters in a multivariate data set. New data can be classified by labeling them according to the best-matching neuron. Simulations using data sets drawn from finite mixtures of p-variate normal densities are presented, together with the advantages and drawbacks of the method.
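
The approximation K^n/K! is the leading term of the Stirling number of the second kind S(n, K), the exact number of ways to split n labeled patterns into K nonempty clusters. A small sketch comparing the two:

```python
import math


def stirling2(n, k):
    """Stirling number of the second kind S(n, k)."""
    # Recurrence S(i, j) = j * S(i-1, j) + S(i-1, j-1), with S(0, 0) = 1.
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


print(stirling2(10, 3), 3 ** 10 / math.factorial(3))  # 9330 9841.5
for n in (10, 20, 30):
    print(n, stirling2(n, 3) / (3 ** n / math.factorial(3)))  # ratio approaches 1
```

Summing S(n, K) over K = 1, …, n gives the total count when the number of clusters is unknown, which is why the search space grows further in that case.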

13.
The process of nonindigenous species (NIS) arrival has received limited theoretical consideration despite its importance in predicting and preventing the establishment of NIS. We formulate a mechanistically based hierarchical model of NIS arrival and demonstrate simplifications leading to a marginal distribution of the number of surviving introduced individuals, parameterized by survival probability and propagule pressure. The marginal distribution is extended to a stochastic process from which establishment emerges with a waiting time distribution. This provides a probability of NIS establishment within a specified period and may be useful for identifying patterns of successful invaders. However, estimates of both propagule pressure and individual survival probability are rarely available for NIS, which makes the probability of establishment difficult to estimate. Instead, researchers can obtain proportional estimates of propagule pressure through models of NIS transport, such as gravity models, or of survival probability through habitat-matching indexes that measure the similarity between potentially occupied and native NIS ranges. Therefore, we formulate the relative waiting time between two locations and the probability of one location being invaded before the other.
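
The Poisson case of such a hierarchy can be sketched under explicit assumptions that are ours, not necessarily the article's: arrivals per period are Poisson with mean λ (propagule pressure), each individual survives independently with probability p, and establishment occurs in any period with at least one survivor. Binomial thinning then makes survivors Poisson(λp), and with independent exponential waiting times of rates r_A and r_B, the chance that A is invaded first is r_A/(r_A + r_B).

```python
import math


def survivors_pmf(k, lam, p):
    """P(k survivors) when arrivals ~ Poisson(lam) and each survives w.p. p."""
    mu = lam * p  # binomial thinning of a Poisson count
    return math.exp(-mu) * mu ** k / math.factorial(k)


def prob_established_within(periods, lam, p):
    """P(at least one survivor in some of `periods` independent periods)."""
    p_none = survivors_pmf(0, lam, p)  # P(no survivors in one period)
    return 1.0 - p_none ** periods


def prob_first_invaded(rate_a, rate_b):
    """P(location A is invaded before B) for independent exponential waiting times."""
    return rate_a / (rate_a + rate_b)


print(prob_established_within(4, lam=2.0, p=0.25))
print(prob_first_invaded(2.0, 3.0))
```

Because the last ratio is unchanged when both rates are multiplied by the same constant, proportional estimates of propagule pressure or survival suffice for comparing two locations.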

14.
Reference intervals are widely used in the interpretation of results of biochemical and physiological tests of patients. When there are multiple biochemical analytes measured from each subject, a multivariate reference region is needed. Because of their greater specificity against false positives, such reference regions are more desirable than separate univariate reference intervals that disregard the cross-correlations between variables. Traditionally, under multivariate normality, reference regions have been constructed as ellipsoidal regions. This approach suffers from a major drawback: it cannot detect component-wise extreme observations. In the present work, procedures are developed to construct rectangular reference regions in the multivariate normal setup. The construction is based on the criteria for tolerance intervals. The problems addressed include the computation of a rectangular tolerance region and simultaneous tolerance intervals. Also addressed is the computation of mixed reference intervals that include both two-sided and one-sided limits, simultaneously. A parametric bootstrap approach is used in the computations, and the accuracy of the proposed methodology is assessed using estimated coverage probabilities. The problem of sample size determination is also addressed, and the results are illustrated using examples that call for the computation of reference regions.

15.
Pang Z, Kuk AY. Biometrics, 2005, 61(4): 1076-1084
Existing distributions for modeling fetal response data in developmental toxicology, such as the beta-binomial distribution, tend to inflate the probability of no malformed fetuses and hence understate the risk of having at least one malformed fetus within a litter. As opposed to a shared probability extra-binomial model, we advocate a shared response model that allows a random number of fetuses within the same litter to share a common response. An explicit formula is given for the probability function, and graphical plots suggest that it does not suffer from the problem of assigning too much probability to the event of no malformed fetuses. The EM algorithm can be used to estimate the model parameters. Results of a simulation study show that the EM estimates are nearly unbiased and the associated confidence intervals based on the usual standard error estimates have coverage close to the nominal level. Simulation results also suggest that the shared response model estimates of the marginal malformation probabilities are robust to misspecification of the distributional form, but the estimates of intralitter correlation and of the litter-level probability of having at least one malformed fetus are not. The proposed model is fitted to a set of data from the U.S. National Toxicology Program. For the same dose-response relationship, the fit based on the shared response distribution is superior to that based on the beta-binomial, and comparable to that based on the recently proposed q-power distribution (Kuk, 2004, Applied Statistics 53, 369-386). An advantage of the shared response model over the q-power distribution is that it is more interpretable and can be extended more easily to the multivariate case. To illustrate this, a bivariate shared response model is fitted to fetal response data involving visceral and skeletal malformations.

16.
Micronutrient deficiency is considered a basic cause of health problems. Many micronutrients matter for good health, and they are usually analyzed separately. However, such separate analyses involve practical as well as methodological complications, which calls for an index representing micronutrient malnutrition. This study proposes a copula methodology to categorize micronutrient malnutrition at the household level by combining the dependence structure of several correlated variables. Data on eleven micronutrients are extracted from the HIICS 2015–16 published by the Pakistan Bureau of Statistics. Seven of the eleven variables are highly correlated and are used to construct the index: per capita intakes of calcium, iron, iodine, zinc, riboflavin, thiamine and phosphorus at the household level. The normal distribution was found to fit the sample data of all variables best. A Gaussian copula is used to derive a multivariate probability distribution by combining the univariate marginal distribution of each micronutrient. The multivariate Gaussian copula model is then used to calculate cumulative probabilities, which provide a basis for categorizing households' malnutrition with respect to micronutrients. The results show that 60% of households lie in the very low or low category of micronutrient intake, 20% fall into the medium category, and 20% fall into the high or very high category. The proposed methodology might be helpful for combining other micronutrients, as well as a variety of correlated variables in other fields with survey data.

17.
Disease incidence or mortality data are typically available as rates or counts for specified regions, collected over time. We propose Bayesian nonparametric spatial modeling approaches to analyze such data. We develop a hierarchical specification using spatial random effects modeled with a Dirichlet process prior. The Dirichlet process is centered around a multivariate normal distribution. This latter distribution arises from a log-Gaussian process model that provides a latent incidence rate surface, followed by block averaging to the areal units determined by the regions in the study. With regard to the resulting posterior predictive inference, the modeling approach is shown to be equivalent to an approach based on block averaging of a spatial Dirichlet process to obtain a prior probability model for the finite dimensional distribution of the spatial random effects. We introduce a dynamic formulation for the spatial random effects to extend the model to spatio-temporal settings. Posterior inference is implemented through Gibbs sampling. We illustrate the methodology with simulated data as well as with a data set on lung cancer incidences for all 88 counties in the state of Ohio over an observation period of 21 years.

18.
Simultaneous spike-counts of neural populations are typically modeled by a Gaussian distribution. On short time scales, however, this distribution is too restrictive to describe and analyze multivariate distributions of discrete spike-counts. We present an alternative that is based on copulas and can account for arbitrary marginal distributions, including Poisson and negative binomial distributions as well as second and higher-order interactions. We describe maximum likelihood-based procedures for fitting copula-based models to spike-count data, and we derive a so-called flashlight transformation which makes it possible to move the tail dependence of an arbitrary copula into an arbitrary orthant of the multivariate probability distribution. Mixtures of copulas that combine different dependence structures and thereby model different driving processes simultaneously are also introduced. First, we apply copula-based models to populations of integrate-and-fire neurons receiving partially correlated input and show that the best fitting copulas provide information about the functional connectivity of coupled neurons which can be extracted using the flashlight transformation. We then apply the new method to data which were recorded from macaque prefrontal cortex using a multi-tetrode array. We find that copula-based distributions with negative binomial marginals provide an appropriate stochastic model for the multivariate spike-count distributions rather than the multivariate Poisson latent variables distribution and the often used multivariate normal distribution. The dependence structure of these distributions provides evidence for common inhibitory input to all recorded stimulus encoding neurons. Finally, we show that copula-based models can be successfully used to evaluate neural codes, e.g., to characterize stimulus-dependent spike-count distributions with information measures. 
This demonstrates that copula-based models are not only a versatile class of models for multivariate distributions of spike-counts, but that those models can be exploited to understand functional dependencies.

19.
O'Brien SM, Dunson DB. Biometrics, 2004, 60(3): 739-746
Bayesian analyses of multivariate binary or categorical outcomes typically rely on probit or mixed effects logistic regression models that do not have a marginal logistic structure for the individual outcomes. In addition, difficulties arise when simple noninformative priors are chosen for the covariance parameters. Motivated by these problems, we propose a new type of multivariate logistic distribution that can be used to construct a likelihood for multivariate logistic regression analysis of binary and categorical data. The model for individual outcomes has a marginal logistic structure, simplifying interpretation. We follow a Bayesian approach to estimation and inference, developing an efficient data augmentation algorithm for posterior computation. The method is illustrated with application to a neurotoxicology study.

20.
Ekholm A, McDonald JW, Smith PW. Biometrics, 2000, 56(3): 712-718
Models for a multivariate binary response are parameterized by univariate marginal probabilities and dependence ratios of all orders. The w-order dependence ratio is the joint success probability of w binary responses divided by the joint success probability assuming independence. This parameterization supports likelihood-based inference for both regression parameters, relating marginal probabilities to explanatory variables, and association model parameters, relating dependence ratios to simple and meaningful mechanisms. Five types of association models are proposed, where responses are (1) independent given a necessary factor for the possibility of a success, (2) independent given a latent binary factor, (3) independent given a latent beta distributed variable, (4) follow a Markov chain, and (5) follow one of two first-order Markov chains depending on the realization of a binary latent factor. These models are illustrated by reanalyzing three data sets, foremost a set of binary time series on auranofin therapy against arthritis. Likelihood-based approaches are contrasted with approaches based on generalized estimating equations. Association models specified by dependence ratios are contrasted with other models for a multivariate binary response that are specified by odds ratios or correlation coefficients.
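
The sample analogue of the w-order dependence ratio defined above is easy to compute. This sketch is descriptive only: the data are hypothetical, and it involves no covariates or association model.

```python
def dependence_ratio(rows, cols):
    """Empirical w-order dependence ratio for the responses indexed by `cols`.

    Joint success proportion divided by the product of marginal success proportions.
    """
    n = len(rows)
    joint = sum(all(row[c] == 1 for c in cols) for row in rows) / n
    product = 1.0
    for c in cols:
        product *= sum(row[c] for row in rows) / n
    return joint / product


rows = [(1, 1, 1), (1, 1, 0), (0, 0, 1), (0, 0, 0)]
print(dependence_ratio(rows, [0, 1]))     # 2.0: perfectly concordant pair
print(dependence_ratio(rows, [0, 2]))     # 1.0: consistent with independence
print(dependence_ratio(rows, [0, 1, 2]))  # third-order ratio
```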


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417