Similar articles
20 similar articles found; search time 109 ms
1.
Flexible estimation of multiple conditional quantiles is of interest in numerous applications, such as studying the effect of pregnancy-related factors on low and high birth weight. We propose a Bayesian nonparametric method to simultaneously estimate noncrossing, nonlinear quantile curves. We expand the conditional distribution function of the response in I-spline basis functions where the covariate-dependent coefficients are modeled using neural networks. By leveraging the approximation power of splines and neural networks, our model can approximate any continuous quantile function. Compared to existing models, our model estimates all rather than a finite subset of quantiles, scales well to high dimensions, and accounts for estimation uncertainty. While the model is arbitrarily flexible, interpretable marginal quantile effects are estimated using accumulative local effect plots and variable importance measures. A simulation study shows that our model can better recover quantiles of the response distribution when the data are sparse, and an analysis of birth weight data is presented.

2.
Growing prosperity and changing diets have contributed to a surge in obesity prevalence in China. Previous research has investigated the relationships between BMI and several socioeconomic, diet‐related, and health‐related variables in China. This study proposes that such relationships are likely to differ along the conditional BMI distribution, and seeks to investigate such quantile‐dependent variation in effects. Special attention is paid to how variables affect the upper tail of the conditional BMI distribution where overweight and obesity concerns are more acute. Quantile regressions (QRs) and ordinary least squares (OLS) regressions are estimated. The sample consists of 3,407 adult individuals aged 20–45 who participated in the China Health and Nutrition Survey (CHNS), 2006. Substantial cross‐quantile variation is observed in the relationships between several key variables and BMI. The QR shows that the relationship between energy intake and BMI is largely insignificant in the lower and middle quantiles, whereas the upper quantiles show a positive and significant effect substantially larger than predicted by the least squares regression and by previous studies. This implies that a food‐based strategy aimed at limiting energy intake can be an effective way to fight obesity in China. The negative association between smoking and BMI, on the other hand, is found largely to hold only in the lower and middle quantiles, with the upper tail relatively unaffected by smoking status. Thus, smoking cessation policies may not exacerbate obesity.

3.
Noncrossing quantile regression curve estimation   Total citations: 4; self-citations: 0; citations by others: 4
Bondell HD, Reich BJ, Wang H. Biometrika, 2010, 97(4): 825-838
Since quantile regression curves are estimated individually, the quantile curves can cross, leading to an invalid distribution for the response. A simple constrained version of quantile regression is proposed to avoid the crossing problem for both linear and nonparametric quantile curves. A simulation study and a reanalysis of tropical cyclone intensity data show the usefulness of the procedure. Asymptotic properties of the estimator are equivalent to the typical approach under standard conditions, and the proposed estimator reduces to the classical one if there is no crossing. The performance of the constrained estimator shows significant improvement through added smoothing and stability across the quantile levels.
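
The crossing problem described above can be illustrated with a minimal sketch. It only detects crossing between separately fitted linear quantile lines; it is not the constrained estimator of Bondell et al. The function name `quantile_lines_cross` and the coefficient values are hypothetical and chosen purely for illustration.

```python
def quantile_lines_cross(fits, x_min, x_max):
    """Return True if any pair of adjacent quantile lines crosses on [x_min, x_max].

    fits maps a quantile level tau to an (intercept, slope) pair.
    """
    taus = sorted(fits)
    for lo, hi in zip(taus, taus[1:]):
        a_lo, b_lo = fits[lo]
        a_hi, b_hi = fits[hi]
        # The gap between two straight lines is itself linear in x,
        # so checking the two endpoints of the interval is sufficient.
        for x in (x_min, x_max):
            if a_lo + b_lo * x > a_hi + b_hi * x:
                return True
    return False


# Hypothetical fits: the 0.25 line is steeper than the 0.75 line,
# so the two lines cross at x = 2.
fits = {0.25: (0.0, 1.0), 0.75: (1.0, 0.5)}
print(quantile_lines_cross(fits, 0.0, 1.0))
print(quantile_lines_cross(fits, 0.0, 4.0))
```

Whether crossing matters in practice depends on the covariate range of interest, which is why the check takes an explicit interval.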

4.
Clonal dominance in hematopoietic stem cell populations is an important question of interest but not one we can directly answer. Any estimates are based on indirect measurement. For marked populations, we can equate empirical and theoretical moments for binomial sampling; in particular, we can use the well-known formula for the sampling variation of a binomial proportion. The empirical variance itself cannot always be reliably estimated and some caution is needed. We describe the difficulties here and identify ready solutions which only require appropriate use of variance-stabilizing transformations. From these we obtain estimators for the steady state, or dynamic equilibrium, of the number of hematopoietic stem cells involved in repopulating the marrow. The calculations themselves are not too involved. We give the distribution theory for the estimator as well as simple approximations for practical application. As an illustration, we rework data recently gathered to address the question as to whether or not reconstitution of marrow grafts in the clinical setting might be considered to be oligoclonal.

5.

Background

Through social interactions, individuals affect one another’s phenotype. In such cases, an individual’s phenotype is affected by the direct (genetic) effect of the individual itself and the indirect (genetic) effects of the group mates. Using data on individual phenotypes, direct and indirect genetic (co)variances can be estimated. Together, they compose the total genetic variance that determines a population’s potential to respond to selection. However, it can be difficult or expensive to obtain individual phenotypes. Phenotypes on traits such as egg production and feed intake are, therefore, often collected on group level. In this study, we investigated whether direct, indirect and total genetic variances, and breeding values can be estimated from pooled data (pooled by group). In addition, we determined the optimal group composition, i.e. the optimal number of families represented in a group to minimise the standard error of the estimates.

Methods

This study was performed in three steps. First, all research questions were answered by theoretical derivations. Second, a simulation study was conducted to investigate the estimation of variance components and optimal group composition. Third, individual and pooled survival records on 12 944 purebred laying hens were analysed to investigate the estimation of breeding values and response to selection.

Results

Through theoretical derivations and simulations, we showed that the total genetic variance can be estimated from pooled data, but the underlying direct and indirect genetic (co)variances cannot. Moreover, we showed that the most accurate estimates are obtained when group members belong to the same family. Additional theoretical derivations and data analyses on survival records showed that the total genetic variance and breeding values can be estimated from pooled data. Moreover, the correlation between the estimated total breeding values obtained from individual and pooled data was surprisingly close to one. This indicates that, for survival in purebred laying hens, loss in response to selection will be small when using pooled instead of individual data.

Conclusions

Using pooled data, the total genetic variance and breeding values can be estimated, but the underlying genetic components cannot. The most accurate estimates are obtained when group members belong to the same family.

6.
Ecologists often estimate population trends of animals in time series of counts using linear regression to estimate parameters in a linear transformation of multiplicative growth models, where logarithms of rates of change in counts in time intervals are used as response variables. We present quantile regression estimates for the median (0.50) and interquartile (0.25, 0.75) relationships as an alternative to mean regression estimates for common density-dependent and density-independent population growth models. We demonstrate that the quantile regression estimates are more robust to outliers and require fewer distributional assumptions than conventional mean regression estimates and can provide information on heterogeneous rates of change ignored by mean regression. We provide quantile regression trend estimates for 2 populations of greater sage-grouse (Centrocercus urophasianus) in Wyoming, USA, and for the Crawford population of Gunnison sage-grouse (Centrocercus minimus) in southwestern Colorado, USA. Our selected Gompertz models of density dependence for both populations of greater sage-grouse had smaller negative estimates of density-dependence terms and less variation in corresponding predicted growth rates (λ) for quantile than mean regression models. In contrast, our selected Gompertz models of density dependence with piecewise linear effects of years for the Crawford population of Gunnison sage-grouse had predicted changes in λ across years from quantile regressions that varied more than those from mean regression because of heterogeneity in estimated λs that were both less and greater than mean estimates. Our results add to literature establishing that quantile regression provides better behaved estimates than mean regression when there are outlying growth rates, including those induced by adjustments for zeros in the time series of counts. 
The 0.25 and 0.75 quantiles bracketing the median provide robust estimates of population changes (λ) for the central 50% of time series data and provide a 50% prediction interval for a single new prediction without making parametric distributional assumptions or assuming homogeneous λs. Compared to mean estimates, our quantile regression trend estimates for greater sage-grouse indicated less variation in density-dependent λs by minimizing sensitivity to outlying values, and for Gunnison sage-grouse indicated greater variation in density-dependent λs associated with heterogeneity among quantiles.
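
The robustness claim above can be sketched with the check (pinball) loss that quantile regression minimizes. This is an intercept-only illustration rather than the regression models of the study; the growth-rate values are invented, and the function names are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: tau * r for r >= 0, (1 - tau) * |r| for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def sample_quantile(values, tau):
    """Intercept-only quantile fit: the data point minimizing total check loss."""
    return min(
        sorted(values),
        key=lambda q: sum(check_loss(v - q, tau) for v in values),
    )


# Invented log growth rates with one outlying value (3.0).
rates = [-0.2, -0.1, 0.0, 0.1, 3.0]

mean_rate = sum(rates) / len(rates)
median_rate = sample_quantile(rates, 0.50)
lower, upper = sample_quantile(rates, 0.25), sample_quantile(rates, 0.75)

print("mean:", mean_rate)
print("median:", median_rate)
print("interquartile bracket:", lower, upper)
```

The single outlier pulls the mean well away from the bulk of the data, while the median and the 0.25 quantile stay among the typical values.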

7.
An ideal expression algorithm should be able to tell truly different expression levels with small false positive errors and be robust to assay changes. We propose two algorithms. PQN is the non-central trimmed mean of perfect match intensities with quantile normalization. DQN is the non-central trimmed mean of differences between perfect match and mismatch intensities with quantile normalization. The quantiles for normalization can be either empirical or theoretical. When array types and/or assays change in a study, the normalization to common quantiles at the probe set level is essential. We compared DQN, PQN, RMA, GCRMA, DCHIP, PLIER and MAS5 for the Affymetrix Latin square data and our data of two sets of experiments using the same bone marrow but different types of microarrays and different assays. We found the computation for AUC of ROC at affycomp.biostat.jhsph.edu can be improved.
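
As a rough sketch of the quantile normalization step mentioned above, the following uses the generic empirical variant: each array's values are replaced by the mean of the values that share the same rank across arrays. It is not the PQN or DQN algorithm itself (no trimmed mean, no probe-set-level handling), ties are broken by position rather than averaged, and the intensities are made up.

```python
def quantile_normalize(arrays):
    """Map every array onto a common empirical reference distribution.

    arrays is a list of equal-length lists, one list of intensities per array.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference quantiles: mean of the k-th smallest value across arrays.
    reference = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)
    ]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Two made-up arrays with three probes each.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

After normalization every array has the same set of values, so differences between arrays reflect rank changes only.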

8.
Summary: Quantile regression, which models the conditional quantiles of the response variable given covariates, usually assumes a linear model. However, this kind of linearity is often unrealistic in real life. One situation where linear quantile regression is not appropriate is when the response variable is piecewise linear but still continuous in covariates. To analyze such data, we propose a bent line quantile regression model. We derive its parameter estimates, prove that they are asymptotically valid given the existence of a change‐point, and discuss several methods for testing the existence of a change‐point in bent line quantile regression together with a power comparison by simulation. An example of land mammal maximal running speeds is given to illustrate an application of bent line quantile regression in which this model is theoretically justified and its parameters are of direct biological interest.

9.
Nonparametric quantile inference with competing risks data   Total citations: 1; self-citations: 0; citations by others: 1
Peng L, Fine JP. Biometrika, 2007, 94(3): 735-744
A conceptually simple quantile inference procedure is proposed for cause-specific failure probabilities with competing risks data. The quantiles are defined using the cumulative incidence function, which is intuitively meaningful in the competing-risks set-up. We establish the uniform consistency and weak convergence of a nonparametric estimator of this quantile function. These results form the theoretical basis for extensions of standard one-sample and two-sample quantile inference for independently censored data. This includes the construction of confidence intervals and bands for the quantile function, and two-sample tests. Simulation studies and a real data example illustrate the practical utility of the methodology.

10.
Ecologically relevant references are useful for evaluating ecosystem recovery, but references that are temporally static may be less useful when environmental conditions and disturbances are spatially and temporally heterogeneous. This challenge is particularly acute for ecosystems dominated by sagebrush (Artemisia spp.), where communities may require decades to recover from disturbance. We demonstrated application of a dynamic reference approach to studying sagebrush recovery using three decades of sagebrush cover estimates from remote sensing (1985–2018). We modelled recovery on former oil and gas well pads (n = 1200) across southwestern Wyoming, USA, relative to paired references identified by the Disturbance Automated Reference Toolset. We also used quantile regression to account for unmodelled heterogeneity in recovery, and projected recovery from similar disturbance across the landscape. Responses to weather and site‐level factors often differed among quantiles, and sagebrush recovery on former well pads increased more when paired reference sites had greater sagebrush cover. Little (<5%) of the landscape was projected to recover within 100 years for low to mid quantiles, and recovery often occurred at higher elevations with cool and moist annual conditions. Conversely, 48%–78% of the landscape recovered quickly (within 25 years) for high quantiles of sagebrush cover. Our study demonstrates advantages of using dynamic reference sites when studying vegetation recovery, as well as how additional inferences obtained from quantile regression can inform management.

11.
Previous research has shown that fires burn certain land cover types disproportionately to their abundance. We used quantile regression to study land cover proneness to fire as a function of fire size, under the hypothesis that they are inversely related, for all land cover types. Using five years of fire perimeters, we estimated conditional quantile functions for lower (avoidance) and upper (preference) quantiles of fire selectivity for five land cover types: annual crops, evergreen oak woodlands, eucalypt forests, pine forests and shrublands. The slope of significant regression quantiles describes the rate of change in fire selectivity (avoidance or preference) as a function of fire size. We used Monte Carlo methods to randomly permute fires in order to obtain a distribution of fire selectivity due to chance. This distribution was used to test the null hypotheses that 1) mean fire selectivity does not differ from that obtained by randomly relocating observed fire perimeters; 2) land cover proneness to fire does not vary with fire size. Our results show that land cover proneness to fire is higher for shrublands and pine forests than for annual crops and evergreen oak woodlands. As fire size increases, selectivity decreases for all land cover types tested. Moreover, the rate of change in selectivity with fire size is higher for preference than for avoidance. Comparison between observed and randomized data led us to reject both null hypotheses tested (α = 0.05) and to conclude it is very unlikely the observed values of fire selectivity and change in selectivity with fire size are due to chance.

12.
Wang H, He X. Biometrics, 2008, 64(2): 449-457
Summary. Due to the small number of replicates in typical gene microarray experiments, the performance of statistical inference is often unsatisfactory without some form of information-sharing across genes. In this article, we propose an enhanced quantile rank score test (EQRS) for detecting differential expression in GeneChip studies by analyzing the quantiles of gene intensity distributions through probe-level measurements. A measure of sign correlation, δ, plays an important role in the rank score tests. By sharing information across genes, we develop a calibrated estimate of δ, which reduces the variability at small sample sizes. We compare the EQRS test with four other approaches for determining differential expression: the gene-specific quantile rank score test, the quantile rank score test assuming a common δ, a modified t-test using summarized probe-set-level intensities, and the Mack–Skillings rank test on probe-level data. The proposed EQRS is shown to be favorable for preserving false discovery rates and for being robust against outlying arrays. In addition, we demonstrate the merits of the proposed approach using a GeneChip study comparing gene expression in the livers of mice exposed to chronic intermittent hypoxia and of those exposed to intermittent room air.

13.
1. There may be bias associated with mark–recapture experiments used to estimate age and growth of freshwater mussels. Using subsets of a mark–recapture dataset for Quadrula pustulosa, I examined how age and growth parameter estimates are affected by (i) the range and skew of the data and (ii) growth reduction due to handling. I compared predictions from von Bertalanffy growth models based on mark–recapture data with direct observation of mussel age and growth inferred from validated shell rings. 2. Growth models based on a dataset that included observations from a wide range of length classes (spanning ≥ the upper 50% of the population length range) produced only slightly biased age estimates for small and medium‐sized individuals (overestimated by 1–2 years relative to estimates from validated shell rings) but estimates became increasingly biased for larger individuals. Growth models using data that included only observations of larger animals (< the upper 50% of length range) overestimated age for all length classes, and estimated maximum age was two to six times greater than the maximum age observed in the population (47 years). Similarly, growth models using a left‐skewed dataset overestimated age. 3. Reductions of growth due to repeated handling also resulted in overestimates of age. The estimated age of mussels that were handled in two consecutive years was as much as twice that of mussels that were handled only once over the same period. Assuming a constant reduction in the annual rate of growth, handling an individual for five consecutive years could result in an estimated age that is five times too high. 4. These findings show that mark–recapture methods have serious limitations for estimating mussel age and growth. A previous paper (Freshwater Biology, 46, 2001, 1349) presented longevity estimates for three mussel species that were an order of magnitude higher than estimates inferred from shell rings. 
Because those estimates of extreme longevity were based on mark–recapture methods and subject to multiple, additive sources of bias, they cannot be considered accurate representations of life span and cannot be used to conclude that traditional methods of bivalve ageing by interpretation of shell rings are flawed.
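
The handling effect described above can be sketched with the standard von Bertalanffy growth curve, L(t) = L∞(1 − exp(−K(t − t0))), and its inverse. The parameter values below are hypothetical and are not taken from the Quadrula pustulosa data; the sketch only shows that, with t0 = 0, a growth coefficient K fitted at half its true value doubles the age inferred for a given length.

```python
import math


def vb_length(age, l_inf, k, t0=0.0):
    """von Bertalanffy length at a given age."""
    return l_inf * (1.0 - math.exp(-k * (age - t0)))


def vb_age(length, l_inf, k, t0=0.0):
    """Invert the von Bertalanffy curve: age implied by an observed length."""
    if not 0.0 <= length < l_inf:
        raise ValueError("length must lie in [0, l_inf)")
    return t0 - math.log(1.0 - length / l_inf) / k


# Hypothetical parameters: asymptotic length 80 mm, true K = 0.10 per year.
l_inf, k_true = 80.0, 0.10
length = vb_length(10.0, l_inf, k_true)  # a mussel that is truly 10 years old

# If handling halves apparent growth, the fitted K is biased low.
k_biased = k_true / 2.0
print("age from true K:  ", vb_age(length, l_inf, k_true))
print("age from biased K:", vb_age(length, l_inf, k_biased))
```

Because age is inversely proportional to K in this inverse formula, any proportional underestimate of K translates directly into a proportional overestimate of age.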

14.
An analysis is carried out to investigate the accuracy of kinetic parameters obtained using surface plasmon resonance methodology with a BIAcore instrument. The Cramér-Rao lower bound for the least possible variance of an estimator of the kinetic parameters is determined. Using simulations it is shown that the standard least-squares estimation technique provides estimates that achieve this bound. The theoretical and simulation results are compared with experimental data obtained from an analysis of the interaction of the myc peptide with the anti-myc antibody, 9E10. This investigation indicates that the accuracy of the results depends on the signal level, which has particular relevance to the design of experiments with low signal levels. It is shown how the accuracy of the estimates of the kinetic constants depends on the kinetic constants themselves and how the accuracy of the association constants depends on the concentration of the analyte that is used in the experiment. In addition, the effects of increasing the number of data points in the analysis of dissociation data on the accuracy of the estimates are quantitated. It is also demonstrated that signal averaging of data derived from repeat sensorgrams can result in a significant decrease in the standard deviation of the estimates.

15.
Transported mediation effects may contribute to understanding how interventions work differently when applied to new populations. However, we are not aware of any estimators for such effects. Thus, we propose two doubly robust, efficient estimators of transported stochastic (also called randomized interventional) direct and indirect effects. We demonstrate their finite sample properties in a simulation study. We then apply the preferred substitution estimator to longitudinal data from the Moving to Opportunity Study, a large‐scale housing voucher experiment, to transport stochastic indirect effect estimates of voucher receipt in childhood on subsequent risk of mental health or substance use disorder mediated through parental employment across sites, thereby gaining understanding of drivers of the site differences.

16.
There has been remarkably little attention to using the high resolution provided by genotyping‐by‐sequencing (i.e., RADseq and similar methods) for assessing relatedness in wildlife populations. A major hurdle is the genotyping error, especially allelic dropout, often found in this type of data that could lead to downward‐biased, yet precise, estimates of relatedness. Here, we assess the applicability of genotyping‐by‐sequencing for relatedness inferences given its relatively high genotyping error rate. Individuals of known relatedness were simulated under genotyping error, allelic dropout and missing data scenarios based on an empirical ddRAD data set, and their true relatedness was compared to that estimated by seven relatedness estimators. We found that an estimator chosen through such analyses can circumvent the influence of genotyping error, with the estimator of Ritland (Genetics Research, 67, 175) shown to be unaffected by allelic dropout and to be the most accurate when there is genotyping error. We also found that the choice of estimator should not rely solely on the strength of correlation between estimated and true relatedness as a strong correlation does not necessarily mean estimates are close to true relatedness. We also demonstrated how even a large SNP data set with genotyping error (allelic dropout or otherwise) or missing data still performs better than a perfectly genotyped microsatellite data set of tens of markers. The simulation‐based approach used here can be easily implemented by others on their own genotyping‐by‐sequencing data sets to confirm the most appropriate and powerful estimator for their data.

17.
The theoretical basis for the direct linear plot [Eisenthal & Cornish-Bowden (1974) Biochem. J. 139, 715-720], a non-parametric statistical method for the analysis of data fitting the Michaelis-Menten equation, was reinvestigated in order to accommodate additional experimental designs and to provide estimates of precision more directly comparable with those obtained by parametric statistical methods. Methods are given for calculating upper and lower confidence limits for the estimated parameters, for accommodating replicate measurements and for comparing the results of two separate experiments. Factors that influence the proper design of experiments are discussed.
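
A minimal sketch of the basic direct linear plot may help here. Each observation (s, v) defines the line Vmax = v + (v/s)·Km in (Km, Vmax) space, and the estimates are the medians of the pairwise intersections. The sketch covers only this basic form, not the confidence limits or replicate handling the abstract describes; the substrate concentrations are invented, and refinements such as special treatment of negative intersections are omitted.

```python
from itertools import combinations
from statistics import median


def direct_linear_plot(s, v):
    """Median-of-intersections estimates (Km, Vmax) for Michaelis-Menten data."""
    km_values, vmax_values = [], []
    for (s1, v1), (s2, v2) in combinations(zip(s, v), 2):
        denom = v1 / s1 - v2 / s2
        if denom == 0:
            continue  # parallel lines have no intersection
        km = (v2 - v1) / denom
        km_values.append(km)
        vmax_values.append(v1 + (v1 / s1) * km)
    return median(km_values), median(vmax_values)


# Noise-free data generated from Vmax = 10, Km = 2 (invented values).
substrate = [1.0, 2.0, 4.0, 8.0]
velocity = [10.0 * x / (2.0 + x) for x in substrate]
print(direct_linear_plot(substrate, velocity))
```

Using medians rather than a least-squares fit is what makes the method non-parametric and relatively insensitive to a few outlying measurements.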

18.
Johnson MS, Clarke B, Murray J. Genetics, 1988, 120(1): 233-238
Methods for estimating gene flow (Nm) from genetic data should provide important insights into the dynamics of natural populations. If they are to be used with confidence, however, the methods must be shown to produce valid results. Estimates of Nm have been obtained for the snails Partula taeniata and Partula suturalis, based on F(ST) and on the frequencies of private alleles, p(1). Jackknifing was used to reduce the bias of estimates and to obtain confidence limits. The estimates derived from F(ST) are consistent with the low vagility of snails, and with direct field studies of gene flow in P. taeniata. In contrast, the estimates derived from p(1) were up to seven times as large, less precise and less consistent. Although the underlying causes of these discrepancies are not clear, the results suggest that F(ST) is the more reliable indirect estimator of gene flow, at least for Partula.
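
As an illustration of the two ingredients named above, the sketch below combines Wright's island-model relation F(ST) ≈ 1/(4Nm + 1) with a delete-one jackknife over loci. The abstract does not say which formula or resampling unit the authors used, so both choices are assumptions, and the per-locus F(ST) values are invented.

```python
def nm_from_fst(fst):
    """Wright's island-model approximation: Nm = (1/FST - 1) / 4."""
    return (1.0 / fst - 1.0) / 4.0


def jackknife(values, estimator):
    """Delete-one jackknife: bias-reduced estimate and its standard error."""
    n = len(values)
    full = estimator(values)
    leave_one_out = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    pseudo = [n * full - (n - 1) * loo for loo in leave_one_out]
    estimate = sum(pseudo) / n
    variance = sum((p - estimate) ** 2 for p in pseudo) / (n * (n - 1))
    return estimate, variance ** 0.5


def nm_of_mean_fst(fst_values):
    """Nm computed from the mean FST across loci."""
    return nm_from_fst(sum(fst_values) / len(fst_values))


# Invented per-locus FST values.
fst_by_locus = [0.15, 0.22, 0.18, 0.30, 0.25]
estimate, std_err = jackknife(fst_by_locus, nm_of_mean_fst)
print("jackknifed Nm:", estimate, "standard error:", std_err)
```

Approximate confidence limits would follow from the estimate plus or minus a multiple of the standard error, subject to the usual caveats for small numbers of loci.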

19.
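Crown width is an important variable for describing the growth status of individual trees and for building tree growth and yield models. This study examined 10- to 55-year-old Korean pine plantations at the Dabiangou Forest Farm in the mountainous region of eastern Liaoning. Using tree-by-tree measurements of 2763 Korean pines in 66 permanent sample plots, we selected a base crown width model, introduced an individual-tree competition index (Rd) by reparameterization, and introduced stand density and canopy layer as dummy variables. We then built crown width quantile regression models at several quantiles (0.50, 0.90, 0.93, 0.95, 0.96, 0.99), compared them with the traditional method, and selected the optimal quantile for modelling maximum crown width in the stand. To reflect differences in crown width among individual trees within a stand, we built plot-level linear mixed-effects quantile regression crown width models at the optimal quantiles and analysed the effect of each variable on individual-tree crown width. The results showed that, based on F-tests, crown width models differed significantly among stand densities and canopy layers. After canopy layer, stand density and competition were introduced into the base model, the adjusted R² (Ra²) increased by 0.0104, the root mean square error decreased by 0.0115, and the mean square error fell to 7.4%. Compared with least squares, the quantile regression models better simulated the maximum crown width of individual trees under stand conditions, and the 0.96 and 0.93 quantiles were selected as the optimal quantiles for the upper and lower canopy layers, respectively. The linear quantile regression models with mixed effects outperformed conventional quantile regression on the Akaike information criterion, the Bayesian information criterion, the HQ information criterion and other evaluation measures, and their parameter standard errors were markedly lower; the mixed effects explained the differences among plots well. For both the upper and lower canopy layers, maximum crown width decreased as stand density increased and increased with relative diameter. Stand density affected crown width more strongly in the lower layer than in the upper layer, and when stand density was sufficiently high, crown width first increased and then decreased with increasing diameter at breast height. The mixed-effects quantile regression models built in this study effectively improve goodness of fit. In future, measures such as regulating stand density and moderate tending and thinning can support the scientific establishment and sustainable development of Korean pine plantations in the mountainous region of eastern Liaoning.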

20.
Brian S. Cade, Qinfeng Guo. Oikos, 2000, 91(2): 245-254
Rates of change in final summer densities of two desert annuals, Eriogonum abertianum and Haplopappus gracilis, as constrained by their initial winter germination densities were estimated with regression quantiles and compared with mechanistic fits based on a self‐thinning rule proposed by Guo et al. (1998; Oikos 83: 237–245). The allometric relation used was equivalent to $S = N_f N_i^{-1} = c_f N_i^{-1}$, where $S$ is the ratio of final to initial densities (survivorship), $c_f$ is a constant that is a final density specific to the species and environment, $N_i$ is the initial plant density, and $N_f$ is final plant density. We used regression quantiles to estimate $c_f$ assuming the exponent of −1 was fixed (model 1, $N_f N_i^{-1} = c_f N_i^{-1}$) and also obtained estimates by treating the exponent as a parameter to estimate (model 2, $N_f N_i^{-1} = c_f N_i^{\lambda}$). Regression quantiles allow rates of change to be estimated through any part of a data distribution conditional on some linear function of covariates. We focused on estimates for upper (90–99th) quantiles near the boundary of the summer density distributions where we expected effects of self‐thinning to operate as the primary constraint on plant performance. Allometric functions estimated with regression quantiles were similar to functions fit by Guo et al. (1998) when the exponent was constrained to −1. However, the data were more consistent with estimates for model (2), where exponents were closer to −0.4 than −1, although model fit was not as good at higher initial plant densities as when the exponent was fixed at −1. An exponential form (model 3, $N_f N_i^{-1} = c_f N_i^{\lambda} e^{\gamma N_i}$) that is a generalization of the discrete logistic growth function, where estimates of $\lambda$ were −0.23 to −0.28 and estimates of $\gamma$ were −0.003 to −0.006, provided better fit from low to high initial germination densities.
Model 3 predictions were consistent with an interpretation that final summer densities were constrained by initial germination densities when these were low (<40 per 0.25 m² for Eriogonum and <100 per 0.25 m² for Haplopappus) and were constrained by the self‐thinning process at higher germination densities. Our exponential model (3) estimated with regression quantiles had similar form to the mechanistic relation of Guo et al. (1998) when plotted as a survivorship function, but avoided the unrealistic assumption that all populations attained a similar final density, and was based on a statistical model that has formal rules for estimation and inference.
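
The three survivorship models above can be written as one function, since model 1 is the special case λ = −1, γ = 0 and model 2 is the case γ = 0. The sketch below is only an evaluation of those formulas, not the regression quantile estimation; the value of c_f is hypothetical, while the λ and γ values are picked from inside the ranges reported in the abstract.

```python
import math


def survivorship(n_i, c_f, lam=-1.0, gamma=0.0):
    """S = Nf / Ni = c_f * Ni**lam * exp(gamma * Ni)."""
    return c_f * n_i ** lam * math.exp(gamma * n_i)


def final_density(n_i, c_f, lam=-1.0, gamma=0.0):
    """Nf = Ni * S."""
    return n_i * survivorship(n_i, c_f, lam, gamma)


c_f = 30.0  # hypothetical constant, not an estimate from the paper

for n_i in (20, 50, 100, 200, 400):
    model1 = final_density(n_i, c_f)                          # exponent fixed at -1
    model3 = final_density(n_i, c_f, lam=-0.25, gamma=-0.005)  # exponential form
    print(n_i, round(model1, 2), round(model3, 2))
```

Under model 1 the predicted final density is the same constant at every initial density, which is the assumption the abstract calls unrealistic; under model 3 it rises with initial density at first and then declines.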
