Similar literature
1.
We consider the assessment of local influence for generalized linear models when the covariates are measured with errors. We show how to evaluate the effect that perturbations to the data, case weights, and model assumptions may have on the parameter estimates. Based on the likelihood displacement functions, some useful influence diagnostics are derived. Two examples illustrate application of the proposed diagnostics and assessment of the measurement error assumptions.
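For orientation, the likelihood displacement that such diagnostics build on is usually written as below. This is the standard notation of the local influence literature, not a formula quoted from the abstract: $\omega$ is an assumed perturbation vector, $\omega_0$ the null perturbation, $L$ the log-likelihood, $\hat{\theta}$ its maximizer, and $\hat{\theta}_{\omega}$ the maximizer under perturbation.

```latex
\[
  LD(\omega) = 2\left[ L(\hat{\theta}) - L(\hat{\theta}_{\omega}) \right],
  \qquad
  C_{l} = 2\left| \, l^{\top} \Delta^{\top} \ddot{L}^{-1} \Delta \, l \, \right|,
\]
\[
  \Delta = \left. \frac{\partial^{2} L(\theta \mid \omega)}{\partial \theta \, \partial \omega^{\top}} \right|_{\theta = \hat{\theta},\ \omega = \omega_{0}},
  \qquad
  \ddot{L} = \left. \frac{\partial^{2} L(\theta)}{\partial \theta \, \partial \theta^{\top}} \right|_{\theta = \hat{\theta}} .
\]
```

Here $C_l$ is the normal curvature of the displacement surface in a unit direction $l$; directions with large curvature point to the perturbations to which the estimates are most sensitive.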

2.
Developments in whole genome biotechnology have stimulated statistical focus on prediction methods. We review here methodology for classifying patients into survival risk groups and for using cross-validation to evaluate such classifications. Measures of discrimination for survival risk models include separation of survival curves, time-dependent ROC curves and Harrell's concordance index. For high-dimensional data applications, however, computing these measures as re-substitution statistics on the same data used for model development results in highly biased estimates. Most developments in methodology for survival risk modeling with high-dimensional data have utilized separate test data sets for model evaluation. Cross-validation has sometimes been used for optimization of tuning parameters. In many applications, however, the data available are too limited for effective division into training and test sets and consequently authors have often either reported re-substitution statistics or analyzed their data using binary classification methods in order to utilize familiar cross-validation. In this article we have tried to indicate how to utilize cross-validation for the evaluation of survival risk models; specifically how to compute cross-validated estimates of survival distributions for predicted risk groups and how to compute cross-validated time-dependent ROC curves. We have also discussed evaluation of the statistical significance of a survival risk model and evaluation of whether high-dimensional genomic data adds predictive accuracy to a model based on standard covariates alone.

3.
A central challenge in computational modeling of biological systems is the determination of the model parameters. Typically, only a fraction of the parameters (such as kinetic rate constants) are experimentally measured, while the rest are often fitted. The fitting process is usually based on experimental time course measurements of observables, which are used to assign parameter values that minimize some measure of the error between these measurements and the corresponding model prediction. The measurements, which can come from immunoblotting assays, fluorescent markers, etc., tend to be very noisy and taken at a limited number of time points. In this work we present a new approach to the problem of parameter selection of biological models. We show how one can use a dynamic recursive estimator, known as the extended Kalman filter, to arrive at estimates of the model parameters. The proposed method is as follows. First, we use a variation of the Kalman filter that is particularly well suited to biological applications to obtain a first guess for the unknown parameters. Second, we employ an a posteriori identifiability test to check the reliability of the estimates. Finally, we solve an optimization problem to refine the first guess in case it should not be accurate enough. The final estimates are guaranteed to be statistically consistent with the measurements. Furthermore, we show how the same tools can be used to discriminate among alternate models of the same biological process. We demonstrate these ideas by applying our methods to two examples, namely a model of the heat shock response in E. coli, and a model of a synthetic gene regulation system. The methods presented are quite general and may be applied to a wide class of biological systems where noisy measurements are used for parameter estimation or model selection.

4.
Estimating quantitative genetic parameters ideally takes place in natural populations, but relatively few studies have overcome the inherent logistical difficulties. For this reason, no estimates currently exist for the genetic basis of life-history traits in natural populations of large marine vertebrates. And yet such estimates are likely to be important given the exposure of this taxon to changing selection pressures, and the relevance of life-history traits to population productivity. We report such estimates from a long-term (1995–2007) study of lemon sharks (Negaprion brevirostris) conducted at Bimini, Bahamas. We obtained these estimates by genetically reconstructing a population pedigree (117 dams, 487 sires, and 1351 offspring) and then using an "animal model" approach to estimate quantitative genetic parameters. We find significant additive genetic (co)variance, and hence moderate heritability, for juvenile length and mass. We also find substantial maternal effects for these traits at age-0, but not age-1, confirming that genotype–phenotype interactions between mother and offspring are strongest at birth; although these effects could not be parsed into their genetic and nongenetic components. Our results suggest that human-imposed selection pressures (e.g., size-selective harvesting) might impose noteworthy evolutionary change even in large marine vertebrates. We therefore use our findings to explain how maternal effects may sometimes promote maladaptive juvenile traits, and how lemon sharks at different nursery sites may show "constrained local adaptation." We also show how single-generation pedigrees, and even simple marker-based regression methods, can provide accurate estimates of quantitative genetic parameters in at least some natural systems.

5.
In many animal populations, demographic parameters such as survival and recruitment vary markedly with age, as do parameters related to sampling, such as capture probability. Failing to account for such variation can result in biased estimates of population‐level rates. However, estimating age‐dependent survival rates can be challenging because ages of individuals are rarely known unless tagging is done at birth. For many species, it is possible to infer age based on size. In capture–recapture studies of such species, it is possible to use a growth model to infer the age at first capture of individuals. We show how to build estimates of age‐dependent survival into a capture–mark–recapture model based on data obtained in a capture–recapture study. We first show how estimates of age based on length increments closely match those based on definitive aging methods. In simulated analyses, we show that both individual ages and age‐dependent survival rates estimated from simulated data closely match true values. With our approach, we are able to estimate the age‐specific apparent survival rates of Murray and trout cod in the Murray River, Australia. Our model structure provides a flexible framework within which to investigate various aspects of how survival varies with age and will have extensions within a wide range of ecological studies of animals where age can be estimated based on size.
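As a rough illustration of inferring age from size, the sketch below inverts a von Bertalanffy growth curve. The abstract only says "a growth model", so the choice of von Bertalanffy, the function names, and the parameters `l_inf`, `k` and `t0` are all assumptions made for illustration, not the authors' model.

```python
import math


def length_at_age(age, l_inf, k, t0=0.0):
    """Von Bertalanffy length at a given age (assumed growth model)."""
    return l_inf * (1.0 - math.exp(-k * (age - t0)))


def age_from_length(length, l_inf, k, t0=0.0):
    """Invert the curve above: the age at which the expected length equals `length`."""
    if not 0.0 <= length < l_inf:
        # At or beyond the asymptotic length the inverse is undefined.
        raise ValueError("length must lie in [0, l_inf)")
    return t0 - math.log(1.0 - length / l_inf) / k
```

An individual measured at first capture could then be assigned an approximate age with `age_from_length(length, l_inf, k)`, which is the kind of quantity an age-dependent survival model would consume. Lengths close to `l_inf` give very uncertain ages, since the curve is nearly flat there.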

6.
We propose a conditional scores procedure for obtaining bias-corrected estimates of log odds ratios from matched case-control data in which one or more covariates are subject to measurement error. The approach involves conditioning on sufficient statistics for the unobservable true covariates that are treated as fixed unknown parameters. For the case of Gaussian nondifferential measurement error, we derive a set of unbiased score equations that can then be solved to estimate the log odds ratio parameters of interest. The procedure successfully removes the bias in naive estimates, and standard error estimates are obtained by resampling methods. We present an example of the procedure applied to data from a matched case-control study of prostate cancer and serum hormone levels, and we compare its performance to that of regression calibration procedures.

7.
Vasco DA. Genetics 2008, 179(2):951-963
The estimation of ancestral and current effective population sizes in expanding populations is a fundamental problem in population genetics. Recently it has become possible to scan entire genomes of several individuals within a population. These genomic data sets can be used to estimate basic population parameters such as the effective population size and population growth rate. Full-data-likelihood methods potentially offer a powerful statistical framework for inferring population genetic parameters. However, for large data sets, computationally intensive methods based upon full-likelihood estimates may encounter difficulties. First, the computational method may be prohibitively slow or difficult to implement for large data. Second, estimation bias may markedly affect the accuracy and reliability of parameter estimates, as suggested from past work on coalescent methods. To address these problems, a fast and computationally efficient least-squares method for estimating population parameters from genomic data is presented here. Instead of modeling genomic data using a full likelihood, this new approach uses an analogous function, in which the full data are replaced with a vector of summary statistics. Furthermore, these least-squares estimators may show significantly less estimation bias for growth rate and genetic diversity than a corresponding maximum-likelihood estimator for the same coalescent process. The least-squares statistics also scale up to genome-sized data sets with many nucleotides and loci. These results demonstrate that least-squares statistics will likely prove useful for nonlinear parameter estimation when the underlying population genomic processes have complex evolutionary dynamics involving interactions between mutation, selection, demography, and recombination.

8.
Summary: In the analysis of missing data, sensitivity analyses are commonly used to check the sensitivity of the parameters of interest with respect to the missing data mechanism and other distributional and modeling assumptions. In this article, we formally develop a general local influence method to carry out sensitivity analyses of minor perturbations to generalized linear models in the presence of missing covariate data. We examine two types of perturbation schemes (the single‐case and global perturbation schemes) for perturbing various assumptions in this setting. We show that the metric tensor of a perturbation manifold provides useful information for selecting an appropriate perturbation. We also develop several local influence measures to identify influential points and test model misspecification. Simulation studies are conducted to evaluate our methods, and real datasets are analyzed to illustrate the use of our local influence measures.

9.
Plasmode is a term coined several years ago to describe data sets that are derived from real data but for which some truth is known. Omic techniques, most especially microarray and genomewide association studies, have catalyzed a new zeitgeist of data sharing that is making data and data sets publicly available on an unprecedented scale. Coupling such data resources with a science of plasmode use would allow statistical methodologists to vet proposed techniques empirically (as opposed to only theoretically) and with data that are by definition realistic and representative. We illustrate the technique of empirical statistics by consideration of a common task when analyzing high dimensional data: the simultaneous testing of hundreds or thousands of hypotheses to determine which, if any, show statistical significance warranting follow-on research. The now-common practice of multiple testing in high dimensional experiment (HDE) settings has generated new methods for detecting statistically significant results. Although such methods have heretofore been subject to comparative performance analysis using simulated data, simulating data that realistically reflect data from an actual HDE remains a challenge. We describe a simulation procedure using actual data from an HDE where some truth regarding parameters of interest is known. We use the procedure to compare estimates for the proportion of true null hypotheses, the false discovery rate (FDR), and a local version of FDR obtained from 15 different statistical methods.
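To make the two quantities being compared more concrete, here is a minimal sketch of one familiar estimator of each: a Storey-type estimate of the proportion of true nulls and the Benjamini-Hochberg step-up rule for FDR control. The abstract does not name its 15 methods, so these two are stand-ins chosen for illustration, and the function names and the tuning value `lam` are assumptions.

```python
def estimate_pi0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    P-values above `lam` are assumed to come mostly from true nulls,
    which are uniform on [0, 1]; the count is rescaled accordingly.
    """
    m = len(p_values)
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level `alpha`."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Step-up rule: find the largest rank whose p-value is under its threshold.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])
```

In a plasmode setting the true status of each hypothesis is known by construction, so estimates like these can be scored against the truth rather than against a simulation model.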

10.
The imprint of demographic and selective processes on bacterial population structure needs to be evaluated as deviation from the expectations of an appropriate null neutral model. We explore the impact of varying the population mutation and recombination rates theta and rho on ideal populations, using a recently developed model of neutral drift at multiple loci. This model may be fitted to experimental data to provide estimates of these parameters, and we do so for seven bacterial species (Neisseria meningitidis, Streptococcus pneumoniae, Streptococcus pyogenes, Staphylococcus aureus, Helicobacter pylori, Burkholderia pseudomallei and Bacillus cereus), illustrating that bacterial species vary extensively in these fundamental parameters. Historically, the influence of recombination has often been estimated through its influence on the Index of Association I(A). We show that this may be relatively insensitive to changes in either mutation or recombination rates. It is known that biased sampling can lead to artificially high estimates of I(A). We therefore provide a method of precisely separating the effects of such bias and true linkage between alleles. We also demonstrate that by fitting the neutral model to experimental data, more informative and precise estimates of the relative roles of recombination and mutation may be obtained.

11.
The spatial dynamics of epidemics are fundamentally affected by patterns of human mobility. Mobile phone call detail records (CDRs) are a rich source of mobility data, and allow semi-mechanistic models of movement to be parameterised even for resource-poor settings. While the gravity model typically reproduces human movement reasonably well at the administrative level spatial scale, past studies suggest that parameter estimates vary with the level of spatial discretisation at which models are fitted. Given that privacy concerns usually preclude public release of very fine-scale movement data, such variation would be problematic for individual-based simulations of epidemic spread parameterised at a fine spatial scale. We therefore present new methods to fit fine-scale mathematical mobility models (here we implement variants of the gravity and radiation models) to spatially aggregated movement data and investigate how model parameter estimates vary with spatial resolution. We use gridded population data at 1 km resolution to derive population counts at different spatial scales (down to ∼5 km grids) and implement mobility models at each scale. Parameters are estimated from administrative-level flow data between overnight locations in Kenya and Namibia derived from CDRs: where the model spatial resolution exceeds that of the mobility data, we compare the flow data between a particular origin and destination with the sum of all model flows between cells that lie within those particular origin and destination administrative units. Clear evidence of over-dispersion supports the use of negative binomial instead of Poisson likelihood for count data with high values. Radiation models use fewer parameters than the gravity model and better predict trips between overnight locations for both considered countries. Results show that estimates for some parameters change between countries and with spatial resolution and highlight how imperfect flow data and spatial population distribution can influence model fit.
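The aggregation step described above, summing fine-scale model flows between cells so they can be compared with administrative-level data, can be sketched as follows. This is a minimal illustration that assumes a basic gravity form; the function names, the parameters `k`, `alpha`, `beta` and `gamma`, and the data layout are hypothetical, and the likelihood fitting itself is not shown.

```python
import math


def gravity_flow(pop_i, pop_j, dist_ij, k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Basic gravity-model flow from cell i to cell j (assumed functional form)."""
    return k * pop_i ** alpha * pop_j ** beta / dist_ij ** gamma


def aggregate_flows(cells, unit_of, **params):
    """Sum cell-to-cell model flows up to administrative-unit pairs.

    cells:   dict mapping cell id -> (x, y, population)
    unit_of: dict mapping cell id -> administrative unit id
    Returns a dict mapping (origin_unit, destination_unit) -> total flow.
    Pairs of cells in the same unit are skipped, which also excludes
    the zero-distance pair of a cell with itself.
    """
    totals = {}
    for i, (xi, yi, pop_i) in cells.items():
        for j, (xj, yj, pop_j) in cells.items():
            ui, uj = unit_of[i], unit_of[j]
            if ui == uj:
                continue
            dist = math.hypot(xi - xj, yi - yj)
            flow = gravity_flow(pop_i, pop_j, dist, **params)
            totals[(ui, uj)] = totals.get((ui, uj), 0.0) + flow
    return totals
```

The aggregated totals are what would be compared with the observed administrative-level counts inside a count likelihood such as the negative binomial mentioned in the abstract.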

12.
Aim: To develop a physiologically based model of the plant niche for use in species distribution modelling. Location: Europe. Methods: We link the Thornley transport resistance (TTR) model with functions which describe how the TTR model's parameters are influenced by abiotic environmental factors. The TTR model considers how carbon and nutrient uptake, and the allocation of these assimilates, influence growth. We use indirect statistical methods to estimate the model parameters from a high resolution data set on tree distribution for 22 European tree species. Results: We infer, from distribution data and abiotic forcing data, the physiological niche dimensions of 22 European tree species. We found that the model fits were reasonable (AUC: 0.79–0.964). The projected distributions were characterized by a false positive rate of 0.19 and a false negative rate of 0.12. The fitted models are used to generate projections of the environmental factors that limit the range boundaries of the study species. Main conclusions: We show that physiological models can be used to derive physiological niche dimensions from species distribution data. Future work should focus on including prior information on physiological rates into the parameter estimation process. Application of the TTR model to species distribution modelling suggests new avenues for establishing explicit links between distribution and physiology, and for generating hypotheses about how ecophysiological processes influence the distribution of plants.

13.
The standard model of the dynamic energy budget theory for metabolic organisation has variables and parameters that can be quantified using indirect methods only. We present new methods (and software) to extract food‐independent parameter values of the energy budget from food‐dependent quantities that are easy to observe, and so facilitate the practical application of the theory to enhance predictability and extrapolation. A natural sequence of 10 steps is discussed to obtain some compound parameters first, then the primary parameters, then the composition parameters and finally the thermodynamic parameters; this sequence matches a sequence of required data of increasing complexity which is discussed in detail. Many applications do not require knowledge of all parameters, and we discuss methods to extrapolate parameters from one species to another. The conversion of mass, volume and energy measures of biomass is discussed; these conversions are not trivial because biomass can change in chemical composition in particular ways thanks to different forms of homeostasis. We solve problems like “What would be the ultimate reproduction rate and the von Bertalanffy growth rate at a specific food level, given that we have measured these statistics at abundant food?” and “What would be the maximum incubation time, given the parameters of the von Bertalanffy growth curve?”. We propose a new non‐destructive method for quantifying the chemical potential and entropy of living reserve and structure, that can potentially change our ideas on the thermodynamic properties of life. We illustrate the methods using data on daphnids and molluscs.

14.
In a recent paper, I presented a sampling formula for species abundances from multiple samples according to the prevailing neutral model of biodiversity, but practical implementation for parameter estimation was only possible when these samples were from local communities that were assumed to be equally dispersal limited. Here I show how the same sampling formula can also be used to estimate model parameters using maximum likelihood when the samples have different degrees of dispersal limitation. Moreover, it performs better than other, approximate, parameter estimation approaches. I also show how to calculate errors in the parameter estimates, which has so far been largely ignored in the development of and debate on neutral theory.

15.
MOTIVATION: DNA microarrays are now capable of providing genome-wide patterns of gene expression across many different conditions. The first level of analysis of these patterns requires determining whether observed differences in expression are significant or not. Current methods are unsatisfactory due to the lack of a systematic framework that can accommodate noise, variability, and low replication often typical of microarray data. RESULTS: We develop a Bayesian probabilistic framework for microarray data analysis. At the simplest level, we model log-expression values by independent normal distributions, parameterized by corresponding means and variances with hierarchical prior distributions. We derive point estimates for both parameters and hyperparameters, and regularized expressions for the variance of each gene by combining the empirical variance with a local background variance associated with neighboring genes. An additional hyperparameter, inversely related to the number of empirical observations, determines the strength of the background variance. Simulations show that these point estimates, combined with a t-test, provide a systematic inference approach that compares favorably with a simple t-test or fold methods, and partly compensate for the lack of replication.
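The idea of combining a gene's empirical variance with a background variance before running a t-test can be sketched as below. This is only an illustration under stated assumptions: the weighted-average form, the weight `nu0`, and both function names are hypothetical, and the exact expression and degrees of freedom used in the paper may differ.

```python
import math
import statistics


def regularized_variance(values, background_var, nu0):
    """Shrink the sample variance toward a background variance.

    `nu0` acts like a count of pseudo-observations backing the background
    variance: nu0 = 0 returns the plain sample variance, while a large nu0
    returns something close to `background_var`.
    """
    n = len(values)
    s2 = statistics.variance(values)  # unbiased sample variance
    return (nu0 * background_var + (n - 1) * s2) / (nu0 + n - 1)


def regularized_t(group_a, group_b, background_var, nu0):
    """Two-sample t-like statistic built on the regularized variances."""
    var_a = regularized_variance(group_a, background_var, nu0)
    var_b = regularized_variance(group_b, background_var, nu0)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / math.sqrt(var_a / len(group_a) + var_b / len(group_b))
```

With only two or three replicates per condition, the sample variance alone is very noisy, so the background term keeps a gene with an accidentally tiny variance from producing a spuriously large statistic.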

16.
Phylogenetic comparative methods may fail to produce meaningful results when either the underlying model is inappropriate or the data contain insufficient information to inform the inference. The ability to measure the statistical power of these methods has become crucial to ensure that data quantity keeps pace with growing model complexity. Through simulations, we show that commonly applied model choice methods based on information criteria can have remarkably high error rates; this can be a problem because methods to estimate the uncertainty or power are not widely known or applied. Furthermore, the power of comparative methods can depend significantly on the structure of the data. We describe a Monte Carlo-based method which addresses both of these challenges, and show how this approach both quantifies and substantially reduces errors relative to information criteria. The method also produces meaningful confidence intervals for model parameters. We illustrate how the power to distinguish different models, such as varying levels of selection, varies both with number of taxa and structure of the phylogeny. We provide an open-source implementation in the pmc ("Phylogenetic Monte Carlo") package for the R programming language. We hope such power analysis becomes a routine part of model comparison in comparative methods.

17.
We have investigated simulation-based techniques for parameter estimation in chaotic intercellular networks. The proposed methodology combines a synchronization-based framework for parameter estimation in coupled chaotic systems with some state-of-the-art computational inference methods borrowed from the field of computational statistics. The first method is a stochastic optimization algorithm, known as accelerated random search method, and the other two techniques are based on approximate Bayesian computation. The latter is a general methodology for non-parametric inference that can be applied to practically any system of interest. The first method based on approximate Bayesian computation is a Markov Chain Monte Carlo scheme that generates a series of random parameter realizations for which a low synchronization error is guaranteed. We show that accurate parameter estimates can be obtained by averaging over these realizations. The second ABC-based technique is a Sequential Monte Carlo scheme. The algorithm generates a sequence of “populations”, i.e., sets of randomly generated parameter values, where the members of a certain population attain a synchronization error that is lower than the error attained by members of the previous population. Again, we show that accurate estimates can be obtained by averaging over the parameter values in the last population of the sequence. We have analysed how effective these methods are from a computational perspective. For the numerical simulations we have considered a network that consists of two modified repressilators with identical parameters, coupled by the fast diffusion of the autoinducer across the cell membranes.
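The core idea behind approximate Bayesian computation, keeping only parameter draws whose simulated output lies close to the observations and then averaging them, can be shown with a much simpler scheme than the ones in the abstract. The sketch below uses plain rejection sampling (not the MCMC or Sequential Monte Carlo variants described) on a toy exponential-decay model standing in for the repressilator network; the model, prior range, tolerance and all names are assumptions for illustration.

```python
import math
import random


def simulate(k, times):
    """Toy deterministic model: exponential decay with rate k."""
    return [math.exp(-k * t) for t in times]


def abc_rejection(observed, times, n_draws=2000, tol=0.05,
                  prior=(0.0, 2.0), seed=0):
    """Return the prior draws whose simulation error falls below `tol`."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        k = rng.uniform(*prior)  # draw a candidate from a uniform prior
        simulated = simulate(k, times)
        # Error measure: largest absolute gap between simulation and data.
        error = max(abs(s - o) for s, o in zip(simulated, observed))
        if error < tol:
            accepted.append(k)
    return accepted
```

Averaging the accepted values, for example `sum(accepted) / len(accepted)`, gives a point estimate in the same spirit as averaging over the low-error realizations mentioned above. The sequential variant would repeat this with a shrinking tolerance, drawing each new population near the previous one.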

18.
Wildlife populations consist of individuals that contribute disproportionately to growth and viability. Understanding a population's spatial and temporal dynamics requires estimates of abundance and demographic rates that account for this heterogeneity. Estimating these quantities can be difficult, requiring years of intensive data collection. Often, this is accomplished through the capture and recapture of individual animals, which is generally only feasible at a limited number of locations. In contrast, N‐mixture models allow for the estimation of abundance, and spatial variation in abundance, from count data alone. We extend recently developed multistate, open population N‐mixture models, which can additionally estimate demographic rates based on an organism's life history characteristics. In our extension, we develop an approach to account for the case where not all individuals can be assigned to a state during sampling. Using only state‐specific count data, we show how our model can be used to estimate local population abundance, as well as density‐dependent recruitment rates and state‐specific survival. We apply our model to a population of black‐throated blue warblers (Setophaga caerulescens) that have been surveyed for 25 years on their breeding grounds at the Hubbard Brook Experimental Forest in New Hampshire, USA. The intensive data collection efforts allow us to compare our estimates to estimates derived from capture–recapture data. Our model performed well in estimating population abundance and density‐dependent rates of annual recruitment/immigration. Estimates of local carrying capacity and per capita recruitment of yearlings were consistent with those published in other studies. However, our model moderately underestimated annual survival probability of yearling and adult females and severely underestimated survival probabilities for both of these male stages. The most accurate and precise estimates will necessarily require some amount of intensive data collection efforts (such as capture–recapture). Integrated population models that combine data from both intensive and extensive sources are likely to be the most efficient approach for estimating demographic rates at large spatial and temporal scales.

19.
Cadigan NG. Biometrics 2006, 62(3):713-720
We present local influence diagnostics to measure the sensitivity of a biological limit reference point (LRP) estimated from fitting a model to stock and recruitment data. LRPs are low levels of stock size that the management of commercial fisheries should avoid with high probability. The LRP we examine is the stock size at which recruitment is 50% of the maximum (S(50%)). We derive analytic equations to describe the effects on S(50%) of changing the weight that observations are given in estimation. We derive equations for the Ricker, Beverton-Holt, and hockey-stick stock-recruit models, and four estimation methods including the error sums of squares method on log responses and three quasi-likelihood methods. We conclude from case studies that the hockey-stick model produces the most robust estimates.
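To show what S(50%) means for each of the three stock-recruit models, the sketch below computes it under common textbook parameterizations. These parameterizations, the function names, and the choice of the ascending limb for the Ricker curve are assumptions made for illustration; the paper's own definitions and its influence equations are not reproduced here.

```python
import math


def s50_hockey_stick(delta):
    """R = alpha * min(S, delta): maximum alpha*delta, half of it at delta/2."""
    return 0.5 * delta


def s50_beverton_holt(beta):
    """R = alpha * S / (beta + S): asymptote alpha, half of it at S = beta."""
    return beta


def s50_ricker(beta, iterations=200):
    """R = alpha * S * exp(-beta * S): the peak alpha/(beta*e) is at S = 1/beta.

    Relative recruitment is beta*S*exp(1 - beta*S); bisection finds where
    it equals 0.5 on the ascending limb 0 < S < 1/beta.
    """
    lo, hi = 0.0, 1.0 / beta
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if beta * mid * math.exp(1.0 - beta * mid) < 0.5:
            lo = mid  # still below half of maximum recruitment
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these forms, S(50%) depends only on the shape parameter (`delta` or `beta`) and not on the scale `alpha`, so the sensitivity of the reference point to individual observations acts through the shape estimate.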

20.
Transect count data form the basis of many butterfly and other insect monitoring programs worldwide. A clear understanding of the limitations of such datasets, including the potential for biases in the statistical methods used to analyze them, is therefore crucial. The classical Zonneveld model (CZ) can extract estimates of a suite of demographic parameters from transect count datasets, and has also been used in theoretical analyses of protandry and reproductive asynchrony. The CZ relies on strong assumptions about the emergence and death processes underlying observed transect count datasets. Though reasonable as a starting place, a growing body of empirical evidence suggests these assumptions will, in many cases, not hold. Here, I explore how violations of these assumptions bias CZ-based estimates of two key population parameters: total population size and mean individual lifespan. To do this, I generalize the Zonneveld model by relaxing the symmetrical emergence distribution and constant death rate assumptions such that the generalized models contain the CZ as a special case. Using the generalized models as data generating processes, I then show that the CZ is able to closely mimic the shape of the abundance time course produced by either variant of the generalized model under a wide range of conditions, but produces highly biased estimates of population size and mean lifespan in doing so. My analysis therefore demonstrates both that the CZ is not robust to violations of its emergence and death assumptions, and that a good observed fit to transect count data does not mean these assumptions are satisfied.
