Similar Literature
20 similar documents found.
1.
Diseased animals may exhibit behavioral shifts that increase or decrease their probability of being randomly sampled. In harvest-based sampling approaches, animal movements, changes in habitat utilization, changes in breeding behaviors during harvest periods, or differential susceptibility to harvest via behaviors like hiding or decreased sensitivity to stimuli may result in a non-random sample that biases prevalence estimates. We present a method that can be used to determine whether bias exists in prevalence estimates from harvest samples. Using data from harvested mule deer (Odocoileus hemionus) sampled in northcentral Colorado (USA) during the fall hunting seasons of 1996-98 and Akaike's information criterion (AIC) model selection, we detected within-year trends indicating potential bias in harvest-based prevalence estimates for chronic wasting disease (CWD). The proportion of CWD-positive deer harvested increased slightly over time within a year. We speculate that differential susceptibility to harvest or breeding-season movements may explain the positive trend in the proportion of CWD-positive deer harvested during fall hunting seasons. Detection of bias may provide information about temporal patterns of a disease, suggest biological hypotheses that could further understanding of a disease, or provide wildlife managers with information about when diseased animals are more or less likely to be harvested. Although AIC model selection can be useful for detecting bias in data, it has limited utility in determining underlying causes of bias. In cases where bias is detected using such model selection methods, design-based methods (i.e., experimental manipulation) may be necessary to assign causality.
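A minimal sketch of the kind of within-season trend test described in this abstract: an intercept-only binomial GLM (constant prevalence) is compared by AIC against a model with a linear within-season trend on the logit scale. All data here are simulated; the variable names (day, positive) are hypothetical and not from the study.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
day = rng.uniform(0, 60, n)                      # day within the hunting season
p = 1 / (1 + np.exp(-(-2.5 + 0.02 * day)))       # weak positive within-year trend
positive = rng.binomial(1, p)

# Model 1: constant prevalence (intercept only)
m_const = sm.GLM(positive, np.ones((n, 1)), family=sm.families.Binomial()).fit()
# Model 2: prevalence changes linearly (on the logit scale) through the season
m_trend = sm.GLM(positive, sm.add_constant(day), family=sm.families.Binomial()).fit()

print(f"AIC constant: {m_const.aic:.1f}  AIC trend: {m_trend.aic:.1f}")
# If the trend model has clearly lower AIC, harvest timing is informative and a
# season-wide pooled prevalence estimate may be biased.
```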

2.
Ko H  Davidian M 《Biometrics》2000,56(2):368-375
The nonlinear mixed effects model is used to represent data in pharmacokinetics, viral dynamics, and other areas where an objective is to elucidate associations among individual-specific model parameters and covariates; however, covariates may be measured with error. For additive measurement error, we show that substituting mismeasured covariates for true covariates may lead to biased estimators for fixed effects and random effects covariance parameters, while regression calibration may eliminate bias in fixed effects but fail to correct the bias in covariance parameters. We develop methods to account for measurement error that correct this bias and may be implemented with standard software, and we demonstrate their utility via simulation and application to data from a study of HIV dynamics.
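A simplified illustration of the core mechanism discussed here, using plain linear regression rather than the paper's nonlinear mixed-effects setting: substituting a covariate measured with additive error attenuates the slope toward zero, and regression calibration (replacing W with E[X|W]) removes that bias. All quantities are simulated; the measurement-error variance is assumed known, e.g. from replicate measurements.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(0, 1, n)          # true covariate (unobserved in practice)
w = x + rng.normal(0, 0.7, n)    # observed covariate with additive error
y = 2.0 * x + rng.normal(0, 1, n)

beta_naive = np.cov(w, y)[0, 1] / np.var(w)      # attenuated toward 0

# Regression calibration: E[X|W] = mu + lambda * (W - mu), with the
# reliability ratio lambda = (var(W) - var(U)) / var(W).
var_u = 0.7 ** 2                                 # assumed known error variance
lam = (np.var(w) - var_u) / np.var(w)
x_hat = w.mean() + lam * (w - w.mean())
beta_rc = np.cov(x_hat, y)[0, 1] / np.var(x_hat)

print(f"true: 2.0  naive: {beta_naive:.2f}  calibrated: {beta_rc:.2f}")
```

As the abstract notes, this fix works for fixed effects but does not by itself repair bias in random-effects covariance parameters, which is what the paper's methods address.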

3.
4.
Although adaptation to a hunting-gathering life is a central hypothesis for understanding human nature, studies directly testing the hypothesis have been lacking. In the present study, we showed elementary school pupils and university students a film depicting hunting and housework by African hunter-gatherers and examined their memories of it. Among both pupils and students, males gave a higher percentage of correct answers to hunting-related questions, while females gave a higher percentage on housework-related questions. The results suggest a male learning bias toward hunting and support the hunting-gathering hypothesis.

5.
6.
Comparative genome hybridization (CGH) is a laboratory method to measure gains and losses of chromosomal regions in tumor cells. It is believed that DNA gains and losses in tumor cells do not occur entirely at random, but partly through some flow of causality. Models that relate tumor progression to the occurrence of DNA gains and losses could be very useful in hunting cancer genes and in cancer diagnosis. We lay some mathematical foundations for inferring a model of tumor progression from a CGH data set. We consider a class of tree models that are more general than a path model that has been developed for colorectal cancer. We derive a tree model inference algorithm based on the idea of a maximum-weight branching in a graph, and we show that under plausible assumptions our algorithm infers the correct tree. We have implemented our methods in software, and we illustrate with a CGH data set for renal cancer.
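A sketch of the maximum-weight branching idea, assuming networkx's branching routines: edges between CGH events are weighted by a log co-occurrence score (a simplified stand-in for the paper's exact weight function), and the maximum-weight branching over the resulting digraph gives a candidate progression tree. Event names and the 0/1 event matrix are invented for illustration.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.tree.branchings import maximum_branching

rng = np.random.default_rng(2)
events = ["-3p", "+5q", "-4q", "+17q"]
X = rng.binomial(1, 0.4, size=(200, len(events)))   # tumors x events (0/1)

p = X.mean(axis=0)                                   # marginal event frequencies
G = nx.DiGraph()
for i, ei in enumerate(events):
    for j, ej in enumerate(events):
        if i == j:
            continue
        p_ij = np.mean(X[:, i] * X[:, j])
        if p_ij > 0:
            # Higher weight when events i and j co-occur more than expected
            # under independence.
            G.add_edge(ei, ej, weight=np.log(p_ij / (p[i] * p[j])))

B = maximum_branching(G, attr="weight")              # Edmonds-style branching
print(sorted(B.edges(data="weight")))
```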

7.
Harvesting represents a major source of mortality in many deer populations. The extent to which harvesting is selective for specific traits is important for understanding contemporary evolutionary processes. In addition, since such data are frequently used in life-history studies, it is important to know the pattern of selectivity as a potential source of bias. Recently, it was demonstrated that different hunting methods selected for different body weights in red deer (Cervus elaphus), but little insight was offered into why this occurs. In this study, we show that foreign trophy stalkers select for larger antlers when hunting roe deer (Capreolus capreolus) than local hunters do, but that close to half of the difference in selectivity was due to foreigners hunting earlier in the season and in locations with larger males. The relationship between antler size and age was nevertheless fairly similar whether deer were shot by foreign or local hunters.

8.
He Z  Sun D 《Biometrics》2000,56(2):360-367
A Bayesian hierarchical generalized linear model is used to estimate hunting success rates at the subarea level for postseason harvest surveys. The model includes fixed week effects, random geographic effects, and spatial correlations between neighboring subareas. The computation is done by Gibbs sampling and adaptive rejection sampling techniques. The method is illustrated using data from the Missouri Turkey Hunting Survey in the spring of 1996. Bayesian model selection methods are used to demonstrate that there are significant week differences and spatial correlations of hunting success rates among counties. The Bayesian estimates are also shown to be quite robust to changes in the hyperparameters.
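A minimal sketch of a hierarchical binomial model of this general shape, with fixed week effects and random county effects, written in PyMC. The paper's Gibbs/adaptive-rejection sampler and its spatial (neighboring-subarea) correlation term are replaced here by PyMC's default NUTS sampler and independent county effects; data and names are simulated.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(3)
n_county, n_week = 30, 3
county = np.repeat(np.arange(n_county), n_week)
week = np.tile(np.arange(n_week), n_county)
hunters = rng.integers(20, 100, size=county.size)
success = rng.binomial(hunters, 0.25)

with pm.Model():
    beta_week = pm.Normal("beta_week", 0, 1, shape=n_week)   # fixed week effects
    sigma = pm.HalfNormal("sigma", 1.0)
    u = pm.Normal("u", 0, sigma, shape=n_county)             # county random effects
    logit_p = beta_week[week] + u[county]
    pm.Binomial("obs", n=hunters, p=pm.math.invlogit(logit_p), observed=success)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

print(idata.posterior["beta_week"].mean(dim=("chain", "draw")).values)
```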

9.
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified.
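A sketch of the propensity-score stratification idea described here: model P(verified | test result, covariates) separately for test-positives and test-negatives, split each group into propensity strata, and average the within-stratum disease rates among the verified. The data are simulated under MAR; variable names are illustrative, and quintile strata are one simple choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 50_000
z = rng.normal(size=n)                                    # covariate
d = rng.binomial(1, 1 / (1 + np.exp(-(-1 + z))))          # true disease status
t = rng.binomial(1, np.where(d == 1, 0.85, 0.10))         # test result
# Verification depends on test result and covariate only (MAR), not on d.
v = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 2 * t + 0.8 * z))))

def stratified_rate(mask):
    """P(D=1) within a test-result group, corrected by propensity strata.
    Disease status is used only where v == 1, as in a real verified sample."""
    Xg, vg, dg = z[mask].reshape(-1, 1), v[mask], d[mask]
    ps = LogisticRegression().fit(Xg, vg).predict_proba(Xg)[:, 1]
    strata = np.digitize(ps, np.quantile(ps, [0.2, 0.4, 0.6, 0.8]))
    rate = 0.0
    for s in range(5):
        in_s = strata == s
        rate += in_s.mean() * dg[in_s & (vg == 1)].mean()  # stratum-weighted
    return rate

p_t = t.mean()
p_d_pos, p_d_neg = stratified_rate(t == 1), stratified_rate(t == 0)
p_d = p_d_pos * p_t + p_d_neg * (1 - p_t)
sens = p_d_pos * p_t / p_d                     # P(T=1 | D=1) via Bayes' rule
spec = (1 - p_d_neg) * (1 - p_t) / (1 - p_d)   # P(T=0 | D=0)
print(f"sensitivity ~ {sens:.2f} (true 0.85), specificity ~ {spec:.2f} (true 0.90)")
```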

10.
The value of an ecological indicator is no better than the uncertainty associated with its estimate. Nevertheless, indicator uncertainty is seldom estimated, even though legislative frameworks such as the European Water Framework Directive stress that the confidence of an assessment should be quantified. We introduce a general framework for quantifying uncertainties associated with indicators employed to assess ecological status in waterbodies. The framework is illustrated with two examples: eelgrass shoot density and chlorophyll a in coastal ecosystems. Aquatic monitoring data vary over time and space; these variations can be only partially described by fixed parameters, and the remaining variation is deemed random. These spatial and temporal variations can be partitioned into uncertainty components operating at different scales. Furthermore, different methods of sampling and analysis, as well as the people involved in the monitoring, introduce additional uncertainty. We have outlined 18 different sources of variation that affect monitoring data to a varying degree and are relevant to consider when quantifying the uncertainty of an indicator calculated from monitoring data. However, in most cases it is not possible to estimate all relevant sources of uncertainty from monitoring data from a single ecosystem, and those uncertainty components that can be quantified will not be well determined due to the lack of replication at different levels of the random variations (e.g. number of stations, number of years, and number of people). For example, spatial variations cannot be determined from datasets with just one station. Therefore, we recommend that random variations are estimated from a larger dataset, by pooling observations from multiple ecosystems with similar characteristics. We also recommend accounting for predictable patterns in time and space using parametric approaches in order to reduce the magnitude of the unpredictable random components and reduce potential bias introduced by heterogeneous monitoring across time. We propose to use robust parameter estimates for both fixed and random variations, determined from a large pooled dataset and assumed common across the range of ecosystems, and to estimate a limited subset of parameters from ecosystem-specific data. Partitioning the random variation onto multiple uncertainty components is important to obtain correct estimates of the ecological indicator variance, and the magnitude of the different components provides useful information for improving the methods applied and the design of monitoring programs. The proposed framework allows comparing different indicators based on their precision relative to the cost of monitoring.
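A minimal sketch of the variance-partitioning idea at the heart of this framework: a mixed model splits log chlorophyll-a variance into an among-station component and a residual (within-station) component. Only one random level is shown; the paper's framework handles many more components (years, methods, people). Data and names are simulated and hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
stations, years = 8, 10
df = pd.DataFrame([
    {"station": s, "year": y,
     "log_chl": 1.0 + station_eff + rng.normal(0, 0.5)}
    for s in range(stations)
    for station_eff in [rng.normal(0, 0.3)]   # one station effect per station
    for y in range(years)
])

m = smf.mixedlm("log_chl ~ 1", df, groups=df["station"]).fit()
var_station = float(m.cov_re.iloc[0, 0])   # among-station variance component
var_resid = m.scale                        # residual (within-station) variance
total = var_station + var_resid
print(f"station: {var_station / total:.0%}  residual: {var_resid / total:.0%}")
```

The relative magnitude of such components is exactly the kind of information the abstract says should guide monitoring design, e.g. whether to add stations or sample years.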

11.
Genomic sequences obtained through high-throughput sequencing are not uniformly distributed across the genome. For example, sequencing data of total genomic DNA show significant, yet unexpected enrichments on promoters and exons. This systematic bias is a particular problem for techniques such as chromatin immunoprecipitation, where the signal for a target factor is plotted across genomic features. We have focused on data obtained from Illumina's Genome Analyser platform, where at least three factors contribute to sequence bias: GC content, mappability of sequencing reads, and regional biases that might be generated by local structure. We show that relying on input control as a normalizer is not generally appropriate due to sample to sample variation in bias. To correct sequence bias, we present BEADS (bias elimination algorithm for deep sequencing), a simple three-step normalization scheme that successfully unmasks real binding patterns in ChIP-seq data. We suggest that this procedure be done routinely prior to data interpretation and downstream analyses.
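A simplified sketch of one ingredient of this kind of correction, not BEADS itself: rescale per-bin read counts by the mean count of bins with similar GC content, so that counts no longer trend with GC. BEADS combines GC, mappability, and regional corrections in three steps; only a GC step is illustrated here, on simulated bins.

```python
import numpy as np

rng = np.random.default_rng(6)
n_bins = 10_000
gc = rng.uniform(0.3, 0.6, n_bins)                  # GC fraction per genomic bin
true_signal = rng.gamma(2.0, 1.0, n_bins)
reads = rng.poisson(true_signal * np.exp(3 * (gc - 0.45)))  # GC-biased counts

# Estimate expected count as a function of GC by averaging within GC strata.
strata = np.digitize(gc, np.linspace(0.3, 0.6, 20))
lookup = {s: reads[strata == s].mean() for s in np.unique(strata)}
corrected = reads / np.array([lookup[s] for s in strata])

# After correction, counts should no longer correlate with GC content.
print(np.corrcoef(gc, reads)[0, 1], np.corrcoef(gc, corrected)[0, 1])
```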

12.
13.
Yan Li  Barry I. Graubard 《Biometrics》2009,65(4):1096-1104
For studies on population genetics, the use of representative random samples of the target population can avoid ascertainment bias. Genetic variation data from over a hundred genes were collected in a U.S. nationally representative sample in the Third National Health and Nutrition Examination Survey (NHANES III). Surveys such as the NHANES have complex stratified multistage cluster sample designs with sample weighting that can inflate variances and alter the expectations of test statistics. Thus, classical statistical tests of Hardy–Weinberg equilibrium (HWE) and homogeneity of HW disequilibrium (HHWD) for simple random samples are not suitable for data from complex samples. We propose using Wald tests for HWE and generalized score tests for HHWD that have been modified for complex samples. Monte Carlo simulation studies are used to investigate the finite sample properties of the proposed tests. Rao–Scott corrections applied to the tests were found to improve their type I error properties. Our methods are applied to the NHANES III genetic data for three loci involved in metabolizing lead in the body.
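A rough sketch of the design-effect intuition behind Rao–Scott corrections: a standard HWE chi-square computed from weighted genotype counts is deflated by an assumed design effect before comparison with a chi-square(1) reference. The paper's actual tests are Wald and generalized score tests with full linearization variance estimates; this first-order-style correction with a fixed design effect is only illustrative, and the counts are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

# Weighted genotype counts (AA, Aa, aa) from a hypothetical complex survey
w_counts = np.array([4200.0, 3600.0, 1100.0])
n_eff = w_counts.sum()
p = (2 * w_counts[0] + w_counts[1]) / (2 * n_eff)     # allele frequency
expected = n_eff * np.array([p**2, 2 * p * (1 - p), (1 - p) ** 2])
x2 = np.sum((w_counts - expected) ** 2 / expected)    # naive HWE chi-square

deff = 1.8                 # assumed design effect from clustering and weighting
x2_rs = x2 / deff          # first-order Rao-Scott style adjustment
print(x2, x2_rs, chi2.sf(x2_rs, df=1))
```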

14.
Patterns that resemble strongly skewed size distributions are frequently observed in ecology. A typical example is the distribution of tree stem diameters. Empirical tests of ecological theories predicting their parameters have been conducted, but the results are difficult to interpret because the statistical methods applied to fit such decaying size distributions vary. In addition, binning of field data as well as measurement errors might bias parameter estimates. Here, we compare three different methods for parameter estimation: the common maximum likelihood estimation (MLE) and two modified types of MLE that correct for binning of observations or for random measurement errors. We test whether three typical frequency distributions, namely the power-law, negative exponential and Weibull distributions, can be precisely identified, and how parameter estimates are biased when observations are additionally either binned or contain measurement error. We show that uncorrected MLE already loses the ability to discern functional form and parameters at relatively small levels of uncertainty. The modified MLE methods that consider such uncertainties (either binning or measurement error) are comparatively much more robust. We conclude that it is important to reduce binning of observations, if possible, and to quantify observation accuracy in empirical studies for fitting strongly skewed size distributions. In general, modified MLE methods that correct for binning or measurement errors can be applied to ensure reliable results.
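A sketch of the binning-corrected MLE discussed here, for a negative exponential size distribution: instead of evaluating the density at bin midpoints, the likelihood contribution of each observation is the probability mass of its bin, CDF(upper) - CDF(lower). Stem diameters and the 5-unit bin width are simulated and hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
true_rate = 0.08
diam = rng.exponential(1 / true_rate, 5000)
width = 5.0                                  # diameters recorded in 5-unit bins
lower = np.floor(diam / width) * width
upper = lower + width

def nll_binned(rate):
    # P(lower <= X < upper) for an exponential with the given rate
    p = np.exp(-rate * lower) - np.exp(-rate * upper)
    return -np.sum(np.log(p))

def nll_naive(rate):
    # Uncorrected MLE treats bin midpoints as exact observations
    mid = lower + width / 2
    return -np.sum(np.log(rate) - rate * mid)

fit_b = minimize_scalar(nll_binned, bounds=(1e-4, 1.0), method="bounded")
fit_n = minimize_scalar(nll_naive, bounds=(1e-4, 1.0), method="bounded")
print(f"true {true_rate}, binned MLE {fit_b.x:.4f}, naive MLE {fit_n.x:.4f}")
```

The same interval-likelihood construction extends to the power-law and Weibull cases by swapping in the corresponding CDF.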

15.
The use of methodologies such as RAPD and AFLP for studying genetic variation in natural populations is widespread in the ecology community. Because data generated using these methods exhibit dominance, their statistical treatment is less straightforward. Several estimators have been proposed for estimating population genetic parameters, assuming simple random sampling and the Hardy-Weinberg (HW) law. The merits of these estimators remain unclear because no comparative studies of their theoretical properties have been carried out. Furthermore, ascertainment bias has not been explicitly modelled. Here, we present a comparison of a set of candidate estimators of null allele frequency (q), locus-specific heterozygosity (h) and average heterozygosity (h̄) in terms of their bias, standard error, and root mean square error (RMSE). For estimating q and h, we show that none of the estimators considered has the least RMSE over the parameter space. Our proposed zero-correction procedure, however, generally leads to estimators with improved RMSE. Assuming a beta model for the distribution of null homozygote proportions, we show how correction for ascertainment bias can be carried out using a linear transform of the sample average of h and the truncated beta-binomial likelihood. Simulation results indicate that the maximum likelihood and empirical Bayes estimators of h̄ have negligible bias and similar RMSE. Ascertainment bias in estimators of h̄ is most pronounced when the beta distribution is J-shaped and negligible when the latter is inverse J-shaped. The validity of the current findings depends importantly on the HW assumption, a point that we illustrate using data from two published studies.
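A sketch of the baseline estimator in this setting: for a dominant marker under Hardy-Weinberg, the null-homozygote proportion is q², so q is estimated by a square root, and a simple zero-correction keeps the estimate away from 0 when no null homozygotes are observed. The (x + 0.5)/(n + 1) correction shown is illustrative; the paper's exact zero-correction procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(8)
q_true, n = 0.2, 60
null_homs = rng.binomial(n, q_true**2, size=10_000)  # null-homozygote counts

q_naive = np.sqrt(null_homs / n)                     # biased downward by Jensen
q_zero_corr = np.sqrt((null_homs + 0.5) / (n + 1))   # shrinks away from zero

h_hat = 2 * q_naive * (1 - q_naive)                  # locus heterozygosity
print(q_naive.mean(), q_zero_corr.mean(), h_hat.mean())
```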

16.
Mapping of environmental variables often relies on map accuracy assessment through cross-validation with the data used for calibrating the underlying mapping model. When the data points are spatially clustered, conventional cross-validation leads to optimistically biased estimates of map accuracy. Several papers have promoted spatial cross-validation as a means to tackle this over-optimism. Many of these papers blame spatial autocorrelation as the cause of the bias and propagate the widespread misconception that spatial proximity of calibration points to validation points invalidates classical statistical validation of maps. We present and evaluate alternative cross-validation approaches for assessing map accuracy from clustered sample data. The first method uses inverse sampling-intensity weighting to correct for selection bias. Sampling intensity is estimated by a two-dimensional kernel approach. The two other approaches are model-based methods rooted in geostatistics, where the first assumes homogeneity of residual variance over the study area whilst the second accounts for heteroscedasticity as a function of the sampling intensity. The methods were tested and compared against conventional k-fold cross-validation and blocked spatial cross-validation to estimate map accuracy metrics of above-ground biomass and soil organic carbon stock maps covering western Europe. Results acquired over 100 realizations of five sampling designs ranging from non-clustered to strongly clustered confirmed that inverse sampling-intensity weighting and the heteroscedastic model-based method had smaller bias than conventional and spatial cross-validation for all but the most strongly clustered design. For the strongly clustered design, where large portions of the maps were predicted by extrapolation, blocked spatial cross-validation was closest to the reference map accuracy metrics, but still biased. For such cases, extrapolation is best avoided by additional sampling or limitation of the prediction area. Weighted cross-validation is recommended for moderately clustered samples, while conventional random cross-validation suits fairly regularly spread samples.
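A sketch of the inverse sampling-intensity weighting described here: estimate the sampling intensity of the (clustered) points with a two-dimensional Gaussian kernel, then weight cross-validation errors by 1/intensity so densely sampled areas do not dominate the accuracy estimate. Coordinates and errors are simulated; in practice the errors would be observed-minus-predicted values at each validation point.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(9)
# Clustered sample: points concentrated around a few centers
centers = rng.uniform(0, 100, size=(5, 2))
xy = centers[rng.integers(0, 5, 300)] + rng.normal(0, 3, size=(300, 2))

intensity = gaussian_kde(xy.T)(xy.T)        # 2-D kernel sampling intensity
w = 1.0 / intensity
w /= w.sum()                                # normalized inverse-intensity weights

err = rng.normal(0, 1, 300)                 # toy cross-validation errors
rmse_unweighted = np.sqrt(np.mean(err**2))
rmse_weighted = np.sqrt(np.sum(w * err**2))
print(rmse_unweighted, rmse_weighted)
```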

17.
Phylogenomic subsampling is a procedure by which small sets of loci are selected from large genome-scale data sets and used for phylogenetic inference. This step is often motivated either by computational limitations associated with the use of complex inference methods or as a means of testing the robustness of phylogenetic results by discarding loci that are deemed potentially misleading. Although many alternative methods of phylogenomic subsampling have been proposed, little effort has gone into comparing their behavior across different data sets. Here, I calculate multiple gene properties for a range of phylogenomic data sets spanning animal, fungal, and plant clades, uncovering a remarkable predictability in their patterns of covariance. I also show how these patterns provide a means for ordering loci by both their rate of evolution and their relative phylogenetic usefulness. This method of retrieving phylogenetically useful loci is found to be among the top-performing approaches when compared with alternative subsampling protocols. Relatively common approaches such as minimizing potential sources of systematic bias or increasing the clock-likeness of the data are found to fare worse than selecting loci at random. Likewise, the general utility of rate-based subsampling is found to be limited: loci evolving at both low and high rates are among the least effective, and even those evolving at optimal rates can still differ widely in usefulness. This study shows that many common subsampling approaches introduce unintended effects in off-target gene properties and proposes an alternative multivariate method that simultaneously optimizes phylogenetic signal while controlling for known sources of bias.

18.
Information on the abundance of the Italian populations of black grouse (Lyrurus tetrix), Alpine rock ptarmigan (Lagopus muta helvetica) and Alpine rock partridge (Alectoris graeca saxatilis) relies only on extrapolations of local data to the national scale, since there is no standardized national survey. Consequently, their status is virtually unknown. We performed a first assessment of medium-term (1996–2014) population trends for these species, using and comparing post-breeding count and bag data at the hunting-district scale. These data were collected from the various authorities in charge of wildlife management and allowed us to test the influence of hunting policies on the estimated trends. Rock partridge showed a stable trend with numbers fluctuating between years, while there was evidence of a severe decline for rock ptarmigan. No general conclusion could be drawn for the black grouse, as we detected a lack of consistency between count and bag data. Counts were greatly overdispersed as a result of uneven count effort among hunting districts. Adding the game management authority as a model covariate resulted in more robust trend estimates, suggesting a significant effect of the different policies, which also emerged as similar hunting pressure across species within authorities. Variation in hunting effort over time was instead negligible. Species-specific game-management bias is discussed. Our results highlight the need for a survey scheme or guidelines applied uniformly at the national scale.

19.
Next-generation sequencing technologies have generated, and continue to produce, an increasingly large corpus of biological data. The data generated are inherently compositional, as they convey only relative information dependent upon the capacity of the instrument, experimental design and technical bias. There is considerable information to be gained through network analysis by studying the interactions between components within a system. Network theory methods using compositional data are powerful approaches for quantifying relationships between biological components and their relevance to phenotype, environmental conditions or other external variables. However, many of the statistical assumptions used for network analysis are not designed for compositional data and can bias downstream results. In this mini-review, we illustrate the utility of network theory in biological systems and investigate modern techniques while introducing researchers to frameworks for implementation. We overview (1) compositional data analysis, (2) data transformations and (3) network theory, along with insight on a battery of network types including static, temporal, sample-specific and differential networks. The intention of this mini-review is not to provide a comprehensive overview of network methods, but rather to introduce microbiology researchers to (semi-)unsupervised data-driven approaches for inferring latent structures that may give insight into biological phenomena or the abstract mechanics of complex systems.
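A sketch of a standard compositional workaround from this literature: apply a centered log-ratio (CLR) transform to relative-abundance data before computing associations, since correlations on raw proportions are spurious. The downstream step here is a plain thresholded correlation graph, one simple choice among the network types the review discusses; counts, the pseudocount, and the 0.3 cutoff are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(10)
counts = rng.poisson(rng.gamma(2, 10, size=(50, 8))) + 1   # samples x taxa, no zeros
props = counts / counts.sum(axis=1, keepdims=True)         # compositional data

# Centered log-ratio: log of each part relative to the sample's geometric mean
clr = np.log(props) - np.log(props).mean(axis=1, keepdims=True)
corr = np.corrcoef(clr.T)                                  # taxa x taxa

# Keep edges whose |correlation| exceeds a threshold
edges = [(i, j, round(corr[i, j], 2))
         for i in range(corr.shape[0]) for j in range(i + 1, corr.shape[0])
         if abs(corr[i, j]) > 0.3]
print(edges)
```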

20.
To overcome random experimental variation, even for simple screens, data from multiple microarrays have to be combined. There are, however, systematic differences between arrays, and any bias remaining after experimental measures to ensure consistency needs to be controlled for. It is often difficult to make the right choice of data transformation and normalisation methods to achieve this end. In this tutorial paper we review the problem and a selection of solutions, explaining the basic principles behind normalisation procedures and providing guidance for their application.
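A sketch of quantile normalisation, one of the standard between-array procedures tutorials of this kind cover: every array is forced to share the same empirical intensity distribution by averaging across arrays at each rank. Intensities are simulated with deliberately shifted scales; real pipelines would typically apply background correction and log transformation first.

```python
import numpy as np

rng = np.random.default_rng(11)
arrays = rng.lognormal(mean=np.array([0.0, 0.3, 0.6]), sigma=1.0,
                       size=(1000, 3))             # genes x arrays, shifted scales

order = np.argsort(arrays, axis=0)                 # rank order per array
ranked = np.take_along_axis(arrays, order, axis=0) # each column sorted
mean_dist = ranked.mean(axis=1)                    # reference distribution per rank

# Write the reference value back to each gene according to its rank in its array
normalized = np.empty_like(arrays)
np.put_along_axis(normalized, order, mean_dist[:, None], axis=0)

print(arrays.mean(axis=0), normalized.mean(axis=0))  # column means now agree
```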
