Similar literature
20 similar articles found.
1.
Repeatability (more precisely the common measure of repeatability, the intra‐class correlation coefficient, ICC) is an important index for quantifying the accuracy of measurements and the constancy of phenotypes. It is the proportion of phenotypic variation that can be attributed to between‐subject (or between‐group) variation. As a consequence, the non‐repeatable fraction of phenotypic variation is the sum of measurement error and phenotypic flexibility. There are several ways to estimate repeatability for Gaussian data, but there are no formal agreements on how repeatability should be calculated for non‐Gaussian data (e.g. binary, proportion and count data). In addition to point estimates, appropriate uncertainty estimates (standard errors and confidence intervals) and statistical significance for repeatability estimates are required regardless of the types of data. We review the methods for calculating repeatability and the associated statistics for Gaussian and non‐Gaussian data. For Gaussian data, we present three common approaches for estimating repeatability: correlation‐based, analysis of variance (ANOVA)‐based and linear mixed‐effects model (LMM)‐based methods, while for non‐Gaussian data, we focus on generalised linear mixed‐effects models (GLMM) that allow the estimation of repeatability on the original and on the underlying latent scale. We also address a number of methods for calculating standard errors, confidence intervals and statistical significance; the most accurate and recommended methods are parametric bootstrapping, randomisation tests and Bayesian approaches. We advocate the use of LMM‐ and GLMM‐based approaches mainly because of the ease with which confounding variables can be controlled for. Furthermore, we compare two types of repeatability (ordinary repeatability and extrapolated repeatability) in relation to narrow‐sense heritability. 
This review serves as a collection of guidelines and recommendations for biologists to calculate repeatability and heritability from both Gaussian and non‐Gaussian data.
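For Gaussian repeated measures, the ANOVA route to repeatability reduces to two mean squares. A minimal sketch (our own function, not code from the review; it recovers the between-group variance component from the among/within mean squares and returns the ICC):

```python
import numpy as np

def icc_anova(groups):
    """One-way ANOVA estimate of the intra-class correlation (repeatability).

    groups: list of 1-D arrays, one per individual (repeated measures).
    Returns ICC = s2_A / (s2_A + s2_W), using the classical mean-square
    decomposition; works for balanced or unbalanced group sizes.
    """
    k = len(groups)
    ns = np.array([len(g) for g in groups])
    N = ns.sum()
    grand = np.concatenate(groups).mean()
    ms_among = sum(n * (g.mean() - grand) ** 2 for n, g in zip(ns, groups)) / (k - 1)
    ms_within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - k)
    n0 = (N - (ns ** 2).sum() / N) / (k - 1)   # effective group size
    s2_a = max((ms_among - ms_within) / n0, 0.0)
    return s2_a / (s2_a + ms_within)
```

With an LMM the same quantity would instead come from fitted variance components, which is the approach the review recommends when confounding variables must be controlled for.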

2.
The meta‐analysis of diagnostic accuracy studies is often of interest in screening programs for many diseases. The typical summary statistics for studies chosen for a diagnostic accuracy meta‐analysis are often two dimensional: sensitivities and specificities. The common statistical analysis approach for the meta‐analysis of diagnostic studies is based on the bivariate generalized linear‐mixed model (BGLMM), which has study‐specific interpretations. In this article, we present a population‐averaged (PA) model using generalized estimating equations (GEE) for making inference on mean specificity and sensitivity of a diagnostic test in the population represented by the meta‐analytic studies. We also derive the marginalized counterparts of the regression parameters from the BGLMM. We illustrate the proposed PA approach through two dataset examples and compare performance of estimators of the marginal regression parameters from the PA model with those of the marginalized regression parameters from the BGLMM through Monte Carlo simulation studies. Overall, both the marginalized BGLMM and GEE with sandwich standard errors maintained nominal 95% confidence interval coverage levels for mean specificity and mean sensitivity in meta‐analyses of 25 or more studies, even under misspecification of the covariance structure of the bivariate positive test counts for diseased and nondiseased subjects.

3.
Approaches like multiple interval mapping using a multiple-QTL model for simultaneously mapping QTL can aid the identification of multiple QTL, improve the precision of estimating QTL positions and effects, and are able to identify patterns and individual elements of QTL epistasis. Because of the statistical problems in analytically deriving the standard errors and the distributional form of the estimates, and because the use of resampling techniques is not feasible for several linked QTL, there is a need to perform large-scale simulation studies in order to evaluate the accuracy of multiple interval mapping for linked QTL and to assess confidence intervals based on standard statistical theory. From our simulation study it can be concluded that, in comparison with a monogenetic background, a reliable and accurate estimation of QTL positions and QTL effects of multiple QTL in a linkage group requires much more information from the data. The reduction of the marker interval size from 10 cM to 5 cM led to a higher power in QTL detection and to a remarkable improvement of the QTL position as well as the QTL effect estimates. This is different from the findings for (single) interval mapping. The empirical standard deviations of the genetic effect estimates were generally large, and they were largest for the epistatic effects. Those of the dominance effects were larger than those of the additive effects. The asymptotic standard deviation of the position estimates was not a good criterion for the accuracy of the position estimates, and confidence intervals based on standard statistical theory had a clearly smaller empirical coverage probability than the nominal probability. Furthermore, the asymptotic standard deviations of the additive, dominance and epistatic effects did not reflect the empirical standard deviations of the estimates very well when the relative QTL variance was smaller than or equal to 0.5. The implications of the above findings are discussed.
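The gap between nominal and empirical coverage reported above is established by exactly this kind of simulation loop: generate data under a known truth, form the standard-theory interval, and count how often it contains the true value. A generic sketch (an exponential-rate toy model stands in for the QTL setting; function and parameter names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def wald_coverage(n, true_rate=2.0, reps=2000, z=1.96):
    """Empirical coverage of the asymptotic (Wald) 95% CI for an
    exponential rate parameter, illustrating how coverage of a
    standard-theory interval is checked by simulation."""
    hits = 0
    for _ in range(reps):
        x = rng.exponential(1.0 / true_rate, size=n)
        mle = 1.0 / x.mean()
        se = mle / np.sqrt(n)          # asymptotic standard error
        if mle - z * se <= true_rate <= mle + z * se:
            hits += 1
    return hits / reps
```

At small n the empirical coverage typically falls short of the nominal 95%, the same qualitative effect the study reports for asymptotic QTL confidence intervals.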

4.
Summary Approximate standard errors of genetic parameter estimates were obtained using a simulation technique and approximation formulae for a simple statistical model. The similarity of the corresponding estimates of standard errors from the two methods indicated that the simulation technique may be useful for estimating the precision of genetic parameter estimates for complex models or unbalanced population structures where approximation formulae do not apply. The method of generating simulation populations in the computer is outlined, and a technique of setting approximate confidence limits to heritability estimates is described.
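The simulation technique amounts to a parametric bootstrap: repeatedly generate populations under assumed parameter values and take the empirical spread of the re-estimates as the standard error. A toy sketch using midparent-offspring regression (our simplified stand-in, not the article's design):

```python
import numpy as np

rng = np.random.default_rng(42)

def bootstrap_se_h2(h2, n_families, n_boot=1000):
    """Parametric-bootstrap standard error for a heritability estimate
    from midparent-offspring regression. Phenotypes are standardised,
    so the midparent variance is 1/2 and the regression slope of
    offspring on midparent estimates h2 directly."""
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        midparent = rng.normal(0.0, np.sqrt(0.5), n_families)
        resid_sd = np.sqrt(1.0 - 0.5 * h2 ** 2)   # keeps offspring variance at 1
        offspring = h2 * midparent + rng.normal(0.0, resid_sd, n_families)
        estimates[b] = (np.cov(midparent, offspring)[0, 1]
                        / np.var(midparent, ddof=1))
    return estimates.std(ddof=1)
```

As expected, the simulated standard error shrinks as the number of families grows, mirroring the precision gains the article quantifies.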

5.
Summary In diagnostic medicine, estimating the diagnostic accuracy of a group of raters or medical tests relative to the gold standard is often the primary goal. When a gold standard is absent, latent class models where the unknown gold standard test is treated as a latent variable are often used. However, these models have been criticized in the literature from both a conceptual and a robustness perspective. As an alternative, we propose an approach where we exploit an imperfect reference standard with unknown diagnostic accuracy and conduct sensitivity analysis by varying this accuracy over scientifically reasonable ranges. In this article, a latent class model with crossed random effects is proposed for estimating the diagnostic accuracy of regional obstetrics and gynaecological (OB/GYN) physicians in diagnosing endometriosis. To avoid the pitfalls of models without a gold standard, we exploit the diagnostic results of a group of OB/GYN physicians with an international reputation for the diagnosis of endometriosis. We construct an ordinal reference standard based on the discordance among these international experts and propose a mechanism for conducting sensitivity analysis relative to the unknown diagnostic accuracy among them. A Monte Carlo EM algorithm is proposed for parameter estimation and a BIC‐type model selection procedure is presented. Through simulations and data analysis we show that this new approach provides a useful alternative to traditional latent class modeling approaches used in this setting.

6.
Diagnostic or screening tests are widely used in medical fields to classify patients according to their disease status. Several statistical models for meta‐analysis of diagnostic test accuracy studies have been developed to synthesize test sensitivity and specificity of a diagnostic test of interest. Because of the correlation between test sensitivity and specificity, modeling the two measures using a bivariate model is recommended. In this paper, we extend the current standard bivariate linear mixed model (LMM) by proposing two variance‐stabilizing transformations: the arcsine square root and the Freeman–Tukey double arcsine transformation. We compared the performance of the proposed methods with the standard method through simulations using several performance measures. The simulation results showed that our proposed methods performed better than the standard LMM in terms of bias, root mean square error, and coverage probability in most of the scenarios, even when data were generated assuming the standard LMM. We also illustrated the methods using two real data sets.
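Both transformations have closed forms. A sketch of the Freeman–Tukey double arcsine with its approximate variance 1/(4n + 2), plus Miller's commonly used back-transformation to the proportion scale (function names are ours):

```python
import numpy as np

def freeman_tukey(x, n):
    """Freeman–Tukey double-arcsine transform of x successes out of n,
    with its approximate sampling variance 1/(4n + 2); used to stabilise
    the variance of study-level sensitivities/specificities before pooling."""
    x, n = np.asarray(x, float), np.asarray(n, float)
    t = 0.5 * (np.arcsin(np.sqrt(x / (n + 1))) + np.arcsin(np.sqrt((x + 1) / (n + 1))))
    return t, 1.0 / (4.0 * n + 2.0)

def freeman_tukey_inverse(t, n):
    """Miller's back-transformation to the proportion scale,
    evaluated at a (harmonic mean) sample size n."""
    u = 2.0 * t
    s = np.sin(u)
    inner = 1.0 - (s + (s - 1.0 / s) / n) ** 2
    return 0.5 * (1.0 - np.sign(np.cos(u)) * np.sqrt(np.clip(inner, 0.0, 1.0)))
```

Pooling would then proceed on the transformed scale (e.g. with study weights proportional to 4n + 2) before back-transforming the summary estimate.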

7.
Estimating the evolutionary potential of quantitative traits and reliably predicting responses to selection in wild populations are important challenges in evolutionary biology. The genomic revolution has opened up opportunities for measuring relatedness among individuals with precision, enabling pedigree‐free estimation of trait heritabilities in wild populations. However, until now, most quantitative genetic studies based on a genomic relatedness matrix (GRM) have focused on long‐term monitored populations for which traditional pedigrees were also available, and have often had access to knowledge of genome sequence and variability. Here, we investigated the potential of RAD‐sequencing for estimating heritability in a free‐ranging roe deer (Capreolus capreolus) population for which no prior genomic resources were available. We propose a step‐by‐step analytical framework to optimize the quality and quantity of the genomic data and explore the impact of the single nucleotide polymorphism (SNP) calling and filtering processes on the GRM structure and GRM‐based heritability estimates. As expected, our results show that sequence coverage strongly affects the number of recovered loci, the genotyping error rate and the amount of missing data. Ultimately, this had little effect on heritability estimates and their standard errors, provided that the GRM was built from a minimum number of loci (above 7,000). Genomic relatedness matrix‐based heritability estimates thus appear robust to a moderate level of genotyping errors in the SNP data set. We also showed that quality filters, such as the removal of low‐frequency variants, affect the relatedness structure of the GRM, generating lower h2 estimates. Our work illustrates the huge potential of RAD‐sequencing for estimating GRM‐based heritability in virtually any natural population.

8.
Several statistical methods have been proposed for estimating the infection prevalence based on pooled samples, but these methods generally presume the application of perfect diagnostic tests, which in practice do not exist. To optimize prevalence estimation based on pooled samples, currently available and new statistical models were described and compared. Three groups were tested: (a) frequentist models, (b) Markov chain Monte Carlo (MCMC) Bayesian models, and (c) Exact Bayesian Computation (EBC) models. Simulated data allowed the comparison of the models, including testing the performance under complex situations such as imperfect tests with a sensitivity varying according to the pool weight. In addition, all models were applied to data derived from the literature, to demonstrate the influence of the model on real prevalence estimates. All models were implemented in the freely available R and OpenBUGS software and are presented in Appendix S1. Bayesian models can flexibly take into account the imperfect sensitivity and specificity of the diagnostic test (as well as the influence of pool‐related or external variables) and are therefore the method of choice for calculating population prevalence based on pooled samples. However, when using such complex models, very precise information on test characteristics is needed, which may in general not be available.
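For the simplest frequentist case (constant test sensitivity and specificity, equal pool sizes) the prevalence has a closed-form estimate, which makes a useful baseline before the Bayesian machinery. A sketch under those assumptions (function and parameter names are ours):

```python
import numpy as np

def pooled_prevalence(n_pos, n_pools, pool_size, se=1.0, sp=1.0):
    """Point estimate of individual-level prevalence from pooled testing,
    allowing a (constant) imperfect test with sensitivity `se` and
    specificity `sp`. Pool-size-dependent sensitivity, as simulated in
    the article, requires the more flexible Bayesian models."""
    q = n_pos / n_pools                       # observed pool positivity
    # P(pool +) = se * (1 - (1-p)^k) + (1 - sp) * (1-p)^k; solve for (1-p)^k
    neg_k = (se - q) / (se + sp - 1.0)
    neg_k = np.clip(neg_k, 0.0, 1.0)          # guard against sampling noise
    return 1.0 - neg_k ** (1.0 / pool_size)
```

With a perfect test the formula reduces to the familiar 1 - (1 - q)^(1/k).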

9.
Individual‐based estimates of the degree of inbreeding or parental relatedness from pedigrees provide a critical starting point for studies of inbreeding depression, but in practice wild pedigrees are difficult to obtain. Because inbreeding increases the proportion of genomewide loci that are identical by descent, inbreeding variation within populations has the potential to generate observable correlations between heterozygosity measured using molecular markers and a variety of fitness related traits. Termed heterozygosity‐fitness correlations (HFCs), these correlations have been observed in a wide variety of taxa. The difficulty of obtaining wild pedigree data, however, means that empirical investigations of how pedigree inbreeding influences HFCs are rare. Here, we assess evidence for inbreeding depression in three life‐history traits (hatching and fledging success and juvenile survival) in an isolated population of Stewart Island robins using both pedigree‐ and molecular‐derived measures of relatedness. We found results from the two measures were highly correlated and supported evidence for significant but weak inbreeding depression. However, standardized effect sizes for inbreeding depression based on the pedigree‐based kin coefficients (k) were greater and had smaller standard errors than those based on molecular genetic measures of relatedness (RI), particularly for hatching and fledging success. Nevertheless, the results presented here support the use of molecular‐based measures of relatedness in bottlenecked populations when information regarding inbreeding depression is desired but pedigree data on relatedness are unavailable.

10.
Species distributional or trait data based on range map (extent‐of‐occurrence) or atlas survey data often display spatial autocorrelation, i.e. locations close to each other exhibit more similar values than those further apart. If this pattern remains present in the residuals of a statistical model based on such data, one of the key assumptions of standard statistical analyses, that residuals are independent and identically distributed (i.i.d.), is violated. The violation of the assumption of i.i.d. residuals may bias parameter estimates and can increase type I error rates (falsely rejecting the null hypothesis of no effect). While this is increasingly recognised by researchers analysing species distribution data, there is, to our knowledge, no comprehensive overview of the many available spatial statistical methods to take spatial autocorrelation into account in tests of statistical significance. Here, we describe six different statistical approaches to infer correlates of species' distributions, for both presence/absence (binary response) and species abundance data (Poisson or normally distributed response), while accounting for spatial autocorrelation in model residuals: autocovariate regression; spatial eigenvector mapping; generalised least squares; (conditional and simultaneous) autoregressive models and generalised estimating equations. A comprehensive comparison of the relative merits of these methods is beyond the scope of this paper. To demonstrate each method's implementation, however, we undertook preliminary tests based on simulated data. These preliminary tests verified that most of the spatial modelling techniques we examined showed good type I error control and precise parameter estimates, at least when confronted with simplistic simulated data containing spatial autocorrelation in the errors. However, we found that for presence/absence data the results and conclusions were very variable between the different methods. This is likely due to the low information content of binary maps. Also, in contrast with previous studies, we found that autocovariate methods consistently underestimated the effects of environmental controls of species distributions. Given their widespread use, in particular for the modelling of species presence/absence data (e.g. climate envelope models), we argue that this warrants further study and caution in their use. To aid other ecologists in making use of the methods described, code to implement them in freely available software is provided in an electronic appendix.
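Of the six approaches, autocovariate regression is the simplest to state concretely: each site receives an extra predictor equal to a distance-weighted mean of the responses at neighbouring sites. A sketch using one common weighting scheme (inverse distance within a fixed radius; the choice of weights and radius is ours):

```python
import numpy as np

def autocovariate(coords, response, radius):
    """Inverse-distance-weighted autocovariate: for each site, the
    1/d-weighted mean response of all neighbours within `radius`.
    The resulting vector is added as a covariate in an otherwise
    ordinary (e.g. logistic) regression."""
    coords = np.asarray(coords, float)
    response = np.asarray(response, float)
    ac = np.zeros(len(response))
    for i in range(len(response)):
        d = np.hypot(*(coords - coords[i]).T)   # distances to all sites
        mask = (d > 0) & (d <= radius)          # neighbours, excluding self
        if mask.any():
            w = 1.0 / d[mask]
            ac[i] = np.sum(w * response[mask]) / np.sum(w)
    return ac
```

Sites with no neighbours inside the radius receive an autocovariate of zero here; in practice the neighbourhood size is a tuning choice.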

11.
Species distribution models have been widely used to predict species distributions for various purposes, including conservation planning, and climate change impact assessment. The success of these applications relies heavily on the accuracy of the models. Various measures have been proposed to assess the accuracy of the models. Rigorous statistical analysis should be incorporated in model accuracy assessment. However, since relevant information about the statistical properties of accuracy measures is scattered across various disciplines, ecologists find it difficult to select the most appropriate ones for their research. In this paper, we review accuracy measures that are currently used in species distribution modelling (SDM), and introduce additional metrics that have potential applications in SDM. For the commonly used measures (which are also intensively studied by statisticians), including overall accuracy, sensitivity, specificity, kappa, and area and partial area under the ROC curves, promising methods to construct confidence intervals and statistically compare the accuracy between two models are given. For other accuracy measures, methods to estimate standard errors are given, which can be used to construct approximate confidence intervals. We also suggest that as general tools, computer‐intensive methods, especially bootstrap and randomization methods can be used in constructing confidence intervals and statistical tests if suitable analytic methods cannot be found. Usually, these computer‐intensive methods provide robust results.
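Several of the measures discussed, together with one standard interval construction (the Wilson score interval for a proportion, one of several options for sensitivity and specificity), follow directly from a 2x2 confusion table. A sketch:

```python
import math

def wilson_ci(x, n, z=1.96):
    """Wilson score interval for a proportion x/n; a common choice for
    confidence intervals on sensitivity or specificity."""
    p = x / n
    denom = 1.0 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

def sens_spec_kappa(tp, fn, tn, fp):
    """Sensitivity, specificity and Cohen's kappa from a 2x2 confusion table
    (tp/fn: presences predicted present/absent; tn/fp: absences)."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    n = tp + fn + tn + fp
    po = (tp + tn) / n                                    # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2  # chance agreement
    return sens, spec, (po - pe) / (1 - pe)
```

Bootstrap or randomization, as the review suggests, would replace the analytic interval when no closed form is available.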

12.
Aim To explore the impacts of imperfect reference data on the accuracy of species distribution model predictions. The main focus is on impacts of the quality of reference data (labelling accuracy) and, to a lesser degree, data quantity (sample size) on species presence–absence modelling. Innovation The paper challenges the common assumption that some popular measures of model accuracy and model predictions are prevalence independent. It highlights how imperfect reference data may impact on a study and the actions that may be taken to address problems. Main conclusions The theoretical independence of prevalence of popular accuracy measures, such as sensitivity, specificity, true skills statistics (TSS) and area under the receiver operating characteristic curve (AUC), is unlikely to occur in practice due to reference data error; all of these measures of accuracy, together with estimates of species occurrence, showed prevalence dependency arising through the use of a non‐gold‐standard reference. The number of cases used also had implications for the ability of a study to meet its objectives. Means to reduce the negative effects of imperfect reference data in study design and interpretation are suggested.

13.
Summary We introduce a nearly automatic procedure to locate and count the quantum dots in images of kinesin motor assays. Our procedure employs an approximate likelihood estimator based on a two‐component mixture model for the image data; the first component has a normal distribution, and the other component is distributed as a normal random variable plus an exponential random variable. The normal component has an unknown variance, which we model as a function of the mean. We use B‐splines to estimate the variance function during a training run on a suitable image, and the estimate is used to process subsequent images. Parameter estimates are generated for each image along with estimates of standard errors, and the number of dots in the image is determined using an information criterion and likelihood ratio tests. Realistic simulations show that our procedure is robust and that it leads to accurate estimates, both of parameters and of standard errors.
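A normal random variable plus an independent exponential one has the closed-form density of the exponentially modified Gaussian, so the mixture likelihood can be evaluated directly. A sketch of the component densities (the parameterisation and the constant variance are our simplifications; the article models the variance as a function of the mean):

```python
import math

def emg_pdf(x, mu, sigma, lam):
    """Density of Normal(mu, sigma^2) + Exponential(rate lam), i.e. the
    exponentially modified Gaussian used for the 'dot' component."""
    a = 0.5 * lam * (2 * mu + lam * sigma ** 2 - 2 * x)
    b = (mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma)
    return 0.5 * lam * math.exp(a) * math.erfc(b)

def mixture_pdf(x, w, mu0, sigma0, mu1, sigma1, lam):
    """Two-component mixture: background Normal with weight w, plus the
    EMG dot component with weight 1 - w."""
    bg = math.exp(-0.5 * ((x - mu0) / sigma0) ** 2) / (sigma0 * math.sqrt(2 * math.pi))
    return w * bg + (1 - w) * emg_pdf(x, mu1, sigma1, lam)
```

Maximising the sum of log mixture densities over the pixel intensities would then give the approximate likelihood estimates the procedure relies on.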

14.
It has long been known that insufficient consideration of spatial autocorrelation leads to unreliable hypothesis‐tests and inaccurate parameter estimates. Yet, ecologists are confronted with a confusing array of methods to account for spatial autocorrelation. Although Beale et al. (2010) provided guidance for continuous data on regular grids, researchers still need advice for other types of data in more flexible spatial contexts. In this paper, we extend Beale et al. (2010)'s work to count data on both regularly‐ and irregularly‐spaced plots, the latter being commonly encountered in ecological studies. Through a simulation‐based approach, we assessed the accuracy and the type I errors of two frequentist and two Bayesian ready‐to‐use methods in the family of generalized mixed models, with distance‐based or neighbourhood‐based correlated random effects. In addition, we tested whether the methods are robust to spatial non‐stationarity, and over‐ and under‐dispersion – both typical features of species distribution count data which violate standard regression assumptions. In the simplest of our simulated datasets, the two frequentist methods gave inflated type I errors, while the two Bayesian methods provided satisfying results. When facing real‐world complexities, the distance‐based Bayesian method (MCMC with Langevin–Hastings updates) performed best of all. We hope that, in the light of our results, ecological researchers will feel more comfortable including spatial autocorrelation in their analyses of count data.

15.
The performance of diagnostic tests is often evaluated by estimating their sensitivity and specificity with respect to a traditionally accepted standard test regarded as a "gold standard" in making the diagnosis. Correlated samples of binary data arise in many fields of application. In site-specific studies, the fundamental unit for analysis is occasionally the site rather than the subject. Since site-specific results within a subject can be highly correlated, statistical methods that take into account the within-subject correlation should be employed to estimate the sensitivity and the specificity of diagnostic tests. I introduce several statistical methods for the estimation of the sensitivity and the specificity of site-specific diagnostic tests. I apply these techniques to data from a study involving an enzymatic diagnostic test to motivate and illustrate the estimation of the sensitivity and the specificity of periodontal diagnostic tests. I present results from a simulation study for the estimation of diagnostic sensitivity when the data are correlated within subjects. Through the simulation study, I compare the performance of the binomial estimator pCBE, the ratio estimator pCRE, the weighted estimator pCWE, the intracluster correlation estimator pCIC, and the generalized estimating equation (GEE) estimator pCGEE in terms of bias, observed variance, mean squared error (MSE), relative efficiency of their variances and 95 per cent coverage proportions. I recommend using pCBE when σ = 0, and the weighted estimator pCWE when σ = 0.6. When σ = 0.2 or σ = 0.4 and the number of subjects is at least 30, pCGEE performs well.

16.
The generalized estimating equations (GEE) derived by Liang and Zeger to analyze longitudinal data have been used in a wide range of medical and biological applications. To make regression a useful and meaningful statistical tool, emphasis should be placed not only on inference or fitting, but also on diagnosing potential data problems. Most of the usual diagnostics for linear regression models have been generalized for GEE. However, global influence measures based on the volume of confidence ellipsoids are not available for GEE analysis. This article presents an extension of these measures that is valid for correlated‐measures regression analysis using GEEs. The proposed measures are illustrated by an analysis of epileptic seizure count data arising from a study of progabide as an adjuvant therapy for partial seizures and some simulated data sets.

17.
Summary Absence of a perfect reference test is an acknowledged source of bias in diagnostic studies. In the case of tuberculous pleuritis, standard reference tests such as smear microscopy, culture and biopsy have poor sensitivity. Yet meta‐analyses of new tests for this disease have always assumed the reference standard is perfect, leading to biased estimates of the new test’s accuracy. We describe a method for joint meta‐analysis of sensitivity and specificity of the diagnostic test under evaluation, while considering the imperfect nature of the reference standard. We use a Bayesian hierarchical model that takes into account within‐ and between‐study variability. We show how to obtain pooled estimates of sensitivity and specificity, and how to plot a hierarchical summary receiver operating characteristic curve. We describe extensions of the model to situations where multiple reference tests are used, and where index and reference tests are conditionally dependent. The performance of the model is evaluated using simulations and illustrated using data from a meta‐analysis of nucleic acid amplification tests (NAATs) for tuberculous pleuritis. The estimate of NAAT specificity was higher and the sensitivity lower compared to a model that assumed that the reference test was perfect.

18.
Forensic age estimation has received growing attention from researchers in recent years. Accurate estimates of age are needed both for identifying the real age of individuals without any identity document and for assessing it in human remains. The methods applied in this context are mostly based on radiological analysis of some anatomical districts and entail the use of a regression model. However, estimating chronological age by regression models leads to overestimated ages in younger subjects and underestimated ages in older ones. We introduce a full Bayesian calibration method combined with a segmented function for age estimation, relying on a Normal distribution as a density model, to mitigate this bias. In this way, we are also able to model the decreasing growth rate in juveniles. We compared our new Bayesian‐segmented model with other existing approaches. The proposed method produced more robust and precise forecasts of age than the compared models, while exhibiting comparable accuracy in terms of forecasting measures. Our method also appeared to overcome the estimation bias when applied to a real data set of South African juvenile subjects.

19.
An important issue in the phylogenetic analysis of nucleotide sequence data using the maximum likelihood (ML) method is the underlying evolutionary model employed. We consider the problem of simultaneously estimating the tree topology and the parameters in the underlying substitution model and of obtaining estimates of the standard errors of these parameter estimates. Given a fixed tree topology and corresponding set of branch lengths, the ML estimates of standard evolutionary model parameters are asymptotically efficient, in the sense that their joint distribution is asymptotically normal with the variance–covariance matrix given by the inverse of the Fisher information matrix. We propose a new estimate of this conditional variance based on estimation of the expected information using a Monte Carlo sampling (MCS) method. Simulations are used to compare this conditional variance estimate to the standard technique of using the observed information under a variety of experimental conditions. In the case in which one wishes to estimate simultaneously the tree and parameters, we provide a bootstrapping approach that can be used in conjunction with the MCS method to estimate the unconditional standard error. The methods developed are applied to a real data set consisting of 30 papillomavirus sequences. This overall method is easily incorporated into standard bootstrapping procedures to allow for proper variance estimation.
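As a baseline for comparison, the standard observed-information approach the article evaluates against can be sketched generically: numerically differentiate the log-likelihood twice at the MLE and invert the negated Hessian. (This is the conventional technique, not the article's Monte Carlo proposal; the function is ours.)

```python
import numpy as np

def observed_info_se(loglik, theta_hat, eps=1e-4):
    """Standard errors from the observed information: central-difference
    Hessian of the log-likelihood at the MLE, negated and inverted.
    loglik: callable taking a parameter vector; theta_hat: the MLE."""
    theta_hat = np.asarray(theta_hat, float)
    k = len(theta_hat)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = eps
            ej = np.zeros(k); ej[j] = eps
            H[i, j] = (loglik(theta_hat + ei + ej) - loglik(theta_hat + ei - ej)
                       - loglik(theta_hat - ei + ej) + loglik(theta_hat - ei - ej)
                       ) / (4.0 * eps ** 2)
    return np.sqrt(np.diag(np.linalg.inv(-H)))
```

For phylogenetic likelihoods the same recipe applies with the substitution-model parameters in place of the toy parameter vector, which is where the expected-information alternative becomes attractive.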

20.
Sampling from a finite population on multiple occasions introduces dependencies between the successive samples when overlap is designed. Such sampling designs lead to efficient statistical estimates, while they allow estimating changes over time for the targeted outcomes. This makes them very popular in real‐world statistical practice. Sampling with partial replacement can also be very efficient in biological and environmental studies where estimation of toxicants and their trends over time is the main interest. Sampling with partial replacement is designed here on two occasions in order to estimate the median concentration of chemical constituents quantified by means of liquid chromatography coupled with tandem mass spectrometry. Such data represent relative peak areas resulting from the chromatographic analysis. They are therefore positive‐valued and skewed data, and are commonly fitted very well by the log‐normal model. A log‐normal model is assumed here for chemical constituents quantified in mainstream cigarette smoke in a real case study. Combining design‐based and model‐based approaches for statistical inference, we estimate the median of chemical constituents by sampling with partial replacement on two time occasions. We also discuss the limitations of extending the proposed approach to other skewed population models. The latter is investigated by means of a Monte Carlo simulation study.
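Under an i.i.d. log-normal model (ignoring the survey design and the partial-replacement weighting), the model-based median estimate is simply the back-transformed mean of the logs. A sketch of that simplified calculation (our own code, not the article's combined design/model estimator):

```python
import numpy as np

def lognormal_median_ci(x, z=1.96):
    """Median estimate for positive, right-skewed data under a log-normal
    model: exp(mean of logs), with a normal-theory confidence interval
    formed on the log scale and back-transformed."""
    logs = np.log(np.asarray(x, float))
    m = logs.mean()
    half = z * logs.std(ddof=1) / np.sqrt(len(logs))
    return np.exp(m), (np.exp(m - half), np.exp(m + half))
```

The design-based component of the article's approach would replace the simple mean of logs with a weighted estimator reflecting the overlap between the two sampling occasions.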
