Similar Documents
20 similar documents found.
1.
Errors in genotype calling can have perverse effects on genetic analyses, confounding association studies and obscuring rare variants. Analyses now routinely incorporate error rates to control for spurious findings. However, reliable estimates of the error rate can be difficult to obtain because they vary between studies. Most studies also report only a single estimate of the error rate even though genotypes can be miscalled in more than one way. Here, we report a method for estimating the rates at which different types of genotyping errors occur at biallelic loci using pedigree information. Our method identifies potential genotyping errors by exploiting instances where the haplotypic phase has not been faithfully transmitted. The expected frequency of inconsistent phase depends on the combination of genotypes in a pedigree and the probability of miscalling each genotype. We develop a model that uses the differences in these frequencies to estimate rates for different types of genotype error. Simulations show that our method accurately estimates these error rates in a variety of scenarios. We apply this method to a dataset from the whole-genome sequencing of owl monkeys (Aotus nancymaae) in three-generation pedigrees. We find significant differences between estimates for different types of genotyping error, with the most common being homozygous reference sites miscalled as heterozygous and vice versa. The approach we describe is applicable to any set of genotypes where haplotypic phase can reliably be called and should prove useful in helping to control for false discoveries.
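A minimal sketch of the estimation idea, under a deliberately simplified (hypothetical) model with a single miscall rate in which an error in either transmitted genotype produces an apparent phase inconsistency; the method described above instead derives pedigree-specific inconsistency frequencies for each error type. The counts and the neg_log_lik helper are toy illustrations.

```python
# Sketch: ML estimate of a per-genotype miscall rate from counts of
# phase-inconsistent transmissions. The inconsistency model below is a
# hypothetical stand-in for the pedigree-specific frequencies in the paper.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_lik(e, n_incons, n_total):
    # Assume (hypothetically) that a miscall of either transmitted
    # genotype produces an apparent phase inconsistency.
    p = 1.0 - (1.0 - e) ** 2
    p = min(max(p, 1e-12), 1 - 1e-12)
    return -(n_incons * np.log(p) + (n_total - n_incons) * np.log(1.0 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 0.5), method="bounded",
                      args=(37, 10_000))  # toy counts
print(f"estimated miscall rate: {res.x:.4f}")
```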

2.
In the Gulf of Mexico (GOM), fish biomass estimates are necessary for the evaluation of habitat use and function following the mandate for ecosystem-based fisheries management in the recently reauthorized Sustainable Fisheries Act of 2007. Acoustic surveys have emerged as a potential tool to estimate fish biomass in shallow-water estuaries; however, the transformation of acoustic data into an index of fish biomass is not straightforward. In this article, we examine the consequences of equation selection for target strength (TS)–fish length relationships on potential error generation in hydroacoustic fish biomass estimates. We applied structural equation models (SEMs) to evaluate how our choice of an acoustic TS–fish length equation affected our biomass estimates, and how error occurred and propagated during this process. To demonstrate the magnitude of the error when applied to field data, we used SEMs on normally distributed simulated data to better understand the sources of error involved in converting acoustic data to fish biomass. As such, we describe where, and to what magnitude, error propagates when estimating fish biomass. Estimates of fish lengths were affected by measurement errors of TS and by inexact relationships between fish length and TS. Differences in parameter estimates resulted in significant differences in fish biomass estimates and led to the conclusion that, in the absence of known TS–fish length relationships, Love's (J Acoust Soc Am 46:746–752, 1969) lateral-aspect equation may be an acceptable substitute for an ecosystem-specific TS–fish length relationship. Based upon SEMs applied to simulated data, perhaps the most important, yet most variable, component is the mean volume backscattering strength, which significantly inflated biomass errors in approximately 10% of the cases.
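A sketch of the TS-to-biomass conversion step that the article evaluates. The coefficients below are illustrative placeholders, not Love's (1969) published values, and the length-weight power law is likewise hypothetical; substitute ecosystem-specific parameters where known.

```python
# Sketch: back-calculating fish length from target strength (TS) with a
# generic TS = a*log10(L_cm) + b relationship, then biomass via a
# length-weight power law W = c * L^d. All coefficients are placeholders.
import numpy as np

def ts_to_length_cm(ts_db, a=19.1, b=-64.0):       # placeholder coefficients
    return 10 ** ((ts_db - b) / a)

def length_to_weight_g(length_cm, c=0.01, d=3.0):  # placeholder power law
    return c * length_cm ** d

ts_samples = np.array([-45.0, -42.5, -40.0])       # toy TS values, dB
lengths = ts_to_length_cm(ts_samples)
print(lengths, length_to_weight_g(lengths))
```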

3.
Modelling dietary data, and especially 24-hr dietary recall (24HDR) data, is a challenge. Ignoring the inherent measurement error (ME) leads to biased effect estimates when the association between an exposure and an outcome is investigated. We propose an adapted simulation extrapolation (SIMEX) algorithm for modelling dietary exposures. For this purpose, we exploit the ME model of the NCI method, which assumes normally distributed errors of the reported intake on the Box-Cox-transformed scale and unbiased recalls on the original scale. Following the SIMEX algorithm, remeasurements of the observed data with additional ME are generated in order to estimate the association between the level of ME and the resulting effect estimate. Subsequently, this association is extrapolated to the case of zero ME to obtain the corrected estimate. We show that the proposed method fulfils the key property of the SIMEX approach, namely that the MSE of the generated data converges to zero as the ME variance converges to zero. Furthermore, the method is applied to real 24HDR data from the I.Family study to correct the effects of salt and alcohol intake on blood pressure. In a simulation study, the method is compared with the NCI method and yields effect estimates with either smaller MSE or smaller bias in certain situations. In addition, we found our method to be more informative and easier to implement. Therefore, we conclude that the proposed method is useful for promoting the dissemination of ME correction methods in nutritional epidemiology.
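A minimal SIMEX sketch for a linear exposure-outcome model with known, normally distributed classical error (the adapted algorithm above works on the Box-Cox-transformed NCI scale; that transformation is omitted here): remeasure with inflated error at several levels lambda, fit at each level, and extrapolate the fitted trend back to lambda = -1.

```python
# Minimal SIMEX sketch: quadratic extrapolation of the naive slope to
# the zero-measurement-error case (lambda = -1).
import numpy as np

rng = np.random.default_rng(0)
n, beta, sigma_u = 2000, 0.5, 1.0
x = rng.normal(0, 1, n)                    # true exposure
w = x + rng.normal(0, sigma_u, n)          # error-prone measurement
y = beta * x + rng.normal(0, 1, n)

lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
b = 50                                     # remeasurements per lambda
est = []
for lam in lambdas:
    fits = [np.polyfit(w + rng.normal(0, np.sqrt(lam) * sigma_u, n), y, 1)[0]
            for _ in range(b)]
    est.append(np.mean(fits))

coef = np.polyfit(lambdas, est, 2)         # quadratic trend in lambda
simex = np.polyval(coef, -1.0)             # extrapolate to lambda = -1
print(f"naive: {est[0]:.3f}  SIMEX: {simex:.3f}  true: {beta}")
```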

4.
Genotypes produced from samples collected non-invasively in harsh field conditions often lack the full complement of data from the selected microsatellite loci. Their application to genetic mark–recapture methodology in wildlife species is therefore prone to misidentifications, leading both to 'true non-recaptures' being falsely accepted as recaptures (Type I errors) and to 'true recaptures' going undetected (Type II errors). Here we present a new likelihood method that allows every pairwise genotype comparison to be evaluated independently. We apply this method to determine the total number of recaptures by estimating and optimising the balance between Type I and Type II errors. We show through simulation that the standard error of recapture estimates can be minimised through our algorithms. Interestingly, the precision of our recapture estimates actually improved when we included individuals with missing genotypes, as this increased the number of pairwise comparisons, potentially uncovering more recaptures. Simulations suggest that the method is tolerant to error rates of up to 5% per locus and can theoretically work in datasets with as little as 60% of loci genotyped. Our methods can be implemented in datasets where standard mismatch analyses fail to distinguish recaptures. Finally, we show that by assigning a low Type I error rate to our matching algorithms we can generate a dataset of individuals of known capture histories that is suitable for downstream analysis with traditional mark–recapture methods.
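A sketch of the pairwise comparison underlying such matching, assuming (hypothetically) a simple mismatch-fraction score over co-genotyped loci with a user-chosen threshold; the likelihood method above weighs each comparison probabilistically rather than by a raw mismatch count.

```python
# Sketch of pairwise genotype comparison tolerant to missing loci:
# two samples are declared a recapture candidate when the mismatch
# fraction over co-genotyped loci falls below a chosen threshold.
# The threshold trades Type I against Type II errors, as in the text.
def mismatch_fraction(g1, g2):
    """g1, g2: lists of locus genotypes, e.g. (120, 124); None = missing."""
    shared = [(a, b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        return None                      # no co-typed loci, uninformative
    mism = sum(1 for a, b in shared if set(a) != set(b))
    return mism / len(shared)

s1 = [(120, 124), (88, 90), None, (101, 101)]
s2 = [(120, 124), (88, 92), (150, 154), (101, 101)]
print(mismatch_fraction(s1, s2))         # 1 mismatch over 3 shared loci
```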

5.
Although habitat fragmentation is one of the greatest threats to biodiversity worldwide, virtually no attention has been paid to the quantification of error in fragmentation statistics. Landscape pattern indices (LPIs), such as mean patch size and number of patches, are routinely used to quantify fragmentation and are often calculated using remote-sensing imagery that has been classified into different land-cover classes. No classified map is ever completely correct, so we asked whether different maps with similar misclassification rates could result in widely different errors in pattern indices. We simulated landscapes with varying proportions of habitat and clumpiness (autocorrelation) and then simulated classification errors on the same maps. We simulated higher misclassification at patch edges (as is often observed), and then used a smoothing algorithm routinely applied to images to correct salt-and-pepper classification error. We determined how well classification errors (and smoothing) corresponded to errors seen in four pattern indices. Maps with low misclassification rates often yielded errors in LPIs of much larger magnitude and substantial variability. Although smoothing usually improved classification error, it sometimes increased LPI error and reversed the direction of error in LPIs introduced by misclassification. Our results show that classification error is not always a good predictor of errors in LPIs, and some types of image postprocessing (for example, smoothing) might result in the underestimation of habitat fragmentation. Furthermore, our results suggest that there is potential for large errors in nearly every landscape pattern analysis ever published, because virtually none quantify the errors in LPIs themselves.
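A sketch of the simulation idea: flip raster cells at a given misclassification rate and compare a landscape pattern index (here, number of patches) before and after. The landscape generator is a toy unclumped example, not the autocorrelated landscapes with edge-biased error used in the study.

```python
# Sketch: simulate salt-and-pepper misclassification on a binary habitat
# map and compare the number-of-patches LPI before and after.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
habitat = rng.random((200, 200)) < 0.3            # toy unclumped landscape

def n_patches(grid):
    _, count = ndimage.label(grid)                # 4-neighbour patches
    return count

flip = rng.random(habitat.shape) < 0.05           # 5% misclassification
noisy = habitat ^ flip
print(n_patches(habitat), n_patches(noisy))       # small error, big LPI shift
```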

6.
Process life cycle assessment (PLCA) is widely used to quantify environmental flows associated with the manufacturing of products and other processes. As PLCA always depends on defining a system boundary, its application involves truncation errors. Different methods of estimating truncation errors have been proposed in the literature; most of these are based on artificially constructed, system-complete counterfactuals. In this article, we review the literature on truncation errors and their estimates and systematically explore factors that influence truncation error estimates. We classify estimation approaches, together with the underlying factors influencing estimation results, according to where in the estimation procedure they occur. By contrasting different PLCA truncation-error modeling frameworks using the same underlying input-output (I-O) data set and varying cut-off criteria, we show that modeling choices can significantly influence estimates of PLCA truncation errors. In addition, we find that differences in I-O and process inventory databases, such as missing service sector activities, can significantly affect estimates of PLCA truncation errors. Our results expose the challenges related to explicit statements on the magnitude of PLCA truncation errors. They also indicate that increasing the strictness of cut-off criteria in PLCA has only limited influence on the resulting truncation errors. We conclude that applying an additional I-O life cycle assessment or a path exchange hybrid life cycle assessment to identify where significant contributions are located in upstream layers could significantly reduce PLCA truncation errors.
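A sketch of how truncation error can be quantified against a system-complete I-O counterfactual: the full result uses the Leontief inverse, while a K-layer PLCA corresponds to truncating the power series x = (I + A + A^2 + ...) f. The four-sector matrix and intensities are toy values.

```python
# Sketch: truncation error in an input-output LCA with toy 4-sector data.
import numpy as np

A = np.array([[0.10, 0.02, 0.00, 0.05],
              [0.05, 0.08, 0.03, 0.00],
              [0.00, 0.04, 0.12, 0.02],
              [0.03, 0.00, 0.01, 0.06]])   # toy technology matrix
f = np.array([1.0, 0.0, 0.0, 0.0])         # functional unit demand
e = np.array([0.2, 0.9, 0.4, 1.5])         # toy emission intensities

total = e @ np.linalg.solve(np.eye(4) - A, f)   # system-complete result
x, term = np.zeros(4), f.copy()
for _ in range(3):                          # direct plus two upstream layers
    x += term
    term = A @ term
truncated = e @ x
print(f"truncation error: {100 * (1 - truncated / total):.1f}%")
```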

7.
Spatial extent inference (SEI) is widely used across neuroimaging modalities to adjust for multiple comparisons when studying brain-phenotype associations that inform our understanding of disease. Recent studies have shown that Gaussian random field (GRF)-based tools can have inflated family-wise error rates (FWERs). This has led to substantial controversy as to which processing choices are necessary to control the FWER using GRF-based SEI. The failure of GRF-based methods is due to unrealistic assumptions about the spatial covariance function of the imaging data. A permutation procedure is the most robust SEI tool because it estimates the spatial covariance function from the imaging data. However, the permutation procedure can fail because its assumption of exchangeability is violated in many imaging modalities. Here, we propose the (semi-)parametric bootstrap joint (PBJ; sPBJ) testing procedures, which are designed for SEI of multilevel imaging data. The sPBJ procedure uses a robust estimate of the spatial covariance function, which yields consistent estimates of standard errors even if the covariance model is misspecified. We use the methods to study the association between performance and executive functioning in a working-memory functional magnetic resonance imaging study. The sPBJ has similar or greater power than the PBJ and permutation procedures while maintaining the nominal type I error rate at reasonable sample sizes. We provide an R package to perform inference using the PBJ and sPBJ procedures.

8.
9.
Microsatellite genotyping errors will be present in all but the smallest data sets and have the potential to undermine the conclusions of most downstream analyses. Despite this, little rigorous effort has been made to quantify the size of the problem and to identify the commonest sources of error. Here, we use a large data set comprising almost 2000 Antarctic fur seals Arctocephalus gazella genotyped at nine hypervariable microsatellite loci to explore error detection methods, common sources of error and the consequences of errors on paternal exclusion. We found good concordance among a range of contrasting approaches to error-rate estimation, with estimates ranging from 0.0013 to 0.0074 per single-locus PCR (polymerase chain reaction). The best approach probably involves blind repeat-genotyping, but this is also the most labour-intensive. We show that several other approaches are also effective at detecting errors, although the most convenient alternative, namely mother-offspring comparisons, yielded the lowest estimate of the error rate. In total, we found 75 errors, emphasizing their ubiquitous presence. The most common errors involved the misinterpretation of allele banding patterns (n = 60, 80%) and of these, over a third (n = 22, 36.7%) were due to confusion between homozygote and adjacent-allele heterozygote genotypes. A specific test for whether a data set contains the expected number of adjacent-allele heterozygotes could provide a useful tool with which workers can assess the likely size of the problem. Error rates were also positively correlated with both locus polymorphism and product size, again indicating aspects where extra effort at error reduction should be directed. Finally, we conducted simulations to explore the potential impact of genotyping errors on paternity exclusion. Error rates as low as 0.01 per allele resulted in a rate of false paternity exclusion exceeding 20%. Errors also led to reduced estimates of male reproductive skew and increases in the number of pups that matched more than one candidate male. Because even modest error rates can be strongly influential, we recommend that error rates be routinely published and that researchers attempt to calculate how robust their analyses are to errors.
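A sketch of the per-reaction error-rate calculation from blind repeat genotyping, under the simplifying (hypothetical) convention that each discordant pair reflects one erroneous reaction out of the two performed.

```python
# Sketch: per-reaction error rate from blind repeat genotyping. Each
# discordant repeat pair is counted as one erroneous PCR out of two,
# a simple approximation of the per-reaction estimate discussed above.
def repeat_error_rate(pairs):
    """pairs: list of (genotype1, genotype2) from blind re-genotyping."""
    comparisons = len(pairs)
    discordant = sum(1 for g1, g2 in pairs if set(g1) != set(g2))
    return discordant / (2 * comparisons)   # two reactions per comparison

toy = [((120, 124), (120, 124))] * 998 + [((120, 124), (120, 120))] * 2
print(f"{repeat_error_rate(toy):.4f}")      # ~0.001 per reaction
```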

10.
Genotyping errors are present in almost all genetic data and can affect the biological conclusions of a study, particularly for studies based on individual identification and parentage. Many statistical approaches can incorporate genotyping errors, but usually need accurate estimates of error rates. Here, we used a new microsatellite data set developed for brown rockfish (Sebastes auriculatus) to estimate genotyping error using three approaches: (i) repeat genotyping 5% of samples, (ii) comparing unintentionally recaptured individuals and (iii) Mendelian inheritance error checking for known parent–offspring pairs. In each data set, we quantified the genotyping error rate per allele due to allele drop-out and false alleles. Direct counts gave average overall genotyping error rates per locus of 0.3%, 1.5% and 1.7% (0.002, 0.007 and 0.008 per allele) from replicate genotypes, known parent–offspring pairs and unintentionally recaptured individuals, respectively. By direct-count error estimates, the recapture and known parent–offspring data sets revealed an error rate four times greater than that estimated using repeat genotypes. There was no evidence of correlation between error rates and locus variability in any of the three data sets, and errors appeared to occur randomly over loci in the repeat genotypes, but not in recaptures and parent–offspring comparisons. Furthermore, there was no correlation in locus-specific error rates between any two of the three data sets. Our data suggest that repeat genotyping may underestimate true error rates and may not estimate locus-specific error rates accurately. We therefore suggest using methods for error estimation that correspond to the overall aim of the study (e.g. known parent–offspring comparisons in parentage studies).

11.
Pedigree data can be evaluated, and subsequently corrected, by analysis of the distribution of genetic markers, taking account of the possibility of mistyping. Using a model of pedigree error developed previously, we obtained the maximum likelihood estimates of error parameters in pedigree data from Tokelau. Posterior probabilities for the possible true relationships in each family, conditional on the putative relationships and the marker data, are calculated using the parameter estimates. These probabilities are used as a basis for discriminating between pedigree error and genetic marker errors in families where inconsistencies have been observed. When applied to the Tokelau data and compared with the results of retyping inconsistent families, these statistical procedures are able to discriminate between pedigree and marker error with approximately 90% accuracy for families with two or more offspring. The large proportion of inconsistencies inferred to be due to marker error (61%) indicates the importance of discriminating between error sources when judging the reliability of putative relationship data. Application of our model of pedigree error has proved to be an efficient way of determining, and subsequently correcting, sources of error in extensive pedigree data collected in large surveys.

12.
Christensen WF (2011). Biometrics 67(3):947–957.
When predicting values for the measurement-error-free component of an observed spatial process, it is generally assumed that the process has a common measurement error variance. However, it is often the case that each measurement in a spatial data set has a known, site-specific measurement error variance, rendering the observed process nonstationary. We present a simple approach for estimating the semivariogram of the unobservable measurement-error-free process using a bias adjustment of the classical semivariogram formula. We then develop a new kriging predictor that filters out the measurement errors. For scenarios where each site's measurement error variance is a function of the process of interest, we recommend an approach that also uses a variance-stabilizing transformation. The properties of the heterogeneous-variance measurement-error-filtered kriging (HFK) predictor and the variance-stabilized HFK predictor, and the improvement of these approaches over standard measurement-error-filtered kriging, are demonstrated using simulation. The approach is illustrated with climate model output from the Hudson Strait area in northern Canada. In the illustration, locations with high or low measurement error variances are appropriately down- or upweighted in the prediction of the underlying process, yielding a realistically smooth picture of the phenomenon of interest.
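A sketch of the bias adjustment for the classical semivariogram with site-specific error variances: since E[(Z_i - Z_j)^2]/2 = gamma(h) + (sigma2_i + sigma2_j)/2, each distance bin is corrected by subtracting the mean half-sum of the paired error variances. Binning and data are toy choices, and the kriging step itself is omitted.

```python
# Sketch: classical semivariogram corrected for heterogeneous,
# site-specific measurement-error variances.
import numpy as np

def adjusted_semivariogram(coords, z, sigma2, bins):
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(len(z), k=1)
    sq = 0.5 * (z[i] - z[j]) ** 2            # classical pair contributions
    me = 0.5 * (sigma2[i] + sigma2[j])       # measurement-error bias term
    gamma = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        m = (d[i, j] >= lo) & (d[i, j] < hi)
        gamma.append(sq[m].mean() - me[m].mean())
    return np.array(gamma)

rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, (100, 2))
sigma2 = rng.uniform(0.1, 0.5, 100)          # known site-specific variances
z = np.sin(coords[:, 0]) + rng.normal(0, np.sqrt(sigma2))
print(adjusted_semivariogram(coords, z, sigma2, np.array([0, 1, 2, 3])))
```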

13.
Censusing seabirds from coastal areas requires reliable estimates of bird numbers and of the distances of the birds from the coastline. Logistical constraints make visual estimation of distances the only feasible method in many studies. We tested the accuracy of visually estimated offshore distances for six migratory seabird species in the Strait of Gibraltar using simultaneous measurements obtained by radar. Most birds (91%) were detected within 3 km of the coast, and we truncated our calibration at this distance. We found a strong correlation between radar and visual estimates (R² = 0.83, P < 0.0001). The magnitude of errors in visual estimates was moderate, ranging from 0.08 to 0.20 for different distances and observers. Among the factors potentially affecting the accuracy of visual estimates of distances to seabirds were observer identity, bird species, bird behavior, and weather; the most parsimonious model included observer identity as the only predictor, and no model with more than one predictor had a smaller Akaike's information criterion value. Radar can be used to help train observers and to reduce biases in visual estimates of distances by means of calibration. When no other methods are available to accurately measure distances to seabirds, visual estimates recorded by experienced observers, once calibrated with radar (or other ground-truthing methods), may be acceptable for different species under a wide range of environmental conditions.
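A sketch of radar-based calibration of visual distance estimates via a simple linear fit; the numbers are toy values, and, per the finding that observer identity was the best predictor of error, separate calibrations per observer may be warranted.

```python
# Sketch: calibrate visual distance estimates against radar "truth"
# with a linear fit, then apply the calibration to new estimates.
import numpy as np

radar = np.array([300, 800, 1500, 2200, 2900], dtype=float)   # m, toy
visual = np.array([350, 900, 1350, 2500, 2600], dtype=float)  # m, toy
slope, intercept = np.polyfit(visual, radar, 1)

new_estimates = np.array([500.0, 1800.0])
print(slope * new_estimates + intercept)   # calibrated distances
```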

14.
Heritability is a central parameter in quantitative genetics, from both an evolutionary and a breeding perspective. For plant traits, heritability is traditionally estimated by comparing within- and between-genotype variability. This approach estimates broad-sense heritability and does not account for different degrees of genetic relatedness. With the availability of high-density markers there is growing interest in marker-based estimates of narrow-sense heritability, using mixed models in which genetic relatedness is estimated from genetic markers. Such estimates have received much attention in human genetics but are rarely reported for plant traits. A major obstacle is that current methodology and software assume a single phenotypic value per genotype, hence requiring genotypic means. An alternative that we propose here is to use mixed models at the individual plant or plot level. Using statistical arguments, simulations, and real data, we investigate the feasibility of both approaches and how they affect genomic prediction with the best linear unbiased predictor and genome-wide association studies. Heritability estimates obtained from genotypic means had very large standard errors and were sometimes biologically unrealistic. Mixed models at the individual plant or plot level produced more realistic estimates, and for simulated traits standard errors were up to 13 times smaller. Genomic prediction was also improved by using these mixed models, with up to a 49% increase in accuracy. For genome-wide association studies on simulated traits, the use of individual plant data gave almost no increase in power. The new methodology is applicable to any complex trait where multiple replicates of individual genotypes can be scored. This includes important agronomic crops, as well as bacteria and fungi.

15.
Human subjects are proficient at tracking the mean and variance of rewards and updating these via prediction errors. Here, we addressed whether humans can also learn about higher-order relationships between distinct environmental outcomes, a defining ecological feature of contexts where multiple sources of rewards are available. By manipulating the degree to which distinct outcomes are correlated, we show that subjects implemented an explicit model-based strategy to learn the associated outcome correlations and were adept in using that information to dynamically adjust their choices in a task that required a minimization of outcome variance. Importantly, the experimentally generated outcome correlations were explicitly represented neuronally in right midinsula with a learning prediction error signal expressed in rostral anterior cingulate cortex. Thus, our data show that the human brain represents higher-order correlation structures between rewards, a core adaptive ability whose immediate benefit is optimized sampling.

16.
Conway-Cranos LL, Doak DF (2011). Oecologia 167(1):199–207.
Repeated, spatially explicit sampling is widely used to characterize the dynamics of sessile communities in both terrestrial and aquatic systems, yet our understanding of the consequences of errors made in such sampling is limited. In particular, when Markov transition probabilities are calculated by tracking individual points over time, misidentification of the same spatial locations will result in biased estimates of transition probabilities, successional rates, and community trajectories. Nonetheless, to date, all published studies that use such data have implicitly assumed that resampling occurs without error when making estimates of transition rates. Here, we develop and test a straightforward maximum likelihood approach, based on simple field estimates of resampling errors, to arrive at corrected estimates of transition rates between species in a rocky intertidal community. We compare community Markov models based on raw and corrected transition estimates using data from Endocladia muricata-dominated plots in a California intertidal assemblage, finding that uncorrected predictions of succession consistently overestimate recovery time. We tested the precision and accuracy of the approach using simulated datasets and found good performance of our estimation method over a range of realistic sample sizes and error rates.
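A sketch of the correction idea as a moment-style simplification (not the authors' maximum likelihood estimator): if a field-estimated fraction eps of points is relocated effectively at random, the observed transition matrix is a mixture P_obs = (1 - eps) * P_true + eps * Q, which can be inverted for P_true. All matrices are toy values.

```python
# Sketch: recover transition probabilities from an observed matrix
# contaminated by random relocation error at rate eps.
import numpy as np

P_obs = np.array([[0.70, 0.20, 0.10],
                  [0.15, 0.75, 0.10],
                  [0.05, 0.25, 0.70]])      # toy observed transitions
freq = np.array([0.3, 0.5, 0.2])            # toy overall cover frequencies
eps = 0.08                                  # field-estimated resampling error

Q = np.tile(freq, (3, 1))                   # relocated points land anywhere
P_true = (P_obs - eps * Q) / (1 - eps)
P_true = np.clip(P_true, 0, None)
P_true /= P_true.sum(axis=1, keepdims=True) # renormalize rows
print(P_true)
```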

17.
Recently released data on non-cancer mortality in Japanese atomic bomb survivors are analysed using a variety of generalised relative risk models that take account of errors in dose estimates in order to assess the dose-response at low doses. If linear-threshold, quadratic-threshold or linear-quadratic-threshold relative risk models (the dose-response is assumed to be linear, quadratic or linear-quadratic above the threshold, respectively) are fitted to the non-cancer data, there are no statistically significant (p>0.10) indications of threshold departures from linearity, quadratic curvature or linear-quadratic curvature. These findings hold irrespective of the assumed magnitude of dosimetric error, for geometric standard deviations between 25% and 45%. In general, increasing the assumed magnitude of dosimetric error had little effect on the central estimates of the threshold, but somewhat widened the associated confidence intervals. If a power-of-dose model is fitted, there is little evidence (p>0.10) that the power of dose in the dose-response is statistically significantly different from 1, again irrespective of the assumed magnitude of dosimetric errors in the range 25%-45%. Again, increasing the size of the errors resulted in wider confidence intervals on the power of dose, without marked effect on the central estimates. In general these findings remain true for various non-cancer disease subtypes.

18.
Exposure measurement error can result in a biased estimate of the association between an exposure and an outcome. When the exposure–outcome relationship is linear on the appropriate scale (e.g. linear, logistic) and the measurement error is classical, that is, the result of random noise, the result is attenuation of the effect. When the relationship is non-linear, measurement error distorts the true shape of the association. Regression calibration is a commonly used method for correcting for measurement error, in which each individual's unknown true exposure in the outcome regression model is replaced by its expectation conditional on the error-prone measure and any fully measured covariates. Regression calibration is simple to execute when the exposure is untransformed in the linear predictor of the outcome regression model, but less straightforward when non-linear transformations of the exposure are used. We describe a method for applying regression calibration in models in which a non-linear association is modelled by transforming the exposure using a fractional polynomial model. It is shown that taking a Bayesian estimation approach is advantageous: by use of Markov chain Monte Carlo algorithms, one can sample from the distribution of the true exposure for each individual. Transformations of the sampled values can then be performed directly and used to find the expectation of the transformed exposure required for regression calibration. A simulation study shows that the proposed approach performs well. We apply the method to investigate the relationship between usual alcohol intake and subsequent all-cause mortality, using an error model that adjusts for the episodic nature of alcohol consumption.
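A minimal regression-calibration sketch for the untransformed-exposure case, where classical error with known variances gives E[X | W] in closed form; the fractional-polynomial and Bayesian machinery described above addresses the harder transformed-exposure case.

```python
# Sketch: regression calibration for a linear outcome model. Replace the
# error-prone exposure W by E[X | W] before fitting; under classical
# error with known variances, E[X | W] = W * var_x / (var_x + var_u).
import numpy as np

rng = np.random.default_rng(3)
n, beta, var_x, var_u = 5000, 0.4, 1.0, 0.5
x = rng.normal(0, np.sqrt(var_x), n)         # true exposure
w = x + rng.normal(0, np.sqrt(var_u), n)     # error-prone measurement
y = beta * x + rng.normal(0, 1, n)

naive = np.polyfit(w, y, 1)[0]               # attenuated estimate
x_hat = w * var_x / (var_x + var_u)          # E[X | W]
calibrated = np.polyfit(x_hat, y, 1)[0]
print(f"naive {naive:.3f}  calibrated {calibrated:.3f}  true {beta}")
```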

19.
Population stratification may confound the results of genetic association studies among unrelated individuals from admixed populations. Several methods have been proposed to estimate the ancestral information in admixed populations and used to adjust for population stratification in genetic association tests. We evaluate the performance of three different methods: maximum likelihood estimation, ADMIXMAP and Structure, using various simulated data sets and real data from Latino subjects participating in a genetic study of asthma. All three methods provide similar information on the accuracy of ancestral estimates and control the type I error rate at approximately the same level. The most important factor in determining the accuracy of the ancestry estimate and in minimizing the type I error rate is the number of markers used to estimate ancestry. We demonstrate that approximately 100 ancestry informative markers (AIMs) are required to obtain ancestry estimates that correlate with the true individual ancestral proportions with correlation coefficients greater than 0.9. In addition, after accounting for the ancestry information in association tests, the excess type I error rate is controlled at the 5% level when 100 markers are used to estimate ancestry. However, since the effect of admixture on the type I error rate worsens with sample size, the accuracy of ancestry estimates also needs to increase to make the appropriate correction. Using data from the Latino subjects, we also apply these methods to an association study between body mass index and 44 AIMs. These simulations are meant to provide some practical guidelines for investigators conducting association studies in admixed populations.

20.
Pennello GA, Devesa SS, Gail MH (1999). Biometrics 55(3):774–781.
Commonly used methods for depicting geographic variation in cancer rates are based on rankings. They identify where the rates are high and low but indicate neither the magnitude of the rates nor their variability. Yet such measures of variability may be useful in suggesting which types of cancer warrant further analytic studies of localized risk factors. We consider a mixed effects model in which the logarithm of the mean Poisson rate is additive in fixed stratum effects (e.g., age effects) and in logarithms of random relative risk effects associated with geographic areas. These random effects are assumed to follow a gamma distribution with unit mean and variance 1/alpha, similar to Clayton and Kaldor (1987, Biometrics 43, 671-681). We present maximum likelihood and method-of-moments estimates with standard errors for inference on alpha^(-1/2), the relative risk standard deviation (RRSD). The moment estimates rely on only the first two moments of the Poisson and gamma distributions but have larger standard errors than the maximum likelihood estimates. We compare these estimates with other measures of variability. Several examples suggest that the RRSD estimates have advantages over other measures of variability.
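A sketch of an unweighted method-of-moments estimate of the RRSD: with O_i ~ Poisson(E_i * theta_i) and theta_i ~ Gamma(mean 1, variance 1/alpha), Var(O_i/E_i) = 1/E_i + 1/alpha, so 1/alpha is estimated from the excess variance of the standardized ratios. A simplified version of the moment estimator discussed above.

```python
# Sketch: method-of-moments RRSD estimate for a Poisson-gamma model.
import numpy as np

rng = np.random.default_rng(4)
n_areas, alpha = 200, 25.0                  # true RRSD = alpha**-0.5 = 0.2
E = rng.uniform(20, 200, n_areas)           # expected counts per area
theta = rng.gamma(alpha, 1 / alpha, n_areas)  # relative risks, mean 1
O = rng.poisson(E * theta)                  # observed counts

smr = O / E
inv_alpha_hat = np.mean((smr - 1) ** 2 - 1 / E)  # excess variance
print(f"RRSD estimate: {np.sqrt(max(inv_alpha_hat, 0)):.3f}  "
      f"true: {alpha ** -0.5:.3f}")
```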
