Similar documents
1.
We consider the estimation of the scaled mutation parameter θ, one of the key parameters of interest in population genetics. We provide a general result showing when estimators of θ can be improved by shrinkage, taking the mean squared error as the measure of performance. As a consequence, we show that Watterson’s estimator is inadmissible, and we propose an alternative shrinkage-based estimator that is easy to calculate and has a smaller mean squared error than Watterson’s estimator for all possible parameter values 0 < θ < ∞. This estimator is admissible in the class of all linear estimators. We then derive improved versions of other estimators of θ, including the MLE. We also investigate how an improvement can be obtained both when combining information from several independent loci and when explicitly taking recombination into account. A simulation study quantifies the improvement achieved by our alternative estimators.
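The abstract does not state the formula of the proposed estimator, so the following is only a minimal sketch of why linear shrinkage of Watterson's estimator can help, assuming the standard neutral infinite-sites moments without recombination: E[S] = θa_n and Var[S] = θa_n + θ²b_n, with a_n = Σ 1/i and b_n = Σ 1/i² over i = 1..n−1. For a linear estimator cS, MSE(c) = c²θa_n + θ²[c²b_n + (ca_n − 1)²]. Taking c = a_n/(a_n² + b_n) minimizes the θ² bracket and also satisfies c < 1/a_n, so both coefficients are smaller than for Watterson's c = 1/a_n and the MSE is smaller at every θ > 0. This derivation is ours and may or may not coincide with the authors' estimator; all function names are hypothetical.

```python
def harmonic_sums(n: int) -> tuple[float, float]:
    """Return a_n = sum 1/i and b_n = sum 1/i^2 for i = 1..n-1 (n = sample size)."""
    a = sum(1.0 / i for i in range(1, n))
    b = sum(1.0 / (i * i) for i in range(1, n))
    return a, b


def watterson(s: int, n: int) -> float:
    """Watterson's estimator: number of segregating sites S divided by a_n."""
    a, _ = harmonic_sums(n)
    return s / a


def shrunk_watterson(s: int, n: int) -> float:
    """Linear shrinkage c*S with c = a_n / (a_n**2 + b_n) (derivation in the lead-in)."""
    a, b = harmonic_sums(n)
    return s * a / (a * a + b)


def mse_linear(c: float, theta: float, n: int) -> float:
    """MSE of c*S, using E[S] = theta*a_n and Var[S] = theta*a_n + theta**2*b_n."""
    a, b = harmonic_sums(n)
    variance = c * c * (theta * a + theta * theta * b)
    bias = (c * a - 1.0) * theta
    return variance + bias * bias


if __name__ == "__main__":
    n = 10
    a, b = harmonic_sums(n)
    for theta in (0.1, 1.0, 10.0):
        # Columns: theta, MSE of Watterson, MSE of the shrunk estimator
        print(theta, mse_linear(1 / a, theta, n), mse_linear(a / (a * a + b), theta, n))
```

The comparison relies entirely on the assumed moments of S; with recombination the variance of S is smaller and the optimal shrinkage factor changes.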

2.
We show that the number of segregating sites is a sufficient statistic for the scaled mutation parameter (θ) in the limit as the number of sites tends to infinity and there is free recombination between sites. We assume that the mutation parameter at each site tends to zero such that the total mutation parameter (θ) is constant in the limit. Our results show that Watterson’s estimator is the maximum likelihood estimator in this case, but that it estimates a composite parameter which differs between mutation models. Some of our results hold when recombination is limited, because Watterson’s estimator is an unbiased, method-of-moments estimator regardless of the recombination rate. The quantity it estimates depends on the details of how mutations occur at each site.

3.
Estimating effective population size or mutation rate with microsatellites (times cited: 4; self-citations: 0; citations by others: 4)
Xu H, Fu YX. Genetics 2004;166(1):555-563
Microsatellites are short tandem repeats that are widely dispersed among eukaryotic genomes. Many of them are highly polymorphic and have been used widely in genetic studies. Statistical properties of all measures of genetic variation at microsatellites depend critically on the composite parameter θ = 4Nμ, where N is the effective population size and μ is the mutation rate per locus per generation. Since mutation expands or contracts the repeat number in a stepwise fashion, the stepwise mutation model has been widely used to study the dynamics of these loci. We developed an estimator of θ, θ̂_F, based on sample homozygosity under the single-step stepwise mutation model. The estimator is unbiased and much more efficient than the variance-based estimator under the single-step stepwise mutation model. It also has smaller bias and mean square error (MSE) than the variance-based estimator when mutation follows the multistep generalized stepwise mutation model. Compared with the maximum-likelihood estimator θ̂_L, θ̂_F generally has less bias and smaller MSE. θ̂_L has a slight advantage when θ is small, but in that situation the bias in θ̂_L may be more of a concern.
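As a rough illustration of the two kinds of estimator compared above, the sketch below computes (a) the variance-based estimator, using the single-step stepwise mutation model result that the expected sample variance of repeat numbers is θ/2, and (b) a plain moment estimator from homozygosity, obtained by inverting the classical single-step result E[F] = 1/√(1 + 2θ). This is an assumption-labeled sketch: the homozygosity estimator here is the naive inversion, not the bias-corrected θ̂_F of Xu and Fu, and the function names are hypothetical.

```python
import math
from collections import Counter
from statistics import variance


def theta_variance(repeats: list[int]) -> float:
    """Variance-based estimator: under the single-step SMM, E[sample variance] = theta / 2."""
    return 2.0 * variance(repeats)


def theta_homozygosity(repeats: list[int]) -> float:
    """Naive moment estimator inverting E[F] = 1 / sqrt(1 + 2 * theta)."""
    n = len(repeats)
    counts = Counter(repeats)
    # Unbiased sample homozygosity: probability that two genes drawn
    # without replacement carry the same repeat number.
    f_hat = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    if f_hat <= 0.0:
        return math.inf  # no identical pairs observed: theta unbounded
    return (1.0 / f_hat**2 - 1.0) / 2.0


if __name__ == "__main__":
    sample = [10, 10, 11, 12, 10, 11]  # toy repeat counts
    print(theta_variance(sample), theta_homozygosity(sample))
```

For small samples the two numbers can differ widely, which is one motivation for the efficiency comparison reported in the abstract.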

4.
Small area estimation methods typically combine direct estimates from a survey with predictions from a model in order to obtain estimates of population quantities with reduced mean squared error. When the auxiliary information used in the model is measured with error, using a small area estimator such as the Fay–Herriot estimator while ignoring measurement error may be worse than simply using the direct estimator. We propose a new small area estimator that accounts for sampling variability in the auxiliary information, and derive its properties, in particular showing that it is approximately unbiased. The estimator is applied to predict quantities measured in the U.S. National Health and Nutrition Examination Survey, with auxiliary information from the U.S. National Health Interview Survey.
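For orientation, the combination of direct estimates and model predictions described above can be sketched with the standard Fay–Herriot composite: each area's estimate is γ_i y_i + (1 − γ_i) x_iᵀβ̂ with γ_i = σ²_v / (σ²_v + ψ_i), where ψ_i is the sampling variance of the direct estimate and σ²_v the model variance. This sketch assumes the synthetic predictions x_iᵀβ̂ and σ²_v are already available; it does not include the measurement-error correction proposed in the paper, and the function name is hypothetical.

```python
def fay_herriot(direct: list[float], synthetic: list[float],
                psi: list[float], sigma2_v: float) -> list[float]:
    """Standard Fay-Herriot composite estimates (no measurement-error correction)."""
    if not (len(direct) == len(synthetic) == len(psi)):
        raise ValueError("inputs must have one entry per small area")
    estimates = []
    for y, pred, samp_var in zip(direct, synthetic, psi):
        # Shrinkage weight: noisy direct estimates (large psi) lean on the model.
        gamma = sigma2_v / (sigma2_v + samp_var)
        estimates.append(gamma * y + (1.0 - gamma) * pred)
    return estimates


if __name__ == "__main__":
    print(fay_herriot([10.0, 3.0], [6.0, 5.0], [4.0, 0.0], 4.0))
```

An area with ψ_i = 0 keeps its direct estimate, while a very noisy area is pulled toward the model prediction; errors in the auxiliary x_i distort exactly that prediction, which is the problem the abstract addresses.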

5.
It is generally accepted that mutation rates of RNA viruses are inherently high due to the lack of proofreading mechanisms. However, direct estimates of mutation rate are surprisingly scarce, in particular for plant viruses. Here, based on the analysis of in vivo mutation frequencies in tobacco etch virus, we calculate an upper bound on the mutation rate of 3×10⁻⁵ per site per round of replication, a value that turns out to be indistinguishable from the methodological error. Nonetheless, the value lies at the very low end of the range accepted for RNA viruses, in good agreement with the only direct estimate obtained for other plant viruses. These observations suggest that differences in the selective pressures operating during plant virus evolution may have driven their mutation rates toward values lower than those characteristic of other RNA viruses infecting bacteria or animals.

6.
In sample surveys, auxiliary information is commonly used to increase the precision of estimators. We propose a new exponential ratio-type estimator of a finite population mean using a linear combination of two auxiliary variables and derive its mean square error (MSE). We find theoretical conditions under which the proposed estimator is more efficient than the traditional multivariate ratio estimator using two auxiliary variables, the estimator of Bahl and Tuteja, and the estimator proposed by Abu-Dayeh et al. We support these theoretical results with two numerical examples.
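Two of the single-auxiliary baselines can be written in one line each: the classical ratio estimator ȳX̄/x̄ and the Bahl–Tuteja exponential ratio estimator ȳ·exp[(X̄ − x̄)/(X̄ + x̄)], where ȳ and x̄ are sample means and X̄ is the known population mean of the auxiliary variable. The sketch below shows only these baselines; the paper's two-variable estimator is not reproduced, and the function names are hypothetical.

```python
import math


def ratio_estimator(ybar: float, xbar: float, Xbar: float) -> float:
    """Classical ratio estimator of the population mean of y."""
    return ybar * Xbar / xbar


def bahl_tuteja(ybar: float, xbar: float, Xbar: float) -> float:
    """Exponential ratio-type estimator of Bahl and Tuteja (one auxiliary variable)."""
    return ybar * math.exp((Xbar - xbar) / (Xbar + xbar))


if __name__ == "__main__":
    # Sample under-represents x (xbar < Xbar), so both estimators adjust upward.
    print(ratio_estimator(10.0, 4.0, 6.0), bahl_tuteja(10.0, 4.0, 6.0))
```

The exponential form adjusts ȳ more gently than the classical ratio, which is why it can be more efficient when the correlation between y and x is moderate.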

7.
Reynolds J, Weir BS, Cockerham CC. Genetics 1983;105(3):767-779
A distance measure for populations diverging by drift only is based on the coancestry coefficient θ, and three estimators of the distance D = -ln(1 - θ) are constructed for multiallelic, multilocus data. Simulations of a monoecious population mating at random showed that a weighted ratio of single-locus estimators performed better than an unweighted average or a least squares estimator. Jackknifing over loci provided satisfactory variance estimates of distance values. In the drift situation, in which mutation is excluded, the weighted estimator of D appears to be a better measure of distance than others that have appeared in the literature.
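The "weighted ratio" and "jackknifing over loci" steps can be sketched generically: pool per-locus numerator and denominator components into θ̂ = Σ num_l / Σ den_l, transform to D = −ln(1 − θ̂), and estimate its variance by deleting one locus at a time. The per-locus components are assumed to have been computed already from allele frequencies (their exact formulas in the paper are not reproduced here), and the function names are hypothetical.

```python
import math


def distance_from_components(num: list[float], den: list[float]) -> float:
    """Ratio-of-sums coancestry estimate across loci, transformed to D = -ln(1 - theta)."""
    theta = sum(num) / sum(den)
    return -math.log(1.0 - theta)


def jackknife_over_loci(num: list[float], den: list[float]) -> tuple[float, float]:
    """Return (D from all loci, delete-one-locus jackknife variance of D)."""
    n_loci = len(num)
    d_all = distance_from_components(num, den)
    leave_one_out = [
        distance_from_components(num[:i] + num[i + 1:], den[:i] + den[i + 1:])
        for i in range(n_loci)
    ]
    mean_loo = sum(leave_one_out) / n_loci
    var = (n_loci - 1) / n_loci * sum((d - mean_loo) ** 2 for d in leave_one_out)
    return d_all, var


if __name__ == "__main__":
    print(jackknife_over_loci([0.10, 0.20, 0.15], [1.0, 1.0, 1.0]))
```

Pooling before dividing weights each locus by its denominator, so uninformative loci (small denominators) contribute little, in the spirit of the weighted ratio favored by the simulations.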

8.
Ueki M, Fueda K. Biometrika 2007;94(2):509-511
This note presents a direct adjustment of the estimative prediction limit that reduces the coverage error from a target value to third-order accuracy. The adjustment is asymptotically equivalent to those of Barndorff-Nielsen & Cox (1994, 1996) and Vidoni (1998). It has a simpler form, with a plug-in estimator of the coverage probability of the estimative limit at the target value.
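To make "coverage error of the estimative limit" concrete, the sketch below simulates the naive plug-in 95% upper prediction limit x̄ + z₀.₉₅·s for a future observation from a normal sample and reports its actual coverage. It only illustrates the problem the note addresses; the note's adjustment itself is not implemented, and the function name and defaults are hypothetical.

```python
import random
import statistics


def estimative_coverage(n: int, alpha: float = 0.05, reps: int = 10000, seed: int = 1) -> float:
    """Monte Carlo coverage of the plug-in limit mean + z_{1-alpha} * sd for N(0, 1) data."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(1.0 - alpha)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        limit = statistics.fmean(sample) + z * statistics.stdev(sample)
        future = rng.gauss(0.0, 1.0)  # independent future observation
        if future <= limit:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for n in (5, 20, 100):
        print(n, estimative_coverage(n))
```

For small n the coverage falls well short of the 0.95 target because the limit ignores the estimation error in x̄ and s; the discrepancy shrinks as n grows, and higher-order adjustments aim to remove it faster.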

9.
Neural responses are known to be variable. To understand how this neural variability constrains behavioral performance, we need to measure the reliability with which a sensory stimulus is encoded in a given population. Such measures are challenging for two reasons. First, they must take into account noise correlations, which can have a large influence on reliability. Second, they need to be as efficient as possible, since the number of trials available in a set of neural recordings is usually limited by experimental constraints. Traditionally, cross-validated decoding has been used as a reliability measure, but it provides only a lower bound on reliability and substantially underestimates it in small datasets. We show that, if the number of trials per condition is larger than the number of neurons, there is an alternative, direct estimate of reliability that consistently leads to smaller errors and is much faster to compute. The superior performance of the direct estimator is evident both for simulated data and for neuronal population recordings from macaque primary visual cortex. Furthermore, we propose generalizations of the direct estimator that measure changes in stimulus encoding across conditions and the impact of correlations on encoding and decoding, typically denoted by I_shuffle and I_diag, respectively.
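Assuming the reliability measure is linear Fisher information (an assumption here; the abstract does not name it), a naive direct estimate for two stimuli s and s + δs is I = f′ᵀ Σ⁻¹ f′, where f′ is the difference of mean responses divided by δs and Σ is the pooled noise covariance. The sketch below computes this plug-in quantity for two neurons; it omits any bias correction the authors apply, and the need for more trials than neurons shows up as the requirement that Σ be invertible. Function names are hypothetical.

```python
def _mean2(rows: list[tuple[float, float]]) -> list[float]:
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def _cov2(rows: list[tuple[float, float]]) -> list[list[float]]:
    """Sample covariance (n - 1 denominator) of two-neuron responses."""
    m = _mean2(rows)
    n = len(rows)
    c = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - m[0], r[1] - m[1])
        for i in range(2):
            for j in range(2):
                c[i][j] += d[i] * d[j] / (n - 1)
    return c


def linear_fisher_info_2d(trials_a, trials_b, ds: float) -> float:
    """Naive plug-in linear Fisher information f'^T Sigma^{-1} f' for two neurons."""
    ma, mb = _mean2(trials_a), _mean2(trials_b)
    df = [(mb[k] - ma[k]) / ds for k in range(2)]
    ca, cb = _cov2(trials_a), _cov2(trials_b)
    s = [[(ca[i][j] + cb[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det <= 0.0:
        raise ValueError("pooled covariance is singular; need more trials than neurons")
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    return sum(df[i] * inv[i][j] * df[j] for i in range(2) for j in range(2))


if __name__ == "__main__":
    a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    b = [(1.0, 0.0), (3.0, 0.0), (1.0, 2.0), (3.0, 2.0)]
    print(linear_fisher_info_2d(a, b, 1.0))
```

With few trials this plug-in value is biased upward, which is why the choice of estimator matters in small datasets.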

10.
Metagenomic sequencing projects from environments dominated by a small number of species produce genome-wide population samples. We present a two-site composite likelihood estimator of the scaled recombination rate, ρ = 2N_e c, that operates on metagenomic assemblies in which each sequenced fragment derives from a different individual. This new estimator properly accounts for sequencing error, as quantified by per-base quality scores, and for missing data, as inferred from the placement of reads in a metagenomic assembly. We apply our estimator to data from a sludge metagenome project to demonstrate how this method will elucidate the rates of exchange of genetic material in natural microbial populations. Surprisingly, for a fixed amount of sequencing, this estimator has lower variance than similar methods that operate on more traditional population genetic samples of comparable size. In addition, we can infer variation in recombination rate across the genome because metagenomic projects sample genetic diversity genome-wide, not just at particular loci. The method itself makes no assumptions specific to microbial populations, opening the door to application to any mixed population sample where the number of individuals sampled is much greater than the number of fragments sequenced.

11.
Assessment of the misclassification error rate is of high practical relevance in many biomedical applications. As it is a complex problem, theoretical results on estimator performance are few. Most findings come from Monte Carlo simulations in the "normal setting": the covariables of two groups have a multivariate normal distribution, the groups differ in location but have the same covariance matrix, and the linear discriminant function (LDF) is used for prediction. We perform a new simulation to compare existing nonparametric estimators in a more complex situation. The underlying distribution is based on a logistic model with six binary as well as continuous covariables. To study estimator performance for varying true error rates, three prediction rules, including nonparametric classification trees and parametric logistic regression, and sample sizes ranging from 100 to 1,000 are considered. In contrast to most published papers, we focus on estimator performance based on simple, even inappropriate, prediction rules and relatively large training sets. For the most part, the results agree with the usual findings. The most striking behavior was seen when (simple) classification trees were used for prediction: since the apparent error rate Êrr.app is biased, linear combinations incorporating Êrr.app underestimate the true error rate even for large sample sizes. The .632+ estimator, which was designed to correct the overoptimism of Efron's .632 estimator for nonparametric prediction rules, performs best of all such linear combinations. The bootstrap estimator Êrr.B0 and the cross-validation estimator Êrr.cv, which do not depend on Êrr.app, seem to track the true error rate. Although the disadvantages of both estimators (pessimism of Êrr.B0 and high variability of Êrr.cv) shrink with increasing sample size, they are still visible.
We conclude that the asymptotic behavior of the apparent error rate is important when choosing a particular estimator. When assessing estimator performance, the variance of the true error rate is crucial, and in general the stability of the prediction procedure is essential for applying estimators based on resampling methods. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
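The linear combinations discussed above can be sketched from their usual definitions (Efron 1983; Efron and Tibshirani 1997): Err.632 = 0.368·err_app + 0.632·Err(1), and Err.632+ replaces the fixed 0.632 with w = 0.632 / (1 − 0.368·R), where R = (Err(1) − err_app)/(γ − err_app) is the relative overfitting rate and γ the no-information error rate. Here the out-of-bootstrap error is assumed to play the role of Err(1) (treated as Êrr.B0 in this sketch, an assumption about notation), and the function names are hypothetical.

```python
def err_632(err_app: float, err_oob: float) -> float:
    """Efron's .632 estimator: fixed convex combination of apparent and out-of-bootstrap error."""
    return 0.368 * err_app + 0.632 * err_oob


def err_632_plus(err_app: float, err_oob: float, gamma: float) -> float:
    """.632+ estimator with the relative overfitting rate R clipped to [0, 1]."""
    err1 = min(err_oob, gamma)  # out-of-bootstrap error capped at the no-information rate
    if err_oob > err_app and gamma > err_app:
        r = (err1 - err_app) / (gamma - err_app)
    else:
        r = 0.0
    r = min(max(r, 0.0), 1.0)
    w = 0.632 / (1.0 - 0.368 * r)  # w moves from 0.632 (no overfit) toward 1 (severe overfit)
    return (1.0 - w) * err_app + w * err1


if __name__ == "__main__":
    # An overfitting rule: apparent error 0.1, out-of-bootstrap error 0.3, no-information rate 0.5.
    print(err_632(0.1, 0.3), err_632_plus(0.1, 0.3, 0.5))
```

When the apparent error is strongly optimistic (R near 1), the .632+ weight approaches 1 and the estimate relies almost entirely on the out-of-bootstrap error, which is how it limits the downward bias that Êrr.app introduces.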

12.
The problem of estimating the ratio of two population proportions is considered, and a difference-type estimator using auxiliary information is proposed. The bias and mean squared error of the proposed estimator are derived and compared with those of the usual estimator and of Wynn's (1976) type estimator. An example is included for illustration.

13.
The asymptotic error rate of the equal-mean, uniform-covariance-matrix classification rule is approximated by a first-order asymptotic expansion. The accuracy of the approximation is checked against a Monte Carlo simulation. Finally, an estimator of the error rate and an estimator of the variance of the error rate estimator are derived and applied to a classical example.

14.
Convex bootstrap error estimation is a popular tool for classifier error estimation in gene expression studies. A basic question is how to determine the weight for the convex combination of the basic bootstrap estimator and the resubstitution estimator such that the resulting estimator is unbiased at finite sample sizes. The well-known 0.632 bootstrap error estimator uses asymptotic arguments to propose a fixed 0.632 weight, whereas the more recent 0.632+ bootstrap error estimator attempts to set the weight adaptively. In this paper, we study the finite-sample problem in the case of linear discriminant analysis under Gaussian populations. We derive exact expressions for the weight that guarantees unbiasedness of the convex bootstrap error estimator in the univariate and multivariate cases, without making asymptotic simplifications. Using exact computation in the univariate case and an accurate approximation in the multivariate case, we obtain the required weight and show that it can deviate significantly from the constant 0.632 weight, depending on the sample size and the Bayes error of the problem. The methodology is illustrated by application to data from a well-known cancer classification study.
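The unbiasedness condition behind the "required weight" reduces to a one-line calculation once the three expectations are known. Writing ε̂_resub for the resubstitution estimator, ε̂_boot for the basic bootstrap estimator, and ε_n for the true error of a classifier trained on n samples, the weight w* follows by solving a linear equation in w (assuming E[ε̂_boot] ≠ E[ε̂_resub]). The paper's contribution is computing these expectations exactly for LDA under Gaussian populations, which is not reproduced here.

```latex
\hat\varepsilon_w = (1-w)\,\hat\varepsilon_{\mathrm{resub}} + w\,\hat\varepsilon_{\mathrm{boot}},
\qquad
\mathbb{E}[\hat\varepsilon_w] = \mathbb{E}[\varepsilon_n]
\;\Longleftrightarrow\;
w^{*} = \frac{\mathbb{E}[\varepsilon_n] - \mathbb{E}[\hat\varepsilon_{\mathrm{resub}}]}
             {\mathbb{E}[\hat\varepsilon_{\mathrm{boot}}] - \mathbb{E}[\hat\varepsilon_{\mathrm{resub}}]}
```

The fixed 0.632 rule corresponds to assuming w* = 0.632 regardless of n and the Bayes error; the exact w* can differ from it whenever the resubstitution and bootstrap biases do not stand in that fixed proportion.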

15.
Ratio estimation with measurement error in the auxiliary variate (times cited: 1; self-citations: 0; citations by others: 1)
Gregoire TG, Salas C. Biometrics 2009;65(2):590-598
Summary. With auxiliary information that is well correlated with the primary variable of interest, ratio estimation of the finite population total may be much more efficient than alternative estimators that do not make use of the auxiliary variate. The well-known properties of ratio estimators are perturbed when the auxiliary variate is measured with error. In this contribution we examine the effect of measurement error in the auxiliary variate on the design-based statistical properties of three common ratio estimators. We examine the case of systematic measurement error as well as measurement error that varies according to a fixed distribution. Aside from presenting expressions for the bias and variance of these estimators when they are contaminated with measurement error, we provide numerical results based on a specific population. Under systematic measurement error, the biasing effect is asymmetric around zero, and precision may be improved or degraded depending on the magnitude of the error. Under variable measurement error, the bias of the conventional ratio-of-means estimator increased slightly with increasing error dispersion, but far less than the bias of the conventional mean-of-ratios estimator. Similarly, the mean-of-ratios estimator incurs a greater loss of precision with increasing error dispersion than the other estimators we examine. Overall, the ratio-of-means estimator appears to be remarkably resistant to the effects of measurement error in the auxiliary variate.
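The two conventional estimators named above differ only in the order of averaging and dividing: the ratio-of-means estimator of the total is (ȳ/x̄)·τ_x, while the mean-of-ratios estimator is mean(y_i/x_i)·τ_x, where τ_x is the known population total of the auxiliary variate. The sketch below assumes simple random sampling and a known τ_x; the third estimator studied in the paper is not reproduced, and the function names are hypothetical.

```python
def ratio_of_means_total(y: list[float], x: list[float], tau_x: float) -> float:
    """Ratio-of-means estimator of the population total: (ybar / xbar) * tau_x."""
    return sum(y) / sum(x) * tau_x


def mean_of_ratios_total(y: list[float], x: list[float], tau_x: float) -> float:
    """Mean-of-ratios estimator of the population total: mean(y_i / x_i) * tau_x."""
    return sum(yi / xi for yi, xi in zip(y, x)) / len(y) * tau_x


if __name__ == "__main__":
    y, x, tau_x = [2.0, 4.0, 6.0], [1.0, 2.0, 4.0], 12.0
    print(ratio_of_means_total(y, x, tau_x), mean_of_ratios_total(y, x, tau_x))
```

Because the mean-of-ratios form divides by each x_i individually, an error in a small x_i is magnified in that unit's ratio, which is consistent with its greater sensitivity to measurement error reported above. A measurement-error scenario can be explored by perturbing x before calling either function.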

16.
We analyze a decoupled Moran model with haploid population size N, a biallelic locus under mutation and drift with scaled forward and backward mutation rates θ_1 = μ_1 N and θ_0 = μ_0 N, and directional selection with scaled strength γ = sN. With small scaled mutation rates θ_0 and θ_1, which is appropriate for single nucleotide polymorphism data in highly recombining regions, we derive a simple approximate equilibrium distribution for polymorphic alleles with a constant of proportionality. We also put forth an even simpler model, in which all mutations originate from monomorphic states. Using this model we derive the sojourn times, conditional on the ancestral and fixed allele, and, under equilibrium, the distributions of fixed and polymorphic alleles and the fixation rates. Furthermore, we derive the distribution of small samples in the diffusion limit and provide convenient recurrence relations for calculating this distribution. This enables us to give formulas analogous to the Ewens-Watterson estimator of θ for biased mutation rates and selection. We apply this theory to a polymorphism dataset of fourfold degenerate sites in Drosophila melanogaster.

17.
The dominant paradigm for the evolution of mutator alleles in bacterial populations is that they spread by indirect selection for linked beneficial mutations when bacteria are poorly adapted. In this paper, we challenge the ubiquity of this paradigm by demonstrating that a clinically important stressor, hydrogen peroxide, generates direct selection for an elevated mutation rate in the pathogenic bacterium Pseudomonas aeruginosa as a consequence of a trade-off between the fidelity of DNA repair and hydrogen peroxide resistance. We demonstrate that the biochemical mechanism underlying this trade-off in the case of mutS is the elevated secretion of catalase by the mutator strain. Our results provide, to our knowledge, the first experimental evidence that direct selection can favour mutator alleles in bacterial populations, and pave the way for future studies to understand how mutation and DNA repair are linked to stress responses and how this affects the evolution of bacterial mutation rates.

18.
The estimation of relatedness within social groups, such as the colonies of a population of social insects, is an important field for evaluating hypotheses concerning the evolution and maintenance of social behaviour. The methodology of this estimation from genetic data in the absence of pedigree information has been poorly understood; we develop this methodology for b, the regression coefficient of relatedness, and discuss its applications. Both b and G (the pedigree coefficient of relatedness) are potentially asymmetric coefficients, whereas φ, r, and FST are necessarily symmetric. We develop an estimator for b suitable for small samples, and also one for its standard deviation, and examine the properties of both using sampling simulations. The b estimator returns values slightly below E(b), and the standard deviation estimator yields conservative confidence intervals. A comparative study of b and FST shows that, given the same set of data, b is estimated with greater reliability than is FST. As is the case for FST, b can be used to examine population structure at various levels, and b possesses the advantage of an estimator for its standard error, which can also be used to test for heterogeneity among the loci surveyed. The actual numbers of identical genes held in common by interacting individuals, and not simply their proportions, need to be considered in using coefficients of relatedness in inclusive fitness calculations. This necessity is handled by the weighted coefficients of relatedness, G′ and b′, which have been referred to in the literature as r (as have most relatedness measures).

19.
We describe a novel mutation in the coding region of the SRY gene in a 46,XY female with Swyer syndrome. Analysis of SRY was carried out by direct sequencing of a 780-bp PCR product that included the SRY open reading frame (ORF). This revealed the presence of a point mutation, ins 108A, in the coding region 5′ to the HMG box, which results in a frame shift and premature termination of the encoded protein. No other mutation was found in the SRY ORF. We infer that sex reversal in this individual is a result of this insertion. In none of the 13 other 46,XY females that were studied was a mutation detected in SRY, confirming earlier findings that most cases of XY femaleness are due to causes other than mutation in SRY. These observations and those of others are discussed in relation to the aetiology of XY sex reversal.

20.
There may be experiments in which, due to misadventure or for logistical or ethical reasons, final measurements cannot be obtained on all experimental units. If at least 50% of the final measurements have been taken, estimates of the lower quantiles and the median can be obtained. For such curtailed experiments it is shown how quantiles above those that can be estimated directly from the data set can be estimated indirectly by exploiting a property of symmetric distributions. The performance of the indirect quantile estimator is compared with that of the direct quantile estimator, and conditions under which the indirect estimator has smaller variance than the direct estimator are presented. It is also shown how the indirect estimator may be pooled with the direct estimator to obtain an improved estimate of the upper quantiles. When it cannot be assumed that the data come from a symmetric distribution, transformations to symmetry may be performed and the indirect estimation technique applied to the transformed data; back-transformation then yields the estimates of the upper quantiles.
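The symmetry property can be sketched directly: for a symmetric distribution with median m, the upper quantile satisfies q_{1−p} = 2m − q_p, so an unobservable upper quantile can be estimated from the median and the matching lower quantile. The sketch assumes that the observed measurements are the smallest values of the full sample (as when an experiment is curtailed before the slower units finish) and uses the simple order-statistic quantile x_(⌈pn⌉); the paper's pooled estimator is not reproduced, and the function names are hypothetical.

```python
import math


def lower_quantile(observed_sorted: list[float], n: int, p: float) -> float:
    """Order-statistic quantile x_(ceil(p*n)) using only the observed smallest values."""
    k = math.ceil(p * n)
    if k < 1 or k > len(observed_sorted):
        raise ValueError("quantile not estimable from the curtailed data")
    return observed_sorted[k - 1]


def indirect_upper_quantile(observed_sorted: list[float], n: int, p: float) -> float:
    """Estimate the (1 - p) quantile as 2 * median - (p quantile), assuming symmetry."""
    median = lower_quantile(observed_sorted, n, 0.5)
    return 2.0 * median - lower_quantile(observed_sorted, n, p)


if __name__ == "__main__":
    n = 10
    observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # only the 6 smallest of 10 values recorded
    print(indirect_upper_quantile(observed, n, 0.2))  # estimate of the 0.8 quantile
```

The 0.8 quantile cannot be read off the curtailed data directly (it would need the 8th order statistic), which is exactly the gap the indirect estimator fills; its quality depends on how well the symmetry assumption holds.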
