Similar Documents
20 similar documents found (search time: 31 ms)
1.
We consider the estimation of the scaled mutation parameter θ, which is one of the parameters of key interest in population genetics. We provide a general result showing when estimators of θ can be improved using shrinkage when taking the mean squared error as the measure of performance. As a consequence, we show that Watterson's estimator is inadmissible, and propose an alternative shrinkage-based estimator that is easy to calculate and has a smaller mean squared error than Watterson's estimator for all possible parameter values 0 < θ < ∞. This estimator is admissible in the class of all linear estimators. We then derive improved versions of other estimators of θ, including the MLE. We also investigate how an improvement can be obtained both when combining information from several independent loci and when explicitly taking into account recombination. A simulation study provides information about the amount of improvement achieved by our alternative estimators.
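Watterson's estimator and a shrinkage variant of the kind discussed can be sketched as follows. This is a minimal illustration: the plug-in factor below is the MSE-optimal linear multiple of S with Watterson's estimate substituted for θ, not the paper's exact estimator.

```python
import numpy as np

def watterson_theta(S, n):
    """Watterson's estimator: theta_W = S / a_n, with a_n = sum_{i=1}^{n-1} 1/i,
    where S is the number of segregating sites in a sample of n sequences."""
    a_n = sum(1.0 / i for i in range(1, n))
    return S / a_n

def shrunk_theta(S, n):
    """Illustrative linear shrinkage of Watterson's estimator.
    Under the coalescent, Var(S) = a_n*theta + b_n*theta^2 with
    b_n = sum_{i=1}^{n-1} 1/i^2.  The MSE-optimal multiple of S/a_n is
    a_n^2 / (a_n^2 + a_n/theta + b_n); plugging theta_W in for theta gives a
    simple data-driven shrinkage factor (a sketch, not the paper's estimator)."""
    a_n = sum(1.0 / i for i in range(1, n))
    b_n = sum(1.0 / i ** 2 for i in range(1, n))
    tw = S / a_n
    if tw == 0:
        return 0.0
    lam = a_n ** 2 / (a_n ** 2 + a_n / tw + b_n)
    return lam * tw
```

Because the shrinkage factor is strictly below one, the shrunk estimate always pulls Watterson's estimate toward zero.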

2.
Working memory is a key executive function for flying an aircraft. This function is particularly critical when pilots have to recall series of air traffic control instructions. However, working memory limitations may jeopardize flight safety. Since the functional near-infrared spectroscopy (fNIRS) method seems promising for assessing working memory load, our objective is to implement an on-line fNIRS-based inference system that integrates two complementary estimators. The first estimator is a real-time MACD-based state-estimation algorithm dedicated to identifying the pilot's instantaneous mental state (not-on-task vs. on-task); it does not require a calibration process to perform its estimation. The second estimator is an on-line SVM-based classifier that is able to discriminate task difficulty (low vs. high working memory load). These two estimators were tested with 19 pilots who were placed in a realistic flight simulator and asked to recall air traffic control instructions. We found that the estimated pilot's mental state matched the pilot's real state significantly better than chance (62% global accuracy, 58% specificity, and 72% sensitivity). The second estimator, dedicated to assessing single-trial working memory load, led to 80% classification accuracy, 72% specificity, and 89% sensitivity. These two estimators establish reusable blocks for further fNIRS-based passive brain-computer interface development.
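The MACD (moving average convergence divergence) idea behind the first estimator can be sketched on an arbitrary one-dimensional signal: label the state "on-task" when a fast exponential moving average pulls above a slow one. The window lengths and threshold here are illustrative placeholders, not values from the study.

```python
import numpy as np

def ema(x, span):
    """Exponential moving average with smoothing factor alpha = 2/(span+1)."""
    alpha = 2.0 / (span + 1)
    out = np.empty(len(x), dtype=float)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1 - alpha) * out[t - 1]
    return out

def macd_state(signal, fast=12, slow=26, threshold=0.0):
    """Label each time step 'on-task' when the MACD line (fast EMA minus
    slow EMA) exceeds a threshold."""
    macd = ema(signal, fast) - ema(signal, slow)
    return np.where(macd > threshold, "on-task", "not-on-task")
```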

3.
Guan Y. Biometrics 2011, 67(3): 926-936.
We introduce novel regression-extrapolation-based methods to correct the often large bias in subsampling variance estimation, as well as in hypothesis testing, for spatial point and marked point processes. For variance estimation, our proposed estimators are linear combinations of the usual subsampling variance estimators based on subblock sizes in a continuous interval. We show that they can achieve better rates in mean squared error than the usual subsampling variance estimator. In particular, for n×n observation windows, the optimal rate of n^{-2} can be achieved if the data have a finite dependence range. For hypothesis testing, we apply the proposed regression extrapolation directly to the test statistics based on different subblock sizes, and therefore avoid the need to conduct bias correction for each element of the covariance matrix used to set up the test statistics. We assess the numerical performance of the proposed methods through simulation, and apply them to analyze a tropical forest data set.
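The regression-extrapolation idea can be sketched in its simplest form: if subsampling variance estimates at block size b carry a bias of roughly order 1/b, regressing them on 1/b and taking the intercept extrapolates the bias away. Both the bias order and the linear-in-1/b model are illustrative assumptions here; the paper derives the appropriate combinations for point processes.

```python
import numpy as np

def extrapolated_variance(block_sizes, var_estimates):
    """Fit var_hat(b) ~ intercept + slope * (1/b) by least squares and
    return the intercept as a bias-corrected variance estimate."""
    x = 1.0 / np.asarray(block_sizes, dtype=float)
    y = np.asarray(var_estimates, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit, highest power first
    return intercept
```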

4.
Genetic correlations are frequently estimated from natural and experimental populations, yet many of the statistical properties of estimators of the genetic correlation (rG) are not known, and accurate methods have not been described for estimating the precision of estimates of rG. Our objective was to assess the statistical properties of multivariate analysis of variance (MANOVA), restricted maximum likelihood (REML), and maximum likelihood (ML) estimators of rG by simulating bivariate normal samples for the one-way balanced linear model. We estimated probabilities of non-positive-definite MANOVA estimates of genetic variance-covariance matrices and the biases and variances of MANOVA, REML, and ML estimators of rG, and assessed the accuracy of parametric, jackknife, and bootstrap variance and confidence interval estimators. MANOVA estimates of rG were normally distributed. REML and ML estimates were normally distributed for low values of rG but skewed for high values (e.g., 0.9). All of the estimators were biased. The MANOVA estimator was less biased than the REML and ML estimators when heritability (H), the number of genotypes (n), and the number of replications (r) were low. The biases were otherwise nearly equal for the different estimators and could not be reduced by jackknifing or bootstrapping. The variance of the MANOVA estimator was greater than the variance of the REML or ML estimator for most H, n, and r. Bootstrapping produced estimates of the variance close to the known variance, especially for REML and ML. The observed coverages of the REML and ML bootstrap interval estimators were consistently close to stated coverages, whereas the observed coverage of the MANOVA bootstrap interval estimator was unsatisfactory for some H, n, and r. The other interval estimators produced unsatisfactory coverages. REML and ML bootstrap interval estimates were narrower than MANOVA bootstrap interval estimates for most H, n, and r. Received: 6 July 1995 / Accepted: 8 March 1996

5.
Inference about population history from DNA sequence data has become increasingly popular. For human populations, questions about whether a population has been expanding and when expansion began are often the focus of attention. For viral populations, questions about the epidemiological history of a virus, e.g., HIV-1 and hepatitis C, are often of interest. In this paper I address the following question: can population history be accurately inferred from single-locus DNA data? An idealised world is considered in which the tree relating a sample of n non-recombining and selectively neutral DNA sequences is observed, rather than just the sequences themselves. This approach provides an upper limit to the information that can possibly be extracted from a sample. It is shown, based on Kingman's (1982a) coalescent process, that consistent estimation of parameters describing population history (e.g., a growth rate) cannot be achieved for increasing sample size, n. This is worse than often found for estimators of genetic parameters; e.g., the estimator of the mutation rate typically converges at rate 1/√(log n) under the assumption that all historical mutations can be observed in the sample. In addition, various results for the distribution of maximum likelihood estimators are presented.

6.
Genomic selection uses genome-wide dense SNP marker genotyping for the prediction of genetic values, and consists of two steps: (1) estimation of SNP effects, and (2) prediction of genetic value based on SNP genotypes and estimates of their effects. For the former step, BayesB-type estimators have been proposed, which assume a priori that many markers have no effect, and that the remainder have effects coming from a gamma or exponential distribution, i.e. a fat-tailed distribution. Whilst such estimators have been developed using Markov chain Monte Carlo (MCMC), here we derive a much faster non-MCMC-based estimator by performing the required integrations analytically. The accuracy of the genome-wide breeding value estimates was 0.011 (s.e. 0.005) lower than that of the MCMC-based BayesB predictor, which may be because the integrations were performed one-by-one instead of for all SNPs simultaneously. The bias of the new method was opposite to that of MCMC-based BayesB, in that the new method underestimates the breeding values of the best selection candidates, whereas MCMC-BayesB overestimates them. The new method was computationally several orders of magnitude faster than MCMC-based BayesB, which will mainly be advantageous in computer simulations of entire breeding schemes, in cross-validation testing, and in practical schemes with frequent re-estimation of breeding values.
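The two steps can be sketched with a ridge-regression ("SNP-BLUP") stand-in for the effect-estimation step; BayesB itself additionally assumes most effects are exactly zero with a fat-tailed prior on the rest, which this sketch does not implement.

```python
import numpy as np

def snp_blup(X, y, lam=1.0):
    """Step 1 (sketch): ridge-regression estimates of SNP effects,
    solving (X'X + lam*I) b = X'y.  A simpler stand-in for BayesB-type
    estimators; lam plays the role of the prior shrinkage."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def predict_gebv(X_new, effects):
    """Step 2: genomic estimated breeding value = genotypes times
    estimated SNP effects."""
    return X_new @ effects
```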

7.
We present a Monte Carlo simulation analysis of the statistical properties of absolute genetic distance and of Nei's minimum and standard genetic distances. The estimation of the distances (bias) and of their variances is analysed, as well as the distributions of the distance and variance estimators, taking into account both gamete and locus sampling. Both of Nei's statistics are non-linear when distances are small, and consequently the distributions of their estimators are extremely asymmetrical. It is difficult to find theoretical laws that fit such asymmetrical distributions. Absolute genetic distance is linear and its distributions are better fit by a normal distribution. When distances are medium or large, minimum-distance and absolute-distance distributions are close to normal, but those of the standard distance can never be considered normal. For large distances the jackknife estimator of the standard distance variance performs poorly; another estimator of the standard distance variance is suggested. Absolute distance, which has the best mathematical properties, is particularly interesting for small distances if the gamete sample size is large, even when the number of loci is small. When both the distance and the gamete sample size are small, this statistic is biased.
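For reference, Nei's standard genetic distance has a simple closed form; a minimal sketch, assuming allele-frequency arrays with one row per locus and one column per allele:

```python
import numpy as np

def nei_standard_distance(px, py):
    """Nei's standard genetic distance between two populations:
        D = -ln( Jxy / sqrt(Jx * Jy) ),
    where Jx and Jy are the mean expected homozygosities of the two
    populations and Jxy is the mean cross-population gene identity,
    averaged over loci."""
    px, py = np.asarray(px, float), np.asarray(py, float)
    jx = np.mean(np.sum(px * px, axis=1))
    jy = np.mean(np.sum(py * py, axis=1))
    jxy = np.mean(np.sum(px * py, axis=1))
    return -np.log(jxy / np.sqrt(jx * jy))
```

Identical populations give D = 0, and D grows without bound as the populations share fewer alleles, which is the non-linearity the abstract refers to.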

8.
To accurately measure the number of species in a biological community, a complete inventory should be performed, which is generally unfeasible; fortunately, estimators of species richness can help. Our main objectives were (i) to assess the performance of nonparametric estimators of plant species richness with real data from a small set of meadows located in the Basque campiña (northern Spain), and (ii) to apply the best estimator to a larger dataset to test the effects on plant species richness caused by environmental conditions and human practices. Two non-asymptotic and seven asymptotic accumulation functions were fitted to a randomized sample-based rarefaction curve computed with data from three well-sampled meadows, and information-theoretic methods were used to select the best-fitting model; this was the Morgan-Mercer-Flodin model, and its asymptote was taken as our best guess of true richness. Then, five nonparametric estimators were computed: ICE, Chao 2, Jackknife 1 and 2, and Bootstrap; MMRuns and MMMeans were also assessed. According to the criteria set for our performance assessment (i.e., bias, precision, and accuracy), the best estimator was Jackknife 1. Finally, Jackknife 1 was applied to assess the effects of terrain slope and soil parent material, and also fertilization, grazing, and mowing, on plant species richness in a larger dataset (20 meadows). Results suggested that grass cutting was causing a loss of richness close to 30%, as compared to unmowed meadows. It is concluded that the use of nonparametric estimators of species richness can improve the evaluation of biodiversity responses to human management practices.
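The first-order jackknife estimator selected above has a one-line closed form; a minimal sketch, assuming incidence data given as one set of observed species per sample:

```python
def jackknife1_richness(incidence):
    """First-order jackknife estimator of species richness:
        S_jack1 = S_obs + Q1 * (m - 1) / m,
    where S_obs is the observed number of species, m the number of samples,
    and Q1 the number of 'uniques' (species found in exactly one sample)."""
    m = len(incidence)                       # number of samples (e.g., meadows)
    species = set().union(*incidence)
    s_obs = len(species)
    q1 = sum(1 for sp in species
             if sum(sp in sample for sample in incidence) == 1)
    return s_obs + q1 * (m - 1) / m
```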

9.
MOTIVATION: Ranking feature sets is a key issue for classification, for instance, phenotype classification based on gene expression. Since ranking is often based on error estimation, and error estimators suffer differing degrees of imprecision in small-sample settings, it is important to choose a computationally feasible error estimator that yields good feature-set ranking. RESULTS: This paper examines the feature-ranking performance of several kinds of error estimators: resubstitution, cross-validation, bootstrap and bolstered error estimation. It does so for three classification rules: linear discriminant analysis, three-nearest-neighbor classification and classification trees. Two measures of performance are considered. One counts the number of the truly best feature sets appearing among the best feature sets discovered by the error estimator, and the other computes the mean absolute error between the top ranks of the truly best feature sets and their ranks as given by the error estimator. Our results indicate that bolstering is superior to bootstrap, and bootstrap is better than cross-validation, for discovering top-performing feature sets for classification when using small samples. A key point is that bolstered error estimation is tens of times faster than bootstrap, and faster than cross-validation, and is therefore feasible for feature-set ranking when the number of feature sets is extremely large.
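Two of the error estimators compared can be sketched for the three-nearest-neighbor rule; leave-one-out stands in here for the cross-validation variants, and the bolstered and bootstrap estimators are omitted for brevity:

```python
import numpy as np

def knn_error(X_train, y_train, X_test, y_test, k=3):
    """Error rate of a k-nearest-neighbor classifier (Euclidean distance)."""
    errs = 0
    for x, y in zip(X_test, y_test):
        d = np.linalg.norm(X_train - x, axis=1)
        nn = y_train[np.argsort(d)[:k]]
        pred = np.bincount(nn).argmax()        # majority vote among neighbors
        errs += pred != y
    return errs / len(y_test)

def resubstitution_error(X, y, k=3):
    """Resubstitution: train and test on the same data -- fast but
    optimistically biased, which is what degrades its rankings."""
    return knn_error(X, y, X, y, k)

def loo_cv_error(X, y, k=3):
    """Leave-one-out cross-validation: nearly unbiased but high-variance
    in small samples."""
    n = len(y)
    errs = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        errs += knn_error(X[mask], y[mask], X[i:i + 1], y[i:i + 1], k)
    return errs / n
```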

10.
Shrinkage Estimators for Covariance Matrices
Estimation of covariance matrices in small samples has been studied by many authors. Standard estimators, like the unstructured maximum likelihood estimator (ML) or restricted maximum likelihood (REML) estimator, can be very unstable with the smallest estimated eigenvalues being too small and the largest too big. A standard approach to more stably estimating the matrix in small samples is to compute the ML or REML estimator under some simple structure that involves estimation of fewer parameters, such as compound symmetry or independence. However, these estimators will not be consistent unless the hypothesized structure is correct. If interest focuses on estimation of regression coefficients with correlated (or longitudinal) data, a sandwich estimator of the covariance matrix may be used to provide standard errors for the estimated coefficients that are robust in the sense that they remain consistent under misspecification of the covariance structure. With large matrices, however, the inefficiency of the sandwich estimator becomes worrisome. We consider here two general shrinkage approaches to estimating the covariance matrix and regression coefficients. The first involves shrinking the eigenvalues of the unstructured ML or REML estimator. The second involves shrinking an unstructured estimator toward a structured estimator. For both cases, the data determine the amount of shrinkage. These estimators are consistent and give consistent and asymptotically efficient estimates for regression coefficients. Simulations show the improved operating characteristics of the shrinkage estimators of the covariance matrix and the regression coefficients in finite samples. The final estimator chosen includes a combination of both shrinkage approaches, i.e., shrinking the eigenvalues and then shrinking toward structure. 
We illustrate our approach on a sleep EEG study that requires estimation of a 24 × 24 covariance matrix and for which inferences on mean parameters critically depend on the covariance estimator chosen. We recommend making inference using a particular shrinkage estimator that provides a reasonable compromise between structured and unstructured estimators.
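The second shrinkage approach, pulling an unstructured estimate toward a structured target, can be sketched with the diagonal (independence) target. The data-driven choice of the shrinkage weight is the substance of the paper; the sample-size heuristic below is only an illustrative placeholder.

```python
import numpy as np

def shrink_towards_diagonal(X, lam=None):
    """Shrink the unstructured sample covariance S toward a structured
    target (here diag(S), i.e. independence):
        Sigma_hat = (1 - lam) * S + lam * diag(S).
    lam is either supplied or set by a crude heuristic -- not the paper's
    data-driven rule."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    target = np.diag(np.diag(S))
    if lam is None:
        lam = min(1.0, p / float(n))   # heuristic: shrink more when p ~ n
    return (1 - lam) * S + lam * target
```

The same template covers the eigenvalue-shrinkage approach by replacing the target with a spectrally modified copy of S.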

11.
The quantum hypothesis proposes that a binomial distribution should fit the amplitude distribution of synaptic potentials. Since importance is now being attached to significant changes in the n and p parameters of the binomial model during various treatments of synaptic preparations, this paper describes an important extension of the method of moments which can be used to extract binomial parameters in difficult experimental circumstances. Essentially, the skewness (third moment) of the observed amplitude distribution of synaptic responses is used to provide the additional information needed in cases where spontaneous miniature responses are absent. Computer simulations are used to assess the reliability of the proposed new estimators. The estimator bias due to non-uniform unit responses is also evaluated. Other applications of the extended method of moments, including a new test of the binomial hypothesis, are also described.
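The basic (unextended) method of moments for the binomial model uses only the first two moments; a sketch assuming the quantal unit size q is known (the paper's extension recovers the missing information from the third moment when miniature responses are absent, a step omitted here):

```python
import numpy as np

def binomial_moments(amplitudes, unit=1.0):
    """Method-of-moments fit of a binomial release model.  Amplitudes are
    modeled as q * K with K ~ Binomial(n, p) and q the quantal unit size,
    so mean = n*p*q and variance = n*p*(1-p)*q^2.  Solving gives
        p = 1 - var/(mean*q)   and   n = mean/(p*q)."""
    a = np.asarray(amplitudes, dtype=float)
    m, v = a.mean(), a.var()
    p = 1.0 - v / (m * unit)
    n = m / (p * unit)
    return n, p
```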

12.
This paper deals with Bayes estimation of survival probability when the data are randomly censored. Such a situation arises in case of a clinical trial which extends for a limited period T. A fixed number of patients (n) are observed whose times to death have identical Weibull distribution with parameters β and θ. The maximum times of observation for different patients are also independent uniform variables as the patients arrive randomly throughout the trial. For the joint prior distribution of (β, θ) as suggested by Sinha and Kale (1980, page 137) Bayes estimator of survival probability at time t (0<t<T) has been obtained. Considering squared error loss function it is the mean of the survival probability with respect to the posterior distribution of (β, θ). This estimator is then compared with the maximum likelihood estimator, by simulation, for various values of β, θ and censoring percentage. The proposed estimator is found to be better under certain conditions.

13.
Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate that a popular class of such estimators are biased, and that this bias has an adverse effect on goodness of fit and estimates of substitution rates. We propose a "corrected" empirical estimator that begins with observed nucleotide counts, but accounts for the nucleotide composition of stop codons. We show via simulation that the corrected estimates outperform the de facto standard estimates not just by providing better estimates of the frequencies themselves, but also by leading to improved estimation of other parameters in the evolutionary models. On a curated collection of sequence alignments, our estimators show a significant improvement in goodness of fit compared to the standard F3×4 approach. Maximum likelihood estimation of the frequency parameters appears to be warranted in many cases, albeit at a greater computational cost. Our results demonstrate that there is little justification, either statistical or computational, for continued use of the F3×4-style estimators.
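The uncorrected positional-product estimator can be sketched as follows; the stop-codon renormalization at the end is where the bias discussed in the note enters, because the positional nucleotide frequencies are themselves estimated from data that can contain no stop codons (the corrected estimator adjusts the positional counts for this, which this sketch does not do):

```python
from itertools import product

STOP = {"TAA", "TAG", "TGA"}  # standard genetic code

def f3x4(pos_freqs):
    """F3x4-style codon frequencies: the frequency of each sense codon is
    the product of the three position-specific nucleotide frequencies,
    with stop codons removed and the remaining 61 codons renormalized.
    pos_freqs is a list of three dicts mapping A/C/G/T to frequencies."""
    freqs = {}
    for c1, c2, c3 in product("ACGT", repeat=3):
        codon = c1 + c2 + c3
        if codon in STOP:
            continue
        freqs[codon] = pos_freqs[0][c1] * pos_freqs[1][c2] * pos_freqs[2][c3]
    total = sum(freqs.values())
    return {c: f / total for c, f in freqs.items()}
```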

14.
Jinliang Wang. Molecular Ecology 2016, 25(19): 4692-4711.
In molecular ecology and conservation genetics studies, the important parameter of effective population size (Ne) is increasingly estimated from a single sample of individuals taken at random from a population and genotyped at a number of marker loci. Several estimators have been developed, based on the information of linkage disequilibrium (LD), heterozygote excess (HE), molecular coancestry (MC) and sibship frequency (SF) in marker data. The most popular is the LD estimator, because it is more accurate than the HE and MC estimators and is simpler to calculate than the SF estimator. However, little is known about the accuracy of the LD estimator relative to that of SF, or about the robustness of all single-sample estimators when some simplifying assumptions (e.g. random mating, no linkage, no genotyping errors) are violated. This study fills the gaps and uses extensive simulations to compare the biases and accuracies of the four estimators for different population properties (e.g. bottlenecks, nonrandom mating, haplodiploidy), marker properties (e.g. linkage, polymorphism) and sample properties (e.g. numbers of individuals and markers), and to compare the robustness of the four estimators when marker data are imperfect (with allelic dropouts). The simulations show that the SF estimator is more accurate, has a much wider application scope (e.g. it is suitable for nonrandom mating such as selfing, for haplodiploid species and for dominant markers) and is more robust (e.g. to the presence of linkage and to genotyping errors) than the other estimators. An empirical data set from a Yellowstone grizzly bear population was analysed to demonstrate the use of the SF estimator in practice.

15.
Understanding the functional relationship between the sample size and the performance of species richness estimators is necessary to optimize limited sampling resources against estimation error. Nonparametric estimators such as Chao and Jackknife demonstrate strong performances, but consensus is lacking as to which estimator performs better under constrained sampling. We explore a method to improve the estimators under such scenario. The method we propose involves randomly splitting species‐abundance data from a single sample into two equally sized samples, and using an appropriate incidence‐based estimator to estimate richness. To test this method, we assume a lognormal species‐abundance distribution (SAD) with varying coefficients of variation (CV), generate samples using MCMC simulations, and use the expected mean‐squared error as the performance criterion of the estimators. We test this method for Chao, Jackknife, ICE, and ACE estimators. Between abundance‐based estimators with the single sample, and incidence‐based estimators with the split‐in‐two samples, Chao2 performed the best when CV < 0.65, and incidence‐based Jackknife performed the best when CV > 0.65, given that the ratio of sample size to observed species richness is greater than a critical value given by a power function of CV with respect to abundance of the sampled population. The proposed method increases the performance of the estimators substantially and is more effective when more rare species are in an assemblage. We also show that the splitting method works qualitatively similarly well when the SADs are log series, geometric series, and negative binomial. We demonstrate an application of the proposed method by estimating richness of zooplankton communities in samples of ballast water. 
The proposed splitting method is an alternative to sampling a large number of individuals to increase the accuracy of richness estimations; therefore, it is appropriate for a wide range of resource-limited sampling scenarios in ecology.
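The splitting step and the incidence-based Chao2 estimator used with it can be sketched as follows (the Q2 = 0 fallback below is a simplified form of the usual bias-corrected variant):

```python
import random

def chao2(incidence):
    """Chao2 incidence-based richness estimator:
        S_chao2 = S_obs + Q1^2 / (2 * Q2),
    with Q1/Q2 the numbers of species seen in exactly one/two samples."""
    species = set().union(*incidence)
    counts = {sp: sum(sp in s for s in incidence) for sp in species}
    q1 = sum(1 for c in counts.values() if c == 1)
    q2 = sum(1 for c in counts.values() if c == 2)
    s_obs = len(species)
    if q2 == 0:                           # simplified bias-corrected fallback
        return s_obs + q1 * (q1 - 1) / 2.0
    return s_obs + q1 * q1 / (2.0 * q2)

def split_then_estimate(individuals, seed=0):
    """The proposed splitting method (sketch): randomly split one
    species-abundance sample into two equal halves and apply the
    incidence-based Chao2 to the two resulting 'samples'."""
    rng = random.Random(seed)
    pool = list(individuals)
    rng.shuffle(pool)
    half = len(pool) // 2
    return chao2([set(pool[:half]), set(pool[half:])])
```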

16.
Multivariate analysis is a branch of statistics that successfully exploits the powerful tools of linear algebra to obtain a fairly comprehensive theory of estimation. The purpose of this paper is to explore to what extent a linear theory of estimation can be developed in the context of coalescent models used in the analysis of DNA polymorphism. We consider a large class of coalescent models, of which the neutral infinite sites model is one example. In the process, we discover several limitations of linear estimators that are quite distinct from those in the classical theory. In particular, we prove that there does not exist a uniformly BLUE (best linear unbiased estimator) for the scaled mutation parameter, under the assumptions of the neutral model of evolution. In fact, we show that no linear estimator performs uniformly better than the Watterson (1975) method based on the total number of segregating sites. For certain coalescent models, the segregating-sites estimator is actually optimal. The general conclusion is the following. If genealogical information is useful for estimating the rate of evolution, then there is no optimal linear method. If there is an optimal linear method, then no information other than the total number of segregating sites is needed. Received: 29 July 1998 / Revised version: 9 October 1998

17.

Background  

Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure prediction; therefore, new types of estimators have been proposed, including the maximum expected accuracy (MEA) estimator. The MEA-based estimators have been designed to maximize the expected accuracy of the base-pairs and have achieved the highest levels of accuracy. Those methods, however, do not give a single best prediction of the structure, but employ parameters to control the trade-off between sensitivity and positive predictive value (PPV). It is unclear what parameter value should be used, and even the well-trained default parameter value does not, in general, give the best result under popular accuracy measures for each RNA sequence.

18.

Background

When unaccounted-for group-level characteristics affect an outcome variable, traditional linear regression is inefficient and can be biased. The random- and fixed-effects estimators (RE and FE, respectively) are two competing methods that address these problems. While each estimator controls for otherwise unaccounted-for effects, the two estimators require different assumptions. Health researchers tend to favor RE estimation, while researchers from some other disciplines tend to favor FE estimation. In addition to RE and FE, an alternative method called within-between (WB) was suggested by Mundlak in 1978, although it is utilized infrequently.

Methods

We conduct a simulation study to compare RE, FE, and WB estimation across 16,200 scenarios. The scenarios vary in the number of groups, the size of the groups, within-group variation, goodness-of-fit of the model, and the degree to which the model is correctly specified. Estimator preference is determined by lowest mean squared error of the estimated marginal effect and root mean squared error of fitted values.

Results

Although there are scenarios when each estimator is most appropriate, the cases in which traditional RE estimation is preferred are less common. In finite samples, the WB approach outperforms both traditional estimators. The Hausman test guides the practitioner to the estimator with the smallest absolute error only 61% of the time, and in many sample sizes simply applying the WB approach produces smaller absolute errors than following the suggestion of the test.

Conclusions

Specification and estimation should be carefully considered and ultimately guided by the objective of the analysis and characteristics of the data. The WB approach has been underutilized, particularly for inference on marginal effects in small samples. Blindly applying any estimator can lead to bias, inefficiency, and flawed inference.
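The within-between (Mundlak) specification compared above can be sketched for a single regressor: decompose x into its group mean and the within-group deviation, and fit both by OLS. The within coefficient then matches the FE (within) estimate, while the group-mean term absorbs the group-level confounding that biases naive RE.

```python
import numpy as np

def within_between(y, x, groups):
    """Mundlak's within-between regression (sketch):
        y = b0 + b_within * (x - xbar_g) + b_between * xbar_g + e,
    fit by ordinary least squares, where xbar_g is the mean of x in the
    observation's group."""
    y = np.asarray(y, float)
    x = np.asarray(x, float)
    groups = np.asarray(groups)
    xbar = np.array([x[groups == g].mean() for g in groups])  # per-obs group mean
    X = np.column_stack([np.ones_like(x), x - xbar, xbar])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return {"intercept": coef[0], "within": coef[1], "between": coef[2]}
```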

19.
Barabesi L, Pisani C. Biometrics 2002, 58(3): 586-592.
In practical ecological sampling studies, a certain design (such as plot sampling or line-intercept sampling) is usually replicated more than once. For each replication, the Horvitz-Thompson estimator of the objective parameter is considered. Finally, an overall estimator is obtained by averaging the single Horvitz-Thompson estimators. Because the design replications are drawn independently and under the same conditions, the overall estimator is simply the sample mean of the Horvitz-Thompson estimators under simple random sampling. This procedure may be improved by using ranked set sampling. Hence, we propose the replicated protocol under ranked set sampling, which gives rise to more accurate estimation than the replicated protocol under simple random sampling.
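The ranked-set-sampling mechanism itself can be sketched as follows, here for estimating a population mean with perfect ranking (units ranked by their true values, an idealized assumption; in practice ranking is by eye or by a cheap auxiliary variable):

```python
import random

def rss_sample(population, m, rng=None):
    """One cycle of balanced ranked set sampling: draw m independent sets
    of m units, rank each set, and measure only the i-th order statistic
    from the i-th set."""
    rng = rng or random.Random(0)
    measured = []
    for i in range(m):
        s = sorted(rng.sample(population, m))
        measured.append(s[i])          # i-th order statistic from set i
    return measured

def rss_mean(population, m, cycles=100, rng=None):
    """RSS estimator of the population mean: the average of the measured
    units over repeated cycles."""
    rng = rng or random.Random(0)
    vals = [v for _ in range(cycles) for v in rss_sample(population, m, rng)]
    return sum(vals) / len(vals)
```

Under perfect ranking this estimator is unbiased and has smaller variance than the simple-random-sampling mean for the same number of measured units, which is the source of the accuracy gain claimed for the replicated protocol.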

20.
For the estimation of the population mean in simple random sampling, an efficient regression-type estimator is proposed that is more efficient than the conventional regression estimator, and hence than the mean-per-unit, ratio and product estimators and many other estimators proposed by various authors. Some numerical examples are included for illustration.
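The conventional regression estimator that serves as the baseline here has a standard closed form; a minimal sketch, assuming an auxiliary variable x whose population mean X̄ is known:

```python
import numpy as np

def regression_estimator(y, x, X_bar):
    """Conventional regression estimator of the population mean of y in
    simple random sampling:
        ybar_reg = ybar + b * (X_bar - xbar),   b = S_xy / S_x^2,
    where xbar, ybar are sample means and X_bar is the known population
    mean of the auxiliary variable."""
    y = np.asarray(y, float)
    x = np.asarray(x, float)
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return y.mean() + b * (X_bar - x.mean())
```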


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号