Similar articles
1.
L D Mueller 《Biometrics》1979,35(4):757-763
The delta and jackknife methods can be used to estimate Nei's measure of genetic distance and calculate confidence intervals for this estimate. Computer simulations were used to study the bias and variance of each estimator and the accuracy of the corresponding approximate 95% confidence intervals. The simulations were conducted using 3 sets of data and several sample sizes. The results showed: (1) the jackknife reduced bias; (2) in 8 out of 9 cases the variance and mean square error of the jackknife estimator were less; (3) a second order jackknife reduced the bias the most but suffered a corresponding increase in variance; (4) both the first order jackknife and delta methods yielded intervals whose confidence levels were approximately equal but less than 95%.
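
As a rough illustration of the first-order jackknife mentioned in this abstract, the sketch below applies the generic bias-correction formula, n times the full-sample estimate minus (n - 1) times the mean of the leave-one-out estimates. The function names and the toy data are assumptions made for illustration; a plug-in variance stands in for a biased statistic, and this is not Nei's distance or the paper's simulation design.

```python
from statistics import fmean


def jackknife(estimator, sample):
    """Return the first-order jackknife bias-corrected estimate.

    `estimator` maps a list of observations to a number; `sample` is a list.
    """
    n = len(sample)
    full = estimator(sample)
    # Recompute the estimator with each observation left out in turn.
    leave_one_out = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    return n * full - (n - 1) * fmean(leave_one_out)


def plug_in_variance(xs):
    """Biased variance estimator (divides by n), used here as a stand-in."""
    m = fmean(xs)
    return fmean([(x - m) ** 2 for x in xs])


data = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy data, not from the paper
corrected = jackknife(plug_in_variance, data)
```

For this particular statistic the jackknife removes the bias exactly, recovering the usual divide-by-(n - 1) sample variance; for a ratio-type statistic such as a genetic distance the correction is only approximate.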

2.
Best linear unbiased allele-frequency estimation in complex pedigrees
McPeek MS  Wu X  Ober C 《Biometrics》2004,60(2):359-367
Many types of genetic analyses depend on estimates of allele frequencies. We consider the problem of allele-frequency estimation based on data from related individuals. The motivation for this work is data collected on the Hutterites, an isolated founder population, so we focus particularly on the case in which the relationships among the sampled individuals are specified by a large, complex pedigree for which maximum likelihood estimation is impractical. For this case, we propose to use the best linear unbiased estimator (BLUE) of allele frequency. We derive this estimator, which is equivalent to the quasi-likelihood estimator for this problem, and we describe an efficient algorithm for computing the estimate and its variance. We show that our estimator has certain desirable small-sample properties in common with the maximum likelihood estimator (MLE) for this problem. We treat both the case when parental origin of each allele is known and when it is unknown. The results are extended to prediction of allele frequency in some set of individuals S based on genotype data collected on a set of individuals R. We compare the mean-squared error of the BLUE, the commonly used naive estimator (sample frequency) and the MLE when the latter is feasible to calculate. The results indicate that although the MLE performs the best of the three, the BLUE is close in performance to the MLE and is substantially easier to calculate, making it particularly useful for large complex pedigrees in which MLE calculation is impractical or infeasible. We apply our method to allele-frequency estimation in a Hutterite data set.

3.
Important aspects of population evolution have been investigated using nucleotide sequences. Under the neutral Wright–Fisher model, the scaled mutation rate represents twice the average number of new mutations per generation and is one of the key parameters in population genetics. In this study, we present various methods for estimating this parameter, analytical studies of their asymptotic behavior, and simulation-based comparisons of the estimators' distributions. Because knowledge of the genealogy is needed to compute the maximum likelihood estimator (MLE), an application to real data is also presented, using the jackknife to correct the bias in the MLE that can arise from estimating the tree. We proved analytically that Watterson's estimator and the MLE are asymptotically equivalent, with the same rate of convergence to normality. Furthermore, we showed that the MLE has a better rate of convergence than Watterson's estimator for values of the parameter greater than one, and that this relationship is reversed when the parameter is less than one.
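
For readers unfamiliar with Watterson's estimator, a minimal sketch of its textbook form follows: the number of segregating sites S divided by the harmonic number a_n = 1 + 1/2 + ... + 1/(n - 1) for a sample of n sequences. The function name and arguments are hypothetical, and the sketch does not cover the MLE or the jackknife correction discussed in the abstract.

```python
from fractions import Fraction


def watterson_theta(segregating_sites, n_sequences):
    """Watterson's estimate of the scaled mutation rate, S / a_n."""
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    # a_n is the (n - 1)-th harmonic number, kept exact with Fraction.
    a_n = sum(Fraction(1, i) for i in range(1, n_sequences))
    return float(segregating_sites / a_n)
```

For example, with 4 sequences a_n is 11/6, so 11 segregating sites give an estimate of 6.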

4.
Hidden Markov models have been successfully applied in various fields of time series analysis, especially for analyzing ion channel recordings. The maximum likelihood estimator (MLE) has recently been proven to be asymptotically normally distributed. Here, we investigate finite sample properties of the MLE and of different types of likelihood ratio tests (LRTs) by means of simulation studies. The MLE is shown to reach the asymptotic behavior within sample sizes that are common for various applications. Thus, reliable estimates and confidence intervals can be obtained. We give an approximate scaling function for the estimation error for finite samples, and investigate the power of different LRTs suitable for applications to ion channels, including tests for superimposed hidden Markov processes. Our results are applied to physiological sodium channel data.

5.
Rannala B  Qiu WG  Dykhuizen DE 《Genetics》2000,155(2):499-508
Recent breakthroughs in molecular technology, most significantly the polymerase chain reaction (PCR) and in situ hybridization, have allowed the detection of genetic variation in bacterial communities without prior cultivation. These methods often produce data in the form of the presence or absence of alleles or genotypes, however, rather than counts of alleles. Using relative allele frequencies from presence-absence data as estimates of population allele frequencies tends to underestimate the frequencies of common alleles and overestimate those of rare ones, potentially biasing the results of a test of neutrality in favor of balancing selection. In this study, a maximum-likelihood estimator (MLE) of bacterial allele frequencies designed for use with presence-absence data is derived using an explicit stochastic model of the host infection (or bacterial sampling) process. The performance of the MLE is evaluated using computer simulation and a method is presented for evaluating the fit of estimated allele frequencies to the neutral infinite alleles model (IAM). The methods are applied to estimate allele frequencies at two outer surface protein loci (ospA and ospC) of the Lyme disease spirochete, Borrelia burgdorferi, infecting local populations of deer ticks (Ixodes scapularis) and to test the fit to a neutral IAM.

6.
Serial analysis of gene expression (SAGE) is a technology for quantifying gene expression in biological tissue that yields count data that can be modeled by a multinomial distribution with two characteristics: skewness in the relative frequencies and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample may fail to capture a large number of expressed mRNA species present in the tissue. Empirical estimators of mRNA species' relative abundance effectively ignore these missing species, and as a result tend to overestimate the abundance of the scarce observed species comprising a vast majority of the total. We have developed a new Bayesian estimation procedure that quantifies our prior information about these characteristics, yielding a nonlinear shrinkage estimator with efficiency advantages over the MLE. Our prior is a mixture of Dirichlets, whereby species are stochastically partitioned into abundant and scarce classes, each with its own multivariate prior. Simulation studies reveal our estimator has lower integrated mean squared error (IMSE) than the MLE for the SAGE scenarios simulated, and yields relative abundance profiles closer in Euclidean distance to the truth for all samples simulated. We apply our method to a SAGE library of normal colon tissue, and discuss its implications for assessing differential expression.

7.
Misclassification of exposure variables is a common problem in epidemiologic studies. This paper compares the matrix method (Barron, 1977, Biometrics 33, 414-418; Greenland, 1988a, Statistics in Medicine 7, 745-757) and the inverse matrix method (Marshall, 1990, Journal of Clinical Epidemiology 43, 941-947) to the maximum likelihood estimator (MLE) that corrects the odds ratio for bias due to a misclassified binary covariate. Under the assumption of differential misclassification, the inverse matrix method is always more efficient than the matrix method; however, the efficiency depends strongly on the values of the sensitivity, specificity, baseline probability of exposure, the odds ratio, case-control ratio, and validation sampling fraction. In a study on sudden infant death syndrome (SIDS), an estimate of the asymptotic relative efficiency (ARE) of the inverse matrix estimate was 0.99, while the matrix method's ARE was 0.19. Under nondifferential misclassification, neither the matrix nor the inverse matrix estimator is uniformly more efficient than the other; the efficiencies again depend on the underlying parameters. In the SIDS data, the MLE was more efficient than the matrix method (ARE = 0.39). In a study investigating the effect of vitamin A intake on the incidence of breast cancer, the MLE was more efficient than the matrix method (ARE = 0.75).
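
The idea behind the matrix method can be sketched with the standard sensitivity/specificity back-correction for a binary exposure. The sketch assumes nondifferential misclassification with known sensitivity and specificity, ignores the validation-sampling variability that the paper's efficiency comparison concerns, and uses hypothetical function names; it is an illustration of the general technique, not the exact formulation of the cited papers.

```python
def correct_exposed_count(observed_exposed, total, sensitivity, specificity):
    """Back-correct an observed exposed count for misclassification.

    Solves observed = Se * true + (1 - Sp) * (total - true) for `true`.
    """
    youden = sensitivity + specificity - 1.0
    if youden <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    return (observed_exposed - (1.0 - specificity) * total) / youden


def corrected_odds_ratio(cases, controls, sensitivity, specificity):
    """Odds ratio from corrected counts.

    `cases` and `controls` are (observed_exposed, total) pairs; the same
    sensitivity and specificity are applied to both groups (nondifferential).
    """
    a = correct_exposed_count(cases[0], cases[1], sensitivity, specificity)
    c = correct_exposed_count(controls[0], controls[1], sensitivity, specificity)
    b = cases[1] - a      # corrected unexposed cases
    d = controls[1] - c   # corrected unexposed controls
    return (a * d) / (b * c)
```

With made-up counts in which 60 of 100 cases and 30 of 100 controls are truly exposed (true odds ratio 3.5), a sensitivity of 0.9 and a specificity of 0.8 produce expected observed exposed counts of 62 and 41, and the correction recovers the true odds ratio.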

8.
MOTIVATION: The numerical values of gene expression measured using microarrays are usually presented to the biological end-user as summary statistics of spot pixel data, such as the spot mean, median and mode. Much of the subsequent data analysis reported in the literature, however, uses only one of these spot statistics. This results in sub-optimal estimates of gene expression levels and a need for improvement in quantitative spot variation surveillance. RESULTS: This paper develops a maximum-likelihood method for estimating gene expression using spot mean, variance and pixel number values available from typical microarray scanners. It employs a hierarchical model of variation between and within microarray spots. The hierarchical maximum-likelihood estimate (MLE) is shown to be a more efficient estimator of the mean than the 'conventional' estimate using solely the spot mean values (i.e. without spot variance data). Furthermore, under the assumptions of our model, the spot mean and spot variance are shown to be sufficient statistics that do not require the use of all pixel data. The hierarchical MLE method is applied to data from both Monte Carlo (MC) simulations and a two-channel dye-swapped spotted microarray experiment. The MC simulations show that the hierarchical MLE method leads to improved detection of differential gene expression particularly when 'outlier' spots are present on the arrays. Compared with the conventional method, the MLE method applied to data from the microarray experiment leads to an increase in the number of differentially expressed genes detected for low cut-off P-values of interest.

9.
Two methods are commonly employed for evaluating the extent of the uncertainty of evolutionary distances between sequences: either some estimator of the variance of the distance estimator, or the bootstrap method. However, both approaches can be misleading, particularly when the evolutionary distance is small. We propose using another statistical method which does not have the same defect: interval estimation. We show how confidence intervals may be constructed for the Jukes and Cantor (1969) and Kimura two-parameter (1980) estimators. We compare the exact confidence intervals thus obtained with the approximate intervals derived by the two previous methods, using artificial and biological data. The results show that the usual methods clearly underestimate the variability when the substitution rate is low and when sequences are short. Moreover, our analysis suggests that similar results may be expected for other evolutionary distance estimators.
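
To make the variance-based approach concrete, here is a minimal sketch of the Jukes-Cantor distance, d = -(3/4) ln(1 - 4p/3), together with its usual large-sample (delta-method) variance, where p is the observed proportion of differing sites. The function names are hypothetical. This is the kind of approximation the abstract says can mislead at small distances; it is not the exact interval construction the authors propose.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor distance from the observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_delta_variance(p, n_sites):
    """Large-sample variance of the distance for an alignment of n_sites sites."""
    return p * (1.0 - p) / (n_sites * (1.0 - 4.0 * p / 3.0) ** 2)
```

Note that when no differences are observed (p = 0) the variance formula returns zero, which illustrates how a variance-based interval can understate uncertainty for short, closely related sequences.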

10.
In a comparative clinical trial, if the maximum information is adjusted on the basis of unblinded data, the usual test statistic should be avoided due to possible type I error inflation. An adaptive test can be used as an alternative. The usual point estimate of the treatment effect and the usual confidence interval should also be avoided. In this article, we construct a point estimate and a confidence interval that are motivated by an adaptive test statistic. The estimator is consistent for the treatment effect and the confidence interval asymptotically has correct coverage probability.

11.
《Dendrochronologia》2014,32(4):343-356
A number of processing options associated with the use of a “regional curve” to standardise tree-ring measurements and generate a chronology representing changing tree growth over time are discussed. It is shown that failing to use pith offset estimates can generate a small but systematic chronology error. Where chronologies contain long-timescale signal variance, tree indices created by division of the raw measurements by RCS curve values produce chronologies with a skewed distribution. A simple empirical method of converting tree-indices to have a normal distribution is proposed. The Expressed Population Signal, which is widely used to estimate the statistical confidence of chronologies created using curve-fitting methods of standardisation, is not suitable for use with RCS generated chronologies. An alternative implementation, which takes account of the uncertainty associated with long-timescale as well as short-timescale chronology variance, is proposed. The need to assess the homogeneity of differently-sourced sets of measurement data and their suitability for amalgamation into a single data set for RCS standardisation is discussed. The possible use of multiple growth-rate based RCS curves is considered where a potential gain in chronology confidence must be balanced against the potential loss of long-timescale variance. An approach to the use of the “signal-free” method for generating artificial measurement series with the ‘noise’ characteristics of real data series but with a known chronology signal applied for testing standardisation performance is also described.

12.
Freeing phylogenies from artifacts of alignment.
Widely used methods for phylogenetic inference, both those that require and those that produce alignments, share certain weaknesses. These weaknesses are discussed, and a method that lacks them is introduced. For each pair of sequences in the data set, the method utilizes both insertion-deletion and amino acid replacement information to estimate a pairwise evolutionary distance. It is also possible to allow regional heterogeneity of replacement rates. Because a likelihood framework is adopted, the standard deviation of each pairwise distance can be estimated. The distance matrix and standard error estimates are used to infer a phylogenetic tree. As an example, this method is used on 10 widely diverged sequences of the second largest RNA polymerase subunit. A pseudo-bootstrap technique is devised to assess the validity of the inferred phylogenetic tree.

13.
Anderson EC 《Genetics》2005,170(2):955-967
This article presents an efficient importance-sampling method for computing the likelihood of the effective size of a population under the coalescent model of Berthier et al. Previous computational approaches, using Markov chain Monte Carlo, required many minutes to several hours to analyze small data sets. The approach presented here is orders of magnitude faster and can provide an approximation to the likelihood curve, even for large data sets, in a matter of seconds. Additionally, confidence intervals on the estimated likelihood curve provide a useful estimate of the Monte Carlo error. Simulations show the importance sampling to be stable across a wide range of scenarios and show that the N(e) estimator itself performs well. Further simulations show that the 95% confidence intervals around the N(e) estimate are accurate. User-friendly software implementing the algorithm for Mac, Windows, and Unix/Linux is available for download. Applications of this computational framework to other problems are discussed.

14.
Melville and Welsh (2001, Biometrics 57, 1130-1137) consider an approach to line transect sampling using a separate calibration study to estimate the detection function g. They present a simulation study contrasting their results with poor results from a traditional estimator, labeled the "Buckland" estimator and referenced to Buckland et al. (1993, Distance Sampling: Estimating Abundance of Biological Populations). The poor results from the "Buckland" estimator can be explained by the following observations: (i) the estimator is designated for untruncated distance data, but was applied by Melville and Welsh to truncated distance data; (ii) distance data were not pooled across transects, contrary to standard practice; and (iii) bias of the estimator was evaluated with respect to a fixed rather than a randomized grid of transect lines. We elaborate on the points above and show that the traditional methods perform to expectation when applied correctly. We also emphasize that the estimator labeled the "Buckland" estimator by Melville and Welsh is not an estimator recommended by Buckland et al. for practical survey applications.

15.
Statistical inference for microarray experiments usually involves the estimation of error variance for each gene. Because the sample size available for each gene is often low, the usual unbiased estimator of the error variance can be unreliable. Shrinkage methods, including empirical Bayes approaches that borrow information across genes to produce more stable estimates, have been developed in recent years. Because the same microarray platform is often used for at least several experiments to study similar biological systems, there is an opportunity to improve variance estimation further by borrowing information not only across genes but also across experiments. We propose a lognormal model for error variances that involves random gene effects and random experiment effects. Based on the model, we develop an empirical Bayes estimator of the error variance for each combination of gene and experiment and call this estimator BAGE because information is Borrowed Across Genes and Experiments. A permutation strategy is used to make inference about the differential expression status of each gene. Simulation studies with data generated from different probability models and real microarray data show that our method outperforms existing approaches.

16.
Reduction of bias in estimating the frequency of recessive genes.
The standard approach to estimating the frequency of a completely recessive autosomal gene is to use the maximum-likelihood estimator (MLE), $\hat{q} = \sqrt{\widehat{q^2}}$, the square root of the observed frequency of the recessive phenotype. Since the expectation of $\hat{q}$ using the MLE is systematically less than the true value, this estimator always gives a negatively biased estimate of q. Here we describe the bias associated with the MLE over a range of q and N values, explore some of the properties of this estimator, and propose new estimators which reduce the bias. We also describe some of the new estimators' properties, as well as the remaining bias associated with them for varying q and N values. We further propose one of these estimators as the one which most effectively reduces bias over a specific q value range of approximately .005 to .05, and which is less biased than the MLE over essentially all q and N values. The proposed estimator also is directly compared with the MLE in calculating various available estimates of q, demonstrating the percentage of reduction in bias achieved. This reduction varies from negligible for estimates of q above .3 and N greater than 100, to a 23% reduction in bias for a q value of .09 and an N value of 215.
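
The negative bias described here can be checked numerically. The sketch below assumes that the number of recessive individuals X among N sampled follows a Binomial(N, q^2) distribution and computes the exact expectation of the estimator sqrt(X / N); the function names are hypothetical, and the bias-reducing estimators proposed in the paper are not reproduced.

```python
import math


def expected_mle(q, n):
    """Exact expectation of sqrt(X / n) when X ~ Binomial(n, q**2)."""
    p_rec = q * q  # probability that a sampled individual is recessive
    total = 0.0
    for x in range(n + 1):
        prob = math.comb(n, x) * p_rec ** x * (1.0 - p_rec) ** (n - x)
        total += prob * math.sqrt(x / n)
    return total


def mle_bias(q, n):
    """Bias of the square-root estimator: its expectation minus the true q."""
    return expected_mle(q, n) - q
```

Because the square root is concave, Jensen's inequality guarantees that the expectation falls below q, so `mle_bias` is negative whenever 0 < q < 1; it shrinks toward zero as q and N grow, in line with the pattern the abstract reports.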

17.
Chen SX  Cowling A 《Biometrics》2001,57(3):732-742
When using bivariate line transect methods to estimate the biomass density of a tightly clustered biological population, it is generally assumed that both the perpendicular distance from the trackline to the cluster and the cluster size, or biomass, are measured without error. This is unlikely to be the case in practice. In this article, assuming additive mean zero errors in distance and multiplicative errors in size, we develop an estimator of density that corrects for these errors. We use the method of moments for the case of gamma cluster size, randomly placed transect lines, and the generalized exponential detection function. We derive results that show that it may not be necessary to correct for errors in distance or size when the distance and size estimates are not biased. When the size estimates are biased, the biomass density estimate has approximately the same bias as the size estimates. The work is illustrated in the context of annual aerial surveys for juvenile southern bluefin tuna in the Great Australian Bight.

18.
If animals are independently detected during surveys, many methods exist for estimating animal abundance despite detection probabilities <1. Common estimators include double‐observer models, distance sampling models and combined double‐observer and distance sampling models (known as mark‐recapture‐distance‐sampling models; MRDS). When animals reside in groups, however, the assumption of independent detection is violated. In this case, the standard approach is to account for imperfect detection of groups, while assuming that individuals within groups are detected perfectly. However, this assumption is often unsupported. We introduce an abundance estimator for grouped animals when detection of groups is imperfect and group size may be under‐counted, but not over‐counted. The estimator combines an MRDS model with an N‐mixture model to account for imperfect detection of individuals. The new MRDS‐Nmix model requires the same data as an MRDS model (independent detection histories, an estimate of distance to transect, and an estimate of group size), plus a second estimate of group size provided by the second observer. We extend the model to situations in which detection of individuals within groups declines with distance. We simulated 12 data sets and used Bayesian methods to compare the performance of the new MRDS‐Nmix model to an MRDS model. Abundance estimates generated by the MRDS‐Nmix model exhibited minimal bias and nominal coverage levels. In contrast, MRDS abundance estimates were biased low and exhibited poor coverage. Many species of conservation interest reside in groups and could benefit from an estimator that better accounts for imperfect detection. Furthermore, the ability to relax the assumption of perfect detection of individuals within detected groups may allow surveyors to re‐allocate resources toward detection of new groups instead of extensive surveys of known groups. 
We believe the proposed estimator is feasible because the only additional field data required are a second estimate of group size.

19.
20.
Many different methods for evaluating diagnostic test results in the absence of a gold standard have been proposed. In this paper, we discuss how one common method, a maximum likelihood estimate for a latent class model found via the Expectation-Maximization (EM) algorithm, can be applied to longitudinal data where test sensitivity changes over time. We also propose two simplified and nonparametric methods which use data-based indicator variables for disease status and compare their accuracy to the maximum likelihood estimation (MLE) results. We find that with high specificity tests, the performance of simpler approximations may be just as high as the MLE.
