Similar articles (20 results)
1.
In this paper we study some methods to detect biased samples and to test what the bias is. These methods can also be used to obtain parametric tests for the original model. We pay special attention to the case of length-biased (size-biased) samples. We apply these results to several real and simulated samples.

2.
In the capture-recapture problem for two independent samples, the traditional estimator, calculated as the product of the two sample sizes divided by the number of sampled subjects appearing in both samples, is well known to be a biased estimator of the population size and to have no finite variance under direct or binomial sampling. To alleviate these theoretical limitations, inverse sampling, in which we continue sampling subjects in the second sample until we obtain a desired number of marked subjects who appeared in the first sample, has been proposed elsewhere. In this paper, we consider five interval estimators of the population size, including the most commonly used interval estimator using Wald's statistic, the interval estimator using the logarithmic transformation, the interval estimator derived from a quadratic equation developed here, the interval estimator using the χ²-approximation, and the interval estimator based on the exact negative binomial distribution. To evaluate and compare the finite-sample performance of these estimators, we employ Monte Carlo simulation to calculate the coverage probability and the standardized average length of the resulting confidence intervals in a variety of situations. To study the location of these interval estimators, we calculate the non-coverage probability in the two tails of the confidence intervals. Finally, we briefly discuss the optimal sample size determination for a given precision to minimize the expected total cost. (© 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
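As a rough illustration of the traditional point estimator described in this abstract (often called the Lincoln-Petersen estimator), the sketch below computes the product of the two sample sizes divided by the number of subjects found in both samples. The function name and the example counts are assumptions made for illustration; the abstract supplies neither, and none of the five interval estimators is reproduced here.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Point estimate of population size from two samples.

    n1: size of the first (marked) sample
    n2: size of the second sample
    m:  number of subjects appearing in both samples
    """
    if m <= 0:
        # With no recaptures the ratio is undefined. Under direct sampling this
        # event has positive probability, which is why the variance is not finite.
        raise ValueError("no recaptured subjects: estimate is undefined")
    return n1 * n2 / m


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
print(lincoln_petersen(n1=200, n2=150, m=30))  # 1000.0
```

Inverse sampling, as described above, fixes `m` in advance and keeps drawing until that many marked subjects are seen, so the zero-recapture case cannot arise.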

3.
Inferring admixture proportions from molecular data (cited 19 times: 2 self-citations, 17 by others)
We derive here two new estimators of admixture proportions based on a coalescent approach that explicitly takes into account molecular information as well as gene frequencies. These estimators can be applied to any type of molecular data (such as DNA sequences, restriction fragment length polymorphisms [RFLPs], or microsatellite data) for which the extent of molecular diversity is related to coalescent times. Monte Carlo simulation studies are used to analyze the behavior of our estimators. We show that one of them (mY) appears suitable for estimating admixture from molecular data because of its absence of bias and relatively low variance. We then compare it to two conventional estimators that are based on gene frequencies. mY proves to be less biased than conventional estimators over a wide range of situations and especially for microsatellite data. However, its variance is larger than that of conventional estimators when parental populations are not very differentiated. The variance of mY becomes smaller than that of conventional estimators only if parental populations have been kept separated for about N generations and if the mutation rate is high. Simulations also show that several loci should always be studied to achieve a drastic reduction of variance and that, for microsatellite data, the mean square error of mY rapidly becomes smaller than that of conventional estimators if enough loci are surveyed. We apply our new estimator to the case of admixed wolflike Canid populations tested for microsatellite data.

4.
The classic Jaccard and Sørensen indices of compositional similarity (and other indices that depend upon the same variables) are notoriously sensitive to sample size, especially for assemblages with numerous rare species. Further, because these indices are based solely on presence–absence data, accurate estimators for them are unattainable. We provide a probabilistic derivation for the classic, incidence-based forms of these indices and extend this approach to formulate new Jaccard-type or Sørensen-type indices based on species abundance data. We then propose estimators for these indices that include the effect of unseen shared species, based on either (replicated) incidence- or abundance-based sample data. In sampling simulations, these new estimators prove to be considerably less biased than classic indices when a substantial proportion of species are missing from samples. Based on species-rich empirical datasets, we show how incorporating the effect of unseen shared species not only increases accuracy but also can change the interpretation of results.
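For orientation, the classic incidence-based indices mentioned in this abstract can be sketched in a few lines of Python. This shows only the textbook presence-absence forms; the abundance-based indices and the unseen-species corrections proposed in the paper are not reproduced. The function names and species labels are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Shared species divided by all species seen in either sample."""
    union = a | b
    if not union:
        raise ValueError("both samples are empty")
    return len(a & b) / len(union)


def sorensen(a: set, b: set) -> float:
    """Twice the shared species divided by the sum of the two species counts."""
    total = len(a) + len(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return 2 * len(a & b) / total


# Hypothetical species lists for two assemblages.
site_1 = {"sp_a", "sp_b", "sp_c", "sp_d"}
site_2 = {"sp_c", "sp_d", "sp_e", "sp_f"}
print(jaccard(site_1, site_2))   # 2 shared out of 6 species overall
print(sorensen(site_1, site_2))  # 2 * 2 shared over 4 + 4 species
```

Because both functions see only which species were recorded, a rare shared species missed in either sample lowers the computed similarity, which is the sample-size sensitivity the abstract describes.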

5.
Wang J. Molecular Ecology, 2004, 13(10): 3169-3178
Knowledge of the genetic relatedness between a pair of individuals is important in many research areas of quantitative genetics, conservation genetics, evolution and ecology. Many estimators have been developed to estimate such pairwise relatedness (r) using codominant markers, such as microsatellites and enzymes. In contrast, only two estimators have been proposed for dominant markers, such as random amplified polymorphic DNAs (RAPDs) and amplified fragment length polymorphisms (AFLPs), in relatedness inference. They are both biased estimators, and their statistical properties and robustness to sampling errors in allele frequency have not been investigated. In this short paper, I propose two new pairwise relatedness estimators for dominant markers, and use simulations to compare them with the two previous estimators in precision, accuracy and robustness to sampling. It was found that the new estimator based on the least squares approach is unbiased when allele frequencies are known or estimated from a sample without correcting for sampling effects. It has, however, a low precision and, as a result, an intermediate overall performance among the four estimators in terms of the mean squared deviation (MSD) of estimates from actual values of r. The new estimator based on a similarity index is slightly biased but generally has the lowest MSD among the four estimators compared, regardless of the number of loci, the type of actual relationship, and whether allele frequencies are known or estimated from samples. Simulations also show that the confidence intervals estimated by bootstrapping are appropriate for different estimators provided that the number of loci used in the estimation is not small.

6.
Ecological surveys provide the basic information needed to estimate differences in species richness among assemblages. Comparable estimates of the differences in richness between assemblages require equal mean species detectabilities across assemblages. However, mean species detectabilities are often unknown, typically low, and potentially different from one assemblage to another. As a result, inferences regarding differences in species richness among assemblages can be biased. We evaluated how well three methods used to produce comparable estimates of species richness achieved equal mean species detectabilities across diverse assemblages: rarefaction, statistical estimators, and standardization of sampling effort on mean taxonomic similarity among replicate samples (MRS). We used simulated assemblages to mimic a wide range of species-occurrence distributions and species richness to compare the performance of these three methods. Inferences regarding differences in species richness based on rarefaction were highly biased when richness estimates were compared among assemblages with distinctly different species-occurrence distributions. Statistical estimators only marginally reduced this bias. Standardization on MRS yielded the most comparable estimates of differences in species richness. These findings have important implications for our understanding of species-richness patterns, inferences drawn from biological monitoring data, and planning for biodiversity conservation.

7.
Jing Qin, Yu Shen. Biometrics, 2010, 66(2): 382-392
Summary: Length-biased time-to-event data are commonly encountered in applications ranging from epidemiological cohort studies or cancer prevention trials to studies of labor economy. A longstanding statistical problem is how to assess the association of risk factors with survival in the target population given the observed length-biased data. In this article, we demonstrate how to estimate these effects under the semiparametric Cox proportional hazards model. The structure of the Cox model is changed under length-biased sampling in general. Although the existing partial likelihood approach for left-truncated data can be used to estimate covariate effects, it may not be efficient for analyzing length-biased data. We propose two estimating equation approaches for estimating the covariate coefficients under the Cox model. We use modern stochastic process and martingale theory to develop the asymptotic properties of the estimators. We evaluate the empirical performance and efficiency of the two methods through extensive simulation studies. We use data from a dementia study to illustrate the proposed methodology, and demonstrate the computational algorithms for point estimates, which can be directly linked to the existing functions in S-PLUS or R.

8.
We present in this paper a simple method for estimating the mutation rate per site per year which also yields an estimate of the length of a generation when the mutation rate per site per generation is known. The estimator, which takes advantage of DNA polymorphisms in longitudinal samples, is unbiased under a number of population models, including population structure and variable population size over time. We apply the new method to a longitudinal sample of DNA sequences of the env gene of human immunodeficiency virus type 1 (HIV-1) from a single patient and obtain 1.62 × 10⁻² as the mutation rate per site per year for HIV-1. Using an independent data set to estimate the mutation rate per generation, we obtain 1.8 days as the length of a generation of HIV-1, which agrees well with recent estimates based on viral load data. Our estimate of generation time differs considerably from a recent estimate by Rodrigo et al. when the same mutation rate per site per generation is used. Some factors that may contribute to the difference among different estimators are discussed.

9.
Achaz G. Genetics, 2008, 179(3): 1409-1424
Many data sets one could use for population genetics contain artifactual sites, i.e., sequencing errors. Here, we first explore the impact of such errors on several common summary statistics, assuming that sequencing errors are mostly singletons. We thus show that in the presence of those errors, estimators of θ can be strongly biased. We further show that even with a moderate number of sequencing errors, neutrality tests based on the frequency spectrum reject neutrality. This implies that analyses of data sets with such errors will systematically lead to wrong inferences of evolutionary scenarios. To avoid these errors, we propose two new estimators of θ that ignore singletons as well as two new tests, Y and Y*, that can be used to test neutrality despite sequencing errors. All in all, we show that even though singletons are ignored, these new tests show some power to detect deviations from a standard neutral model. We therefore advise the use of these new tests to strengthen conclusions in suspicious data sets.

10.
T R Fears, C C Brown. Biometrics, 1986, 42(4): 955-960
There are a number of possible designs for case-control studies. The simplest uses two separate simple random samples, but an actual study may use more complex sampling procedures. Typically, stratification is used to control for the effects of one or more risk factors in which we are interested. It has been shown (Anderson, 1972, Biometrika 59, 19-35; Prentice and Pyke, 1979, Biometrika 66, 403-411) that the unconditional logistic regression estimators apply under stratified sampling, so long as the logistic model includes a term for each stratum. We consider the case-control problem with stratified samples and assume a logistic model that does not include terms for strata, i.e., for fixed covariates the (prospective) probability of disease does not depend on stratum. We assume knowledge of the proportion sampled in each stratum as well as the total number in the stratum. We use this knowledge to obtain the maximum likelihood estimators for all parameters in the logistic model including those for variables completely associated with strata. The approach may also be applied to obtain estimators under probability sampling.

11.
Studies of genetics and ecology often require estimates of relatedness coefficients based on genetic marker data. However, with the presence of null alleles, an observed genotype can represent one of several possible true genotypes. This results in biased estimates of relatedness. As the numbers of marker loci are often limited, loci with null alleles cannot be abandoned without substantial loss of statistical power. Here, we show how loci with null alleles can be incorporated into six estimators of relatedness (two novel). We evaluate the performance of various estimators before and after correction for null alleles. If the frequency of a null allele is <0.1, some estimators can be used directly without adjustment; if it is >0.5, the potency of estimation is too low and such a locus should be excluded. We make available a software package entitled PolyRelatedness v1.6, which enables researchers to optimize these estimators to best fit a particular data set.

12.
Bogdan M, Doerge RW. Heredity, 2005, 95(6): 476-484
In many empirical studies, it has been observed that genome scans yield biased estimates of heritability, as well as genetic effects. It is widely accepted that quantitative trait locus (QTL) mapping is a model selection procedure, and that the overestimation of genetic effects is the result of using the same data for model selection as for estimation of parameters. There are two key steps in QTL modeling, each of which biases the estimation of genetic effects. First, test procedures are employed to select the regions of the genome for which there is significant evidence for the presence of QTL. Second, and most important for this demonstration, estimates of the genetic effects are reported only at the locations for which the evidence is maximal. We demonstrate that even when we know there is just one QTL present (ignoring the testing bias), and we use interval mapping to estimate its location and effect, the estimator of the effect will be biased. As evidence, we present results of simulations investigating the relative importance of the two sources of bias and the dependence of bias of heritability estimators on the true QTL heritability, sample size, and the length of the investigated part of the genome. Moreover, we present results of simulations demonstrating the skewness of the distribution of estimators of QTL locations and the resulting bias in estimation of location. We use computer simulations to investigate the dependence of this bias on the true QTL location, heritability, and the sample size.

13.
The control of natural variation in cytosine methylation in Arabidopsis (cited 1 time: 0 self-citations, 1 by others)
Riddle NC, Richards EJ. Genetics, 2002, 161(1): 355-363
The distance of pollen movement is an important determinant of the neighborhood area of plant populations. In earlier studies, we designed a method for estimating the distance of pollen dispersal, on the basis of the analysis of the differentiation among the pollen clouds of a sample of females, spaced across the landscape. The method was based solely on an estimate of the global level of differentiation among the pollen clouds of the total array of sampled females. Here, we develop novel estimators, on the basis of the divergence of pollen clouds for all pairs of females, assuming that an independent estimate of adult population density is available. A simulation study shows that the estimators are all slightly biased, but that most have enough precision to be useful, at least with adequate sample sizes. We show that one of the novel pairwise methods provides estimates that are slightly better than the best global estimate, especially when the markers used have low exclusion probability. The new method can also be generalized to the case where there is no prior information on the density of reproductive adults. In that case, we can jointly estimate the density itself and the pollen dispersal distance, given sufficient sample sizes. The bias of this last estimator is larger and the precision is lower than for those estimates based on independent estimates of density, but the estimate is of some interest, because a meaningful independent estimate of the density of reproducing individuals is difficult to obtain in most cases.

14.
Measurement of temporal change in allele frequencies represents an indirect method for estimating the genetically effective size of populations. When allele frequencies are estimated for gene markers that display dominant gene expression, such as random amplified polymorphic DNA (RAPD) and amplified fragment length polymorphism (AFLP) markers, the estimates can be seriously biased. We quantify bias for previous allele frequency estimators and present a new expression that is generally less biased and provides a more precise assessment of temporal allele frequency change. We further develop an estimator for effective population size that is appropriate when dealing with dominant gene markers. Comparison with estimates based on codominantly expressed genes, such as allozymes or microsatellites, indicates that about twice as many loci or sampled individuals are required when using dominant markers to achieve the same precision.

15.
There has been limited attention to estimating maternity rate because it appears to be relatively simple. However, when used for multi-annual breeder species, such as the largest carnivores, the most common estimators introduce an upward bias by excluding unproductive females. Using a simulated dataset based on published data, we compare the accuracy of maternity estimates derived from standard methods against estimates derived from an alternative method. We show that standard methods overestimate maternity rates in the presence of unsuccessful pregnancies. Importantly, population growth rates derived from a matrix model parameterized with the biased estimates may indicate increasing populations although the populations are stable or even declining. We recommend abandoning the biased standard methods and instead using the unbiased alternative method for population projections and assessments of population viability.

16.
We consider multiple testing with false discovery rate (FDR) control when p-values have discrete and heterogeneous null distributions. We propose a new estimator of the proportion of true null hypotheses and demonstrate that it is less upwardly biased than Storey's estimator and two other estimators. The new estimator induces two adaptive procedures, that is, an adaptive Benjamini–Hochberg (BH) procedure and an adaptive Benjamini–Hochberg–Heyse (BHH) procedure. We prove that the adaptive BH (aBH) procedure is conservative nonasymptotically. Through simulation studies, we show that these procedures are usually more powerful than their nonadaptive counterparts and that the adaptive BHH procedure is usually more powerful than the aBH procedure and a procedure based on randomized p-values. The adaptive procedures are applied to a study of HIV vaccine efficacy, where they identify more differentially polymorphic positions than the BH procedure at the same FDR level.
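To make the baseline concrete, here is a minimal Python sketch of the ordinary BH step-up procedure, with an optional `pi0` argument showing how a plug-in adaptive variant rescales the thresholds. This is the generic textbook procedure under stated assumptions, not the paper's method: the paper's estimator of the proportion of true nulls, its handling of discrete p-values, and the BHH procedure are not reproduced. The function name, the `pi0` parameter, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05, pi0=1.0):
    """Return a list of booleans marking which hypotheses are rejected.

    pi0 = 1.0 gives the ordinary (nonadaptive) BH step-up procedure.
    Passing an estimate of the proportion of true nulls gives a plug-in
    adaptive variant; how to estimate pi0 is not shown here.
    """
    m = len(p_values)
    if m == 0:
        return []
    if not 0.0 < pi0 <= 1.0:
        raise ValueError("pi0 must lie in (0, 1]")
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / (m * pi0).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * pi0):
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from eight tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(sum(benjamini_hochberg(p)))           # 2
print(sum(benjamini_hochberg(p, pi0=0.5)))  # 7
```

A smaller `pi0` loosens every threshold, which is why adaptive procedures tend to reject more hypotheses at the same nominal FDR level and why an upwardly biased estimate of the null proportion costs power.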

17.
Since estimates of total species richness increase with sampling effort, methods to control for this sampling effect need to be tested and used. We present seven non-parametric and 12 accumulation curve methods that have been used recently in the ecological literature. To test their performance, we used data from bird communities in the Queen Charlotte Islands, Canada. The performance of each method was evaluated by calculating the bias and precision of its estimates against the known total species richness. For our data set, the two Chao estimators were the overall least biased and most precise estimation methods, followed by the two jackknife estimators, thus supporting results of previous studies. Nonparametric estimators tended to perform better than accumulation curve models. Most estimation methods had the problem that they tended to underestimate species richness for early samples, but slightly overestimated it for late samples. We briefly discuss the practical use of these methods, which may greatly increase our ability to answer ecological questions and to guide conservation decisions, especially for species-rich tropical bird communities.
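The abstract names the Chao estimators without giving formulas. As a hedged illustration, the sketch below implements the classic abundance-based Chao1 form, observed richness plus f1² / (2·f2), where f1 and f2 are the numbers of species seen exactly once and exactly twice. Whether the study used exactly this variant is an assumption; the function name and the abundance counts are hypothetical.

```python
from collections import Counter


def chao1(abundances: list[int]) -> float:
    """Classic Chao1 lower-bound estimate of species richness."""
    counts = [n for n in abundances if n > 0]
    s_obs = len(counts)             # species actually observed
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]       # singletons and doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used here to avoid dividing by zero when f2 == 0.
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical counts of individuals per species in one sample.
sample = [10, 5, 1, 1, 1, 2, 2, 3]
print(chao1(sample))  # 10.25
```

With three singletons and two doubletons among eight observed species, the estimate adds 9 / 4 unseen species to the observed count.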

18.
BACKGROUND: The ratio of two measured fluorescence signals (called x and y) is used in different applications in fluorescence microscopy. Multiple instances of both signals can be combined in different ways to construct different ratio estimators. METHODS: The mean and variance of three estimators for the ratio between two random variables, x and y, are discussed. Given n samples of x and y, we can intuitively construct two different estimators: the mean of the ratio of each x and y and the ratio between the mean of x and the mean of y. The former is biased and the latter is only asymptotically unbiased. Using the statistical characteristics of this estimator, a third, unbiased estimator can be constructed. RESULTS: We tested the three estimators on simulated data, real-world fluorescence test images, and comparative genome hybridization (CGH) data. The results on the simulated and real-world test images confirm the presented theory. The CGH experiments show that our new estimator performs better than the existing estimators. CONCLUSIONS: We have derived an unbiased ratio estimator that outperforms intuitive ratio estimators.
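The two intuitive estimators named in the METHODS part can be sketched as follows. The sketch only shows that the two constructions generally give different numbers on the same data; the third, unbiased estimator derived in the paper is not stated in the abstract and is not reproduced. The function names and signal values are hypothetical.

```python
from statistics import fmean


def mean_of_ratios(x, y):
    """Average the per-sample ratios x_i / y_i."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return fmean([xi / yi for xi, yi in zip(x, y)])


def ratio_of_means(x, y):
    """Divide the mean of x by the mean of y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return fmean(x) / fmean(y)


# Hypothetical paired signal measurements.
x = [2.0, 4.0, 6.0]
y = [1.0, 4.0, 3.0]
print(mean_of_ratios(x, y))  # (2 + 1 + 2) / 3
print(ratio_of_means(x, y))  # 4 / (8 / 3)
```

The mean of ratios weights every pair equally, so a single small `y` value can dominate it, whereas the ratio of means pools the signal before dividing.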

19.
The present study demonstrates the possibility of estimating species numbers of animal or plant communities from samples using relative abundance distributions. We use log-abundance–species-rank order plots and derive two new estimators that are based on log-series and lognormal distributions. At small to moderate sample sizes these estimators appear to be more precise than previous parametric and nonparametric estimators. We test our estimators using samples from 171 published medium-sized to large animal and plant communities taken from the literature. In doing so, we show that our new estimators also define limits of precision.

20.
Abstract: Disease surveillance in wildlife populations involves detecting the presence of a disease, characterizing its prevalence and spread, and subsequent monitoring. A probability sample of animals selected from the population and corresponding estimators of disease prevalence and detection provide estimates with quantifiable statistical properties, but this approach is rarely used. Although wildlife scientists often assume probability sampling and random disease distributions to calculate sample sizes, convenience samples (i.e., samples of readily available animals) are typically used, and disease distributions are rarely random. We demonstrate how landscape-based simulation can be used to explore properties of estimators from convenience samples in relation to probability samples. We used simulation methods to model what is known about the habitat preferences of the wildlife population, the disease distribution, and the potential biases of the convenience-sample approach. Using chronic wasting disease in free-ranging deer (Odocoileus virginianus) as a simple illustration, we show that using probability sample designs with appropriate estimators provides unbiased surveillance parameter estimates but that the selection bias and coverage errors associated with convenience samples can lead to biased and misleading results. We also suggest practical alternatives to convenience samples that mix probability and convenience sampling. For example, a sample of land areas can be selected using a probability design that oversamples areas with larger animal populations, followed by harvesting of individual animals within sampled areas using a convenience sampling method.
