Similar Documents
20 similar documents found (search time: 15 ms)
1.
Comparisons of species richness among assemblages using different sample sizes may produce erroneous conclusions due to the strong positive relationship between richness and sample size. A current way of handling the problem is to standardize sample sizes to the size of the smallest sample in the study. A major criticism of this approach is the loss of information contained in the larger samples. A potential way of solving the problem is to apply extrapolation techniques to smaller samples, producing the estimated species richness expected if sample size were increased to that of the largest sample. We evaluated the reliability of 11 potential extrapolation methods over a range of different data sets and magnitudes of extrapolation. The basic approach adopted in the evaluation process was a comparison between the observed richness in a sample and the estimated richness produced by estimators using a sub-sample of the same sample. The Log-Series estimator was the most robust for the range of data sets and sub-sample sizes used, followed closely by the Negative Binomial, SO-J1, Logarithmic, Stout and Vandermeer, and Weibull estimators. When applied to a set of independently replicated samples from a species-rich assemblage, 95% confidence intervals of estimates produced by the six best-evaluated methods were comparable to those of observed richness in the samples. Performance of estimators tended to be better for species-rich data sets than for those containing few species. Good estimates were found when extrapolating up to 1.8-2.0 times the size of the sample. We suggest that the use of the best-evaluated methods within the range of indicated conditions provides a safe solution to the problem of losing information when standardizing different sample sizes to the size of the smallest sample.
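To illustrate the extrapolation idea, the sketch below fits Fisher's log-series relationship S = α ln(1 + N/α) to a sub-sample and projects richness out to a larger sample size. This is a minimal reading of a log-series estimator, not the authors' exact implementation, and the example numbers are invented.

```python
# Minimal sketch: fit Fisher's alpha from an observed sub-sample, then
# predict the richness expected if sampling were extended further.
import numpy as np
from scipy.optimize import brentq

def fishers_alpha(s_obs: float, n_obs: float) -> float:
    """Solve S = alpha * ln(1 + N/alpha) for alpha (requires s_obs < n_obs)."""
    f = lambda a: a * np.log(1.0 + n_obs / a) - s_obs
    return brentq(f, 1e-6, 1e6)

def logseries_extrapolate(s_obs, n_obs, n_target):
    """Estimated richness if sample size grew from n_obs to n_target."""
    alpha = fishers_alpha(s_obs, n_obs)
    return alpha * np.log(1.0 + n_target / alpha)

# e.g. 50 species among 400 individuals, extrapolated to twice the sample
print(logseries_extrapolate(50, 400, 800))
```

The 2x extrapolation in the usage line mirrors the 1.8-2.0x range the abstract reports as reliable.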

2.
Current methods for estimating sample size are not appropriate in situations where a minimum detectable difference cannot be specified a priori. For these cases, a method is proposed which incorporates resolving power as a primary factor and expended effort (feasibility) as a secondary factor. Trade-offs between resolving power and expended effort are evaluated over a range of sample sizes. The SE is used as a measure of resolving power, and the greatest limiting factor on sample size is used as a measure of sampling effort. Techniques for obtaining a range of sample sizes from a single large collection or from a preliminary experiment are also given.
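A hedged sketch of the trade-off being evaluated: resolving power, measured as the standard error SE = s/sqrt(n), improves with diminishing returns as effort (n) grows. The pilot SD and target SE below are illustrative values, not taken from the paper.

```python
import numpy as np

s = 12.0                               # SD from a preliminary experiment (assumed)
ns = np.arange(5, 205, 5)              # candidate sample sizes (effort)
se = s / np.sqrt(ns)                   # resolving power at each n

target = 2.0                           # smallest SE worth the added effort
n_needed = int(ns[np.argmax(se <= target)])  # first n meeting the target
print(n_needed)                        # -> 40 for these illustrative values
```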

3.
Aim Techniques that predict species potential distributions by combining observed occurrence records with environmental variables show much potential for application across a range of biogeographical analyses. Some of the most promising applications relate to species for which occurrence records are scarce, due to cryptic habits, locally restricted distributions or low sampling effort. However, the minimum sample sizes required to yield useful predictions remain difficult to determine. Here we developed and tested a novel jackknife validation approach to assess the ability to predict species occurrence when fewer than 25 occurrence records are available. Location Madagascar. Methods Models were developed and evaluated for 13 species of secretive leaf-tailed geckos (Uroplatus spp.) that are endemic to Madagascar, for which available sample sizes range from 4 to 23 occurrence localities (at 1 km² grid resolution). Predictions were based on 20 environmental data layers and were generated using two modelling approaches: a method based on the principle of maximum entropy (Maxent) and a genetic algorithm (GARP). Results We found high success rates and statistical significance in jackknife tests with sample sizes as low as five when the Maxent model was applied. Results for GARP at very low sample sizes (less than c. 10) were poorer. When sample sizes were experimentally reduced for those species with the most records, variability among predictions using different combinations of localities demonstrated that models were greatly influenced by exactly which observations were included. Main conclusions We emphasize that models developed using this approach with small sample sizes should be interpreted as identifying regions that have similar environmental conditions to where the species is known to occur, and not as predicting actual limits to the range of a species. The jackknife validation approach proposed here enables assessment of the predictive ability of models built using very small sample sizes, although use of this test with larger sample sizes may lead to overoptimistic estimates of predictive power. Our analyses demonstrate that geographical predictions developed from small numbers of occurrence records may be of great value, for example in targeting field surveys to accelerate the discovery of unknown populations and species.
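The core of the jackknife validation is a leave-one-out loop, sketched below. `fit_model` and `predicts_presence` are hypothetical placeholders standing in for Maxent or GARP and its thresholded prediction; the paper additionally judges significance against the proportion of area predicted present, which is omitted here.

```python
# Minimal leave-one-out sketch: withhold each occurrence record in turn,
# refit on the remainder, and score whether the withheld locality falls
# inside the predicted-presence area.
import numpy as np

def jackknife_success_rate(localities, fit_model, predicts_presence):
    hits = []
    for i in range(len(localities)):
        train = [p for j, p in enumerate(localities) if j != i]
        model = fit_model(train)                      # refit without record i
        hits.append(predicts_presence(model, localities[i]))
    return float(np.mean(hits))  # proportion of withheld records recovered
```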

4.
Based on the approach of McLachlan (1977), a procedure for the conditional and common estimation of the classification error in discriminant analysis is described for k ≥ 2 classes. As a rapid procedure for large sample sizes and feature numbers, a modification of the resubstitution method is proposed that is favourable with respect to computing time. Both methods provide useful estimates of the probability of misclassification. In calculating the weighting function w, deviations from the preconditions known from MANOVA, such as skewness, truncation or inequality of the covariance matrices, hardly play any role; it appears that only variation in the sample sizes of the classes substantially influences the weighting functions. The error rates of the tested error estimation methods likewise depend on the sample sizes of the classes. Violations of the preconditions mentioned above result in different variations of the error estimates, depending on these sample sizes. A comparison between error estimation and allocation relative to a simulated population demonstrates the quality of the error estimation procedures used.
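For orientation, the resubstitution idea is simply to classify the training data itself, which is fast but optimistically biased. The sketch below uses scikit-learn's LDA as a stand-in for the authors' own discriminant code, with simulated two-class data.

```python
# Sketch of a resubstitution error estimate with LDA (illustrative data).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(1, 1, (60, 3))])
y = np.repeat([0, 1], [40, 60])          # unequal class sample sizes

lda = LinearDiscriminantAnalysis().fit(X, y)
resub_error = 1.0 - lda.score(X, y)      # optimistic misclassification estimate
print(resub_error)
```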

5.
Noninvasive sampling, of faeces and hair for example, has enabled many genetic studies of wildlife populations. However, two prevailing problems common to these studies are small sample sizes and high genotyping errors. The first problem stems from the difficulty in collecting noninvasive samples, particularly from populations of rare or elusive species, and the second is caused by the low quantity and quality of DNA extracted from a noninvasive sample. A common question is therefore whether noninvasive sampling provides sufficient information for the analyses commonly conducted in conservation genetics studies. Here, we conducted a simulation study to investigate the effect of small sample sizes and genotyping errors on the precision and accuracy of the most commonly estimated genetic parameters. Our results indicate that small sample sizes cause little bias in measures of expected heterozygosity, pairwise FST and population structure, but a large downward bias in estimates of allelic diversity. Allelic dropouts and false alleles had a much smaller effect than missing data, which effectively reduces sample size further. Overall, reasonable estimates of genetic variation and population subdivision are obtainable from noninvasive samples as long as error rates are kept below a frequency of 0.2. Similarly, unbiased estimates of population clustering can be made with genotyping error rates below 0.5 when the populations are highly differentiated. These results provide a useful guide for researchers faced with studying the conservation genetics of small, endangered populations from noninvasive samples.
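Simulations of this kind work by injecting the error processes into clean genotypes. The sketch below shows one plausible corruption step for diploid genotypes; the dropout and missing-data rates are the knobs such studies vary (the abstract's thresholds were 0.2 and missing data, respectively), and the values here are arbitrary.

```python
# Illustrative sketch: with probability `missing` a genotype fails entirely,
# and with probability `dropout` one allele is lost (a false homozygote).
import numpy as np

rng = np.random.default_rng(1)

def corrupt(genotypes, dropout=0.2, missing=0.1):
    out = []
    for a, b in genotypes:
        if rng.random() < missing:
            out.append(None)                        # failed amplification
        elif rng.random() < dropout:
            keep = a if rng.random() < 0.5 else b   # allelic dropout
            out.append((keep, keep))
        else:
            out.append((a, b))
    return out

print(corrupt([(1, 2), (3, 3), (1, 4)]))
```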

6.
Determining sample sizes for microarray experiments is important, but the complexity of these experiments and the large amounts of data they produce can make the sample size issue seem daunting and tempt researchers to use rules of thumb in place of formal calculations based on the goals of the experiment. Here we present formulae for determining sample sizes to achieve a variety of experimental goals, including class comparison and the development of prognostic markers. Results are derived which describe the impact of pooling, technical replicates and dye-swap arrays on sample size requirements. These results are shown to depend on the relative sizes of different sources of variability. A variety of common types of experimental situations and designs used with single-label and dual-label microarrays are considered. We discuss procedures for controlling the false discovery rate. Our calculations are based on relatively simple yet realistic statistical models for the data, and provide straightforward sample size calculation formulae.
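The simplest calculation of this family is the standard two-group formula, n per group = 2σ²(z₁₋α/₂ + z₁₋β)²/δ², evaluated at a stringent α to account for testing many genes. This is the generic textbook formula the paper builds on, not its exact pooling or dye-swap variants, and the numbers are illustrative.

```python
from scipy.stats import norm

def n_per_group(sigma, delta, alpha=0.001, power=0.95):
    """Arrays needed per class to detect a mean log-ratio difference delta."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 2 * (sigma * z / delta) ** 2

print(n_per_group(sigma=0.5, delta=1.0))   # one two-fold change on a log2 scale
```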

7.
Effects of sample size on the performance of species distribution models   (Cited by: 8; self-citations: 0, other citations: 8)
A wide range of modelling algorithms is used by ecologists, conservation practitioners, and others to predict species ranges from point locality data. Unfortunately, the amount of data available is limited for many taxa and regions, making it essential to quantify the sensitivity of these algorithms to sample size. This is the first study to address this need by rigorously evaluating a broad suite of algorithms with independent presence–absence data from multiple species and regions. We evaluated predictions from 12 algorithms for 46 species (from six different regions of the world) at three sample sizes (100, 30, and 10 records). We used data from natural history collections to run the models, and evaluated the quality of model predictions with area under the receiver operating characteristic curve (AUC). With decreasing sample size, model accuracy decreased and variability increased across species and between models. Novel modelling methods that incorporate both interactions between predictor variables and complex response shapes (i.e. GBM, MARS-INT, BRUTO) performed better than most methods at large sample sizes but not at the smallest sample sizes. Other algorithms were much less sensitive to sample size, including an algorithm based on maximum entropy (MAXENT) that had among the best predictive power across all sample sizes. Relative to other algorithms, a distance metric algorithm (DOMAIN) and a genetic algorithm (OM-GARP) had intermediate performance at the largest sample size and among the best performance at the lowest sample size. No algorithm predicted consistently well with small sample sizes (n < 30); this should encourage highly conservative use of predictions based on small sample sizes and restrict their use to exploratory modelling.
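The evaluation loop can be sketched generically: subsample the occurrence records to a target size, fit some model, and score AUC on independent presence-absence data. Logistic regression below is only a stand-in for the 12 algorithms tested, and the random-background step is an assumption of this sketch.

```python
# Sketch: mean and spread of AUC at a given occurrence sample size.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def auc_at_sample_size(X_occ, X_eval, y_eval, n_records, n_reps=20, seed=0):
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_reps):
        idx = rng.choice(len(X_occ), size=n_records, replace=False)
        # pseudo-absences drawn at random from the evaluation background
        bg = X_eval[rng.choice(len(X_eval), size=n_records, replace=False)]
        X = np.vstack([X_occ[idx], bg])
        y = np.r_[np.ones(n_records), np.zeros(n_records)]
        m = LogisticRegression(max_iter=1000).fit(X, y)
        aucs.append(roc_auc_score(y_eval, m.predict_proba(X_eval)[:, 1]))
    return np.mean(aucs), np.std(aucs)
```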

8.
Community ecologists commonly perform multivariate techniques (e.g., ordination, cluster analysis) to assess patterns and gradients of taxonomic variation. A critical requirement for a meaningful statistical analysis is accurate information on the taxa found within an ecological sample. However, oversampling (too many individuals counted per sample) also comes at a cost, particularly for ecological systems in which identification and quantification are substantially more resource-consuming than the field expedition itself. In such systems, an increasingly large sample size will eventually yield diminishing returns in improving any pattern or gradient revealed by the data, but will continually increase costs. Here, we examine 396 datasets: 44 previously published and 352 created datasets. Using meta-analytic and simulation-based approaches, we seek (1) to determine the minimal sample sizes required to produce robust multivariate statistical results in abundance-based community ecology research, and (2) to determine the dataset parameters (i.e., evenness, number of taxa, number of samples) that require larger sample sizes, regardless of resource availability. We found that in the 44 previously published and the 220 created datasets with randomly chosen abundances, a conservative estimate of a sample size of 58 produced the same multivariate results as all larger sample sizes. However, this minimal number varies as a function of evenness, with increased evenness requiring increased minimal sample sizes. Sample sizes as small as 58 individuals are therefore sufficient for a broad range of multivariate abundance-based research. In cases where resource availability is the limiting factor for conducting a project (e.g., a small university, or limited time to conduct the research), statistically viable results can still be obtained with less of an investment.
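One way to operationalize "same multivariate results" is to compare the ordination of a rarefied community matrix against the full-sample ordination and watch the disagreement level off. The sketch below uses metric MDS on Bray-Curtis distances and Procrustes disparity; this is one plausible reading of such a stability check, not the paper's exact pipeline.

```python
import numpy as np
from scipy.spatial import procrustes
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

def rarefy(row, n, rng):
    """Draw n individuals without replacement from one sample's integer counts."""
    pool = np.repeat(np.arange(len(row)), row)
    picks = rng.choice(pool, size=n, replace=False)
    return np.bincount(picks, minlength=len(row))

def ordination(counts):
    d = squareform(pdist(counts, metric="braycurtis"))
    return MDS(n_components=2, dissimilarity="precomputed",
               random_state=0).fit_transform(d)

def disparity_at(counts, n, rng):
    """Procrustes disparity between full and rarefied (to n) ordinations."""
    rare = np.array([rarefy(r, n, rng) for r in counts])
    return procrustes(ordination(counts), ordination(rare))[2]
```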

9.
In microarray studies it is common that the number of replications (i.e. the sample size) is small and that the distribution of expression values differs from normality. In this situation, permutation and bootstrap tests may be appropriate for the identification of differentially expressed genes. However, unlike bootstrap tests, permutation tests are not suitable for very small sample sizes, such as three per group. A variety of different bootstrap tests exists. For example, it is possible to adjust the data to have a common mean before the bootstrap samples are drawn. For small significance levels, which can occur when a large number of genes is investigated, the original bootstrap test, as well as a bootstrap test suggested for the Behrens-Fisher problem, have no power in cases of very small sample sizes. In contrast, the modified test based on adjusted data is powerful. Using a Monte Carlo simulation study, we demonstrate that the difference in power can be huge. In addition, the different tests are illustrated using microarray data.
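The adjusted-data bootstrap described here has a compact recipe: shift both groups to a common mean so the null hypothesis holds, resample with replacement, and compare the observed statistic with the bootstrap null distribution. The sketch below uses a Welch-type t-statistic; per-gene application and the exact statistic are assumptions of this sketch.

```python
import numpy as np

def t_stat(x, y):
    """Welch-type two-sample t statistic."""
    vx, vy = x.var(ddof=1) / len(x), y.var(ddof=1) / len(y)
    return (x.mean() - y.mean()) / np.sqrt(vx + vy)

def bootstrap_pvalue(x, y, n_boot=9999, seed=0):
    rng = np.random.default_rng(seed)
    grand = np.concatenate([x, y]).mean()
    x0, y0 = x - x.mean() + grand, y - y.mean() + grand  # enforce H0
    t_obs = abs(t_stat(x, y))
    t_null = np.array([abs(t_stat(rng.choice(x0, len(x), replace=True),
                                  rng.choice(y0, len(y), replace=True)))
                       for _ in range(n_boot)])
    return (1 + np.sum(t_null >= t_obs)) / (n_boot + 1)
```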

10.
Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
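LDpred's infinitesimal special case (LDpred-inf) has a closed form: per LD window, the posterior mean effects are (D + M/(N·h²)·I)⁻¹·β̂, with D the local LD correlation matrix from the reference panel. The sketch below implements only that special case; the full point-normal LDpred model uses Gibbs sampling instead.

```python
# Minimal sketch of LDpred-inf for one LD window.
import numpy as np

def ldpred_inf(beta_hat, D, n_samples, h2, n_markers):
    """Posterior mean marker effects: (D + M/(N*h2) I)^-1 @ beta_hat."""
    m = len(beta_hat)
    shrink = (n_markers / (n_samples * h2)) * np.eye(m)
    return np.linalg.solve(D + shrink, beta_hat)
```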

11.
J. M. Nam, Biometrics 1987, 43(3): 701-705
A simple approximate formula for sample sizes for detecting a linear trend in proportions is derived. Formulas are given for both the uncorrected and the continuity-corrected Cochran-Armitage test. For two binomial proportions these reduce to those given by Casagrande, Pike, and Smith (1978, Biometrics 34, 483-486). Numerical results from a power study for small sample sizes show that the nominal power corresponding to the approximate sample size is a reasonably good approximation to the actual power.
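The two-proportion special case the abstract mentions is well known and easy to compute: the classical uncorrected sample size per group, plus the continuity correction of Casagrande, Pike and Smith (1978). One-sided α is assumed below.

```python
from math import sqrt
from scipy.stats import norm

def n_two_props(p1, p2, alpha=0.05, power=0.8):
    """Per-group n, uncorrected and continuity-corrected (Casagrande et al.)."""
    za, zb = norm.ppf(1 - alpha), norm.ppf(power)
    pbar = (p1 + p2) / 2
    n = (za * sqrt(2 * pbar * (1 - pbar))
         + zb * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p1 - p2) ** 2
    n_corr = n / 4 * (1 + sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2
    return n, n_corr

print(n_two_props(0.1, 0.3))   # roughly (48, 58) per group
```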

12.
A new method for the comparison of interpopulation genetic distances is proposed. The method allows populations to be located more precisely relative to one another in the space of genetic characters when sample sizes differ considerably. It consists of repeatedly reducing the sample sizes to that of the smallest sample and then averaging the genetic distances calculated between the reduced samples. Software for calculating genetic distances by this method is presented.
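The procedure itself is a short resampling loop, sketched below. `genetic_distance` is a placeholder for whatever distance is used (e.g. Nei's), and populations are assumed to be arrays of per-individual genotype records.

```python
import numpy as np

def reduced_distance(pop_a, pop_b, genetic_distance, n_reps=100, seed=0):
    """Average distance over repeated sub-samples of the smaller sample's size."""
    rng = np.random.default_rng(seed)
    n_min = min(len(pop_a), len(pop_b))
    dists = []
    for _ in range(n_reps):
        sub_a = rng.choice(len(pop_a), size=n_min, replace=False)
        sub_b = rng.choice(len(pop_b), size=n_min, replace=False)
        dists.append(genetic_distance(pop_a[sub_a], pop_b[sub_b]))
    return float(np.mean(dists))
```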

13.
Several asymptotic tests have been proposed for testing the null hypothesis of marginal homogeneity in square contingency tables with r categories. A simulation study was performed to compare the power of four finite conservative conditional test procedures and of two asymptotic tests under twelve different contingency schemes with small sample sizes. While an asymptotic test proposed by Stuart (1955) showed rather satisfactory behaviour for moderate sample sizes, an asymptotic test proposed by Bhapkar (1966) was quite anticonservative. With no a priori information, the (r − 1) simultaneous conditional binomial tests with a Bonferroni adjustment proved to be a quite efficient procedure. With assumptions about where to expect deviations from the null hypothesis, other procedures favouring the larger or smaller conditional sample sizes, respectively, can be highly efficient. The procedures are illustrated by means of a numerical example from clinical psychology.
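One plausible reading of the Bonferroni procedure, sketched below: for each of the first r − 1 categories, compare subjects moving out of the category with those moving into it, conditional on their total (a McNemar-style binomial with p = 0.5), at level α/(r − 1). This is an illustrative construction, not necessarily the paper's exact one.

```python
import numpy as np
from scipy.stats import binomtest

def marginal_homogeneity_binomials(table, alpha=0.05):
    t = np.asarray(table)
    r = t.shape[0]
    results = []
    for i in range(r - 1):
        out_i = t[i, :].sum() - t[i, i]   # moved out of category i
        in_i = t[:, i].sum() - t[i, i]    # moved into category i
        tot = out_i + in_i
        p = binomtest(int(out_i), int(tot), 0.5).pvalue if tot > 0 else 1.0
        results.append((i, p, p < alpha / (r - 1)))  # Bonferroni-adjusted call
    return results
```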

14.
Similarity indices, sample size and diversity   (Cited by: 20; self-citations: 0, other citations: 20)
Henk Wolda, Oecologia 1981, 50(3): 296-302
Summary: The effect of sample size and species diversity on a variety of similarity indices is explored. Real values of a similarity index must be evaluated relative to the expected maximum value of that index, which is the value obtained for samples randomly drawn from the same universe, with the diversity and sample sizes of the real samples. It is shown that these expected maxima differ from the theoretical maxima (the values obtained for two identical samples) and that the relationship between expected and theoretical maxima depends on sample size and on species diversity in all cases, without exception. In all cases but one (the Morisita index) the expected maxima depend strongly to fairly strongly on sample size and diversity. For some of the more useful indices, empirical equations are given to calculate the expected maximum value of the indices, to which the observed values can be related at any combination of sample sizes. It is recommended that the Morisita index be used whenever possible, to avoid complex corrections for the effects of sample size and diversity; however, when prior logarithmic transformation of the data is required, which may often be the case, the Morisita-Horn or the Renkonen indices are recommended.
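For reference, the two recommended indices have simple standard forms: Morisita's index uses Simpson-type lambdas computed from raw counts, while the Morisita-Horn variant substitutes squared relative abundances and so also accepts transformed, non-integer data.

```python
import numpy as np

def morisita(x, y):
    """Morisita similarity for two integer count vectors over shared species."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    X, Y = x.sum(), y.sum()
    lam_x = (x * (x - 1)).sum() / (X * (X - 1))
    lam_y = (y * (y - 1)).sum() / (Y * (Y - 1))
    return 2 * (x * y).sum() / ((lam_x + lam_y) * X * Y)

def morisita_horn(x, y):
    """Morisita-Horn variant; valid for non-integer (e.g. log-transformed) data."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    X, Y = x.sum(), y.sum()
    d = ((x / X) ** 2).sum() + ((y / Y) ** 2).sum()
    return 2 * (x * y).sum() / (d * X * Y)
```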

15.
The extent of microbial diversity is an intrinsically fascinating subject of profound practical importance. The term 'diversity' may allude to the number of taxa or species richness as well as their relative abundance. There is uncertainty about both, primarily because sample sizes are too small. Non-parametric diversity estimators make gross underestimates if used with small sample sizes on unevenly distributed communities. One can make richness estimates over many scales using small samples by assuming a species/taxa-abundance distribution. However, no one knows what the underlying taxa-abundance distributions are for bacterial communities. Latterly, diversity has been estimated by fitting data from gene clone libraries and extrapolating from this to taxa-abundance curves to estimate richness. However, since sample sizes are small, we cannot be sure that such samples are representative of the community from which they were drawn. It is however possible to formulate, and calibrate, models that predict the diversity of local communities and of samples drawn from that local community. The calibration of such models suggests that migration rates are small and decrease as the community gets larger. The preliminary predictions of the model are qualitatively consistent with the patterns seen in clone libraries in 'real life'. The validation of this model is also confounded by small sample sizes. However, if such models were properly validated, they could form invaluable tools for the prediction of microbial diversity and a basis for the systematic exploration of microbial diversity on the planet.

16.
Although phylogenetic hypotheses can provide insights into mechanisms of evolution, their utility is limited by our inability to differentiate simultaneous speciation events (hard polytomies) from rapid cladogenesis (soft polytomies). In the present paper, we tested the potential for statistical power analysis to differentiate between hard and soft polytomies in molecular phylogenies. Classical power analysis is typically used a priori to determine the sample size required to detect a particular effect size at a particular level of significance (α) with a certain power (1 − β). A posteriori, power analysis is used to infer whether failure to reject a null hypothesis results from lack of an effect or from insufficient data (i.e., low power). We adapted this approach to molecular data to infer whether polytomies result from simultaneous branching events or from insufficient sequence information. We then used this approach to determine the amount of sequence data (sample size) required to detect a positive branch length (effect size). A worked example is provided based on the auklets (Charadriiformes: Alcidae), a group of seabirds among which relationships are represented by a polytomy, despite analyses of over 3000 bp of sequence data. We demonstrate the calculation of effect sizes and sample sizes from sequence data using a normal curve test for difference of a proportion from an expected value and a t-test for a difference of a mean from an expected value. Power analyses indicated that the data for the auklets should be sufficient to differentiate speciation events that occurred at least 100,000 yr apart (the duration of the shortest glacial and interglacial events of the Pleistocene), 2.6 million years ago.
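The a priori half of the calculation is the standard one-sample z-test sample-size formula, where "sample size" is base pairs of sequence. The sketch below shows that generic formula with invented proportions of informative sites; it is not the paper's worked auklet example.

```python
from math import sqrt
from scipy.stats import norm

def n_for_proportion(p0, p1, alpha=0.05, power=0.8):
    """bp needed to detect proportion p1 against null expectation p0 (one-sided)."""
    za, zb = norm.ppf(1 - alpha), norm.ppf(power)
    return ((za * sqrt(p0 * (1 - p0)) + zb * sqrt(p1 * (1 - p1)))
            / (p1 - p0)) ** 2

print(n_for_proportion(p0=0.01, p1=0.02))   # bp needed to detect a doubling
```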

17.
18.
T Tachibana 《Teratology》1990,42(3):207-214
A computer simulation experiment examining the effect of sample size on the reproducibility of treatment effects was performed on the basis of actual data from the Collaborative Behavioral Teratology Study of the National Center for Toxicological Research. The degree of the treatment effect was assessed in terms of the strength of association (eta squared). The results indicate that sample size has a large effect on the reproducibility of results, as assessed by the magnitude of the SD of eta-squared values obtained from replicated experiments. Suitable sample sizes for obtaining relatively consistent results across studies are discussed, and it is pointed out that not enough attention has been paid to the effect of sample size on the reproducibility of results in some behavioral teratology studies.
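The effect-size measure and the reproducibility criterion are both simple to reproduce: eta squared is the between-group share of total variance, and its SD across simulated replications shrinks as n grows. The simulated effect sizes below are illustrative, not the study's.

```python
import numpy as np

def eta_squared(groups):
    """SS_between / SS_total for a one-way layout."""
    all_x = np.concatenate(groups)
    grand = all_x.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_total = ((all_x - grand) ** 2).sum()
    return ss_between / ss_total

rng = np.random.default_rng(2)
for n in (5, 20, 80):   # spread of eta^2 across replications narrows with n
    etas = [eta_squared([rng.normal(0, 1, n), rng.normal(0.5, 1, n)])
            for _ in range(200)]
    print(n, round(float(np.std(etas)), 3))
```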

19.
A texture profile panel was developed for measuring textural properties of restructured beef steaks differing in meat particle size. For steaks of different particle sizes, considerable differences existed in the type of sample breakdown and shape of chewed pieces after just two chews. Panelists also found restructured steaks made from large meat particle sizes to be visually more distorted and to contain more gristle than steaks made from small meat particle sizes. Several characteristics (chunkiness after two chews, coarseness of chewed mass at 15 chews) were dropped from the profile over time, while several characteristics (type of sample breakdown and shape of chewed pieces at two chews, size of chewed pieces at 10 chews), not used initially, were added. The texture profile panel approach appears suitable for discerning the textural differences in restructured steaks that can arise from using different meat particle sizes during processing.

20.
Sample sizes given in regulatory guidelines are not based on statistical reasoning. However, from an ethical, scientific, and regulatory point of view, a mutagenicity experiment must have a reasonable chance of supporting the decision as to whether a result is negative or positive. Consequently, the sample size should be based on type I and type II errors, the underlying variability, and the specific size of a treatment effect. A two-stage adaptive interim analysis is presented, which permits an adaptive choice of sample size after an interim analysis of the data from the first stage. Because the sample size of the first stage is considered to be a minimum requirement, this stage can also be regarded as a pilot study.
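One standard engine for two-stage adaptive designs of this kind is Fisher's combination test (Bauer and Koehne, 1994): the stage-1 p-value decides whether to stop early or continue with a re-chosen stage-2 sample size, and independent stage p-values are combined via −2(ln p₁ + ln p₂) ~ χ² with 4 df under H0. The boundaries below are illustrative, and the paper's exact design may differ.

```python
from math import log
from scipy.stats import chi2

def combined_pvalue(p1, p2):
    """Fisher's combination of two independent stage p-values."""
    return chi2.sf(-2 * (log(p1) + log(p2)), df=4)

def two_stage_decision(p1, p2=None, alpha1=0.01, alpha0=0.5, alpha=0.05):
    """p2 is needed only if the trial continues to the second stage."""
    if p1 <= alpha1:
        return "reject at interim"
    if p1 >= alpha0:
        return "accept at interim (futility)"
    return "reject" if combined_pvalue(p1, p2) <= alpha else "accept"
```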
