Similar literature
 20 similar articles found; search time 15 ms
1.
Behavioural studies are commonly plagued with data that violate the assumptions of parametric statistics. Consequently, classic nonparametric methods (e.g. rank tests) and novel distribution-free methods (e.g. randomization tests) have been used to a great extent by behaviourists. However, the robustness of such methods in terms of statistical power and type I error has seldom been evaluated. This probably reflects the fact that empirical methods, such as Monte Carlo approaches, are required to assess these concerns. In this study we show that analytical methods cannot always be used to evaluate the robustness of statistical tests, and that Monte Carlo approaches must be employed instead. We detail empirical protocols for estimating power and type I error rates for parametric, nonparametric and randomization methods, and demonstrate their application for an analysis of variance and a regression/correlation analysis design. Taken together, this study provides a framework from which behaviourists can compare the reliability of different methods for data analysis, serving as a basis for selecting the most appropriate statistical test given the characteristics of the data at hand. Copyright 2001 The Association for the Study of Animal Behaviour.
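
The kind of empirical protocol this abstract describes can be sketched as a small Monte Carlo loop. The sketch below is only an illustration under stated assumptions, not the authors' protocol: the two-group design, the exponential (skewed) data, the sample size and the function names `randomization_p_value` and `rejection_rate` are all hypothetical choices made here.

```python
import random
import statistics


def randomization_p_value(a, b, n_perm, rng):
    """Two-sided randomization test on the difference in group means."""
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)])
                   - statistics.fmean(pooled[len(a):]))
        if diff >= observed - 1e-12:  # small tolerance for float ties
            hits += 1
    # Count the observed arrangement itself so p is never zero.
    return (hits + 1) / (n_perm + 1)


def rejection_rate(effect, n_sim=200, n=10, n_perm=199, alpha=0.05, seed=1):
    """Fraction of simulated data sets in which H0 is rejected.

    effect == 0 estimates the type I error rate;
    effect > 0 estimates power against that shift.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        # Assumption: skewed data, chosen to violate normality.
        a = [rng.expovariate(1.0) for _ in range(n)]
        b = [rng.expovariate(1.0) + effect for _ in range(n)]
        if randomization_p_value(a, b, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim
```

Calling `rejection_rate(0.0)` gives an estimate that should sit near the nominal 0.05 if the test is robust, while `rejection_rate(2.0)` estimates power for a shift of two units. Swapping in a different test function allows methods to be compared on the same simulated data.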

2.
A statistical challenge in community ecology is to identify segregated and aggregated pairs of species from a binary presence–absence matrix, which often contains hundreds or thousands of such potential pairs. A similar challenge is found in genomics and proteomics, where the expression of thousands of genes in microarrays must be statistically analyzed. Here we adapt the empirical Bayes method to identify statistically significant species pairs in a binary presence–absence matrix. We evaluated the performance of a simple confidence interval, a sequential Bonferroni test, and two tests based on the mean and the confidence interval of an empirical Bayes method. Observed patterns were compared to patterns generated from null model randomizations that preserved matrix row and column totals. We evaluated these four methods with random matrices and also with random matrices that had been seeded with an additional segregated or aggregated species pair. The Bayes methods and Bonferroni corrections reduced the frequency of false-positive tests (type I error) in random matrices, but did not always correctly identify the non-random pair in a seeded matrix (type II error). All of the methods were vulnerable to identifying spurious secondary associations in the seeded matrices. When applied to a set of 272 published presence–absence matrices, even the most conservative tests indicated a fourfold increase in the frequency of perfectly segregated “checkerboard” species pairs compared to the null expectation, and a greater predominance of segregated versus aggregated species pairs. The tests did not reveal a large number of significant species pairs in the Vanuatu bird matrix, but in the much smaller Galapagos bird matrix they correctly identified a concentration of segregated species pairs in the genus Geospiza. 
The Bayesian methods provide for increased selectivity in identifying non-random species pairs, but the analyses will be most powerful if investigators can use a priori biological criteria to identify potential sets of interacting species.
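
For readers unfamiliar with the sequential Bonferroni test mentioned in this abstract, a minimal sketch follows. It assumes the usual step-down (Holm) form of the procedure; the abstract does not say which variant was used, and the function name `holm_bonferroni` is hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject
```

With thousands of species pairs, `p_values` would hold one p-value per pair, and the returned flags mark the pairs that remain significant after correction.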

3.
Lájer (2007) notes that, to investigate phytosociological and ecological relationships, many authors apply traditional inferential tests to sets of relevés obtained by non-random methods. Unfortunately, this procedure does not provide reliable support for hypothesis testing because non-random sampling violates the assumptions of independence required by many parametric inferential tests. Instead, a random sampling scheme is recommended. Nonetheless, random sampling will not eliminate spatial autocorrelation. For instance, a classical law of geography holds that everything in a piece of (biotic) space is interrelated, but near objects are more related than distant ones. Because most ecological processes that shape community structure and species coexistence are spatially explicit, spatial autocorrelation is a vital part of almost all ecological data. This means that, independently of the underlying sampling design, ecological data are generally spatially autocorrelated, violating the assumption of independence that is generally required by traditional inferential tests. To overcome this drawback, randomization tests may be used. Such tests evaluate statistical significance based on empirical distributions generated from the sample and do not necessarily require data independence. However, as concerns hypothesis testing, randomization tests are not a universal remedy for ecologists, because the choice of inadequate null models can have significant effects on the ecological hypotheses tested. In this paper, I emphasize the need to develop null models for which the statistical assumptions match the underlying biological mechanisms.

4.
We have examined a number of statistical issues associated with methods for evaluating different tests of density dependence. The lack of definitive standards and benchmarks for conducting simulation studies makes it difficult to assess the performance of various tests. The biological researcher has a bewildering choice of statistical tests for testing density dependence and the list is growing. The most recent additions have been based on computationally intensive methods such as permutation tests and bootstrapping. We believe the computational effort and time involved will preclude their widespread adoption until: (1) these methods have been fully explored under a wide range of conditions and shown to be demonstrably superior to other, simpler methods, and (2) general purpose software is made available for performing the calculations. We have advocated the use of Bulmer's (first) test as a de facto standard for comparative studies on the grounds of its simplicity, applicability, and satisfactory performance under a variety of conditions. We show that, in terms of power, Bulmer's test is robust to certain departures from normality although, as noted by other authors, it is affected by temporal trends in the data. We are not convinced that the reported differences in power between Bulmer's test and the randomisation test of Pollard et al. (1987) justify the adoption of the latter. Nor do we believe a compelling case has been established for the parametric bootstrap likelihood ratio test of Dennis and Taper (1994). Bulmer's test is essentially a test of the serial correlation in the (log) abundance data and is affected by the presence of autocorrelated errors. In such cases the test cannot distinguish between the autoregressive effect in the errors and a true density dependent effect in the time series data. We suspect other tests may be similarly affected, although this is an area for further research.
We have also noted that in the presence of autocorrelation, the type I error rates can be substantially different from the assumed level of significance, implying that in such cases the test is based on a faulty significance region. We have indicated both qualitatively and quantitatively how autoregressive error terms can affect the power of Bulmer's test, although we suggest that more work is required in this area. These apparent inadequacies of Bulmer's test should not be interpreted as a failure of the statistical procedure since the test was not intended to be used with autocorrelated error terms.

5.
Functional shifts during protein evolution are expected to yield shifts in substitution rate, and statistical methods can test for this at both codon and amino acid levels. Although methods based on models of sequence evolution serve as powerful tools for studying evolutionary processes, violating underlying assumptions can lead to false biological conclusions. It is not unusual for functional shifts to be accompanied by changes in other aspects of the evolutionary process, such as codon or amino acid frequencies. However, models used to test for functional divergence assume these frequencies remain constant over time. We employed simulation to investigate the impact of non-stationary evolution on functional divergence inference. We investigated three likelihood ratio tests based on codon models and found varying degrees of sensitivity. Joint effects of shifts in frequencies and selection pressures can be large, leading to false signals for positive selection. Amino acid-based tests (FunDi and Bivar) were also compromised when several aspects of the substitution process were not adequately modeled. We applied the same tests to a core genome “scan” for functional divergence between light-adapted ecotypes of the cyanobacterium Prochlorococcus, and carried out gene-specific simulations for ten genes. Results of those simulations illustrated how the inference of functional divergence at the genomic level can be seriously impacted by model misspecification. Although computationally costly, simulations motivated by data in hand are warranted when several aspects of the substitution process are either misspecified or not included in the models upon which the statistical tests were built.

6.
Lájer (2007) raised the problem of using a non-random sample for statistical testing of plant community data. He argued that this violates basic assumptions of the tests, thus resulting in non-significant results. However, a huge part of present-day knowledge of vegetation science is still based on non-random, preferentially collected data of plant communities. I argue that, given the inherent limits of preferential sampling, a change of approach is now necessary, with the adoption of sampling based on random principles seeming the obvious choice. However, a complete transition to random-based sampling designs in vegetation science is limited by the as yet undefined nature of plant communities and by the still widespread opinion that plant communities have a discrete nature. Randomly searching for such entities is almost impossible, given their dependence on scale of observation, plot size and shape, and the need for finding well-defined types. I conclude that the only way to solve this conundrum is to consider and study plant communities as operational units. If the limits of the plant communities are defined operationally, they can be investigated using proper sampling techniques and the collected data analyzed using adequate statistical tools.

7.
The safety of chemicals, drugs, novel foods and genetically modified crops is often tested using repeat-dose sub-acute toxicity tests in rats or mice. It is important to avoid misinterpretations of the results as these tests are used to help determine safe exposure levels in humans. Treated and control groups are compared for a range of haematological, biochemical and other biomarkers which may indicate tissue damage or other adverse effects. However, the statistical analysis and presentation of such data pose problems due to the large number of statistical tests which are involved. Often, it is not clear whether a “statistically significant” effect is real or a false positive (type I error) due to sampling variation. The authors' conclusions appear to be reached somewhat subjectively from the pattern of statistical significances, discounting those which they judge to be type I errors and ignoring any biomarker where the p-value is greater than 0.05. However, by using standardised effect sizes (SESs), a range of graphical methods can be applied and an overall assessment of the mean absolute response can be made. The approach is an extension, not a replacement, of existing methods. It is intended to assist toxicologists and regulators in the interpretation of the results. Here, the SES analysis has been applied to data from nine published sub-acute toxicity tests in order to compare the findings with those of the authors. Line plots, box plots and bar plots show the pattern of response. Dose-response relationships are easily seen. A “bootstrap” test compares the mean absolute differences across dose groups. In four out of seven papers where the no observed adverse effect level (NOAEL) was estimated by the authors, it was set too high according to the bootstrap test, suggesting that possible toxicity is under-estimated.
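
As a rough illustration of the standardised effect sizes discussed above, the sketch below computes an SES per biomarker and the mean absolute SES across biomarkers. It assumes the common definition (difference in group means divided by the pooled standard deviation); the abstract does not give the exact formula or the details of the bootstrap test, so those are not reproduced here, and the function names are hypothetical.

```python
import statistics


def standardised_effect_size(treated, control):
    """(mean of treated - mean of control) / pooled standard deviation."""
    n_t, n_c = len(treated), len(control)
    pooled_var = ((n_t - 1) * statistics.variance(treated)
                  + (n_c - 1) * statistics.variance(control)) / (n_t + n_c - 2)
    return (statistics.fmean(treated) - statistics.fmean(control)) / pooled_var ** 0.5


def mean_absolute_ses(biomarkers):
    """biomarkers maps a biomarker name to a (treated, control) pair of samples."""
    sizes = [abs(standardised_effect_size(t, c)) for t, c in biomarkers.values()]
    return statistics.fmean(sizes)
```

Because every biomarker is expressed in standard deviation units, haematological and biochemical measurements with different scales can be placed on the same plot or averaged into a single summary per dose group.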

8.
Zhongxue Chen, Qingzhong Liu, Kai Wang. Genomics, 2019, 111(5): 1152-1159
Gene- and pathway-based variant association tests are important tools in finding genetic variants that are associated with phenotypes of interest. Although some methods have been proposed in the literature, powerful and robust statistical tests are still desirable in this area. In this study, we propose a statistical test based on decomposing the genotype data into orthogonal parts, to which powerful and robust approaches for combining independent p-values can be applied. Through a comprehensive simulation study, we compare the proposed test with some existing popular ones. Our simulation results show that the new test performs very well in terms of type I error control and statistical power. Real data applications are also conducted to illustrate the performance and usefulness of the proposed test.
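
The abstract does not say which p-value combination rule the authors use. As one classical example of combining independent p-values, the sketch below implements Fisher's method; treating it as representative is an assumption made here, and the function name `fisher_combined_p` is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine k independent p-values with Fisher's method.

    The statistic -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the joint null hypothesis.
    """
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # For even degrees of freedom (2k) the chi-square survival function
    # has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

The independence requirement is the reason an orthogonal decomposition matters: p-values computed from orthogonal parts of the data can be combined with a rule of this kind without further correction.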

9.
Statistical tests for non-random associations with components of habitat or different kinds of prey require information about the availability of sub-habitats or types of prey. The data are obtained from sampling (Stage 1 samples). Tests are then constructed using this information to predict what will be the occupancy of habitats or composition of diet under the null hypothesis of random association. Estimates of actual occupancy of habitats or composition of diet are then obtained from Stage 2 sampling and tests are done to compare the observed data from Stage 2 with what was predicted from Stage 1. Estimates from each stage of sampling are subject to sampling error, particularly where small samples are involved. The errors involved in Stage 1 sampling are often ignored, resulting in biases in tests and excessive rejection of null hypotheses (i.e. non-random patterns are claimed when they are not present). Here, accurate tests are developed which take into account both types of error. For animals in patchy habitats, with two or more types of patch, the data from Stages 1 and 2 are used to derive maximum likelihood estimators for the proportions of area occupied by the sub-habitats and the proportions of animals in each sub-habitat. These are then used in χ² tests. For composition of diets, data are more complex, because the consumption of food of each type (on its own) must be estimated in separate experiments or sampling. So, Stage 1 sampling is more difficult and the maximum likelihood estimators described here are more complex. The accurate tests described here give much more realistic answers in that they properly control rates of Type I error, particularly with small samples. The effects of errors in Stage 1 sampling are, however, shown to be important, even for quite large samples. The tests can and should be used in any analyses of non-random association or preference among sub-habitats or types of prey.

10.
Yan Li, Barry I. Graubard. Biometrics, 2009, 65(4): 1096-1104
Summary: For studies on population genetics, the use of representative random samples of the target population can avoid ascertainment bias. Genetic variation data from over a hundred genes were collected in a U.S. nationally representative sample in the Third National Health and Nutrition Examination Survey (NHANES III). Surveys such as the NHANES have complex stratified multistage cluster sample designs with sample weighting that can inflate variances and alter the expectations of test statistics. Thus, classical statistical tests of Hardy–Weinberg equilibrium (HWE) and homogeneity of HW disequilibrium (HHWD) for simple random samples are not suitable for data from complex samples. We propose using Wald tests for HWE and generalized score tests for HHWD that have been modified for complex samples. Monte Carlo simulation studies are used to investigate the finite sample properties of the proposed tests. Rao–Scott corrections applied to the tests were found to improve their type I error properties. Our methods are applied to the NHANES III genetic data for three loci involved in metabolizing lead in the body.

11.
Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., there may be heterogeneity. Thus, meta-analysis that combines data from multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing meta-analysis sequence kernel association test (MetaSKAT) methods. We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies.

12.
In ecological field surveys, observations are gathered at different spatial locations. The purpose may be to relate biological response variables (e.g., species abundances) to explanatory environmental variables (e.g., soil characteristics). In the absence of prior knowledge, ecologists have been taught to rely on systematic or random sampling designs. If there is prior knowledge about the spatial patterning of the explanatory variables, obtained from either previous surveys or a pilot study, can we use this information to optimize the sampling design in order to maximize our ability to detect the relationships between the response and explanatory variables?
The specific questions addressed in this paper are: a) What is the effect (type I error) of spatial autocorrelation on the statistical tests commonly used by ecologists to analyse field survey data? b) Can we eliminate, or at least minimize, the effect of spatial autocorrelation by the design of the survey? Are there designs that provide greater power for surveys, at least under certain circumstances? c) Can we eliminate or control for the effect of spatial autocorrelation during the analysis? To answer the last question, we compared regular regression analysis to a modified t‐test developed by Dutilleul for correlation coefficients in the presence of spatial autocorrelation.
Replicated surfaces (typically, 1000 of them) were simulated using different spatial parameters, and these surfaces were subjected to different sampling designs and methods of statistical analysis. The simulated surfaces may represent, for example, vegetation response to underlying environmental variation. This allowed us 1) to measure the frequency of type I error (the rejection of the null hypothesis when in fact there is no effect of the environment on the response variable) and 2) to estimate the power of the different combinations of sampling designs and methods of statistical analysis (power is measured by the rate of rejection of the null hypothesis when an effect of the environment on the response variable has been created).
Our results indicate that: 1) Spatial autocorrelation in both the response and environmental variables affects the classical tests of significance of correlation or regression coefficients. Spatial autocorrelation in only one of the two variables does not affect the test of significance. 2) A broad‐scale spatial structure present in data has the same effect on the tests as spatial autocorrelation. When such a structure is present in one of the variables and autocorrelation is found in the other, or in both, the tests of significance have inflated rates of type I error. 3) Dutilleul's modified t‐test for the correlation coefficient, corrected for spatial autocorrelation, effectively corrects for spatial autocorrelation in the data. It also effectively corrects for the presence of deterministic structures, with or without spatial autocorrelation.
The presence of a broad-scale deterministic structure may, in some cases, reduce the power of the modified t-test.
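
The first result reported in this abstract can be illustrated with a much simpler simulation than the authors' replicated surfaces. The sketch below is an assumption-laden stand-in: it uses one-dimensional AR(1) series instead of spatial surfaces, a fixed sample size of 50, and a hard-coded two-sided 5% critical value of about 2.0106 for Student's t with 48 degrees of freedom. It does not implement Dutilleul's modified t-test.

```python
import math
import random


def ar1_series(n, rho, rng):
    """Stationary AR(1) series with lag-1 autocorrelation rho."""
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho * rho))
    series = []
    for _ in range(n):
        series.append(x)
        x = rho * x + rng.gauss(0.0, 1.0)
    return series


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def type_i_error_rate(rho_x, rho_y, n=50, n_sim=500, t_crit=2.0106, seed=7):
    """Rejection rate of the classical t-test of r when x and y are independent."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        r = pearson_r(ar1_series(n, rho_x, rng), ar1_series(n, rho_y, rng))
        t = r * math.sqrt((n - 2) / (1.0 - r * r))
        if abs(t) > t_crit:  # t_crit assumes n = 50, i.e. 48 degrees of freedom
            rejections += 1
    return rejections / n_sim
```

Comparing `type_i_error_rate(0.0, 0.0)` with `type_i_error_rate(0.9, 0.9)` shows how autocorrelation in both variables pushes the rejection rate well above the nominal level, and `type_i_error_rate(0.9, 0.0)` can be used to probe the case where only one variable is autocorrelated.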

13.
Reflections on univariate and multivariate analysis of metabolomics data   (total citations: 1; self-citations: 0; citations by others: 1)
Metabolomics experiments usually result in a large quantity of data. Univariate and multivariate analysis techniques are routinely used to extract relevant information from the data with the aim of providing biological knowledge on the problem studied. Despite the fact that statistical tools like the t test, analysis of variance, principal component analysis, and partial least squares discriminant analysis constitute the backbone of the statistical part of the vast majority of metabolomics papers, it seems that many basic but rather fundamental questions are still often asked, like: Why do the results of univariate and multivariate analyses differ? Why apply univariate methods if you have already applied a multivariate method? Why do I see something multivariately if I do not see it univariately? In the present paper we address some aspects of univariate and multivariate analysis, with the aim of clarifying in simple terms the main differences between the two approaches. Applications of the t test, analysis of variance, principal component analysis and partial least squares discriminant analysis will be shown on both real and simulated metabolomics data examples to provide an overview of fundamental aspects of univariate and multivariate methods.

14.
Keith P. Lewis. Oikos, 2004, 104(2): 305-315
Ecologists rely heavily upon statistics to make inferences concerning ecological phenomena and to make management recommendations. It is therefore important to use statistical tests that are most appropriate for a given data-set. However, inappropriate statistical tests are often used in the analysis of studies with categorical data (i.e. count data or binary data). Since many types of statistical tests have been used in artificial nest studies, a review and comparison of these tests provides an opportunity to demonstrate the importance of choosing the most appropriate statistical approach for conceptual reasons as well as for reasons of type I and type II error.
Artificial nests have routinely been used to study the influences of habitat fragmentation and habitat edges on nest predation. I review the variety of statistical tests used to analyze artificial nest data within the framework of the generalized linear model and argue that logistic regression is the most appropriate and flexible statistical test for analyzing binary data-sets. Using artificial nest data from my own studies and an independent data set from the medical literature as examples, I tested equivalent data using a variety of statistical methods. I then compared the p-values and the statistical power of these tests. Results vary greatly among statistical methods. Methods inappropriate for analyzing binary data often fail to yield significant results even when differences between study groups appear large, while logistic regression finds these differences statistically significant. Statistical power is 2–3 times higher for logistic regression than for other tests. I recommend that logistic regression be used to analyze artificial nest data and other data-sets with binary data.

15.
16.
The Mantel test, based on comparisons of distance matrices, is commonly employed in comparative biology, but its statistical properties in this context are unknown. Here, we evaluate the performance of the Mantel test for two applications in comparative biology: testing for phylogenetic signal, and testing for an evolutionary correlation between two characters. We find that the Mantel test has poor performance compared to alternative methods, including low power and, under some circumstances, inflated type-I error. We identify a remedy for the inflated type-I error of three-way Mantel tests using phylogenetic permutations; however, this test still has considerably lower power than independent contrasts. We recommend that use of the Mantel test should be restricted to cases in which data can only be expressed as pairwise distances among taxa.
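
For orientation, a simple (two-matrix) Mantel test can be sketched as below. This is a generic textbook form with ordinary permutations of taxa, not the three-way test or the phylogenetic permutation scheme evaluated in the abstract; the one-sided p-value and the function names are choices made here for illustration.

```python
import math
import random


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Above-diagonal entries after relabelling rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(d1, d2, n_perm=999, seed=0):
    """Return (matrix correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    observed = pearson_r(x, upper_triangle(d2, identity))
    hits = 0
    perm = identity[:]
    for _ in range(n_perm):
        # Rows and columns of d2 are permuted together, which keeps
        # each permuted matrix a valid distance matrix.
        rng.shuffle(perm)
        if pearson_r(x, upper_triangle(d2, perm)) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Both arguments are square symmetric distance matrices over the same taxa, for example phylogenetic distances in `d1` and trait distances in `d2`.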

17.
The currently dominant hypothetico-deductive research paradigm for ecology has statistical hypothesis testing as a basic element. Classic statistical hypothesis testing does, however, present the ecologist with two fundamental dilemmas when field data are to be analyzed: (1) that the statistically motivated demand for a random and representative sample and the ecologically motivated demand for representation of variation in the study area cannot be fully met at the same time; and (2) that the statistically motivated demand for independence of errors calls for sampling distances that exceed the scales of relevant pattern-generating processes, so that samples with statistically desirable properties will be ecologically irrelevant. Reasons for these dilemmas are explained by consideration of the classic statistical Neyman-Pearson test procedure, properties of ecological variables, properties of sampling designs, interactions between properties of the ecological variables and properties of sampling designs, and specific assumptions of the statistical methods. Analytic solutions to problems underlying the dilemmas are briefly reviewed. I conclude that several important research objectives cannot be approached without subjective elements in sampling designs. I argue that a research strategy entirely based on rigorous statistical testing of hypotheses is insufficient for field ecological data and that inductive and deductive approaches are complementary in the process of building ecological knowledge. I recommend that great care be taken when statistical tests are applied to ecological field data. Use of less formal modelling approaches is recommended for cases when formal testing is not strictly needed. Sets of recommendations, “Guidelines for wise use of statistical tools”, are proposed both for testing and for modelling.
Important elements of wise-use guidelines are parallel use of methods that preferably belong to different methodologies, selection of methods with few and less rigorous assumptions, conservative interpretation of results, and abandonment of definitive decisions based on a predefined significance level.

18.
Following the pioneering work of Felsenstein and Garland, phylogeneticists have been using regression through the origin to analyze comparative data using independent contrasts. The reason why regression through the origin must be used with such data was revisited. The demonstration led to the formulation of a permutation test for the coefficient of determination and the regression coefficient estimates in regression through the origin. Simulations were carried out to measure type I error and power of the parametric and permutation tests under two models of data generation: regression models I and II (correlation model). Although regression through the origin assumes model I data, in independent contrast data error is present in the explanatory as well as the response variables. Two forms of permutations were investigated to test the regression coefficients: permutation of the values of the response variable y, and permutation of the residuals of the regression model. The simulations showed that the parametric tests or any of the permutation tests can be used when the error is normal, which is the usual assumption in independent contrast studies; only the test by permutation of y should be used when the error is highly asymmetric; and the parametric tests should be used when extreme values are present in covariables. Two examples are presented. The first one concerns non-specificity in fish parasites of the genus Lamellodiscus, the second the richness in parasites in 78 species of mammals.

19.
Population structure and eigenanalysis   (total citations: 4; self-citations: 0; citations by others: 4)
Current methods for inferring population structure from genetic data do not provide formal significance tests for population differentiation. We discuss an approach to studying population structure (principal components analysis) that was first applied to genetic data by Cavalli-Sforza and colleagues. We place the method on a solid statistical footing, using results from modern statistics to develop formal significance tests. We also uncover a general “phase change” phenomenon about the ability to detect structure in genetic data, which emerges from the statistical theory we use, and has an important implication for the ability to discover structure in genetic data: for a fixed but large dataset size, divergence between two populations (as measured, for example, by a statistic like FST) below a threshold is essentially undetectable, but a little above threshold, detection will be easy. This means that we can predict the dataset size needed to detect structure.

20.