Similar Literature
20 similar documents found (search time: 93 ms)
1.
In this paper, we provide an overview of recently developed methods for the analysis of multivariate data that do not necessarily emanate from a normal universe. Multivariate data occur naturally in the life sciences and in other research fields. When drawing inference, it is generally recommended to take the multivariate nature of the data into account, and not merely analyze each variable separately. Furthermore, it is often of major interest to select an appropriate set of important variables. We present contributions in three different, but closely related, research areas: first, a general approach to the comparison of mean vectors, which allows for profile analysis and tests of dimensionality; second, non-parametric and parametric methods for the comparison of independent samples of multivariate observations; and third, methods for the situation where the experimental units are observed repeatedly, for example, over time, and the main focus is on analyzing different time profiles when the number p of repeated observations per subject is larger than the number n of subjects.

2.
Summary Permutation tests based on distances among multivariate observations have found many applications in the biological sciences. Two major testing frameworks of this kind are multiresponse permutation procedures and pseudo-F tests arising from a distance-based extension of multivariate analysis of variance. In this article, we derive conditions under which these two frameworks are equivalent. The methods and equivalence results are illustrated by reanalyzing an ecological data set and by a novel application to functional magnetic resonance imaging data.
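As a rough illustration of the pseudo-F framework referred to above, the statistic can be computed from pairwise distances alone and calibrated by permuting group labels. This is a minimal sketch, not the authors' code; the function names and the choice of squared Euclidean distances are illustrative assumptions (any dissimilarity matrix could be substituted):

```python
import numpy as np

def pseudo_f(D2, labels):
    """Distance-based pseudo-F: between- vs within-group sums of
    squared inter-point distances, computed without raw coordinates."""
    n = len(labels)
    sst = D2[np.triu_indices(n, 1)].sum() / n
    ssw = 0.0
    groups = np.unique(labels)
    for g in groups:
        idx = np.where(labels == g)[0]
        sub = D2[np.ix_(idx, idx)]
        ssw += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
    a = len(groups)
    return ((sst - ssw) / (a - 1)) / (ssw / (n - a))

def permutation_test(X, labels, n_perm=999, seed=0):
    """Permutation p-value for the pseudo-F on squared Euclidean distances."""
    rng = np.random.default_rng(seed)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    f_obs = pseudo_f(D2, labels)
    count = sum(pseudo_f(D2, rng.permutation(labels)) >= f_obs
                for _ in range(n_perm))
    return f_obs, (count + 1) / (n_perm + 1)
```

With well-separated groups the observed pseudo-F dwarfs the permutation distribution and the p-value approaches its minimum attainable value of 1/(n_perm + 1).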

3.
Aim Using a long-term data set we investigated the response of semi-desert grasslands to altered disturbance regimes in conjunction with climate patterns. Specifically, we were interested in the response of a non-native grass (Eragrostis lehmanniana), mesquite (Prosopis velutina), and native species to the reintroduction of fire and removal of livestock. Location The study site is located on the 45,360-ha Buenos Aires National Wildlife Refuge (31°32′ N, 110°30′ W) in southern Arizona, USA. In 1985, livestock were removed and prescribed fires were reintroduced to this semi-desert grassland dominated by non-native grasses and encroaching mesquite trees. Methods Plant species cover was monitored along 38, 30-m transects five times over a period of 15 years. Data were analysed using principal components analysis on the variance–covariance and correlation matrix, multivariate analysis of variance for changes over time in relation to environmental data, and analysis of variance for altered disturbance regimes. Results Reintroduction of fire and removal of livestock have not led to an increase in native species diversity or a decrease in non-native grasses or mesquite. The cover of non-native grass was influenced by soil type in 1993. Main conclusions Variability of plant community richness, diversity, and cover over time appear to be most closely linked to fluctuations in precipitation rather than human-altered disturbance regimes. The effects of altered grazing and fire regimes are likely confounded by complex interactions with climatic factors in systems significantly altered from their original physiognomy.

4.
Introduction – Rhodiola rosea is a broadly used medicinal plant with largely unexplored natural variability in secondary metabolite levels. Objective – The aim of this work was to develop a non-target procedure for 1H NMR spectroscopic fingerprinting of rhizome extracts for pattern recognition analysis and identification of secondary metabolites responsible for differences in sample composition. To achieve this, plants from three different geographic areas (Swiss Alps, Finland, and Altai region in Siberia) were investigated. Results – A sample preparation procedure was developed in order to remove polymeric polyphenols as the 1H NMR analysis of low-molecular-weight metabolites was hampered by the presence of tannins. Principal component analysis disclosed tight clustering of samples according to population. PCA models based on the aromatic region of the spectra showed that the first two components reflected changes in the content of salidroside and rosavin, respectively, the rosavin content being negatively correlated to that of rhodiocyanoside A and minor aromatics. Score plots and non-parametric variance tests demonstrated population-dependent changes according to harvest time. Data consistency was assessed using score plots and box-and-whisker graphs. In addition, a procedure for presenting loadings of PCA models based on bucketed data as high-resolution plots, which are reminiscent of real 1H NMR spectra and help to identify latent biomarkers, is presented. Conclusion – This study demonstrated the usefulness of the established procedure for multivariate non-target 1H NMR metabolic profiling of Rhodiola rosea. Copyright © 2010 John Wiley & Sons, Ltd.

5.
Gianola D, Fernando RL, Stella A. Genetics, 2006, 173(3): 1761–1776
Semiparametric procedures for prediction of total genetic value for quantitative traits, which make use of phenotypic and genomic data simultaneously, are presented. The methods focus on the treatment of massive information provided by, e.g., single-nucleotide polymorphisms. It is argued that standard parametric methods for quantitative genetic analysis cannot handle the multiplicity of potential interactions arising in models with, e.g., hundreds of thousands of markers, and that most of the assumptions required for an orthogonal decomposition of variance are violated in artificial and natural populations. This makes nonparametric procedures attractive. Kernel regression and reproducing kernel Hilbert spaces regression procedures are embedded into standard mixed-effects linear models, retaining additive genetic effects under multivariate normality for operational reasons. Inferential procedures are presented, and some extensions are suggested. An example is presented, illustrating the potential of the methodology. Implementations can be carried out after modification of standard software developed by animal breeders for likelihood-based or Bayesian analysis.
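Kernel-based genomic prediction in the spirit described above can be sketched with scikit-learn's KernelRidge rather than the animal-breeding software the authors mention; the genotype coding, effect sizes, and parameter values below are illustrative assumptions:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
n, p = 300, 50
M = rng.integers(0, 3, (n, p)).astype(float)   # SNP genotypes coded 0/1/2
beta = rng.normal(0.0, 0.3, p)                 # simulated marker effects
y = M @ beta + rng.normal(0.0, 0.5, n)         # phenotype = signal + noise

# A linear kernel gives a ridge/GBLUP-like predictor; swapping in
# kernel="rbf" would let the model absorb marker-by-marker interactions
model = KernelRidge(kernel="linear", alpha=1.0)
model.fit(M[:200], y[:200])
r = np.corrcoef(model.predict(M[200:]), y[200:])[0, 1]
```

Predictive accuracy is assessed here as the correlation between predicted and observed phenotypes in the held-out individuals, a common summary in genomic prediction studies.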

6.
Ranking multivariate ordinal data and applying a non-parametric test is an analytical approach commonly employed to compare treatments. We study three types of ranking and demonstrate how to combine them. The ranking methods rest upon partial orders of the multidimensional measurements or upon the sum of ranks. Since their usage is simple as regards statistical assumptions and technical realization, they are also adapted for health professionals without deep statistical knowledge. Our goal is discussing differences between the approaches and disclosing possible statistical consequences of their usage (© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

7.
Repeatability (more precisely the common measure of repeatability, the intra‐class correlation coefficient, ICC) is an important index for quantifying the accuracy of measurements and the constancy of phenotypes. It is the proportion of phenotypic variation that can be attributed to between‐subject (or between‐group) variation. As a consequence, the non‐repeatable fraction of phenotypic variation is the sum of measurement error and phenotypic flexibility. There are several ways to estimate repeatability for Gaussian data, but there are no formal agreements on how repeatability should be calculated for non‐Gaussian data (e.g. binary, proportion and count data). In addition to point estimates, appropriate uncertainty estimates (standard errors and confidence intervals) and statistical significance for repeatability estimates are required regardless of the types of data. We review the methods for calculating repeatability and the associated statistics for Gaussian and non‐Gaussian data. For Gaussian data, we present three common approaches for estimating repeatability: correlation‐based, analysis of variance (ANOVA)‐based and linear mixed‐effects model (LMM)‐based methods, while for non‐Gaussian data, we focus on generalised linear mixed‐effects models (GLMM) that allow the estimation of repeatability on the original and on the underlying latent scale. We also address a number of methods for calculating standard errors, confidence intervals and statistical significance; the most accurate and recommended methods are parametric bootstrapping, randomisation tests and Bayesian approaches. We advocate the use of LMM‐ and GLMM‐based approaches mainly because of the ease with which confounding variables can be controlled for. Furthermore, we compare two types of repeatability (ordinary repeatability and extrapolated repeatability) in relation to narrow‐sense heritability. 
This review serves as a collection of guidelines and recommendations for biologists to calculate repeatability and heritability from both Gaussian and non-Gaussian data.
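The ANOVA-based repeatability estimate described above can be sketched for a balanced one-way design, where the ICC is the between-subject variance component divided by the total variance; the function name and simulation settings are illustrative assumptions:

```python
import numpy as np

def icc_anova(groups):
    """ANOVA-based repeatability (ICC) for a balanced design:
    `groups` is a list of arrays, one per subject, k measures each."""
    k = len(groups[0])          # measures per subject
    a = len(groups)             # number of subjects
    grand = np.mean(np.concatenate(groups))
    means = np.array([g.mean() for g in groups])
    msb = k * ((means - grand) ** 2).sum() / (a - 1)            # between MS
    msw = sum(((g - g.mean()) ** 2).sum() for g in groups) / (a * (k - 1))
    s2_between = (msb - msw) / k                                 # variance component
    return s2_between / (s2_between + msw)
```

When between-subject variation dominates measurement error, the estimate approaches 1; in practice the review's recommended LMM/GLMM route additionally allows confounders to be controlled for.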

8.
Anderson MJ. Biometrics, 2006, 62(1): 245–253
Summary The traditional likelihood-based test for differences in multivariate dispersions is known to be sensitive to nonnormality. It is also impossible to use when the number of variables exceeds the number of observations. Many biological and ecological data sets have many variables, are highly skewed, and are zero-inflated. The traditional test and even some more robust alternatives are also unreasonable in many contexts where measures of dispersion based on a non-Euclidean dissimilarity would be more appropriate. Distance-based tests of homogeneity of multivariate dispersions, which can be based on any dissimilarity measure of choice, are proposed here. They rely on the rotational invariance of either the multivariate centroid or the spatial median to obtain measures of spread using principal coordinate axes. The tests are straightforward multivariate extensions of Levene's test, with P-values obtained either using the traditional F-distribution or using permutation of either least-squares or LAD residuals. Examples illustrate the utility of the approach, including the analysis of stabilizing selection in sparrows, biodiversity of New Zealand fish assemblages, and the response of Indonesian reef corals to an El Niño. Monte Carlo simulations from the real data sets show that the distance-based tests are robust and powerful for relevant alternative hypotheses of real differences in spread.
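A simplified analogue of the distance-based dispersion test can be sketched by measuring each observation's Euclidean distance to its own group centroid and then permuting those distances among groups, Levene-style. This is an illustration of the idea, not Anderson's implementation (which also covers non-Euclidean dissimilarities via principal coordinates and spatial medians):

```python
import numpy as np

def dispersion_test(X, labels, n_perm=999, seed=0):
    """Permutation test comparing group spreads, measured as
    Euclidean distances to each group's centroid."""
    rng = np.random.default_rng(seed)
    groups = np.unique(labels)
    # distance of each point to its own group centroid (computed once)
    d = np.empty(len(labels))
    for g in groups:
        idx = labels == g
        d[idx] = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)

    def f_stat(lab):
        # one-way ANOVA F on the distances (Levene-style)
        grand = d.mean()
        parts = [d[lab == g] for g in groups]
        ssb = sum(len(p) * (p.mean() - grand) ** 2 for p in parts)
        ssw = sum(((p - p.mean()) ** 2).sum() for p in parts)
        return (ssb / (len(parts) - 1)) / (ssw / (len(d) - len(parts)))

    f_obs = f_stat(labels)
    count = sum(f_stat(rng.permutation(labels)) >= f_obs for _ in range(n_perm))
    return f_obs, (count + 1) / (n_perm + 1)
```

Permuting the fixed distances (rather than recomputing centroids after each shuffle) mirrors the residual-permutation logic of the original test and keeps location differences from contaminating the dispersion comparison.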

9.
Summary In a typical randomized clinical trial, a continuous variable of interest (e.g., bone density) is measured at baseline and fixed postbaseline time points. The resulting longitudinal data, often incomplete due to dropouts and other reasons, are commonly analyzed using parametric likelihood-based methods that assume multivariate normality of the response vector. If the normality assumption is deemed untenable, then semiparametric methods such as (weighted) generalized estimating equations are considered. We propose an alternate approach in which the missing data problem is tackled using multiple imputation, and each imputed dataset is analyzed using robust regression (M-estimation; Huber, 1973, Annals of Statistics 1, 799–821) to protect against potential non-normality/outliers in the original or imputed dataset. The robust analysis results from each imputed dataset are combined for overall estimation and inference using either the simple Rubin (1987, Multiple Imputation for Nonresponse in Surveys, New York: Wiley) method, or the more complex but potentially more accurate Robins and Wang (2000, Biometrika 87, 113–124) method. We use simulations to show that our proposed approach performs at least as well as the standard methods under normality, but is notably better under both elliptically symmetric and asymmetric non-normal distributions. A clinical trial example is used for illustration.

10.
The presence of missing values in gel-based proteomics data represents a real challenge if an objective statistical analysis is pursued. Different methods to handle missing values were evaluated, and their influence on the selection of important proteins through multivariate techniques is discussed. The evaluated methods consisted of dealing with missing values directly during the multivariate analysis with the nonlinear estimation by iterative partial least squares (NIPALS) algorithm, or imputing them by using either k-nearest neighbor or Bayesian principal component analysis (BPCA) before carrying out the multivariate analysis. These techniques were applied to data obtained from gels stained with classical postrunning dyes and from DIGE gels. Before applying the multivariate techniques, the normality and homoscedasticity assumptions on which parametric tests are based were checked in order to perform a sound statistical analysis. Of the three methods tested for handling missing values in our datasets, BPCA imputation proved to be the most consistent.
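The k-nearest-neighbor imputation step evaluated in the study can be sketched with scikit-learn; the toy spot-intensity matrix and the neighbor count are assumptions for illustration (the paper ultimately favoured BPCA):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy spot-intensity matrix (rows = gels, columns = protein spots),
# with two missing values marked as NaN
X = np.array([[1.0, 2.0,    3.0],
              [1.1, np.nan, 3.1],
              [0.9, 2.1,    np.nan],
              [1.0, 1.9,    2.9]])

# Fill each gap with the average of the 2 most similar gels,
# where similarity is measured on the jointly observed spots
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
```

Because the rows are nearly parallel profiles here, the imputed entries land close to the column values of the neighboring gels; real gel data would of course be noisier, which is why the authors compared several strategies.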

11.
Behavioural studies are commonly plagued with data that violate the assumptions of parametric statistics. Consequently, classic nonparametric methods (e.g. rank tests) and novel distribution-free methods (e.g. randomization tests) have been used to a great extent by behaviourists. However, the robustness of such methods in terms of statistical power and type I error have seldom been evaluated. This probably reflects the fact that empirical methods, such as Monte Carlo approaches, are required to assess these concerns. In this study we show that analytical methods cannot always be used to evaluate the robustness of statistical tests, but rather Monte Carlo approaches must be employed. We detail empirical protocols for estimating power and type I error rates for parametric, nonparametric and randomization methods, and demonstrate their application for an analysis of variance and a regression/correlation analysis design. Together, this study provides a framework from which behaviourists can compare the reliability of different methods for data analysis, serving as a basis for selecting the most appropriate statistical test given the characteristics of data at hand. Copyright 2001 The Association for the Study of Animal Behaviour.
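A minimal version of the Monte Carlo protocol for estimating type I error and power might look like the following; the exponential data-generating model, sample sizes, and function name are illustrative assumptions, not the paper's own design:

```python
import numpy as np
from scipy import stats

def mc_error_and_power(n=20, shift=1.0, n_sim=2000, alpha=0.05, seed=0):
    """Monte Carlo estimates of type I error (no shift) and power
    (given shift) for Student's t test applied to skewed data."""
    rng = np.random.default_rng(seed)
    rej_null = rej_alt = 0
    for _ in range(n_sim):
        x = rng.exponential(1.0, n)
        y0 = rng.exponential(1.0, n)            # H0 true: same distribution
        y1 = rng.exponential(1.0, n) + shift    # H0 false: shifted mean
        rej_null += stats.ttest_ind(x, y0).pvalue < alpha
        rej_alt += stats.ttest_ind(x, y1).pvalue < alpha
    return rej_null / n_sim, rej_alt / n_sim
```

Running the same loop with a rank test or a randomization test in place of `ttest_ind` yields directly comparable error and power estimates, which is exactly the kind of comparison the framework supports.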

12.
Understanding how stressors combine to affect population abundances and trajectories is a fundamental ecological problem with increasingly important implications worldwide. Generalisations about interactions among stressors are challenging due to different categorisation methods and how stressors vary across species and systems. Here, we propose using a newly introduced framework to analyse data from the last 25 years on ecological stressor interactions, for example combined effects of temperature, salinity and nutrients on population survival and growth. We contrast our results with the most commonly used existing method – analysis of variance (ANOVA) – and show that ANOVA assumptions are often violated and have inherent limitations for detecting interactions. Moreover, we argue that rescaling – examining relative rather than absolute responses – is critical for ensuring that any interaction measure is independent of the strength of single-stressor effects. In contrast, non-rescaled measures – like ANOVA – find fewer interactions when single-stressor effects are weak. After re-examining 840 two-stressor combinations, we conclude that antagonism and additivity are the most frequent interaction types, in strong contrast to previous reports that synergy dominates yet supportive of more recent studies that find more antagonism. Consequently, measuring and re-assessing the frequency of stressor interaction types is imperative for a better understanding of how stressors affect populations.

13.
Many biological data sets, from field observations and manipulative experiments, involve crossed factor designs, analysed in a univariate context by higher-way analyses of variance which partition out 'main' and 'interaction' effects. Indeed, tests for significance of interactions among factors, such as differing Before-After responses at Control and Impact sites, are the basis of the widely used BACI strategy for detecting impacts in the environment. There are difficulties, however, in generalising simple univariate definitions of interaction, from classic linear models, to the robust, non-parametric multivariate methods that are commonly required in handling assemblage data. The size of an interaction term, and even its existence at all, depends crucially on the measurement scale, so it is fundamentally a parametric construct. Despite this, certain forms of interaction can be examined using non-parametric methods, namely those evidenced by changing assemblage patterns over many time periods, for replicate sites from different experimental conditions (types of 'Beyond BACI' design) - or changing multivariate structure over space, at many observed times. Second-stage MDS, which can be thought of as an MDS plot of the pairwise similarities between MDS plots (e.g. of assemblage time trajectories), can be used to illustrate such interactions, and they can be formally tested by second-stage ANOSIM permutation tests. Similarities between (first-stage) multivariate patterns are assessed by rank-based matrix correlations, preserving the fully non-parametric approach common in marine community studies. The method is exemplified using time-series data on corals from Thailand, macrobenthos from Tees Bay, UK, and macroalgae from a complex recolonisation experiment carried out in the Ligurian Sea, Italy. The latter data set is also used to demonstrate how the analysis copes straightforwardly with certain repeated-measures designs.

14.
The additive main effects multiplicative interaction model is frequently used in the analysis of multilocation trials. In the analysis of such data it is of interest to decide how many of the multiplicative interaction terms are significant. Several tests for this task are available (Gollob, FGH1, FGH2, FR), all of which assume that errors are normally distributed with a common variance. This paper investigates the robustness of these tests to departures from these assumptions. It is concluded that, because of its better robustness, the FR test is preferable. If the other tests are to be used, preliminary tests for the validity of assumptions should be performed.

15.
A multiple parametric test procedure is proposed, which considers tests of means of several variables. The single variables or subsets of variables are ordered according to a data-dependent criterion and tested in this succession without alpha-adjustment until the first non-significant test. The test procedure needs the assumption of a multivariate normal distribution and utilizes the theory of spherical distributions. The basic version is particularly suited for variables with approximately equal variances. As a typical example, the procedure is applied to gene expression data from a commercial array.

16.
Monte-Carlo simulation methods are commonly used for assessing the performance of statistical tests under finite sample scenarios. They help us ascertain the nominal level for tests with approximate level, e.g. asymptotic tests. Additionally, a simulation can assess the quality of a test under the alternative. The latter can be used to compare new tests with established tests under certain assumptions in order to determine a preferable test given the characteristics of the data. The key problem for such investigations is the choice of a goodness criterion. We extend the expected p-value, as considered by Sackrowitz and Samuel-Cahn (1999), to the context of univariate equivalence tests. This presents an effective tool for evaluating new proposals for equivalence testing because it is independent of the distribution of the test statistic under the null hypothesis. It helps to avoid the often tedious search for the null distribution of test statistics that offer no considerable advantage over already available methods. To demonstrate its usefulness in biometry, a comparison of established equivalence tests with a nonparametric approach is conducted in a simulation study under three distributional assumptions.
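The expected p-value criterion itself is straightforward to estimate by simulation under a chosen alternative; the sketch below applies it to two ordinary two-sample tests rather than equivalence tests (a deliberate simplification of the paper's setting, with all names and parameter values assumed for illustration):

```python
import numpy as np
from scipy import stats

def expected_p_value(test, sampler, n_sim=2000, seed=0):
    """Monte Carlo estimate of the expected p-value of `test`
    under the alternative generated by `sampler`."""
    rng = np.random.default_rng(seed)
    return float(np.mean([test(*sampler(rng)) for _ in range(n_sim)]))

# Normal shift alternative: two samples of 25, means 0 and 0.8
sampler = lambda rng: (rng.normal(0.0, 1.0, 25), rng.normal(0.8, 1.0, 25))
epv_t = expected_p_value(lambda x, y: stats.ttest_ind(x, y).pvalue, sampler)
epv_w = expected_p_value(lambda x, y: stats.mannwhitneyu(x, y).pvalue, sampler)
```

A smaller expected p-value indicates the better test under that alternative, and, as the abstract notes, the criterion needs no knowledge of the null distribution of either statistic.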

17.
Estimating and predicting temporal trends in species richness is of general importance, but notably difficult because detection probabilities of species are imperfect and many datasets were collected in an opportunistic manner. We need to improve our capabilities to assess richness trends using datasets collected in unstandardized procedures with potential collection bias. Two methods are proposed and applied to estimate richness change, which both incorporate models for sampling effects and detection probability: (a) nonlinear species accumulation curves with an error variance model and (b) Pradel capture–recapture models. The methods are used to assess nationwide temporal trends (1945–2018) in the species richness of wild bees in the Netherlands. Previously, a decelerating decline in wild bee species richness was inferred for part of this dataset. Among the species accumulation curves, those with nonconstant changes in species richness are preferred. However, when analyzing data subsets, constant changes became selected for non-Bombus bees (for samples in collections) and bumblebees (for spatial grid cells sampled in three periods). Smaller richness declines are predicted for non-Bombus bees than bumblebees. However, when relative losses are calculated from confidence interval limits, they overlap and touch zero loss. Capture–recapture analysis applied to species encounter histories infers a constant colonization rate per year and constant local species survival for bumblebees and other bees. This approach predicts a 6% reduction in non-Bombus species richness from 1945 to 2018 and a significant 19% reduction for bumblebees. Statistical modeling to detect species richness time trends should be systematically complemented with model checking and simulations to interpret the results. Data inspection, assessing model selection bias, and comparisons of trends in data subsets were essential model checking strategies in this analysis.
Opportunistic data will not satisfy the assumptions of most models and this should be kept in mind throughout.

18.
Abstract One of the assumptions of analysis of variance (ANOVA) is that the variances of the groups being compared are approximately equal. This assumption is routinely checked before doing an analysis, although some workers consider ANOVA robust and do not bother, and others avoid parametric procedures entirely. Two of the more commonly used heterogeneity tests are Bartlett's and Cochran's, although, as for most of these tests, they may well be more sensitive to violations of the ANOVA assumptions than is ANOVA itself. Simulations were used to examine how well these two tests protected ANOVA against the problems created by variance heterogeneity. Although Cochran's test performed a little better than Bartlett's, both tests performed poorly, frequently disallowing perfectly valid analyses. Recommendations are made about how to proceed, given these results.
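Bartlett's sensitivity to non-normality can be checked with a small simulation of exactly this kind: with truly equal variances but heavy-tailed data, its rejection rate climbs well above the nominal level. The sampler choices and function name below are illustrative assumptions:

```python
import numpy as np
from scipy import stats

def bartlett_rejection_rate(sampler, n_groups=3, n=20, n_sim=2000,
                            alpha=0.05, seed=0):
    """Rejection rate of Bartlett's test when all group variances
    are equal; for heavy-tailed data it can far exceed alpha."""
    rng = np.random.default_rng(seed)
    rej = 0
    for _ in range(n_sim):
        groups = [sampler(rng, n) for _ in range(n_groups)]
        rej += stats.bartlett(*groups).pvalue < alpha
    return rej / n_sim

rate_normal = bartlett_rejection_rate(lambda r, n: r.normal(0, 1, n))
rate_t3 = bartlett_rejection_rate(lambda r, n: r.standard_t(3, n))
```

Under normality the rate sits near the nominal 5%, but with t(3) data Bartlett rejects perfectly homogeneous groups far too often, illustrating why such preliminary tests can disallow valid analyses.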

19.
Abstract. The use of Generalized Linear Models (GLM) in vegetation analysis has been advocated to accommodate complex species response curves. This paper investigates the potential advantages of using classification and regression trees (CART), a recursive partitioning method that is free of distributional assumptions. We used multiple logistic regression (a form of GLM) and CART to predict the distribution of three major oak species in California. We compared two types of model: polynomial logistic regression models optimized to account for non-linearity and factor interactions, and simple CART-models. Each type of model was developed using learning data sets of 2085 and 410 sample cases, and assessed on test sets containing 2016 and 3691 cases respectively. The responses of the three species to environmental gradients were varied and often non-homogeneous or context dependent. We tested the methods for predictive accuracy: CART-models performed significantly better than our polynomial logistic regression models in four of the six cases considered, and as well in the two remaining cases. CART also showed a superior ability to detect factor interactions. Insight gained from CART-models then helped develop improved parametric models. Although the probabilistic form of logistic regression results is more adapted to test theories about species responses to environmental gradients, we found that CART-models are intuitive, easy to develop and interpret, and constitute a valuable tool for modeling species distributions.
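The contrast between a linear classifier and a tree can be sketched on synthetic data with a non-monotone threshold response to a "gradient", the kind of non-homogeneous response the abstract describes; the data-generating model is an illustrative assumption, not the oak data set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (1000, 2))
# "Presence" occurs at both extremes of gradient x0 (a bimodal,
# non-monotone response) - no single linear boundary can represent it
y = (np.abs(X[:, 0]) > 0.5).astype(int)

X_train, y_train, X_test, y_test = X[:700], y[:700], X[700:], y[700:]
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
logit = LogisticRegression().fit(X_train, y_train)
acc_tree = tree.score(X_test, y_test)
acc_logit = logit.score(X_test, y_test)
```

The tree recovers the two thresholds with a couple of splits, while the plain logistic model cannot do better than chance here; in practice (as the paper notes) polynomial terms and interactions can be added to the GLM, often guided by the tree's structure.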

20.
When distributional assumptions for analysis of variance are suspect, and nonparametric methods are unavailable, ecologists frequently employ rank transformation (RT) methods. The technique replaces observations by their ranks, which are then analysed using standard parametric tests. RT methods are widely recommended in statistics texts and in manuals for packages like SAS and IMSL. They are robust and powerful for the analysis of additive factorial designs. Recently, however, RT methods have been found to be grossly inappropriate for use with non-additive models. This severe limitation remains largely unreported outside of the theoretical statistics literature. Our goal is to explain this shortcoming of RT methods.
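The RT method itself is simple to sketch for a one-way layout: pool the observations, replace them by their ranks, and run the usual parametric F test on the ranks (the function name is an assumption; note the abstract's caveat that this logic does not carry over to non-additive factorial models):

```python
import numpy as np
from scipy import stats

def rank_transform_anova(*groups):
    """One-way ANOVA on rank-transformed data: pool the observations,
    replace them by ranks, then run the standard parametric F test."""
    pooled = np.concatenate(groups)
    ranks = stats.rankdata(pooled)                 # midranks for ties
    cuts = np.cumsum([len(g) for g in groups])[:-1]
    return stats.f_oneway(*np.split(ranks, cuts))

res = rank_transform_anova(np.array([1.0, 2.0, 3.0, 2.5]),
                           np.array([10.0, 12.0, 11.0, 13.0]))
```

For this single-factor case the rank-transformed F test behaves much like the Kruskal–Wallis test; the trouble the abstract highlights arises when the same trick is applied to interaction terms in multi-factor designs.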


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号