Similar literature
 Found 20 similar documents (search time: 46 ms)
1.
Taxon sampling, correlated evolution, and independent contrasts   Total citations: 14, self-citations: 0, citations by others: 14
Independent contrasts are widely used to incorporate phylogenetic information into studies of continuous traits, particularly analyses of evolutionary trait correlations, but the effects of taxon sampling on these analyses have received little attention. In this paper, simulations were used to investigate the effects of taxon sampling patterns and alternative branch length assignments on the statistical performance of correlation coefficients and sign tests; "full-tree" analyses based on contrasts at all nodes and "paired-comparisons" based only on contrasts of terminal taxon pairs were also compared. The simulations showed that random samples, with respect to the traits under consideration, provide statistically robust estimates of trait correlations. However, exact significance tests are highly dependent on appropriate branch length information; equal branch lengths maintain lower Type I error than alternative topological approaches, and adjusted critical values of the independent contrast correlation coefficient are provided for use with equal branch lengths. Nonrandom samples, with respect to univariate or bivariate trait distributions, introduce discrepancies between interspecific and phylogenetically structured analyses and bias estimates of underlying evolutionary correlations. Examples of nonrandom sampling processes may include community assembly processes, convergent evolution under local adaptive pressures, selection of a nonrandom sample of species from a habitat or life-history group, or investigator bias. Correlation analyses based on species pairs comparisons, while ignoring deeper relationships, entail significant loss of statistical power and as a result provide a conservative test of trait associations. 
Paired comparisons in which species differ by a large amount in one trait, a method introduced in comparative plant ecology, have appropriate Type I error rates and high statistical power, but do not correctly estimate the magnitude of trait correlations. Sign tests, based on full-tree or paired-comparison approaches, are highly reliable across a wide range of sampling scenarios, in terms of Type I error rates, but have very low power. These results provide guidance for selecting species and applying comparative methods to optimize the performance of statistical tests of trait associations.
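As a rough illustration of the sign test mentioned in this abstract, the sketch below counts contrasts in which two traits change in the same direction and computes an exact two-sided binomial p-value under the null hypothesis of no association. The function names and the toy contrast values are hypothetical, and the contrasts themselves are assumed to have been computed elsewhere; this is not the authors' code.

```python
from math import comb


def count_concordant(contrasts_x, contrasts_y):
    """Return (number of same-direction contrasts, number of usable contrasts).

    Contrasts where either trait shows no change (product == 0) are dropped,
    which is a common convention for sign tests (an assumption here).
    """
    products = [x * y for x, y in zip(contrasts_x, contrasts_y) if x * y != 0]
    return sum(p > 0 for p in products), len(products)


def sign_test_p(n_positive, n_total):
    """Exact two-sided binomial p-value under H0: P(same direction) = 0.5."""
    k = max(n_positive, n_total - n_positive)
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * tail)


# Hypothetical contrasts for two traits at four nodes.
same, usable = count_concordant([1.0, -2.0, 3.0, 0.0], [2.0, -1.0, -1.0, 5.0])
p_value = sign_test_p(same, usable)
```

Because the test uses only the direction of each contrast, it discards magnitude information, which is consistent with the low power the abstract reports.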

2.
Zucker DM, Denne J. Biometrics, 2002, 58(3): 548-559
Clinical trialists recently have shown interest in two-stage procedures for updating the sample-size calculation at an interim point in a trial. Because many clinical trials involve repeated measures designs, it is desirable to have available practical two-stage procedures for such designs. Shih and Gould (1995, Statistics in Medicine 14, 2239-2248) discuss sample-size redetermination for repeated measures studies but under a highly simplified setup. We develop two-stage procedures under the general mixed linear model, allowing for dropouts and missed visits. We present a range of procedures and compare their Type I error and power by simulation. We find that, in general, the achieved power is brought considerably closer to the required level without inflating the Type I error rate. We also derive an inflation factor that ensures the power requirement is more closely met.

3.
We examined Type I error rates of Felsenstein's (1985; Am. Nat. 125:1-15) comparative method of phylogenetically independent contrasts when branch lengths are in error and the model of evolution is not Brownian motion. We used seven evolutionary models, six of which depart strongly from Brownian motion, to simulate the evolution of two continuously valued characters along two different phylogenies (15 and 49 species). First, we examined the performance of independent contrasts when branch lengths are distorted systematically, for example, by taking the square root of each branch segment. These distortions often caused inflated Type I error rates, but performance was almost always restored when branch length transformations were used. Next, we investigated effects of random errors in branch lengths. After the data were simulated, we added errors to the branch lengths and then used the altered phylogenies to estimate character correlations. Errors in the branches could be of two types: fixed, where branch lengths are either shortened or lengthened by a fixed fraction; or variable, where the error is a normal variate with mean zero and the variance is scaled to the length of the branch (so that expected error relative to branch length is constant for the whole tree). Thus, the error added is unrelated to the microevolutionary model. Without branch length checks and transformations, independent contrasts tended to yield extremely inflated and highly variable Type I error rates. Type I error rates were reduced, however, when branch lengths were checked and transformed as proposed by Garland et al. (1992; Syst. Biol. 41:18-32), and almost never exceeded twice the nominal P-value at alpha = 0.05. Our results also indicate that, if branch length transformations are applied, then the appropriate degrees of freedom for testing the significance of a correlation coefficient should, in general, be reduced to account for estimation of the best branch length transformation. 
These results extend those reported in Díaz-Uriarte and Garland (1996; Syst. Biol. 45:27-47), and show that, even with errors in branch lengths and evolutionary models different from Brownian motion, independent contrasts are a robust method for testing hypotheses of correlated evolution.

4.
Genotypes produced from samples collected non-invasively in harsh field conditions often lack the full complement of data from the selected microsatellite loci. The application to genetic mark-recapture methodology in wildlife species can therefore be prone to misidentifications leading to both ‘true non-recaptures’ being falsely accepted as recaptures (Type I errors) and ‘true recaptures’ being undetected (Type II errors). Here we present a new likelihood method that allows every pairwise genotype comparison to be evaluated independently. We apply this method to determine the total number of recaptures by estimating and optimising the balance between Type I errors and Type II errors. We show through simulation that the standard error of recapture estimates can be minimised through our algorithms. Interestingly, the precision of our recapture estimates actually improved when we included individuals with missing genotypes, as this increased the number of pairwise comparisons potentially uncovering more recaptures. Simulations suggest that the method is tolerant to per-locus error rates of up to 5% and can theoretically work in datasets with as few as 60% of loci genotyped. Our methods can be implemented in datasets where standard mismatch analyses fail to distinguish recaptures. Finally, we show that by assigning a low Type I error rate to our matching algorithms we can generate a dataset of individuals of known capture histories that is suitable for downstream analysis with traditional mark-recapture methods.

5.
Confirmatory path analysis is a statistical technique to build models of causal hypotheses among variables and test if the data conform with the causal model. However, classical path analysis techniques ignore the nonindependence of observations due to phylogenetic relatedness among species, possibly leading to spurious results. Here, we present a simple method to perform phylogenetic confirmatory path analysis (PPA). We analyzed simulated datasets with varying amounts of phylogenetic signal in the data and a known underlying causal structure linking the traits to estimate Type I error and power. Results show that Type I error for PPA appeared to be slightly anticonservative (range: 0.047–0.072) but path analysis models ignoring phylogenetic signal resulted in much higher Type I error rates, which were positively related to the amount of phylogenetic signal (range: 0.051 for λ = 0 to 0.916 for λ = 1). Further, the power of the test was not compromised when accounting for phylogeny. As an example of the application of PPA, we revisit a study on the correlates of aggressive broodmate competition across seven avian families. The use of PPA allowed us to gain greater insight into the plausible causal paths linking species traits to aggressive broodmate competition.

6.
Managers and policy makers depend on empirical research to guide and support biosecurity measures that mitigate introduced species’ impacts. Research contributing to this knowledge base generally uses null hypothesis significance testing to determine the significance of data patterns. However, reliance on traditional statistical significance testing methods, combined with small effect and sample size and large variability inherent to many impact studies, may obscure effects on native species, communities or ecosystems. This may result in false certainty of no impact. We investigated potential Type II error rates and effect sizes for 31 non-significant empirical evaluations of impact for introduced algal and crustacean species. We found low power consistently led to acceptance of Type II errors at rates 5.6–19 times greater than Type I errors (despite moderate to large effect sizes). Our results suggest that introduced species for which impact studies have statistically non-significant outcomes (often interpreted as “no impact”) may potentially have large impacts that are missed due to small sample or effect sizes and/or high variation. This alarming willingness to “miss” impacts has severe implications for conservation efforts, including under-managing species’ impacts and discounting the costs of Type II errors.

7.
Explaining the uneven distribution of species among lineages is one of the oldest questions in evolution. Proposed correlations between biological traits and species diversity are routinely tested by making comparisons between phylogenetic sister clades. Several recent studies have used nested sister-clade comparisons to test hypotheses linking continuously varying traits, such as body size, with diversity. Evaluating the findings of these studies is complicated because they differ in the index of species richness difference used, the way in which trait differences were treated, and the statistical tests employed. In this paper, we use simulations to compare the performance of four species richness indices, two choices about the branch lengths used to estimate trait values for internal nodes and two statistical tests under a range of models of clade growth and character evolution. All four indices returned appropriate Type I error rates when the assumptions of the method were met and when branch lengths were set proportional to time. Only two of the indices were robust to the different evolutionary models and to different choices of branch lengths and statistical tests. These robust indices had comparable power under one nonnull scenario. Regression through the origin was consistently more powerful than the t-test, and the choice of branch lengths exerts a strong effect on both the validity and power. In the light of our simulations, we re-evaluate the findings of those who have previously used nested comparisons in the context of species richness. We provide a set of simple guidelines to maximize the performance of phylogenetically nested comparisons in tests of putative correlates of species richness.

8.
We consider the dependence of information transfer by neurons on the Type I vs. Type II classification of their dynamics. Our computational study is based on Type I and II implementations of the Morris-Lecar model. It mainly concerns neurons, such as those in the auditory or electrosensory system, which encode band-limited amplitude modulations of a periodic carrier signal, and which fire at random cycles yet preferred phases of this carrier. We first show that the Morris-Lecar model with additive broadband noise ("synaptic noise") can exhibit such firing patterns with either Type I or II dynamics, with or without amplitude modulations of the carrier. We then compare the encoding of band-limited random amplitude modulations for both dynamical types. The comparison relies on a parameter calibration that closely matches firing rates for both models across a range of parameters. In the absence of synaptic noise, Type I performs slightly better than Type II, and its performance is optimal for perithreshold signals. However, Type II performs well over a slightly larger range of inputs, and this range lies mostly in the subthreshold region. Further, Type II performs marginally better than Type I when synaptic noise, which yields more realistic baseline firing patterns, is present in both models. These results are discussed in terms of the tuning and phase locking properties of the models with deterministic and stochastic inputs.

9.
Studies of evolutionary correlations commonly use phylogenetic regression (i.e., independent contrasts and phylogenetic generalized least squares) to assess trait covariation in a phylogenetic context. However, while this approach is appropriate for evaluating trends in one or a few traits, it is incapable of assessing patterns in highly multivariate data, as the large number of variables relative to sample size prohibits parametric test statistics from being computed. This poses serious limitations for comparative biologists, who must either simplify how they quantify phenotypic traits, or alter the biological hypotheses they wish to examine. In this article, I propose a new statistical procedure for performing ANOVA and regression models in a phylogenetic context that can accommodate high‐dimensional datasets. The approach is derived from the statistical equivalency between parametric methods using covariance matrices and methods based on distance matrices. Using simulations under Brownian motion, I show that the method displays appropriate Type I error rates and statistical power, whereas standard parametric procedures have decreasing power as data dimensionality increases. As such, the new procedure provides a useful means of assessing trait covariation across a set of taxa related by a phylogeny, enabling macroevolutionary biologists to test hypotheses of adaptation, and phenotypic change in high‐dimensional datasets.

10.
The aim of the present study was to investigate the daily measured traits milk yield, water intake and dry matter intake with fixed and random regression models extended with different error covariance structures. We analysed whether these models fit better than conventional fixed and random regression models. Furthermore, possible autocorrelation between repeated measures was investigated. The effect of model choice on statistical inference was also tested. Data recording was performed on the Futterkamp dairy research farm of the Chamber of Agriculture of Schleswig-Holstein. A dataset of about 21 000 observations from 178 Holstein cows was used. Average milk yield, water intake and dry matter intake were 34.9, 82.4 and 19.8 kg, respectively. Statistical analysis was performed using different linear mixed models. Lactation number, test day and the parameters to model the function of lactation day were included as fixed effects. Different structures were tested for the residuals; they were compared for their ability to fit the model using the likelihood ratio test and the Akaike and Bayesian information criteria. Different autocorrelation patterns were found. Adjacent repeated measures of daily milk yield were the most highly correlated (p1 = 0.32) in contrast to measures further apart, while for water intake and dry matter intake, the measurements with a lag of two units had the highest correlations with p2 = 0.11 and 0.12. The covariance structure of TOEPLITZ was most suitable to indicate the dependencies of the repeated measures for all traits. Generally, the most complex model, random regression with the additional covariance structure TOEPLITZ(4), provided the lowest information criteria. Furthermore, the model choice influenced the significance values of one fixed effect and therefore the general inference of the data analysis. 
Thus, the random regression + TOEPLITZ(4) model is recommended for use for the analysis of equally spaced datasets of milk yield, water intake and dry matter intake.

11.
For traits showing correlated evolution, one trait may evolve more slowly than the other, producing evolutionary lag. The species-pairs evolutionary lag test (SPELT) uses an independent contrasts based approach to detect evolutionary lag on a phylogeny. We investigated the statistical performance of SPELT in relation to degree of lag, sample size (species pairs), and strength of association between traits. We simulated trait evolution under two models: one in which trait X changes during speciation and the lagging trait Y catches up as a function of time since speciation; and another in which trait X evolves in a random walk and the lagging trait Y is a function of X at a previous time period. Type I error rates under “no lag” were close to the expected level of 5%, indicating that the method is not prone to false-positives. Simulation results suggest that reasonable statistical power (80%) is reached with around 140 species pairs, although the degree of lag and trait associations had additional influences on power. We applied the method to two datasets and discuss how estimation of a branch length scaling parameter (κ) can be used with SPELT to detect lag.

12.
Methanogens are a phylogenetically diverse group belonging to Euryarchaeota. Previously, phylogenetic approaches using large datasets revealed that methanogens can be grouped into two classes, “Class I” and “Class II”. However, some deep relationships were not resolved. For instance, the monophyly of “Class I” methanogens, which consist of Methanopyrales, Methanobacteriales and Methanococcales, is disputable due to weak statistical support. In this study, we use MSOAR to identify common orthologous genes from eight methanogen species and a Thermococcale species (outgroup), and apply GRAPPA and FastME to compute distance-based gene order phylogeny. The gene order phylogeny supports two classes of methanogens, but it differs from the original classification of methanogens by placing Methanopyrales and Methanobacteriales together with Methanosarcinales in Class II rather than with Methanococcales. This study suggests a new classification scheme for methanogens. In addition, it indicates that gene order phylogeny can complement traditional sequence-based methods in addressing taxonomic questions for deep relationships.

13.
Species are not independent points for comparative analyses because closely related species share more evolutionary history and are therefore more similar to each other than distantly related species. The extent to which independent-contrast analysis reduces type I and type II statistical error in comparison with cross-species analysis depends on the relative branch lengths in the phylogenetic tree: as deeper branches get relatively long, cross-species analyses have more statistical type I and type II error. Phylogenetic trees reconstructed from extant species, under the assumptions of a branching process with speciation (branching) and extinction rates remaining constant through time, will have relatively longer deep branches as the extinction rate increases relative to the speciation rate. We compare the statistical performance of cross-species and independent-contrast analyses with varying relative extinction rates, and conclude that cross-species comparisons have unacceptable statistical performance, particularly when extinction rates are relatively high.

14.
Constraints arise naturally in many scientific experiments and studies, such as in epidemiology, biology, and toxicology, and researchers often ignore such information when analyzing their data and use standard methods such as the analysis of variance (ANOVA). Such methods may not only result in a loss of power and efficiency in costs of experimentation but also may result in poor interpretation of the data. In this paper we discuss constrained statistical inference in the context of linear mixed effects models that arise naturally in many applications, such as in repeated measurements designs, familial studies and others. We introduce a novel methodology that is broadly applicable for a variety of constraints on the parameters. Since in many applications sample sizes are small and/or the data are not necessarily normally distributed, and furthermore error variances need not be homoscedastic (i.e. there is heterogeneity in the data), we use an empirical best linear unbiased predictor (EBLUP) type residual based bootstrap methodology for deriving critical values of the proposed test. Our simulation studies suggest that the proposed procedure maintains the desired nominal Type I error while competing well with other tests in terms of power. We illustrate the proposed methodology by re-analyzing clinical trial data on blood mercury levels. The methodology introduced in this paper can be easily extended to other settings such as nonlinear and generalized regression models.

15.
16.
Survival rates are a central component of life‐history strategies of large vertebrate species. However, comparative studies seldom investigate interspecific variation in survival rates with respect to other life‐history traits, especially for males. The lack of such studies could be due to the challenges associated with obtaining reliable datasets, incorporating information on the 0–1 probability scale, or dealing with several types of measurement error in life‐history traits, which can be a computationally intensive process that is often absent in comparative studies. We present a quantitative approach using a Bayesian phylogenetically controlled regression with the flexibility to incorporate uncertainty in estimated survival rates and quantitative life‐history traits while considering genetic similarity among species and uncertainty in relatedness. As with any comparative analysis, our approach makes several assumptions regarding the generalizability and comparability of empirical data from separate studies. Our model is versatile in that it can be applied to any species group of interest and include any life‐history traits as covariates. We used an unbiased simulation framework to provide “proof of concept” for our model and applied a slightly richer model to a real data example for pinnipeds. Pinnipeds are an excellent taxonomic group for comparative analysis, but survival rate data are scarce. Our work elucidates the challenges associated with addressing important questions related to broader ecological life‐history patterns and how survival–reproduction trade‐offs might shape evolutionary histories of extant taxa. Specifically, we underscore the importance of having high‐quality estimates of age‐specific survival rates and information on other life‐history traits that reasonably characterize a species for accurately comparing across species.

17.
We present optimized group sequential designs where testing of a single parameter theta is of interest. We require specification of a loss function and of a prior distribution for theta. For the examples presented, we pre-specify Type I and II error rates and minimize the expected sample size over the prior distribution for theta. Minimizing the square of sample size rather than the sample size is found to produce designs with slightly less aggressive interim stopping rules and smaller maximum sample sizes with essentially identical expected sample size. We compare optimal designs using Hwang-Shih-DeCani and Kim-DeMets spending functions to fully optimized designs not restricted by a spending function family. In the examples selected, we also examine when there might be substantial benefit gained by adding an interim analysis. Finally, we provide specific optimal asymmetric spending function designs that should be generally useful and simply applied when a design with minimal expected sample size is desired.
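For readers unfamiliar with the two spending-function families named in this abstract, the sketch below evaluates their standard textbook forms: Hwang-Shih-DeCani, alpha(t) = alpha (1 - exp(-gamma t)) / (1 - exp(-gamma)) with alpha(t) = alpha t when gamma = 0, and Kim-DeMets, alpha(t) = alpha t^rho, where t is the information fraction. The default values of alpha, gamma and rho and the look schedule are illustrative assumptions, not the optimized designs from the paper.

```python
from math import exp


def hwang_shih_decani(t, alpha=0.025, gamma=-4.0):
    """Cumulative Type I error spent at information fraction t (0 <= t <= 1)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("information fraction t must lie in [0, 1]")
    if gamma == 0:
        return alpha * t  # limiting case: error is spent linearly
    return alpha * (1.0 - exp(-gamma * t)) / (1.0 - exp(-gamma))


def kim_demets(t, alpha=0.025, rho=3.0):
    """Power-family spending: alpha * t**rho, with rho > 0."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("information fraction t must lie in [0, 1]")
    return alpha * t ** rho


# Hypothetical schedule: three equally spaced looks.
looks = [1 / 3, 2 / 3, 1.0]
hsd_spent = [hwang_shih_decani(t) for t in looks]
kd_spent = [kim_demets(t) for t in looks]
```

Both functions spend nothing at t = 0 and the full alpha at t = 1; a more negative gamma or a larger rho defers spending toward the final analysis, giving more conservative interim stopping rules.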

18.
EST clustering error evaluation and correction   Total citations: 4, self-citations: 0, citations by others: 4
MOTIVATION: The gene expression intensity information conveyed by Expressed Sequence Tag (EST) data can be used to infer important cDNA library properties, such as gene number and expression patterns. However, EST clustering errors, which often lead to greatly inflated estimates of obtained unique genes, have become a major obstacle in the analyses. The EST clustering error structure, the relationship between clustering error and clustering criteria, and possible error correction methods need to be systematically investigated. RESULTS: We identify and quantify two types of EST clustering error, namely Type I and Type II, in EST clustering using the CAP3 assembly program. A Type I error occurs when ESTs from the same gene do not form a cluster, whereas a Type II error occurs when ESTs from distinct genes are falsely clustered together. While the Type II error rate is <1.5% for both 5' and 3' EST clustering, the Type I error in the 5' EST case is approximately 10 times higher than in the 3' EST case (30% versus 3%). An over-stringent identity rule, e.g., P ≥ 95%, may even inflate the Type I error in both cases. We demonstrate that approximately 80% of the Type I error is due to insufficient overlap among sibling ESTs (ISO error) in 5' EST clustering. A novel statistical approach is proposed to correct ISO error to provide more accurate estimates of the true gene cluster profile.

19.
Likelihood methods for detecting temporal shifts in diversification rates   Total citations: 8, self-citations: 0, citations by others: 8
Maximum likelihood is a potentially powerful approach for investigating the tempo of diversification using molecular phylogenetic data. Likelihood methods distinguish between rate-constant and rate-variable models of diversification by fitting birth-death models to phylogenetic data. Because model selection in this context is a test of the null hypothesis that diversification rates have been constant over time, strategies for selecting best-fit models must minimize Type I error rates while retaining power to detect rate variation when it is present. Here I examine model selection, parameter estimation, and power to reject the null hypothesis using likelihood models based on the birth-death process. The Akaike information criterion (AIC) has often been used to select among diversification models; however, I find that selecting models based on the lowest AIC score leads to a dramatic inflation of the Type I error rate. When appropriately corrected to reduce Type I error rates, the birth-death likelihood approach performs as well as or better than the widely used gamma statistic, at least when diversification rates have shifted abruptly over time. Analyses of datasets simulated under a range of rate-variable diversification scenarios indicate that the birth-death likelihood method has much greater power to detect variation in diversification rates when extinction is present. Furthermore, this method appears to be the only approach available that can distinguish between a temporal increase in diversification rates and a rate-constant model with nonzero extinction. I illustrate use of the method by analyzing a published phylogeny for Australian agamid lizards.

20.
Observed variations in rates of taxonomic diversification have been attributed to a range of factors including biological innovations, ecosystem restructuring, and environmental changes. Before inferring causality of any particular factor, however, it is critical to demonstrate that the observed variation in diversity is significantly greater than that expected from natural stochastic processes. Relative tests that assess whether observed asymmetry in species richness between sister taxa in monophyletic pairs is greater than would be expected under a symmetric model have been used widely in studies of rate heterogeneity and are particularly useful for groups in which paleontological data are problematic. Although one such test introduced by Slowinski and Guyer a decade ago has been applied to a wide range of clades and evolutionary questions, the statistical behavior of the test has not been examined extensively, particularly when used with Fisher's procedure for combining probabilities to analyze data from multiple independent taxon pairs. Here, certain pragmatic difficulties with the Slowinski-Guyer test are described, further details of the development of a recently introduced likelihood-based relative rates test are presented, and standard simulation procedures are used to assess the behavior of the two tests in a range of situations to determine: (1) the accuracy of the tests' nominal Type I error rate; (2) the statistical power of the tests; (3) the sensitivity of the tests to inclusion of taxon pairs with few species; (4) the behavior of the tests with datasets comprised of few taxon pairs; and (5) the sensitivity of the tests to certain violations of the null model assumptions. 
Our results indicate that in most biologically plausible scenarios, the likelihood-based test has superior statistical properties in terms of both Type I error rate and power, and we found no scenario in which the Slowinski-Guyer test was distinctly superior, although the degree of the discrepancy varies among the different scenarios. The Slowinski-Guyer test tends to be much more conservative (i.e., very disinclined to reject the null hypothesis) in datasets with many small pairs. In most situations, the performance of both the likelihood-based test and particularly the Slowinski-Guyer test improves when pairs with few species are excluded from the computation, although this is balanced against a decline in the tests' power and accuracy as fewer pairs are included in the dataset. The performance of both tests is quite poor when they are applied to datasets in which the taxon sizes do not conform to the distribution implied by the usual null model. Thus, results of analyses of taxonomic rate heterogeneity using the Slowinski-Guyer test can be misleading because the test's ability to reject the null hypothesis (equal rates) when true is often inaccurate and its ability to reject the null hypothesis when the alternative (unequal rates) is true is poor, particularly when small taxon pairs are included. Although not always perfect, the likelihood-based test provides a more accurate and powerful alternative as a relative rates test.
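To make the procedure under evaluation concrete, the sketch below implements the commonly cited one-tailed form of the Slowinski-Guyer probability, P = (N - r) / (N - 1) for a focal clade of r species in a sister pair of N species, and Fisher's method for combining the per-pair probabilities. Under the equal-rates null every split of N species into two sister clades is equally likely, which gives the formula. The function names and species counts are hypothetical; the chi-square tail uses the closed form available for even degrees of freedom, so no external library is needed.

```python
from math import exp, log


def slowinski_guyer_p(n_focal, n_sister):
    """P(focal clade has at least n_focal species) under equal rates."""
    n_total = n_focal + n_sister
    if n_focal < 1 or n_sister < 1:
        raise ValueError("each sister clade needs at least one species")
    return (n_total - n_focal) / (n_total - 1)


def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) referred to chi-square with 2k df.

    For even df = 2k the upper tail equals
    exp(-x/2) * sum_{i<k} (x/2)**i / i!, which is evaluated directly.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    half_x = -sum(log(p) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return exp(-half_x) * total


# Hypothetical sister pairs: (species in focal clade, species in sister clade).
pairs = [(9, 1), (12, 3), (2, 2)]
combined = fisher_combined_p([slowinski_guyer_p(a, b) for a, b in pairs])
```

A balanced two-versus-two pair yields P = 2/3 and even the most extreme split of four species yields only 1/3, which illustrates why small pairs carry little information and can make the combined test conservative, as the abstract reports.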
