Similar Articles
20 similar articles found (search took 15 ms)
1.
The amount of missing data in many contemporary phylogenetic analyses has substantially increased relative to previous norms, particularly in supermatrix studies that compile characters from multiple previous analyses. In such cases the missing data are non‐randomly distributed and usually present in all partitions (i.e. groups of characters) sampled. Parametric methods often provide greater resolution and support than parsimony in such cases, yet this may be caused by extrapolation of branch lengths from one partition to another. In this study I use contrived and simulated examples to demonstrate that likelihood, even when applied to simple matrices with little or no homoplasy, homogeneous evolution across groups of characters, perfect model fit, and hundreds or thousands of variable characters, can provide strong support for incorrect topologies when the matrices have non‐random distributions of missing data distributed across all partitions. I do so using a systematic exploration of alternative seven‐taxon tree topologies and distributions of missing data in two partitions to demonstrate that these likelihood‐based artefacts may occur frequently and are not shared by parsimony. I also demonstrate that Bayesian Markov chain Monte Carlo analysis is more robust to these artefacts than is likelihood. © The Willi Hennig Society 2011.

2.
ANOTHER MONOPHYLY INDEX: REVISITING THE JACKKNIFE
Abstract — Randomization routines have quickly gained wide usage in phylogenetic systematics. Although introduced a decade ago, the jackknife has rarely been applied in cladistic methodology. This data resampling technique was re-investigated here as a means to discover the effect that taxon removal may have on the stability of the results obtained from parsimony analyses. This study shows that removing even a single taxon can cause an analysis that yields relatively few equally parsimonious trees for the inclusive matrix to instead yield hundreds of equally parsimonious trees; on the other hand, removal of other taxa can stabilize the results to fewer trees. An index of clade stability, the Jackknife Monophyly Index (JMI), is developed which, like the bootstrap, applies a value to each clade according to its frequency of occurrence in jackknife pseudoreplicates. Unlike the bootstrap and earlier applications of the jackknife, alternative suboptimal hypotheses are not forwarded by this method; only those clades in the most parsimonious tree(s) are given JMI values. The behaviour of this index is investigated in relation to both a hypothetical and a real data set, and its performance is compared with that of the bootstrap. Unlike the bootstrap, the JMI is found not to be influenced by uninformative characters or relative synapomorphy number.

3.
The kinesin superfamily across eukaryotes was used to examine how incorporation of gap characters scored from conserved regions shared by all members of a gene family and incorporation of amino acid and gap characters scored from lineage‐specific regions affect gene‐tree inference of the gene family as a whole. We addressed these two questions in the context of two different densities of sequence sampling, four alignment programs, and two methods of tree construction. Taken together, our findings suggest the following. First, gap characters should be incorporated into gene‐tree inference, even for divergent sequences. Second, gene regions that are not conserved among all or most sequences sampled should not be automatically discarded without evaluation of potential phylogenetic signal that may be contained in gap and/or sequence characters. Third, among the four alignment programs evaluated using their default alignment parameters, Clustal may be expected to output alignments that result in the greatest gene‐tree resolution and support. Yet, this high resolution and support should be regarded as optimistic, rather than conservative, estimates. Fourth, this same conclusion regarding resolution and support holds for Bayesian gene‐tree analyses relative to parsimony‐jackknife gene‐tree analyses. We suggest that a more conservative approach, such as aligning the sequences using DIALIGN‐T or MAFFT, analyzing the appropriate characters using parsimony, and assessing branch support using the jackknife, is more appropriate for inferring gene trees of divergent gene families. © The Willi Hennig Society 2007.

4.
Using jackknife methods for estimating the parameter in dilution series
R. J. Does, L. W. Strijbosch and W. Albers. Biometrics, 1988, 44(4): 1093–1102
Dilution assays are quantal dose-response assays that detect a positive or negative response in each individual culture within groups of replicate cultures that vary in the dose of cells/organisms tested. We propose three jackknife versions of the maximum likelihood estimator of the unknown parameter, i.e., the frequency of a well-defined cell within the context of limiting dilution assays or the density of organisms within the context of serial dilution assays. The methods have been evaluated with artificial data from extensive Monte Carlo experiments. As a result of these experiments and theoretical considerations, the jackknife version based on deleting one individual culture at a time is proposed as the statistical procedure of choice. The next best method is the jackknife version based on leaving out the same replicate from each of the culture groups at a time.
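The single-hit Poisson model behind such assays, and the delete-one-culture jackknife the abstract recommends, can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the dose levels, the culture counts, and the bisection-based maximizer are invented for the example, and each culture at dose d is assumed negative with probability exp(-theta*d).

```python
import math

def score(theta, cultures):
    """Derivative of the single-hit Poisson log-likelihood.
    cultures: list of (dose, is_negative); P(negative) = exp(-theta*dose)."""
    g = 0.0
    for d, neg in cultures:
        if neg:
            g -= d
        else:
            e = math.exp(-theta * d)
            g += d * e / (1.0 - e)
    return g

def mle(cultures, lo=1e-9, hi=50.0, iters=80):
    """Bisection on the score function (the log-likelihood is concave)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid, cultures) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def jackknife(cultures):
    """Delete-one-culture jackknife: bias-corrected estimate and SE."""
    n = len(cultures)
    full = mle(cultures)
    loo = [mle(cultures[:i] + cultures[i + 1:]) for i in range(n)]
    pseudo = [n * full - (n - 1) * v for v in loo]
    est = sum(pseudo) / n
    se = math.sqrt(sum((p - est) ** 2 for p in pseudo) / (n * (n - 1)))
    return est, se

# Invented assay: replicate cultures at three doses (cells per culture)
cultures = ([(1.0, True)] * 8 + [(1.0, False)] * 2
            + [(2.0, True)] * 5 + [(2.0, False)] * 5
            + [(4.0, True)] * 2 + [(4.0, False)] * 8)
theta_hat = mle(cultures)
theta_jack, se_jack = jackknife(cultures)
```

The runner-up method in the abstract would instead leave out the same replicate position from every dose group at a time.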

5.
In this study, we used an empirical example based on 100 mitochondrial genomes from higher teleost fishes to compare the accuracy of parsimony-based jackknife values with Bayesian support values. Phylogenetic analyses of 366 partitions, using differential taxon and character sampling from the entire data matrix of 100 taxa and 7,990 characters, were performed for both phylogenetic methods. The tree topology and branch-support values from each partition were compared with the tree inferred from all taxa and characters. Using this approach, we quantified the accuracy of the branch-support values assigned by the jackknife and Bayesian methods, with respect to each of 15 basal clades. In comparing the jackknife and Bayesian methods, we found that (1) both measures of support differ significantly from an ideal support index; (2) the jackknife underestimated support values; (3) the Bayesian method consistently overestimated support; (4) the magnitude by which Bayesian values overestimate support exceeds the magnitude by which the jackknife underestimates support; and (5) both methods performed poorly when taxon sampling was increased and character sampling was not increased. These results indicate that (1) the higher Bayesian support values are inappropriate (in magnitude), and (2) Bayesian support values should not be interpreted as probabilities that clades are correctly resolved. We advocate the continued use of the relatively conservative bootstrap and jackknife approaches to estimating branch support rather than the more extreme overestimates provided by the Markov Chain Monte Carlo-based Bayesian methods.

6.
Several research fields frequently deal with the analysis of diverse classification results of the same entities. This requires objective detection of overlaps and divergences between the resulting clusters. The congruence between classifications can be quantified by clustering agreement measures, including pairwise agreement measures. Several measures have been proposed, and the importance of obtaining confidence intervals for the point estimate when comparing these measures has been highlighted. A broad range of methods can be used to estimate confidence intervals. However, evidence is lacking about which methods are appropriate for calculating confidence intervals for most clustering agreement measures. Here we evaluate the resampling techniques of bootstrap and jackknife for calculating confidence intervals for clustering agreement measures. Contrary to what has been shown for some statistics, simulations showed that the jackknife performs better than the bootstrap at accurately estimating confidence intervals for pairwise agreement measures, especially when the agreement between partitions is low. The coverage of the jackknife confidence interval is robust to changes in cluster number and cluster size distribution.
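As a sketch of the jackknife the abstract favours, the following Python computes a pairwise agreement measure (the Rand index, used here purely as an example) and a delete-one-item jackknife confidence interval via Tukey's variance formula. The two toy labelings are invented.

```python
import math
from itertools import combinations

def rand_index(a, b):
    """Pairwise agreement between two labelings of the same n items."""
    n = len(a)
    agree = sum(1 for i, j in combinations(range(n), 2)
                if (a[i] == a[j]) == (b[i] == b[j]))
    return agree / (n * (n - 1) / 2)

def jackknife_ci(a, b, z=1.96):
    """Delete-one-item jackknife CI for the Rand index."""
    n = len(a)
    full = rand_index(a, b)
    loo = [rand_index(a[:i] + a[i + 1:], b[:i] + b[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    # Tukey's jackknife variance of the point estimate
    var = (n - 1) / n * sum((v - mean_loo) ** 2 for v in loo)
    se = math.sqrt(var)
    return full, (full - z * se, full + z * se)

labels1 = [0] * 5 + [1] * 5 + [2] * 5
labels2 = [0] * 4 + [1] * 6 + [2] * 5   # one item moved between clusters
ri, (lo, hi) = jackknife_ci(labels1, labels2)
```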

7.
Analysis of quantitative genetics in natural populations has been hindered by computational and methodological problems in statistical analysis. We developed and validated a jackknife procedure to test for existence of broad sense heritabilities and dominance or maternal effects influencing quantitative characters in Impatiens capensis. Early life cycle characters showed evidence of dominance and/or maternal effects, while later characters exhibited predominantly environmental variation. Monte Carlo simulations demonstrate that these jackknife tests of variance components are extremely robust to heterogeneous error variances. Statistical methods from human genetics provide evidence for either a major locus influencing germination date, or genes that affect phenotypic variability per se. We urge explicit consideration of statistical behavior of estimation and testing procedures for proper biological interpretation of statistical results.
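A jackknife test of a variance component can be illustrated in miniature: delete one family (group) at a time, form pseudovalues of the between-group variance component, and compare the resulting t statistic with a t distribution on k-1 degrees of freedom. Illustrative Python with invented, balanced data; the paper's actual models (dominance, maternal effects) are far richer than this one-way layout.

```python
import math

def var_between(groups):
    """Method-of-moments between-group variance component
    for a balanced one-way layout (groups: list of equal-size lists)."""
    k, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum(sum((x - m) ** 2 for x in g)
              for g, m in zip(groups, means)) / (k * (n - 1))
    return (msb - msw) / n

def jackknife_t(groups):
    """Delete-one-group jackknife: pseudovalue estimate, SE, and
    t statistic for H0: between-group component = 0 (df = k - 1)."""
    k = len(groups)
    full = var_between(groups)
    pseudo = [k * full - (k - 1) * var_between(groups[:i] + groups[i + 1:])
              for i in range(k)]
    est = sum(pseudo) / k
    se = math.sqrt(sum((p - est) ** 2 for p in pseudo) / (k * (k - 1)))
    return est, se, est / se

# Invented families with clearly different means
groups = [[1.0, 1.2, 0.9], [3.1, 2.9, 3.0], [5.2, 4.8, 5.0], [7.1, 6.9, 7.0]]
vb = var_between(groups)
est, se, t = jackknife_t(groups)
```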

8.
Summary — The validity of limiting dilution assays can be compromised or negated by the use of statistical methodology which does not consider all issues surrounding the biological process. This study critically evaluates statistical methods for estimating the mean frequency of responding cells in multiple sample limiting dilution assays. We show that methods that pool limiting dilution assay data, or samples, are unable to estimate the variance appropriately. In addition, we use Monte Carlo simulations to evaluate an unweighted mean of the maximum likelihood estimator, an unweighted mean based on the jackknife estimator, and a log transform of the maximum likelihood estimator. For small culture replicate size, the log transform outperforms both unweighted mean procedures. For moderate culture replicate size, the unweighted mean based on the jackknife produces the most acceptable results. This study also addresses the important issue of experimental design in multiple sample limiting dilution assays. In particular, we demonstrate that optimization of multiple sample limiting dilution assays is achieved by increasing the number of biological samples at the expense of repeat cultures.

9.
A simple, closed-form jackknife estimate of the actual variance of the Mann-Whitney-Wilcoxon statistic, as opposed to the standard permutational variance under the test's null hypothesis, has been derived; it permits avoiding anticonservative performance in the presence of heteroscedasticity. The formulation given allows modifications of the exponential scores test, of censored-data tests by Gehan (1965), Peto & Peto (1977) and Prentice (1978), of tests for monotonic τ association by Kendall (1962), and of tests of ordered k-sample hypotheses. A Monte Carlo study supports recommendations for the jackknife procedures, but also shows their limited advantages in the exponential scores and censored-data versions. The paper thus extends results by Fligner & Policello (1981).
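A minimal version of the idea can be sketched in Python: jackknife the Mann-Whitney estimate of P(X < Y) instead of relying on the permutational null variance. The samples are invented, and deleting over both samples pooled is only one of several possible jackknife schemes, not necessarily the paper's closed-form one.

```python
import math

def theta_hat(x, y):
    """Mann-Whitney estimate of P(X < Y); ties counted as one half."""
    u = sum(1.0 if xi < yj else 0.5 if xi == yj else 0.0
            for xi in x for yj in y)
    return u / (len(x) * len(y))

def jackknife_var(x, y):
    """Delete-one jackknife variance of theta_hat over the pooled sample."""
    n = len(x) + len(y)
    loo = [theta_hat(x[:i] + x[i + 1:], y) for i in range(len(x))]
    loo += [theta_hat(x, y[:j] + y[j + 1:]) for j in range(len(y))]
    mean_loo = sum(loo) / n
    return (n - 1) / n * sum((v - mean_loo) ** 2 for v in loo)

x = [1.2, 2.0, 2.5, 3.1, 3.8, 4.4, 5.0]
y = [2.2, 2.9, 3.5, 4.1, 4.9, 5.5, 6.2]
th = theta_hat(x, y)
v_jack = jackknife_var(x, y)
# Permutational variance of theta_hat under the null, for comparison
m, k = len(x), len(y)
v_null = (m + k + 1) / (12 * m * k)
```

Under heteroscedasticity v_null can be badly miscalibrated, which is the anticonservatism the jackknife variance is meant to avoid.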

10.
The relationship between phylogenetic accuracy and congruence between data partitions collected from the same taxa was explored for mitochondrial DNA sequences from two well-supported vertebrate phylogenies. An iterative procedure was adopted whereby accuracy, phylogenetic signal, and congruence were measured before and after modifying a simple reconstruction model, equally weighted parsimony. These modifications included transversion parsimony, successive weighting, and six-parameter parsimony. For the data partitions examined, there is a generally positive relationship between congruence and phylogenetic accuracy. If congruence increased without decreasing resolution or phylogenetic signal, this increased congruence was a good predictor of accuracy. If congruence increased as a result of poor resolution, the degree of congruence was not a good predictor of accuracy. For all sets of data partitions, six-parameter parsimony methods show a consistently positive relationship between congruence and accuracy. Unlike successive weighting, six-parameter parsimony methods were not strongly influenced by the starting tree.

11.
The intra-phyletic relationships of sipunculan worms were analyzed based on DNA sequence data from four gene regions and 58 morphological characters. Initially we analyzed the data under direct optimization using parsimony as the optimality criterion. An implied alignment resulting from the direct optimization analysis was subsequently utilized to perform a Bayesian analysis with mixed models for the different data partitions, applying a doublet model for the stem regions of the 18S rRNA. Both analyses support monophyly of Sipuncula and most of the same clades within the phylum. The analyses differ with respect to the relationships among the major groups: whereas the deep nodes in the direct optimization analysis generally show low jackknife support, they receive 100% posterior probability in the Bayesian analysis. Direct optimization has been useful for handling sequences of unequal length and generating conservative phylogenetic hypotheses, whereas the Bayesian analysis under mixed models provided high resolution in the basal nodes of the tree.

12.
A recent article published in Cladistics is critical of a number of heuristic methods for phylogenetic inference based on parsimony scores. One of my papers is among those criticized, and I would appreciate the opportunity to make a public response. The specific criticism is that I have re‐invented an algorithm for economizing parsimony calculations on trees that differ by a subtree pruning and regrafting (SPR) rearrangement. This criticism is justified, and I apologize for incorrectly claiming originality for my presentation of this algorithm. However, I would like to clarify the intent of my paper, if I can do so without detracting from the sincerity of my apology. My paper is not about that algorithm, nor even primarily about parsimony. Rather, it is about a novel strategy for Markov chain Monte Carlo (MCMC) sampling in a state space consisting of trees. The sampler involves drawing from conditional distributions over sets of trees: a Gibbs‐like strategy that had not previously been used to sample tree‐space. I would like to see this technique incorporated into MCMC samplers for phylogenetics, as it may have advantages over commonly used Metropolis‐like strategies. I have recently used it to sample phylogenies of a biological invasion, and I am finding many applications for it in agent‐based Bayesian ecological modelling. It is thus my contention that my 2005 paper retains substantial value.

13.
In the context of right-censored and interval-censored data, we develop asymptotic formulas to compute pseudo-observations for the survival function and the restricted mean survival time (RMST). These formulas are based on the original estimators and do not involve computation of the jackknife estimators. For right-censored data, Von Mises expansions of the Kaplan–Meier estimator are used to derive the pseudo-observations. For interval-censored data, a general class of parametric models for the survival function is studied. An asymptotic representation of the pseudo-observations is derived involving the Hessian matrix and the score vector. Theoretical results that justify the use of pseudo-observations in regression are also derived. The formula is illustrated on the piecewise-constant-hazard model for the RMST. The proposed approximations are extremely accurate, even for small sample sizes, as illustrated by Monte Carlo simulations and real data. We also study the gain in terms of computation time, as compared to the original jackknife method, which can be substantial for a large dataset.
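The jackknife definition of a pseudo-observation, theta_i = n*theta_hat - (n-1)*theta_hat(-i), which the paper's asymptotic formulas avoid computing directly, can be sketched for the RMST as follows. Illustrative Python with invented data; with no censoring the pseudo-observations reduce exactly to min(T_i, tau), a handy sanity check.

```python
def km_rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier curve
    on [0, tau]. events[i] = 1 for an event, 0 for right-censoring."""
    data = sorted(zip(times, events))
    s, t_prev, area = 1.0, 0.0, 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        if t >= tau:
            break
        d = sum(1 for tt, e in data if tt == t and e == 1)   # events at t
        r = sum(1 for tt, _ in data if tt >= t)              # at risk at t
        area += s * (t - t_prev)
        if d > 0:
            s *= 1.0 - d / r
        t_prev = t
        while i < len(data) and data[i][0] == t:             # skip ties
            i += 1
    return area + s * (tau - t_prev)

def pseudo_observations(times, events, tau):
    """Jackknife pseudo-observations for the RMST."""
    n = len(times)
    full = km_rmst(times, events, tau)
    return [n * full - (n - 1) * km_rmst(times[:i] + times[i + 1:],
                                         events[:i] + events[i + 1:], tau)
            for i in range(n)]

times, events, tau = [1.0, 2.0, 3.0, 4.0, 6.0], [1, 1, 1, 1, 1], 5.0
po = pseudo_observations(times, events, tau)
```

In regression use, the pseudo-observations then replace the (partly unobserved) outcomes min(T_i, tau) as responses in a generalized linear model.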

14.
Several stochastic models of character change, when implemented in a maximum likelihood framework, are known to give a correspondence between the maximum parsimony method and the method of maximum likelihood. One such model has an independently estimated branch-length parameter for each site and each branch of the phylogenetic tree. This model--the no-common-mechanism model--has many parameters, and, in fact, the number of parameters increases as fast as the alignment is extended. We take a Bayesian approach to the no-common-mechanism model and place independent gamma prior probability distributions on the branch-length parameters. We are able to analytically integrate over the branch lengths, and this allowed us to implement an efficient Markov chain Monte Carlo method for exploring the space of phylogenetic trees. We were able to reliably estimate the posterior probabilities of clades for phylogenetic trees of up to 500 sequences. However, the Bayesian approach to the problem, at least as implemented here with an independent prior on the length of each branch, does not tame the behavior of the branch-length parameters. The integrated likelihood appears to be a simple rescaling of the parsimony score for a tree, and the marginal posterior probability distribution of the length of a branch is dependent upon how the maximum parsimony method reconstructs the characters at the interior nodes of the tree. The method we describe, however, is of potential importance in the analysis of morphological character data and also for improving the behavior of Markov chain Monte Carlo methods implemented for models in which sites share a common branch-length parameter.
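The parsimony score whose rescaling reappears in the integrated likelihood is the one computed by Fitch's small-parsimony algorithm. A minimal sketch, on an invented four-taxon tree:

```python
def fitch(tree, states):
    """Fitch small parsimony for one character on a rooted binary tree.
    tree: nested 2-tuples of leaf names; states: leaf name -> state.
    Returns (state set at this node, number of changes below it)."""
    if isinstance(tree, str):                  # leaf
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch(left, states)
    rs, rc = fitch(right, states)
    inter = ls & rs
    if inter:
        return inter, lc + rc                  # agreement: no extra change
    return ls | rs, lc + rc + 1                # conflict: count one change

tree = (("A", "B"), ("C", "D"))
# character 1 fits the tree with one change; character 2 needs two
_, s1 = fitch(tree, {"A": 0, "B": 0, "C": 1, "D": 1})
_, s2 = fitch(tree, {"A": 0, "B": 1, "C": 0, "D": 1})
```

Summing such scores over sites gives the tree length that, per the abstract, the integrated likelihood effectively rescales.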

15.
Transformation and computer intensive methods such as the jackknife and bootstrap are applied to construct accurate confidence intervals for the ratio of specific occurrence/exposure rates, which are used to compare the mortality (or survival) experience of individuals in two study populations. Monte Carlo simulations are employed to compare the performances of the proposed confidence intervals when sample sizes are small or moderate.
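One computer-intensive route to such an interval can be sketched as follows: treat each death count as Poisson and form a percentile bootstrap interval for the ratio of occurrence/exposure rates. Illustrative Python; the counts, person-time totals, and the parametric (Poisson) resampling scheme are assumptions of the sketch, not the paper's exact procedure.

```python
import math, random

def poisson(lam, rng):
    """Knuth's Poisson sampler (adequate for modest lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def rate_ratio_ci(d1, pt1, d2, pt2, n_boot=5000, alpha=0.05, seed=1):
    """Parametric-bootstrap percentile CI for the ratio of two
    occurrence/exposure rates (d deaths over pt person-years)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        b1, b2 = poisson(d1, rng), poisson(d2, rng)
        if b2 == 0:
            continue                     # drop degenerate replicates
        ratios.append((b1 / pt1) / (b2 / pt2))
    ratios.sort()
    lo = ratios[int(alpha / 2 * len(ratios))]
    hi = ratios[int((1 - alpha / 2) * len(ratios)) - 1]
    return (d1 / pt1) / (d2 / pt2), lo, hi

ratio, lo, hi = rate_ratio_ci(30, 1000.0, 15, 1200.0)
```

A log transformation of the ratio before forming the interval, as the abstract's "transformation" methods suggest, typically improves coverage further.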

16.
Accurate estimation of the phases and amplitude of the endogenous circadian pacemaker from constant-routine core-temperature series is crucial for making inferences about the properties of the human biological clock from data collected under this protocol. This paper presents a set of statistical methods based on a harmonic-regression-plus-correlated-noise model for estimating the phases and the amplitude of the endogenous circadian pacemaker from constant-routine core-temperature data. The methods include a Bayesian Monte Carlo procedure for computing the uncertainty in these circadian functions. We illustrate the techniques with a detailed study of a single subject's core-temperature series and describe their relationship to other statistical methods for circadian data analysis. In our laboratory, these methods have been successfully used to analyze more than 300 constant routines and provide a highly reliable means of extracting phase and amplitude information from core-temperature data.
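The harmonic-regression core of such an analysis can be sketched with ordinary least squares: fit mu + A*cos(wt) + B*sin(wt) and read off the amplitude and acrophase. Illustrative Python on a synthetic, noise-free temperature series; the paper's method additionally models correlated noise and quantifies uncertainty by Bayesian Monte Carlo, which this sketch omits.

```python
import math

def fit_harmonic(t, y, period=24.0):
    """Least-squares fit of y ~ mu + A cos(wt) + B sin(wt).
    Returns (mu, amplitude, acrophase in hours after t = 0)."""
    w = 2 * math.pi / period
    X = [[1.0, math.cos(w * ti), math.sin(w * ti)] for ti in t]
    # Normal equations (X'X) beta = X'y, solved by Gaussian elimination
    A = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, 3))) / A[r][r]
    mu, ca, sa = beta
    amp = math.hypot(ca, sa)
    phase = (math.atan2(sa, ca) / w) % period   # time of the fitted peak
    return mu, amp, phase

# Synthetic core-temperature series: peak at hour 17, amplitude 0.4 deg C
t = [i * 0.5 for i in range(96)]                # 48 h sampled every 30 min
y = [37.0 + 0.4 * math.cos(2 * math.pi / 24 * (ti - 17.0)) for ti in t]
mu, amp, phase = fit_harmonic(t, y)
```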

17.
The problem of character weighting in cladistic analysis is revisited. The finding that, in large molecular data sets, removal of third positions (with more homoplasy) decreases the number of well supported groups has been interpreted by some authors as indicating that weighting methods are unjustified. Two arguments against that interpretation are advanced. Characters that collectively determine few well‐supported groups may be highly reliable when taken individually (as shown by specific examples), so that inferring greater reliability for sets of characters that lead to an increase in jackknife frequencies may not always be warranted. But even if changes in jackknife frequencies can be used to infer reliability, we demonstrate that jackknife frequencies in large molecular data sets are actually improved when downweighting characters according to their homoplasy but using properly rescaled functions (instead of the very strong standard functions, or the extreme of inclusion/exclusion); this further weakens the argument that downweighting homoplastic characters is undesirable. Last, we show that downweighting characters according to their homoplasy (using standard homoplasy‐weighting methods) on 70 morphological data sets (with 50–170 taxa), produces clear increases in jackknife frequencies. The results obtained under homoplasy weighting also appear more stable than results under equal weights: adding either taxa or characters, when weighting against homoplasy, produced results more similar to original analyses (i.e., with larger numbers of groups that continue being supported after addition of taxa or characters), with similar or lower error rates (i.e., proportion of groups recovered that subsequently turn out to be incorrect). Therefore, the same argument that had been advanced against homoplasy weighting in the case of large molecular data sets is an argument in favor of such weighting in the case of morphological data sets. © The Willi Hennig Society 2008.
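The standard homoplasy-weighting function alluded to here is, in Goloboff-style implied weighting, a concave fit f(h) = k/(k + h), where h is a character's extra (homoplastic) steps and the constant k sets the strength of the concavity; "properly rescaled" functions correspond to milder k. A minimal sketch, assuming this functional form:

```python
def implied_weight(h, k):
    """Goloboff-style implied weight for a character with h extra steps."""
    return k / (k + h)

# Strong concavity (small k) punishes homoplasy harshly; a milder,
# rescaled function (large k) downweights the same characters gently.
strong = [implied_weight(h, 3) for h in range(5)]
mild = [implied_weight(h, 20) for h in range(5)]
```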

18.
RAPD problems in phylogenetics
This paper is intended to clarify some of the questions related to the application of RAPD for phylogenetic reconstruction purposes. Using different specimens of mammals selected across various taxonomic levels, we assessed the ability of RAPD data to recover a known phylogeny, using four distance coefficients (simple matching, Russell & Rao, Jaccard, and Dice). We assessed the minimum number of primers required in the computations to obtain stable results in terms of distance estimates and/or topologies of the derived trees. These results based on distance methods were compared with those obtained from parsimony analyses of RAPD markers. Both approaches proved equally problematic for comparing taxa above the family level. On the basis of these comparisons among the various indices and methods, we recommend the use of the Jaccard or Dice coefficients, with no fewer than twelve primers. We also suggest validating any phylogeny based on RAPD data with a resampling procedure (e.g. the bootstrap or the jackknife) before any sound conclusion is drawn.

19.
The epidemiologic concept of the adjusted attributable risk is a useful approach to quantitatively describe the importance of risk factors at the population level. It measures the proportional reduction in disease probability when a risk factor is eliminated from the population, accounting for effects of confounding and effect-modification by nuisance variables. Asymptotic variance estimates for estimates of the adjusted attributable risk are often computed by applying the delta method. Investigations have shown, however, that the delta method generally tends to underestimate the standard error, leading to biased confidence intervals. We compare confidence intervals for the adjusted attributable risk derived by applying computer-intensive methods such as the bootstrap or jackknife with confidence intervals based on asymptotic variance estimates, using an extensive Monte Carlo simulation and a real data example from a cohort study in cardiovascular disease epidemiology. Our results show that confidence intervals based on bootstrap and jackknife methods outperform intervals based on asymptotic theory. The best variants of the computer-intensive confidence intervals are indicated for different situations.
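A crude (unadjusted) attributable risk with a percentile bootstrap interval can be sketched as follows, with invented cohort counts. The adjusted estimator studied in the paper conditions on confounders and effect modifiers, which this sketch deliberately omits.

```python
import random

def attributable_risk(data):
    """Crude population attributable risk from (exposed, diseased) pairs:
    AR = (P(D) - P(D | E = 0)) / P(D)."""
    n = len(data)
    p = sum(d for _, d in data) / n
    unexp = [d for e, d in data if e == 0]
    p0 = sum(unexp) / len(unexp)
    return (p - p0) / p

def bootstrap_ci(data, n_boot=2000, alpha=0.05, seed=7):
    """Percentile bootstrap CI, resampling individuals with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]
        # skip degenerate resamples with no unexposed subjects or no cases
        if any(e == 0 for e, _ in sample) and any(d for _, d in sample):
            stats.append(attributable_risk(sample))
    stats.sort()
    return (stats[int(alpha / 2 * len(stats))],
            stats[int((1 - alpha / 2) * len(stats)) - 1])

# Invented cohort of (exposed, diseased) 0/1 pairs
data = [(1, 1)] * 30 + [(1, 0)] * 70 + [(0, 1)] * 10 + [(0, 0)] * 90
ar = attributable_risk(data)
ci_lo, ci_hi = bootstrap_ci(data)
```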

20.
Fluorescence correlation spectroscopy (FCS) methods are powerful tools for unveiling the dynamical organization of cells. For simple cases, such as molecules passively moving in a homogeneous medium, FCS analysis yields analytical functions that can be fitted to the experimental data to recover the phenomenological rate parameters. Unfortunately, many dynamical processes in cells do not follow these simple models, and in many instances it is not possible to obtain an analytical function through a theoretical analysis of a more complex model. In such cases, experimental analysis can be combined with Monte Carlo simulations to aid in interpretation of the data. In response to this need, we developed a method called FERNET (Fluorescence Emission Recipes and Numerical routines Toolkit) based on Monte Carlo simulations and the MCell-Blender platform, which was designed to treat the reaction-diffusion problem under realistic scenarios. This method enables us to set complex geometries of the simulation space, distribute molecules among different compartments, and define interspecies reactions with selected kinetic constants, diffusion coefficients, and species brightness. We apply this method to simulate single- and multiple-point FCS, photon-counting histogram analysis, raster image correlation spectroscopy, and two-color fluorescence cross-correlation spectroscopy. We believe that this new program could be very useful for predicting and understanding the output of fluorescence microscopy experiments.
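In the same spirit, though vastly simpler than FERNET, a Monte Carlo sketch of an FCS experiment needs only diffusing particles and a Gaussian detection profile; the autocorrelation of the simulated intensity trace then decays on the diffusion timescale. Illustrative Python; the box size, step size, and beam waist are arbitrary choices for the sketch.

```python
import math, random

def simulate_fcs(n_particles=60, n_steps=4000, step=0.1, w=1.0,
                 box=10.0, seed=3):
    """Particles diffuse in a periodic 1-D box; fluorescence intensity is
    summed from a Gaussian observation volume centred at box/2."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, box) for _ in range(n_particles)]
    trace = []
    for _ in range(n_steps):
        xs = [(x + rng.gauss(0, step)) % box for x in xs]
        trace.append(sum(math.exp(-((x - box / 2) ** 2) / (2 * w * w))
                         for x in xs))
    return trace

def autocorr(trace, lag):
    """Normalized fluctuation autocorrelation G(lag)."""
    mean = sum(trace) / len(trace)
    d = [v - mean for v in trace]
    num = sum(d[i] * d[i + lag] for i in range(len(d) - lag)) / (len(d) - lag)
    var = sum(v * v for v in d) / len(d)
    return num / var

trace = simulate_fcs()
g_short, g_long = autocorr(trace, 1), autocorr(trace, 1000)
```

Fitting the decay of G(lag) against the analytical diffusion model is what recovers the phenomenological rate parameters in the simple cases the abstract mentions.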


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号