Similar Documents
20 similar documents found (search time: 15 ms)
1.
The clade size effect refers to a bias that causes middle-sized clades to be less supported than small or large clades. This bias is present in resampling measures of support calculated under maximum likelihood and maximum parsimony, and in Bayesian posterior probabilities. Previous analyses indicated that the clade size effect is worst in maximum parsimony, followed by maximum likelihood, while Bayesian inference is the least affected; homoplasy was interpreted as the main cause of the effect. In this study, we explored the presence of the clade size effect in alternative measures of branch support under maximum parsimony: Bremer support and symmetric resampling, expressed as absolute frequencies and frequency differences. Analyses were performed using 50 molecular and morphological matrices. Symmetric resampling showed the same tendency that bootstrap and jackknife did for maximum parsimony and maximum likelihood. Few matrices showed a significant bias using Bremer support, which performed better than resampling measures of support and comparably to Bayesian posterior probabilities. Our results indicate that the problem is not maximum parsimony but resampling measures of support. We corroborated the role of homoplasy as a possible cause of the clade size effect: homoplasy increases the number of random trees during resampling, and this, together with the higher chance that medium-sized clades have of being contradicted, generates the bias during the perturbation of the original matrix, making it stronger in resampling measures of support.

2.
We would like to use maximum likelihood to estimate parameters such as the effective population size N_e or, if we do not know mutation rates, the product 4N_eμ of mutation rate per site and effective population size. To compute the likelihood for a sample of unrecombined nucleotide sequences taken from a random-mating population, it is necessary to sum over all genealogies that could have led to the sequences, computing for each one the probability that it would have yielded the sequences, and weighting each one by its prior probability. The genealogies vary in tree topology and in branch lengths. Although the likelihood and the prior are straightforward to compute, the summation over all genealogies seems at first sight hopelessly difficult. This paper reports that it is possible to carry out a Monte Carlo integration to evaluate the likelihoods approximately. The method uses bootstrap sampling of sites to create data sets, for each of which a maximum likelihood tree is estimated. The resulting trees are assumed to be sampled from a distribution whose height is proportional to the likelihood surface for the full data. That this will be so depends on a theorem which is not proven, but which seems likely to be true if the sequences are not short. One can use the resulting estimated likelihood curve to make a maximum likelihood estimate of the parameter of interest, N_e or 4N_eμ. The method requires at least 100 times the computational effort required for estimation of a phylogeny by maximum likelihood, but is practical on today's workstations. The method does not at present have any way of dealing with recombination.
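For orientation, the summation and its Monte Carlo approximation can be written generically as a standard importance-sampling identity, with q denoting the distribution from which the bootstrap trees are assumed to be drawn (the unproven proportionality assumption mentioned in the abstract):

```latex
L(\theta) \;=\; \sum_{G} \Pr(D \mid G)\,\Pr(G \mid \theta)
\;\approx\; \frac{1}{m}\sum_{i=1}^{m} \frac{\Pr(D \mid G_i)\,\Pr(G_i \mid \theta)}{q(G_i)},
\qquad G_1,\dots,G_m \sim q .
```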

3.
In this paper, we consider the problem of estimating the size N of a finite and closed population, using data obtained from capture-recapture experiments. By defining an appropriate model, we investigate the maxima of the likelihood, of the profile likelihood, and of an orthogonally adjusted profile likelihood (Cox and Reid, 1987). We show that all of them may yield infinity as the maximum likelihood estimate of N; this seems to be a characteristic of the likelihood approach in this problem. Further, we present a Bayesian approach with minimum prior information as a way of countering this difficulty. Exact analytical expressions for the posterior modes are also obtained.
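A minimal sketch of the infinite-MLE phenomenon in the simplest two-occasion (Lincoln-Petersen-type) version of the problem; the model and the numbers are illustrative assumptions, not the paper's formulation. When no marked animals are recaptured, the likelihood keeps rising with N:

```python
# n1 animals are marked, n2 are caught on a second occasion, m are recaptures;
# given N, the recapture count is m | N ~ Hypergeometric(N, n1, n2).
import numpy as np
from scipy.stats import hypergeom

def loglik(N, n1, n2, m):
    """Log-likelihood of population size N given the recapture count m."""
    return hypergeom(N, n1, n2).logpmf(m)

n1, n2 = 30, 25
for m in (5, 0):  # m = 0: no recaptures at all
    Ns = np.arange(max(n1, n2, n1 + n2 - m), 5001)
    ll = np.array([loglik(N, n1, n2, m) for N in Ns])
    print(f"m={m}: argmax over grid N={Ns[ll.argmax()]}, "
          f"likelihood still rising at grid edge: {ll[-1] > ll[-2]}")
```

With m = 5 the grid maximum sits near n1*n2/m = 150; with m = 0 the likelihood is monotonically increasing in N, so the MLE is infinite, which is the pathology the abstract describes.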

4.
M. K. Kuhner, J. Yamato, J. Felsenstein. Genetics 1995, 140(4):1421–1430
We present a new way to make a maximum likelihood estimate of the parameter 4N_eμ (effective population size times mutation rate per site, or θ) based on a population sample of molecular sequences. We use a Metropolis-Hastings Markov chain Monte Carlo method to sample genealogies in proportion to the product of their likelihood with respect to the data and their prior probability with respect to a coalescent distribution. A specific value of θ must be chosen to generate the coalescent distribution, but the resulting trees can be used to evaluate the likelihood at other values of θ, generating a likelihood curve. This procedure concentrates sampling on those genealogies that contribute most of the likelihood, allowing estimation of meaningful likelihood curves based on relatively small samples. The method can potentially be extended to cases involving varying population size, recombination, and migration.
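A toy sketch of the two ingredients described here, under heavy simplifying assumptions: the "genealogy" is reduced to the intercoalescence times of a single tree (topology ignored), and the data likelihood is an infinite-sites stand-in in which the number of segregating sites S is Poisson with mean equal to the total branch length in mutation units. The driving value θ0, the proposal, and all numbers are invented for illustration:

```python
import numpy as np
from scipy.special import logsumexp, gammaln

rng = np.random.default_rng(1)
n, S = 10, 40          # sample size and observed segregating sites (toy data)
theta0 = 8.0           # driving value of theta = 4*Ne*mu

ks = np.arange(n, 1, -1)   # number of lineages: n, n-1, ..., 2
rates = ks * (ks - 1)      # coalescence rate with k lineages is k(k-1)/theta

def log_prior(t, theta):   # coalescent prior on intercoalescence times
    return np.sum(np.log(rates / theta) - rates * t / theta)

def log_lik_data(t):
    L = np.sum(ks * t)     # total branch length in mutation units
    return S * np.log(L) - L - gammaln(S + 1)   # Poisson(S; L)

def log_post(t):           # target: likelihood x prior at the driving value
    return log_lik_data(t) + log_prior(t, theta0)

t = rng.exponential(theta0 / rates)   # initial genealogy drawn from the prior
samples = []
for step in range(60000):
    j = rng.integers(len(t))
    u = rng.uniform(-0.5, 0.5)        # log-scale random-walk proposal
    t_new = t.copy()
    t_new[j] *= np.exp(u)
    if np.log(rng.uniform()) < log_post(t_new) - log_post(t) + u:  # +u: Jacobian
        t = t_new
    if step > 10000 and step % 25 == 0:
        samples.append(t.copy())

# Likelihood curve relative to theta0 via importance reweighting of the prior.
for theta in (4.0, 6.0, 8.0, 10.0, 14.0):
    w = [log_prior(s, theta) - log_prior(s, theta0) for s in samples]
    print(f"theta={theta:5.1f}  log L(theta)/L(theta0) = "
          f"{logsumexp(w) - np.log(len(w)):+.3f}")
```

The final loop is the key trick from the abstract: genealogies sampled at one driving value θ0 are reused, via the ratio of coalescent priors, to trace the likelihood over other values of θ.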

5.
Clinical trials with Poisson distributed count data as the primary outcome are common in various medical areas, such as relapse counts in multiple sclerosis trials or the number of attacks in trials for the treatment of migraine. In this article, we present approximate sample size formulae for testing noninferiority using asymptotic tests which are based on restricted or unrestricted maximum likelihood estimators of the Poisson rates. The Poisson outcomes are allowed to be observed under unequal follow-up schemes, and both the situation in which the noninferiority margin is expressed as a difference of rates and that in which it is expressed as a ratio are considered. The exact type I error rates and powers of these tests are evaluated, and the accuracy of the approximate sample size formulae is examined. The test statistic using the restricted maximum likelihood estimators (for the difference test problem) and the test statistic that is based on the logarithmic transformation and employs the maximum likelihood estimators (for the ratio test problem) show favorable type I error control and can be recommended for practical application. The approximate sample size formulae show high accuracy even for small sample sizes and provide power values identical or close to the aspired ones. The methods are illustrated by a clinical trial example from anesthesia.
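For orientation, a hedged sketch of a generic Wald-type (unrestricted-variance) sample-size approximation for the rate-difference case; the paper derives its own formulae, which this does not reproduce:

```python
import math
from scipy.stats import norm

def n_per_group(lam_e, lam_c, delta, t_e=1.0, t_c=1.0, alpha=0.025, power=0.9):
    """Sample size per group for the one-sided H0: lam_e - lam_c <= -delta."""
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(power)
    var1 = lam_e / t_e + lam_c / t_c   # variance of the rate difference for n = 1
    effect = lam_e - lam_c + delta     # distance from the noninferiority boundary
    return math.ceil((z_a + z_b) ** 2 * var1 / effect ** 2)

# Equal true rates of 0.4 events/year, margin 0.15, two years of follow-up:
print(n_per_group(0.4, 0.4, 0.15, t_e=2.0, t_c=2.0))   # -> 187 per group
```

The follow-up times t_e and t_c enter the variance term, which is how unequal follow-up schemes affect the required sample size in this approximation.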

6.
Coalescent likelihood is the probability of observing the given population sequences under the coalescent model. Computation of coalescent likelihood under the infinite sites model is a classic problem in coalescent theory. Existing methods are based on either importance sampling or Markov chain Monte Carlo and are inexact. In this paper, we develop a simple method that can compute the exact coalescent likelihood for many data sets of moderate size, including real biological data whose likelihood was previously thought to be difficult to compute exactly. Our method works for both panmictic and subdivided populations. Simulations demonstrate that the practical range of exact coalescent likelihood computation for panmictic populations is significantly larger than what was previously believed. We investigate the application of our method in estimating mutation rates by maximum likelihood. A main application of the exact method is comparing the accuracy of approximate methods. To demonstrate the usefulness of the exact method, we evaluate the accuracy of program Genetree in computing the likelihood for subdivided populations.

7.
Many methods are available for estimating ancestral values of continuous characteristics, but little is known about how well these methods perform. Here we compare six methods: linear parsimony, squared-change parsimony, one-parameter maximum likelihood (Brownian motion), two-parameter maximum likelihood (Ornstein-Uhlenbeck process), and independent comparisons with and without branch-length information. We apply these methods to data from 20 morphospecies of Pleistocene planktic Foraminifera in order to estimate ancestral size and shape variables, and compare these estimates with measurements on fossils close to the phylogenetic position of 13 ancestors. No method produced accurate estimates for any variable: estimates were consistently worse predictors of the observed values than were the averages of the observed values. The two-parameter maximum-likelihood model consistently produced the most accurate size estimates overall. Estimation of ancestral sizes is confounded by an evolutionary trend towards increasing size. Shape showed no trend but was still estimated very poorly; we consider possible reasons. We discuss the implications of our results for the use of estimates of ancestral characteristics.
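A small sketch of one of the six methods, squared-change parsimony weighted by branch lengths (which, for a Gaussian Brownian-motion model, yields the same point estimates as maximum likelihood); the tree, branch lengths, and tip values below are made up:

```python
import numpy as np

edges = [("r", "a", 1.0), ("r", "b", 2.0),
         ("a", "t1", 1.0), ("a", "t2", 0.5),
         ("b", "t3", 1.5), ("b", "t4", 1.0)]      # (parent, child, branch length)
tip_value = {"t1": 3.1, "t2": 2.7, "t3": 5.0, "t4": 4.4}   # e.g. log body size
internal = ["r", "a", "b"]
idx = {v: i for i, v in enumerate(internal)}

# Minimising sum over edges of (x_parent - x_child)^2 / length gives a linear
# system: each internal node equals the 1/length-weighted mean of its neighbours.
A = np.zeros((len(internal), len(internal)))
b = np.zeros(len(internal))
for p, c, ln in edges:
    w = 1.0 / ln
    A[idx[p], idx[p]] += w
    if c in idx:                      # internal-internal edge
        A[idx[p], idx[c]] -= w
        A[idx[c], idx[c]] += w
        A[idx[c], idx[p]] -= w
    else:                             # edge to a tip with an observed value
        b[idx[p]] += w * tip_value[c]

x = np.linalg.solve(A, b)
print({v: round(x[idx[v]], 3) for v in internal})  # ancestral estimates
```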

8.
Marginal regression analysis of a multivariate binary response
We propose the use of the mean parameter for regression analysis of a multivariate binary response. We model the association using dependence ratios defined in terms of the mean parameter, the components of which are the joint success probabilities of all orders. This permits flexible modelling of higher-order associations, using maximum likelihood estimation. We reanalyse two data sets, one with variable cluster size and the other a longitudinal data set with constant cluster size.

9.
Spatial scan statistics are widely used in various fields. The performance of these statistics is influenced by parameters, such as maximum spatial cluster size, and can be improved by parameter selection using performance measures. Current performance measures are based on the presence of clusters and are thus inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure called maximum clustering set–proportion (MCS-P), which is based on the likelihood of the union of detected clusters and the applied dataset. MCS-P was compared with existing performance measures in a simulation study to select the maximum spatial cluster size. Results of other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios with the maximum spatial cluster sizes selected using MCS-P. Given that previously known clusters are not required in the proposed strategy, selection of the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.

10.
Motivated by the spatial modeling of aberrant crypt foci (ACF) in colon carcinogenesis, we consider binary data with probabilities modeled as the sum of a nonparametric mean plus a latent Gaussian spatial process that accounts for short-range dependencies. The mean is modeled in a general way using regression splines. The mean function can be viewed as a fixed effect and is estimated with a penalty for regularization. With the latent process viewed as another random effect, the model becomes a generalized linear mixed model. In our motivating data set and other applications, the sample size is too large to easily accommodate maximum likelihood or restricted maximum likelihood (REML) estimation, so pairwise likelihood, a special case of composite likelihood, is used instead. We develop an asymptotic theory for models that are sufficiently general to be used in a wide variety of applications, including, but not limited to, the problem that motivated this work. The splines have penalty parameters that must converge to zero asymptotically; we derive theory for this along with a data-driven method for selecting the penalty parameter, a method that is shown in simulations to improve greatly upon standard devices such as likelihood cross-validation. Finally, we apply the methods to the data from our ACF experiment and discover an unexpected location for peak formation of ACF.
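A bare-bones sketch of the pairwise-likelihood idea for spatially correlated binary data, using probit-style latent Gaussian pair probabilities. The exponential correlation function, the fixed mean eta, and all names are assumptions standing in for the paper's penalised-spline mean and short-range latent process:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def pair_loglik(params, y, eta, coords, max_dist=2.0):
    """Sum of log P(y_i, y_j) over pairs of sites closer than max_dist."""
    rho0, scale = params                 # sill and range of the latent process
    total = 0.0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            d = np.linalg.norm(coords[i] - coords[j])
            if d > max_dist:
                continue                 # short-range dependence only
            r = rho0 * np.exp(-d / scale)        # latent correlation
            p11 = multivariate_normal.cdf([eta[i], eta[j]],
                                          cov=[[1.0, r], [r, 1.0]])
            p1, p2 = norm.cdf(eta[i]), norm.cdf(eta[j])
            table = {(1, 1): p11, (1, 0): p1 - p11,
                     (0, 1): p2 - p11, (0, 0): 1 - p1 - p2 + p11}
            total += np.log(table[(y[i], y[j])])
    return total

# Toy demo (outcomes drawn independently here, ignoring correlation):
rng = np.random.default_rng(0)
coords = rng.uniform(0, 4, size=(20, 2))
eta = np.full(20, -0.3)                  # constant mean on the probit scale
y = (rng.uniform(size=20) < norm.cdf(eta)).astype(int)
print(pair_loglik((0.6, 1.0), y, eta, coords))
```

Maximising this over (rho0, scale), e.g. by passing its negative to scipy.optimize.minimize, is the composite-likelihood replacement for the intractable full likelihood.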

11.
MOTIVATION: The computation of large phylogenetic trees with statistical models such as maximum likelihood or Bayesian inference is computationally extremely intensive. It has repeatedly been demonstrated that these models are able to recover the true tree, or a tree which is topologically closer to the true tree, more frequently than less elaborate methods such as parsimony or neighbor joining. Due to the combinatorial and computational complexity, the size of trees which can be computed on a biologist's PC workstation within reasonable time is limited to approximately 100 taxa. RESULTS: In this paper we present the latest release of our program RAxML-III for rapid maximum likelihood-based inference of large evolutionary trees, which allows for computation of 1,000-taxon trees in less than 24 hours on a single PC processor. We compare RAxML-III to the currently fastest implementations for maximum likelihood and Bayesian inference: PHYML and MrBayes. Whereas RAxML-III performs worse than PHYML and MrBayes on synthetic data, it clearly outperforms both programs on all real data alignments used, in terms of both speed and final likelihood values. AVAILABILITY AND SUPPLEMENTARY INFORMATION: RAxML-III, including all alignments and final trees mentioned in this paper, is freely available as open source code at http://wwwbode.cs.tum/~stamatak CONTACT: stamatak@cs.tum.edu

12.
The validity of limiting dilution assays can be compromised or negated by the use of statistical methodology which does not consider all issues surrounding the biological process. This study critically evaluates statistical methods for estimating the mean frequency of responding cells in multiple sample limiting dilution assays. We show that methods that pool limiting dilution assay data, or samples, are unable to estimate the variance appropriately. In addition, we use Monte Carlo simulations to evaluate an unweighted mean of the maximum likelihood estimator, an unweighted mean based on the jackknife estimator, and a log transform of the maximum likelihood estimator. For small culture replicate sizes, the log transform outperforms both unweighted mean procedures; for moderate culture replicate sizes, the unweighted mean based on the jackknife produces the most acceptable results. This study also addresses the important issue of experimental design in multiple sample limiting dilution assays. In particular, we demonstrate that optimization of multiple sample limiting dilution assays is achieved by increasing the number of biological samples at the expense of repeat cultures.
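A minimal sketch of the single-hit Poisson model underlying such assays, with the maximum likelihood estimate computed on the log scale (echoing the log-transform estimator evaluated in the study); the dose series and counts are invented:

```python
# A culture seeded with d cells is negative with probability exp(-phi * d),
# where phi is the frequency of responding cells.
import numpy as np
from scipy.optimize import minimize_scalar

doses = np.array([1000., 3000., 10000., 30000.])   # cells per culture
n_cult = np.array([24, 24, 24, 24])                # replicate cultures per dose
n_neg = np.array([22, 17, 6, 1])                   # negative cultures observed

def neg_loglik(log_phi):
    p_neg = np.exp(-np.exp(log_phi) * doses)       # binomial log-likelihood
    return -np.sum(n_neg * np.log(p_neg) + (n_cult - n_neg) * np.log1p(-p_neg))

fit = minimize_scalar(neg_loglik, bounds=(-14, -4), method="bounded")
print(f"phi_hat = 1 responding cell per {1 / np.exp(fit.x):,.0f} cells")
```

The paper's comparisons then average such per-sample estimates across multiple biological samples, which is where the unweighted-mean and jackknife variants enter.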

13.
A critical decision in landscape genetic studies is whether to use individuals or populations as the sampling unit. This decision affects the time and cost of sampling and may affect ecological inference. We analyzed 334 Columbia spotted frogs at 8 microsatellite loci across 40 sites in northern Idaho to determine how inferences from landscape genetic analyses would vary with sampling design. At all sites, we compared a proportion available sampling scheme (PASS), in which all samples were used, to resampled datasets of 2–11 individuals. Additionally, we compared a population sampling scheme (PSS) to an individual sampling scheme (ISS) at 18 sites with sufficient sample size. We applied an information theoretic approach with both restricted maximum likelihood and maximum likelihood estimation to evaluate competing landscape resistance hypotheses. We found that PSS supported low-density forest when restricted maximum likelihood was used, but a combination model of most variables when maximum likelihood was used; model support also varied when AIC was used instead of BIC. When testing hypotheses about which land cover types create the greatest resistance to gene flow for Columbia spotted frogs, ISS supported this model as well as additional models. Increased sampling density and study extent, seen by comparing PSS to PASS, changed model support. As the number of individuals increased, model support under ISS converged to that under PSS at 7–9 individuals. ISS may be useful for increasing study extent and sampling density, but may lack the power to provide strong support for the correct model with microsatellite datasets. Our results highlight the importance of additional research on the effects of sampling design on landscape genetics inference.

14.
A simulation study was performed to investigate the effects of missing values, typing errors and distorted segregation ratios in molecular marker data on the construction of genetic linkage maps, and to compare the performance of three locus-ordering criteria (weighted least squares, maximum likelihood and minimum sum of adjacent recombination fractions) in the presence of such effects. The study was based upon three linkage groups of 10 loci at 2, 6, and 10 cM spacings, simulated from a doubled-haploid population of size 150. Criterion performance was assessed using the number of replicates with correctly estimated orders, the mean rank correlation between the estimated and the true order, and the mean total map length. Bootstrap samples from replicates in the maximum likelihood analysis produced a measure of confidence in the estimated locus order. Missing values and/or typing errors in the data reduce the proportion of correctly ordered maps, and the problem worsens as the distance between loci decreases. The maximum likelihood criterion is most successful at ordering loci correctly, but gives estimated map lengths which are substantially inflated when typing errors are present. The presence of missing values in the data produces shorter map lengths for more widely spaced markers, especially under the weighted least-squares criterion. Overall, the presence of segregation distortion has little effect in this population.
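A sketch of the simplest of the three criteria, minimum sum of adjacent recombination fractions (SARF), evaluated by brute force; the recombination-fraction matrix is invented, and real maps require heuristic search rather than full enumeration:

```python
import itertools
import numpy as np

rf = np.array([[0.00, 0.10, 0.18, 0.26],
               [0.10, 0.00, 0.09, 0.20],
               [0.18, 0.09, 0.00, 0.12],
               [0.26, 0.20, 0.12, 0.00]])   # pairwise recombination fractions

def sarf(order):
    """Sum of recombination fractions between adjacent loci in this order."""
    return sum(rf[a, b] for a, b in zip(order, order[1:]))

best = min(itertools.permutations(range(4)), key=sarf)
print(best, round(sarf(best), 3))   # (0, 1, 2, 3), or its reverse, wins
```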

15.
It is known that under neutral mutation at a known mutation rate, a sample of nucleotide sequences within which there is assumed to be no recombination allows estimation of the effective size of an isolated population. This paper investigates the case of very long sequences, where each pair of sequences allows a precise estimate of the divergence time of those two gene copies. The average divergence time of all pairs of copies estimates twice the effective population number, and an estimate can also be derived from the number of segregating sites. One can alternatively estimate the genealogy of the copies. This paper shows how a maximum likelihood estimate of the effective population number can be derived from such a genealogical tree. The pairwise and segregating-sites estimates are shown to be much less efficient than this maximum likelihood estimate, and this is verified by computer simulation. The result implies that there is much to gain by explicitly taking the tree structure of these genealogies into account.
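A short sketch of the two estimators shown here to be inefficient relative to the genealogy-based maximum likelihood estimate: Watterson's segregating-sites estimator and the average-pairwise-difference estimator, computed on toy 0/1 site patterns:

```python
import itertools
import numpy as np

seqs = np.array([[0, 1, 0, 0, 1, 0],
                 [0, 1, 1, 0, 0, 0],
                 [1, 0, 1, 0, 0, 1],
                 [0, 0, 1, 1, 0, 1]])   # rows: sequences, columns: sites
n = len(seqs)

S = np.sum(seqs.min(axis=0) != seqs.max(axis=0))   # number of segregating sites
a_n = sum(1.0 / k for k in range(1, n))            # harmonic number H_{n-1}
theta_W = S / a_n                                  # Watterson's estimator

# Average number of pairwise differences across all sequence pairs:
pi = np.mean([np.sum(x != y) for x, y in itertools.combinations(seqs, 2)])
print(f"theta_W = {theta_W:.2f}, theta_pi = {pi:.2f}")
```

Both are simple moment estimators; the paper's point is that a maximum likelihood estimate computed on the estimated genealogy itself extracts substantially more information from the same data.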

16.
Cope's rule describes lineages that trend towards an increase in body size through geological time. The trilobite family Asaphidae is one of the most diverse of the class Trilobita and ranges from the Upper Cambrian through to the Upper Ordovician. The group is one of the trilobite clades that display a large size range, and it contains several of the largest trilobite species. Reduced major axis correlations between the lengths of cephala and pygidia and the total sagittal length of complete individuals have high support and were used to standardise all incomplete specimens to total axial length. Phylogenetic studies of Cope's rule tend to use supertrees, composite trees or a single tree selected through a fit criterion; here, for the first time, all trees recovered from a maximum parsimony analysis were analysed equally. Maximum likelihood was used to fit four evolutionary models: random walk, directional, Ornstein–Uhlenbeck (evolution towards an adaptive optimum) and stasis. These were compared using Akaike weights. Model fitting by maximum likelihood consistently supports stasis as the most likely model across all trees, with low support for directionality.
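A small sketch of the model-comparison step, computing Akaike weights from log-likelihoods and parameter counts; the numbers are placeholders chosen so that stasis wins, mirroring the stated result, and are not values from the paper:

```python
import numpy as np

models = {"random walk": (-41.2, 2), "directional": (-40.9, 3),
          "OU": (-40.5, 4), "stasis": (-37.8, 2)}   # (log L, n parameters)

aic = {m: 2 * k - 2 * ll for m, (ll, k) in models.items()}
delta = {m: a - min(aic.values()) for m, a in aic.items()}   # AIC differences
w = {m: np.exp(-d / 2) for m, d in delta.items()}
total = sum(w.values())                                      # normalising constant
for m in models:
    print(f"{m:12s} AIC={aic[m]:6.1f}  weight={w[m] / total:.3f}")
```

Averaging such weights across every tree from the parsimony analysis, rather than a single chosen tree, is the methodological novelty the abstract emphasises.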

17.
This paper introduces a dose-response model for teratological quantal response data in which the probability of response for an offspring from a female at a given dose varies with the litter size. The maximum likelihood estimators for the parameters of the model are obtained as the solution of a nonlinear iterative algorithm. Two methods of low-dose extrapolation are presented, one based on the litter size distribution and the other a conservative method. The resulting procedures are then applied to a teratological data set from the literature.

18.
W. W. Hauck. Biometrics 1984, 40(4):1117–1123
The finite-sample properties of various point estimators of a common odds ratio from multiple 2 × 2 tables have been considered in a number of simulation studies. However, the conditional maximum likelihood estimator has received only limited attention. That omission is partially rectified here for cases of relatively small numbers of tables and moderate to large within-table sample sizes. The conditional maximum likelihood estimator is found to be superior to the unconditional maximum likelihood estimator, and equal or superior to the Mantel-Haenszel estimator in both bias and precision.
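A sketch of the conditional maximum likelihood estimator itself: conditioning on both margins of each 2 × 2 table leaves a noncentral hypergeometric likelihood in the common odds ratio psi, maximised numerically here over invented tables:

```python
import numpy as np
from scipy.special import comb
from scipy.optimize import minimize_scalar

# Each table is (a, b, c, d): row 1 = (successes, failures) in group 1, etc.
tables = [(12, 8, 5, 15), (7, 13, 4, 16), (20, 10, 11, 19)]

def cond_loglik(log_psi):
    psi, total = np.exp(log_psi), 0.0
    for a, b, c, d in tables:
        n1, n2, m1 = a + b, c + d, a + c         # fixed margins
        lo, hi = max(0, m1 - n2), min(n1, m1)    # support of a given the margins
        u = np.arange(lo, hi + 1)
        denom = (comb(n1, u) * comb(n2, m1 - u) * psi ** u).sum()
        total += a * np.log(psi) + np.log(comb(n1, a) * comb(n2, c)) \
                 - np.log(denom)
    return total

fit = minimize_scalar(lambda x: -cond_loglik(x), bounds=(-3, 3), method="bounded")
print(f"conditional MLE of the common odds ratio: {np.exp(fit.x):.2f}")
```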

19.
Maximum likelihood estimation of the model parameters for a spatial population based on data collected from a survey sample is usually straightforward when sampling and non-response are both non-informative, since the model can then be fitted using the available sample data, with no allowance necessary for the fact that only a part of the population has been observed. Although for many regression models this naive strategy yields consistent estimates, this is not the case for some models, such as spatial auto-regressive models. In this paper, we show that for a broad class of such models, a maximum marginal likelihood approach that uses both sample and population data leads to more efficient estimates, since it uses spatial information from sampled as well as non-sampled units. Extensive simulation experiments based on two well-known data sets are used to assess the impact of the spatial sampling design, the auto-correlation parameter and the sample size on the performance of this approach. When compared to some widely used methods that use only sample data, the results from these experiments show that the maximum marginal likelihood approach is much more precise.

20.
Composite likelihood methods have become very popular for the analysis of large-scale genomic data sets because of the computational intractability of the basic coalescent process and its generalizations: it is virtually impossible to calculate the likelihood of an observed data set spanning a large chromosomal region without using approximate or heuristic methods. Composite likelihood methods are approximate methods and, in the present article, assume the likelihood can be written as a product of likelihoods, one for each of a number of smaller regions that together make up the whole region from which the data are collected. A very general framework for neutral coalescent models is presented and discussed; it comprises many of the most popular coalescent models currently used for the analysis of genetic data. Assuming the data are collected from a series of consecutive regions of equal size, it is shown that the observed data form a stationary, ergodic process. General conditions are given under which the maximum composite likelihood estimator of the parameters describing the model (e.g. mutation rates, demographic parameters and the recombination rate) is consistent as the number of regions tends to infinity.
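A toy illustration of the consistency statement, under a drastic simplification in which each region contributes an independent Poisson-distributed summary statistic; the per-region constant c and all numbers are assumptions, not part of the paper's coalescent framework:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true, c = 5.0, 1.8    # c: known per-region constant of proportionality

for n_regions in (10, 100, 1000, 10000):
    s = rng.poisson(theta_true * c, size=n_regions)   # per-region summaries
    # The composite log-likelihood sum_r [s_r * log(c*theta) - c*theta] is
    # maximised in closed form by theta_hat = mean(s) / c.
    print(f"{n_regions:6d} regions: theta_hat = {s.mean() / c:.3f}")
```

The estimate tightens around theta_true as the number of regions grows, which is the "consistency as the number of regions tends to infinity" behaviour, here in its simplest possible form.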
