Similar literature
1.
We would like to use maximum likelihood to estimate parameters such as the effective population size N(e) or, if we do not know mutation rates, the product 4N(e)μ of mutation rate per site and effective population size. To compute the likelihood for a sample of unrecombined nucleotide sequences taken from a random-mating population it is necessary to sum over all genealogies that could have led to the sequences, computing for each one the probability that it would have yielded the sequences, and weighting each one by its prior probability. The genealogies vary in tree topology and in branch lengths. Although the likelihood and the prior are straightforward to compute, the summation over all genealogies seems at first sight hopelessly difficult. This paper reports that it is possible to carry out a Monte Carlo integration to evaluate the likelihoods approximately. The method uses bootstrap sampling of sites to create data sets for each of which a maximum likelihood tree is estimated. The resulting trees are assumed to be sampled from a distribution whose height is proportional to the likelihood surface for the full data. That it will be so is dependent on a theorem which is not proven, but seems likely to be true if the sequences are not short. One can use the resulting estimated likelihood curve to make a maximum likelihood estimate of the parameter of interest, N(e) or 4N(e)μ. The method requires at least 100 times the computational effort required for estimation of a phylogeny by maximum likelihood, but is practical on today's workstations. The method does not at present have any way of dealing with recombination.

2.
Mutation primarily occurs when cells divide, and it is highly desirable to know the mutation rate for each of the cell divisions during individual development. Recently, recessive lethal or nearly lethal mutations observed in a large mutation accumulation experiment using Drosophila melanogaster suggested that mutation rates vary significantly during the germline development of male Drosophila melanogaster. The analysis of the data was based on a combination of the maximum likelihood framework with numerical assistance from a newly developed coalescent algorithm. Although powerful, the likelihood-based framework is computationally highly demanding, which limited the scope of the inference. This paper presents a new estimation approach based on minimizing a chi-square statistic, which is asymptotically consistent with the maximum likelihood method. When at most one mutation per family is considered, the minimization of chi-square simplifies to a constrained weighted least-squares problem that can be solved easily by optimization theory. The new method effectively eliminates the computational bottleneck of the likelihood. Reanalysis of the published Drosophila melanogaster mutation data results in similar estimates of mutation rates. The new method is also expected to be applicable to the analysis of mutation data generated by next-generation sequencing technology.

3.
Single nucleotide polymorphism (SNP) data can be used for parameter estimation via maximum likelihood methods as long as the way in which the SNPs were determined is known, so that an appropriate likelihood formula can be constructed. We present such likelihoods for several sampling methods. As a test of these approaches, we consider use of SNPs to estimate the parameter Theta = 4N(e)μ (the scaled product of effective population size and per-site mutation rate), which is related to the branch lengths of the reconstructed genealogy. With infinite amounts of data, ML models using SNP data are expected to produce consistent estimates of Theta. With finite amounts of data the estimates are accurate when Theta is high, but tend to be biased upward when Theta is low. If recombination is present and not allowed for in the analysis, the results are additionally biased upward, but this effect can be removed by incorporating recombination into the analysis. SNPs defined as sites that are polymorphic in the actual sample under consideration (sample SNPs) are somewhat more accurate for estimation of Theta than SNPs defined by their polymorphism in a panel chosen from the same population (panel SNPs). Misrepresenting panel SNPs as sample SNPs leads to large errors in the maximum likelihood estimate of Theta. Researchers collecting SNPs should collect and preserve information about the method of ascertainment so that the data can be accurately analyzed.

4.
We evaluate the performance of maximum likelihood (ML) analysis of allele frequency data in a linear array of populations. The parameters are a mutation rate and either the dispersal rate in a stepping stone model or a dispersal rate and a scale parameter in a geometric dispersal model. An approximate procedure known as maximum product of approximate conditional (PAC) likelihood is found to perform as well as ML. Mis-specification biases may occur because the importance sampling algorithm is formally defined in terms of mutation and migration rates scaled by the total size of the population, and this size may differ widely in the statistical model and in reality. As could be expected, ML generally performs well when the statistical model is correctly specified. Otherwise, mutation rate estimates are much closer to mutation probability scaled by number of demes in the statistical model than scaled by number of demes in reality when mutation probability is high and dispersal is most limited. This mis-specification bias actually has practical benefits. However, opposite results are found in opposite conditions. Migration rate estimates show roughly similar trends, but they may not always be easily interpreted as low-bias estimates of dispersal rate under any scaling. Estimation of the dispersal scale parameter is also affected by mis-specification of the number of demes, and the different biases compensate each other in such a way that good estimation of the so-called neighborhood size (or more precisely the product of population density and mean-squared parent-offspring dispersal distance) is achieved. Results congruent with these findings are found in an application to a damselfly data set.

5.
6.
Sexual dimorphism describes substantial differences between male and female phenotypes. In spiders, sexual dimorphism research almost exclusively focuses on size, and recent studies have recovered steady evolutionary size increases in females, and independent evolutionary size changes in males. Their discordance is due to negative allometric size patterns caused by different selection pressures on male and female sizes (converse Rensch's rule). Here, we investigated macroevolutionary patterns of sexual size dimorphism (SSD) in Argiopinae, a global lineage of orb‐weaving spiders with varying degrees of SSD. We devised a Bayesian and maximum‐likelihood molecular species‐level phylogeny, and then used it to reconstruct sex‐specific size evolution, to examine general hypotheses and different models of size evolution, to test for sexual size coevolution, and to examine allometric patterns of SSD. Our results, revealing ancestral moderate sizes and SSD, failed to reject the Brownian motion model, which suggests a nondirectional size evolution. Contrary to predictions, male and female sizes were phylogenetically correlated, and SSD evolution was isometric. We interpret these results to question the classical explanations of female‐biased SSD via fecundity, gravity, and differential mortality. In argiopines, SSD evolution may be driven by these or additional selection mechanisms, but perhaps at different phylogenetic scales.

7.
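Interest in estimating deleterious gene mutation (DGM) from mutation accumulation (MA) experiments has recently grown considerably. Two methods are commonly used to estimate DGM in MA experiments: maximum likelihood (ML) and the method of moments (MM). The two methods are compared here using computer simulation and software applied to real data. The conclusions are as follows: maximum likelihood estimates (MLEs) are difficult to obtain, so ML is less effective than MM; even when MLEs can be obtained, they are biased because of severe small-sample error (in terms of bias and sampling variance); and the likelihood surface is rather flat, which makes it difficult to distinguish leptokurtic from platykurtic distributions.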

8.
Sexual size dimorphisms (SSDs) in body size are expected to evolve when selection on female and male sizes favors different optima. Many insects show female-biased SSD that is usually explained by the strong fecundity advantage of larger females. However, in some insects, males are as large as or even larger than females. The seed bug Togo hemipterus (Scott) also exhibits a male-biased SSD in body size. Many studies that have clarified the evolutionary causes of male-biased SSD have focused only on male advantages due to male–male competition. To clarify the evolutionary causes of male-biased SSD in body size, we should examine the degree of not only the sexual selection that favors larger males but also the natural selection that acts on female fecundity. The obtained results, which showed higher mating acceptance rates for larger males, imply that females prefer larger males. No significant relationship was detected between female body size and fecundity; body size effects on female fecundity were weak or undetectable. We conclude that male-biased SSD in T. hemipterus can be accounted for by a combination of sexual selection through male–male competition and female choice favoring large males, plus weak or undetectable natural selection that favors large females due to a fecundity advantage.

9.
The ratio of singletons to the total number of segregating sites is used to estimate a reproduction parameter in a population model of large offspring numbers without having to jointly estimate the mutation rate. For neutral genetic variation, the ratio of singletons to the total number of segregating sites is equivalent to the ratio of total length of external branches to the total length of the gene genealogy. A multinomial maximum likelihood method that takes into account more frequency classes than just the singletons is developed to estimate the parameter of another large offspring number model. The performance of these methods with regard to sample size, mutation rate, and bias is investigated by simulation. The expected value of the ratio of the total length of external branches to the total length of the whole tree is, using simulation, shown to decrease for the Kingman coalescent as sample size increases, but can increase or decrease, depending on parameter values, for Λ coalescents. Considering ratios of tree statistics, as opposed to considering lengths of various subtrees separately, can yield better insight into the dynamics of gene genealogies.
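The simulation result for the Kingman coalescent mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the default replicate count, and the seed are assumptions made for illustration, and only the standard Kingman coalescent (pairwise merging, exponential waiting times with rate k(k-1)/2 while k lineages remain) is simulated, not a Λ coalescent.

```python
import random

def external_to_total_ratio(n, rng):
    """Simulate one Kingman coalescent tree for n samples and return
    (total length of external branches) / (total tree length)."""
    # is_leaf[i] is True while lineage i is still an original sample (external branch).
    is_leaf = [True] * n
    external = 0.0
    total = 0.0
    k = n
    while k > 1:
        # Waiting time while k lineages exist: exponential with rate k(k-1)/2.
        t = rng.expovariate(k * (k - 1) / 2)
        total += k * t
        external += sum(is_leaf) * t
        # Merge two lineages chosen uniformly at random; their parent is internal.
        i, j = rng.sample(range(k), 2)
        for idx in sorted((i, j), reverse=True):
            is_leaf.pop(idx)
        is_leaf.append(False)
        k -= 1
    return external / total

def mean_ratio(n, reps=2000, seed=1):
    """Monte Carlo estimate of the expected ratio (hypothetical helper)."""
    rng = random.Random(seed)
    return sum(external_to_total_ratio(n, rng) for _ in range(reps)) / reps

if __name__ == "__main__":
    for n in (5, 20, 50):
        print(n, round(mean_ratio(n), 3))
```

Running the script prints the estimated expected ratio for a few sample sizes; under these assumptions the estimate shrinks as n grows, in line with the trend the abstract reports for the Kingman case.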

10.
Fluctuation analysis is the most widely used approach in estimating microbial mutation rates. Development of methods for point and interval estimation of mutation rates has long been hampered by lack of closed form expressions for the probability mass function of the number of mutants in a parallel culture. This paper uses sequence convolution to derive exact algorithms for computing the score function and observed Fisher information, leading to efficient computation of maximum likelihood estimates and profile likelihood based confidence intervals for the expected number of mutations occurring in a test tube. These algorithms and their implementation in SALVADOR 2.0 facilitate routine use of modern statistical techniques in fluctuation analysis by biologists engaged in mutation research.
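To make the idea of likelihood-based estimation in fluctuation analysis concrete, here is a rough sketch. It does not reproduce the score-function or Fisher-information algorithms of the paper, nor the SALVADOR 2.0 interface. It assumes the Lea-Coulson form of the Luria-Delbrück distribution, computes its probability mass function with the well-known recursion p_0 = exp(-m), p_n = (m/n) Σ_{i<n} p_i/(n-i+1), and finds the MLE of m (expected number of mutations per culture) by a crude grid search. All function names and grid settings are illustrative.

```python
import math

def ld_pmf(m, n_max):
    """P(0), ..., P(n_max) mutants per culture under the Lea-Coulson model."""
    p = [math.exp(-m)]
    for n in range(1, n_max + 1):
        s = sum(p[i] / (n - i + 1) for i in range(n))
        p.append(m / n * s)
    return p

def log_likelihood(m, counts):
    """Log-likelihood of m given observed mutant counts from parallel cultures."""
    p = ld_pmf(m, max(counts))
    return sum(math.log(p[c]) for c in counts)

def mle_m(counts, lo=0.01, hi=20.0, steps=2000):
    """Grid-search MLE of m; a stand-in for a proper Newton-type optimizer."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda m: log_likelihood(m, counts))

if __name__ == "__main__":
    observed = [0, 1, 0, 3, 0, 2, 12, 0, 1, 5]  # made-up mutant counts
    print(mle_m(observed))
```

A grid search is used only to keep the sketch short; an exact score function, as described in the abstract, would allow Newton-type iteration and likelihood-based confidence intervals.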

11.
When a beneficial mutation is fixed in a population that lacks recombination, the genetic background linked to that mutation is fixed. As a result, beneficial mutations on different backgrounds experience competition, or "clonal interference," that can cause asexual populations to evolve more slowly than their sexual counterparts. Factors such as a large population size (N) and high mutation rates (μ) increase the number of competing beneficial mutations, and hence are expected to increase the intensity of clonal interference. However, recent theory suggests that, with very large values of Nμ, the severity of clonal interference may instead decline. The reason is that, with large Nμ, genomes including both beneficial mutations are rapidly created by recurrent mutation, obviating the need for recombination. Here, we analyze data from experimentally evolved asexual populations of a bacteriophage and find that, in these nonrecombining populations with very large Nμ, recurrent mutation does appear to ameliorate this cost of asexuality.

12.
When the sample size is not large or when the underlying disease is rare, to assure collection of an appropriate number of cases and to control the relative error of estimation, one may employ inverse sampling, in which one continues sampling subjects until one obtains exactly the desired number of cases. This paper focuses on interval estimation of the simple difference between two proportions under independent inverse sampling. This paper develops three asymptotic interval estimators on the basis of the maximum likelihood estimator (MLE), the uniformly minimum variance unbiased estimator (UMVUE), and the asymptotic likelihood ratio test (ALRT). To compare the performance of these three estimators, this paper calculates the coverage probability and the expected length of the resulting confidence intervals on the basis of the exact distribution. This paper finds that when the underlying proportions of cases in both comparison populations are small or moderate (≤0.20), all three asymptotic interval estimators developed here perform reasonably well even when the pre-determined number of cases is as small as 5. When the pre-determined number of cases is moderate or large (≥50), all three estimators are essentially equivalent in all the situations considered here. Because application of the two interval estimators derived from the MLE and the UMVUE does not involve any numerical iterative procedure needed in the ALRT, for simplicity we may use these two estimators without losing efficiency.
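The point estimators that underlie two of the interval estimators above can be sketched as follows. This is an illustration only and does not reproduce the paper's interval formulas. It assumes the standard results for inverse (negative binomial) sampling: with x cases fixed in advance and y non-cases observed, the MLE of the case proportion is x/(x+y) and the UMVUE is (x-1)/(x+y-1) for x ≥ 2. The function names and the small simulator are hypothetical.

```python
import random

def mle_proportion(x, y):
    """MLE of the case proportion: x cases (fixed in advance), y non-cases observed."""
    return x / (x + y)

def umvue_proportion(x, y):
    """UMVUE of the case proportion under inverse sampling (requires x >= 2)."""
    if x < 2:
        raise ValueError("the UMVUE needs at least 2 pre-determined cases")
    return (x - 1) / (x + y - 1)

def draw_noncases(x, p, rng):
    """Sample subjects until exactly x cases appear; return the number of non-cases."""
    cases = noncases = 0
    while cases < x:
        if rng.random() < p:
            cases += 1
        else:
            noncases += 1
    return noncases

def difference_estimates(x1, y1, x2, y2):
    """Point estimates of p1 - p2 from two independent inverse samples: (MLE, UMVUE)."""
    return (mle_proportion(x1, y1) - mle_proportion(x2, y2),
            umvue_proportion(x1, y1) - umvue_proportion(x2, y2))

if __name__ == "__main__":
    rng = random.Random(0)
    y1 = draw_noncases(5, 0.10, rng)
    y2 = draw_noncases(5, 0.20, rng)
    print(difference_estimates(5, y1, 5, y2))
```

With a small pre-determined number of cases the MLE tends to overestimate each proportion while the UMVUE does not, which is one reason both appear as starting points for interval construction.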

13.
A central task in the study of molecular evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. The most established approach to tree reconstruction is maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally prohibitive for large data sets. In this paper, we describe a new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees. This algorithm is similar to the standard EM method for edge-length estimation, except that during iterations of the Structural EM algorithm the topology is improved as well as the edge length. Our algorithm performs iterations of two steps. In the E-step, we use the current tree topology and edge lengths to compute expected sufficient statistics, which summarize the data. In the M-Step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. We show that searching for better topologies inside the M-step can be done efficiently, as opposed to standard methods for topology search. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. This convergence point, however, can be a suboptimal one. To escape from such "local optima," we further enhance our basic EM procedure by incorporating moves in the flavor of simulated annealing. We evaluate these new algorithms on both synthetic and real sequence data and show that for protein sequences even our basic algorithm finds more plausible trees than existing methods for searching maximum likelihood phylogenies. Furthermore, our algorithms are dramatically faster than such methods, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.

14.
J A Koziol 《Mutation research》1991,249(1):275-280
The maximum likelihood and Luria-Delbrück P0 methods for the estimation of spontaneous mutation rates are compared. The maximum likelihood method is fully efficient, utilizing all available information in a fluctuation experiment, but can be numerically cumbersome. Under certain conditions, there is little loss of efficiency using the P0 method, which is readily implemented numerically. Design considerations should aid investigators in minimizing statistical errors associated with the statistical analysis of fluctuation experiments.
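The P0 method is simple enough to state in a few lines. Under the Luria-Delbrück model the probability that a culture contains no mutants is p0 = exp(-m), so the expected number of mutations per culture can be estimated as m = -ln(z/C), where z of C cultures show no mutants. The sketch below assumes this standard form; the function names are illustrative, and dividing m by the final number of cells per culture is one common convention for converting to a rate, not necessarily the one used in the paper.

```python
import math

def p0_estimate(cultures_without_mutants, total_cultures):
    """Estimate m (mutations per culture) from the fraction of mutant-free cultures."""
    z, c = cultures_without_mutants, total_cultures
    if not 0 < z < c:
        # z = 0 gives an infinite estimate and z = c an uninformative zero.
        raise ValueError("need 0 < cultures_without_mutants < total_cultures")
    return -math.log(z / c)

def mutation_rate(m, final_cells_per_culture):
    """Convert m to a per-cell rate (assumed convention: m divided by final cell number)."""
    return m / final_cells_per_culture

if __name__ == "__main__":
    m = p0_estimate(10, 20)
    print(m, mutation_rate(m, 2e8))
```

Because the estimate uses only the proportion of mutant-free cultures, it discards the information in the nonzero counts, which is the efficiency trade-off the abstract discusses.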

15.
In this paper we consider a cell population, such as bacteria, consisting of two types of cells, mutant and nonmutant. Under the mutation and homogeneous pure birth processes, this paper derives a maximum likelihood estimation procedure for estimating mutation rate and birth rate. The method is applied to Newcombe's data; in addition, some Monte Carlo studies are carried out. The numerical results indicate that the method is quite efficient for estimating genetic parameters in cell populations.

16.
宋丽丽  白中科  樊翔  孙鹏旸  卫怡 《生态学报》2018,38(4):1272-1283
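The accuracy of vegetation cover measurement largely determines whether research conclusions are scientifically sound. In arid and semi-arid degraded grasslands, especially mining areas that have been heavily disturbed by mining, biological soil crusts (BSCs) develop whose color and spectrum resemble those of green vegetation, which affects the measurement of vegetation cover. Taking the Yimin open-pit mining area as the study area, we collected four groups each of quadrat photographs containing moss crust, lichen crust, and algal crust at the west dump and the inner dump (each group contains one photograph taken before and one taken after the quadrat was sprayed with water), and one group of crust-free quadrat photographs as a control. Vegetation cover was extracted by digital photography using different data processing methods (maximum likelihood classification and RGB thresholding). Comparative experiments were set up to analyze whether BSC affects vegetation cover measurement, how large the effect is, and whether it depends on the water content of the BSC; to compare the strengths and weaknesses of the conventional processing methods; and to investigate whether combining texture features with color information can remove the effect of BSC on extracted vegetation cover values. Conclusions: 1) when vegetation cover is extracted with conventional processing of photographs, the presence of BSC causes the measured vegetation cover to be overestimated; for moss and lichen crusts the effect is more pronounced after wetting than before, whereas the opposite holds for algal crust; 2) among BSCs of the three successional stages, overestimation is most pronounced in quadrats containing moss crust, followed by lichen crust, while quadrats containing algal crust show no clear pattern; 3) the higher the BSC cover and the lower the vegetation cover within a quadrat, the less accurate the vegetation cover measurement, so the effect of BSC should be taken into account when studying areas such as grassland mining areas, where herbaceous cover is low and crusts are well developed; 4) improved extraction methods based on texture information were tested; classification based on texture alone has very low accuracy, whereas classification combining texture information with RGB color information has fairly high accuracy; 5) comparing the two conventional classification methods, RGB thresholding is less accurate than maximum likelihood classification, overestimating vegetation cover by nearly twice as much. Comparing the two improved extraction methods, both effectively improve measurement accuracy, and the texture classification method based on band composition performs best. The four methods rank from most to least accurate as follows: texture combined with RGB > maximum likelihood classification that accounts for biological soil crust > ordinary maximum likelihood classification > RGB thresholding.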

17.
We report a theory that gives the sampling distribution of two-marker haplotypes that are linked to a rare disease mutation. The sampling distribution is generated with successive Monte Carlo realizations of the coalescence of the disease mutation having recombination and marker mutation events placed along the lineage. Given a sample of mutation-bearing, two-marker haplotypes, the maximum likelihood estimate of the location of the disease mutation can be calculated from the generated sampling distribution, provided that one knows enough about the population history in order to model it. The two-marker likelihood method is compared to a single-marker likelihood and a composite likelihood. The two-marker maximum likelihood gives smaller confidence intervals for the location of the disease locus than a comparable single-marker maximum likelihood. The composite likelihood can give biased results and the bias increases as the extent of linkage disequilibrium on mutation-bearing chromosomes decreases. Haplotype configurations exist for which the composite likelihood will fail to place the disease locus in the correct marker interval.

18.
Some arachnids display extreme sexual size dimorphism (SSD) with adult females being several times larger than adult males. One explanation for SSD in species that exhibit pre‐copulatory sexual cannibalism (the female attacks, kills, and consumes the male prior to mating) is that smaller males may be less likely victims of predatory attacks by females. However, in some sexually cannibalistic species SSD is relatively moderate (i.e. males are similar in size to females), suggesting benefits of large male body size. Here, I report the results of an experiment designed to explore the ramifications of body size in mating interactions of the sexually cannibalistic North American fishing spider (Dolomedes triton). Results suggest that male size does not influence courtship behavior, the likelihood of being attacked, or the male's ability to secure a mounting. However, large males were superior at gaining copulations once mounted. Sexual cannibalism may also be predicated on female size. Female condition (mass/cephalothorax area) did not explain any of these behaviors from the copulatory sequence; however, females with a smaller cephalothorax area were more likely to attack courting males. Finally, analysis of the ratio of female size to male size showed that when SSD is weak males are more likely to escape attacks and mate successfully. Results are discussed in light of several hypotheses for sexual cannibalism, and the benefits of large male body size illustrated here are put forth as potential explanations for the relatively moderate extent of SSD found in this sexually cannibalistic species.

19.
A maximum likelihood method for inferring protein phylogeny was developed. It is based on a Markov model that takes into account the unequal transition probabilities among pairs of amino acids and does not assume constancy of rate among different lineages. Therefore, this method is expected to be powerful in inferring phylogeny among distantly related proteins, either orthologous or paralogous, where the evolutionary rate may deviate from constancy. Not only amino acid substitutions but also insertion/deletion events during evolution were incorporated into the Markov model. A simple method for estimating a bootstrap probability for the maximum likelihood tree among alternatives without performing a maximum likelihood estimation for each resampled data set was developed. These methods were applied to amino acid sequence data of a photosynthetic membrane protein, psbA, from photosystem II, and the phylogeny of this protein was discussed in relation to the origin of chloroplasts.

20.
M. K. Kuhner, J. Yamato, J. Felsenstein 《Genetics》1995,140(4):1421-1430
We present a new way to make a maximum likelihood estimate of the parameter 4N(e)μ (effective population size times mutation rate per site, or θ) based on a population sample of molecular sequences. We use a Metropolis-Hastings Markov chain Monte Carlo method to sample genealogies in proportion to the product of their likelihood with respect to the data and their prior probability with respect to a coalescent distribution. A specific value of θ must be chosen to generate the coalescent distribution, but the resulting trees can be used to evaluate the likelihood at other values of θ, generating a likelihood curve. This procedure concentrates sampling on those genealogies that contribute most of the likelihood, allowing estimation of meaningful likelihood curves based on relatively small samples. The method can potentially be extended to cases involving varying population size, recombination, and migration.
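The step of reusing genealogies sampled at one driving value θ0 to evaluate the likelihood at other values of θ can be sketched as an importance-sampling average. The code below is an assumption-laden illustration, not the authors' program. It takes the coalescent prior of a genealogy to be the product over coalescent intervals of (2/θ)·exp(-k(k-1)t_k/θ), with t_k the time during which k lineages exist measured in mutational units, and it approximates L(θ)/L(θ0) by the mean of P(G|θ)/P(G|θ0) over the sampled genealogies. The MCMC sampler itself is not shown, and all names are hypothetical.

```python
import math

def log_coalescent_prior(intervals, theta):
    """Log prior of a genealogy; intervals[i] is the time spent with k = n - i lineages."""
    n = len(intervals) + 1
    logp = 0.0
    for i, t in enumerate(intervals):
        k = n - i
        logp += math.log(2.0 / theta) - k * (k - 1) * t / theta
    return logp

def relative_log_likelihood(sampled, theta, theta0):
    """log[L(theta)/L(theta0)] estimated from genealogies sampled at theta0."""
    logs = [log_coalescent_prior(g, theta) - log_coalescent_prior(g, theta0)
            for g in sampled]
    mx = max(logs)  # log-sum-exp for numerical stability
    return mx + math.log(sum(math.exp(v - mx) for v in logs) / len(logs))

if __name__ == "__main__":
    # Made-up interval times for three sampled genealogies of n = 4 sequences.
    sampled = [[0.002, 0.004, 0.011], [0.001, 0.006, 0.009], [0.003, 0.003, 0.015]]
    for theta in (0.005, 0.01, 0.02, 0.04):
        print(theta, relative_log_likelihood(sampled, theta, theta0=0.01))
```

The curve is only trustworthy near θ0, because genealogies sampled under one driving value become poor importance samples for distant values of θ.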
