期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Maximum likelihood estimation of ordered multinomial parameters

Jewell NP Kalbfleisch JD 《Biostatistics (Oxford, England)》2004,5(2):291-306

The pool adjacent violator algorithm Ayer et al. (1955, The Annals of Mathematical Statistics, 26, 641-647) has long been known to give the maximum likelihood estimator of a series of ordered binomial parameters, based on an independent observation from each distribution (see Barlow et al., 1972, Statistical Inference under Order Restrictions, Wiley, New York). This result has immediate application to estimation of a survival distribution based on current survival status at a set of monitoring times. This paper considers an extended problem of maximum likelihood estimation of a series of 'ordered' multinomial parameters p(i)= (p(1i),p(2i),.,p(mi)) for 1 相似文献

2.

Optimum probabilistic processing in colour perception. I. Colour discrimination.

G Buchsbaum J L Goldstein 《Proceedings of the Royal Society of London. Series B, Containing papers of a Biological character. Royal Society (Great Britain)》1979,205(1159):229-247

A comprehensive account of wavelength discrimination and colour saturation discrimination is given in terms of optimum probabilistic signal detection. The theory is a logical deduction from statistical estimation theory of the visual estimate of the spectral parameters of the stimulus. In place of geometrical concepts associated with colour-space geometry, stimulus discriminability is determined by optimum decision rules given by likelihood ratio tests on statistics that are postulated for the trichromatic responses. The classical line element theory and its formulations are deduced to be discriminability measures between signals. The different mathematical forms of classical theory are shown to correspond to different statistical constraints. 相似文献

3.

Estimating effective population size from samples of sequences: a bootstrap Monte Carlo integration method.

J Felsenstein 《Genetical research》1992,60(3):209-220

We would like to use maximum likelihood to estimate parameters such as the effective population size N(e) or, if we do not know mutation rates, the product 4N(e) mu of mutation rate per site and effective population size. To compute the likelihood for a sample of unrecombined nucleotide sequences taken from a random-mating population it is necessary to sum over all genealogies that could have led to the sequences, computing for each one the probability that it would have yielded the sequences, and weighting each one by its prior probability. The genealogies vary in tree topology and in branch lengths. Although the likelihood and the prior are straightforward to compute, the summation over all genealogies seems at first sight hopelessly difficult. This paper reports that it is possible to carry out a Monte Carlo integration to evaluate the likelihoods approximately. The method uses bootstrap sampling of sites to create data sets for each of which a maximum likelihood tree is estimated. The resulting trees are assumed to be sampled from a distribution whose height is proportional to the likelihood surface for the full data. That it will be so is dependent on a theorem which is not proven, but seems likely to be true if the sequences are not short. One can use the resulting estimated likelihood curve to make a maximum likelihood estimate of the parameter of interest, N(e) or of 4N(e) mu. The method requires at least 100 times the computational effort required for estimation of a phylogeny by maximum likelihood, but is practical on today's work stations. The method does not at present have any way of dealing with recombination. 相似文献

4.

二个区域的封闭种群的标记重捕模型

张南松《生物数学学报》2003,18(1):109-115

通过引入区域的初始比例因子,考虑了二个区域A与B的封闭种群标记重捕模型,再利用完整的极大似然函数和多项分布函数的性质,给出了当个体在不同区域的个体捕捉率相等时的二个区域之间的转移概率与各区域的初始比例的求法,推导出在不同区域的个体捕捉率不相等但个体低转移率条件下二个区域的封闭种群的标记重捕模型的参数表达式,并用实例说明。相似文献

5.

Evaluation of several methods for estimating phylogenetic trees when substitution rates differ over nucleotide sites

Ziheng Yang 《Journal of molecular evolution》1995,40(6):689-697

Several maximum likelihood and distance matrix methods for estimating phylogenetic trees from homologous DNA sequences were compared when substitution rates at sites were assumed to follow a gamma distribution. Computer simulations were performed to estimate the probabilities that various tree estimation methods recover the true tree topology. The case of four species was considered, and a few combinations of parameters were examined. Attention was applied to discriminating among different sources of error in tree reconstruction, i.e., the inconsistency of the tree estimation method, the sampling error in the estimated tree due to limited sequence length, and the sampling error in the estimated probability due to the number of simulations being limited. Compared to the least squares method based on pairwise distance estimates, the joint likelihood analysis is found to be more robust when rate variation over sites is present but ignored and an assumption is thus violated. With limited data, the likelihood method has a much higher probability of recovering the true tree and is therefore more efficient than the least squares method. The concept of statistical consistency of a tree estimation method and its implications were explored, and it is suggested that, while the efficiency (or sampling error) of a tree estimation method is a very important property, statistical consistency of the method over a wide range of, if not all, parameter values is prerequisite. 相似文献

6.

Variance component estimation techniques compared for two mating designs with forest genetic architecture through computer simulation 总被引：3，自引：0，他引：3

D. A. Huber T. L. White G. R. Hodge 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》1994,88(2):236-242

Computer simulation was used to compare minimum variance quadratic estimation (MIVQUE), minimum norm quadratic unbiased estimation (MINQUE), restricted maximum likelihood (REML), maximum likelihood (ML), and Henderson's Method 3 (HM3) on the basis of variance among estimates, mean square error (MSE), bias and probability of nearness for estimation of both individual variance components and three ratios of variance components. The investigation also compared three procedures for dealing with negative estimates and included the use of both individual observations and plot means as the experimental unit of the analysis. The structure of data simulated (field design, mating designs, genetic architecture and imbalance) represented typical analysis problems in quantitative forest genetics. Results of comparing the estimation techniques demonstrated that: estimates of probability of nearness did not discriminate among techniques; bias was discriminatory among procedures for dealing with negative estimates but not among estimation techniques (except ML); sampling variance among estimates was discriminatory among procedures for dealing with negative estimates, estimation techniques and unit of observation; and MSE provided no additional information to variance of the estimates. HM3 and REML were the closest competitors under these criteria; however, REML demonstrated greater robustness to imbalance. Of the three negative estimate procedures, two are of practical significance and guidelines for their application are presented. Estimates from individual observations were always preferable to those from plot means over the experimental levels of this study.This is Journal Series NO. R-03768 of the Institute of Food and Agricultural Sciences 相似文献

7.

A classification of bioinformatics algorithms from the viewpoint of maximizing expected accuracy (MEA)

Hamada M Asai K 《Journal of computational biology》2012,19(5):532-549

Many estimation problems in bioinformatics are formulated as point estimation problems in a high-dimensional discrete space. In general, it is difficult to design reliable estimators for this type of problem, because the number of possible solutions is immense, which leads to an extremely low probability for every solution-even for the one with the highest probability. Therefore, maximum score and maximum likelihood estimators do not work well in this situation although they are widely employed in a number of applications. Maximizing expected accuracy (MEA) estimation, in which accuracy measures of the target problem and the entire distribution of solutions are considered, is a more successful approach. In this review, we provide an extensive discussion of algorithms and software based on MEA. We describe how a number of algorithms used in previous studies can be classified from the viewpoint of MEA. We believe that this review will be useful not only for users wishing to utilize software to solve the estimation problems appearing in this article, but also for developers wishing to design algorithms on the basis of MEA. 相似文献

8.

Sample size determination for matched-pair equivalence trials using rate ratio

Tang NS Tang ML Wang SF 《Biostatistics (Oxford, England)》2007,8(3):625-631

In this article, we compare Wald-type, logarithmic transformation, and Fieller-type statistics for the classical 2-sided equivalence testing of the rate ratio under matched-pair designs with a binary end point. These statistics can be implemented through sample-based, constrained least squares estimation and constrained maximum likelihood (CML) estimation methods. Sample size formulae based on the CML estimation method are developed. We consider formulae that control a prespecified power or confidence width. Our simulation studies show that statistics based on the CML estimation method generally outperform other statistics and methods with respect to actual type I error rate and average width of confidence intervals. Also, the corresponding sample size formulae are valid asymptotically in the sense that the exact power and actual coverage probability for the estimated sample size are generally close to their prespecified values. The methods are illustrated with a real example from a clinical laboratory study. 相似文献

9.

Phylogenetic analysis using parsimony and likelihood methods 总被引：1，自引：0，他引：1

Ziheng Yang 《Journal of molecular evolution》1996,42(2):294-307

The assumptions underlying the maximum-parsimony (MP) method of phylogenetic tree reconstruction were intuitively examined by studying the way the method works. Computer simulations were performed to corroborate the intuitive examination. Parsimony appears to involve very stringent assumptions concerning the process of sequence evolution, such as constancy of substitution rates between nucleotides, constancy of rates across nucleotide sites, and equal branch lengths in the tree. For practical data analysis, the requirement of equal branch lengths means similar substitution rates among lineages (the existence of an approximate molecular clock), relatively long interior branches, and also few species in the data. However, a small amount of evolution is neither a necessary nor a sufficient requirement of the method. The difficulties involved in the application of current statistical estimation theory to tree reconstruction were discussed, and it was suggested that the approach proposed by Felsenstein (1981,J. Mol. Evol. 17: 368–376) for topology estimation, as well as its many variations and extensions, differs fundamentally from the maximum likelihood estimation of a conventional statistical parameter. Evidence was presented showing that the Felsenstein approach does not share the asymptotic efficiency of the maximum likelihood estimator of a statistical parameter. Computer simulations were performed to study the probability that MP recovers the true tree under a hierarchy of models of nucleotide substitution; its performance relative to the likelihood method was especially noted. The results appeared to support the intuitive examination of the assumptions underlying MP. When a simple model of nucleotide substitution was assumed to generate data, the probability that MP recovers the true topology could be as high as, or even higher than, that for the likelihood method. When the assumed model became more complex and realistic, e.g., when substitution rates were allowed to differ between nucleotides or across sites, the probability that MP recovers the true topology, and especially its performance relative to that of the likelihood method, generally deteriorates. As the complexity of the process of nucleotide substitution in real sequences is well recognized, the likelihood method appears preferable to parsimony. However, the development of a statistical methodology for the efficient estimation of the tree topology remains a difficult open problem. 相似文献

10.

Dynamic information in uncertain and changing worlds 总被引：4，自引：0，他引：4

M Mangel 《Journal of theoretical biology》1990,146(3):317-332

A general theory for information processing by organisms living in uncertain and changing worlds is developed. The three fundamental properties of the theory are: (i) the use of a memory parameter that allows the organism to forget the more distant past, (ii) a succinct representation of encounters and information and (iii) flexibility in the estimates of parameters by including the uncertainty in these estimates in a consistent manner. The theory is developed using Bayesian methods (but can also be applied to maximum likelihood estimation) and is applied to the encounter models standardly used in ecology (Poisson, binomial, and negative binomial). Two applications are discussed: (i) patch selection and the matching rule and (ii) superparasitism by a parasitoid. 相似文献

11.

Estimating phylogenetic trees from pairwise likelihoods and posterior probabilities of substitution counts

Holder MT Steel M 《Journal of theoretical biology》2011,280(1):159-166

The field of phylogenetic tree estimation has been dominated by three broad classes of methods: distance-based approaches, parsimony and likelihood-based methods (including maximum likelihood (ML) and Bayesian approaches). Here we introduce two new approaches to tree inference: pairwise likelihood estimation and a distance-based method that estimates the number of substitutions along the paths through the tree. Our results include the derivation of the formulae for the probability that two leaves will be identical at a site given a number of substitutions along the path connecting them. We also derive the posterior probability of the number of substitutions along a path between two sequences. The calculations for the posterior probabilities are exact for group-based, symmetric models of character evolution, but are only approximate for more general models. 相似文献

12.

Multilevel Mixture Cure Models with Random Effects

Xin Lai Kelvin K. W. Yau 《Biometrical journal. Biometrische Zeitschrift》2009,51(3):456-466

This paper extends the multilevel survival model by allowing the existence of cured fraction in the model. Random effects induced by the multilevel clustering structure are specified in the linear predictors in both hazard function and cured probability parts. Adopting the generalized linear mixed model (GLMM) approach to formulate the problem, parameter estimation is achieved by maximizing a best linear unbiased prediction (BLUP) type log‐likelihood at the initial step of estimation, and is then extended to obtain residual maximum likelihood (REML) estimators of the variance component. The proposed multilevel mixture cure model is applied to analyze the (i) child survival study data with multilevel clustering and (ii) chronic granulomatous disease (CGD) data on recurrent infections as illustrations. A simulation study is carried out to evaluate the performance of the REML estimators and assess the accuracy of the standard error estimates. 相似文献

13.

Maximum likelihood inference of protein phylogeny and the origin of chloroplasts 总被引：15，自引：0，他引：15

Hirohisa Kishino Takashi Miyata Masami Hasegawa 《Journal of molecular evolution》1990,31(2):151-160

Summary A maximum likelihood method for inferring protein phylogeny was developed. It is based on a Markov model that takes into account the unequal transition probabilities among pairs of amino acids and does not assume constancy of rate among different lineages. Therefore, this method is expected to be powerful in inferring phylogeny among distantly related proteins, either orthologous or parallogous, where the evolutionary rate may deviate from constancy. Not only amino acid substitutions but also insertion/deletion events during evolution were incorporated into the Markov model. A simple method for estimating a bootstrap probability for the maximum likelihood tree among alternatives without performing a maximum likelihood estimation for each resampled data set was developed. These methods were applied to amino acid sequence data of a photosynthetic membrane protein,psbA, from photosystem II, and the phylogeny of this protein was discussed in relation to the origin of chloroplasts. 相似文献

14.

Efficiency of haplotype frequency estimation when nuclear family information is included

Becker T Knapp M 《Human heredity》2002,54(1):45-53

In genetic studies the haplotype structure of the regarded population is expected to carry important information. Experimental methods to derive haplotypes, however, are expensive and none of them has yet become standard methodology. On the other hand, maximum likelihood haplotype estimation from unphased individual genotypes may incur inaccuracies. We therefore investigated the relative efficiency of haplotype frequency estimation when nuclear family information is included compared to estimation from experimentally derived haplotypes. Efficiency was measured in terms of variance ratios of the estimates. The variances were derived from the binomial distribution for experimentally derived haplotypes, and from the Fisher information matrix corresponding to the general likelihood function of the haplotype frequency parameters, including family information. We subsequently compared these variance ratios to the variance ratios for the case of estimation from individual genotypes. We found that the information gained from a single child compensates missing phase information to a high degree, resulting in estimates almost as reliable as those derived from observed haplotypes. Thus, if children have already been genotyped for other reasons, it is highly recommendable to include them into the estimation. If child information is not already present, it depends on the number of loci and the haplotype diversity if it is useful to genotype a single child just to reduce phase ambiguity. In general, if the number of loci is less than or equal to three or if the number of haplotypes with a frequency >5% is less than or equal to four, haplotype estimation from individuals is quite good already and the improvement gained from a single child can not compensate the genotyping effort for it. On the other hand, under scenarios with many loci and high haplotype diversity, haplotype frequency estimation from trios can be more efficient than haplotype frequency estimation from individuals also on a per genotype base. 相似文献

15.

A targeted maximum likelihood estimator for two-stage designs

Rose S van der Laan MJ 《The international journal of biostatistics》2011,7(1):17

We consider two-stage sampling designs, including so-called nested case control studies, where one takes a random sample from a target population and completes measurements on each subject in the first stage. The second stage involves drawing a subsample from the original sample, collecting additional data on the subsample. This data structure can be viewed as a missing data structure on the full-data structure collected in the second-stage of the study. Methods for analyzing two-stage designs include parametric maximum likelihood estimation and estimating equation methodology. We propose an inverse probability of censoring weighted targeted maximum likelihood estimator (IPCW-TMLE) in two-stage sampling designs and present simulation studies featuring this estimator. 相似文献

16.

Comparison of transmission rates of HIV-1 and HIV-2 in a cohort of prostitutes in Senegal

Christl Donnelly Wendy Leisenring Phyllis Kanki Tamara Awerbuch Sonja Sandberg 《Bulletin of mathematical biology》1993,55(4):731-743

To explore the biological similarities and differences between the HIV-1 and HIV-2 viruses, we model the probability of male-to-female transmission of either HIV virus as a function of the number of sexual partners, the prevalence of the viruses and the infectivity per contact. Using maximum likelihood estimation theory and data from a prospective study of registered female prostitutes in Dakar, Senegal, we estimate and compare the infectivities of HIV-1 and HIV-2. Graphical goodness-of-fit methods are used to show that our model fits the data well. We find that in male-to-female transmission HIV-1 is significantly more infectious than HIV-2. This findings is consistent with other data from laboratory and epidemiologic studies comparing the biology of HIV-1 and HIV-2. 相似文献

17.

植物空间分布格局中邻体距离的概率分布模型及参数估计

高猛《生态学报》2016,36(14):4406-4414

最近邻体法是一类有效的植物空间分布格局分析方法,邻体距离的概率分布模型用于描述邻体距离的统计特征,属于常用的最近邻体法之一。然而,聚集分布格局中邻体距离(个体到个体)的概率分布模型表达式复杂,参数估计的计算量大。根据该模型期望和方差的特性,提出了一种简化的参数估计方法,并利用遗传算法来实现参数优化,结果表明遗传算法可以有效地估计的该模型的两个参数。同时,利用该模型拟合了加拿大南温哥华岛3个寒温带树种的空间分布数据,结果显示:该概率分布模型可以很好地拟合美国花旗松(P.menziesii)和西部铁杉(T.heterophylla)的邻体距离分布,但由于西北红柏(T.plicata)存在高度聚集的团簇分布,拟合结果不理想;美国花旗松在样地中近似随机分布,空间聚集参数对空间尺度的依赖性不强,但西北红柏和西部铁杉空间聚集参数具有尺度依赖性,随邻体距离阶数增加而变大。最后,讨论了该模型以及参数估计方法的优势和限制。相似文献

18.

SNP calling, genotype calling, and sample allele frequency estimation from new-generation sequencing data

R Nielsen T Korneliussen A Albrechtsen Y Li J Wang 《PloS one》2012,7(7):e37558

We present a statistical framework for estimation and application of sample allele frequency spectra from New-Generation Sequencing (NGS) data. In this method, we first estimate the allele frequency spectrum using maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method for estimating the sample allele frequency in a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to various other cases including cases with deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set. 相似文献

19.

Estimating Effective Population Size and Mutation Rate from Sequence Data Using Metropolis-Hastings Sampling 总被引：27，自引：14，他引：13

M. K. Kuhner J. Yamato J. Felsenstein 《Genetics》1995,140(4):1421-1430

We present a new way to make a maximum likelihood estimate of the parameter 4N(e)μ (effective population size times mutation rate per site, or θ) based on a population sample of molecular sequences. We use a Metropolis-Hastings Markov chain Monte Carlo method to sample genealogies in proportion to the product of their likelihood with respect to the data and their prior probability with respect to a coalescent distribution. A specific value of θ must be chosen to generate the coalescent distribution, but the resulting trees can be used to evaluate the likelihood at other values of θ, generating a likelihood curve. This procedure concentrates sampling on those genealogies that contribute most of the likelihood, allowing estimation of meaningful likelihood curves based on relatively small samples. The method can potentially be extended to cases involving varying population size, recombination, and migration. 相似文献

20.

Estimation in regression models for longitudinal binary data with outcome-dependent follow-up

Fitzmaurice GM Lipsitz SR Ibrahim JG Gelber R Lipshultz S 《Biostatistics (Oxford, England)》2006,7(3):469-485

In many observational studies, individuals are measured repeatedly over time, although not necessarily at a set of pre-specified occasions. Instead, individuals may be measured at irregular intervals, with those having a history of poorer health outcomes being measured with somewhat greater frequency and regularity. In this paper, we consider likelihood-based estimation of the regression parameters in marginal models for longitudinal binary data when the follow-up times are not fixed by design, but can depend on previous outcomes. In particular, we consider assumptions regarding the follow-up time process that result in the likelihood function separating into two components: one for the follow-up time process, the other for the outcome measurement process. The practical implication of this separation is that the follow-up time process can be ignored when making likelihood-based inferences about the marginal regression model parameters. That is, maximum likelihood (ML) estimation of the regression parameters relating the probability of success at a given time to covariates does not require that a model for the distribution of follow-up times be specified. However, to obtain consistent parameter estimates, the multinomial distribution for the vector of repeated binary outcomes must be correctly specified. In general, ML estimation requires specification of all higher-order moments and the likelihood for a marginal model can be intractable except in cases where the number of repeated measurements is relatively small. To circumvent these difficulties, we propose a pseudolikelihood for estimation of the marginal model parameters. The pseudolikelihood uses a linear approximation for the conditional distribution of the response at any occasion, given the history of previous responses. The appeal of this approximation is that the conditional distributions are functions of the first two moments of the binary responses only. When the follow-up times depend only on the previous outcome, the pseudolikelihood requires correct specification of the conditional distribution of the current outcome given the outcome at the previous occasion only. Results from a simulation study and a study of asymptotic bias are presented. Finally, we illustrate the main results using data from a longitudinal observational study that explored the cardiotoxic effects of doxorubicin chemotherapy for the treatment of acute lymphoblastic leukemia in children. 相似文献