Similar articles
20 similar articles found
1.
Different statistical or low-pass filters may be used for the idealization of ion channel data. We address the problem of predicting optimal filter parameters, represented by a threshold test-value for statistical filters and by a cut-off frequency for low-pass filters. Optimal idealization is understood in the sense of maximal similarity between recovered and real signals. Special procedures are suggested to quantitatively characterize the difference between the recovered and the real signals, the latter being known for simulated data. These procedures, called objective criteria, play the role of referees in estimating the performance of different predictive optimality criteria. We have tested the standard Akaike criterion (AIC) and its modification by Rissanen (MDL). Both gave unsatisfactory results. We have shown analytically that an Akaike-type criterion, based on applying a certain penalty to the log likelihood function per transition, indicates the correct optimum point only if the penalty is set equal to half the optimal threshold. As the latter varies significantly between data sets, this criterion is not particularly helpful. A new universal predictive optimality criterion, valid for real data and any idealization method, is suggested. It is formally similar to AIC, but instead of the log likelihood it uses the doubled number of false transitions. The predictive power of the new criterion is demonstrated with different types of data for the Hinkley and 50%-amplitude methods. Received: 23 July 1996 / Accepted: 9 May 1997
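Below is a minimal, hypothetical Python sketch of the kind of threshold scan this abstract discusses: a simulated two-state trace is idealized with a simple threshold detector, and an AIC-like score with a per-transition penalty is computed for each candidate threshold. The Gaussian noise model, the penalty value, and all names are illustrative assumptions, not the authors' algorithm.

    import numpy as np

    def idealize_by_threshold(trace, threshold):
        # Two-level idealization: a sample is 'open' if it exceeds the threshold.
        states = (trace > threshold).astype(int)
        n_transitions = int(np.sum(states[1:] != states[:-1]))
        return states, n_transitions

    def aic_like_score(trace, states, n_transitions, penalty):
        # Gaussian log likelihood of the two-level fit plus a per-transition penalty.
        levels = np.array([trace[states == s].mean() if np.any(states == s) else 0.0
                           for s in (0, 1)])
        resid = trace - levels[states]
        sigma2 = resid.var()
        loglik = -0.5 * trace.size * (np.log(2 * np.pi * sigma2) + 1.0)
        return -2.0 * loglik + 2.0 * penalty * n_transitions

    rng = np.random.default_rng(0)
    true_signal = np.repeat([0.0, 1.0, 0.0, 1.0], 250)       # simulated two-state signal
    trace = true_signal + rng.normal(0.0, 0.3, true_signal.size)

    scores = {}
    for thr in np.linspace(0.2, 0.8, 13):                    # candidate thresholds
        states, n_tr = idealize_by_threshold(trace, thr)
        scores[thr] = aic_like_score(trace, states, n_tr, penalty=2.0)
    best_threshold = min(scores, key=scores.get)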

2.
A comparison has been made between the estimates obtained from maximum likelihood estimation of gamma, inverse normal, and normal distribution models for stage-frequency data. Results have been compared for six sets of test data and for many sets of simulated data. It is concluded that (1) some estimates may differ substantially between the models, (2) estimates from the correct model have little bias, and estimated standard errors are generally close to theoretical values, (3) there are problems in determining degrees of freedom for chi-squared goodness-of-fit tests, so it is best to compare test statistics with simulated distributions, and (4) goodness-of-fit tests may not discriminate well between the three models.
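As a concrete illustration of point (3), the hedged Python sketch below fits a gamma model by maximum likelihood and compares the observed chi-squared goodness-of-fit statistic with its parametric-bootstrap distribution instead of a fixed chi-squared reference. The gamma stand-in data, bin choice, and bootstrap size are assumptions for illustration, not the paper's setup.

    import numpy as np
    from scipy import stats

    def chisq_gof(data, dist, params, bins):
        # Pearson chi-squared statistic for a fitted continuous distribution.
        obs, edges = np.histogram(data, bins=bins)
        expected = len(data) * np.diff(dist.cdf(edges, *params))
        return np.sum((obs - expected) ** 2 / expected)

    rng = np.random.default_rng(1)
    data = rng.gamma(shape=3.0, scale=2.0, size=200)         # stand-in for stage durations

    params = stats.gamma.fit(data, floc=0)                   # ML fit of the gamma model
    bins = np.quantile(data, np.linspace(0, 1, 9))           # eight roughly equal-count bins
    t_obs = chisq_gof(data, stats.gamma, params, bins)

    # Parametric bootstrap: refit and recompute the statistic on data simulated from the fit,
    # then compare the observed statistic with this simulated null distribution.
    t_sim = []
    for _ in range(500):
        boot = stats.gamma.rvs(*params, size=len(data), random_state=rng)
        p_boot = stats.gamma.fit(boot, floc=0)
        b_bins = np.quantile(boot, np.linspace(0, 1, 9))
        t_sim.append(chisq_gof(boot, stats.gamma, p_boot, b_bins))
    p_value = np.mean(np.array(t_sim) >= t_obs)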

3.
Lee SY, Song XY. Biometrics 2004, 60(3):624-636
A general two-level latent variable model is developed to provide a comprehensive framework for model comparison of various submodels. Nonlinear relationships among the latent variables in the structural equations at both levels, as well as the effects of fixed covariates in the measurement and structural equations at both levels, can be analyzed within the framework. Moreover, the methodology can be applied to hierarchically mixed continuous, dichotomous, and polytomous data. A Monte Carlo EM algorithm is implemented to produce the maximum likelihood estimate. The E-step is completed by approximating the conditional expectations through observations that are simulated by Markov chain Monte Carlo methods, while the M-step is completed by conditional maximization. A procedure is proposed for computing the complicated observed-data log likelihood and the BIC for model comparison. The methods are illustrated by using a real data set.

4.
Hjort & Claeskens (2003) developed an asymptotic theory for model selection, model averaging and subsequent inference using likelihood methods in parametric models, along with associated confidence statements. In this article, we consider a semiparametric version of this problem, wherein the likelihood depends on parameters and an unknown function, and model selection/averaging is to be applied to the parametric parts of the model. We show that all the results of Hjort & Claeskens hold in the semiparametric context, if the Fisher information matrix for parametric models is replaced by the semiparametric information bound for semiparametric models, and if maximum likelihood estimators for parametric models are replaced by semiparametric efficient profile estimators. Our methods of proof employ Le Cam's contiguity lemmas, leading to transparent results. The results also describe the behaviour of semiparametric model estimators when the parametric component is misspecified, and also have implications for pointwise-consistent model selectors.

5.
The purpose of the study is to estimate the population size under a homogeneous truncated count model, and under model contaminations, via the Horvitz-Thompson approach on the basis of a count capture-recapture experiment. The proposed estimator is based on a mixture of zero-truncated Poisson distributions. The benefits of the proposed model are that it accommodates long-tailed or skewed distributions and that its likelihood function is concave, with strong results available on the nonparametric maximum likelihood estimator (NPMLE). A simulation study comparing McKendrick's, Mantel-Haenszel's, Zelterman's, Chao's, the maximum likelihood, and the proposed methods reveals that under model contamination the proposed estimator is the best choice, having the smallest bias and smallest mean squared error when the population size is sufficiently large; further results show that the proposed estimator performs well even in the homogeneous situation. The empirical examples, the cholera epidemic in India (homogeneous case) and the heroin-user data from Bangkok in 2002 (heterogeneous case), are fitted with excellent goodness of fit, and the confidence interval estimates may also be of considerable interest. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
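The core mechanic, Horvitz-Thompson estimation of the population size after fitting a zero-truncated count model, can be sketched as below. For brevity this uses a single zero-truncated Poisson component rather than the paper's mixture, and the capture-count data are invented.

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.special import gammaln

    def ztp_negloglik(lam, counts):
        # Negative log likelihood of the zero-truncated Poisson for counts >= 1.
        counts = np.asarray(counts, dtype=float)
        return -np.sum(counts * np.log(lam) - lam - gammaln(counts + 1.0)
                       - np.log1p(-np.exp(-lam)))

    counts = np.array([1] * 300 + [2] * 90 + [3] * 25 + [4] * 6)   # observed capture counts
    res = minimize_scalar(ztp_negloglik, bounds=(1e-6, 20.0), args=(counts,), method="bounded")
    lam_hat = res.x

    # Horvitz-Thompson: each observed unit is detected with probability 1 - P(count = 0).
    p_detect = 1.0 - np.exp(-lam_hat)
    N_hat = len(counts) / p_detect        # estimated population size, including unseen units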

6.
In the analysis of data generated by change-point processes, one critical challenge is to determine the number of change-points. The classic Bayes information criterion (BIC) statistic does not work well here because of irregularities in the likelihood function. By asymptotic approximation of the Bayes factor, we derive a modified BIC for the model of Brownian motion with changing drift. The modified BIC is similar to the classic BIC in the sense that the first term consists of the log likelihood, but it differs in the terms that penalize for model dimension. As an example of application, this new statistic is used to analyze array-based comparative genomic hybridization (array-CGH) data. Array-CGH measures the number of chromosome copies at each genome location of a cell sample, and is useful for finding the regions of genome deletion and amplification in tumor cells. The modified BIC performs well compared to existing methods in accurately choosing the number of regions of changed copy number. Unlike existing methods, it does not rely on tuning parameters or intensive computing. Thus it is impartial and easier to understand and to use.
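The generic workflow the abstract refers to can be sketched in Python as follows: fit the best segmentation for each candidate number of change-points and pick the count that maximizes a penalized log likelihood. The penalty written here is a classic-BIC-style placeholder; the modified penalty derived in the paper is different and is not reproduced.

    import numpy as np

    def best_rss(y, k):
        # Exact dynamic program: minimum residual sum of squares over all
        # segmentations of y into k + 1 mean-constant segments.
        n = y.size
        c1 = np.cumsum(np.r_[0.0, y])
        c2 = np.cumsum(np.r_[0.0, y ** 2])

        def sse(i, j):                        # cost of fitting y[i:j] by its mean
            s, s2, m = c1[j] - c1[i], c2[j] - c2[i], j - i
            return s2 - s * s / m

        cost = {(0, j): sse(0, j) for j in range(1, n + 1)}
        for seg in range(1, k + 1):
            for j in range(seg + 1, n + 1):
                cost[(seg, j)] = min(cost[(seg - 1, i)] + sse(i, j) for i in range(seg, j))
        return cost[(k, n)]

    rng = np.random.default_rng(2)
    y = np.r_[rng.normal(0, 1, 100), rng.normal(1.5, 1, 60), rng.normal(-0.5, 1, 80)]
    n = y.size

    scores = {}
    for k in range(5):                        # candidate numbers of change-points
        rss = best_rss(y, k)
        loglik = -0.5 * n * np.log(rss / n)   # Gaussian profile log likelihood, up to constants
        penalty = 0.5 * (2 * k + 1) * np.log(n)   # placeholder penalty, not the paper's modified BIC
        scores[k] = loglik - penalty
    k_hat = max(scores, key=scores.get)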

7.
Allozyme data are widely used to infer the phylogenies of populations and closely related species. Numerous parsimony, distance, and likelihood methods have been proposed for phylogenetic analysis of these data; the relative merits of these methods have been debated vigorously, but their accuracy has not been well explored. In this study, I compare the performance of 13 phylogenetic methods (six parsimony, six distance, and continuous maximum likelihood) by applying a congruence approach to eight allozyme data sets from the literature. Clades are identified that are supported by multiple data sets other than allozymes (e.g. morphology, DNA sequences), and the ability of different methods to recover these 'known' clades is compared. The results suggest that (1) distance and likelihood methods generally outperform parsimony methods, (2) methods that utilize frequency data tend to perform well, and (3) continuous maximum likelihood is among the most accurate methods and appears to be robust to violations of its assumptions. These results are in agreement with those from recent simulation studies, and help provide a basis for empirical workers to choose among the many methods available for analysing allozyme characters.

8.
Björn Bornkamp. Biometrics 2012, 68(3):893-901
This article considers the topic of finding prior distributions when a major component of the statistical model depends on a nonlinear function. Using results on how to construct uniform distributions in general metric spaces, we propose a prior distribution that is uniform in the space of functional shapes of the underlying nonlinear function and then back-transform to obtain a prior distribution for the original model parameters. The primary application considered in this article is nonlinear regression, but the idea might be of interest beyond this case. For nonlinear regression, the priors so constructed have the advantage that they are parametrization invariant and do not violate the likelihood principle, in contrast to uniform distributions on the parameters and the Jeffreys prior, respectively. The utility of the proposed priors is demonstrated in the context of design and analysis of nonlinear regression modeling in clinical dose-finding trials, through a real data example and simulation.

9.
Delayed dose-limiting toxicities (DLTs), i.e. those occurring beyond the first cycle of treatment, are a challenge for phase I trials. The time-to-event continual reassessment method (TITE-CRM) is a Bayesian dose-finding design that addresses the issue of long observation times and early patient drop-out. It uses a weighted binomial likelihood, with weights assigned to observations by the unknown time-to-toxicity distribution, and is open to accrual continually. To avoid dosing at overly toxic levels while retaining accuracy and efficiency for DLT evaluation that involves multiple cycles, we propose an adaptive weight function that incorporates cyclical data of the experimental treatment, with parameters updated continually. This provides a reasonable estimate of the time-to-toxicity distribution by accounting for inter-cycle variability, and it maintains the statistical properties of consistency and coherence. A case study of a first-in-human cancer trial of an experimental biologic is presented using the proposed design. Design calibrations for the clinical and statistical parameters are conducted to ensure good operating characteristics. Simulation results show that the proposed TITE-CRM design with adaptive weight function yields significantly shorter trial duration, does not expose patients to additional risk, is competitive against existing weighting methods, and possesses some desirable properties.
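A stripped-down sketch of a TITE-CRM dose-assignment step is given below, using the common one-parameter power ("empiric") model and a simple linear time-to-event weight rather than the adaptive, cycle-based weight function proposed in the paper. The skeleton, prior standard deviation, target rate, and patient data are all illustrative assumptions.

    import numpy as np

    skeleton = np.array([0.05, 0.10, 0.20, 0.35, 0.50])     # assumed prior toxicity skeleton
    target = 0.25
    obs_window = 3.0                                         # cycles of full DLT follow-up

    # (dose level, follow-up in cycles, DLT indicator) for the patients enrolled so far.
    patients = [(0, 3.0, 0), (1, 3.0, 0), (2, 1.5, 0), (2, 3.0, 1), (3, 0.5, 0)]

    def weighted_loglik(a):
        # Weighted binomial log likelihood: pending patients contribute 1 - w * p.
        ll = 0.0
        for dose, followup, dlt in patients:
            p = skeleton[dose] ** np.exp(a)
            w = 1.0 if dlt else min(followup / obs_window, 1.0)
            ll += np.log(p) if dlt else np.log(1.0 - w * p)
        return ll

    # Posterior summaries by numerical integration over a N(0, 1.34^2) prior on a.
    grid = np.linspace(-4.0, 4.0, 801)
    post = np.exp(-0.5 * (grid / 1.34) ** 2) * np.exp([weighted_loglik(a) for a in grid])
    post /= post.sum()
    p_tox = np.array([np.sum(post * skeleton[d] ** np.exp(grid)) for d in range(skeleton.size)])
    next_dose = int(np.argmin(np.abs(p_tox - target)))       # dose closest to the target rate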

10.
In some clinical trials or clinical practice, the therapeutic agent is administered repeatedly, and doses are adjusted in each patient based on repeatedly measured continuous responses, to maintain the response levels in a target range. Because a lower dose tends to be selected for patients with a better outcome, simple summarizations may wrongly show a better outcome for the lower dose, producing an incorrect dose-response relationship. In this study, we consider the dose-response relationship under these situations. We show that maximum-likelihood estimates are consistent without modeling the dose-modification mechanisms when the selection of the dose as a time-dependent covariate is based only on observed, but not on unobserved, responses, and measurements are generated based on administered doses. We confirmed this property by performing simulation studies under several dose-modification mechanisms. We examined an autoregressive linear mixed effects model. The model represents profiles approaching each patient's asymptote when identical doses are repeatedly administered. The model takes into account the previous dose history and provides a dose-response relationship of the asymptote as a summary measure. We also examined a linear mixed effects model assuming all responses are measured at steady state. In the simulation studies, the estimates of both the models were unbiased under the dose modification based on observed responses, but biased under the dose modification based on unobserved responses. In conclusion, the maximum-likelihood estimates of the dose-response relationship are consistent under the dose modification based only on observed responses.

11.
Composite likelihood methods have become very popular for the analysis of large-scale genomic data sets because of the computational intractability of the basic coalescent process and its generalizations: it is virtually impossible to calculate the likelihood of an observed data set spanning a large chromosomal region without using approximate or heuristic methods. Composite likelihood methods are approximate methods; in the present article, the likelihood is written as a product of likelihoods, one for each of a number of smaller regions that together make up the whole region from which data are collected. A very general framework for neutral coalescent models is presented and discussed. The framework comprises many of the most popular coalescent models that are currently used for analysis of genetic data. Assume data are collected from a series of consecutive regions of equal size. Then it is shown that the observed data form a stationary, ergodic process. General conditions are given under which the maximum composite likelihood estimator of the parameters describing the model (e.g. mutation rates, demographic parameters and the recombination rate) is a consistent estimator as the number of regions tends to infinity.
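The product-of-regions idea can be made concrete with a small Python sketch: each region contributes an (approximate) Poisson likelihood for its number of segregating sites, with mean theta times Watterson's constant, and the composite likelihood is maximized over the scaled mutation rate. The Poisson approximation, the sample size, and the per-region counts are illustrative assumptions, not the paper's model.

    import numpy as np
    from scipy.optimize import minimize_scalar

    n_sample = 20
    a_n = np.sum(1.0 / np.arange(1, n_sample))               # Watterson's constant, sum of 1/i for i < n
    seg_sites = np.array([7, 12, 9, 15, 6, 11, 10, 8, 13, 9])    # segregating sites per region (invented)

    def neg_composite_loglik(theta):
        # Composite log likelihood: sum over regions of per-region approximate log likelihoods,
        # treating the regions as if they were independent.
        mean = theta * a_n
        return -np.sum(seg_sites * np.log(mean) - mean)      # Poisson log pmf up to constants

    res = minimize_scalar(neg_composite_loglik, bounds=(1e-6, 50.0), method="bounded")
    theta_hat = res.x        # maximum composite likelihood estimate of the scaled mutation rate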

12.
Methods for the analysis of unmatched case-control data based on a finite population sampling model are developed. Under this model, and the prospective logistic model for disease probabilities, a likelihood for case-control data that accommodates very general sampling of controls is derived. This likelihood has the form of a weighted conditional logistic likelihood. The flexibility of the methods is illustrated by providing a number of control sampling designs and a general scheme for their analyses. These include frequency matching, counter-matching, case-base, randomized recruitment, and quota sampling. A study of risk factors for childhood asthma illustrates an application of the counter-matching design. Some asymptotic efficiency results are presented and computational methods discussed. Further, it is shown that a 'marginal' likelihood provides a link to unconditional logistic methods. The methods are examined in a simulation study that compares frequency and counter-matching using conditional and unconditional logistic analyses; the results indicate that the conditional logistic likelihood has superior efficiency. Extensions that accommodate sampling of cases and multistage designs are presented. Finally, we compare the analysis methods presented here to other approaches, compare counter-matching and two-stage designs, and suggest areas for further research.
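To illustrate the weighted conditional logistic form, the sketch below evaluates the contribution of a single sampled risk set, in which the case and the sampled controls enter a conditional-logistic term with sampling weights (as under counter-matching) in both numerator and denominator. The covariate values and weights are invented, this is only a reading of the general form rather than the paper's exact likelihood, and a real analysis would sum such terms over all risk sets.

    import numpy as np
    from scipy.optimize import minimize

    # One sampled risk set: the case (first entry) plus sampled controls, each with a
    # covariate value and a sampling weight (e.g. an inverse inclusion probability).
    x = np.array([1.2, 0.3, -0.5, 0.8])
    w = np.array([2.0, 1.5, 1.5, 2.0])

    def neg_weighted_clogit(beta):
        # Weighted conditional logistic log likelihood for this single risk set.
        eta = beta[0] * x
        return -(np.log(w[0]) + eta[0] - np.log(np.sum(w * np.exp(eta))))

    beta_hat = minimize(neg_weighted_clogit, x0=np.array([0.0])).x[0]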

13.
Akaike's information criterion in generalized estimating equations
Pan W. Biometrics 2001, 57(1):120-125
Correlated response data are common in biomedical studies. Regression analysis based on the generalized estimating equations (GEE) is an increasingly important method for such data. However, there seem to be few model-selection criteria available for GEE. The well-known Akaike information criterion (AIC) cannot be applied directly, since AIC is based on maximum likelihood estimation while GEE is not likelihood-based. We propose a modification of AIC in which the likelihood is replaced by the quasi-likelihood and a proper adjustment is made to the penalty term. Its performance is investigated through simulation studies. For illustration, the method is applied to a real data set.
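A hedged sketch of such a criterion (often written QIC) is given below for a Poisson-type GEE fit: the maximized log likelihood in AIC is replaced by the quasi-likelihood evaluated under the independence working correlation, and the penalty 2p is replaced by twice the trace of the product of the independence-model information and the robust covariance of the estimates. The function and argument names are hypothetical; the matrices would come from whatever GEE software produced the fit.

    import numpy as np

    def qic_poisson(y, mu, naive_info_indep, robust_cov):
        # y               : responses
        # mu              : fitted means from the GEE fit under the chosen working correlation
        # naive_info_indep: model-based information matrix under the independence working correlation
        # robust_cov      : robust (sandwich) covariance matrix of the coefficient estimates
        quasi_lik = np.sum(y * np.log(mu) - mu)      # Poisson quasi-likelihood under independence
        return -2.0 * quasi_lik + 2.0 * np.trace(naive_info_indep @ robust_cov)

Smaller values indicate a preferable working correlation structure or mean model, mirroring how AIC is used.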

14.
Estimating nonlinear dose-response relationships in the context of pharmaceutical clinical trials is often a challenging problem. The data in these trials are typically variable and sparse, making this a hard inference problem despite sometimes seemingly large sample sizes. Maximum likelihood estimates often fail to exist in these situations, while for Bayesian methods prior selection becomes a delicate issue when no carefully elicited prior is available, as the posterior distribution will often be sensitive to the priors chosen. This article provides guidance on the use of functional uniform prior distributions in these situations. The essential idea of functional uniform priors is to employ a distribution that weights the functional shapes of the nonlinear regression function equally. By doing so one obtains a distribution that exhaustively and uniformly covers the underlying potential shapes of the nonlinear function. On the parameter scale these priors will often result in quite nonuniform prior distributions. This paper gives hints on how to implement these priors in practice and illustrates them in realistic trial examples in the context of Phase II dose-response trials as well as Phase I first-in-human studies.
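One heuristic way to approximate such a prior is sketched below for the nonlinear parameter of an Emax curve: each parameter value is weighted by how fast the predicted curve (on the design doses) changes with it, so that the induced distribution over curve shapes is roughly uniform. This is only a reading of the idea under a Euclidean metric on predicted responses, with an assumed design grid; it is not necessarily the exact construction used in the paper.

    import numpy as np

    doses = np.linspace(0.0, 1.0, 9)                         # assumed design grid

    # Shape of the Emax curve in its nonlinear parameter ed50 (E0 and Emax enter linearly):
    # shape(d) = d / (ed50 + d).
    ed50_grid = np.linspace(0.01, 2.0, 400)
    dens = np.empty_like(ed50_grid)
    for i, ed50 in enumerate(ed50_grid):
        deriv = -doses / (ed50 + doses) ** 2                 # derivative of the shape w.r.t. ed50
        dens[i] = np.sqrt(np.sum(deriv ** 2))                # local 'speed' of the curve in function space
    dens /= dens.sum() * (ed50_grid[1] - ed50_grid[0])       # normalize to a density on the grid range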

15.
Coalescent likelihood is the probability of observing the given population sequences under the coalescent model. Computation of the coalescent likelihood under the infinite sites model is a classic problem in coalescent theory. Existing methods are based on either importance sampling or Markov chain Monte Carlo and are inexact. In this paper, we develop a simple method that can compute the exact coalescent likelihood for many data sets of moderate size, including real biological data whose likelihood was previously thought to be difficult to compute exactly. Our method works for both panmictic and subdivided populations. Simulations demonstrate that the practical range of exact coalescent likelihood computation for panmictic populations is significantly larger than was previously believed. We investigate the application of our method to estimating mutation rates by maximum likelihood. A main application of the exact method is assessing the accuracy of approximate methods. To demonstrate its usefulness, we evaluate the accuracy of the program Genetree in computing the likelihood for subdivided populations.

16.
W. G. Hill. Biometrics 1975, 31(4):881-888
Methods are outlined for analyzing data on genotype frequencies at several codominant loci in random-mating diploid populations. Maximum likelihood (ML) methods are given for estimating chromosomal (haplotype) frequencies. Using these, a succession of models assuming independence of gene frequencies is fitted. These models are based on those used for multi-dimensional contingency tables, and tests for association (linkage disequilibrium) are made using likelihood ratios. The methods are illustrated with an example.
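For the simplest two-locus, two-allele case the ML machinery reduces to a short EM algorithm, sketched below: only the double heterozygote has ambiguous phase, and the E-step splits it between the two possible haplotype pairs. The genotype counts are invented, and the paper's multi-locus contingency-table framework is more general than this sketch.

    import numpy as np

    # Genotype counts: rows index locus 1 genotype (AA, Aa, aa), columns locus 2 (BB, Bb, bb).
    counts = np.array([[30.0, 20.0, 5.0],
                       [25.0, 40.0, 10.0],
                       [4.0, 12.0, 14.0]])
    n = counts.sum()

    def em_haplotype_freqs(counts, n_iter=200):
        # EM estimation of haplotype frequencies (AB, Ab, aB, ab) from genotype counts.
        p = np.full(4, 0.25)
        for _ in range(n_iter):
            # Probability that a double heterozygote carries AB/ab rather than Ab/aB.
            w = p[0] * p[3] / (p[0] * p[3] + p[1] * p[2])
            hap = np.array([
                2 * counts[0, 0] + counts[0, 1] + counts[1, 0] + w * counts[1, 1],
                2 * counts[0, 2] + counts[0, 1] + counts[1, 2] + (1 - w) * counts[1, 1],
                2 * counts[2, 0] + counts[2, 1] + counts[1, 0] + (1 - w) * counts[1, 1],
                2 * counts[2, 2] + counts[2, 1] + counts[1, 2] + w * counts[1, 1],
            ])
            p = hap / (2 * n)
        return p

    p_AB, p_Ab, p_aB, p_ab = em_haplotype_freqs(counts)
    D = p_AB - (p_AB + p_Ab) * (p_AB + p_aB)     # linkage disequilibrium coefficient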

17.
Protein conserved domains are distinct units of molecular structure, usually associated with particular aspects of molecular function such as catalysis or binding. These conserved subsequences are often unobserved and thus in need of detection. Motif discovery methods can be used to find these unobserved domains given a set of sequences. This paper presents a data augmentation (DA) framework that unifies a suite of motif-finding algorithms, all of which maximize the same likelihood function by imputing the unobserved data. Data augmentation refers to methods that formulate iterative optimization by exploiting the unobserved data. Two categories of maximum likelihood based motif-finding algorithms are illustrated under the DA framework. The first comprises deterministic algorithms, which maximize the likelihood function by performing an iteratively optimal local search in the alignment space. The second comprises stochastic algorithms, which iteratively draw motif-location samples via Monte Carlo simulation while keeping track of the solution with the best likelihood. As a result, four DA motif discovery algorithms are described, evaluated, and compared by aligning real and simulated protein sequences.

18.
Design and analysis methods are presented for studying the association of a candidate gene with a disease by using parental data in place of nonrelated controls. This alternative design eliminates spurious differences in allele frequencies between cases and nonrelated controls resulting from different ethnic origins and population stratification for these two groups. We present analysis methods which are based on two genetic relative risks: (1) the relative risk of disease for homozygotes with two copies of the candidate gene versus homozygotes without the candidate gene and (2) the relative risk for heterozygotes with one copy of the candidate gene versus homozygotes without the candidate gene. In addition to estimating the magnitude of these relative risks, likelihood methods allow specific hypotheses to be tested, namely, a test for overall association of the candidate gene with disease, as well as specific genetic hypotheses, such as dominant or recessive inheritance. Two likelihood methods are presented: (1) a likelihood method appropriate when Hardy-Weinberg equilibrium holds and (2) a likelihood method in which we condition on parental genotype data when Hardy-Weinberg equilibrium does not hold. The results for the relative efficiency of these two methods suggest that the conditional approach may at times be preferable, even when equilibrium holds. Sample-size and power calculations are presented for a multitiered design. The purpose of tier 1 is to detect the presence of an abnormal sequence for a postulated candidate gene among a small group of cases. The purpose of tier 2 is to test for association of the abnormal variant with disease, such as by the likelihood methods presented. The purpose of tier 3 is to confirm positive results from tier 2. Results indicate that required sample sizes are smaller when expression of disease is recessive, rather than dominant, and that, for recessive disease and large relative risks, necessary sample sizes may be feasible, even if only a small percentage of the disease can be attributed to the candidate gene.
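A minimal sketch of a conditional-on-parental-genotypes likelihood of this kind is shown below: for each affected child, the Mendelian transmission probabilities are re-weighted by the genotype relative risks (1, psi1, psi2) and normalized, and the two relative risks are estimated by maximizing the resulting log likelihood. Genotypes are coded as counts of the candidate allele; the trio data are invented, and this is an illustration of the general idea rather than the paper's exact formulation.

    import numpy as np
    from scipy.optimize import minimize

    def mendelian_probs(g_mother, g_father):
        # P(child carries 0, 1 or 2 copies of the candidate allele | parental allele counts).
        p1, p2 = g_mother / 2.0, g_father / 2.0
        return np.array([(1 - p1) * (1 - p2),
                         p1 * (1 - p2) + (1 - p1) * p2,
                         p1 * p2])

    def neg_trio_loglik(psi, trios):
        # Conditional likelihood: Mendelian probabilities re-weighted by relative risks (1, psi1, psi2).
        rr = np.array([1.0, psi[0], psi[1]])
        ll = 0.0
        for child, g_mother, g_father in trios:
            probs = mendelian_probs(g_mother, g_father) * rr
            ll += np.log(probs[child] / probs.sum())
        return -ll

    # (child, mother, father) allele counts for affected offspring; invented data.
    trios = [(1, 1, 1), (2, 1, 2), (1, 0, 1), (1, 1, 2), (0, 1, 1)]
    res = minimize(neg_trio_loglik, x0=np.array([1.0, 1.0]), args=(trios,),
                   bounds=[(1e-6, None), (1e-6, None)], method="L-BFGS-B")
    psi1_hat, psi2_hat = res.x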

19.
Ren F, Tanaka H, Yang Z. Gene 2009, 441(1-2):119-125
Supermatrix and supertree methods are two strategies advocated for phylogenetic analysis of sequence data from multiple gene loci, especially when some species are missing at some loci. The supermatrix method concatenates sequences from multiple genes into a data supermatrix for phylogenetic analysis and ignores differences in evolutionary dynamics among the genes. The supertree method analyzes each gene separately and assembles the subtrees estimated from individual genes into a supertree for all species. Most algorithms suggested for supertree construction lack statistical justification and ignore uncertainties in the subtrees. Instead of supermatrix or supertree, we advocate the use of the likelihood function to combine data from multiple genes while accommodating their differences in the evolutionary process. This combines the strengths of the supermatrix and supertree methods while avoiding their drawbacks. We conduct computer simulation to evaluate the performance of the supermatrix, supertree, and maximum likelihood methods applied to two phylogenetic problems: molecular-clock dating of species divergences and reconstruction of species phylogenies. The results confirm the theoretical superiority of the likelihood method. Supertree or separate analyses of data from multiple genes may be useful in revealing the characteristics of the evolutionary process at multiple gene loci, and this information may be used to formulate realistic models for combined analysis of all genes by likelihood.

20.
Statistical models are the traditional choice to test scientific theories when observations, processes or boundary conditions are subject to stochasticity. Many important systems in ecology and biology, however, are difficult to capture with statistical models. Stochastic simulation models offer an alternative, but they were hitherto associated with a major disadvantage: their likelihood functions can usually not be calculated explicitly, and thus it is difficult to couple them to well-established statistical theory such as maximum likelihood and Bayesian statistics. A number of new methods, among them Approximate Bayesian Computing and Pattern-Oriented Modelling, bypass this limitation. These methods share three main principles: aggregation of simulated and observed data via summary statistics, likelihood approximation based on the summary statistics, and efficient sampling. We discuss principles as well as advantages and caveats of these methods, and demonstrate their potential for integrating stochastic simulation models into a unified framework for statistical modelling.
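The three principles can be illustrated with a bare-bones Approximate Bayesian Computation rejection sampler, sketched below for a toy Poisson simulator (chosen only so the example runs quickly; real applications use stochastic simulation models without tractable likelihoods). The prior, summary statistics, and tolerance are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(3)

    # 'Observed' data and its summary statistics (here: mean and variance).
    y_obs = rng.poisson(4.0, size=100)
    s_obs = np.array([y_obs.mean(), y_obs.var()])

    def simulate(theta, size=100):
        # Stand-in stochastic simulation model whose likelihood we treat as unavailable.
        return rng.poisson(theta, size=size)

    # Rejection ABC: keep prior draws whose simulated summaries fall within the tolerance.
    accepted = []
    for theta in rng.uniform(0.0, 10.0, 20000):
        y_sim = simulate(theta)
        s_sim = np.array([y_sim.mean(), y_sim.var()])
        if np.linalg.norm(s_sim - s_obs) < 0.5:
            accepted.append(theta)
    posterior_sample = np.array(accepted)     # approximate posterior sample for theta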
