Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
We consider longitudinal studies in which the outcome observed over time is binary and the covariates of interest are categorical. With no missing responses or covariates, one specifies a multinomial model for the responses given the covariates and uses maximum likelihood to estimate the parameters. Unfortunately, incomplete data in the responses and covariates are a common occurrence in longitudinal studies. Here we assume the missing data are missing at random (Rubin, 1976, Biometrika 63, 581-592). Since all of the missing data (responses and covariates) are categorical, a useful technique for obtaining maximum likelihood parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). In using the EM algorithm with missing responses and covariates, one specifies the joint distribution of the responses and covariates. Here we consider the parameters of the covariate distribution as a nuisance. In data sets where the percentage of missing data is high, the estimates of the nuisance parameters can lead to highly unstable estimates of the parameters of interest. We propose a conditional model for the covariate distribution that has several modeling advantages for the EM algorithm and provides a reduction in the number of nuisance parameters, thus providing more stable estimates in finite samples.

2.
The paper presents a method of multivariate data analysis described by a model which involves fixed effects, additive polygenic individual effects and the effects of a major gene. To estimate the model parameters, the likelihood function is maximized, and its maximum is computed using the Gibbs sampling approach. In this approach, values of all unknown parameters are generated from their conditional posterior distributions. On the basis of the obtained samples, the marginal posterior densities as well as the estimates of fixed effects, gene frequency, genotypic values, and major gene, polygenic and error (co)variances are calculated. A numerical example, supplementing the theoretical considerations, deals with data simulated according to the considered model.

3.
The posterior probability of linkage (PPL) statistic has been developed as a method for the rigorous accumulation of evidence for or against linkage allowing for both intra- and inter-sample heterogeneity. To date, the method has assumed linkage equilibrium between alleles at the trait locus and the marker locus. We now generalize the PPL to allow for linkage disequilibrium (LD), by incorporating variable phase probabilities into the underlying linkage likelihood. This enables us to recover the marginal posterior density of the recombination fraction, integrating out nuisance parameters of the trait model, including the locus heterogeneity (admixture) parameter, as well as a vector of LD parameters. The marginal posterior density can then be updated across data subsets or new data as they become available, while allowing parameters of the trait model to vary between data sets. The method applies immediately to general pedigree structures and to markers with multiple alleles. In the case of SNPs, the likelihood is parameterized in terms of the standard single LD parameter D', and it therefore affords a mechanism for estimation of D' between the marker and the trait, again, without fixing the parameters of the trait model and allowing for updating across data sets. It is even possible to allow for a different associated allele in different populations, while accumulating information regarding the strength of LD. While a computationally efficient implementation for multi-allelic markers is still in progress, we have implemented a version of this new LD-PPL for SNPs and evaluated its performance in nuclear families. Our simulations show that LD-PPLs tend to be larger than PPLs (stronger evidence in favor of linkage/LD) with increased LD level, under a variety of generating models; while in the absence of linkage and LD, LD-PPLs tend to be smaller than PPLs (stronger evidence against linkage). The estimate of D' also behaves well even in relatively small, heterogeneous samples.

4.
Mapping quantitative trait loci (QTL) underlying complex discrete traits, which usually show discontinuous distributions and carry less information, is challenging with conventional statistical methods. The Bayesian-Markov chain Monte Carlo (Bayesian-MCMC) approach is the key procedure in mapping QTL for complex binary traits; it provides a complete posterior distribution for QTL parameters using all prior information. As a consequence, Bayesian estimates of all variables of interest can be obtained straightforwardly based on their posterior samples simulated by the MCMC algorithm. In our study, the utility of Bayesian-MCMC is demonstrated using several simulated outbred full-sib animal families with different family structures for a complex binary trait underlain by both a QTL and a polygene. Under the identity-by-descent-based variance component random model, three MCMC-based samplers, namely Gibbs sampling, the Metropolis algorithm and reversible jump MCMC, were implemented to generate the joint posterior distribution of all unknowns, so that the QTL parameters were obtained by Bayesian statistical inference. The results showed that the Bayesian-MCMC approach works well and is robust under different family structures and QTL effects. As family size increases and the number of families decreases, the accuracy of the parameter estimates improves. When the true QTL has a small effect, an outbred population experimental design with large family size is the optimal mapping strategy.

5.
Markov chain Monte Carlo (MCMC) techniques are applied to simultaneously identify multiple quantitative trait loci (QTL) and the magnitude of their effects. Using a Bayesian approach a multi-locus model is fit to quantitative trait and molecular marker data, instead of fitting one locus at a time. The phenotypic trait is modeled as a linear function of the additive and dominance effects of the unknown QTL genotypes. Inference summaries for the locations of the QTL and their effects are derived from the corresponding marginal posterior densities obtained by integrating the likelihood, rather than by optimizing the joint likelihood surface. This is done using MCMC by treating the unknown QTL genotypes, and any missing marker genotypes, as augmented data and then by including these unknowns in the Markov chain cycle along with the unknown parameters. Parameter estimates are obtained as means of the corresponding marginal posterior densities. High posterior density regions of the marginal densities are obtained as confidence regions. We examine flowering time data from double haploid progeny of Brassica napus to illustrate the proposed method.

6.
The problem of ascertainment for linkage analysis.   Total citations: 2 (self-citations: 0; citations by others: 2)
It is generally believed that ascertainment corrections are unnecessary in linkage analysis, provided individuals are selected for study solely on the basis of trait phenotype and not on the basis of marker genotype. The theoretical rationale for this is that standard linkage analytic methods involve conditioning likelihoods on all the trait data, which may be viewed as an application of the ascertainment assumption-free (AAF) method of Ewens and Shute. In this paper, we show that when the observed pedigree structure depends on which relatives within a pedigree happen to have been the probands (proband-dependent, or PD, sampling) conditioning on all the trait data is not a valid application of the AAF method and will result in asymptotically biased estimates of genetic parameters (except under single ascertainment). Furthermore, this result holds even if the recombination fraction R is the only parameter of interest. Since the lod score is proportional to the likelihood of the marker data conditional on all the trait data, this means that when data are obtained under PD sampling the lod score will yield asymptotically biased estimates of R, and that so-called mod scores (i.e., lod scores maximized over both R and parameters theta of the trait distribution) will yield asymptotically biased estimates of R and theta. Furthermore, the problem appears to be intractable, in the sense that it is not possible to formulate the correct likelihood conditional on observed pedigree structure. In this paper we do not investigate the numerical magnitude of the bias, which may be small in many situations. On the other hand, virtually all linkage data sets are collected under PD sampling. Thus, the existence of this bias will be the rule rather than the exception in the usual applications.

7.
Mapping quantitative trait loci (QTL) underlying complex discrete traits, which usually show discontinuous distributions and carry less information, is challenging with conventional statistical methods. The Bayesian-Markov chain Monte Carlo (Bayesian-MCMC) approach is the key procedure in mapping QTL for complex binary traits; it provides a complete posterior distribution for QTL parameters using all prior information. As a consequence, Bayesian estimates of all variables of interest can be obtained straightforwardly based on their posterior samples simulated by the MCMC algorithm. In our study, the utility of Bayesian-MCMC is demonstrated using several simulated outbred full-sib animal families with different family structures for a complex binary trait underlain by both a QTL and a polygene. Under the identity-by-descent-based variance component random model, three MCMC-based samplers, namely Gibbs sampling, the Metropolis algorithm and reversible jump MCMC, were implemented to generate the joint posterior distribution of all unknowns, so that the QTL parameters were obtained by Bayesian statistical inference. The results showed that the Bayesian-MCMC approach works well and is robust under different family structures and QTL effects. As family size increases and the number of families decreases, the accuracy of the parameter estimates improves. When the true QTL has a small effect, an outbred population experimental design with large family size is the optimal mapping strategy.

8.
9.
To many, the foundations of statistical inference are cryptic and irrelevant to routine statistical practice. The analysis of 2 x 2 contingency tables, omnipresent in the scientific literature, is a case in point. Fisher's exact test is routinely used even though it has been fraught with controversy for over 70 years. The problem, not widely acknowledged, is that several different p-values can be associated with a single table, making scientific inference inconsistent. The root cause of this controversy lies in the table's origins and the manner in which nuisance parameters are eliminated. However, fundamental statistical principles (e.g., sufficiency, ancillarity, conditionality, and likelihood) can shed light on the controversy and guide our approach in using this test. In this paper, we use these fundamental principles to show how much information is lost when the table's origins are ignored and when various approaches are used to eliminate unknown nuisance parameters. We present novel likelihood contours to aid in the visualization of information loss and show that the information loss is often virtually non-existent. We find that problems arising from the discreteness of the sample space are exacerbated by p-value-based inference. Accordingly, methods that are less sensitive to this discreteness - likelihood ratios, posterior probabilities and mid-p-values - lead to more consistent inferences.
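To make the point about discreteness concrete, the sketch below computes both the ordinary one-sided Fisher exact p-value and the corresponding mid-p-value for a single 2 x 2 table. It is only an illustration under stated assumptions: the table (3, 1, 1, 3) and the function names are hypothetical and do not come from the paper, and the mid-p-value is taken in its usual form (half the probability of the observed table plus the probability of more extreme tables).

```python
from math import comb


def hypergeom_pmf(k, row1, col1, n):
    """P(top-left cell = k) when both margins are fixed (Fisher's conditional model)."""
    return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)


def one_sided_p_values(a, b, c, d):
    """Return (exact_p, mid_p) for the alternative that the top-left cell a is large.

    The table is [[a, b], [c, d]]; the example values used below are hypothetical.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    k_max = min(row1, col1)
    p_obs = hypergeom_pmf(a, row1, col1, n)
    # Probability of tables strictly more extreme than the observed one.
    tail = sum(hypergeom_pmf(k, row1, col1, n) for k in range(a + 1, k_max + 1))
    exact_p = p_obs + tail        # counts the observed table in full
    mid_p = 0.5 * p_obs + tail    # counts the observed table at half weight
    return exact_p, mid_p


exact_p, mid_p = one_sided_p_values(3, 1, 1, 3)
print(f"exact p = {exact_p:.4f}, mid-p = {mid_p:.4f}")
```

With so few possible tables, the observed table alone carries a large share of the exact p-value, which is why the two numbers differ noticeably in small samples.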

10.
Stratified data arise in several settings, such as longitudinal studies or multicenter clinical trials. Between-strata heterogeneity is usually addressed by random effects models, but an alternative approach is given by fixed effects models, which treat the incidental nuisance parameters as fixed unknown quantities. This approach presents several advantages, like computational simplicity and robustness to confounding by strata. However, maximum likelihood estimates of the parameter of interest are typically affected by incidental parameter bias. A remedy to this is given by the elimination of stratum-specific parameters by exact or approximate conditioning. The latter solution is afforded by the modified profile likelihood, which is the method applied in this paper. The aim is to demonstrate how the theory of modified profile likelihoods provides convenient solutions to various inferential problems in this setting. Specific procedures are available for different kinds of response variables, and they are useful both for inferential purposes and as a diagnostic method for validating random effects models. Some examples with real data illustrate these points.

11.
Yi N, George V, Allison DB. Genetics, 2003, 164(3): 1129-1138
In this article, we utilize stochastic search variable selection methodology to develop a Bayesian method for identifying multiple quantitative trait loci (QTL) for complex traits in experimental designs. The proposed procedure entails embedding multiple regression in a hierarchical normal mixture model, where latent indicators for all markers are used to identify the multiple markers. The markers with significant effects can be identified as those with higher posterior probability of inclusion in the model. A simple and easy-to-use Gibbs sampler is employed to generate samples from the joint posterior distribution of all unknowns including the latent indicators, genetic effects for all markers, and other model parameters. The proposed method was evaluated using simulated data and illustrated using a real data set. The results demonstrate that the proposed method works well under typical situations of most QTL studies in terms of number of markers and marker density.

12.
Kitakado T, Kitada S, Kishino H, Skaug HJ. Genetics, 2006, 173(4): 2073-2082
The aim of this article is to develop an integrated-likelihood (IL) approach to estimate the genetic differentiation between populations. The conventional maximum-likelihood (ML) and pseudolikelihood (PL) methods that use sample counts of alleles may cause severe underestimations of FST, which means overestimations of theta=4Nm, when the number of sampling localities is small. To reduce such bias in the estimation of genetic differentiation, we propose an IL method in which the mean allele frequencies over populations are regarded as nuisance parameters and are eliminated by integration. To maximize the IL function, we have developed two algorithms, a Monte Carlo EM algorithm and a Laplace approximation. Our simulation studies show that the method proposed here outperforms the conventional ML and PL methods in terms of unbiasedness and precision. The IL method was applied to real data for Pacific herring and African elephants.
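The link between underestimating FST and overestimating theta can be seen from the familiar island-model approximation, written out here as a reading aid. This is an assumption for illustration: the abstract does not state which relation between FST and theta the authors use, and their exact parameterization may differ.

```latex
% Wright's island-model approximation (an assumption, not taken from the abstract)
F_{ST} \approx \frac{1}{1 + 4Nm} = \frac{1}{1 + \theta}
\quad\Longleftrightarrow\quad
\theta \approx \frac{1}{F_{ST}} - 1 .
```

Because theta is a decreasing function of FST under this approximation, an estimate of FST that is too small translates directly into an estimate of theta that is too large.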

13.
MOTIVATION: The classification of samples using gene expression profiles is an important application in areas such as cancer research and environmental health studies. However, the classification is usually based on a small number of samples, and each sample is a long vector of thousands of gene expression levels. An important issue in parametric modeling for so many gene expression levels is the control of the number of nuisance parameters in the model. Large models often lead to intensive or even intractable computation, while small models may be inadequate for complex data. METHODOLOGY: We propose a two-step empirical Bayes classification method as a solution to this issue. At the first step, we use the model-based cluster algorithm with a non-traditional purpose of assigning gene expression levels to form abundance groups. At the second step, by assuming the same variance for all the genes in the same group, we substantially reduce the number of nuisance parameters in our statistical model. RESULTS: The proposed model is more parsimonious, which leads to efficient computation under an empirical Bayes estimation procedure. We consider two real examples and simulate data using our method. Desired low classification error rates are obtained even when a large number of genes are pre-selected for class prediction.

14.
Outcome-dependent sampling (ODS) schemes can be a cost effective way to enhance study efficiency. The case-control design has been widely used in epidemiologic studies. However, when the outcome is measured on a continuous scale, dichotomizing the outcome could lead to a loss of efficiency. Recent epidemiologic studies have used ODS sampling schemes where, in addition to an overall random sample, there are also a number of supplemental samples that are collected based on a continuous outcome variable. We consider a semiparametric empirical likelihood inference procedure in which the underlying distribution of covariates is treated as a nuisance parameter and is left unspecified. The proposed estimator has asymptotic normality properties. The likelihood ratio statistic using the semiparametric empirical likelihood function has Wilks-type properties in that, under the null, it follows a chi-square distribution asymptotically and is independent of the nuisance parameters. Our simulation results indicate that, for data obtained using an ODS design, the semiparametric empirical likelihood estimator is more efficient than conditional likelihood and probability weighted pseudolikelihood estimators and that ODS designs (along with the proposed estimator) can produce more efficient estimates than simple random sample designs of the same size. We apply the proposed method to analyze a data set from the Collaborative Perinatal Project (CPP), an ongoing environmental epidemiologic study, to assess the relationship between maternal polychlorinated biphenyl (PCB) level and children's IQ test performance.

15.
Wu R, Li B, Wu SS, Casella G. Biometrics, 2001, 57(3): 764-768
In this article, we present a maximum likelihood-based analytical approach for detecting a major gene of large effect on a quantitative trait in a progeny population derived from a mating design. Our analysis is based on a mixed genetic model specifying both major gene and background polygenic inheritance. The likelihood of the data is formulated by combining the information about population behaviors of the major gene during hybridization and its phenotypic distribution densities. The EM algorithm is implemented to obtain maximum likelihood estimates for population and quantitative genetic parameters of the major locus. This approach is applied to detect an overdominant gene governing stem volume growth in a factorial mating design of aspen trees. It is suggested that further molecular genetic research toward mapping single genes affecting aspen growth and production based on the same experimental data has a high probability of success.

16.
Estimating species trees using multiple-allele DNA sequence data   Total citations: 3 (self-citations: 0; citations by others: 3)
Several techniques, such as concatenation and consensus methods, are available for combining data from multiple loci to produce a single statement of phylogenetic relationships. However, when multiple alleles are sampled from individual species, it becomes more challenging to estimate relationships at the level of species, either because concatenation becomes inappropriate due to conflicts among individual gene trees, or because the species from which multiple alleles have been sampled may not form monophyletic groups in the estimated tree. We propose a Bayesian hierarchical model to reconstruct species trees from multiple-allele, multilocus sequence data, building on a recently proposed method for estimating species trees from single allele multilocus data. A two-step Markov Chain Monte Carlo (MCMC) algorithm is adopted to estimate the posterior distribution of the species tree. The model is applied to estimate the posterior distribution of species trees for two multiple-allele datasets--yeast (Saccharomyces) and birds (Manacus-manakins). The estimates of the species trees using our method are consistent with those inferred from other methods and genetic markers, but in contrast to other species tree methods, it provides credible regions for the species tree. The Bayesian approach described here provides a powerful framework for statistical testing and integration of population genetics and phylogenetics.

17.
B R Smith, C M Herbinger, H R Merry. Genetics, 2001, 158(3): 1329-1338
Two Markov chain Monte Carlo algorithms are proposed that allow the partitioning of individuals into full-sib groups using single-locus genetic marker data when no parental information is available. These algorithms present a method of moving through the sibship configuration space and locating the configuration that maximizes an overall score on the basis of pairwise likelihood ratios of being full-sib or unrelated or maximizes the full joint likelihood of the proposed family structure. Using these methods, up to 757 out of 759 Atlantic salmon were correctly classified into 12 full-sib families of unequal size using four microsatellite markers. Large-scale simulations were performed to assess the sensitivity of the procedures to the number of loci and number of alleles per locus, the allelic distribution type, the distribution of families, and the independent knowledge of population allelic frequencies. The number of loci and the number of alleles per locus had the most impact on accuracy. Very good accuracy can be obtained with as few as four loci when they have at least eight alleles. Accuracy decreases when using allelic frequencies estimated in small target samples with skewed family distributions with the pairwise likelihood approach. We present an iterative approach that partly corrects that problem. The full likelihood approach is less sensitive to the precision of allelic frequency estimates but did not perform as well with the large data set or when little information was available (e.g., four loci with four alleles).

18.
Y Hochberg, I Marom, R Keret, S Peleg. Biometrics, 1983, 39(1): 97-107
Two new estimators for calibrating unknowns from dose-response curves, in a system of quality-controlled assays, are examined. In contrast with the conventional estimator which uses only the results of the one assay in which the response of the unknown dose is measured, the new estimators also utilize the results of all other assays through the replications of the control samples in the system. The first estimator is based on maximizing the likelihood of the given system (with respect to the different dose-response parameters, the levels of the control samples and the levels of the unknowns) when response errors are normally distributed. The second estimator is a regression-like estimator obtained by subtracting from the conventional estimator its estimated regression on the deviation of the calibrated control levels in the given assay from their average values in the system. Evaluations of the reductions in bias and variance attained by the new estimators show when substantial reductions in mean square error can be expected. The new estimators are illustrated with a system of 22 hFSH radioimmunoassays.

19.
Approximate Bayesian computation in population genetics   Total citations: 23 (self-citations: 0; citations by others: 23)
Beaumont MA, Zhang W, Balding DJ. Genetics, 2002, 162(4): 2025-2035
We propose a new method for approximate Bayesian statistical inference on the basis of summary statistics. The method is suited to complex problems that arise in population genetics, extending ideas developed in this setting by earlier authors. Properties of the posterior distribution of a parameter, such as its mean or density curve, are approximated without explicit likelihood calculations. This is achieved by fitting a local-linear regression of simulated parameter values on simulated summary statistics, and then substituting the observed summary statistics into the regression equation. The method combines many of the advantages of Bayesian statistical inference with the computational efficiency of methods based on summary statistics. A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty. Simulation results indicate computational and statistical efficiency that compares favorably with those of alternative methods previously proposed in the literature. We also compare the relative efficiency of inferences obtained using methods based on summary statistics with those obtained directly from the data using MCMC.
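The regression step described in this abstract can be sketched on a toy problem. Everything specific in the sketch is an assumption for illustration and is not taken from the paper: the model (observations from N(theta, 1) with the sample mean as the only summary statistic), the Uniform(-5, 5) prior, the Epanechnikov weights, and all function and parameter names.

```python
import random


def simulate_summary(theta, n, rng):
    """Simulate n observations from N(theta, 1) and return their sample mean."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


def abc_local_linear(s_obs, n, n_sims=5000, n_keep=500, seed=1):
    """Toy ABC with a weighted local-linear regression adjustment (hypothetical helper)."""
    rng = random.Random(seed)
    # Step 1: draw parameters from the prior and simulate one summary for each draw.
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)
        draws.append((theta, simulate_summary(theta, n, rng)))
    # Step 2: keep the simulations whose summary lies closest to the observed one.
    draws.sort(key=lambda ts: abs(ts[1] - s_obs))
    kept = draws[:n_keep]
    delta = abs(kept[-1][1] - s_obs)  # largest accepted distance
    # Epanechnikov-type weights: larger for simulations closer to s_obs.
    weights = [1.0 - ((s - s_obs) / delta) ** 2 for _, s in kept]
    sw = sum(weights)
    # Step 3: weighted least-squares slope of theta on the summary statistic.
    mean_s = sum(w * s for w, (_, s) in zip(weights, kept)) / sw
    mean_t = sum(w * t for w, (t, _) in zip(weights, kept)) / sw
    sxy = sum(w * (s - mean_s) * (t - mean_t) for w, (t, s) in zip(weights, kept))
    sxx = sum(w * (s - mean_s) ** 2 for w, (_, s) in zip(weights, kept))
    beta = sxy / sxx
    # Step 4: shift each accepted theta to where it would sit if s had equalled s_obs.
    adjusted = [t - beta * (s - s_obs) for t, s in kept]
    return adjusted, weights, beta


adjusted, weights, beta = abc_local_linear(s_obs=1.0, n=20)
posterior_mean = sum(w * t for w, t in zip(weights, adjusted)) / sum(weights)
print(f"slope = {beta:.3f}, approximate posterior mean = {posterior_mean:.3f}")
```

No likelihood is evaluated anywhere in the sketch; only simulation from the model is needed, which is the property the abstract emphasizes. In this toy model the exact posterior is approximately N(1.0, 1/20), so the adjusted draws can be compared against a known answer.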

20.
Cho H, Ibrahim JG, Sinha D, Zhu H. Biometrics, 2009, 65(1): 116-124
We propose Bayesian case influence diagnostics for complex survival models. We develop case deletion influence diagnostics for both the joint and marginal posterior distributions based on the Kullback-Leibler divergence (K-L divergence). We present a simplified expression for computing the K-L divergence between the posterior with the full data and the posterior based on single case deletion, as well as investigate its relationships to the conditional predictive ordinate. All the computations for the proposed diagnostic measures can be easily done using Markov chain Monte Carlo samples from the full data posterior distribution. We consider the Cox model with a gamma process prior on the cumulative baseline hazard. We also present a theoretical relationship between our case-deletion diagnostics and diagnostics based on Cox's partial likelihood. A simulated data example and two real data examples are given to demonstrate the methodology.
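The idea of computing case-deletion diagnostics from full-data posterior samples alone can be sketched as follows. The sketch relies on a standard identity for observations that are conditionally independent given the parameters, K-L_i = E[log f(y_i | theta)] - log CPO_i with CPO_i = 1 / E[1 / f(y_i | theta)], both expectations taken under the full-data posterior; whether this matches the simplified expression in the paper is an assumption. The toy normal model, the data values and the function names are likewise hypothetical, and exact posterior draws stand in for MCMC output.

```python
import math
import random


def log_normal_pdf(y, mu, sigma=1.0):
    """Log density of N(mu, sigma^2) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * ((y - mu) / sigma) ** 2


def case_deletion_diagnostics(y, posterior_draws):
    """Return a list of (CPO_i, KL_i) pairs, one per observation.

    Only draws from the full-data posterior are used; no model is refitted.
    """
    results = []
    for y_i in y:
        log_f = [log_normal_pdf(y_i, mu) for mu in posterior_draws]
        mean_log_f = sum(log_f) / len(log_f)
        # Posterior mean of 1 / f(y_i | mu); its reciprocal is the CPO estimate.
        mean_inv_f = sum(math.exp(-lf) for lf in log_f) / len(log_f)
        log_cpo = -math.log(mean_inv_f)
        results.append((math.exp(log_cpo), mean_log_f - log_cpo))
    return results


# Hypothetical data: five ordinary observations and one outlier.
y = [0.1, -0.3, 0.5, 0.2, -0.1, 4.0]
rng = random.Random(0)
y_bar = sum(y) / len(y)
# With a flat prior and unit variance the posterior of mu is N(y_bar, 1/n);
# exact draws are used here in place of MCMC samples.
draws = [rng.gauss(y_bar, 1.0 / math.sqrt(len(y))) for _ in range(5000)]
diagnostics = case_deletion_diagnostics(y, draws)
for y_i, (cpo, kl) in zip(y, diagnostics):
    print(f"y = {y_i:5.1f}  CPO = {cpo:.4f}  K-L = {kl:.4f}")
```

A small CPO together with a large K-L value flags an observation as influential; in this toy data set that is the outlying value 4.0.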
