Similar Articles
 20 similar articles found (search time: 15 ms)
1.
In the mixed linear model, the random effects are usually assumed to be normally distributed in both the Bayesian and classical frameworks. In this paper, the Dirichlet process prior was used to provide nonparametric Bayesian estimates for correlated random effects. This goal was achieved by providing a Gibbs sampler algorithm that allows these correlated random effects to have a nonparametric prior distribution. A sampling-based method is illustrated; it works by transforming the genetic covariance matrix to an identity matrix so that the random effects are uncorrelated, and it extends the theory and results of previous researchers. In addition, a simulation procedure based on Gibbs sampling and data augmentation was derived for estimating the precision parameter M associated with the Dirichlet process prior. All needed conditional posterior distributions are given. To illustrate the application, data from the Elsenburg Dormer sheep stud were analysed: a total of 3325 weaning weight records from the progeny of 101 sires.
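The transformation step described in this abstract can be sketched briefly: a Cholesky factor of a genetic covariance (relationship) matrix turns correlated random effects into uncorrelated ones. This is only an illustration of the decorrelation idea, not the paper's sampler; the matrix A and all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x3 genetic covariance (relationship) matrix A.
A = np.array([[1.0, 0.5, 0.25],
              [0.5, 1.0, 0.5],
              [0.25, 0.5, 1.0]])

L = np.linalg.cholesky(A)  # A = L @ L.T

# Correlated random effects a ~ N(0, A) can be written a = L @ u
# with u ~ N(0, I); solving L u = a decorrelates them.
a = L @ rng.standard_normal(3)
u = np.linalg.solve(L, a)

# The implied covariance of u is L^{-1} A L^{-T}, which is the identity.
implied_cov = np.linalg.solve(L, np.linalg.solve(L, A).T)
```

After this transformation the sampler can treat the effects as uncorrelated and map back with a = L @ u afterwards.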

2.
Carey VJ, Baker CJ, Platt R. Biometrics 2001;57(1):135-142
In the study of immune responses to infectious pathogens, the minimum protective antibody concentration (MPAC) is a quantity of great interest. We use case-control data to estimate the posterior distribution of the conditional risk of disease given a lower bound on antibody concentration in an at-risk subject. The concentration bound beyond which there is high credibility that infection risk is zero or nearly so is a candidate for the MPAC. A very simple Gibbs sampling procedure that permits inference on the risk of disease given antibody level is presented. In problems involving small numbers of patients, the procedure is shown to have favorable accuracy and robustness to choice/misspecification of priors. Frequentist evaluation indicates good coverage probabilities of credibility intervals for antibody-dependent risk, and rules for estimation of the MPAC are illustrated with epidemiological data.
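The kind of posterior inference the abstract describes can be illustrated with a conjugate sketch: assuming a Beta prior on the conditional risk of disease among subjects above a candidate concentration bound, the posterior is again Beta and trivial to sample. The counts and the prior below are invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical counts among at-risk subjects with antibody
# concentration above a candidate bound: diseased vs. healthy.
n_diseased, n_healthy = 1, 49

# Beta(a0, b0) prior on the conditional risk of disease; the Beta
# posterior is conjugate to the binomial likelihood.
a0, b0 = 1.0, 1.0
posterior = rng.beta(a0 + n_diseased, b0 + n_healthy, size=20_000)

# Posterior credibility that the risk is below 10% at this bound.
cred_low_risk = np.mean(posterior < 0.10)
```

A bound at which this credibility is very high would be a candidate MPAC in the sense the abstract describes.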

3.
An experiment design procedure is proposed for nonlinear parameter estimation studies that formally incorporates prior parameter uncertainty. The design criterion derives from information theory considerations and involves an asymptotic interpretation of the expected posterior information provided by an experiment. A pharmacokinetic sample schedule design problem is used to illustrate and evaluate this information theoretic design strategy. The model considered is commonly used to describe the plasma concentration of a drug following its oral administration. The limitations and advantages of the proposed design procedure are discussed in relation to other previously reported design techniques for incorporating parameter uncertainty.

4.
Dunson DB, Chen Z. Biometrics 2004;60(2):352-358
In multivariate survival analysis, investigators are often interested in testing for heterogeneity among clusters, both overall and within specific classes. We represent different hypotheses about the heterogeneity structure using a sequence of gamma frailty models, ranging from a null model with no random effects to a full model having random effects for each class. Following a Bayesian approach, we define prior distributions for the frailty variances consisting of mixtures of point masses at zero and inverse-gamma densities. Since frailties with zero variance effectively drop out of the model, this prior allocates probability to each model in the sequence, including the overall null hypothesis of homogeneity. Using a counting process formulation, the conditional posterior distributions of the frailties and proportional hazards regression coefficients have simple forms. Posterior computation proceeds via a data augmentation Gibbs sampling algorithm, a single run of which can be used to obtain model-averaged estimates of the population parameters and posterior model probabilities for testing hypotheses about the heterogeneity structure. The methods are illustrated using data from a lung cancer trial.

5.
Inferring speciation times under an episodic molecular clock
We extend our recently developed Markov chain Monte Carlo algorithm for Bayesian estimation of species divergence times to allow variable evolutionary rates among lineages. The method can use heterogeneous data from multiple gene loci and accommodate multiple fossil calibrations. Uncertainties in fossil calibrations are described using flexible statistical distributions. The prior for divergence times for nodes lacking fossil calibrations is specified by use of a birth-death process with species sampling. The prior for lineage-specific substitution rates is specified using either a model with autocorrelated rates among adjacent lineages (based on a geometric Brownian motion model of rate drift) or a model with independent rates among lineages specified by a log-normal probability distribution. We develop an infinite-sites theory, which predicts that when the amount of sequence data approaches infinity, the width of the posterior credibility interval and the posterior mean of divergence times form a perfect linear relationship, with the slope indicating uncertainties in time estimates that cannot be reduced by sequence data alone. Simulations are used to study the influence of among-lineage rate variation and the number of loci sampled on the uncertainty of divergence time estimates. The analysis suggests that posterior time estimates typically involve considerable uncertainties even with an infinite amount of sequence data, and that the reliability and precision of fossil calibrations are critically important to divergence time estimation. We apply our new algorithms to two empirical data sets and compare the results with those obtained in previous Bayesian and likelihood analyses. The results demonstrate the utility of our new algorithms.

6.
The activity of a neural net is represented in terms of a matrix vector equation with a normalizing operator in which the matrix represents only the complete structure of the net, and the normalized vector-matrix product represents the activity of all the non-afferent neurons. The activity vectors are functions of a quantized time variable whose elements are zero (no activity) or one (activity). Certain properties of the structure matrix are discussed and the computational procedure which results from the matrix vector equation is illustrated by a specific example.
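A minimal sketch of the representation described: a structure matrix, a quantized time variable, and a normalizing operator mapping summed input to 0/1 activity. The 4-neuron net W and the threshold of 1 are hypothetical, chosen only to show the iteration.

```python
import numpy as np

# Structure matrix W: W[i, j] = 1 if neuron j excites neuron i
# (a hypothetical 4-neuron net; inhibitory links could use -1).
W = np.array([[0, 0, 1, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]])

def normalize(v, threshold=1):
    """Normalizing operator: activity is 1 where the summed input
    meets the threshold, 0 otherwise (quantized time steps)."""
    return (v >= threshold).astype(int)

# Activity vector at t = 0: only neuron 0 fires.
x = np.array([1, 0, 0, 0])
history = [x]
for _ in range(5):
    x = normalize(W @ x)   # next activity = normalized matrix-vector product
    history.append(x)
```

Each step is one application of the matrix vector equation; the history is the net's activity over quantized time.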

7.
Bayes decision procedures are considered for change point estimation in the simple bilinear segmented model. A discretized normal prior density is employed as the prior distribution for the change point index. Posterior probability functions are developed for this index under a vague prior formulation on the regression parameters. The procedure is applied to an example involving mercury toxicity data.
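The grid-based posterior for the change-point index can be sketched as follows. Under a vague prior on the regression parameters, the marginal posterior of the join point is proportional to the prior times RSS^(-n/2), a standard vague-prior result used here as an approximation; the simulated data, the prior's centre and spread, and this exact form are all illustrative rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated two-phase (bilinear) data with a change point at x = 6.
x = np.arange(1.0, 13.0)
y = np.where(x <= 6, 2 + 0.5 * x, 2 + 0.5 * 6 - 1.0 * (x - 6)) \
    + rng.normal(0, 0.2, x.size)

def rss_at(tau):
    """Residual sum of squares of the continuous segmented fit
    with the join point fixed at tau."""
    X = np.column_stack([np.ones_like(x), x, np.maximum(x - tau, 0)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

candidates = x[1:-2]                       # interior join points
rss = np.array([rss_at(t) for t in candidates])

# Discretized normal prior on the change-point location.
prior = np.exp(-0.5 * ((candidates - 7.0) / 3.0) ** 2)

# Vague prior on regression parameters: posterior ~ prior * RSS^(-n/2).
log_post = np.log(prior) - (x.size / 2) * np.log(rss)
post = np.exp(log_post - log_post.max())
post /= post.sum()
tau_map = candidates[np.argmax(post)]
```

The posterior mass concentrates near the true join point even with the prior centred elsewhere.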

8.
James H. MacLeod. CMAJ 1966;95(3):114-117
Three methods of blood loss estimation which are simple, accurate and cheap are: (1) weighing of sponges; (2) estimation of blood volume with Evans blue dye; and (3) measurement of central venous pressure (CVP). Weighing of sponges is a valuable operating-room procedure although it has certain defects, some of which are described. The Evans dye method is used chiefly in preoperative assessment when hypovolemia is suspected, but serial estimations are feasible and can be performed in 45 minutes. Measurement of CVP, however, is the best single criterion of effective blood volume in relation to cardiac functional capacity and is the best guide to blood and fluid requirements. A simple “homemade” device for making serial CVP determinations, incorporating a manometer and a zero level, is described. These methods of blood loss estimation do not supersede the traditional methods of the clinical assessment of the surgical patient, but are valuable adjuncts to such assessment.

9.
We implement a Bayesian Markov chain Monte Carlo algorithm for estimating species divergence times that uses heterogeneous data from multiple gene loci and accommodates multiple fossil calibration nodes. A birth-death process with species sampling is used to specify a prior for divergence times, which allows easy assessment of the effects of that prior on posterior time estimates. We propose a new approach for specifying calibration points on the phylogeny, which allows the use of arbitrary and flexible statistical distributions to describe uncertainties in fossil dates. In particular, we use soft bounds, so that the probability that the true divergence time is outside the bounds is small but nonzero. A strict molecular clock is assumed in the current implementation, although this assumption may be relaxed. We apply our new algorithm to two data sets concerning divergences of several primate species, to examine the effects of the substitution model and of the prior for divergence times on Bayesian time estimation. We also conduct computer simulation to examine the differences between soft and hard bounds. We demonstrate that divergence time estimation is intrinsically hampered by uncertainties in fossil calibrations, and the error in Bayesian time estimates will not go to zero with increased amounts of sequence data. Our analyses of both real and simulated data demonstrate potentially large differences between divergence time estimates obtained using soft versus hard bounds and a general superiority of soft bounds. Our main findings are as follows. (1) When the fossils are consistent with each other and with the molecular data, and the posterior time estimates are well within the prior bounds, soft and hard bounds produce similar results. 
(2) When the fossils are in conflict with each other or with the molecules, soft and hard bounds behave very differently; soft bounds allow sequence data to correct poor calibrations, while poor hard bounds are impossible to overcome by any amount of data. (3) Soft bounds eliminate the need for "safe" but unrealistically high upper bounds, which may bias posterior time estimates. (4) Soft bounds allow more reliable assessment of estimation errors, while hard bounds generate misleadingly high precisions when fossils and molecules are in conflict.
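The soft-bound idea in item 9 can be sketched as a density that is uniform between the fossil bounds, with light exponential tails so that a small but nonzero probability lies outside the bounds. The tail mass of 2.5% per side and the exponential tail shape are illustrative choices, not necessarily the paper's exact parameterisation.

```python
import numpy as np

def soft_bound_logpdf(t, t_min, t_max, tail=0.025):
    """Log-density of a soft-bound calibration: uniform between the
    fossil bounds, with exponential tails carrying a small but nonzero
    probability (`tail` on each side) that the true divergence time
    lies outside the bounds."""
    t = np.asarray(t, dtype=float)
    core = (1.0 - 2.0 * tail) / (t_max - t_min)  # uniform density inside
    rate = core / tail   # keeps the density continuous at the bounds
    out = np.empty_like(t)
    inside = (t >= t_min) & (t <= t_max)
    below, above = t < t_min, t > t_max
    out[inside] = np.log(core)
    out[below] = np.log(tail * rate) - rate * (t_min - t[below])
    out[above] = np.log(tail * rate) - rate * (t[above] - t_max)
    return out
```

Because tail * rate equals the core density, the density is continuous at both bounds, and each tail integrates to exactly the stated outside probability.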

10.
Researchers are often interested in predicting outcomes, detecting distinct subgroups of their data, or estimating causal treatment effects. Pathological data distributions that exhibit skewness and zero‐inflation complicate these tasks—requiring highly flexible, data‐adaptive modeling. In this paper, we present a multipurpose Bayesian nonparametric model for continuous, zero‐inflated outcomes that simultaneously predicts structural zeros, captures skewness, and clusters patients with similar joint data distributions. The flexibility of our approach yields predictions that capture the joint data distribution better than commonly used zero‐inflated methods. Moreover, we demonstrate that our model can be coherently incorporated into a standardization procedure for computing causal effect estimates that are robust to such data pathologies. Uncertainty at all levels of this model flow through to the causal effect estimates of interest—allowing easy point estimation, interval estimation, and posterior predictive checks verifying positivity, a required causal identification assumption. Our simulation results show point estimates to have low bias and interval estimates to have close to nominal coverage under complicated data settings. Under simpler settings, these results hold while incurring lower efficiency loss than comparator methods. We use our proposed method to analyze zero‐inflated inpatient medical costs among endometrial cancer patients receiving either chemotherapy or radiation therapy in the SEER‐Medicare database.

11.
Stoicheiometric analysis of metabolic pathways provides a systematic way of determining which metabolite concentrations are subject to constraints, information that may otherwise be very difficult to recognize in a large branched pathway. The procedure involves representing the pathway structure in the form of a matrix and then carrying out row operations to convert the matrix into "row echelon form": this is a form in which as many as possible of the elements on the main diagonal are non-zero, and all of the elements below the main diagonal are zero. If exactly the same operations are carried out on a unit matrix of order equal to the number of intermediate metabolites in the pathway, the resulting matrix allows the stoicheiometric constraints to be read off directly.
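The row-reduction procedure can be reproduced with a computer algebra system: reduce the augmented matrix [N | I] so that the identity block records the row operations applied to the stoicheiometric matrix N. The 3-metabolite, 4-reaction pathway below is invented, and sympy's rref gives the fully reduced (rather than merely upper-triangular) echelon form.

```python
import sympy as sp

# Stoicheiometric matrix N for a small hypothetical pathway:
# rows = intermediate metabolites A, B, C; columns = reactions.
N = sp.Matrix([[1, -1,  0,  0],
               [0,  1, -1, -1],
               [0,  0,  1, -1]])

# Row-reduce N while applying the same row operations to an
# identity matrix, by reducing the augmented matrix [N | I].
aug = N.row_join(sp.eye(3))
R, pivots = aug.rref()

echelon = R[:, :4]   # (reduced) row echelon form of N
ops = R[:, 4:]       # records the row operations performed
```

By construction ops @ N equals the echelon form, so ops is exactly the matrix from which the constraints can be read off, as the abstract describes.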

12.
Linear mixed model (LMM) analysis has recently been used extensively for estimating additive genetic variances and narrow-sense heritability in many genomic studies. Although LMM analysis is computationally less intensive than Bayesian algorithms, it remains infeasible for large-scale genomic data sets. In this paper, we advocate a statistical procedure known as symmetric differences squared (SDS), which may serve as a viable alternative when the LMM methods have difficulty or fail to work with large data sets. The SDS procedure is a general and computationally simple method based only on least squares regression analysis. We carry out computer simulations and empirical analyses to compare the SDS procedure with two commonly used LMM-based procedures. Our results show that the SDS method is not as good as the LMM methods for small data sets, but it improves progressively and can match the precision of LMM estimation for data sets with large sample sizes. Its major advantage is that, as samples grow, it continues to work with increasing precision of estimation, whereas the commonly used LMM methods can no longer run under typical current computing capacity. These results suggest that the SDS method can serve as a viable alternative, particularly when analyzing ‘big’ genomic data sets.

13.
Simulated data were used to investigate the influence of the choice of priors on estimation of genetic parameters in multivariate threshold models using Gibbs sampling. We simulated additive values, residuals and fixed effects for one continuous trait and liabilities of four binary traits, and QTL effects for one of the liabilities. Within each of four replicates, six different datasets were generated which resembled different practical scenarios in horses with respect to number and distribution of animals with trait records and availability of QTL information. (Co)variance components were estimated using a Bayesian threshold animal model via Gibbs sampling. The Gibbs sampler was implemented with both a flat and a proper prior for the genetic covariance matrix. Convergence problems were encountered in more than 50% of the flat-prior analyses, with indications of potential or near posterior impropriety between about rounds 10,000 and 100,000. Terminations due to a non-positive definite genetic covariance matrix occurred in flat-prior analyses of the smallest datasets. Use of a proper prior resulted in improved mixing and convergence of the Gibbs chain. In order to avoid (near) impropriety of posteriors and extremely poorly mixing Gibbs chains, a proper prior should be used for the genetic covariance matrix when implementing the Gibbs sampler.

14.
Ranked set sampling (RSS) is a sampling procedure that can be considerably more efficient than simple random sampling (SRS). When the variable of interest is binary, ranking of the sample observations can be implemented using the estimated probabilities of success obtained from a logistic regression model developed for the binary variable. The main objective of this study is to use substantial data sets to investigate the application of RSS to estimation of a proportion for a population that is different from the one that provides the logistic regression. Our results indicate that precision in estimation of a population proportion is improved through the use of logistic regression to carry out the RSS ranking and, hence, the sample size required to achieve a desired precision is reduced. Further, the choice and the distribution of covariates in the logistic regression model are not overly crucial for the performance of a balanced RSS procedure.
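Balanced RSS with logistic-regression scores can be sketched as: in each cycle, draw one random set per rank, order each set by the auxiliary score, and keep the unit of the corresponding rank. The population, the scores, and the set size below are all simulated for illustration; in the study the scores would come from a logistic regression fitted to a different population.

```python
import numpy as np

rng = np.random.default_rng(3)

def ranked_set_sample(y, scores, set_size, cycles, rng):
    """Balanced ranked set sampling of a binary variable y, ranking
    each random set by an auxiliary score (e.g. predicted success
    probabilities from a logistic regression fitted elsewhere)."""
    picks = []
    for _ in range(cycles):
        for rank in range(set_size):
            idx = rng.choice(y.size, size=set_size, replace=False)
            ordered = idx[np.argsort(scores[idx])]
            picks.append(y[ordered[rank]])  # keep the rank-th judged unit
    return np.array(picks)

# Hypothetical population: success probability rises with a covariate.
x = rng.normal(size=10_000)
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))
y = rng.binomial(1, p)

rss_est = ranked_set_sample(y, scores=p, set_size=3, cycles=40, rng=rng).mean()
```

Because each rank is represented equally often, the sample mean remains an unbiased estimator of the population proportion while the ranking reduces its variance relative to SRS of the same size.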

15.
Calibration is a critical step in every molecular clock analysis, but it has received the least consideration. Bayesian approaches to divergence time estimation make it possible to incorporate the uncertainty in the degree to which fossil evidence approximates the true time of divergence. We explored the impact of different approaches to expressing this relationship, using arthropod phylogeny as an example for which we established novel calibrations. We demonstrate that the parameters distinguishing calibration densities have a major impact upon the prior and posterior of the divergence times, and it is critically important that users evaluate the joint prior distribution of divergence times used by their dating programmes. We illustrate a procedure for deriving calibration densities in Bayesian divergence dating through the use of soft maximum constraints.

16.
Ronald A. Fisher, the founder of maximum likelihood (ML) estimation, criticized Bayes estimation with a uniform prior distribution, because such estimates can be made arbitrary by changing the transformation applied before the analysis. Thus, Bayes estimates lack scientific objectivity, especially when the amount of data is small. However, Bayes estimates can be used as an approximation to the objective ML estimates if an appropriate transformation is chosen that makes the posterior distribution close to a normal distribution. A one-to-one correspondence exists between a uniform prior distribution under a transformed scale and a non-uniform prior distribution under the original scale. For this reason, this Bayes approximation of the ML estimates is essentially identical to estimation using the Jeffreys prior.
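The correspondence mentioned above can be checked numerically in a standard special case: the Jeffreys prior for a binomial proportion, the Beta(1/2, 1/2) kernel, is exactly what a uniform prior on the arcsine-square-root scale induces on the original scale, up to a constant factor.

```python
import numpy as np

p = np.linspace(0.001, 0.999, 999)

# Jeffreys prior kernel on the original scale: Beta(1/2, 1/2).
jeffreys = 1 / np.sqrt(p * (1 - p))

# A uniform prior on the transformed scale phi = arcsin(sqrt(p))
# maps back to the original scale with the Jacobian d(phi)/dp.
jacobian = 1 / (2 * np.sqrt(p * (1 - p)))

ratio = jeffreys / jacobian  # constant => the two priors coincide
```

The constant ratio confirms that a uniform prior on the variance-stabilising scale and the Jeffreys prior on the original scale are the same prior in different coordinates.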

17.

Background

The incorporation of genomic coefficients into the numerator relationship matrix allows estimation of breeding values using all phenotypic, pedigree and genomic information simultaneously. In such a single-step procedure, genomic and pedigree-based relationships have to be compatible. As there are many options to create genomic relationships, there is a question of which is optimal and what the effects of deviations from optimality are.

Methods

Data of litter size (total number born per litter) for 338,346 sows were analyzed. Illumina PorcineSNP60 BeadChip genotypes were available for 1,989 animals. Analyses were carried out with the complete data set and with a subset of genotyped animals and a three-generation pedigree (5,090 animals). A single-trait animal model was used to estimate variance components and breeding values. Genomic relationship matrices were constructed using allele frequencies equal to 0.5 (G05), equal to the average minor allele frequency (GMF), or equal to observed frequencies (GOF). A genomic matrix considering random ascertainment of allele frequencies was also used (GOF*). A normalized matrix (GN) was obtained to have average diagonal coefficients equal to 1. The genomic matrices were combined with the numerator relationship matrix creating H matrices.

Results

In G05 and GMF, both diagonal and off-diagonal elements were on average greater than the pedigree-based coefficients. In GOF and GOF*, the average diagonal elements were smaller than pedigree-based coefficients. The mean of off-diagonal coefficients was zero in GOF and GOF*. Choices of G with average diagonal coefficients different from 1 led to greater estimates of additive variance in the smaller data set. The correlations between EBV and genomic EBV (n = 1,989) were: 0.79 using G05, 0.79 using GMF, 0.78 using GOF, 0.79 using GOF*, and 0.78 using GN. Accuracies calculated by inversion increased with all genomic matrices. The accuracies of genomic-assisted EBV were inflated in all cases except when GN was used.

Conclusions

Parameter estimates may be biased if the genomic relationship coefficients are in a different scale than pedigree-based coefficients. A reasonable scaling may be obtained by using observed allele frequencies and re-scaling the genomic relationship matrix to obtain average diagonal elements of 1.
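The GOF and GN constructions described in the Methods can be sketched in a few lines: a VanRaden-style scaling with observed allele frequencies, then rescaling so the average diagonal equals 1. The genotype matrix below is simulated, and the code is an illustration of the construction rather than the study's software.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical genotypes: 8 animals x 50 SNPs, coded 0/1/2.
M = rng.integers(0, 3, size=(8, 50)).astype(float)

# Observed allele frequencies (as for GOF above).
p = M.mean(axis=0) / 2

# Centre genotypes by 2p and scale by the total heterozygosity.
Z = M - 2 * p
G = Z @ Z.T / (2 * np.sum(p * (1 - p)))

# Re-scale so the average diagonal element equals 1 (as for GN).
G_n = G / np.mean(np.diag(G))
```

Centring by observed frequencies makes the mean off-diagonal relationship near zero, which is the compatibility property with the pedigree-based matrix that the Results discuss.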

18.
The identification of sources of cerebral activity is investigated, and the resulting methodology is applied to the simple case of hippocampal afterdischarges in the cat. We develop an “imagery” technique which consists in defining, in a preliminary step, the number and the power spectrum density of unknown sources (identification of sources) assumed to emit independent signals in the ill-defined noisy cerebral medium. The technique assumes the medium to be quasilinear and quasistationary, and these assumptions have to be checked. It is based upon the interspectral matrix and its diagonal form. It makes it clear that (1) the problem of estimating the number of sources is closely dependent on the estimation method used to assess the power spectrum density, and (2) the coherence matrix should be preferred to the interspectral matrix for reasons linked with the estimation variance of its elements and the proximity of the sources and sensors. In order to assess the validity of the methodology, a source of hippocampal afterdischarges has been created by threshold stimulation of the ventral hippocampus of the cat. The resulting EEG signals are used to show that there is a single source and to estimate its power density spectrum, which can then be compared with the true one.

19.
Recent studies have observed that Bayesian analyses of sequence data sets using the program MrBayes sometimes generate extremely large branch lengths, with posterior credibility intervals for the tree length (sum of branch lengths) excluding the maximum likelihood estimates. Suggested explanations for this phenomenon include the existence of multiple local peaks in the posterior, lack of convergence of the chain in the tail of the posterior, mixing problems, and misspecified priors on branch lengths. Here, we analyze the behavior of Bayesian Markov chain Monte Carlo algorithms when the chain is in the tail of the posterior distribution and note that all these phenomena can occur. In Bayesian phylogenetics, the likelihood function approaches a constant instead of zero when the branch lengths increase to infinity. The flat tail of the likelihood can cause poor mixing and undue influence of the prior. We suggest that the main cause of the extreme branch length estimates produced in many Bayesian analyses is the poor choice of a default prior on branch lengths in current Bayesian phylogenetic programs. The default prior in MrBayes assigns independent and identical distributions to branch lengths, imposing strong (and unreasonable) assumptions about the tree length. The problem is exacerbated by the strong correlation between the branch lengths and parameters in models of variable rates among sites or among site partitions. To resolve the problem, we suggest two multivariate priors for the branch lengths (called compound Dirichlet priors) that are fairly diffuse and demonstrate their utility in the special case of branch length estimation on a star phylogeny. Our analysis highlights the need for careful thought in the specification of high-dimensional priors in Bayesian analyses.
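A compound Dirichlet prior of the kind proposed can be sketched by drawing the tree length from a diffuse gamma distribution and then splitting it among branches with a symmetric Dirichlet, which is exactly the opposite of the i.i.d. default prior criticised above. The hyperparameters below are illustrative, not the paper's recommended values.

```python
import numpy as np

rng = np.random.default_rng(5)

def draw_branch_lengths(n_branches, rng, alpha_T=1.0, beta_T=10.0, conc=1.0):
    """Draw from a compound Dirichlet prior: the tree length T gets a
    (fairly diffuse) gamma prior, and a symmetric Dirichlet splits T
    into individual branch lengths. A sketch of the construction; the
    hyperparameters here are illustrative."""
    T = rng.gamma(alpha_T, 1.0 / beta_T)          # prior on total tree length
    proportions = rng.dirichlet(np.full(n_branches, conc))
    return T * proportions

b = draw_branch_lengths(5, rng)
```

Placing the prior on the tree length directly keeps its marginal distribution fixed as the number of branches grows, unlike i.i.d. branch-length priors whose implied tree length inflates with tree size.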

20.
In the field of animal breeding, estimation of genetic parameters and prediction of breeding values are routinely conducted by analyzing quantitative traits. Using an animal model and including the direct inverse of the numerator relationship matrix (NRM) in a mixed model has made these analyses possible. However, a method for including genetically identical animals (GIAs) in the NRM when the genetic relationships between pairs of GIAs are not perfect is still lacking. Here, we describe a method to incorporate GIAs into the NRM using a K matrix in which diagonal elements are set to 1.0, off-diagonal elements between pairs of GIAs to (1-x), and all other elements to 0, where x is a constant less than 0.05. The inverse of the K matrix is then calculated directly by a simple formula. Thus, the inverse of the NRM is calculated as the product of the lower triangular matrix that identifies the parents of each individual, its transpose, the inverse of the K matrix, and the inverse of the diagonal matrix D, whose diagonal elements are determined by the number of known parents and their inbreeding coefficients. The computing method is adaptable to the analysis of a data set that includes pairs of GIAs with imperfect relationships.
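For a single pair of GIAs, the K block and its direct inverse can be written down explicitly. The sketch below uses x = 0.02 (any constant below 0.05, per the abstract) and verifies the closed-form 2x2 inverse numerically; it illustrates only the pair block, not the full NRM construction.

```python
import numpy as np

x = 0.02          # small constant (< 0.05) from the abstract
k = 1.0 - x       # off-diagonal element between a pair of GIAs

# K block for one pair of genetically identical animals.
K_pair = np.array([[1.0, k],
                   [k, 1.0]])

# Direct closed-form inverse of the 2x2 block:
# inv([[1, k], [k, 1]]) = [[1, -k], [-k, 1]] / (1 - k^2).
K_pair_inv = np.array([[1.0, -k],
                       [-k, 1.0]]) / (1.0 - k * k)
```

Because K is block diagonal over GIA pairs (with 1s elsewhere), inverting it reduces to inverting these 2x2 blocks, which is why the full inverse is available by a simple formula.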


Copyright © Beijing Qinyun Technology Development Co., Ltd. 京ICP备09084417号