Similar articles
20 similar articles found
1.

Background  

We have developed a new haplotyping program based on the combination of an iterative multiallelic EM algorithm (IEM), bootstrap resampling and a pseudo Gibbs sampler. The IEM-bootstrap procedure considerably reduces the space of possible haplotype configurations to be explored, greatly reducing computation time, while adapting the Gibbs sampler with a recombination model to this restricted space maintains high accuracy. On large SNP datasets (>30 SNPs), we used a segmented approach based on a specific partition-ligation strategy. We compared this software, Ishape (Iterative Segmented HAPlotyping by Em), with reference programs such as Phase, Fastphase, and PL-EM. As with Phase, there are two versions of Ishape: Ishape1, which uses a simple coalescence model for the pseudo Gibbs sampler step, and Ishape2, which uses a recombination model instead.

2.
Abstract

A new modification of the Gibbs ensemble Monte Carlo computer simulation method for fluid phase equilibria is described. The modification is based on a thermodynamic model for the vapor phase, and uses an equation of state to account for the weak interactions between the vapor phase molecules. Computational time is reduced by 30–40% compared with the original Gibbs ensemble method. The algorithm is applied to Lennard-Jones (12,6) fluids and their mixtures, and the results are in good agreement with those obtained from simulations using the full Gibbs ensemble method.

3.
Abstract

A modification of the Gibbs ensemble Monte Carlo computer simulation method for fluid phase equilibrium is described. The modification, which is based on the assumption of a thermodynamic model for the vapor phase, reduces the computational time for the simulation compared with the original Gibbs ensemble method. Since the computational time is largely proportional to the number of particle-particle interactions, avoiding the direct simulation of the vapor phase typically leads to a thirty to forty percent reduction in computational time. For a pure Lennard-Jones (12,6) fluid, the results obtained at moderate reduced temperatures, T/Tc < 0.8, are in good agreement with the full Gibbs ensemble method.

4.
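Using diameter at breast height (DBH) and tree height as predictors, we developed additive biomass model systems for natural Mongolian oak in Heilongjiang Province, based on multivariate likelihood analysis, seemingly unrelated regression and related methods. The results showed that tree height significantly improved the stem biomass model: the coefficient of determination (R²) rose from 0.953 to 0.988 and the root mean square error (RMSE) decreased by 14 kg, whereas the effect on branch, foliage and root biomass was not significant. The error structure of both the univariate (DBH only) and bivariate (DBH and height) power-function biomass model systems was multiplicative, indicating that a log-transformed linear model form was more appropriate. The R² values of the stem, branch, foliage and root biomass models were 0.953–0.988, 0.982–0.983, 0.916–0.917 and 0.951–0.952, and the RMSE values were 13.42–27.03, 6.84–7.00, 1.95–1.97 and 9.71–9.84 kg, respectively. Compared with generalized least squares (FGLS), Bayesian estimation produced a similar model fit but parameter estimates with different amounts of variation. The standard errors of the FGLS parameters were 0.054–0.211, and the two Bayesian methods using the Jeffreys invariant prior (DMC and Gibbs1) produced similar parameter variation (standard deviations of 0.055–0.221). A multivariate normal prior with mean vector 0, variance 1000 and covariance 0 (Gibbs2) and a prior summarized from previous studies of biomass models for oak (Quercus) species (Gibbs3) produced larger variation (standard deviations of 0.080–0.278), while a prior obtained from the data themselves (Gibbs4) gave smaller parameter variation than the other methods (standard deviations of 0.004–0.013). When the models were built with the Gibbs4 method, both model types not only gave the narrowest 95% prediction intervals but also produced smaller prediction bias. In the univariate model, the mean absolute percentage error (MAPE) of stem, branch, foliage, root and total biomass was 19.8%, 24.7%, 24.6%, 29.0% and 13.1%, respectively; in the bivariate model, the MAPE of stem and total biomass decreased to 10.5% and 9.8%, while the MAPE of the other components was unchanged, indicating that the Gibbs4 method provides more accurate biomass predictions. Compared with traditional regression methods, accurate prior information gives Bayesian statistics an advantage in estimation stability and in reducing uncertainty.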

5.
In phylogenetic footprinting, putative regulatory elements are found in upstream regions of orthologous genes by searching for common motifs. Motifs in different upstream sequences are subject to mutations along the edges of the corresponding phylogenetic tree, so taking advantage of the tree in the motif search is an appealing idea. We describe the Motif Yggdrasil sampler, the first Gibbs sampler based on a general tree that uses unaligned sequences. Previous tree-based Gibbs samplers have assumed a star-shaped tree or partially aligned upstream regions. We give a probabilistic model (MY model) describing upstream sequences with regulatory elements and build a Gibbs sampler with respect to this model. The model allows toggling, i.e., the restriction of a position to a subset of nucleotides, but requires neither aligned sequences nor edge lengths, which may be difficult to come by. We apply the collapsing technique to eliminate the need to sample nuisance parameters, and give a derivation of the predictive update formula. We show that the MY model improves the modeling of difficult motif instances and that the use of the tree achieves a substantial increase in nucleotide level correlation coefficient both for synthetic data and for 37 bacterial lexA genes. We investigate the sensitivity to errors in the tree and show that with random trees the MY sampler still has a performance similar to the original version.

6.
An increased availability of genotypes at marker loci has prompted the development of models that include the effect of individual genes. Selection based on these models is known as marker-assisted selection (MAS). MAS is known to be efficient especially for traits that have low heritability and non-additive gene action. BLUP methodology under non-additive gene action is not feasible for large inbred or crossbred pedigrees. It is easy to incorporate non-additive gene action in a finite locus model. Under such a model, the unobservable genotypic values can be predicted using the conditional mean of the genotypic values given the data. To compute this conditional mean, conditional genotype probabilities must be computed. In this study these probabilities were computed using iterative peeling and three Markov chain Monte Carlo (MCMC) methods: scalar Gibbs, blocking Gibbs, and a sampler that combines the Elston-Stewart algorithm with iterative peeling (ESIP). The performance of these four methods was assessed using simulated data. For pedigrees with loops, iterative peeling fails to provide accurate genotype probability estimates for some pedigree members. Also, computing time is exponentially related to the number of loci in the model. For MCMC methods, a linear relationship can be maintained by sampling genotypes one locus at a time. Of the three MCMC methods considered, ESIP performed the best while scalar Gibbs performed the worst.

7.
Bayesian mixture model based clustering of replicated microarray data
MOTIVATION: Identifying patterns of co-expression in microarray data by cluster analysis has been a productive approach to uncovering molecular mechanisms underlying biological processes under investigation. Using experimental replicates can generally improve the precision of the cluster analysis by reducing the experimental variability of measurements. In such situations, Bayesian mixtures allow for an efficient use of information by precisely modeling between-replicates variability. RESULTS: We developed different variants of Bayesian mixture based clustering procedures for clustering gene expression data with experimental replicates. In this approach, the statistical distribution of microarray data is described by a Bayesian mixture model. Clusters of co-expressed genes are created from the posterior distribution of clusterings, which is estimated by a Gibbs sampler. We define infinite and finite Bayesian mixture models with different between-replicates variance structures and investigate their utility by analyzing synthetic and real-world datasets. Results of our analyses demonstrate that (1) improvements in precision achieved by performing only two experimental replicates can be dramatic when the between-replicates variability is high, (2) precise modeling of intra-gene variability is important for accurate identification of co-expressed genes and (3) the infinite mixture model with the 'elliptical' between-replicates variance structure performed better overall than any other method tested. We also introduce a heuristic modification to the Gibbs sampler based on the 'reverse annealing' principle. This modification effectively overcomes the tendency of the Gibbs sampler to converge to different modes of the posterior distribution when started from different initial positions. Finally, we demonstrate that the Bayesian infinite mixture model with 'elliptical' variance structure is capable of identifying the underlying structure of the data without knowing the 'correct' number of clusters. AVAILABILITY: The MS Windows based program named Gaussian Infinite Mixture Modeling (GIMM) implementing the Gibbs sampler and corresponding C++ code are available at http://homepages.uc.edu/~medvedm/GIMM.htm SUPPLEMENTAL INFORMATION: http://expression.microslu.washington.edu/expression/kayee/medvedovic2003/medvedovic_bioinf2003.html

8.
Simulated data were used to investigate the influence of the choice of priors on estimation of genetic parameters in multivariate threshold models using Gibbs sampling. We simulated additive values, residuals and fixed effects for one continuous trait and liabilities of four binary traits, and QTL effects for one of the liabilities. Within each of four replicates, six different datasets were generated which resembled different practical scenarios in horses with respect to number and distribution of animals with trait records and availability of QTL information. (Co)Variance components were estimated using a Bayesian threshold animal model via Gibbs sampling. The Gibbs sampler was implemented with both a flat and a proper prior for the genetic covariance matrix. Convergence problems were encountered in > 50% of flat prior analyses, with indications of potential or near posterior impropriety between about round 10 000 and 100 000. Terminations due to a non-positive definite genetic covariance matrix occurred in flat prior analyses of the smallest datasets. Use of a proper prior resulted in improved mixing and convergence of the Gibbs chain. In order to avoid (near) impropriety of posteriors and extremely poorly mixing Gibbs chains, a proper prior should be used for the genetic covariance matrix when implementing the Gibbs sampler.

9.
Zhu Y, Chen CC, King JA, Evans LB. Biochemistry 1992, 31(43): 10591-10601
The native state of a protein molecule in aqueous solutions represents one of the lowest states of Gibbs energy [Anfinsen, C.B. (1973) Science 181, 223-230]. Much progress has been made on the rules of protein folding [King, J. (1989) Chem. Eng. News 67, 32-54] and the dominant forces in protein folding [Dill, K.A. (1990) Biochemistry 29, 7133-7155]. However, the quantitative contributions of different Gibbs energy terms to protein stability remain a controversial issue [Moult, J., & Unger, R. (1991) Biochemistry 30, 3816-3824]. A molecular thermodynamic model has been proposed for the Gibbs energy of folding a residue in aqueous homopolypeptides from a random-coiled state to either the alpha-helix state or the beta-sheet state [Chen, C.-C., Zhu, Y., King, J.A., & Evans, L.B. (1992) Biopolymers 32, 1375-1392]. In this work, we present a generalization of the molecular thermodynamic model for the Gibbs energy of folding natural and synthetic heteropolypeptides from random-coiled conformations into alpha-helical conformations. The generalized model incorporates the intrinsic folding potential due to residue-solvent interactions, the cooperative folding effect due to residue-residue interactions, and the location and length of alpha-helices. The utility of the model was demonstrated by examining the stability of alpha-helical conformations of a number of natural polypeptides including C-peptide (residues 1-13) and S-peptide (residues 1-20) of RNase A (bovine pancreatic ribonuclease A), the P alpha fragment in BPTI (bovine pancreatic trypsin inhibitor), and synthetic polypeptides (the copolymers of different amino acid residues) including alanine-based peptides (16 or 17 residues long) in water. The computed Gibbs energies correspond well with the experimental data on helicity. The results also accounted for the effects of amino acid substitution and temperature on the stability of alpha-helical conformations of the test polypeptides.

10.
11.
We propose a new Markov chain Monte Carlo (MCMC) sampling mechanism for Bayesian phylogenetic inference. This method, which we call conjugate Gibbs, relies on analytical conjugacy properties, and is based on an alternation between data augmentation and Gibbs sampling. The data augmentation step consists in sampling a detailed substitution history for each site, and across the whole tree, given the current value of the model parameters. Provided convenient priors are used, the parameters of the model can then be directly updated by a Gibbs sampling procedure, conditional on the current substitution history. Alternating between these two sampling steps yields an MCMC device whose equilibrium distribution is the posterior probability density of interest. We show, on real examples, that this conjugate Gibbs method leads to a significant improvement of the mixing behavior of the MCMC. In all cases, the decorrelation times of the resulting chains are smaller than those obtained by standard Metropolis-Hastings procedures by at least one order of magnitude. The method is particularly well suited to heterogeneous models, i.e. those assuming site-specific random variables. In particular, the conjugate Gibbs formalism allows one to propose efficient implementations of complex models, for instance assuming site-specific substitution processes, that would not be accessible to standard MCMC methods.
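The alternation of direct conditional updates described in this abstract can be illustrated on a much simpler problem. The sketch below is not the phylogenetic method: it omits the data augmentation step entirely and only shows how conjugate priors turn each Gibbs update into a draw from a standard distribution. It assumes normal data with unknown mean `mu` and precision `tau`, a normal prior on `mu` and a gamma prior on `tau`; the function name `gibbs_normal` and all hyperparameter names are hypothetical.

```python
import math
import random


def gibbs_normal(y, n_iter=2000, m0=0.0, p0=1e-3, a0=1e-3, b0=1e-3, seed=0):
    """Toy Gibbs sampler for y_i ~ N(mu, 1/tau).

    Assumed priors (illustrative only): mu ~ N(m0, 1/p0), tau ~ Gamma(a0, rate=b0).
    Both full conditionals are then available in closed form.
    """
    rng = random.Random(seed)
    n = len(y)
    total = sum(y)
    mu, tau = total / n, 1.0  # arbitrary starting values
    draws = []
    for _ in range(n_iter):
        # mu | tau, y is normal: precisions add, means are precision-weighted.
        prec = p0 + n * tau
        mean = (p0 * m0 + tau * total) / prec
        mu = rng.gauss(mean, 1.0 / math.sqrt(prec))
        # tau | mu, y is gamma with shape a0 + n/2 and rate b0 + SS/2.
        rate = b0 + 0.5 * sum((v - mu) ** 2 for v in y)
        # random.gammavariate takes a scale parameter, hence 1/rate.
        tau = rng.gammavariate(a0 + 0.5 * n, 1.0 / rate)
        draws.append((mu, tau))
    return draws
```

Discarding an initial portion of `draws` as burn-in and averaging the rest gives posterior mean estimates for `mu` and `tau`.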

12.
In the present work an equilibrium model (gas-solid), based on the minimization of the Gibbs energy, has been used to estimate the theoretical yield and the equilibrium composition of the reaction products (syngas and char) of biomass thermochemical conversion processes (pyrolysis and gasification). The data obtained from this model have also been used to calculate the heating value of the fuel gas, in order to evaluate the overall energy efficiency of the thermal conversion stage. The proposed model has been applied both to partial oxidation and steam gasification processes with varying air to biomass (ER) and steam to carbon (SC) ratio values and using different feedstocks; the results have been compared with experimental data and with other model predictions, with satisfactory agreement.

13.
Wolfinger RD, Kass RE. Biometrics 2000, 56(3): 768-774
We consider the usual normal linear mixed model for variance components from a Bayesian viewpoint. With conjugate priors and balanced data, Gibbs sampling is easy to implement; however, simulating from full conditionals can become difficult for the analysis of unbalanced data with possibly nonconjugate priors, thus leading one to consider alternative Markov chain Monte Carlo schemes. We propose and investigate a method for posterior simulation based on an independence chain. The method is customized to exploit the structure of the variance component model, and it works with arbitrary prior distributions. As a default reference prior, we use a version of Jeffreys' prior based on the integrated (restricted) likelihood. We demonstrate the ease of application and flexibility of this approach in familiar settings involving both balanced and unbalanced data.
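For readers unfamiliar with the term, an independence chain is a Metropolis-Hastings sampler whose proposals do not depend on the current state. The following is a generic sketch under stated assumptions, not the authors' customized proposal for variance component models: the target is a standard normal known only up to a constant, the proposal is a wider normal, and every function name here is hypothetical.

```python
import math
import random


def independence_chain(log_target, draw_proposal, log_proposal, x0, n_iter, seed=0):
    """Generic independence-chain Metropolis-Hastings sampler.

    Candidates come from a fixed proposal q, independent of the current state.
    A candidate c replaces x with probability min(1, w(c) / w(x)), where
    w = target / q. Both densities may be unnormalized.
    """
    rng = random.Random(seed)
    x = x0
    log_w = log_target(x) - log_proposal(x)
    samples, accepted = [], 0
    for _ in range(n_iter):
        cand = draw_proposal(rng)
        log_w_cand = log_target(cand) - log_proposal(cand)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_w_cand - log_w:
            x, log_w = cand, log_w_cand
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter


# Illustrative target and proposal (assumptions, not taken from the abstract).
def log_std_normal(x):
    return -0.5 * x * x  # N(0, 1) up to an additive constant


def draw_wide_normal(rng):
    return rng.gauss(0.0, 2.0)  # heavier-spread proposal N(0, 2^2)


def log_wide_normal(x):
    return -0.5 * (x / 2.0) ** 2  # N(0, 2^2) up to an additive constant
```

A proposal with wider spread than the target keeps the weight `w` bounded, which is the usual requirement for an independence chain to mix well.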

14.
This paper is concerned with the statistical inference of a truncated Dirichlet distribution (TDD) arising in the general context of misclassified multinomial models (such as medical screening or diagnostic tests) and experimental design with mixtures. By employing the conditional distribution method, we offer a generating procedure for the TDD. Alternatively, a sampling-based approach using the Gibbs sampler is provided as a means for developing the posterior moments of interest. Finding the mode of a TDD is equivalent to extracting the constrained maximum likelihood estimate (MLE) of the parameter vector in a multinomial model. Based upon a theoretical result, we propose an algorithm to calculate the constrained MLE. Applications in misclassification are presented.

15.
This paper provides an extensive review of the literature on the Gibbs ensemble Monte Carlo method for direct determination of phase coexistence in fluids. The Gibbs ensemble technique is based on performing a simulation in two distinct regions in a way that ensures that the conditions of phase coexistence are satisfied in a statistical sense. Contrary to most other available techniques for this purpose, such as thermodynamic integration, grand canonical Monte Carlo or Widom test particle insertions, the Gibbs ensemble technique involves only a single simulation per coexistence point. A significant body of literature now exists on the method, its theoretical foundations, and proposed modifications for efficient determination of equilibria involving dense fluids and complex intermolecular potentials. Some practical aspects of Gibbs ensemble simulation are also discussed in this review. Applications of the technique to date range from studies of simple model potentials (for example Lennard–Jones, square-well or Yukawa fluids) to calculations of equilibria in mixtures with components described by realistic potentials. We conclude by discussing the limitations of the technique and potential future applications.

16.
Henderson's system of mixed model equations is generally required in a Gibbs sampling application. In two previous studies, we proposed two indirect solving approaches that give dominance values in an animal model context without processing the whole system. The first does not require D⁻¹ and the second is based on processing the residuals of the additive animal model. In the present work, we show that these two methods can be handled iteratively. Since the Bayesian approach is now widely used for estimating genetic parameters, the main part of this work is devoted to a Gibbs sampling application that can be accelerated by means of the aforementioned indirect solving methods. Three replicates of a population data set are simulated to compare the applications and estimates. The results show that the estimates obtained by implementing a Gibbs sampler with either of the two suggested solving methods require less computational time and are comparable to those obtained with the full system, particularly when the priors are given more weight.

17.
Gallic acid (GA) is important to the pharmaceutical industry as an antioxidant. It is also used in tanning, ink dyes and paper manufacturing. Molecularly imprinted polymers (MIPs), which are tailor-made materials, can play an excellent role in the separation of GA from complex matrices. Molecular recognition being the most important property of MIPs, the present work proposes a methodology based on density functional theory (DFT) calculations for selecting a suitable functional monomer for the rational design of an MIP with a high binding capacity for GA. A virtual library of 18 functional monomers was created and screened for the template GA. The prepolymerization template-monomer complexes were optimized at the B3LYP/6-31G(d) model chemistry and the changes in the Gibbs free energy (ΔG) due to complex formation were determined on the optimized structures. The monomer with the highest Gibbs free energy gain forms the most stable complex with the template, resulting in the formation of more selective binding sites in the polymeric matrix of the MIP. This can lead to a high binding capacity of the MIP for GA. Amongst the 18 monomers, acrylic acid (AA) and acrylamide (AAm) gave the highest values of ΔG due to complex formation with GA. 4-vinyl pyridine (4-Vp) had an intermediate value of ΔG, while methyl methacrylate (MMA) gave the lowest value of ΔG due to complex formation with GA. Based on this study, the MIPs were synthesized and rebinding performance was evaluated using the Langmuir-Freundlich model. The imprinting factors for the AA and AAm based MIPs were 5.28 and 4.80 respectively, the 4-Vp based MIP had an imprinting factor of 2.59, and the MMA based MIP exhibited an imprinting factor of 1.95. The experimental results were in good agreement with the computational predictions and validated the DFT based computational approach.

18.
Multi-trait (co)variance estimation is an important topic in plant and animal breeding. In this study we compare estimates obtained with restricted maximum likelihood (REML) and Bayesian Gibbs sampling of simulated data and of three traits (diameter, height and branch angle) from a 26-year-old partial diallel progeny test of Scots pine (Pinus sylvestris L.). Based on the results from the simulated data we can conclude that the REML estimates are accurate but the mode of posterior distributions from the Gibbs sampling can be overestimated depending on the level of the heritability. The mean and median of the posteriors were considerably higher than the expected values of the heritabilities. The confidence intervals calculated with the delta method were biased downward. The highest probability density (HPD) interval provides a better interval estimate, but could be slightly biased at the lower level. Similar differences between REML and Gibbs sampling estimates were found for the Scots pine data. We conclude that further simulation studies are needed in order to evaluate the effect of different priors on (co)variance components in the genetic individual model.

19.
Yi N. Genetics 2004, 167(2): 967-975
In this article, a unified Markov chain Monte Carlo (MCMC) framework is proposed to identify multiple quantitative trait loci (QTL) for complex traits in experimental designs, based on a composite space representation of the problem that has fixed dimension. The proposed unified approach includes the existing Bayesian QTL mapping methods using reversible jump MCMC algorithm as special cases. We also show that a variety of Bayesian variable selection methods using Gibbs sampling can be applied to the composite model space for mapping multiple QTL. The unified framework not only results in some new algorithms, but also gives useful insight into some of the important factors governing the performance of Gibbs sampling and reversible jump for mapping multiple QTL. Finally, we develop strategies to improve the performance of MCMC algorithms.

20.
The identification of MHC restricted epitopes is an important goal in peptide based vaccine and diagnostic development. As wet lab experiments for identification of MHC binding peptides are expensive and time consuming, in silico tools have been developed as fast alternatives, but with low performance. In the present study, we used IEDB training and blind validation datasets for the prediction of peptide binding to fourteen human MHC class I and II molecules using Gibbs motif sampler, weight matrix and artificial neural network methods. Compared with the MHC class I predictors based on sequence weighting (Aroc=0.95 and CC=0.56) and artificial neural network (Aroc=0.73 and CC=0.25), the MHC class II predictor based on the Gibbs sampler did not perform well (Aroc=0.62 and CC=0.19). The predictive accuracy of the Gibbs motif sampler in identifying the 9-mer cores of peptides binding to DRB1 alleles is also limited (40%), although above that of random prediction (14%). Therefore, the size of the dataset (training and validation) and the correct identification of the binding core are the two main factors limiting the performance of MHC class II binding peptide prediction. Overall, these data suggest that there is substantial room to improve the quality of the core predictions using novel approaches that capture distinct features of MHC-peptide interactions better than the current approaches.
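The two figures of merit quoted in this abstract can be computed with a few lines of code. The sketch below assumes that Aroc denotes the area under the ROC curve for binder versus non-binder labels and that CC denotes the Pearson correlation coefficient between predicted and measured values; the abstract does not define either, so both readings, along with the function names, are assumptions.

```python
import math


def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pair-counting identity.

    Equals the fraction of (binder, non-binder) pairs in which the binder
    receives the higher score; tied pairs count as one half.
    """
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Under this reading, an AUC of 0.5 corresponds to random ranking, which puts the reported class II value of 0.62 only modestly above chance.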
