Similar Articles
20 similar articles found (search time: 46 ms)
1.
We analyzed the phylogeny of the Neotropical pitvipers within the Porthidium group (including intra-specific through inter-generic relationships) using 1.4 kb of DNA sequence from two mitochondrial protein-coding genes (ND4 and cyt-b). We investigated how Bayesian Markov chain Monte Carlo (MCMC) phylogenetic hypotheses based on this 'mesoscale' dataset were affected by analysis under various complex models of nucleotide evolution that partition models across the dataset. We develop an approach, employing three statistics (Akaike weights, Bayes factors, and relative Bayes factors), for examining the performance of complex models to identify the best-fit model for data analysis. Our results suggest that: (1) model choice may have important practical effects on phylogenetic conclusions even for mesoscale datasets, (2) the use of a complex partitioned model did not produce widespread increases or decreases in nodal posterior probability support, and (3) most differences in resolution resulting from model choice were concentrated at deeper nodes. Our phylogenetic estimates of relationships among members of the Porthidium group (genera: Atropoides, Cerrophidion, and Porthidium) resolve the monophyly of the three genera. Bayesian MCMC results suggest that Cerrophidion and Porthidium form a clade that is the sister taxon to Atropoides. In addition to resolving the intra-specific relationships among a majority of Porthidium group taxa, our results highlight phylogeographic patterns across Middle and South America and suggest that each of the three genera may harbor undescribed species diversity.
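Akaike weights, the first of the three statistics above, turn AIC differences into relative support for each candidate model. A minimal Python sketch using the standard definition follows; the AIC scores are hypothetical and this is not the authors' implementation.

```python
import math


def akaike_weights(aic_values):
    """Akaike weights: w_i = exp(-d_i / 2) / sum_j exp(-d_j / 2), d_i = AIC_i - min AIC."""
    best = min(aic_values)
    # Relative likelihood of each model, exp(-d_i / 2); the best model gets 1.
    rel = [math.exp(-(aic - best) / 2.0) for aic in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical AIC scores for three partitioning schemes (not values from the study).
print(akaike_weights([1000.0, 1002.0, 1010.0]))
```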

2.
MOTIVATION: Finding differentially expressed genes is a fundamental objective of a microarray experiment. Numerous methods have been proposed to perform this task. Existing methods are based on point estimates of gene expression level obtained from each microarray experiment. This approach discards potentially useful information about measurement error that can be obtained from an appropriate probe-level analysis. Probabilistic probe-level models can be used to measure gene expression and also provide a level of uncertainty in this measurement. This probe-level measurement error provides useful information that can help in the identification of differentially expressed genes. RESULTS: We propose a Bayesian method to include probe-level measurement error in the detection of differentially expressed genes from replicated experiments. A variational approximation is used for efficient parameter estimation. We compare this approximation with MAP and MCMC parameter estimation in terms of computational efficiency and accuracy. The method is used to calculate the probability of positive log-ratio (PPLR) of expression levels between conditions. Using the measurements from a recently developed Affymetrix probe-level model, multi-mgMOS, we test PPLR on a spike-in dataset and a mouse time-course dataset. Results show that the inclusion of probe-level measurement error improves accuracy in detecting differential gene expression. AVAILABILITY: The MAP approximation and variational inference described in this paper have been implemented in an R package, pplr. The MCMC method is implemented in Matlab. Both implementations are available from http://umber.sbs.man.ac.uk/resources/puma.
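A rough way to see how measurement uncertainty enters PPLR: if the posterior for the mean log-expression in each condition is approximated by an independent Gaussian (an assumption made here; the paper's model is hierarchical and fitted by variational inference), the probability of a positive log-ratio has a closed form. The function names below are hypothetical and are not the pplr package's API.

```python
import math


def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pplr(mu1, var1, mu2, var2):
    """P(log-ratio > 0), the log-ratio being condition 2 minus condition 1.

    Assumes independent Gaussian posteriors N(mu1, var1) and N(mu2, var2) for the
    mean log-expression in each condition, so their difference is also Gaussian."""
    return normal_cdf((mu2 - mu1) / math.sqrt(var1 + var2))


# Same 0.5 difference in mean log-expression, different measurement uncertainty.
print(pplr(5.0, 0.04, 5.5, 0.05))   # precise measurements
print(pplr(5.0, 0.40, 5.5, 0.50))   # noisy measurements
```

With the same difference in means, larger posterior variances pull PPLR toward 0.5, which is the intuition behind propagating probe-level error into the decision.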

3.
A Bayesian method implemented with Markov chain Monte Carlo (MCMC) has been developed to detect quantitative trait locus (QTL) effects and QTL-by-environment (Q × E) interaction effects. However, the MCMC algorithm is time-consuming because the QTL parameters must be sampled repeatedly. We developed an expectation-maximization (EM) algorithm as an alternative method for detecting QTL and Q × E interaction. Simulation studies and real data analysis showed that the EM algorithm produced results comparable to those of the Bayesian method, but ran many orders of magnitude faster than the MCMC algorithm. We used the EM algorithm to analyze a well-known barley dataset produced by the North American Barley Genome Mapping Project. The dataset contained eight quantitative traits collected from 150 doubled-haploid (DH) lines evaluated in multiple environments. Each line was genotyped for 495 polymorphic markers. The results showed that all eight traits exhibited QTL main effects and Q × E interaction effects. On average, the main effects and Q × E interaction effects contributed 34.56% and 16.23% of the total phenotypic variance, respectively. Furthermore, we found that whether or not a locus shows Q × E interaction does not depend on the presence of a main effect.
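The EM idea can be sketched for the simplest case only: one putative QTL position, DH lines carrying one of two QTL genotypes, and a single environment, so the Q × E terms of the paper are deliberately left out. The prior genotype probabilities p[i] would normally come from flanking markers; here they and the phenotypes are simulated, and all names are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def em_interval_mapping(y, p, n_iter=500, tol=1e-9):
    """EM for one putative QTL in doubled-haploid lines (two QTL genotypes, A and B).

    y[i]: phenotype of line i; p[i]: prior probability, computed from flanking
    markers, that line i carries genotype A. Returns (mu_a, mu_b, sd)."""
    n = len(y)
    mean_y = sum(y) / n
    sd = math.sqrt(sum((v - mean_y) ** 2 for v in y) / n)
    mu_a, mu_b = mean_y + 0.5 * sd, mean_y - 0.5 * sd   # crude starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each line carries genotype A.
        w = []
        for yi, pi in zip(y, p):
            fa = pi * normal_pdf(yi, mu_a, sd)
            fb = (1.0 - pi) * normal_pdf(yi, mu_b, sd)
            w.append(fa / (fa + fb))
        # M-step: posterior-weighted genotype means and pooled residual SD.
        sw = sum(w)
        new_a = sum(wi * yi for wi, yi in zip(w, y)) / sw
        new_b = sum((1.0 - wi) * yi for wi, yi in zip(w, y)) / (n - sw)
        var = sum(wi * (yi - new_a) ** 2 + (1.0 - wi) * (yi - new_b) ** 2
                  for wi, yi in zip(w, y)) / n
        shift = abs(new_a - mu_a) + abs(new_b - mu_b)
        mu_a, mu_b, sd = new_a, new_b, math.sqrt(var)
        if shift < tol:
            break
    return mu_a, mu_b, sd


# Toy data simulated from the model itself (hypothetical values).
rng = random.Random(1)
p = [rng.choice([0.9, 0.1]) for _ in range(400)]
y = [rng.gauss(12.0 if rng.random() < pi else 10.0, 1.0) for pi in p]
print(em_interval_mapping(y, p))
```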

4.
MOTIVATION: Bayesian estimation of phylogeny is based on the posterior probability distribution of trees. Currently, the only numerical method that can effectively approximate posterior probabilities of trees is Markov chain Monte Carlo (MCMC). Standard implementations of MCMC can be prone to entrapment in local optima. Metropolis-coupled MCMC [(MC)³], a variant of MCMC, allows multiple peaks in the landscape of trees to be explored more readily, but at the cost of increased execution time. RESULTS: This paper presents a parallel algorithm for (MC)³. The proposed parallel algorithm retains the ability to explore multiple peaks in the posterior distribution of trees while keeping execution time short. The algorithm has been implemented using two popular parallel programming models: message passing and shared memory. Performance results indicate nearly linear speedup in both programming models for small and large data sets.
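The swap move that lets (MC)³ escape local optima can be illustrated without trees. This sketch runs heated random-walk chains on a toy one-dimensional bimodal density instead of tree space and does not show the parallelization; the temperatures (betas), step size, and target are assumptions chosen for illustration.

```python
import math
import random


def log_target(x):
    """Unnormalised log density of an equal mixture of N(-4, 1) and N(4, 1)."""
    a, b = -0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mc3(n_iter, betas, step=1.0, seed=0):
    """Metropolis-coupled MCMC. Chain k targets p(x) ** betas[k]; betas[0] must be 1 (cold chain).

    Returns the cold chain's samples."""
    if len(betas) < 2:
        raise ValueError("MC3 needs at least two chains")
    rng = random.Random(seed)
    xs = [0.0] * len(betas)
    cold = []
    for _ in range(n_iter):
        # Random-walk Metropolis update within each (heated) chain.
        for k, beta in enumerate(betas):
            prop = xs[k] + rng.gauss(0.0, step)
            log_ratio = beta * (log_target(prop) - log_target(xs[k]))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                xs[k] = prop
        # Propose exchanging the states of a random pair of adjacent chains.
        i = rng.randrange(len(betas) - 1)
        j = i + 1
        log_swap = (betas[i] - betas[j]) * (log_target(xs[j]) - log_target(xs[i]))
        if rng.random() < math.exp(min(0.0, log_swap)):
            xs[i], xs[j] = xs[j], xs[i]
        cold.append(xs[0])
    return cold


samples = mc3(20000, betas=[1.0, 0.5, 0.25, 0.1], seed=42)
print(sum(x > 0 for x in samples) / len(samples))   # share of cold-chain draws in the right-hand mode
```

A single chain at beta = 1 would tend to stay in one mode for long stretches; the heated chains cross the barrier easily and hand their states down to the cold chain through accepted swaps.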

5.
Yi N. Genetics, 2004, 167(2): 967-975
In this article, a unified Markov chain Monte Carlo (MCMC) framework is proposed to identify multiple quantitative trait loci (QTL) for complex traits in experimental designs, based on a composite space representation of the problem that has fixed dimension. The proposed unified approach includes as special cases the existing Bayesian QTL mapping methods that use the reversible jump MCMC algorithm. We also show that a variety of Bayesian variable selection methods using Gibbs sampling can be applied to the composite model space for mapping multiple QTL. The unified framework not only yields some new algorithms, but also gives useful insight into some of the important factors governing the performance of Gibbs sampling and reversible jump for mapping multiple QTL. Finally, we develop strategies to improve the performance of MCMC algorithms.

6.
MOTIVATION: Affymetrix GeneChip arrays are currently the most widely used microarray technology. Many summarization methods have been developed to provide gene expression levels from Affymetrix probe-level data. Most of the currently popular methods do not provide a measure of uncertainty for the expression level of each gene. The use of probabilistic models can overcome this limitation. A full hierarchical Bayesian approach requires the use of computationally intensive MCMC methods that are impractical for large datasets. An alternative computationally efficient probabilistic model, mgMOS, uses Gamma distributions to model specific and non-specific binding with a latent variable to capture variations in probe affinity. Although the model is promising, its main limitations are that it does not use information from multiple chips and does not account for specific binding to the mismatch (MM) probes. RESULTS: We extend mgMOS to model the binding affinity of probe-pairs across multiple chips and to capture the effect of specific binding to MM probes. The new model, multi-mgMOS, provides improved accuracy, as demonstrated on some benchmark datasets and a real time-course dataset, and is much more computationally efficient than a competing hierarchical Bayesian approach that requires MCMC sampling. We demonstrate how the probabilistic model can be used to estimate credible intervals for expression levels and their log-ratios between conditions. AVAILABILITY: Both mgMOS and the new model multi-mgMOS have been implemented in an R package, which is available at http://www.bioinf.man.ac.uk/resources/puma.

7.
SUMMARY: The fundamental problem of gene selection from cDNA data is to identify which genes are differentially expressed across different kinds of tissue samples (e.g. normal and cancer). cDNA data contain a large number of variables (genes), and the sample size is usually relatively small, so the selection process can be unstable. Therefore, models that incorporate sparsity in terms of variables (genes) are desirable for this kind of problem. This paper proposes a two-level hierarchical Bayesian model for variable selection that assumes a prior favoring sparseness. We adopt a Markov chain Monte Carlo (MCMC) based computation technique to sample the parameters from the posterior. The method is applied to leukemia data from a previous study and to a published breast cancer dataset. SUPPLEMENTARY INFORMATION: http://stat.tamu.edu/people/faculty/bmallick.html.

8.
9.
Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to Bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of Bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework, which we refer to as PosetSMC, to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present an experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, converging up to two orders of magnitude faster. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining it with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/ bouchard/PosetSMC.

10.
Maximum likelihood haplotyping for general pedigrees (total citations: 3; self-citations: 0; citations by others: 3)
Haplotype data are valuable for mapping disease-susceptibility genes in the study of Mendelian and complex diseases. We present algorithms for inferring a most likely haplotype configuration for general pedigrees, implemented in the newest version of the genetic linkage analysis system SUPERLINK. In SUPERLINK, genetic linkage analysis problems are represented internally using Bayesian networks. The use of Bayesian networks enables efficient maximum likelihood haplotyping for more complex pedigrees than was previously possible. Furthermore, to support efficient haplotyping for larger pedigrees, we have also incorporated a novel algorithm for determining a better elimination order for the variables of the Bayesian network. This optimization algorithm also improves likelihood computations. We present experimental results for the new algorithms on a variety of real and semiartificial data sets, and use our software to evaluate MCMC approximations for haplotyping.
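Exact likelihood computation on a Bayesian network by variable elimination is sensitive to the elimination order. The abstract does not describe SUPERLINK's novel ordering algorithm, so the sketch below substitutes the standard greedy min-fill heuristic on a plain adjacency-set graph; it illustrates the problem being optimized, not SUPERLINK's method.

```python
from itertools import combinations


def min_fill_order(adjacency):
    """Greedy min-fill elimination order for an undirected (e.g. moralised) graph.

    adjacency maps each variable to the set of its neighbours. Returns
    (order, width), where width is the largest neighbourhood met at elimination time."""
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    order, width = [], 0

    def fill_in(v):
        # Number of edges that eliminating v would add between its current neighbours.
        return sum(1 for a, b in combinations(graph[v], 2) if b not in graph[a])

    while graph:
        # Fewest fill-in edges first; ties broken by degree, then by name.
        v = min(graph, key=lambda u: (fill_in(u), len(graph[u]), str(u)))
        nbrs = graph.pop(v)
        width = max(width, len(nbrs))
        for a, b in combinations(nbrs, 2):   # connect v's neighbours into a clique
            graph[a].add(b)
            graph[b].add(a)
        for u in nbrs:
            graph[u].discard(v)
        order.append(v)
    return order, width


# A four-variable cycle: any elimination order must add one fill-in edge.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(min_fill_order(cycle))
```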

11.
Wu CH, Drummond AJ. Genetics, 2011, 188(1): 151-164
We provide a framework for Bayesian coalescent inference from microsatellite data that enables inference of population history parameters averaged over microsatellite mutation models. To achieve this, we first implemented a rich family of microsatellite mutation models and related components in the software package BEAST. BEAST is a powerful tool that performs Bayesian MCMC analysis on molecular data to make coalescent and evolutionary inferences. Our implementation permits the application of existing nonparametric methods to microsatellite data. The implemented microsatellite models are based on the replication slippage mechanism and focus on three properties of microsatellite mutation: length dependency of mutation rate, mutational bias toward expansion or contraction, and number of repeat units changed in a single mutation event. We develop a new model that facilitates microsatellite model averaging and Bayesian model selection by transdimensional MCMC. With Bayesian model averaging, the posterior distributions of population history parameters are integrated across a set of microsatellite models and thus account for model uncertainty. Simulated data are used to evaluate our method in terms of the accuracy and precision of estimation and also the identification of the true mutation model. Finally, we apply our method to a red colobus monkey data set as an example.

12.
Summary. We compare two Monte Carlo (MC) procedures, sequential importance sampling (SIS) and Markov chain Monte Carlo (MCMC), for making Bayesian inferences about the unknown states and parameters of state–space models for animal populations. The procedures were applied to both simulated and real pup count data for the British grey seal metapopulation, as well as to simulated data for a Chinook salmon population. The MCMC implementation was based on tailor-made proposal distributions combined with analytical integration of some of the states and parameters. SIS was implemented in a more generic fashion. For the same computing time, MCMC tended to yield posterior distributions with less MC variation across different runs of the algorithm than the SIS implementation, except, in the seal model, for some states and one of the parameters, which mixed quite slowly. The efficiency of the SIS sampler increased greatly when unknown parameters in the observation model were integrated out analytically. We consider that a careful implementation of MCMC for cases where data are informative relative to the priors sets the gold standard, but that SIS samplers are a viable alternative that can be programmed more quickly. Our SIS implementation is particularly competitive in situations where the data are relatively uninformative; in other cases, SIS may require substantially more computing power than an efficient implementation of MCMC to achieve the same level of MC error.
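To see what an SIS estimate is being compared against, consider a toy linear Gaussian local-level model (an assumption; not the grey seal or Chinook salmon models), where the Kalman filter gives the exact log-likelihood. The sketch uses the bootstrap filter, i.e. SIS with multinomial resampling at every step, and all parameter values are hypothetical.

```python
import math
import random


def log_norm_pdf(y, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def kalman_loglik(ys, m0, p0, q, r):
    """Exact log-likelihood of the local-level model
    x_0 ~ N(m0, p0), x_t = x_{t-1} + N(0, q), y_t = x_t + N(0, r)."""
    m, p, ll = m0, p0, 0.0
    for y in ys:
        p = p + q                         # predict
        s = p + r                         # variance of y_t given y_1..y_{t-1}
        ll += log_norm_pdf(y, m, s)
        k = p / s                         # Kalman gain
        m, p = m + k * (y - m), (1.0 - k) * p
    return ll


def bootstrap_pf_loglik(ys, m0, p0, q, r, n_particles=2000, seed=0):
    """SIS with multinomial resampling (bootstrap filter) estimate of the same log-likelihood."""
    rng = random.Random(seed)
    xs = [rng.gauss(m0, math.sqrt(p0)) for _ in range(n_particles)]
    ll = 0.0
    for y in ys:
        xs = [x + rng.gauss(0.0, math.sqrt(q)) for x in xs]      # propagate through the state model
        logw = [log_norm_pdf(y, x, r) for x in xs]                 # weight by the observation density
        mx = max(logw)
        w = [math.exp(lw - mx) for lw in logw]
        ll += mx + math.log(sum(w) / n_particles)                  # log of the average weight
        xs = rng.choices(xs, weights=w, k=n_particles)            # resample
    return ll


# Simulate 30 noisy log-abundance observations (hypothetical parameters).
rng = random.Random(3)
x, ys = 0.0, []
for _ in range(30):
    x += rng.gauss(0.0, 0.3)
    ys.append(x + rng.gauss(0.0, 0.5))
print(kalman_loglik(ys, 0.0, 1.0, 0.09, 0.25))
print(bootstrap_pf_loglik(ys, 0.0, 1.0, 0.09, 0.25, n_particles=2000, seed=1))
```

The gap between the two printed values is the MC error of the SIS estimate; it shrinks as n_particles grows.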

13.
QTL analysis in arbitrary pedigrees with incomplete marker information (total citations: 3; self-citations: 0; citations by others: 3)
Vogl C, Xu S. Heredity, 2002, 89(5): 339-345
Mapping quantitative trait loci (QTL) in arbitrary outbred pedigrees is complicated by the combinatorial possibilities of allele flow relationships and of the founder allelic configurations. Exact methods are only available for rather short and simple pedigrees. Stochastic simulation using Markov chain Monte Carlo (MCMC) integration offers more flexibility. MCMC methods are more natural in a Bayesian than in a frequentist context, so we adopt a Bayesian framework. Among the MCMC algorithms for updating marker locus genotypes, we implement the descent-graph algorithm. It can be used to update marker locus allele flow relationships and can handle arbitrarily complex pedigrees and missing marker information. Compared with updating marker genotypic information, updating QTL parameters, such as position, effects, and the allele flow relationships, is relatively easy with MCMC. We treat the effect of each diploid combination of founder alleles as a random variable and only estimate the variance of these effects, i.e., we model diploid genotypic effects instead of the usual partition into additive and dominance effects. This is a variant of the random model approach. The number of QTL alleles is generally unknown. In the Bayesian context, the number of QTL present on a linkage group can be treated as a variable. Computer simulations suggest that the algorithm can indeed handle complex pedigrees and detect two QTL on a linkage group, but that a single extended family is limited to about 50 to 100 individuals.

14.
A recent article published in Cladistics is critical of a number of heuristic methods for phylogenetic inference based on parsimony scores. One of my papers is among those criticized, and I would appreciate the opportunity to make a public response. The specific criticism is that I have re‐invented an algorithm for economizing parsimony calculations on trees that differ by a subtree pruning and regrafting (SPR) rearrangement. This criticism is justified, and I apologize for incorrectly claiming originality for my presentation of this algorithm. However, I would like to clarify the intent of my paper, if I can do so without detracting from the sincerity of my apology. My paper is not about that algorithm, nor even primarily about parsimony. Rather, it is about a novel strategy for Markov chain Monte Carlo (MCMC) sampling in a state space consisting of trees. The sampler involves drawing from conditional distributions over sets of trees: a Gibbs‐like strategy that had not previously been used to sample tree‐space. I would like to see this technique incorporated into MCMC samplers for phylogenetics, as it may have advantages over commonly used Metropolis‐like strategies. I have recently used it to sample phylogenies of a biological invasion, and I am finding many applications for it in agent‐based Bayesian ecological modelling. It is thus my contention that my 2005 paper retains substantial value.

15.
Mapping quantitative trait loci using the MCMC procedure in SAS (total citations: 1; self-citations: 0; citations by others: 1)
Xu S, Hu Z. Heredity, 2011, 106(2): 357-369
The MCMC procedure in SAS (PROC MCMC) is specifically designed for Bayesian analysis using the Markov chain Monte Carlo (MCMC) algorithm. The procedure is general enough to handle very complicated statistical models and arbitrary prior distributions. This study introduces PROC MCMC and demonstrates its application to quantitative trait locus (QTL) mapping. A real-life QTL mapping experiment on a female fertility trait in wheat was used as an example. The fertility phenotypes were described under three different models: (1) the Poisson model, (2) the Bernoulli model and (3) the zero-truncated Poisson model. One QTL was identified on the second chromosome. This QTL appears to control the switch of seed-producing ability of female plants but does not affect the number of seeds produced once the switch is turned on.
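The paper fits its models in SAS PROC MCMC; the Python below only writes down the zero-truncated Poisson log-likelihood, plus a Bernoulli "switch" combined with it as a hurdle model. That combination is an illustrative reading of the switch interpretation above, not necessarily one of the three models the authors fitted, and the parameter values are hypothetical.

```python
import math


def zt_poisson_logpmf(y, lam):
    """log P(Y = y | Y > 0) for a zero-truncated Poisson with parameter lam > 0."""
    if y < 1:
        raise ValueError("the zero-truncated Poisson is defined only for y >= 1")
    # log(1 - exp(-lam)) is computed as log(-expm1(-lam)) for accuracy at small lam.
    return y * math.log(lam) - lam - math.lgamma(y + 1) - math.log(-math.expm1(-lam))


def hurdle_logpmf(y, pi, lam):
    """Bernoulli 'switch' (P(y > 0) = pi) followed by a zero-truncated Poisson count."""
    if y == 0:
        return math.log1p(-pi)
    return math.log(pi) + zt_poisson_logpmf(y, lam)


# Hypothetical parameters: 70% of plants set seed; seed counts have lam = 3.5.
print(math.exp(hurdle_logpmf(0, 0.7, 3.5)), math.exp(hurdle_logpmf(4, 0.7, 3.5)))
```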

16.
iTRAQ (isobaric Tags for Relative and Absolute Quantitation) is a technique that allows simultaneous quantitation of proteins in multiple samples. In this paper, we describe a Bayesian hierarchical model-based method to infer the relative protein expression levels and hence to identify differentially expressed proteins from iTRAQ data. Our model assumes that the measured peptide intensities are affected by both protein expression levels and peptide-specific effects. The values of these two effects across experiments are modeled as random effects. The nonrandom missingness of peptide data is modeled with a logistic regression that relates the probability that a peptide is missing to the expression level of the protein that produces it. We propose a Markov chain Monte Carlo method for inferring the model parameters, including the relative expression levels across samples. Our simulation results suggest that the estimates of relative protein expression levels based on the MCMC samples have smaller bias than those obtained from ANOVA models or fold changes. We apply our method to an iTRAQ dataset studying the role of caveolae in postnatal cardiovascular function.

17.
Li Z, Sillanpää MJ. Genetics, 2012, 190(1): 231-249
Bayesian hierarchical shrinkage methods have been widely used for quantitative trait locus mapping. From a computational perspective, the Markov chain Monte Carlo (MCMC) method is not optimal for high-dimensional problems such as those arising in epistatic analysis. Maximum a posteriori (MAP) estimation can be a faster alternative, but it usually produces only point estimates without any measure of uncertainty (i.e., interval estimates). The variational Bayes method, stemming from mean field theory in theoretical physics, is regarded as a compromise between MAP and MCMC estimation: it can be computed efficiently and produces uncertainty measures for the estimates. Furthermore, variational Bayes methods can be regarded as an extension of traditional expectation-maximization (EM) algorithms and can be applied to a broader class of Bayesian models. We therefore propose variational Bayes algorithms based on three hierarchical shrinkage models: Bayesian adaptive shrinkage, the Bayesian LASSO, and the extended Bayesian LASSO. These methods generally performed well and were found to be highly competitive with their MCMC counterparts in our example analyses. Posterior credible intervals and permutation tests are considered for deciding between quantitative trait loci (QTL) and non-QTL. The performance of the presented models is also compared with that of the R/qtlbim and R/BhGLM packages, using a previously studied simulated public epistatic data set.
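Variational Bayes replaces sampling with coordinate-ascent updates of factorized approximate posteriors, much like the alternating steps of EM. A minimal example, assuming the textbook case of a normal mean and precision with a Normal-Gamma prior and the factorization q(mu)q(tau); it is not one of the shrinkage models of the paper, and the data are simulated.

```python
import random


def vb_normal(xs, mu0=0.0, lambda0=1e-3, a0=1e-3, b0=1e-3, n_iter=100, tol=1e-10):
    """Mean-field variational Bayes, q(mu, tau) = q(mu) q(tau), for x_i ~ N(mu, 1/tau)
    under the prior mu | tau ~ N(mu0, 1/(lambda0 tau)), tau ~ Gamma(a0, b0).

    Returns (mean of q(mu), variance of q(mu), E[tau] under q(tau))."""
    n = len(xs)
    xbar = sum(xs) / n
    mu_n = (lambda0 * mu0 + n * xbar) / (lambda0 + n)   # does not change across iterations
    a_n = a0 + 0.5 * (n + 1)                             # does not change across iterations
    e_tau = 1.0                                          # starting guess for E[tau]
    for _ in range(n_iter):
        lambda_n = (lambda0 + n) * e_tau                 # update q(mu) = N(mu_n, 1/lambda_n)
        e_sq = sum((x - mu_n) ** 2 for x in xs) + n / lambda_n
        e_prior = lambda0 * ((mu_n - mu0) ** 2 + 1.0 / lambda_n)
        b_n = b0 + 0.5 * (e_sq + e_prior)                # update q(tau) = Gamma(a_n, b_n)
        new_e_tau = a_n / b_n
        converged = abs(new_e_tau - e_tau) < tol
        e_tau = new_e_tau
        if converged:
            break
    return mu_n, 1.0 / ((lambda0 + n) * e_tau), e_tau


rng = random.Random(7)
xs = [rng.gauss(5.0, 2.0) for _ in range(500)]
print(vb_normal(xs))
```

Unlike a MAP point estimate, the returned variance of q(mu) is an uncertainty measure, which is the trade-off the abstract describes.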

18.
Bayesian networks are knowledge representation tools that model the (in)dependency relationships among variables for probabilistic reasoning. Classification with Bayesian networks aims to compute the class with the highest probability given a case; Bayesian networks used in this way are referred to as Bayesian network classifiers. Since learning the Bayesian network structure from a dataset can be viewed as an optimization problem, heuristic search algorithms may be applied to build high-quality networks in medium- or large-scale problems, as exhaustive search is often feasible only for small problems. In this paper, we present our new algorithm, ABC-Miner, and propose several extensions to it. ABC-Miner uses ant colony optimization to learn the structure of Bayesian network classifiers. We report extended computational results comparing the performance of our algorithm with eight other classification algorithms, namely six variations of well-known Bayesian network classifiers, cAnt-Miner for discovering classification rules, and a support vector machine algorithm.
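ABC-Miner's ant colony search over structures is not reproduced here. As a sketch of the step every Bayesian network classifier shares, computing the most probable class for a case, the code uses the simplest fixed structure, naive Bayes (the class node as the only parent of each attribute), with Laplace smoothing; the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a naive Bayes classifier (class node is the sole parent of every attribute)."""
    class_counts = Counter(labels)
    counts = defaultdict(Counter)   # (attribute index, class) -> counts of attribute values
    values = defaultdict(set)       # attribute index -> values seen in training
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            counts[(j, c)][v] += 1
            values[j].add(v)
    return class_counts, counts, values, alpha


def predict(model, row):
    """Return argmax_c [log P(c) + sum_j log P(x_j | c)], with Laplace smoothing alpha."""
    class_counts, counts, values, alpha = model
    n = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)
        for j, v in enumerate(row):
            k = len(values[j])
            score += math.log((counts[(j, c)][v] + alpha) / (nc + alpha * k))
        if score > best_score:
            best, best_score = c, score
    return best


rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "hot")), predict(model, ("rain", "cool")))
```

Structure learners such as ABC-Miner search for richer parent sets than this, but the prediction rule (maximize the class posterior given the case) stays the same.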

19.
The main limiting factor in Bayesian MCMC analysis of phylogeny is typically the efficiency with which topology proposals sample tree space. Here we evaluate the performance of seven different proposal mechanisms, including most of those used in current Bayesian phylogenetics software. We sampled 12 empirical nucleotide data sets (ranging in size from 27 to 71 taxa and from 378 to 2,520 sites) under difficult conditions: short runs, no Metropolis coupling, and an oversimplified substitution model producing difficult tree spaces (Jukes-Cantor with equal site rates). Convergence was assessed by comparison to reference samples obtained from multiple Metropolis-coupled runs. We find that proposals producing topology changes as a side effect of branch length changes (LOCAL and Continuous Change) consistently perform worse than those involving stochastic branch rearrangements (nearest neighbor interchange, subtree pruning and regrafting, tree bisection and reconnection, or subtree swapping). Among the latter, moves that use an extension mechanism to mix local with more distant rearrangements show better overall performance than those involving only local or only random rearrangements. Moves with only local rearrangements tend to mix well but have long burn-in periods, whereas moves with random rearrangements often show the reverse pattern. Combinations of moves tend to perform better than single moves. The time to convergence can be shortened considerably by starting with a good tree, but this comes at the cost of compromising convergence diagnostics based on overdispersed starting points. Our results have important implications for developers of Bayesian MCMC implementations and for the large group of users of Bayesian phylogenetics software.
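Nearest neighbor interchange, one of the rearrangement moves compared above, can be sketched on an unrooted binary tree stored as adjacency sets (a representation assumed here). The code enumerates the two NNI neighbors of each internal edge and lists each tree's splits so the change is visible; branch lengths and the proposal (Hastings) ratio needed inside an actual MCMC move are omitted.

```python
import copy


def nni_neighbours(tree):
    """All nearest-neighbour-interchange (NNI) rearrangements of an unrooted binary tree.

    tree maps node -> set of adjacent nodes; leaves have degree 1 and internal
    nodes degree 3. Each internal edge (u, v) yields two new topologies."""
    result = []
    for u in tree:
        for v in tree[u]:
            # Use each internal edge once: both ends internal, visited from the smaller name.
            if len(tree[u]) != 3 or len(tree[v]) != 3 or str(u) > str(v):
                continue
            b = sorted(tree[u] - {v}, key=str)[1]          # one subtree attached at u
            for c in sorted(tree[v] - {u}, key=str):       # exchange it with each subtree at v
                t = copy.deepcopy(tree)
                t[u].remove(b)
                t[b].remove(u)
                t[v].remove(c)
                t[c].remove(v)
                t[u].add(c)
                t[c].add(u)
                t[v].add(b)
                t[b].add(v)
                result.append(t)
    return result


def splits(tree):
    """Non-trivial leaf bipartitions, each stored as the side that excludes the first leaf."""
    leaves = sorted((n for n, nbrs in tree.items() if len(nbrs) == 1), key=str)
    ref = leaves[0]
    out = set()
    for u in tree:
        for v in tree[u]:
            if len(tree[u]) == 1 or len(tree[v]) == 1:
                continue                                    # edges to leaves give trivial splits
            side, seen, stack = set(), {u, v}, [v]
            while stack:                                    # leaves reachable from v without crossing (u, v)
                n = stack.pop()
                if len(tree[n]) == 1:
                    side.add(n)
                for m in tree[n]:
                    if m not in seen:
                        seen.add(m)
                        stack.append(m)
            if ref not in side:
                out.add(frozenset(side))
    return out


# Five-taxon example: leaves A-E, internal nodes x, y, z.
tree = {"A": {"x"}, "B": {"x"}, "C": {"y"}, "D": {"z"}, "E": {"z"},
        "x": {"A", "B", "y"}, "y": {"x", "C", "z"}, "z": {"y", "D", "E"}}
for t in nni_neighbours(tree):
    print(sorted(sorted(s) for s in splits(t)))
```

Each NNI neighbor keeps all but one split of the current tree, which is why such moves are called local rearrangements.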

20.
The recent development of Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) techniques has facilitated the exploration of parameter-rich evolutionary models. At the same time, stochastic models have become more realistic (and complex) and have been extended to new types of data, such as morphology. Based on this foundation, we developed a Bayesian MCMC approach to the analysis of combined data sets and explored its utility in inferring relationships among gall wasps based on data from morphology and four genes (nuclear and mitochondrial, ribosomal and protein coding). The models examined range in complexity from those recognizing only a morphological and a molecular partition to those having complex substitution models with independent parameters for each gene. Bayesian MCMC analysis deals efficiently with complex models: convergence occurs faster and more predictably for complex models, mixing is adequate for all parameters even under very complex models, and the parameter update cycle is virtually unaffected by model partitioning across sites. Morphology contributed only 5% of the characters in the data set but nevertheless influenced the combined-data tree, supporting the utility of morphological data in multigene analyses. We used Bayesian criteria (Bayes factors) to show that process heterogeneity across data partitions is a significant model component, although not as important as among-site rate variation. More complex evolutionary models are associated with more topological uncertainty and less conflict between morphology and molecules. Bayes factors sometimes favor simpler models over considerably more parameter-rich models, but the best model overall is also the most complex, and Bayes factors do not support exclusion of apparently weak parameters from this model. Thus, Bayes factors appear to be useful for selecting among complex models, but it is still unclear whether their use strikes a reasonable balance between model complexity and error in parameter estimates.
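Bayes factors are usually reported as 2 ln B and read against the Kass and Raftery (1995) scale. The sketch assumes the log marginal likelihoods have already been estimated (the abstract does not say how), and the numbers are hypothetical.

```python
def two_log_bayes_factor(log_ml_1, log_ml_0):
    """2 ln B10 from the natural-log marginal likelihoods of model 1 and model 0."""
    return 2.0 * (log_ml_1 - log_ml_0)


def kass_raftery_label(two_ln_b):
    """Verbal evidence for model 1 over model 0 on the Kass & Raftery (1995) scale for 2 ln B10."""
    if two_ln_b < 0:
        return "favors model 0 (swap the models and re-read the scale)"
    if two_ln_b <= 2:
        return "not worth more than a bare mention"
    if two_ln_b <= 6:
        return "positive"
    if two_ln_b <= 10:
        return "strong"
    return "very strong"


# Hypothetical log marginal likelihoods for a partitioned and an unpartitioned model.
stat = two_log_bayes_factor(-10234.7, -10241.9)
print(round(stat, 1), kass_raftery_label(stat))
```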


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417