首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Utilizing Gaussian Markov Random Field Properties of Bayesian Animal Models   总被引:1,自引:0,他引:1  
Summary In this article, we demonstrate how Gaussian Markov random field properties give large computational benefits and new opportunities for the Bayesian animal model. We make inference by computing the posteriors for important quantitative genetic variables. For the single‐trait animal model, a nonsampling‐based approximation is presented. For the multitrait model, we set up a robust and fast Markov chain Monte Carlo algorithm. The proposed methodology was used to analyze quantitative genetic properties of morphological traits of a wild house sparrow population. Results for single‐ and multitrait models were compared.  相似文献   

2.
Dunson DB  Watson M  Taylor JA 《Biometrics》2003,59(2):296-304
Often a response of interest cannot be measured directly and it is necessary to rely on multiple surrogates, which can be assumed to be conditionally independent given the latent response and observed covariates. Latent response models typically assume that residual densities are Gaussian. This article proposes a Bayesian median regression modeling approach, which avoids parametric assumptions about residual densities by relying on an approximation based on quantiles. To accommodate within-subject dependency, the quantile response categories of the surrogate outcomes are related to underlying normal variables, which depend on a latent normal response. This underlying Gaussian covariance structure simplifies interpretation and model fitting, without restricting the marginal densities of the surrogate outcomes. A Markov chain Monte Carlo algorithm is proposed for posterior computation, and the methods are applied to single-cell electrophoresis (comet assay) data from a genetic toxicology study.  相似文献   

3.
4.
The covarion (or site specific rate variation, SSRV) process of biological sequence evolution is a process by which the evolutionary rate of a nucleotide/amino acid/codon position can change in time. In this paper, we introduce time-continuous, space-discrete, Markov-modulated Markov chains as a model for representing SSRV processes, generalizing existing theory to any model of rate change. We propose a fast algorithm for diagonalizing the generator matrix of relevant Markov-modulated Markov processes. This algorithm makes phylogeny likelihood calculation tractable even for a large number of rate classes and a large number of states, so that SSRV models become applicable to amino acid or codon sequence datasets. Using this algorithm, we investigate the accuracy of the discrete approximation to the Gamma distribution of evolutionary rates, widely used in molecular phylogeny. We show that a relatively large number of classes is required to achieve accurate approximation of the exact likelihood when the number of analyzed sequences exceeds 20, both under the SSRV and among site rate variation (ASRV) models.  相似文献   

5.
6.
The adaptive alpha-spending algorithm incorporates additional contextual evidence (including correlations among genes) about differential expression to adjust the initial p-values to yield the alpha-spending adjusted p-values. The alpha-spending algorithm is named so because of its similarity with the alpha-spending algorithm in interim analysis of clinical trials in which stage-specific significance levels are assigned to each stage of the clinical trial. We show that the Bonferroni correction applied to the alpha-spending adjusted p-values approximately controls the Family Wise Error Rate under the complete null hypothesis. Using simulations we also show that the use of the alpha spending algorithm yields increased power over the unadjusted p-values while controlling FDR. We found the greater benefits of the alpha spending algorithm with increasing sample sizes and correlation among genes. The use of the alpha spending algorithm will result in microarray experiments that make more efficient use of their data and may help conserve resources.  相似文献   

7.
We present a new method to efficiently estimate very large numbers of p-values using empirically constructed null distributions of a test statistic. The need to evaluate a very large number of p-values is increasingly common with modern genomic data, and when interaction effects are of interest, the number of tests can easily run into billions. When the asymptotic distribution is not easily available, permutations are typically used to obtain p-values but these can be computationally infeasible in large problems. Our method constructs a prediction model to obtain a first approximation to the p-values and uses Bayesian methods to choose a fraction of these to be refined by permutations. We apply and evaluate our method on the study of association between 2-way interactions of genetic markers and colorectal cancer using the data from the first phase of a large, genome-wide case-control study. The results show enormous computational savings as compared to evaluating a full set of permutations, with little decrease in accuracy.  相似文献   

8.
Linkage disequilibrium (LD) testing has become a popular and effective method of fine-scale disease-gene localization. It has been proposed that LD testing could also be used for genome screening, particularly as dense maps of diallelic markers become available and automation allows inexpensive genotyping of diallelic markers. We compare diallelic markers and multiallelic markers in terms of sample sizes required for detection of LD, by use of a single marker locus in a case-control study, for rare monophyletic diseases with Mendelian inheritance. We extrapolate from our results to discuss the feasibility of single-marker LD screening in more-complex situations. We have used a deterministic population genetic model to calculate the expected power to detect LD as a function of marker density, age of mutation, number of marker alleles, mode of inheritance of a rare disease, and sample size. Our calculations show that multiallelic markers always have more power to detect LD than do diallelic markers (under otherwise equivalent conditions) and that the ratio of the number of diallelic to the number of multiallelic markers needed for equivalent power increases with mutation age and complexity of mode of inheritance. Power equivalent to that achieved by a multiallelic screen can theoretically be achieved by use of a more dense diallelic screen, but mapping panels of the necessary resolution are not currently available and may be difficult to achieve. Genome screening that uses single-marker LD testing may therefore be feasible only for young (<20 generations), rare, monophyletic Mendelian diseases, such as may be found in rapidly growing genetic isolates.  相似文献   

9.
Often in biomedical studies, the routine use of linear mixed‐effects models (based on Gaussian assumptions) can be questionable when the longitudinal responses are skewed in nature. Skew‐normal/elliptical models are widely used in those situations. Often, those skewed responses might also be subjected to some upper and lower quantification limits (QLs; viz., longitudinal viral‐load measures in HIV studies), beyond which they are not measurable. In this paper, we develop a Bayesian analysis of censored linear mixed models replacing the Gaussian assumptions with skew‐normal/independent (SNI) distributions. The SNI is an attractive class of asymmetric heavy‐tailed distributions that includes the skew‐normal, skew‐t, skew‐slash, and skew‐contaminated normal distributions as special cases. The proposed model provides flexibility in capturing the effects of skewness and heavy tail for responses that are either left‐ or right‐censored. For our analysis, we adopt a Bayesian framework and develop a Markov chain Monte Carlo algorithm to carry out the posterior analyses. The marginal likelihood is tractable, and utilized to compute not only some Bayesian model selection measures but also case‐deletion influence diagnostics based on the Kullback–Leibler divergence. The newly developed procedures are illustrated with a simulation study as well as an HIV case study involving analysis of longitudinal viral loads.  相似文献   

10.
A model is developed for the spread of a state in small social groups. Under suitable assumptions the model exhibits formal identity with Markov chain theory. The basic theorems and classifications of Markov chain theory are stated and interpreted in terms of the model. Finally, some procedures for testing the model are indicated.  相似文献   

11.
Kolassa JE  Tanner MA 《Biometrics》1999,55(1):246-251
This article presents an algorithm for approximate frequentist conditional inference on two or more parameters for any regression model in the Generalized Linear Model (GLIM) family. We thereby extend highly accurate inference beyond the cases of logistic regression and contingency tables implimented in commercially available software. The method makes use of the double saddlepoint approximations of Skovgaard (1987, Journal of Applied Probability 24, 875-887) and Jensen (1992, Biometrika 79, 693-703) to the conditional cumulative distribution function of a sufficient statistic given the remaining sufficient statistics. This approximation is then used in conjunction with noniterative Monte Carlo methods to generate a sample from a distribution that approximates the joint distribution of the sufficient statistics associated with the parameters of interest conditional on the observed values of the sufficient statistics associated with the nuisance parameters. This algorithm is an alternate approach to that presented by Kolassa and Tanner (1994, Journal of the American Statistical Association 89, 697-702), in which a Markov chain is generated whose equilibrium distribution under certain regularity conditions approximates the joint distribution of interest. In Kolassa and Tanner (1994), the Gibbs sampler was used in conjunction with these univariate conditional distribution function approximations. The method of this paper does not require the construction and simulation of a Markov chain, thus avoiding the need to develop regularity conditions under which the algorithm converges and the need for the data analyst to check convergence of the particular chain. Examples involving logistic and truncated Poisson regression are presented.  相似文献   

12.
In recent years, the number of studies focusing on the genetic basis of common disorders with a complex mode of inheritance, in which multiple genes of small effect are involved, has been steadily increasing. An improved methodology to identify the cumulative contribution of several polymorphous genes would accelerate our understanding of their importance in disease susceptibility and our ability to develop new treatments. A critical bottleneck is the inability of standard statistical approaches, developed for relatively modest predictor sets, to achieve power in the face of the enormous growth in our knowledge of genomics. The inability is due to the combinatorial complexity arising in searches for multiple interacting genes. Similar "curse of dimensionality" problems have arisen in other fields, and Bayesian statistical approaches coupled to Markov chain Monte Carlo (MCMC) techniques have led to significant improvements in understanding. We present here an algorithm, APSampler, for the exploration of potential combinations of allelic variations positively or negatively associated with a disease or with a phenotype. The algorithm relies on the rank comparison of phenotype for individuals with and without specific patterns (i.e., combinations of allelic variants) isolated in genetic backgrounds matched for the remaining significant patterns. It constructs a Markov chain to sample only potentially significant variants, minimizing the potential of large data sets to overwhelm the search. We tested APSampler on a simulated data set and on a case-control MS (multiple sclerosis) study for ethnic Russians. For the simulated data, the algorithm identified all the phenotype-associated allele combinations coded into the data and, for the MS data, it replicated the previously known findings.  相似文献   

13.
Markov chain Monte Carlo (MCMC) has recently gained use as a method of estimating required probability and likelihood functions in pedigree analysis, when exact computation is impractical. However, when a multiallelic locus is involved, irreducibility of the constructed Markov chain, an essential requirement of the MCMC method, may fail. Solutions proposed by several researchers, which do not identify all the noncommunicating sets of genotypic configurations, are inefficient with highly polymorphic loci. This is a particularly serious problem in linkage analysis, because highly polymorphic markers are much more informative and thus are preferred. In the present paper, we describe an algorithm that finds all the noncommunicating classes of genotypic configurations on any pedigree. This leads to a more efficient method of defining an irreducible Markov chain. Examples, including a pedigree from a genetic study of familial Alzheimer disease, are used to illustrate how the algorithm works and how penetrances are modified for specific individuals to ensure irreducibility.  相似文献   

14.
One of the most challenging areas in human genetics is the dissection of quantitative traits. In this context, the efficient use of available data is important, including, when possible, use of large pedigrees and many markers for gene mapping. In addition, methods that jointly perform linkage analysis and estimation of the trait model are appealing because they combine the advantages of a model-based analysis with the advantages of methods that do not require prespecification of model parameters for linkage analysis. Here we review a Markov chain Monte Carlo approach for such joint linkage and segregation analysis, which allows analysis of oligogenic traits in the context of multipoint linkage analysis of large pedigrees. We provide an outline for practitioners of the salient features of the method, interpretation of the results, effect of violation of assumptions, and an example analysis of a two-locus trait to illustrate the method.  相似文献   

15.
Methods for detecting Quantitative Trait Loci (QTL) without markers have generally used iterative peeling algorithms for determining genotype probabilities. These algorithms have considerable shortcomings in complex pedigrees. A Monte Carlo Markov chain (MCMC) method which samples the pedigree of the whole population jointly is described. Simultaneous sampling of the pedigree was achieved by sampling descent graphs using the Metropolis-Hastings algorithm. A descent graph describes the inheritance state of each allele and provides pedigrees guaranteed to be consistent with Mendelian sampling. Sampling descent graphs overcomes most, if not all, of the limitations incurred by iterative peeling algorithms. The algorithm was able to find the QTL in most of the simulated populations. However, when the QTL was not modeled or found then its effect was ascribed to the polygenic component. No QTL were detected when they were not simulated.  相似文献   

16.
Du P  Jiang Y  Wang Y 《Biometrics》2011,67(4):1330-1339
Gap time hazard estimation is of particular interest in recurrent event data. This article proposes a fully nonparametric approach for estimating the gap time hazard. Smoothing spline analysis of variance (ANOVA) decompositions are used to model the log gap time hazard as a joint function of gap time and covariates, and general frailty is introduced to account for between-subject heterogeneity and within-subject correlation. We estimate the nonparametric gap time hazard function and parameters in the frailty distribution using a combination of the Newton-Raphson procedure, the stochastic approximation algorithm (SAA), and the Markov chain Monte Carlo (MCMC) method. The convergence of the algorithm is guaranteed by decreasing the step size of parameter update and/or increasing the MCMC sample size along iterations. Model selection procedure is also developed to identify negligible components in a functional ANOVA decomposition of the log gap time hazard. We evaluate the proposed methods with simulation studies and illustrate its use through the analysis of bladder tumor data.  相似文献   

17.
A new method is developed for calculating sequence substitution probabilities using Markov chain Monte Carlo (MCMC) methods. The basic strategy is to use uniformization to transform the original continuous time Markov process into a Poisson substitution process and a discrete Markov chain of state transitions. An efficient MCMC algorithm for evaluating substitution probabilities by this approach using a continuous gamma distribution to model site-specific rates is outlined. The method is applied to the problem of inferring branch lengths and site-specific rates from nucleotide sequences under a general time-reversible (GTR) model and a computer program BYPASSR is developed. Simulations are used to examine the performance of the new program relative to an existing program BASEML that uses a discrete approximation for the gamma distributed prior on site-specific rates. It is found that BASEML and BYPASSR are in close agreement when inferring branch lengths, regardless of the number of rate categories used, but that BASEML tends to underestimate high site-specific substitution rates, and to overestimate intermediate rates, when fewer than 50 rate categories are used. Rate estimates obtained using BASEML agree more closely with those of BYPASSR as the number of rate categories increases. Analyses of the posterior distributions of site-specific rates from BYPASSR suggest that a large number of taxa are needed to obtain precise estimates of site-specific rates, especially when rates are very high or very low. The method is applied to analyze 45 sequences of the alpha 2B adrenergic receptor gene (A2AB) from a sample of eutherian taxa. In general, the pattern expected for regions under negative selection is observed with third codon positions having the highest inferred rates, followed by first codon positions and with second codon positions having the lowest inferred rates. Several sites show exceptionally high substitution rates at second codon positions that may represent the effects of positive selection.  相似文献   

18.
MOTIVATION: Statistical tests for the detection of differentially expressed genes lead to a large collection of p-values one for each gene comparison. Without any further adjustment, these p-values may lead to a large number of false positives, simply because the number of genes to be tested is huge, which might mean wastage of laboratory resources. To account for multiple hypotheses, these p-values are typically adjusted using a single step method or a step-down method in order to achieve an overall control of the error rate (the so-called familywise error rate). In many applications, this may lead to an overly conservative strategy leading to too few genes being flagged. RESULTS: In this paper we introduce a novel empirical Bayes screening (EBS) technique to inspect a large number of p-values in an effort to detect additional positive cases. In effect, each case borrows strength from an overall picture of the alternative hypotheses computed from all the p-values, while the entire procedure is calibrated by a step-down method so that the familywise error rate at the complete null hypothesis is still controlled. It is shown that the EBS has substantially higher sensitivity than the standard step-down approach for multiple comparison at the cost of a modest increase in the false discovery rate (FDR). The EBS procedure also compares favorably when compared with existing FDR control procedures for multiple testing. The EBS procedure is particularly useful in situations where it is important to identify all possible potentially positive cases which can be subjected to further confirmatory testing in order to eliminate the false positives. We illustrated this screening procedure using a data set on human colorectal cancer where we show that the EBS method detected additional genes related to colon cancer that were missed by other methods.This novel empirical Bayes procedure is advantageous over our earlier proposed empirical Bayes adjustments due to the following reasons: (i) it offers an automatic screening of the p-values the user may obtain from a univariate (i.e., gene by gene) analysis package making it extremely easy to use for a non-statistician, (ii) since it applies to the p-values, the tests do not have to be t-tests; in particular they could be F-tests which might arise in certain ANOVA formulations with expression data or even nonparametric tests, (iii) the empirical Bayes adjustment uses nonparametric function estimation techniques to estimate the marginal density of the transformed p-values rather than using a parametric model for the prior distribution and is therefore robust against model mis-specification. AVAILABILITY: R code for EBS is available from the authors upon request. SUPPLEMENTARY INFORMATION: http://www.stat.uga.edu/~datta/EBS/supp.htm  相似文献   

19.
Compliance is the extent to which a patient follows the prescribed regimen. Here we investigate the statistical properties of two popular measures of compliance - percentage of compliant days and percentage of doses taken. We use a stationary Markov chain to model the dependence structure of successive data points for each subject. We illustrate our model using discrete compliance data collected from an AIDS Clinical Trial Group study (ACTG 398). We check the model assumptions and evaluate the small sample as well as large sample properties of our estimators. We show that ignoring the within-subject dependence will usually underestimate the standard errors of the estimates of these compliance measures. Our model allows the application of meta-analytic approaches to assess the variation across subjects in these compliance indices and changes in them due to intervention.  相似文献   

20.
A Bayesian approach is presented for mapping a quantitative trait locus (QTL) using the 'Fernando and Grossman' multivariate Normal approximation to QTL inheritance. For this model, a Bayesian implementation that includes QTL position is problematic because standard Markov chain Monte Carlo (MCMC) algorithms do not mix, i.e. the QTL position gets stuck in one marker interval. This is because of the dependence of the covariance structure for the QTL effects on the adjacent markers and may be typical of the 'Fernando and Grossman' model. A relatively new MCMC technique, simulated tempering, allows mixing and so makes possible inferences about QTL position based on marginal posterior probabilities. The model was implemented for estimating variance ratios and QTL position using a continuous grid of allowed positions and was applied to simulated data of a standard granddaughter design. The results showed a smooth mixing of QTL position after implementation of the simulated tempering sampler. In this implementation, map distance between QTL and its flanking markers was artificially stretched to reduce the dependence of markers and covariance. The method generalizes easily to more complicated applications and can ultimately contribute to QTL mapping in complex, heterogeneous, human, animal or plant populations.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号