Similar Literature
20 similar records retrieved.
1.
Multivariate concomitant information on a subject's condition usually accompanies survival time data. Using a model in which each subject's lifetime is exponentially distributed, this paper suggests a method which utilizes a step-up procedure for choosing the most important variables associated with survival. Maximum likelihood (ML) estimates are utilized, and the likelihood ratio is employed as the criterion for adding significant concomitant variables. An example using multiple myeloma survival data and sixteen concomitant variables is discussed in which three variables are chosen to predict survival.
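As a rough illustration of the step-up criterion described above, the following Python sketch performs forward selection for an exponential survival model, adding at each step the covariate with the largest significant likelihood-ratio statistic. The simulated data, the 0.05 threshold, and the optimizer settings are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of step-up (forward) covariate selection for an
# exponential survival model, with likelihood-ratio tests as the
# inclusion criterion. Data and threshold are illustrative.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def neg_loglik(beta, X, time, event):
    """Negative log-likelihood; hazard lambda_i = exp(X_i @ beta)."""
    eta = X @ beta
    return -np.sum(event * eta - time * np.exp(eta))

def fit(X, time, event):
    res = minimize(neg_loglik, np.zeros(X.shape[1]), args=(X, time, event))
    return -res.fun  # maximized log-likelihood

def step_up(Z, time, event, alpha=0.05):
    """Greedily add the covariate with the largest significant LR statistic."""
    n, p = Z.shape
    chosen, ll_cur = [], fit(np.ones((n, 1)), time, event)  # intercept only
    while True:
        best = None
        for j in set(range(p)) - set(chosen):
            X = np.column_stack([np.ones(n)] + [Z[:, k] for k in chosen + [j]])
            lr = 2 * (fit(X, time, event) - ll_cur)
            if lr > chi2.ppf(1 - alpha, df=1) and (best is None or lr > best[1]):
                best = (j, lr)
        if best is None:
            return chosen
        chosen.append(best[0])
        ll_cur += best[1] / 2  # ll of the enlarged model

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 5))
t = rng.exponential(1 / np.exp(0.8 * Z[:, 0]))   # only variable 0 matters
print(step_up(Z, t, np.ones(200)))               # typically -> [0]
```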

2.
It is typical in QTL mapping experiments that the number of markers under investigation is large. This poses a challenge to commonly used regression models since the number of feature variables is usually much larger than the sample size, especially when epistasis effects are to be considered. The greedy nature of the conventional stepwise procedures is well known and is even more conspicuous in such cases. In this article, we propose a two-phase procedure based on penalized likelihood techniques and extended Bayes information criterion (EBIC) for QTL mapping. The procedure consists of a screening phase and a selection phase. In the screening phase, the main and interaction features are alternatively screened by a penalized likelihood mechanism. In the selection phase, a low-dimensional approach using EBIC is applied to the features retained in the screening phase to identify QTL. The two-phase procedure has the asymptotic property that its positive detection rate (PDR) and false discovery rate (FDR) converge to 1 and 0, respectively, as sample size goes to infinity. The two-phase procedure is compared with both traditional and recently developed approaches by simulation studies. A real data analysis is presented to demonstrate the application of the two-phase procedure.
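The selection phase hinges on the extended BIC. A minimal sketch, assuming the standard Gaussian-linear form EBIC_gamma = n·log(RSS/n) + k·log(n) + 2·gamma·k·log(p) and a hypothetical pre-screened candidate set rather than the paper's screening phase:

```python
# A minimal sketch of EBIC-based selection among screened candidates.
# The screened set and data below are illustrative assumptions.
import itertools
import numpy as np

def ebic(y, X, subset, p_total, gamma=1.0):
    n, k = len(y), len(subset)
    Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    rss = np.sum((y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]) ** 2)
    return n * np.log(rss / n) + k * np.log(n) + 2 * gamma * k * np.log(p_total)

rng = np.random.default_rng(1)
n, p = 100, 500                      # markers >> sample size
X = rng.normal(size=(n, p))
y = 2 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(size=n)

screened = [3, 7, 12]                # pretend these survived screening
candidates = [list(s) for r in range(3)
              for s in itertools.combinations(screened, r + 1)]
best = min(candidates, key=lambda s: ebic(y, X, s, p))
print(best)                          # typically -> [3, 7]
```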

3.
Under the model of independent test statistics, we propose a two-parameter family of Bayes multiple testing procedures. The two parameters can be viewed as tuning parameters. Using the Benjamini–Hochberg step-up procedure for controlling false discovery rate as a baseline for conservativeness, we choose the tuning parameters to compromise between the operating characteristics of that procedure and a less conservative procedure that focuses on alternatives that a priori might be considered likely or meaningful. The Bayes procedures do not have the theoretical and practical shortcomings of the popular stepwise procedures. In terms of the number of mistakes, simulations for two examples indicate that over a large segment of the parameter space, the Bayes procedure is preferable to the step-up procedure. Another desirable feature of the procedures is that they are computationally feasible for any number of hypotheses.
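For reference, the Benjamini–Hochberg step-up baseline mentioned above can be sketched in a few lines; the p-values and level q below are illustrative.

```python
# A minimal sketch of the Benjamini–Hochberg linear step-up procedure:
# reject H_(1),...,H_(k) where k = max{i : p_(i) <= i*q/m}.
import numpy as np

def bh_step_up(pvals, q=0.05):
    """Return a boolean rejection mask controlling FDR at level q."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    passed = np.nonzero(p[order] <= (np.arange(1, m + 1) / m) * q)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size:
        reject[order[: passed[-1] + 1]] = True  # step-up: reject all up to k
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.9]
print(bh_step_up(pvals, q=0.05))
```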

4.
This article proposes resampling-based empirical Bayes multiple testing procedures for controlling a broad class of Type I error rates, defined as generalized tail probability (gTP) error rates, gTP(q, g) = Pr(g(V_n, S_n) > q), and generalized expected value (gEV) error rates, gEV(g) = E[g(V_n, S_n)], for arbitrary functions g(V_n, S_n) of the numbers of false positives V_n and true positives S_n. Of particular interest are error rates based on the proportion g(V_n, S_n) = V_n/(V_n + S_n) of Type I errors among the rejected hypotheses, such as the false discovery rate (FDR), FDR = E[V_n/(V_n + S_n)]. The proposed procedures offer several advantages over existing methods. They provide Type I error control for general data generating distributions, with arbitrary dependence structures among variables. Gains in power are achieved by deriving rejection regions based on guessed sets of true null hypotheses and null test statistics randomly sampled from joint distributions that account for the dependence structure of the data. The Type I error and power properties of an FDR-controlling version of the resampling-based empirical Bayes approach are investigated and compared to those of widely-used FDR-controlling linear step-up procedures in a simulation study. The Type I error and power trade-off achieved by the empirical Bayes procedures under a variety of testing scenarios allows this approach to be competitive with or outperform the Storey and Tibshirani (2003) linear step-up procedure, as an alternative to the classical Benjamini and Hochberg (1995) procedure.
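To make the gEV error rates concrete, the following sketch estimates the FDR, i.e. gEV with g(V_n, S_n) = V_n/(V_n + S_n), of the linear step-up procedure by simulation under independent normal test statistics. The effect size, null proportion, and number of tests are illustrative assumptions.

```python
# A minimal simulation of FDR = E[V/(V+S)] for the BH step-up procedure.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
m, m0, mu, q, reps = 100, 80, 3.0, 0.1, 2000
fdp = []
for _ in range(reps):
    z = rng.normal(size=m) + np.r_[np.zeros(m0), np.full(m - m0, mu)]
    p = 2 * norm.sf(np.abs(z))
    order = np.argsort(p)
    passed = np.nonzero(p[order] <= np.arange(1, m + 1) / m * q)[0]
    reject = np.zeros(m, bool)
    if passed.size:
        reject[order[: passed[-1] + 1]] = True
    V = reject[:m0].sum()                       # false positives
    R = reject.sum()                            # total rejections = V + S
    fdp.append(V / max(R, 1))
print("estimated FDR =", np.mean(fdp))          # approximately <= q
```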

5.
ABSTRACT: BACKGROUND: An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. RESULTS: We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. CONCLUSIONS: The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge.
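A minimal sketch of the general idea, not the paper's exact model: variable subsets receive a pathway-informed prior with weight w, each subset's marginal likelihood is approximated by exp(-BIC/2), and w itself is chosen by maximizing the overall marginal likelihood (empirical Bayes). The data, pathway membership, and the BIC approximation are all illustrative assumptions.

```python
# A minimal sketch of empirical-Bayes weighting of pathway information
# in Bayesian variable selection. Everything here is illustrative.
import itertools
import numpy as np

rng = np.random.default_rng(3)
n, p, pathway = 80, 6, {0, 1, 2}          # variables 0-2 share a pathway
X = rng.normal(size=(n, p))
y = X[:, 0] + X[:, 1] + rng.normal(size=n)

def log_ml(subset):                        # BIC approximation to log m(y|subset)
    Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    rss = np.sum((y - Xs @ np.linalg.lstsq(Xs, y, rcond=None)[0]) ** 2)
    return -0.5 * (n * np.log(rss / n) + len(subset) * np.log(n))

subsets = [s for r in range(4) for s in itertools.combinations(range(p), r)]
lml = np.array([log_ml(s) for s in subsets])

def log_prior(subset, w):                  # in-pathway variables favored by w
    k_in = len(set(subset) & pathway)
    return k_in * np.log(w) + (len(subset) - k_in) * np.log(1 - w)

def marginal(w):                           # evidence for w, to be maximized
    lp = np.array([log_prior(s, w) for s in subsets])
    return np.logaddexp.reduce(lml + lp)

w_hat = max(np.linspace(0.05, 0.95, 19), key=marginal)
post = np.exp(lml + np.array([log_prior(s, w_hat) for s in subsets]))
print("w_hat =", w_hat, "MAP subset =", subsets[int(np.argmax(post))])
```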

6.
Codon-based substitution models have been widely used to identify amino acid sites under positive selection in comparative analysis of protein-coding DNA sequences. The nonsynonymous-synonymous substitution rate ratio (dN/dS, denoted ω) is used as a measure of selective pressure at the protein level, with ω > 1 indicating positive selection. Statistical distributions are used to model the variation in ω among sites, allowing a subset of sites to have ω > 1 while the rest of the sequence may be under purifying selection with ω < 1. An empirical Bayes (EB) approach is then used to calculate posterior probabilities that a site comes from the site class with ω > 1. Current implementations, however, use the naive EB (NEB) approach and fail to account for sampling errors in maximum likelihood estimates of model parameters, such as the proportions and ω ratios for the site classes. In small data sets lacking information, this approach may lead to unreliable posterior probability calculations. In this paper, we develop a Bayes empirical Bayes (BEB) approach to the problem, which assigns a prior to the model parameters and integrates over their uncertainties. We compare the new and old methods on real and simulated data sets. The results suggest that in small data sets the new BEB method does not generate false positives as did the old NEB approach, while in large data sets it retains the good power of the NEB approach for inferring positively selected sites.
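The NEB step that BEB improves on is a direct application of Bayes' theorem with the MLEs plugged in; the sketch below uses illustrative class proportions and site likelihoods rather than an actual codon-model computation.

```python
# A minimal sketch of the naive empirical Bayes (NEB) posterior at one site:
# P(class k | data) is proportional to p_k * f(data | omega_k), with p_k the
# MLE class proportions. BEB instead integrates over uncertainty in these
# parameters. The numbers below are illustrative.
import numpy as np

props = np.array([0.6, 0.3, 0.1])          # MLE proportions: omega<1, =1, >1
site_lik = np.array([0.002, 0.004, 0.020]) # f(site data | class k)

posterior = props * site_lik
posterior /= posterior.sum()
print("P(omega > 1 | site) =", posterior[2])  # evidence for positive selection
```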

7.
Numerous statistical methods have been developed for analyzing high-dimensional data. These methods often focus on variable selection approaches but are limited for the purpose of testing with high-dimensional data. They are often required to have explicit likelihood functions. In this article, we propose a "hybrid omnibus test" for high-dimensional data testing purposes with much weaker requirements. Our hybrid omnibus test is developed under a semiparametric framework where a likelihood function is no longer necessary. Our test is a version of a frequentist-Bayesian hybrid score-type test for a generalized partially linear single-index model, which has a link function being a function of a set of variables through a generalized partially linear single index. We propose an efficient score based on estimating equations, define local tests, and then construct our hybrid omnibus test using local tests. We compare our approach with an empirical-likelihood ratio test and Bayesian inference based on Bayes factors, using simulation studies. Our simulation results suggest that our approach outperforms the others, in terms of type I error, power, and computational cost in both the low- and high-dimensional cases. The advantage of our approach is demonstrated by applying it to genetic pathway data for type II diabetes mellitus.

8.
Automated variable selection procedures, such as backward elimination, are commonly employed to perform model selection in the context of multivariable regression. The stability of such procedures can be investigated using a bootstrap-based approach. The idea is to apply the variable selection procedure on a large number of bootstrap samples successively and to examine the obtained models, for instance, in terms of the inclusion of specific predictor variables. In this paper, we aim to investigate a particular important problem affecting this method in the case of categorical predictor variables with different numbers of categories and to give recommendations on how to avoid it. For this purpose, we systematically assess the behavior of automated variable selection based on the likelihood ratio test using either bootstrap samples drawn with replacement or subsamples drawn without replacement from the original dataset. Our study consists of extensive simulations and a real data example from the NHANES study. Our main result is that if automated variable selection is conducted on bootstrap samples, variables with more categories are substantially favored over variables with fewer categories and over metric variables even if none of them have any effect. Importantly, variables with no effect and many categories may be (wrongly) preferred to variables with an effect but few categories. We suggest the use of subsamples instead of bootstrap samples to bypass these drawbacks.
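A minimal sketch of the stability investigation, assuming a p-value-based backward elimination via statsmodels rather than the paper's likelihood-ratio variant; switching `replace=True` to `replace=False` (with a reduced sample size) gives the subsampling scheme the paper recommends.

```python
# A minimal sketch of bootstrap-based stability assessment of backward
# elimination: repeat selection on resampled data and record how often
# each predictor is kept. Data and alpha are illustrative.
import numpy as np
import statsmodels.api as sm

def backward_eliminate(X, y, alpha=0.05):
    keep = list(range(X.shape[1]))
    while keep:
        pv = sm.OLS(y, sm.add_constant(X[:, keep])).fit().pvalues[1:]
        worst = int(np.argmax(pv))
        if pv[worst] <= alpha:       # all remaining predictors significant
            return keep
        keep.pop(worst)
    return keep

rng = np.random.default_rng(4)
n, p = 150, 4
X = rng.normal(size=(n, p))
y = 1.2 * X[:, 0] + rng.normal(size=n)     # only predictor 0 has an effect

counts = np.zeros(p)
for _ in range(200):
    idx = rng.choice(n, size=n, replace=True)   # bootstrap; replace=False
    sel = backward_eliminate(X[idx], y[idx])    # (and size=n//2) = subsampling
    counts[sel] += 1
print("inclusion frequencies:", counts / 200)
```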

9.
Summary: Gene co-expressions have been widely used in the analysis of microarray gene expression data. However, the co-expression patterns between two genes can be mediated by cellular states, as reflected by expression of other genes, single nucleotide polymorphisms, and activity of protein kinases. In this article, we introduce a bivariate conditional normal model for identifying the variables that can mediate the co-expression patterns between two genes. Based on this model, we introduce a likelihood ratio (LR) test and a penalized likelihood procedure for identifying the mediators that affect gene co-expression patterns. We propose an efficient computational algorithm based on iterative reweighted least squares and cyclic coordinate descent and have shown that when the tuning parameter in the penalized likelihood is appropriately selected, such a procedure has the oracle property in selecting the variables. We present simulation results to compare with existing methods and show that the LR-based approach can perform similarly or better than the existing method of liquid association and the penalized likelihood procedure can be quite effective in selecting the mediators. We apply the proposed method to yeast gene expression data in order to identify the kinases or single nucleotide polymorphisms that mediate the co-expression patterns between genes.

10.
Green PE, Park T. Biometrics 2003, 59(4):886-896
Log-linear models have been shown to be useful for smoothing contingency tables when categorical outcomes are subject to nonignorable nonresponse. A log-linear model can be fit to an augmented data table that includes an indicator variable designating whether subjects are respondents or nonrespondents. Maximum likelihood estimates calculated from the augmented data table are known to suffer from instability due to boundary solutions. Park and Brown (1994, Journal of the American Statistical Association 89, 44-52) and Park (1998, Biometrics 54, 1579-1590) developed empirical Bayes models that tend to smooth estimates away from the boundary. In those approaches, estimates for nonrespondents were calculated using an EM algorithm by maximizing a posterior distribution. As an extension of their earlier work, we develop a Bayesian hierarchical model that incorporates a log-linear model in the prior specification. In addition, due to uncertainty in the variable selection process associated with just one log-linear model, we simultaneously consider a finite number of models using a stochastic search variable selection (SSVS) procedure due to George and McCulloch (1997, Statistica Sinica 7, 339-373). The integration of the SSVS procedure into a Markov chain Monte Carlo (MCMC) sampler is straightforward, and leads to estimates of cell frequencies for the nonrespondents that are averages resulting from several log-linear models. The methods are demonstrated with a data example involving serum creatinine levels of patients who survived renal transplants. A simulation study is conducted to investigate properties of the model.

11.
12.
Bayes prediction quantifies uncertainty by assigning posterior probabilities. It was used to identify amino acids in a protein under recurrent diversifying selection indicated by higher nonsynonymous (dN) than synonymous (dS) substitution rates or by ω = dN/dS > 1. Parameters were estimated by maximum likelihood under a codon substitution model that assumed several classes of sites with different ω ratios. Bayes' theorem was used to calculate the posterior probabilities of each site falling into these site classes. Here, we evaluate the performance of Bayes prediction of amino acids under positive selection by computer simulation. We measured the accuracy by the proportion of predicted sites that were truly under selection and the power by the proportion of true positively selected sites that were predicted by the method. The accuracy was slightly better for longer sequences, whereas the power was largely unaffected by the increase in sequence length. Both accuracy and power were higher for medium or highly diverged sequences than for similar sequences. We found that accuracy and power were unacceptably low when data contained only a few highly similar sequences. However, sampling a large number of lineages improved the performance substantially. Even for very similar sequences, accuracy and power can be high if over 100 taxa are used in the analysis. We make the following recommendations: (1) prediction of positive selection sites is not feasible for a few closely related sequences; (2) using a large number of lineages is the best way to improve the accuracy and power of the prediction; and (3) multiple models of heterogeneous selective pressures among sites should be applied in real data analysis.
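The two performance measures translate directly into code; the truth and prediction masks below are illustrative:

```python
# A minimal sketch of the two simulation metrics:
# accuracy = TP / (sites predicted as selected),
# power    = TP / (sites truly under selection).
import numpy as np

truly_selected = np.array([1, 1, 0, 0, 1, 0, 0, 0], bool)   # simulation truth
predicted      = np.array([1, 0, 0, 1, 1, 0, 0, 0], bool)   # posterior > cutoff

tp = np.sum(truly_selected & predicted)
accuracy = tp / predicted.sum()          # proportion of predictions correct
power    = tp / truly_selected.sum()     # proportion of true sites recovered
print(accuracy, power)                   # 2/3 and 2/3 here
```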

13.
This paper gives an approximate Bayes procedure for the estimation of the reliability function of a two-parameter Cauchy distribution using Jeffreys' non-informative prior with a squared-error loss function, and with a log-odds ratio squared-error loss function. Based on a Monte Carlo simulation study, two such Bayes estimators of the reliability are compared with the maximum likelihood estimator.
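As an illustrative counterpart to the paper's analytical approximation, the sketch below computes the posterior-mean (squared-error-loss Bayes) estimate of the Cauchy reliability R(t) = 1/2 - arctan((t - μ)/σ)/π by brute-force grid integration, assuming Jeffreys' prior proportional to 1/σ². The data, grid ranges, and t are illustrative.

```python
# A minimal grid-approximation sketch of the Bayes estimate of Cauchy
# reliability under squared-error loss (the posterior mean). This is a
# brute-force illustration, not the paper's approximate procedure.
import numpy as np
from scipy.stats import cauchy

rng = np.random.default_rng(5)
x = cauchy.rvs(loc=2.0, scale=1.0, size=30, random_state=rng)
t = 3.0

mu, sig = np.meshgrid(np.linspace(-2, 6, 200), np.linspace(0.05, 5, 200))
loglik = sum(cauchy.logpdf(xi, mu, sig) for xi in x)
logpost = loglik - 2 * np.log(sig)           # assumed Jeffreys prior ~ 1/sigma^2
w = np.exp(logpost - logpost.max())
w /= w.sum()

R = 0.5 - np.arctan((t - mu) / sig) / np.pi  # reliability on the grid
print("posterior-mean estimate of R(t):", np.sum(w * R))
```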

14.
The question of whether selection experiments ought to include a control line, as opposed to investing all facilities in a single selected line, is addressed using a likelihood perspective. The consequences of using a control line are evaluated under two scenarios. In the first one, environmental trend is modeled and inferred from the data. In this case, a control line is shown to be highly beneficial in terms of the efficiency of inferences about heritability and response to selection. In the second scenario, environmental trend is not modeled. One can imagine that a previous analysis of the experimental data had lent support to this decision. It is shown that in this situation where a control line may seem superfluous, inclusion of a control line can result in minor gains in efficiency if a high selection intensity is practiced in the selected line. Further, if there is a loss, it is moderately small. The results are verified to hold under more complicated data structures via Monte Carlo simulation. For completeness, divergent selection designs are also reviewed, and inferences based on a conditional and full likelihood approach are contrasted.

15.
Multistage Selection for Genetic Gain by Orthogonal Transformation
S. Xu, W. M. Muir. Genetics 1991, 129(3):963-974
An exact transformed culling method for any number of traits or stages of selection with explicit solution for multistage selection is described in this paper. This procedure does not need numerical integration and is suitable for obtaining either desired genetic gains for a variable proportion selected or optimum aggregate breeding value for a fixed total proportion selected. The procedure has similar properties to multistage selection index and, as such, genetic gains from use of the procedure may exceed ordinary independent culling level selection. The relative efficiencies of transformed to conventional independent culling ranged from 87% to over 300%. These results suggest that for most situations one can choose a multistage selection scheme, either conventional or transformed culling, which will have an efficiency close to that of selection index. After considering cost savings associated with multistage selection, there are many situations in which economic returns from use of independent culling, either conventional or transformed, will exceed that of selection index.

16.
Multi-category classification methods were used to detect SNP-mortality associations in broilers. The objective was to select a subset of whole genome SNPs associated with chick mortality. This was done by categorizing mortality rates and using a filter-wrapper feature selection procedure in each of the classification methods evaluated. Different numbers of categories (2, 3, 4, 5 and 10) and three classification algorithms (naïve Bayes classifiers, Bayesian networks and neural networks) were compared, using early and late chick mortality rates in low and high hygiene environments. Evaluation of SNPs selected by each classification method was done by predicted residual sum of squares and a significance test-related metric. A naïve Bayes classifier, coupled with discretization into two or three categories generated the SNP subset with greatest predictive ability. Further, an alternative categorization scheme, which used only two extreme portions of the empirical distribution of mortality rates, was considered. This scheme selected SNPs with greater predictive ability than those chosen by the methods described previously. Use of extreme samples seems to enhance the ability of feature selection procedures to select influential SNPs in genetic association studies.
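A minimal sketch of the discretize-then-classify idea using scikit-learn's CategoricalNB: mortality is split into two categories and SNPs are ranked by a drop-one cross-validated score. The simulated genotypes and the scoring rule are illustrative assumptions, not the paper's filter-wrapper procedure.

```python
# A minimal sketch: discretize mortality into two categories, then rank
# SNPs (coded 0/1/2) by how much removing each hurts naive Bayes accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import CategoricalNB

rng = np.random.default_rng(6)
n, n_snp = 300, 20
G = rng.integers(0, 3, size=(n, n_snp))                 # SNP genotypes
mortality = 0.1 + 0.05 * G[:, 4] + rng.normal(0, 0.03, n)
y = (mortality > np.median(mortality)).astype(int)      # two categories

full = cross_val_score(CategoricalNB(), G, y, cv=5).mean()
drop = np.array([
    cross_val_score(CategoricalNB(), np.delete(G, j, axis=1), y, cv=5).mean()
    for j in range(n_snp)
])
print("most influential SNP:", int(np.argmax(full - drop)))  # typically 4
```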

17.
Omics experiments endowed with a time-course design may enable us to uncover the dynamic interplay among genes of cellular processes. Multivariate techniques (like VAR(1) models describing the temporal and contemporaneous relations among variates) that may facilitate this goal are hampered by the high-dimensionality of the resulting data. This is resolved by the presented ridge regularized maximum likelihood estimation procedure for the VAR(1) model. Information on the absence of temporal and contemporaneous relations may be incorporated in this procedure. Its computationally efficient implementation is discussed. The estimation procedure is accompanied by an LOOCV scheme to determine the associated penalty parameters. Downstream exploitation of the estimated VAR(1) model is outlined: an empirical Bayes procedure to identify the interesting temporal and contemporaneous relationships, impulse response analysis, mutual information analysis, and covariance decomposition into the (graphical) relations among variates. In a simulation study the presented ridge estimation procedure outperformed a sparse competitor in terms of Frobenius loss of the estimates, while their selection properties are on par. The proposed machinery is illustrated in the reconstruction of the p53 signaling pathway during HPV-induced cellular transformation. The methodology is implemented in the ragt2ridges R-package available from CRAN.
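Setting aside the error-precision estimation, zero constraints, and LOOCV of the full procedure, the core ridge estimate of the VAR(1) autoregression matrix has a closed form; the dimensions and penalty below are illustrative.

```python
# A minimal sketch of ridge estimation of A in the VAR(1) model
# y_t = A y_{t-1} + e_t: the penalized least-squares solution is
# A_hat = Y1 Y0' (Y0 Y0' + lambda*I)^{-1}. This omits the paper's
# joint error-precision estimation; lambda = 5 is an assumption.
import numpy as np

rng = np.random.default_rng(7)
p, T = 50, 20                                   # more variates than time points
A_true = np.zeros((p, p))
A_true[0, 1] = 0.8                              # one true temporal relation
Y = np.zeros((p, T))
for t in range(1, T):
    Y[:, t] = A_true @ Y[:, t - 1] + rng.normal(size=p)

Y0, Y1 = Y[:, :-1], Y[:, 1:]                    # lagged and current slices
lam = 5.0
A_hat = Y1 @ Y0.T @ np.linalg.inv(Y0 @ Y0.T + lam * np.eye(p))
print("largest estimated entry:",
      np.unravel_index(np.abs(A_hat).argmax(), A_hat.shape))  # usually (0, 1)
```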

18.
We propose a flexible and identifiable version of the 2-groups model, motivated by hierarchical Bayes considerations, that features an empirical null and a semiparametric mixture model for the nonnull cases. We use a computationally efficient predictive recursion (PR) marginal likelihood procedure to estimate the model parameters, even the nonparametric mixing distribution. This leads to a nonparametric empirical Bayes testing procedure, which we call PRtest, based on thresholding the estimated local false discovery rates. Simulations and real data examples demonstrate that, compared to existing approaches, PRtest's careful handling of the nonnull density can give a much better fit in the tails of the mixture distribution which, in turn, can lead to more realistic conclusions.
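A minimal sketch of the predictive recursion update on a grid, followed by Efron-style local false discovery rate thresholding; the plug-in null proportion, weight sequence, grid, and 0.2 cutoff are illustrative simplifications of the PRtest procedure.

```python
# A minimal sketch of predictive recursion (PR) for a mixing density f(theta)
# with z_i ~ N(theta_i, 1):
#   f_i = (1 - w_i) f_{i-1} + w_i * k(z_i|.) f_{i-1} / m_i,  m_i = int k f.
# The PR marginal then yields local fdr estimates lfdr(z) = pi0*phi(z)/m_hat(z).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
z = np.r_[rng.normal(size=900), rng.normal(3.0, 1.0, size=100)]  # 10% nonnull

grid = np.linspace(-8, 8, 400)
dt = grid[1] - grid[0]
f = np.full(grid.size, 1 / (grid[-1] - grid[0]))   # flat initial density

for i, zi in enumerate(rng.permutation(z), start=1):
    k = norm.pdf(zi - grid)                        # kernel k(z_i | theta)
    m = np.sum(k * f) * dt                         # predictive density m_i
    w = (i + 1) ** -0.67                           # assumed weight sequence
    f = (1 - w) * f + w * k * f / m

m_hat = np.array([np.sum(norm.pdf(zj - grid) * f) * dt for zj in z])
pi0 = 0.9                                          # plug-in null proportion
lfdr = np.clip(pi0 * norm.pdf(z) / m_hat, 0, 1)
print("rejections at lfdr < 0.2:", np.sum(lfdr < 0.2))
```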

19.
Summary: An approximate method with explicit solutions to apply independent culling levels for multiple traits in n stages of selection was developed. An approximate solution was found for sequentially selected traits. Two assumptions were necessary. The first was to assume that subsequent selection would not appreciably change the mean of traits already selected, and the second was to approximate the variance of a correlated trait in a selected population with an upward biased projection. The procedure was shown to give near optimal results regardless of selection intensity or genetic correlations if phenotypic correlations among traits were low. The procedure gave poor results only for certain sequences of selection when phenotypic correlations were high. However, in those cases good results were obtained using a different sequence of selection. With high correlations, the procedure is recommended only after comparing solutions and expected genetic gain for all sequences of selection. If the expected aggregate gain for the sequence of selection desired is less than that of another order, culling points associated with the optimal ordering must be determined. Genetic gain from use of culling points is independent of order of selection. The procedure is recommended for use with computer programs that attempt to find optimal culling points to reduce computational time and to check results. Journal Paper No. 12448 of the Purdue University Agricultural Experiment Station.

20.
Volinsky CT, Raftery AE. Biometrics 2000, 56(1):256-262
We investigate the Bayesian Information Criterion (BIC) for variable selection in models for censored survival data. Kass and Wasserman (1995, Journal of the American Statistical Association 90, 928-934) showed that BIC provides a close approximation to the Bayes factor when a unit-information prior on the parameter space is used. We propose a revision of the penalty term in BIC so that it is defined in terms of the number of uncensored events instead of the number of observations. For a simple censored data model, this revision results in a better approximation to the exact Bayes factor based on a conjugate unit-information prior. In the Cox proportional hazards regression model, we propose defining BIC in terms of the maximized partial likelihood. Using the number of deaths rather than the number of individuals in the BIC penalty term corresponds to a more realistic prior on the parameter space and is shown to improve predictive performance for assessing stroke risk in the Cardiovascular Health Study.
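The proposed revision amounts to replacing the sample size n by the number of events d in the BIC penalty; a minimal sketch with illustrative partial log-likelihood values:

```python
# A minimal sketch of the event-based BIC for the Cox model:
# BIC = -2*logPL + k*log(d), with d the number of deaths/events.
# The logPL values and counts below are illustrative placeholders.
import numpy as np

def bic_cox(log_partial_lik, n_params, n_events):
    """Revised BIC from the maximized Cox partial likelihood."""
    return -2 * log_partial_lik + n_params * np.log(n_events)

# Two candidate models on data with n = 500 subjects but only d = 60 events:
print(bic_cox(-210.4, n_params=3, n_events=60))   # smaller model
print(bic_cox(-208.9, n_params=5, n_events=60))   # larger model
```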
