Similar articles
Found 20 similar articles (search time: 15 ms)
1.
MOTIVATION: Selecting a small number of relevant genes for accurate classification of samples is essential for the development of diagnostic tests. We present the Bayesian model averaging (BMA) method for gene selection and classification of microarray data. Typical gene selection and classification procedures ignore model uncertainty and use a single set of relevant genes (model) to predict the class. BMA accounts for the uncertainty about the best set to choose by averaging over multiple models (sets of potentially overlapping relevant genes). RESULTS: We have shown that BMA selects smaller numbers of relevant genes (compared with other methods) and achieves a high prediction accuracy on three microarray datasets. Our BMA algorithm is applicable to microarray datasets with any number of classes, and outputs posterior probabilities for the selected genes and models. Our selected models typically consist of only a few genes. The combination of high accuracy, small numbers of genes and posterior probabilities for the predictions should make BMA a powerful tool for developing diagnostics from expression data. AVAILABILITY: The source code and datasets used are available from our Supplementary website.
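The averaging step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the gene names, posterior model probabilities and per-model predictions below are invented for illustration, and the sketch assumes the posterior model probabilities have already been computed by some other procedure.

```python
# Minimal sketch of Bayesian model averaging over gene sets.
# ASSUMPTION: each "model" is a dict holding a gene set, a posterior model
# probability, and a function returning P(class = 1 | x, model).
# All names and numbers here are hypothetical.

def bma_predict(models, x):
    """Average per-model class probabilities, weighted by posterior model probability."""
    total = sum(m["posterior"] for m in models)
    return sum(m["posterior"] / total * m["predict"](x) for m in models)


def inclusion_probabilities(models):
    """Posterior probability of each gene: sum over the models that contain it."""
    total = sum(m["posterior"] for m in models)
    probs = {}
    for m in models:
        for gene in m["genes"]:
            probs[gene] = probs.get(gene, 0.0) + m["posterior"] / total
    return probs


# Three overlapping gene sets with illustrative posterior probabilities.
example_models = [
    {"genes": ("g1", "g2"), "posterior": 0.5, "predict": lambda x: 0.9},
    {"genes": ("g1",), "posterior": 0.3, "predict": lambda x: 0.6},
    {"genes": ("g3",), "posterior": 0.2, "predict": lambda x: 0.2},
]

averaged = bma_predict(example_models, x=None)
gene_probs = inclusion_probabilities(example_models)
```

A gene that appears in several plausible models accumulates probability from each of them, which is how overlapping gene sets translate into per-gene posterior probabilities.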

2.
3.

Background  

Light microscopy is of central importance in cell biology. The recent introduction of automated high content screening has expanded this technology towards automation of experiments and performing large scale perturbation assays. Nevertheless, evaluation of microscopy data continues to be a bottleneck in many projects. Currently, among open source software, CellProfiler and its extension Analyst are widely used in automated image processing. Although these tools have revolutionized image analysis in current biology, some routine and many advanced tasks are either not supported or require programming skills on the part of the researcher. This represents a significant obstacle in many biology laboratories.

4.
Using the concept of an extended data set (Zellner, 1986), we derived the projection or hat matrix for Bayesian regression analysis. The hat matrix shows how much influence or leverage the observed responses and the prior means have on each of the posterior fitted values. The amount of leverage associated with the observed data is shown to be a monotonically decreasing function of the ratio of the process variance to the prior variance. Additional properties of the Bayesian hat matrix are discussed. Two illustrative examples are presented.
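The monotonic relationship stated in this abstract can be checked numerically in a deliberately simplified setting. The sketch below is not the paper's general derivation. It assumes a single predictor with no intercept, a known process variance, and a normal prior on the slope. Under those assumptions the posterior fitted value is x_i (Σ x_j y_j + λ μ0) / (Σ x_j² + λ), where λ is the ratio of process variance to prior variance, so the leverage of observation i on its own fitted value is x_i² / (Σ x_j² + λ).

```python
# Sketch of data leverage in one-predictor Bayesian regression.
# ASSUMPTIONS: no intercept, known process variance, normal prior on the slope.
# Function names and the data values are hypothetical.

def data_leverages(x, process_var, prior_var):
    """Diagonal of the data part of the hat matrix: x_i^2 / (sum(x^2) + lambda)."""
    lam = process_var / prior_var
    s = sum(xi * xi for xi in x)
    return [xi * xi / (s + lam) for xi in x]


def prior_leverages(x, process_var, prior_var):
    """Weight of the prior mean in each fitted value: x_i * lambda / (sum(x^2) + lambda)."""
    lam = process_var / prior_var
    s = sum(xi * xi for xi in x)
    return [xi * lam / (s + lam) for xi in x]


x = [1.0, 2.0, 3.0]
# Total data leverage for increasing variance ratios 0.1, 1 and 10.
totals = [sum(data_leverages(x, ratio, 1.0)) for ratio in (0.1, 1.0, 10.0)]
```

As the variance ratio grows, the prior is trusted more relative to the data, and the total data leverage in `totals` shrinks accordingly.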

5.
6.
Plant-leaf disease detection is one of the key problems of smart agriculture and has a significant impact on the global economy. To mitigate this, intelligent agricultural solutions are evolving that help farmers take preventive measures to improve crop production. With the advancement of deep learning, many convolutional neural network models have been applied to the identification of plant-leaf diseases. However, these models are limited to specific crops. Therefore, this paper presents a new deeper lightweight convolutional neural network architecture (DLMC-Net) to perform plant leaf disease detection across multiple crops for real-time agricultural applications. In the proposed model, a sequence of collective blocks is introduced along with the passage layer to extract deep features. This benefits feature propagation and feature reuse, which helps handle the vanishing gradient problem. Moreover, point-wise and separable convolution blocks are employed to reduce the number of trainable parameters. The efficacy of the proposed DLMC-Net model is validated across four publicly available datasets, namely citrus, cucumber, grapes, and tomato. Experimental results of the proposed model are compared against seven state-of-the-art models on eight metrics, namely accuracy, error, precision, recall, sensitivity, specificity, F1-score, and Matthews correlation coefficient. Experiments demonstrate that the proposed model has surpassed all the considered models, even under complex background conditions, with an accuracy of 93.56%, 92.34%, 99.50%, and 96.56% on citrus, cucumber, grapes, and tomato, respectively. Moreover, the proposed DLMC-Net requires only 6.4 million trainable parameters, which is the second best among the compared models. Therefore, it can be asserted that the proposed model is a viable alternative to perform plant leaf disease detection across multiple crops.
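The parameter saving from separable convolutions mentioned in this abstract follows from simple arithmetic. The sketch below counts weights for a standard convolution and for a depthwise convolution followed by a point-wise (1x1) convolution. The layer sizes are illustrative assumptions and are not taken from DLMC-Net, and bias terms are ignored.

```python
# Weight counts for a standard vs. a depthwise-separable convolution layer.
# ASSUMPTION: the kernel size and channel counts are illustrative only and do
# not describe the DLMC-Net architecture; biases are ignored.

def standard_conv_params(kernel, c_in, c_out):
    """Each output channel has its own kernel x kernel filter over all input channels."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Depthwise step (one kernel x kernel filter per input channel) plus 1x1 point-wise step."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


standard = standard_conv_params(3, 64, 128)
separable = separable_conv_params(3, 64, 128)
reduction = standard / separable
```

For this hypothetical 3x3 layer with 64 input and 128 output channels, the separable form needs roughly one eighth of the weights of the standard form.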

7.
8.

Background  

Gene expression microarray is a powerful technology for genetic profiling of diseases and their associated treatments. Such profiling involves a key step, biomarker identification, in which the identified genes are expected to be closely related to the disease. A most important use of these identified genes is to construct a classifier that can effectively diagnose disease and even recognize disease subtypes. Binary classification (for example, diseased versus healthy) in microarray data analysis has been successful, while multi-class classification, such as cancer subtyping, remains challenging.

9.
If a dependent variable in a regression analysis is exceptionally expensive or hard to obtain, the overall sample size used to fit the model may be limited. To avoid this, one may use a cheaper or more easily collected “surrogate” variable to supplement the expensive variable. The regression analysis will be enhanced to the degree that the surrogate is associated with the costly dependent variable. We develop a Bayesian approach incorporating surrogate variables in regression based on a two‐stage experiment. Illustrative examples are given, along with comparisons to an existing frequentist method. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

10.
Daniel Gianola 《Genetics》2013,194(3):573-596
Whole-genome enabled prediction of complex traits has received enormous attention in animal and plant breeding and is making inroads into human and even Drosophila genetics. The term “Bayesian alphabet” denotes a growing number of letters of the alphabet used to denote various Bayesian linear regressions that differ in the priors adopted, while sharing the same sampling model. We explore the role of the prior distribution in whole-genome regression models for dissecting complex traits in what is now a standard situation with genomic data where the number of unknown parameters (p) typically exceeds sample size (n). Members of the alphabet aim to confront this overparameterization in various manners, but it is shown here that the prior is always influential, unless n ≫ p. This happens because parameters are not likelihood identified, so Bayesian learning is imperfect. Since inferences are not devoid of the influence of the prior, claims about genetic architecture from these methods should be taken with caution. However, all such procedures may deliver reasonable predictions of complex traits, provided that some parameters (“tuning knobs”) are assessed via a properly conducted cross-validation. It is concluded that members of the alphabet have a place in whole-genome prediction of phenotypes, but have somewhat doubtful inferential value, at least when sample size is such that n ≪ p.

11.
Summary: Physical activity has many well‐documented health benefits for cardiovascular fitness and weight control. For pregnant women, the American College of Obstetricians and Gynecologists currently recommends 30 minutes of moderate exercise on most, if not all, days; however, very few pregnant women achieve this level of activity. Traditionally, studies have focused on examining individual or interpersonal factors to identify predictors of physical activity. There is a renewed interest in whether characteristics of the physical environment in which we live and work may also influence physical activity levels. We consider one of the first studies of pregnant women that examines the impact of characteristics of the built environment on physical activity levels. Using a socioecologic framework, we study the associations between physical activity and several factors including personal characteristics, meteorological/air quality variables, and neighborhood characteristics for pregnant women in four counties of North Carolina. We simultaneously analyze six types of physical activity and investigate cross‐dependencies between these activity types. Exploratory analysis suggests that the associations are different in different regions. Therefore, we use a multivariate regression model with spatially varying regression coefficients. This model includes a regression parameter for each covariate at each spatial location. For our data with many predictors, some form of dimension reduction is clearly needed. We introduce a Bayesian variable selection procedure to identify subsets of important variables. Our stochastic search algorithm determines the probabilities that each covariate's effect is null, non‐null but constant across space, and spatially varying. We found that individual‐level covariates had a greater influence on women's activity levels than neighborhood environmental characteristics, and some individual‐level covariates had spatially varying associations with the activity levels of pregnant women.

12.
Summary: In studies involving functional data, it is commonly of interest to model the impact of predictors on the distribution of the curves, allowing flexible effects on not only the mean curve but also the distribution about the mean. Characterizing the curve for each subject as a linear combination of a high‐dimensional set of potential basis functions, we place a sparse latent factor regression model on the basis coefficients. We induce basis selection by choosing a shrinkage prior that allows many of the loadings to be close to zero. The number of latent factors is treated as unknown through a highly‐efficient, adaptive‐blocked Gibbs sampler. Predictors are included at the latent variable level, while allowing different predictors to impact different latent factors. This model induces a framework for functional response regression in which the distribution of the curves is allowed to change flexibly with predictors. The performance is assessed through simulation studies and the methods are applied to data on blood pressure trajectories during pregnancy.

13.
We investigate the general problem of signal classification and, in particular, that of assigning stimulus labels to neural spike trains recorded from single cortical neurons. Finding efficient ways of classifying neural responses is especially important in experiments involving rapid presentation of stimuli. We introduce a fast, exact alternative to Bayesian classification. Instead of estimating the class-conditional densities p(x|y) (where x is a scalar function of the feature[s], y the class label) and converting them to P(y|x) via Bayes’ theorem, this probability is evaluated directly and without the need for approximations. This is achieved by integrating over all possible binnings of x with an upper limit on the number of bins. Computational time is quadratic in both the number of observed data points and the number of bins. The algorithm also allows for the computation of feedback signals, which can be used as input to subsequent stages of inference, e.g. neural network training. Responses of single neurons from high-level visual cortex (area STSa) to rapid sequences of complex visual stimuli are analysed. Information latency and response duration increase nonlinearly with presentation duration, suggesting that neural processing speeds adapt to presentation speeds. Action Editor: Alexander Borst

14.
15.
Conotoxins are disulfide-rich small peptides that target a broad spectrum of ion-channels and neuronal receptors. They offer promising avenues in the treatment of chronic pain, epilepsy and cardiovascular diseases. Assignment of newly sequenced mature conotoxins into appropriate superfamilies using a computational approach could provide valuable preliminary information on the biological and pharmacological functions of the toxins. However, creation of protein sequence patterns for the reliable identification and classification of new conotoxin sequences may not be effective due to the hypervariability of mature toxins. With the aim of formulating an in silico approach for the classification of conotoxins into superfamilies, we have incorporated the concept of pseudo-amino acid composition to represent a peptide in a mathematical framework that includes the sequence-order effect along with conventional amino acid composition. The polarity index attribute, which encodes information such as residue surface buriability, polarity, and hydropathy, was used to store the sequence-order effect. Several methods, including BLAST, the ISort (Intimate Sorting) predictor, the least Hamming distance algorithm, the least Euclidean distance algorithm and multi-class support vector machines (SVMs), were explored for superfamily identification. The SVMs outperform the other methods, providing an overall accuracy of 88.1% for all correct predictions with a generalized squared correlation of 0.75 using a jackknife cross-validation test for the A, M, O and T superfamilies and a negative set consisting of short cysteine-rich sequences from different eukaryotes having diverse functions. The computed sensitivity and specificity for the superfamilies were found to be in the range of 84.0-94.1% and 80.0-95.5%, respectively, attesting to the efficacy of multi-class SVMs for the successful in silico classification of the conotoxins into their superfamilies.

16.
Ying Yuan  Guosheng Yin 《Biometrics》2010,66(1):105-114
Summary: We study quantile regression (QR) for longitudinal measurements with nonignorable intermittent missing data and dropout. Compared to conventional mean regression, quantile regression can characterize the entire conditional distribution of the outcome variable, and is more robust to outliers and misspecification of the error distribution. We account for the within-subject correlation by introducing an ℓ2 penalty in the usual QR check function to shrink the subject-specific intercepts and slopes toward the common population values. The informative missing data are assumed to be related to the longitudinal outcome process through the shared latent random effects. We assess the performance of the proposed method using simulation studies, and illustrate it with data from a pediatric AIDS clinical trial.
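The penalized check function named in this abstract can be written down in a few lines. The sketch below shows only the complete-data objective and leaves out the missing-data model entirely. The exact parameterization is an assumption made for illustration (subject-specific deviations from a population intercept and slope, with a single tuning constant `lam`), not the authors' formulation.

```python
# Sketch of an l2-penalized quantile regression objective for longitudinal data.
# ASSUMPTIONS: subject effects are stored as deviations (da, db) from the
# population intercept and slope; `lam` is a hypothetical tuning constant;
# the missing-data mechanism from the abstract is not modelled here.

def check_loss(u, tau):
    """Standard QR check function: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def penalized_objective(data, pop, subj, tau, lam):
    """data: {subject: [(t, y), ...]}, pop: (a, b), subj: {subject: (da, db)}."""
    a, b = pop
    loss = 0.0
    for subject, observations in data.items():
        da, db = subj[subject]
        for t, y in observations:
            residual = y - (a + da) - (b + db) * t
            loss += check_loss(residual, tau)
    # The l2 penalty shrinks subject-specific deviations toward zero, that is,
    # subject intercepts and slopes toward the common population values.
    penalty = lam * sum(da * da + db * db for da, db in subj.values())
    return loss + penalty


toy_data = {"s1": [(0.0, 1.0), (1.0, 3.0)]}
at_population = penalized_objective(toy_data, (1.0, 2.0), {"s1": (0.0, 0.0)}, 0.5, 2.0)
shifted = penalized_objective(toy_data, (1.0, 2.0), {"s1": (0.5, 0.0)}, 0.5, 2.0)
```

Larger values of `lam` pull every subject toward the population line, while `lam = 0` lets each subject be fitted separately.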

17.

Background  

Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data analysis, little attention has been paid to uncertainty in the results obtained.

18.
19.
MOTIVATION: Accurate subcategorization of tumour types through gene-expression profiling requires analytical techniques that estimate the number of categories or clusters rigorously and reliably. Parametric mixture modelling provides a natural setting to address this problem. RESULTS: We compare a criterion for model selection that is derived from a variational Bayesian framework with a popular alternative based on the Bayesian information criterion. Using simulated data, we show that the variational Bayesian method is more accurate in finding the true number of clusters in situations that are relevant to current and future microarray studies. We also compare the two criteria using freely available tumour microarray datasets and show that the variational Bayesian method is more sensitive to capturing biologically relevant structure.

20.