Similar articles
1.
MOTIVATION: Selecting a small number of relevant genes for accurate classification of samples is essential for the development of diagnostic tests. We present the Bayesian model averaging (BMA) method for gene selection and classification of microarray data. Typical gene selection and classification procedures ignore model uncertainty and use a single set of relevant genes (model) to predict the class. BMA accounts for the uncertainty about the best set to choose by averaging over multiple models (sets of potentially overlapping relevant genes). RESULTS: We have shown that BMA selects smaller numbers of relevant genes (compared with other methods) and achieves a high prediction accuracy on three microarray datasets. Our BMA algorithm is applicable to microarray datasets with any number of classes, and outputs posterior probabilities for the selected genes and models. Our selected models typically consist of only a few genes. The combination of high accuracy, small numbers of genes and posterior probabilities for the predictions should make BMA a powerful tool for developing diagnostics from expression data. AVAILABILITY: The source code and datasets used are available from our Supplementary website.
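
The averaging step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `bma_predict`, the three candidate models, and all probabilities are hypothetical, and the posterior model probabilities are assumed to have been computed already.

```python
def bma_predict(model_probs, posteriors):
    """Average per-model class probabilities, weighted by posterior model probability.

    model_probs: one list of class probabilities per candidate model (gene set).
    posteriors:  posterior probability of each model; renormalized here.
    """
    total = sum(posteriors)
    n_classes = len(model_probs[0])
    averaged = [0.0] * n_classes
    for probs, weight in zip(model_probs, posteriors):
        for c in range(n_classes):
            averaged[c] += (weight / total) * probs[c]
    return averaged


# Hypothetical example: three gene-set models predicting two classes.
model_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
posteriors = [0.5, 0.3, 0.2]
averaged = bma_predict(model_probs, posteriors)
```

A single-model procedure would report only the first row; the averaged prediction is pulled towards the other models in proportion to their posterior weight.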

2.

Background  

Light microscopy is of central importance in cell biology. The recent introduction of automated high content screening has expanded this technology towards automation of experiments and performing large scale perturbation assays. Nevertheless, evaluation of microscopy data continues to be a bottleneck in many projects. Currently, among open source software, CellProfiler and its extension Analyst are widely used in automated image processing. Although these tools have revolutionized image analysis in biology, some routine and many advanced tasks are either not supported or require programming skills from the researcher. This represents a significant obstacle in many biology laboratories.

3.
SUMMARY: Vbmp is an R package for Gaussian Process classification of data over multiple classes. It features multinomial probit regression with Gaussian Process priors and estimates class posterior probabilities employing fast variational approximations to the full posterior. This software also incorporates feature weighting by means of Automatic Relevance Determination. Being equipped with only one main function and reasonable default values for optional parameters, vbmp combines flexibility with ease of use, as demonstrated on a breast cancer microarray study. AVAILABILITY: The R library vbmp implementing this method is part of Bioconductor and can be downloaded from http://www.dcs.gla.ac.uk/~girolami

4.
5.
Using a measure of how differentially expressed a gene is in two biochemically/phenotypically different conditions, we can rank all genes in a microarray dataset. We have shown that the falling-off of this measure (normalized maximum likelihood in a classification model such as logistic regression) as a function of the rank is typically a power-law function. Power-law functions in similar ranked plots are known as Zipf's law, which is observed in many natural and social phenomena. The presence of this power-law function prevents an intrinsic cutoff point between the "important" genes and "irrelevant" genes. We have shown that similar power-law functions are also present in permuted datasets, and provide an explanation from the well-known chi-square distribution of likelihood ratios. We discuss the implications of Zipf's law for gene selection in microarray data analysis, as well as other characterizations of the ranked likelihood plots such as the rate of fall-off of the likelihood.
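
One way to check for a power-law fall-off of this kind is a straight-line fit in log-log space. The sketch below assumes the relationship score = c * rank^(-a). The function name `fit_power_law` and the synthetic scores are illustrative and do not come from the paper.

```python
import math


def fit_power_law(values):
    """Least-squares fit of log(value) = log(c) - a * log(rank).

    values must be positive and already sorted by rank (rank 1 first).
    Returns (a, c).
    """
    xs = [math.log(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic ranked scores that follow an exact power law (illustrative only).
scores = [10.0 * rank ** -0.8 for rank in range(1, 51)]
exponent, scale = fit_power_law(scores)
```

On real ranked likelihoods the points will scatter around the line, so the fitted exponent is best read as a summary of the rate of fall-off rather than as evidence of a cutoff.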

6.
7.

Background  

Normalization of gene expression microarrays carrying thousands of genes is based on assumptions that do not hold for diagnostic microarrays carrying only a few genes. Thus, applying standard microarray normalization strategies to diagnostic microarrays causes new normalization problems.

8.
Plant-leaf disease detection is one of the key problems of smart agriculture, which has a significant impact on the global economy. To mitigate this, intelligent agricultural solutions are evolving that help farmers take preventive measures to improve crop production. With the advancement of deep learning, many convolutional neural network models have been applied to the identification of plant-leaf diseases. However, these models are limited to the detection of specific crops only. Therefore, this paper presents a new deeper lightweight convolutional neural network architecture (DLMC-Net) to perform plant leaf disease detection across multiple crops for real-time agricultural applications. In the proposed model, a sequence of collective blocks is introduced along with the passage layer to extract deep features. This design benefits feature propagation and feature reuse, which helps handle the vanishing gradient problem. Moreover, point-wise and separable convolution blocks are employed to reduce the number of trainable parameters. The efficacy of the proposed DLMC-Net model is validated across four publicly available datasets, namely citrus, cucumber, grapes, and tomato. Experimental results of the proposed model are compared against seven state-of-the-art models on eight parameters, namely accuracy, error, precision, recall, sensitivity, specificity, F1-score, and Matthews correlation coefficient. Experiments demonstrate that the proposed model has surpassed all the considered models, even under complex background conditions, with an accuracy of 93.56%, 92.34%, 99.50%, and 96.56% on citrus, cucumber, grapes, and tomato, respectively. Moreover, the proposed DLMC-Net requires only 6.4 million trainable parameters, which is the second best among the compared models. Therefore, it can be asserted that the proposed model is a viable alternative to perform plant leaf disease detection across multiple crops.
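
The parameter saving from separable convolutions can be seen with simple arithmetic. The sketch below uses the standard weight counts for a k x k convolution and for a depthwise convolution followed by a point-wise (1 x 1) convolution, ignoring biases. The layer sizes are arbitrary illustrations, not DLMC-Net's actual configuration.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution plus a 1x1 point-wise convolution."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)
separable = separable_conv_params(3, 64, 128)
```

For this example layer the separable form needs 8,768 weights instead of 73,728, roughly an eightfold reduction.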

9.
10.
Purpose: This study aims to develop a deep-learning-based method to classify clinically significant (CS) and clinically insignificant (CiS) prostate cancer (PCa) on multiparametric magnetic resonance imaging (mpMRI) automatically, and to select suitable mpMRI sequences for PCa classification in different anatomic zones. Methods: A multi-input selection network (MISN) is proposed for both PCa classification and the selection of the optimal combination of sequences for PCa classification in a specific zone. MISN is a multi-input/-output classification network consisting of nine branches to process nine input images from the mpMRI data. To improve classification accuracy and reduce model parameters, a pruning strategy is proposed to select a subset of the nine branches of MISN to form two more effective networks for the peripheral zone (PZ) PCa and transition zone (TZ) PCa, which are named PZN and TZN, respectively. In addition, a new penalized cross-entropy loss function is adopted to train the networks to balance the classification sensitivity and specificity. Results: The proposed methods were evaluated on the PROSTATEx challenge dataset and achieved an area under the receiver operating characteristic curve of 0.95, which was much higher than currently published results and ranked first out of more than 1500 entries submitted to the challenge at the time of submission of this paper. For PZ-PCa and TZ-PCa classification, PZN and TZN achieved better performance than MISN. Conclusions: Higher performance can be achieved by selecting a suitable subset of the mpMRI sequences in PCa classification.

11.
12.
We aim to find the smallest set of genes that can ensure highly accurate classification of cancers from microarray data by using supervised machine learning algorithms. The significance of finding the minimum gene subsets is three-fold: 1) it greatly reduces the computational burden and "noise" arising from irrelevant genes (in the examples studied in this paper, finding the minimum gene subsets even allows for extraction of simple diagnostic rules which lead to accurate diagnosis without the need for any classifiers); 2) it simplifies gene expression tests to include only a very small number of genes rather than thousands of genes, which can bring down the cost for cancer testing significantly; 3) it calls for further investigation into the possible biological relationship between these small numbers of genes and cancer development and treatment. Our simple yet very effective method involves two steps. In the first step, we choose some important genes using a feature importance ranking scheme. In the second step, we test the classification capability of all simple combinations of those important genes by using a good classifier. For three "small" and "simple" data sets with two, three, and four cancer (sub)types, our approach obtained very high accuracy with only two or three genes. For a "large" and "complex" data set with 14 cancer types, we divided the whole problem into a group of binary classification problems and applied the 2-step approach to each of these binary classification problems. Through this "divide-and-conquer" approach, we obtained accuracy comparable to previously reported results but with only 28 genes rather than 16,063 genes. In general, our method can significantly reduce the number of genes required for highly reliable diagnosis.
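
The two-step procedure can be sketched as follows. The abstract does not name the ranking scheme or the classifier, so this sketch substitutes a two-class signal-to-noise score for step one and a leave-one-out nearest-centroid classifier for step two. All function names, the parameters `top_k` and `max_size`, and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def rank_genes(X, y):
    """Step 1: rank gene indices by a two-class signal-to-noise score (assumed scheme)."""
    first, second = sorted(set(y))
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == first]
        b = [row[g] for row, label in zip(X, y) if label == second]
        spread = pstdev(a) + pstdev(b) + 1e-12  # avoid division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)


def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on the chosen genes."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            centroids[label] = [mean(r[g] for r in rows) for g in genes]

        def dist(label):
            return sum((X[i][g] - c) ** 2 for g, c in zip(genes, centroids[label]))

        correct += min(centroids, key=dist) == y[i]
    return correct / len(X)


def smallest_gene_subset(X, y, top_k=4, max_size=3):
    """Step 2: test all combinations of the top-ranked genes, smallest subsets first."""
    top = rank_genes(X, y)[:top_k]
    best = (0.0, ())
    for size in range(1, max_size + 1):
        for subset in combinations(top, size):
            acc = loo_accuracy(X, y, subset)
            if acc > best[0]:
                best = (acc, subset)
        if best[0] == 1.0:  # stop once a perfect subset of this size is found
            break
    return best


# Toy data: 6 samples x 3 genes; only gene 1 separates the two classes.
X = [[5.0, 1.0, 3.0], [4.0, 1.2, 7.0], [6.0, 0.9, 5.0],
     [5.5, 3.0, 4.0], [4.5, 3.2, 6.0], [5.2, 2.9, 5.5]]
y = [0, 0, 0, 1, 1, 1]
best = smallest_gene_subset(X, y)
```

Searching subsets in order of increasing size means the first perfect result is also one of the smallest, which matches the stated goal of a minimum gene subset.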

13.
Introduction: Breast cancer subtypes are currently defined by a combination of morphologic, genomic, and proteomic characteristics. These subtypes provide a molecular portrait of the tumor that aids diagnosis, prognosis, and treatment escalation/de-escalation options. Gene expression signatures describing intrinsic breast cancer subtypes for predicting risk of recurrence have been rapidly adopted in the clinic. Despite the use of subtype classifications, many patients develop drug resistance, breast cancer recurrence, or therapy failure.

Areas covered: This review provides a summary of immunohistochemistry, reverse phase protein array, mass spectrometry, and integrative studies that are revealing differences in biological functions within and between breast cancer subtypes. We conclude with a discussion of rigor and reproducibility for proteomic-based biomarker discovery.

Expert commentary: Innovations in proteomics, including implementation of assay guidelines and standards, are facilitating refinement of breast cancer subtypes. Proteomic and phosphoproteomic information distinguish biologically functional subtypes, are predictive of recurrence, and indicate likelihood of drug resistance. Actionable, activated signal transduction pathways can now be quantified and characterized. Proteomic biomarker validation in large, well-designed studies should become a public health priority to capitalize on the wealth of information gleaned from the proteome.


14.
Conotoxins are disulfide-rich small peptides that target a broad spectrum of ion channels and neuronal receptors. They offer promising avenues in the treatment of chronic pain, epilepsy and cardiovascular diseases. Assignment of newly sequenced mature conotoxins into appropriate superfamilies using a computational approach could provide valuable preliminary information on the biological and pharmacological functions of the toxins. However, creation of protein sequence patterns for the reliable identification and classification of new conotoxin sequences may not be effective due to the hypervariability of mature toxins. With the aim of formulating an in silico approach for the classification of conotoxins into superfamilies, we have incorporated the concept of pseudo-amino acid composition to represent a peptide in a mathematical framework that includes the sequence-order effect along with conventional amino acid composition. The polarity index attribute, which encodes information such as residue surface buriability, polarity, and hydropathy, was used to store the sequence-order effect. Several methods, including BLAST, the ISort (Intimate Sorting) predictor, the least Hamming distance algorithm, the least Euclidean distance algorithm and multi-class support vector machines (SVMs), were explored for superfamily identification. The SVMs outperformed the other methods, providing an overall accuracy of 88.1% for all correct predictions with a generalized squared correlation of 0.75 using a jackknife cross-validation test for the A, M, O and T superfamilies and a negative set consisting of short cysteine-rich sequences from different eukaryotes having diverse functions. The computed sensitivity and specificity for the superfamilies were found to be in the range of 84.0-94.1% and 80.0-95.5%, respectively, attesting to the efficacy of multi-class SVMs for the successful in silico classification of the conotoxins into their superfamilies.

15.
By using chromosome images as a framework, algorithms for finding the most dissimilar images are presented and illustrated by examples. In terms of angles, a chromosome image consists of two exterior biangles and two interior biangles. Biangles are defined and classified into 180° biangles, >180° biangles and <180° biangles. The dissimilarity of biangles and its geometric interpretation together with various properties of biangles are also presented. The results may have useful applications in pattern recognition, scene analysis, information storage and retrieval, artificial intelligence and fuzzy set theory.

16.
Genes are often classified into biologically related groups so that inferences on their functions can be made. This paper demonstrates that di-codon usage is a useful feature for gene classification and gives better classification accuracy than codon usage. Our experiments with different classifiers show that support vector machines perform better than other classifiers in classifying genes by using di-codon usage as features. The method is illustrated on 1841 HLA sequences which are classified into two major classes, HLA-I and HLA-II, and further classified into the subclasses of the major classes. By using both codon and di-codon features, we show near-perfect accuracies in the classification of HLA molecules into major classes and their subclasses.
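
A di-codon usage feature vector can be computed along the following lines. This sketch assumes a di-codon is a pair of adjacent in-frame codons, counted with a sliding window of one codon. The abstract does not spell out the exact definition, and the function name `dicodon_usage` and the example sequence are illustrative.

```python
from collections import Counter


def dicodon_usage(seq):
    """Relative frequency of each pair of adjacent in-frame codons.

    Assumes the sequence starts in frame; trailing bases that do not
    complete a codon are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    pairs = Counter(a + b for a, b in zip(codons, codons[1:]))
    total = sum(pairs.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in pairs.items()}


# Illustrative sequence: ATG GCC ATG GCC TAA
usage = dicodon_usage("ATGGCCATGGCCTAA")
```

Each sequence then maps to a sparse vector over the 4096 possible di-codons, which can be passed to a classifier such as a support vector machine.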

17.
18.
MOTIVATION: The increasing use of DNA microarray-based tumor gene expression profiles for cancer diagnosis requires mathematical methods with high accuracy for solving clustering, feature selection and classification problems of gene expression data. RESULTS: New algorithms are developed for solving clustering, feature selection and classification problems of gene expression data. The clustering algorithm is based on optimization techniques and allows the calculation of clusters step-by-step. This approach allows us to find as many clusters as a data set contains with respect to some tolerance. Feature selection is crucial for a gene expression database. Our feature selection algorithm is based on calculating overlaps of different genes. The database used contains over 16,000 genes, and this number is considerably reduced by feature selection. We propose a classification algorithm where each tissue sample is considered as the center of a cluster which is a ball. The results of numerical experiments confirm that the classification algorithm in combination with the feature selection algorithm performs slightly better than the published results for multi-class classifiers based on support vector machines for this data set. AVAILABILITY: Available on request from the authors.

19.
20.
A classification tool suitable for establishing the ecological status of lakes based on fish population parameters has been developed for the Republic of Ireland and Northern Ireland (EU Water Framework Directive Ecoregion 17). A lake typology relevant to fish populations in lakes from Ecoregion 17 was produced as part of the ecological classification tool development. Four lake types were determined based on fish metrics and abiotic variables from 43 “reference” lakes. The specific lake fish typology categorised lakes into low (≤67 CaCO3 mg L−1) or high (>67 CaCO3 mg L−1) alkalinity, and shallow (≤17 m) or deep (>17 m) maximum depth. The fish in lakes classification tool (FIL2) follows a novel multimetric predictive approach, assigning ecological status to a lake using two independent methods. FIL2 qualitatively defines a lake's ecological status based on fish metrics using discriminant classification rules and, using a generalised linear model, quantitatively derives an Ecological Quality Ratio (EQR, 0 < EQR < 1), along with associated confidence intervals. It is recommended that both methods are used to validate output and cross-check and highlight potential misclassification.
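
The four-way lake typology follows directly from the two thresholds quoted above. The sketch below encodes only those thresholds. The function name `lake_type` and the example lakes are hypothetical, and the FIL2 status classification itself is not reproduced.

```python
def lake_type(alkalinity_mg_l, max_depth_m):
    """Assign one of the four lake types from the stated thresholds.

    alkalinity_mg_l: alkalinity in mg CaCO3 per litre (low if <= 67).
    max_depth_m:     maximum depth in metres (shallow if <= 17).
    """
    alkalinity = "low" if alkalinity_mg_l <= 67 else "high"
    depth = "shallow" if max_depth_m <= 17 else "deep"
    return f"{alkalinity} alkalinity, {depth}"


# Hypothetical lakes, including one exactly on both boundaries.
upland_lake = lake_type(12.0, 35.0)
boundary_lake = lake_type(67.0, 17.0)
lowland_lake = lake_type(120.0, 6.5)
```

Values exactly on a threshold fall into the low and shallow categories, matching the "≤" in the typology.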
