Similar literature
20 similar documents found
1.
MOTIVATION: Selecting a small number of relevant genes for accurate classification of samples is essential for the development of diagnostic tests. We present the Bayesian model averaging (BMA) method for gene selection and classification of microarray data. Typical gene selection and classification procedures ignore model uncertainty and use a single set of relevant genes (model) to predict the class. BMA accounts for the uncertainty about the best set to choose by averaging over multiple models (sets of potentially overlapping relevant genes). RESULTS: We have shown that BMA selects smaller numbers of relevant genes (compared with other methods) and achieves a high prediction accuracy on three microarray datasets. Our BMA algorithm is applicable to microarray datasets with any number of classes, and outputs posterior probabilities for the selected genes and models. Our selected models typically consist of only a few genes. The combination of high accuracy, small numbers of genes and posterior probabilities for the predictions should make BMA a powerful tool for developing diagnostics from expression data. AVAILABILITY: The source code and datasets used are available from our Supplementary website.
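
The averaging step described above can be sketched as follows. This is a minimal illustration only: the gene names, posterior model probabilities and per-model class probabilities are made-up numbers, and the model search and posterior computation used by the authors are not shown.

```python
from collections import defaultdict

def bma_average(models):
    """Average class probabilities over models, weighted by posterior model probability.

    Each model is a dict with hypothetical keys:
      "genes"     - the set of genes the model uses
      "posterior" - posterior probability of the model (need not be normalized)
      "p_class1"  - the model's predicted probability of class 1 for one sample
    """
    total = sum(m["posterior"] for m in models)
    # Model-averaged prediction for the sample.
    averaged = sum(m["posterior"] * m["p_class1"] for m in models) / total
    # Posterior probability of each gene: summed weight of the models containing it.
    inclusion = defaultdict(float)
    for m in models:
        for gene in m["genes"]:
            inclusion[gene] += m["posterior"] / total
    return averaged, dict(inclusion)

# Made-up example: three overlapping models for a single test sample.
models = [
    {"genes": {"g1", "g2"}, "posterior": 0.5, "p_class1": 0.9},
    {"genes": {"g1"}, "posterior": 0.3, "p_class1": 0.8},
    {"genes": {"g3"}, "posterior": 0.2, "p_class1": 0.4},
]
averaged, inclusion = bma_average(models)
```

A gene that appears in several well-supported models ends up with a high posterior probability even if no single model is clearly best, which is the sense in which averaging accounts for model uncertainty.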

2.

Background  

Light microscopy is of central importance in cell biology. The recent introduction of automated high content screening has extended this technology to automated experiments and large-scale perturbation assays. Nevertheless, evaluation of microscopy data continues to be a bottleneck in many projects. Currently, among open source software, CellProfiler and its extension Analyst are widely used in automated image processing. Although these tools have revolutionized image analysis in current biology, some routine and many advanced tasks are either not supported or require programming skills on the part of the researcher. This represents a significant obstacle in many biology laboratories.

3.
SUMMARY: Vbmp is an R package for Gaussian Process classification of data over multiple classes. It features multinomial probit regression with Gaussian Process priors and estimates class posterior probabilities employing fast variational approximations to the full posterior. This software also incorporates feature weighting by means of Automatic Relevance Determination. Being equipped with only one main function and reasonable default values for optional parameters, vbmp combines flexibility with ease of use, as demonstrated on a breast cancer microarray study. AVAILABILITY: The R library vbmp implementing this method is part of Bioconductor and can be downloaded from http://www.dcs.gla.ac.uk/~girolami

4.
5.
Using a measure of how differentially expressed a gene is in two biochemically/phenotypically different conditions, we can rank all genes in a microarray dataset. We have shown that the falling-off of this measure (normalized maximum likelihood in a classification model such as logistic regression) as a function of the rank is typically a power-law function. In other, similar ranked plots, this power-law function is known as Zipf's law, observed in many natural and social phenomena. The presence of this power-law function rules out an intrinsic cutoff point between the "important" genes and "irrelevant" genes. We have shown that similar power-law functions are also present in permuted datasets, and provide an explanation from the well-known chi-squared distribution of likelihood ratios. We discuss the implication of this Zipf's law on gene selection in a microarray data analysis, as well as other characterizations of the ranked likelihood plots such as the rate of fall-off of the likelihood.
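
A power-law fall-off of a ranked measure can be checked with a straight-line fit in log-log coordinates. The sketch below assumes positive per-gene scores and uses synthetic scores that follow an exact power law; it is not the likelihood measure from the abstract, and the function name is hypothetical.

```python
import math

def fit_power_law(scores):
    """Fit log(score) = intercept + slope * log(rank) by ordinary least squares.

    Scores must be positive; they are ranked from largest (rank 1) to smallest.
    """
    ranked = sorted(scores, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(score) for score in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic scores: score = 10 * rank^(-0.8) for 200 hypothetical genes.
scores = [10.0 * rank ** -0.8 for rank in range(1, 201)]
slope, intercept = fit_power_law(scores)
```

On real data the fitted slope gives the rate of fall-off, and a good linear fit over the whole rank range is what makes any cutoff between "important" and "irrelevant" genes arbitrary.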

6.
7.

Background  

Normalization of gene expression microarrays carrying thousands of genes is based on assumptions that do not hold for diagnostic microarrays carrying only a few genes. Thus, applying standard microarray normalization strategies to diagnostic microarrays causes new normalization problems.

8.
9.
10.
We aim at finding the smallest set of genes that can ensure highly accurate classification of cancers from microarray data by using supervised machine learning algorithms. The significance of finding the minimum gene subsets is three-fold: 1) It greatly reduces the computational burden and "noise" arising from irrelevant genes; in the examples studied in this paper, finding the minimum gene subsets even allows for extraction of simple diagnostic rules which lead to accurate diagnosis without the need for any classifiers. 2) It simplifies gene expression tests to include only a very small number of genes rather than thousands of genes, which can bring down the cost for cancer testing significantly. 3) It calls for further investigation into the possible biological relationship between these small numbers of genes and cancer development and treatment. Our simple yet very effective method involves two steps. In the first step, we choose some important genes using a feature importance ranking scheme. In the second step, we test the classification capability of all simple combinations of those important genes by using a good classifier. For three "small" and "simple" data sets with two, three, and four cancer (sub)types, our approach obtained very high accuracy with only two or three genes. For a "large" and "complex" data set with 14 cancer types, we divided the whole problem into a group of binary classification problems and applied the 2-step approach to each of these binary classification problems. Through this "divide-and-conquer" approach, we obtained accuracy comparable to previously reported results but with only 28 genes rather than 16,063 genes. In general, our method can significantly reduce the number of genes required for highly reliable diagnosis.
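
The two-step procedure can be sketched as below. The abstract does not name the ranking scheme or the classifier, so this sketch substitutes a simple mean-difference score and a leave-one-out nearest-centroid classifier for a two-class problem; both choices, the function names and the toy data are assumptions for illustration.

```python
from itertools import combinations
from statistics import mean, pstdev

def importance(values, labels):
    """Stand-in ranking score: |difference of class means| / summed class spreads."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = (pstdev(a) + pstdev(b)) or 1e-9  # avoid division by zero
    return abs(mean(a) - mean(b)) / spread

def loo_accuracy(X, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier using only `genes`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(labels):
            rows = [X[j] for j in range(len(X)) if j != i and labels[j] == c]
            centroids[c] = [mean(r[g] for r in rows) for g in genes]
        def dist(c):
            return sum((X[i][g] - m) ** 2 for g, m in zip(genes, centroids[c]))
        correct += min(centroids, key=dist) == labels[i]
    return correct / len(X)

def two_step(X, labels, top_k=3, subset_size=2):
    # Step 1: rank genes and keep the top_k most important ones.
    ranked = sorted(range(len(X[0])),
                    key=lambda g: importance([r[g] for r in X], labels),
                    reverse=True)
    # Step 2: evaluate every combination of the kept genes with the classifier.
    return max(combinations(ranked[:top_k], subset_size),
               key=lambda s: loo_accuracy(X, labels, s))

# Toy data: 6 samples x 4 genes; genes 0 and 2 separate the two classes.
X = [[1.0, 5.0, 0.2, 3.0], [1.2, 4.0, 0.1, 3.1], [0.9, 6.0, 0.3, 2.9],
     [3.0, 5.5, 2.0, 3.0], [3.1, 4.5, 2.2, 3.2], [2.9, 5.0, 2.1, 2.8]]
labels = [0, 0, 0, 1, 1, 1]
best = two_step(X, labels)
```

Because step 1 shrinks the candidate pool to a handful of genes, the exhaustive search in step 2 stays cheap even when the original dataset has thousands of genes.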

11.
By using chromosome images as a framework, algorithms for finding the most dissimilar images are presented and illustrated by examples. In terms of angles, a chromosome image consists of two exterior biangles and two interior biangles. Biangles are defined and classified into 180° biangles, >180° biangles and <180° biangles. The dissimilarity of biangles and its geometric interpretation together with various properties of biangles are also presented. The results may have useful applications in pattern recognition, scene analysis, information storage and retrieval, artificial intelligence and fuzzy set theory.

12.
Introduction: Breast cancer subtypes are currently defined by a combination of morphologic, genomic, and proteomic characteristics. These subtypes provide a molecular portrait of the tumor that aids diagnosis, prognosis, and treatment escalation/de-escalation options. Gene expression signatures describing intrinsic breast cancer subtypes for predicting risk of recurrence have been rapidly adopted in the clinic. Despite the use of subtype classifications, many patients develop drug resistance, breast cancer recurrence, or therapy failure.

Areas covered: This review provides a summary of immunohistochemistry, reverse phase protein array, mass spectrometry, and integrative studies that are revealing differences in biological functions within and between breast cancer subtypes. We conclude with a discussion of rigor and reproducibility for proteomic-based biomarker discovery.

Expert commentary: Innovations in proteomics, including implementation of assay guidelines and standards, are facilitating refinement of breast cancer subtypes. Proteomic and phosphoproteomic information distinguishes biologically functional subtypes, is predictive of recurrence, and indicates likelihood of drug resistance. Actionable, activated signal transduction pathways can now be quantified and characterized. Proteomic biomarker validation in large, well-designed studies should become a public health priority to capitalize on the wealth of information gleaned from the proteome.


13.
Conotoxins are disulfide-rich small peptides that target a broad spectrum of ion-channels and neuronal receptors. They offer promising avenues in the treatment of chronic pain, epilepsy and cardiovascular diseases. Assignment of newly sequenced mature conotoxins into appropriate superfamilies using a computational approach could provide valuable preliminary information on the biological and pharmacological functions of the toxins. However, creation of protein sequence patterns for the reliable identification and classification of new conotoxin sequences may not be effective due to the hypervariability of mature toxins. With the aim of formulating an in silico approach for the classification of conotoxins into superfamilies, we have incorporated the concept of pseudo-amino acid composition to represent a peptide in a mathematical framework that includes the sequence-order effect along with conventional amino acid composition. The polarity index attribute, which encodes information such as residue surface buriability, polarity, and hydropathy, was used to store the sequence-order effect. Several methods, including BLAST, the ISort (Intimate Sorting) predictor, the least Hamming distance algorithm, the least Euclidean distance algorithm and multi-class support vector machines (SVMs), were explored for superfamily identification. The SVMs outperform other methods, providing an overall accuracy of 88.1% for all correct predictions with a generalized squared correlation of 0.75 using a jackknife cross-validation test for A, M, O and T superfamilies and a negative set consisting of short cysteine-rich sequences from different eukaryotes having diverse functions. The computed sensitivity and specificity for the superfamilies were found to be in the range of 84.0-94.1% and 80.0-95.5%, respectively, attesting to the efficacy of multi-class SVMs for the successful in silico classification of the conotoxins into their superfamilies.
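
A pseudo-amino acid composition vector can be sketched as 20 composition terms plus a few sequence-order correlation terms. The version below follows the commonly used single-property formulation and makes several assumptions: the polarity index values are not given in the text, so Kyte-Doolittle hydropathy values stand in for them, and the number of correlation terms (`lam`), the weight and the example peptide are arbitrary illustrative choices.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy, used ONLY as a stand-in for the polarity index,
# whose values are not given in the abstract.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = sorted(HYDROPATHY)

def standardize(table):
    """Rescale property values to zero mean and unit standard deviation."""
    values = list(table.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in table.items()}

def pseudo_aac(seq, lam=3, weight=0.05):
    """Return 20 composition terms followed by `lam` sequence-order terms."""
    n = len(seq)
    if lam >= n:
        raise ValueError("lam must be smaller than the sequence length")
    prop = standardize(HYDROPATHY)
    # Conventional amino acid composition (fractions summing to 1).
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # k-th order correlation: mean squared property difference of residues k apart.
    thetas = [
        mean([(prop[seq[i]] - prop[seq[i + k]]) ** 2 for i in range(n - k)])
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]

# Short cysteine-rich example peptide, chosen only for illustration.
features = pseudo_aac("GCCSDPRCAWRC")
```

The resulting fixed-length vector can be passed to any classifier, including a multi-class SVM, regardless of the peptide length.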

14.
Genes are often classified into biologically related groups so that inferences on their functions can be made. This paper demonstrates that the di-codon usage is a useful feature for gene classification and gives better classification accuracy than the codon usage. Our experiments with different classifiers show that support vector machines perform better than other classifiers in classifying genes by using di-codon usage as features. The method is illustrated on 1841 HLA sequences which are classified into two major classes, HLA-I and HLA-II, and further classified into the subclasses of major classes. By using both codon and di-codon features, we show near-perfect accuracies in the classification of HLA molecules into major classes and their sub-classes.
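
Di-codon usage can be computed by counting adjacent codon pairs in the reading frame. The sketch below assumes an in-frame coding sequence and counts overlapping pairs (codon 1 with 2, codon 2 with 3, and so on); the abstract does not state which counting convention the authors used, and the function name and toy sequence are hypothetical.

```python
from collections import Counter

def dicodon_usage(cds):
    """Relative frequency of each adjacent codon pair in an in-frame sequence."""
    cds = cds.upper()
    # Split into codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    pairs = Counter(a + b for a, b in zip(codons, codons[1:]))
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()} if total else {}

# Toy sequence with codons ATG GCC ATG GCC.
usage = dicodon_usage("ATGGCCATGGCC")
```

With 64 codons there are 4096 possible di-codons, so the full feature vector is much larger than the 64-dimensional codon usage vector it is compared against.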

15.
16.
MOTIVATION: The increasing use of DNA microarray-based tumor gene expression profiles for cancer diagnosis requires mathematical methods with high accuracy for solving clustering, feature selection and classification problems of gene expression data. RESULTS: New algorithms are developed for solving clustering, feature selection and classification problems of gene expression data. The clustering algorithm is based on optimization techniques and allows the calculation of clusters step-by-step. This approach allows us to find as many clusters as a data set contains with respect to some tolerance. Feature selection is crucial for a gene expression database. Our feature selection algorithm is based on calculating overlaps of different genes. The database used contains over 16 000 genes, and this number is considerably reduced by feature selection. We propose a classification algorithm where each tissue sample is considered as the center of a cluster which is a ball. The results of numerical experiments confirm that the classification algorithm in combination with the feature selection algorithm performs slightly better than the published results for multi-class classifiers based on support vector machines for this data set. AVAILABILITY: Available on request from the authors.

17.
18.
Yan X, Zheng T. BMC Genomics, 2008, 9(Z2):S14

Background

Gene expression data extracted from microarray experiments have been used to study the difference between mRNA abundance of genes under different conditions. In one such experiment, thousands of genes are measured simultaneously, which provides a high-dimensional feature space for discriminating between different sample classes. However, most of these dimensions are not informative about the between-class difference, and add noise to the discriminant analysis.

Results

In this paper we propose and study feature selection methods that evaluate the "informativeness" of a set of genes. Two measures of information based on multigene expression profiles are considered for a backward information-driven screening approach for selecting important gene features. By considering multigene expression profiles, we are able to utilize interaction information among these genes. Using a breast cancer dataset, we illustrate our methods and compare them to the performance of existing methods.

Conclusion

We illustrate in this paper that methods considering gene-gene interactions have better classification power in gene expression analysis. In our results, we identify important genes with relatively large p-values from single gene tests. This indicates that these are genes with weak marginal information but strong interaction information, which will be overlooked by strategies that only examine individual genes.

19.
Prostate cancers that clinically appear to be localized may nonetheless respond poorly to curative treatment. Pretreatment prostate-specific antigen (PSA) level, biopsy Gleason score, and percentage of positive biopsies are all at least as important as clinical stage in predicting treatment outcome. A patient with a nonpalpable tumor, stage T1c disease, serum PSA of 12 ng/mL, and a Gleason score of 8 to 10 in 2 of 12 biopsy cores has a relatively poor prognosis. In a high-risk patient such as this one, the recommended treatment strategy involves a combination of brachytherapy and conformal external beam radiotherapy. In studies comparing treatments in patients stratified according to a variety of risk measures, this combination has shown biochemical disease-free survival rates superior to those seen following radical prostatectomy. The role of androgen suppression remains unclear.

20.
The most common treatment options for men with clinically localized prostate cancer include radical prostatectomy and radiation therapy. The choice between these options is often controversial, and selecting the optimal treatment poses a great challenge for patients and physicians. Factors important to the decision include age and life expectancy of the patient, the natural history of the prostate cancer, how curable the disease is, and the morbidity of treatment. Use of these criteria to select treatment for a healthy, 70-year-old man presenting with a nonpalpable tumor, stage T1c disease, serum prostate-specific antigen of 12 ng/mL, and an adenocarcinoma with a Gleason score of 8 that is present in 2 of 12 biopsy cores would lead to the choice of radical prostatectomy over radiation therapy. Data show that such a patient has a life expectancy of more than 12.3 years if the prostate cancer can be cured and a high probability of dying from the disease if it is not cured. Data further show that radical prostatectomy in such a patient would confer a survival advantage over radiation therapy without resulting in greater complications or reduction in quality of life.
