1.
Projective non-negative matrix factorization (PNMF) projects high-dimensional non-negative examples X onto a lower-dimensional subspace spanned by a non-negative basis W and treats W^T X as their coefficients, i.e., X ≈ W W^T X. Since PNMF learns a natural parts-based representation W of X, it has been widely used in fields such as pattern recognition and computer vision. However, PNMF does not perform well in classification tasks because it completely ignores the label information of the dataset. This paper proposes a discriminant PNMF method (DPNMF) to overcome this deficiency. In particular, DPNMF incorporates Fisher's criterion into PNMF to exploit the label information. Similar to PNMF, DPNMF learns a single non-negative basis matrix and incurs a lower computational burden than NMF. In contrast to PNMF, DPNMF maximizes the distance between the centers of any two classes of examples while minimizing the distance between any two examples of the same class in the lower-dimensional subspace, and thus has more discriminant power. We develop a multiplicative update rule to solve DPNMF and prove its convergence. Experimental results on four popular face image datasets confirm its effectiveness compared with representative NMF and PNMF algorithms.
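The projection X ≈ W W^T X described above can be sketched in a few lines. The sketch below is an illustration only: it uses a commonly cited Euclidean-distance multiplicative update for plain PNMF, which is an assumption here and not the DPNMF rule from the abstract (the abstract does not state that rule). The function name `pnmf`, its parameters, and the spectral-norm rescaling step are hypothetical choices made for this example.

```python
import numpy as np

def pnmf(X, rank, n_iter=200, eps=1e-12, seed=0):
    """Hypothetical sketch of plain PNMF: find W >= 0 with X ~ W @ W.T @ X.

    X is a non-negative (features x examples) matrix. The update below is an
    assumed Euclidean multiplicative rule, not the DPNMF rule of the paper.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank))       # non-negative random start
    XXt = X @ X.T                            # reused in every iteration
    for _ in range(n_iter):
        XXtW = XXt @ W
        numer = 2.0 * XXtW
        denom = W @ (W.T @ XXtW) + XXt @ (W @ (W.T @ W)) + eps
        W *= numer / denom                   # ratio is >= 0, so W stays >= 0
        W /= np.linalg.norm(W, 2)            # assumed rescaling for stability
    return W

# Usage: learn a 3-dimensional non-negative basis and project the data.
rng = np.random.default_rng(1)
X = rng.random((20, 30))
W = pnmf(X, rank=3, n_iter=50)
H = W.T @ X                                  # coefficients of the examples
X_hat = W @ H                                # reconstruction W W^T X
```

Because only W is learned and the coefficients are obtained as W^T X, a new example can be projected without a further optimization step.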
2.
3.
4.
Non-negative matrix factorization (NMF) condenses high-dimensional data into lower-dimensional models subject to the requirement that data can only be added, never subtracted. However, the NMF problem does not have a unique solution, creating a need for additional constraints (regularization constraints) to promote informative solutions. Regularized NMF problems are more complicated than conventional NMF problems, creating a need for computational methods that incorporate the extra constraints in a reliable way. We developed novel methods for regularized NMF based on block-coordinate descent with proximal point modification and a fast optimization procedure over the alpha simplex. Our framework has important advantages in that it (a) accommodates a wide range of regularization terms, including sparsity-inducing terms like the penalty, (b) guarantees that the solutions satisfy necessary conditions for optimality, ensuring that the results have well-defined numerical meaning, (c) allows the scale of the solution to be controlled exactly, and (d) is computationally efficient. We illustrate the use of our approach in the context of gene expression microarray data analysis. The improvements described remedy key limitations of previous proposals, strengthen the theoretical basis of regularized NMF, and facilitate the use of regularized NMF in applications.
5.
Advances in DNA microarray technologies have made gene expression profiles a significant candidate in identifying different types of cancers. Traditional learning-based cancer identification methods utilize labeled samples to train a classifier, but they are inconvenient for practical application because labels are quite expensive in the clinical cancer research community. This paper proposes a semi-supervised projective non-negative matrix factorization method (Semi-PNMF) to learn an effective classifier from both labeled and unlabeled samples, thus boosting subsequent cancer classification performance. In particular, Semi-PNMF jointly learns a non-negative subspace from concatenated labeled and unlabeled samples and indicates classes by the positions of the maximum entries of their coefficients. Because Semi-PNMF incorporates statistical information from the large volume of unlabeled samples in the learned subspace, it can learn more representative subspaces and boost classification performance. We developed a multiplicative update rule (MUR) to optimize Semi-PNMF and proved its convergence. The experimental results of cancer classification for two multiclass cancer gene expression profile datasets show that Semi-PNMF outperforms the representative methods.
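The decision rule mentioned above, reading the class off the position of the largest coefficient, can be illustrated as follows. This is a minimal sketch that assumes a basis W has already been learned, with one column per class; the function name `predict_classes` and the toy matrices are hypothetical, and the Semi-PNMF training step itself is not shown.

```python
import numpy as np

def predict_classes(W, X):
    """Assign each example (column of X) to the index of its largest coefficient.

    W is assumed to be a learned non-negative basis (features x classes);
    the coefficients are taken as W.T @ X, as in projective NMF.
    """
    H = W.T @ X                  # (classes x examples) coefficient matrix
    return np.argmax(H, axis=0)  # position of the maximum entry per example

# Toy usage with a hand-made basis: 3 features, 2 classes, 2 examples.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
X = np.array([[2.0, 0.1],
              [0.5, 3.0],
              [1.0, 1.0]])
labels = predict_classes(W, X)   # first example -> class 0, second -> class 1
```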
6.
7.
Sandra Ortega-Martorell Paulo J. G. Lisboa Alfredo Vellido Rui V. Simões Martí Pumarola Margarida Julià-Sapé Carles Arús 《PloS one》2012,7(10)
Background
Pattern Recognition techniques can provide invaluable insights in the field of neuro-oncology. This is because the clinical analysis of brain tumors requires the use of non-invasive methods that generate complex data in electronic format. Magnetic Resonance (MR), in the modalities of spectroscopy (MRS) and spectroscopic imaging (MRSI), has been widely applied to this purpose. The heterogeneity of the tissue in the brain volumes analyzed by MR remains a challenge in terms of pathological area delimitation.
Methodology/Principal Findings
A pre-clinical study was carried out using seven brain tumor-bearing mice. Imaging and spectroscopy information was acquired from the brain tissue. A methodology is proposed to extract tissue type-specific sources from these signals by applying Convex Non-negative Matrix Factorization (Convex-NMF). Its suitability for the delimitation of pathological brain area from MRSI is experimentally confirmed by comparing the images obtained with its application to selected target regions, and to the gold standard of registered histopathology data. The former showed good accuracy for the solid tumor region (proliferation index (PI)>30%). The latter yielded (i) high sensitivity and specificity in most cases, (ii) acquisition conditions for safe thresholds in tumor and non-tumor regions (PI>30% for solid tumoral region; ≤5% for non-tumor), and (iii) fairly good results when borderline pixels were considered.
Conclusions/Significance
The unsupervised nature of Convex-NMF, which does not use prior information regarding the tumor area for its delimitation, places this approach one step ahead of classical label-requiring supervised methods for discrimination between tissue types, minimizing the negative effect of using mislabeled voxels. Convex-NMF also relaxes the non-negativity constraints on the observed data, which allows for a natural representation of the MRSI signal. This should help radiologists to accurately tackle one of the main sources of uncertainty in the clinical management of brain tumors, which is the difficulty of appropriately delimiting the pathological area.
8.
9.
10.
In the past decades, advances in high-throughput technologies have led to the generation of huge amounts of biological data that require analysis and interpretation. Recently, nonnegative matrix factorization (NMF) has been introduced as an efficient way to reduce the complexity of data as well as to interpret them, and has been applied to various fields of biological research. In this paper, we present CloudNMF, a distributed open-source implementation of NMF on a MapReduce framework. Experimental evaluation demonstrated that CloudNMF is scalable and can be used to deal with huge amounts of data, which may enable various kinds of high-throughput biological data analysis in the cloud. CloudNMF is freely accessible at http://admis.fudan.edu.cn/projects/CloudNMF.html.
11.
The ample variety of labeling dyes and staining methods available in fluorescence microscopy has enabled biologists to advance in the understanding of living organisms at the cellular and molecular level. When two or more fluorescent dyes are used in the same preparation, or one dye is used in the presence of autofluorescence, the separation of the fluorescent emissions can become problematic. Various approaches have been recently proposed to solve this problem. Among them, blind non-negative matrix factorization is gaining interest since it requires few assumptions about the spectra and concentration of the fluorochromes. In this paper, we propose a novel algorithm for blind spectral separation that addresses some of the shortcomings of existing solutions: namely, their dependency on the initialization and their slow convergence. We apply this new algorithm to two relevant problems in fluorescence microscopy: autofluorescence elimination and spectral unmixing of multi-labeled samples. Our results show that our new algorithm performs well when compared with the state-of-the-art approaches while being much faster.
12.
Predicting what items will be selected by a target user in the future is an important function for recommendation systems. Matrix factorization techniques have been shown to achieve good performance on temporal rating-type data, but little is known about temporal item selection data. In this paper, we developed a unified model that combines Multi-task Non-negative Matrix Factorization and Linear Dynamical Systems to capture the evolution of user preferences. Specifically, user and item features are projected into latent factor space by factoring co-occurrence matrices into a common basis item-factor matrix and multiple factor-user matrices. Moreover, we represented both within and between relationships of multiple factor-user matrices using a state transition matrix to capture the changes in user preferences over time. The experiments show that our proposed algorithm outperforms the other algorithms on two real datasets, which were extracted from Netflix movies and Last.fm music. Furthermore, our model provides a novel dynamic topic model for tracking the evolution of the behavior of a user over time.
13.
14.
Jun Yao Qi Zhao Ying Yuan Li Zhang Xiaoming Liu W. K. Alfred Yung John N. Weinstein 《PloS one》2012,7(9)
Numerous prognostic gene expression signatures for breast cancer have been generated previously, with little overlap and limited insight into the biology of the disease. Here we introduce a novel algorithm named SCoR (Survival analysis using Cox proportional hazard regression and Random resampling) to apply random resampling and clustering methods in identifying gene features correlated with time-to-event data. This is shown to reduce the overfitting noise involved in microarray data analysis and to discover functional gene sets linked to patient survival. SCoR independently identified a common poor prognostic signature composed of cell proliferation genes from six out of eight breast cancer datasets. Furthermore, a sequential SCoR analysis on highly proliferative breast cancers repeatedly identified T/B cell markers as favorable prognosis factors. In glioblastoma, SCoR identified a common good prognostic signature of chromosome 10 genes from two gene expression datasets (TCGA and REMBRANDT), recapitulating the fact that loss of one copy of chromosome 10 (which harbors the tumor suppressor PTEN) is linked to poor survival in glioblastoma patients. SCoR also identified prognostic genes on sex chromosomes in lung adenocarcinomas, suggesting patient gender might be used to predict outcome in this disease. These results demonstrate the power of SCoR to identify common and biologically meaningful prognostic gene expression signatures.
15.
In gene expression profiling studies, including single-cell RNA sequencing (scRNA-seq) analyses, the identification and characterization of co-expressed genes provides critical information on cell identity and function. Gene co-expression clustering in scRNA-seq data presents certain challenges. We show that commonly used methods for single-cell data are not capable of identifying co-expressed genes accurately, and produce results that substantially limit biological expectations of co-expressed genes. Herein, we present single-cell Latent-variable Model (scLM), a gene coclustering algorithm tailored to single-cell data that performs well at detecting gene clusters with significant biologic context. Importantly, scLM can simultaneously cluster multiple single-cell datasets, i.e., consensus clustering, enabling users to leverage single-cell data from multiple sources for novel comparative analysis. scLM takes raw count data as input and preserves biological variation without being influenced by batch effects from multiple datasets. Results from both simulation data and experimental data demonstrate that scLM outperforms the existing methods with considerably improved accuracy. To illustrate the biological insights of scLM, we apply it to our in-house and public experimental scRNA-seq datasets. scLM identifies novel functional gene modules and refines cell states, which facilitates mechanism discovery and understanding of complex biosystems such as cancers. A user-friendly R package with all the key features of the scLM method is available at https://github.com/QSong-github/scLM.
16.
RNAseq and microarray methods are frequently used to measure gene expression levels. While similar in purpose, there are fundamental differences between the two technologies. Here, we present the largest comparative study between microarray and RNAseq methods to date using The Cancer Genome Atlas (TCGA) data. We found high correlations between expression data obtained from the Affymetrix one-channel microarray and RNAseq (Spearman correlation coefficients of ∼0.8). We also observed that low-abundance genes had poorer correlations between microarray and RNAseq data than high-abundance genes. As expected, due to measurement and normalization differences, Agilent two-channel microarray and RNAseq data were poorly correlated (Spearman correlation coefficients of only ∼0.2). By examining the differentially expressed genes between tumor and normal samples, we observed reasonable concordance in directionality between Agilent two-channel microarray and RNAseq data, although a small group of genes were found to have expression changes reported in opposite directions using these two technologies. Overall, RNAseq produces comparable results to microarray technologies in terms of expression profiling. The RNAseq normalization methods RPKM and RSEM produce similar results on the gene level and reasonably concordant results on the exon level. Longer exons tended to have better concordance between the two normalization methods than shorter exons.
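For readers unfamiliar with the statistic reported above, the Spearman correlation coefficient is the Pearson correlation computed on ranks, with tied values sharing their average rank. The following is a minimal, dependency-free sketch; the function names and the toy expression values are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy usage: made-up expression values for the same genes on two platforms.
microarray = [5.1, 7.3, 2.0, 9.8]
rnaseq = [12.0, 30.5, 1.2, 88.0]
rho = spearman(microarray, rnaseq)  # identical gene ordering on both platforms
```

Because it depends only on ranks, the coefficient is unaffected by monotone differences in scale between platforms, which is one reason it suits cross-technology comparisons.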
17.
18.
19.
Kosuke Yoshihara Atsushi Tajima Tetsuro Yahata Shoji Kodama Hiroyuki Fujiwara Mitsuaki Suzuki Yoshitaka Onishi Masayuki Hatae Kazunobu Sueyoshi Hisaya Fujiwara Yoshiki Kudo Kohei Kotera Hideaki Masuzaki Hironori Tashiro Hidetaka Katabuchi Ituro Inoue Kenichi Tanaka 《PloS one》2010,5(3)