Found 20 similar documents (search time: 15 ms)
1.
Face recognition is challenging, especially when images of different persons resemble each other because of variations in illumination, expression, and occlusion. If sufficient training images are available for each person to span that person's facial variations under testing conditions, sparse representation based classification (SRC) achieves very promising results. However, in many applications face recognition encounters the small sample size problem, which arises from the small number of training images available for each person. In this paper, we present a novel face recognition framework that uses low-rank and sparse error matrix decomposition together with sparse coding (LRSE+SC). First, a low-rank matrix recovery technique decomposes the face images of each class into a low-rank matrix and a sparse error matrix. The low-rank matrix of each individual serves as a class-specific dictionary and captures the discriminative features of that individual. The sparse error matrix represents intra-class variations such as illumination and expression changes. Second, we combine the low-rank parts (representative bases) of all persons into a supervised dictionary and integrate the sparse error matrices of all individuals into a within-individual variant dictionary, which can represent the possible variations between the testing and training images. These two dictionaries are then used to code the query image. The within-individual variant dictionary is shared by all subjects and contributes only to explaining the lighting conditions, expressions, and occlusions of the query image, not to discrimination. Finally, a reconstruction-based scheme is adopted for face recognition. Because the within-individual dictionary is introduced, LRSE+SC can handle corrupted training data and the situation in which not all subjects have enough training samples.
Experimental results show that our method achieves state-of-the-art results on the AR, FERET, FRGC and LFW databases.
2.
3.
Identifying subspace gene clusters from gene expression data is useful for discovering novel functional gene interactions. In this paper, we propose to use low-rank representation (LRR) to identify subspace gene clusters from microarray data. LRR seeks the lowest-rank representation among all the candidates that can represent the genes as linear combinations of the bases in the dataset. The clusters can be extracted from the block diagonal representation matrix obtained using LRR, and they capture the intrinsic patterns of genes with similar functions well. Meanwhile, the parameter of LRR balances the effect of noise, so the method can extract useful information from data with a high level of background noise. Compared with traditional methods, our approach can identify genes with similar functions even when their expression profiles are not similar. It can also assign one gene to several clusters. Moreover, our method is robust to noise and can identify more biologically relevant gene clusters. When applied to three public datasets, the LRR-based method is superior to existing methods for identifying subspace gene clusters.
4.
5.
6.
Jun Yang Brian D. Bennett Shujun Luo Kaoru Inoue Sara A. Grimm Gary P. Schroth Pierre R. Bushel H. Karimi Kinyamu Trevor K. Archer 《Molecular and cellular biology》2015,35(18):3225-3243
LIN28 is an evolutionarily conserved RNA-binding protein with critical functions in developmental timing and cancer. However, the molecular mechanisms underlying LIN28's oncogenic properties are yet to be described. RNA-protein immunoprecipitation coupled with genome-wide sequencing (RIP-Seq) analysis revealed significant LIN28 binding within 843 mRNAs in breast cancer cells. Many of the LIN28-bound mRNAs are implicated in the regulation of RNA and cell metabolism. We identify heterogeneous nuclear ribonucleoprotein A1 (hnRNP A1), a protein with multiple roles in mRNA metabolism, as a LIN28-interacting partner. Subsequently, we used a custom computational method to identify differentially spliced gene isoforms in LIN28 and hnRNP A1 small interfering RNA (siRNA)-treated cells. The results reveal that these proteins regulate alternative splicing and steady-state mRNA expression of genes implicated in aspects of breast cancer biology. Notably, cells lacking LIN28 undergo significant isoform switching of the ENAH gene, resulting in a decrease in the expression of the ENAH exon 11a isoform. The expression of ENAH isoform 11a has been shown to be elevated in breast cancers that express HER2. Intriguingly, analysis of publicly available array data from the Cancer Genome Atlas (TCGA) reveals that LIN28 expression in the HER2 subtype is significantly different from that in other breast cancer subtypes. Collectively, our data suggest that LIN28 may regulate splicing and gene expression programs that drive breast cancer subtype phenotypes.
7.
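Visualization of gene expression profile data based on manifold learning Total citations: 2, self-citations: 0, citations by others: 2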
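Visualizing gene expression profiles is essentially a dimensionality reduction problem for high-dimensional data. Manifold learning algorithms are used to reduce the dimensionality of gene expression profiles for visualization, and the applicability of two typical manifold learning algorithms (Isomap and LLE) to expression profile dimensionality reduction is discussed. The effect of dimensionality reduction is evaluated quantitatively by the within-class/between-class distance. Dimensionality reduction analysis of two typical microarray data sets (a colon cancer gene expression data set and an acute leukemia gene expression data set) shows that the intrinsic dimension of both data sets is below 3, so manifold learning methods can be used to visualize them in a low-dimensional projection space. Comparison with the projections of traditional dimensionality reduction methods (such as PCA and MDS) shows that the Isomap manifold learning method gives better visualization.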
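A within-class/between-class distance criterion of the kind this abstract mentions can be sketched as follows. This is a minimal illustration, not the authors' exact formula: the function name `class_distance_ratio`, the choice of Euclidean distance, and the mean-of-pairwise-distances definition are all assumptions. A smaller ratio suggests that the low-dimensional embedding keeps samples of the same class closer together than samples of different classes.

```python
from itertools import combinations
from math import dist


def class_distance_ratio(points, labels):
    """Mean within-class distance divided by mean between-class distance.

    Hypothetical helper: `points` are low-dimensional embedding coordinates
    (for example from Isomap or PCA) and `labels` are the known sample classes.
    """
    within, between = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        # Sort each pairwise Euclidean distance into the matching bucket.
        (within if lp == lq else between).append(dist(p, q))
    if not within or not between:
        raise ValueError("need at least one within-class and one between-class pair")
    return (sum(within) / len(within)) / (sum(between) / len(between))


# Toy 2-D embedding: two well-separated classes give a ratio well below 1.
embedding = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
classes = ["tumor", "tumor", "normal", "normal"]
ratio = class_distance_ratio(embedding, classes)
```

Computing the same ratio on embeddings produced by different methods gives one simple way to compare them on labelled data, under the assumptions stated above.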
8.
9.
Evaluation of Gene Structure Prediction Programs Total citations: 2, self-citations: 0, citations by others: 2
We evaluate a number of computer programs designed to predict the structure of protein coding genes in genomic DNA sequences. Computational gene identification is set to play an increasingly important role in the development of the genome projects, as emphasis turns from mapping to large-scale sequencing. The evaluation presented here serves both to assess the current status of the problem and to identify the most promising approaches to ensure further progress. The programs analyzed were uniformly tested on a large set of vertebrate sequences with simple gene structure, and several measures of predictive accuracy were computed at the nucleotide, exon, and protein product levels. The results indicated that the predictive accuracy of the programs analyzed was lower than originally found. The accuracy was even lower when considering only those sequences that had recently been entered and that did not show any similarity to previously entered sequences. This indicates that the programs are overly dependent on the particularities of the examples they learn from. For most of the programs, accuracy in this test set ranged from 0.60 to 0.70 as measured by the Correlation Coefficient (where 1.0 corresponds to a perfect prediction and 0.0 is the value expected for a random prediction), and the average percentage of exons exactly identified was less than 50%. Only those programs including protein sequence database searches showed substantially greater accuracy. The accuracy of the programs was severely affected by relatively high rates of sequence errors. Since the set on which the programs were tested included only relatively short sequences with simple gene structure, the accuracy of the programs is likely to be even lower when used for large uncharacterized genomic sequences with complex structure. 
While in such cases the programs currently available may still be of great use in pinpointing the regions likely to contain exons, they are far from powerful enough to elucidate the genomic structure completely.
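The nucleotide-level Correlation Coefficient quoted in this abstract can be illustrated with a short sketch. The abstract does not give the formula, so this assumes the standard correlation between binary actual and predicted labels (also known as the Matthews or phi coefficient), which matches the stated behaviour of 1.0 for a perfect prediction and 0.0 expected for a random one. The function name and the toy label vectors are hypothetical.

```python
from math import sqrt


def correlation_coefficient(actual, predicted):
    """Correlation between actual and predicted per-nucleotide labels (1 = coding)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        # Happens when either label vector contains only one class.
        raise ValueError("undefined when a class is absent from actual or predicted")
    return (tp * tn - fp * fn) / denom


# Toy sequence of 10 nucleotides: one false positive and one false negative.
actual = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
cc = correlation_coefficient(actual, predicted)
```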
10.
Gene set analysis allows the inclusion of knowledge from established gene sets, such as gene pathways, and potentially improves the power of detecting differentially expressed genes. However, conventional methods of gene set analysis focus on gene marginal effects in a gene set, and ignore gene interactions which may contribute to complex human diseases. In this study, we propose a method of gene interaction enrichment analysis, which incorporates knowledge of predefined gene sets (e.g. gene pathways) to identify enriched gene interaction effects on a phenotype of interest. In our proposed method, we also discuss the reduction of irrelevant genes and the extraction of a core set of gene interactions for an identified gene set, which contribute to the statistical variation of a phenotype of interest. The utility of our method is demonstrated through analyses on two publicly available microarray datasets. The results show that our method can identify gene sets that show strong gene interaction enrichments. The enriched gene interactions identified by our method may provide clues to new gene regulation mechanisms related to the studied phenotypes. In summary, our method offers a powerful tool for researchers to exhaustively examine the large numbers of gene interactions associated with complex human diseases, and can be a useful complement to classical gene set analyses, which consider only single genes in a gene set.
11.
12.
13.
Many experiments in the past have demonstrated the requirement of de novo gene expression during memory formation. In contrast to the initial reductionistic view that genes relevant to learning and memory would be easily found and would provide a simple key to understand this brain function, it is becoming apparent that the genetic contribution to memory is complex. Previous approaches have been focused on individual genes or genetic pathways and failed to address the massively parallel nature of genome activities and collective behavior of the genes that ultimately control the molecular mechanisms underlying brain function. In view of the broad variety of genes and the cross talk of genetic pathways involved in this regulation, only gene expression profiles may reflect the complete behavior of regulatory pathways. In this review we illustrate how DNA microarray-based gene expression profiling may help to dissect and analyze the complex mechanisms involved in gene regulation during the acquisition and storage of memory in the mammalian brain.
14.
Gene set analysis methods, which consider predefined groups of genes in the analysis of genomic data, have been successfully applied for analyzing gene expression data in cross-sectional studies. The time-course gene set analysis (TcGSA) introduced here is an extension of gene set analysis to longitudinal data. The proposed method relies on random effects modeling with maximum likelihood estimates. It allows the use of all available repeated measurements while dealing with unbalanced data due to missing at random (MAR) measurements. TcGSA is a hypothesis-driven method that identifies a priori defined gene sets with significant expression variations over time, taking into account the potential heterogeneity of expression within gene sets. When biological conditions are compared, the method indicates whether the time patterns of gene sets differ significantly according to these conditions. The interest of the method is illustrated by its application to two real-life datasets: an HIV therapeutic vaccine trial (DALIA-1 trial), and data from a recent study on influenza and pneumococcal vaccines. In the DALIA-1 trial, TcGSA revealed a significant change in gene expression over time within 69 gene sets during vaccination, while a standard univariate individual gene analysis corrected for multiple testing and a standard Gene Set Enrichment Analysis (GSEA) for time series both failed to detect any significant pattern change over time. When applied to the second illustrative data set, TcGSA identified 4 gene sets that were ultimately found to be linked with the influenza vaccine as well, although previous analyses had associated them with the pneumococcal vaccine only. In our simulation study, TcGSA exhibits good statistical properties and increased power compared to other approaches for analyzing time-course expression patterns of gene sets. The method is made available to the community through an R package.
15.
We investigate the use of markers to hasten the recovery of the recipient genome during an introgression breeding program. The effects of time and intensity of selection, population size, number and position of selected markers are studied for chromosomes either carrying or not carrying the introgressed gene. We show that marker assisted selection may lead to a gain in time of about two generations, an efficiency below previous theoretical predictions. Markers are most useful when their map position is known. In the early generations, it is shown that increasing the number of markers over three per non-carrier chromosome is not efficient, that the segment surrounding the introgressed gene is better controlled by rather distant markers unless high selection intensity can be applied, and that selection on this segment first can reduce the selection intensity available for selection on non-carrier chromosomes. These results are used to propose an optimal strategy for selection on the whole genome, making the most of available material and conditions (e.g., population size and fertility, genetic map).
16.
A noticeable number of biclustering approaches have been proposed for the study of gene expression data, especially for discovering functionally related gene sets under different subsets of experimental conditions. In this context, recognizing groups of co-expressed or co-regulated genes, that is, genes which follow a similar expression pattern, is one of the main objectives. Because of the complexity of the problem, heuristic searches are usually used instead of exhaustive algorithms. Furthermore, most biclustering approaches use a measure or cost function that determines the quality of biclusters. Having a suitable quality metric for biclusters is critical, not only for guiding the search, but also for establishing comparison criteria among the results obtained by different biclustering techniques. In this paper, we analyse a large number of existing quality measures for gene expression biclusters, and we present a comparative study of them based on their capability to recognize different expression patterns in biclusters.
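As an illustration of what a bicluster quality measure looks like, the sketch below computes the mean squared residue, a widely used measure introduced by Cheng and Church. Whether this particular measure is among those compared in the paper is an assumption, and the function name and toy expression matrix are hypothetical. The residue is zero when the selected genes follow a pure shifting pattern across the selected conditions and grows as the pattern breaks down.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the bicluster given by `rows` and `cols`."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    # Residue of each cell: value minus row mean minus column mean plus overall mean.
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


# Toy expression matrix (genes x conditions). The first three conditions follow
# a shifting pattern across genes; the fourth condition does not.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [5.0, 6.0, 7.0, 4.0],
]
shifting = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
noisy = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2, 3])
```

Here `shifting` is zero up to floating-point rounding, while `noisy` is clearly larger, which is the behaviour a search heuristic would use to prefer the first bicluster.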
17.
Suzana Ruscanu Luc Jouneau Céline Urien Mickael Bourge Jérôme Lecardonnel Marco Moroldo Benoit Loup Marc Dalod Jamila Elhmouzi-Younes Claudia Bevilacqua Jayne Hope Damien Vitour Stéphan Zientara Gilles Meyer Isabelle Schwartz-Cornil 《Journal of virology》2013,87(16):9333-9343
Human and animal hemorrhagic viruses initially target dendritic cells (DCs). It has been proposed, but not documented, that both plasmacytoid DCs (pDCs) and conventional DCs (cDCs) may participate in the cytokine storm encountered in these infections. In order to evaluate the contribution of DCs in hemorrhagic virus pathogenesis, we performed a genome-wide expression analysis during infection by Bluetongue virus (BTV), a double-stranded RNA virus that induces hemorrhagic fever in sheep and initially infects cDCs. Both pDCs and cDCs accumulated in regional lymph nodes and spleen during BTV infection. The gene response profiles were generated at the onset of the disease and differed markedly with the DC subtypes and their lymphoid organ location. An integrative knowledge-based analysis revealed that blood pDCs displayed a gene signature related to activation of systemic inflammation and permeability of vasculature. In contrast, the gene profile of pDCs and cDCs in lymph nodes was oriented to inhibition of inflammation, whereas spleen cDCs did not show a clear functional orientation. These analyses indicate that tissue location and DC subtype affect the functional gene expression program induced by BTV and suggest the involvement of blood pDCs in the inflammation and plasma leakage/hemorrhage during BTV infection in the natural host of the virus. These findings open the avenue to target DCs for therapeutic interventions in viral hemorrhagic diseases.
18.
In this paper, based on low-rank representation and eigenface extraction, we present an improvement to the well-known Sparse Representation based Classification (SRC). First, the low-rank images of each individual's face images in the training subset are extracted by Robust Principal Component Analysis (Robust PCA) to alleviate the influence of noise (e.g., illumination differences and occlusions). Second, Singular Value Decomposition (SVD) is applied to extract the eigenfaces from these low-rank approximations. Finally, we use these eigenfaces to construct a compact and discriminative dictionary for sparse representation. We evaluate our method on five popular databases. Experimental results demonstrate the effectiveness and robustness of our method.
19.
An increasing number of genes have been experimentally confirmed in recent years as causative genes to various human diseases. The newly available knowledge can be exploited by machine learning methods to discover additional unknown genes that are likely to be associated with diseases. In particular, positive unlabeled learning (PU learning) methods, which require only a positive training set P (confirmed disease genes) and an unlabeled set U (the unknown candidate genes) instead of a negative training set N, have been shown to be effective in uncovering new disease genes in the current scenario. Using only a single source of data for prediction can be susceptible to bias due to incompleteness and noise in the genomic data and a single machine learning predictor prone to bias caused by inherent limitations of individual methods. In this paper, we propose an effective PU learning framework that integrates multiple biological data sources and an ensemble of powerful machine learning classifiers for disease gene identification. Our proposed method integrates data from multiple biological sources for training PU learning classifiers. A novel ensemble-based PU learning method EPU is then used to integrate multiple PU learning classifiers to achieve accurate and robust disease gene predictions. Our evaluation experiments across six disease groups showed that EPU achieved significantly better results compared with various state-of-the-art prediction methods as well as ensemble learning classifiers. Through integrating multiple biological data sources for training and the outputs of an ensemble of PU learning classifiers for prediction, we are able to minimize the potential bias and errors in individual data sources and machine learning algorithms to achieve more accurate and robust disease gene predictions. In the future, our EPU method provides an effective framework to integrate the additional biological and computational resources for better disease gene predictions. 
20.
Gene Set Expression Comparison kit for BRB-ArrayTools Total citations: 1, self-citations: 0, citations by others: 1