Similar Documents
1.
Gene expression microarray experiments frequently generate datasets with multiple missing values. However, most analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. Accurate estimation of missing values in such datasets has therefore been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm that is specially suited to estimating missing values in gene expression time series data. The algorithm uses the Dynamic Time Warping (DTW) distance to measure the similarity between time expression profiles, and then selects, for each gene expression profile with missing values, a dedicated set of candidate profiles for estimation. Three DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These were initially prototyped in Perl, and their accuracy was evaluated on yeast expression time series data using several parameter settings. The experiments showed that the two-pass algorithm consistently outperforms the neighborhood-wise and position-wise algorithms, in particular for datasets with a higher level of missing entries. The performance of the two-pass DTWimpute algorithm was further benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community, and the former proved superior. Motivated by these findings, which clearly indicate the added value of DTW techniques for missing value estimation in time series data, we built an optimized C++ implementation of the two-pass DTWimpute algorithm.
The software also offers a choice of three initial rough imputation methods.
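The distance at the core of entry 1 is standard DTW. The Python sketch below computes it with an absolute-difference local cost, then imputes a missing time point by averaging the k DTW-nearest complete profiles. It illustrates the idea under stated assumptions and is not the authors' DTWimpute. The row-mean rough fill, the plain average over neighbors, and the names `dtw_distance` and `dtw_knn_impute` are hypothetical choices, and the position-wise, neighborhood-wise and two-pass variants are not reproduced.

```python
import math

def dtw_distance(x, y):
    """Classic O(n*m) dynamic time warping with |a - b| as the local cost."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three predecessor alignments
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def dtw_knn_impute(target, candidates, k=2):
    """Fill None entries of `target` from the k DTW-nearest complete candidates.

    Hypothetical helper: the row-mean rough fill and plain averaging are
    illustrative choices, not the published DTWimpute procedure.
    """
    observed = [v for v in target if v is not None]
    row_mean = sum(observed) / len(observed)
    rough = [row_mean if v is None else v for v in target]  # rough initial imputation
    nearest = sorted(candidates, key=lambda c: dtw_distance(rough, c))[:k]
    result = list(target)
    for t, v in enumerate(target):
        if v is None:
            result[t] = sum(c[t] for c in nearest) / len(nearest)
    return result
```

DTW aligns `[0, 0, 1]` with `[0, 1, 1]` at zero cost even though the two profiles differ point by point. This tolerance to time shifts is what makes it attractive for expression time series.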

2.
Improving missing value estimation in microarray data with gene ontology   (times cited: 3; self-citations: 0; citations by others: 3)
MOTIVATION: Gene expression microarray experiments produce datasets with frequent missing expression values. Accurate estimation of missing values is an important prerequisite for efficient data analysis, as many statistical and machine learning techniques either require a complete dataset or produce results that depend significantly on the quality of such estimates. A limitation of existing estimation methods for microarray data is that they use no external information; estimation is based solely on the expression data. We hypothesized that using a priori information on functional similarities, available from public databases, facilitates missing value estimation. RESULTS: We investigated whether semantic similarity derived from gene ontology (GO) annotations could improve the selection of relevant genes for missing value estimation. The relative contribution of each information source was estimated automatically from the data using an adaptive weight selection procedure. Our experimental results on yeast cDNA microarray datasets indicated that incorporating GO information into the k-nearest neighbor algorithm can enhance its performance considerably, especially when the number of experimental conditions is small and the percentage of missing values is high. The performance gain was less evident with a more sophisticated estimation method. We conclude that even a small proportion of annotated genes can improve data quality significantly for the eventual interpretation of microarray experiments. AVAILABILITY: Java and Matlab codes are available on request from the authors. SUPPLEMENTARY MATERIAL: Available online at http://users.utu.fi/jotatu/GOImpute.html.
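One way to read "considering GO information in the k-nearest neighbor algorithm" is to blend an expression distance with a GO dissimilarity when ranking candidate genes. The sketch below assumes a fixed mixing weight `w` and uses `1 - sim` as the GO dissimilarity. The paper estimates the relative contribution adaptively instead, and every function name here is hypothetical.

```python
import math

def rms_distance(target, profile):
    """Root-mean-square difference over the columns observed in `target`."""
    pairs = [(a, b) for a, b in zip(target, profile) if a is not None]
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))

def knn_impute_with_go(target, candidates, go_sim, w=0.5, k=2, eps=1e-6):
    """Weighted KNN imputation that ranks neighbors by expression + GO distance.

    go_sim[i] is an assumed GO semantic similarity in [0, 1] between the target
    gene and candidates[i]; w is a fixed, hypothetical mixing weight.
    """
    scored = []
    for prof, sim in zip(candidates, go_sim):
        d = (1 - w) * rms_distance(target, prof) + w * (1 - sim)
        scored.append((d, prof))
    scored.sort(key=lambda pair: pair[0])
    neighbors = scored[:k]
    weights = [1.0 / (d + eps) for d, _ in neighbors]  # closer neighbors count more
    result = list(target)
    for j, v in enumerate(target):
        if v is None:
            total = sum(wt * prof[j] for wt, (_, prof) in zip(weights, neighbors))
            result[j] = total / sum(weights)
    return result
```

With `w=0` the expression-only nearest neighbor wins. With `w=0.5`, a gene that is slightly farther in expression space but GO-related can be chosen instead.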

3.
MOTIVATION: Significance analysis of differential expression in DNA microarray data is an important task. Much current research focuses on developing improved tests and software tools. The task is difficult not only because of the high dimensionality of the data (number of genes), but also because of the often non-negligible presence of missing values. There is thus a great need to reliably impute these missing values before statistical analysis. Many imputation methods have been developed for DNA microarray data, but their impact on statistical analyses has not been well studied. In this work we examine how missing values and their imputation affect significance analysis of differential expression. RESULTS: We develop a new imputation method (LinCmb) that is superior to the widely used methods in terms of normalized root mean squared error. Its estimates are convex combinations of the estimates of existing methods. We find that LinCmb adapts to the structure of the data: if the data are heterogeneous or there are few missing values, LinCmb puts more weight on local imputation methods; if the data are homogeneous or there are many missing values, LinCmb puts more weight on global imputation methods. Thus, LinCmb is a useful tool for understanding the merits of different imputation methods. We also demonstrate that missing values affect significance analysis. Two datasets, different amounts of missing values, different imputation methods, the standard t-test, the regularized t-test and ANOVA are employed in the simulations. We conclude that good imputation alleviates the impact of missing values and should be an integral part of microarray data analysis. The most competitive methods are LinCmb, GMC and BPCA. Popular imputation schemes such as SVD, row mean, and KNN all exhibit high variance and poor performance. The regularized t-test is less affected by missing values than the standard t-test.
AVAILABILITY: Matlab code is available on request from the authors.
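LinCmb's estimates are convex combinations of other methods' estimates. In the two-method case, the weight that minimizes squared error on entries whose true values are known (for example, observed values that were deliberately masked) has a closed form, which is then clipped to [0, 1]. The sketch below covers only that case and adds an NRMSE helper. Normalizing by the population standard deviation of the true values is an assumption, and so are the function names.

```python
import math

def nrmse(estimates, truth):
    """RMSE between estimates and true values, divided by the std of the truth."""
    n = len(truth)
    mse = sum((e - t) ** 2 for e, t in zip(estimates, truth)) / n
    mean_t = sum(truth) / n
    sd = math.sqrt(sum((t - mean_t) ** 2 for t in truth) / n)
    return math.sqrt(mse) / sd

def best_convex_weight(est_a, est_b, truth):
    """alpha in [0, 1] minimizing sum((alpha*a + (1 - alpha)*b - y)^2)."""
    num = sum((a - b) * (y - b) for a, b, y in zip(est_a, est_b, truth))
    den = sum((a - b) ** 2 for a, b in zip(est_a, est_b))
    if den == 0:
        return 0.5  # the two methods agree everywhere, so any weight is optimal
    return min(1.0, max(0.0, num / den))

def combine(est_a, est_b, alpha):
    """Convex combination of two estimate vectors."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(est_a, est_b)]
```

The objective is a convex quadratic in alpha, so clipping the unconstrained minimizer to [0, 1] yields the constrained optimum.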

4.
MOTIVATION: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed because many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in gene expression data; they exploit local similarity structures in the data as well as a least squares optimization process. RESULTS: The proposed local least squares imputation method (LLSimpute) represents a target gene that has missing values as a linear combination of similar genes. The similar genes are chosen either as the k nearest neighbors or as the k coherent genes with large absolute Pearson correlation coefficients. Non-parametric missing value estimation methods based on LLSimpute are designed by introducing an automatic k-value estimator. In our experiments, the proposed LLSimpute method shows competitive results compared with other imputation methods across various datasets and percentages of missing values. AVAILABILITY: The software is available at http://www.cs.umn.edu/~hskim/tools.html CONTACT: hpark@cs.umn.edu
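The local least squares idea can be sketched as follows. First, pick the k candidate genes with the largest absolute Pearson correlation on the target's observed columns. Next, fit coefficients by least squares. Finally, apply those coefficients at the missing columns. This sketch solves the normal equations with a small Gaussian elimination rather than a pseudoinverse. That choice assumes the selected neighbors' observed columns are linearly independent and the candidate genes have no missing values. The automatic k estimator is not reproduced, and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def lls_impute(target, candidates, k):
    """Impute None entries of target as a least-squares combination of k similar genes."""
    obs = [j for j, v in enumerate(target) if v is not None]
    mis = [j for j, v in enumerate(target) if v is None]
    w = [target[j] for j in obs]
    # k most coherent genes by absolute Pearson correlation on observed columns
    nbrs = sorted(candidates, key=lambda g: -abs(pearson([g[j] for j in obs], w)))[:k]
    A = [[g[j] for j in obs] for g in nbrs]                          # k x |obs|
    AAt = [[sum(a * b for a, b in zip(r1, r2)) for r2 in A] for r1 in A]
    Aw = [sum(a * b for a, b in zip(r, w)) for r in A]
    coef = solve(AAt, Aw)                                            # normal equations
    result = list(target)
    for j in mis:
        result[j] = sum(c * g[j] for c, g in zip(coef, nbrs))
    return result
```

When the target is an exact linear combination of its neighbors, for example `2*g1 + g2`, the fitted coefficients recover that combination and the missing entry is reconstructed exactly.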

5.
Wang D, Lv Y, Guo Z, Li X, Li Y, Zhu J, Yang D, Xu J, Wang C, Rao S, Yang B. Bioinformatics (Oxford, England), 2006, 22(23): 2883-2889
MOTIVATION: Microarray datasets frequently contain a large number of missing values (MVs), which need to be estimated and replaced for subsequent data mining. The focus of the paper is to study the effects of different MV treatments for cDNA microarray data on disease classification analysis. RESULTS: By analyzing five datasets, we demonstrate that among the three kinds of classifiers evaluated in this study, support vector machine (SVM) classifiers are robust to varied MV imputation methods [e.g. replacing MVs by zero, the K nearest-neighbor (KNN) imputation algorithm, local least squares imputation and Bayesian principal component analysis], while classification and regression tree classifiers are sensitive in terms of classification accuracy. KNN classifiers built on differentially expressed genes (DEGs) are robust to the varied MV treatments, but the performance of KNN classifiers based on all measured genes can deteriorate significantly when MVs are imputed for genes with a larger missing rate (MR) (e.g. MR > 5%). In general, replacing MVs by zero performs relatively poorly, whereas the other imputation algorithms differ little in their effect on the classification performance of the SVM or KNN classifiers. We further demonstrate the power and feasibility of our recently proposed functional expression profile (FEP) approach as a means of handling microarray data with MVs. The FEPs, which are derived from functional modules enriched with sets of DEGs and can therefore be consistently identified under varied MV treatments, achieve precise disease classification with better biological interpretation. We conclude that the choice of MV treatment should be made in the context of the approaches subsequently used for disease classification. The suggested exclusion criterion of ignoring genes with a larger MR (e.g. >5%), while justifiable for some classifiers such as KNN classifiers, might not be regarded as a general rule for all classifiers.
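The exclusion criterion discussed above (drop genes with MR > 5%), followed by zero replacement of the remaining MVs, takes only a few lines. The default threshold and the function name are illustrative assumptions. As the abstract notes, the criterion should not be treated as a general rule for all classifiers.

```python
def filter_and_zero_impute(matrix, max_missing_rate=0.05):
    """Drop genes (rows) whose missing rate exceeds the threshold, then replace
    the remaining None entries with 0.0 (the 'replace MVs by zero' treatment)."""
    kept = []
    for row in matrix:
        rate = sum(v is None for v in row) / len(row)
        if rate <= max_missing_rate:
            kept.append([0.0 if v is None else v for v in row])
    return kept
```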

6.
7.
8.
Deciphering important genes and pathways from incomplete gene expression data could facilitate a better understanding of cancer. Different imputation methods can be applied to estimate the missing values. In our study, we evaluated various imputation methods for their performance in preserving significant genes and pathways. In the first step, 5% of the genes were randomly selected under two types of missingness mechanisms, ignorable and non-ignorable, with various missing rates. Next, 10 well-known imputation methods were applied to the complete datasets. The significance analysis of microarrays (SAM) method was applied to detect the significant genes in rectal and lung cancers, to showcase the utility of imputation approaches in preserving significant genes. To determine the impact of different imputation methods on the identification of important genes, the chi-squared test was used to compare the proportions of overlap between significant genes detected from the original data and those detected from the imputed datasets. Additionally, the significant genes were tested for enrichment in important pathways using Consensus Path DB. Our results showed that almost all significant genes and pathways of the original dataset could be detected in all imputed datasets, indicating no significant difference in performance among the imputation methods tested. The source code and selected datasets are available at http://profiles.bs.ipm.ir/softwares/imputation_methods/.
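The overlap comparison relies on a chi-squared test. For a 2x2 table, for instance with rows for two imputation methods and columns for significant genes that do or do not overlap the original list (hypothetical counts), Pearson's statistic and its p-value can be computed with the standard library alone. This works because the chi-squared survival function with one degree of freedom equals `erfc(sqrt(x / 2))`. No continuity correction is applied in this sketch; whether the study used one is not stated.

```python
import math

def chi2_2x2(table):
    """Pearson chi-squared statistic (no continuity correction) and p-value, df = 1.

    table = [[a, b], [c, d]]; all row and column totals must be non-zero.
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))  # survival function of chi-squared(1)
    return stat, p_value
```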

9.
MOTIVATION: Missing values are problematic for the analysis of microarray data. Imputation methods have been compared in simulation experiments in terms of the similarity between imputed and true values, not in terms of their influence on the final analysis. The focus has been on values missing at random, although entries may also be missing not at random. RESULTS: We investigate the influence of imputation on the detection of differentially expressed genes from cDNA microarray data. We apply ANOVA for microarrays and SAM and examine the differentially expressed genes that are lost because of imputation. We show that this new measure provides useful information that the traditional root mean squared error cannot capture. We also show that the type of missingness matters: imputing 5% missing not at random has the same effect as imputing 10-30% missing at random. We propose a new imputation method (LinImp), which fits a simple linear model for each channel separately, and compare it with the widely used KNNimpute method. For 10% missing at random, KNNimpute leads to twice as many lost differentially expressed genes as LinImp. AVAILABILITY: The R package for LinImp is available at http://folk.uio.no/idasch/imp.
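The two missingness mechanisms and the "lost differentially expressed genes" measure can be made concrete. In this sketch, MAR drops each entry with equal probability. MNAR is modeled by dropping the lowest-intensity values, which is an assumption; the paper's exact mechanism is not given here. The lost-gene count is the set difference between DE calls on the complete data and DE calls after imputation.

```python
import random

def mask_mar(values, rate, rng):
    """Missing at random: each entry is dropped independently with probability `rate`."""
    return [None if rng.random() < rate else v for v in values]

def mask_mnar(values, rate):
    """Assumed MNAR mechanism: the lowest-intensity `rate` fraction of entries is dropped."""
    n_missing = round(rate * len(values))
    lowest = sorted(range(len(values)), key=lambda i: values[i])[:n_missing]
    drop = set(lowest)
    return [None if i in drop else v for i, v in enumerate(values)]

def lost_de_genes(de_complete, de_imputed):
    """Genes called differentially expressed on complete data but missed after imputation."""
    return set(de_complete) - set(de_imputed)
```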

10.

Background

Regulation mechanisms between miRNAs and genes are complicated. To accomplish a biological function, a miRNA may regulate multiple target genes, and similarly a target gene may be regulated by multiple miRNAs. Wet-lab knowledge of co-regulating miRNAs is limited. This work introduces a computational method that groups miRNAs of similar function to identify co-regulating miRNAs from a similarity matrix of miRNAs.

Results

We define a novel information content of gene ontology (GO) terms to measure the similarity between two sets of GO graphs corresponding to the two sets of target genes of two miRNAs. This between-graph similarity is then used as the functional similarity between the two miRNAs. Our definition of information content is based on the number of a GO term's descendants, adjusted by a weight derived from its depth level and the GO relationships on its path to the root node or to the most informative common ancestor (MICA). Further, a self-tuning technique and the eigenvalues of the normalized Laplacian matrix are applied to determine the optimal parameters for spectral clustering of the miRNA similarity matrix.

Conclusions

Experimental results demonstrate that our method has better clustering performance than existing edge-based, node-based or hybrid methods. The detailed case studies also show that our method is useful for the functional annotation of new miRNAs.
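The clustering step described above, self-tuning scaling followed by eigenvalues of the normalized Laplacian, can be sketched in pure Python under several assumptions. "Self-tuning" is taken to mean the local scaling of Zelnik-Manor and Perona, where sigma_i is the distance to the K-th nearest neighbor. A miRNA similarity s would first have to be converted to a distance, for example 1 - s. Eigenvalues come from a small Jacobi routine, and the number of clusters is chosen at the largest eigengap. The subsequent k-means step on the eigenvectors is omitted.

```python
import math

def self_tuning_affinity(dist, K=2):
    """Local-scaling affinity A_ij = exp(-d_ij^2 / (sigma_i * sigma_j)), zero diagonal."""
    n = len(dist)
    # sigma_i: distance from item i to its K-th nearest neighbor
    sigma = [sorted(dist[i][j] for j in range(n) if j != i)[K - 1] for i in range(n)]
    return [[0.0 if i == j else math.exp(-dist[i][j] ** 2 / (sigma[i] * sigma[j]))
             for j in range(n)] for i in range(n)]

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2}, with D the diagonal degree matrix."""
    deg = [sum(row) for row in A]
    n = len(A)
    return [[(1.0 if i == j else 0.0) - A[i][j] / math.sqrt(deg[i] * deg[j])
             for j in range(n)] for i in range(n)]

def jacobi_eigenvalues(M, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, sorted ascending."""
    n = len(M)
    A = [row[:] for row in M]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))

def eigengap_k(eigs):
    """Number of clusters = position of the largest gap between consecutive eigenvalues."""
    gaps = [eigs[i + 1] - eigs[i] for i in range(len(eigs) - 1)]
    return gaps.index(max(gaps)) + 1
```

For two well-separated groups, the normalized Laplacian has two near-zero eigenvalues followed by a clear jump, so the eigengap rule returns 2.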

11.
Li BQ, Zhang J, Huang T, Zhang L, Cai YD. Biochimie, 2012, 94(9): 1910-1917
This paper presents a new method for identifying retinoblastoma-related genes by integrating gene expression profiles with shortest paths in a functional linkage graph. A weighted functional linkage graph was constructed from existing protein-protein interaction data in STRING. A total of 119 genes consistently differentially expressed between retinoblastoma and normal retina were obtained from the overlap of two gene expression studies of retinoblastoma. The shortest paths between each pair of these 119 genes were then determined with Dijkstra's algorithm. Finally, all genes present on the shortest paths were extracted and ranked by betweenness, and the 119 shortest-path genes with a betweenness greater than 100 and a p-value less than 0.05 were selected for further analysis. We also identified 53 retinoblastoma-related miRNAs from published miRNA array data, and most of the 238 retinoblastoma genes (119 consistently differentially expressed genes and 119 shortest-path genes) were shown to be target genes of these 53 miRNAs. Interestingly, the genes identified from both the gene expression profiles and the functional protein association network included more cancer genes than the genes identified from the gene expression profiles alone. These genes also showed greater functional similarity to reported cancer genes than the genes identified from the gene expression profiles alone. These promising results demonstrate the efficiency of the proposed method.
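The shortest-path step can be sketched as follows. Dijkstra's algorithm finds one minimum-cost path for each pair of seed genes, and each gene is scored by how many of those paths pass through it as an intermediate node. Edge weights are assumed to be costs already, since the abstract does not say how STRING confidence scores were converted to weights. Ties between equal-cost paths are broken arbitrarily, and no p-value is computed. The function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import combinations

def dijkstra_path(graph, src, dst):
    """One minimum-cost path from src to dst; graph is {node: {neighbor: cost}}."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == dst:
            break
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in done:
        return None  # unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def path_betweenness(graph, seeds):
    """Count how many seed-pair shortest paths pass through each intermediate node."""
    counts = Counter()
    for s, t in combinations(seeds, 2):
        path = dijkstra_path(graph, s, t)
        if path:
            counts.update(path[1:-1])  # exclude the two endpoints
    return dict(counts)
```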

12.
Background

There is a growing body of evidence associating microRNAs (miRNAs) with human diseases. MiRNAs are new key players in the disease paradigm, with demonstrated roles in several human diseases. However, the functional association between miRNAs and diseases remains largely unclear and far from complete. With the advent of high-throughput functional genomics techniques that infer genes and biological pathways dysregulated in diseases, it is now possible to infer functional associations between diseases and biological molecules by integrating disparate biological information.

Results

Here, as a proof of concept, we first used a Lasso regression model to identify miRNAs associated with a disease signature. We then proposed an integrated approach that uses disease-gene associations from microarray experiments and text mining, together with miRNA-gene associations from computational predictions and protein networks, to build a functional association network between miRNAs and diseases. The findings of the proposed model were validated against gold-standard datasets using ROC analysis, and the results were promising (AUC = 0.81). Our protein network-based approach discovered 19 new functional associations between prostate cancer and miRNAs. These 19 new associations were validated using miRNA expression data and clinical profiles, and the miRNAs involved were shown to act as diagnostic and prognostic prostate biomarkers. The proposed integrated approach allowed us to reconstruct functional associations between miRNAs and human diseases and uncovered functional roles of newly discovered miRNAs.

Conclusions

Lasso regression was used to find associations between diseases and miRNAs using their gene signatures. Defining the miRNA gene signature by integrating the downstream effects of miRNAs performed better than the miRNA signature alone. Integrating biological networks and multiple data sources to define miRNA and disease gene signatures achieved high performance in uncovering new functional associations between miRNAs and diseases.
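The Lasso step can be illustrated with cyclic coordinate descent on (1/(2n))·||y - Xb||² + λ·||b||₁. In a setting like the one described, y might be a disease signature and each column of X a candidate miRNA's gene-signature score; that mapping is hypothetical. Nonzero coefficients then mark the selected miRNAs. The sketch assumes centered data with no intercept, and the function names are illustrative.

```python
def soft_threshold(z, lam):
    """Proximal operator of lam * |b|."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows with p features each; columns are assumed centered.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b for b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # correlation of feature j with the partial residual (its own effect added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b
```

With orthogonal columns, each coefficient is simply the soft-thresholded least-squares value, so a large enough λ drives every coefficient to exactly zero.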

13.
14.
BACKGROUND: Orofacial development is a multifaceted process involving precise spatio‐temporal expression of a panoply of genes. MicroRNAs (miRNAs), the largest family of noncoding RNAs involved in gene silencing, are critical regulators of cell and tissue differentiation. MicroRNA gene expression profiling is an effective means of acquiring novel and valuable information on the expression and regulation of miRNA-controlled genes involved in mammalian orofacial development. METHODS: To identify miRNAs differentially expressed during mammalian orofacial ontogenesis, miRNA expression profiles from gestation day (GD) ‐12, ‐13 and ‐14 murine orofacial tissue were compared using miRXplore microarrays from Miltenyi Biotec. Quantitative real‐time PCR was used to validate gene expression changes. Cluster analysis of the microarray data was conducted with the clValid R package and the UPGMA clustering method. Functional relationships between selected miRNAs were investigated using Ingenuity Pathway Analysis. RESULTS: Expression of over 26% of the 588 murine miRNA genes examined was detected in murine orofacial tissues from GD‐12 to GD‐14. Among these expressed genes, several clusters were developmentally regulated. miRNAs differentially expressed within such clusters were shown to target genes encoding proteins involved in cell proliferation, cell adhesion, differentiation, apoptosis and epithelial‐mesenchymal transformation, all processes critical for normal orofacial development. CONCLUSIONS: Using miRNA microarray technology, unique gene expression signatures of hundreds of miRNAs in embryonic orofacial tissue were defined. Gene targeting and functional analysis revealed that the expression of numerous protein‐encoding genes crucial to normal orofacial ontogeny may be regulated by specific miRNAs. Birth Defects Research (Part A), 2010. © 2010 Wiley‐Liss, Inc.
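UPGMA is average-linkage agglomerative clustering. It repeatedly merges the closest pair of clusters and sets the distance from the merged cluster to every other cluster to the size-weighted mean of its members' distances. The study ran UPGMA through the clValid R package. The Python sketch below only illustrates the merge rule on a toy distance matrix, and using a measure such as 1 - Pearson correlation between miRNA profiles as the distance would be an assumption.

```python
def upgma(dist, labels):
    """Average-linkage (UPGMA) clustering; returns merges as (members_a, members_b, distance)."""
    clusters = {i: (label,) for i, label in enumerate(labels)}
    size = {i: 1 for i in clusters}
    # pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda item: item[1])
        merges.append((clusters[a], clusters[b], dab))
        new = next_id
        next_id += 1
        for c in clusters:
            if c in (a, b):
                continue
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            # size-weighted mean of the member distances
            d[(c, new)] = (size[a] * dac + size[b] * dbc) / (size[a] + size[b])
        clusters[new] = clusters[a] + clusters[b]
        size[new] = size[a] + size[b]
        for x in (a, b):
            del clusters[x]
            del size[x]
        d = {key: val for key, val in d.items() if a not in key and b not in key}
    return merges
```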

15.
Gaussian mixture clustering and imputation of microarray data   (times cited: 3; self-citations: 0; citations by others: 3)
MOTIVATION: In microarray experiments, missing entries arise from blemishes on the chips. In large-scale studies, virtually every chip contains some missing entries, and more than 90% of the genes are affected. Many analysis methods require a full set of data, so either the genes with missing entries are excluded or the missing entries are filled with estimates before analysis. This study compares methods of missing value estimation. RESULTS: Two evaluation metrics of imputation accuracy are employed. First, the root mean squared error measures the difference between the true values and the imputed values. Second, the number of mis-clustered genes measures the difference between clustering with true values and clustering with imputed values; it examines the bias that imputation introduces into clustering. According to both evaluation metrics, Gaussian mixture clustering with model averaging imputation is superior to all other imputation methods compared, on both time-series (correlated) and non-time-series (uncorrelated) datasets.
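The two evaluation metrics can be written down directly. RMSE is standard. For the mis-clustered count, this sketch uses one assumed reading of the metric: it compares cluster labels obtained from true and from imputed data after choosing the relabeling of the imputed clusters that minimizes disagreements. Brute force over permutations is practical only for a handful of clusters; the Hungarian algorithm is the usual replacement for larger k.

```python
import math
from itertools import permutations

def rmse(imputed, truth):
    """Root mean squared error between imputed and true values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(imputed, truth)) / len(truth))

def n_misclustered(labels_true, labels_imp):
    """Genes whose cluster differs between the two clusterings, minimized over relabelings."""
    ks = sorted(set(labels_true) | set(labels_imp))
    best = len(labels_true)
    for perm in permutations(ks):
        mapping = dict(zip(ks, perm))  # relabel imputed-data clusters
        mismatches = sum(mapping[b] != a for a, b in zip(labels_true, labels_imp))
        best = min(best, mismatches)
    return best
```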

16.
Antiquity of microRNAs and their targets in land plants   (times cited: 25; self-citations: 0; citations by others: 25)
Axtell MJ, Bartel DP. The Plant cell, 2005, 17(6): 1658-1673

17.
The mechanisms of latent tuberculosis (TB) infection remain elusive. The roles of microRNAs (miRNAs) in pathogen–host interactions have recently been highlighted. To identify miRNAs involved in the immune response to TB, miRNA expression profiles in CD4+ T cells from patients with latent TB, patients with active TB and healthy controls were investigated by microarray assay and validated by RT‐qPCR. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were used to identify the significant functions of the differentially expressed miRNAs and the signalling pathways they are involved in. To identify potential target genes of miR‐29, interferon‐γ (IFN‐γ) mRNA expression was measured by RT‐qPCR. Our results showed that 27 miRNAs were deregulated among the three groups, and the RT‐qPCR results were generally consistent with the microarray data. We observed an inverse correlation between the miR‐29 level and IFN‐γ mRNA expression in CD4+ T cells. GO and KEGG pathway analyses showed that the putative target genes of the deregulated miRNAs were significantly enriched in the mitogen‐activated protein kinase signalling pathway, focal adhesion and extracellular matrix receptor interaction, which might be involved in the transition from latent to active TB. In summary, our study revealed for the first time that some miRNAs in CD4+ T cells are altered in latent and active TB. Function and pathway analysis highlighted the possible involvement of miRNA‐deregulated mRNAs in TB. The study may help improve understanding of the relationship between miRNAs in CD4+ T cells and TB, and lays an important foundation for further identification of the mechanisms underlying latent TB infection and its reactivation.

18.
19.
20.
To identify novel as well as conserved miRNAs in citrus, deep sequencing of a small RNA library combined with microarray analysis was performed in precocious trifoliate orange (an early-flowering mutant of trifoliate orange, Poncirus trifoliata L. Raf.), yielding a total of 114 conserved miRNAs belonging to 38 families and 155 novel miRNAs. The miRNA star sequences of 39 conserved miRNAs and 27 novel miRNAs were also discovered among the newly identified miRNAs, providing additional evidence for the existence of these miRNAs. Through degradome sequencing, 172 and 149 genes were identified as targets of conserved and novel miRNAs, respectively. GO and KEGG annotation revealed that the high-ranked miRNA target genes were those implicated in biological and metabolic processes. To characterize the miRNAs expressed at the juvenile and adult developmental stages of citrus, their expression profiles were further analyzed by hybridization to a commercial microarray and by real-time PCR. The results revealed that some miRNAs were down-regulated at the adult stage compared with the juvenile stage. A detailed comparison of the expression patterns of some miRNAs and their corresponding target genes revealed a negative correlation between them, although a few pairs were positively correlated.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP filing No. 09084417