Similar literature
20 similar articles found
2.
Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D-4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Developing computational methods to compare, annotate, and model gene expression based on images is imperative, considering that available data are rapidly increasing. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity among a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach to embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide for a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach reaches similar or better classification of expression patterns at different developmental stages, when compared to other automatic image annotation methods using thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself, as well as its application for analysis tasks such as automated annotation, can provide insight into biological questions.

6.
We describe Sebida, a database of genes with sex-biased expression. The database integrates results from multiple, independent microarray studies comparing male and female gene expression in Drosophila melanogaster, Drosophila simulans and Anopheles gambiae. Sebida uses standard nomenclature, which allows individual genes to be compared across different microarray platforms and to be queried by gene name, symbol, or annotation number. In addition to ratios of male/female expression for each gene, Sebida also contains information useful for evolutionary studies, such as local recombination rate, degree of codon bias and interspecific divergence at synonymous and non-synonymous sites. AVAILABILITY: Sebida can be accessed at http://www.sebida.de
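
The male/female expression ratio mentioned above can be illustrated with a minimal sketch. The gene names, expression values, the two-fold cutoff and the `sex_bias` helper are all hypothetical choices made for illustration; they are not taken from Sebida or from the studies it integrates.

```python
import math


def sex_bias(male, female, fold=2.0):
    """Return (log2 male/female ratio, label) for one gene.

    The two-fold default cutoff is an illustrative assumption,
    not a threshold defined by the database.
    """
    if male <= 0 or female <= 0:
        raise ValueError("expression values must be positive")
    log_ratio = math.log2(male / female)
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        label = "male-biased"
    elif log_ratio <= -cutoff:
        label = "female-biased"
    else:
        label = "unbiased"
    return log_ratio, label


# Hypothetical (male, female) expression values, not real data.
genes = {
    "geneA": (800.0, 100.0),
    "geneB": (50.0, 400.0),
    "geneC": (120.0, 100.0),
}
results = {name: sex_bias(m, f) for name, (m, f) in genes.items()}
for name, (ratio, label) in results.items():
    print(f"{name}: log2(M/F) = {ratio:+.2f} -> {label}")
```

Working on the log2 scale makes male-biased and female-biased genes symmetric around zero, which is why the same cutoff is applied with opposite signs.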

10.
Microarray reality checks in the context of a complex disease   Cited by: 9 (self-citations: 0, citations by others: 9)
A problem in analyzing microarray-based gene expression data is the separation of genes causally involved in a disease from innocent bystander genes, whose expression levels have been secondarily altered by primary changes elsewhere. To investigate this issue systematically in the context of a class of complex human diseases, we have compared microarray-based gene expression data with non-microarray-based clinical and biological data about the schizophrenias to ask whether these two approaches prioritize the same genes. We find that genes whose expression changes are deemed to be of importance from microarrays are rarely those classified as of importance from clinical, in situ, molecular, single-nucleotide polymorphism (SNP) association, knockout and drug perturbation data. This disparity is not limited to the schizophrenias but characterizes other human disease data sets. It also extends to biological validation of microarray data in model organisms, in which genome-wide phenotypic data have been systematically compared with microarray data. In addition, different bioinformatic protocols applied to the same microarray data yield quite different gene sets and thus make clinical decisions less straightforward. We discuss how progress may be improved in the clinical area by the assignment of high-quality phenotypic values to each member of a microarray-assigned gene set.
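
One simple way to ask whether two approaches prioritize the same genes is to measure the overlap between their gene lists. The sketch below uses the Jaccard index for this; the abstract does not state which comparison measure the authors used, and the gene identifiers and the `overlap_summary` helper are hypothetical.

```python
def overlap_summary(microarray_genes, other_genes):
    """Return the shared genes and the Jaccard index of two gene lists."""
    a, b = set(microarray_genes), set(other_genes)
    shared = a & b
    union = a | b
    # Jaccard index: size of the intersection over size of the union.
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard


# Hypothetical gene lists, not taken from the study.
microarray_hits = ["G1", "G2", "G3", "G4", "G5"]
clinical_hits = ["G4", "G6", "G7", "G8"]

shared, jaccard = overlap_summary(microarray_hits, clinical_hits)
print(f"shared genes: {shared}, Jaccard index: {jaccard:.3f}")
```

A Jaccard index near zero, as in this toy example, corresponds to the kind of disparity the abstract describes: the two sources of evidence rarely point to the same genes.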

12.

Background

With the rapid accumulation of genomic data, annotating and interpreting these data has become a challenging issue. As a representative approach, gene set enrichment analysis has been widely used to interpret large molecular datasets generated by biological experiments. The result of gene set enrichment analysis relies heavily on the quality and integrity of gene set annotations. Although several methods have been developed to annotate gene sets, there is still a lack of high-quality annotation methods. Here, we propose a novel method to improve annotation accuracy by combining the GO structure and gene expression data.

Results

We propose a novel approach for optimizing gene set annotations to obtain more accurate annotation results. The proposed method filters inconsistent annotations using GO structure information and probabilistic gene set clusters calculated over a range of cluster sizes on multiple bootstrap-resampled datasets. The proposed method is employed to analyze p53 cell line, colon cancer and breast cancer gene expression data. The experimental results show that the proposed method can filter out a number of annotations unrelated to the experimental data, increase gene set enrichment power and decrease the inconsistency of annotations.

Conclusions

A novel gene set annotation optimization approach is proposed to improve the quality of gene annotations. Experimental results indicate that the proposed method effectively improves gene set annotation quality based on the GO structure and gene expression data.

13.
MOTIVATION: Gene expression patterns obtained by in situ mRNA hybridization provide important information about different genes during Drosophila embryogenesis. So far, annotations of these images are done by manually assigning a subset of anatomy ontology terms to an image. This time-consuming process depends heavily on the consistency of experts. RESULTS: We develop a system to automatically annotate a fruitfly's embryonic tissue in which a gene has expression. We formulate the task as an image pattern recognition problem. For a new fly embryo image, our system answers two questions: (1) Which stage range does an image belong to? (2) Which annotations should be assigned to an image? We propose to identify the wavelet embryo features by multi-resolution 2D discrete wavelet transform, followed by min-redundancy max-relevance feature selection, which yields optimal distinguishing features for an annotation. We then construct a series of parallel bi-class predictors to solve the multi-objective annotation problem since each image may correspond to multiple annotations. SUPPLEMENTARY INFORMATION: The complete annotation prediction results are available at: http://www.cs.niu.edu/~jzhou/papers/fruitfly and http://research.janelia.org/peng/proj/fly_embryo_annotation/. The datasets used in experiments will be available upon request to the corresponding author.
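
The multi-resolution 2D discrete wavelet transform step can be sketched as follows. The abstract does not name the wavelet family, so the orthonormal Haar wavelet is assumed here purely for illustration; the toy image and the helper names `haar2d_level` and `wavelet_features` are hypothetical, and the subsequent feature selection and classification steps are not shown.

```python
def haar2d_level(image):
    """One level of an orthonormal 2D Haar transform on 2x2 blocks.

    Returns four sub-bands: approximation, left/right detail,
    top/bottom detail and diagonal detail.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image sides must be even")
    approx, left_right, top_bottom, diagonal = [], [], [], []
    for r in range(0, rows, 2):
        a_row, lr_row, tb_row, dg_row = [], [], [], []
        for c in range(0, cols, 2):
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((tl + tr + bl + br) / 2)
            lr_row.append((tl - tr + bl - br) / 2)
            tb_row.append((tl + tr - bl - br) / 2)
            dg_row.append((tl - tr - bl + br) / 2)
        approx.append(a_row)
        left_right.append(lr_row)
        top_bottom.append(tb_row)
        diagonal.append(dg_row)
    return approx, left_right, top_bottom, diagonal


def wavelet_features(image, levels):
    """Flatten the detail bands of each level plus the final approximation."""
    features = []
    current = image
    for _ in range(levels):
        current, lr, tb, dg = haar2d_level(current)
        for band in (lr, tb, dg):
            features.extend(v for row in band for v in row)
    features.extend(v for row in current for v in row)
    return features


# Toy 4x4 "image" with expression in the top-left quadrant (made-up data).
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
features = wavelet_features(image, levels=2)
print(len(features), features[-4:])
```

Each level halves the resolution of the approximation band, so repeating the step captures structure at progressively coarser scales; the feature vector keeps the same length and the same sum of squares as the input image.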

16.
A key challenge in genetics is identifying the functional roles of genes in pathways. Numerous functional genomics techniques (e.g. machine learning) that predict protein function have been developed to address this question. These methods generally build from existing annotations of genes to pathways and thus are often unable to identify additional genes participating in processes that are not already well studied. Many of these processes are well studied in some organism, but not necessarily in an investigator's organism of interest. Sequence-based search methods (e.g. BLAST) have been used to transfer such annotation information between organisms. We demonstrate that functional genomics can complement traditional sequence similarity to improve the transfer of gene annotations between organisms. Our method transfers annotations only when functionally appropriate as determined by genomic data and can be used with any prediction algorithm to combine transferred gene function knowledge with organism-specific high-throughput data to enable accurate function prediction. We show that diverse state-of-the-art machine learning algorithms leveraging functional knowledge transfer (FKT) dramatically improve their accuracy in predicting gene-pathway membership, particularly for processes with little experimental knowledge in an organism. We also show that our method compares favorably to annotation transfer by sequence similarity. Next, we deploy FKT with a state-of-the-art SVM classifier to predict novel genes for 11,000 biological processes across six diverse organisms and expand the coverage of accurate function predictions to processes that are often ignored because of a dearth of annotated genes in an organism. Finally, we perform in vivo experimental investigation in Danio rerio and confirm the regulatory role of our top predicted novel gene, wnt5b, in leftward cell migration during heart development.
FKT is immediately applicable to many bioinformatics techniques and will help biologists systematically integrate prior knowledge from diverse systems to direct targeted experiments in their organism of study.

19.
Recent whole-genome studies and in-depth expressed sequence tag (EST) analyses have identified most of the developmentally relevant genes in the urochordate, Ciona intestinalis. In this study, we made use of a large-scale oligo-DNA microarray to further investigate and identify genes with specific or correlated expression profiles, and we report global gene expression profiles for about 66% of all the C. intestinalis genes that are expressed during its life cycle. We succeeded in categorizing the data set into 5 large clusters and 49 sub-clusters based on the expression profile of each gene. This revealed the higher order of gene expression profiles during the developmental and aging stages. Furthermore, a combined analysis of microarray data with the EST database revealed the gene groups that were expressed at a specific stage or in a specific organ of the adult. This study provides insights into the complex structure of ascidian gene expression, identifies co-expressed gene groups and marker genes and makes predictions for the biological roles of many uncharacterized genes. This large-scale oligo-DNA microarray for C. intestinalis should facilitate the understanding of global gene expression and gene networks during the development and aging of a basal chordate.

20.
As the volume of genomic data grows, computational methods become essential for providing a first glimpse into gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, with main focus on annotation precision, and heuristic alternatives, with main focus on scalability issues, have been described in the literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of the most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum.
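
The idea of turning raw GO-term predictions into hierarchy-consistent marginal probabilities can be illustrated on a tiny example. The sketch below is not the authors' algorithm: instead of iterative message passing it enumerates every joint assignment, which is only feasible for a handful of terms (on a tree-shaped graph, sum-product message passing would return the same marginals). The factor definitions, the term names, the raw scores and the single-parent hierarchy are all assumptions made for illustration.

```python
from itertools import product


def consistent_marginals(raw_scores, parent_of):
    """Marginal probability of each term under a hierarchy constraint.

    raw_scores: term -> raw classifier probability (treated as noisy evidence).
    parent_of:  child term -> parent term; an assignment in which a child is
                on while its parent is off gets zero weight.
    """
    terms = sorted(raw_scores)
    totals = {t: 0.0 for t in terms}
    partition = 0.0
    for states in product((0, 1), repeat=len(terms)):
        assignment = dict(zip(terms, states))
        # Consistency factor: reject child = 1 with parent = 0.
        if any(assignment[c] and not assignment[p] for c, p in parent_of.items()):
            continue
        # Evidence factors: one per term, from the raw prediction.
        weight = 1.0
        for t in terms:
            weight *= raw_scores[t] if assignment[t] else 1.0 - raw_scores[t]
        partition += weight
        for t in terms:
            if assignment[t]:
                totals[t] += weight
    if partition == 0.0:
        raise ValueError("raw scores leave no consistent assignment")
    return {t: totals[t] / partition for t in terms}


# Hypothetical three-term chain: leaf -> mid -> root (names are made up).
raw = {"root": 0.9, "mid": 0.4, "leaf": 0.8}
hierarchy = {"mid": "root", "leaf": "mid"}
marginals = consistent_marginals(raw, hierarchy)
for term in ("root", "mid", "leaf"):
    print(f"{term}: raw = {raw[term]:.2f}, marginal = {marginals[term]:.3f}")
```

In this toy case the raw scores violate the hierarchy (the leaf scores higher than its parent), while the resulting marginals decrease monotonically from root to leaf, which is the consistency property that hierarchical ensemble methods aim for.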
