Similar literature
20 similar articles found.
1.
Gene expression array technology has made possible the assay of expression levels of tens of thousands of genes at a time; large databases of such measurements are currently under construction. One important use of such databases is the ability to search for experiments whose gene expression levels are similar to those of a query, potentially identifying previously unsuspected relationships among cellular states. Such searches depend crucially on the metric used to assess the similarity between pairs of experiments. The complex joint distribution of gene expression levels, particularly their correlational structure and non-normality, makes simple similarity metrics such as Euclidean distance or correlational similarity scores suboptimal for use in this application. We present a similarity metric for gene expression array experiments that takes into account the complex joint distribution of expression values. We provide a computationally tractable approximation to this measure, and have implemented a database search tool based on it. We discuss implementation issues and efficiency, and we compare our new metric to other standard metrics.
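
To make the two baseline metrics named in this abstract concrete, the following minimal sketch computes Euclidean distance and Pearson correlation for two expression profiles. It does not implement the authors' metric, which the abstract does not specify; the function names and the toy profiles are assumptions made purely for illustration.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy profiles: `scaled` is `query` with every value doubled.
query = [1.0, 2.0, 3.0, 4.0]
scaled = [2.0, 4.0, 6.0, 8.0]

distance = euclidean_distance(query, scaled)
correlation = pearson_correlation(query, scaled)
print(f"Euclidean distance: {distance:.3f}")
print(f"Pearson correlation: {correlation:.3f}")
```

The two profiles are perfectly correlated yet separated by a non-zero Euclidean distance, a small reminder that the choice of metric changes which experiments count as similar.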

2.
3.
4.
Wang Y, Robbins KR, Rekaya R. PLoS ONE 2010, 5(10): e13239
Assessing conservation/divergence of gene expression across species is important for understanding the evolution of gene regulation. Although advances in microarray technology have provided massive high-dimensional gene expression data, the analysis of such data is still challenging. To date, assessing cross-species conservation of gene expression using microarray data has been mainly based on comparison of expression patterns across corresponding tissues, or comparison of co-expression of a gene with a reference set of genes. Because direct and reliable high-throughput experimental data on conservation of gene expression are often unavailable, the assessment of these two computational models is very challenging and has not been reported yet. In this study, we compared one corresponding-tissue-based method and three co-expression-based methods for assessing conservation of gene expression, in terms of their pair-wise agreements, using a frequently used human-mouse tissue expression dataset. We find that 1) the co-expression-based methods are only moderately correlated with the corresponding-tissue-based methods, 2) the reliability of co-expression-based methods is affected by the size of the reference ortholog set, and 3) the corresponding-tissue-based methods may lose some information for assessing conservation of gene expression. We suggest that the use of either of these two computational models to study the evolution of a gene's expression may be subject to great uncertainty, and that it is necessary to investigate changes both in gene expression patterns over corresponding tissues and in co-expression of the gene with other genes.

5.
6.
7.
MOTIVATION: Many bioinformatic approaches exist for finding novel genes within genomic sequence data. Traditionally, homology search-based methods are often the first approach employed in determining whether a novel gene exists that is similar to a known gene. Unfortunately, distantly related genes or motifs are often difficult to find using single query-based homology search algorithms against large sequence datasets such as the human genome. Therefore, the motivation behind this work was to develop an approach to enhance the sensitivity of traditional single query-based homology algorithms against genomic data without losing search selectivity. RESULTS: We demonstrate that by searching against a genome fragmented into all possible reading frames, the sensitivity of homology-based searches is enhanced without degrading their selectivity. Using the ETS-domain, bromodomain and acetyl-CoA acetyltransferase gene as queries, we were able to demonstrate that direct protein-protein searches using BLAST2P or FASTA3 against a human genome segmented among all possible reading frames and translated were substantially more sensitive than traditional protein-DNA searches against a raw genomic sequence using an application such as TBLAST2N. Receiver operating characteristic analysis was employed to demonstrate that the algorithms remained selective, while comparisons of the algorithms showed that the protein-protein searches were more sensitive in identifying hits. Therefore, through the overprediction of reading frames by this method and the increased sensitivity of protein-protein based homology search algorithms, a genome can be deeply mined, potentially finding hits overlooked by protein-DNA searches against raw genomic data.
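
The idea of translating a sequence in all possible reading frames can be sketched as a six-frame translation of a short DNA string. This is only an illustration under stated assumptions: it uses the standard genetic code, the function names and the example sequence are hypothetical, and it does not reproduce the authors' genome fragmentation pipeline or any BLAST or FASTA step.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(dna, offset):
    """Translate complete codons starting at `offset`; unknown codons give 'X'."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Translate a DNA string in three forward and three reverse frames."""
    dna = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(dna, offset)
        frames[f"-{offset + 1}"] = translate_frame(reverse, offset)
    return frames


# Hypothetical nine-base example sequence.
frames = six_frame_translation("ATGGCCTAA")
for name in sorted(frames):
    print(name, frames[name])
```

Each translated frame could then serve as a protein database entry for a protein-protein search, which is the comparison the abstract describes against protein-DNA searches of the raw sequence.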

8.
9.
10.
11.
12.
Quantitative proteomics relies on accurate protein identification, which often is carried out by automated searching of a sequence database with tandem mass spectra of peptides. When these spectra contain limited information, automated searches may lead to incorrect peptide identifications. It is therefore necessary to validate the identifications by careful manual inspection of the mass spectra. Not only is this task time-consuming, but the reliability of the validation varies with the experience of the analyst. Here, we report a systematic approach to evaluating peptide identifications made by automated search algorithms. The method is based on the principle that the candidate peptide sequence should adequately explain the observed fragment ions. Also, the mass errors of neighboring fragments should be similar. To evaluate our method, we studied tandem mass spectra obtained from tryptic digests of E. coli and HeLa cells. Candidate peptides were identified with the automated search engine Mascot and subjected to the manual validation method. The method found correct peptide identifications that were given low Mascot scores (e.g., 20-25) and incorrect peptide identifications that were given high Mascot scores (e.g., 40-50). The method comprehensively detected false results from searches designed to produce incorrect identifications. Comparison of the tandem mass spectra of synthetic candidate peptides to the spectra obtained from the complex peptide mixtures confirmed the accuracy of the evaluation method. Thus, the evaluation approach described here could help boost the accuracy of protein identification, increase the number of peptides identified, and provide a step toward developing a more accurate next-generation algorithm for protein identification.

13.
Comparing the gene-expression profiles of sick and healthy individuals can help in understanding disease. Such differential expression analysis is a well-established way to find gene sets whose expression is altered in the disease. Recent approaches to gene-expression analysis go a step further and seek differential co-expression patterns, wherein the level of co-expression of a set of genes differs markedly between disease and control samples. Such patterns can arise from a disease-related change in the regulatory mechanism governing that set of genes, and pinpoint dysfunctional regulatory networks. Here we present DICER, a new method for detecting differentially co-expressed gene sets using a novel probabilistic score for differential correlation. DICER goes beyond standard differential co-expression and detects pairs of modules showing differential co-expression. The expression profiles of genes within each module of the pair are correlated across all samples. The correlation between the two modules, however, differs markedly between the disease and normal samples. We show that DICER outperforms the state of the art in terms of significance and interpretability of the detected gene sets. Moreover, the gene sets discovered by DICER manifest regulation by disease-specific microRNA families. In a case study on Alzheimer's disease, DICER dissected biological processes and protein complexes into functional subunits that are differentially co-expressed, thereby revealing inner structures in disease regulatory networks.
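
The notion of differential co-expression can be illustrated with a deliberately simple stand-in: compute the Pearson correlation of one gene pair separately in control and disease samples and take the difference. This is not DICER's probabilistic score, which the abstract does not define; the function names and the expression values below are hypothetical.

```python
import math


def pearson_correlation(a, b):
    """Pearson correlation between two equal-length expression vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_a * var_b)


def differential_correlation(control, disease):
    """Correlation in disease minus correlation in control for one gene pair.

    Each argument is a (gene_x_values, gene_y_values) tuple for that group.
    """
    r_control = pearson_correlation(*control)
    r_disease = pearson_correlation(*disease)
    return r_disease - r_control


# Hypothetical expression values for one gene pair across five samples per group.
control = ([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
disease = ([1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 8.0, 6.0, 4.0, 2.0])

delta = differential_correlation(control, disease)
print(f"change in correlation (disease - control): {delta:+.2f}")
```

A large absolute change flags a gene pair whose co-expression differs between groups; a real analysis would also need a significance assessment, which this sketch omits.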

14.
The rapid development of microarray technologies has led to a similar progression in gene expression analysis methods, gene expression applications, and gene expression databases. Public gene expression databases enable any researcher to examine expression of their favorite genes across a wide variety of samples, download sample data for development of new analysis methods, or answer broad questions about gene expression regulation, among other applications. A wide variety of public gene expression databases exist, and they vary in their content, analysis capabilities, and ease of use. This review highlights the current features and describes examples of two broad categories of mammalian microarray databases: tissue gene expression databases and data warehouses.

15.
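WRKY proteins are a class of transcription factors that play important regulatory roles in plant growth and development and in responses to biotic and abiotic stress. Using pomegranate whole-genome data and bioinformatic methods, this study systematically analyzed the members of the pomegranate WRKY transcription factor family with respect to protein physicochemical properties, phylogeny, gene structure, conserved motifs, cis-acting elements, protein-protein interactions, gene co-expression, and transcriptome expression patterns. A total of 69 PgWRKY genes were identified; group classification and phylogenetic analysis showed that the WRKY proteins fall into three major types, I, II, and III. Cis-acting element analysis indicated that PgWRKY genes are widely involved in abiotic stress; protein interaction network and co-expression analysis suggested that PgWRKY genes may act in concert and be induced simultaneously in response to the same stress; RNA-Seq data analysis showed that PgWRKY genes exhibit a degree of tissue-specific expression and are widely involved in vegetative growth, reproductive growth, and root stress responses.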

16.
MOTIVATION: Genome projects have produced large amounts of data on the sequences of new genes whose functions are as yet unknown. The functions of new genes are usually inferred by comparing their sequences with those of known genes, but evaluation of the sequence homology of individual genes does not make the most of the available sequence information. Therefore, new methods and tools for extracting more biological information from homology searches would be advantageous. RESULTS: We have developed a computational tool, ORI-GENE, to analyze the results of sequence homology searches from the perspective of the evolution of selected sets of new genes. ORI-GENE has a graphical interface and accomplishes two important tasks: first, based on the output of homology searches, it identifies species with similar genes and displays their pattern of distribution on the phylogenetic tree. This function enables one to infer the way in which a given gene may have propagated among species over time. Second, from the distribution patterns, it predicts the point at which a given gene may have been first acquired (i.e. its 'origin'), then classifies the gene on that basis. Because it makes use of available evolutionary information to show the way in which genes cluster among species, ORI-GENE should be an effective tool for the screening and classification of new genes revealed by genome analysis. AVAILABILITY: ORI-GENE is retrievable via the Internet at: http://www.rtc.riken.go.jp/jouhou/ORI-GENE.

17.
18.
19.
Genomics 2020, 112(5): 3157-3165
Identifying genes involved in functional differences between similar tissues from expression profiles is challenging, because the expected differences in expression levels are small. To exemplify this challenge, we studied the expression profiles of two skeletal muscles, deltoid and biceps, in healthy individuals. We provide a series of guides and recommendations for the analysis of this type of study. These include how to account for batch effects and inter-individual differences to optimize the detection of gene signatures associated with tissue function. We provide guidance on the selection of optimal settings for constructing gene co-expression networks through parameter sweeps of settings and calculation of the overlap with an established knowledge network. Our main recommendation is to use a combination of data-driven approaches, such as differential gene expression analysis and gene co-expression network analysis, and hypothesis-driven approaches, such as gene set connectivity analysis. Accordingly, we detected differences in metabolic gene expression between deltoid and biceps that were supported by both data- and hypothesis-driven approaches. Finally, we provide a bioinformatic framework that supports the biological interpretation of expression profiles from related tissues using this combination of approaches, which is available at github.com/tabbassidaloii/AnalysisFrameworkSimilarTissues.
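
The parameter sweep mentioned in this abstract can be sketched, under simplifying assumptions, as trying several correlation thresholds, building a co-expression edge set at each, and scoring its overlap with a reference network. The Jaccard index used as the overlap measure, the gene names, the correlation values, and the thresholds are all assumptions for illustration; the abstract does not state which settings or overlap measure the authors used.

```python
def edges_above(correlations, threshold):
    """Gene pairs whose absolute correlation reaches the threshold."""
    return {pair for pair, r in correlations.items() if abs(r) >= threshold}


def jaccard(a, b):
    """Jaccard index of two edge sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical pairwise correlations, keyed by alphabetically ordered gene pairs.
correlations = {
    ("geneA", "geneB"): 0.9,
    ("geneA", "geneC"): 0.6,
    ("geneB", "geneC"): 0.3,
    ("geneC", "geneD"): 0.8,
}
# Hypothetical reference ("knowledge") network using the same pair ordering.
knowledge_network = {("geneA", "geneB"), ("geneC", "geneD")}

# Sweep the threshold and record the overlap with the reference network.
sweep = {
    threshold: jaccard(edges_above(correlations, threshold), knowledge_network)
    for threshold in (0.2, 0.5, 0.7)
}
best_threshold = max(sweep, key=sweep.get)

for threshold, overlap in sweep.items():
    print(f"threshold {threshold}: overlap {overlap:.2f}")
print("best threshold:", best_threshold)
```

Picking the setting with the highest overlap is one simple selection rule; other criteria, such as network size or stability, may matter just as much in practice.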

20.