Similar articles
 Found 20 similar articles (search time: 890 ms)
1.
MOTIVATION: Genes are typically expressed in a modular manner in biological processes. Recent studies reflect this feature by directly scoring gene sets when analyzing gene expression patterns. Gene annotations have been used to define the gene sets, which have served to reveal specific biological themes from expression data. However, current annotations have limited analytical power, because they are classified by single categories and provide only unary information for the gene sets. RESULTS: Here we propose a method for discovering composite biological themes from expression data. We intersected two annotated gene sets from different categories of Gene Ontology (GO). We then scored the expression changes of all the single and intersected sets. In this way, we were able to uncover, for example, a gene set with the molecular function F and the cellular component C that showed significant expression change, while the changes in the individual gene sets were not significant. We provided an exemplary analysis for HIV-1 immune response. In addition, we tested the method on 20 public datasets, where we found many 'filtered' composite terms, whose number reached on average approximately 34% (under a strong criterion, 5% significance) of the number of significant unary terms. By using composite annotation, we can derive new and improved information about disease and biological processes from expression data. AVAILABILITY: We provide a web application (ADGO: http://array.kobic.re.kr/ADGO) for the analysis of differentially expressed gene sets with composite GO annotations. The user can analyze Affymetrix and dual channel array (spotted cDNA and spotted oligo microarray) data for four species: human, mouse, rat and yeast. CONTACT: chu@kribb.re.kr SUPPLEMENTARY INFORMATION: http://array.kobic.re.kr/ADGO.
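The intersection-and-scoring idea in entry 1 can be sketched as follows. This is a minimal illustration, not the ADGO implementation: the abstract does not say how the sets are scored, so the z-score used here is an assumption, and the function names (`z_score`, `composite_sets`), the `min_size` cutoff, and the toy gene scores are all hypothetical.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev


def z_score(gene_set, scores):
    """Z-score of the set's mean score against the mean over all genes (assumed scoring rule)."""
    values = [scores[g] for g in gene_set if g in scores]
    if not values:
        return 0.0
    all_values = list(scores.values())
    sigma = pstdev(all_values)
    if sigma == 0:
        return 0.0
    return (mean(values) - mean(all_values)) * sqrt(len(values)) / sigma


def composite_sets(category_a, category_b, min_size=2):
    """Intersect every gene set of one GO category with every gene set of another."""
    out = {}
    for (name_a, set_a), (name_b, set_b) in product(category_a.items(), category_b.items()):
        common = set_a & set_b
        if len(common) >= min_size:  # skip intersections too small to score
            out[(name_a, name_b)] = common
    return out


# Toy per-gene expression-change scores (hypothetical values).
scores = {"g1": 2.0, "g2": 2.2, "g3": -1.8, "g4": -0.1,
          "g5": 0.1, "g6": 0.0, "g7": -2.0, "g8": -0.4}
molecular_function = {"F": {"g1", "g2", "g3", "g7"}}
cellular_component = {"C": {"g1", "g2", "g4", "g5"}}

composites = composite_sets(molecular_function, cellular_component)
z_f = z_score(molecular_function["F"], scores)
z_c = z_score(cellular_component["C"], scores)
z_fc = z_score(composites[("F", "C")], scores)
```

In this toy data neither F nor C alone exceeds |z| = 1.96, while their two-gene intersection does, which mirrors the situation the abstract describes.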

2.
Elucidating the effects of genetic polymorphisms on genes and gene networks is an important step in disease association studies. We developed the SNP2NMD database for human SNPs (single nucleotide polymorphisms) that result in PTCs (premature termination codons) and trigger nonsense-mediated mRNA decay (NMD). The SNP2NMD Web interfaces provide extensive genetic information on and graphical views of the queried SNP, gene, and disease terms. Availability: SNP2NMD is available from http://variome.net, or directly from http://bioportal.kobic.re.kr/SNP2NMD. Supplementary information: http://bioportal.kobic.re.kr/SNP2NMD/Wiki.jsp?page=Statistics.

3.
SUMMARY: TO-GO is a Gene Ontology (GO) navigation tool, which is implemented as a Java application. After the initial data downloading, the GO term tree can be interactively navigated without further network transfer. Local annotation can be incorporated. It supports querying by GO terms or associated gene product information, displaying the result as a table or a sub-tree. The result from the search for a set of external database accessions includes the number of gene products associated with each node, inclusive of sub-nodes. Search results can be further processed by set operations and these set operations can be quite useful for expression profile data analysis. A copy/paste function is also implemented in order to facilitate data exchange between applications. AVAILABILITY: TO-GO is freely available at http://www.ngic.re.kr/togo/index.html CONTACT: ungsik@kribb.re.kr

4.
Zhang X, Huang S, Sun W, Wang W. Genetics, 2012, 190(4): 1511-1520
Genome-wide expression quantitative trait loci (eQTL) studies have emerged as a powerful tool to understand the genetic basis of gene expression and complex traits. In a typical eQTL study, the huge number of genetic markers and expression traits and their complicated correlations present a challenging multiple-testing correction problem. The resampling-based test using permutation or bootstrap procedures is a standard approach to address the multiple-testing problem in eQTL studies. A brute force application of the resampling-based test to large-scale eQTL data sets is often computationally infeasible. Several computationally efficient methods have been proposed to calculate approximate resampling-based P-values. However, these methods rely on certain assumptions about the correlation structure of the genetic markers, which may not be valid for certain studies. We propose a novel algorithm, rapid and exact multiple testing correction by resampling (REM), to address this challenge. REM calculates the exact resampling-based P-values in a computationally efficient manner. The computational advantage of REM lies in its strategy of pruning the search space by skipping genetic markers whose upper bounds on test statistics are small. REM does not rely on any assumption about the correlation structure of the genetic markers. It can be applied to a variety of resampling-based multiple-testing correction methods including permutation and bootstrap methods. We evaluate REM on three eQTL data sets (yeast, inbred mouse, and human rare variants) and show that it achieves accurate resampling-based P-value estimation with much less computational cost than existing methods. The software is available at http://csbio.unc.edu/eQTL.
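For context, the brute-force baseline that entry 4 describes as often infeasible at scale can be sketched as a max-statistic permutation adjustment. This is not the REM algorithm, whose upper-bound pruning is not described in enough detail here to reproduce. The absolute Pearson correlation used as the test statistic, the function names, and the parameters `n_perm` and `seed` are assumptions chosen for illustration.

```python
import random
from statistics import mean


def abs_corr(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def max_stat_permutation_pvalues(markers, trait, n_perm=1000, seed=0):
    """Adjusted P-value per marker: how often the permuted maximum statistic
    over all markers reaches that marker's observed statistic."""
    rng = random.Random(seed)
    observed = [abs_corr(m, trait) for m in markers]
    exceed = [0] * len(markers)
    shuffled = list(trait)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the marker-trait association
        max_stat = max(abs_corr(m, shuffled) for m in markers)  # cost: every marker, every permutation
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    # Add-one correction keeps the estimate strictly positive.
    return [(e + 1) / (n_perm + 1) for e in exceed]


# Toy genotypes (0/1/2) for two markers and one expression trait (hypothetical values).
markers = [
    [0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
    [0, 2, 1, 0, 2, 1, 0, 2, 1, 0],
]
trait = [0.1, 0.3, 0.2, 1.1, 0.9, 1.2, 2.1, 1.9, 2.2, 2.0]
p_adjusted = max_stat_permutation_pvalues(markers, trait, n_perm=200)
```

The inner `max` over all markers in every permutation is the cost that grows with the number of markers and traits; the abstract says REM avoids much of it by skipping markers whose upper bounds on the test statistic are small.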

5.
High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature. Availability: GCAT is freely available at http://binf1.memphis.edu/gcat.
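The Fisher's exact test mentioned in entry 5 can be illustrated with a small standard-library sketch. How GCAT builds its contingency table from the LSI similarities is not stated in the abstract, so the table layout below (gene pairs above a similarity threshold, inside the gene set versus in the background) is an assumption, as are the function name and the example counts.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of a top-left cell
    at least as large as the observed value a (hypergeometric upper tail).
    """
    row1 = a + b          # size of the first row
    col1 = a + c          # size of the first column
    n = a + b + c + d     # grand total
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Hypothetical counts: 8 of 10 gene pairs in the set exceed the similarity
# threshold, against 50 of 1000 pairs in the background.
p_value = fisher_exact_greater(8, 2, 50, 950)
```

A small P-value under this layout would indicate that highly similar pairs are over-represented within the gene set relative to the background.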

6.
MOTIVATION: Analysis of the functions of microorganisms and their dynamics in the environment is essential for understanding microbial ecology. For analysis of highly similar sequences of a functional gene family using microarrays, the previous long oligonucleotide probe design strategies have not been useful in generating probes. RESULTS: We developed a Hierarchical Probe Design (HPD) program that designs both sequence-specific probes and hierarchical cluster-specific probes from sequences of a conserved functional gene based on the clustering tree of the genes, specifically for analyses of functional gene diversity in environmental samples. HPD was tested on datasets for the nirS and pmoA genes. Our results showed that HPD generated more sequence-specific probes than several popular oligonucleotide design programs. With a combination of sequence-specific and cluster-specific probes, HPD generated a probe set covering all the sequences of each test set. AVAILABILITY: http://brcapp.kribb.re.kr/HPD/

7.
The ultimate goal of metagenome research projects is to understand the ecological roles and physiological functions of the microbial communities in a given natural environment. The 454 pyrosequencing platform produces the longest reads among the most widely used next generation sequencing platforms. Since the relatively longer reads of the 454 platform provide more information for identification of microbial sequences, this platform is dedicated to microbial community and population studies. In order to accurately perform the downstream analysis of the 454 multiplex datasets, it is necessary to remove artificially designed sequences located at either end of individual reads and to correct low-quality sequences. We have developed a program called PyroTrimmer that removes the barcodes, linkers, and primers, trims sequence regions with low quality scores, and filters out low-quality sequence reads. Although these functions have previously been implemented in other programs as well, PyroTrimmer is novel in the following features: i) more sensitive primer detection using Levenshtein distance and global pairwise alignment, ii) the first stand-alone software with a graphical user interface, and iii) various options for trimming and filtering out the low-quality sequence reads. PyroTrimmer, written in Java, is compatible with multiple operating systems and can be downloaded free at http://pyrotrimmer.kobic.re.kr.
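Primer detection by Levenshtein distance, as named in entry 7, might look like the sketch below. It is not PyroTrimmer's code (which the abstract says is written in Java and also uses global pairwise alignment); the prefix-scanning strategy, the function names, and the `max_distance` threshold are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion from a
                            curr[j - 1] + 1,      # insertion into a
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def trim_primer(read, primer, max_distance=2):
    """Strip a leading primer if some read prefix lies within max_distance edits of it.

    Returns the read unchanged when no prefix is close enough.
    """
    best_len, best_dist = None, max_distance + 1
    shortest = max(0, len(primer) - max_distance)
    longest = min(len(read), len(primer) + max_distance)
    for length in range(shortest, longest + 1):
        dist = levenshtein(read[:length], primer)
        if dist < best_dist:
            best_len, best_dist = length, dist
    return read if best_len is None else read[best_len:]


# The read starts with the primer carrying one substitution (A -> T at position 5).
trimmed = trim_primer("ACGTTGCAGGGTTT", "ACGTAGCA")  # "GGGTTT"
```

Allowing a small edit distance rather than requiring an exact match is what lets a primer with a sequencing error still be recognized and removed.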

8.
Analyzing gene expression data in terms of gene sets: methodological issues   Cited by: 3 (self-citations: 0, citations by others: 3)
MOTIVATION: Many statistical tests have been proposed in recent years for analyzing gene expression data in terms of gene sets, usually from Gene Ontology. These methods are based on widely different methodological assumptions. Some approaches test differential expression of each gene set against differential expression of the rest of the genes, whereas others test each gene set on its own. Also, some methods are based on a model in which the genes are the sampling units, whereas others treat the subjects as the sampling units. This article aims to clarify the assumptions behind different approaches and to indicate a preferential methodology of gene set testing. RESULTS: We identify some crucial assumptions which are needed by the majority of methods. P-values derived from methods that use a model which takes the genes as the sampling unit are easily misinterpreted, as they are based on a statistical model that does not resemble the biological experiment actually performed. Furthermore, because these models are based on a crucial and unrealistic independence assumption between genes, the P-values derived from such methods can be wildly anti-conservative, as a simulation experiment shows. We also argue that methods that competitively test each gene set against the rest of the genes create an unnecessary rift between single gene testing and gene set testing.

9.
10.
Curated gene sets from databases such as KEGG Pathway and Gene Ontology are often used to systematically organize lists of genes or proteins derived from high-throughput data. However, the information content inherent to some relationships between the interrogated gene sets, such as pathway crosstalk, is often underutilized. A gene set network, where nodes representing individual gene sets such as KEGG pathways are connected to indicate a functional dependency, is well suited to visualize and analyze global gene set relationships. Here we introduce a novel gene set network construction algorithm that integrates gene lists derived from high-throughput experiments with curated gene sets to construct co-enrichment gene set networks. Along with previously described co-membership and linkage algorithms, we apply the co-enrichment algorithm to eight gene set collections to construct integrated multi-evidence gene set networks with multiple edge types connecting gene sets. We demonstrate the utility of the approach through examples of novel gene set networks such as the chromosome map co-differential expression gene set network. A total of twenty-four gene set networks are exposed via a web tool called MetaNet, where context-specific multi-edge gene set networks are constructed from enriched gene sets within user-defined gene lists. MetaNet is freely available at http://blaispathways.dfci.harvard.edu/metanet/.

11.
Gene set analysis methods are popular tools for identifying differentially expressed gene sets in microarray data. Most existing methods use a permutation test to assess significance for each gene set. The permutation test's assumption of exchangeable samples is often not satisfied for time-series data and complex experimental designs, and in addition it requires a certain number of samples to compute p-values accurately. The method presented here uses a rotation test rather than a permutation test to assess significance. The rotation test can compute accurate p-values also for very small sample sizes. The method can handle complex designs and is particularly suited for longitudinal microarray data where the samples may have complex correlation structures. Dependencies between genes, modeled with the use of gene networks, are incorporated in the estimation of correlations between samples. In addition, the method can test for both gene sets that are differentially expressed and gene sets that show strong time trends. We show on simulated longitudinal data that the ability to identify important gene sets may be improved by taking the correlation structure between samples into account. Applied to real data, the method identifies both gene sets with constant expression and gene sets with strong time trends.

12.
13.
14.
Analyses of gene set differential coexpression may shed light on molecular mechanisms underlying phenotypes and diseases. However, differential coexpression analyses of conceptually similar individual studies are often inconsistent and underpowered to provide definitive results. Researchers can greatly benefit from an open-source application facilitating the aggregation of evidence of differential coexpression across studies and the estimation of more robust common effects. We developed Meta Gene Set Coexpression Analysis (MetaGSCA), an analytical tool to systematically assess differential coexpression of an a priori defined gene set by aggregating evidence across studies to provide a definitive result. In the kernel, a nonparametric approach that accounts for the gene-gene correlation structure is used to test whether the gene set is differentially coexpressed between two comparative conditions, from which a permutation test p-statistic is computed for each individual study. A meta-analysis is then performed to combine individual study results with one of two options: a random-intercept logistic regression model or the inverse variance method. We demonstrated MetaGSCA in case studies investigating two human diseases and identified pathways highly relevant to each disease across studies. We further applied MetaGSCA in a pan-cancer analysis with hundreds of major cellular pathways in 11 cancer types. The results indicated that a majority of the pathways identified were dysregulated in the pan-cancer scenario, many of which have been previously reported in the cancer literature. Our analysis with randomly generated gene sets showed excellent specificity, indicating that the significant pathways/gene sets identified by MetaGSCA are unlikely to be false positives. MetaGSCA is a user-friendly tool implemented both as a web-based application and as an R package, “MetaGSCA”. It enables comprehensive meta-analyses of gene set differential coexpression data, with an optional module of post hoc pathway crosstalk network analysis to identify and visualize pathways having similar coexpression profiles.
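The inverse variance method named in this abstract is, in its generic fixed-effect form, a precision-weighted average. The sketch below shows only that generic form; which per-study estimate and variance MetaGSCA feeds into it is not stated here, so the inputs, the function name, and the example numbers are hypothetical.

```python
from math import sqrt


def inverse_variance_pool(estimates, standard_errors):
    """Fixed-effect inverse-variance pooling of per-study estimates.

    Each study is weighted by 1 / SE^2; returns (pooled estimate, pooled SE).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * x for w, x in zip(weights, estimates)) / total
    return pooled, sqrt(1.0 / total)


# Two hypothetical studies: the more precise one (SE 0.1) dominates the result.
pooled, pooled_se = inverse_variance_pool([0.2, 0.4], [0.1, 0.2])  # about (0.24, 0.089)
```

Studies with smaller standard errors receive larger weights, so the pooled estimate sits closer to the more precise study.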

15.
16.
Identifying differential features between conditions is a popular approach to understanding molecular features and their mechanisms underlying a biological process of particular interest. Although many tests for identifying differential expression of gene or gene sets have been proposed, there was limited success in developing methods for differential interactions of genes between conditions because of its computational complexity. We present a method for Evaluation of Dependency DifferentialitY (EDDY), which is a statistical test for differential dependencies of a set of genes between two conditions. Unlike previous methods focused on differential expression of individual genes or correlation changes of individual gene–gene interactions, EDDY compares two conditions by evaluating the probability distributions of dependency networks from genes. The method has been evaluated and compared with other methods through simulation studies, and application to glioblastoma multiforme data resulted in informative cancer and glioblastoma multiforme subtype-related findings. The comparison with Gene Set Enrichment Analysis, a differential expression-based method, revealed that EDDY identifies the gene sets that are complementary to those identified by Gene Set Enrichment Analysis. EDDY also showed much lower false positives than Gene Set Co-expression Analysis, a method based on correlation changes of individual gene–gene interactions, thus providing more informative results. The Java implementation of the algorithm is freely available to noncommercial users. Download from: http://biocomputing.tgen.org/software/EDDY.

17.
Gene Set Expression Comparison kit for BRB-ArrayTools   Cited by: 1 (self-citations: 0, citations by others: 1)

18.
Sim J, Kim SY, Lee J. Proteins, 2005, 59(3): 627-632
Successful prediction of protein domain boundaries provides valuable information not only for the computational structure prediction of multidomain proteins but also for the experimental structure determination. Since protein sequences of multiple domains may contain much information regarding evolutionary processes such as gene-exon shuffling, this information can be detected by analyzing the position-specific scoring matrix (PSSM) generated by PSI-BLAST. We have presented a method, PPRODO (Prediction of PROtein DOmain boundaries) that predicts domain boundaries of proteins from sequence information by a neural network. The network is trained and tested using the values obtained from the PSSM generated by PSI-BLAST. A 10-fold cross-validation technique is performed to obtain the parameters of neural networks using a nonredundant set of 522 proteins containing 2 contiguous domains. PPRODO provides good and consistent results for the prediction of domain boundaries, with accuracy of about 66% using the ±20 residue criterion. The PPRODO source code, as well as all data sets used in this work, are available from http://gene.kias.re.kr/~jlee/pprodo/.

19.
WHAP: haplotype-based association analysis   Cited by: 7 (self-citations: 0, citations by others: 7)
We describe a software tool to perform haplotype-based association analysis, for quantitative and qualitative traits, in population and family samples, using single nucleotide polymorphism or multiallelic marker data. A range of tests is offered: omnibus and haplotype-specific tests; prospective and retrospective likelihoods; covariates and moderators; sliding window analyses; permutation P-values. We focus on the ability to flexibly impose constraints on haplotype effects, which allows for a range of conditional haplotype-based likelihood ratio tests: for example, whether an allele has an effect independent of its haplotypic background, or whether a single variant can explain the overall association at a locus. We illustrate using these tests to dissect a multi-locus association. AVAILABILITY: WHAP is a C/C++ program, freely available from the author's website: http://pngu.mgh.harvard.edu/purcell/whap/

20.
While meta-analysis provides a powerful tool for analyzing microarray experiments by combining data from multiple studies, it presents unique computational challenges. The Bioconductor package RankProd provides a new and intuitive tool for this purpose in detecting differentially expressed genes under two experimental conditions. The package modifies and extends the rank product method proposed by Breitling et al., [(2004) FEBS Lett., 573, 83-92] to integrate multiple microarray studies from different laboratories and/or platforms. It offers several advantages over t-test based methods and accepts pre-processed expression datasets produced from a wide variety of platforms. The significance of the detection is assessed by a non-parametric permutation test, and the associated P-value and false discovery rate (FDR) are included in the output alongside the genes that are detected by user-defined criteria. A visualization plot is provided to view actual expression levels for each gene with estimated significance measurements. AVAILABILITY: RankProd is available at Bioconductor http://www.bioconductor.org. A web-based interface will soon be available at http://cactus.salk.edu/RankProd
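The core of the rank product statistic referred to in entry 20 can be sketched in a few lines. This is not the RankProd package's code or API (RankProd is an R/Bioconductor package and adds the permutation-based P-values and FDR described above); the function name, the input layout, and the toy fold changes are hypothetical, and ties are not given any special treatment.

```python
from math import prod


def rank_products(fold_changes):
    """Geometric mean of each gene's ranks across replicates.

    fold_changes maps gene -> list of log fold changes, one per replicate
    or study. Rank 1 is the largest fold change in a replicate, so genes
    that are consistently up-regulated get small rank products.
    """
    genes = list(fold_changes)
    k = len(next(iter(fold_changes.values())))
    ranks = {g: [] for g in genes}
    for i in range(k):
        ordered = sorted(genes, key=lambda g: fold_changes[g][i], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            ranks[g].append(rank)
    return {g: prod(r) ** (1.0 / k) for g, r in ranks.items()}


# Toy log fold changes for three genes in three replicates (hypothetical values).
fc = {
    "a": [2.0, 1.5, 1.8],
    "b": [0.1, 1.6, -0.2],
    "c": [-1.0, -0.5, 0.0],
}
rp = rank_products(fc)
top_gene = min(rp, key=rp.get)  # "a": ranked 1, 2, 1 across the replicates
```

Because only ranks within each replicate are used, data from different laboratories or platforms can be combined without putting their expression values on a common scale.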
