Similar Literature
20 similar documents found
1.

Background  

Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, several tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set, and test for enrichment in the target set compared with the background set. A few tools also support analyzing ranked lists; these typically rely on simulations or on union-bound correction to assign statistical significance to the results.
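The ranked-list case can be illustrated with a minimal sketch, assuming a single GO term and a gene list already sorted by some score. It scans every top-n cutoff, keeps the smallest hypergeometric tail probability, and applies the union (Bonferroni) bound over the N cutoffs tried. The function names are illustrative and are not taken from any particular tool.

```python
from math import comb


def hypergeom_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes, n drawn)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def min_hg_union_bound(labels: list[int]) -> tuple[float, int, float]:
    """labels[i] is 1 if the gene at rank i carries the GO term, else 0.

    Returns (smallest tail p-value, best cutoff n, union-bound-corrected p-value).
    """
    N, K = len(labels), sum(labels)
    best_p, best_n, k = 1.0, 0, 0
    for n in range(1, N + 1):
        k += labels[n - 1]  # annotated genes among the top n
        p = hypergeom_tail(N, K, n, k)
        if p < best_p:
            best_p, best_n = p, n
    # Union bound: at most N cutoffs were tested, so multiply by N (capped at 1).
    return best_p, best_n, min(1.0, best_p * N)


# Hypothetical ranked list: the three annotated genes sit at the very top.
print(min_hg_union_bound([1, 1, 1, 0, 0, 0, 0, 0, 0, 0]))
```

The simulation-based alternative would replace the union bound with an empirical distribution of the minimum p-value over random permutations of `labels`; the union bound is cheaper but conservative.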

2.
For the analysis of complex polygenic diseases, one does not expect all patients to share the same disease-associated alleles. Disease-causing variants will not even map to identical sets of genes across patients. However, one does expect overlaps in the sets of genes involved, and even more so in the molecular processes assigned to them. Furthermore, the assignment of single nucleotide polymorphisms (SNPs) to genes is highly ambiguous for intergenic SNPs. The tool presented here therefore adds external information, namely Gene Ontology (GO) terms (Gene Ontology Consortium), to the analysis of SNP data. AVAILABILITY: A web interface and source code are offered at https://webtools.imbs.uni-luebeck.de/snptogo
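To show why intergenic SNPs are ambiguous, here is a minimal sketch assuming a single chromosome, a hypothetical 50 kb window around each gene, and made-up gene names with example GO annotations: one SNP can map to several genes, and the GO terms of all of them are collected. This is not the snptogo implementation.

```python
def genes_near_snp(snp_pos: int, genes: dict[str, tuple[int, int]], window: int = 50_000) -> list[str]:
    """All genes whose [start, end] interval lies within `window` bp of the SNP (one chromosome assumed)."""
    return sorted(
        name for name, (start, end) in genes.items()
        if start - window <= snp_pos <= end + window
    )


def go_terms_for_snps(snp_positions, genes, go_annotations, window=50_000):
    """Map each GO term to the set of genes (near any SNP) that carry it."""
    terms: dict[str, set[str]] = {}
    for pos in snp_positions:
        for gene in genes_near_snp(pos, genes, window):
            for term in go_annotations.get(gene, ()):
                terms.setdefault(term, set()).add(gene)
    return terms


# Hypothetical example: an intergenic SNP between GENE_A and GENE_B maps to both.
genes = {"GENE_A": (100_000, 120_000), "GENE_B": (160_000, 170_000), "GENE_C": (400_000, 410_000)}
go_annotations = {"GENE_A": ["GO:0006955"], "GENE_B": ["GO:0006955", "GO:0007165"]}
print(go_terms_for_snps([140_000], genes, go_annotations))
```

Counting at the GO-term level lets two patients whose variants hit different genes still share a term, which is the overlap the paragraph above argues for.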

3.

Background

Motif analysis methods have long been central to studying the biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Current motif discovery tools are limited in their ability to search large motif spaces, so more complex motifs may be missed. There is thus a need for motif analysis methods tailored to specific complex motifs motivated by biological questions and hypotheses, rather than screen-based motif finding tools.

Methods

We present Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in ranked lists of sequences. Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact p-values for motif observations in sequences. Biases in motif distributions across ranked sequence lists are evaluated using random walks, Brownian bridges, or modified rank-based statistics. A modular setup and fast analytic p-value evaluation make Regmex applicable to diverse and potentially large-scale motif analysis problems.
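The random-walk idea can be sketched under simplifying assumptions: each sequence is scored only for presence or absence of a regular-expression match (Regmex instead computes exact per-sequence p-values with embedded Markov models), and the walk adds each score minus the list mean, so it starts and ends at zero like a Brownian bridge. The U-rich pattern `U{3,}` is a hypothetical example.

```python
import re


def motif_random_walk(ranked_sequences: list[str], pattern: str) -> tuple[list[float], float]:
    """Mean-centred running sum of motif hits along a ranked list.

    ranked_sequences: ordered from top to bottom rank.
    Each step adds x_i - mean(x), where x_i = 1 if `pattern` matches sequence i.
    The walk therefore ends at 0, like a Brownian bridge; a large maximum
    excursion means hits are concentrated at one end of the list.
    """
    regex = re.compile(pattern)
    hits = [1 if regex.search(seq) else 0 for seq in ranked_sequences]
    if not hits:
        return [], 0.0
    mean = sum(hits) / len(hits)
    walk, total = [], 0.0
    for x in hits:
        total += x - mean
        walk.append(total)
    return walk, max(abs(w) for w in walk)


# Hypothetical U-rich motif: three or more consecutive U's.
walk, max_dev = motif_random_walk(["AAUUUAA", "GCUUUGC", "CCCCCC", "GGGGGG"], r"U{3,}")
print(walk, max_dev)
```

Assigning significance to `max_dev` would still require a null distribution (for example by permuting the list); that step is omitted here.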

Results

We demonstrate use cases of combined motifs on simulated data and on expression data from microRNA transfection experiments. We confirm previously obtained results and demonstrate the usability of Regmex to test a specific hypothesis about the relative location of microRNA seed sites and U-rich motifs. We further compare the tool with an existing motif discovery tool and show increased sensitivity.

Conclusions

Regmex is a useful and flexible tool to analyze motif hypotheses that relate to large data sets in functional genomics. The method is available as an R package (https://github.com/muhligs/regmex).

4.
We consider the problem of finding the set of rankings that best represents a given group of orderings on the same collection of elements (preference lists). This problem arises from social choice and voting theory, in which each voter gives a preference on a set of alternatives, and a system outputs a single preference order based on the observed voters' preferences. In this paper, we observe that if the given set of preference lists is not homogeneous, a unique true underlying ranking might not exist. Moreover, only the lists that share the highest amount of information should be aggregated, and thus multiple rankings might provide a more feasible solution to the problem. In this light, we propose Network Selection, an algorithm that, given a heterogeneous group of rankings, first discovers the different communities of homogeneous rankings and then combines only the rank orderings belonging to the same community into a single final ordering. Our novel approach is inspired by graph theory: the set of lists can be loosely read as the nodes of a network, so only the lists populating the same community in the network are aggregated. To highlight the strength of our proposal, we show an application to simulated data and to two real datasets, namely a financial and a biological dataset. Experimental results on simulated data show that Network Selection can significantly outperform existing related methods. On real financial data, the empirical evidence reveals that Network Selection is also able to select the most relevant variables in data mining predictive models, yielding models with clearly superior predictive power. Furthermore, we show the potential of our proposal in the bioinformatics field with an application to a biological microarray dataset.
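A minimal sketch of the two stages, assuming complete rankings of the same items: rankings are linked when their Spearman correlation reaches a hypothetical threshold of 0.5, connected components stand in for the community-detection step (the abstract does not say which method Network Selection uses), and each community is merged with a Borda count.

```python
from itertools import combinations


def spearman_rho(r1: list[str], r2: list[str]) -> float:
    """Spearman correlation between two complete rankings (best item first, no ties)."""
    n = len(r1)
    pos2 = {item: i for i, item in enumerate(r2)}
    d2 = sum((i - pos2[item]) ** 2 for i, item in enumerate(r1))
    return 1 - 6 * d2 / (n * (n * n - 1))


def communities(rankings: list[list[str]], threshold: float = 0.5) -> list[list[int]]:
    """Connected components of the graph that links rankings with rho >= threshold."""
    parent = list(range(len(rankings)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(rankings)), 2):
        if spearman_rho(rankings[i], rankings[j]) >= threshold:
            parent[find(i)] = find(j)
    groups: dict[int, list[int]] = {}
    for i in range(len(rankings)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def borda(rankings: list[list[str]]) -> list[str]:
    """Borda count: an item at position p (0-based) of an n-item list earns n - 1 - p points."""
    n = len(rankings[0])
    score: dict[str, int] = {}
    for r in rankings:
        for pos, item in enumerate(r):
            score[item] = score.get(item, 0) + (n - 1 - pos)
    return sorted(score, key=lambda item: (-score[item], item))  # ties broken alphabetically


def network_selection_sketch(rankings, threshold=0.5):
    """One aggregated ranking per community of similar rankings."""
    return [borda([rankings[i] for i in group]) for group in communities(rankings, threshold)]


rankings = [list("abcd"), list("abdc"), list("dcba"), list("dcab")]
print(network_selection_sketch(rankings))
```

Aggregating all four lists at once would blend two opposite orderings into a near-arbitrary compromise; splitting first keeps one consensus per homogeneous group.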

5.
MOTIVATION: Functional analyses based on the association of Gene Ontology (GO) terms to genes in a selected gene list are useful bioinformatic tools, and the GOstats package has been widely used to perform such computations. In this paper we report significant improvements and extensions such as support for conditional testing. RESULTS: We discuss the capabilities of GOstats, a Bioconductor package written in R, which allows users to test GO terms for over- or under-representation using either a classical hypergeometric test or a conditional hypergeometric test that uses the relationships among GO terms to decorrelate the results. AVAILABILITY: GOstats is available as an R package from the Bioconductor project: http://bioconductor.org
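For reference, the classical one-sided hypergeometric test for a single GO term can be written in a few lines of Python. This sketch covers only the classical test, not GOstats' conditional variant, and the parameter names are our own rather than GOstats arguments.

```python
from math import comb


def go_term_hypergeom(universe_size: int, term_size: int,
                      selected_size: int, overlap: int) -> tuple[float, float]:
    """One-sided hypergeometric p-values for one GO term.

    universe_size (N): genes in the gene universe
    term_size (K): universe genes annotated to the term
    selected_size (n): genes in the selected list
    overlap (k): selected genes annotated to the term
    Returns (P(X >= k), P(X <= k)) for over- and under-representation.
    """
    total = comb(universe_size, selected_size)

    def pmf(i: int) -> float:
        return comb(term_size, i) * comb(universe_size - term_size, selected_size - i) / total

    lo = max(0, selected_size - (universe_size - term_size))  # smallest feasible overlap
    hi = min(term_size, selected_size)                         # largest feasible overlap
    p_over = sum(pmf(i) for i in range(overlap, hi + 1))
    p_under = sum(pmf(i) for i in range(lo, overlap + 1))
    return p_over, p_under


# Hypothetical numbers: 20-gene universe, 5 genes on the term, 4 selected, 3 of them annotated.
print(go_term_hypergeom(20, 5, 4, 3))
```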

6.

Background  

The ever-expanding population of gene expression profiles (EPs) from specified cells and tissues under a variety of experimental conditions is an important but difficult resource for investigators to utilize effectively. Software tools have been recently developed to use the distribution of gene ontology (GO) terms associated with the genes in an EP to identify specific biological functions or processes that are over- or under-represented in that EP relative to other EPs. Additionally, it is possible to use the distribution of GO terms inherent to each EP to relate that EP as a whole to other EPs. Because GO term annotation is organized in a tree-like cascade of variable granularity, this approach allows the user to relate (e.g., by hierarchical clustering) EPs of varying length and from different platforms (e.g., GeneChip, SAGE, EST library).
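One way to turn an EP into a comparable GO-term distribution is sketched below, assuming a simple gene-to-terms mapping with no propagation up the GO hierarchy. Each EP becomes a vector of term proportions, so lists of different lengths are comparable, and cosine similarity gives pairwise similarities that a hierarchical clustering routine could then use. All names and data are hypothetical.

```python
from collections import Counter
from math import sqrt


def go_profile(ep_genes, annotations):
    """Proportion of an EP's GO annotations that fall on each term (no hierarchy propagation)."""
    counts = Counter(term for gene in ep_genes for term in annotations.get(gene, ()))
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse term -> weight profiles."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm = sqrt(sum(w * w for w in p.values())) * sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


# Hypothetical annotations and three EPs of different lengths.
annotations = {"g1": ["T1"], "g2": ["T1", "T2"], "g3": ["T2"], "g4": ["T3"]}
ep1 = go_profile(["g1", "g2"], annotations)
ep2 = go_profile(["g1", "g2", "g3"], annotations)
ep3 = go_profile(["g4"], annotations)
print(cosine_similarity(ep1, ep2), cosine_similarity(ep1, ep3))
```

Rolling annotations up to coarser ancestor terms before building the profiles is how the tree-like granularity mentioned above could be exploited; that step is left out of this sketch.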

7.
SUMMARY: We present here Blast2GO (B2G), a research tool designed with the main purpose of enabling Gene Ontology (GO) based data mining on sequence data for which no GO annotation is yet available. B2G combines, in one application, GO annotation based on similarity searches with statistical analysis and highlighted visualization on directed acyclic graphs. This tool offers a suitable platform for functional genomics research in non-model species. B2G is an intuitive and interactive desktop application that allows monitoring and comprehension of the whole annotation and analysis process. AVAILABILITY: Blast2GO is freely available via Java Web Start at http://www.blast2go.de. SUPPLEMENTARY MATERIAL: http://www.blast2go.de -> Evaluation.

8.
Activation tagging in plants: a tool for gene discovery   Total citations: 8, self-citations: 0, citations by others: 8
A significant limitation of classical loss-of-function screens designed to dissect genetic pathways is that they rarely uncover genes that function redundantly, are compensated by alternative metabolic or regulatory circuits, or have an additional role in early embryo or gametophyte development. Activation T-DNA tagging is one approach that has emerged in plants to help circumvent these potential problems. This technique utilises a T-DNA sequence that contains four tandem copies of the cauliflower mosaic virus (CaMV) 35S enhancer sequence. This element enhances the expression of neighbouring genes on either side of the randomly integrated T-DNA tag, resulting in gain-of-function phenotypes. Activation tagging has identified a number of genes fundamental to plant development, metabolism and disease resistance in Arabidopsis. This review provides selected examples of these discoveries to highlight the utility of this technology. The recent development of activation tagging strategies for other model plant systems and the construction of new, more sophisticated vectors for the generation of conditional alleles are also discussed. These recent advances have significantly expanded the horizons for gain-of-function genetics in plants.

9.

Background  

Composition Profiler is a web-based tool for semi-automatic discovery of enrichment or depletion of amino acids, either individually or grouped by their physico-chemical or structural properties.
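A minimal sketch of an enrichment measure for a residue group, assuming the relative difference (C_query - C_background) / C_background computed on pooled sequences. The two groupings shown are illustrative assumptions, not Composition Profiler's own property tables, and no significance test is included.

```python
from collections import Counter

# Illustrative residue groups (assumed; not Composition Profiler's own tables).
GROUPS = {"charged": set("DEKR"), "aromatic": set("FWY")}


def group_frequency(seq: str, residues: set[str]) -> float:
    """Fraction of positions in `seq` occupied by any residue in `residues`."""
    counts = Counter(seq)
    return sum(counts[aa] for aa in residues) / len(seq)


def relative_difference(query: str, background: str, residues: set[str]) -> float:
    """(C_query - C_background) / C_background; > 0 means enriched, < 0 means depleted."""
    c_query = group_frequency(query, residues)
    c_background = group_frequency(background, residues)
    if c_background == 0:
        return float("inf") if c_query > 0 else 0.0
    return (c_query - c_background) / c_background


# Hypothetical sequences: the query is three times richer in charged residues.
print(relative_difference("DDEEKKAA", "DAAAAAAK", GROUPS["charged"]))
```

A single residue is handled the same way by passing a one-element set, for example `{"K"}`.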

10.

Background

Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology), for example. However, many proteins consist of multiple domains, and each domain, or some combination of domains, can be responsible for a particular molecular function. Therefore, identifying which domains should be associated with a specific function is a non-trivial task.

Results

We describe a general approach for the computational discovery of associations between different sets of annotations by formalising the problem as a bipartite graph enrichment problem in the setting of a tripartite graph. We call this approach “CODAC” (for COmputational Discovery of Direct Associations using Common Neighbours). As one application of this approach, we describe “GODomainMiner” for associating GO terms with protein domains. We used GODomainMiner to predict GO-domain associations between each of the 3 GO ontology namespaces (MF, BP, and CC) and the Pfam, CATH, and SCOP domain classifications. Overall, GODomainMiner yields average enrichments of 15-, 41- and 25-fold GO-domain associations compared to the existing GO annotations in these 3 domain classifications, respectively.
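The common-neighbours idea can be sketched as follows, assuming proteins are the middle layer of the tripartite graph and a one-sided hypergeometric test scores each GO term-domain pair by how many proteins they share. The identifiers, the 0.05 cutoff, and the choice of test are assumptions; the abstract does not give CODAC's exact scoring.

```python
from math import comb


def hypergeom_sf(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)


def go_domain_associations(protein_go, protein_domains, alpha=0.05):
    """Score GO term-domain pairs by their common protein neighbours.

    protein_go / protein_domains: protein -> iterable of GO terms / domain IDs.
    Returns (go_term, domain, shared_proteins, p_value) tuples with p <= alpha.
    """
    proteins = set(protein_go) & set(protein_domains)
    N = len(proteins)
    go_nbrs: dict[str, set[str]] = {}
    dom_nbrs: dict[str, set[str]] = {}
    for prot in proteins:
        for term in protein_go[prot]:
            go_nbrs.setdefault(term, set()).add(prot)
        for dom in protein_domains[prot]:
            dom_nbrs.setdefault(dom, set()).add(prot)
    hits = []
    for term, t_prots in go_nbrs.items():
        for dom, d_prots in dom_nbrs.items():
            shared = len(t_prots & d_prots)  # common neighbours in the tripartite graph
            if shared == 0:
                continue
            p = hypergeom_sf(N, len(t_prots), len(d_prots), shared)
            if p <= alpha:
                hits.append((term, dom, shared, p))
    return sorted(hits, key=lambda h: (h[3], h[0], h[1]))


# Hypothetical data: proteins P1-P4 carry GO:X and PF_A, P5-P10 carry GO:Y and PF_B.
protein_go = {f"P{i}": ["GO:X"] if i <= 4 else ["GO:Y"] for i in range(1, 11)}
protein_domains = {f"P{i}": ["PF_A"] if i <= 4 else ["PF_B"] for i in range(1, 11)}
print(go_domain_associations(protein_go, protein_domains))
```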

Conclusions

These associations could potentially be used to annotate many of the protein chains in the Protein Data Bank and protein sequences in UniProt whose domain composition is known but which currently lack GO annotation.

11.
We present GENECODIS, a web-based tool that integrates different sources of information to search for annotations that frequently co-occur in a set of genes and rank them by statistical significance. The analysis of concurrent annotations provides significant information for the biologic interpretation of high-throughput experiments and may outperform the results of standard methods for the functional analysis of gene lists. GENECODIS is publicly available at .
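A simplified stand-in for concurrent-annotation analysis is sketched below, assuming annotations are plain sets per gene. It counts every pair of annotations shared by genes in the input set and ranks the pairs by a hypergeometric p-value against a reference gene list; it does not reproduce GENECODIS' own algorithm, and all names are hypothetical.

```python
from itertools import combinations
from math import comb


def annotation_combos(genes, annotations, size):
    """Count, for each `size`-combination of annotations, how many genes carry all of them."""
    counts: dict[tuple[str, ...], int] = {}
    for gene in genes:
        for combo in combinations(sorted(annotations.get(gene, ())), size):
            counts[combo] = counts.get(combo, 0) + 1
    return counts


def rank_cooccurring(gene_set, reference, annotations, size=2, min_support=2):
    """Rank co-occurring annotation combinations by a hypergeometric p-value vs. `reference`."""
    in_set = annotation_combos(gene_set, annotations, size)
    in_ref = annotation_combos(reference, annotations, size)
    N, n = len(reference), len(gene_set)
    ranked = []
    for combo, k in in_set.items():
        if k < min_support:
            continue
        K = in_ref.get(combo, k)  # gene_set is assumed to be a subset of reference
        p = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)
        ranked.append((combo, k, p))
    return sorted(ranked, key=lambda r: (r[2], r[0]))


# Hypothetical data: g1-g3 carry annotations A and B together, g4-g10 carry only A.
annotations = {f"g{i}": {"A", "B"} if i <= 3 else {"A"} for i in range(1, 11)}
reference = [f"g{i}" for i in range(1, 11)]
print(rank_cooccurring(["g1", "g2", "g3", "g4"], reference, annotations))
```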

12.
13.
MOTIVATION: We propose a method for studying the stability of biomarker lists obtained from functional genomics studies. It is common to adopt resampling methods to tune and evaluate marker-based diagnostic and prognostic systems in order to prevent selection bias. Such caution promotes honest estimation of class prediction, but leads to alternative sets of solutions. In microarray studies, the differences between lists may be bewildering, partly due to the presence of modules of functionally related genes. Methods for assessing stability reveal how the markers depend on the data or on the type of predictor, and help in selecting solutions. RESULTS: A computational framework for comparing sets of ranked biomarker lists is presented. Notions and algorithms are based on concepts from permutation group theory. We introduce several algebraic indicators and metric methods for symmetric groups, including the Canberra distance, a weighted version of Spearman's footrule. We also consider distances between partial lists and an aggregation of sets of lists into an optimal list based on voting theory (Borda count). The stability indicators are applied in practical situations to several synthetic, cancer microarray and proteomics datasets. The issues addressed are predictive classification, presence of modules, comparison of alternative biomarker lists, outlier removal, control of selection bias by randomization techniques, and enrichment analysis. AVAILABILITY: Supplementary Material and software are available at http://biodcv.fbk.eu/listspy.html
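The Canberra distance, Spearman's footrule and Borda count can be sketched for complete lists with 1-based positions; partial lists are not handled, and the gene names are hypothetical. The example highlights the design choice behind the Canberra distance: swapping the top two items costs more than swapping the bottom two, whereas the footrule treats both swaps equally.

```python
def positions(ranking: list[str]) -> dict[str, int]:
    """Map each item to its 1-based rank position."""
    return {item: i + 1 for i, item in enumerate(ranking)}


def footrule(r1: list[str], r2: list[str]) -> int:
    """Spearman's footrule: sum of absolute position differences."""
    p1, p2 = positions(r1), positions(r2)
    return sum(abs(p1[x] - p2[x]) for x in p1)


def canberra(r1: list[str], r2: list[str]) -> float:
    """Footrule terms weighted by 1 / (p1 + p2), so top-of-list disagreements weigh more."""
    p1, p2 = positions(r1), positions(r2)
    return sum(abs(p1[x] - p2[x]) / (p1[x] + p2[x]) for x in p1)


def borda_aggregate(rankings: list[list[str]]) -> list[str]:
    """Borda count: order items by the sum of their positions (lower is better)."""
    totals: dict[str, int] = {}
    for r in rankings:
        for x, pos in positions(r).items():
            totals[x] = totals.get(x, 0) + pos
    return sorted(totals, key=lambda x: (totals[x], x))


base = ["g1", "g2", "g3", "g4"]
top_swap = ["g2", "g1", "g3", "g4"]
bottom_swap = ["g1", "g2", "g4", "g3"]
print(footrule(base, top_swap), footrule(base, bottom_swap))
print(canberra(base, top_swap), canberra(base, bottom_swap))
print(borda_aggregate([base, top_swap, bottom_swap]))
```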

14.
Comparative genomics as a tool for gene discovery   Total citations: 1, self-citations: 0, citations by others: 1
With the increasing availability of data from multiple eukaryotic genome sequencing projects, attention has focused on interspecific comparisons to discover novel genes and transcribed genomic sequences. Generally, these extrinsic strategies combine ab initio gene prediction with expression and/or homology data to identify conserved gene candidates between two or more genomes. Interspecific sequence analyses have proven invaluable for the improvement of existing annotations, automation of annotation, and identification of novel coding regions and splice variants. Further, comparative genomic approaches hold the promise of improved prediction of terminal or small exons, microRNA precursors, and small peptide-encoding open reading frames--sequence elements that are difficult to identify through purely intrinsic methodologies in the absence of experimental data.

15.
Among new insights coming from the completion of sequencing of the human genome, reported in Nature and Science, are clues of how evolution has increased the complexity of species, and in particular how the genetic code has enabled this process. It is clear that life has not only evolved by increasing the number of genes, but also by ingeniously evolving an efficient code for expressing diversity in the building blocks (i.e. the amino acids). The rules of nucleic acid base pairing and the classification of amino acids according to hydrophobicity/hydrophilicity relationships define a binary DNA code, which determines the general biophysical characteristics of proteins. Sense and antisense strands can encode protein segments having inverted and complementary hydropathy. The underlying binary code controls association and dissociation of proteins and presumably represents a primordial code that might have emerged in the early stages of self-organizing biochemical cycles. It is the purpose of this communication to provide a perspective of the code in the context of a binary language from its primordial origin to its present day format and to propose to use this code as a genomic mining tool.

16.
17.
REVIGO summarizes and visualizes long lists of gene ontology terms   Total citations: 1, self-citations: 0, citations by others: 1
Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). The resulting lists of GO terms may be large and highly redundant, and thus difficult to interpret. REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures. Furthermore, REVIGO visualizes this non-redundant GO term set in multiple ways to assist in interpretation: multidimensional scaling and graph-based visualizations accurately render the subdivisions and the semantic relationships in the data, while treemaps and tag clouds are also offered as alternative views. REVIGO is freely available at http://revigo.irb.hr/.
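A simplified greedy version of the redundancy-reduction step is sketched below, assuming a precomputed semantic-similarity function and a hypothetical cutoff of 0.7. Terms are visited from most to least significant and attached to the most similar representative already kept; REVIGO's actual selection rules are richer than this, and the term IDs and similarity values are made up.

```python
def reduce_go_terms(pvalues, similarity, cutoff=0.7):
    """Greedy redundancy reduction of an enriched GO term list.

    pvalues: term -> enrichment p-value
    similarity: function(term_a, term_b) -> semantic similarity in [0, 1]
    Returns representative -> list of terms it stands for (representative first).
    """
    clusters: dict[str, list[str]] = {}
    for term in sorted(pvalues, key=lambda t: (pvalues[t], t)):  # most significant first
        close = [rep for rep in clusters if similarity(term, rep) > cutoff]
        if close:
            best = max(close, key=lambda rep: similarity(term, rep))
            clusters[best].append(term)  # redundant: absorbed by an existing representative
        else:
            clusters[term] = [term]  # sufficiently distinct: becomes a new representative
    return clusters


# Hypothetical terms and pairwise similarities.
pvalues = {"GO:a": 1e-8, "GO:b": 1e-6, "GO:c": 1e-4}
sims = {
    frozenset({"GO:a", "GO:b"}): 0.9,
    frozenset({"GO:a", "GO:c"}): 0.1,
    frozenset({"GO:b", "GO:c"}): 0.2,
}


def similarity(x, y):
    return sims[frozenset({x, y})]


print(reduce_go_terms(pvalues, similarity))
```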

18.
19.
BisoGenet: a new tool for gene network building, visualization and analysis   Total citations: 1, self-citations: 0, citations by others: 1

Background  

The increasing availability and diversity of omics data in the post-genomic era offers new perspectives in most areas of biomedical research. Graph-based biological network models capture the topology of the functional relationships between molecular entities such as genes, proteins and small compounds, and provide a suitable framework for integrating and analyzing omics data. The development of software tools capable of integrating data from different sources and providing flexible methods to reconstruct, represent and analyze topological networks is an active field of research in bioinformatics.

20.
Admixture mapping is a rapidly developing method to map susceptibility alleles in complex genetic disease associated with continental ancestry. Theoretically, when admixture between continental populations has occurred relatively recently, the chromosomal segments derived from the parental populations can be deduced from the differences in genotype allele frequencies. Progress in computational algorithms, in identification of ancestry informative single nucleotide polymorphisms, and in recent studies applying these tools suggests that this approach will complement other strategies for identifying the variation that underlies many complex diseases.
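The principle of inferring ancestry from allele-frequency differences can be sketched for one chromosomal segment, assuming independent SNPs, a constant ancestry state across the segment, and allele frequencies strictly between 0 and 1 in both parental populations. Practical admixture-mapping tools typically model ancestry changes along the chromosome (for example with hidden Markov models); this sketch does not, and the frequencies used are hypothetical.

```python
from math import log


def genotype_prob(g: int, p_a: float, p_b: float) -> float:
    """P(g copies of the reference allele) when one allele is drawn with frequency p_a, the other with p_b."""
    if g == 2:
        return p_a * p_b
    if g == 0:
        return (1 - p_a) * (1 - p_b)
    return p_a * (1 - p_b) + p_b * (1 - p_a)


def segment_ancestry(genotypes, freq1, freq2):
    """Log-likelihood that 2, 1 or 0 of the segment's two copies derive from population 1.

    genotypes: reference-allele counts (0, 1, 2) per SNP
    freq1, freq2: reference-allele frequencies per SNP in populations 1 and 2
    Returns (most likely number of population-1 copies, {copies: log-likelihood}).
    """
    states = {2: (freq1, freq1), 1: (freq1, freq2), 0: (freq2, freq2)}
    loglik = {
        copies: sum(log(genotype_prob(g, a, b)) for g, a, b in zip(genotypes, fa, fb))
        for copies, (fa, fb) in states.items()
    }
    return max(loglik, key=loglik.get), loglik


# Hypothetical ancestry-informative SNPs with strongly differing frequencies.
freq_pop1, freq_pop2 = [0.9] * 4, [0.1] * 4
print(segment_ancestry([1, 1, 1, 1], freq_pop1, freq_pop2)[0])
```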
