Found 20 similar articles; search took 15 ms.
1.
2.
Poultney CS Gutiérrez RA Katari MS Gifford ML Paley WB Coruzzi GM Shasha DE 《Bioinformatics (Oxford, England)》2007,23(2):259-261
Sungear is a software system that supports a rapid, visually interactive and biologist-driven comparison of large datasets. The datasets can come from microarray experiments (e.g. genes induced in each experiment), from comparative genomics (e.g. genes present in each genome) or even from non-biological applications (e.g. demographics or baseball statistics). Sungear represents multiple datasets as vertices in a polygon. Each possible intersection among the sets is represented as a circle inside the polygon. The position of the circle is determined by the position of the vertices represented in the intersection and the area of the circle is determined by the number of elements in the intersection. Sungear shows which Gene Ontology terms are over-represented in a subset of circles or anchors. The intuitive Sungear interface has enabled biologists to determine quickly which dataset or groups of datasets play a role in a biological function of interest. AVAILABILITY: A live online version of Sungear can be found at http://virtualplant-prod.bio.nyu.edu/cgi-bin/sungear/index.cgi
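The layout rule described in this abstract can be illustrated with a minimal sketch. It is not Sungear's actual code: the function names, the choice of the centroid of the participating vertices as the circle position, the grouping of each element by the exact set of datasets containing it, and the `area_per_element` scale are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def polygon_vertices(names, radius=1.0):
    """Place one vertex (anchor) per dataset evenly around a circle."""
    n = len(names)
    return {
        name: (radius * math.cos(2 * math.pi * i / n),
               radius * math.sin(2 * math.pi * i / n))
        for i, name in enumerate(names)
    }


def sungear_layout(datasets, area_per_element=0.01):
    """Return one circle per non-empty intersection of datasets.

    Assumption: each element is assigned to the exact set of datasets
    that contain it, and the circle sits at the centroid of those
    datasets' vertices. The circle area is proportional to the count.
    """
    anchors = polygon_vertices(sorted(datasets))

    # Map every element to the set of datasets it appears in.
    membership = defaultdict(set)
    for name, elements in datasets.items():
        for element in elements:
            membership[element].add(name)

    # Group elements that share exactly the same membership.
    groups = defaultdict(list)
    for element, names in membership.items():
        groups[frozenset(names)].append(element)

    layout = {}
    for names, members in groups.items():
        cx = sum(anchors[n][0] for n in names) / len(names)
        cy = sum(anchors[n][1] for n in names) / len(names)
        area = area_per_element * len(members)
        layout[names] = {
            "center": (cx, cy),
            "radius": math.sqrt(area / math.pi),
            "count": len(members),
        }
    return layout


# Hypothetical gene lists from three experiments.
example = {
    "A": {"g1", "g2", "g3", "g6"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g5"},
}
for names, circle in sungear_layout(example).items():
    print(sorted(names), circle["count"])
```

A circle for an intersection of all datasets lands at the polygon's center under this assumption, while single-dataset circles sit on the vertices themselves.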
3.
Background
Alignment and comparison of related genome sequences is a powerful method to identify regions likely to contain functional elements. Such analyses are data intensive, requiring the inclusion of genomic multiple sequence alignments, sequence annotations, and scores describing regional attributes of columns in the alignment. Visualization and browsing of results can be difficult, and there are currently limited software options for performing this task.
4.
Robertson N Oveisi-Fordorei M Zuyderduyn SD Varhol RJ Fjell C Marra M Jones S Siddiqui A 《Genome biology》2007,8(1):R6
DiscoverySpace is a graphical application for bioinformatics data analysis. Users can seamlessly traverse references between biological databases and draw together annotations in an intuitive tabular interface. Datasets can be compared using a suite of novel tools to aid in the identification of significant patterns. DiscoverySpace is of broad utility and its particular strength is in the analysis of serial analysis of gene expression (SAGE) data. The application is freely available online.
5.
Small genome sequencing and annotations are leading to the definition of metabolic genotypes in an increasing number of organisms. Proteomics is beginning to give insights into the use of the metabolic genotype under given growth conditions. These data sets give the basis for systemically studying the genotype-phenotype relationship. Methods of systems science need to be employed to analyze, interpret, and predict this complex relationship. These endeavors will lead to the development of a new field, tentatively named phenomics. This article illustrates how the metabolic characteristics of annotated small genomes can be analyzed using flux balance analysis (FBA). A general algorithm for the formulation of in silico metabolic genotypes is described. Illustrative analyses of the in silico Escherichia coli K-12 metabolic genotypes are used to show how FBA can be used to study the capabilities of this strain.
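For orientation, flux balance analysis is usually posed as a linear program. The form below is the standard textbook formulation, not one quoted from this abstract; the symbols are assumptions in conventional notation: S is the stoichiometric matrix, v the vector of reaction fluxes, c the objective weights (for example, a biomass reaction), and v_min, v_max the flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
```

The equality constraint expresses the steady-state assumption that no internal metabolite accumulates, and the bounds encode reaction reversibility and uptake limits.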
6.
Background
The Distributed Annotation System (DAS) offers a standard protocol for sharing and integrating annotations on biological sequences. There are more than 1000 DAS sources available and the number is steadily increasing. Clients are an essential part of the DAS system and integrate data from several independent sources in order to create a useful representation to the user. While web-based DAS clients exist, most of them do not have direct interaction capabilities such as dragging and zooming with the mouse.
Results
Here we present GenExp, a web-based and fully interactive visual DAS client. GenExp is a genome oriented DAS client capable of creating informative representations of genomic data zooming out from base level to complete chromosomes. It proposes a novel approach to genomic data rendering and uses the latest HTML5 web technologies to create the data representation inside the client browser. Thanks to client-side rendering most position changes do not need a network request to the server and so responses to zooming and panning are almost immediate. In GenExp it is possible to explore the genome intuitively moving it with the mouse just like geographical map applications. Additionally, in GenExp it is possible to have more than one data viewer at the same time and to save the current state of the application to revisit it later on.
Conclusions
GenExp is a new interactive web-based client for DAS and addresses some of the shortcomings of the existing clients. It uses client-side data rendering techniques resulting in easier genome browsing and exploration. GenExp is open source under the GPL license and it is freely available at http://gralggen.lsi.upc.edu/recerca/genexp.
7.
Background
Improvements in technology have been accompanied by the generation of large amounts of complex data. This same technology must be harnessed effectively if the knowledge stored within the data is to be retrieved. Storing data in ontologies aids its management; ontologies serve as controlled vocabularies that promote data exchange and re-use, improving analysis.
8.
An interactive multivariate analysis of FCM data (total citations: 1; self-citations: 0; citations by others: 1)
The procedure and results of the interactive multivariate analysis of FCM data are described. Using principal-components analysis, cluster analysis, and interactive maneuvers, this procedure facilitates an effective data compression from a four-dimensional space into two-dimensional space, then allows cluster separation. The procedure is especially effective for separating clusters, which are degenerated in the usual scattergrams. Programs were mostly written in C language on MS-DOS and were tested on four-dimensional analysis of the blood cells, which resulted in a successful separation of the degenerated clusters.
9.
10.
The interpretation of microarray and other high-throughput data is highly dependent on the biological context of experiments. However, standard analysis packages are poor at simultaneously presenting both the array and related bioinformatic data. We have addressed this challenge by developing a system springScape based on 'spring embedding' and an 'information landscape' allowing several related data sources to be dynamically combined while highlighting one particular feature. Each data source is represented as a network of nodes connected by weighted edges. The networks are combined and embedded in the 2-D plane by spring embedding such that nodes with a high similarity are drawn close together. Complex relationships can be discovered by varying the weight of each data source and observing the dynamic response of the spring network. By modifying Procrustes analysis, we find that the visualizations have an acceptable degree of reproducibility. The 'information landscape' highlights one particular data source, displaying it as a smooth surface whose height is proportional to both the information being viewed and the density of nodes. The algorithm is demonstrated using several microarray data sets in combination with protein-protein interaction data and GO annotations. Among the features revealed are the spatio-temporal profile of gene expression and the identification of GO terms correlated with gene expression and protein interactions. The power of this combined display lies in its interactive feedback and exploitation of human visual pattern recognition. Overall, springScape shows promise as a tool for the interpretation of microarray data in the context of relevant bioinformatic information.
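The idea of combining weighted networks and then spring-embedding them can be sketched roughly as follows. This is not springScape's algorithm: the weighted-average combination rule, the rest length of `1 - similarity`, the Hooke's-law update, the circular starting positions, and every function and parameter name are assumptions chosen to keep the example small.

```python
import math
from itertools import combinations


def combine_networks(networks, source_weights):
    """Merge several edge-weighted networks into one similarity map.

    Assumption: the combined similarity of a node pair is the
    weight-normalised sum of its similarities in each data source.
    """
    total = sum(source_weights.values())
    combined = {}
    for source, edges in networks.items():
        for (u, v), similarity in edges.items():
            key = tuple(sorted((u, v)))
            share = source_weights[source] * similarity / total
            combined[key] = combined.get(key, 0.0) + share
    return combined


def spring_embed(nodes, similarity, iterations=500, step=0.05):
    """Relax springs between all node pairs in the 2-D plane.

    Each pair is joined by a spring with rest length 1 - similarity,
    so highly similar nodes settle close together.
    """
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = {
        node: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, node in enumerate(nodes)
    }
    for _ in range(iterations):
        force = {node: [0.0, 0.0] for node in nodes}
        for u, v in combinations(nodes, 2):
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            dist = math.hypot(dx, dy) or 1e-9
            rest = 1.0 - similarity.get(tuple(sorted((u, v))), 0.0)
            pull = (dist - rest) / dist  # positive when stretched
            force[u][0] += pull * dx
            force[u][1] += pull * dy
            force[v][0] -= pull * dx
            force[v][1] -= pull * dy
        for node in nodes:
            pos[node][0] += step * force[node][0]
            pos[node][1] += step * force[node][1]
    return pos


# Hypothetical data sources: co-expression and protein interactions.
networks = {
    "expression": {("a", "b"): 0.9},
    "interaction": {("a", "b"): 0.8, ("c", "d"): 0.7},
}
weights = {"expression": 1.0, "interaction": 1.0}
layout = spring_embed(["a", "b", "c", "d"], combine_networks(networks, weights))
```

Changing the values in `weights` and re-running the embedding mimics, in a crude way, the interactive re-weighting of data sources that the abstract describes.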
11.
Random forests for genomic data analysis (total citations: 1; self-citations: 0; citations by others: 1)
Random forests (RF) is a popular tree-based ensemble machine learning tool that is highly data adaptive, applies to "large p, small n" problems, and is able to account for correlation as well as interactions among features. This makes RF particularly appealing for high-dimensional genomic data analysis. In this article, we systematically review the applications and recent progress of RF for genomic data, including prediction and classification, variable selection, pathway analysis, genetic association and epistasis detection, and unsupervised learning.
12.
Background
When publishing large-scale microarray datasets, it is of great value to create supplemental websites where either the full data, or selected subsets corresponding to figures within the paper, can be browsed. We set out to create a CGI application containing many of the features of some of the existing standalone software for the visualization of clustered microarray data.
Results
We present GeneXplorer, a web application for interactive microarray data visualization and analysis in a web environment. GeneXplorer allows users to browse a microarray dataset in an intuitive fashion. It provides simple access to microarray data over the Internet and uses only HTML and JavaScript to display graphic and annotation information. It provides radar and zoom views of the data, allows display of the nearest neighbors to a gene expression vector based on their Pearson correlations and provides the ability to search gene annotation fields.
Conclusions
The software is released under the permissive MIT Open Source license, and the complete documentation and the entire source code are freely available for download from CPAN http://search.cpan.org/dist/Microarray-GeneXplorer/.
13.
Understanding the evolution of gene function is a primary challenge of modern evolutionary biology. Despite an expanding database from genomic and developmental studies, we are lacking quantitative methods for analyzing the evolution of some important measures of gene function, such as gene-expression patterns. Here, we introduce phylogenetic comparative methods to compare different models of gene-expression evolution in a maximum-likelihood framework. We find that expression of duplicated genes has evolved according to a nonphylogenetic model, where closely related genes are no more likely than more distantly related genes to share common expression patterns. These results are consistent with previous studies that found rapid evolution of gene expression during the history of yeast. The comparative methods presented here are general enough to test a wide range of evolutionary hypotheses using genomic-scale data from any organism.
14.
We introduce and evaluate data analysis methods to interpret simultaneous measurement of multiple genomic features made on the same biological samples. Our tools use gene sets to provide an interpretable common scale for diverse genomic information. We show we can detect genetic effects, although they may act through different mechanisms in different samples, and show we can discover and validate important disease-related gene sets that would not be discovered by analyzing each data type individually.
15.
Increasing use of high-throughput genomic-scale assays requires effective visualization and analysis techniques to facilitate data interpretation. Moreover, existing tools often require programming skills, which discourages bench scientists from examining their own data. We have created iCanPlot, a compelling platform for visual data exploration based on the latest technologies. Using the recently adopted HTML5 Canvas element, we have developed a highly interactive tool to visualize tabular data and identify interesting patterns in an intuitive fashion without the need for any specialized computing skills. A module for geneset overlap analysis has been implemented on the Google App Engine platform: when the user selects a region of interest in the plot, the genes in the region are analyzed on the fly. The visualization and analysis are amalgamated for a seamless experience. Further, users can easily upload their data for analysis, which also makes it simple to share the analysis with collaborators. We illustrate the power of iCanPlot by showing an example of how it can be used to interpret histone modifications in the context of gene expression.
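The abstract does not say which statistic the geneset overlap module uses. A common choice for this kind of analysis is the one-sided hypergeometric test, sketched below as an assumption rather than as iCanPlot's method; the function names and the example gene identifiers are hypothetical.

```python
from math import comb


def overlap_p_value(universe_size, geneset_size, selected_size, overlap):
    """Probability of seeing at least `overlap` gene-set members when
    `selected_size` genes are drawn without replacement from a universe
    of `universe_size` genes, `geneset_size` of which are in the set."""
    total = comb(universe_size, selected_size)
    upper = min(geneset_size, selected_size)
    tail = sum(
        comb(geneset_size, k)
        * comb(universe_size - geneset_size, selected_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def geneset_overlap(selected, geneset, universe):
    """Return the shared genes and the hypergeometric tail p-value."""
    universe = set(universe)
    selected = set(selected) & universe
    geneset = set(geneset) & universe
    shared = selected & geneset
    p_value = overlap_p_value(
        len(universe), len(geneset), len(selected), len(shared)
    )
    return shared, p_value


# Hypothetical example: genes picked from a plot region vs. one gene set.
universe = [f"gene{i}" for i in range(10)]
region = ["gene0", "gene1", "gene2"]
gene_set = ["gene0", "gene1", "gene2"]
shared, p_value = geneset_overlap(region, gene_set, universe)
```

In practice such a test would be run against many gene sets at once, so a multiple-testing correction would normally follow.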
16.
Waters KM Liu T Quesenberry RD Willse AR Bandyopadhyay S Kathmann LE Weber TJ Smith RD Wiley HS Thrall BD 《PloS one》2012,7(3):e34515
To understand how integration of multiple data types can help decipher cellular responses at the systems level, we analyzed the mitogenic response of human mammary epithelial cells to epidermal growth factor (EGF) using whole genome microarrays, mass spectrometry-based proteomics and large-scale western blots with over 1000 antibodies. A time course analysis revealed significant differences in the expression of 3172 genes and 596 proteins, including protein phosphorylation changes measured by western blot. Integration of these disparate data types showed that each contributed qualitatively different components to the observed cell response to EGF and that varying degrees of concordance in gene expression and protein abundance measurements could be linked to specific biological processes. Networks inferred from individual data types were relatively limited, whereas networks derived from the integrated data recapitulated the known major cellular responses to EGF and exhibited more highly connected signaling nodes than networks derived from any individual dataset. While cell cycle regulatory pathways were altered as anticipated, we found the most robust response to mitogenic concentrations of EGF was induction of matrix metalloprotease cascades, highlighting the importance of the EGFR system as a regulator of the extracellular environment. These results demonstrate the value of integrating multiple levels of biological information to more accurately reconstruct networks of cellular response.
17.
MOTIVATION: Genome sequencing projects and high-throughput technologies like DNA and protein arrays have resulted in a very large amount of information-rich data. Microarray experimental data are a valuable but limited source for inferring gene regulation mechanisms on a genomic scale. Additional information such as promoter sequences of genes/DNA binding motifs, gene ontologies, and location data, when combined with gene expression analysis, can increase the statistical significance of the finding. This paper introduces a machine learning approach to information fusion for combining heterogeneous genomic data. The algorithm uses an unsupervised joint learning mechanism that identifies clusters of genes using the combined data. RESULTS: The correlation between gene expression time-series patterns obtained from different experimental conditions and the presence of several distinct and repeated motifs in their upstream sequences is examined here using publicly available yeast cell-cycle data. The results show that the combined learning approach taken here identifies correlated genes effectively. The algorithm provides an automated clustering method, but allows the user to specify a priori the influence of each data type on the final clustering using probabilities. AVAILABILITY: Software code is available by request from the first author. CONTACT: jkasturi@cse.psu.edu.
18.
Diego Jarquín José Crossa Xavier Lacaze Philippe Du Cheyron Joëlle Daucourt Josiane Lorgeou François Piraux Laurent Guerreiro Paulino Pérez Mario Calus Juan Burgueño Gustavo de los Campos 《TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik》2014,127(3):595-607
Key message
New methods that incorporate the main and interaction effects of high-dimensional markers and of high-dimensional environmental covariates gave increased prediction accuracy of grain yield in wheat across and within environments.
Abstract
In most agricultural crops the effects of genes on traits are modulated by environmental conditions, leading to genetic by environmental interaction (G × E). Modern genotyping technologies allow characterizing genomes in great detail and modern information systems can generate large volumes of environmental data. In principle, G × E can be accounted for using interactions between markers and environmental covariates (ECs). However, when genotypic and environmental information is high dimensional, modeling all possible interactions explicitly becomes infeasible. In this article we show how to model interactions between high-dimensional sets of markers and ECs using covariance functions. The model presented here consists of (random) reaction norm where the genetic and environmental gradients are described as linear functions of markers and of ECs, respectively. We assessed the proposed method using data from Arvalis, consisting of 139 wheat lines genotyped with 2,395 SNPs and evaluated for grain yield over 8 years and various locations within northern France. A total of 68 ECs, defined based on five phases of the phenology of the crop, were used in the analysis. Interaction terms accounted for a sizable proportion (16 %) of the within-environment yield variance, and the prediction accuracy of models including interaction terms was substantially higher (17–34 %) than that of models based on main effects only. Breeding for target environmental conditions has become a central priority of most breeding programs. Methods, like the one presented here, that can capitalize upon the wealth of genomic and environmental information available, will become increasingly important.
19.
Kivioja T Arvas M Saloheimo M Penttilä M Ukkonen E 《Bioinformatics (Oxford, England)》2005,21(11):2573-2579
20.
《Expert review of proteomics》2013,10(1):67-75
The rapid expansion of methods for measuring biological data ranging from DNA sequence variations to mRNA expression and protein abundance presents the opportunity to utilize multiple types of information jointly in the study of human health and disease. Organisms are complex systems that integrate inputs at myriad levels to arrive at an observable phenotype. Therefore, it is essential that questions concerning the etiology of phenotypes as complex as common human diseases take the systemic nature of biology into account, and integrate the information provided by each data type in a manner analogous to the operation of the body itself. While limited in scope, the initial forays into the joint analysis of multiple data types have yielded interesting results that would not have been reached had only one type of data been considered. These early successes, along with the aforementioned theoretical appeal of data integration, provide impetus for the development of methods for the parallel, high-throughput analysis of multiple data types. The idea that the integrated analysis of multiple data types will improve the identification of biomarkers of clinical endpoints, such as disease susceptibility, is presented as a working hypothesis.