Similar literature
20 similar articles found.
1.
Many important questions in biology are, fundamentally, comparative, and this extends to our analysis of a growing number of sequenced genomes. Existing genomic analysis tools are often organized around literal views of genomes as linear strings. Even when information is highly condensed, these views grow cumbersome as larger numbers of genomes are added. Data aggregation and summarization methods from the field of visual analytics can provide abstracted comparative views, suitable for sifting large multi-genome datasets to identify critical similarities and differences. We introduce a software system for visual analysis of comparative genomics data. The system automates the process of data integration, and provides the analysis platform to identify and explore features of interest within these large datasets. GenoSets borrows techniques from business intelligence and visual analytics to provide a rich interface of interactive visualizations supported by a multi-dimensional data warehouse. In GenoSets, visual analytic approaches are used to enable querying based on orthology, functional assignment, and taxonomic or user-defined groupings of genomes. GenoSets links this information together with coordinated, interactive visualizations for both detailed and high-level categorical analysis of summarized data. GenoSets has been designed to simplify the exploration of multiple genome datasets and to facilitate reasoning about genomic comparisons. Case examples are included showing the use of this system in the analysis of 12 Brucella genomes. GenoSets software and the case study dataset are freely available at http://genosets.uncc.edu. We demonstrate that the integration of genomic data using a coordinated multiple view approach can simplify the exploration of large comparative genomic data sets, and facilitate reasoning about comparisons and features of interest.

2.
3.
Modern technologies have rapidly transformed biology into a data-intensive discipline. In addition to the enormous amounts of existing experimental data in the literature, every new study can produce a large amount of new data, resulting in novel ideas and more publications. In order to understand a biological process as completely as possible, scientists should be able to combine and analyze all such information. Not only molecular biology and bioinformatics, but all the other domains of biology including plant biology, require tools and technologies that enable experts to capture knowledge within distributed and heterogeneous sources of information. Ontologies have proven to be one of the most useful means of constructing and formalizing expert knowledge. The key feature of an ontology is that it represents a computer-interpretable model of a particular subject area. This article outlines the importance of ontologies for systems biology, data integration and information analyses, as illustrated through the example of reactive oxygen species (ROS) signaling networks in plants.

4.
A prerequisite to systems biology is the integration of heterogeneous experimental data, which are stored in numerous life-science databases. However, a wide range of obstacles that relate to access, handling and integration impede the efficient use of the contents of these databases. Addressing these issues will not only be essential for progress in systems biology but will also be crucial for sustaining the more traditional uses of life-science databases.

5.
6.
Bornaviruses are the only animal RNA viruses that establish a persistent infection in their host cell nucleus. Studies of bornaviruses have provided unique information about viral replication strategies and virus–host interactions. Although bornaviruses do not integrate into the host genome during their replication cycle, we and others have recently reported that there are DNA sequences derived from the mRNAs of ancient bornaviruses in the genomes of vertebrates, including humans, and these have been designated endogenous borna-like (EBL) elements. Therefore, bornaviruses have been interacting with their hosts as driving forces in the evolution of host genomes in a previously unexpected way. Studies of EBL elements have provided new models for virology, evolutionary biology and general cell biology. In this review, we summarize the data on EBL elements, including those we have newly identified in eukaryotic genomes, and discuss the biological significance of EBL elements, with a focus on EBL nucleoprotein elements in mammalian genomes. Surprisingly, EBL elements were detected in the genomes of invertebrates, suggesting that the host range of bornaviruses may be much wider than previously thought. We also review our new data on non-retroviral integration of Borna disease virus.

7.
The functioning of even a simple biological system is much more complicated than the sum of its genes, proteins and metabolites. A premise of systems biology is that molecular profiling will facilitate the discovery and characterization of important disease pathways. However, as multiple levels of effector pathway regulation appear to be the norm rather than the exception, a significant challenge presented by high-throughput genomics and proteomics technologies is the extraction of the biological implications of complex data. Thus, integration of heterogeneous types of data generated from diverse global technology platforms represents the first challenge in developing the necessary foundational databases needed for predictive modelling of cell and tissue responses. Given the apparent difficulty in defining the correspondence between gene expression and protein abundance measured in several systems to date, how do we make sense of these data and design the next experiment? In this review, we highlight current approaches and challenges associated with integration and analysis of heterogeneous data sets, focusing on global analysis obtained from high-throughput technologies.

8.
9.
Adeno-associated virus (AAV) replication and biology have been extensively studied using cell culture systems, but there is precious little known about AAV biology in natural hosts. As part of our ongoing interest in the in vivo biology of AAV, we previously described the existence of extrachromosomal proviral AAV genomes in human tissues. In the current work, we describe the molecular structure of infectious DNA clones derived directly from these tissues. Sequence-specific linear rolling-circle amplification was utilized to isolate clones of native circular AAV DNA. Several molecular clones containing unit-length viral genomes directed the production of infectious wild-type AAV upon DNA transfection in the presence of adenovirus help. DNA sequence analysis of the molecular clones revealed the ubiquitous presence of a double-D inverted terminal repeat (ITR) structure, which implied a mechanism by which the virus is able to maintain ITR sequence continuity and persist in the absence of host chromosome integration. These data suggest that the natural life cycle of AAV, unlike that of retroviruses, might not have genome integration as an obligatory component.

10.

Background  

The integration of genomic information with quantitative experimental data is a key component of systems biology. An increasing number of microbial genomes are being sequenced, leading to an increasing amount of data from post-genomics technologies. The genomes of prokaryotes contain many structures of interest, such as operons, pathogenicity islands and prophage sequences, whose behaviour is of interest during infection and disease. There is a need for simple and novel tools to display and analyse data from these integrated datasets, and we have developed ProGenExpress as a tool for visualising arbitrarily complex numerical data in the context of prokaryotic genomes.

11.

Background

The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes.

Results

We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects of the unique lifestyle and adaptations of these organisms to extreme environments.

Conclusions

We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is intended to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.

12.
The understanding of the biology and the biochemistry of malaria parasites has considerably increased over the past two decades with the discovery of many potential targets for new antimalarial drugs. The decrypted genomes of several Plasmodium species and the new post-genomic tools further enriched our "reservoir" of targets and increased our ability to validate potential drug targets or to study the entire parasite metabolism. This review discusses targets involved in calcium metabolism, protein prenylation and apicoplast functions that have emerged by different approaches.

13.
14.
MOTIVATION: Natural language processing (NLP) techniques are increasingly being used in biology to automate the capture of new biological discoveries in text, which are being reported at a rapid rate. Yet, information represented in NLP data structures is classically very different from information organized with ontologies as found in model organisms or genetic databases. To facilitate the computational reuse and integration of information buried in unstructured text with that of genetic databases, we propose and evaluate a translational schema that represents a comprehensive set of phenotypic and genetic entities, as well as their closely related biomedical entities and relations as expressed in natural language. In addition, the schema connects different scales of biological information, and provides mappings from the textual information to existing ontologies, which are essential in biology for integration, organization, dissemination and knowledge management of heterogeneous phenotypic information. A common comprehensive representation for otherwise heterogeneous phenotypic and genetic datasets, such as the one proposed, is critical for advancing systems biology because it enables acquisition and reuse of unprecedented volumes of diverse types of knowledge and information from text. RESULTS: A novel representational schema, PGschema, was developed that enables translation of phenotypic, genetic and their closely related information found in textual narratives to a well-defined data structure comprising phenotypic and genetic concepts from established ontologies along with modifiers and relationships. Evaluation for coverage of a selected set of entities showed that 90% of the information could be represented (95% confidence interval: 86-93%; n = 268). Moreover, PGschema can be expressed automatically in an XML format using natural language techniques to process the text. To our knowledge, we are providing the first evaluation of a translational schema for NLP that contains declarative knowledge about genes and their associated biomedical data (e.g. phenotypes). AVAILABILITY: http://zellig.cpmc.columbia.edu/PGschema
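The coverage figure reported in this abstract (90%; 95% confidence interval: 86-93%; n = 268) can be roughly reproduced. The abstract states neither the raw count nor the interval method, so the sketch below assumes 241 of 268 entities (about 90%) and a Wilson score interval. Both are illustrative assumptions, not details taken from the paper, and the function name is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumed count: the abstract gives only "90%" and n = 268, not the numerator.
represented, total = 241, 268
low, high = wilson_interval(represented, total)
print(f"coverage {represented / total:.1%}, 95% CI {low:.1%} to {high:.1%}")
```

Under these assumptions the bounds round to 86% and 93%, which is consistent with the interval quoted in the abstract. A different numerator or interval method would shift the bounds slightly.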

15.
The sequencing of various genomes has inaugurated a new stage in the understanding of normal and pathological cell function through the analysis of the role of proteins. It is proteins, after all, that intervene in the different molecular mechanisms of life, during growth, reproduction, and in the interaction between cells, thus making it possible to describe the biology of integrated systems. In this article, we briefly describe the various stages in the progression of our knowledge, from the genome to the "functional" proteome. Emphasis is placed on a global approach to the protein-protein interactions used to describe the cellular "interactome".

16.

Background

Synthetic biology aims to engineer biological systems for desired behaviors. The construction of these systems can be complex, often requiring genetic reprogramming, extensive de novo DNA synthesis, and functional screening.

Results

Herein, we present a programmable, multipurpose microfluidic platform and associated software and apply the platform to major steps of the synthetic biology research cycle: design, construction, testing, and analysis. We show the platform’s capabilities for multiple automated DNA assembly methods, including a new method for Isothermal Hierarchical DNA Construction, and for Escherichia coli and Saccharomyces cerevisiae transformation. The platform enables the automated control of cellular growth, gene expression induction, and proteogenic and metabolic output analysis.

Conclusions

Taken together, we demonstrate the microfluidic platform’s potential to provide end-to-end solutions for synthetic biology research, from design to functional analysis.

17.
Junqueira M, Carvalho PC. Proteomics, 2012, 12(17): 2601-2606
Our current knowledge in biology has been mostly derived from studying model organisms and cell lines; only a small fraction of all described species has been extensively studied. Although these model organisms are amenable to genetic manipulations, this blinds researchers to the true variability of life. Groundbreaking discoveries are often achieved by analyzing "noncanonical" species; for example, the characterization of Taq polymerase from Thermus aquaticus ultimately led to a revolution in the field of molecular biology. Brazil possesses a rich biodiversity and a considerable fraction of Brazilian groups use current proteomic techniques to explore this natural treasure-trove. However, in our opinion, much more than the widely adopted peptide spectrum match approach is required to explore this rich "proteomosphere." Here, we provide a critical overview of the available strategies for the analysis of proteomic data from "noncanonical" biological samples (e.g. proteins from unsequenced genomes or genomes with high levels of polymorphisms), and demonstrate some limitations of existing approaches for large-scale protein identification and quantitation. An understanding of the premises behind these computational tools is necessary to properly deal with their limitations and draw accurate conclusions.

18.
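Synthetic biology is an emerging discipline of 21st-century biology. It focuses on combining biological science with engineering science, treating biological systems as engineered systems to be handled "bottom-up": cellular components and biological systems are designed, modified and assembled from "units" to "devices" to "systems". Synthetic biology is the product of the intersection of many disciplines, including molecular and cell biology, evolutionary systematics, biochemistry, informatics, mathematics, computer science and engineering. Current research and applications fall into two main areas. The first is to redesign and re-engineer existing, naturally occurring biological systems so that they acquire new functions. The second is to design and construct new biological parts, devices and systems, creating artificial living systems that do not yet exist in nature. As a discipline built on genomic methods, synthetic biology mainly emphasizes the coordinated integration of computational and experimental biology for creating artificial life forms. It must be stressed that the component units used to build new structures in living systems and to generate new functions can be biological components such as genes and nucleic acids, or chemical, mechanical and physical elements. This article follows research and applications in synthetic biology and reviews progress in DNA-level programming, molecular modification, metabolic pathways, regulatory networks and industrial biotechnology.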

19.
Advanced proteomic research efforts involving areas such as systems biology or biomarker discovery are enabled by the use of high-level informatics tools that allow the effective analysis of large quantities of differing types of data originating from various studies. Performing such analyses on a large scale is not feasible without a computational platform that performs data processing and management tasks. Such a platform must be able to provide high-throughput operation while having sufficient flexibility to accommodate evolving data analysis tools and methodologies. The Proteomics Research Information Storage and Management system (PRISM) provides a platform that serves the needs of the accurate mass and time tag approach developed at Pacific Northwest National Laboratory. PRISM incorporates a diverse set of analysis tools and allows a wide range of operations to be incorporated by using a state machine that is accessible to independent, distributed computational nodes. The system has scaled well as data volume has increased over several years, while allowing adaptability for incorporating new and improved data analysis tools for more effective proteomics research.

20.
MOTIVATION: In the post-genomic era, biologists interested in systems biology often need to import data from public databases and construct their own system-specific or subject-oriented databases to support their complex analysis and knowledge discovery. To facilitate the analysis and data processing, customized and centralized databases are often created by extracting and integrating heterogeneous data retrieved from public databases. A generalized methodology for accessing, extracting, transforming and integrating the heterogeneous data is needed. RESULTS: This paper presents a new data integration approach named JXP4BIGI (Java XML Page for Biological Information Gathering and Integration). The approach provides a system-independent framework, which generalizes and streamlines the steps of accessing, extracting, transforming and integrating the data retrieved from heterogeneous data sources to build a customized data warehouse. It allows the data integrator of a biological database to define the desired bio-entities in XML templates (or Java XML pages), and use embedded extended SQL statements to extract structured, semi-structured and unstructured data from public databases. By running the templates in the JXP4BIGI framework and using a number of generalized wrappers, the required data from public databases can be efficiently extracted and integrated to construct the bio-entities in the XML format without having to hard-code the extraction logic for different data sources. The constructed XML bio-entities can then be imported into either a relational database system or a native XML database system to build a biological data warehouse. AVAILABILITY: JXP4BIGI has been integrated and tested in conjunction with the IKBAR system (http://www.ikbar.org/) in two integration efforts to collect and integrate data for about 200 human genes related to cell death from HUGO, Ensembl, and SWISS-PROT (Bairoch and Apweiler, 2000), and about 700 Drosophila genes from FlyBase (FlyBase Consortium, 2002). The integrated data has been used in comparative genomic analysis of x-ray induced cell death. Also, as explained later, JXP4BIGI is a middleware and framework to be integrated with biological database applications, and cannot run as stand-alone software for end users. For demonstration purposes, a demonstration version is accessible at (http://www.ikbar.org/jxp4bigi/demo.html).
