期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Lifting Industrial Ecology Modeling to a New Level of Quality and Transparency: A Call for More Transparent Publications and a Collaborative Open Source Software Framework

下载免费PDF全文

Stefan Pauliuk Guillaume Majeau‐Bettez Christopher L. Mutel Bernhard Steubing Konstantin Stadler 《Journal of Industrial Ecology》2015,19(6):937-949

Industrial ecology (IE) is a maturing scientific discipline. The field is becoming more data and computation intensive, which requires IE researchers to develop scientific software to tackle novel research questions. We review the current state of software programming and use in our field and find challenges regarding transparency, reproducibility, reusability, and ease of collaboration. Our response to that problem is fourfold: First, we propose how existing general principles for the development of good scientific software could be implemented in IE and related fields. Second, we argue that collaborating on open source software could make IE research more productive and increase its quality, and we present guidelines for the development and distribution of such software. Third, we call for stricter requirements regarding general access to the source code used to produce research results and scientific claims published in the IE literature. Fourth, we describe a set of open source modules for standard IE modeling tasks that represent our first attempt at turning our recommendations into practice. We introduce a Python toolbox for IE that includes the life cycle assessment (LCA) framework Brightway2, the ecospold2matrix module that parses unallocated data in ecospold format, the pySUT and pymrio modules for building and analyzing multiregion input‐output models and supply and use tables, and the dynamic_stock_model class for dynamic stock modeling. Widespread use of open access software can, at the same time, increase quality, transparency, and reproducibility of IE research. 相似文献

2.

''The surface management system'' (SuMS) database: a surface-based database to aid cortical surface reconstruction, visualization and analysis.

J Dickson H Drury D C Van Essen 《Philosophical transactions of the Royal Society of London. Series B, Biological sciences》2001,356(1412):1277-1292

Surface reconstructions of the cerebral cortex are increasingly widely used in the analysis and visualization of cortical structure, function and connectivity. From a neuroinformatics perspective, dealing with surface-related data poses a number of challenges. These include the multiplicity of configurations in which surfaces are routinely viewed (e.g. inflated maps, spheres and flat maps), plus the diversity of experimental data that can be represented on any given surface. To address these challenges, we have developed a surface management system (SuMS) that allows automated storage and retrieval of complex surface-related datasets. SuMS provides a systematic framework for the classification, storage and retrieval of many types of surface-related data and associated volume data. Within this classification framework, it serves as a version-control system capable of handling large numbers of surface and volume datasets. With built-in database management system support, SuMS provides rapid search and retrieval capabilities across all the datasets, while also incorporating multiple security levels to regulate access. SuMS is implemented in Java and can be accessed via a Web interface (WebSuMS) or using downloaded client software. Thus, SuMS is well positioned to act as a multiplatform, multi-user 'surface request broker' for the neuroscience community. 相似文献

3.

Nullius in Verba1: Advancing Data Transparency in Industrial Ecology

下载免费PDF全文

Edgar Hertwich Niko Heeren Brandon Kuczenski Guillaume Majeau‐Bettez Rupert J. Myers Stefan Pauliuk Konstantin Stadler Reid Lifset 《Journal of Industrial Ecology》2018,22(1):6-17

With the growth of the field of industrial ecology (IE), research and results have increased significantly leading to a desire for better utilization of the accumulated data in more sophisticated analyses. This implies the need for greater transparency, accessibility, and reusability of IE data, paralleling the considerable momentum throughout the sciences. The Data Transparency Task Force (DTTF) was convened by the governing council of the International Society for Industrial Ecology in late 2016 to propose best‐practice guidelines and incentives for sharing data. In this article, the members of the DTTF present an overview of developments toward transparent and accessible data within the IE community and more broadly. We argue that increased transparency, accessibility, and reusability of IE data will enhance IE research by enabling more detailed and reproducible research, and also facilitate meta‐analyses. These benefits will make the results of IE work more timely. They will enable independent verification of results, thus increasing their credibility and quality. They will also make the uptake of IE research results easier within IE and in other fields as well as by decision makers and sustainability practitioners, thus increasing the overall relevance and impact of the field. Here, we present two initial actions intended to advance these goals: (1) a minimum publication requirement for IE research to be adopted by the Journal of Industrial Ecology; and (2) a system of optional data openness badges rewarding journal articles that contain transparent and accessible data. These actions will help the IE community to move toward data transparency and accessibility. We close with a discussion of potential future initiatives that could build on the minimum requirements and the data openness badge system. 相似文献

4.

SFMN GeoSearch: An interactive approach to the visualization and exchange of point-based ecological data

Rodolphe Gonzales Jeffrey A. Cardille Lael Parrott Caroline Gaudreau Gaël Deest 《Ecological Informatics》2009,4(4):196-205

Recent advances in computer networks and information technologies have created exciting new possibilities for sharing and analyzing scientific research data. Although individual datasets can be studied efficiently, many scientists are still largely limited to considering data collected by themselves, their students, or closely affiliated research groups. Increasingly widespread high-speed network connections and the existence of large, coordinated research programs suggest the potential for scientists to access and learn from data from outside their immediate research circle. We are developing a web-based application that facilitates the sharing of scientific data within a research network using the now-common “virtual globe” in combination with advanced visualization methods designed for geographically distributed scientific data. Two major components of the system enable the rapid assessment of geographically distributed scientific data: a database built from information submitted by network members, and a module featuring novel and sophisticated geographic data visualization techniques. By enabling scientists to share results with each other and view their shared data through a common virtual-globe interface, the system provides a new platform for important meta-analyses and the analysis of broad-scale patterns. Here we present the design and capabilities of the SFMN GeoSearch platform for the Sustainable Forest Management Network, a pan-Canadian network of forest researchers who have accumulated data for more than a decade. Through the development and dissemination of this new tool, we hope to help scientists, students, and the general public to understand the depth and breadth of scientific data across potentially large areas. 相似文献

5.

Effective and efficient data sampling using bitmap indices

Yu Su Gagan Agrawal Jonathan Woodring Kary Myers Joanne Wendelberger James Ahrens 《Cluster computing》2014,17(4):1081-1100

With growing computational capabilities of parallel machines, scientific simulations are being performed at finer spatial and temporal scales, leading to a data explosion. The growing sizes are making it extremely hard to store, manage, disseminate, analyze, and visualize these datasets, especially as neither the memory capacity of parallel machines, memory access speeds, nor disk bandwidths are increasing at the same rate as the computing power. Sampling can be an effective technique to address the above challenges, but it is extremely important to ensure that dataset characteristics are preserved, and the loss of accuracy is within acceptable levels. In this paper, we address the data explosion problems by developing a novel sampling approach, and implementing it in a flexible system that supports server-side sampling and data subsetting. We observe that to allow subsetting over scientific datasets, data repositories are likely to use an indexing technique. Among these techniques, we see that bitmap indexing can not only effectively support subsetting over scientific datasets, but can also help create samples that preserve both value and spatial distributions over scientific datasets. We have developed algorithms for using bitmap indices to sample datasets. We have also shown how only a small amount of additional metadata stored with bitvectors can help assess loss of accuracy with a particular subsampling level. Some of the other properties of this novel approach include: (1) sampling can be flexibly applied to a subset of the original dataset, which may be specified using a value-based and/or a dimension-based subsetting predicate, and (2) no data reorganization is needed, once bitmap indices have been generated. We have extensively evaluated our method with different types of datasets and applications, and demonstrated the effectiveness of our approach. 相似文献

6.

Development of a Data Management Tool for Investigating Multivariate Space and Free Will Experiences in Virtual Reality

Morie JF Iyer K Luigi DP Williams J Dozois A Rizzo AS 《Applied psychophysiology and biofeedback》2005,30(3):319-331

Virtual reality (VR) has become mature enough to be successfully used in clinical applications such as exposure therapy, pain distraction, and neuropsychological assessment. However, we now need to go beyond the outcome data from this research and conduct the detailed scientific investigations required to better understand what factors influence why VR works (or doesn’t) in these types of clinical applications. This knowledge is required to guide the development of VR applications in the key areas of education, training, and rehabilitation and to further evolve existing VR approaches. One of the primary assets obtained with the use of VR is the ability to simulate the complexity of real world environments, within which human performance can be tested and trained. But this asset comes with a price in terms of the capture, quantification and analysis of large, multivariate and concurrent data sources that reflect the naturalistic behavioral interaction that is afforded in a virtual world. As well, while achieving realism has been a main goal in making convincing VR environments, just what constitutes realism and how much is needed is still an open question situated firmly in the research domain. Just as in real “reality,” such factors in virtual reality are complex and multivariate, and the understanding of this complexity presents exceptional challenges to the VR researcher. For certain research questions, good behavioral science often requires consistent delivery of stimuli within tightly controlled lab-based experimental conditions. However, for other important research questions we do not want to constrain naturalistic behavior and limit VR’s ability to replicate real world conditions, simply because it is easier to study human performance with traditional lab-based methodologies. By doing so we may compromise the very qualities that comprise VR’s unique capacity to mimic the experiences and challenges that exist in everyday life. What is really needed to address scientific questions that require natural exploration of a simulated environment are more usable and robust tools to instrument, organize, and visualize the complex data generated by measurements of participant behaviors within a virtual world. This paper briefly describes the rationale and methodology of an initial study in an ongoing research program that aims to investigate human performance within a virtual environment where unconstrained “free will” exploratory behavior is essential to research questions that involve the relationships between physiology, emotion, and memory. After a discussion of the research protocol and the types of data that were collected, we describe a novel tool that was borne from our need to more efficiently capture, manage, and explore the complex data that was generated in this research. An example of a research participant’s annotated display from this data management and visualization tool is then presented. It is our view that this tool provides the capacity to better visualize and understand the complex data relationships that may arise in VR research that investigates naturalistic free will behavior and its impact on other human performance variables. 相似文献

7.

Virtual Reality: Beyond Visualization

《Journal of molecular biology》2019,431(7):1315-1321

Virtual reality (VR) has recently become an affordable technology. A wide range of options are available to access this unique visualization medium, from simple cardboard inserts for smartphones to truly advanced headsets tracked by external sensors. While it is now possible for any research team to gain access to VR, we can still question what it brings to scientific research. Visualization and the ability to navigate complex three-dimensional data are undoubtedly a gateway to many scientific applications; however, we are convinced that data treatment and numerical simulations, especially those mixing interactions with data, human cognition, and automated algorithms will be the future of VR in scientific research. Moreover, VR might soon merit the same level of attention to imaging data as machine learning currently has. In this short perspective, we discuss approaches that employ VR in scientific research based on some concrete examples. 相似文献

8.

Statistical Application and Challenges in Global Gel-Free Proteomic Analysis by Mass Spectrometry

Lei Nie Gang Wu 《Critical reviews in biotechnology》2013,33(4):297-307

Global gel-free proteomic analysis by mass spectrometry has been widely used as an important tool for exploring complex biological systems at the whole genome level. Simultaneous analysis of a large number of protein species is a complicated and challenging task. The challenges exist throughout all stages of a global gel-free proteomic analysis: experimental design, peptide/protein identification, data preprocessing and normalization, and inferential analysis. In addition to various efforts to improve the analytical technologies, statistical methodologies have been applied in all stages of proteomic analyses to help extract relevant information efficiently from large proteomic datasets. In this review, we summarize current applications of statistics in several stages of global gel-free proteomic analysis by mass spectrometry. We discuss the challenges associated with the applications of various statistical tools. Whenever possible, we also propose potential solutions on how to improve the data collection and interpretation for mass-spectrometry-based global proteomic analysis using more sophisticated and/or novel statistical approaches. 相似文献

9.

UTOPIA-User-Friendly Tools for Operating Informatics Applications

Pettifer SR Sinnott JR Attwood TK 《Comparative and Functional Genomics》2004,5(1):56-60

相似文献

10.

Population-ethnic group specific genome variation allele frequency data: a querying and visualization journey

Viennas E Gkantouna V Ioannou M Georgitsi M Rigou M Poulas K Patrinos GP Tzimas G 《Genomics》2012,100(2):93-101

National/ethnic mutation databases aim to document the genetic heterogeneity in various populations and ethnic groups worldwide. We have previously reported the development and upgrade of FINDbase (www.findbase.org), a database recording causative mutations and pharmacogenomic marker allele frequencies in various populations around the globe. Although this database has recently been upgraded, we continuously try to enhance its functionality by providing more advanced visualization tools that would further assist effective data querying and comparisons. We are currently experimenting in various visualization techniques on the existing FINDbase causative mutation data collection aiming to provide a dynamic research tool for the worldwide scientific community. We have developed an interactive web-based application for population-based mutation data retrieval. It supports sophisticated data exploration allowing users to apply advanced filtering criteria upon a set of multiple views of the underlying data collection and enables browsing the relationships between individual datasets in a novel and meaningful way. 相似文献

11.

A practical guide to interpreting and generating bottom-up proteomics data visualizations

Julia Patricia Schessner Eugenia Voytik Isabell Bludau 《Proteomics》2022,22(8):2100103

Mass-spectrometry based bottom-up proteomics is the main method to analyze proteomes comprehensively and the rapid evolution of instrumentation and data analysis has made the technology widely available. Data visualization is an integral part of the analysis process and it is crucial for the communication of results. This is a major challenge due to the immense complexity of MS data. In this review, we provide an overview of commonly used visualizations, starting with raw data of traditional and novel MS technologies, then basic peptide and protein level analyses, and finally visualization of highly complex datasets and networks. We specifically provide guidance on how to critically interpret and discuss the multitude of different proteomics data visualizations. Furthermore, we highlight Python-based libraries and other open science tools that can be applied for independent and transparent generation of customized visualizations. To further encourage programmatic data visualization, we provide the Python code used to generate all data figures in this review on GitHub ( https://github.com/MannLabs/ProteomicsVisualization ). 相似文献

12.

The ImageJ ecosystem: Open‐source software for image visualization,processing, and analysis

Alexandra B. Schroeder Ellen T. A. Dobson Curtis T. Rueden Pavel Tomancak Florian Jug Kevin W. Eliceiri 《Protein science : a publication of the Protein Society》2021,30(1):234-249

For decades, biologists have relied on software to visualize and interpret imaging data. As techniques for acquiring images increase in complexity, resulting in larger multidimensional datasets, imaging software must adapt. ImageJ is an open‐source image analysis software platform that has aided researchers with a variety of image analysis applications, driven mainly by engaged and collaborative user and developer communities. The close collaboration between programmers and users has resulted in adaptations to accommodate new challenges in image analysis that address the needs of ImageJ's diverse user base. ImageJ consists of many components, some relevant primarily for developers and a vast collection of user‐centric plugins. It is available in many forms, including the widely used Fiji distribution. We refer to this entire ImageJ codebase and community as the ImageJ ecosystem. Here we review the core features of this ecosystem and highlight how ImageJ has responded to imaging technology advancements with new plugins and tools in recent years. These plugins and tools have been developed to address user needs in several areas such as visualization, segmentation, and tracking of biological entities in large, complex datasets. Moreover, new capabilities for deep learning are being added to ImageJ, reflecting a shift in the bioimage analysis community towards exploiting artificial intelligence. These new tools have been facilitated by profound architectural changes to the ImageJ core brought about by the ImageJ2 project. Therefore, we also discuss the contributions of ImageJ2 to enhancing multidimensional image processing and interoperability in the ImageJ ecosystem. 相似文献

13.

A human rights approach to an international code of conduct for genomic and clinical data sharing

Bartha M. Knoppers Jennifer R. Harris Isabelle Budin-Ljøsne Edward S. Dove 《Human genetics》2014,133(7):895-903

Fostering data sharing is a scientific and ethical imperative. Health gains can be achieved more comprehensively and quickly by combining large, information-rich datasets from across conventionally siloed disciplines and geographic areas. While collaboration for data sharing is increasingly embraced by policymakers and the international biomedical community, we lack a common ethical and legal framework to connect regulators, funders, consortia, and research projects so as to facilitate genomic and clinical data linkage, global science collaboration, and responsible research conduct. Governance tools can be used to responsibly steer the sharing of data for proper stewardship of research discovery, genomics research resources, and their clinical applications. In this article, we propose that an international code of conduct be designed to enable global genomic and clinical data sharing for biomedical research. To give this proposed code universal application and accountability, however, we propose to position it within a human rights framework. This proposition is not without precedent: international treaties have long recognized that everyone has a right to the benefits of scientific progress and its applications, and a right to the protection of the moral and material interests resulting from scientific productions. It is time to apply these twin rights to internationally collaborative genomic and clinical data sharing. 相似文献

14.

The interactive presentation of 3D information obtained from reconstructed datasets and 3D placement of single histological sections with the 3D portable document format

de Boer BA Soufan AT Hagoort J Mohun TJ van den Hoff MJ Hasman A Voorbraak FP Moorman AF Ruijter JM 《Development (Cambridge, England)》2011,138(1):159-167

Interpretation of the results of anatomical and embryological studies relies heavily on proper visualization of complex morphogenetic processes and patterns of gene expression in a three-dimensional (3D) context. However, reconstruction of complete 3D datasets is time consuming and often researchers study only a few sections. To help in understanding the resulting 2D data we developed a program (TRACTS) that places such arbitrary histological sections into a high-resolution 3D model of the developing heart. The program places sections correctly, robustly and as precisely as the best of the fits achieved by five morphology experts. Dissemination of 3D data is severely hampered by the 2D medium of print publication. Many insights gained from studying the 3D object are very hard to convey using 2D images and are consequently lost or cannot be verified independently. It is possible to embed 3D objects into a pdf document, which is a format widely used for the distribution of scientific papers. Using the freeware program Adobe Reader to interact with these 3D objects is reasonably straightforward; creating such objects is not. We have developed a protocol that describes, step by step, how 3D objects can be embedded into a pdf document. Both the use of TRACTS and the inclusion of 3D objects in pdf documents can help in the interpretation of 2D and 3D data, and will thus optimize communication on morphological issues in developmental biology. 相似文献

15.

From remotely-sensed solar-induced chlorophyll fluorescence to ecosystem structure,function, and service: Part II—Harnessing data

Ying Sun Jiaming Wen Lianhong Gu Joanna Joiner Christine Y. Chang Christiaan van der Tol Albert Porcar-Castell Troy Magney Lixin Wang Leiqiu Hu Uwe Rascher Pablo Zarco-Tejada Christopher B. Barrett Jiameng Lai Jimei Han Zhenqi Luo 《Global Change Biology》2023,29(11):2893-2925

Although our observing capabilities of solar-induced chlorophyll fluorescence (SIF) have been growing rapidly, the quality and consistency of SIF datasets are still in an active stage of research and development. As a result, there are considerable inconsistencies among diverse SIF datasets at all scales and the widespread applications of them have led to contradictory findings. The present review is the second of the two companion reviews, and data oriented. It aims to (1) synthesize the variety, scale, and uncertainty of existing SIF datasets, (2) synthesize the diverse applications in the sector of ecology, agriculture, hydrology, climate, and socioeconomics, and (3) clarify how such data inconsistency superimposed with the theoretical complexities laid out in (Sun et al., 2023) may impact process interpretation of various applications and contribute to inconsistent findings. We emphasize that accurate interpretation of the functional relationships between SIF and other ecological indicators is contingent upon complete understanding of SIF data quality and uncertainty. Biases and uncertainties in SIF observations can significantly confound interpretation of their relationships and how such relationships respond to environmental variations. Built upon our syntheses, we summarize existing gaps and uncertainties in current SIF observations. Further, we offer our perspectives on innovations needed to help improve informing ecosystem structure, function, and service under climate change, including enhancing in-situ SIF observing capability especially in “data desert” regions, improving cross-instrument data standardization and network coordination, and advancing applications by fully harnessing theory and data. 相似文献

16.

Chemotography for multi-target SAR analysis in the context of biological pathways

Lounkine E Kutchukian P Petrone P Davies JW Glick M 《Bioorganic & medicinal chemistry》2012,20(18):5416-5427

The increasing amount of chemogenomics data, that is, activity measurements of many compounds across a variety of biological targets, allows for better understanding of pharmacology in a broad biological context. Rather than assessing activity at individual biological targets, today understanding of compound interaction with complex biological systems and molecular pathways is often sought in phenotypic screens. This perspective poses novel challenges to structure-activity relationship (SAR) assessment. Today, the bottleneck of drug discovery lies in the understanding of SAR of rich datasets that go beyond single targets in the context of biological pathways, potential off-targets, and complex selectivity profiles. To aid in the understanding and interpretation of such complex SAR, we introduce Chemotography (chemotype chromatography), which encodes chemical space using a color spectrum by combining clustering and multidimensional scaling. Rich biological data in our approach were visualized using spatial dimensions traditionally reserved for chemical space. This allowed us to analyze SAR in the context of target hierarchies and phylogenetic trees, two-target activity scatter plots, and biological pathways. Chemotography, in combination with the Kyoto Encyclopedia of Genes and Genomes (KEGG), also allowed us to extract pathway-relevant SAR from the ChEMBL database. We identified chemotypes showing polypharmacology and selectivity-conferring scaffolds, even in cases where individual compounds have not been tested against all relevant targets. In addition, we analyzed SAR in ChEMBL across the entire Kinome, going beyond individual compounds. Our method combines the strengths of chemical space visualization for SAR analysis and graphical representation of complex biological data. Chemotography is a new paradigm for chemogenomic data visualization and its versatile applications presented here may allow for improved assessment of SAR in biological context, such as phenotypic assay hit lists. 相似文献

17.

A practical guide to amplicon and metagenomic analysis of microbiome data

Yong-Xin Liu Yuan Qin Tong Chen Meiping Lu Xubo Qian Xiaoxuan Guo Yang Bai 《蛋白质与细胞》2021,12(5):315-330

Advances in high-throughput sequencing(HTS)have fostered rapid developments in the field of microbiome research,and massive microbiome datasets are now being generated.However,the diversity of software tools and the complexity of analysis pipelines make it difficult to access this field.Here,we systematically summarize the advantages and limitations of micro-biome methods.Then,we recommend specific pipelines for amplicon and metagenomic analyses,and describe commonly-used software and databases,to help researchers select the appropriate tools.Furthermore,we introduce statistical and visualization methods suit-able for microbiome analysis,including alpha-and beta-diversity,taxonomic composition,difference compar-isons,correlation,networks,machine learning,evolu-tion,source tracing,and common visualization styles to help researchers make informed choices.Finally,a step-by-step reproducible analysis guide is introduced.We hope this review will allow researchers to carry out data analysis more effectively and to quickly select the appropriate tools in order to efficiently mine the bio-logical significance behind the data. 相似文献

18.

Recent advances in mass spectrometry-based computational metabolomics

《Current opinion in chemical biology》2023

The computational metabolomics field brings together computer scientists, bioinformaticians, chemists, clinicians, and biologists to maximize the impact of metabolomics across a wide array of scientific and medical disciplines. The field continues to expand as modern instrumentation produces datasets with increasing complexity, resolution, and sensitivity. These datasets must be processed, annotated, modeled, and interpreted to enable biological insight. Techniques for visualization, integration (within or between omics), and interpretation of metabolomics data have evolved along with innovation in the databases and knowledge resources required to aid understanding. In this review, we highlight recent advances in the field and reflect on opportunities and innovations in response to the most pressing challenges. This review was compiled from discussions from the 2022 Dagstuhl seminar entitled “Computational Metabolomics: From Spectra to Knowledge”. 相似文献

19.

EcoLens: Integration and interactive visualization of ecological datasets

Cynthia Sims Parr Bongshin Lee Benjamin B. Bederson 《Ecological Informatics》2007,2(1):61-69

Complex multi-dimensional datasets are now pervasive in science and elsewhere in society. Better interactive tools are needed for visual data exploration so that patterns in such data may be easily discovered, data can be proofread, and subsets of data can be chosen for algorithmic analysis. In particular, synthetic research such as ecological interaction research demands effective ways to examine multiple datasets. This paper describes our integration of hundreds of food-web datasets into a common platform, and the visualization software, EcoLens, we developed for exploring this information. This publicly-available application and integrated dataset have been useful for our research predicting large complex food webs, and EcoLens is favorably reviewed by other researchers. Many habitats are not well represented in our large database. We confirm earlier results about the small size and lack of taxonomic resolution in early food webs but find that they and a non-food-web source provide trophic information about a large number of taxa absent from more modern studies. Corroboration of Tuesday Lake trophic links across studies is usually possible, but lack of links among congeners may have several explanations. While EcoLens does not provide all kinds of analytical support, its label- and item-based approach is effective at addressing concerns about the comparability and taxonomic resolution of food-web data. 相似文献

20.

Techniques for optimization of queries on integrated biological resources

Lacroix Z Raschid L Eckman BA 《Journal of bioinformatics and computational biology》2004,2(2):375-411

Today, scientific data are inevitably digitized, stored in a wide variety of formats, and are accessible over the Internet. Scientific discovery increasingly involves accessing multiple heterogeneous data sources, integrating the results of complex queries, and applying further analysis and visualization applications in order to collect datasets of interest. Building a scientific integration platform to support these critical tasks requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, as well as data that are locally materialized in warehouses or generated by software. The lack of efficiency of existing approaches can significantly affect the process with lengthy delays while accessing critical resources or with the failure of the system to report any results. Some queries take so much time to be answered that their results are returned via email, making their integration with other results a tedious task. This paper presents several issues that need to be addressed to provide seamless and efficient integration of biomolecular data. Identified challenges include: capturing and representing various domain specific computational capabilities supported by a source including sequence or text search engines and traditional query processing; developing a methodology to acquire and represent semantic knowledge and metadata about source contents, overlap in source contents, and access costs; developing cost and semantics based decision support tools to select sources and capabilities, and to generate efficient query evaluation plans. 相似文献