Similar Articles
20 similar articles found (search time: 31 ms)
1.
This paper describes a novel technique for establishing a virtual file system that allows data to be transferred user-transparently and on-demand across computing and storage servers of a computational grid. Its implementation is based on extensions to the Network File System (NFS) that are encapsulated in software proxies. A key differentiator between this approach and previous work is the way in which file servers are partitioned: while conventional file systems share a single (logical) server across multiple users, the virtual file system employs multiple proxy servers that are created, customized and terminated dynamically, for the duration of a computing session, on a per-user basis. Furthermore, the solution does not require modifications to standard NFS clients and servers. The described approach has been deployed in the context of the PUNCH network-computing infrastructure, and is unique in its ability to integrate unmodified, interactive applications (even commercial ones) and existing computing infrastructure into a network computing environment. Experimental results show that: (1) the virtual file system performs well in comparison to native NFS in a local-area setup, with mean overheads of 1% and 18% for the single-client execution of the Andrew benchmark in two representative computing environments; (2) the average overhead for eight clients can be reduced to within 1% of native NFS with the use of concurrent proxies; (3) the wide-area performance is within 1% of the local-area performance for a typical compute-intensive PUNCH application (SimpleScalar), while for the I/O-intensive Andrew benchmark the wide-area performance is 5.5 times worse than the local-area performance.
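
The per-user proxy pattern is the heart of this design. The sketch below is not the paper's NFS proxy; it is a minimal, generic per-session TCP forwarder in Python that only illustrates the idea of one dynamically created, per-user intermediary standing between a client and a file server. The backend address is an assumed placeholder.

    import socket
    import threading

    BACKEND = ("fileserver.example.org", 2049)   # assumed NFS-like backend address

    def pipe(src, dst):
        # copy bytes from src to dst until EOF
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)

    def serve_session(listen_port):
        # one proxy instance, created for one user session and torn down afterwards
        lsock = socket.socket()
        lsock.bind(("127.0.0.1", listen_port))
        lsock.listen(1)
        client, _ = lsock.accept()
        backend = socket.create_connection(BACKEND)
        threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
        pipe(backend, client)                    # return path
        for s in (client, backend, lsock):
            s.close()

    # serve_session(5049)   # e.g. spawn one of these per user session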

2.
Development of high-performance distributed applications, called metaapplications, is extremely challenging because of their complex runtime environment coupled with their requirements of high performance and Quality of Service (QoS). Such applications typically run on a set of heterogeneous machines with dynamically varying loads, connected by heterogeneous networks possibly supporting a wide variety of communication protocols. In spite of the size and complexity of such applications, they must provide the high performance and QoS mandated by their users. In order to achieve the goal of high performance, they need to adaptively utilize their computational and communication resources. Apart from the requirements of adaptive resource utilization, such applications have a third kind of requirement related to remote access QoS. Different clients, although accessing a single server resource, may have differing QoS requirements from their remote connections. A single server resource may also need to provide different QoS for different clients, depending on various issues such as the amount of trust between the server and a given client. These QoS requirements can be encapsulated under the abstraction of remote access capabilities. Metaapplications need to address all three of the above requirements in order to achieve high performance and satisfy user expectations of QoS. This paper presents Open HPC++, a programming environment for high-performance applications running in a complex and heterogeneous run-time environment. Open HPC++ provides application-level tools and mechanisms to satisfy application requirements of adaptive resource utilization and remote access capabilities. Open HPC++ is designed along the lines of CORBA and uses an Object Request Broker (ORB) to support seamless communication between distributed application components. In order to provide adaptive utilization of communication resources, it uses the principle of open implementation to open up the communication mechanisms of its ORB. By virtue of its open architecture, the ORB supports multiple, possibly custom, communication protocols, along with automatic and user-controlled protocol selection at run-time. An extension of the same mechanism is used to support the concept of remote access capabilities. In order to support adaptive utilization of computational resources, Open HPC++ also provides a flexible yet powerful set of load-balancing mechanisms that can be used to implement custom load-balancing strategies. The paper also presents performance evaluations of Open HPC++ adaptivity and load-balancing mechanisms.
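
A hedged sketch of the run-time protocol selection that an open ORB of this kind enables: the class names, protocol choices, and selection rule below are invented for illustration and are not Open HPC++ APIs.

    class Transport:
        def send(self, msg):
            raise NotImplementedError

    class TcpTransport(Transport):
        def send(self, msg):
            print(f"tcp: sending {len(msg)} bytes")

    class SharedMemTransport(Transport):
        def send(self, msg):
            print(f"shm: sending {len(msg)} bytes")

    # registry of available (possibly custom) protocols
    REGISTRY = {"tcp": TcpTransport, "shm": SharedMemTransport}

    def select_transport(peer_is_local, user_override=None):
        # automatic selection, with user-controlled override as the ORB allows
        name = user_override or ("shm" if peer_is_local else "tcp")
        return REGISTRY[name]()

    select_transport(peer_is_local=True).send(b"hello")   # -> shm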

3.
I/O-intensive applications have posed great challenges to computational scientists. A major problem with these applications is that users have to sacrifice performance requirements in order to satisfy storage capacity requirements in a conventional computing environment. Further performance improvement is impeded by the physical nature of these storage media even when state-of-the-art I/O optimizations are employed. In this paper, we present a distributed multi-storage resource architecture, which can satisfy both performance and capacity requirements by employing multiple storage resources. Compared to a traditional single-storage-resource architecture, our architecture provides a more flexible and reliable computing environment. It can bring new opportunities for high performance computing as well as inherit state-of-the-art I/O optimization approaches that have already been developed. It provides application users with high-performance storage access even when a single large local storage archive is not at their disposal. We also develop an Application Programming Interface (API) that provides transparent management and access to various storage resources in our computing environment. Since I/O usually dominates the performance of I/O-intensive applications, we establish an I/O performance prediction mechanism, consisting of a performance database and a prediction algorithm, to help users better evaluate and schedule their applications. A tool is also developed to automatically generate the performance data stored in the database. The experiments show that our multi-storage resource architecture is a promising platform for high-performance distributed computing.
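
Since the prediction mechanism combines a performance database with a prediction algorithm, a minimal stand-in can be sketched as a cost model that picks the storage resource with the lowest predicted I/O time. The latency/bandwidth model and the numbers below are illustrative assumptions, not the paper's performance database.

    from dataclasses import dataclass

    @dataclass
    class StorageResource:
        name: str
        latency_s: float       # per-request startup cost
        bandwidth_mbps: float  # sustained transfer rate

    def predicted_time(r: StorageResource, size_mb: float) -> float:
        # simple linear cost model: startup latency plus transfer time
        return r.latency_s + size_mb / r.bandwidth_mbps

    def choose_resource(resources, size_mb):
        return min(resources, key=lambda r: predicted_time(r, size_mb))

    resources = [
        StorageResource("local-disk", 0.005, 80.0),
        StorageResource("remote-archive", 0.200, 400.0),
    ]
    best = choose_resource(resources, size_mb=512)
    print(best.name)   # large transfers favor the high-bandwidth remote archive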

4.
The Rosetta software suite for macromolecular modeling is a powerful computational toolbox for protein design, structure prediction, and protein structure analysis. The development of novel Rosetta-based scientific tools requires two orthogonal skill sets: deep domain-specific expertise in protein biochemistry and technical expertise in the development, deployment, and analysis of molecular simulations. Furthermore, the computational demands of molecular simulation necessitate large-scale cluster-based or distributed solutions for nearly all scientifically relevant tasks. To reduce the technical barriers to entry for new development, we integrated Rosetta with modern, widely adopted computational infrastructure. This allows simplified deployment in large-scale cluster and cloud computing environments, and effective reuse of common libraries for simulation execution and data analysis. To achieve this, we integrated Rosetta with the Conda package manager, which simplifies installation into existing computational environments and packaging as Docker images for cloud deployment. We then developed programming interfaces to integrate Rosetta with the PyData stack for analysis and distributed computing, including the popular tools Jupyter, Pandas, and Dask. We demonstrate the utility of these components by generating a library of a thousand de novo disulfide-rich miniproteins in a hybrid simulation that included cluster-based design and interactive notebook-based analyses. Our new tools enable users who would otherwise not have access to the necessary computational infrastructure to perform state-of-the-art molecular simulation and design with Rosetta.
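
As a rough illustration of the workflow these integrations enable, the sketch below scores a batch of sequences with PyRosetta across a Dask cluster. It assumes the pyrosetta and dask packages are installed (e.g. via the Conda packaging described above); the sequences, the choice of score function, and the local-cluster setup are placeholders, not the paper's miniprotein design protocol.

    from dask.distributed import Client

    def score_sequence(seq):
        import pyrosetta                       # imported on the worker process
        pyrosetta.init("-mute all")            # redundant re-initialization is tolerated
        pose = pyrosetta.pose_from_sequence(seq)
        scorefxn = pyrosetta.create_score_function("ref2015")
        return scorefxn(pose)

    if __name__ == "__main__":
        client = Client()                      # local cluster; swap in a real scheduler address
        seqs = ["MKTAYIAKQR", "GSHMASMTGG"]    # placeholder sequences
        scores = client.gather(client.map(score_sequence, seqs))
        print(dict(zip(seqs, scores)))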

5.

Background

Mass spectrometry analyses of complex protein samples yield large amounts of data, and specific expertise is needed for data analysis, in addition to a dedicated computer infrastructure. Furthermore, the identification of proteins and their specific properties requires the use of multiple independent bioinformatics tools and several database search algorithms to process the same datasets. In order to facilitate and speed up data analysis, there is a need for an integrated platform that allows comprehensive profiling of thousands of peptides and proteins in a single process through the simultaneous exploitation of multiple complementary algorithms.

Results

We have established a new proteomics pipeline, designated APP, that fulfills these objectives using a complete series of freely available open-source tools. APP automates the processing of proteomics tasks such as peptide identification, validation and quantitation from LC-MS/MS data and allows easy integration of many separate proteomics tools. Distributed processing is at the core of APP, allowing the processing of very large datasets using any combination of Windows/Linux physical or virtual computing resources.
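
The fan-out pattern at the core of such a pipeline can be caricatured in a few lines. This is not APP's scheduler; it is a minimal stand-in that distributes independent search tasks (one per dataset-and-engine pair) to worker processes and collects the results, with placeholder file and engine names.

    from concurrent.futures import ProcessPoolExecutor
    from itertools import product

    DATASETS = ["run01.mzML", "run02.mzML"]    # placeholder LC-MS/MS files
    ENGINES = ["engineA", "engineB"]           # stand-ins for database search tools

    def run_search(task):
        dataset, engine = task
        # a real node would invoke the external search tool here and parse its output
        return {"dataset": dataset, "engine": engine, "peptides": 0}

    if __name__ == "__main__":
        with ProcessPoolExecutor() as pool:
            results = list(pool.map(run_search, product(DATASETS, ENGINES)))
        print(results)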

Conclusions

APP provides distributed computing nodes that are simple to set up, greatly reducing the need for dedicated IT expertise when handling large datasets. The modular nature of APP allows complex workflows to be managed and distributed, speeding up both throughput and setup. Additionally, APP logs execution information on all executed tasks and generated results, simplifying information management and validation.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0441-8) contains supplementary material, which is available to authorized users.

6.
Access to public data sets is important to the scientific community as a resource to develop new experiments or validate new data. Projects such as the PeptideAtlas, Ensembl and The Cancer Genome Atlas (TCGA) offer both access to public data and a repository to share their own data. Access to these data sets is often provided through a web page form and a web service API. Access technologies based on web protocols (e.g. HTTP) have been in use for over a decade and are widely adopted across the industry for a variety of functions (e.g. search, commercial transactions, and social media). Each architecture adapts these technologies to provide users with tools to access and share data. Both commonly used web service technologies (e.g. REST and SOAP) and custom-built solutions over HTTP are utilized in providing access to research data. Providing multiple access points ensures that the community can access the data in the simplest and most effective manner for their particular needs. This article examines three common access mechanisms for web-accessible data: BioMart, caBIG, and Google Data Sources. These are illustrated by implementing each over the PeptideAtlas repository and reviewed for their suitability based on specific usages common to research. BioMart, Google Data Sources, and caBIG are each suitable for certain uses. The tradeoffs made in the development of each technology depend on the uses it was designed for (e.g. security versus speed). This means that an understanding of specific requirements and tradeoffs is necessary before selecting an access technology.
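
For a sense of what the simplest of these access points looks like from the client side, here is a hedged REST-style query in Python. The endpoint and parameters are hypothetical, not the actual PeptideAtlas, BioMart, or caBIG API.

    import requests

    resp = requests.get(
        "https://example.org/api/peptides",            # hypothetical endpoint
        params={"protein": "P04637", "format": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    for row in resp.json():                            # assumed JSON list of records
        print(row)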

7.
Increased platform heterogeneity and varying resource availability in distributed systems motivate the design of resource-aware applications, which ensure a desired performance level by continuously adapting their behavior to changing resource characteristics. In this paper, we describe an application-independent adaptation framework that simplifies the design of resource-aware applications. This framework eliminates the need for adaptation decisions to be explicitly programmed into the application by relying on two novel components: (1) a tunability interface, which exposes adaptation choices in the form of alternate application configurations while encapsulating core application functionality; and (2) a virtual execution environment, which emulates application execution under diverse resource availability enabling off-line collection of information about resulting behavior. Together, these components permit automatic run-time decisions on when to adapt by continuously monitoring resource conditions and application progress, and how to adapt by dynamically choosing an application configuration most appropriate for the prescribed user preference. We evaluate the framework using an interactive distributed image visualization application and a parallel image processing application. The framework permits automatic adaptation to changes in execution environment characteristics such as available network bandwidth or data arrival pattern by choosing a different application configuration that satisfies user preferences of output quality and timeliness.
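
A toy rendering of the two components may help fix ideas. The configuration names, resource thresholds, and selection rule below are invented; the real framework derives such profiles from its virtual execution environment rather than hard-coding them.

    from dataclasses import dataclass

    @dataclass
    class Config:
        name: str
        min_bandwidth_mbps: float   # resource needed to sustain this configuration
        quality: int                # higher is better output quality

    CONFIGS = [                     # alternate configurations, as a tunability interface exposes
        Config("full-res", 50.0, 3),
        Config("half-res", 20.0, 2),
        Config("thumbnail", 5.0, 1),
    ]

    def choose(available_mbps, prefer_quality):
        # "how to adapt": pick the configuration matching conditions and preference
        feasible = [c for c in CONFIGS if c.min_bandwidth_mbps <= available_mbps]
        if not feasible:
            return min(CONFIGS, key=lambda c: c.min_bandwidth_mbps)
        key = (lambda c: c.quality) if prefer_quality else (lambda c: -c.min_bandwidth_mbps)
        return max(feasible, key=key)

    print(choose(available_mbps=25.0, prefer_quality=True).name)   # -> half-res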

8.
The completion of the Arabidopsis genome and the large collections of other plant sequences generated in recent years have sparked extensive functional genomics efforts. However, the utilization of these data is inefficient, as data sources are distributed and heterogeneous and efforts at data integration are lagging behind. PlaNet aims to overcome the limitations of individual efforts as well as the limitations of heterogeneous, independent data collections. PlaNet is a distributed effort among European bioinformatics groups and plant molecular biologists to establish a comprehensive integrated database in a collaborative network. The objective is to implement the infrastructure and data sources needed to capture plant genomic information in a comprehensive, integrated platform. This will facilitate the systematic exploration of Arabidopsis and other plants. New methods for data exchange, database integration and access are being developed to create a highly integrated, federated data resource for research. The connection between the individual resources is realized with BioMOBY, which provides an architecture for the discovery and distribution of biological data through web services. While knowledge is centralized, data is maintained at its primary source without a need for warehousing. To standardize nomenclature and data representation, ontologies and generic data models are defined in interaction with the relevant communities. Minimal data models should make broad integration simple, while inheritance allows detail and depth to be added to more complex data objects without losing integration. To allow expert annotation and keep databases curated, local and remote annotation interfaces are provided. Easy and direct access to all data is key to the project.
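
The registry-then-invoke pattern that BioMOBY provides can be sketched generically. The code below is not the BioMOBY API; it is a hedged illustration of central discovery followed by a call to a distributed service, with all endpoints and data-type names hypothetical.

    import requests

    REGISTRY = "https://registry.example.org/services"     # hypothetical central directory

    def discover(output_type):
        # ask the directory which services can produce the requested data type
        resp = requests.get(REGISTRY, params={"outputs": output_type}, timeout=30)
        resp.raise_for_status()
        return resp.json()                                  # assumed list of service records

    services = discover("AnnotatedGeneSequence")            # hypothetical data type
    if services:
        svc = services[0]
        data = requests.post(svc["endpoint"], json={"gene": "AT1G01010"}, timeout=60)
        print(data.json())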

9.
The emergence of cloud computing has made it an attractive solution for large-scale data processing and storage applications. Cloud infrastructures provide users remote access to powerful computing capacity, large storage space and high network bandwidth to deploy various applications. With the support of cloud computing, many large-scale applications have been migrated to cloud infrastructures instead of running on in-house local servers. Among these applications, continuous write applications (CWAs) such as online surveillance systems can benefit significantly from the flexibility and advantages of cloud computing. However, given specific characteristics such as continuous data writing and processing and a high demand for data availability, cloud service providers prefer sophisticated models for provisioning resources that meet CWAs' demands while minimizing the operational cost of the infrastructure. In this paper, we present a novel architecture of multiple cloud service providers (CSPs), commonly referred to as a Cloud-of-Clouds. Based on this architecture, we propose two operational-cost-aware algorithms for provisioning cloud resources for CWAs, namely the neighboring optimal resource provisioning algorithm (NORPA) and the global optimal resource provisioning algorithm (GORPA), in order to minimize the operational cost and thereby maximize the revenue of CSPs. We validate the proposed algorithms through comprehensive simulations. The two proposed algorithms are compared against each other to assess their effectiveness, and against a commonly used and practically viable round-robin approach. The results demonstrate that NORPA and GORPA outperform the conventional round-robin algorithm by reducing the operational cost by up to 28% and 57%, respectively. The low complexity of the proposed cost-aware algorithms allows them to be applied to realistic Cloud-of-Clouds environments in industry as well as academia.
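
A toy comparison conveys why cost-aware placement beats the round-robin baseline. The per-provider prices are invented, and the real NORPA/GORPA algorithms optimize over far richer constraints than this greedy stand-in.

    from itertools import cycle

    PROVIDERS = {"csp1": 0.12, "csp2": 0.09, "csp3": 0.15}   # $ per write-unit (assumed)

    def round_robin(workload_units):
        # baseline: spread units across providers regardless of price
        order = cycle(PROVIDERS)
        return sum(PROVIDERS[next(order)] for _ in range(workload_units))

    def greedy_cost_aware(workload_units):
        # simplistic cost-aware placement: send everything to the cheapest provider
        return min(PROVIDERS.values()) * workload_units

    units = 1000
    print(f"round-robin: ${round_robin(units):.2f}, greedy: ${greedy_cost_aware(units):.2f}")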

10.
New ‘omics’ technologies are changing nutritional sciences research. They enable researchers to tackle increasingly complex questions, but also increase the need for collaboration between research groups. An important challenge for successful collaboration is the management and structured exchange of the information that accompanies data-intensive technologies. NuGO, the European Nutrigenomics Organization and the major collaborative network in molecular nutritional sciences, is supporting the application of modern information technologies in this area. We have developed and implemented a concept for data management and computing infrastructure that supports collaboration between nutrigenomics researchers. The system fills the gap between “private” storage with occasional file sharing by email and the use of centralized databases. It provides flexible tools to share data, even during experiments, while preserving ownership. The NuGO Information Network is a decentralized, distributed system for data exchange based on standard web technology. Secure access to data, maintained by the individual researcher, is enabled by web services based on the BioMoby framework. A central directory provides information about available web services. The flexibility of the infrastructure allows a wide variety of services for data processing and integration by combining several web services, including public ones. This integrated information system is therefore also suited to other research collaborations.

11.
With the rapid development of IT technology, cloud computing has come to be considered the next generation's computing infrastructure. An essential part of cloud computing is virtual machine technology, which reduces data center cost through better resource utilization. In particular, virtual desktop infrastructure (VDI) is receiving explosive attention from IT markets because of its advantages of easier software management, greater data protection, and lower cost. However, sharing physical resources in VDI to consolidate multiple guest virtual machines (VMs) on a host involves a tradeoff that can lead to significant I/O degradation. Optimizing I/O virtualization overhead is a challenging task because it requires scrutinizing multiple software layers between the guest VMs and the host on which they execute. In this paper, we present a hypervisor-level cache, called hyperCache, which provides a shortcut in KVM/QEMU. It intercepts I/O requests in the hypervisor and analyzes their access patterns to select data with high access frequency. It also maintains an appropriate cache memory size by utilizing a cache block map. Our experimental results demonstrate that our method improves I/O bandwidth by up to 4.7x over the existing QEMU.
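
The frequency-based selection and cache block map can be mimicked in user space. The sketch below is not KVM/QEMU code; it is a small LFU-style block cache illustrating the policy, with arbitrary block granularity and capacity.

    class FrequencyCache:
        def __init__(self, capacity_blocks):
            self.capacity = capacity_blocks
            self.data = {}     # block_id -> bytes (the "cache block map")
            self.hits = {}     # block_id -> access count

        def read(self, block_id, fetch):
            # return a block, consulting the cache before the backing store
            self.hits[block_id] = self.hits.get(block_id, 0) + 1
            if block_id in self.data:
                return self.data[block_id]             # cache hit: skip the I/O stack
            block = fetch(block_id)                    # miss: go to the virtual disk
            if len(self.data) >= self.capacity:
                coldest = min(self.data, key=lambda b: self.hits[b])
                del self.data[coldest]                 # evict the least-accessed block
            self.data[block_id] = block
            return block

    cache = FrequencyCache(capacity_blocks=64)
    cache.read(7, fetch=lambda b: b"\x00" * 4096)      # placeholder backing store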

12.
SUMMARY: The SBW-MATLAB Interface allows MATLAB users to take advantage of the wide variety of tools available through SBW, the Systems Biology Workbench (Sauro et al. (2003) OMICS, 7, 355-372). It also enables MATLAB users to create their own SBW-enabled tools, which can be freely distributed.

13.
The increasing power consumption of IT infrastructures and growing electricity prices have led to the development of several energy-saving techniques in recent years. Virtualization and consolidation of services is one of the key technologies in data centers for reducing overprovisioning and thereby increasing energy savings. This paper shows that the energy-optimal allocation of virtualized services in a heterogeneous server infrastructure is NP-hard and can be modeled as a variant of the multidimensional vector packing problem. Furthermore, it proposes a model to predict the performance degradation of a service when it is consolidated with other services. The model makes it possible to weigh the tradeoff between power consumption and service performance during service allocation. Finally, the paper presents two heuristics that approximate the energy-optimal and performance-aware resource allocation problem, and shows that the allocations determined by the proposed heuristics are more energy-efficient than those of the widely applied maximum-density consolidation.
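
As a hedged illustration of the problem class named here, the following is a first-fit-decreasing heuristic for two-dimensional vector packing of (cpu, memory) demands onto servers. The demands, capacities, idle-power figures, and the crude size metric are all invented; the paper's heuristics differ in detail.

    SERVERS = [{"cap": (16, 64), "idle_w": 120}, {"cap": (8, 32), "idle_w": 70}]
    SERVICES = [(4, 16), (6, 8), (2, 24), (3, 4)]    # (cpu, mem) demand vectors

    def fits(load, demand, cap):
        return all(l + d <= c for l, d, c in zip(load, demand, cap))

    def first_fit_decreasing(services, servers):
        loads = [(0, 0)] * len(servers)
        placement = {}
        # sort by a toy size metric (cpu + mem), largest first
        for svc in sorted(services, key=sum, reverse=True):
            for i, srv in enumerate(servers):
                if fits(loads[i], svc, srv["cap"]):
                    loads[i] = tuple(l + d for l, d in zip(loads[i], svc))
                    placement[svc] = i
                    break
        return placement, loads

    placement, loads = first_fit_decreasing(SERVICES, SERVERS)
    print(placement)   # services packed onto the first server that can hold them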

14.
Saidi, Ahmed; Nouali, Omar; Amira, Abdelouahab. Cluster Computing, 2022, 25(1):167-185

Attribute-based encryption (ABE) is an access control mechanism that ensures efficient data sharing among dynamic groups of users by setting up access structures indicating who can access what. However, ABE suffers from expensive computation and privacy issues in resource-constrained environments such as IoT devices. In this paper, we present SHARE-ABE, a novel collaborative approach for preserving privacy that is built on top of Ciphertext-Policy Attribute-Based Encryption (CP-ABE). Our approach uses Fog computing to outsource the most laborious decryption operations to Fog nodes, which collaborate to partially decrypt the data using an original and efficient chained architecture. Additionally, our approach preserves the privacy of the access policy by introducing false attributes. Furthermore, we introduce a new construction of a collaboration attribute that allows users within the same group to combine their attributes while satisfying the access policy. Experiments and analyses of the security properties demonstrate that the proposed scheme is secure and efficient, especially for resource-constrained IoT devices.
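
The outsourcing idea behind a chained architecture can be shown with a toy that is deliberately not CP-ABE: an ElGamal-style decryption whose secret exponent is additively split across two "fog nodes", each applying its share in turn, leaving the constrained device one cheap final step. The parameters are tiny demo values with no security whatsoever.

    p = 467                                # small prime, demo only -- no security
    g = 2
    x = 153                                # full decryption exponent
    x1, x2 = 100, 53                       # additive shares held by two fog nodes
    y = pow(g, x, p)                       # public key

    m, r = 123, 77                         # message and encryption randomness
    c1, c2 = pow(g, r, p), (m * pow(y, r, p)) % p

    partial = pow(c1, x1, p)                       # fog node 1 applies its share
    partial = (partial * pow(c1, x2, p)) % p       # fog node 2 continues the chain
    m_rec = (c2 * pow(partial, -1, p)) % p         # device: one modular inverse
    assert m_rec == m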


15.
Researchers in quantitative systems biology make use of a large number of different software packages for modelling, analysis, visualization, and general data manipulation. In this paper, we describe the Systems Biology Workbench (SBW), a software framework that allows heterogeneous application components (written in diverse programming languages and running on different platforms) to communicate and use each other's capabilities via a fast, binary encoded-message system. Our goal was to create a simple, high-performance, open-source software infrastructure which is easy to implement and understand. SBW enables applications (potentially running on separate, distributed computers) to communicate via a simple network protocol. The interfaces to the system are encapsulated in client-side libraries that we provide for different programming languages. We describe in this paper the SBW architecture, a selection of current modules (including Jarnac, JDesigner, and SBW Metatool), and the close integration of SBW into BioSPICE, which enables both frameworks to share tools and to complement and strengthen each other's capabilities.
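
For flavor, a minimal length-prefixed binary framing in Python shows the kind of compact encoded message such a system exchanges. This is not SBW's actual wire format, and the service name is a placeholder.

    import struct

    def encode(service, payload):
        name = service.encode()
        # frame = total length (4 bytes), name length (2 bytes), name, payload
        header = struct.pack(f"!IH{len(name)}s",
                             6 + len(name) + len(payload), len(name), name)
        return header + payload

    def decode(frame):
        total, nlen = struct.unpack_from("!IH", frame)
        name = frame[6:6 + nlen].decode()
        return name, frame[6 + nlen:total]

    frame = encode("jarnac.simulate", b"\x01\x02")     # placeholder service name
    print(decode(frame))                               # ('jarnac.simulate', b'\x01\x02')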

16.
The Olfactory Receptor Database (ORDB) is a WWW-accessible database that has been expanded from an olfactory receptor resource to a chemoreceptor resource. It stores data on six classes of G-protein-coupled sensory chemoreceptors: (i) olfactory receptor-like proteins, (ii) vomeronasal receptors, (iii) insect olfactory receptors, (iv) worm chemoreceptors, (v) taste papilla receptors and (vi) fungal pheromone receptors. A complementary database of the ligands of these receptors (OdorDB) has been constructed and is publicly available in a pilot mode. The database schema of ORDB has been changed from traditional relational to EAV/CR (Entity-Attribute-Value with Classes and Relationships), which allows the interoperability of ORDB with other related databases as well as the creation of intra-database associations among objects. This interoperability lets users follow information from an odor molecule's binding to its putative receptor, to the properties of the neuron expressing the receptor, to a computational model of activity of olfactory bulb neurons. In addition, tools and resources have been added that allow users to access interactive phylogenetic trees and alignments of sensory chemoreceptors. ORDB is available via the WWW at http://ycmi.med.yale.edu/senselab/ordb/
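
The EAV core of such a schema is easy to sketch with sqlite3. The tables and attribute names below are illustrative only; ORDB's EAV/CR design additionally layers classes and relationships on top of this idea.

    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE entity (id INTEGER PRIMARY KEY, class TEXT);
        CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT);
    """)
    db.execute("INSERT INTO entity VALUES (1, 'olfactory_receptor')")
    db.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
        (1, "species", "Mus musculus"),
        (1, "sequence", "MDGGN"),     # placeholder sequence fragment
    ])
    # new attributes need no schema change -- just more rows
    rows = db.execute("SELECT attribute, value FROM eav WHERE entity_id = 1")
    print(dict(rows))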

17.
InterPro is a widely used tool for protein annotation in genome sequencing projects, demanding large amounts of computation and representing a hugely time-consuming step. We present a strategy for executing the InterPro programs that use the Pfam, PROSITE and ProDom databases in a distributed environment using a Java-based messaging system. We developed a two-layer scheduling architecture for the distributed infrastructure, then ran experiments and analyzed the results. Our distributed system gave much better results than InterPro's Pfam, PROSITE and ProDom searches running on a centralized platform. This approach seems appropriate and promising for highly demanding computational tools used in biological applications.
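
A two-layer scheduler of the general shape described can be caricatured as follows. The site names, worker counts, and round-robin policy are assumptions, not the paper's Java-based implementation.

    from collections import deque

    SITES = {"siteA": 4, "siteB": 2}     # workers available per site (assumed)

    def top_level(batches):
        # layer 1: distribute sequence batches across sites, round-robin
        assignment = {s: deque() for s in SITES}
        order = sorted(SITES, key=SITES.get, reverse=True)
        for i, batch in enumerate(batches):
            assignment[order[i % len(order)]].append(batch)
        return assignment

    def site_level(site, queue):
        # layer 2: each site drains its local queue sequence by sequence
        while queue:
            seq = queue.popleft()
            print(f"{site}: scanning {seq} against Pfam/PROSITE/ProDom")

    for site, queue in top_level([f"seq{i}" for i in range(6)]).items():
        site_level(site, queue)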

18.
Plants have evolved both physical and chemical defenses that make the nutrients of attacked organs difficult to access or more toxic, in order to resist animal consumption and/or pathogen attack. Although it is intuitive that a tradeoff could exist between physical and chemical defenses because of finite defense resources, many studies have failed to detect this tradeoff. We hypothesized that the tradeoff between physical and chemical defenses in individual organs is mediated by the total resource allocation to those organs. In this study, we tested whether a tradeoff between physical defense (fiber content, which has proved to be a good indicator of investment in the seed coat) and chemical defense (total phenolics, which are abundant chemical defenses in plant seeds) exists in plant seeds, using 163 common species collected from the Xishuangbanna tropical forest, southwest China. We then tested whether this tradeoff is mediated by seed mass, a potential proxy for total resource investment per seed. Among the 163 species, there was large interspecific variation in both total phenolics (from 0.01 to 20.52%) and fiber content (from 4.47 to 81.49%). Our results supported our hypothesis: negative relationships between physical and chemical defenses were much stronger among small seeds than among large seeds. Our study suggests that total resource acquisition must be considered when evaluating defense tradeoffs. However, it is usually extremely difficult to measure this variation in resource acquisition, so we suggest utilizing easily measured proxies of acquisition variation to quantify tradeoffs.

19.
The Stanford Microarray Database (SMD; http://genome-www.stanford.edu/microarray/) serves as a microarray research database for Stanford investigators and their collaborators. In addition, SMD functions as a resource for the entire scientific community, by making freely available all of its source code and providing full public access to data published by SMD users, along with many tools to explore and analyze those data. SMD currently provides public access to data from 3500 microarrays, including data from 85 publications, and this total is increasing rapidly. In this article, we describe some of SMD's newer tools for accessing public data, assessing data quality and for data analysis.

20.

Background  

There is a significant demand in the life sciences for creating pipelines or workflows that chain a number of discrete compute- and data-intensive analysis tasks into sophisticated analysis procedures. This need has led to the development of general as well as domain-specific workflow environments, which are either complex desktop applications or Internet-based applications. Complexities can arise when configuring these applications in heterogeneous compute and storage environments if the execution and data access models are not designed appropriately. These complexities manifest themselves through limited access to available HPC resources, the significant overhead required to configure tools, and the inability of users to simply manage files across heterogeneous HPC storage infrastructure.
