
1.

Adaptive hybrid storage systems leveraging SSDs and HDDs in HPC cloud environments

Donghun Koo Jik-Soo Kim Soonwook Hwang Hyeonsang Eom Jaehwan Lee 《Cluster computing》2017,20(3):2119-2131

Cloud computing should inherently support various types of data-intensive workloads with different storage access patterns. This makes a high-performance storage system in the Cloud an important component. Emerging flash device technologies such as solid state drives (SSDs) are a viable choice for building high performance computing (HPC) cloud storage systems to address more fine-grained data access patterns. However, the cost per bit of SSDs is still higher than that of HDDs. This study proposes an optimized progressive file layout (PFL) method to leverage the advantages of SSDs in a parallel file system such as Lustre so that small file I/O performance can be significantly improved. A PFL can dynamically adjust chunk sizes and stripe patterns according to varying I/O traffic. Extensive experimental results show that this approach (i.e., building a hybrid storage system based on a combination of SSDs and HDDs) can achieve balanced throughput over mixed I/O workloads consisting of large and small file access patterns.
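The idea of a progressive layout, where the stripe pattern changes as a file grows, can be illustrated with a small sketch. This is not the authors' implementation or Lustre's API: the `Component` type, the `LAYOUT` thresholds, the stripe counts, and the pool names `"ssd"` and `"hdd"` are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

MiB = 1024 * 1024


@dataclass(frozen=True)
class Component:
    end: float         # exclusive upper bound of the byte range this component covers
    stripe_count: int  # number of storage targets the range is striped across
    stripe_size: int   # bytes written to one target before moving to the next
    pool: str          # hypothetical pool name: "ssd" or "hdd"


# Hypothetical layout: small files stay on SSD with one stripe,
# larger extents spread over progressively more HDD targets.
LAYOUT = [
    Component(end=4 * MiB, stripe_count=1, stripe_size=1 * MiB, pool="ssd"),
    Component(end=256 * MiB, stripe_count=4, stripe_size=1 * MiB, pool="hdd"),
    Component(end=float("inf"), stripe_count=8, stripe_size=4 * MiB, pool="hdd"),
]


def component_for(offset: int) -> Component:
    """Return the layout component that covers the given file offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for comp in LAYOUT:
        if offset < comp.end:
            return comp
    raise ValueError("offset not covered by layout")
```

Under these assumed thresholds, a file smaller than 4 MiB never leaves the SSD pool, while the tail of a large file is striped widely across HDDs.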

2.

The MOSIX Direct File System Access Method for Supporting Scalable Cluster File Systems

Lior Amar Amnon Barak Amnon Shiloh 《Cluster computing》2004,7(2):141-150

MOSIX is a cluster management system that supports preemptive process migration. This paper presents the MOSIX Direct File System Access (DFSA), a provision that can improve the performance of cluster file systems by allowing a migrated process to directly access files in its current location. This capability, when combined with an appropriate file system, could substantially increase the I/O performance and reduce the network congestion by migrating an I/O intensive process to a file server rather than the traditional way of bringing the file's data to the process. DFSA is suitable for clusters that manage a pool of shared disks among multiple machines. With DFSA, it is possible to migrate parallel processes from a client node to file servers for parallel access to different files. Any consistent file system can be adjusted to work with DFSA. To test its performance, we developed the MOSIX File-System (MFS) which allows consistent parallel operations on different files. The paper describes DFSA and presents the performance of MFS with and without DFSA.

3.

Data management for large‐scale scientific computations in high performance distributed systems

A. Choudhary M. Kandemir J. No G. Memik X. Shen W. Liao H. Nagesh S. More V. Taylor R. Thakur R. Stevens 《Cluster computing》2000,3(1):45-60

With the increasing number of scientific applications manipulating huge amounts of data, effective high-level data management is an increasingly important problem. Unfortunately, so far the solutions to the high‐level data management problem either require deep understanding of specific storage architectures and file layouts (as in high-performance file storage systems) or produce unsatisfactory I/O performance in exchange for ease-of-use and portability (as in relational DBMSs). In this paper we present a novel application development environment which is built around an active meta-data management system (MDMS) to handle high-level data in an effective manner. The key components of our three-tiered architecture are user application, the MDMS, and a hierarchical storage system (HSS). Our environment overcomes the performance problems of pure database-oriented solutions, while maintaining their advantages in terms of ease-of-use and portability. The high levels of performance are achieved by the MDMS, with the aid of user-specified, performance-oriented directives. Our environment supports a simple, easy-to-use yet powerful user interface, leaving the task of choosing appropriate I/O techniques for the application at hand to the MDMS. We discuss the importance of an active MDMS and show how the three components of our environment, namely the application, the MDMS, and the HSS, fit together. We also report performance numbers from our ongoing implementation and illustrate that significant improvements are made possible without undue programming effort. This revised version was published online in July 2006 with corrections to the Cover Date.

4.

A Distributed Multi-Storage Resource Architecture and I/O Performance Prediction for Scientific Computing (Cited by: 1; self-citations: 0; citations by others: 1)

X. Shen A. Choudhary C. Matarazzo P. Sinha 《Cluster computing》2003,6(3):189-200

I/O intensive applications have posed great challenges to computational scientists. A major problem of these applications is that users have to sacrifice performance requirements in order to satisfy storage capacity requirements in a conventional computing environment. Further performance improvement is impeded by the physical nature of these storage media even when state-of-the-art I/O optimizations are employed. In this paper, we present a distributed multi-storage resource architecture, which can satisfy both performance and capacity requirements by employing multiple storage resources. Compared to a traditional single storage resource architecture, our architecture provides a more flexible and reliable computing environment. This architecture can bring new opportunities for high performance computing as well as inherit state-of-the-art I/O optimization approaches that have already been developed. It provides application users with high-performance storage access even when they do not have a single large local storage archive at their disposal. We also develop an Application Programming Interface (API) that provides transparent management and access to various storage resources in our computing environment. Since I/O usually dominates the performance in I/O intensive applications, we establish an I/O performance prediction mechanism which consists of a performance database and a prediction algorithm to help users better evaluate and schedule their applications. A tool is also developed to help users automatically generate performance data stored in databases. The experiments show that our multi-storage resource architecture is a promising platform for high performance distributed computing.

5.

An I/O device driver for bioinformatics tools: the case for BLAST

Mauro RC Lifschitz S 《Genetics and molecular research : GMR》2005,4(3):563-570

There are many bioinformatics tools that deal with input/output (I/O) issues by using file systems from the most common operating systems, such as Linux or MS Windows. However, as data volumes increase, there is a need for more efficient disk access, ad hoc memory management and specific page-replacement policies. We propose a device driver that can be used by multiple applications. It keeps the application code unchanged, providing a non-intrusive and flexible strategy for I/O calls that may be adopted in a straightforward manner. With our approach, database developers can define their own I/O management strategies. We used our device driver to manage Basic Local Alignment Search Tool (BLAST) I/O calls. Based on preliminary experimental results with National Center for Biotechnology Information (NCBI) BLAST, this approach can provide database management systems-like data management features, which may be used for BLAST and many other computational biology applications.

6.

Scalable Session Locking for a Distributed File System

Randal C. Burns Robert M. Rees Larry J. Stockmeyer Darrell D.E. Long 《Cluster computing》2001,4(4):295-306

File systems provide an interface for applications to obtain exclusive access to files, in which a process holds privileges to a file that cannot be preempted and restricts the capabilities of other processes. Local file systems do this by maintaining information about the privileges of current file sessions, and checking subsequent sessions for compatibility. Implementing exclusive access in this manner for distributed file systems degrades performance by requiring every new file session to be registered with a lock server that maintains global session state. We present two techniques for improving the performance of session management in the distributed environment. We introduce a distributed lock for managing file access, called a semi-preemptible lock, that allows clients to cache privileges. Under a semi-preemptible lock, a file system creates new sessions without messages to the lock manager. This improves performance by exploiting locality – the affinity of files to clients. We also present data structures and algorithms for the dynamic evaluation of locks that allow a distributed file system to efficiently manage arbitrarily complex locking. In this case, complex means that an object can be locked in a large number of unique modes. The combination of these techniques results in a distributed locking scheme that supports fine-grained concurrency control with low memory and message overhead and with the assurance that the locking system is correct and avoids unnecessary deadlocks.
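The effect of caching privileges on the client can be sketched as follows. This is a toy model, not the protocol from the paper: the `LockManager` and `Client` classes are hypothetical, and the rule that a cached lock can be revoked only while its holder has no open sessions is an assumed reading of "semi-preemptible".

```python
class LockManager:
    """Toy central lock manager that counts the requests it receives."""

    def __init__(self):
        self.holder = {}   # path -> client currently holding the lock
        self.messages = 0  # number of client-to-manager requests

    def acquire(self, client, path):
        self.messages += 1
        current = self.holder.get(path)
        if current is not None and current is not client:
            # Assumed rule: the lock can be taken away only if the
            # current holder has no open session on the file.
            if not current.release_if_idle(path):
                return False
        self.holder[path] = client
        return True


class Client:
    def __init__(self, name, manager):
        self.name = name
        self.manager = manager
        self.cached = set()      # paths whose lock is cached locally
        self.open_sessions = {}  # path -> number of open sessions

    def open(self, path):
        # A cached lock lets a new session start with no manager message.
        if path not in self.cached:
            if not self.manager.acquire(self, path):
                return False
            self.cached.add(path)
        self.open_sessions[path] = self.open_sessions.get(path, 0) + 1
        return True

    def close(self, path):
        # Assumes a matching successful open() happened earlier.
        self.open_sessions[path] -= 1

    def release_if_idle(self, path):
        if self.open_sessions.get(path, 0) > 0:
            return False
        self.cached.discard(path)
        return True
```

In this model, repeated opens of the same file by one client cost a single manager message, which is the locality benefit the abstract describes.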

7.

Middleware support for many-task computing

Ioan Raicu Ian Foster Mike Wilde Zhao Zhang Kamil Iskra Peter Beckman Yong Zhao Alex Szalay Alok Choudhary Philip Little Christopher Moretti Amitabh Chaudhary Douglas Thain 《Cluster computing》2010,13(3):291-314

Many-task computing aims to bridge the gap between two computing paradigms, high throughput computing and high performance computing. Many-task computing denotes high-performance computations comprising multiple distinct activities, coupled via file system operations. The aggregate number of tasks, quantity of computing, and volumes of data may be extremely large. Traditional techniques found in production systems in the scientific community to support many-task computing do not scale to today’s largest systems, due to issues in local resource manager scalability and granularity, efficient utilization of the raw hardware, long wait queue times, and shared/parallel file system contention and scalability. To address these limitations, we adopted a “top-down” approach to building a middleware called Falkon, to support the most demanding many-task computing applications at the largest scales. Falkon (Fast and Light-weight tasK executiON framework) integrates (1) multi-level scheduling to enable dynamic resource provisioning and minimize wait queue times, (2) a streamlined task dispatcher able to achieve orders-of-magnitude higher task dispatch rates than conventional schedulers, and (3) data diffusion which performs data caching and uses a data-aware scheduler to co-locate computational and storage resources. Micro-benchmarks have shown Falkon to achieve throughput of over 15K tasks/s, scale to hundreds of thousands of processors and to millions of queued tasks, and execute billions of tasks per day. Data diffusion has also been shown to improve application scalability and performance, with its ability to achieve hundreds of Gb/s I/O rates on modest sized clusters, with Tb/s I/O rates on the horizon.
Falkon has shown orders of magnitude improvements in performance and scalability over traditional approaches to resource management across many diverse workloads and applications at scales of billions of tasks on hundreds of thousands of processors across clusters, specialized systems, Grids, and supercomputers. Falkon’s performance and scalability have enabled a new class of applications called Many-Task Computing to operate with high efficiency at scales previously believed impossible.
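A data-aware scheduler of the kind mentioned under data diffusion can be illustrated with a short sketch. This is not Falkon's scheduler: the function `pick_worker`, its arguments, and the tie-breaking rule (most cached input files first, then lowest load, then name) are hypothetical choices made for illustration.

```python
def pick_worker(task_files, worker_caches, load):
    """Pick the worker that already caches the most of the task's input files.

    task_files:    set of file names the task reads
    worker_caches: dict mapping worker name -> set of cached file names
    load:          dict mapping worker name -> number of queued tasks
    Ties are broken by lower load, then by worker name.
    """
    return min(
        worker_caches,
        key=lambda w: (-len(worker_caches[w] & task_files), load[w], w),
    )
```

For example, with caches `{"w1": {"a"}, "w2": {"a", "b"}, "w3": set()}`, a task reading `{"a", "b"}` goes to `w2` even if `w2` is busier, because both inputs are already local there.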

8.

MRPack: Multi-Algorithm Execution Using Compute-Intensive Approach in MapReduce

Muhammad Idris Shujaat Hussain Muhammad Hameed Siddiqi Waseem Hassan Hafiz Syed Muhammad Bilal Sungyoung Lee 《PloS one》2015,10(8)

Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real time and streaming data in a variety of formats. These characteristics give rise to challenges in their modeling, computation, and processing. Hadoop MapReduce (MR) is a well known data-intensive distributed processing framework using the distributed file system (DFS) for Big Data. Current implementations of MR only support execution of a single algorithm in the entire Hadoop cluster. In this paper, we propose MapReducePack (MRPack), a variation of MR that supports execution of a set of related algorithms in a single MR job. We exploit the computational capability of a cluster by increasing the compute-intensiveness of MapReduce while maintaining its data-intensive approach. It uses the available computing resources by dynamically managing the task assignment and intermediate data. Intermediate data from multiple algorithms are managed using multi-key and skew mitigation strategies. The performance study of the proposed system shows that it is time, I/O, and memory efficient compared to the default MapReduce. The proposed approach reduces the execution time by 200% with an approximate 50% decrease in I/O cost. Complexity and qualitative results analysis shows significant performance improvement.

9.

neCODEC: nearline data compression for scientific applications

Yuan Tian Cong Xu Weikuan Yu Jeffrey S. Vetter Scott Klasky Honggao Liu Saad Biaz 《Cluster computing》2014,17(2):475-486

Advances in multicore technologies lead to processors with tens and soon hundreds of cores in a single socket, resulting in an ever growing gap between computing power and available memory and I/O bandwidths for data handling. It would be beneficial if some of the computing power could be transformed into gains of I/O efficiency, thereby reducing this speed disparity between computing and I/O. In this paper, we design and implement a NEarline data COmpression and DECompression (neCODEC) scheme for data-intensive parallel applications. Several salient techniques are introduced in neCODEC, including asynchronous compression threads, elastic file representation, distributed metadata handling, and balanced subfile distribution. Our performance evaluation indicates that neCODEC can improve the performance of a variety of data-intensive microbenchmarks and scientific applications. In particular, neCODEC is capable of increasing the effective bandwidth of S3D, a combustion simulation code, by more than 5 times.

10.

A grid layout algorithm for automatic drawing of biochemical networks (Cited by: 4; self-citations: 0; citations by others: 4)

Li W Kurata H 《Bioinformatics (Oxford, England)》2005,21(9):2036-2042

MOTIVATION: Visualization is indispensable in the research of complex biochemical networks. Available graph layout algorithms are not adequate for satisfactorily drawing such networks. New methods are required to visualize automatically the topological architectures and facilitate the understanding of the functions of the networks. RESULTS: We propose a novel layout algorithm to draw complex biochemical networks. A network is modeled as a system of interacting nodes on squared grids. A discrete cost function between each node pair is designed based on the topological relation and the geometric positions of the two nodes. The layouts are produced by minimizing the total cost. We design a fast algorithm to minimize the discrete cost function, by which candidate layouts can be produced efficiently. A simulated annealing procedure is used to choose better candidates. Our algorithm demonstrates its ability to exhibit cluster structures clearly in relatively compact layout areas without any prior knowledge. We developed Windows software to implement the algorithm for CADLIVE. AVAILABILITY: All materials can be freely downloaded from http://kurata21.bio.kyutech.ac.jp/grid/grid_layout.htm; http://www.cadlive.jp/ SUPPLEMENTARY INFORMATION: http://kurata21.bio.kyutech.ac.jp/grid/grid_layout.htm; http://www.cadlive.jp/
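Placing nodes on a grid by minimizing a discrete cost with simulated annealing can be sketched roughly as below. This is not the paper's algorithm or cost function: the stand-in cost (sum of Manhattan distances over edges only), the linear cooling schedule, and the names `layout_cost` and `anneal_grid_layout` are assumptions made for illustration.

```python
import math
import random


def layout_cost(pos, edges):
    """Stand-in cost: total Manhattan distance between connected nodes."""
    return sum(
        abs(pos[u][0] - pos[v][0]) + abs(pos[u][1] - pos[v][1]) for u, v in edges
    )


def anneal_grid_layout(nodes, edges, size, steps=2000, t0=2.0, seed=0):
    """Place nodes on a size x size grid, one node per cell, by annealing.

    Requires len(nodes) <= size * size. Returns (best_positions, best_cost).
    """
    rng = random.Random(seed)
    cells = [(x, y) for x in range(size) for y in range(size)]
    rng.shuffle(cells)
    pos = {n: cells[i] for i, n in enumerate(nodes)}
    occupied = {c: n for n, c in pos.items()}
    cost = layout_cost(pos, edges)
    best, best_cost = dict(pos), cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9  # linear cooling (assumed)
        node, target = rng.choice(nodes), rng.choice(cells)
        other = occupied.get(target)
        if other == node:
            continue
        old = pos[node]
        # Move the node to the target cell, swapping if the cell is taken.
        pos[node], occupied[target] = target, node
        if other is not None:
            pos[other], occupied[old] = old, other
        else:
            del occupied[old]
        new_cost = layout_cost(pos, edges)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(pos), cost
        else:
            # Reject the move and restore the previous placement.
            pos[node], occupied[old] = old, node
            if other is not None:
                pos[other], occupied[target] = target, other
            else:
                del occupied[target]
    return best, best_cost
```

A cost that also penalizes unconnected node pairs for being too close, as the abstract's per-pair cost suggests, could be substituted for `layout_cost` without changing the annealing loop.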

11.

An efficient grid layout algorithm for biological networks utilizing various biological attributes (Cited by: 2; self-citations: 0; citations by others: 2)

Kaname Kojima Masao Nagasaki Euna Jeong Mitsuru Kato Satoru Miyano 《BMC bioinformatics》2007,8(1):76

Background

Clearly visualized biopathways provide a great help in understanding biological systems. However, manual drawing of large-scale biopathways is time consuming. We proposed a grid layout algorithm that can handle gene-regulatory networks and signal transduction pathways by considering edge-edge crossing, node-edge crossing, distance measure between nodes, and subcellular localization information from Gene Ontology. Consequently, the layout algorithm succeeded in drastically reducing these crossings in the apoptosis model. However, for larger-scale networks, we encountered three problems: (i) the initial layout is often very far from any local optimum because nodes are initially placed at random, (ii) from a biological viewpoint, human layouts are still easier to understand than automatic layouts because, apart from subcellular localization, the algorithm does not fully utilize biological information of pathways, and (iii) it employs a local search strategy in which the neighborhood is obtained by moving one node at each step, and automatic layouts suggest that simultaneous movements of multiple nodes are necessary for better layouts, although such an extension may worsen the time complexity.

12.

GMBlock: Optimizing data movement in a block-level storage sharing system over Myrinet

Evangelos Koukis Anastassios Nanos Nectarios Koziris 《Cluster computing》2010,13(4):349-372

We present gmblock, a block-level storage sharing system over Myrinet which uses an optimized I/O path to transfer data directly between the storage medium and the network, bypassing the host CPU and main memory bus of the storage server. It is device driver independent and retains the protection and isolation features of the OS. We evaluate the performance of a prototype gmblock server and find that: (a) the proposed techniques eliminate memory and peripheral bus contention, increasing remote I/O bandwidth significantly, on the order of 20–200% compared to an RDMA-based approach, (b) the impact of remote I/O on local computation becomes negligible, (c) the performance characteristics of RAID storage combined with limited NIC resources reduce performance. We introduce synchronized send operations to improve the degree of disk to network I/O overlapping. We deploy the OCFS2 shared-disk filesystem over gmblock and show gains for various application benchmarks, provided I/O scheduling can eliminate the disk bottleneck due to concurrent access.

13.

Molecular complexes at a glance: automated generation of two-dimensional complex diagrams

Stierand K Maass PC Rarey M 《Bioinformatics (Oxford, England)》2006,22(14):1710-1716

MOTIVATION: In this paper a new algorithmic approach is presented, which automatically generates structure diagrams of molecular complexes. A complex diagram contains the ligand, the amino acids of the protein interacting with the ligand and the hydrophilic interactions schematized as dashed lines between the corresponding atoms. The algorithm is based on a combinatorial optimization strategy which solves parts of the layout problem non-heuristically. The depicted molecules are represented as structure diagrams according to the chemical nomenclature. Due to the frequent usage of complex diagrams in the scientific literature as well as in textbooks dealing with structural biology, biochemistry and medicinal chemistry, the new algorithm is a key element for computer applications in these areas. RESULTS: The method was implemented in the new software tool PoseView. It was tested on a representative dataset containing 305 protein-ligand complexes in total from the Brookhaven Protein Data Bank. PoseView was able to find collision-free layouts for more than three quarters of all complexes. In the following, the layout generation algorithm is presented and, in addition to the statistical results, representative test cases demonstrating the challenges of the layout generation are discussed. AVAILABILITY: The method is available as a webservice at http://www.zbh.uni-hamburg.de/poseview.

14.

I/O access frequency-aware cache method on KVM/QEMU

Taehoon Kim Jaechun No Zhegao Piao Seong Joon Yoo 《Cluster computing》2017,20(3):2143-2155

Together with the rapid development of IT, cloud computing has come to be regarded as the next generation’s computing infrastructure. One of the essential parts of cloud computing is virtual machine technology, which reduces data center cost through better resource utilization. In particular, virtual desktop infrastructure (VDI) is receiving considerable attention from IT markets because of its advantages of easier software management, greater data protection, and lower cost. However, sharing physical resources in VDI to consolidate multiple guest virtual machines (VMs) on a host involves a tradeoff that can lead to significant I/O degradation. Reducing I/O virtualization overhead is a challenging task because it requires scrutinizing the multiple software layers between the guest VMs and the host where those VMs are executing. In this paper, we present a hypervisor-level cache, called hyperCache, which can provide a shortcut in KVM/QEMU. It intercepts I/O requests in the hypervisor and analyses their I/O access patterns to select data with high access frequency. It is also capable of maintaining an appropriate cache memory size by utilizing the cache block map. Our experimental results demonstrate that our method improves I/O bandwidth by up to 4.7x over the existing QEMU.
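The general idea of admitting only frequently accessed blocks into a cache can be sketched as follows. This is not hyperCache or any QEMU interface: the `FrequencyAwareCache` class, the admission `threshold`, and the LRU eviction used to bound the cache size (standing in for the cache block map) are all assumptions for illustration.

```python
from collections import Counter, OrderedDict


class FrequencyAwareCache:
    """Cache that admits a block only after it has been read `threshold` times."""

    def __init__(self, capacity, threshold=2):
        self.capacity = capacity
        self.threshold = threshold
        self.counts = Counter()     # block id -> number of reads seen so far
        self.cache = OrderedDict()  # block id -> data, kept in LRU order

    def read(self, block, backend):
        """Return (data, hit). `backend` is a callable that fetches a block."""
        self.counts[block] += 1
        if block in self.cache:
            self.cache.move_to_end(block)  # mark as most recently used
            return self.cache[block], True
        data = backend(block)
        if self.counts[block] >= self.threshold:
            self.cache[block] = data
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return data, False
```

With `threshold=2`, a block read once (for example during a sequential scan) never displaces a block that is read repeatedly.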

15.

DIRAQ: scalable in situ data- and resource-aware indexing for optimized query performance

Sriram Lakshminarasimhan Xiaocheng Zou David A. Boyuka II Saurabh V. Pendse John Jenkins Venkatram Vishwanath Michael E. Papka Scott Klasky Nagiza F. Samatova 《Cluster computing》2014,17(4):1101-1119

Scientific data analytics in high-performance computing environments has been evolving along with the advancement of computing capabilities. With the onset of exascale computing, the increasing gap between compute performance and I/O bandwidth has rendered the traditional post-simulation processing a tedious process. Despite the challenges due to increased data production, there exists an opportunity to benefit from “cheap” computing power to perform query-driven exploration and visualization during simulation time. To accelerate such analyses, applications traditionally augment, post-simulation, raw data with large indexes, which are then repeatedly utilized for data exploration. However, the generation of current state-of-the-art indexes involves compute- and memory-intensive processing, thus rendering them inapplicable in an in situ context. In this paper we propose DIRAQ, a parallel in situ, in network data encoding and reorganization technique that enables the transformation of simulation output into a query-efficient form, with negligible runtime overhead to the simulation run. DIRAQ’s effective core-local, precision-based encoding approach incorporates an embedded compressed index that is 3–6× smaller than current state-of-the-art indexing schemes. Its data-aware index adjustment improves performance of group-level index layout creation by up to 35% and reduces the size of the generated index by up to 27%. Moreover, DIRAQ’s in network index merging strategy enables the creation of aggregated indexes that speed up spatial-context query responses by up to 10× versus alternative techniques. DIRAQ’s topology-, data-, and memory-aware aggregation strategy results in efficient I/O and yields overall end-to-end encoding and I/O time that is less than that required to write the raw data with MPI collective I/O.

16.

Comparison of three flow line layouts with unreliable machines and profit maximization

Guan Wang Yang Woo Shin Dug Hee Moon 《Flexible Services and Manufacturing Journal》2016,28(4):669-693

Manufacturing system design is a complex challenge when a new factory is being built. Although some factories produce the same product, the layouts of the factories may be different. Manufacturing systems for automotive engines can be modelled with several types of queueing networks with finite buffers and unreliable machines. In this paper, three types of layout structures which are commonly used in automotive engine shops are compared with respect to maximizing profit that is determined by throughput and the investment cost of buffers. We assume that the service times are constant but inhomogeneous, and the time to failure and the time to repair are exponentially distributed. To solve this problem we used approximation methods which are based on aggregation and overlapping decomposition for computing performance measures, and a gradient search method for finding an optimal buffer allocation.
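One way to write down a profit objective that depends on throughput and buffer investment is given below. The abstract does not state the exact form, so the linear revenue and cost terms and the symbols $r$, $c$, $b_i$, and $\mathrm{TH}$ are assumptions introduced here for illustration.

```latex
% Assumed notation:
%   b = (b_1, ..., b_n)  buffer sizes between consecutive machines
%   TH(b)                throughput of the line under allocation b
%   r                    revenue per unit of throughput
%   c                    investment cost per unit of buffer space
\[
  \max_{b_1,\dots,b_n \in \mathbb{Z}_{\ge 0}} \;
  \pi(b) \;=\; r \cdot \mathrm{TH}(b) \;-\; c \sum_{i=1}^{n} b_i
\]
```

Under this form, a gradient search would enlarge a buffer $b_i$ only while the resulting revenue gain from throughput exceeds the added cost $c$.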

17.

Performance Evaluation of the Quadrics Interconnection Network (Cited by: 1; self-citations: 0; citations by others: 1)

Fabrizio Petrini Eitan Frachtenberg Adolfy Hoisie Salvador Coll 《Cluster computing》2003,6(2):125-142


18.

An Adaptive Hybrid OLAP Architecture with optimized memory access patterns

Lubomir Riha Maria Malik Tarek El-Ghazawi 《Cluster computing》2013,16(4):663-677

OLAP (On-Line Analytical Processing) is an approach to efficiently evaluate multidimensional data for business intelligence applications. OLAP contributes to business decision-making by identifying, extracting, and analyzing multidimensional data. The fundamental structure of OLAP is a data cube that enables users to interactively explore the distinct data dimensions. Processing depends on the complexity of queries, dimensionality, and growing size of the data cube. As data volumes and the demands of business users keep increasing, higher processing speed than ever is needed, as faster processing means faster decisions and more profit to industry. In this paper, we propose an Adaptive Hybrid OLAP Architecture that takes advantage of heterogeneous systems with GPUs and CPUs and leverages the different characteristics of their memory subsystems to minimize response time. Thus, our approach (a) exploits both types of hardware rather than using the CPU only as a frontend for the GPU; (b) uses two different data formats (multidimensional cube and relational cube) to match the GPU and CPU memory access patterns and diverts queries adaptively to the best resource for solving the problem at hand; (c) exploits data locality of multidimensional OLAP on NUMA multicore systems through intelligent thread placement; and (d) guides its adaptation and choices by an architectural model that captures the memory access patterns and the underlying data characteristics. Results show a roughly fourfold increase in performance over the best known related approach. There is also the important economic factor. The proposed hybrid system costs only 10% more than the same system without a GPU. With this small extra cost, the added GPU speeds up query processing by almost 2 times.

19.

DataStager: scalable data staging services for petascale applications

Hasan Abbasi Matthew Wolf Greg Eisenhauer Scott Klasky Karsten Schwan Fang Zheng 《Cluster computing》2010,13(3):277-290

Known challenges for petascale machines are that (1) the costs of I/O for high performance applications can be substantial, especially for output tasks like checkpointing, and (2) noise from I/O actions can inject undesirable delays into the runtimes of such codes on individual compute nodes. This paper introduces the flexible ‘DataStager’ framework for data staging and alternative services within that jointly address (1) and (2). Data staging services moving output data from compute nodes to staging or I/O nodes prior to storage are used to reduce I/O overheads on applications’ total processing times, and explicit management of data staging offers reduced perturbation when extracting output data from a petascale machine’s compute partition. Experimental evaluations of DataStager on the Cray XT machine at Oak Ridge National Laboratory establish both the necessity of intelligent data staging and the high performance of our approach, using the GTC fusion modeling code and benchmarks running on 1000+ processors.

20.

Optimizing I/O forwarding techniques for extreme-scale event tracing

Thomas Ilsche Joseph Schuchart Jason Cope Dries Kimpe Terry Jones Andreas Knüpfer Kamil Iskra Robert Ross Wolfgang E. Nagel Stephen Poole 《Cluster computing》2014,17(1):1-18

Programming development tools are a vital component for understanding the behavior of parallel applications. Event tracing is a principal ingredient of these tools, but new and serious challenges place event tracing at risk on extreme-scale machines. As the quantity of captured events increases with concurrency, the additional data can overload the parallel file system and perturb the application being observed. In this work we present a solution for event tracing on extreme-scale machines. We enhance an I/O forwarding software layer to aggregate and reorganize log data prior to writing to the storage system, significantly reducing the burden on the underlying file system. Furthermore, we introduce a sophisticated write buffering capability to limit the impact. To validate the approach, we employ the Vampir tracing toolset using these new capabilities. Our results demonstrate that the approach increases the maximum traced application size by a factor of 5× to more than 200,000 processes.
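Aggregating and reorganizing small writes before they reach the file system can be sketched as below. This is not the authors' forwarding layer: the function `coalesce_writes` and its `(offset, data)` request format are hypothetical, and the sketch assumes that requests do not overlap.

```python
def coalesce_writes(requests):
    """Merge buffered (offset, data) writes into contiguous runs.

    Requests are sorted by offset, and any request that starts exactly
    where the previous run ends is appended to that run. Overlapping
    requests are not handled in this sketch.
    """
    merged = []
    for offset, data in sorted(requests, key=lambda r: r[0]):
        if merged and merged[-1][0] + len(merged[-1][1]) == offset:
            prev_offset, prev_data = merged[-1]
            merged[-1] = (prev_offset, prev_data + data)
        else:
            merged.append((offset, data))
    return merged
```

Three buffered requests at offsets 4, 0, and 10 with lengths 2, 4, and 1 would reach the storage system as two writes, one covering bytes 0 to 5 and one at offset 10.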