首页 | 本学科首页   官方微博 | 高级检索  
   检索      


Discretionary Caching for I/O on Clusters
Authors:Murali Vilayannur  Anand Sivasubramaniam  Mahmut Kandemir  Rajeev Thakur  Robert Ross
Institution:(1) Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16802;(2) Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, 60439
Abstract:I/O bottlenecks are already a problem in many large-scale applications that manipulate huge datasets. This problem is expected to get worse as applications get larger, and the I/O subsystem performance lags behind processor and memory speed improvements. At the same time, off-the-shelf clusters of workstations are becoming a popular platform for demanding applications due to their cost-effectiveness and widespread deployment. Caching I/O blocks is one effective way of alleviating disk latencies, and there can be multiple levels of caching on a cluster of workstations. Previous studies have shown the benefits of caching—whether it be local to a particular node, or a shared global cache across the cluster—for certain applications. However, we show that while caching is useful in some situations, it can hurt performance if we are not careful about what to cache and when to bypass the cache. This paper presents compilation techniques and runtime support to address this problem. These techniques are implemented and evaluated on an experimental Linux/Pentium cluster running a parallel file system. Our results using a diverse set of applications (scientific and commercial) demonstrate the benefits of a discretionary approach to caching for I/O subsystems on clusters, providing as much as 48% savings in overall execution time over indiscriminately caching everything in some applications. Parts of this paper have appeared in the Proceedings of the 3rd IEEE/ACM Symposium on Cluster Computing and the Grid (CCGrid'03). This paper is an extension of these prior results, and includes a more extensive performance evaluation. Murali Vilayannur is a Ph.D. student in the Department of Computer Science and Engineering at The Pennsylvania State University. His research interests are in High-Performance Parallel I/O, File Systems, Virtual Memory Algorithms and Operating Systems. Anand Sivasubramaniam received his B.Tech. in Computer Science from the Indian Institute of Technology, Madras, in 1989, and the M.S. and Ph.D. degrees in Computer Science from the Georgia Institute of Technology in 1991 and 1995 respectively. He has been on the faculty at The Pennsylvania State University since Fall 1995 where he is currently an Associate Professor. Anand's research interests are in computer architecture, operating systems, performance evaluation, and applications for both high performance computer systems and embedded systems. Anand's research has been funded by NSF through several grants, including the CAREER award, and from industries including IBM, Microsoft and Unisys Corp. He has several publications in leading journals and conferences, and is on the editorial board of IEEE Transactions on Computers and IEEE Transactions on Parallel and Distributed Systems. He is a recipient of the 2002 IBM Faculty Award. Anand is a member of the IEEE, IEEE Computer Society, and ACM. Mahmut Kandemir received the B.Sc. and M.Sc. degrees in control and computer engineering from Istanbul Technical University, Istanbul, Turkey, in 1988 and 1992, respectively. He received the Ph.D. from Syracuse University, Syracuse, New York in electrical engineering and computer science, in 1999. He has been an assistant professor in the Computer Science and Engineering Department at the Pennsylvania State University since August 1999. His main research interests are optimizing compilers, I/O intensive applications, and power-aware computing. He is a member of the IEEE and the ACM. Rajeev Thakur is a Computer Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory. He received a B.E. from the University of Bombay, India, in 1990, M.S. from Syracuse University in 1992, and Ph.D. from Syracuse University in 1995, all in computer engineering. His research interests are in the area of high-performance computing in general and high-performance networking and I/O in particular. He was a member of the MPI Forum and participated actively in the definition of the I/O part of the MPI-2 standard. He is the author of a widely used, portable implementation of MPI-IO, called ROMIO. He is also a co-author of the book “Using MPI-2: Advanced Features of the Message Passing Interface” published by MIT Press. Robert Ross received his Ph.D. in Computer Engineering from Clemson University in 2000. He is now an Assistant Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory. His research interests are in message passing and storage systems for high performance computing environments. He is the primary author and lead developer for the Parallel Virtual File System (PVFS), a parallel file system for Linux clusters. Current projects include the ROMIO MPI-IO implementation, PVFS, PVFS2, and the MPICH2 implementation of the MPI message passing interface.
Keywords:I/O  clusters  file systems  compiler optimizations  caching
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号