Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations
Authors: Adnan Agbaria, Roy Friedman
Institution: The Technion, Department of Computer Science, Haifa, 32000, Israel
Abstract: This paper reports on the architecture and design of Starfish, an environment for executing dynamic (and static) MPI-2 programs on a cluster of workstations. Starfish is unique in that the system itself is efficient, fault-tolerant, highly available, and dynamic, and it also provides fault tolerance and dynamicity to the application programs it runs. Starfish achieves these goals by combining group communication technology with checkpoint/restart. Its novel architecture is both flexible and portable, and it keeps group communication outside the critical data path for maximum performance.
Keywords: checkpoint/restart, fault tolerance, distributed system, high performance, MPI
This article is indexed in SpringerLink and other databases.