Similar documents
 20 similar documents found (search time: 791 ms)
1.
The exact resolution of large instances of combinatorial optimization problems, such as the three-dimensional quadratic assignment problem (Q3AP), is a real challenge for grid computing. Indeed, it is necessary to reconsider the resolution algorithms and take into account the characteristics of such environments, especially their large scale, the dynamic availability of resources, and their multi-domain administration. In this paper, we revisit the design and implementation of the branch and bound algorithm for solving large combinatorial optimization problems such as Q3AP on computational grids. This gridification is based on new ways to deal efficiently with some crucial issues, mainly dynamic adaptive load balancing and fault tolerance. Our new approach allowed the exact resolution of a difficult Q3AP instance on a nation-wide grid. To solve this instance, an average of 1,123 computing cores were used for less than 12 days, with a peak of around 3,427 computing cores.

2.
High performance and distributed computing systems such as peta-scale, grid and cloud infrastructures are increasingly used for running scientific models and business services. These systems experience large variations in availability through hardware and software failures. Resource providers need to account for these variations while providing the required QoS at appropriate costs in dynamic resource and application environments. Although the performance and reliability of these systems have been studied separately, there has been little analysis of the lost Quality of Service (QoS) experienced with varying availability levels. In this paper, we present a resource performability model to estimate lost performance and corresponding cost considerations with varying availability levels. We use the resulting model in a multi-phase planning approach for scheduling a set of deadline-sensitive meteorological workflows atop grid and cloud resources to trade off performance, reliability and cost. We use simulation results driven by failure data collected over the lifetime of high performance systems to demonstrate how the proposed scheme better accounts for resource availability.

3.
4.
While aggregating the throughput of existing disks on cluster nodes is a cost-effective approach to alleviating the I/O bottleneck in cluster computing, this approach suffers from potential performance degradation due to contention for shared resources on the same node between storage data processing and user task computation. This paper proposes to judiciously utilize the storage redundancy, in the form of mirroring, that exists in a RAID-10 style file system to alleviate this performance degradation. More specifically, a heuristic scheduling algorithm, motivated by observations of a simple cluster configuration, is developed to spatially schedule write operations on the less loaded node of each mirroring pair. The duplication of modified data to the mirroring nodes is performed asynchronously in the background. Read performance is improved by two techniques: doubling the degree of parallelism and hot-spot skipping. A synthetic benchmark is used to evaluate these algorithms in a real cluster environment, and the proposed algorithms are shown to be very effective in enhancing performance. Yifeng Zhu received his B.Sc. degree in Electrical Engineering in 1998 from Huazhong University of Science and Technology, Wuhan, China, and the M.S. and Ph.D. degrees in Computer Science from the University of Nebraska – Lincoln in 2002 and 2005, respectively. He is an assistant professor in the Electrical and Computer Engineering department at the University of Maine. His main research interests are cluster computing, grid computing, computer architecture and systems, and parallel I/O storage systems. Dr. Zhu is a Member of ACM, IEEE, the IEEE Computer Society, and the Francis Crowe Society. Hong Jiang received the B.Sc. degree in Computer Engineering in 1982 from Huazhong University of Science and Technology, Wuhan, China; the M.A.Sc.
degree in Computer Engineering in 1987 from the University of Toronto, Toronto, Canada; and the Ph.D. degree in Computer Science in 1991 from Texas A&M University, College Station, Texas, USA. Since August 1991 he has been at the University of Nebraska-Lincoln, Lincoln, Nebraska, USA, where he is Professor and Vice Chair in the Department of Computer Science and Engineering. His present research interests are computer architecture, parallel/distributed computing, cluster and Grid computing, computer storage systems and parallel I/O, performance evaluation, real-time systems, middleware, and distributed systems for distance education. He has over 100 publications in major journals and international conferences in these areas, and his research has been supported by NSF, DOD and the State of Nebraska. Dr. Jiang is a Member of ACM, the IEEE Computer Society, and ACM SIGARCH. Xiao Qin received the B.S. and M.S. degrees in computer science from Huazhong University of Science and Technology in 1992 and 1999, respectively. He received the Ph.D. degree in computer science from the University of Nebraska-Lincoln in 2004. Currently, he is an assistant professor in the department of computer science at the New Mexico Institute of Mining and Technology. He served as a subject area editor of IEEE Distributed System Online (2000–2001). His research interests are in parallel and distributed systems, storage systems, real-time computing, performance evaluation, and fault tolerance. He is a member of the IEEE. Dan Feng received the Ph.D. degree from Huazhong University of Science and Technology, Wuhan, China, in 1997. She is currently a professor in the School of Computer, Huazhong University of Science and Technology, Wuhan, China.
She is the principal scientist of the National Grand Fundamental Research 973 Program of China “Research on the organization and key technologies of the Storage System on the next generation Internet.” Her research interests include computer architecture, storage systems, parallel I/O, massive storage and performance evaluation. David Swanson received a Ph.D. in physical (computational) chemistry at the University of Nebraska-Lincoln (UNL) in 1995, after which he worked as an NSF-NATO postdoctoral fellow at the Technical University of Wroclaw, Poland, in 1996, and subsequently as a National Research Council Research Associate at the Naval Research Laboratory in Washington, DC, from 1997 to 1998. In 1999 he returned to UNL, where he directs the Research Computing Facility and currently serves as an Assistant Research Professor in the Department of Computer Science and Engineering. The Office of Naval Research, the National Science Foundation, and the State of Nebraska have supported his research in areas such as large-scale scientific simulation and distributed systems.

5.
In biological systems, dynamic analysis methods have gained increasing attention in the past decade. The Boolean network is the most common model of a genetic regulatory network. The interactions of activation and inhibition in the genetic regulatory network are modeled as a set of functions of the Boolean network, while the state transitions in the Boolean network reflect the dynamic properties of the genetic regulatory network. A difficult problem in state transition analysis is finding attractors. In this paper, we model the genetic regulatory network as a Boolean network and propose an algorithm to tackle the attractor-finding problem. In the proposed algorithm, we partition the Boolean network into several blocks consisting of strongly connected components according to their gradients, and define the connections between blocks as decision nodes. Based on the solutions calculated on the decision nodes and using a satisfiability solving algorithm, we identify the attractors in the state transition graph of each block. The proposed algorithm is benchmarked on a variety of genetic regulatory networks. Compared with existing algorithms, it achieves similar performance on small test cases and outperforms them on larger and more complex ones, which reflects the trend in modern genetic regulatory networks. Furthermore, while the existing satisfiability-based algorithms cannot be parallelized due to their inherent algorithm design, the proposed algorithm exhibits good scalability on parallel computing architectures.
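
To make the notion of an attractor concrete, the following is a minimal sketch of attractor finding by exhaustive state enumeration. It is not the block-partitioning, satisfiability-based algorithm of the abstract; it assumes a synchronous update scheme, and the three-gene network `EXAMPLE_NETWORK` and the function name `find_attractors` are hypothetical illustrations.

```python
from itertools import product

def find_attractors(update_fns):
    """Return all attractors of a synchronous Boolean network.

    update_fns[i] maps the current state (a tuple of 0/1) to the next
    value of node i. Brute force over all 2**n states, so it is only
    usable for small n.
    """
    n = len(update_fns)

    def step(state):
        return tuple(int(f(state)) for f in update_fns)

    attractors = set()
    for start in product((0, 1), repeat=n):
        seen = {}  # state -> position along the trajectory (insertion ordered)
        state = start
        while state not in seen:
            seen[state] = len(seen)
            state = step(state)
        # 'state' is the first repeated state; everything from its first
        # visit onward forms the cycle (a fixed point is a cycle of length 1).
        first = seen[state]
        cycle = [s for s, i in seen.items() if i >= first]
        # Rotate so the smallest state comes first, giving one canonical form.
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors

# Hypothetical 3-gene network: x0' = x1, x1' = x0, x2' = x0 AND x1.
EXAMPLE_NETWORK = [
    lambda s: s[1],
    lambda s: s[0],
    lambda s: s[0] and s[1],
]

if __name__ == "__main__":
    for attractor in sorted(find_attractors(EXAMPLE_NETWORK)):
        print(attractor)
```

The exponential cost of this enumeration is the reason structured approaches such as the one described above are needed for larger networks.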

6.
Recently, software distributed shared memory (DSM) systems have successfully provided an easy user interface to parallel user applications on distributed systems. To improve program performance, most DSM systems greedily use all available processors in a computer network to execute user programs. However, using more processors does not necessarily guarantee better program performance. The overhead of parallelizing a program grows with the number of processors used for its execution. If the performance gain from parallelism cannot compensate for this overhead, increasing the number of execution processors will result in performance degradation and wasted resources. In this paper, we propose a mechanism that dynamically finds a suitable system scale to optimize the performance of DSM applications according to run-time information. The experimental results show that the proposed mechanism can precisely predict the number of processors that will result in the best performance and then effectively optimize the performance of the test applications by adapting the system scale according to the predicted result. Yi-Chang Zhuang received his B.S., M.S. and Ph.D. degrees in electrical engineering from National Cheng Kung University in 1995, 1997, and 2004. He is currently working as an engineer at the Industrial Technology Research Institute in Taiwan. His research interests include object-based storage, file systems, distributed systems, and grid computing. Jyh-Biau Chang is currently an assistant professor at the Information Management Department of Leader University in Taiwan. He received his B.S., M.S. and Ph.D. degrees from the Electrical Engineering Department of National Cheng Kung University in 1994, 1996, and 2005. His research focuses on cluster and grid computing, parallel and distributed systems, and operating systems.
Tyng-Yeu Liang is currently an assistant professor at the Department of Electrical Engineering, National Kaohsiung University of Applied Sciences in Taiwan. He received his B.S., M.S. and Ph.D. degrees from National Cheng Kung University in 1992, 1994, and 2000. His research interests are cluster and grid computing, image processing and multimedia. Ce-Kuen Shieh is currently a professor at the Electrical Engineering Department of National Cheng Kung University in Taiwan. He is also the chief of the computation center at National Cheng Kung University. He received his Ph.D. degree from the Department of Electrical Engineering of National Cheng Kung University in 1988. He was the chairman of the Electrical Engineering Department of National Cheng Kung University from 2002 to 2005. His research focuses on computer networks and parallel and distributed systems. Laurence T. Yang is a professor at the Department of Computer Science, St. Francis Xavier University, Canada. His research includes high performance computing and networking, embedded systems, ubiquitous/pervasive computing and intelligence, and autonomic and trusted computing.

7.
A cluster is a group of interconnected computers that acts as a single system to provide users with computing resources. Each computer is a node of the cluster. With the rapid development of computer technology, cluster computing, with its high performance–cost ratio, has been widely applied in distributed parallel computing. For large-scale close data in group enterprises, a heterogeneous data integration model was built in a cluster environment based on cluster computing, XML technology and ontology theory. This model provides users with unified and transparent access interfaces. Based on cluster computing, the work solves heterogeneous data integration problems by means of ontology and XML technology. Good application results were achieved compared with the traditional data integration model. Moreover, the model was shown to improve the computing capacity of the system with a high performance–cost ratio. It is thus hoped to provide support for the decision-making of enterprise managers.

8.
In this paper, we present a fault tolerance and recovery system called FRASystem (Fault Tolerant & Recovery Agent System) that uses multiple agents in distributed computing systems. Previous rollback-recovery protocols depended on an inherent communication mechanism and the underlying operating system, which caused a decline in computing performance. We propose a rollback-recovery protocol that works independently of the operating system and thus increases portability and extensibility. We define four types of agents: (1) a recovery agent performs the rollback-recovery protocol after a failure, (2) an information agent constructs domain knowledge as fault tolerance rules and information during failure-free operation, (3) a facilitator agent controls the communication between agents, and (4) a garbage collection agent performs garbage collection of useless fault tolerance information. Since agent failures may lead to inconsistent system states and a domino effect, we propose an agent recovery algorithm. A garbage collection protocol addresses the performance degradation caused by the growth of saved fault tolerance information in stable storage. We implemented a prototype of FRASystem using Java and CORBA and experimented with the proposed rollback-recovery protocol. The simulation results indicate that the performance of our protocol is better than that of previous rollback-recovery protocols, which use independent checkpointing and pessimistic message logging without agents. Our contributions are as follows: (1) this is the first rollback-recovery protocol using agents, (2) FRASystem does not depend on an operating system, and (3) FRASystem provides portability and extensibility.

9.
As GPUs, ARM CPUs and even FPGAs are widely used in modern computing, data centers are gradually developing towards heterogeneous clusters. However, many well-known programming models such as MapReduce are designed for homogeneous clusters and perform poorly in heterogeneous environments. In this paper, we reconsider the problem and make four contributions: (1) We analyse the causes of MapReduce's poor performance in heterogeneous clusters; the most important one is unreasonable task allocation between nodes with different computing abilities. (2) Based on this, we propose MrHeter, which separates the MapReduce process into a map-shuffle stage and a reduce stage, constructs a separate optimization model for each, and obtains different task allocations \(ml_{ij}, mr_{ij}, r_{ij}\) for heterogeneous nodes based on computing ability. (3) To make it suitable for dynamic execution, we propose D-MrHeter, which includes a monitor and feedback mechanism. (4) Finally, we show that MrHeter and D-MrHeter can decrease the total execution time of MapReduce by 30 to 70% in heterogeneous clusters compared with original Hadoop, with better performance especially under heavy workloads and large differences between the nodes' computing abilities.
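
The idea of allocating tasks according to computing ability can be illustrated with a much simpler rule than the optimization models of the abstract. The sketch below splits a number of equal-sized tasks across nodes in proportion to an assumed per-node speed; the function name `allocate_tasks` and the speed values are hypothetical, and this is not the MrHeter model itself.

```python
def allocate_tasks(n_tasks, speeds):
    """Split n_tasks across nodes in proportion to their relative speeds.

    Uses largest-remainder rounding so the allocation sums to n_tasks.
    """
    total = sum(speeds)
    shares = [n_tasks * s / total for s in speeds]  # ideal fractional shares
    alloc = [int(x) for x in shares]                # round each share down
    leftover = n_tasks - sum(alloc)
    # Hand the remaining tasks to the nodes with the largest fractional parts.
    order = sorted(range(len(speeds)),
                   key=lambda i: shares[i] - alloc[i],
                   reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc

if __name__ == "__main__":
    # One slow node and two nodes assumed to be twice as fast.
    print(allocate_tasks(10, [1, 2, 2]))
```

A homogeneous scheduler would give each of the three nodes roughly a third of the work, leaving the faster nodes idle while the slow one finishes.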

10.

Fog-cloud computing is a promising distributed model for hosting ever-increasing Internet of Things (IoT) applications. IoT applications should meet different characteristics such as deadline, frequency rate, and input file size. Fog nodes are heterogeneous, resource-limited devices and cannot accommodate all the IoT applications. Due to these difficulties, designing an efficient algorithm to deploy a set of IoT applications in a fog-cloud environment is very important. In this paper, a fuzzy approach is developed to classify applications based on their characteristics then an efficient heuristic algorithm is proposed to place applications on the virtualized computing resources. The proposed policy aims to provide a high quality of service for IoT users while the profit of fog service providers is maximized by minimizing resource wastage. Extensive simulation experiments are conducted to evaluate the performance of the proposed policy. Results show that the proposed policy outperforms other approaches by improving the average response time up to 13%, the percentage of deadline satisfied requests up to 12%, and the resource wastage up to 26%.

11.
FPGA based distributed self healing architecture for reusable systems   (Cited by: 1; self-citations: 0; citations by others: 1)
Creating an environment of “no doubt” for computing systems is critical for supporting next generation science, engineering, and commercial applications. With reconfigurable devices such as Field Programmable Gate Arrays (FPGAs), designers are provided with a seductive tool to use as a basis for sophisticated but highly reliable platforms. Reconfigurable computing platforms potentially offer the enhancement of reliability and recovery from catastrophic failures through partial and dynamic reconfigurations; and eliminate the need for redundant hardware resources typically used by existing fault-tolerant systems. We propose a two-level self-healing methodology to offer 100% availability for mission critical systems with comparatively less hardware overhead and performance degradation. Our proposed system first undertakes healing at the node-level. Failing to rectify the system at the node-level, network-level healing is then undertaken. We have designed a system based on Xilinx Virtex-5 FPGAs and Cirronet wireless mesh nodes to demonstrate autonomous wireless healing capability among networked node devices. Our prototype is a proof-of-concept work which demonstrates the feasibility of using FPGAs to provide maximum computational availability in a critical self-healing distributed architecture.

12.
Virtual machine (VM) migration can improve the availability, manageability, performance and fault tolerance of systems. Current migration research mainly focuses on improving efficiency by using shared storage, priority-based policies, and so on, but the effects of migration have received little attention. In fact, once physical servers are overloaded by a distributed denial-of-service (DDoS) attack, a hasty migration operation not only fails to alleviate the harm of the attack but also increases it. In this paper, a novel DDoS attack, the Cloud-Droplet-Freezing (CDF) attack, is described according to the characteristics of cloud computing clusters. Our experiments show that such an attack is able to congest the internal network communication of a cloud server cluster while consuming the resources of the physical servers. Based on the analysis of the CDF attack, we highlight a method of evaluating potential threats hidden behind normal VM migration and analyze the flaws of existing intrusion detection/prevention systems in defending against the CDF attack.

13.
Clusters of workstations are a practical approach to parallel computing that provides high performance at a low cost for many scientific and engineering applications. In order to handle problems with increasingly large data sets, methods supporting parallel out-of-core computations must be investigated. Since writing an out-of-core version of a program is a difficult task and virtual memory systems do not perform well in some cases, we have developed a parallel programming interface and a support library to provide efficient and convenient access to out-of-core data. This paper focuses on how these components extend the range of problem sizes that can be solved on a cluster of workstations. The execution times of Jacobi iteration using our interface, virtual memory and PVFS are compared to characterize the performance for various problem sizes, and it is concluded that our new interface significantly increases the sizes of problems that can be efficiently solved. Jianqi Tang received her B.Sc. and M.Sc. from Harbin Institute of Technology in 1997 and 1999 respectively, both in computer application. Currently, she is a Ph.D. candidate at the Department of Computer Science and Engineering, Harbin Institute of Technology. She has participated in several national research projects. Her research interests include parallel computing, parallel I/O and grid computing. Binxing Fang received his M.Sc. in 1984 from Tsinghua University and his Ph.D. from Harbin Institute of Technology in 1989, both in computer science. From 1990 to 1993 he was a postdoctoral researcher at the National University of Defense Technology. Since 1984, he has been a faculty member at the Department of Computer Science and Engineering of Harbin Institute of Technology, where he is presently a Professor. He is a Member of the National Information Expert Consultant Group and a Standing Member of the Council of the Chinese Society of Communications.
His research efforts focus on parallel computing, computer networks and information security. Professor Fang has carried out over 30 projects for the state and ministries/provinces. Mingzeng Hu was born in 1935. He has been with the Department of Computer Science and Engineering at Harbin Institute of Technology since 1958, where he is currently a Professor. He was a visiting scholar at the Siemens Company, Germany from 1978 to 1979, a visiting associate professor at Chiba University, Japan from 1984 to 1985, and a visiting professor at York University, Canada from 1989 to 1995. He is the Director of the National Key Laboratory of Computer Information Content Security. He is also a Member of the 3rd Academic Degree Committee under the State Council of China. Professor Hu’s research interests include high performance computer architecture and parallel processing technology, fault tolerant computing, network systems, VL design, and computer system security technology. He has carried out many projects for the state and ministries/provinces and has won several Ministry Science and Technology Progress Awards. He has published over 100 papers in core journals at home and abroad, as well as one book. Professor Hu has supervised over 20 doctoral students. Hongli Zhang received her M.Sc. in computer system software in 1996 and her Ph.D. in computer architecture in 1999 from Harbin Institute of Technology. Currently, she is an Associate Professor at the Department of Computer Science and Engineering, Harbin Institute of Technology. Her research interests include computer network security and parallel computing.

14.
The ability to capture the state of a process and later recover that state in the form of an equivalent running process is the basis for a number of important features in parallel and distributed systems. Adaptive load sharing and fault tolerance are well-known examples. Traditional state capture mechanisms have employed an external agent (such as the operating system kernel) to examine and capture process state. However, the increasing prevalence of heterogeneous cluster and “metacomputing” systems as high-performance computing platforms has prompted investigation of process-internal state capture mechanisms. Perhaps the greatest advantage of the process-internal approach is the ability to support cross-platform state capture and recovery, an important feature in heterogeneous environments. Among the perceived disadvantages of existing process-internal mechanisms are poor performance in multiple respects, and difficulty of use in terms of programmer effort. In this paper we describe a new process-internal state capture and recovery mechanism: Process Introspection. Experiences with this system indicate that the perceived disadvantages associated with process-internal mechanisms can be largely overcome, making this approach to state capture an appropriate one for cluster and metacomputing environments. This revised version was published online in July 2006 with corrections to the Cover Date.

15.
Energy efficiency and high computing power are basic design considerations across modern-day computing solutions due to concerns such as system performance, operational cost, and environmental issues. Desktop Grid and Volunteer Computing Systems (DGVCS), so-called opportunistic infrastructures, offer computational power at low cost by harvesting the idle computing cycles of existing commodity computing resources. Besides allowing the end-user offer to be customized, virtualization is considered one of the key techniques for reducing energy consumption in large-scale systems, and it contributes to the scalability of the system. This paper presents an energy-efficient approach for opportunistic infrastructures based on task consolidation and customization of virtual machines. The experimental results with single desktops and complete computer rooms show that virtualization significantly improves the energy efficiency of opportunistic grids compared with dedicated computing systems without disturbing the end user.

16.
Optimizing power and energy consumption is an important concern in the design of modern-day and future computing and communication systems. Various techniques and high performance technologies have been investigated and developed for the efficient management of such systems. All these technologies should be able to provide good performance and to cope with increased workload demand in dynamic environments such as Computational Grids (CGs), clusters and clouds. In this paper we approach independent batch scheduling in CGs as a bi-objective minimization problem with makespan and energy consumption as the scheduling criteria. We use the Dynamic Voltage Scaling (DVS) methodology for scaling and possibly reducing the cumulative energy used by the system resources. We develop two implementations of a Hierarchical Genetic Strategy-based grid scheduler (Green-HGS-Sched) with elitist and struggle replacement mechanisms. The proposed algorithms were empirically evaluated against single-population Genetic Algorithms (GAs) and Island GA models for four CG size scenarios in static and dynamic modes. The simulation results show that the proposed scheduling methodologies fairly reduce the energy usage and can be easily adapted to dynamically changing grid states and various scheduling scenarios.

17.
In this paper we present the design and implementation of a Pluggable Fault-Tolerant CORBA Infrastructure that provides fault tolerance for CORBA applications by utilizing the pluggable protocols framework that most CORBA ORBs provide. Our approach does not require any modification to the CORBA ORB, and requires only minimal modification to the application. Moreover, it avoids the difficulty of retrieving and assigning the ORB state by embedding the fault tolerance mechanisms into the ORB. The Pluggable Fault-Tolerant CORBA Infrastructure exhibits similar or better performance than other Fault-Tolerant CORBA systems, while providing strong replica consistency.

18.
A fault detection service for wide area distributed computations   (Cited by: 6; self-citations: 0; citations by others: 6)
The potential for faults in distributed computing systems is a significant complicating factor for application developers. While a variety of techniques exist for detecting and correcting faults, the implementation of these techniques in a particular context can be difficult. Hence, we propose a fault detection service designed to be incorporated, in a modular fashion, into distributed computing systems, tools, or applications. This service uses well-known techniques based on unreliable fault detectors to detect and report component failure, while allowing the user to trade off timeliness of reporting against false positive rates. We describe the architecture of this service, report on experimental results that quantify its cost and accuracy, and describe its use in two applications, monitoring the status of system components of the GUSTO computational grid testbed and as part of the NetSolve network-enabled numerical solver. This revised version was published online in July 2006 with corrections to the Cover Date.
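
The trade-off between timeliness and false positives can be seen in a minimal heartbeat-based unreliable failure detector, sketched below. This is a generic illustration and not the architecture of the service described in the abstract; the class name `HeartbeatDetector` and its methods are hypothetical, and timestamps are passed in explicitly so the behaviour is deterministic.

```python
class HeartbeatDetector:
    """Suspects a component when no heartbeat arrived within `timeout`.

    A short timeout reports failures quickly but wrongly suspects slow
    components more often; a long timeout does the opposite.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # component name -> time of last heartbeat

    def heartbeat(self, component, now):
        """Record that `component` was alive at time `now`."""
        self.last_seen[component] = now

    def suspected(self, now):
        """Return the components whose last heartbeat is older than the timeout."""
        return sorted(c for c, t in self.last_seen.items()
                      if now - t > self.timeout)

if __name__ == "__main__":
    detector = HeartbeatDetector(timeout=3.0)
    detector.heartbeat("node-a", now=0.0)
    detector.heartbeat("node-b", now=2.0)
    print(detector.suspected(now=4.0))   # node-a has been silent for 4.0 > 3.0
    detector.heartbeat("node-a", now=4.5)
    print(detector.suspected(now=5.0))   # a late heartbeat clears the suspicion
```

The detector is "unreliable" in the sense that a suspicion may later be withdrawn, as happens for `node-a` in the usage example when its delayed heartbeat arrives.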

19.
The sensitivity analysis of a Cellular Genetic Algorithm (CGA) with local search is used to design a new and faster heuristic for the problem of mapping independent tasks to a distributed system (such as a computer cluster or grid) in order to minimize makespan (the time when the last task finishes). The proposed heuristic improves on the previously known Min-Min heuristic. Moreover, the heuristic finds mappings of similar quality to the original CGA but in a significantly reduced runtime (1,000 times faster). The proposed heuristic is evaluated across twelve different classes of scheduling instances. In addition, a proof of the energy-efficiency of the algorithm is provided. This convergence study suggests how additional energy reduction can be achieved by inserting low power computing nodes into the distributed computer system. Simulation results show that this approach reduces both energy consumption and makespan.
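
For reference, the baseline Min-Min heuristic mentioned above can be sketched as follows. This is the classic Min-Min rule, not the improved heuristic proposed in the abstract; the function name `min_min` and the small expected-time matrix in the usage example are hypothetical.

```python
def min_min(etc):
    """Map independent tasks to machines with the Min-Min heuristic.

    etc[t][m] is the expected time to compute task t on machine m.
    Returns (assignment, makespan), where assignment maps task -> machine.
    """
    n_machines = len(etc[0])
    ready = [0] * n_machines          # time at which each machine becomes free
    assignment = {}
    unscheduled = set(range(len(etc)))
    while unscheduled:
        # Among all unscheduled tasks, pick the (task, machine) pair with the
        # smallest completion time; this is the task whose minimum completion
        # time is itself minimal. Ties break on task index, then machine index.
        completion, task, machine = min(
            (ready[m] + etc[t][m], t, m)
            for t in unscheduled
            for m in range(n_machines)
        )
        assignment[task] = machine
        ready[machine] = completion
        unscheduled.remove(task)
    return assignment, max(ready)

if __name__ == "__main__":
    # Three tasks, two machines (illustrative values only).
    etc = [[4, 6],
           [3, 5],
           [8, 2]]
    mapping, makespan = min_min(etc)
    print(mapping, makespan)
```

Min-Min is greedy and runs in polynomial time, which is why it is a common baseline against which metaheuristics such as genetic algorithms are compared.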

20.
Placement of component service replicas for service-based applications (SBAs) in cloud environments has become increasingly important. An SBA is usually communication topology-aware, and component service replicas possess stronger data dependency than data replicas; therefore, there are huge amounts of communication between the computer nodes that are used to place component service replicas. Because the conventional methods do not consider the communication topology of component services and the relations between computer nodes, they are not appropriate for placing component service replicas. In this paper, we propose a topological matching-based component service replica placement method that takes into account not only the topology of SBAs but also the communication performance between different computing nodes. This method first discovers the communication topology of an SBA via multi-scale graph clustering and then acquires the topology of the computer nodes through spectral clustering. It then places the component service replicas by matching the two topological structures. Comprehensive experiments are conducted by comparing the performance of our method with those of other methods based on the CloudSim simulation software. The results show the effectiveness of our method for improving the performance of SBAs.
