Similar Documents
20 similar documents found
1.
2.
Large-scale clusters based on virtualization technologies are widely used in many areas, including data centers and cloud computing environments, but saving energy remains a major challenge in building a "green cluster." Previous work falls short of this challenge: local approaches save energy in the components of a single workstation without a global view of the whole cluster, while cluster-wide techniques apply only to homogeneous workstations and specific applications. This paper describes the design and implementation of a novel scheme, called Magnet, that uses live migration of virtual machines to transfer load among nodes on a multi-layer ring-based overlay. By treating all cluster nodes as a whole through virtualization, the scheme greatly reduces power consumption, and it applies to both homogeneous and heterogeneous servers. Experimental measurements show that the new method reduces power consumption by up to 74.8% over the baseline, with an adjustable and acceptable overhead. The effectiveness and performance are also verified analytically.
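The consolidation idea behind such schemes can be sketched as a bin-packing heuristic: pack VM loads onto as few nodes as possible so the remaining nodes can enter a low-power state. A minimal sketch, not Magnet's algorithm; the capacity model and load numbers are invented:

```python
def consolidate(vm_loads, node_capacity):
    """First-fit-decreasing packing: place each VM on the first active
    node with room, opening a new node only when none fits."""
    nodes = []
    for load in sorted(vm_loads, reverse=True):
        for node in nodes:
            if sum(node) + load <= node_capacity:
                node.append(load)
                break
        else:
            nodes.append([load])
    return nodes

# loads in percent of one node's capacity (invented numbers)
active = consolidate([30, 20, 60, 10, 40], node_capacity=100)
print(len(active))  # → 2 nodes stay powered on; the rest can sleep
```

Magnet additionally has to account for migration overhead and the ring-based overlay; this sketch shows only the packing objective.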

3.
The complexity and requirements of web applications are increasing to meet more sophisticated business models (web services and cloud computing, for instance). For this reason, characteristics such as performance, scalability and security are addressed in web server cluster design. Due to rising energy costs and environmental concerns, energy consumption in this type of system has become a main issue. This paper presents energy consumption reduction techniques that use a load forecasting method, combined with DVFS (Dynamic Voltage and Frequency Scaling) and dynamic configuration techniques (turning servers on and off), in a soft real-time web server clustered environment. Our system reduces energy consumption while maintaining users' satisfaction with respect to request deadlines being met. The results obtained show that prediction capabilities increase the QoS (Quality of Service) of the system while maintaining or improving the energy savings over state-of-the-art power management mechanisms. To validate this predictive policy, a web application running a real workload profile was deployed in an Apache server cluster testbed running Linux.
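The combination of load forecasting and DVFS can be illustrated with a toy controller: predict the next load from recent history, then pick the lowest frequency level that still covers it. This is a hedged sketch; the moving-average predictor and the frequency levels (fractions of peak) are invented stand-ins for the paper's forecasting method:

```python
def forecast(history, window=3):
    """Moving-average load forecast (a stand-in for the paper's predictor)."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def pick_frequency(predicted_load, levels=(0.6, 0.8, 1.0)):
    """Lowest DVFS level whose relative capacity covers the predicted load."""
    for f in levels:
        if predicted_load <= f:
            return f
    return levels[-1]

loads = [0.42, 0.55, 0.61, 0.58]   # recent utilization samples (invented)
print(pick_frequency(forecast(loads)))  # → 0.6
```

A real soft real-time controller would also check that the chosen frequency keeps request deadlines feasible before lowering it.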

4.
With the ever-increasing volume of dynamic and static web content, clusters have been widely used for large-scale web servers to improve system scalability. Dynamically switching cluster nodes between different power states is one effective approach to saving energy in such clusters, and many research efforts have gone into designing power-aware clusters using this method. However, booting a cluster node from a low-power state to an active state takes an amount of time that depends on the configuration, and this process incurs significant performance degradation; existing work normally trades a certain amount of performance degradation for energy saving. This paper proposes a hybrid method to predict the number of requests arriving per booting time in web workloads. A power-aware web cluster scheduler is designed to divide the cluster nodes into an active group and a low-power group. The scheduler attempts to minimize the active group and maximize the low-power group, and, by leveraging the prediction scheme, boots nodes in the low-power group in advance to minimize or eliminate performance degradation. Furthermore, this paper integrates power awareness into conventional load balancers including Least Connections, Deficit Round Robin, and Skew. Comprehensive experiments explore the potential to minimize or eliminate the performance degradation of the power-aware web cluster.
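The active/low-power group policy reduces to a capacity calculation: from the predicted request rate, compute how many servers are needed, and boot (or park) the difference ahead of time. A minimal sketch under invented numbers; the paper's hybrid predictor is replaced here by a given `predicted_rps`:

```python
import math

def servers_needed(predicted_rps, per_server_rps):
    """Servers required to serve the predicted request rate."""
    return math.ceil(predicted_rps / per_server_rps)

def plan_transitions(n_active, n_low_power, predicted_rps, per_server_rps):
    """Return (nodes to boot ahead of time, nodes to power down) —
    a hedged sketch of the active/low-power group scheduler."""
    need = servers_needed(predicted_rps, per_server_rps)
    if need > n_active:
        return min(need - n_active, n_low_power), 0
    return 0, n_active - need

print(plan_transitions(4, 6, 1300, 250))  # → (2, 0): boot 2 nodes before the spike
```

Because booting takes one prediction interval, the boot decision must be made against the forecast for the next interval, not the current load, which is exactly why the prediction scheme matters.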

5.
We report the results of an evaluation project on three Beowulf type clusters. The purpose of this study was to assess both the performance of the clusters and the availability and quality of the software for cluster management and management of the available resources. This last goal could hardly be achieved because at the time this project was undertaken much of the management software was either very immature or not yet available. However, it was possible to assess the cluster performance both from the point of view of single program execution as well as with respect to throughput by loading the systems according to a predefined schedule via the available batch systems. To this end a set of application programs, ranging from astronomy to quantum chemistry, together with a synthetic benchmark were employed. From the results we wanted to derive answers about the viability of using cluster systems routinely in a multi-user environment with comparable maintenance cost and effort to that of an integrated parallel machine.

6.
One of the fundamental issues in ensuring maximal performance improvement in a cluster computing environment is load distribution (LD), which is commonly achieved with polling-based load distribution algorithms. Such algorithms suffer from two weaknesses: (1) load information exchanged during a polling session is confined to the two negotiating nodes only; (2) they are not scalable, since growth of the distributed system is accompanied by an increasing number of polling sessions. In this paper, we propose an LD algorithm based on anti-tasks and load state vectors. Anti-tasks travel around the distributed system to pair up task senders and receivers. As an anti-task travels, timed load information is collected and disseminated over the entire system via the load state vector bundled with the anti-task. Guided by load state vectors, anti-tasks are spontaneously directed towards processing nodes with high transient workload, allowing their surplus workload to be relocated as soon as possible. No peer-to-peer negotiations between senders and receivers are needed. To reduce the network bandwidth consumed by the anti-task algorithm, the number of hosts that an anti-task needs to visit must be carefully limited. The algorithm achieves this by employing the mathematical notion of a finite projective plane (FPP). With FPP, the number of nodes that each anti-task has to visit is at most O(√N), where N is the number of nodes in the system, without sacrificing the spread of load information.
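The FPP property being exploited is that a projective plane of prime order n has n² + n + 1 points, each line contains only n + 1 of them, and any two lines still meet in exactly one point, so information spreads while each anti-task visits only about √N nodes. A sketch of the standard construction (not the paper's code; node naming is invented):

```python
def projective_plane(n):
    """Lines of a projective plane of prime order n.
    Points: ('a', x, y) affine, ('s', m) slope points, ('inf',).
    There are n**2 + n + 1 lines, each with n + 1 points."""
    lines = []
    for m in range(n):                       # lines with slope m
        for c in range(n):
            line = {('a', x, (m * x + c) % n) for x in range(n)}
            line.add(('s', m))
            lines.append(frozenset(line))
    for c in range(n):                       # vertical lines x = c
        lines.append(frozenset({('a', c, y) for y in range(n)} | {('inf',)}))
    # the line at infinity ties the slope points together
    lines.append(frozenset({('s', m) for m in range(n)} | {('inf',)}))
    return lines

lines = projective_plane(3)
points = set().union(*lines)
print(len(points), len(lines), len(lines[0]))  # → 13 13 4
```

Mapping cluster nodes to points and anti-task itineraries to lines gives each anti-task a route of n + 1 ≈ √N nodes that intersects every other route, which is what keeps the load state vectors globally consistent.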

7.
During the past decade, cluster computing and mobile communication technologies have been extensively deployed and widely applied because of their giant commercial value. Rapid technological advancement makes it feasible to integrate the two, and a revolutionary application called mobile cluster computing is arising on the horizon. Mobile cluster computing can further enhance the power of our laptops and mobile devices by running parallel applications. However, scheduling parallel applications on mobile clusters is technically challenging due to the significant communication latency and limited battery life of mobile devices. Therefore, shortening schedule length and conserving energy have become two major concerns in designing efficient and energy-aware scheduling algorithms for mobile clusters. In this paper, we propose two novel scheduling strategies aimed at balancing performance and power consumption for parallel applications running on mobile clusters. Our research focuses on scheduling precedence-constrained parallel tasks, so duplication heuristics are applied to minimize communication overheads. However, existing duplication algorithms are developed with consideration only of schedule lengths, completely ignoring the energy consumption of clusters. In this regard, we design two energy-aware duplication scheduling algorithms, called EADUS and TEBUS, to schedule precedence-constrained parallel tasks with a complexity of O(n^2), where n is the number of tasks in a parallel task set. Unlike the existing duplication-based scheduling algorithms that replicate all the possible predecessors of each task, the proposed algorithms judiciously replicate predecessors of a task only if the duplication helps conserve energy. Our energy-aware scheduling strategies are conducive to balancing schedule lengths and energy savings for a set of precedence-constrained parallel tasks.
We conducted extensive experiments using both synthetic benchmarks and real-world applications to compare our algorithms with two existing approaches. Experimental results based on simulated mobile clusters demonstrate the effectiveness and practicality of the proposed duplication-based scheduling strategies. For example, EADUS and TEBUS reduce energy consumption for the Gaussian Elimination application by averages of 16.08% and 8.1%, with merely 5.7% and 2.2% increases in schedule length, respectively.
Corresponding author: Xiao Qin.
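The core duplication decision the abstract describes can be reduced to a comparison: re-execute a predecessor locally only when that both shortens the wait for its result and costs less energy than receiving it over the network. A hedged sketch of that criterion (the invented cost parameters stand in for the papers' task-graph model):

```python
def duplication_gain(pred_exec_time, pred_energy, comm_time, comm_energy):
    """Time and energy saved by recomputing a predecessor locally
    instead of receiving its result over the (slow, costly) network."""
    return comm_time - pred_exec_time, comm_energy - pred_energy

def should_duplicate(pred_exec_time, pred_energy, comm_time, comm_energy):
    """Duplicate only when both the time gain and the energy gain are
    positive — the judicious rule, unlike duplicate-everything heuristics."""
    dt, de = duplication_gain(pred_exec_time, pred_energy, comm_time, comm_energy)
    return dt > 0 and de > 0

print(should_duplicate(2.0, 1.5, 5.0, 4.0))  # → True: recomputing is cheaper
print(should_duplicate(2.0, 6.0, 5.0, 4.0))  # → False: duplication wastes energy
```

EADUS and TEBUS apply such a test per predecessor inside an O(n^2) list-scheduling pass; the sketch shows only the per-edge decision.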

8.
Simulated annealing (SA) is a general-purpose optimization technique widely used in various combinatorial optimization problems. Its main drawback is the long computation time required to obtain a good-quality solution. Clusters have emerged as a feasible and popular platform for parallel computing in many applications, and the computing nodes on many of today's clusters are temporally heterogeneous. In this study, multiple Markov chain (MMC) parallel simulated annealing (PSA) algorithms have been implemented on a temporally heterogeneous cluster of workstations to solve the graph partitioning problem, and their performance has been analyzed in detail. The temporal heterogeneity of the cluster is harnessed by employing static and dynamic load balancing techniques to further improve the efficiency and scalability of the MMC PSA algorithms.
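One chain of such an MMC PSA solver can be sketched as plain SA over balanced two-way partitions: propose a swap across the cut, accept worsening moves with Boltzmann probability, and cool the temperature. This is a toy single-chain sketch, not the paper's implementation; MMC PSA runs several such chains in parallel and periodically exchanges solutions:

```python
import math, random

def cut_size(edges, part):
    """Number of edges crossing the two-way partition."""
    return sum(1 for u, v in edges if part[u] != part[v])

def anneal(n_nodes, edges, t0=2.0, cooling=0.95, steps=300, seed=1):
    """One Markov chain of SA for balanced 2-way graph partitioning."""
    rng = random.Random(seed)
    part = [i % 2 for i in range(n_nodes)]       # balanced start
    cur, t = cut_size(edges, part), t0
    for _ in range(steps):
        i = rng.choice([k for k in range(n_nodes) if part[k] == 0])
        j = rng.choice([k for k in range(n_nodes) if part[k] == 1])
        part[i], part[j] = 1, 0                  # swap keeps sides balanced
        new = cut_size(edges, part)
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new                            # accept the move
        else:
            part[i], part[j] = 0, 1              # reject: undo it
        t *= cooling
    return cur, part

# two triangles joined by a single edge: the optimal balanced cut is 1
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cut, part = anneal(6, edges)
print(cut)
```

On a temporally heterogeneous cluster, the load-balancing question is how many annealing steps to assign each chain per exchange interval, which the paper addresses with static and dynamic schemes.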

9.
A cluster is a group of computers that acts as a single system to provide users with computing resources; each computer is a node of the cluster. With the rapid development of computer technology, cluster computing, with its high performance-cost ratio, has been widely applied in distributed parallel computing. For the large-scale closed data of a group enterprise, a heterogeneous data integration model was built in a cluster environment based on cluster computing, XML technology and ontology theory. The model provides users with unified and transparent access interfaces. Built on cluster computing, this work solves the heterogeneous data integration problem by means of ontology and XML technology, and achieves good application results compared with the traditional data integration model. It was also shown that the model improves the computing capacity of the system with a high performance-cost ratio, and it is hoped to support the decision-making of enterprise managers.

10.
In this paper, an autonomic performance management approach is introduced that can be applied to a general class of web services deployed in large-scale distributed environments. The proposed approach adapts traditional large-scale control-based algorithms, using the interaction-balance approach, to the web service environment for managing response time and system-level power consumption. It is developed in a generic fashion that makes it suitable for web service deployments in which performance can be adjusted through a finite set of control inputs. The approach maintains service level agreements, maximizes revenue, and minimizes infrastructure operating cost. Additionally, it is fault-tolerant with respect to failures of computing nodes inside the distributed deployment, and its computational overhead can be managed through appropriate choices of configuration parameters at deployment time.

11.
As GPUs, ARM CPUs and even FPGAs come into wide use in modern computing, data centers are gradually developing into heterogeneous clusters. However, many well-known programming models such as MapReduce are designed for homogeneous clusters and perform poorly in heterogeneous environments. In this paper, we reconsider the problem and make four contributions: (1) we analyse the causes of MapReduce's poor performance in heterogeneous clusters, the most important being unreasonable task allocation between nodes with different computing abilities; (2) based on this, we propose MrHeter, which separates the MapReduce process into a map-shuffle stage and a reduce stage, constructs a separate optimization model for each, and derives different task allocations \(ml_{ij}, mr_{ij}, r_{ij}\) for heterogeneous nodes based on computing ability; (3) to make the approach suitable for dynamic execution, we propose D-MrHeter, which adds a monitoring and feedback mechanism; (4) finally, we show that MrHeter and D-MrHeter decrease the total execution time of MapReduce by 30-70% in heterogeneous clusters compared with the original Hadoop, with especially good performance under heavy workloads and large differences in node computing ability.
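The core of ability-aware allocation is giving each node a task share proportional to its measured speed. A hedged toy sketch, far simpler than MrHeter's per-stage optimization model (the speed numbers are invented):

```python
def allocate(total_tasks, speeds):
    """Split tasks across heterogeneous nodes in proportion to their
    measured computing ability, handing remainders to the fastest nodes."""
    total_speed = sum(speeds)
    shares = [total_tasks * s // total_speed for s in speeds]
    rest = total_tasks - sum(shares)
    for i in sorted(range(len(speeds)), key=lambda i: -speeds[i])[:rest]:
        shares[i] += 1
    return shares

# 100 map tasks over nodes that are 4x, 2x and 1x a baseline machine
print(allocate(100, [4, 2, 1]))  # → [58, 28, 14]
```

With such shares, all nodes finish their portion at roughly the same time, removing the stragglers that homogeneous round-robin allocation creates; D-MrHeter's feedback loop would re-measure `speeds` at runtime.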

12.
Several MPI systems for Grid environments, in which clusters are connected by wide-area networks, have been proposed. However, the collective communication algorithms in such MPI systems assume relatively low-bandwidth wide-area networks, and they are not designed for the fast wide-area networks that are becoming available. On the other hand, for cluster MPI systems, the bcast algorithm of van de Geijn et al. and the allreduce algorithm of Rabenseifner have been proposed, which are efficient in high-bisection-bandwidth environments. We modify those algorithms to effectively utilize fast wide-area inter-cluster networks and to control the number of nodes that may transfer data simultaneously through the wide-area network, avoiding congestion. We confirmed the effectiveness of the modified algorithms by experiments in a 10 Gbps emulated WAN environment consisting of two clusters, where each cluster consists of nodes with 1 Gbps Ethernet links and a switch with a 10 Gbps uplink; the two clusters are connected through a 10 Gbps WAN emulator that can insert latency. In a 10-millisecond-latency environment with a 32 MB message size, the proposed bcast and allreduce are 1.6 and 3.2 times faster, respectively, than the algorithms used in existing MPI systems for Grid environments.
Motohiko Matsuda
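The van de Geijn broadcast the abstract builds on replaces one large root-to-all send with a scatter followed by an allgather, so the root transmits each byte only once. A pure-Python simulation of that message pattern (a hedged sketch of the idea, not MPI code; the round-robin chunking is an invented simplification):

```python
def bcast_scatter_allgather(data, n_procs):
    """Simulate the two phases of the van de Geijn broadcast."""
    # phase 1: scatter — process i receives the i-th round-robin chunk
    chunks = [data[i::n_procs] for i in range(n_procs)]
    # phase 2: allgather — every process ends up holding all chunks
    gathered = [list(chunks) for _ in range(n_procs)]

    def reassemble(parts):
        """Each process rebuilds the original buffer from its chunks."""
        out = [None] * len(data)
        for i, part in enumerate(parts):
            out[i::n_procs] = part
        return out

    return [reassemble(g) for g in gathered]

buffers = bcast_scatter_allgather(list(range(10)), 4)
print(all(buf == list(range(10)) for buf in buffers))  # → True
```

The paper's modification constrains how many processes run the allgather phase across the WAN link at once, which this local simulation does not model.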

13.
BeoBLAST is an integrated software package that handles user requests and distributes BLAST and PSI-BLAST searches to nodes of a Beowulf cluster, thus providing a simple way to implement a scalable BLAST system on top of relatively inexpensive computer clusters. Additionally, BeoBLAST offers a number of novel search features through its web interface, including the ability to perform simultaneous searches of multiple databases with multiple queries, and the ability to start a search using the PSSM generated from a previous PSI-BLAST search on a different database. The underlying system can also handle automated querying for high throughput work. AVAILABILITY: Source code is available under the GNU public license at http://bioinformatics.fccc.edu/

14.
The single factor limiting the harnessing of the enormous computing power of clusters for parallel computing is the lack of appropriate software. Present cluster operating systems are not built to support parallel computing: they do not provide services to manage parallelism. The cluster operating environments used to assist the execution of parallel applications do not support both the Message Passing (MP) and Distributed Shared Memory (DSM) paradigms; these are offered only as separate components implemented at the user level as libraries and independent servers. Because of these poor operating systems, users must deal with the individual computers of a cluster rather than seeing the cluster as a single powerful computer; no Single System Image of the cluster is offered to users. There is a need for an operating system for clusters. We claim and demonstrate that it is possible to develop a cluster operating system that can efficiently manage parallelism, support Message Passing and DSM, and offer a Single System Image. To substantiate the claim, the first version of a cluster operating system, called GENESIS, that manages parallelism and offers a Single System Image has been developed.

15.
Mainstream computing equipment and the advent of affordable multi-gigabit communication technology permit us to address data acquisition and processing problems with clusters of COTS machinery. Such networks typically contain heterogeneous platforms, real-time partitions and even custom devices. Vital overall system requirements are high efficiency and flexibility, and in preceding projects we experienced the difficulty of meeting both requirements at once. Intelligent I/O (I2O) is an industry specification that defines a uniform messaging format and execution environment for hardware- and operating-system-independent device drivers in systems with processor-based communication equipment. Mapping this concept to a distributed computing environment and encapsulating the details of the specification in an application-programming framework allow us to provide architectural support for (i) efficient and (ii) extensible cluster operation. This paper portrays our view of applying I2O to high-performance clusters. We demonstrate the feasibility of this approach and report on the efficiency of our XDAQ software framework for distributed data acquisition systems.

16.
We describe a system for creating personal clusters in user-space to support the submission and management of thousands of compute-intensive serial jobs to the network-connected compute resources on the NSF TeraGrid. The system implements a robust infrastructure that submits and manages job proxies across a distributed computing environment. These job proxies contribute resources to personal clusters created dynamically for a user on-demand. The personal clusters then adapt to the prevailing job load conditions at the distributed sites by migrating job proxies to sites expected to provide resources more quickly. Furthermore, the system allows multiple instances of these personal clusters to be created as containers for individual scientific experiments, allowing the submission environment to be customized for each instance. The version of the system described in this paper allows users to build large personal Condor and Sun Grid Engine clusters on the TeraGrid. Users then manage their scientific jobs, within each personal cluster, with a single uniform interface using the feature-rich functionality found in these job management environments.
Evan L. Turner

17.
Chang Luyao, Li Fan, Niu Xinzheng, Zhu Jiahui. Cluster Computing, 2022, 25(4): 3005-3017.

To collect data efficiently while balancing energy consumption, wireless sensor networks (WSNs) need to be divided into clusters. The division into clusters gives the network a hierarchical organizational structure, which balances the network load and prolongs the life cycle of the system. In clustering routing protocols, the quality of the clustering algorithm directly affects the resulting cluster division. In this paper, an algorithm that selects cluster heads based on node distribution density and then allocates the remaining nodes is proposed, addressing the defects of random cluster-head election and uneven clustering in the traditional LEACH protocol's clustering algorithm for WSNs. Experiments show that the algorithm achieves rapid selection of cluster heads and division of clusters, is effective for node clustering, and is conducive to equalizing energy consumption.

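Density-based cluster-head selection can be sketched in a few lines: rank sensor nodes by how many neighbours fall within a radius, take the densest k as heads, and attach every remaining node to its nearest head. A hedged illustration of the idea in the abstract, not the paper's algorithm; positions, radius and k are invented:

```python
def select_cluster_heads(positions, radius, k):
    """Choose k cluster heads by local node density, then assign the
    remaining nodes to their nearest head."""
    def density(i):
        xi, yi = positions[i]
        return sum(1 for x, y in positions
                   if (x - xi) ** 2 + (y - yi) ** 2 <= radius ** 2) - 1
    heads = sorted(range(len(positions)), key=density, reverse=True)[:k]

    def nearest_head(i):
        xi, yi = positions[i]
        return min(heads, key=lambda h: (positions[h][0] - xi) ** 2
                                        + (positions[h][1] - yi) ** 2)
    members = {i: nearest_head(i) for i in range(len(positions))
               if i not in heads}
    return heads, members

# three dense nodes near the origin, two near (10, 10) — invented layout
heads, members = select_cluster_heads(
    [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)], radius=2, k=2)
print(heads, members)
```

Unlike LEACH's random rotation, this deterministic pass avoids electing heads in sparse regions; a full protocol would still rotate headship to spread the energy cost.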

18.
MPI collective communication operations that distribute or gather data are used in many parallel applications in scientific computing, but they may lead to scalability problems since their execution times increase with the number of participating processors. In this article, we show how the execution time of collective communication operations can be improved significantly by an internal restructuring based on orthogonal processor structures with two or more levels. The execution times of operations like MPI_Bcast() or MPI_Allgather() can be reduced by 40% and 70% on a dual-Xeon cluster and a Beowulf cluster with single-processor nodes, respectively. On a Cray T3E, too, a significant performance improvement can be obtained by a careful selection of the processor structure. These optimized communication operations can reduce the execution time of data-parallel implementations of complex application programs significantly without requiring any other change to the computation and communication structure. We present runtime functions for modeling two-phase realizations and verify that these functions can predict the execution time both for communication operations in isolation and in the context of application programs.
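A two-level orthogonal realization of a broadcast splits the processors into groups: the root first sends to one leader per group, then each leader forwards within its group, replacing one long chain of sends with two short phases. A hedged sketch of the resulting message pattern (assuming the root is a group leader; this is an illustration of the structure, not the article's MPI internals):

```python
def two_phase_bcast(n, group_size, root=0):
    """Point-to-point sends of a two-level broadcast:
    root → one leader per group, then each leader → its group members."""
    sends = []
    leaders = list(range(0, n, group_size))
    for l in leaders:
        if l != root:
            sends.append((root, l))          # inter-group phase
    for l in leaders:
        for p in range(l + 1, min(l + group_size, n)):
            sends.append((l, p))             # intra-group phase
    return sends

print(two_phase_bcast(8, 4))  # → [(0, 4), (0, 1), (0, 2), (0, 3), (4, 5), (4, 6), (4, 7)]
```

With g groups of size n/g, the root sends g - 1 messages instead of n - 1, and the two phases can overlap across groups, which is where the measured 40-70% reductions come from.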

19.
In this paper, we report on our "Iridis-Pi" cluster, which consists of 64 Raspberry Pi Model B nodes, each equipped with a 700 MHz ARM processor, 256 MB of RAM and a 16 GiB SD card for local storage. The cluster has a number of advantages not shared with conventional data-centre-based clusters, including its low total power consumption, easy portability due to its small size and weight, affordability, and passive ambient cooling. We propose that these attributes make Iridis-Pi ideally suited to educational applications, where it provides a low-cost starting point to inspire and enable students to understand and apply high-performance computing and data handling to tackle complex engineering and scientific challenges. We present the results of benchmarking both the computational power and the network performance of Iridis-Pi, and we argue that such systems should also be considered in some specialist application areas where these unique attributes may prove advantageous. We believe that the choice of an ARM CPU foreshadows a trend towards the increasing adoption of low-power, non-PC-compatible architectures in high-performance clusters.

20.
Cloud computing and cluster computing are user-centric computing services in which shared software and hardware resources and information are provided to computers and other equipment according to users' demands, with a majority of services deployed through outsourcing. Outsourced computation allows resource-constrained clients to offload their complex computation workloads to a powerful server rich in computation resources. Modular exponentiation is one of the most expensive computations in public-key cryptographic schemes, so outsourcing it can usefully reduce the clients' computation cost. In this paper, we propose a novel outsourcing algorithm for modular exponentiation based on a new mathematical division, under the setting of two non-colluding cloud servers. The base and the exponent of the outsourced data can be kept private, and efficiency is improved compared with prior work.
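The two-server setting can be illustrated with a much simpler blinding trick than the paper's scheme: multiply the base by a random factor so that each server sees only a blinded value, then divide the results. This toy sketch hides only the base (the paper's construction also protects the exponent); the prime modulus and inputs are invented:

```python
import secrets

def server_modexp(base, exp, p):
    """An untrusted server: it just computes a modular exponentiation."""
    return pow(base, exp, p)

def outsourced_modexp(u, a, p):
    """Compute u^a mod p, hiding the base u from each of two
    non-colluding servers. NOTE: the exponent a is still revealed."""
    r = secrets.randbelow(p - 2) + 1           # random blinding factor in [1, p-2]
    y1 = server_modexp((u * r) % p, a, p)      # server 1 sees only u*r mod p
    y2 = server_modexp(r, a, p)                # server 2 sees only r
    return (y1 * pow(y2, -1, p)) % p           # unblind: (ur)^a / r^a = u^a

p = (1 << 61) - 1                              # the Mersenne prime 2^61 - 1
print(outsourced_modexp(12345, 67, p) == pow(12345, 67, p))  # → True
```

Because the two servers must not collude (together they could recover u from u·r and r), this setting matches the paper's assumption; a client using it performs only multiplications and one inversion instead of a full exponentiation.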
