Found 20 similar documents; search time: 15 ms
2.
Large-scale clusters based on virtualization technologies have been widely used in many areas, including data centers and cloud computing environments. However, saving energy has recently become a major challenge in building a "green cluster". Previous research cannot meet this challenge: local approaches focus on saving energy in the components of a single workstation without a global view of the whole cluster, while cluster-wide energy-saving techniques can only be applied to homogeneous workstations and specific applications. This paper describes the design and implementation of a novel scheme, called Magnet, that uses live migration of virtual machines to transfer load among the nodes on a multi-layer ring-based overlay. By treating all the cluster nodes as a whole on top of virtualization technologies, the scheme can greatly reduce power consumption, and it can be applied to both homogeneous and heterogeneous servers. Experimental measurements show that the new method can reduce power consumption by up to 74.8% over the baseline with an adjustable, acceptable overhead. The effectiveness and performance insights are also analytically verified.
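The abstract does not detail Magnet's migration policy, but the core idea of treating all nodes as a whole to save power can be illustrated by consolidating VM loads onto as few nodes as possible and leaving the rest free to power down. A minimal sketch using greedy first-fit-decreasing packing (the multi-layer ring overlay and the live-migration mechanics are not modeled):

```python
def consolidate(vm_loads, capacity):
    """Pack VM loads onto as few nodes as possible (first-fit decreasing);
    nodes left empty can be switched to a low-power state."""
    free = []  # remaining capacity of each powered-on node
    for load in sorted(vm_loads, reverse=True):
        for i, f in enumerate(free):
            if load <= f:
                free[i] -= load  # "migrate" the VM onto an existing node
                break
        else:
            free.append(capacity - load)  # must power on a new node
    return len(free)
```

With integer loads summing to twice the node capacity, the greedy packing powers on only two nodes instead of seven.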
3.
The complexity and requirements of web applications are increasing in order to meet more sophisticated business models (web
services and cloud computing, for instance). For this reason, characteristics such as performance, scalability and security
are addressed in web server cluster design. Due to the rising energy costs and also to environmental concerns, energy consumption
in this type of system has become a main issue. This paper shows energy consumption reduction techniques that use a load forecasting
method, combined with DVFS (Dynamic Voltage and Frequency Scaling) and dynamic configuration techniques (turning servers on
and off), in a soft real-time web server clustered environment. Our system reduces energy consumption while maintaining users' satisfaction with respect to request deadlines being met. The results obtained show that prediction capabilities increase the QoS (Quality of Service) of the system while maintaining or improving the energy savings over state-of-the-art power management mechanisms. To validate this predictive policy, a web application running a real workload profile was deployed in an Apache server cluster testbed running Linux.
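As a rough illustration of combining load forecasting with dynamic configuration and DVFS, the sketch below smooths a request-rate history, sizes the active server set, and then picks the lowest frequency level whose scaled capacity still covers the predicted per-server load. The smoothing constant, server capacity, and frequency levels are illustrative values, not the paper's:

```python
import math

def forecast_and_provision(history, alpha, cap_per_server, freq_levels):
    """Exponentially smooth the request-rate history, then choose how many
    servers to keep on and the lowest DVFS level that still meets the load."""
    pred = history[0]
    for x in history[1:]:
        pred = alpha * x + (1 - alpha) * pred  # simple exponential smoothing
    servers = max(1, math.ceil(pred / cap_per_server))  # turn the rest off
    per_server_load = pred / servers
    # lowest frequency (fraction of max) whose scaled capacity covers the load
    freq = next(f for f in sorted(freq_levels) if f * cap_per_server >= per_server_load)
    return pred, servers, freq
```

Because the server count is rounded up, the highest frequency level always suffices, so the search cannot fail.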
4.
With the ever-increasing volume of dynamic and static web content, clusters have been widely used for large-scale web servers to improve system scalability. Dynamically switching cluster nodes between different power states is an effective approach to saving energy in such clusters, and many research efforts have designed power-aware clusters using this method. However, booting a cluster node from a low-power state to an active state takes an amount of time that depends on the configuration, and this process incurs significant performance degradation; existing work normally trades a certain amount of performance degradation for energy savings. This paper proposes a hybrid method to predict the number of requests per booting time of the web workloads. A power-aware web cluster scheduler is designed to divide the cluster nodes into an active group and a low-power group. The scheduler attempts to minimize the active group and maximize the low-power group, and, by leveraging the prediction scheme, boots the cluster nodes in the low-power group in advance to minimize or eliminate performance degradation. Furthermore, this paper integrates power awareness into conventional load balancers, including Least Connections, Deficit Round Robin, and Skew. Comprehensive experiments are performed to explore the potential opportunities to minimize or eliminate the performance degradation of the power-aware web cluster.
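The boot-ahead idea can be sketched in a few lines: predict the requests arriving over the next boot window, then start booting exactly as many low-power nodes as the shortfall requires. The weighted last-value/mean blend below stands in for the paper's hybrid predictor, whose exact form the abstract does not give:

```python
import math

def hybrid_predict(history, w=0.5):
    """Blend the last observation with the window mean (a stand-in for the
    paper's hybrid predictor; w is illustrative)."""
    return w * history[-1] + (1 - w) * sum(history) / len(history)

def nodes_to_boot(predicted_requests, per_node_capacity, active_nodes):
    """How many low-power nodes must start booting *now* so that capacity
    matches the predicted load one boot-time from now."""
    needed = math.ceil(predicted_requests / per_node_capacity)
    return max(0, needed - active_nodes)
```

Booting in advance this way is what lets the scheduler shrink the active group without paying the boot latency on the request path.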
5.
We report the results of an evaluation project on three Beowulf type clusters. The purpose of this study was to assess both the performance of the clusters and the availability and quality of the software for cluster management and management of the available resources. This last goal could hardly be achieved because at the time this project was undertaken much of the management software was either very immature or not yet available. However, it was possible to assess the cluster performance both from the point of view of single program execution as well as with respect to throughput by loading the systems according to a predefined schedule via the available batch systems. To this end a set of application programs, ranging from astronomy to quantum chemistry, together with a synthetic benchmark were employed. From the results we wanted to derive answers about the viability of using cluster systems routinely in a multi-user environment with comparable maintenance cost and effort to that of an integrated parallel machine.
6.
One of the fundamental issues in ensuring maximal performance improvement in a cluster computing environment is load distribution, which is commonly achieved by using polling-based load distribution algorithms. Such algorithms suffer from two weaknesses: (1) load information exchanged during a polling session is confined to the two negotiating nodes only; (2) such algorithms are not scalable, in that growth of the distributed system is accompanied by an increasing number of polling sessions.

In this paper, we propose a load distribution (LD) algorithm based on anti-tasks and load state vectors. Anti-tasks travel around the distributed system to pair up task senders and receivers. As an anti-task travels, timed load information is collected and disseminated over the entire system via the load state vector bundled with the anti-task. Guided by load state vectors, anti-tasks are spontaneously directed towards processing nodes having high transient workload, allowing their surplus workload to be relocated as soon as possible. No peer-to-peer negotiations between senders and receivers are needed.

To reduce the network bandwidth consumed by the anti-task algorithm, the number of hosts that an anti-task needs to visit must be carefully limited. The algorithm achieves this by employing the mathematical notion of a Finite Projective Plane (FPP). By employing an FPP, the number of nodes that each anti-task has to visit is at most O(√N), where N is the number of nodes in the system, without sacrificing the spread of load information.
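The FPP properties the algorithm relies on can be checked constructively: a projective plane of prime order q has N = q² + q + 1 points and as many lines, every line contains q + 1 ≈ √N points, and any two lines meet in exactly one point, so load information spreads while each anti-task visits only O(√N) nodes. A small sketch for prime q (point/line labels are this sketch's own encoding):

```python
def projective_plane(q):
    """Finite projective plane of prime order q: q*q + q + 1 points and lines,
    q + 1 points per line, any two lines meeting in exactly one point."""
    lines = []
    for m in range(q):                      # lines y = m*x + b, plus slope point (m)
        for b in range(q):
            lines.append(frozenset([("pt", x, (m * x + b) % q) for x in range(q)]
                                   + [("slope", m)]))
    for a in range(q):                      # vertical lines x = a, plus point at infinity
        lines.append(frozenset([("pt", a, y) for y in range(q)] + [("inf",)]))
    lines.append(frozenset([("slope", m) for m in range(q)] + [("inf",)]))  # line at infinity
    return lines
```

Mapping cluster nodes to points and anti-task routes to lines, each route touches q + 1 of the q² + q + 1 nodes, yet any two routes still share a node through which load information can flow.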
7.
During the past decade, cluster computing and mobile communication technologies have been extensively deployed and widely
applied because of their giant commercial value. The rapid technological advancement makes it feasible to integrate these
two technologies and a revolutionary application called mobile cluster computing is arising on the horizon. Mobile cluster
computing technology can further enhance the power of our laptops and mobile devices by running parallel applications. However,
scheduling parallel applications on mobile clusters is technically challenging due to the significant communication latency
and limited battery life of mobile devices. Therefore, shortening schedule length and conserving energy have become two major concerns in designing efficient and energy-aware scheduling algorithms for mobile clusters. In this paper, we propose two novel scheduling strategies aimed at balancing performance and power consumption for parallel applications running on
mobile clusters. Our research focuses on scheduling precedence constrained parallel tasks and thus duplication heuristics
are applied to schedule parallel tasks to minimize communication overheads. However, existing duplication algorithms are developed
with consideration of schedule lengths, completely ignoring energy consumption of clusters. In this regard, we design two
energy-aware duplication scheduling algorithms, called EADUS and TEBUS, to schedule precedence constrained parallel tasks
with a complexity of O(n²), where n is the number of tasks in a parallel task set. Unlike the existing duplication-based scheduling algorithms that replicate
all the possible predecessors of each task, the proposed algorithms judiciously replicate predecessors of a task if the duplication
can help in conserving energy. Our energy-aware scheduling strategies are conducive to balancing scheduling lengths and energy
savings of a set of precedence constrained parallel tasks. We conducted extensive experiments using both synthetic benchmarks
and real-world applications to compare our algorithms with two existing approaches. Experimental results based on simulated
mobile clusters demonstrate the effectiveness and practicality of the proposed duplication-based scheduling strategies. For
example, EADUS and TEBUS can reduce energy consumption for the Gaussian Elimination application by averages of 16.08% and 8.1%, with merely 5.7% and 2.2% increases in schedule length, respectively.
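The judicious-duplication rule can be reduced to a single energy comparison: re-execute a predecessor on the consumer's processor only when the duplicate's execution energy is below the energy of the inter-node message it replaces. A sketch of that decision (the power and time parameters are illustrative; EADUS/TEBUS also weigh schedule length):

```python
def should_duplicate(comm_time, link_power, dup_exec_time, cpu_power):
    """Replicate a predecessor on the consuming processor only if re-executing
    it locally costs less energy than the message it replaces."""
    energy_comm = comm_time * link_power    # energy of sending the result
    energy_dup = dup_exec_time * cpu_power  # energy of the duplicate execution
    return energy_dup < energy_comm
```

Applied to every predecessor of every task, this check is what replaces the blanket replication of earlier duplication heuristics.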
8.
Simulated annealing (SA) is a general-purpose optimization technique widely used in various combinatorial optimization problems.
However, the main drawback of this technique is a long computation time required to obtain a good quality of solution. Clusters
have emerged as a feasible and popular platform for parallel computing in many applications. Computing nodes on many of the
clusters available today are temporally heterogeneous. In this study, multiple Markov chain (MMC) parallel simulated annealing (PSA) algorithms have been implemented
on a temporally heterogeneous cluster of workstations to solve the graph partitioning problem and their performance has been
analyzed in detail. Temporal heterogeneity of a cluster of workstations is harnessed by employing static and dynamic load
balancing techniques to further improve efficiency and scalability of the MMC PSA algorithms.
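In the MMC scheme, each workstation runs its own annealing chain and the chains periodically exchange their best solutions; one such chain for balanced graph bipartition can be sketched as below (cooling schedule and move set are illustrative choices, not the paper's):

```python
import math
import random

def sa_bipartition(n, edges, steps=2000, t0=2.0, cooling=0.995, seed=1):
    """One Markov chain of an MMC PSA scheme: balanced graph bipartition,
    where a move swaps one vertex from each side of the cut."""
    rng = random.Random(seed)
    part = [i % 2 for i in range(n)]                      # balanced start
    cut = lambda p: sum(p[u] != p[v] for u, v in edges)   # edges crossing the cut
    cost = best = cut(part)
    best_part = part[:]
    t = t0
    for _ in range(steps):
        a = rng.choice([i for i in range(n) if part[i] == 0])
        b = rng.choice([i for i in range(n) if part[i] == 1])
        part[a], part[b] = 1, 0                           # propose a swap
        new = cut(part)
        if new <= cost or rng.random() < math.exp((cost - new) / t):
            cost = new                                    # accept (possibly uphill)
            if cost < best:
                best, best_part = cost, part[:]
        else:
            part[a], part[b] = 0, 1                       # reject: undo the swap
        t *= cooling
    return best, best_part
```

Because moves always swap one vertex per side, the partition stays balanced, and tracking the best-so-far solution guarantees the result is never worse than the initial cut.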
9.
A cluster is a group of computers that acts as a single system to provide users with computing resources; each computer is a node of the cluster. With the rapid development of computer technology, cluster computing, with its high performance-cost ratio, has been widely applied in distributed parallel computing. For the large-scale closed data of a group enterprise, a heterogeneous data integration model was built under a cluster environment based on cluster computing, XML technology and ontology theory. The model provides users with unified and transparent access interfaces. Based on cluster computing, this work solves the heterogeneous data integration problem by means of ontology and XML technology, and achieves good application results compared with the traditional data integration model. It was further shown that the model improves the computing capacity of the system with a high performance-cost ratio, and it is hoped to support the decision-making of enterprise managers.
10.
In this paper, an autonomic performance management approach is introduced that can be applied to a general class of web services deployed in large-scale distributed environments. The proposed approach utilizes traditional large-scale control-based algorithms, using an interaction-balance approach in the web service environment to manage response time and system-level power consumption. The approach is developed in a generic fashion that makes it suitable for web service deployments where performance can be adjusted using a finite set of control inputs. It maintains service level agreements, maximizes revenue, and minimizes infrastructure operating cost. Additionally, the proposed approach is fault-tolerant with respect to failures of the computing nodes inside the distributed deployment, and its computational overhead can be managed by choosing appropriate values of its configuration parameters during deployment.
11.
As GPUs, ARM CPUs and even FPGAs are widely used in modern computing, data centers are gradually developing towards heterogeneous clusters. However, many well-known programming models such as MapReduce are designed for homogeneous clusters and perform poorly in heterogeneous environments. In this paper, we reconsider the problem and make four contributions: (1) We analyse the causes of MapReduce's poor performance in heterogeneous clusters, the most important being unreasonable task allocation between nodes with different computing abilities. (2) Based on this, we propose MrHeter, which separates MapReduce processing into a map-shuffle stage and a reduce stage, constructs a separate optimization model for each, and derives different task allocations \(ml_{ij}, mr_{ij}, r_{ij}\) for heterogeneous nodes based on computing ability. (3) To make it suitable for dynamic execution, we propose D-MrHeter, which includes a monitoring and feedback mechanism. (4) Finally, we show that MrHeter and D-MrHeter can decrease the total execution time of MapReduce by 30% to 70% in a heterogeneous cluster compared with the original Hadoop, performing especially well under heavy workloads and large differences in node computing ability.
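The intuition behind capacity-aware allocation can be shown without the optimization model: give each node a task share proportional to its measured computing ability, with largest-remainder rounding so every task is assigned. MrHeter solves per-stage optimization problems instead, so this is only the proportional idea:

```python
def allocate_tasks(total_tasks, capacities):
    """Split tasks across heterogeneous nodes in proportion to computing
    ability, using largest-remainder rounding."""
    total_cap = sum(capacities)
    raw = [total_tasks * c / total_cap for c in capacities]
    alloc = [int(r) for r in raw]
    # hand the leftover tasks to the largest fractional remainders
    leftovers = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in leftovers[: total_tasks - sum(alloc)]:
        alloc[i] += 1
    return alloc
```

A node three times as fast receives roughly three times the tasks, which is exactly the imbalance a homogeneous scheduler fails to produce.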
12.
Several MPI systems for Grid environments, in which clusters are connected by wide-area networks, have been proposed. However, the collective communication algorithms in such MPI systems assume relatively low-bandwidth wide-area networks, and they are not designed for the fast wide-area networks that are becoming available. On the other hand, for cluster MPI systems, a bcast algorithm by van de Geijn et al. and an allreduce algorithm by Rabenseifner have been proposed, which are efficient in high-bisection-bandwidth environments. We modify these algorithms to effectively utilize fast wide-area inter-cluster networks and to control the number of nodes that can transfer data simultaneously through the wide-area network, so as to avoid congestion. We confirmed the effectiveness of the modified algorithms by experiments using a 10 Gbps emulated WAN environment consisting of two clusters, where each cluster consists of nodes with 1 Gbps Ethernet links and a switch with a 10 Gbps uplink; the two clusters are connected through a 10 Gbps WAN emulator that can insert latency. In a 10 millisecond latency environment, with a message size of 32 MB, the proposed bcast and allreduce are 1.6 and 3.2 times faster, respectively, than the algorithms used in existing MPI systems for Grid environments.
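The van de Geijn bcast the authors build on splits a large message into p chunks, scatters them, and then runs a ring allgather so every node ends up with the whole message; this is bandwidth-optimal for large messages. A pure-Python simulation of the data movement (no real MPI, and none of the paper's WAN-specific modifications):

```python
def bcast_scatter_allgather(message, p):
    """van de Geijn-style large-message bcast: the root scatters p chunks,
    then a ring allgather circulates them until every node holds the whole
    message (simulated with per-node dicts of chunk index -> data)."""
    k, r = divmod(len(message), p)
    chunks, start = [], 0
    for i in range(p):                         # contiguous, near-equal chunks
        end = start + k + (1 if i < r else 0)
        chunks.append(message[start:end])
        start = end
    have = [{i: chunks[i]} for i in range(p)]  # state after the scatter phase
    for s in range(p - 1):                     # ring allgather: p - 1 steps
        nxt = [dict(h) for h in have]
        for i in range(p):
            idx = (i - s) % p                  # the chunk node i received s steps ago
            nxt[(i + 1) % p][idx] = have[i][idx]
        have = nxt
    return ["".join(h[j] for j in range(p)) for h in have]
```

Each node sends only (p-1)/p of the message in total, instead of the full message p-1 times as in a naive flat broadcast; the paper's contribution is limiting how many of these transfers cross the WAN link at once.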
13.
BeoBLAST is an integrated software package that handles user requests and distributes BLAST and PSI-BLAST searches to nodes of a Beowulf cluster, thus providing a simple way to implement a scalable BLAST system on top of relatively inexpensive computer clusters. Additionally, BeoBLAST offers a number of novel search features through its web interface, including the ability to perform simultaneous searches of multiple databases with multiple queries, and the ability to start a search using the PSSM generated from a previous PSI-BLAST search on a different database. The underlying system can also handle automated querying for high throughput work. AVAILABILITY: Source code is available under the GNU public license at http://bioinformatics.fccc.edu/
14.
The single factor limiting the harnessing of the enormous computing power of clusters for parallel computing is the lack of appropriate software. Present cluster operating systems are not built to support parallel computing – they do not provide services to manage parallelism. The cluster operating environments used to assist the execution of parallel applications do not support both the Message Passing (MP) and Distributed Shared Memory (DSM) paradigms; these are only offered as separate components implemented at the user level, as libraries and independent servers. Because of these poor operating systems, users must deal with the individual computers of a cluster rather than seeing the cluster as a single powerful computer: a Single System Image of the cluster is not offered to users. There is a need for an operating system for clusters. We claim and demonstrate that it is possible to develop a cluster operating system that can efficiently manage parallelism, support Message Passing and DSM, and offer a Single System Image. To substantiate the claim, the first version of a cluster operating system, called GENESIS, that manages parallelism and offers a Single System Image has been developed.
15.
Mainstream computing equipment and the advent of affordable multi-Gigabit communication technology permit us to address data acquisition and processing problems with clusters of COTS machinery. Such networks typically contain heterogeneous platforms, real-time partitions and even custom devices. Vital overall system requirements are high efficiency and flexibility. In preceding projects we experienced the difficulty of meeting both requirements at once. Intelligent I/O (I2O) is an industry specification that defines a uniform messaging format and execution environment for hardware- and operating-system-independent device drivers in systems with processor-based communication equipment. Mapping this concept to a distributed computing environment and encapsulating the details of the specification into an application-programming framework allow us to provide architectural support for (i) efficient and (ii) extensible cluster operation. This paper portrays our view of applying I2O to high-performance clusters. We demonstrate the feasibility of this approach and report on the efficiency of our XDAQ software framework for distributed data acquisition systems.
16.
We describe a system for creating personal clusters in user-space to support the submission and management of thousands of
compute-intensive serial jobs to the network-connected compute resources on the NSF TeraGrid. The system implements a robust
infrastructure that submits and manages job proxies across a distributed computing environment. These job proxies contribute
resources to personal clusters created dynamically for a user on-demand. The personal clusters then adapt to the prevailing
job load conditions at the distributed sites by migrating job proxies to sites expected to provide resources more quickly.
Furthermore, the system allows multiple instances of these personal clusters to be created as containers for individual scientific
experiments, allowing the submission environment to be customized for each instance. The version of the system described in
this paper allows users to build large personal Condor and Sun Grid Engine clusters on the TeraGrid. Users then manage their
scientific jobs, within each personal cluster, with a single uniform interface using the feature-rich functionality found
in these job management environments.
17.
To better collect data in context and balance energy consumption, wireless sensor networks (WSNs) need to be divided into clusters. Clustering gives the network a hierarchical organizational structure, which balances the network load and prolongs the life cycle of the system. In clustering routing algorithms, the quality of the clustering algorithm directly affects the resulting cluster division. In this paper, an algorithm that selects cluster heads based on node distribution density and then allocates the remaining nodes is proposed, addressing the random cluster-head election and uneven clustering of the traditional LEACH protocol in WSNs. Experiments show that the algorithm achieves rapid selection of cluster heads and division of clusters, is effective for node clustering, and is conducive to equalizing energy consumption.
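One way to elect heads by distribution density, in the spirit described (the paper's exact rule is not given in the abstract), is to rank nodes by how many neighbours fall within a radius, suppress candidates too close to an already-chosen head, and attach every remaining node to its nearest head:

```python
def select_heads(positions, radius, k):
    """Elect the k densest nodes as cluster heads, suppressing candidates
    within `radius` of an already-chosen head, then attach every remaining
    node to its nearest head."""
    n = len(positions)
    d2 = lambda a, b: (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    density = [sum(d2(positions[i], q) <= radius ** 2 for q in positions)
               for i in range(n)]
    heads = []
    for i in sorted(range(n), key=lambda i: density[i], reverse=True):
        if len(heads) == k:
            break
        if all(d2(positions[i], positions[h]) > radius ** 2 for h in heads):
            heads.append(i)                      # dense and far from other heads
    clusters = {h: [] for h in heads}
    for i in range(n):
        if i not in heads:
            clusters[min(heads, key=lambda h: d2(positions[i], positions[h]))].append(i)
    return heads, clusters
```

The suppression step is what avoids LEACH's uneven clustering: two dense nodes in the same neighbourhood cannot both become heads.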
18.
MPI collective communication operations to distribute or gather data are used by many parallel applications in scientific computing, but they may lead to scalability problems since their execution times increase with the number of participating processors. In this article, we show how the execution time of collective communication operations can be improved significantly by an internal restructuring based on orthogonal processor structures with two or more levels. The execution time of operations like MPI_Bcast() or MPI_Allgather() can be reduced by 40% and 70% on a dual-Xeon cluster and a Beowulf cluster with single-processor nodes. A significant performance improvement can also be obtained on a Cray T3E by a careful selection of the processor structure. The use of these optimized communication operations can reduce the execution time of data-parallel implementations of complex application programs significantly without requiring any other change to the computation and communication structure. We present runtime functions for modeling two-phase realizations and verify that these runtime functions can predict the execution time both for communication operations in isolation and in the context of application programs.
19.
In this paper, we report on our "Iridis-Pi" cluster, which consists of 64 Raspberry Pi Model B nodes, each equipped with a 700 MHz ARM processor, 256 MB of RAM and a 16 GiB SD card for local storage. The cluster has a number of advantages not shared with conventional data-centre-based clusters, including its low total power consumption, easy portability due to its small size and weight, affordability, and passive, ambient cooling. We propose that these attributes make Iridis-Pi ideally suited to educational applications, where it provides a low-cost starting point to inspire and enable students to understand and apply high-performance computing and data handling to tackle complex engineering and scientific challenges. We present the results of benchmarking both the computational power and the network performance of Iridis-Pi. We also argue that such systems should be considered in some additional specialist application areas where these unique attributes may prove advantageous. We believe that the choice of an ARM CPU foreshadows a trend towards the increasing adoption of low-power, non-PC-compatible architectures in high-performance clusters.
20.
Cloud computing and cluster computing are user-centric computing services, in which shared software and hardware resources and information are provided to computers and other equipment according to users' demands, with a majority of services deployed through outsourcing. Outsourcing computation allows resource-constrained clients to outsource their complex computation workloads to a powerful server that is rich in computation resources. Modular exponentiation is one of the most complex computations in public-key-based cryptographic schemes, so it is useful to reduce the clients' computation cost through outsourcing. In this paper, we propose a novel outsourcing algorithm for modular exponentiation based on a new mathematical division, under the setting of two non-colluding cloud servers. The base and the exponent of the outsourced data can be kept private, and efficiency is improved compared with previous work.
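The two-non-colluding-server setting can be illustrated with the simplest exponent-hiding trick: additively split the exponent a = a1 + a2 (mod p-1) and send one share to each server, so neither server learns a. This is not the paper's division-based scheme (which also keeps the base private); it only shows why non-collusion makes the splitting safe:

```python
import random

def outsource_modexp(u, a, p, server1, server2, rng=None):
    """Hide the exponent from two non-colluding servers by additive secret
    sharing; the client only multiplies the two partial results.
    Correct for prime p with gcd(u, p) = 1."""
    rng = rng or random.Random()
    a1 = rng.randrange(p - 1)
    a2 = (a - a1) % (p - 1)  # exponents reduce mod p - 1 (Fermat's little theorem)
    return (server1(u, a1) * server2(u, a2)) % p

def make_server(p):
    """An untrusted 'server' is just a modular-exponentiation oracle."""
    return lambda base, exp: pow(base, exp, p)
```

Each server sees a uniformly random exponent share, so only the two servers together could reconstruct a; the client's own work drops to one multiplication and one reduction.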