Similar Documents
20 similar documents found (search time: 31 ms)
1.
Workstation clusters are emerging as a general-purpose computing platform for the execution of workloads comprising parallel and sequential applications. The scalability and flexibility typical of implicit coscheduling strategies make them a very promising solution to the scheduling needs of workstation clusters. In this paper we present a simulation study that compares 12 implicit coscheduling strategies, for a variety of workloads (including both parallel and sequential applications) and operating system schedulers, in terms of the performance they deliver to applications. Using a detailed simulator, we evaluate the performance of the coscheduling alternatives across a variety of simulation scenarios, and we identify the set of strategies that deliver the best performance to all the applications composing typical cluster workloads. Moreover, we show that for schedulers providing immediate preemption, the best strategies are also the simplest to implement.
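As a rough illustration of the implicit-coscheduling idea this entry evaluates (not the paper's simulator), the sketch below shows the classic two-phase spin-block rule: on a communication event a process spins for a bounded interval, infers from a fast reply that its peer is currently scheduled, and otherwise blocks to release the CPU to local jobs. The spin threshold and both callback names are hypothetical.

```python
import time

SPIN_LIMIT_S = 0.000200  # hypothetical two-phase spin threshold (200 us)

def wait_for_reply(poll_reply, block_on_reply):
    """Two-phase waiting: spin briefly, then block.

    poll_reply():     non-blocking check, returns a message or None.
    block_on_reply(): blocking receive that releases the CPU.
    """
    deadline = time.perf_counter() + SPIN_LIMIT_S
    while time.perf_counter() < deadline:
        msg = poll_reply()
        if msg is not None:
            # Fast reply: the peer is almost certainly scheduled too,
            # so keep running to stay coscheduled with it.
            return msg
    # No reply within the spin window: the peer is likely descheduled.
    # Block so the local scheduler can run another (sequential) job.
    return block_on_reply()
```

The appeal for clusters is that no global coordination is needed: each node's scheduler reacts only to locally observable communication latencies.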

2.
A Load Balancing Tool for Distributed Parallel Loops
Large-scale applications typically contain parallel loops with many iterates. The iterates of a parallel loop may have variable execution times, which translate into load imbalance and hence performance degradation of the application. This paper describes a tool for load balancing parallel loops on distributed-memory systems. The tool assumes that the data for a parallel loop is already partitioned among the participating processors. It utilizes the MPI library for interprocessor coordination and determines processor workloads by loop scheduling techniques. The tool was designed to be independent of any application; hence, it must be supplied with a routine that encapsulates the computations for a chunk of loop iterates, as well as routines to transfer data and results between processors. Performance evaluation on a Linux cluster indicates that the tool reduces the cost of executing a simulated irregular loop by up to 81% compared with execution without load balancing. The tool is useful for parallelizing sequential applications with parallel loops, or as an alternative load balancing routine for existing parallel applications.
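To make the master-worker structure concrete, here is a minimal mpi4py sketch of dynamic loop scheduling in the spirit of the tool described above: rank 0 deals out chunks of decreasing size and workers invoke a user-supplied routine, mirroring the abstract's requirement that the user encapsulate the per-chunk computation. The chunk-sizing rule, tags, and `compute_chunk` are illustrative assumptions, not the tool's actual interface.

```python
from mpi4py import MPI  # assumes an MPI installation plus mpi4py

def run_loop(total_iters, compute_chunk):
    """compute_chunk(first, size) stands in for the user-supplied routine
    that performs the work of one chunk of loop iterates."""
    comm, rank = MPI.COMM_WORLD, MPI.COMM_WORLD.Get_rank()
    nworkers = comm.Get_size() - 1
    if rank == 0:                      # scheduler: deal out shrinking chunks
        next_iter, status = 0, MPI.Status()
        while next_iter < total_iters:
            remaining = total_iters - next_iter
            size = max(1, remaining // (2 * nworkers))   # factoring-style rule
            comm.recv(source=MPI.ANY_SOURCE, tag=1, status=status)  # "idle"
            comm.send((next_iter, size), dest=status.Get_source(), tag=2)
            next_iter += size
        for _ in range(nworkers):                        # drain and stop
            comm.recv(source=MPI.ANY_SOURCE, tag=1, status=status)
            comm.send(None, dest=status.Get_source(), tag=2)
    else:                              # worker: request, compute, repeat
        while True:
            comm.send(rank, dest=0, tag=1)
            task = comm.recv(source=0, tag=2)
            if task is None:
                break
            compute_chunk(*task)
```

Shrinking chunks trade scheduling overhead against imbalance: large early chunks amortize messaging, small late chunks smooth out the variable iterate times.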

3.
The input-buffered switch architecture has become attractive for implementing high-performance switches for workstation clusters. It is challenging to provide a scheduling technique that is both highly efficient and fair in resource allocation. In this paper, we first introduce an iterative Fair Scheduling (iFS) scheme for input-buffered switches that supports fair bandwidth distribution among flows and achieves asymptotically 100% throughput. We then apply the idea of fair scheduling to switches with multicasting capability and propose an mFS scheme that allocates bandwidth to flows according to their reservations. We show that mFS produces throughput comparable to existing schemes while distributing bandwidth according to the given reservations. Extensive simulation results are presented to validate the effectiveness of the proposed schemes.
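A simplified way to see how reservation-proportional fairness can drive arbitration is virtual-time bookkeeping per flow, sketched below for a single output port. This is a start-time-fair-queueing-style stand-in, not the paper's iterative matching across a full crossbar; the data structures are hypothetical.

```python
def pick_flow(backlog, vtime, weight):
    """One arbitration step for one output port: among backlogged flows,
    grant the one with the smallest virtual finishing time, then advance
    its clock by 1/weight so long-run bandwidth tracks reservations."""
    ready = [f for f, q in backlog.items() if q > 0]
    if not ready:
        return None
    f = min(ready, key=lambda f: vtime[f])
    backlog[f] -= 1
    vtime[f] += 1.0 / weight[f]
    return f

# With weight 2:1, flow A receives roughly two grants for each of B's:
backlog = {'A': 4, 'B': 4}
vt = {'A': 0.0, 'B': 0.0}
w = {'A': 2.0, 'B': 1.0}
grants = [pick_flow(backlog, vt, w) for _ in range(6)]
print(grants)  # e.g. ['A', 'B', 'A', 'A', 'B', 'A']
```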

4.
Divisible load scenarios occur in modern media server applications, since most multimedia applications typically require access to both continuous and discrete data. A high-performance Continuous Media (CM) server greatly depends on the ability of its disk I/O subsystem to serve both types of workload efficiently. Disk scheduling algorithms for mixed media workloads, although they play a central role in this task, have been overlooked by related research efforts. These algorithms must satisfy several stringent performance goals, such as achieving low response time and ensuring fairness for the discrete-data workload, while at the same time guaranteeing the uninterrupted delivery of continuous data for the continuous-data workload. The focus of this paper is on disk scheduling algorithms for mixed media workloads in a multimedia information server. We propose novel algorithms, present a taxonomy of relevant algorithms, and study their performance through experimentation. Our results show that our algorithms offer drastic improvements in the average response time of discrete requests, are fair, serve continuous requests without interruptions, and that disk technology trends are such that the expected performance benefits can be even greater in the future.
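One common shape for such mixed-media schedulers is round-based service: continuous-media requests reserved for the round are served unconditionally, and discrete requests fill the remaining time budget. The sketch below illustrates that structure only; the round budget, the linear seek model, and the single upward SCAN sweep are all hypothetical simplifications, not the paper's algorithms.

```python
def build_round(cm_requests, discrete, head, budget):
    """One service round: guaranteed continuous-media (CM) requests first,
    then discrete requests in one SCAN sweep until the time budget runs out."""
    def cost(a, b):
        return 1.0 + abs(a - b) / 1000.0   # hypothetical seek+transfer model
    schedule, used = [], 0.0
    for cyl in cm_requests:                # reserved slots: never skipped
        used += cost(head, cyl)
        head = cyl
        schedule.append(('cm', cyl))
    for cyl in sorted(c for c in discrete if c >= head):  # SCAN upward
        c = cost(head, cyl)
        if used + c > budget:
            break                          # leftover waits for the next round
        used += c
        head = cyl
        schedule.append(('discrete', cyl))
    return schedule
```

Serving the CM slots first is what guarantees uninterrupted delivery; the SCAN ordering of the leftover slack is what keeps discrete response times low.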

5.
We develop a workload model based on the observed behavior of parallel computers at the San Diego Supercomputer Center and the Cornell Theory Center. This model gives us insight into the performance of strategies for scheduling moldable jobs on space-sharing parallel computers. We find that Adaptive Static Partitioning (ASP), which has been reported to work well for other workloads, does not perform as well as strategies that adapt better to system load. The best of the strategies we consider is one that explicitly reduces allocations when load is high (a variation of Sevcik's (1989) A+ strategy).
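The key idea of the winning strategy, shrinking a moldable job's allocation as load rises, can be captured in a few lines. The specific shrink rule below is a hypothetical stand-in for the A+-style adaptation, using queue length as the load signal; it is not the paper's exact formula.

```python
def allocate(requested, free_procs, queue_len):
    """Shrink a moldable job's processor allocation as system load grows.
    requested:  processors the job asked for
    free_procs: processors currently idle
    queue_len:  number of waiting jobs, used as the load signal
    """
    load_factor = 1.0 / (1 + queue_len)          # high load -> small share
    target = max(1, int(requested * load_factor))
    return min(target, free_procs)

print(allocate(64, 48, 0))   # idle system: give close to the full request
print(allocate(64, 48, 7))   # busy system: give a small slice, keep room
```

Smaller allocations under load reduce each job's speedup but let more jobs run concurrently, which is exactly the trade-off the workload model exposes.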

6.
During the past decade, cluster computing and mobile communication technologies have been extensively deployed and widely applied because of their enormous commercial value. Rapid technological advances have made it feasible to integrate the two, and a revolutionary application called mobile cluster computing is arising on the horizon. Mobile cluster computing can further enhance the power of laptops and mobile devices by running parallel applications on them. However, scheduling parallel applications on mobile clusters is technically challenging due to significant communication latency and the limited battery life of mobile devices. Shortening schedule length and conserving energy have therefore become two major concerns in designing efficient and energy-aware scheduling algorithms for mobile clusters. In this paper, we propose two novel scheduling strategies aimed at balancing performance and power consumption for parallel applications running on mobile clusters. Our research focuses on scheduling precedence-constrained parallel tasks, and duplication heuristics are applied to minimize communication overheads. Existing duplication algorithms, however, are designed with only schedule length in mind, completely ignoring the energy consumption of clusters. In this regard, we design two energy-aware duplication scheduling algorithms, called EADUS and TEBUS, to schedule precedence-constrained parallel tasks with a complexity of O(n^2), where n is the number of tasks in a parallel task set. Unlike existing duplication-based scheduling algorithms that replicate all possible predecessors of each task, the proposed algorithms judiciously replicate a task's predecessors only if the duplication helps conserve energy. Our energy-aware scheduling strategies are conducive to balancing schedule lengths and energy savings for a set of precedence-constrained parallel tasks. We conducted extensive experiments using both synthetic benchmarks and real-world applications to compare our algorithms with two existing approaches. Experimental results based on simulated mobile clusters demonstrate the effectiveness and practicality of the proposed duplication-based scheduling strategies. For example, EADUS and TEBUS reduce energy consumption for the Gaussian Elimination application by averages of 16.08% and 8.1%, with merely 5.7% and 2.2% increases in schedule length, respectively.
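The "judicious duplication" test at the heart of such algorithms can be stated compactly: replicate a predecessor on the child's node only when the communication energy avoided outweighs the extra CPU energy, and the child does not start later. The sketch below is a minimal stand-in for that decision under those assumptions; the parameter names are hypothetical, not EADUS/TEBUS internals.

```python
def should_duplicate(comm_energy, dup_cpu_energy,
                     start_with_dup, start_without_dup):
    """Duplicate a predecessor task on the child's node only if doing so
    both saves energy (communication energy avoided exceeds the CPU energy
    of re-executing the predecessor) and does not delay the child's start."""
    saves_energy = comm_energy > dup_cpu_energy
    no_delay = start_with_dup <= start_without_dup
    return saves_energy and no_delay

# Cheap link, expensive recompute: keep the message, skip the duplicate.
print(should_duplicate(comm_energy=0.2, dup_cpu_energy=1.5,
                       start_with_dup=10.0, start_without_dup=10.0))  # False
```

Contrast this with classical duplication heuristics, which would replicate every helpful predecessor purely to shorten the schedule, paying the energy cost unconditionally.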

7.
Smart grid (SG) applications are being used nowadays to meet the demand of increasing power consumption. SG applications are considered a perfect solution for combining renewable energy resources and the electrical grid by creating a bidirectional communication channel between the two systems. In this paper, three SG applications applicable to renewable energy systems, namely distribution automation (DA), distributed energy system-storage (DER) and electrical vehicle (EV), are investigated in order to study their suitability in a Long Term Evolution (LTE) network. To compensate for the weaknesses in existing scheduling algorithms, a novel bandwidth estimation and allocation technique and a new scheduling algorithm are proposed. The technique allocates available network resources based on application priority, whereas the algorithm makes scheduling decisions based on dynamic weighting factors over multiple criteria to satisfy the quality-of-service demands (delay, past average throughput and instantaneous transmission rate). Finally, the simulation results demonstrate that the proposed mechanism achieves higher throughput, lower delay and lower packet loss rate for DA and DER, while providing a degree of service for EV. In terms of fairness, the proposed algorithm performs 3%, 7% and 9% better than the exponential rule (EXP-Rule), modified largest weighted delay first (M-LWDF) and exponential/PF (EXP/PF), respectively.
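Multi-criteria LTE schedulers of this family typically reduce to a per-flow metric combining head-of-line delay, past average throughput, and the instantaneous achievable rate. The sketch below shows an M-LWDF-like metric as a stand-in for the paper's rule; the functional form and the weight values are assumptions, chosen only to show how dynamic weights can bias DA/DER over EV traffic.

```python
def priority(hol_delay, deadline, inst_rate, avg_thr, w):
    """Per-flow scheduling metric built from the three criteria the
    abstract names; the exact combination here is illustrative."""
    urgency = hol_delay / deadline               # grows as the deadline nears
    efficiency = inst_rate / max(avg_thr, 1e-9)  # proportional-fair term
    return w['delay'] * urgency + w['rate'] * efficiency

# Each TTI, grant the resource block to the flow with the highest metric;
# weights (hypothetical) favor the delay-critical DA application.
flows = {'DA': priority(0.8, 1.0, 5e6, 2e6, {'delay': 2.0, 'rate': 1.0}),
         'EV': priority(0.3, 5.0, 5e6, 4e6, {'delay': 0.5, 'rate': 1.0})}
print(max(flows, key=flows.get))  # 'DA'
```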

8.
We study the problem of scheduling a divisible load in a three-dimensional mesh of processors. The objective is to find a partition of the load into shares, and a distribution of those shares among the processors, that minimizes load processing time subject to the communication delays involved in sending load from one processor to another. We propose a new scheduling algorithm that distributes load across the network in a sequence of stages, where each stage brings load to the set of processors located at the same distance from the load source. A key feature of our solution is that the sets of processors receive load in order of decreasing processing capacity. We call this scheduling strategy Largest Layer First. A theorem characterizing the processing time attained by the algorithm is stated, and the performance of the algorithm is compared to earlier results.
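The "layers" in this strategy are simply the equivalence classes of processors by distance from the source. A minimal sketch of that grouping for an X×Y×Z mesh, using Manhattan distance as an assumed routing metric, is shown below; Largest Layer First would then feed these layers in order of decreasing aggregate capacity.

```python
from collections import defaultdict
from itertools import product

def layers(dims, src):
    """Group the processors of an X*Y*Z mesh by Manhattan distance from
    the load source; each group is one distribution stage ("layer")."""
    by_dist = defaultdict(list)
    for p in product(*(range(d) for d in dims)):
        by_dist[sum(abs(a - b) for a, b in zip(p, src))].append(p)
    return by_dist

# In a 3x3x3 mesh, layer 1 around the corner source is its 3 axis neighbours:
print(layers((3, 3, 3), (0, 0, 0))[1])  # [(0,0,1), (0,1,0), (1,0,0)]
```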

9.
While the MPP is still the most common architecture in supercomputer centers today, a simpler and cheaper machine configuration is appearing at many supercomputing sites. This alternative setup may be described simply as a collection of multiprocessors, or a distributed server system. The collection of multiprocessors is fed by a single common stream of jobs, where each job is dispatched to exactly one of the multiprocessor machines for processing. The biggest question that arises in such distributed server systems is what constitutes a good rule for assigning jobs to host machines: i.e., what is a good task assignment policy. Many task assignment policies have been proposed, but not systematically evaluated under supercomputing workloads. In this paper we start by comparing existing task assignment policies using trace-driven simulation under supercomputing workloads. We validate our experiments by providing analytical proofs of the performance of each of these policies; these proofs also provide much intuition. We find that while the performance of supercomputing servers varies widely with the task assignment policy, none of the above policies perform as well as we would like. We observe that all policies proposed thus far aim to balance load among the hosts. We propose a policy that purposely unbalances load among the hosts yet, counter to intuition, is also fair in that it achieves the same expected slowdown for all jobs, so no jobs are biased against. We evaluate this policy using both trace-driven simulation and analysis, and find that its performance is significantly better than the best of the policies that balance load.
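Load-unbalancing policies of this kind are usually realized by size-interval assignment: each host serves only jobs whose size falls in a dedicated range, so short jobs never queue behind long ones. A minimal sketch is below; the cutoff values are hypothetical, and choosing them to equalize expected slowdown (rather than load) is what deliberately unbalances the hosts.

```python
def assign(job_size, cutoffs):
    """Size-interval task assignment: host i serves only jobs whose size
    falls in [cutoffs[i], cutoffs[i+1]).  Cutoffs tuned for equal expected
    slowdown leave the hosts unequally loaded, as the paper advocates."""
    for i in range(len(cutoffs) - 1):
        if cutoffs[i] <= job_size < cutoffs[i + 1]:
            return i
    raise ValueError("job size outside all intervals")

cutoffs = [0.0, 1.0, 10.0, float('inf')]   # hypothetical 3-host split
print(assign(0.2, cutoffs))   # 0: host of short jobs
print(assign(3.5, cutoffs))   # 1: medium jobs
print(assign(500.0, cutoffs)) # 2: long jobs absorb the heavy tail
```

Because supercomputing job sizes are highly variable, isolating the heavy tail on its own host reduces every job class's expected slowdown even though that host carries far more load.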

10.
Efficient operation of a complicated or dynamic high-performance computing environment requires the scheduler to dispatch submitted tasks according to the identification of idle resources. A derived problem is to provide accurate forecasts of task runtimes; such forecasts are usually needed to assist scheduling policies and fine-tune scheduling decisions, and are also used for future planning of resource allocation when conducting advance reservation. However, the characteristics of existing prediction strategies mean that no single strategy is appropriate for all kinds of heterogeneous tasks. To address this problem, a multi-strategy collaborative prediction model (MSCPM) for the runtime of online tasks is proposed, and a novel criterion named Prediction Accuracy Assurance (PAA) is introduced to quantitatively evaluate the precision of the runtime predictions produced by a specific prediction strategy.
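One plausible reading of "multi-strategy collaborative prediction" is an accuracy-weighted ensemble: each strategy's recent relative error on similar tasks determines its weight in the combined forecast. The sketch below shows that pattern under those assumptions; the weighting rule is a hypothetical stand-in for the PAA criterion, whose exact definition is in the paper.

```python
def collaborative_predict(history, predictors):
    """Combine several runtime predictors, weighting each by its recent
    accuracy -- an illustrative stand-in for a PAA-style criterion.

    history:    list of (features, actual_runtime) pairs already observed
    predictors: list of callables mapping features -> predicted runtime
    """
    weights = []
    for p in predictors:
        rel_err = sum(abs(p(f) - y) / y for f, y in history) / len(history)
        weights.append(1.0 / (rel_err + 1e-9))     # accurate -> heavy
    total = sum(weights)
    return lambda f: sum(w * p(f) for w, p in zip(weights, predictors)) / total

# Two toy strategies: one scales with input size, one returns a flat mean.
hist = [(10, 21.0), (20, 39.0), (30, 62.0)]
model = collaborative_predict(hist, [lambda n: 2.0 * n, lambda n: 40.0])
print(model(25))  # dominated by the (more accurate) size-scaling strategy
```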

11.
Simulated annealing (SA) is a general-purpose optimization technique widely used for various combinatorial optimization problems. Its main drawback, however, is the long computation time required to obtain a good-quality solution. Clusters have emerged as a feasible and popular platform for parallel computing in many applications, and the computing nodes of many clusters available today are temporally heterogeneous. In this study, multiple Markov chain (MMC) parallel simulated annealing (PSA) algorithms are implemented on a temporally heterogeneous cluster of workstations to solve the graph partitioning problem, and their performance is analyzed in detail. The temporal heterogeneity of the cluster is harnessed by employing static and dynamic load balancing techniques to further improve the efficiency and scalability of the MMC PSA algorithms.
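The MMC structure itself is simple to sketch: several chains anneal independently and periodically synchronize on the best state found. The version below simulates the chains sequentially and uses a best-state exchange and geometric cooling as illustrative choices; it is a generic MMC skeleton, not the paper's cluster implementation or its load-balancing layer.

```python
import math
import random

def mmc_sa(cost, neighbour, x0, chains=4, sweeps=200, exchange_every=20):
    """Multiple-Markov-chain SA: chains anneal independently (simulated
    sequentially here) and periodically adopt the best state found so far,
    mimicking the exchange step of MMC PSA."""
    states = [(x0, cost(x0))] * chains
    temp = 1.0
    for sweep in range(sweeps):
        nxt = []
        for x, c in states:
            y = neighbour(x)
            cy = cost(y)
            if cy < c or random.random() < math.exp((c - cy) / temp):
                x, c = y, cy                     # Metropolis acceptance
            nxt.append((x, c))
        states = nxt
        if sweep % exchange_every == 0:          # exchange: share the best
            best = min(states, key=lambda s: s[1])
            states = [best] * chains
        temp *= 0.98                             # geometric cooling
    return min(states, key=lambda s: s[1])

# Toy 1-D example: minimize (x - 3)^2 over the integers.
best = mmc_sa(lambda x: (x - 3) ** 2,
              lambda x: x + random.choice((-1, 1)), x0=50)
print(best)  # converges near (3, 0)
```

On a temporally heterogeneous cluster, the exchange points are also natural places to rebalance: faster nodes can be given more sweeps per exchange interval.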

12.
We study p-percent coverage sensor scheduling problems and propose two scheduling algorithms to prolong network lifetime, motivated by the fact that full coverage is not necessary for some applications and that different subareas of the monitored region may have different coverage requirements. The Centralized p-Percent Coverage Algorithm (CPCA) we propose selects the least number of nodes needed to monitor p percent of the monitored area. The Distributed p-Percent Coverage Protocol (DPCP) we present determines, in a distributed manner, a set of nodes to cover p percent of the monitored area. Both algorithms guarantee network connectivity. The simulation results show that our algorithms remarkably prolong network lifetime, incur less than 5% unrequired coverage for large networks, and employ nodes fairly in most cases.
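A natural centralized core for minimizing the node count is greedy set cover truncated at the p-percent threshold, sketched below. This is a plausible reading of the CPCA selection step under a discretized (grid-cell) area model, with connectivity handling omitted; the data model is an assumption.

```python
def greedy_p_cover(sensors, cells, p):
    """Keep picking the sensor that covers the most still-uncovered cells
    until p percent of all cells are covered.

    sensors: {sensor_id: set_of_covered_cells}
    cells:   set of all grid cells in the monitored area
    p:       required coverage fraction, 0 < p <= 1
    """
    need = p * len(cells)
    covered, chosen = set(), []
    while len(covered) < need and sensors:
        best = max(sensors, key=lambda s: len(sensors[s] - covered))
        if not sensors[best] - covered:
            break                      # no remaining sensor adds coverage
        covered |= sensors.pop(best)
        chosen.append(best)
    return chosen

# Half coverage of 4 cells is met by sensor 'a' alone:
print(greedy_p_cover({'a': {1, 2}, 'b': {2, 3}, 'c': {4}}, {1, 2, 3, 4}, 0.5))
```

Stopping early at p percent, instead of running set cover to completion, is exactly where the lifetime savings come from: the unselected nodes can sleep.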

13.
Divisible load theory is a methodology involving the linear and continuous modeling of partitionable computation and communication loads for parallel processing. It adequately represents an important class of problems with applications in parallel and distributed system scheduling, various types of data processing, scientific and engineering computation, and sensor networks. Solutions are surprisingly tractable. Research in this area over the past decade is described.
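The tractability claim is easy to demonstrate with the classic single-level-tree result of divisible load theory: with sequential distribution, equal finish times across workers give a linear recursion for the optimal load fractions. The sketch below computes those closed-form shares; the two-parameter cost model (w time units per unit load to compute, z to receive) is the standard textbook setup rather than any particular paper's.

```python
def dlt_shares(w, z):
    """Optimal divisible-load split for a single-level tree with sequential
    distribution.  Equating adjacent finish times,
        alpha[i] * w[i] = alpha[i+1] * (z[i+1] + w[i+1]),
    gives the recursion below; normalizing makes the shares sum to 1."""
    alpha = [1.0]
    for i in range(len(w) - 1):
        alpha.append(alpha[-1] * w[i] / (z[i + 1] + w[i + 1]))
    total = sum(alpha)
    return [a / total for a in alpha]

# A slow third processor (w=2.0) automatically receives a smaller share:
print(dlt_shares(w=[1.0, 1.0, 2.0], z=[0.1, 0.1, 0.1]))
```

The whole schedule reduces to solving a triangular linear system, which is why divisible load problems stay tractable even at large processor counts.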

14.
In this paper, we present a new task scheduling algorithm, called the Contention-Aware Scheduling (CAS) algorithm, with the objective of delivering good-quality schedules in low running time by considering contention on the links of arbitrarily connected, heterogeneous processors. The CAS algorithm schedules tasks on processors and messages on links by considering the earliest-finish-time attribute under either virtual cut-through (VCT) or store-and-forward (SAF) switching. Three variants of the CAS algorithm are presented, which differ in how they order the messages from immediate predecessor tasks. As part of the experimental study, the performance of the CAS algorithm is compared with two well-known APN (arbitrary processor network) scheduling algorithms. Experiments on synthetic benchmarks and on the task graphs of well-known problems clearly show that our CAS algorithm outperforms the related work with respect to performance (normalized schedule length) and cost (running time to generate output schedules). Ali Fuat Alkaya received the B.Sc. degree in mathematics from Koc University, Istanbul, Turkey in 1998, and the M.Sc. degree in computer engineering from Marmara University, Istanbul, Turkey in 2002. He is currently a Ph.D. student in the engineering management department at the same university. His research interests include task scheduling and analysis of algorithms. Haluk Rahmi Topcuoglu received the B.Sc. and M.Sc. degrees in computer engineering from Bogazici University, Istanbul, Turkey, in 1991 and 1993, respectively. He received the Ph.D. degree in computer science from Syracuse University in 1999. He has been on the faculty at Marmara University, Istanbul, Turkey since Fall 1999, where he is currently an Associate Professor in the computer engineering department. His main research interests are task scheduling and mapping in parallel and distributed systems; parallel processing; and evolutionary algorithms and their applicability to stationary and dynamic environments. He is a member of the ACM, the IEEE, and the IEEE Computer Society. e-mail: haluk@eng.marmara.edu.tr e-mail: falkaya@eng.marmara.edu.tr

15.
Scheduling parallel tasks in a multi-cluster grid can be seen as two interdependent problems: cluster allocation, and scheduling the parallel task on the allocated cluster. In this paper both rigid and moldable parallel tasks are considered. We propose a theoretical model of utility-oriented parallel task scheduling in multi-cluster grids with advance reservations. On the basis of the model we present an approximation algorithm, a repair-strategy-based genetic algorithm, and the greedy heuristics MaxMax, T-Sufferage and R-Sufferage to solve the two interdependent problems. We compare the performance of these algorithms with respect to utility optimality and running time. Simulation results show that, on average, the (1+α)-approximation algorithm achieves the best trade-off between utility optimality and running time. The genetic algorithm achieves better utility than the greedy heuristics and the approximation algorithm, but at an expensive time cost. The greedy heuristics do not perform equally well when adapted to different utility functions, while the approximation algorithm shows intrinsically stable performance.

16.
Video-on-Demand (VoD) systems are expected to support a variety of multimedia services, such as tele-education, teleconferencing, remote working, videotelephony and high-definition TV. These applications necessitate abundant bandwidth and buffer space, as well as appropriate software and hardware for the efficient manipulation of network resources. In this work we investigate a promising scheduling algorithm referred to as the Deadline Credit (DC) algorithm, which exploits the available bandwidth and buffer space to serve a diverse class of prerecorded video applications. We provide simulation results for the DC algorithm applied to a distributed VoD network with a hierarchical architecture, which fits the existing tree topology used in today's cable TV systems. The issues investigated via the simulations are: system utilization, the influence of buffer space on the delivered quality of service, and the fairness of the scheduling mechanism. We examine cases with homogeneous as well as diverse video streams, and extend our system to support interactive VCR-like functions. We also contribute a modification to the DC algorithm so that, when video applications have different display periods, the video streams obtain a fair share of network resources. Finally, we validate our results by simulating actual videos encoded in the MPEG-4 and H.263 formats.

17.
Virtualization is widely used in cloud computing environments to manage resources efficiently, but it also raises several challenges. One of them is fairness in resource allocation among virtual machines. Traditional virtualized resource allocation approaches distribute physical resources equally, without taking into account the actual workload of each virtual machine, which often leads to waste. In this paper, we propose a virtualized resource auction and allocation model (VRAA) based on incentives and penalties to correct this waste. In our approach, we use the Nash equilibrium of cooperative games to allocate resources fairly among multiple virtual machines so as to maximize the revenue of the system. To illustrate the effectiveness of the proposed approach, we apply the basic laws of auction gaming to investigate how CPU allocation and contention affect application performance (i.e., response time) and CPU utilization. We find that in our VRAA model the fairness index is high, and the resource allocation is closely proportional to the actual workloads of the virtual machines, so the waste of resources is reduced. Experimental results show that our model is general and can be applied to other virtualized, non-CPU resources.
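The observable outcome the abstract reports, allocation closely proportional to actual workload, can be contrasted with equal slicing in a few lines. The sketch below shows that workload-proportional split as a stand-in for the VRAA outcome; it deliberately omits the auction and game-theoretic mechanism that produces it in the paper.

```python
def workload_proportional(capacity, demand):
    """Split a physical resource among VMs in proportion to measured
    demand, so idle VMs no longer strand capacity the way an equal
    split does.  demand: {vm_name: measured_workload}."""
    total = sum(demand.values())
    if total == 0:
        return {vm: capacity / len(demand) for vm in demand}
    return {vm: capacity * d / total for vm, d in demand.items()}

# 8 vCPUs over three VMs: the idle VM gets nothing instead of 1/3.
print(workload_proportional(8, {'web': 6.0, 'batch': 2.0, 'idle': 0.0}))
# {'web': 6.0, 'batch': 2.0, 'idle': 0.0}
```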

18.
Many-task computing aims to bridge the gap between two computing paradigms: high-throughput computing and high-performance computing. Many-task computing denotes high-performance computations comprising multiple distinct activities coupled via file system operations. The aggregate number of tasks, quantity of computing, and volumes of data may be extremely large. Traditional techniques found in production systems in the scientific community do not scale to today's largest systems, due to issues in local resource manager scalability and granularity, efficient utilization of the raw hardware, long wait-queue times, and shared/parallel file system contention and scalability. To address these limitations, we adopted a "top-down" approach to building a middleware called Falkon to support the most demanding many-task computing applications at the largest scales. Falkon (Fast and Light-weight tasK executiON framework) integrates (1) multi-level scheduling to enable dynamic resource provisioning and minimize wait-queue times, (2) a streamlined task dispatcher able to achieve orders-of-magnitude higher task dispatch rates than conventional schedulers, and (3) data diffusion, which performs data caching and uses a data-aware scheduler to co-locate computational and storage resources. Micro-benchmarks have shown Falkon to achieve throughputs of over 15,000 tasks/s, to scale to hundreds of thousands of processors and millions of queued tasks, and to execute billions of tasks per day. Data diffusion has also been shown to improve application scalability and performance, achieving hundreds of Gb/s I/O rates on modest-sized clusters, with Tb/s I/O rates on the horizon. Falkon has shown orders-of-magnitude improvements in performance and scalability over traditional approaches to resource management, across many diverse workloads and applications, at scales of billions of tasks on hundreds of thousands of processors, across clusters, specialized systems, Grids, and supercomputers. Falkon's performance and scalability have enabled the new class of many-task computing applications to operate at scales previously believed impossible, with high efficiency.

19.
Application scheduling plays an important role in high-performance cluster computing. Application scheduling can be classified into job scheduling and task scheduling. This paper presents a survey of software tools for graph-based scheduling on cluster systems, with a focus on task scheduling. The tasks of a parallel or distributed application can be properly scheduled onto multiple processors in order to optimize the performance of the program (e.g., execution time or resource utilization). In general, scheduling algorithms are designed based on the notion of a task graph that represents the relationships among parallel tasks. The scheduling algorithms map the nodes of the graph to processors in order to minimize overall execution time. Although many scheduling algorithms have been proposed in the literature, surprisingly few practical tools are found in actual use. After discussing the fundamental scheduling techniques, we propose a framework and taxonomy for scheduling tools on clusters. Using this framework, the features of existing scheduling tools are analyzed and compared. We also discuss the important issues in improving the usability of scheduling tools. This work is supported by the Hong Kong Polytechnic University under grant H-ZJ80 and by NASA Ames Research Center under a cooperative grant agreement with the University of Texas at Arlington. Jiannong Cao received the BSc degree in computer science from Nanjing University, Nanjing, China in 1982, and the MSc and PhD degrees in computer science from Washington State University, Pullman, WA, USA, in 1986 and 1990 respectively. He is currently an associate professor in the Department of Computing at the Hong Kong Polytechnic University, Hong Kong, where he is also the director of the Internet and Mobile Computing Lab. He was on the faculty of computer science at James Cook University and the University of Adelaide in Australia, and at the City University of Hong Kong. His research interests include parallel and distributed computing, networking, mobile computing, fault tolerance, and distributed software architecture and tools. He has published over 120 technical papers in these areas, has served on the editorial boards of several international journals, as a reviewer for international journals and conference proceedings, and as an organizing/programme committee member for many international conferences. Dr. Cao is a member of the IEEE Computer Society, the IEEE Communication Society, IEEE, and ACM, as well as the IEEE Technical Committees on Distributed Processing, Parallel Processing, and Fault Tolerant Computing, and the Computer Architecture Professional Committee of the China Computer Federation. Alvin Chan is currently an assistant professor at the Hong Kong Polytechnic University. He graduated from the University of New South Wales with a Ph.D. degree in 1995 and was subsequently employed as a Research Scientist by CSIRO, Australia. From 1997 to 1998, he was employed by the Centre for Wireless Communications, National University of Singapore, as a Program Manager. Dr. Chan is one of the founding members and a director of a university spin-off company, Information Access Technology Limited. He is an active consultant to both local and overseas companies. His research interests include mobile computing, context-aware computing and smart card applications. Yudong Sun received the B.S. and M.S. degrees from Shanghai Jiao Tong University, China, and the Ph.D. degree from the University of Hong Kong in 2002, all in computer science. From 1988 to 1996, he was on the teaching staff of the Department of Computer Science and Engineering at Shanghai Jiao Tong University. From 2002 to 2003, he held a research position at the Hong Kong Polytechnic University. At present, he is a Research Associate in the School of Computing Science at the University of Newcastle upon Tyne, UK. His research interests include parallel and distributed computing, Web services, Grid computing, and bioinformatics. Sajal K. Das is currently a Professor of Computer Science and Engineering and the Founding Director of the Center for Research in Wireless Mobility and Networking (CReWMaN) at the University of Texas at Arlington. His current research interests include resource and mobility management in wireless networks, mobile and pervasive computing, sensor networks, mobile internet, parallel processing, and grid computing. He has published over 250 research papers and holds four US patents in wireless mobile networks. He received Best Paper Awards at ACM MobiCom'99, ICOIN-16, ACM MSWiM'00 and ACM/IEEE PADS'97. Dr. Das serves on the editorial boards of IEEE Transactions on Mobile Computing, ACM/Kluwer Wireless Networks, Parallel Processing Letters, and Journal of Parallel Algorithms and Applications. He served as General Chair of IEEE PerCom'04, IWDC'04, MASCOTS'02 and ACM WoWMoM'00-02; General Vice Chair of IEEE PerCom'03, ACM MobiCom'00 and IEEE HiPC'00-01; Program Chair of IWDC'02 and WoWMoM'98-99; TPC Vice Chair of ICPADS'02; and as a TPC member of numerous IEEE and ACM conferences. Minyi Guo received his Ph.D. degree in information science from the University of Tsukuba, Japan in 1998. From 1998 to 2000, Dr. Guo was a research scientist at NEC Soft, Ltd., Japan. He is currently a professor in the Department of Computer Software, The University of Aizu, Japan. From 2001 to 2003, he was a visiting professor at Georgia State University, USA, and the Hong Kong Polytechnic University, Hong Kong. Dr. Guo has served as general chair, program committee or organizing committee chair for many international conferences, and has delivered more than 20 invited talks in the USA, Australia, China, and Japan. He is the editor-in-chief of the Journal of Embedded Systems, and serves on the editorial boards of the International Journal of High Performance Computing and Networking, Journal of Embedded Computing, Journal of Parallel and Distributed Scientific and Engineering Computing, and International Journal of Computer and Applications. Dr. Guo's research interests include parallel and distributed processing, parallelizing compilers, data parallel languages, data mining, molecular computing and software engineering. He is a member of the ACM, IEEE, IEEE Computer Society, and IEICE. He is listed in Marquis Who's Who in Science and Engineering.

20.
Heterogeneous parallel clusters of workstations are being used to solve many important computational problems. Scheduling parallel applications on the best collection of machines in a heterogeneous computing environment is a complex problem. Performance prediction is vital to good application performance in this environment, since utilization of an ill-suited machine can slow the computation down significantly. This paper addresses the problem of network performance prediction. A new methodology for characterizing network links and an application's need for network resources is developed that makes use of Performance Surfaces [3]. This Performance Surface abstraction is used to schedule a parallel application on the resources where it will run most efficiently.

