首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 125 毫秒
1.
In this paper, we consider the problem of scheduling divisible loads on arbitrary graphs with the objective to minimize the total processing time of the entire load submitted for processing. We consider an arbitrary graph network comprising heterogeneous processors interconnected via heterogeneous links in an arbitrary fashion. The divisible load is assumed to originate at any processor in the network. We transform the problem into a multi-level unbalanced tree network and schedule the divisible load. We design systematic procedures to identify and eliminate any redundant processor–link pairs (those pairs whose consideration in scheduling will penalize the performance) and derive an optimal tree structure to obtain an optimal processing time, for a fixed sequence of load distribution. Since the algorithm thrives to determine an equivalent number of processors (resources) that can be used for processing the entire load, we refer to this approach as resource-aware optimal load distribution (RAOLD) algorithm. We extend our study by applying the optimal sequencing theorem proposed for single-level tree networks in the literature for multi-level tree for obtaining an optimal solution. We evaluate the performance for a wide range of arbitrary graphs with varying connectivity probabilities and processor densities. We also study the effect of network scalability and connectivity. We demonstrate the time performance when the point of load origination differs in the network and highlight certain key features that may be useful for algorithm and/or network system designers. We evaluate the time performance with rigorous simulation experiments under different system parameters for the ease of a complete understanding.  相似文献   

2.
This paper investigates scheduling strategies for divisible jobs/loads originating from multiple sites in hierarchical networks with heterogeneous processors and communication channels. In contrast, most previous work in the divisible load scheduling theory (DLT) literature mainly addressed scheduling problems with loads originating from a single processor. This is one of the first works that address scheduling multiple loads from multiple sites in the DLT paradigm. In addition, scheduling multi-site jobs is common in Grids and other general distributed systems for resource sharing and coordination. An efficient static scheduling algorithm PPDD (Processor-set Partitioning and Data Distribution Algorithm) is proposed to near-optimally distribute multiple loads among all processors so that the overall processing time of all jobs is minimized. The PPDD algorithm is applied to two cases: when processors are equipped with front-ends and when they are not equipped with front-ends. The application of the algorithm to homogeneous systems is also studied. Further, several important properties exhibited by the PPDD algorithm are proven through lemmas. To implement the PPDD algorithm, we propose a communication strategy. In addition, we compare the performance of the PPDD algorithm with a Round-robin Scheduling Algorithm (RSA), which is most commonly used. Extensive case studies through numerical analysis have been conducted to verify the theoretical findings.  相似文献   

3.
Clusters of workstations and networked parallel computing systems are emerging as promising computational platforms for HPC applications. The processors in such systems are typically interconnected by a collection of heterogeneous networks such as Ethernet, ATM, and FDDI, among others. In this paper, we develop techniques to perform block-cyclic redistribution over P processors interconnected by such a collection of heterogeneous networks. We represent the communication scheduling problem using a timing diagram formalism. Here, each interprocessor communication event is represented by a rectangle whose height denotes the time to perform this event over the heterogeneous network. The communication scheduling problem is then one of appropriately positioning the rectangles so as to minimize the completion time of all the communication events. For the important case where the block size changes by a factor of K, we develop a heuristic algorithm whose completion time is at most twice the optimal. The running time of the heuristic is O(PK 2). Our heuristic algorithm is adaptive to variations in network performance, and derives schedules at run-time, based on current information about the available network bandwidth. Our experimental results show that our schedules always have communication times that are very close to optimal. This revised version was published online in July 2006 with corrections to the Cover Date.  相似文献   

4.
In this paper, we present a new task scheduling algorithm, called Contention-Aware Scheduling (CAS) algorithm, with the objective of delivering good quality of schedules in low running-time by considering contention on links of arbitrarily-connected, heterogeneous processors. The CAS algorithm schedules tasks on processors and messages on links by considering the earliest finish time attribute with the virtual cut-through (VCT) or the store-and-forward (SAF) switching. There are three types of CAS algorithm presented in this paper, which differ in ordering the messages from immediate predecessor tasks. As part of the experimental study, the performance of the CAS algorithm is compared with two well-known APN (arbitrary processor network) scheduling algorithms. Experiments on the results of the synthetic benchmarks and the task graphs of the well-known problems clearly show that our CAS algorithm outperforms the related work with respect to performance (given in normalized schedule length) and cost (given in running time) to generate output schedules. Ali Fuat Alkaya received the B.Sc. degree in mathematics from Koc University, Istanbul, Turkey in 1998, and the M.Sc. degree in computer engineering from Marmara University, Istanbul, Turkey in 2002. He is currently a Ph.D. student in engineering management department at the same university. His research interests include task scheduling and analysis of algorithms. Haluk Rahmi Topcuoglu received the B.Sc. and M.Sc. degrees in computer engineering from Bogazici University, Istanbul, Turkey, in 1991 and 1993, respectively. He received the Ph.D. degree in computer science from Syracuse University in 1999. He has been on the faculty at Marmara University, Istanbul, Turkey since Fall 1999, where he is currently an Associate Professor in computer engineering department. His main research interests are task scheduling and mapping in parallel and distributed systems; parallel processing; evolutionary algorithms and their applicability for stationary and dynamic environments. He is a member of the ACM, the IEEE, and the IEEE Computer Society. e-mail: haluk@eng.marmara.edu.tr e-mail: falkaya@eng.marmara.edu.tr  相似文献   

5.
In this paper, we consider the problem of scheduling and mapping precedence-constrained tasks to a network of heterogeneous processors. In such systems, processors are usually physically distributed, implying that the communication cost is considerably higher than in tightly coupled multiprocessors. Therefore, scheduling and mapping algorithms for such systems must schedule the tasks as well as the communication traffic by treating both the processors and communication links as equally important resources. We propose an algorithm that achieves these objectives and adapts its task scheduling and mapping decisions according to the given network topology. Just like tasks, messages are also scheduled and mapped to suitable links during the minimization of the finish times of tasks. Heterogeneity of processors is exploited by scheduling critical tasks to the fastest processors. Our experimental study has demonstrated that the proposed algorithm is efficient and robust, and yields consistent performance over a wide range of scheduling parameters. This revised version was published online in July 2006 with corrections to the Cover Date.  相似文献   

6.
Energy aware DAG scheduling on heterogeneous systems   总被引:1,自引:0,他引:1  
We address the problem of scheduling directed a-cyclic task graph (DAG) on a heterogeneous distributed processor system with the twin objectives of minimizing finish time and energy consumption. Previous scheduling heuristics have assigned DAGs to processors to minimize overall run-time of the application. But applications on embedded systems, such as high performance DSP in image processing, multimedia, and wireless security, need schedules which use low energy too.  相似文献   

7.
High availability plays an important role in heterogeneous clusters, where processors operate at different speeds and are not continuously available for processing. Existing scheduling algorithms designed for heterogeneous clusters do not factor in availability. We address in this paper the stochastic scheduling problem for heterogeneous clusters with availability constraints. Each node in a heterogeneous cluster is modeled by its speed and availability, and different classes of tasks submitted to the cluster are characterized by their execution times and availability requirements. To incorporate availability and heterogeneity into stochastic scheduling, we introduce metrics to quantify availability and heterogeneity in the context of multiclass tasks. A stochastic scheduling algorithm SSAC (stochastic scheduling with availability constraints) is then proposed to improve availability of heterogeneous clusters while reducing average response time of tasks. Experimental results show that our algorithm achieves a good trade-off between availability and responsiveness.
Tao XieEmail:
  相似文献   

8.
In this paper, divisible load scheduling in a linear network of processors is presented. The cases of processing load originating at the boundary and also at the interior of the network are considered. An equivalent tree network for the given linear network is derived. Using this equivalent tree network, we prove all the results obtained in the earlier studies. The equivalent tree network methodology presented in this paper, is more general than the earlier results, because in this approach, we can solve the scheduling problem even in an hetrogeneous linear network. The earlier studies considered only homogeneous linear network.  相似文献   

9.
A hysteresis binary McCulloch-Pitts neuron model is proposed in order to suppress the complicated oscillatory behaviors of neural dynamics. The artificial hysteresis binary neural network is used for scheduling time-multiplex crossbar switches in order to demonstrate the effects of hysteresis. Time-multiplex crossbar switching systems must control traffic on demand such that packet blocking probability and packet waiting time are minimized. The system using n×n processing elements solves an n×n crossbar-control problem with O(1) time, while the best existing parallel algorithm requires O(n) time. The hysteresis binary neural network maximizes the throughput of packets through a crossbar switch. The solution quality of our system does not degrade with the problem size.  相似文献   

10.
Online Prediction of the Running Time of Tasks   总被引:7,自引:0,他引:7  
We describe and evaluate the Running Time Advisor (RTA), a system that can predict the running time of a compute-bound task on a typical shared, unreserved commodity host. The prediction is computed from linear time series predictions of host load and takes the form of a confidence interval that neatly expresses the error associated with the measurement and prediction processes – error that must be captured to make statistically valid decisions based on the predictions. Adaptive applications make such decisions in pursuit of consistent high performance, choosing, for example, the host where a task is most likely to meet its deadline. We begin by describing the system and summarizing the results of our previously published work on host load prediction. We then describe our algorithm for computing predictions of running time from host load predictions. We next evaluate the system using over 100,000 randomized testcases run on 39 different hosts, finding that is indeed capable of computing correct and useful confidence intervals. Finally, we report on our experience with using the RTA in application-oriented real-time scheduling in distributed systems.  相似文献   

11.
A Load Balancing Tool for Distributed Parallel Loops   总被引:1,自引:0,他引:1  
Large scale applications typically contain parallel loops with many iterates. The iterates of a parallel loop may have variable execution times which translate into performance degradation of an application due to load imbalance. This paper describes a tool for load balancing parallel loops on distributed-memory systems. The tool assumes that the data for a parallel loop to be executed is already partitioned among the participating processors. The tool utilizes the MPI library for interprocessor coordination, and determines processor workloads by loop scheduling techniques. The tool was designed independent of any application; hence, it must be supplied with a routine that encapsulates the computations for a chunk of loop iterates, as well as the routines to transfer data and results between processors. Performance evaluation on a Linux cluster indicates that the tool reduces the cost of executing a simulated irregular loop without load balancing by up to 81%. The tool is useful for parallelizing sequential applications with parallel loops, or as an alternate load balancing routine for existing parallel applications.  相似文献   

12.
The competitiveness of online algorithms is measured based on the correctness of the results produced and processing time efficiency. Traditionally evolutionary algorithms are not favored in online paradigms because of the large number of iterations involved in the algorithm which translates directly into processing time overhead. In this paper we describe MARS (Management Architecture for Resource Services) online scheduling algorithm which uses Simulated Annealing and concepts from Tabu Search to drastically decrease the processing time of the algorithm. The paper outlines the concepts behind MARS, the components involved and scheduling methodology used. In addition we also identify the time consuming bottlenecks in the performance of the system and how evolutionary algorithms help us soar past them.
Hesham El-RewiniEmail:
  相似文献   

13.
Abstract

We present a parallel algorithm for molecular dynamics involving short-range two- and three-body potentials and the pair-correlation function, g(r). The method is based on a spatial decomposition of the simulation box that takes advantage of a linked-cell list, and allows a load balanced partition of the computations of both the forces and g(r) over the processors. The tests of the program is conducted by evaluating the efficiency for both the thermalization phase and the production phase of the simulation. This method is successfully applied to the calculation of the direct correlation function of fluid krypton at small scattering angle along the T = 297 K supercritical isotherm.  相似文献   

14.
Failure-aware workflow scheduling in cluster environments   总被引:1,自引:0,他引:1  
The goal of workflow application scheduling is to achieve minimal makespan for each workflow. Scheduling workflow applications in high performance cluster environments is an NP-Complete problem, and becomes more complicated when potential resource failures are considered. While more research on failure prediction has been witnessed in recent years to improve system availability and reliability, very few of them attack the problem in the context of workflow application scheduling. In this paper, we study how a workflow scheduler benefits from failure prediction and propose FLAW, a failure-aware workflow scheduling algorithm. We propose two important definitions on accuracy, Application Oblivious Accuracy (AOA) and Application Aware Accuracy (AAA), from the perspectives of system and scheduling respectively, as we observe that the prediction accuracy defined conventionally imposes different performance implications on different applications and fails to measure how that improves scheduling effectiveness. The comprehensive evaluation results using real failure traces show that FLAW performs well with practically achievable prediction accuracy by reducing the average makespan, the loss time and the number of job rescheduling.  相似文献   

15.
In this study, we address the meta-task scheduling problem in heterogeneous computing (HC) systems, which is to find a task assignment that minimizes the schedule length of a meta-task composed of several independent tasks with no data dependencies. The fact that the meta-task scheduling problem in HC systems is NP-hard has motivated the development of many heuristic scheduling algorithms. These heuristic algorithms, however, neglect the stochastic nature of task execution times in an attempt to minimize a deterministic objective function, which is the maximum of the expected values of machine loads. Contrary to existing heuristics, we account for this stochastic nature by modeling task execution times as random variables. We, then, formulate a stochastic scheduling problem where the objective is to minimize the expected value of the maximum of machine loads. We prove that this new objective is underestimated by the deterministic objective function and that an optimal task assignment obtained with respect to the deterministic objective function could be inefficient in a real computing platform. In order to solve the stochastic scheduling problem posed, we develop a genetic algorithm based scheduling heuristic. Our extensive simulation studies show that the proposed genetic algorithm can produce better task assignments as compared to existing heuristics. Specifically, we observe a performance improvement on the relative cost heuristic (M.-Y. Wu and W. Shu, A high-performance mapping algorithm for heterogeneous computing systems, in: Int. Parallel and Distributed Processing Symposium, San Francisco, CA, April 2001) by up to 61%.  相似文献   

16.
Signal processing in the cerebral cortex is thought to involve a common multi-purpose algorithm embodied in a canonical cortical micro-circuit that is replicated many times over both within and across cortical regions. Operation of this algorithm produces widely distributed but coherent and relevant patterns of activity. The theory of Coherent Infomax provides a formal specification of the objectives of such an algorithm. It also formally derives specifications for both the short-term processing dynamics and for the learning rules whereby the connection strengths between units in the network can be adapted to the environment in which the system finds itself. A central assumption of the theory is that the local processors can combine reliable signal coding with flexible use of those codes because they have two classes of synaptic connection: driving connections which specify the information content of the neural signals, and contextual connections which modulate that signal processing. Here, we make the biological relevance of this theory more explicit by putting more emphasis upon the contextual guidance of ongoing processing, by showing that Coherent Infomax is consistent with a particular Bayesian interpretation for the contextual guidance of learning and processing, by explicitly specifying rules for on-line learning, and by suggesting approximations by which the learning rules can be made computationally feasible within systems composed of very many local processors.  相似文献   

17.
We present a novel algorithm for the efficient generation of high-quality space-filling molecular graphics that is particularly appropriate for the creation of the large number of images needed in the animation of molecular dynamics. Each atom of the molecule is represented by a sphere of an appropriate radius, and the image of the sphere is constructed pixel-by-pixel using a generalization of the lighting model proposed by Porter (Comp. Graphics 1978, 12, 282). The edges of the spheres are antialiased, and intersections between spheres are handled through a simple blending algorithm that provides very smooth edges. We have implemented this algorithm on a multiprocessor computer using a procedure that dynamically repartitions the effort among the processors based on the CPU time used by each processor to create the previous image. This dynamic reallocation among processors automatically maximizes efficiency in the face of both the changing nature of the image from frame to frame and the shifting demands of the other programs running simultaneously on the same processors. We present data showing the efficiency of this multiprocessing algorithm as the number of processors is increased. The combination of the graphics and multiprocessor algorithms allows the fast generation of many high-quality images.  相似文献   

18.
Divisible Load Scheduling in Systems with Limited Memory   总被引:3,自引:0,他引:3  
In this work we consider scheduling divisible loads on a distributed computing system with limited available memory. The communication delays and heterogeneity of the system are taken into account. The problem studied consists in finding such a distribution of the load that the communication and computation time is the shortest possible. A new robust method is proposed to solve the problem of finding optimal distribution of computations on star network, and networks in which binomial trees can be embedded (meshes, hypercubes, multistage interconnections). We demonstrate that in many cases memory limitations do not restrict efficiency of parallel processing as much as computation and communication speeds.  相似文献   

19.
User satisfaction and scheduling on grids makes predictability of response times and quality-of-service highly desirable. However, existing approaches for response-time prediction still show significant prediction errors, mostly due to problems in dynamic arrival of jobs with potentially higher priority and hard-to-anticipate packing and backfilling effects. The same problems imply that quality-of-service cannot be solved with standard approaches from communication systems. Thus, this paper presents a scheduling approach which provides a more suitable framework for service guarantees and predictability. The approach is based on coarse-grain preemption, combined with an innovative separation of job classes. Resource shares can be determined as necessary to meet target service levels. A further extension permits limited dynamic resource allocation to adapt to variations in machine load and job mixes. The feasibility of service control is demonstrated with various workloads.  相似文献   

20.
In heterogeneous environments, dynamic scheduling algorithms are a powerful tool towards performance improvement of scientific applications via load balancing. However, these scheduling techniques employ heuristics that require prior knowledge about workload via profiling resulting in higher overhead as problem sizes and number of processors increase. In addition, load imbalance may appear only at run-time, making profiling work tedious and sometimes even obsolete. Recently, the integration of dynamic loop scheduling algorithms into a number of scientific applications has been proven effective. This paper reports on performance improvements obtained by integrating the Adaptive Weighted Factoring, a recently proposed dynamic loop scheduling technique that addresses these concerns, into two scientific applications: computational field simulation on unstructured grids, and N-Body simulations. Reported experimental results confirm the benefits of using this methodology, and emphasize its high potential for future integration into other scientific applications that exhibit substantial performance degradation due to load imbalance.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号