期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A Multistage Load Distribution Strategy for Three-Dimensional Meshes

Włodzimierz Głazek 《Cluster computing》2003,6(1):31-39

We study the problem of scheduling a divisible load in a three-dimensional mesh of processors. The objective is to find partition of a load into shares and distribution of load shares among processors which minimize load processing time subject to communication delays involved in sending load from one processor to another. We propose a new scheduling algorithm which distributes load in a sequence of stages across the network, each stage brings load to a set of processors located at the same distance from the load source. A key feature of our solution is that sets of processors receive load in the order of decreasing processing capacities. We call this scheduling strategy Largest Layer First. A theorem about the processing time attained by the algorithm is stated. Performance of the algorithm is compared to earlier results. 相似文献

2.

PPDD: scheduling multi-site divisible loads in single-level tree networks

Xiaolin Li Bharadwaj Veeravalli 《Cluster computing》2010,13(1):31-46

This paper investigates scheduling strategies for divisible jobs/loads originating from multiple sites in hierarchical networks with heterogeneous processors and communication channels. In contrast, most previous work in the divisible load scheduling theory (DLT) literature mainly addressed scheduling problems with loads originating from a single processor. This is one of the first works that address scheduling multiple loads from multiple sites in the DLT paradigm. In addition, scheduling multi-site jobs is common in Grids and other general distributed systems for resource sharing and coordination. An efficient static scheduling algorithm PPDD (Processor-set Partitioning and Data Distribution Algorithm) is proposed to near-optimally distribute multiple loads among all processors so that the overall processing time of all jobs is minimized. The PPDD algorithm is applied to two cases: when processors are equipped with front-ends and when they are not equipped with front-ends. The application of the algorithm to homogeneous systems is also studied. Further, several important properties exhibited by the PPDD algorithm are proven through lemmas. To implement the PPDD algorithm, we propose a communication strategy. In addition, we compare the performance of the PPDD algorithm with a Round-robin Scheduling Algorithm (RSA), which is most commonly used. Extensive case studies through numerical analysis have been conducted to verify the theoretical findings. 相似文献

3.

An Equivalent Tree Network Methodology for Efficient Utilization of Front-Ends in Linear Network

V. Mani 《Cluster computing》2003,6(1):57-62

In this paper, divisible load scheduling in a linear network of processors is presented. The cases of processing load originating at the boundary and also at the interior of the network are considered. An equivalent tree network for the given linear network is derived. Using this equivalent tree network, we prove all the results obtained in the earlier studies. The equivalent tree network methodology presented in this paper, is more general than the earlier results, because in this approach, we can solve the scheduling problem even in an hetrogeneous linear network. The earlier studies considered only homogeneous linear network. 相似文献

4.

An effective reliability-driven technique of allocating tasks on heterogeneous cluster systems

Xiaoyong Tang Kenli Li Guiping Liao 《Cluster computing》2014,17(4):1413-1425

In large-scale heterogeneous cluster computing systems, processor and network failures are inevitable and can have an adverse effect on applications executing on such systems. One way of taking failures into account is to employ a reliable scheduling algorithm. However, most existing scheduling algorithms for precedence constrained tasks in heterogeneous systems only consider scheduling length, and not efficiently satisfy the reliability requirements of task. In recognition of this problem, we build an application reliability analysis model based on Weibull distribution, which can dynamically measure the reliability of task executing on heterogeneous cluster with arbitrary networks architectures. Then, we propose a reliability-driven earliest finish time with duplication scheduling algorithm (REFTD) which incorporates task reliability overhead into scheduling. Furthermore, to improve system reliability, it duplicates task as if task hazard rate is more than threshold \(\theta \) . The comparison study, based on both randomly generated graphs and the graphs of some real applications, shows that our scheduling algorithm can shorten schedule length and improve system reliability significantly. 相似文献

5.

A task scheduling algorithm for arbitrarily-connected processors with awareness of link contention

Ali Fuat Alkaya Haluk Rahmi Topcuoglu 《Cluster computing》2006,9(4):417-431

In this paper, we present a new task scheduling algorithm, called Contention-Aware Scheduling (CAS) algorithm, with the objective of delivering good quality of schedules in low running-time by considering contention on links of arbitrarily-connected, heterogeneous processors. The CAS algorithm schedules tasks on processors and messages on links by considering the earliest finish time attribute with the virtual cut-through (VCT) or the store-and-forward (SAF) switching. There are three types of CAS algorithm presented in this paper, which differ in ordering the messages from immediate predecessor tasks. As part of the experimental study, the performance of the CAS algorithm is compared with two well-known APN (arbitrary processor network) scheduling algorithms. Experiments on the results of the synthetic benchmarks and the task graphs of the well-known problems clearly show that our CAS algorithm outperforms the related work with respect to performance (given in normalized schedule length) and cost (given in running time) to generate output schedules. Ali Fuat Alkaya received the B.Sc. degree in mathematics from Koc University, Istanbul, Turkey in 1998, and the M.Sc. degree in computer engineering from Marmara University, Istanbul, Turkey in 2002. He is currently a Ph.D. student in engineering management department at the same university. His research interests include task scheduling and analysis of algorithms. Haluk Rahmi Topcuoglu received the B.Sc. and M.Sc. degrees in computer engineering from Bogazici University, Istanbul, Turkey, in 1991 and 1993, respectively. He received the Ph.D. degree in computer science from Syracuse University in 1999. He has been on the faculty at Marmara University, Istanbul, Turkey since Fall 1999, where he is currently an Associate Professor in computer engineering department. His main research interests are task scheduling and mapping in parallel and distributed systems; parallel processing; evolutionary algorithms and their applicability for stationary and dynamic environments. He is a member of the ACM, the IEEE, and the IEEE Computer Society. e-mail: haluk@eng.marmara.edu.tr e-mail: falkaya@eng.marmara.edu.tr 相似文献

6.

Block‐cyclic redistribution over heterogeneous networks

Prashanth B. Bhat Viktor K. Prasanna C.S. Raghavendra 《Cluster computing》2000,3(1):25-34

Clusters of workstations and networked parallel computing systems are emerging as promising computational platforms for HPC applications. The processors in such systems are typically interconnected by a collection of heterogeneous networks such as Ethernet, ATM, and FDDI, among others. In this paper, we develop techniques to perform block-cyclic redistribution over P processors interconnected by such a collection of heterogeneous networks. We represent the communication scheduling problem using a timing diagram formalism. Here, each interprocessor communication event is represented by a rectangle whose height denotes the time to perform this event over the heterogeneous network. The communication scheduling problem is then one of appropriately positioning the rectangles so as to minimize the completion time of all the communication events. For the important case where the block size changes by a factor of K, we develop a heuristic algorithm whose completion time is at most twice the optimal. The running time of the heuristic is O(PK ²). Our heuristic algorithm is adaptive to variations in network performance, and derives schedules at run-time, based on current information about the available network bandwidth. Our experimental results show that our schedules always have communication times that are very close to optimal. This revised version was published online in July 2006 with corrections to the Cover Date. 相似文献

7.

Energy aware DAG scheduling on heterogeneous systems 总被引：1，自引：0，他引：1

Sanjeev Baskiyar Rabab Abdel-Kader 《Cluster computing》2010,13(4):373-383

We address the problem of scheduling directed a-cyclic task graph (DAG) on a heterogeneous distributed processor system with the twin objectives of minimizing finish time and energy consumption. Previous scheduling heuristics have assigned DAGs to processors to minimize overall run-time of the application. But applications on embedded systems, such as high performance DSP in image processing, multimedia, and wireless security, need schedules which use low energy too. 相似文献

8.

Parallel Processing of Divisible Loads on Partitionable Static Interconnection Networks

Keqin Li 《Cluster computing》2003,6(1):47-55

We analyze the parallel time and speedup for processing a divisible load on (1) a linear array with a corner initial processor; (2) a linear array with an interior initial processor; (3) a mesh with a corner initial processor; (4) a mesh with an interior initial processor; (5) a b-ary complete tree with the root as the initial processor; (6) a pyramid with the apex as the initial processor. Due to communication overhead and limited network connectivity, the speedup of parallel processing for a divisible load on static interconnection networks with constant node degrees is bounded from above by a quantity independent of network size. It is shown that for the above six cases, as the network size becomes large, the asymptotic speedup is approximately , 2 , ^3/4, 4^3/4, (b–1), and 3, respectively, where is the ratio of the time for computing a unit load to the time for communicating a unit load. We also investigate divisible load distribution on hypercubes. Our strategy takes advantage of the recursive structure of a hypercube. It is proven that linear speedup can be achieved as the communication cost becomes smaller and smaller. 相似文献

9.

A Novel Optimal Load Distribution Algorithm for Divisible Loads

Hyoung-Joong Kim 《Cluster computing》2003,6(1):41-46

A new model for divisible load problem is introduced. Its characteristics are analyzed. Optimal load distribution algorithms on the new model are presented for the tree-network and linear network. Applications that fit our model are briefly described. We show that our model outperforms the existing model such as Cheng–Robertazzi model. We show that the linear model is equivalent to a single-level tree network if the intermediate processors do not follow the store-and-forward communication model, but they follow the store-and-bypass model. This paper introduces the concept of store-and-bypass for divisible load theory. 相似文献

10.

Link contention-constrained scheduling and mapping of tasks and messages to a network of heterogeneous processors 总被引：1，自引：0，他引：1

Yu-Kwong Kwok Ishfaq Ahmad 《Cluster computing》2000,3(2):113-124

In this paper, we consider the problem of scheduling and mapping precedence-constrained tasks to a network of heterogeneous processors. In such systems, processors are usually physically distributed, implying that the communication cost is considerably higher than in tightly coupled multiprocessors. Therefore, scheduling and mapping algorithms for such systems must schedule the tasks as well as the communication traffic by treating both the processors and communication links as equally important resources. We propose an algorithm that achieves these objectives and adapts its task scheduling and mapping decisions according to the given network topology. Just like tasks, messages are also scheduled and mapped to suitable links during the minimization of the finish times of tasks. Heterogeneity of processors is exploited by scheduling critical tasks to the fastest processors. Our experimental study has demonstrated that the proposed algorithm is efficient and robust, and yields consistent performance over a wide range of scheduling parameters. This revised version was published online in July 2006 with corrections to the Cover Date. 相似文献

11.

Divisible Load Scheduling in Systems with Limited Memory 总被引：3，自引：0，他引：3

M. Drozdowski P. Wolniewicz 《Cluster computing》2003,6(1):19-29

In this work we consider scheduling divisible loads on a distributed computing system with limited available memory. The communication delays and heterogeneity of the system are taken into account. The problem studied consists in finding such a distribution of the load that the communication and computation time is the shortest possible. A new robust method is proposed to solve the problem of finding optimal distribution of computations on star network, and networks in which binomial trees can be embedded (meshes, hypercubes, multistage interconnections). We demonstrate that in many cases memory limitations do not restrict efficiency of parallel processing as much as computation and communication speeds. 相似文献

12.

Stochastic scheduling for multiclass applications with availability requirements in heterogeneous clusters

Tao Xie Xiao Qin 《Cluster computing》2008,11(1):33-43

High availability plays an important role in heterogeneous clusters, where processors operate at different speeds and are not continuously available for processing. Existing scheduling algorithms designed for heterogeneous clusters do not factor in availability. We address in this paper the stochastic scheduling problem for heterogeneous clusters with availability constraints. Each node in a heterogeneous cluster is modeled by its speed and availability, and different classes of tasks submitted to the cluster are characterized by their execution times and availability requirements. To incorporate availability and heterogeneity into stochastic scheduling, we introduce metrics to quantify availability and heterogeneity in the context of multiclass tasks. A stochastic scheduling algorithm SSAC (stochastic scheduling with availability constraints) is then proposed to improve availability of heterogeneous clusters while reducing average response time of tasks. Experimental results show that our algorithm achieves a good trade-off between availability and responsiveness.

Tao XieEmail:

相似文献

13.

Scheduling Divisible Loads with Processor Release Times and Finite Size Buffer Capacity Constraints in Bus Networks

Veeravalli Bharadwaj Gerassimos Barlas 《Cluster computing》2003,6(1):63-74

相似文献

14.

Optimizing dataflow applications on heterogeneous environments

George Teodoro Timothy D. R. Hartley Umit V. Catalyurek Renato Ferreira 《Cluster computing》2012,15(2):125-144

The increases in multi-core processor parallelism and in the flexibility of many-core accelerator processors, such as GPUs, have turned traditional SMP systems into hierarchical, heterogeneous computing environments. Fully exploiting these improvements in parallel system design remains an open problem. Moreover, most of the current tools for the development of parallel applications for hierarchical systems concentrate on the use of only a single processor type (e.g., accelerators) and do not coordinate several heterogeneous processors. Here, we show that making use of all of the heterogeneous computing resources can significantly improve application performance. Our approach, which consists of optimizing applications at run-time by efficiently coordinating application task execution on all available processing units is evaluated in the context of replicated dataflow applications. The proposed techniques were developed and implemented in an integrated run-time system targeting both intra- and inter-node parallelism. The experimental results with a real-world complex biomedical application show that our approach nearly doubles the performance of the GPU-only implementation on a distributed heterogeneous accelerator cluster. 相似文献

15.

Fully implicit parallel simulation of single neurons 总被引：1，自引：1，他引：0

Hines ML Markram H Schürmann F 《Journal of computational neuroscience》2008,25(3):439-448

When a multi-compartment neuron is divided into subtrees such that no subtree has more than two connection points to other subtrees, the subtrees can be on different processors and the entire system remains amenable to direct Gaussian elimination with only a modest increase in complexity. Accuracy is the same as with standard Gaussian elimination on a single processor. It is often feasible to divide a 3-D reconstructed neuron model onto a dozen or so processors and experience almost linear speedup. We have also used the method for purposes of load balance in network simulations when some cells are so large that their individual computation time is much longer than the average processor computation time or when there are many more processors than cells. The method is available in the standard distribution of the NEURON simulation program. 相似文献

16.

A Load Balancing Tool for Distributed Parallel Loops 总被引：1，自引：0，他引：1

Ricolindo?L.?Cari?o Email author Ioana?Banicescu 《Cluster computing》2005,8(4):313-321

Large scale applications typically contain parallel loops with many iterates. The iterates of a parallel loop may have variable execution times which translate into performance degradation of an application due to load imbalance. This paper describes a tool for load balancing parallel loops on distributed-memory systems. The tool assumes that the data for a parallel loop to be executed is already partitioned among the participating processors. The tool utilizes the MPI library for interprocessor coordination, and determines processor workloads by loop scheduling techniques. The tool was designed independent of any application; hence, it must be supplied with a routine that encapsulates the computations for a chunk of loop iterates, as well as the routines to transfer data and results between processors. Performance evaluation on a Linux cluster indicates that the tool reduces the cost of executing a simulated irregular loop without load balancing by up to 81%. The tool is useful for parallelizing sequential applications with parallel loops, or as an alternate load balancing routine for existing parallel applications. 相似文献

17.

On the Scalability of Dynamic Scheduling Scientific Applications with Adaptive Weighted Factoring 总被引：1，自引：0，他引：1

Ioana Banicescu Vijay Velusamy Johnny Devaprasad 《Cluster computing》2003,6(3):215-226

In heterogeneous environments, dynamic scheduling algorithms are a powerful tool towards performance improvement of scientific applications via load balancing. However, these scheduling techniques employ heuristics that require prior knowledge about workload via profiling resulting in higher overhead as problem sizes and number of processors increase. In addition, load imbalance may appear only at run-time, making profiling work tedious and sometimes even obsolete. Recently, the integration of dynamic loop scheduling algorithms into a number of scientific applications has been proven effective. This paper reports on performance improvements obtained by integrating the Adaptive Weighted Factoring, a recently proposed dynamic loop scheduling technique that addresses these concerns, into two scientific applications: computational field simulation on unstructured grids, and N-Body simulations. Reported experimental results confirm the benefits of using this methodology, and emphasize its high potential for future integration into other scientific applications that exhibit substantial performance degradation due to load imbalance. 相似文献

18.

The research of constructing dynamic cognition model based on brain network

《Saudi Journal of Biological Sciences》2017,24(3):548-555

Estimating the functional interactions and connections between brain regions to corresponding process in cognitive, behavioral and psychiatric domains is a central pursuit for understanding the human connectome. Few studies have examined the effects of dynamic evolution on cognitive processing and brain activation using brain network model in scalp electroencephalography (EEG) data. Aim of this study was to investigate the brain functional connectivity and construct dynamic programing model from EEG data and to evaluate a possible correlation between topological characteristics of the brain connectivity and cognitive evolution processing. Here, functional connectivity between brain regions is defined as the statistical dependence between EEG signals in different brain areas and is typically determined by calculating the relationship between regional time series using wavelet coherence. We present an accelerated dynamic programing algorithm to construct dynamic cognitive model that we found that spatially distributed regions coherence connection difference, the topologic characteristics with which they can transfer information, producing temporary network states. Our findings suggest that brain dynamics give rise to variations in complex network properties over time after variation audio stimulation, dynamic programing model gives the dynamic evolution processing at different time and frequency. In this paper, by applying a new construct approach to understand whole brain network dynamics, firstly, brain network is constructed by wavelet coherence, secondly, different time active brain regions are selected by network topological characteristics and minimum spanning tree. Finally, dynamic evolution model is constructed to understand cognitive process by dynamic programing algorithm, this model is applied to the auditory experiment, results showed that, quantitatively, more correlation was observed after variation audio stimulation, the EEG function connection dynamic evolution model on cognitive processing is feasible with wavelet coherence EEG recording. 相似文献

19.

HSGA: a hybrid heuristic algorithm for workflow scheduling in cloud systems

Arash Ghorbannia Delavar Yalda Aryan 《Cluster computing》2014,17(1):129-137

In heterogeneous distributed computing systems like cloud computing, the problem of mapping tasks to resources is a major issue which can have much impact on system performance. For some reasons such as heterogeneous and dynamic features and the dependencies among requests, task scheduling is known to be a NP-complete problem. In this paper, we proposed a hybrid heuristic method (HSGA) to find a suitable scheduling for workflow graph, based on genetic algorithm in order to obtain the response quickly moreover optimizes makespan, load balancing on resources and speedup ratio. At first, the HSGA algorithm makes tasks prioritization in complex graph considering their impact on others, based on graph topology. This technique is efficient to reduction of completion time of application. Then, it merges Best-Fit and Round Robin methods to make an optimal initial population to obtain a good solution quickly, and apply some suitable operations such as mutation to control and lead the algorithm to optimized solution. This algorithm evaluates the solutions by considering efficient parameters in cloud environment. Finally, the proposed algorithm presents the better results with increasing number of tasks in application graph in contrast with other studied algorithms. 相似文献

20.

Managing performance and power consumption tradeoff for multiple heterogeneous servers in cloud computing

Yuan Tian Chuang Lin Keqin Li 《Cluster computing》2014,17(3):943-955

There are typically multiple heterogeneous servers providing various services in cloud computing. High power consumption of these servers increases the cost of running a data center. Thus, there is a problem of reducing the power cost with tolerable performance degradation. In this paper, we optimize the performance and power consumption tradeoff for multiple heterogeneous servers. We consider the following problems: (1) optimal job scheduling with fixed service rates; (2) joint optimal service speed scaling and job scheduling. For problem (1), we present the Karush-Kuhn-Tucker (KKT) conditions and provide a closed-form solution. For problem (2), both continuous speed scaling and discrete speed scaling are considered. In discrete speed scaling, the feasible service rates are discrete and bounded. We formulate the problem as an MINLP problem and propose a distributed algorithm by online value iteration, which has lower complexity than a centralized algorithm. Our approach provides an analytical way to manage the tradeoff between performance and power consumption. The simulation results show the gain of using speed scaling, and also prove the effectiveness and efficiency of the proposed algorithms. 相似文献