20 similar documents found.
1.
Divisible Load Scheduling in Systems with Limited Memory (total citations: 3, self-citations: 0, citations by others: 3)
In this work we consider scheduling divisible loads on a distributed computing system with limited available memory. The communication delays and the heterogeneity of the system are taken into account. The problem studied consists in finding a distribution of the load for which the total communication and computation time is as short as possible. A new robust method is proposed for finding the optimal distribution of computations on star networks and on networks in which binomial trees can be embedded (meshes, hypercubes, multistage interconnections). We demonstrate that in many cases memory limitations do not restrict the efficiency of parallel processing as much as computation and communication speeds do.
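For background, the classical star-network (single-level tree) recursion from divisible load theory shows how such load fractions are computed when all workers must finish simultaneously; the memory-clipping step below is only an illustrative heuristic, not the robust method proposed in the paper, and all parameter names are assumptions.

```python
def star_fractions(w, z, Tcp=1.0, Tcm=1.0):
    """Standard divisible-load fractions for a star network: the source
    sends share alpha_i to worker i in sequence, worker i computes after
    receiving, and optimality requires all workers to finish together.
    w[i], z[i]: time for worker i (its link) to process (receive) a unit
    load."""
    rel = [1.0]                               # alpha_i relative to alpha_0
    for i in range(1, len(w)):
        rel.append(rel[-1] * (w[i - 1] * Tcp) / (z[i] * Tcm + w[i] * Tcp))
    total = sum(rel)
    return [r / total for r in rel]


def clip_to_memory(alpha, cap):
    """Illustrative heuristic only: clip each fraction to that node's
    memory capacity (expressed as a fraction of the total load) and
    spread the surplus over nodes that still have room."""
    alpha = list(alpha)
    for _ in range(len(alpha)):
        surplus = sum(max(a - c, 0.0) for a, c in zip(alpha, cap))
        if surplus < 1e-12:
            break
        alpha = [min(a, c) for a, c in zip(alpha, cap)]
        room = [c - a for a, c in zip(alpha, cap)]
        if sum(room) <= 0.0:
            break                             # total memory cannot hold the load
        alpha = [a + surplus * r / sum(room) for a, r in zip(alpha, room)]
    return alpha
```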
2.
Design and Performance Analysis of Divisible Load Scheduling Strategies on Arbitrary Graphs (total citations: 1, self-citations: 0, citations by others: 1)
In this paper, we consider the problem of scheduling divisible loads on arbitrary graphs with the objective of minimizing the total processing time of the entire load submitted for processing. We consider an arbitrary graph network comprising heterogeneous processors interconnected via heterogeneous links in an arbitrary fashion. The divisible load is assumed to originate at any processor in the network. We transform the problem into a multi-level unbalanced tree network and schedule the divisible load on it. We design systematic procedures to identify and eliminate any redundant processor-link pairs (pairs whose inclusion in the schedule would penalize performance) and derive an optimal tree structure that yields an optimal processing time for a fixed sequence of load distribution. Since the algorithm strives to determine an equivalent number of processors (resources) that can be used for processing the entire load, we refer to this approach as the resource-aware optimal load distribution (RAOLD) algorithm. We extend our study by applying the optimal sequencing theorem proposed in the literature for single-level tree networks to multi-level trees to obtain an optimal solution. We evaluate the performance for a wide range of arbitrary graphs with varying connectivity probabilities and processor densities, and study the effects of network scalability and connectivity. We demonstrate the time performance when the point of load origination differs in the network and highlight key features that may be useful for algorithm and/or network system designers. We evaluate the time performance with rigorous simulation experiments under different system parameters to provide a complete picture.
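As a rough sketch of the graph-to-tree step under simplifying assumptions, the code below derives a shortest-delay spanning tree rooted at the load origin; it omits the paper's elimination of redundant processor-link pairs, and the function and parameter names are hypothetical.

```python
import heapq

def equivalent_tree(links, origin):
    """Derive a tree from an arbitrary graph by building a shortest-delay
    spanning tree rooted at the processor where the load originates.
    links: {(u, v): delay}, treated as undirected.
    Returns {node: parent}, with the origin mapped to None."""
    adj = {}
    for (u, v), d in links.items():
        adj.setdefault(u, []).append((v, d))
        adj.setdefault(v, []).append((u, d))
    dist, parent = {origin: 0.0}, {origin: None}
    heap = [(0.0, origin)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue                      # stale queue entry
        for v, d in adj.get(u, []):
            if v not in dist or du + d < dist[v]:
                dist[v], parent[v] = du + d, u
                heapq.heappush(heap, (dist[v], v))
    return parent
```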
3.
Hyoung-Joong Kim, Cluster Computing, 2003, 6(1): 41-46
A new model for the divisible load problem is introduced and its characteristics are analyzed. Optimal load distribution algorithms on the new model are presented for tree networks and linear networks. Applications that fit our model are briefly described. We show that our model outperforms existing models such as the Cheng-Robertazzi model. We also show that the linear network is equivalent to a single-level tree network if the intermediate processors follow the store-and-bypass rather than the store-and-forward communication model. This paper introduces the concept of store-and-bypass to divisible load theory.
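A toy calculation, not the paper's model, can convey the difference between the two communication modes: with store-and-bypass a share effectively costs one link transmission wherever it is headed, which is why the chain behaves like a single-level tree. The parameters below are assumptions.

```python
def receive_times(alpha, z=1.0, bypass=True):
    """Toy model: the source emits shares back-to-back, and share i must
    cross i links to reach processor i. Store-and-forward retransmits the
    whole share at every hop; store-and-bypass relays the stream, so a
    share costs roughly one link transmission regardless of distance."""
    t, times = 0.0, []
    for hops, a in enumerate(alpha, start=1):
        t += a * z * (1 if bypass else hops)
        times.append(t)
    return times

# Equal shares to three chained processors:
# receive_times([1/3, 1/3, 1/3], bypass=False) -> [0.33, 1.00, 2.00]
# receive_times([1/3, 1/3, 1/3], bypass=True)  -> [0.33, 0.67, 1.00]
# With bypass the timing matches a source feeding each processor over a
# direct link, i.e. a single-level tree.
```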
4.
Włodzimierz Głazek, Cluster Computing, 2003, 6(1): 31-39
We study the problem of scheduling a divisible load on a three-dimensional mesh of processors. The objective is to find a partition of the load into shares and a distribution of those shares among the processors that minimize the processing time, subject to the communication delays involved in sending load from one processor to another. We propose a new scheduling algorithm that distributes the load across the network in a sequence of stages; each stage brings load to the set of processors located at the same distance from the load source. A key feature of our solution is that these sets of processors receive load in the order of decreasing processing capacity. We call this scheduling strategy Largest Layer First. A theorem about the processing time attained by the algorithm is stated, and the performance of the algorithm is compared to earlier results.
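Below is a sketch of the layer construction behind a Largest Layer First style schedule, assuming hop (Manhattan) distance in the mesh and known per-node speeds; it derives only the service order of layers, not the optimal share sizes, and the names are assumptions.

```python
from itertools import product

def largest_layer_first_order(dims, source, speed):
    """Group the nodes of a 3-D mesh into layers by hop (Manhattan)
    distance from the load source, then order the layers by decreasing
    total processing capacity. speed[node] is that node's processing
    rate; share sizes per layer are not derived here."""
    layers = {}
    for node in product(*(range(d) for d in dims)):
        dist = sum(abs(a - b) for a, b in zip(node, source))
        layers.setdefault(dist, []).append(node)
    capacity = {d: sum(speed[n] for n in nodes) for d, nodes in layers.items()}
    return sorted(layers, key=lambda d: capacity[d], reverse=True)

# Example: a 3x3x3 mesh of unit-speed nodes with the source in a corner.
# largest_layer_first_order((3, 3, 3), (0, 0, 0),
#                           {n: 1.0 for n in product(range(3), repeat=3)})
```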
5.
A Load Balancing Tool for Distributed Parallel Loops (total citations: 1, self-citations: 0, citations by others: 1)
Large-scale applications typically contain parallel loops with many iterates. The iterates of a parallel loop may have variable execution times, which translate into performance degradation of an application due to load imbalance. This paper describes a tool for load balancing parallel loops on distributed-memory systems. The tool assumes that the data for a parallel loop to be executed is already partitioned among the participating processors. It utilizes the MPI library for interprocessor coordination and determines processor workloads by loop scheduling techniques. The tool was designed to be independent of any application; hence, it must be supplied with a routine that encapsulates the computations for a chunk of loop iterates, as well as routines to transfer data and results between processors. Performance evaluation on a Linux cluster indicates that the tool reduces the cost of executing a simulated irregular loop by up to 81% compared with execution without load balancing. The tool is useful for parallelizing sequential applications with parallel loops, or as an alternative load balancing routine for existing parallel applications.
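For concreteness, here is a sketch of one classic loop-scheduling rule (factoring) of the kind such a tool can use to determine processor workloads; whether the tool uses this exact rule is an assumption.

```python
import math

def factoring_chunks(n_iters, n_procs):
    """One classic loop-scheduling rule (factoring): iterations are handed
    out in batches of n_procs chunks, each chunk sized at half of the
    remaining work divided by the number of processors, so chunks shrink
    as the loop drains and late stragglers stay small."""
    chunks, remaining = [], n_iters
    while remaining > 0:
        size = max(1, math.ceil(remaining / (2 * n_procs)))
        for _ in range(n_procs):
            if remaining <= 0:
                break
            take = min(size, remaining)
            chunks.append(take)
            remaining -= take
    return chunks

# factoring_chunks(100, 4) -> [13, 13, 13, 13, 6, 6, 6, 6, 3, 3, 3, 3, ...]
```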
6.
7.
The Rutgers Computational Grid (RCG) project is aimed at providing high-throughput performance to Rutgers University faculty and students. The RCG employs dual-processor PCs with Pentium II and III processors as computational nodes, running the Red Hat Linux operating system. The Load Sharing Facility (LSF) scheduling system from Platform Computing is used for job control and monitoring. The nodes are grouped into subclusters physically located in several departments and controlled by a single master node through LSF. The hardware and software used in the RCG are described, and utilization and performance issues, including parallel performance, are discussed based on the experience of the first two years of RCG operation.
8.
Video-on-Demand (VoD) systems are expected to support a variety of multimedia services for users, such as tele-education, teleconferencing, remote working, videotelephony, and high-definition TV. These applications necessitate abundant bandwidth and buffer space as well as appropriate software and hardware for the efficient management of network resources.
In this work we investigate a promising scheduling algorithm referred to as the Deadline Credit (DC) algorithm, which exploits the available bandwidth and buffer space to serve a diverse class of prerecorded video applications. We provide simulation results for the DC algorithm applied to a distributed VoD network with a hierarchical architecture, which fits the existing tree topology used in today's cable TV systems. The issues investigated via the simulations are: system utilization, the influence of buffer space on the delivered Quality of Service, and the fairness of the scheduling mechanism. We examine cases with homogeneous as well as diverse video streams, and extend our system to support interactive VCR-like functions. We also contribute a modification to the DC algorithm so that, when video applications have different display periods, the video streams obtain a fair share of the network's resources. Finally, we validate our results by simulating actual videos encoded in the MPEG-4 and H.263 formats.
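Purely as an illustration of deadline-driven stream selection, and not the DC algorithm itself, one might picture a rule like the following; the stream fields are hypothetical.

```python
def pick_next_stream(streams):
    """Illustrative deadline-driven selection (not the DC algorithm):
    among streams whose client buffer still has room, serve the one with
    the least buffered playback time, i.e. the earliest risk of missing
    a display deadline. Stream fields are hypothetical:
    {'id': ..., 'buffered_s': seconds buffered, 'buffer_cap_s': capacity}."""
    eligible = [s for s in streams if s['buffered_s'] < s['buffer_cap_s']]
    if not eligible:
        return None                 # every client buffer is full; stay idle
    return min(eligible, key=lambda s: s['buffered_s'])
```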
9.
Ouajdi Korbaa, Hervé Camus, Jean-Claude Gentina, Flexible Services and Manufacturing Journal, 2002, 14(2): 173-187
Flexible manufacturing system control is an NP-hard problem. A cyclic approach has been shown to be adequate for infinite scheduling problems because it can reach maximal throughput. Throughput, however, is not the only optimization criterion in general. In this article we consider the minimization of work in process (WIP) as an economic and productivity factor. We propose a new cyclic scheduling algorithm that gives the maximal throughput (a hard constraint) while minimizing WIP. The algorithm is based on the progressive placement of operations, and a controlled beam search approach has been developed to determine at each step the schedule of the next operations. After presenting the main principles of the algorithm, we compare our approach to several well-known cyclic scheduling algorithms using a significant example from the literature.
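The controlled beam search can be pictured with a generic skeleton like the one below; it is only a sketch of the search control, with the operation-placement `expand` and WIP-oriented `score` functions left abstract.

```python
def beam_search(initial, expand, score, width):
    """Generic controlled beam search: every partial schedule in the beam
    is extended by placing one more operation (expand), and only the
    `width` best extensions (lowest score, e.g. estimated WIP among
    maximal-throughput schedules) are kept. Assumes all schedules stop
    yielding successors at the same depth, i.e. when every operation of
    the cycle has been placed."""
    beam = [initial]
    while True:
        candidates = [s for state in beam for s in expand(state)]
        if not candidates:
            return min(beam, key=score)      # all schedules are complete
        beam = sorted(candidates, key=score)[:width]
```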
10.
While the MPP is still the most common architecture in supercomputer centers today, a simpler and cheaper machine configuration is appearing at many supercomputing sites. This alternative setup may be described simply as a collection of multiprocessors, or a distributed server system. This collection of multiprocessors is fed by a single common stream of jobs, where each job is dispatched to exactly one of the multiprocessor machines for processing. The biggest question that arises in such distributed server systems is what constitutes a good rule for assigning jobs to host machines, i.e., a good task assignment policy. Many task assignment policies have been proposed, but not systematically evaluated under supercomputing workloads. In this paper we start by comparing existing task assignment policies using trace-driven simulation under supercomputing workloads. We validate our experiments by providing analytical proofs of the performance of each of these policies; these proofs also help provide much intuition. We find that while the performance of supercomputing servers varies widely with the task assignment policy, none of the above task assignment policies perform as well as we would like. We observe that all policies proposed thus far aim to balance load among the hosts. We propose a policy which purposely unbalances load among the hosts yet, counter to intuition, is also fair in that it achieves the same expected slowdown for all jobs, so no jobs are biased against. We evaluate this policy again using both trace-driven simulation and analysis. We find that the performance of the load-unbalancing policy is significantly better than the best of the policies that balance load.
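As an illustration of how a dispatcher can deliberately shape rather than balance the load, here is a sketch of a generic size-interval assignment rule; the cutoffs, and the policy of choosing them for equal expected slowdown rather than equal load, are assumptions standing in for the paper's actual policy.

```python
import bisect

def size_interval_dispatch(job_size, cutoffs):
    """Send a job to the host responsible for its size interval: host i
    serves jobs with size in [cutoffs[i-1], cutoffs[i]); cutoffs must be
    sorted ascending. Choosing the cutoffs so that every host sees the
    same expected slowdown, rather than the same load, is the kind of
    deliberate unbalancing studied in the paper; the cutoffs are an
    input here."""
    return bisect.bisect_right(cutoffs, job_size)   # index of the target host

# With cutoffs [10, 1000]: jobs under 10 s go to host 0, 10-1000 s jobs to
# host 1, and everything larger to host 2.
```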
11.
Akira Nomoto, Yasuo Watanabe, Wataru Kaneko, Shugo Nakamura, Kentaro Shimizu, Cluster Computing, 2004, 7(1): 65-72
This paper describes the design and implementation of a parallel programming environment called Distributed Shared Array (DSA), which provides a shared global array abstraction across different machines connected by a network. In DSA, users can define and use global arrays that can be accessed uniformly from any machine in the network. Management of array area allocation, replication, and migration is achieved through explicit calls for array manipulation: defining array regions, reading and writing array regions, synchronization, and control of replication and migration. The DSA is integrated with Grid (Globus) services. This paper also describes the use of our model for gene cluster analysis, multiple alignment, and molecular dynamics simulation. In these applications, global arrays are used for storing the distance matrix, the alignment matrix, and atom coordinates, respectively. Large array areas, which cannot be stored in the memory of individual machines, are made available by the DSA. The DSA achieved scalable performance compared to that of conventional parallel programs written in MPI.
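A hypothetical, single-machine stand-in for the kind of interface described, meant only to make the listed operations concrete; the class and method names are invented, not the DSA API.

```python
class GlobalArray:
    """Hypothetical, single-process stand-in for the style of interface
    described (define an array, read/write regions, synchronize); names
    and signatures are invented, not the DSA API. A real implementation
    would allocate, replicate and migrate regions across machines."""

    def __init__(self, length, fill=0.0):
        self.length = length
        self.fill = fill
        self.data = {}                 # sparse storage: index -> value

    def write_region(self, start, values):
        for offset, value in enumerate(values):
            self.data[start + offset] = value

    def read_region(self, start, count):
        return [self.data.get(start + i, self.fill) for i in range(count)]

    def sync(self):
        pass   # no-op locally; would flush and propagate updates between nodes

# dist = GlobalArray(10**9)          # e.g. one row-major distance matrix
# dist.write_region(0, [0.0, 2.5, 1.3]); dist.read_region(0, 3)
```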
12.
13.
14.
15.
16.
V. Mani, Cluster Computing, 2003, 6(1): 57-62
In this paper, divisible load scheduling in a linear network of processors is presented. The cases of the processing load originating at the boundary and at the interior of the network are both considered. An equivalent tree network for the given linear network is derived, and using this equivalent tree network we prove all the results obtained in earlier studies. The equivalent tree network methodology presented in this paper is more general than the earlier results because, with this approach, we can solve the scheduling problem even in a heterogeneous linear network, whereas the earlier studies considered only homogeneous linear networks.
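As a worked illustration of the equivalent-network idea in divisible load theory generally (not this paper's specific construction), the sketch below collapses a chain fed from one end into a single equivalent processor, assuming each node has a front end so it can compute while forwarding and all nodes finish simultaneously; the names are assumptions.

```python
def equivalent_speed(w, z, Tcp=1.0, Tcm=1.0):
    """Collapse a chain of processors fed from its first element into one
    'equivalent processor'. w[i]: time for processor i to compute a unit
    load; z[i]: time to send a unit load over the link from i to i+1.
    Assumes each node has a front end (computes while forwarding) and all
    nodes finish simultaneously, folding the chain from the far end in."""
    w_eq = w[-1]
    for i in range(len(w) - 2, -1, -1):
        w_eq = (w[i] * (z[i] * Tcm + w_eq * Tcp)
                / (w[i] * Tcp + z[i] * Tcm + w_eq * Tcp))
    return w_eq

# Two identical processors over a fast link behave almost twice as fast:
# equivalent_speed([1.0, 1.0], [0.01])  -> ~0.502
```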
17.
Fournier's gangrene is a necrotizing infection of the scrotum or perineum that requires aggressive surgical debridement. Radical debridement of perineal necrotizing fasciitis can leave extensive tissue defects that are difficult to close and often require multiple surgical interventions. Vacuum-assisted closure (VAC) devices have been shown to assist in more rapid closure of these wounds, but placement of such devices in the perineum can pose significant challenges. We have had success with the use of VAC devices and report our techniques for their placement.
18.
Focal therapy has been proposed in recent years as a means of bridging the gap between radical prostatectomy and active surveillance for treatment of prostate cancer. The rationale for focal therapy comes from its success in treating other malignancies. One of the challenges in applying such an approach to the treatment of prostate cancer has been the multifocal nature of the disease. This review addresses the selection of potentially ideal candidates for focal therapy and discusses which modalities are currently being used and proposed for focal therapy. Setting and meeting guidelines for oncologic efficacy is a challenge we must embrace to safely deliver this potentially revolutionary approach to treating men with prostate cancer.

Key words: Focal therapy, Photodynamic therapy, Prostatic neoplasms, Prostate-specific antigen, Prostatectomy, Ultrasound, high-intensity focused, transrectal, Cryosurgery

With the advent of prostate-specific antigen (PSA) screening there has been a stage migration, with radical prostatectomy (RP) being performed with increasing frequency in men with low-risk disease.1 Whole gland treatment of prostate cancer carries a significant risk of incontinence and sexual dysfunction. Even in the most experienced centers, the rate of potency following RP is approximately 60%.2–4 Stage migration has led many to recommend active surveillance (AS) as a means to decrease the number of men who may be overtreated; however, AS has been slow to gain acceptance in the United States. An analysis of over 5300 men from the Cancer of the Prostate Strategic Urologic Research Endeavor (CaPSURE) National Prostate Cancer Registry5 showed that only 7% of men with clinically localized prostate cancer chose AS as an initial option. Aside from the anxiety that stems from not treating a diagnosed cancer, the greater difficulty with AS lies in selection of candidates and appropriate parameters for surveillance, allowing prompt intervention without compromising cure rates.

Focal therapy has been proposed in recent years as a means of bridging the gap between whole gland treatment and AS. Many believe that for patients with low-risk disease, focal therapy is the ideal option for maximizing quality of life by avoiding the effects of whole gland radiation or surgery while alleviating the anxiety and uncertainty of AS. The definition of focal therapy itself is not well established and includes lesion-targeted therapy (LAT), hemiablative therapy (HAT), or subtotal gland therapy (STAT), sparing at least one neurovascular bundle.6

The rationale for focal therapy comes from its success in treating other malignancies. In breast cancer treatment, for example, radical mastectomy has been replaced in many instances by local excision, and Mohs surgery has led to less radical surgery for the treatment of melanoma.7 In our own field, the push for nephron-sparing surgery has led to the favoring of partial nephrectomy in tumors less than 7 cm, with oncologic outcomes similar to those of radical nephrectomy.8

The challenge in applying such an approach to the treatment of prostate cancer has been the multifocal nature of prostate cancer and the fact that most cancers are detected without identifying a lesion on palpation or imaging studies.9,10

In this review, we revisit the current status of focal therapy in the treatment of prostate cancer. We discuss whether there are ideal candidates for focal therapy; we then discuss how these candidates should be selected. We review which modalities are currently being used and proposed for focal therapy. Finally, we discuss potential definitions of successful treatment. As this article shows, there are still many aspects of focal therapy that are yet to be defined and that warrant further research.
19.
Rajagopal Subramaniyan, Pirabhu Raman, Alan D. George, Matthew Radlinski, Cluster Computing, 2006, 9(1): 101-120
Gossip protocols have proven to be an effective means by which failures can be detected in large, distributed systems in an asynchronous manner, without the limitations associated with reliable multicasting for group communications. In this paper, we discuss the development and features of a Gossip-Enabled Monitoring Service (GEMS), a highly responsive and scalable resource monitoring service, to monitor health and performance information in heterogeneous distributed systems. GEMS has many novel and essential features such as detection of network partitions and dynamic insertion of new nodes into the service. Easily extensible, GEMS also incorporates facilities for distributing arbitrary system- and application-specific data. We present experiments and analytical projections demonstrating scalability, fast response times, and low resource utilization requirements, making GEMS a potent solution for resource monitoring in distributed computing.
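For context, here is a minimal sketch of the gossip heartbeat bookkeeping that protocols in this family rely on; it is illustrative only and not GEMS's actual data structures, which also support network-partition detection and distribution of application data.

```python
def merge_heartbeats(mine, received):
    """Keep the freshest information: element-wise maximum of two
    heartbeat tables of the form {node_id: heartbeat_counter}."""
    merged = dict(mine)
    for node, beat in received.items():
        if beat > merged.get(node, -1):
            merged[node] = beat
    return merged

def suspected(last_advance, now, timeout):
    """Nodes whose heartbeat counter has not advanced within `timeout`
    seconds of `now` are suspected to have failed (or been partitioned)."""
    return [node for node, t in last_advance.items() if now - t > timeout]
```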
20.
A distributed redundant array of inexpensive disks (RAID) is often embedded in a cluster architecture. In a centralized RAID subsystem, the false sharing problem does not exist, because the disk array allows only mutually exclusive access by one user at a time. However, the problem does exist in a distributed RAID architecture, because multiple accesses may occur simultaneously in a distributed environment. This problem seriously limits the effectiveness of collective I/O operations in network-based cluster computing. Traditional accesses to disks in a RAID are done at the block level. The block granularity is large, say 32 KB, often resulting in false sharing among fragments within a block, and the false sharing problem becomes worse when the block size or the stripe unit becomes too large. To solve this problem, we propose an adaptive sector grouping approach to accessing a distributed RAID. Each sector has a fine granularity of 512 B, and multiple sectors are grouped together to match the data block size. The grouped sector has a variable size that can be adaptively adjusted by software. Benchmark experiments reveal the positive effects of this adaptive access scheme on RAID performance: our scheme can reduce the collective I/O access time without increasing the buffer size. Both theoretical analysis and experimental results demonstrate the performance gain from using grouped sectors for fast access to a distributed RAID.
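A small sketch of the grouping arithmetic under assumed parameters (sector size 512 B, a software-chosen group size in sectors); it only illustrates how a byte range maps onto sectors and onto a group-aligned unit, not the paper's full adaptive scheme, and the names are hypothetical.

```python
SECTOR = 512  # bytes

def sector_group(offset, length, group_sectors):
    """Map a byte range onto 512 B sectors and onto the group-aligned unit
    that a group of `group_sectors` sectors would lock or transfer.
    Returns (first_sector, sectors_touched, (group_start, group_end)).
    With small groups, unrelated writers inside the same 32 KB block no
    longer share a unit, which is the false-sharing effect targeted."""
    first = offset // SECTOR
    last = (offset + length - 1) // SECTOR
    group_start = (first // group_sectors) * group_sectors
    group_end = (last // group_sectors + 1) * group_sectors - 1
    return first, last - first + 1, (group_start, group_end)

# A 1 KB write at offset 4096 with 4-sector groups touches sectors 8-9 and
# maps to the group spanning sectors 8-11:
# sector_group(4096, 1024, 4) -> (8, 2, (8, 11))
```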