
1.

Dynamic data replacement and adaptive scheduling policies in spark

Li Chunlin Cai Qianqian Luo Youlong 《Cluster computing》2022,25(2):1421-1439

Improper data replacement and an inappropriate choice of job scheduling policy are major causes of slow Spark operation and directly degrade the performance of Spark parallel computing. In this paper, we analyze the existing caching mechanism of Spark and find that the existing caching policy still leaves room for optimization. In the task structure analysis, the key information of Spark tasks is extracted to obtain the data and memory usage during task runtime. On this basis, an RDD weight calculation method is proposed that integrates the factors affecting RDD usage and establishes an RDD weight model. Based on this model, a minimum weight replacement algorithm based on RDD structure analysis is proposed. The algorithm ensures that the relatively more valuable data is cached in memory during data replacement. In addition, the default job scheduling algorithm of the Spark framework considers only a single factor, so it cannot schedule jobs effectively and wastes cluster resources. To solve this problem, an adaptive job scheduling policy based on job classification is proposed. The policy classifies job types and schedules resources more effectively for different types of jobs. The experimental results show that the proposed dynamic data replacement algorithm effectively improves Spark's memory utilization, and that the proposed job classification-based adaptive job scheduling algorithm effectively improves system resource utilization and shortens job completion time.
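The idea of minimum weight replacement can be illustrated with a small sketch. The abstract does not give the weight formula, so the `weight` function below, the `CachedRdd` fields, and the `admit` helper are all hypothetical stand-ins: the sketch only shows the general pattern of evicting the lowest-weight cached data first and never evicting data that is more valuable than the incoming data.

```python
from dataclasses import dataclass


@dataclass
class CachedRdd:
    name: str
    size_mb: float         # memory occupied by the cached RDD
    remaining_uses: int    # later stages that still read it (assumed factor)
    recompute_cost: float  # estimated seconds to rebuild it (assumed factor)


def weight(rdd: CachedRdd) -> float:
    # Hypothetical weight: value retained per MB of cache, not the paper's formula.
    return rdd.remaining_uses * rdd.recompute_cost / rdd.size_mb


def admit(cache: list[CachedRdd], new: CachedRdd, capacity_mb: float) -> list[CachedRdd]:
    """Return the cache contents after trying to admit `new`.

    Cached RDDs are evicted in increasing weight order, but only while their
    weight is below the weight of `new`. If that does not free enough room,
    the cache is left unchanged and `new` is not cached.
    """
    used = sum(r.size_mb for r in cache)
    by_weight = sorted(cache, key=weight)  # lowest weight first
    evicted = 0
    while (used + new.size_mb > capacity_mb
           and evicted < len(by_weight)
           and weight(by_weight[evicted]) < weight(new)):
        used -= by_weight[evicted].size_mb
        evicted += 1
    if used + new.size_mb > capacity_mb:
        return list(cache)  # making room would evict more valuable data
    return by_weight[evicted:] + [new]
```

Under these assumptions, a low-weight RDD is replaced by a more valuable newcomer, while a newcomer with the lowest weight of all is simply not cached.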


2.

Towards effective science cloud provisioning for a large-scale high-throughput computing

Seoyoung Kim Jik-Soo Kim Soonwook Hwang Yoonhee Kim 《Cluster computing》2014,17(4):1157-1169

The science cloud paradigm has been actively developed and investigated, but it still requires a suitable model for science cloud systems in order to support increasing scientific computation needs with high performance. This paper presents an effective provisioning model for the science cloud, particularly for large-scale high-throughput computing applications. In this model, we use job traces, to which a statistical method is applied to pick the features that most influence application performance. With these features, the system determines where a VM is deployed (allocation) and which instance type is appropriate (provisioning). An adaptive evaluation step that follows job execution enables our model to adapt to dynamic computing environments. We show the performance gains by comparing the proposed model with other policies through experiments, and we expect noticeable improvements in performance as well as a reduction in the cost of resource consumption through our model.

3.

Reliable budget aware workflow scheduling strategy on multi-cloud environment

Chakravarthi K. Kalyana Neelakantan P. Shyamala L. Vaidehi V. 《Cluster computing》2022,25(2):1189-1205

The resource provisioning and workflow execution in a multi-cloud environment using a pay-as-you-use framework have recently gained the attention of the cloud computing research...

4.

Energy and resource efficient workflow scheduling in a virtualized cloud environment

Garg Neha Singh Damanpreet Goraya Major Singh 《Cluster computing》2021,24(2):767-797

High energy consumption (EC) is one of the leading issues in the cloud environment. The optimization of EC is generally related to the scheduling problem. An optimum scheduling strategy selects resources or tasks in such a way that system performance is not violated while EC is minimized and resource utilization (RU) is maximized. This paper presents a task scheduling model for scheduling tasks on virtual machines (VMs). The objective of the proposed model is to minimize EC, maximize RU, and minimize workflow makespan while preserving the tasks’ deadline and dependency constraints. An energy and resource efficient workflow scheduling algorithm (ERES) is proposed to schedule the workflow tasks to the VMs and to dynamically deploy/un-deploy the VMs based on the workflow tasks’ requirements. An energy model is presented to compute the EC of the servers. A double threshold policy is used to determine the server’s status, i.e., overloaded, underloaded, or normal. To balance the workload on overloaded and underloaded servers, a live VM migration strategy is used. To check the effectiveness of the proposed algorithm, exhaustive simulation experiments are conducted. The proposed algorithm is compared with the power efficient scheduling and VM consolidation (PESVMC) algorithm in terms of RU, energy efficiency, and task makespan. The results are also verified in a real cloud environment. The results demonstrate the effectiveness of the proposed ERES algorithm.
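A double threshold policy of the kind mentioned above can be sketched in a few lines. The abstract does not state the threshold values or the utilization metric, so the CPU-utilization input and the defaults `lower=0.2` and `upper=0.8` are assumptions chosen only for illustration.

```python
def server_status(cpu_utilization: float, lower: float = 0.2, upper: float = 0.8) -> str:
    """Classify a server from its utilization in [0, 1].

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if not 0.0 <= cpu_utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if cpu_utilization > upper:
        return "overloaded"   # candidate source for VM migration
    if cpu_utilization < lower:
        return "underloaded"  # candidate for consolidation and un-deployment
    return "normal"
```

A scheduler built on this classification would migrate VMs away from servers in the first two states and leave "normal" servers alone.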


5.

Multi-objective workflow scheduling in cloud computing: trade-off between makespan and cost

Belgacem Ali Beghdad-Bey Kadda 《Cluster computing》2022,25(1):579-595

Recently, modern businesses have started to transform into cloud computing platforms to deploy their workflow applications. However, scheduling workflow under resource...

6.

Towards operational cost minimization for cloud bursting with deadline constraints in hybrid clouds

Chunlin Li Jianhang Tang Youlong Luo 《Cluster computing》2018,21(4):2013-2029

In hybrid clouds, a technique named cloud bursting allows companies to expand their capacity to meet the demands of peak workloads at low cost. In this work, a cost-aware job scheduling approach based on queueing theory in hybrid clouds is proposed. The job scheduling problem in the private cloud is modeled as a queueing model. A genetic algorithm is applied to obtain optimal queues for jobs and improve the utilization rate of the private cloud. Then, the task execution time is predicted by a back propagation neural network. The max–min strategy is applied to schedule tasks according to the prediction results in hybrid clouds. Experiments show that our cost-aware job scheduling algorithm can reduce the average job waiting time and average job response time in the private cloud. In addition, the proposed job scheduling algorithm can improve the system throughput of the private cloud. It can also reduce the average task waiting time, average task response time, and total costs in hybrid clouds.
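The max–min strategy named in this abstract is a classic list-scheduling heuristic, and a minimal sketch may make it concrete. The function name `max_min_schedule` and the input layout (predicted execution time of each task on each machine) are assumptions; in the paper the predictions come from a neural network, whereas here they are plain numbers.

```python
def max_min_schedule(exec_time: dict[str, list[float]]) -> tuple[dict[str, int], float]:
    """Assign tasks to machines with the max-min heuristic.

    exec_time[task][m] is the predicted run time of `task` on machine m.
    Returns (task -> machine index, makespan).
    """
    if not exec_time:
        return {}, 0.0
    n_machines = len(next(iter(exec_time.values())))
    ready = [0.0] * n_machines  # time at which each machine becomes free
    assignment: dict[str, int] = {}
    pending = set(exec_time)
    while pending:
        # For every pending task, find the machine giving its earliest completion.
        best = {}
        for task in pending:
            completion = [ready[m] + exec_time[task][m] for m in range(n_machines)]
            m = min(range(n_machines), key=completion.__getitem__)
            best[task] = (completion[m], m)
        # Max-min: schedule the task whose earliest completion time is largest.
        task = max(sorted(pending), key=lambda t: best[t][0])
        finish, machine = best[task]
        assignment[task] = machine
        ready[machine] = finish
        pending.remove(task)
    return assignment, max(ready)
```

Scheduling the longest tasks first tends to keep short tasks from being stranded behind them, which is why max–min is often preferred over min–min when a few tasks dominate the workload.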

7.

Hybrid Symbiotic Organisms Search Optimization Algorithm for Scheduling of Tasks on Cloud Computing Environment

Mohammed Abdullahi Md Asri Ngadi 《PloS one》2016,11(6)

Cloud computing has attracted significant attention from the research community because of the rapid migration of Information Technology services to its domain. Advances in virtualization technology have made cloud computing very popular as a result of easier deployment of application services. Tasks are submitted to cloud datacenters to be processed in a pay-as-you-go fashion. Task scheduling is one of the significant research challenges in the cloud computing environment. The current formulation of task scheduling problems has been shown to be NP-complete, hence finding the exact solution, especially for large problem sizes, is intractable. The heterogeneous and dynamic nature of cloud resources makes optimum task scheduling non-trivial. Therefore, efficient task scheduling algorithms are required for optimum resource utilization. Symbiotic Organisms Search (SOS) has been shown to perform competitively with Particle Swarm Optimization (PSO). The aim of this study is to optimize task scheduling in the cloud computing environment based on a proposed Simulated Annealing (SA) based SOS (SASOS) in order to improve the convergence rate and solution quality of SOS. The SOS algorithm has a strong global exploration capability and uses fewer parameters. The systematic reasoning ability of SA is employed to find better solutions in local solution regions, hence adding exploration ability to SOS. Also, a fitness function is proposed that takes into account the utilization level of virtual machines (VMs), which reduces makespan and the degree of imbalance among VMs. The CloudSim toolkit was used to evaluate the efficiency of the proposed method using both synthetic and standard workloads. Simulation results showed that hybrid SOS performs better than SOS in terms of convergence speed, response time, degree of imbalance, and makespan.
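The abstract does not spell out how SA is embedded in SOS, so the following is only a sketch of the standard Metropolis acceptance rule that simulated annealing is generally built on. The function names and the convention that lower cost (for example, makespan) is better are assumptions.

```python
import math
import random


def acceptance_probability(delta: float, temperature: float) -> float:
    """Metropolis rule, with delta = candidate_cost - current_cost (lower is better)."""
    if delta <= 0:
        return 1.0  # improvements are always accepted
    return math.exp(-delta / temperature)


def anneal_step(current, candidate, cost, temperature, rng=random):
    """Return the solution kept after one annealing step."""
    delta = cost(candidate) - cost(current)
    if rng.random() < acceptance_probability(delta, temperature):
        return candidate
    return current
```

As the temperature is lowered, worse candidates are accepted less and less often, so the search gradually shifts from escaping local optima to refining the current solution.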

8.

Cost-aware job scheduling for cloud instances using deep reinforcement learning

Cheng Feng Huang Yifeng Tanpure Bhavana Sawalani Pawan Cheng Long Liu Cong 《Cluster computing》2022,25(1):619-631

As cloud vendors' services deliver better performance, auto-scaling, and load balancing along with low infrastructure maintenance, more and more companies migrate their services to the cloud. Since the cloud workload is dynamic and complex, scheduling the jobs submitted by users effectively is proving to be a challenging task. Although many advanced job scheduling approaches have been proposed in recent years, almost all of them are designed to handle batch jobs rather than real-time workloads, in which user requests may be submitted at any time and in any quantity. In this work, we propose a Deep Reinforcement Learning (DRL) based job scheduler that dispatches jobs in real time to tackle this problem. Specifically, we focus on scheduling user requests in such a way as to provide quality of service (QoS) to the end user along with a significant reduction in the cost of executing jobs on the virtual instances. We have implemented our method with a Deep Q-learning Network (DQN) model, and our experimental results demonstrate that our approach can significantly outperform commonly used real-time scheduling algorithms.


9.

Failure-aware workflow scheduling in cluster environments (total citations: 1; self-citations: 0; citations by others: 1)

Zhifeng Yu Chenjia Wang Weisong Shi 《Cluster computing》2010,13(4):421-434

The goal of workflow application scheduling is to achieve minimal makespan for each workflow. Scheduling workflow applications in high performance cluster environments is an NP-Complete problem, and it becomes more complicated when potential resource failures are considered. While more research on failure prediction has appeared in recent years to improve system availability and reliability, very little of it addresses the problem in the context of workflow application scheduling. In this paper, we study how a workflow scheduler benefits from failure prediction and propose FLAW, a failure-aware workflow scheduling algorithm. We propose two important definitions of accuracy, Application Oblivious Accuracy (AOA) and Application Aware Accuracy (AAA), from the perspectives of the system and of scheduling respectively, because we observe that prediction accuracy as conventionally defined has different performance implications for different applications and fails to measure how much prediction improves scheduling effectiveness. The comprehensive evaluation results using real failure traces show that FLAW performs well with practically achievable prediction accuracy, reducing the average makespan, the loss time, and the number of job reschedulings.

10.

A MapReduce task scheduling algorithm for deadline constraints

Zhuo Tang Junqing Zhou Kenli Li Ruixuan Li 《Cluster computing》2013,16(4):651-662

Current work on MapReduce task scheduling with deadline constraints takes neither the differences between Map and Reduce tasks nor the cluster’s heterogeneity into account. This paper proposes an extended MapReduce Task Scheduling algorithm for Deadline constraints on the Hadoop platform: MTSD. It allows the user to specify a job’s deadline and tries to finish the job before the deadline. By measuring each node’s computing capacity, a node classification algorithm is proposed in MTSD. This algorithm classifies the nodes of a heterogeneous cluster into several levels. Under this algorithm, we first present a novel data distribution model that distributes data according to each node’s capacity level. The experiments show that the node classification algorithm noticeably improves data locality compared with the default scheduler and can also improve the locality of other schedulers. Second, we calculate the task’s average completion time based on the node level, which improves the precision of the estimate of a task’s remaining time. Finally, MTSD provides a mechanism to decide which job’s task should be scheduled by calculating the Map and Reduce task slot requirements.

11.

A Service Brokering and Recommendation Mechanism for Better Selecting Cloud Services

Zhipeng Gui Chaowei Yang Jizhe Xia Qunying Huang Kai Liu Zhenlong Li Manzhu Yu Min Sun Nanyin Zhou Baoxuan Jin 《PloS one》2014,9(8)

Cloud computing is becoming the new generation of computing infrastructure, and many cloud vendors provide different types of cloud services. Choosing the best cloud services for specific applications is very challenging. Addressing this challenge requires balancing multiple factors, such as business demands, technologies, policies, and preferences, in addition to the computing requirements. This paper recommends a mechanism for selecting the best public cloud service at the levels of Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). A systematic framework and associated workflow include cloud service filtration, solution generation, evaluation, and selection of public cloud services. Specifically, we propose the following: a hierarchical information model for integrating heterogeneous cloud information from different providers and a corresponding cloud information collecting mechanism; a cloud service classification model for categorizing and filtering cloud services and an application requirement schema for providing rules for creating application-specific configuration solutions; and a preference-aware solution evaluation mode for evaluating and recommending solutions according to the preferences of application providers. To test the proposed framework and methodologies, a cloud service advisory tool prototype was developed, after which relevant experiments were conducted. The results show that the proposed system collects, updates, and records cloud information from multiple mainstream public cloud services in real time; generates feasible cloud configuration solutions according to user specifications and acceptable cost prediction; assesses solutions from multiple aspects (e.g., computing capability, potential cost, and Service Level Agreement, SLA) and offers rational recommendations based on user preferences and practical cloud provisioning; and visually presents and compares solutions through an interactive web Graphical User Interface (GUI).

12.

A comparative study on resource allocation and energy efficient job scheduling strategies in large-scale parallel computing systems

Aftab Ahmed Chandio Kashif Bilal Nikos Tziritas Zhibin Yu Qingshan Jiang Samee U. Khan Cheng-Zhong Xu 《Cluster computing》2014,17(4):1349-1367

In large-scale parallel computing environments, resource allocation and energy efficient techniques are required to deliver quality of service (QoS) and to reduce the operational cost of the system, because the cost of energy consumption is a dominant part of the owner’s and user’s budget. However, when energy efficiency is considered, resource allocation becomes more difficult, and QoS requirements (i.e., queue time and response time) may be violated. This paper is therefore a comparative study of job scheduling in large-scale parallel systems that aims to (a) minimize the queue time, response time, and energy consumption and (b) maximize the overall system utilization. We compare thirteen job scheduling policies to analyze their behavior. The set of job scheduling policies includes (a) priority-based, (b) first fit, (c) backfilling, and (d) window-based policies. All of the policies are extensively simulated and compared. For the simulation, a real data center workload comprising 22385 jobs is used. Based on their performance results, we incorporate energy efficiency in three policies, i.e., (1) the best result producer, (2) the average result producer, and (3) the worst result producer. We analyze the (a) queue time, (b) response time, (c) slowdown ratio, and (d) energy consumption to evaluate the policies. Moreover, we present a comprehensive workload characterization for optimizing the system’s performance and for scheduler design. Major workload characteristics, including (a) Narrow, (b) Wide, (c) Short, and (d) Long jobs, are characterized for detailed analysis of the schedulers’ performance. This study highlights the strengths and weaknesses of various job scheduling policies and helps in choosing an appropriate job scheduling policy for a given scenario.

13.

Response Surface Modelling for Performance Analysis of Scientific Workflow Application in Cloud

Soma Prathibha Latha B. 《Cluster computing》2021,24(2):1123-1134

Scientific workflow applications are used by scientists to carry out research in various domains such as Physics, Chemistry, and Astronomy. These applications require huge computational resources, and cloud platforms are currently used to run them efficiently. Improving the makespan and cost of workflow execution on a cloud platform requires identifying the proper number of Virtual Machines (VMs) and choosing the proper VM type. Because the cloud platform is dynamic, the available resources and the type of the resources are the two important factors affecting the cost and makespan of workflow execution. The primary objective of this work is to analyze the relationship among the cloud configuration parameters (number of VMs, type of VM, VM configurations) for executing scientific workflow applications on a cloud platform. In this work, to accurately analyze the influence of cloud platform resource configuration and scheduling policies, a new predictive model is built using the Box–Behnken design, which is one of the modelling techniques of Response Surface Methodology (RSM). It is used to build quadratic mathematical models that can be used to analyze relationships among input and output variables. Workflow cost and makespan models were built for real-world scientific workflows using ANOVA, and it was observed that the models fit well and can be useful in analyzing the performance of scientific workflow applications in the cloud.


14.

A resource-sharing model based on a repeated game in fog computing

《Saudi Journal of Biological Sciences》2017,24(3):687-694

With the rapid development of cloud computing techniques, the number of users is undergoing exponential growth. It is difficult for traditional data centers to perform many tasks in real time because of the limited bandwidth of resources. The concept of fog computing is proposed to support traditional cloud computing and to provide cloud services. In fog computing, the resource pool is composed of sporadically distributed resources that are more flexible and movable than a traditional data center. In this paper, we propose a fog computing structure and present a crowd-funding algorithm to integrate spare resources in the network. Furthermore, to encourage more resource owners to share their resources with the resource pool and to supervise the resource supporters as they actively perform their tasks, we propose an incentive mechanism in our algorithm. Simulation results show that our proposed incentive mechanism can effectively reduce the SLA violation rate and accelerate the completion of tasks.

15.

Predictable quality of service atop degradable distributed systems

Lavanya Ramakrishnan Daniel A. Reed 《Cluster computing》2013,16(2):321-334

High performance and distributed computing systems such as peta-scale, grid, and cloud infrastructure are increasingly used for running scientific models and business services. These systems experience large availability variations through hardware and software failures. Resource providers need to account for these variations while providing the required QoS at appropriate costs in dynamic resource and application environments. Although the performance and reliability of these systems have been studied separately, there has been little analysis of the lost Quality of Service (QoS) experienced with varying availability levels. In this paper, we present a resource performability model to estimate lost performance and corresponding cost considerations with varying availability levels. We use the resulting model in a multi-phase planning approach for scheduling a set of deadline-sensitive meteorological workflows atop grid and cloud resources to trade off performance, reliability, and cost. We use simulation results driven by failure data collected over the lifetime of high performance systems to demonstrate how the proposed scheme better accounts for resource availability.

16.

CloudDMSS: robust Hadoop-based multimedia streaming service architecture for a cloud computing environment

Myoungjin Kim Seungho Han Yun Cui Hanku Lee Hogyeon Cho Sungdae Hwang 《Cluster computing》2014,17(3):605-628

The delivery of scalable, rich multimedia applications and services on the Internet requires sophisticated technologies for transcoding, distributing, and streaming content. Cloud computing provides an infrastructure for such technologies, but specific challenges remain in the areas of task management, load balancing, and fault tolerance. To address these issues, we propose a cloud-based distributed multimedia streaming service (CloudDMSS), which is designed to run on all major cloud computing services. CloudDMSS is highly adapted to the structure and policies of Hadoop, and thus has additional capacities for transcoding, task distribution, load balancing, and content replication and distribution. To satisfy the design requirements of our service architecture, we propose four important algorithms: content replication, system recovery for Hadoop distributed multimedia streaming, management for cloud multimedia management, and streaming resource-based connection (SRC) for streaming job distribution. To evaluate the proposed system, we conducted several performance tests on a local testbed: transcoding, streaming job distribution using SRC, streaming service deployment, and robustness to data node and task failures. In addition, we performed three tests in an actual cloud computing environment, Cloudit 2.0: transcoding, streaming job distribution using SRC, and streaming service deployment.

17.

Resource scheduling methods in cloud and fog computing environments: a systematic literature review

Rahimikhanghah Aryan Tajkey Melika Rezazadeh Bahareh Rahmani Amir Masoud 《Cluster computing》2022,25(2):911-945

In recent years, cloud computing has come to be regarded as an emerging technology that can share resources with users. Because cloud computing is on-demand, efficient use of resources such as memory, processors, and bandwidth is a big challenge. Despite the advantages of cloud computing, it is sometimes not a proper choice because of its delay in responding to existing requests, which led to the need for another technology called fog computing. Fog computing reduces traffic and time lags by extending cloud services into the network and closer to users. It can schedule resources with higher efficiency and use them to improve the user's experience dramatically. This paper surveys studies in the field of scheduling in fog/cloud computing environments. The survey focuses on studies published between 2015 and 2021 in journals or conferences. We selected 71 studies in a systematic literature review (SLR) from four major scientific databases based on their relation to our paper. We classified these studies into five categories based on their traced parameters and their focus area: (1) performance, (2) energy efficiency, (3) resource utilization, (4) performance and energy efficiency, and (5) performance and resource utilization simultaneously. 42.3% of the studies focused on performance, 9.9% on energy efficiency, 7.0% on resource utilization, 21.1% on both performance and energy efficiency, and 19.7% on both performance and resource utilization. Finally, we present challenges and open issues in resource scheduling methods in fog/cloud computing environments.
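The reported shares can be checked against the stated total of 71 studies. The abstract gives only percentages, so the per-category counts computed below are inferred from rounding and are not figures taken from the paper; the dictionary keys are shorthand labels introduced here.

```python
TOTAL_STUDIES = 71  # stated in the abstract

# Reported shares in percent, as given in the abstract.
reported = {
    "performance": 42.3,
    "energy efficiency": 9.9,
    "resource utilization": 7.0,
    "performance + energy efficiency": 21.1,
    "performance + resource utilization": 19.7,
}


def inferred_count(percent: float, total: int = TOTAL_STUDIES) -> int:
    # Nearest whole number of studies consistent with the rounded percentage.
    return round(percent * total / 100)


counts = {category: inferred_count(p) for category, p in reported.items()}

for category, n in counts.items():
    print(f"{category}: {n} studies ({100 * n / TOTAL_STUDIES:.1f}%)")
print("total:", sum(counts.values()))
```

The inferred counts add up to exactly 71, and each one rounds back to the percentage in the abstract, so the five categories appear to partition the selected studies without overlap.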


18.

Task scheduling for MapReduce in heterogeneous networks

Jia Wang Xiaoping Li 《Cluster computing》2016,19(1):197-210

In this paper, task scheduling in MapReduce is considered for geo-distributed data centers on heterogeneous networks. Adaptive heartbeats, job deadlines, and data locality are taken into account. Job deadlines are divided according to the maximum data volume of tasks. With these constraints, task scheduling is formulated as an assignment problem in each heartbeat, in which adaptive heartbeats are calculated from the processing times of tasks, jobs are sequenced in terms of the divided deadlines, and tasks are scheduled by the Hungarian algorithm. Taking into account both the data transfer and processing times, the most suitable data center for all mapped jobs is determined in the reduce phase. Experimental results show that the proposed algorithms outperform existing ones. The proposals with sorted task sequences perform better than those with random task sequences.
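The assignment problem solved in each heartbeat can be illustrated as follows. This sketch deliberately swaps in an exhaustive search over permutations instead of the Hungarian algorithm: both return a minimum-cost one-to-one assignment, but the Hungarian algorithm does so in polynomial time, while the brute force below is only practical for a handful of tasks. The cost matrix and the function name `best_assignment` are made up for illustration.

```python
from itertools import permutations


def best_assignment(cost: list[list[float]]) -> tuple[tuple[int, ...], float]:
    """Minimum-cost one-to-one assignment of tasks to slots.

    cost[task][slot] is the cost (e.g. transfer plus processing time) of
    running `task` in `slot`. Returns (slot chosen for each task, total cost).
    Brute force stand-in for the Hungarian algorithm; the matrix must be square.
    """
    n = len(cost)
    best_perm: tuple[int, ...] = ()
    best_total = float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[task][perm[task]] for task in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total
```

For example, `best_assignment([[4, 1, 3], [2, 0, 5], [3, 2, 2]])` places task 0 in slot 1, task 1 in slot 0, and task 2 in slot 2, for a total cost of 5.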

19.

Fault tolerance and QoS scheduling using CAN in mobile social cloud computing

SookKyong Choi KwangSik Chung Heonchang Yu 《Cluster computing》2014,17(3):911-926

The performance of mobile devices, including smartphones and laptops, is rising steadily as prices fall sharply. As a result of this greatly improved performance, mobile devices are changing from mere interfaces for requesting services into computing resources for providing and sharing services. With the increasing number of mobile device users, the utilization rate of SNS (Social Networking Service) is also soaring. Applying SNS to the existing computing environment enables members of a social network to share computing services without further authentication. To use a mobile device as a computing resource, temporary network disconnection caused by user mobility and various HW/SW faults causing service disruption should be considered. These issues must also be resolved to support mobile users and to meet user requirements for services. Accordingly, we propose fault tolerance and QoS (Quality of Service) scheduling using CAN (Content Addressable Network) in Mobile Social Cloud Computing (MSCC). MSCC is a computing environment that integrates social network-based cloud computing and mobile devices. In this computing environment, a mobile user can, through mobile devices, become a member of a social network through real-world relationships. Essentially, members of a social network share cloud services or data with other members without further authentication by using their mobile devices. We use CAN as the underlying structure of MSCC to logically manage the locations of mobile devices. Fault tolerance and QoS scheduling consists of four sub-scheduling algorithms: malicious-user filtering, cloud service delivery, QoS provisioning, and replication and load-balancing. Under the proposed scheduling, a mobile device is used as a resource for providing cloud services, faults caused by user mobility or other reasons are tolerated, and user requirements for QoS are considered. We simulate scheduling both with and without CAN. The simulation results show that our proposed scheduling algorithm improves cloud service execution time, finish time, and reliability and reduces the cloud service error rate.

20.

Operational cost-aware resource provisioning for continuous write applications in cloud-of-clouds

Zeng Zeng Tram Truong-Huu Bharadwaj Veeravalli Chen-Khong Tham 《Cluster computing》2016,19(2):601-614

The emergence of cloud computing has made it an attractive solution for large-scale data processing and storage applications. Cloud infrastructures provide users with remote access to powerful computing capacity, large storage space, and high network bandwidth to deploy various applications. With the support of cloud computing, many large-scale applications have been migrated to cloud infrastructures instead of running on in-house local servers. Among these applications, continuous write applications (CWAs), such as online surveillance systems, can benefit significantly from the flexibility and advantages of cloud computing. However, because of specific characteristics such as continuous data writing and processing and a high demand for data availability, cloud service providers prefer to use sophisticated models for provisioning resources to meet CWAs’ demands while minimizing the operational cost of the infrastructure. In this paper, we present a novel architecture of multiple cloud service providers (CSPs), commonly referred to as Cloud-of-Clouds. Based on this architecture, we propose two operational cost-aware algorithms for provisioning cloud resources for CWAs, namely the neighboring optimal resource provisioning algorithm (NORPA) and the global optimal resource provisioning algorithm (GORPA), in order to minimize the operational cost and thereby maximize the revenue of CSPs. We validate the proposed algorithms through comprehensive simulations. The two proposed algorithms are compared against each other to assess their effectiveness, and against a commonly used and practically viable round-robin approach. The results demonstrate that NORPA and GORPA outperform the conventional round-robin algorithm, reducing the operational cost by up to 28% and 57%, respectively. The low complexity of the proposed cost-aware algorithms allows them to be applied to a realistic Cloud-of-Clouds environment in industry as well as academia.