Similar literature
 Found 20 similar documents; search took 359 ms
1.
Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity as real-time and streaming data in a variety of formats. These characteristics give rise to challenges in modeling, computation, and processing. Hadoop MapReduce (MR) is a well-known data-intensive distributed processing framework that uses a distributed file system (DFS) for Big Data. Current implementations of MR only support execution of a single algorithm in the entire Hadoop cluster. In this paper, we propose MapReducePack (MRPack), a variation of MR that supports execution of a set of related algorithms in a single MR job. We exploit the computational capability of a cluster by increasing the compute-intensiveness of MapReduce while maintaining its data-intensive approach. MRPack uses the available computing resources by dynamically managing the task assignment and intermediate data. Intermediate data from multiple algorithms are managed using multi-key and skew mitigation strategies. The performance study of the proposed system shows that it is time, I/O, and memory efficient compared to the default MapReduce. The proposed approach reduces the execution time by 200% with an approximate 50% decrease in I/O cost. Complexity and qualitative results analysis shows significant performance improvement.
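
The idea of running several related algorithms in one job with multi-key intermediate data can be illustrated with a small in-memory sketch. This is not MRPack's implementation: the function `run_packed_job`, the composite `(algorithm, key)` tagging, and the two toy algorithms are assumptions made only for illustration, and skew mitigation is not modeled.

```python
from collections import defaultdict


def run_packed_job(records, algorithms):
    """Run several map/reduce algorithms over one pass of the input.

    algorithms maps a name to a (map_fn, reduce_fn) pair (hypothetical API).
    """
    # Map phase: every record is read once and handed to each algorithm's mapper.
    # Intermediate values are grouped under a composite (algorithm, key) key.
    intermediate = defaultdict(list)
    for record in records:
        for name, (map_fn, _) in algorithms.items():
            for key, value in map_fn(record):
                intermediate[(name, key)].append(value)

    # Reduce phase: each group is reduced by the algorithm that produced it.
    results = defaultdict(dict)
    for (name, key), values in intermediate.items():
        _, reduce_fn = algorithms[name]
        results[name][key] = reduce_fn(key, values)
    return {name: dict(out) for name, out in results.items()}


def word_count_map(line):
    # Emit (word, 1) for every whitespace-separated word.
    for word in line.split():
        yield word, 1


def char_count_map(line):
    # Emit the length of the line under a single key.
    yield "total_chars", len(line)


def sum_reduce(key, values):
    return sum(values)


ALGORITHMS = {
    "word_count": (word_count_map, sum_reduce),
    "char_count": (char_count_map, sum_reduce),
}

if __name__ == "__main__":
    print(run_packed_job(["a b a", "b c"], ALGORITHMS))
```

Tagging each intermediate key with the algorithm name keeps the outputs of the packed algorithms separate while the input is scanned only once, which is the property the abstract attributes to the single-job design.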

2.
MapReduce offers an ease-of-use programming paradigm for processing large data sets, making it an attractive model for opportunistic compute resources. However, unlike dedicated resources, where MapReduce has mostly been deployed, opportunistic resources have significantly higher rates of node volatility. As a consequence, the data and task replication scheme adopted by existing MapReduce implementations is woefully inadequate on such volatile resources. In this paper, we propose MOON, short for MapReduce On Opportunistic eNvironments, which is designed to offer reliable MapReduce service for opportunistic computing. MOON adopts a hybrid resource architecture by supplementing opportunistic compute resources with a small set of dedicated resources, and it extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms to take advantage of the hybrid resource architecture. Our results on an emulated opportunistic computing system running atop a 60-node cluster demonstrate that MOON can deliver significant performance improvements to Hadoop on volatile compute resources and even finish jobs that are not able to complete in Hadoop.

3.
Software architecture definition for on-demand cloud provisioning   Total citations: 1 (self-citations: 0, citations by others: 1)
Cloud computing is a promising paradigm for the provisioning of IT services. Cloud computing infrastructures, such as those offered by the RESERVOIR project, aim to facilitate the deployment, management and execution of services across multiple physical locations in a seamless manner. In order for service providers to meet their quality of service objectives, it is important to examine how software architectures can be described to take full advantage of the capabilities introduced by such platforms. When dealing with software systems involving numerous loosely coupled components, architectural constraints need to be made explicit to ensure continuous operation when allocating and migrating services from one host in the Cloud to another. In addition, the need for optimising resources and minimising over-provisioning requires service providers to control the dynamic adjustment of capacity throughout the entire service lifecycle. We discuss the implications for software architecture definitions of distributed applications that are to be deployed on Clouds. In particular, we identify novel primitives to support service elasticity, co-location and other requirements, propose language abstractions for these primitives and define their behavioural semantics precisely by establishing constraints on the relationship between architecture definitions and Cloud management infrastructures using a model denotational approach in order to derive appropriate service management cycles. Using these primitives and semantic definition as a basis, we define a service management framework implementation that supports on demand cloud provisioning and present a novel monitoring framework that meets the demands of Cloud based applications.

4.
The ability to process large numbers of continuous data streams in a near-real-time fashion has become a crucial prerequisite for many scientific and industrial use cases in recent years. While the individual data streams are usually trivial to process, their aggregated data volumes easily exceed the scalability of traditional stream processing systems. At the same time, massively-parallel data processing systems like MapReduce or Dryad currently enjoy a tremendous popularity for data-intensive applications and have proven to scale to large numbers of nodes. Many of these systems also provide streaming capabilities. However, unlike traditional stream processors, these systems have disregarded QoS requirements of prospective stream processing applications so far. In this paper we address this gap. First, we analyze common design principles of today’s parallel data processing frameworks and identify those principles that provide degrees of freedom in trading off the QoS goals latency and throughput. Second, we propose a highly distributed scheme which allows these frameworks to detect violations of user-defined QoS constraints and optimize the job execution without manual interaction. As a proof of concept, we implemented our approach for our massively-parallel data processing framework Nephele and evaluated its effectiveness through a comparison with Hadoop Online. For an example streaming application from the multimedia domain running on a cluster of 200 nodes, our approach improves the processing latency by a factor of at least 13 while preserving high data throughput when needed.

5.
Taking advantage of distributed storage technology and virtualization technology, cloud storage systems provide virtual machine clients with customizable storage services. They can be divided into two types: distributed file systems and block-level storage systems. Existing block-level storage systems have two disadvantages. First, some of them are tightly coupled with their cloud computing environments, so it is hard to extend them to support other cloud computing platforms. Second, the volume server bottleneck seriously affects the performance and reliability of the whole system. In this paper we present ORTHRUS, a lightweight block-level storage system for clouds based on virtualization technology. We first design an architecture with multiple volume servers, together with its workflows, which improves system performance and avoids the bottleneck problem. Second, we propose a Listen-Detect-Switch mechanism for ORTHRUS to deal with contingent volume server failures. Finally, we design a strategy that dynamically balances load between multiple volume servers. We characterize machine capability and load quantity with a black-box model, and implement a dynamic load balancing strategy based on a genetic algorithm. Extensive experimental results show that the aggregated I/O throughputs of ORTHRUS are significantly improved (approximately two times that in Orthrus), and both I/O throughputs and IOPS are also remarkably improved (about 1.8 and 1.2 times, respectively) by our dynamic load balancing strategy.

6.
Cloud computing is becoming the new generation computing infrastructure, and many cloud vendors provide different types of cloud services. How to choose the best cloud services for specific applications is very challenging. Addressing this challenge requires balancing multiple factors, such as business demands, technologies, policies and preferences in addition to the computing requirements. This paper recommends a mechanism for selecting the best public cloud service at the levels of Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). A systematic framework and associated workflow include cloud service filtration, solution generation, evaluation, and selection of public cloud services. Specifically, we propose the following: a hierarchical information model for integrating heterogeneous cloud information from different providers and a corresponding cloud information collecting mechanism; a cloud service classification model for categorizing and filtering cloud services and an application requirement schema for providing rules for creating application-specific configuration solutions; and a preference-aware solution evaluation model for evaluating and recommending solutions according to the preferences of application providers. To test the proposed framework and methodologies, a cloud service advisory tool prototype was developed, after which relevant experiments were conducted. The results show that the proposed system collects, updates, and records the cloud information from multiple mainstream public cloud services in real time; generates feasible cloud configuration solutions according to user specifications and acceptable cost prediction; assesses solutions from multiple aspects (e.g., computing capability, potential cost, and Service Level Agreement, SLA) and offers rational recommendations based on user preferences and practical cloud provisioning; and visually presents and compares solutions through an interactive web Graphical User Interface (GUI).

7.
Cloud computing is a relatively recent computing paradigm that is often the answer for dealing with large amounts of data. Tenants expect cloud providers to keep supplying an agreed-upon quality of service, while cloud providers aim to increase profits, as profit is a key ingredient of any economic enterprise. In this paper, we propose a data replication strategy for cloud systems that satisfies the response time objective for executing queries while simultaneously enabling the provider to return a profit from each execution. The proposed strategy estimates the response time of the queries and performs data replication in such a way that the execution of any particular query is still estimated to be profitable for the provider. We show with simulations how the proposed strategy fulfills these two criteria.

8.
Existing work on MapReduce task scheduling with deadline constraints takes into account neither the differences between Map and Reduce tasks nor the cluster’s heterogeneity. This paper proposes MTSD, an extended MapReduce Task Scheduling algorithm for Deadline constraints on the Hadoop platform. It allows the user to specify a job’s deadline and tries to finish the job before that deadline. MTSD includes a node classification algorithm that measures each node’s computing capacity and classifies the nodes of a heterogeneous cluster into several levels. Based on this algorithm, we first present a novel data distribution model that distributes data according to each node’s capacity level. The experiments show that the node classification algorithm noticeably improves data locality compared with the default scheduler and can also improve the locality of other schedulers. Second, we calculate each task’s average completion time based on the node level, which improves the precision of the estimate of a task’s remaining time. Finally, MTSD provides a mechanism to decide which job’s task should be scheduled by calculating the Map and Reduce task slot requirements.

9.
In heterogeneous distributed computing systems such as cloud computing, mapping tasks to resources is a major issue that can strongly affect system performance. Because of heterogeneous and dynamic features and the dependencies among requests, task scheduling is known to be an NP-complete problem. In this paper, we propose a hybrid heuristic method (HSGA), based on a genetic algorithm, to find a suitable schedule for a workflow graph; it aims to obtain a response quickly while optimizing makespan, load balancing on resources, and speedup ratio. First, the HSGA algorithm prioritizes the tasks in a complex graph according to their impact on other tasks, based on the graph topology. This technique is effective in reducing the completion time of the application. Then, it merges the Best-Fit and Round Robin methods to build an optimal initial population so that a good solution is reached quickly, and applies suitable operations such as mutation to steer the algorithm toward an optimized solution. The algorithm evaluates solutions by considering relevant parameters of the cloud environment. Finally, compared with the other algorithms studied, the proposed algorithm gives better results as the number of tasks in the application graph increases.

10.
There are typically multiple heterogeneous servers providing various services in cloud computing. High power consumption of these servers increases the cost of running a data center. Thus, there is a problem of reducing the power cost with tolerable performance degradation. In this paper, we optimize the performance and power consumption tradeoff for multiple heterogeneous servers. We consider the following problems: (1) optimal job scheduling with fixed service rates; (2) joint optimal service speed scaling and job scheduling. For problem (1), we present the Karush-Kuhn-Tucker (KKT) conditions and provide a closed-form solution. For problem (2), both continuous speed scaling and discrete speed scaling are considered. In discrete speed scaling, the feasible service rates are discrete and bounded. We formulate the problem as an MINLP problem and propose a distributed algorithm by online value iteration, which has lower complexity than a centralized algorithm. Our approach provides an analytical way to manage the tradeoff between performance and power consumption. The simulation results show the gain of using speed scaling, and also prove the effectiveness and efficiency of the proposed algorithms.

11.
Nowadays, complex smartphone applications are being developed that support gaming, navigation, video editing, augmented reality, and speech recognition, all of which require considerable computational power and battery lifetime. Cloud computing provides a brand new opportunity for the development of mobile applications. Mobile Hosts (MHs) are provided with data storage and processing services on a cloud computing platform rather than on the MHs themselves. To provide a seamless connection and reliable cloud service, we focus on communication. When the number of connections to a cloud server increases explosively, the connection quality of each MH declines. This causes several problems, such as network delay and retransmission. In this paper, we propose a proxy-based architecture to improve link performance for each MH in mobile cloud computing. With the proposed proxy, the MH need not maintain a connection to the cloud server because it connects only to one of the proxies in the same subnet. We also propose an optimal access network discovery algorithm to optimize bandwidth usage. When the MH changes its point of attachment, the proposed discovery algorithm helps it connect to the optimal access network for cloud service. Experimental results and analysis show that the proposed connection management method performs better than the 802.11 access method.

12.
Concentrating on a single resource cannot efficiently achieve high overall utilization of resources in cloud data centers. Nowadays, the multiple-resource scheduling problem is attracting more attention from researchers. Some studies have made progress in multi-resource scenarios. However, these earlier heuristics have obvious limitations in complex software-defined cloud environments. Focusing on energy conservation and load balancing, we propose a preciousness model for multiple-resource scheduling in this paper. We formulate the problem and propose an innovative strategy (P-Aware). In P-Aware, a special algorithm, PMDBP (Proportional Multi-dimensional Bin Packing), is applied in the multi-dimensional bin packing approach. In this algorithm, multiple resources are consumed in a proportional way. The structure and details of PMDBP are discussed in this paper. Extensive experiments demonstrate that our strategy outperforms others in both efficiency and load balancing. P-Aware has now been implemented in the resource management system of our partner company to cut energy consumption and reduce resource contention.
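
The abstract does not give the details of PMDBP, so the following is only a generic sketch of multi-dimensional bin packing in which a placement is preferred when it keeps per-resource utilization on a host as even as possible. The function name `place_vms`, the "smallest utilization spread" rule, and the example demands are assumptions for illustration, not the paper's algorithm.

```python
def place_vms(vms, capacity):
    """Assign resource demands to identical hosts.

    vms: list of demand tuples, e.g. (cpu, memory).
    capacity: tuple with the per-host capacity in each dimension.
    Returns a list of hosts, each a list of indices into vms.
    """
    hosts = []  # each host: {"used": [per-dimension usage], "vms": [indices]}
    for idx, demand in enumerate(vms):
        if any(d > c for d, c in zip(demand, capacity)):
            raise ValueError(f"demand {demand} exceeds host capacity {capacity}")

        best, best_spread = None, None
        for host in hosts:
            new_used = [u + d for u, d in zip(host["used"], demand)]
            if any(n > c for n, c in zip(new_used, capacity)):
                continue  # does not fit in at least one dimension
            # Spread between the most and least utilized dimension after placement.
            util = [n / c for n, c in zip(new_used, capacity)]
            spread = max(util) - min(util)
            if best is None or spread < best_spread:
                best, best_spread = host, spread

        if best is None:
            # No open host can take this demand: open a new one.
            best = {"used": [0.0] * len(capacity), "vms": []}
            hosts.append(best)

        best["used"] = [u + d for u, d in zip(best["used"], demand)]
        best["vms"].append(idx)
    return [host["vms"] for host in hosts]


if __name__ == "__main__":
    # CPU-heavy and memory-heavy demands pair up on the same host.
    print(place_vms([(8, 2), (2, 8), (8, 2), (2, 8)], (10, 10)))
```

Pairing complementary demands on one host is one simple way to consume several resources proportionally, so that no single dimension is exhausted while the others sit idle.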

13.
With the popularization and development of cloud computing, many scientific computing applications are run in cloud environments. However, the current application scenarios of scientific computing are also becoming increasingly dynamic and complicated, with unpredictable job submission times, different job priorities, and deadline and budget constraints on job execution. Thus, how to perform scientific computing efficiently in the cloud has become an urgent problem. To address this problem, we design an elastic resource provisioning and task scheduling mechanism to perform scientific workflow jobs in the cloud. The goal of this mechanism is to complete as many high-priority workflow jobs as possible under budget and deadline constraints. This mechanism consists of four steps: job preprocessing, job admission control, elastic resource provisioning and task scheduling. We perform the evaluation with four kinds of real scientific workflow jobs under different budget constraints. We also consider the uncertainties of task runtime estimations, provisioning delays, and failures in the evaluation. The results show that in most cases our mechanism achieves better performance than other mechanisms. In addition, the uncertainties of task runtime estimations, VM provisioning delays, and task failures do not have a major impact on the mechanism’s performance.

14.
Cloud computing is a computational model in which resource providers can offer on-demand services to clients in a transparent way. However, to be able to guarantee quality of service without limiting the number of accepted requests, providers must be able to dynamically manage the available resources so that they can be optimized. This dynamic resource management is not a trivial task, since it involves meeting several challenges related to workload modeling, virtualization, performance modeling, deployment and monitoring of applications on virtualized resources. This paper carries out a performance evaluation of a module for resource management in a cloud environment that includes handling available resources during execution time and ensuring the quality of service defined in the service level agreement. An analysis was conducted of different resource configurations to define which dimension of resource scaling has a real influence on client requests. The results were used to model and implement a simulated cloud system, in which the allocated resource can be changed on-the-fly, with a corresponding change in price. In this way, the proposed module seeks to satisfy both the client by ensuring quality of service, and the provider by ensuring the best use of resources at a fair price.

15.
This paper proposes solutions to monitor and balance the load of a cloud data center. The proposed solutions work in two phases, and graph-theoretical concepts are applied in both. In the first phase, the cloud data center is modeled as a network graph. This network graph is augmented with the minimum dominating set concept of graph theory for monitoring its load. For constructing the minimum dominating set, this paper proposes a new variant of the minimum dominating set (V-MDS) algorithm and compares it with the existing construction algorithms proposed by Rooji and Fomin. The V-MDS approach of querying cloud data center load information is compared with the Central monitor approach. The second phase focuses on system- and network-aware live virtual machine migration for load balancing the cloud data center. For this, a new system- and traffic-aware live VM migration for load balancing (ST-LVM-LB) algorithm is proposed and compared with the existing benchmark algorithms, dynamic management algorithm (DMA) and Sandpiper. To study the performance of the proposed algorithms, the CloudSim3.0.3 simulator is used. The experimental results show that the V-MDS algorithm has quadratic time complexity, whereas the Rooji and Fomin algorithms have exponential time complexity. The experimental results also show that the V-MDS approach for querying cloud data center load information halves the number of message updates compared with the Central monitor approach. The load-balancing experiments show that the ST-LVM-LB algorithm triggers fewer virtual machine migrations and takes less time and migration cost to migrate, with minimum network overhead. Thus the proposed algorithms improve the service delivery performance of the cloud data center by incorporating graph-theoretical solutions in monitoring and balancing the load.
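
To make the dominating-set idea concrete: a set of monitor nodes dominates the graph when every node is either a monitor or adjacent to one, so load reports only need to travel one hop. The sketch below uses the classic greedy approximation, which is not the V-MDS algorithm from the paper (its details are not in the abstract); the function name and the example graph are assumptions.

```python
def greedy_dominating_set(graph):
    """Return a dominating set of an undirected graph.

    graph: dict mapping each node to the set of its neighbours.
    Greedy rule: repeatedly pick the node whose closed neighbourhood
    covers the most nodes that are not yet dominated.
    """
    undominated = set(graph)
    chosen = []
    while undominated:
        # sorted() makes tie-breaking deterministic (smallest node wins).
        best = max(
            sorted(graph),
            key=lambda n: len(({n} | graph[n]) & undominated),
        )
        chosen.append(best)
        undominated -= {best} | graph[best]
    return chosen


if __name__ == "__main__":
    # A path of five hosts: 1 - 2 - 3 - 4 - 5
    path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
    print(greedy_dominating_set(path))
```

The greedy choice does not guarantee a minimum set, but every pass dominates at least one new node, so the loop always terminates with a valid set of monitors.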

16.
Power management is becoming very important in data centers. To apply power management in cloud computing, Green Computing has been proposed and considered. Cloud computing is one of the promising new techniques that appeal to many big companies. Because of its dynamic structure and the nature of its online services, cloud computing differs from current data centers in terms of power management. To better manage the power consumption of web services in cloud computing with dynamic user locations and behaviors, we propose a power budgeting design based on the logical level, using distribution trees. By setting up multiple trees or a forest, we can differentiate and analyze the effect of workload types and Service Level Agreements (SLAs, e.g. response time) in terms of power characteristics. Based on these, we introduce classified power capping for different services as the control reference to maximize power saving when there are mixed workloads.

17.
Majed, Ali; Raji, Fatemeh; Miri, Ali. Cluster Computing, 2022, 25(1): 401-416

Data availability represents one of the primary functionalities of any cloud storage system since it ensures uninterrupted access to data. A common solution used by service providers to increase data availability and improve cloud performance is data replication. In this paper, we present a dynamic data replication strategy that is based on a hybrid peer-to-peer cloud architecture. Our proposed strategy selects the most popular data for replication. To determine the proper nodes for storing popular data, we employ not only the feature specifications of storage nodes, but also the relevant structural positions in the cloud network. Our simulation results show the impact of using features such as data popularity and structural characteristics in improving network performance, balancing the storage nodes, and reducing user response time.

18.
The state-of-the-art indexing mechanisms for distributed cloud data management systems cannot support complex queries, such as multi-dimensional queries and range queries. To solve this problem, we propose a multi-dimensional indexing mechanism named PR-Chord to support complex queries. PR-Chord is composed of a global index named PR-Index and the Chord network. The multi-dimensional space formed by the range of the multi-dimensional data is divided equally into hyper-rectangle spaces. The PR-Index is a hierarchical index structure, based on an improved PR quadtree, that indexes these spaces. A complex query is transformed into a query on the leaf nodes of PR-Index. We design query, insertion and deletion algorithms to support complex queries. Since PR-Index does not store the multi-dimensional data, its maintenance cost is zero. PR-Chord has the advantages of load balancing and algorithmic simplicity. The experimental results demonstrate that PR-Chord has good query efficiency.
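
The step of dividing the data space equally into hyper-rectangles and turning a range query into a lookup over the affected cells can be sketched as follows. This uses a flat grid rather than the improved PR quadtree and leaves out the Chord network entirely; `cell_of`, `cells_for_range`, and the bounds and split counts in the example are assumptions for illustration only.

```python
from itertools import product


def cell_of(point, lows, highs, splits):
    """Return the grid cell (one index per dimension) that contains point.

    lows/highs bound the data space; splits gives the number of equal
    slices per dimension.
    """
    cell = []
    for x, lo, hi, k in zip(point, lows, highs, splits):
        if not lo <= x <= hi:
            raise ValueError(f"coordinate {x} outside [{lo}, {hi}]")
        width = (hi - lo) / k
        # Points on the upper boundary belong to the last slice.
        cell.append(min(int((x - lo) / width), k - 1))
    return tuple(cell)


def cells_for_range(q_low, q_high, lows, highs, splits):
    """Return every cell that overlaps the query box [q_low, q_high]."""
    first = cell_of(q_low, lows, highs, splits)
    last = cell_of(q_high, lows, highs, splits)
    return list(product(*(range(a, b + 1) for a, b in zip(first, last))))


if __name__ == "__main__":
    lows, highs, splits = (0, 0), (100, 100), (4, 4)
    print(cell_of((30, 80), lows, highs, splits))
    print(cells_for_range((20, 20), (60, 30), lows, highs, splits))
```

In a distributed setting each returned cell identifier would then be resolved to the node responsible for it, which is the role the abstract assigns to the Chord network.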

19.
As DNA sequencing outpaces improvements in computer speed, there is a critical need to accelerate tasks like alignment and SNP calling. Crossbow is a cloud-computing software tool that combines the aligner Bowtie and the SNP caller SOAPsnp. Executing in parallel using Hadoop, Crossbow analyzes data comprising 38-fold coverage of the human genome in three hours using a 320-CPU cluster rented from a cloud computing service for about $85. Crossbow is available from .

20.
With the rapid development of cloud computing techniques, the number of users is undergoing exponential growth. It is difficult for traditional data centers to perform many tasks in real time because of the limited bandwidth of resources. The concept of fog computing is proposed to support traditional cloud computing and to provide cloud services. In fog computing, the resource pool is composed of sporadic distributed resources that are more flexible and movable than a traditional data center. In this paper, we propose a fog computing structure and present a crowd-funding algorithm to integrate spare resources in the network. Furthermore, to encourage more resource owners to share their resources with the resource pool and to supervise the resource supporters as they actively perform their tasks, we propose an incentive mechanism in our algorithm. Simulation results show that our proposed incentive mechanism can effectively reduce the SLA violation rate and accelerate the completion of tasks.
