Similar Documents — 20 results found
1.
MapReduce offers an easy-to-use programming paradigm for processing large data sets, making it an attractive model for opportunistic compute resources. However, unlike the dedicated resources where MapReduce has mostly been deployed, opportunistic resources have significantly higher rates of node volatility. As a consequence, the data and task replication scheme adopted by existing MapReduce implementations is woefully inadequate on such volatile resources. In this paper, we propose MOON, short for MapReduce On Opportunistic eNvironments, which is designed to offer a reliable MapReduce service for opportunistic computing. MOON adopts a hybrid resource architecture that supplements opportunistic compute resources with a small set of dedicated resources, and it extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms that exploit this hybrid architecture. Our results on an emulated opportunistic computing system running atop a 60-node cluster demonstrate that MOON delivers significant performance improvements over Hadoop on volatile compute resources and even finishes jobs that cannot complete in Hadoop.

2.
In this study, we address the meta-task scheduling problem in heterogeneous computing (HC) systems: finding a task assignment that minimizes the schedule length of a meta-task composed of several independent tasks with no data dependencies. The fact that the meta-task scheduling problem in HC systems is NP-hard has motivated the development of many heuristic scheduling algorithms. These heuristic algorithms, however, neglect the stochastic nature of task execution times and minimize a deterministic objective function, namely the maximum of the expected values of machine loads. In contrast to existing heuristics, we account for this stochastic nature by modeling task execution times as random variables. We then formulate a stochastic scheduling problem whose objective is to minimize the expected value of the maximum of machine loads. We prove that this new objective is underestimated by the deterministic objective function, and that an optimal task assignment obtained with respect to the deterministic objective can be inefficient on a real computing platform. To solve the stochastic scheduling problem, we develop a genetic-algorithm-based scheduling heuristic. Our extensive simulation studies show that the proposed genetic algorithm produces better task assignments than existing heuristics; in particular, we observe an improvement of up to 61% over the relative cost heuristic (M.-Y. Wu and W. Shu, A high-performance mapping algorithm for heterogeneous computing systems, in: Int. Parallel and Distributed Processing Symposium, San Francisco, CA, April 2001).
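The gap between the two objectives above is an instance of Jensen's inequality: the maximum is a convex function, so for random machine loads L_j (the sum of the random execution times of the tasks assigned to machine j),

    \max_j \mathbb{E}[L_j] \;\le\; \mathbb{E}\big[\max_j L_j\big]

The left-hand side is the deterministic objective used by existing heuristics and the right-hand side is the stochastic objective proposed here, which is why an assignment that is optimal under the deterministic model can still be inefficient in expectation. (The symbol L_j is introduced here only for illustration; the paper's own notation may differ.)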

3.
Video data has grown to an enormous volume: taking a single city as an example, thousands of cameras are deployed, each collecting 24–48 GB of high-definition video per day, and the volume continues to grow rapidly. The collected data also comes in a variety of formats, including multimedia, images and other unstructured data, while the valuable information is contained in only a few frames, the so-called key frames, of this massive video. The remaining problem is how to speed up the processing of large amounts of raw video so as to improve the effectiveness of crime prediction and detection for police and other users. In this paper, we propose a novel architecture for a next-generation public security system. A "front + back" pattern is adopted to address the problems caused by the redundant construction of current public security information systems; it consolidates multiple IT resources and provides a unified computing and storage environment for more complex data analysis and applications such as data mining and semantic reasoning. Under this architecture, we introduce cloud computing technologies such as distributed storage and computing and retrieval over huge, heterogeneous data sets, and we provide several optimization strategies to improve resource utilization and task efficiency. The paper also presents a novel strategy for generating super-resolution images via multi-stage dictionaries trained by a cascade training process. Extensive experiments on image super-resolution validate that the proposed solution obtains much better results than several state-of-the-art methods.

4.
Grid computing uses distributed, interconnected computers and resources collectively to achieve higher-performance computing and resource sharing. Task scheduling is one of the core steps in efficiently exploiting the capabilities of a Grid environment. Recently, heuristic algorithms have been successfully applied to task scheduling on computational Grids. In this paper, the Gravitational Search Algorithm (GSA), one of the latest population-based metaheuristics, is used for task scheduling on computational Grids. The proposed method employs GSA to find the schedule with the minimum makespan and flowtime. We compare this approach with the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) methods. The results demonstrate that the benefits of GSA are its speed of convergence and its capability to obtain feasible schedules.

5.
In heterogeneous distributed computing systems such as cloud computing, mapping tasks to resources is a major issue that can have a large impact on system performance. Owing to factors such as heterogeneity, dynamic behaviour and the dependencies among requests, task scheduling is known to be an NP-complete problem. In this paper, we propose a hybrid heuristic method (HSGA), based on a genetic algorithm, to find a suitable schedule for a workflow graph quickly while optimizing makespan, load balancing on resources and speedup ratio. First, HSGA prioritizes the tasks of a complex graph according to their impact on other tasks, based on the graph topology; this technique is effective in reducing the completion time of an application. It then merges the Best-Fit and Round-Robin methods to build a good initial population and thus reach a good solution quickly, and applies suitable operators such as mutation to steer the algorithm toward an optimized solution. The algorithm evaluates solutions using parameters relevant to the cloud environment. Finally, the proposed algorithm yields better results than the other algorithms studied as the number of tasks in the application graph increases.

6.
Cloud computing has attracted significant attention from the research community because of the rapid migration of Information Technology services to its domain. Advances in virtualization technology have made cloud computing very popular by easing the deployment of application services. Tasks are submitted to cloud datacenters to be processed in a pay-as-you-go fashion. Task scheduling is one of the significant research challenges in the cloud computing environment: the current formulations of the task scheduling problem have been shown to be NP-complete, so finding the exact solution, especially for large problem sizes, is intractable, and the heterogeneous and dynamic nature of cloud resources makes optimum task scheduling non-trivial. Efficient task scheduling algorithms are therefore required for optimum resource utilization. Symbiotic Organisms Search (SOS) has been shown to perform competitively with Particle Swarm Optimization (PSO). The aim of this study is to optimize task scheduling in the cloud computing environment using a proposed Simulated Annealing (SA) based SOS (SASOS), in order to improve the convergence rate and solution quality of SOS. The SOS algorithm has a strong global exploration capability and uses few parameters; the systematic reasoning ability of SA is employed to find better solutions in local solution regions, hence adding exploration ability to SOS. We also propose a fitness function that takes into account the utilization level of virtual machines (VMs), which reduces both makespan and the degree of imbalance among VMs. The CloudSim toolkit was used to evaluate the efficiency of the proposed method on both synthetic and standard workloads. The simulation results show that the hybrid SOS performs better than SOS in terms of convergence speed, response time, degree of imbalance, and makespan.
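As a rough illustration of how a simulated-annealing step can be grafted onto a population-based scheduler such as SOS, the Python sketch below shows only the acceptance rule applied to a candidate VM assignment; the makespan-based fitness, the variable names and the cooling behaviour are illustrative assumptions, not the exact SASOS formulation.

    import math, random

    def makespan(schedule, task_len, vm_speed):
        # schedule[i] = index of the VM that task i is assigned to
        load = [0.0] * len(vm_speed)
        for task, vm in enumerate(schedule):
            load[vm] += task_len[task] / vm_speed[vm]
        return max(load)

    def sa_accept(current, candidate, task_len, vm_speed, temperature):
        # Always accept an improving schedule; accept a worse one with
        # probability exp(-delta / T), which shrinks as T cools down.
        delta = (makespan(candidate, task_len, vm_speed)
                 - makespan(current, task_len, vm_speed))
        if delta <= 0 or random.random() < math.exp(-delta / temperature):
            return candidate
        return current

In a hybrid of this kind, such a rule would be applied to the schedules produced by each SOS phase before they replace the incumbent solution, with the temperature lowered every iteration.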

7.
刘国波  戎恺  唐力  王伟民  周伟奇  韩宝龙  刘凯  黄洪 《生态学报》2022,42(24):10051-10059
The ecological environment is the foundation on which human survival and development depend, and building a smart management and service platform for urban ecological big data is a necessity for constructing ecological and beautiful cities. Taking Shenzhen as an example, and drawing on Internet-of-Things, mobile Internet, computer, database and WebGIS technologies, together with key techniques such as spatio-temporal geographic big-data integration and sharing, big-data mining and analysis, and cloud-terminal integrated business collaboration, combined with a library of urban ecosystem assessment, analysis and decision models/methods/countermeasures, we built an integrated, full-workflow ("data collection - information extraction - knowledge discovery - decision generation - rapid service") smart management and service platform for Shenzhen's ecological big data, on the basis of effectively integrating multi-source, heterogeneous ecological big data ("air - ground - network - statistics - crowdsourced"). The platform comprises an ecological field survey and data collection system and an urban ecological monitoring, assessment and decision-analysis system for professional users such as operational departments and researchers, as well as a Shenzhen ecological survey APP for the general public. The platform reveals, for the first time, the changes in the pattern, composition, processes, services and health of Shenzhen's different ecosystems since 1979, and raises the scientific quality of comprehensive environmental decision-making, the precision of environmental supervision, and the convenience of public environmental services in Shenzhen. It markedly lowers the difficulty for users of collecting and processing data and operating professional models, removes the bottlenecks of applying raw data, and increases the use of professional models in operational departments. In the future, relying on the "ecological big data + ecological professional models" approach to mine knowledge from data will be the key to smart, professional management of urban ecological big data and an important way to comprehensively raise the level of information services for urban environmental protection.

8.
The emergence of cloud computing has made it an attractive solution for large-scale data processing and storage applications. Cloud infrastructures provide users with remote access to powerful computing capacity, large storage space and high network bandwidth for deploying various applications. With the support of cloud computing, many large-scale applications have been migrated to cloud infrastructures instead of running on in-house local servers. Among these applications, continuous write applications (CWAs), such as online surveillance systems, can benefit significantly from the flexibility and advantages of cloud computing. However, because of characteristics such as continuous data writing and processing and high demands on data availability, cloud service providers need sophisticated models for provisioning resources that meet CWAs' demands while minimizing the operational cost of the infrastructure. In this paper, we present a novel architecture of multiple cloud service providers (CSPs), commonly referred to as a Cloud-of-Clouds. Based on this architecture, we propose two operational-cost-aware algorithms for provisioning cloud resources for CWAs, namely the neighboring optimal resource provisioning algorithm (NORPA) and the global optimal resource provisioning algorithm (GORPA), in order to minimize the operational cost and thereby maximize the revenue of CSPs. We validate the proposed algorithms through comprehensive simulations, comparing them against each other and against a commonly used and practically viable round-robin approach. The results demonstrate that NORPA and GORPA outperform the conventional round-robin algorithm by reducing the operational cost by up to 28% and 57%, respectively. The low complexity of the proposed cost-aware algorithms allows them to be applied to realistic Cloud-of-Clouds environments in industry as well as academia.

9.
Task scheduling is one of the most challenging aspects of improving the overall performance of cloud computing and optimizing cloud utilization and Quality of Service (QoS). This paper focuses on task scheduling optimization using a novel approach based on Dynamic dispatch Queues (TSDQ) and hybrid meta-heuristic algorithms. We propose two hybrid meta-heuristic algorithms: the first combines Fuzzy Logic with Particle Swarm Optimization (TSDQ-FLPSO), and the second combines Simulated Annealing with Particle Swarm Optimization (TSDQ-SAPSO). Several experiments have been carried out with an open-source simulator (CloudSim) using synthetic and real data sets from real systems. The experimental results demonstrate the effectiveness of the proposed approach; the best results are obtained with TSDQ-FLPSO compared to TSDQ-SAPSO and other existing scheduling algorithms, especially for high-dimensional problems. The TSDQ-FLPSO algorithm shows a clear advantage in terms of waiting time, queue length, makespan, cost, resource utilization, degree of imbalance, and load balancing.

10.
Sensor networks deployed in lakes and reservoirs, when combined with simulation models and expert knowledge from the global community, are creating deeper understanding of the ecological dynamics of lakes. However, the amount of data and the complex patterns in the data demand substantial compute resources and efficient data mining algorithms, both of which are beyond the realm of traditional limnological research. This paper uniquely adapts methods from computer science for application to data-intensive ecological questions, in order to provide ecologists with an approachable methodology to facilitate knowledge discovery in lake ecology. We apply a state-of-the-art time series mining technique based on symbolic representation (SAX) to high-frequency time series of phycocyanin (PHYCO) and chlorophyll (CHLORO) fluorescence, both of which are indicators of algal biomass in lakes, as well as model predictions of algal biomass (MODEL). We use data mining techniques to demonstrate that MODEL predicts PHYCO better than it predicts CHLORO. All time series have high redundancy, resulting in a relatively small subset of unique patterns. However, MODEL is much less complex than either PHYCO or CHLORO and fails to reproduce high biomass periods indicative of algal blooms. We develop a set of tools in R to enable motif discovery and anomaly detection within a single lake time series, and relationship study among multiple lake time series through distance metrics, clustering and classification. Furthermore, to improve computation times, we provision web services to launch R tools remotely on high performance computing (HPC) resources. Comprehensive experimental results on observational and simulated lake data demonstrate the effectiveness of our approach.
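For readers unfamiliar with SAX, the sketch below shows the symbolic conversion this kind of time-series mining rests on: z-normalisation, piecewise aggregate approximation, then mapping segment means to letters through Gaussian breakpoints. The segment count, four-letter alphabet and breakpoints are standard SAX defaults chosen for illustration, not parameters reported in the paper (whose tools are written in R).

    import numpy as np

    def sax(series, n_segments=8, alphabet="abcd"):
        # 1. z-normalise the series so Gaussian breakpoints apply.
        x = (series - np.mean(series)) / np.std(series)
        # 2. Piecewise Aggregate Approximation: mean of each equal-width segment.
        paa = np.array([seg.mean() for seg in np.array_split(x, n_segments)])
        # 3. Breakpoints splitting N(0,1) into four equiprobable regions.
        breakpoints = [-0.6745, 0.0, 0.6745]
        symbols = np.digitize(paa, breakpoints)
        return "".join(alphabet[s] for s in symbols)

    # Example: reduce a 1440-sample daily fluorescence trace to an 8-letter word.
    word = sax(np.sin(np.linspace(0.0, 6.28, 1440)))

Motif discovery and anomaly detection then operate on these short words rather than on the raw high-frequency series, which is what makes the approach computationally tractable.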

11.
Soundscapes contain important ecological information; they are highly real-time and information-dense and therefore of significant research value. In existing soundscape studies, collecting and analysing audio and the associated environmental parameters still requires a large amount of manual work and is time- and labour-consuming. Based on multi-sensor integration, edge computing and deep learning, we built an online collection and analysis system for soundscape big data, consisting of edge computing nodes and a central computing server. The system was technically validated over nearly one year at three experimental sites, achieving automated online collection, transmission and analysis of soundscape big data. It can cope with harsh outdoor environments and run continuous online collection and analysis according to task requirements, with good stability. Acoustic indices can reflect soundscape change, but because different indices emphasise different aspects, their temporal patterns differ considerably and they need to be used in combination. Acoustic-signature feature maps allow different sound sources to be identified intuitively, which is instructive for rapid species identification and sound-source classification. The high-dimensional soundscape feature maps extracted with the VGGish network distinguish soundscape changes across sites and times well, with high discrimination accuracy between sites and between day and night, and show potential for quickly and intuitively reflecting ecosystem type characteristics and ecosystem dynamics. Enriching the acoustic-signature library, optimising the soundscape feature-analysis neural network and building a shared long-term soundscape monitoring network would help extend the system to species identification, rapid biodiversity assessment and studies of organism-environment interaction mechanisms. This study provides, for the online collection of soundscape big data, ...

12.

In recent years, cloud computing has emerged as a technology for sharing resources with users. Because cloud computing is on-demand, the efficient use of resources such as memory, processors and bandwidth is a major challenge. Despite its advantages, cloud computing is sometimes not the right choice because of delays in responding to requests, which has led to the need for another technology called fog computing. Fog computing reduces traffic and latency by extending cloud services to the edge of the network, closer to users; it can schedule resources more efficiently and utilize them in ways that markedly improve the user experience. This paper surveys studies on scheduling in fog/cloud computing environments, focusing on work published in journals or conferences between 2015 and 2021. Following a systematic literature review (SLR), we selected 71 studies from four major scientific databases based on their relevance to our topic and classified them into five categories according to the parameters they trace and their focus area: 1—performance, 2—energy efficiency, 3—resource utilization, 4—performance and energy efficiency, and 5—performance and resource utilization simultaneously. Of these studies, 42.3% focused on performance, 9.9% on energy efficiency, 7.0% on resource utilization, 21.1% on both performance and energy efficiency, and 19.7% on both performance and resource utilization. Finally, we present challenges and open issues of resource scheduling methods in fog/cloud computing environments.


13.
In vivo calcium imaging through microendoscopic lenses enables imaging of neuronal populations deep within the brains of freely moving animals. Previously, a constrained matrix factorization approach (CNMF-E) has been suggested to extract single-neuronal activity from microendoscopic data. However, this approach relies on offline batch processing of the entire video data and is demanding both in terms of computing and memory requirements. These drawbacks prevent its applicability to the analysis of large datasets and closed-loop experimental settings. Here we address both issues by introducing two different online algorithms for extracting neuronal activity from streaming microendoscopic data. Our first algorithm, OnACID-E, presents an online adaptation of the CNMF-E algorithm, which dramatically reduces its memory and computation requirements. Our second algorithm proposes a convolution-based background model for microendoscopic data that enables even faster (real time) processing. Our approach is modular and can be combined with existing online motion artifact correction and activity deconvolution methods to provide a highly scalable pipeline for microendoscopic data analysis. We apply our algorithms on four previously published typical experimental datasets and show that they yield similar high-quality results as the popular offline approach, but outperform it with regard to computing time and memory requirements. They can be used instead of CNMF-E to process pre-recorded data with boosted speeds and dramatically reduced memory requirements. Further, they newly enable online analysis of live-streaming data even on a laptop.

14.
In this paper, task scheduling in MapReduce is considered for geo-distributed data centers on heterogeneous networks, taking adaptive heartbeats, job deadlines and data locality into account. Job deadlines are divided according to the maximum data volume of tasks. With these constraints, task scheduling is formulated as an assignment problem in each heartbeat, in which adaptive heartbeats are calculated from the processing times of tasks, jobs are sequenced according to the divided deadlines, and tasks are scheduled by the Hungarian algorithm. Taking into account both data transfer and processing times, the most suitable data center for all mapped jobs is determined in the reduce phase. Experimental results show that the proposed algorithms outperform existing ones, and the variants with sorted task-sequences perform better than those with random task-sequences.
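A minimal sketch of the per-heartbeat assignment step described above, using the Hungarian method as implemented in SciPy; the cost model (processing time plus locality-dependent transfer time) and the variable names are assumptions made for illustration rather than the authors' exact formulation.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def schedule_heartbeat(proc_time, transfer_time):
        # cost[i, j]: estimated completion cost of ready task i on free slot j,
        # combining processing time and (locality-dependent) data transfer time.
        cost = np.asarray(proc_time) + np.asarray(transfer_time)
        tasks, slots = linear_sum_assignment(cost)  # Hungarian algorithm
        return list(zip(tasks, slots))

    # Example: 3 ready tasks and 3 free slots in the current heartbeat.
    assignments = schedule_heartbeat(
        proc_time=[[4, 6, 5], [3, 8, 4], [7, 5, 6]],
        transfer_time=[[0, 2, 1], [1, 0, 3], [2, 1, 0]],
    )

The assignment minimizes the total cost over the heartbeat, which is what makes the deadline-ordered task sequences effective.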

15.

Background

Clinical decision support systems can effectively overcome the limitations of individual doctors' knowledge and reduce the possibility of misdiagnosis, thereby enhancing health care. Traditional genetic data storage and analysis methods based on stand-alone environments have limited scalability and struggle to meet the computational requirements imposed by the rapid growth of genetic data.

Methods

In this paper, we propose a distributed gene clinical decision support system named GCDSS and implement a prototype based on cloud computing technology. We also present CloudBWA, a novel distributed read mapping algorithm that leverages a batch processing strategy to map reads on Apache Spark.

Results

Experiments show that the distributed gene clinical decision support system GCDSS and the distributed read mapping algorithm CloudBWA have outstanding performance and excellent scalability. Compared with state-of-the-art distributed algorithms, CloudBWA achieves up to 2.63 times speedup over SparkBWA. Compared with stand-alone algorithms, CloudBWA with 16 cores achieves up to 11.59 times speedup over BWA-MEM with 1 core.

Conclusions

GCDSS is a distributed gene clinical decision support system based on cloud computing techniques; in particular, it incorporates a distributed genetic data analysis pipeline framework. To boost data processing in GCDSS, we propose CloudBWA, a novel distributed read mapping algorithm that leverages batch processing in the mapping stage on the Apache Spark platform.

16.
The performance of mobile devices, including smartphones and laptops, is steadily rising as prices fall sharply. Mobile devices are therefore changing from mere interfaces for requesting services into computing resources that provide and share services, thanks to their greatly improved performance. With the increasing number of mobile device users, the utilization of SNS (Social Networking Services) is also soaring. Applying SNS to the existing computing environment enables members of a social network to share computing services without further authentication. To use a mobile device as a computing resource, however, temporary network disconnections caused by user mobility and the various hardware/software faults that cause service disruption must be considered and resolved in order to support mobile users and meet their service requirements. Accordingly, we propose fault tolerance and QoS (Quality of Service) scheduling using CAN (Content Addressable Network) in Mobile Social Cloud Computing (MSCC). MSCC is a computing environment that integrates social-network-based cloud computing and mobile devices; in this environment, a mobile user can, through mobile devices, become a member of a social network through real-world relationships, and members share cloud services or data with other members without further authentication using their mobile devices. We use CAN as the underlying structure of MSCC to logically manage the locations of mobile devices. The fault tolerance and QoS scheduling consists of four sub-scheduling algorithms: malicious-user filtering, cloud service delivery, QoS provisioning, and replication and load-balancing. Under the proposed scheduling, a mobile device is used as a resource for providing cloud services, faults caused by user mobility or other reasons are tolerated, and user requirements for QoS are considered. We simulate scheduling both with and without CAN; the simulation results show that the proposed scheduling algorithm improves cloud service execution time, finish time and reliability, and reduces the cloud service error rate.

17.
MOTIVATION: High-resolution mass spectrometers generate large data files that are complex, noisy and require extensive processing to extract the optimal data from raw spectra. This processing is readily achieved in software and is often embedded in manufacturers' instrument control and data processing environments. However, the speed of this data processing is such that it is usually performed off-line, post data acquisition. We have been exploring strategies that would allow real-time advanced processing of mass spectrometric data, making use of the reconfigurable computing paradigm, which exploits the flexibility and versatility of Field Programmable Gate Arrays (FPGAs). This approach has emerged as a powerful solution for speeding up time-critical algorithms. We describe here a reconfigurable computing solution for processing raw mass spectrometric data generated by MALDI-ToF instruments. The hardware-implemented algorithms for de-noising, baseline correction, peak identification and deisotoping, running on a Xilinx Virtex 2 FPGA at 180 MHz, generate a mass fingerprint over 100 times faster than an equivalent algorithm written in C, running on a Dual 3 GHz Xeon workstation.

18.
Task scheduling for large-scale computing systems is a challenging problem. From the users' perspective, the main concern is the performance of the submitted tasks, whereas for the cloud service providers, reducing operational cost while providing the required service is critical. It is therefore important for task scheduling mechanisms to balance users' performance requirements and energy efficiency, because energy consumption is one of the major operational costs. We present a time-dependent value of service (VoS) metric that is maximized by a scheduling algorithm that takes into consideration the arrival time of a task while evaluating the value functions for completing the task at a given time and the task's energy consumption. We consider the variation in value for completing a task at different times, so that the value of energy reduction can change significantly between peak and non-peak periods. To determine the value of a task completion, we use completion time and energy consumption with soft and hard thresholds. We define the VoS for a given workload to be the sum of the values of all tasks executed during a given period of time. Our system model is based on virtual machines, where each task is assigned a resource configuration characterized by a number of homogeneous cores and an amount of memory. To schedule each task submitted to our system, we use an estimated-time-to-compute matrix and an estimated energy consumption matrix, both built from historical data. We design, evaluate, and compare our task scheduling methods and show that a significant improvement in energy consumption can be achieved with time-of-use-dependent scheduling algorithms. The simulation results show improvements in performance and energy values of up to 49% compared to schedulers that do not consider the value functions. Similarly, experimental results from running our value-based scheduling on an IBM blade server show up to 82% improvement in performance value, 110% improvement in energy value, and up to 77% improvement in VoS compared to schedulers that do not consider the value functions.
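A minimal sketch of a completion-time value function with soft and hard thresholds of the kind described above, together with the summation that turns per-task values into a workload-level VoS; the linear decay between thresholds and all parameter names are illustrative assumptions, not the paper's exact definitions.

    def task_value(completion_time, soft_threshold, hard_threshold, max_value):
        # Full value if the task finishes by the soft threshold, zero value
        # past the hard threshold, and a linear decay in between.
        if completion_time <= soft_threshold:
            return max_value
        if completion_time >= hard_threshold:
            return 0.0
        frac = (hard_threshold - completion_time) / (hard_threshold - soft_threshold)
        return max_value * frac

    def value_of_service(completions):
        # completions: list of (time_value, energy_value) pairs, one per
        # task executed in the scheduling window; VoS is their sum.
        return sum(tv + ev for tv, ev in completions)

An analogous thresholded function would be evaluated on each task's energy consumption, and the scheduler would pick assignments that maximize the resulting sum.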

19.
Asymmetric multicore processors have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high-performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications on clusters of commodity systems-on-chip. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric-static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain important gains in performance with respect to their architecture-oblivious counterparts while exploiting all the resources of the AMP to deliver considerable energy efficiency.
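As a rough illustration of the asymmetric-static idea, the sketch below splits the columns of the output matrix between the big and LITTLE clusters in proportion to their aggregate throughput, so that both clusters finish their panels at roughly the same time; the core counts and per-core rates are placeholders for illustration, not measurements from the Exynos 5422 experiments.

    def split_columns(n_cols, big_cores, little_cores, big_rate, little_rate):
        # Give each cluster a share of the n_cols columns of C proportional to
        # its aggregate throughput (cores x per-core rate), so that the two
        # clusters complete their partitions in roughly the same time.
        big_total = big_cores * big_rate
        little_total = little_cores * little_rate
        big_share = round(n_cols * big_total / (big_total + little_total))
        return big_share, n_cols - big_share

    # Placeholder numbers: 4 big + 4 LITTLE cores with roughly a 3x per-core gap.
    big_cols, little_cols = split_columns(4096, 4, 4, 12.0, 4.0)

A dynamic variant would instead hand out smaller column blocks on demand, which tolerates rate estimates that are off but adds scheduling overhead.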

20.
Geoscience observations and model simulations are generating vast amounts of multi-dimensional data. Effectively analyzing these data is essential for geoscience studies, but the task is challenging for geoscientists because processing the massive amounts of data is both computing- and data-intensive: the analytics require complex procedures and multiple tools. To tackle these challenges, a scientific workflow framework is proposed for big geoscience data analytics. The framework leverages cloud computing, MapReduce, and Service Oriented Architecture (SOA): HBase is adopted for storing and managing big geoscience data across distributed computers, a MapReduce-based algorithm framework is developed to support parallel processing of geoscience data, and a service-oriented workflow architecture is built to support on-demand, complex data analytics in the cloud environment. A proof-of-concept prototype tests the performance of the framework. Results show that this innovative framework significantly improves the efficiency of big geoscience data analytics by reducing data processing time as well as simplifying analytical procedures for geoscientists.
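To make the MapReduce-style processing concrete, here is a minimal Hadoop-streaming-style mapper/reducer pair that averages a gridded variable per cell; the tab-separated input format, the cell-id key and the averaging step are illustrative assumptions, not the framework's actual algorithms.

    import sys
    from itertools import groupby

    def mapper(lines):
        # Each input line: "<cell_id>\t<value>"; emit the pair unchanged so the
        # shuffle phase groups all values of a grid cell on one reducer.
        for line in lines:
            cell, value = line.rstrip("\n").split("\t")
            print(f"{cell}\t{value}")

    def reducer(lines):
        # Lines arrive sorted by key; average all values observed per grid cell.
        pairs = (line.rstrip("\n").split("\t") for line in lines)
        for cell, group in groupby(pairs, key=lambda kv: kv[0]):
            values = [float(v) for _, v in group]
            print(f"{cell}\t{sum(values) / len(values)}")

    if __name__ == "__main__":
        (mapper if sys.argv[1] == "map" else reducer)(sys.stdin)

In a workflow framework of the kind described, such map and reduce steps would read from and write back to the distributed store (HBase here), and be chained by the service-oriented workflow layer.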
