Similar Literature (20 results)
1.
Effective overlap of computation and communication is a well understood technique for latency hiding and can yield significant performance gains for applications on high-end computers. In this paper, we propose an instrumentation framework for message-passing systems to characterize the degree of overlap of communication with computation in the execution of parallel applications. The inability to obtain precise time-stamps for pertinent communication events is a significant problem, and is addressed by generation of minimum and maximum bounds on achieved overlap. The overlap measures can aid application developers and system designers in investigating scalability issues. The approach has been used to instrument two MPI implementations as well as the ARMCI system. The implementation resides entirely within the communication library and thus integrates well with existing approaches that operate outside the library. The utility of the framework is demonstrated by analyzing communication-computation overlap for micro-benchmarks and the NAS benchmarks, and the insights obtained are used to modify the NAS SP benchmark, resulting in improved overlap.
Vinod Tipparaju
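The instrumentation framework itself is not shown in the abstract; the sketch below only illustrates the underlying idea of estimating communication-computation overlap from aggregate timings. It assumes mpi4py and NumPy (neither is mentioned in the paper), a simple pairwise exchange, and an even number of ranks; the actual framework instruments the communication library internally and derives minimum and maximum bounds rather than a single estimate.

```python
# Illustrative sketch only: estimate how much of a compute phase is hidden
# behind a non-blocking exchange. mpi4py, the pairwise pattern, and the
# derived overlap estimate are assumptions, not the paper's instrumentation.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = rank ^ 1                          # requires an even number of ranks

sendbuf = np.ones(1 << 20, dtype='d')
recvbuf = np.empty_like(sendbuf)

def compute_phase():
    # Stand-in for application computation that could overlap communication.
    return float(np.sqrt(np.arange(1_000_000, dtype='d')).sum())

def exchange():
    return [comm.Isend(sendbuf, dest=peer), comm.Irecv(recvbuf, source=peer)]

t0 = MPI.Wtime(); MPI.Request.Waitall(exchange()); t_comm = MPI.Wtime() - t0
t0 = MPI.Wtime(); compute_phase();                  t_comp = MPI.Wtime() - t0

t0 = MPI.Wtime()
reqs = exchange()
compute_phase()                          # work issued while messages are in flight
MPI.Request.Waitall(reqs)
t_both = MPI.Wtime() - t0

# With only aggregate timings (no precise per-event timestamps) this is a
# rough point estimate; the paper instead bounds the achieved overlap.
overlap = max(0.0, t_comm + t_comp - t_both)
if rank == 0:
    print(f"comm={t_comm:.4f}s comp={t_comp:.4f}s both={t_both:.4f}s "
          f"estimated overlap={overlap:.4f}s")
```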

2.
Rapid prototyping of distributed systems can be achieved by integrating commercial off-the-shelf (COTS) components. With components as the building blocks, it is important to predict the performance of the system from the performance of its individual components. In this paper, performance prediction of a system consisting of a small number of components is investigated under different inter-component communication patterns and different numbers of threads provided by the components. Based on the experimental results, the proposed composition rules provide a reasonably accurate prediction of the performance of a system composed of these components.
Barrett R. Bryant

3.
One of the distinct characteristics of computing platforms shared by multiple users, such as clusters and computational grids, is heterogeneity within each computer and/or among computers. Temporal heterogeneity refers to variation, along the time dimension, of the computing power available for a task on a computer, and spatial heterogeneity refers to the variation among computers. When minimizing the average parallel execution time of a target task on a spatially heterogeneous computing system, it is not optimal to distribute the task in proportion to the average computing power available on each computer. In this paper, the effects of temporal and spatial heterogeneity on the performance of a target task are analyzed in terms of the mean and standard deviation of parallel execution time. Based on the analysis, an approach to load balancing that minimizes the average parallel execution time of a target task is described. The proposed approach, whose validity has been verified through simulation, considers temporal and spatial heterogeneity in addition to the average computing power of each computer.
Soo-Young Lee (Corresponding author)
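As a rough, assumed illustration of the point above (this is not the paper's model), the following Monte Carlo sketch simulates two computers with equal average power but different temporal variability. Because the parallel execution time is governed by the slowest computer, shifting some load away from the noisier computer reduces the average parallel execution time compared with a split based on mean power alone. All distributions and constants are invented for illustration.

```python
# Monte Carlo sketch (assumed model): mean parallel execution time is the mean
# of the slowest node's finish time, so temporal variability matters in
# addition to average computing power when partitioning the work.
import numpy as np

rng = np.random.default_rng(0)
total_work = 1000.0
mean_power = np.array([1.0, 1.0])     # equal average computing power
std_power = np.array([0.05, 0.40])    # node 1 fluctuates much more (assumed)

def mean_parallel_time(frac_node0, trials=20000):
    work = total_work * np.array([frac_node0, 1.0 - frac_node0])
    # Sample effective computing power per node and trial (clipped positive).
    power = np.clip(rng.normal(mean_power, std_power, size=(trials, 2)),
                    0.05, None)
    return (work / power).max(axis=1).mean()   # expected time of slowest node

print("split by mean power only (50/50):", round(mean_parallel_time(0.5), 1))

# Scan alternative splits; giving the noisier node less work helps.
best_time, best_frac = min((mean_parallel_time(f), f)
                           for f in np.linspace(0.45, 0.70, 26))
print(f"best split found: node 0 gets {best_frac:.2f} of the work, "
      f"mean parallel time {best_time:.1f}")
```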

4.
Service providers and their customers agree on certain quality-of-service guarantees through Service Level Agreements (SLAs). An SLA contains one or more Service Level Objectives (SLOs) that describe the agreed-upon quality requirements at the service level. Translating these SLOs into lower-level policies that can then be used for design and monitoring purposes is a difficult problem; the translation usually requires domain experts and the application of domain knowledge. In this article, we propose an approach that combines performance modeling with regression analysis to solve this problem. We demonstrate that our approach is practical and that it can be applied to different n-tier services. Our experiments show that for a typical 3-tier e-commerce application in a virtualized environment, the SLA can be met while improving CPU utilization by up to 3 times.
Yuan Chen
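The article's own models are not reproduced here; the sketch below is a hypothetical, much simplified illustration of combining a regression-based performance model with an SLO to derive a lower-level policy. The measurement data, the a + b/allocation model form, and the 200 ms SLO are all assumptions.

```python
# Hypothetical sketch: fit a regression-based performance model from CPU
# allocation to mean response time, then invert it to obtain the smallest
# allocation (a lower-level policy threshold) that still meets the SLO.
# The data points and the a + b/alloc model form are illustrative assumptions.
import numpy as np

alloc = np.array([0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0])                # CPU fraction
resp = np.array([420.0, 260.0, 190.0, 150.0, 128.0, 100.0, 85.0])    # response, ms

# Regress resp ~ a + b / alloc (queueing-style model form, assumed).
X = np.column_stack([np.ones_like(alloc), 1.0 / alloc])
(a, b), *_ = np.linalg.lstsq(X, resp, rcond=None)

slo_ms = 200.0                          # SLO: mean response time <= 200 ms
min_alloc = b / (slo_ms - a)            # invert the fitted model
print(f"fitted model: resp = {a:.1f} + {b:.1f}/alloc")
print(f"policy: allocate at least {min_alloc:.2f} CPU to meet the {slo_ms:.0f} ms SLO")
```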

5.
Previous studies have revealed that paravirtualization imposes minimal performance overhead on High Performance Computing (HPC) workloads while exposing numerous benefits for this field. In this study, we investigate the impact of paravirtualization on the performance of automatically tuned software systems. We compare peak performance, performance degradation in constrained-memory situations, performance degradation in multi-threaded applications, and inter-VM shared memory performance. For comparison purposes, we examine the proficiency of ATLAS, a quintessential example of an autotuning software system, in tuning the BLAS library routines for paravirtualized systems. Our results show that the combination of ATLAS and Xen paravirtualization delivers native execution performance and nearly identical memory hierarchy performance profiles in both single- and multi-threaded scenarios. Furthermore, we show that it is possible to achieve memory sharing among OS instances at native speeds. These results expose new benefits for memory-intensive applications arising from the ability to slim down the guest OS without affecting system performance. In addition, our findings support a novel and very attractive deployment scenario for computational science and engineering codes on virtual clusters and computational clouds.
Rich Wolski

6.
A novel approach, called Aeneas, for the real-time performance analysis of distributed programs with reliability constraints is proposed in this paper; it is based on the execution state of distributed programs. In Aeneas, two factors are central: the available data files and the transmission paths of each available data file. Algorithms are designed to find all transmission paths of each data file needed while the program executes, compute the transmission time of each path, derive an aggregate expression for the transmission time, and calculate the fastest and slowest response times of distributed programs with reliability constraints. To verify the feasibility and effectiveness of this approach, a series of experiments was conducted. The results show that it is feasible and efficient to evaluate the real-time performance of distributed software with reliability constraints.
Hai Jin

7.
We present a technique that controls the peak power consumption of a high-density server by implementing a feedback controller that uses precise, system-level power measurement to periodically select the highest performance state while keeping the system within a fixed power constraint. A control theoretic methodology is applied to systematically design this control loop with analytic assurances of system stability and controller performance, despite unpredictable workloads and running environments. In a real server we are able to control power over a 1 second period to within 1 W and over an 8 second period to within 0.1 W. Conventional servers respond to power supply constraint situations by using simple open-loop policies to set a safe performance level in order to limit peak power consumption. We show that closed-loop control can provide higher performance under these conditions and implement this technique on an IBM BladeCenter HS20 server. Experimental results demonstrate that closed-loop control provides up to 82% higher application performance compared to open-loop control and up to 17% higher performance compared to a widely used ad-hoc technique.
Malcolm Ware
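The paper's controller is designed with control-theoretic analysis on real system-level power measurements; the sketch below is only a toy illustration of the closed-loop principle, with an invented linear power model, measurement noise, and gain. It repeatedly measures power and nudges a continuous performance setting toward the highest level that keeps the measurement under the budget.

```python
# Toy closed-loop power capping sketch: measure power each control period and
# proportionally adjust the performance setting so the measurement converges
# to the power budget. The sensor model, slope estimate, and gain are assumed.
import random

def measure_power(perf):
    # Stand-in for a system-level power sensor: idle power plus a roughly
    # linear workload-dependent component, with measurement noise (assumed).
    return 55.0 + 45.0 * perf + random.uniform(-1.5, 1.5)

def power_capping_loop(budget_w=80.0, periods=40, gain=0.6, slope_est=45.0):
    perf = 1.0                                   # start at the highest setting
    for t in range(periods):
        power = measure_power(perf)
        error = budget_w - power                 # negative when over budget
        # Proportional feedback on the performance setting, clamped to [0, 1].
        perf = min(1.0, max(0.0, perf + gain * error / slope_est))
        print(f"t={t:02d} power={power:5.1f} W -> performance setting {perf:.2f}")

power_capping_loop()
```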

8.
Jablonka and Lamb's claim that evolutionary biology is undergoing a ‘revolution’ is queried. But the very concept of revolutionary change has uncertain application to a field organized in the manner of contemporary biology. The explanatory primacy of sequence properties is also discussed.
Peter Godfrey-Smith

9.
Most parallel machines, such as clusters, are space-shared in order to isolate batch parallel applications from each other and optimize their performance. However, this leads to low utilization or potentially long waiting times. We propose a self-adaptive approach to time-sharing such machines that provides isolation and allows the execution rate of an application to be tightly controlled by the administrator. Our approach combines a periodic real-time scheduler on each node with a global feedback-based control system that governs the local schedulers. We have developed an online system that implements our approach. The system takes as input a target execution rate for each application, and automatically and continuously adjusts the applications’ real-time schedules to achieve those rates with proportional CPU utilization. Target rates can be dynamically adjusted. Applications are performance-isolated from each other and from other work that is not using our system. We present an extensive evaluation that shows that the system remains stable with low response times, and that our focus on CPU isolation and control does not come at the significant expense of network I/O, disk I/O, or memory isolation.
Peter A. Dinda
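A minimal sketch of the feedback idea described above, not the actual system: the measured execution rate of an application is compared against its target rate, and the CPU share handed to the per-node periodic real-time scheduler is adjusted proportionally. The simulated rate model, gain, and clamping bounds are assumptions.

```python
# Minimal sketch of target-rate control: adjust the CPU share given to an
# application's periodic real-time schedule until its measured execution rate
# matches the administrator's target. The rate model and gain are assumed.

def measured_rate(cpu_share):
    # Stand-in for application progress feedback; a real system measures this
    # from the running application rather than from a closed-form model.
    return min(1.0, 1.15 * cpu_share)            # diminishing returns near 100%

def rate_control(target_rate=0.5, periods=25, gain=0.5):
    share = 0.9                                  # initial CPU reservation
    for t in range(periods):
        rate = measured_rate(share)
        error = target_rate - rate
        # Proportional correction, clamped to a feasible reservation.
        share = min(0.95, max(0.05, share + gain * error))
        print(f"t={t:02d} rate={rate:.2f} target={target_rate:.2f} "
              f"-> CPU share {share:.2f}")

rate_control()
```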

10.
Efficient and robust data streaming services are a critical requirement of emerging Grid applications, which are based on seamless interactions and coupling between geographically distributed application components. Furthermore, the dynamism of Grid environments and applications requires that these services be able to continually manage and optimize their operation based on system state and application requirements. This paper presents a design and implementation of such a self-managing data-streaming service based on online control strategies. A Grid-based fusion workflow scenario is used to evaluate the service and demonstrate its feasibility and performance.
Sherif Abdelwahed

11.
Predictive performance modelling of parallel component compositions (total citations: 1; self-citations: 0; citations by others: 1)
Large-scale scientific computing applications frequently make use of closely-coupled distributed parallel components. The performance of such applications is therefore dependent on the component parts and their interaction at run-time. This paper describes a methodology for predictive performance modelling and evaluation of parallel applications composed of multiple interacting components. In this paper, the fundamental steps and required operations involved in the modelling and evaluation process are identified—including component decomposition, component model combination, M×N communication modelling, dataflow analysis and overall performance evaluation. A case study is presented to illustrate the modelling process and the methodology is verified through experimental analysis.
Stephen A. Jarvis
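The paper's methodology is not reproduced here; as a loose, assumed illustration of combining per-component models with an M x N communication term, the sketch below adds an Amdahl-style compute model for each component to a simple latency/bandwidth redistribution cost. Every function form and constant is invented for illustration.

```python
# Hypothetical sketch of composing per-component performance models: each
# component contributes a modeled compute time, and an M x N redistribution
# term models data movement between the coupled components. All model forms
# and constants are invented for illustration.

def compute_time(work_flops, procs, t_flop=2e-9, serial_frac=0.05):
    # Amdahl-style per-component model (assumed form).
    return work_flops * t_flop * (serial_frac + (1.0 - serial_frac) / procs)

def mxn_redistribution(bytes_moved, m, n, latency=5e-6, bandwidth=1e9):
    # Each of the m senders exchanges a slice with each of the n receivers.
    return m * n * latency + bytes_moved / bandwidth

# Component A on 16 processes feeds component B on 8 processes.
t_a = compute_time(work_flops=2e10, procs=16)
t_comm = mxn_redistribution(bytes_moved=4e8, m=16, n=8)
t_b = compute_time(work_flops=1e10, procs=8)
print(f"predicted end-to-end time: {t_a + t_comm + t_b:.2f} s "
      f"(A {t_a:.2f} + MxN {t_comm:.2f} + B {t_b:.2f})")
```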

12.
The integration of multiple predictors promises higher prediction accuracy than can be obtained with a single predictor. The challenge is how to select the best predictor at any given moment. Traditionally, multiple predictors are run in parallel and the one that generates the best result is selected for prediction. In this paper, we propose a novel approach to predictor integration based on learning from historical predictions. Compared with the traditional approach, it does not require running all the predictors simultaneously. Instead, it uses classification algorithms such as k-Nearest Neighbor (k-NN) and Bayesian classification, and a dimension-reduction technique such as Principal Component Analysis (PCA), to forecast the best predictor for the workload under study based on the learning of historical predictions. Only the forecasted best predictor is then run for prediction. Our experimental results show that this approach achieved 20.18% higher best-predictor forecasting accuracy than the cumulative-MSE-based predictor selection approach used in the popular Network Weather Service system. In addition, it outperformed the observed most accurate single predictor in the pool for 44.23% of the performance traces.
Renato J. Figueiredo
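A toy sketch of the learning-based selection idea, not the paper's system: label each point of a trace with whichever predictor in a small pool was most accurate, then use k-NN over past (features, label) pairs to forecast the best predictor for the next step. The predictor pool, features, window length, and k are illustrative assumptions.

```python
# Toy sketch of learning-based predictor selection: forecast which predictor
# in the pool will be most accurate next, using k-NN over historical
# (features, best-predictor) pairs. Pool, features, W, and K are assumptions.
import numpy as np

rng = np.random.default_rng(1)
trace = np.sin(np.linspace(0.0, 20.0, 400)) + 0.1 * rng.standard_normal(400)

PREDICTORS = [
    ("last_value", lambda w: w[-1]),
    ("window_mean", lambda w: w.mean()),
    ("linear_trend", lambda w: w[-1] + (w[-1] - w[-2])),
]

def features(window):
    # Simple summary of the recent window: level, spread, local slope.
    return np.array([window.mean(), window.std(), window[-1] - window[-4]])

def best_label(window, actual):
    return int(np.argmin([abs(p(window) - actual) for _, p in PREDICTORS]))

W, K = 16, 5
X, y, hits, total = [], [], 0, 0
for t in range(W, len(trace) - 1):
    window, actual = trace[t - W:t], trace[t]
    f = features(window)
    if len(y) >= K:
        dist = np.linalg.norm(np.array(X) - f, axis=1)
        votes = np.array(y)[np.argsort(dist)[:K]]
        hits += int(np.bincount(votes).argmax() == best_label(window, actual))
        total += 1
    X.append(f)
    y.append(best_label(window, actual))

print(f"forecasted the best predictor correctly {hits}/{total} times")
```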

13.
14.
Over the years, we have seen a significant number of integration techniques that allow data warehouses to support web data. However, existing work focuses largely on design concepts. In this paper, we focus on the performance of a web database application: an integrated web data warehouse that uses a well-defined and uniform structure to handle web information sources, including semi-structured data such as XML and documents such as HTML. Using a case study, our prototype implements a web manipulation concept for both incoming sources and result outputs; the system can therefore not only be operated through the web, it can also handle the integration of web data sources and structured data sources. Our main contribution is a performance evaluation of the integrated web data warehouse application, which comprises two tasks. Task one verifies the correctness of the integrated data by checking the result sets retrieved from the web-integrated data warehouse system using complex and OLAP queries against the result sets retrieved from the existing independent data source systems. Task two measures the performance of OLAP and complex queries by investigating the source operation functions these queries use to retrieve the data; the information about the source operation functions used by each query is obtained with the TKPROF utility.
David Taniar

15.
Studies on the effects of a variety of exogenous and anthropogenic environmental factors, including endocrine disruptors, heavy metals, UV light, high temperature, and others, on marine organisms have been presented at the 2nd Bilateral Seminar Italy–Japan held in November 2006. Reports were discussed in order to reveal the current situation of marine ecosystems, aiming at evaluation and prediction of environmental risks.
V. Matranga

16.
The influences of the operating system and system-specific effects on application performance are increasingly important considerations in high performance computing. OS kernel measurement is key to understanding the performance influences and the interrelationship of system and user-level performance factors. The KTAU (Kernel TAU) methodology and Linux-based framework provides parallel kernel performance measurement from both a kernel-wide and process-centric perspective. The first characterizes overall aggregate kernel performance for the entire system. The second characterizes kernel performance when it runs in the context of a particular process. KTAU extends the TAU performance system with kernel-level monitoring, while leveraging TAU’s measurement and analysis capabilities. We explain the rationale and motivation behind our approach, describe the KTAU design and implementation, and show working examples on multiple platforms demonstrating the versatility of KTAU in integrated system/application monitoring.
Alan Morris

17.
Sub-Antarctic Marion Island has had a permanent research station for 50 years and the island's Wandering Albatrosses have been intensively studied for 20 years. The reactions of breeding birds to approaches by a human on foot were recorded. Three response variables were calculated: intensity of vocal reaction (IVR), intensity of non-vocal reaction (INR) and overall response index (ORI). At 5 m from the nest, twice as many birds stood and/or vocalised as at 15 m. Nearest neighbour distance, age and gender did not explain individual variability of responses. Study colony birds had higher IVR scores than non-study colony birds; birds at colonies closest to the station had the highest ORI scores. A better breeding record was associated with lower IVR and ORI scores, but a causative relationship remains to be demonstrated. A minimum viewing distance of 25 m is recommended for breeding Wandering Albatrosses.
Marienne S. de Villiers (Fax: +27-21-6503434)
John Cooper
Peter G. Ryan

18.
Scheduling mixed-parallel applications with advance reservations (total citations: 1; self-citations: 0; citations by others: 1)
This paper investigates the scheduling of mixed-parallel applications, which exhibit both task and data parallelism, in advance reservation settings. Both the problem of minimizing application turn-around time and that of meeting a deadline are studied. For each, several scheduling algorithms are proposed, some of which borrow ideas from previously published work in non-reservation settings. Algorithms are compared in simulation over a wide range of application and reservation scenarios. The main finding is that schedules computed using the previously published CPA algorithm can be adapted to advance reservation settings, notably resulting in low resource consumption and thus high efficiency.
Henri Casanova (Corresponding author)

19.
We investigate operating system noise, which we identify as one of the main reasons for a lack of synchronicity in parallel applications. Using a microbenchmark, we measure the noise on several contemporary platforms and find that, even with a general-purpose operating system, noise can be limited if certain precautions are taken. We then inject artificially generated noise into a massively parallel system and measure its influence on the performance of collective operations. Our experiments indicate that on extreme-scale platforms, the performance is correlated with the largest interruption to the application, even if the probability of such an interruption on a single process is extremely small. We demonstrate that synchronizing the noise can significantly reduce its negative influence.
Aroon Nataraj
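A small numerical illustration of the finding stated above, with invented probabilities and durations: because a collective operation completes only when its slowest participant arrives, an interruption that is rare on any single process becomes almost certain to hit some process at scale, and the mean collective completion time then tracks that largest interruption.

```python
# Monte Carlo sketch (assumed noise parameters): the probability that *some*
# process is interrupted before a collective grows quickly with scale, and the
# mean collective completion time follows the largest single interruption.
import numpy as np

rng = np.random.default_rng(2)
p_noise = 1e-4        # chance a given process is interrupted in this interval
noise_us = 2500.0     # duration of one interruption, microseconds (assumed)
base_us = 100.0       # noise-free time to reach the collective (assumed)
trials = 20000

for nprocs in (16, 256, 4096, 65536):
    # Number of interrupted processes per trial; the collective is delayed by
    # a full interruption whenever at least one participant is hit.
    k = rng.binomial(nprocs, p_noise, size=trials)
    mean_completion = (base_us + (k > 0) * noise_us).mean()
    p_any = 1.0 - (1.0 - p_noise) ** nprocs
    print(f"{nprocs:6d} procs: P(any interrupted)={p_any:5.3f}  "
          f"mean completion={mean_completion:7.1f} us")
```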

20.