20 similar documents found; search took 31 ms
1.
Aniruddha G. Shet, P. Sadayappan, David E. Bernholdt, Jarek Nieplocha, Vinod Tipparaju. Cluster Computing, 2008, 11(1): 75-90
Effective overlap of computation and communication is a well-understood technique for latency hiding and can yield significant
performance gains for applications on high-end computers. In this paper, we propose an instrumentation framework for message-passing
systems to characterize the degree of overlap of communication with computation in the execution of parallel applications.
The inability to obtain precise timestamps for pertinent communication events is a significant problem; it is addressed
by generating minimum and maximum bounds on the achieved overlap. The overlap measures can aid application developers and system
designers in investigating scalability issues. The approach has been used to instrument two MPI implementations as well as
the ARMCI system. The implementation resides entirely within the communication library and thus integrates well with existing
approaches that operate outside the library. The utility of the framework is demonstrated by analyzing communication-computation
overlap for micro-benchmarks and the NAS benchmarks, and the insights obtained are used to modify the NAS SP benchmark, resulting
in improved overlap.
Vinod Tipparaju
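The idea of bounding overlap when exact transfer times are unknown can be illustrated with a small sketch. It assumes, hypothetically and not as the paper's instrumentation, that a contiguous transfer of known duration is only known to lie somewhere inside a window bounded by two library events, and that the process's computation intervals are known. The achieved overlap is then a piecewise-linear function of the unknown start time, so its minimum and maximum occur at breakpoints, which the hypothetical function `overlap_bounds` enumerates.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def overlap_len(a: Interval, b: Interval) -> float:
    """Length of the intersection of two intervals (0.0 if they are disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def overlap_bounds(compute: List[Interval], window: Interval,
                   duration: float) -> Tuple[float, float]:
    """Min/max overlap of a transfer of `duration` placed anywhere in `window`.

    Hypothetical model: the transfer is contiguous, lies entirely inside
    `window`, and `compute` lists the intervals of local computation.
    """
    lo, hi = window
    if duration > hi - lo:
        raise ValueError("transfer does not fit inside the window")
    last_start = hi - duration
    # Overlap is piecewise linear in the start time, with breakpoints where the
    # transfer's start or end crosses an interval boundary, so checking these
    # candidate starts (plus the window ends) finds the exact extremes.
    candidates = {lo, last_start}
    for a, b in compute:
        for s in (a, b, a - duration, b - duration):
            if lo <= s <= last_start:
                candidates.add(s)
    overlaps = [sum(overlap_len((s, s + duration), c) for c in compute)
                for s in candidates]
    return min(overlaps), max(overlaps)


# Computation runs from t=2 to t=8; a 2-unit transfer lies somewhere in [0, 10].
print(overlap_bounds([(2.0, 8.0)], (0.0, 10.0), 2.0))  # (0.0, 2.0)
```

The gap between the two bounds shrinks as the window tightens around the computation, which is why more precise event timestamps yield a more informative overlap estimate.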
2.
Rapid prototyping of distributed systems can be achieved by integrating commercial off-the-shelf (COTS) components. With components
as the building blocks, it is important to predict the performance of the system based on the performance of individual components.
In this paper, performance prediction of a system consisting of a small number of components is investigated under different
inter-component communication patterns and different numbers of threads provided by the components. The experimental results
indicate that the proposed composition rules provide a reasonably accurate prediction of the performance of a system
built from these components.
Barrett R. Bryant
3.
A distinctive characteristic of computing platforms shared by multiple users, such as clusters and computational grids,
is heterogeneity within each computer and/or among computers. Temporal heterogeneity refers to variation, along the time
dimension, of the computing power available for a task on a computer, and spatial heterogeneity represents the variation among
computers. When minimizing the average parallel execution time of a target task on a spatially heterogeneous computing system, it is not optimal to distribute the target task in linear proportion
to the average computing power available on each computer. In this paper, the effects of temporal and spatial heterogeneity on
the performance of a target task are analyzed in terms of the mean and standard deviation of parallel execution time. Based
on the analysis results, a load-balancing approach for minimizing the average parallel execution time of a target task
is described. The proposed approach, whose validity has been verified through simulation, considers temporal and spatial heterogeneity
in addition to the average computing power on each computer.
Soo-Young Lee (Corresponding author)
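The claim that proportional distribution is not optimal under temporal heterogeneity can be explored with a Monte Carlo sketch. The model is an assumption, not taken from the paper: each computer's available power during a run is drawn independently from a normal distribution with a given mean and standard deviation (clipped to stay positive), a computer's finish time is its load divided by that power, and the parallel execution time is the largest finish time. `proportional_split` and `simulate_parallel_time` are hypothetical helpers.

```python
import random
import statistics
from typing import List, Sequence


def proportional_split(total_load: float, mean_power: Sequence[float]) -> List[float]:
    """Distribute load in linear proportion to average computing power."""
    total_power = sum(mean_power)
    return [total_load * mu / total_power for mu in mean_power]


def simulate_parallel_time(loads: Sequence[float], mean_power: Sequence[float],
                           std_power: Sequence[float], trials: int = 10_000,
                           seed: int = 0):
    """Monte Carlo estimate of the mean and std of parallel execution time.

    Hypothetical model: per trial, each computer's available power is drawn
    from N(mu, sigma), clipped to stay positive; the task finishes when the
    slowest computer does.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        finish_times = [
            load / max(1e-9, rng.gauss(mu, sigma))
            for load, mu, sigma in zip(loads, mean_power, std_power)
        ]
        samples.append(max(finish_times))
    return statistics.mean(samples), statistics.pstdev(samples)


mu, sigma = [1.0, 3.0], [0.5, 0.5]
print(simulate_parallel_time(proportional_split(100.0, mu), mu, sigma))
```

With all standard deviations set to zero, the proportional split equalizes finish times exactly; with nonzero deviations, alternative splits can be passed to `simulate_parallel_time` to compare their mean and standard deviation of parallel execution time under this assumed model.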
4.
Service providers and their customers agree on certain quality-of-service guarantees through Service Level Agreements (SLAs).
An SLA contains one or more Service Level Objectives (SLOs) that describe the agreed-upon quality requirements at the service
level. Translating these SLOs into lower-level policies that can then be used for design and monitoring purposes is a difficult
problem. This translation usually involves domain experts, because it often requires applying domain knowledge. In this article, we propose an approach that combines performance modeling with regression analysis to solve
this problem. We demonstrate that our approach is practical and that it can be applied to different n-tier services. Our experiments show that for a typical 3-tier e-commerce application in a virtualized environment, the SLA
can be met while improving CPU utilization by up to a factor of 3.
Yuan Chen
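One way to picture combining a performance model with regression is to fit a response-time curve to measurements and then invert it to obtain a resource-level policy that meets an SLO. The sketch below assumes, purely for illustration, that mean response time follows a + b / cpu_share (a rough queueing-style shape), fits a and b by ordinary least squares, and solves for the smallest CPU share that meets a response-time target. The model form and the function names are hypothetical and are not the authors' models.

```python
from typing import Sequence, Tuple


def fit_inverse_model(cpu_shares: Sequence[float],
                      response_times: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of response_time ~ a + b / cpu_share."""
    xs = [1.0 / s for s in cpu_shares]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(response_times) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, response_times))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def min_cpu_share_for_slo(a: float, b: float, target_rt: float) -> float:
    """Smallest CPU share whose predicted response time meets the target."""
    if b <= 0 or target_rt <= a:
        raise ValueError("SLO cannot be derived from this fitted model")
    return b / (target_rt - a)


# Synthetic measurements generated from a = 0.1 s, b = 0.5 s (no noise).
shares = [0.2, 0.4, 0.6, 0.8, 1.0]
rts = [0.1 + 0.5 / s for s in shares]
a, b = fit_inverse_model(shares, rts)
print(round(min_cpu_share_for_slo(a, b, target_rt=1.1), 3))  # 0.5
```

Because predicted response time decreases as the share grows (for b > 0), any share at or above the returned value satisfies the target under this model; real measurements would be noisy and may call for a different model form.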
5.
Lamia Youseff, Keith Seymour, Haihang You, Dmitrii Zagorodnov, Jack Dongarra, Rich Wolski. Cluster Computing, 2009, 12(2): 101-122
Previous studies have revealed that paravirtualization imposes minimal performance overhead on High Performance Computing
(HPC) workloads, while exposing numerous benefits for this field. In this study, we investigate the impact of paravirtualization
on the performance of automatically-tuned software systems. We compare peak performance, performance degradation in constrained
memory situations, performance degradation in multi-threaded applications, and inter-VM shared memory performance. For comparison
purposes, we examine the proficiency of ATLAS, a quintessential example of an autotuning software system, in tuning the BLAS library routines for paravirtualized systems. Our results show that the combination of ATLAS and Xen paravirtualization delivers native execution performance and nearly identical memory hierarchy performance profiles
in both single and multi-threaded scenarios. Furthermore, we show that it is possible to achieve memory sharing among OS instances
at native speeds. These results expose new benefits to memory-intensive applications arising from the ability to slim down
the guest OS without influencing the system performance. In addition, our findings support a novel and very attractive deployment
scenario for computational science and engineering codes on virtual clusters and computational clouds.
Rich Wolski
6.
A novel approach called Aeneas, based on the execution state of distributed programs, is proposed in this paper.
It targets the real-time performance analysis of distributed programs with reliability constraints. Aeneas considers two
important factors: the available data files and the transmission paths of each available data file. Algorithms are designed
to find all the transmission paths of each data file needed while the program executes, compute the transmission time of each
transmission path, derive the aggregate expression of transmission time, and calculate the fastest and slowest
response times of distributed programs with reliability constraints. A series of experiments was conducted to demonstrate the feasibility and usability
of this approach. The results show that it is feasible and efficient to evaluate the
real-time performance of distributed software with reliability constraints in this way.
Hai Jin
7.
We present a technique that controls the peak power consumption of a high-density server by implementing a feedback controller
that uses precise, system-level power measurement to periodically select the highest performance state while keeping the system
within a fixed power constraint. A control theoretic methodology is applied to systematically design this control loop with
analytic assurances of system stability and controller performance, despite unpredictable workloads and running environments.
On a real server, we are able to control power over a 1-second period to within 1 W and over an 8-second period to within 0.1 W.
Conventional servers respond to power supply constraint situations by using simple open-loop policies to set a safe performance
level in order to limit peak power consumption. We show that closed-loop control can provide higher performance under these
conditions and implement this technique on an IBM BladeCenter HS20 server. Experimental results demonstrate that closed-loop
control provides up to 82% higher application performance compared to open-loop control and up to 17% higher performance compared
to a widely used ad hoc technique.
Malcolm Ware
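A minimal closed-loop power-capping sketch follows. It assumes a hypothetical linear plant in which server power equals idle power plus (peak minus idle) times a continuous throttle level times workload intensity, and it uses a simple integral controller that nudges the throttle toward the cap every control period. The paper's control-theoretic design, its discrete performance states, and its measurement hardware are not reproduced; the names and constants here are illustrative.

```python
from typing import List


def simulate_power_capping(cap_w: float, idle_w: float = 100.0, peak_w: float = 300.0,
                           gain: float = 0.002, intensity: float = 1.0,
                           steps: int = 50) -> List[float]:
    """Return the measured power at each control period under an integral controller.

    Hypothetical plant: power = idle_w + (peak_w - idle_w) * throttle * intensity,
    where throttle in [0, 1] stands in for the selected performance state.
    """
    throttle = 1.0  # start at the highest performance level
    trace = []
    for _ in range(steps):
        power = idle_w + (peak_w - idle_w) * throttle * intensity  # "measurement"
        trace.append(power)
        # Integral action: raise performance when under the cap, lower it when over.
        throttle += gain * (cap_w - power)
        throttle = min(1.0, max(0.0, throttle))  # stay within valid states
    return trace


capped = simulate_power_capping(cap_w=250.0)
print(capped[0], round(capped[-1], 3))  # 300.0 250.0
```

Under this linear model the loop converges when 0 < gain × (peak_w - idle_w) × intensity < 2; with the defaults that product is 0.4, so the tracking error shrinks by a factor of 0.6 each period. An open-loop policy, by contrast, would have to pick one fixed throttle level that is safe for the worst-case intensity.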
8.
Peter Godfrey-Smith. Biology & Philosophy, 2007, 22(3): 429-437
Jablonka and Lamb's claim that evolutionary biology is undergoing a ‘revolution’ is queried. But the very concept of revolutionary
change has uncertain application to a field organized in the manner of contemporary biology. The explanatory primacy of sequence
properties is also discussed.
Peter Godfrey-Smith
9.
Most parallel machines, such as clusters, are space-shared in order to isolate batch parallel applications from each other
and optimize their performance. However, this leads to low utilization or potentially long waiting times. We propose a self-adaptive
approach to time-sharing such machines that provides isolation and allows the execution rate of an application to be tightly controlled by the administrator.
Our approach combines a periodic real-time scheduler on each node with a global feedback-based control system that governs
the local schedulers. We have developed an online system that implements our approach. The system takes as input a target
execution rate for each application, and automatically and continuously adjusts the applications’ real-time schedules to achieve
those rates with proportional CPU utilization. Target rates can be dynamically adjusted. Applications are performance-isolated
from each other and from other work that is not using our system. We present an extensive evaluation that shows that the system
remains stable with low response times, and that our focus on CPU isolation and control does not come at a significant expense
to network I/O, disk I/O, or memory isolation.
Peter A. Dinda
10.
Viraj Bhat, Manish Parashar, Hua Liu, Nagarajan Kandasamy, Mohit Khandekar, Scott Klasky, Sherif Abdelwahed. Cluster Computing, 2007, 10(4): 365-383
Efficient and robust data streaming services are a critical requirement of emerging Grid applications, which are based on
seamless interactions and coupling between geographically distributed application components. Furthermore, the dynamism of
Grid environments and applications requires that these services be able to continually manage and optimize their operation
based on system state and application requirements. This paper presents the design and implementation of such a self-managing
data-streaming service based on online control strategies. A Grid-based fusion workflow scenario is used to evaluate the service
and demonstrate its feasibility and performance.
Sherif Abdelwahed
11.
Large-scale scientific computing applications frequently make use of closely-coupled distributed parallel components. The
performance of such applications is therefore dependent on the component parts and their interaction at run-time. This paper
describes a methodology for predictive performance modelling and evaluation of parallel applications composed of multiple
interacting components. The fundamental steps and required operations involved in the modelling and evaluation
process are identified, including component decomposition, component model combination, M×N communication modelling, dataflow analysis and overall performance evaluation. A case study is presented to illustrate the
modelling process and the methodology is verified through experimental analysis.
Stephen A. Jarvis
12.
The integration of multiple predictors promises higher prediction accuracy than can be obtained with a single
predictor. The challenge is how to select the best predictor at any given moment. Traditionally, multiple predictors are run
in parallel and the one that generates the best result is selected for prediction. In this paper, we propose a novel approach
for predictor integration based on the learning of historical predictions. Unlike the traditional approach, it does
not require running all the predictors simultaneously. Instead, it uses classification algorithms such as k-Nearest Neighbor
(k-NN) and Bayesian classification, and dimension-reduction techniques such as Principal Component Analysis (PCA), to forecast the best predictor for the workload under study based on the learning of historical predictions. Only the
forecasted best predictor is then run for prediction. Our experimental results show that it achieved 20.18% higher best-predictor
forecasting accuracy than the cumulative-MSE-based predictor selection approach used in the popular Network Weather Service
system. In addition, it outperformed the most accurate single predictor observed in the pool on 44.23% of the performance
traces.
Renato J. Figueiredo
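The predictor-integration idea can be sketched with k-NN alone. All of the following are hypothetical choices for illustration: the feature vector is the most recent window of observations, the pool holds three simple predictors (last value, window mean, linear trend), and each historical window is labeled with the predictor that had the smallest absolute error on the next observation. Only the predictor chosen by a k-NN vote is run for a new window. The Bayesian classifier and the PCA step mentioned in the abstract are omitted.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def last_value(w: Sequence[float]) -> float:
    return w[-1]


def window_mean(w: Sequence[float]) -> float:
    return sum(w) / len(w)


def linear_trend(w: Sequence[float]) -> float:
    # Extrapolate one step using the average slope across the window.
    return w[-1] + (w[-1] - w[0]) / (len(w) - 1)


POOL: List[Callable[[Sequence[float]], float]] = [last_value, window_mean, linear_trend]


def best_predictor_index(window: Sequence[float], actual: float) -> int:
    """Label: index of the pool predictor with the smallest absolute error."""
    errors = [abs(p(window) - actual) for p in POOL]
    return errors.index(min(errors))


def build_history(trace: Sequence[float], w: int) -> List[Tuple[List[float], int]]:
    """Pair each sliding window with the label of its best predictor."""
    return [(list(trace[t - w:t]), best_predictor_index(trace[t - w:t], trace[t]))
            for t in range(w, len(trace))]


def knn_select(history, window: Sequence[float], k: int = 3) -> int:
    """Majority vote among the k historical windows closest to `window`."""
    nearest = sorted(history, key=lambda item: math.dist(item[0], window))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def predict(history, window: Sequence[float], k: int = 3) -> float:
    """Run only the predictor forecast to be best for this window."""
    return POOL[knn_select(history, window, k)](window)


trace = [float(t) for t in range(20)]  # steadily rising workload
history = build_history(trace, w=4)
print(predict(history, [20.0, 21.0, 22.0, 23.0]))  # 24.0
```

The saving comes from running one predictor per step instead of the whole pool; the cost moves to the classification step, which is why a dimension-reduction stage such as PCA can help when windows are long.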
13.
14.
Over the years, a significant number of integration techniques have been developed for data warehouses to support integrated web
data. However, existing work focuses mainly on design concepts. In this paper, we focus on the performance of a
web database application, namely an integrated web data warehouse that uses a well-defined and uniform structure to deal with
web information sources, including semi-structured data such as XML and documents such as HTML. Using a case study,
we implement a prototype that applies web manipulation to both incoming sources
and result outputs. Thus, the system can not only be operated through the web but also handle the integration of web data
sources and structured data sources. Our main contribution is the performance evaluation of an integrated web data warehouse
application, which comprises two tasks. Task one verifies the correctness of the integrated data based on
the result set retrieved from the web integrated data warehouse system using complex and OLAP queries. This result
set is checked against the result set retrieved from the existing independent data source systems. Task two
measures the performance of OLAP and complex queries by investigating the source operation functions these queries use to retrieve
the data. The source operation functions used by each query are identified using the TKPROF utility.
David Taniar
15.
Studies on the effects of a variety of exogenous and anthropogenic environmental factors, including endocrine disruptors,
heavy metals, UV light, high temperature, and others, on marine organisms were presented at the 2nd Bilateral Seminar
Italy–Japan held in November 2006. The reports were discussed in order to assess the current state of marine ecosystems, with the aim
of evaluating and predicting environmental risks.
V. Matranga
16.
The influences of the operating system and system-specific effects on application performance are increasingly important considerations
in high performance computing. OS kernel measurement is key to understanding the performance influences and the interrelationship
of system and user-level performance factors. The KTAU (Kernel TAU) methodology and Linux-based framework provide parallel
kernel performance measurement from both a kernel-wide and a process-centric perspective. The first characterizes overall aggregate kernel performance for the entire system. The second characterizes
kernel performance when it runs in the context of a particular process. KTAU extends the TAU performance system with kernel-level
monitoring, while leveraging TAU’s measurement and analysis capabilities. We explain the rationale and motivations behind our
approach, describe the KTAU design and implementation, and show working examples on multiple platforms demonstrating the versatility
of KTAU in integrated system/application monitoring.
Alan Morris
17.
Sub-Antarctic Marion Island has had a permanent research station for 50 years, and the island's Wandering Albatrosses have been intensively studied for 20 years. The reactions of breeding birds to approaches by a human on foot were recorded. Three response variables were calculated: intensity of vocal reaction (IVR), intensity of non-vocal reaction (INR) and overall response index (ORI). At 5 m from the nest, twice as many birds stood and/or vocalised as at 15 m. Nearest neighbour distance, age and gender did not explain individual variability of responses. Study colony birds had higher IVR scores than non-study colony birds; birds at colonies closest to the station had the highest ORI scores. A better breeding record was associated with lower IVR and ORI scores, but a causative relationship remains to be demonstrated. A minimum viewing distance of 25 m is recommended for breeding Wandering Albatrosses.
Marienne S. de Villiers (Fax: +27-21-6503434)
John Cooper
Peter G. Ryan
18.
Scheduling mixed-parallel applications with advance reservations (total citations: 1; self-citations: 0; citations by others: 1)
This paper investigates the scheduling of mixed-parallel applications, which exhibit both task and data parallelism, in advance
reservation settings. Both the problem of minimizing application turn-around time and that of meeting a deadline are studied.
For each problem, several scheduling algorithms are proposed, some of which borrow ideas from previously published work in non-reservation
settings. Algorithms are compared in simulation over a wide range of application and reservation scenarios. The main finding
is that schedules computed using the previously published CPA algorithm can be adapted to advance reservation settings, notably
resulting in low resource consumption and thus high efficiency.
Henri Casanova (Corresponding author)
19.
Pete Beckman, Kamil Iskra, Kazutomo Yoshii, Susan Coghlan, Aroon Nataraj. Cluster Computing, 2008, 11(1): 3-16
We investigate operating system noise, which we identify as one of the main reasons for a lack of synchronicity in parallel
applications. Using a microbenchmark, we measure the noise on several contemporary platforms and find that, even with a general-purpose
operating system, noise can be limited if certain precautions are taken. We then inject artificially generated noise into
a massively parallel system and measure its influence on the performance of collective operations. Our experiments indicate
that on extreme-scale platforms, the performance is correlated with the largest interruption to the application, even if the
probability of such an interruption on a single process is extremely small. We demonstrate that synchronizing the noise can
significantly reduce its negative influence.
Aroon Nataraj
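The observation that performance tracks the largest interruption, and that synchronizing noise helps, can be illustrated with a simple probabilistic model. Assume, hypothetically, that each bulk-synchronous step takes `compute` seconds and that each process independently suffers one interruption of length `interruption` with probability `p` per step. Because a collective waits for the slowest process, the expected step time is compute + interruption × (1 - (1 - p)^nprocs). If the noise is instead synchronized so that all processes are interrupted together, the expected step time is compute + interruption × p, independent of the process count.

```python
def expected_step_time_independent(compute: float, interruption: float,
                                   p: float, nprocs: int) -> float:
    """Expected step time when each process is interrupted independently."""
    p_any = 1.0 - (1.0 - p) ** nprocs  # chance that at least one process is hit
    return compute + interruption * p_any


def expected_step_time_synchronized(compute: float, interruption: float, p: float) -> float:
    """Expected step time when all processes are interrupted at the same moment."""
    return compute + interruption * p


# Compare the two regimes as the process count grows.
for n in (1, 1_000, 100_000):
    print(n,
          round(expected_step_time_independent(1.0, 0.1, 1e-4, n), 5),
          round(expected_step_time_synchronized(1.0, 0.1, 1e-4), 5))
```

Under these assumptions, the cost of independent noise approaches the full interruption length as the process count grows, even though p is tiny for any single process, while the synchronized cost stays at p times the interruption length.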
20.