Similar Literature (20 results)
1.
Storing enormous amounts of data on hybrid storage systems has become a widely accepted solution for today's production-level applications, as it trades off performance against cost. However, how to improve the performance of large-scale storage systems with hybrid components (e.g., solid state disks, hard drives, and tapes) and complicated user behaviors has not been fully explored. In this paper, we conduct an in-depth case study (which we call FastStor) on designing a high-performance hybrid storage system to support one of the world's largest satellite image distribution systems, operated by the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) center. We demonstrate how to combine conventional caching policies with innovative current-popularity-oriented and user-specific prefetching algorithms to improve the performance of the EROS system. We evaluate the effectiveness of the proposed solution using over 5 million real-world user download requests provided by EROS. Our experimental results show that using the Least Recently Used (LRU) caching policy alone, we achieve an overall hit ratio of 64% on a 100 TB FTP server farm composed of Solid State Disks (SSDs) and 70% on a 200 TB farm. The hit ratio can be further improved to 70% (100 TB) and 76% (200 TB) when the intelligent prefetching algorithms are used together with LRU.
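A minimal sketch of the LRU baseline policy the abstract evaluates, assuming a simple key-to-size mapping for cached files under a byte-capacity budget (the interface is illustrative, not the paper's implementation):

```python
from collections import OrderedDict

class LRUCache:
    """Least-Recently-Used cache with a byte-capacity budget."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()  # key -> size in bytes

    def get(self, key):
        if key not in self.items:
            return False  # miss: caller fetches from a slower tier
        self.items.move_to_end(key)  # mark as most recently used
        return True

    def put(self, key, size):
        if key in self.items:
            self.used -= self.items.pop(key)
        # Evict least-recently-used entries until the new item fits.
        while self.items and self.used + size > self.capacity:
            _, evicted_size = self.items.popitem(last=False)
            self.used -= evicted_size
        if size <= self.capacity:
            self.items[key] = size
            self.used += size
```

A prefetcher would sit alongside this cache, calling `put` ahead of predicted requests so that later `get` calls hit.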

2.
File systems provide an interface through which applications obtain exclusive access to files: a process holds privileges to a file that cannot be preempted and that restrict the capabilities of other processes. Local file systems do this by maintaining information about the privileges of current file sessions and checking subsequent sessions for compatibility. Implementing exclusive access this way in a distributed file system degrades performance, because every new file session must be registered with a lock server that maintains global session state. We present two techniques for improving the performance of session management in the distributed environment. We introduce a distributed lock for managing file access, called a semi-preemptible lock, that allows clients to cache privileges. Under a semi-preemptible lock, a file system creates new sessions without messages to the lock manager. This improves performance by exploiting locality: the affinity of files to clients. We also present data structures and algorithms for the dynamic evaluation of locks that allow a distributed file system to efficiently manage arbitrarily complex locking, where complex means that an object can be locked in a large number of unique modes. The combination of these techniques yields a distributed locking scheme that supports fine-grained concurrency control with low memory and message overhead, with the assurance that the locking system is correct and avoids unnecessary deadlocks.
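A hedged sketch of the session-compatibility check such a lock manager performs; the two modes and the compatibility table are illustrative assumptions, not the paper's actual mode lattice:

```python
# Illustrative lock modes; real systems can define dozens of unique modes.
SHARED, EXCLUSIVE = "shared", "exclusive"

# COMPATIBILITY[a][b] is True when a holder in mode `a`
# can coexist with a new session requesting mode `b`.
COMPATIBILITY = {
    SHARED:    {SHARED: True,  EXCLUSIVE: False},
    EXCLUSIVE: {SHARED: False, EXCLUSIVE: False},
}

def can_grant(current_session_modes, requested_mode):
    """Grant a new session only if it is compatible with every holder."""
    return all(COMPATIBILITY[m][requested_mode] for m in current_session_modes)

# With a semi-preemptible lock, the client runs this check locally against
# its cached privileges, avoiding a round trip to the lock server.
print(can_grant([SHARED, SHARED], SHARED))     # True
print(can_grant([SHARED], EXCLUSIVE))          # False
```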

3.
As the number of Internet users increases explosively, network response delays also grow. An economical and efficient solution to this problem is web caching. But a cache server can itself become a bottleneck because requests concentrate at it. Many studies have proposed ways to improve cache server performance, but they have focused on load balancing and/or caching capacity, not directly on improving the throughput of a single cache server. In this paper, we analyze the causes of cache server bottlenecks, and propose an arbitral thread and a delayed caching mechanism as a solution. The arbitral thread provides quick service to users' requests, while delayed caching improves system reliability. The proposed cache server is implemented as a modification of the SQUID cache server, and we compare its performance with the original SQUID cache server.
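A minimal sketch of the pattern the abstract describes, assuming one arbiter thread that answers requests immediately and a background worker that performs the deferred cache insertion; this queue-based structure is an interpretation of the abstract, not SQUID's actual architecture:

```python
import queue
import threading
import time

cache = {}
requests = queue.Queue()        # incoming client requests
pending_writes = queue.Queue()  # objects to insert into the cache later

def fetch(url):
    return f"<body of {url}>"   # stand-in for an origin-server fetch

def arbiter():
    """Serve each request immediately; defer cache insertion."""
    while True:
        url = requests.get()
        body = cache.get(url) or fetch(url)
        print("served", url)             # stand-in for replying to the client
        pending_writes.put((url, body))  # delayed caching, off the hot path

def cache_writer():
    """Insert objects into the cache outside the request path."""
    while True:
        url, body = pending_writes.get()
        cache[url] = body

threading.Thread(target=arbiter, daemon=True).start()
threading.Thread(target=cache_writer, daemon=True).start()
requests.put("http://example.com/index.html")
time.sleep(0.1)  # let the daemon threads process the request
```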

4.
Given the powerful multiprocessor client workstations in many client-server object database applications, the performance bottleneck is the delay in transferring pages from the server to the client. We present a prefetching technique that can hide this delay, especially when the client application requests pages from several database servers. The technique has been added to the EXODUS storage manager. Part of the novelty of the approach lies in how multithreading on the client workstation is exploited, in particular for activities such as prefetching and flushing dirty pages to the server. Using our own complex-object benchmark, we analyze the performance of the prefetching technique with multiple clients and multiple servers. The technique is also tested under a variety of client host workload levels.

5.
We present a new method that allows a swarm of robots to sort arbitrarily arranged objects into homogeneous clusters. Ideally, a distributed robotic sorting method should establish a single homogeneous cluster for each object type. This can be achieved with existing methods, but the rate of convergence is considered too slow for real-world application. Previous research on distributed robotic sorting is typified by randomised movement with a pick-up/deposit behaviour that is a probabilistic function of local object density. We investigate whether the ability of each robot to localise and return to remembered places can improve distributed sorting performance. In our method, each robot maintains a cache point for each object type; upon collecting an object, the robot returns to add it to the cluster surrounding the corresponding cache point. As in previous biologically inspired work on distributed sorting, no explicit communication between robots is implemented. However, the robots can still come to a consensus on the best cache for each object type by observing clusters and comparing their sizes with remembered cache sizes. We refer to this method as cache consensus. Our results indicate that incorporating this localisation capability yields a significant improvement in the rate of convergence. We present experimental results using a realistic simulation of our targeted robotic platform; a subset of these experiments is also validated on physical robots.

6.
In this paper, we propose a worst-case weighted approach to multi-objective n-person non-zero-sum games in which each player has more than one competing objective. Our "worst-case weighted multi-objective game" model supposes that each player has a set of weights over its objectives and wishes to minimize its maximum weighted sum of objectives, where the maximization is taken over the weight set. This new model gives rise to a new Pareto Nash equilibrium concept, which we call the "robust-weighted Nash equilibrium". We prove that robust-weighted Nash equilibria are guaranteed to exist even when the weight sets are unbounded. For the worst-case weighted multi-objective game in which every player's weight set is a polytope, we show that a robust-weighted Nash equilibrium can be obtained by solving a mathematical program with equilibrium constraints (MPEC). As an application, we illustrate the usefulness of the worst-case weighted multi-objective game on a supply chain risk management problem under demand uncertainty. Compared with the existing weighted approach, our method is more robust and can be used more efficiently in real-world applications.
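In symbols, the worst-case weighted objective described above can be written as follows (a sketch in generic notation, since the paper's exact symbols are not shown here): player $i$, with objectives $f_{i1},\dots,f_{im_i}$, strategy set $X_i$, and weight set $W_i$, solves

```latex
\min_{x_i \in X_i} \; \max_{w_i \in W_i} \; \sum_{k=1}^{m_i} w_{ik}\, f_{ik}(x_i, x_{-i})
```

where $x_{-i}$ denotes the other players' strategies. A robust-weighted Nash equilibrium is then a strategy profile in which no player can unilaterally improve this worst-case value.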

7.
Numerous studies show that miss ratios at forward proxies are typically at least 40–50%. This paper proposes and evaluates a new approach for improving the throughput of Web proxy systems by reducing the overhead of handling cache misses. Namely, we propose to front-end a Web proxy with a high-performance node that filters requests, processing the misses itself and forwarding the hits and the new cacheable content to the proxy. Requests are filtered based on hints of the proxy's cache content. This system, called a Proxy Accelerator, achieves significantly better communication performance than a traditional proxy system. For instance, an accelerator can be built as an embedded system optimized for communication and HTTP processing, or as a kernel-mode HTTP server. Scalability with Web proxy cluster size is achieved by using several accelerators. We use analytical models, trace-based simulations, and a real implementation to study the benefits and implementation tradeoffs of this approach. Our results show that a single proxy-accelerator node in front of a 4-node Web proxy can improve the cost-performance ratio by about 40%. Hint-based request filtering can be implemented in ways that do not affect the overall hit ratio. We present an implementation of the hint management module integrated into Web proxy software; experimental evaluation demonstrates that the associated overheads are very small.
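A minimal sketch of hint-based filtering, assuming the accelerator keeps an in-memory set of URLs believed to be cached at the proxy (a deliberately simplified stand-in for the paper's hint structures, with a made-up cacheability heuristic):

```python
class ProxyAccelerator:
    """Front-end node: forward likely hits, handle misses directly."""

    def __init__(self):
        self.hints = set()  # URLs believed to be in the proxy cache

    def handle(self, url):
        if url in self.hints:
            return "forward-to-proxy"   # likely hit: let the proxy serve it
        # Miss path: the accelerator fetches from the origin server itself.
        if self.is_cacheable(url):
            self.hints.add(url)         # new cacheable content goes to proxy
            return "fetch-and-push-to-proxy"
        return "fetch-and-reply-directly"

    @staticmethod
    def is_cacheable(url):
        return "?" not in url           # crude illustrative heuristic

acc = ProxyAccelerator()
print(acc.handle("http://example.com/a.html"))  # miss the first time
print(acc.handle("http://example.com/a.html"))  # hint hit afterwards
```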

8.
9.
The increases in multi-core processor parallelism and in the flexibility of many-core accelerator processors, such as GPUs, have turned traditional SMP systems into hierarchical, heterogeneous computing environments. Fully exploiting these improvements in parallel system design remains an open problem. Moreover, most current tools for developing parallel applications for hierarchical systems concentrate on a single processor type (e.g., accelerators) and do not coordinate several heterogeneous processors. Here, we show that using all of the heterogeneous computing resources can significantly improve application performance. Our approach, which optimizes applications at run-time by efficiently coordinating task execution on all available processing units, is evaluated in the context of replicated dataflow applications. The proposed techniques were developed and implemented in an integrated run-time system targeting both intra- and inter-node parallelism. Experimental results with a real-world, complex biomedical application show that our approach nearly doubles the performance of the GPU-only implementation on a distributed heterogeneous accelerator cluster.

10.
CPU development has entered the multicore era. Current parallel simulation kernels utilize multicore resources through multiple processes, which is inefficient for communication and synchronization. To fill this gap, we propose an HPSK (hierarchical parallel simulation kernel) model, which schedules logical processes and executes events in parallel using a multithreaded paradigm. Based on this model, three key algorithms support high performance: (1) An event management algorithm improves the efficiency of event creation and release. It uses lock-free creation and an asynchronous commitment mechanism to decouple threads and thus reduce locking overhead. (2) A pointer-based communication algorithm improves the efficiency of inter-thread communication. It uses buffering to avoid interrupting the execution of the target thread, which reads events from the buffers when it needs them; by using ring-structure buffers, synchronization between sending and receiving threads can be eliminated. (3) An approximate method computes LBTS (Lower Bound on Time Stamp). It uses an asynchronous mechanism to avoid disturbing thread execution and a two-level filter to reduce redundant LBTS computation. A series of experiments with a modified phold model shows that HPSK achieves good performance for applications under different conditions; it runs 8× faster than μsik when event locality and lookahead are low.
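A hedged sketch of the ring-structure buffer idea: a single-producer/single-consumer ring in which the sender advances only the head index and the receiver advances only the tail index, so no lock is needed between one sending thread and one receiving thread (a generic illustration, not HPSK's actual data structure):

```python
class SPSCRing:
    """Single-producer, single-consumer ring buffer.

    The producer writes only `head`; the consumer writes only `tail`.
    With exactly one thread on each side, no lock is required.
    """

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0  # next slot to write (producer-owned)
        self.tail = 0  # next slot to read (consumer-owned)

    def send(self, event):
        nxt = (self.head + 1) % self.capacity
        if nxt == self.tail:
            return False          # ring full; caller retries later
        self.buf[self.head] = event
        self.head = nxt
        return True

    def recv(self):
        if self.tail == self.head:
            return None           # ring empty
        event = self.buf[self.tail]
        self.tail = (self.tail + 1) % self.capacity
        return event

ring = SPSCRing(8)
ring.send(("event", 42))
print(ring.recv())
```

In a compiled language the index updates would need appropriate memory ordering; in this Python sketch the GIL stands in for that.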

11.
Recently, software distributed shared memory (DSM) systems have successfully provided an easy user interface for parallel applications on distributed systems. To improve program performance, most DSM systems greedily utilize all available processors in a computer network to execute user programs. However, using more processors cannot guarantee better performance: the overhead of parallelizing a program grows with the number of processors used for its execution. If the performance gain from parallelization cannot compensate for the overhead, increasing the number of execution processors results in performance degradation and wasted resources. In this paper, we propose a mechanism that dynamically finds a suitable system scale to optimize performance for DSM applications according to run-time information. The experimental results show that the proposed mechanism can precisely predict the processor count that yields the best performance, and then effectively optimize the performance of the test applications by adapting the system scale to the predicted result. Yi-Chang Zhuang received his B.S., M.S. and Ph.D. degrees in electrical engineering from National Cheng Kung University in 1995, 1997, and 2004. He is currently working as an engineer at the Industrial Technology Research Institute in Taiwan. His research interests include object-based storage, file systems, distributed systems, and grid computing. Jyh-Biau Chang is currently an assistant professor at the Information Management Department of Leader University in Taiwan. He received his B.S., M.S. and Ph.D. degrees from the Electrical Engineering Department of National Cheng Kung University in 1994, 1996, and 2005. His research focuses on cluster and grid computing, parallel and distributed systems, and operating systems. Tyng-Yeu Liang is currently an assistant professor at the Department of Electrical Engineering, National Kaohsiung University of Applied Sciences in Taiwan. He received his B.S., M.S. and Ph.D. degrees from National Cheng Kung University in 1992, 1994, and 2000. His research interests include cluster and grid computing, image processing, and multimedia. Ce-Kuen Shieh is currently a professor at the Electrical Engineering Department of National Cheng Kung University in Taiwan, where he is also chief of the computation center; he received his Ph.D. degree from the same department in 1988 and chaired it from 2002 to 2005. His research focuses on computer networks and parallel and distributed systems. Laurence T. Yang is a professor at the Department of Computer Science, St. Francis Xavier University, Canada. His research includes high performance computing and networking, embedded systems, ubiquitous/pervasive computing and intelligence, and autonomic and trusted computing.

12.

Real-time, accurate traffic congestion prediction can enable intelligent traffic management systems (ITMSs) that replace traditional systems to improve traffic efficiency and reduce congestion. An ITMS consists of three main layers: the Internet of Things (IoT), edge, and cloud layers. The edge collects real-time data from different routes through IoT devices such as wireless sensors, and can compute and store the collected data before transmitting it to the cloud for further processing. The edge is thus an intermediate layer between the IoT and cloud layers that receives the transmitted IoT data and overcomes cloud challenges such as high latency. In this paper, a novel real-time traffic congestion prediction strategy (TCPS) is proposed based on the data collected in the edge's cache server at the edge layer. The proposed TCPS contains three stages: (i) a real-time congestion prediction (RCP) stage, (ii) a congestion direction detection (CD2) stage, and (iii) a width change decision (WCD) stage. RCP predicts traffic congestion from the causes of congestion in the hotspot using a fuzzy inference system. If there is congestion, CD2 detects the congestion direction from the RCP predictions using the Optimal Weighted Naïve Bayes (OWNB) method. WCD then aims to prevent congestion by changing the width of changeable routes (CR) after the congestion direction has been detected. The experimental results show that the proposed TCPS outperforms other recent methodologies, providing the highest accuracy (95%), precision (74%), and recall (75%) and the lowest error (5%).


13.
A first-order analytical approximation of steady-state heat conduction in a hollow cylinder exchanging heat at its external surface by convection with a cold, windy environment is presented. The model depicts the thermal behavior of certain body elements, e.g., the head/face, when exposed to such environments. The results are presented through dimensionless parameters and facilitate the estimation of wind chill equivalent temperatures (WCETs). The effects of several variables on WCETs were studied using specific examples, leading to the following generalizations: (1) the conditions assumed for "calm" wind speed appear to be a dominant factor in determining WCET; (2) the effects, on both (skin) surface temperature and WCET, of a 1°C change in environmental temperature appear to be more pronounced than those of a 1 m/s change in wind speed; (3) similarly, predicted WCETs are more sensitive to the geometric dimensions assumed for the modeled entity than to wind speed; and (4) tissue thermal conductivity, the angle at which the convective heat transfer coefficient is measured relative to wind direction, and the factor used to convert reported wind speeds into "effective" wind speeds in the domain occupied by humans all seem to have relatively small effects on WCET. These conclusions strongly suggest, among other things, that for any given combination of environmental conditions, wind chill indices may best be presented as ranges rather than single values, even when worst-case scenarios are considered. Also emphasized is the need for careful, realistic selection of all parameter values used in determining WCETs.
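For orientation, the classical steady-state result for this geometry (a standard textbook relation consistent with the abstract's setup, not the paper's exact model) gives the heat loss per unit length of a hollow cylinder with inner-surface temperature $T_b$ at radius $r_i$, tissue conductivity $k$, and convection coefficient $h$ to air at $T_\infty$ at the outer radius $r_o$:

```latex
q' = \frac{2\pi\,(T_b - T_\infty)}{\dfrac{\ln(r_o/r_i)}{k} + \dfrac{1}{h\,r_o}}
```

The outer (skin) surface temperature then follows from $T_s = T_\infty + q'/(2\pi r_o h)$; a WCET is the calm-air temperature that reproduces the same cooling effect, which is why the assumed "calm" conditions and the geometry ($r_i$, $r_o$) enter so strongly.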

14.
Gas bubbles induced during radiofrequency ablation (RFA) of tissues can hinder the detection of ablation zones (necrosis zones or thermal lesions) by ultrasound elastography. To resolve this problem, our previous study proposed ultrasound Nakagami imaging, which detects thermally induced bubble formation to evaluate ablation zones. To prepare for future applications, this study (i) created a novel algorithmic scheme based on frequency and temporal compounding of Nakagami imaging for enhanced ablation-zone visualization, (ii) integrated the proposed algorithm into a clinical scanner to develop a real-time Nakagami imaging system for monitoring RFA, and (iii) investigated the applicability of Nakagami imaging to various tissue types. The performance of the real-time Nakagami imaging system in visualizing RFA-induced ablation zones was validated by measuring porcine liver (n = 18) and muscle tissues (n = 6). The experimental results showed that the proposed algorithm can run on a standard clinical ultrasound scanner to monitor RFA in real time. The system effectively monitors RFA-induced ablation zones in liver tissue; however, because tissue properties differ, it cannot visualize ablation zones in muscle fibers. In the future, real-time Nakagami imaging should focus on RFA of the liver, and it is suggested as an alternative monitoring tool when advanced elastography is unavailable or substantial bubbles exist in the ablation zone.

15.
Parallel file systems have been developed in recent years to ease the I/O bottleneck of high-end computing systems. These file systems offer several data layout strategies to meet the performance goals of specific I/O workloads. However, a layout policy that performs well on one I/O workload may not perform as well on another, and peak I/O performance is rarely achieved because data access patterns are complex and application dependent. In this study, we propose a cost-intelligent data access strategy based on the principle of application-specific optimization that improves the I/O performance of parallel file systems. We first present examples illustrating how performance differs across data layouts. We then develop a cost model that estimates the completion time of data accesses under various data layouts, so that the layout can be matched to the application: static layout optimization is used for applications with a dominant data access pattern, and dynamic layout selection with hybrid replication is used for applications with complex I/O patterns. Theoretical analysis and experimental testing verify the proposed cost-intelligent layout approach. Analytical and experimental results show that the cost model is effective and that the application-specific data layout approach can provide up to a 74% performance improvement for data-intensive applications.
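A minimal sketch of how a completion-time cost model can drive layout selection; the linear cost terms (startup, per-byte transfer, per-server coordination) and their values are illustrative assumptions, not the paper's calibrated model:

```python
def access_cost(request_bytes, servers, alpha=1e-4, beta=1e-9, gamma=5e-5):
    """Estimated completion time of one access under a striped layout.

    alpha: per-request startup latency (s)
    beta:  per-byte transfer time on one server (s)
    gamma: per-server coordination overhead (s)
    """
    per_server_bytes = request_bytes / servers
    return alpha + beta * per_server_bytes + gamma * servers

def choose_layout(request_bytes, candidate_server_counts):
    """Pick the stripe width minimizing estimated completion time."""
    return min(candidate_server_counts,
               key=lambda s: access_cost(request_bytes, s))

# Small requests favor narrow striping; large requests favor wide striping.
print(choose_layout(64 * 1024, [1, 2, 4, 8, 16]))          # -> 1
print(choose_layout(256 * 1024 * 1024, [1, 2, 4, 8, 16]))  # -> 16
```

Static optimization would run this once over the dominant access pattern; dynamic selection would re-evaluate it at run-time as patterns change.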

16.
In vivo non-linear optical microscopy has been essential to advancing our knowledge of how intact biological systems work. It has been particularly enabling for deciphering fast spatiotemporal cellular dynamics in neural networks. The power of the technique stems from its optical sectioning capability, which in turn limits its application to essentially immobile tissue: only tissue unaffected by movement, or in which movement can be physically constrained, can be imaged fast enough for functional studies at high temporal resolution. Here, we show dynamic two-photon Ca²⁺ imaging in the spinal cord of a living rat at millisecond time scales, free of motion artifacts, using an optical stabilization system. We describe a fast, non-contact, adaptive movement-compensation approach, applicable to rough and weakly reflective surfaces, that allows real-time functional imaging of intrinsically moving tissue in live animals. The strategy enslaves the position of the microscope objective to that of the tissue surface in real time through optical monitoring and a closed feedback loop. The performance of the system allows efficient image locking even under random or irregular movements.

17.
I/O bottlenecks are already a problem in many large-scale applications that manipulate huge datasets, and the problem is expected to get worse as applications grow and I/O subsystem performance lags behind processor and memory speed improvements. At the same time, off-the-shelf clusters of workstations are becoming a popular platform for demanding applications due to their cost-effectiveness and widespread deployment. Caching I/O blocks is one effective way of alleviating disk latencies, and there can be multiple levels of caching on a cluster of workstations. Previous studies have shown the benefits of caching, whether local to a particular node or shared globally across the cluster, for certain applications. However, we show that while caching is useful in some situations, it can hurt performance if we are not careful about what to cache and when to bypass the cache. This paper presents compilation techniques and runtime support to address this problem. These techniques are implemented and evaluated on an experimental Linux/Pentium cluster running a parallel file system. Our results with a diverse set of scientific and commercial applications demonstrate the benefits of a discretionary approach to caching for I/O subsystems on clusters, providing as much as 48% savings in overall execution time over indiscriminately caching everything in some applications. Parts of this paper appeared in the Proceedings of the 3rd IEEE/ACM Symposium on Cluster Computing and the Grid (CCGrid'03); this paper extends those results with a more extensive performance evaluation. Murali Vilayannur is a Ph.D. student in the Department of Computer Science and Engineering at The Pennsylvania State University. His research interests are in high-performance parallel I/O, file systems, virtual memory algorithms, and operating systems. Anand Sivasubramaniam received his B.Tech. in Computer Science from the Indian Institute of Technology, Madras, in 1989, and the M.S. and Ph.D. degrees in Computer Science from the Georgia Institute of Technology in 1991 and 1995 respectively. He has been on the faculty at The Pennsylvania State University since Fall 1995, where he is currently an Associate Professor. His research interests are in computer architecture, operating systems, performance evaluation, and applications for both high performance computer systems and embedded systems. His research has been funded by NSF through several grants, including the CAREER award, and by industry including IBM, Microsoft, and Unisys Corp. He has several publications in leading journals and conferences, is on the editorial boards of IEEE Transactions on Computers and IEEE Transactions on Parallel and Distributed Systems, and is a recipient of the 2002 IBM Faculty Award. He is a member of the IEEE, IEEE Computer Society, and ACM. Mahmut Kandemir received the B.Sc. and M.Sc. degrees in control and computer engineering from Istanbul Technical University, Istanbul, Turkey, in 1988 and 1992, respectively, and the Ph.D. in electrical engineering and computer science from Syracuse University, Syracuse, New York, in 1999. He has been an assistant professor in the Computer Science and Engineering Department at the Pennsylvania State University since August 1999. His main research interests are optimizing compilers, I/O-intensive applications, and power-aware computing. He is a member of the IEEE and the ACM.
Rajeev Thakur is a Computer Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory. He received a B.E. from the University of Bombay, India, in 1990, an M.S. from Syracuse University in 1992, and a Ph.D. from Syracuse University in 1995, all in computer engineering. His research interests are in high-performance computing in general and high-performance networking and I/O in particular. He was a member of the MPI Forum and participated actively in the definition of the I/O part of the MPI-2 standard. He is the author of ROMIO, a widely used, portable implementation of MPI-IO, and a co-author of the book "Using MPI-2: Advanced Features of the Message Passing Interface" published by MIT Press. Robert Ross received his Ph.D. in Computer Engineering from Clemson University in 2000. He is now an Assistant Scientist in the Mathematics and Computer Science Division at Argonne National Laboratory. His research interests are in message passing and storage systems for high performance computing environments. He is the primary author and lead developer of the Parallel Virtual File System (PVFS), a parallel file system for Linux clusters. Current projects include the ROMIO MPI-IO implementation, PVFS, PVFS2, and the MPICH2 implementation of the MPI message passing interface.
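A hedged sketch of the discretionary idea as the abstract describes it: decide per access whether to cache a block or bypass the cache. The reuse-based heuristic below is an illustrative stand-in for the paper's compiler analysis, and the 1024-entry budget is a made-up number:

```python
def should_cache(predicted_reuses, cache_pressure):
    """Cache a block only when its expected reuse justifies the space.

    predicted_reuses: future accesses the compiler/runtime expects
                      for this block (illustrative input).
    cache_pressure:   fraction of the cache currently occupied, 0..1.
    """
    if predicted_reuses == 0:
        return False              # streaming data: bypass the cache
    if cache_pressure > 0.9 and predicted_reuses < 2:
        return False              # under pressure, keep only hot blocks
    return True

def read_block(block_id, cache, predicted_reuses, capacity=1024):
    if block_id in cache:
        return cache[block_id]
    data = f"<block {block_id} from disk>"   # stand-in for a disk read
    if should_cache(predicted_reuses, len(cache) / capacity):
        cache[block_id] = data               # discretionary insertion
    return data
```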

18.
The availability of a large number of separate clusters has given rise to multicluster systems, in which these resources are coupled to obtain their combined benefits for solving large-scale compute-intensive applications. However, achieving automatic load balancing of jobs across the participating autonomous systems is challenging. We developed a novel user-space execution model named DA-TC to address workload allocation for applications with a large number of sequential jobs in multicluster systems. With this model, we achieve dynamic load balancing in task assignment, and slower resources become beneficial factors rather than bottlenecks for application execution. The effectiveness of this strategy is demonstrated through theoretical analysis. The model is also evaluated through extensive experimental studies, whose results show that, compared with the traditional method, the proposed DA-TC model can significantly improve application turnaround time and system reliability in multicluster environments.

19.
Consolidating multiple applications with diverse and changing resource requirements is common on multicore systems, where hardware resources are abundant. As opportunities for better system usage become ample, so do opportunities to degrade individual application performance through unregulated interference between applications and system resources. Can we predict a performance region within which application performance is expected to lie under different consolidations? Alternatively, can we maximize resource utilization while maintaining individual application performance targets? In this work we provide a methodology that answers these difficult questions by constructing a queueing-theory-based tool that accurately predicts application scalability on multicores. The tool can also suggest the optimal consolidation to maximize system resource utilization while meeting application performance targets. The methodology is based on asymptotic analysis, which quickly provides the range of performance values the user should expect under various consolidation scenarios; when more accurate forecasting is needed, it provides tighter predictions using approximate mean value analysis. The methodology is lightweight, as it captures application resource demands using standard system monitoring via non-intrusive, low-level measurements. We evaluate our approach on an IBM Power7 system using the DaCapo and SPECjvm2008 benchmark suites. Across 900 different consolidations of application instances, our tool predicts the average iteration time of collocated applications with an average error below 9 percent. Experimental and analytical results are in excellent agreement, confirming the robustness of the proposed methodology in suggesting the best consolidations that meet the performance objectives of individual applications while maximizing system resource utilization.
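For reference, a compact sketch of exact mean value analysis for a closed, single-class queueing network, the style of model that approximate MVA extends (a generic textbook algorithm; the service demands in the example are made-up numbers, not measurements from the paper):

```python
def mva(service_demands, think_time, n_jobs):
    """Exact MVA for a closed single-class product-form network.

    service_demands: per-station service demand D_k (seconds)
    think_time:      client think time Z (seconds)
    n_jobs:          population N (e.g., consolidated app instances)
    Returns (throughput X, per-station mean queue lengths Q).
    """
    queues = [0.0] * len(service_demands)
    throughput = 0.0
    for n in range(1, n_jobs + 1):
        # Residence time at station k seen by an arriving job.
        resid = [d * (1 + q) for d, q in zip(service_demands, queues)]
        throughput = n / (think_time + sum(resid))  # Little's law
        queues = [throughput * r for r in resid]
    return throughput, queues

# Two shared resources (e.g., CPU and memory bus), 8 collocated instances.
X, Q = mva([0.020, 0.005], think_time=0.1, n_jobs=8)
print(f"throughput={X:.2f} jobs/s, queue lengths={Q}")
```

Asymptotic bounds come out of the same demands: throughput can never exceed min(1/max_k D_k, N/(Z + sum_k D_k)), which is what gives the quick "performance region" the abstract mentions.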

20.
Energy aware DAG scheduling on heterogeneous systems
We address the problem of scheduling a directed acyclic task graph (DAG) on a heterogeneous distributed processor system with the twin objectives of minimizing finish time and energy consumption. Previous scheduling heuristics have assigned DAGs to processors so as to minimize the overall run-time of the application. But applications on embedded systems, such as high-performance DSP in image processing, multimedia, and wireless security, also need schedules that use little energy.
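A hedged sketch of an energy-aware list scheduler for this problem: tasks are taken in a precedence-respecting order and each is placed on the processor minimizing a weighted sum of finish time and energy. The greedy rule, the 0.5 weight, and the omission of communication costs are illustrative assumptions, not the paper's heuristic:

```python
def schedule(tasks, deps, cost, power, n_procs, w=0.5):
    """Greedy energy-aware list scheduling for a DAG.

    tasks:  task ids in a topological order of `deps`
    deps:   task -> list of predecessor tasks
    cost:   (task, proc) -> execution time on that processor
    power:  per-processor power draw while busy
    w:      weight trading finish time against energy, 0..1
    (Inter-processor communication costs are ignored in this sketch.)
    """
    proc_free = [0.0] * n_procs   # when each processor becomes idle
    finish = {}                   # task -> finish time
    placement = {}
    for t in tasks:
        ready = max((finish[p] for p in deps[t]), default=0.0)
        best = min(
            range(n_procs),
            key=lambda p: (w * (max(ready, proc_free[p]) + cost(t, p))
                           + (1 - w) * power[p] * cost(t, p)))
        start = max(ready, proc_free[best])
        finish[t] = start + cost(t, best)
        proc_free[best] = finish[t]
        placement[t] = best
    return placement, max(finish.values())

# Toy instance: chain a -> b plus independent c, on 2 processors.
deps = {"a": [], "b": ["a"], "c": []}
cost = lambda t, p: {("a", 0): 2, ("a", 1): 3, ("b", 0): 2, ("b", 1): 1,
                     ("c", 0): 4, ("c", 1): 5}[(t, p)]
print(schedule(["a", "b", "c"], deps, cost, power=[2.0, 1.0], n_procs=2))
```

Raising `w` biases the schedule toward a shorter makespan; lowering it favors the lower-power processor, which is the finish-time/energy trade-off the abstract names.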
