Similar literature
 Found 20 similar documents; search took 31 ms
1.
In this paper, we present a fault-tolerant recovery system called FRASystem (Fault Tolerant & Recovery Agent System) that uses multiple agents in distributed computing systems. Previous rollback-recovery protocols depended on the inherent communication mechanism and the underlying operating system, which degraded computing performance. We propose a rollback-recovery protocol that works independently of the operating system, which increases portability and extensibility. We define four types of agents: (1) a recovery agent performs the rollback-recovery protocol after a failure, (2) an information agent constructs domain knowledge as rules of fault tolerance and information during failure-free operation, (3) a facilitator agent controls the communication between agents, and (4) a garbage collection agent performs garbage collection of useless fault tolerance information. Since agent failures may lead to inconsistent system states and a domino effect, we propose an agent recovery algorithm. A garbage collection protocol addresses the performance degradation caused by the growth of fault tolerance information saved in stable storage. We implemented a prototype of FRASystem using Java and CORBA and experimented with the proposed rollback-recovery protocol. The simulation results indicate that our protocol performs better than previous rollback-recovery protocols that use independent checkpointing and pessimistic message logging without agents. Our contributions are as follows: (1) this is the first rollback-recovery protocol using agents, (2) FRASystem is not dependent on an operating system, and (3) FRASystem provides portability and extensibility.
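
The abstract above compares against protocols built on independent checkpointing and pessimistic message logging. As a rough illustration of that baseline only (not of FRASystem or its agents), the following minimal sketch shows one process that logs every message to stable storage before applying it and recovers by restoring its checkpoint and replaying the log. The names `StableStorage` and `Process`, the in-memory storage, and the integer "messages" are all assumptions made for this example.

```python
import copy


class StableStorage:
    """In-memory stand-in for stable storage (an assumption for this sketch)."""

    def __init__(self):
        self.checkpoint = None  # last saved copy of the process state
        self.log = []           # messages delivered since that checkpoint


class Process:
    """Hypothetical process whose state is a running total of received integers."""

    def __init__(self, storage):
        self.storage = storage
        self.state = {"total": 0}
        self.take_checkpoint()

    def take_checkpoint(self):
        # Independent checkpointing: the process saves its own state without
        # coordinating with any other process.
        self.storage.checkpoint = copy.deepcopy(self.state)
        # Messages logged before the checkpoint are no longer needed, so they
        # can be discarded (a very simple form of garbage collection).
        self.storage.log.clear()

    def deliver(self, message):
        # Pessimistic logging: the message is written to stable storage
        # before it is allowed to change the process state.
        self.storage.log.append(message)
        self._apply(message)

    def _apply(self, message):
        self.state["total"] += message

    def recover(self):
        # Rollback-recovery: restore the checkpoint, then replay logged
        # messages in their original delivery order.
        self.state = copy.deepcopy(self.storage.checkpoint)
        for message in self.storage.log:
            self._apply(message)
```

In this sketch, a process that delivers 3, takes a checkpoint, delivers 4 and 5, and then loses its in-memory state can call `recover()` to rebuild the same total from the checkpoint plus the two logged messages. Real protocols must also handle message ordering across processes and durable writes, which this example leaves out.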

2.
Fault tolerance in parallel systems has traditionally been achieved through a combination of redundancy and checkpointing methods. This notion has also been extended to message-passing systems with user-transparent process checkpointing and message logging. Furthermore, studies of multiple types of rollback and recovery have been reported in the literature, ranging from communication-induced checkpointing to pessimistic and synchronous solutions. However, many of these solutions incur high overhead because of their inability to utilize application-level information. This paper describes the design and implementation of MPI/FT, a high-performance MPI-1.2 implementation enhanced with low-overhead functionality to detect and recover from process failures. The strategy behind MPI/FT is that fault tolerance in message-passing middleware can be optimized based on an application's execution model derived from its communication topology and parallel programming semantics. MPI/FT exploits the specific characteristics of two parallel application execution models in order to optimize performance. MPI/FT also introduces the self-checking thread that monitors the functioning of the middleware itself. User-aware checkpointing and user-assisted recovery are compatible with MPI/FT and complement the techniques used here. This paper offers a classification of MPI applications for fault-tolerant MPI purposes, and the MPI/FT implementation discussed here provides different middleware versions specifically tailored to each of the two models studied in detail. The interplay of various parameters affecting the cost of fault tolerance is investigated. Experimental results demonstrate that the approach used to design and implement MPI/FT results in a low-overhead MPI-based fault-tolerant communication middleware implementation.

3.
FPGA based distributed self healing architecture for reusable systems   Total citations: 1 (self-citations: 0, citations by others: 1)
Creating an environment of “no doubt” for computing systems is critical for supporting next generation science, engineering, and commercial applications. With reconfigurable devices such as Field Programmable Gate Arrays (FPGAs), designers are provided with a seductive tool to use as a basis for sophisticated but highly reliable platforms. Reconfigurable computing platforms potentially offer the enhancement of reliability and recovery from catastrophic failures through partial and dynamic reconfigurations; and eliminate the need for redundant hardware resources typically used by existing fault-tolerant systems. We propose a two-level self-healing methodology to offer 100% availability for mission critical systems with comparatively less hardware overhead and performance degradation. Our proposed system first undertakes healing at the node-level. Failing to rectify the system at the node-level, network-level healing is then undertaken. We have designed a system based on Xilinx Virtex-5 FPGAs and Cirronet wireless mesh nodes to demonstrate autonomous wireless healing capability among networked node devices. Our prototype is a proof-of-concept work which demonstrates the feasibility of using FPGAs to provide maximum computational availability in a critical self-healing distributed architecture.

4.
Biological collections are gaining recognition as priceless sources of information about the historic distribution and diversity of life. The Internet is emerging as the major venue for sharing biodiversity information since it supports globalization and broad-scale interoperability. This research demonstrates how a Web-based mapping application for biological collections was developed using WebGD, an open-source software development tool, and illustrates how simple spatial analyses help collection users describe the range of ecogeographic variation in collections and customize the selection of accessions based on georeferenced variables. Our prototype can be viewed at . The demonstration site has three functional areas: (i) Query, (ii) Analyze Collections, and (iii) Add Data. The application was developed relatively quickly and at a low cost, since the complex workings for delivering GIS functions over the Web were an internal part of the WebGD framework. Because it was based on open-source code, costs were greatly decreased compared to commercially available software. In its current form, the prototype WebGRMS application provides users interested in Medicago and Trifolium germplasm with an innovative method to better understand the germplasm collections. More importantly, we hope the prototype provides a glimpse into the future of Web-based spatial analysis of biological collections. The use of trade names in this publication does not imply endorsement of the products named or criticism of similar ones not mentioned.

5.
The growing demand in system reliability and survivability under failures has urged ever-increasing research effort on the development of fault diagnosis and accommodation. In this paper, the on-line fault tolerant control problem for dynamic systems under unanticipated failures is investigated from a realistic point of view without any specific assumption on the type of system dynamical structure or failure scenarios. The sufficient conditions for system on-line stability under catastrophic failures have been derived using the discrete-time Lyapunov stability theory. Based upon the existing control theory and the modern computational intelligence techniques, an on-line fault accommodation control strategy is proposed to deal with the desired trajectory-tracking problems for systems suffering from various unknown and unanticipated catastrophic component failures. Theoretical analysis indicates that the control problem of interest can be solved on-line without a complete realization of the unknown failure dynamics provided an on-line estimator satisfies certain conditions. Through the on-line estimator, effective control signals to accommodate the dynamic failures can be computed using only the partially available information of the faults. Several on-line simulation studies have been presented to demonstrate the effectiveness of the proposed strategy. To investigate the feasibility of using the developed technique for unanticipated fault accommodation in hardware under the real-time environment, an on-line fault tolerant control test bed has been constructed to validate the proposed technology. Both on-line simulations and the real-time experiment show encouraging results and promising futures of on-line real-time fault tolerant control based solely upon insufficient information of the system dynamics and the failure dynamics.

6.
The delivery of scalable, rich multimedia applications and services on the Internet requires sophisticated technologies for transcoding, distributing, and streaming content. Cloud computing provides an infrastructure for such technologies, but specific challenges still remain in the areas of task management, load balancing, and fault tolerance. To address these issues, we propose a cloud-based distributed multimedia streaming service (CloudDMSS), which is designed to run on all major cloud computing services. CloudDMSS is highly adapted to the structure and policies of Hadoop; thus, it has additional capacities for transcoding, task distribution, load balancing, and content replication and distribution. To satisfy the design requirements of our service architecture, we propose four important algorithms: content replication, system recovery for Hadoop distributed multimedia streaming, management for cloud multimedia management, and streaming resource-based connection (SRC) for streaming job distribution. To evaluate the proposed system, we conducted several different performance tests on a local testbed: transcoding, streaming job distribution using SRC, streaming service deployment, and robustness to data node and task failures. In addition, we performed three different tests in an actual cloud computing environment, Cloudit 2.0: transcoding, streaming job distribution using SRC, and streaming service deployment.

7.
The Internet consists of a vast inhomogeneous reservoir of data. Developing software that can integrate a wide variety of different data sources is a major challenge that must be addressed for the realisation of the full potential of the Internet as a scientific research tool. This article presents a semi-automated object-oriented programming system for integrating web-based resources. We demonstrate that the current Internet standards (HTML, CGI [common gateway interface], Java, etc.) can be exploited to develop a data retrieval system that scans existing web interfaces and then uses a set of rules to generate new Java code that can automatically retrieve data from the Web. The validity of the software has been demonstrated by testing it on several biological databases. We also examine the current limitations of the Internet and discuss the need for the development of universal standards for web-based data.

8.

Background  

BLAST is one of the most common and useful tools for genetic research. This paper describes a software application we have termed Windows .NET Distributed Basic Local Alignment Search Toolkit (W.ND-BLAST), which enhances the BLAST utility by improving usability, fault recovery, and scalability in a Windows desktop environment. Our goal was to develop an easy-to-use, fault-tolerant, high-throughput BLAST solution that incorporates a comprehensive BLAST result viewer with curation and annotation functionality.

9.
A fault detection service for wide area distributed computations   Total citations: 6 (self-citations: 0, citations by others: 6)
The potential for faults in distributed computing systems is a significant complicating factor for application developers. While a variety of techniques exist for detecting and correcting faults, the implementation of these techniques in a particular context can be difficult. Hence, we propose a fault detection service designed to be incorporated, in a modular fashion, into distributed computing systems, tools, or applications. This service uses well-known techniques based on unreliable fault detectors to detect and report component failure, while allowing the user to trade off timeliness of reporting against false positive rates. We describe the architecture of this service, report on experimental results that quantify its cost and accuracy, and describe its use in two applications: monitoring the status of system components of the GUSTO computational grid testbed, and as part of the NetSolve network-enabled numerical solver. This revised version was published online in July 2006 with corrections to the Cover Date.
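
The trade-off mentioned above, between timeliness of reporting and the false positive rate, can be illustrated with a heartbeat-based unreliable failure detector. The sketch below is an assumption-laden toy, not the service described in the abstract: the class name `HeartbeatDetector`, the single `timeout` parameter, and the explicit `now` timestamps (used instead of a real clock so that behaviour is repeatable) are all hypothetical.

```python
class HeartbeatDetector:
    """Toy unreliable failure detector driven by heartbeat timestamps."""

    def __init__(self, timeout):
        # A short timeout reports failures quickly but may wrongly suspect
        # slow components; a long timeout does the opposite.
        self.timeout = timeout
        self.last_seen = {}  # component name -> time of latest heartbeat

    def heartbeat(self, component, now):
        # Record that `component` was alive at time `now`.
        self.last_seen[component] = now

    def suspects(self, now):
        # A component is suspected when its latest heartbeat is older than
        # the timeout. A suspicion can be wrong, which is why the detector
        # is called unreliable.
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )
```

Suppose component `a` is alive but its heartbeat is delayed, while component `b` has really stopped. A detector with a 2-second timeout suspects both before the late heartbeat from `a` arrives (a false positive that is later withdrawn), whereas a detector with a 10-second timeout never suspects `a` but takes longer to report `b`.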

10.
Housing conditions can affect the well-being of laboratory animals and thereby affect the outcomes of experiments. The appropriate environment is essential for the expression of natural behavior in animals. Here, we compared survival rates in four inbred mouse strains maintained under three different environmental conditions. Three mouse strains (C57BL/6J, C3H/HeN, and DBA/2J) housed under environmental enrichment (EE) conditions showed improved survival; however, EE did not alter the survival rate of the fourth strain, BALB/c. None of the strains showed significant differences in body weights or plasma corticosterone levels in the three environmental conditions. For BALB/c mice, the rates of debility were higher in the EE group. Interestingly, for C57BL/6J and C3H/HeN mice, the incidence of animals with alopecia was significantly lower in the EE groups than in the control group. It is possible that the enriched environment provided greater opportunities for sheltering in a secure location in which to avoid interactions with other mice. The cloth mat flooring used for the EE group was bitten and chewed by the mice. Our findings suggest that responses to EE with regard to health and survival rates differ depending on the mouse strain. The results of this study provide basic data for further studies on EE.

11.
Improvements in parallel computing hardware usually involve increments in the number of available resources for a given application such as the number of computing cores and the amount of memory. In the case of shared-memory computers, the increase in computing resources and available memory is usually constrained by the coherency protocol, whose overhead rises with system size, limiting the scalability of the final system. In this paper we propose an efficient and cost-effective way to increase the memory available for a given application by leveraging free memory in other computers in the cluster. Our proposal is based on the observation that many applications benefit from having more memory resources but do not require more computing cores, thus reducing the requirements for cache coherency and allowing a simpler implementation and better scalability. Simulation results show that, when additional mechanisms intended to hide remote memory latency are used, execution time of applications that use our proposal is similar to the time required to execute them in a computer populated with enough local memory, thus validating the feasibility of our proposal. We are currently building a prototype that implements our ideas. The first results from real executions in this prototype demonstrate not only that our proposal works but also that it can efficiently execute applications that make use of remote memory resources.

12.
To secure interactive multimedia applications in wireless LANs (WLANs), it is pertinent to implement real-time cryptographic services. In this paper we evaluate the use of software-based encryption algorithms that are implemented in the layered service provider as defined by WinSock 2 for Windows 95/NT. Our measurements show that software implementations of various encryptors can sustain the throughput requirements of interactive multimedia applications for WLANs such as telephone-quality audio, video conferencing, and MPEG video. We present a design methodology that includes guidelines for a secure multimedia system design in terms of the encryption method chosen as a function of required application throughput, system configuration, protocol layer overhead, and wireless LAN throughput. This revised version was published online in July 2006 with corrections to the Cover Date.

13.
The rapid growth of Internet applications has made communication anonymity an increasingly important or even indispensable security requirement. Onion routing has been employed as an infrastructure for anonymous communication over a public network, which provides anonymous connections that are strongly resistant to both eavesdropping and traffic analysis. However, existing onion routing protocols usually exhibit poor performance due to repeated encryption operations. In this paper, we first present an improved anonymous multi-receiver identity-based encryption (AMRIBE) scheme, and an improved identity-based one-way anonymous key agreement (IBOWAKE) protocol. We then propose an efficient onion routing protocol named AIB-OR that provides provable security and strong anonymity. Our main approach is to use our improved AMRIBE scheme and improved IBOWAKE protocol in onion routing circuit construction. Compared with other onion routing protocols, AIB-OR provides high efficiency, scalability, strong anonymity and fault tolerance. Performance measurements from a prototype implementation show that our proposed AIB-OR can achieve high bandwidths and low latencies when deployed over the Internet.

14.
The Paxos algorithm requires a single correct coordinator process to operate. After a failure, the replacement of the coordinator may lead to a temporary unavailability of the application implemented atop Paxos. So far, this unavailability has been addressed by reducing the coordinator replacement rate through the use of stable coordinator selection algorithms. We have observed that the cost of recovery of the newly elected coordinator’s state is at the core of this unavailability problem. In this paper we present a new technique to manage coordinator replacement that allows the recovery to occur concurrently with new consensus rounds. Experimental results show that our seamless approach effectively solves the temporary unavailability problem; its adoption entails uninterrupted execution of the application. Our solution removes the restriction that coordinator replacements are something to be avoided, allowing the application execution to be decoupled from the accuracy of the mechanism used to choose a coordinator. This result increases the performance of the application even in the presence of failures. It is of special importance to the autonomous operation of replicated applications that have to adapt to varying network conditions and partial failures.

15.
Increasingly, applications need to be able to self-reconfigure in response to changing requirements and environmental conditions. Autonomic computing has been proposed as a means for automating software maintenance tasks. As the complexity of adaptive and autonomic systems grows, designing and managing the set of reconfiguration rules becomes increasingly challenging and may produce inconsistencies. This paper proposes an approach to leverage genetic algorithms in the decision-making process of an autonomic system. This approach enables a system to dynamically evolve target reconfigurations at run time that balance tradeoffs between functional and non-functional requirements in response to changing requirements and environmental conditions. A key feature of this approach is incorporating system and environmental monitoring information into the genetic algorithm such that specific changes in the environment automatically drive the evolutionary process towards new viable solutions. We have applied this genetic-algorithm based approach to the dynamic reconfiguration of a collection of remote data mirrors, demonstrating an effective decision-making method for diffusing data and minimizing operational costs while maximizing data reliability and network performance, even in the presence of link failures.

16.
In aquatic environments, endocrine disrupting chemicals (EDCs) that interfere with the reproductive physiology of males form a threat to the reproduction of populations. This is often manifested as decreased sexual performance or sterility among males. We show that exposure to EDCs can directly affect the mating system of a marine fish, the sand goby (Pomatoschistus minutus). We exposed males for 1 to 4 weeks to two different concentrations (5 ng L⁻¹ and 24 ng L⁻¹) of 17α-ethinyl estradiol (EE2), a synthetic compound mimicking estrogen, and to a water control. The sand goby exhibits a polygynous mating system, in which male mating success is typically skewed towards the largest males, resulting in strong sexual selection for increased male size. Our experiment shows that when males have been exposed to EE2, male size has a smaller effect on mating success, resulting in weaker sexual selection on male size as compared to the control. There was an interaction between treatment and exposure time on the expression of vitellogenin and zona radiata protein mRNAs. Males exposed to high EE2 reached much higher expression levels than males exposed to low EE2. Of the somatic markers, the hepatosomatic index was lower in males exposed to high EE2 than in the low EE2 and control males. Our results suggest that exposure to EDCs can have effects on the mating system before physiological changes are observable. These effects can be of profound nature as they interfere with sexual selection, and may in the long run lead to the loss of traits maintained through sexual selection.

17.
18.
MOTIVATION: Visual programming has the potential to allow non-programmers to redesign and rebuild applications to suit their individual needs. We have built such a visual programming environment, which allows non-programmers to interrogate and combine software components graphically to form new applications. As the needs of the biological community grow, so too will the need for more powerful and easy-to-use software tools. Intelligent visual programming environments will allow users to design and develop applications easily, so that they can concentrate on the application they wish to build rather than how it is to be done. RESULTS: The environment can read in JavaBeans and present relevant information about the beans to the user. The user can then graphically specify how they would like information to flow between the beans by performing simple docking operations. Unnecessary complexities associated with such visual design have been removed by providing intelligent docking of components and visual feedback. With such mechanisms, the complexities of building new applications are reduced. When the biologist has finished the visual construction, the design system is able to generate the new application automatically. The system has been designed specifically to meet the needs of the biological community, and a range of 'BioBeans' are being developed. These include beans for visualization (sequence displays and data visualizers), analysis (feature recognition, error detection) and communication (database access, URL retrieval, DDE communication). AVAILABILITY: Freely available. CONTACT: boyle@synomics.com

19.
A prototype cartridge system is described that rapidly disrupts Bacillus spores by sonication, adds PCR reagent to the disrupted spores, and dispenses the mixture into a PCR tube. The total time to automatically process the spores in the cartridge and then detect the spore DNA by real-time PCR was 20 min.

20.
Configuring and executing applications across multiple clouds is a challenging task due to the various terminologies used by the cloud providers. Therefore, we advocate the use of autonomic systems to do this work automatically. Thus, in this paper, we propose and evaluate Dohko, an autonomic and goal-oriented system for inter-cloud environments. Dohko implements self-configuration, self-healing, and context-awareness properties. Likewise, it relies on a hierarchical P2P overlay (a) to manage the virtual machines running on the clouds and (b) to deal with inter-cloud communication. Furthermore, it depends on a software product line engineering method to enable applications’ deployment and reconfiguration, without requiring pre-configured virtual machine images. Experimental results show that Dohko can free users from the duty of executing non-native cloud applications on a single cloud and across many clouds. In particular, it tackles the lack of middleware prototypes that can support different scenarios when using simultaneous services from multiple clouds.
