Similar articles
 20 similar articles found
1.
MPI collective communication operations that distribute or gather data are used in many parallel applications from scientific computing, but they can cause scalability problems because their execution times increase with the number of participating processors. In this article, we show how the execution time of collective communication operations can be improved significantly by an internal restructuring based on orthogonal processor structures with two or more levels. The execution time of operations like MPI_Bcast() or MPI_Allgather() can be reduced by 40% and 70% on a dual Xeon cluster and a Beowulf cluster with single-processor nodes. On a Cray T3E, too, a significant performance improvement can be obtained by a careful selection of the processor structure. The use of these optimized communication operations can significantly reduce the execution time of data-parallel implementations of complex application programs without requiring any other change to the computation and communication structure. We present runtime functions for modeling two-phase realizations and verify that these runtime functions can predict the execution time both for communication operations in isolation and in the context of application programs.
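
The two-phase idea in the abstract above can be sketched without MPI. The following Python sketch assumes a two-level p1 x p2 grid with rank = row * p2 + col and a binomial tree inside each group; the function names (`binomial_bcast`, `two_phase_bcast`) and the choice of tree are illustrative assumptions, not the algorithms or runtime functions of the paper. It only lists who sends to whom in each phase and makes no claim about timing.

```python
def binomial_bcast(members):
    """Return (round, src, dst) sends for a binomial-tree broadcast from members[0]."""
    sends, have, rnd = [], 1, 0          # members[:have] already hold the data
    while have < len(members):
        # Every current holder forwards to one rank that still lacks the data.
        for i in range(min(have, len(members) - have)):
            sends.append((rnd, members[i], members[have + i]))
        have = min(2 * have, len(members))
        rnd += 1
    return sends


def two_phase_bcast(p1, p2):
    """Broadcast from rank 0 on a p1 x p2 grid, where rank = row * p2 + col."""
    # Phase 1: rank 0 reaches one leader per row (the first column of the grid).
    leaders = [row * p2 for row in range(p1)]
    sends = [(1, rnd, src, dst) for rnd, src, dst in binomial_bcast(leaders)]
    # Phase 2: each leader broadcasts within its own row; rows proceed concurrently.
    for row in range(p1):
        members = [row * p2 + col for col in range(p2)]
        sends += [(2, rnd, src, dst) for rnd, src, dst in binomial_bcast(members)]
    return sends


if __name__ == "__main__":
    for phase, rnd, src, dst in two_phase_bcast(2, 4):
        print(f"phase {phase} round {rnd}: {src} -> {dst}")
```

Every rank other than the root appears exactly once as a destination, so the schedule delivers the data to all p1 * p2 ranks. Which grid shape performs best on a given machine is the empirical question the abstract addresses; this sketch does not answer it.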

2.
We present gmblock, a block-level storage sharing system over Myrinet which uses an optimized I/O path to transfer data directly between the storage medium and the network, bypassing the host CPU and main memory bus of the storage server. It is device driver independent and retains the protection and isolation features of the OS. We evaluate the performance of a prototype gmblock server and find that: (a) the proposed techniques eliminate memory and peripheral bus contention, increasing remote I/O bandwidth significantly, on the order of 20–200% compared to an RDMA-based approach, (b) the impact of remote I/O on local computation becomes negligible, (c) the performance characteristics of RAID storage combined with limited NIC resources reduce performance. We introduce synchronized send operations to improve the degree of disk-to-network I/O overlapping. We deploy the OCFS2 shared-disk filesystem over gmblock and show gains for various application benchmarks, provided I/O scheduling can eliminate the disk bottleneck due to concurrent access.

3.
Noise-induced cooperative behavior in a multicell system   Total citations: 4, self-citations: 0, citations by others: 4

4.
We discuss the ability of dynamic neural fields to track noisy population codes in an online fashion when signals are constantly applied to the recurrent network. To report on the quantitative performance of such networks we perform population decoding of the ‘orientation’ embedded in the noisy signal and determine which inhibition strength in the network provides the best decoding performance. We also study the performance of decoding on time-varying signals. Simulations of the system show good performance even in the very noisy case and also show that noise is beneficial to decoding time-varying signals.

5.
In oscines, male song stimulates female reproduction and females are known to adjust both their sexual preferences and their maternal investment according to song quality. Female domestic canaries are especially responsive to wide frequency bandwidth (4 kHz) male songs emitted with a high-repetition syllable rate and low minimal frequencies (1 kHz). We previously showed that low-frequency urban noise decreases female sexual responsiveness to these low-frequency songs (1–5 kHz) through auditory masking. Based on the differential allocation hypothesis, we predicted that urban noise exposure will equally affect female maternal investment. Using a crossover design, we broadcast low-frequency songs to females either in an overlapping noise condition or in an alternating noise condition. Females decreased both their sexual responsiveness and their clutch size in the overlapping noise treatment relative to the alternating noise treatment. No differences were found concerning egg size or egg composition (yolk and albumen mass, testosterone concentration). Due to our experimental design, we can exclude a general impact of noisy conditions and thereby provide evidence for a detrimental effect through masking on avian courtship and reproductive output. These results suggest that noisy conditions may also affect avian communication in outdoor conditions, which may partly explain field reports on noise-dependent breeding success and reduced breeding densities at noisy sites.

6.
This paper introduces Madeleine II, a new adaptive and portable multi-protocol implementation of the Madeleine communication library. Madeleine II can control multiple network interfaces (BIP, SISCI, VIA) and multiple network adapters (Ethernet, Myrinet, SCI) within the same application session. We report on performance measurements obtained using BIP/Myrinet and SISCI/SCI and we present preliminary results about our MPICH/Madeleine II and Nexus/Madeleine II ports. We also discuss an extension of Madeleine II for clusters of clusters which is able to handle heterogeneous networks. In particular, we present the fast internal data-forwarding mechanism that is used on gateway nodes to speed up inter-cluster transmissions. Preliminary experiments show that the resulting inter-cluster bandwidth is close to that delivered by the hardware.

7.
The increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and processor core hierarchies, accessible via complex interconnects in order to dispatch the increasing amount of data required by the processing elements. The key to efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one-sided communications becomes more important in these systems, to avoid unnecessary synchronization between pairs of processes in collective operations implemented in terms of two-sided point-to-point functions. This work proposes a series of algorithms that provide good performance and scalability in collective operations, based on the use of hierarchical trees, overlapping one-sided communications, message pipelining and the available NUMA binding features. An implementation has been developed for Unified Parallel C, a Partitioned Global Address Space language, which presents a shared memory view across the nodes for programmability, while keeping private memory regions for performance. The performance evaluation of the proposed implementation, conducted on five representative systems (JuRoPA, JUDGE, Finis Terrae, SVG and Superdome), has shown generally good performance and scalability, even outperforming MPI in some cases, which confirms the suitability of the developed algorithms for manycore architectures.

8.
The operations of encoding and decoding in communication agree with the filtering operations of convolution and deconvolution for Gaussian signal processing. In an analogy with power transmission in thermodynamics, an autoregressive model of information transmission is proposed for representing a continuous communication system which requires both an internal noise source and a signal source to encode or decode a message. In this model transinformation (informational entropy) equals the increase in stationary nonequilibrium organization formed through the amplification of white noise by a positive feedback system. The channel capacity is finite due to the existence of inherent noise in the system. The maximum entropy criterion in information dynamics corresponds to the second law of thermodynamics. If the process is stationary, the communication system is invertible, and has the maximum efficiency of transformation. The total variation in informational entropy is zero in the cycle of the invertible system, while in the noninvertible system the entropy of decoding is less than that of encoding. A noisy autoregressive coding which maximizes transinformation is optimum, but is also ideal.

9.
Urban habitats are noisy and constrain acoustic communication in birds. We analysed the effect of anthropogenic noise on the vocalization characteristics of House Wrens Troglodytes aedon at two sites with different noise levels (rural and urban). For each song and song trill, we measured the frequency bandwidth, maximum amplitude, maximum and minimum frequency, and trill rate. In noisy urban environments, there was a reduction in bandwidth and an increase in trill rate relative to quieter, rural environments. The whole song of birds from both populations increased in minimum frequency as noise increased, improving song transmission.

10.
Extensive research over the last few decades has revealed that many acoustically communicating animals compensate for the masking effect of background noise by changing the structure of their signals. Familiar examples include birds using acoustic properties that enhance the transmission of vocalizations in noisy habitats. Here, we show that the effects of background noise on communication signals are not limited to the acoustic modality, and that visual noise from windblown vegetation has an equally important influence on the production of dynamic visual displays. We found that two species of Puerto Rican lizard, Anolis cristatellus and A. gundlachi, increase the speed of body movements used in territorial signalling to apparently improve communication in visually 'noisy' environments of rapidly moving vegetation. This is the first evidence that animals change how they produce dynamic visual signals when communicating in noisy motion habitats. Taken together with previous work on acoustic communication, our results show that animals with very different sensory ecologies can face similar environmental constraints and adopt remarkably similar strategies to overcome these constraints.

11.
Ambient noise interferes with the propagation of acoustic signals through the environment from sender to receiver. Over the past few centuries, urbanization and the development of busy transport networks have led to dramatic increases in the levels of ambient noise with which animal acoustic communications must compete. Here we show that urban European robins Erithacus rubecula, highly territorial birds reliant on vocal communication, reduce acoustic interference by singing during the night in areas that are noisy during the day. The effect of ambient light pollution, to which nocturnal singing in urban birds is frequently attributed, is much weaker than that of daytime noise.

12.
Performance analysis of MPI collective operations   Total citations: 1, self-citations: 0, citations by others: 1
Previous studies of application usage show that the performance of collective communications is critical for high-performance computing. Despite active research in the field, a solution to the collective communication optimization problem that is both general and feasible is still missing. In this paper, we analyze and attempt to improve intra-cluster collective communication in the context of the widely deployed MPI programming paradigm by extending accepted models of point-to-point communication, such as Hockney, LogP/LogGP, and PLogP, to collective operations. We compare the predictions from the models against experimentally gathered data and, using these results, construct an optimal decision function for the broadcast collective. We quantitatively compare the quality of the model-based decision functions to the experimentally optimal one. Additionally, we introduce a new form of an optimized tree-based broadcast algorithm, splitted-binary. Our results show that all of the models can provide useful insights into various aspects of the different algorithms as well as their relative performance. Still, based on our findings, we believe that complete reliance on models would not yield optimal results. In addition, our experimental results have identified the gap parameter as being the most critical for accurate modeling of both the classical point-to-point-based pipeline and our extensions to fan-out topologies.
Jack J. Dongarra
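
To make the notion of a model-based decision function concrete, here is a minimal Python sketch under the Hockney model, where sending m bytes point-to-point costs alpha + beta * m. The two broadcast cost formulas (binomial tree and segmented pipeline) are commonly used textbook forms; the parameter values, the segment size, and the function names are assumptions for illustration and are not taken from the paper, whose splitted-binary algorithm is not modeled here.

```python
def t_binomial(p, m, alpha, beta):
    """Binomial tree: ceil(log2 p) rounds, each moving the whole message."""
    rounds = (p - 1).bit_length()        # exact ceil(log2 p) for p >= 1
    return rounds * (alpha + beta * m)


def t_pipeline(p, m, alpha, beta, seg):
    """Chain of p ranks forwarding the message in segments of at most seg bytes."""
    seg = min(seg, m)
    n_seg = -(-m // seg)                 # ceiling division
    # Upper estimate: the last segment is costed as a full one.
    return (p + n_seg - 2) * (alpha + beta * seg)


def choose_bcast(p, m, alpha, beta, seg):
    """Pick the algorithm with the smaller predicted time (hypothetical decision function)."""
    costs = {
        "binomial": t_binomial(p, m, alpha, beta),
        "pipeline": t_pipeline(p, m, alpha, beta, seg),
    }
    return min(costs, key=costs.get)


if __name__ == "__main__":
    ALPHA, BETA, SEG = 50e-6, 1e-8, 8192     # assumed: 50 us latency, 10 ns/byte, 8 KiB segments
    for size in (1024, 64 * 1024, 16 * 2**20):
        print(size, choose_bcast(16, size, ALPHA, BETA, SEG))
```

With these assumed parameters the model favors the binomial tree for small messages and the pipeline for large ones. The abstract's caveat still applies: a decision function derived purely from a model may differ from the experimentally optimal one.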

13.
14.
As the domain of communication systems grows, heterogeneity among the computers and subnetworks employed for a task also increases. The channel bandwidth available for a message on a communication network varies with time and link. This variation can have a significant effect on the performance of an individual message and also on that of the network as a whole. Therefore, it is important to understand the effects of bandwidth heterogeneity on network performance in order to optimally utilize a heterogeneous communication network. The ability to use such a network optimally is highly desirable in many applications such as network-based data-intensive high performance computing. The main goal of this paper is to analyze in detail the effects of temporal and spatial heterogeneity on the performance of individual messages via an extensive simulation, in terms of throughput, end-to-end delay, etc. Also, the problems of path selection and multi-path data transfer are considered to illustrate how the analysis results may be used in future efforts to optimize network performance by taking channel bandwidth heterogeneity into account.

15.
Flocks of starlings exhibit a remarkable ability to maintain cohesion as a group in highly uncertain environments and with limited, noisy information. Recent work demonstrated that individual starlings within large flocks respond to a fixed number of nearest neighbors, but until now it was not understood why this number is seven. We analyze robustness to uncertainty of consensus in empirical data from multiple starling flocks and show that the flock interaction networks with six or seven neighbors optimize the trade-off between group cohesion and individual effort. We can distinguish these numbers of neighbors from fewer or greater numbers using our systems-theoretic approach to measuring robustness of interaction networks as a function of the network structure, i.e., who is sensing whom. The metric quantifies the disagreement within the network due to disturbances and noise during consensus behavior and can be evaluated over a parameterized family of hypothesized sensing strategies (here the parameter is number of neighbors). We use this approach to further show that for the range of flocks studied the optimal number of neighbors does not depend on the number of birds within a flock; rather, it depends on the shape, notably the thickness, of the flock. The results suggest that robustness to uncertainty may have been a factor in the evolution of flocking for starlings. More generally, our results elucidate the role of the interaction network on uncertainty management in collective behavior, and motivate the application of our approach to other biological networks.

16.
The deployment of wireless sensor networks for healthcare applications has been motivated and driven by the increasing demand for real-time monitoring of patients in hospital and large disaster response environments. A major challenge in developing such sensor networks is the need to coordinate a large number of randomly deployed sensor nodes. In this study, we propose a multi-parametric clustering scheme designed to aid in the coordination of sensor nodes within cognitive wireless sensor networks. In the proposed scheme, sensor nodes are clustered together based on similar network behaviour across multiple network parameters, such as channel availability, interference characteristics, and topological characteristics, followed by mechanisms for forming, joining and switching clusters. Extensive performance evaluation is conducted to study the impact on important factors such as clustering overhead, cluster joining estimation error, interference probability, and probability of reclustering. Results show that the proposed clustering scheme can be an excellent candidate for use in large-scale cognitive wireless sensor network deployments with high dynamics.

17.
Software Distributed Shared Memory (DSM) systems can be used to provide a coherent shared address space on multicomputers and other parallel systems without support for shared memory in hardware. The coherency software automatically translates shared memory accesses to explicit messages exchanged among the nodes in the system. Many applications exhibit good performance on such systems, but it has been shown that, for some applications, performance-critical messages can be delayed behind less important messages because of the enqueuing behavior in the communication libraries used in current systems. In this paper, we present a new portable communication library that supports priorities to remedy this situation. We describe an implementation of the communication library and a quantitative model that is used to estimate the performance impact of priorities for a typical situation. Using the model, we show that the use of high-priority communication reduces the latency of performance-critical messages substantially over a wide range of network design parameters. The latency is reduced by up to 10–25% for each delaying low-priority message in the queue ahead.
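
The queueing problem described above, where a critical message waits behind bulk traffic, can be illustrated with a small priority queue. This Python sketch uses the standard `heapq` module; the class name `PriorityMailbox`, its methods, and the message labels are hypothetical and do not reflect the API of the library in the paper, which the abstract does not describe.

```python
import heapq
import itertools


class PriorityMailbox:
    """Outgoing message queue that delivers lower priority numbers first."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()    # tie-breaker: FIFO order within one priority

    def send(self, payload, priority=10):
        """Enqueue a message; a smaller priority value means more urgent."""
        heapq.heappush(self._heap, (priority, next(self._seq), payload))

    def recv(self):
        """Dequeue the most urgent message (oldest first among equals)."""
        _, _, payload = heapq.heappop(self._heap)
        return payload

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    box = PriorityMailbox()
    box.send("bulk-page-1")
    box.send("bulk-page-2")
    box.send("lock-grant", priority=0)   # performance-critical message
    while box:
        print(box.recv())
```

In a plain FIFO queue the `lock-grant` message would wait behind both bulk messages; here it is dequeued first, while the bulk messages keep their relative order.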

18.
Contention-Aware Communication Schedule for High-Speed Communication   Total citations: 1, self-citations: 0, citations by others: 1
Considerable effort has been devoted over the past decade to addressing the software overhead problem, which is known as the major hindrance to high-speed communication. However, this paper shows that having a low-latency communication system does not guarantee high performance, as there are other communication issues that have not been fully addressed by the use of low-latency communication, such as contention and scheduling of communication events. In this paper, we use the complete exchange operation as a case study to show that with careful design of communication schedules, we can achieve efficient communication as well as prevent congestion. We have developed a complete exchange algorithm, the Synchronous Shuffle Exchange, which is an optimal algorithm on the non-blocking network. To avoid congestion loss caused by the non-deterministic delays in communication events, a global congestion control scheme is introduced. This scheme coordinates all participating nodes to monitor and regulate the traffic load, which effectively avoids congestion loss and maintains sufficient throughput to maximize the performance. To improve the effectiveness of the congestion control scheme when working on the hierarchical network, we incorporate information on the network topology to devise a contention-aware permutation. This permutation scheme generates a communication schedule that is free of both node and switch contention and distributes the network load more evenly across the hierarchy. This relieves the congestion build-up at the uplink ports and improves the synchronism of the traffic information exchange between cluster nodes. Performance results of our implementation on a 32-node cluster with various network configurations are examined and reported in this paper.
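
As a simple illustration of what a node-contention-free complete exchange schedule looks like, the Python sketch below uses a cyclic-shift schedule: in step k, node i sends its block to node (i + k) mod p. This is a well-known textbook schedule swapped in for illustration; it is not the paper's Synchronous Shuffle Exchange, and it does not model switch contention or the hierarchical topology. The function names are hypothetical.

```python
def cyclic_exchange_schedule(p):
    """Return p - 1 steps; in step k, node i sends its block to (i + k) % p."""
    return [[(i, (i + k) % p) for i in range(p)] for k in range(1, p)]


def is_node_contention_free(step, p):
    """True if every node sends exactly once and receives exactly once in the step."""
    senders = sorted(src for src, _ in step)
    receivers = sorted(dst for _, dst in step)
    return senders == receivers == list(range(p))


if __name__ == "__main__":
    for k, step in enumerate(cyclic_exchange_schedule(4), start=1):
        print(f"step {k}: {step}")
```

Each step is a permutation of the nodes, so no node is the target of two messages at once, and over the p - 1 steps every ordered pair of distinct nodes communicates exactly once.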

19.
In recent years, there has been a growing interest in the cluster system as an accepted form of supercomputing, due to its high performance at an affordable cost. This paper presents a detailed performance analysis of a Myrinet-based cluster. The communication performance and the effect of background load on parallel applications were analyzed. For point-to-point communication, it was found that an extension to Hockney's model was required to estimate the performance. The proposed model suggests that two ranges should be used for the performance metrics to cope with the cache effect. Moreover, based on the extension of the point-to-point communication model, Xu and Hwang's model for collective communication performance was also extended. Results showed that our models estimate communication performance better than the previous models. Finally, the interference of other user processes with the cluster system is evaluated using synthetic background load generation programs.
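
One way to read the two-range extension of Hockney's model is as a piecewise-linear fit: one (alpha, beta) pair for messages up to a split size and another pair beyond it. The Python sketch below fits such a model by least squares. The split point, the sample data in the demo, and all function names are made-up assumptions for illustration; they are not measurements or formulas from the paper.

```python
def fit_line(samples):
    """Least-squares fit of t = alpha + beta * m over (m, t) samples."""
    n = len(samples)
    sx = sum(m for m, _ in samples)
    sy = sum(t for _, t in samples)
    sxx = sum(m * m for m, _ in samples)
    sxy = sum(m * t for m, t in samples)
    beta = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    alpha = (sy - beta * sx) / n
    return alpha, beta


def fit_two_range(samples, split):
    """Fit separate Hockney parameters for m <= split and m > split."""
    small = [(m, t) for m, t in samples if m <= split]
    large = [(m, t) for m, t in samples if m > split]
    return fit_line(small), fit_line(large)


def predict(m, model, split):
    """Predicted transfer time for an m-byte message under the two-range model."""
    alpha, beta = model[0] if m <= split else model[1]
    return alpha + beta * m


if __name__ == "__main__":
    # Synthetic (bytes, microseconds) pairs, invented for this demo only.
    data = [(100, 11.0), (200, 12.0), (400, 14.0), (800, 18.0),
            (2000, 60.0), (4000, 100.0), (8000, 180.0)]
    model = fit_two_range(data, split=1000)
    print(model, predict(500, model, 1000), predict(5000, model, 1000))
```

Each range needs at least two samples with distinct message sizes, otherwise the fit is undefined. How the split point should be chosen (for example, from the cache size) is left open here.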

20.
As the number of cores per node keeps increasing, it becomes increasingly important for MPI to leverage shared memory for intranode communication. This paper investigates the design and optimization of MPI collectives for clusters of NUMA nodes. We develop performance models for collective communication using shared memory and we demonstrate several algorithms for various collectives. Experiments are conducted on both Xeon X5650 and Opteron 6100 InfiniBand clusters. The measurements agree with the model and indicate that different algorithms dominate for short vectors and long vectors. We compare our shared-memory allreduce with several MPI implementations (Open MPI, MPICH2, and MVAPICH2) that utilize system shared memory to facilitate interprocess communication. On a 16-node Xeon cluster and an 8-node Opteron cluster, our implementation achieves geometric-mean speedups of 2.3X and 2.1X, respectively, over the best MPI implementation. Our techniques enable an efficient implementation of collective operations on future multi- and manycore systems.
