Similar literature
 20 similar documents found
1.
Clusters of Symmetrical Multiprocessors (SMPs) have recently become the norm for high-performance economical computing solutions. Multiple nodes in a cluster can be used for parallel programming using a message passing library. An alternate approach is to use a software Distributed Shared Memory (DSM) to provide a view of shared memory to the application programmer. This paper describes Strings, a high performance distributed shared memory system designed for such SMP clusters. The distinguishing feature of this system is the use of a fully multi-threaded runtime system, using kernel level threads. Strings allows multiple application threads to be run on each node in a cluster. Since most modern UNIX systems can multiplex these threads on kernel level light weight processes, applications written using Strings can exploit multiple processors on an SMP machine. This paper describes some of the architectural details of the system and illustrates the performance improvements with benchmark programs from the SPLASH-2 suite, some computational kernels as well as a full-fledged application. It is found that using multiple processes on SMP nodes provides good speedups only for a few of the programs. Multiple application threads can improve the performance in some cases, but other programs show a slowdown. If kernel threads are used additionally, the overall performance improves significantly in all programs tested. Other design decisions also have a beneficial impact, though to a lesser degree. This revised version was published online in July 2006 with corrections to the Cover Date.

2.
As the number of cores per node keeps increasing, it becomes increasingly important for MPI to leverage shared memory for intranode communication. This paper investigates the design and optimization of MPI collectives for clusters of NUMA nodes. We develop performance models for collective communication using shared memory and we demonstrate several algorithms for various collectives. Experiments are conducted on both Xeon X5650 and Opteron 6100 InfiniBand clusters. The measurements agree with the model and indicate that different algorithms dominate for short vectors and long vectors. We compare our shared-memory allreduce with several MPI implementations (Open MPI, MPICH2, and MVAPICH2) that utilize system shared memory to facilitate interprocess communication. On a 16-node Xeon cluster and an 8-node Opteron cluster, our implementation achieves geometric-mean speedups of 2.3X and 2.1X, respectively, over the best MPI implementation. Our techniques enable an efficient implementation of collective operations on future multi- and manycore systems.

3.
Software Distributed Shared Memory (DSM) systems can be used to provide a coherent shared address space on multicomputers and other parallel systems without support for shared memory in hardware. The coherency software automatically translates shared memory accesses to explicit messages exchanged among the nodes in the system. Many applications perform well on such systems, but it has been shown that, for some applications, performance-critical messages can be delayed behind less important messages because of the enqueuing behavior in the communication libraries used in current systems. We present in this paper a new portable communication library that supports priorities to remedy this situation. We describe an implementation of the communication library and a quantitative model that is used to estimate the performance impact of priorities for a typical situation. Using the model, we show that the use of high-priority communication reduces the latency of performance-critical messages substantially over a wide range of network design parameters. The latency is reduced by up to 10–25% for each delaying low-priority message in the queue ahead.
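The queuing effect described in this abstract can be illustrated with a small sketch. It is not the library from the paper: the class name `PriorityChannel`, the two priority levels, and the message labels are all hypothetical, and the sketch only shows how a priority-ordered send queue lets a performance-critical message overtake bulk traffic that was enqueued earlier.

```python
import heapq
import itertools


class PriorityChannel:
    """Toy send queue (hypothetical): lowest priority number leaves first,
    FIFO order is preserved among messages of equal priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that keeps FIFO order

    def enqueue(self, payload, priority):
        heapq.heappush(self._heap, (priority, next(self._seq), payload))

    def dequeue(self):
        _priority, _seq, payload = heapq.heappop(self._heap)
        return payload

    def __len__(self):
        return len(self._heap)


def drain(channel):
    """Dequeue everything and return the payloads in send order."""
    sent = []
    while len(channel):
        sent.append(channel.dequeue())
    return sent


HIGH, LOW = 0, 1  # assumed priority levels, not taken from the paper

chan = PriorityChannel()
chan.enqueue("bulk-page-1", LOW)
chan.enqueue("bulk-page-2", LOW)
chan.enqueue("lock-release", HIGH)  # performance-critical, enqueued last
order = drain(chan)
```

In a plain FIFO queue the critical message would wait behind both bulk messages; here it is sent first, which is the situation the paper's model quantifies.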

4.
This paper presents the design, architecture, and performance evaluation of a high-performance network for PC clusters, called Maestro. Most networks of recent clusters have been organized based on WAN or LAN technology, due to their market availability. However, communication protocols and functions of such conventional networks are not optimal for parallel computing, which requires low latency and high bandwidth communication. In this paper, we propose two optimizations for high-performance communication: (1) transferring in burst as many packets as the receiving buffer accepts at once, and (2) having each hardware component pass one data unit to another in a pipelined manner. We have developed a network interface and a switch, which are composed of dedicated hardware modules to realize these optimizations. An implementation of the message passing library developed on the Maestro cluster is also described. Performance evaluation shows that the proposed optimizations can extract the potential performance of the physical layer efficiently and improve communication performance.

5.
This paper describes studies of the carbohydrate-carbohydrate interaction (CCI) between micelles of a lactosyl lipid and monolayers of the glycosphingolipid GM(3). The lactose Lac.GM(3) interaction is involved in B16 melanoma cell adhesion and signaling processes, and a thorough understanding of the molecular details of this CCI is important for the design of new anti-adhesive and anti-metastatic agents. In this paper, we examine the influence of variations in divalent cations and subphase ionic strength on the Lac.GM(3) interaction. Our results indicate that, in the absence of divalent cations, the Lac.GM(3) CCI is strengthened at higher sodium chloride concentrations in the subphase. In contrast, when divalent cations are present in solution, the CCI is not as sensitive to ionic strength. These results suggest a role for both cation-dependent and cation-independent interactions in the Lac.GM(3) CCI.

6.
Matsubara T  Iida M  Tsumuraya T  Fujii I  Sato T 《Biochemistry》2008,47(26):6745-6751
We obtained a novel carbohydrate-binding peptide having a helix-loop-helix scaffold from a random peptide library. The helix-loop-helix peptide library randomized at five amino acid residues was displayed on the major coat protein of a filamentous phage. Affinity selection with a ganglioside, Galβ1-3GalNAcβ1-4(Neu5Acα2-3)Galβ1-4Glcβ1-1'Cer (GM1), gave positive phage clones. Surface plasmon resonance spectroscopy showed that a corresponding 35-mer synthetic peptide had high affinity for GM1 with a dissociation constant of 0.24 µM. This peptide preferentially binds to GM1 rather than asialo GM1 and GM2, suggesting that a terminal galactose and sialic acid are required for the binding as for cholera toxin. Circular dichroism spectroscopic studies indicated that a helical structure is important for the affinity and specificity. Furthermore, alanine scanning at randomized positions showed that arginine and phenylalanine play an especially important role in the recognition of carbohydrates. Such a de novo helix-loop-helix peptide would be available for the design of carbohydrate-binding proteins.

7.
8.
This paper presents a novel networking architecture designed for communication intensive parallel applications running on clusters of workstations (COWs) connected by high speed networks. The architecture addresses what is considered one of the most important problems of cluster-based parallel computing: the inherent inability of scaling the performance of communication software along with the host CPU performance. The Virtual Communication Machine (VCM), resident on the network coprocessor, presents a scalable software solution by providing configurable communication functionality directly accessible at user-level. The VCM architecture is configurable in that it enables the transfer to the VCM of selected communication-related functionality that is traditionally part of the application and/or the host kernel. Such transfers are beneficial when a significant reduction of the host CPU's load translates into a small increase in the coprocessor's load. The functionality implemented by the coprocessor is available at the application level as VCM instructions. Host CPU(s) and coprocessor interact through shared memory regions, thereby avoiding expensive CPU context switches. The host kernel is not involved in this interaction; it simply “connects” the application to the VCM during the initialization phase and is called infrequently to handle exceptional conditions. Protection is enforced by the VCM based on information supplied by the kernel. The VCM-based communication architecture admits low cost and open implementations, as demonstrated by its current ATM-based implementation based on off-the-shelf hardware components and using standard AAL5 packets. The architecture makes it easy to implement communication software that exhibits negligible overheads on the host CPU(s) and offers latencies and bandwidths close to the hardware limits of the underlying network. 
These characteristics are due to the VCM's support for zero-copy messaging with gather/scatter capabilities and the VCM's direct access to any data structure in an application's address space. This paper describes two versions of an ATM-based VCM implementation, which differ in the way they use the memory on the network adapter. Their performance under heavy load is compared in the context of a synthetic client/server application. The same application is used to evaluate the scalability of the architecture to multiple VCM-based network interfaces per host. Parallel implementations of the Traveling Salesman Problem and of Georgia Tech Time Warp, an engine for discrete-event simulation, are used to demonstrate VCM functionality and the high performance of its implementation. The distributed- and shared-memory versions of these two applications exhibit comparable performance, despite the significant cost-performance advantage of the distributed-memory platform. This revised version was published online in July 2006 with corrections to the Cover Date.

9.
To avoid the memory registration cost for small messages in MPI implementations over RDMA-enabled networks, message transfer protocols involve a copy to intermediate buffers at both sender and receiver. In this paper, we propose to eliminate the send-side copy when an application buffer is reused frequently. We show that it is more efficient to register the application buffer and use it for data transfer. The idea is examined for small message transfer protocols in MVAPICH2, including RDMA Write and Send/Receive based communications, one-sided communications and collectives. The proposed protocol adaptively falls back to the current protocol when the application does not frequently use its buffers. The performance results over InfiniBand indicate up to 14% improvement for single message latency, close to 20% improvement for one-sided operations and up to 25% improvement for collectives. In addition, the communication time in MPI applications with high buffer reuse is improved using this technique.
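The adaptive idea in this abstract (copy by default, switch to a registered buffer once reuse is observed) can be sketched as follows. This is an illustration under assumptions, not the MVAPICH2 protocol: the class `AdaptiveSender`, the constant `REUSE_THRESHOLD`, and the use of a simple per-buffer counter are all hypothetical, and no real RDMA calls are made.

```python
from collections import Counter

REUSE_THRESHOLD = 3  # hypothetical cut-off; the paper's actual heuristic is not given here


class AdaptiveSender:
    """Toy model: choose between the copy path and the zero-copy path per buffer."""

    def __init__(self, threshold=REUSE_THRESHOLD):
        self.threshold = threshold
        self.use_count = Counter()  # how often each buffer has been sent from
        self.registered = set()     # buffers whose registration cost was already paid

    def send(self, buf_id):
        """Return the path taken for this send: 'copy' or 'zero-copy'."""
        self.use_count[buf_id] += 1
        if buf_id in self.registered:
            return "zero-copy"
        if self.use_count[buf_id] >= self.threshold:
            self.registered.add(buf_id)  # register once, reuse afterwards
            return "zero-copy"
        return "copy"  # fall back to the intermediate-buffer protocol


sender = AdaptiveSender()
paths = [sender.send("A") for _ in range(5)]  # frequently reused buffer
one_off = sender.send("B")                    # buffer used only once
```

A buffer that is sent from repeatedly pays the registration cost once and then avoids the send-side copy, while a buffer used once stays on the copy path, mirroring the fallback behaviour the abstract describes.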

10.
Effective chemotherapy for solid cancers is challenging because limited permeation prevents anticancer drugs from reaching the center of the tumor and therefore from limiting cancer cell growth. To circumvent this issue, we planned to apply the drugs directly at the center by first collapsing the outer structure. For this, we focused on cell–cell communication (CCC) between N-glycans and proteins at the tumor cell surface. Mature N-glycans establish CCC; however, CCC is hindered when numerous immature N-glycans are present at the cell surface. Inhibition of Golgi mannosidases (GMs) results in the transport of immature N-glycans to the cell surface. This can be employed to disrupt CCC. Here, we describe the molecular design and synthesis of an improved GM inhibitor with a non-sugar mimic scaffold that was screened from a compound library. The synthesized compounds were tested for enzyme inhibition ability and inhibition of spheroid formation using cell-based methods. Most of the compounds designed and synthesized exhibited GM inhibition at the cellular level. Of those, AR524 had higher inhibitory activity than a known GM inhibitor, kifunensine. Moreover, AR524 inhibited spheroid formation of human malignant cells at low concentration (10 µM), based on the disruption of CCC by GM inhibition.

11.
The presence of microdomains has been postulated in the cell membrane, but the two-dimensional distribution of lipid molecules has been difficult to determine at the submicrometer scale. In the present paper, we examined the distribution of gangliosides GM1 and GM3, putative raft molecules in the cell membrane, by immunoelectron microscopy using quick-frozen and freeze-fractured specimens. This method physically immobilized molecules in situ and thus minimized the possibility of artifactual perturbation. By point pattern analysis of immunogold labeling, GM1 was shown to make clusters of <100 nm in diameter in normal mouse fibroblasts. GM1-null fibroblasts were not labeled, but developed a similar clustered pattern when GM1 was administered. On cholesterol depletion or chilling, the clustering of both endogenous and exogenously-loaded GM1 decreased significantly, but the distribution showed marked regional heterogeneity in the cells. GM3 also showed cholesterol-dependent clustering, and although clusters of GM1 and GM3 were found to occasionally coincide, these aggregates were separated in most cases, suggesting the presence of heterogeneous microdomains. The present method made it possible to capture the molecular distribution of lipids in the cell membrane, and demonstrated that GM1 and GM3 form clusters that are susceptible to cholesterol depletion and chilling.

12.
This paper examines the numerical and functional consequences of various stimuli on antiviral CD8+ T-cell memory using a mathematical model. The model is based upon biological evidence from the murine model of infection with lymphocytic choriomeningitis virus (LCMV) that the phenotype of immunological memory represents low-level responses driven by various stimuli, and the memory CTL population is partitioned between resting, cycling and effector cells. These subpopulations differ in their lifespan, their potential to mediate antiviral protection and in the stimuli needed for their maintenance. Three types of maintenance stimuli are examined: non-antigen-specific (bystander) stimulation, persisting antigen stimulation and reinfection-mediated stimulation. The modelling predicts that: (i) stable persistence of CTL memory requires the presence of either bystander or antigen-specific stimulation above a certain threshold depending on the sensitivity of memory CTL to stimulation and their lifespan; (ii) a relatively low level of stimuli (approximately 10^4-fold less on a per-CTL basis compared to acute infection) is needed to stabilize the expanded memory CTL population; (iii) the presence of CTL subsets in the memory pool of different activation states and lifespans ensures the robustness of memory persistence in the face of temporal variation in the low-level stimuli; and (iv) an 'optimal' population structure of the memory CTL pool, in terms of immediate protection, requires the presence of both activated cycling and effector CTL. For this, persisting antigen alone or synergistically with bystander signals provides the appropriate stimulation, so that stimuli equivalent to approximately 30 p.f.u. of LCMV in the spleen are sufficient to maintain approximately 10^5 to 10^6 specific CTL in the memory pool. These observations are relevant both to our understanding of natural protective immunity and to vaccine design.

13.
Y Fukano  M Ito 《Applied microbiology》1997,63(5):1861-1865
This paper describes the preparation of monosialoganglioside GM1 with sialidase-producing marine bacteria as a microbial biocatalyst. A new sialidase-producing bacterium, identified tentatively as Pseudomonas sp. strain YF-2, was isolated from seawater by enrichment culture with ganglioside as the sole source of carbon. When YF-2 was cultured in a synthetic medium containing crude bovine brain gangliosides at 25 °C for 3 days, 80 to 90% of the gangliosides were converted to GM1. GM1 was then purified from the supernatant of YF-2 culture by C18 reversed-phase chromatography, followed by DEAE-Sephadex A25 anion-exchange chromatography. In a typical experiment, 178 mg of highly purified GM1 was obtained from 500 mg of the crude ganglioside fraction. The GM1 induced neurite outgrowth of neuroblastoma Neuro2a cells at a concentration of 33 to 100 µM in the presence of fetal calf serum. Sialidase was purified 33-fold with 13.3% recovery from the culture supernatant of YF-2. The purified enzyme hydrolyzed polysialogangliosides to produce GM1 but did not act on GM1. It was therefore concluded that polysialogangliosides in the culture of strain YF-2 were converted to GM1 by this sialidase.

14.
The article describes the design of a user-defined multipurpose system based on the Zilog Z-80 8-bit microprocessor. The basic concept comprises a mainframe with a keyboard/display interface for communication between user and system. Different types of instrumentation equipment can be obtained using the same mainframe; only the contents of the program memory have to be changed. Up to seven programmable input/output boards can be installed. Some special-purpose boards are constructed to improve the versatility of the system. The major advantage of the presented system is that it can be applied at any time in the laboratory in an experiment or in the hospital for diagnostic or therapeutic purposes, without the engagement of existing computer memory.

15.
This paper introduces Madeleine II, a new adaptive and portable multi-protocol implementation of the Madeleine communication library. Madeleine II has the ability to control multiple network interfaces (BIP, SISCI, VIA) and multiple network adapters (Ethernet, Myrinet, SCI) within the same application session. We report on performance measurements obtained using BIP/Myrinet and SISCI/SCI and we present preliminary results about our MPICH/Madeleine II and Nexus/Madeleine II ports. We also discuss an extension of Madeleine II for clusters of clusters which is able to handle heterogeneous networks. In particular, we present the fast internal data-forwarding mechanism that is used on gateway nodes to speed up inter-cluster transmissions. Preliminary experiments show that the resulting inter-cluster bandwidth is close to the one delivered by the hardware.

16.
The increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and processor core hierarchies, accessible via complex interconnects in order to dispatch the increasing amount of data required by the processing elements. The key for efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one-sided communications becomes more important in these systems, to avoid unnecessary synchronization between pairs of processes in collective operations implemented in terms of two-sided point-to-point functions. This work proposes a series of algorithms that provide good performance and scalability in collective operations, based on the use of hierarchical trees, overlapping one-sided communications, message pipelining and the available NUMA binding features. An implementation has been developed for Unified Parallel C, a Partitioned Global Address Space language, which presents a shared memory view across the nodes for programmability, while keeping private memory regions for performance. The performance evaluation of the proposed implementation, conducted on five representative systems (JuRoPA, JUDGE, Finis Terrae, SVG and Superdome), has shown generally good performance and scalability, even outperforming MPI in some cases, which confirms the suitability of the developed algorithms for manycore architectures.

17.
General-Purpose Graphics Processing Units (GPGPUs) constitute an inexpensive resource for computing-intensive applications that could exploit an intrinsic fine-grain parallelism. This paper presents the design and implementation in GPGPUs of an exact alignment tool for nucleotide sequences based on the Burrows-Wheeler Transform. We compare this algorithm with state-of-the-art implementations of the same algorithm on standard CPUs, under the same I/O conditions. Excluding disk transfers, the implementation of the algorithm in GPUs shows a speedup larger than 12 when compared to CPU execution. This implementation exploits the parallelism by concurrently searching different sequences on the same reference search tree, maximizing memory locality and ensuring a symmetric access to the data. The paper describes the behavior of the algorithm on the GPU, showing good performance scalability, limited only by the size of the GPU's internal memory.
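For readers unfamiliar with exact matching over the Burrows-Wheeler Transform, the core search step can be sketched on the CPU as below. This is a minimal textbook-style backward search, not the paper's GPU implementation; the function names `build_index` and `find` and the naive suffix sorting are assumptions chosen for brevity, suitable only for short example strings.

```python
def build_index(text):
    """Build a tiny BWT-based index: suffix array, C table, and Occ table."""
    text += "$"  # sentinel, assumed smaller than every other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix

    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1

    # C[c]: number of characters in the text strictly smaller than c
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, C, occ


def find(pattern, index):
    """Return the sorted start positions of exact matches of a non-empty pattern."""
    sa, C, occ = index
    lo, hi = 0, len(sa)  # half-open interval of matching suffix-array rows
    for ch in reversed(pattern):  # backward search: last character first
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_index("ACGTACGTAC")
hits_acg = find("ACG", index)
hits_tac = find("TAC", index)
hits_none = find("GGA", index)
```

Each query only reads the shared index, so many reads can be searched independently against the same reference, which is the kind of parallelism the abstract says the GPU version exploits.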

18.
Local access networks (LAN) are commonly used as communication infrastructures which meet the demand of a set of users in the local environment. Usually these networks consist of several LAN segments connected by bridges. The topological LAN design bi-level problem consists of assigning users to clusters and joining the clusters by bridges in order to obtain a minimum response time network with minimum connection cost. Therefore, the decision of optimally assigning users to clusters will be made by the leader and the follower will make the decision of connecting all the clusters while forming a spanning tree. In this paper, we propose a genetic algorithm for solving the bi-level topological design of a Local Access Network. Our solution method considers the Stackelberg equilibrium to solve the bi-level problem. The Stackelberg-Genetic algorithm procedure deals with the fact that the follower’s problem cannot be optimally solved in a straightforward manner. The computational results obtained from two different sets of instances show that the developed algorithm performs efficiently and that it is more suitable for solving the bi-level problem than a previous Nash-Genetic approach.

19.
Gangliosides have been implicated in exerting multiple physiological functions, and it is important to understand how their distribution is regulated in the cell membrane. By using freeze-fracture immunolabeling electron microscopy, we showed that GM1 and GM3 make independent clusters that are significantly reduced by cholesterol depletion. In the present study, we examined the effects of actin depolymerization/polymerization and Src-family kinase inhibition on the GM1 and GM3 clusters. Both GM1 and GM3 clustering was reduced when the actin cytoskeleton was perturbed by latrunculin A or jasplakinolide, but the decrease was less significant than that induced by cholesterol depletion. On the other hand, inhibition of Src-family kinases decreased GM3 clustering more drastically than did cholesterol depletion, whereas its effect on GM1 clustering was less significant. GM1 and GM3 were segregated from each other in unperturbed cells, but co-clustering increased significantly after actin depolymerization. Our results indicate that the GM1 and GM3 clusters in the cell membrane are regulated in different ways and that segregation of the two gangliosides depends on the intact actin cytoskeleton.

20.
Canada is the world’s largest producer and exporter of flaxseed. In 2009, DNA from deregistered genetically modified (GM) CDC Triffid was detected in a shipment of Canadian flaxseed exported to Europe, causing a large decrease in the amount of flax planted in Canada and a major shift in export markets. The flax industry in Canada undertook major changes to ensure the removal of transgenic flax from the supply chain. To demonstrate compliance, Canada adopted a protocol involving testing grain samples (post-harvest) using an RT-PCR test for the construct found in CDC Triffid. Efforts to remove the presence of GM flax from the value chain included reconstituting major flax varieties from GM-free plants. The reconstituted varieties represented the majority of planting seed in 2014. This study re-evaluates GM flax presence in Canadian grain stocks for an updated dataset (2009–2015) using a previously described simulation model to estimate low-level GM presence. Additionally, losses to the Canadian economy resulting from the reduction in flax production and export opportunities, costs associated with reconstituting major flax varieties, and testing for the presence of GM flax along the flax value chain are estimated.
