期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Virtualizing high-end GPGPUs on ARM clusters for the next generation of high performance cloud computing

Raffaele Montella Giulio Giunta Giuliano Laccetti 《Cluster computing》2014,17(1):139-152

High performance cloud computing is behind the scene powering “the next big thing” as the mainstream accelerator for innovation in many areas. We describe here how to accelerate inexpensive ARM-based computing nodes with high-end GPGPUs hosted on x86_64 machines using the GVirtuS general-purpose virtualization service. We draw the vision of a possible next generation computing clusters characterized by highly heterogeneous parallelism heading to a lower electric power demanding, less heat producing and more environmental friendliness. Preliminary but promising performance data suggest that this solution could be considered as part of the foundations of the next generation of high performance cloud computing components. 相似文献

2.

Malleable applications for scalable high performance computing

Travis Desell Kaoutar El Maghraoui Carlos A. Varela 《Cluster computing》2007,10(3):323-337

Iterative applications are known to run as slow as their slowest computational component. This paper introduces malleability, a new dynamic reconfiguration strategy to overcome this limitation. Malleability is the ability to dynamically change the data size and number of computational entities in an application. Malleability can be used by middleware to autonomously reconfigure an application in response to dynamic changes in resource availability in an architecture-aware manner, allowing applications to optimize the use of multiple processors and diverse memory hierarchies in heterogeneous environments. The modular Internet Operating System (IOS) was extended to reconfigure applications autonomously using malleability. Two different iterative applications were made malleable. The first is used in astronomical modeling, and representative of maximum-likelihood applications was made malleable in the SALSA programming language. The second models the diffusion of heat over a two dimensional object, and is representative of applications such as partial differential equations and some types of distributed simulations. Versions of the heat application were made malleable both in SALSA and MPI. Algorithms for concurrent data redistribution are given for each type of application. Results show that using malleability for reconfiguration is 10 to 100 times faster on the tested environments. The algorithms are also shown to be highly scalable with respect to the quantity of data involved. While previous work has shown the utility of dynamically reconfigurable applications using only computational component migration, malleability is shown to provide up to a 15% speedup over component migration alone on a dynamic cluster environment. This work is part of an ongoing research effort to enable applications to be highly reconfigurable and autonomously modifiable by middleware in order to efficiently utilize distributed environments. Grid computing environments are becoming increasingly heterogeneous and dynamic, placing new demands on applications’ adaptive behavior. This work shows that malleability is a key aspect in enabling effective dynamic reconfiguration of iterative applications in these environments.

Carlos A. VarelaEmail:

相似文献

3.

A lightweight,high performance communication protocol for grid computing

Phillip M. Dickens 《Cluster computing》2010,13(1):47-66

This paper describes a lightweight, high-performance communication protocol for the high-bandwidth, high-delay networks typical of computational Grids. One unique feature of this protocol is that it incorporates an extremely accurate classification mechanism that is efficient enough to diagnose the cause of data loss in real time, providing to the controller the opportunity to respond to different causes of data loss in different ways. The simplest adaptive response, and the one discussed in this paper, is to trigger aggressive congestion control measures only when the data loss is diagnosed as network related. However, even this very simple adaptation can have a tremendous impact on performance in a Grid setting where the resources allocated to a long-running, data-intensive application can fluctuate significantly during the course of its execution. In fact, we provide results showing that the utilization of the information provided by the classifier increased performance by over two orders of magnitude depending on the dominant cause of data loss. In this paper, we discuss the Bayesian statistical framework upon which the classifier is based and the classification metrics that make this approach highly successful. We discuss the integration of the classifier into the congestion control structures of an existing high-performance communication protocol, and provide empirical results showing that it correctly diagnosed the cause of data loss in over 98% of the experimental trials. 相似文献

4.

A taxonomy of application scheduling tools for high performance cluster computing

Jiannong Cao Alvin T. S. Chan Yudong Sun Sajal K. Das Minyi Guo 《Cluster computing》2006,9(3):355-371

Application scheduling plays an important role in high-performance cluster computing. Application scheduling can be classified as job scheduling and task scheduling. This paper presents a survey on the software tools for the graph-based scheduling on cluster systems with the focus on task scheduling. The tasks of a parallel or distributed application can be properly scheduled onto multi-processors in order to optimize the performance of the program (e.g., execution time or resource utilization). In general, scheduling algorithms are designed based on the notion of task graph that represents the relationship of parallel tasks. The scheduling algorithms map the nodes of a graph to the processors in order to minimize overall execution time. Although many scheduling algorithms have been proposed in the literature, surprisingly not many practical tools can be found in practical use. After discussing the fundamental scheduling techniques, we propose a framework and taxonomy for the scheduling tools on clusters. Using this framework, the features of existing scheduling tools are analyzed and compared. We also discuss the important issues in improving the usability of the scheduling tools. This work is supported by the Hong Kong Polytechnic University under grant H-ZJ80 and by NASA Ames Research Center by a cooperative grant agreement with the University of Texas at Arlington. Jiannong Cao received the BSc degree in computer science from Nanjing University, Nanjing, China in 1982, and the MSc and the Ph.D degrees in computer science from Washington State University, Pullman, WA, USA, in 1986 and 1990 respectively. He is currently an associate professor in Department of Computing at the Hong Kong Polytechnic University, Hong Kong. He is also the director of the Internet and Mobile Computing Lab in the department. He was on the faculty of computer science at James Cook University and University of Adelaide in Australia, and City University of Hong Kong. His research interests include parallel and distributed computing, networking, mobile computing, fault tolerance, and distributed software architecture and tools. He has published over 120 technical papers in the above areas. He has served as a member of editorial boards of several international journals, a reviewer for international journals/conference proceedings, and also as an organizing/programme committee member for many international conferences. Dr. Cao is a member of the IEEE Computer Society, the IEEE Communication Society, IEEE, and ACM. He is also a member of the IEEE Technical Committee on Distributed Processing, IEEE Technical Committee on Parallel Processing, IEEE Technical Committee on Fault Tolerant Computing, and Computer Architecture Professional Committee of the China Computer Federation. Alvin Chan is currently an assistant professor at the Hong Kong Polytechnic University. He graduated from the University of New South Wales with a Ph.D. degree in 1995 and was subsequently employed as a Research Scientist by the CSIRO, Australia. From 1997 to 1998, he was employed by the Centre for Wireless Communications, National University of Singapore as a Program Manager. Dr. Chan is one of the founding members and director of a university spin-off company, Information Access Technology Limited. He is an active consultant and has been providing consultancy services to both local and overseas companies. His research interests include mobile computing, context-aware computing and smart card applications. Yudong Sun received the B.S. and M.S. degrees from Shanghai Jiao Tong University, China. He received Ph.D. degree from the University of Hong Kong in 2002, all in computer science. From 1988 to 1996, he was among the teaching staff in Department of Computer Science and Engineering at Shanghai Jiao Tong University. From 2002 to 2003, he held a research position at the Hong Kong Polytechnic University. At present, he is a Research Associate in School of Computing Science at University of Newcastle upon Tyne, UK. His research interests include parallel and distributed computing, Web services, Grid computing, and bioinformatics. Sajal K. Das is currently a Professor of Computer Science and Engineering and the Founding Director of the Center for Research in Wireless Mobility and Networking (CReWMaN) at the University of Texas at Arlington. His current research interests include resource and mobility management in wireless networks, mobile and pervasive computing, sensor networks, mobile internet, parallel processing, and grid computing. He has published over 250 research papers, and holds four US patents in wireless mobile networks. He received the Best Paper Awards in ACM MobiCom’99, ICOIN-16, ACM, MSWiM’00 and ACM/IEEE PADS’97. Dr. Das serves on the Editorial Boards of IEEE Transactions on Mobile Computing, ACM/Kluwer Wireless Networks, Parallel Processing Letters, Journal of Parallel Algorithms and Applications. He served as General Chair of IEEE PerCom’04, IWDC’04, MASCOTS’02 ACM WoWMoM’00-02; General Vice Chair of IEEE PerCom’03, ACM MobiCom’00 and IEEE HiPC’00-01; Program Chair of IWDC’02, WoWMoM’98-99; TPC Vice Chair of ICPADS’02; and as TPC member of numerous IEEE and ACM conferences. Minyi Guo received his Ph.D. degree in information science from University of Tsukuba, Japan in 1998. From 1998 to 2000, Dr. Guo had been a research scientist of NEC Soft, Ltd. Japan. He is currently a professor at the Department of Computer Software, The University of Aizu, Japan. From 2001 to 2003, he was a visiting professor of Georgia State University, USA, Hong Kong Polytechnic University, Hong Kong. Dr. Guo has served as general chair, program committee or organizing committee chair for many international conferences, and delivered more than 20 invited talks in USA, Australia, China, and Japan. He is the editor-in-chief of the Journal of Embedded Systems. He is also in editorial board of International Journal of High Performance Computing and Networking, Journal of Embedded Computing, Journal of Parallel and Distributed Scientific and Engineering Computing, and International Journal of Computer and Applications. Dr. Guo’s research interests include parallel and distributed processing, parallelizing compilers, data parallel languages, data mining, molecular computing and software engineering. He is a member of the ACM, IEEE, IEEE Computer Society, and IEICE. He is listed in Marquis Who’s Who in Science and Engineering. 相似文献

5.

Fast VMM-based overlay networking for bridging the cloud and high performance computing

Lei Xia Zheng Cui John Lange Yuan Tang Peter Dinda Patrick Bridges 《Cluster computing》2014,17(1):39-59

A collection of virtual machines (VMs) interconnected with an overlay network with a layer 2 abstraction has proven to be a powerful, unifying abstraction for adaptive distributed and parallel computing on loosely-coupled environments. It is now feasible to allow VMs hosting high performance computing (HPC) applications to seamlessly bridge distributed cloud resources and tightly-coupled supercomputing and cluster resources. However, to achieve the application performance that the tightly-coupled resources are capable of, it is important that the overlay network not introduce significant overhead relative to the native hardware, which is not the case for current user-level tools, including our own existing VNET/U system. In response, we describe the design, implementation, and evaluation of a virtual networking system that has negligible latency and bandwidth overheads in 1–10 Gbps networks. Our system, VNET/P, is directly embedded into our publicly available Palacios virtual machine monitor (VMM). VNET/P achieves native performance on 1 Gbps Ethernet networks and very high performance on 10 Gbps Ethernet networks. The NAS benchmarks generally achieve over 95 % of their native performance on both 1 and 10 Gbps. We have further demonstrated that VNET/P can operate successfully over more specialized tightly-coupled networks, such as Infiniband and Cray Gemini. Our results suggest it is feasible to extend a software-based overlay network designed for computing at wide-area scales into tightly-coupled environments. 相似文献

6.

Managing performance and power consumption tradeoff for multiple heterogeneous servers in cloud computing

Yuan Tian Chuang Lin Keqin Li 《Cluster computing》2014,17(3):943-955

There are typically multiple heterogeneous servers providing various services in cloud computing. High power consumption of these servers increases the cost of running a data center. Thus, there is a problem of reducing the power cost with tolerable performance degradation. In this paper, we optimize the performance and power consumption tradeoff for multiple heterogeneous servers. We consider the following problems: (1) optimal job scheduling with fixed service rates; (2) joint optimal service speed scaling and job scheduling. For problem (1), we present the Karush-Kuhn-Tucker (KKT) conditions and provide a closed-form solution. For problem (2), both continuous speed scaling and discrete speed scaling are considered. In discrete speed scaling, the feasible service rates are discrete and bounded. We formulate the problem as an MINLP problem and propose a distributed algorithm by online value iteration, which has lower complexity than a centralized algorithm. Our approach provides an analytical way to manage the tradeoff between performance and power consumption. The simulation results show the gain of using speed scaling, and also prove the effectiveness and efficiency of the proposed algorithms. 相似文献

7.

Yabi: An online research environment for grid,high performance and cloud computing

Hunter AA Macgregor AB Szabo TO Wellington CA Bellgard MI 《Source code for biology and medicine》2012,7(1):1

Background

There is a significant demand for creating pipelines or workflows in the life science discipline that chain a number of discrete compute and data intensive analysis tasks into sophisticated analysis procedures. This need has led to the development of general as well as domain-specific workflow environments that are either complex desktop applications or Internet-based applications. Complexities can arise when configuring these applications in heterogeneous compute and storage environments if the execution and data access models are not designed appropriately. These complexities manifest themselves through limited access to available HPC resources, significant overhead required to configure tools and inability for users to simply manage files across heterogenous HPC storage infrastructure. 相似文献

8.

Ultra-fast processing of gigapixel Tissue MicroArray images using high performance computing

Wang Y McCleary D Wang CW Kelly P James J Fennell DA Hamilton P 《Analytical cellular pathology (Amsterdam)》2010,33(5):271-285

相似文献

9.

A spatial scan statistic for multiple clusters 总被引：1，自引：0，他引：1

Li XZ Wang JF Yang WZ Li ZJ Lai SJ 《Mathematical biosciences》2011,233(2):135-142

Spatial scan statistics are commonly used for geographical disease surveillance and cluster detection. While there are multiple clusters coexisting in the study area, they become difficult to detect because of clusters’ shadowing effect to each other. The recently proposed sequential method showed its better power for detecting the second weaker cluster, but did not improve the ability of detecting the first stronger cluster which is more important than the second one. We propose a new extension of the spatial scan statistic which could be used to detect multiple clusters. Through constructing two or more clusters in the alternative hypothesis, our proposed method accounts for other coexisting clusters in the detecting and evaluating process. The performance of the proposed method is compared to the sequential method through an intensive simulation study, in which our proposed method shows better power in terms of both rejecting the null hypothesis and accurately detecting the coexisting clusters. In the real study of hand-foot-mouth disease data in Pingdu city, a true cluster town is successfully detected by our proposed method, which cannot be evaluated to be statistically significant by the standard method due to another cluster’s shadowing effect. 相似文献

10.

birgHPC: creating instant computing clusters for bioinformatics and molecular dynamics

Chew TH Joyce-Tan KH Akma F Shamsir MS 《Bioinformatics (Oxford, England)》2011,27(9):1320-1321

birgHPC, a bootable Linux Live CD has been developed to create high-performance clusters for bioinformatics and molecular dynamics studies using any Local Area Network (LAN)-networked computers. birgHPC features automated hardware and slots detection as well as provides a simple job submission interface. The latest versions of GROMACS, NAMD, mpiBLAST and ClustalW-MPI can be run in parallel by simply booting the birgHPC CD or flash drive from the head node, which immediately positions the rest of the PCs on the network as computing nodes. Thus, a temporary, affordable, scalable and high-performance computing environment can be built by non-computing-based researchers using low-cost commodity hardware. AVAILABILITY: The birgHPC Live CD and relevant user guide are available for free at http://birg1.fbb.utm.my/birghpc. 相似文献

11.

Constructing complex 3D biological environments from medical imaging using high performance computing

Burkitt M Walker D Romano DM Fazeli A 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(3):643-654

Extracting information about the structure of biological tissue from static image data is a complex task requiring computationally intensive operations. Here, we present how multicore CPUs and GPUs have been utilized to extract information about the shape, size, and path followed by the mammalian oviduct, called the fallopian tube in humans, from histology images, to create a unique but realistic 3D virtual organ. Histology images were processed to identify the individual cross sections and determine the 3D path that the tube follows through the tissue. This information was then related back to the histology images, linking the 2D cross sections with their corresponding 3D position along the oviduct. A series of linear 2D spline cross sections, which were computationally generated for the length of the oviduct, were bound to the 3D path of the tube using a novel particle system technique that provides smooth resolution of self-intersections. This results in a unique 3D model of the oviduct, which is grounded in reality. The GPU is used for the processor intensive operations of image processing and particle physics based simulations, significantly reducing the time required to generate a complete model. 相似文献

12.

An exhaustive DNA micro-satellite map of the human genome using high performance computing

Collins JR Stephens RM Gold B Long B Dean M Burt SK 《Genomics》2003,82(1):10-19

The current pace of the generation of sequence data requires the development of software tools that can rapidly provide full annotation of the data. We have developed a new method for rapid sequence comparison using the exact match algorithm without repeat masking. As a demonstration, we have identified all perfect simple tandem repeats (STR) within the draft sequence of the human genome. The STR elements (chromosome, position, length and repeat subunit) have been placed into a relational database. Repeat flanking sequence is also publicly accessible at http://grid.abcc.ncifcrf.gov. To illustrate the utility of this complete set of STR elements, we documented the increased density of potentially polymorphic markers throughout the genome. The new STR markers may be useful in disease association studies because so many STR elements manifest multiallelic polymorphism. Also, because triplet repeat expansions are important for human disease etiology, we identified trinucleotide repeats that exist within exons of known genes. This resulted in a list that includes all 14 genes known to undergo polynucleotide expansion, and 48 additional candidates. Several of these are non-polyglutamine triplet repeats. Other examinations of the STR database demonstrated repeats spanning splice junctions and identified SNPs within repeat elements. 相似文献

13.

Autonomic power and performance management for computing systems

Bithika Khargharia Salim Hariri Mazin S. Yousif 《Cluster computing》2008,11(2):167-181

With the increased complexity of platforms, the growing demand of applications and data centers’ servers sprawl, power consumption is reaching unsustainable limits. The need to improved power management is becoming essential for many reasons including reduced power consumption & cooling, improved density, reliability & compliance with environmental standards. This paper presents a theoretical framework and methodology for autonomic power and performance management in e-business data centers. We optimize for power and performance (performance-per-watt) at each level of the hierarchy while maintaining scalability. We adopt mathematically-rigorous optimization approach to minimize power while meeting performance constraints. Our experimental results show around 72% savings in power while maintaining performance as compared to static power management techniques and 69.8% additional savings with both global and local optimizations.

Mazin S. YousifEmail:

相似文献

14.

High performance computing environment for multidimensional image analysis

Rao AR Cecchi GA Magnasco M 《BMC cell biology》2007,8(Z1):S9

Background

The processing of images acquired through microscopy is a challenging task due to the large size of datasets (several gigabytes) and the fast turnaround time required. If the throughput of the image processing stage is significantly increased, it can have a major impact in microscopy applications.

Results

We present a high performance computing (HPC) solution to this problem. This involves decomposing the spatial 3D image into segments that are assigned to unique processors, and matched to the 3D torus architecture of the IBM Blue Gene/L machine. Communication between segments is restricted to the nearest neighbors. When running on a 2 Ghz Intel CPU, the task of 3D median filtering on a typical 256 megabyte dataset takes two and a half hours, whereas by using 1024 nodes of Blue Gene, this task can be performed in 18.8 seconds, a 478× speedup.

Conclusion

Our parallel solution dramatically improves the performance of image processing, feature extraction and 3D reconstruction tasks. This increased throughput permits biologists to conduct unprecedented large scale experiments with massive datasets.

相似文献

15.

Directed evolution strategies for improved enzymatic performance 总被引：1，自引：0，他引：1

Edward?G?Hibbert Paul?A?Dalby Email author 《Microbial cell factories》2005,4(1):29

The engineering of enzymes with altered activity, specificity and stability, using directed evolution techniques that mimic evolution on a laboratory timescale, is now well established. However, the general acceptance of these methods as a route to new biocatalysts for organic synthesis requires further improvement of the methods for both ease-of-use and also for obtaining more significant changes in enzyme properties than is currently possible. Recent advances in library design, and methods of random mutagenesis, combined with new screening and selection tools, continue to push forward the potential of directed evolution. For example, protein engineers are now beginning to apply the vast body of knowledge and understanding of protein structure and function, to the design of focussed directed evolution libraries, with striking results compared to the previously favoured random mutagenesis and recombination of entire genes. Significant progress in computational design techniques which mimic the experimental process of library screening is also now enabling searches of much greater regions of sequence-space for those catalytic reactions that are broadly understood and, therefore, possible to model. 相似文献

16.

High performance computing and medical research

A J Cuticchia 《CMAJ》2000,162(8):1148-1149

相似文献

17.

Data handling strategies for high throughput pyrosequencers

Trombetti GA Bonnal RJ Rizzi E De Bellis G Milanesi L 《BMC bioinformatics》2007,8(Z1):S22

相似文献

18.

VM auto-scaling methods for high throughput computing on hybrid infrastructure

Jieun Choi Younsun Ahn Seoyoung Kim Yoonhee Kim Jaeyoung Choi 《Cluster computing》2015,18(3):1063-1073

相似文献

19.

New strategies and approaches for engineering biosynthetic gene clusters of microbial natural products

《Biotechnology advances》2017,35(8):936-949

With the rapidly growing number of sequenced microbial (meta)genomes, enormous cryptic natural product (NP) biosynthetic gene clusters (BGCs) have been identified, which are regarded as a rich reservoir for novel drug discovery. A series of powerful tools for engineering BGCs has accelerated the discovery and development of pharmaceutically active NPs. Here, we describe recent advances in the strategies for BGCs manipulation, which are driven by emerging technologies, including efficient DNA recombination systems, versatile CRISPR/Cas9 genome editing tools and diverse DNA assembly methods. We further discuss how these approaches could be used for genome mining studies and industrial strain improvement. 相似文献

20.

Genome mining for radical SAM protein determinants reveals multiple sactibiotic-like gene clusters

Murphy K O'Sullivan O Rea MC Cotter PD Ross RP Hill C 《PloS one》2011,6(7):e20852

Thuricin CD is a two-component bacteriocin produced by Bacillus thuringiensis that kills a wide range of clinically significant Clostridium difficile. This bacteriocin has recently been characterized and consists of two distinct peptides, Trnβ and Trnα, which both possess 3 intrapeptide sulphur to α-carbon bridges and act synergistically. Indeed, thuricin CD and subtilosin A are the only antimicrobials known to possess these unusual structures and are known as the sactibiotics (sulplur to alpha carbon-containing antibiotics). Analysis of the thuricin CD-associated gene cluster revealed the presence of genes encoding two highly unusual SAM proteins (TrnC and TrnD) which are proposed to be responsible for these unusual post-translational modifications. On the basis of the frequently high conservation among enzymes responsible for the post-translational modification of specific antimicrobials, we performed an in silico screen for novel thuricin CD-like gene clusters using the TrnC and TrnD radical SAM proteins as driver sequences to perform an initial homology search against the complete non-redundant database. Fifteen novel thuricin CD-like gene clusters were identified, based on the presence of TrnC and TrnD homologues in the context of neighbouring genes encoding potential bacteriocin structural peptides. Moreover, metagenomic analysis revealed that TrnC or TrnD homologs are present in a variety of metagenomic environments, suggesting a widespread distribution of thuricin-like operons in a variety of environments. In-silico analysis of radical SAM proteins is sufficient to identify novel putative sactibiotic clusters. 相似文献