Similar Documents
20 similar documents retrieved.
1.
PC clusters have recently been studied intensively as next-generation large-scale parallel computers, and ATM technology is a strong candidate to become the de facto standard for high-speed communication networks. An ATM-connected PC cluster is therefore a promising platform, from a cost/performance point of view, for future high-performance computing. Data-intensive applications, such as data mining and ad hoc query processing in databases, are considered as important for massively parallel processors as conventional scientific calculations, so investigating their feasibility on an ATM-connected PC cluster is worthwhile. This paper reports on an ATM-connected cluster of 100 PCs and evaluates the characteristics of a transport-layer protocol for it. Point-to-point communication performance is measured and discussed as the TCP window size parameter is varied. Parallel data mining is implemented and evaluated on the cluster. Retransmission caused by cell loss at the ATM switch is analyzed, and retransmission parameters suitable for parallel processing on the large-scale PC cluster are identified. The default TCP protocol cannot deliver good performance, since many collisions occur during all-to-all multicasting on the large-scale cluster. With the proposed TCP parameter optimization, the performance of parallel data mining on the 100 PCs improves.
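The tuning knob at the center of this abstract, the TCP window, is usually controlled from user space through the socket send/receive buffer sizes. Below is a minimal sketch of how such a parameter sweep could be run in a point-to-point throughput test; it is not the paper's benchmark, and the endpoint, payload size, and window values are illustrative assumptions.

```python
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 9999          # illustrative endpoint
PAYLOAD, ROUNDS = 64 * 1024, 200

def sink():
    """Accept one connection and drain everything sent to it."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    while conn.recv(1 << 16):
        pass
    conn.close()
    srv.close()

def measure(window_bytes):
    """Point-to-point send throughput for one socket-buffer ('window') size."""
    t = threading.Thread(target=sink, daemon=True)
    t.start()
    time.sleep(0.1)                      # let the sink start listening
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_SNDBUF/SO_RCVBUF bound the effective TCP window from user space;
    # they must be set before connect() to influence window negotiation.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, window_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, window_bytes)
    s.connect((HOST, PORT))
    data = b"x" * PAYLOAD
    start = time.perf_counter()
    for _ in range(ROUNDS):
        s.sendall(data)
    s.close()
    t.join()
    return ROUNDS * PAYLOAD / (time.perf_counter() - start) / 1e6

if __name__ == "__main__":
    for win in (16 * 1024, 64 * 1024, 256 * 1024):
        print(f"window {win:>7} B: {measure(win):8.1f} MB/s")
```

On a loopback interface the window size matters little; the sweep becomes informative when sender and sink run on separate hosts across the network under test.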

2.
Markov clustering (MCL) is becoming a key algorithm in bioinformatics for finding clusters in networks. However, with the vast and growing amount of data on biological networks, performance and scalability have become critical limiting factors in applications. Meanwhile, GPU computing, which uses CUDA to implement a massively parallel computing environment on the GPU card, has become a powerful, efficient, and low-cost way to obtain substantial performance gains over CPU approaches. The use of on-chip GPU memory keeps latency low, circumventing a major issue in other parallel computing environments such as MPI. We introduce a fast Markov clustering algorithm implemented in CUDA (CUDA-MCL) that performs the parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations at the heart of MCL. We use the ELLPACK-R sparse format to enable effective, fine-grained massively parallel processing that copes with the sparse nature of the interaction-network data sets found in bioinformatics applications. The results show that CUDA-MCL is significantly faster than the original MCL running on a CPU. Large-scale parallel computation on off-the-shelf desktop machines, previously possible only on supercomputing architectures, can thus significantly change the way bioinformaticians and biologists deal with their data.
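The three operations the abstract names as the heart of MCL, expansion (sparse matrix squaring), inflation (elementwise powering), and column renormalization, are compact enough to sketch on a CPU. The following NumPy/SciPy sketch is a plain serial MCL, not the CUDA-MCL code, and the example graph is illustrative:

```python
import numpy as np
from scipy import sparse

def normalize(m):
    """Rescale each column to sum to 1 (column-stochastic matrix)."""
    return sparse.csc_matrix(m.multiply(1.0 / m.sum(axis=0)))

def mcl(adj, inflation=2.0, max_iters=100, tol=1e-6):
    """Plain Markov clustering: expansion, inflation, renormalization."""
    m = sparse.csc_matrix(adj, dtype=float)
    m = normalize(m + sparse.eye(m.shape[0]))    # self-loops stabilize MCL
    for _ in range(max_iters):
        prev = m
        m = normalize((m @ m).power(inflation))  # expand, then inflate
        if abs(m - prev).max() < tol:            # converged?
            break
    m = m.tocsr()
    # Attractor rows (nonzero diagonal) span the clusters.
    return {tuple(int(j) for j in m[i].indices)
            for i in range(m.shape[0]) if m[i, i] > tol}

if __name__ == "__main__":
    a = np.array([[0, 1, 1, 0, 0],
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 0, 0],
                  [0, 0, 0, 0, 1],
                  [0, 0, 0, 1, 0]], float)
    print(mcl(a))   # expected: {(0, 1, 2), (3, 4)}
```

The expansion step dominates the cost, which is why CUDA-MCL's parallel sparse matrix-matrix product in ELLPACK-R format is the decisive optimization.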

3.
This paper surveys the computational strategies followed to parallelise the most widely used software in the bioinformatics arena. The algorithms studied are computationally expensive, and their computational patterns range from regular, such as database-searching applications, to very irregularly structured (phylogenetic trees). Fine- and coarse-grained parallel strategies are discussed for these very diverse sets of applications. The overview outlines computational issues related to parallelism, physical machine models, parallel programming approaches and scheduling strategies for a broad range of computer architectures, dealing in particular with shared-, distributed- and shared/distributed-memory architectures.

4.
Biological applications, from genomics to ecology, deal with graphs that represent the structure of interactions. Analyzing such data requires searching for subgraphs in collections of graphs, a computationally expensive task. Even though multicore architectures, from commodity computers to more advanced symmetric multiprocessing (SMP) systems, offer scalable computing power, currently published software implementations for indexing and graph matching are fundamentally sequential. As a consequence, they (i) do not fully exploit the available parallel computing power and (ii) do not scale with the size of the graphs in the database. We present GRAPES, software for parallel searching of databases of large biological graphs. GRAPES implements a parallel version of well-established graph searching algorithms and introduces new strategies that naturally lead to a faster parallel searching system, especially for large graphs. GRAPES decomposes graphs into subcomponents that can be efficiently searched in parallel. We show the performance of GRAPES on representative biological datasets containing antiviral chemical compounds, DNA, RNA, proteins, protein contact maps and protein interaction networks.
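GRAPES itself is a compiled tool, but the core idea the abstract describes, matching a query subgraph against many graphs (or decomposed subcomponents) in parallel, can be sketched with NetworkX and a process pool. The query, database contents, and pool size below are illustrative assumptions:

```python
import multiprocessing as mp

import networkx as nx
from networkx.algorithms import isomorphism

QUERY = nx.cycle_graph(3)  # illustrative query pattern: a triangle

def matches(graph):
    """True if QUERY occurs as an (induced) subgraph of `graph`."""
    gm = isomorphism.GraphMatcher(graph, QUERY)
    return gm.subgraph_is_isomorphic()

def parallel_search(database, processes=4):
    """Check every database graph against QUERY, one graph per task."""
    with mp.Pool(processes) as pool:
        hits = pool.map(matches, database)
    return [i for i, hit in enumerate(hits) if hit]

if __name__ == "__main__":
    db = [nx.path_graph(5), nx.complete_graph(4), nx.star_graph(6)]
    print(parallel_search(db))   # only complete_graph(4) has a triangle: [1]
```

A process pool rather than threads is used because subgraph isomorphism is CPU-bound and CPython's GIL would serialize threaded matchers; this mirrors, at toy scale, the per-component parallelism the paper exploits on SMP machines.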

5.
With the increasing diversity of parallel architectures and the growing development time of parallel applications, performance portability has become one of the major considerations in designing the next generation of parallel program execution models, APIs, and runtime system software. This paper analyzes both the code portability and the performance portability of parallel programs under fine-grained multi-threaded execution and architecture models. We concentrate on one particular event-driven fine-grained multi-threaded execution model, EARTH, and discuss several design decisions in the EARTH model and runtime system that contribute to the performance portability of parallel applications; we believe these are important issues for future high-end computing system software design. Four representative benchmarks were run on several parallel architectures, including two clusters listed in the 23rd TOP500 list of supercomputers. The results demonstrate that EARTH-based programs achieve robust performance portability across the selected hardware platforms without any code modification or tuning.

6.
Many bioinformatics applications that analyse large volumes of high-dimensional data pose complex problems requiring metaheuristic approaches with several kinds of implicit parallelism. For example, while functional parallelism can accelerate evolutionary algorithms, the fitness evaluation of the population may involve cost functions with data parallelism. Heterogeneous parallel architectures, combining central processing units (CPUs) with multiple superscalar cores and accelerators such as graphics processing units (GPUs), can therefore be very useful. This paper exploits such CPU-GPU heterogeneous architectures to accelerate electroencephalogram classification and feature-selection problems through evolutionary multi-objective optimization, in the context of brain-computer interface tasks. We use the OpenCL framework to develop parallel master-worker codes implementing an evolutionary multi-objective feature-selection procedure in which the individuals of the population are dynamically distributed among the available CPU and GPU cores.
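The paper's OpenCL implementation is beyond an abstract, but the shape of the computation, a master distributing per-individual fitness evaluations over a pool of workers and collecting two objectives (classification error and feature count) per individual, can be sketched as follows. The data, the stand-in classifier, and the pool size are illustrative assumptions, not the authors' pipeline:

```python
import multiprocessing as mp

import numpy as np

RNG = np.random.default_rng(0)
X = RNG.normal(size=(200, 32))             # illustrative EEG feature matrix
Y = (X[:, 0] + X[:, 3] > 0).astype(int)    # illustrative class labels

def fitness(mask):
    """Two objectives for one individual: (classification error, number of
    selected features). A real system would train a proper classifier."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return (1.0, 0)
    score = X[:, idx].sum(axis=1)          # trivial stand-in classifier
    pred = (score > score.mean()).astype(int)
    return (float(np.mean(pred != Y)), int(idx.size))

def evaluate_population(population, workers=4):
    """Master: farm out one fitness evaluation per individual."""
    with mp.Pool(workers) as pool:
        return pool.map(fitness, population)

if __name__ == "__main__":
    pop = [RNG.integers(0, 2, size=32) for _ in range(20)]  # bit-mask genomes
    for err, nfeat in evaluate_population(pop)[:5]:
        print(f"error={err:.2f}  features={nfeat}")
```

In the paper the worker side is an OpenCL kernel running on whichever CPU or GPU device is free; the master/worker split sketched here is the same, only the devices differ.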

7.
Reverse computation is presented here as an important future direction for fault-tolerant execution on very large cluster platforms for parallel computing. As the scale of parallel jobs increases, traditional checkpointing approaches suffer scalability problems ranging from computational slowdowns to high congestion at the persistent stores for checkpoints. Reverse computation can overcome these problems and is also better suited to parallel computing on newer architectures with smaller, cheaper or more energy-efficient memories and file systems. Initial evidence for the feasibility of reverse computation in large systems is presented with detailed performance data from a particle (ideal gas) simulation scaling to 65,536 processor cores and 950 accelerators (GPUs). Reverse computation is observed to deliver very large gains relative to checkpointing schemes when nodes rely on their host processors and memory to tolerate faults in their accelerators. A comparison between reverse computation and checkpointing using measurements such as cache miss ratios, TLB misses and memory usage indicates that reverse computation is hard to ignore as a future alternative for emerging architectures.
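The contrast with checkpointing can be made concrete with a toy reversible kernel: instead of saving state before each step, the step is written so it can be algebraically undone, and recovery replays the inverse. This is a minimal sketch of the idea, not the paper's ideal-gas code:

```python
def forward(state, dt):
    """One reversible step for 1-D particles: every operation is invertible,
    so no state needs to be saved for rollback."""
    xs, vs = state
    xs = [x + v * dt for x, v in zip(xs, vs)]    # drift
    vs = [-v for v in vs]                        # toy 'collision': flip sign
    return xs, vs

def reverse(state, dt):
    """Exact inverse of forward(): undo each operation in reverse order."""
    xs, vs = state
    vs = [-v for v in vs]                        # un-flip velocities
    xs = [x - v * dt for x, v in zip(xs, vs)]    # un-drift positions
    return xs, vs

if __name__ == "__main__":
    s0 = ([0.0, 1.0, 2.0], [0.5, -0.25, 1.0])
    s = s0
    for _ in range(1000):                        # forward simulation
        s = forward(s, dt=0.01)
    for _ in range(1000):                        # roll back after a 'fault'
        s = reverse(s, dt=0.01)
    drift = max(abs(a - b) for a, b in zip(s[0], s0[0]))
    print(f"max position error after rollback: {drift:.2e}")
```

The rollback costs recomputation time but no memory or I/O, which is exactly the trade the paper measures against checkpoint traffic to persistent storage.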

8.
In this paper, we report on our "Iridis-Pi" cluster, which consists of 64 Raspberry Pi Model B nodes, each equipped with a 700 MHz ARM processor, 256 MB of RAM and a 16 GiB SD card for local storage. The cluster has a number of advantages not shared with conventional data-centre-based clusters, including low total power consumption, easy portability due to its small size and weight, affordability, and passive, ambient cooling. We propose that these attributes make Iridis-Pi ideally suited to educational applications, where it provides a low-cost starting point to inspire and enable students to understand and apply high-performance computing and data handling to complex engineering and scientific challenges. We present benchmarks of both the computational power and the network performance of Iridis-Pi, and argue that such systems should also be considered in specialist application areas where these unique attributes prove advantageous. We believe that the choice of an ARM CPU foreshadows a trend towards the increasing adoption of low-power, non-PC-compatible architectures in high-performance clusters.
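Network benchmarking of the kind reported for Iridis-Pi is commonly done with an MPI ping-pong. The sketch below uses mpi4py (assumed available) and mirrors the style of such a benchmark rather than the authors' exact harness; run it across two nodes with `mpiexec -n 2 python pingpong.py`:

```python
# pingpong.py - run with: mpiexec -n 2 python pingpong.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
REPS = 1000

for size in (1, 1024, 64 * 1024, 1024 * 1024):
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()                       # start both ranks together
    start = MPI.Wtime()
    for _ in range(REPS):
        if rank == 0:
            comm.Send(buf, dest=1)       # buffer-based (fast) send
            comm.Recv(buf, source=1)
        else:
            comm.Recv(buf, source=0)
            comm.Send(buf, dest=0)
    elapsed = MPI.Wtime() - start
    if rank == 0:
        lat = elapsed / REPS / 2         # half of one round trip
        print(f"{size:>8} B: {lat * 1e6:9.1f} us  "
              f"{size / lat / 1e6:9.2f} MB/s")
```

Small messages expose latency, large messages expose bandwidth; on the Pi's 100 Mbit Ethernet the large-message numbers are what bound tightly-coupled parallel workloads.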

9.
The wealth of spatial and spectral information available from last-generation Earth observation instruments has introduced extremely high computational requirements in many applications. Most currently available parallel techniques treat remotely sensed data not as images but as unordered listings of spectral measurements with no spatial arrangement. In thematic classification applications, however, integrating spatial and spectral information can be greatly beneficial. Although such integrated approaches map efficiently onto homogeneous commodity clusters, low-cost heterogeneous networks of computers (HNOCs) have become a standard tool of choice for dealing with the massive amount of image data produced by Earth observation missions. In this paper, we develop a new morphological/neural algorithm for parallel classification of high-dimensional (hyperspectral) remotely sensed image data sets. The algorithm's accuracy and parallel performance are tested on a variety of homogeneous and heterogeneous computing platforms, using two networks of workstations distributed among different locations, and a massively parallel Beowulf cluster at NASA's Goddard Space Flight Center in Maryland.
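The spatial-spectral integration the abstract advocates is often realized by stacking morphological features onto each pixel's spectrum before classification. The following single-scale SciPy sketch shows that idea only; it is not the authors' morphological/neural pipeline, and the cube shape is an illustrative assumption:

```python
import numpy as np
from scipy import ndimage

def spatial_spectral_features(cube, size=3):
    """Append per-band morphological profiles to each pixel's spectrum.

    cube: (rows, cols, bands) hyperspectral image.
    Returns a (rows*cols, 3*bands) matrix: raw spectrum plus one grayscale
    opening and one closing per band, capturing local spatial structure.
    """
    rows, cols, bands = cube.shape
    opened = np.stack([ndimage.grey_opening(cube[:, :, b], size=(size, size))
                       for b in range(bands)], axis=-1)
    closed = np.stack([ndimage.grey_closing(cube[:, :, b], size=(size, size))
                       for b in range(bands)], axis=-1)
    feats = np.concatenate([cube, opened, closed], axis=-1)
    return feats.reshape(rows * cols, -1)

if __name__ == "__main__":
    cube = np.random.rand(64, 64, 10)              # illustrative 10-band image
    print(spatial_spectral_features(cube).shape)   # (4096, 30)
```

Each band's profile is independent of the others, which is what makes the feature-extraction stage easy to partition across the heterogeneous nodes discussed in the paper.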

10.
Workstation clusters are emerging as a general-purpose computing platform for executing workloads comprising parallel and sequential applications. The scalability and flexibility typical of implicit coscheduling strategies make them a very promising solution to the scheduling needs of workstation clusters. In this paper, we present a simulation study that compares, for a variety of workloads (including both parallel and sequential applications) and operating system schedulers, 12 implicit coscheduling strategies in terms of the performance they deliver to applications. Using a detailed simulator, we evaluate the performance of the coscheduling alternatives across a variety of simulation scenarios and identify the set of strategies that delivers the best performance to all the applications composing typical cluster workloads. Moreover, we show that for schedulers providing immediate preemption, the best strategies are also the simplest to implement.

11.
Increases in multi-core processor parallelism and in the flexibility of many-core accelerator processors, such as GPUs, have turned traditional SMP systems into hierarchical, heterogeneous computing environments. Fully exploiting these improvements in parallel system design remains an open problem. Moreover, most current tools for developing parallel applications on hierarchical systems concentrate on a single processor type (e.g., accelerators) and do not coordinate several heterogeneous processors. Here, we show that using all of the heterogeneous computing resources can significantly improve application performance. Our approach, which optimizes applications at run time by efficiently coordinating application task execution on all available processing units, is evaluated in the context of replicated dataflow applications. The proposed techniques were implemented in an integrated run-time system targeting both intra- and inter-node parallelism. Experimental results with a real-world, complex biomedical application show that our approach nearly doubles the performance of the GPU-only implementation on a distributed heterogeneous accelerator cluster.
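One common way to realize this kind of run-time coordination is a demand-driven work queue from which CPU and accelerator workers pull tasks as they become free, so faster devices naturally absorb more of the work. A minimal threads-based sketch follows; the "GPU" is simulated by a shorter per-task cost, and the device names and speeds are illustrative assumptions:

```python
import queue
import threading
import time

tasks = queue.Queue()
done = []                                 # (device, task) completion log

def worker(name, cost_per_task):
    """Pull tasks until the queue drains; faster devices grab more work."""
    while True:
        try:
            item = tasks.get_nowait()
        except queue.Empty:
            return
        time.sleep(cost_per_task)         # stand-in for real computation
        done.append((name, item))         # list.append is thread-safe in CPython

if __name__ == "__main__":
    for i in range(40):
        tasks.put(i)
    devices = [("cpu-core-0", 0.02), ("cpu-core-1", 0.02), ("gpu-0", 0.005)]
    threads = [threading.Thread(target=worker, args=d) for d in devices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    counts = {name: sum(1 for n, _ in done if n == name) for name, _ in devices}
    print(counts)   # the simulated GPU should complete the most tasks
```

The point of the demand-driven design is that no static split between CPU and GPU shares is needed; load balance emerges from the pull protocol, which is essential when device speeds vary per task as in the paper's dataflow stages.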

12.
A nanoliter-scale nucleic acid processor with parallel architecture
The purification of nucleic acids from microbial and mammalian cells is a crucial step in many biological and medical applications. We have developed microfluidic chips for automated nucleic acid purification from small numbers of bacterial or mammalian cells. All processes, such as cell isolation, cell lysis, DNA or mRNA purification, and recovery, were carried out on a single microfluidic chip in nanoliter volumes without any pre- or post-sample treatment. Measurable amounts of mRNA were extracted in an automated fashion from as little as a single mammalian cell and recovered from the chip. These microfluidic chips can process different samples in parallel, illustrating how highly parallel microfluidic architectures can be constructed to perform integrated batch-processing functions for biological and medical applications.

13.
Spatial scan statistics are widely used in many fields. Their performance is influenced by parameters such as the maximum spatial cluster size and can be improved by selecting those parameters using performance measures. Current performance measures are based on the presence of known clusters and are thus inapplicable to data sets without them. In this work, we propose a novel overall performance measure, the maximum clustering set proportion (MCS-P), based on the likelihood of the union of the detected clusters given the dataset. MCS-P was compared with existing performance measures in a simulation study on selecting the maximum spatial cluster size. Results on measures such as sensitivity and misclassification suggest that the spatial scan statistic achieves accurate results in most scenarios when the maximum spatial cluster size is selected using MCS-P. Because previously known clusters are not required, selecting the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.
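The selection strategy can be paraphrased as an outer sweep: run the scan statistic once per candidate maximum cluster size and keep the size whose detected clusters score best under a likelihood-based criterion. The sketch below shows only that outer loop; `run_scan` and `likelihood_of_union` are hypothetical placeholder interfaces, not the published MCS-P definition:

```python
def select_max_cluster_size(data, candidates, run_scan, likelihood_of_union):
    """Sweep candidate maximum spatial cluster sizes.

    run_scan(data, max_size) -> detected clusters     (hypothetical interface)
    likelihood_of_union(data, clusters) -> score      (hypothetical interface)
    Returns the candidate size whose detected clusters score highest.
    """
    best_size, best_score = None, float("-inf")
    for max_size in candidates:
        clusters = run_scan(data, max_size)
        score = likelihood_of_union(data, clusters)
        if score > best_score:
            best_size, best_score = max_size, score
    return best_size

# Usage sketch: maximum sizes are commonly expressed as a fraction of the
# population at risk, e.g.:
# best = select_max_cluster_size(data, [0.05, 0.1, 0.2, 0.3, 0.5],
#                                run_scan, likelihood_of_union)
```

The key property, as the abstract stresses, is that the score depends only on the data and the detected clusters, so the sweep needs no ground-truth clusters.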

14.
A cluster, in contrast to a parallel computer, is a set of separate workstations interconnected by a high-speed network. The performance obtainable on a cluster depends heavily on the performance of the lowest communication layers. In this paper, we address the special case in which the cluster contains multi-processor machines. Shared-memory multiprocessor desktop machines (SMPs) with 2 or 4 processors are becoming very popular and offer a high performance/price ratio. We present a software suite for achieving high-performance communications on a Myrinet-based cluster: BIP, BIP-SMP and MPI-BIP. The suite supports single-processor (Intel PC and Digital Alpha) and multi-processor machines, as well as any combination of the two architectures.

15.
MOTIVATION: A measure of cluster quality is needed to choose potential clusters of genes containing biologically relevant patterns of gene expression. This is strongly desirable when a large number of gene expression profiles must be analyzed and proper clusters of genes need to be identified for further analysis, such as searching for meaningful patterns, identifying gene functions or analyzing gene response. RESULTS: We propose a new cluster quality measure, called stability, with which unsupervised learning of gene expression data can be performed efficiently. The measure takes into account a cluster's stability under partition. We evaluate the method and demonstrate its performance using four independent real gene expression datasets and three simulated datasets, and show that it outperforms other techniques reported in the literature. The method has applications in evaluating clustering validity as well as in identifying stable clusters. AVAILABILITY: Please contact the first author.
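The paper defines its own stability score, but the general recipe, re-cluster perturbed copies of the data and measure how consistently the assignments survive, is easy to sketch. Below, scikit-learn's k-means and the adjusted Rand index stand in for the paper's specific choices (assumptions, not the published method):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def clustering_stability(X, k, rounds=20, frac=0.8, seed=0):
    """Average agreement between clusterings of random subsamples.

    Higher values mean the cluster assignments are reproducible (stable);
    unstable k values score low because subsamples disagree.
    """
    rng = np.random.default_rng(seed)
    full = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    scores = []
    for _ in range(rounds):
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
        sub = KMeans(n_clusters=k, n_init=10).fit_predict(X[idx])
        scores.append(adjusted_rand_score(full[idx], sub))
    return float(np.mean(scores))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Three well-separated synthetic "expression" groups in 5 dimensions.
    X = np.vstack([rng.normal(c, 0.3, size=(50, 5)) for c in (0, 3, 6)])
    for k in (2, 3, 5):
        print(k, round(clustering_stability(X, k), 3))  # k=3 should score best
```

Used this way, stability doubles as a model-selection criterion: sweeping k and keeping the most stable value is a common way to pick the number of gene clusters.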

16.
Situs is a modular, widely used software package for integrating biophysical data across spatial resolution scales. It has been developed over the last decade with a focus on bridging the resolution gap between atomic structures, coarse-grained models, and volumetric data from low-resolution biophysical sources such as electron microscopy, tomography and small-angle scattering. Structural models can be created and refined with various flexible and rigid-body docking strategies. The software consists of multiple stand-alone programs for the format conversion, analysis, visualization, manipulation and assembly of 3D data sets. The programs have been ported to numerous platforms, in both serial and shared-memory parallel architectures, and can be combined in various ways for specific modeling applications. This modular design facilitates updating individual programs and developing novel application workflows. This review provides an overview of the Situs package as it exists today, with an emphasis on the functionality and workflows supported by version 2.5.

17.
A classification study of the branching architecture of desert plants in Minqin
The branching architecture of desert plants was characterized by 16 indices, including branching angles at each order, branch lengths at each order, branch diameter ratios, stepwise and overall bifurcation ratios, and the fractal and box-counting dimensions of the branch system. Clustering by within-group Euclidean distance divided the desert plants into four types according to these architectural indices. The first type comprises 14 desert species, including Zygophyllum xanthoxylum and Caragana frutex; the second comprises 11 species, including C. purdomii and C. roborovskyi; the third comprises 9 species, including Calligonum cancellatum and Melilotus suaveolens; and the fourth comprises 14 species, including Atraphaxis replicta and Fraxinus americana. The different branching-architecture types reflect the long-term strategies by which different desert plants adapt to spatial resources and their environment.
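The study's within-group Euclidean distance clustering corresponds to standard agglomerative clustering on a species-by-index matrix. A SciPy sketch follows; Ward linkage, which merges groups to minimize within-group Euclidean variance, is one common reading of the method, and the random species-by-index matrix is a placeholder for the real measurements:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Illustrative stand-in: 48 species x 16 branching-architecture indices
# (branch angles/lengths per order, diameter ratios, bifurcation ratios,
# fractal and box-counting dimensions), z-scored so no index dominates.
X = rng.normal(size=(48, 16))
X = (X - X.mean(axis=0)) / X.std(axis=0)

Z = linkage(pdist(X, metric="euclidean"), method="ward")  # within-group SSE
labels = fcluster(Z, t=4, criterion="maxclust")           # cut into 4 types
for k in range(1, 5):
    print(f"type {k}: {np.sum(labels == k)} species")
```

Cutting the dendrogram at four groups reproduces the shape of the study's result, a partition of the species into four architectural types.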

18.
We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, implemented on the graphics processing unit for high-performance, scalable, parallel model fitting. GPU-LMFit provides a dramatic speed-up in massive model-fitting analyses, enabling real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit in applications to super-resolution localization microscopy and fluorescence lifetime imaging microscopy.
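GPU-LMFit is CUDA code, but the per-pixel task it parallelizes is the classic Levenberg-Marquardt least-squares fit. A CPU sketch of one such fit with SciPy is shown below, using a 2-D Gaussian spot model of the kind common in localization microscopy; the patch size and parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import least_squares

Y, X = np.mgrid[0:11, 0:11]               # one 11x11 camera pixel patch

def gaussian_spot(p):
    """2-D Gaussian PSF model: amplitude, x0, y0, sigma, background."""
    a, x0, y0, s, b = p
    return a * np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / (2 * s ** 2)) + b

def residuals(p, data):
    """Flattened model-minus-data residual vector for the optimizer."""
    return (gaussian_spot(p) - data).ravel()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = np.array([100.0, 5.3, 4.7, 1.2, 10.0])
    data = gaussian_spot(truth) + rng.normal(0, 2.0, size=X.shape)
    fit = least_squares(residuals, x0=[80, 5, 5, 2, 0], args=(data,),
                        method="lm")       # Levenberg-Marquardt
    print(np.round(fit.x, 2))              # should be close to `truth`
```

An image contains millions of such small, independent fits, one per pixel or per detected spot, which is why mapping one fit per GPU thread block pays off so dramatically.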

19.
The Pipelining Communications Middleware (PCM) approach provides a flexible, simple, high-performance mechanism for connecting parallel programs running on high-performance computers or clusters, enabling them to communicate and coordinate with one another to address larger problems than a single program can solve. The motivation behind PCM grew out of the practice of using files as an intermediate transfer stage between processing by different programs. Our approach supersedes this practice by using streaming data-set transfers as an "online" communication channel between simultaneously active parallel programs. PCM thus addresses sending data from one parallel program to another without exposing details such as the number of nodes allocated to each program or specific node identifiers. This paper outlines and analyzes the proposed computation and communication model, discusses the challenges of the approach, and describes current PCM implementations. The approach achieves scalability, transparency, coordination, synchronization, flow control and efficient programming. We evaluated the performance of PCM with data-parallel applications; the results show that PCM achieves nearly ideal throughput that scales linearly with the speed of the underlying network medium, and performs well with both small and large data transfers. Furthermore, our experiments show that the network infrastructure plays the most significant role in PCM performance.
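At its simplest, the file-replacement idea behind PCM, streaming records between two concurrently running programs instead of writing an intermediate file, reduces to a socket channel carrying length-prefixed chunks. The toy single-stream sketch below illustrates only that kernel; PCM itself manages many parallel streams and hides node layout, and the endpoint and chunk sizes here are illustrative:

```python
import socket
import struct
import threading
import time

HOST, PORT = "127.0.0.1", 9900             # illustrative endpoint

def producer(chunks):
    """Stream length-prefixed chunks, as a producing program would."""
    s = socket.create_connection((HOST, PORT))
    for chunk in chunks:
        s.sendall(struct.pack("!I", len(chunk)) + chunk)
    s.sendall(struct.pack("!I", 0))         # zero length marks end of stream
    s.close()

def consumer():
    """Consume chunks as they arrive; no intermediate file is written."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    while True:
        (n,) = struct.unpack("!I", conn.recv(4, socket.MSG_WAITALL))
        if n == 0:
            break
        data = conn.recv(n, socket.MSG_WAITALL)
        print(f"consumed {len(data)} bytes")
    conn.close()
    srv.close()

if __name__ == "__main__":
    t = threading.Thread(target=consumer)
    t.start()
    time.sleep(0.1)                         # let the consumer start listening
    producer([b"a" * 1000, b"b" * 5000, b"c" * 250])
    t.join()
```

Because the consumer processes chunks while the producer is still generating them, the two programs overlap in time, which is the pipelining from which PCM takes its name.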

20.
The Gulf of Paria is a semi-enclosed shallow basin facing increasing coastal development along Trinidad's west coast. Sediments act as a host for trace-metal pollutants from overlying waters, so determining their metal content is critical for evaluating and detecting sources of marine pollution. This paper presents a Geographic Information System (GIS) analysis of a geochemical assessment of trace metals in coastal sediments of the Gulf of Paria. The GIS approach facilitates interpretation of the spatial relationships among key environmental processes. Its development involves integrating spatial and attribute data on bathymetry, current systems, topography, rivers, land use/land cover and coastal sediments. Spatial interpolation and retrieval operations are employed to analyze the total concentrations of aluminum, copper and lead in the sediments, and in the clay-enriched sediments, to determine whether they relate to sediment type or are affected by discharge from anthropogenic sources. Spatial distribution models of element concentrations indicate contamination plumes from possible anthropogenic sources, such as rivers entering the Gulf of Paria, and reveal potential hot spots and dispersion patterns. A direct spatial correlation between clay-enriched sediments and high concentrations of aluminum and lead is detected; however, regions of high copper and lead concentrations indicate a relationship to anthropogenic sources. The effectiveness of GIS for the visualization, spatial query and overlay of geochemical analyses is demonstrated.
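The spatial interpolation step in such a workflow is commonly implemented as inverse-distance weighting (IDW) over the sampled sediment stations. A NumPy sketch of IDW follows; the station coordinates and concentrations are illustrative, not the Gulf of Paria data:

```python
import numpy as np

def idw(sample_xy, values, grid_xy, power=2.0, eps=1e-12):
    """Inverse-distance-weighted interpolation onto grid points.

    Each grid value is a weighted mean of all samples, with weights
    1/d^power, so nearer stations dominate the estimate.
    """
    d = np.linalg.norm(grid_xy[:, None, :] - sample_xy[None, :, :], axis=2)
    w = 1.0 / (d ** power + eps)           # eps guards exact-hit zero distance
    return (w * values).sum(axis=1) / w.sum(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stations = rng.uniform(0, 10, size=(25, 2))    # sampled sediment sites
    lead = rng.uniform(5, 80, size=25)             # e.g. Pb concentrations
    gx, gy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    surface = idw(stations, lead, grid).reshape(50, 50)
    print(surface.shape, surface.min().round(1), surface.max().round(1))
```

The resulting raster is what a GIS then overlays with rivers and land-use layers to flag plumes and hot spots like those described in the paper.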
