期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Parallel high-dimensional multi-objective feature selection for EEG classification with dynamic workload balancing on CPU–GPU architectures

Juan?José?Escobar Email author Julio?Ortega Jesús?González Miguel?Damas Antonio?F.?Díaz 《Cluster computing》2017,20(3):1881-1897

Many bioinformatics applications that analyse large volumes of high-dimensional data comprise complex problems requiring metaheuristics approaches with different types of implicit parallelism. For example, although functional parallelism would be used to accelerate evolutionary algorithms, the fitness evaluation of the population could imply the computation of cost functions with data parallelism. This way, heterogeneous parallel architectures, including central processing unit (CPU) microprocessors with multiple superscalar cores and accelerators such as graphics processing units (GPUs) could be very useful. This paper aims to take advantage of such CPU–GPU heterogeneous architectures to accelerate electroencephalogram classification and feature selection problems by evolutionary multi-objective optimization, in the context of brain computing interface tasks. In this paper, we have used the OpenCL framework to develop parallel master-worker codes implementing an evolutionary multi-objective feature selection procedure in which the individuals of the population are dynamically distributed among the available CPU and GPU cores. 相似文献

2.

Architecture and applications for an All-FPGA parallel computer

Yamuna Rajasekhar Ron Sass 《Cluster computing》2014,17(2):315-325

The Reconfigurable Computing Cluster (RCC) project has been investigating unconventional architectures for high end computing using a cluster of FPGA devices connected by a high-speed, custom network. Most applications use the FPGAs to realize an embedded System-on-a-Chip (SoC) design augmented with application-specific accelerators to form a message-passing parallel computer. Other applications take a single accelerator core and tessellate the core across all of the devices, treating them like a large virtual FPGA. The experimental hardware has also been used for basic computer research by emulating novel architectures. This article discusses the genesis of the over-arching project, summarizes results of individual investigations that have been completed, and how this approach may prove useful in the investigation of future Exascale systems. 相似文献

3.

Geographical information system parallelization for spatial big data processing: a review 总被引：1，自引：0，他引：1

Lingjun Zhao Lajiao Chen Rajiv Ranjan Kim-Kwang Raymond Choo Jijun He 《Cluster computing》2016,19(1):139-152

With the increasing interest in large-scale, high-resolution and real-time geographic information system (GIS) applications and spatial big data processing, traditional GIS is not efficient enough to handle the required loads due to limited computational capabilities.Various attempts have been made to adopt high performance computation techniques from different applications, such as designs of advanced architectures, strategies of data partition and direct parallelization method of spatial analysis algorithm, to address such challenges. This paper surveys the current state of parallel GIS with respect to parallel GIS architectures, parallel processing strategies, and relevant topics. We present the general evolution of the GIS architecture which includes main two parallel GIS architectures based on high performance computing cluster and Hadoop cluster. Then we summarize the current spatial data partition strategies, key methods to realize parallel GIS in the view of data decomposition and progress of the special parallel GIS algorithms. We use the parallel processing of GRASS as a case study. We also identify key problems and future potential research directions of parallel GIS. 相似文献

4.

Fast parallel Markov clustering in bioinformatics using massively parallel computing on GPU with CUDA and ELLPACK-R sparse format

Bustamam A Burrage K Hamilton NA 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(3):679-692

Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However,with increasing vast amount of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses CUDA tool for implementing a massively parallel computing environment in the GPU card, is becoming a very powerful, efficient, and low-cost option to achieve substantial performance gains over CPU approaches. The use of on-chip memory on the GPU is efficiently lowering the latency time, thus, circumventing a major issue in other parallel computing environments, such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We utilized ELLPACK-R sparse format to allow the effective and fine-grain massively parallel processing to cope with the sparse nature of interaction networks data sets in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on CPU. Thus, large-scale parallel computation on off-the-shelf desktop-machines, that were previously only possible on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data. 相似文献

5.

FPNA: interaction between FPGA and neural computation

Girau B 《International journal of neural systems》2000,10(3):243-259

Neural networks are usually considered as naturally parallel computing models. But the number of operators and the complex connection graph of standard neural models can not be directly handled by digital hardware devices. More particularly, several works show that programmable digital hardware is a real opportunity for flexible hardware implementations of neural networks. And yet many area and topology problems arise when standard neural models are implemented onto programmable circuits such as FPGAs, so that the fast FPGA technology improvements can not be fully exploited. Therefore neural network hardware implementations need to reconcile simple hardware topologies with complex neural architectures. The theoretical and practical framework developed, allows this combination thanks to some principles of configurable hardware that are applied to neural computation: Field Programmable Neural Arrays (FPNA) lead to powerful neural architectures that are easy to map onto FPGAs, thanks to a simplified topology and an original data exchange scheme. This paper shows how FPGAs have led to the definition of the FPNA computation paradigm. Then it shows how FPNAs contribute to current and future FPGA-based neural implementations by solving the general problems that are raised by the implementation of complex neural networks onto FPGAs. 相似文献

6.

GPU-FS-kNN: A Software Tool for Fast and Scalable kNN Computation Using GPUs

AS Arefin C Riveros R Berretta P Moscato 《PloS one》2012,7(8):e44000

Background

The analysis of biological networks has become a major challenge due to the recent development of high-throughput techniques that are rapidly producing very large data sets. The exploding volumes of biological data are craving for extreme computational power and special computing facilities (i.e. super-computers). An inexpensive solution, such as General Purpose computation based on Graphics Processing Units (GPGPU), can be adapted to tackle this challenge, but the limitation of the device internal memory can pose a new problem of scalability. An efficient data and computational parallelism with partitioning is required to provide a fast and scalable solution to this problem.

Results

We propose an efficient parallel formulation of the k-Nearest Neighbour (kNN) search problem, which is a popular method for classifying objects in several fields of research, such as pattern recognition, machine learning and bioinformatics. Being very simple and straightforward, the performance of the kNN search degrades dramatically for large data sets, since the task is computationally intensive. The proposed approach is not only fast but also scalable to large-scale instances. Based on our approach, we implemented a software tool GPU-FS-kNN (GPU-based Fast and Scalable k-Nearest Neighbour) for CUDA enabled GPUs. The basic approach is simple and adaptable to other available GPU architectures. We observed speed-ups of 50–60 times compared with CPU implementation on a well-known breast microarray study and its associated data sets.

Conclusion

Our GPU-based Fast and Scalable k-Nearest Neighbour search technique (GPU-FS-kNN) provides a significant performance improvement for nearest neighbour computation in large-scale networks. Source code and the software tool is available under GNU Public License (GPL) at https://sourceforge.net/p/gpufsknn/. 相似文献

7.

Cactus Tools for Grid Applications 总被引：3，自引：0，他引：3

Gabrielle Allen Werner Benger Thomas Dramlitsch Tom Goodale Hans-Christian Hege Gerd Lanfermann André Merzky Thomas Radke Edward Seidel John Shalf 《Cluster computing》2001,4(3):179-188

Cactus is an open source problem solving environment designed for scientists and engineers. Its modular structure facilitates parallel computation across different architectures and collaborative code development between different groups. The Cactus Code originated in the academic research community, where it has been developed and used over many years by a large international collaboration of physicists and computational scientists. We discuss here how the intensive computing requirements of physics applications now using the Cactus Code encourage the use of distributed and metacomputing, and detail how its design makes it an ideal application test-bed for Grid computing. We describe the development of tools, and the experiments which have already been performed in a Grid environment with Cactus, including distributed simulations, remote monitoring and steering, and data handling and visualization. Finally, we discuss how Grid portals, such as those already developed for Cactus, will open the door to global computing resources for scientific users. 相似文献

8.

Harnessing parallelism in multicore clusters with the All-Pairs,Wavefront, and Makeflow abstractions

Li Yu Christopher Moretti Andrew Thrasher Scott Emrich Kenneth Judd Douglas Thain 《Cluster computing》2010,13(3):243-256

Both distributed systems and multicore systems are difficult programming environments. Although the expert programmer may be able to carefully tune these systems to achieve high performance, the non-expert may struggle. We argue that high level abstractions are an effective way of making parallel computing accessible to the non-expert. An abstraction is a regularly structured framework into which a user may plug in simple sequential programs to create very large parallel programs. By virtue of a regular structure and declarative specification, abstractions may be materialized on distributed, multicore, and distributed multicore systems with robust performance across a wide range of problem sizes. In previous work, we presented the All-Pairs abstraction for computing on distributed systems of single CPUs. In this paper, we extend All-Pairs to multicore systems, and introduce the Wavefront and Makeflow abstractions, which represent a number of problems in economics and bioinformatics. We demonstrate good scaling of both abstractions up to 32 cores on one machine and hundreds of cores in a distributed system. 相似文献

9.

A performance comparison of data and memory allocation strategies for sequence aligners on NUMA architectures

Josefina Lenis Miquel Angel Senar 《Cluster computing》2017,20(3):1909-1924

Over the last several years, many sequence alignment tools have appeared and become popular for the fast evolution of next generation sequencing technologies. Obviously, researchers that use such tools are interested in getting maximum performance when they execute them in modern infrastructures. Today’s NUMA (Non-uniform memory access) architectures present major challenges in getting such applications to achieve good scalability as more processors/cores are used. The memory system in NUMA systems shows a high complexity and may be the main cause for the loss of an application’s performance. The existence of several memory banks in NUMA systems implies a logical increase in latency associated with the accesses of a given processor to a remote bank. This phenomenon is usually attenuated by the application of strategies that tend to increase the locality of memory accesses. However, NUMA systems may also suffer from contention problems that can occur when concurrent accesses are concentrated on a reduced number of banks. Sequence alignment tools use large data structures to contain reference genomes to which all reads are aligned. Therefore, these tools are very sensitive to performance problems related to the memory system. The main goal of this study is to explore the trade-offs between data locality and data dispersion in NUMA systems. We have performed experiments with several popular sequence alignment tools on two widely available NUMA systems to assess the performance of different memory allocation policies and data partitioning strategies. We find that there is not one method that is best in all cases. However, we conclude that memory interleaving is the memory allocation strategy that provides the best performance when a large number of processors and memory banks are used. In the case of data partitioning, the best results are usually obtained when the number of partitions used is greater, sometimes combined with an interleave policy. 相似文献

10.

Empirical performance evaluation of schedulers for cluster of workstations

Kalim Qureshi Syed Munir Hussain Shah Paul Manuel 《Cluster computing》2011,14(2):101-113

Cluster computing is receiving exponential popularity as a choice for high performance computing. This is mainly due to its effective cost performance ratio. Resource management systems (RMS) are the key component to manage the resources of clusters efficiently and have a very vital role in the performance of distributed parallel systems especially a job scheduling module. In this paper, we have empirically evaluated four resource management systems (SGE, TORQUE, and MAUI Scheduler and SLURM) with special focus on job scheduler component. These schedulers have been evaluated on a more comprehensive set of metrics such as throughput, CPU, memory and network utilization. Experiments were carried out on three different size testbeds with a range of scheduler configurations such as FCFS, Backfilling, Fair share and SJF scheduling techniques. 相似文献

11.

A Software Architecture for Multi-Cellular System Simulations on Graphics Processing Units

Anne Jeannin-Girardon Pascal Ballet Vincent Rodin 《Acta biotheoretica》2013,61(3):317-327

The first aim of simulation in virtual environment is to help biologists to have a better understanding of the simulated system. The cost of such simulation is significantly reduced compared to that of in vivo simulation. However, the inherent complexity of biological system makes it hard to simulate these systems on non-parallel architectures: models might be made of sub-models and take several scales into account; the number of simulated entities may be quite large. Today, graphics cards are used for general purpose computing which has been made easier thanks to frameworks like CUDA or OpenCL. Parallelization of models may however not be easy: parallel computer programing skills are often required; several hardware architectures may be used to execute models. In this paper, we present the software architecture we built in order to implement various models able to simulate multi-cellular system. This architecture is modular and it implements data structures adapted for graphics processing units architectures. It allows efficient simulation of biological mechanisms. 相似文献

12.

On the parallelisation of bioinformatics applications

Trelles O 《Briefings in bioinformatics》2001,2(2):181-194

This paper surveys the computational strategies followed to parallelise the most used software in the bioinformatics arena. The studied algorithms are computationally expensive and their computational patterns range from regular, such as database-searching applications, to very irregularly structured patterns (phylogenetic trees). Fine- and coarse-grained parallel strategies are discussed for these very diverse sets of applications. This overview outlines computational issues related to parallelism, physical machine models, parallel programming approaches and scheduling strategies for a broad range of computer architectures. In particular, it deals with shared, distributed and shared/distributed memory architectures. 相似文献

13.

Finding elementary flux modes in metabolic networks based on flux balance analysis and flux coupling analysis: application to the analysis of Escherichia coli metabolism

Shayan Tabe-Bordbar Sayed-Amir Marashi 《Biotechnology letters》2013,35(12):2039-2044

Elementary modes (EMs) are steady-state metabolic flux vectors with minimal set of active reactions. Each EM corresponds to a metabolic pathway. Therefore, studying EMs is helpful for analyzing the production of biotechnologically important metabolites. However, memory requirements for computing EMs may hamper their applicability as, in most genome-scale metabolic models, no EM can be computed due to running out of memory. In this study, we present a method for computing randomly sampled EMs. In this approach, a network reduction algorithm is used for EM computation, which is based on flux balance-based methods. We show that this approach can be used to recover the EMs in the medium- and genome-scale metabolic network models, while the EMs are sampled in an unbiased way. The applicability of such results is shown by computing “estimated” control-effective flux values in Escherichia coli metabolic network. 相似文献

14.

Reduced order multiport parallel and multidirectional neural associative memories

Abdul Aziz Bhatti 《Biological cybernetics》2009,100(5):395-407

This paper proposes multiport parallel and multidirectional intraconnected associative memories of outer product type with reduced interconnections. Some new reduced order memory architectures such as k-directional and k-port parallel memories are suggested. These architectures are, also, very suitable for implementation of spatio-temporal sequences and multiassociative memories. It is shown that in the proposed memory architectures, a substational reduction in interconnections is achieved if the actual length of original N-bit long vectors is subdivided into k sublengths. Using these sublengths, submemory matrices, T _s or W _s, are computed, which are then intraconnected to form k-port parallel or k-directional memories. The subdivisions of N-bit long vectors into k sublengths save of interconnections. It is shown, by means of an example, that more than 80% reduction in interconnections is achieved. Minimum limit in bits on k as well as maximum limit on subdivisions in k is determined. The topologies of reduced interconnectivity developed in this paper are symmetric in structure and can be used to scale up to larger systems. The underlying principal of construction, storage and retrieval processes of such associative memories has been analyzed. The effect of complexity of different levels of reduced interconnectivity on the quality of retrieval, signal to noise ratio, and storage capacity has been investigated. The model possesses analogies to biological neural structures and digital parallel port memories commonly used in parallel and multiprocessing systems. 相似文献

15.

Integrating Biosystem Models Using Waveform Relaxation

Linzhong Li RobertM Seymour Stephen Baigent 《EURASIP Journal on Bioinformatics and Systems Biology》2008,2008(1):308623

Modelling in systems biology often involves the integration of component models into larger composite models. How to do this systematically and efficiently is a significant challenge: coupling of components can be unidirectional or bidirectional, and of variable strengths. We adapt the waveform relaxation (WR) method for parallel computation of ODEs as a general methodology for computing systems of linked submodels. Four test cases are presented: (i) a cascade of unidirectionally and bidirectionally coupled harmonic oscillators, (ii) deterministic and stochastic simulations of calcium oscillations, (iii) single cell calcium oscillations showing complex behaviour such as periodic and chaotic bursting, and (iv) a multicellular calcium model for a cell plate of hepatocytes. We conclude that WR provides a flexible means to deal with multitime-scale computation and model heterogeneity. Global solutions over time can be captured independently of the solution techniques for the individual components, which may be distributed in different computing environments. 相似文献

16.

regEfmtool: Speeding up elementary flux mode calculation using transcriptional regulatory rules in the form of three-state logic

Christian Jungreuthmayer David E. Ruckerbauer Jürgen Zanghellini 《Bio Systems》2013

相似文献

17.

On the Computing Potential of Intracellular Vesicles

Richard Mayne Andrew Adamatzky 《PloS one》2015,10(10)

Collision-based computing (CBC) is a form of unconventional computing in which travelling localisations represent data and conditional routing of signals determines the output state; collisions between localisations represent logical operations. We investigated patterns of Ca²⁺-containing vesicle distribution within a live organism, slime mould Physarum polycephalum, with confocal microscopy and observed them colliding regularly. Vesicles travel down cytoskeletal ‘circuitry’ and their collisions may result in reflection, fusion or annihilation. We demonstrate through experimental observations that naturally-occurring vesicle dynamics may be characterised as a computationally-universal set of Boolean logical operations and present a ‘vesicle modification’ of the archetypal CBC ‘billiard ball model’ of computation. We proceed to discuss the viability of intracellular vesicles as an unconventional computing substrate in which we delineate practical considerations for reliable vesicle ‘programming’ in both in vivo and in vitro vesicle computing architectures and present optimised designs for both single logical gates and combinatorial logic circuits based on cytoskeletal network conformations. The results presented here demonstrate the first characterisation of intracelluar phenomena as collision-based computing and hence the viability of biological substrates for computing. 相似文献

18.

Divisible Load Scheduling in Systems with Limited Memory 总被引：3，自引：0，他引：3

M. Drozdowski P. Wolniewicz 《Cluster computing》2003,6(1):19-29

In this work we consider scheduling divisible loads on a distributed computing system with limited available memory. The communication delays and heterogeneity of the system are taken into account. The problem studied consists in finding such a distribution of the load that the communication and computation time is the shortest possible. A new robust method is proposed to solve the problem of finding optimal distribution of computations on star network, and networks in which binomial trees can be embedded (meshes, hypercubes, multistage interconnections). We demonstrate that in many cases memory limitations do not restrict efficiency of parallel processing as much as computation and communication speeds. 相似文献

19.

Parallel Clustering Algorithm for Large-Scale Biological Data Sets

Minchao Wang Wu Zhang Wang Ding Dongbo Dai Huiran Zhang Hao Xie Luonan Chen Yike Guo Jiang Xie 《PloS one》2014,9(4)

Backgrounds

Recent explosion of biological data brings a great challenge for the traditional clustering algorithms. With increasing scale of data sets, much larger memory and longer runtime are required for the cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied into the biological researches. However, the time and space complexity become a great bottleneck when handling the large-scale data sets. Moreover, the similarity matrix, whose constructing procedure takes long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs.

Methods

Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix constructing procedure and the affinity propagation algorithm. The memory-shared architecture is used to construct the similarity matrix, and the distributed system is taken for the affinity propagation algorithm, because of its large memory size and great computing capacity. An appropriate way of data partition and reduction is designed in our method, in order to minimize the global communication cost among processes.

Result

A speedup of 100 is gained with 128 cores. The runtime is reduced from serval hours to a few seconds, which indicates that parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves a good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies. 相似文献

20.

Reverse engineering and analysis of large genome-scale gene networks

Maneesha Aluru Jaroslaw Zola Dan Nettleton Srinivas Aluru 《Nucleic acids research》2013,41(1):e24

Reverse engineering the whole-genome networks of complex multicellular organisms continues to remain a challenge. While simpler models easily scale to large number of genes and gene expression datasets, more accurate models are compute intensive limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software Gene Network Analyzer (GeNA) for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web. 相似文献