期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Performance evaluation of image processing algorithms on the GPU

Castaño-Díez D Moser D Schoenegger A Pruggnaller S Frangakis AS 《Journal of structural biology》2008,164(1):153-160

The graphics processing unit (GPU), which originally was used exclusively for visualization purposes, has evolved into an extremely powerful co-processor. In the meanwhile, through the development of elaborate interfaces, the GPU can be used to process data and deal with computationally intensive applications. The speed-up factors attained compared to the central processing unit (CPU) are dependent on the particular application, as the GPU architecture gives the best performance for algorithms that exhibit high data parallelism and high arithmetic intensity. Here, we evaluate the performance of the GPU on a number of common algorithms used for three-dimensional image processing. The algorithms were developed on a new software platform called "CUDA", which allows a direct translation from C code to the GPU. The implemented algorithms include spatial transformations, real-space and Fourier operations, as well as pattern recognition procedures, reconstruction algorithms and classification procedures. In our implementation, the direct porting of C code in the GPU achieves typical acceleration values in the order of 10-20 times compared to a state-of-the-art conventional processor, but they vary depending on the type of the algorithm. The gained speed-up comes with no additional costs, since the software runs on the GPU of the graphics card of common workstations. 相似文献

2.

Solvated and generalised Born calculations differences using GPU CUDA and multi-CPU simulations of an antifreeze protein with AMBER

Antonio Peramo 《Molecular simulation》2016,42(15):1263-1273

While there has been an increase in the number of biomolecular computational studies employing graphics processing units (GPU), results describing their use with the molecular dynamics package AMBER with the CUDA implementation are scarce. No information is available comparing MD methodologies pmemd.cuda, pmemd.mpi or sander.mpi, available in AMBER, for generalised Born (GB) simulations or with solvated systems. As part of our current studies with antifreeze proteins (AFP), and for the previous reasons, we present details of our experience comparing performance of MD simulations at varied temperatures between multi-CPU runs using sander.mpi, pmemd.mpi and pmemd.cuda with the AFP from the fish ocean pout (1KDF). We found extremely small differences in total energies between multi-CPU and GPU CUDA implementations of AMBER12 in 1ns production simulations of the solvated system using the TIP3P water model. Additionally, GPU computations achieved typical one order of magnitude speedups when using mixed precision but were similar to CPU speeds when computing with double precision. However, we found that GB calculations were highly sensitive to the choice of initial GB parametrisation regardless of the type of methodology, with substantial differences in total energies. 相似文献

3.

Scalable and highly parallel implementation of Smith-Waterman on graphics processing unit using CUDA 总被引：1，自引：0，他引：1

Ali Akoglu Gregory M. Striemer 《Cluster computing》2009,12(3):341-352

Program development environments have enabled graphics processing units (GPUs) to become an attractive high performance computing platform for the scientific community. A commonly posed problem in computational biology is protein database searching for functional similarities. The most accurate algorithm for sequence alignments is Smith-Waterman (SW). However, due to its computational complexity and rapidly increasing database sizes, the process becomes more and more time consuming making cluster based systems more desirable. Therefore, scalable and highly parallel methods are necessary to make SW a viable solution for life science researchers. In this paper we evaluate how SW fits onto the target GPU architecture by exploring ways to map the program architecture on the processor architecture. We develop new techniques to reduce the memory footprint of the application while exploiting the memory hierarchy of the GPU. With this implementation, GSW, we overcome the on chip memory size constraint, achieving 23× speedup compared to a serial implementation. Results show that as the query length increases our speedup almost stays stable indicating the solid scalability of our approach. Additionally this is a first of a kind implementation which purely runs on the GPU instead of a CPU-GPU integrated environment, making our design suitable for porting onto a cluster of GPUs. 相似文献

4.

Fast parallel Markov clustering in bioinformatics using massively parallel computing on GPU with CUDA and ELLPACK-R sparse format

Bustamam A Burrage K Hamilton NA 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(3):679-692

Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However,with increasing vast amount of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses CUDA tool for implementing a massively parallel computing environment in the GPU card, is becoming a very powerful, efficient, and low-cost option to achieve substantial performance gains over CPU approaches. The use of on-chip memory on the GPU is efficiently lowering the latency time, thus, circumventing a major issue in other parallel computing environments, such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We utilized ELLPACK-R sparse format to allow the effective and fine-grain massively parallel processing to cope with the sparse nature of interaction networks data sets in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on CPU. Thus, large-scale parallel computation on off-the-shelf desktop-machines, that were previously only possible on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data. 相似文献

5.

Patient-specific modelling of pulmonary airflow using GPU cluster for the application in medical practice

Miki T Wang X Aoki T Imai Y Ishikawa T Takase K Yamaguchi T 《Computer methods in biomechanics and biomedical engineering》2012,15(7):771-778

In this paper, we propose a novel patient-specific method of modelling pulmonary airflow using graphics processing unit (GPU) computation that can be applied in medical practice. To overcome the barriers imposed by computation speed, installation price and footprint to the application of computational fluid dynamics, we focused on GPU computation and the lattice Boltzmann method (LBM). The GPU computation and LBM are compatible due to the characteristics of the GPU. As the optimisation of data access is essential for the performance of the GPU computation, we developed an adaptive meshing method, in which an airway model is covered by isotropic subdomains consisting of a uniform Cartesian mesh. We found that 4(3) size subdomains gave the best performance. The code was also tested on a small GPU cluster to confirm its performance and applicability, as the price and footprint are reasonable for medical applications. 相似文献

6.

Fast parallel tandem mass spectral library searching using GPU hardware acceleration

Baumgardner LA Shanmugam AK Lam H Eng JK Martin DB 《Journal of proteome research》2011,10(6):2882-2888

Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its sequence, traditionally accomplished by sequence database searching and more recently also by spectral library searching, has become a bottleneck in many mass spectrometry experiments. In both of these methods, the main rate-limiting step is the comparison of an acquired spectrum with all potential matches from a spectral library or sequence database. This is a highly parallelizable process because the core computational element can be represented as a simple but arithmetically intense multiplication of two vectors. In this paper, we present a proof of concept project taking advantage of the massively parallel computing available on graphics processing units (GPUs) to distribute and accelerate the process of spectral assignment using spectral library searching. This program, which we have named FastPaSS (for Fast Parallelized Spectral Searching), is implemented in CUDA (Compute Unified Device Architecture) from NVIDIA, which allows direct access to the processors in an NVIDIA GPU. Our efforts demonstrate the feasibility of GPU computing for spectral assignment, through implementation of the validated spectral searching algorithm SpectraST in the CUDA environment. 相似文献

7.

CPU/GPU computing for a multi-block structured grid based high-order flow solver on a large heterogeneous system

Wei Cao Chuan-fu Xu Zheng-hua Wang Lu Yao Hua-yong Liu 《Cluster computing》2014,17(2):255-270

The high-order schemes have attracted more and more attention in computational fluid dynamics (CFD) simulations. As a kind of high-order schemes, weighted compact nonlinear schemes (WCNSs) have been widely applied in large eddy simulations, direct numerical simulations etc. However, due to the computational complexity, WCNSs require high-performance platforms. In recent years, the highly parallel graphics processing unit (GPU) is rapidly gaining maturity as a powerful engine for high performance computer. In this paper, we present a high-order double-precision solver of the three-dimensional, compressible viscous flow using multi-block structured grids on GPU clusters. The solver utilizes the high-order WCNS scheme for space discretization and Jacobi iteration method for time discretization. In order to utilize the computational capability of CPU and GPU for the solver, we present a workload balancing model for distributing workload among CPUs and GPUs. And we design two strategies to overlap computations with communications. The performance analyses show that the single-GPU solver achieves about 8× speed-ups relative to a serial computation on a CPU core. The performance results validate the workload distribution scheme. The strong and weak scaling analyses show that GPU clusters offer a significant advantage in performance. 相似文献

8.

Patient-specific modelling of pulmonary airflow using GPU cluster for the application in medical practice

T. Miki X. Wang T. Aoki Y. Imai T. Ishikawa K. Takase 《Computer methods in biomechanics and biomedical engineering》2013,16(7):771-778

In this paper, we propose a novel patient-specific method of modelling pulmonary airflow using graphics processing unit (GPU) computation that can be applied in medical practice. To overcome the barriers imposed by computation speed, installation price and footprint to the application of computational fluid dynamics, we focused on GPU computation and the lattice Boltzmann method (LBM). The GPU computation and LBM are compatible due to the characteristics of the GPU. As the optimisation of data access is essential for the performance of the GPU computation, we developed an adaptive meshing method, in which an airway model is covered by isotropic subdomains consisting of a uniform Cartesian mesh. We found that 4³ size subdomains gave the best performance. The code was also tested on a small GPU cluster to confirm its performance and applicability, as the price and footprint are reasonable for medical applications. 相似文献

9.

Harvesting graphics power for MD simulations

J.A. van Meel A. Arnold D. Frenkel S.F. Portegies Zwart R.G. Belleman Section Computational Science University of Amsterdam Amsterdam The Netherlands 《Molecular simulation》2013,39(3):259-266

We discuss an implementation of molecular dynamics (MD) simulations on a graphic processing unit (GPU) in the NVIDIA CUDA language. We tested our code on a modern GPU, the NVIDIA GeForce 8800 GTX. Results for two MD algorithms suitable for short-ranged and long-ranged interactions, and a congruential shift random number generator are presented. The performance of the GPU's is compared to their main processor counterpart. We achieve speedups of up to 40, 80 and 150 fold, respectively. With the latest generation of GPU's one can run standard MD simulations at 10⁷ flops/$. 相似文献

10.

GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units

Zandevakili P Hu M Qin Z 《PloS one》2012,7(5):e36865

相似文献

11.

Patient-specific non-linear finite element modelling for predicting soft organ deformation in real-time; Application to non-rigid neuroimage registration

Adam Wittek Grand Joldes Mathieu Couton Simon K. Warfield Karol Miller 《Progress in biophysics and molecular biology》2010,103(2-3):292-303

Long computation times of non-linear (i.e. accounting for geometric and material non-linearity) biomechanical models have been regarded as one of the key factors preventing application of such models in predicting organ deformation for image-guided surgery. This contribution presents real-time patient-specific computation of the deformation field within the brain for six cases of brain shift induced by craniotomy (i.e. surgical opening of the skull) using specialised non-linear finite element procedures implemented on a graphics processing unit (GPU). In contrast to commercial finite element codes that rely on an updated Lagrangian formulation and implicit integration in time domain for steady state solutions, our procedures utilise the total Lagrangian formulation with explicit time stepping and dynamic relaxation. We used patient-specific finite element meshes consisting of hexahedral and non-locking tetrahedral elements, together with realistic material properties for the brain tissue and appropriate contact conditions at the boundaries. The loading was defined by prescribing deformations on the brain surface under the craniotomy. Application of the computed deformation fields to register (i.e. align) the preoperative and intraoperative images indicated that the models very accurately predict the intraoperative deformations within the brain. For each case, computing the brain deformation field took less than 4 s using an NVIDIA Tesla C870 GPU, which is two orders of magnitude reduction in computation time in comparison to our previous study in which the brain deformation was predicted using a commercial finite element solver executed on a personal computer. 相似文献

12.

Advancing simulations of biological materials: applications of coarse-grained models on graphics processing unit hardware

David N. LeBard 《Molecular simulation》2014,40(10-11):802-820

The timescales of biological processes, primarily those inherent to the molecular mechanisms of disease, are long (>μs) and involve complex interactions of systems consisting of many atoms (>10⁶). Simulating these systems requires an advanced computational approach, and as such, coarse-grained (CG) models have been developed and highly optimised for accelerator hardware, primarily graphics processing units (GPUs). In this review, I discuss the implementation of CG models for biologically relevant systems, and show how such models can be optimised and perform well on GPU-accelerated hardware. Several examples of GPU implementations of CG models for both molecular dynamics and Monte Carlo simulations on purely GPU and hybrid CPU/GPU architectures are presented. Both the hardware and algorithmic limitations of various models, which depend greatly on the application of interest, are discussed. 相似文献

13.

A Software Architecture for Multi-Cellular System Simulations on Graphics Processing Units

Anne Jeannin-Girardon Pascal Ballet Vincent Rodin 《Acta biotheoretica》2013,61(3):317-327

The first aim of simulation in virtual environment is to help biologists to have a better understanding of the simulated system. The cost of such simulation is significantly reduced compared to that of in vivo simulation. However, the inherent complexity of biological system makes it hard to simulate these systems on non-parallel architectures: models might be made of sub-models and take several scales into account; the number of simulated entities may be quite large. Today, graphics cards are used for general purpose computing which has been made easier thanks to frameworks like CUDA or OpenCL. Parallelization of models may however not be easy: parallel computer programing skills are often required; several hardware architectures may be used to execute models. In this paper, we present the software architecture we built in order to implement various models able to simulate multi-cellular system. This architecture is modular and it implements data structures adapted for graphics processing units architectures. It allows efficient simulation of biological mechanisms. 相似文献

14.

Sop‐GPU: Accelerating biomolecular simulations in the centisecond timescale using graphics processors

A. Zhmurov R. I. Dima Y. Kholodov V. Barsegov 《Proteins》2010,78(14):2984-2999

Theoretical exploration of fundamental biological processes involving the forced unraveling of multimeric proteins, the sliding motion in protein fibers and the mechanical deformation of biomolecular assemblies under physiological force loads is challenging even for distributed computing systems. Using a C_α‐based coarse‐grained self organized polymer (SOP) model, we implemented the Langevin simulations of proteins on graphics processing units (SOP‐GPU program). We assessed the computational performance of an end‐to‐end application of the program, where all the steps of the algorithm are running on a GPU, by profiling the simulation time and memory usage for a number of test systems. The ～90‐fold computational speedup on a GPU, compared with an optimized central processing unit program, enabled us to follow the dynamics in the centisecond timescale, and to obtain the force‐extension profiles using experimental pulling speeds (v_f = 1–10 μm/s) employed in atomic force microscopy and in optical tweezers‐based dynamic force spectroscopy. We found that the mechanical molecular response critically depends on the conditions of force application and that the kinetics and pathways for unfolding change drastically even upon a modest 10‐fold increase in v_f. This implies that, to resolve accurately the free energy landscape and to relate the results of single‐molecule experiments in vitro and in silico, molecular simulations should be carried out under the experimentally relevant force loads. This can be accomplished in reasonable wall‐clock time for biomolecules of size as large as 10⁵ residues using the SOP‐GPU package. Proteins 2010; © 2010 Wiley‐Liss, Inc. 相似文献

15.

图形处理器在实时光学相干断层成像中的应用

杜竹君高天欣唐晓英《激光生物学报》2017,26(2)

光学相干断层成像(optical coherence tomography,OCT)技术在成像过程中具有极大的数据量和计算量,传统的基于中央处理器(central processing unit,CPU)的计算平台难以满足OCT实时成像的需求。图形处理器(graphics processing unit,GPU)在通用计算方面具有强大的并行处理能力和数值计算能力,可以突破OCT实时成像的瓶颈。本文对GPU做了简要介绍并阐述了GPU在OCT实时成像及功能成像中的应用及研究进展。相似文献

16.

Multi-dimensional, mesoscopic Monte Carlo simulations of inhomogeneous reaction-drift-diffusion systems on graphics-processing units

Vigelius M Meyer B 《PloS one》2012,7(4):e33384

For many biological applications, a macroscopic (deterministic) treatment of reaction-drift-diffusion systems is insufficient. Instead, one has to properly handle the stochastic nature of the problem and generate true sample paths of the underlying probability distribution. Unfortunately, stochastic algorithms are computationally expensive and, in most cases, the large number of participating particles renders the relevant parameter regimes inaccessible. In an attempt to address this problem we present a genuine stochastic, multi-dimensional algorithm that solves the inhomogeneous, non-linear, drift-diffusion problem on a mesoscopic level. Our method improves on existing implementations in being multi-dimensional and handling inhomogeneous drift and diffusion. The algorithm is well suited for an implementation on data-parallel hardware architectures such as general-purpose graphics processing units (GPUs). We integrate the method into an operator-splitting approach that decouples chemical reactions from the spatial evolution. We demonstrate the validity and applicability of our algorithm with a comprehensive suite of standard test problems that also serve to quantify the numerical accuracy of the method. We provide a freely available, fully functional GPU implementation. Integration into Inchman, a user-friendly web service, that allows researchers to perform parallel simulations of reaction-drift-diffusion systems on GPU clusters is underway. 相似文献

17.

Parallel high-dimensional multi-objective feature selection for EEG classification with dynamic workload balancing on CPU–GPU architectures

Juan?José?Escobar Email author Julio?Ortega Jesús?González Miguel?Damas Antonio?F.?Díaz 《Cluster computing》2017,20(3):1881-1897

Many bioinformatics applications that analyse large volumes of high-dimensional data comprise complex problems requiring metaheuristics approaches with different types of implicit parallelism. For example, although functional parallelism would be used to accelerate evolutionary algorithms, the fitness evaluation of the population could imply the computation of cost functions with data parallelism. This way, heterogeneous parallel architectures, including central processing unit (CPU) microprocessors with multiple superscalar cores and accelerators such as graphics processing units (GPUs) could be very useful. This paper aims to take advantage of such CPU–GPU heterogeneous architectures to accelerate electroencephalogram classification and feature selection problems by evolutionary multi-objective optimization, in the context of brain computing interface tasks. In this paper, we have used the OpenCL framework to develop parallel master-worker codes implementing an evolutionary multi-objective feature selection procedure in which the individuals of the population are dynamically distributed among the available CPU and GPU cores. 相似文献

18.

GPU-accelerated replica exchange molecular simulation on solid–liquid phase transition study of Lennard-Jones fluids

Kentaro Nomura Minoru Oikawa Atsushi Kawai Tetsu Narumi 《Molecular simulation》2015,41(10-12):874-880

Determining the solid–liquid phase transition point by conventional molecular dynamics (MD) simulations is difficult because of the tendency of the system to get trapped in local minimum energy states at low temperatures and hysteresis during cooling and heating cycles. The replica exchange method, used in performing many MD simulations of the system at different temperature conditions simultaneously and performs exchanges of these temperatures at certain intervals, has been introduced as a tool to overcome this local-minimum problem. However, around the phase transition temperature, a greater number of different temperatures are required to adequately find the phase transition point. In addition, the number of different temperature values increases when treating larger systems resulting in huge computation times. We propose a computational acceleration of the replica exchange MD simulation on graphics processing units (GPUs) in studying first-order solid–liquid phase transitions of Lennard-Jones (LJ) fluids. The phase transition temperature for a 108-atom LJ fluid has been calculated to validate our new code. The result corresponds with that of a previous study using multicanonical ensemble. The computational speed is measured for various GPU-cluster sizes. A peak performance of 196.3 GFlops with one GPU and 8.13 TFlops with 64 GPUs is achieved. 相似文献

19.

Invariants and Other Structural Properties of Biochemical Models as a Constraint Satisfaction Problem

Soliman S 《Algorithms for molecular biology : AMB》2012,7(1):15-9

Background

We present a way to compute the minimal semi-positive invariants of a Petri net representing a biological reaction system, as resolution of a Constraint Satisfaction Problem. The use of Petri nets to manipulate Systems Biology models and make available a variety of tools is quite old, and recently analyses based on invariant computation for biological models have become more and more frequent, for instance in the context of module decomposition.

Results

In our case, this analysis brings both qualitative and quantitative information on the models, in the form of conservation laws, consistency checking, etc. thanks to finite domain constraint programming. It is noticeable that some of the most recent optimizations of standard invariant computation techniques in Petri nets correspond to well-known techniques in constraint solving, like symmetry-breaking. Moreover, we show that the simple and natural encoding proposed is not only efficient but also flexible enough to encompass sub/sur-invariants, siphons/traps, etc., i.e., other Petri net structural properties that lead to supplementary insight on the dynamics of the biochemical system under study.

Conclusions

A simple implementation based on GNU-Prolog's finite domain solver, and including symmetry detection and breaking, was incorporated into the BIOCHAM modelling environment and in the independent tool Nicotine. Some illustrative examples and benchmarks are provided. 相似文献

20.

Bayesian Computation Emerges in Generic Cortical Microcircuits through Spike-Timing-Dependent Plasticity

Bernhard Nessler Michael Pfeiffer Lars Buesing Wolfgang Maass 《PLoS computational biology》2013,9(4)

The principles by which networks of neurons compute, and how spike-timing dependent plasticity (STDP) of synaptic weights generates and maintains their computational function, are unknown. Preceding work has shown that soft winner-take-all (WTA) circuits, where pyramidal neurons inhibit each other via interneurons, are a common motif of cortical microcircuits. We show through theoretical analysis and computer simulations that Bayesian computation is induced in these network motifs through STDP in combination with activity-dependent changes in the excitability of neurons. The fundamental components of this emergent Bayesian computation are priors that result from adaptation of neuronal excitability and implicit generative models for hidden causes that are created in the synaptic weights through STDP. In fact, a surprising result is that STDP is able to approximate a powerful principle for fitting such implicit generative models to high-dimensional spike inputs: Expectation Maximization. Our results suggest that the experimentally observed spontaneous activity and trial-to-trial variability of cortical neurons are essential features of their information processing capability, since their functional role is to represent probability distributions rather than static neural codes. Furthermore it suggests networks of Bayesian computation modules as a new model for distributed information processing in the cortex. 相似文献