Similar Documents
20 similar documents found (search time: 187 ms)
1.
In this paper, we propose a novel patient-specific method of modelling pulmonary airflow using graphics processing unit (GPU) computation that can be applied in medical practice. To overcome the barriers that computation speed, installation price and physical footprint impose on the clinical application of computational fluid dynamics, we focused on GPU computation and the lattice Boltzmann method (LBM). The LBM is well suited to GPU computation because its stencil operations are local and data-parallel. As optimised data access is essential for GPU performance, we developed an adaptive meshing method in which an airway model is covered by isotropic subdomains consisting of a uniform Cartesian mesh. We found that subdomains of size 4³ gave the best performance. The code was also tested on a small GPU cluster to confirm its performance and applicability, as the price and footprint of such a cluster are reasonable for medical applications.
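
For readers unfamiliar with why the LBM maps so naturally onto GPUs, the sketch below shows a minimal D2Q9 stream-and-collide kernel with BGK relaxation. It is illustrative only: the paper's solver is three-dimensional and organised into 4³ Cartesian subdomains, whereas the grid size, relaxation time and periodic boundaries here are assumptions.

    // Minimal D2Q9 lattice Boltzmann stream-and-collide kernel (BGK collision).
    // A sketch of the general technique only; NX, NY and the periodic wrap
    // are illustrative assumptions, not the paper's configuration.
    #define NX 256
    #define NY 256
    #define Q  9

    __constant__ int   cx[Q] = { 0, 1, 0,-1, 0, 1,-1,-1, 1 };
    __constant__ int   cy[Q] = { 0, 0, 1, 0,-1, 1, 1,-1,-1 };
    __constant__ float w [Q] = { 4.f/9,  1.f/9,  1.f/9,  1.f/9,  1.f/9,
                                 1.f/36, 1.f/36, 1.f/36, 1.f/36 };

    __global__ void streamCollide(const float* fIn, float* fOut, float tau)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= NX || y >= NY) return;

        // Pull streaming: gather distributions from upwind neighbours.
        float f[Q], rho = 0.f, ux = 0.f, uy = 0.f;
        for (int q = 0; q < Q; ++q) {
            int xs = (x - cx[q] + NX) % NX;   // periodic wrap for simplicity
            int ys = (y - cy[q] + NY) % NY;
            f[q] = fIn[(ys * NX + xs) * Q + q];
            rho += f[q];
            ux  += f[q] * cx[q];
            uy  += f[q] * cy[q];
        }
        ux /= rho;  uy /= rho;

        // BGK relaxation towards the local equilibrium distribution.
        float usq = ux * ux + uy * uy;
        for (int q = 0; q < Q; ++q) {
            float cu  = cx[q] * ux + cy[q] * uy;
            float feq = w[q] * rho * (1.f + 3.f*cu + 4.5f*cu*cu - 1.5f*usq);
            fOut[(y * NX + x) * Q + q] = f[q] - (f[q] - feq) / tau;
        }
    }

Every lattice site runs the same arithmetic on purely local data, which is exactly the data-parallel pattern the GPU rewards; the paper's uniform Cartesian subdomains additionally keep these memory accesses coalesced.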

2.
High-order schemes have attracted increasing attention in computational fluid dynamics (CFD) simulations. As one family of high-order schemes, weighted compact nonlinear schemes (WCNSs) have been widely applied in large eddy simulation, direct numerical simulation, etc. However, due to their computational complexity, WCNSs require high-performance platforms. In recent years, the highly parallel graphics processing unit (GPU) has rapidly gained maturity as a powerful engine for high-performance computing. In this paper, we present a high-order double-precision solver for three-dimensional, compressible viscous flow using multi-block structured grids on GPU clusters. The solver uses the high-order WCNS scheme for spatial discretization and the Jacobi iteration method for time discretization. To exploit the computational capability of both CPUs and GPUs, we present a workload balancing model for distributing work among them, and we design two strategies to overlap computation with communication. The performance analyses show that the single-GPU solver achieves about 8× speed-up relative to serial computation on one CPU core. The performance results validate the workload distribution scheme, and the strong and weak scaling analyses show that GPU clusters offer a significant performance advantage.

3.
Program development environments have made graphics processing units (GPUs) an attractive high-performance computing platform for the scientific community. A commonly posed problem in computational biology is searching protein databases for functional similarities. The most accurate algorithm for sequence alignment is Smith-Waterman (SW); however, due to its computational complexity and rapidly growing database sizes, the search becomes ever more time consuming, making cluster-based systems desirable. Scalable and highly parallel methods are therefore necessary to make SW a viable solution for life-science researchers. In this paper we evaluate how SW fits the target GPU architecture by exploring ways to map the program structure onto the processor architecture. We develop new techniques that reduce the memory footprint of the application while exploiting the memory hierarchy of the GPU. With this implementation, GSW, we overcome the on-chip memory size constraint, achieving a 23× speedup over a serial implementation. Results show that as the query length increases our speedup remains nearly constant, indicating the solid scalability of our approach. Additionally, this is a first-of-its-kind implementation that runs purely on the GPU rather than in a CPU-GPU integrated environment, making our design suitable for porting onto a cluster of GPUs.
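
The parallelism such implementations exploit comes from the SW dynamic-programming recurrence: all cells on one anti-diagonal depend only on the two previous anti-diagonals and can be scored concurrently. The kernel below is a minimal sketch of that idea with linear gap penalties; it is not GSW itself, and the scoring constants and three-buffer layout are assumptions.

    // Sketch: Smith-Waterman recurrence parallelised over one anti-diagonal.
    // H(i,j) = max(0, H(i-1,j-1)+sub, H(i-1,j)+GAP, H(i,j-1)+GAP).
    // The host launches this once per diagonal d, rotating three buffers
    // indexed by query position i; all buffers are zero-initialized so the
    // i = 0 and j = 0 boundary cells correctly read as 0.
    #define MATCH     2
    #define MISMATCH -1
    #define GAP      -1

    __global__ void swDiagonal(const char* query, int qLen,
                               const char* subject, int sLen,
                               const int* diagPrev2,  // H on anti-diagonal d-2
                               const int* diagPrev1,  // H on anti-diagonal d-1
                               int* diagCurr,         // H on anti-diagonal d
                               int d)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // query index
        int j = d - i;                                      // subject index
        if (i > qLen || j < 1 || j > sLen) return;

        int sub = (query[i - 1] == subject[j - 1]) ? MATCH : MISMATCH;

        int hDiag = diagPrev2[i - 1] + sub;   // cell (i-1, j-1)
        int hUp   = diagPrev1[i - 1] + GAP;   // cell (i-1, j)
        int hLeft = diagPrev1[i]     + GAP;   // cell (i,   j-1)

        diagCurr[i] = max(0, max(hDiag, max(hUp, hLeft)));
    }

Because each diagonal needs only two small predecessor buffers rather than the full DP matrix, this formulation also hints at how an implementation can shrink its memory footprint to fit on-chip storage.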

4.
This study aims to improve the performance of Dynamic Causal Modelling for Event-Related Potentials (DCM for ERP) in MATLAB by using external function calls to a graphics processing unit (GPU). DCM for ERP is an advanced method for studying neuronal effective connectivity. DCM uses an iterative procedure, the expectation-maximization (EM) algorithm, to find the optimal parameters given a set of observations and the underlying probability model. As the EM algorithm is computationally demanding and the analysis faces a possible combinatorial explosion of models to be tested, we propose a parallel computing scheme that uses the GPU to achieve fast estimation of DCM for ERP. The computation is dynamically partitioned and distributed to threads for parallel processing according to the DCM model complexity and the hardware constraints. The efficiency of this hardware-dependent thread arrangement strategy was evaluated using synthetic data, and experimental data were used to validate the accuracy of the proposed scheme and to quantify the time saving in practice. The simulation results show that the proposed scheme accelerates the parallel part of the computation by a factor of 155. For experimental data, the speedup factor is about 7 per model on average, depending on the model complexity and the data. This GPU-based implementation of DCM for ERP gives qualitatively the same results as the original MATLAB implementation in group-level analysis. In conclusion, we believe the proposed GPU-based implementation is useful as a fast screening tool to select the most likely model, and it may provide implementation guidance for future clinical applications such as online diagnosis.

5.
Hybrid functional Petri nets are a widespread tool for representing and simulating biological models. Because they can provide virtual drug-testing environments, biological simulations have a growing impact on pharmaceutical research. Continuing advances in biology and medicine lead to exponentially increasing simulation times, raising the demand for performance acceleration through efficient and inexpensive parallel computing. Recent developments in general-purpose computation on graphics processing units (GPGPU) have enabled the scientific community to port a variety of compute-intensive algorithms onto the graphics processing unit (GPU). This work presents the first scheme for mapping biological hybrid functional Petri net models, which can handle both discrete and continuous entities, onto compute unified device architecture (CUDA) enabled GPUs. GPU-accelerated simulations are observed to run up to 18 times faster than sequential implementations. Simulating cell boundary formation by Delta-Notch signaling on a CUDA-enabled GPU yields a speedup of approximately 7× for a model containing 1,600 cells.

6.
Here, we present a hybrid approach for simulating an edge illumination X-ray phase-contrast imaging (EIXPCi) set-up on graphics processing units (GPUs) with a high degree of accuracy. The applicability of pixel, mesh and non-uniform rational B-spline (NURBS) objects for producing realistic maps of X-ray phase-contrast distribution at human scale is demonstrated using numerical anthropomorphic phantoms and a fast, robust simulation framework that integrates total interaction probabilities along selected X-ray paths. We exploit the mathematical and algorithmic properties of NURBS and describe how to represent human-scale phantoms in an edge illumination X-ray phase-contrast model. The implementation allows the modelling of a variety of physical interactions of X-rays with different mathematically described objects and the recording of quantities such as path integrals, interaction sites and deposited energies. Furthermore, our efficient, scalable and optimized hybrid Monte Carlo and ray-tracing projector can be used in iterative reconstruction algorithms on multi-GPU heterogeneous systems. Preliminary results show good performance of an edge illumination X-ray phase-contrast medical imaging system on various human-like soft tissues with markedly reduced computation time. Our approach to EIXPCi modelling confirms that building a true imaging system at human scale should be possible, and the simulations presented here aim at its future development.

7.
Bioinformatic research relies on large-scale computational infrastructure that has a nonzero carbon footprint, yet no study so far has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this work, we estimate the carbon footprint of bioinformatics (in kilograms of CO2 equivalent units, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org, last accessed 2022). We assessed 1) bioinformatic approaches in genome-wide association studies (GWAS), RNA sequencing, genome assembly, metagenomics, phylogenetics, and molecular simulations, as well as 2) computation strategies, such as parallelization, CPU (central processing unit) versus GPU (graphics processing unit), cloud versus local computing infrastructure, and geography. In particular, we found that biobank-scale GWAS emitted substantial kgCO2e and that simple software upgrades could make it greener: for example, upgrading from BOLT-LMM v1 to v2.3 reduced the carbon footprint by 73%. Moreover, switching from the average data center to a more efficient one can reduce the carbon footprint by approximately 34%. Memory over-allocation can also be a substantial contributor to an algorithm's greenhouse gas emissions. The use of faster processors or greater parallelization reduces running time but can increase the carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimize kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions that empower a move toward greener research.
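
As a rough orientation to what such a calculator computes, the accounting behind tools like Green Algorithms can be sketched as energy use multiplied by carbon intensity. The form below follows the published Green Algorithms model in outline only; the symbols and the memory-power constant are our paraphrase, not figures taken from this abstract.

    \[
    E\,[\mathrm{kWh}] \;=\; \frac{t}{1000}\,\bigl(n_c\,P_c\,u_c + m\,P_m\bigr)\times \mathrm{PUE},
    \qquad
    C\,[\mathrm{kgCO_2e}] \;=\; E \times \mathrm{CI},
    \]

where t is the runtime in hours, n_c the number of cores, P_c the per-core power draw in watts, u_c the core usage fraction, m the memory allocated in GB, P_m the memory power per GB (reported as roughly 0.3725 W/GB in the published model), PUE the data centre's power usage effectiveness, and CI the local carbon intensity in kgCO2e per kWh. The model makes the abstract's findings concrete: memory over-allocation raises m regardless of use, and geography enters only through CI.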

8.
Graphics processors evolve rapidly and promise power-efficient, cost-effective, scalable high-performance computing with differentiated price-performance. MapReduce is a well-known distributed programming model that eases the development of applications for large-scale data processing on large numbers of commodity CPUs. Compared to CPUs, GPUs are an order of magnitude faster in terms of computational power and memory bandwidth, but they are harder to program. Although several studies have implemented the MapReduce model on GPUs, most are based on a single-GPU model and are bounded by GPU memory capacity and inefficient atomic operations. This paper focuses on the development of MGMR, a standalone MapReduce system that utilizes multiple GPUs to manage large-scale data processing beyond the GPU memory limitation and to eliminate serial atomic operations. Experimental results demonstrate the effectiveness of MGMR in handling large data sets.
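
To see why atomic operations are a bottleneck and how they can be removed, the sketch below shows the generic count/prefix-sum/emit pattern used by GPU MapReduce designs of this lineage. It is the textbook Mars-style scheme, not MGMR's actual code, and the toy map function and names are assumptions.

    // Atomic-free emit pattern for GPU MapReduce (a generic sketch, not MGMR).
    // Pass 1 counts each thread's outputs; an exclusive prefix sum over the
    // counts yields write offsets; pass 2 writes without any shared cursor.
    __global__ void mapCount(const int* input, int n, int* outCount)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        // Assumed toy map function: emit one record per even input value.
        outCount[i] = (input[i] % 2 == 0) ? 1 : 0;
    }

    __global__ void mapEmit(const int* input, int n,
                            const int* offsets,  // exclusive scan of outCount
                            int* output)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        // Each thread writes to its precomputed slot, so no atomics are
        // needed to maintain a shared output position.
        if (input[i] % 2 == 0)
            output[offsets[i]] = input[i] * input[i];
    }

Trading one extra pass for the removal of contended atomics is usually a win on GPUs, since the prefix sum itself parallelises well.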

9.
Tensor contractions are generalized multidimensional matrix multiplication operations that occur widely in quantum chemistry. Efficient execution of tensor contractions on graphics processing units (GPUs) requires several challenges to be addressed, including index permutation and small dimension sizes that reduce thread-block utilization. Moreover, to apply the same optimizations to various expressions, a code generation tool is needed. In this paper, we present our approach to automatically generating CUDA code that executes tensor contractions on GPUs, including management of data movement between CPU and GPU. To evaluate our tool, GPU-enabled code is generated for the most expensive contractions in CCSD(T), a key coupled cluster method, and incorporated into NWChem, a popular computational chemistry suite. For this method, we demonstrate a speedup of more than 8.4× using one GPU compared to one CPU core, and more than 2.6× when utilizing the entire system with a hybrid CPU+GPU solution using 2 GPUs and 5 cores (instead of 7 cores per node). We further investigate tensor contraction code on a newer series of GPUs, the Fermi GPUs, and provide several effective optimization algorithms. For the same CCSD(T) computation, on a cluster with Fermi GPUs we achieve a 3.4× speedup over a cluster with T10 GPUs, and with a single Fermi GPU on each node we achieve a 43× speedup over the sequential CPU version.
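
To make the term concrete: a tensor contraction generalizes matrix multiplication by summing over one or more shared indices of multidimensional arrays. The kernel below is a naive, hand-written sketch of one such contraction, C(a,b,i,j) = Σ_c A(c,a,i) B(c,b,j). It illustrates the index arithmetic a generator must emit, not the optimized code the paper's tool actually produces; all names and layouts are assumptions.

    // Naive sketch of one tensor contraction mapped to a CUDA kernel:
    //   C(a,b,i,j) = sum_c A(c,a,i) * B(c,b,j)
    // One thread computes one output element, flattened over a 1-D grid.
    __global__ void contract(const double* A,  // A[c][a][i], size NC*NA*NI
                             const double* B,  // B[c][b][j], size NC*NB*NJ
                             double* C,        // C[a][b][i][j]
                             int NA, int NB, int NI, int NJ, int NC)
    {
        long long idx   = blockIdx.x * (long long)blockDim.x + threadIdx.x;
        long long total = (long long)NA * NB * NI * NJ;
        if (idx >= total) return;

        // Unflatten the output index (a outermost, j innermost).
        int j = idx % NJ;
        int i = (idx / NJ) % NI;
        int b = (idx / (NJ * (long long)NI)) % NB;
        int a =  idx / (NJ * (long long)NI * NB);

        double sum = 0.0;
        for (int c = 0; c < NC; ++c)  // the contracted index
            sum += A[((long long)c * NA + a) * NI + i] *
                   B[((long long)c * NB + b) * NJ + j];

        C[idx] = sum;
    }

The challenges named in the abstract show up immediately: a different index order in A or B forces a permutation (or strided reads), and when NI and NJ are small the flattened grid is the only way to keep thread blocks full.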

10.
ADF/cofilins are actin-binding proteins that bind actin close to both the N- and C-termini (site 1), and we have found a second cofilin-binding site (site 2) centered around helix 112-125 [Renoult, C., Ternent, D., Maciver, S.K., Fattoum, A., Astier, C., Benyamin, Y. & Roustan, C. (1999) J. Biol. Chem. 274, 28893-28899]. We proposed a model in which ADF/cofilin intercalates between subdomains 1 and 2 of two longitudinally associated actin monomers within the actin:cofilin cofilament, explaining the change in twist that ADF/cofilins induce in the filament [McGough, A., Pope, B., Chiu, W. & Weeds, A. (1998) J. Cell Biol. 138, 771-781]. Here, we have determined the full extent of the cofilin footprint on site 1 of actin. Site 1 is primarily the G-actin binding site. Experiments with both peptide mimetics and fluorescently labeled cofilin suggest that site 2 only becomes available for cofilin binding within the filament, possibly due to motion between subdomains 1 and 2 within an actin monomer. We have detected cofilin-induced motion between subdomains 1 and 2 of G-actin by FRET, revealing the second cofilin-binding site. This motion may also explain how cofilins inhibit the nucleotide exchange of actin, and why the actin:cofilin complex is polymerizable without dissociation.

11.
Markov clustering (MCL) is becoming a key algorithm in bioinformatics for determining clusters in networks. However, with the vast and growing amount of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses the CUDA toolkit to implement a massively parallel computing environment on the GPU card, is becoming a very powerful, efficient, and low-cost option for achieving substantial performance gains over CPU approaches. The use of on-chip memory on the GPU efficiently lowers latency, circumventing a major issue in other parallel computing environments such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) that performs the parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations at the heart of MCL. We utilized the ELLPACK-R sparse format to allow effective, fine-grain massively parallel processing suited to the sparse interaction-network data sets found in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on a CPU. Large-scale parallel computation on off-the-shelf desktop machines, previously only possible on supercomputing architectures, can thus significantly change the way bioinformaticians and biologists deal with their data.
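
The core of MCL's expansion step is repeated sparse matrix multiplication, so the storage format drives performance. The kernel below is a minimal sketch of a sparse matrix-vector product in ELLPACK-R format, whose column-major layout plus per-row length array gives coalesced reads and early loop exit. It illustrates the format rather than reproducing CUDA-MCL's actual matrix-matrix kernels, and all names are assumptions.

    // Sketch of an ELLPACK-R sparse matrix-vector product, one thread per row.
    // val/col are padded to maxNnz entries per row and stored column-major;
    // rl holds each row's true non-zero count so threads stop early.
    __global__ void spmv_ellr(const float* val,   // nRows x maxNnz, col-major
                              const int*   col,   // matching column indices
                              const int*   rl,    // non-zeros in each row
                              const float* x,
                              float*       y,
                              int nRows)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= nRows) return;

        float dot = 0.f;
        int len = rl[row];
        for (int k = 0; k < len; ++k) {
            // Column-major layout: entry k of this row sits at k*nRows + row,
            // so consecutive threads read consecutive addresses (coalesced).
            int idx = k * nRows + row;
            dot += val[idx] * x[col[idx]];
        }
        y[row] = dot;
    }

The per-row length array is what distinguishes ELLPACK-R from plain ELLPACK: padding still exists in storage, but no thread wastes arithmetic on it, which suits the highly irregular row densities of interaction networks.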

12.
The graphics processing unit (GPU), originally used exclusively for visualization, has evolved into an extremely powerful co-processor. Meanwhile, through the development of elaborate interfaces, the GPU can be used to process data and tackle computationally intensive applications. The speed-up factors attained relative to the central processing unit (CPU) depend on the particular application, as the GPU architecture performs best for algorithms that exhibit high data parallelism and high arithmetic intensity. Here, we evaluate the performance of the GPU on a number of common algorithms used for three-dimensional image processing. The algorithms were developed on a new software platform called CUDA, which allows a direct translation from C code to the GPU. The implemented algorithms include spatial transformations, real-space and Fourier operations, as well as pattern recognition procedures, reconstruction algorithms and classification procedures. In our implementation, the direct porting of C code to the GPU achieves typical acceleration on the order of 10-20 times compared to a state-of-the-art conventional processor, varying with the type of algorithm. This speed-up comes at no additional cost, since the software runs on the GPU of the graphics card found in common workstations.

13.
Water footprinting has emerged as an important approach for assessing the water-use effects of consuming goods and services. Assessment methods are proposed by two different communities, the Water Footprint Network (WFN) and the Life Cycle Assessment (LCA) community. The proposed methods are broadly similar and encompass both the computation of water use and its impacts, but they differ in how a water footprint result is communicated. In this paper, we explain the role and goal of LCA and ISO-compatible water footprinting and address the six issues raised by Hoekstra (2016) in "A critique on the water-scarcity weighted water footprint in LCA". By clarifying the concerns, we identify both the overlapping goals of the WFN and LCA water footprint assessments and the discrepancies between them. The main difference in perspective is that LCA aims to account for environmental impacts, while the WFN aims to account for the productivity of global fresh water as a limited resource. We conclude that there is potential to exploit synergies in research on the two approaches, and we highlight the need for proper declaration of the methods applied.

14.
High-performance computing on the graphics processing unit (GPU) is an emerging field driven by the promise of high computational power at low cost. However, GPU programming is a non-trivial task, and architectural limitations raise the question of whether investing effort in this direction is worthwhile. In this work, we use GPU programming to simulate a two-layer network of integrate-and-fire neurons with varying degrees of recurrent connectivity and investigate its ability to learn a simplified navigation task using a policy-gradient learning rule from reinforcement learning. The purpose of this paper is twofold: first, to support the use of GPUs in the field of computational neuroscience; second, to use GPU computing power to investigate the conditions under which the given architecture and learning rule perform best. Our work indicates that networks featuring strong Mexican-hat-shaped recurrent connections in the top layer, where decision making is governed by the formation of a stable activity bump in the neural population (a "non-democratic" mechanism), achieve mediocre learning results at best. In the absence of recurrent connections, where all neurons "vote" independently ("democratically") for a decision via population-vector readout, the task is generally learned better and more robustly. Our study would have been extremely difficult on a desktop computer without GPU programming. We present the routines developed for this purpose and show a speed improvement of 5× up to 42× over optimised Python code, the higher speeds being achieved when we exploit the parallelism of the GPU in the search for learning parameters. This suggests that efficient GPU programming can significantly reduce the time needed to simulate networks of spiking neurons, particularly when multiple parameter configurations are investigated.
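
The one-thread-per-neuron mapping that makes such simulations GPU-friendly is easy to illustrate. Below is a minimal sketch of a leaky integrate-and-fire update step with forward-Euler integration; it is not the paper's code, and the time step, membrane constants and synaptic-input handling are assumed parameters.

    // Sketch of a leaky integrate-and-fire update, one thread per neuron.
    // Integrates dv/dt = (-v + I_syn) / tau_m with forward Euler, then
    // applies threshold-and-reset. All constants are illustrative.
    #define DT      0.1f   // integration step, ms
    #define TAU_M  20.0f   // membrane time constant, ms
    #define V_TH    1.0f   // firing threshold
    #define V_RESET 0.0f

    __global__ void lifStep(float* v,            // membrane potentials
                            const float* iSyn,   // summed synaptic input
                            unsigned char* spiked,
                            int nNeurons)
    {
        int n = blockIdx.x * blockDim.x + threadIdx.x;
        if (n >= nNeurons) return;

        // Leaky integration toward the synaptic drive.
        float vn = v[n] + DT * (-v[n] + iSyn[n]) / TAU_M;

        // Threshold crossing: emit a spike and reset the potential.
        if (vn >= V_TH) {
            spiked[n] = 1;
            vn = V_RESET;
        } else {
            spiked[n] = 0;
        }
        v[n] = vn;
    }

Because every neuron's update is independent within a time step, the same kernel can also be replicated across parameter configurations, which is where the paper reports its largest speed-ups.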

15.
Studies on the genetic mechanisms regulating lean body mass (LBM) in mammals are scarce, although LBM is associated with a competent immune system and overall good (healthy) body function. In this study, we performed a high-density genome-wide scan using 633 (MRL/MPJ × SJL/J) F2 intercross mice to identify quantitative trait loci (QTL) involved in the regulation of LBM, hypothesizing that additional QTL could be identified using a different mouse cross (the MRL/SJL cross). Ten QTL were identified for LBM on chromosomes (chrs) 2, 6, 7, 9, 13 and 14. Of these, the QTL on chrs 6, 7 and 14 were exclusive to LBM, while QTL on chrs 4 and 11 were exclusive to body length; the LBM QTL on chrs 2 and 9 overlap with those for body size. Altogether, the ten LBM QTL explained 41.2% of the phenotypic variance in F2 mice. Five significantly interacting loci that may be involved in the regulation of LBM were identified, accounting for 24.4% of the phenotypic variance explained by the QTL, and five epistatic interactions, contributing 22.9% of the phenotypic variance, were identified for body length. Interacting loci on chr 2 may influence LBM by regulating body length. Therefore, epistatic interactions as well as single-QTL effects play an important role in the regulation of LBM.

16.
With the continuous development of hardware and software, graphics processing units (GPUs) have come into use for general-purpose computation, emerging as computational accelerators that dramatically reduce application execution times compared with CPUs. To achieve high computing performance, a GPU typically includes hundreds of computing units, and this high density of computing resources on a chip brings high power consumption. Power consumption has therefore become one of the most important problems in the development of GPUs. This paper analyzes the energy consumption of parallel algorithms executed on GPUs and provides a method for evaluating the energy scalability of parallel algorithms. The parallel prefix sum is then analyzed to illustrate the method for energy conservation, and the energy scalability is experimentally evaluated using sparse matrix-vector multiplication (SpMV). The results show that the optimal number of blocks, the memory choice and the task scheduling are key to balancing the performance and energy consumption of GPUs.
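
For reference, the parallel prefix sum analysed in the paper can be written in a few lines on the GPU. The sketch below is a single-block inclusive Hillis-Steele scan in shared memory; a full multi-block scan would add a pass over per-block totals, and the block size and double-buffering scheme are illustrative choices, not the paper's configuration.

    // Single-block inclusive prefix sum (Hillis-Steele scan) in shared memory.
    // Assumes n <= BLOCK; out-of-range lanes contribute zeros.
    #define BLOCK 256

    __global__ void scanBlock(const float* in, float* out, int n)
    {
        __shared__ float buf[2][BLOCK];
        int tid = threadIdx.x;
        int src = 0;

        buf[src][tid] = (tid < n) ? in[tid] : 0.f;
        __syncthreads();

        // At step d, each element adds the value 2^d positions to its left.
        for (int offset = 1; offset < BLOCK; offset <<= 1) {
            int dst = 1 - src;
            if (tid >= offset)
                buf[dst][tid] = buf[src][tid] + buf[src][tid - offset];
            else
                buf[dst][tid] = buf[src][tid];
            src = dst;
            __syncthreads();
        }

        if (tid < n) out[tid] = buf[src][tid];
    }

The knobs the paper identifies as energy-relevant are visible even here: the block count and size set occupancy, and the choice of shared versus global memory sets how much energy each of the log2(BLOCK) passes spends on data movement.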

17.
18.
Propagation of sound waves in air can be considered a special case of fluid dynamics, so the lattice Boltzmann method (LBM) for fluid flow can be used to simulate sound propagation. In this article, the application of the LBM to sound propagation is illustrated for various cases: free-field propagation, propagation over porous and non-porous ground, propagation over a noise barrier, and propagation in an atmosphere with wind. LBM results are compared with solutions of the equations of acoustics. The LBM is found to work well for sound waves, but the dissipation of sound waves in the LBM is generally much larger than the real dissipation of sound waves in air. To circumvent this problem, we propose using the LBM to assess the excess sound level, i.e. the difference between the sound level and the free-field sound level. The effect of dissipation on the excess sound level is much smaller than its effect on the sound level itself, so the LBM can be used to estimate the excess sound level for a non-dissipative atmosphere, a useful quantity in atmospheric acoustics. To reduce dissipation in an LBM simulation, two approaches are considered: i) reduction of the kinematic viscosity and ii) reduction of the lattice spacing.
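
Both remedies follow from how viscosity enters the BGK lattice Boltzmann model. As a sketch (our summary of the standard relations, not formulas quoted in the abstract; c_s² = 1/3 for the usual D2Q9/D3Q19 lattices, and the attenuation scaling is the classical viscous-absorption estimate):

    \[
    \nu \;=\; c_s^{2}\left(\tau - \tfrac{1}{2}\right)\Delta t ,
    \qquad
    \alpha \;\propto\; \frac{\nu\,\omega^{2}}{c^{3}} ,
    \]

so driving the relaxation time τ toward 1/2 lowers the kinematic viscosity ν, and with it the amplitude attenuation α, directly (approach i); alternatively, since the physical viscosity scales as Δx²/Δt under diffusive scaling, refining the lattice spacing at fixed τ also reduces the dissipation accumulated per wavelength (approach ii).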

19.
The Maximum Likelihood (ML) method has excellent performance for Direction-of-Arrival (DOA) estimation, but it requires a multidimensional nonlinear search, which complicates the computation and hinders practical use. To reduce the high computational burden of the ML method and make it more suitable for engineering applications, we apply the Artificial Bee Colony (ABC) algorithm to maximize the likelihood function for DOA estimation. A recently proposed bio-inspired computing algorithm, ABC was originally used to optimize multivariable functions by imitating the foraging behavior of a bee colony locating good nectar sources in its natural environment, and it offers an excellent alternative to conventional methods in ML-DOA estimation. The performance of ABC-based ML and other popular metaheuristic-based ML methods for DOA estimation is compared across various scenarios of convergence, Signal-to-Noise Ratio (SNR), and number of iterations, and the computational loads of ABC-based ML and the conventional ML method are also investigated. Simulation results demonstrate that the proposed ABC-based method is more efficient in computation and statistical performance than other ML-based DOA estimation methods.
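
For context, the objective being maximized can be sketched as follows. Under the standard deterministic ML formulation for an array with steering matrix A(θ) and sample covariance matrix R̂, the DOA estimate maximizes the power captured by the projection onto the signal subspace (our summary of the textbook criterion, not a formula taken from this abstract):

    \[
    \hat{\boldsymbol{\theta}} \;=\; \arg\max_{\boldsymbol{\theta}}\;
    \operatorname{tr}\!\bigl(\mathbf{P}_{A(\boldsymbol{\theta})}\,\hat{\mathbf{R}}\bigr),
    \qquad
    \mathbf{P}_{A} \;=\; \mathbf{A}\bigl(\mathbf{A}^{H}\mathbf{A}\bigr)^{-1}\mathbf{A}^{H} ,
    \]

which is a nonlinear function of all source angles jointly. This is why the search is multidimensional and multimodal, and why a population-based optimizer such as ABC is an attractive alternative to exhaustive grid search.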

20.
To investigate factors affecting the low lean body mass (LBM) of young women, we focused on the increase in body weight during the first year of life and on current lifestyles. In 442 young women, the increase in body weight from birth until one year of age, the breast-feeding method in infancy, the current physique index and body composition, and physique and lifestyle were investigated using a questionnaire. Subjects below the 33.3rd LBM percentile (less than 36.8 kg) were classified as having a low LBM (n = 150), and those at or above the 33.3rd percentile as controls (n = 293). Based on body weight changes from birth to days 3 and 7, the subjects were divided into a rapid weight gain group and two non-rapid weight gain groups (groups 1-3). To analyze factors involved in a low LBM, multivariate analysis using a logistic model was employed. The odds of a low LBM in the rapid weight gain group were 0.41 times those of the other groups. For subjects with a low birth weight the corresponding factor was 0.58, indicating that a low birth weight is likely to result in a low LBM. Regarding lifestyles, the odds of a low LBM in subjects with a current breakfasting habit were 0.60 times those of subjects without one. These findings suggest that the thinness of young women characterized by a low LBM is associated with the increase in body weight during the first year of life and with current lifestyles.
