期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

DecGPU: distributed error correction on massively parallel graphics processing units using CUDA and MPI

Yongchao Liu Bertil Schmidt Douglas L Maskell 《BMC bioinformatics》2011,12(1):85

Background

Next-generation sequencing technologies have led to the high-throughput production of sequence data (reads) at low cost. However, these reads are significantly shorter and more error-prone than conventional Sanger shotgun reads. This poses a challenge for the de novo assembly in terms of assembly quality and scalability for large-scale short read datasets. 相似文献

2.

Six-fold speed-up of Smith-Waterman sequence database searches using parallel processing on common microprocessors

Rognes T Seeberg E 《Bioinformatics (Oxford, England)》2000,16(8):699-706

MOTIVATION: Sequence database searching is among the most important and challenging tasks in bioinformatics. The ultimate choice of sequence-search algorithm is that of Smith-Waterman. However, because of the computationally demanding nature of this method, heuristic programs or special-purpose hardware alternatives have been developed. Increased speed has been obtained at the cost of reduced sensitivity or very expensive hardware. RESULTS: A fast implementation of the Smith-Waterman sequence-alignment algorithm using Single-Instruction, Multiple-Data (SIMD) technology is presented. This implementation is based on the MultiMedia eXtensions (MMX) and Streaming SIMD Extensions (SSE) technology that is embedded in Intel's latest microprocessors. Similar technology exists also in other modern microprocessors. Six-fold speed-up relative to the fastest previously known Smith-Waterman implementation on the same hardware was achieved by an optimized 8-way parallel processing approach. A speed of more than 150 million cell updates per second was obtained on a single Intel Pentium III 500 MHz microprocessor. This is probably the fastest implementation of this algorithm on a single general-purpose microprocessor described to date. 相似文献

3.

MrBayes on a graphics processing unit 总被引：1，自引：0，他引：1

Zhou J Liu X Stones DS Xie Q Wang G 《Bioinformatics (Oxford, England)》2011,27(9):1255-1261

相似文献

4.

Smoldyn on graphics processing units: massively parallel Brownian dynamics simulations

Dematté L 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(3):655-667

Space is a very important aspect in the simulation of biochemical systems; recently, the need for simulation algorithms able to cope with space is becoming more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transportation phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation could be time consuming, especially if we want to capture the systems behavior in a reliable way using stochastic methods in conjunction with a high spatial resolution. In order to deliver the promise done by systems biology to be able to understand a system as whole, we need to scale up the size of models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely diffused algorithm for stochastic simulation of chemical reactions with spatial resolution and single molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computational demanding steps (computation of diffusion, unimolecular, and bimolecular reaction, as well as the most common cases of molecule-surface interaction) on the GPU, computing them in parallel on each molecule of the system. The implementation offers good speed-ups and real time, high quality graphics output 相似文献

5.

Fast parallel Markov clustering in bioinformatics using massively parallel computing on GPU with CUDA and ELLPACK-R sparse format

Bustamam A Burrage K Hamilton NA 《IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM》2012,9(3):679-692

Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However,with increasing vast amount of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses CUDA tool for implementing a massively parallel computing environment in the GPU card, is becoming a very powerful, efficient, and low-cost option to achieve substantial performance gains over CPU approaches. The use of on-chip memory on the GPU is efficiently lowering the latency time, thus, circumventing a major issue in other parallel computing environments, such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We utilized ELLPACK-R sparse format to allow the effective and fine-grain massively parallel processing to cope with the sparse nature of interaction networks data sets in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on CPU. Thus, large-scale parallel computation on off-the-shelf desktop-machines, that were previously only possible on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data. 相似文献

6.

Fast space-filling molecular graphics using dynamic partitioning among parallel processors

《Journal of molecular graphics》1991,9(3):139-147

We present a novel algorithm for the efficient generation of high-quality space-filling molecular graphics that is particularly appropriate for the creation of the large number of images needed in the animation of molecular dynamics. Each atom of the molecule is represented by a sphere of an appropriate radius, and the image of the sphere is constructed pixel-by-pixel using a generalization of the lighting model proposed by Porter (Comp. Graphics 1978, 12, 282). The edges of the spheres are antialiased, and intersections between spheres are handled through a simple blending algorithm that provides very smooth edges. We have implemented this algorithm on a multiprocessor computer using a procedure that dynamically repartitions the effort among the processors based on the CPU time used by each processor to create the previous image. This dynamic reallocation among processors automatically maximizes efficiency in the face of both the changing nature of the image from frame to frame and the shifting demands of the other programs running simultaneously on the same processors. We present data showing the efficiency of this multiprocessing algorithm as the number of processors is increased. The combination of the graphics and multiprocessor algorithms allows the fast generation of many high-quality images. 相似文献

7.

Advancing simulations of biological materials: applications of coarse-grained models on graphics processing unit hardware

David N. LeBard 《Molecular simulation》2014,40(10-11):802-820

The timescales of biological processes, primarily those inherent to the molecular mechanisms of disease, are long (>μs) and involve complex interactions of systems consisting of many atoms (>10⁶). Simulating these systems requires an advanced computational approach, and as such, coarse-grained (CG) models have been developed and highly optimised for accelerator hardware, primarily graphics processing units (GPUs). In this review, I discuss the implementation of CG models for biologically relevant systems, and show how such models can be optimised and perform well on GPU-accelerated hardware. Several examples of GPU implementations of CG models for both molecular dynamics and Monte Carlo simulations on purely GPU and hybrid CPU/GPU architectures are presented. Both the hardware and algorithmic limitations of various models, which depend greatly on the application of interest, are discussed. 相似文献

8.

Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA

Dariusz Mrozek Miłosz Brożek Bożena Małysiak-Mrozek 《Journal of molecular modeling》2014,20(2):1-17

Searching for similar 3D protein structures is one of the primary processes employed in the field of structural bioinformatics. However, the computational complexity of this process means that it is constantly necessary to search for new methods that can perform such a process faster and more efficiently. Finding molecular substructures that complex protein structures have in common is still a challenging task, especially when entire databases containing tens or even hundreds of thousands of protein structures must be scanned. Graphics processing units (GPUs) and general purpose graphics processing units (GPGPUs) can perform many time-consuming and computationally demanding processes much more quickly than a classical CPU can. In this paper, we describe the GPU-based implementation of the CASSERT algorithm for 3D protein structure similarity searching. This algorithm is based on the two-phase alignment of protein structures when matching fragments of the compared proteins. The GPU (GeForce GTX 560Ti: 384 cores, 2GB RAM) implementation of CASSERT (“GPU-CASSERT”) parallelizes both alignment phases and yields an average 180-fold increase in speed over its CPU-based, single-core implementation on an Intel Xeon E5620 (2.40GHz, 4 cores). In this paper, we show that massive parallelization of the 3D structure similarity search process on many-core GPU devices can reduce the execution time of the process, allowing it to be performed in real time. GPU-CASSERT is available at: http://zti.polsl.pl/dmrozek/science/gpucassert/cassert.htm. 相似文献

9.

Chemical databases processing using parallel computer hardware

M F Lynch E M Rasmussen P Willett G A Manson G A Wilson 《Biochemical Society transactions》1989,17(5):856-858

相似文献

10.

GPUmotif: an ultra-fast and energy-efficient motif analysis program using graphics processing units

Zandevakili P Hu M Qin Z 《PloS one》2012,7(5):e36865

相似文献

11.

cuConv: CUDA implementation of convolution for CNN inference

Jord&#; Marc Valero-Lara Pedro Pe&#;a Antonio J. 《Cluster computing》2022,25(2):1459-1473

Cluster Computing - Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and... 相似文献

12.

Application of a parallel processor to molecular graphics

《Journal of molecular graphics》1984,2(2):56

相似文献

13.

Indoor objects detection system implementation using multi-graphic processing units

Afif Mouna Ayachi Riadh Atri Mohamed 《Cluster computing》2022,25(1):469-483

Cluster Computing - Indoor objects detection and recognition plays an important role in computer science and artificial intelligence fields. This task plays also a crucial role especially for blind... 相似文献

14.

Performance improvements for iterative electron tomography reconstruction using graphics processing units (GPUs)

Palenstijn WJ Batenburg KJ Sijbers J 《Journal of structural biology》2011,176(2):250-253

Iterative reconstruction algorithms are becoming increasingly important in electron tomography of biological samples. These algorithms, however, impose major computational demands. Parallelization must be employed to maintain acceptable running times. Graphics Processing Units (GPUs) have been demonstrated to be highly cost-effective for carrying out these computations with a high degree of parallelism. In a recent paper by Xu et al. (2010), a GPU implementation strategy was presented that obtains a speedup of an order of magnitude over a previously proposed GPU-based electron tomography implementation. In this technical note, we demonstrate that by making alternative design decisions in the GPU implementation, an additional speedup can be obtained, again of an order of magnitude. By carefully considering memory access locality when dividing the workload among blocks of threads, the GPU’s cache is used more efficiently, making more effective use of the available memory bandwidth. 相似文献

15.

Postdifferential display: parallel processing of candidates using small amounts of RNA

Poirier GM Erlander MG 《Methods (San Diego, Calif.)》1998,16(4):444-452

The necessity of screening differentially expressed candidate genes has imposed a limit on the application of differential display to large-scale analysis of gene expression patterns. Screening candidates has indeed proven a burden because traditional screening methods require the purification of large amounts of RNA. In this article we describe an assay that allows the screening of 240 candidate genes with only 5 microg of total RNA. This assay consists of using cDNA probes synthesized from amplified RNA in differential screening and can be performed in a 96-well plate format. 相似文献

16.

Learning and memory: parallel processing

Yates D 《Nature reviews. Neuroscience》2011,12(9):488

相似文献

17.

Production of complex nucleic acid libraries using highly parallel in situ oligonucleotide synthesis

Cleary MA Kilian K Wang Y Bradshaw J Cavet G Ge W Kulkarni A Paddison PJ Chang K Sheth N Leproust E Coffey EM Burchard J McCombie WR Linsley P Hannon GJ 《Nature methods》2004,1(3):241-248

Generation of complex libraries of defined nucleic acid sequences can greatly aid the functional analysis of protein and gene function. Previously, such studies relied either on individually synthesized oligonucleotides or on cellular nucleic acids as the starting material. As each method has disadvantages, we have developed a rapid and cost-effective alternative for construction of small-fragment DNA libraries of defined sequences. This approach uses in situ microarray DNA synthesis for generation of complex oligonucleotide populations. These populations can be recovered and either used directly or immortalized by cloning. From a single microarray, a library containing thousands of unique sequences can be generated. As an example of the potential applications of this technology, we have tested the approach for the production of plasmids encoding short hairpin RNAs (shRNAs) targeting numerous human and mouse genes. We achieved high-fidelity clone retrieval with a uniform representation of intended library sequences. 相似文献

18.

Microdroplet-enabled highly parallel co-cultivation of microbial communities 总被引：2，自引：0，他引：2

Park J Kerner A Burns MA Lin XN 《PloS one》2011,6(2):e17019

Microbial interactions in natural microbiota are, in many cases, crucial for the sustenance of the communities, but the precise nature of these interactions remain largely unknown because of the inherent complexity and difficulties in laboratory cultivation. Conventional pure culture-oriented cultivation does not account for these interactions mediated by small molecules, which severely limits its utility in cultivating and studying "unculturable" microorganisms from synergistic communities. In this study, we developed a simple microfluidic device for highly parallel co-cultivation of symbiotic microbial communities and demonstrated its effectiveness in discovering synergistic interactions among microbes. Using aqueous micro-droplets dispersed in a continuous oil phase, the device could readily encapsulate and co-cultivate subsets of a community. A large number of droplets, up to ～1,400 in a 10 mm × 5 mm chamber, were generated with a frequency of 500 droplets/sec. A synthetic model system consisting of cross-feeding E. coli mutants was used to mimic compositions of symbionts and other microbes in natural microbial communities. Our device was able to detect a pair-wise symbiotic relationship when one partner accounted for as low as 1% of the total population or each symbiont was about 3% of the artificial community. 相似文献

19.

Automating parallel implementation of neural learning algorithms

Rana OF 《International journal of neural systems》2000,10(3):227-241

Neural learning algorithms generally involve a number of identical processing units, which are fully or partially connected, and involve an update function, such as a ramp, a sigmoid or a Gaussian function for instance. Some variations also exist, where units can be heterogeneous, or where an alternative update technique is employed, such as a pulse stream generator. Associated with connections are numerical values that must be adjusted using a learning rule, and and dictated by parameters that are learning rule specific, such as momentum, a learning rate, a temperature, amongst others. Usually, neural learning algorithms involve local updates, and a global interaction between units is often discouraged, except in instances where units are fully connected, or involve synchronous updates. In all of these instances, concurrency within a neural algorithm cannot be fully exploited without a suitable implementation strategy. A design scheme is described for translating a neural learning algorithm from inception to implementation on a parallel machine using PVM or MPI libraries, or onto programmable logic such as FPGAs. A designer must first describe the algorithm using a specialised Neural Language, from which a Petri net (PN) model is constructed automatically for verification, and building a performance model. The PN model can be used to study issues such as synchronisation points, resource sharing and concurrency within a learning rule. Specialised constructs are provided to enable a designer to express various aspects of a learning rule, such as the number and connectivity of neural nodes, the interconnection strategies, and information flows required by the learning algorithm. A scheduling and mapping strategy is then used to translate this PN model onto a multiprocessor template. We demonstrate our technique using a Kohonen and backpropagation learning rules, implemented on a loosely coupled workstation cluster, and a dedicated parallel machine, with PVM libraries. 相似文献

20.

160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA)

Isaac TS Li Warren Shum Kevin Truong 《BMC bioinformatics》2007,8(1):185

Background

To infer homology and subsequently gene function, the Smith-Waterman (SW) algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain hundreds of millions of sequences, this algorithm becomes computationally expensive. 相似文献