Similar Documents
20 similar documents found (search time: 15 ms)
1.
Parallel computers offer a more cost-effective route to high-performance computing than traditional single processor machines. Software for such machines is still in its infancy, and they are often much more difficult to program than sequential machines. In addition, many of the algorithms which are successful with sequential and vector processors are no longer appropriate. Both the force calculation and integration steps of molecular dynamics are parallel in nature, and for that reason we have developed a parallel algorithm based on the link cell technique. This method is particularly efficient when the range of the intermolecular potential is much smaller than the dimensions of the simulation box. The details of the algorithm are presented for systems of atoms in two and three dimensions using a number of decompositions into sub-units. The algorithm has been tested on an Intel iPSC/2 and a Cray X-MP/416 and the results are presented for simulations of up to 2 · 10⁶ atoms.
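As a rough serial sketch of the link cell idea this algorithm builds on (an assumption-level illustration only; the paper's parallel decompositions across processors are not reproduced here), the box is divided into cells no smaller than the cutoff, so each atom is tested only against atoms in its own and adjacent cells:

```python
import numpy as np

def link_cell_pairs(pos, box, rcut):
    """Yield candidate pairs (i, j) via a link-cell grid: atoms are binned
    into cells of edge >= rcut, so each atom is tested only against its own
    and the 26 adjacent cells, giving O(N) cost instead of O(N^2) whenever
    rcut is much smaller than the box (the regime described above)."""
    ncell = int(box // rcut)                  # cells per side, edge >= rcut
    assert ncell >= 3, "cutoff too large relative to box for this sketch"
    size = box / ncell
    cells = {}
    for i, r in enumerate(pos):
        key = tuple((np.asarray(r) // size).astype(int) % ncell)
        cells.setdefault(key, []).append(i)
    for (cx, cy, cz), atoms in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    nb = ((cx + dx) % ncell, (cy + dy) % ncell, (cz + dz) % ncell)
                    for i in atoms:
                        for j in cells.get(nb, ()):
                            if i < j:         # each pair reported once
                                yield i, j
```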

2.
The NEURON simulation environment has been extended to support parallel network simulations. Each processor integrates the equations for its subnet over an interval equal to the minimum (interprocessor) presynaptic spike generation to postsynaptic spike delivery connection delay. The performance of three published network models with very different spike patterns exhibits superlinear speedup on Beowulf clusters and demonstrates that spike communication overhead is often less than the benefit of an increased fraction of the entire problem fitting into high speed cache. On the EPFL IBM Blue Gene, almost linear speedup was obtained up to 100 processors. Increasing one model from 500 to 40,000 realistic cells exhibited almost linear speedup on 2000 processors, with an integration time of 9.8 seconds and communication time of 1.3 seconds. The potential for speed-ups of several orders of magnitude makes practical the running of large network simulations that could otherwise not be explored.
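A schematic of the synchronization scheme the abstract describes, sketched with mpi4py; `subnet.integrate` and `subnet.enqueue` are hypothetical stand-ins, and NEURON's actual implementation differs in detail:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD

def run(subnet, tstop, min_delay):
    """Advance a distributed network in steps of the minimum interprocessor
    connection delay: a spike generated inside one interval cannot be
    delivered before the next interval begins, so ranks only need to
    exchange spikes once per step."""
    t = 0.0
    while t < tstop:
        spikes = subnet.integrate(t, t + min_delay)   # local integration (hypothetical API)
        # every rank receives every other rank's spikes for this interval
        for remote in comm.allgather(spikes):
            subnet.enqueue(remote)                    # schedule future deliveries (hypothetical API)
        t += min_delay
```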

3.
We investigated the usefulness of a parallel genetic algorithm for phylogenetic inference under the maximum-likelihood (ML) optimality criterion. Parallelization was accomplished by assigning each "individual" in the genetic algorithm "population" to a separate processor so that the number of processors used was equal to the size of the evolving population (plus one additional processor for the control of operations). The genetic algorithm incorporated branch-length and topological mutation, recombination, selection on the ML score, and (in some cases) migration and recombination among subpopulations. We tested this parallel genetic algorithm with large (228 taxa) data sets of both empirically observed DNA sequence data (for angiosperms) as well as simulated DNA sequence data. For both observed and simulated data, search-time improvement was nearly linear with respect to the number of processors, so the parallelization strategy appears to be highly effective at improving computation time for large phylogenetic problems using the genetic algorithm. We also explored various ways of optimizing and tuning the parameters of the genetic algorithm. Under the conditions of our analyses, we did not find the best-known solution using the genetic algorithm approach before terminating each run. We discuss some possible limitations of the current implementation of this genetic algorithm, as well as avenues for its future improvement.
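The one-individual-per-processor mapping can be sketched as follows (mpi4py; `ml_score` and `mutate` are hypothetical stand-ins, and the selection shown is simple truncation rather than the authors' scheme, which also used a control processor and optional migration):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def ga_generation(population, ml_score, mutate):
    """One generation with one individual (tree) per rank, echoing the
    population-size == processor-count mapping described above."""
    tree = mutate(population[rank])            # each rank owns one individual
    score = ml_score(tree)
    # gather (score, tree) everywhere so selection sees the full population
    scored = comm.allgather((score, tree))
    scored.sort(key=lambda st: st[0], reverse=True)
    best = [t for _, t in scored[: size // 2]]  # keep the best half
    return [best[i % len(best)] for i in range(size)]
```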

4.
A massively parallel Genetic Algorithm (GA) has been applied to RNA sequence folding on three different computer architectures. The GA, an evolution-like algorithm that is applied to a large population of RNA structures based on a pool of helical stems derived from an RNA sequence, evolves this population in parallel. The algorithm was originally designed and developed for a 16,384-processor SIMD (Single Instruction Multiple Data) MasPar MP-2. More recently it has been adapted to a 64-processor MIMD (Multiple Instruction Multiple Data) SGI ORIGIN 2000, and a 512-processor MIMD CRAY T3E. The MIMD version of the algorithm raises issues concerning RNA structure data layout and processor communication. In addition, the effects of population variation on the predicted results are discussed. Also presented are the scaling properties of the algorithm from the perspective of the number of physical processors utilized and the number of virtual processors (RNA structures) operated upon.
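A guess at how the stem-pool representation might look in code (purely illustrative; `compatible` is a hypothetical predicate, and the published GA also includes recombination and selection steps not shown):

```python
import random

def mutate_structure(structure, n_stems, compatible):
    """Toggle one helical stem in a candidate structure drawn from a fixed
    pool of n_stems stems derived from the sequence. compatible(i, j) is a
    hypothetical predicate that is True when stems i and j can coexist
    (no shared bases, no forbidden topology)."""
    s = set(structure)
    k = random.randrange(n_stems)
    if k in s:
        s.remove(k)                                   # delete a stem
    elif all(compatible(k, j) for j in s):
        s.add(k)                                      # add a stem if it fits
    return s
```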

5.
Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size.
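For reference, the scaling quantities discussed throughout this abstract are the standard ones; a minimal helper making the definitions explicit:

```python
def speedup(t_serial, t_parallel):
    # S(p) = T(1) / T(p)
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, p):
    # E(p) = S(p) / p; E near 1.0 means near-linear scaling, the behaviour
    # reported for multibuilding on the large cluster above
    return speedup(t_serial, t_parallel) / p
```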

6.
Abstract

An algorithm is described which allows Nonequilibrium Molecular Dynamics (NEMD) simulations of a fluid undergoing planar Couette flow (shear flow) to be carried out on a distributed memory parallel processor using a (spatial) domain decomposition technique. Unlike previous algorithms, this algorithm uses a co-moving, or Lagrangian, simulation box. Also, the shape of the simulation box changes throughout the course of the simulation. The algorithm, which can be used for two- or three-dimensional systems, has been tested on a Fujitsu AP1000 parallel computer with 128 processors.
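A generic sketch of remapping coordinates in a periodically sheared box, in the spirit of sliding-brick (Lees-Edwards) boundary conditions; the paper's co-moving, shape-changing Lagrangian box is handled differently in detail:

```python
import numpy as np

def wrap_sheared(pos, L, strain):
    """Wrap N x 2 (or N x 3) positions back into a periodic box of side L
    whose images above/below in y are offset in x by strain*L, where
    strain = gamma_dot * t is the accumulated shear."""
    pos = pos.copy()
    shift = np.floor(pos[:, 1] / L)              # how many boxes out in y
    pos[:, 0] -= shift * strain * L              # crossing y shifts the x image
    pos[:, 1] -= shift * L
    pos[:, 0] -= np.floor(pos[:, 0] / L) * L     # ordinary wrap in x
    return pos
```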

7.
Abstract

Systolic loop programs have been shown to be very efficient for molecular dynamics simulations of liquid systems on distributed memory parallel computers. The original methods address the case where the number of molecules simulated exceeds the number of processors used. Simulations of large flexible molecules often do not meet this condition, requiring the three- and four-body terms used to model chemical bonds within a molecule to be distributed over several processors. This paper discusses how the systolic loop methods may be generalised to accommodate such systems, and describes the implementation of a computer program for simulation of protein dynamics. Performance figures are given for this program running typical simulations on a Meiko Computing Surface using different numbers of processors.
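The basic systolic loop (before the paper's generalisation to intramolecular three- and four-body terms) can be sketched with mpi4py; `interact` is a hypothetical force-accumulation routine:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def systolic_sweep(local, interact):
    """Classic systolic loop: a travelling copy of each rank's molecule
    block visits every other rank in size-1 ring shifts, so every pair of
    blocks meets exactly once."""
    travelling = local
    for _ in range(size - 1):
        travelling = comm.sendrecv(travelling,
                                   dest=(rank + 1) % size,
                                   source=(rank - 1) % size)
        interact(local, travelling)   # accumulate forces between blocks
```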

8.
We compared the viscous fingering formation of hydrochloric acid (HCl) in rabbit corpus, antral, and duodenal mucins and in dextran under neutral and acidic conditions with respect to relative viscosity, molecular mass, and carbohydrate composition. The effect of desialylation of duodenal mucin on the viscous fingering formation of HCl was also examined. HCl (0.1 N) was injected into 1% solutions of mucins and dextran and the subsequent viscous fingering formation was assessed based on an influx volume rate of HCl. A low influx volume rate indicates a high ability of the solutions to produce viscous fingers. The influx volume rate of HCl was lowest in duodenal mucin, followed by corpus mucin, antral mucin, and dextran at pH 7. The influx volume rate of HCl was inversely correlated with the relative viscosity of the solution. Maximum molecular masses decreased in the order of corpus, antral, and duodenal mucins, and all were larger than that of dextran T2000. Rabbit gastrointestinal mucins were highly polydisperse systems. Duodenal mucin contains more sialic acid than gastric mucins, and the influx volume rate of HCl increased in desialylated duodenal mucin. It is suggested that the greater ability of gastric mucins than dextran to prevent HCl diffusion was due to differences in molecular mass. The ability of duodenal mucin to prevent HCl diffusion is probably attributable to its high sialic acid content, which may reflect a physiological role of duodenal mucin in the duodenum, which has to deal with HCl influx from the stomach.

9.
We report the design of and results obtained by using a field programmable gate array (FPGA) to digitally process optical Doppler tomography signals. The processor fits into the analog signal path in an existing optical coherence tomography setup. We demonstrate both Doppler frequency and envelope extraction using the Hilbert transform, all in a single FPGA. An FPGA implementation has certain advantages over a general purpose digital signal processor (DSP) because its processing elements operate in parallel, whereas the DSP is primarily a sequential processor.
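An offline NumPy/SciPy sketch of the Hilbert-transform processing described (the FPGA performs the equivalent operations in hardware; the sampling rate and signal names are illustrative):

```python
import numpy as np
from scipy.signal import hilbert

def envelope_and_doppler(signal, fs):
    """Form the analytic signal, then take its magnitude (envelope) and
    the derivative of its unwrapped phase (instantaneous Doppler
    frequency, in Hz) for a real-valued input sampled at fs Hz."""
    analytic = hilbert(signal)
    envelope = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic))
    freq = np.diff(phase) / (2 * np.pi) * fs
    return envelope, freq
```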

10.
Fully implicit parallel simulation of single neurons
When a multi-compartment neuron is divided into subtrees such that no subtree has more than two connection points to other subtrees, the subtrees can be on different processors and the entire system remains amenable to direct Gaussian elimination with only a modest increase in complexity. Accuracy is the same as with standard Gaussian elimination on a single processor. It is often feasible to divide a 3-D reconstructed neuron model onto a dozen or so processors and experience almost linear speedup. We have also used the method for purposes of load balance in network simulations when some cells are so large that their individual computation time is much longer than the average processor computation time or when there are many more processors than cells. The method is available in the standard distribution of the NEURON simulation program.
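The single-processor baseline referred to is direct Gaussian elimination on the cable matrix; for an unbranched cable this reduces to the Thomas algorithm, sketched below (the branched-tree and multi-processor cases add bookkeeping not shown):

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system, the unbranched-cable special case of
    the direct Gaussian elimination mentioned above.
    a: sub-diagonal (n-1), b: diagonal (n), c: super-diagonal (n-1),
    d: right-hand side (n). Returns the solution vector x."""
    n = len(b)
    b, d = list(b), list(d)
    for i in range(1, n):                 # forward elimination
        m = a[i - 1] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    x = [0.0] * n
    x[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i]
    return x
```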

11.
MOTIVATION: Analysis of large biological data sets using a variety of parallel processor computer architectures is a common task in bioinformatics. The efficiency of the analysis can be significantly improved by properly handling redundancy present in these data combined with taking advantage of the unique features of these compute architectures. RESULTS: We describe a generalized approach to this analysis, but present specific results using the program CEPAR, an efficient implementation of the Combinatorial Extension algorithm in a massively parallel (PAR) mode for finding pairwise protein structure similarities and aligning protein structures from the Protein Data Bank. CEPAR design and implementation are described and results provided for the efficiency of the algorithm when run on a large number of processors. AVAILABILITY: Source code is available by contacting one of the authors.

12.
We present a novel algorithm for the efficient generation of high-quality space-filling molecular graphics that is particularly appropriate for the creation of the large number of images needed in the animation of molecular dynamics. Each atom of the molecule is represented by a sphere of an appropriate radius, and the image of the sphere is constructed pixel-by-pixel using a generalization of the lighting model proposed by Porter (Comp. Graphics 1978, 12, 282). The edges of the spheres are antialiased, and intersections between spheres are handled through a simple blending algorithm that provides very smooth edges. We have implemented this algorithm on a multiprocessor computer using a procedure that dynamically repartitions the effort among the processors based on the CPU time used by each processor to create the previous image. This dynamic reallocation among processors automatically maximizes efficiency in the face of both the changing nature of the image from frame to frame and the shifting demands of the other programs running simultaneously on the same processors. We present data showing the efficiency of this multiprocessing algorithm as the number of processors is increased. The combination of the graphics and multiprocessor algorithms allows the fast generation of many high-quality images.
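The dynamic repartitioning rule described, splitting the next frame in inverse proportion to each processor's time on the previous frame, reduces to a few lines (a sketch with illustrative names):

```python
def repartition(total_rows, prev_times):
    """Split an image's scanlines among processors in inverse proportion
    to each processor's measured time on the previous frame: a processor
    that ran slower (busier node, harder sub-image) gets fewer rows."""
    rates = [1.0 / t for t in prev_times]          # effective speed estimates
    total = sum(rates)
    shares = [int(total_rows * r / total) for r in rates]
    shares[-1] += total_rows - sum(shares)         # rounding leftover to last
    return shares                                  # rows per processor
```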

13.
The increases in multi-core processor parallelism and in the flexibility of many-core accelerator processors, such as GPUs, have turned traditional SMP systems into hierarchical, heterogeneous computing environments. Fully exploiting these improvements in parallel system design remains an open problem. Moreover, most of the current tools for the development of parallel applications for hierarchical systems concentrate on the use of only a single processor type (e.g., accelerators) and do not coordinate several heterogeneous processors. Here, we show that making use of all of the heterogeneous computing resources can significantly improve application performance. Our approach, which consists of optimizing applications at run-time by efficiently coordinating application task execution on all available processing units is evaluated in the context of replicated dataflow applications. The proposed techniques were developed and implemented in an integrated run-time system targeting both intra- and inter-node parallelism. The experimental results with a real-world complex biomedical application show that our approach nearly doubles the performance of the GPU-only implementation on a distributed heterogeneous accelerator cluster.

14.
The study of fluid fronts formed in porous media is important for enhanced oil recovery. The purposes of this study are to simulate waterflooding and to investigate influential factors on the fluid front movement through a micro-fracture and through simple porous media with different grain geometries. This study used the Shan–Chen form of the Lattice Boltzmann Method (LBM). An increase in the velocity is found to result in viscous fingering, whereas an increase in the wettability of the displacing fluid and the dynamic viscosity ratio creates a piston form of the fluid front. In porous media with the same porosities, various geometries act differently as obstacles against fluid flow from the inlet to the outlet. By enlarging the cross-sectional area of grains in the fluid paths and making them more tortuous, narrower and more twisted films of viscous fingering are formed. The sweep efficiency was also determined under various conditions: with a fixed capillary number, neutral wettability, and different viscosity ratios; and with a fixed capillary number, a viscosity ratio of 1/3, and wet or non-wet conditions. In all cases, the best sweep efficiency was obtained with grains of diamond geometry. Generally, the least sweep efficiency occurs with grains of star geometry. Simulation results verified the strength and accuracy of LBM predictions.
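For context, the Shan–Chen interaction enters through a pseudopotential force; its standard textbook form (not taken from this paper, whose multicomponent setup applies one such term per fluid pair) is

\[ \mathbf{F}(\mathbf{x}) \,=\, -\,G\,\psi(\mathbf{x})\sum_i w_i\,\psi(\mathbf{x}+\mathbf{e}_i\,\Delta t)\,\mathbf{e}_i , \]

where \(\psi(\rho)\) is the pseudopotential, \(G\) sets the strength and sign of the interaction (controlling attraction or repulsion between phases), and \(\mathbf{e}_i\), \(w_i\) are the lattice velocities and weights.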

15.
Linear regression analysis is considered the least computationally demanding method for mapping quantitative trait loci (QTL). However, simultaneous search for multiple QTL, the use of permutations to obtain empirical significance thresholds, and larger experimental studies significantly increase the computational demand. This report describes an easily implemented parallel algorithm, which significantly reduces the computing time in both QTL mapping and permutation testing. In the example provided, the analysis time was decreased to less than 15% of a single processor system by the use of 18 processors. We indicate how the efficiency of the analysis could be improved by distributing the computations more evenly to the processors and how other ways of distributing the data facilitate the use of more processors. The use of parallel computing in QTL mapping makes it possible to routinely use permutations to obtain empirical significance thresholds for multiple traits and multiple QTL models. It could also be of use to improve the computational efficiency of the more computationally demanding QTL analysis methods.
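A sketch of the permutation-distribution idea with mpi4py; `max_statistic` is a hypothetical stand-in for one genome-wide regression scan, and the report's actual data distribution differs:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def empirical_threshold(phenos, genos, max_statistic, n_perm=1000, alpha=0.05):
    """Each rank shuffles phenotypes independently and records the maximum
    test statistic over the genome; pooling the per-permutation maxima
    across ranks yields the empirical genome-wide threshold."""
    rng = np.random.default_rng(seed=rank)          # distinct stream per rank
    local = [max_statistic(rng.permutation(phenos), genos)
             for _ in range(n_perm // size)]
    maxima = np.concatenate(comm.allgather(np.asarray(local)))
    return np.quantile(maxima, 1.0 - alpha)
```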

16.
MPI collective communication operations to distribute or gather data are used for many parallel applications from scientific computing, but they may lead to scalability problems since their execution times increase with the number of participating processors. In this article, we show how the execution time of collective communication operations can be improved significantly by an internal restructuring based on orthogonal processor structures with two or more levels. The execution time of operations like MPI_Bcast() or MPI_Allgather() can be reduced by 40% and 70% on a dual Xeon cluster and a Beowulf cluster with single-processor nodes. A significant performance improvement can also be obtained on a Cray T3E by a careful selection of the processor structure. The use of these optimized communication operations can reduce the execution time of data parallel implementations of complex application programs significantly without requiring any other change of the computation and communication structure. We present runtime functions for the modeling of two-phase realizations and verify that these runtime functions can predict the execution time both for communication operations in isolation and in the context of application programs.
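The orthogonal restructuring can be sketched with sub-communicators in mpi4py: one broadcast over p processors becomes a broadcast down the root's column followed by broadcasts across each row of a virtual rows x cols grid (a two-level sketch only; the article's runtime modeling is not shown):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def grid_bcast(data, rows, cols, root=0):
    """Two-phase broadcast over a rows x cols processor grid, replacing a
    single size-p broadcast by one size-rows and one size-cols broadcast."""
    assert rows * cols == size
    r, c = divmod(rank, cols)
    row_comm = comm.Split(color=r, key=c)          # ranks sharing a row
    col_comm = comm.Split(color=c, key=r)          # ranks sharing a column
    if c == root % cols:                           # phase 1: down the root column
        data = col_comm.bcast(data, root=root // cols)
    return row_comm.bcast(data, root=root % cols)  # phase 2: across each row
```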

17.
A fractal model for the characterization of mycelial morphology
A new technique based on a fractal model has been developed for the quantification of the macroscopic morphology of mycelia. The morphological structuring is treated as a fractal object, and the fractal dimension, determined by an ultrasonic scattering procedure developed for the purpose, serves as a quantitative morphological index. Experimental observations reported earlier and simulations of mycelial growth, carried out using a probabilistic-geometric growth model developed for the purpose, both validate the applicability of the fractal model. In experiments with three different species, the fractal dimensions of pelletous structures were found to be in the range 1.45-2.0 and those of filamentous structures were in the range 1.9-2.7, with values around 2.0 representing mixed morphologies. Fractal dimensions calculated from simulated mycelia are in rough agreement with these ranges. The fractal dimension is also found to be relatively insensitive to the biomass concentration, as seen by dilution of the original broths. The relation between morphology and filtration properties of the broths has also been studied. The fractal dimension shows a strong correlation with the index of cake compressibility and with the Kozeny constant, two filtration parameters that are known to be morphology dependent. This technique could thus be used to develop correlations between the morphology, represented by the fractal dimension, and important morphology-dependent process variables.
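The paper determines the fractal dimension by ultrasonic scattering; as a generic illustration of the quantity itself, a box-counting estimate on a binary image of a mycelium might look like this (assumption-level code, not the authors' procedure):

```python
import numpy as np

def box_counting_dimension(mask, sizes=(2, 4, 8, 16, 32)):
    """Estimate a fractal dimension from a 2-D boolean image by counting
    occupied boxes N(s) at several box sizes s and fitting
    log N(s) = -D log s + c. Assumes the structure occupies at least one
    box at every scale."""
    counts = []
    for s in sizes:
        h, w = (mask.shape[0] // s) * s, (mask.shape[1] // s) * s
        blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(blocks.any(axis=(1, 3)).sum())
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope
```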

18.
Reverse computation is presented here as an important future direction in addressing the challenge of fault tolerant execution on very large cluster platforms for parallel computing. As the scale of parallel jobs increases, traditional checkpointing approaches suffer scalability problems ranging from computational slowdowns to high congestion at the persistent stores for checkpoints. Reverse computation can overcome such problems and is also better suited for parallel computing on newer architectures with smaller, cheaper or energy-efficient memories and file systems. Initial evidence for the feasibility of reverse computation in large systems is presented with detailed performance data from a particle (ideal gas) simulation scaling to 65,536 processor cores and 950 accelerators (GPUs). Reverse computation is observed to deliver very large gains relative to checkpointing schemes when nodes rely on their host processors/memory to tolerate faults at their accelerators. A comparison between reverse computation and checkpointing with measurements such as cache miss ratios, TLB misses and memory usage indicates that reverse computation is hard to ignore as a future alternative to be pursued in emerging architectures.
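The core idea, pairing every forward event with an exact inverse so that no state checkpoint is ever taken, can be shown on a toy event (the state and event fields are illustrative, not from the cited simulation):

```python
def forward(state, event):
    """Apply an event destructively but reversibly: each change is either
    self-inverting (the swap) or invertible (the += delta), so recovery
    never needs a saved copy of 'state'."""
    state["energy"] += event["delta"]
    i, j = event["i"], event["j"]
    state["cells"][i], state["cells"][j] = state["cells"][j], state["cells"][i]

def reverse(state, event):
    """Exact inverse of forward(); replaying events backwards restores the
    pre-fault state instead of reloading a checkpoint."""
    i, j = event["i"], event["j"]
    state["cells"][i], state["cells"][j] = state["cells"][j], state["cells"][i]
    state["energy"] -= event["delta"]
```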

19.
Molecular dynamics (MD) simulation methods have seen significant improvement since their inception in the late 1950s. Constraints of simulation size and duration that once impeded the field have lessened with the advent of better algorithms, faster processors, and parallel computing. With newer techniques and hardware available, MD simulations of more biologically relevant timescales can now sample a broader range of conformational and dynamical changes including rare events. One concern in the literature has been under which circumstances it is sufficient to perform many shorter timescale simulations and under which circumstances fewer longer simulations are necessary. Herein, our simulations of the zinc finger NEMO (2JVX) using multiple simulations of length 15, 30, 1000, and 3000 ns are analyzed to provide clarity on this point.

20.
The fractal dimension of road networks emerges as a measure of the complexity of road transport infrastructures. In this study, we measured fractal dimensions of both the geometric form (i.e., the layout of the roads) and the structural hierarchy (i.e., the connections among roads) of the major road networks in the largest 95 U.S. metro areas. We explained the causes of the variances in these fractal dimensions, especially the one for structural hierarchy. Further, we hypothesized the impacts of these fractal dimensions on the urban built environment and validated our hypotheses using path analysis. We found that a larger geometric fractal dimension (Dg) indicates a more uniform distribution of roads over the metro area, which provides accessibility to suburban areas and incentives for low-density development. A larger structural fractal dimension (Ds) indicates that highly-connected roads (e.g., highways) tend to join other highly-connected roads, so that most roads can be reached through a small number of neighboring roads (i.e., the small-world phenomenon). As Ds increases and the small-world effect becomes more significant, daily vehicle miles traveled per capita (DVMT/Cap) decline. However, Ds should be kept low in order to reduce the DVMT/Cap as population size increases. We consider that a low Ds can contribute to development that is more mixed, polycentric, and more uniform on an urban-area-wide basis. Overall, higher Dg and Ds of the major road network in a metro area lead to higher per capita carbon emissions from transport, and lower quality of life as population increases. In the end, we conclude that fractal dimensions can provide valuable insight into the nature of the transportation-land use nexus.
