Similar Articles
20 similar articles found (search time: 15 ms)
1.
Diffusion limited aggregation (DLA) has proved very successful in modelling systems which display fractal characteristics, like viscous fingering. However, by nature, such simulations are very processor intensive, requiring large amounts of processor time even for relatively small models. We have performed simulations of viscous fingering on the NCUBE parallel computer, which has a hypercube architecture. We find that, as long as the number of processors used is much less than both the total number of walkers released and the overall dimensions of the model, the serial and parallel algorithms yield similar fractal dimensions, while the parallel implementation achieves a considerable speed-up. An average fractal dimension of 1.71 was obtained, along with a best-case speed-up of 106 (83% efficiency) using 128 processors.
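As background, the core serial DLA loop that such work parallelizes looks roughly as follows; the lattice size, walker count, launch rule and sticking rule here are illustrative choices, not the paper's NCUBE implementation:

```python
import random

def dla(grid=101, walkers=500, seed=1):
    """Serial DLA: random walkers wander until they touch the aggregate."""
    rng = random.Random(seed)
    c = grid // 2
    stuck = {(c, c)}                            # seed particle at the centre
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    for _ in range(walkers):
        # launch each walker from a uniformly random boundary site
        side = rng.randrange(4)
        x, y = [(0, rng.randrange(grid)), (grid - 1, rng.randrange(grid)),
                (rng.randrange(grid), 0), (rng.randrange(grid), grid - 1)][side]
        while not any((x + dx, y + dy) in stuck for dx, dy in moves):
            dx, dy = rng.choice(moves)
            x = min(max(x + dx, 0), grid - 1)   # stay inside the box
            y = min(max(y + dy, 0), grid - 1)
        stuck.add((x, y))                       # stick next to the cluster
    return stuck
```

A parallel version releases walkers on separate processors, which is why the validity condition above requires far more walkers (and a far larger lattice) than processors.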

2.
Reverse computation is presented here as an important future direction in addressing the challenge of fault tolerant execution on very large cluster platforms for parallel computing. As the scale of parallel jobs increases, traditional checkpointing approaches suffer scalability problems ranging from computational slowdowns to high congestion at the persistent stores for checkpoints. Reverse computation can overcome such problems and is also better suited for parallel computing on newer architectures with smaller, cheaper or energy-efficient memories and file systems. Initial evidence for the feasibility of reverse computation in large systems is presented with detailed performance data from a particle (ideal gas) simulation scaling to 65,536 processor cores and 950 accelerators (GPUs). Reverse computation is observed to deliver very large gains relative to checkpointing schemes when nodes rely on their host processors/memory to tolerate faults at their accelerators. A comparison between reverse computation and checkpointing with measurements such as cache miss ratios, TLB misses and memory usage indicates that reverse computation is hard to ignore as a future alternative to be pursued in emerging architectures.
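As a minimal illustration of the idea (not the paper's ideal-gas code), the sketch below pairs a forward timestep with its algebraic inverse, so that faulted work can be undone by computation rather than by restoring a checkpoint:

```python
def forward(state, dt):
    """One timestep: drift particle positions along their velocities."""
    state["x"] = [x + v * dt for x, v in zip(state["x"], state["v"])]

def reverse(state, dt):
    """Algebraic inverse of forward(): rollback without a checkpoint.
    (Production reverse computation uses bit-exact reversible arithmetic,
    e.g. integer/fixed-point state, so the inverse is exact; with floats
    the undo is only approximate.)"""
    state["x"] = [x - v * dt for x, v in zip(state["x"], state["v"])]

state = {"x": [0.0, 1.0], "v": [0.5, -0.25]}
for _ in range(1000):
    forward(state, dt=0.01)
# a fault is detected in work from the last 3 steps: undo them in place
for _ in range(3):
    reverse(state, dt=0.01)
```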

3.
Particle swarm optimization (PSO) has previously been parallelized primarily by distributing the computation corresponding to particles across multiple processors. In these approaches, the only benefit of additional processors is an increased swarm size. However, in many cases this is not efficient when scaled to very large swarm sizes (on very large clusters). Current methods cannot satisfactorily answer the question: "How can 1000 processors be fully utilized when 50 or 100 particles is the most efficient swarm size?" In this paper we attempt to answer that question with a speculative approach to the parallelization of PSO that we refer to as SEPSO.
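For context, the canonical per-iteration update that such parallelizations distribute is sketched below; the inertia and acceleration constants are common defaults, not SEPSO's settings:

```python
import random

def pso_step(xs, vs, pbest, gbest, w=0.72, c1=1.49, c2=1.49):
    """One synchronous PSO iteration: pull every particle's velocity
    toward its personal best and the global best, then move it."""
    for i in range(len(xs)):
        for d in range(len(xs[i])):
            r1, r2 = random.random(), random.random()
            vs[i][d] = (w * vs[i][d]
                        + c1 * r1 * (pbest[i][d] - xs[i][d])
                        + c2 * r2 * (gbest[d] - xs[i][d]))
            xs[i][d] += vs[i][d]
```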

4.
With the increasing interest in large-scale, high-resolution and real-time geographic information system (GIS) applications and spatial big data processing, traditional GIS is not efficient enough to handle the required loads due to limited computational capabilities. Various attempts have been made to adopt high performance computing techniques from other domains, such as designs of advanced architectures, data partition strategies and direct parallelization of spatial analysis algorithms, to address these challenges. This paper surveys the current state of parallel GIS with respect to parallel GIS architectures, parallel processing strategies, and related topics. We present the general evolution of the GIS architecture, including the two main parallel GIS architectures based on high performance computing clusters and Hadoop clusters. We then summarize current spatial data partition strategies, the key methods for realizing parallel GIS from the perspective of data decomposition, and progress on specialized parallel GIS algorithms. We use the parallel processing of GRASS as a case study. We also identify key problems and promising future research directions for parallel GIS.

5.
Recently, PC clusters have been studied intensively as next-generation large-scale parallel computers. ATM technology is a strong candidate as a de facto standard for high speed communication networks. An ATM-connected PC cluster is therefore a promising platform, from the cost/performance point of view, as a future high performance computing environment. Data intensive applications, such as data mining and ad hoc query processing in databases, are considered very important for massively parallel processors, as well as conventional scientific calculations, so investigating the feasibility of such applications on an ATM-connected PC cluster is worthwhile. In this paper, an ATM-connected PC cluster consisting of 100 PCs is reported, and the characteristics of a transport layer protocol for the cluster are evaluated. Point-to-point communication performance is measured and discussed as the TCP window size parameter is varied. Parallel data mining is implemented and evaluated on the cluster. Retransmission caused by cell loss at the ATM switch is analyzed, and retransmission parameters suitable for parallel processing on a large scale PC cluster are identified. The default TCP protocol cannot provide good performance, since many collisions occur during all-to-all multicast on the large scale PC cluster. With the proposed TCP parameter optimization, the performance of parallel data mining on 100 PCs is improved.
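The TCP window tuning discussed above usually comes down to sizing the socket buffers. A minimal sketch (256 KiB is an illustrative value, not the paper's tuned parameter):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Larger socket buffers raise the window TCP can offer; the kernel may
# round or cap whatever is requested.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 256 * 1024)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))  # granted size
```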

6.
7.
We present POY version 5, an open source program for the phylogenetic analysis of diverse data types including qualitative data, aligned sequences, unaligned sequences, genomic data, and user-defined sequences. In addition to the maximum-parsimony optimality criterion supported by POY4, POY5 supports several types of maximum likelihood as well as posterior probability. To make these analyses feasible, new heuristic search algorithms and parallelization options have been implemented for all criteria.

8.
The speed of analytical algorithms becomes increasingly important as systematists accumulate larger data sets. In this paper I discuss several time-saving modifications to published Fitch-parsimony tree search algorithms, including shortcuts that allow rapid evaluation of tree lengths and fast reoptimization of trees after clipping or joining of subtrees, as well as search strategies that allow one to successively increase the exhaustiveness of branch swapping. I also describe how Fitch-parsimony algorithms can be restructured to take full advantage of the computing power of modern microprocessors by horizontal or vertical packing of characters, allowing simultaneous processing of many characters, and by avoidance of conditional branches that disturb instruction flow. These new multicharacter algorithms are particularly useful for large data sets of characters with a small number of states, such as nucleotide characters. As an example, the multicharacter algorithms are estimated to be 3.6–10 times faster than single-character equivalents on a PowerPC 604. The speed gain is even larger on processors with MMX, AltiVec or similar technologies that allow single instructions to be performed on multiple data simultaneously.
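The horizontal packing idea can be sketched for nucleotide data as follows: 16 characters are packed into one 64-bit word, one bit per state (e.g. A=1, C=2, G=4, T=8) in each 4-bit slot, and the Fitch downpass for all 16 characters is performed with a handful of branch-free bitwise operations. The bit-twiddling below is a standard zero-nibble test, not necessarily the paper's exact code:

```python
LOW3 = 0x7777777777777777   # low 3 bits of every nibble
HIGH = 0x8888888888888888   # high bit of every nibble
M64  = 0xFFFFFFFFFFFFFFFF

def fitch_word(left, right):
    """Fitch downpass on 16 packed nucleotide characters (4 state bits each).

    Per nibble: intersection of the children's state sets if non-empty,
    otherwise their union at a cost of one extra step."""
    inter = left & right
    # High bit of a nibble ends up set iff that nibble of `inter` is non-empty.
    nonzero = ((((inter & LOW3) + LOW3) & M64) | inter) & HIGH
    keep = nonzero | (nonzero >> 1) | (nonzero >> 2) | (nonzero >> 3)
    parent = (inter & keep) | ((left | right) & ~keep & M64)
    extra_steps = 16 - bin(nonzero).count("1")   # number of empty nibbles
    return parent, extra_steps

# Pad unused character slots with all states (0xF) so they never add cost.
pad = HIGH | LOW3
parent, cost = fitch_word((pad ^ 0xF) | 0x1, (pad ^ 0xF) | 0x2)
assert parent & 0xF == 0x3 and cost == 1   # {A} vs {C}: union {A,C}, one step
```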

9.
MPI collective communication operations to distribute or gather data are used by many parallel applications in scientific computing, but they may lead to scalability problems since their execution times increase with the number of participating processors. In this article, we show how the execution time of collective communication operations can be improved significantly by an internal restructuring based on orthogonal processor structures with two or more levels. The execution time of operations like MPI_Bcast() or MPI_Allgather() can be reduced by 40% and 70%, respectively, on a dual Xeon cluster and a Beowulf cluster with single-processor nodes. A significant performance improvement can also be obtained on a Cray T3E by a careful selection of the processor structure. The use of these optimized communication operations can reduce the execution time of data parallel implementations of complex application programs significantly without requiring any other change to the computation and communication structure. We present runtime functions for modeling the two-phase realizations and verify that these functions can predict the execution time both for communication operations in isolation and in the context of application programs.
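A two-phase broadcast over an orthogonal two-level processor structure can be sketched with mpi4py as follows; the 4-column grid is an arbitrary example, whereas the paper selects the structure per machine:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
cols = 4                                  # illustrative grid width; assumes size % cols == 0
row, col = divmod(rank, cols)

# Orthogonal processor structure: one communicator per row, one per column.
row_comm = comm.Split(color=row, key=col)
col_comm = comm.Split(color=col, key=row)

data = {"payload": 42} if rank == 0 else None
if row == 0:
    data = row_comm.bcast(data, root=0)   # phase 1: across the root's row
data = col_comm.bcast(data, root=0)       # phase 2: down every column
```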

10.
A Load Balancing Tool for Distributed Parallel Loops
Large scale applications typically contain parallel loops with many iterates. The iterates of a parallel loop may have variable execution times, which translate into performance degradation of an application due to load imbalance. This paper describes a tool for load balancing parallel loops on distributed-memory systems. The tool assumes that the data for a parallel loop to be executed is already partitioned among the participating processors. The tool utilizes the MPI library for interprocessor coordination, and determines processor workloads by loop scheduling techniques. The tool was designed to be independent of any application; hence, it must be supplied with a routine that encapsulates the computations for a chunk of loop iterates, as well as routines to transfer data and results between processors. Performance evaluation on a Linux cluster indicates that the tool reduces the cost of executing a simulated irregular loop by up to 81% compared with execution without load balancing. The tool is useful for parallelizing sequential applications with parallel loops, or as an alternative load balancing routine for existing parallel applications.
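The paper does not commit to one scheduling technique, but guided self-scheduling is a representative example of how such a tool can compute chunk sizes for its workers; a sketch:

```python
def guided_chunks(n_iters, n_procs, min_chunk=1):
    """Yield chunk sizes under guided self-scheduling: each request gets
    roughly 1/p of the remaining iterates, so early chunks are large and
    late chunks small, smoothing out variable iterate costs."""
    remaining = n_iters
    while remaining > 0:
        chunk = min(remaining, max(min_chunk, remaining // n_procs))
        yield chunk
        remaining -= chunk

print(list(guided_chunks(1000, 8)))   # [125, 109, 95, 83, ...]
```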

11.
In discrete tomography, a scanned object is assumed to consist of only a few different materials. This prior knowledge can be effectively exploited by a specialized discrete reconstruction algorithm such as the Discrete Algebraic Reconstruction Technique (DART), which is capable of providing more accurate reconstructions from limited data than conventional reconstruction algorithms. However, like most iterative reconstruction algorithms, DART suffers from long computation times. To increase both the computational efficiency and the reconstruction quality of DART, a multiresolution version of DART (MDART) is proposed, in which the reconstruction starts on a coarse grid with a large pixel (voxel) size. The resulting reconstruction is then resampled on a finer grid and used as the initial point for a subsequent DART reconstruction. This process continues until the target pixel size is reached. Experiments show that MDART can provide a significant speed-up, reduce missing wedge artefacts and improve feature reconstruction compared with DART in the same computation time, making its use with large datasets more feasible.
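A control-flow sketch of the coarse-to-fine scheme follows; the `reconstruct` callback is hypothetical and stands in for a full DART pass at one resolution, since real DART also needs the projection geometry:

```python
import numpy as np

def multires_reconstruct(projections, target_shape, reconstruct, levels=3):
    """Coarse-to-fine control flow in the style of MDART (sketch only).

    `reconstruct(projections, shape, init)` is a hypothetical callback
    performing one DART reconstruction at a fixed grid resolution."""
    image = None
    for level in reversed(range(levels)):            # largest pixels first
        shape = tuple(s // 2 ** level for s in target_shape)
        if image is not None:
            image = np.kron(image, np.ones((2, 2)))  # resample onto finer grid
            image = image[:shape[0], :shape[1]]
        image = reconstruct(projections, shape, init=image)
    return image
```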

12.
We investigated the usefulness of a parallel genetic algorithm for phylogenetic inference under the maximum-likelihood (ML) optimality criterion. Parallelization was accomplished by assigning each "individual" in the genetic algorithm "population" to a separate processor, so that the number of processors used was equal to the size of the evolving population (plus one additional processor for the control of operations). The genetic algorithm incorporated branch-length and topological mutation, recombination, selection on the ML score, and (in some cases) migration and recombination among subpopulations. We tested this parallel genetic algorithm with large (228 taxa) data sets of both empirically observed DNA sequence data (for angiosperms) and simulated DNA sequence data. For both observed and simulated data, search-time improvement was nearly linear with respect to the number of processors, so the parallelization strategy appears to be highly effective at improving computation time for large phylogenetic problems using the genetic algorithm. We also explored various ways of optimizing and tuning the parameters of the genetic algorithm. Under the conditions of our analyses, we did not find the best-known solution using the genetic algorithm approach before terminating each run. We discuss some possible limitations of the current implementation of this genetic algorithm, as well as avenues for its future improvement.

13.
We describe an optimized algorithm, faster and more accurate than previously described algorithms, for computing the statistical mechanics of denaturation of nucleic acid sequences according to the classical Poland-Scheraga type of model. Nearest neighbor thermodynamics is included in a complete and general way, by rigorously treating nearest neighbor interactions, helix end interactions, and isolated base-pairs. This avoids the simplifications of previous approaches and achieves full generality and controllability with respect to thermodynamic modeling. The algorithm computes subchain partition functions by recursion, from which various quantitative aspects of the melting process are easily derived, for example the base-pairing probability profiles. The algorithm represents an optimization, with respect to algorithmic complexity, of the partition function algorithm of Yeramian et al. (Biopolymers 1990, 30, 481-497): we reduce the computation time for a base-pairing probability profile from O(N²) to O(N), where N is the sequence length. This speed-up comes in addition to the speed-up due to a multiexponential approximation of the loop entropy factor, as introduced by Fixman and Freire and applied by Yeramian et al. The speed-up, however, is independent of the multiexponential approximation and reduces time from O(N³) to O(N²) in the exact case. A method for representing very large numbers is described, which avoids numerical overflow in the partition functions for genomic length sequences. In addition to calculating the standard base-pairing probability profiles, we propose to use the algorithm to calculate various other probabilities (loops, helices, tails) for a more direct view of the melting regions and their positions and sizes. This can provide a better understanding of the physics of denaturation and the biology of genomes.
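One standard way to keep genome-scale partition functions from overflowing, which may differ in detail from the authors' large-number representation, is to accumulate them in log space:

```python
import math

def log_add(a, b):
    """log(exp(a) + exp(b)) without overflow; partition function terms
    can be summed as log-values even for genomic-length sequences."""
    if a < b:
        a, b = b, a
    return a + math.log1p(math.exp(b - a))

# Adding two Boltzmann weights whose raw values would overflow a double:
logZ = log_add(800.0, 799.0)   # exp(800) alone overflows; logZ ~= 800.31
```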

14.
Using a triangular lattice model to study the designability of protein folding, we overcame the parity problem of previous cubic lattice models and enumerated all sequences and compact structures on a simple two-dimensional triangular lattice of size 4-5-6-5-4. We used two types of amino acids, hydrophobic and polar, to make up the sequences, giving 2²³ + 2¹² distinct sequences after excluding reverse-symmetric duplicates. The total number of distinct compact structures, excluding reflection symmetry, was 219,093 for self-avoiding paths of length 24 on this lattice. Based on this model, we applied a fast search algorithm built on a cluster tree; the algorithm reduces computation by evaluating the objective energy at non-leaf nodes. Parallel experiments showed that the fast tree search algorithm yields an exponential speed-up on the 4-5-6-5-4 model. A designability analysis was performed to interpret the search results.

15.

Background  

Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories based on the data structures they employ: the first class uses an overlap/string graph and the second uses a de Bruijn graph. With recent advances in short read sequencing technology, however, de Bruijn graph based algorithms play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm was given for this problem, where n is the size of the input and p is the number of processors. That algorithm enumerates all possible bi-directed edges which can overlap with a node, and ends up generating Θ(nΣ) messages, Σ being the size of the alphabet.
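A serial sketch of the basic construction (nodes are (k-1)-mers, edges are k-mers); the paper's parallel algorithm additionally handles bi-directed edges and reverse complements, which are omitted here:

```python
from collections import defaultdict

def build_debruijn(reads, k):
    """Each k-mer contributes one edge from its (k-1)-mer prefix to its
    (k-1)-mer suffix; parallel edge multiplicities are kept."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph

g = build_debruijn(["ACGTAC", "CGTACG"], k=3)
# g["AC"] == ["CG", "CG"], g["CG"] == ["GT", "GT"], ...
```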

16.
The phylogeny of megachiropteran bats (Mammalia: Chiroptera: Pteropodidae) has been investigated using several different molecular datasets. These studies differed widely in taxonomic and locus sampling, and their results tended to lack resolution of internal nodes and were themselves largely incongruent. To address this, we assembled a data set of 5 loci (up to 3.5 kbp from 12S rDNA, 16S rDNA, tDNA-valine, cytochrome b, and the nuclear gene c-mos) for 43 species of megachiropterans and 6 microchiropteran outgroups. We analyzed these data with direct optimization under equal costs for substitutions and indels. We used POY in a parallel setting, and searches consisted of replicated swapping + refinements (ratcheting, tree fusing, and iterative pass optimization). Our results indicate that Megachiroptera and all recognized genera (including Pteropus) are monophyletic, and that Melonycteris is the sister group of the clade containing all the other genera. Clades previously proposed using molecular data, as well as many new and traditional groups, were well-supported, and various sources suggest that the degree of conflict with morphological data may be considerably less marked than previously supposed. Analyses of individual loci suffer a 70% loss in the number of compatible groups recovered, relative to the combined analyses. Our results indicate that, within Megachiroptera, nectarivory and cave-dwelling originated several times, but echolocation (used for obstacle detection) evolved only once. Megachiropterans likely originated in SE Asia-Melanesia, and colonized Africa at least four times.

17.
Sensitivity analyses can be performed with respect to different methodologies, different analytical parameters or models within a single methodology, or alignment parameters. The latter investigations are particularly relevant when divergence and/or the size of molecular data sets make alignment of sequences difficult. Sensitivity analyses are often performed for analyses incorporating Direct Optimization (via POY), either to select optimal alignment parameters or to investigate the stability of topology across parameter sets. Such investigations are rarely, if ever, performed for Clustal alignments, as some manual adjustments are nearly always incorporated in the final alignment. Exploration of the performance of both POY and Clustal for a large insect data set incorporating three genes (18S, 28S, H3) and morphology reveals that the performance of POY, as measured by an ILD metric, is predictable across the parameter landscape, with minimal incongruence when all parameters are treated equally. In contrast, Clustal alignment followed by parsimony analysis yields a landscape with less overall variance but less predictable behaviour across the parameter space.

18.
Given a set of evolutionary trees on the same set of taxa, the maximum agreement subtree problem (MAST) and the maximum compatible tree problem (MCT) consist of finding a largest subset of taxa such that all input trees restricted to these taxa are isomorphic or compatible, respectively. These problems have several applications in phylogenetics, such as computing a consensus of phylogenies obtained from different data sets, identifying species subjected to horizontal gene transfers and, more recently, inferring supertrees, e.g., Trees of Life. We provide two linear time algorithms to check the isomorphism, respectively compatibility, of a set of trees, or otherwise identify a conflict between the trees with respect to the relative location of a small subset of taxa. We then use these algorithms as subroutines to solve MAST and MCT on rooted or unrooted trees of unbounded degree. More precisely, we give exact fixed-parameter tractable algorithms whose running time is uniformly polynomial when the number of taxa on which the trees disagree is bounded. This improves on a known result for MAST and proves fixed-parameter tractability for MCT.
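The isomorphism check can be illustrated with canonical forms for rooted leaf-labelled trees; note that the sorting makes this sketch slightly superlinear, unlike the paper's linear time algorithms:

```python
def canon(tree):
    """Canonical form of a rooted tree: a leaf is its taxon label; an
    internal node is the sorted join of its children's canonical forms.
    Two rooted trees are isomorphic iff their canonical forms match."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(sorted(canon(child) for child in tree)) + ")"

t1 = ("a", ("b", "c"))
t2 = (("c", "b"), "a")
print(canon(t1) == canon(t2))   # True
```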

19.
Large bioinformatics data sets make the cluster identification problem highly time consuming, which is why a parallel algorithm is needed for identifying dense clusters in a noisy background. Our algorithm works on a graph representation of the data set to be analyzed. It identifies clusters through the identification of densely intraconnected subgraphs. We employ a minimum spanning tree (MST) representation of the graph and solve the cluster identification problem using this representation. The computational bottleneck of our algorithm is the construction of an MST of a graph, for which a parallel algorithm is employed. Our high-level strategy for the parallel MST construction is to first partition the graph, then construct MSTs for the partitioned subgraphs and auxiliary bipartite graphs based on the subgraphs, and finally merge these MSTs to derive an MST of the original graph. The computational results indicate that, when running on 150 CPUs, our algorithm can solve a cluster identification problem on a data set with 1,000,000 data points almost 100 times faster than on a single CPU, indicating that this program is capable of handling very large data clustering problems efficiently. We have implemented the clustering algorithm as the software CLUMP.
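A serial sketch of the partition-then-merge strategy, in a simplified edge-partition variant (safe by the cycle property); the paper's version partitions the graph itself and adds auxiliary bipartite graphs:

```python
def kruskal(n, edges):
    """Minimum spanning forest via Kruskal with union-find; edges are (w, u, v)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest

def partition_merge_mst(n, edge_parts):
    """MST from per-partition spanning forests: an edge outside the
    spanning forest of its partition can never belong to the global MST,
    so merging the forests and rerunning Kruskal is safe. Each partition's
    forest would be built by a separate worker in the parallel setting."""
    survivors = [e for part in edge_parts for e in kruskal(n, part)]
    return kruskal(n, survivors)
```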

20.
Much modern work in phylogenetics depends on statistical sampling approaches to phylogeny construction to estimate probability distributions of possible trees for any given input data set. Our theoretical understanding of sampling approaches to phylogenetics remains far less developed than that for optimization approaches, however, particularly with regard to the number of sampling steps needed to produce accurate samples of tree partition functions. Despite the many advantages in principle of being able to sample trees from sophisticated probabilistic models, we have little theoretical basis for concluding that the prevailing sampling approaches do in fact yield accurate samples from those models within realistic numbers of steps. We propose a novel approach to phylogenetic sampling intended to be both efficient in practice and more amenable to theoretical analysis than the prevailing methods. The method depends on replacing the standard tree rearrangement moves with an alternative Markov model in which one solves a theoretically hard but practically tractable optimization problem on each step of sampling. The resulting method can be applied to a broad range of standard probability models, yielding practical algorithms for efficient sampling and rigorous proofs of accurate sampling for heated versions of some important special cases. We demonstrate the efficiency and versatility of the method by an analysis of uncertainty in tree inference over varying input sizes. In addition to providing a new practical method for phylogenetic sampling, the technique is likely to prove applicable to many similar problems involving sampling over combinatorial objects weighted by a likelihood model.
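For orientation, the standard Metropolis-Hastings loop that such samplers instantiate is sketched below; the paper's contribution would plug in as the `propose` step, and all names here are illustrative:

```python
import math
import random

def metropolis_hastings(init, propose, log_score, n_steps, seed=0):
    """Generic Metropolis-Hastings loop. `propose(state)` returns a
    (candidate, log_hastings_ratio) pair and `log_score` is the
    unnormalised log posterior."""
    rng = random.Random(seed)
    state, cur = init, log_score(init)
    samples = []
    for _ in range(n_steps):
        cand, log_hr = propose(state)
        cand_score = log_score(cand)
        if math.log(rng.random()) < cand_score - cur + log_hr:
            state, cur = cand, cand_score    # accept the move
        samples.append(state)
    return samples
```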
