Similar literature
 20 similar articles found; search took 31 ms
1.
Over the last several years, many sequence alignment tools have appeared and become popular owing to the rapid evolution of next-generation sequencing technologies. Researchers who use such tools naturally want maximum performance when running them on modern infrastructures. Today’s NUMA (non-uniform memory access) architectures present major challenges in getting such applications to scale well as more processors/cores are used. The memory system in NUMA systems is highly complex and may be the main cause of an application’s performance loss. The existence of several memory banks in NUMA systems implies increased latency whenever a given processor accesses a remote bank. This effect is usually attenuated by strategies that increase the locality of memory accesses. However, NUMA systems may also suffer from contention when concurrent accesses are concentrated on a small number of banks. Sequence alignment tools use large data structures to hold the reference genomes to which all reads are aligned. Therefore, these tools are very sensitive to performance problems related to the memory system. The main goal of this study is to explore the trade-offs between data locality and data dispersion in NUMA systems. We have performed experiments with several popular sequence alignment tools on two widely available NUMA systems to assess the performance of different memory allocation policies and data partitioning strategies. We find that no single method is best in all cases. However, we conclude that memory interleaving is the memory allocation strategy that provides the best performance when a large number of processors and memory banks are used. In the case of data partitioning, the best results are usually obtained when a larger number of partitions is used, sometimes combined with an interleave policy.
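The locality-versus-dispersion trade-off described in this abstract can be illustrated with a toy page-placement model. The sketch below is not the authors' method: the function names, the `"local"` and `"interleave"` policy labels, and the page and node counts are assumptions chosen for illustration. It only shows how a round-robin interleave spreads pages evenly over memory banks, while a purely local policy concentrates them on one bank, which is where contention can arise.

```python
from collections import Counter


def place_pages(num_pages, num_nodes, policy, local_node=0):
    """Return the NUMA node index assigned to each page under a toy policy."""
    if policy == "interleave":
        # Round-robin: page i goes to node i mod num_nodes.
        return [page % num_nodes for page in range(num_pages)]
    if policy == "local":
        # Every page is allocated on the node of the allocating thread.
        return [local_node] * num_pages
    raise ValueError(f"unknown policy: {policy}")


def max_share(placement):
    """Fraction of all pages held by the most heavily loaded node."""
    counts = Counter(placement)
    return max(counts.values()) / len(placement)


interleaved = place_pages(8, 4, "interleave")
local = place_pages(8, 4, "local")
print(interleaved, max_share(interleaved))  # [0, 1, 2, 3, 0, 1, 2, 3] 0.25
print(local, max_share(local))              # [0, 0, 0, 0, 0, 0, 0, 0] 1.0
```

On Linux, an interleaved allocation policy is commonly requested with `numactl --interleave=all`; the abstract does not state how the authors applied their policies.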

2.
This paper provides an overview of methods and current applications of distributed computing in bioinformatics. Distributed computing is a strategy of dividing a large workload among multiple computers to reduce processing time, or to make use of resources such as programs and databases that are not available on all computers. Participating computers may be connected either through a local high-speed network or through the Internet.

3.
The explosion of data and transactions demands a creative approach to data processing in a variety of applications. Research on remote memory systems (RMSs), which exploit the superior characteristics of dynamic random access memory (DRAM), has been performed for many decades, and today’s information explosion galvanizes researchers into shedding new light on the technology. Prior studies have mainly focused on architectural suggestions for such systems, highlighting different design rationales. These studies have shown that choosing the appropriate applications to run on an RMS is important in fully utilizing the advantages of remote memory. This article provides an extensive performance evaluation of various types of data processing applications so as to address the efficacy of an RMS by means of a prototype RMS with reliability functionality. The prototype RMS used is a practical kernel-level RMS that renders large-memory data processing feasible. The abstract concept of remote memory was materialized by borrowing unused local memory in commodity PCs via a high-speed network capable of Remote Direct Memory Access (RDMA) operations. The prototype RMS uses remote memory without any part of its computation power coming from remote computers. Our experimental results suggest that an RMS can be practical in supporting the rigorous demands of commercial in-memory database systems that have high data access locality. Our evaluation also convinces us that a reliable RMS can deliver both high reliability and high efficiency for large-memory data processing applications whose data access patterns have high locality.

4.

Background  

BLAST is a widely used genetic research tool for analysis of similarity between nucleotide and protein sequences. This paper presents a software application entitled "Squid" that makes use of grid technology. The current version, as an example, is configured for BLAST applications, but adaptation for other computing intensive repetitive tasks can be easily accomplished in the open source version. This enables the allocation of remote resources to perform distributed computing, making large BLAST queries viable without the need of high-end computers.

5.
Many-task computing aims to bridge the gap between two computing paradigms, high throughput computing and high performance computing. Many-task computing denotes high-performance computations comprising multiple distinct activities, coupled via file system operations. The aggregate number of tasks, quantity of computing, and volumes of data may be extremely large. Traditional techniques found in production systems in the scientific community to support many-task computing do not scale to today’s largest systems, due to issues in local resource manager scalability and granularity, efficient utilization of the raw hardware, long wait queue times, and shared/parallel file system contention and scalability. To address these limitations, we adopted a “top-down” approach to building a middleware called Falkon, to support the most demanding many-task computing applications at the largest scales. Falkon (Fast and Light-weight tasK executiON framework) integrates (1) multi-level scheduling to enable dynamic resource provisioning and minimize wait queue times, (2) a streamlined task dispatcher able to achieve orders-of-magnitude higher task dispatch rates than conventional schedulers, and (3) data diffusion, which performs data caching and uses a data-aware scheduler to co-locate computational and storage resources. Micro-benchmarks have shown Falkon to achieve throughputs of over 15K tasks/s, scale to hundreds of thousands of processors and to millions of queued tasks, and execute billions of tasks per day. Data diffusion has also been shown to improve application scalability and performance, with its ability to achieve hundreds of Gb/s I/O rates on modest-sized clusters, with Tb/s I/O rates on the horizon.
Falkon has shown orders-of-magnitude improvements in performance and scalability over traditional approaches to resource management across many diverse workloads and applications at scales of billions of tasks on hundreds of thousands of processors across clusters, specialized systems, Grids, and supercomputers. Falkon’s performance and scalability have enabled a new class of applications called Many-Task Computing to operate with high efficiency at scales previously believed impossible.

6.
Many research institutions are deploying computing clusters based on a shared/buy-in paradigm. Such clusters combine shared computers, which are free to be used by all users, and buy-in computers, which are computers purchased by users for semi-exclusive use. The purpose of this paper is to characterize the typical behavior and performance of a shared/buy-in computing cluster, using data traces from the Shared Computing Cluster (SCC) at Boston University that runs under this paradigm as a case study. Among our main findings, we show that the semi-exclusive policy, which allows any SCC user to use idle buy-in resources for a limited time, increases the utilization of buy-in resources by 17.4%, thus significantly improving the performance of the system as a whole. We find that jobs allowed to run on idle buy-in resources arrive more frequently and run for a shorter time than other jobs. Finally, we identify the run time limit (i.e., the maximum time during which a job is allowed to use resources) and the type of parallel environment as two factors that have a significant impact on the different performance experienced by shared and buy-in jobs.

7.
Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However, with the vast and increasing amount of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses the CUDA tool to implement a massively parallel computing environment on the GPU card, is becoming a very powerful, efficient, and low-cost option to achieve substantial performance gains over CPU approaches. The use of on-chip memory on the GPU efficiently lowers latency, thus circumventing a major issue in other parallel computing environments, such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We utilized the ELLPACK-R sparse format to allow effective and fine-grained massively parallel processing that copes with the sparse nature of interaction network data sets in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on CPU. Thus, large-scale parallel computation on off-the-shelf desktop machines, which was previously only possible on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data.
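The two operations this abstract places at the heart of MCL, matrix-matrix multiplication (expansion) and Markov matrix normalization (used after inflation), can be sketched in a few lines. The version below is a minimal dense, CPU-only illustration in plain Python and is not CUDA-MCL: it uses neither CUDA nor the ELLPACK-R format. The function names, the added self-loops, the inflation value of 2.0, and the cluster read-out rule are assumptions made for this example.

```python
def normalize_columns(matrix):
    """Rescale every column to sum to 1, giving a column-stochastic matrix."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(matrix):
    """Expansion: square the matrix, i.e. take two steps of the random walk."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def inflate(matrix, power):
    """Inflation: raise each entry to `power`, then renormalize the columns."""
    return normalize_columns([[x ** power for x in row] for row in matrix])


def mcl(adjacency, inflation=2.0, max_iterations=100, tolerance=1e-9):
    """Return clusters as sorted lists of node indices."""
    n = len(adjacency)
    # Adding self-loops is a common MCL convention; here it is an assumption.
    matrix = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iterations):
        updated = inflate(expand(matrix), inflation)
        change = max(abs(updated[i][j] - matrix[i][j])
                     for i in range(n) for j in range(n))
        matrix = updated
        if change < tolerance:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = {frozenset(j for j, x in enumerate(row) if x > 1e-6)
                for row in matrix}
    clusters.discard(frozenset())
    return sorted(sorted(cluster) for cluster in clusters)


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(two_triangles))  # [[0, 1, 2], [3, 4, 5]]
```

The dense triple loop in `expand` costs O(n³) per iteration, which is why sparse storage and parallel execution of exactly these two steps matter for large interaction networks.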

8.
The increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and processor core hierarchies, accessible via complex interconnects in order to dispatch the increasing amount of data required by the processing elements. The key to efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one-sided communications becomes more important in these systems, to avoid unnecessary synchronization between pairs of processes in collective operations implemented in terms of two-sided point-to-point functions. This work proposes a series of algorithms that provide good performance and scalability in collective operations, based on the use of hierarchical trees, overlapping one-sided communications, message pipelining and the available NUMA binding features. An implementation has been developed for Unified Parallel C, a Partitioned Global Address Space language, which presents a shared memory view across the nodes for programmability, while keeping private memory regions for performance. The performance evaluation of the proposed implementation, conducted on five representative systems (JuRoPA, JUDGE, Finis Terrae, SVG and Superdome), has shown generally good performance and scalability, even outperforming MPI in some cases, which confirms the suitability of the developed algorithms for manycore architectures.

9.
Advances in multicore technologies lead to processors with tens and soon hundreds of cores in a single socket, resulting in an ever-growing gap between computing power and available memory and I/O bandwidths for data handling. It would be beneficial if some of the computing power could be transformed into gains in I/O efficiency, thereby reducing this speed disparity between computing and I/O. In this paper, we design and implement a NEarline data COmpression and DECompression (neCODEC) scheme for data-intensive parallel applications. Several salient techniques are introduced in neCODEC, including asynchronous compression threads, elastic file representation, distributed metadata handling, and balanced subfile distribution. Our performance evaluation indicates that neCODEC can improve the performance of a variety of data-intensive microbenchmarks and scientific applications. In particular, neCODEC is capable of increasing the effective bandwidth of S3D, a combustion simulation code, by more than 5 times.

10.
Rapid accumulation of large and standardized microarray data collections is opening up novel opportunities for holistic characterization of genome function. The limited scalability of current preprocessing techniques has, however, formed a bottleneck for full utilization of these data resources. Although short oligonucleotide arrays constitute a major source of genome-wide profiling data, scalable probe-level techniques have been available only for a few platforms, based on pre-calculated probe effects from restricted reference training sets. To overcome these key limitations, we introduce a fully scalable online-learning algorithm for probe-level analysis and pre-processing of large microarray atlases involving tens of thousands of arrays. In contrast to the alternatives, our algorithm scales up linearly with respect to sample size and is applicable to all short oligonucleotide platforms. The model can use the most comprehensive data collections available to date to pinpoint individual probes affected by noise and biases, providing tools to guide array design and quality control. This is the only available algorithm that can learn probe-level parameters based on sequential hyperparameter updates at small consecutive batches of data, thus circumventing the extensive memory requirements of the standard approaches and opening up novel opportunities to take full advantage of contemporary microarray collections.

11.
A simple distributed processing system named "Peach" was developed to meet the rising computational demands of modern structural biology (and other) laboratories without additional expense by using existing hardware resources more efficiently. A central server distributes jobs to idle workstations in such a way that each computer is used maximally, but without disturbing intermittent interactive users. As compared to other distributed systems, Peach is simple, easy to install, easy to administer, easy to use, scalable, and robust. While it was designed to queue and distribute large numbers of small tasks to participating computers, it can also be used to send single jobs automatically to the fastest currently available computer and/or survey the activity of an entire laboratory's computers. Tests of robustness and scalability are reported, as are three specific electron cryomicroscopy applications where Peach enabled projects that would not otherwise have been feasible without an expensive, dedicated cluster.

12.
The rapid growth of Internet applications has made communication anonymity an increasingly important or even indispensable security requirement. Onion routing has been employed as an infrastructure for anonymous communication over a public network, which provides anonymous connections that are strongly resistant to both eavesdropping and traffic analysis. However, existing onion routing protocols usually exhibit poor performance due to repeated encryption operations. In this paper, we first present an improved anonymous multi-receiver identity-based encryption (AMRIBE) scheme, and an improved identity-based one-way anonymous key agreement (IBOWAKE) protocol. We then propose an efficient onion routing protocol named AIB-OR that provides provable security and strong anonymity. Our main approach is to use our improved AMRIBE scheme and improved IBOWAKE protocol in onion routing circuit construction. Compared with other onion routing protocols, AIB-OR provides high efficiency, scalability, strong anonymity and fault tolerance. Performance measurements from a prototype implementation show that our proposed AIB-OR can achieve high bandwidths and low latencies when deployed over the Internet.

13.

Background

Molecular dynamics (MD) simulations provide valuable insight into biomolecular systems at the atomic level. Notwithstanding the ever-increasing power of high performance computers, current MD simulations face several challenges: the fastest atomic movements require time steps of a few femtoseconds, which are small compared to biomolecularly relevant timescales of milliseconds or even seconds for large conformational motions. At the same time, scalability to a large number of cores is limited, mostly due to long-range interactions. An appealing alternative to atomic-level simulations is coarse-graining the resolution of the system or reducing the complexity of the Hamiltonian to improve sampling while decreasing computational costs. Native structure-based models, also called Gō-type models, are based on energy landscape theory and the principle of minimal frustration. They have been tremendously successful in explaining fundamental questions of, e.g., protein folding, RNA folding or protein function. At the same time, they are computationally inexpensive enough to run complex simulations on smaller computing systems or even commodity hardware. Still, their setup and evaluation are quite complex, even though sophisticated software packages support their realization.

Results

Here, we establish an efficient infrastructure for native structure-based models to support the community and enable high-throughput simulations on remote computing resources via GridBeans and UNICORE middleware. This infrastructure organizes the setup of such simulations, resulting in increased comparability of simulation results. At the same time, complete workflows for advanced simulation protocols can be established and managed on remote resources through a graphical interface, which increases the reusability of protocols and lowers the entry barrier into such simulations for, e.g., experimental scientists who want to compare their results against simulations. We demonstrate the power of this approach by applying it to protein folding simulations for a range of proteins.

Conclusions

We present software enhancing the entire workflow for native structure-based simulations, including exception handling and evaluations. By extending the capability and improving the accessibility of existing simulation packages, the software goes beyond the state of the art in the domain of biomolecular simulations. Thus we expect that it will encourage more members of the community to employ modeling more confidently in their research.

Electronic supplementary material

The online version of this article (doi:10.1186/1471-2105-15-292) contains supplementary material, which is available to authorized users.

14.
We review the resources available to systematic biologists who wish to use computers to build classifications. Algorithm development is in an early stage, and only a few examples of integrated applications for systematic biology are available. The availability of data is crucial if systematic biology is to enter the computer age.

15.
The large amount of biological data available today makes it necessary to use tools and applications based on sophisticated and efficient algorithms developed in the area of bioinformatics. Further, access to high performance computing resources is necessary to achieve results in reasonable time. To speed up applications and utilize available compute resources as efficiently as possible, software developers make use of parallelization mechanisms such as multithreading. Many of the available tools in bioinformatics offer multithreading capabilities, but more compute power is not always helpful. In this study we investigated the behavior of well-known bioinformatics applications with our benchmarking tool suite BOOTABLE, examining their performance in terms of scaling, different virtual environments and different datasets. The tool suite includes the tools BBMap, Bowtie2, BWA, Velvet, IDBA, SPAdes, Clustal Omega, MAFFT, SINA and GROMACS. In addition, we added an application using the machine learning framework TensorFlow. Machine learning is not directly part of bioinformatics but is applied to many biological problems, especially in the context of medical images (X-ray photographs). The tools were analyzed in two different virtual environments: a virtual machine environment based on the OpenStack cloud software and a Docker environment. The measured performance values were compared to a bare-metal setup and to each other. The study reveals that the virtual environments used produce an overhead in the range of seven to twenty-five percent compared to the bare-metal environment. The scaling measurements showed that some of the analyzed tools do not benefit from larger amounts of computing resources, whereas others showed almost linear scaling behavior. The findings of this study have been generalized as far as possible and should help users to find the best amount of resources for their analysis.
Further, the results provide valuable information for resource providers to handle their resources as efficiently as possible and raise the user community’s awareness of the efficient usage of computing resources.
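The two quantities this abstract reports, virtualization overhead relative to bare metal and scaling behavior, reduce to simple ratios of wall-clock times. The sketch below shows one conventional way to compute them. The function names and the timing values are hypothetical and are not taken from the study or from BOOTABLE.

```python
def overhead_percent(virtual_seconds, bare_metal_seconds):
    """Extra run time of a virtual environment, as a percentage of bare metal."""
    return (virtual_seconds - bare_metal_seconds) / bare_metal_seconds * 100.0


def speedup(single_core_seconds, n_core_seconds):
    """How many times faster the run is on n cores than on one core."""
    return single_core_seconds / n_core_seconds


def efficiency(single_core_seconds, n_core_seconds, cores):
    """Speedup per core; 1.0 corresponds to ideal linear scaling."""
    return speedup(single_core_seconds, n_core_seconds) / cores


# Hypothetical timings in seconds, chosen only to illustrate the arithmetic.
print(overhead_percent(125.0, 100.0))  # 25.0
print(speedup(100.0, 25.0))            # 4.0
print(efficiency(100.0, 25.0, 4))      # 1.0
print(efficiency(100.0, 50.0, 4))      # 0.5
```

A tool whose efficiency stays near 1.0 as cores are added shows the almost linear scaling mentioned above, while an efficiency that falls quickly indicates a tool that does not benefit from more resources.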

16.
MOTIVATION: Due to the steadily growing computational demands in bioinformatics and related scientific disciplines, one is forced to make optimal use of the available resources. A straightforward solution is to build a network of idle computers and let each of them work on a small piece of a scientific challenge, as done by Seti@Home (http://setiathome.berkeley.edu), the world's largest distributed computing project. RESULTS: We developed a generally applicable distributed computing solution that uses a screensaver system similar to Seti@Home. The software exploits the coarse-grained nature of typical bioinformatics projects. Three major considerations for the design were: (1) often, many different programs are needed, while the time is lacking to parallelize them. Models@Home can run any program in parallel without modifications to the source code; (2) in contrast to the Seti project, bioinformatics applications are normally more sensitive to lost jobs. Models@Home therefore includes stringent control over job scheduling; (3) to allow use in heterogeneous environments, Linux and Windows based workstations can be combined with dedicated PCs to build a homogeneous cluster. We present three practical applications of Models@Home, running the modeling programs WHAT IF and YASARA on 30 PCs: force field parameterization, molecular dynamics docking, and database maintenance.

17.
18.
A global technology arms race is underway to build ever more powerful and precise quantum computers. Quantum computers have the potential to tackle certain quantitative problems more quickly than classical computers. The current focus of quantum computing is on pushing the boundaries of fundamental quantum information and commercial applications in industrial sectors, financial services, and other profit-led sectors, particularly where improvements in optimisation and sampling can increase economic return. We believe that ecologists could exploit the computational power of quantum computers because the statistical approaches commonly used in ecology already have proven pathways on quantum computers. Moreover, quantum computing could ultimately leapfrog our understanding of complex ecological systems, if the hardware, opportunity, and creativity of quantitative ecologists all align.

19.
I/O intensive applications have posed great challenges to computational scientists. A major problem of these applications is that users have to sacrifice performance requirements in order to satisfy storage capacity requirements in a conventional computing environment. Further performance improvement is impeded by the physical nature of these storage media even when state-of-the-art I/O optimizations are employed. In this paper, we present a distributed multi-storage resource architecture, which can satisfy both performance and capacity requirements by employing multiple storage resources. Compared to a traditional single storage resource architecture, our architecture provides a more flexible and reliable computing environment. This architecture can bring new opportunities for high performance computing as well as inherit state-of-the-art I/O optimization approaches that have already been developed. It provides application users with high-performance storage access even when they do not have a single large local storage archive at their disposal. We also develop an Application Programming Interface (API) that provides transparent management of and access to various storage resources in our computing environment. Since I/O usually dominates the performance of I/O intensive applications, we establish an I/O performance prediction mechanism, which consists of a performance database and a prediction algorithm, to help users better evaluate and schedule their applications. A tool is also developed to help users automatically generate the performance data stored in databases. The experiments show that our multi-storage resource architecture is a promising platform for high performance distributed computing.

20.