首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
OLAP (On-Line Analytical Processing) is an approach to efficiently evaluate multidimensional data for business intelligence applications. OLAP contributes to business decision-making by identifying, extracting, and analyzing multidimensional data. The fundamental structure of OLAP is a data cube that enables users to interactively explore the distinct data dimensions. Processing depends on the complexity of queries, dimensionality, and growing size of the data cube. As data volumes keep on increasing and the demands by business users also increase, higher processing speed than ever is needed, as faster processing means faster decisions and more profit to industry. In this paper, we are proposing an Adaptive Hybrid OLAP Architecture that takes advantage of heterogeneous systems with GPUs and CPUs and leverages their different memory subsystems characteristics to minimize response time. Thus, our approach (a) exploits both types of hardware rather than using the CPU only as a frontend for GPU; (b) uses two different data formats (multidimensional cube and relational cube) to match the GPU and CPU memory access patterns and diverts queries adaptively to the best resource for solving the problem at hand; (c) exploits data locality of multidimensional OLAP on NUMA multicore systems through intelligent thread placement; and (d) guides its adaptation and choices by an architectural model that captures the memory access patterns and the underlying data characteristics. Results show an increase in performance by roughly four folds over the best known related approach. There is also the important economical factor. The proposed hybrid system costs only 10 % more than same system without GPU. With this small extra cost, the added GPU increases query processing by almost 2 times.  相似文献   

2.
The increasing number of cores per processor is turning manycore-based systems in pervasive. This involves dealing with multiple levels of memory in non uniform memory access (NUMA) systems and processor cores hierarchies, accessible via complex interconnects in order to dispatch the increasing amount of data required by the processing elements. The key for efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one sided communications becomes more important in these systems, to avoid unnecessary synchronization between pairs of processes in collective operations implemented in terms of two sided point to point functions. This work proposes a series of algorithms that provide a good performance and scalability in collective operations, based on the use of hierarchical trees, overlapping one-sided communications, message pipelining and the available NUMA binding features. An implementation has been developed for Unified Parallel C, a Partitioned Global Address Space language, which presents a shared memory view across the nodes for programmability, while keeping private memory regions for performance. The performance evaluation of the proposed implementation, conducted on five representative systems (JuRoPA, JUDGE, Finis Terrae, SVG and Superdome), has shown generally good performance and scalability, even outperforming MPI in some cases, which confirms the suitability of the developed algorithms for manycore architectures.  相似文献   

3.
As the number of cores per node keeps increasing, it becomes increasingly important for MPI to leverage shared memory for intranode communication. This paper investigates the design and optimization of MPI collectives for clusters of NUMA nodes. We develop performance models for collective communication using shared memory and we demonstrate several algorithms for various collectives. Experiments are conducted on both Xeon X5650 and Opteron 6100 InfiniBand clusters. The measurements agree with the model and indicate that different algorithms dominate for short vectors and long vectors. We compare our shared-memory allreduce with several MPI implementations—Open MPI, MPICH2, and MVAPICH2—that utilize system shared memory to facilitate interprocess communication. On a 16-node Xeon cluster and 8-node Opteron cluster, our implementation achieves on geometric average 2.3X and 2.1X speedup over the best MPI implementation, respectively. Our techniques enable an efficient implementation of collective operations on future multi- and manycore systems.  相似文献   

4.
Liu  Peini  Guitart  Jordi 《Cluster computing》2022,25(2):847-868

Containerization technology offers an appealing alternative for encapsulating and operating applications (and all their dependencies) without being constrained by the performance penalties of using Virtual Machines and, as a result, has got the interest of the High-Performance Computing (HPC) community to obtain fast, customized, portable, flexible, and reproducible deployments of their workloads. Previous work on this area has demonstrated that containerized HPC applications can exploit InfiniBand networks, but has ignored the potential of multi-container deployments which partition the processes that belong to each application into multiple containers in each host. Partitioning HPC applications has demonstrated to be useful when using virtual machines by constraining them to a single NUMA (Non-Uniform Memory Access) domain. This paper conducts a systematical study on the performance of multi-container deployments with different network fabrics and protocols, focusing especially on Infiniband networks. We analyze the impact of container granularity and its potential to exploit processor and memory affinity to improve applications’ performance. Our results show that default Singularity can achieve near bare-metal performance but does not support fine-grain multi-container deployments. Docker and Singularity-instance have similar behavior in terms of the performance of deployment schemes with different container granularity and affinity. This behavior differs for the several network fabrics and protocols, and depends as well on the application communication patterns and the message size. Moreover, deployments on Infiniband are also more impacted by the computation and memory allocation, and because of that, they can exploit the affinity better.

  相似文献   

5.
The presence of histopathological lesions characteristic of Alzheimer's disease, senile plaques and neurofibrillar degeneration in the brains of cognitively normal individuals has been well documented in several longitudinal clinicopathological studies over the last two decades. Clinical and pathological epidemiological data suggest that Alzheimer's disease can begin to develop almost a decade before the first clinical manifestations appear. The present article reviews the studies investigating cognitive alterations before the disease manifests. All these studies reveal the presence of alterations in preclinical phases in cognitive functions other than memory, such as those related to attention, processing speed and verbal fluency. Assessment of memory with tools sensitive to hippocampal memory impairment is recommended. The best predictor is having each individual's baseline performance, which can then be used for comparison with subsequent performance. Once reduced performance is detected (even when within the “normal” range), affected persons should be referred to specific units able to diagnose the disease in the early stages.  相似文献   

6.
The presence of histopathological lesions characteristic of Alzheimer's disease, senile plaques and neurofibrillar degeneration in the brains of cognitively normal individuals has been well documented in several longitudinal clinicopathological studies over the last two decades. Clinical and pathological epidemiological data suggest that Alzheimer's disease can begin to develop almost a decade before the first clinical manifestations appear. The present article reviews the studies investigating cognitive alterations before the disease manifests. All these studies reveal the presence of alterations in preclinical phases in cognitive functions other than memory, such as those related to attention, processing speed and verbal fluency. Assessment of memory with tools sensitive to hippocampal memory impairment is recommended. The best predictor is having each individual's baseline performance, which can then be used for comparison with subsequent performance. Once reduced performance is detected (even when within the "normal" range), affected persons should be referred to specific units able to diagnose the disease in the early stages.  相似文献   

7.
Multigene and genomic data sets have become commonplace in the field of phylogenetics, but many existing tools are not designed for such data sets, which often makes the analysis time‐consuming and tedious. Here, we present PhyloSuite , a (cross‐platform, open‐source, stand‐alone Python graphical user interface) user‐friendly workflow desktop platform dedicated to streamlining molecular sequence data management and evolutionary phylogenetics studies. It uses a plugin‐based system that integrates several phylogenetic and bioinformatic tools, thereby streamlining the entire procedure, from data acquisition to phylogenetic tree annotation (in combination with iTOL). It has the following features: (a) point‐and‐click and drag‐and‐drop graphical user interface; (b) a workplace to manage and organize molecular sequence data and results of analyses; (c) GenBank entry extraction and comparative statistics; and (d) a phylogenetic workflow with batch processing capability, comprising sequence alignment (mafft and macse ), alignment optimization (trimAl, HmmCleaner and Gblocks), data set concatenation, best partitioning scheme and best evolutionary model selection (PartitionFinder and modelfinder ), and phylogenetic inference (MrBayes and iq‐tree ). PhyloSuite is designed for both beginners and experienced researchers, allowing the former to quick‐start their way into phylogenetic analysis, and the latter to conduct, store and manage their work in a streamlined way, and spend more time investigating scientific questions instead of wasting it on transferring files from one software program to another.  相似文献   

8.
The explosion of data and transactions demands a creative approach for data processing in a variety of applications. Research on remote memory systems (RMSs), so as to exploit the superior characteristics of dynamic random access memory (DRAM), has been performed for many decades, and today’s information explosion galvanizes researchers into shedding new light on the technology. Prior studies have mainly focused on architectural suggestions for such systems, highlighting different design rationale. These studies have shown that choosing the appropriate applications to run on an RMS is important in fully utilizing the advantages of remote memory. This article provides an extensive performance evaluation for various types of data processing applications so as to address the efficacy of an RMS by means of a prototype RMS with reliability functionality. The prototype RMS used is a practical kernel-level RMS that renders large memory data processing feasible. The abstract concept of remote memory was materialized by borrowing unused local memory in commodity PCs via a high speed network capable of Remote Direct Memory Access (RDMA) operations. The prototype RMS uses remote memory without any part of its computation power coming from remote computers. Our experimental results suggest that an RMS can be practical in supporting the rigorous demands of commercial in memory database systems that have high data access locality. Our evaluation also convinces us of the possibility that a reliable RMS can satisfy both the high degree of reliability and efficiency for large memory data processing applications whose data access pattern has high locality.  相似文献   

9.

Background  

Deluged by the rate and complexity of completed genomic sequences, the need to align longer sequences becomes more urgent, and many more tools have thus been developed. In the initial stage of genomic sequence analysis, a biologist is usually faced with the questions of how to choose the best tool to align sequences of interest and how to analyze and visualize the alignment results, and then with the question of whether poorly aligned regions produced by the tool are indeed not homologous or are just results due to inappropriate alignment tools or scoring systems used. Although several systematic evaluations of multiple sequence alignment (MSA) programs have been proposed, they may not provide a standard-bearer for most biologists because those poorly aligned regions in these evaluations are never discussed. Thus, a tool that allows cross comparison of the alignment results obtained by different tools simultaneously could help a biologist evaluate their correctness and accuracy.  相似文献   

10.
General Purpose Graphic Processing Units (GPGPUs) constitute an inexpensive resource for computing-intensive applications that could exploit an intrinsic fine-grain parallelism. This paper presents the design and implementation in GPGPUs of an exact alignment tool for nucleotide sequences based on the Burrows-Wheeler Transform. We compare this algorithm with state-of-the-art implementations of the same algorithm over standard CPUs, and considering the same conditions in terms of I/O. Excluding disk transfers, the implementation of the algorithm in GPUs shows a speedup larger than 12, when compared to CPU execution. This implementation exploits the parallelism by concurrently searching different sequences on the same reference search tree, maximizing memory locality and ensuring a symmetric access to the data. The paper describes the behavior of the algorithm in GPU, showing a good scalability in the performance, only limited by the size of the GPU inner memory.  相似文献   

11.
MOTIVATION: Several new de novo assembly tools have been developed recently to assemble short sequencing reads generated by next-generation sequencing platforms. However, the performance of these tools under various conditions has not been fully investigated, and sufficient information is not currently available for informed decisions to be made regarding the tool that would be most likely to produce the best performance under a specific set of conditions. RESULTS: We studied and compared the performance of commonly used de novo assembly tools specifically designed for next-generation sequencing data, including SSAKE, VCAKE, Euler-sr, Edena, Velvet, ABySS and SOAPdenovo. Tools were compared using several performance criteria, including N50 length, sequence coverage and assembly accuracy. Various properties of read data, including single-end/paired-end, sequence GC content, depth of coverage and base calling error rates, were investigated for their effects on the performance of different assembly tools. We also compared the computation time and memory usage of these seven tools. Based on the results of our comparison, the relative performance of individual tools are summarized and tentative guidelines for optimal selection of different assembly tools, under different conditions, are provided.  相似文献   

12.
We aim to compare the performance of Bowtie2 , bwa‐mem , blastn and blastx when aligning bacterial metagenomes against the Comprehensive Antibiotic Resistance Database (CARD). Simulated reads were used to evaluate the performance of each aligner under the following four performance criteria: correctly mapped, false positives, multi‐reads and partials. The optimal alignment approach was applied to samples from two wastewater treatment plants to detect antibiotic resistance genes using next generation sequencing. blastn mapped with greater accuracy among the four sequence alignment approaches considered followed by Bowtie2 . blastx generated the greatest number of false positives and multi‐reads when aligned against the CARD. The performance of each alignment tool was also investigated using error‐free reads. Although each aligner mapped a greater number of error‐free reads as compared to Illumina‐error reads, in general, the introduction of sequencing errors had little effect on alignment results when aligning against the CARD. Given each performance criteria, blastn was found to be the most favourable alignment tool and was therefore used to assess resistance genes in sewage samples. Beta‐lactam and aminoglycoside were found to be the most abundant classes of antibiotic resistance genes in each sample.

Significance and Impact of the Study

Antibiotic resistance genes (ARGs) are pollutants known to persist in wastewater treatment plants among other environments, thus methods for detecting these genes have become increasingly relevant. Next generation sequencing has brought about a host of sequence alignment tools that provide a comprehensive look into antimicrobial resistance in environmental samples. However, standardizing practices in ARG metagenomic studies is challenging since results produced from alignment tools can vary significantly. Our study provides sequence alignment results of synthetic, and authentic bacterial metagenomes mapped against an ARG database using multiple alignment tools, and the best practice for detecting ARGs in environmental samples.  相似文献   

13.
Content-Aware Dispatching Algorithms for Cluster-Based Web Servers   总被引:1,自引:0,他引:1  
Cluster-based Web servers are leading architectures for highly accessed Web sites. The most common Web cluster architecture consists of replicated server nodes and a Web switch that routes client requests among the nodes. In this paper, we consider content-aware Web switches that can use application level information to assign client requests. We evaluate the performance of some representative state-of-the-art dispatching algorithms for Web switches operating at layer 7 of the OSI protocol stack. Specifically, we consider dispatching algorithms that use only client information as well as the combination of client and server information for load sharing, reference locality or service partitioning. We demonstrate through a wide set of simulation experiments that dispatching policies aiming to improve locality in server caches give best results for traditional Web publishing sites providing static information and some simple database searches. On the other hand, when we consider more recent Web sites providing dynamic and secure services, dispatching policies that aim to share the load are the most effective.  相似文献   

14.
Graphics processors evolve rapidly and promise to support power-efficient, cost, differentiated price-performance, and scalable high performance computing. MapReduce is a well-known distributed programming model to ease the development of applications for large-scale data processing on a large number of commodity CPUs. When compared to CPUs, GPUs are an order of magnitude faster in terms of computation power and memory bandwidth, but they are harder to program. Although several studies have implemented the MapReduce model on GPUs, most of them are based on the single GPU model and bounded by a GPU memory with inefficient atomic operations. This paper focuses on the development of MGMR, a standalone MapReduce system that utilizes multiple GPUs to manage large-scale data processing beyond the GPU memory limitation, and also to eliminate serial atomic operations. Experimental results have demonstrated the effectiveness of MGMR in handling large data sets.  相似文献   

15.
SUMMARY: Multiple sequence alignment is a frequently used technique for analyzing sequence relationships. Compilation of large alignments is computationally expensive, but processing time can be considerably reduced when the computational load is distributed over many processors. Parallel processing functionality in the form of single-instruction multiple-data (SIMD) technology was implemented into the multiple alignment program Praline by using 'message passing interface' (MPI) routines. Over the alignments tested here, the parallelized program performed up to ten times faster on 25 processors compared to the single processor version. AVAILABILITY: Example program code for parallelizing pairwise alignment loops is available from http://mathbio.nimr.mrc.ac.uk/~jkleinj/tools/mpicode. The 'message passing interface' package (MPICH) is available from http:/www.unix.mcs.anl.gov/mpi/mpich. CONTACT: jhering@nimr.mrc.ac.uk SUPPLEMENTARY INFORMATION: Praline is accessible at http://mathbio.nimr.mrc.ac.uk/praline.  相似文献   

16.
Qiang Zou 《Cluster computing》2014,17(4):1427-1441
This paper studies the correlation of memory accesses in high-performance computer systems from a time dependence perspective, and concludes that correlations in memory access-arrival times are inconsistent, either with little correlation or with evident and abundant correlations. Thus neither independent identically distributed or self-similar is appropriate to characterize all memory activities. For memory workload with evident correlations, we present both pictorial and statistical evidence that memory accesses have self-similar like behavior. In addition, we implement a memory access series generator in which the inputs are the measured properties of the available trace data. Experimental results show that this model can accurately emulate the complex access arrival behaviors in both workloads with little and strong correlations, particularly for the burstiness characteristics in the memory workloads.  相似文献   

17.

Background  

Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of the resulting data tests the limits of existing bioinformatics computing infrastructure. A solution to this issue is to use High Performance Computing (HPC) systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on these systems. There are existing modules that enable R to use multiple processors, but these are either difficult to use for the HPC novice or cannot be used to solve certain classes of problems. A method of exploiting HPC systems, using R, but without recourse to mastering parallel programming paradigms is therefore necessary to analyse genomic data to its fullest.  相似文献   

18.
Concurrency bugs usually manifest under very rare conditions, and reproducing such bugs can be a challenging task. To reproduce concurrency bugs with a given input, one would have to explore the vast interleaving space, searching for erroneous schedules. The challenges are compounded in a big data environment. This paper explores the topic of concurrency bug reproduction using runtime data. We approach the concurrency testing and bug reproduction problem differently from existing literature, by emphasizing on the preemptable synchronization points. In our approach, a light-weight profiler is implemented to monitor program runs, and collect synchronization points where thread scheduler could intervene and make scheduling decisions. Traces containing important synchronization API calls and shared memory accesses are recorded and analyzed. Based on the preemptable synchronization points, we build a reduced preemption set (RPS) to narrow down the search space for erroneous schedules. We implement an optimized preemption-bounded schedule search algorithm and an RPS directed search algorithm, in order to reproduce concurrency bugs more efficiently. Those schedule exploration algorithms are integrated into our prototype, Profile directed Event driven Dynamic AnaLysis (PEDAL). The runtime data consisting of synchronization points is used as a source of feedback for PEDAL. To demonstrate utility, we evaluate the performance of PEDAL against those of two systematic concurrency testing tools. The findings demonstrate that PEDAL can detect concurrency bugs more quickly with given inputs, and consuming less memory. To prove its scalability in a big data environment, we use PEDAL to analyze several real concurrency bugs in large scale multithread programs, namely: Apache, and MySQL.  相似文献   

19.
Software Distributed Shared Memory (DSM) systems can be used to provide a coherent shared address space on multicomputers and other parallel systems without support for shared memory in hardware. The coherency software automatically translates shared memory accesses to explicit messages exchanged among the nodes in the system. Many applications exhibit a good performance on such systems but it has been shown that, for some applications, performance critical messages can be delayed behind less important messages because of the enqueuing behavior in the communication libraries used in current systems. We present in this paper a new portable communication library that supports priorities to remedy this situation. We describe an implementation of the communication library and a quantitative model that is used to estimate the performance impact of priorities for a typical situation. Using the model, we show that the use of high-priority communication reduces the latency of performance critical messages substantially over a wide range of network design parameters. The latency is reduced with up to 10–25% for each delaying low priority message in the queue ahead.  相似文献   

20.
Poxvirus Orthologous Clusters (POCs) is a JAVA client-server application which accesses an updated database containing all complete poxvirus genomes; it automatically groups orthologous genes into families based on BLASTP scores for assessment by a human database curator. POCs has a user-friendly interface permitting complex SQL queries to retrieve interesting groups of DNA and protein sequences as well as gene families for subsequent interrogation by a variety of integrated tools: BLASTP, BLASTX, TBLASTN, Jalview (multiple alignment), Dotlet (Dotplot), Laj (local alignment), and NAP (nucleotide to amino acid alignment).  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号