Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors
Authors: Sandra Catalán, Francisco D. Igual, Rafael Mayo, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí
Affiliation: 1. Depto. Ingeniería y Ciencia de Computadores, Universidad Jaume I, Castellón de la Plana, Spain; 2. Depto. de Arquitectura de Computadores y Automática, Universidad Complutense de Madrid, Madrid, Spain
Abstract: Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances, where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high-performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications on clusters of commodity systems-on-chip. In this paper, we design and embed several architecture-aware optimizations into a multi-threaded general matrix multiplication (gemm), a key operation of the BLAS, in order to obtain a high-performance implementation for ARM big.LITTLE AMPs. Our solution is based on the reference implementation of gemm in the BLIS library, and integrates a cache-aware configuration as well as asymmetric-static and dynamic scheduling strategies that carefully tune and distribute the operation's micro-kernels among the big and LITTLE cores of the target processor. The experimental results on a Samsung Exynos 5422, a system-on-chip with ARM Cortex-A15 and Cortex-A7 clusters that implements the big.LITTLE model, show that our cache-aware versions of gemm with asymmetric scheduling attain significant performance gains over their architecture-oblivious counterparts, while exploiting all the resources of the AMP to deliver considerable energy efficiency.
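The two scheduling strategies named in the abstract can be illustrated with a minimal Python sketch. This is not code from the paper or from BLIS: the function names, the integer weights, and the per-chunk times are hypothetical values chosen only for illustration. The first function shows an asymmetric-static split, where the iteration space of a loop is divided once between the big and LITTLE clusters in proportion to assumed weights. The second simulates dynamic scheduling, where each cluster takes the next chunk as soon as it becomes idle.

```python
import heapq


def asymmetric_static_split(n_iters, big_weight, little_weight, multiple=1):
    """Split range(n_iters) into one contiguous range per cluster.

    The big cluster receives about big_weight / (big_weight + little_weight)
    of the iterations, rounded down to a multiple of `multiple` (for example,
    a block size). The weights are assumed integers, not measured values.
    """
    total = big_weight + little_weight
    if n_iters < 0 or multiple < 1 or total <= 0:
        raise ValueError("invalid partition parameters")
    big_count = n_iters * big_weight // total
    big_count -= big_count % multiple
    return range(0, big_count), range(big_count, n_iters)


def simulate_dynamic(n_chunks, chunk_times):
    """Simulate dynamic scheduling of equally sized chunks.

    chunk_times maps a (hypothetical) cluster name to the integer time it
    needs per chunk. Whichever cluster becomes idle first takes the next
    chunk; ties are broken by name. Returns the number of chunks each
    cluster processed.
    """
    idle_at = [(0, name) for name in sorted(chunk_times)]
    heapq.heapify(idle_at)
    done = {name: 0 for name in chunk_times}
    for _ in range(n_chunks):
        now, name = heapq.heappop(idle_at)
        done[name] += 1
        heapq.heappush(idle_at, (now + chunk_times[name], name))
    return done


# Static: 100 iterations, assumed 3:1 weighting, blocks of 4 iterations.
big_part, little_part = asymmetric_static_split(100, 3, 1, multiple=4)

# Dynamic: 8 chunks, assuming the LITTLE cluster is 3x slower per chunk.
counts = simulate_dynamic(8, {"big": 1, "little": 3})
```

In the static variant the ratio has to be chosen ahead of time, so a poor estimate leaves one cluster idle. The dynamic variant adapts to the observed speeds at the cost of coordinating access to the shared work counter.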
This article is indexed in SpringerLink and other databases.