ARM support has been added to the mainline ATLAS distribution, and the current stable version, 3.10.0, supports ARM. Benchmarks for a multicore ARM processor (1 GHz dual-core, PandaBoard) are online at http://www.vesperix.com/arm/atlas-arm/bench/gcc-a9-3.10.0-nonieee/index.html . In short, for a matrix size of 2000 x 2000:
MFLOPs for SGEMM using one core = 1500
MFLOPs for DGEMM using one core = 757
MFLOPs for DGEMM using two cores = 1426
MFLOPs for SGETRF using one core = 1347
MFLOPs for DGETRF using one core = 700
MFLOPs for SGETRF using two cores = 2312
MFLOPs for DGETRF using two cores = 1224
The speedup is close to linear for two cores (about 1.7x to 1.9x in the figures above). I am keen to see how it trends as more cores are added. How will the cache behave, especially for sparse matrix operations, where memory access is normally irregular?
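To put a number on "close to linear", here is a minimal sketch that computes speedup and parallel efficiency from the one-core and two-core figures quoted above. The function names are my own, and the 700/1224 pair is assumed to be DGETRF since it sits alongside the SGETRF rows.

```python
# MFLOPS figures quoted above for 2000 x 2000 matrices: (one core, two cores).
# Assumption: the 700 / 1224 pair belongs to DGETRF.
results = {
    "DGEMM": (757, 1426),
    "SGETRF": (1347, 2312),
    "DGETRF (assumed)": (700, 1224),
}


def speedup(one_core, two_cores):
    """Ratio of the multi-core rate to the single-core rate."""
    return two_cores / one_core


def efficiency(one_core, two_cores, cores=2):
    """Speedup divided by core count; 1.0 would be perfectly linear scaling."""
    return speedup(one_core, two_cores) / cores


for name, (one, two) in results.items():
    s = speedup(one, two)
    e = efficiency(one, two)
    print(f"{name}: speedup {s:.2f}x, efficiency {e:.0%}")
```

An efficiency near 100% means the second core is almost fully used; the further it drops as cores are added, the more time is going to shared resources such as cache and memory bandwidth.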
ODROID-X - Quad-Core ARM Cortex A9 - 1GB RAM - 32/32 L1 and 1MB L2 cache - I am running Linux Linaro - Kernel 3.6.2
Currently, the highest core count available on an ARM processor on the market is 4. The Samsung Exynos4412 is one such processor, with each core running at 1.4 GHz. The Exynos4412 ships on the ODROID-X, a $129 single-board computer with 1GB RAM. The ODROID-X seems the best candidate for doing some multicore HPC.
I might be celebrating my EID with BLAS and ARM :P
-----
Jargon Alert !!
BLAS: Basic Linear Algebra Subprograms.
MFLOPs: Millions of Floating Point Operations per Second.
S: Single Precision.
D: Double Precision.
GE: General Matrix.
MM: Matrix Multiply - a subroutine in BLAS.
TRF: Triangular Factorization (LU factorization) - GETRF is a LAPACK routine that ATLAS also provides.
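To make the MFLOPs numbers a bit more concrete, here is a small sketch that converts between run time and MFLOPs for a matrix multiply. It assumes the conventional count of 2*m*n*k floating point operations for GEMM; the benchmark's own timer may count slightly differently, and the helper names are mine.

```python
def gemm_flops(m, n, k):
    """Conventional operation count for C = A * B, with A (m x k) and B (k x n)."""
    return 2 * m * n * k


def mflops(flops, seconds):
    """Rate in millions of floating point operations per second."""
    return flops / seconds / 1e6


def seconds_at(flops, rate_mflops):
    """Time needed to perform `flops` operations at a given MFLOPs rate."""
    return flops / (rate_mflops * 1e6)


n = 2000
flops = gemm_flops(n, n, n)

# Time implied by the single-core SGEMM and DGEMM rates quoted in the post.
for name, rate in (("SGEMM", 1500), ("DGEMM", 757)):
    print(f"{name}: {flops:.2e} flops at {rate} MFLOPs takes {seconds_at(flops, rate):.1f} s")
```

Going the other way, timing a GEMM call yourself and passing the elapsed seconds to `mflops` gives a figure comparable to the ones listed above, under the same operation-count assumption.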
some links worth visiting
