## Thursday, June 11, 2020

### Benchmarking Intel-based NumPy packages

Intel ships its own distribution (a modified build) of the NumPy, SciPy, and scikit-learn packages, linked against its Math Kernel Library (MKL). Intel claims these builds are faster than the stock packages. To test this claim, I ran a small benchmark on two similar machines, a 2012 MacBook Pro (MBP) and a Lenovo G580. Both have an i7-3520 processor and 16 GB of RAM. This is the code I used for the benchmark.

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-

# Roughly based on: http://stackoverflow.com/questions/11443302/compiling-numpy-with-openblas-integration

from __future__ import print_function

import numpy as np
from time import time

# Let's take the randomness out of random numbers (for reproducibility)
np.random.seed(0)

size = 4096
A, B = np.random.random((size, size)), np.random.random((size, size))
C, D = np.random.random((size * 128,)), np.random.random((size * 128,))
E = np.random.random((int(size / 2), int(size / 4)))
F = np.random.random((int(size / 2), int(size / 2)))
F = np.dot(F, F.T)
G = np.random.random((int(size / 2), int(size / 2)))

# Matrix multiplication
N = 20
t = time()
for i in range(N):
    np.dot(A, B)
delta = time() - t
print('Dotted two %dx%d matrices in %0.2f s.' % (size, size, delta / N))
del A, B

# Vector multiplication
N = 5000
t = time()
for i in range(N):
    np.dot(C, D)
delta = time() - t
print('Dotted two vectors of length %d in %0.2f ms.' % (size * 128, 1e3 * delta / N))
del C, D

# Singular Value Decomposition (SVD)
N = 3
t = time()
for i in range(N):
    np.linalg.svd(E, full_matrices=False)
delta = time() - t
print("SVD of a %dx%d matrix in %0.2f s." % (size / 2, size / 4, delta / N))
del E

# Cholesky Decomposition
N = 3
t = time()
for i in range(N):
    np.linalg.cholesky(F)
delta = time() - t
print("Cholesky decomposition of a %dx%d matrix in %0.2f s." % (size / 2, size / 2, delta / N))

# Eigendecomposition
t = time()
for i in range(N):
    np.linalg.eig(G)
delta = time() - t
print("Eigendecomposition of a %dx%d matrix in %0.2f s." % (size / 2, size / 2, delta / N))

print('')
print('This was obtained using the following Numpy configuration:')
np.__config__.show()
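
The script above times each operation with `time()` and reports the mean over a few runs. As a possible refinement, the timing loop could be factored into a helper that uses `time.perf_counter` and also reports the best run, which is usually less sensitive to background load. This is only a sketch: the helper name `bench` and its `repeats` and `warmup` parameters are my own and are not part of the original script, and the workload here is a plain-Python stand-in.

```python
from time import perf_counter


def bench(fn, repeats=5, warmup=1):
    """Time fn() and return (best, mean) wall-clock seconds over `repeats` runs.

    `bench`, `repeats` and `warmup` are hypothetical names for this sketch.
    """
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    samples = []
    for _ in range(repeats):
        start = perf_counter()
        fn()
        samples.append(perf_counter() - start)
    return min(samples), sum(samples) / len(samples)


# Stand-in workload; in the benchmark this would be one of the NumPy calls.
best, mean = bench(lambda: sum(i * i for i in range(100000)), repeats=3)
print("best %.4f s, mean %.4f s" % (best, mean))
```

In the benchmark script, a call such as `bench(lambda: np.dot(A, B), repeats=20)` would take the place of the hand-written loop for the matrix multiplication.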



And here are the results.

RESULT ON MBP-2012 (With Intel-Numpy)
bta@mbp:~$ python3.6 np-blas.py
Dotted two 4096x4096 matrices in 3.12 s.
Dotted two vectors of length 524288 in 0.37 ms.
SVD of a 2048x1024 matrix in 1.10 s.
Cholesky decomposition of a 2048x2048 matrix in 0.21 s.
Eigendecomposition of a 2048x2048 matrix in 8.68 s.

This was obtained using the following Numpy configuration:
mkl_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
blas_mkl_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
blas_opt_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
lapack_mkl_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
lapack_opt_info:
libraries = ['mkl_rt', 'pthread']
library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']


RESULT ON G580 (With Numpy)
bta@g580:~$ python3.6 np-blas.py
Dotted two 4096x4096 matrices in 4.13 s.
Dotted two vectors of length 524288 in 0.43 ms.
SVD of a 2048x1024 matrix in 1.97 s.
Cholesky decomposition of a 2048x2048 matrix in 0.19 s.
Eigendecomposition of a 2048x2048 matrix in 13.07 s.

This was obtained using the following Numpy configuration:
blas_mkl_info:
NOT AVAILABLE
blis_info:
NOT AVAILABLE
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
NOT AVAILABLE
openblas_lapack_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
language = c
define_macros = [('HAVE_CBLAS', None)]


RESULT ON G580 (with Intel-Numpy)
bta@g580:~$ python3.6 np-blas.py
Dotted two 4096x4096 matrices in 3.92 s.
Dotted two vectors of length 524288 in 0.46 ms.
SVD of a 2048x1024 matrix in 1.28 s.
Cholesky decomposition of a 2048x2048 matrix in 0.30 s.
Eigendecomposition of a 2048x2048 matrix in 10.17 s.

This was obtained using the following Numpy configuration:
mkl_info:
library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
blas_mkl_info:
library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
blas_opt_info:
library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
lapack_mkl_info:
library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
lapack_opt_info:
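
To make the comparison easier to read, the two G580 runs can be turned into speedup ratios. Only the G580 runs are compared here because they come from the same machine. The sketch below simply copies the timings reported above; the dictionary and function names are my own, and since each figure comes from a single benchmark run, small differences may well be noise.

```python
# Timings copied from the two G580 runs above
# (seconds, except the vector dot product, which is in milliseconds).
g580_openblas = {
    "matrix dot (4096x4096)": 4.13,
    "vector dot (524288)": 0.43,
    "SVD (2048x1024)": 1.97,
    "Cholesky (2048x2048)": 0.19,
    "eigendecomposition (2048x2048)": 13.07,
}
g580_intel = {
    "matrix dot (4096x4096)": 3.92,
    "vector dot (524288)": 0.46,
    "SVD (2048x1024)": 1.28,
    "Cholesky (2048x2048)": 0.30,
    "eigendecomposition (2048x2048)": 10.17,
}


def speedups(baseline, candidate):
    """Return baseline/candidate time ratios; a value above 1 means candidate is faster."""
    return {name: baseline[name] / candidate[name] for name in baseline}


ratios = speedups(g580_openblas, g580_intel)
for name, ratio in ratios.items():
    print("%-32s %.2fx" % (name, ratio))
```

On these numbers the Intel build is clearly faster for SVD (about 1.5x) and eigendecomposition (about 1.3x), roughly level on matrix multiplication, and slower on the Cholesky decomposition and the vector dot product.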

