## Thursday, June 11, 2020

### Benchmarking Intel-based Numpy packages

Intel distributes its own modified builds of the NumPy, SciPy, and scikit-learn packages and claims that they are faster than the standard builds. To test this claim, I ran a small benchmark on two similar machines, a 2012 MacBook Pro (MBP) and a Lenovo G580. Both have an Intel Core i7-3520 processor and 16 GB of RAM. This is the benchmark script:

```python
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-

# Roughly based on: http://stackoverflow.com/questions/11443302/compiling-numpy-with-openblas-integration

from __future__ import print_function

import numpy as np
from time import time

# Let's take the randomness out of random numbers (for reproducibility)
np.random.seed(0)

size = 4096
A, B = np.random.random((size, size)), np.random.random((size, size))
C, D = np.random.random((size * 128,)), np.random.random((size * 128,))
E = np.random.random((int(size / 2), int(size / 4)))
F = np.random.random((int(size / 2), int(size / 2)))
F = np.dot(F, F.T)  # F F^T is symmetric positive definite, as Cholesky requires
G = np.random.random((int(size / 2), int(size / 2)))

# Matrix multiplication
N = 20
t = time()
for i in range(N):
    np.dot(A, B)
delta = time() - t
print('Dotted two %dx%d matrices in %0.2f s.' % (size, size, delta / N))
del A, B

# Vector multiplication
N = 5000
t = time()
for i in range(N):
    np.dot(C, D)
delta = time() - t
print('Dotted two vectors of length %d in %0.2f ms.' % (size * 128, 1e3 * delta / N))
del C, D

# Singular Value Decomposition (SVD)
N = 3
t = time()
for i in range(N):
    np.linalg.svd(E, full_matrices=False)
delta = time() - t
print("SVD of a %dx%d matrix in %0.2f s." % (size / 2, size / 4, delta / N))
del E

# Cholesky Decomposition
N = 3
t = time()
for i in range(N):
    np.linalg.cholesky(F)
delta = time() - t
print("Cholesky decomposition of a %dx%d matrix in %0.2f s." % (size / 2, size / 2, delta / N))

# Eigendecomposition
N = 3
t = time()
for i in range(N):
    np.linalg.eig(G)
delta = time() - t
print("Eigendecomposition of a %dx%d matrix in %0.2f s." % (size / 2, size / 2, delta / N))

print('')
print('This was obtained using the following Numpy configuration:')
np.__config__.show()
```
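The script above averages all iterations of each loop with `time()`, so a single slow call (for example one-off setup overhead on the first call) is folded into the mean. A slightly more careful timer runs an untimed warm-up call first and then reports both the best and the median of several individually timed runs. What follows is a minimal sketch that assumes `time.perf_counter` is an acceptable clock for this purpose. `bench` is a hypothetical helper name and is not part of the original script:

```python
import time
from statistics import median


def bench(func, repeats=5, warmup=1):
    """Time func() and return (best, median) wall-clock seconds.

    Hypothetical helper: `warmup` untimed calls run first so one-off setup
    costs are not counted, then `repeats` calls are timed individually.
    """
    for _ in range(warmup):
        func()  # untimed warm-up call
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        func()
        samples.append(time.perf_counter() - start)
    return min(samples), median(samples)
```

As an example, `best, med = bench(lambda: np.dot(A, B), repeats=5)` would time the 4096x4096 product if placed before the `del A, B` line. A large gap between the best and the median suggests the measurement is noisy.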



Here are the results.

RESULT ON MBP-2012 (with Intel-Numpy)

```
bta@mbp:~$ python3.6 np-blas.py
Dotted two 4096x4096 matrices in 3.12 s.
Dotted two vectors of length 524288 in 0.37 ms.
SVD of a 2048x1024 matrix in 1.10 s.
Cholesky decomposition of a 2048x2048 matrix in 0.21 s.
Eigendecomposition of a 2048x2048 matrix in 8.68 s.

This was obtained using the following Numpy configuration:
mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
```

RESULT ON G580 (with standard Numpy, OpenBLAS backend)

```
bta@g580:~$ python3.6 np-blas.py
Dotted two 4096x4096 matrices in 4.13 s.
Dotted two vectors of length 524288 in 0.43 ms.
SVD of a 2048x1024 matrix in 1.97 s.
Cholesky decomposition of a 2048x2048 matrix in 0.19 s.
Eigendecomposition of a 2048x2048 matrix in 13.07 s.

This was obtained using the following Numpy configuration:
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
```


RESULT ON G580 (with Intel-Numpy)

```
bta@g580:~$ python3.6 np-blas.py
Dotted two 4096x4096 matrices in 3.92 s.
Dotted two vectors of length 524288 in 0.46 ms.
SVD of a 2048x1024 matrix in 1.28 s.
Cholesky decomposition of a 2048x2048 matrix in 0.30 s.
Eigendecomposition of a 2048x2048 matrix in 10.17 s.

This was obtained using the following Numpy configuration:
mkl_info:
    library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
blas_mkl_info:
    library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
blas_opt_info:
    library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
lapack_mkl_info:
    library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
lapack_opt_info:
    library_dirs = ['/opt/anaconda1anaconda2anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/anaconda1anaconda2anaconda3/include']
```


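These results largely support Intel's claim (as shown in the figure below). On the G580, the Intel build was faster than the standard OpenBLAS build for matrix multiplication (3.92 s vs. 4.13 s), SVD (1.28 s vs. 1.97 s), and eigendecomposition (10.17 s vs. 13.07 s). However, it was slower for the Cholesky decomposition (0.30 s vs. 0.19 s) and marginally slower for the vector dot product (0.46 ms vs. 0.43 ms).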
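The two G580 runs can be compared directly by computing a speedup ratio for each operation: the standard-build time divided by the Intel-build time. A value above 1 means the Intel build was faster. The sketch below uses timings copied by hand from the G580 outputs above. The dictionary keys and the `speedups` helper are illustrative names only:

```python
# G580 timings copied from the runs above (seconds; the vector dot is in ms,
# but since both builds use the same unit the ratio is unaffected).
openblas = {"matmul": 4.13, "vector dot": 0.43, "SVD": 1.97, "Cholesky": 0.19, "eig": 13.07}
intel = {"matmul": 3.92, "vector dot": 0.46, "SVD": 1.28, "Cholesky": 0.30, "eig": 10.17}


def speedups(baseline, candidate):
    """Ratio baseline/candidate per operation; > 1 means candidate is faster."""
    return {name: baseline[name] / candidate[name] for name in baseline}


for name, ratio in speedups(openblas, intel).items():
    print(f"{name:>10}: {ratio:.2f}x")
```

This prints roughly 1.05x for matmul, 0.93x for the vector dot, 1.54x for SVD, 0.63x for Cholesky, and 1.29x for eig. The largest gains are therefore in SVD and eigendecomposition.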

So, at least for these workloads, installing the Intel Distribution for Python is worthwhile. To install Intel's versions of NumPy, SciPy, and scikit-learn, first uninstall the original packages, then install Intel's builds, e.g., with pip:

```shell
pip3 uninstall numpy scipy scikit-learn -y
pip3 install --user intel-numpy intel-scipy intel-scikit-learn
```
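After reinstalling, it is worth confirming that NumPy is actually linked against MKL rather than the previous backend. In the configuration dumps above, the MBP's Intel build lists `mkl_rt` among its libraries, while the standard G580 build lists `openblas`. The sketch below automates that check as a rough heuristic. It assumes the older `show_config()` text format shown in this post; newer NumPy releases print a different layout, so the check may report `unknown` there. Both function names are hypothetical:

```python
import contextlib
import io


def numpy_config_text():
    """Return the text that numpy.show_config() prints, as a string."""
    import numpy as np  # imported lazily so guess_blas() works without NumPy
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        np.show_config()
    return buf.getvalue()


def guess_blas(config_text):
    """Heuristic guess of the BLAS backend named in a show_config() dump.

    'mkl_rt' is checked first. 'NOT AVAILABLE' sections such as
    'blas_mkl_info:' do not contain 'mkl_rt', so they do not trigger a
    false MKL match on an OpenBLAS build.
    """
    text = config_text.lower()
    if "mkl_rt" in text:
        return "mkl"
    if "openblas" in text:
        return "openblas"
    return "unknown"
```

On a system whose output looks like the MBP dump above, `print(guess_blas(numpy_config_text()))` should print `mkl`.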