Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Page 1: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Performance Libraries: Intel Math Kernel Library (MKL)

Intel Software College

Page 2: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Copyright © 2006, Intel Corporation. All rights reserved.

Performance Libraries: Intel® Math Kernel Library (MKL)

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. *Other brands and names are the property of their respective owners.

Agenda

Introduction

• Purpose of Library

• Intel® Math Kernel Library (Intel® MKL) Contents

Performance Features

• Resource Limited Optimization

• Threading

Using the Library

The Library Sections

• BLAS

• LAPACK*

• DFTs

• VML

• VSL

Page 3: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Intel® Math Kernel Library Purpose

Performance, Performance, Performance!

Intel’s engineering, scientific, and financial math library

Addresses:

• Solvers (BLAS, LAPACK)

• Eigenvector/eigenvalue solvers (BLAS, LAPACK)

• Some quantum chemistry needs (dgemm)

• PDEs, signal processing, seismic, solid-state physics (FFTs)

• General scientific, financial [vector transcendental functions (VML) and vector random number generators (VSL)]

Tuned for Intel® processors – current and future

Page 4: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Intel® Math Kernel Library Purpose – Don’ts

But don’t use Intel® Math Kernel Library (Intel® MKL) on …

Don’t use Intel® MKL on “small” counts.

Don’t call vector math functions on small n.

Geometric transformation: [X' Y' Z' W'] = (4x4 transformation matrix) × [X Y Z W]

• But you could use Intel® Performance Primitives.
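For a count this small, a hand-written loop is usually all that is needed. The following is a minimal sketch in plain C, assuming a row-major 4x4 matrix stored in a flat array; the name `transform_point` is hypothetical and is not an Intel MKL or Intel Performance Primitives routine.

```c
#include <stddef.h>

/* Hypothetical helper (not an MKL or IPP routine): out = t * in, where t is
   a row-major 4x4 transformation matrix stored as 16 contiguous doubles and
   in/out are homogeneous points (X, Y, Z, W). Only 16 multiply-adds are
   involved, so there is very little work for a tuned library to speed up. */
void transform_point(const double *t, const double *in, double *out)
{
    for (size_t r = 0; r < 4; r++) {
        double sum = 0.0;
        for (size_t c = 0; c < 4; c++)
            sum += t[4 * r + c] * in[c];
        out[r] = sum;
    }
}
```

Calling it with a translation matrix and the point (1, 1, 1, 1) simply adds the translation offsets to the first three coordinates.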

Page 5: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Intel® Math Kernel Library Contents

BLAS (Basic Linear Algebra Subroutines)

Level 1 BLAS – vector-vector operations

• 15 function types

• 48 functions

Level 2 BLAS – matrix-vector operations

• 26 function types

• 66 functions

Level 3 BLAS – matrix-matrix operations

• 9 function types

• 30 functions

Extended BLAS – level 1 BLAS for sparse vectors

• 8 function types

• 24 functions

Page 6: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Intel® Math Kernel Library Contents

LAPACK (linear algebra package)

Solvers and eigensolvers. Many hundreds of routines total!

There are more than 1000 user-callable and support routines in total.

DFTs (Discrete Fourier transforms)

Mixed radix, multi-dimensional transforms

Multithreaded

VML (Vector Math Library)

Set of vectorized transcendental functions

Most libm functions, but faster

VSL (Vector Statistical Library)

Set of vectorized random number generators

Page 7: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Intel® Math Kernel Library Contents

BLAS and LAPACK* are both Fortran.

• Legacy of high performance computation

VSL and VML have Fortran and C interfaces.

DFTs have Fortran 95 and C interfaces.

CBLAS interface: makes it more convenient for a C/C++ programmer to call BLAS.

Page 8: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Intel® Math Kernel Library (Intel® MKL) Environment

Support 32-bit and 64-bit Intel® processors

Large set of examples and tests

Extensive documentation

              Windows*                 Linux*
Compilers     Intel, CVF, Microsoft    Intel, GNU
Libraries     .dll, .lib               .a, .so

Page 9: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Resource Limited Optimization

The goal of all optimization is maximum speed.

Resource limited optimization – exhaust one or more resources of the system:

• CPU: Register use, FP units.

• Cache: Keep data in cache as long as possible; deal with cache interleaving.

• TLBs: Maximally use data on each page.

• Memory bandwidth: Minimally access memory.

• Computer: Use all the processors available using threading.

• System: Use all the nodes available (cluster software).

Page 10: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Threading

Most of Intel® Math Kernel Library (Intel® MKL) could be threaded but:

• Limited resource is memory bandwidth.

• Threading level 1 and level 2 BLAS is mostly ineffective ( O(n) and O(n^2) work on equally sized data, so memory bandwidth dominates )

There are numerous opportunities for threading:

• Level 3 BLAS ( O(n^3) )

• LAPACK* ( O(n^3) )

• FFTs ( O(n log(n)) )

• VML, VSL – depends on processor and function

All threading is via OpenMP*.

All Intel MKL is designed and compiled for thread safety.

Page 11: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Linking with Intel® Math Kernel Library (Intel® MKL)

Scenario 1: ifort, BLAS, IA-32 processor:

ifort myprog.f mkl_c.lib

Scenario 2: CVF, LAPACK, IA-32 processor:

f77 myprog.f mkl_s.lib

Scenario 3: Statically link a C program with DLL linked at runtime:

link myprog.obj mkl_c_dll.lib

Note: Optimal binary code will execute at run time based on processor.

Page 12: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

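Matrix Multiplication: Roll Your Own/Dot Product

Roll Your Own

    /* a is n x kk, b is kk x m, c is n x m and must start at zero */
    for (i = 0; i < n; i++) {
        for (j = 0; j < m; j++) {
            for (k = 0; k < kk; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
    }

ddot

    /* row i of a dotted with column j of b; the dot product has kk terms */
    for (i = 0; i < n; i++) {
        for (j = 0; j < m; j++)
            c[i][j] = cblas_ddot(kk, &a[i][0], incx, &b[0][j], incy);
    }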
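The ddot version depends on the stride arguments incx and incy, which the slide leaves undefined. Assuming row-major storage, a row of a is contiguous (stride 1), while a column of b steps by the row length of b. Below is a plain-C reference for that stride rule; `ref_ddot` is a hypothetical name, not the MKL implementation, and it handles positive strides only.

```c
#include <stddef.h>

/* Hypothetical reference for the stride semantics of a dot product:
   returns the sum over t = 0..n-1 of x[t*incx] * y[t*incy].
   Positive strides only; this is an illustration, not MKL code. */
double ref_ddot(size_t n, const double *x, size_t incx,
                const double *y, size_t incy)
{
    double sum = 0.0;
    for (size_t t = 0; t < n; t++)
        sum += x[t * incx] * y[t * incy];
    return sum;
}
```

For a 2x3 matrix a and a 3x2 matrix b stored row-major in flat arrays, element (0, 1) of the product would be `ref_ddot(3, &a[0], 1, &b[1], 2)`.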

Page 13: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

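Matrix Multiplication: DGEMV/DGEMM

dgemv

    /* one matrix-vector product per column j of b; ldb and ldc act as the
       strides between consecutive elements of a column in row-major storage */
    for (j = 0; j < m; j++)
        cblas_dgemv(CblasRowMajor, CblasNoTrans, n, kk, alpha, &a[0][0], lda,
                    &b[0][j], ldb, beta, &c[0][j], ldc);

dgemm

    /* column-major call with a and b swapped yields the row-major c = a * b */
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, m, n, kk,
                alpha, &b[0][0], ldb, &a[0][0], lda, beta, &c[0][0], ldc);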
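The dgemm call declares column-major layout yet passes b before a. A row-major n x m array read as column-major is its transpose, and the transpose of a * b is the transpose of b times the transpose of a, so the swapped call produces the row-major product. The sketch below demonstrates the identity with a hypothetical reference routine, `ref_dgemm_colmajor`, a plain triple loop written for illustration. It is not the MKL implementation and it omits the transposition options.

```c
#include <stddef.h>

/* Hypothetical reference routine: C = alpha*A*B + beta*C with every matrix
   stored column-major; A is M x K, B is K x N, C is M x N. */
void ref_dgemm_colmajor(size_t M, size_t N, size_t K, double alpha,
                        const double *A, size_t lda,
                        const double *B, size_t ldb,
                        double beta, double *C, size_t ldc)
{
    for (size_t j = 0; j < N; j++) {
        for (size_t i = 0; i < M; i++) {
            double sum = 0.0;
            for (size_t k = 0; k < K; k++)
                sum += A[i + k * lda] * B[k + j * ldb];
            /* When beta is zero, C is treated as write-only. */
            double prev = (beta == 0.0) ? 0.0 : beta * C[i + j * ldc];
            C[i + j * ldc] = alpha * sum + prev;
        }
    }
}

/* Row-major product c (n x m) = a (n x kk) * b (kk x m), obtained by
   swapping the operands in the same way as the slide's dgemm call. */
void rowmajor_multiply(size_t n, size_t m, size_t kk,
                       const double *a, const double *b, double *c)
{
    ref_dgemm_colmajor(m, n, kk, 1.0, b, m, a, kk, 0.0, c, m);
}
```

The leading dimensions passed here are the row lengths of the row-major arrays (m for b and c, kk for a), which is what ldb, lda and ldc would have to be in the slide's call under the same storage assumption.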

Page 14: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Activity 1: DGEMM

Compare the performance of matrix multiply as implemented by C source code, DDOT, DGEMV and DGEMM.

Exercise control of the threading capabilities in MKL/BLAS.

Page 15: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Intel® Math Kernel Library Optimizations in LAPACK*

Most important LAPACK optimizations:

• Threading – effectively uses multiple CPUs

• Recursive factorization

  • Reduces scalar time (Amdahl’s law: t = t_scalar + t_parallel/p)

  • Extends blocking further into the code

No runtime library support required
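The Amdahl's law expression on this slide can be evaluated directly to see why reducing scalar time matters. A minimal sketch follows; the function names are hypothetical, and the times are in arbitrary units.

```c
/* Run time predicted by the slide's form of Amdahl's law:
   t = t_scalar + t_parallel / p, for p processors. */
double amdahl_time(double t_scalar, double t_parallel, unsigned p)
{
    return t_scalar + t_parallel / (double)p;
}

/* Speedup relative to running the same work on one processor (p = 1). */
double amdahl_speedup(double t_scalar, double t_parallel, unsigned p)
{
    return (t_scalar + t_parallel) / amdahl_time(t_scalar, t_parallel, p);
}
```

With illustrative values t_scalar = 1 and t_parallel = 8, four processors give t = 3 and a speedup of 3. Halving the scalar part to 0.5 raises the speedup to 3.4 without adding any processors.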

Page 16: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Discrete Fourier Transforms

One dimensional, two-dimensional, three-dimensional…

Multithreaded

Mixed radix

User-specified scaling, transform sign

Transforms on embedded matrices

Multiple one-dimensional transforms on single call

Strides

C and F90 interfaces

Page 17: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Using the Intel® Math Kernel Library DFTs

Basically a 3-step Process

Create a descriptor.

• Status = DftiCreateDescriptor(MDH, …)

Commit the descriptor (instantiates it).

• Status = DftiCommitDescriptor(MDH)

Perform the transform.

• Status = DftiComputeForward(MDH, X)

Optionally free the descriptor.

MDH: MyDescriptorHandle

Now supports FFTW interface

Page 18: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Vector Math Library (VML) Features/Issues

Vector Math Library: vectorized transcendental functions – like libm but better (faster)

Interface: Have both Fortran and C interfaces

Multiple accuracies

• High accuracy ( < 1 ulp )

• Lower accuracy, faster ( < 4 ulps )

Special value handling √(-a), sin(0), and so on

Error handling – cannot duplicate libm here
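The accuracy modes on this slide are stated in ulps (units in the last place). To illustrate the unit, the sketch below counts how many representable doubles separate two values. `ulp_distance` is a hypothetical helper, not a VML function; it assumes IEEE 754 doubles and finite inputs of the same sign, and it is only a coarse integer stand-in for the error measure used in accuracy claims, which compares against the exact mathematical result.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of representable doubles between x and y.
   Assumes IEEE 754 binary64 and finite inputs with the same sign, so that
   the bit patterns are ordered like the magnitudes. */
int64_t ulp_distance(double x, double y)
{
    int64_t ix, iy;
    memcpy(&ix, &x, sizeof ix);   /* reinterpret bits without aliasing issues */
    memcpy(&iy, &y, sizeof iy);
    return ix > iy ? ix - iy : iy - ix;
}
```

A result of 0 means the two values are bit-identical; a result of 1 means they are adjacent doubles.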

Page 19: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

VML: Why Does It Matter?

It is important for financial codes (Monte Carlo simulations).

• Exponentials, logarithms

Other scientific codes depend on transcendental functions.

Error functions can be big time sinks in some codes.

And so on

Page 20: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Vector Statistical Library (VSL)

Set of random number generators (RNGs)

Numerous non-uniform distributions

VML used extensively for transformations

Parallel computation support – some functions

User can supply own BRNG or transformations

Five basic RNGs (BRNGs) – bits, integer, FP

• MCG31, R250, MRG32, MCG59, WH

Page 21: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Non-Uniform RNGs

Gaussian (two methods)

Exponential

Laplace

Weibull

Cauchy

Rayleigh

Lognormal

Gumbel

Page 22: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Using VSL

Basically a 3-step Process

Create a stream pointer.

VSLStreamStatePtr stream;

Create a stream.

vslNewStream(&stream, VSL_BRNG_MCG31, seed );

Generate a set of random numbers.

vsRngUniform( 0, stream, size, out, start, end );

Delete a stream (optional).

vslDeleteStream(&stream);

Page 23: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Activity: Calculating Pi using a Monte Carlo method

Compare the performance of C source code (RAND function) and VSL.

Exercise control of the threading capabilities in MKL/VSL.
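The C-source side of this comparison might look like the sketch below. It is an assumption about the shape of the activity, not the course's lab code, and the function name `estimate_pi_rand` is hypothetical. It samples points in the unit square with the standard rand() function and counts those that fall inside the quarter circle of radius 1.

```c
#include <stdlib.h>

/* Hypothetical baseline: estimate pi with one rand() call per coordinate.
   The fraction of points with x*x + y*y <= 1 approximates pi/4. */
double estimate_pi_rand(unsigned seed, long samples)
{
    long inside = 0;
    srand(seed);
    for (long i = 0; i < samples; i++) {
        double x = (double)rand() / RAND_MAX;
        double y = (double)rand() / RAND_MAX;
        if (x * x + y * y <= 1.0)
            inside++;
    }
    return 4.0 * (double)inside / (double)samples;
}
```

A VSL variant would presumably replace the per-sample rand() calls with bulk generation into a buffer, using a call such as vsRngUniform from the "Using VSL" slide, and then count the hits over that buffer.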

Page 24: Performance Libraries: Intel Math Kernel Library (MKL) Intel Software College.

Performance Libraries: Intel® MKL – What’s Been Covered

Intel® Math Kernel Library is a broad scientific/engineering math library.

It is optimized for Intel® processors.

It is threaded for effective use on SMP machines.
