SIMD inside and outside oracle 12c

SIMD Instructions outside and inside Oracle 12c Laurent Léturgez 2016

Page 1: SIMD inside and outside oracle 12c

SIMD Instructions outside and inside Oracle 12c
Laurent Léturgez – 2016

Page 2: SIMD inside and outside oracle 12c

Whoami

•Oracle Consultant since 2001

•Former developer (C, Java, perl, PL/SQL)

•Owner@Premiseo: Data Management on Premise and in the Cloud

•Blogger since 2004
  • http://laurent.leturgez.free.fr (in French, discontinued)
  • http://laurent-leturgez.com

•Twitter : @lleturgez

Page 3: SIMD inside and outside oracle 12c

Agenda

•SIMD Instructions, outside Oracle 12c
  •What is a SIMD instruction?
  •Will my application use SIMD?
  •Raw Performance

•SIMD Instructions, inside Oracle 12c
  •How SIMD instructions are used inside Oracle 12c
  •Tracing SIMD in Oracle 12c

Page 4: SIMD inside and outside oracle 12c

Caveats

•Most of the topics come from
  • My own research
  • My past life as a developer

•Some of the topics are about internals, so:
  • Analysis and conclusions may be incomplete
  • Future versions of Oracle may change the features

•Tests were done with Oracle 12.1.0.2, Oracle Enterprise Linux 7.3 (UEKR3), VMware Fusion 8 (and VirtualBox)

Page 5: SIMD inside and outside oracle 12c

Before we start …

•Some fundamentals (from Dennis Yurichev’s book)
  • CPU register: […] The easiest way to understand a register is to think of it as an untyped temporary variable. Imagine if you were working with a high-level PL (programming language) and could only use eight 32-bit (or 64-bit) variables. Yet a lot can be done using just these!
  • Instruction: A primitive CPU command. The simplest examples include: moving data between registers, working with memory and arithmetic primitives. As a rule, each CPU has its own instruction set architecture (ISA).
  • Assembly language: Mnemonic code and some extensions like macros which are intended to make a programmer’s life easier.

http://beginners.re/Reverse_Engineering_for_Beginners-en.pdf

Page 7: SIMD inside and outside oracle 12c

SIMD instructions … outside Oracle 12c

• SIMD stands for Single Instruction Multiple Data
  • Process multiple data
  • In one CPU instruction
• Based on
  • Specific registers
  • Specific CPU instructions and sets of instructions
• Not Oracle specific
• CPU architecture specific
  • Intel
  • IBM (Altivec)
  • Sparc v9 (VIS)
• This presentation is mainly about Intel architecture

Page 8: SIMD inside and outside oracle 12c

SIMD instructions … outside Oracle 12c

•What is a SIMD register?
  • It’s a CPU register
  • Wider than traditional registers (RDI, RSI, R8, R9 etc.)
  • 128 up to 512 bits wide
  • Contains several data values at once

Page 9: SIMD inside and outside oracle 12c

SIMD instructions … outside Oracle 12c

•How does it work? Scalar operation
  • an array of 4 integers {1,2,3,4}
  • add 1 to each value

[Animated diagram: CPU registers Reg1, Reg2, Reg3 and RAM (In/Out). Each value of {1,2,3,4} is loaded from RAM into a register, 1 is added to it, and the result is saved back to RAM, one element at a time, until the output holds {2,3,4,5}.]

Total: 4 LOAD, 4 ADD, 4 SAVE

Page 10: SIMD inside and outside oracle 12c

SIMD instructions … outside Oracle 12c

•How does it work? SIMD operation
  • an array of 4 integers {1,2,3,4}
  • add 1 to each value

[Animated diagram: SIMD Reg1, SIMD Reg2, SIMD Reg3 and RAM (In/Out). The four values {1,2,3,4} are loaded into one SIMD register, another SIMD register holds {1,1,1,1}, a single ADD produces {2,3,4,5} in a third SIMD register, and the result is saved back to RAM in one step.]

Total: LOAD, ADD, SAVE (one of each)
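
The two diagrams above can be sketched in C. This is only an illustration, assuming an x86-64 compiler and the SSE2 intrinsics from the Intel Intrinsics Guide; the function names add_one_scalar and add_one_sse2 are made up for this example and are not taken from the sample code of the talk.

```c
#include <emmintrin.h>  /* SSE2 intrinsics, baseline on x86-64 */
#include <stdint.h>

/* Scalar version: one load, one add and one store per element. */
void add_one_scalar(int32_t *v, int n)
{
    for (int i = 0; i < n; i++)
        v[i] = v[i] + 1;
}

/* SIMD version: four 32-bit integers per load, add and store. */
void add_one_sse2(int32_t *v, int n)
{
    const __m128i ones = _mm_set1_epi32(1);   /* register holding {1,1,1,1} */
    int i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128i x = _mm_loadu_si128((const __m128i *)&v[i]);  /* LOAD */
        x = _mm_add_epi32(x, ones);                           /* ADD  */
        _mm_storeu_si128((__m128i *)&v[i], x);                /* SAVE */
    }
    for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
        v[i] += 1;
}
```

Note that with optimization enabled the compiler may vectorize the scalar loop on its own, so the intrinsics mainly make the LOAD / ADD / SAVE steps explicit.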

Page 11: SIMD inside and outside oracle 12c

SIMD instructions … outside Oracle 12c

Instruction sets, registers and processors:

• MMX
  • Register size: 64 bits
  • Registers: 8 (MM0 to MM7)
  • Processors: Pentium II
• SSE
  • Register size: 128 bits
  • Registers: 8 (XMM0 to XMM7)
  • Processors: Pentium III
  • Other: only four 32-bit single precision floating point numbers
• SSE2/SSE3/SSSE3/SSE4
  • Register size: 128 bits
  • Registers: 16 (XMM0 to XMM15)
  • Processors: Pentium IV to Nehalem
  • Other: usage expansion (two 64-bit double precision numbers, four 32-bit integers and up to sixteen 8-bit bytes)
• AVX/AVX2
  • Register size: 256 bits
  • Registers: 16 (YMM0 to YMM15)
  • Processors: Sandy Bridge to Haswell
  • Other: three-operand instructions (non destructive): A+B=C rather than A=A+B; alignment requirements relaxed
• AVX3 or AVX512
  • Register size: 512 bits
  • Registers: 32 (ZMM0 to ZMM31)
  • Processors: Skylake (initially announced but not available yet), maybe on Kaby Lake Xeon chip

Page 12: SIMD inside and outside oracle 12c

SIMD instructions … outside Oracle 12c

• Intel API (C/C++) : Intel Intrinsics Guide https://software.intel.com/sites/landingpage/IntrinsicsGuide/

• Sample code: https://app.box.com/simdSampleC-2015

Page 14: SIMD inside and outside oracle 12c

Will my application use SIMD registers and instructions?

•It depends on:
•Hardware
  • Consult the processor datasheet to see which instruction set extensions are supported (if any)
  • http://ark.intel.com/#@Processors
•Hypervisor
  • Some (old) hypervisors do not support modern extensions
  • VirtualBox versions < 5.0 don’t support SSE4, AVX and AVX2
  • Hyper-V on W2008R2-SP1 needs a patch for specific processors to support AVX
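
A program can also ask at run time what the CPU, as exposed by the hypervisor, actually reports. The sketch below assumes GCC or Clang on x86, whose builtins __builtin_cpu_init and __builtin_cpu_supports are used here; the helper names has_sse42, has_avx, has_avx2 and print_simd_support are made up for this example.

```c
#include <stdio.h>

/* Each helper returns 1 if the feature is reported, 0 otherwise.
   __builtin_cpu_supports is a GCC/Clang builtin for x86 targets. */
int has_sse42(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") != 0;
}

int has_avx(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
}

int has_avx2(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
}

/* Print a short summary, e.g. from inside a virtual machine. */
void print_simd_support(void)
{
    printf("SSE4.2: %s\n", has_sse42() ? "yes" : "no");
    printf("AVX   : %s\n", has_avx()   ? "yes" : "no");
    printf("AVX2  : %s\n", has_avx2()  ? "yes" : "no");
}
```

Running print_simd_support() on the host and inside the guest is a quick way to see whether the hypervisor hides an extension.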

Page 15: SIMD inside and outside oracle 12c

Will my application use SIMD registers and instructions?

•It depends on the Operating System
•AVX (256 bits) is supported from
  • Linux Kernel >= 2.6.30
    • Redhat EL5: 2.6.18
    • Oracle EL5 w/UEK: 2.6.32
    • AVX needs xsave kernel parameter
  • Solaris 10 upd 10 and Solaris 11 (x86-64)
  • Windows 2008 R2 SP1

Page 16: SIMD inside and outside oracle 12c

Will my application use SIMD registers and instructions?

•It depends on the compiler
•GCC
  • > 4.6 for AVX support
  • Use of specific switches (-msse2, -msse4.1, -msse4.2, -mavx, -mavx2 …)
•Intel C/C++ Compiler (ICC)
  • > 11.1 for AVX support and > 13.0 for AVX2 support
  • Use of specific switches (-xsse4.2, -xavx, -xCORE-AVX2 …)
•Beware of optimization switches (-O1, -O2, -O3)
•More … disassemble (if you are allowed to)
  • Registers
  • Assembly language instructions
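
To see the effect of the switches, a loop simple enough for the auto-vectorizer is a convenient test case. This is a sketch only: the file name add_one.c and the function name add_one are made up, and whether the loop is vectorized depends on the compiler version and options.

```c
#include <stddef.h>

/* "restrict" promises that dst and src do not overlap, which makes it
   easier for the compiler to vectorize the loop. */
void add_one(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] + 1.0f;
}

/*
 * Assuming the code is saved as add_one.c (hypothetical name):
 *
 *   gcc -O3 -mavx2 -S add_one.c                  # assembly goes to add_one.s
 *   gcc -O3 -mavx2 -fopt-info-vec -c add_one.c   # GCC reports vectorized loops
 *   objdump -d add_one.o | grep ymm              # look for YMM registers
 *
 * Replacing -O3 with -O1, or dropping -mavx2, typically changes which
 * registers (XMM or YMM) and instructions show up in the output.
 */
```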

Page 18: SIMD inside and outside oracle 12c

Raw Performance

• Based on a C program
• CPU used: Haswell microarchitecture (Core i7-4960HQ), AVX/AVX2 enabled
• 3 tests: No SIMD, SSE4, AVX
• Input: one array containing 1 million values
• Goal: add 1 to each value; the pass over the million values is repeated 4k, 8k, 16k and 32k times
• CPU Time (s) = f(#rows)

“Quick and Dirty” sample code available here: https://app.box.com/s/ibmnbblpho4xtbeq2x8ir60nrk37208v
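
A minimal sketch of such a measurement, not the sample code linked above. It assumes float values (256-bit integer additions need AVX2, not AVX), GCC or Clang with the target function attribute, and made-up names add_one_plain, add_one_avx and cpu_seconds.

```c
#include <immintrin.h>  /* AVX intrinsics */
#include <stddef.h>
#include <time.h>

/* No SIMD: one value at a time (unless the compiler vectorizes it). */
void add_one_plain(float *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        v[i] += 1.0f;
}

/* AVX: eight floats per YMM register. The target attribute lets this
   function use AVX even if the file is compiled without -mavx. */
__attribute__((target("avx")))
void add_one_avx(float *v, size_t n)
{
    const __m256 ones = _mm256_set1_ps(1.0f);
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        __m256 x = _mm256_loadu_ps(&v[i]);
        _mm256_storeu_ps(&v[i], _mm256_add_ps(x, ones));
    }
    for (; i < n; i++)   /* leftover elements */
        v[i] += 1.0f;
}

/* CPU seconds spent applying fn to the whole array `reps` times. */
double cpu_seconds(void (*fn)(float *, size_t), float *v, size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(v, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

With a 1 million element array and reps set to 4096, 8192, 16384 and 32768, cpu_seconds gives one point per run; add_one_avx should only be called on a CPU and OS that support AVX.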

Page 19: SIMD inside and outside oracle 12c

Raw Performance

RAW Performance (CPU) for SIMD Instructions: CPU time (sec)

Rows processed    NO SIMD    SSE4 (XMM Registers)    AVX (YMM Registers)
4096 M. rows      10.35      3.3                     1.96
8192 M. rows      20.46      6.81                    3.51
16384 M. rows     42.35      13.73                   7.23
32768 M. rows     85.64      25.58                   15.15

Page 21: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

• In Memory Data Structure
  • In Memory Compression Unit: IMCU
  • IMCU is the unit of column store allocation
  • Target size is 1M rows (controlled by _inmemory_imcu_target_rows in 12.1, replaced by _inmemory_imcu_target_maxrows in 12.2 (?))
  • One IMCU can contain more than one column
  • Each column in one IMCU is a column unit (CU)

Page 22: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

•In memory column store storage indexes
  • For each column unit, min and max values are maintained in a storage index
  • Storage Indexes provide CU pruning
  • Information about CU available in GV$IM_COL_CU (Undocumented. See Bug ID 19361690)

IMCU Pruning

Page 23: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

• The way your data is sorted matters for best IMCU pruning
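
Why sorting matters can be shown with a generic min/max pruning sketch. This is not Oracle's implementation; the type cu_minmax and the functions cu_may_contain_eq and count_cus_to_scan are made up for the illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical storage index entry: min and max of one column unit. */
typedef struct {
    int32_t min;
    int32_t max;
} cu_minmax;

/* A CU can be skipped for "col = key" when key is outside [min, max]. */
int cu_may_contain_eq(const cu_minmax *s, int32_t key)
{
    return key >= s->min && key <= s->max;
}

/* Number of CUs that still have to be scanned after pruning. */
size_t count_cus_to_scan(const cu_minmax *idx, size_t n, int32_t key)
{
    size_t to_scan = 0;
    for (size_t i = 0; i < n; i++)
        if (cu_may_contain_eq(&idx[i], key))
            to_scan++;
    return to_scan;
}
```

With sorted data the ranges do not overlap (for example 1-10, 11-20, 21-30), so a lookup for 14 scans a single CU. With unsorted data every CU tends to cover the full range (1-30 each) and nothing can be pruned.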

Page 24: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

•SIMD extensions are used with In Memory storage indexes for efficient filtering
  1. IM Storage Indexes do IMCU pruning
  2. SIMD instructions apply filter predicates efficiently

[Diagram: IMCU pruning, followed by filtering with SIMD on a Prod-id column holding the values 10, 10, 14, 14, 10]
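
The filtering step can be illustrated with SSE2 intrinsics: four column values are compared with the predicate value in one instruction. This is a sketch of the general technique, not Oracle's code; the function name count_eq_sse2 is made up, and __builtin_popcount is a GCC/Clang builtin.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Count the rows where col[i] == key, four values per comparison. */
size_t count_eq_sse2(const int32_t *col, size_t n, int32_t key)
{
    const __m128i k = _mm_set1_epi32(key);   /* {key,key,key,key} */
    size_t hits = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128i v  = _mm_loadu_si128((const __m128i *)&col[i]);
        __m128i eq = _mm_cmpeq_epi32(v, k);  /* all-ones lane where equal */
        /* One bit per 32-bit lane: bit j is set when col[i + j] == key. */
        int mask = _mm_movemask_ps(_mm_castsi128_ps(eq));
        hits += (size_t)__builtin_popcount((unsigned)mask);
    }
    for (; i < n; i++)   /* leftover rows */
        hits += (col[i] == key);
    return hits;
}
```

For the Prod-id values {10, 10, 14, 14, 10} and the predicate prod_id = 10, the first four values are handled by a single comparison and the last one by the scalar tail.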

Page 25: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

•Oracle 12c uses specific libraries for SIMD (and compression)
  • Located in $ORACLE_HOME/lib
  • libshpksse4212.so for SSE4.2 extensions
    • Compiled with ICC v12 with the specific -xsse4.2 switch
  • libshpkavx12.so for AVX extensions
    • Compiled with ICC v12 with the specific -xavx switch
  • libshpkavx212.so for AVX2 extensions
    • Not totally implemented (8 functions implemented in 12.1, 824 in 12.2)
    • No ICC AVX2 switch used because ICC v12 doesn’t support AVX2

•Thanks to Tanel Pöder for this

Page 26: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

•Oracle SIMD related functions
  • Located in kdzk kernel module (HPK)
  • Part of Advanced Compression library (ADVCMP)
  • Easily tracked with systemtap

Page 27: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

SQL> select count(*) from s where amount_sold=20;

COUNT(*)

----------

140

SQL> select count(*) from s where amount_sold>20;

COUNT(*)

----------

666306

[oracle@oel7-im demo1]$ stap -x 7503 ./trc_orcl_simd_func_121.stp

Begin.^C

End.

Function: count

kdzk_lbiv_ictx_ini2_dydi: 4

kdzk_lbiviter_dydi: 2

kdzk_lbivset_range_dydi: 9

kdzk_lbivclr_range_dydi: 9

kdzk_eq_dynp_32bit: 9

kdzk_lbivones_dydi: 2

[oracle@oel7-im demo1]$ stap -x 7503 ./trc_orcl_simd_func_121.stp

Begin.^C

End.

Function: count

kdzk_lbivset_range_dydi: 9

kdzk_lbivclr_range_dydi: 9

kdzk_gt_dynp_32bit: 9

kdzk_lbiviter_dydi: 654

kdzk_lbiv_ictx_ini2_dydi: 1308

kdzk_lbivones_dydi: 654

Page 28: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

•How does Oracle use SIMD extensions? It depends on many parameters
  • OS level: /proc/cpuinfo
    • AVX and AVX2 support
    • SSE4 support only

Page 29: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

•Which library am I using?
  • pmap
    • AVX support
    • SSE4 support

Page 30: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

•Which compiler options have been used?
  • Read the “comment” section in the ELF file
  • Read the corresponding compiler documentation

[oracle@oel7 conf]$ readelf -p .comment $ORACLE_HOME/lib/libshpkavx12.so | egrep -i 'intel|gcc' | egrep 'xavx|mavx'
[ 2c] -?comment:Intel(R) C Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.0 Build 20120731
…/…
-DNTEV_USE_EPOLL -DNET_USE_LDAP -xavx

Page 31: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

•How are SIMD registers used by Oracle ?

• GDB
  • To get and know the call stack (backtrace)
  • To set breakpoints on interesting functions
  • To view register contents (traditional and SIMD)
    • “info registers” for traditional registers
    • “info all-registers” for all registers (SIMD registers included)
    • (gdb) print $ymmX.<format>
      Format can be v8_float, v4_double, v32_int8, v16_int16, v8_int32, v4_int64, or v2_int128

Page 32: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

break kdzk_gt_dynp_32bit

commands 1

bt

continue

end

Breakpoint 1, 0x00007f2a341b2e30 in kdzk_gt_dynp_32bit () from

/u01/app/oracle/product/12.1.0/dbhome_1/lib/libshpkavx12.so

#0 0x00007f2a341b2e30 in kdzk_gt_dynp_32bit () from

/u01/app/oracle/product/12.1.0/dbhome_1/lib/libshpkavx12.so

#1 0x000000000b7041bc in kdzk_cmp ()

#2 0x000000000b4deada in kdzdcol_theta_imc_sep ()

#3 0x00000000038a075f in kdzdcol_theta ()

#4 0x000000000b577726 in kdpEvalTheta ()

#5 0x000000000b57bad0 in kdpPredEval ()

#6 0x00000000038a02ef in kdzt_acmp_predeval ()

#7 0x0000000009fb9758 in kdstf11101010001101km ()

#8 0x000000000cd2de55 in kdsttgr ()

#9 0x000000000cd73576 in qertbFetch ()

#10 0x000000000cd9ed50 in qergsFetch ()

#11 0x000000000cbd424b in opifch2 ()

#12 0x0000000002207899 in kpoal8 ()

#13 0x000000000cbdaecd in opiodr ()

#14 0x000000000ce0ffab in ttcpip ()

#15 0x0000000001bcd8b6 in opitsk ()

#16 0x0000000001bd2241 in opiino ()

#17 0x000000000cbdaecd in opiodr ()

#18 0x0000000001bc9a0b in opidrv ()

#19 0x00000000026d9f91 in sou2o ()

#20 0x0000000000bd680a in opimai_real ()

#21 0x00000000026e46dc in ssthrdmain ()

Page 33: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

break kdzk_gt_dynp_32bit

commands 1

# Dump ymm0 to ymm15 on entry, run until the function returns, then dump them again
python show = lambda: [gdb.execute("print $ymm%d.v2_int128" % i) for i in range(16)]; show(); gdb.execute("finish"); show()

continue

end

Page 34: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

In red, register content has been modified

In blue, the second part of the SIMD registers (128 bits) is empty

Page 35: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

•Oracle (12.1) IM can use AVX or SSE4 extensions for SIMD operations

•When AVX is used, it uses only 128 bits out of the 256-bit wide registers
  • AVX adds new register state through the 256-bit wide YMM register file
  • Explicit operating system support is required to properly save and restore AVX's expanded registers between context switches
  • Without this, only AVX 128-bit is supported
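
Whether the operating system saves the YMM state can be checked from user space: CPUID leaf 1 reports the OSXSAVE (ECX bit 27) and AVX (ECX bit 28) flags, and XGETBV returns XCR0, where bits 1 and 2 stand for the XMM and YMM state. The sketch below assumes GCC or Clang on x86-64 (cpuid.h and inline assembly); the function name os_saves_ymm_state is made up.

```c
#include <cpuid.h>   /* __get_cpuid, provided by GCC and Clang */
#include <stdint.h>

/* Returns 1 if the CPU has AVX and the OS saves/restores the YMM
   registers across context switches, 0 otherwise. */
int os_saves_ymm_state(void)
{
    const unsigned int ecx_osxsave = 1u << 27;  /* OS uses XSAVE/XRSTOR */
    const unsigned int ecx_avx     = 1u << 28;  /* CPU implements AVX   */
    unsigned int eax, ebx, ecx, edx;
    uint32_t xcr0_lo, xcr0_hi;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if ((ecx & ecx_osxsave) == 0 || (ecx & ecx_avx) == 0)
        return 0;   /* XGETBV must not be executed without OSXSAVE */

    /* Read XCR0 (extended control register 0). */
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;

    /* Bit 1: XMM state enabled, bit 2: YMM state enabled. */
    return (xcr0_lo & 0x6u) == 0x6u;
}
```

On Linux the same information shows up as the xsave and avx flags in /proc/cpuinfo.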

Page 36: SIMD inside and outside oracle 12c

SIMD instructions … inside Oracle 12c

•The culprit
  •Oracle 12.1.0.2 is supported from EL5 onwards
  •The EL5 Redhat kernel is 2.6.18, and this flag (xsave) is supported from 2.6.30 kernels
  •For compatibility reasons, Oracle has had to compile its code on 2.6.18 kernels

Page 38: SIMD inside and outside oracle 12c

Tracing SIMD in Oracle 12c

•Interesting components to trace for SIMD and/or IMCU Pruning are :

•ADVCMP_DECOMP.*
  • ADVCMP_DECOMP_HPK: SIMD functions
  • ADVCMP_DECOMP_PCODE: Portable Code Machine (usually not related to specific CPU instructions)

•IM_optimizer
  • Gives information about CBO calculation related to IM

Page 39: SIMD inside and outside oracle 12c

Tracing SIMD in Oracle 12c

•IM_optimizer

• Information available in trace file
  • IMCU pruning ratio
  • CU decompression costing (per IMCU)
  • Predicate evaluation costing (per row)
• Statement has to be parsed to get results

Page 40: SIMD inside and outside oracle 12c

Tracing SIMD in Oracle 12c

select prod_id,cust_id,time_id from laurent.s_capa_high where amount_sold=20;

Page 41: SIMD inside and outside oracle 12c

Tracing SIMD in Oracle 12c

• This information is reported in the CBO trace file (10053 or SQL_costing event)

Page 42: SIMD inside and outside oracle 12c

Tracing SIMD in Oracle 12c

•ADVCMP_DECOMP
  • ADVCMP_DECOMP_HPK
    • Information is available in the trace file (for each IMCU processed)
      • Library and function used
      • Number of rows and counting algorithm
      • Processing rate (comparison and decompression if relevant)
      • But nothing on the results of the processing

Page 43: SIMD inside and outside oracle 12c

Tracing SIMD in Oracle 12c

•ADVCMP_DECOMP
  • ADVCMP_DECOMP_HPK
    • Gives information about SIMD function usage and filtering (after IMCU pruning)
    • Example: inmemory table with NO MEMCOMPRESS or DML compression

Page 44: SIMD inside and outside oracle 12c

Tracing SIMD in Oracle 12c

•ADVCMP_DECOMP
  • ADVCMP_DECOMP_HPK
    • Example: inmemory compressed table
    • SIMD instructions are used only in the kdzk_eq_dict functions

Page 45: SIMD inside and outside oracle 12c

Tracing SIMD in Oracle 12c

•My thoughts about compression/decompression
  • NO MEMCOMPRESS / COMPRESS FOR DML
    • kdzk*dynp* functions (ex: kdzk_eq_dynp_16bit, kdzk_le_dynp_32bit etc.)
  • FOR QUERY LOW / QUERY HIGH
    • Dictionary Encoding (LZW ?): kdzk_*dict* functions (ex: kdzk_eq_dict_7bit, kdzk_le_dict_4bit etc.)
    • Run Length Encoding: kdzk_burst_rle* functions (ex: kdzk_burst_rle_8bit, kdzk_burst_rle_16bit …)
    • Bit packing compression: kdzk*fixed* functions (ex: kdzk_ge_lt_fixed_32bit, kdzk_lt_fixed_8bit …)
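
To make the idea of dictionary encoding concrete, here is a generic sketch: distinct values go to a dictionary, rows store small codes, and an equality predicate is evaluated on the codes. This only illustrates the general technique and says nothing about how the kdzk_*dict* functions actually work; the type dict_column and the function dict_count_eq are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dictionary-encoded column. */
typedef struct {
    const int32_t *dict;      /* distinct values              */
    size_t         dict_len;  /* number of distinct values    */
    const uint8_t *codes;     /* one dictionary code per row  */
    size_t         n_rows;
} dict_column;

/* Count rows equal to key: translate key to its code once, then
   compare narrow codes instead of full values. */
size_t dict_count_eq(const dict_column *c, int32_t key)
{
    size_t code = c->dict_len;   /* sentinel: key not found */
    size_t hits = 0;

    for (size_t d = 0; d < c->dict_len; d++) {
        if (c->dict[d] == key) {
            code = d;
            break;
        }
    }
    if (code == c->dict_len)
        return 0;   /* key is not in the dictionary: no row can match */

    for (size_t i = 0; i < c->n_rows; i++)
        hits += (c->codes[i] == code);
    return hits;
}
```

Because the codes are narrow (8 bits here), many more of them fit in one SIMD register than full 32-bit values, which is one reason this kind of encoding combines well with SIMD comparisons.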

Page 46: SIMD inside and outside oracle 12c

Tracing SIMD in Oracle 12c

•My thoughts about compression/decompression
  • FOR CAPACITY LOW
    • FOR QUERY LOW + additional proprietary compression (OZIP)
    • Functions: ozip_decode_dict*, kdzk_ozip_decode* (ex: kdzk_ozip_decode_dydi, ozip_decode_dict_9_bit etc.)
  • FOR CAPACITY HIGH
    • FOR QUERY HIGH + heavyweight compression algorithm

•Compression/decompression method depends on:
  • Datatype
  • Column Compression Unit size
  • Column contents

Page 47: SIMD inside and outside oracle 12c

http://laurent-leturgez.com

@lleturgez

www.premiseo.com