LLVM Backend for HHVM · 2019. 10. 30. · HHVM Compilation Pipeline HHBC HHIR vasm x86-64 HHBC...

LLVM Backend for HHVM

Brett SimmersMaksim PanchenkoFacebook

• Initial work started in early 2010• Running facebook.com since February 2013• Open source! http://hhvm.com/repo• wikipedia.org since December 2014• Baidu, Etsy, Box, many others: https://github.com/facebook/hhvm/wiki/Users

HHVMJIT for PHP/Hack

• Not a PHP -> C++ source transformer: that was HPHPc.• Emits type-specialized code after verifying assumptions with type guards.• Ahead-of-time static analysis eliminates many type guards, speeds up other

operations as well.• 2-4x faster than PHP 5.6:

http://hhvm.com/blog/9293/lockdown-results-and-hhvm-performance

HHVMJIT for PHP/Hack

HHVM Compilation Pipeline

HHBC HHIR vasm x86-64

HHBC HHIR vasm LLVM IR x86-64

• No spilling across calls – native stack is shared between all active PHP frames.

• Callee may leave jitted code, interpret for a while, and resume after bindcallinstruction.

• No support for catching exceptions – pessimizes many optimizations.

• Fixed all limitations and implemented using invoke instruction – also helped existing backend.

Modifications to HHVMPHP Function Calls

• idiv: %rax and %rdx are implicit inputs/outputs.• x86-64 implicitly zeros top 32 bits of registers.• Endianness: had to shake out any assumptions of a little-endian target.

Modifications to HHVMGeneralizing x86-specific concepts in vasm

Codegen DifferencesArithmetic Simplification

movq -0x20(%rbp), %rax

mov %rax, %rcx

shl $0x1, %rcx

... 11 more lines of shl/add ...

add %rdx, %rcx

mov %rax, %rdx

shl $0x28, %rdx

add %rdx, %rcx

add %rcx, %rax

movb $0xa, -0x18(%rbp)

movq %rax, -0x20(%rbp)

mov $0x100000001b3, %rax

imulq -0x20(%rbp), %rax

movb $0xa, -0x18(%rbp)

movq %rax, -0x20(%rbp)

vasm LLVM

Codegen DifferencesTail Duplication

0x0: callq ...

0x1: test %rax, %rax

0x2: jnz 0x5

0x3: mov $0x0, %al

0x4: jmp 0x9

0x5: cmpb $0x50, 0x8(%rax)

0x6: cmovzq (%rax), %rax

0x7: cmpb $0x8, 0x8(%rax)

0x8: setnle %al

0x9: test %al, %al

0xa: jz ...

0xb: jmp ...

0x0: callq ...

0x1: test %rax, %rax

0x2: jz ...

0x3: cmpb $0x50, 0x8(%rax)

0x4: cmovzq (%rax), %rax

0x5: cmpb $0x9, 0x8(%rax)

0x6: jl ...

0x7: jmp ...

vasm LLVM

• Large switch statements: single path of comparisons vs. binary search.• Register allocator: sometimes vasm spills fewer values, sometimes LLVM.

LLVM generally better at avoid reg-reg moves.• vasm almost always prefers smaller code due to icache pressure. Bad for

microbenchmarks, good for our workload.

Codegen DifferencesMisc

• Custom calling conventions• Location records• Smashable call attribute• Code size optimizations• Performance tweaks

LLVM ChangesCorrectness and Performance

• VMs SP and FP pinned to %rbx and %rbp• %r12 used for thread-local storage• Different stack alignment for hhvmcc• C++ helpers always expect VmFP in %rbp• 5 calling conventions + more planned

Calling ConventionsCorrectness

• Can use any number of regs for passing arguments• Pass undef in unused regs• Can return in any of 14 GP registers• %r12 still reserved and callee-saved• 5 -> 2 calling conventions

(Almost) Universal Calling Convention

• Replace destination of call/jmp after code gen• Locate code for a given IR instruction (call/invoke)• Why not use patchpoint?• Support tail call optimization• Use direct call instruction• Don’t need de-optimization information

Location RecordsCorrectness

• musttail call void @foo(i64 %val), !locrec !{i32 42}

• Propagate info to MCInst• Data written to .llvm_locrecs• Unique ID per module• Works with any IR instruction• Switch from metadata to operand bundles

Location RecordsCorrectness

$ cat smashable.ll...%tmp = call i64 @callee(i64 %a, i64 %b) !locrec !{i32 42}...

$ llc < smashable.ll....Ltmp0: # !locrec 42

pushq %rax.Ltmp1: # !locrec 42

callq callee

Call with LocRecExample

.section .llvm_locrecs...

.quad .Ltmp0 # Address

.long 42 # ID

.byte 1 # Size

.byte 0

.short 0

.quad .Ltmp1 # Address

.long 42 # ID

.byte 5 # Size

.byte 0

.short 0

Call with LocRecSection Format

• Overwrite destination in MT environment after code generation and during code execution

• Instruction shall not pass 64-byte boundary• Use modified .bundle_align_mode• Works with call/invoke only

Smashable Call AttributeCorrectness Change

$ cat smashable.ll...%tmp = call i64 @callee(i64 %a, i64 %b) smashable, !locrec !{i32 42}...

$ llc < smashable.ll....Ltmp0: # !locrec 42

pushq %rax.bundle_align_mode 6

.Ltmp1: # !locrec 42callq callee.bundle_align_mode 0

Smashable Call with LocRecExample

• Smashable needs 64-byte boundary• JIT does not know where the code goes• JIT has to request 64-byte aligned code section?• Our code is packed• Use “code_skew” module flag to modify effect of align

directives

Code SkewCorrectness Change

• 80% coverage• -10% performance• Increase coverage• Increase performance

HHVM+LLVM CheckpointCorrectness Done

• Eliminate relocation stubs• Allow no alignment for any function• Code gen tweaks for size• No silver bullet• “-Os” vs “-O2” not much difference

Size & Performance TweaksPerformance

• Profile- and heuristic-driven basic block splitting• 3 code blocks: hot/cold/frozen• Improved I$ and iTLB performance• Hacky implementation was easy• C++ exception support required runtime mods

Code SplittingPerformance

• Enter PHP function via call• No return address on stack - use tail call to return• Makes HW return buffer unhappy• Could not use patchpoint since has to be after epilog• Custom call attribute TCR to force push+ret• Net worth: ~1.5% CPU time

Tail call via push+retPerformance

; Common pattern – decrement ref counter and check

%t0 = load i64, i64* inttoptr (i64 60042 to i64*)%t1 = sub nsw i64 %t0, 1store i64 %t1, i64* inttoptr (i64 60042 to i64*)%t2 = icmp sle i64 %t1, 0br i1 %t2, label %l1, label %l2

Code Size

movq 60042, %raxleaq -1(%rax), %rcxmovq %rcx, 60042cmpq $2, %raxjl .LBB0_2

llc < decmin.ll

; Common pattern – decrement counter

%t0 = load i64, i64* inttoptr (i64 60042 to i64*)%t1 = add nsw i64 %t0, -1store i64 %t1, i64* inttoptr (i64 60042 to i64*)%t2 = icmp sle i64 %t1, 0br i1 %t2, label %l1, label %l2

Code Size

decq 60042jle .LBB0_2

llc < decmin.ll

decq 60042jle .LBB0_2

movq 60042, %raxleaq -1(%rax), %rcxmovq %rcx, 60042cmpq $2, %raxjl .LBB0_2

llc < decmin.ll opt -O2 -S | llc

func() {if (cond)

return foo();else

return bar();=======================================cmpl %esi, %edijg .L5jmp bar

.L5:jmp foo

Conditional Tail Call Optimization

func() { if (cond)

return foo();else

return bar();=======================================cmpl %esi, %edijg foojmp bar

; How much win!?

Conditional Tail Call

; BAD order ~50% slowdownfoo:bar:func:

; GOOD order ~30% winfunc:foo:bar:

Conditional Tail Call

PerformanceOpen Source PHP Frameworks

• vasm and LLVM backends not measurably different.• LLVM clearly beats vasm in certain situations – not hot enough to

make a difference overall.• Not currently using in production – need a reward to take risk.

PerformanceFacebook Workload

• Patches to LLVM 3.5 are on github (HHVM)• Calling conventions in LLVM trunk• Get all required features before 3.8 release• Switch HHVM to 3.8/trunk LLVM under option

Upstreaming Plans

More Information

http://hhvm.com/http://hhvm.com/blog/10205/llvm-code-generation-in-hhvmhttps://github.com/facebook/hhvm

Freenode: #hhvm and #hhvm-dev

LLVM Backend for HHVM · 2019. 10. 30. · HHVM Compilation Pipeline HHBC HHIR vasm x86-64 HHBC...

Documents

Transcript of LLVM Backend for HHVM · 2019. 10. 30. · HHVM Compilation Pipeline HHBC HHIR vasm x86-64 HHBC...

HHBC · Created Date: 9/14/2017 2:28:04 PM

MAY - HHBC · 2018-05-22 · HENDERSON HILLS 1200 E. I-35 Frontage Rd. Edmond, OK 73034 405.341.4639 | hhbc.com Weekly Services Saturdays 5:30 pm, Sundays 9 & 10:45 am Wednesdays

HHVM on AArch64 - BUD17-400K1

… · Web viewMartha served (v. 2) ... or get the HHBC Mobile App. Author: Sharilyn Snelling Created Date: 03/22/2018 14:30:00 Last modified by: Sherri Wullschleger Company: OGE

Php 7.x 8.0 and hhvm and

Performance Comparison of PHP 5.6 vs. 7.0 vs HHVM

Sponsorship & Exhibition Prospectus...VASM opportunities to bring a higher level of targeted marketing to your partnership. • Your company’s logo and message will appear on holding

Improving Memory Management for HHVM

VASM TASM Collaborative Workshop 2015 - RACS · VASM TASM Collaborative Workshop 2015 16 October 2015 12.30pm-5.00pm ... He is a devotee of acoustic finger style guitar. Mr Emilio

Hacking hhvm

Release latest-3.0.0-71-g4c5ef93XRL, Release latest-3.0.0-71-g4c5ef93 2.3HipHop Virtual Machine HipHop Virtual Machine (HHVM) is a process virtual machine based on just-in-time (JIT)

· Web viewSermon notes are available online and may be downloaded from hhbc.com/sermon-notes. Interactive notes are also available on the HHBC Mobile App. Available in the App store

Diving into HHVM Extensions (PHPNW Conference 2015)

· Web viewThe call of the King Sermon notes are available online and may be downloaded from hhbc.com/sermon-notes. Interactive notes are also available on the HHBC Mobile App. Available

Hacking with hhvm

超高速WordPress ～ PHP7 vs HHVM vs PHP5.6

Magento on HHVM...Check compatibility with Enterprise Edition Run automated tests (TAF) to compare Magento frontend/backend with PHP Tweak HHVM configuration to be optimal for Magento

Child Protection Policy - s3.amazonaws.com · Web viewThe goal of our CPP is to protect our children, caregivers, and the mission of our church. To this end we: Screen all HHBC

“1” + “2” - Dominic Mulligandominic-mulligan.co.uk/wp-content/uploads/2015/05/... · 2016. 6. 28. · 2013: HHVM: highly performant runtime used by FB, Wikipedia, … Dynamic

AC Line Filters Differential Mode Coils, HHBC Series (Fe-Si) · © KEMET Electronics Corporation • KEMET Tower • One East Broward Boulevard LF0050_HHBC • 2/3/2020 2 Fort Lauderdale,