Performance in the details: A way to make faster...

81
Performance in the details: A way to make faster Ruby Koichi Sasada <[email protected]> RailsClub 2015

Transcript of Performance in the details: A way to make faster...

Page 1: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Performance in the details:A way to make faster Ruby

Koichi Sasada<[email protected]>

RailsClub 2015

Page 2: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

A way to make faster Ruby

The only way I can find is:

Repeating a process.

Page 3: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

A way to make faster Ruby: A process

1. Observe Ruby interpreter

2. Make assumption the reason of slowness

3. Consider ideas to overcome

4. Implement ideas

5. Measure the result

•Bad/same performance → Goto 4, 3, 2 or 1•Good performance! → Commit it.

Page 4: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Koichi SasadaA programmer from Japan

Page 5: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Koichi is a Programmer

•MRI committer since 2007/01•Original YARV developer since 2004/01

• YARV: Yet Another RubyVM• Introduced into Ruby (MRI) 1.9.0 and later

•Generational/incremental GC for 2.x

Page 6: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Koichi is an Employee

Page 7: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Koichi is a member of Heroku Matz team

Mission

Design Ruby language

and improve quality of MRI

Heroku employs three full time Ruby core developers in Japan named “Matz team”

Page 8: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Heroku Matz team

Matz Designer/director of Ruby

Nobu Quite active committer

Ko1 Internal Hacker

Page 9: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

MatzTitle collector

• He has so many (job) title• Chairman - Ruby Association

• Fellow - NaCl

• Chief architect, Ruby - Heroku

• Research institute fellow – Rakuten

• Chairman – NPO mruby Forum

• Senior researcher – Kadokawa Ascii Research Lab

• Visiting professor – Shimane University

• Honorable citizen (living) – Matsue city

• Honorable member – Nihon Ruby no Kai

• …

• This margin is too narrow to contain

Page 10: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

NobuGreat Patch monster

Ruby’s bug

|> Fix Ruby

|> Break Ruby

|> And Fix Ruby

Page 11: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

NobuPatch monster

nobu29%

akr12%svn

9%naruse

8%usa4%

ko14%

drbrain3%

kosaki3%

kazu2%

zzak2%

tenderlove2%

matz2%

marcandre2%

mame2%

tadf2%

knu1%

shugo1%

nagachika1%

yugui1%

kou1%

mrkn1%

emboss1%

shyouhei1%

nari0%

glass0%

ktsj0%

nahi0%

ayumin0%

tarui0%

sorah0%ryan0%

charliesome

0%

shirosaki0%xibbar

0%nagai

0%eregon

0%ngoto

0%wanabe

0%azav0%

keiju0%suke0%

kouji0%

duerst0%

takano320%

luislavena0%jeg20%hsbt0%

arton0%seki0%

kanemoto0%

tmm10%

eban0%

muraken0%

headius0%

evan0%

a_matsuda0%

iwamatsu0%

technorama

0%

davidflanagan0%

gotoken0%

okkez0%

COMMIT RATIO IN LAST 5 YEARS

Commit count of MRI

Page 12: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

NobuThe Ruby Hero

Page 13: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ko1EDD developer

0

5

10

15

20

25

20

10

/11

/8

20

11

/1/8

20

11

/3/8

20

11

/5/8

20

11

/7/8

20

11

/9/8

20

11

/11

/8

20

12

/1/8

20

12

/3/8

20

12

/5/8

20

12

/7/8

20

12

/9/8

20

12

/11

/8

20

13

/1/8

20

13

/3/8

20

13

/5/8

20

13

/7/8

20

13

/9/8

20

13

/11

/8

Commit number of ko1 (last 3 years)

RubyConf2012

RubyKaigi2013

Ruby 2.0

Euruko2013

RubyConf2013

EDD: Event Driven Development

Page 14: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Heroku Matz team and Ruby core teamRecent achievement

Ruby 2.2Current stable

Page 15: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby 2.2Syntax

• Symbol key of Hash literal can be quoted

{“foo-bar”: baz}#=> {:“foo-bar” => baz}#=> not {“foo-bar” => baz} like JSON

TRAP!!Easy to misunderstand

(I wrote a wrong code, already…)

Page 16: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby 2.2Classes and Methods• Some methods are introduces• Kernel#itself• String#unicode_normalize• Method#curry• Binding#receiver• Enumerable#slice_after, slice_before• File.birthtime• Etc.nprocessors• …

Page 17: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby 2.2Improvements

• Improve GC• Symbol GC• Incremental GC• Improved promotion algorithm

• Young objects promote after 4 GCs

• Fast keyword parameters

•Use frozen string literals if possible

Page 18: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby 2.2Symbol GC

before = Symbol.all_symbols.size

1_000_000.times{|i| i.to_s.to_sym} # Make 1M symbols

after = Symbol.all_symbols.size; p [before, after]

# Ruby 2.1

#=> [2_378, 1_002_378] # not GCed# Ruby 2.2

#=> [2_456, 2_456] # GCed!

Page 19: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby 2.2Symbol GC Issues history• Ruby 2.2.0 has memory (object) leak problem

• Symbols has corresponding String objects

• Symbols are collected, but Strings are not collected! (leak)

• Ruby 2.2.1 solved this problem!!• However, 2.2.1 also has problem (rarely you encounter BUG at the end of process

[Bug #10933] ← not big issue, I want to believe)

• Ruby 2.2.2 had solved [Bug #10933]!!• However, patch was forgot to introduce!!

• Finally, Ruby 2.2.3 solved it!!

•Please use newest version!!

Page 20: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby 2.2Fast keyword parameters

“Keyword parameters” introduced in Ruby 2.0 is useful, but slow!!

0

5

10

15

20

foo6(1, 2, 3, 4, 5, 6) foo_kw6(k1: 1, k2: 2, k3: 3, k4: 4, k5: 5, k6: 6)Exec

uti

on

tim

e (s

ec)

Repeat 10M times

Evaluation on Ruby 2.1

x30 slower

Page 21: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby 2.2Fast keyword parameters

0

5

10

15

20

foo6(1, 2, 3, 4, 5, 6) foo_kw6(k1: 1, k2: 2, k3: 3, k4: 4, k5: 5, k6: 6)

Exec

uti

on

tim

e (s

ec)

Repeat 10M times

Ruby 2.1 Ruby 2.2

x14 faster!!

But still x2 times slowercompare with normal dispatch

Ruby 2.2 optimizes method dispatch with keyword parameters

Page 22: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby 2.2Incremental GC

BeforeRuby 2.1

Ruby 2.1 RGenGC

Incremental GC

Ruby 2.2 Gen+IncGC

Throughput Low High Low High

Pause time Long Long Short Short

Goal

Page 23: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

RGenGC from Ruby 2.1:Micro-benchmark

1699.805974

87.230735

704.843669

867.740319

0

500

1000

1500

2000

2500

3000

no RGenGC

Tim

e (m

s)

total mark total sweep

x2.5 faster

Page 24: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

RGenGC from Ruby 2.1:Pause time

Most of cases, FASTER

(w/o rgengc)

Page 25: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

RGenGC from Ruby 2.1:Pause time

Several peaks

(w/o rgengc)

Page 26: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby 2.2 Incremental GC

Short pause time

Page 27: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Heroku Matz team and Ruby core teamNext target is

Ruby 2.3

Page 28: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Heroku Matz team and Ruby core teamNext target is

Ruby 2.3No time to talk about it.

Please ask me later

Page 29: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Performance in the details:A way to make faster Ruby

Page 30: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby interpreter

Ruby (Rails) app

RubyGems/Bundler

So many gemssuch as Rails, pry, thin, … and so on.

Ruby’s components for users

i gigantum umeris insidentesStanding on the shoulders of giants

Page 31: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Interpret on RubyVM

Rubyscript

Parse

Compile

RubyBytecode

Object management(GC)

Threading

Embeddedclasses and methods

(Array, String, …)

BundledLibraries

Evaluator

GemLibraries

Ruby’s componentsfrom core developer’s perspective

Ko1’s area

Page 32: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Basic flow to make faster Ruby

1. Observe Ruby interpreter

2. Make assumption the reason of slowness

3. Consider ideas to overcome

4. Implement ideas

5. Measure the result

•Bad/same performance → Goto 4, 3, 2 or 1•Good performance! → Commit it.

Page 33: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Basic weapons to overcome issues

•Knowledge of computer science• Computer system, Programming techniques, and many

others• From:

• Textbook• Academic papers• Other implementation

• Feedback from users

Page 34: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Basic technique to improve performance

• Change the algorithm to reduce computation complexity• e.g.: Selection sort (O(n^2)) v.s. Quick sort (O(n log(n))

• Chang the data structure to improve data locality• e.g.: “list” and “array”

• Remove redundant process• e.g.: Using cache (utilize time locality)

• Considering trade-off• Speed-up major cases and slow-down minor cases

• e.g.: speed-up non-exception flow (and slow-down exception cases)

• Machine dependent technique• e.g.: Using assembler / CPU register directly

• …

Page 35: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Case studies

Page 36: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby has many

Let’s play hangman game

Page 37: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby has many

Page 38: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby has many

Page 39: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby has many

Page 40: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby has many

Page 41: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby has many

Or Methods

Page 42: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Case study:Optimize method dispatch

Page 43: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Interpret on RubyVM

Rubyscript

Parse

Compile

RubyBytecode

Object management(GC)

Threading

Embeddedclasses and methods

(Array, String, …)

BundledLibraries

Evaluator

GemLibraries

Ruby’s componentsfrom core developer’s perspective

Ko1’s area

Page 44: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Method dispatch

# Example

recv.selector(arg1, arg2)

•recv: receiver

•selector: method id

•arg1, arg2: arguments

Page 45: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Method dispatchOverview1. Get class of `recv’ (`klass’)

2. Search method `body’ named `selector’ from `klass’• Method is not fixed at compile time• “Dynamic” method dispatch

3. Dispatch method with `body’1. Check visibility2. Check arity (expected args # and given args #)3. Store `PC’ and `SP’ to continue after method returning4. Build `local environment’5. Set program counter

4. And continue VM execution

Page 46: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

OverviewMethod search• Search method from `klass’

1. Search method table of `klass’1. if method `body’ is found, return `body’

2. `klass’ = super class of `klass’ and repeat it

2. If no method is given, exceptional flow• In Ruby language, `method_missing’ will be called

BasicObject

Object

C1

C2

Kernel

selector: body...

Each Class hasmethod table

Page 47: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

OverviewChecking arity and visibility• Checking arity

• Compare with given argument number and expected argument number

• Checking visibility• In Ruby language, there are three visibilities

• can you explain each of them ?:-p• public

• private

• protected

Page 48: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

OverviewBuilding `local environment’• How to maintain local variables?

→ Prepare `local variables space’ in stack

→ `local environment’ (short `env’)

• Parameters are also in `env’

Page 49: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Method dispatchOverview (again)1. Get class of `recv’ (`klass’)

2. Search method `body’ `selector’ from `klass’• Method is not fixed at compile time• “Dynamic” method dispatch

3. Dispatch method with `body’1. Check visibility2. Check arity (expected args # and given args #)3. Store `PC’ and `SP’ to continue after method returning4. Build `local environment’5. Set program counter

4. And continue VM execution

It seems very easy and simple!and slow...

Page 50: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Method dispatch

•Quiz: How many steps in Ruby’s method dispatch?• Hint: More complex than I explained overview① 8 steps② 12 steps③ 16 steps④ 20 steps

Answer isAbout ④ 20 steps

Page 51: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Method dispatchRuby’s case

1. Check caller’s arguments1. Check splat (*args)

2. Check block (given by compile time or block parameter (&block))

2. Get class of `recv’ (`klass’)

3. Search method `body’ `selector’ from `klass’• Method is not fixed at compile time

• “Dynamic” method dispatch

4. Dispatch method with `body’1. Check visibility

2. Check arity (expected args # and given args #) and process1. Post arguments

2. Optional arguments

3. Rest argument

4. Keyword arguments

5. Block argument

3. Push new control frame1. Store `PC’ and `SP’ to continue after method returning

2. Store `block information’

3. Store `defined class’

4. Store bytecode info (iseq)

5. Store recv as self

4. Build `local environment’

5. Initialize local variables by `nil’

6. Set program counter

5. And continue VM execution

... simple?

(*) Underlined items are additonal process

Page 52: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Ruby’s caseComplex parameter checking

• “def foo(m1, m2, o1=..., o2=...,

p1, p2, *rest, &block)”• m1, m2: mandatory parameter• o1, o2: optional parameter• p1, p2: post parameter• rest: rest parameter• block: block parameter

• From Ruby 2.0, keyword parameter is supported

Page 53: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Method dispatch1. CHeck caller’s arguments

1. Check splat (*args)

2. Check block (given by compile time or block parameter (&block))

2. Get class of `recv’ (`klass’)

3. Search method `body’ `selector’ from `klass’• Method is not fixed at compile time

• “Dynamic” method dispatch

4. Dispatch method with `body’1. Check visibility

2. Check arity (expected args # and given args #) and process1. Post arguments

2. Optional arguments

3. Rest argument

4. Keyword arguments

5. Block argument

3. Push new control frame1. Store `PC’ and `SP’ to continue after method returning

2. Store `block information’

3. Store `defined class’

4. Store bytecode info (iseq)

5. Store recv as self

4. Build `local environment’

5. Initialize local variables by `nil’

6. Set program counter

5. And continue VM execution

Complexand

Slow!!!

Page 54: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Method dispatchOverhead

OS: Linux 2.6.31 32-bitCPU: IntelCore2Quad 2.66GHzMem: 4GBC Compiler: GCC 4.4.1, -O3Profiled by Oprofile

ruby 1.9.3dev (2010-05-26)Profiled by Mr. Shiba

VMObj

OthersInsn

MethodBlock

InstCompileGC

MM

Cfunc

NotRuby

Others

Pentomino

VM

Obj

Others

Insn

Method

BlockOthersCompileGCMMCfunc

NotRuby

Others

Fib

Method dispatch overhead is big especially on micro-benchmarks

Page 55: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Speedup techniquesfor method dispatch1. Specialized instructions

2. Method caching

3. Caching checking results

4. Special path for `send’ and `method_missing’

Page 56: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

OptimizationSpecialized instruction (from Ruby 1.9.0)

•Make special VM instruction for several methods• +, -, *, /, ...

def opt_plus(recv, obj)if recv.is_a(Fixnum) and obj.is_a(Fixnum) and

Fixnum#+ is not redefinedreturn Fixnum.plus(recv, obj)

elsereturn recv.send(:+, obj) # not prepared

endend

Page 57: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

OptimizationMethod caching (from Ruby 1.9.0)• Eliminate method search overhead

• Reuse search result

• Invalidate cache entry with VM stat

• Two level method caching• Inline method caching

• Global method caching

class => body

class, id => bodyclass, id => body

....class, id => body

BasicObject

Object

C1

C2

Kernel

Inline cache1 element per call-site

Global cachehash table

miss

method search

return fill

miss

fill

naive search

Page 58: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

OptimizationCaching checking results (from 2.0.0)• Idea: Visibility and arity check can be skipped after first checking

• Store result in inline method cache

1. Check caller’s arguments

2. Search method `body’ `selector’ from `klass’

3. Dispatch method with `body’1. Check visibility and arity

1. Cache result into inline method cache

2. Push new control frame

3. Build `local environment’

4. Initialize local variables by `nil’

Seco

nd

tim

e

Firs

t ti

me

Page 59: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Evaluation resultMicro benchmarks

0

0.5

1

1.5

2

vm1_attr_ivar*

vm1_attr_ivar_set*

vm1_block*

vm1_simplereturn*

vm1_yield*

vm2_defined_method*

vm2_method*

vm2_method_missing*

vm2_method_with_block*

vm2_poly_method*

vm2_send*

vm2_super*

vm2_zsuper*

Spee

du

p r

atio

Faster than first date

trunk 2012/10/13 trunk 2012/10/31

Page 60: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Case studyFaster keyword parameters

Page 61: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Keyword parameters from Ruby 2.0

# def with keywords

def foo(a, b, key1: 1, key2: 2)

end

# call with keywords

foo(1, 2, key1: 123, key2: 456)

Page 62: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Slow keyword parameters

0

5

10

15

20

foo6(1, 2, 3, 4, 5, 6) foo_kw6(k1: 1, k2: 2, k3: 3, k4: 4,k5: 5, k6: 6)Ex

ecu

tio

n t

ime

(se

c)

Repeat 10M times

Evaluation on Ruby 2.1x30 slower

Page 63: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Why slow, compare with normal parameters?

1. Hash creation

2. Hash accessdef foo(h = {})

k1 = h.fetch(:k1, v1)

k2 = h.fetch(:k2, v2)

end

foo( {k1: 1, k2: 2} )

def foo(k1: v1, k2: v2)

end

foo(k1: 1, k2: 2)

1. Hash creation

2. Hash access

Page 64: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Optimization technique of keyword parameters from Ruby 2.2

•Key technique

→ Pass “a keyword list”

nstead of a Hash object

Check “Evolution of Keyword parameters” at Rubyconf portugal'15 http://www.atdot.net/~ko1/activities/2015_RubyConfPortgual.pdf

Page 65: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Result: Fast keyword parameters (Ruby 2.2.0)

0

5

10

15

20

foo6(1, 2, 3, 4, 5, 6) foo_kw6(k1: 1, k2: 2, k3: 3, k4: 4, k5:5, k6: 6)

Exec

uti

on

tim

e (s

ec)

Repeat 10M times

Ruby 2.1 Ruby 2.2

x14 faster!!(best case)

But still x2 times slower compare with normal dispatch

Ruby 2.2 optimizes method dispatch with keyword parameters

Page 66: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Case studyGarbage collection

http://www.flickr.com/photos/circasassy/6817999189/

Page 67: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Interpret on RubyVM

Rubyscript

Parse

Compile

RubyBytecode

Object management(GC)

Threading

Embeddedclasses and methods

(Array, String, …)

BundledLibraries

Evaluator

GemLibraries

Ruby’s componentsfrom core developer’s perspective

Ko1’s area

Page 68: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Automatic memory managementBasic concept

• Garbage collector recycled “unused” objects automatically

Page 69: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

marked

marked marked

markedmarked

Mark & Sweep algorithm

1. Mark reachable objects from root objects

2. Sweep unmarkedobjects (collection and de-allocation)

Root objects

free

traverse

traverse traverse

traverse traverse

free

free

Collect unreachable objects

Page 70: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Generational GC (GenGC) from Ruby 2.1.0•Weak generational hypothesis:

“Most objects die young”

→ Concentrate reclamation effort

only on the young objects

http://www.flickr.com/photos/ell-r-brown/5026593710

Page 71: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Generational hypothesis

0

10

20

30

40

50

60

70

80

90

100

0 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102

Perc

enta

ge o

f d

ead

ob

ject

#

Lifetime (Survibing GC count)

Object lifetime in RDoc(How many GCs surviving?)

95% of objects dead by the first GC

Page 72: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Generational GC (GenGC)

•Separate young generation and old generation•Create objects as young generation•Promote to old generation after surviving n-th GC

•Usually, GC on young space (minor GC)

•GC on both spaces if no memory (major/full GC)

Page 73: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

GenGC [Minor M&S GC] (1/2)

• Mark reachable objects from root objects.• Mark and promote to old

generation• Stop traversing after old

objects

→ Reduce mark overhead

• Sweep not (marked or old) objects

• Can’t collect Some unreachable objects

Root objects

new

new new

new/free

newnew

traverse

traverse traverse

traverse traverse

new/free

old/free

Don’t collect old objecteven if it is unreachable.

collect

1st MinorGC

old

old old

oldold

Page 74: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

GenGC [Minor M&S GC] (2/2)

• Mark reachable objects from root objects.• Mark and promote to old

generation• Stop traversing after old

objects

→ Reduce mark overhead

• Sweep not (marked or old) objects

• Can’t collect Some unreachable objects

Root objects

old

old old

new/free

oldold

traverse

ignore ignore

ignore ignore

new/free

old/free

Don’t collect old objecteven if it is unreachable.

collect

2nd MinorGC

Page 75: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

GenGC [Major M&S GC]

• Normal M&S

• Mark reachable objects from root objects• Mark and promote to old gen

• Sweep unmarked objects

• Sweep all unreachable (unused) objects

Root objects

new

old new

new/free

oldold

traverse

traverse traverse

traverse traverse

new/free

old/free

collect

collect

Page 76: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

0

2

4

6

8

10

12

14

Total mark time (ms) Total sweep time (sec)

Acc

um

ula

ted

exe

cuti

on

tim

e (s

ec)

w/o RGenGC RGenGC

RGenGC from Ruby 2.1.0Performance evaluation (RDoc)

About x15 speedup!

* Disabled lazy sweep to measure correctly.

Page 77: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

RGenGC from Ruby 2.1.0Performance evaluation (RDoc)

* 12% improvements compare with w/ and w/o RGenGC* Disabled lazy sweep to measure correctly.

103.7627479 102.3799865

16.043938154.946003494

0

20

40

60

80

100

120

140

w/o RGenGC RGenGC

Tota

l exe

cuti

on

tim

e (s

ec)

other than GC GC

Page 78: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Summary

Page 79: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

SummaryRepeating “Basic flow” is my daily job

1. Observe Ruby interpreter

2. Make assumption the reason of slowness

3. Consider ideas to overcome

4. Implement ideas

5. Measure the result

•Bad/same performance → Goto 4, 3, 2 or 1•Good performance! → Commit it.

Page 80: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Summary

Ruby/MRI is gettingbetter and better.

Page 81: Performance in the details: A way to make faster Rubyrvm.jp/~ko1/activities/2015_RailsClub2015_pub.pdfRailsClub 2015 A way to make faster Ruby The only way I can find is: Repeating

Thank you for your attention

Koichi Sasada<[email protected]>