How To Write Middleware In Ruby
2016/12/02, RubyConf Taiwan Day 1
Satoshi Tagomori (@tagomoris)
Satoshi "Moris" Tagomori (@tagomoris)
Fluentd, MessagePack-Ruby, Norikra, ...
Treasure Data, Inc.
http://www.fluentd.org/
An open source data collector for a unified logging layer.
[Figure: Before / After. Before: LOG and FILE sources wired to many ad-hoc scripts (script to parse data, cron job for loading, filtering script, syslog script, tweet-fetching script, aggregation scripts, rsync server). After: ✓ Parse/Format data ✓ Buffering & Retries ✓ Load balancing ✓ Failover]
Middleware? : Fluentd
• Long running daemon process
• Compatibility for API, behavior and configuration files
• Multi platform / environment support
  • Linux, Mac and Windows(!)
  • Ruby, JRuby?, Rubinius?
  • Baremetal servers, Virtual machines, Containers
• Many use cases
  • Various data, various data formats, unexpected errors
  • Various traffic: small to huge
Middleware? Batches: Minutes - Hours
Middleware? Providing APIs and/or Client Libraries
Middleware? Daily Development & Deployment, Providing Client Tools
Middleware? Make Your Application Stable
Middleware? Make Your Application Fast and Scalable
Case studies from the development of Fluentd
• Platform: Linux, Mac and Windows
• Resource: Memory usage and malloc
• Resource and Stability: Handling JSON
• Stability: Threads and exceptions
Platforms: Linux, Mac and Windows
Linux and Mac: Thread/process scheduling
• Both are UNIX-like systems...
• Mac (development), Linux (production)
• Test code must run on both!
• CI services provide multi-environment support
  • Fluentd uses Travis CI :D
  • Travis CI provides an "os" option: "linux" & "osx"
• Important tests to be written: Threading
class MyTest < ::Test::Unit::TestCase
  test 'yay 1' do
    data = []
    thr = Thread.new do
      data << "line 1"
    end
    # nothing guarantees that thr has run yet: the contents of data depend on thread scheduling
    data << "line 2"
    assert_equal ["line 1", "line 2"], data
  end
end
class MyTest < ::Test::Unit::TestCase
  test 'client sends 2 data' do
    list = []
    thr = Thread.new do # Mock server
      TCPServer.open("127.0.0.1", 2048) do |server|
        while sock = server.accept
          list << sock.read.chomp
        end
      end
    end
    2.times do |i|
      TCPSocket.open("127.0.0.1", 2048) do |client|
        client.write "data #{i}"
      end
    end
    assert_equal(["data 0", "data 1"], list)
  end
end
Loaded suite example
Started
F
===========================================================================================
Failure: test: client sends 2 data(MyTest)
example.rb:22:in `block in <class:MyTest>'
     19: end
     20: end
     21:
  => 22: assert_equal(["data 0", "data 1"], list)
     23: end
     24: end
<["data 0", "data 1"]> expected but was
<["data 0"]>
diff:
["data 0", "data 1"]
===========================================================================================
Finished in 0.007253 seconds.
-------------------------------------------------------------------------------------------
1 tests, 1 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
0% passed
-------------------------------------------------------------------------------------------
137.87 tests/s, 137.87 assertions/s
Mac OS X (10.11.16)
class MyTest < ::Test::Unit::TestCase
  test 'client sends 2 data' do
    list = []
    thr = Thread.new do # Mock server
      TCPServer.open("127.0.0.1", 2048) do |server|
        while sock = server.accept
          list << sock.read.chomp
        end
      end
    end
    2.times do |i|
      TCPSocket.open("127.0.0.1", 2048) do |client|
        client.write "data #{i}"
      end
    end
    sleep 1
    assert_equal(["data 0", "data 1"], list)
  end
end
Loaded suite example
Started
.
Finished in 1.002745 seconds.
--------------------------------------------------------------------------------------------
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
--------------------------------------------------------------------------------------------
1.00 tests/s, 1.00 assertions/s
Mac OS X (10.11.16)
Loaded suite example
Started
E
=================================================================================================
Error: test: client sends 2 data(MyTest): Errno::ECONNREFUSED: Connection refused - connect(2) for "127.0.0.1" port 2048
example.rb:16:in `initialize'
example.rb:16:in `open'
example.rb:16:in `block (2 levels) in <class:MyTest>'
example.rb:15:in `times'
example.rb:15:in `block in <class:MyTest>'
=================================================================================================
Finished in 0.005918197 seconds.
-------------------------------------------------------------------------------------------------
1 tests, 0 assertions, 0 failures, 1 errors, 0 pendings, 0 omissions, 0 notifications
0% passed
-------------------------------------------------------------------------------------------------
168.97 tests/s, 0.00 assertions/s
Linux (Ubuntu 16.04)
require 'socket'
require 'timeout'
require 'test/unit'

class MyTest < ::Test::Unit::TestCase
  test 'client sends 2 data' do
    list = []
    listening = false
    thr = Thread.new do # Mock server
      TCPServer.open("127.0.0.1", 2048) do |server|
        listening = true # the socket is bound and listening from here on
        while sock = server.accept
          list << sock.read.chomp
        end
      end
    end
    sleep 0.1 until listening # don't connect before the server is ready

    2.times do |i|
      TCPSocket.open("127.0.0.1", 2048) do |client|
        client.write "data #{i}"
      end
    end

    # wait for the server thread to read both messages, but give up after 3 seconds
    Timeout.timeout(3){ sleep 0.1 until list.size >= 2 }
    assert_equal(["data 0", "data 1"], list)
  end
end
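The final test waits twice: once for listening and once, bounded by Timeout.timeout, for the received data. The first loop has no upper bound, so a server thread that never starts would hang the whole test run. A minimal sketch of a reusable helper that bounds every such wait; `wait_until` and its keyword arguments are hypothetical names, not part of test-unit:

```ruby
require 'timeout'

# Hypothetical helper (not from the talk): poll a condition until it becomes
# truthy, raising Timeout::Error after `timeout` seconds instead of hanging.
def wait_until(timeout: 3, interval: 0.1)
  Timeout.timeout(timeout) do
    sleep interval until yield
  end
  true
end

# Usage: a background thread stands in for the mock server getting ready.
ready = false
Thread.new { sleep 0.2; ready = true }
wait_until { ready } # returns true after roughly 0.2 seconds

begin
  wait_until(timeout: 0.3) { false }
rescue Timeout::Error
  puts "gave up waiting"
end
```

In the test above, both `sleep 0.1 until listening` and the Timeout.timeout line could then become `wait_until { ... }` calls.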
*NIX and Windows: fork-exec and spawn
• Windows: different thread scheduling :(
• Daemonize:
  • double fork (or Process.daemon) on *nix
  • spawn on Windows
• Execute another process:
  • fork & exec on *nix
  • spawn on Windows
• CI on Windows: AppVeyor
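The fork-exec versus spawn split can be hidden behind one small function. A minimal sketch, assuming you only need to start a child process and wait for it; `start_child` is a hypothetical name. It relies on `Process.respond_to?(:fork)` returning false where fork is not implemented, which is the case on Windows:

```ruby
require 'rbconfig'

# Hypothetical helper: start another process on both *nix and Windows.
def start_child(*cmd)
  if Process.respond_to?(:fork)
    fork { exec(*cmd) }   # *nix: fork, then replace the child with the command
  else
    Process.spawn(*cmd)   # Windows (or any platform without fork)
  end
end

# RbConfig.ruby is the path of the running Ruby interpreter.
pid = start_child(RbConfig.ruby, "-e", "exit 3")
Process.wait(pid)
puts $?.exitstatus # prints 3
```

Since Process.spawn also exists on *nix, calling it unconditionally is a simpler option; the branch here only mirrors the slide.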
Lesson 1: Run Tests on All Platforms Supported
Resource: Memory usage and malloc
Memory Usage: Object leak
• Temporary values leak in a long running process unless something removes them
  • 1,000 objects / hour => 8,760,000 objects / year
• Some solutions:
  • In-process GC
  • Storage with TTL
  • (External storages: Redis, ...)
module MyDaemon
  class Process
    def initialize
      @map = {}
    end

    def hour_key
      Time.now.to_i / 3600
    end

    # one Hash per hour; Hashes for past hours are never removed, so @map keeps growing
    def hourly_store
      @map[hour_key] ||= {}
    end

    def put(key, value)
      hourly_store[key] = value
    end

    def get(key)
      hourly_store[key]
    end

    # add # of data per hour
    def read_data(table_name, data)
      key = "records_of_#{table_name}"
      put(key, (get(key) || 0) + data.size)
    end
  end
end
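One way to apply the "In-process GC" idea to the hourly store: drop the buckets for past hours whenever the store is accessed. This is a hypothetical sketch, not Fluentd code; `HourlyCounter`, `add`, `bucket_count` and the injectable `clock` are illustration names, and the clock is only there so the expiry can be tested without waiting an hour:

```ruby
module MyDaemon
  # Hypothetical class: hourly counters whose old buckets are purged.
  class HourlyCounter
    def initialize(clock: -> { Time.now })
      @clock = clock # injectable time source
      @map = {}
    end

    def hour_key
      @clock.call.to_i / 3600
    end

    # In-process GC: delete every bucket older than the current hour.
    def hourly_store
      key = hour_key
      @map.delete_if { |k, _| k < key }
      @map[key] ||= {}
    end

    def add(key, n)
      store = hourly_store
      store[key] = (store[key] || 0) + n
    end

    def get(key)
      hourly_store[key] || 0
    end

    def bucket_count
      @map.size
    end
  end
end

now = Time.at(0)
counter = MyDaemon::HourlyCounter.new(clock: -> { now })
counter.add("records_of_access", 10)
now += 3600 # one hour later
counter.add("records_of_access", 5)
p counter.get("records_of_access") # 5: the previous hour's bucket was purged
p counter.bucket_count             # 1
```

If readers also need the previous hour's totals, the purge condition can keep `key - 1` as well; memory stays bounded either way.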
Lesson 2: Make Sure to Collect Garbage
Resource and Stability: Handling JSON
Formatting Data Into JSON
• Fluentd handles JSON in many use cases
  • both parsing and generating
  • it consumes a lot of CPU time...
• JSON, Yajl and Oj
  • JSON: Ruby standard library
  • Yajl (yajl-ruby): Ruby binding of YAJL (SAX-based)
  • Oj (oj): Optimized JSON
require 'json'
require 'yajl'
require 'oj'
Oj.default_options = {bigdecimal_load: :float, mode: :compat, use_to_json: true}

module MyDaemon
  class Json
    def initialize(mode)
      klass = case mode
              when :json then JSON
              when :yajl then Yajl
              when :oj   then Oj
              end
      @proc = klass.method(:dump)
    end

    def dump(data); @proc.call(data); end
  end
end

require 'benchmark'
N = 500_000
obj = {"message" => "a"*100, "100" => 100, "pi" => 3.14159, "true" => true}

Benchmark.bm{|x|
  x.report("json") { formatter = MyDaemon::Json.new(:json); N.times{ formatter.dump(obj) } }
  x.report("yajl") { formatter = MyDaemon::Json.new(:yajl); N.times{ formatter.dump(obj) } }
  x.report("oj")   { formatter = MyDaemon::Json.new(:oj);   N.times{ formatter.dump(obj) } }
}
$ ruby example2.rb
           user     system      total        real
json   3.870000   0.050000   3.920000 (  4.005429)
yajl   2.940000   0.030000   2.970000 (  2.998924)
oj     1.130000   0.020000   1.150000 (  1.152596)
# for 500_000 objects
Mac OS X (10.11.16), Ruby 2.3.1, yajl-ruby 1.3.0, oj 2.18.0
Speed is not the only thing: APIs for unstable I/O
• JSON and Oj have only ".load"
  • it raises a parse error for:
    • an incomplete JSON string
    • additional bytes after a JSON string
• Yajl has a stream parser: very useful for servers
  • a method to feed input data
  • a callback for parsed objects
require 'oj'

Oj.load('{"message":"this is ')                  # Oj::ParseError
Oj.load('{"message":"this is a pen."}')          # => Hash
Oj.load('{"message":"this is a pen."}{"messa"')  # Oj::ParseError
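The standard library's parser has the same one-complete-document-per-call contract, which is easy to check without installing any gems. A small sketch using JSON.parse (the stdlib call, swapped in here for Oj.load) on the same three inputs:

```ruby
require 'json'

inputs = [
  '{"message":"this is ',                  # incomplete document
  '{"message":"this is a pen."}',          # exactly one document
  '{"message":"this is a pen."}{"messa"',  # extra bytes after a document
]

inputs.each do |str|
  begin
    p JSON.parse(str)
  rescue JSON::ParserError => e
    puts "JSON::ParserError: #{e.message[0, 40]}"
  end
end
```

Only the middle input parses; the other two raise JSON::ParserError, just like Oj::ParseError above.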
require 'yajl'

parsed_objs = []
parser = Yajl::Parser.new
parser.on_parse_complete = ->(obj){ parsed_objs << obj }

parser << '{"message":"aaaaaaaaaaaaaaa'
parser << 'aaaaaaaaa"}{"message"'  # on_parse_complete is called
parser << ':"bbbbbbbbb"'
parser << '}'                      # on_parse_complete is called again
require 'socket'
require 'oj'

TCPServer.open(port) do |server|
  while sock = server.accept
    begin
      buf = ""
      while input = sock.readpartial(1024)
        buf << input
        # can we feed this value to Oj.load ?
        begin
          obj = Oj.load(buf) # never succeeds if buf has 2 objects
          call_method(obj)
          buf = ""
        rescue Oj::ParseError
          # try with next input ...
        end
      end
    rescue EOFError
      sock.close rescue nil
    end
  end
end
require 'socket'
require 'yajl'

TCPServer.open(port) do |server|
  while sock = server.accept
    begin
      parser = Yajl::Parser.new
      parser.on_parse_complete = ->(obj){ call_method(obj) }
      while input = sock.readpartial(1024)
        parser << input
      end
    rescue EOFError
      sock.close rescue nil
    end
  end
end
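Without a streaming parser, a common workaround is to agree on a framing so that message boundaries are known before parsing. The sketch below assumes newline-delimited JSON, which is not what the slides' examples send (they concatenate objects with no separator), and `LineJsonFeeder` is a hypothetical name. It only illustrates why framing makes a whole-document parser such as JSON.parse usable on partial reads:

```ruby
require 'json'

# Hypothetical class. Assumption: every JSON document ends with "\n".
class LineJsonFeeder
  def initialize(&on_record)
    @buf = +""
    @on_record = on_record
  end

  # Feed any chunk, e.g. the result of sock.readpartial(1024).
  def <<(chunk)
    @buf << chunk
    while (idx = @buf.index("\n"))
      line = @buf.slice!(0, idx + 1) # take one complete line out of the buffer
      @on_record.call(JSON.parse(line)) unless line.strip.empty?
    end
    self
  end
end

records = []
feeder = LineJsonFeeder.new { |obj| records << obj }
feeder << '{"message":"aaaa'
feeder << "aaaa\"}\n{\"message\""
feeder << ":\"bbbb\"}\n"
p records.size # 2
```

The trade-off is that the sender must follow the framing; Yajl's stream parser needs no such agreement.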
Lesson 3: Choose Fast/Well-Designed(/Stable) Libraries
Stability: Threads and Exceptions
Thread in Ruby
• GVL (GIL): Giant VM Lock (Global Interpreter Lock)
  • Only one of many threads can run Ruby code at a time
  • The Ruby VM can use only 1 CPU core for running Ruby code
• A thread waiting on I/O is *not* running
  • Threads in I/O can run in parallel
[Figure: threads in I/O vs. running threads]
• We can write network servers in Ruby!
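A quick way to see the "threads in I/O run in parallel" point: a thread blocked in `sleep` does not hold the GVL, just like a thread waiting on a socket. This sketch uses `sleep` as a stand-in for I/O, an assumption made so it runs without a network:

```ruby
require 'benchmark'

# Five threads each block for 0.2 seconds. Blocked threads release the GVL,
# so the waits overlap instead of adding up.
elapsed = Benchmark.realtime do
  threads = 5.times.map { Thread.new { sleep 0.2 } }
  threads.each(&:join)
end
puts format("5 threads x 0.2s sleep took %.2fs", elapsed) # roughly 0.2s, not 1.0s
```

On MRI, splitting CPU-bound Ruby code across threads the same way would not finish faster, because only the thread holding the GVL runs Ruby code.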
class MyTestCase < ::Test::Unit::TestCase
  test 'sent data should be received' do
    received = []
    sent = []
    listening = false
    th1 = Thread.new do
      TCPServer.open("127.0.0.1", 2048) do |server|
        listening = true
        while sock = server.accepto
          received << sock.read
        end
      end
    end
    sleep 0.1 until listening
    ["foo", "bar"].each do |str|
      begin
        TCPSocket.open("127.0.0.1", 2048) do |client|
          client.write str
        end
        sent << str
      rescue => e
        # ignore
      end
    end
    assert_equal sent, received
  end
end
Loaded suite example7
Started
.
Finished in 0.104729 seconds.
-------------------------------------------------------------------------------------------
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
-------------------------------------------------------------------------------------------
9.55 tests/s, 9.55 assertions/s
The test passes, but only because sent and received are both empty: assert_equal sent, received compares [] == [].
Thread in Ruby: Methods for errors
• Threads die silently if any errors are raised
• abort_on_exception
  • if true, errors raised in threads are re-raised in the main thread
  • required to make sure not to create a false success (silent crash)
• report_on_exception
  • if true, warns about errors in threads (Ruby 2.4 feature)
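The "die silently" behavior is easy to reproduce. In this sketch both flags are off (report_on_exception is switched off explicitly, because it defaults to true from Ruby 2.5 on); the error stays hidden until someone calls Thread#join or Thread#value, which re-raise it in the calling thread:

```ruby
t = Thread.new do
  Thread.current.report_on_exception = false # Ruby 2.4+: no warning on stderr
  raise "boom"
end

begin
  t.join # join re-raises the exception that killed the thread
rescue => e
  p e.message # "boom"
end

p t.alive? # false
p t.status # nil: the thread terminated with an exception
```

A long running server thread is rarely joined, which is why abort_on_exception matters there.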
class MyTestCase < ::Test::Unit::TestCase
  test 'sent data should be received' do
    received = []
    sent = []
    listening = false
    th1 = Thread.new do
      Thread.current.abort_on_exception = true
      TCPServer.open("127.0.0.1", 2048) do |server|
        listening = true
        while sock = server.accepto
          received << sock.read
        end
      end
    end
    sleep 0.1 until listening
    ["foo", "bar"].each do |str|
      begin
        TCPSocket.open("127.0.0.1", 2048) do |client|
          client.write str
        end
        sent << str
      rescue => e
        # ignore
      end
    end
    assert_equal sent, received
  end
end
Loaded suite example7
Started
E
===========================================================================================
Error: test: sent data should be received(MyTestCase): NoMethodError: undefined method `accepto' for #<TCPServer:(closed)>
Did you mean?  accept
example7.rb:14:in `block (3 levels) in <class:MyTestCase>'
example7.rb:12:in `open'
example7.rb:12:in `block (2 levels) in <class:MyTestCase>'
===========================================================================================
Finished in 0.0046 seconds.
-------------------------------------------------------------------------------------------
1 tests, 0 assertions, 0 failures, 1 errors, 0 pendings, 0 omissions, 0 notifications
0% passed
-------------------------------------------------------------------------------------------
217.39 tests/s, 0.00 assertions/s
Thread in Ruby: Process crash from errors in threads
• Middleware SHOULD NOT crash, as far as possible :)
• An error from a TCP connection MUST NOT crash the whole process
• Many places can raise errors...
  • Socket I/O, executing commands
  • Parsing HTTP requests, parsing JSON (or other formats)
• The process
  • should crash in tests, but
  • should not crash in production
Thread in Ruby: What your code needs for threads
• Set Thread#abort_on_exception = true
  • for almost all threads...
• "rescue" all errors in threads
  • to log these errors, and not to crash the whole process
• "raise" rescued errors again only in testing
  • to make tests fail on bugs
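The three points above can be combined into one helper. A minimal sketch, assuming a global testing flag and a standard Logger; `SafeThread`, `create`, `testing` and `logger` are hypothetical names, not Fluentd's API. It rescues StandardError (what a bare `rescue` catches) rather than Exception, so signals such as Interrupt still stop the process:

```ruby
require 'logger'

# Hypothetical helper: log errors from threads in production,
# re-raise them (and thus fail the tests) only when testing.
module SafeThread
  class << self
    attr_accessor :testing, :logger
  end
  self.testing = false
  self.logger = Logger.new($stderr)

  def self.create(name, &block)
    Thread.new do
      Thread.current.abort_on_exception = true
      begin
        block.call
      rescue => e
        SafeThread.logger.error("#{name}: #{e.class}: #{e.message}")
        raise if SafeThread.testing # abort_on_exception re-raises it in the main thread
      end
    end
  end
end

# Production mode: the error is logged and the process keeps running.
t = SafeThread.create("worker") { raise ArgumentError, "bad input" }
t.join
puts "still running"
```

A test helper would set `SafeThread.testing = true` before starting any threads.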
Lesson 4: Handle Exceptions in the Right Way
Wrap-up: Writing Middleware is ...
Writing Middleware:
• Taking care of:
  • various platforms and environments
  • resource usage and stability
• Requiring knowledge of:
  • Ruby's features
  • Ruby VM's behavior
  • library implementations
• A different viewpoint from writing applications!
Write your code, like middleware :D
Make it efficient & stable!
Thank you! @tagomoris
Loaded suite example
Started
F
===========================================================================================
Failure: test: client sends 2 data(MyTest)
example.rb:22:in `block in <class:MyTest>'
     19: end
     20: end
     21:
  => 22: assert_equal(["data 0", "data 1"], list)
     23: end
     24: end
<["data 0", "data 1"]> expected but was
<["data 0", "data 1"]>
diff:
["data 0", "data 1"]
===========================================================================================
Finished in 0.009425 seconds.
-------------------------------------------------------------------------------------------
1 tests, 1 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
0% passed
-------------------------------------------------------------------------------------------
106.10 tests/s, 106.10 assertions/s
Mac OS X (10.11.16)
Memory Usage: Memory fragmentation
• High memory usage, low # of objects
• Memory fragmentation?
  • glibc malloc: weak for fine-grained memory allocation and multi-threading
• Switching to jemalloc by LD_PRELOAD
  • FreeBSD standard malloc (available on Linux)
  • Fluentd's rpm/deb packages use jemalloc by default
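The symptom on the slide (high memory usage, low number of objects) can be checked by logging two numbers side by side over time. A hedged sketch: `memory_snapshot` is a hypothetical helper that reads RSS from Linux's /proc/self/status (so RSS is nil on other platforms) and live object slots from GC.stat(:heap_live_slots). If RSS keeps growing while live slots stay flat, fragmentation rather than an object leak is a plausible suspect:

```ruby
# Hypothetical helper: resident memory (KB, Linux only) vs. live Ruby object slots.
def memory_snapshot
  rss_kb = nil
  if File.readable?("/proc/self/status")
    line = File.foreach("/proc/self/status").find { |l| l.start_with?("VmRSS:") }
    rss_kb = line.split[1].to_i if line # "VmRSS:   12345 kB" -> 12345
  end
  { rss_kb: rss_kb, live_slots: GC.stat(:heap_live_slots) }
end

p memory_snapshot
```

Logging this periodically before and after switching to jemalloc gives a rough comparison of the two allocators for a given workload.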
abort_on_exception in detail
• It doesn't abort the whole process, actually
• It just re-raises errors in the main thread

sleeping = false
Thread.abort_on_exception = true

Thread.new{ sleep 0.1 until sleeping ; raise "yay" }

begin
  sleeping = true
  sleep 5
rescue => e
  p(here: "rescue in main thread", error: e)
end

p "foo!"

$ ruby example.rb
{:here=>"rescue in main thread", :error=>#<RuntimeError: yay>}
"foo!"