How to avoid hanging yourself with Rails


Presentation given to Toronto Rails Project Night, performance tips for ActiveRecord usage


Using ActiveRecord right the first time

work.rowanhick.com


Discussion tonight

• Intended for new Rails Developers

• People that think Rails is slow

• Focus on simple steps to improve common :has_many performance problems

• Short - 15mins

• All links/references up on http://work.rowanhick.com tomorrow


About me

• New Zealander (not Australian)

• Product Development Mgr for a startup in Toronto

• Full time with Rails for 2 years

• Previously PHP/MySQL for 4 years

• 6 years prior to that in QA/BA/PM for an enterprise CAD/CAM software development company


Disclaimer

• For the sake of brevity and understanding, the SQL shown here is cut down to “pseudo SQL”

• This is not an exhaustive in-depth analysis, just meant as a heads up

• Times were measured using ApacheBench through Mongrel in production mode

• ab -n 1000 http://127.0.0.1/orders/test_xxxx


ActiveRecord lets you get into trouble far too quickly.

• Super easy syntax comes at a cost.

    @orders = Order.find(:all)
    @orders.each do |order|
      puts order.customer.name
      puts order.customer.country.name
    end

✴Congratulations, you just overloaded your DB with (total number of Orders x 2) unnecessary SQL calls


What happened there?

• One query to get the orders:

    @orders = Order.find(:all)
    "SELECT * FROM orders"

• Then, for every item in the orders collection:

    customer.name:
    "SELECT * FROM customers WHERE id = x"

    customer.country.name:
    "SELECT * FROM countries WHERE id = y"
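To make the query count concrete, here is a minimal plain-Ruby sketch. It does not use ActiveRecord: `FakeDB`, `build_db` and `lazy_walk` are hypothetical names invented for this illustration, and the "database" only counts how often it is asked for something.

```ruby
# Toy stand-in for a database connection that only counts queries.
# FakeDB and its tables are hypothetical; no ActiveRecord is involved.
class FakeDB
  attr_reader :query_count

  def initialize(tables)
    @tables = tables
    @query_count = 0
  end

  # Stands in for "SELECT * FROM table"
  def all(table)
    @query_count += 1
    @tables.fetch(table).values
  end

  # Stands in for "SELECT * FROM table WHERE id = ?"
  def find(table, id)
    @query_count += 1
    @tables.fetch(table).fetch(id)
  end
end

# Build a tiny data set with the given number of orders (made-up values).
def build_db(order_count)
  countries = { 1 => { id: 1, name: "New Zealand" } }
  customers = { 1 => { id: 1, name: "Acme", country_id: 1 } }
  orders = {}
  (1..order_count).each { |i| orders[i] = { id: i, customer_id: 1 } }
  FakeDB.new({ orders: orders, customers: customers, countries: countries })
end

# Mirrors the loop on the slide: one lookup per association access.
def lazy_walk(db)
  db.all(:orders).map do |order|
    customer = db.find(:customers, order[:customer_id])
    country  = db.find(:countries, customer[:country_id])
    [customer[:name], country[:name]]
  end
end

db = build_db(100)
lazy_walk(db)
puts db.query_count # 1 query for the orders + 2 per order
```

With 100 orders the counter ends at 1 + (100 x 2) = 201, which is the "total number of Orders x 2" overhead from the previous slide.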


Systemic Problem in Web development

I’ve seen:

- 15 Second page reloads

- 10000 queries per page

“<insert name here> language performs really poorly, we’re going to get it redeveloped in <insert new language here>”


A typical root cause

• Failure to build application with *real* data

• i.e. “It worked fine on my machine”, but the developer never loaded up 100,000 records to see what would happen

• Using Rake tasks to build realistic data sets

• Test, test, test

• tail -f log/development.log
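Watching the log scroll past is easier with a count. The sketch below is one possible way to tally queries per model from a chunk of log text. It assumes log lines shaped like `Customer Load (0.000439) SELECT ...`; the sample lines and the names `LOAD_LINE` and `summarize_loads` are made up for this illustration.

```ruby
# Matches lines shaped like: "Customer Load (0.000439) SELECT * FROM ..."
# (assumed log format; adjust the pattern to whatever your log really prints)
LOAD_LINE = /(\w+) Load \((\d+\.\d+)\)\s+SELECT/

# Returns { "Model" => { queries: n, seconds: total } } for the given log text.
def summarize_loads(log_text)
  summary = Hash.new { |hash, key| hash[key] = { queries: 0, seconds: 0.0 } }
  log_text.each_line do |line|
    match = LOAD_LINE.match(line)
    next unless match

    entry = summary[match[1]]
    entry[:queries] += 1
    entry[:seconds] += match[2].to_f
  end
  summary
end

# Made-up sample lines for illustration only.
sample = <<~LOG
  Order Load (0.000837) SELECT * FROM `orders` LIMIT 10
  Customer Load (0.000439) SELECT * FROM `customers` WHERE (customers.id = 2106)
  Customer Load (0.000401) SELECT * FROM `customers` WHERE (customers.id = 2018)
  Rendering orders/index
LOG

summarize_loads(sample).each do |model, stats|
  puts "#{model}: #{stats[:queries]} queries"
end
```

A model that shows up with hundreds of queries for a single page view is a strong hint that an association is being loaded inside a loop.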


Faker to the rescue

• in lib/tasks/xchain.rake

    namespace :xchain do
      desc "Load fake customers"
      task :load_customers => :environment do
        require 'faker'

        # Remove customers created by earlier runs of this task
        Customer.find(:all, :conditions => "email LIKE('%XCHAIN_%')").each { |c| c.destroy }

        # Create 300 customers with plausible-looking fake data
        300.times do
          c = Customer.new
          c.status_id = rand(3) + 1
          c.country_id = rand(243) + 1
          c.name = Faker::Company.name
          c.alternate_name = Faker::Company.name
          c.phone = Faker::PhoneNumber.phone_number
          c.email = "XCHAIN_" + Faker::Internet.email
          c.save
        end
      end
    end

$ rake xchain:load_customers


Eager loading

• By using :include in your finds you create SQL joins

• Pull all required records in one query:

    find(:all, :include => [ :customer, :order_lines ])

    ✓ order.customer, order.order_lines

    find(:all, :include => [ { :customer => :country }, :order_lines ])

    ✓ order.customer, order.customer.country, order.order_lines


Improvement

• Let’s start optimising ...

    @orders = Order.find(:all, :include => { :customer => :country })

• Resulting SQL ...

    "SELECT orders.*, customers.*, countries.*
     FROM orders
     LEFT JOIN customers ON ( customers.id = orders.customer_id )
     LEFT JOIN countries ON ( countries.id = customers.country_id )"

✓ 7.70 req/s 1.4x faster


Select only what you need

• Using the :select parameter in the find options, you can limit the columns you are requesting back from the database

• No point grabbing all columns if you only want :id and :name

    Order.find(:all, :select => 'orders.id, orders.name')


The last slide was very important

• Not using selects is *okay* provided you have very small columns, and never any binary, or large text data

• You can suddenly saturate your DB connection.

• Imagine our Orders table had an Invoice column on it storing a pdf of the invoice...
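A back-of-the-envelope calculation shows how quickly this adds up. Every size below is a made-up illustration value, not a measurement from the talk, and `payload_bytes` is a hypothetical helper.

```ruby
# All sizes below are assumed illustration values, not measurements.
ROW_COUNT        = 1_000
ID_AND_NAME_SIZE = 50          # bytes per row for id + name (assumed)
INVOICE_PDF_SIZE = 200 * 1024  # bytes per row for a PDF blob (assumed)

# Total bytes the database has to ship for a result set.
def payload_bytes(rows, bytes_per_row)
  rows * bytes_per_row
end

narrow = payload_bytes(ROW_COUNT, ID_AND_NAME_SIZE)
wide   = payload_bytes(ROW_COUNT, ID_AND_NAME_SIZE + INVOICE_PDF_SIZE)

puts "id + name only: #{narrow} bytes"
puts "with invoice:   #{wide} bytes (#{wide / narrow}x more)"
```

Under these assumptions a page that only needs ids and names would pull roughly 200 MB instead of 50 KB if it selected every column.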


Oops

• Can’t show a benchmark

• :select and :include don’t work together! The finder reverts to selecting all columns

• For a long time the core team has not included patches to make it work

• One little sentence in ActiveRecord rdoc “Because eager loading generates the SELECT statement too, the :select option is ignored.”


‘mrj’ to the rescue

• http://dev.rubyonrails.org/attachment/ticket/7147/init.5.rb

• Monkey patch to fix select/include problem

• Produces much more efficient SQL


Updated finder

• Now :select and :include playing nice:

    @orders = Order.find(:all,
      :select => 'orders.id, orders.created_at, customers.name, countries.name, order_statuses.name',
      :include => [{:customer[:name] => :country[:name]}, :order_status[:name]],
      :conditions => conditions,
      :order => 'order_statuses.sort_order ASC,order_statuses.id ASC, orders.id DESC')

✓15.15 req/s 2.88x faster


r8672 change

• http://blog.codefront.net/2008/01/30/living-on-the-edge-of-rails-5-better-eager-loading-and-more/

• The following uses new improved association load (12 req/s)

@orders = Order.find(:all, :include => [{:customer => :country}, :order_status] )

• The following does not (the :order refers to a column on an included table, so the finder falls back to the join)

@orders = Order.find(:all, :include => [{:customer => :country}, :order_status], :order => 'order_statuses.sort_order' )


r8672 output...

• Here’s the SQL

Order Load (0.000837) SELECT * FROM `orders` WHERE (order_status_id < 100) LIMIT 10

Customer Load (0.000439) SELECT * FROM `customers` WHERE (customers.id IN (2106,2018,1920,2025,2394,2075,2334,2159,1983,2017))

Country Load (0.000324) SELECT * FROM `countries` WHERE (countries.id IN (33,17,56,150,194,90,91,113,80,54))

OrderStatus Load (0.000291) SELECT * FROM `order_statuses` WHERE (order_statuses.id IN (10))
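The pattern in that log (load the parent rows, collect the foreign keys, then issue one `IN (...)` query per association) can be sketched in plain Ruby. This is an illustration of the idea only, not the ActiveRecord implementation; `TABLES`, `QUERY_LOG`, `where_id_in` and the sample rows are all hypothetical.

```ruby
# Hypothetical in-memory tables; the real thing is SQL issued by ActiveRecord.
TABLES = {
  orders: [
    { id: 1, customer_id: 10 },
    { id: 2, customer_id: 11 },
    { id: 3, customer_id: 10 }
  ],
  customers: [
    { id: 10, name: "Acme",   country_id: 100 },
    { id: 11, name: "Globex", country_id: 101 }
  ],
  countries: [
    { id: 100, name: "Canada" },
    { id: 101, name: "New Zealand" }
  ]
}.freeze

QUERY_LOG = []

# Stands in for "SELECT * FROM table WHERE id IN (...)": one query, many rows.
def where_id_in(table, ids)
  QUERY_LOG << "SELECT * FROM #{table} WHERE id IN (#{ids.join(',')})"
  TABLES.fetch(table).select { |row| ids.include?(row[:id]) }
end

def load_orders_with_customers_and_countries
  QUERY_LOG << "SELECT * FROM orders"
  orders = TABLES.fetch(:orders)

  # Collect the distinct foreign keys, then fetch each association once.
  customer_ids = orders.map { |o| o[:customer_id] }.uniq
  customers = where_id_in(:customers, customer_ids)
  country_ids = customers.map { |c| c[:country_id] }.uniq
  countries = where_id_in(:countries, country_ids)

  # Stitch the rows together in memory instead of in SQL.
  customers_by_id = customers.to_h { |c| [c[:id], c] }
  countries_by_id = countries.to_h { |c| [c[:id], c] }
  orders.map do |o|
    customer = customers_by_id.fetch(o[:customer_id])
    country = countries_by_id.fetch(customer[:country_id])
    { order_id: o[:id], customer: customer[:name], country: country[:name] }
  end
end

rows = load_orders_with_customers_and_countries
puts QUERY_LOG
```

The number of queries depends on the number of associations, not on the number of orders: three tables here, so three queries however many rows come back.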


But I want more

• Okay, this still isn’t blazing fast. I’m building the next killr web2.0 app

• Forget about associations, just load it via SQL; depending on the application, this makes a huge difference

• Concentrate on commonly used pages


Catch 22

• Hard coding SQL is the fastest solution

• No construction of SQL, no generation of ActiveRecord associated classes

• If your DB changes, you have to update SQL

‣ Keep SQL with models where possible


It ain’t pretty.. but it’s fast

• Find by SQL

    class Order < ActiveRecord::Base
      # Hand-written query: returns Order objects carrying only the selected columns
      def self.find_current_orders
        find_by_sql("SELECT orders.id, orders.created_at,
                            customers.name AS customer_name,
                            countries.name AS country_name,
                            order_statuses.name AS status_name
                     FROM orders
                     LEFT OUTER JOIN `customers` ON `customers`.id = `orders`.customer_id
                     LEFT OUTER JOIN `countries` ON `countries`.id = `customers`.country_id
                     LEFT OUTER JOIN `order_statuses` ON `order_statuses`.id = `orders`.order_status_id
                     WHERE order_status_id < 100
                     ORDER BY order_statuses.sort_order ASC, order_statuses.id ASC, orders.id DESC")
      end
    end

• 28.90 req/s ( 5.49x faster )


And the results

    Finder                           req/s    Speedup
    find(:all)                        5.26    (baseline)
    find(:all, :include)              7.70    1.4x
    find(:all, :select, :include)    15.15    2.88x
    find_by_sql()                    28.90    5.49x
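The speedup column is simply each rate divided by the plain find(:all) rate. A short sketch of that arithmetic, using the req/s figures from the slides (the names `RESULTS` and `speedup` are just for this example):

```ruby
# Requests per second as reported on the slides.
RESULTS = {
  "find(:all)"                    => 5.26,
  "find(:all, :include)"          => 7.70,
  "find(:all, :select, :include)" => 15.15,
  "find_by_sql()"                 => 28.90
}.freeze

# Speedup relative to the baseline rate, rounded to two decimals.
def speedup(rate, baseline)
  (rate / baseline).round(2)
end

baseline = RESULTS.fetch("find(:all)")

RESULTS.each do |label, rate|
  puts format("%-32s %6.2f req/s  %.2fx", label, rate, speedup(rate, baseline))
end
```

Computed this way the :include figure comes out at about 1.46x, which the slides show as 1.4x.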


Don’t forget indexes

• 64,000 orders

    OrderStatus.find(:all).each { |os| puts os.orders.count }

• Avg 0.61 req/s with no indexes

• EXPLAIN your SQL, then add the missing index:

    ALTER TABLE `xchain_test`.`orders` ADD INDEX order_status_idx(`order_status_id`);

• Avg 23 req/s after index (37x improvement)
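As a rough analogy for why the index helps, compare scanning every row with looking rows up through a prebuilt structure. A real database index is typically a B-tree rather than a Ruby Hash, so treat this purely as an illustration; the data and the method names are hypothetical.

```ruby
# Hypothetical data: 64,000 orders spread evenly over 5 statuses.
ORDERS = Array.new(64_000) { |i| { id: i + 1, order_status_id: (i % 5) + 1 } }

# Without an index: every count walks the whole table.
def count_by_scan(orders, status_id)
  orders.count { |o| o[:order_status_id] == status_id }
end

# With an index: build the lookup once, then each count is a hash fetch.
def build_status_index(orders)
  orders.group_by { |o| o[:order_status_id] }
end

def count_by_index(index, status_id)
  index.fetch(status_id, []).length
end

index = build_status_index(ORDERS)
puts count_by_scan(ORDERS, 3)
puts count_by_index(index, 3)
```

Both calls return the same number, but the scan touches all 64,000 rows on every call while the indexed lookup touches only the rows for that status.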


Avoid .count

• It’s damned slow

    OrderStatus.find(:all).each { |os| puts os.orders.count }

• Add column orders_count + update code

    OrderStatus.find(:all).each { |os| puts os.orders_count }

✓ 34 req/s vs 108 req/s (3x faster)
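The idea behind the orders_count column is to pay a small cost on every write so that reads never have to count. A plain-Ruby sketch of that bookkeeping follows; the class below is a hypothetical stand-in, not an ActiveRecord model. Rails can maintain such a column for you through the `:counter_cache` option on `belongs_to`, though the slides do not say whether the talk used it.

```ruby
# Plain-Ruby stand-in for a status with a cached order count.
# The class and its methods are hypothetical, for illustration only.
class OrderStatus
  attr_reader :name, :orders_count

  def initialize(name)
    @name = name
    @orders = []
    @orders_count = 0 # plays the role of the cached orders_count column
  end

  # Writes keep the cached counter in step with the collection.
  def add_order(order)
    @orders << order
    @orders_count += 1
  end

  def remove_order(order)
    @orders_count -= 1 if @orders.delete(order)
  end

  # Stands in for the "SELECT COUNT(*) ..." that orders.count would run.
  def slow_count
    @orders.length
  end
end

status = OrderStatus.new("open")
%w[order-1 order-2 order-3].each { |order| status.add_order(order) }
status.remove_order("order-2")
puts status.orders_count
```

The trade-off is that every code path that creates or destroys an order has to update the counter, otherwise the cached value drifts from the real count.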


For the speed freaks

• Merb - http://merbivore.com

• 38.56 req/s - 7x performance improvement

• Nearly identical code

• Blazingly fast


The End

work.rowanhick.com
