PostgreSQL Materialized Views with Active Record
-
Upload
david-roberts -
Category
Software
-
view
1.838 -
download
4
description
Transcript of PostgreSQL Materialized Views with Active Record
PostgreSQL Materialized Views
and Active Record
David Roberts
The ProblemHow do you quickly filter data
represented by multiple ActiveRecord associations and calculations?
Data ModelCity%
&%Philly%&%Boston%
Technology%&%Ruby%&%Python%
Club%&%Philly.rb%
Talk%&%PostgreSQL%Materialized%Views%
Feedback%&%“Super%rad%talk!!”%&%“The%whole%world%is%now%dumber”%
Author%&%David%Roberts%
View all Commentsclass Feedback < ActiveRecord::Base belongs_to :talk INVALID_COMMENTS = ['', 'NA', 'N/A', 'not applicable']
scope :filled_out, -> { where.not(comment: INVALID_COMMENTS) }end
Feedback.filled_out
Feedback Load (2409.6ms) SELECT "feedbacks".* FROM "feedbacks" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable'))
Highest Scoring Talk with Valid Comments
Feedback.filled_out \ .select('talk_id, avg(score) as overall_score') \ .group('talk_id').order('overall_score desc') \ .limit(10)
# 665msSELECT talk_id, avg(score) as overall_score FROM "feedbacks" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable')) GROUP BY talk_id ORDER BY overall_score desc LIMIT 10;
Highest Scoring Talk with Valid Comments
results = Feedback.filled_out \ .select('talk_id, avg(score) as overall_score') \ .group('talk_id').order('overall_score desc') \ .limit(10)
results.first.inspect
=> "#<Feedback id: nil, talk_id: 24510>"
Highest Scoring Talks in PA by Authors named Parker
Feedback.filled_out.joins(talk: [:author, { club: :city }] ) \ .select('feedbacks.talk_id, avg(feedbacks.score) as overall_score') \ .where("cities.state_abbr = ?", 'PA') \ .where("authors.name LIKE '%?%'", 'Parker') \ .group('feedbacks.talk_id') \ .order('overall_score desc') \ .limit(10)
# 665msSELECT feedbacks.talk_id, avg(feedbacks.score) as overall_score FROM "feedbacks" INNER JOIN "talks" ON "talks"."id" = "feedbacks"."talk_id" INNER JOIN "authors" ON "authors"."id" = "talks"."author_id" INNER JOIN "clubs" ON "clubs"."id" = "talks"."club_id" INNER JOIN "cities" ON "cities"."id" = "clubs"."city_id" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable')) AND (cities.state_abbr = 'PA') AND (authors.name LIKE '%Parker%') GROUP BY feedbacks.talk_id ORDER BY overall_score desc LIMIT 10;
What’s Wrong with these Examples?
• Long ugly queries
• Slow queries are bad for Web Applications
• Fighting ActiveRecord framework
• SQL aggregates are difficult to access
• Returned object no longer corresponds to model
• Repetitive code to setup joins and Filter invalid comments
Using Database Views
to clean up ActiveRecord models
Views• Uses a stored / pre-defined Query
• Uses live data from corresponding tables when queried
• Can reference data from many tables
• Great for hiding complex SQL statements
• Allows you to push functionality to database
But this is a Ruby talk!
class CreateTalkView < ActiveRecord::Migration def up connection.execute <<-SQL CREATE VIEW v_talks_report AS SELECT cities.id as city_id, cities.name as city_name, cities.state_abbr as state_abbr, technologies.id as technology_id, clubs.id as club_id, clubs.name as club_name, talks.id as talk_id, talks.name as talk_name, authors.id as author_id, authors.name as author_name, feedback_agg.overall_score as overall_score FROM ( SELECT talk_id, avg(score) as overall_score FROM feedbacks WHERE feedbacks.comment NOT IN ('', 'NA', 'N/A', 'not applicable') GROUP BY talk_id ) as feedback_agg INNER JOIN talks ON feedback_agg.talk_id = talks.id INNER JOIN authors ON talks.author_id = authors.id INNER JOIN clubs ON talks.club_id = clubs.id INNER JOIN cities ON clubs.city_id = cities.id INNER JOIN technologies ON clubs.technology_id = technologies.id SQL end
def down connection.execute 'DROP VIEW IF EXISTS v_talks_report' endend
Encapsulate in ActiveRecord Modelclass TalkReport < ActiveRecord::Base # Use associations just like any other ActiveRecord object belongs_to :author belongs_to :talk belongs_to :club belongs_to :city belongs_to :technology # take advantage of talks has_many relationship delegate :feedbacks, to: :talk
self.table_name = 'v_talks_report'
# views cannot be changed since they are virtual def readonly true endend
Highest Scoring Talks with Valid Comments
results = TalkReport.order(overall_score: :desc) \ .limit(10)
results.first.inspect => "#<TalkReport city_id: 16, city_name: \"Burlington\", state_abbr: \"VT\", technology_id: 2, club_id: 73, club_name: \"Python - Burlington\", talk_id: 6508, talk_name: \"The SQL protocol is down, reboot the optical syste...\", author_id: 100, author_name: \"Jerrell Gleichner\", overall_score: #<BigDecimal:7ff3f80f1678,'0.4E1',9(27)>>"
Highest Scoring Talks in PA by Authors named Parker
TalkReport.where(state_abbr: 'PA') \ .where("author_name LIKE '%Parker%'") \ .order(overall_score: :desc).limit(10)
Using Materialized Views
to Improve Performance
Materialized Views• Acts similar to a Database View, but results persist for
future queries
• Creates a table on disk with the Result set
• Can be indexed
• Ideal for capturing frequently used joins and aggregations
• Allows optimization of tables for updating and Materialized Views for reporting
• Must be refreshed to be updated with most recent data
class CreateTalkReportMv < ActiveRecord::Migration def up connection.execute <<-SQL CREATE MATERIALIZED VIEW mv_talks_report AS SELECT cities.id as city_id, cities.name as city_name, cities.state_abbr as state_abbr, technologies.id as technology_id, clubs.id as club_id, clubs.name as club_name, talks.id as talk_id, talks.name as talk_name, authors.id as author_id, authors.name as author_name, feedback_agg.overall_score as overall_score FROM ( SELECT talk_id, avg(score) as overall_score FROM feedbacks WHERE feedbacks.comment NOT IN ('', 'NA', 'N/A', 'not applicable') GROUP BY talk_id ) as feedback_agg INNER JOIN talks ON feedback_agg.talk_id = talks.id INNER JOIN authors ON talks.author_id = authors.id INNER JOIN clubs ON talks.club_id = clubs.id INNER JOIN cities ON clubs.city_id = cities.id INNER JOIN technologies ON clubs.technology_id = technologies.id; CREATE INDEX ON mv_talks_report (overall_score); SQL end
def down connection.execute 'DROP MATERIALIZED VIEW IF EXISTS mv_talks_report' endend
ActiveRecord Model Changeclass TalkReport < ActiveRecord::Base # No changes to associations belongs_to …
self.table_name = 'mv_talks_report'
def self.repopulate connection.execute("REFRESH MATERIALIZED VIEW #{table_name}") end # Materialized Views cannot be modified def readonly true endend
Highest Scoring Talks99% reduction in runtime
577msFeedback.filled_out \ .select('talk_id, avg(score) as score') \ .group('talk_id').order('score desc').limit(10)
1msTalkReport.order(overall_score: :desc).limit(10)
Highest Scoring Talks in PA by Authors named “Parker”
400msFeedback.filled_out.joins(talk: [:author, { club: :city }] ) \ .select('feedbacks.talk_id, avg(feedbacks.score) as overall_score') \ .where("cities.state_abbr = ?", 'PA') \ .where("authors.name LIKE '%?%'", 'Parker') \ .group('feedbacks.talk_id') \ .order('overall_score desc') \ .limit(10)
19msTalkReport.where(state_abbr: 'PA') \ .where("author_name LIKE '%?%'", 'Parker') \ .order(overall_score: :desc).limit(10)
95% reduction in runtime
Why Use Materialized Views in your Rails application?
• ActiveRecord models allow for easy representation in Rails
• Capture commonly used joins / filters
• Allows for fast, live filtering and sorting of complex associations or calculated fields
• Push data intensive processing out of Ruby to Database
• Make use of advanced Database functions
• Can Optimize Indexes for reporting only
• When Performance is more important than Storage
Downsides
• Requires PostgreSQL 9.3
• Entire Materialized View must be refreshed to update
• Bad when Live Data is required
• For this use case, roll your own Materialized View using standard tables
Downsides• Migrations are painful!
• Recommend writing in SQL, so no using scopes
• Entire Materialized View must be dropped and redefined for any changes to the View or referring tables
• Hard to read and track what changed
Resources
• Source Code used in talk
• https://github.com/droberts84/materialized-view-demo
• PostgreSQL Documentation
• https://wiki.postgresql.org/wiki/Materialized_Views