PostgreSQL Materialized Views
and Active Record
David Roberts
The ProblemHow do you quickly filter data
represented by multiple ActiveRecord associations and calculations?
Data ModelCity%
&%Philly%&%Boston%
Technology%&%Ruby%&%Python%
Club%&%Philly.rb%
Talk%&%PostgreSQL%Materialized%Views%
Feedback%&%“Super%rad%talk!!”%&%“The%whole%world%is%now%dumber”%
Author%&%David%Roberts%
View all Commentsclass Feedback < ActiveRecord::Base belongs_to :talk INVALID_COMMENTS = ['', 'NA', 'N/A', 'not applicable']
scope :filled_out, -> { where.not(comment: INVALID_COMMENTS) }end
Feedback.filled_out
Feedback Load (2409.6ms) SELECT "feedbacks".* FROM "feedbacks" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable'))
Highest Scoring Talk with Valid Comments
Feedback.filled_out \ .select('talk_id, avg(score) as overall_score') \ .group('talk_id').order('overall_score desc') \ .limit(10)
# 665msSELECT talk_id, avg(score) as overall_score FROM "feedbacks" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable')) GROUP BY talk_id ORDER BY overall_score desc LIMIT 10;
Highest Scoring Talk with Valid Comments
results = Feedback.filled_out \ .select('talk_id, avg(score) as overall_score') \ .group('talk_id').order('overall_score desc') \ .limit(10)
results.first.inspect
=> "#<Feedback id: nil, talk_id: 24510>"
Highest Scoring Talks in PA by Authors named Parker
Feedback.filled_out.joins(talk: [:author, { club: :city }] ) \ .select('feedbacks.talk_id, avg(feedbacks.score) as overall_score') \ .where("cities.state_abbr = ?", 'PA') \ .where("authors.name LIKE '%?%'", 'Parker') \ .group('feedbacks.talk_id') \ .order('overall_score desc') \ .limit(10)
# 665msSELECT feedbacks.talk_id, avg(feedbacks.score) as overall_score FROM "feedbacks" INNER JOIN "talks" ON "talks"."id" = "feedbacks"."talk_id" INNER JOIN "authors" ON "authors"."id" = "talks"."author_id" INNER JOIN "clubs" ON "clubs"."id" = "talks"."club_id" INNER JOIN "cities" ON "cities"."id" = "clubs"."city_id" WHERE ("feedbacks"."comment" NOT IN ('', 'NA', 'N/A', 'not applicable')) AND (cities.state_abbr = 'PA') AND (authors.name LIKE '%Parker%') GROUP BY feedbacks.talk_id ORDER BY overall_score desc LIMIT 10;
What’s Wrong with these Examples?
• Long ugly queries
• Slow queries are bad for Web Applications
• Fighting ActiveRecord framework
• SQL aggregates are difficult to access
• Returned object no longer corresponds to model
• Repetitive code to setup joins and Filter invalid comments
Using Database Views
to clean up ActiveRecord models
Views• Uses a stored / pre-defined Query
• Uses live data from corresponding tables when queried
• Can reference data from many tables
• Great for hiding complex SQL statements
• Allows you to push functionality to database
But this is a Ruby talk!
class CreateTalkView < ActiveRecord::Migration def up connection.execute <<-SQL CREATE VIEW v_talks_report AS SELECT cities.id as city_id, cities.name as city_name, cities.state_abbr as state_abbr, technologies.id as technology_id, clubs.id as club_id, clubs.name as club_name, talks.id as talk_id, talks.name as talk_name, authors.id as author_id, authors.name as author_name, feedback_agg.overall_score as overall_score FROM ( SELECT talk_id, avg(score) as overall_score FROM feedbacks WHERE feedbacks.comment NOT IN ('', 'NA', 'N/A', 'not applicable') GROUP BY talk_id ) as feedback_agg INNER JOIN talks ON feedback_agg.talk_id = talks.id INNER JOIN authors ON talks.author_id = authors.id INNER JOIN clubs ON talks.club_id = clubs.id INNER JOIN cities ON clubs.city_id = cities.id INNER JOIN technologies ON clubs.technology_id = technologies.id SQL end
def down connection.execute 'DROP VIEW IF EXISTS v_talks_report' endend
Encapsulate in ActiveRecord Modelclass TalkReport < ActiveRecord::Base # Use associations just like any other ActiveRecord object belongs_to :author belongs_to :talk belongs_to :club belongs_to :city belongs_to :technology # take advantage of talks has_many relationship delegate :feedbacks, to: :talk
self.table_name = 'v_talks_report'
# views cannot be changed since they are virtual def readonly true endend
Highest Scoring Talks with Valid Comments
results = TalkReport.order(overall_score: :desc) \ .limit(10)
results.first.inspect => "#<TalkReport city_id: 16, city_name: \"Burlington\", state_abbr: \"VT\", technology_id: 2, club_id: 73, club_name: \"Python - Burlington\", talk_id: 6508, talk_name: \"The SQL protocol is down, reboot the optical syste...\", author_id: 100, author_name: \"Jerrell Gleichner\", overall_score: #<BigDecimal:7ff3f80f1678,'0.4E1',9(27)>>"
Highest Scoring Talks in PA by Authors named Parker
TalkReport.where(state_abbr: 'PA') \ .where("author_name LIKE '%Parker%'") \ .order(overall_score: :desc).limit(10)
Using Materialized Views
to Improve Performance
Materialized Views• Acts similar to a Database View, but results persist for
future queries
• Creates a table on disk with the Result set
• Can be indexed
• Ideal for capturing frequently used joins and aggregations
• Allows optimization of tables for updating and Materialized Views for reporting
• Must be refreshed to be updated with most recent data
class CreateTalkReportMv < ActiveRecord::Migration def up connection.execute <<-SQL CREATE MATERIALIZED VIEW mv_talks_report AS SELECT cities.id as city_id, cities.name as city_name, cities.state_abbr as state_abbr, technologies.id as technology_id, clubs.id as club_id, clubs.name as club_name, talks.id as talk_id, talks.name as talk_name, authors.id as author_id, authors.name as author_name, feedback_agg.overall_score as overall_score FROM ( SELECT talk_id, avg(score) as overall_score FROM feedbacks WHERE feedbacks.comment NOT IN ('', 'NA', 'N/A', 'not applicable') GROUP BY talk_id ) as feedback_agg INNER JOIN talks ON feedback_agg.talk_id = talks.id INNER JOIN authors ON talks.author_id = authors.id INNER JOIN clubs ON talks.club_id = clubs.id INNER JOIN cities ON clubs.city_id = cities.id INNER JOIN technologies ON clubs.technology_id = technologies.id; CREATE INDEX ON mv_talks_report (overall_score); SQL end
def down connection.execute 'DROP MATERIALIZED VIEW IF EXISTS mv_talks_report' endend
ActiveRecord Model Changeclass TalkReport < ActiveRecord::Base # No changes to associations belongs_to …
self.table_name = 'mv_talks_report'
def self.repopulate connection.execute("REFRESH MATERIALIZED VIEW #{table_name}") end # Materialized Views cannot be modified def readonly true endend
Highest Scoring Talks99% reduction in runtime
577msFeedback.filled_out \ .select('talk_id, avg(score) as score') \ .group('talk_id').order('score desc').limit(10)
1msTalkReport.order(overall_score: :desc).limit(10)
Highest Scoring Talks in PA by Authors named “Parker”
400msFeedback.filled_out.joins(talk: [:author, { club: :city }] ) \ .select('feedbacks.talk_id, avg(feedbacks.score) as overall_score') \ .where("cities.state_abbr = ?", 'PA') \ .where("authors.name LIKE '%?%'", 'Parker') \ .group('feedbacks.talk_id') \ .order('overall_score desc') \ .limit(10)
19msTalkReport.where(state_abbr: 'PA') \ .where("author_name LIKE '%?%'", 'Parker') \ .order(overall_score: :desc).limit(10)
95% reduction in runtime
Why Use Materialized Views in your Rails application?
• ActiveRecord models allow for easy representation in Rails
• Capture commonly used joins / filters
• Allows for fast, live filtering and sorting of complex associations or calculated fields
• Push data intensive processing out of Ruby to Database
• Make use of advanced Database functions
• Can Optimize Indexes for reporting only
• When Performance is more important than Storage
Downsides
• Requires PostgreSQL 9.3
• Entire Materialized View must be refreshed to update
• Bad when Live Data is required
• For this use case, roll your own Materialized View using standard tables
Downsides• Migrations are painful!
• Recommend writing in SQL, so no using scopes
• Entire Materialized View must be dropped and redefined for any changes to the View or referring tables
• Hard to read and track what changed
Resources
• Source Code used in talk
• https://github.com/droberts84/materialized-view-demo
• PostgreSQL Documentation
• https://wiki.postgresql.org/wiki/Materialized_Views
Top Related