Epic South Disasters

Posted on 23-Jan-2015


Slides from my talk at DjangoCon 2013.

Transcript of Epic South Disasters

Epic South Disasters
Or: Why You Need to Actually Pay Attention to Your DBMS

http://todaysmeet.com/epic-south-disasters

Christopher Adams
Engineering Lead, Scrollmotion

@adamsc64

https://github.com/adamsc64/epic-south-disasters

What is South?
South fills a need in the Django ecosystem.

The Django ORM

A tool to write SQL for you

“You can still write SQL if needed.”
You don’t have to write SQL any more?

In other words, it's your responsibility.

Problem: Django does not change a table once it’s created with syncdb.

So "if needed" is really "when needed."
Most web apps that change often are going to need schema migrations.

Django 1.5 documentation

The recommended way to migrate schemas:

“If you have made changes to a model and wish to alter the database tables to match, use the sql command to display the new SQL structure and compare that to your existing table schema to work out the changes.”

https://docs.djangoproject.com/en/1.5/ref/django-admin/#syncdb

In practice...

BEGIN;
CREATE TABLE "foo_foo_bars" (
    "id" serial NOT NULL PRIMARY KEY,
-    "foo_id" integer NOT NULL,
    "bar_id" integer NOT NULL REFERENCES "baz_bar" ("id") DEFERRABLE INITIALLY DEFERRED,
+    "foo_id" integer NOT NULL,
    UNIQUE ("foo_id", "bar_id")
);
CREATE TABLE "foo_foo_zazzs" (
    "id" serial NOT NULL PRIMARY KEY,
    "foo_id" integer NOT NULL,
+    "user_id" integer REFERENCES "auth_user" ("id") DEFERRABLE INITIALLY DEFERRED,
+)
-;
-    "bang_id" integer NOT NULL REFERENCES "baz_bang" ("id") DEFERRABLE INITIALLY DEFERRED,
-    UNIQUE ("foo_id", "bang_id")
-)
-;
-CREATE TABLE "foo_foo" (
-    "user_id" integer REFERENCES "auth_user" ("id") DEFERRABLE INITIALLY DEFERRED,
+    "properties" text NOT NULL,
-    "id" serial NOT NULL PRIMARY KEY,
    "name" varchar(100) NOT NULL,
    "description" text NOT NULL,
+    "user_id" integer REFERENCES "auth_user" ("id") DEFERRABLE INITIALLY DEFERRED,
    "created_by_id" integer REFERENCES "auth_user" ("id") DEFERRABLE INITIALLY DEFERRED,
    "updated_at" timestamp with time zone NOT NULL,
    "bing" text NOT NULL,
+    "properties" text NOT NULL,
-    "lft" integer CHECK ("lft" >= 0) NOT NULL,
-    "rght" integer CHECK ("rght" >= 0) NOT NULL,
-    "tree_id" integer CHECK ("tree_id" >= 0) NOT NULL,
-    "level" integer CHECK ("level" >= 0) NOT NULL
);
ALTER TABLE "foo_foo_bars" ADD CONSTRAINT "foo_id_refs_id_1401a163" FOREIGN KEY ("foo_id") REFERENCES "foo_foo" ("id") DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE "foo_foo_zazzs" ADD CONSTRAINT "foo_id_refs_id_ca2b0e5" FOREIGN KEY ("foo_id") REFERENCES "foo_foo" ("id") DEFERRABLE INITIALLY DEFERRED;
+ALTER TABLE "foo_foo" ADD CONSTRAINT "parent_id_refs_id_1eb29019" FOREIGN KEY ("parent_id") REFERENCES "foo_foo" ("id") DEFERRABLE INITIALLY DEFERRED;
CREATE TABLE "foo_zilch" (
    "foo_ptr_id" integer NOT NULL PRIMARY KEY REFERENCES "foo_foo" ("id") DEFERRABLE INITIALLY DEFERRED,

SQL diffs are no fun

IT’S EASY, OH SO EASY

South solves some problems for us.

Problems South solves:

1. Automates writing schema migrations for us.
2. In Python (no SQL).
3. Broadly “Don’t Repeat Yourself” (DRY).
4. Migrations are applied in order.
5. Migration code lives in version control.
6. The same migrations are shared between development and production.
7. Fast iteration.

Levels of abstraction

● great because they take us away from the messy details

● risky because they take us away from the messy details

● can obscure what’s going on

How to Use South (very quick 101)

Our Model

+++ b/minifier/models.py

from django.db import models


class MinifiedURL(models.Model):
    url = models.CharField(
        max_length=100)
    datetime = models.DateTimeField(
        auto_now_add=True)

Initial Migration

$ ./manage.py schemamigration minifier --initial
Creating migrations directory at '/.../minifier/migrations'...
Creating __init__.py in '/.../minifier/migrations'...
 + Added model minifier.MinifiedURL
Created 0001_initial.py. You can now apply this migration with: ./manage.py migrate minifier

Initial Migration

$ ls -l minifier/migrations/
total 8
-rw-r--r--  1 chris  staff  1188 Aug 30 11:40 0001_initial.py
-rw-r--r--  1 chris  staff     0 Aug 30 11:40 __init__.py

Initial Migration

$ ./manage.py syncdb
Syncing...
Creating tables ...
Creating table south_migrationhistory

Synced:
 > django.contrib.auth
 > south

Not synced (use migrations):
 - minifier
(use ./manage.py migrate to migrate these)

Initial Migration

$ ./manage.py migrate
Running migrations for minifier:
 - Migrating forwards to 0001_initial.
 > minifier:0001_initial
 - Loading initial data for minifier.
Installed 0 object(s) from 0 fixture(s)

Ok, let’s add a field.

Adding a Field

 class MinifiedURL(models.Model):
+    submitter = models.ForeignKey(
+        'auth.user', null=True)
     url = models.CharField(
         max_length=100)
     datetime = models.DateTimeField(
         auto_now_add=True)

$ ./manage.py schemamigration minifier --auto
 + Added field submitter on minifier.MinifiedURL
Created 0002_auto__add_field_minifiedurl_submitter.py. You can now apply this migration with: ./manage.py migrate minifier

Adding a Field

class Migration(SchemaMigration):

    def forwards(self, orm):
        # Adding field 'MinifiedURL.submitter'
        db.add_column(u'minifier_minifiedurl', 'submitter',
                      ...
        )

    def backwards(self, orm):
        # Deleting field 'MinifiedURL.submitter'
        db.delete_column(u'minifier_minifiedurl', 'submitter_id')

Adding a Field

$ ./manage.py migrate minifier 0002
 - Soft matched migration 0002 to 0002_auto__add_field_minifiedurl_submitter.
Running migrations for minifier:
 - Migrating forwards to 0002_auto__add_field_minifiedurl_submitter.
 > minifier:0002_auto__add_field_minifiedurl_submitter
 - Loading initial data for minifier.
Installed 0 object(s) from 0 fixture(s)

$ ./manage.py dbshell
psql-# \d+ minifier_minifiedurl;
   Table "public.minifier_minifiedurl"
    Column    |           Type
--------------+--------------------------
 id           | integer
 url          | character varying(100)
 submitter_id | integer
 datetime     | timestamp with time zone

It worked!

More details: follow the South Tutorial
http://south.readthedocs.org/en/latest/tutorial/

● Many people approach a new tool with a broad set of expectations about what it will do for them.

● Those expectations may have little to do with what the project has actually implemented.

Expectations

Disaster situations
Don't panic.

Our Model

class MinifiedURL(models.Model):
    submitter = models.ForeignKey(
        'auth.user', null=True)
    url = models.CharField(
        max_length=100)
    datetime = models.DateTimeField(
        auto_now_add=True)

Our Model

 class MinifiedURL(models.Model):
     submitter = models.ForeignKey(
         'auth.user', null=True)
     url = models.CharField(
         max_length=100)
-    datetime = models.DateTimeField(
-        auto_now_add=True)
+    created = models.DateTimeField(
+        auto_now_add=True)

$ vim minifier/models.py

$ ./manage.py schemamigration minifier --auto

$ git commit -am "Rename field."

$ git push

$ ./deploy-to-production.sh
Done!

Fast iteration!

+++ b/minifier/migrations/0003_auto_del_field.py

# Deleting field 'MinifiedURL.datetime'
db.delete_column(u'minifier_minifiedurl', 'datetime')

# Adding field 'MinifiedURL.created'
db.add_column(u'minifier_minifiedurl', 'created',
              ...
)

Lesson #1
Always read migrations that are generated with --auto.
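One way to make Lesson #1 harder to forget is a small pre-commit check. The sketch below is hypothetical and not part of South: `review_migration` scans the text of a generated migration for South's `db.delete_column`, `db.add_column` and `db.delete_table` calls, and flags any table where one column is deleted and another added in the same migration, which is exactly what an `--auto` "rename" looks like. It only pattern-matches the source, so treat it as a reminder to read the file, not as a parser.

```python
import re

# Matches calls such as: db.delete_column(u'minifier_minifiedurl', 'datetime')
# The optional "u" prefix covers the unicode literals South generates on Python 2.
CALL_RE = re.compile(
    r"db\.(delete_column|add_column|delete_table)\(\s*u?['\"](\w+)['\"]"
    r"(?:\s*,\s*u?['\"](\w+)['\"])?"
)


def review_migration(source):
    """Return warnings about destructive calls found in migration source text.

    Hypothetical helper: a regex scan, not a real parse of the migration.
    """
    warnings = []
    deleted = {}  # table -> columns deleted in this migration
    added = {}    # table -> columns added in this migration
    for op, table, column in CALL_RE.findall(source):
        if op == "delete_table":
            warnings.append("drops table %s" % table)
        elif op == "delete_column":
            deleted.setdefault(table, []).append(column)
            warnings.append("drops column %s.%s" % (table, column))
        else:
            added.setdefault(table, []).append(column)
    # A delete plus an add on the same table is how a rename shows up.
    for table, columns in deleted.items():
        if table in added:
            warnings.append(
                "%s: deletes %s and adds %s (possible rename)"
                % (table, ", ".join(columns), ", ".join(added[table]))
            )
    return warnings


sample = """
db.delete_column(u'minifier_minifiedurl', 'datetime')
db.add_column(u'minifier_minifiedurl', 'created', ...)
"""
for warning in review_migration(sample):
    print("WARNING:", warning)
```

Run against the 0003 migration above, it flags both the dropped `datetime` column and the suspicious delete/add pair.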

So how do we do this?

 class MinifiedURL(models.Model):
     submitter = models.ForeignKey(
         'auth.user', null=True)
     url = models.CharField(
         max_length=100)
-    datetime = models.DateTimeField(
-        auto_now_add=True)
+    created = models.DateTimeField(
+        auto_now_add=True)

Data migration - basic example

1. schemamigration - Create the new field.

2. datamigration - Copy the data to the new field from the old field.

3. schemamigration - Delete the old field.

Data migration - basic example

 class MinifiedURL(models.Model):
     submitter = models.ForeignKey(
         'auth.user', null=True)
     url = models.CharField(
         max_length=100)
     datetime = models.DateTimeField(
         auto_now_add=True)
+    created = models.DateTimeField(
+        auto_now_add=True)

Data migration - basic example

$ ./manage.py schemamigration minifier --auto
Created 0003_auto__add_field_minifiedurl_created.py.

$ ./manage.py datamigration minifier \
    datetime_to_created
Created 0004_datetime_to_created.py.

$ vim minifier/migrations/0004_datetime_to_created.py

# Note: Don't use "from appname.models import ModelName".
# Use orm.ModelName to refer to models in this application...

Data migration - basic example

 class Migration(DataMigration):

     def forwards(self, orm):
+        for minified_url in orm.MinifiedURL.objects.all():
+            minified_url.created = minified_url.datetime
+            minified_url.save()

     def backwards(self, orm):
+        for minified_url in orm.MinifiedURL.objects.all():
+            minified_url.datetime = minified_url.created
+            minified_url.save()
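The loop above loads and re-saves every row through the model, so every `save()` side effect runs once per row. At the SQL level the copy is a single set-based statement. The sketch below uses Python's built-in `sqlite3` module and an in-memory table shaped like `minifier_minifiedurl` purely for illustration (the URLs are made up). Inside a real South data migration, `orm.MinifiedURL.objects.update(created=F('datetime'))` should achieve the same through the ORM; Django documents that `QuerySet.update()` does not call `save()` or send `pre_save`/`post_save`, and `auto_now` is not applied.

```python
import sqlite3

# Illustrative stand-in for minifier_minifiedurl after migration 0003:
# the old "datetime" column and the new, still-empty "created" column both exist.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE minifier_minifiedurl ("
    " id INTEGER PRIMARY KEY,"
    " url VARCHAR(100) NOT NULL,"
    " datetime TEXT NOT NULL,"
    " created TEXT)"
)
conn.executemany(
    "INSERT INTO minifier_minifiedurl (id, url, datetime) VALUES (?, ?, ?)",
    [
        (566, "http://cnn.com/a", "2013-03-01 09:01"),
        (567, "http://example.com/b", "2012-01-22 17:34"),
    ],
)

# forwards(): copy old -> new in one statement. No Model.save() is involved,
# so no auto_now/auto_now_add logic and no post_save handlers run.
with conn:
    conn.execute("UPDATE minifier_minifiedurl SET created = datetime")

# backwards() would be the mirror image:
#   UPDATE minifier_minifiedurl SET datetime = created

rows = conn.execute(
    "SELECT id, datetime, created FROM minifier_minifiedurl ORDER BY id"
).fetchall()
print(rows)
```

A single statement is also much faster than one `save()` per row on a large table.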

Data migration - basic example

 class MinifiedURL(models.Model):
     submitter = models.ForeignKey(
         'auth.user', null=True)
     url = models.CharField(
         max_length=100)
-    datetime = models.DateTimeField(
-        auto_now_add=True)
     created = models.DateTimeField(
         auto_now_add=True)

$ ./manage.py migrate --list

 minifier
  (*) 0001_initial
  (*) 0002_auto__add_field_minifiedurl_submitt
  ( ) 0003_auto__add_field_minifiedurl_created
  ( ) 0004_datetime_to_created
  ( ) 0005_auto__del_field_minifiedurl_datetim
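`migrate --list` marks applied migrations with `(*)`. Roughly, South records which migrations have run (in its `south_migrationhistory` table) and applies the rest of an app's migrations in filename order, which is why the numeric prefixes are zero-padded. The sketch below reproduces that bookkeeping in plain Python. The `applied` set stands in for what South reads from the database, and the sketch ignores cross-app dependencies, which South also handles.

```python
# Migration names as they appear elsewhere in this walkthrough, deliberately unsorted.
MIGRATIONS = [
    "0004_datetime_to_created",
    "0001_initial",
    "0005_auto__del_field_minifiedurl_datetime",
    "0002_auto__add_field_minifiedurl_submitter",
    "0003_auto__add_field_minifiedurl_created",
]


def plan(all_migrations, applied):
    """Return (listing, pending) in the order the migrations would run.

    Simplified sketch: order by name (the zero-padded numeric prefix),
    with no cross-app dependency handling.
    """
    ordered = sorted(all_migrations)
    listing = ["(%s) %s" % ("*" if name in applied else " ", name) for name in ordered]
    pending = [name for name in ordered if name not in applied]
    return listing, pending


listing, pending = plan(
    MIGRATIONS,
    applied={"0001_initial", "0002_auto__add_field_minifiedurl_submitter"},
)
for line in listing:
    print(line)
```

The pending list is what a plain `./manage.py migrate` would then run: 0003, 0004 and 0005, in that order.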

Data migration - basic example

$ ./manage.py migrate

 - Migrating forwards to 0005_auto__del_field_minifiedurl_datetime.
 > minifier:0003_auto__add_field_minifiedurl_created
 > minifier:0004_datetime_to_created
 > minifier:0005_auto__del_field_minifiedurl_datetime

Data migration - basic example

South’s frozen ORM is pretty nifty.
It exposes the model as it was at a historical point in time.

Danger.
Many parts of the Django ORM still function.

Our Model

class MinifiedURL(models.Model):
    submitter = models.ForeignKey(
        'auth.user')
    created = models.DateTimeField(
        auto_now_add=True)
    updated = models.DateTimeField(
        auto_now=True)
    url = models.CharField(
        max_length=100)

Our Model

 class MinifiedURL(models.Model):
     submitter = models.ForeignKey(
         'auth.user')
     created = models.DateTimeField(
         auto_now_add=True)
     updated = models.DateTimeField(
         auto_now=True)
     url = models.CharField(
         max_length=100)
+    domain = models.CharField(
+        max_length=30)

 class Migration(DataMigration):

     def forwards(self, orm):
+        model = orm.MinifiedURL
+
+        for minified_url in model.objects.all():
+            minified_url.domain = (
+                minified_url.url.split('/')[2]
+            )
+            minified_url.save()
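`minified_url.url.split('/')[2]` assumes every stored URL starts with a scheme such as `http://`; a value like `cnn.com/foo` raises `IndexError` partway through the migration. A slightly more defensive sketch uses `urlsplit` from Python 3's `urllib.parse` (the `urlparse` module on Python 2). `extract_domain` is a hypothetical helper, and returning `None` for unparseable or over-long hosts (the new `domain` field has `max_length=30`) is a design choice; since `domain` is not nullable as declared, the migration still has to decide what to store in those cases.

```python
from urllib.parse import urlsplit


def extract_domain(url, max_length=30):
    """Return the host part of ``url``, or None if it cannot be stored.

    Hypothetical helper for the "domain" data migration. Unlike
    url.split('/')[2] it does not raise on scheme-less values, and
    .hostname drops any port or user info and lowercases the host.
    """
    host = urlsplit(url).hostname
    if not host or len(host) > max_length:
        return None
    return host


for url in ["http://cnn.com/some/path", "http://CNN.com:8080/x", "cnn.com/foo"]:
    print(url, "->", extract_domain(url))
```

Inside `forwards()` the loop body would call `extract_domain(minified_url.url)` and log or skip rows where it returns `None`, instead of aborting halfway.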

Data migration

$ git commit -am "Migrate 'domain' from 'url' field."

$ git push

$ ./deploy-to-production.sh
Done!

Fast iteration!

Before and After

pk  | updated (before)
----|-----------------
566 | 2013-03-01 09:01
567 | 2012-01-22 17:34
568 | 2012-12-31 19:11
569 | 2013-04-10 10:02
...

Before and After

pk  | updated (before) | updated (after)
----|------------------|-----------------
566 | 2013-03-01 09:01 | 2013-09-04 14:01
567 | 2012-01-22 17:34 | 2013-09-04 14:01
568 | 2012-12-31 19:11 | 2013-09-04 14:01
569 | 2013-04-10 10:02 | 2013-09-04 14:01
... | ...              | ...

Oh no! Why did we lose datetime information?

Our Model

class MinifiedURL(models.Model):
    submitter = models.ForeignKey(
        'auth.user')
    created = models.DateTimeField(
        auto_now_add=True)
    updated = models.DateTimeField(
        auto_now=True)
    url = models.CharField(
        max_length=100)

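The South ORM wraps over the Django ORM, which applies rules such as auto_now=True and auto_now_add=True. A field with auto_now=True is set to the current time on every save(), so re-saving each row in the data migration overwrote updated.

Especially nasty because no exception is raised and no warning is given, even with this kind of data loss.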

Lesson #2
Always confirm your migrations do what you expect -- and nothing more.
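One concrete way to check "and nothing more" is to snapshot the table before the migration and compare afterwards, flagging every column that changed without being meant to. The sketch below does this with the standard `sqlite3` module on a toy table. `snapshot`, `run_and_diff` and the simulated `buggy_migration` (which, like the `auto_now=True` field, also rewrites `updated`) are hypothetical; on a real project the same idea would be pointed at a copy of production data before deploying.

```python
import sqlite3


def snapshot(conn, table):
    """Map primary key -> row dict (assumes the first column is the pk)."""
    cur = conn.execute("SELECT * FROM %s" % table)  # table name is trusted here
    cols = [d[0] for d in cur.description]
    return {row[0]: dict(zip(cols, row)) for row in cur.fetchall()}


def run_and_diff(conn, table, migrate, expected_columns):
    """Run ``migrate(conn)`` and list (pk, column) changes outside ``expected_columns``."""
    before = snapshot(conn, table)
    migrate(conn)
    after = snapshot(conn, table)
    surprises = []
    for pk, old in before.items():
        new = after.get(pk)
        if new is None:
            surprises.append((pk, "<row deleted>"))
            continue
        for col, old_value in old.items():
            if col not in expected_columns and new.get(col) != old_value:
                surprises.append((pk, col))
    return surprises


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE minifiedurl (id INTEGER PRIMARY KEY, url TEXT, domain TEXT, updated TEXT)"
)
conn.execute(
    "INSERT INTO minifiedurl VALUES (566, 'http://cnn.com/a', NULL, '2013-03-01 09:01')"
)


def buggy_migration(conn):
    # Simulates the domain migration *plus* the auto_now side effect.
    conn.execute(
        "UPDATE minifiedurl SET domain = 'cnn.com', updated = '2013-09-04 14:01'"
    )


surprises = run_and_diff(conn, "minifiedurl", buggy_migration, {"domain"})
print(surprises)  # [(566, 'updated')]
```

Only `domain` was supposed to change, so the rewritten `updated` column is reported.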

Workaround

+        opts = model._meta
+        field = opts.get_field_by_name('updated')[0]
+        old_auto_now = field.auto_now
+        field.auto_now = False
         for minified_url in model.objects.all():
             minified_url.domain = (
                 minified_url.url.split('/')[2]
             )
             minified_url.save()
+        field.auto_now = old_auto_now

Peter Bengtsson, http://www.peterbe.com/plog/migration-south-auto_now_add

Before and After

pk  | updated (before) | updated (after)
----|------------------|-----------------
566 | 2013-03-01 09:01 | 2013-03-01 09:01
567 | 2012-01-22 17:34 | 2012-01-22 17:34
568 | 2012-12-31 19:11 | 2012-12-31 19:11
569 | 2013-04-10 10:02 | 2013-04-10 10:02
... | ...              | ...

Oh no! Why did all our users suddenly get emailed?

@receiver(post_save)
def email_user_on_save(sender, **kwargs):
    """
    Not sure why I'm doing this here,
    but it seems like a good place!
    REFACTOR LATER TBD FYI!!
    """
    if sender.__name__ == "MinifiedURL":
        email(kwargs['instance'].submitter,
              "Congratulations on changing "
              "your url!",
        )

Whoops, forgot about this.

The South ORM wraps over the Django ORM, so it sends post_save signals.

However, the metaclass magic usually takes care of avoiding problems.

So this...

 @receiver(post_save)
 def email_user_on_save(sender, **kwargs):
     """
     Not sure why I'm doing this here,
     but it seems like a good place!
     REFACTOR LATER TBD FYI!!
     """
-    if sender.__name__ == "MinifiedURL":
+    if sender == MinifiedURL:
         email(kwargs['instance'].submitter,
               "Congratulations on changing "
               "your url!",
         )

...should be this.

ipdb> print repr(sender)
<class 'minifier.models.MinifiedURL'>
ipdb> print repr(MinifiedURL)
<class 'minifier.models.MinifiedURL'>
ipdb> print MinifiedURL == sender
False
ipdb> print id(MinifiedURL)
140455961329984
ipdb> print id(sender)
140455961864000

Metaclass magic
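The ipdb session can be reproduced without Django: two classes built separately with the same name and `__module__` have identical reprs but are different objects. Here `type()` stands in for the model metaclass and for South building its frozen model class at migration time (a simplification made to keep the sketch self-contained). It shows why the name check matches both the real model and the frozen one, while the equality check matches only the real model. In Django itself, registering with `@receiver(post_save, sender=MinifiedURL)` also filters on the sender object rather than its name.

```python
def build_model_class(name, module):
    """Create a bare class; a stand-in for a metaclass-built model class."""
    return type(name, (object,), {"__module__": module})


# The "real" model imported by signal handlers, and the frozen copy a migration
# builds at run time: same name, same module, different class objects.
MinifiedURL = build_model_class("MinifiedURL", "minifier.models")
FrozenMinifiedURL = build_model_class("MinifiedURL", "minifier.models")


def matches_by_name(sender):
    return sender.__name__ == "MinifiedURL"  # the original check


def matches_by_identity(sender):
    # For classes without a custom __eq__, == falls back to identity.
    return sender == MinifiedURL  # the suggested fix


print(repr(MinifiedURL))                 # <class 'minifier.models.MinifiedURL'>
print(repr(FrozenMinifiedURL))           # <class 'minifier.models.MinifiedURL'>
print(MinifiedURL == FrozenMinifiedURL)  # False
```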

Lesson #3
Always check data migrations for unintended consequences.

class MinifiedURL(models.Model):
    created = models.DateTimeField(
        auto_now_add=True)
    updated = models.DateTimeField(
        auto_now=True)
    url = models.CharField(
        max_length=100, db_index=True)

Our Model

 class MinifiedURL(models.Model):
     created = models.DateTimeField(
         auto_now_add=True)
     updated = models.DateTimeField(
         auto_now=True)
-    url = models.CharField(
-        max_length=100, db_index=True)
+    url = models.CharField(
+        max_length=1000, db_index=True)

Our Model

$ ./manage.py schemamigration minifier \
    --auto
 ~ Changed field url on minifier.MinifiedURL
Created 0010_auto__chg_field_minifiedurl_url.py. You can now apply this migration with: ./manage.py migrate minifier

Create the schema migration.

Seems fine...

$ ./manage.py migrate
Running migrations for minifier:
 - Migrating forwards to 0010_auto__chg_field_minifiedurl_url.
 > minifier:0010_auto__chg_field_minifiedurl_url
 - Loading initial data for minifier.
Installed 0 object(s) from 0 fixture(s)

“Works fine on development?”
“Ship it!”

Production vs. Development
Beware of differences in configuration.

From a Django blog

7. Local vs. Production Environments

Django comes with sqlite, a simple flat-file database that doesn't need any configuration. This makes prototyping fast and easy right out of the box.

However, once you've moved your project into a production environment, odds are you'll have to use a more robust database like Postgresql or MySQL. This means that you're going to have two separate environments: production and development.

http://net.tutsplus.com/tutorials/other/10-django-troublespots-for-beginners/

$ git commit -am "Add some breathing space to url fields."

$ git push

$ ./deploy-to-production.sh
Done!

Fast iteration!

Migration Failed

Running migrations for minifier:
 - Migrating forwards to 0010_auto__chg_field_minifiedurl.
 > minifier:0010_auto__chg_field_minifiedurl
 ! Error found during real run of migration! Aborting.
Error in migration: minifier:0010_auto__chg_field_minifiedurl
Warning: Specified key was too long; max key length is 250 bytes
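Why does a change that works in development fail here? MySQL caps the size of an index key in bytes, and it sizes a VARCHAR key by its worst case: `max_length` times the maximum bytes per character of the column's character set (3 for MySQL's `utf8`, 4 for `utf8mb4`). Raising `max_length` from 100 to 1000 on a `db_index=True` field therefore multiplies the key size by ten. The sketch below is a hypothetical pre-deploy check. The limit is a parameter because it depends on the storage engine, row format and server version (the error message reports the limit that applies); 767 bytes is used only as an illustrative default, the value commonly cited for InnoDB's older row formats.

```python
# Maximum bytes per character for a few MySQL character sets.
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def index_key_bytes(max_length, charset="utf8"):
    """Worst-case size in bytes of an index on a VARCHAR(max_length) column."""
    return max_length * BYTES_PER_CHAR[charset]


def check_indexed_charfield(max_length, charset="utf8", limit=767):
    """Return an error string if the index would exceed ``limit`` bytes, else None.

    ``limit`` is an assumption: use the limit your MySQL engine/version reports.
    """
    size = index_key_bytes(max_length, charset)
    if size > limit:
        return "index on VARCHAR(%d) %s needs %d bytes > %d" % (
            max_length, charset, size, limit)
    return None


print(check_indexed_charfield(100))   # None (300 bytes)
print(check_indexed_charfield(1000))  # index on VARCHAR(1000) utf8 needs 3000 bytes > 767
```

Running a check like this against every `db_index=True` CharField in CI catches the problem on the developer's SQLite or Postgres machine, before production MySQL does.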

class MinifiedURL(models.Model):
    created = models.DateTimeField(
        auto_now_add=True)
    updated = models.DateTimeField(
        auto_now=True)
    url = models.CharField(
        max_length=1000, db_index=True)

Our Model

Lesson #4
Always pay attention to the limitations of your DBMS.

Schema-altering commands (DDL commands) cause a phantom auto-commit.

Major limitation of MySQL

With InnoDB, when a client executes a DDL change, the server executes an implicit commit even if the normal auto-commit behavior is turned off.

DDL transaction on Postgres

psql=# DROP TABLE IF EXISTS foo;
NOTICE: table "foo" does not exist
psql=# BEGIN;
psql=# CREATE TABLE foo (bar int);
psql=# INSERT INTO foo VALUES (1);
psql=# ROLLBACK; -- rolls back both commands
psql=# SELECT * FROM foo;
ERROR: relation "foo" does not exist

No DDL transaction on MySQL

mysql> drop table if exists foo;
mysql> begin;
mysql> create table foo (bar int) engine=InnoDB;
mysql> insert into foo values (1);
mysql> rollback; # Table 'foo' exists!
mysql> select * from foo;
0 rows in set (0.00 sec)

South uses DDL transactions if they are available.
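To see what a DDL transaction buys you without a Postgres server at hand: SQLite also supports transactional DDL, so the Postgres session above can be replayed with Python's standard `sqlite3` module. `isolation_level=None` puts the module in autocommit mode, so the explicit `BEGIN`/`ROLLBACK` are the only transaction control in play. This demonstrates SQLite's behavior only; on MySQL the `CREATE TABLE` would have committed implicitly and survived the rollback.

```python
import sqlite3

# Autocommit mode: the module issues no implicit BEGIN/COMMIT of its own.
conn = sqlite3.connect(":memory:", isolation_level=None)

conn.execute("DROP TABLE IF EXISTS foo")
conn.execute("BEGIN")
conn.execute("CREATE TABLE foo (bar INT)")
conn.execute("INSERT INTO foo VALUES (1)")
conn.execute("ROLLBACK")  # undoes both the CREATE TABLE and the INSERT


def table_exists(conn, name):
    """True if a table called ``name`` exists in the main schema."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


print(table_exists(conn, "foo"))  # False
```

This all-or-nothing behavior is what lets South abort a failed migration cleanly on databases that support it.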

Pay attention to your DBMS

FATAL ERROR - The following SQL query failed: ALTER TABLE `minifier_minifiedurl` ADD CONSTRAINT `minifier_minifiedurl_url_263b28b6c6b349a8_uniq` UNIQUE (`name`)
The error was: (1062, "Duplicate entry 'http://cnn.com' for key 'minifier_minifiedurl_url_263b28b6c6b349a8_uniq'")
 ! Error found during real run of migration! Aborting.

 ! Since you have a database that does not support running
 ! schema-altering statements in transactions, we have had
 ! to leave it in an interim state between migrations.

 ! You *might* be able to recover with:
   = ALTER TABLE `minifier_minifiedurl` DROP COLUMN `url` CASCADE; []
   - no dry run output for alter_column() due to dynamic DDL, sorry

 ! The South developers regret this has happened, and would
 ! like to gently persuade you to consider a slightly
 ! easier-to-deal-with DBMS (one that supports DDL transactions)
 ! NOTE: The error which caused the migration to fail is further up.

1. Always read migrations that are generated with --auto.

2. Always confirm your migrations do what you expect.

3. Always check data migrations for unintended consequences.

4. Always pay attention to the limitations of your DBMS.

Lessons Recap

Encouragement

● Tools are not the problem. Tools are why we are in this business.

● Knowledge is power. Know what South is doing.

● Know what Django is doing for that matter.

● David Cho
● Hadi Arbabi
● Mike Harris

Special Thanks

http://www.scrollmotion.com/careers