Google Devfest 2009 Argentina - Intro to Appengine


From Spark Plug to Drive Train: Life of an App Engine Request

Patrick Chanezon @chanezon
Ignacio Blanco @blanconet
11/16/09

Post your questions for this talk on Twitter #devfest09 #appengine


Designing for Scale and Reliability

Agenda

Designing for Scale and Reliability
App Engine: Design Motivations
Life of a Request
  Request for static content
  Request for dynamic content
  Requests that use APIs
App Engine: Design Motivations, Recap
App Engine: The Numbers


Google App Engine

LiveJournal circa 2007

From Brad Fitzpatrick's USENIX '07 talk: "LiveJournal: Behind the Scenes"

[Architecture diagram. Labeled tiers: Frontends, Application Servers, Storage, Memcache, Static File Servers]

Basic LAMP

Linux, Apache, MySQL, Programming Language

Scalable?
  Shared machine for database and webserver
Reliable?
  Single point of failure (SPOF)

Dedicated Database

Database running on a separate server

Requirements
  Another machine plus additional management
Scalable?
  Up to one web server
Reliable?
  Two single points of failure

Multiple Web Servers

Benefits:
  Grow traffic beyond the capacity of one webserver
Requirements:
  More machines
  Set up load balancing

Multiple Web Servers
Load Balancing: DNS Round Robin

Register list of IPs with DNS
Statistical load balancing
DNS record is cached with Time To Live (TTL)
  TTL may not be respected

Now wait for DNS changes to propagate :-(
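A minimal sketch of why the slide calls this "statistical" load balancing. The IP addresses and the `make_resolver` helper are hypothetical, and the sketch does not model client-side caching of the DNS record for its TTL.

```python
from collections import Counter
from itertools import cycle

# Hypothetical web server addresses registered under one DNS name.
WEB_SERVER_IPS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def make_resolver(ips):
    """Return a function that hands out the registered IPs in rotation."""
    rotation = cycle(ips)

    def resolve():
        return next(rotation)

    return resolve


def simulate(lookups):
    """Count how many lookups land on each server."""
    resolve = make_resolver(WEB_SERVER_IPS)
    return Counter(resolve() for _ in range(lookups))


print(simulate(300))
```

The spread is even only across fresh lookups. A client that has cached one answer keeps hitting the same server until its record expires, which is why traffic cannot be redirected quickly.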

Multiple Web Servers
Load Balancing: DNS Round Robin

Scalable?
  Add more webservers as necessary
  Still I/O bound on one database
Reliable?
  Cannot redirect traffic quickly
  Database still SPOF

Reverse Proxy

Benefits:
  Custom Routing
    Specialization
    Application-level load balancing
Requirements:
  More machines
  Configuration and code for reverse proxies

Reverse Proxy

Scalable?
  Add more web servers
  Specialization
  Bound by
    Routing capacity of reverse proxy
    One database server
Reliable?
  Agile application-level routing
  Specialized components are more robust
  Multiple reverse proxies require network-level routing
    DNS Round Robin (again)
    Fancy network routing hardware
  Database is still SPOF

Master-Slave Database

Benefits:
  Better read throughput
  Invisible to application
Requirements:
  Even more machines
  Changes to MySQL
Scalable?
  Scales read rate with # of servers
    But not writes
  What happens eventually?
Reliable?
  Master is SPOF for writes
  Master may die before replication
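The read/write split above can be sketched with a toy model. This assumes in-memory dictionaries in place of MySQL servers, and the `ReplicatedDatabase` class is hypothetical, not a real replication API.

```python
import itertools


class ReplicatedDatabase:
    """Toy model: one master takes writes, replicas serve reads."""

    def __init__(self, replica_count):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # Every write goes to the single master.
        self.master[key] = value

    def replicate(self):
        # Copy the master's state to each replica (asynchronous in practice).
        for replica in self.replicas:
            replica.clear()
            replica.update(self.master)

    def read(self, key):
        # Reads rotate across replicas, so read capacity grows with replicas.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)


db = ReplicatedDatabase(replica_count=2)
db.write("user:1", "Ignacio")
stale = db.read("user:1")   # replication has not run yet
db.replicate()
fresh = db.read("user:1")
```

Here `stale` is `None`: the write exists only on the master until `replicate()` runs. If the master died in that window, the write would be lost, which is the reliability concern the slide raises.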

Partitioned Database

Benefits:
  Increase in both read and write throughput
Requirements:
  Even more machines
  Lots of management
  Re-architect data model
  Rewrite queries
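One common way to partition is to hash each record key to a shard. The sketch below assumes that scheme; the shard count, the `shard_for` helper, and the `PartitionedStore` class are all hypothetical.

```python
import zlib

SHARD_COUNT = 4  # hypothetical number of database partitions


def shard_for(key, shard_count=SHARD_COUNT):
    """Map a record key to a partition with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


class PartitionedStore:
    """Toy store: each shard is a dict standing in for one database server."""

    def __init__(self, shard_count=SHARD_COUNT):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = PartitionedStore()
for name in ["felipe", "carlos", "ignacio"]:
    store.put(name, name.title())
```

Reads and writes for one key touch one shard, so both scale with the number of shards. A query that is not keyed this way has to visit every shard, which is why the slide lists rewriting queries as a requirement.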


The App Engine Stack

App Engine: Design Motivations


Design Motivations

Build on Existing Google Technology

Provide an Integrated Environment

Encourage Small Per-Request Footprints

Encourage Fast Requests

Maintain Isolation Between Applications

Encourage Statelessness and Specialization

Require Partitioned Data Model

Life of a Request: Request for Static Content

Request for Static Content on Google Network

Routed to the nearest Google datacenter
Travels over Google's network
  Same infrastructure other Google products use
  Lots of advantages for free

Request for Static Content
Routing at the Front End

Google App Engine Front Ends
  Load balancing
  Routing
Frontends route static requests to specialized serving infrastructure

Request for Static Content
Static Content Servers

Google Static Content Serving
  Built on shared Google Infrastructure
Static files are physically separate from code files
How are static files defined?

Request for Static Content
What content is static?

Java Runtime: appengine-web.xml

    ...
    <static-files>
      <include path="/**.png" />
      <exclude path="/data/**.png" />
    </static-files>
    ...

Python Runtime: app.yaml

    ...
    - url: /images
      static_dir: static/images

    OR

    - url: /images/(.*)
      static_files: static/images/\1
      upload: static/images/(.*)
    ...
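To show how the second app.yaml form maps a request path to a file, here is a small sketch using Python's `re` module. It is not App Engine's implementation: `static_file_for` is a hypothetical helper, and matching the pattern against the whole path is an assumption.

```python
import re

# Values copied from the app.yaml fragment above.
URL_PATTERN = r"/images/(.*)"
STATIC_FILES = r"static/images/\1"


def static_file_for(request_path):
    """Return the file path a request maps to, or None if it does not match."""
    match = re.fullmatch(URL_PATTERN, request_path)
    if match is None:
        return None
    # expand() substitutes \1 with the first captured group.
    return match.expand(STATIC_FILES)


print(static_file_for("/images/logo.png"))
```

A path that does not match, such as `/about`, returns `None` here and would fall through to the next handler.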

Request for Static Content
Response to the user

Back to the Front End and out to the user
Specialized infrastructure
  App runtimes don't serve static content

Life of a Request: Request for Dynamic Content

Request for Dynamic Content: New Components
App Servers and App Master

App Servers
  Serve dynamic requests
  Where your code runs
App Master
  Schedules applications
  Informs Front Ends

Request for Dynamic Content: Appservers
What do they do?

Many applications
Many concurrent requests
  Smaller app footprint + fast requests = more apps
Enforce Isolation
  Keeps apps safe from each other
Enforce statelessness
  Allows for scheduling flexibility
Service API requests

Request for Dynamic Content
Routing at the Frontend

Front Ends route dynamic requests to App Servers

Request for Dynamic Content
App Server

1. Checks for cached runtime
     If it exists, no initialization
2. Execute request
3. Cache the runtime

System is designed to maximize caching
  Slow first request, faster subsequent requests
  Optimistically cache data in your runtime!

Life of a Request: Requests accessing APIs

API Requests
App Server

1. App issues API call
2. App Server accepts
3. App Server blocks runtime
4. App Server issues call
5. Returns the response

Use APIs to do things you don't want to do in your runtime, such as...


APIs

Memcacheg
A more persistent in-memory cache

Distributed in-memory cache
memcacheg
  Also written by Brad Fitzpatrick
  adds: set_multi, get_multi, add_multi
Optimistically cache for optimization
Very stable, robust and specialized
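A sketch of optimistic caching with the batched calls named above. `FakeMemcache` is an in-process stand-in written for this example; it borrows only the `get_multi` and `set_multi` names and is not the real memcache client. `load_from_datastore` is likewise hypothetical.

```python
class FakeMemcache:
    """In-process stand-in for a memcache client (assumption, not the real API)."""

    def __init__(self):
        self._data = {}

    def get_multi(self, keys):
        # Return only the keys that are present, like a batched lookup.
        return {k: self._data[k] for k in keys if k in self._data}

    def set_multi(self, mapping):
        self._data.update(mapping)


datastore_reads = []  # records which keys fell through to persistent storage


def load_from_datastore(key):
    """Hypothetical slow path."""
    datastore_reads.append(key)
    return key.upper()


def get_profiles(cache, keys):
    """Cache-aside read: batch lookup, then load and store the misses."""
    found = cache.get_multi(keys)
    missing = {k: load_from_datastore(k) for k in keys if k not in found}
    if missing:
        cache.set_multi(missing)
    found.update(missing)
    return found
```

Calling `get_profiles` twice with overlapping keys reaches the slow path only for keys not seen before. Because a cache entry can disappear at any time, the slow path must always remain available.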

The App Engine Datastore
Persistent storage

Bigtable in one slide
Scalable structured storage

Is not...
  database
  sharded database
  distributed hashtable
Is...
  sharded, sorted array

Datastore Under the Covers

Scalability does not come for free
Datastore built on top of Bigtable
  Queries
  Indexes

Bigtable
Sort

[Diagram: rows spread across Machine #1, Machine #2, and Machine #3]


Bigtable

Operations

Read Write Delete

Single row transaction (atomic)

Operate

Bigtable
Operations

Scan: prefix b
Scan: range b to d
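The two scans can be illustrated on a sorted list using binary search. This is a toy model of a "sorted array", not Bigtable's API; the sample row names are made up, and treating the range as half-open (start included, end excluded) is an assumption.

```python
from bisect import bisect_left

# A toy "table": row names kept in sorted order (assumed sample data).
ROWS = sorted(["apple", "banana", "blueberry", "cherry", "date", "elderberry"])


def scan_range(rows, start, end):
    """Return rows with start <= name < end using two binary searches."""
    return rows[bisect_left(rows, start):bisect_left(rows, end)]


def scan_prefix(rows, prefix):
    """Return rows whose name starts with prefix.

    A prefix scan is a range scan from the prefix up to the smallest
    string that sorts after every string with that prefix.
    """
    if not prefix:
        return list(rows)
    end = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return scan_range(rows, prefix, end)


print(scan_prefix(ROWS, "b"))
print(scan_range(ROWS, "b", "d"))
```

Both scans read one contiguous slice of the sorted rows, so the cost depends on the number of rows returned rather than on the size of the table.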

Entities table

Primary table
Every entity in every app
Generic, schemaless
Row name -> entity key
Column (only 1) serialized entity

Entity keys

Based on parent entities: root entity, then child, child, ...

    from google.appengine.ext import db

    class Grandparent(db.Model): pass
    class Parent(db.Model): pass
    class Child(db.Model): pass

    # key_name gives each entity the name shown in its key path
    Felipe = Grandparent(key_name='Felipe')
    Carlos = Parent(key_name='Carlos', parent=Felipe)
    Ignacio = Child(key_name='Ignacio', parent=Carlos)

my family

    /Grandparent:Felipe
    /Grandparent:Felipe/Parent:Carlos
    /Grandparent:Felipe/Parent:Carlos/Child:Ignacio
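The path notation above can be reproduced with a few lines of plain Python. The `key_path` helper is hypothetical and only mimics the slide's notation; it is not the datastore's real key encoding.

```python
def key_path(*pairs):
    """Build a path string from (kind, name) pairs, root entity first."""
    return "".join("/%s:%s" % (kind, name) for kind, name in pairs)


felipe = key_path(("Grandparent", "Felipe"))
carlos = key_path(("Grandparent", "Felipe"), ("Parent", "Carlos"))
ignacio = key_path(
    ("Grandparent", "Felipe"), ("Parent", "Carlos"), ("Child", "Ignacio")
)

# Keys that share a root sort next to each other.
family = sorted([ignacio, felipe, carlos])
```

Because every descendant's key begins with its root's key, sorting by key keeps a whole family in adjacent rows, and a single prefix scan can read it.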

Queries
GQL < SQL

Restrict by kind
Filter by property values
Sort on properties
Ancestor

    SELECT * FROM Person ...

    WHERE name = 'John';
    ORDER BY name DESC;
    WHERE city = 'Sonoma' AND state = 'CA' AND country = 'USA';
    WHERE ANCESTOR IS :Felipe ORDER BY name;

Indexes
Convert queries to scans

Dedicated Bigtable tables
Map property values to entities
Each row = index data + entity key
No filtering in memory
No sorting in memory

Query planning
Convert query to dense index scan

1. Pick index
2. Compute prefix or range
3. Scan!

Kind index

    SELECT * FROM Grandparent

1. Kind index
2. Prefix = Grandparent
3. Scan!

[Table figure: kind index rows, column header "Row name"]

Single-property index
Queries on single property (one ASC one DESC)

    WHERE name = 'John'
    ORDER BY name DESC
    WHERE name >= 'B' AND name < 'C' ORDER BY name

[Table figure: index rows, column headers "Row name" and "Key"]

Single-property index

    WHERE name = 'John'
1. Prefix = Parent name John (fully specified)
2. Scan!

    ORDER BY name DESC
1. Prefix = Parent name
2. Scan!

    WHERE name >= 'B' AND name < 'C' ORDER BY name
1. Range = [Parent name B, Parent name C)
2. Scan!
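The equality and range plans above can be sketched against a sorted list of index rows. This is a toy model: the row layout (kind, property, value, entity key), the sample data, and the helper names are assumptions, not the datastore's actual index format.

```python
from bisect import bisect_left

# Each index row: (kind, property, value, entity key). Sample data is assumed.
INDEX = sorted([
    ("Parent", "name", "Alice", "/Parent:1"),
    ("Parent", "name", "Bob", "/Parent:2"),
    ("Parent", "name", "Brenda", "/Parent:3"),
    ("Parent", "name", "Carlos", "/Parent:4"),
    ("Parent", "name", "John", "/Parent:5"),
    ("Parent", "name", "John", "/Parent:6"),
])


def scan(start, end):
    """Return entity keys for index rows with start <= row < end."""
    lo = bisect_left(INDEX, start)
    hi = bisect_left(INDEX, end)
    return [row[3] for row in INDEX[lo:hi]]


def name_equals(value):
    # WHERE name = value: prefix scan on (kind, property, value).
    return scan(("Parent", "name", value), ("Parent", "name", value + "\0"))


def name_in_range(low, high):
    # WHERE name >= low AND name < high ORDER BY name: one range scan.
    return scan(("Parent", "name", low), ("Parent", "name", high))


print(name_equals("John"))
print(name_in_range("B", "C"))
```

Neither query filters or sorts in memory. The index is already in the required order, so each result set is one contiguous slice.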

The App Engine Datastore
Persistent storage

Your data is already partitioned
  Use Entity Groups
Explicit Indexes make for fast reads
  But slower writes
Replicated and fault tolerant
  On commit: ≥3 machines
  Geographically distributed
Bonus: Keep globally unique IDs for free

Other APIs

GMail
Google Accounts
Picasaweb
Gadget API
Task Queue
Google Talk

App Engine: Design Motivations, Recap

Build on Existing Google Technology

creative commons licensed photograph: http://www.flickr.com/photos/cote/54408562/

Provide an Integrated Environment

Why?
  Manage all apps together
What it means for you:
  Follow best practices
  Some restrictions
  Use our tools
Benefits:
  Use our tools
  Admin Console
  All of your logs in one place
  No machines to configure or manage
  Easy deployment

Encourage Small Per-Request Footprints

Why?
  Better utilization of App Servers
  Fairness
What it means for your app:
  Less Memory Usage
  Limited CPU
Benefits:
  Better use of resources

Encourage Fast Requests

Why?
  Better utilization of appservers
  Fairness between applications
  Routing and scheduling agility
What it means for your app:
  Runtime caching
  Request deadlines
Benefits:
  Optimistically share state between requests
  Better throughput
  Fault tolerance
  Better use of resources

Maintain Isolation Between Apps

Why?
  Safety
  Predictability
What it means for your app:
  Certain system calls unavailable
Benefits:
  Security
  Performance

Encourage Statelessness and Specialization

Why?
  App Server performance
  Load balancing
  Fault tolerance
What this means for your app:
  Use API calls
Benefits:
  Automatic load balancing
  Fault tolerance
  Less code for you to write
  Better use of resources

Require Partitioned Data Model

Why?
  The Datastore
What this means for your app:
  Data model + Indexes
  Reads are fast, writes are slower
Benefits:
  Design your schema once
    No need to re-architect for scalability
  More efficient use of cpu and memory

Google App Engine: The Numbers


Google App Engine

Currently, over 100K applications

Serving over 250M pageviews per day

Written by over 200K developers


App Engine

Open For Questions

The White House's "Open For Questions" application accepted 100K questions and 3.6M votes in under 48 hours

18 months in review

Apr 2008  Python launch
May 2008  Memcache, Images API
Jul 2008  Logs export
Aug 2008  Batch write/delete
Oct 2008  HTTPS support
Dec 2008  Status dashboard, quota details
Feb 2009  Billing, larger files
Apr 2009  Java launch, DB import, cron support, SDC
May 2009  Key-only queries
Jun 2009  Task queues
Aug 2009  Kindless queries
Sep 2009  XMPP
Oct 2009  Incoming email

Roadmap

Storing/serving large files
Mapping operations across data sets
Datastore cursors
Notification alerts for application exceptions
Datastore dump and restore facility


Wrap up

Always free to get started

~5M pageviews/month
6.5 CPU hrs/day
1 GB storage
650K URL Fetch calls
2,000 recipients emailed
1 GB/day bandwidth
N tasks


Purchase additional resources *

* free monthly quota of ~5 million page views still in full effect

Thank you

Read more
http://code.google.com/appengine/

Questions?

Post your questions for this talk on Twitter #devfest09 #appengine

Thanks

To Alon Levi, Fred Sauer for most of these slides