Google Devfest 2009 Argentina - Intro to Appengine

Post on 19-May-2015

2.611 views 0 download

Tags:

description

Google Devfest 2009 Argentina - Intro to Appengine

Transcript of Google Devfest 2009 Argentina - Intro to Appengine

From Spark Plug to Drive Train: Life of an App Engine RequestPatrick Chanezon @chanezon

Ignacio Blanco @blanconet 11/16/09Post your questions for this talk on Twitter #devfest09 #appengine

Designing for Scale and Reliability

Designing for Scale and Reliability

App Engine: Design Motivations

Life of a RequestRequest for static contentRequest for dynamic contentRequests that use APIs

App Engine: Design Motivations, Recap

App Engine: The Numbers

Agenda

Google App Engine

LiveJournal circa 2007From Brad Fitzpatrick's USENIX '07 talk:"LiveJournal: Behind the Scenes"

From Brad Fitzpatrick's USENIX '07 talk:"LiveJournal: Behind the Scenes"

LiveJournal circa 2007

Frontends StorageApplication Servers

Memcache

Static File Servers

Basic LAMP

Linux, Apache, MySQL, Programming LanguageScalable?Shared machine for database and webserverReliable?Single point of failure (SPOF)

Database running on a separate serverRequirementsAnother machine plus additional managementScalable? Up to one web serverReliable? Two single points of failure

Dedicated Database

Benefits:Grow traffic beyond the capacity of one webserverRequirements:More machinesSet up load balancing

Multiple Web Servers

Load Balancing: DNS Round RobinMultiple Web Servers

Register list of IPs with DNSStatistical load balancingDNS record is cached with Time To Live (TTL)

TTL may not be respected

Register list of IPs with DNSStatistical load balancingDNS record is cached with Time To Live (TTL)

TTL may not be respected

Load Balancing: DNS Round RobinMultiple Web Servers

Now wait for DNS changes to propagate :-(

Scalable?Add more webservers as necessaryStill I/O bound on one database

Reliable?Cannot redirect traffic quicklyDatabase still SPOF

Load Balancing: DNS Round RobinMultiple Web Servers

Benefits:Custom Routing

SpecializationApplication-level load balancing

Requirements:More machinesConfiguration and code for reverse proxies

Reverse Proxy

Scalable?Add more web serversSpecializationBound by

Routing capacity of reverse proxyOne database server

Reliable?Agile application-level routingSpecialized components are more robustMultiple reverse proxies requires network-level routing

DNS Round Robin (again)Fancy network routing hardware

Database is still SPOF

Reverse Proxy

Master-Slave Database

Benefits:Better read throughputInvisible to application

Requirements:Even more machinesChanges to MySQL

Master-Slave Database

Scalable?Scales read rate with # of servers

But not writes

What happens eventually?

Master-Slave Database

Reliable?Master is SPOF for writesMaster may die before replication

Partitioned Database

Benefits:Increase in both readand write throughput

Requirements:Even more machines Lots of managementRe-architect data modelRewrite queries

The App Engine Stack

App Engine:Design Motivations

Design Motivations

Build on Existing Google Technology

Provide an Integrated Environment

Encourage Small Per-Request Footprints

Encourage Fast Requests

Maintain Isolation Between Applications

Encourage Statelessness and Specialization

Require Partitioned Data Model

Life of a Request:Request for Static Content

Request for Static Content on Google Network

Routed to the nearest Google datacenterTravels over Google's network

Same infrastructure other Google products useLots of advantages for free

Request for Static Content

Google App Engine Front EndsLoad balancingRouting

Frontends route static requests to specialized serving infrastructure

Routing at the Front End

Request for Static Content

Google Static Content ServingBuilt on shared Google Infrastructure

Static files are physically separate from code filesHow are static files defined?

Static Content Servers

What content is static?Request for Static Content

...<static> <include path="/**.png" /> <exclude path="/data/**.png /></static>...

Java Runtime: appengine-web.xml

...- url: /imagesstatic_dir: static/imagesOR- url: /images/(.*)static_files: static/images/\1upload: static/images/(.*)...

Python Runtime: app.yaml

Request For Static Content

Back to the Front End and out to the user

Specialized infrastructure App runtimes don't serve static content

Response to the user

Life of a Request:Request for Dynamic Content

Request for Dynamic Content: New Components

App ServersServe dynamic requestsWhere your code runs

App MasterSchedules applicationsInforms Front Ends

App Servers and App Master

Request for Dynamic Content: Appservers

Many applicationsMany concurrent requests

Smaller app footprint + fast requests = more apps Enforce Isolation

Keeps apps safe from each otherEnforce statelessness

Allows for scheduling flexibilityService API requests

What do they do?

Request For Dynamic Content

Front Ends route dynamic requests to App Servers

Routing at the Frontend

1. Checks for cached runtimeIf it exists, no initialization

2. Execute request3. Cache the runtime

System is designed to maximize caching

Slow first request, faster subsequent requestsOptimistically cache data in your runtime!

Request for Dynamic ContentApp Server

Life of a Request:Requests accessing APIs

App Server

1. App issues API call 2. App Server accepts3. App Server blocks runtime4. App Server issues call5. Returns the response

Use APIs to do things you don't want to do in your runtime, such as...

API Requests

APIs

Distributed in-memory cachememcacheg

Also written by Brad Fitzpatrickadds: set_multi, get_multi, add_multi

Optimistically cache for optimization Very stable, robust and specialized

MemcachegA more persistent in-memory cache

The App Engine DatastorePersistent storage

Bigtable in one slide

Is not...

databasesharded databasedistributed hashtable

Is...

sharded, sorted array

Scalable structured storage

Datastore Under the Covers

Scalability does not come for freeDatastore built on top of BigtableQueriesIndexes

Bigtable

Sort

Machine #2

Machine #3

Machine #1

Bigtable

Operations

Read Write Delete

Single row transaction (atomic)

Operate

Bigtable

Operations

Scan: prefix b Scan: range b to d

Entities table

Primary tableEvery entity in every appGeneric, schemalessRow name -> entity keyColumn (only 1) serialized entity

Entity keys

class Grandparent(db.Model): passclass Parent(db.Model): passclass Child(db.Model): pass

Felipe = Grandparent()Carlos = Parent(parent=Felipe)Ignacio = Child(parent=Carlos)

/Grandparent:Felipe/Grandparent:Felipe/Parent:Carlos/Grandparent:Felipe/Parent:Carlos/Child:Ignacio

Based on parent entities: root entity, then child, child, ...

my family

Queries

Restrict by kindFilter by property valuesSort on propertiesAncestor

SELECT * FROM Person ...

WHERE name = 'John';

ORDER BY name DESC; WHERE city = 'Sonoma' AND state = 'CA'AND country = 'USA';

WHERE ANCESTOR IS :Felipe ORDER BY name;

GQL < SQL

Indexes

Dedicated Bigtable tablesMap property values to entitiesEach row = index data + entity key

No filtering in memoryNo sorting in memory

Convert queries to scans

Query planning

1. Pick index2. Compute prefix or range3. Scan!

Convert query to dense index scan

Kind index

SELECT * FROM Grandparent

1. Kind index2. Prefix = Grandparent3. Scan!

Row name Row name

Single-property index

WHERE name = 'John'ORDER BY name DESCWHERE name >= 'B' AND name < 'C' ORDER BY name

Queries on single property (one ASC one DESC)

Row name Key

Single-property indexQueries on single property (one ASC one DESC)

Row name Key

Single-property index

WHERE name = 'John'1. Prefix = Parent name John (fully specified)2. Scan!

Single-property index

ORDER BY name DESC1. Prefix = Parent name2. Scan!

Single-property index

WHERE name >= 'B' and name < 'C' ORDER BY name1. RANGE = [Parent name B, Parent name C)2. Scan!

The App Engine Datastore

Your data is already partitionedUse Entity Groups

Explicit Indexes make for fast readsBut slower writes

Replicated and fault tolerantOn commit: ≥3 machinesGeographically distributed

Bonus: Keep globally unique IDs for free

Persistent storage

GMail

Google Accounts

Picasaweb

Other APIs

Gadget API

Tasks Queue

Google Talk

App Engine:Design Motivations, Recap

creative commons licensed photograph: http://www.flickr.com/photos/cote/54408562/

Build on Existing Google Technology

Provide an Integrated Environment

Why?Manage all apps together

What it means for you:Follow best practicesSome restrictionsUse our tools

Benefits:Use our toolsAdmin ConsoleAll of your logs in one placeNo machines to configure or manageEasy deployment

Encourage Small Per-Request Footprints

Why?Better utilization of App ServersFairness

What it means for your app:Less Memory UsageLimited CPU

Benefits:Better use of resources

Encourage Fast Requests

Why?Better utilization of appserversFairness between applicationsRouting and scheduling agility

What it means for your app:Runtime cachingRequest deadlines

Benefits:Optimistically share state between requestsBetter throughputFault toleranceBetter use of resources

Maintain Isolation Between Apps

Why?SafetyPredictability

What it means for your app:Certain system calls unavailable

Benefits:SecurityPerformance

Encourage Statelessness and Specialization

Why?App Server performanceLoad balancingFault tolerance

What this means for you app:Use API calls

Benefits:Automatic load balancingFault toleranceLess code for you to writeBetter use of resources

Require Partitioned Data Model

Why?The Datastore

What this means for your app:Data model + IndexesReads are fast, writes are slower

Benefits:Design your schema once

No need to re-architect for scalabilityMore efficient use of cpu and memory

Google App Engine:The Numbers

Google App Engine

Currently, over 100K applications

Serving over 250M pageviews per day

Written by over 200K developers

App Engine

Open For QuestionsThe White House's "Open For Questions" application accepted 100K questions and 3.6M votes in under 48 hours

Apr 2008 Python launchMay 2008 Memcache, Images APIJul 2008 Logs exportAug 2008 Batch write/deleteOct 2008 HTTPS supportDec 2008 Status dashboard, quota detailsFeb 2009 Billing, larger filesApr 2009 Java launch, DB import, cron support, SDCMay 2009 Key-only queriesJun 2009 Task queuesAug 2009 Kindless queriesSep 2009 XMPPOct 2009 Incoming email

18 months in review

Roadmap

Storing/serving large filesMapping operations across data setsDatastore cursorsNotification alerts for application exceptionsDatastore dump and restore facility

Wrap up

Always free to get started

~5M pageviews/month6.5 CPU hrs/day1 GB storage650K URL Fetch calls2,000 recipients emailed1 GB/day bandwidthN tasks

Purchase additional resources *

* free monthly quota of ~5 million page views still in full effect

Thank you

Read morehttp://code.google.com/appengine/

Questions?Post your questions for this talk on Twitter #devfest09 #appengine

ThanksTo Alon Levi, Fred Sauer for most of these slides