Moneytree - Data Aggregation with SWF

Post on 21-Dec-2014

description

An outline of how Moneytree uses Amazon SWF to coordinate our backend aggregation workflow. Focuses on how to run a large scale distributed system with a few developers while still sleeping at night.

Transcript of Moneytree - Data Aggregation with SWF

Ross Sharrott Founder / CTO

rsharrott@moneytree.jp

@moneytreejp

Who Am I?

Ross Sharrott

Founder & CTO of Moneytree

American

10 Years in Japan (Feb 24!)

Previously Senior IT Manager

Love distributed architectures in the cloud

What is Moneytree?

Internet banking is fragmented; not simple

Email is Simple

For mail we use just ONE app!

Gmail Yahoo! Work, etc.

Radically simplify your relationship with money

and make it beautiful.

Data Aggregator

Our Goals:

Download accounts for 1M people every day

Deliver new data in < 1 minute

2-3 developers

Sleep at night

First Idea

I know…I’ll use a queue!

Original Queue Based Process

Download Data

Process Statements

Store Data

1 Account / Many Statements

Download Data

Process Statements

Post Process Statements

Store Data + Additional Information

But we had a problem…

To determine a CC balance, we need information from multiple statements

We needed a post statement process

What We Needed

Download Data

Process Statements

• Statement 1

• Statement 2

• Many More

Post Process

Queue Falls Down

I know…I’ll use a queue!

Queues are linear

Where are we in the process? Logged in yet? Processing data?

What do you do when a job fails?

How do you relate jobs to one workflow?

Enter SWF

AWS Managed Service

Coordinates Workflows / Maintains history

Provides multiple queues called Task Lists

Handle decision points with Deciders

Perform tasks with Activity Workers

Real World – A Restaurant

SWF World – A Restaurant

Decider – does nothing, makes decisions

Workflow Starter – takes orders

Activity Worker – makes food

Activity Worker – distributes food

SWF – maintains history, distributes tasks

Activity Worker

Very similar to any queue worker

Handles a specific task

Polls a Task List to get new tasks

Reports activity success or failure

Puts results in a DB or on S3, etc.
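The bullets above can be sketched as a small polling loop. This is a minimal illustration, not Moneytree's code: `swf` is assumed to be a boto3 SWF client (or any object with the same three methods), and `run_activity_worker`, `handler`, and `max_polls` are hypothetical names introduced here.

```python
import json


def run_activity_worker(swf, domain, task_list, handler, max_polls=1):
    """Poll one task list and report success or failure for each task.

    `swf` is assumed to behave like a boto3 SWF client. `handler` is a
    hypothetical function that does the real work and returns a small,
    JSON-serialisable result (for example, a pointer to data on S3).
    """
    handled = 0
    for _ in range(max_polls):
        # Long poll; the response carries no taskToken when the list is empty.
        task = swf.poll_for_activity_task(
            domain=domain, taskList={"name": task_list}
        )
        token = task.get("taskToken")
        if not token:
            continue
        try:
            result = handler(json.loads(task.get("input") or "{}"))
        except Exception as exc:
            # Report the failure so the decider can choose to reschedule.
            swf.respond_activity_task_failed(
                taskToken=token, reason=type(exc).__name__, details=str(exc)
            )
        else:
            swf.respond_activity_task_completed(
                taskToken=token, result=json.dumps(result)
            )
        handled += 1
    return handled
```

The result reported back is kept small on purpose: the bulk data goes to a DB or S3, and the workflow history only carries a reference to it.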

Workflow Decider

Uses workflow history to make decisions

Schedules tasks

Handles rescheduling after failures & timeouts

Reacts to external events (Signals)

Reacts to completion events
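To make the decider's role concrete, here is a minimal sketch of a decision function for a download, process, post-process flow. It deliberately uses a simplified, hypothetical event format (dicts with `type`, `activity`, and `statements` keys) rather than the real SWF history format, and it looks only at the most recent event. The activity names and `max_attempts` are assumptions for illustration.

```python
def decide(history, max_attempts=3):
    """Return the next decisions given a simplified workflow history.

    `history` is a hypothetical event list, not the SWF wire format.
    Decisions are returned as (action, activity) tuples.
    """
    last = history[-1]
    kind, activity = last["type"], last.get("activity")

    if kind == "started":
        return [("schedule", "download")]

    if kind in ("failed", "timed_out"):
        # Count earlier failures of the same activity to bound retries.
        attempts = sum(
            1 for e in history
            if e["type"] in ("failed", "timed_out")
            and e.get("activity") == activity
        )
        if attempts < max_attempts:
            return [("schedule", activity)]
        return [("fail_workflow", activity)]

    if kind != "completed":
        raise ValueError("unknown event type: %r" % kind)

    if activity == "post_process":
        return [("complete_workflow", None)]

    statements = next(
        (e.get("statements", []) for e in history
         if e["type"] == "completed" and e.get("activity") == "download"),
        [],
    )
    if activity == "download":
        # Fan out: one processing task per downloaded statement.
        if statements:
            return [("schedule", "process:" + s) for s in statements]
        return [("schedule", "post_process")]

    # A statement finished: fan in once every statement is done.
    done = {e["activity"] for e in history if e["type"] == "completed"}
    if all("process:" + s in done for s in statements):
        return [("schedule", "post_process")]
    return []
```

Because the function is driven entirely by the history it is given, it holds no state of its own, which is what lets any decider process pick up any decision task.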

Moneytree’s Workflow

Download Data

Statement

Post Process

Statement

Moneytree’s SWF Architecture

1 Day of Work

Yesterday:

70,000 Workflows

Average Completion Time: 1 Minute

575,000 Decision Tasks

146,000 Statements Processed

70,000 Aggregation Tasks

70,000 Post Process Tasks

Data Aggregator

Our Goals:

Download accounts for 1M people every day

Deliver new data in < 1 minute

2-3 developers

Sleep at night

How To Sleep At Night

Make Workers Scalable

Avoid SWF API Throttling

Expect Failures

Measure Everything

Make Workers Scalable

Separate concerns into individual workers

Scale each worker process individually

Automate scaling your workers

Make workers idempotent

You can always try again
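One common way to get an idempotent worker is to write results under a deterministic key, so a retry overwrites the earlier attempt instead of adding a duplicate. The sketch below assumes that approach; `store_statement` is a hypothetical name and the dict stands in for a real database or S3 bucket.

```python
def store_statement(store, account_id, statement_id, transactions):
    """Upsert a processed statement under a deterministic key.

    `store` is a plain dict standing in for a database table. Running
    this twice with the same input leaves the store in the same state.
    """
    key = (account_id, statement_id)
    store[key] = list(transactions)  # overwrite, never append
    return key


store = {}
store_statement(store, "acct-1", "2014-11", [100, -25])
store_statement(store, "acct-1", "2014-11", [100, -25])  # retried task
```

After the retry the store still holds exactly one entry for that statement, so rescheduling a task that may or may not have finished is always safe.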

Avoid API Throttling

Don’t call GetWorkflowExecutionHistory

Stress test your implementation

Limits are by Region, not domain!

Get your limits raised

We hit limits on day 1

Use exponential retry

Have a circuit breaker
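The last two bullets can be sketched together. This is a minimal illustration under stated assumptions, not a description of Moneytree's implementation: the class and function names, thresholds, and delays are all hypothetical, and a real worker would retry only on throttling or transient errors rather than on every exception.

```python
import random
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the API."""


class CircuitBreaker:
    """Stop calling a failing service until a cool-down has passed."""

    def __init__(self, failure_threshold=5, reset_after=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def retry_with_backoff(fn, attempts=5, base_delay=0.5, max_delay=30.0,
                       sleep=time.sleep):
    """Retry `fn`, doubling the delay cap each time, with full jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # retrying cannot help while the breaker is open
        except Exception:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

A call site would combine the two as `retry_with_backoff(lambda: breaker.call(make_request))`, where `make_request` is whatever function issues the API call. The jitter spreads retries out so that many throttled workers do not all come back at the same instant.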

Expect Failures

Cloud = Failures

Dyno / EC2 instance restarts

Network & Service outages

Don’t wait for failed processes

Use aggressive timeouts

Use heartbeats for long processes
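A heartbeat lets a long-running activity keep a short timeout: the worker reports in regularly, and SWF times the task out quickly if the reports stop. The sketch below assumes `swf` behaves like a boto3 SWF client; `process_with_heartbeat`, `handle_item`, and the `every` interval are hypothetical names chosen for illustration.

```python
def process_with_heartbeat(swf, task_token, items, handle_item, every=10):
    """Process items, sending a heartbeat after every `every` items.

    Returns the number of items handled. Stops early if SWF reports
    that cancellation of the task has been requested.
    """
    done = 0
    for item in items:
        handle_item(item)
        done += 1
        if done % every == 0:
            reply = swf.record_activity_task_heartbeat(
                taskToken=task_token, details="%d items done" % done
            )
            if reply.get("cancelRequested"):
                break
    return done
```

The heartbeat timeout itself is set when the activity is scheduled, so the interval here needs to be comfortably shorter than that value.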

Monitor Everything

Use Performance Monitoring

10x increase in performance = 10x workers

New Relic & CloudWatch

Centralize Logging

Cloud resources disappear w/their logs

Papertrail / Logentries

Log Everything & Set Up Alerts

If you don’t log it, you can’t fix it

Sleep At Night

Make Workers Scalable

Avoid SWF API Throttling

Expect Failures

Measure Everything

Thank You!

Moneytree is hiring!

iOS Developers

API Developers / AWS Dev Ops

Technology Ninjas

Ross Sharrott Founder / CTO

rsharrott@moneytree.jp

@moneytreejp