
Will Norman

Lessons Learned Building and Operating a Serverless Data Pipeline

Introduction

● Will Norman - Director of Engineering @ Intent

○ FinTech and AdTech background

● Intent

○ Data Science company for commerce sites

○ Primary application is an ad network for travel sites

● MOD owns data

● 4 Engineers

● 1 Product Manager

MOD Squad

What we’ll be covering

● What is Serverless?

● Intent Data Platform

● Lessons Learned

● More about managed services than a lack of servers

● Not just FaaS

● Scale on demand / pay for only what you use

● Empowers developers to own their platform

What is Serverless?

● Active MQ

● Log Processors

○ Java applications

○ Kept state locally

○ Cron scheduled tasks to roll files to S3

○ Ran on dedicated EC2 instances

● S3

Intent Data Platform [Old World]

● Kinesis

● Lambda

● Kinesis Firehose

● SNS

● AWS Batch

● S3

Intent Data Platform [New World]

● Streaming Data Consumers

● Spark Jobs / Aggregations -> Redshift

● Snowflake Loader -> Snowflake

● Parqour -> Athena

○ EMR based jobs that convert AVRO -> Parquet

Data Consumers

● Fewer production issues

● Separation of concerns

● Horizontally scalable

● Removed a lot of undifferentiated heavy lifting

Worth the move?

Lessons Learned

1. Total Cost of Ownership

2. Think about data formats upfront

3. Design for Failure

4. Design for Scalability

5. Not NoOps just DiffOps

6. Build Components

7. CI / CD Strategies

8. Leverage the Community

● On demand costs

● Hidden Costs / Tag All The Things!

● Enterprise Support

● Value of being able to focus on core business problems

Total Cost of Ownership

● What does the ecosystem support?

● Schema vs Schemaless (e.g. AVRO vs JSON)

● Data validation & Data evolution

● Data at rest vs data in flight

● JSON / CSV / AVRO / Parquet?

Think about data formats up front
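To make the validation and evolution bullets concrete: with a schema, a consumer can reject malformed records and fill in defaults for fields that were added later. The sketch below is only an illustration under assumptions. A hand-rolled validator stands in for a real Avro library, and the schema layout, the field names and the `resolve` helper are hypothetical rather than taken from the talk.

```python
# Hypothetical schema layout: field name -> (expected type, default or REQUIRED).
REQUIRED = object()

CONVERSION_V2 = {
    "event_id": (str, REQUIRED),
    "amount_cents": (int, REQUIRED),
    "currency": (str, "USD"),  # added in v2; the default keeps v1 records readable
}


def resolve(record, schema):
    """Validate a decoded record against a schema and fill in defaults."""
    unknown = set(record) - set(schema)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    resolved = {}
    for name, (expected_type, default) in schema.items():
        if name in record:
            value = record[name]
        elif default is not REQUIRED:
            value = default  # evolution: field missing from an older writer
        else:
            raise ValueError(f"missing required field: {name}")
        if not isinstance(value, expected_type):
            raise TypeError(f"{name} should be {expected_type.__name__}")
        resolved[name] = value
    return resolved
```

A record written before `currency` existed still resolves, while a record missing `amount_cents` is rejected. Schemaless JSON would accept both silently and leave the problem to whoever reads the data next.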

record DataWrapper { string dataType; long schemaFingerprint; bytes data; }

● Publish Schema in JSON format to S3

● Consumers lookup schemas, and calculate fingerprints

Schema Registry
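The wrapper-plus-fingerprint idea on this slide could look roughly like the following. This is a sketch under stated assumptions, not the speaker's implementation. An in-memory dict stands in for the S3 bucket, the `SchemaRegistry` class and `wrap` helper are hypothetical names, and the fingerprint is a truncated SHA-256 of the sorted JSON text rather than Avro's own CRC-64-AVRO fingerprint over the parsing canonical form.

```python
import hashlib
import json


def fingerprint(schema):
    """Signed 64-bit fingerprint of a schema (fits an Avro `long`).

    Assumption: truncated SHA-256 of key-sorted JSON, not CRC-64-AVRO.
    """
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def wrap(data_type, schema, payload):
    """Build a DataWrapper-shaped envelope around already encoded bytes."""
    return {
        "dataType": data_type,
        "schemaFingerprint": fingerprint(schema),
        "data": payload,
    }


class SchemaRegistry:
    """In-memory stand-in for schemas published as JSON to S3."""

    def __init__(self):
        self._published = {}  # data type -> list of schema JSON documents
        self._cache = {}      # (data type, fingerprint) -> parsed schema

    def publish(self, data_type, schema):
        self._published.setdefault(data_type, []).append(json.dumps(schema))

    def lookup(self, data_type, schema_fingerprint):
        """Consumer side: load schemas, fingerprint them, pick the match."""
        key = (data_type, schema_fingerprint)
        if key not in self._cache:
            for text in self._published.get(data_type, []):
                schema = json.loads(text)
                self._cache[(data_type, fingerprint(schema))] = schema
        return self._cache[key]  # KeyError if no published schema matches
```

Because the envelope carries only a fingerprint, producers can roll out a new schema version by publishing it first. Consumers then resolve the writer's schema per record instead of assuming the latest one.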

● System Guarantees?

● Idempotency

● Over process (data lookbacks)

● Dead Letter Queues

Design for failure
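Idempotency and dead letter queues fit together, as the sketch below tries to show. It assumes at-least-once delivery and an event that carries a unique `id`. The `IdempotentProcessor` class is a hypothetical name, and the dict and list are stand-ins for a durable store and a real queue.

```python
class IdempotentProcessor:
    """Process each event id once; park events that keep failing."""

    def __init__(self, handler, max_attempts=3):
        self.handler = handler
        self.max_attempts = max_attempts
        self.results = {}       # event id -> result; keyed writes make replays safe
        self.dead_letters = []  # events that exhausted their retries

    def process(self, event):
        event_id = event["id"]
        if event_id in self.results:
            # Duplicate delivery or a lookback re-run: nothing to do.
            return self.results[event_id]
        last_error = None
        for _ in range(self.max_attempts):
            try:
                result = self.handler(event)
            except Exception as error:  # sketch only: retry on any failure
                last_error = error
            else:
                self.results[event_id] = result
                return result
        # Give up without blocking the rest of the stream.
        self.dead_letters.append({"event": event, "error": repr(last_error)})
        return None
```

Since results are keyed by event id, over-processing a lookback window re-delivers events without double counting them. Anything in `dead_letters` can be inspected and replayed once the underlying problem is fixed.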

● Decouple from non-scalable systems

● Don’t run lambdas in VPC if you can help it

● Partition data at rest

● Shard events based on GUID / random id if ordering isn’t necessary

● Think about fan out patterns

Design for Scalability
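The sharding and partitioning bullets can be illustrated with a small sketch. Kinesis maps a partition key to a shard by taking the MD5 hash of the key as a 128-bit integer and matching it against each shard's hash key range. The code below assumes the shards split that range evenly. The function names and the Hive-style `year=/month=/day=/hour=` prefix layout are assumptions for illustration, not details from the talk.

```python
import hashlib
import uuid


def partition_key(event, ordered_by=None):
    """Use a random GUID unless events must stay ordered per some field."""
    if ordered_by is not None:
        return str(event[ordered_by])  # same value -> same shard, order kept
    return str(uuid.uuid4())           # spreads load across all shards


def shard_for(key, shard_count):
    """Index of the shard owning this key, assuming evenly split hash ranges."""
    hashed = int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")
    return hashed * shard_count // 2**128


def s3_key(data_type, event_time, object_name):
    """Partitioned object key, so queries can prune by type and time."""
    return (
        f"{data_type}/year={event_time:%Y}/month={event_time:%m}/"
        f"day={event_time:%d}/hour={event_time:%H}/{object_name}"
    )
```

A fixed partition key, such as one per publisher, sends all of that publisher's traffic to a single shard. That keeps ordering but caps throughput at one shard's limits, so it is worth paying for only when ordering matters.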

● Application problem or service problem

● Platform Limits

● Logs

● Metrics

● Dashboards

● Alerts

Not NoOps, just DiffOps

Some things remain the same

● Help to reason about different parts of the system

● Make it easy to do the right thing

● Easier to extend

● Infrastructure as Code

Build Components

module "conversion_event_processor" {
  source                                          = "../modules/event_processor"
  data_type                                       = "conversion"
  data_source                                     = "ad_server"
  processor_lambda_handler                        = "com.intentmedia.data.stream.ConversionLambda::handler"
  environment                                     = "${var.environment}"
  firehose_lambda_handler                         = "com.intentmedia.data.stream.ConversionFirehose::handler"
  processor_lambda_reserved_concurrent_executions = 3
  firehose_lambda_reserved_concurrent_executions  = 2
}

● Step backwards from being able to run stack locally

● Unit tests for business logic

● Integration Tests / End to End tests to ensure that everything is working as expected

● Use different AWS accounts to segregate staging and production

CI / CD
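One way to keep unit tests useful when the full stack cannot run locally is to keep business logic in pure functions and make the function handler a thin adapter. The sketch below assumes a simplified batch shape (`records` with a JSON `body`), which is hypothetical and not the real Kinesis event format. The function names are also made up for illustration.

```python
import json


def enrich_conversion(event):
    """Pure business logic: no AWS calls, so it can be unit tested locally."""
    if event["amount_cents"] < 0:
        raise ValueError("amount_cents must be non-negative")
    return {**event, "amount": event["amount_cents"] / 100}


def handler(batch, context=None):
    """Thin adapter: decode each record, then delegate to the pure function."""
    return [
        enrich_conversion(json.loads(record["body"]))
        for record in batch["records"]
    ]
```

Unit tests then target `enrich_conversion` directly. The adapter and its wiring to the stream are left to integration and end to end tests in a staging account.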

● Slack

○ Serverless Forum

○ og-aws

● Blogs

○ Symphonia https://www.symphonia.io/

○ Charity Majors https://charity.wtf/

○ Jeremy Daly https://www.jeremydaly.com/

● Twitter

● Meetup Events / Conferences

Leverage the Community

Questions?

Will Norman

will.norman@intent.com

We’re hiring!
