The Evolution of Big Data Pipelines at Intuit

The Evolution of Big Data Pipelines At Intuit June 30, 2016 #hadoopsummit #HS16SJ

Transcript of The Evolution of Big Data Pipelines at Intuit

Page 1: The Evolution of Big Data Pipelines at Intuit

The Evolution of Big Data Pipelines At Intuit

June 30, 2016

#hadoopsummit #HS16SJ

Page 2: The Evolution of Big Data Pipelines at Intuit

Your Speakers

Lokesh Rajaram, Senior Software Engineer, Intuit

likes Photography

Rekha Joshi, Principal Software Engineer, Intuit

Currently likes Chopped

Page 3: The Evolution of Big Data Pipelines at Intuit

The Plan

Page 4: The Evolution of Big Data Pipelines at Intuit

Unicellular Amoeba

Multicellular Humans

Page 5: The Evolution of Big Data Pipelines at Intuit

Cannot Evolve? Disappear..

Gone!

Page 6: The Evolution of Big Data Pipelines at Intuit

Evolution of Big Data

Page 7: The Evolution of Big Data Pipelines at Intuit

Our Mission

To improve our customers’ financial lives so profoundly … they can’t imagine going back to the old way!

Page 8: The Evolution of Big Data Pipelines at Intuit

Who we serve

Consumers

Small Businesses

Accounting Professionals

Page 9: The Evolution of Big Data Pipelines at Intuit

42M 2.3M 7M

File their own taxes with TurboTax

Run their small businesses with QuickBooks

Manage their personal finances with Mint

The Numbers Are Growing

65+ Applications, 25% of US GDP

Page 10: The Evolution of Big Data Pipelines at Intuit

Intuit - An Evolution Case Study

Era of DOS

Era of Windows

Era of Web

Era of the Cloud

Mobile First

1980s 1990s 2000s 2010 2016

• Employees: 150
• Customers: 1.3M
• Revenue: $33M

• Employees: 4,500
• Customers: 5.6M
• Revenue: $1.04B

• Employees: 7,700
• Customers: 37M
• Revenue: $4.2B

Regulatory data

Transactional data

Batch data

Real time data

Complex, secure data

Compliant data

Page 11: The Evolution of Big Data Pipelines at Intuit

Data Is The Decision Maker

Page 12: The Evolution of Big Data Pipelines at Intuit

Evolution of Big Data Pipelines – The Need

Secure Cloud Environment

Single Cohesive Data Pipeline

A/B Testing

Personalization

Streaming

Profile Store

Fraud Detection

Support Varied Use Cases

and more..

Page 13: The Evolution of Big Data Pipelines at Intuit

Evolution of Big Data Pipelines

Thin Slices - Minimal Viable Product

Page 14: The Evolution of Big Data Pipelines at Intuit

Evolution of Big Data Pipelines – The Recipe

Taking the Data In

Transforming Data

Handling The Indigestion With Scale

Page 15: The Evolution of Big Data Pipelines at Intuit

Evolution of Big Data Pipelines – The Recipe

No Snowflakes Solutions

Getting Vested Stakeholders Agreements

Establishing The Standards

Page 16: The Evolution of Big Data Pipelines at Intuit

Evolution of Big Data Pipelines – The Recipe

Breaking The Silos

Moving Organization In One Direction

Page 17: The Evolution of Big Data Pipelines at Intuit

Evolution of Big Data Pipelines – The Recipe

● Making The Configuration Knobs Work
● At Scale
  o Latency
  o Throughput
● Schema, PII, Metadata, Changes, Audit, Governance
● Controlled Access ←→ Innovation
● Error Monitoring
● Cluster Deployment

Page 18: The Evolution of Big Data Pipelines at Intuit

Organization Evolution

Data Evolution

Page 19: The Evolution of Big Data Pipelines at Intuit

SDK

User-entered data

Apache Kafka

Collector: User-entered and clickstream data

Real-time processing

Personalization Engine

Profile Store

Big Data Pipeline Slice View
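The slice view above can be read as a chain: the SDK sends events to a collector, the collector writes to Kafka, real-time processing updates the profile store, and the personalization engine reads from it. The sketch below is only a toy illustration of that flow, not Intuit's implementation. It assumes an in-memory deque standing in for a Kafka topic, and every name in it (`collect`, `process_all`, `recommend`, the event fields) is hypothetical.

```python
from collections import defaultdict, deque

# In-memory stand-in for a Kafka topic (assumption: a real pipeline would use a Kafka client).
topic = deque()

# Stand-in for the Profile Store: user id -> page -> click count (hypothetical schema).
profile_store = defaultdict(lambda: defaultdict(int))


def collect(event: dict) -> None:
    """Collector: accept an event sent by the SDK and append it to the topic."""
    topic.append(event)


def process_all() -> None:
    """Real-time processing: drain the topic and update the profile store."""
    while topic:
        event = topic.popleft()
        if event["type"] == "click":
            profile_store[event["user"]][event["page"]] += 1


def recommend(user: str):
    """Personalization engine: return the user's most-clicked page, or None."""
    pages = profile_store.get(user)
    if not pages:
        return None
    return max(pages, key=pages.get)


# Clickstream events as the SDK might emit them (hypothetical fields).
collect({"type": "click", "user": "u1", "page": "invoices"})
collect({"type": "click", "user": "u1", "page": "reports"})
collect({"type": "click", "user": "u1", "page": "invoices"})
process_all()
print(recommend("u1"))
```

Keeping the queue between collector and processor is the point of the sketch: producers and consumers stay decoupled, so each side can be scaled or replaced independently.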

Page 20: The Evolution of Big Data Pipelines at Intuit

Big Data Pipeline Components

Page 21: The Evolution of Big Data Pipelines at Intuit

Monitoring The Pipeline

AWS resource alarms

Custom App Metrics

JVM and App Metrics

Custom process alerts

Logging and alerts

Page 22: The Evolution of Big Data Pipelines at Intuit

Evolution In Stages

Page 23: The Evolution of Big Data Pipelines at Intuit

Evolution - Stage 0: Disparate And Chaotic

Disparate Databases

Page 24: The Evolution of Big Data Pipelines at Intuit

Data Pipeline (an example)

• Collect event stream data into one location

• Handle ~ 200k events / sec

• Payload ~ 3-5KB

• Enrich message and load it into Hive in defined SLA
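To get a feel for what these example figures imply, the rough arithmetic below converts the event rate and payload size into bandwidth and daily volume. It is a back-of-envelope sketch only: it assumes the rate is sustained all day, uses decimal units, and ignores compression, replication, and protocol overhead. The function names are illustrative.

```python
# Figures taken from the slide.
EVENTS_PER_SEC = 200_000      # "~200k events / sec"
PAYLOAD_KB_RANGE = (3, 5)     # "~3-5KB"


def ingest_rate_mb_per_sec(events_per_sec: int, payload_kb: float) -> float:
    """Raw ingest rate in MB/s, taking 1 MB = 1000 KB."""
    return events_per_sec * payload_kb / 1000


def daily_volume_tb(mb_per_sec: float) -> float:
    """Volume per day in TB, taking 1 TB = 1,000,000 MB and 86,400 s per day."""
    return mb_per_sec * 86_400 / 1_000_000


for kb in PAYLOAD_KB_RANGE:
    rate = ingest_rate_mb_per_sec(EVENTS_PER_SEC, kb)
    print(f"{kb} KB payload: {rate:.0f} MB/s, {daily_volume_tb(rate):.1f} TB/day")
```

Under these assumptions the raw stream is roughly 600 to 1000 MB/s, which is why network capacity and broker tuning matter at this scale.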

Page 25: The Evolution of Big Data Pipelines at Intuit

Evolution - Stage 1

Event Stream

Oozie

Sqoop

Netezza Loader

Hive QL operations

Storm

Samza

Flume

Page 26: The Evolution of Big Data Pipelines at Intuit

Evolution - Stage 2

Event Stream

{ ReST }

Page 27: The Evolution of Big Data Pipelines at Intuit

Evolution - Stage 3 (HA & DR)

SDK

{ ReST }

SDK

{ ReST }

Mirroring

Page 28: The Evolution of Big Data Pipelines at Intuit

Challenges & Opportunities

Page 29: The Evolution of Big Data Pipelines at Intuit
Page 30: The Evolution of Big Data Pipelines at Intuit
Page 31: The Evolution of Big Data Pipelines at Intuit

Set of Changes

• Network upgrades

• Increase pipe

• Broker

• Mirrormaker

• Host TCP

Page 32: The Evolution of Big Data Pipelines at Intuit
Page 33: The Evolution of Big Data Pipelines at Intuit

Evolution - Stage 4 (Streaming + Batch)

SDK

{ ReST }

SDK

{ ReST }

Mirroring

Page 34: The Evolution of Big Data Pipelines at Intuit

Evolution - Stage 5 (Cloud only)

SDK

{ ReST }

Kafka Connectors

Page 35: The Evolution of Big Data Pipelines at Intuit

Evolution - Stage 5 (Cloud only - Future state)

Page 36: The Evolution of Big Data Pipelines at Intuit

Pipeline Essentials

SDK { ReST }

SDK { ReST }

Page 37: The Evolution of Big Data Pipelines at Intuit

Traffic Rate Monitoring
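One common way to monitor traffic rate is a sliding-window counter that flags a drop below an expected floor. The slide does not say how this was done at Intuit, so the class below is a generic sketch: `RateMonitor`, its parameters, and the threshold values are all hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class RateMonitor:
    """Sliding-window event rate with a minimum-rate check (illustrative only)."""

    def __init__(self, window_sec: float, min_rate: float):
        self.window_sec = window_sec
        self.min_rate = min_rate          # events per second below which we alert
        self._timestamps = deque()

    def record(self, ts: float) -> None:
        """Record one event observed at time ts (seconds)."""
        self._timestamps.append(ts)

    def rate(self, now: float) -> float:
        """Events per second over the window ending at now."""
        cutoff = now - self.window_sec
        # Drop events that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) / self.window_sec

    def is_below_threshold(self, now: float) -> bool:
        """True when the current rate is under the configured floor."""
        return self.rate(now) < self.min_rate


monitor = RateMonitor(window_sec=10, min_rate=1.0)
for i in range(20):
    monitor.record(i * 0.5)               # 20 events between t=0.0 and t=9.5

print(monitor.rate(now=10.0), monitor.is_below_threshold(now=10.0))
print(monitor.rate(now=30.0), monitor.is_below_threshold(now=30.0))
```

Passing timestamps in explicitly, rather than reading the clock inside the class, keeps the monitor easy to test against replayed traffic.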

Page 38: The Evolution of Big Data Pipelines at Intuit

Trust by Verification

• Test all Observable End-points
• Functional
• Data Loss
• Data Parity

• Measure for SLA
• Baseline Tests
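The data loss and data parity checks can be sketched as a comparison between what entered the pipeline and what arrived at the sink. The example below is a minimal illustration under assumptions not stated in the deck: records carry a unique `id` field, both sides fit in memory, and parity means the compared fields are byte-for-byte equal once serialized. The `fingerprint` and `verify` helpers are hypothetical.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Stable content hash of a record (keys sorted so field order is irrelevant)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def verify(source: list, sink: list, key: str = "id") -> dict:
    """Report ids missing from the sink (loss) and ids whose content differs (parity)."""
    src = {r[key]: fingerprint(r) for r in source}
    dst = {r[key]: fingerprint(r) for r in sink}
    missing = sorted(set(src) - set(dst))
    mismatched = sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k])
    return {"missing": missing, "mismatched": mismatched}


source_records = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 3, "amount": 30},
]
sink_records = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 25},   # altered in transit
]

print(verify(source_records, sink_records))
```

If the pipeline enriches messages on the way in, a parity check of this kind would need to compare only the fields that are expected to pass through unchanged.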

Page 39: The Evolution of Big Data Pipelines at Intuit
Page 40: The Evolution of Big Data Pipelines at Intuit

Interested in Joining?

goo.gl/BLPfyR

Page 41: The Evolution of Big Data Pipelines at Intuit

Thank You!