DataStax Enterprise in the Field – 20160920

DataStax Enterprise in the Field Daniel Cohen Solutions Engineer @ DataStax

Transcript of DataStax Enterprise in the Field – 20160920

Page 1: DataStax Enterprise in the Field – 20160920

DataStax Enterprise in the Field

Daniel Cohen Solutions Engineer @ DataStax

Page 2: DataStax Enterprise in the Field – 20160920

© DataStax, All Rights Reserved.

But Enough About Me…

• Solutions Engineer at DataStax
• LA ➜ SF ➜ NYC ➜ SF ➜ London
• Previously at JP Morgan in London
• Finance & digital media

Page 5: DataStax Enterprise in the Field – 20160920

1 Introductions
2 Top Customer Questions
3 Field Lessons: Big Irish Bank
4 Field Lessons: Big British Bank

Page 6: DataStax Enterprise in the Field – 20160920

Top Customer Questions

• What are all the other [banks] doing?
• How many nodes do I need?
• What do you mean SSDs?
• How do I load data from [Oracle]?
• We already have [MongoDB] for NoSQL. What’s the difference?

Page 7: DataStax Enterprise in the Field – 20160920

What are all the other [banks] doing?

“Tell me secrets about my competitors.”

Page 8: DataStax Enterprise in the Field – 20160920

Transform Legacy Infrastructure

[Diagram. Labels: Legacy Systems (USA Equities, USA FX, UK FX, UK Bonds, …); DataStax Enterprise Cluster (DSE); User Interface / Application Services; Global Users]

Page 9: DataStax Enterprise in the Field – 20160920

Transition Legacy to Microservices

[Diagram. Labels: Users; µServices; Messages; Data; Legacy; DSE; USA Customers; UK Accounts; data centers DC NY1 and DC LDN1; boxes labeled A, B, C, D, Z]

Page 10: DataStax Enterprise in the Field – 20160920

How many nodes do I need?

“How long is a piece of string?”

Page 19: DataStax Enterprise in the Field – 20160920

The Node Count Dance

• “How many nodes do I need?” is a natural question.
  – Large organizations buy hardware months in advance.
• Desires ➔ Storage, Throughput, Latency, SLAs
• Realities
  – Cost
  – Data center capacity (space)
  – Operational capacity (people)
  – Your hardware
  – Your use cases
• Lesson 1 ➔ Computer science is about trade-offs.
• Lesson 2 ➔ Test, iterate, test.
• Lesson 3 ➔ Good news! DSE scales linearly.
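One way to get a first number for the Node Count Dance is to size separately for storage and for throughput and take the larger result. The sketch below is only a starting estimate under stated assumptions: throughput is assumed to scale linearly with node count (per Lesson 3), and every input value, including the per-node capacity and per-node throughput, is hypothetical and would have to come from your own tests.

```python
import math


def estimate_node_count(
    raw_data_tb: float,
    replication_factor: int,
    usable_tb_per_node: float,
    target_ops_per_sec: float,
    ops_per_sec_per_node: float,
) -> int:
    """Return the larger of the storage-driven and throughput-driven node counts."""
    # Data on disk grows with the replication factor.
    nodes_for_storage = math.ceil(
        raw_data_tb * replication_factor / usable_tb_per_node
    )
    # Assumes throughput scales linearly with node count.
    nodes_for_throughput = math.ceil(target_ops_per_sec / ops_per_sec_per_node)
    # Fewer nodes than replicas would put two replicas on the same node.
    return max(nodes_for_storage, nodes_for_throughput, replication_factor)


# Hypothetical inputs, not figures from the talk.
print(estimate_node_count(
    raw_data_tb=4.0,
    replication_factor=3,
    usable_tb_per_node=1.0,
    target_ops_per_sec=20_000,
    ops_per_sec_per_node=2_500,
))
```

Latency targets and SLAs are not modeled here, which is why the estimate still has to be tested and iterated.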

Page 20: DataStax Enterprise in the Field – 20160920

What do you mean SSDs?

“We have an amazing SAN.”

Page 21: DataStax Enterprise in the Field – 20160920

Storage Matters

SSD (consumer grade)
• 10K – 1M IOPS
• 400 MB/s – 3 GB/s bandwidth
• < 200 µs latency

15K RPM HDD (spinning rust)
• ~ 200 IOPS
• ~ 160 MB/s bandwidth
• > 5 ms latency

✴ Acknowledgements to my colleague Kathryn Erickson.
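To make the IOPS gap concrete, the following sketch estimates how long a batch of random reads would take on each device class, using the slide's figures of roughly 200 IOPS for a 15K RPM HDD and 10K IOPS at the low end of the SSD range. The workload size is a hypothetical value, and the model ignores queueing, caching, and parallelism across disks.

```python
def seconds_for_random_reads(read_count: int, iops: float) -> float:
    """Time to complete read_count random reads on a device sustaining iops."""
    return read_count / iops


READS = 1_000_000  # hypothetical workload size

hdd_seconds = seconds_for_random_reads(READS, 200)     # ~200 IOPS, 15K RPM HDD
ssd_seconds = seconds_for_random_reads(READS, 10_000)  # low end of the SSD range

print(f"HDD: {hdd_seconds:.0f} s, SSD: {ssd_seconds:.0f} s, "
      f"ratio: {hdd_seconds / ssd_seconds:.0f}x")
```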

Page 22: DataStax Enterprise in the Field – 20160920

Storage Interfaces Matter

Interface        Transfer Rate
SATA III         6 Gb/s
SAS II           6 Gb/s
SAS III          12 Gb/s
PCIe Gen 2 x8    32 Gb/s

Page 30: DataStax Enterprise in the Field – 20160920

A Nondeterministic Path to Failure

• What about my incredible SAN?
  – Do not use network attached storage with DSE.
• But our SAN is awesome! We paid a lot of money for it.
  – No! Do not use network attached storage with DSE.
• Fine. What about EBS?
  – Let’s discuss!

Page 31: DataStax Enterprise in the Field – 20160920

Starting Points

Workload            CPU           RAM            Storage
DSE (Read Heavy)    8-24 cores    32-128 GB ✴    Local SSD (0.5-2 TB)
DSE (Write Heavy)   12-32 cores   32-128 GB      Local SSD (1-3 TB)
DSE + Search        16-32 cores   128 GB         Local SSD (1-3 TB)
DSE + Analytics     16-32 cores   128+ GB        Local SSD (1-3 TB)

✴ Got extra RAM? Cache is king.
✴✴ 1 Gb ethernet is fine. 10 Gb is future-proof.

Page 32: DataStax Enterprise in the Field – 20160920

We already have [MongoDB] for NoSQL. What’s the difference?

“Behold the one true NoSQL database.”

Page 38: DataStax Enterprise in the Field – 20160920

NoSQL Fantasy

Page 40: DataStax Enterprise in the Field – 20160920

Proof of Technology @ Big Irish Bank

Synopsis
• Payment Services Directive (PSD II) and Open Banking
• Customer access to current and historical data via APIs
• Competitive PoT versus other database vendors

Initial Goals
• Deploy on AWS
• Ingest ten years of (fake) customer data efficiently
• Fast retrieval & search

Page 47: DataStax Enterprise in the Field – 20160920

Hardware

PoT Recommendation
• 6 x i2.xlarge (AWS)
• 4 vCPU, 30.5 GB RAM
• 1 x 800 GB local SSD

PoT Mark 1
• c4.8xlarge (AWS)
• 36 vCPU, 60 GB RAM
• EBS only

PoT Final
• 6 x i2.xlarge (AWS)
• 4 vCPU, 30.5 GB RAM
• 1 x 800 GB local SSD

Production
• 8 nodes across 2 data centers (4:4)
• HP DL380 Gen9 ➔ 32 cores, 256 GB RAM, 3.2 TB SSDs on SAS III
• 10 Gb ethernet, fiber between DCs

Page 53: DataStax Enterprise in the Field – 20160920

Lessons

1) The Node Count Dance is iterative.
• Initial node count estimates were low.
• Early refusal to modify AWS setup.
• Avoid rigidity. Test, iterate, test.

2) Quis custodiet ipsos custodes?
• Hit performance plateau at 5,000 ops/s.
• Added second JMeter, performance doubled to 10,000 ops/s.
• JMeter was the bottleneck!
• Who will test the testers?

3) EBS is still network attached.
• 99% Read Latency (milliseconds)
  ▫ 3.311 ➔ local SSD
  ▫ 35.425 ➔ EBS Provisioned SSD
• Competing vendor falsified numbers.
• Lies, damned lies, and statistics.

4) Not all data needs to be hot.
• PoT Mark 1 ➔ 10 years of hot data
  ▫ ~ 20 billion transactions
  ▫ ~ 30 nodes to reach latency targets
• PoT Final ➔ 2 years of hot data
• Do not architect by convenience.
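The "who will test the testers?" lesson can be turned into a simple check: if adding a second load generator roughly doubles measured throughput, the first generator was probably the limit rather than the cluster. The sketch below is one possible way to express that, not the method used in the PoT; the function name and the 10% tolerance are assumptions chosen for illustration.

```python
def load_generator_is_bottleneck(
    ops_with_one: float,
    ops_with_two: float,
    tolerance: float = 0.1,
) -> bool:
    """True if doubling the load generators roughly doubled throughput.

    Near-linear scaling with generator count suggests the cluster still had
    headroom and the single generator was the limiting factor.
    """
    scaling = ops_with_two / ops_with_one
    return scaling >= 2.0 * (1.0 - tolerance)


# Figures from the lesson: 5,000 ops/s with one generator, 10,000 ops/s with two.
print(load_generator_is_bottleneck(5_000, 10_000))
```

If throughput barely moves when a generator is added, the plateau is more likely on the database side.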

Page 55: DataStax Enterprise in the Field – 20160920

Production Pilot @ Big British Bank

Synopsis
• Customer 360° use case across banking group
• DSE Graph
• Dissatisfied with other graph databases

Initial Goals
• Transition from mothballed trials of OrientDB, Titan
• Ingest enormous quantities of data from legacy DB
• Prove graph at scale

Page 63: DataStax Enterprise in the Field – 20160920

Hardware

Pilot Mark 1
• “Private Cloud”
• N x Hosted VM
• 8 vCPU, 112 GB RAM
• SAN only (for now)

Pilot Mark 2
• “Hadoop Leftovers”
• 4 x HP DL380s
• 24 cores, 512 GB RAM
• 1 x 2.1 TB SSD
• 14 x 2 TB HDDs

Pilot Final
• 3 x Dell C6220
• 12 cores, 128 GB RAM
• 6 x 1 TB SATA HDDs
  ▫ 2 x OS
  ▫ 1 x commit log
  ▫ 3 x data, caches

Production Target
• 16 nodes across 2 data centers (8:8)
• HP DL380 Gen9 ➔ 24 cores, 528 GB RAM, 3.4 TB SSDs

Page 69: DataStax Enterprise in the Field – 20160920

Lessons

1) DSE essentials are critical.
• Great team but zero DSE experience.
• Ad hoc education introduces risk.
• Walk before you run.

2) Node Count Dance applies to Graph.
• Data size unknown due to privacy.
• Load 5% of data, extrapolate.
• Test, iterate, test.

3) Hardware matters, of course.
• Leftover Hadoop boxes, spinning rust.
• Get creative with configuration & tuning.
• “Under no circumstances should you do load tests on these boxes.”

4) Avoid surprises before deadlines.
• Upgraded from RHEL 6.7 to 7.1.
• CPU spikes made nodes unusably slow.
• Revert!
• Nobody move, nobody gets hurt.
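The "load 5% of data, extrapolate" step in lesson 2 amounts to dividing the measured size of the sample by the fraction it represents. The sketch below shows that arithmetic only; the measured sample size is a hypothetical value, and the linear model assumes the sample is representative, which may not hold for graph data where edge counts can grow faster than vertex counts.

```python
def extrapolate_full_size_gb(sample_size_gb: float, sample_fraction: float) -> float:
    """Scale the on-disk size of a loaded sample up to the full data set."""
    if not 0.0 < sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be in (0, 1]")
    return sample_size_gb / sample_fraction


# Hypothetical measurement: a 5% sample occupies 40 GB on disk after loading.
estimated_gb = extrapolate_full_size_gb(40.0, 0.05)
print(f"Estimated full data set: {estimated_gb:.0f} GB")
```

The estimate is only a starting point for the next test iteration, to be revised as larger samples are loaded.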

Page 70: DataStax Enterprise in the Field – 20160920

Thank you!

Daniel Cohen Solutions Engineer @ DataStax

[email protected] @CodaAzzurra