Are you Kudu-ing me?!

Post on 21-Mar-2017

These folks must all be wrong, right?

uuid         first_name  last_name  dob
ee-c6-47-2c  John        Connor     Feb 28th, 1985
84-ee-ff-d5  Sarah       Connor     May 11th, 1965
57-4f-d9-d8  Kyle        Reese      Mar 1st, 2002

SELECT MIN(dob) FROM characters WHERE last_name = 'Connor'

The same table, stored column by column:

uuid:        ee-c6-47-2c | 84-ee-ff-d5 | 57-4f-d9-d8
last_name:   Connor | Connor | Reese
first_name:  John | Sarah | Kyle
dob:         Feb 28th, 1985 | May 11th, 1965 | Mar 1st, 2002

SELECT MIN(dob) FROM characters WHERE last_name = 'Connor'
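To make the difference between the two layouts concrete, here is a minimal Python sketch of the same query against a row-oriented and a column-oriented copy of the table. The function names and the in-memory lists are illustrative assumptions, not how any real engine stores data; the point is only that the columnar version touches two columns (last_name and dob) instead of every field of every row.

```python
from datetime import date

# Row-oriented layout: all fields of a record are stored together.
rows = [
    ("ee-c6-47-2c", "John", "Connor", date(1985, 2, 28)),
    ("84-ee-ff-d5", "Sarah", "Connor", date(1965, 5, 11)),
    ("57-4f-d9-d8", "Kyle", "Reese", date(2002, 3, 1)),
]

# Column-oriented layout: each column is stored as its own array.
columns = {
    "uuid": [r[0] for r in rows],
    "first_name": [r[1] for r in rows],
    "last_name": [r[2] for r in rows],
    "dob": [r[3] for r in rows],
}


def min_dob_row_store(rows, last_name):
    # Every full row is visited, including uuid and first_name.
    return min(r[3] for r in rows if r[2] == last_name)


def min_dob_column_store(columns, last_name):
    # Only the last_name and dob arrays are read.
    matches = [i for i, v in enumerate(columns["last_name"]) if v == last_name]
    return min(columns["dob"][i] for i in matches)


row_result = min_dob_row_store(rows, "Connor")
col_result = min_dob_column_store(columns, "Connor")
```

Both functions return the same date; only the amount of data they have to read differs.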

What’s the problem with Apache Parquet then?

Ever implemented a Lambda Architecture?

last_name  first_name  movie         actor                  actor_age
Connor     John        Terminator 2  Edward Furlong         14
Connor     John        Terminator 2  Michael Edwards        47
Connor     Sarah       Terminator    Linda Hamilton         28
Connor     Sarah       Terminator 2  Linda Hamilton         35
Reese      Kyle        Terminator 2  Michael Biehn          35
T-800                  Terminator    Arnold Schwarzenegger  37

CREATE TABLE characters (
  last_name STRING,
  first_name STRING,
  movie STRING,
  actor STRING,
  actor_age INT
)
DISTRIBUTE BY HASH (last_name, first_name) INTO 4 BUCKETS
TBLPROPERTIES (
  'kudu.key_columns' = 'last_name, first_name, movie, actor'
)

Kudu's partitioning sits somewhere between BigTable/HBase range partitioning and Cassandra's hash partitioning.
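As a rough illustration of hash partitioning on (last_name, first_name) into 4 buckets, the sketch below assigns each row of the example table to a bucket. The hash function (CRC32 over the joined key) is a stand-in chosen only because it is deterministic; it is an assumption, not Kudu's actual hash, so the bucket numbers it produces will not match a real cluster.

```python
import zlib

NUM_BUCKETS = 4  # matches "INTO 4 BUCKETS" in the table definition


def bucket_for(last_name, first_name, num_buckets=NUM_BUCKETS):
    # Stand-in hash: CRC32 of the two hash columns joined by a separator.
    key = f"{last_name}\x00{first_name}".encode("utf-8")
    return zlib.crc32(key) % num_buckets


rows = [
    ("Connor", "John", "Terminator 2", "Edward Furlong", 14),
    ("Connor", "John", "Terminator 2", "Michael Edwards", 47),
    ("Connor", "Sarah", "Terminator", "Linda Hamilton", 28),
    ("Connor", "Sarah", "Terminator 2", "Linda Hamilton", 35),
    ("Reese", "Kyle", "Terminator 2", "Michael Biehn", 35),
    ("T-800", "", "Terminator", "Arnold Schwarzenegger", 37),
]

# Group rows by bucket; rows sharing (last_name, first_name) always land together.
buckets = {b: [] for b in range(NUM_BUCKETS)}
for row in rows:
    buckets[bucket_for(row[0], row[1])].append(row)
```

Because only last_name and first_name feed the hash, every row for the same character ends up in the same bucket, whatever the movie or actor.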

The rows grouped by hash bucket, each bucket stored column by column:

Hash bucket 1
last_name:   Connor | Connor | Reese
first_name:  John | John | Kyle
movie:       Terminator 2 | Terminator 2 | Terminator 2
actor:       Edward Furlong | Michael Edwards | Michael Biehn
actor_age:   14 | 47 | 35

Hash bucket 2
last_name:   Connor | Connor
first_name:  Sarah | Sarah
movie:       Terminator | Terminator 2
actor:       Linda Hamilton | Linda Hamilton
actor_age:   28 | 35

Hash bucket 3
last_name:   T-800
first_name:  (empty)
movie:       Terminator
actor:       Arnold Schwarzenegger
actor_age:   37

INSERT INTO characters (last_name, first_name, movie, actor, actor_age)
VALUES ('Connor', 'John', 'Terminator Genisys', 'Jason Clarke', 36)

After the insert:

Hash bucket 1
last_name:   Connor | Connor | Connor | Reese
first_name:  John | John | John | Kyle
movie:       Terminator 2 | Terminator 2 | Terminator Genisys | Terminator 2
actor:       Edward Furlong | Michael Edwards | Jason Clarke | Michael Biehn
actor_age:   14 | 47 | 36 | 35

Hash bucket 2
last_name:   Connor | Connor
first_name:  Sarah | Sarah
movie:       Terminator | Terminator 2
actor:       Linda Hamilton | Linda Hamilton
actor_age:   28 | 35

Hash bucket 3
last_name:   T-800
first_name:  (empty)
movie:       Terminator
actor:       Arnold Schwarzenegger
actor_age:   37

Delta

SELECT MAX(actor_age) FROM characters WHERE last_name = 'Connor'

MPP FTW
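The MPP idea for this query can be sketched as follows: each tablet computes its own partial MAX in parallel, and a coordinator merges the partial results. The tablet contents mirror the buckets from the slides; the thread pool, the function names, and the tuple layout are assumptions made for illustration and do not reflect how Impala or Kudu actually schedule scans.

```python
from concurrent.futures import ThreadPoolExecutor

# One list of (last_name, first_name, movie, actor, actor_age) rows per tablet.
tablets = [
    [
        ("Connor", "John", "Terminator 2", "Edward Furlong", 14),
        ("Connor", "John", "Terminator 2", "Michael Edwards", 47),
        ("Connor", "John", "Terminator Genisys", "Jason Clarke", 36),
        ("Reese", "Kyle", "Terminator 2", "Michael Biehn", 35),
    ],
    [
        ("Connor", "Sarah", "Terminator", "Linda Hamilton", 28),
        ("Connor", "Sarah", "Terminator 2", "Linda Hamilton", 35),
    ],
    [
        ("T-800", "", "Terminator", "Arnold Schwarzenegger", 37),
    ],
]


def local_max_age(tablet, last_name):
    # Partial aggregate computed independently on one tablet.
    ages = [row[4] for row in tablet if row[0] == last_name]
    return max(ages) if ages else None


def max_actor_age(tablets, last_name):
    # Fan out one scan per tablet, then merge the partial maxima.
    with ThreadPoolExecutor(max_workers=len(tablets)) as pool:
        partials = list(pool.map(lambda t: local_max_age(t, last_name), tablets))
    partials = [p for p in partials if p is not None]
    return max(partials) if partials else None


result = max_actor_age(tablets, "Connor")
```

Only one small value per tablet travels to the coordinator, which is why this kind of aggregate parallelizes well.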

SELECT MAX(actor_age) FROM characters WHERE movie = 'Terminator 2'

Bloom filters FTW
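For readers who have not met Bloom filters, here is a generic, minimal sketch of how one lets a reader skip a block of data that definitely does not contain a value. The slides do not say which columns or blocks the filters are built on, so the per-block filter on the movie column, the filter size, and the SHA-256-based hashing are all assumptions for illustration, not Kudu's implementation.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


# Hypothetical blocks of the movie column, one filter per block.
movie_blocks = [
    ["Terminator 2", "Terminator 2", "Terminator Genisys", "Terminator 2"],
    ["Terminator", "Terminator 2"],
    ["Terminator"],
]
filters = []
for block in movie_blocks:
    bloom = BloomFilter()
    for movie in block:
        bloom.add(movie)
    filters.append(bloom)

# Only blocks whose filter answers "maybe" need to be scanned.
candidates = [i for i, f in enumerate(filters) if f.might_contain("Terminator 2")]
```

A "no" from the filter is always correct, so a skipped block can never hide a matching row; a "maybe" can be a false positive, in which case the block is scanned and simply yields nothing.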

[Diagram: cluster layout with a Master, Master replicas, and Tablet Servers 1-3; some tablet replicas on the tablet servers are marked Leader.]

Typically 10-100 tablets per machine.

Tablet storage layout:

MemRowSet
• Col A
• Col B
• …
In-memory concurrent B-tree; keeps all recently inserted rows.

DiskRowSet (several per tablet)
• Col A
• Col B
• …
• [Delta store]
Base data: each column is written separately in a single contiguous block of data.
Deltas: organized by rows (until compaction happens).
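The storage layout above can be sketched as a toy model: inserts land in an in-memory row set, a flush turns it into a columnar DiskRowSet, updates to flushed rows go to a per-row delta store, and compaction folds the deltas back into the base columns. Everything here is a simplification under stated assumptions: a plain dict stands in for the concurrent B-tree, the class and method names are hypothetical, and the update to age 37 is invented purely to exercise the delta path.

```python
class DiskRowSet:
    """Columnar base data plus a delta store keyed by row index."""

    def __init__(self, sorted_rows):
        # sorted_rows: non-empty list of (key, row_dict), sorted by key.
        self.keys = [key for key, _ in sorted_rows]
        column_names = list(sorted_rows[0][1])
        # Base data: one contiguous list per column.
        self.base = {
            name: [row[name] for _, row in sorted_rows] for name in column_names
        }
        # Deltas are organized by row: row index -> {column: new value}.
        self.deltas = {}

    def update(self, key, changes):
        self.deltas.setdefault(self.keys.index(key), {}).update(changes)

    def read(self, key):
        idx = self.keys.index(key)
        row = {name: values[idx] for name, values in self.base.items()}
        row.update(self.deltas.get(idx, {}))  # apply pending deltas on read
        return row

    def compact(self):
        # Fold the deltas into the base columns and empty the delta store.
        for idx, changes in self.deltas.items():
            for name, value in changes.items():
                self.base[name][idx] = value
        self.deltas = {}


class Tablet:
    def __init__(self):
        self.mem_row_set = {}  # stand-in for the in-memory B-tree
        self.disk_row_sets = []

    def insert(self, key, row):
        self.mem_row_set[key] = dict(row)

    def flush(self):
        if self.mem_row_set:
            ordered = sorted(self.mem_row_set.items(), key=lambda item: item[0])
            self.disk_row_sets.append(DiskRowSet(ordered))
            self.mem_row_set = {}

    def _row_set_for(self, key):
        for row_set in self.disk_row_sets:
            if key in row_set.keys:
                return row_set
        raise KeyError(key)

    def update(self, key, changes):
        if key in self.mem_row_set:
            self.mem_row_set[key].update(changes)
        else:
            self._row_set_for(key).update(key, changes)

    def read(self, key):
        if key in self.mem_row_set:
            return dict(self.mem_row_set[key])
        return self._row_set_for(key).read(key)


tablet = Tablet()
key = ("Connor", "John", "Terminator Genisys", "Jason Clarke")
tablet.insert(key, {"actor_age": 36})
tablet.flush()
tablet.update(key, {"actor_age": 37})  # hypothetical update, goes to the delta store
row_set = tablet.disk_row_sets[0]
before = (row_set.base["actor_age"][0], tablet.read(key)["actor_age"])
row_set.compact()
after = (row_set.base["actor_age"][0], row_set.deltas)
```

Before compaction the base column still holds the old value while reads already see the new one; after compaction the base column holds the new value and the delta store is empty.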

Long story short:
- 30% faster than Parquet 1.0 (TPC-H)
- 16-187 times faster than Phoenix or HBase (TPC-H again)
- hundreds of thousands of rows inserted per second on a single tablet server

TPC-H test, scale factor 100, RF 3:
- 75 nodes, each: 64 GB RAM, 12 spinning disks, 2x 6-core Xeon
- Expansion of 62 GB of data (post-replication, compactions done):
  - 570 GB in HBase (9.2x)
  - 227 GB in Kudu (3.7x)
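The expansion factors quoted above are simply the on-disk size divided by the 62 GB of input data; a quick check in Python (variable names are illustrative):

```python
raw_gb = 62
on_disk_gb = {"HBase": 570, "Kudu": 227}

# Expansion factor = on-disk size / raw input size, rounded to one decimal.
expansion = {name: round(size / raw_gb, 1) for name, size in on_disk_gb.items()}
```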

http://getkudu.io/kudu.pdf

http://getkudu.io/

http://getkudu.io/faq.html

pmm@collective-sense.com