Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John...

31
Make HTAP Real with TiFlash A TiDB native Columnar Extension

Transcript of Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John...

Page 1: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Make HTAP Real with TiFlash

A TiDB native Columnar Extension

Page 2: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

About me

● Liu Cong, 刘聪

● Technical Director, Analytical Product@PingCAP

● Previously

○ Principal Enginer@QiniuCloud

○ Technical Director@Kingsoft

● Focus on distributed system and database engine

Page 3: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

OLTP

ETL

Traditional data platform relies on complex

architecture moving data around via ETL. This

introduces maintenance cost and delay of data

arrival in data warehouse.

OLTPOLTP

NoSQL

HadoopData Lake

Analytical Database

Big DataComputeEngine

Data Warehouse

Traditional Data Platform

Page 4: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Fundamental Conflicts

● Large / batch process vs point / short access

○ Row format for OLTP

○ Columnar format for OLAP

● Workload Interference

○ A single large analytical query might cause

disaster for your OLTP workload

Page 5: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

A Popular Solution

● Use different types of databases

○ For live and fast data, use an OLTP specialized

database

○ For historical data, use Hadoop / analytical database

● Offload data via the ETL process into your Hadoop cluster

or analytical database

○ Maybe once per day

Page 6: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Good enough, really?

Page 7: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

TiFlash Extension

Page 8: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

What’s TiFlash Extension

● TiFlash is an extended analytical engine for TiDB

● Powered by columnar storage and vectorized compute engine

● Tightly integrated with TiDB

● Clear isolation of workload not impacting OLTP

● Partially based on ClickHouse with tons of modifications

● Speed up read for both TiSpark and TiDB

Page 9: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

ArchitectureSpark Cluster

TiDBTiDB

Region 1

TiKV Node 1

Store 1

Region 2

Region 3

Region 4

Region 2

TiKV Node 3

Store 3

Region 3

Region 4

Region 1

Region 4

TiKV Node 2

Store 2

Region 3

Region 2

Region 1

TiFlash Node 1TiFlash Node 2

TiFlash Extension Cluster TiKV Cluster

TiSparkWorker

TiSparkWorker

Page 10: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Columnstore vs Rowstore● Columnar Storage stores data in columns instead of rows

○ Suitable for analytical workload

■ Possible for column pruning

○ Compression made possible and further IO reduction

■ ⅕ of average storage requirement

○ Bad small random IO

■ Which is the typical workload for OLTP

● Rowstore is the classic format for databases

○ Researched and optimized for OLTP scenario for decades

○ Cumbersome in analytical use cases

Page 11: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Columnstore vs Rowstore

id name age

0962 Jane 30

7658 John 45

3589 Jim 20

5523 Susan 52

Rowstore

id

0962

7658

3589

5523

name

Jane

John

Jim

Susan

age

30

45

20

52

Columnstore

SELECT avg(age) FROM employee;

Usually you don’t read all columns in a table performing analytics.

In columnstore, you avoid unnecessary IO while you have to read them all in rowstore.

Page 12: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Raft LearnerTiFlash synchronizes data in columnstore via Raft Learner

● Strong consistency on read enabled by the Raft protocol

● Introduce almost zero overhead for the OLTP workload

○ Except the network overhead for sending extra replicas

○ Slight overhead on read (check Raft index for each region in 96 MB by

default)

○ Possible for multiple learners to speed up hot data read

Page 13: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Raft Learner

Region A

Region A Region A

TiKV

TiKV TiKV

TiFlash

R e g i o n A

Instead of connecting as a Raft Follower, regions in TiFlash act as Raft Learner.

When data is written, Raft leader does not wait for learner to finish writing. Therefore, TiFlash introduces almost no overhead replicating data.

Page 14: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Raft Learner

4

3

Raft Leader

Raft Learner

When being read, Raft Learner sends request to check the Raft log index with Leader to see if its data is up-to-date.

Page 15: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Raft Learner

4

4

Raft Leader

Raft Learner

After data catches up via Raft log, Learner serves the read request then.

Page 16: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

TiFlash is beyond columnar format

Page 17: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Scalability

● An HTAP database needs to store huge amount of data

● Scalability is very important

● TiDB relies on multi-raft for scalability

○ One command to add / remove node

○ Scaling is fully automatic

○ Smooth and painless data rebalance

● TiFlash adopts the same design

Page 18: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Isolation

● Perfect resource isolation

● Data rebalance based on the “label” mechanism

○ Dedicated nodes for TiFlash / Columnstore

○ TiFlash nodes have their own AP label

○ Rebalance between AP label nodes

● Computation Isolation is possible by nature

○ Use a different set of compute nodes

○ Read only from nodes with AP label

Page 19: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Isolation

Region 1

TiKV Node 1

Store 1

Region 2

Region 3

Region 4

Region 2

TiKV Node 3

Store 3

Region 3

Region 4

Region 1

Region 4

TiKV Node 2

Store 2

Region 3

Region 2

Region 1

TiFlash Node 1TiFlash Node 2

TiFlash Extension Cluster TiKV Cluster

Label: AP Label: TP

TiDB/

TiSpark

Peer 1 Label: TP

Peer 2 Label: TP

Peer 3 Label: TP

Peer 4 Label: AP

Region 1AP label constrained

Page 20: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Integration

● TiFlash Tightly Integrated with TiDB / TiSpark

○ TiDB / TiSpark might choose to read from either side

■ Based on cost

○ When reading TiFlash replica failed, read TiKV replica

transparently

○ Join data from both sides in a single query

Page 21: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Integration

Region 1

TiKV Node 1

Store 1

Region 2

Region 3

Region 4

Region 2

TiKV Node 3

Store 3

Region 3

Region 4

Region 1

Region 4

TiKV Node 2

Store 2

Region 3

Region 2

Region 1

TiFlash Node 1TiFlash Node 2

TiFlash Extension Cluster TiKV Cluster

TiDB/

TiSpark

SELECT AVG(s.price) FROM product p, sales sWHERE p.pid = s.pid AND p.batch_id = ‘B1328’;

Index Scan(batch_id = B1328)TableScan(price, pid)

Page 22: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

MPP Support

● TiFlash nodes form a MPP cluster by themselves

● Full computation support on MPP layer

○ Speed up TiDB since it is not MPP design

○ Speed up TiSpark by avoiding writing disk during shuffle

Page 23: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

MPP Support

TiFlash Node 1

MPP Worker

TiDB/

TiSpark

TiFlash Node 2 TiFlash Node 3

Coordinator

MPP WorkerMPP Worker

Plan Segment

TiFlash nodes exchange data and enable complex operators like distributed join.

Page 24: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Performance

● Comparable performance against vanilla Spark on Hadoop +

Parquet

○ Benchmarked with Pre-Alpha version of TiFlash + Spark

(without MPP support)

○ TPC-H 100

Page 25: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Performance

Parquet TiFlash

Page 26: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

TiDB Data Platform

Page 27: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

OLTP

ETL

Traditional data platform relies on complex

architecture moving data around via ETL. This

introduces maintenance cost and delay of data

arrival in data warehouse.

OLTPOLTP

NoSQL

HadoopData Lake

Analytical Database

Big DataComputeEngine

Data Warehouse

Traditional Data Platform

Page 28: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

OLTP

ETL

Traditional data platform relies on complex

architecture moving data around via ETL. This

introduces maintenance cost and delay of data

arrival in data warehouse.

OLTPOLTP

NoSQL

HadoopData Lake

Analytical Database

Big DataComputeEngine

Data Warehouse

TiDB Data Platform

TiDB with TiFlash Extension

Page 29: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Fundamental Change

● “What happened yesterday” vs “What’s going on right now”

● Realtime report for sales campaign and adjust price in no time

○ Risk management with up-to-date info always

○ Very fast paced replenishment based on live data and

prediction

Page 30: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Roadmap

● Beta / User POC in May, 2019

○ With columnar engine and isolation ready

■ Access only via Spark

● GA, By the end of 2019

○ Unified coprocessor layer

■ Ready for both TiDB / TiSpark

■ Cost based access path selection

○ Possibly MPP layer done

Page 31: Make HTAP Real with TiFlash - percona.com HTAP Real with... · id name age 0962 Jane 30 7658 John 45 3589 Jim 20 5523 Susan 52 Rowstore id 0962 7658 3589 5523 name Jane John Jim Susan

Thanks!

Contact us:www.pingcap.com