DELI: Deferred Lightweight Indexing on Log-structured Key-Value Stores

Yuzhe Tang, Arun Iyengar, Wei Tan, Liana Fong, Ling Liu, Balaji Palanisamy

wtan@us.ibm.com


Agenda

Introduction

DELI design
  – Online DELI
  – Offline DELI

Implementation

Performance evaluation

Conclusion


NoSQL data stores: key-based queries (Put/Get)

Put(k, v)

Get(k) → v


It is critical to enable value-based queries (ReadValue)

Put(k, v)

Get(k) → v

ReadValue(v) → {k}

SQL equivalent: SELECT <k, v> FROM {<k, v>} WHERE v == vq

NoSQL needs an index to serve ReadValue efficiently.
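To make the need for an index concrete: a store that exposes only Put/Get can answer ReadValue only with the full scan that the SQL query implies. A minimal Python sketch, assuming a dict as a stand-in for the store (KVStore and read_value_by_scan are hypothetical names, not a NoSQL API):

```python
class KVStore:
    """Toy key-value store exposing only key-based operations (hypothetical)."""

    def __init__(self):
        self._data = {}

    def put(self, k, v):
        self._data[k] = v  # Put(k, v): overwrite the value of key k

    def get(self, k):
        return self._data.get(k)  # Get(k) -> v, or None if k is absent

    def scan(self):
        return iter(self._data.items())  # full scan over all (k, v) pairs


def read_value_by_scan(store, vq):
    """ReadValue(vq) -> {k} without an index, i.e.
    SELECT k FROM store WHERE v == vq. Every pair is visited."""
    return {k for k, v in store.scan() if v == vq}


store = KVStore()
store.put("alice", "online")
store.put("bob", "offline")
store.put("carol", "online")
online_users = read_value_by_scan(store, "online")  # {'alice', 'carol'} (set)
```

The scan's cost grows linearly with the table size, regardless of how few keys match; a value index avoids exactly this.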


Challenge: write-intensive workload

Write-intensive big data (= Web 2.0): frequent data updates (k, v).

The latest data must be queried in real time:
• A social network user wants to read the latest status of friends.
• A broker needs to know the latest price of a stock.
• An online gamer needs to see other players' current status.

ReadValue therefore needs to return the latest version.


Problem formulation

Support value-based queries on log-structured key-value stores in the presence of write-intensive data.

Value-based queries must return the latest result in real time (consistent read).


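Operations in index maintenance

Base table: (key, value). Index table: (value, key).

On a data update (k, v):
• Put(k, v) into the base table
• Put(v, k) into the index table
• Get(k) → v' from the base table, to find the old value v'
• Del(v', k) from the index table, to remove the stale index entry

ReadValue reads the index table and must return the latest result.

• Index creation – Always
• Index distribution – Global
• Index maintenance – Partial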
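One common way to lay out a global index table in a sorted key-value store is to use the composite (value, key) as the row key, so that ReadValue(v) becomes a range scan over the rows prefixed by v. The sketch below assumes that layout (the exact row-key encoding DELI uses is not given here); IndexTable is a hypothetical class, and a sorted Python list stands in for the table:

```python
import bisect


class IndexTable:
    """Hypothetical index table: rows are composite (value, key) tuples kept
    in sorted order, mimicking a sorted row-key space."""

    def __init__(self):
        self._rows = []  # sorted list of (value, key) tuples

    def put(self, v, k):
        """Put(v, k): insert the row (v, k) unless it already exists."""
        i = bisect.bisect_left(self._rows, (v, k))
        if i == len(self._rows) or self._rows[i] != (v, k):
            self._rows.insert(i, (v, k))

    def delete(self, v, k):
        """Del(v, k): remove the row (v, k) if present."""
        i = bisect.bisect_left(self._rows, (v, k))
        if i < len(self._rows) and self._rows[i] == (v, k):
            del self._rows[i]

    def read_value(self, vq):
        """ReadValue(vq): range scan over rows whose value part is vq.
        (vq,) sorts before every (vq, k), so bisect finds the first match."""
        i = bisect.bisect_left(self._rows, (vq,))
        keys = []
        while i < len(self._rows) and self._rows[i][0] == vq:
            keys.append(self._rows[i][1])  # keys come out in sorted order
            i += 1
        return keys


idx = IndexTable()
idx.put("online", "alice")
idx.put("online", "carol")
idx.put("offline", "bob")
before = idx.read_value("online")  # ['alice', 'carol']
idx.delete("online", "alice")      # e.g. alice's value changed
after = idx.read_value("online")   # ['carol']
```

Because matching rows are contiguous, a lookup touches only the rows for vq instead of the whole table.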


Index maintenance goals

• Low overhead: index maintenance should add little overhead to the write-intensive workload.

• Consistent query results: a read should get the latest value.


Baseline design: update index in-place

On a data update (k, v), four operations run with synchronous execution:
• Put(k, v) → BaseTable
• Put(v, k) → IndexTable
• Get(k) → v' from BaseTable
• Del(v', k) → IndexTable

Performance issue:
– Get is slow.
– Critical: in an update-intensive workload, this slow Get impedes base-table updates.
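The baseline can be sketched as follows, assuming Python dicts as stand-ins for the HBase base and index tables (UpdateInPlaceIndexer is a hypothetical name). This sketch issues Get(k) first, since the old value v' must be read before Put(k, v) overwrites it; the counter shows that every single update pays for one Get:

```python
class UpdateInPlaceIndexer:
    """Hypothetical baseline: the index is maintained synchronously on
    every data update. Dicts/sets stand in for the HBase tables."""

    def __init__(self):
        self.base = {}       # BaseTable: key -> value
        self.index = {}      # IndexTable: value -> set of keys
        self.base_gets = 0   # number of (slow) Get(k) calls on the write path

    def update(self, k, v):
        # Get(k) -> v': read the old value before it is overwritten.
        self.base_gets += 1
        old = self.base.get(k)
        # Del(v', k): drop the stale index entry, if there was an old value.
        if old is not None:
            self.index[old].discard(k)
        # Put(k, v) into the base table.
        self.base[k] = v
        # Put(v, k) into the index table.
        self.index.setdefault(v, set()).add(k)

    def read_value(self, vq):
        # The index is always up to date, so no filtering is needed.
        return sorted(self.index.get(vq, ()))


ix = UpdateInPlaceIndexer()
ix.update("alice", "online")
ix.update("alice", "offline")
```

After the two updates, ReadValue("online") is empty and ReadValue("offline") returns alice, but both updates had to wait for a Get.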


Get is slow in log-structured NoSQL stores

• Get is much slower than Put.
• This holds not only in HBase but in many NoSQL stores.
• The reason is the design of the log-structured merge (LSM) tree, detailed next.


© 2012 IBM Corporation

Log-Structured Merge (LSM) Trees [O'Neil, Acta Informatica '96]

LSM tree = an in-memory store + several on-disk stores

• Writes (sub-ms) go to a commit log (sequential IO) and the in-memory Mem Store; updates are not in place, so writes are FAST.
• The Mem Store is periodically flushed to disk as a new on-disk store (C1, C2, C3, …), and a fresh commit log and Mem Store take over.
• Reads (~10 ms) go to the Mem Store and the on-disk stores and merge the results (random IO), so reads are SLOW.
• On-disk stores are periodically compacted: versions v1, v2, v3 of a key, spread over C1, C2, C3, are merged into a single store C1'.

Figure: write/insert speed versus read speed (each from slow to fast) for B+tree (RDB), logging, and LSM tree (HBase).
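The asymmetry between fast writes and slow reads can be sketched with a toy LSM tree. This is a simplified model, not HBase's implementation: Python dicts stand in for the Mem Store and the on-disk stores, disk_reads is a hypothetical counter of how many on-disk stores a Get probes (each standing for a random IO), and compaction here keeps only the newest version of each key (HBase can retain several).

```python
class LSMTree:
    """Toy LSM tree: one mutable Mem Store plus immutable 'on-disk' stores."""

    def __init__(self):
        self.commit_log = []   # sequential, append-only log
        self.memstore = {}     # in-memory store that absorbs all writes
        self.disk_stores = []  # immutable stores C1, C2, ..., oldest first
        self.disk_reads = 0    # on-disk stores probed by reads

    def put(self, k, v):
        self.commit_log.append((k, v))  # sequential IO
        self.memstore[k] = v            # no in-place update on disk

    def flush(self):
        # Push the Mem Store to disk as a new store; start a fresh log.
        if self.memstore:
            self.disk_stores.append(dict(self.memstore))
        self.memstore = {}
        self.commit_log = []

    def get(self, k):
        # Check the Mem Store, then every on-disk store, newest first.
        if k in self.memstore:
            return self.memstore[k]
        for store in reversed(self.disk_stores):
            self.disk_reads += 1
            if k in store:
                return store[k]
        return None

    def compact(self):
        # Merge all on-disk stores into one; newer stores overwrite older.
        merged = {}
        for store in self.disk_stores:
            merged.update(store)
        self.disk_stores = [merged] if merged else []


t = LSMTree()
for i, v in enumerate(["v1", "v2", "v3"]):
    t.put("k", v)          # versions v1, v2, v3 of key k ...
    t.put(f"other{i}", i)  # ... plus one other key per store
    t.flush()              # produces C1, C2, C3

t.disk_reads = 0
t.get("other0")            # only in C1: probes C3, C2, C1
cost_before = t.disk_reads
t.compact()                # single store C1'
t.disk_reads = 0
t.get("other0")
cost_after = t.disk_reads
```

Before compaction the lookup probes three stores; after compaction it probes one, which is why reads get cheaper once compaction has run.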


DELI: online append-only indexing and offline index repairs

On a data update:
• PutBaseTable: append-only op., synchronous execution
• PutIndexTable: append-only op., synchronous execution
• GetBaseTable + DelIndexTable (IndexRepair): expensive op., deferred execution

Why? IndexRepair (and the Get it requires) is expensive, so DELI defers it to offline hours. IndexRepair/Get is also faster in offline hours, after compaction has reduced the number of on-disk stores a read must merge.
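A back-of-the-envelope comparison makes the motivation concrete. Taking the round figures from the LSM slide (a write costs under 1 ms, a read about 10 ms), and noting that a delete in an LSM store is itself a write (it appends a delete marker), the per-update latency of the two designs is roughly:

$$T_{\text{in-place}} = \underbrace{T^{\text{base}}_{\text{Put}} + T^{\text{index}}_{\text{Put}} + T^{\text{index}}_{\text{Del}}}_{\text{3 writes, each} < 1\,\text{ms}} + \underbrace{T^{\text{base}}_{\text{Get}}}_{\approx 10\,\text{ms}} \approx 10 \text{ to } 13\ \text{ms}$$

$$T^{\text{online}}_{\text{DELI}} = T^{\text{base}}_{\text{Put}} + T^{\text{index}}_{\text{Put}} < 2\ \text{ms}$$

Under these assumed numbers the synchronous Get dominates, and moving it off the write path cuts the per-update cost by more than fivefold. These are illustrative estimates, not measured results.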


DELI implementation

Online: append-only indexing

DataUpdate(k, v):
• Put(v, k)
• Put(k, v)

Offline: deferred index repair
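The split between the online and offline paths can be sketched as follows, again with Python dicts and sets standing in for the HBase tables (DeliIndexer is a hypothetical name). Until repair runs, the index may hold stale (v', k) rows; to keep ReadValue consistent, this sketch checks each candidate key against the base table at read time, which is an assumption of the sketch rather than something stated on this slide:

```python
class DeliIndexer:
    """Hypothetical sketch: append-only index writes online,
    index repair deferred to an offline pass."""

    def __init__(self):
        self.base = {}   # BaseTable: key -> latest value
        self.index = {}  # IndexTable: value -> set of keys (may be stale)

    def update(self, k, v):
        # Online path: two appends, no Get(k) on the write path.
        self.index.setdefault(v, set()).add(k)  # Put(v, k)
        self.base[k] = v                        # Put(k, v)

    def read_value(self, vq):
        # Assumed read-time filter: keep only keys whose current value is vq.
        candidates = self.index.get(vq, set())
        return sorted(k for k in candidates if self.base.get(k) == vq)

    def repair(self):
        # Offline IndexRepair: Get(k) -> v' for every indexed key and
        # Del(v, k) for each row whose value is no longer current.
        removed = 0
        for v in list(self.index):
            keys = self.index[v]
            stale = {k for k in keys if self.base.get(k) != v}
            keys -= stale
            removed += len(stale)
            if not keys:
                del self.index[v]
        return removed


d = DeliIndexer()
d.update("alice", "online")
d.update("alice", "offline")  # leaves a stale ("online", "alice") row
rows_before = sum(len(s) for s in d.index.values())
removed = d.repair()
rows_after = sum(len(s) for s in d.index.values())
```

Both updates cost only two appends each; the stale row is paid for once, in the offline repair pass.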


Implementation: Deferred index repair (offline)

Figure: HBase compaction versus HBase compaction with DELI offline repair.
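One way to make deferred repair cheap is to piggyback on compaction: compaction already reads the versions of each key it merges, so the versions it drops identify exactly the stale index rows, and no separate Get is needed. The sketch below assumes that approach; VersionedBaseTable and compact_with_index_repair are hypothetical names, not HBase APIs, and the index is modeled as a set of (value, key) rows.

```python
class VersionedBaseTable:
    """Hypothetical base table that appends versions instead of
    overwriting, as an LSM store does before compaction."""

    def __init__(self):
        self.versions = {}  # key -> list of values, oldest first

    def put(self, k, v):
        self.versions.setdefault(k, []).append(v)

    def get(self, k):
        vs = self.versions.get(k)
        return vs[-1] if vs else None


def compact_with_index_repair(base, index):
    """Hypothetical compaction hook: keep only the newest version of each
    key and delete the index row (v', k) of every version it drops."""
    for k, vs in list(base.versions.items()):
        newest = vs[-1]
        for old in vs[:-1]:
            if old != newest:
                index.discard((old, k))  # Del(v', k), no extra Get needed
        base.versions[k] = [newest]


base = VersionedBaseTable()
index = set()
for k, v in [("alice", "online"), ("alice", "away"),
             ("alice", "offline"), ("bob", "online")]:
    base.put(k, v)      # Put(k, v): append-only
    index.add((v, k))   # Put(v, k): append-only, may leave stale rows

compact_with_index_repair(base, index)
```

After compaction only ("offline", "alice") and ("online", "bob") remain in the index, and alice keeps a single version in the base table.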


Evaluation: Setup

Platform setup:
– 20 nodes in Emulab: 1 client, 1 master, 18 slaves
– Client: generates the workload with YCSB [SoCC 2010]
– Servers: HBase/HDFS with the default configuration


Evaluation: write latency

DELI write performance:
– DELI versus update-in-place indexing and the no-index case
– Metric: average write latency


Evaluation: read-write latency

Compared against two alternatives: update-in-place indexing on HBase, and a B-Tree on MySQL.

DELI performs better in write-intensive workloads.


Conclusion

DELI: deferred lightweight indexing for log-structured NoSQL stores

Code: https://github.com/tristartom/nosql-indexing

HBase integration: https://issues.apache.org/jira/browse/HBASE-13519

Part of DELI has been included in IBM InfoSphere BigInsights: https://ibm.biz/BdXiLa

See also our other paper: Diff-Index: Differentiated Index in Distributed Log-Structured Data Stores (EDBT 2014)


Questions?

Contact: Yuzhe Tang, Assistant Professor, Syracuse University

ecs.syr.edu/faculty/yuzhe