Yuzhe Tang, Arun Iyengar, Wei Tan, Liana Fong, Ling Liu, Balaji Palanisamy
DELI: Deferred Lightweight Indexing on Log-structured Key-Value Stores
Agenda
Introduction
DELI design (Online DELI, Offline DELI)
Implementation
Performance evaluation
Conclusion
NoSQL data stores: key-based queries (Put/Get)
Get(k) → v, Put(k,v)
Critical to enable value-based queries (ReadValue)
Get(k) → v, ReadValue(v) → {k}
SQL: SELECT <k,v> FROM {<k,v>} WHERE v == vq
Put(k,v)
NoSQL needs index
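To make the difference between key-based and value-based queries concrete, here is a minimal sketch in Python. The class `ToyKVStore` and the helper `build_index` are hypothetical names invented for illustration and are not part of HBase or DELI. Without an index, ReadValue has to scan every pair; with an index table keyed by value, it becomes a single lookup.

```python
from collections import defaultdict


class ToyKVStore:
    """Hypothetical toy key-value store: Put/Get by key only."""

    def __init__(self):
        self.base = {}  # base table: key -> value

    def put(self, k, v):
        self.base[k] = v

    def get(self, k):
        return self.base.get(k)

    def read_value_by_scan(self, vq):
        # Without an index, ReadValue(vq) must inspect every pair.
        return {k for k, v in self.base.items() if v == vq}


def build_index(store):
    """Build an index table (value -> set of keys) from the base table."""
    index = defaultdict(set)
    for k, v in store.base.items():
        index[v].add(k)
    return index


store = ToyKVStore()
store.put("alice", "NY")
store.put("bob", "SF")
store.put("carol", "NY")

index = build_index(store)
# Both paths answer ReadValue("NY"); only the indexed one avoids the full scan.
scan_result = store.read_value_by_scan("NY")
index_result = index["NY"]
```

The index here is built once from a static table. Keeping it correct while the base table keeps changing is the maintenance problem the rest of the talk addresses.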
Challenge: write-intensive workload
Write-intensive big data:
Latest data need to be queried in real time.
• A social user wants to read the latest status of friends.
• A broker needs to know the latest price of a stock.
• An online gamer needs to see other players' current status.
ReadValue needs to return the latest version.
[Figure: Web 2.0 applications issuing data updates (k,v)]
Problem formulation
Support value-based queries on log-structured key-value stores in the presence of write-intensive data.
Value-based queries return the latest result in real time (consistent read).
Base table (key, value)
Index table (value, key)
ReadValue (latest)
Operations in index maintenance, for a data update (k,v):
• Put(k,v), Put(v,k)
• Get(k) → v', Del(v',k)
• Index creation - Always
• Index distribution - Global
• Index maintenance - Partial
Index maintenance goal
• Low overhead: index maintenance should add little overhead to the write-intensive workload.
• Consistent query result: reads should get the latest value.
Baseline design: update index in-place
Data update (k,v), every step executed synchronously:
• Put(k,v) → BaseTable
• Put(v,k) → IndexTable
• Get(k): v' from BaseTable
• Del(v') → IndexTable
Performance issue:
– Get is slow.
– Critical: in an update-intensive workload, this slow Get impedes base updates.
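The baseline can be sketched as follows. This is an illustrative toy rather than the authors' code: `InPlaceIndexedStore` and the `base_gets` counter are hypothetical names, and both tables are plain Python dicts. Because a dict keeps only one version per key, the sketch reads the old value before overwriting it; that ordering is an assumption of the sketch, whereas the slide lists the Get after the Puts.

```python
class InPlaceIndexedStore:
    """Hypothetical baseline: the index is repaired on every update."""

    def __init__(self):
        self.base = {}      # base table: key -> value
        self.index = {}     # index table: value -> set of keys
        self.base_gets = 0  # counts base-table reads (the slow operation)

    def update(self, k, v):
        # Get(k) -> v': read the old value on the write path.
        self.base_gets += 1
        old = self.base.get(k)
        # Put(k,v) on the base table and Put(v,k) on the index table.
        self.base[k] = v
        self.index.setdefault(v, set()).add(k)
        # Del(v',k): remove the stale index entry right away.
        if old is not None and old != v:
            keys = self.index.get(old, set())
            keys.discard(k)
            if not keys:
                self.index.pop(old, None)

    def read_value(self, v):
        return set(self.index.get(v, ()))


store = InPlaceIndexedStore()
store.update("k1", "a")
store.update("k1", "b")
stale = store.read_value("a")    # empty: the old entry was deleted in place
latest = store.read_value("b")   # contains "k1"
```

Every update pays for one base-table read, so `base_gets` grows with the number of writes. On a log-structured store that read is the expensive part.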
Get is slow in log-structured NoSQL stores
• Get is much slower than Put.
• Not only in HBase but in many NoSQL stores:
  • Due to the design of the log-structured merge tree
  • Detailed reason
© 2012 IBM Corporation
Log-Structured Merge (LSM) Trees
[Figure: LSM tree write and read paths. A write (sub-ms) goes to the commit log and the mem store in memory. A flush moves the mem store to a disk store, leaving a new commit log and mem store in memory and disk stores C1, C2, C3 on disk. A read (~10 ms) merges the memory and disk stores. A compaction merges the disk stores, combining versions v1, v2, v3 into one store C1'.]
LSM tree = an in-memory store + several on-disk stores
• Writes go to a commit log (sequential I/O) and the in-memory store: updates are not in place, so writes are FAST
• The memstore is periodically flushed to disk
• Reads go to the memory and disk stores (random I/O): SLOW
• On-disk stores are periodically compacted
[Figure: write/insert speed versus read speed, each on a slow-to-fast scale, for B+tree (RDB), logging, and LSM tree (HBase)]
[O'Neil, Acta Informatica '96]
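The read/write asymmetry can be seen in a small model. The following is a toy sketch under simplifying assumptions: `ToyLSM` is a hypothetical class, disk stores are plain dicts, and the commit log is simply cleared on flush. It is not how HBase is implemented. It shows that a Put touches only the log and the memstore, while a Get may probe the memstore plus every disk store until compaction merges them.

```python
class ToyLSM:
    """Hypothetical toy LSM tree: one memstore plus a list of disk stores."""

    def __init__(self, memstore_limit=2):
        self.commit_log = []   # sequential append-only log
        self.memstore = {}     # in-memory store
        self.disk_stores = []  # immutable stores, newest first
        self.memstore_limit = memstore_limit

    def put(self, k, v):
        # Write path: append to the log, update memory; no disk read.
        self.commit_log.append((k, v))
        self.memstore[k] = v
        if len(self.memstore) >= self.memstore_limit:
            self.flush()

    def flush(self):
        # Push the memstore to disk as a new store.
        if self.memstore:
            self.disk_stores.insert(0, dict(self.memstore))
            self.memstore = {}
            self.commit_log = []  # simplification: flushed data needs no log

    def get(self, k):
        """Return (value, number_of_stores_probed)."""
        probed = 1
        if k in self.memstore:
            return self.memstore[k], probed
        for store in self.disk_stores:  # newest first
            probed += 1
            if k in store:
                return store[k], probed
        return None, probed

    def compact(self):
        # Merge all disk stores; newer versions overwrite older ones.
        merged = {}
        for store in reversed(self.disk_stores):  # oldest first
            merged.update(store)
        self.disk_stores = [merged] if merged else []


lsm = ToyLSM(memstore_limit=2)
for k, v in [("k1", "v1"), ("a", 1), ("k1", "v2"), ("b", 2), ("k1", "v3"), ("c", 3)]:
    lsm.put(k, v)

before = lsm.get("a")   # found only in the oldest of three disk stores
lsm.compact()
after = lsm.get("a")    # a single disk store remains to probe
```

In this model the number of stores probed stands in for random disk I/O, which is why a Get becomes cheaper right after a compaction.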
DELI: online append-only indexing and offline index repairs
Data update:
• Put → BaseTable, Put → IndexTable: append-only ops, synchronous execution
• Get → BaseTable, Del → IndexTable: IndexRepair op (expensive), deferred execution
Why?
• IndexRepair (or Get) is expensive, so defer it to offline hours.
• IndexRepair/Get is faster in offline hours, after compaction.
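The split between the online and offline paths can be sketched as below. This is a minimal illustration, not the DELI implementation: `DeliStore`, `index_candidates`, and the `base_gets` counter are hypothetical names, and the tables are in-memory Python structures. The online update performs two writes and no read. The deferred repair later performs the Get and Del for every index entry.

```python
class DeliStore:
    """Hypothetical sketch: append-only indexing with deferred repair."""

    def __init__(self):
        self.base = {}      # base table: key -> latest value
        self.index = []     # index table: append-only list of (value, key)
        self.base_gets = 0  # counts base-table reads

    def update(self, k, v):
        # Online path: Put(k,v) and Put(v,k) only; no Get, no Del.
        self.base[k] = v
        self.index.append((v, k))

    def index_repair(self):
        # Offline path: the deferred Get(k) and Del(v',k) for stale entries.
        repaired = []
        for v, k in self.index:
            self.base_gets += 1
            if self.base.get(k) == v and (v, k) not in repaired:
                repaired.append((v, k))
        self.index = repaired

    def index_candidates(self, v):
        # Raw index lookup; may include stale keys until repair has run.
        return {k for val, k in self.index if val == v}


store = DeliStore()
store.update("k1", "a")
store.update("k1", "b")
reads_on_write_path = store.base_gets       # no base reads during updates
stale_before = store.index_candidates("a")  # still lists "k1" (stale)
store.index_repair()
stale_after = store.index_candidates("a")   # empty after the repair
```

Between repairs the raw index can contain stale entries. How a query filters them to return only the latest result is outside the scope of this sketch.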
DELI implementation
Online: append-only indexing
DataUpdate(k,v):
• Put(v,k)
• Put(k,v)
Offline: deferred index repair
Implementation: Deferred index repair (offline)
[Figure: HBase compaction versus HBase compaction with DELI offline repair]
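One way to picture repair riding along with compaction is sketched below. The slides only name the coupling, so the mechanism here is an assumption: while the disk stores are merged, every superseded version of a key is noticed anyway, and the matching index entry can be deleted without a separate read. The function `compact_with_repair` and the data layout (a list of dicts, oldest first, and a set of (value, key) index entries) are hypothetical.

```python
def compact_with_repair(disk_stores, index):
    """Merge disk stores (oldest first) and delete superseded index entries.

    disk_stores: list of dicts mapping key -> value, oldest store first.
    index: set of (value, key) entries, modified in place.
    Returns the single merged store.
    """
    merged = {}
    superseded = set()
    for store in disk_stores:
        for k, v in store.items():
            if k in merged and merged[k] != v:
                # An older version of k was overwritten during the merge.
                superseded.add((merged[k], k))
            merged[k] = v
    for v, k in superseded:
        # Keep the entry if the key later returned to this value.
        if merged[k] != v:
            index.discard((v, k))  # Del(v',k) on the index table
    return merged


stores = [{"k1": "v1"}, {"k1": "v2", "k2": "x"}, {"k1": "v3"}]
index = {("v1", "k1"), ("v2", "k1"), ("v3", "k1"), ("x", "k2")}
merged = compact_with_repair(stores, index)
# merged holds only the newest version of each key; index no longer
# contains the entries for v1 and v2.
```

The sketch removes only entries that were superseded within the disk stores being merged, so it never deletes an entry that is still current.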
Evaluation: Setup
Platform setup:
– 20 nodes in Emulab: 1 client, 1 master, 18 slaves
– Client: generates the workload with YCSB [SoCC 2010]
– Servers: HBase/HDFS with the default configuration
Evaluation: write latency
DELI write performance:
– DELI versus update-in-place indexing and the no-index case
– Average write latency
Evaluation: read-write latency
Compare against two alternatives: update-in-place on HBase, and B-tree on MySQL.
DELI performs better in write-intensive workloads.
Conclusion
DELI: deferred lightweight indexing for log-structured NoSQL stores
Code: https://github.com/tristartom/nosql-indexing
HBase integration: https://issues.apache.org/jira/browse/HBASE-13519
Part of it has been included in IBM InfoSphere BigInsights: https://ibm.biz/BdXiLa
Check our other paper: Diff-Index: Differentiated Index in Distributed Log-Structured Data Stores (EDBT 2014)
Questions?
Contact: Yuzhe Tang, Assistant Professor, Syracuse University
Email: [email protected]
ecs.syr.edu/faculty/yuzhe