Tasmo: Building HBase Applications From Event Streams
![Page 1: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/1.jpg)
Tasmo: Materialized Views of Event Streams using HBase
Presenters: Pete Matern, Jonathan Colt
![Page 2: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/2.jpg)
2 © Jive confidential
What’s the problem
• Joining to death at read time
• With our operational constraints of a single point of failure (single db instance)
• Can only scale up - not out
• Read load far exceeds write load
• Read every field of an object every time any field changes, to support indexing
• Read every field of an object to update one
![Page 3: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/3.jpg)
What we needed
• Joins performed at write time (materialized views)
• Horizontally scalable
• No single point of failure
• Incremental updates
• Notification of changes
• Idempotency
• Tolerance of duplicate and out-of-order input
• Front end developers work against their object model rather than HBase-specific constructs
![Page 4: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/4.jpg)
What we built: Tasmo
Stateless HA service which
• Maintains materialized views of data
• Consumes our model (declaration of input and output types)
• Notifies consumers when views change
• Replaces all our relational db usage
![Page 5: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/5.jpg)
How we consume and render our model
• Every reader of our model defines views for Tasmo to maintain
• Views contain joined/filtered data specific to point of use
• Readers of these views render output or further process the data
[Diagram: events flow into Tasmo, which takes view definitions and reads / writes views in HBase; readers read views from HBase.]
![Page 6: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/6.jpg)
How we declare our input and output (Model)
Event Declarations
Type: Content
● Subject: String
● Body: String
● Container: Reference
● Author: Reference
Type: User
● Username: String
● First Name: String
● Last Name: String
● Creation Date: Long
View Declaration
Type: Content
● Subject
● Container (Type: Folder)
  ○ Name
  ○ ModDate
● Author (Type: User)
  ○ Username
  ○ CreationDate
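To make the declarations above concrete, here is a minimal Python sketch of how event types and a view might be written down as plain data, and what one resulting view instance could look like. The dictionary layout, the `build_view_instance` helper, the fields assumed for `Folder`, and all sample values are hypothetical; the slides do not show Tasmo's actual declaration syntax.

```python
# Hypothetical declaration format: the slides do not show Tasmo's real syntax.
EVENT_DECLARATIONS = {
    "Content": {"subject": "String", "body": "String",
                "container": "Reference", "author": "Reference"},
    "User": {"username": "String", "firstName": "String",
             "lastName": "String", "creationDate": "Long"},
    # Folder is only referenced by the view on the slide; its fields are assumed.
    "Folder": {"name": "String", "modDate": "Long"},
}

VIEW_DECLARATION = {
    "root": "Content",
    "fields": ["subject"],
    "joins": {
        "container": {"type": "Folder", "fields": ["name", "modDate"]},
        "author": {"type": "User", "fields": ["username", "creationDate"]},
    },
}


def build_view_instance(view, root_id, objects):
    """Join the root object with the objects it references, keeping only declared fields.

    `objects` maps an id to a dict of field values; reference fields hold ids.
    """
    root = objects[root_id]
    instance = {name: root.get(name) for name in view["fields"]}
    for ref_field, join in view["joins"].items():
        target = objects.get(root.get(ref_field), {})
        instance[ref_field] = {name: target.get(name) for name in join["fields"]}
    return instance


# Made-up sample data for illustration only.
objects = {
    "content-1": {"subject": "Tasmo", "body": "When can we try it?",
                  "container": "folder-1", "author": "user-1"},
    "folder-1": {"name": "Talks", "modDate": 1370000000},
    "user-1": {"username": "jdoe", "firstName": "J", "lastName": "Doe",
               "creationDate": 1300000000},
}
view_instance = build_view_instance(VIEW_DECLARATION, "content-1", objects)
```

The view instance carries only the fields the view selects (the body is left out), which is the sense in which a view is specific to its point of use.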
![Page 7: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/7.jpg)
Event > Model > View > Web Page
[Diagram: a Comment event (body = “When can we try it?”) is applied by Tasmo to the model (Container, Content, Author, Comment), producing a view stored in HBase.]
![Page 8: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/8.jpg)
Web Page backed by View Instance
![Page 9: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/9.jpg)
How we notify consumers
• Consumers register for notifications on a type of view
• Applying an event to the model in Tasmo results in the set of affected view instances.
• We push the modified view instances to registered consumers
[Diagram: events flow into Tasmo, which notifies consumers: Search, Binary storage, Activity Analysis.]
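The register / apply / push flow described on this slide can be sketched as a small in-memory registry. This is an illustration under assumptions, not Tasmo's API: `ViewNotifier`, `apply_event`, the view type names, and the `views_by_object` lookup are all hypothetical stand-ins.

```python
from collections import defaultdict


class ViewNotifier:
    """Hypothetical registry: consumers subscribe per view type."""

    def __init__(self):
        self._consumers = defaultdict(list)

    def register(self, view_type, callback):
        self._consumers[view_type].append(callback)

    def publish(self, affected):
        """Push each (view_type, view_id) pair to the consumers of that view type."""
        for view_type, view_id in affected:
            for callback in self._consumers.get(view_type, []):
                callback(view_type, view_id)


def apply_event(event, views_by_object):
    """Stand-in for event application: return the view instances that include the event's object."""
    return sorted(views_by_object.get(event["objectId"], set()))


notifier = ViewNotifier()
search_queue, analysis_queue = [], []
notifier.register("ContentView", lambda t, i: search_queue.append((t, i)))
notifier.register("ActivityView", lambda t, i: analysis_queue.append((t, i)))

# Assumed lookup from an object to the view instances it participates in.
views_by_object = {
    "user-1": {("ContentView", "content-1"),
               ("ContentView", "content-2"),
               ("ActivityView", "user-1")},
}
event = {"objectId": "user-1", "username": "jdoe2"}
notifier.publish(apply_event(event, views_by_object))
```

A single user event fans out to every view instance that joins in that user, and each consumer only hears about the view types it registered for.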
![Page 10: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/10.jpg)
How we maintain search indices
• Define views of data which correspond to the index schemas
• Indexing engine registers for notifications of these view types
• Tasmo fires notifications for affected view instances per event
• Indexing engine reads the modified views, which represent complete and up to date documents for indexing.
[Diagram: events flow into Tasmo, which reads / writes HBase and notifies Search; Search reads index views from HBase.]
![Page 11: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/11.jpg)
How it works from 10,000 feet
Tasmo consumes events, consults the configuration describing joins and selects, and applies all relevant changes in each event to update the data views.
[Diagram: Tasmo writes events into value, existence, and relationship stores; it traverses relationships (scans) to join / select, then writes updates / removes to views. Concurrency and consistency are handled by retry (multiversion concurrency).]
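As a rough illustration of joining at write time, the sketch below keeps values, relationships, and views in plain dictionaries and rewrites only the views an event can affect. Everything here is an assumption made for the example (the `VIEW_FIELDS` view, the `back_refs` index, the event shape); it leaves out existence tracking, timestamps, and concurrency, and it is not how Tasmo lays data out in HBase.

```python
from collections import defaultdict

values = {}                     # object id -> {field: value}
back_refs = defaultdict(set)    # target id -> {(source id, reference field)}
views = {}                      # content id -> materialized view row

# Hypothetical view: Content joined to its author's username.
VIEW_FIELDS = {"author": ["username"]}


def materialize(content_id):
    """Recompute one view row from the stored values."""
    content = values.get(content_id, {})
    row = {"subject": content.get("subject")}
    for ref_field, fields in VIEW_FIELDS.items():
        target = values.get(content.get(ref_field), {})
        for name in fields:
            row[f"{ref_field}.{name}"] = target.get(name)
    views[content_id] = row


def apply_event(event):
    """Write values and relationships, then rewrite only the views the event can affect."""
    obj_id, fields = event["id"], event["fields"]
    stored = values.setdefault(obj_id, {})
    for ref_field in VIEW_FIELDS:
        if ref_field in fields:
            # Move the back-reference from the old target to the new one.
            back_refs[stored.get(ref_field)].discard((obj_id, ref_field))
            back_refs[fields[ref_field]].add((obj_id, ref_field))
    stored.update(fields)

    affected = {obj_id} if event["type"] == "Content" else set()
    affected |= {source for source, _ in back_refs.get(obj_id, set())}
    for content_id in sorted(affected):
        materialize(content_id)
    return sorted(affected)


first = apply_event({"type": "User", "id": "user-1", "fields": {"username": "jdoe"}})
second = apply_event({"type": "Content", "id": "content-1",
                      "fields": {"subject": "Hello", "author": "user-1"}})
third = apply_event({"type": "User", "id": "user-1", "fields": {"username": "jdoe2"}})
```

The third event changes only the user, yet the content view is rewritten because the relationship index leads back to it; readers never perform the join themselves.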
![Page 12: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/12.jpg)
Taking over time
• Snowflake id for every event - makes them unique and time orderable
• Event time is based on when the system receives an event
• Event time is used as the HBase cell timestamp - logically stale writes become no-ops
• Event time has the room to disambiguate add vs remove:
  o Snowflake ids are even numbers.
  o Snowflake is used directly for adds
  o Snowflake -1 is used for removes
  o For a given event - adds trump removes
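The timestamp scheme above can be tried out with a toy cell in which the highest timestamp wins. This is a sketch under assumptions: `TimestampedCell` only mimics reading the newest version of a cell, the `REMOVED` marker stands in for a delete, and `next_event_id` is a simple even counter rather than a real Snowflake id.

```python
import itertools

REMOVED = object()  # stand-in marker for a delete


class TimestampedCell:
    """Toy cell: only the write with the highest timestamp is visible."""

    def __init__(self):
        self.timestamp = -1
        self.value = None

    def put(self, timestamp, value):
        # A write older than what is already stored is a logical no-op.
        if timestamp >= self.timestamp:
            self.timestamp, self.value = timestamp, value


_counter = itertools.count(1)


def next_event_id():
    """Stand-in for a Snowflake id: increasing and always even."""
    return 2 * next(_counter)


def add_timestamp(event_id):
    return event_id          # adds use the id directly


def remove_timestamp(event_id):
    return event_id - 1      # removes use the odd slot just below the id


link = TimestampedCell()
older, newer = next_event_id(), next_event_id()

# Out-of-order delivery: the newer event's remove arrives before the older add.
link.put(remove_timestamp(newer), REMOVED)
link.put(add_timestamp(older), "folder-1")
after_out_of_order = link.value

# One event that both adds and removes: the add wins in either arrival order.
both = next_event_id()
link.put(add_timestamp(both), "folder-2")
link.put(remove_timestamp(both), REMOVED)
after_same_event = link.value
```

Because ids are even, the odd value one below an id is never used by any other event, so a remove sorts after every earlier event and before the add of its own event.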
![Page 13: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/13.jpg)
Concurrency Issues
• Problem: As different events add/remove relationships in parallel, we can fail to add/remove elements of views.
• Solution: Per-relationship high water marks are maintained in an HBase table. We test the per-relationship times we saw during a path traversal against the high water marks. If we detect that what we saw is stale, we retry the operation.
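The check-and-retry idea can be sketched with an in-memory stand-in for the relationship and high water mark tables. The class, the function, the order of "write the view, then check" and the `between_steps` hook (which simulates a concurrent writer) are assumptions for illustration; the slides do not describe the exact sequence Tasmo uses.

```python
class RelationshipStore:
    """In-memory stand-in for the tables holding links and their high water marks."""

    def __init__(self):
        self.links = {}        # (source id, field) -> (timestamp, target id)
        self.high_water = {}   # (source id, field) -> highest timestamp written

    def write_link(self, source_id, field, timestamp, target_id):
        key = (source_id, field)
        if timestamp >= self.high_water.get(key, -1):
            self.links[key] = (timestamp, target_id)
            self.high_water[key] = timestamp

    def read_link(self, source_id, field):
        return self.links.get((source_id, field), (-1, None))


def update_view(store, views, root_id, path, max_attempts=5, between_steps=None):
    """Traverse `path` from `root_id`, write the view, and retry if any link went stale."""
    for attempt in range(1, max_attempts + 1):
        seen = {}
        current = root_id
        for field in path:
            timestamp, target = store.read_link(current, field)
            seen[(current, field)] = timestamp
            current = target
            if current is None:
                break
        if between_steps:
            between_steps(attempt)  # hook: lets a test inject a concurrent write
        views[root_id] = current
        # Stale if any relationship we traversed has since been written with a newer time.
        stale = any(store.high_water.get(key, -1) > ts for key, ts in seen.items())
        if not stale:
            return attempt
    raise RuntimeError("view update kept losing races")


store, views = RelationshipStore(), {}
store.write_link("comment-1", "parent", 2, "content-1")
store.write_link("content-1", "author", 4, "user-1")


def concurrent_writer(attempt):
    # During the first attempt only, another event re-points the author link.
    if attempt == 1:
        store.write_link("content-1", "author", 6, "user-2")


attempts = update_view(store, views, "comment-1", ["parent", "author"],
                       between_steps=concurrent_writer)
```

The first attempt writes a view based on the old author link, detects that the high water mark has moved past the time it saw, and the second attempt converges on the current state.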
![Page 14: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/14.jpg)
Why HBase?
• Timestamp control
• Row level atomicity of changes
• Performance and proven scalability
![Page 15: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/15.jpg)
Roadmap
• Production later this year. Currently heavily used by developers at Jive.
• Looking at what work could be moved into coprocessors.
• Considering double writes into two HBase clusters for higher availability if MTTR is too high in our environment.
![Page 16: Tasmo: Building HBase Applications From Event Streams](https://reader035.fdocuments.in/reader035/viewer/2022062405/554f7243b4c905c8088b568a/html5/thumbnails/16.jpg)
Questions and Answers
Open source
https://github.com/jivesoftware/tasmo
Please Help!
[email protected]@jivesoftware.com