Give Me My Damn Report: Making NoSQL Data Accessible to the Business

@slamdata @jdegoes

John A. De Goes, CTO, SlamData Inc.

Agenda

1. The Rise of NoSQL
2. The Dark Side of NoSQL
3. Options for Reporting
   a. Extract-Transform-Load
   b. Fat Drivers
   c. Code to NoSQL APIs
   d. Native NoSQL Analytics
4. Why NoSQL Analytics Is Hard
5. NoSQL Databases: Not Equal
6. Question & Answer

The Rise of NoSQL

● Massively scalable

● Operational ease of use

● Native support for rich data structures

● Native support for heterogeneous data

● Rapid time to deployment

The Dark Side of NoSQL

The Dark Side of NoSQL: Overview

The Dark Side of NoSQL: Need for Analytics

Give Me My Damn Report!

● Ad hoc analytics

● Exploratory analytics

● Operational analytics

● Analytics dashboards

● Batch reporting

● IoT / Event analytics

The Dark Side of NoSQL: SQL Analytics

The Dark Side of NoSQL: Choices

1. ETL
2. Fat Drivers
3. Code to NoSQL APIs
4. Native NoSQL Analytics

Options for Reporting

Extract-Transform-Load: Overview

{"user_id": "[email protected]",
 "profile": {
   "name": "Mary Jane",
   "addresses": [{
     "city": "London",
     "country": "UK"
   }],
   "band_plays": {
     "Squirrel Nut Zippers": 56,
     "Red Hot Tomatoes": 19,
     "Big Bad Voodoo Daddy": 102
   }
 }
}

→ SQL / Hadoop

Extract-Transform-Load: 1. Flattening

users
  user_id            ...
  [email protected]  ...

band_plays
  user_id            band_name             play_count
  [email protected]  Squirrel Nut Zippers  56
  [email protected]  Red Hot Tomatoes      19
  [email protected]  Big Bad Voodoo Daddy  102

profiles
  profile_id  user_id            name
  1           [email protected]  Mary Jane

addresses
  profile_id  city    country
  1           London  UK
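The flattening step can be sketched in a few lines of Python. This is a minimal illustration, not any particular ETL tool: it assumes the document has already been parsed into a dict, uses a hypothetical placeholder address because the original one is not recoverable, and takes the `profile_id` surrogate key from the caller. Each nested array or map fans out into its own table, linked back to its parent by a key.

```python
from typing import Any

Row = dict[str, Any]


def flatten_user(doc: Row, profile_id: int) -> dict[str, list[Row]]:
    """Split one nested user document into rows for four relational tables.

    profile_id is a hypothetical surrogate key chosen by the caller.
    """
    user_id = doc["user_id"]
    profile = doc["profile"]
    return {
        "users": [{"user_id": user_id}],
        "profiles": [
            {"profile_id": profile_id, "user_id": user_id, "name": profile["name"]}
        ],
        # Each element of the addresses array becomes its own row.
        "addresses": [
            {"profile_id": profile_id, "city": a["city"], "country": a["country"]}
            for a in profile.get("addresses", [])
        ],
        # Each map entry becomes a row: the band name (a key) turns into a value.
        "band_plays": [
            {"user_id": user_id, "band_name": band, "play_count": plays}
            for band, plays in profile.get("band_plays", {}).items()
        ],
    }


user_doc = {
    "user_id": "mary@example.com",  # hypothetical placeholder address
    "profile": {
        "name": "Mary Jane",
        "addresses": [{"city": "London", "country": "UK"}],
        "band_plays": {
            "Squirrel Nut Zippers": 56,
            "Red Hot Tomatoes": 19,
            "Big Bad Voodoo Daddy": 102,
        },
    },
}

tables = flatten_user(user_doc, profile_id=1)
for name, rows in tables.items():
    print(name, rows)
```

One document yields six rows across four tables, and every new nested field in the source means another table or column to maintain, which is where the "brittle" grade on the ETL report card comes from.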

Extract-Transform-Load: 2. Homogenization

events
  type           user_id  genre_name  artist_name      band_name               play_count
  "band_play"    ...      NULL        NULL             "Squirrel Nut Zippers"  56
  "artist_play"  ...      NULL        "Frank Sinatra"  NULL                    19
  "genre_play"   ...      "New Age"   NULL             NULL                    102
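Homogenization can be sketched the same way: every event, whatever its shape, is projected onto one fixed column list, with `None` standing in for SQL NULL. The column list comes from the slide; the `"u1"` user ID is a hypothetical value, since the slide elides it.

```python
from typing import Any

# Column order of the single wide "events" table shown on the slide.
EVENT_COLUMNS = ["type", "user_id", "genre_name", "artist_name", "band_name", "play_count"]


def homogenize(event: dict[str, Any]) -> tuple[Any, ...]:
    """Project an event of any shape onto the fixed columns; absent fields become None (NULL)."""
    return tuple(event.get(col) for col in EVENT_COLUMNS)


# "u1" is a hypothetical user ID; the slide elides the real one.
raw_events = [
    {"type": "band_play", "user_id": "u1", "band_name": "Squirrel Nut Zippers", "play_count": 56},
    {"type": "artist_play", "user_id": "u1", "artist_name": "Frank Sinatra", "play_count": 19},
    {"type": "genre_play", "user_id": "u1", "genre_name": "New Age", "play_count": 102},
]

rows = [homogenize(e) for e in raw_events]
for row in rows:
    print(row)
```

The trade-off is a single sparse table that is easy to query, but that gains another mostly-NULL column whenever a new event type appears.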

Extract-Transform-Load: 3. Incremental ETL

1. last_modified field
2. Import changed data*

* Less relevant for Hadoop
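A minimal sketch of incremental extraction driven by a `last_modified` field, assuming every document carries that field as a `datetime` and that the pipeline remembers a "watermark" (the newest timestamp it has already imported). The function name and sample documents are hypothetical.

```python
from datetime import datetime
from typing import Any, Iterable

Doc = dict[str, Any]


def incremental_batch(docs: Iterable[Doc], watermark: datetime) -> tuple[list[Doc], datetime]:
    """Select documents changed since the last run and compute the next watermark.

    Assumes every document has a last_modified datetime (hypothetical layout).
    """
    batch = [d for d in docs if d["last_modified"] > watermark]
    # Advance the watermark to the newest change seen; keep it if nothing changed.
    new_watermark = max((d["last_modified"] for d in batch), default=watermark)
    return batch, new_watermark


docs = [
    {"_id": 1, "last_modified": datetime(2015, 2, 1)},
    {"_id": 2, "last_modified": datetime(2015, 3, 5)},
]

# First run: only _id 2 changed after the watermark.
batch, mark = incremental_batch(docs, watermark=datetime(2015, 3, 1))
print([d["_id"] for d in batch], mark)

# Second run with the advanced watermark: nothing new to import.
again, mark2 = incremental_batch(docs, watermark=mark)
```

In a real pipeline the filter would run inside the database (for example, a range query on an indexed `last_modified` field) rather than over a Python list. Note that a strict `>` can miss documents stamped with the same time just after a run, and this scheme never sees deletions, which is part of why incremental ETL is fiddly.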

Extract-Transform-Load: Tools

Extract-Transform-Load: Report Card

✗ Slow

✗ Painful

✗ Brittle

✓ Tunable Performance

✓ Unlimited Flexibility in Reporting / Analytics

Fat Drivers: Overview

Driver: Embedded SQL Engine + Real-Time ETL (Filtered Table Scan)
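To make the fat-driver idea concrete, here is a toy version in Python: the "driver" performs a filtered scan over documents, loads the matching rows into an embedded SQL engine (the standard-library `sqlite3` module, in memory), and only then runs the SQL. The document list stands in for a NoSQL cursor, and the function and table names are hypothetical. Because every query copies data into the driver first, the sketch also shows why the approach is slow and limited to small data.

```python
import sqlite3
from typing import Any, Callable, Iterable

Doc = dict[str, Any]


def fat_driver_query(
    scan: Iterable[Doc],
    keep: Callable[[Doc], bool],
    columns: list[str],
    sql: str,
) -> list[tuple[Any, ...]]:
    """Filtered scan -> temporary table in an embedded SQL engine -> run SQL."""
    conn = sqlite3.connect(":memory:")  # the embedded SQL engine
    try:
        # Column names are trusted here; they are quoted but not otherwise validated.
        col_list = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f"CREATE TABLE scan_result ({col_list})")
        placeholders = ", ".join("?" for _ in columns)
        # Real-time ETL: filter each document, then project it onto fixed columns.
        rows = (tuple(doc.get(c) for c in columns) for doc in scan if keep(doc))
        conn.executemany(f"INSERT INTO scan_result VALUES ({placeholders})", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


events = [
    {"type": "impression", "page": "index.html", "timestamp": 1},
    {"type": "impression", "page": "about.html", "timestamp": 2},
    {"type": "impression", "page": "index.html", "timestamp": 3},
    {"type": "click", "link": "http://foo.com", "timestamp": 4},
]

result = fat_driver_query(
    events,
    keep=lambda d: d["type"] == "impression",
    columns=["page", "timestamp"],
    sql="SELECT page, COUNT(*) FROM scan_result GROUP BY page ORDER BY page",
)
print(result)
```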

Fat Drivers: Approaches

Magic Config

Fat Drivers: Vendors

Fat Drivers: Report Card

✗ Slow

✗ Limited to Small Data

✗ Limited to Simple Analytics

✗ Limited to Simple Data

✓ Low Friction

✓ Flexibility in Analytics / Reporting

Code to NoSQL API: Overview

Code → CSV, HTML5/JavaScript
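Coding directly against a NoSQL API usually means a hand-written program that walks the documents, aggregates them, and writes the report out. A minimal Python sketch, assuming the documents have already been fetched into a list (in practice they would come from the database driver's cursor); the report layout and function name are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Any, Iterable


def events_by_type_csv(docs: Iterable[dict[str, Any]]) -> str:
    """Count events per type and render the result as CSV text."""
    counts = Counter(doc.get("type", "unknown") for doc in docs)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["type", "count"])
    for event_type, n in sorted(counts.items()):
        writer.writerow([event_type, n])
    return out.getvalue()


docs = [
    {"type": "click", "link": "http://foo.com", "timestamp": 123987172},
    {"type": "impression", "page": "index.html", "timestamp": 92372},
    {"type": "impression", "page": "about.html", "timestamp": 92380},
]

print(events_by_type_csv(docs), end="")
```

Every new business question means another program like this, and the field names are hard-coded, which is why the approach scores as painful and brittle.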

Code to NoSQL API: Report Card

✗ Slow

✗ Painful

✗ Brittle

✗ Performance

✓ No ETL

Native NoSQL Analytics: Overview

Native NoSQL Analytics: Tools

Categories: SQL (+/-), Visual Analytics, ETL (+/-), Native

Tools: ZoomData, Cloud 9 Charts, JSON Studio, Apache Drill, Quasar, SlamData, Impala

Native NoSQL Analytics: Report Card

✗ Immature

✗ Learning Curve

✗ Limited Choices

✓ No ETL

✓ Flexible & Fast

✓ Any Data, Anywhere

✓ Tunable Performance

Why NoSQL Analytics Is Hard

The Eight Deadly Obstacles to NoSQL Analytics

Characteristics: 1. Generic Data Model

Characteristics: 2. Isomorphic Data Model

Data:

{
  "userId": 8927524,
  "profile": {
    "name": "Mary Jane",
    "age": 29,
    "gender": "female"
  },
  "comments": [{
    "id": "F2372BAC",
    "text": "I concur.",
    "replyTo": [9817361, "F8ACD164F"],
    "time": "2015-02-03"
  }, {
    "id": "GH732AFC",
    "replyTo": [9654726, "A44124F"],
    "time": "2015-03-01"
  }]
}

SQL²:

SELECT comments[*].replyTo[*] FROM data
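As we read the slide, `[*]` in SQL² flattens an array so each element becomes a separate result, and the query path mirrors the shape of the document. The Python below emulates that path walk on a trimmed copy of the sample document; it illustrates the idea only (it is not how SlamData evaluates SQL²), and the path-list encoding is a hypothetical convention.

```python
from typing import Any


def select_path(value: Any, path: list[str]) -> list[Any]:
    """Follow a path through nested data; "*" flattens an array into its elements.

    Missing fields produce no results rather than an error.
    """
    if not path:
        return [value]
    head, rest = path[0], path[1:]
    if head == "*":
        if not isinstance(value, list):
            return []
        return [result for item in value for result in select_path(item, rest)]
    if isinstance(value, dict) and head in value:
        return select_path(value[head], rest)
    return []


data = {
    "userId": 8927524,
    "comments": [
        {"id": "F2372BAC", "replyTo": [9817361, "F8ACD164F"]},
        {"id": "GH732AFC", "replyTo": [9654726, "A44124F"]},
    ],
}

# comments[*].replyTo[*], written as a path list
print(select_path(data, ["comments", "*", "replyTo", "*"]))
```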

Characteristics: 3. Multidimensionality

Data:

{"user_id": 928347234,
 "email": null,
 "events": [
   {"impression": {
     "ts": 912348934,
     "page": "index.html"}}]}

SQL²:

SELECT user_id, [events[_] WHERE events[_].ts < 9347234 ...] AS events FROM visitors
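The point of multidimensionality is that a filter can apply to the nested array inside each row without changing how many rows come back. A Python sketch of that behavior, assuming the timestamp lives under `impression` as in the sample document (the query on the slide writes the path as `events[_].ts`). The cutoff comes from the slide; the second visitor is hypothetical, added so that one event survives the filter.

```python
from typing import Any

Doc = dict[str, Any]


def filter_nested_events(visitors: list[Doc], max_ts: int) -> list[Doc]:
    """One output row per visitor; only the nested events array is filtered."""
    results = []
    for v in visitors:
        kept = [
            e
            for e in v.get("events", [])
            # Events without an impression timestamp default to max_ts and are dropped.
            if e.get("impression", {}).get("ts", max_ts) < max_ts
        ]
        results.append({"user_id": v["user_id"], "events": kept})
    return results


visitors = [
    {"user_id": 928347234, "email": None,
     "events": [{"impression": {"ts": 912348934, "page": "index.html"}}]},
    # Hypothetical second visitor with an earlier event.
    {"user_id": 1, "email": None,
     "events": [{"impression": {"ts": 1000, "page": "about.html"}}]},
]

result = filter_nested_events(visitors, max_ts=9347234)
print(result)
```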

Characteristics: 4. Unified Schema/Data

Data:

{"user_id": "[email protected]",
 "band_plays": {
   "Squirrel Nut Zippers": 56,
   "Red Hot Tomatoes": 19,
   "Big Bad Voodoo Daddy": 102}}

SQL²:

SELECT band_plays{*:} AS artistName, SUM(band_plays{*}) AS votes
FROM music
GROUP BY band_plays{*:}
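In this document the band names are keys, so part of the "schema" is really data. As we understand the slide, `{*:}` selects a map's keys and `{*}` its values; the Python sketch below performs the equivalent grouping by hand. The addresses are hypothetical placeholders, and the second listener is hypothetical, included so the sum has something to add up.

```python
from collections import Counter
from typing import Any, Iterable


def plays_by_artist(docs: Iterable[dict[str, Any]]) -> dict[str, int]:
    """Treat map keys as data: group by band name (the key), sum play counts (the value)."""
    totals = Counter()
    for doc in docs:
        # band_plays{*:} corresponds to the keys, band_plays{*} to the values.
        for artist, plays in doc.get("band_plays", {}).items():
            totals[artist] += plays
    return dict(totals)


music = [
    {"user_id": "mary@example.com",  # hypothetical placeholder address
     "band_plays": {"Squirrel Nut Zippers": 56, "Red Hot Tomatoes": 19,
                    "Big Bad Voodoo Daddy": 102}},
    # Hypothetical second listener.
    {"user_id": "joe@example.com", "band_plays": {"Red Hot Tomatoes": 1}},
]

print(plays_by_artist(music))
```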

Characteristics: 5. Polymorphic Queries

Data:

{"type": "click",
 "link": "http://foo.com",
 "timestamp": 123987172}

{"type": "impression",
 "page": "index.html",
 "timestamp": 92372}

SQL²:

SELECT COUNT(*) AS count, timestamp FROM data GROUP BY timestamp
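A polymorphic query touches only the fields that the different document shapes have in common, so clicks and impressions can be grouped together. A Python sketch of the same grouping; the third event is hypothetical, added so one group has more than one member.

```python
from collections import Counter
from typing import Any, Iterable


def count_by_field(docs: Iterable[dict[str, Any]], field: str) -> dict[Any, int]:
    """Group documents of any shape by a field they share; other fields are ignored.

    In this sketch, documents lacking the field are grouped under None.
    """
    return dict(Counter(doc.get(field) for doc in docs))


data = [
    {"type": "click", "link": "http://foo.com", "timestamp": 123987172},
    {"type": "impression", "page": "index.html", "timestamp": 92372},
    # Hypothetical third event sharing a timestamp with the impression.
    {"type": "click", "link": "http://bar.com", "timestamp": 92372},
]

print(count_by_field(data, "timestamp"))
```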

Characteristics: 6. Post-Relational

Data:

{"name": "John Doe",
 "blog_posts": [
   {"post_id": "89934"},
   {"post_id": "92371"}
 ]}

SQL²:

SELECT authors.name, posts.title
FROM authors
JOIN posts ON authors.blog_posts[*].post_id = posts._id
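Post-relational here means the join key lives inside an array. The Python sketch below unnests `blog_posts[*].post_id` and matches each ID against a hash index of posts. The `posts` collection and its titles are hypothetical, since the slide shows only the author document.

```python
from typing import Any

Doc = dict[str, Any]


def join_authors_posts(authors: list[Doc], posts: list[Doc]) -> list[tuple[str, str]]:
    """Join each author to every post whose _id appears in the author's blog_posts array."""
    posts_by_id = {p["_id"]: p for p in posts}  # hash index on the join key
    rows = []
    for author in authors:
        # Unnest the array so each referenced post_id is matched on its own.
        for ref in author.get("blog_posts", []):
            post = posts_by_id.get(ref["post_id"])
            if post is not None:  # inner join: dangling references produce no row
                rows.append((author["name"], post["title"]))
    return rows


authors = [{"name": "John Doe",
            "blog_posts": [{"post_id": "89934"}, {"post_id": "92371"}]}]
# Hypothetical posts collection; the titles are invented for illustration.
posts = [{"_id": "89934", "title": "First post"},
         {"_id": "92371", "title": "Second post"}]

print(join_authors_posts(authors, posts))
```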

Characteristics: 7. Runtime Type Identification & Conversion

Data:

{"email": ["[email protected]",
           "[email protected]"]}

{"email": {
   "home": "[email protected]",
   "work": "[email protected]"}}

SQL²:

SELECT
  CASE TYPEOF email
    -- old format: email stored in 2nd element
    WHEN 'array' THEN email[1]
    -- new format:
    WHEN 'map' THEN email.work
    ELSE email
  END AS email
FROM users
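The same runtime type dispatch can be sketched in Python with `isinstance`, which is roughly what the `CASE TYPEOF` query does per document. The addresses are hypothetical placeholders, and returning `None` for an array shorter than two elements is an assumption of this sketch, not documented SQL² behavior.

```python
from typing import Any


def normalize_email(email: Any) -> Any:
    """Mirror the CASE TYPEOF query: branch on the runtime type of the value."""
    if isinstance(email, list):
        # Old format: the wanted address is the second element (index 1).
        return email[1] if len(email) > 1 else None
    if isinstance(email, dict):
        # New format: addresses are labeled, so pick the "work" entry.
        return email.get("work")
    # Anything else (e.g. a plain string) passes through unchanged.
    return email


users = [
    {"email": ["home@example.com", "work@example.com"]},
    {"email": {"home": "home@example.com", "work": "work@example.com"}},
    {"email": "solo@example.com"},
]

print([normalize_email(u["email"]) for u in users])
```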

Characteristics: 8. Structural Pattern Matching

Data:

{"user_id": "[email protected]",
 "events": [{"type": "purchase",
             "timestamp": 12392342,
             "order_id": "2ffa34aa"},
            {"type": "click",
             "timestamp": 92327123,
             "link": "http://foo.com"}]}

SQL²:

SELECT
  CASE events
    WHEN […, e1, e2, …] THEN
      e1.timestamp - e2.timestamp
  END AS delta
FROM users

NoSQL Databases: Not Equal

NoSQL Databases: Not Equal

Desired Characteristics

1. Dual Operations & Analytics
2. In-Database Analytics
3. General-Purpose Analytics
4. Native Report Tooling

NoSQL Databases: Not Equal

Couchbase

✓ Dual Operations / Analytics

✓ In-Database Analytics

✓ General-Purpose Analytics

✗ Native Report Tooling

Best Reporting Option: Fat Drivers

Runner-Up: Code to NoSQL APIs

NoSQL Databases: Not Equal

MarkLogic

✓ Dual Operations / Analytics

✓ In-Database Analytics

✓ General-Purpose Analytics

✗ Native Report Tooling

Best Reporting Option: ETL

Runner-Up: Code to NoSQL APIs

NoSQL Databases: Not Equal

MongoDB

✓ Dual Operations / Analytics*

✓ In-Database Analytics

✗ General-Purpose Analytics

✓ Native Report Tooling

Best Reporting Option: Native NoSQL Analytics

Runner-Up: Code to NoSQL APIs

* Further maturation needed

NoSQL Databases: Not Equal

Elasticsearch

✓ Dual Operations / Analytics

✓ In-Database Analytics

✗ General-Purpose Analytics

✗ Native Report Tooling

Best Reporting Option: Code to NoSQL APIs

Runner-Up: ETL to Hadoop

NoSQL Databases: Not Equal

Cassandra

✗ Dual Operations / Analytics*

✗ In-Database Analytics*

✗ General-Purpose Analytics

✗ Native Report Tooling

Best Reporting Option: ETL

Runner-Up: Code to NoSQL APIs*

* Real-time analytics

The End

Questions?