BA4All Sweden: State of the Art in In-Memory Analytics

Post on 08-Apr-2017




What is the Present State of the Art Of In-Memory Analytics?

Timo Elliott, Innovation Evangelist, timoelliott.com

Disclaimer

“I think you’ll find it’s a bit more complicated than that.”

A Bit of History

LEO: Lyons Electronic Office, 1951

Sixty-four 5ft-long mercury tubes, each weighing half a ton, were used to provide a massive 8.75 Kb of memory (i.e. one hundred-thousandth of today’s entry-level iPhone).

1980s – first in-memory BI tools

Usefulness limited by the high cost of memory and the limitations of 16-bit memory addressing

640 KB maximum memory

1995: Windows 95 & 32-bit Architectures

QlikView, TimesTen, and others take advantage of new 32-bit memory addressing to provide in-memory analytics

Complex Event Processing

Sensor readings: tens of thousands per second

Virtually no useful information in a single isolated event

History: e.g. compare variance of trends across multiple sensors against historical norms

Event window: e.g. 30 min

Alert

Extracting insight from events
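The windowed comparison described above can be sketched as a small continuous query. This is a minimal illustration, assuming in-order events, a per-sensor historical variance supplied up front, and an alert threshold of three times that norm; the class name, threshold, and readings are hypothetical and do not reflect any particular CEP product.

```python
from collections import deque
from statistics import pvariance


class WindowVarianceMonitor:
    """Hypothetical CEP-style continuous query over sensor readings.

    Keeps a sliding time window per sensor and raises an alert when the
    variance inside the window exceeds `factor` times the historical norm.
    """

    def __init__(self, window_seconds, historical_variance, factor=3.0):
        self.window_seconds = window_seconds
        self.historical_variance = historical_variance  # sensor_id -> variance
        self.factor = factor
        self.windows = {}  # sensor_id -> deque of (timestamp, value)

    def on_event(self, sensor_id, timestamp, value):
        window = self.windows.setdefault(sensor_id, deque())
        window.append((timestamp, value))
        # Evict readings older than the window (assumes events arrive in order).
        while window and window[0][0] <= timestamp - self.window_seconds:
            window.popleft()
        if len(window) < 2:
            return None  # a single isolated reading carries no trend information
        variance = pvariance([v for _, v in window])
        norm = self.historical_variance[sensor_id]
        if variance > self.factor * norm:
            return f"ALERT {sensor_id}: window variance {variance:.2f} vs norm {norm:.2f}"
        return None


# Usage: a 30-minute window, as on the slide; the norm and readings are made up.
monitor = WindowVarianceMonitor(window_seconds=30 * 60, historical_variance={"s1": 1.0})
for t, reading in [(0, 10.0), (60, 10.5), (120, 9.5), (180, 20.0)]:
    alert = monitor.on_event("s1", t, reading)
    if alert:
        print(alert)
```

No single reading triggers anything; only the spread of readings inside the window, compared against history, produces the alert.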

Complex Event Processing

Traditional BI: “How many fraudulent credit card transactions occurred last week in Madrid?”

Complex Event Processing: “When three credit card authorizations for the same card occur in any five-second window, deny the requests and check for fraud.”
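The fraud rule above can be expressed as a per-event check rather than an after-the-fact query. The sketch below assumes events arrive in timestamp order and treats the window as the five seconds up to and including the current authorization; `CardVelocityRule` and its parameters are hypothetical names, not the API of a real CEP engine.

```python
from collections import defaultdict, deque


class CardVelocityRule:
    """Hypothetical continuous query: when `limit` authorizations for the same
    card arrive within `window_seconds`, deny and flag the card for a fraud check."""

    def __init__(self, window_seconds=5.0, limit=3):
        self.window_seconds = window_seconds
        self.limit = limit
        self.recent = defaultdict(deque)  # card_id -> timestamps inside the window
        self.flagged = set()  # cards queued for a fraud check

    def authorize(self, card_id, timestamp):
        times = self.recent[card_id]
        times.append(timestamp)
        # Keep only authorizations from the last `window_seconds` (in-order events assumed).
        while timestamp - times[0] >= self.window_seconds:
            times.popleft()
        if len(times) >= self.limit:
            self.flagged.add(card_id)
            return "DENY"
        return "APPROVE"


rule = CardVelocityRule()
print([rule.authorize("card-A", t) for t in (0.0, 1.0, 2.0)])
```

Each decision is made as the event arrives, using only the small amount of state held in memory for that card.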

Continuous Queries

In-Memory and The Internet of Things

[Diagram: input streams (sensors, messages, transactions, market data, clicks, other data storage) connect through adapters to a CEP engine and its studio, which feed alerts, dashboards, applications, and reporting.]

“Traditional” Business Intelligence

[Diagram: data is copied from business applications into an operational data store, moved by ETL into a data warehouse with indexes, aggregates, and data marts, and queried through a calculation engine by business intelligence tools. Labels: Slow, Painful, Expensive.]

It’s Like An Onion…

The more layers there are, the more it makes you cry…

What Was The Problem?

Slow Disks & CPUs: I/O Bottleneck

Expensive Memory

Optimized for Transactions: BI Is an Afterthought

30-Year-Old Database Design Principles

Why Talk About In-Memory?

Analysts Recommend In-Memory


“An in-memory data platform offers more than performance benefits”

“Recommendations: Invest in an in-memory data platform to gain competitive edge”

“In-Memory Database Is Gaining Momentum Across All Use Cases”

“In-Memory Delivers Extreme Performance And Scalability”

“In-Memory Data Platform Is No Longer An Option: It’s A Necessity!”

Companies Like Yours Are Implementing In-Memory

32% run in-memory databases at their location today

75% expect to expand their in-memory use in the next 3 years

Top Benefits

More rapid deployments: 21%

Greater flexibility: 25%

Faster response times / less latency: 88%

Top Uses

IT operations: 25%

Core business functions: 42%

Analytics: 58%

Source: 2014 DBTA survey of IT and data managers

Database vendors are investing in in-memory

The Forrester Wave: In-Memory Database Platforms, Q3 2015

All Analytics Vendors Now Support In-Memory To Some Extent

Oracle Database In-Memory Option: “The Oracle Database In-Memory option dramatically accelerates the performance of analytic queries by storing data in a highly optimized columnar in-memory format.”

Microsoft SQL Server In-Memory OLTP: “When data lives totally in memory, we can use much, much simpler data structures. When a table is declared memory-optimized, all of its records live in memory.”

DB2 with BLU Acceleration: “IBM DB2 with BLU Acceleration speeds analytics and reporting using dynamic in-memory columnar technologies. In-memory columnar technologies provide an extremely efficient way to scan and find relevant data.”

Qlik: “In-memory indexing automatically builds and maintains all data relationships from multiple sources for unrestricted exploration”

SAP HANA: “A good example of a modern in-memory database technology is SAP's HANA platform.”

Teradata: “Teradata uses a hybrid approach to in-memory that intelligently puts the right data in memory to deliver high-speed in-memory performance at a fraction of the cost of putting all data in memory.”

Tableau: “The Data Engine is a high-performing analytics database on your PC. It has the speed benefits of traditional in-memory solutions without the limitations that your data must fit in memory.”

Spark: “Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.”

What Is In-Memory? And Why Now?

What Is In-Memory?

Data access times of various storage types relative to RAM (logarithmic scale)

RAM is 300,000 times faster than hard disks

CPU register is 61 million times faster than hard disks

In-Memory Databases vs. Caching

“Much of the work that is done by a conventional, disk-optimized RDBMS is done under the assumption that data primarily resides on disk. Even when a disk-based RDBMS has been configured to hold all of its data in main memory, its performance is hobbled by assumptions of disk-based data residency. When the assumption of disk-residency is removed, complexity is dramatically reduced.”

- Oracle TimesTen Overview

In-Memory Computing Costs have Plummeted

Turning Torso: 190 m

Cost of 1 MB of memory in 2000: ≈ $1

Cost of 1 MB of memory today: ≈ ½ cent

75 cm

And shrinking, and shrinking, and shrinking…

IKEA MICKE skrivbord (desk): 399 kr

Prices Continue to Slide

DRAM production costs drop by 30% every 12 months
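As a worked check on the figures above: going from ≈$1 to ≈½ cent per MB is a 200x drop, and a steady 30% annual decline compounds to that in roughly 15 years, about the span from 2000 to the mid-2010s. The sketch assumes the production-cost decline passes straight through to price at a constant rate, which real DRAM markets do not guarantee; the function names are hypothetical.

```python
import math


def cost_after(years, start_cost, annual_decline=0.30):
    """Cost after `years` of compounding declines (30% per year, per the slide)."""
    return start_cost * (1 - annual_decline) ** years


def years_to_fall_by(factor, annual_decline=0.30):
    """Years needed for the cost to fall by `factor` at a constant annual decline."""
    return math.log(factor) / -math.log(1 - annual_decline)


# From ≈$1 per MB (2000) to ≈½ cent per MB is a 200x drop.
drop = 1.00 / 0.005
print(f"{drop:.0f}x drop takes about {years_to_fall_by(drop):.1f} years at 30%/year")
print(f"After 10 years, $1 of memory costs about ${cost_after(10, 1.00):.3f}")
```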

In-Memory Computing

[Diagram: the same pipeline (business applications, data copy, operational data store, ETL, data warehouse with indexes, aggregates, and data marts, calculation engine, business intelligence query and results) with in-memory computing.]

Up to 1,000x faster

No optimizations required

Row vs. Column Databases

My Filing System: row-based

My Wife’s Filing System: column-based

[Images: a row-based and a column-based data warehouse]
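The filing-system analogy maps onto two memory layouts. This sketch uses hypothetical sales records and plain Python lists; a real column store keeps each column in contiguous, typed memory, which is what makes scanning a single column cheap.

```python
# Hypothetical sales records used only for illustration.
rows = [
    {"region": "North", "product": "A", "revenue": 120},
    {"region": "South", "product": "B", "revenue": 80},
    {"region": "North", "product": "B", "revenue": 50},
    {"region": "South", "product": "A", "revenue": 150},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


def total_from_rows(records, field):
    # Row layout: every whole record is visited just to read one field.
    return sum(record[field] for record in records)


def total_from_columns(columns, field):
    # Column layout: only the requested column is scanned; the others are never touched.
    return sum(columns[field])


columns = to_columns(rows)
print(total_from_rows(rows, "revenue"), total_from_columns(columns, "revenue"))
```

Both layouts give the same answer; the difference is how much data an analytic query has to read to get it.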

Column Databases

[Diagram: business applications, data copy, operational data store, ETL, data warehouse, calculation engine, and business intelligence query and results.]

Up to 1,000x faster

More data in less space
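“More data in less space” comes largely from the fact that a column holds many repeated values side by side. The sketch below shows two generic techniques, dictionary encoding followed by run-length encoding, on a hypothetical country-code column; it illustrates the idea and is not the compression scheme of any specific product.

```python
def dictionary_encode(column):
    """Replace each value with a small integer code into a sorted dictionary."""
    dictionary = sorted(set(column))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[value] for value in column]


def run_length_encode(codes):
    """Collapse consecutive identical codes into (code, count) runs."""
    runs = []
    for code in codes:
        if runs and runs[-1][0] == code:
            runs[-1][1] += 1
        else:
            runs.append([code, 1])
    return [tuple(run) for run in runs]


def decode(dictionary, runs):
    """Expand runs back into the original column values."""
    return [dictionary[code] for code, count in runs for _ in range(count)]


column = ["SE"] * 4 + ["NO"] * 3 + ["SE"] * 2
dictionary, codes = dictionary_encode(column)
runs = run_length_encode(codes)
print(f"{len(column)} values stored as {len(runs)} runs over a {len(dictionary)}-entry dictionary")
```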

Massively Parallel Systems

E.g. Netezza technology now part of IBM PureSystems

E.g. Greenplum, now part of EMC
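A massively parallel system splits the data across nodes, has each node compute a partial result, and combines the partials. In this sketch, threads stand in for nodes and the partial aggregate is a (sum, count) pair; Python's global interpreter lock means it shows the shape of the computation rather than a real speedup, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(partition):
    """Work done on one node: a local (sum, count) for its slice of the data."""
    return sum(partition), len(partition)


def parallel_average(values, nodes=4):
    """Scatter the data across `nodes` workers, then combine their partial results."""
    if not values:
        raise ValueError("no data to aggregate")
    size = -(-len(values) // nodes)  # ceiling division, so every value lands in a partition
    partitions = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(parallel_average(list(range(1, 101))))  # 50.5
```

Returning (sum, count) rather than a per-node average matters: averaging the node averages would be wrong whenever partitions differ in size.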

Column Stores, Compression, and Parallel Processing

E.g. DB2 with BLU acceleration

“In-Chip” Processing

E.g. SiSense

Vector-based instructions

Cache-optimized

Decompression

Close collaboration between in-memory software vendors and chip developers (e.g. SAP & Intel Haswell)

[Diagram: business applications, data copy, operational data store, ETL, data warehouse, calculation engine, and business intelligence query and results, running on massively parallel hardware.]

Up to 1,000x faster

Optimized for hardware

In-Database Processing

E.g. SAS & Teradata

Move Processing to the Data
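“Move processing to the data” can be illustrated with Python's built-in `sqlite3` module, using an in-memory database as a stand-in for an analytic database (an assumption; this is not the SAS or Teradata interface). Both functions compute the same per-region averages, but the pushed-down version returns one row per region instead of every row.

```python
import sqlite3


def make_sales_db():
    """Hypothetical sales table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("North", 100.0), ("North", 200.0), ("North", 300.0),
         ("South", 50.0), ("South", 70.0), ("South", 90.0)],
    )
    return conn


def average_in_client(conn):
    """Ship every row to the application, then aggregate there."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    sums, counts = {}, {}
    for region, amount in rows:
        sums[region] = sums.get(region, 0.0) + amount
        counts[region] = counts.get(region, 0) + 1
    return {region: sums[region] / counts[region] for region in sums}, len(rows)


def average_in_database(conn):
    """Push the aggregation to the database; only one row per region comes back."""
    rows = conn.execute("SELECT region, AVG(amount) FROM sales GROUP BY region").fetchall()
    return dict(rows), len(rows)


conn = make_sales_db()
print(average_in_client(conn))
print(average_in_database(conn))
```

With six rows the difference is trivial; with billions of rows, the rows transferred column of that comparison is where the “less traffic” benefit comes from.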

[Diagram: processing engines (operational/OLTP, analytics/OLAP, planning, predictive, text, search, spatial) over relational stores (row-based and columnar), ETL and data quality, a document store, and an object graph store.]

[Diagram: business applications, data copy, operational data store, ETL, and a data warehouse on an analytic appliance with in-database analytics and a calculation engine, serving business intelligence queries and results.]

Up to 1,000x faster

Push processing down to dedicated hardware, less traffic

Real-Time Data

[Diagram: business applications, data copy, operational data store, ETL, analytic appliance, business intelligence.]

Real-time replication: why have a separate operational data store?

Transactions

ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.

ACID compliance
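Atomicity, the A in ACID, is easy to see with a transfer between two accounts. This sketch uses Python's `sqlite3` module with a hypothetical `accounts` table: the credit runs first, and when the debit would drive a balance negative, the CHECK constraint fails and the whole transaction rolls back, credit included.

```python
import sqlite3


def make_accounts_db():
    """Hypothetical accounts table; balances may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, source, target, amount):
    """Move money atomically: either both updates persist or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, target))
            # Debiting below zero violates the CHECK constraint and raises IntegrityError.
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = make_accounts_db()
print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "bob", "alice", 500), balances(conn))
```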

In-Memory Enterprise Applications

E.g. Microsoft SQL Server In-Memory OLTP

In-Memory Enterprise Applications

E.g. SAP S/4HANA

Hybrid Transactional Analytical Processing

[Diagram: business applications and business intelligence on an analytic appliance; labels: Copy, Data.]

Use a single platform for both analytics and applications
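The single-platform idea can be sketched with one in-memory `sqlite3` table that takes transactional inserts and answers analytic queries directly, with no copy or ETL step between them. SQLite is a row store, not a hybrid transactional/analytical engine, so this only illustrates the “one copy of the data” point; the class and table names are hypothetical.

```python
import sqlite3


class SingleStore:
    """One in-memory table serves both transactional writes and analytic reads."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
        )

    def record_order(self, customer, amount):
        # Transactional side: a small ACID write.
        with self.conn:
            self.conn.execute(
                "INSERT INTO orders (customer, amount) VALUES (?, ?)", (customer, amount)
            )

    def revenue_by_customer(self):
        # Analytic side: aggregate the same rows, with no copy or ETL step in between.
        return dict(self.conn.execute(
            "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
        ))


store = SingleStore()
store.record_order("acme", 100.0)
print(store.revenue_by_customer())
```

Because there is no replica, the analytic answer reflects the latest committed order immediately.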

Virtuous Circle of Technology

In-Memory

Columnar Databases

Hardware Acceleration

Calculation Engine

Columnar storage increases the amount of data that can be stored in limited memory (compared to disk)

Column databases enable easier parallelization of queries

In-memory processing gives more time for relatively slow updates to column data

In-memory allows sophisticated calculations in real time

Hardware acceleration makes sophisticated calculations possible

Each technology works well on its own, but combining them all is the real opportunity: it provides all of the upside benefits while mitigating the downsides

Apache Spark

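[Diagram: Hadoop V1 runs successive MAP and Reduce stages that read from and write to HDFS; Spark chains transformations such as map(), join(), and cache() over data from multiple sources (e.g. Data Source 2).]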
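The contrast in the diagram is about where intermediate results live between stages. This plain-Python sketch (not the Hadoop or PySpark API) runs the same word count twice: once writing the map output to a file before the reduce stage reads it back, the way MapReduce materializes results between jobs, and once keeping it in memory, as Spark does when it chains transformations and `cache()`s data. The function names are hypothetical.

```python
import json
import os
import tempfile


def map_stage(lines):
    """Emit (word, 1) pairs, like a word-count mapper."""
    return [(word, 1) for line in lines for word in line.split()]


def reduce_stage(pairs):
    """Sum the counts for each word."""
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts


def run_through_disk(lines):
    """MapReduce-style: the map output is written out before the reduce reads it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "map_output.json")
        with open(path, "w") as f:
            json.dump(map_stage(lines), f)
        with open(path) as f:
            pairs = json.load(f)  # pairs come back as [word, n] lists
    return reduce_stage(pairs)


def run_in_memory(lines):
    """Spark-style: the intermediate result stays in memory between stages."""
    return reduce_stage(map_stage(lines))


lines = ["in memory", "in chip", "memory"]
print(run_through_disk(lines) == run_in_memory(lines))
```

The results are identical; what changes is the serialization and I/O paid at every stage boundary, which multiplies in iterative workloads.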

Lots of Support for Spark

[Diagram: Spark running on YARN over HDFS, alongside other apps and files.]

HANA-Spark Adapter for improved performance between distributed systems

Compiled queries enable applications & data analysis to work more efficiently across nodes

Familiar OLAP experience on Hadoop to derive business insights from big data such as drill-down into HDFS data

[Diagram labels: Compiled Queries, Spark Adapter, Drill Downs; SAP HANA in-memory platform (Application Services, Database Services, Integration Services, Processing Services); Vora and Spark with a Spark in-memory store; HANA-Spark Adaptor; HANA Smart Data Access, UDFs, Others.]

Extensive programming support for Scala, Python, C, C++, R, and Java allows data scientists to use their tool of choice.

Enable data scientists and developers who prefer SparkR or Spark ML to easily mash up corporate data with Hadoop/Spark data

Optionally, leverage HANA’s multiple data processing engines for developing new insights from business and contextual data.

Spark Extensions

SAP HANA Vora

Persistence & Failover

Next-Generation Chips Are On Their Way

NVM: non-volatile memory

Scale Up

Directly addressable memory:

16-bit: 64 kilobytes

32-bit: 4 gigabytes (65,536x more)

64-bit: 16 exabytes (4,294,967,296x more)
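The addressable-memory sizes follow from 2^n bytes for an n-bit address. This sketch assumes a flat, byte-addressed space (it ignores segmented schemes such as the 640 KB DOS limit, and the fact that real 64-bit CPUs implement fewer physical address bits); the slide's kilobytes, gigabytes, and exabytes are treated as binary units.

```python
UNITS = ["bytes", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def addressable_bytes(address_bits):
    """Bytes reachable with a flat, byte-addressed space of `address_bits` bits."""
    return 2 ** address_bits


def human(n_bytes):
    """Express an exact power-of-two size in the largest whole binary unit."""
    unit = 0
    while n_bytes >= 1024 and n_bytes % 1024 == 0 and unit < len(UNITS) - 1:
        n_bytes //= 1024
        unit += 1
    return f"{n_bytes} {UNITS[unit]}"


for bits in (16, 32, 64):
    print(f"{bits}-bit: {human(addressable_bytes(bits))}")
print(addressable_bytes(32) // addressable_bytes(16))  # 65536
print(addressable_bytes(64) // addressable_bytes(32))  # 4294967296
```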

What About Scale?

There are now systems with more than half a petabyte of memory, and growing…

Balancing Data Temperature and Costs

Hot: data is accessed frequently

Warm: data is not accessed frequently

Cold: data is only accessed sporadically

[Chart axes: volume of data; performance (and direct cost)]

Many different solutions possible
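One simple way to act on data temperature is to classify each data set by how often it is read. The thresholds and storage suggestions below are hypothetical assumptions; the slide only defines the tiers by access frequency, and, as it notes, many different solutions are possible.

```python
# Hypothetical thresholds; real systems tune these per workload and cost target.
HOT_MIN_ACCESSES_PER_DAY = 100
WARM_MIN_ACCESSES_PER_DAY = 1


def tier_for(accesses_per_day):
    """Classify a data set by how often it is read."""
    if accesses_per_day >= HOT_MIN_ACCESSES_PER_DAY:
        return "hot"   # keep in memory: fastest, highest cost per GB
    if accesses_per_day >= WARM_MIN_ACCESSES_PER_DAY:
        return "warm"  # e.g. disk or flash near the in-memory system (assumption)
    return "cold"      # e.g. cheap bulk storage (assumption)


def plan_placement(access_stats):
    """Map each data set name to a temperature tier."""
    return {name: tier_for(rate) for name, rate in access_stats.items()}


print(plan_placement({"orders_2017": 5000, "orders_2014": 3, "orders_2005": 0.01}))
```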

What Type of In-Memory Is The Right One?

 

Complex ROI calculations

Data volumes

Relative costs (?)

Cost of storage

Value of speed

Value of agility

Fast-Moving Market

Hybrid vs. Pure In-Memory Tradeoffs: data duplication vs. single source

Legacy + In Memory Approach

ad hoc: made or done without planning because of an immediate need. (Merriam-Webster dictionary)

[Diagram: a query is dispatched across current data and stale, duplicated data, and the results are merged (DISPATCH/MERGE).]

Unpredictable response times; responses based on obsolete data

Pure In-Memory Approach

[Diagram: a query selects all data from one memory store holding current data and returns the results.]

Real-time responses on current data

Tradeoffs: replicated vs. real-time; unpredictable response times vs. consistent response times

Top Benefits

Speed

“If things seem under control, you’re just not going fast enough.”

- Mario Andretti

Real-Time Operations

Instead of analyzing the shards of glass after the accident, what if you could catch the vase BEFORE it hit the ground?

Agility (Speed of Change)

Simplification = Lower Costs

“In-memory changes the cost equation through simplification.

It can help save costs on hardware and software, as well as reduce labor required for administration and development needs.

Based on a composite cost model, an in-memory platform can save an organization 37% across hardware, software, and labor costs, depending on various factors.”

Lower Costs

“Don’t let somebody say to you we can’t go in-memory because it’s so much more money. Acquisition costs may be higher. If you calculate out a TCO, it’s going to be less.”

Donald Feinberg, Gartner

The price of light… …is less than the cost of darkness

ROI = Return On Ignorance?

New, Simpler Infrastructures and Business Models

Weissbeerger Beverage Analytics

Conclusion

Myths & Facts

Myth: It’s a niche technology to run analytics faster
Fact: Entire industries (SaaS, social networks, financial trading, online gaming) would not exist as we know them today without in-memory computing

Myth: Only for deep-pocketed organizations
Fact: The main users of in-memory analytics are SMBs

Myth: New and unproven
Fact: It has been around since the late 1990s

Myth: Small number of in-memory vendors
Fact: More than 50 software vendors deliver in-memory technology

Business Impact of In-Memory Computing

• Reducing application running costs by offloading databases/legacy applications
• Improving transactional application performance
• Enabling horizontal, elastic scalability (scale up/down)
• Boosting response time in analytical applications
• Low-latency (<1 microsecond) application messaging
• Dramatically shortening batch process execution time
• Enabling real-time, "self-service" business intelligence and unconstrained data exploration
• Detecting correlations/patterns across millions of events in "a blink of an eye"
• Supporting "big data" (big data needs big memory)
• Running transactional and analytical applications on the same physical dataset

Opportunities: Run the business, Grow the business, Transform the business

[Chart axis: Business Impact]

In-Memory Changes Everything

“In-memory computing will have a long-term, disruptive impact by radically changing users’ expectations, application design principles, products’ architecture and vendors’ strategy.”

- Gartner

Thank you!

telliott@timoelliott.com