The New Frontier: Optimizing Big Data Exploration

Grab some coffee and enjoy the pre-show banter before the top of the hour!

Transcript of The New Frontier: Optimizing Big Data Exploration

The Briefing Room

The New Frontier: Optimizing Big Data Exploration

Twitter Tag: #briefr

Welcome

Host: Eric Kavanagh

@eric_kavanagh

Mission

•  Reveal the essential characteristics of enterprise software, good and bad
•  Provide a forum for detailed analysis of today’s innovative technologies
•  Give vendors a chance to explain their product to savvy analysts
•  Allow audience members to pose serious questions... and get answers!

Topics

This Month: BIG DATA

March: CLOUD

April: BIG DATA

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

The Age of Exploration

The Age of DATA

Analyst: Robin Bloor

Robin Bloor is Chief Analyst at The Bloor Group

@robinbloor

Cirro

•  Cirro provides a single method to access any type of data, on any platform, in any environment
•  Its product suite consists of Cirro Data Hub, Analyst for Excel and Multi Store – all designed to remove complexity from Big Data analytics
•  Cirro’s products are cloud based and can run in public, private and on-premise environments

Guest: Mark Theissen

Mark is CEO at Cirro. He is a respected analytics and data warehousing expert with more than 22 years in the industry. Most recently, Mark was the worldwide data warehousing technical lead at Microsoft following the acquisition of DATAllegro. At DATAllegro, Mark was the COO and a member of the board of directors. Prior to joining DATAllegro, Mark was Vice President and Research Lead at META Group (Gartner Group) for Enterprise Analytics Strategies, covering data warehousing, business intelligence and data integration markets. Before META, Mark was VP of Professional Services at Accruent, where he was responsible for domestic and overseas services and operations. Mark has a BS in Computer Information Systems from Chapman University and an MBA from the University of California, Irvine.

Briefing Room 2/11/14

Next Generation Data Federation

©2014 Cirro Inc. All rights reserved.

On Demand Distributed Analysis

Cirro is the ONLY Solution that can:

•  Access any data
•  On any platform
•  Without ETL or the cost and complexity of a semantic layer

“What used to take 2-4 weeks is now done in a matter of minutes. Cirro is a ‘game-changing’ approach to visualizing multi-structured big data and integrating it with other data sources.”
Director of Business Intelligence

Cirro Enterprise Data Hub

[Architecture diagram]
Access tools: Visualization Tools, BI Tools, CLI, Excel
Cirro Data Hub: Real-time Federation, Data Language Translation, Data Movement & Management
Data sources: RDBMS, HDFS, NoSQL, Legacy, SaaS

How Federation Works

I have a table on SQL Server that needs to join to tables on Oracle and Hadoop


How Federation Works

Oracle, Hadoop, SQL Server
Standard SQL
Row processing pushed into data systems: SQL predicates, local joins; SQL predicates; MapReduce
50k Rows, 5k Rows, 50m Rows
Limited movement

How Federation Works

Oracle, Hadoop, SQL Server
Standard SQL
Row processing pushed into data systems: SQL join, aggregation

How Federation Works

Oracle, Hadoop, SQL Server
Standard SQL
Row processing pushed into data systems
Results

Results Destination Options
•  UI Tools
•  Data Marts; in the Cloud or Data Center
•  BI Server
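The flow on the How Federation Works slides (push predicates into each system, move only the small filtered sets, then join and aggregate where the largest table lives) can be sketched in a few lines. This is only an illustration under stated assumptions, not Cirro's implementation: three in-memory SQLite databases stand in for SQL Server, Oracle and Hadoop, no MapReduce is involved, and every table, column and function name (`customers`, `prices`, `trades`, `federated_revenue`) is hypothetical.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory database standing in for one remote data system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical stand-ins for the three systems named on the slide.
sql_server = make_source(
    "CREATE TABLE customers (cust_id INTEGER, region TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "west"), (2, "east"), (3, "west")],
)
oracle = make_source(
    "CREATE TABLE prices (product TEXT, price REAL)",
    "INSERT INTO prices VALUES (?, ?)",
    [("gas", 4.0), ("power", 10.0)],
)
hadoop = make_source(  # assumed to hold the largest table
    "CREATE TABLE trades (cust_id INTEGER, product TEXT, qty INTEGER)",
    "INSERT INTO trades VALUES (?, ?, ?)",
    [(1, "gas", 5), (1, "power", 2), (2, "gas", 7), (3, "power", 1), (3, "gas", 3)],
)


def federated_revenue(region):
    """Return (revenue per product, rows moved between systems) for one region."""
    # Step 1: push the predicate down so the source filters its own rows.
    custs = sql_server.execute(
        "SELECT cust_id FROM customers WHERE region = ?", (region,)
    ).fetchall()
    # No predicate applies to prices, so the whole (small) table is read.
    prices = oracle.execute("SELECT product, price FROM prices").fetchall()

    # Step 2: limited movement. Ship only the small sets to the big system.
    hadoop.execute("CREATE TEMP TABLE moved_custs (cust_id INTEGER)")
    hadoop.execute("CREATE TEMP TABLE moved_prices (product TEXT, price REAL)")
    hadoop.executemany("INSERT INTO moved_custs VALUES (?)", custs)
    hadoop.executemany("INSERT INTO moved_prices VALUES (?, ?)", prices)

    # Step 3: the final join and aggregation run next to the largest table.
    result = hadoop.execute(
        """
        SELECT t.product, SUM(t.qty * p.price)
        FROM trades t
        JOIN moved_custs c ON c.cust_id = t.cust_id
        JOIN moved_prices p ON p.product = t.product
        GROUP BY t.product
        ORDER BY t.product
        """
    ).fetchall()

    # Clean up the staging tables so the function can be called again.
    hadoop.execute("DROP TABLE moved_custs")
    hadoop.execute("DROP TABLE moved_prices")
    return result, len(custs) + len(prices)


result, rows_moved = federated_revenue("west")
print(result, "rows moved:", rows_moved)
```

The design point is that the `trades` table never leaves its system; only the filtered customer keys and the small price table cross the network.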

Completing The Solution…

•  Cirro Data Hub – Federated query processing
   •  Use any tool
   •  The fastest distributed processing possible
•  Cirro Analyst
   •  Data discovery
   •  Mash up data like never before
   •  Go beyond SQL
   •  Publish
•  Cirro Multi Store
   •  Stage, Store, Process
   •  Highly scalable

Next Generation Data Federation

•  Designed & Built for Big Data
•  Compatible with structured, semi-structured & unstructured data
•  Works in the cloud, in the data center, or both
•  Real-Time Federation
•  Queries are dynamically optimized and executed, taking the processing to the data
•  Enables ad-hoc query and exploration of all data
•  No Semantic Layer Required

Ask Questions You Couldn’t Ask Before

Cirro Federation vs. Data Virtualization

Cirro
•  Excellent for data exploration and discovery
•  Excellent for ad-hoc queries
•  True federated processing – minimal data movement, no server bottlenecks – ‘pushes processing to the data’
•  Easy setup, maintenance & administration
•  Hadoop – can execute Hive & Impala queries along with MapReduce programs
•  Pathway to NoSQL

Data Virtualization
•  Not appropriate for data exploration or discovery – requires you to know the questions you want to ask in advance of accessing the data
•  Not true federated processing – final joins and aggregations done on Virtualization Server
•  Good for structured data processing workloads
•  Labor intensive setup, maintenance & administration for modeling and semantic layer
•  Hadoop – limited to Hive access
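The "minimal data movement" point can be put in rough numbers using the row counts shown on the How Federation Works slide (50k, 5k and 50m rows). The sketch below is back-of-the-envelope arithmetic, not a measurement. It assumes that both approaches push predicates down first, that a virtualization server is a separate machine that must receive every filtered set for the final join, and that the federated approach ships the two smaller sets to the system holding the largest one. The function names are hypothetical.

```python
def rows_moved_federated(filtered_rows):
    """Ship every filtered set except the largest to the largest set's system."""
    return sum(filtered_rows) - max(filtered_rows)


def rows_moved_central(filtered_rows):
    """Ship every filtered set to a separate central server for the final join."""
    return sum(filtered_rows)


# Row counts taken from the slide; which system holds which count is not assumed.
slide_counts = [50_000, 5_000, 50_000_000]

federated = rows_moved_federated(slide_counts)
central = rows_moved_central(slide_counts)
print(f"federated: {federated:,} rows moved")
print(f"central server: {central:,} rows moved")
```

Under these assumptions the difference comes almost entirely from whether the largest set has to travel.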

Data Federation Use Cases

•  On Demand Distributed Data Analysis
•  Data warehouse offloading
•  Business intelligence federation
•  Self-service data exploration and discovery
•  Entry point for private cloud analytics
•  SaaS, Hadoop and/or NoSQL integration with enterprise data sources
•  Simplify application development

On Demand Intra-Day Analytics

Financial & Energy / Utility Markets

Business Challenge
•  Data that drives trading analytics & decisions sits in data silos
•  Inability to analyze data ‘fast enough’ to make informed trading decisions
•  ETL tools and manual data consolidation are too slow and inflexible for hourly or daily iterative analysis
•  Inability to join traditional data with cloud sources

Solution
•  Cirro Data Hub; Cirro Analyst

Results
•  On demand analytics supports faster trend analysis and the ability to identify data anomalies
•  Cross-platform data access reduced from weeks to minutes
•  Flexible/iterative using in-house BI tools
•  Enables self-service data mash-ups by analysts across all data sources

Ask Questions You Couldn’t Ask Before

Last Market Price

Oracle - Pricing DW Transaction Data

Tableau Actionable Visualizations

Subscription Market Data

Ask Questions You Couldn’t Ask Before

Anonymous Behavior

Transactional Data

Ads viewed/clicked

Actionable Visualizations

The Business Impact

Improved Business Operations
•  Agility; conducting analysis previously unavailable
•  Competitive advantage
•  Supports ad-hoc analysis, fastest time to value
•  Leverage in-house BI tools – no new tools to learn

Streamline IT Processes
•  Traditional architectures not designed for Big Data
•  Easily add new data sources – RDBMS, Hadoop, NoSQL
•  Easy to install, use & manage
•  Future proof analytics developed

Cost Savings
•  Reduction in license costs on EDW and RDBMS
•  Time & cost savings associated with data staging, modeling, ETL work, etc.
•  No new BI applications to buy - use existing BI tools
•  No new skills to develop

Perceptions & Questions

Analyst: Robin Bloor

The Visible “Big Data” Trend

•  Corporate data volumes grow at about 55% per annum - exponentially
•  Data has been growing at this rate for, maybe, 40 years
•  There is nothing new about big data; it clings to an established exponential trend

The Invisible Trend: Moore’s Law Cubed

•  The biggest databases are new databases
•  They grow at the cube of Moore’s Law
•  Moore’s Law = 10x every 6 years
•  VLDB: 1000x every 6 years
   •  1991/2 megabytes
   •  1997/8 gigabytes
   •  2003/4 terabytes
   •  2009/10 petabytes
   •  2015/16 exabytes
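The growth figures on these two slides can be converted into comparable yearly rates with a little arithmetic. The sketch below only restates the slide's numbers (55% per annum, 10x every 6 years, 1000x every 6 years). The helper names are hypothetical, and the starting year of 1991 is taken from the first rung of the slide's size ladder.

```python
import math


def annual_factor(total_factor, years):
    """Constant yearly multiplier that compounds to total_factor over years."""
    return total_factor ** (1.0 / years)


def years_to_grow(annual_growth_pct, target_factor):
    """Years needed to grow by target_factor at a fixed annual growth rate."""
    return math.log(target_factor) / math.log(1.0 + annual_growth_pct / 100.0)


def size_ladder(start_year=1991, step_years=6):
    """One 1000x step (one unit prefix) every step_years, as on the slide."""
    units = ["megabytes", "gigabytes", "terabytes", "petabytes", "exabytes"]
    return [(start_year + i * step_years, unit) for i, unit in enumerate(units)]


moore_per_year = annual_factor(10, 6)      # Moore's Law: 10x every 6 years
vldb_per_year = annual_factor(1000, 6)     # largest databases: 1000x every 6 years
corporate_years = years_to_grow(55, 1000)  # typical corporate data: 55% per annum

print(f"Moore's Law: about {moore_per_year:.2f}x per year")
print(f"Largest databases: about {vldb_per_year:.2f}x per year")
print(f"At 55% per annum, 1000x growth takes about {corporate_years:.1f} years")
for year, unit in size_ladder():
    print(year, unit)
```

Because 1000 is 10 cubed, the yearly factor for the largest databases is exactly the cube of the Moore's Law factor, which is what "Moore's Law Cubed" expresses.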

Whys and Wherefores?

•  Why do we assemble such gargantuan heaps of data?
•  While the data volume has grown like bamboo in spring, the size of executables has not.
•  Why not just move the processing to the data?
•  This is surely an option worth exploring – maybe it is even one of the foundations for Big Data Architecture…

No Country for Old DBMS (Thinking)

Questions are Easy, Answers Difficult

The WORKLOAD Conundrum

The DISTRIBUTION Conundrum

The DATA FLOW Conundrum

The REAL-TIME Conundrum

•  What are the primary applications where Cirro makes a big impact?
•  What is (roughly) the largest number of data sources Cirro federates in any implementation?
•  What is the greatest amount of resource deployed for the largest Cirro implementation? How much memory?
•  How does “fault tolerance” work?
•  How difficult is it to develop applications employing Cirro? Is it significantly different from developing against a DBMS?
•  Are any companies adopting this technology strategically?
•  Which technologies/companies do you regard as competition?

Upcoming Topics

www.insideanalysis.com

2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room

This Month: BIG DATA

March: CLOUD

April: BIG DATA

THANK YOU for your ATTENTION!

Photo credit for Slide 28: Lenny’s Alice in Wonderland site: http://www.alice-in-wonderland.net/