Self-Service Access and Exploration of Big Data
![Page 1: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/1.jpg)
The Briefing Room
![Page 3: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/3.jpg)
Twitter Tag: #briefr
! Reveal the essential characteristics of enterprise software, good and bad
! Provide a forum for detailed analysis of today’s innovative technologies
! Give vendors a chance to explain their product to savvy analysts
! Allow audience members to pose serious questions... and get answers!
Mission
![Page 4: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/4.jpg)
December: Innovators
January: Big Data
February: Analytics
March: Data in Motion
![Page 5: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/5.jpg)
Innovators
! Charles Babbage conceived the Analytical Engine in 1834.
! Automation and ease of use have driven innovation in computing ever since.
! The Cloud and Big Data are raising the bar.
![Page 6: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/6.jpg)
Robin Bloor is Chief Analyst at The Bloor Group
Analyst: Robin Bloor
![Page 7: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/7.jpg)
! Cirro provides a single method to access any type of data, on any platform, in any environment.
! Its product suite consists of Cirro Data Hub, Analyst for Excel and Multi Store – all designed to remove complexity from Big Data analytics.
! Cirro’s products are cloud-based and can run in public, private and on-premises environments.
Cirro
![Page 8: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/8.jpg)
Mark Theissen
Mark is CEO at Cirro. He is a respected analytics and data warehousing expert with more than 22 years in the industry. Most recently, Mark was the worldwide data warehousing technical lead at Microsoft following its acquisition of DATAllegro. At DATAllegro, Mark was the COO and a member of the board of directors. Prior to joining DATAllegro, Mark was Vice President and Research Lead at META Group (Gartner Group) for Enterprise Analytics Strategies, covering the data warehousing, business intelligence and data integration markets. Before META, Mark was VP of Professional Services at Accruent, where he was responsible for domestic and overseas services and operations. Mark has a BS in Computer Information Systems from Chapman University and an MBA from the University of California, Irvine.
![Page 9: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/9.jpg)
©2012 Cirro Inc. All rights reserved.
Corporate Overview
Bringing Big Data to the Desktop
![Page 10: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/10.jpg)
The Big Data Dilemma
![Page 11: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/11.jpg)
![Page 12: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/12.jpg)
![Page 13: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/13.jpg)
Accessing Big Data
![Page 14: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/14.jpg)
Incumbent Approach Hadoop Approach
![Page 15: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/15.jpg)
![Page 16: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/16.jpg)
![Page 17: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/17.jpg)
What the Market Needs
An enterprise data hub to access any type of data, on any platform, in any environment
![Page 18: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/18.jpg)
The Enterprise Data Hub
![Page 19: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/19.jpg)
Simplifying the Access to Your Data
Structured - Unstructured Mashups
SQL (multiple versions)
Java
Sqoop
MapReduce
Hive
Hadoop Install & Config
Hive – Sqoop Install & Config
Source Control
Database Management
Cirro Data Hub
Access tool
Conventional Approach: people manage the access to data
Cirro Approach: Cirro Data Hub manages access to data
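The contrast between people managing access and the hub managing access can be sketched in Java with one uniform adapter interface. This is a hypothetical illustration, not Cirro's API: `SourceAdapter`, `InMemoryRelationalSource`, `DelimitedFileSource` and `DataHub` are invented names, and both sources are in-memory stand-ins (for a MySQL table and for tab-delimited files in HDFS) so the example runs without any database or cluster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical uniform interface that every back-end adapter implements.
interface SourceAdapter {
    // Returns every row of the named table as a column-name -> value map.
    List<Map<String, String>> scan(String table);
}

// In-memory stand-in for a relational source such as a MySQL keyword table.
class InMemoryRelationalSource implements SourceAdapter {
    private final Map<String, List<Map<String, String>>> tables = new HashMap<>();

    void put(String table, List<Map<String, String>> rows) {
        tables.put(table, rows);
    }

    @Override
    public List<Map<String, String>> scan(String table) {
        return tables.getOrDefault(table, List.of());
    }
}

// In-memory stand-in for tab-delimited text files in HDFS; column names are supplied up front.
class DelimitedFileSource implements SourceAdapter {
    private final Map<String, List<String>> files = new HashMap<>();
    private final String[] columns;

    DelimitedFileSource(String... columns) {
        this.columns = columns;
    }

    void put(String path, List<String> lines) {
        files.put(path, lines);
    }

    @Override
    public List<Map<String, String>> scan(String path) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : files.getOrDefault(path, List.of())) {
            String[] fields = line.split("\t", -1);
            Map<String, String> row = new HashMap<>();
            for (int i = 0; i < columns.length && i < fields.length; i++) {
                row.put(columns[i], fields[i]);
            }
            rows.add(row);
        }
        return rows;
    }
}

// Routes "source.table" names to the registered adapter, so callers never
// see which platform actually holds the data.
class DataHub {
    private final Map<String, SourceAdapter> sources = new HashMap<>();

    void register(String sourceName, SourceAdapter adapter) {
        sources.put(sourceName, adapter);
    }

    List<Map<String, String>> query(String qualifiedName) {
        int dot = qualifiedName.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("expected source.table, got " + qualifiedName);
        }
        SourceAdapter adapter = sources.get(qualifiedName.substring(0, dot));
        if (adapter == null) {
            throw new IllegalArgumentException("unknown source in " + qualifiedName);
        }
        return adapter.scan(qualifiedName.substring(dot + 1));
    }
}
```

A consumer calls, for example, `hub.query("MYSQL.KeyWords")` or `hub.query("HDFS./usr/local/output")` and gets rows in the same shape either way; supporting a new platform means writing one more adapter rather than teaching every user another toolchain.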
![Page 20: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/20.jpg)
Architecture Overview
Cirro Data Hub
• Cost-based federation optimizer
• Smart caching
• Dynamic optimization
• Normalized cost estimates
• Metadata for unstructured sources
Cirro Function Library
• Library of functions
• Logic to build complex specific formulas
Cirro Analyst
• Excel plug-in that allows analysts to explore & process Big Data and traditional data
Cirro Multi Store (optional)
• Pre-built structured/unstructured data store
• Used for holding data or additional workspace
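The Data Hub bullets about a cost-based federation optimizer working from normalized cost estimates can be illustrated with a minimal sketch. The assumptions are ours, not Cirro's: each engine reports a raw cost in its own units, a hypothetical per-engine scale factor converts that into a common unit, and the cheapest candidate wins. "Smart caching" is reduced here to remembering the chosen engine per query text. `FederationOptimizer` and `CandidatePlan` are invented names.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One candidate way to run a query step: which engine, and its raw cost
// estimate expressed in that engine's own (assumed) units.
final class CandidatePlan {
    final String engine;
    final double rawCost;

    CandidatePlan(String engine, double rawCost) {
        this.engine = engine;
        this.rawCost = rawCost;
    }
}

// Hypothetical cost-based chooser; not Cirro's optimizer.
class FederationOptimizer {
    // Assumed per-engine factors that convert raw estimates into one common unit.
    private final Map<String, Double> costScale;
    // Simplified "smart caching": remember the chosen engine for each query text.
    private final Map<String, String> planCache = new HashMap<>();

    FederationOptimizer(Map<String, Double> costScale) {
        this.costScale = costScale;
    }

    // Normalizes a raw estimate so engines with different cost models can be compared.
    double normalizedCost(CandidatePlan plan) {
        Double scale = costScale.get(plan.engine);
        if (scale == null) {
            throw new IllegalArgumentException("no cost model for engine " + plan.engine);
        }
        return plan.rawCost * scale;
    }

    // Returns the candidate with the lowest normalized cost.
    CandidatePlan choose(List<CandidatePlan> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble(this::normalizedCost))
                .orElseThrow(() -> new IllegalArgumentException("no candidate plans"));
    }

    // Reuses the cached decision for a repeated query; otherwise optimizes and caches it.
    String engineFor(String queryText, List<CandidatePlan> candidates) {
        return planCache.computeIfAbsent(queryText, q -> choose(candidates).engine);
    }

    int cachedQueries() {
        return planCache.size();
    }
}
```

Keying the cache on query text alone ignores changes in data volume or engine load, so a real optimizer would also need an invalidation rule; the sketch leaves that out.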
![Page 21: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/21.jpg)
Typical Deployment
IT Staff • Programmers • Developers • DBAs
Extend, add proprietary functions to CFL (Cirro Function Library)
Excel Analyst Users • Design views • Minimal IT support • Publish views • Data exploration • Analysis
Data Consumers (Tableau, Business Objects, other BI tools): access CDH (Cirro Data Hub) views via ODBC & JDBC across all data types
RDBMS: Oracle, Teradata, MySQL, SQL, Vertica
HQL
NoSQL: Splunk, Cassandra, MongoDB
MapReduce
Cirro Data Hub • Cirro Function Library • Proprietary MapReduce • Custom Views
MapReduce • Hadoop Distributed File System • Hive
![Page 22: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/22.jpg)
Sample Use Case
Summarize the number of tweets per hour containing certain keywords from a raw Twitter feed.
Requirements: • Use raw Twitter data files in Hadoop • Keywords stored in a SQL table for easy manipulation • Results in Tableau or Excel for visualization
![Page 23: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/23.jpg)
Too Many Skills, Coding, Processing
Write the mapper/reducer in Java using a development tool: • parse the Twitter text - convert to lower case - parse words - exclude common words - group words by hour
Import the Java classes into Hadoop
Execute Hadoop from the command line (CLI) • bin/hadoop jar /home/cloudera/WordCount.jar TwitterParse /usr/tweet/input /usr/local/output -libjars
Move the result into Hive using a JDBC SQL tool • create table output1 (text STRING, created_at STRING, count BIGINT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE
• LOAD DATA INPATH '/usr/data/1-88f1-864e22e77801/part*' OVERWRITE INTO TABLE output1
Move the SQL table with keywords to Hive through Sqoop using the CLI • sqoop import --connect jdbc:mysql://10.17.185.44/mytable --password mypasswd --username root --table words --target-dir '/home/cloudera/inputfile'
• create table mytable (word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE • LOAD DATA INPATH '/home/cloudera/inputfile/part*' OVERWRITE INTO TABLE mytable
Run the Hive query using a JDBC SQL tool • select a.text, a.created_at, a.count from output1 a join mytable b on (a.text = b.word)
Import the results into Excel
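The mapper/reducer step at the top of the list can be sketched in plain Java. The sketch does not use the Hadoop API: map, shuffle and reduce are simulated in memory so the logic can be read and tested on its own. The stop-word list, the tokenizing regex, and the assumption that `created_at` has already been normalized to `yyyy-MM-dd HH:mm:ss` are illustrative choices, not details from the slide. Reducer output follows the tab-delimited `text`, `created_at`, `count` layout of the `output1` table, and `matchKeywords` has the same effect as the final Hive join.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java simulation of the mapper/reducer described in the slide (no Hadoop API).
class TweetWordCount {
    // Illustrative stop-word list for "exclude common words"; not taken from the slide.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "and", "of", "to", "is", "in");

    // Map phase: emits one "word<TAB>hour" key per non-stop word (implicit value 1).
    // Tokens keep ASCII letters, digits, '#' and '@'.
    // Assumes createdAt is "yyyy-MM-dd HH:mm:ss", so the hour bucket is its first 13 chars.
    static List<String> map(String text, String createdAt) {
        String hour = createdAt.substring(0, 13);
        List<String> keys = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9#@]+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                keys.add(word + "\t" + hour);
            }
        }
        return keys;
    }

    // Shuffle + reduce: groups identical keys (sorted, as Hadoop's shuffle would) and sums them.
    // Each output line is "text<TAB>created_at<TAB>count", matching the output1 table layout.
    static List<String> reduce(List<String> mappedKeys) {
        Map<String, Long> counts = new TreeMap<>();
        for (String key : mappedKeys) {
            counts.merge(key, 1L, Long::sum);
        }
        List<String> lines = new ArrayList<>();
        counts.forEach((key, n) -> lines.add(key + "\t" + n));
        return lines;
    }

    // Same effect as the Hive join on a.text = b.word: keep rows whose word is a keyword.
    static List<String> matchKeywords(List<String> reducedLines, Set<String> keywords) {
        List<String> matched = new ArrayList<>();
        for (String line : reducedLines) {
            String word = line.substring(0, line.indexOf('\t'));
            if (keywords.contains(word)) {
                matched.add(line);
            }
        }
        return matched;
    }
}
```

In a real job, `map` would sit inside a Hadoop `Mapper` emitting `(key, 1)` pairs and `reduce` inside a `Reducer` summing them; the framework, not a `TreeMap`, would do the grouping.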
![Page 24: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/24.jpg)
B1=TwitterParse("/user/twitter/sample","text,created_at")
B2=ToLower(B1,"text")
B3=WordSeparate(B2,"text")
B4=Exclude(B3,"text")
B5=GroupBy(B4,"text,created_at")
B6=Cirro_Match(B5,"text","MYSQL.KeyWords","word",C9)
Results displayed at cell C9
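The cell chain above (B1 through B6, each formula consuming the previous cell) behaves like a pipeline of named, reusable functions. The sketch below illustrates that idea only: `FormulaChain`, its step names and the `register` hook (standing in for the deployment slide's "Extend, Add Proprietary Functions to CFL") are hypothetical, and the steps operate on plain word lists rather than whatever table structures the Cirro Function Library actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical function library: each named step turns one word list into another.
class FormulaChain {
    // Illustrative common words for the Exclude step (declared before LIBRARY is filled).
    private static final Set<String> COMMON_WORDS = Set.of("the", "a", "and");
    private static final Map<String, UnaryOperator<List<String>>> LIBRARY = new LinkedHashMap<>();

    static {
        LIBRARY.put("ToLower", in -> in.stream()
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList()));
        LIBRARY.put("WordSeparate", in -> in.stream()
                .flatMap(s -> Arrays.stream(s.split("\\s+")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList()));
        LIBRARY.put("Exclude", in -> in.stream()
                .filter(w -> !COMMON_WORDS.contains(w))
                .collect(Collectors.toList()));
    }

    // Lets IT staff add or replace steps without changing the analyst's chain.
    static void register(String name, UnaryOperator<List<String>> step) {
        LIBRARY.put(name, step);
    }

    // Applies the named steps in order. Element 0 is the input; element i is the
    // result after step i, much like cells B1..Bn each holding one intermediate result.
    static List<List<String>> evaluate(List<String> input, List<String> stepNames) {
        List<List<String>> cells = new ArrayList<>();
        List<String> current = input;
        cells.add(current);
        for (String name : stepNames) {
            UnaryOperator<List<String>> step = LIBRARY.get(name);
            if (step == null) {
                throw new IllegalArgumentException("unknown function: " + name);
            }
            current = step.apply(current);
            cells.add(current);
        }
        return cells;
    }
}
```

Because each step is looked up by name, an analyst's chain stays the same when IT replaces or adds a function; a keyword filter, for example, can be registered at run time as a rough stand-in for `Cirro_Match`.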
![Page 25: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/25.jpg)
![Page 26: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/26.jpg)
Analyst: Robin Bloor
Perceptions & Questions
![Page 27: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/27.jpg)
The Bloor Group
Big Data, Hot Data?
![Page 28: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/28.jpg)
Hadoop & The Big Data Dynamic
Hadoop has become the de facto reservoir for data
![Page 29: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/29.jpg)
– We witnessed something like this a long time ago with ISAM files, before the advent of the RDBMS
– The difference this time is that Hadoop has an ecosystem, and it is growing
– Big Data (usually caught first by Hadoop) is mostly new data and mostly event data
– Hadoop is not (yet) a performance engine; it is an all-purpose capability
– It is delivering business benefits in a big way: it is hot.
![Page 30: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/30.jpg)
BI Categories
HINDSIGHT: Regular reporting/operational BI, Excel
OVERSIGHT: Dashboards, OLAP, BPM, Excel
INSIGHT: Data mining, statistical analysis (trends and relationships)
FORESIGHT: Predictive analytics
![Page 31: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/31.jpg)
The New BI Universe (?)
![Page 32: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/32.jpg)
Data Sources
Hadoop and Hadoop++
Standard SQL DBMS, NoSQL, Graph DBMS, XML DBMS, Flat files
Metadata Hub?
![Page 33: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/33.jpg)
Problems Of The Data Layer
Hadoop is capable of ETL and is often used for ETL, but that usually involves coding of some kind
A connectivity architecture is needed
IT REQUIRES SIMPLE CONNECTORS
Point-to-point connectivity usually was, is and may always be a bad idea
BI tools, which had good-enough interfaces to RDBMS, don’t link to Hadoop directly, and probably shouldn’t
The data layer is more complicated than it was, and its complexity is increasing
Hadoop is multi-role and hence can spawn multiple instances
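The point-to-point objection can be quantified. With c consumers (BI tools) and s data sources, wiring every pair directly needs c × s connectors, while routing everything through a hub needs c + s. The class below just encodes that arithmetic; its name is illustrative.

```java
// Illustrative arithmetic only (class name is hypothetical).
class ConnectorCount {
    // Point-to-point: every consumer needs its own connector to every source.
    static long pointToPoint(long consumers, long sources) {
        return consumers * sources;
    }

    // Hub: each consumer and each source connects once, to the hub.
    static long viaHub(long consumers, long sources) {
        return consumers + sources;
    }
}
```

For 5 BI tools and 8 data stores that is 40 connectors versus 13. For a single tool and a single source the hub is no cheaper (1 versus 2); the advantage appears only as both counts grow.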
![Page 34: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/34.jpg)
! How would one use the Cirro Multi Store?
! Which companies/products do you regard as competitors (either directly or close competitors)?
! How does a Cirro implementation proceed, i.e., where do you start, what are the medium term goals, what do you replace?
! Conceptually a hub for the data layer is attractive. But how well does it scale out?
![Page 35: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/35.jpg)
! Can the hub be physically distributed, i.e., one logical instance with multiple physical instances?
! How does your proprietary MapReduce differ from Hadoop MapReduce?
! Is there any aspect of BI that you don’t or can’t cater for (CEP, Data governance, MDM, etc.)?
![Page 36: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/36.jpg)
![Page 37: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/37.jpg)
Upcoming Topics
January: Big Data
February: Analytics
March: Data in Motion
2013 Editorial Calendar www.insideanalysis.com
![Page 38: Self-Service Access and Exploration of Big Data](https://reader033.fdocuments.in/reader033/viewer/2022051617/55a57e161a28ab4f468b4600/html5/thumbnails/38.jpg)
Thank You for Your Attention