www.company.com
Big Data Integration – SnapLogic
Defining Big Data
• Volume
– MB / GB / TB / PB
• Variety
– Table / Database / Photo / Web / Audio / Social / Video / Unstructured / Mobile
• Velocity
– Batch/ Periodic/ Near Real Time/ Real Time
Unstructured Data Growth Rate
[Chart: Structured Data vs. Unstructured Data, 2004 / 2008 / 2012; y-axis 0 to 16 (units not given). Unstructured data is growing exponentially.]
Hadoop
• Hadoop is an open source framework.
• HDFS – Hadoop Distributed File System
• MapReduce – Batch Processing
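To make the MapReduce bullet concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps using a word count. It only illustrates the programming model and is not how Hadoop itself is implemented; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over an in-memory batch of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

On a real cluster the map and reduce steps run in parallel across nodes and the input is read from HDFS; here everything runs in one process on an in-memory list.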
Big Data Analytics
Hadoop vs. SnapLogic
– Hadoop is written in Java and is usually deployed and administered on Linux.
– SnapLogic offers cloud-based integration with a browser-based designer, so it can be used from any operating system.
Big Data Analytics - Hadoop
• Must be familiar with Linux commands
• Understanding the architecture of HDFS and MapReduce
• Configuration files
• Dependencies
• Managing the cluster and each node
• Understanding and managing Hadoop Ecosystem components
• HiveQL, Pig Latin, Java, Python, Scala, etc.
Big Data Analytics - SnapLogic
• Basic understanding of HDFS and MapReduce
• Linux commands are not required – Drag & Drop
• Not much programming needed
• Configuration files are already set up
• No dependency issues
• No Hadoop Ecosystem components to manage
• Good compatibility with other tools such as Tableau, Redshift and many others
Cloud Based Integration
Example – Twitter Analysis
• Hadoop
• Hadoop Ecosystem Components
• Download Flume – extract the file
• Configure variables in ~/.bashrc – set the directories
• Set up the Twitter API source – channel, host name, file format, batch size, write format, transaction capacity, etc.
• Start streaming the Twitter data into HDFS
• Download Hive – extract the file
• Configure variables in ~/.bashrc …
Example – Twitter Analysis
SnapLogic
Hospital Evaluation Data Example
• Used Hadoop Ecosystem components – Flume and Hive.
• Used a SerDe to handle quoted values.
• Dependencies were needed, such as JDK 1.7.
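The quoted-values issue can be illustrated with a small sketch: a field that contains a comma inside quotes breaks a naive split, which is why a CSV-aware parser (in Hive, a SerDe) is needed. The code below uses Python's standard `csv` module rather than Hive, and the sample row and column names are made up for illustration; they are not the actual hospital data.

```python
import csv
import io

# Hypothetical sample: the first field is quoted because it contains a comma.
SAMPLE = (
    "hospital_name,city,score\n"
    '"Example Hospital, North Campus",Springfield,4\n'
)


def naive_split(line):
    """Split on every comma, ignoring quotes (this is the failure case)."""
    return line.rstrip("\n").split(",")


def parse_quoted(text):
    """Parse with a CSV-aware reader that keeps quoted fields intact."""
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    data_line = SAMPLE.splitlines()[1]
    print(len(naive_split(data_line)), "fields from the naive split")
    print(len(parse_quoted(SAMPLE)[1]), "fields from the CSV-aware parser")
```

The naive split produces one field too many for the data row, while the CSV-aware parser returns the three intended columns.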
Hospital Evaluation Data Example
• By using SnapLogic, this can be easily done.
However…
• Not an open source tool.
• Less variety compared to Apache projects.
Thank you
• Contact Information:
• Hyun Kim, Practice Head for Big Data