Introduction to Big Data and Hadoop
Big Data Sharing Session
Febiyan Rachman, Data Science Indonesia
@febiyanr
http://id.linkedin.com/in/febiyan
Logos, images, and pictures shown in this presentation
belong to their respective owners. No copyright
infringement intended.
Agenda
• Background of Big Data
• History of Hadoop
• Technologies Around Big Data
• What Big Data Should Be
Data Explosion
Human Generated
Business Generated
Machine Generated
Interaction Generated
Artwork Courtesy of Teradata
History of Hadoop
YEAR        WHAT
2002        Doug Cutting and Mike Cafarella started working on Nutch
2003        Google’s GFS paper
2004        Nutch Distributed File System (NDFS) – Doug Cutting
2004        Google’s MapReduce paper
2004-2005   Nutch MapReduce implementation
2006        NDFS and Nutch MapReduce became Hadoop
2008        Hadoop became a top-level Apache project
Why Hadoop?
• Stores and processes huge amounts of data – “Big Data”
• Designed for affordable commodity servers
• Scales horizontally
HDFS
• Distributed file system
• Presents a single logical storage layer
• Breaks files into blocks
• Keeps 3 replicas of each block – fault-tolerant
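The block and replica ideas on this slide can be illustrated with a small sketch. It is not HDFS code: the 128 MiB block size is an assumption (a common default in recent Hadoop versions), the node names are hypothetical, and the round-robin placement is a toy stand-in for the real rack-aware placement policy.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (128 MiB)
REPLICATION = 3                 # replication factor from the slide


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block a file is broken into."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be smaller
    return sizes


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Toy placement: put each block on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    sizes = split_into_blocks(300 * 1024 * 1024)         # a 300 MiB file
    nodes = ["node0", "node1", "node2", "node3"]         # hypothetical servers
    for block, holders in place_replicas(len(sizes), nodes).items():
        print(block, sizes[block], holders)
```

With these assumptions a 300 MiB file becomes three blocks (128, 128, and 44 MiB), and losing any single node still leaves two copies of every block.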
MapReduce
• A processing framework
• Processes data locally – bring apps to data!
• Distributed processing
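The programming model can be sketched with the classic word count example. This is a single-process simulation in plain Python, not the Hadoop API; the function names and the idea of passing input splits as lists of lines are assumptions made for illustration. In a real cluster each split would be mapped on the node that stores it.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(splits):
    """Run map, shuffle, and reduce over a list of input splits."""
    mapped = [pair
              for split in splits      # each split stands in for one block of input
              for line in split
              for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = [["big data big"], ["data hadoop"]]
    print(word_count(splits))
```

The map and reduce functions only see one record or one key at a time, which is what lets a framework run many copies of them in parallel across servers.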
Technologies Around Big Data
Ingest
• Flume
• Kafka
• Sqoop
• …
Store
• HDFS
• HBase
• Cassandra
• MapR-FS
• MapR-DB
• …
Orchestrate
• ZooKeeper
• YARN
• Oozie
• Mesos
• Hue
• …
Technologies Around Big Data (II)
APIs and Interfaces
• Hive
• Impala
• Pig
• Mahout
• Zeppelin
• …
Framework/Platform
• MapReduce
• Spark
• Storm
• Flink
• Teradata Aster
• …
Big Data?
It is not just about technology.
It is not just about acquiring and storing data.
“It is more of an initiative that demands more analytics from all available data.”
Data-Driven Companies Outperform
Infographic: data-driven companies vs. companies with low reliance on data
• Data-driven companies are more likely to outperform their competitors when it comes to profitability
• They are also more likely to have a culture of creativity and innovation
• They are better positioned for top-down and bottom-up cultural evolution and success:
  • Top leaders who launch and drive data initiatives
• They are more likely to realize the benefits of data, including:
  • Better knowledge sharing
  • More collaborative organization
  • Greater quality and speed of execution
  • Faster decisions
Percentages shown (data-driven vs. low reliance on data): 68% vs. 40%, 78% vs. 37%, 65% vs. 42%, 70% vs. 41%, 59% vs. 33%, 55% vs. 24%, 55% vs. 28%
Artwork Courtesy of Teradata
Start with a vision.
Start with valuable use cases.
Thank You