Sharing bisnis big data v3 part1
![Page 1: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/1.jpg)
Accelerating Startup Growth with Big Data
Dwika Sudrajat, IT Consultant
Florida, Hong Kong & Jakarta. November 23rd, 2016
▐ email: [email protected]
▐ Florida: +1-407-2502812
▐ Hong Kong: +852-54152971
▐ Jakarta: +62-8161108571
▐ FB: dwika.sudrajat
▐ TW: @dwikasudrajat
▐ managingconsultant.blogspot.com
▐ dwikasudrajat.blogspot.com
▐ dwikasudrajat.wordpress.com
![Page 2: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/2.jpg)
Job Opportunities
![Page 3: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/3.jpg)
Startup Team at Work
![Page 4: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/4.jpg)
Startup Team Creating Mobile Apps
![Page 5: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/5.jpg)
What technologies do you think they are running on?
![Page 6: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/6.jpg)
Conventional Startup Development Team
![Page 7: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/7.jpg)
Today's Startup Development Team
![Page 8: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/8.jpg)
From LAMP to MEAN
![Page 9: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/9.jpg)
![Page 10: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/10.jpg)
Modern web development stack
![Page 11: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/11.jpg)
MEAN.JS: a full-stack JavaScript solution using MongoDB, Express, AngularJS, and Node.js
![Page 12: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/12.jpg)
What is Big Data?
![Page 13: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/13.jpg)
Data
![Page 14: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/14.jpg)
Hadoop, Why?
![Page 15: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/15.jpg)
Hadoop, Volume, Velocity, Variety
![Page 16: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/16.jpg)
Data Growing
![Page 17: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/17.jpg)
Real Application of Big Data Today
![Page 18: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/18.jpg)
Challenges
▐ Short lifespan of the data
▐ Fast-moving data
▐ Fast data processing
▐ High variety of data
![Page 19: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/19.jpg)
Data Volume and Variety
![Page 20: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/20.jpg)
Four V’s and a C
Volume alone does not make big data big. It is about the V’s: high Volume, Variety, and Velocity, which together deliver high Value.
In addition, there is the Challenge: the data is very complex in nature and often unstructured. Examples include text documents, emails, images, and videos, as well as click-stream data and social media feeds.
![Page 21: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/21.jpg)
Eliminate a Single Point of Failure
Make sure the load balancer itself does not become a single point of failure. Load balancers must be deployed in a high-availability cluster.
![Page 22: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/22.jpg)
![Page 23: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/23.jpg)
![Page 24: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/24.jpg)
![Page 25: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/25.jpg)
![Page 26: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/26.jpg)
A Typical Hadoop Cluster
[Diagram: a Client and a Master Node (Name Node, Job Tracker) connected to Slave Nodes in Rack 1, Rack 2, and Rack 3. Each slave node runs a Data Node and a Task Tracker that executes Map/Reduce tasks. Arrow labels: data assignment to nodes, data read, data write, metadata for block info, job assignment, task assignment.]
1. Client
2. Master Node: Name Node, Job Tracker
3. Slave Nodes: Data Nodes, Task Trackers, Map/Reduce
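The job and task assignment arrows in the diagram can be illustrated with a small Python sketch. It assumes the Job Tracker prefers a Task Tracker that runs on a node already holding the block (data locality) and falls back to the least-loaded tracker otherwise. The function `assign_tasks` and the node names are hypothetical and are not Hadoop APIs.

```python
def assign_tasks(block_locations, task_trackers):
    """Assign one map task per block, preferring a tracker co-located with the data.

    block_locations: block id -> list of node names holding a replica
    task_trackers:   node names that run a Task Tracker
    (Both structures are illustrative assumptions, not Hadoop data types.)
    """
    load = {tracker: 0 for tracker in task_trackers}
    assignment = {}
    for block, holders in block_locations.items():
        # Trackers that sit on a node which already stores this block.
        local = [node for node in holders if node in load]
        # Fall back to every tracker when no replica is local.
        candidates = local or list(load)
        # Among the candidates, pick the one with the fewest tasks so far.
        chosen = min(candidates, key=lambda node: load[node])
        load[chosen] += 1
        assignment[block] = chosen
    return assignment


if __name__ == "__main__":
    blocks = {"b0": ["node1", "node4"], "b1": ["node1", "node5"], "b2": ["node9"]}
    print(assign_tasks(blocks, ["node1", "node4", "node5"]))
```

Running tasks where the data already lives avoids moving blocks across the network, which matters for workloads that are I/O bound.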
![Page 27: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/27.jpg)
Hadoop File System (HDFS)
1. Client consults Name Node
2. Client writes block to Data Node
3. Data Node replicates block
4. Cycle repeats for next blocks
[Diagram: a Client writes a FILE to the cluster. Rack 1 holds Data Nodes 1 to 3, Rack 2 holds Data Nodes 4 to 6, and Rack 3 holds Data Nodes 7 to 9. Arrow labels: data assignment to nodes, data read, data write, metadata for block info.]
Name Node metadata: Rack 1: Data Node 1 Data Node 2 … Rack 2: Data Node 4 …
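The four write steps can be sketched as a small in-memory simulation in Python. This is only an illustration under stated assumptions: the classes `NameNode` and `DataNode`, the one-replica-per-rack placement rule, the replication factor of 3, and the tiny 4-byte block size are all hypothetical simplifications. Real HDFS uses far larger blocks and its own placement policy.

```python
RACKS = {
    "Rack 1": ["Data Node 1", "Data Node 2", "Data Node 3"],
    "Rack 2": ["Data Node 4", "Data Node 5", "Data Node 6"],
    "Rack 3": ["Data Node 7", "Data Node 8", "Data Node 9"],
}


class NameNode:
    """Keeps metadata only: which Data Nodes hold each block of each file."""

    def __init__(self, racks, replication=3):
        if replication > len(racks):
            raise ValueError("this sketch places at most one replica per rack")
        self.racks = racks
        self.replication = replication
        self.files = {}      # file name -> ordered list of block ids
        self.block_map = {}  # block id -> Data Node names holding a replica

    def assign(self, filename, index):
        """Step 1: choose target Data Nodes for a block, one per rack."""
        block_id = f"{filename}#{index}"
        rack_names = list(self.racks)
        targets = []
        for i in range(self.replication):
            nodes = self.racks[rack_names[(index + i) % len(rack_names)]]
            targets.append(nodes[index % len(nodes)])
        self.files.setdefault(filename, []).append(block_id)
        self.block_map[block_id] = targets
        return block_id, targets


class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def store(self, block_id, data, pipeline, cluster):
        """Steps 2 and 3: store the block, then forward it to the next replica."""
        self.blocks[block_id] = data
        if pipeline:
            cluster[pipeline[0]].store(block_id, data, pipeline[1:], cluster)


def build_cluster(racks):
    return {name: DataNode(name) for nodes in racks.values() for name in nodes}


def write_file(name_node, cluster, filename, data, block_size=4):
    # Step 4: repeat the cycle for every block of the file.
    for index, start in enumerate(range(0, len(data), block_size)):
        block_id, targets = name_node.assign(filename, index)
        chunk = data[start:start + block_size]
        cluster[targets[0]].store(block_id, chunk, targets[1:], cluster)


def read_file(name_node, cluster, filename):
    """Look up block locations in the Name Node, then read any live replica."""
    chunks = []
    for block_id in name_node.files[filename]:
        holder = next(n for n in name_node.block_map[block_id] if n in cluster)
        chunks.append(cluster[holder].blocks[block_id])
    return b"".join(chunks)


if __name__ == "__main__":
    name_node, cluster = NameNode(RACKS), build_cluster(RACKS)
    write_file(name_node, cluster, "FILE", b"abcdefghij")
    del cluster["Data Node 1"]  # simulate a hardware failure
    print(read_file(name_node, cluster, "FILE"))
```

Because every block lives on several Data Nodes, the file remains readable after one node is lost. The Name Node never stores file contents, only the block-to-node mapping.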
![Page 28: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/28.jpg)
MapReduce
Stages: Input, Splitting, Map, Shuffle/Sort, Reduce, Output

Map (one list per input split):
  Split 1: the, 1 | quick, 1 | brown, 1 | fox, 1
  Split 2: the, 1 | fox, 1 | ate, 1 | the, 1 | mouse, 1
  Split 3: how, 1 | now, 1 | brown, 1 | cow, 1

Shuffle/Sort (pairs grouped by key):
  the, 1 | the, 1 | the, 1
  fox, 1 | fox, 1
  quick, 1
  brown, 1 | brown, 1
  ate, 1
  mouse, 1
  how, 1
  now, 1
  cow, 1

Reduce (one result per key):
  the, 3
  fox, 2
  quick, 1
  brown, 2
  ate, 1
  mouse, 1
  how, 1
  now, 1
  cow, 1

Output: the, 3 | fox, 2 | quick, 1 | brown, 2 | ate, 1 | mouse, 1 | how, 1 | now, 1 | cow, 1

The Map function processes one line at a time, splits it into tokens separated by whitespace, and emits a key-value pair <word, 1> for each token.
The Reduce function simply sums the values, which are the occurrence counts for each key (i.e. the words in this example).
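The word-count flow can be reproduced on a single machine with a short Python sketch. It only imitates the Map, Shuffle/Sort, and Reduce stages in memory and is not Hadoop code. The function names are hypothetical, and the three input lines are an assumption reconstructed from the map output shown on the slide.

```python
from collections import defaultdict


def map_line(line):
    """Map: split one line on whitespace and emit a (word, 1) pair per token."""
    return [(word, 1) for word in line.split()]


def shuffle_sort(pairs):
    """Shuffle/Sort: group the emitted values by key, ordered by key."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return dict(sorted(groups.items()))


def reduce_counts(word, counts):
    """Reduce: sum the occurrence counts collected for one key."""
    return word, sum(counts)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_line(line)]
    grouped = shuffle_sort(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


# Input lines inferred from the slide's map output (an assumption).
SLIDE_LINES = ["the quick brown fox", "the fox ate the mouse", "how now brown cow"]

if __name__ == "__main__":
    print(word_count(SLIDE_LINES))
```

On a real cluster each split is mapped on a different node and the shuffle moves pairs across the network, but the per-key logic stays the same.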
![Page 29: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/29.jpg)
MapReduce Wordcount Example in R
• Map function
• Reduce function
• Reading the input from HDFS: from.dfs()
• Writing the results back to HDFS: to.dfs()
![Page 30: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/30.jpg)
What is MapReduce used for?
• At Google:
  – Index building for Google Search
  – Article clustering for Google News
  – Statistical machine translation
• At Yahoo!:
  – Index building for Yahoo! Search
  – Spam detection for Yahoo! Mail
• At Facebook:
  – Data mining
  – Ad optimization
  – Spam detection
![Page 31: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/31.jpg)
Who uses Hadoop?
▐ Facebook (Hadoop, Hive, Scribe)
▐ Google File System (HDFS)
▐ Yahoo! (Hadoop in Yahoo Search)
▐ IBM Transarc (Andrew File System)
▐ Amazon/A9

Goals of HDFS - Hadoop Distributed File System
▐ Very Large Distributed File System
  – 10K nodes, 100 million files, 10 PB
▐ Assumes Commodity Hardware
  – Files are replicated to handle hardware failure
  – Detects failures and recovers from them
▐ Optimized for Batch Processing
  – Provides very high aggregate bandwidth
![Page 32: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/32.jpg)
Hadoop, Why?
▐ Need to process multi-petabyte datasets
▐ Need common infrastructure
  – Efficient, reliable, open source (Apache License)
▐ The above goals are the same as Condor's, but workloads are I/O bound and not CPU bound

Hive, Why?
▐ Need a multi-petabyte warehouse
▐ Hive is a Hadoop subproject!
![Page 33: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/33.jpg)
What is MapReduce?
▐ Data-parallel programming model for clusters of commodity machines
▐ Pioneered by Google
  – Processes 20 PB of data per day
▐ Popularized by open-source Hadoop project
  – Used by Yahoo!, Facebook, Amazon, …

Hadoop at Facebook
▐ Production cluster
  – 4800 cores, 600 machines, 16 GB per machine – April 2009
  – 8000 cores, 1000 machines, 32 GB per machine – July 2009
  – 4 SATA disks of 1 TB each per machine
  – 2-level network hierarchy, 40 machines per rack
  – Total cluster size is 2 PB, projected to be 12 PB in Q3 2009
▐ Test cluster
  – 800 cores, 16 GB each
![Page 34: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/34.jpg)
2016 - Hadoop clusters
▐ ~20,000 machines running Hadoop
▐ Largest clusters are currently 2000 nodes
▐ Several petabytes of user data (compressed, unreplicated)
▐ Run hundreds of thousands of jobs every month
![Page 35: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/35.jpg)
2016 - Big Data Server Farm
![Page 36: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/36.jpg)
Conclusions
The Digital Age brings many opportunities but also challenges.
Big Data and Analytics can help meet the challenges and realize the opportunities.
It is within anyone’s grasp; do it incrementally and iteratively.
Hadoop cloud solutions are scalable, flexible, and cost-efficient, but sometimes limited in functionality (or not standardized).
Good Data Scientists are needed in a team of mixed competences to make the right choices.
![Page 37: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/37.jpg)
Conclusions
![Page 38: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/38.jpg)
![Page 39: Sharing bisnis big data v3 part1](https://reader034.fdocuments.in/reader034/viewer/2022042706/5885da961a28ab906d8b4be3/html5/thumbnails/39.jpg)
QUESTIONS?
Q&A
Thanks