Accelerating Startup Growth with Big Data
Dwika Sudrajat, IT Consultant
Florida, Hong Kong & Jakarta. November 23rd, 2016
Part 2▐ email: [email protected]▐ Florida: +1-407-2502812▐ Hong Kong: +852-54152971▐ Jakarta: +62-8161108571▐ FB: dwika.sudrajat▐ TW: @dwikasudrajat▐ managingconsultant.blogspot.com▐ dwikasudrajat.blogspot.com▐ dwikasudrajat.wordpress.com
FOUNDED IN 2004
Case Study: Facebook
INDEX
1. ARCHITECTURE OVERVIEW
2. SOFTWARE & SCALABILITY: LAMP, LINUX & APACHE, MYSQL, PHP & HIPHOP, DISADVANTAGES OF LAMP, MEMCACHED
3. HADOOP ECOSYSTEM: HADOOP, HIVE
MORE THAN 60,000 SERVERS
THE LATEST DATACENTER IS BASED ENTIRELY ON SELF-DESIGNED HARDWARE THAT WAS RECENTLY UNVEILED AS THE “OPEN COMPUTE PROJECT”
300 TB OF DATA STORED IN MEMCACHED PROCESSES
Scaling challenge
100 BILLION HITS, 50 BILLION PHOTOS, 3 TRILLION OBJECTS CACHED, 130 TB OF LOGS PER DAY
THE HADOOP AND HIVE CLUSTER IS MADE OF 3,000 SERVERS, EACH WITH 8 CORES, 32 GB RAM AND 12 TB OF DISK
TOTAL: 24,000 CORES, 96 TB RAM AND 36 PB OF DISK
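The cluster totals can be cross-checked against the per-server figures. The sketch below assumes identical servers and decimal units (1 TB = 1000 GB, 1 PB = 1000 TB); the variable names are illustrative only. Under those assumptions the quoted totals correspond to 3,000 servers.

```python
# Per-server figures quoted on the slide.
CORES_PER_SERVER = 8
RAM_GB_PER_SERVER = 32
DISK_TB_PER_SERVER = 12

# Total core count quoted on the slide.
TOTAL_CORES = 24_000

# Infer the server count from the core total (assumption: identical servers).
servers = TOTAL_CORES // CORES_PER_SERVER

# Decimal units are assumed: 1 TB = 1000 GB and 1 PB = 1000 TB.
total_ram_tb = servers * RAM_GB_PER_SERVER / 1000
total_disk_pb = servers * DISK_TB_PER_SERVER / 1000

print(servers, total_ram_tb, total_disk_pb)  # 3000 96.0 36.0
```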
ARCHITECTURE OVERVIEW
Front end & Back end
FRONTEND: presentation layer
BACKEND: presentation layer
Business & data access layers
DATABASE
VISITORS, WEB SERVER, STAFF
Architecture of Facebook
Components of Facebook
PHP & HIPHOP
PARSER → STATIC ANALYZER → PRE-OPTIMIZER → TYPE INFERENCE ENGINE → POST-OPTIMIZER → CODE GENERATOR → g++
FACEBOOK’S HIPHOP IS A SOURCE CODE TRANSFORMER THAT CONVERTS PHP INTO C++ AND COMPILES IT USING G++, THUS PROVIDING A HIGH-PERFORMANCE TEMPLATING AND WEB LOGIC EXECUTION LAYER
DISADVANTAGES OF LAMP
FACEBOOK HAS REALIZED THAT THERE ARE DISADVANTAGES TO USING THE LAMP STACK: PHP IS NOT NECESSARILY OPTIMIZED FOR LARGE WEBSITES AND IS THEREFORE DIFFICULT TO SCALE.
IT IS NOT THE FASTEST EXECUTING LANGUAGE, AND ITS EXTENSION FRAMEWORK IS DIFFICULT TO USE
Diagram labels: Browser, Web/App Server, Database
Diagram flows: HTTP Request, HTML, HTTP Request, FBML, API/FQL, Response
CLIENT / SERVER: PUT/GET/REMOVE (Sync)
Memcached
WebServer 1, WebServer 2, WebServer 3
MemCachedServer Partition 1, MemCachedServer Partition 2, MemCachedServer Partition 3, MemCachedServer Partition 4
Memory Management using Memcached
05/03/2023
Memcached-based Architecture
Protects the main database from high read demands from users
HADOOP HIVE
APACHE HIVE IS A DATA WAREHOUSE INFRASTRUCTURE BUILT ON TOP OF HADOOP FOR PROVIDING DATA SUMMARIZATION, QUERY AND ANALYSIS, DEVELOPED BY FACEBOOK
HADOOP WAS BUILT TO ORGANIZE AND STORE MASSIVE AMOUNTS OF DATA
HIVE ALLOWS USERS TO EXPLORE AND STRUCTURE THAT DATA, ANALYZE IT AND THEN TURN IT INTO BUSINESS INSIGHT
FAMILIAR, SCALABLE & EXTENSIBLE, FAST, INFORMATIVE
SCRIBE
Server logs
IT IS A SERVER FOR AGGREGATING LOG DATA STREAMED IN REAL TIME FROM MANY OTHER SERVERS. IT IS A SCALABLE FRAMEWORK USEFUL FOR RECORDING A WIDE RANGE OF DATA.
IT IS BUILT ON TOP OF THRIFT.
DATA SUCH AS LOGINS, CLICKS AND FEEDS TRANSIT THROUGH SCRIBE AND ARE AGGREGATED AND STORED IN HDFS USING SCRIBE-HDFS, ALLOWING EXTENDED ANALYSIS USING MAPREDUCE
MOVES DATA FROM THE SERVERS TO A CENTRAL REPOSITORY
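The flow described here (many servers forwarding categorized log entries to one central store, followed by a MapReduce-style summary) can be sketched as below. This is an in-memory illustration only: `CentralLogStore`, `forward` and `count_by_category` are hypothetical names, the store stands in for HDFS, and nothing here uses the real Scribe or Thrift APIs.

```python
from collections import Counter, defaultdict


class CentralLogStore:
    """Hypothetical stand-in for the central repository (HDFS on the slide)."""

    def __init__(self):
        self.by_category = defaultdict(list)

    def log(self, category: str, message: str) -> None:
        self.by_category[category].append(message)


def forward(server_buffer: list, store: CentralLogStore) -> None:
    """Move buffered (category, message) pairs from one server to the store."""
    while server_buffer:
        category, message = server_buffer.pop(0)
        store.log(category, message)


def count_by_category(store: CentralLogStore) -> Counter:
    """MapReduce-style summary: map each entry to (category, 1), then sum."""
    mapped = (
        (category, 1)
        for category, messages in store.by_category.items()
        for _ in messages
    )
    counts = Counter()
    for category, one in mapped:
        counts[category] += one
    return counts


# Two web servers, each with locally buffered log entries.
web1 = [("login", "user 1"), ("click", "home")]
web2 = [("click", "photo"), ("feed", "story 9"), ("click", "like")]

store = CentralLogStore()
for buffer in (web1, web2):
    forward(buffer, store)

counts = count_by_category(store)
print(counts["click"])  # 3
```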
Storing
• Apache Hadoop is being used in three broad types of systems:
• as a warehouse for web analytics
• as storage for a distributed database
• and for MySQL database backups.
VARNISH CACHE
IT IS USED FOR HTTP PROXYING. THEY USE IT FOR ITS HIGH PERFORMANCE AND EFFICIENCY
Diagram: Request and Response pass through the Caching Proxy placed in front of the Web Server
WEB APPLICATION ACCELERATOR
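The idea of a caching proxy in front of a web server can be illustrated with a toy example. This is not Varnish configuration (Varnish is configured in its own language, VCL); it is a minimal Python sketch in which `CachingProxy`, `web_server` and the 60-second time-to-live are all assumptions chosen for illustration.

```python
import time


class CachingProxy:
    """Toy caching proxy: serves repeated requests without hitting the backend."""

    def __init__(self, backend, ttl_seconds: float = 60.0, clock=time.monotonic):
        self.backend = backend          # callable: path -> response body
        self.ttl_seconds = ttl_seconds  # how long a cached response stays fresh
        self.clock = clock              # injectable clock, handy for testing
        self.cache = {}                 # path -> (expires_at, body)

    def handle(self, path: str) -> str:
        now = self.clock()
        entry = self.cache.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]             # cache hit: the web server is not called
        body = self.backend(path)       # miss or expired: ask the web server
        self.cache[path] = (now + self.ttl_seconds, body)
        return body


backend_calls = []


def web_server(path: str) -> str:
    """Stand-in for the real web server; records every request it receives."""
    backend_calls.append(path)
    return f"<html>page for {path}</html>"


proxy = CachingProxy(web_server, ttl_seconds=60.0)
first = proxy.handle("/home")
second = proxy.handle("/home")
print(len(backend_calls))  # 1
```

The second request for the same path is answered from the proxy's cache, which is where the acceleration comes from.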
HAYSTACK
THE STORAGE OF THE BILLIONS OF PHOTOS POSTED BY USERS IS HANDLED BY THIS AD HOC STORAGE SOLUTION DEVELOPED BY FACEBOOK, WHICH BRINGS LOW-LEVEL OPTIMIZATIONS AND APPEND-ONLY WRITES
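Append-only writes can be illustrated with a toy blob store: every photo is written at the end of one large volume, and an in-memory index maps each photo ID to its offset and size. This is a sketch of the general technique under stated assumptions, not Haystack's actual format; `AppendOnlyStore` and its methods are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import io


class AppendOnlyStore:
    """Toy append-only blob store with an in-memory index."""

    def __init__(self):
        self.volume = io.BytesIO()  # stands in for one large file on disk
        self.index = {}             # photo_id -> (offset, size)

    def put(self, photo_id: str, data: bytes) -> None:
        offset = self.volume.seek(0, io.SEEK_END)   # always write at the end
        self.volume.write(data)
        self.index[photo_id] = (offset, len(data))  # newest copy wins

    def get(self, photo_id: str) -> bytes:
        offset, size = self.index[photo_id]
        self.volume.seek(offset)
        return self.volume.read(size)

    def delete(self, photo_id: str) -> None:
        # Only the index entry is dropped; the bytes stay in the volume.
        del self.index[photo_id]


store = AppendOnlyStore()
store.put("p1", b"cat")
store.put("p2", b"sunset")
store.put("p1", b"cat-v2")  # an update is just another append
print(store.get("p1"))      # b'cat-v2'
```

Because nothing is rewritten in place, a read needs only one index lookup and one seek; the cost is that stale bytes accumulate until the volume is compacted.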
QUESTIONS?
Q&A
Thanks