Yarn at LinkedIn
Date posted: 22-Sep-2014
Category: Technology
![Page 1: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/1.jpg)
©2013 LinkedIn Corporation. All Rights Reserved. ENGINEERING
Welcome to YARN Meetup
September 2013
![Page 2: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/2.jpg)
YARN @ LinkedIn
State of the Art
Mohammad Islam
![Page 3: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/3.jpg)
Early Adopter
YARN is a good fit for many LinkedIn problems
Many initiatives by multiple teams
LI engineers enjoy the fun of emerging technologies
![Page 4: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/4.jpg)
Early Adopter
Samza: Real-time stream processing system
– Developed by LinkedIn team
– Apache Incubator project
– Uses YARN and Kafka
– Detailed presentation coming later today
![Page 5: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/5.jpg)
Early Adopter
Helix: Generic cluster management system
– Built and used in LinkedIn
– Apache Incubator project
– Incorporating YARN resource management
– Stay tuned to learn more today
![Page 6: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/6.jpg)
Early Adopter
Not yet open sourced
– A few projects are incubating at LI
– Mostly around custom and near-realtime execution engines
– Status: some are in POC and some are in the design stage
![Page 7: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/7.jpg)
Early Adopter
Administering YARN:
– One of the pioneers of a 2.1.0-beta prod-like deployment
– Led by our Ops/Dev team
– Found a lot of issues
    Kerberos auth (YARN-621 & others)
– Contributing back to Apache to stabilize YARN
    Streamlined operational tools (HADOOP-9902)
![Page 8: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/8.jpg)
Early Adopter
Pig on Tez:
Actively working with the Pig community
Hosted a small “Pig on Tez” dev meeting
– Participants include: Yahoo, HortonWorks, Netflix and LinkedIn
Developed a high-level implementation plan
![Page 9: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/9.jpg)
Apache Giraph on YARN
![Page 10: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/10.jpg)
Overview of Giraph
A distributed graph processing framework
– Master/slave architecture
– In-memory computation
– Vertex-centric high-level programming model
– Based on Bulk Synchronous Parallel (BSP)
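To make the vertex-centric BSP model concrete, here is a minimal single-process sketch in Python. It is not Giraph code and does not use Giraph's API: the function name `pagerank_bsp`, its parameters, and the damping value are illustrative assumptions. Each vertex updates its value from the messages it received in the previous superstep, sends new messages along its out-edges, and a barrier separates one superstep from the next.

```python
from collections import defaultdict


def pagerank_bsp(out_edges, supersteps=30, damping=0.85):
    """Toy vertex-centric PageRank run as BSP supersteps.

    out_edges maps each vertex id to the list of vertices it links to.
    Every vertex is assumed to have at least one out-edge.
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    inbox = defaultdict(list)  # messages visible in the current superstep

    for step in range(supersteps):
        outbox = defaultdict(list)
        for v, targets in out_edges.items():
            # "compute" phase: a vertex sees only its own state and inbox
            if step > 0:
                rank[v] = (1 - damping) / n + damping * sum(inbox[v])
            # send an equal share of this vertex's rank along each out-edge
            for t in targets:
                outbox[t].append(rank[v] / len(targets))
        # barrier: messages sent now are delivered in the next superstep
        inbox = outbox
    return rank


ranks = pagerank_bsp({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In a real deployment the vertices are partitioned across workers and the barrier is coordinated by the master, but the per-vertex logic keeps this shape.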
![Page 11: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/11.jpg)
Quick History
HortonWorks/LinkedIn intern (Eli) wrote the early version of Giraph AM
Based on 2.0.3
Since then YARN has evolved a lot!
API overhauled
Action: Overhaul Giraph on YARN
![Page 12: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/12.jpg)
Giraph on YARN
[Diagram: Client, Resource Manager and ZooKeeper, plus three Node Managers hosting the App Master (App Mstr), the Master and four Workers]
![Page 13: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/13.jpg)
New Giraph AM
Giraph AM: Nearly a complete rewrite by LinkedIn Hadoop dev
– Used new stable API
– Adopted new asynchronous/event-based model
– Status: Patch ready
Client
– Used new API
– Status: Patch ready
Security
– Added Kerberos support for Giraph YARN client and AM
– Status: Testing
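The asynchronous/event-based model mentioned above can be pictured with a toy Python sketch. Everything here is hypothetical: `ToyAppMaster`, `run_event_loop`, and the event names are invented for illustration and are not YARN or Giraph APIs. The callback names only loosely echo the callback-handler style of Hadoop's Java `AMRMClientAsync`; the slides do not say which client class the new AM uses.

```python
import queue


class ToyAppMaster:
    """Hypothetical event-driven application master (not the YARN API)."""

    def __init__(self, workers_needed):
        self.workers_needed = workers_needed
        self.running = set()
        self.completed = 0

    def on_containers_allocated(self, container_ids):
        # callback: start one worker in each container that was granted
        self.running.update(container_ids)

    def on_containers_completed(self, container_ids):
        # callback: record workers that have exited
        for cid in container_ids:
            self.running.discard(cid)
            self.completed += 1

    def done(self):
        return self.completed == self.workers_needed


def run_event_loop(am, events):
    """Dispatch queued (event_name, payload) pairs to the matching callback."""
    handlers = {
        "allocated": am.on_containers_allocated,
        "completed": am.on_containers_completed,
    }
    while not am.done():
        name, payload = events.get_nowait()
        handlers[name](payload)


# Simulated event stream standing in for resource manager notifications.
events = queue.Queue()
for event in [
    ("allocated", ["c1", "c2"]),
    ("allocated", ["c3"]),
    ("completed", ["c1", "c3"]),
    ("completed", ["c2"]),
]:
    events.put(event)

am = ToyAppMaster(workers_needed=3)
run_event_loop(am, events)
```

The point of the pattern is that the AM reacts to allocations and completions as they arrive instead of blocking in a polling loop.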
![Page 14: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/14.jpg)
Memory Footprint - Page Rank Algorithm
Iteration 3: Reachable 1.5 GB, Unreachable 3 GB
Iteration 27: Reachable 1.5 GB, Unreachable 6 GB
![Page 15: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/15.jpg)
Challenges in Giraph
Memory-intensive Java-based system
Various (GC) knobs to tune the system and application
Depends heavily on skillful application developers
Performance degradation from scaling up
Not a good player for multi-tenant systems
![Page 16: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/16.jpg)
Future Direction
Option 1: “Worker” in C++
– C++ provides direct control over memory management
– No need to rewrite the whole Giraph
Issue: Adoption barrier
– Writing C++ application
– Possible solution: Giraph scripting language
    Like Hive or Pig
Option 2: Off-heap memory usage
Option 3: Leave it alone!
![Page 17: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/17.jpg)
Final Thoughts on Giraph
LinkedIn is the first to run Giraph on YARN
Successfully executed full LinkedIn graph run
– Page Rank algorithm
– 200M+ vertices and XX billion edges
– On 40-node cluster with 650GB memory
– Total time taken: 28 minutes
Ready to go!
Scope for improvements utilizing YARN’s flexibility
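For a rough sense of scale, the figures on this slide can be turned into a back-of-the-envelope memory budget. This is a sketch under stated assumptions, not a measurement: it assumes the 650GB is the cluster-wide total rather than per node, treats "200M+" as exactly 200 million (so the per-vertex figure is an upper bound), and uses decimal gigabytes.

```python
NODES = 40
TOTAL_MEMORY_GB = 650       # assumption: cluster-wide total, not per node
VERTICES = 200_000_000      # lower bound; the slide says "200M+"

# Average memory available on each node.
memory_per_node_gb = TOTAL_MEMORY_GB / NODES

# Upper bound on bytes per vertex, covering its edges and messages too.
bytes_per_vertex = TOTAL_MEMORY_GB * 10**9 / VERTICES

print(f"{memory_per_node_gb:.2f} GB per node")
print(f"{bytes_per_vertex:.0f} bytes per vertex at most")
```

Under those assumptions each node contributes about 16 GB, and each vertex, together with its share of edges and messages, has at most a few kilobytes.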
![Page 18: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/18.jpg)
Challenges in YARN
Failover of various components (RM/AM etc.)
API stabilization – almost there!
Representative examples for quick dev ramp-up
Better documentation
– Book on its way!
Operational friendliness
– Centralized logging
– SLA support – timed resource constraints
![Page 19: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/19.jpg)
Concluding on YARN
YARN is the way to go forward!
Reduce the innovation barrier
Support non-MR execution platforms
Improved utilization/performance
– By removing the split of map/reduce slots
– Through distribution of JobTracker (JT) responsibility
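The utilization point about removing the map/reduce slot split can be illustrated with a small Python sketch. It is a deliberately simplified model: the function names are invented, and every task is assumed to fit in one slot or one equally sized container, which real schedulers do not guarantee.

```python
def runnable_with_slots(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Fixed slots: a slot can only run tasks of its own type."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def runnable_with_containers(containers, pending_maps, pending_reduces):
    """Generic containers: any free container can run any pending task."""
    return min(containers, pending_maps + pending_reduces)


# A map-heavy phase on a node pool sized for 6 map and 4 reduce slots.
with_slots = runnable_with_slots(6, 4, pending_maps=10, pending_reduces=0)
with_containers = runnable_with_containers(10, pending_maps=10, pending_reduces=0)
print(with_slots, with_containers)
```

In this scenario the four reduce slots sit idle while map tasks wait, whereas the same capacity expressed as ten interchangeable containers runs all ten tasks at once.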
![Page 20: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/20.jpg)
Q&A
Thanks for coming!
![Page 21: Yarn at LinkedIn](https://reader033.fdocuments.in/reader033/viewer/2022061104/541f30717bef0ab16e8b46d2/html5/thumbnails/21.jpg)
Giraph Architecture
Master / Workers
ZooKeeper
[Diagram: one Master and nine Workers]