Pepper: An Elastic Web Server Farm for Cloud based on Hadoop

Subramaniam Krishnan, Jean Christophe Counio
Yahoo! Inc. MAPRED

1st December 2010

Slides: salsahpc.indiana.edu/CloudCom2010/slides/PDF/Pepper...

Agenda

Motivation

Design

Features

Applications

Evaluation

Conclusion

Future Work

Motivation

What’s in a Name

Wave 1: Grid-ification

• Crunch 10-100s of GBs of data in hours

• Large data like Wikipedia

• Hosted, multi-tenant platform

• Grid workflow management system (PacMan)

Wave 2: Content Freshness

• Process 100s of feeds/sec, size in KBs, in seconds

• Web feeds like breaking news, tweets, finance quotes

• Scalable, high throughput & low latency platform

• Pepper: elastic web server farm on grid

Requirements

Elastic: handle intra/inter application load variance

Multi-tenant: provide process/memory isolation

Sub-second platform overhead

Simple API

Execute user code in platform context

Reliability: transparent fault tolerance

Design

Deployment Flow

• Job Manager: web application deployed as a WAR onto HDFS

• Embedded Jetty server runs in Map task, registers with ZooKeeper

• 1 Hadoop job = 1 Map task = 1 Web Server = 1 Web application
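
The deployment flow above can be illustrated with a minimal sketch. It is not Pepper's code: `Registry` is an in-memory stand-in for ZooKeeper, and the names `deploy`, `node{i}` and `base_port` are hypothetical. It only shows the 1 job = 1 map task = 1 web server = 1 web application mapping, with each server registering itself once started.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Registry:
    """In-memory stand-in for ZooKeeper (assumption): app name -> server addresses."""
    servers: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, app: str, address: str) -> None:
        self.servers.setdefault(app, []).append(address)


def deploy(registry: Registry, app: str, instances: int, base_port: int = 8000) -> List[str]:
    """Model `instances` single-map-task jobs, each hosting one web server for `app`."""
    addresses = []
    for i in range(instances):
        address = f"node{i}:{base_port + i}"  # hypothetical host:port
        registry.register(app, address)       # the server registers itself once it is up
        addresses.append(address)
    return addresses


registry = Registry()
deploy(registry, "news-feeds", instances=2)
print(registry.servers)
```

Scaling an application out then amounts to calling `deploy` again with more instances, which matches the elasticity requirement stated earlier.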

Processing Flow

• Proxy Router receives incoming requests, looks up ZooKeeper & redirects to appropriate Web Server
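
A rough sketch of the lookup-and-redirect step follows. The slides do not state how the Proxy Router chooses among several servers for the same application, so round-robin is an assumption here, and the plain dictionary stands in for the ZooKeeper lookup. `ProxyRouter` and `route` are hypothetical names.

```python
from typing import Dict, List


class ProxyRouter:
    """Picks a registered web server for an application (round-robin is an assumption)."""

    def __init__(self, registry: Dict[str, List[str]]):
        self.registry = registry          # stand-in for the ZooKeeper lookup
        self._next: Dict[str, int] = {}   # per-application request counter

    def route(self, app: str) -> str:
        servers = self.registry.get(app)
        if not servers:
            raise LookupError(f"no web server registered for {app!r}")
        i = self._next.get(app, 0)
        self._next[app] = i + 1
        # Index into the current list so newly registered servers are picked up.
        return servers[i % len(servers)]


router = ProxyRouter({"news-feeds": ["node0:8000", "node1:8001"]})
print([router.route("news-feeds") for _ in range(3)])
```

Reading the server list on every call, rather than caching it, means that instances added or removed in the registry take effect on the next request.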

ZooKeeper Hierarchy

Features

• Scalability: A web application scales by configuring more instances (elasticity); the system scales with the addition of Hadoop nodes

• Performance: High throughput by ensuring that all the heavy lifting is done during deployment

• High Availability/Self-healing: Redundant server instances. Health check piggybacked on TaskTracker heartbeat

• Isolation: Hadoop map provides process isolation

• Ease of Development: Standard Servlet API & WAR packaging

• Reuse of Grid Infrastructure: The system runs on a Grid that can be shared across several applications
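
The self-healing idea can be sketched as a liveness filter over heartbeat timestamps. This is only an illustration: in Pepper the health check rides on the TaskTracker heartbeat, whereas the function below, its name `live_servers`, and the timestamps and timeout are all hypothetical.

```python
from typing import Dict, List


def live_servers(last_heartbeat: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Return servers whose most recent heartbeat is at most `timeout` seconds old."""
    return sorted(s for s, t in last_heartbeat.items() if now - t <= timeout)


# Hypothetical timestamps in seconds; node1 last reported 35 s ago and is dropped.
heartbeats = {"node0:8000": 100.0, "node1:8001": 70.0}
print(live_servers(heartbeats, now=105.0, timeout=30.0))  # ['node0:8000']
```

With redundant instances per application, dropping a stale server from the routable set leaves the remaining instances to absorb its traffic until a replacement registers.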

Applications

• Web Feeds Processing: Configure the workflow orchestration engine to run in-memory; 1 workflow = 1 web application. Benefits: scalability, isolation, and avoiding the Hadoop job bootstrap latency and the HDFS small files bottleneck.

• Online Clustering: Extracts features and assigns incoming feeds to clusters predetermined by offline clustering. Performed online for Yahoo! News to identify hot news clusters during ingestion of articles.

Evaluation

Setup

• Hardware: Intel Xeon L5420 2.50GHz with 8GB DDR2 RAM

• Software: 64-bit Sun JDK 1.6 update 18 on RHEL AS 4 U8, Linux 2.6.9-89.ELsmp x86_64

• Configuration: 8 map slots/node with 512MB heap, 25 threads/Jetty server

• Number of Hadoop compute nodes: 3

Linear Scaling for Predefined Capacity

• Throughput: number of requests handled successfully per second for a specified number of tasks
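
As a small worked example of this definition, the sketch below computes throughput and the per-task share. The function names and all numbers are hypothetical and are not measurements from the talk. Under linear scaling, the per-task figure should stay roughly constant as tasks are added.

```python
def throughput(successful_requests: int, seconds: float) -> float:
    """Requests handled successfully per second over a measurement window."""
    if seconds <= 0:
        raise ValueError("measurement window must be positive")
    return successful_requests / seconds


def per_task_throughput(total: float, tasks: int) -> float:
    """Share of the total throughput attributable to each map task."""
    return total / tasks


# Hypothetical run: 1200 successful requests in 60 s across 4 tasks.
total = throughput(1200, 60.0)
print(total, per_task_throughput(total, 4))  # 20.0 5.0
```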

Elastic Scaling for Dynamic Capacity

• Rejection: failure to execute within predefined timeout

• Load is increased and an additional map task is allocated at points A and B, based on a predefined schedule

• Failure rate of < 0.001% observed in Production
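
The rejection definition above can be written as a short calculation. This is a sketch with hypothetical latencies and a hypothetical 5-second timeout; `None` marks a request that never completed. For scale, a failure rate of 0.001% corresponds to 1 request in 100,000.

```python
from typing import List, Optional


def rejection_rate(latencies: List[Optional[float]], timeout: float) -> float:
    """Fraction of requests that did not execute within `timeout` seconds."""
    if not latencies:
        return 0.0
    rejected = sum(1 for t in latencies if t is None or t > timeout)
    return rejected / len(latencies)


# Hypothetical sample: one request never finished, one took 6 s against a 5 s timeout.
print(rejection_rate([0.5, 1.2, None, 6.0], timeout=5.0))  # 0.5
```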

Pepper Performance Numbers

System | Burst Rate (requests/min) | Throughput (requests/day) | Platform Latency (Avg.) | Response Time (Avg.)

Pepper | 2,000 | 3 million | 75 ms | 4 s

PacMan | 50 | 10,000 | 90 s | 120 s

• Dataset is Yahoo! News feeds with sizes < 1MB

• Processing is typically computation-intensive, e.g. enriching web feeds through validation, normalization, geo-tagging, persistence in service stores, etc.
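
To put the reported figures side by side, the following sketch computes the improvement factor of Pepper over PacMan for each column. The values come from the performance table on this slide; the dictionary keys and the `improvement` helper are illustrative names, and times are converted to milliseconds so the division is exact.

```python
# Figures from the performance table; times converted to milliseconds.
pepper = {"burst_per_min": 2_000, "requests_per_day": 3_000_000,
          "latency_ms": 75, "response_ms": 4_000}
pacman = {"burst_per_min": 50, "requests_per_day": 10_000,
          "latency_ms": 90_000, "response_ms": 120_000}


def improvement(metric: str, higher_is_better: bool) -> float:
    """Factor by which Pepper improves on PacMan for one metric."""
    a, b = pepper[metric], pacman[metric]
    return a / b if higher_is_better else b / a


ratios = {
    "burst_per_min": improvement("burst_per_min", True),        # 40.0
    "requests_per_day": improvement("requests_per_day", True),  # 300.0
    "latency_ms": improvement("latency_ms", False),             # 1200.0
    "response_ms": improvement("response_ms", False),           # 30.0
}
print(ratios)
```

The largest gap is in platform latency (75 ms against 90 s), which is consistent with avoiding the Hadoop job bootstrap on every request.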

Conclusion

• Pepper marries the benefits of traditional server farms (low latency and high throughput) with those of the cloud (elasticity and isolation)

• In production within Yahoo! since December 2009

• Current Y! properties: Newspaper Consortium, Finance & News. Sports & Entertainment are in the pipeline

• System scales linearly with addition of more Hadoop computing nodes

Future Work

• On demand allocation of servers

• Experimenting with async NIO between Proxy Router & Map Web Engine to increase scalability

• Improving distribution of requests across web servers

• Integrate into Hadoop (?)

References

• Hadoop, Web Page http://hadoop.apache.org/

• J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, 6th Symposium on Operating Systems Design and Implementation (OSDI’04), San Francisco, CA, December 2004, pp. 137–150

• P. Hunt, M. Konar, F.P. Junqueira, and B. Reed, “ZooKeeper: Wait-free coordination for Internet-scale systems”, Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, Boston, MA, June 2010, pp. 11-11

• Oozie (successor to PacMan), Web Page http://yahoo.github.com/oozie/, http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3-b2-oozie/

Questions?