Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency...
![Page 1: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/1.jpg)
Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17)
Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthik Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, Keith Winstein
https://github.com/excamera
![Page 2: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/2.jpg)
Outline
• Vision & Goals
• Background on Video Processing
• Fine-grained Parallel Video Encoding
• μ: Supercomputing as a Service
• Evaluation
• Conclusion & Future Work
![Page 3: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/3.jpg)
![Page 4: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/4.jpg)
Google Docs for Video*
* not really a Google product (yet)
![Page 5: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/5.jpg)
"Make this movie black and white."
![Page 6: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/6.jpg)
"Apply some awesome filter to my video."
![Page 7: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/7.jpg)
"Pixelate this face everywhere in this video."
![Page 8: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/8.jpg)
"Remake Star Wars Episode I without Jar Jar."
![Page 9: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/9.jpg)
Can we achieve interactive speeds with very granular video processing in a distributed system?
![Page 10: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/10.jpg)
The dilemmas
• Low-latency video processing requires fine-grained parallelism, but the finer-grained the parallelism, the worse the compression efficiency.
• Even if we have a way to avoid this penalty, we still need thousands of threads, running in parallel, with instant startup.
![Page 11: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/11.jpg)
ExCamera
• In this project, we tried to directly address these two dilemmas.
• We made two contributions:
• A video encoder intended for massive fine-grained parallelism (the first dilemma).
• A framework that orchestrates thousands of threads running in parallel on AWS Lambda (the second dilemma).
• We call the whole system ExCamera.
![Page 12: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/12.jpg)
Fine-grained Cloud Computing
• Parallel make
• "Laptop Extension"
![Page 13: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/13.jpg)
![Page 14: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/14.jpg)
What's a video?
• A series of still images, displayed in order, at a specific rate.
• Movies are usually shown at 24 frames per second.
![Page 15: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/15.jpg)
4K Video: 4096×2160 pixels/frame
~14 MB/frame
24 fps → 2.5 Gbps
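The numbers on this slide can be roughly reproduced with a little arithmetic. The sketch below assumes 8-bit 4:2:0 chroma subsampling (1.5 bytes per pixel), which the slide does not state; under that assumption the frame size comes out slightly below the slide's rounded ~14 MB.

```python
WIDTH, HEIGHT = 4096, 2160   # 4K frame size, from the slide
BYTES_PER_PIXEL = 1.5        # assumption: 8-bit YUV 4:2:0 (not stated on the slide)
FPS = 24                     # frame rate, from the slide

def raw_frame_bytes(width=WIDTH, height=HEIGHT, bpp=BYTES_PER_PIXEL):
    """Size of one uncompressed frame in bytes."""
    return width * height * bpp

def raw_bitrate_bps(fps=FPS):
    """Bitrate of the uncompressed video in bits per second."""
    return raw_frame_bytes() * 8 * fps

print(f"{raw_frame_bytes() / 1e6:.1f} MB/frame")
print(f"{raw_bitrate_bps() / 1e9:.2f} Gbps")
```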
![Page 16: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/16.jpg)
Video Codec
• A piece of software or hardware that compresses and decompresses digital video.
[Figure: Encoder → compressed bitstream (1011000101101010...) → Decoder]
![Page 17: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/17.jpg)
How to compress?
• Compress each frame individually.
• JPEG compression of a 4K frame: ~1 MB/frame → 192 Mbps at 24 fps
![Page 18: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/18.jpg)
How to compress?
• Exploit the temporal redundancy in adjacent frames.
• Store the first frame in its entirety: a key frame.
• For other frames, we could just store the “diff” with the previous frame: interframes.
![Page 19: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/19.jpg)
Encoder & Decoder
encode(img[1..n]) → [kf] + if[2..n]
decode([kf] + if[2..n]) → img’[1..n]
(The encoder's output, [kf] + if[2..n], is the compressed video.)
![Page 20: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/20.jpg)
4K Video, encoded with VP8
• Key frame size: ~1 MB
• Interframe size: ~25 KB
![Page 21: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/21.jpg)
Traditional Parallel Video Encoding
encode(i[1..200]) → [kf1] + if[2..200]
⇣
encode(i[1..10]) → [kf1], if[2..10]
encode(i[11..20]) → [kf11], if[12..20]
encode(i[21..30]) → [kf21], if[22..30]
⋮
encode(i[191..200]) → [kf191], if[192..200]
Finer-grained parallelism → more frequent key frames
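To see why more frequent key frames hurt, here is a rough back-of-the-envelope sketch. It assumes every key frame costs ~1 MB and every interframe ~25 KB (the VP8 figures quoted earlier) regardless of chunk length, and a 24 fps frame rate; real frame sizes vary with content, so treat the output as illustrative only.

```python
KEY_FRAME_BYTES = 1_000_000   # ~1 MB per key frame (figure from the earlier slide)
INTERFRAME_BYTES = 25_000     # ~25 KB per interframe (figure from the earlier slide)
FPS = 24                      # assumed frame rate

def average_bitrate_mbps(frames_per_chunk):
    """Average bitrate when every chunk starts with its own key frame."""
    chunk_bytes = KEY_FRAME_BYTES + (frames_per_chunk - 1) * INTERFRAME_BYTES
    bytes_per_frame = chunk_bytes / frames_per_chunk
    return bytes_per_frame * 8 * FPS / 1e6

for n in (200, 10, 6, 1):
    print(f"{n:>3} frames/chunk: {average_bitrate_mbps(n):6.1f} Mbit/s")
```

Under these assumptions, shrinking the chunk from 200 frames to 6 multiplies the bitrate several times over, which is the penalty the first dilemma refers to.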
![Page 22: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/22.jpg)
![Page 23: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/23.jpg)
14.8-minute 4K Video
Encoding with vpxenc (VP8)
Single-Threaded ~7.5 hours
Multi-Threaded ~2.5 hours
![Page 24: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/24.jpg)
Decoder State
• Decoder needs to maintain some information that evolves with each decoded frame.
• Traditional video codecs do not expose this information.
encode(img[1..n]) → [kf] + if[2..n]
decode([kf] + if[2..n]) → img’[1..n]
• We call this the state and we made it explicit.
![Page 25: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/25.jpg)
Decoder State
[Figure: a source state]
![Page 26: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/26.jpg)
Decoder State
[Figure: a frame takes the source state to a target state]
![Page 27: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/27.jpg)
Decoder State
[Figure: a frame takes the source state to a target state and produces an output image]
![Page 28: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/28.jpg)
Decoder State
[Figure: a chain of states linked by a key frame followed by interframes]
![Page 29: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/29.jpg)
What we built: a video codec in explicit state-passing style
• VP8 decoder with no inner state:
decode(state, frame) → (state′, image)
• VP8 encoder: resume from specified state
encode(state, image) → (state′, interframe)
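The two signatures above can be illustrated with a toy codec. This is a minimal sketch, not the VP8 implementation: images are short lists of integers, the "state" is simply the previously reconstructed image (a real VP8 state is considerably richer), and `encode_key` is a hypothetical helper name introduced here.

```python
def encode_key(image):
    """Key frame: stores the whole image. Returns (state', frame)."""
    return list(image), ("key", list(image))

def encode(state, image):
    """Interframe: stores only the per-pixel difference from `state`."""
    diff = [p - s for p, s in zip(image, state)]
    return list(image), ("inter", diff)

def decode(state, frame):
    """Pure function (state, frame) -> (state', image); no hidden decoder state."""
    kind, payload = frame
    if kind == "key":
        image = list(payload)
    else:
        image = [s + d for s, d in zip(state, payload)]
    return list(image), image

# Encode three tiny "images": one key frame, then two interframes.
images = [[10, 10, 10], [10, 12, 10], [11, 12, 10]]
state, key = encode_key(images[0])
frames = [key]
for img in images[1:]:
    state, frame = encode(state, img)
    frames.append(frame)

# Decode by threading the state through explicitly.
state, decoded = None, []
for frame in frames:
    state, img = decode(state, frame)
    decoded.append(img)
```

Because the state is an ordinary value, it can be saved, sent to another worker, and used to resume encoding or decoding mid-stream.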
![Page 30: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/30.jpg)
ExCamera: Encoding, Fast and Slow
• Divide the work into tiny tasks (6 frames each).
• [Parallel, the slow work] Make tiny independent chunks.
• [Serial, the fast work] Stitch the chunks together and remove keyframes.
![Page 31: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/31.jpg)
ExCamera: Encoding, Fast and Slow
• Divide the video into 4-second batches.
• Each batch is further divided into 16 chunks of ¼ second each.
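The batch and chunk layout can be sketched as follows. The sketch assumes 24 fps, so that a ¼-second chunk is 6 frames; the function name `plan` and the 1-based inclusive frame ranges are choices made for this example.

```python
FPS = 24                  # assumed frame rate
FRAMES_PER_CHUNK = 6      # 1/4 second at 24 fps
CHUNKS_PER_BATCH = 16     # 16 chunks x 1/4 s = one 4-second batch

def plan(total_frames):
    """Return batches; each batch is a list of (first, last) frame numbers."""
    chunks = [(start, min(start + FRAMES_PER_CHUNK - 1, total_frames))
              for start in range(1, total_frames + 1, FRAMES_PER_CHUNK)]
    return [chunks[i:i + CHUNKS_PER_BATCH]
            for i in range(0, len(chunks), CHUNKS_PER_BATCH)]

batches = plan(total_frames=8 * FPS)   # an 8-second clip
print(len(batches), "batches;", "first chunk:", batches[0][0])
```

If the frame count is not a multiple of 6, the final chunk is simply shorter.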
![Page 32: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/32.jpg)
1. [Parallel] Download 6-frame chunk of raw video
[Figure: threads 1..4, each holding one 6-frame chunk: frames 1..6, 7..12, 13..18, 19..24]
![Page 33: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/33.jpg)
2. [Parallel] vpxenc → keyframe, interframe[5]
[Figure: threads 1..4, each holding one 6-frame chunk: frames 1..6, 7..12, 13..18, 19..24]
Google's VP8 encoder: encode(img[1..n]) → [kf] + if[2..n]
![Page 34: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/34.jpg)
3. [Parallel] decode → state ↝ next thread
[Figure: threads 1..4, each holding one 6-frame chunk: frames 1..6, 7..12, 13..18, 19..24]
Our explicit-state style decoder: decode(state, frame) → (state′, image)
![Page 35: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/35.jpg)
4. [Parallel] last thread’s state ↝ encode
[Figure: threads 1..4, each holding one 6-frame chunk: frames 1..6, 7..12, 13..18, 19..24]
Our explicit-state style encoder: encode(state, image) → (state′, interframe)
![Page 36: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/36.jpg)
4. [Parallel] last thread’s state ↝ encode
[Figure: threads 1..4, each holding one 6-frame chunk: frames 1..6, 7..12, 13..18, 19..24]
Adapt a frame to a different source state: rebase(state, image, interframe) → interframe′
![Page 37: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/37.jpg)
5. [Serial] last thread’s state ↝ rebase → state ↝ next thread
[Figure: threads 1..4, each holding one 6-frame chunk: frames 1..6, 7..12, 13..18, 19..24]
Adapt a frame to a different source state: rebase(state, image, interframe) → interframe′
![Page 38: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/38.jpg)
![Page 39: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/39.jpg)
6. [Parallel] Upload finished video
[Figure: threads 1..4, each holding one 6-frame chunk: frames 1..6, 7..12, 13..18, 19..24]
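Steps 2 to 5 can be sketched end to end with a toy lossy codec. This is an illustration under strong assumptions, not ExCamera's code: images are short integer lists, the state is the last reconstructed image, `Q`, `quantise`, `encode_key`, `exit_state`, and `excamera` are hypothetical names, and the toy `rebase` simply recomputes the residual against the new state and ignores the old interframe. Download and upload (steps 1 and 6) are omitted.

```python
from concurrent.futures import ThreadPoolExecutor

Q = 4  # hypothetical quantiser step; it makes the toy codec lossy

def quantise(values):
    return [(v + Q // 2) // Q for v in values]

def encode_key(image):
    levels = quantise(image)
    return [q * Q for q in levels], ("key", levels)

def encode(state, image):
    levels = quantise([p - s for p, s in zip(image, state)])
    return [s + q * Q for s, q in zip(state, levels)], ("inter", levels)

def decode(state, frame):
    kind, levels = frame
    if kind == "key":
        image = [q * Q for q in levels]
    else:
        image = [s + q * Q for s, q in zip(state, levels)]
    return list(image), image

def rebase(state, image, interframe):
    # Toy stand-in: ignores `interframe` and recomputes the residual.
    return encode(state, image)[1]

def vpxenc(chunk):
    """Encode one chunk independently: a key frame, then interframes."""
    state, key = encode_key(chunk[0])
    frames = [key]
    for image in chunk[1:]:
        state, frame = encode(state, image)
        frames.append(frame)
    return frames

def exit_state(frames):
    """Decode a chunk to find the state it leaves behind."""
    state = None
    for frame in frames:
        state, _ = decode(state, frame)
    return state

def excamera(chunks):
    with ThreadPoolExecutor() as pool:
        encoded = list(pool.map(vpxenc, chunks))        # step 2
        exits = list(pool.map(exit_state, encoded))     # step 3
        firsts = list(pool.map(                         # step 4
            lambda k: encode(exits[k - 1], chunks[k][0])[1],
            range(1, len(chunks))))
    video, state = list(encoded[0]), exits[0]
    for k in range(1, len(chunks)):                     # step 5 (serial)
        for image, frame in zip(chunks[k], [firsts[k - 1]] + encoded[k][1:]):
            frame = rebase(state, image, frame)
            state, _ = decode(state, frame)
            video.append(frame)
    return video

# 24 two-pixel "images" split into four 6-frame chunks
images = [[10 * i, 3 * i + 1] for i in range(24)]
chunks = [images[i:i + 6] for i in range(0, 24, 6)]
video = excamera(chunks)
```

The stitched output contains a single key frame, yet the expensive per-chunk encoding ran in parallel; only the rebase pass is serial.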
![Page 40: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/40.jpg)
Thousands of tiny threads
[Figure: the four-thread chunk diagram (frames 1..6, 7..12, 13..18, 19..24) tiled many times across the slide]
![Page 41: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/41.jpg)
![Page 42: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/42.jpg)
Execution Environment
• We need an execution environment where we can run thousands of threads in parallel, each for a few minutes.
• Cloud functions!
![Page 43: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/43.jpg)
AWS Lambda is an underutilized supercomputer
• Intended for event handlers and Web microservices
• Supports JavaScript (Node.js), Java, Python
• But in practice, you can upload a zip file containing a binary executable and simply exec it from the Python handler
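A minimal sketch of that trick is shown below. Lambda calls a Python function with `(event, context)`; everything else here is an assumption for illustration: the bundled binary name `./worker`, the `argv` event field, and the use of `subprocess.run` (which spawns a child process rather than replacing the handler process). The 300-second timeout mirrors a 5-minute execution limit.

```python
import subprocess
import sys

def run_binary(argv, timeout_s=300):
    """Run an executable and return (exit code, captured stdout)."""
    done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    return done.returncode, done.stdout

def handler(event, context):
    # "./worker" is a hypothetical binary shipped in the same zip file.
    argv = event.get("argv", ["./worker"])
    code, out = run_binary(argv)
    return {"exit_code": code, "stdout": out}

# Local smoke test: run the Python interpreter itself as the "binary".
result = handler({"argv": [sys.executable, "-c", "print('hello')"]}, None)
```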
![Page 44: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/44.jpg)
Lambda can launch 3,600 threads in 3 seconds
![Page 45: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/45.jpg)
Hardware
• 1.5 GiB RAM, 2.8 GHz CPU, very little disk space
• 5-minute execution limit
![Page 46: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/46.jpg)
Costs
• Price: 2.5 milli-cents per second (billed in 100 ms increments)
• 3,600 threads for one minute → $5.40
• Unique features:
• Sub-second billing,
• Thousands of threads,
• Fast startup.
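The $5.40 figure follows directly from the quoted price. The sketch below assumes each invocation is rounded up to the next 100 ms increment and that the quoted per-second price applies to every thread; the function names are made up for this example.

```python
import math

PRICE_PER_SECOND_USD = 2.5e-5   # 2.5 milli-cents per second, from the slide
BILLING_INCREMENT_MS = 100      # billed in 100 ms increments, from the slide

def billed_seconds(duration_ms):
    """Round a duration up to the next billing increment (assumed behaviour)."""
    increments = math.ceil(duration_ms / BILLING_INCREMENT_MS)
    return increments * BILLING_INCREMENT_MS / 1000

def cost_usd(threads, duration_ms):
    """Total cost of running `threads` invocations for `duration_ms` each."""
    return threads * billed_seconds(duration_ms) * PRICE_PER_SECOND_USD

print(f"${cost_usd(3600, 60_000):.2f}")
```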
![Page 47: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/47.jpg)
μ, supercomputing as a service
• We built mu, a library for designing and deploying massively parallel computations on AWS Lambda.
• ExCamera is just one example; the possibilities are endless.
![Page 48: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/48.jpg)
![Page 49: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/49.jpg)
Quality vs. Bitrate
[Plot: quality as SSIM (dB), axis 16 to 22, against average bitrate (Mbit/s), axis 5 to 70. Curves: vpx (single-threaded), vpx (multithreaded).]
![Page 50: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/50.jpg)
Quality vs. Bitrate
[Plot: quality as SSIM (dB), axis 16 to 22, against average bitrate (Mbit/s), axis 5 to 70. Curves: vpx (single-threaded), vpx (multithreaded), vpx (6-frame chunks).]
![Page 51: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/51.jpg)
Quality vs. Bitrate
[Plot: quality as SSIM (dB), axis 16 to 22, against average bitrate (Mbit/s), axis 5 to 70. Curves: vpx (single-threaded), vpx (multithreaded), vpx (6-frame chunks), ExCamera.]
![Page 52: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/52.jpg)
14.8-minute 4K Video
Encoding with vpxenc (VP8)
Single-Threaded ~7.5 hours
Multi-Threaded ~2.5 hours
ExCamera 2.1 minutes
![Page 53: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/53.jpg)
![Page 54: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/54.jpg)
Vision: Real-time Collaborative Video Processing
• Editing and sharing: Google Docs for Video.
• "Make this movie black and white."
• "Apply some awesome filter to my video."
• "Pixelate this face everywhere in this video."
• "Remake Star Wars Episode I without Jar Jar."
![Page 55: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/55.jpg)
Takeaways
• Low-latency video processing
• Two major contributions:
• A video encoder intended for massive fine-grained parallelism.
• A framework that orchestrates thousands of threads running in parallel on AWS Lambda.
• 64× faster than existing encoder, for less than $10.
• The future is granular & massively parallel
• Parallel make
• "Laptop Extension"
![Page 56: Encoding, Fast and Slow - Stanford University Talks/Sadjad...Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads (To appear in NSDI’17) Sadjad Fouladi,](https://reader035.fdocuments.in/reader035/viewer/2022062603/5f0330f17e708231d407fd3a/html5/thumbnails/56.jpg)
How time breaks down
[Figure: per-thread timeline for threads 1..16, from start to about 100 s, showing the phases download, vpxenc, decode, encode-given-state, wait, rebase, upload]