Automated Media Workflows in the Cloud (MED304) | AWS re:Invent 2013

Transcript
Page 1

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

MED304 - Automated Media Workflows in the Cloud

John Mancuso, Amazon Web Services

November 14, 2013

Page 2

Agenda

• Why automate

• Workflow steps

• Automating the workflow

• Demo of an end-to-end media workflow

• How Netflix approaches their digital supply chain

Page 3

Why Automate?

[Figure: SIZE and USERS plotted against FORMAT: Analog, VCD, DVD, 720p, 1080p (3D), 2K, 4K]

Page 4

Scenario

• At any given time, company X produces 10 broadcast quality shows

• Each show consists of 200 30-minute episodes per year

• High-res post-production copies of each show are temporarily stored at company X’s studio in Tokyo

• The content must be made available for distribution to consumers via web, mobile devices, and media players

• The high-res content must be archived for future access

Page 5

Media Workflow

Ingest -> Processing -> Discovery & Delivery

Page 6

Media Workflow

Ingest -> Processing -> Discovery & Delivery

Amazon Simple Workflow Service (SWF)

Amazon Storage Services: Amazon S3 (Standard & RRS), Amazon Glacier

Page 8

Ingest

Image courtesy of porbital, FreeDigitalPhotos.net

Amazon S3 – US East

Page 9

Ingest – Data Transfer

Diagram labels: Server Side; AWS Command Line Interface (CLI); Amazon S3 parallel multipart uploads; Amazon S3
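A minimal sketch of what a parallel multipart upload involves: the file is cut into byte ranges, and the ranges are uploaded concurrently. The function names, the 8 MiB default part size, and the `upload_part` callable are assumptions for illustration, not taken from the slides; the AWS CLI does this internally. The two limits encoded below (5 MiB minimum part size except for the last part, at most 10,000 parts) are the documented S3 multipart constraints.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum for every part except the last
MAX_PARTS = 10000                 # S3 maximum number of parts per upload


def plan_parts(total_size, part_size=8 * 1024 * 1024):
    """Return (part_number, offset, length) tuples covering total_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    parts = []
    offset = 0
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((len(parts) + 1, offset, length))  # part numbers start at 1
        offset += length
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; use a larger part_size")
    return parts


def upload_in_parallel(parts, upload_part, workers=8):
    """Call upload_part(part_number, offset, length) concurrently.

    upload_part is a hypothetical callable supplied by the caller. Results are
    returned in part order, which is the order needed to complete the upload.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: upload_part(*p), parts))


if __name__ == "__main__":
    # Plan the 885 MB file used in the timing comparison (MB taken as 10**6 bytes).
    parts = plan_parts(885 * 1000 * 1000)
    print(len(parts), parts[0], parts[-1])
```

Injecting `upload_part` keeps the planning logic testable without network access; a real uploader would read each byte range from disk and send it to S3.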

Page 10

Ingest – Data Transfer

Diagram labels: Tsunami UDP; Amazon EC2; Amazon S3

Image courtesy of porbital, FreeDigitalPhotos.net

Page 11

Ingest – Timing Comparison (885 MB Video File)

Method                           Time                                   Improvement
Single thread to S3              13 minutes 25 seconds                  --
Multiple threads to S3           1 minute                               93% reduction
Tsunami UDP + multiple threads   15 seconds + 7 seconds = 22 seconds    63% further reduction

Instance size: CC2.8xlarge

OS: Amazon Linux
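The percentages in the timing comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures from the slide; the function names are illustrative, and the throughput column is derived, not quoted from the talk.

```python
FILE_MB = 885  # file size from the slide

# (method, elapsed seconds) as reported on the slide
timings_s = [
    ("Single thread to S3", 13 * 60 + 25),
    ("Multiple threads to S3", 60),
    ("Tsunami UDP + multiple threads", 15 + 7),
]


def reduction_pct(before_s, after_s):
    """Percentage reduction in elapsed time going from before_s to after_s."""
    return 100.0 * (before_s - after_s) / before_s


def summarize(timings, file_mb=FILE_MB):
    """Return (method, seconds, MB per second, % reduction vs previous row)."""
    rows = []
    previous = None
    for name, seconds in timings:
        change = None if previous is None else round(reduction_pct(previous, seconds))
        rows.append((name, seconds, file_mb / seconds, change))
        previous = seconds
    return rows


if __name__ == "__main__":
    for name, seconds, mb_per_s, change in summarize(timings_s):
        print(name, seconds, round(mb_per_s, 1), change)
```

Each reduction is measured against the row before it, which is why the last row reads "63% further reduction" rather than a figure relative to the single-threaded baseline.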

Page 12

Ingest – Code Snippet

import os

# execCMD and s3Bucket_HighRes are not shown on the slide; they are assumed
# to be defined elsewhere in the demo code.

def doWork_INGEST(remoteIP, remoteFileName, s3Key_HighRes):
    # Transfer the file to this instance using Tsunami UDP
    cmd_s = '/usr/local/bin/tsunami connect {} set rate 500m get {} quit'
    cmd_s = cmd_s.format(remoteIP, remoteFileName)
    execCMD(cmd_s)

    # Upload to S3 using the AWS CLI
    s3Path = 's3://{}/{}'
    s3Path = s3Path.format(s3Bucket_HighRes, s3Key_HighRes)
    cmd_s = 'aws s3 cp {} {} --region us-east-1'
    cmd_s = cmd_s.format(remoteFileName, s3Path)
    execCMD(cmd_s)

    # Delete the local file. The slide referenced an undefined localFilePath;
    # the copy above treats remoteFileName as the local path, so remove that.
    os.remove(remoteFileName)

Page 14

Processing

• Transcoding

• Thumbnail selection

• Archiving of high-res videos

Page 15

Processing – Transcoding

Diagram labels: Amazon S3; Amazon Elastic Transcoder; Amazon S3 (RRS)

Page 16

Transcoding – Code Snippet

from boto.elastictranscoder.layer1 import ElasticTranscoderConnection

# ET_Pipeline_ID, ET_PresetId_MP4, ET_PresetId_HLS and waitForCompletion are
# not shown on the slide; they are assumed to be defined elsewhere in the demo code.

def doWork_PROCESS_TRANSCODE(s3Key_HighRes, s3PreFix_TranscodeRoot):
    etc = ElasticTranscoderConnection()

    # Let Elastic Transcoder detect the properties of the input file
    job_input_name = {"Key": s3Key_HighRes, "FrameRate": "auto", "Resolution": "auto",
                      "AspectRatio": "auto", "Interlaced": "auto", "Container": "auto"}

    # One output per target format; thumbnails are numbered via {count}
    job_outputs = [
        {"Key": "MP4.mp4", "ThumbnailPattern": "MP4{count}", "Rotate": "auto", "PresetId": ET_PresetId_MP4},
        {"Key": "HLS", "ThumbnailPattern": "HLS{count}", "Rotate": "auto", "PresetId": ET_PresetId_HLS}]

    job = etc.create_job(pipeline_id=ET_Pipeline_ID, input_name=job_input_name,
                         outputs=job_outputs, output_key_prefix=s3PreFix_TranscodeRoot)
    jid = job['Job']['Id']

    # Ideally you would leverage the SNS capabilities of ET to signal SWF on completion
    waitForCompletion(etc, jid)
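The slide calls a `waitForCompletion` helper without showing it. A minimal polling sketch, assuming the helper simply re-reads the job status until it reaches a terminal state, might look like the following. The `read_status` callable, the poll interval, and the timeout are assumptions; the status strings are the Elastic Transcoder job states (Submitted, Progressing, Complete, Canceled, Error).

```python
import time

# Elastic Transcoder job states after which no further progress is made
TERMINAL_STATES = ("Complete", "Canceled", "Error")


def wait_for_completion(read_status, job_id, poll_seconds=10,
                        timeout_seconds=3600, sleep=time.sleep):
    """Poll read_status(job_id) until the job reaches a terminal state.

    read_status is a hypothetical callable returning the job's status string;
    sleep is injectable so the loop can be exercised without real delays.
    """
    waited = 0
    while True:
        status = read_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            raise RuntimeError(
                "job {} still {} after {}s".format(job_id, status, waited))
        sleep(poll_seconds)
        waited += poll_seconds


if __name__ == "__main__":
    # Simulated job that finishes on the third poll
    states = iter(["Submitted", "Progressing", "Complete"])
    print(wait_for_completion(lambda jid: next(states), "job-1",
                              sleep=lambda s: None))
```

With boto, `read_status` could plausibly be `lambda jid: etc.read_job(jid)['Job']['Status']`. Polling ties up the activity worker for the length of the transcode, which is why the slide's comment recommends SNS notifications instead.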

Page 17

Processing – Thumbnail Selection

Diagram labels: Amazon S3 (RRS); Amazon DynamoDB; Amazon Mechanical Turk

Page 18

Thumbnail Selection – Code Snippet

# MTurk_HITLAYOUTID is not shown on the slide; it is assumed to be set
# elsewhere in the demo code.

def getRequest(s3WebPath_Thumbnails):
    request_params = {"Title": "Thumbnail Selection",
                      "Description": "Please choose a thumbnail",
                      "MaxAssignments": "1",
                      "HITLayoutId": MTurk_HITLAYOUTID,
                      "Reward": {"Amount": "0.10", "CurrencyCode": "USD"},
                      "LifetimeInSeconds": "300",
                      "AssignmentDurationInSeconds": "300",
                      "HITLayoutParameter": [
                          {"Name": "image1", "Value": s3WebPath_Thumbnails + "MP400001.png"},
                          # ... (entries between image1 and image10 are elided on the slide)
                          {"Name": "image10", "Value": s3WebPath_Thumbnails + "MP400010.png"},
                      ]
                      }
    # The slide printed the parameters; the caller uses the return value, so return them
    return request_params
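The HIT layout parameters follow a regular pattern: `imageN` maps to a thumbnail whose name ends in a five-digit, one-based counter, which is how Elastic Transcoder expands `{count}`. A hedged sketch of generating the list with a loop follows; the function name and its defaults are illustrative and not part of the original demo.

```python
def thumbnail_layout_params(s3_web_path, prefix="MP4", count=10, ext=".png"):
    """Build HITLayoutParameter entries image1..imageN.

    Thumbnail names are assumed to be <prefix><five-digit counter><ext>,
    matching the MP400001.png style shown on the slide.
    """
    return [
        {"Name": "image{}".format(i),
         "Value": "{}{}{:05d}{}".format(s3_web_path, prefix, i, ext)}
        for i in range(1, count + 1)
    ]


if __name__ == "__main__":
    # Hypothetical base URL, for illustration only
    for param in thumbnail_layout_params("http://example.com/thumbs/"):
        print(param["Name"], param["Value"])
```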

Page 19

Thumbnail Selection – Code Snippet

import mturkcore
from boto.mturk.connection import MTurkConnection

# s3Bucket_Thumbs and getAnswer are not shown on the slide; they are assumed
# to be defined elsewhere in the demo code.

def doWork_PROCESS_THUMBNAIL(s3PreFix_Thumbnails):
    m = mturkcore.MechanicalTurk()
    mtc = MTurkConnection()

    # Thumbnails are served from the bucket's static website endpoint
    s3WebPath_Thumbnails = 'http://{}.s3-website-us-east-1.amazonaws.com/{}'
    s3WebPath_Thumbnails = s3WebPath_Thumbnails.format(s3Bucket_Thumbs, s3PreFix_Thumbnails)

    request_params = getRequest(s3WebPath_Thumbnails)
    hit = m.create_request("CreateHIT", request_params)
    hid = hit['CreateHITResponse']['HIT']['HITId']

    # Wait for an answer
    answer = getAnswer(mtc, hid)

    # Get the image name from the answer: strip the "image" prefix,
    # then zero-pad the number to five digits
    answer = answer[5:]
    answer = answer.zfill(5)
    imagekey = '{}MP4{}.png'
    imagekey = imagekey.format(s3WebPath_Thumbnails, answer)
    return imagekey
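The last few lines of the snippet turn a worker's answer into a thumbnail URL. A worked example of that mapping, assuming answers have the form `imageN` as the layout parameter names suggest (the function name and the validation are additions for illustration):

```python
def answer_to_image_url(answer, s3_web_path, prefix="MP4"):
    """Map an answer such as 'image7' to '<s3_web_path>MP400007.png'."""
    if not answer.startswith("image"):
        raise ValueError("unexpected answer: {!r}".format(answer))
    number = answer[len("image"):]          # same effect as answer[5:]
    if not number.isdigit():
        raise ValueError("unexpected answer: {!r}".format(answer))
    return "{}{}{}.png".format(s3_web_path, prefix, number.zfill(5))


if __name__ == "__main__":
    # Hypothetical base URL, for illustration only
    print(answer_to_image_url("image7", "http://example.com/thumbs/"))
```

Validating the answer before slicing avoids silently building a URL for a thumbnail that does not exist if the HIT returns something unexpected.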

Page 20

Processing – Archiving of High-res Videos

Diagram labels: Amazon S3; Amazon Glacier

Page 21

Archiving – Code Snippet

def doWork_PROCESS_ARCHIVE(s3Key_HighRes):
    # Move the high-res video to a path in S3 configured to archive
    # to Amazon Glacier with a lifecycle policy
    s3PathA = 's3://{}/{}'
    s3PathA = s3PathA.format(s3Bucket_HighRes, s3Key_HighRes)
    s3PathB = 's3://{}/toArchive/{}'
    s3PathB = s3PathB.format(s3Bucket_HighRes, s3Key_HighRes)
    cmd_s = 'aws s3 mv {} {} --region us-east-1'
    cmd_s = cmd_s.format(s3PathA, s3PathB)
    execCMD(cmd_s)
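The archive step relies on a lifecycle policy for the `toArchive/` prefix that the slides do not show. A sketch of what such a rule could look like, expressed as the configuration dictionary accepted by boto3's `put_bucket_lifecycle_configuration`, follows. boto3 is a later library than the boto used in the demo, and the rule ID and the one-day delay are assumptions.

```python
import json


def glacier_lifecycle(prefix="toArchive/", days=1, rule_id="archive-high-res"):
    """Lifecycle configuration that transitions objects under prefix to Glacier."""
    return {
        "Rules": [{
            "ID": rule_id,                      # hypothetical rule name
            "Filter": {"Prefix": prefix},       # only objects moved by the archive step
            "Status": "Enabled",
            "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
        }]
    }


def apply_lifecycle(bucket, config):
    """Apply the configuration to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # third-party; imported here so the rest runs without it
    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config)


if __name__ == "__main__":
    print(json.dumps(glacier_lifecycle(), indent=2))
```

Scoping the rule to a prefix lets the workflow decide when a file is archived simply by moving it, without any per-object API call to Glacier.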

Page 23

Discovery & Delivery

Diagram labels: Amazon S3 (RRS); Amazon CloudFront; CMS Running on Amazon EC2

Page 24

Automating the Workflow

Page 26

Amazon Simple Workflow Service (SWF)

• SWF

– Maintains distributed application state

– Tracks workflow executions

– Dispatches tasks (activities & deciders)

– Retains history

– Provides visibility

• Activity tasks – Do the “work” associated with a workflow step

• Decider tasks – Determine which activity task should come next

• Activities & deciders can run anywhere (on premises, in the cloud)

Page 27

Decider Logic

Start

Task = GetDecisionTask

Task exists?

No: return to Start

Yes: build EventList from events of type ‘ActivityTaskCompleted’ and ‘WorkflowExecutionStarted’

All activities completed?

Yes: signal completion of execution

No: NextActivity = ACTIVITIES[len(EventList)]

Is first activity?

Yes: NextActivity.Input = Execution Input

No: NextActivity.Input = PreviousActivity.Result
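The decider flow can be sketched as a pure function over the workflow history. This is an illustration under stated assumptions, not the demo's decider: the activity names and version string are hypothetical, and the sketch indexes a zero-based list by the number of completed activities, whereas the slide indexes by `len(EventList)`. The event and decision field names follow the Amazon SWF API.

```python
ACTIVITIES = ["Ingest", "Transcode", "Thumbnail", "Archive"]  # hypothetical names

RELEVANT = ("WorkflowExecutionStarted", "ActivityTaskCompleted")


def decide(history_events):
    """Return the next SWF decision for a chronological list of history events."""
    event_list = [e for e in history_events if e["eventType"] in RELEVANT]
    completed = len(event_list) - 1  # exclude the WorkflowExecutionStarted event

    # All activities completed: signal completion of the execution
    if completed >= len(ACTIVITIES):
        return {
            "decisionType": "CompleteWorkflowExecution",
            "completeWorkflowExecutionDecisionAttributes": {"result": "done"},
        }

    # First activity takes the execution input; later ones take the previous result
    last = event_list[-1]
    if completed == 0:
        next_input = last["workflowExecutionStartedEventAttributes"]["input"]
    else:
        next_input = last["activityTaskCompletedEventAttributes"]["result"]

    name = ACTIVITIES[completed]
    return {
        "decisionType": "ScheduleActivityTask",
        "scheduleActivityTaskDecisionAttributes": {
            "activityType": {"name": name, "version": "1.0"},
            "activityId": "{}-{}".format(name, completed),
            "input": next_input,
        },
    }


if __name__ == "__main__":
    started = {"eventType": "WorkflowExecutionStarted",
               "workflowExecutionStartedEventAttributes": {"input": "{}"}}
    print(decide([started])["scheduleActivityTaskDecisionAttributes"]["activityType"])
```

Because the function only reads history and returns a decision, it can be tested offline and then wrapped in a loop that polls for decision tasks and responds with the returned decision.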

Page 28

Activity Worker – Code Snippet

import json
import boto.swf.layer1 as swf
from mwf_Ingest import *  # demo module; presumably provides doWork_INGEST and the SWF settings used below

swf_l1 = swf.Layer1()

while True:
    # Long-poll SWF for the next activity task on this task list
    task = swf_l1.poll_for_activity_task(domain['name'], workflow_type['task_list'])
    if 'taskToken' in task:
        task_token = task['taskToken']
        task_input = json.loads(task['input'])
        try:
            if task['activityType']['name'] == activities[0]['name']:
                remoteIP = task_input['remoteIP']
                remoteFileName = task_input['remoteFileName']
                # Random key that keeps the original file extension
                s3Key_HighRes = get_rand() + remoteFileName[remoteFileName.rindex('.'):]
                doWork_INGEST(remoteIP, remoteFileName, s3Key_HighRes)
                # The result becomes the input of the next activity
                dataToPass = {'s3Key_HighRes': s3Key_HighRes}
                task_status_s = json.dumps(dataToPass)
                out = swf_l1.respond_activity_task_completed(task_token, task_status_s)
        except Exception as e:
            # Report the failure to SWF, passing the error text as details
            out = swf_l1.respond_activity_task_failed(task_token, str(e), 'Activity failed')
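The worker shown handles a single activity type. One way to extend the same pattern to several activities is a dispatch table from activity name to handler, sketched below with hypothetical names and a stand-in handler; it is not how the demo code is organized.

```python
import json


def run_activity(activity_name, task_input_json, handlers):
    """Look up the handler for an activity, run it, and return a JSON result.

    handlers maps activity names to callables that take the decoded task
    input (a dict) and return a JSON-serializable result for the next step.
    """
    handler = handlers.get(activity_name)
    if handler is None:
        raise KeyError("no handler registered for activity " + activity_name)
    result = handler(json.loads(task_input_json))
    return json.dumps(result)


def fake_ingest(task_input):
    # Stand-in for doWork_INGEST: derive an S3 key from the file extension
    name = task_input["remoteFileName"]
    return {"s3Key_HighRes": "abc123" + name[name.rindex("."):]}


if __name__ == "__main__":
    handlers = {"Ingest": fake_ingest}  # hypothetical activity name
    payload = json.dumps({"remoteIP": "10.0.0.1", "remoteFileName": "show.mov"})
    print(run_activity("Ingest", payload, handlers))
```

The worker loop would pass the returned string to `respond_activity_task_completed`, and any exception raised here would be reported through `respond_activity_task_failed`.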

Page 29

Workflow Steps

• Start workflow execution

• Ingest (transfer file to Amazon EC2 using Tsunami UDP & upload to Amazon S3)

• Transcode file (multiple output formats)

• Select thumbnail

• Archive high-res file

• Signal completion of execution

Page 30

Scalability & Fault Tolerance Analysis

Columns: Step | Is Scalable? | Is Fault Tolerant?

Rows (cell values are not preserved in the transcript):

• Ingest

• Transcode

• Archive to Amazon Glacier

• Amazon Mechanical Turk for thumbnails

• Delivery with Amazon CloudFront

• Automation elements

Page 31

Demo

External references: MTurkCore, Boto

Page 32

Netflix’s Transcoding Transformation

Tony Koinov, Director Engineering, Netflix

Page 33

Netflix Media in AWS

• Matrix: The Netflix media pipeline

• MAPLE: New generation media pipeline

• Concluding thoughts

Page 34

Netflix Media Pipeline

[Diagram: pipeline built from FTP, EC2, S3, Media Processing, and Open Connect]

Page 35

Driving to Hollywood Game

Page 36

Rules of the Game

• 200 MPH!                  | • 85 MPH
• Purchase only             | • Lease, cancel anytime
• Quantities limited        | • Unlimited quantity
• It breaks, you fix it     | • It breaks, replace it, no charge
• Pay for parking           | • No parking, just walk away
• Obsolete in 1 year        | • Brand new each year

Page 37

Industry Heritage: Optimize for Latency

• Interactive editing

– Master creation

– DVD/Blu-ray authoring

– Edits for television

Page 38

Netflix 2008

• Custom data center

• Custom GPU encoders

• Fixed size

• New format needed – PC, Mac, Xbox

• Content library doubled

• Frequent HW failures

• Fail! Catalog incomplete

Page 39

Fall 2009 – Launch Netflix PS3 Player

• First 100% AWS transcode

• New format, unique to Netflix PS3 player

• Encode recipe nailed down late

• 3 weeks, transcode entire catalog

Page 40

Netflix 2009 to Present

• US East AWS

• Variable sized EC2 farm

• S3 for storage

• Optimized for throughput, not latency

• No more missed deadlines – Devices, catalogs, countries

Page 41

Spring 2010 – Launch Netflix iPad Player

• Launch April 10th

• Apple approached us in mid February

• Grew EC2 farm to 4,000 instances

• Entire library transcoded in 2 weeks

• New format ready for launch


Page 43

For Netflix, Throughput Trumps Latency

• Think horizontal, not vertical

• Priuses move more people than Ferraris

• Frequent re-encodes of growing libraries

• Netflix is nimble because of AWS


Page 44

More Proof That Horizontal Wins

• New countries, new content

• Codec innovation


Page 45

AWS Handles Netflix Scale

• 6 regional catalogs

• 4 formats supported today

– 1 VC-1, 3 H.264

– Multiple bit rates per format

• 10s of 1000s of hours of content

• Petabytes of S3 storage

Page 47

New Generation: Address Faults and Latency

• More than 1 week for a 4K transcode

• 2 – 3 days for an HD transcode

• Fault intolerant

• Maintenance is challenging

• Often too slow

– Day after broadcast

– Redelivery of damaged content

Diagram labels: EC2: C1 Medium; S3; ~700 Mbps; 10-16 Mbps

Page 48

MAPLE: Massively Parallel Encoding

• 5-minute chunks – Close to real time

• Fault tolerant

• Easy maintenance

• Address low latency use cases

– Day after broadcast

– Redelivery of damaged content

Diagram labels: S3; EC2
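The idea behind chunked parallel encoding can be sketched generically: split a title into 5-minute chunks, encode the chunks concurrently, and retry only the chunk that fails. This illustrates the technique and is not Netflix's MAPLE implementation; the function names, the retry count, and the `encode_chunk` callable are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SECONDS = 5 * 60  # 5-minute chunks, as on the slide


def plan_chunks(duration_seconds, chunk_seconds=CHUNK_SECONDS):
    """Return (start, end) second offsets covering the whole title."""
    return [(start, min(start + chunk_seconds, duration_seconds))
            for start in range(0, duration_seconds, chunk_seconds)]


def encode_with_retry(encode_chunk, chunk, attempts=3):
    """Run encode_chunk(chunk), retrying so one bad worker costs one chunk."""
    last_error = None
    for _ in range(attempts):
        try:
            return encode_chunk(chunk)
        except Exception as err:  # a real system would narrow this
            last_error = err
    raise last_error


def encode_title(duration_seconds, encode_chunk, workers=16):
    """Encode all chunks in parallel and return results in playback order."""
    chunks = plan_chunks(duration_seconds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: encode_with_retry(encode_chunk, c), chunks))


if __name__ == "__main__":
    # Stand-in encoder that just labels each chunk
    print(encode_title(30 * 60, lambda c: "chunk {}-{}".format(*c)))
```

With chunks as the unit of work, elapsed time depends on the slowest chunk instead of the length of the title, and a failure forces a redo of five minutes of video instead of the whole file.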

Page 50

We Would Do It All Over Again

• Don’t be fooled by IT cost comparisons

– We don’t administer the gear

• 6,000 EC2 instances

• Petabytes of storage

• High network traffic

– Storage is durable

– It is a moving target

• You cannot put a price on nimble
