Rendering Takes Flight

Three Steps to Modern Media Asset Management with Active Archive

Transcript of Rendering Takes Flight


Housekeeping

• Recording: available on-demand approximately 5 minutes after today's presentation
• Resources
• Questions
• Please rate this webinar

Agenda

• Learn about Moonbot & Taking Flight
• Hear specifics about Moonbot Studios
– Why the cloud?
– Goals
– Details, details, details
– Result statistics
• Using Google Cloud Platform for Rendering
• Ensuring Accessibility and Performance

Today’s Speakers

Sara Hebert, Director of Marketing, Moonbot Studios

Jeff Kember, Cloud Solutions Architect, Google Cloud Platform

Aaron Wetherold, Systems Engineer, Avere Systems

Brennan Chapman, Pipeline Supervisor, Moonbot Studios

Moonbot Studios

Sara Hebert, Director of Marketing, Moonbot Studios

• FOUNDED: 2009
• LOCATION: Shreveport, Louisiana
• EMPLOYEES: 50+
• AWARDS: 239, including…
– 1 Oscar
– 4 Emmys
– 14 Cannes Lions
– 12 Clios
– 5 Webbys

"Top-shelf creative pedigree" – FAST COMPANY
"Storytelling of the future" – LA TIMES

Taking Flight to the Cloud

Brennan Chapman, Pipeline Supervisor, Moonbot Studios

Why we decided to use the cloud

• 3 overlapping projects with a high volume of renders
– 30-second spot
– 5-minute short film
– 11-minute pilot episode
• No space on-site for the required equipment
– Additional space, power, and networking would be needed on-site
• Faster scaling

Our Current Setup

• Moonbot has about 50 people on staff
• 12 lighters across the 3 projects
• Software
– Maya 2015
– Arnold
– Nuke
• Qube render management
• Shotgun

Our Current Setup

• Isilon storage NAS – 100 TB
• 32 IBM blades
– 2x E5450 @ 3 GHz
– 32 GB RAM
• 40 Dell FX2 (rentals)
– 2x E5-2650 v3 @ 2.3 GHz
– 64 GB RAM

Arnold Notes

• We found it more cost-effective to use Arnold licenses on the fastest nodes.

• Old IBM blades were 4x slower than the new FX2s

Targets We Set

• Overnight renders
• Compute power should scale hourly if needed
• Add up to 2,500 additional cores
• Easy to use for both pipeline and artists
• Need a forecasting system to predict when we need cloud capacity
• Minimal number of new tools
• Consistency between the setup of on-site render nodes and cloud nodes

Targets We Set

• Use Qube for render management
• Only use the cloud for Arnold renders; keep Nuke renders on-site

Previous Experiences with Cloud

• Zync for Silent, a project for Dolby Laboratories
– Rendered a few thousand frames
– Worked well, but added complexity for artists

How to Handle Storage

• Projects required transfer of about 12 TB
• Limited to a 100 Mbps connection (see the rough estimate below)
• Need to copy assets to the cloud
• Need to copy results back on-site
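To see why simply copying everything over that link was impractical, here is a quick back-of-the-envelope estimate (a sketch that ignores protocol overhead and assumes the link stays saturated):

```python
# Rough transfer-time estimate: 12 TB over a 100 Mbps link.
# Ignores protocol overhead and assumes the link stays saturated.
data_tb = 12
link_mbps = 100

data_bits = data_tb * 1e12 * 8           # terabytes -> bits
seconds = data_bits / (link_mbps * 1e6)  # bits / (bits per second)
print(f"~{seconds / 86400:.1f} days")    # ~11.1 days for one full copy
```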

How to Handle Storage

• Avere solved most of this for us
• Used a vFXT Edge filer

How to Handle Storage

• Presents the Avere storage to render nodes the same way on-site render nodes see on-site storage
• Handles all transfers to and from the cloud for assets and rendered images
• Uses a clustered system to spread load across multiple nodes
• Simple to set up
• Mounts via NFS

How to Handle Storage

• Faster and easier to set up: Avere automatically loads only the files required to perform each render

vs.

• Developing scripts that find and copy all dependencies, then start the render

How to Handle Storage

• We used the cache for reads and writes with a 30-second write-back

• Using the write cache allowed the render nodes to finish faster; the transfer was then completed by the Avere cluster

• ISSUE: no way to get a notification from Avere when the write-back transfers were finished (a possible workaround is sketched below)
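One possible workaround is to poll the on-site path ourselves and treat a file as delivered once its size has been stable for longer than the write-back window. This is a hypothetical script of our own, not an Avere feature; the paths in the usage example are placeholders:

```python
import os
import time

def wait_for_writeback(paths, stable_secs=60, poll_secs=10):
    """Poll on-site paths until every expected file exists and its size
    has stopped changing for `stable_secs` (longer than the 30 s
    write-back window)."""
    last_seen = {}          # path -> (size, timestamp of last size change)
    pending = set(paths)
    while pending:
        now = time.time()
        for path in list(pending):
            if not os.path.exists(path):
                continue
            size = os.path.getsize(path)
            prev_size, changed_at = last_seen.get(path, (None, now))
            if size != prev_size:
                last_seen[path] = (size, now)       # still growing
            elif now - changed_at >= stable_secs:
                pending.discard(path)               # stable: delivered
        if pending:
            time.sleep(poll_secs)

# Hypothetical usage: block until two rendered frames land on-site.
wait_for_writeback(["/mnt/isilon/renders/shot010.0001.exr",
                    "/mnt/isilon/renders/shot010.0002.exr"])
```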

Cloud Render Node Configuration

• CentOS
• Use fstab to mount the NFS storage from Avere just like on-site nodes mount the NFS storage from the Isilon
• Used Ansible to configure our instance, then saved it as an image in Google Cloud

Preemptible vs On-Demand Instances

• On-demand
– Higher cost
– Availability guaranteed
• Preemptible instances
– Lower cost, 1/3 of the price (see the rough cost comparison below)
– Restart every 24 hours
– Availability not guaranteed
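As a rough illustration of the trade-off (the hourly rate below is a placeholder, not actual GCP pricing; only the 1/3 ratio and the 2,500-core target come from the talk):

```python
# Illustrative cost comparison for adding 2,500 cores overnight.
# The on-demand rate is a placeholder; only the "preemptible = 1/3 of
# on-demand" ratio is taken from the talk.
cores_needed = 2500
cores_per_node = 32                          # e.g. an n1-standard-32
nodes = -(-cores_needed // cores_per_node)   # ceiling division -> 79

on_demand_rate = 1.0                         # placeholder $/node/hour
preemptible_rate = on_demand_rate / 3
hours = 12                                   # one overnight render window

print(f"{nodes} nodes for {hours} h: "
      f"on-demand {nodes * on_demand_rate * hours:.0f} units, "
      f"preemptible {nodes * preemptible_rate * hours:.0f} units")
```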

Instance Groups

• Google Cloud offers a system for managing pools of instances using instance groups and templates.
• This automatically handles starting and stopping instances.
• When using preemptible instances, they are automatically added/removed based on their availability.
• Using this system we had spin-up times of around 2-3 minutes (a resize sketch follows).
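For example, the pool can be resized programmatically with the Google API Python client (a minimal sketch; the project, zone, and group names are hypothetical placeholders, and application-default credentials are assumed):

```python
from googleapiclient import discovery

# Minimal sketch: grow a managed instance group of render workers.
# Project, zone, and group names are hypothetical placeholders.
compute = discovery.build("compute", "v1")

operation = compute.instanceGroupManagers().resize(
    project="my-render-project",
    zone="us-central1-a",
    instanceGroupManager="arnold-render-workers",
    size=79,  # target number of instances in the pool
).execute()

print(operation["status"])  # PENDING / RUNNING / DONE
```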

Security

• No public IPs on render nodes
• All traffic goes through a VPN tunnel

Arnold Notes

• Not all n1-standard-32 nodes are the same
• Haswell offered the best performance, but was only available in certain zones
• Arnold was 20% faster on Haswell than on Ivy Bridge

Qube Integration

• Utilized startup and shutdown scripts from Google to facilitate adding and removing workers in Qube (a sketch follows this list)

• Instance groups create instances with new names every time

• Need to register and unregister workers on startup and shutdown

• Didn't have to configure much else; everything else was set up just like it is on-site
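A hedged sketch of what such a startup script could look like. Reading the instance name from the GCE metadata server is standard and documented; the `qube-register` helper below is a hypothetical site-specific command, since the talk does not detail the exact Qube registration mechanism:

```python
import subprocess
import urllib.request

METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/name")

def instance_name():
    """Fetch this instance's name from the GCE metadata server."""
    req = urllib.request.Request(METADATA_URL,
                                 headers={"Metadata-Flavor": "Google"})
    return urllib.request.urlopen(req).read().decode()

def register_worker(name):
    """Hypothetical stand-in for the site-specific Qube registration
    step (e.g. updating the worker list on the supervisor)."""
    subprocess.run(["/opt/pipeline/bin/qube-register", name], check=True)

if __name__ == "__main__":
    register_worker(instance_name())
```

A matching shutdown script would call the inverse (e.g. a hypothetical `qube-unregister`) with the same instance name.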

Forecasting Renders

• We utilized preview frames to estimate render times for the farm

• Preview frames usually include the first, middle and last frames of each shot sent to the farm

• Preview frames have the highest priority on the farm

Forecasting Workflow

• Artists submit their jobs to the farm before they leave.
• The pipeline team waits until the preview frames have been rendered.
• Run the forecasting tool, which calculates the number of cloud render nodes required to finish the renders by the next morning (a sketch of this calculation follows).
• Spin up the required number of nodes on Google Cloud.
• Shut down the instances the next morning.
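A minimal sketch of what that calculation might look like. The formula is inferred from the workflow above, assumes each frame fully occupies one node, and the numbers in the example are illustrative:

```python
import math

def nodes_needed(remaining_frames, avg_frame_hours, deadline_hours):
    """Estimate the number of cloud render nodes required to finish
    the remaining frames by the deadline, assuming each frame fully
    occupies one node. Average frame time comes from preview frames."""
    node_hours_owed = remaining_frames * avg_frame_hours
    return math.ceil(node_hours_owed / deadline_hours)

# Illustrative numbers: 2,000 frames at ~30 min each, 12 h until morning.
print(nodes_needed(2000, 0.5, 12))  # -> 84 nodes
```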

Tile Rendering

• On submission of jobs to Qube, we built support to split frames into tiles (a splitting sketch follows this list).

• The goal with tiles is to keep the maximum render time to about 30 minutes.

• Less risk of losing work on preemptible instances
• Allows frames to finish quicker, especially preview frames
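A hedged sketch of the tile-splitting idea. The talk does not describe Moonbot's exact scheme; the grid heuristic and numbers here are illustrative:

```python
import math

def tile_regions(width, height, est_frame_minutes, target_minutes=30):
    """Split a frame into rectangular tiles so each tile is expected
    to render in about `target_minutes`. Rounds up to a full grid, so
    slightly more tiles than strictly needed may be produced."""
    tiles = max(1, math.ceil(est_frame_minutes / target_minutes))
    cols = math.ceil(math.sqrt(tiles))
    rows = math.ceil(tiles / cols)
    regions = []
    for r in range(rows):
        for c in range(cols):
            regions.append((
                c * width // cols,              # x min
                r * height // rows,             # y min
                (c + 1) * width // cols - 1,    # x max
                (r + 1) * height // rows - 1,   # y max
            ))
    return regions

# A 2-hour 1920x1080 frame -> 4 tiles of ~30 minutes each.
print(tile_regions(1920, 1080, est_frame_minutes=120))
```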

Stats

• Ping time to the Google Cloud Iowa data center: 35 ms
• Ingress: 12 TB
• Egress: 2 TB
• 60% of renders were done on Google Cloud
• 4,000 render jobs total
• 3,000 Arnold jobs
• ~1,800 were completed on Google Cloud

Stats

• 124,000 frames
• 50,000 render hours
• Average render time per frame: ~30 min

Results

• We met our deadlines!
• Worked really well once we got the VPN connection figured out
• Will save a lot of time planning and budgeting
• Allows the render farm to work around the schedule instead of the schedule around the farm
• Saves time managing local hardware
• Allows us to stay nimble as a small company; we don't have to invest large capital in a render farm

Google Cloud Platform for Rendering & Animation

Jeff Kember, Cloud Solutions Architect,Google Cloud Platform

Cloud Accessibility

Aaron Wetherold, Systems Engineer, Avere Systems

Render Cloud Bursting Use Case

Customer Requirements
• Overflow render capacity into Google Cloud Platform
• No copying of data back and forth
• Avoid deploying additional hardware
• No application rewrite
• Provide flexibility to adjust to project demands

Avere Solutions
• Deploy the scalable vFXT cluster within GCE
• Access and accelerate existing on-prem NAS and GCS storage while hiding access latency
• Dynamically tier the active working set into SSD and DRAM
• Standard NFS and SMB client access
• Ease of deployment, management, and expansion

The Challenge

[Diagram: on-prem storage (NAS) serving an on-prem render farm and artist workstations, with a separate virtual render farm on Google Compute Engine backed by Google Cloud Storage]

The Avere vFXT Solution

[Diagram: the same environment with a virtual FXT cluster in Google Compute Engine bridging the virtual render farm to both Google Cloud Storage and the on-prem NAS]

Avere Deployment Flexibility

[Diagram: virtual FXT in Google Compute Engine and physical FXT on-prem, fronting NAS and object storage for the render farm and artist workstations, with Google Cloud Storage as an additional tier]

Avere Product Line

Virtual FXT (runs on Google Compute Engine)
• n1-highmem-8: 52 GB DRAM; 1 TB (persistent) or 1.5 TB (local) SSD; 10GbE
• n1-highmem-32: 208 GB DRAM; 4 TB (persistent) SSD; 10GbE

Physical FXT
• FXT 3200: 96 GB DRAM; 4.8 TB SAS; 2x10GbE, 6x1GbE
• FXT 3850: 288 GB DRAM; 0.8 TB SSD; 7.8 TB SAS; 2x10GbE, 6x1GbE
• FXT 4850: 288 GB DRAM; 4.8 TB SSD; 2x10GbE, 6x1GbE

Protocols
• To client: NFSv3 (TCP/UDP), CIFS (SMB 1.0 & 2.0)
• To core filer: NFSv3 (TCP), S3 API

Clustering
• Cluster from 3 to 50 FXT nodes for performance and capacity scaling
• HA failover, mirrored writes, redundant network ports & power

Management
• GUI, analytics, email alerts, SNMP, XML-RPC interface, policy-based management

Virtual FXT on Google Cloud
• Persistent and local SSD support
• Standard, DRA, and Nearline GCS support
• Per-minute billing

Next Steps

• Ask questions
• Review the attachments section for relevant resources
• Rate this webinar

Today’s Speakers

Sara Hebert, Director of Marketing, Moonbot Studios

Jeff Kember, Cloud Solutions Architect, Google Cloud Platform

Aaron Wetherold, Systems Engineer, Avere Systems

Brennan Chapman, Pipeline Supervisor, Moonbot Studios

Moonbotstudios.com | cloud.google.com | Averesystems.com