AWS re:Invent 2016: Searching Inside Video at Petabyte Scale Using Spot (WIN307)


Page 1

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Tim Sullivan and Ari Bixhorn, Panopto

December 2, 2016

Searching Inside Video at Petabyte-Scale using Spot

Page 2

What to Expect from the Session

Primer on inside-video search

Dive into how we use Spot to search video at scale

Overview of our cross-platform architecture

Best practices for scaling Spot Instances elastically

Page 3

Searching Inside Videos

Page 4

Video: A Last-mile Problem for Search

30 trillion web pages

Email and documents

File system contents

Video?

Page 5

3 minutes, 53 seconds

Page 6

15 - 90 minutes

Page 7

Title: An Introduction to Network Security

Description: A broad overview of network security as defined by today’s hybrid corporate WANs.

Tags: Network security, intrusion detection, corporate WAN, firewall, authentication!?

Page 8

125 words per minute

5,625 words spoken

Page 9

The network is the entry point to your application. It provides the first gatekeepers that control access to the various servers in your environment. Servers are protected with their own operating system gatekeepers, but it is important not to allow them to be deluged with attacks from the network layer. It is equally important to ensure that network gatekeepers cannot be replaced or reconfigured by imposters. In a nutshell, network security involves protecting network devices and the data that they forward.

The basic components of a network, which act as the front-line gatekeepers, are the router, the firewall, and the switch. An attacker looks for poorly configured network gatekeepers to exploit. Common vulnerabilities include weak default installation settings, wide-open access controls, and unpatched devices.

50%

Page 10

5,625 words spoken

50% have no search value

2,813 words with search value

With 10 tags, you’ve only covered 0.3% of valuable content
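The slide's numbers chain together as simple arithmetic. This sketch reproduces them, assuming (as the slide implies) a 45-minute video, since 5,625 / 125 = 45, and that each tag covers one valuable word. The exact coverage is about 0.36%, which the slide truncates to 0.3%.

```python
import math

# Slide figures; the 45-minute length is inferred (5,625 / 125), not stated.
WORDS_PER_MINUTE = 125
MINUTES = 45
SEARCH_VALUE_SHARE = 0.5  # half of the spoken words have search value
TAGS = 10                 # assumption: each tag covers one valuable word

words_spoken = WORDS_PER_MINUTE * MINUTES
# 2,812.5 valuable words; round up to a whole word, as the slide does
valuable_words = math.ceil(words_spoken * SEARCH_VALUE_SHARE)
tag_coverage = TAGS / valuable_words  # share of valuable words the tags reach

print(f"{words_spoken:,} words spoken")
print(f"{valuable_words:,} words with search value")
print(f"{TAGS} tags cover {tag_coverage:.2%} of them")
```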

Page 11

Page 12

Six Types of Video Content Indexing

1. Manually entered metadata

2. Transcription

3. Automatic Speech Recognition (ASR)

4. Optical Character Recognition (OCR)

5. Slide extraction

6. Viewer notes
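What all six sources have in common is that they can produce text tied to a moment in the video, which is what lets a search land on a timestamp rather than just a file. The toy inverted index below illustrates that idea; it is not Panopto's implementation (the deck says they run Apache SOLR), and the class names, source labels, and tokenizer are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    video_id: str
    seconds: float  # offset into the video where the term appears
    source: str     # which indexer produced it: "asr", "ocr", "slides", "notes", ...


class InsideVideoIndex:
    """Toy inverted index mapping each term to timestamped hits (illustrative)."""

    def __init__(self) -> None:
        self._postings: defaultdict[str, list[Hit]] = defaultdict(list)

    def add(self, video_id: str, seconds: float, source: str, text: str) -> None:
        # Crude tokenizer: lowercase runs of letters and digits only.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self._postings[term].append(Hit(video_id, seconds, source))

    def search(self, term: str) -> list[Hit]:
        # Order by video, then by time, so a player can jump to each match.
        hits = self._postings.get(term.lower(), [])
        return sorted(hits, key=lambda h: (h.video_id, h.seconds))


index = InsideVideoIndex()
index.add("net-sec-101", 312.0, "asr", "the router, the firewall, and the switch")
index.add("net-sec-101", 95.0, "ocr", "Firewall Rules")
index.add("net-sec-101", 40.0, "notes", "review VPN setup")
print(index.search("firewall"))
```

A query for "firewall" returns the OCR hit at 95 s before the speech hit at 312 s, so the viewer lands on the earliest mention.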

Page 13

Demo – Video Search

Page 14

What Led Us to Spot?

Page 15

Our Challenge

[Chart: content growth from 2013-01 through 2016-01]

Running on AWS since 2009

Growing exponentially

Need to index every video – quickly & cost-efficiently

The equivalent of 15 years of video (400 TB) uploaded monthly

Need to extract metadata from 4 PB of video

122M unique images indexed for OCR

>3 TB SOLR index

* Numbers include both enterprise and education accounts but exclude on-premises customers

Page 16

Option 1: On-Demand Amazon EC2 Instances

[Chart: cost ($) vs. hours of content; enabling ASR/OCR on On-Demand Instances pushes cost above today's budget]

Cost-prohibitive to offer to all customers

Page 17

Option 2: Make Search an Upsell Capability

[Diagram: Panopto platform stack with columns Content Ingestion, Content Management, Content Discovery, Content Delivery, and Content Consumption. Components shown: Windows and Mac Clients, Mobile Apps, Video Capture Appliance, Remote Capture Client, Other Ingestion, Transcoding, Editing, Search Indexing, Governance, Analytics, Access Control, Video CMS, Public Hosting, SmartSearch™, Email and Social Integrations, Search Federation, Panopto Streaming, CDN Integration, P2P Streaming, Panopto ECDN, WAN Op Solutions, Interactive Player, Panopto Mobile, Audio Podcast, Embedded Player, Quizzing and Polls]

Page 18

Option 3: Use Reserved Instances (RIs)

Would theoretically save costs

RIs work best for predictable workloads

A 30-second SLA to begin indexing produces a spiky demand curve rather than a flat line

c3.2xlarge (On-Demand hourly: $0.42)

Upfront   Monthly   Effective Hourly   Savings over On-Demand
$0        $213.16   $0.292             30%
$1,304    $75.92    $0.253             40%
$2,170    $0.00     $0.248             41%
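The "Effective Hourly" column folds the upfront fee and monthly charges into a per-hour rate. A minimal sketch of that arithmetic, assuming a 1-year term of 8,760 hours (the slide does not state the term length):

```python
# Assumptions: a 1-year term (not stated on the slide) and
# 12 monthly payments spread over 8,760 hours.
HOURS_PER_TERM = 365 * 24
ON_DEMAND_HOURLY = 0.42  # c3.2xlarge On-Demand rate from the slide


def effective_hourly(upfront: float, monthly: float, months: int = 12) -> float:
    """Spread the upfront fee plus all monthly charges across every hour of the term."""
    return (upfront + monthly * months) / HOURS_PER_TERM


def savings_vs_on_demand(effective: float) -> float:
    return 1 - effective / ON_DEMAND_HOURLY


for upfront, monthly in [(0, 213.16), (1304, 75.92), (2170, 0.00)]:
    rate = effective_hourly(upfront, monthly)
    print(f"upfront ${upfront:,}: ${rate:.3f}/h, saves {savings_vs_on_demand(rate):.0%}")
```

Under the 1-year assumption this reproduces every effective rate and savings figure in the table, which suggests that is the term the table uses.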

Page 19

Option 3: Use Reserved Instances (RIs)

[Chart: number of instances over time (t) against a flat RI line, with regions labeled Delayed Start and Waste]

Page 20

Option 3: Use Reserved Instances (RIs)

[Chart: number of instances over time (t) against a flat RI line, with regions labeled Overspend and Waste]
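The two charts make the same point from opposite sides: a flat RI line cannot match spiky demand. This sketch counts both failure modes for a hypothetical hourly demand curve; the demand numbers are invented for illustration.

```python
from __future__ import annotations

# Hypothetical hourly instance demand with spikes, as a 30-second SLA produces.
DEMAND = [2, 2, 10, 3, 1, 12, 2, 2]


def ri_outcome(demand: list[int], reserved: int) -> tuple[int, int]:
    """Return (idle reserved instance-hours, instance-hours of demand above the RIs).

    Reserved capacity that demand does not use is waste. Demand above the
    reserved line must either wait (delayed start) or run on extra On-Demand
    capacity (overspend).
    """
    waste = sum(max(reserved - d, 0) for d in demand)
    excess = sum(max(d - reserved, 0) for d in demand)
    return waste, excess


for reserved in (2, 6, 12):
    waste, excess = ri_outcome(DEMAND, reserved)
    print(f"{reserved:>2} RIs: {waste} idle RI-hours, {excess} instance-hours uncovered")
```

Reserving for the baseline leaves most spikes uncovered, while reserving for the peak wastes far more hours than it saves; no single RI level does well on both counts.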

Page 21

Option 4: Buy Our Own Hardware

Page 22

Option 5: Spot Instances

Excess EC2 capacity auctioned at steeply discounted prices

Spot Instances can be accessed on demand to meet our variable needs

[Chart: On-Demand Instances as a baseline, with Spot Instances added when bid ≥ market price]

Page 23

On-Demand vs. Spot Instances

Shared by both:

Pre-configured or custom machine images

Configure security and network access

Choose from instance types and locations

Use static IP endpoints

Attach persistent block storage to instances

On-Demand: Pay a fixed price by the hour

Spot: Pay a variable price by the hour

Page 24

[Chart: cost ($) vs. hours of content for On-Demand and Spot, compared against today's budget]

Page 25

The Spot Auction

Set a bid price (for example, $0.27)

Instances run while bid ≥ market price

Instances terminate when bid < market price

[Chart: market price over time against the bid, with periods labeled Instances run and Instances terminate]
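The rule on this slide can be simulated hour by hour. This sketch uses the slide's $0.27 example bid with a hypothetical market-price series. It models the 2016-era auction described here, treats the instance as available in every hour where the bid covers the price (ignoring relaunch delays), and charges the market price rather than the bid, per the second-price rule on the "Bidding Strategy" slide.

```python
from __future__ import annotations

BID = 0.27  # example bid from the slide
MARKET = [0.14, 0.21, 0.30, 0.26, 0.45, 0.19]  # hypothetical hourly market prices


def simulate_spot(bid: float, hourly_market: list[float]) -> tuple[list[bool], float]:
    """Apply the slide's rule hour by hour: run while bid >= market price.

    Simplifications: the instance is counted as running in every hour where
    the bid covers the price, and each running hour is charged the market
    price, not the bid.
    """
    running = [bid >= price for price in hourly_market]
    cost = sum(price for price, up in zip(hourly_market, running) if up)
    return running, cost


running, cost = simulate_spot(BID, MARKET)
print(running, f"${cost:.2f}")
```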

Page 26

Spot Considerations

Is your workload appropriate for potential volatility?

How will you deal with a lack of capacity?

Can you run on a wide range of instance types (via Spot Fleet)?

Look at historical Spot prices for your instance types and regions to estimate your savings.
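Spot price history arrives as a series of price changes, so a fair average weights each price by how long it was in effect. A minimal sketch with hypothetical prices; in practice, boto3's EC2 `describe_spot_price_history` returns `SpotPriceHistory` records with `Timestamp` and `SpotPrice` (a string) per Availability Zone, which would need filtering to one zone and converting to floats before use here.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def time_weighted_average(history: list[tuple[datetime, float]], end: datetime) -> float:
    """Average a step-function price history over [first timestamp, end).

    Each (timestamp, price) entry stays in effect until the next entry.
    A plain mean of the records would over-weight periods when the price
    changed often.
    """
    points = sorted(history)
    boundaries = [t for t, _ in points[1:]] + [end]
    total = sum(price * (nxt - start).total_seconds()
                for (start, price), nxt in zip(points, boundaries))
    return total / (end - points[0][0]).total_seconds()


ON_DEMAND_HOURLY = 0.42  # c3.2xlarge rate from the RI slide
t0 = datetime(2016, 11, 1)
history = [(t0, 0.10), (t0 + timedelta(hours=6), 0.30), (t0 + timedelta(hours=12), 0.12)]
avg = time_weighted_average(history, t0 + timedelta(hours=24))
print(f"average ${avg:.3f}/h, estimated savings {1 - avg / ON_DEMAND_HOURLY:.0%}")
```

Here the time-weighted average is $0.16/h, while a naive mean of the three records would give about $0.173/h.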

Page 27

Our Implementation

Page 28

The Importance of Windows to our Architecture

Single codebase for cloud and on-premises

For on-prem customers, Windows is often a requirement

Windows is therefore critical to our cloud architecture as well

On-Prem Cloud

Page 29

Panopto Cloud on AWS

Distributed across Availability Zones

Page 30

Cross-Platform Implementation

Web Servers

App Servers

Database

Speech Recognition

Apache SOLR

Page 31

Using Auto Scaling Groups

[Chart: Running Instances tracking Demand over time]

Page 32

Using AWS CloudFormation

Define ASGs and auto-scale rules

Page 33

From On-Demand to Spot

{
  "OnDemandLaunchConfig": {
    "Type": "AWS::AutoScaling::LaunchConfiguration",
    "Properties": {
      "SecurityGroups": { "Ref": "backendSecurityGrpIds" },
      "IamInstanceProfile": { "Ref": "BackendEncoders..." },
      "ImageId": { "Ref": "ami" },
      "InstanceType": { "Ref": "instanceType" },
      "InstanceMonitoring": false,
      "AssociatePublicIpAddress": true,
      "EbsOptimized": { "Ref": "ebsOptimized" },
      "BlockDeviceMappings": [
        { "DeviceName": "xvdca" }
      ]
    }
  },
  "SpotLaunchConfig": {
    "Type": "AWS::AutoScaling::LaunchConfiguration",
    "Condition": "CreateSpotGroup",
    "Properties": {
      "SecurityGroups": { "Ref": "backendSecurityGrpIds" },
      "IamInstanceProfile": { "Ref": "BackendEncoders..." },
      "ImageId": { "Ref": "ami" },
      "InstanceType": { "Ref": "instanceType" },
      "SpotPrice": { "Ref": "spotPrice" },
      "InstanceMonitoring": false,
      "AssociatePublicIpAddress": true,
      "EbsOptimized": { "Ref": "ebsOptimized" },
      "BlockDeviceMappings": [
        { "DeviceName": "xvdca" }
      ]
    }
  }
}

Page 34

Bidding Strategy: Start Simple

Sealed-bid, second-price auction

Set your bid to the price of an On-Demand Instance

[Chart: price levels of $0.14, $0.24, and $0.34 compared with an On-Demand Instance price of $0.84]
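Why is bidding the full On-Demand price safe? In a sealed-bid, second-price auction, winners pay the clearing price, not their own bids. The sketch below is a textbook uniform-price model, not AWS's actual clearing algorithm (which was never published); it treats the slide's $0.14, $0.24, and $0.34 as other bidders' prices, which is an assumption, and the bidder names are hypothetical.

```python
from __future__ import annotations


def clear_auction(bids: dict[str, float], capacity: int,
                  floor: float = 0.0) -> tuple[list[str], float]:
    """Sealed-bid, uniform-price (second-price) auction for identical units.

    The `capacity` highest bids at or above `floor` win. Every winner pays the
    same clearing price: the highest losing bid, or `floor` if no bid loses.
    """
    eligible = sorted(((p, name) for name, p in bids.items() if p >= floor), reverse=True)
    winners = [name for _, name in eligible[:capacity]]
    price = eligible[capacity][0] if len(eligible) > capacity else floor
    return winners, price


# Hypothetical bidders; "panopto" bids the $0.84 On-Demand price as the slide advises.
bids = {"panopto": 0.84, "bidder_a": 0.14, "bidder_b": 0.24, "bidder_c": 0.34}
winners, price = clear_auction(bids, capacity=2)
print(winners, price)
```

The $0.84 bid wins a unit but pays only $0.24, the highest losing bid; a high bid buys availability, not a higher price.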

Page 35

The Challenge of Long-Running Jobs

The longer the job, the greater the chance of instance revocation

Only a short window (2 minutes) to decide how best to fail over

[Chart: chance of instance revocation rising with job length]
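The rising curve can be made concrete with a simple model. Assuming (hypothetically) that revocations arrive as a Poisson process at a constant rate, the chance of at least one revocation during a job of length t is 1 - exp(-rate × t), which grows with every extra minute of job length. The 0.05-per-hour rate below is invented for illustration; real revocation rates vary with market conditions.

```python
import math


def revocation_probability(job_hours: float, revocations_per_hour: float) -> float:
    """Chance of at least one revocation during a job.

    Assumes revocations arrive as a Poisson process with a constant rate, so
    the probability of surviving t hours is exp(-rate * t).
    """
    return 1 - math.exp(-revocations_per_hour * job_hours)


RATE = 0.05  # hypothetical: one revocation per 20 instance-hours on average
for minutes in (5, 30, 90, 240):
    p = revocation_probability(minutes / 60, RATE)
    print(f"{minutes:>3}-minute job: {p:.1%} chance of revocation")
```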

Page 36

Managing Jobs in the Face of Instance Revocation

Market price increase → Spot → “Spotter” service

Wait until T-30s, then check: Is job done?

Yes: No action

No: 1. Save State, 2. Kill Job, 3. Reallocate
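The Spotter flow can be sketched as a small handler. This is a hypothetical reconstruction, not Panopto's code: the job interface (`is_done`, `save_state`, `kill`, `reallocate`) and `DemoJob` are invented for illustration. The metadata path is the EC2 Spot termination-notice endpoint, which returns HTTP 404 until a termination is scheduled; instances that enforce IMDSv2 would additionally need a session token, which this sketch omits.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone

# EC2 instance metadata path that returns a Spot Instance's scheduled termination
# time once it has been marked for termination (HTTP 404 before that).
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"
SAFETY_MARGIN_S = 30  # act at T-30s, as in the flowchart


def termination_time() -> "datetime | None":
    """Poll the metadata endpoint; None means no termination is scheduled."""
    try:
        with urllib.request.urlopen(TERMINATION_URL, timeout=2) as resp:
            raw = resp.read().decode().strip()
        return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    except (urllib.error.URLError, OSError, ValueError):
        return None


def on_termination_notice(job, seconds_left: float, sleep=time.sleep) -> str:
    """Spotter flow: wait until T-30s, then act only if the job is unfinished."""
    wait_s = seconds_left - SAFETY_MARGIN_S
    if wait_s > 0:
        sleep(wait_s)      # give a nearly finished job a chance to complete
    if job.is_done():
        return "no action"
    job.save_state()       # 1. save state
    job.kill()             # 2. kill job
    job.reallocate()       # 3. reallocate to another worker
    return "reallocated"


class DemoJob:
    """Hypothetical stand-in for a real indexing job."""

    def __init__(self, done: bool) -> None:
        self.done, self.steps = done, []

    def is_done(self) -> bool:
        return self.done

    def save_state(self) -> None:
        self.steps.append("save")

    def kill(self) -> None:
        self.steps.append("kill")

    def reallocate(self) -> None:
        self.steps.append("reallocate")


waits = []
job = DemoJob(done=False)
print(on_termination_notice(job, seconds_left=120, sleep=waits.append), job.steps)
```

On a real instance, a polling loop would call `termination_time()` and, once it returns a time, pass `(t - datetime.now(timezone.utc)).total_seconds()` as `seconds_left`. Injecting `sleep` keeps the decision logic testable without waiting.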

Page 37

Scaling Up with Predictive Job Modeling

1. Number of waiting jobs

2. Number of jobs currently processing

3. When current jobs expected to finish

4. Incoming jobs in the last <interval>

5. Number of jobs expected to arrive

6. Time to spin up new machine

7. SLA by job

[Diagram: Inputs → Data Scientists (?) → More processing capacity required?]
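The slide leaves the actual model to data scientists. As a deliberately simple, hypothetical starting point, the inputs can be combined by looking ahead to the moment a new instance could start work (input 6) and comparing expected demand with workers that will be free by then. Per-job SLAs (input 7) are left out of this sketch, and all parameter names are invented.

```python
from __future__ import annotations

import math


def extra_instances_needed(waiting: int, finish_in_min: list[float], idle_workers: int,
                           arrivals_per_min: float, spin_up_min: float,
                           workers_per_instance: int = 1) -> int:
    """Estimate how many instances to launch now (hypothetical model).

    Looks ahead to the moment a newly launched instance could take work
    (spin_up_min, input 6) and compares demand with the workers free by then.
    """
    # Inputs 2 and 3: workers whose current job should finish before spin-up.
    freed = idle_workers + sum(1 for t in finish_in_min if t <= spin_up_min)
    # Inputs 1, 4 and 5: queued jobs plus jobs expected during spin-up,
    # with arrivals_per_min estimated from recent incoming jobs.
    demand = waiting + arrivals_per_min * spin_up_min
    shortfall = demand - freed
    if shortfall <= 0:
        return 0
    return math.ceil(shortfall / workers_per_instance)


# Hypothetical snapshot of a worker pool.
print(extra_instances_needed(waiting=10, finish_in_min=[0.5, 3, 7], idle_workers=2,
                             arrivals_per_min=3.0, spin_up_min=5.0, workers_per_instance=4))
```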

Page 38

Amazon CloudWatch Dashboards

Page 39

Scaling Down

If the rate of incoming and in-process jobs is less than the current processing capacity, we’re in a scale-down state.

Identify instances that are not processing jobs. Then, among those, identify the ones within 15 minutes of the end of a billing hour.

[Diagram: instances labeled Active, Hold, or Scale Down]
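The two scale-down checks translate directly into code. A minimal sketch, assuming per-hour billing counted from launch (as the slide implies) and a hypothetical `Worker` record; instance IDs are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Worker:
    instance_id: str
    busy: bool            # currently processing a job?
    minutes_running: int  # time since launch


def in_scale_down_state(incoming_jobs: float, in_process_jobs: float,
                        capacity_jobs: float) -> bool:
    """Slide rule: scale down when incoming + in-process load < current capacity."""
    return incoming_jobs + in_process_jobs < capacity_jobs


def scale_down_candidates(workers: list[Worker], window_min: int = 15) -> list[str]:
    """Idle instances within `window_min` minutes of the end of their billing hour.

    With hourly billing, an idle instance costs nothing extra until its paid
    hour runs out, so it is kept (Hold) until close to that boundary.
    """
    return [w.instance_id for w in workers
            if not w.busy and 60 - w.minutes_running % 60 <= window_min]


fleet = [Worker("i-a", False, 50), Worker("i-b", False, 70),
         Worker("i-c", True, 55), Worker("i-d", False, 106)]
if in_scale_down_state(incoming_jobs=4, in_process_jobs=10, capacity_jobs=20):
    print(scale_down_candidates(fleet))
```

Here `i-b` is idle but has 50 minutes of its hour left, so it stays on hold; `i-c` is still active.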

Page 40

But what if there’s a deficit of Spot capacity?

Operate two Auto Scaling groups for each backend worker pool

One Spot ASG and one On-Demand ASG

When actual Spot capacity < desired capacity, offload the shortfall to On-Demand

[Diagram: the Automatic Speech Recognition worker pool split between Spot and On-Demand ASGs]
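The offload rule reduces to sizing the On-Demand group to cover whatever the Spot group cannot deliver. A minimal sketch: `on_demand_target` is the rule itself, and `rebalance` shows how it might be wired to the real boto3 Auto Scaling calls `describe_auto_scaling_groups` and `set_desired_capacity`. The group names are hypothetical, and this is not presented as Panopto's code.

```python
def on_demand_target(desired: int, spot_in_service: int) -> int:
    """Capacity the On-Demand ASG should carry so the pool still reaches `desired`."""
    return max(desired - spot_in_service, 0)


def rebalance(spot_asg: str, on_demand_asg: str, desired: int) -> int:
    """Read the Spot ASG's in-service count and size the On-Demand ASG to cover the gap."""
    import boto3  # imported here so on_demand_target works without AWS access

    autoscaling = boto3.client("autoscaling")
    group = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[spot_asg])["AutoScalingGroups"][0]
    in_service = sum(1 for i in group["Instances"] if i["LifecycleState"] == "InService")
    target = on_demand_target(desired, in_service)
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=on_demand_asg, DesiredCapacity=target, HonorCooldown=False)
    return target
```

A call such as `rebalance("asr-spot", "asr-on-demand", desired=10)` would run periodically; as Spot capacity returns, the computed target falls back toward zero. The On-Demand group's MaxSize must be large enough to absorb a full Spot outage.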

Page 41

Spot Futures at Panopto

Page 42

Move to Spot Fleet

Ability to launch the most cost-efficient instance type for any job

Lower prices with diversified resources

Ability to apply custom weighting (create capacity units based on our app needs)

Challenge: no accounting for the cost of EBS

Challenge: lacks ASG health checks

Challenge: lacks ASG tag propagation
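Custom weighting lets capacity be expressed in application units (for example, parallel workers per instance) and compared by price per unit. The greedy fill below is a simplification in the spirit of Spot Fleet's lowest-price allocation, not its actual algorithm; instance types, prices, weights, and availability are all hypothetical.

```python
from __future__ import annotations

import math


def cheapest_fill(target_units: int,
                  offers: list[tuple[str, float, int, int]]) -> tuple[dict[str, int], int]:
    """Greedy fill by lowest price per capacity unit.

    offers: (instance_type, spot_price, weight, available_count) tuples, where
    weight is how many capacity units one instance provides.
    Returns the launch plan and any capacity still unmet (negative = overshoot).
    """
    plan: dict[str, int] = {}
    remaining = target_units
    for itype, price, weight, available in sorted(offers, key=lambda o: o[1] / o[2]):
        if remaining <= 0:
            break
        count = min(available, math.ceil(remaining / weight))
        if count > 0:
            plan[itype] = count
            remaining -= count * weight
    return plan, remaining


# Hypothetical prices, weights, and available capacity.
offers = [("c4.large", 0.030, 2, 4), ("c4.xlarge", 0.070, 4, 10), ("c4.2xlarge", 0.110, 8, 2)]
print(cheapest_fill(20, offers))
```

The cheapest unit price comes from c4.2xlarge, but only two are available, so the remaining four units come from the next-cheapest pool. Note that the EBS cost challenge on the slide is not reflected here; price per unit covers only the instance.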

Page 43

From Immutable to Dynamic Instance Configuration

Need to account for the different processing capacity of different instance types

Will need to optimize the number of workers run in parallel on each VM

Substantial cost savings potential

Today: Immutable

Pro: Spin up instances quickly

Con: Could be more cost-efficient

Future: Dynamic

Choose the best Availability Zone and instance type based on market price

Page 44

Subdividing Jobs

Today: Painful to cancel a 90%-complete, 30-minute OCR indexing job

Future: Subdivide jobs for grid processing

Grid processing minimizes the impact of losing a Spot Instance

Also allows greater parallelization, for faster user-visible task completion
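The slide's example quantifies the benefit. A 30-minute OCR job revoked at 90% complete (minute 27) loses all 27 minutes if it runs as one unit; split into 5-minute chunks whose results are saved as they finish, it loses only the chunk in progress. The chunk size and function names below are hypothetical.

```python
from __future__ import annotations


def split_job(duration_min: float, chunk_min: float) -> list[tuple[float, float]]:
    """Cut one long job (e.g. OCR over a video) into independent time ranges."""
    chunks, start = [], 0.0
    while start < duration_min:
        end = min(start + chunk_min, duration_min)
        chunks.append((start, end))
        start = end
    return chunks


def work_lost(revoked_at_min: float, chunk_min: float | None) -> float:
    """Minutes of work discarded when a worker is revoked mid-job.

    chunk_min=None means the job is not subdivided. With chunks, finished
    chunks are kept and only the chunk in progress is lost.
    """
    if chunk_min is None:
        return revoked_at_min
    return revoked_at_min % chunk_min


print(split_job(30, 5))
print(work_lost(27, None), work_lost(27, 5))
```

The same chunks can also run on different instances at once, which is where the faster user-visible completion comes from.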

Page 45

In Summary

Page 46

53% Cost Reduction

Page 47

Scenarios Spot has Unlocked for Panopto

Scale our inside-video search technology across our entire customer base.

Accelerate business growth. The money saved with Spot is being reinvested in expanding our team.

Page 48

We’re hiring! https://www.panopto.com/careers/


Seattle, London, Pittsburgh

Page 49

Thank you!

https://www.panopto.com/careers/


Page 50

Remember to complete your evaluations!