ECS & Docker: Secure Async Execution @ Coursera
Brennan Saeta
The Beginnings — 2012
10 courses
1 million learners worldwide
4 partners
Education at Scale
1,800 courses
18 million learners worldwide
140 partners
Outline
• Evolution of Coursera’s nearline execution systems
• Next-generation execution framework: Iguazú
• Iguazú application deep dive: GrID — evaluating programming assignments
Key Takeaways
• What is nearline execution, and why it is useful
• Best practices for running containers in production in the cloud
• Hardening techniques for securely operating container infrastructure at scale
A history of nearline execution
Coursera Architecture (2012)
PHP Monolith
Early days - Requirements
• Video re-encoding for distribution
• Grade computation for 100,000+ learners
• Pedagogical data exports for courses
Cascade Architecture
[Diagram: two PHP Monolith instances, Cascade, and a Queue]
Upgrading to Scala
Re-architecting delayed execution for our 2nd-generation learning platform.
Upgrading to the JVM
• Leverage mature Scala & JVM ecosystems for code sharing
• JVM much more reliable (no memory leaks)
• New job model: scheduled recurring jobs
  • Named: Saturn
Saturn Architecture
[Diagram: online serving on a Scala micro-service architecture (Service A, Service B, Service C) backed by Cassandra (C*); Saturn added alongside, shown with a Saturn leader and a ZooKeeper (ZK) ensemble.]
Problems with Saturn
• A single master meant the naïve implementation ran all jobs in the same JVM
  • Huge CPU contention at the top of the hour
  • OOM exceptions & GC issues
Enter: Docker
Containers allow for resource isolation!
CC-by-2.0 https://www.flickr.com/photos/photohome_uk/1494590209
Supported Features

| Feature                  | Saturn | Docker | Amazon ECS | Iguazú |
|--------------------------|--------|--------|------------|--------|
| Run code                 | ✅     | ✅     | ✅         | ✅     |
| Resource isolation       | ❌     | ✅     | ✅         | ✅     |
| Clusters / HA            | ✅     | ❌     | ✅         | ✅     |
| Great developer workflow | ✅     | ❌     | ❌         | ✅     |
| Scheduled jobs           | ✅     | ❌     | ❌         | ✅     |
Solution: Iguazú
• Framework & service for asynchronous execution
  • Optimized Scala developer experience for Coursera
• Unified scheduler supports:
  • Immediate execution (nearline)
  • Scheduled recurring execution (cron-like)
  • Deferred execution (run once at time X)
Marissa Strniste (https://www.flickr.com/photos/mstrniste/5999464924) CC-BY-2.0
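The unified scheduler's three modes map naturally onto a small algebraic data type. The following is a minimal sketch under stated assumptions: `Schedule`, `Immediate`, `Deferred`, `Recurring`, and `Scheduling.nextRun` are hypothetical names rather than Iguazú's API, and a fixed period stands in for a real cron expression.

```scala
import java.time.{Duration, Instant}

// Hypothetical model of the three execution modes; not Iguazú's real API.
sealed trait Schedule
// Nearline: run once, as soon as possible.
case object Immediate extends Schedule
// Deferred: run once at a fixed instant.
final case class Deferred(at: Instant) extends Schedule
// Recurring: a simplified stand-in for cron, firing at anchor + k * period.
final case class Recurring(anchor: Instant, period: Duration) extends Schedule {
  require(!period.isNegative && !period.isZero, "period must be positive")
}

object Scheduling {
  /** When the job should next run, or None if it is finished for good.
    * Recurring jobs skip missed slots instead of replaying them. */
  def nextRun(s: Schedule, now: Instant, lastRun: Option[Instant]): Option[Instant] =
    (s, lastRun) match {
      case (Immediate, None)                  => Some(now)
      case (Deferred(at), None)               => Some(at)
      case (Immediate | Deferred(_), Some(_)) => None // one-shot jobs never repeat
      case (Recurring(anchor, period), last) =>
        val p = period.toMillis
        // Earliest acceptable slot: at or after `now`, and strictly after the last run.
        val lowerBound = last match {
          case Some(r) if !r.plusMillis(1).isBefore(now) => r.plusMillis(1)
          case _                                         => now
        }
        if (!lowerBound.isAfter(anchor)) Some(anchor)
        else {
          val elapsed = Duration.between(anchor, lowerBound).toMillis
          val k = (elapsed + p - 1) / p // ceiling division: first slot >= lowerBound
          Some(anchor.plusMillis(k * p))
        }
    }
}
```

A scheduler loop could call `nextRun` after each invocation and enqueue the job once the returned instant passes. Skipping missed recurring slots rather than replaying them is one design choice among several.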
Iguazú Architecture
[Diagram: components Iguazú Frontend, Iguazú Admin, Iguazú Scheduler, Iguazú Backend, Iguazú Workers, Cassandra, Services, an SQS queue, the ECS API, and a ZooKeeper (ZK) ensemble; actors Devs and Users.]
Autoscale, autoscale, autoscale!
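Autoscaling ⇄ Iguazú ⇄ ECS
[Diagram: scale-in sequence between Autoscaling, Iguazú, the ECS API, and EC2 workers]
• Autoscaling sends a shutdown lifecycle notification for an EC2 worker
• Iguazú polls the worker’s job status through the ECS API
• When all jobs have finished, Iguazú tells Autoscaling to proceed
• Autoscaling terminates the EC2 worker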
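The scale-in handshake above amounts to "drain, then proceed". A minimal sketch, assuming the job-status query and the proceed call are passed in as functions; `ScaleInDrainer`, `drain`, and `onShutdownNotification` are hypothetical names. In AWS terms, proceeding would typically correspond to completing the Auto Scaling lifecycle action with the `CONTINUE` result.

```scala
import scala.annotation.tailrec

// Hypothetical scale-in handler. `runningJobs` stands in for a query of the worker's
// job status (e.g. via the ECS API); `proceed` stands in for telling Autoscaling to continue.
object ScaleInDrainer {
  /** Checks `runningJobs` up to `maxPolls` times, calling `pause` between checks.
    * Returns true as soon as the worker reports no running jobs. */
  def drain(runningJobs: () => Int, pause: () => Unit, maxPolls: Int): Boolean = {
    @tailrec
    def loop(remaining: Int): Boolean =
      if (runningJobs() == 0) true
      else if (remaining <= 1) false
      else { pause(); loop(remaining - 1) }
    loop(maxPolls)
  }

  /** On a shutdown lifecycle notification, proceed only once every job has finished.
    * If draining times out we do not proceed; the lifecycle hook's own timeout and
    * default result then decide what happens to the instance. */
  def onShutdownNotification(
      instanceId: String,
      runningJobs: () => Int,
      proceed: String => Unit,
      pause: () => Unit,
      maxPolls: Int
  ): Boolean = {
    val drained = drain(runningJobs, pause, maxPolls)
    if (drained) proceed(instanceId)
    drained
  }
}
```

In practice `pause` would sleep between polls, and `maxPolls` multiplied by that pause should stay within the lifecycle hook's heartbeat timeout.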
Failure in Nearline Systems
• Most jobs are non-idempotent
• Iguazú: at-most-once execution
  • Time-bounded delay
• Future: at-least-once execution
  • With caveats
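At-most-once execution with a time-bounded delay can be enforced with an atomic state transition: a worker may run an invocation only if it wins the Pending-to-Started transition, and an invocation that has waited too long is expired instead of run. This is a minimal sketch assuming a single JVM; `InvocationStore`, `tryStart`, and the state names are hypothetical, and `ConcurrentHashMap.replace(key, expected, next)` stands in for the conditional write a durable, shared store would need.

```scala
import java.time.{Duration, Instant}
import java.util.concurrent.ConcurrentHashMap

sealed trait InvocationState
case object Pending extends InvocationState
case object Started extends InvocationState
case object Expired extends InvocationState

// Hypothetical in-memory store; production would need durable storage with an
// equivalent compare-and-set (e.g. a conditional write).
final class InvocationStore(maxDelay: Duration) {
  private val states = new ConcurrentHashMap[String, InvocationState]()
  private val enqueuedAt = new ConcurrentHashMap[String, Instant]()

  def enqueue(id: String, now: Instant): Unit = {
    enqueuedAt.put(id, now)
    states.put(id, Pending)
  }

  /** Returns true for exactly one caller, which may then run the job once.
    * An invocation that waited longer than `maxDelay` is expired instead of run,
    * and a Started invocation is never handed out again, even if its worker crashed. */
  def tryStart(id: String, now: Instant): Boolean = {
    val queued = enqueuedAt.get(id)
    if (queued == null) false
    else if (Duration.between(queued, now).compareTo(maxDelay) > 0) {
      states.replace(id, Pending, Expired)
      false
    } else states.replace(id, Pending, Started)
  }

  def state(id: String): Option[InvocationState] = Option(states.get(id))
}
```

Because nothing moves an invocation back from Started, a crash mid-run loses that run rather than repeating it, which is the trade-off at-most-once accepts for non-idempotent jobs.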
Iguazú adoption by the numbers
~100 jobs in production
>1000 runs per day
>100 different job schedules
Iguazú Applications
Nearline Jobs
• Pedagogical Instructor Data Exports
• System Integrations
• Course Migrations
Scheduled Recurring Jobs
• Course Reminders
• System Integrations
• Payment reconciliation
• Course translations
• Housekeeping
• Build artifact archival
• A/B Experiments
While containers may help you on your journey, they are not themselves a destination.
CC-by-2.0 https://www.flickr.com/photos/usoceangov/5369581593
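Writing an Iguazú Job
import javax.inject.Inject
import play.api.libs.json.JsValue // assumes Play JSON's JsValue

// AbstractJob (presumably the source of `logger`), AbClient, and EmailAPI are Coursera-internal types.
class AbReminderJob @Inject() (abClient: AbClient, email: EmailAPI)
    extends AbstractJob {
  override val reservedCpu = 1024    // ECS CPU units: 1024 = 1 CPU core
  override val reservedMemory = 1024 // MiB: 1 GB RAM

  def run(parameters: JsValue): Unit = {
    // Find A/B experiments flagged as forgotten
    val experiments = abClient.findForgotten()
    logger.info(s"Found ${experiments.size} forgotten experiments.")
    experiments.foreach { experiment =>
      // sendReminder is not shown on the slide; presumably it emails the owners via `email`
      sendReminder(experiment.owners, experiment.description)
    }
  }
}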
Testing an Iguazú Job
The Hollywood Principle (“don’t call us, we’ll call you”) applies to distributed systems.
CC-by-2.0 https://www.flickr.com/photos/raindog808/354080327
Deploying a new Iguazú Job
• Developer
  • Merge into master… done
• Jenkins build steps
  • Compile & package job JAR
  • Prepare Docker image
  • Push image into registry
  • Register updated job with the Amazon ECS API
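The last Jenkins step registers the job with ECS, and a job's `reservedCpu` and `reservedMemory` map naturally onto a task definition's `cpu` (CPU units, 1024 = 1 vCPU) and `memory` (MiB) fields. A minimal sketch, assuming one container per job: `TaskDefinitions.render`, the `iguazu-` family prefix, and the image URI in the test are hypothetical, and a real task definition usually carries more settings (logging, networking, and so on).

```scala
// Hypothetical helper for the final Jenkins step. Field names follow the ECS
// RegisterTaskDefinition request; the family prefix and method names are assumptions.
object TaskDefinitions {
  // Minimal JSON string escaping for quotes and backslashes (enough for simple names/URIs).
  private def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case c    => sb.append(c)
    }
    sb.append('"').toString
  }

  /** Renders a one-container task definition. `cpu` is in ECS CPU units (1024 = 1 vCPU)
    * and `memoryMiB` is a hard limit in MiB, matching a job's reservedCpu/reservedMemory. */
  def render(jobName: String, imageUri: String, cpu: Int, memoryMiB: Int): String = {
    require(cpu > 0 && memoryMiB > 0, "reservations must be positive")
    val family = quote(s"iguazu-$jobName")
    val name = quote(jobName)
    val image = quote(imageUri)
    s"""{"family": $family, "containerDefinitions": [{"name": $name, "image": $image, "cpu": $cpu, "memory": $memoryMiB, "essential": true}]}"""
  }
}
```

The resulting JSON could, for example, be passed to `aws ecs register-task-definition --cli-input-json`; an AWS SDK call would serve equally well.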
Invoking an Iguazú Job
// invoking a job with one function call
// from another service via REST framework RPC
val invocationId = iguazuJobInvocationClient
.create(IguazuJobInvocationRequest(
jobName = "exportQuizGrades",
parameters = quizParams))
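In a call like `create`, the caller only names a job (here `exportQuizGrades`) and supplies parameters; the framework decides when the job's `run` method is invoked, which is the Hollywood Principle in action. The sketch below is hypothetical and in-process: `Job`, `InProcessJobService`, and `runPending` are illustrative names, parameters are a `Map` rather than JSON, and a local queue stands in for Iguazú's real scheduling and execution path.

```scala
import java.util.UUID
import scala.collection.mutable

// Hypothetical stand-in for a job: parameters are a plain Map here instead of JSON.
trait Job {
  def run(parameters: Map[String, String]): Unit
}

// Hypothetical in-process stand-in for the invocation service. A real deployment would
// persist invocations and hand them to workers instead of using a local buffer.
final class InProcessJobService(jobs: Map[String, Job]) {
  private val pending = mutable.Queue.empty[(String, String, Map[String, String])]

  /** Records an invocation and returns its id immediately; the job has not run yet. */
  def create(jobName: String, parameters: Map[String, String]): String = {
    require(jobs.contains(jobName), s"unknown job: $jobName")
    val id = UUID.randomUUID().toString
    pending.enqueue((id, jobName, parameters))
    id
  }

  /** The framework, not the caller, decides when to call `run`. Returns the ids it ran. */
  def runPending(): List[String] = {
    val ran = mutable.ArrayBuffer.empty[String]
    while (pending.nonEmpty) {
      val (id, name, params) = pending.dequeue()
      jobs(name).run(params)
      ran += id
    }
    ran.toList
  }
}
```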
A clean environment increases reliability.
CC-by-2.0 https://www.flickr.com/photos/raindog808/354080327
Evaluating Programming Assignments
An application of Iguazú
Design Goals
Elastic Infrastructure
No Maintenance
Near Real-time
Secure Infrastructure
Solution: GrID
Patrick Hoesly (https://www.flickr.com/photos/zooboing/5665221326/) CC-BY-2.0
• Service + framework for grading programming assignments
• Builds on Iguazú
• Named for Tron’s “digital frontier”
  • Backronym: Grading Inside Docker
High-level GrID Architecture
[Diagram: two AWS accounts, the Coursera production account and a separate Coursera GrID grading account. Components: Learners, GrID, Iguazú, an S3 bucket, the ECS API, and grading machines behind VPC firewalls.]
Programming Assignments
The Security Challenge
Compiling and running untrusted, arbitrary code on our cluster in near real time.
Would you like to compile and run C code from random people on the Internet on your servers?
FROM redis
FROM ubuntu:latest
FROM jane’s-image
Security Assumptions
• Run arbitrary binaries
• Instructor grading scripts may have vulnerabilities
  • ∴ Grading code is untrusted
• Unknown vulnerabilities in Docker, Linux namespacing, and/or the container implementation
Security Goals
Prevent submitted code from:
• impacting the evaluation of other submissions
• disrupting the grading environment (e.g., DoS)
• affecting the rest of the Coursera learning platform
Grading assignment submissions
CC-by-2.0 https://www.flickr.com/photos/dherholz/4367511580/
CPU CPU CPU CPU
RAM
Alice’s Container
Alice’s Submission
Grader
Bob’s Container
Bob’s Submission
Grader
Mallory’s Container
Mallory’s Submission
Grader
Kernel
Disk
![Page 70: ECS & Docker: Secure Async Execution · •Iguazú application deep dive: GrID —evaluating programming assignments. Key Takeaways •What is nearline execution, and why it is useful](https://reader030.fdocuments.in/reader030/viewer/2022041013/5ec460a411d3e40cc173fc07/html5/thumbnails/70.jpg)
CPU CPU CPU CPU
RAM
Alice’s Container
Alice’s Submission
Grader
Bob’s Container
Bob’s Submission
Grader
Mallory’s Container
Mallory’s Submission
Grader
Kernel
Disk
Same host diagram: CPU and RAM are now isolated per container with cgroups.
Same host diagram: CPU and RAM limited with cgroups; disk now limited with blkio limits and btrfs quotas.
Attacks: Kernel Resource Exhaustion
• Open-file limits per container (nofile)
• Process limits (nproc)
• Limit kernel memory per cgroup
• Limit execution time
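In GrID these limits are applied per container (the host diagram labels the kernel "cgroups, ulimits"). As a single-process analogue, Python's `resource` module can apply the same `nofile`, `nproc`, and CPU-time limits to a child just before it executes a submission, with a wall-clock timeout on top. This is a minimal sketch, not GrID's code: the limit values and the `run_with_limits` name are assumptions. Kernel memory has no rlimit equivalent and has to come from the cgroup (Docker exposed it as `--kernel-memory`, which has since been deprecated).

```python
import resource
import subprocess
import sys

# Hypothetical per-run limits; the slide does not give Coursera's actual values.
LIMITS = {
    resource.RLIMIT_NOFILE: 64,  # open file descriptors ("nofile", ulimit -n)
    resource.RLIMIT_NPROC: 32,   # processes/threads for this real user ID ("nproc", ulimit -u)
    resource.RLIMIT_CPU: 10,     # CPU seconds before the kernel sends SIGXCPU
}

def _apply_limits():
    # Runs in the child after fork() and before exec(), so only the submission is limited.
    for res, value in LIMITS.items():
        _soft, hard = resource.getrlimit(res)
        if hard != resource.RLIM_INFINITY:
            value = min(value, hard)  # an unprivileged process cannot raise its hard limit
        resource.setrlimit(res, (value, value))

def run_with_limits(argv, wall_clock_seconds=30):
    """Run argv under rlimits plus a wall-clock timeout (hypothetical helper)."""
    return subprocess.run(
        argv,
        preexec_fn=_apply_limits,
        capture_output=True,
        text=True,
        timeout=wall_clock_seconds,  # kills the child and raises subprocess.TimeoutExpired
    )

if __name__ == "__main__":
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE))"
    print(run_with_limits([sys.executable, "-c", probe]).stdout)
```

The CPU rlimit and the wall-clock timeout cover different attacks: the first stops a busy loop, the second stops a submission that simply sleeps or blocks forever.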
Same host diagram: CPU and RAM limited with cgroups; kernel resources limited with cgroups and ulimits; disk limited with blkio limits and btrfs quotas. Network is now shown as a further shared resource.
Attacks: Network attacks
Attacks:
• Bitcoin mining
• DoS attacks on other systems
• Accessing Amazon S3 and other AWS APIs
Defense:
• Deny network access
Docker Network Modes
• NetworkDisabled is too restrictive
  • Some graders require local loopback
  • The feature is also deprecated
• --net=none + deny NET_ADMIN + audit network
  • Isolation comes from Docker creating an independent network stack (loopback only) for each container
github.com/coursera/amazon-ecs-agent
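With `--net=none`, a container still gets its own network namespace containing a loopback interface, which is why graders that talk to themselves over 127.0.0.1 keep working. A minimal self-check a grader image could run, assuming Python 3 is available in the image (`non_loopback_interfaces` and `loopback_works` are hypothetical names):

```python
import socket

def non_loopback_interfaces():
    """Names of interfaces other than loopback; expected to be [] under --net=none."""
    return [name for _index, name in socket.if_nameindex() if name != "lo"]

def loopback_works(payload=b"ping"):
    """Open a TCP connection to ourselves over 127.0.0.1 and echo one message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            conn, _addr = server.accept()
            with conn:
                client.sendall(payload)
                return conn.recv(len(payload)) == payload

if __name__ == "__main__":
    print("extra interfaces:", non_loopback_interfaces())
    print("loopback ok:", loopback_works())
```

Inside a `--net=none` container the first check should report no interfaces while the second still succeeds; on an ordinary host the first will also list interfaces such as `eth0`.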
CC-by-2.0 https://www.flickr.com/photos/valentinap/253659858
CC-by-2.0 https://www.flickr.com/photos/jessicafm/2834658255/
CC-by-2.0 https://www.flickr.com/photos/donnieray/11501178306/in/photostream/
Defense in Depth
• Mandatory Access Control (AppArmor)
  • Allows auditing or denying access to a variety of subsystems
• Drop capabilities from the bounding set
  • No need for NET_BIND_SERVICE, FOWNER, or MKNOD
• Deny root within the container
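Taken together, the container-level measures on these slides correspond to standard `docker run` options. GrID launches containers through ECS rather than the Docker CLI, so the sketch below is only an approximation for experimenting by hand. The AppArmor profile name `grader-profile`, the UID, the limit values, and the `grader_run_args` helper are all hypothetical.

```python
import shlex

# Capabilities the slides say graders never need (NET_ADMIN comes from the network slide).
DROPPED_CAPS = ("NET_BIND_SERVICE", "FOWNER", "MKNOD", "NET_ADMIN")

def grader_run_args(image, user="65534:65534", apparmor_profile="grader-profile"):
    """Build a hypothetical `docker run` argv for one grading container."""
    args = [
        "docker", "run", "--rm",
        "--net=none",                                      # own network stack, loopback only
        "--user", user,                                    # 65534 is nobody/nogroup on Debian/Ubuntu
        "--security-opt", f"apparmor={apparmor_profile}",  # profile must already be loaded on the host
        "--ulimit", "nofile=64:64",                        # open files (soft:hard)
        "--ulimit", "nproc=32:32",                         # processes for this UID
    ]
    for cap in DROPPED_CAPS:
        args += ["--cap-drop", cap]
    args.append(image)
    return args

if __name__ == "__main__":
    print(shlex.join(grader_run_args("example/grader:latest")))
```

One caveat on the design: `nproc` is counted per UID across the whole host, so containers that share a UID also share that process budget.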
Deny Root Escalations
• We modify instructor grader images before allowing them to be run
  • Clears setuid bits
  • Inserts a C wrapper that drops root privileges and redirects stdin/stdout/stderr
• Run the cleaning job on another Iguazú cluster
  • Run Docker in Docker!
• Docker 1.10 adds user namespaces
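Clearing setuid can be sketched as a walk over an image's unpacked filesystem that strips the setuid and setgid bits from regular files. This minimal Python sketch assumes the image has already been extracted into a directory; `clear_setuid_bits` is a hypothetical name, and the C privilege-dropping wrapper is a separate step not shown here.

```python
import os
import stat

# setuid and setgid: the bits that let a program run with its owner's or group's privileges.
SPECIAL_BITS = stat.S_ISUID | stat.S_ISGID

def clear_setuid_bits(root):
    """Strip setuid/setgid from every regular file under root; return the changed paths."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow directory symlinks
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: inspect symlinks themselves, never their targets
            if stat.S_ISREG(st.st_mode) and st.st_mode & SPECIAL_BITS:
                os.chmod(path, stat.S_IMODE(st.st_mode) & ~SPECIAL_BITS)
                changed.append(path)
    return changed
```

Skipping symlinks matters here: a malicious image could otherwise point a link at a file outside the extracted tree.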
If all else fails…
• Use VPC security measures to further restrict network access
  • No public internet access
  • Security groups restrict inbound/outbound access
  • Network flow logs for auditing
• Separate AWS account
• Run in an Auto Scaling group
  • Regularly terminate all grading EC2 instances
Other Security Measures
• Use AWS CloudTrail for audit logs
• Third-party security monitoring (Threat Stack)
  • No one should log in, so any TTY is an alert
• Penetration testing by a third-party red team (Synack)
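GrID relies on Threat Stack for this rule. Purely as an illustration of "any TTY is an alert", a Linux host check can read field 7 (`tty_nr`) of each `/proc/[pid]/stat`, which is 0 for processes without a controlling terminal. `processes_with_tty` is a hypothetical helper, not part of any monitoring product.

```python
import os

def processes_with_tty(proc_root="/proc"):
    """Return (pid, command name) for each process that has a controlling terminal."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as f:
                stat_line = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name is wrapped in parentheses and may contain spaces,
        # so split around the last ')' rather than on whitespace.
        close = stat_line.rindex(")")
        comm = stat_line[stat_line.index("(") + 1 : close]
        fields = stat_line[close + 2 :].split()  # fields[0] is field 3 (state)
        tty_nr = int(fields[4])                   # field 7; 0 means no controlling terminal
        if tty_nr != 0:
            found.append((int(entry), comm))
    return found
```

On a grading host, where nobody is supposed to log in, any non-empty result would be worth alerting on.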
Lessons Learned - GrID
• Building a platform for code execution is hard!
• Carefully monitor disk usage
• Run the latest kernels
  • Latest security patches
  • btrfs can wedge on older kernels
    • The default Ubuntu 14.04 kernel is not new enough!
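Monitoring disk usage can start as simply as a periodic threshold check on the filesystem that holds container data. A minimal sketch, assuming Docker's default data root `/var/lib/docker` and a hypothetical 80% threshold; `disk_usage_alert` is an illustrative name.

```python
import shutil

def disk_usage_alert(path="/var/lib/docker", threshold=0.80):
    """Return (should_alert, used_fraction) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    # Guard against pseudo-filesystems that report a total size of zero.
    used_fraction = usage.used / usage.total if usage.total else 1.0
    return used_fraction >= threshold, used_fraction
```

On btrfs, `statvfs`-based figures like these can be approximate, so `btrfs filesystem usage` is worth checking as well.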
Reliable deploy tooling pays for itself.
Thank you!
Brennan Saeta
github/saeta, @bsaeta
Frank Chen
github/frankchn
GrID lead, Iguazú lead
Questions?