Docker for scientists
DOCKER FOR SCIENTISTS
CHRIS GORGOLEWSKI
neurohackweek.github.io/docker-for-scientists/
IN THE BEGINNING – VIRTUAL MACHINES
• A WHOLE COMPUTER SIMULATED IN SOFTWARE
• SLOW (BUT THIS IS IMPROVING)
• BIG (NOT SHARING ANY SOFTWARE WITH THE HOST MACHINE)
• FULLY ENCAPSULATED (YOU DECIDE WHAT “HARDWARE” TO SIMULATE – NUMBER OF CORES, MEMORY ETC.)
• TAKES TIME TO BOOT, SUSPEND, RESUME…
• INITIALLY USED MOSTLY FOR CROSS-PLATFORM DEVELOPMENT AND TESTING
• SOME PROPOSED USING THEM TO CAPTURE SOFTWARE DEPENDENCIES IN BIOINFORMATICS (ANGIUOLI ET AL. 2011)
• GOOD FOR SETTING UP WORKSHOPS (E.G. HTTPS://GITHUB.COM/POLDRACK/FMRI-ANALYSIS-VM)
• HTCONDOR SUPPORTS RUNNING JOBS IN VIRTUAL MACHINES
Angiuoli et al. (2011). CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing.
THE NEEDS OF THE INDUSTRY
• RENTING VIRTUAL SERVERS IN THE CLOUD BECAME EASIER AND CHEAPER
• THAT MEANT PROVISIONING NEW SYSTEMS WAS MORE COMMON
• MULTIPLE SERVICES (HTTP, SQL, DNS ETC.) WERE OFTEN SPREAD ACROSS MULTIPLE HOSTS
WHAT ARE CONTAINERS?
• CONTAINERS ARE A FORM OF KERNEL-LEVEL VIRTUALIZATION
• THEY ARE VERY SIMILAR TO A VIRTUAL MACHINE, BUT INSTEAD OF SIMULATING HARDWARE, ALL VIRTUAL INSTANCES (“CONTAINERS”) SHARE THE KERNEL WITH THE HOST MACHINE
• THEY ARE FASTER THAN VMS (NO OPERATING SYSTEM TO BOOT, NO HARDWARE TO SIMULATE)
• THE IMAGES ARE MORE OR LESS THE SAME SIZE (AS VM IMAGES)
WHAT IS DOCKER?
• DOCKER IS THE MOST POPULAR CONTAINER IMPLEMENTATION
• IT CONSISTS OF:
  • DOCKER ENGINE (RUNS CONTAINERS)
  • DOCKER HUB (CENTRALIZED WEB SERVICE FOR STORING AND SHARING CONTAINER IMAGES)
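The claim that containers share the host's kernel can be checked directly. The sketch below assumes Docker Engine is installed on a Linux host and uses the public `ubuntu:16.04` image from Docker Hub as an arbitrary example (both are assumptions, not part of the original slides). It compares the host's kernel release with the one reported from inside a throwaway container. On macOS or Windows, Docker runs containers inside a Linux VM, so the two values will differ there.

```python
import platform
import subprocess


def kernel_check_command(image="ubuntu:16.04"):
    """Build the argv that prints the kernel release from inside a container.

    The default image is an illustrative assumption; any image with `uname` works.
    """
    # --rm deletes the container as soon as the command exits
    return ["docker", "run", "--rm", image, "uname", "-r"]


def container_kernel(image="ubuntu:16.04"):
    """Run `uname -r` in a throwaway container and return its output."""
    result = subprocess.run(
        kernel_check_command(image),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    host = platform.release()  # kernel release of the machine running this script
    guest = container_kernel()
    print("host kernel:     ", host)
    print("container kernel:", guest)
    # On a Linux host these should match: the container has no kernel of its own.
    print("shared kernel:", host == guest)
```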
WHY SHOULD SCIENTISTS CARE?
CONTAINERS CAN BE USEFUL FOR SCIENTISTS IN THE FOLLOWING SITUATIONS:
• CAPTURING AND SHARING A COMPLICATED SET OF BINARY DEPENDENCIES
  • VIRTUALENV ON STEROIDS
• MAINTAINING THE SAME SOFTWARE STACK DURING A LONGITUDINAL STUDY OR ACROSS MACHINES
• HELPING OTHER RESEARCHERS REPRODUCE YOUR FINDINGS
PULLING AND RUNNING A CONTAINER
NEUROHACKWEEK.GITHUB.IO/DOCKER-FOR-SCIENTISTS/03-PULLING_AND_RUNNING/
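The linked tutorial walks through this step interactively. As a rough sketch only, assuming the Docker CLI is on PATH, the Python wrapper below builds the `docker pull` and `docker run` invocations; the image name `python:3.6` and the `/data` mount point are illustrative choices, not taken from the tutorial. Bind-mounting a host directory with `-v` is what lets results survive after `--rm` deletes the container.

```python
import os
import subprocess


def pull_command(image):
    """argv that downloads `image` from Docker Hub (or another registry)."""
    return ["docker", "pull", image]


def run_command(image, command, data_dir, mount_point="/data"):
    """argv that runs `command` in a throwaway container of `image`.

    `data_dir` on the host is bind-mounted at `mount_point` (an illustrative
    default) so inputs can be read and outputs kept after the container exits.
    """
    volume = "{}:{}".format(os.path.abspath(data_dir), mount_point)  # host:container
    return [
        "docker", "run",
        "--rm",             # delete the container when it exits
        "-v", volume,       # bind-mount the host data directory
        "-w", mount_point,  # start in the mounted directory
        image,
    ] + list(command)


if __name__ == "__main__":
    image = "python:3.6"  # hypothetical example image, not from the workshop
    subprocess.run(pull_command(image), check=True)
    subprocess.run(run_command(image, ["python", "--version"], "."), check=True)
```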
BUILDING YOUR OWN CONTAINER IMAGE
NEUROHACKWEEK.GITHUB.IO/DOCKER-FOR-SCIENTISTS/04-BUILDING_IMAGES/
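Building an image means writing a `Dockerfile` that records every dependency, then running `docker build`. The sketch below generates a minimal Dockerfile from Python; the base image, the pinned package versions, the `analysis.py` script and the `myname/analysis:0.1` tag are all hypothetical placeholders. Pinning exact versions helps keep the stack unchanged over a longitudinal study.

```python
import os
import subprocess
import tempfile

# Minimal Dockerfile template. Assumes a Python base image with pip available;
# base image, packages and script name are illustrative placeholders.
DOCKERFILE_TEMPLATE = """\
FROM {base}
RUN pip install --no-cache-dir {packages}
COPY analysis.py /opt/analysis.py
CMD ["python", "/opt/analysis.py"]
"""


def render_dockerfile(base, packages):
    """Return Dockerfile text for `base` plus the given pip requirements."""
    return DOCKERFILE_TEMPLATE.format(base=base, packages=" ".join(packages))


def build_command(tag, context_dir):
    """argv that builds the Dockerfile in `context_dir` and tags the image."""
    return ["docker", "build", "-t", tag, context_dir]


if __name__ == "__main__":
    context = tempfile.mkdtemp()  # build context holding Dockerfile + script
    with open(os.path.join(context, "Dockerfile"), "w") as f:
        f.write(render_dockerfile("python:3.6", ["numpy==1.13.3", "nibabel==2.2.1"]))
    with open(os.path.join(context, "analysis.py"), "w") as f:
        f.write("import numpy, nibabel\nprint(numpy.__version__, nibabel.__version__)\n")
    subprocess.run(build_command("myname/analysis:0.1", context), check=True)
```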
RUNNING CONTAINERS ON CLUSTERS/HPCS
NEUROHACKWEEK.GITHUB.IO/DOCKER-FOR-SCIENTISTS/05_SINGULARITY/
• DOCKER HAS LIMITATIONS:
  • IT WAS DESIGNED FOR THE CLOUD, WHERE YOU ARE IN TOTAL CONTROL OF THE MACHINE
  • REQUIRES A MODERN KERNEL VERSION
  • ALLOWS USERS TO ELEVATE PERMISSIONS (ANYONE WHO CAN START CONTAINERS EFFECTIVELY HAS ROOT ON THE HOST)
• THE ADVANCED DOCKER FEATURES ARE USEFUL FOR:
  • NETWORK MANAGEMENT
  • SANDBOXING RESOURCES
  • MAPPING USERNAMES
• BUT ALL MOST SCIENTISTS CARE ABOUT IS PORTABILITY (CAPTURING BINARY DEPENDENCIES)
• SINGULARITY IS A CONTAINER FRAMEWORK THAT:
  • WAS BUILT FROM THE GROUND UP TO SUPPORT CLUSTERS/HPCS
  • HAS MINIMAL REQUIREMENTS
  • RUNS ON LEGACY KERNELS
  • DOES NOT ELEVATE PERMISSIONS
  • ALLOWS IMPORTING DOCKER IMAGES
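Because Singularity can import Docker images, one Docker Hub image can serve both a laptop and a cluster. A minimal sketch, assuming a Singularity release that accepts `docker://` URIs in `singularity exec` (recent releases do; older releases may need a separate import step, so check your cluster's version), picks whichever runtime is installed and builds the matching command. The helper names and the `ubuntu:16.04` image are hypothetical examples.

```python
import shutil
import subprocess


def container_command(image, command, runtime):
    """argv that runs `command` in the Docker Hub `image` using `runtime`.

    "docker" is assumed on a laptop or cloud VM, "singularity" on a cluster;
    the docker:// prefix asks Singularity to fetch and convert the Docker image.
    """
    if runtime == "docker":
        return ["docker", "run", "--rm", image] + list(command)
    if runtime == "singularity":
        return ["singularity", "exec", "docker://" + image] + list(command)
    raise ValueError("unsupported runtime: " + runtime)


def detect_runtime():
    """Prefer Singularity when installed (typical on HPC), else fall back to Docker."""
    for runtime in ("singularity", "docker"):
        if shutil.which(runtime):
            return runtime
    raise RuntimeError("neither singularity nor docker found on PATH")


if __name__ == "__main__":
    argv = container_command("ubuntu:16.04", ["cat", "/etc/os-release"], detect_runtime())
    subprocess.run(argv, check=True)
```

Keeping the image name as the single source of truth is the design choice here: the analysis is prototyped with Docker and then run unchanged on the cluster.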
SUMMARY
CONTAINERS CAN HELP YOU:
• MANAGE AND SHARE YOUR SCIENTIFIC SOFTWARE STACK
  • ISOLATING SPECIFIC, EVEN CONFLICTING, DEPENDENCIES FOR DIFFERENT PROJECTS
  • KEEPING THE STACK UNCHANGED OVER YEARS FOR A LONGITUDINAL STUDY
• PROTOTYPE AN ANALYSIS ON YOUR LOCAL MACHINE AND SCALE IT UP ON A CLUSTER USING THE SAME STACK
• MAKE YOUR RESEARCH MORE REPRODUCIBLE
NEUROHACKWEEK 2016 #WOWMUCHCODEVERYBRAINS