A short presentation, given at a CSC internal workshop, on the prospects of using container technologies, especially Docker, in High Performance Computing (HPC).
2. Making It Easy to Do Custom HPC Environments
Or rather...
Olli-Pekka Lehto (@ople)
Services for Research Work Together Days
Dec 17th 2015
3. What is Docker?
• Helps to manage and run applications with complex dependencies quickly and efficiently
• Management framework for Linux Containers
• Containers:
– Instances isolated within a namespace
– Kernel shared with host OS
– Resources limited and accounted for using Linux cgroups
• Grown into a complete ecosystem
– docker-swarm, docker-machine, docker-compose…
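As a rough illustration of the container properties listed above, the sketch below assembles a `docker run` command line that caps a container's memory and pins it to a set of cores, limits that Docker applies through cgroups. The helper name `docker_run_argv`, the image tag and the limit values are assumptions chosen for this example and do not come from the talk.

```python
import shlex


def docker_run_argv(image, command, memory="4g", cpuset="0-3"):
    """Build an argv list for `docker run` with cgroup-backed limits.

    Hypothetical helper: the image, memory and cpuset values are placeholders.
    """
    return [
        "docker", "run",
        "--rm",                   # remove the container when it exits
        "--memory", memory,       # memory limit, enforced via cgroups
        "--cpuset-cpus", cpuset,  # restrict the container to these cores
        image,
    ] + list(command)


argv = docker_run_argv("ubuntu:14.04", ["cat", "/etc/os-release"])
# Print a shell-safe version of the command instead of executing it
print(" ".join(shlex.quote(a) for a in argv))
```

The command is only printed, so the sketch runs on a machine without Docker; passing `argv` to `subprocess.run` would launch the container on a host where Docker is available.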
6. Interest in containers for HPC is growing
http://www.lanl.gov/projects/apex/_assets/docs/APEX2020_draft_tech_specs_v1.0.pdf
http://investors.cray.com/phoenix.zhtml?c=98390&p=irol-newsArticle&ID=2112970
7. Why Docker
• Fast initialization (10-1000x faster than VMs)
• Small memory overhead
• Efficient disk usage
• Bare-metal access to devices
• Built-in version control and repository support
– Simple sharing of containers
• Simple launch mechanism
– Can be run within batch job queue system
8. Current Choices for HPC Workloads
[Diagram: workloads arranged between two options, Cloud HPC and Bare-metal HPC]
Workloads shown: CSC-built apps; User-built CSC-compatible apps; Hosting; Windows; Non-SLURM batch queue system; Ubuntu; “I need/want Root”; VM image app; Web Servers; Preservation of SW stack; Secure access; Complex stack
9. Future Choices for HPC Workloads
[Diagram: workloads arranged among three options, Cloud HPC, Container HPC and Bare-metal HPC]
Workloads shown: CSC-built apps; User-built CSC-compatible apps; Hosting; Windows; Non-SLURM batch queue system; Ubuntu; “I need/want Root”; VM image app; Secure access; Web Servers; Preservation of SW stack; Complex stack; Containerized app
10. Example Case
“My HPC application needs Ubuntu”
• Alternative 1: Adapt the application
– Takes time and work
– May not be possible with ISV codes
• Alternative 2: Run it in the cloud
– You may get to play cluster admin!
– Scheduling is limited in OpenStack (no backfill etc.)
– Running a short job has large initialization overhead
– Performance penalties
• Alternative 3: Run it in a container
– No need to touch the application
– Nearly as easy to run as a normal job
– Very little overhead
– Can use the normal batch job scheduler
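To make Alternative 3 more concrete, here is a minimal sketch that generates a SLURM batch script running an unmodified application inside an Ubuntu container. It assumes, hypothetically, that Docker is installed on the compute node and that the job is permitted to call it; the function name `slurm_docker_script`, the application `./my_app` and its input file are placeholders for illustration.

```python
def slurm_docker_script(image, command, job_name="ubuntu-app",
                        time_limit="00:10:00"):
    """Return the text of a SLURM batch script that runs `command` in `image`.

    Assumption: Docker is available to the job on the compute node.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={time_limit}",
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks=1",
        "",
        "# Mount the submission directory into the container and run there",
        f'docker run --rm -v "$PWD":/work -w /work {image} {command}',
    ]
    return "\n".join(lines) + "\n"


# Hypothetical application and input file
script = slurm_docker_script("ubuntu:14.04", "./my_app input.dat")
print(script)
```

The resulting text could be saved to a file and submitted with `sbatch`, so the job goes through the normal scheduler like any other batch job.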
11. Challenges for HPC Use
• Security model is problematic
– Initially designed for server environments
• Only trusted users have shell access to server
– Containers launched as root
– Access to bare metal & device drivers
• Requires an overlay FS & kernel modules
– Relatively new Linux OS version needed
• Daemon must run on every compute node
• Low-level driver compatibility?
– GPU, InfiniBand, Lustre
12. Shifter
• Alternative to Docker daemon
– Adapts containers to HPC use
• Repacks into a new filesystem (squashfs)
– Integrates with batch job queue systems
• No need to run a daemon on compute nodes
• Developed by NERSC for Cori (Cray XC40)
– Pre-release version available
– Cray is productizing it
• Parallel jobs? Driver issues?
https://www.nersc.gov/research-and-development/user-defined-images/
https://bitbucket.org/berkeleylab/shifter
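For comparison with a plain Docker launch, the sketch below generates a batch script in the style of the Shifter SLURM integration. The `#SBATCH --image=docker:...` directive and the `srun shifter` launch follow later published Shifter usage; the pre-release version mentioned above may use a different syntax, so treat this as an assumption. The function name `shifter_job_script` and the application name are hypothetical.

```python
def shifter_job_script(image, command, ntasks=4):
    """Return a SLURM batch script that launches `command` through Shifter.

    Assumption: the site has Shifter's SLURM integration installed and the
    image has already been pulled into Shifter's image store.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --image=docker:{image}",  # image for the job (assumed syntax)
        f"#SBATCH --ntasks={ntasks}",
        "",
        "# No Docker daemon is involved on the compute nodes",
        f"srun shifter {command}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical application name
job = shifter_job_script("ubuntu:14.04", "./my_app")
print(job)
```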
13. Next Steps / Ideas
• Piloting Shifter in 2016
– Sisu, Taito, Taito-shell
• Custom interactive containers in taito-shell
• Containerized CSC compute environment?
– Customers could run on their own laptops, workstations and/or clusters
– Starting point for users’ own customizations
– Having our environment in config management would make it easier
15. EasyBuild + Docker?
• Dockerfile can be used to define a container
– Simple flat file with a script-like syntax
– Specific to Docker
• Using EasyBuild with Docker?
– Also target VMs or bare metal with the same config
– Update portions of the stack easily
– Manage dependencies
– Leverage the rich set of EasyBuild configs
– Which way to do it?
• Using EasyBuild in a container or
• Building containers with EasyBuild?
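One way to picture the "EasyBuild in a container" option is a Dockerfile that installs EasyBuild and then builds an easyconfig during the image build. The sketch below generates such a Dockerfile as text. It is a skeleton under stated assumptions, not a tested recipe: the function name `easybuild_dockerfile`, the base image, the package list and the easyconfig name `example.eb` are all placeholders, and a modules tool (which EasyBuild needs) is left out for brevity.

```python
def easybuild_dockerfile(easyconfig, base_image="ubuntu:14.04"):
    """Return Dockerfile text that runs EasyBuild inside the image build.

    Skeleton only: package names and the easyconfig are assumptions, and
    installing a modules tool such as Lmod is omitted.
    """
    lines = [
        f"FROM {base_image}",
        "RUN apt-get update && apt-get install -y python python-pip build-essential",
        "RUN pip install easybuild",
        "# EasyBuild refuses to run as root, so switch to an unprivileged user",
        "RUN useradd -m easybuild",
        "USER easybuild",
        "WORKDIR /home/easybuild",
        "# A modules tool must be installed before this step can succeed",
        f"RUN eb {easyconfig} --robot",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical easyconfig file name
dockerfile = easybuild_dockerfile("example.eb")
print(dockerfile)
```

The opposite option, building containers with EasyBuild, would instead have EasyBuild emit the container definition itself; the sketch above does not cover that direction.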