Containers

Awesome Linux Containers

#StandWithBelarus Voices From Belarus Stand With Ukraine

About the Author

Hello, everyone! My name is Filipp, and I have been working with high-load distributed systems and services, security, monitoring, continuous deployment and release management (DevOps domain) since 2012.

One of my passions is developing DevOps solutions and contributing to the open-source community. By sharing my knowledge and experiences, I strive to save time for both myself and others while fostering a culture of collaboration and learning.

I had to leave my home country, Belarus, due to my participation in protests against the oppressive regime of dictator Lukashenko, who maintains a close affiliation with Putin. Since then, I have been trying to rebuild my life from scratch in other countries.

If you are seeking a skilled DevOps lead or architect to enhance your project, I invite you to connect with me on LinkedIn or explore my valuable contributions on GitHub. Let's collaborate and create some cool solutions together :)

Foundations

  • OPEN CONTAINER INITIATIVE
    The Open Container Initiative is a lightweight, open governance structure, to be formed under the auspices of the Linux Foundation, for the express purpose of creating open industry standards around container formats and runtime.
  • Cloud Native Computing Foundation
    The Cloud Native Computing Foundation will create and drive the adoption of a new set of common container technologies informed by technical merit and end user value, and inspired by Internet-scale computing.
  • Cloud Foundry Foundation
    The Cloud is our foundry.

Specifications

  • Open Container Specifications
    This project is where the Open Container Initiative Specifications are written. This is a work in progress.
  • App Container basics
    App Container (appc) is an open specification that defines several aspects of how to run applications in containers: an image format, runtime environment, and discovery protocol.
  • Systemd Container Interface
    Systemd is a suite of basic building blocks for a Linux system. It provides a system and service manager that runs as PID 1 and starts the rest of the system. If you write a container solution, please consider supporting the following interfaces.
  • Nulecule Specification
    Nulecule defines a pattern and model for packaging complex multi-container applications and services, referencing all their dependencies, including orchestration metadata in a container image for building, deploying, monitoring, and active management.
  • Oracle microcontainer manifesto
    This is not a new container format, but simply a specific method for constructing a container that allows for better security and stability.
  • Cloud Native Application Bundle Specification
    A package format specification that describes a technology for bundling, installing, and managing distributed applications, that are by design, cloud agnostic.

Clouds

  • Amazon EC2 Container Service
    Container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances.
  • Google Cloud Platform
    Run Docker containers on Google Cloud Platform, powered by Kubernetes. Google Container Engine actively schedules your containers, based on declared needs, on a managed cluster of virtual machines.
  • Jelastic
    Unlimited PaaS and Container-Based IaaS in a Joint Cloud Solution for DevOps.
  • Joyent
    High-Performance Container-Native Infrastructure for Today's Demanding Real-Time Web and Mobile Applications.
  • Kubernetes
    Manage a cluster of Linux containers as a single system to accelerate Dev and simplify Ops.
  • Mesosphere
    The Mesosphere Datacenter Operating System (DCOS) is a new kind of operating system that spans all of the machines in your datacenter or cloud. It provides a highly elastic, and highly scalable way of deploying applications, services and big data infrastructure on shared resources.
  • OpenShift Origin
    OpenShift Origin is a distribution of Kubernetes optimized for continuous application development and multi-tenant deployment. Origin adds developer and operations-centric tools on top of Kubernetes to enable rapid application development, easy deployment and scaling, and long-term lifecycle maintenance for small and large teams.
  • Warden
    Manages isolated, ephemeral, and resource controlled environments. Part of Cloud Foundry - the open platform as a service project.
  • Virtuozzo
    A platform, built on Virtuozzo containers, that can be easily run on top of any bare-metal or virtual servers in any public or private cloud, to automate, optimize, and accelerate internal IT and development processes.
  • Rancher
    Rancher is a complete, open source platform for deploying and managing containers in production. It includes commercially-supported distributions of Kubernetes, Mesos, and Docker Swarm, making it easy to run containerized applications on any infrastructure.
  • Docker Swarm
    Docker Swarm is native clustering for Docker.
  • Azure Container Service
    Azure Container Service optimizes the configuration of popular open source tools and technologies specifically for Azure.
  • CIAO
    Cloud Integrated Advanced Orchestrator for Intel Clear Linux OS.
  • Alibaba Cloud Container Service
    Container Service is a high-performance and scalable container application management service that enables you to use Docker and Kubernetes to manage the lifecycle of containerized applications.
  • Nomad
    HashiCorp Nomad is a single binary that schedules applications and services on Linux, Windows, and Mac. It is an open source scheduler that uses a declarative job file for scheduling virtualized, containerized, and standalone applications.

Operating Systems

  • CoreOS
    A lightweight Linux operating system designed for clustered deployments providing automation, security, and scalability for your most critical applications.
  • RancherOS
    RancherOS is a tiny Linux distro that runs the entire OS as Docker containers.
  • Project Atomic
    Project Atomic provides the best platform for your Linux Docker Kubernetes (LDK) application stack. Use immutable infrastructure to deploy and scale your containerized applications.
  • Snappy Ubuntu Core
    Ubuntu Core is the perfect system for large-scale cloud container deployments, bringing transactional updates to the world’s favourite container platform.
  • ResinOS
    A host OS tailored for containers, designed for reliability, proven in production.
  • Photon
    Photon OS is a minimal Linux container host designed to have a small footprint and tuned for VMware platforms. Photon is intended to invite collaboration around running containerized and Linux applications in a virtualized environment.
  • Clear Linux Project
    The Clear Linux Project for Intel Architecture is a distribution built for various Cloud use cases.
  • CargOS
    CargOS is a new lightweight, open source, platform for Docker hosts that aims for speed, manageability and security. Releases are built for 64-bit Intel/AMD CPUs.
  • OSv
    OSv is the open source operating system designed for the cloud. Built from the ground up for effortless deployment and management, with superior performance.
  • HypriotOS
    A minimal Debian-based operating system that is optimized to run Docker. It makes it dead easy to use Docker on any Raspberry Pi.
  • MCL
    MCL (Minimal Container Linux) is a from scratch minimal Linux OS designed specifically to run containers. It has a small footprint of ~50MB and boots within seconds. It is currently optimized to run Docker.

Hypervisors

  • Docker
    An open platform for distributed applications for developers and sysadmins. The de facto standard.
  • LXD
    Daemon based on liblxc offering a REST API to manage LXC containers.
  • OpenVZ
    OpenVZ is container-based virtualization for Linux. OpenVZ creates multiple secure, isolated Linux containers (otherwise known as VEs or VPSs) on a single physical server enabling better server utilization and ensuring that applications do not conflict.
  • MultiDocker
    Create a secure multi-user Docker machine, where each user is segregated into an independent container.
  • Lithos
    Lithos is a process supervisor and containerizer for running services. It is not intended to be system init, but rather tries to be a base tool to build container orchestration.
  • containerd
    A container runtime which can manage a complete container lifecycle - from image transfer/storage to container execution, supervision and networking.

Containers

  • runc
    runc is a CLI tool for spawning and running containers according to the OCI specification.
  • Bocker
    Docker implemented in around 100 lines of bash.
  • Rocket
    rkt (pronounced "rock-it") is a CLI for running app containers on Linux. rkt is designed to be composable, secure, and fast. Based on AppC specification.
  • LXC
    LXC is the well known set of tools, templates, library and language bindings. It's pretty low level, very flexible and covers just about every containment feature supported by the upstream kernel.
  • Vagga
    Vagga is a fully-userspace container engine inspired by Vagrant and Docker, specialized for development environments.
  • libct
    Libct is a container management library which provides a convenient API for frontend programs to control a container during its whole lifetime.
  • libvirt
    A big toolkit to interact with the virtualization capabilities of recent versions of Linux (and other OSes).
  • systemd-nspawn
    Spawn a namespace container for debugging, testing and building. Part of systemd.
  • porto
    The main goal of Porto is to create a convenient, reliable interface over several Linux kernel mechanisms such as cgroups, namespaces, mounts, networking etc.
  • udocker
    A basic user tool to execute simple containers in batch or interactive systems without root privileges.
  • Let Me Contain That For You
    LMCTFY is the open source version of Google’s container stack, which provides Linux application containers.
  • cc-oci-runtime
    Intel Clear Linux OCI (Open Containers Initiative) compatible runtime.
  • railcar
    Railcar is a rust implementation of the opencontainers initiative's runtime spec. It is similar to the reference implementation runc, but it is implemented completely in rust for memory safety without needing the overhead of a garbage collector or multiple threads.
  • Kata Containers
    Kata Containers is a new open source project building extremely lightweight virtual machines that seamlessly plug into the containers ecosystem.
  • plash
    Lightweight, rootless containers.
  • runv
    Hypervisor-based (KVM, Xen, QEMU) Runtime for OCI. Security by isolation.
  • podman
    Full management of container lifecycle.
  • firecracker
    Firecracker runs workloads in lightweight virtual machines, called microVMs, which combine the security and isolation properties provided by hardware virtualization technology with the speed and flexibility of containers.
  • sysbox
    Sysbox is a "runc" that creates secure (rootless) containers / pods that run not just microservices, but most workloads that run in VMs (e.g., systemd, Docker, and Kubernetes), seamlessly.
  • youki
    A container runtime written in Rust.
  • footloose
    Containers that look like Virtual Machines.

Sandboxes

  • Firejail
    Firejail is a SUID sandbox program that reduces the risk of security breaches by restricting the running environment of untrusted applications using Linux namespaces, seccomp-bpf and Linux capabilities.
  • NsJail
    NsJail is a process isolation tool for Linux. It makes use of the namespacing, resource control, and seccomp-bpf syscall filter subsystems of the Linux kernel.
  • Subuser
    Securing the Linux desktop with Docker.
  • Snappy
    Snappy Ubuntu Core is a new rendition of Ubuntu with transactional updates - a minimal server image with the same libraries as today’s Ubuntu, but applications are provided through a simpler mechanism.
  • xdg-app
    xdg-app is a system for building, distributing and running sandboxed desktop applications on Linux.
  • Bubblewrap
    Run applications in a sandbox using Linux namespaces without root privileges, with user namespacing provided via setuid binary.
  • singularity
    Universal application containers for Linux.
  • Lxroot
    Lxroot is a flexible, lightweight, and safer alternative to chroot and/or Docker for non-root users on Linux.

Partial Access

  • nsenter
    Run program with namespaces of other processes. Part of the util-linux.
  • ip-netns
    Process network namespace management. Part of the iproute2.
  • unshare
    Run program with some namespaces unshared from parent. Part of the util-linux.
  • python-nsenter
    This Python package allows entering Linux kernel namespaces (mount, IPC, net, PID, user and UTS) by doing the "setns" syscall.
  • butter
    Python library to interface to low level linux features (inotify, fanotify, timerfd, signalfd, eventfd, containers) with asyncio support.
  • pyspaces
    Works with Linux namespaces through glibc with pure python.
  • CRIU
    Checkpoint/Restore In Userspace is a software tool for the Linux operating system. Using this tool, you can freeze a running application (or part of it) and checkpoint it to a hard drive as a collection of files. CRIU is integrated with Docker and LXC to implement live migration of containers.
  • Moby
    A "Lego set" of toolkit components for containers software created by Docker.

Filesystem

  • container-diff
    A tool for analyzing and comparing container images.
  • buildah
    A tool which facilitates building OCI container images.
  • skopeo
    Work with remote image registries - retrieving information, images, signing content.
  • img
    Standalone, daemon-less, unprivileged Dockerfile and OCI compatible container image builder.
  • dgr
    Command line utility designed to build and to configure at runtime App Containers Images (ACI) and App Container Pods (POD) based on convention over configuration.
  • Whaler
    Whaler is designed to reverse engineer a Docker Image into the Dockerfile that created it.
  • dive
    A tool for exploring each layer in a docker image.
  • go-containerregistry
    Go library and CLIs for working with container registries.
  • kaniko
    Kaniko is a tool to build container images from a Dockerfile, inside a container or Kubernetes cluster.
  • umoci
    Umoci is a tool to manipulate OCI container images, and can be used as a rudimentary build tool.
  • docker pushrm
    A Docker CLI plugin that lets you push the README.md file from the current directory to a container registry. Supports Docker Hub, Quay and Harbor.

Dashboard

  • LXC-Web-Panel
    Web panel for LXC on Ubuntu.
  • Liman
    Basic docker monitoring web application.
  • portainer
    Lightweight Docker management UI.
  • swarmpit
    Lightweight mobile-friendly Docker Swarm management UI.

Best practices

  • The Twelve-Factor App
    The twelve-factor app is a methodology for building software-as-a-service apps.
  • Container Best Practices
    A collaborative project to document container-based application architecture, creation and management from Project Atomic.

Security

Tools

  • Docker bench security
    The Docker Bench for Security is a script that checks for dozens of common best-practices around deploying Docker containers in production.
  • CoreOS Clair
    Open Source Vulnerability Analysis for your Containers.
  • bane
    Custom AppArmor profile generator for docker containers.
  • OpenSCAP
    The OpenSCAP ecosystem provides multiple tools to assist administrators and auditors with assessment, measurement and enforcement of security baselines.
  • drydock
    Drydock provides a flexible way of assessing the security of your Docker daemon configuration and containers using editable audit templates.
  • trireme
    Security by segmentation for Docker and Kubernetes.
  • goss
    Quick and Easy server testing/validation.
  • sockguard
    A proxy for docker.sock that enforces access control and isolated privileges.
  • gvisor
    gVisor is a user-space kernel, written in Go, that implements a substantial portion of the Linux system surface. It includes an Open Container Initiative (OCI) runtime called runsc that provides an isolation boundary between the application and the host kernel. The runsc runtime integrates with Docker and Kubernetes, making it simple to run sandboxed containers.
  • docker-explorer
    A tool to help forensicate offline docker acquisitions.
  • oci-seccomp-bpf-hook
    OCI hook to trace syscalls and generate a seccomp profile.

Levels of security problems

1) regular application

  • always untrusted -> know it
  • suid bit -> mount with nosuid
  • limit available syscall -> seccomp-bpf, grsec
  • leak to another container (bug in namespaces, filesystem) -> user namespaces with a different host uid range for each container: uid 1000 inside maps to 14293 and 15398 outside; security modules like selinux or apparmor
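
The uid mapping mentioned in the last item can be sketched in a few lines. This is a minimal illustration, assuming the three-column "inside outside count" format used by /proc/<pid>/uid_map; the two mappings below are hypothetical and were chosen only to reproduce the numbers 14293 and 15398 from the text.

```python
from typing import List, Optional, Tuple

# One uid_map entry: (first uid inside, first uid outside, range length).
UidMap = List[Tuple[int, int, int]]


def parse_uid_map(text: str) -> UidMap:
    """Parse text in the /proc/<pid>/uid_map format: 'inside outside count' per line."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(field) for field in line.split())
            entries.append((inside, outside, count))
    return entries


def host_uid(uid_map: UidMap, container_uid: int) -> Optional[int]:
    """Translate a uid seen inside the container to the uid seen on the host."""
    for inside, outside, count in uid_map:
        if inside <= container_uid < inside + count:
            return outside + (container_uid - inside)
    return None  # the uid is not mapped at all


# Hypothetical mappings for two containers (assumption, not taken from a real host).
container_a = parse_uid_map("0 13293 65536\n")
container_b = parse_uid_map("0 14398 65536\n")

print(host_uid(container_a, 1000))  # 14293
print(host_uid(container_b, 1000))  # 15398
```

Because the two host ranges differ, a process that escapes one container should not own the files of the other, even though both run as uid 1000 inside.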

2) system services like cron, ssh

  • run as root -> isolate via bastion host or vm
  • using /dev -> "devices" control group
    The following device nodes are created in the container by default:
    /dev/console, /dev/null, /dev/zero, /dev/full, /dev/tty*, /dev/urandom, /dev/random, /dev/fuse
    The Docker images are also mounted with nodev, which means that even if a device node was pre-created in the image, it could not be used by processes within the container to talk to the kernel.
  • root calls -> capabilities (cap_sys_admin warning!)
    Here is the current list of capabilities that Docker uses: chown, dac_override, fowner, kill, setgid, setuid, setpcap, net_bind_service, net_raw, sys_chroot, mknod, setfcap, and audit_write.
    Docker drops several capabilities from the full kernel set, including the following:
    CAP_SETPCAP Modify process capabilities
    CAP_SYS_MODULE Insert/Remove kernel modules
    CAP_SYS_RAWIO Modify Kernel Memory
    CAP_SYS_PACCT Configure process accounting
    CAP_SYS_NICE Modify Priority of processes
    CAP_SYS_RESOURCE Override Resource Limits
    CAP_SYS_TIME Modify the system clock
    CAP_SYS_TTY_CONFIG Configure tty devices
    CAP_AUDIT_WRITE Write the audit log
    CAP_AUDIT_CONTROL Configure Audit Subsystem
    CAP_MAC_OVERRIDE Ignore Kernel MAC Policy
    CAP_MAC_ADMIN Configure MAC Configuration
    CAP_SYSLOG Modify Kernel printk behavior
    CAP_NET_ADMIN Configure the network
    CAP_SYS_ADMIN Catch all
  • uses /proc, /sys -> remount ro, drop cap_sys_admin; security modules like selinux or apparmor; some parts of these filesystems are "namespace-aware"
    Docker mounts these file systems into the container as "read-only" mount points.
    . /sys
    . /proc/sys
    . /proc/sysrq-trigger
    . /proc/irq
    . /proc/bus
    Copy-on-write file systems
    Docker uses copy-on-write file systems. This means containers can use the same file system image as the base for the container. When a container writes content to the image, it gets written to a container-specific file system. This prevents one container from seeing the changes of another container even if they wrote to the same file system image. Just as important, one container cannot change the image content to affect the processes in another container.
  • uid 0 -> user namespaces, uid 0 mapped to random uid outside
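
To check which capabilities a process actually holds, the CapEff bitmask in /proc/<pid>/status can be decoded. The sketch below is a minimal illustration: the bit numbers follow linux/capability.h, the table is only a subset, and the sample status text is made up so that it encodes exactly the capability list quoted above.

```python
# Capability bit numbers from linux/capability.h (subset, enough for this example).
CAP_BITS = {
    "chown": 0, "dac_override": 1, "fowner": 3, "fsetid": 4, "kill": 5,
    "setgid": 6, "setuid": 7, "setpcap": 8, "net_bind_service": 10,
    "net_admin": 12, "net_raw": 13, "sys_module": 16, "sys_rawio": 17,
    "sys_chroot": 18, "sys_admin": 21, "sys_time": 25, "mknod": 27,
    "audit_write": 29, "setfcap": 31,
}

# The capability list quoted in the text above.
DOCKER_LIST = [
    "chown", "dac_override", "fowner", "kill", "setgid", "setuid", "setpcap",
    "net_bind_service", "net_raw", "sys_chroot", "mknod", "setfcap", "audit_write",
]


def caps_to_mask(names):
    """Build a capability bitmask from a list of capability names."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_BITS[name]
    return mask


def mask_to_caps(mask):
    """Return the known capability names set in the mask, ordered by bit number."""
    return [name for name, bit in sorted(CAP_BITS.items(), key=lambda kv: kv[1])
            if mask & (1 << bit)]


def parse_cap_eff(status_text):
    """Extract the CapEff value from text in the /proc/<pid>/status format."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


# Made-up status excerpt (assumption): the mask encodes DOCKER_LIST exactly.
sample_status = "Name:\tsh\nCapEff:\t%016x\n" % caps_to_mask(DOCKER_LIST)
held = mask_to_caps(parse_cap_eff(sample_status))
print(held)
print("sys_admin" in held)  # False
```

On a real host the same functions could be fed the contents of /proc/self/status; a container that reports sys_admin deserves the warning given in the list above.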

3) system services like devices, network, filesystems

  • root -> most of these services should run on the host, outside the container; isolate sensitive functions and run them in a non-privileged context
  • full privileges -> isolate on kernel level

4) kernel drivers, network stack, security policies

  • absolute privileges -> run it in separate vm

5) general like immutable infrastructure

  • container is ro
  • write to small separate rw nosuid part

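
The immutable layout from level 5 (everything read-only, plus a small writable nosuid area) can be verified from a mount table. This is a rough sketch, assuming the whitespace-separated /proc/mounts format; the sample table and the /data path are hypothetical.

```python
def parse_mounts(text):
    """Map each mount point to its set of options, given /proc/mounts-style text."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            # fields: device, mount point, fs type, comma-separated options, ...
            mounts[fields[1]] = set(fields[3].split(","))
    return mounts


def check_immutable_layout(mounts, rw_allowed):
    """Report mounts that break the 'read-only plus small rw nosuid part' rule."""
    problems = []
    for mountpoint, options in sorted(mounts.items()):
        if mountpoint in rw_allowed:
            if "nosuid" not in options:
                problems.append(f"{mountpoint}: writable but not nosuid")
        elif "ro" not in options:
            problems.append(f"{mountpoint}: expected read-only")
    return problems


# Hypothetical mount table of a container (assumption, for illustration only).
sample = (
    "overlay / overlay ro,relatime 0 0\n"
    "tmpfs /data tmpfs rw,nosuid,nodev 0 0\n"
    "tmpfs /tmp tmpfs rw,relatime 0 0\n"
)

print(check_immutable_layout(parse_mounts(sample), rw_allowed={"/data"}))
# ['/tmp: expected read-only']
```

The same check covers the "mount with nosuid" advice from level 1, since a writable area without nosuid is reported as well.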

Technologies for security

Things are better. For example, most modern container technologies can make use of Linux's built-in security tools such as:
AppArmor, SELinux and Seccomp policies;
Grsecurity;
Control groups (cgroups);
Kernel namespaces
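
As a small illustration of kernel namespaces, the links under /proc/<pid>/ns identify the namespaces a process belongs to, and two processes share a namespace when the inode numbers match. The sketch below assumes the "type:[inode]" link format; the inode numbers in the example are hypothetical, and reading another process's links may require extra privileges.

```python
import os
import re

NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target such as 'net:[4026531992]' into (type, inode)."""
    match = NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def read_namespaces(pid="self"):
    """Read the namespace inodes of a process from /proc/<pid>/ns (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in os.listdir(ns_dir):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result


def shared_namespaces(a, b):
    """Return the namespace names whose inode is identical in both mappings."""
    return sorted(name for name in a if a[name] == b.get(name))


# Hypothetical inode numbers for a host process and a container process.
host = {"mnt": 4026531840, "net": 4026531992, "pid": 4026531836, "user": 4026531837}
container = {"mnt": 4026532567, "net": 4026532570, "pid": 4026532568, "user": 4026531837}

print(shared_namespaces(host, container))  # ['user']
```

In this made-up example the user namespace is shared with the host, which is the situation where uid 0 inside the container is also uid 0 outside.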

Sure, you're deploying seccomp, but you can't use selinux inside your container, because the policy isn't per-namespace (?? lxc uses apparmor for each container...)
sVirt - selinux for kvm

Major kernel subsystems are not namespaced, such as:
- SELinux
- Cgroups
- file systems under /sys
- /proc/sys, /proc/sysrq-trigger, /proc/irq, /proc/bus

Devices are not namespaced:
- /dev/mem
- /dev/sd* file system devices
- kernel modules

If you can communicate or attack one of these as a privileged process, you can own the system.
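
Since devices are not namespaced, one simple audit is to compare the device nodes visible in a container against the default list given earlier. This is only a sketch: the allowed patterns are copied from that list, the sample names are invented, and a real container usually exposes a few more entries (for example pts or shm) that would need to be added.

```python
from fnmatch import fnmatch

# Default device nodes from the list above, as names relative to /dev.
DEFAULT_DEVICES = ("console", "null", "zero", "full", "tty*", "urandom", "random", "fuse")


def unexpected_devices(names, allowed=DEFAULT_DEVICES):
    """Return the device names that match none of the allowed patterns."""
    return sorted(name for name in names
                  if not any(fnmatch(name, pattern) for pattern in allowed))


# Invented listing of /dev inside a container (assumption, for illustration).
listing = ["null", "tty", "tty1", "mem", "sda", "urandom"]
print(unexpected_devices(listing))  # ['mem', 'sda']
```

On a live system the listing could come from os.listdir("/dev"); anything like mem or sd* showing up inside a container is worth a closer look.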

Other Information Sources

  • sysdig-container-ecosystem
    The ecosystem of awesome new technologies emerging around containers and microservices can be a little overwhelming, to say the least. We thought we might be able to help: welcome to the Container Ecosystem Project.
  • doger.io
    This page is an attempt to document the ins and outs of containers on Linux. This is not just restricted to programmers looking to implement containers or use container like features in their own code but also Sysadmins and Users who want to get more of a handle on how containers work 'under the hood'.