This article is part of a series on the fundamentals of Kubernetes – for those who could benefit from a basic explanation of Kubernetes concepts.
In order to understand the problem Kubernetes solves, it is necessary to first understand what a container is. After all, Kubernetes is a container orchestration system.
Software Development Before Containers
Before we get into talking about containers, let’s look at some of the challenges in software development and deployment that existed before the widespread adoption of containers.
1. Differences in Environments
Applications often have complex dependencies on specific libraries, runtime environments, and system configurations. During development, these dependencies are usually set up on the developer’s local environment (i.e., laptop) so that the developer can try the application and perform some basic tests.
However, a difficult challenge arises once the application is ready to be shipped to the servers that make up the “production” environment – the environment where real customer traffic and business logic are processed:
All of the servers in the production environment (there can be many) should have the same set of these dependencies as on the developer’s laptop.
Solving this problem is tricky because dependencies and runtimes are often installed with different methods, depending on the operating system.
For example, let’s assume a Python application is developed on a MacBook Pro (running the macOS operating system) while the production servers are running the Ubuntu Linux operating system.
On a MacBook, the developer may use the Homebrew tool to install Python (i.e., the “runtime”):
>brew install python
>python3 --version
Python 3.11.4
On the Ubuntu Linux system, the command might look like this:
>sudo apt update
>sudo apt install python3
>python3 --version
Python 3.8.10
As you can see, not only do the installation tools differ, but the versions of Python may not match exactly. These small differences in runtimes and libraries can sometimes cause unexpected bugs and problems to appear in production.
This source of friction has often led to the “it works on my machine” argument, since the application may have worked just fine on the developer’s laptop, with the problems first appearing in the production environment.
2. Virtual Machines
An earlier alternative to using containers was to run one application per virtual machine (VM), but this wasted a lot of resources (CPU and memory), since a VM runs not only your application but also a full operating system that needs resources too. This overhead limited the number of VMs that could run on a single physical host and made scaling much more expensive. This inefficiency is what people mean when they say VMs are “heavy” compared to “light” containers.
3. Deployment Complexity
As previously mentioned, deploying an application involves configuring the operating system, installing or updating dependencies, and managing runtime environments. While some teams achieved semi-automation through configuration management tools and scripts, the process was still error-prone, time-consuming, and required a deep understanding of the application’s requirements.
4. Scaling Challenges
Scaling up applications to handle higher traffic loads often required manual intervention and long wait times as server instances were added and the application was prepared and installed on each new server – not to mention requiring specialized knowledge about networks and load balancers.
5. Version Control and Rollbacks
Managing different versions of an application and performing rollbacks was challenging. It usually involved a multi-step process in which the old version of the application was replaced on each server. Dependencies might also need updating if they differed between versions. This was especially challenging on servers that were reused to host different versions of the application – switching application versions could result in downtime if not done carefully.
6. Interpersonal Challenges
These technical complexities are exacerbated by conflicts of interest inherent to each role, which, if not kept in check, negatively impact morale and cooperation among employees.
Whereas the development team is primarily concerned with programming the features requested by the business, the operations team is responsible for the performance and stability of the application in the production environment.
For the operations team, preserving the Service Level Agreement (SLA) is a core concern. Any change to the production environment comes with an increased risk that something will break, requiring time-consuming investigations and post-incident reports. Furthermore, the operations team often lacks detailed knowledge of the intricacies of the various applications they are tasked with deploying. A common saying is that developers would throw the application “over the fence” to the operations team, leaving them with the burden of publishing the new version of the application without causing downtime.
For the development team, the goal is to make new features available to users as quickly as possible, satisfying the business goals. Any delays could result in angry emails from management and negatively impact the team’s performance evaluations.
Conflicts of interest still exist today; however, containers and improved automation have reduced the friction of deploying applications, making it a less painful experience for everyone involved.
As an aside, you might have heard of the DevOps role. This role came about to bridge the gap between software development (Dev) and IT operations (Ops) – addressing these kinds of problems.
Containers – The Game Changer
In the context of Kubernetes and Docker, a container refers to a lightweight and portable software package that contains everything needed to run a piece of software, including the code, runtime, system tools, system libraries, and settings. Containers are isolated from each other and from the host system, which makes them a reliable and consistent environment for deploying and running applications.
Key Benefits of Containers
- They bundle all dependencies and configuration along with the application in a single container “image”
- Container images are portable and can run on any host system that has a container runtime engine
- Multiple “copies” of the container can be started based on a single container image
- Each container has its own state and lifecycle
- Multiple containers – even with completely different applications – can run on a single host system without risk of dependency conflicts
- Containers can be stopped, started, and replaced without requiring changes to the host system, making deployments and rollbacks easier (see the sketch after this list)
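As a small sketch of these benefits in action, here is what starting multiple copies of a container, then stopping and replacing one of them, might look like with the Docker CLI. This assumes Docker is installed, and the image name and tag are illustrative:

>docker run --name web-1 -d nginx:1.25
>docker run --name web-2 -d nginx:1.25
>docker stop web-1
>docker rm web-1
>docker run --name web-1 -d nginx:1.25

Two independent containers run from the same image, and one of them is replaced without any changes to the host system itself.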
Container Development Lifecycle
Container images can be built using a variety of open-source and commercial tools. One of the most popular toolsets for building and running containers is Docker.
Docker, or another such tool, takes an instruction file provided by the developer and creates a container image, layer by layer, based on those instructions. The starting point for any container image is another container image – a “base image” – which contains the desired operating system and runtime combination. Very rarely do people build the base image from scratch; more on that in a bit. This highlights another excellent benefit of containers – the ability to share container images and extend them as needed!
The created layers are combined to create the final container image, which can then be launched by the container runtime engine.
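To make this concrete, here is a minimal, hypothetical instruction file (a “Dockerfile”, in Docker’s case) for the Python application from earlier. The file names (app.py, requirements.txt) and the pinned base image tag are assumptions for the sake of illustration:

FROM python:3.11.4-slim
# base image: a slim Linux with the Python runtime pre-installed
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
# installing dependencies produces another image layer
COPY app.py .
CMD ["python", "app.py"]

Building the final image is then a single command:

>docker build -t my-python-app:1.0 .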
There are numerous public “libraries” – called registries – with thousands of pre-built container images covering various combinations of operating systems and runtime environments. A popular public image registry is https://hub.docker.com/, managed by the company behind Docker. You simply search for the flavor of operating system, runtime environment (php, java, golang), or even container images with pre-installed, ready-to-run software. Need a web server or a database? There are plenty of pre-built containers to choose from, no coding required!
Each container image is given a name and a version id – known as a tag. The image tag provides an identifiable version for container images that are updated over time.
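For example, a specific version of a public image can be pulled from a registry by its name and tag (the exact tags shown here are illustrative):

>docker pull ubuntu:22.04
>docker pull python:3.11-slim

Here ubuntu and python are the image names, while 22.04 and 3.11-slim are the tags.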
A Word of Caution
As convenient as publicly shared container images are, they should be used with care. A malicious actor could build a container image that hides malicious software (malware) and publish the image under a false name and description. Examples of malware found in container images include software that mines cryptocurrency or scans for other vulnerable systems inside the host system’s network.
To mitigate this risk, reputable image registries (including Docker Hub) regularly scan for malware hidden in container images using special detection software.
It is also wise to only use images provided by a reputable company or organization. Many container registries provide a kind of verification “badge” to more easily identify a reputable image source.
Lastly, since container images contain operating system libraries and other software, they are also vulnerable to security exploits and bugs. Running up-to-date versions of images is the best defense against known software vulnerabilities.
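As a rough sketch of what that looks like in practice (assuming Docker and the python image from the earlier example), re-pulling a tag fetches the latest patched build of that image:

>docker pull python:3.11-slim
>docker image ls python

Containers started after the pull will use the freshly updated image; containers that are already running must be recreated to pick it up.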
A Brief Overview of How Containers Work
Containers effectively virtualize the host operating system, sharing its kernel while isolating an application’s dependencies from other containers running on the same machine. The application’s code and configuration within the container are not visible to other containers, allowing each container to have its own unique runtime environment.
This is achieved with two pieces: a container engine and a container image. The container engine runs application containers derived from container images, taking advantage of the host operating system’s features that isolate CPU, memory, and network resources between applications running on the same host machine.
As previously mentioned, virtual machines (VMs) were also used to make deploying and scaling applications easier; however, they have the disadvantage of requiring a complete operating system and a “heavy” virtualization abstraction layer. With containers, a VM containing a separate operating system for each application is no longer needed – more applications can be packed onto a single host machine, reducing the overall number of host machines required and translating to lower costs.
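A quick way to see this isolation in action is to run two different Python versions side by side on the same host. This sketch assumes Docker and the public python images; the exact version output will vary:

>docker run --rm python:3.11-slim python --version
>docker run --rm python:3.8-slim python --version

Both containers share the host’s kernel, yet each one sees only its own Python installation – no virtual machine required.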
This article was an overview of the problems containers solve and an explanation of why they are so widely used today. Now you might be wondering: if containers solve so many of these problems, what exactly does Kubernetes do? That is a topic for another article, but to summarize: Kubernetes manages a fleet of host systems, making sure containers are spread fairly across the infrastructure and managing the lifecycle of those containers. Without orchestration software like Kubernetes, this would be an extremely tedious thing to do.
By the way, container technology is not exclusive to Kubernetes. There are many tools and systems that can build and run containers: Docker, AWS Elastic Container Service, and containerd, just to name a few. Kubernetes is, however, one of the more popular and widespread solutions.
Please also read my introductory article on Kubernetes.