
You have probably already heard of Kubernetes, a powerful orchestrator that eases deployment and automatically manages your applications on a set of machines, called a Cluster.

With great power comes great complexity, even in the eyes of Google. Thus, learning Kubernetes is often considered cumbersome and complex, mainly because of the number of new concepts you have to learn. On the other hand, those very same concepts can be found in other orchestrators. As a result, mastering them will ease your onboarding on other orchestrators, such as Docker Swarm.

The aim of this article is to explain the most used concepts of Kubernetes relying on basic system administration concepts, then use some of these to deploy a simple web server and showcase the interactions between the different resources. Lastly, I will lay out the usual CLI interactions while working with Kubernetes.

This article mainly focuses on the developer side of a Kubernetes cluster, but I will leave some resources about cluster administration at the end.

Terminology and concepts

Architecture

The Kubernetes realm is the cluster: everything needed is contained within it. Inside, you will find two types of nodes: the Control Plane and the Worker Nodes.

The control plane is a centralized set of processes that manages the cluster's resources, load balancing, health, and more. A Kubernetes cluster usually has multiple control plane nodes for availability and load balancing purposes. As a developer, you will mostly interact with the cluster through its API server.

A worker node is any kind of host running a local Kubernetes agent, the kubelet, and a network proxy, kube-proxy. The former carries out the operations requested by the control plane on the local container runtime (e.g. Docker), while the latter routes network traffic to the right pods.

[Figure: Kubernetes Architecture]

Namespaces

After some time, a Kubernetes cluster may become huge and heavily used. In order to keep things well organized, Kubernetes created the concept of Namespace. A namespace is basically a virtual cluster inside the actual cluster.

Most of the resources will be contained inside a namespace, thus unaware of resources from other namespaces. Only a few kinds of resources are completely agnostic of namespaces, and they define computational power or storage sources (i.e. Nodes and PersistentVolumes). However, access to those can be limited by namespace using Quotas.

Namespace-aware resources will always be contained in a namespace as Kubernetes creates and uses a namespace named default if nothing is specified.
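As a minimal sketch of the two ideas above, the manifest below declares a namespace and caps what it may consume with a ResourceQuota. The names team-a and team-a-quota, as well as every limit, are hypothetical values chosen for illustration.

```yaml
# Hypothetical namespace for one team; the name is illustrative only
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
# A ResourceQuota is itself namespaced and limits the total consumption of its namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    # Illustrative limits, to be adapted to the capacity of the cluster
    pods: "10"
    requests.cpu: "4"
    requests.memory: 8Gi
    requests.storage: 50Gi
```

Any namespaced resource declaring namespace: team-a in its metadata then counts against this quota.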

Namespace Organization

There is no silver bullet for using namespaces, as it largely depends on your organization and needs. However, we can note some usual namespace usages:

  1. Divide the cluster by team or project, to avoid naming conflicts and help distribute resources.
  2. Divide the cluster by environment (e.g. dev, staging, prod), to keep a consistent architecture.
  3. Deploy with more granularity (e.g. blue/green deployment), to quickly fall back on an untouched working environment in case of issue.

Further reading:

Namespace Documentation

Manage The Cluster Namespaces

Glossary

Kubernetes did a great job of remaining agnostic of any technology in its design. This means two things: it handles multiple technologies under the hood, and there is a whole new terminology to learn.

Fortunately, these concepts are pretty straightforward and can most of the time be compared to a unit element of classic system infrastructure. The table below summarizes the mapping of the most basic concepts. The comparison might not be a hundred percent accurate; it is rather here to help understand the need behind each concept.

| Abstraction Layer | Physical Layer | Uses Namespace | Description |
| --- | --- | --- | --- |
| Pod | Container | Yes | A Pod is the minimal work unit of Kubernetes. It is generally equivalent to one applicative container but can be composed of multiple ones. |
| ReplicaSet | Load Balancing | Yes | A ReplicaSet keeps track of and maintains the number of instances expected and running for a given pod. |
| Deployment | - | Yes | A Deployment keeps track of and maintains the required configuration for a pod and its ReplicaSet. |
| StatefulSet | - | Yes | A StatefulSet is a Deployment with guarantees on the start order and volume binding, to keep state consistent over time. |
| Node | Host | No | A Node can be a physical or virtual machine that is ready to host pods. |
| Service | Network | Yes | A Service defines an entrypoint to a set of pods semantically tied together. |
| Ingress | Reverse Proxy | Yes | An Ingress publishes Services outside the Cluster. |
| Cluster | Datacenter | - | A Cluster is the set of available nodes, including the Kubernetes controllers. |
| Namespace | - | No | A Namespace defines an isolated pseudo cluster in the current cluster. |
| StorageClass | Disk | No | A StorageClass configures filesystem sources that can be used to dynamically create PersistentVolumes. |
| PersistentVolume | Disk Partition | No | A PersistentVolume describes any kind of filesystem ready to be mounted on a pod. |
| PersistentVolumeClaim | - | Yes | A PersistentVolumeClaim binds a PersistentVolume to a pod, which can then actively use it while running. |
| ConfigMap | Environment Variables | Yes | A ConfigMap defines widely accessible properties. |
| Secret | Secured Env. Var. | Yes | A Secret defines widely accessible properties with potential encryption and access limitations. |
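
The storage rows are the most abstract ones in this table, so here is a rough sketch of how a claim and a pod fit together. It assumes the cluster provides a default StorageClass able to create the volume dynamically; the names web-content-claim, web-with-storage and web-content are hypothetical.

```yaml
# Hypothetical claim requesting 1Gi from the default StorageClass of the cluster
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: web-content-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: web-with-storage
spec:
  containers:
    - name: web
      image: nginx
      volumeMounts:
        # Mount the claimed volume where Nginx serves its static files
        - name: web-content
          mountPath: /usr/share/nginx/html
  volumes:
    # The pod reaches the PersistentVolume through the claim
    - name: web-content
      persistentVolumeClaim:
        claimName: web-content-claim
```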

Further reading:

Official Kubernetes Glossary

Official Concepts Documentation

Definition files

Resources in Kubernetes are created in a declarative fashion. While it is possible to configure your application deployment through the command line, a good practice is to keep the resource definitions under version control. Sometimes named GitOps, this practice is not specific to Kubernetes: it is widely applied to delivery systems and backed by the DevOps movement.

To this end, Kubernetes offers a YAML representation of the resource declaration, whose structure can be summarized as follows:

| Field | File type | Content |
| --- | --- | --- |
| apiVersion | All files | Version to use while parsing the file. |
| kind | All files | Type of resource that the file is describing. |
| metadata | All files | Resource identification and labeling. |
| data | Data centric files (Secret, ConfigMap) | Content entry point for data mapping. |
| spec | Most files (Pod, Deployment, Ingress, ...) | Content entry point for resource configuration. |

Watch out: some resources, such as StorageClass, do not use a single entry point as described above.
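
To illustrate this exception, here is a minimal StorageClass sketch for statically provisioned local volumes. The name local-storage is hypothetical; note that the configuration fields sit at the root of the file rather than under a spec entry.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  # Hypothetical name, chosen for illustration
  name: local-storage
# No <spec> entry here: the configuration fields are declared at the root
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```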

Further reading:

Guide on apiVersion

Yaml Specifications

Metadata and labels

The metadata entry is critical while creating any resource as it will enable Kubernetes and yourself to easily identify and select the resource.

In this entry, you will define a name and a namespace (defaults to default), thanks to which the control plane will automatically be able to tell if the file is a new addition to the cluster or the revision of a previously loaded file.

On top of those elements, you can define a labels section. It is composed of a set of key-value pairs that narrow down the context and content of your resource. Those labels can later be used in almost any CLI command through Selectors. As Kubernetes gives no built-in meaning to those entries, you can use any name you want, even though it defines some best-practice recommendations.

Finally, you can also create an annotations section, which is almost identical to labels but cannot be used to select resources. Annotations can be read by tools, controllers, or your application to trigger behaviors, or simply hold data that eases debugging.

# <metadata> narrows down selection and identifies the resource
metadata:
  # The <name> entry is required and used to identify the resource
  name: my-resource
  namespace: my-namespace-or-default
  # <labels> is optional but often needed for resource selection
  labels:
    app: application-name
    category: back
  # <annotations> is optional and not needed for the configuration of Kubernetes
  annotations:
    # Annotation values must be strings, hence the quotes
    version: "4.2"

Further reading:

Naming and Identification

Labels and Selectors

Annotations

Data centric configuration files

Those files define key-value mappings that can be used later in other resources. Usually, those resources (i.e. Secrets and ConfigMap) are loaded before anything else, as it is more likely than not that your infrastructure files are dependent on them.

apiVersion: v1
# <kind> defines the resource described in this file
kind: ConfigMap
metadata:
  name: my-config
data:
  # <data> configures data to load
  configuration_key: "configuration_value"
  properties_entry: |
    # Any multiline content is accepted
    multiline_config=true

Infrastructure centric configuration files

Those files define the infrastructure to deploy on the cluster, potentially using content from the data files.

apiVersion: v1
# <kind> defines the resource described in this file
kind: Pod
metadata:
  name: my-web-server
spec:
  # <spec> is a domain specific description of the resource.
  # The specification entries will be very different from one kind to another

Resources definition

In this section, we will take a closer look at the configuration of the most used resources on a Kubernetes application. This is also the occasion to showcase the interactions between resources.

At the end of the section, we will have a running Nginx server and will be able to contact the server from outside the cluster. The following diagram summarizes the intended state:

[Figure: Intended Deployment]

ConfigMap

ConfigMap is used to hold properties that can be used later in your resources.

apiVersion: v1
kind: ConfigMap
metadata:
  name: simple-web-config
  namespace: default
data:
  configuration_key: "Configuration value"

The configuration defined above can then be selected from another resource definition with the following snippet:

valueFrom:
  configMapKeyRef:
    name: simple-web-config
    key: configuration_key

Note: ConfigMaps are only available in the namespace in which they are defined.
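
The configMapKeyRef form picks a single key. As a sketch, assuming you would rather expose every key of the ConfigMap as an environment variable, a container definition can use the envFrom entry instead:

```yaml
# Fragment of a container definition, at the same level as <env>
envFrom:
  # Each key of the ConfigMap becomes an environment variable of the same name
  - configMapRef:
      name: simple-web-config
```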

Further reading:

ConfigMap Documentation

Secret

All sensitive data (e.g. API keys, passphrases, …) should be put in Secret files. By default, the data is simply held as base64-encoded values without encryption. However, Kubernetes offers ways of mitigating leakage risks, such as restricting access with Role-Based Access Control (RBAC) or encrypting secrets at rest.

The Secret file defines a type key at its root, which can be used to add validation on the keys declared in the data entry. By default, the type is set to Opaque which does not validate the entries at all.

apiVersion: v1
kind: Secret
metadata:
  name: simple-web-secrets
# Opaque <type> can hold generic secrets, so no validation will be done.
type: Opaque
data:
  # Secrets should be encoded in base64
  secret_configuration_key: "c2VjcmV0IHZhbHVl"

The secret defined above can then be selected from another resource definition with the following snippet:

valueFrom:
  secretKeyRef:
    name: simple-web-secrets
    key: secret_configuration_key

Note: Secrets are only available in the namespace in which they are defined.
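
For comparison with the Opaque secret, here is a sketch of a typed secret. The built-in kubernetes.io/basic-auth type validates that the username and/or password keys are present. The stringData entry accepts plain text values, which Kubernetes stores base64-encoded under data. The resource name and both values are hypothetical.

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Hypothetical name, chosen for illustration
  name: simple-web-basic-auth
# This built-in <type> expects the keys "username" and/or "password"
type: kubernetes.io/basic-auth
# <stringData> takes plain text, so no manual base64 encoding is needed
stringData:
  username: admin
  password: change-me
```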

Further reading:

Secrets Documentation

Available Secret Types

Pod

A Pod definition file is pretty straightforward but can become pretty big due to the quantity of configuration available. The name and image fields are the only mandatory ones, but you might commonly use:

  • ports to define the ports to open on both the container and pod.
  • env to define the environment variables to load on the container.
  • command and args to customize the container startup sequence (they override the image's entrypoint and default arguments, respectively).

Pods are usually not created as standalone resources on Kubernetes, as the best practice is to use pods as part of a higher-level definition (e.g. Deployment). In those cases, the Pod file's content is simply embedded in the other resource's file.

apiVersion: v1
kind: Pod
metadata:
  name: my-web-server
spec:
  # <containers> is a list of container definition to embed in the pod
  containers:
    - name: web
      image: nginx
      ports:
        - name: web
          containerPort: 80
          protocol: TCP
      env:
        - name: SOME_CONFIG
          # Create a line "value: <config_entry>" from the ConfigMap data
          valueFrom:
            configMapKeyRef:
              name: simple-web-config
              key: configuration_key
        - name: SOME_SECRET
          # Create a line "value: <config_entry>" from the Secret data
          valueFrom:
            secretKeyRef:
              name: simple-web-secrets
              key: secret_configuration_key
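
The startup customization mentioned earlier is not part of the example above. In a Pod definition, the relevant fields are named command and args. The sketch below spells out explicitly what the nginx image is assumed to run by default; the pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  # Hypothetical name, chosen for illustration
  name: my-web-server-custom-start
spec:
  containers:
    - name: web
      image: nginx
      # <command> replaces the ENTRYPOINT of the image
      command: ["nginx"]
      # <args> replaces the CMD of the image
      args: ["-g", "daemon off;"]
```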

Note: Pods are only available in the namespace in which they are defined.

Further reading:

Pod Documentation

Advanced Pod Configuration

Fields available in Pod <spec> entry

Fields available in Pod <containers> entry

Deployment

The Deployment is generally used as the atomic working unit since it will automatically:

  • Create a pod definition based on the template entry.
  • Create a ReplicaSet on pods selected by the selector entry, with the value of replicas as a count of pods that should be running.

The following file requests 3 instances of an Nginx server running at all times. The file may look a bit heavy, but most of it is the Pod definition copied from above.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-server-deployment
  namespace: default
  labels:
    app: webserver
spec:
  # <selector> should retrieve the Pod defined below, and possibly more
  selector:
    matchLabels:
      app: webserver
      instance: nginx-ws-deployment
  # <replicas> asks for 3 pods running in parallel at all time
  replicas: 3
  # The content of <template> is a Pod definition file, without <apiVersion> nor <kind>
  template:
    metadata:
      name: my-web-server
      namespace: default
      labels:
        app: webserver
        instance: nginx-ws-deployment
    spec:
      containers:
        - name: web
          image: nginx
          ports:
            - name: web
              containerPort: 80
              protocol: TCP
          env:
            - name: SOME_CONFIG
              # Create a line "value: <config_entry>" from the ConfigMap data
              valueFrom:
                configMapKeyRef:
                  name: simple-web-config
                  key: configuration_key
            - name: SOME_SECRET
              # Create a line "value: <config_entry>" from the Secret data
              valueFrom:
                secretKeyRef:
                  name: simple-web-secrets
                  key: secret_configuration_key

Note: Deployments are only available in the namespace in which they are defined.

Further reading:

Deployment Documentation

Service

A pod might be deleted and recreated at any time. When this occurs, the pod's IP address changes, which could result in a loss of connection if you are contacting it directly. To solve this issue, a Service provides a stable contact point to a set of Pods, while remaining agnostic of their state and configuration. Usually, Pods are chosen to be part of a Service through a selector entry, that is, based on their labels. A Pod is selected if and only if it carries all the labels listed in the selector.

There are three commonly used types of services that behave quite differently; you select one using the type entry.

The ClusterIP service is bound to an internal IP from the cluster, hence only internally reachable. This is the type of service created by default and is suitable for binding different applications inside the same cluster.

A NodePort service opens the same port (by default in the range 30000 to 32767) on each node of the cluster, which then forwards the traffic to the selected pods. This enables you to contact the service directly through a node IP. That also means that your service will be as accessible as the virtual or physical machines making up the cluster.

Note: Using NodePort can pose security risks, as it enables a direct connection from outside the cluster.
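
As a sketch, a NodePort variant of the web server service could look like the following. The explicit nodePort value 30080 is a hypothetical choice; when the entry is omitted, Kubernetes picks a free port in the range by itself.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: simple-web-service-np
spec:
  type: NodePort
  # Select all pods declaring a <label> entry "app: webserver"
  selector:
    app: webserver
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
      # Optional and hypothetical: must fall within the node port range
      nodePort: 30080
```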

A LoadBalancer service automatically creates a load balancer instance from the cloud service provider on which the cluster is running. This load balancer is created outside the cluster but automatically routes traffic to the selected pods through the cluster's nodes.

This is an easy way to expose your service but can end up being costly, as each service gets its own load balancer.
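
A minimal sketch of such a service only differs from the other types by its type entry. On a cluster without any cloud provider integration, the external IP is expected to stay pending.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: simple-web-service-lb
spec:
  # Ask the cloud provider for an external load balancer
  type: LoadBalancer
  selector:
    app: webserver
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 80
```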

If you are setting up your own Ingress as we will do here, you may want to use a ClusterIP service, as the other service types are made for specific use cases.

apiVersion: v1
kind: Service
metadata:
  name: simple-web-service-clusterip
spec:
  # ClusterIP is the default service <type>
  type: ClusterIP
  # Select all pods declaring a <label> entry "app: webserver"
  selector:
    app: webserver
  ports:
    - name: http
      protocol: TCP
      # <port> is the port to bind on the service side
      port: 80
      # <targetPort> is the port to bind on the Pod side
      targetPort: 80

Note: Services are defined in a namespace but can be contacted from other namespaces.
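
To illustrate this note, a pod running in another namespace can reach the service through its namespace-qualified DNS name, built as <service>.<namespace>.svc.<cluster-domain>. The fragment below is a sketch: the variable name WEB_SERVER_URL is hypothetical, and cluster.local is assumed to be the cluster domain, which is the usual default.

```yaml
# Fragment of a container definition living in another namespace
env:
  - name: WEB_SERVER_URL
    # <service>.<namespace>.svc.<cluster-domain>:<port>
    value: "http://simple-web-service-clusterip.default.svc.cluster.local:80"
```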

Further reading:

Service Documentation

In Depth Service Comparison

Create an External Load Balancer

Ingress

Ingress enables you to publish internal services without necessarily using a load balancer from cloud service providers. You usually need only one ingress per namespace, where you can bind as many routing rules and backends as you want. A backend will typically be an internally routed ClusterIP service.

Please note that Kubernetes does not handle ingress resources by itself and relies on third-party implementations. As a result, you will have to choose and install an Ingress Controller before using any ingress resource. On the other hand, it makes the ingress resource customizable depending on the needs of your cluster.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: simple-web-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    # Using <host> redirects all request matching the given DNS name to this rule
    - host: "*.minikube.internal"
      http:
        paths:
          - path: /welcome
            pathType: Prefix
            backend:
              service:
                name: simple-web-service-clusterip
                port:
                  number: 80
    # All other requests will be redirected through this rule
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: simple-web-service-clusterip
                port:
                  number: 80

Note: Ingresses are defined in a namespace and can only reference backend services from that same namespace, but the routes they declare are publicly accessible outside the cluster.
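
Since ingress resources are handled by whichever controller is installed, a cluster running several controllers needs a way to pick one. As a sketch, the ingressClassName entry of the spec names the IngressClass that should handle the resource. The value nginx is an assumption that depends on how the controller was installed.

```yaml
# Fragment of an Ingress definition
spec:
  # Hypothetical class name; it must match an IngressClass existing on the cluster
  ingressClassName: nginx
```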

Further reading:

Ingress Documentation

Available Ingress Controllers

Enable Ingress on Minikube

Nginx Ingress Annotations

CLI Usage

Create and manage resources

This section showcases the basic CLI commands to manipulate resources. As said before, while it is possible to manually manage resources, a better practice is to use files.

# <kind> is the type of resource to create (e.g. deployment, secret, namespace, quota, ...)
$ kubectl create <kind> <name>
$ kubectl edit   <kind> <name>
$ kubectl delete <kind> <name>

# All those commands can be used through a description file.
$ kubectl create -f <resource>.yaml
$ kubectl edit   -f <resource>.yaml
$ kubectl delete -f <resource>.yaml

To ease resource manipulation through files, you can reduce your interactions with the CLI to the following two commands:

# Create and update any resource
$ kubectl apply   -f <resource>.yaml
# Delete any resource
$ kubectl delete  -f <resource>.yaml

Further reading:

Managing Resources

Monitor and Debug

Fetch resources

You can see all resources running through the CLI using kubectl get <kind>. This command is pretty powerful and lets you filter the kind of resources to display or select the resources you want to see.

Note: if not specified, Kubernetes will work on the default namespace. You can specify -n <namespace> to work on a specific namespace or -A to show every namespace.

# Fetch everything
$ kubectl get all
NAME                                            READY   STATUS    RESTARTS   AGE
pod/my-web-server-deployment-58c4fd887f-5vm2b   1/1     Running   0          128m
pod/my-web-server-deployment-58c4fd887f-gq6lr   1/1     Running   0          128m
pod/my-web-server-deployment-58c4fd887f-gs6qb   1/1     Running   0          128m

NAME                                   TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
service/simple-web-service-clusterip   ClusterIP      10.96.96.241     <none>        80/TCP,443/TCP               60m
service/simple-web-service-lb          LoadBalancer   10.108.182.232   <pending>     80:31095/TCP,443:31940/TCP   60m
service/simple-web-service-np          NodePort       10.101.77.203    <none>        80:31899/TCP,443:31522/TCP   60m

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/my-web-server-deployment   3/3     3            3           136m

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/my-web-server-deployment-58c4fd887f   3         3         3       128m

# We can ask for more details
$ kubectl get deployment -o wide
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE    CONTAINERS   IMAGES  SELECTOR
my-web-server-deployment   3/3     3            3           121m   web          nginx   app=webserver

# Some resources are not visible using "all" but available
$ kubectl get configmap
NAME                DATA   AGE
kube-root-ca.crt    1      38d
simple-web-config   3      3h17m

Dig into a particular resource

This section will show you how to dig into resources. Most of the required day-to-day operations are doable through the three following commands.

The first command will give you the resource's complete configuration, using kubectl describe <kind>/<name>.

# Let's describe the ingress for the sake of example
$ kubectl describe ingress/simple-web-ingress
Name:             simple-web-ingress
Namespace:        default
Address:          192.168.64.2
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
Rules:
  Host                 Path  Backends
  ----                 ----  --------
  *.minikube.internal
                       /welcome   simple-web-service-clusterip:80 (172.17.0.4:80,172.17.0.5:80,172.17.0.6:80 + 1 more...)
  *
                       /   simple-web-service-clusterip:80 (172.17.0.4:80,172.17.0.5:80,172.17.0.6:80 + 1 more...)
Annotations:           nginx.ingress.kubernetes.io/rewrite-target: /
Events:
  Type    Reason  Age                 From                      Message
  ----    ------  ----                ----                      -------
  Normal  UPDATE  7m6s (x6 over 23h)  nginx-ingress-controller  Ingress default/simple-web-ingress

Another important command is kubectl logs <kind>/<name>. As you might expect, it shows the resource's logs when applicable. As logs are produced by Pods, running this command on a resource above a Pod makes kubectl pick one of the Pods underneath it and display its logs.

$ kubectl logs deployments/my-web-server-deployment
Found 3 pods, using pod/my-web-server-deployment-755b499f77-4n5vn
# [logs]

Finally, it is sometimes useful to connect to a pod. You can do so with the command kubectl exec -it <pod_name> -- /bin/bash, which opens an interactive shell on the pod, enabling you to interact with its content.

# As for logs, when called on any resource enclosing Pods,
# Kubernetes will pick one of them to execute the action
$ kubectl exec -it deployment/my-web-server-deployment -- /bin/bash
root@my-web-server-deployment-56c4554cf9-qwtm6:/# ls
# [...]

Conclusion

In this article, we covered the fundamentals of deploying and publishing stateless services with Kubernetes, but Kubernetes can do far more complex things. If you want to learn more about it, I recommend looking at these resources:

Incidentally, there are multiple subjects I could not deeply talk about in this article and that may be of interest.

On the developer side:

On the cluster administrator side:

Furthermore, if you are interested in the ecosystem around Kubernetes, you may want to take a look at the following technologies:

  • OpenShift wraps Kubernetes with production-friendly features.
  • Helm is a chart manager for Kubernetes that helps improve the reusability of configuration files.
  • Argo CD keeps your Kubernetes Cluster up to date with your configurations from Git.

Appendix

Resources' repository

The resources definitions used in this article are available in the following GitHub repository.

CLI equivalents - Docker and Kubernetes

Managing containers with Docker and pods with Kubernetes is very similar, as you can see in the following table describing equivalent operations between both technologies.

| Operation | Docker | Kubernetes |
| --- | --- | --- |
| Running containers | docker ps | kubectl get pods |
| Configuration details | docker inspect <name> | kubectl describe pod <name> |
| Show logs | docker logs <name> | kubectl logs <name> |
| Enter container | docker exec -it <name> /bin/bash | kubectl exec -it <name> -- /bin/bash |

Thanks to Sarra Habchi, Dimitri Delabroye, and Alexis Geffroy for the reviews