Kubernetes


Kubernetes is an open-source container-orchestration system for automating computer application deployment, scaling, and management.
It was originally designed by Google and is now maintained by the Cloud Native Computing Foundation. It aims to provide a "platform for automating deployment, scaling, and operations of application containers across clusters of hosts". It works with a range of container tools, including Docker.
Many cloud services offer a Kubernetes-based platform as a service, or infrastructure as a service on which Kubernetes can be deployed as a platform-providing service. Many vendors also provide their own branded Kubernetes distributions.

History

Kubernetes was founded by Joe Beda, Brendan Burns, and Craig McLuckie, who were quickly joined by other Google engineers including Brian Grant and Tim Hockin, and was first announced by Google in mid-2014. Its development and design are heavily influenced by Google's Borg system, and many of the top contributors to the project previously worked on Borg. The original codename for Kubernetes within Google was Project 7, a reference to the Star Trek ex-Borg character Seven of Nine. The seven spokes on the wheel of the Kubernetes logo are a reference to that codename. The original Borg project was written entirely in C++, but the rewritten Kubernetes system is implemented in Go.
Kubernetes v1.0 was released on July 21, 2015. Along with the Kubernetes v1.0 release, Google partnered with the Linux Foundation to form the Cloud Native Computing Foundation and offered Kubernetes as a seed technology. On March 6, 2018, the Kubernetes project reached ninth place in commits on GitHub, and second place in authors and issues, after the Linux kernel.

Release Versions

Support

Kubernetes follows an N-2 support policy.
This generally results in a particular minor version being supported for about 9 months, as illustrated by the release timeline below.

Release timeline, 2018 to 2022 (release date to end of support, dd/mm/yyyy):
1.19.x: 04/08/2020 to 30/04/2021 (pre-release)
1.18.x: 25/03/2020 to 30/01/2021 (in support)
1.17.x: 09/12/2019 to 30/10/2020 (in support)
1.16.x: 18/09/2019 to 30/07/2020 (in support)
1.15.x: 19/06/2019 to 23/03/2020 (out of support)
1.14.x: 25/03/2019 to 09/12/2019 (out of support)
1.13.x: 03/12/2018 to 18/09/2019 (out of support)
1.12.x: 27/09/2018 to 19/06/2019 (out of support)
1.11.x: 27/06/2018 to 25/03/2019 (out of support)
1.10.x: 26/03/2018 to 03/12/2018 (out of support)
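
The N-2 policy can be illustrated with a short calculation: if N is the newest minor release, the supported set is N, N-1, and N-2. The following is a minimal sketch; the function name and its parameters are hypothetical and are not part of any Kubernetes tooling.

```python
from typing import List


def supported_minor_versions(latest_minor: int, major: int = 1, window: int = 3) -> List[str]:
    """Return the release lines covered by an N-2 policy, newest first.

    `window` is the number of minor versions kept in support (3 for N-2).
    """
    # Never go below minor version 0.
    oldest = max(latest_minor - (window - 1), 0)
    return [f"{major}.{minor}.x" for minor in range(latest_minor, oldest - 1, -1)]


# With 1.18 as the newest supported release, 1.17 and 1.16 are also covered.
print(supported_minor_versions(18))  # ['1.18.x', '1.17.x', '1.16.x']
```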

Kubernetes Objects

Kubernetes defines a set of building blocks, which collectively provide mechanisms that deploy, maintain, and scale applications based on CPU, memory or custom metrics. Kubernetes is loosely coupled and extensible to meet different workloads. This extensibility is provided in large part by the Kubernetes API, which is used by internal components as well as extensions and containers that run on Kubernetes. The platform exerts its control over compute and storage resources by defining resources as Objects, which can then be managed as such. The key objects are:

Pods

A pod is a higher-level abstraction that groups containerized components. A pod consists of one or more containers that are guaranteed to be co-located on the same host machine and can share resources. The pod is the basic scheduling unit in Kubernetes.
Each pod in Kubernetes is assigned a unique IP address within the cluster, which allows applications to use ports without the risk of conflict. Within a pod, all containers can reference each other on localhost, but a container in one pod cannot directly address a container in another pod; for that, it has to use the other pod's IP address. Application developers should nevertheless avoid using pod IP addresses to reference or invoke a capability in another pod, because pod IP addresses are ephemeral: the target pod may be assigned a different IP address on restart. Instead, they should reference a Service, which holds a reference to the target pod at its current pod IP address.
A pod can define a volume, such as a local disk directory or a network disk, and expose it to the containers in the pod. Such volumes are also the basis for the Kubernetes features of ConfigMaps and Secrets. Pods can be managed manually through the Kubernetes API, or their management can be delegated to a controller.

ReplicaSets

A ReplicaSet's purpose is to maintain a stable set of replica pods running at any given time. As such, it is often used to guarantee the availability of a specified number of identical pods.
A ReplicaSet can also be described as a grouping mechanism that lets Kubernetes maintain the number of instances that have been declared for a given pod. The definition of a ReplicaSet uses a selector, whose evaluation identifies all pods that are associated with it.
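
The idea of comparing a declared replica count with the pods matched by a selector can be sketched as follows. This is an illustration only: the dictionary shape used for pods and the function names are assumptions, not the Kubernetes API or its controller implementation.

```python
from typing import Dict, List

# Hypothetical pod representation: {"name": ..., "labels": {...}}
Pod = Dict[str, object]


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """True if every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def replica_delta(selector: Dict[str, str], desired: int, pods: List[Pod]) -> int:
    """Return how many pods to create (positive) or remove (negative)."""
    current = sum(1 for pod in pods if matches(selector, pod["labels"]))
    return desired - current


pods = [
    {"name": "web-a", "labels": {"app": "web"}},
    {"name": "db-a", "labels": {"app": "db"}},
]
# Three replicas of app=web are declared, one exists, so two more are needed.
print(replica_delta({"app": "web"}, 3, pods))  # 2
```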

Services

A Kubernetes service is a set of pods that work together, such as one tier of a multi-tier application. The set of pods that constitutes a service is defined by a label selector. Kubernetes provides two modes of service discovery: environment variables and Kubernetes DNS. Service discovery assigns a stable IP address and DNS name to the service, and load-balances network connections to that IP address in a round-robin manner among the pods matching the selector. By default a service is exposed inside a cluster, but it can also be exposed outside the cluster.
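
The combination of a label selector and round-robin distribution described above can be sketched in a few lines. This is a conceptual illustration under stated assumptions: the pod dictionaries, IP addresses, and function names are hypothetical, and in a real cluster this work is done by Kubernetes networking components rather than by application code.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def service_endpoints(selector: Dict[str, str], pods: List[dict]) -> List[str]:
    """Collect the IP addresses of all pods whose labels match the selector."""
    return [
        pod["ip"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]


def round_robin(endpoints: List[str]) -> Iterator[str]:
    """Yield endpoints in turn, starting over after the last one."""
    return cycle(endpoints)


pods = [
    {"ip": "10.0.0.1", "labels": {"tier": "front-end"}},
    {"ip": "10.0.0.2", "labels": {"tier": "back-end"}},
    {"ip": "10.0.0.3", "labels": {"tier": "front-end"}},
]
endpoints = service_endpoints({"tier": "front-end"}, pods)
backend = round_robin(endpoints)
# Four successive connections alternate between the two matching pods.
print([next(backend) for _ in range(4)])
```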

Volumes

Filesystems in a Kubernetes container provide ephemeral storage by default. This means that a restart of the pod will wipe out any data in such containers, so this form of storage is quite limiting for anything but trivial applications. A Kubernetes volume provides persistent storage that exists for the lifetime of the pod itself. This storage can also be used as shared disk space for containers within the pod. Volumes are mounted at specific mount points within the container, which are defined by the pod configuration, and cannot mount onto other volumes or link to other volumes. The same volume can be mounted at different points in the filesystem tree by different containers.

Namespaces

Kubernetes partitions the resources it manages into non-overlapping sets called namespaces. They are intended for use in environments with many users spread across multiple teams or projects, or for separating environments such as development, test, and production.

ConfigMaps and Secrets

A common application challenge is deciding where to store and manage configuration information, some of which may contain sensitive data. Configuration data can be anything from fine-grained individual properties to coarse-grained information such as entire configuration files or JSON/XML documents. Kubernetes provides two closely related mechanisms to deal with this need, "configmaps" and "secrets", both of which allow configuration changes to be made without requiring an application build. The data from configmaps and secrets is made available to every instance of the application to which these objects have been bound via the deployment. A secret or a configmap is only sent to a node if a pod on that node requires it. Kubernetes keeps it in memory on that node. Once the pod that depends on the secret or configmap is deleted, the in-memory copy of all bound secrets and configmaps is deleted as well. The data is accessible to the pod in one of two ways: (a) as environment variables, or (b) as files on the container filesystem that are visible only from within the pod.
The data itself is stored on the master, which is a highly secured machine to which nobody should have login access. The biggest difference between a secret and a configmap is that the content of the data in a secret is base64 encoded.
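
To make the base64 point concrete, the sketch below encodes and decodes a value with Python's standard library. The helper names and the sample value are illustrative assumptions. Note that base64 is a reversible encoding, not encryption: anyone who can read the encoded value can decode it.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Base64-encode a string, as is done for the data held in a secret."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding; no key is needed."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = encode_secret_value("admin")
print(encoded)                       # YWRtaW4=
print(decode_secret_value(encoded))  # admin
```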

StatefulSets

Scaling stateless applications is straightforward: one simply adds more running pods, which is something that Kubernetes does very well. Stateful workloads are much harder, because the state needs to be preserved if a pod is restarted, and if the application is scaled up or down, the state may need to be redistributed. Databases are an example of stateful workloads. When run in high-availability mode, many databases come with the notion of a primary instance and a secondary instance. In this case, the ordering of instances is important. Other applications, like Kafka, distribute the data amongst their brokers, so one broker is not the same as another. In this case, the uniqueness of instances is important. StatefulSets are controllers provided by Kubernetes that enforce the properties of uniqueness and ordering amongst instances of a pod and can be used to run stateful applications.
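
Uniqueness and ordering can be pictured by giving each instance a stable name built from an ordinal index. The sketch below assumes the common naming scheme of set name plus ordinal (for example "db-0", "db-1") and assumes that instances are removed in reverse order when scaling down; the function names are hypothetical.

```python
from typing import List


def stateful_pod_names(set_name: str, replicas: int) -> List[str]:
    """Return one unique, ordered name per instance, in start-up order."""
    return [f"{set_name}-{ordinal}" for ordinal in range(replicas)]


def scale_down_order(set_name: str, current: int, target: int) -> List[str]:
    """Return the instances to remove, highest ordinal first."""
    names = stateful_pod_names(set_name, current)
    return list(reversed(names[target:]))


print(stateful_pod_names("db", 3))   # ['db-0', 'db-1', 'db-2']
print(scale_down_order("db", 3, 1))  # ['db-2', 'db-1']
```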

DaemonSets

Normally, the location where pods are run is determined by the algorithm implemented in the Kubernetes Scheduler. For some use cases, though, there could be a need to run a pod on every single node in the cluster. This is useful for use cases like log collection and storage services. The ability to do this kind of pod scheduling is implemented by the feature called DaemonSets.
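
The "one pod on every node" requirement can be expressed as a simple set comparison. The following is a sketch under assumptions: the node names, the mapping from pod name to node name, and the function name are all hypothetical and do not reflect how the DaemonSet controller is actually implemented.

```python
from typing import Dict, List


def nodes_missing_daemon(nodes: List[str], daemon_pods: Dict[str, str]) -> List[str]:
    """Return the nodes that do not yet run a copy of the daemon pod.

    `daemon_pods` maps a pod name to the name of the node it runs on.
    """
    covered = set(daemon_pods.values())
    return [node for node in nodes if node not in covered]


nodes = ["node-1", "node-2", "node-3"]
daemon_pods = {"log-collector-x": "node-1", "log-collector-y": "node-3"}
print(nodes_missing_daemon(nodes, daemon_pods))  # ['node-2']
```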

Secrets

Secrets hold sensitive data, such as SSH keys, passwords, and OAuth tokens, for use by a pod.

Managing Kubernetes objects

Kubernetes provides some mechanisms that allow one to manage, select, or manipulate its objects.

Labels and selectors

Kubernetes enables clients to attach key-value pairs called "labels" to any API object in the system, such as pods and nodes. Correspondingly, "label selectors" are queries against labels that resolve to matching objects. When a service is defined, one can define the label selectors that will be used by the service router or load balancer to select the pod instances that the traffic will be routed to. Thus, simply changing the labels of the pods or changing the label selectors on the service can be used to control which pods get traffic and which do not, which can be used to support various deployment patterns like blue-green deployments or A-B testing. This capability to dynamically control how services utilize implementing resources provides a loose coupling within the infrastructure.
For example, if an application's pods have labels for a system tier and a release_track, then an operation on all back-end, canary nodes can use a label selector such as:
tier=back-end AND release_track=canary
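
The selector above can be read as a conjunction of equality tests on labels. The sketch below evaluates such a selector over a list of objects; the dictionary shape, the sample names, and the function name are illustrative assumptions rather than the Kubernetes client API.

```python
from typing import Dict, List


def select_by_labels(objects: List[dict], selector: Dict[str, str]) -> List[str]:
    """Return the names of objects whose labels satisfy every selector term."""
    return [
        obj["name"]
        for obj in objects
        if all(obj["labels"].get(key) == value for key, value in selector.items())
    ]


pods = [
    {"name": "api-1", "labels": {"tier": "back-end", "release_track": "canary"}},
    {"name": "api-2", "labels": {"tier": "back-end", "release_track": "stable"}},
    {"name": "ui-1", "labels": {"tier": "front-end", "release_track": "canary"}},
]
# Equivalent to: tier=back-end AND release_track=canary
print(select_by_labels(pods, {"tier": "back-end", "release_track": "canary"}))  # ['api-1']
```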

Field selectors

Just like labels, field selectors also let one select Kubernetes resources. Unlike labels, the selection is based on the attribute values inherent to the resource being selected, rather than user-defined categorization. metadata.name and metadata.namespace are field selectors that will be present on all Kubernetes objects. Other selectors that can be used depend on the object/resource type.
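
Where a label selector looks at user-defined labels, a field selector looks at attributes of the object itself, addressed by a dotted path such as metadata.name. The sketch below resolves such paths in nested dictionaries; the object layout and function names are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def get_field(obj: dict, path: str) -> Optional[object]:
    """Follow a dotted path such as 'metadata.name'; return None if absent."""
    current = obj
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def select_by_fields(objects: List[dict], field_selector: Dict[str, str]) -> List[dict]:
    """Keep the objects for which every selected field has the given value."""
    return [
        obj
        for obj in objects
        if all(get_field(obj, path) == value for path, value in field_selector.items())
    ]


objects = [
    {"metadata": {"name": "web", "namespace": "prod"}},
    {"metadata": {"name": "web", "namespace": "test"}},
]
matched = select_by_fields(objects, {"metadata.name": "web", "metadata.namespace": "test"})
print(matched)
```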

Replication Controllers and Deployments

A ReplicaSet declares the number of instances of a pod that are needed, and a Replication Controller manages the system so that the number of healthy pods that are running matches the number of pods declared in the ReplicaSet.
Deployments are a higher-level management mechanism for ReplicaSets. While the Replication Controller manages the scale of the ReplicaSet, Deployments manage what happens to the ReplicaSet: whether an update has to be rolled out, rolled back, and so on. When deployments are scaled up or down, the declaration of the ReplicaSet changes, and this change in declared state is managed by the Replication Controller.

Cluster API

The design principles underlying Kubernetes allow one to programmatically create, configure, and manage Kubernetes clusters. This function is exposed via an API called the Cluster API. A key concept embodied in the API is the notion that the Kubernetes cluster is itself a resource or object that can be managed just like any other Kubernetes resource. Similarly, the machines that make up the cluster are also treated as Kubernetes resources. The API has two pieces: the core API and a provider implementation. The provider implementation consists of cloud-provider-specific functions that let Kubernetes provide the Cluster API in a fashion that is well integrated with the cloud provider's services and resources.

Architecture

Kubernetes follows the primary/replica architecture. The components of Kubernetes can be divided into those that manage an individual node and those that are part of the control plane.

Kubernetes control plane

The Kubernetes master is the main controlling unit of the cluster, managing its workload and directing communication across the system. The Kubernetes control plane consists of various components, each its own process, that can run either on a single master node or on multiple masters supporting high-availability clusters. The main components of the Kubernetes control plane are etcd (the cluster's key-value store), the API server, the scheduler, and the controller manager.

Kubernetes node

A Node, also known as a Worker or a Minion, is a machine where containers are deployed. Every node in the cluster must run a container runtime such as Docker, as well as the kubelet and kube-proxy components, which communicate with the primary and handle the network configuration of these containers.

Add-ons

Add-ons operate just like any other application running within the cluster: they are implemented via pods and services, and are only different in that they implement features of the Kubernetes cluster. The pods may be managed by Deployments, ReplicationControllers, and so on. There are many add-ons, and the list is growing.

Microservices

Kubernetes is commonly used as a way to host a microservice-based implementation, because it and its associated ecosystem of tools provide all the capabilities needed to address key concerns of any microservice architecture.

Kubernetes Persistent Storage

Containers emerged as a way to make software portable. The container contains all the packages you need to run a service. The provided filesystem makes containers extremely portable and easy to use in development. A container can be moved from development to test or production with no or relatively few configuration changes.
Historically, Kubernetes was suitable only for stateless services. However, many applications have a database, which requires persistence, and this led to the creation of persistent storage for Kubernetes. Implementing persistent storage for containers is one of the top challenges for Kubernetes administrators, DevOps engineers, and cloud engineers. Containers may be ephemeral, but more and more of their data is not, so one needs to ensure the data's survival in case of container termination or hardware failure.
When deploying containers with Kubernetes or containerized applications, companies often realize that they need persistent storage. They need to provide fast and reliable storage for databases, root images and other data used by the containers.
In addition to its landscape survey, the Cloud Native Computing Foundation has published other information about Kubernetes persistent storage, including a blog post helping to define the container attached storage pattern. This pattern can be thought of as one that uses Kubernetes itself as a component of the storage system or service.
More information about the relative popularity of these and other approaches can be found in the CNCF's landscape survey, which showed that OpenEBS from MayaData and Rook, a storage orchestration project, were the two projects most likely to be in evaluation as of fall 2019.