Container orchestration condenses many repetitive and complex tasks into a simple declarative config file. The popularity of orchestration is driven by two things. First, organizations are moving toward microservices applications instead of monolithic applications. The resulting environments need to manage thousands of these services. Second, infrastructure is cheaper and more disposable. As a result, you can just replace a failed machine, instead of trying to fix the problem.
Orchestration solves the microservices issue by leveraging the benefits of disposable infrastructure. Orchestration replaces failed machines with new identical machines that run the same containers. As a result, the services continue to run without any manual intervention.
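The replacement behavior described above can be pictured as a control loop that compares a declared state with what is actually running. The following is a minimal sketch of one pass of such a loop, not how any real orchestrator is implemented. The function name `reconcile`, the service name `web`, and the node names are all hypothetical.

```python
def reconcile(desired_replicas, placements, healthy_nodes):
    """One pass of a simplified control loop (illustrative only).

    desired_replicas: dict of service name -> declared replica count
    placements: list of (service, node) pairs currently running
    healthy_nodes: list of node names that are still healthy
    """
    if not healthy_nodes:
        raise RuntimeError("no healthy nodes available")

    # Containers on failed machines are gone; keep only the rest.
    surviving = [(svc, node) for svc, node in placements if node in healthy_nodes]

    def load(node):
        # Number of containers currently placed on this node.
        return sum(1 for _, placed_on in surviving if placed_on == node)

    # Start replacements until each service matches its declared count,
    # always choosing the least-loaded healthy node.
    for service, wanted in desired_replicas.items():
        running = sum(1 for svc, _ in surviving if svc == service)
        for _ in range(wanted - running):
            surviving.append((service, min(healthy_nodes, key=load)))
    return surviving


# Node n3 has failed and a fresh machine n4 has joined the cluster.
before = [("web", "n1"), ("web", "n2"), ("web", "n3")]
after = reconcile({"web": 3}, before, ["n1", "n2", "n4"])
print(sorted(after))
```

The loop never tries to repair `n3`. It only restores the declared replica count on whatever healthy machines exist, which is why this model suits stateless services better than databases.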
Kubernetes is the most popular container orchestration system. Unfortunately, most databases cannot handle the dynamic environment of Kubernetes, especially SQL databases that require strong consistency. Kubernetes expects to spin up new, interchangeable instances at any time, while database instances hold state and require coordination across zones.
Database orchestration is falling behind since many companies are still making database changes manually. As a result, issues like team conflicts, configuration drift, and undocumented changes are responsible for numerous failed database releases.
Database orchestration with Kubernetes is difficult because:
- Replicas of a database are not interchangeable: each replica has a unique state. As a result, you cannot bring them up and down at will.
- Database replica deployment requires coordination: you have to ensure that changes such as version upgrades and schema changes are always visible across replicas.
- Databases are not a collection of files: you cannot make changes by replacing files, since the changes are incremental. Also, you cannot easily copy databases between environments.
- Practical rollbacks are almost impossible to achieve due to the nature of persistent storage: to restore a previous state later, you have to preserve the data in your schema when rolling forward.
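To illustrate why database changes are incremental rather than file replacements, here is a small sketch using Python's built-in `sqlite3` module. The `users` table, the `MIGRATIONS` list, and the `migrate` helper are hypothetical. SQLite's `user_version` pragma is used only as a convenient place to record which steps have been applied.

```python
import sqlite3

# Hypothetical schema history: each statement is applied once, in order.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]


def current_version(conn):
    # SQLite stores an integer in the database header for application use.
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn, target=None):
    """Apply the migrations this database has not seen yet, up to `target`."""
    target = len(MIGRATIONS) if target is None else target
    for version in range(current_version(conn), target):
        conn.execute(MIGRATIONS[version])
        conn.execute(f"PRAGMA user_version = {version + 1}")
    conn.commit()
    return current_version(conn)


conn = sqlite3.connect(":memory:")
migrate(conn, target=1)  # first release of the schema
conn.execute("INSERT INTO users (name) VALUES ('ada')")
migrate(conn)            # roll forward; existing rows must survive
print(conn.execute("SELECT name, email FROM users").fetchall())
```

The second step adds a column instead of recreating the table, so the existing row is kept. Undoing that step later would mean deciding what happens to any data written to `email` in the meantime, which is the rollback problem described in the list above.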
One approach to overcoming the challenges of database orchestration is to run the database outside Kubernetes. There are many options for orchestrating databases outside Kubernetes. You can group these options into two main categories: external tools and cloud services.
You can use external tools instead of the internal Kubernetes tools for orchestration and automation. This includes tools like Monit for process monitoring, Chef or Puppet for configuration management, and HAProxy for load balancing. The downside is that using external tools can add complexity and create duplication of work.
Another way to manage databases outside Kubernetes is with a Database-as-a-Service (DBaaS). DBaaS is a cloud computing service that enables users to access a database without setting up physical hardware, installing software, or handling configuration.
The downside is that DBaaS adds an extra layer of complexity, since you are still running a service outside of Kubernetes. If you use a managed Kubernetes service, pairing it with a DBaaS can also lead to vendor lock-in. In addition, DBaaS services are often built on old technology that does not scale easily.
Kubernetes provides two options for running databases inside the cluster: StatefulSets and DaemonSets.
A StatefulSet is a group of pods with unique, stable hostnames and persistent identities. StatefulSets are designed to run replicated, stateful Kubernetes services. Kubernetes maintains these identities regardless of where the pods are scheduled. The state information and other resilient data for any given StatefulSet pod are stored in the persistent disk storage associated with the StatefulSet.
The local persistent volumes feature reached general availability in Kubernetes 1.14. A local persistent volume is a local disk directly attached to a single Kubernetes node. This means you do not need a remote storage service that attaches and detaches the same disk as a pod moves between machines.
StatefulSets use a unique persistent ID for each pod to enable Kubernetes to run a replicated database. IDs persist even when the pod is rescheduled to another machine. The persistent ID allows you to attach a specific volume to the pod. As a result, the state of the pod is retained even when Kubernetes moves it across your datacenter.
The database and Kubernetes itself run on the same machines. They both consume resources and can affect overall performance. Additionally, StatefulSets enable you to reschedule database pods to other nodes. As a result, other Kubernetes services sometimes compete with the stateful service for available resources.
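As a rough illustration of these persistent IDs, the sketch below builds a StatefulSet manifest as a Python dictionary and lists the pod and volume claim names Kubernetes derives from it. The manifest fields follow the `apps/v1` StatefulSet schema. The name `db`, the `postgres:16` image, the mount path, and the storage size are placeholder assumptions, and `pod_identities` is a hypothetical helper that only mirrors the documented naming pattern.

```python
import json


def statefulset_manifest(name, replicas, storage="10Gi"):
    """Build a minimal StatefulSet manifest (placeholder image and sizes)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # headless Service that gives pods stable DNS names
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "db",
                        "image": "postgres:16",  # assumption: any database image
                        "volumeMounts": [
                            {"name": "data", "mountPath": "/var/lib/postgresql/data"}
                        ],
                    }]
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def pod_identities(manifest):
    """List the stable pod names and the claim names bound to each pod."""
    name = manifest["metadata"]["name"]
    templates = [t["metadata"]["name"] for t in manifest["spec"]["volumeClaimTemplates"]]
    return [
        {"pod": f"{name}-{i}", "claims": [f"{t}-{name}-{i}" for t in templates]}
        for i in range(manifest["spec"]["replicas"])
    ]


manifest = statefulset_manifest("db", replicas=3)
print(json.dumps(pod_identities(manifest), indent=2))
```

Because pod `db-1` always maps to claim `data-db-1`, a rescheduled pod reattaches to the same data no matter which node it lands on. Since `kubectl` accepts JSON as well as YAML, the output of `json.dumps(manifest)` could be applied directly, although a real deployment would need more configuration than this sketch includes.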
A DaemonSet is a Kubernetes workload that ensures a copy of a pod is running on every node, or on every node in a selected group. For example, if you create a DaemonSet on a cluster with six nodes, the DaemonSet schedules six pods in total. By restricting the DaemonSet to labeled nodes, you can run your database on a particular set of nodes. Kubernetes makes sure that the database always remains available. This works well for stateful services because you do not have to run anything else on your database nodes.
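The six-node example can be sketched in code. The snippet below builds a minimal DaemonSet manifest as a Python dictionary and works out which nodes would receive a pod. The `nodeSelector` field is part of the pod spec. The `role` label, the node names, the image, and the `nodes_running_pod` helper are assumptions made for illustration, and the helper ignores taints, tolerations, and affinity rules.

```python
def daemonset_manifest(name, node_selector):
    """Build a minimal DaemonSet manifest (placeholder image)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Only nodes carrying all of these labels get a pod.
                    "nodeSelector": node_selector,
                    "containers": [{"name": "db", "image": "postgres:16"}],
                },
            },
        },
    }


def nodes_running_pod(manifest, nodes):
    """Return the nodes whose labels satisfy the manifest's nodeSelector."""
    selector = manifest["spec"]["template"]["spec"].get("nodeSelector", {})
    return [
        node for node, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    ]


# Hypothetical six-node cluster: three database nodes, three application nodes.
cluster = {
    f"node-{i}": {"role": "database" if i <= 3 else "app"}
    for i in range(1, 7)
}

everywhere = daemonset_manifest("db", {})
dedicated = daemonset_manifest("db", {"role": "database"})
print(len(nodes_running_pod(everywhere, cluster)))
print(nodes_running_pod(dedicated, cluster))
```

With an empty selector every node gets a pod, which matches the six-pod case. Adding the `role: database` selector confines the database to its dedicated nodes.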
Moreover, DaemonSets use local disks in a more reliable way since you don’t have to reschedule database pods and worry about losing a disk. Keep in mind, however, that local disks are more prone to failure since they rarely have any redundancy or replication.
DaemonSet databases occupy entire sets of nodes. As a result, the number of connections between other applications and your database is limited. This improves database security and reduces resource dependencies.
You can reschedule StatefulSet pods onto the same machines as all other pods. You just need to ensure that your database pods have sufficient resources by setting appropriate requests and limits. In addition, StatefulSets can carry a greater performance cost than DaemonSets when they rely on remote storage services.
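One way to reason about "appropriate limits" is through the quality-of-service class Kubernetes assigns to a pod based on its resource settings. The sketch below is a simplified reading of those rules, not the scheduler's actual code. `database_container` and `qos_class` are hypothetical helpers, the CPU and memory figures are placeholders, and quantities are compared as plain strings, so `"1Gi"` and `"1024Mi"` would be treated as different.

```python
def database_container(cpu, memory, image="postgres:16"):
    """Container spec whose requests equal its limits (placeholder image)."""
    amounts = {"cpu": cpu, "memory": memory}
    return {
        "name": "db",
        "image": image,
        "resources": {"requests": dict(amounts), "limits": dict(amounts)},
    }


def qos_class(containers):
    """Simplified QoS classification for a pod's containers."""
    resources = [c.get("resources", {}) for c in containers]

    # No requests or limits anywhere: first in line for eviction.
    if all(not r.get("requests") and not r.get("limits") for r in resources):
        return "BestEffort"

    # Guaranteed needs CPU and memory limits on every container, with
    # requests equal to limits (an omitted request defaults to the limit).
    for r in resources:
        limits = r.get("limits", {})
        requests = r.get("requests", {})
        for key in ("cpu", "memory"):
            if key not in limits or requests.get(key, limits[key]) != limits[key]:
                return "Burstable"
    return "Guaranteed"


print(qos_class([database_container("4", "16Gi")]))
print(qos_class([{"name": "batch-job", "image": "busybox"}]))
```

A database pod whose requests match its limits lands in the `Guaranteed` class, so it is among the last pods to be evicted when a node runs short of resources.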
On the other hand, DaemonSets represent a more natural abstraction for database orchestration on dedicated nodes. The biggest downside of DaemonSets is the limited ability of Kubernetes to help your cluster recover from failures. For instance, Kubernetes cannot replace a node that is about to fail with a new node because it’s currently running a database pod on all the associated nodes. This resembles running a database directly on physical machines that are manually replaced.
Managing databases in Kubernetes can be a challenge. Unfortunately, it is impossible to avoid, since many applications rely on databases. Hopefully, this article helped you understand some of the challenges around using databases in Kubernetes. The options covered here are a good place to start when considering how to manage your databases, and they should help guide you in the right direction.
Gilad David Maayan is a technology writer who has worked with over 150 technology companies including SAP, Samsung NEXT, NetApp and Imperva, producing technical and thought leadership content that elucidates technical solutions for developers and IT leadership.