Designing a fault-tolerant etcd cluster

Introduction
In this article, we are going to discuss a strongly consistent, distributed key-value datastore used for shared configuration, service discovery, and scheduler coordination in Kubernetes: etcd (pronounced et-see-dee). This article is part of a series focused on understanding, mastering, and designing efficient etcd clusters. Here we will discuss the justification for using etcd, the leader election process, and the consensus algorithm etcd uses. In the following parts, we will follow up with a technical implementation of a highly available etcd cluster and with backing up an etcd database to guard against failures. This article requires a basic understanding of Kubernetes, algorithms, and system design.
etcd
etcd (https://etcd.io/) is an open-source, leader-based, distributed key-value datastore designed by a vibrant team of engineers at CoreOS in 2013 and donated to the Cloud Native Computing Foundation (CNCF) in 2018. Since then, etcd has been adopted as a datastore in major projects like Kubernetes, CoreDNS, OpenStack, and other relevant tools. etcd is built to be simple, secure, reliable, and fast (benchmarked at 10,000 writes/sec). It is written in Go and uses the Raft consensus algorithm to manage a highly available replicated log. etcd is strongly consistent because it offers strict serializability (https://jepsen.io/consistency/models/strict-serializable), meaning a consistent global ordering of events; in practical terms, no client reading from an etcd cluster will ever see stale data (unlike NoSQL databases, which typically offer only eventual consistency). And unlike traditional SQL databases, etcd is distributed in nature, allowing high availability without sacrificing consistency.
etcd is that guy.
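To make the key-value model concrete, here is a minimal sketch using etcd's official Go client, clientv3. It is not taken from the original article: the endpoint localhost:2379 and the key /config/feature-flag are illustrative assumptions. A Put is acknowledged only after a quorum of members has committed it to the replicated Raft log, and a Get is linearizable by default, which is why a reader never observes stale data.

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to a (hypothetical) local etcd endpoint.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Write a key; the write succeeds only once a quorum has
	// committed it to the Raft log.
	if _, err := cli.Put(ctx, "/config/feature-flag", "enabled"); err != nil {
		log.Fatal(err)
	}

	// Read it back; reads are linearizable by default, so the value
	// reflects the latest committed write.
	resp, err := cli.Get(ctx, "/config/feature-flag")
	if err != nil {
		log.Fatal(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}
}

The same put/get semantics are what Kubernetes builds on, which leads to the next question.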
Why etcd?
Why is etcd used as the key-value store in Kubernetes? Why not a SQL or NoSQL database? The key to answering this question is understanding the core storage requirements of the Kubernetes API server.