[AWS] EKS Auto Mode Node lifecycle [EKS]
![[AWS] EKS Auto Mode Node lifecycle [EKS]](https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F17w3ir50fez7dwievjij.png)
Introduction
Node Lifecycle
Nodes launched by EKS Auto Mode have a maximum lifetime of 21 days (which you can reduce), after which they are automatically replaced with new nodes.
By default, instances are terminated after 336 hours (14 days), controlled by the expireAfter setting.
https://docs.aws.amazon.com/eks/latest/userguide/create-node-pool.html
```yaml
spec:
  template:
    spec:
      expireAfter: 336h
```
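336h works out to 14 days, inside the 21-day maximum. As a rough illustration of that arithmetic, the following sketch parses Go-style duration strings such as 336h or 24h0m0s and computes when a node created at a given time would reach its expireAfter. It is plain Python written for this article, not part of EKS or Karpenter. The helper names are hypothetical, and it only handles the h/m/s units used here plus the special value Never.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical helpers for illustration; not an EKS or Karpenter API.
# Only h/m/s units are supported (real Go durations also allow ms, us, ns, fractions).
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_DURATION_RE = re.compile(r"(\d+)([hms])")


def parse_duration(value: str) -> Optional[timedelta]:
    """Parse a Go-style duration such as "336h"; return None for "Never"."""
    if value == "Never":
        return None
    parts = _DURATION_RE.findall(value)
    # Reject anything that is not a sequence of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != value:
        raise ValueError(f"unsupported duration: {value!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


def expiry_time(created: datetime, expire_after: str) -> Optional[datetime]:
    """When a node created at `created` reaches expireAfter (None if it never expires)."""
    ttl = parse_duration(expire_after)
    return None if ttl is None else created + ttl


if __name__ == "__main__":
    created = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(parse_duration("336h").days)   # 14
    print(expiry_time(created, "336h"))  # 2025-01-15 00:00:00+00:00
    print(parse_duration("24h0m0s"))     # 1 day, 0:00:00
```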
Expiration is one of Karpenter's node disruption methods.
https://karpenter.sh/docs/concepts/disruption/
Karpenter automatically discovers disruptable nodes and spins up replacements when needed.
Disruption Controller
- Prioritize the candidate nodes for disruption.
- For each candidate, check its NodePool's disruption budget, spec.disruption.budgets. If no budget is defined, Karpenter defaults to a single budget of nodes: 10%.
```yaml
spec:
  disruption:
    budgets:
      - nodes: 10%
```
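- Run a scheduling simulation to determine whether replacement nodes are needed.
A NodePool template can also set taints and a termination grace period for its nodes, for example:
```yaml
taints:
  - effect: NoSchedule
    key: CriticalAddonsOnly
terminationGracePeriod: 24h0m0s
```
Tainting a node with CriticalAddonsOnly keeps Pods that do not tolerate this taint, typically everything except system add-ons, from being scheduled onto it. terminationGracePeriod sets the maximum time Karpenter waits for a node to drain, measured from when its deletion begins, before forcibly removing the remaining Pods.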
- Wait until the replacement node(s) are up and ready.
- Delete the node(s) and wait for the Termination Controller to shut them down gracefully.
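The budget check in the procedure above can be sketched as follows. This is a simplified model, not Karpenter's code, and it rests on our reading of the Karpenter disruption docs: percentages are rounded up, nodes that are already being deleted or are NotReady count against the budget, and when several budgets are active the most restrictive one wins. It ignores the schedule, duration and reasons fields a budget can also carry, and the function names are hypothetical.

```python
from typing import List, Optional

# Assumed default when spec.disruption.budgets is not set.
DEFAULT_BUDGETS = ["10%"]


def budget_limit(budget: str, total_nodes: int) -> int:
    """Convert one budget value ("10%" or "3") into a node count, rounding percentages up."""
    if budget.endswith("%"):
        percent = int(budget[:-1])
        # Integer ceiling of total_nodes * percent / 100, avoiding float rounding.
        return -(-total_nodes * percent // 100)
    return int(budget)


def allowed_disruptions(total_nodes: int, disrupting: int = 0,
                        budgets: Optional[List[str]] = None) -> int:
    """How many more nodes of a NodePool may be disrupted right now (hypothetical model).

    total_nodes: nodes owned by the NodePool
    disrupting: nodes already being deleted or NotReady (counted against the budget)
    budgets: the `nodes` values of the currently active budgets
    """
    active = budgets if budgets else DEFAULT_BUDGETS
    limit = min(budget_limit(b, total_nodes) for b in active)  # most restrictive wins
    return max(limit - disrupting, 0)


if __name__ == "__main__":
    print(allowed_disruptions(19, budgets=["20%"]))  # 4
    print(allowed_disruptions(10))                   # 1
    print(allowed_disruptions(10, disrupting=1))     # 0
```

For example, with 19 nodes and a 20% budget the limit is 4 (19 * 0.2 = 3.8, rounded up). With the default 10% budget and 10 nodes, one node may be disrupted at a time, and none while another node is already NotReady.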
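To see why a CriticalAddonsOnly taint keeps ordinary Pods off a node, here is a simplified sketch of the Kubernetes taint/toleration matching rule. It is an illustration, not the scheduler's code: the class and function names are hypothetical, and it ignores tolerationSeconds and every other scheduling check. Only NoSchedule and NoExecute taints block scheduling here, since PreferNoSchedule is a soft preference.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Taint:
    key: str
    effect: str       # "NoSchedule", "PreferNoSchedule" or "NoExecute"
    value: str = ""


@dataclass
class Toleration:
    key: str = ""            # empty key with operator Exists tolerates any taint
    operator: str = "Equal"  # Kubernetes default operator
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Does a single toleration match a single taint?"""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return tol.operator == "Exists"


def can_schedule(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """Every hard (NoSchedule/NoExecute) taint must be matched by some toleration."""
    return all(
        any(tolerates(t, taint) for t in tolerations)
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


if __name__ == "__main__":
    node_taints = [Taint(key="CriticalAddonsOnly", effect="NoSchedule")]
    addon_tolerations = [Toleration(key="CriticalAddonsOnly", operator="Exists")]
    print(can_schedule(addon_tolerations, node_taints))  # True
    print(can_schedule([], node_taints))                 # False
```

A Pod with no tolerations is rejected by the tainted node, while a Pod that tolerates CriticalAddonsOnly with operator: Exists is accepted.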
Consolidation is configured with consolidationPolicy and consolidateAfter under spec.disruption.
```yaml
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    budgets:
      - nodes: 10%
```
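consolidateAfter makes Karpenter wait for the given time after a Pod is added to or removed from a node before consolidating that node. This can help when your application is slow to start, because it delays node replacement to some extent.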
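A simplified timing model of consolidateAfter, assuming (as the Karpenter v1 docs describe) that the wait is measured from the last time a Pod was added to or removed from the node, and that consolidateAfter: Never disables consolidation. The function names are hypothetical, and the real controller also applies the budget and scheduling checks before acting.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def consolidation_eligible_at(last_pod_change: datetime,
                              consolidate_after: Optional[timedelta]) -> Optional[datetime]:
    """Earliest time the node may be considered for consolidation.

    consolidate_after=None models `consolidateAfter: Never`, i.e. never eligible.
    """
    if consolidate_after is None:
        return None
    return last_pod_change + consolidate_after


def is_consolidatable(last_pod_change: datetime, now: datetime,
                      consolidate_after: Optional[timedelta]) -> bool:
    """True once consolidateAfter has elapsed with no Pod added to or removed from the node."""
    eligible = consolidation_eligible_at(last_pod_change, consolidate_after)
    return eligible is not None and now >= eligible


if __name__ == "__main__":
    last_change = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    wait = timedelta(seconds=30)  # consolidateAfter: 30s
    print(is_consolidatable(last_change, last_change + timedelta(seconds=10), wait))  # False
    print(is_consolidatable(last_change, last_change + timedelta(seconds=45), wait))  # True
```

A longer consolidateAfter simply pushes the eligibility time further out, which is what gives slow-starting applications breathing room.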
Multi-node consolidation: Karpenter tries to delete two or more nodes in parallel, possibly launching a single replacement that costs less than all of the removed nodes combined.
As a result, node resource efficiency is tuned automatically by changing the instance types in use.
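The cost test behind multi-node consolidation can be sketched like this. It models only the price comparison described above; Karpenter also runs a scheduling simulation and honors disruption budgets, which this ignores. The function name is hypothetical, and the hourly prices in the example are made-up numbers, not real EC2 pricing.

```python
from typing import Optional, Sequence


def multi_node_consolidation_ok(removed_prices: Sequence[float],
                                replacement_price: Optional[float]) -> bool:
    """Return True if removing these nodes (with at most one replacement) lowers cost.

    removed_prices: hourly prices of the candidate nodes
    replacement_price: price of the single replacement node, or None if the
        Pods fit on the remaining nodes and no replacement is launched
    """
    if len(removed_prices) < 2:
        return False  # multi-node consolidation considers two or more nodes
    if replacement_price is None:
        return True  # pure deletion removes the nodes' cost entirely
    return replacement_price < sum(removed_prices)


if __name__ == "__main__":
    print(multi_node_consolidation_ok([0.25, 0.25], 0.4))  # True: 0.40 < 0.50
    print(multi_node_consolidation_ok([0.25, 0.25], 0.5))  # False: not cheaper
```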
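Using preferred anti-affinity and topology spreads can reduce the effectiveness of consolidation.
When such preferences are set, Karpenter respects them during consolidation, so it may keep a node rather than violate a preference, even if the Pods could fit elsewhere.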
If interruption-handling is enabled, Karpenter will watch for upcoming involuntary interruption events that would cause disruption to your workloads.
It is advisable to monitor interruption events.
Node Auto Repair automatically identifies and replaces unhealthy nodes in your cluster, but in Karpenter it is an alpha feature.
Since EKS does not let you enable non-GA feature gates, I believe it cannot be used here.
Trying a Custom NodePool
Export the NodePools:
```shell
kubectl get nodepools -o yaml > nodepools.yaml
```
Edit the expireAfter parameter in nodepools.yaml, then apply the file:
```shell
kubectl apply -f nodepools.yaml
```
The settings are reflected immediately.
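If you would rather script the edit than change the file by hand, one option is to export JSON instead (kubectl get nodepools -o json > nodepools.json), modify it with Python's standard json module, and apply the result with kubectl apply -f nodepools.json. The sketch below assumes the Karpenter v1 NodePool layout, where expireAfter sits under spec.template.spec. set_expire_after is a hypothetical helper, and 168h is only an example value.

```python
import json
from typing import List


def set_expire_after(node_pool_list: dict, expire_after: str) -> List[str]:
    """Set spec.template.spec.expireAfter on every NodePool in a kubectl `List` object.

    Returns the names of the NodePools whose value actually changed.
    """
    changed = []
    for item in node_pool_list.get("items", []):
        if item.get("kind") != "NodePool":
            continue
        # Create missing intermediate objects so the field can always be set.
        template_spec = (
            item.setdefault("spec", {})
            .setdefault("template", {})
            .setdefault("spec", {})
        )
        if template_spec.get("expireAfter") != expire_after:
            template_spec["expireAfter"] = expire_after
            changed.append(item["metadata"]["name"])
    return changed


if __name__ == "__main__":
    # In practice, load nodepools.json with json.load and write it back with json.dump.
    doc = {
        "kind": "List",
        "items": [
            {"kind": "NodePool", "metadata": {"name": "example"},
             "spec": {"template": {"spec": {"expireAfter": "336h"}}}},
        ],
    }
    print(set_expire_after(doc, "168h"))  # ['example']
    print(json.dumps(doc["items"][0]["spec"]))
```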