Introducing Pod Disposal Operator
I have released a new project 🚀: Pod Disposal Operator, a Kubernetes Operator for automatic Pod relocation and controlled chaos engineering.
https://github.com/jlandowner/pod-disposal-operator
Pod Disposal Operator performs one simple operation: it regularly disposes of some of the Pods in a Deployment, on a schedule specified by a cron expression.
Background
There are three (+α) factors behind this operator.
- Descheduler
- Chaos-Engineering
- Pod Rotation
- (I also wanted to try kubebuilder)
Descheduler
Descheduler is commonly used to relocate running Pods. Its advantage is that you can configure a wide variety of policies, strategies, and options. The flip side is that this flexibility sometimes makes it hard to configure.
It is also common to run Descheduler as a CronJob in the cluster. However, each Job execution targets all Pods in the cluster, apart from some excluded Pods. In a multi-tenant cluster, for example, cluster administrators such as a platform team therefore sometimes have to run it with a full understanding of the behavior and characteristics of every service team's applications.
Chaos-Engineering
Most Kubernetes users are probably interested in chaos engineering to some degree, but many of them may not be sure how to begin.
For application containers, for example, how far you want to go with chaos engineering depends on the characteristics and maturity of the application, even if it is fault-tolerant (e.g., you may want to run experiments only at a time of day when there is little traffic).
You would probably like to start small, at a point where any problems that occur do not affect the entire service.
Pod Rotation
Applications that run for long periods of time sometimes develop problems. Legacy applications may have a maintenance job that restarts them periodically to release memory and processes.
Also, if your applications write per-Pod files to NFS or similar storage, you may still need file rotation or some other mechanism. When files are created per Pod, their names often include the unique Pod name or another key. In that case, when you delete the Pods, the ReplicaSet creates new Pods and the files are naturally rotated as well. It is one of the easiest ways to maintain them.
In my case, some legacy application containers run in the cluster. They write a lot of files to AWS EFS, and the file names have the hostname as a prefix (in Kubernetes, the hostname is the Pod name).
To maintain the EFS, deleting Pods is an easy way to rotate the files.
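As a rough illustration of why deleting Pods rotates such files, here is a minimal sketch of hostname-prefixed file naming. The function name `pod_log_path`, the `app.log` suffix, and the mount path are hypothetical and not taken from my application; the sketch only relies on the fact that a Pod's hostname defaults to its Pod name.

```python
import socket
from pathlib import Path


def pod_log_path(base_dir: str, suffix: str = "app.log") -> Path:
    """Build a per-Pod file path prefixed with the hostname (hypothetical helper).

    Inside a Kubernetes Pod the hostname defaults to the Pod name, so a
    recreated Pod gets a new name and therefore writes to a fresh file.
    """
    hostname = socket.gethostname()
    return Path(base_dir) / f"{hostname}-{suffix}"


# "/mnt/efs/logs" is an assumed mount point used only for illustration.
print(pod_log_path("/mnt/efs/logs"))
```

Files left behind by deleted Pods keep their old prefix, so they can be archived or removed without touching the files of running Pods.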
Features
- By default, Pod Disposal Operator just disposes of old Pods. Recreating them is left to the Kubernetes self-healing mechanisms (the ReplicaSet detects that there are too few Pods). This lets you check that the fault-tolerant behavior of the cluster and your application works well enough.
- Based on the idea that “Pods running for a longer period of time cause unbalanced Pod allocation”, the operator removes Pods oldest first by default. This also serves as Pod rotation.
- You can configure the disposal timing for each Deployment. Pod Disposal Operator watches custom resources called PodDisposalSchedule, which you can create for each Deployment. Depending on the behavior and characteristics of your application, you can rotate a large number of Pods at once during low-traffic periods, or remove a small number of Pods more frequently.
- Combined with Kubernetes scheduling mechanisms that spread Pods, such as Inter-Pod Anti-Affinity and Pod Topology Spread Constraints, disposal makes it more likely that recreated Pods are placed properly.
Get started
All you need to install it is the single command below.
kubectl apply -f https://raw.githubusercontent.com/jlandowner/pod-disposal-operator/master/deploy/quickstart.yaml
Configure disposal
You can configure a disposal schedule and strategy with a single custom resource, PodDisposalSchedule.
A sample configuration is as follows.
apiVersion: operator.k8s.jlandowner.com/v1
kind: PodDisposalSchedule
metadata:
  name: poddisposalschedule-sample
  namespace: default
spec:
  # Target Pods selector
  selector:
    # Only Deployment is available
    type: Deployment
    # Deployment name
    name: sample-nginx
  # Disposal schedule in cron format
  schedule: "* */3 * * *"
  strategy:
    # Max number of pods to be deleted at the same time
    disposalConcurrency: 2
    # Only Pods older than this lifespan are deleted
    lifespan: 12h
    # Minimum number of Pods to keep available
    minAvailable: 2
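
To make the schedule field easier to read, here is a minimal sketch of how a five-field cron expression matches a point in time. It assumes the operator follows standard cron semantics (minute, hour, day of month, month, day of week), which I have not confirmed here. The matcher is deliberately simplified: it supports only `*`, `*/n`, and plain numbers, and it ignores lists, ranges, and the special rule cron applies when both day fields are restricted. The names `field_matches` and `cron_matches` are hypothetical.

```python
from datetime import datetime


def field_matches(field: str, value: int, low: int = 0) -> bool:
    """Match one cron field; supports '*', '*/n' and a plain number only."""
    if field == "*":
        return True
    if field.startswith("*/"):
        # Steps are counted from the lowest legal value of the field.
        return (value - low) % int(field[2:]) == 0
    return value == int(field)


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if the five-field expression matches the given minute."""
    minute, hour, day, month, weekday = expr.split()
    # cron counts Sunday as 0, while datetime.weekday() counts Monday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day, low=1)
        and field_matches(month, when.month, low=1)
        and field_matches(weekday, cron_weekday)
    )


sample = "* */3 * * *"
print(cron_matches(sample, datetime(2020, 1, 1, 3, 0)))  # True
print(cron_matches(sample, datetime(2020, 1, 1, 3, 1)))  # True
print(cron_matches(sample, datetime(2020, 1, 1, 4, 0)))  # False
```

Under these assumed semantics, the sample expression matches every minute of every third hour. An expression such as `0 */3 * * *` would match only once, at minute 0 of every third hour.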
You can see the details of each field in the docs.
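The way the three strategy fields might combine can be sketched as follows. This is not the operator's actual code: it is a simplified model, assuming that only Pods older than `lifespan` are candidates, that the oldest are picked first, and that the number deleted per run is capped by both `disposalConcurrency` and `minAvailable`. The `Pod` class and `select_pods_to_dispose` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Pod:
    """Hypothetical stand-in for a Pod: only the name and creation time."""
    name: str
    created_at: datetime


def select_pods_to_dispose(
    pods: List[Pod],
    now: datetime,
    lifespan: timedelta,
    disposal_concurrency: int,
    min_available: int,
) -> List[Pod]:
    """Pick the oldest expired Pods without dropping below min_available."""
    # Deleting more than this would leave fewer than min_available Pods.
    budget = min(disposal_concurrency, len(pods) - min_available)
    if budget <= 0:
        return []
    # Only Pods that have lived at least `lifespan` are candidates.
    expired = [p for p in pods if now - p.created_at >= lifespan]
    # Oldest first, matching the default ordering described above.
    expired.sort(key=lambda p: p.created_at)
    return expired[:budget]


now = datetime(2020, 1, 2, 0, 0)
pods = [
    Pod("nginx-c", now - timedelta(hours=13)),
    Pod("nginx-a", now - timedelta(hours=30)),
    Pod("nginx-d", now - timedelta(hours=1)),
    Pod("nginx-b", now - timedelta(hours=20)),
]
targets = select_pods_to_dispose(pods, now, timedelta(hours=12), 2, 2)
print([p.name for p in targets])  # ['nginx-a', 'nginx-b']
```

With the sample values (lifespan 12h, disposalConcurrency 2, minAvailable 2) and four Pods, three Pods are old enough, but only the two oldest are selected in one run.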
Example
I prepared an example guide where you can experience how disposal works.
See https://github.com/jlandowner/pod-disposal-operator/tree/master/example
Let’s get started👏
