Kubernetes-hosted application checklist (part 1)

At work, we’ve been running Kubernetes (k8s) in production for almost a year. During this time, I’ve learnt a few best practices for designing and deploying applications hosted on k8s. I thought I’d share them today, and hopefully they will be useful to newbies like me.

Liveness and readiness probes

  • Liveness probe: checks whether your app is running
  • Readiness probe: checks whether your app is ready to accept incoming requests

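The two probes run independently: k8s does not wait for the readiness probe to pass before it starts running the liveness probe. If your app is slow to start, give the liveness probe an initialDelaySeconds (or use a startup probe) so the container is not restarted while it is still booting.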

Without a liveness probe, k8s has no way of knowing when your app container needs restarting. A process that crashes and exits is restarted anyway, but one that hangs or deadlocks while still running will stay in that state, and k8s will keep directing traffic to it.

If your app takes some time to bootstrap, you need to define a readiness probe as well. Otherwise, requests will be directed to your app container even though it is not yet ready to serve them.
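To make the difference between the two probes concrete, here is a dependency-free sketch using Node’s built-in http module, for the case where you do expose separate endpoints. The paths /live and /ready, the state object, and the bootstrap() function are all hypothetical; bootstrap() just waits a few milliseconds to stand in for real startup work.

```javascript
const http = require('node:http')

// Hypothetical app state: not ready until bootstrap() has finished.
const state = { isReady: false }

// Maps a probe path to a status code (k8s treats 200-399 as success).
//   /live  -> the process is up and able to answer at all
//   /ready -> bootstrap has also finished, so it is safe to send traffic
function probeStatus(path, state) {
    if (path === '/live') return 200
    if (path === '/ready') return state.isReady ? 200 : 503
    return 404
}

// Stand-in for slow startup work (warming caches, opening pools, ...).
async function bootstrap(state) {
    await new Promise(resolve => setTimeout(resolve, 50))
    state.isReady = true
}

// Serve the probes right away: /live passes while /ready still fails.
const server = http.createServer((req, res) => {
    res.statusCode = probeStatus(req.url, state)
    res.end()
})
```

In an app you would call server.listen(port) first and then bootstrap(state), so that the liveness probe succeeds during startup while the readiness probe keeps traffic away until the app is actually ready.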

Usually, I just make a single API endpoint for both the liveness and readiness probes. E.g., if my app needs a database and Redis to work, my health-check API simply checks that the database connection and Redis are both available.

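// Koa handler for the health-check endpoint used by both probes
async function healthCheck(ctx) {
    try {
        // Fails if either Redis or the database is unreachable
        await Promise.all([redis.ping(), knex.raw('select 1')])
        ctx.body = 'ok'
    } catch (err) {
        ctx.throw(500, 'not ok')
    }
}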
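When a probe fails, it helps to know which dependency caused it. One possible variation, sketched below under the assumption that each check is an async function that rejects when its dependency is down, runs all checks with Promise.allSettled and names the ones that failed. The function name checkDependencies and the shape of its result are hypothetical.

```javascript
// Runs every named dependency check in parallel and reports which ones failed.
// `checks` maps a name to a function that rejects (or throws) when the
// dependency is unavailable.
async function checkDependencies(checks) {
    const names = Object.keys(checks)
    // The async wrapper turns a synchronous throw into a rejection
    const results = await Promise.allSettled(names.map(async name => checks[name]()))
    const failed = names.filter((name, i) => results[i].status === 'rejected')
    return failed.length === 0
        ? { status: 200, body: 'ok' }
        : { status: 500, body: `not ok: ${failed.join(', ')}` }
}

// Possible use inside a Koa handler (redis and knex as in the snippet above):
//   const { status, body } = await checkDependencies({
//       redis: () => redis.ping(),
//       database: () => knex.raw('select 1'),
//   })
//   ctx.status = status
//   ctx.body = body
```

Unlike Promise.all, which rejects as soon as the first check fails, allSettled waits for every check, so the response body can list all failing dependencies at once.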

Graceful termination

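When a pod is terminated, k8s first sends SIGTERM to the main process of each container. If the process is still running after the grace period (terminationGracePeriodSeconds, 30 seconds by default), it is sent SIGKILL. The app must handle SIGTERM and shut itself down gracefully.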

The flow is:

  • The container process receives SIGTERM.
  • If the process is still running when the grace period expires (for example, because it ignores SIGTERM), it is sent SIGKILL.
  • The container is removed.

Your app should handle SIGTERM and exit on its own, so that it never reaches the SIGKILL step.
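Because SIGKILL only arrives once the grace period runs out, one way to make sure the app never gets there is to cap how long cleanup may take. This is a minimal sketch; the 20-second budget, the SHUTDOWN_TIMEOUT_MS constant and the shutdownWithDeadline() helper are hypothetical and should be tuned to your pod’s terminationGracePeriodSeconds.

```javascript
// Hypothetical budget, kept below the pod's terminationGracePeriodSeconds
// (30 seconds by default) so the app exits on its own before SIGKILL.
const SHUTDOWN_TIMEOUT_MS = 20000

// Runs cleanup() but waits at most timeoutMs for it to finish.
// Resolves to the exit code to use: 0 if cleanup succeeded in time,
// 1 if it threw, rejected, or ran past the deadline.
function shutdownWithDeadline(cleanup, timeoutMs = SHUTDOWN_TIMEOUT_MS) {
    let timer
    const deadline = new Promise(resolve => {
        timer = setTimeout(() => resolve(1), timeoutMs)
    })
    const work = Promise.resolve()
        .then(cleanup)
        .then(() => 0, () => 1)
    // Whichever settles first wins; always clear the timer afterwards
    return Promise.race([work, deadline]).finally(() => clearTimeout(timer))
}
```

In a SIGTERM handler this could be used as shutdownWithDeadline(() => knex.destroy()).then(code => process.exit(code)), so that a hung connection pool results in a clean non-zero exit rather than a SIGKILL.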

A minimal example looks like this:

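process.on('SIGTERM', () => {
    // Make the health-check endpoint start failing
    state.isShutdown = true
    initiateGracefulShutdown()
})

function initiateGracefulShutdown() {
    // Close the database connection pool, then exit
    // with a non-zero code if that failed
    knex.destroy(err => {
        process.exit(err ? 1 : 0)
    })
}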
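If the app also serves HTTP, a common ordering is to stop accepting new connections first, let in-flight requests finish, and only then release the database and other resources those requests may still need. The sketch below uses Node’s built-in http.Server; closeServer(), gracefulShutdown() and the closeResources callback are hypothetical names, with closeResources standing in for calls such as knex.destroy().

```javascript
const http = require('node:http')

// Stops accepting new connections and resolves once existing ones have ended.
// server.closeIdleConnections() (Node 18.2+) drops idle keep-alive sockets
// that would otherwise delay the close callback.
function closeServer(server) {
    return new Promise((resolve, reject) => {
        server.close(err => (err ? reject(err) : resolve()))
        if (typeof server.closeIdleConnections === 'function') {
            server.closeIdleConnections()
        }
    })
}

// Hypothetical shutdown order: HTTP server first, then backing resources.
async function gracefulShutdown(server, closeResources) {
    await closeServer(server)
    await closeResources()
}
```

Closing the pool before the server would risk failing requests that are still in progress, which is why the resources are released last here.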

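Also, once state.isShutdown is set, the health-check endpoint should start returning an error, so that the readiness probe fails and k8s stops sending new requests to the pod while it shuts down.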
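A sketch of how the shared health-check logic might take the shutdown flag into account, assuming a state object like the one set in the SIGTERM handler and a hypothetical checkDeps function that rejects when a dependency is down. Any status outside 200-399 counts as a probe failure, so the exact error code is a matter of taste.

```javascript
// Returns the status code for the shared health-check endpoint.
// Once SIGTERM has set state.isShutdown, report failure immediately,
// without touching any dependency, so the readiness probe fails.
async function healthStatus(state, checkDeps) {
    if (state.isShutdown) return 503
    try {
        await checkDeps()
        return 200
    } catch (err) {
        return 500
    }
}
```

Checking the flag first keeps the endpoint cheap and predictable during shutdown, when the database pool may already be closing.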