Health checks in nginx

nginx is an awesome web server. If you haven't heard of it, that's okay; it has a lot of other users.

The problem

I have a project that involves a bunch of nginx servers in a bunch of different locations. If one of them stops working, I'd like to stop using it until it gets better. I'll be using Amazon Route 53 health checks to provide this behavior, but plenty of other things work the same way.

All I need to do is to make each server indicate if it's healthy or not. But, hmm… what does "healthy" mean in this context?

This project uses nginx as a caching proxy. Most of the content is immediately available most of the time, but sometimes nginx has to consult other systems over the network. So even if the local system is working fine, it might not actually be capable of serving clients, in which case I want the health check to fail.

The status check is an HTTP GET request. The health checker will interpret any 200 OK response as success, and any other response as a failure. That sounds like the logic I want.

Except… no, I need to check several paths. Crap.

Well, let's see. nginx is built around an asynchronous event processing loop, which internally supports the notion of sub-requests. All the health check needs to do is to make some sub-requests, wait for them to complete, and return 200 OK if and only if they all returned 200 OK. That totally fits the nginx request processing model, so this shouldn't be hard, right?

The solution

Well, turns out it wasn't hard. nginx supports running arbitrary logic in-process via the ngx_lua module, and this module exposes enough of the internals to do exactly what I want.

Specifically, I can use ngx.location.capture_multi() to run multiple simultaneous sub-requests, ngx.status = 500 to signal failure, and ngx.print() to return a request body just for good measure. It's been a long time since I wrote Lua, but:

location /health_check {
  default_type 'text/plain';
  content_by_lua '
    -- Paths to probe; each entry is one sub-request.
    local reqs = {
      { "/" },
      { "/path1" },
      { "/path2" }
    }
    -- Run all sub-requests in parallel and collect the responses in order.
    local resps = { ngx.location.capture_multi(reqs) }
    local ok = true
    -- Buffer the report so the status can be set before any output is sent.
    local body = {}

    for i, resp in ipairs(resps) do
      local req = reqs[i]
      table.insert(body, "- GET ")
      table.insert(body, req[1])
      table.insert(body, " returned ")
      table.insert(body, resp.status)
      table.insert(body, "\\n")

      -- Any non-200 sub-response fails the whole check.
      if resp.status ~= 200 then
        ok = false
      end
    end

    table.insert(body, "\\n")

    if ok then
      table.insert(body, "ok\\n")
    else
      table.insert(body, "failed\\n")
      ngx.status = 500
    end

    ngx.print(table.concat(body))
  ';
}

This satisfies my goal of testing multiple paths (in parallel!) and returning 200 OK if and only if each of them is working.

Health check: accomplished.

As a bonus, /health_check returns a useful plain-text result should I care to inspect it from a browser:

- GET / returned 200
- GET /path1 returned 200
- GET /path2 returned 500

failed
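
For anyone who wants to poke at the pass/fail logic without a running nginx, here is a minimal sketch of the same decision as a plain Lua function. The name summarize and its two-list signature are my own assumptions for illustration; they are not part of ngx_lua, and the status codes are fed in by hand rather than coming from real sub-requests.

```lua
-- Hypothetical helper (not part of ngx_lua): builds the plain-text report
-- from a list of paths and a matching list of HTTP status codes.
local function summarize(paths, statuses)
  local body = {}
  local ok = true

  for i, path in ipairs(paths) do
    local status = statuses[i]
    body[#body + 1] = "- GET " .. path .. " returned " .. tostring(status) .. "\n"
    -- Same rule as the health check: anything other than 200 is a failure.
    if status ~= 200 then
      ok = false
    end
  end

  body[#body + 1] = "\n" .. (ok and "ok" or "failed") .. "\n"
  return ok, table.concat(body)
end

-- Hand-written statuses standing in for real sub-request results.
local ok, report = summarize({ "/", "/path1", "/path2" }, { 200, 200, 500 })
io.write(report)
```

Inside nginx, the boolean would decide whether to set ngx.status = 500 and the string would go to ngx.print().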

Implementation notes

Note that I'm buffering the response in body rather than immediately calling ngx.print(). This is necessary because ngx.print() tells nginx to write the passed string as part of the response body, which implicitly tells it to send the response headers, including the status code. Therefore, I can't call ngx.print() before I've determined the status code I want to send.
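
The ordering constraint boils down to something like the fragment below. It's only a sketch: the /always_fail location and the 503 status are made up for illustration, and the point is just that the status assignment comes before the first output call.

```nginx
location /always_fail {
  default_type 'text/plain';
  content_by_lua '
    -- Decide the status first: it travels with the response headers.
    ngx.status = 503
    -- The first output call sends the headers, so the status is now final.
    ngx.print("failed\\n")
  ';
}
```

Swapping the two statements would send the headers with the default status before the assignment ever runs.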

Additionally, for those try-it-at-home types, be aware that ngx_lua is not an official part of the nginx source tree. It is, however, available in the nginx-extras package on Debian-related distributions, and is therefore also part of Ubuntu LTS, which is good enough for me.