Load balancing queries

Just some initial queries on load balancing using the Caddyfile. Consider the following proxy request to two backends offering an identical service.

office.domain.com {
  reverse_proxy 10.1.1.13:8080  10.1.3.132:9006
} 

Q1. Is it possible to convert this to use the map handler? In other words, is it possible to specify multiple backends with the map handler?

*.domain.com {

  map {labels.2} {backend} {
    office 'what goes here?'
  }
  reverse_proxy {backend}
}

Q2. Is there any evidence in the process or access logs to suggest that load balancing is occurring, or do I test that load balancing is working by switching off each backend in turn?

Q3. During normal operation, how will I know if a backend is down? I’m none the wiser if the remaining backend kicks in.


I don’t think so. One placeholder can’t expand into multiple config tokens. And if you tried to use a placeholder which had an empty value, the reverse_proxy wouldn’t know any better and try to proxy to "" which doesn’t make much sense.

I talked a bit with Matt and Mohammed about some ideas for making the upstream selection functionality more flexible and extendable. Currently, only two ways are supported; listing out the upstreams in the config manually, or using a SRV endpoint for selecting an upstream. Some other ways are desired, like using DNS A records with multiple values, or some better support for plugins to push their list of upstreams dynamically. The latter would probably be what you’d need. But no work has been done on this yet.

Yeah, the best way to test it is by simulating downtime on one of them, and making sure they return slightly different responses so you can tell the difference. For example, you can make your backends return a different value for some response header of your choosing to indicate which backend handled it.
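Alternatively, a sketch of how Caddy itself could tag each response with the chosen upstream, so you don't have to modify the backends at all (untested; I'm assuming the upstream placeholder resolves in header_down):

office.domain.com {
  reverse_proxy 10.1.1.13:8080 10.1.3.132:9006 {
    # Tag each response with the upstream that handled it;
    # repeated requests should then show the addresses alternating.
    header_down X-Upstream {http.reverse_proxy.upstream.hostport}
  }
}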

There’s also the /reverse_proxy/upstreams API endpoint that was recently added which can show you the internal state of the set of configured upstreams.

Currently, the only way would be to use the aforementioned API endpoint to periodically check the health.
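For instance, a periodic check could filter the endpoint's JSON for unhealthy entries with jq. A minimal sketch (the here-document stands in for the live curl output, and the fail counts are made up):

```shell
# Save a copy of the admin API response. In a live setup, replace
# the here-document with:
#   curl -s "http://localhost:2019/reverse_proxy/upstreams" > /tmp/upstreams.json
cat > /tmp/upstreams.json <<'EOF'
[
  {"address": "10.1.1.13:8880",  "healthy": true,  "num_requests": 0, "fails": 0},
  {"address": "10.1.3.132:9006", "healthy": false, "num_requests": 0, "fails": 3}
]
EOF

# List the address of every upstream not currently marked healthy.
jq -r '.[] | select(.healthy | not) | .address' /tmp/upstreams.json
# prints: 10.1.3.132:9006
```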

That said, I am working on a configurable event system for Caddy that should make this possible in the future, to trigger a specific CLI command for example if a certain event occurs, like an upstream being marked unhealthy. But that’s still a ways out, I’m still working out the details there. Tricky to land on the ideal configuration setup.


So this is the Caddyfile code block that is relevant here.

office.udance.com.au {
  reverse_proxy 10.1.1.13:8880 10.1.3.132:9006
}

My Caddy version…

root@caddy:/usr/local/www # caddy version
v2.4.4-0.20210621175641-2de7e14e1c5f h1:/Kzlg8YluMMiXJBPoL8MkmArv5yqieoLHqKUDNuHtjE=

Attempting to identify the backend servers…

root@caddy:/usr/local/www # curl "https://office.udance.com.au/reverse_proxy/upstreams" | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   162  100   162    0     0   8526      0 --:--:-- --:--:-- --:--:--  8526
parse error: Invalid numeric literal at line 1, column 10

Without piping to jq…

root@caddy:/usr/local/www # curl "https://office.udance.com.au/reverse_proxy/upstreams"
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Error</title>
</head>
<body>
<pre>Cannot GET /reverse_proxy/upstreams</pre>
</body>
</html>

Not sure what’s happening here?

The admin API is on port 2019 (by default), not 443.


The output is useful. I can do something with it, though I notice it just shows a subset of backends, but doesn’t reveal the status of backends managed using the map handler.

root@caddy:/usr/local/www # curl "http://localhost:2019/reverse_proxy/upstreams" | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   418  100   418    0     0   408k      0 --:--:-- --:--:-- --:--:--  408k
[
  {
    "address": "10.1.1.52:80",
    "healthy": true,
    "num_requests": 0,
    "fails": 0
  },
  {
    "address": "{backend}",
    "healthy": true,
    "num_requests": 1,
    "fails": 0
  },
  {
    "address": "10.1.1.13:8880",
    "healthy": true,
    "num_requests": 0,
    "fails": 0
  },
  {
    "address": "10.1.3.132:9006",
    "healthy": true,
    "num_requests": 0,
    "fails": 0
  },
  {
    "address": "10.1.1.51:80",
    "healthy": true,
    "num_requests": 0,
    "fails": 0
  },
  {
    "address": "10.1.1.53:80",
    "healthy": true,
    "num_requests": 0,
    "fails": 0
  }
]

Yep, that’s the downside of dynamic upstreams via placeholders (I think I mentioned this in a previous answer, probably a month or two ago, when you were first setting this up).

The map directive has been a favourite of mine, but the shortcomings may be starting to outweigh the benefits :cry: :broken_heart:

I might be able to make this work though; if the placeholder is empty, it could just error out immediately instead of trying to use it.

This means you could use two map variables, one for the primary and the other for the fallback.

Would this solve it for you?
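Applied to the example from your OP, that would look something like this (hypothetical; it relies on the empty-placeholder erroring behaviour described above):

*.domain.com {
  map {labels.2} {primary} {fallback} {
    # hypothetical: each output column holds at most one upstream
    office 10.1.1.13:8080 10.1.3.132:9006
  }
  reverse_proxy {primary} {fallback}
}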

It’s an interesting approach to allow the map directive to be used in load balancing scenarios. It’s an idea you may wish to follow through with. Don’t let my thoughts below dull a good idea.

The dilemma for me, and what I need to weigh up, is the opportunity to use map in load balancing scenarios versus the loss of intel that arises through its use because the API endpoint mentioned previously can’t be used to check backend health. It’s a tough call. My gut feeling is that the intel is probably worth more to me in an operational sense than the niceties of being able to use map. If I didn’t think that was important, I wouldn’t have raised Q2 and Q3 in the OP.

I believe there’s still value in using map with wildcard certificates to summarise, tabulate and manage a wildcard environment, but, personally, any backends that I want to participate in load balancing, I’d be inclined to move out of the map block and into their own Caddy block so I can continue to use the API endpoint to check backend health.

You can continue to use map to make the matching decisions for other things than the backend to use, but then use regular host matchers with handle blocks and use actual IP addresses for each reverse_proxy. Longer config, but gets you that benefit.

It’s a bit of a chicken and egg problem. The request is needed to decide what the {backend} placeholder means, so anything outside of the request context kinda loses out.


By the way, the host matcher is fast but somewhat limiting because it just supports exact or prefix/suffix matching, so you may want to use header_regexp Host instead to write a regexp to match hostname patterns to avoid repetition of handle blocks.


Thanks for the tip!

I had a look at other forum threads that use header_regexp in the hope that I could figure this out, but I must backtrack and say that I’m bamboozled. :lying_face:

When we first started looking at options for converting the Caddyfile to use a wildcard certificate, you originally proposed an alternative approach using handle blocks instead of map in this post Migrate to using a wildcard certificate - #2 by francislavoie. I’m having difficulty making the mental leap to what you’re now suggesting with header_regexp Host. :exploding_head: Can you please point me in the right direction?

@someHosts header_regexp some Host (sub1|sub2)\.example\.com
handle @someHosts {
	reverse_proxy 123.123.123.123
}

@otherHosts header_regexp other Host (sub3|sub4)\.example\.com
handle @otherHosts {
	reverse_proxy 234.234.234.234
}

handle {
    # fallback
}

Or:

@someHosts host sub1.example.com sub2.example.com
handle @someHosts {
	reverse_proxy 123.123.123.123
}

@otherHosts host sub3.example.com sub4.example.com
handle @otherHosts {
	reverse_proxy 234.234.234.234
}

handle {
    # fallback
}

header_regexp can be shorter in the long run if you have lots of options to match, more flexible. But you can always use host and list them all out as well.

Okay. I think I see what you’re getting at. I’ve learnt something about header_regexp, but I’m not sure it’s so useful in my use case. For instance, I don’t have multiple subdomains pointing to the same backend (or was that just some random example?). The reverse is true though… a subdomain pointing to multiple backends.

I’m thinking more along the lines of what you’re suggesting here. For example, this is the sort of structure I currently employ:

*.example.com {
  map {labels.2} {backend} {switch1} {switch2} {
    subdomain1  192.168.0.2:8001  on off   # Service A
    subdomain2  192.168.0.2:8002  off on  # Service B
  }

  route {
    @sw1 expression `{switch1} == "on"`
    route @sw1 {
      # Do something. 
    }

    @sw2 expression `{switch2} == "on"`
    route @sw2 {
      # Do something. 
    }
    reverse_proxy {backend}
  }
}

What I’m thinking is that I retain the bulk of this structure, but remove references to {backend} from it. The host matchers with handle blocks will be placed after the route block. For example:

*.example.com {
  map {labels.2} {switch1} {switch2} {
    subdomain1  on off   # Service A
    subdomain2  off on  # Service B
  }

  route {
    @sw1 expression `{switch1} == "on"`
    route @sw1 {
      # Do action 1
    }
    @sw2 expression `{switch2} == "on"`
    route @sw2 {
      # Do action 2
    }
  }

  @sub1 host subdomain1.example.com
  handle @sub1 {
    reverse_proxy ip1
  }
  @sub2 host subdomain2.example.com
  handle @sub2 {
    reverse_proxy ip2 ip3
  }
}

Is this sort of what you were alluding to? If so, I get the best of both worlds… I can still use map for switching, but my backends are no longer coupled to it and therefore not limited by it. If header_regexp can help me simplify this even further, I’m all ears.

Ah okay, I didn’t go look at your existing maps to find out, but that’s typically the more common case: two domains pointing to the same backend as old/new aliases, or whatever else needs dealing with.

route is after handle in the directive order, so take care there. Maybe put the handles inside the route, or use the order global option to force a reorder, whatever works.

Yeah that should do.

Ooh…thanks for flagging this. I forgot about the directive order.

I thought it might be interesting to show how the wildcard caddy block changed as a result of discussions arising from this thread.

This is an excerpt of the original Caddyfile including the wildcard Caddy block and a ‘patch’ for load balancing.

office.udance.com.au {
  encode gzip zstd
  import logging udance.com.au

  reverse_proxy 10.1.1.13:8880 10.1.3.132:9006
}

*.udance.com.au {

  encode gzip zstd
  import logging udance.com.au

  map {labels.3} {backend} {online} {mtls} {phpmyadmin} {

#   HOSTNAME     BACKEND         ONLINE mTLS PHPMYADMIN #COMMENT
#---------------------------------------------------------------

    # Docker containers

#    office       10.1.1.13:8880  yes    no   no         # OnlyOffice
    portainer    10.1.1.13:9000  yes    no   no         # Portainer
    truecommand  10.1.1.13:8086  yes    no   no         # TrueCommand 2 nightly
    tc123        10.1.1.13:8082  yes    no   no         # TrueCommand v1.2.3
    tc132        10.1.1.13:8084  yes    no   no         # TrueCommand v1.3.2
    nc-fpm       10.1.1.13:8031  yes    no   no         # Nextcloud+Caddy
    wordpress    10.1.1.13:5050  yes    no   no         # WordPress
    nc-apache    10.1.1.13:8030  yes    no   no         # Nextcloud+Apache
    collabora    10.1.1.13:9980  yes    no   no         # Collabora

    # Jails

    rslsync      10.1.1.22:8888  yes    no   no         # Resilio Sync
    cloud        10.1.1.29:80    yes    no   no         # Nextcloud
    heimdall     10.1.1.23:80    yes    no   no         # Heimdall
    test         test.lan:80     yes    no   yes        # test.udance.com.au
    blog         10.1.1.54:80    yes    no   yes        # blog.udance.com.au
    basil        10.1.1.56:80    yes    no   yes        # basil.udance.com.au
    sachika      10.1.1.57:80    yes    no   yes        # sachika.udance.com.au
    file         file.lan:443    yes    yes  yes        # file.udance.com.au
    default      unknown         yes    no   no         # subdomain does not exist
}

  route {
# Error handling
    @unknown expression `{backend} == "unknown"`
    respond @unknown "Denied" 403

# Site offline
    @offline expression `{online} == "no"`
    redir @offline https://udance.statuspage.io temporary

    @split {
      expression `{online} == "split"`
      not remote_ip 10.1.1.0/24 10.1.2.0/24
    }
    redir @split https://udance.statuspage.io temporary

# Authenticate phpMyAdmin on production WordPress sites
    @phpmyadmin expression `{phpmyadmin} == "yes"`
    route @phpmyadmin {
      import authorise /phpmyadmin*
    }

# Fix when using the Nextcloud+Apache Docker image with Caddy.
    @nc-apache host nc-apache.udance.com.au
    route @nc-apache {
      redir /.well-known/carddav /remote.php/carddav 301
      redir /.well-known/caldav /remote.php/caldav 301
    }

# Enable HSTS for Nextcloud
    @hsts host cloud.udance.com.au
    header @hsts Strict-Transport-Security "max-age=31536000;"

# Secure backend communication
    @mtls expression `{mtls} == "yes"`
    reverse_proxy @mtls {backend} {
      header_up Host {http.reverse_proxy.upstream.hostport}
      header_up X-Forwarded-Host {host}
      transport http {
        tls
      }
    }

# Unsecured backend communication
    @nomtls expression `{mtls} == "no"`
    reverse_proxy @nomtls {backend}

  }
}

After decoupling reverse_proxy from the map directive, this is what the Caddyfile excerpt looks like now. The outcome is the same. The difference is that the API endpoint described earlier in this thread can now be used to monitor the availability of upstream servers.

*.udance.com.au {

  encode gzip zstd
  import logging udance.com.au

  map {labels.3} {online} {

#   HOSTNAME     ONLINE # HSTS  mTLS    PHPMY   COMMENT
#                                       ADMIN
#---------------------------------------------------------------

    # Docker

    collabora    yes    # no    no      no      Collabora
    nc-apache    yes    # yes   no      no      Nextcloud+Apache
    nc-fpm       yes    # yes   no      no      Nextcloud+Caddy
    office       yes    # no    no      no      OnlyOffice
    portainer    yes    # no    no      no      Portainer
    tc123        yes    # no    no      no      TrueCommand v1.2.3
    tc132        yes    # no    no      no      TrueCommand v1.3.2
    truecommand  yes    # no    no      no      TrueCommand 2 nightly
    wordpress    yes    # no    no      no      WordPress

    # Jails

    basil        yes    # no    no      yes     basil.udance.com.au
    blog         yes    # no    no      yes     blog.udance.com.au
    cloud        yes    # yes   no      no      Nextcloud
    file         yes    # no    yes     no      file.udance.com.au
    heimdall     yes    # no    no      no      Heimdall
    rslsync      yes    # no    no      no      Resilio Sync
    sachika      yes    # no    no      yes     sachika.udance.com.au
    test         yes    # no    no      yes     test.udance.com.au

    default      alien  # no    no      no      subdomain does not exist
}

  route {

### Exception handling ###

# Non-existent subdomain
    @unknown expression `{online} == "alien"`
    respond @unknown "Denied" 403

# Site offline
    @offline expression `{online} == "no"`
    redir @offline https://udance.statuspage.io temporary

    @split {
      expression `{online} == "split"`
      not remote_ip 10.1.1.0/24 10.1.2.0/24
    }
    redir @split https://udance.statuspage.io temporary

# Authenticate phpMyAdmin on production WordPress sites
    @phpmyadminhosts header_regexp phpmyadmin host (test|blog|basil|sachika)\.udance\.com\.au
    route @phpmyadminhosts {
      import authorise /phpmyadmin*
    }

# Enable HSTS for Nextcloud sites
    @hstshosts header_regexp hsts host (cloud|nc-apache|nc-fpm)\.udance\.com\.au
    header @hstshosts Strict-Transport-Security "max-age=31536000;"

# Fix when using the Nextcloud+Apache Docker image with Caddy.
    @nc-apachefix host nc-apache.udance.com.au
    route @nc-apachefix {
      redir /.well-known/carddav /remote.php/carddav 301
      redir /.well-known/caldav /remote.php/caldav 301
    }

### Reverse Proxies ###

# Docker

    @collabora host collabora.udance.com.au
    @nc-apache host nc-apache.udance.com.au
    @nc-fpm host nc-fpm.udance.com.au
    @office host office.udance.com.au
    @portainer host portainer.udance.com.au
    @tc123 host tc123.udance.com.au
    @tc132 host tc132.udance.com.au
    @truecommand host truecommand.udance.com.au
    @wordpress host wordpress.udance.com.au

    reverse_proxy @collabora    10.1.1.13:9980
    reverse_proxy @nc-apache    10.1.1.13:8030
    reverse_proxy @nc-fpm       10.1.1.13:8031
    reverse_proxy @office       10.1.1.13:8880 10.1.3.132:9006
    reverse_proxy @portainer    10.1.1.13:9000
    reverse_proxy @tc123        10.1.1.13:8082
    reverse_proxy @tc132        10.1.1.13:8084
    reverse_proxy @truecommand  10.1.1.13:8086
    reverse_proxy @wordpress    10.1.1.13:5050

# Jails

    @basil host basil.udance.com.au
    @blog host blog.udance.com.au
    @cloud host cloud.udance.com.au
    @file host file.udance.com.au
    @heimdall host heimdall.udance.com.au
    @rslsync host rslsync.udance.com.au
    @sachika host sachika.udance.com.au
    @test host test.udance.com.au

    reverse_proxy @basil        10.1.1.56
    reverse_proxy @blog         10.1.1.54
    reverse_proxy @cloud        10.1.1.29
    reverse_proxy @heimdall     10.1.1.23
    reverse_proxy @rslsync      10.1.1.22:8888
    reverse_proxy @sachika      10.1.1.57
    reverse_proxy @test         test.lan

    reverse_proxy @file https://file.lan {
      header_up Host {http.reverse_proxy.upstream.hostport}
      header_up X-Forwarded-Host {host}
    }
  }
}

There are basically three sections to the Caddy wildcard block. At the top is a map block; in the middle is some exception handling, and in the lower third are the reverse proxies. Some observations:

  1. I still find the map handler really useful for an ‘at a glance’ bird’s-eye view of what’s happening in the wildcard Caddy block. Time-dependent actions, like whether a site is online or not, are an active part of the map handler. More permanent subdomain specifics are tabulated in the comments section, but dealt with under exception handling.
  2. The exception handling (routing-first) section immediately following the map Caddy block serves several purposes.
    a. Handles boundary conditions and switching for the map handler.
    b. Addresses subtle differences between subdomains.
  3. In the exception handling section, I found header_regexp very useful for describing traits common to a subset of subdomains. For instance, it was an effective alternative to using a combination of map, expression logic and a snippet for phpmyadmin basic auth.
  4. Though I lost the association of subdomains and IP addresses when backends were included in the map block, I got that association back in the lower portion of the wildcard Caddy block where the reverse proxies are documented. The trick here was to match the names of the request matchers for the reverse proxies to the labels used in the map block. This form of documentation also lends itself to extra upstream servers being included for reverse proxies.
  5. A further benefit of decoupling reverse_proxy and map is the slightly less abstract logic for mTLS (last reverse_proxy in the wildcard Caddy block).

Maybe it’s just me, but I’d do it like this:

    @collabora host collabora.udance.com.au
    reverse_proxy @collabora    10.1.1.13:9980

    @nc-apache host nc-apache.udance.com.au
    reverse_proxy @nc-apache    10.1.1.13:8030
    
    @nc-fpm host nc-fpm.udance.com.au
    reverse_proxy @nc-fpm       10.1.1.13:8031
    
    @office host office.udance.com.au
    reverse_proxy @office       10.1.1.13:8880 10.1.3.132:9006
    
    @portainer host portainer.udance.com.au
    reverse_proxy @portainer    10.1.1.13:9000
    
    @tc123 host tc123.udance.com.au
    reverse_proxy @tc123        10.1.1.13:8082
    
    @tc132 host tc132.udance.com.au
    reverse_proxy @tc132        10.1.1.13:8084
    
    @truecommand host truecommand.udance.com.au
    reverse_proxy @truecommand  10.1.1.13:8086
    
    @wordpress host wordpress.udance.com.au
    reverse_proxy @wordpress    10.1.1.13:5050

i.e. keep the matcher closest to its usage. Less hopping around in the Caddyfile when you need to make a change.

You could reduce all of these to an import snippet to make them one liners as well:

(proxy-host) {
	@{args.0} host {args.0}.udance.com.au
	reverse_proxy @{args.0} {args.1}
}

(proxy-back) {
	@{args.0} host {args.0}.udance.com.au
	reverse_proxy @{args.0} {args.1} {args.2}
}

...

import proxy-host collabora 10.1.1.13:9980
import proxy-back office 10.1.1.13:8880 10.1.3.132:9006

Unfortunately, omitted arguments do not get replaced, so you can’t use just a single snippet for this with an “optional argument”; you need two to support both cases correctly.

I’d use Host (uppercase H) here for the header field; it makes it a bit clearer what it’s doing, and that it’s not the host matcher but rather the Host header.


I did agonise over the use of a snippet when I was reworking the wildcard Caddy block, but then ran into the problem of multiple upstream servers. Multiple snippets will do the trick and lead to a neater result. Thanks for that!

EDIT: A third snippet for the mTLS reverse proxy will tidy that up too!
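For completeness, this is roughly what the proxy-mtls snippet could look like, modelled on the @file reverse_proxy shown earlier (a sketch, untested):

(proxy-mtls) {
	@{args.0} host {args.0}.udance.com.au
	reverse_proxy @{args.0} https://{args.1} {
		header_up Host {http.reverse_proxy.upstream.hostport}
		header_up X-Forwarded-Host {host}
	}
}

which would then be used as import proxy-mtls file file.lan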

Noted. I’ll make sure I do that in future.

EDIT: There’s a very subtle reference to this in the header_regexp documentation example.

The final transformation for the reverse proxies, using snippets for multiple upstreams and mTLS:

### Reverse Proxies ###

# Docker

    import proxy-host   collabora       10.1.1.13:9980
    import proxy-host   nc-apache       10.1.1.13:8030
    import proxy-host   nc-fpm          10.1.1.13:8031
    import proxy-host2  office          10.1.1.13:8880 10.1.3.132:9006
    import proxy-host   portainer       10.1.1.13:9000
    import proxy-host   tc123           10.1.1.13:8082
    import proxy-host   tc132           10.1.1.13:8084
    import proxy-host   truecommand     10.1.1.13:8086
    import proxy-host   wordpress       10.1.1.13:5050

# Jails

    import proxy-host   basil           10.1.1.56
    import proxy-host   blog            10.1.1.54
    import proxy-host   cloud           10.1.1.29
    import proxy-mtls   file            file.lan
    import proxy-host   heimdall        10.1.1.23
    import proxy-host   rslsync         10.1.1.22:8888
    import proxy-host   sachika         10.1.1.57
    import proxy-host   test            test.lan

This is a much neater solution. Thanks for the feedback @francislavoie
