`http.proxy` and non-GET retries

DeanMarkTaylor · October 3, 2019, 7:50pm

I’m currently trying to confirm how Caddy v1.0.3 handles proxy retries for GET requests vs POST requests.

The Questions

Is there a way to set up a proxy so that it handles only GET (read) requests and passes other (write) requests, such as POST, to another proxy?
Do Caddy’s proxy retries handle HTTP verbs (GET, POST, etc.) differently?

Thoughts

I’m currently reading through the proxy documentation and having a cursory look at the code.

Regarding Caddy “proxy retries”, and assuming requests are modelled in a REST-like fashion (the exception being where POST is not idempotent, e.g. WordPress), I’m led to the following initial thoughts:

There is no difference between the handling of GET (read) requests vs. POST, PUT, etc (write) requests.
By default, it would be fine to retry GET requests, as no modifications are being made.
If there is a connection delay, issue, or error between Caddy and the upstream server, a POST (write) request may be partially or fully processed by one upstream server while another (retry) request is made during or after that processing.
For “proxy retries” to be useful for POST (or any write) requests, the upstream server must handle them in an idempotent fashion; otherwise data risks being submitted multiple times.
It’s not possible to set the “proxy” part of Caddy to only retry GET requests.

Test case?

As a simple test case, consider submitting a simple contact form (no idempotency keys etc.): a single user submission of contact details could be “posted” to each proxy server in turn, causing the same details to be stored in the database multiple times.
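To make the failure mode concrete, here is a small, network-free Go simulation. It assumes each upstream writes to a shared database before it answers, and that a timed-out attempt is silently retried on the next upstream. `store`, `upstream`, and `proxyPOST` are hypothetical names; no Caddy code is involved.

```go
package main

import (
	"errors"
	"fmt"
)

// store stands in for the database shared by every upstream.
type store struct{ rows []string }

// upstream saves the submission and then, if slow is set, fails to answer
// before the proxy's timeout fires.
type upstream struct {
	slow bool
	db   *store
}

var errTimeout = errors.New("upstream timed out")

func (u upstream) handle(body string) error {
	u.db.rows = append(u.db.rows, body) // the side effect happens first
	if u.slow {
		return errTimeout // ...but the proxy never sees a response
	}
	return nil
}

// proxyPOST tries each upstream in turn. With retry=true it behaves like a
// method-agnostic retry policy; with retry=false it stops after one attempt.
func proxyPOST(body string, ups []upstream, retry bool) error {
	var err error
	for _, u := range ups {
		if err = u.handle(body); err == nil || !retry {
			return err
		}
	}
	return err
}

func main() {
	db := &store{}
	ups := []upstream{{slow: true, db: db}, {slow: true, db: db}, {db: db}}

	_ = proxyPOST("name=Dean", ups, true)
	fmt.Println("with retries:", len(db.rows))

	db.rows = nil
	err := proxyPOST("name=Dean", ups, false)
	fmt.Println("without retries:", len(db.rows), err)
}
```

With retries enabled, one form submission ends up stored three times; without them it is stored once and the user sees an error instead.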

Side note: I’m pretty sure we see this kind of effect in the wild (not with Caddy), where a user submits a comment to a forum or YouTube once, but the same comment appears multiple times.

Caddy 2?

If this isn’t possible with Caddy 1.x, does Caddy 2 already have this feature?

Whitestrake · October 3, 2019, 11:29pm

Not in Caddy v1. You could extend it yourself to have this behaviour if you wanted to get handy with Go code.

I don’t believe there’s any distinction made between method.

Yes. You can define an entirely separate responder based on request matchers, including method, in Caddy 2. (It even lets you share middleware over different matchers on the same request, too, so you can keep most of the site configuration shared and just specify a separate responder.)

matt · October 4, 2019, 3:01am

Actually, this is half-done locally; I just never got it pushed. You can configure request matchers for proxy retries just like you do for routes. By default, only GET will match. You can do "method": ["GET"] (which will be the default), "path": "/idempotent", or "header": {"Idempotency-Key": ...}, so you can define only certain requests as retry-able.

DeanMarkTaylor · October 4, 2019, 9:26pm

Thanks, great to hear of improvements coming with Caddy 2.

Just to confirm: right now, can I match on the “method” verb alone with Caddy 2 (without your not-yet-pushed changes)?

Basically, I want to ensure double POSTing won’t occur, and that GET requests are retried on another server if a server doesn’t respond within a nominal period.

Under load, one server can take 30-60 seconds to respond to a simple static file GET request, while another server responds in under 2 seconds. Setting a timeout of 2 seconds could return that same content to the user in under 4 seconds rather than 30-60 seconds.

This is just one example of an issue this might help with.

matt · October 4, 2019, 11:10pm

In Caddy 2, you define the server’s HTTP behavior by specifying routes (see the caddyserver/caddy repository on GitHub).

Routes have matchers and handlers. If the matchers match the request, the associated handlers are evaluated. A handler would be, for example, a reverse_proxy responder.

But because Caddy 2 is modular (yay), we’re working on embedding those matchers into the reverse proxy directly, so that you can conditionally retry the request based on whatever matchers you specify within the proxy (as opposed to for the entire encapsulating route). This is a really powerful feature. Matchers can match on, well, LOTS of stuff (and more to come); scroll through the linked section of the caddyserver/caddy repository on GitHub to see all of them.

Once this change is done, you can define a single proxy handler to conditionally retry the requests, as opposed to needing two routes/handlers (which is currently what you have to do, because matchers toggle route evaluation).

matt · October 5, 2019, 5:31pm

@DeanMarkTaylor Is it OK if the proxy retries if the error is specifically a connection failure? i.e. the Dial() to the upstream failed, so it’s known that no HTTP request was received by the upstream at all.

Said another way, the proxy can retry if the Dial fails, but checks the matchers if the connection succeeded, and only if the matchers match, will it retry the request.

DeanMarkTaylor · October 5, 2019, 5:42pm

I’m unsure what exactly you mean by “Dial” but perhaps if I put it this way…

Retrying would be fine up until the point actual data is sent, i.e. before any of the following are sent: headers, URL, POST data.
That way, nothing can cause the submission to occur multiple times.

matt · October 5, 2019, 7:31pm

Right, dialing is what happens before an HTTP request can be sent, because dialing establishes a connection.

Just wanted to make sure that simply attempting a connection would not alter state on the upstream.

matt · October 5, 2019, 10:25pm

@DeanMarkTaylor I’ve implemented this as retry_match in reverse_proxy: commit be7abda in caddyserver/caddy on GitHub, “Implement retry_match; by default only retry GET requests”.

However, I’m headed out the door for now and haven’t had a chance to test it. I’ll do more testing later – but for now, would you please give it a try ASAP?

This is the default setting:

{
    "handler": "reverse_proxy",
    "load_balancing": {
        "retry_match": [{
            "method": ["GET"]
        }]
    }
}

You can use any group of matcher sets for the value of retry_match. In other words, you can match on path, headers, … anything about the request, really, just like HTTP routes.

Take a look at the diff and see if the logic I chose is agreeable as well.

Let me know how it goes!

DeanMarkTaylor · October 6, 2019, 12:43am

@matt I’m just getting one of my instances moved over to Caddy 2 for testing, so I won’t be the super-fast mover in testing this one.

Hoping to have the instance up and running before the weekend is out.

Could you confirm for me that CADDYPATH=/etc/ssl/caddy, where my Let’s Encrypt issued certificates are stored, can be kept the same between Caddy 1 and Caddy 2, so I can flip-flop between them during testing?

Really want to avoid obtaining new certs.

This sounds great though!

matt · October 6, 2019, 3:23am

Caddy 2 doesn’t use environment variables for configuration, with the exception of $XDG_DATA_HOME (and a few other miscellaneous ones that aren’t Caddy-specific) or, if that is unset, $HOME, for choosing the default storage location. In other words, $CADDYPATH is not recognized. We want to keep all the config in one place, which is a JSON document.

The default storage location in Caddy 2 is $XDG_DATA_HOME/caddy or, if unset, $HOME/.local/share/caddy.

You can set another location using:

{
	"storage": {
		"module": "file_system",
		"root": "/path/to/storage"
	}
}

As documented on the Home page of the caddyserver/caddy wiki on GitHub.

system · January 4, 2020, 3:23am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.