Dynamically set reverse_proxy upstreams from custom module

I am working on a custom module that integrates with a popular service discovery tool. The module will essentially act as an ingress and load balancer based on the request.

Instead of implementing the reverse proxy using my own httputil.NewSingleHostReverseProxy(url) I would prefer to get the hard work, benefits, and options that are offered in the core reverse_proxy module.

  1. Is it a good idea to continue my use case of httputil.NewSingleHostReverseProxy(u) despite the lack of multiple upstreams?
  2. If I ordered the custom module to be triggered before the reverse_proxy, is there a way to “set” a variable in the config upon request that I can pass onto the reverse_proxy?
  3. Is there a way to use the caddyhttp.Handler that is provided in the caddyhttp.MiddlewareHandler?

I know this is a custom module, but again would love to get some of the benefits from the awesome work already completed in reverse_proxy.

Cool!

You can embed the reverse proxy module in your own module; this would probably be much better. In config it looks like a wrapper.

But depending on what you’re doing you might want to implement a reverse proxy transport module. Can you provide more details about what your module does? Specifically?

I’m mobile right now but will answer your other questions when I get to my computer.

Thanks Matt!

One challenge I often find with service mesh and discovery tools is handling ingress. A lot of these tools, in this case Consul, are great at providing an API and DNS server for internal discovery, but they lack a real integration with a web server for ingress (e.g. external domain name points to this internal service).

What the module I am working on does is resolve the domain name (using a Caddyfile) from the Consul API to get the IPs and ports where the internal service is available, and then reverse proxy the request, also adding some useful headers, etc. However, that service might be available on more than one IP/port assignment, so httputil.NewSingleHostReverseProxy does not seem ideal. I could implement my own mini load balancer, but that does not seem like a smart idea.

I just found this page with the Module Namespaces and I’m guessing that I should focus on implementing http.reverse_proxy.transport?

The transport layer is separate from the upstream pool. The proxy module first makes a decision using the upstream pool as to which backend to contact, then once having done that, sends it through the configured transport.
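To make the separation concrete, here is a minimal sketch using only the Go standard library. It is not Caddy's implementation: the roundRobin type, newRoundRobin, and newProxy are hypothetical names, and httputil.ReverseProxy only stands in for the proxy so the two stages are visible. The selection policy decides which backend to contact; the http.RoundTripper passed as the transport is what actually sends the request.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

// roundRobin is a hypothetical selection policy. It only decides WHICH
// backend receives a request; it knows nothing about how bytes are sent.
type roundRobin struct {
	backends []*url.URL
	counter  uint64
}

// newRoundRobin parses the given base URLs into a pool of backends.
func newRoundRobin(addrs ...string) (*roundRobin, error) {
	if len(addrs) == 0 {
		return nil, fmt.Errorf("at least one upstream is required")
	}
	rr := &roundRobin{}
	for _, a := range addrs {
		u, err := url.Parse(a)
		if err != nil {
			return nil, fmt.Errorf("parsing upstream %q: %w", a, err)
		}
		rr.backends = append(rr.backends, u)
	}
	return rr, nil
}

// pick returns the next backend in rotation and is safe for concurrent use.
func (rr *roundRobin) pick() *url.URL {
	n := atomic.AddUint64(&rr.counter, 1) - 1
	return rr.backends[n%uint64(len(rr.backends))]
}

// newProxy wires the two stages together: the Director asks the policy for
// a backend, and the transport performs the round trip to that backend.
func newProxy(rr *roundRobin, transport http.RoundTripper) *httputil.ReverseProxy {
	return &httputil.ReverseProxy{
		Director: func(r *http.Request) {
			target := rr.pick()
			r.URL.Scheme = target.Scheme
			r.URL.Host = target.Host
		},
		Transport: transport,
	}
}

func main() {
	rr, err := newRoundRobin("http://192.168.64.96:30153", "http://192.168.64.96:21569")
	if err != nil {
		panic(err)
	}
	_ = newProxy(rr, http.DefaultTransport) // usable as an http.Handler

	// Show the rotation without sending any traffic.
	for i := 0; i < 3; i++ {
		fmt.Println(rr.pick().Host)
	}
}
```

Swapping the transport changes how requests are sent without touching the selection logic, which is why a transport module is not the place to choose upstreams.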

What Matt was suggesting is making your plugin have the reverse_proxy module be a submodule of it. I.e. your plugin would be a parent to the reverse_proxy module. Your plugin would be an HTTP handler module which embeds a reverse_proxy.

You can use placeholders + the replacer to set data in the request context which can later be reused in the handler pipeline. You might set a placeholder value like http.your_custom_proxy.actual_upstream, and then you’d configure the reverse_proxy to use {http.your_custom_proxy.actual_upstream} as the placeholder for the dial address.

Alternatively, it might be possible to generalize the “upstream decision” part of the reverse_proxy module to make it pluggable, so a different type of “decider” could be configured. This might be kinda tricky though, and the value might be limited. But it’s an idea, if you’re willing to give it a shot.

Okay, back at the computer!

Probably not; the standard lib reverse proxy has many limitations. Our reverse proxy – particularly the streaming code that shuttles bytes between sockets – is loosely based on that one, but our demands are more rigorous. I rewrote the v2 reverse proxy from scratch, bringing over only the portions that were useful and already well-vetted. Even some of that needs work, for example handling errors, and tuning the flushing, etc.

Yep! Seems that @francislavoie has answered this pretty well already. I will clarify, though, that the variable need not be a placeholder per se; placeholders are exposed to the user, which may or may not be what you want. Caddy has a “bucket” of variables you can set on a request, and the caddyhttp package has these nifty functions called GetVar and SetVar:

https://pkg.go.dev/github.com/caddyserver/caddy/v2@v2.2.1/modules/caddyhttp#SetVar

Under the hood, it just uses the request’s context value, but it wraps up the getting/setting a little more elegantly. (Of course, we subvert some of Go’s standard context conventions, but we also aren’t a standard library per-se and we’re allowed to just say “Well, don’t do that” when trying to use it improperly. Anyway. Not a big deal.)

Actually, these variables are also exposed as placeholders, as {http.vars.*} but that’s a side-effect of using vars, as opposed to using placeholders as the actual solution. I’d recommend using vars.
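For intuition, the "bucket in the request context" idea can be imitated with the standard library alone. This is a simplified sketch, not Caddy's source: varsKey, withVars, setVar, getVar, and newDemoRequest are hypothetical names chosen for illustration. The point it shows is that the context holds a mutable map, so a value set by one handler is visible to later handlers that share the same request context.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// varsKey is the private context key under which the bucket is stored.
type varsKey struct{}

// withVars attaches an empty, mutable variable bucket to the request.
func withVars(r *http.Request) *http.Request {
	ctx := context.WithValue(r.Context(), varsKey{}, map[string]interface{}{})
	return r.WithContext(ctx)
}

// setVar writes into the bucket found in ctx. It is a no-op when the
// context carries no bucket.
func setVar(ctx context.Context, key string, value interface{}) {
	if vars, ok := ctx.Value(varsKey{}).(map[string]interface{}); ok {
		vars[key] = value
	}
}

// getVar reads from the bucket found in ctx, or returns nil if the key
// or the bucket is missing.
func getVar(ctx context.Context, key string) interface{} {
	if vars, ok := ctx.Value(varsKey{}).(map[string]interface{}); ok {
		return vars[key]
	}
	return nil
}

// newDemoRequest builds a request that already carries a bucket, which is
// roughly what a server would do before running the handler chain.
func newDemoRequest() *http.Request {
	r, err := http.NewRequest(http.MethodGet, "http://example.com/", nil)
	if err != nil {
		panic(err)
	}
	return withVars(r)
}

func main() {
	r := newDemoRequest()
	setVar(r.Context(), "consul_upstream", "192.168.64.96:30153")
	fmt.Println(getVar(r.Context(), "consul_upstream"))
}
```

Mutating a map stored in a context is exactly the kind of convention-bending mentioned above; it works here because the bucket is attached once and every handler sees the same map.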

I’m not sure what you mean by this, nor why this is useful to you?

Francis already clarified, but I will go into more detail.

The reverse_proxy module literally embeds the headers module:

That exposes all the same config surface to the user as if they were configuring the actual Headers handler separately.

Then we also wrap the provisioning:

This sets it up when the server starts (or the config loads).

And then here we finally use it when proxying, once for the request, once for the response:

And that’s it!

So, you can, if you need to, embed the reverse_proxy handler in your module if you find it useful. I just showed you one example of where something similar is done.

A few suggestions before going much further:

  • Always think in terms of JSON config first, then Caddyfile second. The JSON config is what really matters. One way or another, you can probably express it in Caddyfile, and it’s OK to think about how it would look in the Caddyfile, but always start with the JSON structure.

  • Think of the right place for the functionality you want to accomplish. It sounds like you need to manipulate the list of Upstreams. One way to do this is external to the entire config, i.e. some sort of admin plugin that can watch the Consul API and get notified when upstreams change, then update the config (add/remove upstreams to the list in the JSON) and reload it. Another way would be to make a type that literally embeds the reverseproxy.Handler type (so that the JSON config looks exactly the same) but all it does is set the Upstreams list for the user:

type Handler struct {
    *reverseproxy.Handler
    Consul string `json:"consul,omitempty"`
}

Or something like that. But, I am still unclear on the exact requirements and vision, so maybe this is how it works, or maybe not. Just an idea.
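To see why embedding keeps the JSON shape the same, here is a small standard-library sketch. Upstream and ReverseProxy below are simplified stand-ins I made up for the demonstration, not Caddy's real types; only the embedding pattern mirrors the struct above. Go's encoding/json promotes the fields of an embedded struct into the outer JSON object, so the wrapper's own field sits next to the embedded handler's fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Upstream and ReverseProxy are simplified stand-ins for illustration only.
type Upstream struct {
	Dial string `json:"dial,omitempty"`
}

type ReverseProxy struct {
	Upstreams []Upstream `json:"upstreams,omitempty"`
}

// Handler embeds the stand-in proxy, so its JSON fields are promoted to
// the same level as the wrapper's own "consul" field.
type Handler struct {
	*ReverseProxy
	Consul string `json:"consul,omitempty"`
}

// encode marshals the handler to a JSON string.
func encode(h Handler) (string, error) {
	b, err := json.Marshal(h)
	return string(b), err
}

func main() {
	h := Handler{
		ReverseProxy: &ReverseProxy{
			Upstreams: []Upstream{{Dial: "192.168.64.96:30153"}},
		},
		Consul: "http://192.168.64.95:8500",
	}
	out, err := encode(h)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

In this sketch the output is a single flat object with both "upstreams" and "consul" keys, which is the property that lets a wrapper module look like the handler it embeds.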

On second thought, I do not think a Transport module is right for this use case. A Transport is for doing a RoundTrip, not choosing upstreams.

I think this route makes the most sense. What I’m stuck on is how the config would look for the module… Would embedding the reverse_proxy then look like the following?

:3000 {
    consul_proxy {
        api http://192.168.64.95:8500
        reverse_proxy {http.consul_proxy.actual_upstream}
    }
}

With this setup, I have a few additional questions:

  1. I would assume that as soon as I set the placeholder, I would then call next.ServeHTTP, and the module would then be a simple middleware/chain before the reverse_proxy. Ideally I would like to set all of the possible upstreams.
  2. If I am writing my module to set the placeholders, do I really need to define a custom placeholder? What if I just dynamically set the placeholders the reverse_proxy is looking for? That way I would not really need to embed the reverse_proxy (maybe?)
  3. From what @matt is saying, I would still need to provision the embedded module, which I think negates my previous point (#2)

If all you have to do is set a variable, then you could just do this:

  • In your ServeHTTP, do what you need to do to get the upstream address
  • Call caddyhttp.SetVar() to set the variable
  • The user can use a placeholder like {http.vars.consul_upstream} or whatever the variable name is you choose.
  • Call Next.ServeHTTP() to pass the handler to the next one in the chain.

This is kind of how the root directive works: it’s just a middleware that sets a variable that can be used later.
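The steps above can be sketched with the standard library only. Everything named here is an assumption for illustration: the handler and handlerFunc types imitate the shape of error-returning handlers, lookup is a hypothetical stand-in for the Consul query, and a plain context value stands in for caddyhttp.SetVar. The middleware resolves the address, stores it on the request, and returns whatever the next handler returns.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// handler imitates an HTTP handler that reports failures by returning an error.
type handler interface {
	ServeHTTP(http.ResponseWriter, *http.Request) error
}

type handlerFunc func(http.ResponseWriter, *http.Request) error

func (f handlerFunc) ServeHTTP(w http.ResponseWriter, r *http.Request) error {
	return f(w, r)
}

// upstreamKey is the context key under which the resolved address is stored.
type upstreamKey struct{}

// lookup is a hypothetical stand-in for the service discovery query.
func lookup(host string) (string, error) {
	known := map[string]string{"web.domain.com": "10.0.0.5:8080"}
	if addr, ok := known[host]; ok {
		return addr, nil
	}
	return "", errors.New("no service registered for " + host)
}

// discover resolves the upstream, stores it on the request, and hands the
// request to the next handler, returning that handler's error unchanged.
func discover(w http.ResponseWriter, r *http.Request, next handler) error {
	addr, err := lookup(r.Host)
	if err != nil {
		return err
	}
	r = r.WithContext(context.WithValue(r.Context(), upstreamKey{}, addr))
	return next.ServeHTTP(w, r)
}

// run sends one request through discover and a final handler that echoes
// the stored address, which plays the role of the proxy reading the variable.
func run(host string) (string, error) {
	final := handlerFunc(func(w http.ResponseWriter, r *http.Request) error {
		addr, _ := r.Context().Value(upstreamKey{}).(string)
		_, err := fmt.Fprint(w, addr)
		return err
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "http://"+host+"/", nil)
	err := discover(rec, req, final)
	return rec.Body.String(), err
}

func main() {
	fmt.Println(run("web.domain.com"))
	fmt.Println(run("unknown.domain.com"))
}
```

Note that discover never logs or wraps the downstream error; it only returns it, leaving error handling to whatever sits above the chain.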

We could even consider the reverse_proxy handler reading from a variable like {http.vars.proxy_upstream} by default so that any handler which sets that variable earlier in the chain will automatically tell the proxy where to go. Then the user’s config could be as simple as reverse_proxy – but this does make the config much more implicit, of course… so I dunno. Just brainstorming.


The whole thing about embedding the reverse_proxy module is so that to the user, your module would work and look just like the reverse_proxy, only slightly different by setting its own upstreams. So yes, what you suggested above for config would be pretty similar, although you’d omit the reverse_proxy line, so it’d probably be more like this:

:3000 {
    consul_proxy {
        api http://192.168.64.95:8500
    }
}

Because your module already knows what the upstream will be, the user won’t have to configure it manually. That’s the benefit of wrapping/embedding it.

But if you go the route that Francis suggests, it’s probably simpler, but the user will have to manually configure their proxy to use that value. Not a big deal, easy to document; but just depends what you want to do or how you want it to work.

In this case, the users will be mostly internal, so supporting lots of configs would not make sense; the module will be shipped in a Docker image for internal use.

Here is the example ServeHTTP with the Proxy:

type Proxy struct {
	API           string                `json:"api,omitempty"`
	Token         string                `json:"token,omitempty"`

	log *zap.Logger
}

func (p Proxy) ServeHTTP(w http.ResponseWriter, r *http.Request, next caddyhttp.Handler) error {
	client, err := consulapi.NewDefaultClient()
	if err != nil {
		p.log.Error("unable to create a new client", zap.Error(err))

		return err
	}

	services, err := client.FindService(r.Host)
	if err != nil {
		p.log.Error("unable to find the service from consul", zap.Error(err), zap.String("service", r.Host))
		return err
	}

	var placeholder string
	for i, s := range services {
		// only working with the first service for now
		if i == 0 {
			a := fmt.Sprintf("http://%s:%d", s.TaggedAddresses.LanIpv4.Address, s.TaggedAddresses.LanIpv4.Port)

			placeholder = a
		}
	}

	caddyhttp.SetVar(r.Context(), "http.consul_proxy.actual_upstream", placeholder)

	if err := next.ServeHTTP(w, r); err != nil {
		p.log.Error("error passing to next handler", zap.Error(err))

		return fmt.Errorf("error passing to next handler. %w", err)
	}

	return nil
}

If I switch the var to http.vars.proxy_upstream I should just be able to return ServeHTTP correct?

Close! The way it works would be:

caddyhttp.SetVar(r.Context(), "actual_upstream", placeholder)

And then your placeholder in your config would simply become {http.vars.actual_upstream}.

And then please change the last bit to, simply: return next.ServeHTTP(w, r). Don’t do error handling in your handler, unless it’s an error handler handler. :wink:

You might also want to wrap your other return err lines with return caddyhttp.Error(http.StatusWhatever, err) for richer, more conventional error messages.

Alright, that got me a lot closer, but now I get an EOF error. I verified the services are running on the port, but this is what I get now:

The error handling was for my debugging :wink:

Maybe it would make sense to post the whole file here for clarity? So here it is, minus the consulapi of course:

package proxy

import (
	"fmt"
	"net/http"
	"os"
	"strings"

	"github.com/caddyserver/caddy/v2"
	"github.com/caddyserver/caddy/v2/caddyconfig/caddyfile"
	"github.com/caddyserver/caddy/v2/caddyconfig/httpcaddyfile"
	"github.com/caddyserver/caddy/v2/modules/caddyhttp"
	// "github.com/caddyserver/caddy/v2/modules/caddyhttp/reverseproxy"
	"go.uber.org/zap"

	"github.com/org/proxy/consulapi"
)

// Interface guards
var _ caddyhttp.MiddlewareHandler = (*Proxy)(nil)

func init() {
	caddy.RegisterModule(Proxy{})
	httpcaddyfile.RegisterHandlerDirective("consul_proxy", parseCaddyfileHandlerDirective)
}

type Proxy struct {
	API           string `json:"api,omitempty"`
	Token         string `json:"token,omitempty"`
	PrimaryDomain string `json:"primary_domain,omitempty"`
	// *reverseproxy.Handler

	log *zap.Logger
}

func (Proxy) CaddyModule() caddy.ModuleInfo {
	return caddy.ModuleInfo{
		ID:  "http.handlers.consul_proxy",
		New: func() caddy.Module { return new(Proxy) },
	}
}

func (p Proxy) Validate() error {
	if p.API == "" {
		return fmt.Errorf("missing `api` in `consul_proxy`")
	}

	return nil
}

func (p *Proxy) Provision(ctx caddy.Context) error {
	p.log = ctx.Logger(Proxy{})

	return nil
}

func (p *Proxy) UnmarshalCaddyfile(d *caddyfile.Dispenser) error {
	d.NextArg()
	
	for d.NextBlock(0) {
		switch d.Val() {
		case "api":
			if !d.AllArgs(&p.API) {
				return d.ArgErr()
			}
		case "primary_domain":
			if !d.AllArgs(&p.PrimaryDomain) {
				return d.ArgErr()
			}
		default:
			return fmt.Errorf("unknown option `%s` in `consul_proxy`", d.Val())
		}
	}

	return nil
}

func (p Proxy) ServeHTTP(w http.ResponseWriter, r *http.Request, next caddyhttp.Handler) error {
	var host string
	switch strings.Contains(r.Host, "."+p.PrimaryDomain) {
	case false:
		p.log.Error("only subdomains are supported at this time")

		return fmt.Errorf("only subdomains are currently supported")
	default:
		h := strings.TrimSpace(r.Host)
		host = strings.Split(h, ".")[0]
	}

	// TODO remove this hack
	_ = os.Setenv("CONSUL_API", p.API)

	// find the available ips/ports for the service
	client, err := consulapi.NewDefaultClient()
	if err != nil {
		p.log.Error("unable to create a new consul api client", zap.Error(err))

		return err
	}

	services, err := client.FindService(host)
	if err != nil {
		p.log.Error("unable to find the service from consul", zap.Error(err), zap.String("service", host))
		
		return err
	}

	// setup the robots.txt options
	w.Header().Set("X-Robots-Tag", "noindex, nofollow")
	if r.URL.Path == "/robots.txt" {
		// each robots.txt directive must be on its own line
		_, err = w.Write([]byte("User-agent: *\nDisallow: /\n"))
		return err
	}

	var placeholder string
	for i, s := range services {
		if i == 0 {
			placeholder = fmt.Sprintf("%s:%d", s.TaggedAddresses.LanIpv4.Address, s.TaggedAddresses.LanIpv4.Port)
		} else {
			placeholder = placeholder + " " + fmt.Sprintf("%s:%d", s.TaggedAddresses.LanIpv4.Address, s.TaggedAddresses.LanIpv4.Port)
		}
	}

	// example placeholder for single 192.168.64.96:30153
	caddyhttp.SetVar(r.Context(), "upstream", placeholder)

	p.log.Info("setting upstream to " + placeholder)

	return next.ServeHTTP(w, r)
}

func parseCaddyfileHandlerDirective(h httpcaddyfile.Helper) (caddyhttp.MiddlewareHandler, error) {
	var p Proxy
	err := p.UnmarshalCaddyfile(h.Dispenser)
	return p, err
}

The Caddyfile looks like this:

{
    order consul_proxy first
}

:3000 {
    encode zstd gzip
    consul_proxy {
        api http://192.168.64.95:8500
        primary_domain domain.com
    }
    reverse_proxy * {http.consul_proxy.upstream}
}

You don’t happen to be proxying to Caddy, are you? It looks like some sort of infinite loop…

No, but you did give me an idea to go down a rabbit hole. I am using Multipass which is running the Ubuntu server and Docker containers. I thought maybe the Hyperkit networking was messing up the requests.

I decided to hard code the reverse proxy upstreams and they worked as expected:

{
   order consul_proxy first
}

:3000 {
    encode zstd gzip

    consul_proxy {
        api http://192.168.64.95:8500
    }

    respond /test 200 {
    	body {http.vars.consul_upstream}
    	close
    }

    reverse_proxy / {
        to 192.168.64.96:30153 192.168.64.96:21569
    }
}

I also added the /test path to verify the var was set correctly, and it was. However, passing the var to the reverse proxy caused a strange error. Below is the Caddyfile:

{
   order consul_proxy first
}

:3000 {
    encode zstd gzip

    consul_proxy {
        api http://192.168.64.95:8500
    }

    respond /test 200 {
    	body {http.vars.consul_upstream}
    	close
    }

    reverse_proxy / {
        to {http.vars.consul_upstream}
    }
}

The var is still set correctly and confirmed by accessing /test, but the Caddy port appears to be appended to the reverse proxy upstream.

http.log.error making dial info: upstream {http.vars.consul_upstream}:80: invalid dial address 192.168.64.96:30153:80: address 192.168.64.96:30153:80: too many colons in address {"request": {"remote_addr": "127.0.0.1:53134", "proto": "HTTP/2.0", "method": "GET", "host": "some.domain.com", "uri": "/", "headers": {"Accept": ["text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"], "Upgrade-Insecure-Requests": ["1"], "User-Agent": ["Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15"], "Accept-Language": ["en-us"], "Accept-Encoding": ["gzip, deflate"]}, "tls": {"resumed": false, "version": 772, "cipher_suite": 4865, "proto": "h2", "proto_mutual": true, "server_name": "some.domain.com"}}, "duration": 0.002350082, "status": 502, "err_id": "z5mi3frax", "err_trace": "reverseproxy.(*Handler).ServeHTTP (reverseproxy.go:388)"}

Stack trace points to the error when trying to get the dial info:

dialInfo, err := upstream.fillDialInfo(r)

I’m going to dive through that and see if I can identify why that is throwing an error.

Gah. Sounds like a regression from the following PR (and the related ones) for the Caddyfile.

https://github.com/caddyserver/caddy/pull/3780

If you caddy adapt --pretty your Caddyfile config, you’ll probably notice that the :80 is added by the Caddyfile to the JSON config. I think we should avoid that if a placeholder is used? I guess? It’s tricky because it’s ambiguous whether the placeholder will add a port or not.

Just verified and you are correct, it is appending :80. Would you like me to file an issue on Github with a simple example and link to this post?

Maybe just me being naive but I do like the concept of setting a placeholder and passing on the request, specifically for things like reverse_proxy. It feels a little bit like having super powers if you know the placeholder names and has the same approach as the “cloud native” flow of setting an environment var to define the behavior.

As an example, if I knew I had to set the following placeholders to configure the reverse_proxy in an upstream module, I could just define any of the placeholders listed here and they would be given priority. I’m sure that is easier said than done but still an interesting idea.

As far as the current issue, would it make sense to embed the reverse_proxy into the proxy struct and manually configure the upstreams?

@matt told me in a DM that he’ll look into it next week, but if you’re impatient, you could try to implement a fix:

^ that’s where the 80 is being added in. Basically, this should probably only happen if the port is empty AND the host part isn’t a placeholder (i.e. doesn’t have {, a check like !strings.Contains(host, "{") might be good enough).
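A minimal sketch of that check, using only the standard library. The function name withDefaultPort and its signature are hypothetical, and this is not the code from Caddy's Caddyfile adapter; it only illustrates the rule described above: add the default port when no port is present and the host does not look like a placeholder.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// withDefaultPort is a hypothetical helper. It appends defaultPort to addr
// only when addr has no port AND its host part does not contain a
// placeholder, since a placeholder may expand to host:port at request time.
func withDefaultPort(addr, defaultPort string) string {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		// No port could be split off, so treat the whole input as the host.
		host, port = addr, ""
	}
	if port == "" && !strings.Contains(host, "{") {
		return net.JoinHostPort(host, defaultPort)
	}
	return addr
}

func main() {
	for _, addr := range []string{
		"192.168.64.96",
		"192.168.64.96:30153",
		"{http.vars.consul_upstream}",
	} {
		fmt.Printf("%s -> %s\n", addr, withDefaultPort(addr, "80"))
	}
}
```

With this rule the placeholder is left untouched, so the value it expands to (for example 192.168.64.96:30153) no longer gets a second port appended, which is what produced the "too many colons in address" error in the log above.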

Submitted a PR: “check if the upstream is a placeholder” by jasonmccallister (Pull Request #3819, caddyserver/caddy on GitHub)

Thanks for the fix! I couldn’t think of a better way to do it, and it seems to work well and make sense, so I went ahead and merged it in.
