Bulk Matcher Advice?

1. Caddy version (caddy version):

2.2.1

2. How I run Caddy:

bin/caddy run -config assets/conf/Caddyfile

a. System environment:

macOS Catalina

b. Command:

bin/caddy run -config assets/conf/Caddyfile

c. Service/unit/compose file:


d. My complete Caddyfile or JSON config:

@blocklist {
	import blockeduseragents.conf
	import blockedreferers.conf
	import blockedproxytargets.conf
}
rewrite @blocklist /bad-actor
respond /bad-actor 403

blockeduseragents.conf

# List all bad User-Agents:
header User-Agent STRING1*
header User-Agent STRING2*

blockedreferers.conf

# List all bad Referrers:
header Referer *STRING3
header Referer *STRING4

3. The problem I’m having:

I want to build lists of known bad actor attributes and if a request matches ANY of the many configured tests I’ll rewrite and 403 as above.

I can build and format the conf file content from lists available externally.

But as a newbie I’m confused. As I understand it, all these tests will be AND’d and not OR’d… unless each test is defined in its own match set.

Am I right? Each conf file could contain a few hundred tests…

Is the V2 matcher design intended for this sort of use or should I go ahead and write a custom matcher to work on bulk configuration like this?

Maybe as a newbie I just missed the right feature to use :laughing:

4. Error messages and/or full log output:

5. What I already tried:

6. Links to relevant resources:

Okay, well I was writing out a somewhat deep explanation, then I realized halfway through that it doesn’t work how I thought it did, nor how @matt thought he implemented it (which was nearly a year ago now, when he first implemented the matchers).

So it turns out that right now, in the Caddyfile adapter, the last matcher for a specific header field in a named matcher wins. It doesn’t allow matching on multiple values for the same header field.

The JSON config is fully capable of making this work though, so I’d call this a bug that should be fixed in the Caddyfile adapter (edit: good news, it’s super easy, I’ll have a PR ready soon).

But I’ll leave what I wrote below, because I think it’s still useful, albeit incomplete… don’t feel like deleting it :cry:


I agree, it’s not perfectly clear in the documentation:

So, if you use multiple matchers of the same type (i.e. header or path in your case) in a named matcher, then it will trigger merging logic of that matcher.

For path, it’s simple, you can’t really AND two paths together, because there’s only one path value. It doesn’t make much sense to require two different paths at the same time. So for example, these two are the same:

@foo {
	path /abc*
	path /bcd*
}
@foo path /abc* /bcd*

But if we then look at header, it’s a bit more complicated. There are two dimensions to header matchers, because headers are key-value pairs: the header name (aka “field”) and its value. So what Caddy does is merge header matchers that use the same field and OR those, but AND them with the ones that use different header fields. So for example:

(cue sound of :man_facepalming: when I was typing out the example and realized the merging logic didn’t work)
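For reference, here is a minimal sketch of the behavior that paragraph describes, as it is intended to work once the merging logic is fixed. The name `@foo` is only illustrative, and STRING1, STRING2 and STRING3 are the placeholders from the question:

```
@foo {
	# Same field: these two are meant to be merged and OR'd.
	header User-Agent STRING1*
	header User-Agent STRING2*

	# Different field: AND'd with the User-Agent group above.
	header Referer *STRING3
}
```

Read as a whole, this should match when (User-Agent starts with STRING1 OR STRING2) AND Referer ends with STRING3.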

Alright here’s a PR to fix that header merging logic:

https://github.com/caddyserver/caddy/pull/3832

So yeah, the matcher docs will need cleaning up to make it less ambiguous how it behaves.


Anyways, with that out of the way, yes I think you’re on the right path with the header matching… but I do think it might be pretty inefficient if you have that many values that would need to be checked on every request. You might start to see a performance hit if you don’t make sure to keep that list in check. A single regexp might be faster if at all possible, but it depends what your goals are here.

Unfortunately you’ll probably have to wait until Caddy v2.3.0, or build from source if you want this earlier. You can grab the CI build artifact right now if you want to try it: caddyhttp: Add Caddyfile support for merging header matchers · caddyserver/caddy@38150c3 · GitHub
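On the single-regexp idea mentioned above, a rough sketch using the `header_regexp` matcher could look like the following. The matcher name `@bad_ua` and the pattern are hypothetical, built from the placeholder prefixes in the question, and whether this is actually faster than many separate header matchers would need measuring:

```
@bad_ua {
	# One regexp that covers several User-Agent prefixes at once.
	header_regexp User-Agent ^(STRING1|STRING2)
}
respond @bad_ua 403
```

Since this is a single matcher for the field, it does not depend on how multiple header matchers get merged.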

Thank you for the fast response to my question!

So because of the merging of header matchers it should work as an OR type of match across multiple values for the same header…this is good

So if I want to block bad actors on say …User-Agent I create a single matcher of multiple

header User-Agent <value>

conditions

If I also want to block bad actors on Referer, I need to create another matcher with multiple

header Referer <value>

conditions

However, I cannot combine the two into a single matcher; otherwise I’d need a match on one of the User-Agent values AND a match on one of the Referer values.

Is my understanding correct here?

Yes that’s true…

BUT, with the magic of boolean logic, you can use the not matcher to make it work!

:crazy_face:

Deeper explanation because I was on my phone earlier and didn’t want to type it all out:

So if you just put both your header matchers in a named matcher, you end up with this:

User-Agent AND Referer

But you can do this to get OR:

not (not User-Agent AND not Referer)

This is possible by De Morgan’s law: A OR B is equivalent to not (not A AND not B).

So in Caddyfile it would look like this:

@foo {
	not {
		not {
			import your-user-agent-rules
		}
		not {
			import your-referer-rules
		}
	}
}

Super dumb, but :man_shrugging:

I also wrote about this “either” logic here:

https://github.com/caddyserver/caddy/issues/3448#issue-624400876
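A plainer alternative, sketched here on the assumption that the same response is wanted for every list, is to give each list its own named matcher and repeat the handler. The matcher names are hypothetical; the imported files are the ones from the question, and each file still relies on same-field header matchers being OR’d:

```
@bad_ua {
	import blockeduseragents.conf
}
@bad_referer {
	import blockedreferers.conf
}

# A request that matches either matcher gets the 403.
respond @bad_ua 403
respond @bad_referer 403
```

This trades a little repetition for avoiding the nested not blocks.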

Thanks for the not/not matcher logic… it makes perfect sense :rofl:
Seriously, is it worth considering a simple AND/OR switch to a matcher for the sake of readability?

Yeah well, that’s what that issue I linked at the end was meant to try and help with… but it’s not that simple because we actually have to think about how it’s set up in the underlying JSON. If the JSON doesn’t support it, then the Caddyfile can’t either. We could have Caddyfile sugar for AND, but I don’t think it would really fit the current config structure. There’s no obvious way to do it correctly without breaking our parser’s assumptions.

The JSON supports many combinations of AND and OR. You just have to compose it right if your logic is complex. The Caddyfile supports only mildly complex logic, so for full flexibility you should use the JSON.

@matt well actually, the Caddyfile maps pretty much 1:1 to JSON when it comes to the not stuff. This isn’t something that JSON really lets you do regardless.

Unless you mean using more than one matcher set per handler, then yeah I guess that’s the one thing the Caddyfile can’t do right now.

Edit: Yeah that is what you mean :see_no_evil:

Yeah, I mean having full control by composing multiple matcher sets exactly as you want.

That’s an interesting point about JSON @matt and something I was wondering

Should I be using Caddyfile or JSON? Much of your documentation is expressed in terms of Caddyfile configuration no?

Our Getting Started tutorial, which we recommend everyone do, will help you decide. :slight_smile:

Also you can use a hybrid approach if you need to, if you need something only available in JSON.

Use the Caddyfile as your base, run caddy adapt --config /path/to/Caddyfile, then pipe the output into jq to make any additional changes to the config.

https://stedolan.github.io/jq/

Thanks Guys :+1:
