Regular expression negative lookaheads workaround

1. The problem I’m having:

Until now I’ve been using Apache and Nginx. Now I’d like to switch to Caddy because of its h3 (HTTP/3) support.

It’s a pity that regular expression negative lookaheads aren’t supported in Caddy.
I’m trying to find a workaround, and this might turn into a feature request(?).

Here is how the config looked in Apache/Nginx:

Apache

<IfModule mod_authz_core.c>
    <LocationMatch "(^|/)\.(?!well-known/)">
        Require all denied
    </LocationMatch>
</IfModule>

Nginx

location ~* /\.(?!well-known\/) {
  deny all;
}

What works in Caddy is the following:

@block {
        path_regexp ^(\/\..*)$
        not path_regexp ^/\.well-known\/.*$
}
respond @block 403

Is this the expected way?
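For anyone who wants to check this pattern outside Caddy, here is a minimal Go sketch of the same idea: one regexp that must match and a second one that must not. The two patterns are copied from the @block matcher above; the helper name isBlocked and the sample paths are made up for illustration, and this is not how Caddy evaluates matchers internally.

```go
package main

import (
	"fmt"
	"regexp"
)

// Both patterns are copied verbatim from the @block matcher.
var (
	dotPath   = regexp.MustCompile(`^(\/\..*)$`)
	wellKnown = regexp.MustCompile(`^/\.well-known\/.*$`)
)

// isBlocked is a hypothetical helper: it emulates the negative lookahead
// by requiring the first pattern to match and the second one not to.
func isBlocked(p string) bool {
	return dotPath.MatchString(p) && !wellKnown.MatchString(p)
}

func main() {
	samples := []string{
		"/.git/config",
		"/.well-known/acme-challenge/token",
		"/index.html",
		"/sub/.env",
	}
	for _, p := range samples {
		fmt.Printf("%-36s blocked=%v\n", p, isBlocked(p))
	}
}
```

Note that both patterns are anchored at the start of the path, so a nested dot file such as /sub/.env is not caught, whereas the Apache rule with (^|/)\. would catch it.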

2. Error messages and/or full log output:

Error: sending configuration to instance: caddy responded with error: HTTP 400: {"error":"loading config: loading new config: loading http app module: provision http: server srv0: setting up route handlers: route 0: loading handler modules: position 0: loading module 'subroute': provision http.handlers.subroute: setting up subroutes: route 1: loading matcher modules: module name 'path_regexp': provision http.matchers.path_regexp: compiling matcher regexp ^/(?!\\.well-known/)\\.[^/]*($|/): error parsing regexp: invalid or unsupported Perl syntax: `(?!`"}

3. Caddy version:

2.6.4

4. How I installed and ran Caddy:

dpkg

a. System environment:

Debian 12 bookworm (testing)


This works as well:

@no_dot_files {
        not path /.well-known/*
        path /.*
}
respond @no_dot_files 403

Is such configuration safe to use in a prod environment?
This feels too easy.

Also, is something like this possible?

@blockedFiles {
        path /*.{bak,conf,dist,fla,inc,ini,log,orig,psd,sh,sql,swp}
}
respond @blockedFiles 403

Instead of

@blockedFiles {
        path_regexp (^#.*#|\.(bak|conf|dist|fla|in[ci]|log|orig|psd|sh|sql|sw[op])|~)$
}
respond @blockedFiles 403

I asked ChatGPT what it thought would be a solution and it came up with this hilarious regexp:

/\.[^w]|w[^e]|we[^l]|wel[^l]|well[^-]|well-[^k]|well-k[^n]|well-kn[^o]|well-kno[^w]/ 

That said, a simpler solution is to use the expression matcher, which would look like this:

@no_dot_files `{path}.startsWith('/.') && !{path}.startsWith('/.well-known/')`
respond @no_dot_files 403

No, the path matcher doesn’t do brace expansion. But you can just list the patterns one by one: path *.bak *.conf *.dist etc. The regexp approach is fine too (and more powerful).
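To compare the two approaches on sample paths, here is a rough Go sketch. It assumes the listed patterns behave like plain suffix checks, which is a simplification and not Caddy's actual path matcher. The suffix list comes from the brace example in the question and the regexp is the one posted there; the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Suffixes taken from the brace list in the question.
var blockedSuffixes = []string{
	".bak", ".conf", ".dist", ".fla", ".inc", ".ini",
	".log", ".orig", ".psd", ".sh", ".sql", ".swp",
}

// Regexp copied from the question's path_regexp matcher.
var blockedRe = regexp.MustCompile(`(^#.*#|\.(bak|conf|dist|fla|in[ci]|log|orig|psd|sh|sql|sw[op])|~)$`)

// hasBlockedSuffix is a hypothetical stand-in for listing *.bak *.conf ...
func hasBlockedSuffix(p string) bool {
	for _, s := range blockedSuffixes {
		if strings.HasSuffix(p, s) {
			return true
		}
	}
	return false
}

// matchesBlockedRe applies the regexp variant to the same path.
func matchesBlockedRe(p string) bool {
	return blockedRe.MatchString(p)
}

func main() {
	for _, p := range []string{"/backup.sql", "/index.html", "/notes.txt~", "/.vimrc.swo"} {
		fmt.Printf("%-14s suffix=%v regexp=%v\n", p, hasBlockedSuffix(p), matchesBlockedRe(p))
	}
}
```

The two variants are not identical: the regexp also covers .swo files and names ending in ~, which the brace list does not mention.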


Thanks for replying Francis.

Yeah I’ve actually used ChatGPT as well. :sweat_smile:

So from your perspective it’s better to use the expression matcher instead of the path /.* example I posted above?
Can you explain why?

Btw Caddy is so nice to use. I have to get used to some things but it’s great (except for the missing support for negative lookaheads :smiling_face_with_tear:)

It does the exact same thing, but the expression is shorter and more expressive because it’s an actual boolean expression. Easier to read, easier to modify.

You could even use the actual path matcher within the expression like this, which is a bit shorter but probably slightly slower because it has to do more function invocations under the hood (we’re talking about nanoseconds here, not an appreciable difference):

@no_dot_files `path('/.*') && !path('/.well-known/*')`
respond @no_dot_files 403

I gave a talk last September about this: https://youtu.be/QSerOHpMjgY?t=640

Lookaheads aren’t supported because we’re using Go, whose regexp package follows RE2’s syntax and matching approach. See https://swtch.com/~rsc/regexp/regexp3.html#analysis for an explanation of why it doesn’t support them. Essentially lookaheads (and anything with backreferences) don’t have predictable performance, so they’re risky to use in situations where the regexp might be user input. In general, regular Go code can do a better job of doing the same tasks if absolutely necessary.
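To make that concrete, here is a small Go sketch. It feeds the lookahead pattern from the error message in the first post to regexp.Compile, then does the same job with plain string checks. The helper names compiles and blockedDotPath are invented for this example and are not part of Caddy.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// blockedDotPath is a hypothetical helper that replaces the lookahead
// with two prefix checks, mirroring the expression matcher shown earlier.
func blockedDotPath(p string) bool {
	return strings.HasPrefix(p, "/.") && !strings.HasPrefix(p, "/.well-known/")
}

func main() {
	// The pattern from the error log in the first post (unescaped from JSON).
	_, err := regexp.Compile(`^/(?!\.well-known/)\.[^/]*($|/)`)
	fmt.Println("compile error:", err)

	for _, p := range []string{"/.env", "/.well-known/security.txt", "/about"} {
		fmt.Printf("%-28s blocked=%v\n", p, blockedDotPath(p))
	}
}
```

The prefix checks run in time proportional to the length of the path, which is the kind of predictability the regexp engine is designed to guarantee.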


Great questions and answers with discussion in this thread, just wanted to say :smiley:

