Map directive and regular expressions

Hi!

I worked with the map directive and found the following:

Using the example from the map directive docs page verbatim, only adding

~(bar) ABC 99

right before the default line, when I request https://bar.demodomain.com the variables are set as follows:

{my_placeholder} = ABC.demodomain.com
{magic_number} = 99.demodomain.com

I expected ABC and 99, as I did not reference any capture group such as ${1}. I assumed that, without such references, regexp lines also make the mapping act like “found something, now simply set the desired value to the destination/placeholder”.

Do I misunderstand the directive? Changing the regexp to something more specific is not my point of confusion here, just why it acts more like search/replace.

Please provide a complete example to reproduce what you’re seeing.

And next time, please fill out the help topic template. Your post is lacking critical information that we need to make sure we’re on the same page, such as the Caddy version, how you’re running Caddy, etc.

Sorry Francis for leaving out the template. Here it is:

1. Caddy version (caddy version):

2.4.5

2. How I run Caddy:

a. System environment:

Ubuntu 20.04.3 LTS

b. Command:

# extend localhost line with abc.localhost:
# 127.0.0.1 localhost abc.localhost
sudo vim /etc/hosts
caddy run
curl -I https://abc.localhost

c. Service/unit/compose file:

orig from ppa

d. My complete Caddyfile or JSON config:

ABC.localhost, localhost {

    map {host}             {my_placeholder}  {magic_number} {
        example.com        "some value"      3
        ###.example.com    "another value"
        (.*)\.example.com  "${1} subdomain"  5

        ~.*\.net$          -                 7
        ~.*\.xyz$          -                 15

        ~(abc)             "ABC"             99

        default            "unknown domain"  42
    }

    header mapresult {my_placeholder}/{magic_number}

}

3. The problem I’m having:

I don’t understand the map result. The map directive. The regexp flavour. Or all of it :slight_smile:

I expected the result to be

mapresult: ABC/99

not

mapresult: ABC.localhost/99.localhost

as I did not use any ${capture_group} for the placeholder/destinations.

4. Error messages and/or full log output:

$ curl -I https://abc.localhost
HTTP/2 200
mapresult: ABC.localhost/99.localhost
server: Caddy
date: Fri, 08 Oct 2021 06:12:48 GMT

5. What I already tried:

I also tried replacing the line with the “ABC” value with:

~(.*) "DEF" 123

curl -I https://abc.localhost then results in

mapresult: DEF/123

6. Links to relevant resources:

map directive

Please note one can not post foo dot example dot com so I replaced foo in my template above with three #

Addendum: Is it possible to use case insensitive matching when using the regular expressions with map?

Thanks for filling out the template!

Okay, I see what’s going on. I found it easy to replicate with this config:

:8881 {

    map {path}      {my_placeholder}    {magic_number} {
        ~(/abc)     "ABC"               11
        ~(/xyz).*   "XYZ"               22

        default     "unknown"           42
    }

    header mapresult {my_placeholder}/{magic_number}
}

Then making requests like this:

$ curl -v localhost:8881/abcdef
< Mapresult: ABCdef/11def

$ curl -v localhost:8881/xyzdef
< Mapresult: XYZ/22

So basically, we’re using the Regexp.ReplaceAllString function to perform the replacements, and it only replaces the part of the input that the regexp actually matched: if the match doesn’t completely cover the input string, the unmatched remainder of the input stays in the output.

Fully consuming the rest of the input with .* gives the desired result.
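
The behaviour described above can be reproduced outside of Caddy with Go's regexp package alone. This is a minimal sketch, not Caddy's actual map code; the helper name replaceAll is made up for illustration, and the patterns and inputs are the ones from the test config above.

```go
package main

import (
	"fmt"
	"regexp"
)

// replaceAll is a hypothetical helper: it compiles pattern and replaces every
// match in input with repl. Text that the pattern did not match is left as-is.
func replaceAll(pattern, input, repl string) string {
	return regexp.MustCompile(pattern).ReplaceAllString(input, repl)
}

func main() {
	// Only "/abc" is matched, so the trailing "def" survives the replacement.
	fmt.Println(replaceAll(`(/abc)`, "/abcdef", "ABC")) // ABCdef
	fmt.Println(replaceAll(`(/abc)`, "/abcdef", "11"))  // 11def

	// The trailing ".*" consumes the rest of the input, so nothing is left over.
	fmt.Println(replaceAll(`(/xyz).*`, "/xyzdef", "XYZ")) // XYZ
	fmt.Println(replaceAll(`(/xyz).*`, "/xyzdef", "22"))  // 22
}
```

The first two results line up with the Mapresult: ABCdef/11def header, the last two with Mapresult: XYZ/22.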

Yeah, you can use (?i) at the start of your regexp string to enable case insensitive mode. For example, ~(?i)(/xyz).* will match /xyz and /XYZ
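
A quick way to check the (?i) flag is to run the pattern through Go's regexp package directly. This is only a sketch; the helper name matchesPath is hypothetical, and the leading ~ is left out because it is Caddyfile syntax for marking a regexp, not part of the pattern itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesPath is a hypothetical helper that reports whether pattern
// matches anywhere in path.
func matchesPath(pattern, path string) bool {
	return regexp.MustCompile(pattern).MatchString(path)
}

func main() {
	// Without the flag, matching is case sensitive.
	fmt.Println(matchesPath(`(/xyz).*`, "/XYZ")) // false

	// With (?i) at the start, both spellings match.
	fmt.Println(matchesPath(`(?i)(/xyz).*`, "/xyz")) // true
	fmt.Println(matchesPath(`(?i)(/xyz).*`, "/XYZ")) // true
}
```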

Hi Francis,

thanks for giving it a try. So I guess I have to rethink what I am trying to accomplish (more on that below).
I thought I could use the map to set variables when the regexp fires and use the result in a boolean way in other directives, e.g. to trigger errors or handle rewrites. Like in the nginx map.

Yeah, you can use (?i) at the start of your regexp string to enable case insensitive mode. For example, ~(?i)(/xyz).* will match /xyz and /XYZ

Perfect! It would be great if the docs could always have a small hint like “… to enable case insensitive mode, place a (?i) at the start of your regular expression” wherever regexes are available or mentioned. There are not that many flavours out there, but it would certainly help to know how to accomplish it. What do you think?

So I found this post on market visibility and also this older one about web app integration. Using the mentioned 6G/7G firewall myself with both Apache and nginx projects, I thought about converting it for Caddy. While some parts are trivial to migrate, like HTTP verb blocking, others look more difficult. For example, I wanted to use the map directive to inspect the {query} variable.
It took me about two hours to find out that I did not map the result to a simple value like 1 but to something like 1=longqueryfoundbyregexp. The handle_errors etc. I wrote then never fired.

I’m not sure if it makes sense to rewrite the regexes of the 7G firewall just to work around the ReplaceAllString behaviour. It might result in unwanted side effects. Maybe I should ask the author up front if he has a test suite to mitigate this.

I am a bit surprised by that result too. Why are we putting the value of {host} into {my_placeholder}? That seems like a bug, isn’t it Francis? (I know I wrote the code, but what was I thinking? Tell me, Francis. :slight_smile: )

Maybe, but I’d rather just link to the re2 syntax.

Anyway, maybe we need to not use ReplaceAllString…

I just finished a little bash script that can sed-transform the 7G rules file for Nginx (that already uses a map similar to Caddy). Only two of the regexp patterns throw an error so far which is interesting to watch in regex101.com because those lines might be incorrect anyway.

Running something like curl -I "https://localhost/?path=../../../etc/passwd" currently results in (I prepared a header output) 7g-query: path=../../../15 where 15 alone should be the final result.

This should be fixed in map: Fix regex mappings · caddyserver/caddy@95c0350 · GitHub – please try it out! :slight_smile:

(Disclaimer: I’m not very proficient with the regexp package, so I may be doing it wrong.)

Thank you! I built the latest caddy with your patch. The initial tests look good, but I have one result where the value is concatenated multiple times into the destination variable (e.g. expected value “3”, got “333333”, as if 6 matches had each appended the value). This could definitely be a bug in my test setup. I will dive into that soon when I have more time.

Great, thanks for trying it out. If you could share your config to reproduce the issue, that’d be helpful as I can add test cases.

Here’s a link to the Repo with examples. I’ll provide an example where I think a problem still exists soon.

Ok. When I run the Caddyfile mentioned in the linked repo and run the following request via curl:

curl -I "https://localhost/?s=0%27+AND+%28SELECT+0+FROM+%28SELECT+count%28%2A%29%2C+CONCAT%28%28SELECT+%40%40version%29%2C+0x23%2C+FLOOR%28RAND%280%29%2A2%29%29+AS+x+FROM+information_schema.columns+GROUP+BY+x%29+y%29+-+-+%27"

(which is just a URL-encoded random pseudo SQL blob) the debug header outputs:

7gbadbot: 0
7gbadmethod: 0
7gbadreferer: 0
7gbadrequest: 33333333333333333333333
7gfired: 1
7gquery: 35

meaning: the map for {bad_querystring_7g} and for {bad_request_7g} matched a pattern and both set {7gfired}.

I remove everything from the 7g-caddy.snippet file so that it only contains

map  {uri} {bad_request_7g} {7g_fired} {
	default 0 0
	~(?i)(\^|`|<|>|%|\\|\{|\}|\|) 3 1
}

Re-running the curl command above, the output is then:

7gbadbot:
7gbadmethod:
7gbadreferer:
7gbadrequest: 33333333333333333333333
7gfired: 11111111111111111111111
7gquery:

When I look at the patch I assume the current implementation is “For each match of the regex in the content … Apply the captured submatches to the template and append the output to the result” (to quote the docs).
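
To illustrate that assumption: the following Go sketch shows how appending one expansion per match produces the repeated value, next to a variant that expands only the first match. It is not taken from the Caddy patch; the helper names expandPerMatch and expandFirstMatch are hypothetical, and the input is a shortened version of the curl query above with two % characters in it.

```go
package main

import (
	"fmt"
	"regexp"
)

// expandPerMatch (hypothetical) appends the expanded template once for
// every match of pattern in input.
func expandPerMatch(pattern, input, template string) string {
	re := regexp.MustCompile(pattern)
	var result []byte
	for _, m := range re.FindAllStringSubmatchIndex(input, -1) {
		result = re.ExpandString(result, template, input, m)
	}
	return string(result)
}

// expandFirstMatch (hypothetical) expands the template for the first match
// only and returns "" when nothing matches.
func expandFirstMatch(pattern, input, template string) string {
	re := regexp.MustCompile(pattern)
	m := re.FindStringSubmatchIndex(input)
	if m == nil {
		return ""
	}
	return string(re.ExpandString(nil, template, input, m))
}

func main() {
	// Same pattern as in the reduced snippet, written as an interpreted
	// string because it contains a backtick.
	pattern := "(?i)(\\^|`|<|>|%|\\\\|\\{|\\}|\\|)"
	uri := "/?s=0%27+AND+%28SELECT" // contains two '%' characters

	fmt.Println(expandPerMatch(pattern, uri, "3"))   // 33
	fmt.Println(expandFirstMatch(pattern, uri, "3")) // 3
}
```

The full query from the curl command contains 23 % characters, which would fit the 23 repetitions seen in the 7gbadrequest header.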

I expected the result to be

...
7gbadrequest: 3
7gfired: 1
...

Ok, thanks for the specific test case; I’ve reproduced it in the unit tests and will push a fix when I figure it out.

@bernd Okay, I think I have a better fix pushed here: map: Fix 95c03506 (avoid repeated expansions) · caddyserver/caddy@a2119c0 · GitHub

If you could try it out that would be great. I added your test case to the corpus and it passes now.

Hey Matt, looks good. No repetitions anymore!

Hah, and I also noticed a silly conceptual bug in my example file: “do not use the same destination variable in multiple maps in combination with default values”: in one map a matched pattern sets the variable to 1 while in a following map the same variable is (re-)set to the given default 0 because no pattern matched :roll_eyes: :laughing:

Great, thanks for testing that. Guess we’ll see if I caused regressions for anyone else then.