Add reverse_proxy to an existing Caddyfile with auth restrictions

1. Caddy version (caddy version):

v2.2.1 h1:Q62GWHMtztnvyRU+KPOpw6fNfeCD3SkwH7SfT1Tgt2c=

2. How I run Caddy:

I have a Caddy service (created automatically when I installed Caddy on Debian Buster), and I reload my Caddyfile configuration with sudo systemctl restart caddy.

a. System environment:

Debian Buster, php7.3-fpm.

b. Command:

sudo systemctl restart caddy

c. Service/unit/compose file:

[Unit]
Description=Caddy
Documentation=https://caddyserver.com/docs/
After=network.target network-online.target
Requires=network-online.target

[Service]
User=caddy
Group=caddy
ExecStart=/usr/bin/caddy run --environ --config /etc/caddy/Caddyfile
ExecReload=/usr/bin/caddy reload --config /etc/caddy/Caddyfile
TimeoutStopSec=5s
LimitNOFILE=1048576
LimitNPROC=512
PrivateTmp=true
ProtectSystem=full
AmbientCapabilities=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target

d. My complete Caddyfile or JSON config:

I’m sorry, I had to redact the domain in my Caddyfile: this website is not in production yet, and I don’t want it indexed until it’s fully ready, as it involves people who haven’t heard about the project yet.

domain.tld, www.domain.tld {
    tls mat@email.com
    
    root * /var/www/domain.tld/wordpress

    # Prevent malicious PHP uploads from running
    @uploads path_regexp path /uploads\/(.*)\.php
    rewrite @uploads /

    encode gzip

    # Restricted access to /work/ and /misc/
    # (except for direct url to files in /misc/)
    handle /work/_h5ai/private/* {
        respond 404
    }
    route /work/* {
        basicauth {
            user <<hashpass>>
        }
        @no_index not file {path}.html {path} {path}/index.html
        rewrite @no_index /work/_h5ai/public/index.php
    }
    handle /misc/_h5ai/private/* {
        respond 404
    }

    route /misc/* {
        @fileNotExists not {
            not path */
            file
        }
        basicauth @fileNotExists {
            user <<hashpass>>
        }
        @no_index not file {path}.html {path} {path}/index.html
        rewrite @no_index /misc/_h5ai/public/index.php
    }

    php_fastcgi unix//run/php/php7.3-fpm.sock   
    file_server
}

3. The problem I’m having:

So, as you can see above, my website has two subfolders with authentication, /work/ and /misc/. The latter is restricted, but anyone who has the complete URL to a file can still view it (they just can’t browse the folder with h5ai).

I am running an application that serves something on 127.0.0.1:8001, and I would like it to be accessible under the /work/ subfolder, with the same authentication. Ideally it would be reachable at a URL (a subfolder is fine; I don’t necessarily need a subdomain here) without visitors having to provide the port. I think what I need is reverse_proxy, but as a complete newbie with Caddy, I couldn’t figure out where to add that line without breaking the rest of my site, including the newly served page.

I assume this is a fairly simple issue that I could solve if I were a little more skilled with Caddy, but even after reading the documentation, I am still very confused when it comes to nesting directives, since there are always some that I don’t fully understand.

Thanks in advance for any help!

Please upgrade to Caddy v2.3.0!

Yeah you probably need reverse_proxy, but I’m not sure what paths you’re trying to handle exactly with this new app. Could you be more specific about that?

Done!

The app is datasette, so I don’t really know where its CSS/HTML files are served from when I run datasette on a given SQLite database in a given folder. Wherever the database is, I think datasette will present it with its default interface. I just need Caddy to make the corresponding port reachable at some URL inside domain.tld/work/.

Maybe a subdomain tied to :8001 would be simpler, but a subfolder in /work/ that inherits the authentication there would actually suit my needs better.
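For comparison, the subdomain option would be a separate site block. A minimal sketch, assuming a hypothetical subdomain pdb.domain.tld whose DNS record points at this server. Since datasette would be served from / on that host, no path prefix handling is needed, but the credentials are not inherited from /work/ and have to be repeated:

    # Hypothetical subdomain; needs its own DNS record pointing at this server
    pdb.domain.tld {
        # Same placeholder credentials as /work/; a separate site block
        # does not inherit them, so they are declared again here
        basicauth {
            user <<hashpass>>
        }
        # datasette is served at the root of this host
        reverse_proxy localhost:8001
    }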

Don’t worry about the port number and such; that’s not the problem. We need to know which request paths (the path segment of the requested URLs, not the filesystem paths) should be proxied to datasette. You already have existing handling for /work, so we need to know which paths to handle differently.

Oh, sorry, I misunderstood. The base folder name within /work/ is not final yet, but any placeholder will do; let’s try /work/data/pdb for now. Everything below that path should also be proxied, because datasette generates URLs by appending to the base URL as you browse its interface.

Okay, then you might do something like:

    route /work/* {   
        basicauth {
                user <<hashpass>>
        }

        reverse_proxy /work/data/pdb* localhost:8001

        @no_index not file {path}.html {path} {path}/index.html
        rewrite @no_index /work/_h5ai/public/index.php
    }

What also matters is whether you need to rewrite the paths before you proxy to it, but that depends on how the upstream works. If so, you can put another route around the reverse_proxy (inside the other one, after basicauth and before your other rewrite) and do a rewrite in that inner route before the proxy.
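For reference, the nested layout described above might look like the following sketch. The uri strip_prefix line is an assumption for an upstream that expects to be served from /; it should be left out if the application can be told its own base path. Inside a route, directives run in the order written, so the inner route and its reverse_proxy answer the request before the h5ai rewrite is reached:

    route /work/* {
        basicauth {
            user <<hashpass>>
        }

        # Inner route: only requests under the datasette prefix enter it
        route /work/data/pdb* {
            # Assumption: only needed if the upstream expects paths without the prefix
            uri strip_prefix /work/data/pdb
            reverse_proxy localhost:8001
        }

        @no_index not file {path}.html {path} {path}/index.html
        rewrite @no_index /work/_h5ai/public/index.php
    }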

Thanks!

I tried both ways, and I get the Datasette interface at domain.tld/work/data/pdb, but with a 404 error inside the UI:

$ datasette /var/www/domain.tld/wordpress/work/datasette/
INFO:     Started server process [16288]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)
INFO:     someip:0 - "GET /work/_h5ai/public/index.php HTTP/1.1" 404 Not Found
INFO:     someip:0 - "GET /work/_h5ai/public/index.php?4e362c HTTP/1.1" 404 Not Found
INFO:     someip:0 - "GET /work/_h5ai/public/index.php HTTP/1.1" 404 Not Found
INFO:     someip:0 - "GET /work/_h5ai/public/index.php?4e362c HTTP/1.1" 404 Not Found


I do not get this 404 if I access the datasette page directly from the headless server with w3m on 127.0.0.1:8001 (in that case, the database is found and working as expected).

This is with the following Caddyfile section, but I obtained the same results with the block you posted:

    route /work/* {
        basicauth {
                user <<hashpass>>
        }
        route /work/data/pdb* {
            reverse_proxy localhost:8001
        }
        @no_index not file {path}.html {path} {path}/index.html
        rewrite @no_index /work/_h5ai/public/index.php
    }

Same issue if I move the reverse_proxy and its surrounding route above the block shown above (i.e., to the same nesting level as route /work/* {):

$ datasette .
INFO:     Started server process [31125]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit)
INFO:     someip:0 - "GET /work/data/pclim HTTP/1.1" 404 Not Found
INFO:     someip:0 - "GET /work/data/pclim HTTP/1.1" 404 Not Found

And the same issue again if I keep this reverse_proxy outside the route /work/* block but serve it at domain.tld/data/pdb instead of domain.tld/work/data/pdb, to rule out any conflict with /work/* being password protected.

Note that, now that I’ve tried this in practice, I think I will likely need different authentication credentials for this specific datasette application, and would therefore need another basicauth nested in the /work/* one.

My guess is that datasette tries to interpret the full URL path and send me to that page, instead of serving its root there.

Apologies in advance if this is all wrong. Caddy seems a lot simpler to use than Nginx or Apache, but as someone who is not into coding (let alone web development) and lacks some dev reflexes, it will take me some time to fully understand how to work with it.

My bad: reporting the errors here helped me understand what was wrong. Sometimes you really just need to write things down to realize what the issue is.

I just found datasette’s base_url setting, which tells datasette what its base URL is. It works now, so I just need to figure out how to use different basicauth credentials for all URLs starting with domain.tld/work/data/pdb!

So far what I tried is:

    # Restricted access to /work/ and /misc/
    # (except for direct url to files in /misc/)
    route /work/* {
        basicauth {
                user1 hashpass1
        }
        @no_index not file {path}.html {path} {path}/index.html
        rewrite @no_index /work/_h5ai/public/index.php
    }

    handle /misc/_h5ai/private/* {
        respond 404
    }
    route /misc/* {
        @fileNotExists not {
            not path */
            file
        }
        basicauth @fileNotExists {
            user1 hashpass1
        }
        @no_index not file {path}.html {path} {path}/index.html
        rewrite @no_index /misc/_h5ai/public/index.php
    }

    # Special /work/ subfolders with their own auth credentials
    basicauth /work/data/pdb* {
        user2 hashpass2
    }
    route /work/data/pdb* {
        reverse_proxy localhost:8001
    }

    php_fastcgi unix//run/php/php7.3-fpm.sock
    file_server
}

It does require a different set of credentials for /work/data/pdb, but those who have the user1/hashpass1 credentials (and therefore browsing permissions for /work/) cannot see the contents of /work/data/pdb when they browse from a higher-level folder in the h5ai UI. Instead, I want them to be able to browse everything in all /work/ subfolders, while the datasette UI is shown only to those who access it directly with the user2/hashpass2 credentials.

I believe the best way would be to serve datasette at a URL that doesn’t correspond to a real filesystem path, so that the URL of the datasette UI doesn’t overlap with the URL used to browse datasette’s base files, and I can set different credentials for each. What do you think?
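A minimal sketch of that idea, assuming a hypothetical virtual path /pdb-ui that has no matching directory under the site root. Both routes would sit inside the domain.tld site block, and datasette would presumably need its base_url set to /pdb-ui/ so its generated links stay under that path:

    # Hypothetical virtual path: nothing under the site root is named pdb-ui,
    # so h5ai never lists it and it cannot overlap with /work/
    route /pdb-ui* {
        basicauth {
            user2 hashpass2
        }
        reverse_proxy localhost:8001
    }

    # /work/ keeps its own credentials, so user1 can still browse
    # every subfolder, including the one holding datasette's files
    route /work/* {
        basicauth {
            user1 hashpass1
        }
        @no_index not file {path}.html {path} {path}/index.html
        rewrite @no_index /work/_h5ai/public/index.php
    }

Because the two routes match disjoint sets of request paths, any given request only ever meets one of the two basicauth blocks.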

Thanks again!


You should do it more like this:

route /work/data/pdb* {
	basicauth {
		...
	}

	reverse_proxy ...
}

Alternatively, you may use handle_path instead of route, which strips the matched path prefix from the request before proxying. That may or may not work better for you (and may avoid the need to change the base path for datasette).
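A sketch of the handle_path variant, assuming a Caddy release that ships handle_path (the thread already recommends upgrading) and the same user2/hashpass2 placeholders. In Caddy’s default directive order, handle_path is sorted before route, so it would pick up /work/data/pdb requests before route /work/* sees them. Whether datasette’s generated links still resolve once the prefix is stripped depends on its own base_url configuration, so this is something to test rather than a guaranteed fix:

    handle_path /work/data/pdb* {
        basicauth {
            user2 hashpass2
        }
        # The /work/data/pdb prefix has already been stripped here, so the
        # upstream sees /some/page rather than /work/data/pdb/some/page
        reverse_proxy localhost:8001
    }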


Understood. So far route seems to work perfectly fine so I’ll use it for now. Thanks again for your help and your time, it’s greatly appreciated!


Now that I’m closer to production, I realize I need something a little more complicated.

I use Caddy to serve the datasette main page and its subpages at domain.tld/project/datasette*. I don’t really want visitors to access this page directly, though, as it is only one part of the project. I have a static maps.html generated by an R Markdown script, and that page has multiple tabs, the first of which is:

<iframe
    src="../../../project/datasette" 
    frameborder="0"
    style="overflow: hidden; height: 100%; width: 100%;" 
    height="100%" 
    width="100%">
</iframe>

This file is located at domain.tld/work/data/project/maps.html, so the iframe’s src= points to domain.tld/project/datasette, which is served by the Caddy rules discussed above. So far, that works as expected.

I placed maps.html in my /work/ folder because I want it to be visible at its real path only to those who have browsing permissions in /work/, i.e., those with the user1/hashpass1 credentials.

However, I also want other visitors to see it at domain.tld/project, so I need to set up a redirection. Those visiting domain.tld/project should be able to view it with the other set of credentials we defined above for domain.tld/project. Is that possible?

I tried this:

    rewrite /project /work/data/project/maps.html
    route /project* {
        basicauth {
            user2 hashpass2
        }
        reverse_proxy localhost:8000
    }

But of course, since the rewrite points to a subfolder of /work/ and is not enclosed in the basicauth, it requires user1/hashpass1. And even when I sign in with user1/hashpass1, I am still prompted for authentication when the page loads the datasette iframe, which requires user2/hashpass2. Can I move this rewrite so that everything accessed from domain.tld/project* obeys the basicauth { user2 hashpass2 } rule? I tried nesting the rewrite rule inside the route block, but so far I haven’t managed to get what I want.
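One possible approach, sketched under assumptions: drop the top-level rewrite (in the Caddyfile’s default directive order, rewrite runs before any route, which is why /project ends up under /work/’s credentials) and do everything inside route /project*, so its basicauth runs first. @maps is a hypothetical matcher name; user2/hashpass2 and port 8000 are taken from the snippet above:

    route /project* {
        # Checked first, for every request under /project
        basicauth {
            user2 hashpass2
        }

        # Hypothetical matcher for the landing page itself
        @maps path /project /project/
        handle @maps {
            # Serve the static file from disk (site-wide root); file_server
            # answers here, so the /work/* route and its user1 auth never apply
            rewrite * /work/data/project/maps.html
            file_server
        }

        # Everything else, e.g. /project/datasette..., goes to the app
        handle {
            reverse_proxy localhost:8000
        }
    }

Because the browser resolves the iframe’s relative src against the URL it actually loaded, ../../../project/datasette still resolves to /project/datasette when the page is opened from /project, so the iframe stays behind the same user2 credentials.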