Process check: Spreading the Caddy Word

This thread follows hard on the heels of the site feedback thread Thoughts on improving the market penetration of Caddy

I’ve drafted a process that I think could increase Caddy’s footprint in application support documents:

  1. Pick a page to transform - Find an application support page to do a Caddy makeover on.
  2. Track the transformation - Create a Caddy forum thread for tracking the conversion, testing, and for community and Caddy core maintainer input.
  3. Wikify the result - Summarise the result in a Caddy wiki article
  4. Lobby for inclusion - Contact the application document team to include a reference to the Caddy wiki article on the original page.

In principle, it seems simple enough, but I’m sure there are some hidden challenges along the way. So, to understand what these might be, I’ll do a test run first. The page I’ve picked to work on is Giving WordPress Its Own Directory.

Links to relevant resources:

  1. Thoughts on improving the market penetration of Caddy
  2. Giving WordPress Its Own Directory

Initial observations on the WordPress article Giving WordPress Its Own Directory:

  1. The article is Apache-centric
  2. There appear to be three separate sections of the article that are candidates for a Caddy transformation.
    2.1 Moving a Root install to its own directory - Method I (Without URL change)
    2.2 Moving a Root install to its own directory - Method II (With URL change)
    2.3 Redirecting a domain to a subfolder - .htaccess modification
    There’s also a fourth section on Moving Specific WordPress Folders, but these appear to be handled through WordPress itself.
  3. Each of the sections to be transformed mentions .htaccess. This file is specific to Apache, and there doesn’t appear to be an equivalent in Caddy. Within this file, there are Apache directives such as RewriteCond and RewriteRule. Without exposure to other proxy servers like Apache and Nginx, the transformation of the article appears daunting, so I’ll need to rely on the community for assistance. It does raise the question, though: would an equivalence guide (in the Caddy docs or a wiki article) that shows common patterns in other proxy servers and their Caddy equivalents be helpful for Caddy transformations? Or do I avoid getting bogged down in the detail of converting the .htaccess file and instead focus on the three transformation objectives and figure out how to achieve those in Caddy?

The part where .htaccess gets stickiest is when there are multiple .htaccess files, such as one in the webroot, one in the first subdirectory, etc…

Luckily all the .htaccess modification happens at the root level. Apache rewriting happens in a series of steps; understanding each step and what it’s trying to achieve makes it quite easy to achieve the same results in Caddyfile config. This means that:

Converting the .htaccess file and figuring out how to achieve the objectives are, more or less, the same thing.

It might appear so, but it’s pretty straightforward. The step-by-step nature helps. I’ll roll through one by one and we’ll figure out what’s happening as we go:

<IfModule mod_rewrite.c>
RewriteEngine on

#...snip

</IfModule>

This part just tells Apache to enable rewriting, assuming the rewrite module is enabled. Irrelevant to Caddy; rewriting is core functionality.

RewriteCond %{HTTP_HOST} ^(www.)?example.com$

As a precondition for rewrites, the requested host (%{HTTP_HOST}) must be the specified hostname (^(www.)?example.com$, regex for example.com with or without a www subdomain). We don’t need this one at all - since Caddyfile configuration hinges primarily on the site label, when we write our own rewrites, we already know for sure the client is requesting the right host.

RewriteCond %{REQUEST_URI} !^/my_subdir/

As a precondition for rewriting, this first checks that the URI doesn’t begin with /my_subdir/. In Caddy, we do that with a matcher, like not path /my_subdir/*.

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

These parts check that the requested URI is not a file that exists on disk (!-f, a.k.a. “not -f”) and that it’s also not a directory on disk (!-d, “not -d”).

Once again, Caddy has a matcher to check this: not file. Easy.

RewriteRule ^(.*)$ /my_subdir/$1

Here’s the actual meat of the rewrite. The first part (^(.*)$, regex for “literally anything”) captures the original URI. The second part (/my_subdir/$1) takes the captured URI and puts it after the subdir, essentially prefixing it.

Doing this kind of prefixing in Caddy is also ludicrously easy; it’s just a rewrite <matcher> /prefix{uri}. We also get to skip using regex entirely, keeping things simpler (and slightly faster!).

RewriteCond %{HTTP_HOST} ^(www.)?example.com$
RewriteRule ^(/)?$ my_subdir/index.php [L] 

Now we’ve got a second rewrite happening that double checks that we’re still on the right host (Caddy still doesn’t need to check this), then rewrites a request for / to the subdirectory’s index.php (we don’t need to do this either, since php_fastcgi actually checks for index files automatically!). We can just throw this part out.

So. Combine our matchers and add the rewrite:

@subdir {
  not path /my_subdir/*
  not file
}
rewrite @subdir /my_subdir{uri}

…Looks a fair bit simpler than Apache, doesn’t it? :thinking:
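
For context, here is a minimal sketch of how that snippet might sit inside a complete site block. The site address, the web root /var/www/html and the php-fpm socket path are assumptions for illustration and will differ per install; treat it as a starting point rather than a tested WordPress config.

```caddyfile
example.com {
	# Assumed web root; WordPress itself lives in /var/www/html/my_subdir
	root * /var/www/html

	# Match requests that are outside the subdirectory
	# and don't correspond to an existing file on disk
	@subdir {
		not path /my_subdir/*
		not file
	}
	# Prefix the matched requests with the subdirectory
	rewrite @subdir /my_subdir{uri}

	# Assumed php-fpm socket; adjust to your PHP setup
	php_fastcgi unix//run/php/php-fpm.sock
	file_server
}
```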


Are you serious?! You make the transformation appear deceptively simple! Can I reproduce it? No, not at this stage. It helps to speak Apache! I’ll take the time to study your post more carefully and then see how I go on my own with parts 2 and 3. I also want to test the transformation out as it helps me to actually see it working. A big THANK YOU for getting me started!

Seriously? Is that a rhetorical question? Who in their right mind would not want to swap this…

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{HTTP_HOST} ^(www.)?example.com$
RewriteCond %{REQUEST_URI} !^/my_subdir/
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /my_subdir/$1
RewriteCond %{HTTP_HOST} ^(www.)?example.com$
RewriteRule ^(/)?$ my_subdir/index.php [L] 
</IfModule>

…for this…

@subdir {
  not path /my_subdir/*
  not file
}
rewrite @subdir /my_subdir{uri}

Heh, yeah. By comparison, Caddyfile almost looks like plain English.


Something I hadn’t anticipated is that parts of the article are misleading and ambiguous. The article is littered with spelling and grammatical errors as well. You can follow the discussion in the WordPress forum thread Giving WordPress Its Own Directory

I think I understand the not path matcher. A rewrite isn’t done if the URL example.com/my_subdir is specified. I’m not sure I understand the not file matcher though? What sort of URL would cause a problem if that matcher were excluded?

I guess what’s fueling my confusion is I don’t quite understand the difference between a path and a directory?

Finally, there’s this extract from the documentation on the file matcher:

Since rewriting based on the existence of a file on disk is so common, there is also a try_files directive which is a shortcut of the file matcher and a rewrite handler.

Could the try_files directive have been used here? I’ve never used this directive before, but I suspect because it looks for the existence, rather than the non-existence of a file, it’s not relevant here.

Essentially, we’re saying “only rewrite if they didn’t request a real file”.

Such as for /foo/bar/image.jpg - if image.jpg exists on disk, and the client made a valid request for it, we don’t want to rewrite away from it! So we ignore the rewrite and simply serve the file as-is.

It’s only if they make a request for something that doesn’t exist that we want to prepend a directory - since in this use case for WordPress we are moving the entire public web root down one directory, it’s very likely the case that they should have requested /my_subdir/foo/bar/image.jpg instead, which is what our rewrite achieves.


Ahh…of course! So if a file or directory exists in the site root path, then don’t prepend. Is there a gotcha here? Does this mean that if a file/directory exists in both places (root path and under the rewritten path), it could cause a problem? I think I’ve answered my own question. The URL needs to be explicit in this case.


Theoretically, yes, if there was a clash.

In the linked document (and elsewhere across the internet; this kind of “rewrite-unless-file-exists” pattern is fairly common), the author explicitly decided to prioritise an existing file at the exact URI over a possible file at the rewritten URI.


Doesn’t it have to be this way? Without these lines…

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

or, in the case of Caddy, not file, files in the root path would become inaccessible, wouldn’t they? Unless, by design, this is a desirable outcome?

Yes, you’re correct about that.

But, think; what’s the goal with relocating WordPress into its own neat little subdirectory, anyway?

What purpose would one possibly have for moving WP down the hierarchy within the webroot… other than the intent to serve other things in the webroot without mixing them up with WP files?

If the goal was simply to only have WordPress and WordPress alone in your webroot - you could pick any arbitrary folder on disk, set the webroot there, and not have to worry about rewriting at all. Things would be much cleaner that way!

You might even pick /var/www/html as your “arbitrary folder on disk”. :thinking:

But, no, I have to assume the point of this is neatly sideloading other files, which that rewrite exemption allows for.


P.S. Other than slipstreaming other assets, this also allows the other behaviour specified in the document - that is, installing separate versions in other subdirectories of the web root.

The “default” becomes whichever one you rewrite to, and you can manually select for specific versions by prepending them yourself. That way, if you navigate to a version of the site that actually exists on disk, you get that version, ELSE you get the “default”.

Without this rewrite exemption, this behaviour would break, since you can specify the subdirectory for a given version but the web server would still prepend it with the “default” subdirectory - resulting in a nonexistent URI.


Maybe the clue for the ‘correct’ approach in any particular context is revealed by examining this section of the article htaccess modification.

For historical reasons, I might take an annual snapshot of the site (and WordPress version at the time?) and keep separate versions under the root folder in subdirectories /2020, /2021, etc. To access the current version, the matcher and rewrite in the Caddyfile might look like this:

@subdir {
  not path /2021/*
  not file
}
rewrite @subdir /2021{uri}

In this situation, I wonder whether not file is appropriate? I feel, in this case, that the other versions shouldn’t be accessible.

The beauty of the Caddy construct is that it provides a really neat way to access earlier versions of the site (and WordPress?). All that’s required is to alter the pointers to the subdirectory and reload Caddy.

EDIT: I just spotted the additions to your post. What I’ve stated may no longer be relevant.

Depends entirely on whether or not the web admin wants them to be accessible!

Like, with not file present, you could manually request /2020/ and get the 2020 index, but if you request / you’ll get 2021’s index.

Ultimately I don’t think people go around manually prefixing possible version numbers to their URIs, so the risk of a user mistakenly finding your other versions is low; unless you need to keep them inaccessible as a security measure, I’d leave not file present so the web admin / maintainer can still easily see them.
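
If the older snapshots should not be reachable at all, one option is to drop the not file condition so that every request outside the current version's directory gets prefixed. This is only a sketch, reusing the hypothetical /2021 directory from the example above:

```caddyfile
# Every request outside /2021 is prefixed, whether or not
# a matching file exists elsewhere in the web root
@current not path /2021/*
rewrite @current /2021{uri}
```

With this variant a request for /2020/ is rewritten to /2021/2020/, which presumably doesn't exist, so the 2020 snapshot is no longer served. The trade-off is that anything else kept directly in the web root becomes unreachable too.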


Right, time to test out the theory. I set up a test WordPress site. By default, the installation method I use placed the WordPress files in the webroot. Before moving the files to a subdirectory, I made sure I could access various parts of the site. No problems here.

It’s worth noting what appears in the address bar in the second screenshot.

Next, I move the WordPress files into the subdirectory my_subdir. I confirm that I no longer have access to the site.

I update the site block in the Caddyfile with the code below and reload Caddy.

@subdir {
  not path /my_subdir/*
  not file
}
rewrite @subdir /my_subdir{uri}

Refreshing the browser, the site bursts into life. However, I now have a problem accessing subpages of the site.

There’s some other issue going on. After further research, I find the issue has to do with permalinks. This extract from the WordPress support article Using Permalinks explains:

When you create or update a “pretty” permalink structure, WordPress will generate rewrite rules and attempt to insert them into the proper .htaccess file. If it can’t, it will say something like You should update your .htaccess now and print out the rules for you to copy and paste into the file (put them at the end).

You’ll probably need to do this only once, because WordPress does the rewriting internally. If you ever move your WordPress home directory (Site address), you’ll need to repeat this step.

I have a look in the permalinks section of the test site and note the following.

A closer look at the rewrite rules…

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

Some bits I recognise; some bits I don’t.

I also notice there is the (one and only!) WordPress reference to Caddy in the permalinks article. This extract…

Pretty permalinks are available under:

  • Apache web server with the mod_rewrite module
  • Nginx using try-files, e.g. according to this tutorial
  • Hiawatha web server with UrlToolkit support enabled.
  • Lighttpd using a 404 handler or mod_rewrite
  • Caddy using rewrite, e.g. according to this tutorial

Clicking the link to the tutorial takes me to a Caddy V1 article. (Looks like the WordPress permalinks article is the next one I should try to convert). This is the V1 rewrite rule: I don’t really understand what it’s telling me (e.g. what is this {query} bit?). I’m now stuck again. Help!

    # Routing for WordPress
    rewrite /{
        to {path} {path}/ /index.php?{query}
    }

EDIT: I’ve just made the test site https://xxx.udance.com.au accessible online.


Essentially that’s the try_files pattern.

The php_fastcgi directive should already be doing that for you though. See the expanded form (the rewrite @indexFiles is the “long form” of the try_files directive… which uses the file matcher with the try_files option)

The {query} part is to preserve the request query (the bit after ? in a URL) after the rewrite, because otherwise it would be lost (i.e. copy it from the original URL into the rewritten URL)
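
As a rough illustration, the v1 snippet could be expressed with Caddy v2's try_files directive along these lines. It's a sketch of the pattern rather than something you should need to add yourself, given that php_fastcgi already includes similar logic:

```caddyfile
# Try the requested path as a file, then as a directory,
# and otherwise fall back to index.php with the original query string
try_files {path} {path}/ /index.php?{query}
```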


@francislavoie Thanks for responding :smiley:

Does this mean I shouldn’t have to take any further action i.e. the permalinks issue should inherently be addressed in Caddy V2? If so, ideas on how to proceed from here e.g. debug?

OK. I’m still getting my head around this. Sometimes {uri} is used as in…

rewrite @subdir /my_subdir{uri}

I’ve recently become comfortable with this after you explained it to me in Www handling - Use the same multiple site definition or use redirection? - #10 by francislavoie
Now you’re indicating sometimes I need to be more specific and use a subset of the URI. I need to think about this a bit more. How do I decide when I should use one or the other?

{uri} is the entire URI (path + query). Use that if you want to prefix it with something. {path} is just the path. {query} is just the query.
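
One way to see the difference for yourself is a throwaway site block that echoes the placeholders back. The port and the example request below are arbitrary choices for illustration:

```caddyfile
:8080 {
	# Echo the three placeholders back for any request
	respond "uri={uri} path={path} query={query}"
}
```

Requesting http://localhost:8080/blog/post?id=7 against this block should return uri=/blog/post?id=7 path=/blog/post query=id=7, which shows why {uri} is the one to use when prefixing: it carries the path and the query along together.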

The way the fastcgi transport works, Caddy sends the entire original URI as requested by the client (browser) as an environment variable to php-fpm, among many other environment variables.

But index.php is the actual script that runs. That script is the entrypoint to the PHP app. It will do routing based on multiple different factors, some apps do it differently than others. Sometimes the apps want the query to be preserved in the rewritten path, others will just look at the original request URI.

I don’t think {query} will make a difference in this case; I was just explaining what that Caddyfile snippet from v1 did.

Hopefully. I don’t use WordPress so I can’t say for sure. You’ll have to try it to find out. (IMO WordPress is badsadbad, I very much prefer frameworks like Laravel, or just rolling my own from well maintained, modern libraries)

This part:

	# If the requested file does not exist, try index files
	@indexFiles file {
		try_files {path} {path}/index.php index.php
		split_path .php
	}
	rewrite @indexFiles {http.matchers.file.relative}

Might need to be overridden in this case, as we’ve put the main site in an arbitrary subdirectory, e.g.:

try_files {path} {path}/index.php /my_subdir/index.php
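
A possible way to apply that override without writing out the whole expanded form, assuming a Caddy version whose php_fastcgi directive accepts a try_files subdirective (older versions would need the expanded form written by hand). The php-fpm socket path is an assumption, and this hasn't been tested against WordPress:

```caddyfile
php_fastcgi unix//run/php/php-fpm.sock {
	# Fall back to the relocated index.php instead of /index.php
	try_files {path} {path}/index.php /my_subdir/index.php
}
```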

OK. I’ve learnt something new.

You may very well be right, but if this infographic is to be believed:

  1. WordPress powers 40% of the internet.
  2. Around 64% of CMS sites are WordPress.
  3. Around 28% of WordPress sites run e-commerce.
  4. Around 75% of hacked CMS sites were built on WordPress :flushed:

On that last point, I’d like to add that I’m so glad I have my WP sites sitting behind Caddy!

I think the reason WP is so popular is that it’s accessible to the average person (like myself) who isn’t comfortable rolling their own site.