Rewrite for Shaarli

I am trying to set up a rewrite for an installation of Shaarli. I want to redirect site.com/URL to site.com/?post=URL, the latter being the normal way to add a new URL to the Shaarli database (and the former being a novel way to add that URL, in the same way as saved.io).

So far I have as my Caddyfile:

root /var/www/Shaarli
rewrite {
  regexp /([^?].*)
  to /?post={1}
}

But it starts recursing… Is there an error in the regexp somehow? It shouldn’t match again after the rewrite.

I believe what’s happening here is that the regexp is only matched against the path portion of the URL, which does not include the query string. Even after you change the URL to /?post={1}, the path itself is just /, which still matches the regexp.

You’ll simply need to check that the path contains something more than /. Try something like this:

rewrite {
  regexp ^/.
  to /?post={path_escaped}
}

Maybe there’s a better way to check that the request path has more than just a slash, but I’m not sure.
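A quick way to check which request paths each pattern actually matches is to run them outside Caddy. The sketch below uses Python's re module as a stand-in for the Go regexp package Caddy is built on (the two behave the same for simple patterns like these), and urllib.parse.urlsplit to mimic the fact that {path} excludes the query string. It assumes the rewrite regexp is an unanchored search; the helper names are illustrative, not Caddy APIs.

```python
import re
from urllib.parse import urlsplit

# The pattern from the original Caddyfile and the one suggested above.
ORIGINAL = re.compile(r"/([^?].*)")
SUGGESTED = re.compile(r"^/.")


def path_of(request_uri: str) -> str:
    """Mimic {path}: the path portion only, without the query string."""
    return urlsplit(request_uri).path


def matches(pattern: re.Pattern, request_uri: str) -> bool:
    """Unanchored search against the path (assumed to mirror rewrite's regexp)."""
    return pattern.search(path_of(request_uri)) is not None


if __name__ == "__main__":
    for uri in ["/", "/nu.nl", "/?post=nu.nl"]:
        print(uri, matches(ORIGINAL, uri), matches(SUGGESTED, uri))
```

With this model, a bare / (including /?post=nu.nl, whose path is just /) matches neither pattern, while /nu.nl matches both. That is consistent with the later finding in this thread that the loop came from redirects rather than from the rewrite re-matching its own output.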

With the release of Caddy 0.10.4, you should be able to use the not_starts_with condition, which will be more efficient both in server computation (vs. a regex match) and when troubleshooting this issue. Try this on for size:

rewrite {
  if {path} not_starts_with /?
  if {path} not /
  to /index.php?post={uri_escaped}
}

I have made three other changes:

  • Firstly, I use {uri_escaped} instead of {1} from a regex match, since an unescaped URI might result in malformed URLs;
  • Secondly, I added a check so that we don’t rewrite / to /?post= unnecessarily;
  • Thirdly, I rewrite to /index.php? explicitly, to avoid ambiguity (to /? ends up doing the same thing, but leaves the index.php part out).
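To see what the conditions and {uri_escaped} do to a few requests, here is a minimal Python model of that rule. It assumes, as Caddy v1 is understood to behave, that all if lines must hold (default if_op of and), that {path} excludes the query string, and that {uri_escaped} is the whole request URI query-escaped; quote_plus is used as an approximation of Go's url.QueryEscape. The function name is hypothetical.

```python
from urllib.parse import quote_plus, urlsplit


def rewrite_rule(request_uri: str) -> str:
    """Hypothetical model of:

        if {path} not_starts_with /?
        if {path} not /
        to /index.php?post={uri_escaped}

    Returns the URI handed on to PHP, or the original if a condition fails.
    """
    path = urlsplit(request_uri).path  # {path}: no query string
    # Both conditions must hold (assumed default if_op of "and").
    if not path.startswith("/?") and path != "/":
        # {uri_escaped}: whole request URI, query-escaped (approximation).
        return "/index.php?post=" + quote_plus(request_uri)
    return request_uri


if __name__ == "__main__":
    for uri in ["/nu.nl", "/", "/?post=%2Fnu.nl"]:
        print(uri, "->", rewrite_rule(uri))
```

In this model the not_starts_with /? condition never fails for ordinary requests, because {path} never contains the query string; it is the not / check that leaves /?post=… requests untouched. Note also that the leading / ends up inside the value (%2Fnu.nl).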

As for the recursion, well… rewrite does not recurse at all. My hunch is that the rewritten URL was malformed, so index.php issued a redirect to /, which your previous rewrite then rewrote to /?post=, which PHP would redirect again, ad nauseam.

I would also strongly consider adding {path} {path}/ as rewrite destinations ahead of /index.php. Without them, this rule will also rewrite requests for static assets like /favicon.ico to /index.php?post=/favicon.ico, which could break all of them. I am not familiar with Shaarli, though.
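The effect of listing {path} {path}/ first can be sketched as a small fallback resolver. This is a simplified model of how Caddy v1's to is understood to handle several destinations: each is tried in order, one ending in / is accepted only if it is an existing directory, any other only if it is an existing file, and the last one is used when nothing earlier exists. The docroot contents (FILES, DIRS) are hypothetical, and the query string is left out of the fallback for simplicity.

```python
from typing import List, Set
from urllib.parse import quote_plus

# Hypothetical docroot contents, for illustration only.
FILES: Set[str] = {"/favicon.ico", "/index.php"}
DIRS: Set[str] = {"/docs/"}


def destinations(path: str) -> List[str]:
    """The `to {path} {path}/ /index.php?post=...` list for one request path."""
    return [path, path + "/", "/index.php?post=" + quote_plus(path)]


def resolve_to(dests: List[str], files: Set[str], dirs: Set[str]) -> str:
    """Return the first destination that exists on disk, else the last one."""
    for dest in dests[:-1]:
        target = dest.split("?", 1)[0]  # only the path part is checked
        if target.endswith("/"):
            if target in dirs:
                return dest
        elif target in files:
            return dest
    return dests[-1]  # fallback, used whether or not it exists


if __name__ == "__main__":
    for path in ["/favicon.ico", "/docs", "/nu.nl"]:
        print(path, "->", resolve_to(destinations(path), FILES, DIRS))
```

Under these assumptions, /favicon.ico is served as-is, /docs resolves to the /docs/ directory, and only /nu.nl falls through to index.php.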


I now have:

root /var/www/Shaarli
rewrite {
  if {path} not_starts_with /?
  if {path} not /
  to {path} {path}/ /index.php?post={uri_escaped}
}

But it still doesn’t work. This is the result when I request site.com/nu.nl: GET /?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/nu.nl/?post=/nu.nl (What happened to index.php? I guess Shaarli takes it out?)

Another issue is that I need the URI without the leading / (otherwise the field gets filled in as http:///nu.nl).

At least I now understand that the ‘recursion’ happens because Shaarli keeps redirecting and re-calling itself, not because of Caddy…

I think I have to conclude this won’t work without rewriting Shaarli. I was hoping to find a self-hostable alternative to saved.io.

It might be interesting to see the logs if you add the line log / access.log "{common} - {rewrite_uri}" to your Caddyfile and curl your Caddy with a troublesome URL.

So I have

root /var/www/Shaarli
# log stdout
# errors stderr
log / /var/log/access.log "{common} - {rewrite_uri}"
rewrite {
  if {path} not_starts_with /?
  if {path} not /
  to {path} {path}/ /?post={uri_escaped}
}
fastcgi / /var/run/php/php7.0-fpm.sock php

Output for site.com/nu.nl:

49.49.148.65 - - [04/Jul/2017:16:55:35 +0700] "GET /?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/nu.nl/?post=/nu.nl HTTP/2.0" 200 4355 - /?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/nu.nl/?post=/nu.nl

According to that log entry, the GET request - unmodified, directly from the client - was for:

/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/?post=/nu.nl/?post=/nu.nl

That is a little ludicrous and somewhat doomed from the outset. I note, though, that the rewrite I gave you worked as designed in this case: it did NOT prepend an additional /?post= to the request, but honoured the original one received from the client (as you can see, the rewritten URI in that log entry is identical to the requested one).

Did you use curl, or did you try to use the same browser that you were being repeatedly redirected in earlier?

Hmm, you’re right, I should be careful about using browsers (Firefox in this case) for testing. curl https://site.com/nu.nl gave: 144.172.71.118 - - [04/Jul/2017:17:28:18 +0700] "GET /nu.nl HTTP/1.1" 302 0 - /?post=%2Fnu.nl

It’s still hard to get rid of that initial / in {path} and {uri}… Using an ‘innocent’ browser, that was the only issue left. It would be nice to be able to write something like {uri:1}!

The weird result is that all links on Shaarli are now site.com/nu.nl?... instead of site.com/?...
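One possible explanation, not verified against Shaarli's templates: rewrite is internal, so the browser still considers itself to be on site.com/nu.nl, and any query-only relative link (one that starts with ?) is resolved against that path under the standard URL resolution rules of RFC 3986. Python's urljoin follows those rules, so it can show the effect; the link values below are hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical relative links of the kind an HTML page might contain.
QUERY_ONLY = "?example=1"
PATH_RELATIVE = "index.php?example=1"

for page in ["https://site.com/", "https://site.com/nu.nl"]:
    print(page)
    print("  ", urljoin(page, QUERY_ONLY))     # keeps the page's path
    print("  ", urljoin(page, PATH_RELATIVE))  # replaces the last path segment
```

If Shaarli emits links of the first kind, they would pick up /nu.nl as their path exactly as described, whereas links of the second kind would not.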

Yeah, I think the only way to meet that requirement is to go back to regex to exclude the leading slash.

I’d try something like this maybe?

rewrite {
  r /([^?].*)
  to {path} {path}/ /?post={1}
}

Then curl -i it, have a look at what you get back in the response, and check the access log. The output from curl -i might be interesting: if it gets a redirect from Shaarli, who knows, it might tell us something useful.

As an aside, I have an urge to change the regex to ^\/([^?].*)$, but meddling with regex while troubleshooting is a recipe for more trouble to shoot…
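To check what {1} captures, here is a small Python model of the regex rule, comparing the pattern from the config with the anchored variant from the aside. It treats {1} as the first capture group inserted as-is, and uses Python's re as a stand-in for Go's regexp; the function name is hypothetical.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

UNANCHORED = re.compile(r"/([^?].*)")
ANCHORED = re.compile(r"^\/([^?].*)$")  # "\/" is just an escaped "/"


def post_target(pattern: re.Pattern, path: str) -> Optional[str]:
    """Build /?post={1} from the first capture group, or None if no match."""
    m = pattern.search(path)
    return None if m is None else "/?post=" + m.group(1)


if __name__ == "__main__":
    print(post_target(UNANCHORED, "/nu.nl"))  # /?post=nu.nl
    print(post_target(ANCHORED, "/nu.nl"))    # /?post=nu.nl
    # {1} is not escaped, so a captured "&" splits the query string:
    print(parse_qs(urlsplit(post_target(UNANCHORED, "/a&b=c")).query))
    # {'post': ['a'], 'b': ['c']}
```

For ordinary paths both patterns capture the same thing, since every path begins with /, so anchoring mostly states intent. The unescaped {1} is the trade-off that {uri_escaped} was meant to avoid.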

Now:

root /var/www/Shaarli
log / /var/log/access.log "{common} - {rewrite_uri}"
rewrite {
  regexp /([^?].*)
  to {path} {path}/ /?post={1}
}
fastcgi / /var/run/php/php7.0-fpm.sock php

Access log entry for the curl request:
144.172.71.118 - - [04/Jul/2017:17:51:24 +0700] "GET /nu.nl HTTP/1.1" 302 0 - /?post=nu.nl

Looks good, except that some of the links in the interface get the searched-for URL as their path (and then they don’t work)… I wonder if this is hitting some sort of bug in Shaarli.

Can’t imagine why it would do that… I’m not sure I’ll be any help, lacking more in-depth knowledge of how Shaarli works internally.

Thanks a lot, Matthew, you really helped me understand some aspects of using Caddy better!
I think I need to give up trying to make Shaarli behave like saved.io.
