[Solved] Add a <base> tag when reverse proxying with a subdirectory

dkebler · November 2, 2018, 7:00am

I am trying to reverse proxy a URL subdirectory. The page begins to load, but of course none of the resources it requests can be found, because they all resolve against the root. I have run into this before, and the fix is to add a <base> tag to the <head> of every page, the same way Hugo can when generating a site that is served from a subdirectory. My issue is that I can't grok the proper syntax for http.filter to select every page and add the URL to the <head> section. I did this manually in the page inspector just to make sure, and yes: if every page resource request gets the extra /router, the resources are found.

https://status.foo.net/router {
    import wildcard_cert
    filter rule {
        path .*\.html
        search_pattern "<head>"
        replacement "<head><base href='/router'>"
    }
    proxy / http://router.foo.net:19999
}

For example, the source of the main page contains
<script type="text/javascript" src="dashboard.js?v20180922-1"></script>
which can't be found, but if it were
<script type="text/javascript" src="router/dashboard.js?v20180922-1"></script>
it would be. I confirmed that in the network tab of the page inspector.

Whitestrake · November 2, 2018, 7:46am

Hi @dkebler,

You could try using content_type text/html instead of path to rule out cases where the URI sent from the client doesn’t end in the .html you’re looking for. This will ensure your replacement is injected for all HTML content.

The search pattern and replacement look good to me, though.
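To illustrate the difference, here is a small Go sketch that applies the two selection patterns from this thread to a few example responses. It assumes the filter plugin evaluates its patterns with Go's regexp package using an unanchored match, which I haven't confirmed against the plugin source. The request list and the helper names are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns taken from the thread. Treating them as unanchored Go regexps
// is an assumption about how the filter plugin uses them.
var (
	pathRule        = regexp.MustCompile(`.*\.html`)
	contentTypeRule = regexp.MustCompile(`text/html.*`)
)

// selectedByPath is a hypothetical helper: does the request path match?
func selectedByPath(path string) bool {
	return pathRule.MatchString(path)
}

// selectedByContentType is a hypothetical helper: does the response
// Content-Type header match?
func selectedByContentType(contentType string) bool {
	return contentTypeRule.MatchString(contentType)
}

func main() {
	// A dashboard is often requested as "/" rather than "/index.html",
	// yet the upstream still answers with an HTML document.
	responses := []struct{ path, contentType string }{
		{"/", "text/html; charset=utf-8"},
		{"/index.html", "text/html"},
		{"/dashboard.js", "application/javascript"},
	}
	for _, r := range responses {
		fmt.Printf("%-14s path rule: %-5v content_type rule: %v\n",
			r.path, selectedByPath(r.path), selectedByContentType(r.contentType))
	}
}
```

The path rule misses the "/" request entirely, while the content_type rule catches every HTML response regardless of how its URL is spelled.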

dkebler · November 2, 2018, 5:23pm

I thought I had tried that before, but this time it worked. Thanks @Whitestrake. I also needed to fiddle with my base URL: it must be absolute and must end with /.

Here is what worked.

https://status.foo.net/router {
    import wildcard_cert
    filter rule {
        content_type text/html.*
        search_pattern <head>
        replacement "<head><base href='https://status.foo.net/router/'>"
    }
    proxy / http://router.foo.net:19999
}
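You can see why the trailing slash matters with Go's net/url package. Its ResolveReference method follows the RFC 3986 resolution rules, and for a simple relative reference like this one, browsers resolve against a <base href> in the same way. The host and file name come from the thread. The resolve helper is only for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolve resolves ref against base using RFC 3986 reference resolution.
func resolve(base, ref string) string {
	b, err := url.Parse(base)
	if err != nil {
		panic(err)
	}
	r, err := url.Parse(ref)
	if err != nil {
		panic(err)
	}
	return b.ResolveReference(r).String()
}

func main() {
	ref := "dashboard.js?v20180922-1"
	// Without a trailing slash, "router" is treated as a file name and replaced.
	fmt.Println(resolve("https://status.foo.net/router", ref))
	// prints https://status.foo.net/dashboard.js?v20180922-1

	// With a trailing slash, "router/" is treated as a directory and kept.
	fmt.Println(resolve("https://status.foo.net/router/", ref))
	// prints https://status.foo.net/router/dashboard.js?v20180922-1
}
```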

I hope others find this post, because it is really useful for letting a single (sub)domain point to lots of supporting servers. Now I can have, for instance, status.foo.net/router and status.foo.net/nas as status pages for each device, each served from that device on my network, while all other requests still go to my main status page, which can of course link to them. Nice.

dkebler · November 3, 2018, 3:51am

This is even better: you can import this “baseurl” filter rule into any of your site blocks.

# inject in all pages a <base href='url'>
(baseurl) {
    filter rule {
       content_type text/html.*
       search_pattern <head>
       replacement "<head><base href='https://{request_host}/router/'>"
    }
}

then

https://status.foo.net/router {
    import wildcard_cert
    import baseurl
    proxy / http://router.foo.net:19999
}

@Whitestrake, I would make this even more groovy, but {request_path} doesn't return anything, and neither does {request_url}. Only {request_host} works.
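Here is a rough Go model of what the (baseurl) snippet does to each response. The injectBase function is hypothetical and is not the plugin's code. Its host parameter stands in for whatever {request_host} expands to. The /router/ prefix is hard-coded exactly as in the snippet, so, like the snippet, this version only suits the /router block.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Same patterns as the snippet's content_type and search_pattern.
	htmlType = regexp.MustCompile(`text/html.*`)
	headTag  = regexp.MustCompile(`<head>`)
)

// injectBase (hypothetical) rewrites "<head>" to
// "<head><base href='https://HOST/router/'>" in HTML responses only.
func injectBase(body, contentType, host string) string {
	if !htmlType.MatchString(contentType) {
		return body // non-HTML responses pass through untouched
	}
	tag := "<head><base href='https://" + host + "/router/'>"
	// Literal replacement, so "$" in the tag is never treated as a group reference.
	return headTag.ReplaceAllLiteralString(body, tag)
}

func main() {
	page := "<html><head><title>Router</title></head></html>"
	fmt.Println(injectBase(page, "text/html; charset=utf-8", "status.foo.net"))
	fmt.Println(injectBase("<head>", "application/javascript", "status.foo.net"))
}
```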

dkebler · January 4, 2019, 12:15am

I have another site I'm trying to proxy, but all the URLs on its pages seem to have a leading /, which sends the requests to the root of the domain regardless of the base tag.
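The base tag can't help here because a reference that starts with / is root-relative: it keeps the scheme and host from the base URL but throws away the base path entirely. A short Go sketch using net/url shows the effect. It applies RFC 3986 resolution, which agrees with browser behavior for simple references like these. The resolveRef helper is illustrative, and the file names are taken from the page source in this post.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveRef resolves ref against base using RFC 3986 rules.
func resolveRef(base, ref string) string {
	b, err := url.Parse(base)
	if err != nil {
		panic(err)
	}
	r, err := url.Parse(ref)
	if err != nil {
		panic(err)
	}
	return b.ResolveReference(r).String()
}

func main() {
	base := "https://blah.net/power/"
	// A relative reference keeps the /power/ directory...
	fmt.Println(resolveRef(base, "md5.js")) // https://blah.net/power/md5.js
	// ...but a root-relative one replaces the whole path.
	fmt.Println(resolveRef(base, "/md5.js"))    // https://blah.net/md5.js
	fmt.Println(resolveRef(base, "/login.tgi")) // https://blah.net/login.tgi
}
```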

I think that if I filter the page(s) for those slashes and get rid of them, it just might work. The trouble is that I'm pretty crummy at regex and can't come up with a clever one that looks for "/ (a double quote followed by a slash) and replaces it with just ".

Little help please.

Here is the proxy block from my Caddyfile with the filters:

  import wildcard_cert
  import US-only
  # strip the leading / from quoted URLs (replacement is still a test value)
  filter rule {
    content_type text/html.*
    search_pattern ="/
    replacement test
  }
  # add base tag to all pages
  filter rule {
    content_type text/html.*
    search_pattern <head>
    replacement "<head><base href='https://blah.net/power/'>"
  }
  proxy / http://192.168.0.10:8081
}

Here is the source of one of the pages. You can see that the base tag was inserted, but the URLs for the various page and script loads still have a leading /:

<html>
<head><base href='https://blah.net/power'>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
 
<title>Power Controller </title>
 
<script language="javascript" src="/md5.js"></script>
<script language="javascript">
<!--
function calcResponse(){
var str;
str=document.login.Challenge.value+document.login.Username.value+document.login.Password.value+document.login.Challenge.value;
document.secin.Password.value = hex_md5(str);
document.secin.Username.value = document.login.Username.value;
document.secin.submit();
}//-->
</script>
</head>
<body>
<noscript>
<table width="100%" border=0>
<tr><td bgcolor=red>&nbsp;</td></tr>
<tr><td align=center><h1>Warning: Insecure Authentication</h1></td></tr>
<tr><td bgcolor=red>&nbsp;</td></tr></table>
</noscript>
<FORM NAME="login" ID="login" ACTION="/login.tgi" METHOD=post>
<TABLE BORDER="0">
<TR> 
<TD>User Name</TD>
<TD><INPUT TYPE="text" NAME="Username" VALUE="" SIZE=16 MAXLENGTH=32></TD>
</TR>
<TR> 
<TD>Password</TD>
<TD><INPUT TYPE="password" NAME="Password" SIZE=16 MAXLENGTH=32></TD>
</TR>
<TR ALIGN=RIGHT>
<TD></TD>
<TD><INPUT onClick="calcResponse(); return false;" TYPE="Submit" NAME="Submitbtn" VALUE="OK">
 
<input type="hidden" name="Challenge" value="BeCe76Z6xkJ+mYJ">
 
</TD></TR>
</TABLE>
</FORM>
<script language="javascript">
<!--
document.login.Username.focus();	
//-->
</script>
<FORM NAME="secin" ID="secin" ACTION="/login.tgi" METHOD=post>
<INPUT TYPE="hidden" NAME="Username">
<INPUT TYPE="hidden" NAME="Password">
</FORM>
</body>
</html>

dkebler · January 4, 2019, 12:27am

Part of my problem is that I don't even know which flavor of regex is being used, and the docs don't say. Escaping " is another issue here, I think. Does the regex have to start with / or with a backtick? Caddy is written in Go, right? So a backtick `?

I tried keying on just the action attribute to be more specific, but I can't get that to work either:

              search_pattern `action=\"\/
                replacement 'action="'
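This sketch assumes the filter plugin compiles search_pattern with Go's regexp package, which uses RE2 syntax. I believe that is the case but haven't verified it. Under that assumption the backslashes themselves are harmless, because RE2 treats an escaped punctuation character as the literal character. What the sketch does surface is case sensitivity. Go patterns are case-sensitive unless they are prefixed with (?i), and the page source uses uppercase ACTION, so a lowercase action pattern may simply never match. The variable names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// formLine is copied from the page source posted above.
const formLine = `<FORM NAME="login" ID="login" ACTION="/login.tgi" METHOD=post>`

var (
	// The pattern from the post, with \" and \/ escapes.
	escaped = regexp.MustCompile(`action=\"\/`)
	// The same pattern without escapes; RE2 reads \" and \/ as literal " and /.
	plain = regexp.MustCompile(`action="/`)
	// (?i) makes the match case-insensitive, so it also finds ACTION="/.
	insensitive = regexp.MustCompile(`(?i)action="/`)
)

func main() {
	fmt.Println(escaped.MatchString(formLine))     // false: "action" is not "ACTION"
	fmt.Println(plain.MatchString(formLine))       // false, same reason
	fmt.Println(insensitive.MatchString(formLine)) // true
}
```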

dkebler · January 4, 2019, 12:52am

OK, this worked. Simple:

   search_pattern ="/
   replacement ="

But sadly it only applied the filter to the first page. None of the subsequent pages got the filters, even though I thought the filters would apply to all of them. Maybe filter only works on an SPA, and not on a simple old-style site with separate hard-coded pages?
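Here is a Go sketch of what the working rule does to a page body. It assumes Go regexp semantics with a replace-all over the whole response body, which is an assumption about the plugin. stripLeadingSlash is a hypothetical helper, and the input lines come from the page source posted earlier in this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// leadingSlash matches =" followed by /, the start of a quoted root-relative URL.
var leadingSlash = regexp.MustCompile(`="/`)

// stripLeadingSlash (hypothetical) turns ="/x into ="x so the browser
// resolves the URL against the <base> tag instead of the site root.
func stripLeadingSlash(html string) string {
	return leadingSlash.ReplaceAllLiteralString(html, `="`)
}

func main() {
	page := `<script language="javascript" src="/md5.js"></script>
<FORM NAME="login" ID="login" ACTION="/login.tgi" METHOD=post>`
	fmt.Println(stripLeadingSlash(page))
	// <script language="javascript" src="md5.js"></script>
	// <FORM NAME="login" ID="login" ACTION="login.tgi" METHOD=post>
}
```

One design caveat: a blanket ="/ replacement also turns protocol-relative URLs (="//host/...) into broken ="/host/..., and it cannot reach URLs that JavaScript builds at runtime.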

Whitestrake · January 4, 2019, 10:14am

I don’t know if Caddy can filter a single-page application. SPAs usually deliver a payload of JavaScript up front and then load (or generate) new HTML asynchronously as you navigate the app.

It should absolutely work for an old style webpage, where the HTML document is transmitted directly by Caddy.

dkebler · January 4, 2019, 5:01pm

Well, it looks like some JavaScript on the login page redirects to index.htm after login, and I don't think I can dynamically filter or edit that. The redirect bypasses the filters and tries to load index.htm from the root, which of course fails. But if I manually put /power/ back into the address, the filters are applied to index.htm, and the site comes up and navigates fine.

So it looks like this is something Caddy can't fix, with filter or any other plugin?

Whitestrake · January 7, 2019, 12:21am

You could try changing the filter to match the JavaScript content type; you might be able to find where in the JS it generates that redirect and edit it there. The plugin documentation seems to mention only static HTML editing, but JavaScript is transmitted as text too (text/javascript), so it should work.
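This suggestion amounts to widening the content_type pattern so that JavaScript responses also pass through the filter. The Go sketch below shows which Content-Type values each pattern would select, assuming the plugin matches content_type with Go's regexp package. The widened pattern is hypothetical. Before relying on it, check in the browser's network tab which type the device actually sends (text/javascript, application/javascript, or something else). If the redirect instead lives in an inline <script> inside the HTML page, the existing text/html rule already sees that code.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// The content_type pattern used throughout this thread.
	htmlOnly = regexp.MustCompile(`text/html.*`)
	// A hypothetical widened pattern that also selects common JavaScript types.
	htmlOrJS = regexp.MustCompile(`(text/html|text/javascript|application/javascript).*`)
)

func main() {
	types := []string{
		"text/html",
		"text/javascript",
		"application/javascript; charset=utf-8",
		"image/png",
	}
	for _, ct := range types {
		fmt.Printf("%-40s html-only=%v html-or-js=%v\n",
			ct, htmlOnly.MatchString(ct), htmlOrJS.MatchString(ct))
	}
}
```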

system · April 7, 2019, 12:21am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.