Unable to replace HTML code using replace-response module (e.g. <title> or <b>)

1. The problem I’m having:

Using caddy with the replace_response module, I want to match and/or replace HTML code (not only text).

Replacing simple text strings works well (so I know the module works), but I cannot get it to match or replace any HTML tags that contain special characters, such as <title> or <br/>. When I use them in a replacement, they are escaped and thus rendered as text.

2. Error messages and/or full log output:

Example from my Caddyfile:

	replace {
		"Hello" "<b>Hello</b>"
	}

This results in &lt;b&gt;Hello&lt;/b&gt; in the page’s source code (inspected with the developer tools / F12 → Edit as HTML), which the browser then renders as

<b>Hello</b> User, welcome to this page!

instead of (desired outcome, with “Hello” rendered in bold):

Hello User, welcome to this page!

I have also tried:

  • escaping the strings in different ways, including \ ' " and `
  • the replace stream option (instead of plain replace)
  • toggling encode gzip
  • setting header_up Accept-Encoding identity

None of them seem to make a difference.

3. Caddy version:

v2.8.4 h1:q3pe0wpBj1OcHFZ3n/1nl4V4bxBrYoSoab7rL9BMYNk=

4. How I installed and ran Caddy:

Just downloaded exe from here and ran it in elevated cmd

a. System environment:

Windows 10 Pro x64 22H2 Build 19045.4894

b. Command:

caddy run --watch

d. My complete Caddy config:

My caddyfile:

{
	order replace after encode
}

https://localhost {
	#encode gzip
	reverse_proxy https://192.168.1.2 {
	#	header_up Accept-Encoding identity
		transport http {
			tls_insecure_skip_verify
		}		
	}

	replace {
		"Good morning" "Good night"
		"Latest News" "<b>Latest News</b>"
		`<img src="hero.jpg">` `<img src="hero_new.gif">` 
		`My books and more` `My books <br/> and more`
		"<i>Log out</i>" "<sup>Good bye</sup>"
	}
	
}

You can see that I have tried different options to escape the strings.

Only the first replacement (Good morning → Good night) works perfectly, all others fail in the above-described manner (don’t match if containing HTML in the pattern, or HTML is escaped and displayed as text when used in the replacement).

5. Links to relevant resources:

The documentation of the replace_response handler module sadly does not contain any examples of how to replace strings with HTML tags or code.

I feel like it must be a simple and obvious oversight on my end that is causing this, so I’d be grateful if you could just show me a working example of how to do this right.
Thank you.

Use curl -v to test. The browser might be doing something funky (running some JS code which transforms/sanitizes it?)

The replace-response plugin doesn’t do anything at all regarding HTML sanitizing, so something else must be breaking it.

I think you’ve hit the nail on the head… Thanks for this idea! I did what you said and the result just leaves me more baffled. The page basically only seems to load an index.bundle.js (I guess it uses React Native or something) and there is otherwise basically no content. I’m not a developer so I struggle to understand what exactly is happening here.

While I would accept if that broke the replace-response plugin completely, the weird thing to me now is that basic string replacement still works (see my first post). So what exactly happens here that I can replace text strings but not HTML code?

If caddy somehow “unpacks” the js bundle and is able to replace single words of text on the page, why doesn’t it work with HTML snippets that I can see in the developer tools in the browser?

Any tips on how I could read the “original code” (including all the strings) the way caddy sees it (i.e. after the “de-bundling” but before it gets beautified and rendered in the browser)? So that I could adjust my patterns accordingly?

Here is the complete curl response:

* Connected to my.domain (10.0.0.123) port 443
* schannel: disabled automatic use of client certificate
* ALPN: curl offers http/1.1
* ALPN: server accepted http/1.1
* using HTTP/1.x
> GET / HTTP/1.1
> Host: my.domain
> User-Agent: curl/8.9.1
> Accept: */*
>
* Request completely sent off
* schannel: remote party requests renegotiation
* schannel: renegotiating SSL/TLS connection
* schannel: SSL/TLS connection renegotiated
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Access-Control-Allow-Credentials: true
< Access-Control-Allow-Headers: DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range
< Access-Control-Allow-Methods: GET, POST, OPTIONS
< Alt-Svc: h3=":443"; ma=2592000
< Cache-Control: max-age=14400
< Cache-Control: private
< Content-Length: 1136
< Content-Type: text/html
< Last-Modified: Sat, 28 Sep 2024 02:36:26 GMT
< Server: Caddy
< Server: nginx
< Strict-Transport-Security: max-age=63072000
< Vary: Accept-Encoding
< X-Content-Type-Options: nosniff
< X-Robots-Tag: noindex, nofollow, nosnippet, noarchive
<
<!doctype html><html lang="en"><head><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,shrink-to-fit=no"><link id="manifest-link" rel="manifest" href="manifest.json"/><meta name="mobile-web-app-capable" content="yes"><meta name="apple-mobile-web-app-capable" content="yes"><meta name="apple-mobile-web-app-status-bar-style" content="black-translucent"><meta name="theme-color" content="#000000"/><link rel="apple-touch-icon" href="img/Icon_1024x1024.png"/><link id="favicon" rel="shortcut icon" type="image/png" href="img/favicon.png"/><title></title><script defer="defer" src="index.bundle.js?38df6d06bf6f2632b4fd"></script><link href="index.css?38df6d06bf6f2632b4fd" rel="stylesheet"></head><body class="app header-fixed sidebar-fixed aside-menu-fixed aside-menu-hidden"><div id="root"></div><script>"serviceWorker"in navigator&&window.addEventListener("load",(()=>{navigator.serviceWorker.register("serviceworker.js").then((e=>console.log("Success: ",e.scope))).catch((e=>console.log("Failure: ",e)))}))</script></body></html>
* Connection #0 to host my.domain left intact

Sorry if some of what I’m saying doesn’t make much sense, I’m still struggling to wrap my head around what is happening here and how to solve it.

Thank you!

Yeah that’s typical. React just gets pointed at an empty <div> (in this case <div id="root"></div>) and produces all the content inside of that. All the HTML is produced via JS code using what’s called JSX.

React protects from user-controlled HTML injection in content being rendered (e.g. data from the server) by replacing any risky characters (e.g. < > etc) with “HTML entities” (e.g. &lt; and &gt;) so the browser doesn’t run untrusted JS code (from <script> tags etc) which some user could have entered in whatever form input.

You’re probably replacing text that appears inside the JS code, or inside of some other API call response from the server. The replace plugin runs on every single response from the server, not only the initial HTML.

There’s no “unpacking” happening, it’s just replacing text as it’s being passed through. The JS bundle is just text too.

There’s just straight up no way to do this because the React code will always transform the unsafe characters when rendering it to the page.
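
To make the difference concrete, here is a minimal TypeScript sketch of the two steps involved. It is a toy model under stated assumptions, not Caddy's or React's actual code: `proxyReplace` stands in for the replace plugin, `renderAsText` stands in for the effect of React's default text rendering, and the sample strings are taken from the examples earlier in this thread.

```typescript
// Toy model only: the function names below are hypothetical stand-ins.

// Stand-in for the proxy: a plain substring replacement on the response body.
// It does no escaping at all, it just swaps text for text.
function proxyReplace(body: string, search: string, replacement: string): string {
  return body.split(search).join(replacement);
}

// Stand-in for the effect of rendering a string as text content:
// characters that are special in HTML show up as entities in the page markup.
function renderAsText(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Text as it might sit inside a string in the JS bundle or an API response.
const shipped = "Hello User, welcome to this page!";

// Step 1: the proxy inserts the tags verbatim.
const afterProxy = proxyReplace(shipped, "Hello", "<b>Hello</b>");

// Step 2: the app renders that string as text, so the tags become entities.
const markup = renderAsText(afterProxy);

// A replacement without special characters passes through both steps unchanged.
const plain = renderAsText(proxyReplace("Good morning", "Good morning", "Good night"));

console.log(afterProxy);
console.log(markup);
console.log(plain);
```

In this model the tags only get escaped in the second step, in the browser. That would explain why "Good morning" → "Good night" works while any replacement containing < or > shows up as literal text.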

What’s your goal here, exactly?

Well, that must be it then. I take it this is a good “trick” to protect your website from being proxied and modified, if there is really no way around this.

It makes sense to me the way you explained it, I guess. It was just eye-opening to me how much more now happens client-side in the browser (as compared to the server just sending a simple HTML page ready to render), to the point that even a powerful proxy in the middle such as Caddy cannot completely influence it anymore.

I take it, there is no way for Caddy to “strip out” the specific React code that causes the “unsafe” characters to be rendered as text?

And if I may ask, is that a feature inherent to all React apps or likely only “my” specific app because the developer specifically intended for this to happen?

I realize of course that if that were possible for me to stop this specific part of the code from reaching my browser, it would likely make the whole app unsafe(r) but as I’m only running this locally on my machine, safety would not be much of a concern.

I was trying to set up a dashboard / homepage, pulling together various sources of information from web apps (both self-hosted and external but the app I first experienced this with ran in a docker container) and displaying them according to my needs (removing unnecessary headings, graphics etc. to only display the pertinent information, think a weather station or server uptime monitor).

I thought caddy might be a quick and easy solution to this end when I discovered the replace module, as the app in question provides no API and its layout is pretty useful as-is apart from some page elements I would have liked to modify.

Thanks for your advice and your very helpful and interesting insights!

Blocking proxies is never the goal. The point of this in React is to prevent untrusted user input from performing Cross-Site Scripting attacks (i.e. someone enters HTML+JS code as their username, and then the browser of any other user who sees that name runs the code, which can steal their information etc). There need to be protections against that to prevent security vulnerabilities, and the easiest way is to encode the < and > in HTML-like text so the browser doesn’t run it as HTML.

Yes, it’s the default in React, not something specific to your app. In React, you need to use dangerouslySetInnerHTML to make React not process the text when rendering it. It has a scary name so that developers avoid it unless they know their content can be trusted not to contain malicious scripts etc.
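
As a rough illustration of that default versus the opt-out, here is a small TypeScript toy model. It is not React: `renderDiv` and `escapeHtml` are hypothetical helpers, and the only thing borrowed from React is the shape of the `dangerouslySetInnerHTML` prop, an object with an `__html` field.

```typescript
// Toy model only: this is not React, it just mirrors the shape of two props.
interface DivProps {
  children?: string;
  dangerouslySetInnerHTML?: { __html: string };
}

// Hypothetical helper: turn HTML-special characters into entities.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical renderer: escapes by default, inserts raw markup only on opt-in.
function renderDiv(props: DivProps): string {
  const inner =
    props.dangerouslySetInnerHTML !== undefined
      ? props.dangerouslySetInnerHTML.__html // opt-in: inserted as-is
      : escapeHtml(props.children ?? ""); // default: treated as plain text
  return "<div>" + inner + "</div>";
}

// Default path: the tags end up as visible text.
const asText = renderDiv({ children: "<b>Latest News</b>" });

// Opt-in path: the tags stay real markup.
const asHtml = renderDiv({ dangerouslySetInnerHTML: { __html: "<b>Latest News</b>" } });

console.log(asText);
console.log(asHtml);
```

The opt-in has to be written into the app's own source for each element. A text replacement in the proxy can change the string being rendered, but it cannot switch an element from the default path to the opt-in path without rewriting the (usually minified) bundle code itself.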

You could try to monkeypatch the React library itself but… good luck. It would also be super brittle, because if the React library changes it might break your patch. The React code is probably minified (variables are renamed and shrunk to save bandwidth, which also makes the code super hard to read).

You could just apply custom CSS to the page to move things around. Don’t try to modify the HTML content. You don’t need Caddy to do that either, you can use browser extensions like Stylus to apply CSS overrides to your site.
