HTTP placeholders do not trust proxy headers

1. Caddy version (caddy version):

v2.4.6 h1:HGkGICFGvyrodcqOOclHKfvJC0qTU7vny/7FhYp9hNw=

2. How I run Caddy:

a. System environment:

Windows 11

b. Command:

caddy.exe run --watch

c. Service/unit/compose file:

N/A

d. My complete Caddyfile or JSON config:

{
	admin off
	auto_https off
}

http:// {
	root www
	file_server
	templates
}

3. The problem I’m having:

$ curl -H "X-Forwarded-Proto: https" -H "X-Forwarded-Host: example.com" -s localhost/test.html
http://localhost

The content of test.html is as follows:

{{ placeholder "http.request.scheme" }}://{{ placeholder "http.request.hostport" }}

4. Error messages and/or full log output:

2022/03/31 08:28:17.028 DEBUG   http.handlers.file_server       sanitized path join     {"site_root": "www", "request_path": "/test.html", "result": "www\\test.html"}
2022/03/31 08:28:17.028 DEBUG   http.handlers.file_server       opening file    {"filename": "www\\test.html"}

5. What I already tried:

I am currently using something like

{{ $scheme := (default (placeholder "http.request.scheme") (placeholder "http.request.header.X-Forwarded-Proto")) }}
{{ $host := (default (placeholder "http.request.hostport") (placeholder "http.request.header.X-Forwarded-Host")) }}
{{ $scheme }}://{{ $host }}

to manually trust the incoming proxies.

6. Links to relevant resources:

There is a trusted_proxies config in the reverse_proxy directive, but it’s not applicable to the template rendering.

This is in v2.5.0, but you’re using v2.4.6.

That said, I don’t understand what you’re trying to do. What’s your use case, exactly? Please explain thoroughly what you’re trying to do and why you need it.


Thanks for the quick reply!

I am writing a simple template-rendered website that needs to output the scheme and host of the user’s original request, for consumers such as the Google Scholar crawler.

For example, academic sites use a meta tag, <meta name="citation_pdf_url">, whose content value must be the full URL of the thesis in PDF format (not a relative or root-relative one). I want Caddy to embed the scheme and host of the user’s actual request rather than hard-coding them in the server config, so that one setup can serve multiple journals on different domains.

I am currently writing it like this:

{{ $scheme := (default (placeholder "http.request.scheme") (placeholder "http.request.header.X-Forwarded-Proto")) }}
{{ $host := (default (placeholder "http.request.hostport") (placeholder "http.request.header.X-Forwarded-Host")) }}
{{ $origin := printf "%v://%v" $scheme $host }}
<meta name="citation_pdf_url" content="{{ $origin }}/thesis-{{ placeholder "http.request.uri.query.paper-id" }}.pdf">

And of course, these sites are hosted behind a reverse proxy that sends X-Forwarded-* headers to Caddy.

Do you actually need the origin in the URL? Can’t you just do like:

<meta name="citation_pdf_url" content="/thesis-{{ placeholder "http.request.uri.query.paper-id" }}.pdf">

i.e. an absolute path? The browser will automatically reuse the original scheme/host for the target.

Yes, we need the absolute URL because it is read by the Google Scholar crawler, not only by browsers.

https://scholar.google.com/intl/en/scholar/inclusion.html#indexing

The “<meta>” tags normally apply only to the exact page on which they’re provided. If this page shows only the abstract of the paper and you have the full text in a separate file, e.g., in the PDF format, please specify the locations of all full text versions using citation_pdf_url or DC.identifier tags. The content of the tag is the absolute URL of the PDF file; for security reasons, it must refer to a file in the same subdirectory as the HTML abstract.

<meta name="citation_title" content="The testis isoform of the phosphorylase kinase catalytic subunit (PhK-T) plays a critical role in regulation of glycogen mobilization in developing lung">
<meta name="citation_author" content="Liu, Li">
<meta name="citation_author" content="Rannels, Stephen R.">
<meta name="citation_author" content="Falconieri, Mary">
<meta name="citation_author" content="Phillips, Karen S.">
<meta name="citation_author" content="Wolpert, Ellen B.">
<meta name="citation_author" content="Weaver, Timothy E.">
<meta name="citation_publication_date" content="1996/05/17">
<meta name="citation_journal_title" content="Journal of Biological Chemistry">
<meta name="citation_volume" content="271">
<meta name="citation_issue" content="20">
<meta name="citation_firstpage" content="11761">
<meta name="citation_lastpage" content="11766">
<meta name="citation_pdf_url" content="http://www.example.com/content/271/20/11761.full.pdf">

Alright, well I think the way you’re doing it now is the best way to do it for the time being.

It’s unclear how configurable “trust” for templates/placeholders would even work. We definitely don’t want to change the existing http.request.scheme placeholders and such, because those should remain pure.

It’s very usecase-specific whether you need to get the value from a header.
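To make the “trust” question concrete, here is a rough Go sketch of one possible meaning: honor the forwarded scheme only when the connecting peer is in an allow-list of proxy addresses. This is not how Caddy works; trustedProxies, isTrusted, and effectiveScheme are hypothetical names, and the 10.0.0.0/8 prefix is just an example value.

```go
package main

import (
	"fmt"
	"net/netip"
)

// trustedProxies is a hypothetical allow-list; the prefix is an example value.
var trustedProxies = []netip.Prefix{netip.MustParsePrefix("10.0.0.0/8")}

// isTrusted reports whether the connecting peer ("ip:port") is in the allow-list.
func isTrusted(remoteAddr string) bool {
	ap, err := netip.ParseAddrPort(remoteAddr)
	if err != nil {
		return false // unparsable peers are never trusted
	}
	for _, p := range trustedProxies {
		if p.Contains(ap.Addr()) {
			return true
		}
	}
	return false
}

// effectiveScheme honors the forwarded value only for trusted peers and
// otherwise keeps the scheme of the direct connection.
func effectiveScheme(remoteAddr, forwardedProto, directScheme string) string {
	if forwardedProto != "" && isTrusted(remoteAddr) {
		return forwardedProto
	}
	return directScheme
}

func main() {
	fmt.Println(effectiveScheme("10.1.2.3:40000", "https", "http"))    // trusted proxy
	fmt.Println(effectiveScheme("203.0.113.9:40000", "https", "http")) // untrusted client
}
```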

FYI, I think you can read the header through .Req.Header instead of placeholder. The dashes mean .Req.Header.X-Forwarded-Proto probably won’t parse, so you might need something like .Req.Header.Get "X-Forwarded-Proto"; I’m not sure. See Modules - Caddy Documentation.
