Random OCSP response errors for random clients

[Screenshot: 2017-08-05-223912_1920x1200_scrot]

This happens to me (and more often to other people) randomly.
In this case it is resolved by a simple Caddy restart, but it’s often noticed late.

Caddy is on the latest release as of now.
As I am not 100% sure whether this could be config-related, I am posting it here just in case.

Strange, can you post the actual page where this is happening? I’d like to see it for myself.

It is known to happen at https://pod.tchncs.de and https://git.tchncs.de. I am not 100% sure whether it has happened to someone at https://social.tchncs.de as well.

Those all seem to be working for me on Firefox. How sporadic is it? What’s your config look like?

Sorry for my late response. It is currently happening at https://git.tchncs.de for me.

https://git.tchncs.de {
    log /var/log/caddy/git_access.log
    errors /var/log/caddy/git_errors.log {
        404 /opt/gitlab/embedded/service/gitlab-rails/public/404.html
        422 /opt/gitlab/embedded/service/gitlab-rails/public/422.html
        500 /opt/gitlab/embedded/service/gitlab-rails/public/500.html
        502 /opt/gitlab/embedded/service/gitlab-rails/public/502.html
    }

    proxy / unix:/var/opt/gitlab/gitlab-workhorse/socket {
        fail_timeout 0s

        header_upstream Host {host}
        header_upstream X-Real-IP {remote}
        header_upstream X-Forwarded-Proto {scheme}
        header_upstream X-Forwarded-Ssl on
    }
}

I’m not sure what your question about it being sporadic means, but I called it random for a reason: I don’t see how to reproduce it and I cannot really predict when it happens. I guess between one and three days?

I see it too. Is there anything in the logs (process log, enabled with -log)? OCSP staples are checked hourly for updates, and they are updated if they are about halfway through their validity period. For Let’s Encrypt certs, I believe the staples are valid for about 7 days, so a staple is updated about 3 and a half days in. How long do these errors last? Log output would be useful, for sure. Also, can you please look at $HOME/.caddy/ocsp and attach the latest relevant staple file(s)?

Hmm, the thing is, I have some party people who like to access the sites, so the last few times I restarted Caddy to get it working again and didn’t wait until it might recover on its own.
This time a browser refresh actually worked for me, even before you confirmed the problem.

There is only one file for this domain: https://assets.illuna.rocks/git.tchncs.de-7aa10025

Thanks for the file. I’ll take a look when I have a chance.

Chrome doesn’t show errors for OCSP stuff (unless Must Staple is enabled, I think) - Firefox always does, though (which is good). This is the first and only report I’ve had of this kind of thing. Refreshing the page in the browser hasn’t helped for me, the error is still there. Try to leave it like that for a few minutes while I look into it.

What does your process log have in it from the last ~3-4 days? (everything)

And what’s the full Caddyfile?

Based on following the code path, it probably means the serial number in the OCSP response doesn’t match the serial number on the certificate being presented. Or it could be any of these other conditions it checks. The OCSP response you sent me has this decoding:

Just to avoid confusion: I sent the requested files to Matt via PM. :wink:

It looks like the serial number on the OCSP staple is different than the serial number of the cert being served. One thing I am suspicious of (as a possible bug) is that you have one site configured manually to use a certificate from LE that has all your sites’ names in it, plus some of those sites in your Caddyfile configured with automatic HTTPS, where Caddy manages the certificates. It’s possible this overlap is causing some confusion. Could you verify for me by removing the manually-specified cert from your Caddyfile and letting Caddy manage all the certificates, then restart your server? (Also clear the .caddy/ocsp folder just in case.) If the error goes away for at least 1 week, that probably confirms my suspicions.

Please report back. :slight_smile:

I’ve identified a bug in the OCSP maintenance routine, where it does not handle overlapping certificates very well. (It’s a bit of an odd case, granted.)

Still report back. Every piece of info helps.

@tchncs I’ve pushed a fix to a branch and am currently using it to run some of my sites. So far so good. Would you please build it on this branch and see how it works for you? https://github.com/mholt/caddy/pull/1821 And then report back, of course. :wink:
