V2: log output net does not reconnect after lost connection

1. Caddy version (caddy version):

v2.0.0 h1:pQSaIJGFluFvu8KDGDODV8u4/QRED/OPyIR+MWYYse8=

2. How I run Caddy:

caddy-2.0.0 run

a. System environment:

FreeBSD 12.1
Graylog 3.3.0

Caddy is configured to send JSON logs to a RAW TCP socket in Graylog.

b. Command:


c. Service/unit/compose file:

paste full file contents here

d. My complete Caddyfile or JSON config:

    log {
        output net 172.16.1.7:5555
        format json
    }

3. The problem I’m having:

When I restart Graylog Caddy does not re-connect to Graylog after it is back up and running. After restarting Caddy logging works again.

4. Error messages and/or full log output:

2020-05-26 21:40:49.330365175 +0200 CEST m=+1171.667800761 write error: write tcp 172.16.1.3:23592->172.16.1.7:5555: write: broken pipe
2020-05-26 21:40:49.336574794 +0200 CEST m=+1171.674010355 write error: write tcp 172.16.1.3:23592->172.16.1.7:5555: write: broken pipe
2020-05-26 21:40:49.342712235 +0200 CEST m=+1171.680147806 write error: write tcp 172.16.1.3:23592->172.16.1.7:5555: write: broken pipe
2020-05-26 21:40:50.218493748 +0200 CEST m=+1172.555929329 write error: write tcp 172.16.1.3:23592->172.16.1.7:5555: write: broken pipe

5. What I already tried:

6. Links to relevant resources:

I was wondering how long before someone encountered this – answer: 3 weeks! I was just too lazy at first.

Can you please try Caddy from this branch? (You can build from source or download build artifacts.) logging: Net writer redials if write fails by mholt · Pull Request #3453 · caddyserver/caddy · GitHub - please confirm whether it works for you.

1 Like

Well it looked like parsing the JSON logs in Graylog would be handy. I like to experiment :slight_smile:
Made an adjustment to the JSON-input, restarted it and ran into this issue. I will give the branch a try and report back.

Tried to compile.

# git branch
  master
* netredial
# ./caddy version
(devel)

Issue remains. At one point I forgot to restart the Graylog input. When I tried to start caddy it hung and eventually gave a timeout:

run: loading initial config: loading new config: setting up custom log 'log1': opening log writer using &logging.NetWriter{Address:"172.16.1.7:5555", addr:caddy.NetworkAddress{Network:"tcp", Host:"172.16.1.7", StartPort:0x15b3, EndPort:0x15b3}}: dial tcp 172.16.1.7:5555: connect: operation timed out

Don’t know if UDP would be an option. Is that a possibility?

Sounds like your graylog server isn’t accepting the connection? Please double-check that your setup is right and try again.

Sure, but if it was working before over TCP it should still be working; that is a different problem.

Well there’s another issue now. I restarted Graylog once again making sure the Input was running. Caddy hung (no connections possible). After a while (could be after Graylog was back up) I got a crash and stacktrace:

2020/05/26 23:46:05 http2: panic serving [2001:41d0:8:1fdc::6:1]:28864: runtime error: invalid memory address or nil pointer dereference
goroutine 13041 [running]:
net/http.(*http2serverConn).runHandler.func1(0xc000d90040, 0xc0004e7f8e, 0xc00269ca80)
        /usr/local/go/src/net/http/h2_bundle.go:5713 +0x16b
panic(0x368020, 0x24e77c0)
        /usr/local/go/src/runtime/panic.go:969 +0x166
github.com/caddyserver/caddy/v2/modules/logging.(*redialerConn).Write(0xc0001c4070, 0xc001906a00, 0x633, 0x700, 0x0, 0x0, 0x0)
        /root/caddy/caddy/modules/logging/netwriter.go:120 +0x85
go.uber.org/zap/zapcore.(*ioCore).Write(0xc00071cbd0, 0x0, 0xbfab81674d6390f1, 0x136cbeb2afd, 0x257a5e0, 0xc00014ef60, 0x15, 0x58d762, 0xf, 0x0, ...)
        /root/go/pkg/mod/go.uber.org/zap@v1.15.0/zapcore/core.go:90 +0x106
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0024973f0, 0xc000728000, 0x5, 0x5)
        /root/go/pkg/mod/go.uber.org/zap@v1.15.0/zapcore/entry.go:216 +0x117
go.uber.org/zap.(*Logger).Info(0xc0011e0060, 0x58d762, 0xf, 0xc000728000, 0x5, 0x5)
        /root/go/pkg/mod/go.uber.org/zap@v1.15.0/logger.go:187 +0x7f
github.com/caddyserver/caddy/v2/modules/caddyhttp.(*Server).ServeHTTP.func1(0xc000dc4f20, 0x8a63e0, 0xc00071c450, 0xc0004e7968, 0xc0015a27e0, 0xc000964120, 0xc0004e7d68)
        /root/caddy/caddy/modules/caddyhttp/server.go:182 +0x7ff
github.com/caddyserver/caddy/v2/modules/caddyhttp.(*Server).ServeHTTP(0xc000964120, 0x82a15d260, 0xc00071c450, 0xc0006d9f00)
        /root/caddy/caddy/modules/caddyhttp/server.go:207 +0xe87
net/http.serverHandler.ServeHTTP(0xc0008fb340, 0x88ddc0, 0xc000d90040, 0xc000a07000)
        /usr/local/go/src/net/http/server.go:2807 +0xa3
net/http.initALPNRequest.ServeHTTP(0x893640, 0xc001b438c0, 0xc000212a80, 0xc0008fb340, 0x88ddc0, 0xc000d90040, 0xc000a07000)
        /usr/local/go/src/net/http/server.go:3381 +0x8d
net/http.(*http2serverConn).runHandler(0xc00269ca80, 0xc000d90040, 0xc000a07000, 0xc000a82980)
        /usr/local/go/src/net/http/h2_bundle.go:5720 +0x8b
created by net/http.(*http2serverConn).processHeaders
        /usr/local/go/src/net/http/h2_bundle.go:5454 +0x4e1
^C2020/05/26 21:46:07.534       INFO    shutting down   {"signal": "SIGINT"}
2020/05/26 23:46:07 [DEBUG] Fake-closing underlying packet conn
2020/05/26 23:46:07 [INFO][cache:0xc0008c1310] Stopped certificate maintenance routine
2020/05/26 21:46:07.535 INFO    admin   stopped previous server
2020/05/26 21:46:07.535 INFO    shutdown done   {"signal": "SIGINT"}

There appear to be 2 issues:

  1. when net output fails the webserver hangs
  2. re-connect fails when tcp output is back available (or with netredial I get the stacktrace)

Is UDP already available? It was the 1st thing I looked for when I ran into the issue. It would (probably) fix both issues. Graylog also accepts RAW UDP Inputs.

Interesting… thanks. I pushed another commit – can you try it out again (make sure to build from the latest commit on that branch).

Of course. Caddy accepts any network addresses in these formats: Conventions — Caddy Documentation (unless otherwise specified in particular docs).

Tried the latest commit but the problem persists. Here are my observations:

I configured 1 vhost to use TCP net output to Graylog
I stopped the Graylog TCP input
The TCP net log vhosts is unreachable, other vhosts (using file or UDP) are accessible
After a while I got this stacktrace:

2020/05/27 15:42:57 http2: panic serving [2001:41d0:8:1fdc::6:1]:41336: runtime error: invalid memory address or nil pointer dereference
goroutine 1362 [running]:
net/http.(*http2serverConn).runHandler.func1(0xc000010090, 0xc0001fdf8e, 0xc00055aa80)
        /usr/local/go/src/net/http/h2_bundle.go:5713 +0x16b
panic(0x368020, 0x24e77c0)
        /usr/local/go/src/runtime/panic.go:969 +0x166
github.com/caddyserver/caddy/v2/modules/logging.(*redialerConn).Write(0xc0000f9ab0, 0xc000c4ee00, 0x634, 0x700, 0x0, 0x8700c0, 0xc0001da370)
        /root/caddy/caddy/modules/logging/netwriter.go:139 +0x1aa
go.uber.org/zap/zapcore.(*ioCore).Write(0xc000ee03f0, 0x0, 0xbfabb965a058315e, 0x136134b475, 0x257a5e0, 0xc00092a9e0, 0x15, 0x58d762, 0xf, 0x0, ...)
        /root/go/pkg/mod/go.uber.org/zap@v1.15.0/zapcore/core.go:90 +0x106
go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000c29ce0, 0xc000c39180, 0x5, 0x5)
        /root/go/pkg/mod/go.uber.org/zap@v1.15.0/zapcore/entry.go:216 +0x117
go.uber.org/zap.(*Logger).Info(0xc000a22900, 0x58d762, 0xf, 0xc000c39180, 0x5, 0x5)
        /root/go/pkg/mod/go.uber.org/zap@v1.15.0/logger.go:187 +0x7f
github.com/caddyserver/caddy/v2/modules/caddyhttp.(*Server).ServeHTTP.func1(0xc000ecce80, 0x8a63e0, 0xc000ebbdd0, 0xc0001fd968, 0xc0009cb380, 0xc000962240, 0xc0001fdd68)
        /root/caddy/caddy/modules/caddyhttp/server.go:182 +0x7ff
github.com/caddyserver/caddy/v2/modules/caddyhttp.(*Server).ServeHTTP(0xc000962240, 0x829c446f0, 0xc000ebbdd0, 0xc000ebc300)
        /root/caddy/caddy/modules/caddyhttp/server.go:207 +0xe87
net/http.serverHandler.ServeHTTP(0xc00094d340, 0x88ddc0, 0xc000010090, 0xc000c36700)
        /usr/local/go/src/net/http/server.go:2807 +0xa3
net/http.initALPNRequest.ServeHTTP(0x893640, 0xc000995800, 0xc000100e00, 0xc00094d340, 0x88ddc0, 0xc000010090, 0xc000c36700)
        /usr/local/go/src/net/http/server.go:3381 +0x8d
net/http.(*http2serverConn).runHandler(0xc00055aa80, 0xc000010090, 0xc000c36700, 0xc000c43960)
        /usr/local/go/src/net/http/h2_bundle.go:5720 +0x8b
created by net/http.(*http2serverConn).processHeaders
        /usr/local/go/src/net/http/h2_bundle.go:5454 +0x4e1

After restarting the Graylog TCP input the reconnect errors re-occurred but the TCP net vhosts was reachable again:

2020-05-27 15:45:27.484295101 +0200 CEST m=+308.176867028 write error: write tcp 172.16.1.3:41449->172.16.1.7:5555: write: broken pipe
2020-05-27 15:45:28.697393487 +0200 CEST m=+309.389965423 write error: write tcp 172.16.1.3:41449->172.16.1.7:5555: write: broken pipe
2020-05-27 15:45:29.230636837 +0200 CEST m=+309.923208782 write error: write tcp 172.16.1.3:41449->172.16.1.7:5555: write: broken pipe
2020-05-27 15:45:30.676534301 +0200 CEST m=+311.369106259 write error: write tcp 172.16.1.3:41449->172.16.1.7:5555: write: broken pipe
2020-05-27 15:45:31.037263793 +0200 CEST m=+311.729835766 write error: write tcp 172.16.1.3:41449->172.16.1.7:5555: write: broken pipe

After restarting Caddy all is working again.

For now it’s not a big issue; I prefer UDP for logs (also use it for Syslog and GELF) so I fixed it that way.

Thanks for trying it out. I realized I made a mistake with how errors are handled (it’s a bit of an unusual code flow). I know you have it working with UDP and that’s great but now I’ve put this much time into fixing so I’d like your help to verify it and finish this.

I pushed a new commit. Can you try one more time? Let me know if the latest works!

I did not mean it didn’t need a fix but as I have a workaround (which I actually prefer) it is “low priority”.

Latest build:
TCP net log vhosts hangs when the Graylog input is stopped
No stacktrace
Upon restart of the Graylog input the vhosts gets ‘unblocked’ and all events seem to be logged (I have a spike in messages at the time of the restart). Reconnect seems to work!

So the only really ‘issue’ is the vhosts is unavailable when the TCP log is unreachable.

1 Like

Great, thanks for trying it out again!

I have no clue what you mean by this. Can you explain?

That’s probably all the goroutines blocked on waiting for Dial() to complete suddenly unblocking now that they can write again.

Well to the best of my knowledge it looks like the thread is IO-blocked. I ran a couple of tests with telnet and openssl (sorry; I’m old-school) and the socket responds. I’m able to send a GET command but nothing is returned. The moment I restart the TCP input the request gets unblocked and the webpage is served.

Sounds about right – the alternative is to buffer the logs, which could lead to memory depletion.

Okay, glad this is solved for now!

You could indeed hand the logs over to another thread with a (configurable) buffer but question remains what to do when the buffer gets depleted. Drop events? Also IO-block but with some spare time for the input to respond?

Let’s just call it “works as designed”. I guess it’s what to expect when a thread gets IO-blocked. I suspect a filesystem-full on the logs would give the same problem?

Probably, sure. Open to accepting PRs, it’s just not a priority for me right now.

I don’t think so - that error would likely be realized immediately rather than blocking. The file writers and the network writers are very different pieces of code.

This topic was automatically closed after 30 days. New replies are no longer allowed.