Python urllib.request causes Caddy API to lock up

yapishu · September 4, 2022, 5:24pm

1. Output of `caddy version`:

v2.5.2 h1:eCJdLyEyAGzuQTa5Mh3gETnYWDClo1LjtQm2q9RNZrs=

2. How I run Caddy:

I run Caddy in a slightly modified version of the official Docker image, booted with the command and config file below.

a. System environment:

Docker/Compose on Ubuntu

b. Command:

from my Dockerfile:

caddy run --config /etc/caddy/default_config.json --resume

c. Service/unit/compose file:

FROM caddy:2-alpine
WORKDIR /
RUN ip route add 10.13.13.0/24 via 172.20.0.2
COPY ./default_config.json /etc/caddy/
RUN echo "#!/bin/ash" > /init
RUN echo "ip route add 10.13.13.0/24 via 172.20.0.2" >> /init
RUN echo "caddy run --config /etc/caddy/default_config.json --resume" >> /init
RUN chmod +x /init
EXPOSE 80
EXPOSE 443
CMD ["ash", "/init"]

---
version: "3.3"
services:
  wireguard:
    build: 
      context: wg
      dockerfile: Dockerfile
    container_name: wg
    cap_add:
      - NET_ADMIN
      - SYS_MODULE
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/London
      - SERVERURL=${HOSTNAME}.nativeplanet.live
      - SERVERPORT=51820
      - PEERS=2
      - PEERDNS=1.1.1.1
      - INTERNAL_SUBNET=10.13.13.0
      - ALLOWEDIPS=0.0.0.0/0
      - LOG_CONFS=true
    volumes:
      - ${PWD}/config:/config
      - /lib/modules:/lib/modules
    ports:
      - 51820:51820/udp
    sysctls:
      - net.ipv4.conf.all.src_valid_mark=1
    networks:
      wgnet:
        ipv4_address: 172.20.0.2
    restart: unless-stopped
  caddy:
    build:
      context: caddy
      dockerfile: Dockerfile
    cap_add:
      - NET_ADMIN
    container_name: caddy
    ports:
      - 80:80
      - 443:443
    volumes:
      - ./caddy/data:/data
      - ./caddy/config:/config/caddy
      - ./caddy/www:/www
    networks:
      wgnet:
        ipv4_address: 172.20.0.4
    restart: unless-stopped
  api:
    build: 
      context: api
      dockerfile: Dockerfile
    depends_on:
      - wireguard
      - caddy
    container_name: api
    volumes:
      - ${PWD}/config:/etc/wireguard/
    networks:
      wgnet:
        ipv4_address: 172.20.0.3
    restart: unless-stopped

networks:
  wgnet:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.20.0.0/24
          gateway: 172.20.0.1

d. My complete Caddy config:

{
  "admin": {
    "listen": "0.0.0.0:2019",
    "origins": ["172.20.0.3","127.0.0.1"],
    "enforce_origin": false
  },
    "apps": {
      "http": {
        "grace_period":2000000000,
        "servers": {
          "srv0": {
            "listen": [":443",":80"],
            "routes": [
            ],
            "automatic_https": {
              "disable": false,
              "disable_redirects": false
            }
          }
        }
      }
    }
}

3. The problem I’m having:

I’m running a stack of Wireguard + Caddy + a Python script that controls the Caddy API. Caddy reverse proxies connections through the WG tunnel, and Python registers subdomains and assigns them upstream addresses on different routes. Each has its own container, and Python uses urllib.request to communicate with Caddy.

It works for a while, but eventually Caddy stops responding and prints endless error messages about TCP i/o timeouts. I suspect that Python is leaving too many connections open, but I’m a bit of a novice and not sure how to troubleshoot this or figure out what is going wrong. Any help is appreciated.

4. Error messages and/or full log output:

After a certain point, I get many hundreds of lines of the following in the Caddy logs:

{"level":"info","ts":1662308551.5569315,"msg":"http: Accept error: accept tcp [::]:2019: i/o timeout; retrying in 1s"}
{"level":"info","ts":1662308552.557097,"msg":"http: Accept error: accept tcp [::]:2019: i/o timeout; retrying in 1s"}
{"level":"info","ts":1662308553.55731,"msg":"http: Accept error: accept tcp [::]:2019: i/o timeout; retrying in 1s"}

Once this begins showing up in the logs, Caddy’s API becomes totally unresponsive.
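One way to confirm from the api container that the admin endpoint has stopped answering is a small probe with a hard timeout. This is only a minimal sketch and not part of the original script: the function name `admin_is_responsive` is hypothetical, and the default URL assumes the admin listener at 172.20.0.4:2019 taken from the compose file and config above. It requests `GET /config/`, the admin API path that returns the current config.

```python
import http.client
import urllib.error
import urllib.request


def admin_is_responsive(base_url: str = "http://172.20.0.4:2019",
                        timeout: float = 2.0) -> bool:
    """Return True if the admin API answers GET /config/ within `timeout` seconds."""
    try:
        # The context manager closes the response (and its socket) on exit.
        with urllib.request.urlopen(f"{base_url}/config/", timeout=timeout) as resp:
            resp.read()
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is still reachable.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, reset, or a malformed response.
        return False
```

Calling this on a schedule and logging the first `False` would give a timestamp to compare against the "Accept error" lines in the Caddy log.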

5. What I already tried:

I’ve tried adding a 2 second timeout on the Python requests side, and I’ve tried adding a grace_period param to my Caddy config. I’m not sure what is causing this so I’m not sure what to try.
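For reference, a request to the admin API with both a timeout and an explicitly closed response might look like the sketch below. Nothing in this thread establishes that leftover connections are the actual cause, so treat it as a hygiene measure rather than a fix. The helper names `build_route_request` and `send` are hypothetical, the admin address is assumed from the compose file, and the route used in the usage note is a made-up illustration. Posting to an array path such as `.../routes` appends to that array in Caddy's config API.

```python
import json
import urllib.request

# Assumed from the compose file: Caddy at 172.20.0.4, admin listener on port 2019.
CADDY_ADMIN = "http://172.20.0.4:2019"


def build_route_request(route: dict, server: str = "srv0") -> urllib.request.Request:
    """Build a POST that appends `route` to the server's routes array."""
    url = f"{CADDY_ADMIN}/config/apps/http/servers/{server}/routes"
    body = json.dumps(route).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(req: urllib.request.Request, timeout: float = 2.0):
    """Send the request; the `with` block closes the connection even on errors."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, resp.read()
```

A call would then be `send(build_route_request(route))`, where `route` is a dict such as `{"match": [{"host": ["sub.example.com"]}], "handle": [{"handler": "reverse_proxy", "upstreams": [{"dial": "10.13.13.2:80"}]}]}` (hostname and upstream address are placeholders).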

6. Links to relevant resources:

matt · September 4, 2022, 5:28pm

What are your full logs?

Can you please post those? An error message alone is not helpful; we need to know what led up to it. Please post the full log output as the help template asks.

Also what happens if you try the latest commits to master?

yapishu · September 4, 2022, 5:37pm

Here are my unabridged logs; you can see where I restarted the container, since that is where the giant wall of errors stops:

https://ams3.digitaloceanspaces.com/urbits3/2022.9.04..17.34.43-output.txt

> Also what happens if you try the latest commits to master?

How would I do this with the Docker container? I’m just using latest, I believe. I’m also not sure how to actually trigger the error.

francislavoie · September 4, 2022, 9:52pm

Launching Caddy from a shell script like this isn’t a good idea: Caddy will no longer receive signals if you run it this way. This means that if you docker-compose stop, Caddy won’t hear about it and will never gracefully shut down, and will instead just be killed when Docker times out (gives up waiting).

You should probably put exec in front of the caddy run command so that it replaces the shell script’s process and does receive signals.

Did you omit the routes from your config? That might be relevant. If your Python script is triggering config updates from requests that proxy through Caddy, then there might be a deadlock because Caddy will wait until pending HTTP requests are done before it can shut down the old config, but if your script is waiting for the admin reload to complete before responding to the HTTP request, it never ends. It’s kinda like an ouroboros, infinite circle, causing a deadlock.
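If the script does handle HTTP requests that arrive through Caddy, one way to break the cycle described above is to answer the request first and apply the config change afterwards from a background thread. The following is a minimal sketch under that assumption; `DeferredConfigWorker` and `apply_fn` are hypothetical names, and `apply_fn` stands in for whatever function actually calls the admin API.

```python
import queue
import threading


class DeferredConfigWorker:
    """Apply config changes on a background thread, one at a time."""

    def __init__(self, apply_fn):
        self._apply = apply_fn          # callable that talks to the admin API
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, change) -> None:
        """Queue a change and return immediately, so the caller can respond."""
        self._jobs.put(change)

    def _run(self) -> None:
        while True:
            change = self._jobs.get()
            try:
                self._apply(change)
            except Exception as exc:    # keep the worker alive after a failure
                print(f"config update failed: {exc}")
            finally:
                self._jobs.task_done()

    def wait_idle(self) -> None:
        """Block until every queued change has been processed."""
        self._jobs.join()
```

A request handler would call `worker.submit(change)` and send its response right away, so no proxied request stays open while the reload is in progress.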

If it’s not that, then you might be running into the issue “Admin Endpoint constantly crashing” (Issue #4942, caddyserver/caddy on GitHub), which should be resolved in v2.6 (or you can build from master and try it out now).

yapishu · September 4, 2022, 10:18pm

Sorry, I shared my default config, but my Python script adds routes for various subdomains on the fly. Here’s what it looks like now (not crashing, but the config ought to be more or less the same when it does; all of the routes besides one are basically the same):

https://ams3.digitaloceanspaces.com/urbits3/2022.9.04..22.12.00-con2.json

Apologies if linking a text file is against your preferred style, but I thought it might be excessively long. I’ve formatted it with jq.

Thanks for the tip about Docker, I’ll go ahead and fix that.

> If your Python script is triggering config updates from requests that proxy through Caddy, then there might be a deadlock because Caddy will wait until pending HTTP requests are done before it can shut down the old config, but if your script is waiting for the admin reload to complete before responding to the HTTP request, it never ends. It’s kinda like an ouroboros, infinite circle, causing a deadlock.

Is this what the grace_period param is meant to address? Or, is there a way I could test to see if this is occurring?

I’ll see if I can figure out a way to shimmy in a newer build of Caddy. I really appreciate your help.

matt · September 5, 2022, 10:05pm

Thanks for the full logs. As I suspected, this comes right after a config reload, which should be fixed in v2.6. The beta is being released right now.

yapishu · September 8, 2022, 2:41am

Much appreciated! I squeezed a build layer into my Dockerfile after spending a while figuring out why the binary I compiled on my machine wouldn’t execute inside the container (something about dynamic linking on Alpine). For others’ reference, it ended up looking basically like this:

# Build Caddy from master until 2.6 release
FROM golang:1.19.1-alpine3.16 as builder
WORKDIR /
RUN apk add --no-cache git
RUN git clone https://github.com/caddyserver/caddy.git
WORKDIR /caddy/cmd/caddy/
RUN go build

FROM caddy:2-alpine
WORKDIR /
RUN rm /usr/bin/caddy
COPY --from=builder /caddy/cmd/caddy/caddy /usr/bin/caddy
RUN chmod +x /usr/bin/caddy
RUN apk add --no-cache libcap
RUN setcap 'cap_net_bind_service=+ep' /usr/bin/caddy
COPY ./default_config.json /etc/caddy/
RUN echo "#!/bin/ash" > /init
RUN echo "exec caddy run --config /etc/caddy/default_config.json --resume" >> /init
RUN chmod +x /init
EXPOSE 80
EXPOSE 443
CMD ["ash", "/init"]

Hopefully this solves it; I’ll let you know if the issue persists. Thanks to both of you for all the help.

francislavoie · September 8, 2022, 4:26am

You can significantly simplify the build by using the builder image variant we provide. See the docs on Docker Hub. You can run xcaddy build master (or better, target a specific commit instead of master).
