Help! Caddy in Docker suddenly can't find DNS records?

1. The problem I’m having:

Nextcloud had been reachable from my host and remote machines over Tailscale, but suddenly it's as if it can't even find the site on the host machine. Jellyfin has never worked at the same time as Nextcloud, though I previously combined them into one stack to run both. I only get errors from Caddy when I try to access the sites from another device that has Tailscale installed. I have the default ACLs, "Override local DNS" is on for my exit-node VPN, and MagicDNS is on. Where have I gone wrong? Thank you for your help!

2. Error messages and/or full log output:

    
INF ts=1721851386.2966852 logger=tls.obtain msg=obtaining certificate identifier=jellyfin.wallaby-gopher.ts.net

INF ts=1721851386.297025 logger=tls msg=using ACME account account_id=https://acme-staging-v02.api.letsencrypt.org/acme/acct/156809383 account_contact=[]

INF ts=1721851386.9860916 logger=tls.acme_client msg=trying to solve challenge identifier=jellyfin.wallaby-gopher.ts.net challenge_type=tls-alpn-01 ca=https://acme-staging-v02.api.letsencrypt.org/directory

ERR ts=1721851387.3541026 logger=tls.acme_client msg=challenge failed identifier=jellyfin.wallaby-gopher.ts.net challenge_type=tls-alpn-01 problem={"type":"urn:ietf:params:acme:error:dns","title":"","detail":"DNS problem: NXDOMAIN looking up A for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain; DNS problem: NXDOMAIN looking up AAAA for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain","instance":"","subproblems":[]}

ERR ts=1721851387.3541567 logger=tls.acme_client msg=validating authorization identifier=jellyfin.wallaby-gopher.ts.net problem={"type":"urn:ietf:params:acme:error:dns","title":"","detail":"DNS problem: NXDOMAIN looking up A for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain; DNS problem: NXDOMAIN looking up AAAA for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain","instance":"","subproblems":[]} order=https://acme-staging-v02.api.letsencrypt.org/acme/order/156809383/17990284953 attempt=1 max_attempts=3

INF ts=1721851388.5035493 logger=tls.acme_client msg=trying to solve challenge identifier=jellyfin.wallaby-gopher.ts.net challenge_type=http-01 ca=https://acme-staging-v02.api.letsencrypt.org/directory

ERR ts=1721851388.8611884 logger=tls.acme_client msg=challenge failed identifier=jellyfin.wallaby-gopher.ts.net challenge_type=http-01 problem={"type":"urn:ietf:params:acme:error:dns","title":"","detail":"DNS problem: NXDOMAIN looking up A for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain; DNS problem: NXDOMAIN looking up AAAA for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain","instance":"","subproblems":[]}

ERR ts=1721851388.8612058 logger=tls.acme_client msg=validating authorization identifier=jellyfin.wallaby-gopher.ts.net problem={"type":"urn:ietf:params:acme:error:dns","title":"","detail":"DNS problem: NXDOMAIN looking up A for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain; DNS problem: NXDOMAIN looking up AAAA for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain","instance":"","subproblems":[]} order=https://acme-staging-v02.api.letsencrypt.org/acme/order/156809383/17990285353 attempt=2 max_attempts=3

ERR ts=1721851388.8612182 logger=tls.obtain msg=could not get certificate from issuer identifier=jellyfin.wallaby-gopher.ts.net issuer=acme-v02.api.letsencrypt.org-directory error=HTTP 400 urn:ietf:params:acme:error:dns - DNS problem: NXDOMAIN looking up A for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain; DNS problem: NXDOMAIN looking up AAAA for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain

ERR ts=1721851388.861246 logger=tls.obtain msg=will retry error=[jellyfin.wallaby-gopher.ts.net] Obtain: [jellyfin.wallaby-gopher.ts.net] solving challenge: jellyfin.wallaby-gopher.ts.net: [jellyfin.wallaby-gopher.ts.net] authorization failed: HTTP 400 urn:ietf:params:acme:error:dns - DNS problem: NXDOMAIN looking up A for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain; DNS problem: NXDOMAIN looking up AAAA for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain (ca=https://acme-staging-v02.api.letsencrypt.org/directory) attempt=2 retrying_in=120 elapsed=65.19797253 max_duration=2592000

ERR ts=1721851499.0825777 logger=http.log.error msg=dial tcp: lookup nextcloud on 192.168.0.1:53: no such host request={"remote_ip":"127.0.0.1","remote_port":"33674","client_ip":"127.0.0.1","proto":"HTTP/2.0","method":"GET","host":"nextcloud.wallaby-gopher.ts.net","uri":"/","headers":{"Te":["trailers"],"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8"],"Accept-Language":["en-US"],"Cookie":["REDACTED"],"Sec-Fetch-Mode":["navigate"],"Sec-Fetch-Site":["cross-site"],"Priority":["u=0, i"],"User-Agent":["Mozilla/5.0 (Android 14; Mobile; rv:128.0) Gecko/128.0 Firefox/128.0"],"Accept-Encoding":["gzip, deflate, br, zstd"],"Upgrade-Insecure-Requests":["1"],"Sec-Fetch-Dest":["document"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"nextcloud.wallaby-gopher.ts.net"}} duration=0.109954608 status=502 err_id=yu16iheut err_trace=reverseproxy.statusError (reverseproxy.go:1269)

I get some errors in nextcloud-aio-mastercontainer as well, and the Apache container is reported unhealthy:

2024-07-24T13:17:06.786069461Z 🛈 Configured WOPI URL: https://nextcloud.wallaby-gopher.ts.net
2024-07-24T13:17:06.786075390Z 🛈 Configured public WOPI URL: https://nextcloud.wallaby-gopher.ts.net
2024-07-24T13:17:06.786081025Z 🛈 Configured callback URL: https://nextcloud.wallaby-gopher.ts.net
2024-07-24T13:17:06.786086814Z 
2024-07-24T13:17:06.852006229Z Failed to fetch discovery endpoint from https://nextcloud.wallaby-gopher.ts.net
2024-07-24T13:17:06.852027030Z cURL error 6: Could not resolve host: nextcloud.wallaby-gopher.ts.net (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for https://nextcloud.wallaby-gopher.ts.net/hosting/discovery

3. Caddy version:

v2.8.4 h1:q3pe0wpBj1OcHFZ3n/1nl4V4bxBrYoSoab7rL9BMYNk=

4. How I installed and ran Caddy:

I used the docker-compose.yaml shown in section c below.

a. System environment:

Operating System: Arch Linux
KDE Plasma Version: 6.1.3
KDE Frameworks Version: 6.4.0
Qt Version: 6.7.2
Kernel Version: 6.9.10-arch1-1 (64-bit)
Graphics Platform: X11
Processors: 20 × Intel® Core™ i9-10900K CPU @ 3.70GHz
Memory: 62.7 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 3080/PCIe/SSE2
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: Z490 AORUS MASTER
System Version: -CF
installed with docker version:
Client:
Version: 27.0.3
API version: 1.46
Go version: go1.22.4
Git commit: 7d4bcd863a
Built: Mon Jul 1 21:15:54 2024
OS/Arch: linux/amd64
Context: default

Server:
Engine:
Version: 27.0.3
API version: 1.46 (minimum version 1.24)
Go version: go1.22.4
Git commit: 662f78c0b1
Built: Mon Jul 1 21:15:54 2024
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.7.20
GitCommit: 8fc6bcff51318944179630522a095cc9dbf9f353.m
runc:
Version: 1.1.13
GitCommit:
docker-init:
Version: 0.19.0
GitCommit: de40ad0

b. Command:

I navigate to my project directory; here is the tree of the combo folder:

.
├── caddy
│   ├── Caddyfile
│   ├── certs
│   ├── config
│   │   └── caddy  [error opening dir]
│   ├── data
│   │   └── caddy  [error opening dir]
│   └── sites
├── docker-compose.yaml
├── gluetun
│   ├── my_expressvpn_switzerland_udp.ovpn
│   ├── my_expressvpn_usa_-_chicago_udp.ovpn
│   └── servers.json
├── tailscale
│   ├── state
│   │   └── tailscaled.state
│   ├── var
│   │   └── lib
│   │       └── tailscale  [error opening dir]
│   └── varlib
│       └── tailscale  [error opening dir]
├── tailscale1
│   └── varlib
│       └── tailscale
│           ├── certs  [error opening dir]
│           ├── derpmap.cached.json
│           ├── files  [error opening dir]
│           ├── tailscaled.log1.txt
│           ├── tailscaled.log2.txt
│           ├── tailscaled.log.conf
│           └── tailscaled.state
└── tailscale3
    ├── derpmap.cached.json
    ├── files  [error opening dir]
    ├── tailscale  [error opening dir]
    └── tailscaled.state

then run: docker-compose up -d
I only need to do this once, but I run these two commands to get my certificates from Tailscale, first for jellyfin and then for nextcloud:
docker exec tailscalej tailscale --socket /tmp/tailscaled.sock cert jellyfin.wallaby-gopher.ts.net
docker exec tailscalen tailscale --socket /tmp/tailscaled.sock cert nextcloud.wallaby-gopher.ts.net
I then comment out the authkey and run it again:
docker-compose up -d


c. compose file:

volumes:
  nextcloud_aio_mastercontainer:
    name: nextcloud_aio_mastercontainer # This line is not allowed to be changed as otherwise the built-in backup solution will not work
  # shared volumes any container in the same docker-compose file can access
  # used to share the tailscaled.sock file with caddy
  sock_volume1:
  sock_volume2:
  # you do not have to use the same local filepaths that I do for volume mapping in the containers,
  # but you do have to make sure whatever filepath you use is mapped to the correct filepath in the container
services:
  jellyfin:
    image: jellyfin/jellyfin
    container_name: "jellyfin-ts"
    user: 1000:1000
    volumes:
      - /media/server/server/jellyfin-server/config:/config
      - /media/server/server/jellyfin-server/cache:/cache
      # ro means read only, we don't want jellyfin accidentally deleting our files
      - /media/16tb/Shows and Movies/Movies:/Movies:ro
      - /media/16tb/Shows and Movies/Shows:/Shows:ro
    restart: unless-stopped

  caddy:
    image: caddy
    network_mode: host
    container_name: "caddy"
    hostname: caddy
    depends_on:
      # wait for tailscale to boot
      # to communicate to it using the tailscaled.sock
      - tailscale 
    #removed ports due to host mode
    volumes:
      - /media/server/server/combo/caddy/Caddyfile:/etc/caddy/Caddyfile
      - /media/server/server/combo/caddy/data:/data
      - /media/server/server/combo/caddy/config:/config
      # get socket tailscale created in the shared volume and share it with caddy
      - /media/server/server/combo/caddy/certs:/certs
      - /media/server/server/combo/caddy/sites:/srv
      # caddy expects the socket to be at /var/run/tailscale/tailscaled.sock
      - sock_volume1:/var/run/tailscale
      - sock_volume2:/var/run/tailscale
    restart: unless-stopped

  #tailscale for jellyfin
  tailscale:
        container_name: tailscalej
        image: tailscale/tailscale
        network_mode: host
        # tailscale sets new machine names to the OS hostname
        # docker-desktop is the default hostname for docker
        # if you modify this and recreate the container, the machine name will be updated automatically
        # make sure this matches the machine name you set in the Caddyfile
        hostname: jellyfin
        cap_add:
            - NET_ADMIN
            - NET_RAW
        volumes:
            # saves container state after container is recreated
            # used varlib because var folder isn't needed locally
            - /media/server/server/combo/tailscale/varlib:/var/lib
            # containerized version of tailscale uses /tmp/tailscaled.sock
            # binds the socket to a docker volume so it can be accessed by other containers
            # this can't be a local directory because the socket is created by the container
            - sock_volume1:/tmp
        environment:
            # if you add a command key, it will override environment key variables with default values!
            # info: https://tailscale.com/kb/1282/docker#ts_socks5_server

            # set the authkey to reusable when generating it from tailscale
            #- TS_AUTHKEY=
            # prevents a new machine from being added each time the container is restarted
            - TS_STATE_DIR=/var/lib/tailscale
            # https://tailscale.com/kb/1112/userspace-networking
            - TS_USERSPACE_NETWORKING=userspace-networking
        restart: unless-stopped

  #tailscale for nextcloud
  tailscale1:
        container_name: tailscalen
        image: tailscale/tailscale
        network_mode: host
        # tailscale sets new machine names to the OS hostname
        # docker-desktop is the default hostname for docker
        # if you modify this and recreate the container, the machine name will be updated automatically
        # make sure this matches the machine name you set in the Caddyfile
        hostname: nextcloud
        cap_add:
            - NET_ADMIN
            - NET_RAW
        volumes:
            # saves container state after container is recreated
            # used varlib because var folder isn't needed locally
            - /media/server/server/combo/tailscale1/varlib:/var/lib
            #might need new file path??
            # containerized version of tailscale uses /tmp/tailscaled.sock
            # binds the socket to a docker volume so it can be accessed by other containers
            # this can't be a local directory because the socket is created by the container
            - sock_volume2:/tmp
        environment:
            # if you add a command key, it will override environment key variables with default values!
            # info: https://tailscale.com/kb/1282/docker#ts_socks5_server

            # set the authkey to reusable when generating it from tailscale
            #- TS_AUTHKEY=
            # prevents a new machine from being added each time the container is restarted
            - TS_STATE_DIR=/var/lib/tailscale
            # https://tailscale.com/kb/1112/userspace-networking
            - TS_USERSPACE_NETWORKING=userspace-networking
        restart: unless-stopped

  #tailscale for gluetunnel vpn exit
  tailscale3:
    container_name: tailscalee
    cap_add:
      - NET_ADMIN
      - NET_RAW
    volumes:
      - /media/server/server/combo/tailscale3:/var/lib
      - /media/server/server/combo/tailscale3:/state
      - /dev/net/tun:/dev/net/tun
    network_mode: "service:gluetun"
    restart: unless-stopped
    environment:
      - TS_HOSTNAME=vpn-exit-node
      - TS_AUTHKEY=
      - TS_ROUTES=192.168.1.0/24
      - TS_EXTRA_ARGS=--accept-routes #=true
      - TS_EXTRA_ARGS=--advertise-exit-node
      - TS_NO_LOGS_NO_SUPPORT=true
      - TS_STATE_DIR=/state
    image: tailscale/tailscale
    depends_on: #needs to be in same stack? Just start after jellyfin stack?
      - gluetun

  #for vpn
  gluetun:
    image: qmcgaw/gluetun
    container_name: gluetun
    # line above must be uncommented to allow external containers to connect.
    # See https://github.com/qdm12/gluetun-wiki/blob/main/setup/connect-a-container-to-gluetun.md#external-container-to-gluetun
    restart: unless-stopped
    cap_add:
      - NET_ADMIN
    devices:
      - /dev/net/tun:/dev/net/tun
    volumes:
      - /media/server/server/combo/gluetun:/gluetun
    environment:
      - VPN_SERVICE_PROVIDER=expressvpn
      - OPENVPN_USER= #don't listen to error about using old variable, new ones dont work
      - OPENVPN_PASSWORD=
      - SERVER_COUNTRIES=USA
      - SERVER_CITIES=Chicago
      # See https://github.com/qdm12/gluetun-wiki/tree/main/setup#setup
      # Timezone for accurate log times
      - TZ=America/Chicago
      # Server list updater
      # See https://github.com/qdm12/gluetun-wiki/blob/main/setup/servers.md#update-the-vpn-servers-list
      - UPDATER_PERIOD=24h

  #nextcloud
  nextcloud:
    image: nextcloud/all-in-one:latest
    restart: always
    container_name: nextcloud-aio-mastercontainer # This line is not allowed to be changed as otherwise AIO will not work correctly
    volumes:
      - nextcloud_aio_mastercontainer:/mnt/docker-aio-config # This line is not allowed to be changed as otherwise the built-in backup solution will not work
      - /var/run/docker.sock:/var/run/docker.sock:ro # May be changed on macOS, Windows or docker rootless. See the applicable documentation. If adjusting, don't forget to also set 'WATCHTOWER_DOCKER_SOCKET_PATH'!
    ports:
      #- 80:80 # Can be removed when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else). See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
      - 8080:8080
      #- 8443:8443 # Can be removed when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else). See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
    environment: # Is needed when using any of the options below
      # - AIO_DISABLE_BACKUP_SECTION=false # Setting this to true allows to hide the backup section in the AIO interface. See https://github.com/nextcloud/all-in-one#how-to-disable-the-backup-section
      #- SKIP_DOMAIN_VALIDATION=true #might not be helping?
      - APACHE_PORT=11000 # Is needed when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else). See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
      - APACHE_IP_BINDING=0.0.0.0 # Should be set when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else) that is running on the same host. See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
      # - BORG_RETENTION_POLICY=--keep-within=7d --keep-weekly=4 --keep-monthly=6 # Allows to adjust borgs retention policy. See https://github.com/nextcloud/all-in-one#how-to-adjust-borgs-retention-policy
      # - COLLABORA_SECCOMP_DISABLED=false # Setting this to true allows to disable Collabora's Seccomp feature. See https://github.com/nextcloud/all-in-one#how-to-disable-collaboras-seccomp-feature
      # - NEXTCLOUD_DATADIR=/mnt/ncdata # Allows to set the host directory for Nextcloud's datadir. ⚠️⚠️⚠️ Warning: do not set or adjust this value after the initial Nextcloud installation is done! See https://github.com/nextcloud/all-in-one#how-to-change-the-default-location-of-nextclouds-datadir
      # - NEXTCLOUD_MOUNT=/mnt/ # Allows the Nextcloud container to access the chosen directory on the host. See https://github.com/nextcloud/all-in-one#how-to-allow-the-nextcloud-container-to-access-directories-on-the-host
      - NEXTCLOUD_UPLOAD_LIMIT=1G # Can be adjusted if you need more. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-upload-limit-for-nextcloud
      - NEXTCLOUD_MAX_TIME=3600 # Can be adjusted if you need more. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-max-execution-time-for-nextcloud
      - NEXTCLOUD_MEMORY_LIMIT=1024M # Can be adjusted if you need more. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-php-memory-limit-for-nextcloud
      # - NEXTCLOUD_TRUSTED_CACERTS_DIR=/path/to/my/cacerts # CA certificates in this directory will be trusted by the OS of the nexcloud container (Useful e.g. for LDAPS) See See https://github.com/nextcloud/all-in-one#how-to-trust-user-defined-certification-authorities-ca
      # - NEXTCLOUD_STARTUP_APPS=deck twofactor_totp tasks calendar contacts notes # Allows to modify the Nextcloud apps that are installed on starting AIO the first time. See https://github.com/nextcloud/all-in-one#how-to-change-the-nextcloud-apps-that-are-installed-on-the-first-startup
      # - NEXTCLOUD_ADDITIONAL_APKS=imagemagick # This allows to add additional packages to the Nextcloud container permanently. Default is imagemagick but can be overwritten by modifying this value. See https://github.com/nextcloud/all-in-one#how-to-add-os-packages-permanently-to-the-nextcloud-container
      # - NEXTCLOUD_ADDITIONAL_PHP_EXTENSIONS=imagick # This allows to add additional php extensions to the Nextcloud container permanently. Default is imagick but can be overwritten by modifying this value. See https://github.com/nextcloud/all-in-one#how-to-add-php-extensions-permanently-to-the-nextcloud-container
      # - NEXTCLOUD_ENABLE_DRI_DEVICE=true # This allows to enable the /dev/dri device in the Nextcloud container. ⚠️⚠️⚠️ Warning: this only works if the '/dev/dri' device is present on the host! If it should not exist on your host, don't set this to true as otherwise the Nextcloud container will fail to start! See https://github.com/nextcloud/all-in-one#how-to-enable-hardware-transcoding-for-nextcloud
      - TALK_PORT=3478 # This allows to adjust the port that the talk container is using. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-talk-port
      # - WATCHTOWER_DOCKER_SOCKET_PATH=/var/run/docker.sock # Needs to be specified if the docker socket on the host is not located in the default '/var/run/docker.sock'. Otherwise mastercontainer updates will fail. For macos it needs to be '/var/run/docker.sock'
      # networks: # Is needed when you want to create the nextcloud-aio network with ipv6-support using this file, see the network config at the bottom of the file
      # - nextcloud-aio # Is needed when you want to create the nextcloud-aio network with ipv6-support using this file, see the network config at the bottom of the file
      - trusted_domains=nextcloud.wallaby-gopher.ts.net #should I use dbhost=? #Think both are wrong according to https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md#adapting-the-sample-web-server-configurations-below

    depends_on: #needs to be in same stack? Just start after jellyfin stack?
      - caddy

d. My complete Caddy config:

I tried to run docker exec caddy caddy fmt, but it gave me Error: reading input file: open Caddyfile: no such file or directory. This seems silly because I can see the file right there in the directory with ls.
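(Presumably caddy fmt is looking for a Caddyfile in the container's working directory rather than in /etc/caddy; passing the mounted path explicitly should work, assuming the Caddyfile is mounted at /etc/caddy/Caddyfile as in the compose file above:)

docker exec caddy caddy fmt /etc/caddy/Caddyfile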

# make sure the machine name is the same as the hostname of the tailscale container in docker-compose.yml
jellyfin.wallaby-gopher.ts.net {
	reverse_proxy jellyfin:8096
}

nextcloud.wallaby-gopher.ts.net:443 {
	reverse_proxy nextcloud:11000
}

5. Links to relevant resources:

The page I was building jellyfin/tailscale and then nextcloud off of: GitHub - brianalewine/jellyfin-docker: Guide for setting up jellyfin with tailscale through docker

Anecdotally, I’ve had issues with Tailscale DNS lately where it returns no records for pretty much any request. I could dig @8.8.8.8 google.com and get a result but dig google.com failed with no response from Tailscale’s Magic DNS 100.100.100.100 on loopback; it seems like Tailscale was failing entirely to forward these requests to the next configured recursive resolver.

I had this issue crop up on a Macbook and on a NixOS server, but it didn’t impact my Windows PC or a few other NixOS servers I run, so it seemed like an inconsistent failure.

My solution was to turn Tailscale off, which immediately resolved the issue. Turning it back on then worked A-OK. A reboot also sufficed. The issue has not recurred for a week or so now, possibly thanks to Tailscale client updates? Not entirely sure, but I'm running smoothly now.


Hello Whitestrake, I have tried rebooting the PC, restarting the containers in Portainer, and docker-compose down / docker-compose up. The behavior is still the same as posted above :frowning: I've even gone so far as to delete the local Tailscale files in an effort to start fresh. This leads me to believe I have a setup error, or need to change a setting on my PC, though I'm totally unsure of what it would be. I should add that I'm trying to access the services only via Tailscale because I do not want Nextcloud/Jellyfin on the open internet. I was following this guide to make Nextcloud 'local': all-in-one/local-instance.md at main · nextcloud/all-in-one · GitHub

Oh, one thing I just noticed.

ERR ts=1721851499.0825777 logger=http.log.error msg=dial tcp: lookup nextcloud on 192.168.0.1:53: no such host

Why are you reverse proxying to nextcloud? You can’t access that host. The nextcloud container is in the default compose network, but you have Caddy in the host network. You only have access to these containers via ports opened on the host itself, via localhost. Same with jellyfin, it seems.

That also explains why you’re getting errors looking for ts.net addresses. You’re apparently not running Tailscale on your host, but in userspace inside your containers. Caddy also has no access to these as it’s in host networking.

Edit: Oh, no, you have Tailscale in host networking too, and only one of them is running in userspace. Nevermind the last part - kind of? Your host has access to the Tailscale network via tun adapter, but is it possible running Tailscale in a container is preventing it from providing Magic DNS to the host since it can’t configure /etc/resolv.conf or the systemd resolver? This seems like a very complex setup…


Okay, I've been working on this and trying a lot of different parameters. This really is unreasonably complicated just to keep my services off the open web, especially for a beginner lol. Funnily enough, it seems my issue wasn't a Nextcloud parameter but the local DNS override I'd turned on to use an exit node with my VPN. Nextcloud is running again without it. I'm not sure how I will use my Tailscale exit node without it, though. I'd kept userspace networking based on the Jellyfin setup, but during my testing I discovered that Nextcloud functions with or without it. I'm going to try the same test with Jellyfin, but it seems I've run out of attempts for now, based on the Caddy log:

INF ts=1722015874.9593458 logger=tls.acme_client msg=trying to solve challenge identifier=jellyfin.wallaby-gopher.ts.net challenge_type=http-01 ca=https://acme-staging-v02.api.letsencrypt.org/directory

ERR ts=1722015875.3191078 logger=tls.acme_client msg=challenge failed identifier=jellyfin.wallaby-gopher.ts.net challenge_type=http-01 problem={"type":"urn:ietf:params:acme:error:dns","title":"","detail":"DNS problem: NXDOMAIN looking up A for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain; DNS problem: NXDOMAIN looking up AAAA for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain","instance":"","subproblems":[]}

ERR ts=1722015875.3191268 logger=tls.acme_client msg=validating authorization identifier=jellyfin.wallaby-gopher.ts.net problem={"type":"urn:ietf:params:acme:error:dns","title":"","detail":"DNS problem: NXDOMAIN looking up A for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain; DNS problem: NXDOMAIN looking up AAAA for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain","instance":"","subproblems":[]} order=https://acme-staging-v02.api.letsencrypt.org/acme/order/156809383/18030175613 attempt=2 max_attempts=3

ERR ts=1722015875.319142 logger=tls.obtain msg=could not get certificate from issuer identifier=jellyfin.wallaby-gopher.ts.net issuer=acme-v02.api.letsencrypt.org-directory error=HTTP 400 urn:ietf:params:acme:error:dns - DNS problem: NXDOMAIN looking up A for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain; DNS problem: NXDOMAIN looking up AAAA for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain

ERR ts=1722015875.3191717 logger=tls.obtain msg=will retry error=[jellyfin.wallaby-gopher.ts.net] Obtain: [jellyfin.wallaby-gopher.ts.net] solving challenge: jellyfin.wallaby-gopher.ts.net: [jellyfin.wallaby-gopher.ts.net] authorization failed: HTTP 400 urn:ietf:params:acme:error:dns - DNS problem: NXDOMAIN looking up A for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain; DNS problem: NXDOMAIN looking up AAAA for jellyfin.wallaby-gopher.ts.net - check that a DNS record exists for this domain (ca=https://acme-staging-v02.api.letsencrypt.org/directory) attempt=2 retrying_in=120 elapsed=63.17544551 max_duration=2592000

INF ts=1722015992.1428125 logger=tls.obtain msg=releasing lock identifier=jellyfin.wallaby-gopher.ts.net

I will try again later, unless this isn’t just a “give it some time to cooldown” thing.

I’d like to throw some alternatives at you:

Firstly, ditch the Tailscale containers. Run Tailscale on your host and remove all that complexity there. You can make that host your exit node if you like, and you can have it advertise routes. Just let the host handle it all.
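On Arch, that would be roughly the following (just a sketch; the 192.168.1.0/24 subnet is simply the one your tailscale3 container was already advertising, and IP forwarding needs to be enabled for exit-node/subnet routing):

sudo pacman -S tailscale
sudo systemctl enable --now tailscaled
sudo tailscale up --advertise-exit-node --advertise-routes=192.168.1.0/24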

This has the benefit of Tailscale DNS working at the host level. That means that everything in a container can resolve things using Tailscale. That should immediately fix your ts.net DNS resolution problem inside containers.

As a possible second step, build Caddy with tailscale/caddy-tailscale, Tailscale's own module that lets Caddy create dedicated Tailscale "nodes" for your reverse-proxied sites.

I’ve tinkered with this solution recently and they’ve been doing great work with it and cleaning it up nicely. The module creates its own userspace Tailscale connections within Caddy itself for Caddy to listen and serve HTTP(S) on.

Then, simply only serve those Caddy sites on the Tailnet. Boom - non-globally-accessible, Tailscale-only, HTTPS services with custom node names on your Tailnet.

You’d have to do some learning and some tinkering, but it could be a pretty good solution for you.


I would be happy to run Tailscale on the host, but I think the issue is I'd only get one MagicDNS domain without multiple containers. I hope to run jellyfin.etc and nextcloud.etc. Additionally, I cannot use the host as the exit node because VPNs and Tailscale don't play together well. I really like the sound of tailscale/caddy-tailscale's capabilities, but I'm not sure how to use it with docker-compose. When I tried image: tailscale/caddy-tailscale I got: "Error response from daemon: pull access denied for tailscale/caddy-tailscale, repository does not exist or may require 'docker login': denied: requested access to the resource is denied." Does this have to be built outside of Docker as well?

I think either you or I are confused about how MagicDNS works for you here. MagicDNS registers each device name as its own DNS entry so you can resolve hostname foobar with dig foobar instead of needing dig foobar.example.com.

As for the tailnet domain, you only get one of those per tailnet anyway. Are you using containers to connect to multiple tailnets?

If you’re talking about search domains, you can specify I think as many as you like in the DNS settings in your Tailscale admin panel.

Can you elaborate? I might be missing something here but if you’re using a Tailscale exit node, why do you need a VPN? (The exit node acts as your VPN.) If you’re using the VPN on the same host you’re using as an exit node - why? (You should consider using the VPN directly on your client rather than exiting from your Tailscale node into a VPN to exit from the VPN’s node.)

That repository is not a container image, it is a module for Caddy. You can build it in Docker, you just need to specify a build: stanza.

Simplest implementation looks something like:

services:
  caddy:
    build:
      dockerfile_inline: |
        FROM caddy:2-builder AS builder
        RUN xcaddy build latest \
          --with https://github.com/tailscale/caddy-tailscale
        FROM caddy:2
        COPY --from=builder /usr/bin/caddy /usr/bin/caddy

There’s some more detail at https://hub.docker.com/_/caddy, look for “Adding custom Caddy modules”.
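After adding the build: stanza, rebuilding and starting it is just the standard Compose workflow, something like:

docker compose up -d --build caddy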


I was definitely unaware that you could just refer to the hostname; I thought MagicDNS was just translating an IP address like 100.122.199.7 to hostname.tailnet.ts.net. I also may have been mislabeling the different components of the Tailscale address. I don't particularly care how, but I need two distinct addresses to reach the different services. I'd tried computer_hostname.tailnet.ts.net/service1 and /service2, but that approach didn't seem to be supported, so I instead did service1.tailnet.ts.net and service2.tailnet.ts.net with a separate container each.

No, I just have the one tailnet: wallaby-gopher. It has to stay that way now because the HTTPS certificates get logged in a public ledger (Certificate Transparency) independent of Tailscale.

I think these just let you type a short name like jellyfin instead of the full jellyfin.wallaby-gopher.ts.net, but aren't actually necessary for the initial setup?

By VPN I mean a regular privacy VPN, like ExpressVPN. The problem is I will need Tailscale running on the host PC to access files in Nextcloud, so there needs to be some degree of separation between Tailscale and ExpressVPN.

Wow, this was way simpler than the Dockerfile I was attempting to write. Just a small note: the build only accepts github.com/tailscale/caddy-tailscale, without the https:// prefix. That said, neither Nextcloud nor Jellyfin connects now. I don't need to run anything like docker exec tailscalej tailscale --socket /tmp/tailscaled.sock cert jellyfin.wallaby-gopher.ts.net, right? I tried adapting it to the caddy container but get: "OCI runtime exec failed: exec failed: unable to start container process: exec: "tailscale": executable file not found in $PATH: unknown." Strangely, I also find the nodes do not want to go away even though ephemeral is set to true. I'd prefer not to use ephemeral, but each time I start, new nodes get created with a -1 suffix and the old ones continue to exist, so the website name wouldn't be consistent.


Error:

ERR ts=1722279306.2838244 logger=http.log.error msg=dial tcp 100.107.81.52:8096: connect: connection refused request={"remote_ip":"100.122.199.7","remote_port":"56696","client_ip":"100.122.199.7","proto":"HTTP/2.0","method":"GET","host":"jellyfin.wallaby-gopher.ts.net","uri":"/","headers":{"Sec-Gpc":["1"],"Sec-Fetch-Dest":["document"],"Sec-Fetch-User":["?1"],"User-Agent":["Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"],"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/png,image/svg+xml,*/*;q=0.8"],"Accept-Language":["en-US,en;q=0.5"],"Accept-Encoding":["gzip, deflate, br, zstd"],"Dnt":["1"],"Te":["trailers"],"Upgrade-Insecure-Requests":["1"],"Sec-Fetch-Mode":["navigate"],"Sec-Fetch-Site":["none"],"Priority":["u=0, i"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"jellyfin.wallaby-gopher.ts.net"}} duration=0.00119949 status=502 err_id=hh9vz3nd0 err_trace=reverseproxy.statusError (reverseproxy.go:1269)

ERR ts=1722279306.3645525 logger=http.log.error msg=dial tcp 100.107.81.52:8096: connect: connection refused request={"remote_ip":"100.122.199.7","remote_port":"56696","client_ip":"100.122.199.7","proto":"HTTP/2.0","method":"GET","host":"jellyfin.wallaby-gopher.ts.net","uri":"/favicon.ico","headers":{"Accept-Encoding":["gzip, deflate, br, zstd"],"Sec-Fetch-Dest":["image"],"Sec-Fetch-Site":["same-origin"],"Priority":["u=6"],"Te":["trailers"],"User-Agent":["Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"],"Accept":["image/avif,image/webp,image/png,image/svg+xml,image/*;q=0.8,*/*;q=0.5"],"Accept-Language":["en-US,en;q=0.5"],"Dnt":["1"],"Sec-Gpc":["1"],"Referer":["https://jellyfin.wallaby-gopher.ts.net/"],"Sec-Fetch-Mode":["no-cors"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"jellyfin.wallaby-gopher.ts.net"}} duration=0.000561729 status=502 err_id=5mwt7ijws err_trace=reverseproxy.statusError (reverseproxy.go:1269)
The error looks to be reverse-proxy related, but if I remove either the bind or the reverse_proxy directive, the Tailscale nodes don't even come online and I get no errors at all.

I can get a different error if I add :443 to the end of the site addresses in the Caddyfile; not sure if that's better or worse.

ERR ts=1722283449.054262 logger=tls.handshake msg=external certificate manager remote_ip=100.122.199.7 remote_port=35206 sni=nextcloud.wallaby-gopher.ts.net cert_manager=caddytls.Tailscale cert_manager_idx=0 error=Get "http://local-tailscaled.sock/localapi/v0/cert/nextcloud.wallaby-gopher.ts.net?type=pair": Get "http://local-tailscaled.sock/localapi/v0/cert/nextcloud.wallaby-gopher.ts.net?type=pair": context deadline exceeded

ERR ts=1722283460.0427012 logger=tls.handshake msg=external certificate manager remote_ip=100.103.44.32 remote_port=44230 sni=jellyfin.wallaby-gopher.ts.net cert_manager=caddytls.Tailscale cert_manager_idx=0 error=Get "http://local-tailscaled.sock/localapi/v0/cert/jellyfin.wallaby-gopher.ts.net?type=pair": Get "http://local-tailscaled.sock/localapi/v0/cert/jellyfin.wallaby-gopher.ts.net?type=pair": context deadline exceeded

Current caddy compose config:

  caddy:
    build:
        dockerfile_inline: |
          FROM caddy:2-builder AS builder
          RUN xcaddy build latest \
            --with github.com/tailscale/caddy-tailscale
          FROM caddy:2
          COPY --from=builder /usr/bin/caddy /usr/bin/caddy
    network_mode: host
    container_name: "caddy"
    hostname: caddy
    volumes:
      - /media/server/server/combo/caddy/Caddyfile:/etc/caddy/Caddyfile
      - /media/server/server/combo/caddy/data:/data
      - /media/server/server/combo/caddy/config:/config
      # get socket tailscale created in the shared volume and share it with caddy
      - /media/server/server/combo/caddy/certs:/certs
      - /media/server/server/combo/caddy/sites:/srv
      #for new node tailscale info
      - /media/server/custom/nextcloud
      - /media/server/custom/jellyfin
      # caddy expects the socket to be at /var/run/tailscale/tailscaled.sock
      - sock_volume1:/var/run/tailscale
      - sock_volume2:/var/run/tailscale
    restart: unless-stopped

and my current Caddyfile:

#for module
{
  tailscale {

    # If true, register ephemeral nodes that are removed after disconnect.
    # Default: false
    ephemeral true

    # Directory to store Tailscale state in. A subdirectory will be created for each node.
    # The default is to store state in the user's config dir (see os.UserConfDir).
    state_dir /media/server/custom

    # If true, run the Tailscale web UI for remotely managing the node. (https://tailscale.com/kb/1325)
    # Default: false
    webui true

    # Any number of named node configs can be specified to override global options.
    jellyfin {
      # Tailscale auth key used to register this node.
      auth_key tskey-auth-

      # If true, remove this node after disconnect.
      ephemeral true #otherwise I get duplicates

      # Hostname to request when registering this node.
      # Default: <node_name> used for this node configuration
      hostname jellyfin

      # Directory to store Tailscale state in for this node. No subdirectory is created.
      state_dir /media/server/custom/jellyfin

      # If true, run the Tailscale web UI for remotely managing this node.
      webui true
    }
    nextcloud {
      # Tailscale auth key used to register this node.
      auth_key tskey-auth-

      # If true, remove this node after disconnect.
      ephemeral true

      # Hostname to request when registering this node.
      # Default: <node_name> used for this node configuration
      hostname nextcloud

      # Directory to store Tailscale state in for this node. No subdirectory is created.
      state_dir /media/server/custom/nextcloud

      # If true, run the Tailscale web UI for remotely managing this node.
      webui true
    }
  }
}

https://jellyfin.wallaby-gopher.ts.net {
  bind tailscale/jellyfin
  reverse_proxy jellyfin:8096
}

https://nextcloud.wallaby-gopher.ts.net {
  bind tailscale/nextcloud
  reverse_proxy nextcloud:11000
}

To be comprehensive, MagicDNS is its own DNS resolver. When you install Tailscale it interposes this lightweight resolver in front of any DNS servers you’ve already configured.

It then intercepts DNS requests and returns the correct IP address for bare hostnames as well as ts.net FQDNs where applicable. Anything else it forwards on to your normal resolvers.
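For example, when MagicDNS is working, both of these should return the node's Tailscale IP (the bare name relies on the tailnet search domain being applied):

dig jellyfin.wallaby-gopher.ts.net
dig +search jellyfin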

Tailscale is a little bit opinionated on how this works. You can’t give a single Tailscale host arbitrary subdomains. Tailscale insists on a 1:1 ratio of node:subdomain. A node is, essentially, an individual Tailscale client. This is why you’ve found yourself having to try to install multiple Tailscale clients, in containers, in order to get multiple nodes that then work as separate subdomains.

The appeal of caddy-tailscale is that it creates individual nodes for you and has Caddy connect, essentially, as many separate nodes as you need to your Tailnet in a single running process. No more juggling containers!

Actually, your Tailnet domain (which is wallaby-gopher.ts.net) is automatically a search domain.

If you needed other search domains you can specify them there. I believe that most people do not and will not need them.

What I don’t understand at the moment is why you need your server to act as an exit node at all if you’ve already got a VPN.

Replacing a VPN is the primary use case for an exit node.

If you just need to access your server - you already can, because it’s on your Tailnet. If you need to access other hosts on your server’s LAN, you’d want to use a subnet router. If you want to use your server as a VPN to access the internet through - you’d use an exit node. But if you’re using a VPN service, you would use that instead of an exit node.

Ahh! D’oh. My mistake.

Tailscale isn’t running as a separate executable. It’s running inside Caddy as a module, and is only configurable and accessible via Caddy configuration.

They still seem to be connected, based on your screenshot. Is Caddy still running?

Are you properly persisting the node keys?

Failure to persist would result in caddy-tailscale using the provided auth key to generate new node keys. This creates a new node in your Tailnet.

Since auth keys have a limited lifetime, the current ideal solution for long-term use would be to persist a non-ephemeral node and disable key expiry on it. This would enable you to remove the auth key entirely from your config after the very first run and it should continue to work just fine. If it doesn’t, you know that caddy-tailscale can’t access the node keys for some reason (e.g. persistence failure).
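A sketch of what that could look like, reusing the node-config directives from the Caddyfile you posted above (the state_dir path here is illustrative; it just needs to point somewhere that is actually persisted):

{
	tailscale {
		jellyfin {
			# auth_key tskey-auth-...   # only needed until the node has registered once
			# 'ephemeral' omitted: the default is false, so the node persists
			hostname jellyfin
			state_dir /jellyfin   # illustrative path, must be a persisted mount
		}
	}
}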

Here we see that when you browsed to jellyfin.wallaby-gopher.ts.net, Caddy tried to dial 100.107.81.52:8096 and failed to connect.

We need to ask ourselves - is Caddy trying to do what we actually want it to do here? 100.107.81.52 is in the 100.64.0.0/10 pool that Tailscale assigns IP addresses from, but we want Caddy to reverse proxy over the default Docker compose network, right? We should be seeing Caddy try to connect to something in the 172.16.0.0/12 range. So why would it be trying to connect to a Tailscale IP?

From your screenshot we can see that 100.107.81.52 is actually the IP address of your jellyfin node in your Tailnet.

Recall that Magic DNS returns IP addresses for bare hostnames. Since jellyfin is a valid hostname for a node in your Tailnet, it must be Magic DNS resolving jellyfin to 100.107.81.52 instead of the IP of your Jellyfin container in Docker.

So how do you get Caddy to use the Docker IPs instead?

Well, first off, you’re still running Caddy in the host’s networking stack:

That means Caddy has no access to the default network for your compose project and no access to Docker’s internal DNS resolver for that network.

That means even if you stop Caddy from resolving the IP via Magic DNS, you can’t get it to resolve the IP via Docker. To solve the problem that way, you would need to put Caddy back into the default compose network.

As an alternative, you could expose port 8096 in your compose file for your Jellyfin container. That would make Jellyfin available at 100.107.81.52:8096 (assuming you’re running Tailscale directly on the host, now, which I’m assuming you are because Magic DNS is working on the host). I personally don’t like that idea because I’d rather have that part of the network isolated; if my services are available through Caddy, I don’t want the host to expose direct access to those ports even on the LAN if it’s not necessary. If you do that, though, use localhost:8096 instead of jellyfin etc; just to ensure it always works even if you make Tailscale changes.
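Just to illustrate that second approach (this isn't your current config): with Caddy on the host network and the services publishing their ports on the host, the Caddyfile upstreams would point at localhost, roughly:

jellyfin.wallaby-gopher.ts.net {
	reverse_proxy localhost:8096
}

nextcloud.wallaby-gopher.ts.net {
	reverse_proxy localhost:11000
}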

That's annoying. I don't think you need :443, but… it shouldn't be incorrect either. I'd say don't use :443 in the site addresses. We know it works without them because the earlier errors happen at a later stage (the connection to Caddy was successful; the error was in Caddy's reverse proxy connecting to the upstream server).


Wow, I’m learning way more about networking than that stupid college class I took ever taught me!

Unless I'm mistaken, routing my traffic through an exit node on the same computer would not actually disguise any traffic, and my ISP would still be able to see where I'm sending requests, whereas if ExpressVPN sits at that exit node all they can see is me pushing everything to a VPN. I could just run my VPN on the host machine, but then I won't be able to access my tailnet because the two don't like each other. Before I migrated to Linux I was able to split-tunnel the VPN so I could have both running, but that functionality isn't available on Linux.

Yes, new instances show as connected while Caddy is running; they disconnect when Caddy is off, but remain in the admin panel. They will not reconnect when docker-compose up is run. Even if the key is commented out, ephemeral is false, and key expiration is disabled, the Caddy logs just ask me to visit a link to authenticate a new node anyway.

Okay, then I think this is a persistence problem. With or without the above settings, the folder where the nodes are supposed to put their state, e.g. for the nextcloud node
state_dir /media/server/custom/nextcloud
appears to be empty.

Caddy DOES have access to these volumes per the compose file:

volumes:
  - /media/server/custom/nextcloud
  - /media/server/custom/jellyfin

This is the only error I can get now that I removed host networking from Caddy. If I'm reading it right, it just means Caddy can't access the authentication record because of the folder issue. Strangely, it's the same error as with the :443 addition.

ERR ts=1722283449.054262 logger=tls.handshake msg=external certificate manager remote_ip=100.122.199.7 remote_port=35206 sni=nextcloud.wallaby-gopher.ts.net cert_manager=caddytls.Tailscale cert_manager_idx=0 error=Get "http://local-tailscaled.sock/localapi/v0/cert/nextcloud.wallaby-gopher.ts.net?type=pair": Get "http://local-tailscaled.sock/localapi/v0/cert/nextcloud.wallaby-gopher.ts.net?type=pair": context deadline exceeded

ERR ts=1722283460.0427012 logger=tls.handshake msg=external certificate manager remote_ip=100.103.44.32 remote_port=44230 sni=jellyfin.wallaby-gopher.ts.net cert_manager=caddytls.Tailscale cert_manager_idx=0 error=Get "http://local-tailscaled.sock/localapi/v0/cert/jellyfin.wallaby-gopher.ts.net?type=pair": Get "http://local-tailscaled.sock/localapi/v0/cert/jellyfin.wallaby-gopher.ts.net?type=pair": context deadline exceeded

I re-added ports to Caddy and thought, maybe they're not communicating over the default network? So I tried adding a common network to each container: same error. Then I thought maybe the issue is that I have too many things named jellyfin/nextcloud in the Caddyfile and the same DNS problem is continuing, so I renamed the service definition to next1 instead of nextcloud and used reverse_proxy next1:11000 in the Caddyfile. I've tried different combinations of all of these with the same error as above, so I think I'm being held up by the persistence issue, which I cannot find documentation on fixing. Thank you for your help so far.

You’re pretty much right about this.

That said - I don’t think the solution you’re going for (containerising Tailscale) is a very robust one and I think it’s contributing a bit conceptually to the problems getting your setup working.

Your requirements are fairly straightforward - you need your client PCs to be able to access your Tailnet (and therefore your server) but you also need those client PCs to route internet-bound traffic through your anonymising VPN. A full-tunnel VPN interferes with traffic outbound to Tailscale, which means running it on your client PCs is not feasible. It also might mean running it on your server is not feasible if your server needs outbound access into your Tailnet.

I’d probably recommend looking into an alternative solution. I have two suggestions: firstly, Mullvad VPN integrate directly into Tailscale to provide VPN endpoints as exit nodes in your Tailnet. This would be a turn-key solution to use Tailscale and anonymise your internet traffic at the same time without having to manage any extra complexity. See: Mullvad exit nodes · Tailscale Docs

As an alternative, dedicate a VM just for this. It would have only one job: run Tailscale as an exit node, sending traffic into a full tunnel VPN. Then, use your dedicated VM as your exit node on your client PCs to anonymise them while retaining Tailnet access. If you want to anonymise your server traffic, you can have your server use the dedicated VM as an exit node, too. Since your dedicated VM has no need to reach out to your Tailnet, because its entire job is limited to providing an exit node through the VPN, this should work without a problem - bidirectional incoming connections should be fine. For example: https://www.reddit.com/r/Tailscale/comments/sjyc6k/route_exit_node_traffic_into_vpn/hvmeiep/

Tailscale themselves also reference the ability for an exit node to support a VPN.

Exit nodes only support one VPN at a time.

https://tailscale.com/kb/1105/other-vpns#workaround-split-tunnels

If you can solve your Tailscale+VPN problem with a turn-key solution or in a dedicated VM, that significantly cuts down the amount of problem solving you have to do on your server.

I'd probably try something along the lines of docker compose exec caddy touch /media/server/custom/nextcloud/testfile and see if the testfile appears in the bind-mounted directory or not. Surefire way to tell whether things are connected the way you think they are.

That is exactly what I would’ve suggested and it’s a great way to rule out other possible name resolution issues.

If that doesn’t work, something is definitely wrong at the Docker network level. I find myself wondering if it’s related to the use of the Nextcloud all-in-one container. The AIO team do some pretty wild stuff by talking directly to your Docker socket and configuring networks, containers etc directly and exactly what they’re doing is not obvious just looking at the compose file. The “mastercontainer”, as far as I am aware, just contains the orchestrator and not the actual instance of Nextcloud itself. If that’s the case and it’s on a different Docker network and exposing ports to host - we could work around that if necessary, there’s a trick we can use to get Caddy inside a container to talk to the Docker host.

But first, did you try renaming the Jellyfin compose service too? Did that also fail? Jellyfin should be a simpler set of circumstances to test, here.


While working on Nextcloud/Jellyfin I have removed the gluetun and related Tailscale container portions, so they shouldn't be contributing to the problem. I am confused about the difference between gluetun and a virtual machine; it seems a virtual machine would accomplish the same thing but add overhead. This person had it working: https://lemmy.world/post/7281194. I would be happy to use Mullvad, it's just that I already have a VPN purchased and don't want to waste the subscription period while I have it.

Wow, that totally didn't create the testfile. I found a post which suggested I do sudo chmod 755 /etc/caddy after sudo chown -R root:root /etc/caddy. No dice there unfortunately, so I tried changing the permissions via the GUI in Dolphin's admin mode.


No testfile this way either. Is there another chown command I should be using, or a better file path that Docker already allows? I did test touch from the host's terminal and that created the testfile with no problem.

Yes, I changed the name in the Caddyfile to match Jellyfin's container name, so both were jellyfin-ts instead of just jellyfin. I'm definitely held up by the file permissions at the moment.

Here’s what I know:

  • A virtual machine has its own entire kernel and networking stack.
  • Where for example ZeroTier makes one virtual interface for each client+network, Tailscale wants to own the singular tailscale0 interface on the kernel TUN driver, so you can’t run two kernel Tailscale instances on the same kernel.
  • Containerisation shares a kernel, so you’d need userspace if you want multiple Tailscales.
  • Userspace wireguard has significant performance overhead compared to kernel wireguard.
  • The Docker container for Tailscale runs in userspace by default.

Here is what I’d personally be worried about in your scenario:

  • Containerised Tailscale can’t access the Docker host’s DNS resolution mechanisms. If that’s the case, Magic DNS wouldn’t work directly on the host unless you also run it on the host.
  • Running Tailscale on the host as well as inside containers means you would have to pick whether the server host itself runs in userspace and has degraded performance or whether the container routing traffic to the anonymising VPN has degraded performance. (Or both.)

Meanwhile, if you have a separate VM with its own kernel and network stack, you can have a host-level Tailscale with a host-level VPN in that VM and simplify that stack for maximum performance. You can then have a single host-level Tailscale on your server for maximum performance and Magic DNS compatibility for all containers hosted on that server.

For the cost of a bit of RAM (which - I don’t know if your server is especially tight on or not, but kernel Tailscale+VPN is mostly network-bound, not RAM or CPU intensive, so you could get by with a pair of cores and maybe 1GB RAM), you get a conceptually simpler and more performant solution. In my assessment, this solution has the maximum benefit, for minimal effort, and minimal conceptual complexity, at the cost of a minor investment of additional hypervisor resources.

If you can get it to work, though, and you prefer to containerise it on a single server, then that’s absolutely respectable too.
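For what it’s worth, if you do keep Tailscale containerised, the official tailscale/tailscale image defaults to userspace networking; kernel mode needs the TUN device passed in. A rough sketch (the hostname, key, and volume name are placeholders, not from your compose file):

services:
  tailscale:
    image: tailscale/tailscale:latest
    hostname: my-exit-node             # placeholder node name on the tailnet
    environment:
      - TS_AUTHKEY=tskey-auth-xxxx     # placeholder; use your own auth key
      - TS_STATE_DIR=/var/lib/tailscale
      - TS_USERSPACE=false             # default is true (userspace networking); false uses the kernel TUN device below
    volumes:
      - tailscale_state:/var/lib/tailscale
    devices:
      - /dev/net/tun:/dev/net/tun      # needed for kernel networking mode
    cap_add:
      - NET_ADMIN
    restart: unless-stopped

volumes:
  tailscale_state: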

Per Services top-level elements | Docker Docs, the syntax should be in the form of VOLUME:CONTAINER_PATH, where VOLUME is a host path on the platform hosting containers (bind mount) or a volume name. I’m guessing it’s hung up on this, although I for sure would’ve assumed Docker would throw an error of some kind if you didn’t have a valid mount specified. I wonder what, exactly, it’s interpreting this volume stanza as?
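For illustration, the two short-syntax forms look something like this (the host path and volume name here are placeholders):

services:
  caddy:
    volumes:
      - /srv/caddy/site:/usr/share/caddy   # bind mount: absolute host path, then container path
      - caddy_data:/data                   # named volume, then container path

volumes:
  caddy_data: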

You could docker inspect the Caddy container if you want, and try to see how Docker interpreted and implemented those volumes.
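Something like this should dump what Docker actually made of them (using the container name caddy from your compose file):

docker inspect caddy --format '{{ json .Mounts }}'

Each entry should show a Type of bind or volume plus the Source and Destination Docker resolved; anything that doesn’t look like the path you intended is a red flag.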


I’ll give it a go: I installed it like this via KVM/QEMU


This is a minimal installation (good god, so much terminal stuff).
It behaves the same way as my host computer: both work fine independently, but as soon as both are enabled you cannot connect to the internet.

Feels like I’m back to square one on the host computer lol.

Okay, I absolutely donked up the filepath; I should have noticed all the others had a colon. After I updated the files to this format, /media/server/custom/jellyfin:/jellyfin, I can touch that file with docker compose exec caddy touch /nextcloud/testfile
Still no luck with commenting out the auth key and reconnecting, though. I was thinking maybe the Caddyfile doesn’t need the :/jellyfin and :/nextcloud, but there was no change with or without them. The file remains empty when I start/end it.

Sometimes, when the server has just started, Jellyfin will work for a second with the logo and then say it can’t find a server. I try entering every IP/address combination without luck, then it will time out. No new errors/logs in Caddy or Tailscale. Nextcloud will be a blank page less often during this period.


and I get the same errors again:

ERR ts=1722438196.1847224 logger=tls.handshake msg=external certificate manager remote_ip=100.122.199.7 remote_port=35090 sni=nextcloud.wallaby-gopher.ts.net cert_manager=caddytls.Tailscale cert_manager_idx=0 error=Get "http://local-tailscaled.sock/localapi/v0/cert/nextcloud.wallaby-gopher.ts.net?type=pair": Get "http://local-tailscaled.sock/localapi/v0/cert/nextcloud.wallaby-gopher.ts.net?type=pair": context deadline exceeded

ERR ts=1722438216.5582976 logger=tls.handshake msg=external certificate manager remote_ip=100.122.199.7 remote_port=39198 sni=jellyfin.wallaby-gopher.ts.net cert_manager=caddytls.Tailscale cert_manager_idx=0 error=Get "http://local-tailscaled.sock/localapi/v0/cert/jellyfin.wallaby-gopher.ts.net?type=pair": Get "http://local-tailscaled.sock/localapi/v0/cert/jellyfin.wallaby-gopher.ts.net?type=pair": context deadline exceeded

At this point I have left the service names changed as a best practice.

services:
  jellyfin-ts:
  nextcloud-ts:

Hmm. I’m a little stumped by that one. Did you have this working before in Docker?

This shouldn’t be a problem on your exit node VM. The issue arises when the VPN sets aggressive firewall rules to prevent any internet traffic that doesn’t go through the VPN itself. This prevents outbound Tailscale traffic (but should leave inbound connections just fine).

When you connect Tailscale to another exit node, it does a similar thing - it aggressively firewalls to ensure all outgoing traffic goes through the exit node. If you have a VPN enabled and another exit node enabled, they’ll prevent each other from working.

That said, on your VM you shouldn’t be using any other exit nodes. It itself is the exit node. So I expect internet access to work just fine, and I expect incoming Tailscale connections to work, but I expect outgoing Tailscale connections to break. That’s the acceptable use case for the VM, which is only intended to do one thing: accept incoming connections from your Tailnet (other devices using it as an exit node) and send that traffic out to your VPN.

Did you run it at least once with the auth key uncommented? It needs the auth key to create the files on first run; it can only be removed once the node keys are already on disk.
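In other words, the first run should look roughly like this (a sketch; key snipped), and only after the state files exist under /tailscale can the auth_key line be commented out:

{
  tailscale {
    auth_key tskey-auth-xxxx   # needed on the first run so tsnet can register and write its state
    state_dir /tailscale
  }
}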

What was in the Caddy logs when you ran it after fixing the volumes?


I ran a very basic setup to proof-of-concept this. The machine is a NixOS VM running in Proxmox and it has Docker installed and Tailscale installed and joined to my Tailnet.

Here is the entire configuration:

whitestrake at 🌐 pascal in /opt/docker
❯ cat docker-compose.yml
configs:
  Caddyfile:
    content: |
      {
        tailscale {
          auth_key [snip]
          state_dir /tailscale
        }
      }
      https://whoami.fell-monitor.ts.net {
        bind tailscale/whoami
        reverse_proxy whoami
      }

volumes:
  caddy:
  tailscale:

services:
  caddy:
    build:
      dockerfile_inline: |
        FROM caddy:2-builder AS builder
        RUN xcaddy build latest \
          --with github.com/tailscale/caddy-tailscale
        FROM caddy:2
        COPY --from=builder /usr/bin/caddy /usr/bin/caddy
    restart: unless-stopped
    volumes:
      - caddy:/data
      - tailscale:/tailscale
    configs:
      - source: Caddyfile
        target: /etc/caddy/Caddyfile

  whoami:
    image: traefik/whoami
    restart: unless-stopped

Here are the logs from startup:

whitestrake at 🌐 pascal in /opt/docker
❯ docker compose up
Attaching to caddy-1, whoami-1
whoami-1  | 2024/08/01 00:02:09 Starting up on port 80
caddy-1   | {"level":"info","ts":1722470529.3187523,"msg":"using config from file","file":"/etc/caddy/Caddyfile"}
caddy-1   | {"level":"info","ts":1722470529.319486,"msg":"adapted config to JSON","adapter":"caddyfile"}
caddy-1   | {"level":"warn","ts":1722470529.3195,"msg":"Caddyfile input is not formatted; run 'caddy fmt --overwrite' to fix inconsistencies","adapter":"caddyfile","file":"/etc/caddy/Caddyfile","line":2}
caddy-1   | {"level":"info","ts":1722470529.32001,"logger":"admin","msg":"admin endpoint started","address":"localhost:2019","enforce_origin":false,"origins":["//localhost:2019","//[::1]:2019","//127.0.0.1:2019"]}
caddy-1   | {"level":"info","ts":1722470529.3201323,"logger":"http.auto_https","msg":"server is listening only on the HTTPS port but has no TLS connection policies; adding one to enable TLS","server_name":"srv0","https_port":443}
caddy-1   | {"level":"info","ts":1722470529.3201504,"logger":"http.auto_https","msg":"enabling automatic HTTP->HTTPS redirects","server_name":"srv0"}
caddy-1   | {"level":"info","ts":1722470529.3202755,"logger":"tls.cache.maintenance","msg":"started background certificate maintenance","cache":"0xc000149100"}
caddy-1   | {"level":"info","ts":1722470529.3373108,"logger":"tls","msg":"cleaning storage unit","storage":"FileStorage:/data/caddy"}
caddy-1   | {"level":"info","ts":1722470529.3374925,"logger":"tls","msg":"finished cleaning storage units"}
caddy-1   | {"level":"info","ts":1722470529.3393836,"logger":"tailscale","msg":"tsnet running state path /tailscale/whoami/tailscaled.state"}
caddy-1   | {"level":"info","ts":1722470529.357314,"logger":"tailscale","msg":"tsnet starting with hostname \"whoami\", varRoot \"/tailscale/whoami\""}
caddy-1   | {"level":"info","ts":1722470530.3694124,"logger":"tailscale","msg":"LocalBackend state is NeedsLogin; running StartLoginInteractive..."}
caddy-1   | {"level":"info","ts":1722470530.3694654,"logger":"http","msg":"enabling HTTP/3 listener","addr":"whoami:443"}
caddy-1   | {"level":"info","ts":1722470533.2040713,"msg":"connection doesn't allow setting of receive buffer size. Not a *net.UDPConn?. See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details."}
caddy-1   | {"level":"info","ts":1722470533.204147,"logger":"http.log","msg":"server running","name":"srv0","protocols":["h1","h2","h3"]}
caddy-1   | {"level":"info","ts":1722470533.2041695,"logger":"http.log","msg":"server running","name":"remaining_auto_https_redirects","protocols":["h1","h2","h3"]}
caddy-1   | {"level":"info","ts":1722470533.2043512,"msg":"autosaved config (load with --resume flag)","file":"/config/caddy/autosave.json"}
caddy-1   | {"level":"info","ts":1722470533.204362,"msg":"serving initial configuration"}
caddy-1   | {"level":"info","ts":1722470535.3704717,"logger":"tailscale","msg":"AuthLoop: state is Running; done"}

And here is the result from curling the new node:

whitestrake at 🌐 pascal in /opt/docker
❯ curl https://whoami.fell-monitor.ts.net
Hostname: 78f2a9f88e90
IP: 127.0.0.1
IP: 172.18.0.3
RemoteAddr: 172.18.0.2:42926
GET / HTTP/1.1
Host: whoami.fell-monitor.ts.net
User-Agent: curl/8.7.1
Accept: */*
Accept-Encoding: gzip
X-Forwarded-For: 100.106.140.61
X-Forwarded-Host: whoami.fell-monitor.ts.net
X-Forwarded-Proto: https

I can also see that node keys were persisted:

whitestrake at 🌐 pascal in /opt/docker
❯ dc run --rm caddy ls -al /tailscale/whoami
total 24
drwx------    4 root     root          4096 Aug  1 00:09 .
drwxr-xr-x    3 root     root          4096 Aug  1 00:02 ..
drwx------    2 root     root          4096 Aug  1 00:02 certs
drwx------    3 root     root          4096 Aug  1 00:02 files
-rw-------    1 root     root           209 Aug  1 00:02 tailscaled.log.conf
-rw-------    1 root     root             0 Aug  1 00:09 tailscaled.log1.txt
-rw-------    1 root     root             0 Aug  1 00:09 tailscaled.log2.txt
-rw-------    1 root     root          2775 Aug  1 00:09 tailscaled.state

Okay, so your config super doesn’t work with my file paths. So I adapted to the way of the volume! Then your config file worked on my machine: curl, persistence, and all. I copied it over to the Jellyfin compose file and we now have successful persistence!
Unfortunately it’s still not connecting, and I’m getting an error from Nextcloud, but not Jellyfin:

ERR ts=1722534209.9936292 logger=http.log.error msg=dial tcp 172.20.0.4:11000: connect: connection refused request={"remote_ip":"100.122.199.7","remote_port":"49124","client_ip":"100.122.199.7","proto":"HTTP/1.1","method":"GET","host":"nextcloud.wallaby-gopher.ts.net","uri":"/index.php/204","headers":{"Accept":["*/*"],"X-Request-Id":["6bab3a6e-d821-49fc-8c93-4bcba635de7f"],"Cookie":["REDACTED"],"Accept-Encoding":["gzip, deflate"],"Accept-Language":["en-US,*"],"Authorization":["REDACTED"],"User-Agent":["Mozilla/5.0 (Linux) mirall/3.13.2git (Nextcloud, arch-6.10.2-arch1-1 ClientArchitecture: x86_64 OsArchitecture: x86_64)"],"Connection":["Keep-Alive"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"","server_name":"nextcloud.wallaby-gopher.ts.net"}} duration=0.001720952 status=502 err_id=2azp60e6z err_trace=reverseproxy.statusError (reverseproxy.go:1269)

Jellyfin opens here, but again no IPs work; at least now it doesn’t time out!

Noticing yours doesn’t have a port, I tried removing it after the reverse_proxy, and I can get an error with either Jellyfin or Nextcloud:

ERR ts=1722534699.102591 logger=http.log.error msg=dial tcp 172.18.0.3:80: connect: connection refused request={"remote_ip":"100.122.199.7","remote_port":"42278","client_ip":"100.122.199.7","proto":"HTTP/2.0","method":"GET","host":"jellyfin.wallaby-gopher.ts.net","uri":"/web/serviceworker.js","headers":{"User-Agent":["Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"],"Accept":["*/*"],"Sec-Gpc":["1"],"Sec-Fetch-Dest":["serviceworker"],"Priority":["u=4"],"Te":["trailers"],"Accept-Language":["en-US,en;q=0.5"],"If-None-Match":["\"1dadb2d244eb780\""],"Cache-Control":["max-age=0"],"Accept-Encoding":["gzip, deflate, br, zstd"],"Service-Worker":["script"],"Sec-Fetch-Mode":["same-origin"],"Sec-Fetch-Site":["same-origin"],"If-Modified-Since":["Sun, 21 Jul 2024 05:16:29 GMT"],"Dnt":["1"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"jellyfin.wallaby-gopher.ts.net"}} duration=0.000489597 status=502 err_id=18cyk0rne err_trace=reverseproxy.statusError (reverseproxy.go:1269)

Seeing as these are the same error but with the wrong port now, I must be doing something wrong with the reverse proxy. At least we can see it’s trying to dial the right IP now. I tried adding :443 again, no dice. I tried changing the container names with -ts again, no change. I’m not sure what else I could do, but it feels like we’re close now!
Here’s the full compose file, with the cleaner Caddyfile portion thanks to you:

configs:
  Caddyfile:
    content: |
      {
        tailscale {
          #auth_key tskey-auth-[snip]
          state_dir /tailscale
        }
      }
      https://jellyfin.wallaby-gopher.ts.net {
        bind tailscale/jellyfin
        reverse_proxy jellyfin:8096
      }
      https://nextcloud.wallaby-gopher.ts.net {
        bind tailscale/nextcloud
        reverse_proxy nextcloud:11000
      }

volumes:
  nextcloud_aio_mastercontainer:
    name: nextcloud_aio_mastercontainer # This line is not allowed to be changed
  caddy:
  tailscale:
services:
  jellyfin:
    image: jellyfin/jellyfin
    container_name: "jellyfin-ts"
    user: 1000:1000
    volumes:
      - /media/server/server/jellyfin-server/config:/config
      - /media/server/server/jellyfin-server/cache:/cache
      # ro means read only, we don't want jellyfin accidentally deleting our files
      - /media/16tb/Shows and Movies/Movies:/Movies:ro
      - /media/16tb/Shows and Movies/Shows:/Shows:ro
    restart: unless-stopped
    depends_on:
      - caddy

  caddy:
    build:
        dockerfile_inline: |
          FROM caddy:2-builder AS builder
          RUN xcaddy build latest \
            --with github.com/tailscale/caddy-tailscale
          FROM caddy:2
          COPY --from=builder /usr/bin/caddy /usr/bin/caddy
    hostname: caddy
    container_name: "caddy"
    ports:
      - "80:80"
      - "443:443"
      - "443:443/udp"
    volumes:
      - caddy:/data
      - tailscale:/tailscale
    configs:
      - source: Caddyfile
        target: /etc/caddy/Caddyfile
    restart: unless-stopped


  #nextcloud
  nextcloud:
    image: nextcloud/all-in-one:latest
    restart: always
    container_name: nextcloud-aio-mastercontainer # This line is not allowed to be changed as otherwise AIO will not work correctly
    volumes:
      - nextcloud_aio_mastercontainer:/mnt/docker-aio-config # This line is not allowed to be changed as otherwise the built-in backup solution will not work
      - /var/run/docker.sock:/var/run/docker.sock:ro # May be changed on macOS, Windows or docker rootless. See the applicable documentation. If adjusting, don't forget to also set 'WATCHTOWER_DOCKER_SOCKET_PATH'!
    ports:
      - 8080:8080
    environment: # Is needed when using any of the options below
      # - AIO_DISABLE_BACKUP_SECTION=false # Setting this to true allows to hide the backup section in the AIO interface. See https://github.com/nextcloud/all-in-one#how-to-disable-the-backup-section
      #- SKIP_DOMAIN_VALIDATION=true #might not be helping?
      - APACHE_PORT=11000 # Is needed when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else). See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
      - APACHE_IP_BINDING=0.0.0.0 # Should be set when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else) that is running on the same host. See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
      # - BORG_RETENTION_POLICY=--keep-within=7d --keep-weekly=4 --keep-monthly=6 # Allows to adjust borgs retention policy. See https://github.com/nextcloud/all-in-one#how-to-adjust-borgs-retention-policy
      # - COLLABORA_SECCOMP_DISABLED=false # Setting this to true allows to disable Collabora's Seccomp feature. See https://github.com/nextcloud/all-in-one#how-to-disable-collaboras-seccomp-feature
      # - NEXTCLOUD_MOUNT=/mnt/ # Allows the Nextcloud container to access the chosen directory on the host. See https://github.com/nextcloud/all-in-one#how-to-allow-the-nextcloud-container-to-access-directories-on-the-host
      - NEXTCLOUD_UPLOAD_LIMIT=1G # Can be adjusted if you need more. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-upload-limit-for-nextcloud
      - NEXTCLOUD_MAX_TIME=3600 # Can be adjusted if you need more. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-max-execution-time-for-nextcloud
      - NEXTCLOUD_MEMORY_LIMIT=1024M # Can be adjusted if you need more. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-php-memory-limit-for-nextcloud
      # - NEXTCLOUD_TRUSTED_CACERTS_DIR=/path/to/my/cacerts # CA certificates in this directory will be trusted by the OS of the nexcloud container (Useful e.g. for LDAPS) See See https://github.com/nextcloud/all-in-one#how-to-trust-user-defined-certification-authorities-ca
      # - NEXTCLOUD_ADDITIONAL_APKS=imagemagick # This allows to add additional packages to the Nextcloud container permanently. Default is imagemagick but can be overwritten by modifying this value. See https://github.com/nextcloud/all-in-one#how-to-add-os-packages-permanently-to-the-nextcloud-container
      # - NEXTCLOUD_ADDITIONAL_PHP_EXTENSIONS=imagick # This allows to add additional php extensions to the Nextcloud container permanently. Default is imagick but can be overwritten by modifying this value. See https://github.com/nextcloud/all-in-one#how-to-add-php-extensions-permanently-to-the-nextcloud-container
      # - NEXTCLOUD_ENABLE_DRI_DEVICE=true # This allows to enable the /dev/dri device in the Nextcloud container. ⚠️⚠️⚠️ Warning: this only works if the '/dev/dri' device is present on the host! If it should not exist on your host, don't set this to true as otherwise the Nextcloud container will fail to start! See https://github.com/nextcloud/all-in-one#how-to-enable-hardware-transcoding-for-nextcloud
      - TALK_PORT=3478 # This allows to adjust the port that the talk container is using. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-talk-port
      # - WATCHTOWER_DOCKER_SOCKET_PATH=/var/run/docker.sock # Needs to be specified if the docker socket on the host is not located in the default '/var/run/docker.sock'. Otherwise mastercontainer updates will fail. For macos it needs to be '/var/run/docker.sock'
      # networks: # Is needed when you want to create the nextcloud-aio network with ipv6-support using this file, see the network config at the bottom of the file
      # - nextcloud-aio # Is needed when you want to create the nextcloud-aio network with ipv6-support using this file, see the network config at the bottom of the file
      - trusted_domains=nextcloud.wallaby-gopher.ts.net #should I use dbhost=? #Think both are wrong according to https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md#adapting-the-sample-web-server-configurations-below
    depends_on: 
      - caddy

ERR ts=1722534209.9936292 logger=http.log.error msg=dial tcp 172.20.0.4:11000: connect: connection refused [...]

Caddy resolved the nextcloud service to IP 172.20.0.4 and tried to connect on port 11000, so Caddy is working as configured. The service refused the connection, though.

I have to assume this is due to the way the Nextcloud AIO works… Yep, looking at Caddy Docker Compose Example · nextcloud/all-in-one · Discussion #575 · GitHub I can see evidence of this as they’ve had to work around the problem. They solved the issue by putting Caddy on the host networking stack, though. That won’t work for you because you want to be inside the Compose network to proxy to Jellyfin, too.

So - Nextcloud doesn’t run inside the mastercontainer. It runs in a separate container that the mastercontainer spins up. This separate container listens on the Docker host on the port specified in the APACHE_PORT env var.

That means you need Caddy to talk to the Docker host, not to the Nextcloud service inside the Compose network.
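If you want to confirm that from the Docker host first, something like this should show the AIO-managed containers and whether anything is listening on the APACHE_PORT you set (11000); a sketch, so adjust to taste:

# See the containers AIO has spun up and their published ports
docker ps --format 'table {{.Names}}\t{{.Ports}}'

# Check that something on the host is listening on port 11000
ss -tlnp | grep 11000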

Add this to your Caddy service:

    extra_hosts:
      - "host.docker.internal:host-gateway"

And change your Caddyfile from: reverse_proxy nextcloud:11000

To: reverse_proxy host.docker.internal:11000

This will have Caddy connect to the host itself, on which Nextcloud AIO will establish a port 11000 listener for your Nextcloud app.
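You can sanity-check it from inside the Caddy container afterwards; the caddy image is Alpine-based, so busybox wget should be available (an HTTP error response is fine here - what matters is that it’s no longer connection refused):

docker compose exec caddy wget -q -O - http://host.docker.internal:11000 | head -c 200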

Did you try https://jellyfin.wallaby-gopher.ts.net in that box instead of an IP address?

Don’t remove the port. It needs to be going to the correct port. If you remove the port, Caddy will assume port 80 (the default for HTTP, which is the assumed scheme unless you specify an HTTPS upstream). I have no port on my whoami proxy because whoami listens for HTTP on port 80 by default, so there was no need to specify it.
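Put another way, these are the sorts of upstream addresses involved (a sketch, not your actual Caddyfile; example.internal is a placeholder):

reverse_proxy whoami                    # no port given, so Caddy assumes HTTP on port 80
reverse_proxy jellyfin:8096             # Jellyfin serves HTTP on 8096, so the port must be explicit
reverse_proxy https://example.internal  # explicit HTTPS scheme implies port 443 and TLS to the upstream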

I think you’ve said you’ve tried this a few times over our various troubleshooting, but I’m not sure why - can you tell me what indicated that this might be a possible solution? I don’t think it made sense at any point to throw port 443 on anything. Caddy will handle HTTPS; you shouldn’t need to worry about that. And your services aren’t running on port 443 - they’re serving, as far as I know, HTTP on other ports.

You could move your configuration back to a separate volume-mounted Caddyfile if you liked.
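If you did go back that way, it’s just another bind mount on the Caddy service, something like this (the relative path is an example):

services:
  caddy:
    volumes:
      - caddy:/data
      - tailscale:/tailscale
      - ./Caddyfile:/etc/caddy/Caddyfile:ro   # replaces the inline configs: block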

The container_name: "jellyfin-ts" doesn’t serve much purpose at all since Caddy is proxying to jellyfin and Docker DNS is using the service name (not the container name) to resolve an IP for it.
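That is, in a layout like this it’s the service key, not container_name, that the Caddyfile target refers to (sketch):

services:
  jellyfin:                       # "reverse_proxy jellyfin:8096" points at this service name
    container_name: jellyfin-ts   # shows up in docker ps/exec; the Caddyfile never references it
    image: jellyfin/jellyfin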


Ayyy, you got Nextcloud working again!! Nice job!! YESSS!!! I was sooo close to giving up. Thank you!! I figured out Jellyfin’s problem (it was me, so you can skip the troubleshooting below). Here’s what was happening:


When I had Tailscale working by itself, it was not necessary to type in a server.
Here’s a list of other IPs I’ve tried:

https://jellyfin.wallaby-gopher.ts.net:8096
https://jellyfin.wallaby-gopher.ts.net:8920
172.18.0.3
172.18.0.3:8096
https://172.18.0.3
100.102.29.56
100.102.29.56:8096
https://100.102.29.56

Looking back at the guide, they threw this in the Caddyfile:

127.0.0.1 {
	reverse_proxy jellyfin:8096
}

Looking at the Jellyfin logs, I thought maybe I need to do the same, even though in Portainer it’s labeled as 172.18.0.3:

[01:47:26] [INF] [1] Jellyfin.Networking.Manager.NetworkManager: Defined LAN subnets: ["127.0.0.1/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

[01:47:26] [INF] [1] Jellyfin.Networking.Manager.NetworkManager: Defined LAN exclusions: []

[01:47:26] [INF] [1] Jellyfin.Networking.Manager.NetworkManager: Used LAN subnets: ["127.0.0.1/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

[01:47:26] [INF] [1] Jellyfin.Networking.Manager.NetworkManager: Filtered interface addresses: ["127.0.0.1", "172.18.0.3"]

[01:47:26] [INF] [1] Jellyfin.Networking.Manager.NetworkManager: Bind Addresses ["0.0.0.0"]

[01:47:26] [INF] [1] Jellyfin.Networking.Manager.NetworkManager: Remote IP filter is Allowlist

All it does is add a warning in Caddy, though, and I get the same “add server” page:

WRN ts=1722563249.706316 logger=tls msg=stapling OCSP error=no OCSP stapling for [127.0.0.1]: no OCSP server specified in certificate identifiers=["127.0.0.1"]

So I thought I’d try the same thing as Nextcloud:

ERR ts=1722563572.2223425 logger=http.log.error msg=dial tcp 172.17.0.1:8096: connect: connection refused request={"remote_ip":"100.122.199.7","remote_port":"60034","client_ip":"100.122.199.7","proto":"HTTP/2.0","method":"GET","host":"jellyfin.wallaby-gopher.ts.net","uri":"/web/serviceworker.js","headers":{"Dnt":["1"],"Sec-Fetch-Mode":["same-origin"],"Te":["trailers"],"User-Agent":["Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"],"Accept":["*/*"],"Service-Worker":["script"],"Sec-Gpc":["1"],"Sec-Fetch-Dest":["serviceworker"],"Sec-Fetch-Site":["same-origin"],"Priority":["u=4"],"Cache-Control":["max-age=0"],"Accept-Language":["en-US,en;q=0.5"],"Accept-Encoding":["gzip, deflate, br, zstd"]},"tls":{"resumed":false,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"jellyfin.wallaby-gopher.ts.net"}} duration=0.00020961 status=502 err_id=6zn8cs5ru err_trace=reverseproxy.statusError (reverseproxy.go:1269)

Interesting that NONE of the containers have that IP


I saw this post saying you need to clear the cache; no dice.
Turns out, it was the damn local paths to configs again. I moved it into a volume and it immediately started working.

Oh heavens no, I think it’s neat you formatted it into one file like that!!
Okay, I removed that line.
