DNS and Caddy configuration synchronization issue during AWS ECS deployments

RJM · December 18, 2024, 10:57am

1. The problem I’m having:

During automated deployments in AWS ECS Fargate, we’re experiencing DNS resolution issues with Caddy. Our setup consists of two containers in the same task definition: an API (Node.js) and Caddy as reverse proxy. The workflow is:

ECS deploys a new task
A script updates Route53 A record with the new task’s IP and waits for INSYNC status
API container starts and waits for health check
Caddy container starts and proxies to the API (127.0.0.1:3000)
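Step 2 waits for Route53 to report INSYNC. As far as I understand it, INSYNC means the change has reached Route53's authoritative name servers. Recursive resolvers that already cached the old A record can keep returning it until the record's TTL expires. Below is a minimal sketch of an additional check. It assumes the deploy script can run Python and knows the new task's IP. The wait_for_dns helper is hypothetical and not part of the original script. It polls the local resolver until the hostname returns the expected address.

```python
import socket
import time
from typing import Callable, Set


def system_resolve(hostname: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver currently reports for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return set()  # name not resolvable (yet)
    return {info[4][0] for info in infos}


def wait_for_dns(
    hostname: str,
    expected_ip: str,
    resolve: Callable[[str], Set[str]] = system_resolve,
    timeout: float = 300.0,
    interval: float = 5.0,
) -> bool:
    """Poll until hostname resolves to expected_ip; return False after timeout seconds.

    Hypothetical helper: `resolve` is injectable so the loop can be tested
    without touching real DNS.
    """
    deadline = time.monotonic() + timeout
    while True:
        if expected_ip in resolve(hostname):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, the script could call wait_for_dns("api-v1.domain.com", new_task_ip) after the INSYNC wait. Here new_task_ip stands for whatever the existing script writes into the A record. Because getaddrinfo goes through the local resolver, this only shows what that one machine sees, not what every client sees.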

After a CI/CD pipeline deployment, Caddy seems unable to properly handle DNS resolution, and requests fail. However, if we manually stop the task and let ECS create a new one automatically, everything works perfectly. This suggests there might be an issue with how Caddy handles DNS resolution during the initial deployment.
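One detail may narrow this down. The reverse_proxy upstream in the Caddy config is the literal address 127.0.0.1:3000, the same dial address shown in the log in section 2. An IP literal is used as-is, without any DNS lookup. So the hop from Caddy to the API should not depend on Route53. DNS only matters for clients (and certificate issuance) reaching api-v1.domain.com. The hypothetical needs_dns helper below is a sketch using only the Python standard library. It makes that distinction explicit for a host:port dial string.

```python
import ipaddress


def needs_dns(dial: str) -> bool:
    """Return True if the host part of a dial address is a name rather than an IP literal.

    Hypothetical helper for illustration. Accepts "host:port", "[v6]:port",
    or a bare host / bare IPv6 address.
    """
    host = dial
    if dial.startswith("["):          # bracketed IPv6, e.g. "[::1]:3000"
        host = dial[1:dial.index("]")]
    elif dial.count(":") == 1:        # "host:port" or "ipv4:port"
        host = dial.split(":", 1)[0]
    try:
        ipaddress.ip_address(host)    # succeeds only for IPv4/IPv6 literals
    except ValueError:
        return True                   # a name: it has to be resolved first
    return False
```

With this, "127.0.0.1:3000" is reported as not needing DNS, while "api-v1.domain.com:443" does.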

2. Error messages and/or full log output:

{
  "level": "info",
  "ts": 1734084306.6575897,
  "logger": "http.handlers.reverse_proxy",
  "msg": "selected upstream",
  "dial": "127.0.0.1:3000",
  "total_upstreams": 1
}

3. Caddy version:

Caddy 2.8.4 (Docker image caddy:2.8.4-alpine)

4. How I installed and ran Caddy:

a. System environment:

AWS ECS Fargate
Two containers in same task definition
Container 1: Node.js API
Container 2: Caddy reverse proxy
AWS Route53 for DNS management

b. Command:

caddy run --config /etc/caddy/Caddyfile --adapter caddyfile

c. Service/unit/compose file (Terraform container_definitions for the ECS task):

container_definitions = jsonencode([
  {
    name      = "caddy"
    image     = "caddy:2.8.4-alpine"
    essential = true
    command = [
      "sh", "-c", <<-EOT
        echo '{
          log {
            output stdout
            format json
            level DEBUG
          }
          storage file_system {
            root "/data/caddy"
          }
        }
        https://api-v1.domain.com {
          reverse_proxy 127.0.0.1:3000 {
            header_up Access-Control-Allow-Origin "*"
            header_up Access-Control-Allow-Methods "*"
            header_up Access-Control-Allow-Headers "*"
            header_up Access-Control-Allow-Credentials "true"
            header_up Access-Control-Expose-Headers "*"
            health_uri /api/health
            health_interval 60s
            max_fails 10
          }
        }' > /etc/caddy/Caddyfile && sleep 40 &&
        caddy run --config /etc/caddy/Caddyfile --adapter caddyfile
      EOT
    ]
    portMappings = [
      {
        containerPort = 80
      },
      {
        containerPort = 443
      }
    ]
  }
])
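The task definition above delays Caddy with a fixed sleep 40. If the API ever needs longer than that to come up, Caddy's first requests could hit a port that nothing is listening on yet. ECS can order containers natively with dependsOn and condition = "HEALTHY", which requires a healthCheck on the API container. Another option is to poll the API instead of sleeping. Below is a minimal sketch of that polling logic. It assumes /api/health on 127.0.0.1:3000 answers HTTP 200 once the API is ready. wait_for_http is a hypothetical helper. The caddy:2.8.4-alpine image does not appear to ship Python, so inside that container the same loop would have to be written in sh.

```python
import http.client
import time
import urllib.request


def wait_for_http(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll url until it answers HTTP 200; return False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # urlopen raises HTTPError (an OSError subclass) for non-2xx statuses
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # not listening yet, connection reset, timeout, or error status
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_for_http("http://127.0.0.1:3000/api/health") returning True would be the signal to run caddy run. Returning False after the timeout means the API never became healthy, and the task should fail visibly rather than proxy to nothing.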

d. My complete Caddy config:

{
  log {
    output stdout
    format json
    level DEBUG
  }
  storage file_system {
    root "/data/caddy"
  }
}

https://api-v1.domain.com {
  reverse_proxy 127.0.0.1:3000 {
    header_up Access-Control-Allow-Origin "*"
    header_up Access-Control-Allow-Methods "*"
    header_up Access-Control-Allow-Headers "*"
    header_up Access-Control-Allow-Credentials "true"
    header_up Access-Control-Expose-Headers "*"
    health_uri /api/health
    health_interval 60s
    max_fails 10
  }
}

5. Links to relevant resources:

The key aspect is that manually stopping and letting ECS recreate the task always fixes the issue, suggesting there might be something in how Caddy handles DNS resolution during the initial deployment that differs from subsequent task creations.
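One difference between deployments may be worth ruling out. This is an assumption, not something the logs above confirm. The config stores certificates under /data/caddy, but Fargate task storage is ephemeral. Unless a volume such as EFS is mounted there, every new task starts with empty storage, and Caddy has to obtain a certificate for api-v1.domain.com again. The HTTP-01 and TLS-ALPN-01 ACME challenges only succeed if that name already resolves to the new task. The timing of the Route53 update relative to Caddy's startup could therefore matter.

Below is a small sketch that checks whether a certificate is already in storage. It assumes Python is available wherever the storage directory can be read. It also assumes CertMagic's usual layout, certificates/<issuer>/<domain>/<domain>.crt. has_stored_cert is a hypothetical helper.

```python
from pathlib import Path


def has_stored_cert(storage_root: str, domain: str) -> bool:
    """Return True if any issuer directory under storage_root holds a cert for domain.

    Assumes CertMagic's file-system layout:
        certificates/<issuer>/<domain>/<domain>.crt
    """
    pattern = f"certificates/*/{domain}/{domain}.crt"
    return any(Path(storage_root).glob(pattern))
```

For example, has_stored_cert("/data/caddy", "api-v1.domain.com") returning False at task startup would mean Caddy has to go through ACME again for that task.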

francislavoie · December 22, 2024, 10:39am

You haven’t actually shown a problem here. I don’t understand what you mean. That debug log is normal behaviour of the proxy.

RJM · January 2, 2025, 9:01am

Hello @francislavoie, thanks for taking an interest in my issue.

The main problem might not be caused directly by Caddy. It could just be a configuration issue, or a lack of understanding on my side. The key may be understanding how AWS ECS handles IP addresses and DNS resolution each time a new service is deployed.

The main problem is this:

So, after a CI/CD deployment the task gets a new IP, but the Route53 record is updated. That should not be a problem, because Caddy is configured with the domain name and not with an IP. BUT it actually doesn't work: Caddy does its job, but the requests go nowhere. (I have been struggling with the logs and can't get output that properly shows my problem. Help getting better logs would be welcome here.)
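Caddy already writes one JSON object per line to stdout, because the log block sets format json and level DEBUG. Each entry has fields such as level, ts, logger and msg, as in the entry quoted in section 2. Most DEBUG and INFO lines, like "selected upstream", are normal, so one way to get a more useful view is to keep only warnings and errors. Below is a minimal sketch. It assumes the container's stdout has been exported (for example from CloudWatch Logs) to a text file. problem_entries is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator

# zap level names that indicate something went wrong
PROBLEM_LEVELS = {"warn", "error", "dpanic", "panic", "fatal"}


def problem_entries(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed Caddy JSON log entries whose level is warn or worse.

    Lines that are not JSON objects (e.g. output from the sh wrapper) are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("level") in PROBLEM_LEVELS:
            yield entry
```

For example, open the exported file and print e.get("ts"), e.get("logger") and e.get("msg") for each entry e in problem_entries(f). That prints a short timeline of only the entries worth looking at around a failing deployment.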

system · February 1, 2025, 9:02am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.