Latency metrics

tas · December 2, 2022, 6:09pm

Caddy v2.6.2

Hi,

I cannot make sense of latency metrics from Caddy in Prometheus/Grafana.

Requests/s seems fine, but latency doesn’t add up.
I run a load test with Loader.io, or locally via wrk: 2k requests sustained over 1 min.

According to Loader.io I get an average response time of 95ms, but in Grafana it’s more like 721ms.
I tried reading the histograms from caddy_http_request_duration_seconds_bucket, but I’m not sure how to interpret them either.

Is it possible that Loader and wrk are wrong, and Caddy metrics are right?

Sample queries from docs didn’t yield any better results.

Thank you

matt · December 2, 2022, 6:50pm

/cc @hairyhenderson might have a clue!

hairyhenderson · December 19, 2022, 12:48pm

Hi @tas,

Prometheus scrapes metrics at an interval, by default every 15s. From squinting at your Requests panel it looks like that’s what you’re using. For a 1 minute load test, this means you only get 4 samples at best during the test, though probably more like 3, so the measurement is going to be pretty inaccurate.
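To make the arithmetic concrete, here is a minimal Python sketch that counts how many scrapes land inside a test window. The function name and the `first_scrape_offset` parameter are made up for illustration, and it assumes scrapes happen at perfectly regular intervals, which real Prometheus only approximates.

```python
import math

def scrapes_during_test(test_seconds, scrape_interval, first_scrape_offset):
    """Count scrapes that land in the half-open window [0, test_seconds).

    first_scrape_offset is how long after the test starts the first scrape
    happens (hypothetical parameter; must be in [0, scrape_interval)).
    """
    if not 0 <= first_scrape_offset < scrape_interval:
        raise ValueError("offset must be within one scrape interval")
    remaining = test_seconds - first_scrape_offset
    if remaining <= 0:
        return 0
    return math.ceil(remaining / scrape_interval)

# 1-minute test, 15s scrape interval: 4 samples whatever the offset.
print(scrapes_during_test(60, 15, 0))
print(scrapes_during_test(60, 15, 7))

# 10-minute test with the same interval: 40 samples.
print(scrapes_during_test(600, 15, 0))
```

Note that a rate is computed from the difference between samples, so 4 samples give only 3 deltas to work with, while a 10-minute run gives 39.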

On the other hand, load testing tools like loader.io record the duration of every single request, giving you a complete picture as seen from the client.

When performance testing like this, IMO you should run the test for much longer than 1min - at least 5-10 mins, if not longer, to gather enough data server-side to do a meaningful rate calculation.

Also, I’d recommend ignoring average response time in general. Much more useful would be to look at 95th or 99th (or even 99.9th) percentile, to get a better idea of what actual users will experience when loading pages involving multiple requests. See this classic talk for some details on why.
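For reading the `caddy_http_request_duration_seconds_bucket` series: each bucket is a cumulative count of requests that took at most `le` seconds. In PromQL a percentile is usually estimated with something like `histogram_quantile(0.95, sum by (le) (rate(caddy_http_request_duration_seconds_bucket[5m])))`. The Python sketch below is a simplified illustration of that kind of estimate (linear interpolation inside the bucket that contains the target rank), not Prometheus's actual implementation. The function name and the bucket counts are invented for the example.

```python
import math

def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count), sorted by
    upper bound, with the last upper bound being math.inf.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: the best we can say is
                # "at least the highest finite bound".
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound

# Made-up data: 1000 requests, most fast, with a slow tail.
buckets = [(0.1, 900), (0.25, 950), (0.5, 980), (1.0, 995), (math.inf, 1000)]

for q in (0.5, 0.95, 0.99):
    print(f"p{q * 100:g}: {estimate_quantile(q, buckets):.3f}s")
```

With these invented counts the median estimate lands well under 0.1s while the 99th percentile is above 0.8s, which is the kind of gap an average hides. The estimate can only be as precise as the bucket boundaries allow.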

Hope this helps, and sorry for the late reply!

system · January 1, 2023, 6:10pm

This topic was automatically closed after 30 days. New replies are no longer allowed.