I really like GoAccess (https://goaccess.io/) as a tool for quick, convenient analysis of access logs. It shares a philosophy, if not a development language, with Caddy: it is self-contained and stand-alone with no dependencies (it can even generate a complete access log report as a single HTML file, which you could then auto-deploy on your web site). Currently, however, it is difficult to get the full benefit from GoAccess with Caddy log files, because there is no log format shared by both tools that carries the full set of Caddy log data that GoAccess can use (other than the somewhat limited common log format).
I believe that implementing JSON log file input is on the GoAccess list/roadmap somewhere, but that doesn't help us now.
To solve this issue, I have written a Python 3 script that converts Caddy JSON log data into a format that GoAccess can parse, carrying across all of the relevant data that GoAccess can make use of.
The latest version can even support real-time analysis of log files (with HTML output) by leveraging TCP/IP network sockets (read all the way to the bottom of this thread to see some examples of how to use it). The latest version of the script can be downloaded from my GitHub repository: CaddyGoAccessDataLoggerConverter
The initial Python code example is shown below:
#!/usr/bin/python3
""" ***************************************************************************
Takes a caddy JSON log file as input and converts it into a format suitable for
analysis by goaccess. To use the output file, goaccess must be run with the
log-format specified as detailed below

    goaccess caddy.log --log-format="%d %t %v %h %m %U %H %s %b %T %R %u" --date-format=%F --time-format=%H:%M:%S >caddy.html

Note: when running caddy the Caddyfile must define "format json" as shown in
the example below

    localhost {
        file_server
        log {
            output file caddy.log {
                roll_local_time true
            }
            format json
        }
    }
*************************************************************************** """
import sys, signal, getopt, json
from datetime import datetime
from time import sleep
interval = 0
inputfile = ''

def shortHelp():
    print()
    print(' logConvert.py -i <interval time in seconds> filename')
    print()

def longHelp():
    print()
    print(' logConvert.py -i <interval time in seconds> filename')
    print()
    print(' If interval is zero (or -i option is omitted) logConvert will process the')
    print(' input file and then exit, otherwise logConvert will process the input file')
    print(' then sleep for the specified interval before processing any additional')
    print(' entries (and then repeat indefinitely until terminated)')
    print()
    print(' output from logConvert can be captured to a file by adding " >outputfilename"')
    print(' to the end of the command line')
    print()
    print(' To then process the output file with goaccess, use the following command')
    print()
    print(' goaccess caddy.log --log-format="%d %t %v %h %m %U %H %s %b %T %R %u" --date-format=%F --time-format=%H:%M:%S >caddy.html')
    print()

def processArgs(argv):
    global interval
    global inputfile
    try:
        opts, args = getopt.getopt(argv, "hi:", ["interval="])
    except getopt.GetoptError:
        longHelp()
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            longHelp()
            sys.exit()
        elif opt in ("-i", "--interval"):
            try:
                interval = int(arg)
            except ValueError:
                # int() raises ValueError for anything that is not a whole number
                print()
                print(' Interval must be a whole number of seconds')
                print()
                sys.exit(2)
    if (len(args) > 0):
        inputfile = args[0]
    else:
        longHelp()
        sys.exit(2)

def main():
    processArgs(sys.argv[1:])
    try:
        with open(inputfile) as f:
            while True:
                line = f.readline()
                if (line != ""):
                    data = json.loads(line)
                    # Caddy logs 'ts' as a Unix timestamp; split it into date and time
                    ts = str(datetime.fromtimestamp(data['ts']))
                    date = ts[0:10]
                    time = ts[11:19]
                    # remote_addr is "ip:port"; drop the port, then any IPv6 brackets
                    ip = data['request']['remote_addr']
                    ip = ip[0:ip.rindex(':')]
                    if (ip[0] == '['):
                        ip = ip[1:ip.rindex(']')]
                    if "Referer" in data['request']['headers'].keys():
                        referer = '"'+data['request']['headers']['Referer'][0]+'"'
                    else:
                        referer = '""'
                    if "User-Agent" in data['request']['headers'].keys():
                        user_agent = '"'+data['request']['headers']['User-Agent'][0]+'"'
                    else:
                        user_agent = '""'
                    latency = data['duration']
                    print(
                        date,                                # %d
                        time,                                # %t
                        '"'+(data['request']['host'])+'"',   # %v
                        ip,                                  # %h
                        data['request']['method'],           # %m
                        '"'+(data['request']['uri'])+'"',    # %U
                        data['request']['proto'],            # %H
                        data['status'],                      # %s
                        data['size'],                        # %b
                        latency,                             # %T
                        referer,                             # %R
                        user_agent                           # %u
                    )
                elif (interval > 0):
                    # end of file reached: wait, then look for new entries
                    sleep(interval)
                else:
                    # no (positive) interval given: stop at end of file
                    break
    except FileNotFoundError:
        print()
        print(' Input file "{}" not found'.format(inputfile))
        print()

def signal_handler(signum, frame):
    # exit quietly on Ctrl-C
    sys.exit(0)

if __name__ == "__main__":
    signal.signal(signal.SIGINT, signal_handler)
    main()
Copy the above text into a file called logConvert.py and use chmod +x logConvert.py to make it executable. The Caddy log file can then be converted by using the following command:
./logConvert.py access.log >access.goaccess.log
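Before converting a whole log it can help to see what a single entry turns into. The following is a minimal sketch that mirrors the field mapping used in logConvert.py; it is not part of the script itself. The log entry is made up for illustration, and the names `SAMPLE`, `client_ip` and `convert` are hypothetical.

```python
import json
from datetime import datetime

# Hypothetical Caddy JSON log entry (all values invented for illustration).
SAMPLE = json.dumps({
    "ts": 1596478657.123,
    "request": {
        "remote_addr": "[2001:db8::1]:54321",
        "proto": "HTTP/1.1",
        "method": "GET",
        "host": "example.com",
        "uri": "/",
        "headers": {"User-Agent": ["curl/7.68.0"]},
    },
    "duration": 0.005674565,
    "size": 458,
    "status": 200,
})

def client_ip(remote_addr):
    """Strip the port, and the brackets around an IPv6 address."""
    ip = remote_addr[:remote_addr.rindex(':')]
    return ip[1:-1] if ip.startswith('[') else ip

def convert(line):
    """Turn one Caddy JSON log line into one GoAccess input line."""
    data = json.loads(line)
    req = data['request']
    headers = req['headers']
    ts = str(datetime.fromtimestamp(data['ts']))  # local time, as in the script
    fields = [
        ts[0:10],                                       # %d
        ts[11:19],                                      # %t
        '"' + req['host'] + '"',                        # %v
        client_ip(req['remote_addr']),                  # %h
        req['method'],                                  # %m
        '"' + req['uri'] + '"',                         # %U
        req['proto'],                                   # %H
        data['status'],                                 # %s
        data['size'],                                   # %b
        data['duration'],                               # %T
        '"' + headers.get('Referer', [''])[0] + '"',    # %R
        '"' + headers.get('User-Agent', [''])[0] + '"', # %u
    ]
    return ' '.join(str(f) for f in fields)

print(convert(SAMPLE))
```

The date and time at the start of the printed line depend on the local time zone of the machine running the sketch, because `datetime.fromtimestamp` is used without a time zone, just as in the script.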
To continually monitor and convert the Caddy log file [i.e. stream the logs] use the following command instead:
./logConvert.py -i 300 access.log >access.goaccess.log
This will convert all of the current content of the Caddy log file, write it out into the specified file (access.goaccess.log in this case) and then go to sleep for 300 seconds. Once the sleep period has elapsed, any further entries in the Caddy log file will be processed and added to the output file. By using this approach, the output file can be continually updated/synchronised with the Caddy log file in near real time (or at whatever interval you choose).
Alternatively, using cron, logConvert.py could be executed in the early hours of the morning each day to process the Caddy log file. If this were also combined with automatically executing goaccess after completion to create a self-contained html output file, a static html file could be generated every day and made available as a page on your web site.
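The daily job described above could be driven by a small wrapper that cron calls. This is only a sketch under a few assumptions: the file names are placeholders, `goaccess_command` and `build_report` are hypothetical helper names, logConvert.py is assumed to be executable in the working directory, and the goaccess options are copied from the command shown in the script's docstring (with the HTML report captured from standard output, as in that command).

```python
import subprocess

# Format string copied from the goaccess command in the script's docstring.
LOG_FORMAT = '%d %t %v %h %m %U %H %s %b %T %R %u'

def goaccess_command(converted_log):
    """Build the goaccess command line for a converted log file."""
    return [
        'goaccess', converted_log,
        '--log-format=' + LOG_FORMAT,
        '--date-format=%F',
        '--time-format=%H:%M:%S',
    ]

def build_report(caddy_log, converted_log, report_html,
                 converter='./logConvert.py'):
    """Convert the Caddy log, then render a static HTML report from it."""
    # Step 1: run the converter once (no -i option) and capture its output.
    with open(converted_log, 'w') as out:
        subprocess.run([converter, caddy_log], stdout=out, check=True)
    # Step 2: run goaccess on the converted file and capture the HTML.
    with open(report_html, 'w') as out:
        subprocess.run(goaccess_command(converted_log), stdout=out, check=True)

# Example call (placeholder paths):
# build_report('access.log', 'access.goaccess.log', 'report.html')
```

Using `check=True` means a failure in either step raises an exception, so the job exits with a non-zero status and the failure shows up in cron's output.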
The output from logConvert.py is in the following format:
2020-08-03 19:17:37 "example.com" 192.168.1.1 GET "/" HTTP/1.1 200 458 0.005674565 "" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
2020-08-03 18:35:53 "example.com" 192.168.100.3 GET "/wp-login.php" HTTP/1.1 404 0 0.000298749 "http://example.com/wp-login.php" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
2020-08-03 19:04:35 "example.com" 192.168.200.56 GET "/admin/" HTTP/1.1 404 0 0.000482654 "http://example.com/admin/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
By virtue of including the "host" name in the output ("example.com" as shown above), logConvert.py is ideally suited to a multiple domain/sub-domain scenario for which GoAccess can also provide analysis.
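To check that a converted line lines up with the format string passed to GoAccess, the twelve fields can be split out and labelled with their specifiers. The sketch below is a rough sanity check only: it uses Python's `shlex` module, which follows shell quoting rules rather than GoAccess's own parser, so it assumes that no field contains an embedded double quote or backslash. The name `label_fields` is hypothetical.

```python
import shlex

# Specifiers in the order used by the --log-format string.
SPECIFIERS = ['%d', '%t', '%v', '%h', '%m', '%U',
              '%H', '%s', '%b', '%T', '%R', '%u']

# One of the sample output lines shown above.
LINE = ('2020-08-03 18:35:53 "example.com" 192.168.100.3 GET "/wp-login.php" '
        'HTTP/1.1 404 0 0.000298749 "http://example.com/wp-login.php" '
        '"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) '
        'Gecko/20100101 Firefox/62.0"')

def label_fields(line):
    """Split a converted line and pair each field with its specifier."""
    tokens = shlex.split(line)  # honours the double-quoted fields
    if len(tokens) != len(SPECIFIERS):
        raise ValueError('expected {} fields, got {}'.format(
            len(SPECIFIERS), len(tokens)))
    return dict(zip(SPECIFIERS, tokens))

for spec, value in label_fields(LINE).items():
    print(spec, value)
```

A line that yields more or fewer than twelve fields raises `ValueError`, which is a quick way to spot entries that GoAccess is also likely to reject.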