Troubleshooting

I try to document the cases where things do not work as I intended.

General "Rule of Thumb" Workflow

1. Check State: docker ps -a (Is it restarting? Exited?)
2. Check Logs: docker logs <container_name> --tail 50 (Read the actual error)
3. Check Resources: docker stats --no-stream (Is CPU/RAM spiked?)
4. Check Connections: curl -v http://<ip>:<port> (Is the port actually open?)
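The four checks can be bundled into one shell function so they always run in the same order. This is a minimal sketch: the function name triage, the section labels, and the 5 second curl timeout are illustrative choices, not part of Docker or curl.

```shell
# Run the four workflow checks for one container.
# Usage: triage <container_name> <url>
triage() {
  name="$1"
  url="$2"

  echo "== State =="
  # The name filter is a substring match, so similar names may also show up.
  docker ps -a --filter "name=$name" --format '{{.Names}}: {{.Status}}'

  echo "== Logs (last 50 lines) =="
  docker logs --tail 50 "$name" 2>&1

  echo "== Resources =="
  docker stats --no-stream "$name"

  echo "== Connection =="
  # -o /dev/null drops the body; --max-time keeps a dead port from hanging.
  if curl -s -o /dev/null --max-time 5 "$url"; then
    echo "reachable: $url"
  else
    echo "NOT reachable: $url"
  fi
}
```

Example call: triage socket-proxy http://<ip>:<port>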

High CPU / Resource Usage

Symptom: System fans spin up, load average spikes, or UI becomes unresponsive.

Check: Use docker stats to identify the offending container. The --no-stream flag prints a single snapshot instead of a continuously refreshing view.

docker stats --no-stream
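With many containers, the snapshot is easier to read when sorted by CPU. A small sketch, assuming the --format placeholders .CPUPerc and .Name; the helper name sort_by_cpu is made up for this example.

```shell
# Sort "<cpu%> <name>" lines read from stdin, highest CPU first.
# LC_ALL=C makes sort treat "." as the decimal point regardless of locale.
sort_by_cpu() {
  LC_ALL=C sort -rn
}

# Usage:
#   docker stats --no-stream --format '{{.CPUPerc}} {{.Name}}' | sort_by_cpu | head -n 5
```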

Case Study: The "Log Flood" (CrowdSec)

Observation: CrowdSec using 400% CPU. Logs show rapid processing of internal IP addresses.
Diagnosis: Caddy was logging internal health checks (from Homepage), flooding the parser.
The Fix: Configure Caddy to skip logging for internal IPs.
- Code: log_skip @internal in Caddyfile.
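To confirm a flood like this before changing the Caddyfile, it helps to count requests per client IP. The sketch below assumes Caddy writes compact JSON access logs containing a "remote_ip" field, and that the container is named caddy; both are assumptions about this setup. The helper name requests_per_ip is hypothetical.

```shell
# Count requests per client IP in a Caddy JSON access log read from stdin.
# Prints "<count> <ip>" lines, busiest client first.
requests_per_ip() {
  grep -o '"remote_ip":"[^"]*"' \
    | cut -d '"' -f 4 \
    | sort \
    | uniq -c \
    | sort -rn \
    | awk '{ print $1, $2 }'
}

# Usage:
#   docker logs caddy 2>&1 | requests_per_ip | head
```

If an internal address (for example the Homepage container) dominates the list, the health checks are the likely source.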

Case Study: The "Engine Bottleneck" (CrowdSec)

Observation: Logs are quiet (no flood), but CrowdSec CPU is still high (~500%).
Diagnosis: The default CrowdSec config runs a single parser routine. On my CPU (Ryzen 5 7600X, 6 cores), this creates a queue backlog.
The Fix:
1. Parallelization: Edit crowdsec/config/config.yaml to increase parser_routines to 6 (matching CPU cores).
2. Polling Frequency: Edit Caddyfile to increase ticker_interval to 60s (reduces how often Caddy asks the Agent for updates).
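The parser_routines change from step 1 can be scripted. This is a sketch only: the helper name set_parser_routines is hypothetical, it assumes the key already exists in the file (if it does not, nothing changes), and it keeps a .bak copy next to the original.

```shell
# Set parser_routines in a CrowdSec config file, keeping the indentation.
# Usage: set_parser_routines <config.yaml> <count>
set_parser_routines() {
  file="$1"
  count="$2"
  # Back up first, then rewrite only the value after the key.
  cp "$file" "$file.bak" \
    && sed -E "s/^([[:space:]]*parser_routines:).*/\1 $count/" "$file.bak" > "$file"
}

# Usage:
#   set_parser_routines crowdsec/config/config.yaml 6
```

CrowdSec has to be restarted afterwards to pick up the new value.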

Connection Refused / Service Down

Symptom: A service (like Portainer or WUD) cannot connect to another service (like Socket Proxy), showing ECONNREFUSED or ENOTFOUND.

Check

Check if the target is actually listening on the expected port from inside the network.

# 1. Check if the target is running
docker ps | grep socket-proxy

# 2. Check container logs for startup errors
docker logs socket-proxy

# 3. Verify Internal DNS resolution (from another container)
docker exec wud ping -c 3 socket-proxy
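The two error codes point at different layers: ENOTFOUND means the name lookup failed, while ECONNREFUSED means the host was reached but nothing accepted the connection on that port. A small sketch that turns log lines into that hint; the helper name explain_net_error and the hint wording are made up for this example.

```shell
# Read log lines on stdin and print one hint per recognised network error.
explain_net_error() {
  while IFS= read -r line; do
    case "$line" in
      *ENOTFOUND*)    echo "ENOTFOUND: DNS lookup failed" ;;
      *ECONNREFUSED*) echo "ECONNREFUSED: host reached, nothing listening on that port" ;;
    esac
  done
}

# Usage:
#   docker logs wud 2>&1 | explain_net_error | sort | uniq -c
```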

Case Study: WUD & Portainer vs. Socket Proxy

Observation: WUD logs showed getaddrinfo ENOTFOUND socket-proxy.
Diagnosis: Docker's internal DNS sometimes fails to resolve container names immediately on boot.
The Fix: Switch from Hostname (socket-proxy) to Static IP (172.20.0.28).
- Config: WUD_WATCHER_LOCAL_HOST=172.20.0.28
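A hard-coded IP only keeps working while the target container actually holds that address, so it is worth checking after a restart. A minimal sketch using docker inspect; the helper name container_ip is hypothetical.

```shell
# Print the IP address(es) of a container, one per attached network.
# Usage: container_ip <container_name>
container_ip() {
  docker inspect -f '{{range .NetworkSettings.Networks}}{{println .IPAddress}}{{end}}' "$1"
}

# Usage:
#   container_ip socket-proxy
```

If the output differs from the value in WUD_WATCHER_LOCAL_HOST, the address is probably not pinned in the compose file.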

Storage / Disk Missing in Dashboard

Symptom

Homepage reports Drive not found for target: /mnt/<name>_disk or Beszel shows 0 Disk I/O.

Check

Verify what the container actually sees mounted.

docker exec homepage df -h

Case Study: The "Ghost" Mount (Homepage)

Observation: df -h inside the container did NOT show the mount, even though the compose file declared it.
Diagnosis: Startup Race Condition. Docker started before the OS finished mounting the LVM drive.
The Fix: Point the widget to /app/config (internal path) instead of an external /mnt path.
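For widgets that do need an external path, the race can be checked on the host by confirming the drive is mounted before the containers come up. A sketch that reads /proc/mounts-style input; the helper name has_mount is hypothetical, and /mnt/pool01 stands in for whichever host path is bind-mounted.

```shell
# Succeed if <path> is listed as a mount point in /proc/mounts-style input.
# Usage: has_mount <path> < /proc/mounts
has_mount() {
  awk -v target="$1" '$2 == target { found = 1 } END { exit !found }'
}

# Usage (on the host, before starting the stack):
#   until has_mount /mnt/pool01 < /proc/mounts; do sleep 1; done
```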

Case Study: Beszel "Zero Speed"

Observation: Beszel showed Disk Usage (%) but 0 MB/s Read/Write.
Diagnosis: Docker mounts (- /mnt/pool01:/data) abstract the file system. The container cannot see the Kernel Device (dm-0).
The Fix: Use Device Mapper mounting.
- Config: /mnt/pool01/media/.beszel:/extra-filesystems/dm-2__Media:ro
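The dm-N name in that mount path has to match the kernel device behind the volume. One way to look it up on the host is to follow the /dev/mapper symlink. This is a sketch: the helper name kernel_dev_name is hypothetical, and it assumes /dev/mapper entries are symlinks to /dev/dm-N, as on typical udev-based systems.

```shell
# Print the kernel device name (for example dm-2) behind a device path
# by resolving symlinks and keeping only the last path component.
kernel_dev_name() {
  basename "$(readlink -f "$1")"
}

# Usage (on the host):
#   kernel_dev_name "$(findmnt -n -o SOURCE /mnt/pool01)"
```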

External Access Fails (Mobile Only)

Symptom

Jellyfin works on LAN (Wi-Fi) and Desktop Browser (4G), but fails on Android App (4G).

Tools: SSL Labs Server Test (ssllabs.com).

Case Study: The IPv6 Trap

Observation: SSL Labs showed IPv6 test failing.
Diagnosis: 4G mobile networks often prefer IPv6. Cloudflare was publishing an AAAA record, but our host wasn't routing IPv6 ingress correctly.
The Fix: Deleted AAAA records in Cloudflare to force IPv4.
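The same check can be run from a shell by forcing curl onto each address family in turn. A minimal sketch; the helper name probe_families and the 10 second timeout are illustrative, and the IPv6 result is only meaningful when run from a network that has IPv6 connectivity (for example a phone hotspot).

```shell
# Try a URL over IPv4 and then IPv6 and report which family works.
# Usage: probe_families <url>
probe_families() {
  url="$1"
  for family in 4 6; do
    # curl -4 / -6 restrict name resolution to one address family.
    if curl "-$family" -s -o /dev/null --max-time 10 "$url"; then
      echo "IPv$family: ok"
    else
      echo "IPv$family: FAILED"
    fi
  done
}
```

A result of "IPv4: ok" with "IPv6: FAILED" matches the situation described above.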

Homepage Issues

Issue: "API Error" on Storage Widgets

Fix: Ensure volume is mounted (- /mnt/pool01/media:/mnt/media_disk:ro).

Issue: "Host validation failed"

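Fix: Set HOMEPAGE_ALLOWED_HOSTS=* in compose. This turns the host check off entirely; listing the actual host:port values is the stricter option.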

Issue: CrowdSec Widget Error

Fix: If DB was wiped, update .env with new credentials from crowdsec/config/local_api_credentials.yaml.

WUD (What's Up Docker) Issues

Issue: Duplicate Containers / "Ghosts"

Symptom: WUD shows 2 entries for every container (one "Local", one "Proxy").
Cause: Defining WUD_WATCHER_PROXY_... creates a second watcher, while the default local watcher still tries to run.
Fix: "Hijack" the local watcher by setting WUD_WATCHER_LOCAL_HOST=172.20.0.28 and removing all Proxy variables.

Issue: Updates not showing

Fix: WUD scans on a CRON schedule. Restart the container (docker restart wud) to force an immediate re-scan.

Issue: Shutdown Corruption (Exit Code 137)

Symptom: WUD took 10.2s to stop and corrupted its DB.
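Diagnosis: Docker's default stop timeout is 10s, after which the container is force-killed. Heavy apps need more time to save state.
Fix: Added "shutdown-timeout": 30 to /etc/docker/daemon.json. That setting covers the case where the Docker daemon itself shuts down; for a plain docker stop, the per-container equivalents are stop_grace_period in compose or docker stop -t.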
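Exit codes above 128 encode the signal that ended the process: 137 is 128 + 9, and signal 9 is SIGKILL. A small sketch that does the arithmetic; the helper name exit_signal is hypothetical.

```shell
# Translate an exit code above 128 into the name of the terminating signal.
# Usage: exit_signal <exit_code>
exit_signal() {
  code="$1"
  if [ "$code" -gt 128 ]; then
    kill -l "$((code - 128))"
  else
    echo "none"
  fi
}

# Usage:
#   exit_signal "$(docker inspect -f '{{.State.ExitCode}}' wud)"
```

A result of KILL means the process did not exit on its own within the stop timeout (or was killed for another reason, such as running out of memory).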

GoAccess WebSocket Errors

Symptom

Dashboard loads, but the connection icon is Red/Disconnected.

Case Study: Origin Mismatch

Observation: Browser console showed 400 Bad Request.
Diagnosis: GoAccess checks the Origin header on the WebSocket handshake. The URL in the browser (http://192.168.x.x) did not match the --origin flag in the container.
Fix: Ensure --origin matches the exact URL we use to visit the dashboard.
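The origin is only the scheme, host, and port of the address bar URL, without any path. A sketch that extracts it so it can be compared with the --origin value; the helper name origin_of and the example address are hypothetical.

```shell
# Print the origin (scheme://host[:port]) of a URL, dropping path and query.
# Usage: origin_of <url>
origin_of() {
  printf '%s\n' "$1" | sed -E 's#^([A-Za-z][A-Za-z0-9+.-]*://[^/]+).*#\1#'
}

# Usage:
#   origin_of "http://192.168.1.10:7890/report.html"
```

The printed value should be identical to what is passed to --origin, including http vs. https and the port.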