Errors, what a pain. Is there any good general advice on handling them? What about generating them? Let's do a brain dump.
Putting your logs in a central, queryable repository can be a massive boon to quality, especially relative to the low cost. I'm a fan of the ELK stack (Elasticsearch, Logstash and Kibana), but anything is fine (just be mindful not to flood yourself).
The one thing most likely to undermine your centralized logging is having a poor signal-to-noise ratio. Error logs should be actionable, even if that action is to stop logging a specific error. You can log informational data, but errors should be distinctly queryable.
Not only should error logs be actionable, but actioning them should be the top priority, mainly because if you have errors, you should fix them. Resolving them also helps keep your logs clean. Finally, and more subtly, some errors are easier to figure out while they're happening.
Many errors can't be handled but also shouldn't be silenced. Intermittently dropped connections and timeouts are common examples. Move these to counters and set up alerts to fire when a threshold over a period is reached (or some fancier anomaly detection). A dropped connection a day might be fine, but 10 an hour might require investigation.
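To make the counter idea concrete, here's a minimal sketch of a sliding-window counter in Python. The `ErrorRateAlert` class and its parameters are made up for illustration. In practice you'd more likely lean on whatever your metrics system gives you, but the logic is the same: count occurrences and only raise a flag once the count within a period reaches a threshold.

```python
import time
from collections import deque


class ErrorRateAlert:
    """Hypothetical helper: counts occurrences of one kind of error and
    reports when the count inside a sliding time window reaches a threshold."""

    def __init__(self, threshold, window_seconds, clock=time.monotonic):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the class is easy to test
        self._timestamps = deque()

    def record(self):
        """Record one occurrence; return True if the threshold is reached."""
        now = self._clock()
        self._timestamps.append(now)
        # Drop occurrences that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) >= self.threshold


# 10 dropped connections within an hour is worth a look; one a day is not.
dropped_connections = ErrorRateAlert(threshold=10, window_seconds=3600)
```

You'd call `dropped_connections.record()` wherever the dropped connection is caught, and only log an error or page someone when it returns `True`.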
A simple way to improve the usefulness of logs is to give them a static label such as "connection failed". Without this, it can be hard to group errors together if code gets refactored, line numbers or function names change, or if there's any dynamic data in the error details. Grouping is important to know when an error first and last happened, how often it's happening, whether the rate has increased, and so on.
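As a rough sketch of what a static label can look like, here's one way to do it with Python's standard `logging` module. The `log_error` helper and the `label` and `details` field names are my own invention for this example. The point is only that the label never changes, while anything dynamic travels alongside it as separate fields.

```python
import logging

logger = logging.getLogger("app")


def log_error(label, **details):
    """Hypothetical helper: log an error under a static label.

    The label is the message itself and is also attached as a field, so
    records can be grouped on it. Dynamic data goes in `details` and never
    ends up in the text you group by.
    """
    logger.error(label, extra={"label": label, "details": details})


# The label stays the same no matter which host or attempt failed.
log_error("connection failed", host="db1", attempt=3)
```

Whatever ships your logs to the central store can then index `label` as its own field, and grouping survives refactors because nothing in it depends on line numbers or function names.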
Moving into code.