Software Engineering Management & Warning logs vs Fatal logs
How to handle side effects
In my experience, log management is an underrated activity in software development. I do not want to dwell on the fact that log messages are often hard to understand, or written without a fixed schema (a schema helps log collection and analysis for SaaS products through dedicated software such as Logstash). Instead, I would like to look at the situation where one is unsure whether to log a WARNING, an ERROR, or even a FATAL message.
I have never managed to reach 100% consensus on the log level for critical situations, mainly because people perceive criticality in completely different ways. To be as concrete and objective as possible, I will start by quoting the commonly accepted meaning of these three log levels:
- WARNING: it is used when you have detected an unexpected application problem. You are not sure whether the problem will recur or persist, and you may not notice any harm to your application at this point. The issue usually stops a specific process from running, but it does not mean that the application has been harmed; the code should continue to work as usual. You should still review these warnings eventually, in case the problem recurs.
- ERROR: it does not mean your application is aborting. Instead, something important has failed, for example a service or a file cannot be accessed. This log level is used when a severe issue is stopping functions within the application from operating efficiently. Most of the time the application will continue to run, but the issue will eventually need to be addressed.
- FATAL: it means that the application is about to stop in order to prevent a serious problem or corruption from happening. The FATAL level of logging shows that the application’s situation is catastrophic, such that an important function is not working.
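To make the three definitions a little more tangible, here is a minimal sketch using Python's standard `logging` module. The logger name, the `report` function and the three situations are hypothetical and only serve as illustration. Note that Python calls its highest level CRITICAL; `logging.FATAL` is an alias for the same value.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("orders")


def report(situation: str) -> int:
    """Log a made-up situation at the level the definitions suggest.

    Returns the numeric level that was used.
    """
    if situation == "retry_succeeded":
        # Unexpected, but nothing was harmed and work continues.
        logger.warning("Payment gateway answered only on the second attempt")
        return logging.WARNING
    if situation == "service_unreachable":
        # Something important failed, yet the application keeps running.
        logger.error("Cannot reach the invoice service; order saved without invoice")
        return logging.ERROR
    if situation == "corrupted_state":
        # The application is about to stop to avoid further damage.
        logger.critical("Order store looks corrupted; shutting down")
        return logging.CRITICAL
    raise ValueError(f"unknown situation: {situation}")
```

Even in this toy form, the second and third cases show where opinions start to differ: whether a missing invoice is an ERROR or a FATAL depends on how critical that function is felt to be.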
Just from the descriptions above you can see how hard it can be to choose between one log level and another. The situation I face most often is an application that has to start from a given input shaped in a specific (even complex) way. What should we log if the application finds that the input data is missing or incorrectly modeled?
If the incorrect input breaks the functionality of the application, or if the output it produces is meaningless, I would like to stop the application as soon as possible and report the problems that caused the abnormal behaviour. This is closer to the FATAL approach. Clearly this is a difficult road, and the option I would prefer is an application designed to be as robust as possible, resilient against errors and therefore able to move forward without degradation (especially in a SaaS context), but that is not easy and not always feasible. Perhaps the situation can be mitigated if your application has a user interface: you can visually notify the user that something is going wrong and disable the affected functionality at the UI level. This approach is possible, but it is still not easy to follow because it requires a fairly good and modular design of your application. But what if your application has no UI, only a set of APIs? I think that even in this case the approach should be robust, in the sense that the API responses should contain the error information.
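As a rough sketch of this idea for an API-only application, the Python snippet below validates the input up front, logs the rejection, and puts the same information into the response. Everything here is an assumption made for illustration: the field names, the `InvalidInputError` class, the `handle_request` function and the shape of the response dictionary are hypothetical, and the choice of the highest log level simply follows the preference expressed above.

```python
import logging

logger = logging.getLogger("pipeline")  # hypothetical logger name

# Hypothetical fields that the input must contain.
REQUIRED_FIELDS = ("customer_id", "items")


class InvalidInputError(Exception):
    """Raised when the input cannot be processed meaningfully."""

    def __init__(self, problems):
        super().__init__("; ".join(problems))
        self.problems = problems


def validate(payload: dict) -> None:
    """Collect every problem found in the input, then fail once."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in payload]
    if "items" in payload and not isinstance(payload["items"], list):
        problems.append("items must be a list")
    if problems:
        raise InvalidInputError(problems)


def handle_request(payload: dict) -> dict:
    """Stop early on bad input and tell the caller why."""
    try:
        validate(payload)
    except InvalidInputError as exc:
        # The log is for operators; the response is for the user.
        logger.critical("Rejecting input: %s", exc)
        return {"status": "error", "errors": exc.problems}
    return {"status": "ok", "item_count": len(payload["items"])}
```

For example, `handle_request({"items": "x"})` returns a response whose `errors` list names both the missing `customer_id` and the wrongly typed `items`, so the caller does not need access to the log files to understand what went wrong.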
Something I totally disagree with is the proposal to let the application carry on, logging only a WARNING or an ERROR and giving no other feedback to the user, on the assumption that users will realize that the results they are getting are corrupted and will then inspect the log files. For me both assumptions fail: nobody can guarantee that the user will be able to recognize corrupted output, and why should we think that the user will ever read the log file? This type of approach is rooted in the assumption that you have clever users.
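If an application does carry on after a non-fatal problem, one way to avoid relying on the log file alone is to attach the same message to the result that the user receives. The following Python sketch shows that pattern under invented assumptions: the `Result` class, the `average` function and the idea of skipping non-numeric readings are hypothetical examples, not a recommendation for any specific product.

```python
import logging
from dataclasses import dataclass, field
from typing import List, Optional

logger = logging.getLogger("report")  # hypothetical logger name


@dataclass
class Result:
    """A value plus any problems the user needs to know about."""
    value: Optional[float]
    warnings: List[str] = field(default_factory=list)


def average(readings: list) -> Result:
    """Average the numeric readings and report the ones that were skipped."""
    valid = [r for r in readings if isinstance(r, (int, float))]
    result = Result(value=None)
    skipped = len(readings) - len(valid)
    if skipped:
        message = (f"{skipped} of {len(readings)} readings "
                   "were not numeric and were ignored")
        logger.warning(message)           # for whoever reads the logs
        result.warnings.append(message)   # for the user, in the output itself
    if valid:
        result.value = sum(valid) / len(valid)
    return result
```

With this shape, a UI or an API client can show `warnings` next to `value`, so the user is told that the number is based on partial data instead of having to guess.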
The key to a product’s success is its simplicity. That does not mean the product has zero complexity, but that it is shaped so that the cognitive load on the user is minimized. The smoother and easier the user experience, the better.