7 More Good Tips on Logging

Logging in web applications is important: it tells you what’s going on and supports performance tuning and incident analysis. This is my second post about logging. The first post, “7 Good Rules to Log Exceptions”, was specific to logging exceptions; this one is about logging in general. What makes your logs more useful to you?


1. No debugging logs in production

I have seen time and again that debug logging is enabled in production. This can be intentional, or it can happen because a developer accidentally checked in a debug logging configuration. Enabled debug logging slows down your application noticeably, and the noise makes production logs impossible to read. Make sure during deployments, ideally with scripts, that debug-level logging is disabled in production.
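Besides deployment scripts, the application itself can refuse to start with debug logging enabled. Here is a minimal sketch of such a check, assuming java.util.logging; the class name ProductionLogGuard and the decision to treat FINE and below as "debug" are my assumptions, not a standard.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical startup check: rejects debug-level logging in production.
class ProductionLogGuard {

    // True if the level lets debug output through (FINE, FINER, FINEST or ALL).
    static boolean isDebugLevel(Level level) {
        return level.intValue() <= Level.FINE.intValue();
    }

    // A logger without its own level inherits one from its parents.
    static Level effectiveLevel(Logger logger) {
        for (Logger current = logger; current != null; current = current.getParent()) {
            if (current.getLevel() != null) {
                return current.getLevel();
            }
        }
        return Level.INFO; // default of java.util.logging when nothing is configured
    }

    // Throws if the logger would emit debug output.
    static void assertNoDebugLogging(Logger logger) {
        Level level = effectiveLevel(logger);
        if (isDebugLevel(level)) {
            throw new IllegalStateException(
                "Debug logging enabled in production: " + level.getName());
        }
    }
}
```

Calling ProductionLogGuard.assertNoDebugLogging(Logger.getLogger("")) early during startup in production would make a wrongly checked-in configuration fail fast instead of silently flooding the logs.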

2. Look through your logs

Some companies have good logging in their production system, but do not look into their logs. Look into your logs, discover issues (bugs, performance, memory) with your application and fix them. Ideally, your logs should contain no known errors.

3. Log to the correct log level

Developers who write logging code often don’t know which log level to use. Have a document ready which explains which log level developers should use. For example, SEVERE should only be used for technical problems which need immediate action. ERROR should be used for errors that someone needs to look into and fix, like not getting a database connection, low resources or failing integration points. The exact rules are specific to your company and application.

4. Do not log locally

If your server has major problems like resource troubles, it’s often impossible to log in. Therefore you can’t get to your logs to find the problem. Logs should be written to a network drive, copied over to another host or written to the network, e.g. with syslogd. A nice solution is to use the Spread Toolkit to write to a network group with multicasting. This also enables easy monitoring (see “Scalable Internet Architectures”).

5. Monitor your logs

Similar to “Look through your logs”, you should set up a monitoring solution which looks at SEVERE entries, ERROR entries, exceptions and other conditions in your logs. With Spread it’s easy to add monitors. It is also a good idea to classify and count exceptions, then do something about the severe and most frequent ones.
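Counting exceptions by class can be sketched as a log handler. This is a minimal in-process illustration assuming java.util.logging, not a Spread monitor; the class name ExceptionCountingHandler and the method countOf are made up for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.logging.Handler;
import java.util.logging.LogRecord;

// Hypothetical handler that counts logged exceptions by their class name.
class ExceptionCountingHandler extends Handler {

    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    @Override
    public void publish(LogRecord record) {
        Throwable thrown = record.getThrown();
        if (thrown == null || !isLoggable(record)) {
            return; // only records that carry an exception are counted
        }
        counts.computeIfAbsent(thrown.getClass().getName(), name -> new LongAdder())
              .increment();
    }

    @Override
    public void flush() {
        // nothing is buffered
    }

    @Override
    public void close() {
        // no resources to release
    }

    // Number of logged exceptions of exactly this class so far.
    long countOf(Class<? extends Throwable> type) {
        LongAdder adder = counts.get(type.getName());
        return adder == null ? 0 : adder.sum();
    }
}
```

A monitor could read these counters periodically and alert on the most frequent classes.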

6. Use a human readable format

Developers often don’t think about the output they produce. This leads to hard-to-read log files. “Release It!” has an example for readable output:

This row-oriented format makes it easier to scan logs quickly. Compare this to your own logs.
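A row-oriented format can be sketched with java.util.logging as follows. This is my own minimal illustration, not the example from the book; the class name OneLineFormatter and the column widths are assumptions.

```java
import java.time.Instant;
import java.util.logging.Formatter;
import java.util.logging.LogRecord;

// Hypothetical formatter: one record per line, fixed-width columns for easy scanning.
class OneLineFormatter extends Formatter {

    @Override
    public String format(LogRecord record) {
        return String.format("%s %-7s %-20s %s%n",
            Instant.ofEpochMilli(record.getMillis()), // ISO-8601 timestamp in UTC
            record.getLevel().getName(),              // padded so messages line up
            record.getLoggerName(),
            formatMessage(record));                   // resolves message parameters
    }
}
```

It can be installed on any handler with handler.setFormatter(new OneLineFormatter()). Stack traces are deliberately left out of this sketch to keep every record on one line.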

7. Use error codes in logging

Each cause which leads to log output should have a unique error code. Without a unique error code it’s hard to find the cause in your source code. Error codes also make it much easier to count and classify log statements, and they give development and operations a common vocabulary.
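One way to keep codes unique is to define them in a single place. The following is a minimal sketch assuming java.util.logging; the enum constants and the code values such as E1001 are invented for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical central registry of error codes.
class ErrorCodes {

    enum ErrorCode {
        DB_CONNECTION_FAILED("E1001"),
        PAYMENT_GATEWAY_TIMEOUT("E2001");

        final String code;

        ErrorCode(String code) {
            this.code = code;
        }
    }

    // Prefixes the message with the code so it can be grepped and counted.
    static String withCode(ErrorCode errorCode, String message) {
        return "[" + errorCode.code + "] " + message;
    }

    static void logError(Logger logger, ErrorCode errorCode, String message) {
        logger.log(Level.SEVERE, withCode(errorCode, message));
    }
}
```

With this, searching the source for DB_CONNECTION_FAILED leads to every place that can emit E1001, and operations can refer to the code when reporting a problem.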

Want to know more? Books with good sections on web site logging are “Release It!” by Michael T. Nygard (really excellent book!) and “Scalable Internet Architectures” by Theo Schlossnagle.

7 Good Rules to Log Exceptions

I’ve been helping to debug some nasty problems and bugs lately. It occurred to me that some best practices on how to log exceptions go a long way towards easier debugging. Some of the best practices I’ve learned to log exceptions are compiled in this post.

1. Only log technical exceptions, not user exceptions
User exceptions are either fine and need not be logged (“login name already exists”) but shown to the user, or they are no exception at all (“user has no credit left”). Technical exceptions are those you need to debug (“no file storage left”, “could not book product”) and react to. If you log everything you will probably get too many log entries to react meaningfully to the exceptions in your log. You should investigate every exception in your log files and find the cause for it (“is it a bug?”). Too many exceptions will make you sloppy with exceptions in your log files (“nah, just another exception”).

2. Store data in your exceptions to make them easier to log
Taking the exception “could not charge money from account”, you should store the context of the exception just like JUnit does (“expected but got …”) to make debugging easier.

The message could be: “Tried to charge 20 EUR from account 1234567890 but 10 EUR available” compared to “Charge failed”. This makes it much easier later to log the exception in a meaningful way. Be careful not to create memory leaks, though.
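As a rough sketch, such an exception could look like this. The class name InsufficientFundsException and its fields are my assumptions. It stores only small immutable values rather than references to large objects like sessions or connections, which is one way to avoid the memory leaks mentioned above.

```java
import java.math.BigDecimal;

// Hypothetical exception that carries the context needed for a useful log entry.
class InsufficientFundsException extends Exception {

    private final String accountId;
    private final BigDecimal requested;
    private final BigDecimal available;
    private final String currency;

    InsufficientFundsException(String accountId, BigDecimal requested,
                               BigDecimal available, String currency) {
        // Build the descriptive message once, from the stored context.
        super("Tried to charge " + requested + " " + currency
            + " from account " + accountId
            + " but " + available + " " + currency + " available");
        this.accountId = accountId;
        this.requested = requested;
        this.available = available;
        this.currency = currency;
    }

    String getAccountId() { return accountId; }
    BigDecimal getRequested() { return requested; }
    BigDecimal getAvailable() { return available; }
    String getCurrency() { return currency; }
}
```

The getters let a logging wrapper or monitor use the individual values without parsing the message.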

3. Log the description of your exception
Very bad example from Sun: for a long time, the ClassCastException didn’t tell you which class you were trying to cast an object to.

Now it even detects

and tells you

During runtime the exception thrown by Java is now:

Much better than before.
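If you want to see what your own JVM reports, a small sketch like this prints the message of a failed cast. The class name CastMessageDemo is made up for illustration, and the exact wording of the message differs between Java versions, so no output is shown here.

```java
// Hypothetical demo: triggers a ClassCastException and returns its message.
class CastMessageDemo {

    static String castFailureMessage() {
        Object value = "not a number";
        try {
            Integer number = (Integer) value; // fails at runtime: value is a String
            return "no exception, got " + number;
        } catch (ClassCastException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(castFailureMessage());
    }
}
```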

4. Output all causes to your exception
If your exception has an exception wrapped as a cause, log all causes. Some logging frameworks do this for you, some don’t. Be sure to have all causes of your exception in the log file. Also make sure the beginning of every relevant stack trace ends up in your log intact, not scrambled.
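If your framework does not print the causes, walking the chain yourself is straightforward. A minimal sketch in plain Java; the class name CauseChain and the "Caused by: " prefix are my choices for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper: one descriptive line per exception in the cause chain.
class CauseChain {

    static List<String> describe(Throwable top) {
        List<String> lines = new ArrayList<>();
        // Identity set guards against a cyclic cause chain.
        Set<Throwable> seen =
            Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable current = top;
             current != null && seen.add(current);
             current = current.getCause()) {
            String prefix = lines.isEmpty() ? "" : "Caused by: ";
            lines.add(prefix + current.getClass().getName() + ": " + current.getMessage());
        }
        return lines;
    }
}
```

Each returned line can be written to the log, followed by the stack trace of the corresponding exception if you need it.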

5. Log to the right level
If your Exception is critical, log it as Level.CRITICAL. You need to decide what critical means for you (most often it means losing money). For example if a booking didn’t work, or a user could not register due to technical problems then you have a CRITICAL problem you need to solve.

Monitor your log files for critical exceptions. You’re losing money.

Have your own exceptions implement isCritical() or a CriticalException interface, and check for it in your logging wrapper so the exception is logged at the right level. If your logging framework hasn’t got an appropriate level, create one.
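Here is a minimal sketch of the marker-interface variant with java.util.logging, which has no CRITICAL level of its own. The names CriticalLogging, CriticalLevel and BookingFailedException are hypothetical, and logging everything else as SEVERE is just an assumption for this example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical logging wrapper that picks the level from the exception type.
class CriticalLogging {

    // Marker interface for exceptions that mean "we are losing money".
    interface CriticalException {
    }

    // Custom level above SEVERE; the Level constructor is protected, hence the subclass.
    static final class CriticalLevel extends Level {
        static final Level CRITICAL = new CriticalLevel();

        private CriticalLevel() {
            super("CRITICAL", Level.SEVERE.intValue() + 100);
        }
    }

    static class BookingFailedException extends RuntimeException
            implements CriticalException {
        BookingFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Assumption: anything not marked critical is logged as SEVERE.
    static Level levelFor(Throwable thrown) {
        return (thrown instanceof CriticalException) ? CriticalLevel.CRITICAL : Level.SEVERE;
    }

    static void logException(Logger logger, String message, Throwable thrown) {
        logger.log(levelFor(thrown), message, thrown);
    }
}
```

Monitoring can then alert on the CRITICAL level alone instead of guessing from message texts.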

6. Don’t log and rethrow
Logging and rethrowing an exception is an exception anti-pattern.

Don’t do it. Your log files will then contain the same exceptions several times on several stack levels. Only log the exception once.
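The difference can be sketched like this. All names here (LogOnce, BookingException, store, book, handleRequest) are hypothetical, and store always fails only to keep the example short.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example contrasting log-and-rethrow with logging once at the boundary.
class LogOnce {

    static final Logger LOG = Logger.getLogger(LogOnce.class.getName());

    static class BookingException extends Exception {
        BookingException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Low-level operation that always fails, for demonstration only.
    static void store(String order) throws IOException {
        throw new IOException("no file storage left");
    }

    // Anti-pattern: logs and rethrows, so every caller doing the same logs it again.
    static void bookLogAndRethrow(String order) throws IOException {
        try {
            store(order);
        } catch (IOException e) {
            LOG.log(Level.SEVERE, "Could not store " + order, e);
            throw e;
        }
    }

    // Better: add context, keep the cause, and do not log here.
    static void book(String order) throws BookingException {
        try {
            store(order);
        } catch (IOException e) {
            throw new BookingException("Could not book " + order, e);
        }
    }

    // The outermost layer logs the exception exactly once, causes included.
    static void handleRequest(String order) {
        try {
            book(order);
        } catch (BookingException e) {
            LOG.log(Level.SEVERE, e.getMessage(), e);
        }
    }
}
```

Because the cause travels inside the wrapping exception, the single log entry still contains the full story.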

7. Do not log with System.out or System.err
Always use a logging framework; it handles logging better than you can.

I hope these rules help you with your exception logging and make it easier to debug your problems.

Thanks for listening. As ever, please do share your thoughts and additional tips in the comments below, or on your own blog (I have trackbacks enabled).

Update: In response to some Hacker News comments: logging everything is fine, but it leads to ~50 GB/day, 300 GB/week, 1.2 TB/month of log files for a moderate site. Your grep won’t work very well on that. Splunk will obviously help, but it is only free for 0.5 GB/day, 1/100 of the log volume you will get.