Saturday, September 29, 2007

Using Nested Diagnostics Contexts in Glassfish

What is a Nested Diagnostics Context?

Let's say that we're writing a message driven bean (MDB) that we'll deploy on Glassfish. Let's say that the MDB's onMessage() method grabs the payload of the message and calls into a stateless session bean (SLSB) for processing. Let's say that the implementation of the SLSB calls into org.apache.xparser:

my.company.MDB > my.company.SLSB > org.apache.xparser

Let's say that he xparser package may log some warnings if the payload is not properly formatted. No problem so far. Now let's say that the application is put in production together with a dozen other applications and that many of these applications use this library. The administrator once in a while finds these warnings in Glassfish's server.log:

[#|2007-09-17T18:36:03.247-0400|WARN|sun-appserver9.1|org.apache.xparser.Parser
|_ThreadID=18; ThreadName=ConsumerMessageQueue:(1);
|Encoding missing, assuming UTF-8|#]

Let's say that the administrator wants to relay this information to the developer responsible for this application. Using the category name org.apache.xparser.Parser, the administrator can find out what code is responsible (a third party component in this case), but how can the administrator find out which application is responsible for this log output?

One approach is to always log the application name before calling into the SLSB, so that the administrator can find the application name using the _ThreadID: he would look at the _ThreadID of the warning, then look for a message earlier in the log that has the same _ThreadID that identifies the application. Not only is this cumbersome, it's also a big problem that the application now fills up the log with the application name just in case the SLSB would log something.

It would be nice if somehow the MDB could associate the thread with the application name, so that if code downstream logs anything, the log message will be adorned with the application name:

[#|2007-09-17T18:36:03.247-0400|WARN|sun-appserver9.1|org.apache.xparser.Parser
|_ThreadID=18; Context=Payrollsync; ThreadName=ConsumerMessageQueue:(1);
|Encoding missing, assuming UTF-8|#]

In Log4J, this is quite simple using Log4J's NDC class: before the MDB calls into the SLSB, it would call NDC.push("Payrollsync") to push the context onto the stack, and after the SLSB it would call NDC.pop(). NDC stands for Nested Diagnostic Context. It's called nested because it maintains a stack, so that the SLSB could push another context onto the stack, hiding the context of the MDB, and pop the context off the stack to restore the stack in its original state before returning. Of course each thread has to have its own stack.

The NDC is a nice facility in Log4J. Unfortunately, in java.util.logging there's no such facility. Let's build one!

Building a Nested Diagnostic Context

The Nested Diagnostics Context will have to keep a stack per thread. When the logging mechanism logs something, it needs to peek at the stack and add the top most item on the stack to the log message. The stack needs to be accessible somehow by both the application that sets the context and by the logging mechanism. A complicating factor in this is that it needs to work with both delegating-first and self-first classloaders. The latter is found in some web applications (special setting in sun-web.xml) and in some JBI components. Furthermore, we would like to use this mechanism in Glassfish and avoid having to make changes to the Glassfish codebase. Lastly, we need to avoid making changes to the application that would cause the application to be no longer portable to other application servers.

How do we expose an API to the application so that it can push and pop contexts onto and off the stack? We could define a new Java API, but that would mean that unless an application packages the jar with that new API, it cannot be deployed on an application server instance that doesn't have the NDC code in its classpath. Here's a solution that doesn't require a new API: reuse the existing java.util.logging.Logger API! We'll define two special logger names, one for pushing contexts onto the stack, and one for popping contexts off the stack. Since we're tapping into the log stream anyways, this is not as far a stretch as it may seem. Here's how an application uses this mechanism:
Logger.getLogger("com.sun.EnterContext").fine("Payrollsync");
slsb.process(msg.getText());
Logger.getLogger("com.sun.ExitContext").fine("Payrollsync");

The loggers com.sun.EnterContext and com.sun.ExitContext are special loggers that we'll develop; messages written to these loggers directly interact with the context stack. Through these special loggers, this example will result in adding the context to any log messages that are produced in the slsb.process(msg) call. On other application servers without these special loggers, this will result in logging the context at FINE level before and after the call to the SLSB is made, so that one can associate a log message using the _ThreadID; it will not do anything if FINE logging is turned off.

What if we want to add more than one context parameter to the log message? For instance, what if we want to add the ID of the message that we're processing?

Logger.getLogger("com.sun.EnterContext")
.log(Level.FINE, {0}={1}, {2}={3}, new Object[] {"Application", "Payrollsync", "Msgid", msg.getMessageID()});
slsb.process(msg.getText());
Logger.getLogger("com.sun.ExitContext") .log(Level.FINE, {0}={1}, {2}={3}, new Object[] {"Application", "Payrollsync", "Msgid", msg.getMessageID()});

The special logger will take the Object[] and push these on the stack. The message string "{0}={1}, {2}={3}" is there merely for portability: if the the code is deployed onto an application server to which we didn't install the NDC facilities, this will simply log the context parameters at FINE level.

Implementation

In a stand alone Java application, you would simply set your own LogManager and implement the NDC functionality there. Glassfish already comes with its own LogManager, and we don't want to override that. Rather, we want to plug in new functionality without any changes to the existing code base. Here's what we need to do:
  1. create the special loggers com.sun.EnterContext and com.sun.ExitContext
  2. hookup these special loggers
  3. hook into the log stream to print out the context

To create the special loggers, we can simply create a new class that derives from java.util.logging.Logger, say EntryLogger. Next, we need to make sure that when someone calls Logger.getLogger("com.sun.EnterContext"), it will be this class that is returned. Without making any changes to the LogManager, the way that that can be accomplished is by instantiating the new EntryLogger and registering it with the LogManager immediately. This has to be done before anybody calls Logger.getLogger("com.sun.EnterContext"). In other words, we should do this before any application starts. In Glassfish there's an extensibility mechanism called LifeCycleListeners. An object that implements this interface can be loaded by Glassfish automatically upon startup.

Lastly, we need to find a way to add the context to the log entries in the log. Glassfish already has a mechanism to add key-value pairs to each log entry: when formatting a LogRecord for printing, Glassfish calls LogRecord.getParameters() and checks each object in the returned Object[] for objects that implement java.util.Map and java.util.Collection. For objects that implement java.util.Map, Glassfish adds the key-value pairs to the log message. For objects that implement java.util.Collection, Glassfish adds each entry as a String to the log message.

If each LogRecord can somehow be intercepted before it reaches Glassfish's Formatter, the context can be added as an extra parameter to the LogRecord's parameter list. This can be done by adding a new java.util.logging.Handler to the root-Logger before Glassfish's own Handler. For each LogRecord that this new Handler receives, it will inspect the Context stack and add a Map with the Context to the LogRecord. Next, the root-Logger will send the LogRecord to Glassfish's own Handler which takes care of printing the message into the log. Once again, the LifeCycleListener is the ideal place to register the new Handler.

Give it a spin!

You can download the jar that has these new classes and/or download the sources. Put the jar in Glassfish's lib directory. Restart the server and install the LifeCycleListener:

LifeCycleListener Configuration

4 comments:

Rajan Jain said...

In my opinion, Maped Diagnostic Context (MDC) is better than NDC. I may be wrong but NDC requires push and pop..If a developer forgets to pop then it may cause some memory leaks. MDC how ever useds maps.

Frank Kieviet said...

Re Rajan:

Correct, the NDC requires push and pop. Through that, a thread can go from module to module and maintain a debugging context.

Although the MDC certainly has its merits, going through multiple modules may be confusing because values are overwritten. And if values are not removed, they may confusion in subsequent calls, especially if threads can be dispatched to different applications.

Nevertheless, the same approach as described in this blog entry can be used to hookup an MDC in the jdk logging package.

Frank

Frank Kieviet said...

An update: see http://frankkieviet.blogspot.com/2009/09/lessons-of-interesting-deadlock-problem.html

Brent Haigh said...

This is a nice feature in Glassfish - and is particularly useful with the JBI BPEL Service Engine.

There is a problem with this code when glassfish runs on JDK1.6.0_18 (that is not present with at least JDK1.6.0_11 or earlier). The java.util.logging.LogManager now maintains weak references to its loggers (including the NDCEntryLogger and NDCExitLogger). And so these are garbage collected by the container.

If the NDCProvider code is changed to declare the NDCLogger references as static this stops them from being garbage collected.