Frank Kieviet's Engineering Notebook

Tuesday, January 16, 2007

JMSJCA, a feature rich JMS Resource Adapter, is now a java.net project

JMSJCA is now a java.net project. It can be found here: http://jmsjca.dev.java.net.

The project is currently used in Java CAPS, JMS Grid and the JMS BC as part of the open-jbi-components project.

The connector can be used as a J2EE 1.4 Resource Adapter, but its libraries can also be used as an abstraction layer to JMS servers from non-J2EE code. As such, the adapter acts like a library that hides the complexities of transactions, concurrency, connection failure detection, JMS server implementation idiosyncrasies, etc. That is how it is used in the JMS BC as part of the open-jbi-components project.

Sunday, January 14, 2007

Logless transactions

A few months ago, in my blog entry Transactions, disks, and performance I went into the importance of minimizing the number of writes. Transaction logging is one of those cases where minimizing the number of writes greatly enhances performance. In this entry, I'll describe a way to avoid transaction logging altogether.

What is transaction logging? Transaction logging refers to persisting the state of a two-phase transaction so that in the event of a crash, the transaction can either be committed or rolled back (recovered). I won't go into the details of what XA is; more information about XA transactions can be found elsewhere, e.g. in Mike Spille's XA Exposed.

Let me illustrate what recovery is using a "diagram". Consider an XA two phase transaction with three Resource Managers (RM_a, RM_b, and RM_c). To indicate what happens at what time, I'll put all actions in a table; each row corresponds to a different time.

time	RM_a	RM_b	RM_c	Coordinator
t1	start(xid1_a, TMNOFLAGS)
t2		start(xid1_b, TMNOFLAGS)
t3			start(xid1_c, TMNOFLAGS)
t4	end(xid1_a, TMSUCCESS)
t5		end(xid1_b, TMSUCCESS)
t6			end(xid1_c, TMSUCCESS)
t7	prepare(xid1_a)
t8		prepare(xid1_b)
t9			prepare(xid1_c)
t10				log
t11	commit(xid1_a, false)
t12		commit(xid1_b, false)
t13			commit(xid1_c, false)
t14				delete from log

At t10 the transaction manager records the decision to commit to the log. Let's say that the system crashes after t10, say between t11 and t12. When the system restarts, it will call recover() on all known Resource Managers and it will read the transaction log. In the transaction log it will find that xid1_x was marked for commit. Through recover() it will find that xid1_b and xid1_c are in doubt. It knows that these two need to be committed because of the commit decision in the log.

What happens if the system crashes before the commit decision is written to the log, for example between t8 and t9? Upon recovery, the recover() methods of RM_a, RM_b and RM_c return xid1_a and xid1_b (but not xid1_c, because prepare was not called on RM_c yet). The transaction manager will roll back RM_a and RM_b because no commit decision was found in the log.
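The recovery rule from these two scenarios can be sketched in a few lines of Java. This is a toy model, not a real transaction manager: the class LoggedRecovery, its method names, and the string form of the Xids (global id, underscore, branch) are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of recovery with a transaction log (hypothetical names throughout). */
class LoggedRecovery {

    enum Action { COMMIT, ROLLBACK }

    /** Extracts the global transaction id from a branch name such as "xid1_b". */
    static String globalId(String branchXid) {
        int idx = branchXid.lastIndexOf('_');
        return idx < 0 ? branchXid : branchXid.substring(0, idx);
    }

    /**
     * Decides what to do with each in-doubt branch returned by recover().
     * A branch is committed only if the log holds a commit decision for its
     * global transaction; otherwise it is rolled back.
     */
    static Map<String, Action> decide(List<String> inDoubt, Set<String> commitLog) {
        Map<String, Action> actions = new LinkedHashMap<>();
        for (String xid : inDoubt) {
            boolean decided = commitLog.contains(globalId(xid));
            actions.put(xid, decided ? Action.COMMIT : Action.ROLLBACK);
        }
        return actions;
    }

    public static void main(String[] args) {
        // Crash between t11 and t12: the log holds the commit decision for xid1.
        System.out.println(decide(List.of("xid1_b", "xid1_c"), Set.of("xid1")));
        // Crash between t8 and t9: nothing was written to the log yet.
        System.out.println(decide(List.of("xid1_a", "xid1_b"), Set.of()));
    }
}
```

The log is the only input that distinguishes the two cases, which is why it has to be forced to disk before the first commit() call.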

SeeBeyond's Logless XA Transactions

Let's take a look at the recover() method on the XAResource. This method returns an array of Xid objects. Each Xid object holds two byte[] arrays. These two arrays represent the global transaction ID and the branch qualifier. They are typically random numbers picked by the transaction manager. The Resource Managers that receive these Xids should use these objects as identifiers and return them in the recover() method unmodified.

At SeeBeyond, Jerry Waldorf and Venugopalan Venkataraman came up with an idea to use the storage space in the byte[] arrays of the Xid as a way to persist the transaction state. Here's how it works. Let's modify the above example by removing transaction logging:

time	RM_a	RM_b	RM_c	Coordinator
t1	start(xid1_a, TMNOFLAGS)
t2		start(xid1_b, TMNOFLAGS)
t3			start(xid1_c, TMNOFLAGS)
t4	end(xid1_a, TMSUCCESS)
t5		end(xid1_b, TMSUCCESS)
t6			end(xid1_c, TMSUCCESS)
t7			prepare(xid1_c)
t8		prepare(xid1_b)
t9	prepare(xid1_a)
t10			commit(xid1_c, false)
t11		commit(xid1_b, false)
t12	commit(xid1_a, false)

A commit decision is still being made, but this decision is no longer persisted in a separate transaction log. Instead, it is persisted in xid1_a. If the system finds xid1_a upon recovery, it knows that a commit decision was made. If it doesn't find xid1_a, it knows that a commit decision was not made. Note that the order in which both prepare and commit are called on the three Resource Managers is very important.

As in the first example, if the system crashes before a commit decision has been made, it will roll back any resources upon recovery. E.g. if the system crashes between t8 and t9, it will encounter xid1_c and xid1_b and will call rollback() on these because it cannot find a record of a commit decision for xid1, i.e. it cannot find xid1_a. Hence, xid1_b and xid1_c need to be rolled back.

If the system crashes after a commit decision has been made, for example between t10 and t11, it will find xid1_b and xid1_a. Since xid1_a signifies a commit decision, both xid1_b and xid1_a should be committed.

So far so good. But how does the transaction manager know that if it encounters xid1_b it should look for xid1_a to figure out if a commit decision was made? This is where the transaction manager uses the byte[] of the Xid: it stores this information in one of them.
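To make this concrete, here is a minimal sketch of how such a recovery pass could look. The entry does not describe the actual byte layout SeeBeyond used, so the encoding below (the branch qualifier holds "thisRM/decidingRM" as UTF-8) is purely an assumption, as are the class and method names.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of logless recovery; the qualifier encoding is hypothetical. */
class LoglessRecovery {

    enum Action { COMMIT, ROLLBACK }

    /** Stand-in for an Xid: only the two byte arrays matter here. */
    static final class Branch {
        final byte[] globalId;
        final byte[] branchQualifier;

        Branch(String globalId, String rm, String decidingRm) {
            this.globalId = globalId.getBytes(StandardCharsets.UTF_8);
            // Assumed layout: "<this RM>/<RM whose branch records the decision>"
            this.branchQualifier = (rm + "/" + decidingRm).getBytes(StandardCharsets.UTF_8);
        }

        String global() { return new String(globalId, StandardCharsets.UTF_8); }
        String rm() { return qualifierParts()[0]; }
        String decidingRm() { return qualifierParts()[1]; }
        String name() { return global() + "_" + rm(); }

        private String[] qualifierParts() {
            return new String(branchQualifier, StandardCharsets.UTF_8).split("/", 2);
        }
    }

    /** Decides the fate of all in-doubt branches collected from recover(). */
    static Map<String, Action> decide(List<Branch> inDoubt) {
        // Pass 1: a transaction is decided if its deciding branch is still in doubt,
        // i.e. prepare() completed on the Resource Manager that is committed last.
        Set<String> decided = new HashSet<>();
        for (Branch b : inDoubt) {
            if (b.rm().equals(b.decidingRm())) {
                decided.add(b.global());
            }
        }
        // Pass 2: commit the branches of decided transactions, roll back the rest.
        Map<String, Action> actions = new LinkedHashMap<>();
        for (Branch b : inDoubt) {
            actions.put(b.name(), decided.contains(b.global()) ? Action.COMMIT : Action.ROLLBACK);
        }
        return actions;
    }

    public static void main(String[] args) {
        // Crash between t8 and t9: only RM_c and RM_b were prepared.
        System.out.println(decide(List.of(new Branch("xid1", "c", "a"), new Branch("xid1", "b", "a"))));
        // Crash between t10 and t11: RM_c already committed, RM_b and RM_a are in doubt.
        System.out.println(decide(List.of(new Branch("xid1", "b", "a"), new Branch("xid1", "a", "a"))));
    }
}
```

Because every branch carries the name of the deciding Resource Manager, the recovery pass needs no state other than what the recover() calls return.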

Complicating factors

A problem in this scheme occurs when the prepare(xid1_a) method returns XA_RDONLY. If that happens, commit(xid1_a, false) cannot be called, and RM_a will not return xid1_a upon calling recover(). Recall that xid1_a had special significance! Hence it is important to order the Resource Managers such that the one that holds the commit decision (RM_a in the example: the last one on which prepare() is called and the last one to be committed) is both reliable and will not return XA_RDONLY. However, in normal EE applications, the application prescribes in which order resources are enlisted in a transaction. Hence, to use this logless transaction scheme, the application server either needs to be extended with a way to specify resources a priori, or it needs to be extended with a learning capability so that it knows which resources are enlisted in a particular operation and can pick the right resource manager to write the commit decision to.

The SeeBeyond logless transaction approach is one of the ways that transaction logging can be made less expensive. In a future blog, I'll cover additional ones.

Sunday, December 10, 2006

Short note: Running Java CAPS on Java SE 6

Today Java SE 6 was released. It comes with many new features and cool tools, one of them being jmap, as described in a previous blog entry on permgen exceptions. The Integration Server is not officially supported on SE 6 yet. However, if you want to run the Java CAPS integration server on SE 6, this is what you can do:

Install JDK 6 somewhere, e.g. c:\java
Install the IS somewhere, e.g. c:\logicalhost
Rename c:\logicalhost\jre to c:\logicalhost\jre.old
Copy c:\java\jre1.6.0 to c:\logicalhost\jre
Copy c:\java\jdk1.6.0\lib\tools.jar to c:\logicalhost\jre\lib
Copy c:\logicalhost\jre\bin\javaw.exe to c:\logicalhost\jre\bin\is_domain1.exe
Copy c:\logicalhost\jre\bin\javaw.exe to c:\logicalhost\jre\bin\isprocmgr_domain1.exe
Edit c:\logicalhost\is\domains\domain1\config\domain.xml and comment out these lines:

<!--
<jvm-options>-Dcom.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager=com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager</jvm-options>

<jvm-options>-Dorg.xml.sax.driver=com.sun.org.apache.xerces.internal.parsers.SAXParser</jvm-options>

<jvm-options>-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl</jvm-options>

<jvm-options>-Dcom.sun.org.apache.xerces.internal.xni.parser.XMLParserConfiguration=com.sun.org.apache.xerces.internal.parsers.XIncludeParserConfiguration</jvm-options>

<jvm-options>-Djavax.xml.transform.TransformerFactory=com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl</jvm-options>

<jvm-options>-Djavax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl</jvm-options>

<jvm-options>-Djavax.xml.soap.MessageFactory=com.sun.xml.messaging.saaj.soap.ver1_1.SOAPMessageFactory1_1Impl</jvm-options>

<jvm-options>-Djavax.xml.soap.SOAPFactory=com.sun.xml.messaging.saaj.soap.ver1_1.SOAPFactory1_1Impl</jvm-options>

<jvm-options>-Djavax.xml.soap.SOAPConnectionFactory=com.sun.xml.messaging.saaj.client.p2p.HttpSOAPConnectionFactory</jvm-options>

-->

i.e. add <!-- before and --> after these lines.

Also comment out this line:

<!--        <jvm-options>-server</jvm-options>-->

After these changes you can run the Integration Server with Java SE 6. These are not “official” recommendations (as mentioned, there’s no support for SE 6 just yet); also, the lines commented out are optimizations that still need to be re-established for SE 6, so don’t do any performance comparisons just yet.

Saturday, December 2, 2006

Moving out of people management

Last week it was four years ago that I started at SeeBeyond. At SeeBeyond, I managed products (the JMS server and J2EE application server in Java CAPS), technology and people. The latter was part of the culture at SeeBeyond: the only way to have influence on any product was to have a team of people reporting to you.

Having responsibility for a team has been interesting. Over the past four years, I've seen people grow. I've seen people "turn around" on whom I was ready to give up. This was very satisfying. With them and through them, I've grown as well. However, in the past year I felt it was time for me to move to the next level. Also, with the increasing number of people in my team (two originally, eight at one point, and six lately), there was less and less time to stay involved with technology at a deep enough level.

When Sun acquired SeeBeyond last year, I was classified as a people manager because I had people reporting to me. As it turned out, Sun's culture is quite different from SeeBeyond's: there is a dual career ladder with appreciation and growth opportunities for both people managers and individual contributors. But unlike at SeeBeyond, people managers primarily manage people and are less involved with technology. 11 people per manager is seen as the norm. It is the technical individual contributors who manage and shape products.

Moving up to the next step on Sun's career ladder, I requested a "diagonal promotion": up one level and from the people management track to the technology track. This week I got my promotion: I'm now a Senior Staff Engineer. Does this mean I'm now a heads-down techie? No, of course not. Sure there will be no more managing reports, but I'll now devote more time influencing people in other teams even outside of the organization. And yes, hopefully there'll be more time to dive a little deeper into a piece of technology.

Wednesday, November 15, 2006

Using java.util.logging in BEA Weblogic

Some weeks ago I was making sure that the JMSJCA connector runs properly on BEA Weblogic 9.1. Of course I ran into some issues: never assume that your code is really portable until you try it out. I also found some issues in Weblogic. Now I don't want to make this into a bashing session of Weblogic: since I work for Sun Microsystems, and Weblogic can be considered a competitor of Glassfish, that would be in bad taste. However, there's one thing that bugged me a lot and for which I want to share a solution: logging.

Dude, where's my log output?

At SeeBeyond (and of course at Sun) we standardized on the use of java.util.logging. You could argue whether java.util.logging is technically the best solution, but since it's a standard it's a lot better if you use that package than it is to write your own or use other third party logging tools. Standardization gives portability! So, JMSJCA, STCMS, etc all use java.util.logging.

When I started to run my test suite on Weblogic, I was surprised that the log output from the tests did not end up in one of the Weblogic log files. Instead, output appeared on the console. Surely there was a configuration option somewhere in Weblogic to fix this, right? I couldn't find one. The problem became even more annoying when I tried to enable debug logging. In Glassfish you go to the logging configuration screen, and you simply type the name of the logging category that you want to change, select the level and that's it. Not so in Weblogic. What was going on? It was time to look a little deeper into this.

How Weblogic uses java.util.logging.

I must admit that the Weblogic documentation is good, and very easily accessible. I wish we had the same situation at SeeBeyond. Anyway, from the documentation it appeared that Weblogic fully supports java.util.logging: I quote from the documentation: To distribute messages, WebLogic Server supports Java based logging by default. The LoggingHelper class provides access to the java.util.logging.Logger object used for server logging. [snip] If your application is configured for Java Logging or Log4j, in order to publish application events using WebLogic logging services, create a custom handler or appender that relays the application events to WebLogic logging services using the message catalogs or Commons API.

Eh, what do you mean... LoggingHelper class? Can't I just use java.util.logging without depending on any Weblogic classes? Do I have to write custom code?

And how do I change log levels dynamically? Isn't there an option in the admin console to do that? An MBean perhaps? I was really surprised to find this recommendation: If you use the DEBUG severity level, BEA recommends that you create a "debug mode" for your application. For example, your application can create an object that contains a Boolean value. To enable or disable the debug mode, you toggle the value of the Boolean. Then, for each DEBUG message, you can create a wrapper that outputs the message only if your application's debug mode is enabled.

For example, the following code can produce a debug message:

private static boolean debug = Boolean.getBoolean("my.debug.enabled");
if (debug) {
mylogger.debug("Something debuggy happened");
}
[snip]
To enable your application to print this message, you include the following Java option when you start the application's JVM:

-Dmy.debug.enabled=true

That was a good recommendation... TEN YEARS AGO!
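For contrast, java.util.logging already carries a level on every Logger, and that level can be changed while the application is running, so no application-specific flag or JVM restart is needed. A minimal sketch; the logger name com.example.demo and the CollectingHandler class are made up for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class DebugGuardDemo {

    // Hypothetical logger name, used only for this illustration
    private static final Logger LOG = Logger.getLogger("com.example.demo");

    /** Keeps published messages in memory so the effect of the level is visible. */
    static final class CollectingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override public void publish(LogRecord record) {
            if (isLoggable(record)) {
                messages.add(record.getMessage());
            }
        }
        @Override public void flush() { }
        @Override public void close() { }
    }

    static void doWork(int n) {
        // The guard avoids building the message when FINE is disabled
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Something debuggy happened: " + n);
        }
    }

    static List<String> run() {
        CollectingHandler handler = new CollectingHandler();
        LOG.setUseParentHandlers(false);
        LOG.addHandler(handler);
        try {
            LOG.setLevel(Level.INFO);
            doWork(1);                 // suppressed: FINE is below INFO
            LOG.setLevel(Level.FINE);  // "debug mode" switched on at runtime
            doWork(2);                 // logged
            LOG.setLevel(Level.INFO);  // and switched off again
            doWork(3);                 // suppressed
        } finally {
            LOG.removeHandler(handler);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Only the second call produces a message. What is missing in Weblogic is a way to flip that level from the outside.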

A fix

Fortunately, things are not as grim as they look. It's quite easy to write a little bit of code that is deployed globally independently of your application, so that you can continue to use java.util.logging without sprinkling Weblogic dependencies all over your application. First of all, what's all this with this BEA logger? As it turns out, during startup, Weblogic instantiates a java.util.logging.Logger object and hooks a number of Handler objects to it. So if you log to that particular logger, your log output will appear in one of the Weblogic log files. Can't you get access to this Logger using Logger.getLogger(name)? No you cannot: Weblogic is not using the LogManager at all. That particular logger is not registered in the LogManager. That is the reason why you need to use the weblogic.logging.LoggingHelper.getServerLogger() method.

Once we have a reference to this special Logger object, we can get a list of the Handler objects. Next, we can assign this list to the root logger object. As a result, whenever you use a Logger object obtained through Logger.getLogger(), the output will go to the Weblogic handlers. Problem solved!

Of course we want to centralize this code: after all, we don't want to introduce this dependency in our application code. There are two ways of doing this. One is to write our own LogManager and set that as the JVM's LogManager singleton; we can do this because Weblogic doesn't set a LogManager. The other approach is to write a Weblogic startup class. An instance of this class is created at server startup and its startup() method is called. This is the approach I've taken:

package com.sun.bealog;

import java.util.Hashtable;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.Logger;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import weblogic.common.T3ServicesDef;
import weblogic.common.T3StartupDef;
import weblogic.logging.LoggingHelper;

/**
 * Allows logging by applications and components using the java.util.logging
 * package by sending output from java.util.logging.Logger-s to the
 * WebLogic trace file.
 *
 * @author fkieviet
 */
public class BeaLog implements T3StartupDef {

    public String startup(String string, Hashtable hashtable) throws Exception {
        String result = "";
        Logger wllogger = LoggingHelper.getServerLogger();
        Handler[] wlhandlers = wllogger.getHandlers();
        
        // Change log level on log file handler
        for (int i = 0; i < wlhandlers.length; i++) {
            if (wlhandlers[i].getLevel().intValue() >= 530) {
                result += "log level changed of " + wlhandlers[i] + "; ";
                wlhandlers[i].setLevel(Level.FINEST);
            }
        }
        
        // Copy handlers from wllogger to rootlogger
        Logger rootlogger = Logger.getLogger("");
        Handler[] toremove = rootlogger.getHandlers();
        for (int j = 0; j < toremove.length; j++) {
            rootlogger.removeHandler(toremove[j]);
        }
        for (int i = 0; i < wlhandlers.length; i++) {
            rootlogger.addHandler(wlhandlers[i]);
        }
        result += "root handler now has " + rootlogger.getHandlers().length + " handlers; ";
        
        // Register wllogger so that it can be manipulated through the mbean
        boolean alreadythere = !LogManager.getLogManager().addLogger(wllogger);
        
        // Register mbean
        InitialContext ctx = null;
        String[] names = new String[] {"java:comp/jmx/runtime", "java:comp/env/jmx/runtime"};
        for (int i = 0; i < names.length; i++) {
            try {
                ctx = new InitialContext();
                MBeanServer mbeanServer = (MBeanServer) ctx.lookup(names[i]);
                mbeanServer.registerMBean(LogManager.getLoggingMXBean(), 
                        new ObjectName("java.util.logging:type=Logging"));
                result += "mbean registered; ";
            } catch (Exception e) {
                // ignore
            } finally {
                safeClose(ctx);
            }
        }
        
        // For Java CAPS: disable two commonly used loggers that are used to 
        // relay call context information
        Logger.getLogger("com.stc.EnterContext").setLevel(Level.OFF);
        Logger.getLogger("com.stc.ExitContext").setLevel(Level.OFF);
        
        return this.getClass().getName() + result;
    }

    private void safeClose(Context ctx) {
        if (ctx != null) {
            try {
                ctx.close();
            } catch (NamingException ignore) {
                // ignore
            }
        }
    }

    public void setServices(T3ServicesDef t3ServicesDef) {
    }
}

As you may have noticed, there's another problem that's solved in the code snippet above: dynamic log level configuration. The JVM's logging package ships with an MBean that allows you to set the log levels dynamically. The only thing that needs to be done is to register this MBean in Weblogic's MBeanServer (Weblogic has an unusual way to access its MBeanServer, as you can see in the code). From then on, you can use a tool like jmx-console to access the logging MBean and set the log levels.
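What such a tool does through JMX can also be tried from plain Java. A minimal sketch, assuming a Java 7 or later JVM, where the platform MBean server exposes the logging MBean as PlatformLoggingMXBean (BeaLog above registers the older LoggingMXBean in Weblogic's own MBeanServer instead); the logger name com.example.dynamic is made up:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.PlatformLoggingMXBean;
import java.util.logging.Level;
import java.util.logging.Logger;

class DynamicLevelDemo {

    // Keep a strong reference: the MBean can only change loggers that exist
    static final Logger LOG = Logger.getLogger("com.example.dynamic");

    /** Sets a logger's level through the logging MBean and returns the resulting level. */
    static Level setLevelThroughMBean(String loggerName, String levelName) {
        PlatformLoggingMXBean logging =
                ManagementFactory.getPlatformMXBean(PlatformLoggingMXBean.class);
        // Throws IllegalArgumentException if the logger or the level name is unknown
        logging.setLoggerLevel(loggerName, levelName);
        return Logger.getLogger(loggerName).getLevel();
    }

    public static void main(String[] args) {
        System.out.println("before: " + LOG.getLevel());
        System.out.println("after:  " + setLevelThroughMBean("com.example.dynamic", "FINE"));
    }
}
```

A remote JMX client performs the same setLoggerLevel operation, just over a connector instead of an in-process call.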

Download a NetBeans project that contains the BeaLog code.

Tuesday, November 14, 2006

More on... How to fix the dreaded "java.lang.OutOfMemoryError: PermGen space" exception (classloader leaks)

I got quite a few comments on my last blog (How to fix the dreaded "java.lang.OutOfMemoryError: PermGen space" exception (classloader leaks)). Apparently more people have been struggling with this problem.

Why bring this up? What's the news? Edward Chou continued to explore options to diagnose classloader leaks. First of all, he explored how to generate a list of orphaned classloaders with jhat. An orphaned classloader is a classloader that is not referenced by any object directly but cannot be garbage collected. The thinking behind this is that programs that create classloaders (e.g. application servers) do maintain references to them. So if there's a classloader that is no longer directly referenced, this classloader is probably a leak. Read about it on his blog (Find Orphaned Classloaders).

Still we were not satisfied: when examining some memory dumps from code that we were not familiar with, we explored yet some other options to diagnose classloader leaks: duplicate classes and duplicate classloaders. Let me explain.

Let's say that your application has a com.xyz.Controller class. If you find many instances of this class object, you likely have a classloader leak. Note the phrase "instances of this class object". What I mean by this: the class com.xyz.Controller is loaded multiple times, i.e. multiple instances of the com.xyz.Controller.class are present. You can use jhat to run this query: simply list all instances of java.lang.Class.

Edward modified jhat to generate a list of all classloader instances that loaded an identical set of classes. Typically there's no reason why someone would create two classloader instances and load exactly the same set of classes into them. If you find any in your memory dump, you should get suspicious and take a closer look. Monitor Edward's blog for more details on this.

One more thing: Edward found out that the method java.lang.String.intern() allocates memory in PermGen space. So if your application frequently uses this method with different strings, watch out. Fortunately these strings are subject to garbage collection. But if your application holds references to these strings, thereby making garbage collection impossible, your application may cause the dreaded "java.lang.OutOfMemoryError: PermGen space" exception. No classloaders involved this time.

Wednesday, October 18, 2006

How to fix the dreaded "java.lang.OutOfMemoryError: PermGen space" exception (classloader leaks)

In the previous blog entry Classloader leaks: the dreaded "java.lang.OutOfMemoryError: PermGen space" exception I explained how this type of problem can originate in the application code that you deploy to an application server. In this post I'll explain how to track down the leak so that you can fix it.

Profilers

Memory leak? Use a profiler. Right? Well... generally speaking the answer is yes, but classloader leaks are a bit special...

To refresh your memory (pardon the pun), a memory leak is an object that the system unintentionally hangs on to, thereby making it impossible for the garbage collector to remove this object. The way that profilers find memory leaks is to trace references to a leaked object.

What do I mean by "tracing"? A leaked object can be referenced by another object which itself is a leak. In turn, this object may also be a leak, and so on. This repeats until an object is found that references a leaked object by mistake. This reference is where the problem is, and what you need to fix. Let me try to clarify this by illustrating this with a picture from my previous blog:

In this picture the AppClassloader, LeakServlet.class, STATICNAME, CUSTOMLEVEL, LeakServlet$1.class are all leaked objects. Due to static objects (e.g. STATICNAME) in the picture, that may in turn reference other objects, the number of leaked objects may be in the thousands. Going over each leaked object manually to check if there are any incidental references to it (the red reference in the picture) until you find the troublesome object (CUSTOMLEVEL) is laborious. You would rather have a program find the violating reference for you.

A profiler doesn't tell you which leaked object is interesting to look at (CUSTOMLEVEL). Instead it gives you all leaked objects. Let's say that you look at STATICNAME. The profiler should now find the route from STATICNAME to LeakServlet.class to AppClassloader to LeakServlet$1.class to CUSTOMLEVEL to Level.class. In this route, the red line in the picture is the reference that actually causes the leak. I said the profiler should find this route. However, all the profilers that we tried stop tracing as soon as they reach a class object or classloader. There's a good reason for that: the number of traces grows enormously if the profiler follows the references through classes. And in most cases, these traces are not very useful.
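The effect of stopping at classes can be shown with a toy model. The sketch below is not how a profiler or jhat is implemented; it is a plain breadth-first search over a hand-written graph that mirrors my description of the picture, with a set of "opaque" nodes that are reached but not followed. All names in the code are made up for the illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ReferenceChainDemo {

    /** Shortest chain of references from root to target; empty if none is found. */
    static List<String> chain(Map<String, List<String>> refs, String root,
                              String target, Set<String> opaque) {
        Map<String, String> cameFrom = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(root, null);
        queue.add(root);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(target)) {
                LinkedList<String> path = new LinkedList<>();
                for (String n = current; n != null; n = cameFrom.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            if (opaque.contains(current)) {
                continue; // reached, but its outgoing references are not followed
            }
            for (String next : refs.getOrDefault(current, List.of())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    /** "A references B" edges, as read from the picture. */
    static Map<String, List<String>> pictureGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("Level.class", List.of("CUSTOMLEVEL"));
        refs.put("CUSTOMLEVEL", List.of("LeakServlet$1.class"));
        refs.put("LeakServlet$1.class", List.of("AppClassloader"));
        refs.put("AppClassloader", List.of("LeakServlet.class", "LeakServlet$1.class"));
        refs.put("LeakServlet.class", List.of("AppClassloader", "STATICNAME"));
        return refs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = pictureGraph();
        // Following references through classes and classloaders finds the chain
        System.out.println(chain(refs, "Level.class", "STATICNAME", Set.of()));
        // Treating the application's classes and classloader as opaque finds nothing
        System.out.println(chain(refs, "Level.class", "STATICNAME",
                Set.of("LeakServlet$1.class", "AppClassloader", "LeakServlet.class")));
    }
}
```

With the opaque set in place the search ends at LeakServlet$1.class, which is the behavior described above: the one reference that matters lies beyond the point where tracing stops.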

So no luck with profilers! We need to try something else.

JDK 6.0 to the rescue

When Edward Chou and I worked on tracking down classloader leaks last year, we tried to run the JVM with HPROF and tried to trigger a memory dump; we looked at using Hat to interpret the dump. Hat stands for Heap Analysis Tool, and was developed to read dump files generated with HPROF. Unfortunately, the hat tool blew up reading our dump files. Because we didn't think it was difficult to parse the dump file, we wrote a utility to read the file and track the memory leak.

That was last year. This year we have JDK 6.0; this new JDK comes with a few tools that make looking at the VM's memory a lot simpler. First of all, there's a tool called jmap. This command line tool allows you to trigger a dump file without HPROF. It is as simple as typing something like:
jmap -dump:format=b,file=leak 3144
Here leak is the filename of the dump, and 3144 is the PID of the process. To find the PID, you can use jps.

Secondly, Hat is now part of the JDK. It is now called jhat. You can run it using a command line like:
jhat -J-Xmx512m leak
Here leak is the name of the dump file, and as you may have guessed, -J-Xmx512m is a parameter to specify how much memory jhat is allowed to allocate.

When you start jhat it reads the dump file and then listens on an HTTP port. You point your browser to that port (7000 by default) and through that you can browse the memory heap dump. It's a very convenient way of looking at what objects are in memory and how they are connected.

So, it seemed like a good idea to check out what can be done with these new tools to find classloader leaks.

... or not?

Unfortunately, jhat, just like the profilers we tried, also stops tracing when it encounters a class. Now what? I decided to download the JDK source code and find out what the problem is. Building the whole JDK is a difficult task from what I gather from the documentation. Fortunately, jhat is a nicely modularized program; I could just take the com.sun.tools.hat-packages out of the source tree, load them in my favorite editor and compile the code. The patched code was easily packaged and run: I just jar-ed it and added it to the lib/ext directory of the JDK:
jar -cf C:\apps\Java\jdk1.6.0\jre\lib\ext\ahat.jar -C hat\bin .
jhat leak

This was really as easy as pie. So after running the program in the debugger for some time, I figured out how it works and what changes I wanted to make. The change is that when you follow the references from a classloader, the modified jhat will follow through all traces from all the instances of the classes that it loaded. With that change, finding the cause of a classloader leak is simple.

An example

Let's look at the example from my previous blog as depicted in the picture above. Using NetBeans I created the following servlet and deployed it to Glassfish:

package com.stc.test;

import java.io.*;
import java.net.*;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.servlet.*;
import javax.servlet.http.*;

public class Leak extends HttpServlet {

    protected void processRequest(HttpServletRequest request, HttpServletResponse response)
    throws ServletException, IOException {
        response.setContentType("text/html;charset=UTF-8");
        PrintWriter out = response.getWriter();
        out.println("<html><body><pre>");
        // The anonymous Level subclass is loaded by the web application's classloader
        Level custom = new Level("LEAK", 950) {};
        Logger.getLogger(this.getClass().getName()).log(custom, "New level created");
        out.println("</pre></body></html>");
        out.close();
    }

    // NetBeans-generated HttpServlet methods (folded in the original listing) omitted
}

I invoked the servlet to cause the leak. Next I undeployed the servlet. Then I triggered a heap dump:
jmap -dump:format=b,file=leak 3144
and fired up the modified jhat:
jhat -J-Xmx512m leak
and brought up the browser. The opening screen shows amongst other things, all classes that are found in the dump:

Finding objects that were leaked is easy since I know that I shouldn't see any objects of the classes that I deployed. Recall that I deployed a class com.stc.test.Leak; so I searched in the browser for the com.stc.test package, and found these classes (never mind the NoLeak class: I used it for testing).

Clicking on the link class com.stc.test.Leak brings up the following screen:

Clicking on the classloader link brings up the following screen:

Scrolling down, I see Reference Chains from Rootset / Exclude weak refs . Clicking on this link invokes the code that I modified; the following screen comes up:

And there's the link to java.util.logging.Level that we've been looking for!

Easy as pie!

Summarizing, the steps are:

undeploy the application that is leaking
trigger a memory dump
run jhat (with modification)
find a leaked class
locate the classloader
find the "Reference chains from root set"
inspect the chains, locate the accidental reference, and fix the code

~~I'll contact the JDK team to see if they are willing to accept the changes I made to jhat. If you cannot wait, send me an email or leave a comment.~~

Update (April 2007): Java SE SDK 6.0 update 1 has the updated code.

Other Permgen space tidbits

After fixing the classloader leak, you of course want to test to see if the memory leak has disappeared. You could again trigger a memory dump and run jhat. What you also could try is to see if the amount of used permgen space memory goes up continuously after each deployment/undeployment of your application.

You can monitor permgen space usage using jconsole. You can see the memory usage go up when you repeatedly deploy and undeploy an application. However, this may not be a classloader / memory leak. As it turns out, it's difficult to predict when the garbage collector cleans up permgen space. Pressing the Run GC button in jconsole does not do the trick. Only when you encounter a java.lang.OutOfMemoryError: PermGen space exception can you be sure that there really was no memory left. This is a bit more involved than it should be!
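The per-pool numbers that jconsole charts are also available in-process through the standard MemoryPoolMXBean interface, so they can be printed after each deploy/undeploy cycle. A minimal sketch, subject to the same caveat about garbage collection timing; the class name is made up, and matching pools by name is an assumption (HotSpot names the pool "... Perm Gen" on the JDKs discussed here and "Metaspace" from Java 8 on, when PermGen was removed):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

class ClassMetadataUsage {

    /** Used bytes per memory pool, in the order the JVM reports the pools. */
    static Map<String, Long> usedBytesByPool() {
        Map<String, Long> used = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                used.put(pool.getName(), usage.getUsed());
            }
        }
        return used;
    }

    /** Name-based guess (an assumption) at which pools hold class metadata. */
    static boolean holdsClassMetadata(String poolName) {
        return poolName.contains("Perm Gen") || poolName.contains("Metaspace");
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Long> e : usedBytesByPool().entrySet()) {
            if (holdsClassMetadata(e.getKey())) {
                System.out.println(e.getKey() + ": " + e.getValue() + " bytes used");
            }
        }
    }
}
```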

How can we force the garbage collector to kick in? We can force a java.lang.OutOfMemoryError: PermGen space and then release the memory, after which we force the garbage collector to kick in. I wrote the following servlet to do that:

package com.stc.test;

import java.io.*;
import java.util.ArrayList;
import javax.servlet.*;
import javax.servlet.http.*;

public class RunGC extends HttpServlet {

    // Defines the class directly from its bytes instead of delegating to the parent
    private static class XClassloader extends ClassLoader {
        private byte[] data;
        private int len;

        public XClassloader(byte[] data, int len) {
            super(RunGC.class.getClassLoader());
            this.data = data;
            this.len = len;
        }

        public Class findClass(String name) {
            return defineClass(name, data, 0, len);
        }
    }

    protected void processRequest(HttpServletRequest request, HttpServletResponse response)
    throws ServletException, IOException {
        response.setContentType("text/html;charset=UTF-8");
        PrintWriter out = response.getWriter();
        out.println("<html><body><pre>");

        try {
            // Load class data
            byte[] buf = new byte[1000000];
            InputStream inp = this.getClass().getClassLoader()
                    .getResourceAsStream("com/stc/test/BigFatClass.class");
            int n = inp.read(buf);
            inp.close();
            out.println(n + " bytes read of class data");

            // Exhaust permgen
            ArrayList keep = new ArrayList();
            int nLoadedAtError = 0;
            try {
                for (int i = 0; i < Integer.MAX_VALUE; i++) {
                    XClassloader loader = new XClassloader(buf, n);
                    Class c = loader.findClass("com.stc.test.BigFatClass");
                    keep.add(c);
                }
            } catch (Error e) {
                nLoadedAtError = keep.size();
            }

            // Release memory
            keep = null;
            out.println("Error at " + nLoadedAtError);

            // Load one more; this should trigger GC
            XClassloader loader = new XClassloader(buf, n);
            Class c = loader.findClass("com.stc.test.BigFatClass");
            out.println("Loaded one more");
        } catch (Exception e) {
            e.printStackTrace(out);
        }

        out.println("</pre></body></html>");
        out.close();
    }

    // HttpServlet methods (doGet, doPost) that call processRequest are omitted
}

In this servlet a custom classloader is instantiated which loads a class in that classloader. That class is really present in the web classloader, but the custom classloader is tricked by not delegating to the parent classloader; instead the classloader is instantiating the class using the bytes of the class obtained through getResourceAsStream().

In the servlet it tries to allocate as many of these custom classes as possible, i.e. until the memory exception occurs. Next, the memory is made eligible for garbage collection, and one more classloader is allocated thereby forcing garbage collection.

The number of custom classes that can be loaded until a memory exception occurs is a good measure of how much permgen space memory is available. As it turns out, this metric is much more reliable than the one that you get from jconsole.

And more

Edward Chou is thinking of some other ideas to further automate the process of determining exactly where the cause of a classloader leak is. E.g. it should be possible to identify the erroneous reference (the red line in the picture) automatically, since this reference is from one classloader to another. Check his blog in the coming days.

Update (April 2007): You can find an interesting usage of jhat's Object Query Language on Sundarajan's blog to compute histograms of reference chains.