Sunday, October 15, 2006

Classloader leaks: the dreaded "java.lang.OutOfMemoryError: PermGen space" exception

Did you ever encounter a java.lang.OutOfMemoryError: PermGen space error when you redeployed your application to an application server? Did you curse the application server while restarting it so that you could continue with your work, thinking that this is clearly a bug in the application server? Those application server developers should get their act together, shouldn't they? Well, perhaps. But perhaps it's really your fault!

Take a look at the following example of an innocent looking servlet.

package com.stc.test;

import java.io.*;
import java.util.logging.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class MyServlet extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // Log at a custom level
        Level customLevel = new Level("OOPS", 555) {};
        Logger.getLogger("test").log(customLevel, "doGet() called");
    }
}

Try to redeploy this little sample a number of times. I bet this will eventually fail with the dreaded java.lang.OutOfMemoryError: PermGen space error. If you'd like to understand what's happening, read on.

The problem in a nutshell

Application servers such as Glassfish allow you to write an application (.ear, .war, etc) and deploy this application with other applications on this application server. Should you feel the need to make a change to your application, you can simply make the change in your source code, compile the source, and redeploy the application without affecting the other still running applications in the application server: you don't need to restart the application server. This mechanism works fine on Glassfish and other application servers (e.g. Java CAPS Integration Server).

The way that this works is that each application is loaded using its own classloader. Simply put, a classloader is a special class that loads .class files from jar files. When you undeploy the application, the classloader is discarded, and it and all the classes that it loaded should be garbage collected sooner or later.

However, something may somehow hold on to the classloader and prevent it from being garbage collected. And that's what's causing the java.lang.OutOfMemoryError: PermGen space exception.

PermGen space

What is PermGen space anyway? The memory in the Virtual Machine is divided into a number of regions. One of these regions is PermGen. It's an area of memory that is used to (among other things) load class files. The maximum size of this memory region is fixed when the VM starts, i.e. it does not grow beyond that limit while the VM is running. You can specify the size of this region with a command-line switch: -XX:MaxPermSize. The default is 64 MB on the Sun VMs.

If there's a problem with garbage collecting classes and if you keep loading new classes, the VM will run out of space in that memory region, even if there's plenty of memory available on the heap. Setting the -Xmx parameter will not help: this parameter only specifies the size of the total heap and does not affect the size of the PermGen region.
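To look at this region from inside a running VM, a minimal sketch along the following lines lists the non-heap memory pools through the standard java.lang.management API. The class name NonHeapPools is made up for this illustration, and the pool names depend on the VM: on the Sun VMs discussed here one of them is the permanent generation, while other VMs or later releases may report differently named pools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original article.
class NonHeapPools {

    /** Returns one line per non-heap pool: name, used bytes, max bytes (-1 if undefined). */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue; // skip the ordinary heap pools that -Xmx controls
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            lines.add(pool.getName() + ": used=" + usage.getUsed() + " max=" + usage.getMax());
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

Running this before and after a few redeployments would show whether the class-related pool keeps growing while the heap pools stay flat.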

Garbage collecting and classloaders

When you write something silly like

    private void x1() {
        for (;;) {
            List c = new ArrayList();
        }
    }

you're continuously allocating objects; yet the program doesn't run out of memory: the objects that you create are garbage collected thereby freeing up space so that you can allocate another object. An object can only be garbage collected if the object is "unreachable". What this means is that there is no way to access the object from anywhere in the program. If nobody can access the object, there's no point in keeping the object, so it gets garbage collected. Let's take a look at the memory picture of the servlet example. First, let's even further simplify this example:

package com.stc.test;

import java.io.*;
import java.net.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class Servlet1 extends HttpServlet {
    private static final String STATICNAME = "Simple";

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
    }
}

After loading the above servlet, the following objects are in memory (of course limited to the relevant ones):

In this picture you see the objects loaded by the application classloader in yellow, and the rest in green. You see a simplified container object that holds references to the application classloader that was created just for this application, and to the servlet instance so that the container can invoke the doGet() method on it when a web request comes in. Note that the STATICNAME object is owned by the class object. Other important things to notice:

  1. Like each object, the Servlet1 instance holds a reference to its class (Servlet1.class).
  2. Each class object (e.g. Servlet1.class) holds a reference to the classloader that loaded it.
  3. Each classloader holds references to all the classes that it loaded.
The important consequence of this is that whenever an object outside of AppClassloader holds a reference to an object loaded by AppClassloader, none of the classes can be garbage collected.
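The first two links in the list above can be followed with plain reflection calls. The sketch below is only an illustration (the class name LoaderChain is made up): it walks from a class to the classloader that loaded it and on to that loader's parents. A null classloader is how the JDK reports the bootstrap loader.

```java
// Hypothetical helper class, not part of the original article.
class LoaderChain {

    /** Describes the class, the classloader that loaded it, and that loader's parents. */
    static String describe(Class<?> type) {
        StringBuilder sb = new StringBuilder(type.getName());
        ClassLoader loader = type.getClassLoader(); // link 2: class -> classloader
        while (loader != null) {
            sb.append(" <- ").append(loader.getClass().getName());
            loader = loader.getParent();
        }
        sb.append(" <- bootstrap"); // getClassLoader()/getParent() return null for the bootstrap loader
        return sb.toString();
    }

    public static void main(String[] args) {
        Object instance = new LoaderChain();
        // link 1: instance -> class
        System.out.println(describe(instance.getClass()));
        System.out.println(describe(String.class));
    }
}
```

Inside an application server, the chain printed for a servlet class would start with the classloader that the container created for that application.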

To illustrate this, let's see what happens when the application gets undeployed: the Container object nullifies its references to the Servlet1 instance and to the AppClassloader object.

As you can see, none of the objects are reachable, so they all can be garbage collected. Now let's see what happens when we use the original example where we use the Level class:

package com.stc.test;

import java.io.*;
import java.net.*;
import java.util.logging.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class LeakServlet extends HttpServlet {
    private static final String STATICNAME = "This leaks!";
    private static final Level CUSTOMLEVEL = new Level("test", 550) {}; // anon class!

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        Logger.getLogger("test").log(CUSTOMLEVEL, "doGet called");
    }
}

Note that the CUSTOMLEVEL's class is an anonymous class. That is necessary because the constructor of Level is protected. Let's take a look at the memory picture of this scenario:


In this picture you see something you may not have expected: the Level class holds a static reference to every Level object that was ever created. Here's the constructor of the Level class in the JDK:

    protected Level(String name, int value) {
        this.name = name;
        this.value = value;
        synchronized (Level.class) {
            known.add(this);
        }
    }

Here known is a static ArrayList in the Level class. Now what happens if the application is undeployed?


Only the LeakServlet object can be garbage collected. Because of the reference to the CUSTOMLEVEL object from outside of AppClassloader, the class object of CUSTOMLEVEL's anonymous class (LeakServlet$1.class) cannot be garbage collected, and through that neither can the AppClassloader, and hence none of the classes that the AppClassloader loaded can be garbage collected.

Conclusion: any reference from outside the application to an object in the application of which the class is loaded by the application's classloader will cause a classloader leak.
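The reachability argument can be tried out without an application server. The following sketch is a simulation under stated assumptions, not the JDK's Level code: the static list KNOWN stands in for the static list in Level, an anonymous subclass of Object stands in for the custom level, and all names are made up. A WeakReference is used only to observe whether the object has been collected.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation, not part of the original article.
class RegistryLeakDemo {

    /** Stands in for a static list owned by a long-lived "server side" class. */
    static final List<Object> KNOWN = new ArrayList<>();

    /** Registers a fresh object in the static list and hands back only a weak reference. */
    static WeakReference<Object> registerAppObject() {
        Object appObject = new Object() {}; // anonymous class, like the custom Level
        KNOWN.add(appObject);
        return new WeakReference<>(appObject);
    }

    /** Requests a GC a few times and reports whether the referent was cleared. */
    static boolean collected(WeakReference<?> ref) throws InterruptedException {
        for (int i = 0; i < 20 && ref.get() != null; i++) {
            System.gc(); // only a request; the VM is free to ignore it
            Thread.sleep(50);
        }
        return ref.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        WeakReference<Object> ref = registerAppObject();
        // Still referenced from the static list, so it cannot be collected.
        System.out.println("collected while registered: " + collected(ref));
        KNOWN.clear();
        // Now unreachable; most VMs will collect it, but this is not guaranteed.
        System.out.println("collected after clearing:   " + collected(ref));
    }
}
```

As long as the static list holds the object, the object keeps its class alive, and through the class the classloader; that is the same chain as in the picture above. The second line of output depends on the garbage collector actually running, which System.gc() does not guarantee.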

More sneaky problems

I don't blame you if you didn't see the problem with the Level class: it's sneaky. Last year we had some undeployment problems in our application server. My team, in particular Edward Chou, spent some time tracking them all down. Besides the problem with Level, here are some other problems Edward and I encountered. For instance, if you happen to use some of the Apache Commons BeanHelper's code: there's a static cache in that code that refers to Method objects. The Method object holds a reference to the class the Method points to. That is not a problem if the Apache Commons code is loaded in your application's classloader. However, you do have a problem if this code is also present in the classpath of the application server, because those classes take precedence. As a result you now have references to classes in your application from the application server's classloader... a classloader leak!

I have not yet mentioned the simplest recipe for disaster: a thread that is started by the application and does not exit after the application is undeployed.
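A running thread is always reachable, and so are its Runnable and its context classloader, so such a thread pins the application's classloader until it exits. One way to avoid this, sketched below with made-up names, is to give every background worker an explicit stop method and to call it when the application is undeployed; in a web application that call would typically be made from a ServletContextListener's contextDestroyed() method.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical example class, not part of the original article.
class BackgroundWork {

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Starts a worker that runs until its thread is interrupted. */
    void start() {
        executor.execute(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(100); // placeholder for the real periodic work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag so the loop ends
                }
            }
        });
    }

    /** Interrupts the worker and waits for it to exit; returns true if it did. */
    boolean stop() throws InterruptedException {
        executor.shutdownNow();
        return executor.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        BackgroundWork work = new BackgroundWork();
        work.start();
        System.out.println("worker stopped: " + work.stop());
    }
}
```

The five-second timeout is an arbitrary choice for the sketch; the important part is that the worker reacts to interruption so that no thread outlives the application.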

Detection and solution

Classloader leaks are difficult. Detecting whether there's such a leak without having to deploy/undeploy a large number of times is difficult. Finding the source of a classloader leak is even trickier. This is because none of the profilers that we tried at the time followed links through classloaders. Therefore we resorted to writing some custom code to find the leaks from memory dump files. Since that exercise, new tools have come to market in JDK 6. The next blog will outline the easiest approach today for tracking down a classloader leak.

61 comments:

edwardchou said...

One point about trying to detect a classloader leak is that it is very difficult to know what exactly is going on inside the AppServer. If you are familiar with the source code of the AppServer, that will help a bit, since you will have a general idea of where to look, because these leaks are likely to be in the most unexpected places. I think that is another reason these kinds of problems are difficult for application developers to debug.

Ramon said...

This behaviour of the JDK seems buggy to me. I cannot find any reason for the reference shown with the red arrow in the last figure. I understand that a child class must reference the parent. But the parent class referencing an anonymous child? If the child class disappears, the parent class can continue executing perfectly. Suppose that this reference did not exist. That would allow freeing an anonymous child class if there are no instances of it. Isn't this the correct behaviour?

Tim said...

@Ramon:
It's not a JDK issue, it's actually an implementation detail of the Level class.
Level.class is maintaining a hard, static reference to all instances of Level. Given that Level.class is distributed by the JDK, it will likely have been loaded by the boot classloader.
Thus, no instances of Level can be garbage collected meaning their anonymous inner class definitions and the classloaders that point to them cannot be garbage collected.

Frank Kieviet said...

Hi Aji,


Last week we had a customer report of a permgen space problem; it turned out to be a problem with an OS that was missing some patches. Although that OS was not HPUX, it may nevertheless be worthwhile to check that your OS has the latest patches installed.


If the problem is easily reproducible and you escalate it to customer support, our engineering department will have a fix in no time for you.







Since HPUX probably doesn't support JDK 6 yet, one way of going about diagnosing this problem would be to run the Integration Server with HPROF enabled, trigger a memory dump (kill -3) and inspect the dump file with the modified jhat from JDK 6.


Frank







Ramon said...

@Tim: Thanks, you are right. However, this seems to me a bug in java.util.logging.Level. It should hold weak references.

al0 said...

I guess that it should be filed as a bug (or improvement request) against the Level implementation. It seems that changing the known list from storing the Level objects themselves to storing weak (or soft) references to them should alleviate the problem.

downeyt said...

Excellent! Since reading this article the other day, I have found and resolved two memory leaks that have been troubling me for some time.

I resolved one issue by modifying the source code of a package. Once I knew what the memory leak was, I was able to find information about it on the web.

Similarly, for the second issue, I obtained several clues from other online discussions and resolved the leak by moving the offending class to a different classloader.

I never had a way to determine what the leaks were, but your fantastic article has opened the door. Thanks.

Gloria Rocío Giménez Gamarra said...

Hey! So this was the problem we had here! My boss changed the default space for the PermGen to 26 MB, 'cause none of us knew what the default size was ('cause in the app server it wasn't set), till now! Keep on writing these blogs... A little late, but now we know how to deal with this in other installations we make.
Bye!

Jesse Glick said...

By the way, I have filed a Java bug about this topic: #6543126

Sony Mathew said...

Great info Frank. All Framework and AppServer coders should take note.
A bad implementation by Level class. You are absolutely right - they should not have forced sub-classing to create new Levels.
Even changing Level implementation to use WeakReferences now is going to break existing code.

Tim Downey said...

Since every class has a reference to the classloader that created it, does this mean that every Enum declared in a servlet will cause a memory leak?

It seems that the Enum class is created as a static class and has a static pointer to the classloader.

I have just done a memory test and there is a static reference to the classloader for every value of an Enum. Among these are the values for the Enum for the log4j logger levels.

Frank Kieviet said...

Re Tim Downey:


Hi Tim,


I'm not familiar with the latest versions of Log4J: the one that I used doesn't have Java 5 style enums. In any case, if an enum class is created in classloader X, and a reference to an instance of this enum class is stored in another classloader, the classloader X can indeed not be garbage collected. If this is the case, it looks very much like the problem with the java.util.logging.Level problem described above.




Frank




Tim Downey said...

Hi Frank,
I have looked into the log4j source and the reason for the leaks is the Level class, not enumerations. However, there is a problem with enumerations, too. Even if I create an Enum and only use it in the same class, I get a leak. I am guessing that the implementation of the Enum class is doing something similar to what the Level class is doing.

Roger said...

Thank you! I appreciate your taking the time to write it up including great illustrations and also Edward Chou's and others work and discussion. I'll be sure to keep an eye out for what you write about next including the new jdk6 tools.

Frank Kieviet said...

I received the following additional information from Tim Downey:




I was creating an example to demonstrate the leak, when I tried one more test. After reloading the web app, I then ran your servlet that forces a GC collection in the perm gen. After that, the 'leaks' went away.



When a Java 1.5 enum is used, some static references are added to the class that point to the values of the enum. These objects are not collected on a reload, but they are collected when the perm gen is collected.



I guess that this means that such static references will cause more collections in the perm gen, but should not cause an out of memory error.



Tim




Gary said...

Thanks for this. I found this after attending your BOF.
On the other hand, JVM could do better job to garbage collect CUSTOMLEVEL class and only leave out Level.INFO, Level.SEVERE, etc.. JVM knew they were shared/used by other classes, otherwise, it would not establish that red link(in above diagram) in the first place.

Frank Kieviet said...

Hi Gary,


It's difficult for the JVM to do a better job: the Level.class object can definitely not be GC-ed and hence the static datamembers also cannot be GC-ed.


However, the implementation of the Level class is, eh, shall we say, not the best of all possible designs. For instance, the Level class could do without the list of all instances: it's only used for serialization. Also, why would you need to sub-class the Level class? A factory method would be a better approach.


Frank


dave said...

Hi Frank



working on a tapestry application, with log4j, and a heap of other libraries. After running through and searching for leaks as suggested, i see hundreds (if not thousands) of references to static properties, and enums. In terms of fixing our own code, is it just a case of changing all the static properties to regular properties, and all the static methods to regular methods (as well as adding methods to return former static properties), and getting rid of enums? I'm still learning the trade, so forgive my ignorance, my level of understanding is not that great.



thanks


dave

Frank Kieviet said...

Hi Dave,


I don't think the problem is in the use of static variables that you see as memory leaks. It's all about identifying the links from the server code to classes in your application. Did you try to use jhat? If you like I can send you the latest version (not available in the JDK yet) that has some more advanced filtering (courtesy of Edward Chou).


Frank

dave said...

hmmm... perhaps i didn't select the right links, or maybe i was just interpreting the results incorrectly. i'll read through again, more thoroughly.


yes, i downloaded jdk 6 and played with jmap and jhat, if you could send me the updated version that'd be great.


dtra82 at gmail dot com



thanks

Frank Kieviet said...

You can download the updated jhat.jar from Edward Chou's blog.

David Mackenzie said...

Adding a servlet listener to release log4j references on destroy context may help:

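// Assumes LogFactory is org.apache.commons.logging.LogFactory (Apache Commons Logging).
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import org.apache.commons.logging.LogFactory;

public class ApplicationLifecycleListener implements ServletContextListener {
    // Release the logging instances cached for this web application's classloader.
    public void contextDestroyed(final ServletContextEvent sce) {
        LogFactory.release(Thread.currentThread().getContextClassLoader());
    }

    public void contextInitialized(final ServletContextEvent sce) {
    }
}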

karl said...

+1! garbage collection is my number one feature of java, but...

Shripad Agashe said...

In the conclusion the article mentions that the leak is because of reference from outside. From the example given it looks like a case of cyclical reference to me. If Level class not loaded before creating the servlet, then it has to be loaded by the AppClassLoader. Since the class of Level contains reference to CustomLevel, custom level can not be GCed and hence the AppClassLoader can not be GCed and hence any classes loaded by AppClassLoader can not be GCed.

Frank Kieviet said...

Re Shripad:

The "outside" is the classloader that loads the Level class. The AppClassloader doesn't have to load the Level.class: it will delegate such a load request to the parent classloader. On the other hand, the CustomLevel class is in fact loaded by the AppClassloader, so now there's a reference from the classloader that loaded Level to the classloader that loaded CustomLevel.

Frank

Shripad Agashe said...

Hi Frank,

One interesting thought came to me is regarding use of static objects. Lets take a case where a class C1 has object O1 as a static member variable. Now if the O1 is updated after loading, should the class C1 be allowed to be GCed? Since if the class C1 is loaded and unloaded again, it may lose the state of O1. Not sure how this use case would work.

Tony Tung said...

Can someone answer this question about the bad Level implementation: does it mean that whenever I used the stcLogger that comes with JCAPS JCD, I have the issue of class loader leak? or is the stcLogger not using the Level class that comes with JDK?

Hari said...

Hai Frank

Excellent article. Currently we have integrated OSGI framework to our appserver. We never considered the points mentioned in this blog. Thinking of reviewing the code of service bundles we are providing and do some testing by doing multiple deploy-undeploy. Can you give some suggestions ?

Frik Strecker said...

Java enums are definitely causing memory leaks when reloading web applications under tomcat.

We are using Java 1.6 and tested with the Sun JVM and JRockit. As you mentioned, the enums keep pointers to the classloader and the classloader does not unload. This causes memory leaks.

With the Sun JVM, you get a perm space out of memory error after just a few reloads, but not with JRockit (since it does not use the perm space).

Does anyone know how to resolve this memory leak caused by Java enums other than not using it?

Thanks,

Frik

Frank Kieviet said...

Re Frik:

I haven't seen this problem myself, perhaps because I haven't been using enums much.

If it's indeed a problem, it's a critical bug that should be fixed asap. Perhaps you can file a ticket for this problem?

Frank

Frik Strecker said...

Based on the comments by Tim Downey on this page, the enums keep a pointer to the classloader which results in a leak on reload.

I am not sure exactly how to prove or disprove what he found in his tests, but I found that projects with enums do not get unloaded properly. So there seems to be merit in what he is saying, hence my previous post. However, Java reflection does not show this pointer, and I do not know what the bytecode will show. I will try a decompiler today and see what it shows.

I will be hooking up JProfiler today to see if I can get to the bottom of this, but any ideas are welcome.

As you said, if this is the case, then this is a serious bug that warrants attention.

garys said...

We also notice this with inner classes. Even if you manually de-reference the instance, the containing instance seems to retain a reference.

Frank Kieviet said...

Re Frik:

If there are any links between a class and enums, it should show up in memory dumps that can be obtained with jmap and analyzed with jhat. I think that would be the easiest way to look into the issue.

Frank

Frank Kieviet said...

Re Garys:

I don't believe there are any classloader leaks because of the use of inner classes: these are used all over the place, including in applications of which the classes are in fact GC-ed correctly after undeployment.

Frank

Rama said...

Hi Everyone,

There has been a problem in my application server when trying to deploy an application with multiple web contexts; the total ear file size is slightly more than a gigabyte. The JVM throws the PermGen space error when trying to unzip the file during deployment.

Application Server being used JBoss 4.2.

Thank you.

Rama

Frank Kieviet said...

Re Rama:

With an EAR of about a Gb, the problem that you're running into may not be a classloader leak, but simply because of the fact that it will take a lot of memory to load all these classes into memory. Try to increase your permspace memory.

Frank

Renaud Bruyeron said...

Hi Frank,

this is hugely informative, thank you so much for the post.

I do have a question though: I am facing the dread PermGen OOM, but the tomcat instances are configured to not allow redeployment of the webapps.

This setting was chosen to address this very specific problem (tomcat+struts suffers from it because of the "static reference to the classloader in a Servlet" pattern).

Now if you take redeployment out of the equation, what else can explain the PermGen OOM? My guess is that it could be the number of jars/classes in my webapps, or String.intern() (through XML parsing for example).

What I am looking for is a way to either 1) inspect the permgen 2) calculate my permgen requirement based on the number of classes / Strings / classloaders

Using YourKit profiler does not help much because the profiler does not tell me what is in what generation.

Any ideas?

- Renaud

Frank Kieviet said...

Re Renaud,

You can use jconsole to inspect the various memory pools, including permgen space.

To see how much memory you need, you could increase the permgen memory setting, start Tomcat, measure the permgen usage, deploy your application, and measure again.

Frank

Renaud Bruyeron said...

Hi Frank,

thank you for your reply.

The thing is, I am beyond the basics already: the webapp seems to be leaking permgen *after* deploy - i.e. permgen grows in spikes after a few days.

See http://www.deuxc.org/issue-permgen-cms.png

This shows a graph of MemoryMXBean.getNonHeapMemoryUsage().getUsed()

over time. Notice the spikes between the 6th and 7th of June, the 9th and 10th and the final one (which caused the famous PermGen OOM) at the end.

When I correlate this to the actual activity of the application, the spikes happen on publication jobs (this is a content management system). This involves a lot of XSL processing and hibernate activity.

What I am really looking for is something to *inspect* what is inside the nonheap memory pools so that I can get a sense of 1) why it's growing like it is (i.e. is it a leak?) 2) what the appropriate setting should be to avoid OOM if it is not a leak

What do you think?

Jol said...

Thank you, this is very helpful. I am having a permgen problem with eclipse and needed to understand permgen. I added -XX:MaxPermSize=128m

to eclipse.ini which I hope will help in addition to Xms Xmx

Igor said...

Thanks for the post, very helpful! I'm using Java 6 + Tomcat 5.5 + Struts 1.2.x + Hibernate and note that the PermGen memory increase each time I deploy/undeploy an application. I tried to use another JVM (BEA JRockit ) but the problem is the same, this JVM has a memory area called "Class Memory" with a behaviour very similar. Maybe the problem is the Struts' classloader.

Harry said...

And... Still relevant three years later. Thanks.

Souvik Das said...

Hi Frank,

It was really a nice article for understanding the concept of classloaders and PermGen space. Thanks a lot.

I have the following questions:

if such kind of application is deployed in server and the same error comes

what is the quick solution to make the server up?

if we increase the max perm size in the memory arguments, is that going to help?

Thx

Souvik

Frank Kieviet said...

Re Souvik:

If there's a true memory leak, increasing the permgen setting of the VM will only postpone the system running out of memory. Depending on how the memory leaks, e.g. only when the application is redeployed, increasing the memory settings may be a useful workaround, similarly to restarting the server when redeploying the leaking application.

HTH,

Frank

Marian said...

If I understand correctly, the problem resides in the java.util.logging package and the fact that it is used by the JVM that also starts the whole server; what's hapening if one puts rt.jar in WEB-INF/lib? Wouldn't in this case the java.util.logging classes be garbage-collected when the server stops?

Frank Kieviet said...

Re Marian:

First of all, the java.util.logging.Level issue is just an example. Indeed the problem in this example is that the JVM (root classloader = parent classloader) is holding a reference to the class that is loaded in the application classloader.

Moving rt.jar into the web application won't work, because even if you set up the delegation model to self-first, no classloader is allowed to try to self-first delegate classes that start with java.* or javax.*.

For other examples, where the classes are not java.* or javax.*, moving the jar to the application classloader would indeed likely solve the problem. Instead of moving, one could also copy the jar into a web application and turn on the self-first delegation model.

HTH,

Frank

Alexis said...

Thank you for your help. I'm a maven fan and always trying to automate deploys in continuous integration. I don't understand why that OOM is always a fatality on medium/big sized projects.

Regards,

Paul Nolan said...

Hi Frank,

Great article, thank you. I was wondering if you had any plans to write a similar article about the code cache in JDK 6? In my situation we have a home-grown application server. My code that runs in this server uses Janino to generate code at runtime. All the permGen issues have been resolved, i.e. permGen does not increase anymore over time and is correctly garbage collected. However, the code cache increases continually until we get an OutOfMemoryError (swap space). I am finding this more difficult to solve than the permGen issues.

best.

Frank Kieviet said...

Re Paul:

Hi Paul, I don't have much expertise in native compilation, code caching, or the VM for that matter, but it seems to me that if permgen is constant and if the code cache keeps growing, you've hit a bug in the JVM. Did you try to bring this up with the JDK team?

Frank

lava kafle said...

Perfect!! I never knew PermGen is growing exponentially across each deployment. I will never use that setting ever again in my Jrockit as I never use Sun's JDK to avoid memory problems, but unsuccessful till now. I even compile in Jrockit to avoid memory problems, but always got OOM and PermGenSpace

Chris said...

My problem was fixed by putting referenced libraries in the web server's shared library folder rather than the deployed application's library folder.

stamp duty calculator said...

I get this problem quite a bit, cheers for the heads up. Off to your next post to see how I can fix it.

blueocean said...

Hi everybody,

I get the same pb. But it seems not cause by redeployed the applications.

have we anothers reasons ?

thanks

Nesma Rashad said...

I am facing an out of memory PermGen exception. I tried to increase my PermGen and to log class loading and unloading. Unfortunately the server went down after two days, and when I traced the log I found that each class was loaded only once and that no garbage collection took place until the PermGen reached its maximum; then the garbage collector kept on running but was not able to unload any class. I am wondering what made my PermGen increase while each class is loaded only once. I forgot to say that I am using Tomcat and Sun JVM 6.

Mike Sweeney said...

Is there a way for the app to monitor perm gen space usage?

e.g. our app writes a WARN to Tomcat log if Runtime.getRuntime().freeMemory() falls below 10 mb, but that doesn't help w/ the perm gen problem.

guest said...

why? it said i neede to anser a simple math question

Torsten Mielke said...

Your link to the follow up article on how to fix this problem is broken.
Although I found the article on your blog, perhaps you could correct it.

Thx.

Plumbr said...

Talking about new tools - there is one that makes the discovery of classloader leaks easy. In most cases Plumbr memory leak detector finds the leak already after the first redeploy, and tells you exactly what is leaking.

With Plumbr leak report you can solve the leak within minutes.

Anonymous said...

So what's the right way to create a new Level? Should we simply avoid doing that? Can we be safe using just the predefined levels?