Frank Kieviet's Engineering Notebook

Sunday, December 21, 2008

The size of Java objects

Here's a blog entry / research note / note-to-self that I wrote months ago, but never found the time to publish.

At JavaOne there was an interesting BOF about Efficient XML. It made me wonder how efficient the use of DOM in Java is. To find out, I wrote a small program that measures how many bytes of RAM a Java object uses. The program counts how many objects it can allocate before it runs out of memory. Here's the class:

    public static class H {
        public H next;
        byte[] b = new byte[8];
    }

By changing the size of the byte array and running the program again, the size of an H instance can be calculated, as well as the total memory space available for object allocation in the test program. This number can be used in subsequent runs. Running the program on this trivial class on a Windows machine yielded a size of 16 bytes for each allocation.

    public static class H {
        public H next;
    }
16 bytes
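The arithmetic behind the two-run approach can be sketched as follows. This is a reconstruction, not the original test program: the class name `AllocationProbe`, the method names, and the assumption that both runs see the same heap budget are illustrative. If n1 allocations fit with the original array and n2 fit after the array grows by d bytes (a multiple of 8, so alignment does not distort the result), then n1 * x = n2 * (x + d), which gives x = n2 * d / (n1 - n2).

```java
/** Hypothetical reconstruction of the measurement; not the original program. */
final class AllocationProbe {

    /** Same shape as the H class above, with a configurable payload size. */
    static final class H {
        H next;
        final byte[] b;

        H(int payloadBytes) {
            b = new byte[payloadBytes];
        }
    }

    /**
     * Allocates H instances in a linked list until the heap is exhausted
     * and returns how many fit. Intended to be run with a small, fixed
     * heap (for example -Xmx64m).
     */
    static long countUntilOutOfMemory(int payloadBytes) {
        H head = null;
        long count = 0;
        try {
            while (true) {
                H h = new H(payloadBytes);
                h.next = head;   // keep every instance reachable
                head = h;
                count++;
            }
        } catch (OutOfMemoryError e) {
            head = null;         // release the list so the JVM can continue
            return count;
        }
    }

    /**
     * Bytes per allocation in the first run, given that n1 allocations fit
     * originally and n2 fit after each allocation grew by extraBytes.
     * Derived from n1 * x == n2 * (x + extraBytes).
     */
    static double bytesPerAllocation(long n1, long n2, int extraBytes) {
        return (double) n2 * extraBytes / (n1 - n2);
    }

    /** Total space available for allocation: n1 allocations of x bytes each. */
    static double availableBytes(long n1, long n2, int extraBytes) {
        return n1 * bytesPerAllocation(n1, n2, extraBytes);
    }
}
```

Once the available space is known from two such runs, a single run with a different class is enough: divide the available bytes by the number of instances that fit.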

The VM apparently uses 8 byte alignment, because when another reference is added to the class above, the size does not increase, but it does increase in steps of 8 bytes when more members are added:

    public static class H {
        public H next;
        public H next1;
    }
16 bytes

    public static class H {
        public H next;
        public H next1;
        public H next2;
    }
24 bytes

    public static class H {
        public H next;
        public H next1;
        public H next2;
        public H next3;
    }
24 bytes

    public static class H {
        public H next;
        public H next1;
        public H next2;
        public H next3;
        public H next4;
    }
32 bytes

Hence an empty class takes 8 bytes, and each object reference takes 4 bytes. This is much better than what I intuitively expected: underneath there should be at least a pointer to a virtual method table (4 bytes), and there should be a pointer to this object from some VM list of objects for memory management. I'd expected that list to be a linked list, and I'd expected that there perhaps would be some additional flags for garbage collection, etc. Clearly the VM does it much more efficiently than I would have done.
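The measurements above fit a simple model: 8 bytes of overhead, 4 bytes per reference, rounded up to a multiple of 8. The sketch below only restates those measured numbers; the class and method names are made up for illustration, and other VMs or settings may well behave differently.

```java
/** Size model fitted to the measurements above; an assumption, not a VM specification. */
final class ObjectSizeModel {
    static final int HEADER_BYTES = 8;     // measured size of an empty object
    static final int REFERENCE_BYTES = 4;  // measured size of one reference field
    static final int ALIGNMENT = 8;        // sizes grow in steps of 8 bytes

    /** Rounds size up to the next multiple of ALIGNMENT. */
    static int alignUp(int size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Predicted size of an object that holds only referenceCount reference fields. */
    static int sizeWithReferences(int referenceCount) {
        return alignUp(HEADER_BYTES + REFERENCE_BYTES * referenceCount);
    }

    public static void main(String[] args) {
        for (int refs = 0; refs <= 5; refs++) {
            System.out.println(refs + " references: " + sizeWithReferences(refs) + " bytes");
        }
    }
}
```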

Similarly, the size of arrays can be measured. A char[] comes down to 12 bytes plus two times the number of characters, rounded up to an 8 byte boundary. The size of a String object is more interesting: it is 32 bytes plus two times the number of characters; the minimum size is 40 bytes. Hence, a string of 3 characters is 48 bytes, and an eight character string is 56 bytes.
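The char[] rule and the three measured String sizes can be reproduced with a few lines of arithmetic. Treating a String as a fixed 24-byte object plus a separately allocated char[] is an assumption chosen because it matches the measured 40, 48 and 56 bytes; it is not something the measurements prove. The names are hypothetical.

```java
/** Arithmetic that reproduces the measured char[] and String sizes. */
final class StringSizeModel {

    /** Rounds size up to the next multiple of 8. */
    static int alignUp(int size) {
        return (size + 7) / 8 * 8;
    }

    /** char[] of n characters: 12 bytes of overhead plus 2 bytes per char, 8-byte aligned. */
    static int charArraySize(int n) {
        return alignUp(12 + 2 * n);
    }

    /**
     * String of n characters, modelled as a 24-byte String object plus its
     * backing char[]. The 24-byte figure is an assumption that happens to
     * match the measured sizes.
     */
    static int stringSize(int n) {
        return 24 + charArraySize(n);
    }
}
```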

Now let's take a look at XML, a DOM structure to be exact. How much does an element with a text node take, as in this XML snippet?

<root>
  <ElementName>1000000</ElementName>
  <ElementName>1000001</ElementName>
  <ElementName>1000002</ElementName>
...

(line breaks and indentation added for clarity; "ElementName" is a string constant)

Measuring this yields 144 bytes per node. Let's add another child element:

<root>
  <ElementName>
    <SubEl>1000000</SubEl>
  </ElementName>
  <ElementName>
    <SubEl>1000001</SubEl>
  </ElementName>
  <ElementName>
    <SubEl>1000002</SubEl>
  </ElementName>
...

Measuring this yields 200 bytes per repeating element. When we add an attribute the size increases again:

<root>
  <ElementName>
    <SubEl last="false">1000000</SubEl>
  </ElementName>
  <ElementName>
    <SubEl last="false">1000001</SubEl>
  </ElementName>
  <ElementName>
    <SubEl last="false">1000002</SubEl>
  </ElementName>
...

This increases the size to 312 bytes per repeating element.
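For reference, a document of this shape can also be built directly through the standard DOM API instead of being parsed from text. The measurement code itself is not shown in this entry, so the following is only a sketch of how such a tree could be constructed; the class name `DomSample` is hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Builds the repeating structure with an attribute on the nested element. */
final class DomSample {

    /**
     * Returns a document with count repeating elements of the form
     * <ElementName><SubEl last="false">1000000</SubEl></ElementName>.
     */
    static Document build(int count) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);
        for (int i = 0; i < count; i++) {
            Element element = doc.createElement("ElementName");
            Element sub = doc.createElement("SubEl");
            sub.setAttribute("last", "false");
            sub.appendChild(doc.createTextNode(String.valueOf(1000000 + i)));
            element.appendChild(sub);
            root.appendChild(element);
        }
        return doc;
    }
}
```

Growing count until memory runs out, and dividing the available space by the number of elements that fit, gives a per-element figure in the same way as for plain objects.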

Summarizing the results:

new Object()	8 bytes
""	40 bytes
"123"	48 bytes
"12345678"	56 bytes
int data member	4 bytes
byte data member	1 byte
Object reference data member	4 bytes
char[n]	12 + 2*n
<ElementName>1234567</ElementName>	144 bytes
<ElementName> <SubEl>1234567</SubEl> </ElementName>	200 bytes
<ElementName> <SubEl last="false">1234567</SubEl> </ElementName>	312 bytes

Indeed, DOM is not very efficient with respect to memory usage. The last XML snippet was only 62 single-byte characters. The information content is actually a lot less: since ElementName, SubEl, and last are constant, they could be replaced with a reference. Without using schema information, the information content can be encoded with about 25 bytes. With JAXB it can be done with 88 bytes.

Hello world

This is my first blog entry on blogger.com. It's really a continuation of my other blog, the one that I started at http://blogs.sun.com/fkieviet. Why did I start a new blog? Blogs at blogs.sun.com are tied to Sun Microsystems, my employer. With the world in economic turmoil, Sun included, I felt it was a good idea to have a blog not tied to my employer. A few words about me: I'm a software engineer. Building software has been my hobby since the Commodore 64. I'm glad that I've been able to turn this hobby into a full time profession. I currently work as a Senior Staff Engineer at Sun Microsystems. There I work as a lead in the SOA/Business Integration group, and am a contributor to OpenESB, an open source platform for Integration and SOA.

It's been a long time since my last blog entry on blogs.sun.com. Been busy -- work has been piling up, and it's difficult to justify taking time to write a blog when people are waiting on me for work to be finished. (And that's another reason to start this blog instead of continuing on blogs.sun.com.) How do I find the time to write this, then? I'm on vacation this week!

Monday, May 19, 2008

JavaOne 2008

It's a week after JavaOne 2008 now. I finally have time to post a blog. I've been extremely busy before and during JavaOne: not only with the presentations that I gave at JavaOne, but also because the Java CAPS 6 code freeze was the week before JavaOne.

At JavaOne I gave three presentations:

For Java University (the day before JavaOne), I presented a part of Joe Boulenouar's class "How Java EE 5 and SOA Help in Architecting and Designing Robust Enterprise Applications". In my part I covered ESBs, JBI and Composite Applications.

A technical session: TS-5301 Sun Java Composite Application Platform Suite: Implementing Selected EAI Patterns. I presented this with Michael Czapski, a colleague in Sun's field organization in Australia. He's also the author of the book Java CAPS Basics: Implementing Common EAI Patterns. In this session we went over a number of EAI patterns from Hohpe and Woolf's book and showed that when you use the right Integration Middleware, you use these patterns almost without realizing it.

A Birds-of-a-feather session: BOF-6211: Transactions and Java Business Integration (JBI): More Than Java Message Service (JMS). I presented this with Murali Pottlapelli, a colleague in Monrovia. Since there was interest in the slides that we presented, and because unlike Sessions, the slides of BOFs are not made available by the JavaOne organization, you can download the slides of Transactions and JBI: More Than JMS from my blog. I also recorded the sound using my MP3 player, but the quality of the recording is pretty bad. Nevertheless, I've also uploaded the mp3 of Transactions and JBI: More Than JMS.

What's next? Now that CAPS 6 is almost out of the door, we're going to focus on the next release. Even more than in the past, we'll be doing this in open source. More to come!

Saturday, October 6, 2007

Server side Internationalization made easy

Last year I wrote a blog entry on my gripes with Internationalization in Java for server side components. Sometime in January I built a few utilities for JMSJCA that make internationalization for server side components a lot easier. To make them available to a larger audience, I added the utilities to the tools collection on http://hulp.dev.java.net. What do these utilities do?

Generating resource bundles automatically

The whole point was that when writing Java code, I would like to keep internationalizable texts close to my Java code. Rather than in resource bundles, I prefer to keep texts in my Java code so that:

While coding, you don't need to keep switching between a Java file and a resource bundle.
No more missing messages because of typos in error identifiers, and no more obsolete messages in resource bundles.
You can easily review code to make sure that error messages make sense in the context in which they appear, and you can easily check that the arguments for the error messages indeed match.

Instead of writing code like this:

        sLog.log(Level.WARNING, "e_no_match_w_pattern", new Object[] { dir, pattern, ex}, ex);

Prefer code like this:

        sLog.log(Level.WARNING, sLoc.t("E131: Could not find files with pattern {1} in directory {0}: {2}"
          , dir, pattern, ex), ex);

Here's a complete example:

public class X {
    Logger sLog = Logger.getLogger(X.class.getName());
    Localizer sLoc = Localizer.get();

    public void test() {
        sLog.log(Level.WARNING, sLoc.t("E131: Could not find files with pattern {1} in directory {0}: {2}"
          , dir, pattern, ex), ex);
    }
}

Hulp has an Ant task that goes over the generated classes and extracts these phrases and writes them to a resource bundle. E.g. the above code results in this resource bundle:

# DO NOT EDIT
# THIS FILE IS GENERATED AUTOMATICALLY FROM JAVA SOURCES/CLASSES

# net.java.hulp.i18ntask.test.TaskTest.X
TEST-E131 = Could not find files with pattern {1} in directory {0}\: {2}

To use the Ant task, add something like this to your Ant script, typically between <javac> and <jar>:

<taskdef name="i18n" classname="net.java.hulp.i18n.buildtools.I18NTask" classpath="lib/net.java.hulp.i18ntask.jar"/>
<i18n dir="${build.dir}/classes" file="src/net/java/hulp/i18ntest/msgs.properties" prefix="TEST" />

How does the Ant task know what strings should be copied into the resource bundle? It uses a regular expression for that. By default it uses the regular expression [A-Z]\d\d\d: .* which matches strings that start with a single uppercase letter followed by three digits, a colon, and a space.
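To make the matching step concrete, here is a small sketch that applies the default pattern to a string literal and produces a resource bundle line. It is not Hulp's implementation: the class and method names are hypothetical, capture groups were added to pull out the ID and the text, and the escaping that a real properties file needs is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrates how a message literal could be turned into a bundle entry. */
final class MessageIdSketch {
    // The default pattern, with groups added for the ID and the message text.
    private static final Pattern MESSAGE = Pattern.compile("([A-Z]\\d\\d\\d): (.*)");

    /** Returns "PREFIX-ID = text" for a matching literal, or null if it does not match. */
    static String toBundleLine(String prefix, String literal) {
        Matcher m = MESSAGE.matcher(literal);
        if (!m.matches()) {
            return null;   // not an internationalizable message
        }
        return prefix + "-" + m.group(1) + " = " + m.group(2);
    }
}
```

Strings that do not follow the convention, such as logger names or file paths, do not match and are skipped.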

Getting messages out of resource bundles

With the full English message in the Java code, how is the proper localized message obtained? In the code above, this is done in this statement:

sLoc.t("E131: Could not find files with pattern {1} in directory {0}: {2}", dir, pattern, ex)

The method t takes the string, extracts the message ID out of it (E131) and uses the message ID plus prefix (TEST) to look up the message in the right resource bundle, and returns the substituted text. The method t lives in class Localizer. This is a class that needs to be declared in the package where the resource bundles are placed. The class derives from net.java.hulp.i18n.LocalizationSupport. E.g.:

public static class Localizer extends net.java.hulp.i18n.LocalizationSupport {
    public Localizer() {
        super("TEST");
    }

    private static final Localizer s = new Localizer();

    public static Localizer get() {
        return s;
    }
}

The class name should be Localizer so that the Ant task can be extended later to automatically detect which packages use which resource bundles.
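The lookup performed by t can be pictured roughly as follows. This is a guess at the mechanism, not the code in LocalizationSupport: a plain Map stands in for the resource bundle, the fallback to the inline English text is an assumption, and whether the real method keeps the message ID in its result is not stated here.

```java
import java.text.MessageFormat;
import java.util.Map;

/** Hypothetical sketch of looking up a message by its embedded ID. */
final class LookupSketch {
    private final String prefix;
    private final Map<String, String> bundle;   // stands in for a ResourceBundle

    LookupSketch(String prefix, Map<String, String> bundle) {
        this.prefix = prefix;
        this.bundle = bundle;
    }

    /**
     * Splits "E131: text" into ID and text, looks up PREFIX-ID in the bundle,
     * falls back to the inline text when there is no entry, and substitutes
     * the arguments into the chosen pattern.
     */
    String t(String message, Object... args) {
        int colon = message.indexOf(": ");
        if (colon < 0) {
            return MessageFormat.format(message, args);   // no ID: format as-is
        }
        String id = message.substring(0, colon);
        String fallback = message.substring(colon + 2);
        String pattern = bundle.getOrDefault(prefix + "-" + id, fallback);
        return MessageFormat.format(pattern, args);
    }
}
```

Because the English text doubles as the fallback, a message that has not been translated yet still produces readable output.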

Using the compiler to enforce internationalized code

It would be nice if the compiler could force internationalized messages to be used. To do that, Hulp includes a wrapper around java.util.logging.Logger that only takes objects of class LocalizedString instead of just String. The class LocalizedString is a simple wrapper around String. The Localizer class produces these strings. By avoiding using java.util.logging.Logger directly, and instead using net.java.hulp.i18n.Logger the compiler will force you to use internationalized texts. Here's a full example:

public class X {
    net.java.hulp.i18n.Logger sLog = Logger.getLogger(X.class);
    Localizer sLoc = Localizer.get();

    public void test() {
        sLog.warn(sLoc.x("E131: Could not find files with pattern {1} in directory {0}: {2}"
          , dir, pattern, ex), ex);
    }
}

Logging is one area that requires internationalization; another is exceptions. Unfortunately there's no general approach to force internationalized messages in exceptions. You can only do that if you define your own exception class that takes the LocalizedString in the constructor, or define a separate exception factory that takes this string class in the factory method.
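The idea of letting the type system do the enforcement can be sketched in a few lines. None of the classes below are the Hulp classes: `LocalizedText`, `StrictLogger` and `LocalizedException` are hypothetical stand-ins that only show the shape of the approach.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Stand-in for a localized string type: a String that only a localizer should create. */
final class LocalizedText {
    private final String value;

    LocalizedText(String value) {
        this.value = value;
    }

    @Override
    public String toString() {
        return value;
    }
}

/** Logger facade without String overloads, so a raw string literal does not compile. */
final class StrictLogger {
    private final Logger delegate;

    StrictLogger(Logger delegate) {
        this.delegate = delegate;
    }

    void warn(LocalizedText message, Throwable cause) {
        delegate.log(Level.WARNING, message.toString(), cause);
    }
}

/** Exception whose only constructor requires a localized message. */
class LocalizedException extends Exception {
    LocalizedException(LocalizedText message, Throwable cause) {
        super(message.toString(), cause);
    }
}
```

A call such as warn("some text", ex) is rejected by the compiler because no method accepts a String, which is exactly the nudge towards going through the localizer.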

Download

Go to http://hulp.dev.java.net to download these utilities. The jars (the Ant task and utilities) are also hosted on the Maven repository on java.net.

Saturday, September 29, 2007

Using Nested Diagnostics Contexts in Glassfish

What is a Nested Diagnostics Context?

Let's say that we're writing a message driven bean (MDB) that we'll deploy on Glassfish. Let's say that the MDB's onMessage() method grabs the payload of the message and calls into a stateless session bean (SLSB) for processing. Let's say that the implementation of the SLSB calls into org.apache.xparser:

my.company.MDB > my.company.SLSB > org.apache.xparser

Let's say that the xparser package may log some warnings if the payload is not properly formatted. No problem so far. Now let's say that the application is put in production together with a dozen other applications and that many of these applications use this library. The administrator once in a while finds these warnings in Glassfish's server.log:

[#|2007-09-17T18:36:03.247-0400|WARN|sun-appserver9.1|org.apache.xparser.Parser
|_ThreadID=18; ThreadName=ConsumerMessageQueue:(1);
|Encoding missing, assuming UTF-8|#]

Let's say that the administrator wants to relay this information to the developer responsible for this application. Using the category name org.apache.xparser.Parser, the administrator can find out what code is responsible (a third party component in this case), but how can the administrator find out which application is responsible for this log output?

One approach is to always log the application name before calling into the SLSB, so that the administrator can find the application name using the _ThreadID: he would look at the _ThreadID of the warning, then look for a message earlier in the log that has the same _ThreadID that identifies the application. Not only is this cumbersome, it's also a big problem that the application now fills up the log with the application name just in case the SLSB would log something.

It would be nice if somehow the MDB could associate the thread with the application name, so that if code downstream logs anything, the log message will be adorned with the application name:

[#|2007-09-17T18:36:03.247-0400|WARN|sun-appserver9.1|org.apache.xparser.Parser
|_ThreadID=18; Context=Payrollsync; ThreadName=ConsumerMessageQueue:(1);
|Encoding missing, assuming UTF-8|#]

In Log4J, this is quite simple using Log4J's NDC class: before the MDB calls into the SLSB, it would call NDC.push("Payrollsync") to push the context onto the stack, and after the SLSB it would call NDC.pop(). NDC stands for Nested Diagnostic Context. It's called nested because it maintains a stack, so that the SLSB could push another context onto the stack, hiding the context of the MDB, and pop the context off the stack to restore the stack in its original state before returning. Of course each thread has to have its own stack.

The NDC is a nice facility in Log4J. Unfortunately, in java.util.logging there's no such facility. Let's build one!

Building a Nested Diagnostic Context

The Nested Diagnostics Context will have to keep a stack per thread. When the logging mechanism logs something, it needs to peek at the stack and add the top most item on the stack to the log message. The stack needs to be accessible somehow by both the application that sets the context and by the logging mechanism. A complicating factor in this is that it needs to work with both delegating-first and self-first classloaders. The latter is found in some web applications (special setting in sun-web.xml) and in some JBI components. Furthermore, we would like to use this mechanism in Glassfish and avoid having to make changes to the Glassfish codebase. Lastly, we need to avoid making changes to the application that would cause the application to be no longer portable to other application servers.
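The per-thread stack itself is the easy part. A minimal sketch, using a ThreadLocal so that each thread sees only its own contexts; the class name `ContextStack` is made up for illustration, and the classloader concerns mentioned above are not addressed here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** One independent stack of context strings per thread. */
final class ContextStack {
    private static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    /** Makes context the current (top-most) context of the calling thread. */
    static void push(String context) {
        STACK.get().push(context);
    }

    /** Removes and returns the top-most context, or null if the stack is empty. */
    static String pop() {
        return STACK.get().pollFirst();
    }

    /** Returns the top-most context without removing it, or null if none is set. */
    static String peek() {
        return STACK.get().peekFirst();
    }
}
```

The logging side only ever needs peek, while the application side needs push and pop in matching pairs.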

How do we expose an API to the application so that it can push and pop contexts onto and off the stack? We could define a new Java API, but that would mean that unless an application packages the jar with that new API, it cannot be deployed on an application server instance that doesn't have the NDC code in its classpath. Here's a solution that doesn't require a new API: reuse the existing java.util.logging.Logger API! We'll define two special logger names, one for pushing contexts onto the stack, and one for popping contexts off the stack. Since we're tapping into the log stream anyways, this is not as far a stretch as it may seem. Here's how an application uses this mechanism:

Logger.getLogger("com.sun.EnterContext").fine("Payrollsync");
slsb.process(msg.getText());
Logger.getLogger("com.sun.ExitContext").fine("Payrollsync");

The loggers com.sun.EnterContext and com.sun.ExitContext are special loggers that we'll develop; messages written to these loggers directly interact with the context stack. Through these special loggers, this example will result in adding the context to any log messages that are produced in the slsb.process(msg) call. On other application servers without these special loggers, this will result in logging the context at FINE level before and after the call to the SLSB is made, so that one can associate a log message using the _ThreadID; it will not do anything if FINE logging is turned off.

What if we want to add more than one context parameter to the log message? For instance, what if we want to add the ID of the message that we're processing?

Logger.getLogger("com.sun.EnterContext")
    .log(Level.FINE, "{0}={1}, {2}={3}", new Object[] {"Application", "Payrollsync", "Msgid", msg.getMessageID()});
slsb.process(msg.getText());
Logger.getLogger("com.sun.ExitContext")
    .log(Level.FINE, "{0}={1}, {2}={3}", new Object[] {"Application", "Payrollsync", "Msgid", msg.getMessageID()});

The special logger will take the Object[] and push these on the stack. The message string "{0}={1}, {2}={3}" is there merely for portability: if the code is deployed onto an application server on which we didn't install the NDC facilities, this will simply log the context parameters at FINE level.

Implementation

In a stand alone Java application, you would simply set your own LogManager and implement the NDC functionality there. Glassfish already comes with its own LogManager, and we don't want to override that. Rather, we want to plug in new functionality without any changes to the existing code base. Here's what we need to do:

create the special loggers com.sun.EnterContext and com.sun.ExitContext
hook up these special loggers
hook into the log stream to print out the context

To create the special loggers, we can simply create a new class that derives from java.util.logging.Logger, say EntryLogger. Next, we need to make sure that when someone calls Logger.getLogger("com.sun.EnterContext"), it will be this class that is returned. Without making any changes to the LogManager, this can be accomplished by instantiating the new EntryLogger and registering it with the LogManager immediately. This has to be done before anybody calls Logger.getLogger("com.sun.EnterContext"). In other words, we should do this before any application starts. In Glassfish there's an extensibility mechanism called LifeCycleListeners. An object that implements this interface can be loaded by Glassfish automatically upon startup.
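A stripped-down version of such a logger might look like this. It is a sketch rather than the class in the downloadable jar: the name `EntryLoggerSketch`, the embedded stack, and the choice to override isLoggable (so that FINE calls reach the logger even when the configured level is INFO) are assumptions, and the Glassfish LifeCycleListener that would call install at startup is not shown.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.logging.Level;
import java.util.logging.LogManager;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Logger that pushes each message onto a per-thread stack instead of logging it. */
final class EntryLoggerSketch extends Logger {
    // Per-thread context stack; a real implementation would share it with the handler.
    static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    EntryLoggerSketch(String name) {
        super(name, null);   // no resource bundle
    }

    /** Accept every level so that FINE calls are not filtered out early. */
    @Override
    public boolean isLoggable(Level level) {
        return true;
    }

    /** Instead of publishing the record, push its message as the new context. */
    @Override
    public void log(LogRecord record) {
        STACK.get().push(record.getMessage());
    }

    /** Registers the logger; must run before anyone asks for a logger of this name. */
    static EntryLoggerSketch install(String name) {
        EntryLoggerSketch logger = new EntryLoggerSketch(name);
        if (!LogManager.getLogManager().addLogger(logger)) {
            throw new IllegalStateException("Logger already registered: " + name);
        }
        return logger;
    }
}
```

The LogManager only keeps weak references to registered loggers, so whoever calls install should hold on to the returned instance. A matching exit logger would pop instead of push.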

Lastly, we need to find a way to add the context to the log entries in the log. Glassfish already has a mechanism to add key-value pairs to each log entry: when formatting a LogRecord for printing, Glassfish calls LogRecord.getParameters() and checks each object in the returned Object[] for objects that implement java.util.Map and java.util.Collection. For objects that implement java.util.Map, Glassfish adds the key-value pairs to the log message. For objects that implement java.util.Collection, Glassfish adds each entry as a String to the log message.

If each LogRecord can somehow be intercepted before it reaches Glassfish's Formatter, the context can be added as an extra parameter to the LogRecord's parameter list. This can be done by adding a new java.util.logging.Handler to the root-Logger before Glassfish's own Handler. For each LogRecord that this new Handler receives, it will inspect the Context stack and add a Map with the Context to the LogRecord. Next, the root-Logger will send the LogRecord to Glassfish's own Handler which takes care of printing the message into the log. Once again, the LifeCycleListener is the ideal place to register the new Handler.
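The interception step can be sketched as a Handler that does no output of its own and only decorates the record. The class name, the embedded stack, and the key "Context" (taken from the sample log line) are illustrative; placing the handler ahead of Glassfish's own handler on the root logger is not shown.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.logging.Handler;
import java.util.logging.LogRecord;

/** Handler that appends the current context to each record's parameter list. */
final class ContextHandlerSketch extends Handler {
    // Per-thread context stack; a real implementation would share it with the loggers.
    static final ThreadLocal<Deque<String>> STACK =
            ThreadLocal.withInitial(ArrayDeque::new);

    /** Appends a single-entry Map {Context=<top of stack>} to the record's parameters. */
    @Override
    public void publish(LogRecord record) {
        String context = STACK.get().peekFirst();
        if (context == null) {
            return;   // no context set on this thread
        }
        Object[] old = record.getParameters();
        Object[] params = (old == null)
                ? new Object[1]
                : Arrays.copyOf(old, old.length + 1);
        params[params.length - 1] = Collections.singletonMap("Context", context);
        record.setParameters(params);
    }

    @Override
    public void flush() {
        // nothing is buffered
    }

    @Override
    public void close() {
        // no resources to release
    }
}
```

Appending the Map at the end leaves any existing {0}, {1} placeholders in the message pointing at the same parameters as before.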

Give it a spin!

You can download the jar that has these new classes and/or download the sources. Put the jar in Glassfish's lib directory. Restart the server and install the LifeCycleListener:

LifeCycleListener Configuration

Wednesday, May 16, 2007

JavaOne 2007

All of last week I was at JavaOne. It was an exhausting but very interesting week. Like last year, there were many interesting sessions, too many to list here. Let me just mention the one I enjoyed most: Neal Gafter's session on Closures for the Java Programming Language (BOF-2358). I can't wait until they're in the Java language!

Not only did I attend sessions and BOFs, I also presented BOFs. Three of them to be precise. I recorded the audio on my MP3 player. Unfortunately the quality of the audio is pretty bad. I'm posting the audio recordings below. I'm also posting the slides. Here they are:

BOF8847: Developing Components for Java Business Integration: Binding Components and Service Engines

Presented by Frank Kieviet, Alex Fung, Sherry Weng, and Srinivasan Chikkala
Attendance: about 100

You cannot cover how to write JBI components in just 45 minutes. We were also not sure what the audience was interested in, so we assumed that it would consist mostly of people who had never written a JBI component before and were relatively new to JBI. We therefore decided to talk mostly about general information on JBI and JBI components, highlight the power of JBI, and discuss how to go about developing a component.

As an experiment I wanted to try a new format (at least new for me): rather than slicing up the session into four parts of 10 minutes, we cast the session into a "discussion forum". Of course the questions and answers (and even the jokes) were well rehearsed.

Unfortunately, the audio/visual people who control the meeting rooms had forgotten to start the session timer. As a result the audio was cut unexpectedly just a minute before we could finish up.

Nevertheless, I think it was an interesting session.

Presentation JavaOne07-BOF8847 (pdf)

Audio JavaOne07-BOF8847 (mp3)

BOF8745: Leveraging Java EE in JBI and vice versa

Presented by Frank Kieviet and Bhavanishankara Sapaliga
Attendance: about 60

This BOF was originally to be presented by Vikas Awasthi and Bhavanishankara Sapaliga, but Vikas couldn't make it, so I replaced him. We focused the session on how JBI and EE can play together, trying to make it interesting for both JBI application developers as well as for EE developers. At the end I ran a demo with NetBeans showing three different scenarios. The demo-gods were with me: the demo went very smoothly. Unfortunately I forgot to demo how to add an EJB to a composite application. Another valuable lesson learned.

Presentation JavaOne07-BOF8745 (pdf)

Audio JavaOne07-BOF8745 (mp3)

BOF9982: The java.lang.OutOfMemoryError: PermGen Space error demystified

Presented by Edward Chou and Frank Kieviet
Attendance: about 116

This session was on Thursday night at 10pm. That night was the JavaOne After dark bash. Free beers, music and snacks for everyone. Therefore we didn't expect much of an attendance: memory leaks are a rather dry subject, and why leave the party early to go to this session? Also, some of our thunder had been stolen by SAP who demo-ed a tool to track memory leaks in a morning-session earlier that week. So we were quite surprised when about 116 people turned up for our session. Most stayed until the very end, and there were also quite a few interesting questions. Apparently a lot of people struggle with memory leaks in permgen space -- in my presentation I mention that I get about a hundred hits on my blog every day from people who search for this memory exception in Google.

Presentation JavaOne07-BOF9982 (pdf)

Audio JavaOne07-BOF9982 (mp3)

Tuesday, May 1, 2007

JavaOne / memory leaks revisited...

Memory leaks in print

A few months ago, Gregg Sporar together with A. Sundararajan started an article on memory leaks in the magazine Software Test & Performance. While writing that, he stumbled upon my blog and decided to cover the "java.lang.OutOfMemoryError: PermGen space" exception too. I offered to collaborate on the article. The article eventually grew so much it was split in two. Part one was published a month ago. Yesterday, part two was published.

Memory leaks at JavaOne

Edward Chou submitted a proposal for a BOF at JavaOne 2007. He and I will be presenting a BOF on the "java.lang.OutOfMemoryError: PermGen space" exception. I'll try to record the session with my MP3 player and post it on my blog.

In preparation for our presentation, we've been looking at some real-life examples of permgen memory leaks. We took a few memory dumps that came from actual customers in actual production environments. We discovered a few more improvements we could make to jhat: it was already fairly simple to track the leaks with jhat; with these changes it becomes really simple. We were actually quite surprised how simple. More on that in a future entry, either on my blog or on Edward's.

More at JavaOne

Speaking of JavaOne... I have my hands full. Next to the memory leaks BOF, I'm also presenting a BOF on JBI ("How to develop JBI components") and I'll be co-presenting another BOF on "EE and JBI."