Here's a blog entry / research note / note-to-self that I wrote months ago, but never found the time to publish.
At JavaOne there was an interesting BOF about Efficient XML. It made me wonder how efficient the use of DOM in Java is. To find out, I wrote a small program that measures how many bytes of RAM a Java object uses. The program counts how many objects it can allocate before it runs out of memory. Here's the class:
public static class H {
    public H next;
    byte[] b = new byte[8];
}
By changing the size of the byte array and running the program again, the size of an H instance can be calculated, as well as the total memory space available for object allocation in the test program. That total can then be reused in subsequent runs. Running the program on the trivial class below on a Windows machine yielded a size of 16 bytes for each allocation.
public static class H {
    public H next;
}
16 bytes
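The measurement described above could be sketched roughly as follows. This is not the original program: the names ObjectSizeProbe, Node, countUntilOutOfMemory, fixedBytesPerAllocation and heapBytes are made up for illustration, and the sketch does both runs in one JVM instead of running the program twice. It assumes both payload lengths are different multiples of 8, so that padding does not skew the result, and that the heap is kept small with the standard -Xmx option.

```java
// Sketch only: class and method names are invented for this note.
final class ObjectSizeProbe {

    /** Test object: one reference plus a byte array whose length can be varied. */
    static final class Node {
        Node next;
        byte[] payload;

        Node(Node next, int payloadLength) {
            this.next = next;
            this.payload = new byte[payloadLength];
        }
    }

    /** Allocates a chain of Nodes until the heap is full and returns how many fit. */
    static long countUntilOutOfMemory(int payloadLength) {
        Node head = null;
        long count = 0;
        try {
            while (true) {
                head = new Node(head, payloadLength);
                count++;
            }
        } catch (OutOfMemoryError e) {
            head = null; // release the chain so the JVM can keep running
        }
        return count;
    }

    /**
     * Solves count1 * (fixed + payload1) == count2 * (fixed + payload2) for "fixed",
     * the bytes each allocation costs on top of its payload bytes.
     * The two payload lengths (and therefore the two counts) must differ.
     */
    static long fixedBytesPerAllocation(long count1, int payload1, long count2, int payload2) {
        return (count2 * payload2 - count1 * payload1) / (count1 - count2);
    }

    /** Heap space that was available to the test objects. */
    static long heapBytes(long count, int payload, long fixedBytes) {
        return count * (fixedBytes + payload);
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("usage: java -Xmx64m ObjectSizeProbe <payload1> <payload2>");
            return;
        }
        int p1 = Integer.parseInt(args[0]);
        int p2 = Integer.parseInt(args[1]);
        long c1 = countUntilOutOfMemory(p1);
        long c2 = countUntilOutOfMemory(p2);
        long fixed = fixedBytesPerAllocation(c1, p1, c2, p2);
        System.out.println("fixed bytes per allocation: " + fixed);
        System.out.println("usable heap bytes: " + heapBytes(c1, p1, fixed));
    }
}
```

Note that the "fixed" figure covers everything except the payload bytes, so it includes the array's own header as well as the Node instance. Once the usable heap is known, a class without the array can be sized by dividing that total by its allocation count.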
The VM apparently uses 8 byte alignment, because when another reference is added to the class above, the size does not increase, but it does increase in steps of 8 bytes when more members are added:
public static class H {
    public H next;
    public H next1;
}
16 bytes
public static class H {
    public H next;
    public H next1;
    public H next2;
}
24 bytes
public static class H {
    public H next;
    public H next1;
    public H next2;
    public H next3;
}
24 bytes
public static class H {
    public H next;
    public H next1;
    public H next2;
    public H next3;
    public H next4;
}
32 bytes
Hence an instance of an empty class takes 8 bytes, and each object reference takes 4 bytes. This is much better than I intuitively expected: underneath there should be at least a pointer to a virtual method table (4 bytes), and there should be a pointer to the object from some VM list of objects for memory management. I'd expected that list to be a linked list, and that there would perhaps be some additional flags for garbage collection, etc. Clearly the VM does it much more efficiently than I would have done.
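The sizes measured above fit a simple rule: 8 bytes of header, 4 bytes per reference, rounded up to a multiple of 8. A minimal sketch of that rule follows. The class and method names are invented here, and the constants only encode these measurements, so they should not be read as a statement about JVMs in general.

```java
// Sketch only: the constants encode the measurements above, not a JVM specification.
final class InstanceSizeModel {
    static final int HEADER_BYTES = 8;    // measured size of an instance without fields
    static final int REFERENCE_BYTES = 4; // measured size of one object reference
    static final int ALIGNMENT = 8;       // sizes were seen to grow in steps of 8 bytes

    /** Rounds a byte count up to the next multiple of ALIGNMENT. */
    static int roundUp(int bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Predicted size of an instance that holds only reference fields. */
    static int sizeWithReferences(int referenceFields) {
        return roundUp(HEADER_BYTES + REFERENCE_BYTES * referenceFields);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 5; n++) {
            System.out.println(n + " reference field(s): " + sizeWithReferences(n) + " bytes");
        }
    }
}
```

For one to five reference fields this gives 16, 16, 24, 24 and 32 bytes, matching the measurements.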
Similarly, the size of arrays can be measured. A char[] comes down to 12 bytes plus two times the number of characters, rounded up to an 8 byte boundary. The size of a String object is more interesting, because a String refers to a separate char[]: the measurements fit 36 bytes plus two times the number of characters, rounded up to an 8 byte boundary, so the minimum size is 40 bytes. Hence, a string of 3 characters is 48 bytes, and an eight character string is 56 bytes.
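The three String sizes quoted above (40, 48 and 56 bytes) can be reproduced with a small model. The char[] part uses the measured 12-byte overhead; the 24 bytes for the String instance itself is an assumption (an 8-byte header, one char[] reference and three int fields), not something measured here. All names are invented for this sketch.

```java
// Sketch only: a size model fitted to the numbers in this post.
final class StringSizeModel {
    static final int CHAR_ARRAY_HEADER_BYTES = 12; // from the char[] measurement above
    static final int STRING_INSTANCE_BYTES = 24;   // ASSUMPTION: header + char[] reference + three ints

    /** Rounds a byte count up to the next multiple of 8. */
    static int roundUpTo8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** Predicted size of a char[] of the given length. */
    static int charArraySize(int length) {
        return roundUpTo8(CHAR_ARRAY_HEADER_BYTES + 2 * length);
    }

    /** Predicted size of a String plus its backing char[]. */
    static int stringSize(int length) {
        return STRING_INSTANCE_BYTES + charArraySize(length);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "123", "12345678"}) {
            System.out.println("\"" + s + "\": " + stringSize(s.length()) + " bytes");
        }
    }
}
```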
Now let's take a look at XML, a DOM structure to be exact. How much does an element with a text node take, as in this XML snippet?
<root>
  <ElementName>1000000</ElementName>
  <ElementName>1000001</ElementName>
  <ElementName>1000002</ElementName>
  ...
(line breaks and indentation added for clarity; "ElementName" is a string constant)
Measuring this yields 144 bytes per node. Let's add another child element:
<root>
  <ElementName>
    <SubEl>1000000</SubEl>
  </ElementName>
  <ElementName>
    <SubEl>1000001</SubEl>
  </ElementName>
  <ElementName>
    <SubEl>1000002</SubEl>
  </ElementName>
  ...
Measuring this yields 200 bytes per repeating element. When we add an attribute the size increases again:
<root>
  <ElementName>
    <SubEl last="false">1000000</SubEl>
  </ElementName>
  <ElementName last="false">
    <SubEl>1000001</SubEl>
  </ElementName>
  <ElementName last="false">
    <SubEl>1000002</SubEl>
  </ElementName>
  ...
This increases the size to 312 bytes per repeating element.
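For readers who want to try something similar, here is a rough sketch of building the first DOM structure with the standard JAXP API and estimating its footprint. It uses a different technique from the one used for the numbers above: instead of allocating until memory runs out, it compares used heap before and after building the tree. System.gc() is only a hint, so the result is approximate. The class and method names are invented.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch only: approximates the per-element cost of a DOM tree from heap usage.
final class DomFootprintSketch {

    /** Builds <root> with elementCount children of the form <ElementName>1000000</ElementName>. */
    static Document buildDocument(int elementCount) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);
        for (int i = 0; i < elementCount; i++) {
            Element element = doc.createElement("ElementName");
            element.appendChild(doc.createTextNode(String.valueOf(1000000 + i)));
            root.appendChild(element);
        }
        return doc;
    }

    /** Used heap after asking for a garbage collection (the request is only a hint). */
    static long usedHeapBytes() {
        Runtime runtime = Runtime.getRuntime();
        System.gc();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        int elementCount = 100000;
        long before = usedHeapBytes();
        Document doc = buildDocument(elementCount);
        long after = usedHeapBytes();
        System.out.println("approx. bytes per element: " + (after - before) / elementCount);
        // Using the document here keeps the tree reachable during the measurement.
        System.out.println("children of root: "
                + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

The figure it prints depends on the VM and the DOM implementation in use, so it will not necessarily match the 144 bytes reported here.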
Summarizing the results:
Item | Size
new Object() | 8 bytes
"" | 40 bytes
"123" | 48 bytes
"12345678" | 56 bytes
int data member | 4 bytes
byte data member | 1 byte
Object reference data member | 4 bytes
char[n] | 12 + 2*n bytes
<ElementName>1234567</ElementName> | 144 bytes
<ElementName> (with <SubEl> child) | 200 bytes
<ElementName> (with <SubEl> child and last attribute) | 312 bytes
Indeed, DOM is not very efficient with respect to memory usage. The last XML snippet was only 62 single byte characters per repeating element. The information content is actually a lot less: since ElementName, SubEl, and last are constant, they could be replaced with a reference. Without using schema information, the information content can be encoded with about 25 bytes. With JAXB it can be done with 88 bytes.
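One way an 88-byte figure could come about is sketched below. This is a guess, not a breakdown given in this post. It assumes bound classes shaped like the hypothetical ElementName and SubEl below, the size rules measured earlier (8-byte header, 4 bytes per reference, 8-byte alignment), a String instance of 24 bytes plus its char[], and a Boolean that points at the shared Boolean.FALSE, so that only its reference is counted.

```java
// Sketch only: hypothetical bound classes and a size estimate based on the rules above.
final class BoundObjectEstimate {

    /** Hypothetical class for <SubEl last="false">1000000</SubEl>. */
    static final class SubEl {
        Boolean last = Boolean.FALSE; // shared instance: only the reference is counted
        String value;
    }

    /** Hypothetical class for the repeating <ElementName> element. */
    static final class ElementName {
        SubEl subEl;
    }

    static int roundUpTo8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** 8-byte header plus 4 bytes per reference field, aligned to 8 bytes. */
    static int instanceSize(int referenceFields) {
        return roundUpTo8(8 + 4 * referenceFields);
    }

    /** ASSUMPTION: 24-byte String instance plus a char[] of 12 + 2*n bytes, aligned to 8. */
    static int stringSize(int length) {
        return 24 + roundUpTo8(12 + 2 * length);
    }

    static int estimateBytes(ElementName element) {
        int bytes = instanceSize(1);                        // ElementName: one reference
        bytes += instanceSize(2);                           // SubEl: two references
        bytes += stringSize(element.subEl.value.length());  // the text content
        return bytes;
    }

    public static void main(String[] args) {
        ElementName element = new ElementName();
        element.subEl = new SubEl();
        element.subEl.value = "1000000";
        System.out.println("estimated bytes: " + estimateBytes(element)); // prints 88
    }
}
```

Under these assumptions the total is 16 + 16 + 56 = 88 bytes, most of which is the String holding the seven digits.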
2 comments:
Interesting topic. Do you know whether there is a quick utility or API that lets me find out the size of a Java object, like the sizeof operator in C? I need to measure the size of the JMS messages that my app is producing.
Thank you
You may want to look at vtd-xml as the state of the art in XML processing, consuming far less memory than DOM