Sunday, July 30, 2006

J2EE JCA Resource Adapters: Poisonous pools

Introduction

As a developer of a JCA Resource Adapter (RA), you're responsible for all aspects of the communication between the EIS and the EJB, including handling connection failures so that poisoned pools are avoided.

Wait! Too many acronyms in one sentence? Poisoned pools? What am I talking about? Here's a short refresher. JCA is the Java Connector Architecture and defines how Enterprise Java Beans (your application) can communicate with Enterprise Information Systems (EIS). Examples of Enterprise Information Systems are ERP systems and CRM systems, as well as other enterprise systems such as databases and JMS. The "conduit" between the EJB and the EIS is the Resource Adapter (RA). The communication can originate both from the EIS and from the EJB. The former is called inbound, the latter outbound. In this write-up I'm looking at outbound only.

Creating a connection from your application to an EIS is often expensive. That is why the container (the application server) provides connection pooling, so that connections can be reused rather than recreated. This brings with it the risk that faulty connections accumulate in the pool, resulting in a poisoned pool. In such a situation the application can no longer communicate with the EIS.

Seems simple enough, doesn't it? Let's look a bit closer...

Typical time scales

One of the services that an application server provides to applications is the pooling of resources. As such, when your application uses an outbound connection of a resource adapter, the application server will maintain a pool of connections. When your application needs a connection, the application server tries to satisfy this request first by checking the pool of idle connections; if there are no idle connections, a new connection is created or the application is blocked until a connection is returned to the pool. When the application closes a connection, it is returned to the pool.

Creating a new connection typically involves creating one or more new TCP/IP connections, authentication by the EIS, creating an internal session in the EIS and its associated memory structures and data, etc. This makes creating a new connection expensive: connection creation usually takes on the order of 50-300 ms. These expensive operations can be avoided by reusing an idle connection: the time scale of reuse is measured in microseconds rather than milliseconds. Next to consider is the time scale of connection use by your application, typically in the range of 1-5 ms.

To show the effects of connection pooling on throughput, let's assume that your application takes 3 ms to process a request, that creating a new connection takes 300 ms, and that reusing a connection takes 0.03 ms. The processing time is 3.03 ms with pooling, and 303 ms without pooling. A difference of a factor of one hundred! Sure, I made up the numbers in this example, but they are likely close to what you'll encounter in everyday practice.
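The arithmetic above fits in a few lines of Java. This is only a back-of-the-envelope model using the same made-up numbers (3 ms of work, 300 ms to create a connection, 0.03 ms to reuse one); the class name PoolingMath is hypothetical.

```java
/**
 * Back-of-the-envelope model of the pooling example; the timings are the illustrative
 * figures from the text, not measurements.
 */
final class PoolingMath {
    private PoolingMath() {}

    /** Time to serve one request: obtain a connection (create or reuse) plus the application's work. */
    static double requestMillis(double workMillis, double acquireMillis) {
        return workMillis + acquireMillis;
    }

    /** Requests per second that a single thread can sustain at the given time per request. */
    static double requestsPerSecond(double requestMillis) {
        return 1000.0 / requestMillis;
    }
}
```

With these inputs, requestMillis(3.0, 0.03) gives 3.03 ms, roughly 330 requests per second per thread, while requestMillis(3.0, 300.0) gives 303 ms, roughly 3.3 requests per second.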

In addition to the sub-second time scales of connection use and creation, there is another time scale to consider: the typical duration of a failure. Communication failures are most often caused by the EIS becoming unavailable temporarily. This can have two causes: a loss of network connectivity or a restart of the EIS. The latter is more probable and will be considered here as the typical failure scenario. Restarting an EIS typically takes from half a minute to several minutes. It is important to keep this time scale in mind when considering error handling strategies.

Mechanics of connection pooling

An outbound connection from the application to the EIS is represented by a ManagedConnection. The ManagedConnection holds the physical connection to the EIS. The lifecycle of a ManagedConnection is under control of the application server: a resource adapter creates a ManagedConnection when the application server tells it to; likewise the resource adapter destroys a connection only when instructed to do so by the application server.

A problem occurs when there is a communication failure with the EIS. For example, if a resource adapter connects to an external EIS and this external server is restarted, all connections in the pool are invalid. If the application were to use one of these connections, a failure would certainly occur. The failure would be propagated to your application code through an exception. A likely result would be that the transaction is rolled back and the operation is attempted again. The application server may not be able to distinguish this communication failure due to a faulty connection from other errors, so it may use the same faulty connection again on the next attempt, thereby ensuring that the same problem will happen for the next transaction. Effectively, the whole application has become inoperable because of the “poisoned pool”. To break out of this cycle, the resource adapter should let the application server know that connections are faulty, so that the application server can have the resource adapter create new connections and avoid putting faulty connections back in the pool. There are two ways to do this:

  • Signal to the application server that a connection is no longer valid
  • Respond negatively when the application server asks whether a connection is valid

Signalling the application server that a ManagedConnection is faulty

After the application server instructs the Resource Adapter to create a new ManagedConnection, it calls the following method on that ManagedConnection:

public interface ManagedConnection
{
    void addConnectionEventListener(ConnectionEventListener listener);

    // other methods omitted for clarity
}

The managed connection uses this ConnectionEventListener object to notify the application server of connections being closed. This object can also be used to let the application server know that the connection is faulty using the CONNECTION_ERROR_OCCURRED event. Upon receiving this event, the application server will typically destroy the connection immediately.

This approach can only be used if the resource adapter has some way of finding out when the connection is broken. In practice this turns out to be quite difficult: most resource adapters are not written from scratch but rather make use of some client jar that takes care of the communication with the EIS. Often, the vendor of the resource adapter is not the vendor of the EIS, and even if it were, the vendor of the EIS most likely needs to make a client jar available independently of a resource adapter anyway. It is not uncommon that these client jars don't provide any mechanism to propagate connection failures to the caller. For instance, if the EIS is JMS, the client jar will expose only the JMS API, and there is no way in the JMS API to tell the caller of a method that the physical connection is faulty.

Because the application server will destroy the connection immediately upon receiving the CONNECTION_ERROR_OCCURRED event and will roll back the transaction, it is important that the ManagedConnection fires this event only for a faulty connection, never as the result of an application condition.
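Here's a minimal sketch of reporting a broken connection while staying clear of application errors. To keep it self-contained, it uses a simplified, hypothetical ConnectionErrorListener instead of the real javax.resource.spi.ConnectionEventListener, and the isConnectivityFailure flag stands for whatever adapter-specific logic decides that an exception means the physical connection is gone.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Simplified stand-in for javax.resource.spi.ConnectionEventListener, declared only so that this
 * sketch compiles without the connector API jar. A real adapter would build
 * new ConnectionEvent(this, ConnectionEvent.CONNECTION_ERROR_OCCURRED, cause)
 * and pass it to listener.connectionErrorOccurred(event).
 */
interface ConnectionErrorListener {
    void connectionErrorOccurred(Object source, Exception cause);
}

/** Hypothetical fragment of a ManagedConnection that reports connectivity failures only. */
class ErrorReportingConnection {
    private final List<ConnectionErrorListener> listeners = new CopyOnWriteArrayList<>();
    private final AtomicBoolean reported = new AtomicBoolean();

    /** Mirrors ManagedConnection.addConnectionEventListener(...). */
    void addConnectionEventListener(ConnectionErrorListener listener) {
        listeners.add(listener);
    }

    /**
     * To be called wherever the client runtime throws. The isConnectivityFailure flag is an
     * assumption: the adapter needs its own way to tell a broken connection apart from an
     * application error, because the container destroys the connection and rolls back.
     */
    void onClientException(Exception cause, boolean isConnectivityFailure) {
        if (!isConnectivityFailure || !reported.compareAndSet(false, true)) {
            return; // application error, or this connection has already been reported
        }
        for (ConnectionErrorListener listener : listeners) {
            listener.connectionErrorOccurred(this, cause);
        }
    }
}
```

Reporting each connection only once is a choice made in this sketch: after the first event the container is expected to destroy the connection anyway.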

Fortunately there are alternatives to the CONNECTION_ERROR_OCCURRED mechanism.

The ValidatingManagedConnectionFactory

Rather than telling the application server that the connection is faulty, the managed connection can also wait for the application server to ask the managed connection whether it is still a valid connection. To make that work, the managed connection factory needs to implement the ValidatingManagedConnectionFactory interface:

public interface ValidatingManagedConnectionFactory
{
    Set getInvalidConnections(Set connectionSet) throws ResourceException;
}

The application server will check if the managed connection factory implements this interface, and if so, it will call the getInvalidConnections() method periodically.
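A sketch of what the body of getInvalidConnections() could look like. The generic signature and the Predicate that decides validity are assumptions made so the code stands alone; the real method takes a raw Set of ManagedConnection objects and is declared to throw ResourceException. The predicate is where the checks discussed next would plug in.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Sketch of the body of ValidatingManagedConnectionFactory.getInvalidConnections(Set).
 * InvalidConnectionFilter and the isValid predicate are hypothetical.
 */
final class InvalidConnectionFilter {
    private InvalidConnectionFilter() {}

    static <C> Set<C> getInvalidConnections(Set<C> connectionSet, Predicate<? super C> isValid) {
        Set<C> invalid = new HashSet<>();
        for (C connection : connectionSet) {
            // Whatever ends up in the returned set is destroyed by the application server
            // instead of being handed out to the application.
            if (!isValid.test(connection)) {
                invalid.add(connection);
            }
        }
        return invalid;
    }
}
```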

Again, it is often hard for the managed connection factory to know whether a connection is valid. For example, if the resource adapter wraps a database connection, there is often no way to find out whether a connection is still "live". On a JDBC connection, for instance, the isClosed() method does not return any status information about the physical connection; it only reports whether the close() method was called.

Passive negative checks
One way for a managed connection to keep track of possible connection problems is to monitor exceptions being thrown from the client runtime to the application. If the ManagedConnection has a way to distinguish application errors (e.g. a syntax error in a prepared statement) from communication failures, it can count the communication failures and assume that it may be faulty as soon as that count is greater than zero. For example, if the resource adapter wraps a JMS provider, it could reasonably assume that exceptions from methods like send(), publish() etc. indicate connectivity problems.
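A minimal sketch of a passive negative check, assuming (as in the JMS example) that every exception from a send-like call indicates a connectivity problem. FailureCountingSession and its Operation interface are hypothetical; the exception is still rethrown so that the application sees it as usual.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper that counts exceptions from calls into the client runtime. */
final class FailureCountingSession {
    /** Stand-in for a client-runtime operation such as a JMS send() or publish(). */
    interface Operation {
        void run() throws Exception;
    }

    private final AtomicInteger failures = new AtomicInteger();

    void invoke(Operation operation) throws Exception {
        try {
            operation.run();
        } catch (Exception e) {
            failures.incrementAndGet(); // remember the failure...
            throw e;                    // ...but let the application see it as usual
        }
    }

    /** Consulted from getInvalidConnections(): suspect as soon as one failure has been seen. */
    boolean isValid() {
        return failures.get() == 0;
    }
}
```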

Passive positive checks
If it is not possible to passively monitor connection failures, perhaps it is possible to keep track of when a connection was last used without any problem. If a connection was not used for more than, say, 30 seconds, you could mark that connection as invalid. Of course there is a risk that the application uses the connection less often than once every 30 seconds; if that is the case, the expense of recreating a connection may not be that bad.
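A sketch of the passive positive check: remember when the connection last completed an operation successfully, and report it as invalid once that is longer ago than a threshold (30 seconds in the text). LastSuccessTracker is a hypothetical name, and the injectable clock is an assumption for testability; in production you would pass System::currentTimeMillis.

```java
import java.util.function.LongSupplier;

/** Hypothetical "last known good" tracker for one managed connection. */
final class LastSuccessTracker {
    private final long maxQuietMillis;
    private final LongSupplier clock;
    private volatile long lastSuccess;

    LastSuccessTracker(long maxQuietMillis, LongSupplier clock) {
        this.maxQuietMillis = maxQuietMillis;
        this.clock = clock;
        this.lastSuccess = clock.getAsLong(); // a freshly created connection counts as good
    }

    /** Call after each operation on the underlying connection that completed without error. */
    void recordSuccess() {
        lastSuccess = clock.getAsLong();
    }

    /** Invalid once no operation has succeeded for more than maxQuietMillis. */
    boolean isValid() {
        return clock.getAsLong() - lastSuccess <= maxQuietMillis;
    }
}
```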

Active validity check
Another way is for the managed connection to actively check the connection's validity. If the managed connection uses an Oracle connection underneath, it could do a select on the DUAL table. It is important, however, that this check is not time consuming.
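A sketch of an active check for an adapter that wraps an Oracle JDBC connection, using plain java.sql calls. DualPing is a hypothetical name and the 5 second query timeout is an arbitrary assumption; the point is to bound how long the check can take. Drivers that implement JDBC 4.0 or later also offer Connection.isValid(int timeout) for the same purpose.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical active validity check: a trivial query against Oracle's DUAL table. */
final class DualPing {
    private DualPing() {}

    static boolean isAlive(Connection connection) {
        try (Statement st = connection.createStatement()) {
            st.setQueryTimeout(5); // seconds; keeps a hanging database from stalling the check
            try (ResultSet rs = st.executeQuery("SELECT 1 FROM DUAL")) {
                return rs.next();
            }
        } catch (SQLException e) {
            return false; // any failure means: do not hand this connection out again
        }
    }
}
```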

Keep an eye on expenses!
Above it was mentioned that the application server will call the getInvalidConnections() method periodically. How often does the application server call this method? The application server may have a timer thread that goes over all the idle connections in the connection pool and checks whether they are still valid. This approach has a serious problem if it is the only time that the application server calls this method: when the system is processing at or near capacity, the application server will hardly ever find idle connections in the pool.

That's why application servers typically call the getInvalidConnections() method before they give out a connection to the application. A simple but expensive approach is for the application server to call this method every time a connection is given to the application. A smarter approach is to call it no more often than once every so many seconds, a value that is configurable in the server. This value is chosen based on the expected failure duration. As was mentioned earlier, the expected failure duration is likely greater than 30 seconds. Hence, it makes little sense for the application server to call getInvalidConnections() more often than every 30 seconds.

Keep in mind, however, that there is no standard for what application servers do, so it is important to make sure that the getInvalidConnections() method is fast on average. If calling an expensive method is the only way to find out whether a connection is valid, the managed connection factory could keep track of when it was called last, so that it does not call this expensive method more than once every so many seconds. A reasonable interval can be estimated from how expensive the check is and from the time scales of connection failures; again, 30 seconds is a reasonable ballpark number.
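The rate limiting described above could look like this: the expensive check runs at most once per interval, and calls in between reuse the last answer. ThrottledCheck is a hypothetical helper; the expensive check and the clock are supplied by the caller.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical wrapper that limits how often an expensive validity check really runs. */
final class ThrottledCheck {
    private final BooleanSupplier expensiveCheck;
    private final long minIntervalMillis;
    private final LongSupplier clock;
    private boolean hasRun;
    private long lastRunMillis;
    private boolean lastResult;

    ThrottledCheck(BooleanSupplier expensiveCheck, long minIntervalMillis, LongSupplier clock) {
        this.expensiveCheck = expensiveCheck;
        this.minIntervalMillis = minIntervalMillis;
        this.clock = clock;
    }

    /** Runs the expensive check only if the previous run is at least minIntervalMillis old. */
    synchronized boolean isValid() {
        long now = clock.getAsLong();
        if (!hasRun || now - lastRunMillis >= minIntervalMillis) {
            lastResult = expensiveCheck.getAsBoolean();
            lastRunMillis = now;
            hasRun = true;
        }
        return lastResult; // otherwise cheap, but up to minIntervalMillis stale
    }
}
```

The trade-off is staleness: a connection that breaks right after a successful check can be handed out for up to one interval, which is acceptable given that a typical failure lasts longer than that.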

Desperate measures
If there's really no way for the managed connection to find out anything about the validity of the connection, it could resort to a crude but effective workaround: set a limit on the lifetime of the connection, e.g. one minute. Again, this time interval is based on the time scales of connectivity failures. This will have a small adverse effect on performance: connections are destroyed and recreated more often than they need to be. This effect will not be very big, however: most of the time the application can in fact reuse an existing connection. In the example above, with a connection creation time of 300 ms, the throughput goes down by 0.5% when the maximum lifetime of a connection is 1 minute (300 ms of re-creation for every 60,000 ms of use).

Should a connection failure occur, the faulty connection will be reused for less than one minute, so the problem will eventually correct itself. If the application is used continuously and the expected downtime is more than one minute, this will make no difference to the application: during the minute in which the connection is faulty, the EIS is unavailable anyway.
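A sketch of the maximum-lifetime workaround together with the cost estimate. MaxLifetimePolicy is a hypothetical name; isExpired() would be consulted from getInvalidConnections(), and overheadFraction() is simply creation time divided by lifetime, which gives 300 / 60,000 = 0.5% for the numbers used in this write-up.

```java
/** Hypothetical lifetime cap for managed connections. */
final class MaxLifetimePolicy {
    private final long maxLifetimeMillis;

    MaxLifetimePolicy(long maxLifetimeMillis) {
        this.maxLifetimeMillis = maxLifetimeMillis;
    }

    /** True once the connection is older than the cap, regardless of whether it still works. */
    boolean isExpired(long createdAtMillis, long nowMillis) {
        return nowMillis - createdAtMillis > maxLifetimeMillis;
    }

    /** Fraction of a busy connection's time spent re-creating it: creation time / lifetime. */
    static double overheadFraction(double createMillis, double lifetimeMillis) {
        return createMillis / lifetimeMillis;
    }
}
```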

Complicating factor: transaction enlistment

Resource adapters declare in the ra.xml what level of transactions they support. There are three levels: XATransaction, LocalTransaction and NoTransaction. If a resource adapter supports XATransaction, this means that the resource adapter supports XA; the application server will call getXAResource() on the ManagedConnection to get hold of the XAResource object to control the transaction. Resource adapters that support LocalTransaction return an implementation of the LocalTransaction interface when the application server calls getLocalTransaction() on the ManagedConnection. This interface has methods begin(), commit() and rollback(). Resource adapters that only support NoTransaction don’t participate in transactions at all.

If a resource adapter supports XATransaction, the managed connection has to be enlisted in every transaction. The transaction manager in the application server will call start() on the XAResource. The start() call is the very first method that the application server calls on the managed connection after getting it out of the pool. The start() method will typically call into the EIS, causing an exception if the connection is faulty. The best way of dealing with this is for the application server to discard the connection, i.e. call destroy() and remove the connection from the pool. Some application servers (e.g. the Integration Server in Java CAPS) do that. So for these application servers it may suffice to do nothing in your resource adapter and still avoid poisoned connection pools. However, there are plenty of other application servers that will propagate the exception to the application and return the connection to the pool. And you do want your resource adapter to work well with any application server, don't you?

For application servers that don't destroy connections when the enlistment fails, it is critical that the resource adapter provides a fault detection strategy. Unfortunately, an exception on the start() method is difficult to detect for most resource adapters, because resource adapters often expose the XAResource from the client runtime directly to the application server's transaction manager. There's a good reason for this: as I noted in my previous blog entry, there is an inherent problem with XAResource wrappers. In these situations the passive positive check outlined above may be useful.

Conclusion

When developing a resource adapter, it's crucial to provide for connection failure detection. Keep in mind:

  • different application servers behave differently, e.g. different frequency of calling getInvalidConnections(), different behavior when the enlistment of a connection fails
  • transaction enlistment failures may be the only failures that occur; can you detect them?
  • There are different ways of guessing if a connection is valid, even if the monitoring of failures doesn't work:
    • track when a connection was used without failure
    • assign a maximum lifetime
  • Keep an eye on expenses! Make sure that connections are not recreated every time, and make sure that active health checks don't happen too often.
With all this, keep in mind the different time scales:
  • how long it takes to create a new connection
  • how long a typical connection failure lasts
  • how many requests an application is likely to process per second

Saturday, July 22, 2006

J2EE JCA Resource Adapters: The problem with XAResource wrappers

Let's say that you're writing an outbound JCA Resource Adapter. Let's say that it supports XA. Let's say that you need to know when the transaction is committed. You would be tempted to provide a wrapper around the "native" XAResource. If you are, read on: there are some problems you need to consider before doing that! Fair warning: the remainder of this posting is full of technical terms like XAResource and ManagedConnection.

First let me explain in more detail what I am talking about.

Introduction

A client runtime typically is a library that takes care of the communication between a Java client and a server. For instance, a JMS client runtime is one or more jars that implement the JMS API and take care of the communication with the JMS server. Likewise, a JDBC client runtime library implements the JDBC API to provide connectivity with a database server.

Many adapters that support XA either wrap around or build on top of an existing client runtime. For example, a JMS resource adapter typically wraps around an existing JMS client runtime. A workflow engine adapter may internally use a JDBC connection for persistence so it can be said that it builds on top of a JDBC client runtime.

How does the native XAResource fit in with the JCA container? When the application server requests the XAResource from the managed connection through the getXAResource() call, the managed connection may return its own implementation of the XAResource object, or it may return the XAResource implemented by the client runtime (the "native" XAResource). The former type is essentially a wrapper around the XAResource implemented by the client runtime.

Why is this important? Often it is necessary for a managed connection to be notified of the progress of a transaction: a managed connection may need to update its state after the transaction has been committed or rolled back. The JCA spec does not provide a standard way of doing this other than through the XAResource. This may invite you (the developer of the adapter) to write a wrapper around the XAResource instead of exposing the XAResource of the underlying client runtime directly.

There are some problems associated with the wrapper approach, which are discussed in detail next.

How should isSameRM() be implemented?

The isSameRM() method is called by the transaction manager to find out if two XAResource-s use the same underlying resource manager. If this is the case, instead of creating a new transaction branch, the transaction manager will join the second XAResource into the same transaction branch.

The method isSameRM() can be implemented as follows:

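import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;

class WrappedXAResource implements XAResource {
    private XAResource delegate;

    public boolean isSameRM(XAResource other) throws XAException {
        if (other instanceof WrappedXAResource) {
            // compare the underlying resources, not the wrappers
            return delegate.isSameRM(((WrappedXAResource) other).delegate);
        } else {
            return delegate.isSameRM(other);
        }
    }

    // other XAResource methods omitted
}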

Let's look at a scenario where there are three resources to be enlisted in the same transaction. Two resources belong to the same resource adapter (say W1 and W3), and the other resource belongs to an unknown entity, say R2. Let's assume that the underlying resource manager is the same. This can happen, for example, when the resource adapter builds on top of a JDBC driver, and the other entity is in fact a database connection to the same database.

Let's say that the application server enlists the resources in this order in the transaction: W1, R2, W3. The transaction manager may call the isSameRM() method as follows:

Case A

  1. enlist W1
  2. enlist R2
  3. W1.isSameRM(R2); // returns true; R2 is joined into W1
  4. enlist W3
  5. W1.isSameRM(W3); // returns true; W3 is joined into W1

In this case, all resources are joined, i.e. one branch with W1 receiving the prepare/commit/rollback calls.

Alternatively, the transaction manager may invoke the isSameRM() call as follows:

Case B

  1. enlist W1
  2. enlist R2
  3. R2.isSameRM(W1); // returns false
  4. enlist W3
  5. W3.isSameRM(W1); // returns true; W3 is joined with W1

In this case there will be two transaction branches, with both W1 and R2 receiving the prepare/commit/rollback calls.

Exactly how the transaction manager invokes the isSameRM() method depends on the implementation of the transaction manager and may differ from one implementation to another.

Now let's look at what happens if the resources happen to be enlisted in this order: R1, W2, W3.

Case C

  1. enlist R1
  2. enlist W2
  3. R1.isSameRM(W2); // returns false
  4. enlist W3
  5. R1.isSameRM(W3); // returns false
  6. W2.isSameRM(W3); // returns true; W3 is joined into W2

In this case there will be two branches, with R1 and W2 receiving prepare/commit/rollback calls.

Case D

  1. enlist R1
  2. enlist W2
  3. W2.isSameRM(R1); // returns true; W2 is joined into R1
  4. enlist W3
  5. W3.isSameRM(R1); // returns true; W3 is joined into R1

This case results in one transaction branch, with R1 receiving the prepare/commit/rollback calls and neither W2 nor W3 receiving any.

To avoid case D where none of the wrappers receive prepare/commit/rollback calls, the implementation of isSameRM() should only consider other wrappers, and never consider an unwrapped XAResource:

public boolean isSameRM(XAResource other) throws XAException {
    if (other instanceof WrappedXAResource) {
        return delegate.isSameRM(((WrappedXAResource) other).delegate);
    } else {
        return false; // never join an unwrapped XAResource
    }
}

This also takes care of the asymmetric behavior of isSameRM(), where W1.isSameRM(R2) returns true while R2.isSameRM(W1) returns false.

Note that if multiple wrappers are joined together, only one wrapper will receive the prepare/commit/rollback calls. It is feasible to keep track of all the wrappers that are joined together, but the code becomes rather complicated. A simpler approach is to always return false in the isSameRM() method:

public boolean isSameRM(XAResource other) {
return false;
}

The obvious drawback is that this will result in more transaction branches and will be more expensive.

There's another complication that may result in the wrapper not getting any commit/rollback calls. This has to do with optimizations in the resource manager.

XAResource.prepare()

If R1 and W2 really use the same resource manager but the isSameRM() call returned false, there will be two transaction branches from the perspective of the transaction manager. The underlying resource manager, however, will see two branches with the same global transaction id, and may decide to join these two branches together internally. The result is that when the transaction manager calls XAResource.prepare() on W2, the underlying XAResource may return XA_RDONLY. If the transaction manager receives this vote, it should not call commit() or rollback() on that resource.

The wrapper can deal with this situation with some extra code: instead of delegating the call to prepare() to the underlying XAResource and passing its return value back to the caller (the transaction manager), the wrapper should make sure that it never returns XA_RDONLY. It should record in its internal state that it overruled XA_RDONLY, so that when the transaction manager calls commit() or rollback(), the wrapper knows not to call commit() or rollback() on the underlying XAResource.
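Putting the two fixes together, a wrapper could look roughly like the sketch below. It only joins other wrappers in isSameRM(), never passes XA_RDONLY on to the transaction manager, and skips the delegate's commit() or rollback() when it overruled XA_RDONLY. The OutcomeListener callback is a hypothetical hook where the managed connection would update its state, and the sketch assumes one transaction branch at a time per wrapper. It uses only javax.transaction.xa, which ships with the JDK.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Sketch of an XAResource wrapper that always learns the outcome of its transaction. */
class OutcomeAwareXAResource implements XAResource {
    /** Hypothetical hook: where the managed connection would update its state. */
    interface OutcomeListener {
        void completed(boolean committed);
    }

    private final XAResource delegate;
    private final OutcomeListener listener;
    private boolean readOnlyOverruled; // delegate voted XA_RDONLY, we answered XA_OK

    OutcomeAwareXAResource(XAResource delegate, OutcomeListener listener) {
        this.delegate = delegate;
        this.listener = listener;
    }

    @Override
    public boolean isSameRM(XAResource other) throws XAException {
        // Never join an unwrapped resource (Case D), only other wrappers over the same RM.
        return other instanceof OutcomeAwareXAResource
                && delegate.isSameRM(((OutcomeAwareXAResource) other).delegate);
    }

    @Override
    public int prepare(Xid xid) throws XAException {
        readOnlyOverruled = delegate.prepare(xid) == XA_RDONLY;
        return XA_OK; // so that commit() or rollback() will still be called on this wrapper
    }

    @Override
    public void commit(Xid xid, boolean onePhase) throws XAException {
        if (!readOnlyOverruled) {
            delegate.commit(xid, onePhase); // the underlying branch still needs completing
        }
        readOnlyOverruled = false;
        listener.completed(true);
    }

    @Override
    public void rollback(Xid xid) throws XAException {
        if (!readOnlyOverruled) {
            delegate.rollback(xid);
        }
        readOnlyOverruled = false;
        listener.completed(false);
    }

    // The remaining methods simply delegate.
    @Override public void start(Xid xid, int flags) throws XAException { delegate.start(xid, flags); }
    @Override public void end(Xid xid, int flags) throws XAException { delegate.end(xid, flags); }
    @Override public void forget(Xid xid) throws XAException { delegate.forget(xid); }
    @Override public Xid[] recover(int flag) throws XAException { return delegate.recover(flag); }
    @Override public int getTransactionTimeout() throws XAException { return delegate.getTransactionTimeout(); }
    @Override public boolean setTransactionTimeout(int seconds) throws XAException {
        return delegate.setTransactionTimeout(seconds);
    }
}
```

Note that this still produces the extra transaction branches discussed next; it only makes the wrapper approach correct, not cheap.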

The expense of having multiple branches

The performance difference between a transaction with a single branch and a transaction with two branches is enormous. In the case of a single branch, the transaction manager can skip the call to prepare() and only needs to call commit(onephase=true). The transaction manager does not need to log any state to its transaction log. Any write operation to disk, whether by the underlying resource manager or by the transaction manager writing to its transaction log, is expensive. This is because, to guarantee transactional integrity, the write operations have to make sure that the data is in fact on the disk and not in some write cache. This is done by “syncing” the data to disk. This is an expensive operation; even a fast hard drive cannot sync to disk more than, say, 100 times per second. So changing a single-branch transaction into a transaction with two branches is in fact very expensive.

An alternative

Instead of using wrappers around the XAResource, it's also possible to register interest in the outcome of the transaction by registering a javax.transaction.Synchronization object with the transaction manager. This interface declares two methods: beforeCompletion() and afterCompletion(). The latter takes an argument to indicate if the transaction was committed or rolled back.

The Synchronization object needs to be registered with the javax.transaction.Transaction object using the registerSynchronization(Synchronization sync) method; this object can be obtained from the javax.transaction.TransactionManager object using the getTransaction() method. The question is how to obtain a handle to the TransactionManager. This is not specified in the J2EE spec, and different application servers make the transaction manager available in different ways. As it turns out, most application servers bind the transaction manager in JNDI; for a few others, some extra code is necessary to call methods on server-specific classes. A notable exception is IBM WebSphere, which does not provide access to the javax.transaction interfaces but provides its own proprietary interfaces. However, with some extra code, the same behavior can be obtained. The bottom line is that it is feasible to write code that can register a Synchronization object on all current application servers.
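A sketch of the Synchronization approach. Because javax.transaction.Synchronization is not part of the JDK, the sketch declares a local interface with the same two methods; in an application server you would implement the real interface and register the object with transactionManager.getTransaction().registerSynchronization(sync). The status constant mirrors javax.transaction.Status, and ConnectionOutcomeSync and StateUpdater are hypothetical names.

```java
/**
 * Local copy of the two methods of javax.transaction.Synchronization, declared only so this
 * sketch compiles without the JTA jar; in a server you would implement the real interface.
 */
interface Synchronization {
    void beforeCompletion();

    void afterCompletion(int status);
}

/** Hypothetical synchronization that tells a managed connection how its transaction ended. */
final class ConnectionOutcomeSync implements Synchronization {
    /** Same value as javax.transaction.Status.STATUS_COMMITTED. */
    static final int STATUS_COMMITTED = 3;

    /** Hypothetical callback into the managed connection. */
    interface StateUpdater {
        void transactionEnded(boolean committed);
    }

    private final StateUpdater connection;

    ConnectionOutcomeSync(StateUpdater connection) {
        this.connection = connection;
    }

    @Override
    public void beforeCompletion() {
        // Nothing to do in this sketch; called before the transaction completion process starts.
    }

    @Override
    public void afterCompletion(int status) {
        // Anything other than STATUS_COMMITTED (e.g. STATUS_ROLLEDBACK = 4) counts as "not committed".
        connection.transactionEnded(status == STATUS_COMMITTED);
    }
}
```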

The approach using a Synchronization object does not suffer from the performance penalty of causing multiple transaction branches when only one would suffice. Hence, this is a better alternative than using wrappers.

Friday, July 21, 2006

Resource adapters at JavaOne

At JavaOne 2006 I gave the presentation "Developing J2EE Connector Architecture Resource Adapters". I did that together with Sivakumar Thyagarajan. He works in Sun's Bangalore office.

That he works there, while I work in Monrovia, California... that's one of the interesting things about working at Sun: it's very international. Sun has people in all corners of the world. Last week I was on a phone call with perhaps 20 other people, and 12 of them were from other countries.

I talked to Sivakumar a few times on the phone before JavaOne, but met him face to face for the first time at JavaOne. That's also one of the nice things about JavaOne: you get to meet people face to face whom you otherwise only talk with on the phone.

The presentation went pretty well, and the audience agreed: at the end of the presentation, all audience members can fill in an evaluation form. Here are the feedback results:
                      Overall quality   Speakers
Our presentation      4.32              4.28
Average at JavaOne    3.99              3.90

If you're interested, here's some more information:
JavaOne session information of "Developing J2EE Connector Architecture Resource Adapters"
Slide presentation "Developing J2EE Connector Architecture Resource Adapters"

I've recorded the presentation on my MP3 player:
Audio presentation of "Developing J2EE Connector Architecture Resource Adapters" (.wav)
Audio presentation of "Developing J2EE Connector Architecture Resource Adapters" (.mp3)

Next: JCA Resource adapters

At Sun I'm responsible for the application server and the JMS server that ship as part of Java CAPS. As such, I've been involved quite a bit with resource adapters (Java Connector Architecture, or JCA).

All this serves as an introduction to some more technical blogs on resource adapters.

Saturday, July 15, 2006

Automatic log translation

Why would I want to translate logs from one locale to another?

Say that I build a product and do a nice job of making it internationalizable. The product is a success, and it is localized into various languages. Next, a customer in a far-away place sends an email complaining about the server failing to start up. For my convenience, he attached the log file. Since I did a good job writing decent error and logging messages, I expect no problem diagnosing what's going wrong. But oops, since I did such a nice job internationalizing the product, the log is in some far-away language, say Japanese! Now what?

Let me make this example concrete. Let's say that the log contains this entry:
(F1231) Bestand bestelling-554 in lijst bestellingen kon niet geopend worden: (F2110) Een verbinding met server 35 kon niet tot stand gebracht worden.
Looks foreign to you? (It doesn't to me, but I would have great problems if this example were in Japanese).

Fortunately, we could try to figure out what it says by looking up the error codes. However, if there are many log entries, this becomes a laborious affair. Would it be possible to obtain an English log file without tedious lookups and guesswork?

I think there are a few different approaches to this problem:
  1. always create two log files: an English one and a localized one.
  2. store the log in a language-neutral form, and use log viewers that render the log in a localized form
  3. try to automatically "translate" the localized log file into English
The trouble with the first two approaches is that in Java, localization happens early rather than late. Let me explain what I mean by that. If you have a compound message as in the example, the message is localized at each point where information is added to it. The example above could have occurred through the following code sequence:
try {
...
throw new Exception(msgcat.get("F2110: Could not establish connection with server {0}", serverid));
} catch (Exception ex) {
String msg = msgcat.get("F1231: Could not open file {1} in directory {0}: {2}", dir, file, ex);
throw new IOException(msg, ex);
}
The exception message is already localized when it is thrown: by the time the catch block runs, there is no language-neutral message left. It would have been nice if there were a close cousin of the Object.toString() method that takes a locale, toString(Locale), and if the Exception class accepted an Object as its message instead of limiting itself to a String.

In a previous product, where I had more control over the complete codebase, I approached this problem by introducing a specialized text class that supported the toString(Locale) method, and Exception classes that could return this text class. This solution was also ideal for storing text in locale-neutral form in a database, so that different customers could view the data in different locales.
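Such a text class could be sketched as follows: the message keeps its error code and arguments and is rendered only when toString(Locale) is called, so nested messages are localized late and in one go. The in-memory catalogue (locale to code to pattern) stands in for resource bundles, the patterns only support simple {n} placeholders, and LocalizableText is a hypothetical name, not the class from that earlier product.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical locale-neutral message: error code plus arguments, rendered on demand. */
final class LocalizableText {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\d+)\\}");

    private final Map<Locale, Map<String, String>> catalogue;
    private final String code;
    private final Object[] args;

    LocalizableText(Map<Locale, Map<String, String>> catalogue, String code, Object... args) {
        this.catalogue = catalogue;
        this.code = code;
        this.args = args.clone();
    }

    String toString(Locale locale) {
        String pattern = catalogue.getOrDefault(locale, Map.of()).get(code);
        if (pattern == null) {
            pattern = catalogue.get(Locale.ENGLISH).get(code); // fall back to English
        }
        StringBuilder out = new StringBuilder("(").append(code).append(") ");
        Matcher m = PLACEHOLDER.matcher(pattern);
        while (m.find()) {
            Object arg = args[Integer.parseInt(m.group(1))];
            // Nested texts are rendered in the same locale, at the same (late) moment.
            String text = arg instanceof LocalizableText
                    ? ((LocalizableText) arg).toString(locale)
                    : String.valueOf(arg);
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    @Override
    public String toString() {
        return toString(Locale.ENGLISH);
    }
}
```

A log handler could then write the English form to one file and the localized form to another, from the same object.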

There is a kludgy workaround: we could change the msgcat.get() method so that it returns a String that is locale neutral rather than localized. A separate method would convert the locale-neutral String into a localized String, e.g. msgcat.convert(String, Locale). This method would have to be called any time a String is prepared for viewing, e.g. when logging to a localized log.

In the products that I am currently working on, these approaches to support locale-neutral strings are not feasible because they would require widespread changes. So let's take a look at option 3.

Given the Dutch resource bundle
F1231 = Bestand {1} in lijst {0} kon niet geopend worden: {2}
F2110 = Een verbinding met server {0} kon niet tot stand gebracht worden.
and the English one
F1231 = Could not open file {1} in directory {0}: {2}
F2110 = Could not establish connection with server {0}
let's see if there is a way to automatically translate the message
(F1231) Bestand bestelling-554 in lijst bestellingen kon niet geopend worden: (F2110) Een verbinding met de server 35 kon niet tot stand gebracht worden.
into
(F1231) Could not open file bestelling-554 in directory bestellingen: (F2110) Could not establish connection with server 35
I think it is possible to build a tool that can do that. The tool would read in all known resource bundles (possibly by pointing it to the installation image, after which the tool would scan all jars to extract all resource bundles) and translate them into regular expressions. It would have to be able to recognize error codes (e.g. with the regular expression \\([A-Z]\\d{4}\\), written as a Java string literal) and use these to successively expand the error message into its full locale-neutral form. In the example, the neutral form is:
[F1231, {0}=bestellingen, {1}=bestelling-554, {2}=[F2110, {0}=35]]
The neutral form can then easily be converted into the English form.
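A rough sketch of the core of such a tool, assuming the bundles have already been loaded into maps from error code to pattern (the ReLocalizer and Neutral names are hypothetical). Each MessageFormat pattern is turned into an anchored regular expression in which {n} becomes a lazy capture group; any captured argument that itself starts with an error code is expanded recursively, which yields the neutral form. It only handles simple patterns: no quoted text, no format types such as {0,number}, and each placeholder used once.

```java
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical core of the re-localization tool. */
final class ReLocalizer {
    // A message that starts with an error code, e.g. "(F1231) ...".
    private static final Pattern CODED = Pattern.compile("\\(([A-Z]\\d{4})\\) (.*)", Pattern.DOTALL);
    // A simple MessageFormat placeholder such as {0}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\d+)\\}");

    /** Neutral form: an error code plus its arguments (each a String or a nested Neutral). */
    static final class Neutral {
        final String code;
        final TreeMap<Integer, Object> args = new TreeMap<>();

        Neutral(String code) {
            this.code = code;
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder("[").append(code);
            args.forEach((i, v) -> sb.append(", {").append(i).append("}=").append(v));
            return sb.append(']').toString();
        }
    }

    /** Parses a message using the bundle of the language it was logged in. */
    static Object parse(String msg, Map<String, String> bundle) {
        Matcher coded = CODED.matcher(msg);
        if (!coded.matches() || !bundle.containsKey(coded.group(1))) {
            return msg; // a plain argument value such as a file name
        }
        String code = coded.group(1);
        String pattern = bundle.get(code);
        // Translate the pattern into a regex: quote the literal text, turn {n} into a lazy group.
        StringBuilder regex = new StringBuilder();
        List<Integer> argIndexes = new ArrayList<>();
        Matcher ph = PLACEHOLDER.matcher(pattern);
        int last = 0;
        while (ph.find()) {
            regex.append(Pattern.quote(pattern.substring(last, ph.start()))).append("(.*?)");
            argIndexes.add(Integer.parseInt(ph.group(1)));
            last = ph.end();
        }
        regex.append(Pattern.quote(pattern.substring(last)));
        Matcher m = Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(coded.group(2));
        if (!m.matches()) {
            return msg; // text does not fit the pattern; leave it as is
        }
        Neutral n = new Neutral(code);
        for (int g = 0; g < argIndexes.size(); g++) {
            n.args.put(argIndexes.get(g), parse(m.group(g + 1), bundle)); // expand nested messages
        }
        return n;
    }

    /** Renders a neutral form with the bundle of the target language. */
    static String render(Object node, Map<String, String> bundle) {
        if (!(node instanceof Neutral)) {
            return String.valueOf(node);
        }
        Neutral n = (Neutral) node;
        String pattern = bundle.get(n.code);
        if (pattern == null) {
            return n.toString(); // no translation available: show the neutral form
        }
        Object[] values = new Object[n.args.isEmpty() ? 0 : n.args.lastKey() + 1];
        n.args.forEach((i, v) -> values[i] = render(v, bundle));
        return "(" + n.code + ") " + MessageFormat.format(pattern, values);
    }
}
```

Parsing the Dutch message above with the Dutch bundle yields a neutral form that prints as [F1231, {0}=bestellingen, {1}=bestelling-554, {2}=[F2110, {0}=35]], and rendering that with the English bundle gives (F1231) Could not open file bestelling-554 in directory bestellingen: (F2110) Could not establish connection with server 35. One caveat of the lazy groups: if an argument happens to contain the literal text that follows its placeholder in the pattern, the split can be wrong.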

Internationalization of logging and error messages for the server side (cont'd)

In Thursday's entry I proposed that a tool is needed to make it easier to internationalize sources where the English error messages are kept in the source file, and the foreign language messages are in resource bundles.

Let's talk about this tool. There are a few different ways to go about building it. As Tim Foster remarked in his comments, it's possible to parse the source code. This approach is doable, especially when existing tools are used (Tim mentioned http://open-language-tools.dev.java.net).

Another approach is to parse the compiled byte code. Using tools like BCEL, it's fairly simple to read a .class file and extract all the strings in it. This could easily be run on the finished product: just add some logic that walks the installation image, finds all jars, and then iterates over all .class files in those jars.

Fortunately, the compiler joins a string that is split over multiple lines in the source into a single string:
String msg = msgcat.get("F1231: Could not open file {1}"
+ " in directory {0}: {2}", dir, file, ex);
is found in the .class file as:
F1231: Could not open file {1} in directory {0}: {2}
So it's simple to extract strings from a .class file. But how can we tell strings that represent actual error messages apart from other strings? Error messages start with an error message number, and that number follows a particular pattern. In the example
String msg = msgcat.get("F1231: Could not open file {1} in directory {0}: {2}", dir, file, ex);
the pattern (a regular expression, written here as a Java string literal) is
^[A-Z]\\d{4}: 
Note that a similar trick would have to be used when parsing source code, unless some logic is applied to find only those strings that are used in particular constructs, such as calls to logger methods or exception constructors. This can quickly become very complicated, because log wrapper classes are often used instead of java.util.logging.Logger.
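BCEL would do the heavy lifting here, but to keep the sketch dependency-free it reads the class file's constant pool directly with java.io.DataInputStream (the ErrorMessageScanner class is hypothetical). String literals are CONSTANT_String entries that point to CONSTANT_Utf8 entries; the scanner collects those and keeps only the ones that start with an error code. Scanning an installation image would mean opening each jar with java.util.jar.JarFile and passing the input stream of every .class entry to errorMessages().

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical scanner that lists the error messages among a class file's string literals. */
final class ErrorMessageScanner {
    // Error messages start with an error code such as "F1231: ".
    private static final Pattern ERROR_MESSAGE = Pattern.compile("[A-Z]\\d{4}: .*", Pattern.DOTALL);

    static List<String> errorMessages(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort(); // minor_version
        in.readUnsignedShort(); // major_version
        int count = in.readUnsignedShort(); // constant_pool_count; entries are numbered 1..count-1
        Map<Integer, String> utf8 = new HashMap<>();
        List<Integer> literals = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1 -> utf8.put(i, in.readUTF());             // CONSTANT_Utf8 (modified UTF-8)
                case 8 -> literals.add(in.readUnsignedShort());  // CONSTANT_String -> index of a Utf8 entry
                case 7, 16, 19, 20 -> skip(in, 2);               // Class, MethodType, Module, Package
                case 15 -> skip(in, 3);                          // MethodHandle
                case 3, 4, 9, 10, 11, 12, 17, 18 -> skip(in, 4); // Integer, Float, refs, NameAndType, (Invoke)Dynamic
                case 5, 6 -> { skip(in, 8); i++; }               // Long and Double occupy two slots
                default -> throw new IOException("unknown constant pool tag " + tag);
            }
        }
        List<String> messages = new ArrayList<>();
        for (int index : literals) {
            String s = utf8.get(index);
            if (s != null && ERROR_MESSAGE.matcher(s).matches()) {
                messages.add(s);
            }
        }
        return messages;
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]); // unlike skipBytes, never skips fewer than n bytes
    }
}
```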

This is also the answer to a question that I haven't posed yet: in the following code,
String msg = msgcat.get("F1231: Could not open file {1} in directory {0}: {2}", dir, file, ex);
throw new IOException(msg, ex);
how does the msgcat object localize messages? It extracts the error code (F1231) from the message by applying the same regular expression, or by splitting the string on the colon. Either way, it's important to have a convention for what the error code looks like and how it is embedded in the message.
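A minimal sketch of such a msgcat, assuming the translations are java.util.Properties keyed by the bare error code, like the msgs_nl_NL.properties files proposed in the July 14 entry (the MsgCat class and its constructor are hypothetical). The English text in the source doubles as the fallback when no translation exists.

```java
import java.text.MessageFormat;
import java.util.Locale;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical message catalog: the English text in the source is also the fallback. */
final class MsgCat {
    // Convention: the message starts with an error code followed by ": ".
    private static final Pattern CODE = Pattern.compile("([A-Z]\\d{4}): (.*)", Pattern.DOTALL);

    private final Properties translations; // e.g. loaded from msgs_nl_NL.properties
    private final Locale locale;

    MsgCat(Properties translations, Locale locale) {
        this.translations = translations;
        this.locale = locale;
    }

    String get(String message, Object... args) {
        Matcher m = CODE.matcher(message);
        if (!m.matches()) {
            // No error code: nothing to look up, just fill in the arguments.
            return new MessageFormat(message, locale).format(args);
        }
        String code = m.group(1);
        // Use the translation if there is one, otherwise the English text from the source.
        String pattern = translations.getProperty(code, m.group(2));
        return code + ": " + new MessageFormat(pattern, locale).format(args);
    }
}
```

In practice the Properties would be filled with Properties.load(Reader) from the file for the current locale.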

Next problem: how to re-localize an existing log so that an American support engineer can read a log from his product that was created on a Japanese system?

Friday, July 14, 2006

Internationalization of logging and error messages for the server side (cont'd)

I ended my previous blog entry with "Isn't there a better way?" Well... how would I like to use error messages in Java source code? I would simply want to write something like this:

String msg = msgcat.get("F1231: Could not open file {1} in directory {0}: {2}", dir, file, ex);
throw new IOException(msg, ex);
The advantages are clear:
  1. No switching to a different file to add the error message
  2. The error message is visible right there in the source code -- easy to review!
  3. It's easy to see that the arguments in {0}, {1} are correct   
But how would we deal with the same error message (same error code) being used in multiple places in the source? Well... there's no good answer for that. Fortunately, error messages tend to be unique, i.e. the same error message is rarely reused.

How is this source file internationalized? We need a tool! The tool will
  1. locate all error messages in the code base
  2. generate properties files for all desired languages if they don't exist     
  3. print out a list of all properties files that need to be localized
Here's what I would like to see as the output of the tool:
msgs_en_US.properties
# AUTOMATICALLY GENERATED
# DO NOT EDIT
# com.stc.jms.server.SegmentMgr
F1231 = Could not open file {1} in directory {0}: {2}
and   
msgs_nl_NL.properties
# com.stc.jms.server.SegmentMgr
# F1231 = Could not open file {1} in directory {0}: {2}
# ;;TODO:NEW MESSAGE;;F1231 =

so that a human translator can easily add the translations to the foreign properties files. As you can see, it includes the location (Java class) where the message was encountered. After translation, the file would look like this:

msgs_nl_NL.properties
# com.stc.jms.server.SegmentMgr
# F1231 = Could not open file {1} in directory {0}: {2}
F1231 = Bestand {1} in lijst {0} kon niet geopend worden: {2}
Of course, the tool would also have to handle a change to the error message in the source code. The translated properties file would then look like this:
msgs_nl_NL.properties
# com.stc.jms.server.SegmentMgr
# F1231 = File {1} in directory {0} could not be opened: {2}
# F1231 = Could not open file {1} in directory {0}: {2}
# ;;TODO: MESSAGE CHANGED;;
F1231 = Bestand {1} in lijst {0} kon niet geopend worden: {2}
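A sketch of how the generator could write one entry, assuming it has already collected the error code, the current English text and the class where it was found, and has read any existing translation together with the English text in the "# F1231 = ..." comment that the translation was based on (PropertiesGenerator.entry and its parameters are hypothetical). It also assumes that, in the changed-message layout above, the first commented line is the new English text and the second one the old text. It does not escape special characters the way Properties.store() would.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical generator for one entry of a translated properties file. */
final class PropertiesGenerator {

    /**
     * @param className      class in which the message was found
     * @param code           error code, e.g. "F1231"
     * @param english        English text currently in the source
     * @param translatedFrom English text the existing translation was based on, or null
     * @param translation    existing translation, or null if there is none yet
     */
    static List<String> entry(String className, String code, String english,
                              String translatedFrom, String translation) {
        List<String> lines = new ArrayList<>();
        lines.add("# " + className);
        lines.add("# " + code + " = " + english);
        if (translation == null) {
            // New message: leave a marker for the translator.
            lines.add("# ;;TODO:NEW MESSAGE;;" + code + " =");
            return lines;
        }
        if (translatedFrom != null && !english.equals(translatedFrom)) {
            // The source text changed since translation: keep the old text for comparison.
            lines.add("# " + code + " = " + translatedFrom);
            lines.add("# ;;TODO: MESSAGE CHANGED;;");
        }
        lines.add(code + " = " + translation);
        return lines;
    }
}
```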
Any ideas on how to build this tool?

Wednesday, July 12, 2006

Internationalization of logging and error messages for the server side

This is how one would throw an internationalizable message:
String msg = msgcat.get("EOpenFile", dir, file, ex);
throw new IOException(msg, ex);
The localized message is typically in a .properties file, e.g.
EOpenFile=F1231: Could not open file {1} in directory {0}: {2}
Each language has its own .properties file. The msgcat object is a utility that loads the .properties file. Logging messages to a log file typically uses the same constructs.
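For comparison, here is roughly what such a key-based msgcat could look like, using java.util.PropertyResourceBundle and java.text.MessageFormat (the KeyedMsgCat class is hypothetical, and the properties text is passed in directly instead of being loaded from a file). A misspelled key such as "EOpenFlie" compiles fine and only fails at run time with a MissingResourceException.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

/** Hypothetical key-based message catalog, in the style of msgcat.get("EOpenFile", ...). */
final class KeyedMsgCat {
    private final ResourceBundle bundle;

    /** Normally the properties would come from a per-language file; here they are passed as text. */
    KeyedMsgCat(String propertiesText) {
        try {
            bundle = new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Looks up the pattern by key and fills in {0}, {1}, ...; a typo in the key fails only at run time. */
    String get(String key, Object... args) {
        return MessageFormat.format(bundle.getString(key), args);
    }
}
```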

Looks cool, right? So what's my gripe?
  1. When coding this, you have to update the .properties file in addition to the .java file you're working on
  2. It's easy to make a typo in the message identifier, EOpenFile in the example above; there is no compile-time checking for these "constants".
  3. It's difficult to check that the right parameters are used in the right order ({0}, {1}, etc.)
  4. When reviewing the .java file, it's difficult to check that the error message is used in the right context and that the error message is meaningful in the context.
  5. When reviewing the .properties files, it's difficult to determine where these error messages are used (if at all!) -- you can only find out through a full text search
Isn't there a better way?

Sunday, July 9, 2006

Hello world

Hello world... this is my first ever blog. So, who am I?

I came to Sun through the acquisition of SeeBeyond. There are quite a lot of differences between SeeBeyond and Sun. SeeBeyond was a bit of a secretive company. Contact with the "outside world" was not exactly encouraged; rather, it was frowned upon. Sun, on the other hand, is all for openness and communication with the world outside of Sun. Hence this blog.

What can I tell about my time before Sun and SeeBeyond? Well, to begin at the beginning: a defining moment in my life was when I took a programming class in high school. As soon as I could, I bought a Commodore 64 on which I toiled countless hours away coding tools in the immensely primitive BASIC of that machine. Ever since, my one great interest has been developing software.

For the past ten-or-so years, I have been involved in development of commercial software products. Early on in my career I was in software for the chemical process industry (my background is really chemical engineering); soon I discovered that I should not limit myself to chemical engineering and I moved into enterprise software. First into B2B software, then into integration software.

In a previous life I spent five years working at the Eindhoven University of Technology in the department of Chemical Engineering. Much of it was in computational modelling and laboratory automation; I had a tiny software company on the side for the evening hours. After I earned my PhD I left for the USA.

What will I do with this blog? No personal musings... but I hope to write plenty of technical ones.