Saturday, July 15, 2006

Internationalization of logging and error messages for the server side (cont'd)

In Thursday's entry I proposed that a tool is needed to make it easier to internationalize sources where the English error messages are kept in the source file, and the foreign language messages are in resource bundles.

Let's talk about this tool. There are a few ways to build it. As Tim Foster remarked in his comments, it's possible to parse the source code. This approach is doable, especially when existing tools are used (Tim mentioned http://open-language-tools.dev.java.net).

Another approach is to parse the compiled byte code. Using tools like BCEL, it's fairly simple to read a .class file, and extract all the strings in there. It could easily be run on the finished product: just add some logic to go over the installation image, find all jars, and then iterate over all .class files in the jars.
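
To make the idea concrete, here is a minimal sketch of the jar scan. It does not use BCEL; as an assumption made only to keep the example free of dependencies, it reads the constant pool of each .class file by hand with the JDK alone. The class name ClassStringExtractor and its two methods are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper (not BCEL): reads only the constant pool of a class file.
class ClassStringExtractor {

    /** Returns the string literals (CONSTANT_String entries) of one class file. */
    static List<String> extractStrings(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a class file");
        }
        in.readUnsignedShort(); // minor version
        in.readUnsignedShort(); // major version
        int count = in.readUnsignedShort(); // constant pool entries are 1..count-1

        Map<Integer, String> utf8 = new HashMap<>();
        List<Integer> stringRefs = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8.put(i, in.readUTF()); break;               // Utf8 text
                case 8: stringRefs.add(in.readUnsignedShort()); break;  // String -> Utf8 index
                case 3: case 4: skip(in, 4); break;                     // Integer, Float
                case 5: case 6: skip(in, 8); i++; break;                // Long, Double use two slots
                case 7: case 16: case 19: case 20: skip(in, 2); break;  // Class, MethodType, Module, Package
                case 9: case 10: case 11: case 12: case 17: case 18:
                    skip(in, 4); break;                                 // member refs, NameAndType, (Invoke)Dynamic
                case 15: skip(in, 3); break;                            // MethodHandle
                default: throw new IOException("Unknown constant pool tag " + tag);
            }
        }
        List<String> result = new ArrayList<>();
        for (int ref : stringRefs) {
            result.add(utf8.get(ref));
        }
        return result;
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }

    /** Collects the string literals of every .class entry in one jar. */
    static List<String> extractFromJar(String jarPath) throws IOException {
        List<String> result = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (!entry.getName().endsWith(".class")) {
                    continue;
                }
                try (InputStream in = jar.getInputStream(entry)) {
                    result.addAll(extractStrings(in));
                }
            }
        }
        return result;
    }
}
```

Walking an installation image would then be a matter of calling extractFromJar for every jar found under the install directory. A bytecode library such as BCEL hides the constant pool details and is probably the better choice in a real tool.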

Fortunately, the compiler folds a string literal that is concatenated across multiple lines in the source into a single constant:
String msg = msgcat.get("F1231: Could not open file {1}"
+ " in directory {0}: {2}", dir, file, ex);
is found in the .class file as:
F1231: Could not open file {1} in directory {0}: {2}
So it's simple to extract strings from a .class file. But how can we tell the strings that represent actual error messages apart from all the other strings? Error messages can be recognized because they start with an error message number, and that number follows a particular pattern. In the example
String msg = msgcat.get("F1231: Could not open file {1} in directory {0}: {2}", dir, file, ex);
the pattern (regular expression) is
[A-Z]\d\d\d\d: 
Note that a similar trick would be needed when parsing source code, unless some logic is applied to find only those strings that are used in particular constructs, like calls to methods on loggers, or constructors of exceptions. This can quickly become very complicated because log wrapper classes are often used instead of java.util.logging.Logger.
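
The filtering step can be sketched as follows. This is only an illustration under assumptions: the class name MessageFilter is hypothetical, the pattern assumes codes of exactly one upper-case letter and four digits, and the code=text output is just one possible starting point for a translator's resource bundle (it does no escaping for the properties file format).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: keeps only the strings that look like error messages.
class MessageFilter {
    // One upper-case letter, four digits, a colon and a space, then the message text.
    static final Pattern MESSAGE = Pattern.compile("([A-Z]\\d{4}): (.*)", Pattern.DOTALL);

    /** Maps each error code to its English text; strings without a code are skipped. */
    static Map<String, String> filter(List<String> strings) {
        Map<String, String> messages = new LinkedHashMap<>();
        for (String s : strings) {
            Matcher m = MESSAGE.matcher(s);
            if (m.matches()) {
                messages.put(m.group(1), m.group(2));
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList(
            "F1231: Could not open file {1} in directory {0}: {2}",
            "UTF-8",      // an ordinary string constant, ignored
            "msgcat");    // likewise ignored
        filter(strings).forEach((code, text) -> System.out.println(code + "=" + text));
    }
}
```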

This is also the answer to a question that I didn't pose yet: in the following code,
String msg = msgcat.get("F1231: Could not open file {1} in directory {0}: {2}", dir, file, ex);
throw new IOException(msg, ex);
how does the msgcat object localize messages? It extracts the error code (F1231) from the message, either by applying the same regular expression or by splitting the string on the colon, and uses that code to look up the translated text. In either case, it's important to have a convention for what the error code looks like and how it is embedded in the message.
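
As a rough sketch of how such a msgcat object could work: the class below is a hypothetical stand-in, not the actual implementation. It assumes that the resource bundle is keyed by the error code, that the translated text is stored without the code, and that the English text from the source is used when no translation exists. The German text in main is made-up sample data.

```java
import java.io.IOException;
import java.io.StringReader;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the msgcat object used in this post.
class MessageCatalog {
    private static final Pattern CODE = Pattern.compile("^([A-Z]\\d{4}): ");
    private final ResourceBundle bundle;
    private final Locale locale;

    MessageCatalog(ResourceBundle bundle, Locale locale) {
        this.bundle = bundle;
        this.locale = locale;
    }

    /** Looks up the translation by error code, then fills in the arguments. */
    String get(String english, Object... args) {
        String pattern = english;
        Matcher m = CODE.matcher(english);
        if (m.find()) {
            String code = m.group(1);
            try {
                // Assumed bundle layout: key = error code, value = text without the code.
                pattern = code + ": " + bundle.getString(code);
            } catch (MissingResourceException e) {
                // No translation available: keep the English text from the source.
            }
        }
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample translation, loaded from a string instead of a .properties file.
        ResourceBundle german = new PropertyResourceBundle(new StringReader(
            "F1231=Kann Datei {1} im Verzeichnis {0} nicht lesen: {2}\n"));
        MessageCatalog msgcat = new MessageCatalog(german, Locale.GERMAN);
        System.out.println(msgcat.get(
            "F1231: Could not open file {1} in directory {0}: {2}",
            "/tmp", "b.txt", "denied"));
    }
}
```

Keeping the code in front of the translated text means the error number survives localization, so it can still be searched for in a log written in any language.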

Next problem: how to re-localize an existing log so that an American support engineer can read a log from his product that was created on a Japanese system?
