Saturday, July 15, 2006

Automatic log translation

Why would I want to translate logs from one locale to another?

Say that I build a product, and I do a nice job of making it internationalizable. The product is a success, and it is localized into various languages. Next, a customer in a far-away place sends an email complaining that the server fails to start up. For my convenience, he attaches the log file. Since I did a good job writing decent error and logging messages, I expect it to be no problem to diagnose what's going wrong. But oops, since I did such a nice job internationalizing the product, the log is in some far-away language, say Japanese! Now what?

Let me make the example concrete. Say that the log contains this entry:
(F1231) Bestand bestelling-554 in lijst bestellingen kon niet geopend worden: (F2110) Een verbinding met 
server 35 kon niet tot stand gebracht worden.
Looks foreign to you? (It doesn't to me, but I would have great problems if this example were in Japanese).

Fortunately, the error codes let us make an attempt to figure out what it says by looking them up. However, if there are many log entries, this becomes a laborious affair. Would it be possible to obtain an English log file without tedious lookups and guesswork?

I think there are a few different approaches to this problem:
  1. always create two log files: an English one and a localized one.
  2. store the log in a language-neutral form, and use log viewers that render the log in a localized form
  3. try to automatically "translate" the localized log file into English
The trouble with the first two approaches is that in Java, localization happens early rather than late. Let me explain what I mean by that. If you have a compound message as in the example, the message is localized at each point where information is added to it. The example above could have been produced by the following code sequence:
try {
    throw new Exception(msgcat.get("F2110: Could not establish connection with server {0}", serverid));
} catch (Exception ex) {
    // The inner message is already localized text by the time it is embedded here
    String msg = msgcat.get("F1231: Could not open file {1} in directory {0}: {2}", dir, file, ex);
    throw new IOException(msg, ex);
}
The exception message is already localized when it is thrown. That is, by the time the catch block runs, there is no language-neutral message anymore. It would have been nice if Object.toString() had a close cousin that takes the locale, toString(Locale), and if the Exception class would take an Object instead of limiting itself to a String.

In a previous product where I had more control over the complete codebase, I approached this problem by introducing a specialized text class that supported the toString(Locale) method, and Exception classes that could return this text class. This solution was also ideal for storing text in locale-neutral form in a database, so that different customers could view the data in different locales.
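To make the idea concrete, here is a minimal sketch of what such a text class and exception could look like. It is not the code from that product: the names LocalizableTextSketch, Text, TextException and CATALOG are hypothetical, and the in-memory catalog stands in for real resource bundles. The Dutch F2110 entry is worded to match the sample log line.

```java
import java.text.MessageFormat;
import java.util.Locale;
import java.util.Map;

public class LocalizableTextSketch {
    // Hypothetical in-memory catalog; a real product would load resource bundles.
    static final Map<String, Map<String, String>> CATALOG = Map.of(
        "en", Map.of(
            "F1231", "Could not open file {1} in directory {0}: {2}",
            "F2110", "Could not establish connection with server {0}"),
        "nl", Map.of(
            "F1231", "Bestand {1} in lijst {0} kon niet geopend worden: {2}",
            "F2110", "Een verbinding met server {0} kon niet tot stand gebracht worden."));

    /** Locale-neutral text: an error code plus arguments, localized only on demand. */
    static final class Text {
        final String code;
        final Object[] args;

        Text(String code, Object... args) {
            this.code = code;
            this.args = args;
        }

        /** The "close cousin" of toString(): renders the text in the given locale. */
        String toString(Locale locale) {
            Object[] rendered = new Object[args.length];
            for (int i = 0; i < args.length; i++) {
                // Nested texts are localized late, in the same locale as the outer text
                rendered[i] = (args[i] instanceof Text)
                    ? ((Text) args[i]).toString(locale)
                    : args[i];
            }
            String pattern = CATALOG.get(locale.getLanguage()).get(code);
            return "(" + code + ") " + new MessageFormat(pattern, locale).format(rendered);
        }
    }

    /** Exception that keeps the neutral text next to the usual String message. */
    static final class TextException extends Exception {
        final Text text;

        TextException(Text text, Throwable cause) {
            super(text.toString(Locale.ENGLISH), cause);
            this.text = text;
        }

        Text getText() {
            return text;
        }
    }

    public static void main(String[] args) {
        Text inner = new Text("F2110", "35");
        Text outer = new Text("F1231", "bestellingen", "bestelling-554", inner);
        System.out.println(outer.toString(Locale.forLanguageTag("nl")));
        System.out.println(outer.toString(Locale.ENGLISH));
    }
}
```

Because the nested Text is kept as an object rather than as a String, the same exception can be logged in Dutch for the customer and in English for support.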

There is a kludgy work-around: we could change the msgcat.get() method so that it returns a String that is locale-neutral rather than localized. A separate method would convert the locale-neutral String into a localized String, e.g. msgcat.convert(String, Locale). This method would have to be called any time a String is prepared for viewing, e.g. in the logging code for a localized log.
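A rough sketch of this work-around follows, under assumptions of my own: the neutral String is encoded with control characters as delimiters, the class name NeutralStringSketch and the CATALOG map are hypothetical, and the input is assumed to be well formed. The get() and convert() methods mirror the two methods described above.

```java
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class NeutralStringSketch {
    // Assumed encoding of a neutral message: START code (SEP arg)* END
    private static final char START = '\u0001', SEP = '\u0002', END = '\u0003';

    // Hypothetical catalog standing in for the real resource bundles.
    static final Map<String, Map<String, String>> CATALOG = Map.of(
        "en", Map.of(
            "F1231", "Could not open file {1} in directory {0}: {2}",
            "F2110", "Could not establish connection with server {0}"),
        "nl", Map.of(
            "F1231", "Bestand {1} in lijst {0} kon niet geopend worden: {2}",
            "F2110", "Een verbinding met server {0} kon niet tot stand gebracht worden."));

    /** Returns a locale-neutral encoding instead of localized text. */
    static String get(String code, Object... args) {
        StringBuilder sb = new StringBuilder().append(START).append(code);
        for (Object arg : args) {
            sb.append(SEP).append(arg);
        }
        return sb.append(END).toString();
    }

    /** Localizes every encoded message found in the String; other text is copied as is. */
    static String convert(String neutral, Locale locale) {
        return render(neutral, new int[] {0}, locale, false);
    }

    // Recursive descent over the encoding; pos[0] is the shared read position.
    private static String render(String s, int[] pos, Locale locale, boolean inArg) {
        StringBuilder out = new StringBuilder();
        while (pos[0] < s.length()) {
            char c = s.charAt(pos[0]);
            if (inArg && (c == SEP || c == END)) {
                break; // end of this argument; the caller consumes the delimiter
            }
            pos[0]++;
            if (c != START) {
                out.append(c);
                continue;
            }
            StringBuilder code = new StringBuilder();
            while (s.charAt(pos[0]) != SEP && s.charAt(pos[0]) != END) {
                code.append(s.charAt(pos[0]++));
            }
            List<String> args = new ArrayList<>();
            while (s.charAt(pos[0]) == SEP) {
                pos[0]++;
                args.add(render(s, pos, locale, true)); // arguments may nest messages
            }
            pos[0]++; // skip END
            String pattern = CATALOG.get(locale.getLanguage()).get(code.toString());
            out.append('(').append(code).append(") ")
               .append(new MessageFormat(pattern, locale).format(args.toArray()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String inner = get("F2110", "35");
        String outer = get("F1231", "bestellingen", "bestelling-554", inner);
        System.out.println(convert(outer, Locale.forLanguageTag("nl")));
        System.out.println(convert(outer, Locale.ENGLISH));
    }
}
```

The kludge is visible in the sketch: any code path that forgets to call convert() shows the raw encoding, control characters included, to the user.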

In the products that I am currently working on, these approaches to support locale-neutral strings are not feasible because they would require widespread changes. So let's take a look at option 3.

Given the Dutch resource bundle
F1231 = Bestand {1} in lijst {0} kon niet geopend worden: {2}
F2110 = Een verbinding met server {0} kon niet tot stand gebracht worden.
and its English counterpart
F1231 = Could not open file {1} in directory {0}: {2}
F2110 = Could not establish connection with server {0}
let's see if there is a way to automatically translate the message
(F1231) Bestand bestelling-554 in lijst bestellingen kon niet geopend worden: (F2110) Een verbinding met 
server 35 kon niet tot stand gebracht worden.
into
(F1231) Could not open file bestelling-554 in directory bestellingen: (F2110) Could not establish
connection with server 35
I think it is possible to build a tool that can do that. The tool would read in all known resource bundles (possibly by pointing it to the installation image, after which the tool would scan all jars to extract all resource bundles) and translate them into regular expressions. It would have to be able to recognize error codes (e.g. \([A-Z]\d{4}\)) and use these to successively expand the error message into its full locale-neutral form. In the example, the neutral form is:
[F1231, {0}=bestellingen, {1}=bestelling-554, {2}=[F2110, {0}=35]]
The neutral form can then easily be converted into the localized English form.
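Here is a minimal sketch of how the core of such a tool could work. It is not an existing tool. The class name LogTranslatorSketch, the Message class and the BUNDLES map (a stand-in for bundles scanned from jars) are hypothetical. The sketch assumes plain {n} placeholders without MessageFormat quoting or format types, and the lazy regular expression groups can pick the wrong split if an argument happens to contain the same text as the template.

```java
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogTranslatorSketch {
    // Hypothetical stand-in for resource bundles extracted from the product's jars.
    static final Map<String, Map<String, String>> BUNDLES = Map.of(
        "nl", Map.of(
            "F1231", "Bestand {1} in lijst {0} kon niet geopend worden: {2}",
            "F2110", "Een verbinding met server {0} kon niet tot stand gebracht worden."),
        "en", Map.of(
            "F1231", "Could not open file {1} in directory {0}: {2}",
            "F2110", "Could not establish connection with server {0}"));

    private static final Pattern CODE = Pattern.compile("\\(([A-Z]\\d{4})\\) (.*)", Pattern.DOTALL);
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\d+)\\}");

    /** Locale-neutral form: an error code plus arguments (a String or a nested Message). */
    static final class Message {
        final String code;
        final Object[] args;

        Message(String code, Object[] args) {
            this.code = code;
            this.args = args;
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder("[").append(code);
            for (int i = 0; i < args.length; i++) {
                sb.append(", {").append(i).append("}=").append(args[i]);
            }
            return sb.append(']').toString();
        }
    }

    /** Parses localized text into neutral form; unrecognized text is returned as is. */
    static Object parse(String text, String lang) {
        Matcher m = CODE.matcher(text);
        String template = m.matches() ? BUNDLES.get(lang).get(m.group(1)) : null;
        if (template == null) {
            return text;
        }
        // Turn the template into a regex: literals are quoted, placeholders become groups
        StringBuilder regex = new StringBuilder();
        List<Integer> order = new ArrayList<>();
        Matcher p = PLACEHOLDER.matcher(template);
        int last = 0;
        int size = 0;
        while (p.find()) {
            regex.append(Pattern.quote(template.substring(last, p.start()))).append("(.+?)");
            order.add(Integer.parseInt(p.group(1)));
            size = Math.max(size, order.get(order.size() - 1) + 1);
            last = p.end();
        }
        regex.append(Pattern.quote(template.substring(last)));
        Matcher body = Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(m.group(2));
        if (!body.matches()) {
            return text;
        }
        Object[] args = new Object[size];
        for (int g = 0; g < order.size(); g++) {
            args[order.get(g)] = parse(body.group(g + 1), lang); // expand nested codes
        }
        return new Message(m.group(1), args);
    }

    /** Renders the neutral form in the target locale. */
    static String render(Object neutral, Locale target) {
        if (!(neutral instanceof Message)) {
            return String.valueOf(neutral);
        }
        Message msg = (Message) neutral;
        Object[] rendered = new Object[msg.args.length];
        for (int i = 0; i < rendered.length; i++) {
            rendered[i] = render(msg.args[i], target);
        }
        String template = BUNDLES.get(target.getLanguage()).get(msg.code);
        return "(" + msg.code + ") " + new MessageFormat(template, target).format(rendered);
    }

    public static void main(String[] args) {
        String dutch = "(F1231) Bestand bestelling-554 in lijst bestellingen kon niet geopend worden: "
            + "(F2110) Een verbinding met server 35 kon niet tot stand gebracht worden.";
        Object neutral = parse(dutch, "nl");
        System.out.println(neutral); // [F1231, {0}=bestellingen, {1}=bestelling-554, {2}=[F2110, {0}=35]]
        System.out.println(render(neutral, Locale.ENGLISH));
    }
}
```

A real tool would also need to detect the source locale, handle log lines that wrap, and cope with templates that differ only in punctuation, none of which this sketch attempts.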


Suresh said...

Aren't you speaking essentially about reverse localization ? To get ASCII messages from a localized one. Why the name re localization ?

Frank Kieviet said...

I used the term "relocalization" because the goal is to translate a set of messages from one locale to the other, e.g. from Japanese to English.

Suresh said...

Just google for relocalization and reverse localization you'll get totally different meanings, my point is this may confuse some reader who already are familiar with the terms.

Frank Kieviet said...

I see your problem with the term relocalization. Thanks! I'll change it to "translation".