Friday, January 15, 2010

How to enlist and delist XA resources: trial and error

Once in a while a problem pops up with how a middleware stack supports XA. The problem I'm discussing here is the one of how to exactly enlist and delist XAResource-s with isSameRM()returning true. The XA spec is vague on this very topic, so this warrants some exploration.

First of all, what do I mean with "enlist and delist XAResource-s with isSameRM()returning true?"

Enlisting and delisting

When using an XA capable system, the connection needs to be enlisted in the transaction before it is used. A typical sequence on the XAResource, let's call it r1, is:

   1: r1.start(xid, TMNOFLAGS);
   2: ... // use (r1)
   3: r1.end(xid, TMSUCCESS);
   4: r1.commit(xid, true);

With a second resource in the mix, the call sequence may become:

   1: r1.start(xid1, TMNOFLAGS);
   2: ... // use r1
   3: r2.start(xid2, TMNOFLAGS);
   4: ... // use r2
   5: r2.end(xid2, TMSUCCESS);
   6: r1.end(xid1, TMSUCCESS);
   7: r1.prepare(xid1);
   8: r2.prepare(xid2);
   9: p1.commit(xid1, false);
  10: p2.commit(xid2, false);

The transaction is now escalated to a full two-phase transaction. But if r1 and r2 represent the same resource manager, a shortcut can be taken: one resource can "piggy back ride" on the other resource. The isSameRM() method can be used to check if this should be attempted. If it returns true, the sequence may become:

   1: // MULTIPLE ACTIVE
   2: r1.start(xid, TMNOFLAGS);
   3: r1.isSameRM(r2); // returns true
   4: r2.start(xid, TMJOIN);
   5: r2.end(xid, TMSUCCESS);
   6: r1.end(xid, TMSUCCESS);
   7: r1.commit(xid, true);

The transaction now again is a single-phase transaction, and the performance difference with the two-phase commit case is usually very significant.

This sequence doesn't work for a number of popular systems, e.g. MQSeries and Oracle. Unfortunately the XA specification, but that specification is not quite clear at all about what resource managers should be able to do, so there is to some extent some trial and error involved.

Trial and error

How should isSameRM() be used then? By testing a number of different systems, it turns out that for some systems, there can be only one enlisted XAResource active in the transaction. If a second XAResource joins the transaction, the first one should be deactivated. Here is an example:

   1: // SINGLE ACTIVE 1
   2: r1.start(xid, TMNOFLAGS);
   3: r1.isSameRM(r2); // should return true
   4: ... // use r1
   5: r1.end(xid, TMSUSPEND);
   6:  
   7: r2.start(xid, TMJOIN);
   8: ... // use r2
   9: r2.end(xid, TMSUCCESS);
  10:  
  11: r1.start(xid, TMRESUME);
  12: ... // use r1 again
  13: r1.end(xid, TMSUCESS);
  14:  
  15: r1.commit(xid, true);
A variation of this is calling TMSUCCESS instead of TMSUSPEND. In that case TMJOIN should be called instead of TMRESUME:
   1: // SINGLE ACTIVE 2
   2: r1.start(xid, TMNOFLAGS);
   3: r1.isSameRM(r2); // should return true
   4: ... // use r1
   5: r1.end(xid, TMSUCCESS);
   6:  
   7: r2.start(xid, TMJOIN);
   8: ... // use r2
   9: r2.end(xid, TMSUCCESS);
  10:  
  11: r1.start(xid, TMJOIN);
  12: ... // use r1 again
  13: r1.end(xid, TMSUCESS);
  14:  
  15: r1.commit(xid, true);
I ran these tests on a number of systems, and here is what I found:
  isSameRM() Test: multiple active Test: single active 1 Test: single active 2
STCMS yes yes yes yes
JMQ as of 4.4 yes no: throws on line 7 yes
WebSphereMQ 6 yes no: blocks on line 4 no: blocks on line 7 yes
Derby 10.5.3.0 yes no: blocks on line 4 yes yes
Oracle 11.1.0.6 yes no: blocks on line 4 yes yes
MySQL 5.1 yes no: throws on line 4 no: throws on line 5 no: throws on line 7

As can be seen, the SINGLE ACTIVE 2 code sequence works best.

As a side note, MySQL is showing surprising behavior: TMJOIN and TMSUSPEND are not supported (as documented in the MySQL documentation), so why does it bother to return true on isSameRM()? Behavior like this makes it difficult to write portable code: a container now has to provide a wrapper around the DataSource that corrects for this. It would have been much better if it simply had returned false on isSameRM().

How does this relate to what application code can do in for instance a Java EE container? That's the topic of a different blog post.

3 comments:

sysprv said...

That's great :) hope this leads to better compatibility between CAPS/GFESB and WebSphere MQ.

And this under-documented mystery: http://www.google.com/search?q=oracle-xa-recovery-workaround&ie=utf-8&oe=utf-8&

fkieviet said...

Re sysprv:

Indeed, this highlights a problem with GlassFish and MQSeries; a problem I built a workaround for in JMSJCA, but based on this information, the workaround can be made better performing.

Re Oracle recovery: you'd wish there was a CTK that would test XA implementations!

Frank

Ludovic Orban said...

I really am digging up an old subject but I wanted to add my 2 cents.

Here is what the XA spec says on the subject (chapter 3.3, 'join' paragraph, page 14):

"If multiple threads use an RM on behalf of the same XID, the
RM is free to serialise the threads' work in any way it sees fit. For example, an RM may block a second or subsequent thread while one is active."

As for the usefulness of isSameRm(), I've seen so many creative (ab)uses for that method and so many different implementations that I'm not sure anymore what it's really supposed to do.

That's another subject I could have added over there: http://blog.bitronix.be/2011/02/why-we-need-jta-2-0/

Oh well, I'm ranting again...