Tuesday, January 16, 2007

JMSJCA, a feature-rich JMS Resource Adapter, is now a java.net project

JMSJCA is now a java.net project. It can be found here: http://jmsjca.dev.java.net.

The project is currently used in Java CAPS, JMS Grid and the JMS BC as part of the open-jbi-components project.

The connector can be used as a J2EE 1.4 Resource Adapter, but its libraries can also be used as an abstraction layer to JMS servers from non-J2EE code. Used that way, the adapter acts as a library that hides the complexities of transactions, concurrency, connection failure detection, JMS server implementation idiosyncrasies, etc. That is how it is used in the JMS BC.

Sunday, January 14, 2007

Logless transactions

A few months ago, in my blog entry "Transactions, disks, and performance", I discussed the importance of minimizing the number of disk writes. Transaction logging is one of those cases where minimizing the number of writes greatly enhances performance. In this entry, I'll describe a way to avoid transaction logging altogether.

What is transaction logging? Transaction logging refers to persisting the state of a two-phase transaction so that in the event of a crash, the transaction can either be committed or rolled back (recovered). I won't go into the details of what XA is; more information about XA transactions can be found elsewhere, e.g. in Mike Spille's XA Exposed.

Let me illustrate what recovery is using a "diagram". Consider an XA two phase transaction with three Resource Managers (RMa, RMb, and RMc). To indicate what happens at what time, I'll put all actions in a table; each row corresponds to a different time.

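time | RMa                     | RMb                     | RMc                     | Coordinator
-----+-------------------------+-------------------------+-------------------------+----------------
t1   | start(xid1a, TMNOFLAGS) |                         |                         |
t2   |                         | start(xid1b, TMNOFLAGS) |                         |
t3   |                         |                         | start(xid1c, TMNOFLAGS) |
t4   | end(xid1a, TMSUCCESS)   |                         |                         |
t5   |                         | end(xid1b, TMSUCCESS)   |                         |
t6   |                         |                         | end(xid1c, TMSUCCESS)   |
t7   | prepare(xid1a)          |                         |                         |
t8   |                         | prepare(xid1b)          |                         |
t9   |                         |                         | prepare(xid1c)          |
t10  |                         |                         |                         | log
t11  | commit(xid1a, false)    |                         |                         |
t12  |                         | commit(xid1b, false)    |                         |
t13  |                         |                         | commit(xid1c, false)    |
t14  |                         |                         |                         | delete from log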

At t10 the transaction manager records the decision to commit in the log. Let's say that the system crashes after t10, say between t11 and t12. When the system restarts, it will call recover() on all known Resource Managers and it will read the transaction log. In the transaction log it will find that xid1 (the global transaction to which the branches xid1a, xid1b and xid1c belong) was marked for commit. Through recover() it will find that xid1b and xid1c are in doubt. It knows that these two need to be committed because of the commit decision in the log.

What happens if the system crashes before the commit decision is written to the log, for example between t8 and t9? Upon recovery, the recover() methods of RMa and RMb return xid1a and xid1b; RMc returns nothing because prepare was not yet called on it. The transaction manager will roll back xid1a and xid1b because no commit decision was found in the log.
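The recovery rule in these two scenarios can be written down in a few lines. The following is only a sketch: the class and method names are made up for illustration, and Xids are represented as plain strings rather than real Xid objects.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Sketch of log-based recovery; all names here are hypothetical. */
class LoggedRecovery {
    /**
     * @param committedInLog global transaction ids that have a commit record in the log
     * @param inDoubt        in-doubt branch id -> global transaction id, as collected
     *                       by calling recover() on every known Resource Manager
     * @return per in-doubt branch: true = commit it, false = roll it back
     */
    static Map<String, Boolean> decide(Set<String> committedInLog,
                                       Map<String, String> inDoubt) {
        Map<String, Boolean> outcome = new TreeMap<>();
        for (Map.Entry<String, String> branch : inDoubt.entrySet()) {
            // Commit only if the log holds a commit decision for the branch's transaction.
            outcome.put(branch.getKey(), committedInLog.contains(branch.getValue()));
        }
        return outcome;
    }
}
```

For the crash between t11 and t12, the log contains xid1 and the in-doubt branches are xid1b and xid1c, so both map to commit. For the crash between t8 and t9, the log is empty, so xid1a and xid1b both map to rollback.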

SeeBeyond's Logless XA Transactions

Let's take a look at the recover() method on the XAResource. This method returns an array of Xid objects. Each Xid object holds two byte[] arrays. These two arrays represent the global transaction ID and the branch qualifier. They are typically random numbers picked by the transaction manager. The Resource Managers that receive these Xids should use these objects as identifiers and return them in the recover() method unmodified.
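To make the shape of an Xid concrete, here is a minimal implementation of the javax.transaction.xa.Xid interface. Besides the two byte arrays, the interface also carries an integer format id. The class name SimpleXid is made up for this sketch; it is not taken from any particular transaction manager. Note that each array is limited to 64 bytes, which bounds how much information can be stored in it.

```java
import java.util.Arrays;
import javax.transaction.xa.Xid;

/** Minimal Xid sketch; the class name is hypothetical. */
final class SimpleXid implements Xid {
    private final int formatId;
    private final byte[] gtrid; // global transaction id
    private final byte[] bqual; // branch qualifier

    SimpleXid(int formatId, byte[] gtrid, byte[] bqual) {
        // The Xid interface limits both arrays to 64 bytes each.
        if (gtrid.length > Xid.MAXGTRIDSIZE || bqual.length > Xid.MAXBQUALSIZE) {
            throw new IllegalArgumentException("Xid component too long");
        }
        this.formatId = formatId;
        this.gtrid = gtrid.clone();
        this.bqual = bqual.clone();
    }

    @Override public int getFormatId() { return formatId; }
    @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
    @Override public byte[] getBranchQualifier() { return bqual.clone(); }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Xid)) return false;
        Xid other = (Xid) o;
        return formatId == other.getFormatId()
                && Arrays.equals(gtrid, other.getGlobalTransactionId())
                && Arrays.equals(bqual, other.getBranchQualifier());
    }

    @Override public int hashCode() {
        return 31 * (31 * formatId + Arrays.hashCode(gtrid)) + Arrays.hashCode(bqual);
    }
}
```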

At SeeBeyond, Jerry Waldorf and Venugopalan Venkataraman came up with an idea to use the storage space in the byte[] arrays of the Xid as a way to persist the transaction state. Here's how it works. Let's modify the above example by removing transaction logging:

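time | RMa                     | RMb                     | RMc                     | Coordinator
-----+-------------------------+-------------------------+-------------------------+------------
t1   | start(xid1a, TMNOFLAGS) |                         |                         |
t2   |                         | start(xid1b, TMNOFLAGS) |                         |
t3   |                         |                         | start(xid1c, TMNOFLAGS) |
t4   | end(xid1a, TMSUCCESS)   |                         |                         |
t5   |                         | end(xid1b, TMSUCCESS)   |                         |
t6   |                         |                         | end(xid1c, TMSUCCESS)   |
t7   |                         |                         | prepare(xid1c)          |
t8   |                         | prepare(xid1b)          |                         |
t9   | prepare(xid1a)          |                         |                         |
t10  |                         |                         | commit(xid1c, false)    |
t11  |                         | commit(xid1b, false)    |                         |
t12  | commit(xid1a, false)    |                         |                         |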

A commit decision is still being made, but this decision is no longer persisted in a separate transaction log. Instead, it is persisted in xid1a: once RMa has prepared xid1a, the commit decision has been made. If the system finds xid1a upon recovery, it knows that a commit decision was made. If it doesn't find xid1a, it knows that a commit decision was not made. Note that the order in which both prepare and commit are called on the three Resource Managers is very important: RMa must be both the last to be prepared and the last to be committed.
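The ordering constraint can be sketched as follows. The Branch interface is a hypothetical stand-in for an XAResource together with its Xid, reduced to the three calls that matter here; the class and method names are likewise made up, and error handling is simplified.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an XAResource plus the Xid of its branch. */
interface Branch {
    boolean prepare(); // true = votes to commit
    void commit();
    void rollback();
}

/** Sketch of the logless completion order; not an actual transaction manager. */
class LoglessCoordinator {
    static boolean complete(List<? extends Branch> others, Branch decisionHolder) {
        List<Branch> all = new ArrayList<>(others);
        all.add(decisionHolder); // the decision holder is always prepared last

        for (int i = 0; i < all.size(); i++) {
            if (!all.get(i).prepare()) {
                // Assumption: a branch that votes "no" has rolled itself back,
                // so only the other branches need an explicit rollback.
                for (int j = 0; j < all.size(); j++) {
                    if (j != i) all.get(j).rollback();
                }
                return false;
            }
        }
        // The decision holder's successful prepare is the persisted commit decision.
        // Committing in the same order keeps that record alive until every other
        // branch has been committed.
        for (Branch b : all) {
            b.commit();
        }
        return true;
    }
}
```

Called with others = [RMc, RMb] and decisionHolder = RMa, this produces the sequence t7 through t12 of the table above.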

As in the first example, if the system crashes before a commit decision has been made, it will roll back any prepared resources upon recovery. For example, if the system crashes between t8 and t9, it will encounter xid1c and xid1b and will call rollback() on these because it cannot find a record of a commit decision for xid1, i.e. it cannot find xid1a.

If the system crashes after a commit decision has been made, for example between t10 and t11, it will find xid1b and xid1a. Since xid1a signifies a commit decision, both xid1b and xid1a should be committed.

So far so good. But how does the transaction manager know that if it encounters xid1b it should look for xid1a to figure out if a commit decision was made? This is where the transaction manager uses the byte[] arrays of the Xid: it stores this information in one of them.
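The exact byte layout is not described here, so the following is purely an illustration of the idea. It assumes a made-up layout in which the branch qualifier holds the name of the branch's own Resource Manager followed by the name of the Resource Manager that holds the commit decision. Global transaction ids are simplified to strings, and all names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Sketch only: the byte layout and all names below are hypothetical. */
class LoglessXids {
    static final int MAX_BQUAL = 64; // same value as javax.transaction.xa.Xid.MAXBQUALSIZE

    /** Assumed layout: "<this RM>|<RM that holds the commit decision>". */
    static byte[] encodeBranchQualifier(String thisRm, String decisionRm) {
        byte[] bqual = (thisRm + "|" + decisionRm).getBytes(StandardCharsets.UTF_8);
        if (bqual.length > MAX_BQUAL) {
            throw new IllegalArgumentException("branch qualifier too long");
        }
        return bqual;
    }

    /** Reads back which RM holds the commit decision for this branch. */
    static String decisionRmOf(byte[] bqual) {
        String s = new String(bqual, StandardCharsets.UTF_8);
        return s.substring(s.indexOf('|') + 1);
    }

    /**
     * @param gtrid         global transaction id of the in-doubt branch
     * @param bqual         branch qualifier of the in-doubt branch
     * @param recoveredByRm RM name -> global transaction ids that RM returned from recover()
     * @return true if the branch must be committed, false if it must be rolled back
     */
    static boolean shouldCommit(String gtrid, byte[] bqual,
                                Map<String, Set<String>> recoveredByRm) {
        Set<String> atHolder =
                recoveredByRm.getOrDefault(decisionRmOf(bqual), Collections.<String>emptySet());
        // The decision holder still knowing the transaction means a commit decision was made.
        return atHolder.contains(gtrid);
    }
}
```

With this layout, finding xid1b in RMb tells the transaction manager to look in the recover() result of RMa for a branch of the same global transaction.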

Complicating factors

A problem with this scheme occurs when the prepare(xid1a) method returns XA_RDONLY. If that happens, commit(xid1a, false) cannot be called, and RMa will not return xid1a upon calling recover(). Recall that xid1a has special significance: its presence is the commit decision. Hence it is important to order the Resource Managers such that the one on which prepare() is called last (RMa in the example) is both reliable and will not return XA_RDONLY. However, in normal EE applications, the application determines the order in which resources are enlisted in a transaction. Hence, to use this logless transaction scheme, the application server needs to be extended either with a way to specify resources a priori, or with a learning capability, so that it knows which resources are enlisted in a particular operation and can pick the right Resource Manager to hold the commit decision.

The SeeBeyond logless transaction approach is one of the ways that transaction logging can be made less expensive. In a future blog entry, I'll cover additional ones.