Frank Kieviet's Engineering Notebook: Logless transactions

A few months ago, in my blog entry Transactions, disks, and performance I went into the importance of minimizing the number of writes. Transaction logging is one of those cases where minimizing the number of writes greatly enhances performance. In this entry, I'll describe a way to avoid transaction logging altogether.

What is transaction logging? Transaction logging refers to persisting the state of a two-phase transaction so that in the event of a crash, the transaction can either be committed or rolled back (recovered). I won't go into the details of what XA is; more information about XA transactions can be found elsewhere, e.g. in Mike Spille's XA Exposed.

Let me illustrate what recovery is using a "diagram". Consider an XA two-phase transaction with three Resource Managers (RM_a, RM_b, and RM_c). To indicate what happens when, I'll put all actions in a table; each row corresponds to a different point in time.

time	RM_a	RM_b	RM_c	Coordinator
t1	start(xid1_a, TMNOFLAGS)
t2		start(xid1_b, TMNOFLAGS)
t3			start(xid1_c, TMNOFLAGS)
t4	end(xid1_a, TMSUCCESS)
t5		end(xid1_b, TMSUCCESS)
t6			end(xid1_c, TMSUCCESS)
t7	prepare(xid1_a)
t8		prepare(xid1_b)
t9			prepare(xid1_c)
t10				log
t11	commit(xid1_a, false)
t12		commit(xid1_b, false)
t13			commit(xid1_c, false)
t14				delete from log

At t10 the transaction manager records the decision to commit in the log. Let's say that the system crashes after t10, say between t11 and t12. When the system restarts, it will call recover() on all known Resource Managers and it will read the transaction log. In the transaction log it will find that xid1 was marked for commit. Through recover() it will find that xid1_b and xid1_c are in doubt. It knows that these two need to be committed because of the commit decision in the log.

What happens if the system crashes before the commit decision is written to the log, for example between t8 and t9? Upon recovery, the recover() methods of RM_a, RM_b and RM_c return xid1_a and xid1_b (but not xid1_c, because prepare was not yet called on RM_c). The transaction manager will roll back RM_a and RM_b because no commit decision was found in the log.
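The recovery rule for the logged case can be sketched in a few lines of Java. This is only an illustration under simplifying assumptions: branches and global transactions are identified by plain strings instead of real Xid objects, and the class and method names (LoggedRecovery, decide) are made up for this example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class LoggedRecovery {
    enum Action { COMMIT, ROLLBACK }

    /**
     * Decides what to do with each in-doubt branch after a restart.
     *
     * @param committedInLog global transaction ids that have a commit record in the log
     * @param inDoubt        in-doubt branch id (e.g. "xid1_b") mapped to its global id (e.g. "xid1"),
     *                       as collected from recover() on all Resource Managers
     */
    static Map<String, Action> decide(Set<String> committedInLog, Map<String, String> inDoubt) {
        Map<String, Action> result = new TreeMap<>();
        for (Map.Entry<String, String> branch : inDoubt.entrySet()) {
            // A branch is committed only if the log holds a commit decision for its transaction.
            boolean decided = committedInLog.contains(branch.getValue());
            result.put(branch.getKey(), decided ? Action.COMMIT : Action.ROLLBACK);
        }
        return result;
    }
}
```

For the crash between t11 and t12, the inputs would be the log entry xid1 plus the in-doubt branches xid1_b and xid1_c, and both are committed. For the crash between t8 and t9, the log is empty, so xid1_a and xid1_b are rolled back.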

SeeBeyond's Logless XA Transactions

Let's take a look at the recover() method on the XAResource. This method returns an array of Xid objects. Each Xid object holds two byte[] arrays. These two arrays represent the global transaction ID and the branch qualifier. They are typically random numbers picked by the transaction manager. The Resource Managers that receive these Xids should use these objects as identifiers and return them in the recover() method unmodified.

At SeeBeyond, Jerry Waldorf and Venugopalan Venkataraman came up with an idea to use the storage space in the byte[] arrays of the Xid as a way to persist the transaction state. Here's how it works. Let's modify the above example by removing transaction logging:

time	RM_a	RM_b	RM_c	Coordinator
t1	start(xid1_a, TMNOFLAGS)
t2		start(xid1_b, TMNOFLAGS)
t3			start(xid1_c, TMNOFLAGS)
t4	end(xid1_a, TMSUCCESS)
t5		end(xid1_b, TMSUCCESS)
t6			end(xid1_c, TMSUCCESS)
t7			prepare(xid1_c)
t8		prepare(xid1_b)
t9	prepare(xid1_a)
t10			commit(xid1_c, false)
t11		commit(xid1_b, false)
t12	commit(xid1_a, false)

A commit decision is still being made, but this decision is no longer persisted in a separate transaction log. Instead, it is persisted in xid1_a. If the system finds xid1_a upon recovery, it knows that a commit decision was made. If it doesn't find xid1_a, it knows that a commit decision was not made. Note that the order in which both prepare and commit are called on the three Resource Managers is very important: RM_a is prepared last, so xid1_a only exists once all other branches have been prepared, and RM_a is committed last, so xid1_a remains visible for as long as any other branch is still in doubt.

As in the first example, if the system crashes before a commit decision has been made, it will roll back any resources upon recovery. E.g. if the system crashes between t8 and t9, it will encounter xid1_c and xid1_b and will call rollback() on these because it cannot find a record of a commit decision for xid1, i.e. it cannot find xid1_a. Hence, xid1_b and xid1_c need to be rolled back.

If the system crashes after a commit decision has been made, for example between t10 and t11, it will find xid1_b and xid1_a. Since xid1_a signifies a commit decision, both xid1_b and xid1_a should be committed.
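The same rule without a log can be sketched as follows. Again, this is a simplified illustration rather than the actual SeeBeyond implementation: branches are plain strings, and the Branch type with its decisionBranchId field is a hypothetical stand-in for whatever the transaction manager stores inside the Xid.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class LoglessRecovery {
    enum Action { COMMIT, ROLLBACK }

    /** Hypothetical in-doubt branch: its own id plus the id of the branch that records the decision. */
    static final class Branch {
        final String id;
        final String decisionBranchId;

        Branch(String id, String decisionBranchId) {
            this.id = id;
            this.decisionBranchId = decisionBranchId;
        }
    }

    /** Takes all branches returned by recover() across the Resource Managers. */
    static Map<String, Action> decide(Collection<Branch> inDoubt) {
        Set<String> present = new HashSet<>();
        for (Branch b : inDoubt) {
            present.add(b.id);
        }
        Map<String, Action> result = new TreeMap<>();
        for (Branch b : inDoubt) {
            // The presence of the decision branch (xid1_a in the example) is the commit record.
            boolean decided = present.contains(b.decisionBranchId);
            result.put(b.id, decided ? Action.COMMIT : Action.ROLLBACK);
        }
        return result;
    }
}
```

With xid1_c and xid1_b in doubt (crash between t8 and t9), the decision branch xid1_a is absent and both are rolled back. With xid1_b and xid1_a in doubt (crash between t10 and t11), both are committed.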

So far so good. But how does the transaction manager know that if it encounters xid1_b it should look for xid1_a to figure out if a commit decision was made? This is where the transaction manager uses the byte[] arrays of the Xid: it stores this information in one of them.
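To make this concrete, here is one way such information could be packed into the branch qualifier. The layout is purely an assumption for illustration (two 4-byte integers: the index of this branch and the index of the branch that holds the commit decision); the post does not describe the actual SeeBeyond encoding. The class name LoglessXid, the format id, and the helper methods are likewise made up. Only the Xid interface from javax.transaction.xa is real.

```java
import java.nio.ByteBuffer;
import javax.transaction.xa.Xid;

final class LoglessXid implements Xid {
    private static final int FORMAT_ID = 0x4C4C; // arbitrary value chosen for this sketch

    private final byte[] globalId;
    private final byte[] branchQualifier;

    LoglessXid(byte[] globalId, int branchIndex, int decisionBranchIndex) {
        this.globalId = globalId.clone();
        // Assumed layout: bytes 0-3 = this branch, bytes 4-7 = branch holding the decision.
        this.branchQualifier = ByteBuffer.allocate(8)
                .putInt(branchIndex)
                .putInt(decisionBranchIndex)
                .array();
    }

    @Override public int getFormatId() { return FORMAT_ID; }
    @Override public byte[] getGlobalTransactionId() { return globalId.clone(); }
    @Override public byte[] getBranchQualifier() { return branchQualifier.clone(); }

    /** Reads the branch index back from any Xid that uses the assumed layout. */
    static int branchIndex(Xid xid) {
        return ByteBuffer.wrap(xid.getBranchQualifier()).getInt(0);
    }

    /** Reads which branch records the commit decision for this transaction. */
    static int decisionBranchIndex(Xid xid) {
        return ByteBuffer.wrap(xid.getBranchQualifier()).getInt(4);
    }

    /** True if this Xid is itself the record of the commit decision. */
    static boolean holdsDecision(Xid xid) {
        return branchIndex(xid) == decisionBranchIndex(xid);
    }
}
```

Because Resource Managers return Xids unmodified from recover(), the transaction manager can read these fields back after a crash: on encountering the Xid of branch 1 it sees that branch 0 holds the decision and checks whether that Xid was recovered too.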

Complicating factors

A problem in this scheme occurs when the prepare(xid1_a) method returns XA_RDONLY. If that happens, commit(xid1_a, false) cannot be called, and RM_a will not return xid1_a upon calling recover(). Recall that xid1_a has special significance: its presence is the record of the commit decision. Hence it is important to order the Resource Managers such that the one that holds the commit decision (RM_a in the example above, the last one on which prepare() and commit() are called) is both reliable and will not return XA_RDONLY. However, in normal EE applications, the application prescribes the order in which resources are enlisted in a transaction. Hence, to use this logless transaction scheme, the application server needs to be extended in one of two ways: either with a way to specify resources a priori, or with a learning capability, so that it knows which resources are enlisted in a particular operation and can pick the right Resource Manager to write the commit decision to.

The SeeBeyond logless transaction approach is one of the ways that transaction logging can be made less expensive. In a future blog, I'll cover additional ones.

1 comment:

Frank Kieviet said...: Hi Ludovic,

To your first question, how recover() knows about all resource managers: the application server asks the transaction manager to recover and passes it a list of XAResources. The application server obtains this list by going over all resource adapters that are deployed in the server, i.e. all global resource adapters and JDBC connection pools, as well as all resource adapters embedded in EAR files.

How can you prevent prepare() from returning XA_RDONLY? You would have to know about the resource managers. For instance, if you know the internal architecture of a particular JMS server, you can say with certainty that it won't ever return XA_RDONLY. On the other hand, Oracle is not a good candidate because it may return XA_RDONLY. Yes, having to know the internal details of the participating resource managers is a major drawback of this approach.

To your third question about performance: we haven't measured concurrent processing versus a serial approach, so I can't say much about that. By the way, a concurrent approach has another interesting effect: it decreases the time between prepare() and commit(), which may matter more than throughput. Because the logless scheme depends on calling the Resource Managers in a fixed order, that is another potential drawback of this approach.

Frank; March 13, 2012 at 9:08 PM

Sunday, January 14, 2007