[ZODB-Dev] Problem with handling of data managers that join transactions after savepoints

Jim Fulton jim at zope.com
Mon May 10 16:41:17 EDT 2010


The following is complex. Unless you're a ZODB developer or nearly so, you may want to skip this. :)

While looking into a problem we've run into, I found a flaw in the way savepoints are handled. It was exposed by recent tightening of the way transaction-related methods are called.

The problem arises from the way the data manager abort method:

def abort(transaction):
    """Abort a transaction and forget all changes.

    Abort must be called outside of a two-phase commit.

    Abort is called by the transaction manager to abort transactions
    that are not yet in a two-phase commit.
    """

is called. As the documentation says, abort is called when a transaction is aborted, so a data manager whose abort method is called should assume that it is no longer joined to a live transaction. ZODB's Connections assume exactly this.

When a data manager joins a transaction after there have been savepoints in the transaction, there needs to be a way to handle rolling back to the older savepoints. It's too late to ask the data manager for a data-manager savepoint. In this case, a special data-manager savepoint is created that calls abort on the new data manager whenever an older savepoint is rolled back. This use of abort is at odds with the documentation of the abort method, because rolling back a savepoint doesn't abort the transaction.

The problem for ZODB, and presumably for other data managers, is that when abort is called, the data manager (Connection) marks itself as needing to join the transaction. If data are modified in the connection, the connection will join again, at which point the data manager will be doubly joined. When the transaction is committed, the transaction methods tpc_begin, commit, tpc_vote, and tpc_finish are called multiple times. In ZODB 3.10, the second tpc_begin call leads to an error, because it had been called before.

(If a pre-ZODB 3.10 ZEO client talked to a ZODB 3.10.0a1 server, the multiple calls led to a commit lock being held forever on the server, preventing further commits. ZODB 3.10.0a2 detects the multiple calls and raises an error at the second vote call, causing the client transaction to fail and the server to continue committing other transactions.)
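
The double join described above can be reproduced with a small toy model. This is only a sketch: ToyTransaction and ToyConnection are hypothetical stand-ins written for illustration, not the transaction package or ZODB's Connection, and the method rollback_pre_join_savepoint is an invented name for "roll back a savepoint created before the data manager joined".

```python
class ToyTransaction:
    """Toy stand-in for a transaction (hypothetical, not the real package)."""

    def __init__(self):
        self.resources = []  # joined data managers, in join order

    def join(self, dm):
        # No duplicate check, which is what lets the double join happen.
        self.resources.append(dm)

    def rollback_pre_join_savepoint(self, dm):
        # Rolling back an older savepoint calls abort on the late joiner,
        # but the data manager stays in self.resources.
        dm.abort(self)

    def commit(self):
        for dm in self.resources:
            dm.tpc_begin(self)


class ToyConnection:
    """Toy data manager that marks itself as needing to join after abort."""

    def __init__(self):
        self.needs_to_join = True
        self.tpc_begin_calls = 0

    def modify(self, txn):
        if self.needs_to_join:
            txn.join(self)
            self.needs_to_join = False

    def abort(self, txn):
        # Assumes the transaction is over, as the abort documentation implies.
        self.needs_to_join = True

    def tpc_begin(self, txn):
        self.tpc_begin_calls += 1
        if self.tpc_begin_calls > 1:
            raise RuntimeError("tpc_begin called twice")


txn = ToyTransaction()
conn = ToyConnection()
conn.modify(txn)                       # first join
txn.rollback_pre_join_savepoint(conn)  # abort is called; still joined
conn.modify(txn)                       # second join
double_joined = txn.resources.count(conn) == 2

try:
    txn.commit()
    commit_error = None
except RuntimeError as exc:
    commit_error = str(exc)
```

In this model the connection ends up in the resource list twice, and the commit fails at the second tpc_begin call.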

Among the ways to fix this:

A. Change transaction._transaction.AbortSavepoint to remove the data manager from the transaction's resources (joined data managers) when the savepoint is rolled back and abort is called on the data manager. Then, if the data manager rejoins, it will have joined only once.

Update the documentation of the data manager abort method (in IDataManager) to say that abort is called either when a transaction is aborted or when rolling back to a savepoint created before the data manager joined, and that the data manager is no longer joined to the transaction after abort is called.

This is a backward incompatible change to the interface (because it weakens a precondition) that is unlikely to cause harm.
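
As a rough illustration of option A, the sketch below uses a toy savepoint that unjoins the data manager when it is rolled back. All class names here (ToyTransaction, ToyAbortSavepoint, ToyDataManager) are hypothetical, and the code is an assumption about the shape of the fix, not the actual AbortSavepoint implementation.

```python
class ToyTransaction:
    """Toy stand-in for a transaction (hypothetical)."""

    def __init__(self):
        self.resources = []  # joined data managers

    def join(self, dm):
        self.resources.append(dm)


class ToyAbortSavepoint:
    """Toy analogue of the special savepoint made for a late joiner."""

    def __init__(self, dm, txn):
        self.dm = dm
        self.txn = txn

    def rollback(self):
        # Option A: drop the data manager from the joined resources and
        # call abort, so that a later rejoin is the only join.
        self.txn.resources.remove(self.dm)
        self.dm.abort(self.txn)


class ToyDataManager:
    """Toy data manager that rejoins on the next change after abort."""

    def __init__(self):
        self.joined = False
        self.changes = []

    def change(self, txn, value):
        if not self.joined:
            txn.join(self)
            self.joined = True
        self.changes.append(value)

    def abort(self, txn):
        self.changes.clear()
        self.joined = False


txn = ToyTransaction()
dm = ToyDataManager()
dm.change(txn, "first")
ToyAbortSavepoint(dm, txn).rollback()
joined_after_rollback = list(txn.resources)  # empty: dm was unjoined
dm.change(txn, "second")                     # rejoins exactly once
```

After the rollback the data manager is no longer joined, so its assumption in abort matches reality and the rejoin leaves it in the resource list once.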

B. Disallow joining a transaction after there are savepoints.

This makes a common use case more complicated. Suppose I want to do a batch of work made up of work items. I want to commit the batch as a whole and want to skip items when there are problems. In pseudo code this looks like:

  import transaction

  for item in items:
      savepoint = transaction.savepoint()
      try:
          process(item)  # hypothetical helper: do the item of work
      except Exception:
          # there was a problem: skip the item and keep going
          savepoint.rollback()
  transaction.commit()

Note that the first savepoint is created before we do anything. Disallowing joining after savepoints would make this scenario a lot more complicated.

C. Add a new data manager method to handle this use case. The semantics of the new method are that the data manager should discard any changes but should not rejoin the transaction.

If the data manager doesn't support this new method, then an error is raised if a savepoint is rolled back that was created before the data manager joined.

This is more backward incompatible than A and complicates data managers.
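
For comparison, option C might look roughly like the sketch below. The method name discard_changes is invented for illustration (no name was proposed above), and ToyLateJoinSavepoint, NewStyleManager, and OldStyleManager are hypothetical toy classes.

```python
class ToyLateJoinSavepoint:
    """Toy savepoint for a data manager that joined after the savepoint."""

    def __init__(self, dm):
        self.dm = dm

    def rollback(self):
        # "discard_changes" is a hypothetical name for the new method.
        discard = getattr(self.dm, "discard_changes", None)
        if discard is None:
            raise TypeError(
                "data manager cannot roll back a savepoint "
                "created before it joined"
            )
        discard()


class NewStyleManager:
    """Supports the new method: drops changes but stays joined."""

    def __init__(self):
        self.changes = ["pending"]
        self.joined = True

    def discard_changes(self):
        self.changes.clear()  # joined state is deliberately left alone


class OldStyleManager:
    """Only has abort, so the rollback must fail for it."""

    def abort(self, transaction):
        pass


new_dm = NewStyleManager()
ToyLateJoinSavepoint(new_dm).rollback()

try:
    ToyLateJoinSavepoint(OldStyleManager()).rollback()
    old_error = None
except TypeError as exc:
    old_error = exc
```

The sketch makes the cost visible: every existing data manager would need the extra method before savepoints created ahead of its join could be rolled back.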

D. Change the transaction join method to ignore multiple joins of the same data manager. This would just hide a deeper problem, which rarely turns out well in the long term.

I plan to implement A soon if there are no objections.

Unless someone somehow convinces me to do D, I'll also add an assertion in the Transaction.join method to raise an error if a data manager joins more than once.
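
A check of that kind could be sketched as follows. ToyTransaction is a hypothetical stand-in, and the choice of ValueError and the identity comparison are assumptions made for the example, not a description of the real Transaction.join.

```python
class ToyTransaction:
    """Toy stand-in for a transaction (hypothetical)."""

    def __init__(self):
        self.resources = []  # joined data managers

    def join(self, dm):
        # Compare by identity so that data managers which happen to
        # compare equal are not mistaken for the same object.
        if any(dm is joined for joined in self.resources):
            raise ValueError("data manager joined more than once")
        self.resources.append(dm)


txn = ToyTransaction()
dm = object()
txn.join(dm)

try:
    txn.join(dm)
    second_join_error = None
except ValueError as exc:
    second_join_error = str(exc)
```

Unlike option D, this surfaces the double join at the point where it happens instead of hiding it.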

Jim

-- Jim Fulton


