Design Considerations When Using Transactionality

Neil Stevenson | Sep 8, 2022

Database transactions often underpin online business transactions. The database transactions are the brakes if we compare business transactions to race cars. Just as the fastest driver is the one that uses the brakes the least, the quickest business transactions are the ones that depend on database transactions the least.

And, of course, you can’t eliminate brakes, as you still need to worry about safety. So, the goal is to use the right amount of braking (or, in our case, the right amount of database transactions).

This blog post will examine how your application can be faster without sacrificing safety.

Action and transaction

We will follow a familiar scenario, moving money between bank accounts.

The “Action” is the business event, the logical view.

Move $10 from account A to account Z.

The “Transaction” is the technical event, the implementation. This may include commit or rollback, be ACID or BASE, and is a way to ensure correctness.

The coding choice

The outcome of the action is that account A has $10 less, and account Z has $10 more.

A transaction is required if two conditions hold.

  1. It is implemented as a two-step operation.
  2. You wish it to appear as a one-step operation. (Atomicity)

Condition 1 is just the choice of book-keeping method.

Condition 2 is usually described as a requirement, but it may just be a wish.

Let’s review the alternatives.

Single-entry and double-entry accounting

As background, consider the classic accounting systems.

Single-entry dates to 3000 BC; double-entry is newer, dating from 1494 AD.

The single-entry system might have this list of actions for $50 deposits and the $10 transfer from above.

Item 1 : Account A : $50 deposit
Item 2 : Account Z : $50 deposit
Item 3 : Account A : $10 transfer : to : Account Z

In the double-entry system, actions are recorded twice. Each account has its list of actions.

Account A:
Item 1 : $50 deposit
Item 2 : $10 transfer : to : Account Z

&

Account Z:
Item 1 : $50 deposit
Item 2 : $10 transfer : from : Account A
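As a rough sketch of the two layouts, the same three actions could be held as plain Python data. The names `single_entry` and `double_entry` and the field names are illustrative assumptions, not taken from any real system.

```python
# Single-entry: one shared list; each action is recorded once.
single_entry = [
    {"item": 1, "account": "A", "kind": "deposit", "amount": 50},
    {"item": 2, "account": "Z", "kind": "deposit", "amount": 50},
    {"item": 3, "account": "A", "kind": "transfer", "amount": 10, "to": "Z"},
]

# Double-entry: one list per account; the transfer is recorded in both.
double_entry = {
    "A": [
        {"item": 1, "kind": "deposit", "amount": 50},
        {"item": 2, "kind": "transfer", "amount": 10, "to": "Z"},
    ],
    "Z": [
        {"item": 1, "kind": "deposit", "amount": 50},
        {"item": 2, "kind": "transfer", "amount": 10, "from": "A"},
    ],
}

# Same business actions, different number of records to write.
records_single = len(single_entry)                                   # 3
records_double = sum(len(items) for items in double_entry.values())  # 4
```

The extra record in the double-entry layout is the second write for the transfer, which is the two-step operation discussed above.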

Double-entry is easier for humans as the volume of actions increases. Computer systems typically mirror business processes, and the implementation where one action requires two data updates naturally follows.

Implementation 1 – double-entry with transactions

Following from above, what is wrong with the classic approach, the ACID transaction?

Here it would be:

Start transaction.
Decrement account A balance by $10.
Increment account Z balance by $10.
Commit.

It’s an all-or-nothing approach and is very appealing.

Both accounts show their previous balance; then both show the new balance.

In the rare event of some IT crash, the transaction is rolled back.
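A minimal sketch of this all-or-nothing behavior, assuming Python's built-in `sqlite3` module and a hypothetical `account` table. The `crash_midway` flag is an illustrative stand-in for an IT crash between the two updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 50), ("Z", 50)])
conn.commit()

def transfer(conn, source, target, amount, crash_midway=False):
    """Move `amount` from `source` to `target` as one all-or-nothing unit."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if the block raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source)
        )
        if crash_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target)
        )

def read_balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))

transfer(conn, "A", "Z", 10)
after_commit = read_balances(conn)    # {"A": 40, "Z": 60}

try:
    transfer(conn, "A", "Z", 10, crash_midway=True)
except RuntimeError:
    pass
after_rollback = read_balances(conn)  # unchanged: {"A": 40, "Z": 60}
```

The second call debits account A and then fails, and the rollback undoes that debit, so neither account changes.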

So let’s review two things that are wrong with this approach.

Speed

Update access to account A and account Z needs to be suspended for everyone else for the duration of updating both data records.

If you were to code this yourself, you’d use locks. Locks stop other processing, which is why they affect application speed.

Nothing else may be trying to update accounts A or Z at this time, so you might think nothing is delayed. But there is still the time cost to lock and unlock.

A transaction is essentially just locking, handled for you.
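To illustrate what coding this yourself might look like, here is a sketch using Python's `threading.Lock`, with one lock per account. The fixed lock ordering is an addition of this sketch to avoid deadlock between opposite transfers; it is not something described above.

```python
import threading

balances = {"A": 50, "Z": 50}
locks = {"A": threading.Lock(), "Z": threading.Lock()}

def transfer(source, target, amount):
    """Update both balances while holding both account locks."""
    # Always lock in sorted order so that A->Z and Z->A transfers cannot
    # deadlock each other. Assumes source and target are different accounts.
    first, second = sorted((source, target))
    with locks[first], locks[second]:
        # Everyone else is shut out of both accounts for the duration.
        balances[source] -= amount
        balances[target] += amount

threads = [threading.Thread(target=transfer, args=("A", "Z", 10)) for _ in range(3)]
threads += [threading.Thread(target=transfer, args=("Z", "A", 10)) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
# Net effect: three transfers of $10 out of A, two back in.
```

Even when no other thread is competing, every call still pays to acquire and release two locks.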

Correctness & Isolation

Imagine while we ran the above transaction that someone else ran a query summing account balances, a scan of all account records.

First, you might start your transaction.

Then the query might obtain the balance of account A. Your transaction is incomplete, so the query gets the old value for account A ($50).

Then your transaction completes successfully.

Then the query might obtain the balance of account Z. Your transaction is complete, so the query gets the new value for account Z ($60).

So the query returns $110 instead of $100, even though your update was transactional.

Here the transaction has “write” isolation. Both writes happened atomically, so the transaction itself is correct. But a concurrent read from elsewhere sees an incorrect result. “Read” isolation would stop the concurrent read while the transaction runs, meaning the data can only have one user at a time, which is unacceptable.
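The interleaving can be replayed deterministically in a few lines. This is only a sketch: `transfer_atomically` is a hypothetical stand-in for a committed transaction, and the "query" is simply two reads taken on either side of it.

```python
balances = {"A": 50, "Z": 50}

def transfer_atomically(source, target, amount):
    # Stand-in for a committed transaction: both writes land together.
    balances[source] -= amount
    balances[target] += amount

# The summing query reads account A before the transfer commits...
seen_a = balances["A"]             # 50, the old value
transfer_atomically("A", "Z", 10)
# ...and reads account Z after the transfer has committed.
seen_z = balances["Z"]             # 60, the new value

query_total = seen_a + seen_z      # 110
true_total = sum(balances.values())  # 100
```

Both writes were applied together, yet the scan still mixes one old value with one new value.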

There are many other such scenarios exposing logical flaws in transactions.

Implementation 2 – double-entry without transactions

Imagine we removed the transaction wrapper from the above, leaving two independent account updates for the one action.

What do we win? What do we lose?

We win obviously on speed. There are now no locks needed.

Concurrent queries are no better or worse than before. The query may still return $100 or $110, depending on how the race plays out.

What we think we have lost is guaranteed consistency, but have we?

Guaranteed Consistency

The worry in the above scenario is that account A is updated and then the system fails, so account Z isn’t updated.

A transaction stops this, despite its other problems, as already noted.

What a transaction guarantees is immediate consistency.

Eventual consistency will frequently be acceptable.

We would expect failures to be rare. We would expect to know about them promptly. So on any such failure, we just run a one-off process to complete any half-done business action.
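One way such a one-off repair could work is sketched below. It assumes each account record also remembers which action ids it has already applied, so a step can be safely repeated; that bookkeeping, and the names `apply_step`, `transfer` and `repair`, are assumptions of this sketch rather than anything prescribed above.

```python
accounts = {
    "A": {"balance": 50, "applied": set()},
    "Z": {"balance": 50, "applied": set()},
}

def apply_step(account_id, action_id, delta):
    """Apply one side of an action to one record, at most once."""
    record = accounts[account_id]
    if action_id in record["applied"]:
        return  # already applied, so repeating it is harmless
    record["balance"] += delta
    record["applied"].add(action_id)

def transfer(action_id, source, target, amount, fail_midway=False):
    """Two independent updates, with no transaction around them."""
    apply_step(source, action_id, -amount)
    if fail_midway:
        raise RuntimeError("simulated failure after the first update")
    apply_step(target, action_id, +amount)

def repair(action_id, source, target, amount):
    """One-off process: rerun both steps; completed steps are skipped."""
    apply_step(source, action_id, -amount)
    apply_step(target, action_id, +amount)

try:
    transfer("t1", "A", "Z", 10, fail_midway=True)
except RuntimeError:
    pass
half_done = (accounts["A"]["balance"], accounts["Z"]["balance"])  # (40, 50)

repair("t1", "A", "Z", 10)
repaired = (accounts["A"]["balance"], accounts["Z"]["balance"])   # (40, 60)
```

Between the failure and the repair the accounts disagree, which is the eventual-consistency window the business would have to accept.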

Implementation 3 – single-entry

The third approach would be to implement single-entry accounting, as machines can handle this at scale even if humans can’t.

Reviewing what we saw before, this is just an event journal!

Item 1 : Account A : $50 deposit
Item 2 : Account Z : $50 deposit
Item 3 : Account A : $10 transfer : to : Account Z

Each line item is a single line, either written or not.

We have consistency and no need for locks.

If we wish to know the current balance of account A, it’s just a query against the event journal. But if there is a lot of data, this query may take enough time to run that it is noticeable to the human eye. So we might instead go with a materialized view.
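As a sketch of the "query against the event journal", the balance can be folded from the three journal items. The list-of-dicts representation and the `balance` function are illustrative assumptions.

```python
journal = [
    {"seq": 1, "account": "A", "kind": "deposit", "amount": 50},
    {"seq": 2, "account": "Z", "kind": "deposit", "amount": 50},
    {"seq": 3, "account": "A", "kind": "transfer", "amount": 10, "to": "Z"},
]

def balance(journal, account):
    """Compute an account balance by scanning every journal entry."""
    total = 0
    for event in journal:
        if event["kind"] == "deposit":
            if event["account"] == account:
                total += event["amount"]
        elif event["kind"] == "transfer":
            if event["account"] == account:  # money leaving
                total -= event["amount"]
            if event["to"] == account:       # money arriving
                total += event["amount"]
    return total

balance_a = balance(journal, "A")  # 40
balance_z = balance(journal, "Z")  # 60
```

The scan touches every entry, which is why its cost grows with the size of the journal.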

What is stored?

To review, what is stored with the different approaches?

We will store records for each account and all their actions.

For double-entry, the account holds the current balance.

For single-entry, the account may not hold the current balance; we might calculate it when needed from the actions.

Or, for single-entry, we might refresh the account with a balance using a materialized view.

Materializing a view

For our single-entry system, we might refresh the balance continuously or periodically.

A stream processing job could observe the event journal. When a new action is written, the affected accounts can be updated.

Or, a scheduled task could scan the event journal to do a similar thing.

For either mechanism, we need to consider failure. If we replay events, we need to know whether or not to apply them. Events are sequential, so this is as simple as recording the last sequence number that the balance relates to.
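A minimal sketch of that idea follows, assuming each materialized balance stores the sequence number of the last event applied to it. The `view` layout and function names are hypothetical.

```python
journal = [
    {"seq": 1, "account": "A", "kind": "deposit", "amount": 50},
    {"seq": 2, "account": "Z", "kind": "deposit", "amount": 50},
    {"seq": 3, "account": "A", "kind": "transfer", "amount": 10, "to": "Z"},
]

def deltas(event):
    """Yield (account, signed amount) pairs for one journal event."""
    if event["kind"] == "deposit":
        yield event["account"], event["amount"]
    elif event["kind"] == "transfer":
        yield event["account"], -event["amount"]
        yield event["to"], event["amount"]

def refresh(view, journal):
    """Apply journal events to the view, skipping any already applied."""
    for event in journal:  # the journal is assumed to be in sequence order
        for account, delta in deltas(event):
            entry = view.setdefault(account, {"balance": 0, "last_seq": 0})
            if event["seq"] <= entry["last_seq"]:
                continue  # this balance already reflects the event
            entry["balance"] += delta
            entry["last_seq"] = event["seq"]

view = {}
refresh(view, journal)
refresh(view, journal)  # a replay after a failure changes nothing
```

Because each balance carries its own sequence number, a replay after a failure between the two sides of a transfer applies only the missing side.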

Consistency once again

In many of the approaches above, when the action has been applied, the balances for account A and account Z do not update at precisely the same time. One is soon after the other.

From a bank customer’s perspective, this is fine.

It wouldn’t be unusual for the owners of accounts A and Z to know each other, since one is sending money to the other. Account A’s owner would see the reduced balance. If account Z’s owner doesn’t see the money arrive immediately but does see it arrive soon afterwards, that’s ok. A delay of more than a few minutes, or even seconds, would be poor by modern standards.

Reconciliation is the safety net. A process or processes apply the actions to the accounts. If they fail, we can rerun them. But it’s only software, so there should always be distrust. A diligent bank will have cross-checks running anyway to ensure everything has been applied at least by the end of the working day.
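Such a cross-check might look like the sketch below, which recomputes the expected balances from the journal and reports any stored balance that disagrees. The function names and the sample `stored` data are assumptions for illustration.

```python
journal = [
    {"seq": 1, "account": "A", "kind": "deposit", "amount": 50},
    {"seq": 2, "account": "Z", "kind": "deposit", "amount": 50},
    {"seq": 3, "account": "A", "kind": "transfer", "amount": 10, "to": "Z"},
]

def expected_balances(journal):
    """Recompute every balance from scratch from the recorded actions."""
    expected = {}
    for event in journal:
        amount = event["amount"]
        if event["kind"] == "deposit":
            expected[event["account"]] = expected.get(event["account"], 0) + amount
        elif event["kind"] == "transfer":
            expected[event["account"]] = expected.get(event["account"], 0) - amount
            expected[event["to"]] = expected.get(event["to"], 0) + amount
    return expected

def reconcile(journal, stored):
    """Return {account: (stored, expected)} for every balance that disagrees."""
    expected = expected_balances(journal)
    return {
        account: (stored.get(account), want)
        for account, want in expected.items()
        if stored.get(account) != want
    }

# Account Z has not yet received the $10 from the transfer.
stored = {"A": 40, "Z": 50}
mismatches = reconcile(journal, stored)  # {"Z": (50, 60)}
```

An empty result means every action has been applied; anything else is a list of accounts that need the rerun.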

Summary

Transactions slow down processing. Transactions do not ensure correctness in the broader sense. They run correctly but do not guarantee that others do not see inconsistent data.

If you choose a single-entry approach, you don’t need transactions. If you choose a double-entry approach, you might or might not need transactions. It depends on the consistency model you can agree upon with the business user.

About the Author

Neil Stevenson

CTO, Hazelcast Platform

Neil is a solution architect for Hazelcast®, the industry-leading in-memory computing platform. In more than 30 years of work in IT, Neil has designed, developed and debugged a number of software systems for companies large and small.