Initial release.
This document introduces TransactionKit, a C library for lockless multi-reader, multi-writer transaction capable hash tables. In addition to the C library, support for Objective-C is available in the form of two classes: TKDictionary and TKMutableDictionary. Both are subclasses of their Foundation counterparts, but use TransactionKit to provide the storage for keys and objects making them safe to use from multiple threads.
In its current form, TransactionKit is loosely modeled after the OpenStep Foundation / Mac OS X Cocoa Foundation C-based NSMapTable. In fact, there is a wrapper API for the C-based NSMapTable and NSHashTable that is nearly identical to their Foundation equivalents. In addition to this, there is a NSMutableDictionary compatible Objective-C object that provides easy access to TransactionKits features.
The core of TransactionKit is a hash table which allows for storing and retrieving key / value pairs. The difference between TransactionKits hash table and other hash tables is TransactionKits hash table allows for lockless, concurrent read and write access to the hash table by any number of threads.
This is done by using a handful of common atomic operations such as increment, add, decrement, logical AND, logical OR, and Compare and Swap. In practice, for most architectures / higher level abstractions, these are mostly just fancy Compare and Swap, or CAS, operations in disguise. Another important atomic primitive is the singly linked list, which is almost always a fancy CAS based operation as well. This primitive allows for the atomic enqueueing and dequeueing of items that forms an "atomic push and pop stack".
In very simple terms, these primitives are used to build a data structure that is a hash table with Multi-Version Concurrency Control, or MVCC. Each access, either read or write, requires a transaction be opened and then closed when finished. Each transaction is assigned a transaction number that is unique for that table. The transaction number is nothing more than an atomically incrementing counter.
Logically, a transaction creates a snapshot of the contents of a hash table at the moment the transaction is opened. As long as the transaction remains open, that frozen-in-time snapshot version of the hash table is available. When a transaction closes, the resources used to maintain that snapshot can be released. This allows a thread to start a transaction that begins to enumerate the contents of the hash table. Then, a second thread can begin another transaction that begins to remove some items from the hash table. The first thread is unaffected because of MVCC, it still has the frozen snapshot version of the hash table from when its transaction began. The first thread can still perform a search and retrieve an item even after the second thread has deleted it from the hash table. No mutations to the hash table that occur in transactions created after the first threads transaction are visible to the first thread. If a third thread begins a transaction, its snapshot version of the hash table will not contain the deleted items because its snapshot was created after the second threads snapshot.
In its current form, TransactionKit does not provide a means for completely serializable transactions. In fact, the current transaction isolation level provided by TransactionKit is extremely simple, essentially a Mutated At Transaction Number <= Current Transaction Number, which can produce some non-deterministic, non-repeatable read results if the mutating transaction performs a rollback. The hash table itself remains completely consistent at all times, however. For right now this is "good enough" and is perfectly useable for many common uses.
Transaction isolation is fairly relaxed. The goal has not been to achieve perfect, serialized transaction isolation, but to provide a degree of isolation thats useable in practice. This means that there are cases where the contents of the hash table for a given transaction may change slightly from moment to moment under specific circumstances. Take, for example, two transactions. The first, earlier transaction begins to remove some items from the table before the later one begins. Then, if the later transaction begins to enumerate the contents of the table before the earlier transaction commits or rolls back, if the later transaction enumerates the table again, it may enumerate a different set of results depending on whether or not the first transaction rollsback. If the earlier transaction does rollback, the items that the earlier transaction removed up to the point when the later transaction began may suddenly "appear" in the results of subsequent enumerations of the later transaction. As is often the case, there are trade-offs in terms of performance and the degree of concurrency possible.
To achieve a tighter degree of isolation would probably require a much more complicated mechanism for conflict resolution at transaction commit time, including support from the application using TransactionKit to try to perform all the steps again of the transaction that just failed to commit properly. This significantly increases complexity all around, and for typical uses of simple hash tables, is probably not required by most applications. In terms of SQL transaction isolation levels, TransactionKit is probably closest to Read Uncommitted, or in other words the lowest degree of isolation possible. Again, in practice, this is usually more than sufficient.
Keeping with the "good enough for now" theme, tracking which transactions are outstanding is currently very simple. In short, when there are no transactions open on the table, the transaction number that closed without any transactions open is used to mark the lowest open transaction number which can possibly still be referencing data in the hash table. In practice this simple system is sufficient for most uses. If the difference between the lowest open transaction number and the current transaction number rises above a certain threshold, TransactionKit will heuristically yield a threads time slice once or twice before acquiring its transaction number in the hope that this will allow other threads to finish their transactions and thus increase the odds that a transaction will close with no open transactions. Since most transactions tend to be short lived in practice, this tends to work quite well.
TransactionKit is broken down in to two basic components:
This was done in an effort to, hopefully, may the basic functionality portable to more platforms. The core library is written in C. While there is some Objective-C in the core library, it's a trivial amount overall and fairly easy to remove completely if required.
Although no formal API documentation is currently available, TransactionKit provides several API interfaces which are essentially clones of pre-existing API's. Therefore, for the time being, you are referred to the Foundation documentation as an interim reference.
The simplest way to make use of TransactionKit is to use the Xcode project to build the TransactionKit.framework. At this time, due to the experimental nature of TransactionKit its complicated multithreading nature, users are expected to be able to integrate a third party framework, or any of the other targets available, in to their application on their own, without guidance. Realistically, if linking to the built framework or static libraries is beyond your abilities, you probably should not be experimenting with TransactionKit. Debugging multithreaded programming can be extremely difficult and require considerable skill, and linking a library to your code should not be any problem for you.
Objective-C support is provided by:
The two classes provided, TKDictionary and TKMutableDictionary, are fairly complete clones of their Foundation equivalents. TKDictionary is a subclass of NSDictionary, and TKMutableDictionary is a subclass of TKDictionary. Both should be drop in replacements for their base foundation equivalents. Notable highlights include:
While most of the low level storage and retrieval methods have been written from scratch to use TransactionKit, some of the higher level functions make use of their NSDictionary counterparts, such as - description, + dictionaryWithContentsOfFile:, and NSCoding support. These are typically implemented by creating a NSDictionary clone of the current TKDictionary object, and then calling the appropriate method on the cloned dictionary.
Features not implemented or supported:
At this time, there is no addition API methods added, the objects are straight re-implementations of their Foundation base objects. Support for transactional access to the contents of the dictionary is not exposed, although transactions are used internally when accessing or making changes to the collection.
When enumerating the contents of a TKMutableDictionary via any of the enumeration methods, the enumeration itself is wrapped in a transaction. The consequence of this is that the dictionary itself can then mutate without effecting an enumeration that is in progress. This allows other threads to continue using, and mutating, the dictionary along with the thread enumerating the dictionary to mutate the dictionary in the middle of enumeration. Mutating a NSMutableDictionary while it is being enumerated is illegal, and typically if you would like to make changes to the dictionary based on the enumeration, you must make a copy of the dictionary and perform the mutations on the copy.
The example code below demonstrates Objective-C 2.0's fast enumeration of a NSDictionary that is mutated in the middle of the enumeration:
When executed, the program generates an exception and aborts:
The same example code, when using a TKMutableDictionary allows the dictionary to be mutated during the enumeration. The enumeration is not effected in any way, even though in the example all the keys and objects are removed from the dictionary on the very first key that is enumerated:
When executed, no exception is generated, and all the keys that were present in the dictionary at the start of the enumeration are enumerated. At the end of the enumeration, the contents of the dictionary are logged, indicating that it is in fact empty:
While just an illustrative example, it shows that a dictionary can be safely enumerated by a thread even while other threads continue to use and mutate the dictionary at the same time.
In addition to the TKDictionary object, equivalents for Foundations NSMapTable and NSHashTable C based API are also provided in the form of TKMapTable and TKHashTable. These should be drop in replacements, probably only requiring a namespace prefix change to TK to use.
Below are three tables containing some benchmark speed results comparing TransactionKits TKMapTable to Foundations NSMapTable. In the first table, NSInteger keys and values are stored using NSIntegerMapKeyCallBacks and NSIntegerMapValueCallBacks, and their TransactionKit equivalents. This represents a rough approximation of the maximum speed possible since the operations on the keys and values to be stored are very fast, typically only a single instruction or two. There is also zero memory allocation maintenance overhead for storing raw integers.
In this particular case, the overhead required for TransactionKits concurrent access is greater than the time it takes to perform the simple operations on NSIntegers required to store, compare, and retrieve them from the hash tables.
Type | Entries | Ops / sec | Relative to TK | Relative to NS | Threads | CPUs | Machine |
---|---|---|---|---|---|---|---|
TKMapTable | 4096 | 842,917.426 | 1.000 | 0.355 | 16 | 1 | 17" 1.5GHz G4 PowerBook |
NSMapTable | 4096 | 2,368,989.003 | 2.810 | 1.000 | 16 | 1 | 17" 1.5GHz G4 PowerBook |
The next two tables store Objective-C NSNumber objects in the NSMapTable structures. In the first table, a maximum of 16 entries are allowed in the table at any time. In the second time, up to 4096 entries are allowed. Objects require significantly more overhead for their basic operations, such as comparisons and retain / release memory allocation maintenance.
When using a NSMapTable, a OSSpinLock was used to ensure that only one thread was modifying the table at a time. Since an object needs to be sent a retain message when stored in the table, and a release when removed, these messages must be sent when the table is exclusively locked. Also, when retrieving a value from the table with a key, the keys must be sent an isEqual: message, also while the table is exclusively locked.
Since no locking is required to use a table with TransactionKit, these operations can happen in parallel among all the threads. All threads are always performing work.
Type | Entries | Ops / sec | Relative to TK | Relative to NS | Threads | CPUs | Machine |
---|---|---|---|---|---|---|---|
TKMapTable | 16 | 311,884.048 | 1.000 | 1.356 | 16 | 1 | 17" 1.5GHz G4 PowerBook |
NSMapTable | 16 | 229,905.311 | 0.737 | 1.000 | 16 | 1 | 17" 1.5GHz G4 PowerBook |
Type | Entries | Ops / sec | Relative to TK | Relative to NS | Threads | CPUs | Machine |
---|---|---|---|---|---|---|---|
TKMapTable | 4096 | 256,581.197 | 1.000 | 1.239 | 16 | 1 | 17" 1.5GHz G4 PowerBook |
NSMapTable | 4096 | 207,036.430 | 0.806 | 1.000 | 16 | 1 | 17" 1.5GHz G4 PowerBook |
Theoretically, the advantages of TransactionKit should continue to improve with additional CPUs. This is because the lock required to safely use a NSMapTable from multiple threads allows for only a single thread, and therefore a single CPU, to access the table at a time. TransactionKit allows all threads on all CPUs to access the table at the same time, without any blocking. Unfortunately, I do not have a multi-CPU machine to benchmark multi-CPU performance.
A fundamental problem for all concurrent data structures is how much of the data structure can be changed atomically, or indivisibly. This is almost always limited to the natural word size of the CPU, which in practical terms means either 32 or 64 bits at a time. Since data structures almost always consists of more than one natural word size, this presents several difficult obstacles if several threads and CPUs are going to be accessing and updating a data structure at the time while maintaining a logically consistent view of the data structure among all the threads.
Existing concurrent hash table designs overcome this problem in a number of ways. Hash tables generally fall in to one of two classes, chaining or open addressing. In chaining, a hash collision is dealt with by chaining together all the items that share a common hash bucket in to a list. A lookup requires a linear scan of the hash chain list, which is hopefully kept to a minimum by using a hash function that evenly distributes the items to be stored among all the hash buckets, ideally with only one item per hash bucket. Open addressing stores only one item per hash bucket and hash collisions are dealt with by adding a deterministic amount to the hash key and checking the new hash keys resulting hash bucket, repeating the process as necessary or until all possible hash buckets have been searched.
Open addressing is popular in concurrent hash tables because a hash bucket can be a simple pointer to a more complicated data structure. Adding an item to the hash table is a simple atomic Compare and Swap of an empty hash bucket location. If the swap fails because another thread filled the hash bucket then move on to the next hash bucket and try again.
Deleting items consists of tombstoning a hash bucket. For example, the more complicated data structure the hash bucket points to can contain a liveness flag. When set, future lookups will skip the hash bucket as if it didn't exist and move on to the next hash bucket. Once a hash bucket is live, the item the hash bucket refers to could be in use by any thread at any time. It's not possible to deallocate a hash bucket and the item it points to in an atomic fashion since any thread could be using 'some part' of the hash buckets data structure or item it points to. As a consequence, these types of hash tables are not well suited to workloads that remove many items from the hash table over its life time. Strategies to deal with the problem vary, but a common solution is when the number of tombstones crosses a threshold, the hash table is exclusively locked and the tombstones are purged from the hash table.
TransactionKit takes a novel approach. Alterations to the hash table data structure that require updating more than a single natural word size worth of state in an atomic fashion are broken down in to their individual steps that can be performed atomically. Each individual step is tagged with the transaction number that it was performed, and the next step can only be performed when there are no transactions outstanding that can possibly refer to the transaction number used to perform the step. This allows complex mutations to the hash table data structure to occur in a logically consistent fashion that is coordinated among all threads.
Take the case of removing an item from the hash table. A transaction must be opened, and in that transaction an item is logically removed from the hash table. Threads in transactions that start after the transaction that performed the remove will still see the data structure in the hash chain linked list, but they can determine by comparing transaction numbers as to whether or not the item in the hash chain should be ignored.
Once the lowest open transaction number is greater than the transaction number that performed the removal, it can begin the dequeue process. In a single linked list with three items, dequeueing the item in the middle is the most difficult to do atomically. This is because it involves changing the pointer of the next item in the list for the first item to the value of the next item pointer from the second item, or in other words changing the first items pointer for the next item in the list to that of the third item in the list, and then freeing the resources used by the second item. The difficulty comes from the fact that other threads may be examining the second items data structure because they are traversing the list, and so it may still be live and in use by some thread.
Because of MVCC, however, TransactionKit can safely perform the first step of updating the first items pointer for the next item in the list to point to the third item while recording the transaction number that this step was performed. Other threads that happen to be traversing the list during this time will always safely skip over the second item. In the case where a thread happens to read the next item in the list as the second item, that thread will compare the transaction numbers used to perform the initiating removal and proceed on to the third item, safely skipping it. If that threads transaction then traverses the list again, but after the first items next item pointer swap has occurred, the result is effectively the same: the second item is safely skipped.
Once the lowest open transaction number is greater than the transaction number used to perform the pointer swap, no executing thread can possibly still contain a reference to the deleted node. The node, and all of its resources, can now be safely deallocated. Any thread that continues to use the item pointed to by the node about to be deallocated must have made provisions to ensure it remains live. In the case of Objective-C when the item is an object, the retrieving thread will have followed the proper retain and release methodology that would ensure that the deallocation of the hash chain node does not cause premature object deallocation. TransactionKit will also automatically retain and autorelease a retrieved object to ensure that the object remains valid until at least the retrieving threads current NSAutoreleasePool is popped.
TransactionKit is distributed under the terms of the BSD License, as specified below.
Copyright © 2008, John Engelhart
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.