=head1 Efficient Locking for Concurrent Operations on B-Trees Philip Lehman (CMU) & S. Bing Yao (Purdue), 1981 B Lehman & Yao present a locking algorithm that provides high concurrency in that at most 3 nodes of the tree are locked at any one time on a write & no nodes are locked on a read. The algorithm requires a slight change to the b-tree, namely the addition of a right neighbor pointer (referred to as the "link pointer") and a key which names the maximum value stored under this node (the "high key"). Correctness is guaranteed partially through a well-ordered locking protocol. =head1 More Detail The modified b-tree is called a b-link-tree because it contains an additional link pointer in each node of the b-tree. The link pointer points to the right neighbor of the node and is used to maintain correctness of the tree during split operations. This leads to greater concurrency because reads need never lock any nodes. The following algorithms are described and proofs of correctness are given. =over 4 =item * B The search algorithm proceeds as a b-tree algorithm would except that link pointers are sometimes followed when necessary. =item * B Insertion into a "safe" node (a node which is not at its splitting threshold of 2k elements) is done in the same way as in normal btrees: the node is locked, split & unlocked. In "unsafe" nodes, the node is locked, split into two (we'll call these pages A & B where the values in A are < the values in B). While the split is happening in a private copy of the page, other readers are free to read the public copy of A. All of the link pointers & high keys are set, then B is written back to disk. At this point (when B is written but A has not been written), correctness of the tree is still insured because no page but the private copy of A has a pointer to B. Finally, A is written to the disk and the lock on A is released. Next, the parent pointer is modified by adding a pointer to the new B page. If necessary, the node is split. The worst case for number of locks held (3) occurs when we need to traverse the parent node(s) because it has split since we last looked at it. =item * B Deletion is not dealt with in the sense that underflow is ignored. As values are deleted from a node, a node that drops below k values is not merged with another node. If disk utilization dips below an acceptable amount, the authors suggest that the index be rebuilt offline. =back B Punts on deletion.