Fast Address Lookups Using Controlled Prefix Expansion.
V. Srinivasan and George Varghese (WUSTL), 1998
Summary. The authors present three techniques
that, when applied to existing router lookup algorithms, yield lookup
times of 226 nsec (about twice as fast as previously known algorithms) and
worst-case insert/delete times of 2.5 msec.
The Internet Lookup Problem:
Map an incoming IP address to an outgoing interface. This requires more
than a one-to-one mapping because of the millions of IP addresses seen;
instead, the database consists of (prefix, next hop) pairs.
This reduces database size AT THE EXPENSE OF requiring a more complex
lookup: longest matching prefix. Ex:
- USA.* -> Boston, USA.CA.* -> LA
- All packets with address USA forwarded to Boston EXCEPT USA.CA packets
which are forwarded to LA.
- Must match the longest prefix to be correct.
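A naive longest-matching-prefix lookup in Python, for concreteness (the table
layout and names here are my own, not the paper's); the paper's data structures
exist to answer exactly this query in a handful of memory reads:

    def longest_prefix_match(table, addr, width=32):
        # table maps (prefix value, prefix length) -> outgoing interface
        best_len, best_port = -1, None
        for (value, length), port in table.items():
            if length > best_len and (addr >> (width - length)) == value:
                best_len, best_port = length, port
        return best_port

    # Analogue of USA.* -> Boston, USA.CA.* -> LA, using prefixes 1* and 10*:
    table = {(0b1, 1): "Boston", (0b10, 2): "LA"}
    print(longest_prefix_match(table, 0b10 << 30))  # matches both; longest is 10* -> LA
    print(longest_prefix_match(table, 0b11 << 30))  # matches only 1* -> Boston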
Performance Model. Routing Environment: four publicly available routing
databases are used for comparisons. Implementation Model: Software: 300 MHz
Pentium II, Windows NT, 512 KB L2 cache, 32-byte cache lines. Hardware: a cheap
forwarding engine with a 10 nsec clock cycle (?). They map their analyses onto
these models to get average and worst-case lookup/insert/delete times for their
lookup algorithms.
Techniques:
- Controlled Prefix Expansion. Reduce the number of distinct prefix lengths
by expanding shorter prefixes into longer ones. By comparing more bits at a time,
this reduces the height of the trie/tree, and hence the number of memory accesses.
Clearly one wants to keep the number of prefix lengths small; from the paper it
appears it is chosen by targeting a certain forwarding rate/lookup time. (A sketch
of the expansion step follows this list.)
- Picking Optimal Expansion Levels. Clearly want few prefix lengths, but which
lengths should be chosen? The goal here is to minimize the amount of memory
required for the data structure; the paper casts this as a dynamic program over
the levels of a 1-bit trie (sketch after this list).
- Local Restructuring.
- Leaf pushing. Make internal entries pointers only, with data at the leaves.
This saves storage since each entry holds either a pointer or a next hop, never
both. Cost: potentially slow rebuild on insert/delete. (Sketch after this list.)
- Cache alignment. Use (1) packed arrays and (2) semi-perfect hashing: settle
for hash functions with no more than 6 colliding entries per bucket, because 6
entries fit in one 32-byte cache line in the model environment. (Sketch after
this list.)
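A rough sketch of the expansion step in Python (my own code and naming, not the
paper's; assumes prefixes are stored as (value, length) -> next hop and that the
chosen levels include the longest prefix length in use):

    def expand_prefixes(prefixes, levels):
        levels = sorted(levels)
        out = {}
        # Prefixes already at an allowed length are kept verbatim (most specific).
        for (value, length), hop in prefixes.items():
            if length in levels:
                out[(value, length)] = hop
        # Expand the rest to the next allowed length, longest originals first,
        # so a more specific prefix is never clobbered by a less specific one.
        for (value, length), hop in sorted(prefixes.items(), key=lambda kv: -kv[0][1]):
            if length in levels:
                continue
            target = next(L for L in levels if L >= length)
            extra = target - length
            for suffix in range(1 << extra):
                out.setdefault(((value << extra) | suffix, target), hop)
        return out

E.g., with levels = [2], 1* -> Boston expands to 10* and 11*, but the original
10* -> LA keeps its entry, preserving longest-match semantics.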
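The level-picking step, roughly as I read the paper's dynamic program (hedged
sketch: nodes[i], the number of 1-bit-trie nodes at depth i for i = 0..W-1, is
assumed to have been computed from the prefix set already; the real algorithm
also recovers which levels were chosen, which is omitted here):

    def min_expanded_memory(nodes, W, k):
        # T[j][r] = min number of array entries to cover bits 0..j using r
        # expansion levels, with the last level ending at bit j.
        INF = float('inf')
        T = [[INF] * (k + 1) for _ in range(W)]
        for j in range(W):
            T[j][1] = 2 ** (j + 1)           # a single stride of j+1 bits at the root
        for r in range(2, k + 1):
            for j in range(r - 1, W):
                for m in range(r - 2, j):    # m = last bit of the previous level
                    cost = T[m][r - 1] + nodes[m + 1] * 2 ** (j - m)
                    T[j][r] = min(T[j][r], cost)
        return T[W - 1][k]                   # memory for W-bit addresses, k levels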
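Leaf pushing, sketched on a toy multibit-trie node layout of my own (each entry
starts out holding both a next hop and a child pointer; after pushing, it holds
only one of the two):

    class Node:
        def __init__(self, stride):
            self.stride = stride
            self.nexthop = [None] * (1 << stride)   # best matching prefix so far
            self.child   = [None] * (1 << stride)   # pointer to the next trie level

    def leaf_push(node, inherited=None):
        for i in range(1 << node.stride):
            hop = node.nexthop[i] if node.nexthop[i] is not None else inherited
            if node.child[i] is not None:
                leaf_push(node.child[i], hop)       # push the hop down into the subtree
                node.nexthop[i] = None              # internal entry keeps only the pointer
            else:
                node.nexthop[i] = hop               # leaf entry keeps only the next hop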
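And the semi-perfect hashing idea as I understand it (the multiplicative hash
below is a stand-in, not the paper's; the point is only that we retry until
every bucket fits in one cache line, i.e. at most 6 entries):

    import random

    def find_semi_perfect_hash(keys, table_size, max_bucket=6, max_tries=10000):
        for _ in range(max_tries):
            a = random.randrange(1, 1 << 32) | 1           # candidate odd multiplier
            buckets = {}
            ok = True
            for key in keys:
                b = ((a * key) & 0xFFFFFFFF) % table_size  # hypothetical hash function
                buckets[b] = buckets.get(b, 0) + 1
                if buckets[b] > max_bucket:
                    ok = False
                    break
            if ok:
                return a            # no bucket overflows a cache line
        return None                 # give up; caller can grow the table and retry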
Miscellaneous points:
- Increasing the trie stride (# bits compared per READ) decreases the number of
READs; e.g., a 1-bit stride can require 32 READs for a 32-bit address. (A
multibit-trie lookup sketch appears after this list.)
- Lookup cost is often measured in DRAM accesses, since memory is the
bottleneck. The authors try to fit the data structure into cache and perform
cache alignment. Lots of interesting, known techniques touched upon.
- Insert/delete matters because IPMA reports that the number of routing
prefixes announced and withdrawn by BGP in a single day was ~1.8M -> 25/sec.
- Of the 1.8M above, 47k were route changes (not insertions/deletions), so
route changing is somewhat important. If you do the numbers, though, I don't
see what the big point is.
- Binary search on prefix lengths (previous work): keep a hash table of
prefixes for each length and binary-search over the set of lengths. Probe the
middle length: a hit may be a real prefix (record it) and/or a marker indicating
that a longer prefix may exist, which directs the search toward longer lengths;
a miss directs it toward shorter lengths. This was not explained well at all.
(Rough sketch below.)
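A rough reconstruction of the binary-search idea (my reading, not the paper's
code; the real scheme also stores the best matching prefix with each marker to
avoid wrong turns, which is omitted here). tables[L] is assumed to be a hash
table keyed by the first L bits of an address:

    from collections import namedtuple

    Entry = namedtuple("Entry", ["nexthop", "is_marker"])

    def lpm_binary_search(addr, tables, lengths, width=32):
        lo, hi, best = 0, len(lengths) - 1, None
        while lo <= hi:
            mid = (lo + hi) // 2
            L = lengths[mid]
            entry = tables[L].get(addr >> (width - L))
            if entry is None:
                hi = mid - 1                  # miss: only shorter prefixes can match
            else:
                if entry.nexthop is not None:
                    best = entry.nexthop      # a real prefix of length L matches
                if entry.is_marker:
                    lo = mid + 1              # a longer prefix may match: go longer
                else:
                    hi = mid - 1              # no marker: nothing longer to find
        return best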
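Finally, to make the earlier stride point concrete, a multibit-trie lookup
sketch (my own layout, not the paper's): one node per memory read, so 8-bit
strides cost at most 4 reads for a 32-bit address versus 32 reads for a 1-bit
trie.

    class TrieNode:
        def __init__(self, stride):
            self.stride = stride
            self.entries = [(None, None)] * (1 << stride)  # (next hop, child) per entry

    def lookup(root, addr, width=32):
        # assumes strides along any root-to-leaf path sum to at most `width`
        best, node, consumed = None, root, 0
        while node is not None and consumed < width:
            s = node.stride
            index = (addr >> (width - consumed - s)) & ((1 << s) - 1)
            hop, child = node.entries[index]
            if hop is not None:
                best = hop                    # remember the longest match seen so far
            node, consumed = child, consumed + s
        return best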
Criticisms.
Presentation needs work. Many key concepts are vague (binary search on prefix
lengths). A picture of a fixed-stride trie versus a variable-stride trie was
needed (it is not clear whether the stride varies along the horizontal or
vertical axis). Background work is not explained (Patricia tries).