Fast and Flexible: Parallel Packet Processing with GPUs and Click
ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS) 2013.
We introduce Snap, a framework for packet processing that outperforms traditional software routers by exploiting the parallelism available on modern GPUs. While obtaining high performance, it remains extremely flexible, with packet processing tasks implemented as simple modular elements that are composed to build fully functional routers and switches. Snap is based on the Click modular router, which it extends by adding new architectural features that support batched packet processing, memory structures optimized for offloading to coprocessors, and asynchronous scheduling with in-order completion. We show that Snap can run complex pipelines at high speeds on commodity PC hardware by building an IP router incorporating both an IDS-like full-packet string matcher and an SDN-like packet classifier. In this configuration, Snap is able to forward 40 million packets per second, saturating four 10 Gbps NICs at packet sizes as small as 128 byes. This represents an increase in throughput of nearly 4x over the baseline Click running comparable elements on the CPU.