Lazy Receiver Processing (LRP): A Network Subsystem Architecture for Server Systems

Peter Druschel and Gaurav Banga (Rice), 1996

Summary. Network servers can experience slowdown and even livelock under overload. The authors present a network subsystem architecture that alleviates these overload problems (or eliminates them, if the NI is programmable) via a combination of early demultiplexing and lazy receiver processing.


More Detail

The proposed architecture depends on both early demux and LRP to sustain throughput under increasing load. The notes below describe the aspects of traditional network subsystems that give rise to pathological overload behavior, and how the proposed architecture addresses them.

Aspects of traditional systems that give rise to problems include (note: this dissection of the problems/solutions isn't perfect, but it conveys the main points of the paper):

TCP packets whose receivers are not blocked in a receive call are processed by a special thread, whose priority reflects the application process's priority. Timely processing of TCP packets is required for efficiency (to keep the pipe full).

The send side of communication in a traditional system isn't as unfair: processing of an outgoing packet is done in the context of the process that called send(), up until the data is copied into the NI buffer. Packets queued at the NI are removed and transmitted in the context of the NI interrupt handler.

Regarding flow control: TCP's flow control (window-size regulation) and congestion control (multiplicative decrease) regulate only the traffic of already-established connections. Network servers are still vulnerable to SYN-packet floods.

Firewalls establish a new TCP connection for each flow that passes through them.

LRP trivially depends on early demux: to know a packet's priority and when to schedule its processing, the system must know the packet's owner (its destination socket).

The authors implement NI demux in the network interface using firmware developed in Cornell's U-Net project, and soft demux in the network driver's interrupt handler.

If an IP fragment arrives before the first fragment of its packet, it cannot be demultiplexed (the transport header is in the first fragment), so it is placed on a special NI queue. This is the uncommon case.

LRP increases UDP delay only if the CPU is idle and the receiving process is blocked on disk access (or some similar event) when the packet arrives. To solve this problem, the idle loop checks for packet receipt. I believe the time added is just the packet-processing time that would have happened right away in a traditional subsystem (see graph in paper margin).

ARP, RARP, ICMP, and IP-forwarding traffic is charged to special daemon processes that act as proxies for the respective protocols.

The authors performed a number of experiments showing that their architecture sustains throughput under overload, whereas the traditional subsystem's throughput collapses.

The performance section is worth rereading.