tahoe

Congestion Avoidance and Control

Van Jacobson (LBL) and Michael Karels (Berkeley), 1988

Summary. Due to explosion of network usage, congestion problems have become severe. The problem is not within the protocol specification, but within the implementation of the protocol. The authors observed that TCP flows should obey a conservation of packet principle: a packet should not be introduced into a flow until a previous packet has left the flow. They claim that if such a principle is followed such that flows are in equilibrium (stable with a full window in transit) networks should be robust in the face of congestion. However, packet conservation can fail in 3 ways. Below are the 3 failure modes & what the authors suggest to avoid the failure modes.

the connection can fail to reach equilibrium (use slow-start to quickly get to equilibrium)
a packet is introduced prematurely (use average rtt with variance)
the connection cannot reach equilibrium due to resource limits (in times of congestion, use +increase/*decrease)

More Detail

Slow Start. The conservation of packet principle introduces a chicken & egg problem: one can only send out packets in response to an ack; yet to get an ack, one must send out a packet. Slow start is an algorithm for getting a flow up to the equilibrium point fast. The algorithm is:

introduce state variable cwnd. When starting, set cwnd to one.
when an ack is received, increase cwnd by one
sender sets window size W to MIN(cwnd, advertised window)

``Slow'' start isn't really slow: it reaches equilibrium in RlgW where R is .the round-trip time and W is the equilibrium window size.

Round-trip timing. Under conservation of packets, the only way to introduce a packet early is due to a timing error or, in other words, the RTO (round-trip time-out) is incorrectly set. A common mistake is to not use the variance in rtt. Queue theory says that the variance of RTT increases greatly with load. An example: if the load is 75%, one can expect RTT to vary by a factor of 16. The authors use a low-pass filter to estimate the RTT: RTT = RTT * alpha + R*(1-alpha). They've chosen to make alpha = .9. They then adjust the RTO to take into account the variance: RTO = beta * RTT. Again, beta is not a fixed value but rather an estimated variance.

Congestion Avoidance. A congestion avoidance algorithm has two components: feedback from the network and an endpoint policy that acts on the feedback. Here, the feedback used is the acks (or lack of acks) and the policy is additive increase of the window size when feedback indicates no congestion and multiplicative decrease otherwise. The additive increase is 1; the multiplicative factor for decrease is .5.

Multiplicative increase leads to the rush-hour effect: it's easy to saturate the network but very difficult to recover. (Recall, also, that Jain stated that +/+ is unfair.)

The algorithm is as follows:

* decrease: on timeout, set cwnd to 50% of W
+ increase: on each ack, increase cwnd by 1/cwnd (just a fraction, but by the time all the acks have been received, the fractions will add up to one).
sender always sets W to MIN(advertised window, cwnd)

Congestion Avoidance vs. Slow-start algorithm. The algorithms for .slow-start and congestion avoidance are presented separately because they address different implementation pathologies. In reality, however, they are implemented together. Here is the (real) algorithm with the two combined:

on timeout: set variable ssthresh = 50% of W, set cwmd = 1 (to trigger slow-start)
when ack received: if (cwnd > ssthresh), cwmd += 1; else cwmd += 1/cwmd.

Future Work. The next step is to address congestion detection at the .gateway level. The aggregate information necessary to detect congestion and provide fairness is only available there. If the feedback mechanism used is to drop packets, the gateways get self-protection for free (since they just drop the packets of the network hog in questions.) They note that Jain's queue averaging over the regeneration period might help with the bursty nature of network traffic but it has other problems under high-load.