OTERS (On-Tree Efficient Recovery using Subcasting):
A Reliable Multicast Protocol
Dan Li and David R. Cheriton (Stanford), 1998
Summary. OTERS leverages subcast to create a reliable
multicast protocol that has both relatively low recovery latency
and low traffic load. OTERS requires "minimal" extensions to
the IP traceroute service.
More Detail
Existing protocols sacrifice low recovery latency for low traffic load
(or vice-versa) and/or require additional, complex state in the routers
and/or network transport layer. OTERS achieves low latency
by detecting loss early (high) in the multicast tree, and limits
traffic load by using subcast to retransmit and by ignoring
duplicate NACKs seen within the same subtree within a certain interval.
Subcast is defined as follows:
S = subcast message source
R = router at root of subcast tree
So = multicast data source
M = multicast group address
[x, y](p) == [src, dst](payload)
S sends to R:
[S, R] ([So, M](payload))
Upon receipt, R swaps the headers (it does not strip them!) and sends:
[So, M] ([S, R](payload))
By swapping the headers rather than stripping them, receivers
can detect who sent the subcast packet and filter/drop packets
if necessary for security reasons.
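
The header swap above can be sketched in a few lines. This is only an
illustration of the [src, dst](payload) notation, not the authors'
implementation; the Packet class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Packet:
    """[src, dst](payload), in the notation used in these notes."""
    src: str
    dst: str
    payload: Union["Packet", bytes]


def make_subcast(s: str, r: str, so: str, m: str, data: bytes) -> Packet:
    """What S sends to R: [S, R]([So, M](payload))."""
    return Packet(src=s, dst=r, payload=Packet(src=so, dst=m, payload=data))


def swap_headers(pkt: Packet) -> Packet:
    """What R does on receipt: swap outer and inner headers, keeping both."""
    inner = pkt.payload
    if not isinstance(inner, Packet):
        raise ValueError("not an encapsulated subcast packet")
    # Result: [So, M]([S, R](payload))
    return Packet(
        src=inner.src,
        dst=inner.dst,
        payload=Packet(src=pkt.src, dst=pkt.dst, payload=inner.payload),
    )


def subcast_sender(pkt: Packet) -> str:
    """A receiver reads the inner header to learn who issued the subcast."""
    inner = pkt.payload
    if not isinstance(inner, Packet):
        raise ValueError("not a subcast packet")
    return inner.src
```

Because the inner [S, R] header survives the swap, a receiver can call
something like subcast_sender() and drop packets from senders it does
not trust, which stripping the outer header would make impossible.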
OTERS consists of two protocols:
- FTFP: fusion-tree formation protocol. DRs,
Designated Receivers, are found by sending out (and listening to)
FTFP packets containing the following information (these are
per-router/per-source packets):
- H: number of hops from sender to router R
- D: recently experienced delay from source
- L: recently experienced loss rate from source at sender
- TTL of the message (from which receivers can deduce
their distance from sender)
Group members compare their own distance to R with other members'
distances to R, and a DR is chosen for each router. Eventually, a
DR is chosen for each subtree in the multicast tree.
- LRP: loss-recovery protocol. When a loss is detected,
send a NACK to the DR. A DR forwards the NACK to its own DR iff it has
not forwarded one recently and has not recently retransmitted the data.
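
The LRP forwarding rule at a DR can be sketched as below. This is a
minimal illustration under assumptions of my own: "recently" is modeled
as a fixed time window (the hypothetical interval parameter), state is
kept per sequence number, and timestamps are passed in explicitly. The
paper may define these differently; NackFilter is not a name from it.

```python
from typing import Dict


class NackFilter:
    """Per-DR state deciding whether a NACK should go upstream."""

    def __init__(self, interval: float) -> None:
        self.interval = interval  # assumed suppression window, in seconds
        self._last_forward: Dict[int, float] = {}
        self._last_repair: Dict[int, float] = {}

    def _recent(self, table: Dict[int, float], seq: int, now: float) -> bool:
        t = table.get(seq)
        return t is not None and now - t < self.interval

    def record_repair(self, seq: int, now: float) -> None:
        """Note that this DR has just retransmitted packet seq."""
        self._last_repair[seq] = now

    def should_forward(self, seq: int, now: float) -> bool:
        """Forward iff no recent forward and no recent retransmission."""
        if self._recent(self._last_forward, seq, now):
            return False  # duplicate NACK within the window
        if self._recent(self._last_repair, seq, now):
            return False  # a repair for seq was sent within the window
        self._last_forward[seq] = now
        return True
```

With interval=1.0, a NACK for packet 7 at t=0.0 is forwarded, a second
one at t=0.5 is suppressed, and one at t=1.5 is forwarded again.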
IP Multicast extension. Need "simple" mods to IGMP
traceroute and IP encapsulation. Authors point out that other
protocols would benefit from this additional functionality. Further,
incremental deployment possible: non-conforming routers will not
respond to backtrace/subcast requests. End effect is that the subcast
tree sizes increase. Worst case is a single subcast tree.
Early-start effect. Members higher up in the tree will
detect losses earlier and subcast them down. Members far from
the source (low in the tree) benefit. [This is an especially
important effect given that Handley 98 showed that MBone losses
are highly-correlated.] Authors show via simulation that their
protocol does benefit from early-start: the average recovery
latency of most group members is less than the average RTT
from retransmitters (DRs ?).
SCREAM: Source-Constrained version of OTERS. All NACKs are propagated
to the source; repairs are sent only from the source. Adds delay, but
authors claim it's not a big deal. XXX
Compared against four other RM protocols:
- TMTP
- TMTP-unicast
- SRM
- SRM-ideal (no duplicate avoidance for lowest recovery latency).
Simulated 100 and 600 node topologies. For 600 node sim, took
into account session messages (since their traffic load increases
significantly as membership scales up). Came within 17% of the lower
bound for recovery latency (SRM-ideal). Generates less traffic than
the others---orders of magnitude less for 600 node sim.
Comments:
- Vague on how losses are detected; mention sequence number or
timeout, but algorithm not spelled out. Do we assume TCP semantics?
How are tail losses detected?
- Short on references. For example, previous subcast work
not referenced.
- Vague on whether DR can cache data. I believe that it can
based on the fact that SCREAM addresses security problems
with non-source repairs.
- They spell NACK as "NAK".
- Vague on algorithms for determining delay, loss-rate.