Scalable Data Naming for Application Level Framing in Reliable
Multicast
Suchitra Raman and Steven McCanne (UCB), 1998
Summary. Current transport level protocols name data with
sequential numbers, a scheme that does not permit receivers to
"flexibly tailor their reliability semantics" leading to more
efficient use of the underlying network. The authors propose
a naming scheme that exposes the structure of application data
to the transport layer, enhancing the applications's control over
the reliability.
More Detail
Announce-listen. In this approach to RM, a source
periodically multicasts the data to some agreed upon address and
interested parties listen in. Eventually, all parties will have
a consistent copy of the data. Although simple and robust,
announce-listen is suboptimal for two reasons:
does not scale well in terms of bandwidth and the time-to-convergence may be
too great.
"SRM adds a level of indirection to the announce/listen framework. In
SRM, a source enlists the announce/listen framework to disseminate
'meta-data' that summarizes al the available data without actually
sending it...As an optimization, a source might multicast data once
upon creation to avoid delay."
ALF. To perform receiver-reliability on traditional protocols,
need:
- rich naming scheme -> not mapping to linear namespace
- scalability
SNAP solves (1) with two-dimensional naming scheme: it uses announce/listen
to disseminate page structure that provides a common-syntax that
enables receiver reliability. SNAP addresses (2) by
exploiting hierarchy to gain a level-of-indirection by providing
a summary fingerprint.
SNAP Overview.
- Namespace: Hierarchical, per-source structure onto
which an application maps data.
- ADU: Smallest unit of data that can be processed
independently; . ADUs named
within a container sequentially starting from zero. Bytes within
an ADU are named sequentially starting from zero. ADU's are named
with the pair (CID, ADU seqno).
- Fragments: Fragments within an ADU are numbered using
byte boundaries. To signify that more ADU fragments are expected,
have a "more" bit set to 1. A "more" bit of 0 indicates the
last fragment in an ADU. Since the ADU is the smallest unit of data
that an app sees, only the transport layer (SRM) sees fragments.
- Container: Nodes in the namespace hierarchy; containers
can contain ADUs, other containers, or both; containers may contain
just one ADU. Containers named using CID's assigned sequentially
starting at a randomly chosen Initial CID (ICID). Containers
are the unit of reliability.
- Namemap Container: A special container that resides
at the left-most position of the namespace tree. Permits re-use of
"protocol machinery".
- Signature: Representation of a namespace;
s(C) = right_edge, if C is a leaf-level container;
h(right_edge, s(C1),...s(Ck)), otherwise
where h() is the MD-5 hash function, k is the
number of children that C contains, and right_edge
is the pair (max ADU Seqno, last byte offset in last ADU)
(latter in case ADU not done being transmitted).
Sources multicast their namespace signatures in periodic
Session Announcements.
- Session announcments: s(namespace root) plus
some number of container signatures. Which containers are
sent out is chosen heuristically based on the amount of
time passed since their signature was last sent out.
- Don't care containers: Since apps may practice
selective reliability, they may not care about certain
repairing certain containers. For these containers, the
most current signature is stored to prevent spurious signature
mismatches higher up in the tree.
Fragmentation. Relying on IP to do link-level frag and
reassembly can be inefficient since the loss of one fragment could
result in the retransmission (multicast) of an entire packet. To avoid
fragmentation, SRMv2 uses 1024 byte MTU.
Source IDs. The Source ID needs to meet the following three requirements:
- Uniqueness. RTP uses random 32-bit words; okay for
non-reliable protocol, but not reliable protocol.
- Time invariance. Unlike RTP apps where "app state
is ephemeral", rm apps require that a source have the same
state (id) throughout the entire session - possibly even
through leave and joins / crashes and recoveries.
- Terminal independence. A source could move from
machine to machine; can't use IP address/port pair.
SNAP uses MD-5 hashed, application-supplied arbitrary strings.
Loss Recovery \ Namespace Discovery. Non-tail losses are
detected via gaps in sequence numbers (fragments), CID's (containers),
ADU seqnos (ADUs). Tail losses are discovered via mismatches between
signatures sent out in session announcments.
Should losses in anything other than fragments be detected (fragment
loss can be detected but not repaired because the app supplies repair
data and it does not deal in fragments - only ADUs), SRMv2 informs the
application. It is then the applications responsibility to determine
if a repair is desired and, if so, trigger the repair.
Assuming the app desires repairs, if a signature mismatch is detected,
a namespace repair request is sent. In response, a list of
the children and their signatures is sent. The source can then traverse
the response and send repair requests for specific ADU's.
If a whole container is lost (caused by a tail loss in the name map),
the container name in the map and the ADU's are repaired. (?)
A namespace repair request is expected not to be the common case; the
common case is that the container signature will be sent in the session
announcent (not sure what the reasoning is here).
Performance. Three components to delay for tail losses:
- State update (can't detect until
- SRM damping delay
- Repair response time
Showed constant (non-increasing) number of duplicates for requests and
replies. Showed linear convergence time as a function of the state
update period.
API. API presented.
Questions/Comments.
- Why is SNAP restricted to reliable multicast? Oh - in intro,
do mention that ALF replies to unicast as well as multicast.
- In section 2, page 3, paragraph 2, it is explained that upon
receipt of the fragment that has the "more" bit set to 0, srm
checks if the ADU is complete. If so, passes to application.
This check should be done after every fragment is received in
case of out-of-order receipt, shouldn't it?
- Why a more bit to detect fragment loss? Out-of-order f
arrivals can make this complicated. Why not just use an ADU
size? Aren't these known at the outset based on a bytecount
passed to srm_start()?
- What about ATM networks? Perhaps these aren't common enough
- they're usually leaf networks.
- Container vs. Node terminology.
- Say that the probability of source name collision is the
same as in RTP, but the SNAP name is 64-bit and RTP is 32-bit.
Why is the probability the same?
- What happens if I decide I care about a Don't Care container?
Does the app notify SRM that it would like to send a request?
- Performance only went to 55 nodes due to ns constraints.
Would be interesting what happens with more nodes.
- How do you do resource control acctg on SRM repairs?
Other:
- leaky bucket
- token bucket
- ADU as defined in end-to-end paper