Micro: a comparison of simulated and experimental results for micro benchmarks

Executive Summary
Goals
Experiments
Observations

Executive Summary

In cases where the delay was reliable, delays differed significantly between real and simulated results: 18.7% in the non-congested case and 64.5% in the congested case.

When uncongested, the real and simulated throughput were equal. When congested, however, simulated throughput differed slightly to significantly: 3.5% to 18.2% for the two congested experiments.

Even in these simple experiments, when congestion was present the simulator slowed to 6 and 7 times the specified runtime (5 secs.).

The above suggests that if we're interested in finding differences between simulated results and testbed results, we should look for experiments that introduce congestion into the environment (our experience suggests that this will result in different delay and bandwidth numbers) and are relatively complicated (our experience suggests this will highlight slow simulator runtime).

Goals

The goals are manyfold and interrelated:

If we can characterize the differences between simulated results and experimental results in small experiments, we can extrapolate where simulators are most likely to show erroneous results in larger experiments. It is in these areas where we can show that the testbed can complement network simulators.
(Note that our testbed has uses other than as a complement to network simulators, but that's our emphasis here.)
If we do not understand the testbed's behavior in small experiments, we'll be biting off more than we can chew when we attempt larger-scale experiments.
We don't have the hardware currently to support larger experiments and this one is worthwhile and fits within the current topology.
These simple experiments will help us in understanding the testbed's performance and realized throughput.

Experiments

Six simple experiments, each slightly more complex than the last, were performed. They are listed below in order of complexity (though none are very complex). Each experiment was performed on each of two platform types: on real machines and the ns simulator. There were three different real machine configurations. The platforms are described below the individual experiments.

Results for the real machine experiments were averaged over 10 runs (not 30, but enough to collect an average). Simulations were only run once.

Experiments:

One-way 1M, Uncongested. We sent UDP packets from one process to another on the same node. We set the delay of the link in ns to 0ms. The goal is to get an idea of how much end-node overhead the simulator has vs. the real machine.
- Platforms: S, 1M
- Code: S, 1M: Source and Sink (Note that the description at the top of the Source file does not necessarily describe how this experiment was run since we ran the source and sink on the same node.)
- Packet size: 1024 bytes
- Duration: 5 secs (all experiments were 5 seconds).
- Results script

One-way 2M, Uncongested. We sent UDP packets from one node to another. The nodes are connected with 100Mbs ethernet.
- Platforms: S, 2M
- Code: S, 2M: Source and Sink
- Packet size: 1024 bytes
- Duration: 5 secs.
- Results script

One-way 3M, Uncongested. We sent UDP packets from one node to another. We do not induce congestion.
- Platforms: S, 3M
- Code: S, 3M: Source and Sink
- Packet size: 1024 bytes
- Duration: 5 secs
- Results script

One-way 3M, Congested. Same as above except we send 10 1400 byte packets (1400 is a size where we don't fragment yet this still causes congestion at the bottleneck link).
- Platforms: S, 3M
- Code: S, 3M: Source and Sink
- Packet size: 1400 bytes
- Duration: 5 secs
- Results script

RTT 3M, Uncongested. We send one 1400 byte packet every 10ms. A "Bouncer" on the other end of the router bounces the packet back to the source.
- Platforms: S, 3M
- Code: S, 3M: Source and Sink and Bouncer
- Packet size: 1400 bytes
- Duration: 5 secs
- Results script

RTT 3M, Congested. Same as above except we send 10 1400 byte packets every 10ms.
- Platforms: S, 3M
- Code: S, 3M: Source and Sink and Bouncer
- Packet size: 1400 bytes
- Duration: 5 secs
- Results script

Platforms:

Simulator (S). We used the following libraries to build ns:
- ns-2.1b5
- tclcl-1.0b8
- otcl-1.0a4
- tck8.0.4
- tk8.0.4
We extended ns to include a UDP Agent capable of sending application-defined packets. Here is the code for this simple extension.
In experiments #2-#7, a delay had to be set for the links.

One Machine (1M). One dual Pentium II/350 machines with 128M SDRAM running FreeBSD 3.0-Current (SMP).

Two Machines (2M). Two dual Pentium II/350 machines with 128M SDRAM running FreeBSD 3.0-Current (SMP). Connected via 100Mbs switched ethernet.

Three Machines (3M). Three Pentium II/300 machines with 128M RAM running FreeBSD 2.2.6-STABLE (PIICAM). Connected point-to-point in the following manner:
```
      155.99.213.184/29     155.99.213.190/29
        (falcon) <-- 100Mbs --> (chukar) <-- 10Mbs --> (vulture)
                            155.99.213.198/29       155.99.213.193/29
        
```
Chukar routes packets between Falcon and Vulture on two otherwise unused subnets thus eliminating cross-traffic. Copies of the rc.conf files required for this configuration are in ~kwright/src/vint/ns/tcl/kw/etc-[falcon,chukar,vulture].tar. (Note: There is one thing that the rc.conf files don't reflect: I modified the 10Mbs endpoints to be full-duplex by hand:
```
        On chukkar:
	ifconfig fxp2 media 10baseT/UTP mediaopt full-duplex
  
        On vulture:
	ifconfig fxp1 media 10baseT/UTP mediaopt full-duplex
	
```
These configurations could be added into the rc.conf ifconfig lines.

Observations

Timer latency.

Average received packet send interval in ms.
Experiment	Machine	Simulator
One-way 1M, Uncongested	10.04	10.00
One-way 2M, Uncongested	10.03	10.00
One-way 3M, Uncongested	10.00	10.00
RTT 3M, Uncongested	10.01	10.00

The estimated send intervals are omitted for the congested experiments because, since packets were lost due to congestion, the difference in send times between consecutive packets received does not reflect accurately the difference in send times since dropped packets would serve to spuriously increase the interval.

The real-time delay intervals tended to be slightly longer than the simulator's. This did have an effect on throughput, but only slightly. In one-way 1M, the delay was on average 10.04. This could make a difference in throughput: 498 packets rather than 500. As we see below, this was exactly the number observed: the simulator reported a throughput of 500 packets where the real machines generated only 498 packets.

Overhead. One-way 1M was an attempt to figure out if, once the network delay was factored out, what the overhead of ns was compared to a real machine's overhead. With the appropriate link delay set to 0 in ns as follows:

      $ns duplex-link $n0 $n1 10Mb 0ms DropTail

Delay in ms for *One-way 1M, Uncongested*.
1M	S with 10Mbs	S with 100Mbs
0.12	0.82	0.082

From the relationship between the 10Mbs and 100Mbs S delays, we can see the bw is still being taken into account in spite of a 0 delay. A more correct way to do this would be to change the ns script to have two attached agents assigned to a single node.

We can conclude something about ns overhead from the other experiments. I ran the unix command time on the ns run. Below are the results:

`time` output for all S runs.
Experiment	`time` output
One-way 1M, Uncongested	1.733u 0.031s 0:01.86 94.6% 2463+2982k 0+12io 0pf+0w
One-way 2M, Uncongested	(Either never measured or lost)
One-way 3M, Uncongested	2.046u 0.038s 0:02.32 89.2% 2508+3140k 0+61io 0pf+0w
One-way 3M, Congested	8.293u 0.438s 0:28.51 30.5% 2383+3333k 27+605io 202pf+0w
RTT 3M, Uncongested	2.365u 0.086s 0:04.19 58.2% 2378+3085k 9+122io 10pf+0w
RTT 3M, Congested	12.240u 0.442s 0:35.02 36.2% 2354+3352k 0+1150io 0pf+0w

All of the uncongested experiments ran in less than the 5 second simulation length. (This is possible in ns because the simulated time is not real-time so that if there are no events in the event loop to process, the scheduler moves the clock ahead to the next event time and proceeds.)

When the experiments induce congestion, however, the simulator slows significantly: to 28.51 and 35.02 seconds for the one-way and round-trip 5-second simulations, respectively.

Though we expected the simulator to slow down when executing complicated simulations, I was surprised to see the runtime increase by so much for such a simple experiment (few nodes, little traffic, no route changes or calculations, etc.). It would be interesting to see how much the simulator slows down in a larger, more complex experiment.

Throughput. The simulated and experimental results from test cases which didn't induce congestion were similar. This is to be expected because there is no reason for a packet not to eventually be delivered since no packets are dropped. and each platform sends the same number of packets. The slight descrepencies between real and simulated results in the uncongested cases are caused by the slightly longer timer delay (discussed above).

The two uncongested test cases' results are a different matter: they show slight to significant differences: from 3.5% to 18.2%.

All results are in the table below.

Throughput (packets/bytes)
Experiment	M Throughput	S Throughput	% Off of Real Throughput
One-way 1M, Uncongested	(498/509952)	(500/512000)	+00.4%
One-way 2M, Uncongested	(498.3/510259.2)	(500/512000)	+00.3%
One-way 3M, Uncongested	(499.9/511897.6)	(500/512000)	+00.0%
One-way 3M, Congested	(4312.9/6038060)	(4463/6248200)	+03.5%
RTT 3M, Uncongested	(500.2/700280)	(500/700000)	-00.0%
RTT 3M, Congested	(3775.17/5285233.33)	(4463/6248200)	+18.2%

Delay.

The packet delays measured for each experiment are listed below. Note that the packet delay can only be considered reliable in the round-trip time experiments due to the possibility of coarse-grained NTP. These experiments are highlighted in pink.

Packet Delay in ms. Pink experiments indicate reliable delays.
Experiment	M Delay (secs)	S Delay (secs)	% Off of Real Delay
One-way 1M, Uncongested	.00012	.00008	N/A
One-way 2M, Uncongested	.00047	.00020	-57.4%
One-way 3M, Uncongested	.00097	.00136	+86.0%
One-way 3M, Congested	.06958	.05056	-27.3%
RTT 3M, Uncongested	.00284	.00231	-18.7%
RTT 3M, Congested	.14415	.05117	-64.5%

It is important to look at how the link delay (versus the packet delay) is set in ns. The link delays for one-way 2M was set to be equal to half the RTT of an ICMP request sent from the source to sink on the 2M platform. This is the ping output from Kamas to Eureka (the real machines):

   30 packets transmitted, 30 packets received, 0% packet loss
   round-trip min/avg/max/stddev = 0.226/0.239/0.255/0.008 ms

        $ns duplex-link $n0 $n1 100Mb .1195ms DropTail

The ping results for the 3M experiments are:

        source to router:
        30 packets transmitted, 30 packets received, 0% packet loss
        round-trip min/avg/max/stddev = 0.130/0.140/0.226/0.016 ms
        PING 155.99.213.193 (155.99.213.193): 56 data bytes

        router to bouncer:
        30 packets transmitted, 30 packets received, 0% packet loss
        round-trip min/avg/max/stddev = 0.311/0.320/0.407/0.018 ms

        $ns duplex-link $n0 $r 100Mb .14ms DropTail
        $ns duplex-link $r $n1 10Mb .32ms DropTail

RTT 3M, Uncongested

In fact, they are less than the combined one-way link delay specified (.14 + .32 = .46ms).

On the other hand, since the delays are relatively close in the uncongested RTT case yet so disparate in the congested RTT case, this leads me to believe that ns is not modeling our congestion scenario accurately. That is, it doesn't appear that the different results for RTT 3M, Congested are caused merely by bad delay numbers in the simulator.

Graphs

In spite of the fact that we're probably not modeling the same thing in ns as in our real environment, it is still interesting to quantify the differences between platforms and identify the reasons for the differences. Doing so will allow us to continue with bigger experiments with greater intuition about where the two platforms will differ.

Phase plots
In a phase plot, a marker is printed at coordinates (x,y) if there exists a value n such that x=rtt(n) and y = rtt(n + 1). The (x,y) plane is referred to as the phase plane. In the case where congestion is not present, we expect the points to be centered on the line rtt(n) = rtt(n+1). For example, here is the phase plot for RTT 3M, Uncongested where no packets are dropped and no congestion is present. (I also plotted this with lines; as expected, it's a mess that yields no additional information over the points only.)
It takes each packet a minimum of 1.12 ms to cross the second link and .0112ms to cross the first link. The total one-way delay is 1.12ms + .0112ms = 1.23ms. Round-trip time is then a minimum of 2.46ms. We know from the delay section that the RTT 3M, Uncongested experiment had an average RTT of 2.84ms. The graph visually shows this.
Compare with the graph above the following phase plot of RTT 3M, Congested.
- Real
- Including lines in the phase plot sheds some light on why this is so: Real with lines.
- Simulated
Dropped Packet.
Drop behavior (I include 2 to show that the regularity extends across experiments):
- Dataset 7
- Dataset 8
Sequence Number X RTT
3M, Congested but sending 20 packets at 20ms intervals.
- Phase plot.
  - Notice that the difference between the lines is greater. This is because the queue grows much larger for each send interval because we send 20 packets rather than just 10 before giving the queue a chance to decrease. This drop should be a difference of 18 packets because 2 packets should still be in the queue at the beginning of the next send interval. We also expect an increase of 2ms rather than just 1ms every interval.
- With lines.
  - As in the phase plot for 10packets: Increasing until a break in sending, decrease at the beginning of each send interval.
  - The lower line (y=x-.018)shows that the difference bet ween the lines is indeed about 18 packets.
  - Increase of 2ms rather than just 1ms every interval.
- Dropped packets.
  Notice that
  - Packets are dropped earlier in the 20packet/20ms experiment than the 10packet/10ms experiment. This should be within 10ms of the 10packet/10ms experiment, however.
  - As we reach steady state, it looks like we drop packets in two interleaving quantities. The next graphs will tell us what quantities those are. .
- Drop Behavior.
  - The two quantities that packets are dropped at are 32 and 12.
- Sequence Number X RTT
3M, Congested but sending 30 packets at 30ms intervals.
- Phase plot.
- Phase plot with lines.
- Drop Behavior.
- Sequence Number X RTT
One-way 3M, Congested. To verify that the drop behavior is the same regardless of one-way or RTT.
- Drop behavior.
- Sequence number X RTT
  - Points.
  - Lines.

Open Questions

Why is the FreeBSD router dropping packets in clusters exactly 15ms apart? Why are these more packets in the queue (~73) than the FreeBSd code allows (50). Some guesses:
- Alastair thought that maybe that the outgoing queue is clearing, but the software isn't getting the interrupt until after 23 packets have been sent.
- Mike had some ideas for figuring out what to do next: just a printf() for the queue size in the kernel for both the incoming and outgoing queues might be interesting. He suggested just building a new kernel to do that rather than running a debugger on the existing kernel.
- Need to be sure to check into what stats are already generated before rebuilding a kernel to collect the same stat.

How are the user-specified delays used in ns? Are they end-to-end delays, wire delays, something else?

Would manipulating queue length help the simulator times? Because I did not know the queue lengths for the respective platforms, I left the simulator's queue lengths undefined. One of two things probably happened: either a default queue-length was assumed or an infinite length was assumed. A longer simulated queue would result in longer packet delays; a shorter queue would result in shorter delays, but more dropped packets.

Should we ask one of the current ns developers to take a look and give their opinions?

If we choose to do so, I think we could proceed like so:
- Fix ns or modify our nodes so that these experiments come up with the same results.
- Proceed with more complicated experiments -- compare non-fixed ns with changed ns results. Fix ns where necessary.
- Collaborate with someone tasked with calibrating ns. This would be a great use for the testbed.
It's important to fix ns because otherwise we're not really making any contribution to the community or learning much of anything other than that, gee, the results differ.

Payload vs. Packet Size. Kevin pointed out that in some places above I should be using the term "payload" above. This brought up the question: how does ns interpret the user-defined packetsize_ variable: as payload size or packet size? If packet size, does ns add on bytes for IP and link-level headers? I'm guessing "packet size" and "no".

Kristin Wright

Last modified: Tue Jan 25 16:28:18 MST 2000