[e2e] Why do we need TCP flow control (rwnd)?
Fred Baker
fred at cisco.com
Tue Jul 1 11:26:17 PDT 2008
On Jul 1, 2008, at 9:03 AM, David P. Reed wrote:
> From studying actual congestive collapses, one can figure out how to
> prevent them.
OK, glad to hear it. I apologize for the form of the data I will
offer; it is of the rawest type. But you may find it illuminating.
Before I start, please understand that the network I am about to
discuss had a very serious problem at one time and has since fixed it.
So while the charts are a good illustration of a bad situation, this
discussion should not reflect negatively on the network as it is
today.
The scenario is a university network in Africa that, at the time,
connected to the great wide world via VSAT. It had O(20K) students
behind a pair of links from two companies, one at 512 KBPS and one at
1 MBPS. I was there in 2004 and had a file (my annual performance
review) that I needed to upload. The process took all day and even at
the end of it failed to complete. Wondering why, I whipped out that
great little bit of shareware named PingPlotter (which I found very
useful back when I ran on a Windows system) and took this picture:
ftp://ftpeng.cisco.com/fred/collapse/ams3-dmzbb-gw1.cisco.com.gif
The second column from the left is the loss rate; as you can see, it
was between 30 and 40%. The little red lines across the bottom mark
individual losses and show that they were ongoing.
Based on this data, I convinced the school to increase its bandwidth
by an order of magnitude. It kept the same two links, but now at 5 and
10 MBPS. Six months later I repeated the experiment, this time from
the other end and without PingPlotter, because I had changed computers
and PingPlotter doesn't run on my Mac (wah!). Instead, I ran
simultaneous pings to the system's two addresses, and in so doing
measured the ping RTT on both the 5 and 10 MBPS paths. The difference
between the raw file and the "edited" file below is 22 data points
that were clear outliers. You will see very clearly that the 10 MBPS
path was not overloaded and didn't experience significant queuing
delays (although the satellite delay is pretty obvious), while the 5
MBPS path was heavily loaded throughout the day and many samples were
in the 2000 ms ballpark.
ftp://ftpeng.cisco.com/fred/collapse/Makerere-April-4-2005-edited.pdf
ftp://ftpeng.cisco.com/fred/collapse/Makerere-April-4-2005.pdf
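A rough way to reproduce this kind of measurement without PingPlotter
might look like the sketch below: it pings two addresses in parallel
and logs each reply's RTT with a timestamp. The addresses are
placeholders (TEST-NET-1), the options assume a Linux-style ping, and
this is only an illustrative stand-in, not the tool actually used.

    #!/usr/bin/env python3
    # Rough sketch: probe two addresses in parallel, log per-reply RTT.
    # The hosts below are placeholders, not the addresses measured above.
    import re, subprocess, threading, time

    HOSTS = ["192.0.2.1", "192.0.2.2"]   # placeholder addresses
    INTERVAL = 2.0                        # seconds between probes per host

    def probe(host):
        while True:
            # One echo request; -W caps the wait so losses show up quickly.
            out = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                                 capture_output=True, text=True).stdout
            m = re.search(r"time=([\d.]+) ms", out)
            rtt = m.group(1) + " ms" if m else "lost"
            print(time.strftime("%H:%M:%S"), host, rtt, flush=True)
            time.sleep(INTERVAL)

    for h in HOSTS:
        threading.Thread(target=probe, args=(h,), daemon=True).start()
    time.sleep(8 * 3600)   # log for a working day, then exit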
The delay distribution, for all that high delay on the 5 MBPS path, is
surprisingly similar to what one finds on any other link. Visually, it
could be confused with a Poisson distribution.
ftp://ftpeng.cisco.com/fred/collapse/Makerere-April-4-2005-delay-distribution.pdf
Looking at it on a log-linear scale, however, the difference between
the two links becomes pretty obvious. The overprovisioned link looks
normal, but the saturated link shows clearly bimodal behavior. When it's
not all that busy, delays are nominal, but it has a high density
around 2000 ms RTT and a scattering in between. When it is saturated -
which it is much of the day - TCP is driving to the cliff, and the
link's timing reflects the fact.
ftp://ftpeng.cisco.com/fred/collapse/Makerere-April-4-2005-log-linear-delay-distribution.pdf
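To illustrate why the log-linear view is the revealing one, here is a
small sketch that plots the same histogram on a linear and on a log
y-axis. The samples are synthetic, generated only to mimic the shape
described above (nominal satellite RTTs plus a heavy cluster near
2000 ms); they are not the measured data.

    # Sketch: the same RTT histogram on linear and log-linear axes.
    # The samples below are synthetic and purely illustrative.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    rtt_ms = np.concatenate([
        rng.normal(650, 60, 20000),   # lightly loaded: nominal satellite RTT
        rng.normal(2000, 120, 1500),  # saturated: queue near full
        rng.uniform(700, 1900, 800),  # a scattering in between
    ])

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.hist(rtt_ms, bins=100)
    ax1.set(title="linear", xlabel="RTT (ms)", ylabel="samples")
    ax2.hist(rtt_ms, bins=100)
    ax2.set_yscale("log")
    ax2.set(title="log-linear", xlabel="RTT (ms)")
    plt.tight_layout()
    plt.show()

With proportions like these, the second mode is nearly invisible on
the linear axis but obvious on the logarithmic one.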
A sample size of one is an example, not a study - data, not
information. But I think the example, coupled with our knowledge of
queuing theory and general experience, supports four comments:
(1) there ain't nothin' quite like having enough bandwidth. If the
offered load vastly exceeds capacity, nobody gets anything done. This
is the classic congestive collapse scenario as predicted in RFCs 896
and 970. A review of Nagle's game theory discussion in RFC 970 is
illuminating.
(2) there ain't nothin' quite like having enough bandwidth. In a
statistical network, if the offered load approximates capacity, delay
is maximized, and loss (which is the extreme case of delay) erodes the
network's effectiveness; the back-of-the-envelope calculation after
these comments puts rough numbers on this.
(3) TCP's congestion control algorithms seek to maximize throughput,
but will work with whatever capacity they find available. If a link is
in a congestive collapse scenario, increasing capacity by an order of
magnitude results in TCP being released to increase its windows and,
through the "fast retransmit" heuristic, recover from occasional
losses in stride. It will do so, and the result will be to use the
available capacity regardless of what it is.
(4) congestion control algorithms that tune to the cliff obtain no
better throughput than algorithms that tune to the knee. That is by
definition: both the knee and the cliff maximize throughput, but the
cliff also maximizes queue depth at the bottleneck. Hence, algorithms
that tune to the knee are no worse for the individual end system, but
better for the network and the aggregate of its users. The difference
between a link that is overprovisioned and one on which offered load
approximates capacity is that on the former TCP moves data freely,
while on the latter TCP has to work around the fragility in the
network to provide adequate service in the face of performance issues.
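To put rough numbers behind comment (2), the textbook M/M/1 result for
mean time in system, W = 1/(mu - lambda), shows delay growing without
bound as offered load approaches capacity. The sketch below assumes a
512 KBPS link and 1500-byte packets purely for illustration; it is not
derived from the measurements above.

    # Back-of-the-envelope M/M/1 delay vs. utilization.
    # Link speed and packet size are assumptions for illustration only.
    LINK_BPS = 512_000                 # 512 KBPS link
    PKT_BITS = 1500 * 8                # 1500-byte packets
    mu = LINK_BPS / PKT_BITS           # service rate, ~42.7 packets/s

    for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
        lam = rho * mu                 # offered load, packets/s
        w_ms = 1000.0 / (mu - lam)     # mean time in system, ms
        print(f"utilization {rho:.0%}: mean delay ~ {w_ms:6.1f} ms")

Even with these made-up numbers the shape matches the plots: delay is
modest up to the knee and then climbs very steeply, into the
2000-ms-and-beyond range, as utilization approaches 1.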