[e2e] Why do we need TCP flow control (rwnd)?
David P. Reed
dpreed at reed.com
Tue Jul 1 12:36:06 PDT 2008
I think of this kind of stuff as *real* network research, Fred. Good
stuff.
Fred Baker wrote:
> On Jul 1, 2008, at 9:03 AM, David P. Reed wrote:
>> From studying actual congestive collapses, one can figure out how to
>> prevent them.
>
> OK, glad to hear it. I apologize for the form of the data I will
> offer; it is of the rawest type. But you may find it illuminating.
> Before I start, please understand that the network I am about to
> discuss had a very serious problem at one time and has fixed it since.
> So while the charts are a great example of a bad issue, this
> discussion should not reflect negatively on the present network in
> question.
>
> The scenario is a network in Africa that connected at the time to the
> great wide world via VSAT. It is a university, and at the time had
> O(20K) students behind a pair of links from two companies, one at 512
> KBPS and one at 1 MBPS. I was there in 2004 and had a file (my annual
> performance review) that I needed to upload. The process took all day
> and even at the end of it failed to complete. I wondered why, so I
> whipped out that great little bit of shareware named PingPlotter (which I
> found very useful back when I ran on a Windows system) and took this
> picture:
>
> ftp://ftpeng.cisco.com/fred/collapse/ams3-dmzbb-gw1.cisco.com.gif
>
> The second column from the left is the loss rate; as you can see, it
> was between 30 and 40%. The little red lines across the bottom mark
> individual losses and show that the losses were ongoing.
>
> Based on this data, I convinced the school to increase its bandwidth
> by an order of magnitude. It had the same two links, but now at 5 and
> 10 MBPS. Six months later I repeated the experiment but from the other
> end, this time not using PingPlotter because I had changed computers
> and PingPlotter doesn't run on my Mac (wah!). The difference between
> the raw file and the "edited" file is 22 data points that were clear
> outliers. In this run, I sent simultaneous pings to the system's two
> addresses and so measured the ping RTT on both the 5 MBPS and the 10
> MBPS paths. You will see very clearly that the 10 MBPS path was not
> overloaded and didn't experience significant queuing delays (although
> the satellite delay is pretty obvious), while the 5 MBPS path was
> heavily loaded throughout the day and many samples were in the 2000 ms
> ballpark.
>
> ftp://ftpeng.cisco.com/fred/collapse/Makerere-April-4-2005-edited.pdf
> ftp://ftpeng.cisco.com/fred/collapse/Makerere-April-4-2005.pdf
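>
> For anyone who wants to repeat that kind of measurement, here is a
> minimal sketch in Python of the same idea: ping two addresses once a
> second and log the RTT seen on each path side by side. The addresses
> below are placeholders, not the school's, and it assumes a Unix-style
> ping whose output contains "time=" (the "-W 2" reply timeout is the
> Linux flavor - adjust for other ping variants). It is not what
> PingPlotter does internally.
>
> # Sketch only: one ping per second to each of two addresses,
> # printing the RTT (ms) seen on each path, or "lost" on no reply.
> import re
> import subprocess
> import time
>
> HOSTS = ["192.0.2.1", "192.0.2.2"]   # placeholders: one address per upstream link
> RTT = re.compile(r"time[=<]([\d.]+)")
>
> def ping_once(host):
>     """One ping; return RTT in ms, or None on loss/timeout."""
>     out = subprocess.run(["ping", "-c", "1", "-W", "2", host],
>                          capture_output=True, text=True).stdout
>     m = RTT.search(out)
>     return float(m.group(1)) if m else None
>
> while True:
>     samples = [ping_once(h) for h in HOSTS]
>     print(time.strftime("%H:%M:%S"),
>           *(f"{r:8.1f}" if r is not None else "    lost" for r in samples))
>     time.sleep(1)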
>
> The delay distribution, for all that high delay on the 5 MBPS path, is
> surprisingly similar to what one finds on any other link. Visually, it
> could be confused with a Poisson distribution.
>
>
> ftp://ftpeng.cisco.com/fred/collapse/Makerere-April-4-2005-delay-distribution.pdf
>
>
> Looking at it on a log-linear scale, however, the difference between the two
> links becomes pretty obvious. The overprovisioned link looks pretty
> normal, but the saturated link has a clear bimodal behavior. When it's
> not all that busy, delays are nominal, but it has a high density
> around 2000 ms RTT and a scattering in between. When it is saturated -
> which it is much of the day - TCP is driving to the cliff, and the
> link's timing reflects the fact.
>
>
> ftp://ftpeng.cisco.com/fred/collapse/Makerere-April-4-2005-log-linear-delay-distribution.pdf
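>
> That bimodality is easy to check for in any RTT trace. A minimal
> sketch, assuming the samples are already loaded into a Python list
> (in milliseconds) and matplotlib is available, that plots the same
> histogram on linear and on log-linear axes:
>
> # Sketch: the same RTT histogram twice, linear and log-linear; the
> # log-linear view makes a second mode near the maximum queue delay
> # stand out.  'rtts_ms' is assumed to be loaded elsewhere.
> import matplotlib.pyplot as plt
>
> def plot_delay_distribution(rtts_ms, bins=100):
>     fig, (lin, log) = plt.subplots(1, 2, figsize=(10, 4))
>     for ax in (lin, log):
>         ax.hist(rtts_ms, bins=bins)
>         ax.set_xlabel("RTT (ms)")
>         ax.set_ylabel("samples per bin")
>     log.set_yscale("log")
>     lin.set_title("linear")
>     log.set_title("log-linear")
>     plt.tight_layout()
>     plt.show()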
>
>
> A sample size of one is an example, not a study - data, not
> information. But I think the example, coupled with our knowledge of
> queuing theory and general experience, supports four comments:
>
> (1) there ain't nothin' quite like having enough bandwidth. If the
> offered load vastly exceeds capacity, nobody gets anything done. This
> is the classic congestive collapse scenario as predicted in RFCs 896
> and 970. A review of Nagle's game theory discussion in 970 is
> illuminating.
>
> (2) there ain't nothin' quite like having enough bandwidth. In a
> statistical network, if the offered load approximates capacity, delay
> is maximized, and loss (which is the extreme case of delay) erodes the
> network's effectiveness.
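>
> The queuing-theory side of (2) is easy to make concrete: for a single
> M/M/1 queue the mean time in system is W = 1/(mu - lambda), which
> grows without bound as offered load approaches capacity. A few lines
> of Python make the point; the 5 MBPS link draining 1500-byte packets
> is only a stand-in for a bottleneck, not a model of the actual path:
>
> # Toy M/M/1 illustration: W = 1/(mu - lambda) blows up near capacity.
> LINK_BPS = 5_000_000                  # stand-in bottleneck rate
> PKT_BITS = 1500 * 8
> mu = LINK_BPS / PKT_BITS              # packets the link can serve per second
>
> for rho in (0.5, 0.8, 0.9, 0.95, 0.99, 0.999):
>     lam = rho * mu                    # offered load as a fraction of capacity
>     w_ms = 1000.0 / (mu - lam)        # mean time in system, milliseconds
>     print(f"load {rho:6.1%} -> mean delay {w_ms:8.1f} ms")
>
> At 99.9% load the toy queue sits around 2.4 seconds - at least in the
> neighborhood of the 2000 ms samples on the 5 MBPS path above.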
>
> (3) TCP's congestion control algorithms seek to maximize throughput,
> but will work with whatever capacity they find available. If a link is
> in a congestive collapse scenario, increasing capacity by an order of
> magnitude results in TCP being released to increase its windows and,
> through the "fast retransmit" heuristic, recover from occasional
> losses in stride. It will do so, and the result will be to use the
> available capacity regardless of what it is.
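>
> That "use whatever capacity it finds" behavior falls out of additive
> increase / multiplicative decrease. The toy single-flow sketch below
> (illustrative numbers only: no slow start, no RTO, a loss whenever the
> window reaches the path's capacity in segments or at random with small
> probability) shows the window growing to whatever the path will carry
> and recovering from isolated losses by halving rather than collapsing:
>
> import random
>
> random.seed(1)
>
> def aimd(capacity_segments, rtts, loss_prob=0.001):
>     """Toy AIMD: +1 segment per RTT, halve on loss; one flow, fixed RTT."""
>     cwnd = 1.0
>     for t in range(rtts):
>         if cwnd >= capacity_segments or random.random() < loss_prob:
>             cwnd = max(1.0, cwnd / 2)    # loss -> fast-retransmit-style halving
>         else:
>             cwnd += 1.0                  # congestion avoidance
>         yield t, cwnd
>
> # Before and after a 10x "upgrade" of the path's capacity in segments:
> for cap in (40, 400):
>     trace = [w for _, w in aimd(cap, rtts=2000)]
>     print(f"capacity {cap:4d} segments -> average cwnd {sum(trace) / len(trace):6.1f}")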
>
> (4) congestion control algorithms that tune to the cliff obtain no
> better throughput than algorithms that tune to the knee. That is by
> definition: both the knee and the cliff maximize throughput, but the
> cliff also maximizes queue depth at the bottleneck. Hence, algorithms
> that tune to the knee are no worse for the individual end system, but
> better for the network and the aggregate of its users. The difference
> between a link that is overprovisioned and one on which offered load
> approximates capacity is that on one TCP moves data freely while on
> the other TCP works around the fragility in the network to provide
> adequate service in the face of performance issues.
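>
> One way to make the knee concrete is the "power" metric, throughput
> divided by delay. In the same toy M/M/1 stand-in as above, power
> peaks well before saturation, while raw throughput keeps creeping
> toward capacity as the queue (and hence the RTT) explodes. It is an
> open-loop calculation, so it cannot show the retransmissions that eat
> the apparent extra throughput at the cliff, but the shape of the
> trade-off is clear:
>
> # Same M/M/1 stand-in: past the knee, extra offered load buys a
> # little throughput and a lot of delay; throughput/delay ("power")
> # has already peaked.
> mu = 400.0                                # bottleneck service rate, pkts/s
>
> print(f"{'load':>6} {'thruput':>8} {'delay ms':>9} {'power':>10}")
> for rho in (0.3, 0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
>     lam = rho * mu                        # carried load, pkts/s
>     delay = 1.0 / (mu - lam)              # mean time in system, seconds
>     power = lam / delay                   # "power" = throughput / delay
>     print(f"{rho:6.2f} {lam:8.0f} {delay * 1000:9.1f} {power:10.0f}")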
>