[e2e] Open the floodgate

Thu Apr 22 06:26:50 PDT 2004

Alex - your note reflects a tremendous misunderstanding of TCP.   Is TCP 
supposed to correct for failing hardware anywhere on the path?   The answer 
is no.   TCP is a protocol that provides end-to-end error control, which 
ensures error-free delivery over best efforts networks.

Does TCP's end-to-end error control obviate the need for local error 
control?   No - it was never supposed to.

The end-to-end analysis applies here:

1. end-to-end reliability cannot be provided at the link level.   Thus we 
must provide it on an end-to-end basis.

2. there is a vast improvement in the operating point that can be achieved 
by doing local error recovery at the link (or within an AS) level - local 
error recovery allows for tighter control loops, and should be done without 
adding to the end-to-end delay.   Thus it is appropriate to optimize 
(improve) the performance of links by retransmission.
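
To put rough numbers on the two points above, here is a minimal sketch in 
Python.   It assumes independent losses with the same loss rate on every 
hop and ignores lost acknowledgements; the function names and the figures 
in the usage note are made up for illustration, not measurements.

```python
def path_delivery_probability(link_loss, hops, link_attempts=1):
    """Probability that one packet crosses every hop of the path.

    Assumes each link loses a transmission independently with
    probability link_loss and tries at most link_attempts times.
    """
    per_link_success = 1.0 - link_loss ** link_attempts
    return per_link_success ** hops


def expected_end_to_end_sends(link_loss, hops, link_attempts=1):
    """Mean number of end-to-end transmissions until one copy arrives.

    The sender repeats until success, so the count is geometric with
    success probability equal to the path delivery probability.
    """
    return 1.0 / path_delivery_probability(link_loss, hops, link_attempts)


if __name__ == "__main__":
    for attempts in (1, 3):
        p = path_delivery_probability(0.1, 5, attempts)
        n = expected_end_to_end_sends(0.1, 5, attempts)
        print(f"link attempts={attempts}: delivery={p:.5f}, mean sends={n:.3f}")
```

With 10% loss per link over 5 hops, a single attempt per link delivers 
about 59% of packets, while up to three attempts per link delivers over 
99%.   The figure never reaches 100%, which is point 1: the end-to-end 
check is still needed, and the link-level retries only improve the 
operating point.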

The second point also captures what Bob Kahn crystallized in creating the 
Internet - the concept of "best efforts".   The word "best" clearly does 
not mean "no effort".   What it means is a subset of the end-to-end 
argument - do what you can where what you do is unambiguously helpful, but 
don't take on the impossible burden of assuring high-level properties with 
low-level mechanisms.

The worst botch I have ever seen in my consulting to commercial network 
installations was a Fortune 500 company that really misunderstood 
this.   They had been convinced to put in frame relay links between all 
their sites, and to use frame relay's "perfect end-to-end" delivery mode 
between their locations.    That's not a "best efforts" link if you think 
about it - it's a stranded soldier maintaining fanatical adherence to duty 
20 years after the war is over.

What happened?   If any link downstream failed (turned off), the frame 
relay link started filling buffers in every underlying switch.   The 
buffers took many seconds to fill up; then, when the downstream link came 
back up, they dumped many seconds' worth of completely useless traffic 
into the destinations.
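
The failure mode is easy to reproduce in a toy model.   The sketch below is 
not a model of frame relay itself; it assumes a single queue, a fixed 
arrival rate, and an instantaneous drain when the link returns, and every 
name and parameter in it is hypothetical.

```python
from collections import deque


def simulate_outage(outage_ticks, buffer_limit, stale_after, arrivals_per_tick=1):
    """Return (stale_delivered, dropped) for one outage and its recovery.

    buffer_limit=None models a link that queues everything it is given;
    an integer models a bounded drop-tail buffer.  A packet counts as
    stale if it waited more than stale_after ticks before release.
    """
    queue = deque()
    dropped = 0
    for tick in range(outage_ticks):
        for _ in range(arrivals_per_tick):
            if buffer_limit is not None and len(queue) >= buffer_limit:
                dropped += 1          # buffer full: discard the new arrival
            else:
                queue.append(tick)    # remember when the packet was queued
    # The link comes back at tick == outage_ticks and the queue is released.
    now = outage_ticks
    stale = sum(1 for queued_at in queue if now - queued_at > stale_after)
    return stale, dropped


if __name__ == "__main__":
    print("queue everything:", simulate_outage(100, None, 10))
    print("bounded buffer:  ", simulate_outage(100, 5, 10))
```

With a 100-tick outage and a 10-tick staleness threshold, the link that 
queues everything releases 90 stale packets on recovery, while the 
5-packet buffer releases at most 5 and drops the rest.   The drops are 
left for the end-to-end protocol to repair, and it can decide what is 
still worth resending.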

The frame-relay sales engineer just could not understand why turning off 
his low-level reliability made his customer happier.  In fact, he kept 
trying to get them to turn it back on - saying that the problem must have 
been with the routers.

Ultimately, this is the all-too-human problem of perseverating based on an 
incorrect theory of the world.   There's nothing wrong with theories, but 
their utility depends on matching their assumptions to reality.   The 
reality of the Internet is not the reality of traditional control theory.

Control theory
	- in the presence of competing and evolving goals at the user level (no 
single objective function to maximize, but instead a need to develop the 
most flexibility - that is the most diverse set of stable operating points 
in control theoretic terms) and
	- in the presence of highly coupled interactions with the clients (the WWW 
invented caching, which changed the operating point in a completely 
unpredictable way, without consulting the network planners) and
	- in the presence of an evolving set of underlying communications technologies

is now a new science.  This is partly because of people like John Doyle and 
Sally Floyd who took on the challenge of constructing a new control theory 
to match the requirements of the Internet.   Yes, everyone involved in 
developing TCP knows control theory.  But few of them have the illusion 
that the world exists to fit that theory.