[e2e] Reacting to corruption based loss
David P. Reed
dpreed at reed.com
Tue Jun 7 07:22:08 PDT 2005
There are two effects that corruption causes.
First, it lowers the end-to-end error-free capacity of the channel.
Second, it causes congestion: once corruption lowers the potential
end-to-end error-free rate of the channel, the existing offered load can
exceed it.
Clearly, the response of lowering the input rate is one possible way to
deal with the second phenomenon. However, this second effect is
indistinguishable from bottleneck congestion or overload congestion.
So, given that we already have an effective way to deal with transient
overload, why would "corruption" need a new layered interface change?
Any key difference, then, must relate to the first effect.
It is well known that there are good reasons to create codes that cross
packet boundaries. So-called erasure codes and digital fountain
techniques give endpoints an end-to-end way to deal with data losses
that are packet-centric. If errors are "bursty" in time, spreading any
particular end-to-end bit across several packets (or even across several
paths with independent failures) is a good end-to-end response to
corruption.
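(As a minimal concrete sketch, not anything proposed in this thread: a
single XOR parity packet per group is the simplest erasure code. The
names and framing below are made up for illustration; real digital
fountain schemes such as LT or Raptor codes are far more flexible.)

    def xor_blocks(blocks):
        # Bytewise XOR of equal-length byte strings (pad in practice).
        out = bytearray(len(blocks[0]))
        for blk in blocks:
            for i, b in enumerate(blk):
                out[i] ^= b
        return bytes(out)

    def encode(data_packets):
        # k data packets -> k+1 packets; the extra one is XOR parity.
        return data_packets + [xor_blocks(data_packets)]

    def recover(received):
        # received: the k+1 packets with exactly one replaced by None.
        # XOR of all k+1 packets is zero, so XOR-ing the survivors
        # reproduces the missing one, whether it was data or parity.
        return xor_blocks([p for p in received if p is not None])

    # Lose any one packet in a group and rebuild it end to end.
    group = encode([b"pkt0", b"pkt1", b"pkt2"])
    damaged = [None if i == 1 else p for i, p in enumerate(group)]
    assert recover(damaged) == group[1]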
So the utility of separating corruption losses from overload losses is
the ability to code better. Suppose a packet's header is salvageable but
its data is not (perhaps putting an error-correcting code on the header,
rather than a checksum, would help here!). Would delivering such a
packet and decoding it at the endpoint improve the effective end-to-end
capacity? Absolutely - if there are priors that give you a reasonable
error model.
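(To make the checksum-versus-code distinction concrete, here is a toy
sketch using a 3x repetition code with majority vote; a real design
would use something like a Hamming or BCH code, and everything below is
illustrative only. A checksum over the same header could only detect the
flipped byte and force a discard.)

    def encode_header(header, copies=3):
        # Toy correcting code: repeat each header byte `copies` times.
        return bytes(b for b in header for _ in range(copies))

    def decode_header(coded, copies=3):
        # Majority vote per position repairs what a checksum could
        # only detect (detection would force discarding the packet).
        out = bytearray()
        for i in range(0, len(coded), copies):
            group = coded[i:i + copies]
            out.append(max(set(group), key=group.count))
        return bytes(out)

    coded = bytearray(encode_header(b"\x45\x00"))
    coded[0] ^= 0xFF                 # corrupt one copy of one byte
    assert decode_header(bytes(coded)) == b"\x45\x00"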
But the real question here is about coding a stream across a network
with packet corruption. It is probably better to look at this from the
end-to-end perspective, which includes such things as decoding latency
(spreading a bit across successive packets adds latency before it can be
decoded at the receiver) and control-loop latency (how fast the
endpoints can change the coding of a stream to spread across more
packets and more paths, compared to a more local, rapid, link-level
response).
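(A rough worked example of the first cost, with illustrative numbers not
taken from this post: a symbol striped across k packets is only
decodable once the k-th packet arrives.)

    def added_decode_delay_ms(k, inter_packet_ms):
        # Striping one symbol across k packets delays its decoding
        # until the k-th packet arrives: ~(k - 1) packet gaps extra.
        return (k - 1) * inter_packet_ms

    # Illustrative numbers: packets sent every 20 ms.
    for k in (1, 4, 16):
        print(k, added_decode_delay_ms(k, 20.0))  # 0, 60, 300 ms
    # An end-to-end change of spreading factor, by contrast, takes
    # at least one RTT before it even takes effect.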
The observation that 802.11 slows rates automatically based on link
quality points out the issue here - such a local tactic improves all
end-to-end paths in one fell swoop, whereas there is the possibility
that end-to-end responses will be too slow, or else drive each other
into mutual instability if link quality changes faster than the
end-to-end control loop timing can resolve.
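(For concreteness: one published scheme of this kind is ARF, Automatic
Rate Fallback. The sketch below is a simplified ARF-like loop; the
thresholds are arbitrary illustrative choices, and real implementations
are vendor-specific. The point is that it reacts per transmitted frame,
far inside one end-to-end RTT.)

    RATES_MBPS = [1, 2, 5.5, 11]       # 802.11b rate set

    class ArfLikeAdapter:
        # Simplified ARF-style logic: fall back a rate after a few
        # consecutive failures, probe a faster rate after a run of
        # successes. The thresholds (2 and 10) are illustrative.
        def __init__(self):
            self.idx = len(RATES_MBPS) - 1
            self.fails = 0
            self.oks = 0

        def on_tx_result(self, acked):
            if acked:
                self.oks += 1
                self.fails = 0
                if self.oks >= 10 and self.idx < len(RATES_MBPS) - 1:
                    self.idx += 1      # probe the next faster rate
                    self.oks = 0
            else:
                self.fails += 1
                self.oks = 0
                if self.fails >= 2 and self.idx > 0:
                    self.idx -= 1      # fall back within a frame time
                    self.fails = 0
            return RATES_MBPS[self.idx]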
I'd argue that intuitions of most protocol designers are weak here,
because the state of the system as a whole is not best managed either at
the link level or at the end-to-end "session" level - but at the whole
network level. RED and ECN are decentralized "network level" control
strategies - which end up providing a control plane that is implicit
among all those who share a common bottleneck link. Similarly, coding
strategies that can deal with "corruption" require a "network level"
implicit control, not an intuitive fix focused on the TCP state machine.
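(For reference, the heart of RED, per Floyd and Jacobson 1993, is an
EWMA of queue length plus a marking/drop probability that rises linearly
between two thresholds. A minimal sketch follows, omitting RED's
inter-mark count logic and idle-time correction; parameter values are
illustrative.)

    import random

    class RedQueue:
        # Minimal RED sketch: EWMA of the instantaneous queue length,
        # with drop/mark probability rising linearly from 0 at min_th
        # to max_p at max_th. Parameter values are illustrative.
        def __init__(self, min_th=5, max_th=15, max_p=0.1, w=0.002):
            self.min_th, self.max_th = min_th, max_th
            self.max_p, self.w = max_p, w
            self.avg = 0.0

        def on_enqueue(self, queue_len):
            # True => drop this packet (or set its ECN mark instead).
            self.avg = (1 - self.w) * self.avg + self.w * queue_len
            if self.avg < self.min_th:
                return False
            if self.avg >= self.max_th:
                return True
            frac = (self.avg - self.min_th) / (self.max_th - self.min_th)
            return random.random() < self.max_p * frac

Each router runs this purely locally, yet every flow crossing a shared
bottleneck sees correlated marks - which is what makes the resulting
control plane "implicit".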