[e2e] Reacting to corruption based loss
David P. Reed
dpreed at reed.com
Tue Jun 7 07:22:08 PDT 2005
There are two effects that corruption causes.
First, it lowers the end-to-end error-free capacity of the channel.
Second, it causes congestion: once corruption lowers the potential
end-to-end error-free rate of the channel, the existing offered load can
exceed it.
Clearly, the response of lowering the input rate is one possible way to
deal with the second phenomenon. However, this second effect is
indistinguishable from bottleneck congestion or overload congestion.
So, given that we already have an effective way to deal with transient
overload, why would "corruption" need a new layered interface change?
Any key difference, then, must relate to the first effect.
It is well known that there are good reasons to create codes that cross
packet boundaries. So-called erasure codes and digital fountain
techniques give endpoints an end-to-end way to deal with data losses
that are packet-centric. If errors are "bursty" in time, spreading any
particular end-to-end bit across several packets (or even across several
paths with independent failures) is a good end-to-end response to
corruption.
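(As a minimal concrete sketch, not anything proposed in this thread: a
single XOR parity packet per group is the simplest erasure code. The
names and framing below are made up for illustration; real digital
fountain schemes such as LT or Raptor codes are far more flexible.)

    def xor_blocks(blocks):
        # Bytewise XOR of equal-length byte strings (pad in practice).
        out = bytearray(len(blocks[0]))
        for blk in blocks:
            for i, b in enumerate(blk):
                out[i] ^= b
        return bytes(out)

    def encode(data_packets):
        # k data packets -> k+1 packets; the extra one is XOR parity.
        return data_packets + [xor_blocks(data_packets)]

    def recover(received):
        # received: the k+1 packets with exactly one replaced by None.
        # XOR of all k+1 packets is zero, so XOR-ing the survivors
        # reproduces the missing one, whether it was data or parity.
        return xor_blocks([p for p in received if p is not None])

    # Lose any one packet in a group and rebuild it end to end.
    group = encode([b"pkt0", b"pkt1", b"pkt2"])
    damaged = [None if i == 1 else p for i, p in enumerate(group)]
    assert recover(damaged) == group[1]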
So the utility of separating corruption losses from overload losses is
the ability to code better. Suppose a packet's header is salvageable but
its data is not (perhaps putting an error-correcting code on the header,
rather than a checksum, would help here!). Would delivering such a
packet and decoding it at the endpoint improve the effective end-to-end
capacity? Absolutely - if there are priors that give you a reasonable
error model.
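(To make the checksum-versus-code distinction concrete, here is a toy
sketch using a 3x repetition code with majority vote; a real design
would use something like a Hamming or BCH code, and everything below is
illustrative only. A checksum over the same header could only detect the
flipped byte and force a discard.)

    def encode_header(header, copies=3):
        # Toy correcting code: repeat each header byte `copies` times.
        return bytes(b for b in header for _ in range(copies))

    def decode_header(coded, copies=3):
        # Majority vote per position repairs what a checksum could
        # only detect (detection would force discarding the packet).
        out = bytearray()
        for i in range(0, len(coded), copies):
            group = coded[i:i + copies]
            out.append(max(set(group), key=group.count))
        return bytes(out)

    coded = bytearray(encode_header(b"\x45\x00"))
    coded[0] ^= 0xFF                 # corrupt one copy of one byte
    assert decode_header(bytes(coded)) == b"\x45\x00"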
But the real question here is about coding a stream across a network
with packet corruption. It is probably better to look at this from the
end-to-end perspective, which includes such things as decoding latency
(spreading a bit across successive packets adds latency before it can be
decoded at the receiver) and control-loop latency (how fast the
endpoints can change the coding of a stream to spread across more
packets and more paths, compared to a more local, rapid, link-level
response).
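(A rough worked example of the first cost, with illustrative numbers not
taken from this post: a symbol striped across k packets is only
decodable once the k-th packet arrives.)

    def added_decode_delay_ms(k, inter_packet_ms):
        # Striping one symbol across k packets delays its decoding
        # until the k-th packet arrives: ~(k - 1) packet gaps extra.
        return (k - 1) * inter_packet_ms

    # Illustrative numbers: packets sent every 20 ms.
    for k in (1, 4, 16):
        print(k, added_decode_delay_ms(k, 20.0))  # 0, 60, 300 ms
    # An end-to-end change of spreading factor, by contrast, takes
    # at least one RTT before it even takes effect.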
The observation that 802.11 slows rates automatically based on link
quality points out the issue here - such a local tactic improves all
end-to-end paths in one fell swoop, whereas there is the possibility
that end-to-end responses will be too slow, or else drive each other
into mutual instability if link quality changes faster than the
end-to-end control loop timing can resolve.
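(For concreteness: one published scheme of this kind is ARF, Automatic
Rate Fallback. The sketch below is a simplified ARF-like loop; the
thresholds are arbitrary illustrative choices, and real implementations
are vendor-specific. The point is that it reacts per transmitted frame,
far inside one end-to-end RTT.)

    RATES_MBPS = [1, 2, 5.5, 11]       # 802.11b rate set

    class ArfLikeAdapter:
        # Simplified ARF-style logic: fall back a rate after a few
        # consecutive failures, probe a faster rate after a run of
        # successes. The thresholds (2 and 10) are illustrative.
        def __init__(self):
            self.idx = len(RATES_MBPS) - 1
            self.fails = 0
            self.oks = 0

        def on_tx_result(self, acked):
            if acked:
                self.oks += 1
                self.fails = 0
                if self.oks >= 10 and self.idx < len(RATES_MBPS) - 1:
                    self.idx += 1      # probe the next faster rate
                    self.oks = 0
            else:
                self.fails += 1
                self.oks = 0
                if self.fails >= 2 and self.idx > 0:
                    self.idx -= 1      # fall back within a frame time
                    self.fails = 0
            return RATES_MBPS[self.idx]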
I'd argue that intuitions of most protocol designers are weak here,
because the state of the system as a whole is not best managed either at
the link level or at the end-to-end "session" level - but at the whole
network level. RED and ECN are decentralized "network level" control
strategies - which end up providing a control plane that is implicit
among all those who share a common bottleneck link. Similarly, coding
strategies that can deal with "corruption" require a "network level"
implicit control, not an intuitive fix focused on the TCP state machine.
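(For reference, the heart of RED, per Floyd and Jacobson 1993, is an
EWMA of queue length plus a marking/drop probability that rises linearly
between two thresholds. A minimal sketch follows, omitting RED's
inter-mark count logic and idle-time correction; parameter values are
illustrative.)

    import random

    class RedQueue:
        # Minimal RED sketch: EWMA of the instantaneous queue length,
        # with drop/mark probability rising linearly from 0 at min_th
        # to max_p at max_th. Parameter values are illustrative.
        def __init__(self, min_th=5, max_th=15, max_p=0.1, w=0.002):
            self.min_th, self.max_th = min_th, max_th
            self.max_p, self.w = max_p, w
            self.avg = 0.0

        def on_enqueue(self, queue_len):
            # True => drop this packet (or set its ECN mark instead).
            self.avg = (1 - self.w) * self.avg + self.w * queue_len
            if self.avg < self.min_th:
                return False
            if self.avg >= self.max_th:
                return True
            frac = (self.avg - self.min_th) / (self.max_th - self.min_th)
            return random.random() < self.max_p * frac

Each router runs this purely locally, yet every flow crossing a shared
bottleneck sees correlated marks - which is what makes the resulting
control plane "implicit".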