[e2e] Agility of RTO Estimates
Detlef Bosau
detlef.bosau at web.de
Fri Jul 15 12:18:38 PDT 2005
Craig Partridge wrote:
> In message <42D7EB1D.8050003 at web.de>, Detlef Bosau writes:
>
>
>>My question is, with respect to mobile wireless networks such as UMTS or GPRS:
>>How "quickly" does RTO adapt? I expect this is restricted by the ES-ES
>>latency, the packet rate (i.e. "sampling rate"), the burstiness of
>>traffic etc.
>>Can this "RTO model" follow e.g. the latency variations met on the
>>mobile network in "real time"?
>>Or are there basic limitations? (At least, I expect so.)
>
>
> I'll take a stab at this and be delighted to be corrected by others who
> know better.
>
> I believe the immediate issue is not the "RTO model" but rather the
> question of what RTO estimator you use. In the late 1980s there was
Basically, it's the same question. Maybe I was being confusing there.
The "RTO model" consists of
1. the RTT estimate,
2. the variation estimate,
3. the "recipe", how you "cook" a confidence interval from those parameters.
> a crisis of confidence in RTO estimators -- a problem we dealt with by
> developing Karn's algorithm (to deal with retransmission ambiguity) and
> improving the RSRE estimation algorithm with Van Jacobson's replacement.
>
> Van did a bunch of testing of his estimator on real Internet traffic and
> looked to see how often the estimator failed. (Note that spurious
> timeouts are only one failure -- delaying a retransmission overly long
> after the loss is also a failure.) He picked an estimator that was
> easy to compute and gave good results in the real world.
>
> If there's reason to believe the estimator today is working less well, we
> could obviously replace it. That doesn't mean the RTO model needs fixing.
I don't want to fix the RTO model itself - just so I'm not misunderstood.
I only want to understand the basic limitations. E.g.: the RTT estimate
("SRTT") _has_ to rely on a certain time series of RTT observations
taken from the flow.
Similar to Konrad Adenauer's sentence: "Nehmen Se de Menschen, wie se
sind. Andere jibt et nich." :-) Or in English (hopefully the
translation is not too bad): Adenauer advised us to take the people as
they are; there are no others.
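For concreteness, here is a minimal sketch (in Python, with my own
variable names) of the standard estimator, i.e. Van Jacobson's
SRTT/RTTVAR computation as later written down in RFC 6298; the
constants are the usual ALPHA = 1/8, BETA = 1/4 and K = 4, and the
clock granularity G is an assumed value:

ALPHA = 1/8      # gain for the RTT estimate (SRTT)
BETA  = 1/4      # gain for the variation estimate (RTTVAR)
K     = 4        # width of the "confidence interval"
G     = 0.001    # clock granularity in seconds (assumed value)

def init_rto(first_rtt):
    # First RTT sample of the connection (RFC 6298, section 2.2).
    srtt = first_rtt
    rttvar = first_rtt / 2
    rto = srtt + max(G, K * rttvar)
    return srtt, rttvar, max(rto, 1.0)

def update_rto(srtt, rttvar, rtt_sample):
    # Every further sample (RFC 6298, section 2.3); RTTVAR is
    # updated before SRTT, and the RTO is floored at 1 second.
    rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - rtt_sample)
    srtt = (1 - ALPHA) * srtt + ALPHA * rtt_sample
    rto = srtt + max(G, K * rttvar)
    return srtt, rttvar, max(rto, 1.0)

This is exactly the three-part "model" from above: an estimate, a
variation estimate, and a recipe for the confidence interval.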
BTW: Spurious timeouts and delaying a retransmission overly long are
basically the same problem. In each statistical test, you have two
kinds of errors.
Errors of the first kind: falsely rejecting a correct null hypothesis.
If your null hypothesis is "the packet is correctly delivered and
acknowledged", a spurious timeout is an error of the first kind.
Then there are errors of the second kind: falsely "accepting"
(precisely: "not rejecting", because a test only decides whether or not
to reject the null hypothesis) a wrong null hypothesis.
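To make the test analogy concrete: if one assumed (purely for
illustration, the RTT distribution is of course not really Gaussian)
that the RTT of a correctly delivered packet were normally distributed,
the probability of an error of the first kind is simply the tail mass
beyond the RTO:

from math import erf, sqrt

def spurious_timeout_prob(rto, rtt_mean, rtt_std):
    # P(RTT > RTO | packet delivered): the null hypothesis
    # "delivered and acknowledged" is true, but the timer fires
    # before the ACK arrives - an error of the first kind.
    z = (rto - rtt_mean) / rtt_std
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))   # Gaussian tail

# Invented numbers: SRTT 200 ms, deviation 50 ms, RTO = SRTT + 4*dev.
print(spurious_timeout_prob(0.400, 0.200, 0.050))   # about 3e-5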
Back to RTT estimators.
You have to rely on a certain time series. Depending at least on your
throughput, this series is restricted to a certain "sampling rate".
Consequently, the resolution of the estimator, i.e. its ability to
follow network property changes at their original bandwidth, is limited.
A concrete example: the properties of a UMTS channel may change
extremely quickly. The transport latency for a radio block may change
several times even _within_ one IP packet (which may be split into
several radio blocks for transmission). Thus the end-to-end latency
for a packet may change several times within one packet transmission.
It is obvious that an RTT estimate _cannot_ follow these changes,
independent of the chosen estimator.
(It is a very rough analogy, but I always think of Shannon's sampling
theorem here.)
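To illustrate the analogy (with a synthetic latency process and
invented numbers, not real UMTS measurements): feed a sinusoidal
latency into the standard estimator, once oscillating faster than the
sampling rate and once much slower. When the variation is too fast to
follow, it all ends up in RTTVAR, so the RTO is simply inflated
instead of tracking the latency:

import math

ALPHA, BETA, K = 1/8, 1/4, 4

def mean_rto_margin(rtts):
    # Average of (RTO - instantaneous RTT) over a trace of samples.
    srtt, rttvar, margins = rtts[0], rtts[0] / 2, []
    for r in rtts[1:]:
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
        srtt = (1 - ALPHA) * srtt + ALPHA * r
        margins.append(srtt + K * rttvar - r)
    return sum(margins) / len(margins)

base, amp, n = 0.200, 0.150, 400    # seconds / samples, invented values
fast = [base + amp * math.sin(2 * math.pi * i / 3)   for i in range(n)]
slow = [base + amp * math.sin(2 * math.pi * i / 150) for i in range(n)]
print("fast variation:", mean_rto_margin(fast))   # large, inflated RTO
print("slow variation:", mean_rto_margin(slow))   # small, RTO tracks

In this toy case the price of the fast variation is not a spurious
timeout but an RTO far above the instantaneous latency, i.e. the second
failure mode mentioned above: a retransmission delayed overly long.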
>
> Second point is that the RTO model now works in concert with other
> mechanisms. I.e. it used to be that we relied only on RTO to determine
> if we should retransmit. Now we have Fast Retransmit to catch certain
> types of loss.
>
...which of course raises other questions, e.g. whether packet
reordering is negligible or not.
However, for the moment I am not thinking about that.
The underlying question in fact is: if I could place a bandwidth
restriction upon network property changes (don't ask me how ;-), but
for the moment let's assume I could), which restriction would be enough
to allow the RTT and variation estimators to follow network properties
"quickly enough"? I.e. to keep the risk of spurious timeouts etc. at a
constant level?
Please note: I do not say _avoid_ here, because in a test the level of
significance _is_ the probability of an error of the first kind.
Particularly for spurious timeouts, that means these are not restricted
to wireless networks but are an inherent (and inevitable!) part of TCP
which is met on _all_ networks.
In other words: what (bandwidth) restrictions must be enforced on
network properties to maintain a "constant" level of significance for
the RTO test here?
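A toy way to look at that question numerically (all assumptions
invented, in particular the latency model of a base RTT plus occasional
delay spikes; this is not a claim about real UMTS): hold the sampling
rate fixed, vary how often the latency jumps, and count how often a
correctly delivered packet's RTT exceeds the RTO armed from the
previous samples, i.e. the empirical rate of errors of the first kind:

import random

ALPHA, BETA, K = 1/8, 1/4, 4

def spurious_rate(spike_prob, n=100000, base=0.2, spike=0.5, seed=1):
    rng = random.Random(seed)
    srtt, rttvar, spurious = base, base / 2, 0
    for _ in range(n):
        rto = srtt + K * rttvar
        extra = spike * rng.random() if rng.random() < spike_prob else 0.0
        rtt = base + extra
        if rtt > rto:
            spurious += 1    # the timer would fire, yet the ACK arrives
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - rtt)
        srtt = (1 - ALPHA) * srtt + ALPHA * rtt
    return spurious / n

for p in (0.001, 0.01, 0.1):
    print("spike probability", p, "-> spurious timeout rate", spurious_rate(p))

Of course such a toy tells us nothing about a real channel; it only
makes the question quantitative.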
I have been thinking about this for weeks now, and sometimes I fear
that I will have to rely only on simulations for this one. And I must
reveal a secret here: I hate simulations. Not only can simulations
"prove" everything and nothing - sometimes I fear that NS2 is for
networks what Google is for reality.....
(Not to be misunderstood: a well done simulation may provide useful
insight. However, it does not replace a thorough rationale for proposed
mechanisms.)
Detlef Bosau
--
Detlef Bosau
Galileistrasse 30
70565 Stuttgart
Mail: detlef.bosau at web.de
Web: http://www.detlef-bosau.de
Mobile: +49 172 681 9937