[e2e] Agility of RTO Estimates, stability, vulnerabilities

Sireen Habib Malik s.malik at tuhh.de
Tue Jul 26 06:58:16 PDT 2005


Hi,

A technical discussion on heavy-tailed distributions is perhaps not 
relevant to this list; however, one gets the impression that these 
distributions are not relevant/suitable from the Internet's point of view.

> "Note that the load distribution cannot be characterized by a stable a 
> priori description, because load is itself responsive at all 
> timescales to behavior of humans (users, app designers, cable plant 
> investors, pricing specialists, arbitrage experts, criminal hackers, 
> terrorists, network installers, e-commerce sites, 
> etc.)" ... and ... "you are fooling yourself 
> if you start with a simple a priori model, even if that model passes 
> so-called "peer review" (also called a folie a deux - mutually 
> reinforcing hallucinations about reality) and becomes the common 
> problem statement for a generation of graduate students doing network 
> theory.  In my era, the theorists all assumed that Poisson arrival 
> processes were sufficient.   These days, "heavy tails" are assumed to 
> be correct.   Beware - there's much truth and value, but also a deep 
> and profound lie, in such assertions and conventional wisdoms. "

 From a network's point of view, users (of all kinds) generate data 
- let us call this their on-phase. After downloading/generating data, they 
go into a thinking- or reading-phase. This is the off-phase. A user 
remains in the on- and off-phases for randomly distributed times.

Each user cycles through this On-Off behavior. This is the starting 
point of at least one way of modeling Internet traffic.
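The On-Off behavior above can be sketched in a few lines. This is a toy
illustration, not anyone's reference model: the tail index alpha, the minimum
duration xm, and the on-phase rate are all assumed parameters chosen for the
example.

```python
import random

def on_off_trace(n_cycles, alpha=1.5, xm=1.0, rate=10.0):
    """One user's On-Off cycles as (start_time, on_duration, bytes) tuples.

    On- and off-durations are Pareto(alpha) with minimum xm; for
    1 < alpha < 2 the variance is infinite, i.e. the times are heavy-tailed.
    """
    t, bursts = 0.0, []
    for _ in range(n_cycles):
        on = xm * random.paretovariate(alpha)   # on-phase: transmit at `rate`
        bursts.append((t, on, on * rate))
        t += on
        t += xm * random.paretovariate(alpha)   # off-phase: think/read, silent
    return bursts
```

Superposing many such independent sources is the usual next step toward an
aggregate traffic model.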

The Poisson arrival-process assumption at the session level is still OK, but 
the data, or files, that these arrivals cause to flow through the net are 
heavy-tail distributed. This assumption is correct because empirical 
studies have shown us that - time and again.
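That two-level picture - Poisson session arrivals, heavy-tailed transfer
sizes - can be sketched as follows. The rate, tail index, and minimum file
size are illustrative assumptions only.

```python
import random

def sessions(n, lam=2.0, alpha=1.2, xm=10_000, seed=7):
    """Session-level model: Poisson arrivals (rate lam per second), each
    triggering a Pareto(alpha)-distributed file transfer of >= xm bytes."""
    rng = random.Random(seed)
    t, out = 0.0, []
    for _ in range(n):
        t += rng.expovariate(lam)                 # exponential gap => Poisson
        size = xm * rng.paretovariate(alpha)      # heavy-tailed file size
        out.append((t, size))
    return out
```

With alpha close to 1 the empirical mean of the sizes converges very slowly -
one large file can dominate an entire trace, which is exactly the heavy-tail
effect the measurements show.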

There is a proof that if either, or both, of the on- and 
off-times of the on-off source are heavy-tail distributed, then the 
resultant traffic is LRD (long-range dependent) in nature. Simply put, LRD 
is tied to the large/infinite variance of the heavy-tailed distributions.
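The proof referred to (Taqqu, Willinger and Sherman's on-off result) even gives
the quantitative link: for tail index 1 < alpha < 2, the aggregate traffic has
Hurst parameter H = (3 - alpha) / 2. A minimal sketch of that mapping:

```python
def hurst_from_tail_index(alpha):
    """Taqqu et al.: superposing many on-off sources whose on/off times are
    heavy-tailed with index 1 < alpha < 2 yields traffic with
    H = (3 - alpha) / 2.  H > 1/2 means long-range dependence."""
    if not 1.0 < alpha < 2.0:
        raise ValueError("heavy-tailed (infinite-variance) regime needs 1 < alpha < 2")
    return (3.0 - alpha) / 2.0
```

Note the limits: alpha -> 1 (heavier tail) pushes H -> 1, i.e. stronger LRD,
while alpha -> 2 (variance becoming finite) pushes H -> 1/2, the short-range
dependent case.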

Now the situation gets more complicated when we consider that the 
packet-generation process in the heavy-tailed on-phase is not 
Poissonian, but rather controlled by TCP. The protocol introduces additional 
burstiness at small time-scales. This is also known as multifractality.

Therefore, the two most significant factors from the Internet traffic 
point of view are the heavy-tail distributed file sizes and the 
congestion-control mechanism of TCP.

Please note: even if the variance in the real world is not infinite, and 
the LRD is only visible over some orders of time-scale, queue 
performance is still significantly different from that predicted by the 
simple assumption of a Poissonian renewal arrival process (of packets on 
the line).
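One way to see that queueing difference is a toy single-server queue driven by
the Lindley recursion, fed once with exponential and once with heavy-tailed
service times at the same mean and utilization. The load level, tail index,
and sample count are assumptions for the sketch:

```python
import random

def lindley_max_wait(service_sampler, lam=0.8, n=50_000, seed=1):
    """Peak waiting time under W_{k+1} = max(0, W_k + S_k - A_k):
    Poisson arrivals at rate lam, i.i.d. service times with mean 1."""
    rng = random.Random(seed)
    w = peak = 0.0
    for _ in range(n):
        s = service_sampler(rng)        # service time, mean ~1
        a = rng.expovariate(lam)        # inter-arrival gap
        w = max(0.0, w + s - a)
        peak = max(peak, w)
    return peak

def exp_service(rng):
    return rng.expovariate(1.0)         # mean 1, light (exponential) tail

def pareto_service(rng, alpha=1.5):
    xm = (alpha - 1.0) / alpha          # choose xm so the mean is also 1
    return xm * rng.paretovariate(alpha)  # infinite variance for alpha < 2
```

At identical utilization, the Pareto-fed queue shows far larger waiting-time
excursions - a single huge "file" backs the queue up for a long time, which is
what the Poisson-renewal assumption misses.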

Side note: 90% of Internet traffic is based on TCP. The small-flow model 
holds for web traffic; the long-flow model is relevant to P2P 
downloads. Traffic measurements show that P2P traffic now makes up almost 
50% (or perhaps more) of TCP traffic. See the Sprint website for traffic 
traces and analysis.

> Those of you who understand the profound difference between Bayesian 
> and Classical statistical inference will understand ...

!!!


Sireen Malik


Hamburg University of Technology, Germany


