[e2e] Skype and congestion collapse.

Fri Mar 4 05:18:36 PST 2005

Note that the voice channel for Skype is not at all elastic. As a general rule 
effectively all VoIP traffic is totally inelastic. You get either all the 
flow or none. This is not specific to Skype.

Back in the mid-90s, there were lots of folks looking at mechanisms to make 
such media flows adaptive in the face of congestion (e.g., Steve McCanne's 
work, and some stuff I did with Michael Speer). Interesting stuff, but there 
were a couple of things which kept such approaches from taking off:

1) For the most part codecs are not continuously adjustable in terms of 
bandwidth.   They tend to jump in discrete steps that can be fairly large. 
E.g., when you jump between major classes of codecs, or when you use typical 
layering approaches.   There was some work in video codecs with lots of steps 
(e.g., we worked with some stuff Avideh Zakhor did), but none of them seemed 
to have gained any commercial traction.   

I think most commercially-oriented work just tried to get as much quality with 
as little bandwidth as possible.    I am certainly not an expert in this area 
(codec design), but I am guessing that the additional complexity of getting 
such fine-grained layering was in practical tension with just getting good 
quality at low cost (complexity and bandwidth).     

The small number of steps combined with the size of each step made such 
approaches not worth the trouble for congestion control.    They work best 
for dealing with heterogeneous link characteristics (e.g., broadband vs skinny 
wireless).

Skype basically has picked one of the best low bandwidth codecs around that 
still gives good quality (see RFC 3951 which describes the codec they are 
reputedly using).  Skype claims (on their web site) to vary the bandwidth 
between 3-16 kbps depending on the quality of your link.   The iLBC spec 
suggests two framings which would give effective rates of 15-20 kbps, so that 
is in some conflict with the Skype claims and is by no means adaptive.  

This codec does have some nice error concealment properties that mean random 
congestion-induced drops will not totally trash your voice stream, so while 
the other TCP flows are backing off to make room for your inelastic flow (or 
when they rudely try to probe for more bandwidth) you can continue to 
talk :-).

In any event, once you have picked the best audio codec you can, at the lowest 
rate, there is really no multiplicative decrease possible.   This is the way 
it is for effectively all VoIP traffic today. At least Skype is not using 
G.711.  I think you will see all VoIP endpoints converging to something like 
the codec Skype is using.   For example, most recent SIP ATAs support G.729 
(some support iLBC), which is (IIRC) about the same bandwidth.
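
To make the "no multiplicative decrease" point concrete, here is a minimal 
sketch of what happens when a sender tries to halve its rate but can only 
pick from a few discrete codec rates.  The ladder below is an assumption 
chosen for illustration (nominal payload rates for G.711, the two iLBC 
framings, and G.729); it is not Skype's actual behaviour, and the function 
names are hypothetical.

```python
# Illustrative payload bit rates in kbit/s. This ladder is an assumption
# for the sketch, not a description of what any real endpoint does.
CODEC_STEPS_KBPS = [64.0, 15.2, 13.33, 8.0]


def multiplicative_decrease(current_kbps, steps=CODEC_STEPS_KBPS, factor=0.5):
    """Pick the highest step at or below current_kbps * factor.

    If no step is that low, return the lowest step: the flow has hit
    its floor and cannot back off any further.
    """
    target = current_kbps * factor
    candidates = [s for s in steps if s <= target]
    return max(candidates) if candidates else min(steps)


def backoff_sequence(start_kbps, rounds, steps=CODEC_STEPS_KBPS):
    """Apply repeated congestion signals and record the rate after each."""
    rates = [start_kbps]
    for _ in range(rounds):
        rates.append(multiplicative_decrease(rates[-1], steps))
    return rates


if __name__ == "__main__":
    # Three congestion signals in a row, starting from the 64 kbit/s step.
    print(backoff_sequence(64.0, 3))
```

Starting from the top of this ladder, the flow reaches the floor after two 
signals and then stays there; a sender that starts at the lowest step has 
nothing left to give up at all.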

2) So the other solution suggested at the time was to do network admission 
control.  No need to go into that one here  (I am in the RSVP witness 
protection program :-)) other than to say that complexity lost out again.  
Cheaper just to provision more bandwidth.

In fact, Jon's later comment points out that this has become true for the 
backbone.    James later made a comment that the main problem is actually at 
the access link.  I agree.     

But the problem in this case is not with the VoIP traffic, it is with the 
OTHER traffic.  If I am using VoIP and just web browsing on a reasonable 
link, there is generally no problem.  The small number of non-VoIP TCP 
connections generally leave enough fair share for my frugal voice connection.     
However, my current day job is to design and implement P2P protocols.   
Generally such protocols can create large numbers of TCP connections per 
node.   At that point, even with ideal fairness, the proportion of the link 
left over is less than required for the voice stream.  (Obviously, from a 
practical standpoint, it is much worse.)  So although the individual TCP 
flows may be well behaved, the P2P application as a whole is not.   This is 
kind of like the original Netscape/HTTP 1.0 problem, but on steroids.
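
The fair-share arithmetic can be sketched as follows.  The 256 kbit/s access 
link and 16 kbit/s voice rate are assumed numbers picked for illustration 
(16 being the top of the range Skype claims), the function names are 
hypothetical, and the model is the ideal case only: every flow gets an equal 
share and packet header overhead is ignored.

```python
def fair_share_kbps(link_kbps, tcp_flows):
    """Ideal equal share per flow: one voice flow plus tcp_flows TCP flows."""
    return link_kbps / (tcp_flows + 1)


def voice_fits(link_kbps, voice_kbps, tcp_flows):
    """True if the ideal fair share still covers the inelastic voice rate."""
    return fair_share_kbps(link_kbps, tcp_flows) >= voice_kbps


def max_tcp_flows(link_kbps, voice_kbps):
    """Largest number of competing TCP flows that still leaves the voice
    flow its full rate under ideal fairness (0 if even one is too many)."""
    return max(int(link_kbps // voice_kbps) - 1, 0)


if __name__ == "__main__":
    link, voice = 256, 16  # assumed access link and voice rate, kbit/s
    print("TCP flows tolerated:", max_tcp_flows(link, voice))
    for n in (2, 15, 50):  # light browsing vs. a busy P2P node
        share = fair_share_kbps(link, n)
        print(n, "flows ->", round(share, 2), "kbit/s,", voice_fits(link, voice, n))
```

With these assumed numbers a handful of browser connections is harmless, 
but a P2P node holding 50 connections pushes the ideal share to about 
5 kbit/s, well under what the voice stream needs.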

Don