[e2e] Skype and congestion collapse.
Donald Hoffman
don at dhoffman.net
Fri Mar 4 05:18:36 PST 2005
Note that the voice channel for Skype is not at all elastic. As a general rule
effectively all VoIP traffic is totally inelastic. You get either all the
flow or none. This is not specific to Skype.
Back in the mid-90s, there were lots of folks looking at mechanisms to make
such media flows adaptive in the face of congestion (e.g., Steve McCanne's
work, and some stuff I did with Michael Speer). Interesting stuff, but there
were a couple of things which kept such approaches from taking off:
1) For the most part codecs are not continuously adjustable in terms of
bandwidth. They tend to jump in discrete steps that can be fairly large.
E.g., when you jump between major classes of codecs, or when you use typical
layering approaches. There was some work in video codecs with lots of steps
(e.g., we worked with some stuff Avideh Zakhor did), but none of them seemed
to have gained any commercial traction.
I think most commercially-oriented work just tried to get as much quality with
as little bandwidth as possible. I am certainly not an expert in this area
(codec design), but I am guessing that the additional complexity of getting
such fine-grained layering was in practical tension with just getting good
quality at low cost (complexity and bandwidth).
The small number of steps combined with the size of each step made such
approaches not worth the trouble for congestion control. They work best
for dealing with heterogeneous link characteristics (e.g., broadband vs skinny
wireless).
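
To make the step-size point concrete, here is a minimal sketch of rate selection over a discrete codec ladder. The ladder values and the function name pick_rate are hypothetical, chosen only for illustration; they do not describe any particular product.

```python
def pick_rate(ladder_kbps, available_kbps):
    """Return the highest codec rate that fits, or None if none fits."""
    fitting = [rate for rate in ladder_kbps if rate <= available_kbps]
    return max(fitting) if fitting else None


# Hypothetical three-step ladder (kbps); real codecs offer similarly few steps.
LADDER_KBPS = [8, 16, 64]

# Halve the available bandwidth repeatedly and watch the selected rate.
for available in (100, 50, 25, 12, 6):
    print(available, pick_rate(LADDER_KBPS, available))
```

Under these assumed numbers, halving from 50 to 25 kbps changes nothing, while the last halving drops the flow entirely: the sender can only take large jumps, not a smooth multiplicative decrease.
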
Skype basically has picked one of the best low bandwidth codecs around that
still gives good quality (see RFC 3951 which describes the codec they are
reputedly using). Skype claims (on their web site) to vary the bandwidth
between 3-16 kbps depending on the quality of your link. The iLBC spec
suggests two framings which would give effective rates of 15-20 kbps, so that
is in some conflict with the Skype claims and is by no means adaptive.
This codec does have some nice error concealment properties that mean random
congestion-induced drops will not totally trash your voice stream, so while
the other TCP flows are backing off to make room for your inelastic flow (or
when they rudely try to probe for more bandwidth) you can continue to
talk :-).
In any event, once you have picked the best audio codec you can, at the lowest
rate, there is really no multiplicative decrease possible. This is the way
it is for effectively all VoIP traffic today. At least Skype is not using
G.711. I think you will see all VoIP endpoints converging to something like
the codec Skype is using. For example, most recent SIP ATAs support G.729
(some support iLBC), which is (IIRC) about the same bandwidth.
2) So the other solution suggested at the time was to do network admission
control. No need to go into that one here (I am in the RSVP witness
protection program :-)) other than to say that complexity lost out again.
Cheaper just to provision more bandwidth.
In fact, Jon's later comment points out that this has become true for the
backbone. James later made a comment that the main problem is actually at
the access link. I agree.
But the problem in this case is not with the VoIP traffic, it is with the
OTHER traffic. If I am using VoIP and just web browsing on a reasonable
link, there is generally no problem. The small number of non-VoIP TCP
connections generally leave enough fair share for my frugal voice connection.
However, my current day job is to design and implement P2P protocols.
Generally such protocols can create large numbers of TCP connections per
node. At that point, even with ideal fairness, the proportion of the link
left over is less than required for the voice stream. (Obviously, from a
practical standpoint, it is much worse.) So although the individual TCP
flows may be well behaved, the P2P application as a whole is not. This is
kind of like the original Netscape/HTTP1.0 problem, but on steroids.
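
A back-of-the-envelope sketch of the fair-share argument above, assuming an idealized link that is split equally among all flows. The link capacity, voice rate, and function names are illustrative assumptions, not measurements.

```python
def fair_share_kbps(link_kbps, n_tcp_flows):
    """Per-flow share when n TCP flows and one voice flow split the link equally."""
    return link_kbps / (n_tcp_flows + 1)


def max_tcp_flows(link_kbps, voice_kbps):
    """Most competing TCP flows that still leave the voice flow its full rate.

    Assumes link_kbps >= voice_kbps, i.e. the voice flow fits when alone.
    """
    return int(link_kbps // voice_kbps) - 1


# Assumed numbers: a 256 kbps access uplink and a 16 kbps voice stream.
LINK_KBPS = 256
VOICE_KBPS = 16

limit = max_tcp_flows(LINK_KBPS, VOICE_KBPS)
print("TCP flows tolerated:", limit)
print("share at the limit:", fair_share_kbps(LINK_KBPS, limit))
print("share one flow past it:", fair_share_kbps(LINK_KBPS, limit + 1))
```

With these assumed numbers, a handful of browser connections stays under the limit, but a P2P node holding dozens of connections pushes the equal share below what the inelastic voice flow needs.
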
Don