[e2e] Re: crippled Internet

Thu Apr 26 19:58:28 PDT 2001

VoIP sound quality?

This is a ridiculous measure without constraining the problem.
First, sound "quality" relates to codec quality more than latency/jitter.

Second, end-to-end task latency (which governs the human ability to 
conduct normal turn-taking; too much of it introduces "satellite-like 
delay") in a real-time voice channel is dominated by the jitter buffer 
size.  The jitter buffer introduces a delay proportional to jitter (std 
dev of delay), so total VoIP latency is 
(avg.latency)+c*(std.latency)+(source processing latency)+(dest 
processing latency), for some c that is set to control the dropped-frame 
rate.  Unless great care is taken, the latter terms dominate.
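
A rough sketch of that latency sum, just to make the terms concrete.  The 
function name, the sample delays, the value of c and the processing 
latencies are all made up for illustration, not measurements:

```python
import statistics

def voip_latency_ms(delays_ms, c, source_ms, dest_ms):
    """avg.latency + c*std.latency + source processing + dest processing."""
    avg = statistics.mean(delays_ms)
    std = statistics.pstdev(delays_ms)  # jitter as std dev of delay
    return avg + c * std + source_ms + dest_ms

# Hypothetical one-way network delay samples, in milliseconds.
samples = [40, 45, 60, 38, 90, 42, 55, 47]

# c is the knob that trades added delay against dropped (late) frames.
total = voip_latency_ms(samples, c=2.0, source_ms=30, dest_ms=30)
print(f"estimated one-way latency: {total:.1f} ms")
```

Raising c makes the jitter buffer deeper, so fewer frames arrive too late 
to play, at the price of more delay on every frame.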

Third, many currently deployed VoIP systems use TCP rather than UDP or RTP 
because the latter protocols don't work over NAT boxes and firewalls.  Ask 
Real Networks about what percentage of streaming (non-telephony) content 
actually goes out in UDP or RTP form from their customers' servers.  It 
isn't much.  The result is that retransmission of lost frames by TCP 
amplifies jitter.
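
To illustrate the jitter amplification, here is a deliberately simplified 
model of in-order delivery.  It assumes a lost frame is resent once after 
a fixed retransmission timeout and ignores fast retransmit and congestion 
control; the function name and all the numbers are hypothetical:

```python
def tcp_delivery_times(send_times_ms, lost, one_way_ms, rto_ms):
    """Time at which each frame reaches the application, in order."""
    delivered = []
    ready = 0.0
    for i, sent in enumerate(send_times_ms):
        # A lost frame arrives one retransmission timeout later.
        arrival = sent + one_way_ms + (rto_ms if i in lost else 0)
        # In-order delivery: a frame cannot pass the frame ahead of it.
        ready = max(ready, arrival)
        delivered.append(ready)
    return delivered

# Four 20 ms frames, the second one lost, 50 ms one-way delay, 200 ms timeout.
sends = [0, 20, 40, 60]
times = tcp_delivery_times(sends, lost={1}, one_way_ms=50, rto_ms=200)
print([t - s for t, s in zip(times, sends)])  # per-frame delay in ms
```

One lost frame delays not just itself but every frame queued behind it, 
which is the effect a jitter buffer then has to absorb.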

Here's what's needed for VoIP:

1) very low latency software codec.  You want a codec that encodes frames 
that are 10-20 msec long, and pumps them out immediately in 
packets.  Unfortunately, to get good compression to fit on 33Kb links over 
PPP, you get people trying to encode longer frames, and similarly, you get 
people trying to cut IP overhead by cramming multiple frames into each 
packet.  So a low-latency codec of this kind is not typically what is 
used in most "mass market products".
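
The arithmetic behind that tradeoff can be sketched as follows.  This 
assumes RTP over UDP over IPv4 with no header compression (40 bytes of 
headers per packet) and ignores PPP framing; the 8 kb/s codec rate is 
just an example figure:

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no options

def wire_rate_kbps(codec_kbps, frame_ms, frames_per_packet=1):
    """Bit rate on the wire once per-packet headers are added."""
    # kbit/s times ms gives bits; divide by 8 for bytes.
    payload_bytes = codec_kbps * frame_ms * frames_per_packet / 8
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    packets_per_sec = 1000 / (frame_ms * frames_per_packet)
    return packet_bytes * 8 * packets_per_sec / 1000

def packetization_delay_ms(frame_ms, frames_per_packet=1):
    """Audio that must be collected before a packet can be sent."""
    return frame_ms * frames_per_packet

for frame_ms, per_packet in [(10, 1), (20, 1), (20, 3)]:
    rate = wire_rate_kbps(8, frame_ms, per_packet)
    delay = packetization_delay_ms(frame_ms, per_packet)
    print(f"{frame_ms} ms x {per_packet}: {rate:.1f} kb/s, {delay} ms delay")
```

Under these assumptions an 8 kb/s codec with 10 msec frames already needs 
about 40 kb/s on the wire, which is why people stretch frames or bundle 
them, and pay for it in packetization delay.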

2) very low latency hardware codec.  You want a hardware codec that 
delivers frames to the software codec instantly - the device driver 
typically needs to use a mapped buffer shared with the client plus a 
signalling system that has very little jitter, which is not the case in 
most OS's in popular use (Windows and Linux, for example, are not great 
at user-level real-time device stuff).

3) very low latency "small packet" Internet stack.  If using TCP, you 
don't want it to "dally".  Prefer to use UDP or RTP.  Most OS's don't have 
stacks that pay attention to latency on small packets - they go for 
throughput, so a lot of the code paths focus on optimizing throughput at 
the cost of introducing latency for small packets (buffer management, for 
example).
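
One reading of "dally" is the sender holding back small segments to 
coalesce them (Nagle's algorithm); that interpretation is an assumption 
here.  Under it, a minimal sketch of the two socket choices, with 
hypothetical helper names:

```python
import socket

def make_low_latency_tcp_socket():
    """TCP socket with small-segment coalescing (Nagle) turned off."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s

def make_udp_socket():
    """Datagram socket: each send goes out as its own packet."""
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

tcp = make_low_latency_tcp_socket()
udp = make_udp_socket()
print("TCP_NODELAY set:", tcp.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
tcp.close()
udp.close()
```

This only removes one source of delay in the sending stack; it does 
nothing about retransmission or about buffering elsewhere in the OS.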

4) ability to pump data between hardware audio driver and internet 
interface, preempting background threads.  Because audio processing is not 
costly on today's processors that have embedded DSP, this thread is not 
cpu bound, so the data pump blocks frequently.  Waking up the data 
pump is burdened with task wakeup latency, which is poor in a system like 
Windows or Mac, for example, that does not have an effective priority 
mechanism.

5) ability to manage audio output just before it goes to hardware codec so 
that if a frame is missing due to packet drop, compatible noise is 
inserted into the gap.
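
A minimal sketch of that last item, assuming frames are lists of PCM 
samples and a missing frame shows up as None.  The frame size, the 
quarter-amplitude noise level and the function names are illustrative 
choices, not anything a particular product does:

```python
import random

FRAME_SAMPLES = 160  # 20 ms at 8 kHz; an assumed frame size

def comfort_noise(reference, rng):
    """Noise frame bounded by a quarter of the reference frame's peak."""
    level = max((abs(x) for x in reference), default=0) // 4
    return [rng.randint(-level, level) for _ in range(FRAME_SAMPLES)]

def conceal(frames, rng=None):
    """Replace missing frames (None) with low-level noise before playout."""
    rng = rng or random.Random(0)
    out = []
    last_good = [0] * FRAME_SAMPLES
    for frame in frames:
        if frame is None:
            # Fill the gap with noise scaled to the last frame we did hear.
            out.append(comfort_noise(last_good, rng))
        else:
            out.append(frame)
            last_good = frame
    return out

played = conceal([[1000] * FRAME_SAMPLES, None, [500] * FRAME_SAMPLES])
print(len(played), "frames played,", len(played[1]), "samples filled in")
```

Real concealment would try to match the spectrum of the surrounding 
audio rather than just its level, but the placement is the point: it 
happens at the very end of the pipeline, right before the hardware.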

Most PC audio cards, PC OSs, and software codecs do not meet these 
criteria.  So most of the "commercial" VoIP products for the "mass market" 
cannot do a good job, so the network becomes the bottleneck.

And the access network also introduces serious problems, at least in the 
case of a dialup line, which is where many people try to evaluate VoIP.  
Thinking that "56Kb" is sufficient bandwidth, they don't realize the delays 
introduced by V.92 compression and PPP are serious problems.  I haven't 
measured the delay & jitter introduced by PPPoE (used by almost all DSL 
broadband providers, and possibly some cable modem providers), but PPP may 
well be problematic there as well, if there is competing traffic.
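
To put a number on the competing-traffic problem on a dialup line, here 
is the serialization delay alone, ignoring modem compression and PPP 
framing.  The packet sizes and link rates are illustrative assumptions:

```python
def serialization_delay_ms(packet_bytes, link_bps):
    """Time to clock one packet onto the link, in milliseconds."""
    return packet_bytes * 8 * 1000 / link_bps

# A 60-byte voice packet versus a 1500-byte packet from a competing
# transfer that happens to be ahead of it in the queue.
for link_bps in (33_600, 56_000):
    voice = serialization_delay_ms(60, link_bps)
    bulk = serialization_delay_ms(1500, link_bps)
    print(f"{link_bps} b/s: voice {voice:.1f} ms, bulk {bulk:.1f} ms")
```

A single full-size packet ahead of a voice frame on a 33.6 kb/s link 
holds it up for roughly a third of a second, far more than a whole 
jitter buffer's worth of delay.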

Significant improvements can be achieved in sound quality by using 
techniques that compensate for lost packets due to errors and 
congestion.  These are not used in "commercial" products either.

So, I would sum this up by saying that before blaming the network for user 
perceptions, we have to control for very big factors due to the lack of 
attention to "sound quality" in the source and destination software.  There 
is much to be improved here, and the adoption of standards that were 
designed for dedicated isochronous point-to-point lines by the VoIP people 
has been a large part of the problem (H.32x).

- David
--------------------------------------------
WWW Page: http://www.reed.com/dpr.html