[e2e] Protocols breaking the end-to-end argument
rick jones
perfgeek at mac.com
Sat Oct 24 09:24:23 PDT 2009
On Oct 24, 2009, at 5:06 AM, William Allen Simpson wrote:
> rick jones wrote:
>> Perhaps he is referring to chips which provide TCP/Transport
>> Segmentation Offload - aka TSO - the functionality that allows the
>> stack to hand the chip a chunk of data > the MTU, along with the
>> initial TCP/IP headers and the connection's on the wire MSS, and
>> then have the chip otherwise statelessly segment that larger chunk
>> of data into MSS-sized segments for transmission on the wire/fibre/
>> etc.
> It is indeed. Since the hardware driver is unaware of many things,
> such as path MTU, this is one of its serious impediments.
WRT PathMTU, the implementations with which I am familiar have the
stack telling the NIC the on-the-wire size (what I tend to call the
effective MSS) to use on each "large send" where that effective MSS is
updated based on PathMTU information as/if it arrives.
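A minimal sketch of the stateless segmentation described above. It assumes IPv4 and TCP headers without options (20 bytes each), so the effective MSS is the path MTU minus 40. The names `effective_mss`, `Segment` and `tso_segment` are hypothetical. A real NIC also rewrites IP IDs, lengths, checksums and flags on each segment, and this sketch leaves all of that out:

```python
from dataclasses import dataclass

IPV4_HDR = 20  # assumed: IPv4 header without options
TCP_HDR = 20   # assumed: TCP header without options


def effective_mss(path_mtu: int) -> int:
    """On-the-wire MSS for a given path MTU (no IP/TCP options assumed)."""
    return path_mtu - IPV4_HDR - TCP_HDR


@dataclass
class Segment:
    seq: int        # TCP sequence number of the first payload byte
    payload: bytes


def tso_segment(start_seq: int, data: bytes, mss: int) -> list[Segment]:
    """Split one 'large send' into MSS-sized segments, as a TSO NIC would.

    Only the sequence number changes per segment here. Header fix-ups
    (IP ID, lengths, checksums, PSH/FIN placement) are omitted.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for off in range(0, len(data), mss):
        chunk = data[off:off + mss]
        seq = (start_seq + off) % 2**32  # TCP sequence space wraps at 2^32
        segments.append(Segment(seq, chunk))
    return segments
```

The NIC holds no per-connection state. When new PathMTU information arrives, the stack simply passes a smaller value on the next large send, e.g. `tso_segment(seq, data, effective_mss(1400))`.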
> Sure, there are measurements that show several percentage points less
> CPU, but in most cases we're not CPU bound. I'm not sure what problem
> it's solving, other than a checkbox to differentiate commodity
> products.
When the functionality was introduced in the 1GbE NICs, it was to allow
them to be driven at link-rate with the then-contemporary CPUs, not
only for easily dismissed (well, not IMO :) things like netperf
TCP_STREAM, but also for things customers actually did, like file
transfers or clustered database traffic (i.e., if you can't get
there with netperf, you ain't going to get there with FTP).
Now, this may be a place where my world starts to diverge from the
rest of the end2end community's - indeed many of my employer's
customers do things across the big-I Internet, but they do far more
across their corporate LANs and intranets. I can see where being CPU
bound talking across the big-I Internet is perhaps rare, but being CPU
bound when talking across the corporate 1 Gig LAN was not rare. And
essentially we have One Protocol to Rule Them All...
Yes, CPUs today are "faster" than at the dawn of 1 Gig Ethernet. We
are also at the dawn (perhaps a little past, depends I suppose on
one's deployment longitude) of 10 Gig Ethernet. Bless their hearts,
when a customer upgrades their network from one speed to the next,
they care little about Amdahl's Law, etc., and get quite agitated when
one cannot achieve link-rate on the next higher speed. Well, they
might give you a generation's worth of leeway, but by the time the
second generation of the NIC arrives, their expectations are pretty
firm. If your solution cannot achieve link-rate, your solution is not
selected.
TSO and GRO, like Jumbo Frames, can be thought of as the inevitable
"inter-reaction" between customer expectations and a de jure network
MTU size that has remained unchanged since the dawn of Ethernet. Or,
put another way, we have begun treating the Ethernet MTU as damaged
and routed around it.
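Here is a back-of-envelope illustration of why per-packet cost bites as link speeds rise while the MTU stays put. It assumes full-size frames and 38 bytes of Ethernet framing overhead per frame (14-byte header, 4-byte FCS, 8-byte preamble/SFD, 12-byte inter-frame gap), with no VLAN tags. The function name is hypothetical:

```python
ETH_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap, in bytes


def max_frames_per_second(link_bps: int, mtu: int) -> int:
    """Upper bound on full-size frames per second for a given link rate and MTU."""
    wire_bits = (mtu + ETH_OVERHEAD) * 8
    return link_bps // wire_bits


for link_bps, name in ((10**9, "1 GbE"), (10**10, "10 GbE")):
    for mtu in (1500, 9000):
        print(f"{name}, MTU {mtu}: {max_frames_per_second(link_bps, mtu):,} frames/s")
```

At MTU 1500 this works out to 81,274 frames/s on 1 GbE and 812,743 on 10 GbE. The host's per-packet work grows tenfold unless something amortizes it, whether that is TSO/GRO batching on the host or Jumbo Frames (13,830 and 138,304 frames/s at MTU 9000).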
>> And if that upsets him, we better not tell him about the 10G NICs
>> also doing receive offload... :)
> I'd heard of it, but thought that was pretty uniformly rejected. Heck,
> the most basic TCP decision points would be impossible to implement,
> revise, or test.
"LRO" (multiple segment coalescing done in the chip and an uber frame
hitting the host with the intermediate headers stripped) has been
rejected in Linux-land in favor of GRO, which preserves the arriving
segment boundaries via some clever linking of buffers (and perhaps
some header-data split but I'm fuzzy there).
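The following conceptual sketch contrasts the two approaches described above. It is not the actual Linux code. LRO-style merging yields one payload with the intermediate headers and boundaries gone. A GRO-style batch keeps the original segments linked, so they can be re-emitted unchanged if the host forwards rather than consumes them. It assumes in-order, contiguous segments from a single flow, and all names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Seg:
    seq: int        # TCP sequence number of the first payload byte
    payload: bytes


def lro_merge(segs: list[Seg]) -> tuple[int, bytes]:
    """LRO-style: one 'uber frame'; per-segment headers and boundaries are dropped."""
    return segs[0].seq, b"".join(s.payload for s in segs)


@dataclass
class GroBatch:
    """GRO-style: segments are chained, so the original boundaries survive."""
    segs: list[Seg] = field(default_factory=list)

    def try_add(self, seg: Seg) -> bool:
        # Only coalesce a segment that continues exactly where the last one ended.
        if self.segs:
            last = self.segs[-1]
            if seg.seq != (last.seq + len(last.payload)) % 2**32:
                return False
        self.segs.append(seg)
        return True

    def payload(self) -> bytes:
        # What a local receiver consumes: the concatenated byte stream.
        return b"".join(s.payload for s in self.segs)

    def resegment(self) -> list[Seg]:
        # What a forwarding path emits: the original segments, unchanged.
        return list(self.segs)
```

Given three arriving segments of 1460, 1460 and 1000 bytes, both approaches hand a local receiver the same 3,920 bytes. Only the GRO-style batch can put 1460/1460/1000-byte segments back on the wire. From the LRO result, a router or bridge would have to pick segment sizes on its own.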
>> BTW, I do not believe that any router actually has TSO happen to
>> TCP segments contained within the IP datagrams passing through it -
>> although
>
> Only recently trying to decipher the Linux stack, but it all appears to
> go through the same queue, routed packets included. If the box receives
> a jumbogram on one interface, it can be re-segmented out another, and
> I've not found any support for PMTUD or ECN or anything.
>
>
>> there have been issues in Linux with LRO (Large Receive Offload,
>> distinct from General Receive Offload) when the system was acting
>> as either a router or a bridge - because TSO doesn't happen in that
>> path :)
> Again, I'm not as familiar with Linux-only terminology. A quick Google
> turns up "Generic Receive Offload", and that appears to be explicitly
> designed to merge segments in routers, and re-segment out the other
> side:
>
> http://lwn.net/Articles/311357/
>
> I'm pretty sure this is contrary to the end-to-end [argument, principle,
> what-have-you]....
You are supposed to be ignoring the code-path behind the curtain :)
rick jones
http://homepage.mac.com/perfgeek