[e2e] Can we revive T/TCP ? => persistent connections

Mon Dec 26 13:10:41 PST 2005

On Dec 26, 2005, at 3:00 PM, Michael Welzl wrote:
>>
>> It doesn't.  Most links are clicked within the same site, and most
>> servers and browsers support persistent connections.  The connection
>> is only torn down after an idle period or some maximum number of
>> requests.
>
> In practice, this doesn't seem to be the case. In all the tests my
> students did (not a thorough measurement study, just some
> experiments), the server closed the connection after sending a page.
>
> I think this is due to the (quasi-)stateless operation that an HTTP
> server can achieve this way - I mean, it's much more difficult to keep
> connections open for a longer period, and close them only after a
> timer expired, count the number of connections that should be cached,
> etc. etc. ... if poorly implemented, this might also not scale so
> well.

Could you elaborate on how you did those tests?  A quick, highly
scientific check showed:

www.yahoo.com:  no connection caching
Google:  caching
microsoft.com:  caching
www.cmu.edu:  caching

The connection timeouts on some of those are fairly short;  some are  
long enough for subsequent clicks, many are only long enough to fetch  
embedded objects (a few seconds).
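
For anyone who wants to repeat that kind of check, here is a minimal
sketch, not the procedure used for the list above.  It assumes Python's
standard http.client module; the function name reuses_connection is
made up for illustration, and it peeks at the connection's sock
attribute, which is a CPython implementation detail rather than a
documented interface.

```python
import http.client


def reuses_connection(host, port=80, path="/", timeout=5.0):
    """Return True if two GETs were answered over one TCP connection.

    Hypothetical helper for illustration only.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        conn.getresponse().read()  # drain the body so the socket can be reused
        # http.client drops its socket when the response says the server
        # will close (HTTP/1.0 without keep-alive, or "Connection: close").
        if conn.sock is None:
            return False
        try:
            # The socket is still set, so this request goes out on the
            # same TCP connection instead of opening a new one.
            conn.request("GET", path)
            conn.getresponse().read()
        except (http.client.HTTPException, OSError):
            return False  # the server dropped the connection anyway
        return True
    finally:
        conn.close()
```

Something like reuses_connection("www.cmu.edu") would then report
whether the second request got through on the first connection.  It
says nothing about how long the server's idle timeout is; for that
you would have to sleep between the two requests.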
>
>
>> (I'm sure there are scenarios where it will, of course.)  In the Grid
>> context, if you're talking about a not-huge set of trusted nodes,
>> they can cache those TCP connections for quite a long time.
>
> But they don't - neither in the smaller nor in the larger Grids that I
> know of; I think it's because the notion of a "connection" is lost
> in the (vertical) communication across layers.

I'd suggest that that's not the fault of T/TCP, but the fault of the  
upper layers in the architecture...

>
> Grid Services are usually implemented on top of SOAP, which is
> stateless. How should SOAP tell HTTP to maintain a connection
> when it can't know whether a Grid Service will be called again? The
> decision to do so is up to the programmer, who however can't provide
> the remote SOAP instance with the necessary information because
> the notion of a "session" isn't part of SOAP.
>
> Could connections be cached in a transparent manner in such a
> scenario (e.g. by tweaking something at the HTTP level, but not
> above)? I think so, but I'm not 100% sure. Also, if it's possible,
> why isn't it done? In a Grid, this would surely make sense.

Yes.  Some SOAP and XMLRPC libraries do this.  See, e.g.,

http://www.gnuenterprise.org/tools/common/docs/api/public/

"gnue.common.rpc.drivers.xmlrpc.ClientAdapter.ClientAdapter:  
Implements an XML-RPC client adapter using persistent HTTP  
connections as transport."

>
>
>> An interesting example of this is the 'rex' system by Kaminsky and
>> Mazieres.  It's a remote execution tool much like ssh, but more
>> flexible.  It supports connection caching under the hood, so you
>> don't have to pay the setup time if you're using remote command
>> execution.  It's worth noting that the major delay they're avoiding
>> in the local area is the public key crypto processing time, but in
>> the wide-area, both can add significantly to the total delay.
>
> Thanks a lot for the pointer!
> By "under the hood", you don't mean it's transparent to upper
> layers, do you? How could it... I mean, if a web server decides
> to close a connection, there's nothing any system underneath it
> could do about it, I guess.

"under the hood" -- underneath what the upper layers see.  The web  
server can close the connection, and the client  
{library,binary,whatever} can open it up again without having to let  
the user's program running on top of it know what's going on.
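
A rough sketch of that idea, under stated assumptions: this is not how
rex or any particular SOAP/XML-RPC library does it.  The class name
CachedHTTPClient and its connects counter are hypothetical, it uses
Python's standard http.client, and it relies on the connection's sock
attribute (a CPython implementation detail).

```python
import http.client


class CachedHTTPClient:
    """Hypothetical wrapper: one cached connection, reopened silently."""

    def __init__(self, host, port=80, timeout=5.0):
        self._conn = http.client.HTTPConnection(host, port, timeout=timeout)
        self.connects = 0  # TCP connections opened so far (illustration only)

    def get(self, path):
        """Fetch path, reusing the cached connection when it is still open."""
        for attempt in (1, 2):
            try:
                if self._conn.sock is None:
                    # Not connected yet, or the last response closed it.
                    self._conn.connect()
                    self.connects += 1
                self._conn.request("GET", path)
                response = self._conn.getresponse()
                return response.status, response.read()
            except ConnectionError:
                # The server dropped the cached connection.  A GET is
                # idempotent, so retry once on a fresh connection.
                self._conn.close()
                if attempt == 2:
                    raise

    def close(self):
        self._conn.close()
```

The caller only ever sees get(); whether a given call reused the old
connection or opened a new one shows up in the connects counter and
nowhere else.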

>
> I heard the term "connection caching" before, and followed it, which
> led to a few papers on the subject and problems with this type of
> caching, but no standards. It doesn't seem to be an easy issue, but
> it looks like it's solvable. If I'm right and common web servers don't
> implement this (one could of course carry out a larger measurement
> study for this... perhaps it has already been done), wouldn't an
> Informational RFC which provides an overview of connection caching
> methods and suggests an implementation do the trick?

I believe you're mistaken.  Most web servers support it.  It's part  
of the HTTP 1.1 spec, and has been around literally for years.

>
> I'd be thankful for some pointers to the key papers about connection
> caching - e.g., where was it introduced?

Proposed:  Mogul, "The Case for Persistent-Connection HTTP",
SIGCOMM 1995.  Dig around in some of his other papers, you'll get a
good feel for what's going on.

HTTP 1.1 spec:  persistent connections are the default.

HTTP 1.0 hack:  the

Connection: keep-alive

header.
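
Put as a rule of thumb in code (a simplified sketch; the function name
is_persistent is made up, and it ignores proxies and anything beyond
the version and the Connection header):

```python
def is_persistent(http_version, connection_header=None):
    """Guess whether a response leaves the connection open.

    http_version is the version from the status line, e.g. "HTTP/1.1";
    connection_header is the value of the Connection header, or None.
    """
    # The Connection header is a comma-separated, case-insensitive list.
    tokens = {t.strip().lower() for t in (connection_header or "").split(",")}
    if "close" in tokens:
        return False  # an explicit close wins in either version
    if http_version == "HTTP/1.1":
        return True  # persistent unless told otherwise
    # HTTP/1.0 stays open only if keep-alive was negotiated.
    return http_version == "HTTP/1.0" and "keep-alive" in tokens
```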

   -d