[e2e] end2end-interest Digest, Vol 83, Issue 4
Zama Ques
queszama at yahoo.in
Thu Feb 24 03:06:18 PST 2011
Hi Yan,
I tried testing with iperf today.
I started the server on one host, connected to it with the client from another host, and after some time disconnected the network cable on the server host.
I also reduced tcp_keepalive_time to 1200 seconds, so together with the other two keepalive kernel parameters (probes and interval) the keepalive timeout should be about 32 minutes.
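As a rough check of that estimate: on Linux, an idle connection with keepalive enabled is dropped after tcp_keepalive_time plus tcp_keepalive_probes unanswered probes spaced tcp_keepalive_intvl apart. The sketch below assumes the stock defaults of 9 probes and a 75-second interval, since the values actually set on these hosts are not given in this message; the function name is only illustrative.

```python
def keepalive_timeout_s(keepalive_time_s, interval_s=75, probes=9):
    """Seconds until an idle, keepalive-enabled connection is dropped.

    interval_s and probes default to the usual Linux values for
    tcp_keepalive_intvl and tcp_keepalive_probes (an assumption here).
    """
    return keepalive_time_s + interval_s * probes


timeout = keepalive_timeout_s(1200)
print(f"{timeout} s = {timeout / 60:.2f} min")  # 1875 s = 31.25 min
```

With those assumed defaults the figure is a little over 31 minutes, close to the 32 minutes quoted.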
These are my findings.
The client terminates the ESTABLISHED connection about 16 minutes after the server becomes unreachable, which is before the TCP keepalive timeout would expire.
It looks to me as though this interval is related to the TCP retransmission timeout, which is probably determined by the following three parameters and comes to around 18 minutes.
$ cat /proc/sys/net/ipv4/tcp_retries1
3
$ cat /proc/sys/net/ipv4/tcp_retries2
15
$ cat /proc/sys/net/ipv4/tcp_fin_timeout
60
Is my assumption correct here?
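For comparison with the 16 minutes observed: the Linux kernel documentation describes tcp_retries2 as controlling how long an established connection keeps retransmitting, and gives a hypothetical timeout of 924.6 seconds for the default of 15. A minimal sketch of that arithmetic, assuming the RTO starts at the 200 ms minimum, doubles after every timeout, and is capped at 120 seconds. The function and constant names are illustrative, and the real starting RTO depends on the measured round-trip time.

```python
TCP_RTO_MIN_MS = 200       # assumed minimum retransmission timeout
TCP_RTO_MAX_MS = 120_000   # assumed cap on the backed-off timeout


def hypothetical_timeout_ms(retries2,
                            rto_min_ms=TCP_RTO_MIN_MS,
                            rto_max_ms=TCP_RTO_MAX_MS):
    """Sum the first timeout plus `retries2` exponentially backed-off ones."""
    total = 0
    rto = rto_min_ms
    for _ in range(retries2 + 1):
        total += rto
        rto = min(rto * 2, rto_max_ms)  # double, but never exceed the cap
    return total


total_ms = hypothetical_timeout_ms(15)
print(f"{total_ms / 1000:.1f} s = {total_ms / 60000:.1f} min")
```

Under these assumptions tcp_retries2 alone accounts for roughly 15.4 minutes. tcp_retries1 sets the point at which the network layer is asked to re-check the route, and tcp_fin_timeout concerns orphaned FIN_WAIT_2 sockets, so neither is likely to add to this figure.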
The following is the netstat output for the connection during my experiment:
$ for i in {1..1000} ; do netstat -atn | egrep "5001" ; date ; sleep 60 ; done
tcp 0 447432 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
tcp 0 0 10.66.X.Y:43531 10.66.A.B:5001 TIME_WAIT
Thu Feb 24 14:47:16 IST 2011
tcp 0 3311576 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 14:48:16 IST 2011
(Network Cable removed during this time from the server)
tcp 0 3317368 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 14:49:16 IST 2011
tcp 0 3021976 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 14:50:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 14:51:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 14:52:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 14:53:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 14:54:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 14:55:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 14:56:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 14:57:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 14:58:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 14:59:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 15:00:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 15:01:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 15:02:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 15:03:16 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
Thu Feb 24 15:04:17 IST 2011
tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
... (the connection state on the client changed about 15 minutes after the server became unreachable)
Thu Feb 24 15:05:17 IST 2011
Thu Feb 24 15:06:17 IST 2011
Thu Feb 24 15:07:17 IST 2011
The following packet flow was captured with tcpdump on the client around the time I removed the network cable.
=====
14:50:20.158794 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1801285561:1801350721, ack 0, win 92, options [nop,nop,TS val 172872553 ecr 177048109], length 65160
14:50:20.164550 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1801350721:1801415881, ack 0, win 92, options [nop,nop,TS val 172872558 ecr 177048115], length 65160
14:50:20.394916 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172872992 ecr 177048116], length 1448
14:50:21.258921 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172873856 ecr 177048116], length 1448
14:50:22.986922 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172875584 ecr 177048116], length 1448
14:50:26.442922 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172879040 ecr 177048116], length 1448
14:50:33.354923 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172885952 ecr 177048116], length 1448
14:50:47.178932 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172899776 ecr 177048116], length 1448
14:51:14.826929 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172927424 ecr 177048116], length 1448
14:52:10.122922 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 172982720 ecr 177048116], length 1448
14:54:00.714934 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 173093312 ecr 177048116], length 1448
14:56:00.714921 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 173213312 ecr 177048116], length 1448
14:58:00.714920 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 173333312 ecr 177048116], length 1448
15:00:00.714921 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 173453312 ecr 177048116], length 1448
15:02:00.714921 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 173573312 ecr 177048116], length 1448
15:04:00.714936 IP edgebeauty-dr.43533 > shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441, ack 0, win 92, options [nop,nop,TS val 173693312 ecr 177048116], length 1448
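The spacing of the retransmissions in the capture above can be checked with a short script. This sketch copies the timestamps of the repeated segment (seq 1798969993) from the capture and prints the gap between consecutive transmissions; the variable and function names are illustrative.

```python
from datetime import datetime

# Timestamps of the segment with seq 1798969993, copied from the capture.
STAMPS = [
    "14:50:20.394916", "14:50:21.258921", "14:50:22.986922",
    "14:50:26.442922", "14:50:33.354923", "14:50:47.178932",
    "14:51:14.826929", "14:52:10.122922", "14:54:00.714934",
    "14:56:00.714921", "14:58:00.714920", "15:00:00.714921",
    "15:02:00.714921", "15:04:00.714936",
]


def gaps_s(stamps):
    """Return the seconds elapsed between consecutive timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


GAPS = gaps_s(STAMPS)
for stamp, gap in zip(STAMPS[1:], GAPS):
    print(f"{stamp}  +{gap:8.3f} s")
```

The gaps start near 0.864 s, roughly double each time up to about 110.6 s, and then level off at 120 s, which is consistent with exponential retransmission backoff up to a 120-second cap.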
Does my TCP stack look fine based on the experiment above?
Thanks
Zaman
--- On Wed, 23/2/11, Yan Cai <ycai at ecs.umass.edu> wrote:
From: Yan Cai <ycai at ecs.umass.edu>
Subject: Re: end2end-interest Digest, Vol 83, Issue 4
To: "Zama Ques" <queszama at yahoo.in>
Date: Wednesday, 23 February, 2011, 2:04 PM
Hi Zaman,
I guess there might be some unknown FTP configuration on the client
side that causes this issue. You can isolate the problem first: iperf
can be used to test the TCP stack on your machine. If it works as
expected, then there is nothing wrong with the TCP stack.
Next, check the settings of the FTP client (not the FTP server) to
see if there is any specific configuration that causes this problem.
If that is hard to do, my suggestion is to install a third-party
FTP client application and test with that.
If none of that works, you might have to trace the traffic on the
cable attached to the client machine and determine what is going on.
Best wishes,
Yan
On 2/23/2011 1:52 AM, Zama Ques wrote:
Hi Yan,
Thanks for your suggestion. I am familiar with iperf, but
the issue for us is that this is a production network and it is
not advisable for me to pump test data onto it. I will
try the experiment between two desktops connected by a
crossover cable.
What I tried earlier was to start an FTP server on
one end and connect to it from the client side.
$ ftp 10.66.X.X
Connected to 10.66.X.X
220 (vsFTPd 2.2.2)
Name (10.66.74.141:zama): anonymous
331 Please specify the password.
Password:
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
After that I disconnected the network cable from the
server and monitored the status of the connection on
the client side.
The status of the connection before and after
disconnecting the network cable was as follows.
---
$ for i in {1..1000} ; do netstat -at | egrep "ftp" ; date ; sleep 60 ; done
tcp 0 0 edgebeauty.c:50179 shopfrobsoon.c:ftp ESTABLISHED
Wed Feb 23 11:47:53 IST 2011
tcp 0 0 edgebeauty.c:50179 shopfrobsoon.c:ftp ESTABLISHED
Wed Feb 23 11:48:53 IST 2011
tcp 0 0 edgebeauty.c:50179 shopfrobsoon.c:ftp ESTABLISHED
Wed Feb 23 11:49:53 IST 2011
...
...
Wed Feb 23 12:14:03 IST 2011
tcp 0 0 edgebeauty.c:50179 shopfrobsoon.c:ftp ESTABLISHED
Wed Feb 23 12:15:03 IST 2011
===
As you can see, more than 25 minutes after the server went
down, the client still holds the connection in ESTABLISHED
state.
My understanding is that the client should close the
connection once the TCP retransmit timeout expires. Or is my
understanding wrong?
Please clarify.
--Zaman
Message: 2
Date: Tue, 22 Feb 2011 09:55:13 -0500
From: Yan Cai <ycai at ecs.umass.edu>
Subject: Re: [e2e] query on behaviour of tcp_keepalive and tcp retransmit on Linux based systems
To: end2end-interest at postel.org
Message-ID: <4D63CE50.8050606 at ecs.umass.edu>
Content-Type: text/plain; charset="iso-8859-1"
Hi
According to your description, the expected behavior should be as
follows.
At the beginning, senders on one side can send data to the receivers
on the other side, and the receivers can receive data without any
problem. When some of the receivers go offline, the affected senders
should no longer receive positive acknowledgments and should
therefore lower their congestion windows (i.e., their sending rate).
Since in your case the receiver is off forever, those senders should
go on to experience timeout events. After a few timeouts, the sender
should close the connection itself.
As far as I know, the whole procedure above should be invoked
automatically on the sender side. This is how TCP (the sender)
handles exceptions.
My suggestion is that you run a simple experiment on your side to
see whether TCP on your machine works that way. The test can be done
by using iperf to send a long-lived TCP flow and then taking the
receiver offline in the middle of the transmission. The connection
is expected to be closed very soon after the receiver goes offline.
Hope this helps.
Yan
On 2/22/2011 4:24 AM, Zama Ques wrote:
> We need some clarification on TCP keepalive. We are facing some
> issues on our prod servers related to TCP functionality.
>
> The issue is as follows.
>
> We have some machines at one end sending data in real time to another
> group of machines at the other end. Due to some hardware issues at
> the other end, some of those machines become unresponsive or crash.
> The client system that pumps the data never learns that the server
> has become unresponsive. The connection remains in ESTABLISHED state
> and the client keeps trying to send data, thinking that the
> connection is alive, and as a result we are seeing a backlog on the
> client side.
>
> Our understanding of how TCP will handle the connection is as follows.
>
>
> Q 1) Since the server went down, the client will try to retransmit
> the data until it times out. What is the behavior of TCP after the
> timeout? We need clarification on the following things.
> a) Will the kernel close the established connection after the
> timeout? It looks like it does not in our case, as we still see the
> connection in ESTABLISHED state after more than 2 hours.
> b) Are there any kernel parameters that decide when the client times
> out after retransmission fails? What is the behavior of TCP after
> the client's retransmissions time out?
>
>
> Q 2) There is something called tcp_keepalive which, if implemented in
> the kernel (by default it is there, and comes to around 2 hrs 2
> minutes, I think), makes the client send some TCP probes after the
> keepalive time interval; if it cannot reach the server, the
> established connection on the client side will be closed by the
> kernel. This is my understanding. But I can see that the connection
> still remains in ESTABLISHED state after the tcp_keepalive time. We
> waited for around 2 hrs 30 minutes but the connection remained in
> ESTABLISHED state. We tried reducing the keepalive time to around 10
> minutes, but the connection still remains in ESTABLISHED state on
> the client side.
>
>
> Where did I go wrong? Please clarify the doubts raised above. What
> should we do to resolve the problem we are seeing? Any help will be
> highly appreciated, as we are having a hard time resolving the issue.
>
> Thanks in advance
>
>
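One point relevant to Q 2 that may explain the behavior described, though nothing in the thread confirms it: on Linux the tcp_keepalive_* sysctls only apply to sockets on which the application has set SO_KEEPALIVE, so a connection whose application never enables it sends no probes however long it sits idle. A minimal Python sketch of enabling keepalive per socket, assuming a Linux host; the helper name and the example values are hypothetical.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=75, probes=9):
    """Turn on TCP keepalive for one socket and override the sysctl defaults."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides; these constants are only defined on platforms
    # that support them (Linux does), so each one is looked up defensively.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print("keepalive on:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

Keepalive probes are only sent on an idle connection. While unacknowledged data is still queued, it is the retransmission timer that decides when the connection is given up.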
_______________________________________________
end2end-interest mailing list
end2end-interest at postel.org
http://mailman.postel.org/mailman/listinfo/end2end-interest