[e2e] end2end-interest Digest, Vol 83, Issue 4
Yan Cai
ycai at ecs.umass.edu
Thu Feb 24 06:54:50 PST 2011
Hi Zama,
Thanks for your detailed info. Yes, the TCP stack looks good. One quick
question: which FTP client application did you use in your test, the
previous one (with which the connection issue was found) or a new
third-party one? If it is the previous one, then your FTP client should
also work well, and if you still encounter the issue, something else must
be wrong. If your test was done with a new third-party client, repeat the
procedure with the default client you were using.
Another thing you can check is the congestion window size during the
period from the time the cable is disconnected to the time the
ESTABLISHED connection is closed. Note that the congestion window size
is hard to measure directly, but you can estimate it by analyzing the
trace file for that period. Use tcpdump to monitor the particular NIC
port and analyze the throughput of that particular FTP flow. The amount
of data transmitted over that PERIOD should be very small, because the
congestion window during a timeout is around 1 (in packets).
If everything above works normally, you should not see the issues you
observed before.
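As a rough illustration of this kind of trace analysis, the Python sketch below takes the retransmission timestamps copied from the tcpdump trace quoted further down in this thread (segment seq 1798969993:1798971441, 1448 bytes each) and prints the gap between consecutive retransmissions and the total number of bytes retransmitted. It hardcodes those values rather than parsing a capture file, and it assumes these 14 packets are the only ones the flow sent in that window, which the quoted excerpt suggests but does not prove.

```python
from datetime import datetime
from typing import List

# Retransmissions of seq 1798969993:1798971441 copied from the tcpdump
# trace in this thread (time of day only, as tcpdump printed it).
RETX_TIMES = [
    "14:50:20.394916", "14:50:21.258921", "14:50:22.986922", "14:50:26.442922",
    "14:50:33.354923", "14:50:47.178932", "14:51:14.826929", "14:52:10.122922",
    "14:54:00.714934", "14:56:00.714921", "14:58:00.714920", "15:00:00.714921",
    "15:02:00.714921", "15:04:00.714936",
]
SEGMENT_LEN = 1448  # "length 1448" on every retransmitted packet


def parse(ts: str) -> datetime:
    """Parse a tcpdump HH:MM:SS.ffffff timestamp (the date part is ignored)."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


def retransmission_gaps(times: List[str]) -> List[float]:
    """Seconds between consecutive retransmissions, i.e. the effective RTO."""
    parsed = [parse(t) for t in times]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


gaps = retransmission_gaps(RETX_TIMES)
elapsed = (parse(RETX_TIMES[-1]) - parse(RETX_TIMES[0])).total_seconds()
total_bytes = SEGMENT_LEN * len(RETX_TIMES)

for i, gap in enumerate(gaps, start=1):
    print(f"gap {i:2d}: {gap:8.3f} s")
print(f"{total_bytes} bytes retransmitted in {elapsed:.1f} s "
      f"(~{total_bytes / elapsed:.1f} bytes/s)")
```

In that trace the gaps roughly double from about 0.86 s up to about 110.6 s and then stay at about 120 s, and only one 1448-byte segment goes out per timeout, which fits the congestion window of about one packet described above.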
Best wishes,
Yan
On 2/24/2011 6:06 AM, Zama Ques wrote:
> Hi Yan,
>
> I tried testing with iperf today.
>
> I started the server on one side, connected to it with the client from
> another host, and after some time disconnected the cable on the server host.
>
> I also reduced tcp_keepalive_time to 1200 s, so with the other two
> keepalive-related kernel parameters (probes and interval) the timeout
> should be around 32 minutes.
>
>
> The following are my findings.
>
> I can see that the client terminates the ESTABLISHED connection around
> 16 minutes after the server became unreachable, that is, before the
> TCP keepalive timeout.
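For reference, Linux gives up on an idle, keepalive-enabled connection after roughly tcp_keepalive_time + tcp_keepalive_probes × tcp_keepalive_intvl seconds. The sketch below assumes the usual Linux defaults of 75 s for tcp_keepalive_intvl and 9 for tcp_keepalive_probes (check /proc/sys/net/ipv4 on the host). Keep in mind that keepalive probes are only sent on sockets with SO_KEEPALIVE enabled and only while the connection is idle; with unacknowledged data queued, as in this iperf test, the retransmission timer decides instead, which fits the connection being dropped before the keepalive figure was reached.

```python
def keepalive_give_up_time(keepalive_time, keepalive_intvl=75, keepalive_probes=9):
    """Approximate seconds of idleness before Linux drops a keepalive-enabled
    connection whose peer never answers: the idle time before the first
    probe, plus one interval per unanswered probe.
    Defaults assume the usual Linux tcp_keepalive_intvl / tcp_keepalive_probes."""
    return keepalive_time + keepalive_probes * keepalive_intvl


for idle in (1200, 7200):
    total = keepalive_give_up_time(idle)
    print(f"tcp_keepalive_time={idle}: ~{total} s ({total / 60:.2f} min)")
```

With tcp_keepalive_time = 1200 this gives 1875 s, about 31 minutes, close to the ~32 minutes estimated above; with the default of 7200 it gives 7875 s, about 2 h 11 min.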
>
> This looks to me as if it is somehow related to the TCP retransmission
> timeout, which is probably determined by the following 3 parameters and
> comes to around 18 minutes.
>
> $ cat /proc/sys/net/ipv4/tcp_retries1
> 3
> $ cat /proc/sys/net/ipv4/tcp_retries2
> 15
> $ cat /proc/sys/net/ipv4/tcp_fin_timeout
> 60
>
>
> Is my assumption correct here?
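Of those three parameters, only tcp_retries2 bounds how long an established connection keeps retransmitting. Per the Linux tcp(7) man page, tcp_retries1 only controls when TCP starts asking the network layer to re-check the route, and tcp_fin_timeout applies to orphaned connections in FIN_WAIT_2. tcp(7) describes the default tcp_retries2 of 15 as a "hypothetical timeout" of 924.6 s, assuming the RTO starts at 200 ms, doubles on each retransmission and is capped at 120 s. The sketch below reproduces that arithmetic; treat it as a lower bound, since the real RTO starts from the measured round-trip time (the first gap in the trace below is about 0.86 s).

```python
def hypothetical_retx_timeout(retries2=15, rto_min=0.2, rto_max=120.0):
    """Total seconds waited across the initial RTO and `retries2`
    retransmissions, with the RTO starting at rto_min, doubling on each
    timeout (exponential backoff) and capped at rto_max."""
    total, rto = 0.0, rto_min
    for _ in range(retries2 + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


print(f"tcp_retries2=15: {hypothetical_retx_timeout(15):.1f} s")
print(f"tcp_retries2=8:  {hypothetical_retx_timeout(8):.1f} s")
```

924.6 s is about 15.4 minutes, in line with the roughly 15 to 16 minutes observed in this test.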
>
>
> The following is the netstat output for the connection during my experiment:
>
> $ for i in {1..1000} ; do netstat -atn | egrep "5001" ; date ; sleep 60 ; done
> tcp 0 447432 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> tcp 0 0 10.66.X.Y:43531 10.66.A.B:5001 TIME_WAIT
> Thu Feb 24 14:47:16 IST 2011
> tcp 0 3311576 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 14:48:16 IST 2011
> (Network Cable removed during this time from the server)
> tcp 0 3317368 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 14:49:16 IST 2011
> tcp 0 3021976 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 14:50:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 14:51:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 14:52:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 14:53:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 14:54:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 14:55:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 14:56:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 14:57:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 14:58:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 14:59:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 15:00:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 15:01:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 15:02:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 15:03:16 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> Thu Feb 24 15:04:17 IST 2011
> tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED
> ... (the connection status changed on the client around 15 minutes after
> the server became unreachable)
>
> Thu Feb 24 15:05:17 IST 2011
> Thu Feb 24 15:06:17 IST 2011
> Thu Feb 24 15:07:17 IST 2011
>
>
> The following packet flow was captured by tcpdump on the client while I
> removed the network cable.
>
> =====
> 14:50:20.158794 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1801285561:1801350721,
> ack 0, win 92, options [nop,nop,TS val 172872553 ecr 177048109],
> length 65160
> 14:50:20.164550 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1801350721:1801415881,
> ack 0, win 92, options [nop,nop,TS val 172872558 ecr 177048115],
> length 65160
>
> 14:50:20.394916 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 172872992 ecr 177048116],
> length 1448
> 14:50:21.258921 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 172873856 ecr 177048116],
> length 1448
> 14:50:22.986922 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 172875584 ecr 177048116],
> length 1448
> 14:50:26.442922 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 172879040 ecr 177048116],
> length 1448
> 14:50:33.354923 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 172885952 ecr 177048116],
> length 1448
> 14:50:47.178932 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 172899776 ecr 177048116],
> length 1448
> 14:51:14.826929 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 172927424 ecr 177048116],
> length 1448
> 14:52:10.122922 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 172982720 ecr 177048116],
> length 1448
> 14:54:00.714934 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 173093312 ecr 177048116],
> length 1448
> 14:56:00.714921 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 173213312 ecr 177048116],
> length 1448
>
> 14:58:00.714920 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 173333312 ecr 177048116],
> length 1448
> 15:00:00.714921 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 173453312 ecr 177048116],
> length 1448
> 15:02:00.714921 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 173573312 ecr 177048116],
> length 1448
> 15:04:00.714936 IP edgebeauty-dr.43533 >
> shopfrobsoon-dr.commplex-link: Flags [.], seq 1798969993:1798971441,
> ack 0, win 92, options [nop,nop,TS val 173693312 ecr 177048116],
> length 1448
>
>
> Based on the experiments above, does my TCP stack look fine?
>
>
> Thanks
> Zaman
>
>
>
>
>
>
> --- On Wed, 23/2/11, Yan Cai <ycai at ecs.umass.edu> wrote:
>
>
> From: Yan Cai <ycai at ecs.umass.edu>
> Subject: Re: end2end-interest Digest, Vol 83, Issue 4
> To: "Zama Ques" <queszama at yahoo.in>
> Date: Wednesday, 23 February, 2011, 2:04 PM
>
> Hi Zaman,
>
> I guess there might be some unknown FTP configuration on the CLIENT
> side that causes this issue. You can isolate the problem first.
> iperf can be used to test the functionality of the TCP stack on your
> machine. If it works as expected, then there is nothing wrong with the
> TCP stack. Next, check the settings of the FTP client (not the FTP
> server) to see whether any specific configuration causes this
> problem. If that is hard to do, my suggestion is to install a
> third-party FTP client application and test with that.
>
> If none of this works, you might have to trace the traffic over the
> cable attached to the client machine to determine what is going on.
>
> Best wishes,
> Yan
>
> On 2/23/2011 1:52 AM, Zama Ques wrote:
>> Hi Yan,
>>
>> Thanks for your suggestion. I am familiar with iperf, but the issue
>> for us is that this is a production network and it is advisable for
>> me not to pump data onto it. I will try the experiment between two
>> desktops connected by a crossover cable.
>>
>> What I was trying earlier was this: I started an FTP server on one
>> end and connected to the server from the client side.
>>
>> $ ftp 10.66.X.X
>> Connected to 10.66.X.X
>> 220 (vsFTPd 2.2.2)
>> Name (10.66.74.141:zama): anonymous
>> 331 Please specify the password.
>> Password:
>> 230 Login successful.
>> Remote system type is UNIX.
>> Using binary mode to transfer files.
>>
>>
>> After that, I disconnected the network cable from the server and
>> monitored the status of the connection on the client side. The
>> status of the connection before and after disconnecting the network
>> cable was as follows:
>>
>> ---
>> $ for i in {1..1000} ; do netstat -at | egrep "ftp" ; date ; sleep 60 ; done
>> tcp 0 0 edgebeauty.c:50179 shopfrobsoon.c:ftp ESTABLISHED
>> Wed Feb 23 11:47:53 IST 2011
>> tcp 0 0 edgebeauty.c:50179 shopfrobsoon.c:ftp ESTABLISHED
>> Wed Feb 23 11:48:53 IST 2011
>> tcp 0 0 edgebeauty.c:50179 shopfrobsoon.c:ftp ESTABLISHED
>> Wed Feb 23 11:49:53 IST 2011
>> ...
>> ...
>> Wed Feb 23 12:14:03 IST 2011
>> tcp 0 0 edgebeauty.c:50179 shopfrobsoon.c:ftp ESTABLISHED
>> Wed Feb 23 12:15:03 IST 2011
>> ===
>>
>> As you can see, more than 25 minutes after the server went down,
>> the client still keeps the connection in the ESTABLISHED state.
>>
>> My understanding is that the client should close the connection
>> after the TCP retransmission timeout expires. Is my understanding wrong?
>>
>> Please clarify.
>>
>> --Zaman
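One detail worth checking in the output above is the Send-Q column (the third field of netstat -at). In this FTP test it is 0, so the control connection was idle with nothing awaiting acknowledgment: no retransmission timer is running, and only TCP keepalive, if the application enabled SO_KEEPALIVE on the socket, would ever notice the dead peer. In the iperf test quoted earlier, Send-Q stayed at 2511048, so the retransmission timer applied. The sketch below encodes that rough heuristic. It assumes whitespace-separated netstat output with the mail line-wrapping removed, and it ignores corner cases such as a zero receive window, where the persist timer is used instead.

```python
def mechanism_that_detects_dead_peer(netstat_line):
    """Rough heuristic: which Linux mechanism can notice a vanished peer on
    this connection, judged from the Send-Q column of `netstat -at`."""
    fields = netstat_line.split()
    send_q, state = int(fields[2]), fields[5]
    if state != "ESTABLISHED":
        return "not established"
    if send_q > 0:
        # Bytes not yet acknowledged by the peer: the retransmission timer
        # runs and eventually gives up (bounded by tcp_retries2).
        return "retransmission timer"
    # Idle connection: nothing is retransmitted, so only keepalive probes
    # (sent only if SO_KEEPALIVE is enabled) can detect the dead peer.
    return "keepalive only"


ftp_line = "tcp 0 0 edgebeauty.c:50179 shopfrobsoon.c:ftp ESTABLISHED"
iperf_line = "tcp 0 2511048 10.66.X.Y:43533 10.66.A.B:5001 ESTABLISHED"
print(mechanism_that_detects_dead_peer(ftp_line))
print(mechanism_that_detects_dead_peer(iperf_line))
```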
>>
>>
>> Message: 2
>> Date: Tue, 22 Feb 2011 09:55:13 -0500
>> From: Yan Cai <ycai at ecs.umass.edu>
>> Subject: Re: [e2e] query on behaviour of tcp_keepalive and tcp
>> retransmit on Linux based systems
>> To: end2end-interest at postel.org
>> Message-ID: <4D63CE50.8050606 at ecs.umass.edu>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Hi
>>
>> According to your description, the expected behavior should be as
>> follows. At the beginning, senders on one side can send data to the
>> receivers on the other side, and the receivers can receive data
>> without any problem. When some of the receivers go offline, the
>> affected senders no longer receive positive acknowledgments and
>> therefore lower their congestion windows (i.e., their sending rate).
>> Since in your case the receiver is off permanently, some senders will
>> further experience timeout events. After a few timeouts, the sender
>> should CLOSE the connection itself.
>>
>> As far as I know, the whole procedure above should be invoked
>> automatically on the sender side. This is how TCP (the sender)
>> handles exceptions.
>>
>> My suggestion is that you run a simple experiment on your side to see
>> whether TCP on your machine works that way. The test can be done by
>> using iperf to send a long-lived TCP flow and then taking the receiver
>> off in the middle of the transmission. The connection is expected to
>> be closed very soon after the receiver is off.
>>
>> Hope it helps.
>> Yan
>> On 2/22/2011 4:24 AM, Zama Ques wrote:
>> > We need some clarification on TCP keepalive. We are facing some
>> > issues on our production servers related to TCP functionality.
>> >
>> > The issue is as follows.
>> >
>> > We have some machines at one end sending data in real time to another
>> > group of machines at the other end. Due to some hardware issues at the
>> > other end, some of those machines become unresponsive or crash. The
>> > client system that pumps the data never learns that the server has
>> > become unresponsive. The connection remains in the ESTABLISHED state
>> > and the client keeps trying to send data, believing the connection is
>> > alive, and as a result we see a backlog on the client side.
>> >
>> > Our understanding of how TCP will handle the connection is as follows.
>> >
>> >
>> > Q 1) Since the server went down, the client will keep retransmitting
>> > the data until it times out. What is the behavior of TCP after the
>> > timeout? We need clarification on the following:
>> > a) Will the kernel close the established connection after the
>> > timeout? It looks like it does not in our case, as we still see the
>> > connection in the ESTABLISHED state after more than 2 hours.
>> > b) Are there any kernel parameters that decide when the client times
>> > out after retransmission fails? What is the behavior of TCP after the
>> > client's retransmissions time out?
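On Q1 b): the system-wide limit is the tcp_retries2 sysctl. Linux 2.6.37 and later also offer a per-socket TCP_USER_TIMEOUT option (RFC 5482), which caps how long transmitted data may stay unacknowledged before the kernel closes the connection. A minimal sketch, assuming Linux and Python 3.6+ (where socket.TCP_USER_TIMEOUT is exposed); the helper name and the 30-second value are illustrative choices, not recommendations.

```python
import socket


def make_tcp_socket_with_user_timeout(user_timeout_ms=30_000):
    """Create a TCP socket whose unacknowledged data may wait at most
    user_timeout_ms milliseconds before the kernel drops the connection
    (subsequent send/recv calls then fail with ETIMEDOUT). Linux-only."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set before connect() so it covers the whole life of the connection.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT, user_timeout_ms)
    return sock


sock = make_tcp_socket_with_user_timeout()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_USER_TIMEOUT))
```

The application would then call sock.connect((host, port)) as usual; unlike tcp_retries2, this affects only the sockets that set it.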
>> >
>> >
>> > Q 2) There is something called tcp_keepalive which, if implemented in
>> > the kernel (by default it is there, and I think it comes to around
>> > 2 hrs 2 minutes), makes the client send some TCP probes after the
>> > keepalive time interval; if it cannot reach the server, the
>> > established connection on the client side is closed by the kernel.
>> > This is my understanding. But I can see that the connection still
>> > remains ESTABLISHED after the tcp_keepalive time. We waited for
>> > around 2 hrs 30 minutes, but the connection remained in the
>> > ESTABLISHED state. We tried reducing the keepalive time to around
>> > 10 minutes, but the connection still remains in the ESTABLISHED
>> > state on the client side.
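On Q2: per tcp(7), the tcp_keepalive_* sysctls only act on sockets where the application has enabled SO_KEEPALIVE, and probes are only sent while the connection is idle, so lowering tcp_keepalive_time changes nothing for a client that never sets that option. A minimal sketch of enabling keepalive on one socket, assuming Linux (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are the Linux per-socket overrides of tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes); the helper name and the 600/75/9 values are illustrative.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=75, probes=9):
    """Turn on TCP keepalive for one socket and override the system-wide
    keepalive sysctls for it (Linux-specific socket options)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
# With these values an idle connection whose peer stops answering is
# dropped after about 600 + 9 * 75 = 1275 s.
```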
>> >
>> >
>> > Where did I go wrong? Please clarify the doubts raised above. What
>> > should we do to resolve the problem described above? Any help will
>> > be highly appreciated, as we are having a hard time resolving the
>> > issue.
>> >
>> > Thanks in advance
>> >
>> >
>>
>> -------------- next part --------------
>> An HTML attachment was scrubbed...
>> URL:
>> http://mailman.postel.org/pipermail/end2end-interest/attachments/20110222/50be8540/attachment-0001.html
>>
>> ------------------------------
>>
>> _______________________________________________
>> end2end-interest mailing list
>> end2end-interest at postel.org
>> http://mailman.postel.org/mailman/listinfo/end2end-interest
>>
>>
>> End of end2end-interest Digest, Vol 83, Issue 4
>> ***********************************************
>>
>>
>
>