Sunday, August 25, 2013

TCP - TCP small window size causing latency

If you need a primer on window size and scaling, you can check out my previous blogtorial that I posted a while ago. Today a client called and complained about latency. The basic premise was that they sent a New Order Single (FIX) and didn't see the execution report for about 11 seconds. The application logs, however, showed that the order was executed in under a microsecond, so why the 11 second delay? Obviously network equipment is not going to buffer packets for multiple seconds.

To troubleshoot this I turned to man's best friend (not dogs, but rather sniffers / packet captures) ... perhaps I should have said nerd's best friend. :-) Once I started looking at the packet captures it all came together. I won't post the packet capture, however a screenshot can't hurt.

Packet #6 comes in from the client with Data
Packet #7 is sent from the server with Data + Acknowledgement of Packet #6
Packet #8 is sent from the server with more Data
Packet #9 comes in from the client with new Data
Packet #10 is sent from the server with Data + Acknowledgement of Packet #9

Note the W=2011 on the client side. That is the receive window size advertised by the client. Remember, the window size is the number of bytes that can be sent without an acknowledgement from the other side.

Now the interesting packet is Packet #11. The client is re-transmitting Packet #6 as though it never received Packets #7, #8 and #10 from the server. Clearly there is packet loss. Also notice that we have not received any acknowledgement of Packets #7, #8 and #10 sent from the server. So by this time the amount of unacknowledged data (the combined length of Packets #7, #8 and #10) is 1091 bytes. Since 1091 bytes are unacknowledged, the most data the server can still send is the receive window advertised by the client (2011 bytes) minus the 1091 bytes already sent but not yet acknowledged: 2011 bytes - 1091 bytes = 920 bytes.
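The subtraction above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything taken from the capture tooling; the function name usable_window is made up for this example.

```python
def usable_window(advertised_window: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still transmit before it must wait for an ACK.

    advertised_window: receive window advertised by the peer (W= in the capture)
    bytes_in_flight:   bytes sent but not yet acknowledged
    """
    # The usable window can never be negative; clamp at zero.
    return max(advertised_window - bytes_in_flight, 0)


# Values from the capture described in this post.
CLIENT_WINDOW = 2011   # W=2011 advertised by the client
UNACKED_BYTES = 1091   # combined length of Packets #7, #8 and #10

remaining = usable_window(CLIENT_WINDOW, UNACKED_BYTES)
print(f"Server may send {remaining} more bytes")
```

With the numbers from this capture the result is 920 bytes, which is less than the size of the execution report that was waiting to go out.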

As you can see below this continues for some time. Re-transmits from the client and Re-ACKs from the server.

The server re-transmits Packets #7, #8 and #10, which are the unacknowledged packets, and this time the client actually acknowledges them. That opens up the receive window, and the rest of the data packets queued in the server's TCP buffers start to flow again.

When the client sent the order, the server's TCP stack buffered the execution report because the execution report packet was 1400 bytes, and with 1091 bytes still unacknowledged it couldn't send the 1400 byte reply. Had the server sent the 1400 bytes, the unacknowledged total (1091 + 1400 = 2491 bytes) would have exceeded the 2011 byte window advertised by the client. Once the client successfully acknowledged the packets, the window opened up and everything else started to flow again.
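To make the stall and release concrete, here is a toy model in Python. It is a sketch under assumptions, not how a real TCP stack is implemented: the ToySender class and its methods are hypothetical, and each message is treated as all-or-nothing the way this post describes it, whereas a real stack works on a byte stream and may send part of the data.

```python
from collections import deque


class ToySender:
    """Hypothetical sender that respects the peer's advertised window."""

    def __init__(self, peer_window: int) -> None:
        self.peer_window = peer_window  # window advertised by the receiver
        self.in_flight = 0              # bytes sent but not yet acknowledged
        self.queue = deque()            # message sizes waiting to be sent

    def send(self, size: int) -> list:
        """Queue a message; return the sizes that actually went out."""
        self.queue.append(size)
        return self._flush()

    def ack(self, size: int) -> list:
        """Peer acknowledged `size` bytes; return the sizes released."""
        self.in_flight = max(self.in_flight - size, 0)
        return self._flush()

    def _flush(self) -> list:
        sent = []
        # Send queued messages in order while they fit in the window.
        while self.queue and self.in_flight + self.queue[0] <= self.peer_window:
            size = self.queue.popleft()
            self.in_flight += size
            sent.append(size)
        return sent


sender = ToySender(peer_window=2011)
sender.in_flight = 1091        # Packets #7, #8 and #10 are still unacknowledged

held = sender.send(1400)       # execution report: 1091 + 1400 > 2011, so it waits
released = sender.ack(1091)    # client finally ACKs; the window opens up

print("sent immediately:", held)
print("sent after ACK:  ", released)
```

In this model the 1400 byte execution report sits in the queue until the 1091 outstanding bytes are acknowledged, which mirrors the delay seen in the capture.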

Many more articles to come so stay tuned.

Please reshare/subscribe/comment/+1 if you like my posts as it keeps me motivated to write more and spread the knowledge.


  1. Hi Arwin,

    First off, congrats on your blog. You've been doing a great job. Second, I'm very curious how you set up your network for low latency monitoring. What I mean is, which sniffer product you've been using, TAPs and so on...

    Thank you

    1. Thank you :) The best way is to use TAPs wherever you can, although that can get very expensive, so SPANs could be a substitute. I would deploy TAPs at the edge so you can deal with customer related issues. You can also install them at various points of the network if you want to troubleshoot internal issues. I wish I could go into detail but I cannot; however, in the past I've used Network Instruments products (Gigamon/Gigastor/Gigavue) and Anue (Ixia now). Also, if you've got the capital, definitely invest in Network Instruments' Observer. Observer was best explained to me by a colleague of mine (MV) as Wireshark on steroids. This product is extremely useful for troubleshooting TCP related issues and can dramatically reduce the resolution time. Hope that helps.