Sunday, August 25, 2013

TCP - TCP small window size causing latency

If you need a primer on window size and scaling, you can check out the blogtorial I posted a while ago. Today a client called and complained about latency. The basic premise was that they sent a New Order Single (FIX) and didn't see the execution report for about 11 seconds. The application logs, however, showed that the order was executed in sub-microsecond time, so why the 11-second delay? Obviously network equipment is not going to buffer packets for multiple seconds.

To troubleshoot this I turned to man's best friend (not dogs, but sniffers / packet captures) ... perhaps I should have said nerd's best friend. :-) Once I started looking at the packet captures it all came together. I won't post the packet capture, but a screenshot can't hurt.

Packet #6 comes in from the client with Data
Packet #7 is sent from the server with Data + Acknowledgement of Packet #6
Packet #8 is sent from the server with more Data
Packet #9 comes in from the client with new Data
Packet #10 is sent from the server with Data + Acknowledgement of Packet #9

Note the W=2011 on the client side. That is the receive window size advertised by the client. Remember that the window size is the number of bytes that can be sent without an acknowledgement from the other side.

Now the interesting packet is Packet #11. The client is re-transmitting Packet #6, as though it never received Packet #7, #8, or #10 from the server. Clearly there is packet loss. Also notice that we have not received any acknowledgement of packets #7, #8, and #10 sent from the server. So by this time the amount of unacknowledged data (the combined length of Packets #7, #8, and #10) is 1091 bytes. The most data the server can still send is therefore the receive window advertised by the client (2011 bytes) minus the 1091 bytes already sent but not yet acknowledged: 2011 bytes - 1091 bytes = 920 bytes.
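The window arithmetic above can be written out as a short Python sketch. The function and variable names are purely illustrative and not part of any real TCP stack; the numbers are the ones from the capture.

```python
def usable_window(advertised_window: int, bytes_in_flight: int) -> int:
    """Bytes the sender may still transmit before it must wait for an ACK."""
    # Never report a negative value if the peer shrinks its window.
    return max(advertised_window - bytes_in_flight, 0)


advertised = 2011  # W=2011 advertised by the client
in_flight = 1091   # combined length of Packets #7, #8, #10 (unacknowledged)

print(usable_window(advertised, in_flight))  # 920
```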

As you can see below, this continues for some time: re-transmits from the client and re-ACKs from the server.

The server then re-transmits Packets #7, #8, and #10, which are the unacknowledged packets, and this time the client actually acknowledges them. That opens up the receive window, and the rest of the data packets queued in the server's TCP buffers start to flow again.

When the client sent the order, the server's TCP stack buffered the execution report: the report was 1400 bytes, and with 1091 bytes still unacknowledged only 920 bytes of the client's 2011-byte window remained. Had the server sent the 1400 bytes, the data in flight (1091 + 1400 = 2491 bytes) would have exceeded the window size advertised by the client. Once the client successfully acknowledged the outstanding packets, the window opened up and everything else started to flow again.
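To make the buffering behaviour concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions: the `ToySender` class is hypothetical, it treats each message as a single unit as described above, and it ignores everything else a real TCP stack does (segmentation, retransmission timers, congestion control).

```python
from collections import deque


class ToySender:
    """Hypothetical sender that honours the peer's advertised receive window."""

    def __init__(self, peer_window: int):
        self.peer_window = peer_window  # window advertised by the receiver
        self.in_flight = 0              # bytes sent but not yet acknowledged
        self.queue = deque()            # messages waiting for window space
        self.sent = []                  # messages that actually went out

    def send(self, size: int) -> None:
        self.queue.append(size)
        self._flush()

    def on_ack(self, acked_bytes: int) -> None:
        # An ACK frees window space, so queued data may be able to go out.
        self.in_flight = max(self.in_flight - acked_bytes, 0)
        self._flush()

    def _flush(self) -> None:
        # Send queued messages in order while they still fit in the window.
        while self.queue and self.in_flight + self.queue[0] <= self.peer_window:
            size = self.queue.popleft()
            self.in_flight += size
            self.sent.append(size)


sender = ToySender(peer_window=2011)
sender.send(1091)  # stands in for Packets #7, #8, #10 combined
sender.send(1400)  # the execution report: 1091 + 1400 > 2011, so it waits
print(sender.sent, list(sender.queue))  # [1091] [1400]

sender.on_ack(1091)  # the client finally acknowledges the outstanding data
print(sender.sent, list(sender.queue))  # [1091, 1400] []
```

In this model the 1400-byte report sits in the queue until the ACK arrives, which mirrors the 11-second stall seen in the capture.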

Many more articles to come so stay tuned.

Please reshare/subscribe/comment/+1 if you like my posts as it keeps me motivated to write more and spread the knowledge.