Sunday, February 17, 2013

Understanding TCP Window Size / Window Scaling

The easiest way to understand TCP window size is to observe two people having a conversation. Every so often, the talker will wait for the listener to acknowledge that they have heard everything up to that point. Once the listener acknowledges, the talker begins to talk again. The number of words spoken before waiting for an acknowledgement from the listener is much like the TCP window size. The official definition of the window size is "the amount of octets that can be transmitted without receiving an acknowledgement from the other side".

Let's assume that the receiver window size is 16,384 bytes, which means that the sender can send up to 16,384 bytes before stopping to wait for an acknowledgement. Let's also assume that the maximum segment size (MSS) is 1,024 bytes. This means that the sender can send 1,024 bytes 16 times before it will stop sending and wait for an acknowledgement. For the sake of simplicity, I will not discuss TCP slow start, congestion avoidance (additive increase/multiplicative decrease), the congestion window, or congestion control in general. As a side note, keep in mind that the amount of unacknowledged data the sender may have in flight is limited by min(congestion window, receive window).
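
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names and the 8,192-byte congestion window are made up for the example and are not taken from any TCP implementation.

```python
def segments_per_window(window_bytes: int, mss_bytes: int) -> int:
    """Number of full-sized segments that fit into one window."""
    return window_bytes // mss_bytes


def usable_window(cwnd_bytes: int, rwnd_bytes: int) -> int:
    """Upper bound on unacknowledged bytes in flight: min(cwnd, rwnd)."""
    return min(cwnd_bytes, rwnd_bytes)


# Numbers from the paragraph above: 16,384-byte window, 1,024-byte MSS.
print(segments_per_window(16_384, 1_024))  # 16

# Hypothetical congestion window of 8,192 bytes: it becomes the limit.
print(usable_window(cwnd_bytes=8_192, rwnd_bytes=16_384))  # 8192
```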


So what is the optimum window size that two hosts should agree on? Well, there are different schools of thought, but let's look at one example where the window size is 2 x BDP. What the heck is BDP, you ask? BDP is short for bandwidth-delay product. In this post it is calculated as bandwidth times one-way latency. BDP is more commonly defined as bandwidth times round-trip time, which is the same quantity as the 2 x BDP used here.

For example, take two hosts connected with 20 Mbps of bandwidth, and assume that using PING/ICMP we conclude that the one-way latency is 20 ms (ping reports the round-trip time, so this is half of a 40 ms round trip on a symmetric path). First let's convert 20 Mbps into bytes per second, which turns out to be 2,621,440 when a megabit is counted as 1,024 x 1,024 bits (with decimal megabits it would be 2,500,000). 2,621,440 x 0.02 is approximately 52,428, and 2 x 52,428 = 104,856 bytes. Therefore the optimum window size for two hosts with 20 Mbps of bandwidth and a 20 ms one-way latency is 104,856 bytes. That is, the sender can send 104,856 bytes of data before it must wait for an acknowledgement from the receiver. The reason for using 2 x BDP is that after the first 52,428 bytes reach the receiver, the ACK for them needs another 20 ms to travel back. Instead of sitting idle while that ACK is in flight, the sender can use the ACK travel time to send another 52,428 bytes of data.
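
Here is a minimal Python sketch of the same calculation, assuming the binary megabit convention used in this post. The helper name `bdp_bytes` and the constants are made up for illustration. Like the figures above, the sketch truncates the BDP to whole bytes before doubling it.

```python
MBIT_BINARY = 1024 * 1024   # convention used in this post
MBIT_DECIMAL = 1_000_000    # convention more common for link speeds


def bdp_bytes(bandwidth_bits_per_sec: int, delay_ms: int) -> float:
    """Bandwidth-delay product in bytes for a given delay in milliseconds."""
    # bits/s -> bytes/s is a division by 8; ms -> s is a division by 1000.
    return bandwidth_bits_per_sec * delay_ms / 8000


one_way_bdp = int(bdp_bytes(20 * MBIT_BINARY, 20))
print(one_way_bdp)      # 52428
print(2 * one_way_bdp)  # 104856

# Same link with decimal megabits, for comparison.
print(2 * int(bdp_bytes(20 * MBIT_DECIMAL, 20)))  # 100000
```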

One problem: the TCP RFC states that the window size is a 16 bit field, which means that the largest window size that can be advertised is 65,535, or 2 ^ 16 - 1, or 1111 1111 1111 1111 (16 bits). And as we saw earlier, it is optimum to use 104,856 for the window size. So how do we accomplish this? This is where window scaling comes into play. It is a TCP option that is sent during the initial TCP 3 way handshake, and both sides MUST include this option or else window scaling will not be used. Window scaling basically bitwise shifts the window size. So let's take a look at a Wireshark packet capture to further explain this. Note that the window scale option/shift count is only sent in the SYN and SYN-ACK segments of the initial 3 way handshake. It cannot be changed later in the connection.


Flags 0x002 (SYN) - Initial SYN Packet sent by the sender.
Window size value: 8192 or 0010 0000 0000 0000 (16 bits).
Window scale Shift count: 8 - bit-wise shift window size by 8 to the left.
Window multiplier: 256 or 2 ^ shift count which is 2 ^ 8 in this case.


Flags 0x012 (SYN,ACK) - SYN, ACK packet sent by the receiver acknowledging the initial SYN packet.
Window size value: 5840 or 0001 0110 1101 0000 (16 bits).
Window scale Shift count: 7 - bit-wise shift window size by 7 to the left.
Window multiplier: 128 or 2 ^ shift count which is 2 ^ 7 in this case.
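
The shift counts in the two captures above are carried in the TCP Window Scale option (kind 3, length 3, followed by a one-byte shift count). Below is a rough Python sketch of pulling that value out of raw option bytes. The function names and the sample option bytes are made up for illustration and are not taken from the capture. Capping the shift count at 14 follows the RFC limit mentioned later in this post.

```python
from typing import Optional, Tuple

KIND_EOL = 0           # end of option list
KIND_NOP = 1           # single-byte padding
KIND_WINDOW_SCALE = 3  # window scale option, 3 bytes long
MAX_SHIFT = 14


def window_scale_shift(options: bytes) -> Optional[int]:
    """Return the shift count found in raw TCP option bytes, or None."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == KIND_EOL:
            break
        if kind == KIND_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated: no length byte
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed length, stop parsing
        if kind == KIND_WINDOW_SCALE and length == 3:
            return min(options[i + 2], MAX_SHIFT)
        i += length
    return None


def negotiated_shifts(syn: Optional[int], syn_ack: Optional[int]) -> Tuple[int, int]:
    """Scaling is used only if both SYN and SYN-ACK carried the option."""
    if syn is None or syn_ack is None:
        return (0, 0)
    return (syn, syn_ack)


# Hypothetical SYN options: MSS 1460, NOP, window scale 8, NOP, NOP, SACK permitted.
syn_options = bytes.fromhex("020405b4" "01" "030308" "0101" "0402")
print(window_scale_shift(syn_options))  # 8
print(negotiated_shifts(8, 7))          # (8, 7)
print(negotiated_shifts(8, None))       # (0, 0)
```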


Now we are inspecting a packet further down the conversation. This is a packet sent by 192.168.1.78, the host that advertised a shift count of 8 in its SYN.

Window size value: 64 or 0000 0000 0100 0000 (16 bits)
Window size scaling factor: 256 or 2 ^ 8 (as advertised by the 1st packet)
Although the window size states 64, the actual window size is 16,384 (64 * 256) meaning that the other side can send 16,384 bytes of data before stopping to wait for an acknowledgement.
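
The calculation Wireshark performs here is just a left shift. A small Python sketch, with a made-up function name:

```python
def effective_window(window_field: int, shift_count: int) -> int:
    """Scale the 16-bit window field by the sender's advertised shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift_count <= 14:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift_count


print(effective_window(64, 8))  # 16384
```

Per RFC 7323, the window value in the SYN and SYN-ACK segments themselves is not scaled. The shift applies to later segments such as this one.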

As the conversation between the two hosts continues, the window can be narrowed or widened by changing the window size value in the 16 bit field; however, the window size scaling factor remains the same for the life of the connection. For example, if the sender wants to make the window size 104,856, the window size value would be set to 410 in the 16 bit window field of the TCP header.

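410 is 0000 0001 1001 1010 (16 bits), and since it will be shifted 8 to the left, the actual window size will be 104,960 or 0000 0000 0000 0001 1001 1010 0000 0000 (32 bits). This is slightly more than 104,856 because with a shift count of 8 the window can only be expressed in multiples of 256.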
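
To see where 410 comes from, here is a small Python sketch with a made-up helper name. It rounds up to the next multiple of the scaling factor, which mirrors the example above. Real TCP stacks derive the field by right-shifting the window, which rounds down instead.

```python
def window_field_for(target_bytes: int, shift_count: int) -> int:
    """Smallest 16-bit field value whose scaled window covers target_bytes."""
    granularity = 1 << shift_count
    field = -(-target_bytes // granularity)  # ceiling division
    if field > 0xFFFF:
        raise ValueError("target does not fit with this shift count")
    return field


field = window_field_for(104_856, 8)
print(field)             # 410
print(f"{field:016b}")   # 0000000110011010
print(field << 8)        # 104960
```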

The maximum shift count is 14 per RFC 1323 (since obsoleted by RFC 7323, which keeps the same limit), which means that the maximum window size is 65,535 x 2 ^ 14 = 1,073,725,440 bytes, or just under 1 gigabyte. That is ONE BIG window size!!
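
As a quick check of that limit, and of how small a shift count the earlier 104,856-byte example would actually need, here is a short Python sketch. The names are made up for illustration.

```python
MAX_FIELD = 0xFFFF  # largest value of the 16-bit window field
MAX_SHIFT = 14      # largest shift count allowed by the RFC


def min_shift_for(target_bytes: int) -> int:
    """Smallest shift count that lets the 16-bit field cover target_bytes."""
    for shift in range(MAX_SHIFT + 1):
        if target_bytes <= MAX_FIELD << shift:
            return shift
    raise ValueError("target exceeds the maximum scaled window")


print(MAX_FIELD << MAX_SHIFT)   # 1073725440
print(min_shift_for(104_856))   # 1
print(min_shift_for(65_535))    # 0
```

A shift count of 1 would already be enough for 104,856 bytes. The hosts in the capture advertise larger shift counts (8 and 7), which simply gives them more headroom.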


Many more articles to come so stay tuned.

Please subscribe/comment/+1 if you like my posts as it keeps me motivated to write more and spread the knowledge.

13 comments:

  1. Excellent. Hope to see a follow-up to the Slow Start blog tackling the other states, fast retransmit, congestion avoidance ...

  2. Good explanation, thanks

  3. Awesome! That really helped. :)

  4. Great...really helps to improved troubleshooting level...

  5. thanks very much i have searching for this. it help me a lot

  6. I think 20Mbps = 20,971,520 B not 2,621,440

    Replies
    1. If a Kilo is 1024 then 20Mbps would convert to 2,621,440 Bytes and 20,971,520 Bits.

      Thanks.

  7. A very good article. Thank you...

  8. Hi

    Nice explanation
    Just a point. RFC 1323 is obsolete and replaced by RFC 7323 as a standard.

    Regards

    Replies
    1. Thank you very much. I'll have to give it a read.

  9. Awesome explanation, very helpful
