Let's first define the TLP (Tail Loss Probe) algorithm and how it operates. Read the RFC if you prefer, but here is the summary.
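FlightSize > 1: schedule PTO (probe timeout) in max(2*SRTT, 10ms). -- Retransmit if an ACK is not received within 10ms (when 2*SRTT is below 10ms, as on a low-latency network).
FlightSize == 1: schedule PTO in max(2*SRTT, 1.5*SRTT+WCDelAckT). -- WCDelAckT is the worst-case delayed ACK timer (200ms), so retransmit if an ACK is not received within roughly 200ms.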
Basically, if a TCP ACK does not arrive from the receiver within 10ms of the 2nd packet being transmitted, the sender retransmits the segment and follows its congestion control procedures (which depend on the TCP implementation: Illinois, Reno, Tahoe, CUBIC etc.).
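To make the two rules concrete, here is a minimal Python sketch of the timer calculation. It covers only the two cases summarised above and leaves out anything else the RFC specifies; the function and constant names are my own, not taken from any TCP stack.

```python
WC_DEL_ACK_T = 0.200  # worst-case delayed ACK timer in seconds (the 200ms above)
MIN_PTO = 0.010       # 10ms floor used when more than one segment is in flight


def probe_timeout(srtt, flight_size):
    """Return the TLP probe timeout (PTO) in seconds.

    srtt is the smoothed round-trip time in seconds and flight_size is the
    number of unacknowledged segments. Both names are illustrative.
    """
    if flight_size < 1:
        raise ValueError("no probe is scheduled when nothing is in flight")
    if flight_size > 1:
        return max(2 * srtt, MIN_PTO)
    # A lone segment may be sitting behind the receiver's delayed ACK timer,
    # so the timeout is padded by the worst-case delay.
    return max(2 * srtt, 1.5 * srtt + WC_DEL_ACK_T)


if __name__ == "__main__":
    for srtt_ms in (0.1, 1.0, 20.0):
        srtt = srtt_ms / 1000.0
        many = probe_timeout(srtt, 2) * 1000.0
        one = probe_timeout(srtt, 1) * 1000.0
        print(f"SRTT={srtt_ms}ms  FlightSize>1: {many:.2f}ms  FlightSize==1: {one:.2f}ms")
```

On a low-latency network the 10ms floor is what matters: with a sub-millisecond SRTT, 2*SRTT never gets anywhere near 10ms.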
So wait, one might be thinking: what if the receiver has TCP Delayed ACK turned on, which allows a TCP ACK to be delayed by up to 500ms? Wouldn't TLP unnecessarily retransmit the segment?
That is a good point, but let's take a closer look at why TLP isn't affected by TCP Delayed ACK. TCP Delayed ACK does say that an ACK can be delayed by up to 500ms; however, if a second packet is received, the receiver must ACK immediately. In other words, with TCP Delayed ACK the receiver cannot have more than 1 outstanding packet that has not been acknowledged. This is why, when you look at a TCP connection in Wireshark, you'll generally notice an ACK for every 2 packets. So if the receiving side has TCP Delayed ACK enabled, it isn't a problem for the sender, because TLP only arms the 10ms probe timeout (PTO) after the 2nd segment is sent, and that 2nd segment is exactly what forces the receiver to ACK.
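The receiver behaviour described above can be sketched as a tiny model. This is an illustration only, assuming in-order segments and nothing but the two rules just mentioned (ACK on the 2nd unacknowledged segment, otherwise ACK when the delay timer runs out); the function name and parameters are made up for the example.

```python
def delayed_ack_times(arrivals, ack_every=2, max_delay=0.5):
    """Return the times (seconds) at which a delayed-ACK receiver sends ACKs.

    arrivals:  sorted arrival times of in-order data segments
    ack_every: ACK as soon as this many segments are unacknowledged
    max_delay: longest time an ACK may be held back for a lone segment
    """
    acks = []
    pending = 0      # segments received but not yet acknowledged
    deadline = None  # when the delay timer would force an ACK
    for t in arrivals:
        # The delay timer may have expired before this segment arrived.
        if pending and deadline <= t:
            acks.append(deadline)
            pending = 0
        if pending == 0:
            deadline = t + max_delay  # first unacknowledged segment starts the timer
        pending += 1
        if pending >= ack_every:
            acks.append(t)  # 2nd segment: ACK immediately
            pending = 0
    if pending:
        acks.append(deadline)  # leftover segment is ACKed when the timer fires
    return acks


if __name__ == "__main__":
    print(delayed_ack_times([0.000, 0.001]))  # two back-to-back segments
    print(delayed_ack_times([0.000]))         # a lone segment
```

With two back-to-back segments the ACK leaves as soon as the 2nd one lands, well inside the 10ms window. Only a lone segment waits for the timer, and that is the FlightSize == 1 case where the sender pads its timeout anyway.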
Now, back to the main topic: let's take a look at SolarFlare's EF_DYNAMIC_ACK_THRESH and how it operates. This is taken directly from the SolarFlare guide, which states:
Name: dynack_thresh default: 16 min: 0 max: 65535 per‐stack
If set to >0 this will turn on dynamic adapation of the ACK rate to increase efficiency by
avoiding ACKs when they would reduce throughput. The value is used as the threshold
for number of pending ACKs before an ACK is forced. If set to zero then the standard
delayed‐ack algorithm is used.
Essentially, once the 2nd packet is sent, the sender expects an ACK back within 10ms. However, with EF_DYNAMIC_ACK_THRESH enabled at its default value of 16 on the receiver side, the receiver can hold back up to 16 ACKs, so in our case it was holding the TCP ACK back past that 10ms deadline.
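Here is a rough sketch of why the two timers collide. It is a simplified model, not how any real stack or the SolarFlare implementation works internally: it treats the threshold as a count of unacknowledged segments, assumes the whole burst arrives at once, and uses a hypothetical 40ms receiver hold timer (ack_timer) for ACKs that are not forced. All names are my own.

```python
def tail_probe_fires(burst, force_ack_after, srtt=0.0001, ack_timer=0.040):
    """Return True if the sender's probe timeout expires before the ACK
    for the tail of a burst can arrive (simplified model).

    burst:           segments sent back-to-back before the sender goes idle
    force_ack_after: receiver ACKs immediately once this many segments are
                     unacknowledged (2 for standard delayed ACK, 16 for the
                     dynamic ACK default)
    srtt:            smoothed round-trip time in seconds
    ack_timer:       hypothetical time the receiver holds an unforced ACK
    """
    if force_ack_after < 1:
        raise ValueError("force_ack_after must be at least 1")
    # Segments still unacknowledged after the last forced ACK.
    tail = burst % force_ack_after
    if tail == 0:
        return False  # the last segment forced an ACK straight away
    if tail > 1:
        pto = max(2 * srtt, 0.010)               # FlightSize > 1 rule
    else:
        pto = max(2 * srtt, 1.5 * srtt + 0.200)  # FlightSize == 1 rule
    ack_arrives = ack_timer + srtt  # held ACK plus the trip back (rough estimate)
    return ack_arrives > pto


if __name__ == "__main__":
    for burst in (2, 3, 4, 16, 18):
        print(burst,
              "standard delayed ACK:", tail_probe_fires(burst, 2),
              "| threshold 16:", tail_probe_fires(burst, 16))
```

In this model standard delayed ACK never leaves more than one segment unacknowledged, so the sender is always either ACKed immediately or on the padded 200ms timer. With a threshold of 16, any burst that leaves 2 to 15 segments hanging puts the sender on the 10ms timer while the receiver is still sitting on the ACK.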
The negative interaction between these 2 great features was causing 1000s of unnecessary retransmissions. Even worse, these retransmissions were classified as RTO (retransmission timeout), which meant that the cwnd (congestion window or send window) was cut down to 3 x MSS (maximum segment size), which in turn resulted in packets queuing on the sender side (again, depending on the TCP implementation on the sender side).
Think about TCP Nagle (the algorithm that TCP_NODELAY turns off) and TCP Delayed ACK: both are great concepts by themselves and were designed with good intentions to complement TCP, yet when used together they produce the worst results for TCP throughput. In the same way, EF_DYNAMIC_ACK_THRESH, which reduces receive-side latency, and TLP, which reduces the time to detect packet loss, are great by themselves but produce bad results when used together.
We solved the problem by setting EF_DYNAMIC_ACK_THRESH to 0 (which actually sets it to 1), which enforces the regular TCP Delayed ACK mechanism.
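As an illustration of how that setting might be applied, here is a small Python sketch that prepares the environment for an application started through the onload launcher. EF_DYNAMIC_ACK_THRESH is the variable discussed above; the helper name and the ./my_receiver path are hypothetical, and the sketch only builds the command rather than running it.

```python
import os


def onload_launch(app_argv, base_env=None):
    """Build the command line and environment for running an application
    under onload with dynamic ACKs switched off.

    app_argv: the application's own argv, e.g. ["./my_receiver"] (hypothetical)
    base_env: environment to start from; defaults to the current one
    """
    env = dict(os.environ if base_env is None else base_env)
    # 0 falls back to the standard delayed ACK algorithm.
    env["EF_DYNAMIC_ACK_THRESH"] = "0"
    return ["onload"] + list(app_argv), env


if __name__ == "__main__":
    argv, env = onload_launch(["./my_receiver"])
    print("command:", " ".join(argv))
    print("EF_DYNAMIC_ACK_THRESH =", env["EF_DYNAMIC_ACK_THRESH"])
    # To actually start the process (assuming onload is installed):
    #   import subprocess
    #   subprocess.run(argv, env=env, check=True)
```

Since the setting is per-stack on the receiver, it only needs to be applied to the receiving application.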