Sunday, September 10, 2017

High CPU on Nexus 3K - Solved

In this blogtorial, I will demonstrate how I used 'ethanalyzer' on a Cisco Nexus 3K to solve an intermittent issue: random adjacency drops across various routing protocols. Before we get into the details, let me first share how I got involved in this troubleshooting to begin with. My good friend and colleague "BGP" Bill, aka the self-proclaimed "Multicast Guru" 😊, turned around and said, "Hey Weezy, you are a CCIE, right? I've had a case open with Cisco for a month, so why don't you just solve this issue?" Hmmmm ... above all else, one thing I've learned from the journey of becoming a CCIE is that I know very little. There is far more to learn than you can imagine. In any case, I thought it was a noble challenge, and as a bonus it piqued my curiosity. So Bill and I started chatting about the issue, and as we went through the motions, he gave me a great piece of information that would eventually steer me toward the solution: "Pete reported that he saw HIGH CPU usage compared to other similar routers."

So my 2 questions were:
  1. What is going to the CPU?
  2. What is coming out of the CPU? 
To answer those two questions, I turned to 'ethanalyzer' (which, in my opinion, is far inferior to tcpdump on Arista's platform, so Cisco, if you are reading this, please add tcpdump to the Nexus 3Ks, pretty please).

The first thing I noticed was that no matter how many times I ran the capture, the packets were always "FROM CPU". On a normal router you would see both "TOCPU" and "FROMCPU" packets.


Further investigation showed that the packets being sent from the CPU were all "PIM Register-Stop" messages.
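ethanalyzer decodes PIM for you, so the following is only an illustration of what "all Register-Stop" means on the wire. Per RFC 7761, PIM rides directly on IP protocol 103, and the first byte of the PIMv2 header carries the version in the high nibble and the message type in the low nibble (1 = Register, 2 = Register-Stop). The Python sketch below, with hypothetical helper names (`pim_type`, `tally`, `fake_ipv4`), classifies raw IPv4 packets that way and counts the types; it is not how ethanalyzer works internally.

```python
from collections import Counter
from typing import List, Optional

IPPROTO_PIM = 103  # IANA-assigned IP protocol number for PIM

# PIMv2 message types from RFC 7761, section 4.9 (subset).
PIM_TYPES = {
    0: "Hello",
    1: "Register",
    2: "Register-Stop",
    3: "Join/Prune",
    4: "Bootstrap",
    5: "Assert",
    8: "Candidate-RP-Advertisement",
}


def pim_type(ipv4_packet: bytes) -> Optional[str]:
    """Return the PIM message type of a raw IPv4 packet, or None if it is not PIMv2."""
    if len(ipv4_packet) < 20 or ipv4_packet[9] != IPPROTO_PIM:
        return None
    ihl = (ipv4_packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    if len(ipv4_packet) < ihl + 4:      # need at least the 4-byte PIM header
        return None
    version, msg_type = ipv4_packet[ihl] >> 4, ipv4_packet[ihl] & 0x0F
    if version != 2:
        return None
    return PIM_TYPES.get(msg_type, f"type-{msg_type}")


def tally(packets: List[bytes]) -> Counter:
    """Count PIM message types across a list of captured packets."""
    return Counter(t for t in map(pim_type, packets) if t is not None)


def fake_ipv4(proto: int, payload: bytes) -> bytes:
    """Hypothetical test helper: minimal 20-byte IPv4 header (zeroed addresses and checksum)."""
    total_len = (20 + len(payload)).to_bytes(2, "big")
    return bytes([0x45, 0]) + total_len + bytes([0, 0, 0, 0, 64, proto, 0, 0]) + bytes(8) + payload


stop = fake_ipv4(IPPROTO_PIM, bytes([0x22, 0, 0, 0]))   # version 2, type 2
hello = fake_ipv4(IPPROTO_PIM, bytes([0x20, 0, 0, 0]))  # version 2, type 0
print(tally([stop, stop, stop, hello]))  # Counter({'Register-Stop': 3, 'Hello': 1})
```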


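Strange, I thought to myself: "PIM Register-Stop" messages shouldn't be constant and chatty. The RP sends a Register-Stop back to the source address of each Register it receives, so a steady stream of them suggested the stops were never reaching the router doing the registering. After some thought, I decided to check the routing table for 172.17.100.14, the destination of those Register-Stops, and found that the IP wasn't in the routing table, so traffic to it followed the default route toward the internet zone.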
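The lookup that sent those Register-Stops toward the internet is ordinary longest-prefix matching: with no specific route covering 172.17.100.14, only the default route matches. A minimal Python sketch, assuming a hypothetical routing table (the prefixes and next-hop labels below are made up for illustration, not taken from the real network):

```python
import ipaddress
from typing import Dict, Optional


def lookup(rib: Dict[str, str], dest: str) -> Optional[str]:
    """Longest-prefix match: return the next hop of the most specific prefix containing dest."""
    addr = ipaddress.ip_address(dest)
    best_net, best_hop = None, None
    for prefix, next_hop in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


# Hypothetical routing table: nothing covers the 172.17.100.x transit.
rib = {
    "0.0.0.0/0": "internet-zone",
    "10.110.0.0/16": "ospf-core",
    "172.17.1.0/24": "ospf-site-a",
}
print(lookup(rib, "172.17.100.14"))  # internet-zone
print(lookup(rib, "172.17.1.20"))    # ospf-site-a
```

Only the /0 entry contains 172.17.100.14, so the default route wins even though more specific 172.17.x.x routes exist elsewhere in the table.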

Now my investigation turned to why there was no route to 172.17.100.14. As it turns out, this was a transit network between two locations that ran BGP between them and no IGP such as OSPF, so the transit subnet never made it into the routing table.

I could have solved this a couple of ways:
  1. Redistribute the transit into OSPF
  2. Use the command 'ip pim register-source <interface>'
I went with the latter, since it could become the standard in our router configs going forward. I set it to 'ip pim register-source loopback110', and since loopback110 holds our management IP, it is always in every routing table. Problem solved ... I would say it took about 15-20 minutes.
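Why either fix stops the CPU churn can be shown with a deliberately simplified model (an assumption, not a protocol implementation): the first-hop router wraps every multicast data packet in a Register until a Register-Stop reaches it, and the RP answers each Register with a Register-Stop sent to the Register's source address. The loopback address 10.110.0.1, the 10.110.0.0/16 prefix, and the 172.17.100.12/30 transit subnet are hypothetical, as are the function names; real routers may also rate-limit Registers and use a register-suppression timer, which this sketch ignores.

```python
import ipaddress
from typing import List


def has_specific_route(rp_prefixes: List[str], dest: str) -> bool:
    """True if dest matches any non-default prefix in the RP's routing table."""
    addr = ipaddress.ip_address(dest)
    for prefix in rp_prefixes:
        net = ipaddress.ip_network(prefix)
        if net.prefixlen > 0 and addr in net:
            return True
    return False


def register_stops_sent(register_source: str, rp_prefixes: List[str], data_packets: int) -> int:
    """Count Register-Stops the RP's CPU generates in this simplified model."""
    stops = 0
    for _ in range(data_packets):
        # The DR encapsulates this data packet in a Register; the RP answers it.
        stops += 1
        if has_specific_route(rp_prefixes, register_source):
            break  # the Register-Stop reaches the DR, which stops registering
        # Otherwise the stop follows the default route and is lost, so the DR keeps going.
    return stops


# Hypothetical RP routing table: default route plus the management loopback range.
before = ["0.0.0.0/0", "10.110.0.0/16"]
print(register_stops_sent("172.17.100.14", before, 1000))  # 1000: transit source unreachable
print(register_stops_sent("10.110.0.1", before, 1000))     # 1: option 2, register-source loopback110
print(register_stops_sent("172.17.100.14", before + ["172.17.100.12/30"], 1000))  # 1: option 1, redistribute transit
```

Both options work for the same reason: they make the Register's source address reachable from the RP, so the first Register-Stop lands and the loop ends.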

Many more articles to come, so ...

Please subscribe/comment/+1 if you like my posts as it keeps me motivated to write more and spread the knowledge.