Update: check out the KB: https://kb.vmware.com/s/article/70655

I am not going to claim I found this, because I didn’t, but I think it’s important that the VMware / NSX community is aware of it because of the potential impact it can have on network performance in your virtual environment.

If you are a user of NSX and have a specific DFW implementation, the new “Firewall Rule Hit Count” feature introduced in NSX 6.4.2 might be impacting you.

After upgrading the VIB from an older version of NSX on the first host, a significant increase in network latency started to occur within the cluster as soon as that host came out of maintenance mode and VMs shifted over to it. This immediately started to impact all kinds of network traffic in that cluster due to the shared nature of the setup.

We observed pCPU lockup errors in the logs…

To check your host, SSH into the box and run dmesg | grep -i pcpu, or check vRLI if you have it.
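If you have saved dmesg output from several hosts and want to scan it in one go, the same case-insensitive match can be sketched in Python. This is only an illustration: the function name find_pcpu_lines is my own, and the sample text is a made-up placeholder, not real ESXi log output.

```python
import re
from typing import List


def find_pcpu_lines(log_text: str) -> List[str]:
    """Return every line that mentions 'pcpu', ignoring case (like grep -i pcpu)."""
    pattern = re.compile(r"pcpu", re.IGNORECASE)
    return [line for line in log_text.splitlines() if pattern.search(line)]


# Placeholder text only; real dmesg lines will look different.
sample = "placeholder line one\nplaceholder line mentioning PCPU 3\nplaceholder line three"
for hit in find_pcpu_lines(sample):
    print(hit)
```

To use it on real data, read a file you saved with dmesg > somefile on the host and pass its contents to find_pcpu_lines.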

The hit count stats are viewed per rule in the GUI.

Unfortunately, this feature appears to be enabled by default in NSX Manager, so the problem appears to start as soon as you deploy the updated VIBs to the hosts and VMs move onto them.

DISCLAIMER: PERFORM THE CHANGE AT YOUR OWN RISK!!!

To disable this feature globally and see if it improves latency / fixes your issues, you have to do it via the API; it does not appear to be possible via the old or new UI.

Disable this feature by performing a PUT with the request body shown in the screenshot below, then use “Reset Rule Hit count” in the GUI.

Change applied

Reset the counters.
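If you would rather script the PUT than use a REST client, a rough sketch with the Python standard library could look like the one below. Big caveat: the API path and the XML body are NOT in this sketch. They are placeholders you must replace with the exact values from the KB linked at the top. I am also assuming basic authentication and an XML content type, and the function name build_put_request is mine. The code only builds the request; it does not send anything.

```python
import base64
import urllib.request


def build_put_request(manager_host, api_path, xml_body, username, password):
    """Build (but do not send) a PUT request against NSX Manager.

    api_path and xml_body are supplied by the caller; take them from the KB.
    Basic auth and application/xml are assumptions of this sketch.
    """
    url = "https://{}{}".format(manager_host, api_path)
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/xml",
        },
    )


# Hypothetical placeholders: replace host, path and body with your own values.
request = build_put_request(
    "nsx-manager.example.invalid",
    "/api/PLACEHOLDER-PATH-FROM-KB",
    "<placeholder/>",
    "admin",
    "secret",
)
print(request.get_method(), request.full_url)
```

Once the placeholders are filled in, urllib.request.urlopen(request) would send it. The same disclaimer applies: perform the change at your own risk.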

Remember this may not impact you at all, depending on your DFW implementation, and it’s a cool feature, so make sure you take before and after measurements to determine whether disabling it helped or not. I am in no way suggesting you disable it if you are not facing issues!! I am not an NSX expert by any stretch of the imagination, just a user of the basics.
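For the before and after measurements, here is a minimal sketch of one way to compare two sets of latency samples, e.g. round-trip times in milliseconds collected however you like. The function names and the numbers in the example are made up for illustration; they are not measurements from my environment.

```python
import math
import statistics


def summarize(samples_ms):
    """Return median, 95th percentile (nearest rank) and max of latency samples."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(0.95 * len(ordered)))  # nearest-rank percentile
    return {
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


def compare(before_ms, after_ms):
    """Return after minus before for each statistic; negative means latency dropped."""
    before, after = summarize(before_ms), summarize(after_ms)
    return {name: after[name] - before[name] for name in before}


# Made-up sample values, purely to show the call.
before = [1.0, 2.0, 3.0, 4.0, 100.0]
after = [1.0, 1.0, 2.0, 2.0, 3.0]
print(compare(before, after))
```

Looking at the 95th percentile and max as well as the median matters here, because an intermittent lockup can leave the typical latency looking fine while the worst cases get much worse.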

Hope this was helpful

vMan