NSX-T Edge Degraded fp-eth0 PNIC Down

Introduction

I run a physical lab for some workloads and a nested lab for others (since my servers are old and cannot be upgraded to vSphere 7), which means I sometimes turn off the nested ESXi hosts for long periods of time. Just recently I powered my nested NSX-T lab back on. It had been working fine before, but now I could not reach my T0 or, consequently, my segments.

Problem

From a configuration perspective, I had not changed anything since the last known running state, other than shutting everything down for an extended period of time. Both my T1 and T0 were down, but without any errors whatsoever. After searching for a while, I found that the Edge node PNIC/Bond status was down as well, which at least gave me a starting point for troubleshooting.

The Edge management network was still accessible, but the Edge node could not reach the interface assigned to the T0, and the logical segments were no longer reachable from the physical network. Nested labs with VLANs and trunks are always a bit of a mess networking-wise, so I went to check the networks on the VDS and found what seemed to be the problem: the Edge uplink port, which is a logical segment port, was in a blocked state.

Solution

To unblock the port, note the VIF ID of the blocked port and the distributed switch name, then log in via SSH to the ESXi server that the Edge is currently running on and run the following command:

net-dvs -s com.vmware.common.port.block=false Nested-VLAN-98-99 -p 763d4ba9-ae34-49b2-b8eb-5709f24d408d

Change the network name and the VIF ID to match your environment, then refresh both vCenter and NSX-T to see the updated status.
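If you need to do this more than once, a small wrapper can help avoid typos when swapping in your own values. The sketch below is only an illustration: the function name build_unblock_cmd is made up for this post, and it only prints the same net-dvs command shown above so you can review it before pasting it into the ESXi shell. It assumes the switch name contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): prints the net-dvs unblock command
# for a given distributed switch name and port/VIF ID so it can be reviewed
# before being run on the ESXi host.

build_unblock_cmd() {
    dvs_name="$1"   # distributed switch name as seen on the ESXi host
    port_id="$2"    # VIF ID of the blocked Edge uplink port

    # Refuse to build a command with missing values.
    if [ -z "$dvs_name" ] || [ -z "$port_id" ]; then
        echo "usage: build_unblock_cmd <dvs-name> <port-id>" >&2
        return 1
    fi

    # Same command as in the post, with the two values substituted.
    printf 'net-dvs -s com.vmware.common.port.block=false %s -p %s\n' \
        "$dvs_name" "$port_id"
}

# Example using the values from my lab; replace them with your own.
build_unblock_cmd "Nested-VLAN-98-99" "763d4ba9-ae34-49b2-b8eb-5709f24d408d"
```

Once the printed command looks right, run it on the ESXi host that the Edge is currently running on.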

I wanted to find the root cause of this behavior and eventually found a VMware KB article that explains it: https://kb.vmware.com/s/article/66796. Note that you have to search for "port blocked" rather than the NSX-T error, which is Edge fp-eth0 degraded and down. It looks like the default behavior is to mark the port as blocked 24 hours after the transport node loses network access to the controller. This can be changed via the procedure in the article.

May the Peace, Mercy, and Blessing of Allah Be Upon You