Port-Tracking is a simple on/off fabric-wide best practice configuration in the Cisco ACI Fabric.
Why would you want to enable this?
Reason 1 – In the event all uplinks from leaf to spine are down from some type of uplink failure, port-tracking will shut down all downlinks for a specified period of time. By default, all ports continue to operate even though the switch loses fabric connectivity. As a result, dual homed servers and routed connections continue to operate and blackhole traffic.
Reason 2 – During code upgrades, after a switch reboots, downlinks ports become operational before switch is re-joined to the fabric, and may incur packet loss. Port-Tracking ensures ports stay down usually until switch is properly recognized by the fabric and then properly enables the downlink ports.
A bug CSCvs10395 has been released for Port-tracking in January 2020 where if a failure of a REST call between the leaf switch and internal leaf switch REST server fails (rare) it will bring down all of the downlinks for the specified period of time. The switch will recover but an outage on all downlinks will be experienced.
Remediation step – disable port-tracking if in affected release or upgrade to 4.2(3j) or later.
Setting up a BFD session between Palo Alto and Cisco ACI Leaf or General Nexus Switch
If Device A (ex. Palo Alto) does not support BFD Echo and only BFD Control Packets, Device B (ex. Cisco Switch) will not utilize BFD Echo and will only use BFD Control Packets for the BFD session. As a result, the highest transmit interval between both peers multiplied by the multiplier = the hold-down time. Without BFD Echo, the hold-down time will be how long the BFD peer will wait till BFD session goes down.
Another consideration is that depending on the Palo Alto model, high CPU control plane traffic will effect BFD and may tear your adjacency/peering down.
I have tested 16 eBGP peers on Palo Alto 3220 connected to ACI leaf-A and 16 other eBGP peers on same Palo Alto connected to ACI leaf-B. If the BFD timers were anything below 900 x 3, after an ACI leaf-A or leaf-B reload the Palo Alto would randomly bring down eBGP neighbors from ACI leaf-B, even though no issue occurred between PAN and ACI leaf-B. BFD would tear down because of a control plane spike as PAN must be processing BFD in software. The only acceptable timers were 900 x 3. Anything lower, the Palo Alto would tear down BFD which would bring down the eBGP Peering.
If you are getting this error when you run show discoveryissues on the switch or if you notice FPGA version mismatch detected. Running version: 0x10 Expected version:0x11 in APIC.
Here is the fix.
# dir bootflash
Copy the .bin code version into clipboard
After reload, run the below commands.
On March 6th, I will be hosting an Advanced Cisco ACI event where I will cover the benefits of Cisco ACI, go over real-world Cisco ACI Fabric and Tenant naming conventions, review brownfield migration strategies and demo how easy it is to inject firewalls to inspect traffic between migrated networks. In addition, we will also have our Cisco Learning Partner that will provide a Cisco ACI training.