Port Tracking aka Fabric-Track in Cisco ACI

Port tracking is a fabric-wide on/off setting in Cisco ACI, and enabling it is considered a best practice.

Why would you want to enable this?

Reason 1 – If all uplinks from a leaf to the spines go down, port tracking shuts down all of that leaf's downlinks and brings them back only after a configured delay once the uplinks recover. Without port tracking, all ports continue to operate even though the switch has lost fabric connectivity. As a result, dual-homed servers and routed connections keep sending traffic to the isolated leaf, which blackholes it.

Reason 2 – During code upgrades, after a switch reboots, its downlink ports can become operational before the switch has rejoined the fabric, which may cause packet loss. Port tracking keeps the downlink ports down until the switch is properly recognized by the fabric, and only then enables them.
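
Both reasons come down to the same rule: downlinks stay down while the leaf has no usable uplinks, and for a hold-down period after the uplinks return. The following is a toy model of that rule, not Cisco's implementation. The class name, the 120-second delay, and the threshold of 0 uplinks are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeafPortTracker:
    """Toy model of port tracking on one leaf (hypothetical, for illustration)."""

    delay_restore_s: float = 120.0        # assumed hold-down after uplinks recover
    min_uplinks: int = 0                  # assumed trigger: this many uplinks or fewer
    downlinks_up: bool = True
    recovered_at: Optional[float] = None  # time at which uplinks last came back

    def update(self, active_uplinks: int, now: float) -> bool:
        """Feed the current uplink count; return whether downlinks are up."""
        if active_uplinks <= self.min_uplinks:
            # Leaf is cut off from the spines: shut the downlinks so that
            # dual-homed servers fail over instead of blackholing traffic.
            self.downlinks_up = False
            self.recovered_at = None
        elif not self.downlinks_up:
            # Uplinks are back: start (or continue) the restore delay.
            if self.recovered_at is None:
                self.recovered_at = now
            if now - self.recovered_at >= self.delay_restore_s:
                self.downlinks_up = True
                self.recovered_at = None
        return self.downlinks_up


# Reason 1: uplink failure at t=10, recovery at t=20.
leaf = LeafPortTracker()
leaf.update(active_uplinks=2, now=0)
leaf.update(active_uplinks=0, now=10)   # downlinks go down
leaf.update(active_uplinks=2, now=20)   # still down, delay running

# Reason 2: a freshly rebooted leaf starts with downlinks held down.
booting = LeafPortTracker(downlinks_up=False)
booting.update(active_uplinks=0, now=0)
```

Modeling the reboot case as "start with downlinks down" keeps both reasons on one code path: in either case the downlinks return only after the uplinks have been up for the full delay.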

Bug CSCvs10395, published in January 2020, affects port tracking: in the rare case that a REST call between the leaf switch and its internal REST server fails, the switch brings down all of its downlinks for the specified period of time. The switch recovers, but every downlink experiences an outage.

Remediation step – disable port tracking if you are running an affected release, or upgrade to 4.2(3j) or later.
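
To check a list of fabrics against the "4.2(3j) or later" guidance, a small version comparison helps. This is a minimal sketch that assumes APIC-style version strings of the form major.minor(maintenance + patch letter); the function names are made up for this example. It only orders version strings, so it cannot tell you whether an older release is actually affected or whether a different release train carries the fix. Confirm that against the bug's fixed-release list.

```python
import re
from typing import Tuple

FIXED_RELEASE = "4.2(3j)"  # the release named in the remediation step

# Assumed format: <major>.<minor>(<maintenance><patch letters>), e.g. 4.2(3j)
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)$")


def parse_aci_version(version: str) -> Tuple[int, int, int, str]:
    """Split an ACI version string into a tuple that sorts correctly."""
    match = _VERSION_RE.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized ACI version string: {version!r}")
    major, minor, maint, patch = match.groups()
    return int(major), int(minor), int(maint), patch


def at_or_after_fix(version: str, fixed: str = FIXED_RELEASE) -> bool:
    """True if `version` sorts at or after the fixed release."""
    return parse_aci_version(version) >= parse_aci_version(fixed)


for running in ("4.2(2f)", "4.2(3j)", "4.2(3l)"):
    status = "ok" if at_or_after_fix(running) else "disable port tracking or upgrade"
    print(f"{running}: {status}")
```

Parsing into integers matters because a plain string comparison would sort a maintenance number such as 10 before 3.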

Windows Baremetal NIC Teaming and Cisco ACI

  • After migrating Windows Server 2012 bare-metal servers configured with “Switch Independent NIC teaming” and “dynamic” distribution, Cisco ACI raised and cleared faults in rapid succession.
    • Faults for various hosts
      • “ACI has detected multiple MACs using the same IP Address x.x.x.x”
    • Reason for Fault:
      • Windows NIC Teaming in Switch Independent mode with the dynamic load-balancing mode distributes outbound traffic based on a hash of the TCP ports and IP addresses, as modified by the dynamic load-balancing algorithm. That algorithm redistributes flows to optimize bandwidth utilization across team members, so an individual flow can move from one active team member to another.
      • As a result, ACI sees the same IP address behind multiple MACs across the bare-metal server's uplinks.
      • In addition, COOP faults appear as ACI freezes the affected endpoints, which disrupts traffic and makes the bare-metal server unavailable.
  • FIX:
    • Three options
      • Easy Fix
        • Modify the load balancing algorithm from dynamic to HyperVPort on the Windows bare-metal server.
          • Get-NetLbfoTeam | Set-NetLbfoTeam -LoadBalancingAlgorithm HyperVPort
          • Don’t worry, you don’t need Hyper-V deployed. With this algorithm, all traffic entering and exiting any given adapter will always use the same physical adapter.
      • Preferred Fix
        • Configure Switch Dependent NIC Teaming using either static EtherChannel or LACP.
          • This is more time consuming on the ACI side, since each bare-metal server requires its own port-channel policy group, and each policy group needs to be separately bound to the respective EPG.
      • Temporary Band-Aid
        • Disable the secondary uplinks on each host so that only one uplink is used per host until a permanent fix is in place.
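
The fault in the list above comes down to one IP address being learned behind more than one MAC. A minimal sketch of that check over a list of (IP, MAC) observations follows. It is not how ACI implements endpoint learning, and the function name, input format, and sample addresses are all hypothetical. It can be handy for confirming which hosts are affected from an exported endpoint list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def ips_with_multiple_macs(
    observations: Iterable[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Return {ip: {macs}} for every IP seen behind more than one MAC."""
    macs_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, mac in observations:
        # Normalize case so the same MAC is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: macs for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical sample: one teamed host whose flows leave through two
# team members, and one single-homed host.
seen = [
    ("192.0.2.10", "00:15:5D:00:00:01"),
    ("192.0.2.10", "00:15:5d:00:00:02"),
    ("192.0.2.10", "00:15:5d:00:00:01"),
    ("192.0.2.20", "00:15:5d:00:00:0a"),
]

for ip, macs in ips_with_multiple_macs(seen).items():
    print(f"{ip} seen behind {len(macs)} MACs: {sorted(macs)}")
```

After either the HyperVPort change or the move to Switch Dependent teaming, the same check over fresh data should come back empty for the migrated hosts.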

APIC Data Layer Partially Diverged

After trying to join APICs from a secondary pod, I got the error “APIC Data Layer Partially Diverged”.

In my case, this was because I had the “Contract Viewer” app installed on the primary APICs. When the secondary pod's APICs tried to join the cluster, the third-party app caused this error. Removing the app and reinstalling it allowed the secondary pod's APICs to join properly.