I've talked a little about STP link failures in previous articles, but let's take a deeper dive into what STP does during the various failure scenarios. Two types of link failure can occur in a Layer 2 STP topology:
- Direct Link Failure: This is when a physical link fails and the port state changes to down.
- Indirect Link Failure: This is when there is not a physical failure, but there is no data flow over the link.
Once again, we will use the same three-switch topology as before. All switches are running Rapid PVST+, with SW1 as the Root Bridge, configured with a priority of 0. The same four VLANs are configured on all switches: VLANs 1, 10, 20 and 30.
As I mentioned earlier, a direct failure is when a physical link fails within the network. This could be a cable being disconnected or cut, a broken fiber lead, and so on. STP handles direct failures in different ways depending on where the failure is. In the above diagram, we'll call the segment between SW1 and SW2 Segment A, the segment between SW1 and SW3 Segment B, the segment between SW2 and SW3 on interfaces G1/0/3 and G1/0/2 respectively Segment C, and the last segment between SW2 and SW3 Segment D. In the first scenario, let's take a look at what would happen if Segment C, Segment D, or both failed.
With this type of failure there is no impact on normal data flow, as both of these ports are in the Blocking state on SW3. Even so, both SW2 and SW3 still send TCN BPDUs towards the Root Bridge. The Root Bridge receives the TCN BPDU, sends an ACK to both SW2 and SW3, and then sends a configuration BPDU with the Topology Change (TC) flag set. (If there were an upstream switch between SW2 and the Root Bridge, that switch would acknowledge the TCN BPDU and forward it on to the Root.) This forces the switches to flush their Layer 2 MAC address tables by reducing the MAC aging timer to the Forward Delay timer (15 seconds by default) until a second configuration BPDU is seen from the Root Bridge. Shortening the aging timer ages out stale entries quickly, so that only devices that are still actively communicating remain in the table. Once the second configuration BPDU has been received, the MAC aging timers are restored to normal (300 seconds by default). If you look at the output of the show spanning-tree [vlan <id>] detail command for VLAN 10 on both SW2 and SW3, you can see that the last TCN on each device was recorded at the time the link was disconnected.
Output from SW2
Output from SW3
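The MAC-table flush described above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not how a switch implements it: the MacTable class, its method names and the MAC addresses are all hypothetical, and only the 300-second and 15-second defaults come from STP itself.

```python
from dataclasses import dataclass, field

DEFAULT_AGING = 300   # seconds, normal MAC aging time
FORWARD_DELAY = 15    # seconds, default Forward Delay


@dataclass
class MacTable:
    """Toy MAC table (hypothetical): MAC address -> (port, time last seen)."""
    aging_time: int = DEFAULT_AGING
    entries: dict = field(default_factory=dict)

    def learn(self, mac: str, port: str, now: float) -> None:
        # Learning or refreshing an entry resets its age.
        self.entries[mac] = (port, now)

    def topology_change(self, active: bool) -> None:
        # Shorten aging while a topology change is being signalled;
        # restore the normal aging time afterwards.
        self.aging_time = FORWARD_DELAY if active else DEFAULT_AGING

    def expire(self, now: float) -> None:
        # Drop every entry not refreshed within the current aging time.
        self.entries = {
            mac: (port, seen)
            for mac, (port, seen) in self.entries.items()
            if now - seen <= self.aging_time
        }


table = MacTable()
table.learn("aaaa.aaaa.aaaa", "G1/0/1", now=0)    # host that goes silent
table.learn("bbbb.bbbb.bbbb", "G1/0/2", now=0)    # host that keeps sending
table.topology_change(active=True)
table.learn("bbbb.bbbb.bbbb", "G1/0/2", now=10)   # refreshed by new traffic
table.expire(now=20)
remaining = sorted(table.entries)
table.topology_change(active=False)
print(remaining)  # ['bbbb.bbbb.bbbb']
```

The silent host is aged out after 15 seconds instead of 300, while the host that is still sending keeps its entry, which is the effect the shortened timer is meant to have.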
In the next scenario, Segment A between SW1 and SW2 fails. This failure does cause data flow issues, because Segment A is SW2's connection to the Root Bridge.
When a switch loses its Root Port in this way and all of its other ports are DPs, the switch assumes that it is the new Root and sends out a BPDU advertising itself as the Root Bridge. At the same time, SW1 sends a configuration BPDU with the Topology Change (TC) flag set to all other devices on the network (there is no TCN BPDU, as SW1 is already the Root). SW3 receives both the inferior BPDU from SW2 and the superior BPDU from SW1. The flagged configuration BPDU from SW1 causes SW3 to set its MAC aging timer to the Forward Delay timer and flush old MAC entries from the table until a second configuration BPDU is seen from the Root Bridge. The inferior BPDU from SW2 is discarded, and the ports that were blocking transition to the Forwarding state and forward SW1's superior BPDU. SW2 receives the superior BPDU and then selects its new Root Port based on the RP selection criteria. Looking at the output of the show spanning-tree [vlan <id>] detail command, we can see in the pre-failure output that the last TCN was more than 1 hour 20 minutes ago and that there have been 38 TCNs on SW1, 28 on SW2 and 14 on SW3.
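To make the superior versus inferior comparison concrete, here is a minimal Python sketch of how SW3 might rank the two BPDUs it receives. The Bpdu type and is_superior function are hypothetical, the MAC addresses, SW2's priority of 32768 and the port IDs are invented for illustration, and the per-VLAN extended system ID that PVST+ adds to the priority is ignored for simplicity. The only rule relied on is that lower values win, compared field by field in the order shown.

```python
from typing import NamedTuple


class Bpdu(NamedTuple):
    """Fields are ordered so tuple comparison follows STP's tie-breakers."""
    root_priority: int
    root_mac: str          # same-format lowercase hex strings compare correctly
    root_path_cost: int
    sender_priority: int
    sender_mac: str
    sender_port_id: int


def is_superior(candidate: Bpdu, current: Bpdu) -> bool:
    # Lower is better at every step, so the smaller tuple is the superior BPDU.
    return candidate < current


# Hypothetical values: SW1 is the Root with priority 0 (as in the article).
from_sw1 = Bpdu(0, "0000.0000.0001", 0, 0, "0000.0000.0001", 0x8002)
# SW2 has lost its Root Port and now claims to be the Root itself.
from_sw2 = Bpdu(32768, "0000.0000.0002", 0, 32768, "0000.0000.0002", 0x8003)

print(is_superior(from_sw1, from_sw2))  # True
print(is_superior(from_sw2, from_sw1))  # False
```

Because SW1's root priority of 0 is lower than the priority SW2 advertises for itself, the comparison is decided on the first field and SW2's claim is discarded.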
Once I disconnect the port, we can see that SW2 has converged: G1/0/3 is now the new Root Port, and G1/0/4 is an Alt port in the Blocking state.
If we take another look at the output of the show spanning-tree [vlan <id>] detail command from all three switches, we can see that the last TCN was less than 2 minutes ago for SW1, 49 seconds ago for SW2 and 24 seconds ago for SW3, and that the number of TCNs has increased. With RSTP this failover was quite quick, with minimal packet loss. With 802.1D STP (PVST+) the same failover would take 2 x Forward Delay (30 seconds with the default 15-second timer), as the ports would need to move from BLK through the Listening and Learning states before reaching the final Forwarding state on SW2 and SW3.
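The 2 x Forward Delay figure for 802.1D can be worked through with a few lines of Python. This is only an arithmetic sketch: the function name is hypothetical, and the 15-second default Forward Delay is the standard value rather than something measured in this lab.

```python
from typing import List, Tuple


def dot1d_port_timeline(forward_delay: int = 15) -> List[Tuple[int, str]]:
    """Return (seconds since the port left Blocking, state entered) pairs."""
    return [
        (0, "Listening"),                   # port leaves Blocking
        (forward_delay, "Learning"),        # after one Forward Delay
        (2 * forward_delay, "Forwarding"),  # after a second Forward Delay
    ]


timeline = dot1d_port_timeline()
time_to_forwarding = timeline[-1][0]
print(time_to_forwarding)  # 30
```

With the default timer the port spends 15 seconds in Listening and 15 seconds in Learning, so traffic only resumes 30 seconds after the port leaves the Blocking state.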
In the last direct link failure scenario, Segment B between SW1 and SW3 fails. This again has a direct impact on data flow, as G1/0/1 is SW3's Root Port. In this scenario, however, SW3 has two Alt ports (because we're using RSTP), and failover will be much faster than with 802.1D STP. SW3 already knows that if it loses its RP, there are two alternate paths to the Root. These ports are still in the Blocking state, however, and will need to transition to the Learning and Forwarding states, and TCN BPDUs will be sent.
When this type of failure occurs, SW1 will send configuration BPDUs, but SW3 will not, as its only other ports are both in the BLK state. SW1 sends a configuration BPDU with the Topology Change (TC) flag set, causing SW2 (and all other downstream switches) to flush their MAC tables by setting the MAC aging timer to the Forward Delay timer. (Remember that all DPs forward BPDUs, and that BLK ports must keep receiving BPDUs in order to stay blocked.) Once SW1 has sent its second configuration BPDU, SW2 and SW3 reset their MAC aging timers to normal, and SW3 transitions its ports from BLK to LIS and then decides which port to make the RP and which to make the ALT port and block. Let's take a look at the output of the show spanning-tree [vlan <id>] detail command before and after the link failure.
Output Pre failure
Output Post failure
Again, you can see that the number of topology changes has increased on every switch in the network, from 2, 4 and 6 for SW1, SW2 and SW3 to 3, 5 and 7. If you now look at the output of the show spanning-tree vlan 10 command on SW3, you can see that the G1/0/2 interface is now the Root Port and the G1/0/3 interface is still in the BLK state as an ALT port.
An indirect failure is when the switch's physical interfaces stay in an up/up state but there is a loss of data flow. This can happen for various reasons, and failover is much slower because it relies entirely on timers to remediate the issue. For this scenario, say there is an issue with Segment A between SW1 and SW2, and SW2 stops receiving BPDUs on its G1/0/1 interface. When this happens, SW2 keeps a copy of the superior BPDU in its cache for the Max Age time (20 seconds by default) before discarding it and sending its own BPDU, advertising itself as the Root, out of all interfaces. At this point SW3 receives the inferior BPDU from SW2, transitions its BLK ports to the Learning state and then the Forwarding state, and forwards the superior BPDU from SW1 to SW2. SW2 then selects its new RP and ALT ports.
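The Max Age behaviour described above can be sketched in Python as follows. This is a simplified model that follows the article's description: the RootPortInfo class and believed_root function are hypothetical, and only the 20-second default Max Age is a real STP value.

```python
from dataclasses import dataclass
from typing import Optional

MAX_AGE = 20  # seconds, default Max Age


@dataclass
class RootPortInfo:
    """Hypothetical record of the last superior BPDU cached on a port."""
    root_id: str
    last_bpdu_at: float

    def expired(self, now: float, max_age: float = MAX_AGE) -> bool:
        # The cached BPDU is discarded once Max Age has passed without a refresh.
        return now - self.last_bpdu_at >= max_age


def believed_root(own_id: str, info: Optional[RootPortInfo], now: float) -> str:
    # With no valid cached BPDU, the switch falls back to claiming the Root role.
    if info is None or info.expired(now):
        return own_id
    return info.root_id


# SW2 last heard from SW1 at t=0, then BPDUs silently stop arriving.
info = RootPortInfo(root_id="SW1", last_bpdu_at=0)
print(believed_root("SW2", info, now=19))  # SW1
print(believed_root("SW2", info, now=20))  # SW2
```

The link never goes down in this model, so nothing happens until the timer runs out, which is why an indirect failure takes so much longer to detect than a direct one.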
Simulating an indirect failure is quite difficult unless you are on one of the specific models of Cisco switch that allow you to filter control traffic. To try to simulate this scenario, I enabled BPDU Filter on the G1/0/2 interface on SW1 using the interface configuration subcommand spanning-tree bpdufilter enable. If you take a look at the output of the show span vlan 10 commands below from SW2, you can see that after BPDU Filter is enabled on SW1, the RP on SW2 is still G1/0/1 for a period of Max Age. Once the Max Age timer expires, the ports start to transition.
Note that the G1/0/1 interface on SW2 is still up and active and is now a Designated port. Because this link is actually still active in my scenario, sending and receiving frames, it leaves a loop open. Because I forgot to disable BPDU Filter, there was a broadcast storm and the switch came to a halt.