BGP Fundamentals

Submitted by rayc on Mon, 01/24/2022 - 14:07

BGP (Border Gateway Protocol) is the routing protocol of the Internet. Unlike its IGP counterparts, BGP is designed to handle hundreds of thousands of routes. Currently there are roughly 635,000 IPv4 routes on the Internet. Could you imagine an IGP like OSPF running on an Internet router holding 635k routes? It would die if a link flapped. This is why BGP was created. There was an earlier Internet routing protocol called EGP (Exterior Gateway Protocol), which was BGP's predecessor, but it was replaced by BGP a long time ago.

BGP is defined in RFC 4271. BGP uses Autonomous System Numbers (ASNs) to identify peers outside of the routing domain. An ASN is a numeric value that is assigned to an Autonomous System. These values are registered and allocated by the IANA through the Regional Internet Registries. If you are running BGP with a public ASN, it must be registered. The original RFC specifies a 16-bit ASN field, allowing for 65,536 values (0 to 65535). This was later revised by RFC 4893 to expand the ASN field to 32 bits, allowing for 4,294,967,296 values (0 to 4294967295). Within these ASN values there are ranges that, like the IPv4 RFC 1918 addresses, are reserved for private use. These ASNs are:

  • 64512 - 65534
  • 4200000000 - 4294967294
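
As a quick illustration, here is a minimal Python sketch that checks whether an ASN is in a private-use range. The bounds are the ones listed in RFC 6996 (64512 to 65534 and 4200000000 to 4294967294); the function name is hypothetical and only for this example.

```python
# Private-use ASN ranges as listed in RFC 6996 (inclusive bounds).
PRIVATE_ASN_RANGES = (
    (64512, 65534),            # 16-bit private range
    (4200000000, 4294967294),  # 32-bit private range
)


def is_private_asn(asn: int) -> bool:
    """Return True if the ASN falls inside a private-use range."""
    if not 0 <= asn <= 0xFFFFFFFF:
        raise ValueError(f"ASN {asn} does not fit in 32 bits")
    return any(low <= asn <= high for low, high in PRIVATE_ASN_RANGES)


if __name__ == "__main__":
    for asn in (64511, 64512, 65534, 65535, 4200000000):
        print(asn, "private" if is_private_asn(asn) else "not private")
```

Note that 65535 and 4294967295 are reserved values and sit just outside the private ranges.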

 

BGP is referred to as a Path Vector routing protocol, as it doesn't make routing decisions based on the state of links but instead on routing paths. To do this, BGP utilises BGP Path Attributes. Path attributes will be discussed in more detail later, but for now, know that there are 4 types of BGP path attributes:

  1. Well-Known Mandatory: These attributes must be recognised by all BGP implementations and must be sent with all BGP advertisements.
  2. Well-Known Discretionary: These attributes must be recognised by all BGP implementations but do not have to be sent with BGP advertisements.
  3. Optional Transitive: These attributes do not have to be recognised by every BGP implementation. A router that does not recognise one still passes it on to its other BGP peers.
  4. Optional Non-Transitive: These attributes are also optional, but a router that does not recognise one drops it and does not pass it on to other BGP peers.
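
To make the four categories a little more concrete, here is a rough Python sketch that classifies an attribute from its flags octet and type code. The two flag bits (Optional 0x80, Transitive 0x40) and the type codes come from RFC 4271; the function name and the idea of telling mandatory from discretionary with a small lookup set are assumptions made for this example.

```python
OPTIONAL = 0x80    # high-order bit of the attribute flags octet
TRANSITIVE = 0x40  # second high-order bit

# Type codes of the well-known mandatory attributes in RFC 4271:
# ORIGIN (1), AS_PATH (2) and NEXT_HOP (3).
WELL_KNOWN_MANDATORY = {1, 2, 3}


def classify_attribute(flags: int, type_code: int) -> str:
    """Return the category name for a path attribute."""
    if flags & OPTIONAL:
        if flags & TRANSITIVE:
            return "optional transitive"
        return "optional non-transitive"
    # Well-known attributes: the flags do not say whether the attribute
    # is mandatory, so that part is looked up by type code.
    if type_code in WELL_KNOWN_MANDATORY:
        return "well-known mandatory"
    return "well-known discretionary"


if __name__ == "__main__":
    print(classify_attribute(0x40, 2))  # AS_PATH
    print(classify_attribute(0x80, 4))  # MED
```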

 

When BGP advertises a route, it advertises the Network Layer Reachability Information (NLRI) for that route. The NLRI consists of the network prefix and the prefix length, and it is sent together with all of the BGP Path Attributes that go along with that route. These path attributes are then used by BGP to determine which path is the best path to that prefix. A router can receive multiple BGP routes to a single prefix but, by default, will only choose one of them as the best.
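
On the wire, RFC 4271 encodes each IPv4 prefix as a one-octet prefix length followed by only as many octets of the prefix as that length needs. The sketch below shows that encoding in Python; the function names are hypothetical and the decoder only handles IPv4.

```python
import ipaddress


def encode_nlri(prefix: str) -> bytes:
    """Encode a prefix as <length octet><minimum number of prefix octets>."""
    net = ipaddress.ip_network(prefix)
    n_octets = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:n_octets]


def decode_nlri(data: bytes) -> str:
    """Decode a single IPv4 NLRI entry back into prefix/length notation."""
    length = data[0]
    n_octets = (length + 7) // 8
    # Pad the truncated prefix back out to four octets.
    padded = data[1:1 + n_octets] + bytes(4 - n_octets)
    return f"{ipaddress.IPv4Address(padded)}/{length}"


if __name__ == "__main__":
    wire = encode_nlri("10.1.0.0/16")
    print(wire.hex(), decode_nlri(wire))
```

A /16 therefore takes three octets in total, which is part of why a single UPDATE can carry many prefixes that share the same path attributes.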

Unlike OSPF and EIGRP, BGP uses standard TCP on port 179 to establish a peering session with its neighbouring routers. Because BGP sessions use TCP, BGP relies on TCP's built-in features to handle fragmentation, sequencing and reliability. Unlike with an IGP, where neighbours are discovered using multicast hello messages, BGP peers must be manually configured. BGP neighbours must first establish a TCP session before beginning the BGP peering process. If a TCP session cannot be established, then a BGP peering cannot form. In order for BGP peers to form a peering, they must either be directly connected or have a route to the peer in the RIB other than the default route. If a BGP peer is directly connected, the BGP router will use ARP to discover the neighbouring router and establish a BGP session. You can view the TCP session details on the router using the show command show tcp brief. Note that the router that initiates the BGP session connects to destination port 179 from a random high-numbered source port, so port 179 appears on the side of the router that accepted the connection.

Output of show tcp brief command

There are two different types of BGP peering sessions: eBGP (External BGP) and iBGP (Internal BGP). An iBGP peering is a peering between two BGP neighbours that are in the same ASN, while an eBGP peering is between neighbours in different ASNs. There are some differences between how iBGP and eBGP operate which I will discuss later. For now, know that an eBGP session has a default TTL of 1 while iBGP has a default TTL of 255. This means that any eBGP peering that is not directly connected is referred to as a multihop eBGP session and requires the TTL value to be reconfigured. When running iBGP, you would normally use an underlying IGP like OSPF or EIGRP to ensure that there is a route to the iBGP peer in the RIB, and because the TTL is 255, there is no need for the neighbours to be directly connected. To change the TTL value of an eBGP peering, use the bgp router configuration subcommand neighbor <ip> ebgp-multihop <ttl>.

Another difference between iBGP and eBGP is that when a router advertises a prefix to an eBGP peer, it updates the Next Hop address for that NLRI to its own address and prepends its own ASN to the AS Path attribute prior to advertising. By default, neither change is made when a learned route is passed on to an iBGP peer.
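
The following Python sketch models that difference in a very simplified way. The Route class, the advertise function and the addresses are all hypothetical; real routers have more knobs (next-hop-self, route reflectors and so on) that are ignored here.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    as_path: Tuple[int, ...] = ()


def advertise(route: Route, local_asn: int, peer_asn: int, local_ip: str) -> Route:
    """Return the route as it would be sent to a peer in peer_asn."""
    if local_asn == peer_asn:
        # iBGP: next hop and AS path are left as they are by default.
        return route
    # eBGP: set ourselves as the next hop and prepend our own ASN.
    return replace(route, next_hop=local_ip,
                   as_path=(local_asn,) + route.as_path)


if __name__ == "__main__":
    learned = Route("10.1.1.0/24", "192.0.2.1", (65010,))
    print(advertise(learned, 65001, 65002, "203.0.113.1"))  # eBGP peer
    print(advertise(learned, 65001, 65001, "203.0.113.1"))  # iBGP peer
```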

BGP Peering 

BGP routers use four message types in order to establish and maintain their peering relationships. 

  • OPEN
  • UPDATE
  • NOTIFICATION
  • KEEPALIVE

 

These messages are used in various stages of the BGP peering session to ensure that the neighbours are alive and routes are updated.

The BGP OPEN message is used to set up and establish the initial BGP peering. The BGP UPDATE message is used to advertise and withdraw prefix NLRI information. The BGP NOTIFICATION message is used to advise of an error in the BGP neighbour adjacency and to tear down the BGP peering. The BGP KEEPALIVE is used to ensure that the BGP peer is alive and reachable. Note that receiving an UPDATE message also resets the hold timer, so UPDATE messages can act as a keepalive for the BGP peering.
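
Every one of these messages starts with the same 19-byte header from RFC 4271: a 16-byte marker of all ones, a 2-byte length and a 1-byte type. Below is a small Python sketch of building and parsing that header. The type codes are from the RFC; the function names are hypothetical.

```python
import struct
from enum import IntEnum


class MessageType(IntEnum):
    OPEN = 1
    UPDATE = 2
    NOTIFICATION = 3
    KEEPALIVE = 4


MARKER = b"\xff" * 16
HEADER = struct.Struct("!16sHB")  # marker, total length, type (19 bytes)


def build_message(msg_type: MessageType, body: bytes = b"") -> bytes:
    """Prefix a message body with the common BGP header."""
    return HEADER.pack(MARKER, HEADER.size + len(body), msg_type) + body


def parse_header(data: bytes):
    """Return (message type, total length) after basic sanity checks."""
    marker, length, msg_type = HEADER.unpack_from(data)
    if marker != MARKER:
        raise ValueError("marker is not all ones")
    if not 19 <= length <= 4096:
        raise ValueError(f"bad message length {length}")
    return MessageType(msg_type), length


if __name__ == "__main__":
    keepalive = build_message(MessageType.KEEPALIVE)
    print(len(keepalive), parse_header(keepalive))
```

A KEEPALIVE has no body at all, so it is just the 19-byte header.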

The BGP OPEN message header contains the required information for a BGP router to determine if the sending Peer is valid and if the peers can form an adjacency. The OPEN header contains the BGP version, the AS Number of the sending router, the Hold Time, the BGP Router ID (RID) and any optional parameters used to establish session capabilities.

BGP OPEN message header
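
As a rough sketch of the fields listed above, the fixed part of an OPEN message body can be packed and unpacked like this in Python. The field layout follows RFC 4271; the function names are hypothetical, and optional parameters are carried as opaque bytes rather than decoded.

```python
import socket
import struct

# version (1), My AS (2), Hold Time (2), BGP Identifier (4), Opt Parm Len (1)
OPEN_BODY = struct.Struct("!BHH4sB")


def pack_open(asn: int, hold_time: int, router_id: str,
              version: int = 4, opt_params: bytes = b"") -> bytes:
    """Build the body of an OPEN message (without the common header).

    The My AS field is 16 bits wide, so asn must be below 65536 here;
    32-bit ASNs are signalled in a capability instead.
    """
    return OPEN_BODY.pack(version, asn, hold_time,
                          socket.inet_aton(router_id),
                          len(opt_params)) + opt_params


def unpack_open(body: bytes) -> dict:
    """Decode the fixed OPEN fields into a dictionary."""
    version, asn, hold_time, rid, opt_len = OPEN_BODY.unpack_from(body)
    start = OPEN_BODY.size
    return {
        "version": version,
        "asn": asn,
        "hold_time": hold_time,
        "router_id": socket.inet_ntoa(rid),
        "opt_params": body[start:start + opt_len],
    }


if __name__ == "__main__":
    print(unpack_open(pack_open(65001, 180, "1.1.1.1")))
```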

The BGP UPDATE message as mentioned above is used to advertise feasible routes to prefixes. This message contains all of the Path Attributes (PA) to reach a prefix.

BGP UPDATE message header

The NOTIFICATION message is sent when there is an error in the neighbour state. These messages contain the reason for the NOTIFICATION message.

BGP NOTIFICATION message header
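
The reason carried in a NOTIFICATION is an error code, an error subcode and optional data. Here is a minimal Python sketch that decodes the body, using the six error codes defined in RFC 4271. The function name is hypothetical and subcodes are returned as raw numbers.

```python
# Error codes defined in RFC 4271.
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}


def parse_notification(body: bytes):
    """Return (error name, subcode, data) from a NOTIFICATION body."""
    if len(body) < 2:
        raise ValueError("NOTIFICATION body needs at least 2 bytes")
    code, subcode = body[0], body[1]
    name = ERROR_CODES.get(code, f"Unknown ({code})")
    return name, subcode, body[2:]


if __name__ == "__main__":
    # Code 2, subcode 2 is what a peer sends for a mismatched AS number.
    print(parse_notification(bytes([2, 2])))
```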

The KEEPALIVE message is a basic message that simply lets the BGP peer know that the router is still there. Keepalive messages are by default sent at intervals of one third of the Hold Timer. The Hold Timer must be either 0 or at least 3 seconds; if it is set to 0, Keepalive messages are disabled. By default the hold time is set to 180 seconds on Cisco routers, which gives a keepalive interval of 60 seconds.

BGP KEEPALIVE message header
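
The timer arithmetic can be sketched in a few lines of Python. This assumes the peers settle on the lower of the two hold times and that keepalives go out at one third of the result; the function name is hypothetical, and real implementations may also take a separately configured keepalive interval into account.

```python
def negotiate_timers(local_hold: int, peer_hold: int):
    """Return (hold time, keepalive interval) in seconds for a session."""
    for value in (local_hold, peer_hold):
        # A hold time of 1 or 2 seconds is not allowed.
        if value != 0 and value < 3:
            raise ValueError(f"hold time {value} must be 0 or at least 3")
    hold = min(local_hold, peer_hold)
    keepalive = hold // 3  # 0 means keepalives are not sent
    return hold, keepalive


if __name__ == "__main__":
    print(negotiate_timers(180, 180))
    print(negotiate_timers(180, 90))
```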

BGP FSM (Finite State Machine)

When two BGP routers become peers, they go through a series of steps to determine the adjacency details and capabilities called the BGP Finite State Machine. There are 6 stages to the BGP FSM.

  1. Idle
  2. Connect
  3. Active
  4. Open Sent
  5. Open Confirm
  6. Established

 

In the Idle state, the BGP router waits for a start event. Once one occurs, it starts listening for its BGP peer, attempts to establish a TCP session with its configured BGP neighbour, and the BGP state transitions to Connect. If an error sends the neighbour back to Idle, the peer is reset and the ConnectRetry timer is set to 60 seconds. After the 60 seconds, the TCP session is attempted again. If the session fails again, the neighbour returns to Idle and waits before the next attempt.

In the Connect state, the BGP router waits for the TCP 3-way handshake to complete. If this completes successfully, the ConnectRetry timer is reset and an OPEN message is sent. Once the OPEN message is sent, the BGP neighbour state changes to Open Sent. If both routers open a connection to each other at the same time, only one connection is kept: the one initiated by the router with the higher BGP Router ID. If the ConnectRetry timer hits 0 in this state, a new TCP session is attempted and the ConnectRetry timer is reset to 60 seconds. The BGP neighbour state does not change from Connect at this stage. If the TCP session fails, the neighbour state changes to Active.

In the Active state, the BGP peer will again start a new 3 way handshake. If this is successful, an Open message is sent and the hold time is set to 4 minutes. From here, the state transitions to Open Sent. If the 3 way handshake fails, the state changes to Connect.

In the Open Sent state, the BGP router is waiting for a reply to its Open message from the peer. If there are no errors in the Open messages, a Keepalive message is sent and the hold time is negotiated. The hold time will be negotiated as the lowest hold time configured by either peer. From here, the state moves from Open Sent to Open Confirm. If there is an error in the Open message negotiation, a Notification message is sent and the BGP state transitions to Idle.

In the Open Confirm state, the BGP peers wait for either a Keepalive or a Notification message from the peer. If a Keepalive is received, the BGP state transitions to Established. If a Notification is received, the BGP session fails and the state transitions to Idle.

In the Established state, each BGP peer will begin to exchange Update messages and the adjacency is considered Up and stable. In this state Keepalive messages are also sent based on the configured timers. 
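
To tie the six states together, here is a heavily simplified Python sketch of the state machine as a transition table. It loosely follows RFC 4271, but the event names are hypothetical, timers are not modelled, and any event that is not listed is treated as an error that drops the session back to Idle.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",
    ("Connect", "connect_retry_expired"): "Connect",
    ("Connect", "tcp_failed"): "Active",
    ("Active", "tcp_established"): "OpenSent",
    ("Active", "connect_retry_expired"): "Connect",
    ("OpenSent", "open_received"): "OpenConfirm",
    ("OpenConfirm", "keepalive_received"): "Established",
    ("Established", "keepalive_received"): "Established",
    ("Established", "update_received"): "Established",
}


def next_state(state: str, event: str) -> str:
    """Look up the next state; unknown combinations reset to Idle."""
    return TRANSITIONS.get((state, event), "Idle")


def run(events, state: str = "Idle") -> str:
    """Feed a sequence of events through the table and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state


if __name__ == "__main__":
    happy_path = ["start", "tcp_established", "open_received", "keepalive_received"]
    print(run(happy_path))
```

A Notification in any state is not in the table, so it falls through to Idle, which matches the behaviour described above.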

When a router receives an Open message, it must perform a check to ensure the validity of the peer. The BGP router will check the Open message's BGP version, and confirm that the source IP of the peer is the same as the one configured, that the AS Number of the peer is the same as configured, that the BGP Router ID is unique, and that all security parameters such as the password and TTL match. If any of these do not match, the peers will not establish a BGP adjacency.
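
These checks can be sketched in Python as follows. The NeighborConfig class and check_open function are hypothetical, and the password and TTL checks are left out because they are applied to the TCP session and IP packets rather than to the fields of the Open message.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NeighborConfig:
    peer_ip: str          # neighbour address as configured locally
    remote_asn: int       # ASN configured for that neighbour
    local_router_id: str  # our own BGP Router ID


def check_open(cfg: NeighborConfig, source_ip: str, version: int,
               asn: int, router_id: str) -> List[str]:
    """Return a list of problems; an empty list means the Open is acceptable."""
    problems = []
    if version != 4:
        problems.append("unsupported BGP version")
    if source_ip != cfg.peer_ip:
        problems.append("source IP does not match configured neighbour")
    if asn != cfg.remote_asn:
        problems.append("peer AS does not match configured remote AS")
    if router_id == cfg.local_router_id:
        problems.append("peer Router ID is the same as ours")
    return problems


if __name__ == "__main__":
    cfg = NeighborConfig("192.0.2.2", 65002, "1.1.1.1")
    print(check_open(cfg, "192.0.2.2", 4, 65002, "2.2.2.2"))
    print(check_open(cfg, "192.0.2.2", 4, 65003, "1.1.1.1"))
```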

BGP Path Selection

BGP makes use of 3 tables for storing NLRI information and selecting the best path for each route. These tables are:

  • Adj-RIB-In - This table contains all of the original NLRI. Once all of the inbound route policies have been processed, this table is purged to save memory.
  • Loc-RIB - This table contains all of the NLRI of routes that are either originated locally or received from other BGP peers. This table is used for presenting routes to the IP RIB.
  • Adj-RIB-Out - This table contains all of the NLRI once outbound route policies have been applied. This is what's used to advertise NLRI to other BGP peers.

 

As prefixes are received and enter the Loc-RIB table, certain attributes are added to the NLRI depending on whether the route is a Connected route or is a Static or Dynamically learned route. If the route is a Connected route, the Next Hop attribute is set to 0.0.0.0, the Origin attribute is set to i (IGP) and the Weight is set to 32768. If the route is learned either Statically or Dynamically, the NLRI Next Hop attribute is set to the Next Hop IP of the specified route, the Origin is set to i, the Weight is set to 32768 and the MED attribute is set to the metric of the IGP.

output of BGP path attributes for IGP learned routes

 

In order for a route to be advertised from the Loc-RIB table, that is, to move to the Adj-RIB-Out table, the router runs through the following process:

  • The Route must pass the validity check. This means that the NLRI must have a valid Next Hop and that the Next Hop is reachable. 
  • The Router then applies all outbound neighbour policies. As long as the route is not denied, the route is maintained in the Adj-RIB-Out table for later. 
  • From here, the NLRI is advertised to the BGP peer. If the Path attribute for the Next Hop is set to 0.0.0.0, the router changes the Next Hop to the IP address of the BGP session. 
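
The three steps above can be sketched in Python like this. Routes are plain dictionaries, the outbound policy is any function that returns True to permit a route, and all names and addresses are hypothetical. It is only meant to show the order of the checks, not how a real router stores its tables.

```python
def build_adj_rib_out(loc_rib, reachable_next_hops, outbound_policy, session_ip):
    """Return the routes that would be advertised to one peer."""
    adj_rib_out = []
    for route in loc_rib:
        next_hop = route["next_hop"]
        # Step 1: validity check. 0.0.0.0 means the route is local to us.
        if next_hop != "0.0.0.0" and next_hop not in reachable_next_hops:
            continue
        # Step 2: outbound neighbour policy.
        if not outbound_policy(route):
            continue
        # Step 3: replace a 0.0.0.0 next hop with the session's IP address.
        advertised = dict(route)
        if next_hop == "0.0.0.0":
            advertised["next_hop"] = session_ip
        adj_rib_out.append(advertised)
    return adj_rib_out


if __name__ == "__main__":
    loc_rib = [
        {"prefix": "10.1.1.0/24", "next_hop": "0.0.0.0"},
        {"prefix": "10.2.2.0/24", "next_hop": "192.0.2.9"},     # unreachable
        {"prefix": "10.3.3.0/24", "next_hop": "198.51.100.1"},
    ]
    permit_all = lambda route: True
    for r in build_adj_rib_out(loc_rib, {"198.51.100.1"}, permit_all, "203.0.113.1"):
        print(r)
```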

 

The full BGP route selection process is as follows:

  1. The router stores the route in the Adj-RIB-In table in its original state. From this table, all inbound route policies are applied.
  2. The Loc-RIB table is then updated with the latest entries from the Adj-RIB-In table. Once updated, the Adj-RIB-In table is purged. 
  3. The router then checks the validity of the NLRI. If the NLRI fails the validity check, the route remains in the Loc-RIB but is not processed any further. 
  4. The router will then identify the best path to the prefix. Once the best path has been identified, the router proceeds to step 5. 
  5. The router will now install the best path in the IP RIB. From here, all outbound route policies are applied. Once the outbound policies have been applied, the router will store all non-discarded routes in the Adj-RIB-Out table and advertise them to the configured BGP peers. 

 

You can view the tables that I mentioned above using show commands. To view the Loc-RIB table use the show command show bgp <afi> <safi>. To view the Adj-RIB-Out table, use the show command show bgp <afi> <safi> neighbor <ip> advertised-routes. You can also view all of the Path Attributes for a route by looking at the specific prefix using the show command show bgp <afi> <safi> <ip> <length> detail

Output of the Loc-RIB and Adj-RIB-Out show commands.

Output of the show bgp ipv4 unicast detail command.

BGP Route Summarisation

BGP is capable of summarising routes just like any other routing protocol. Unlike an IGP, where the network command enables the protocol on specific interfaces, BGP uses the network <ip> mask <subnet> command to advertise prefixes. This command requires that there be an exactly matching route for the specified network in the router's IP RIB. BGP has two methods to configure route summarisation. You can either configure a static summary route to the Null0 interface on the router and then use the network command to advertise the route, or you can configure it dynamically using the bgp router configuration subcommand aggregate-address <prefix> <mask> [summary-only] [as-set]. Configuring this command automatically creates a Null0 route in the RIB to help prevent any routing loops from occurring. The aggregate-address command also requires that there be at least one more specific prefix within the summary range in the BGP table. If not, the aggregate will not be advertised. By default, using the aggregate-address command does not prevent the router from advertising the longer prefixes. This is what the keyword summary-only does. Specifying this at the end of the aggregate command suppresses all of the longer prefixes so that only the summary is advertised.
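
Here is a small Python sketch of that behaviour using the standard ipaddress module. It only models which prefixes end up being advertised; the function name and the idea of passing in a flat list of known prefixes are assumptions for the example.

```python
import ipaddress


def advertised_prefixes(table_prefixes, aggregate, summary_only=False):
    """Return the prefixes advertised once an aggregate is configured."""
    agg = ipaddress.ip_network(aggregate)
    nets = [ipaddress.ip_network(p) for p in table_prefixes]
    # Component routes: more specific prefixes that fall inside the aggregate.
    components = [n for n in nets
                  if n.version == agg.version and n != agg and n.subnet_of(agg)]
    if not components:
        # Nothing inside the summary range, so no aggregate is generated.
        return [str(n) for n in nets]
    result = [str(agg)]
    for n in nets:
        if n == agg:
            continue  # already listed as the aggregate
        if summary_only and n in components:
            continue  # longer prefixes are suppressed
        result.append(str(n))
    return result


if __name__ == "__main__":
    known = ["10.1.1.0/24", "10.1.2.0/24", "172.16.0.0/16"]
    print(advertised_prefixes(known, "10.1.0.0/16"))
    print(advertised_prefixes(known, "10.1.0.0/16", summary_only=True))
```

Without summary-only the aggregate is advertised alongside the longer prefixes; with it, only the aggregate and the unrelated 172.16.0.0/16 remain.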

When a route is advertised as an aggregate, the atomic_aggregate attribute is set. This attribute tells the neighbour that the route is an aggregate and that path information from the more specific routes has been lost. When a route is advertised as an atomic aggregate, certain BGP attributes of the component routes are not carried over. These attributes are the AS Path, the MED, and the community values. For this reason, using the aggregate command can cause issues with route selection, as the AS Path is both one of the best path selection attributes and a method of loop prevention. In order to keep the AS Paths in an aggregated route, you can use the as-set keyword. This keeps all of the ASNs from the component routes in the AS Path attribute as an AS_SET; however, the AS_SET counts as a single hop in the AS Path length instead of the number of ASs that the route passes through.
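
The effect of as-set on the AS Path can be sketched as below. This models only the path information inherited from the component routes, before the aggregating router prepends its own ASN; the function name is hypothetical.

```python
def aggregate_path(component_paths, as_set=False):
    """Return (set of inherited ASNs, contribution to AS Path length)."""
    if not as_set:
        # Without as-set, the component routes' AS Paths are dropped.
        return frozenset(), 0
    members = frozenset(asn for path in component_paths for asn in path)
    # An AS_SET counts as one hop no matter how many ASNs it holds.
    return members, (1 if members else 0)


if __name__ == "__main__":
    paths = [(65001, 65010), (65002,)]
    print(aggregate_path(paths))
    print(aggregate_path(paths, as_set=True))
```

With as-set, a router in AS 65010 would still see its own ASN in the path and reject the aggregate, which is how loop prevention is preserved.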

MP-BGP

With the introduction of IPv6, MPLS and so on, BGP needed some changes to allow for the various address families and protocols used in modern networks. This is where MP-BGP (Multi Protocol BGP), or MBGP as it is also sometimes called, comes in. MP-BGP allows NLRI to be sent not just for IPv4, but also for IPv6, MPLS and L3VPNs.

MP-BGP does this by introducing new address family identifiers as well as two new optional non-transitive BGP attributes. The new attributes are:

  • MP reachable NLRI 
  • MP unreachable NLRI 

 

For this discussion we will only look at the IPv4 and IPv6 BGP AFI information. Inside these attributes, the AFI (address family identifier) and SAFI (subsequent address family identifier) fields are set to specific values for IPv4 and IPv6. For IPv4 unicast NLRI, the AFI and SAFI are both set to 1. For IPv6 unicast NLRI, the AFI is set to 2 and the SAFI is set to 1. The multicast SAFI has a value of 2.
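
As a small illustration of those values, the MP reachable NLRI attribute begins with a 2-octet AFI followed by a 1-octet SAFI. The Python sketch below packs and decodes just those three bytes; the dictionaries and function names are hypothetical and cover only the values mentioned above.

```python
import struct

AFI = {"ipv4": 1, "ipv6": 2}
SAFI = {"unicast": 1, "multicast": 2}

AFI_NAMES = {value: name for name, value in AFI.items()}
SAFI_NAMES = {value: name for name, value in SAFI.items()}


def pack_afi_safi(afi_name: str, safi_name: str) -> bytes:
    """Pack the AFI (2 octets) and SAFI (1 octet) in network byte order."""
    return struct.pack("!HB", AFI[afi_name], SAFI[safi_name])


def describe(data: bytes) -> str:
    """Turn the leading AFI/SAFI bytes back into a readable name."""
    afi, safi = struct.unpack("!HB", data[:3])
    return f"{AFI_NAMES.get(afi, afi)} {SAFI_NAMES.get(safi, safi)}"


if __name__ == "__main__":
    wire = pack_afi_safi("ipv6", "unicast")
    print(wire.hex(), describe(wire))
```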

In order to use BGP to send NLRI for IPv6, you need to specify the IPv6 address family in the BGP configuration using the bgp router configuration subcommand address-family ipv6 unicast. This configures MP-BGP to be ready to send and receive IPv6 NLRI, just as the IPv4 address family allows for the sending of IPv4 NLRI.

If your network is an IPv6-only network, or the router you're configuring has only IPv6 addressing, then you must statically configure a BGP Router ID. By default, the router will use the normal RID selection process, which picks an IPv4 address from one of its interfaces. However, if there are no configured interfaces in an up state with an IPv4 address, no RID can be chosen and BGP will not form neighbour relationships.
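
To show why an IPv6-only router ends up without a Router ID, here is a rough Python sketch of the usual selection order on Cisco routers: a manually configured RID first, then the highest IPv4 address on an up loopback, then the highest IPv4 address on any other up interface. The function name and the tuple format for interfaces are assumptions for this example.

```python
import ipaddress


def select_router_id(interfaces, configured=None):
    """Pick a BGP Router ID.

    interfaces is a list of (name, ipv4 address or None, is_up) tuples.
    Returns the RID as a string, or None if nothing can be selected.
    """
    if configured:
        return configured
    up = [(name, ipaddress.IPv4Address(ip))
          for name, ip, is_up in interfaces if is_up and ip]
    loopbacks = [ip for name, ip in up if name.lower().startswith("loopback")]
    candidates = loopbacks or [ip for _, ip in up]
    return str(max(candidates)) if candidates else None


if __name__ == "__main__":
    dual_stack = [("Loopback0", "1.1.1.1", True), ("Gi0/0", "10.0.0.1", True)]
    v6_only = [("Loopback0", None, True), ("Gi0/0", None, True)]
    print(select_router_id(dual_stack))
    print(select_router_id(v6_only))
    print(select_router_id(v6_only, configured="1.1.1.1"))
```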

I would like to mention that current IOS versions support the advertisement of IPv6 prefixes over an IPv4 BGP connection. This is possible due to MP-BGP. In order to do this, you need to have an IPv4 BGP peer that is capable of accepting the IPv6 prefix NLRI. If both BGP routers are capable of accepting IPv6 NLRI, you need to configure the IPv6 address family and activate the IPv4 peer under the IPv6 address family. You will also need to configure a route-map in order to change the next-hop value to an IPv6 address. This must be a global unicast IPv6 address and not a link-local IPv6 address. This route-map is applied outbound under the IPv6 address family to the BGP peer. I will cover this in the configuration article, but for now, know that you do not need to have an IPv6 BGP peer in order to advertise IPv6 NLRI.

IPv6 addressing can be summarised in BGP the same way as IPv4 addressing. In fact, it's the exact same command. To summarise IPv6 prefixes, use the router bgp ipv6 unicast AFI subcommand aggregate-address <ipv6/length> [summary-only] [as-set]. This aggregate command works just like when summarising an IPv4 prefix. If you do not specify the summary-only keyword, the aggregate prefix will be advertised as well as the longer prefixes. The as-set keyword also performs the same function in IPv6 as it does with IPv4.

Viewing the IPv6 BGP table is similar to IPv4, but instead of using the ipv4 keyword, you use ipv6. To view the Loc-RIB table for IPv6, use the show command show bgp ipv6 unicast. Also, like IPv4 NLRI Path Attributes, you can view the Path Attributes of IPv6 NLRI using the show command show bgp ipv6 unicast <prefix/length>.

Output of show bgp ipv6 unicast and prefix/length commands