Advanced BGP

Submitted by rayc on Tue, 02/01/2022 - 14:44

In my previous article, I went through the fundamentals of BGP and the basic configuration of BGP peers and network advertisement. In this article I will delve a little into just how powerful BGP route manipulation can be. First, let's take a look at the 4 typical redundancy scenarios that an organisation might use to connect to its Internet Service Provider.

  • Redundant Links to a single Service Provider and Single Service Provider peer
  • Redundant Links to a single Service Provider but using two separate Peers
  • A single Link to two different Service Providers
  • Redundant Internal devices with links to two Service Providers.

 

4 typical BGP redundancy scenarios

Having network redundancy is extremely important for an organisation to stay connected and running. If you have no redundancy, then you have no network. There's an old saying that I heard a long time ago in networking that goes "two is one, one is none". Each of the above scenarios provides some level of redundancy. With Scenario 1, we have redundancy in that if one of the links fails, the other will take over. We have the same level of redundancy in Scenario 2, however we have the added benefit that if one of the Service Provider peers fails, we have a second to fail over to. In Scenario 3, we are introducing Service Provider redundancy in the event of an outage. Scenario 4 provides the highest level of redundancy and protects against both internal router failure and Service Provider failure.

When building a redundant network and using BGP to peer with a Service provider, there are some factors that need to be taken into consideration to ensure that you do not become a Transit AS. A Transit AS is an AS that is used as a transit to reach another network. While this might be what you want in some cases, like a Datacentre, it would not be desired for an Internet Peering. If your Internet peering causes your AS to become a Transit AS, it can result in unpredictable routing and cause your Internet Circuits to become saturated with unwanted traffic. This can result in issues not just for you, but for others trying to reach those networks. 

In order to prevent this from happening, BGP has 4 methods of filtering/manipulating routes both inbound to and outbound from the AS. Each of the below methods has its own purpose.

  • Distribute Lists
  • Prefix Lists
  • AS path filtering
  • Route-maps

 

Distribute Lists

Distribute Lists use IP access-lists to define which routes to permit or deny to or from a BGP peer. Distribute Lists are applied directly to a neighbour, either inbound or outbound, depending on what you wish to achieve. There are two types of ACL (Access Control List) on Cisco routers: Standard and Extended. Standard ACLs are numbered 1-99 or 1300-1999, or can be named. Extended ACLs are numbered 100-199 or 2000-2699, or can also be named. Standard ACLs only match on the source IP address, while Extended ACLs can match on not just the source IP, but the destination IP, protocol and port number as well. To configure a Standard named ACL, use the global configuration command ip access-list standard <name> followed by the ACE (Access Control Entry) command [permit|deny] <source> <wildcard>. To configure a Standard numbered ACL, use either the global configuration command access-list <number> [permit|deny] <source> <wildcard> or ip access-list standard <number> followed by the ACE command [permit|deny] <source> <wildcard>. Below is an example of both ways to configure an ACL to permit 1.1.1.0/24. Note that there is an implicit deny at the end of all ACLs. 

Configuring Standard ACLs
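To make the wildcard matching and the implicit deny concrete, here is a minimal Python sketch. It is not router code: the function names and the tuple layout used for the ACL entries are my own assumptions for illustration, and it only models a standard ACL that permits 1.1.1.0/24.

```python
import ipaddress


def wildcard_match(address: str, base: str, wildcard: str) -> bool:
    """True if address equals base on every bit where the wildcard bit is 0."""
    addr = int(ipaddress.IPv4Address(address))
    ref = int(ipaddress.IPv4Address(base))
    # Wildcard bits set to 1 are "don't care", so invert them to get the bits we compare.
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    return (addr & care) == (ref & care)


def evaluate_standard_acl(acl, source: str) -> str:
    """Return the action of the first matching entry, or the implicit deny."""
    for action, base, wildcard in acl:
        if wildcard_match(source, base, wildcard):
            return action
    return "deny"  # implicit deny at the end of every ACL


# Hypothetical model of the single ACE: permit 1.1.1.0 0.0.0.255
ACL = [("permit", "1.1.1.0", "0.0.0.255")]

print(evaluate_standard_acl(ACL, "1.1.1.77"))  # inside 1.1.1.0/24
print(evaluate_standard_acl(ACL, "1.1.2.1"))   # falls through to the implicit deny
```

The first lookup hits the permit entry, while the second matches nothing and is dropped by the implicit deny.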

To configure an Extended named ACL, use the command ip access-list extended <name> followed by the ACE [permit|deny] <protocol> <source> <src-wildcard> <dest> <dest-wildcard> <port>. To configure an Extended numbered ACL, use either the command access-list <number> [permit|deny] <protocol> <source> <src-wildcard> <dest> <dest-wildcard> <port> or the command ip access-list extended <number> followed by the ACE command [permit|deny] <protocol> <source> <src-wildcard> <dest> <dest-wildcard> <port>. Note that if you specify the protocol as IP, you cannot specify a port, as IP covers all protocols and ports. The below entry permits the source 1.1.1.0/24 network to reach the 2.2.2.0/24 network on TCP port 80 (WWW). 

Configuring Extended ACLs

The above examples show how to configure ACLs to match a source and destination for a firewall rule. When using ACLs to match prefixes for route filtering, the Source and Destination fields work differently depending on whether you're using the ACL for an IGP (OSPF, EIGRP, RIP) or for BGP. When you configure an extended ACL for use with an IGP like OSPF, the source field represents the network, and the destination field represents the smallest prefix length allowed. For example:

  • The rule: permit ip host 172.16.0.0 host 255.240.0.0 will permit all prefixes from the 172.16.0.0/12 network.
  • The rule: permit host 192.168.1.1 will only permit the route for 192.168.1.1/32.

 

When you configure an extended ACL for BGP, the Source field matches the network portion of the route, and the Destination field matches the subnet mask. For example:

  • permit ip 10.0.0.0 0.0.0.0 255.0.0.0 0.0.0.0 will match only the 10.0.0.0/8 route, as both the network and the mask must match exactly.
  • permit ip 10.0.0.0 0.0.255.0 255.255.255.0 0.0.0.0 will match all of the 10.0.x.0/24 routes. 
  • permit ip 172.16.0.0 0.0.255.255 255.255.255.0 0.0.0.255 will match all routes in the 172.16.x.x range with a prefix length of /24 to /32.
  • permit ip 172.16.0.0 0.0.255.255 255.255.255.128 0.0.0.127 will match all routes in the 172.16.x.x range with a prefix length from /25 to /32.
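The way BGP reads these entries can be sketched in a few lines of Python. This is only an illustration under my own assumptions (the function names are made up, and it handles a single ACE rather than a full ACL): the source and its wildcard are compared against the route's network address, and the destination and its wildcard are compared against the route's subnet mask.

```python
import ipaddress


def _wild(value: str, base: str, wildcard: str) -> bool:
    """Compare value and base on the bits where the wildcard bit is 0."""
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF
    v = int(ipaddress.IPv4Address(value))
    b = int(ipaddress.IPv4Address(base))
    return (v & care) == (b & care)


def bgp_acl_match(prefix: str, src: str, src_wc: str, dst: str, dst_wc: str) -> bool:
    """Source fields test the network address, destination fields test the mask."""
    net = ipaddress.IPv4Network(prefix)
    network_ok = _wild(str(net.network_address), src, src_wc)
    mask_ok = _wild(str(net.netmask), dst, dst_wc)
    return network_ok and mask_ok


# permit ip 10.0.0.0 0.0.255.0 255.255.255.0 0.0.0.0
print(bgp_acl_match("10.0.7.0/24", "10.0.0.0", "0.0.255.0", "255.255.255.0", "0.0.0.0"))
# permit ip 172.16.0.0 0.0.255.255 255.255.255.0 0.0.0.255
print(bgp_acl_match("172.16.5.128/25", "172.16.0.0", "0.0.255.255", "255.255.255.0", "0.0.0.255"))
print(bgp_acl_match("172.16.0.0/16", "172.16.0.0", "0.0.255.255", "255.255.255.0", "0.0.0.255"))
```

The 172.16.0.0/16 route fails the last check because its mask, 255.255.0.0, does not agree with 255.255.255.0 on the first three octets.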

 

Let's use an example from the below topology. Let's say that on R2, we only want to receive the 5.5.5.0/24 and 10.1.45.0/24 routes from R4. 

Topology

 

To achieve this, we use an extended ACL to permit only the specific subnets, 5.5.5.0/24, and 10.1.45.0/24. We then need to apply this ACL using a Distribute List to the R4 BGP peer on R2 in an inbound direction. To do this, use the IPv4 address family BGP configuration command neighbor <ip> distribute-list <ACL> [in|out].

Configuring a Distribute List

Here we can see that before applying the Distribute List, R2 is advertising all networks within the topology to R1. Once we apply the ACL, only the local networks on R2 and the 5.5.5.0/24 and 10.1.45.0/24 networks are being advertised. Getting your head around how ACLs are matched for routing protocol Distribute Lists can take some getting used to, so here are a couple more examples.

ip access-list extended DL_ACL

permit ip 192.168.0.0 0.0.255.255 host 255.255.255.255

permit ip 100.64.0.0 0.0.255.0 host 255.255.255.128

This extended ACL will match any route from 192.168.0.0 to 192.168.255.255 that has a /32 prefix length (the host keyword requires the mask to be exactly 255.255.255.255), and also any 100.64.x.0 route with a /25 prefix length. 

 

Prefix Lists

Prefix lists also allow the network engineer to match on specific prefixes like ACLs do, but they provide a more flexible and configurable way to do so. A prefix list match contains 2 parts, a High Order Bit Pattern and a High Order Bit Count. For example, with the network 192.168.0.0/16, the 192.168.0.0 is the High Order Bit Pattern, and the 16 is the High Order Bit Count. So when a router is trying to match on the prefix list, it will look at the High Order Bit Pattern and the High Order Bit Count of each route. This might seem to be just a different way of matching like an ACL, but the real power of a prefix list comes with its ability to match a range of prefixes using a ge or le statement. To configure a prefix-list, use the global configuration command ip prefix-list <name> [seq <number>] [permit|deny] <high-order-pattern/high-order-bit-count> [ge <value>] [le <value>]. Let's take a closer look at how to configure a prefix list and some ways to match routes. 

Let's say we have the routes 10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/25 and 10.0.3.128/25. Now let's say we want to deny the 10.0.3.0/25 and 10.0.3.128/25 routes from being received. Using a prefix list, we can do that with the following two commands.

ip prefix-list PREFIX seq 10 deny 10.0.3.0/24 le 32

ip prefix-list PREFIX seq 20 permit 0.0.0.0/0 le 32

I'll break down the above statements a little further. The first statement will match and deny any network in the 10.0.3.0/24 range with a prefix length of /24 through to /32. This will match our 10.0.3.0/25 and 10.0.3.128/25 routes. The second statement permits all routes with a length of /0 through to /32, which is a way of matching everything. As with an ACL, prefix lists have an implicit deny at the end. 
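If it helps to see the evaluation order spelled out, here is a rough Python sketch of how a prefix list could be walked: entries are checked in sequence order, the first match decides, and anything left over hits the implicit deny. The tuple layout and function names are assumptions of mine, not anything from IOS.

```python
import ipaddress


def entry_matches(route: str, pattern: str, ge=None, le=None) -> bool:
    """Match the high order bits first, then the prefix length window."""
    route_net = ipaddress.ip_network(route)
    pattern_net = ipaddress.ip_network(pattern)
    if not route_net.subnet_of(pattern_net):
        return False  # high order bit pattern does not match
    if ge is None and le is None:
        low = high = pattern_net.prefixlen  # exact length only
    else:
        low = ge if ge is not None else pattern_net.prefixlen
        high = le if le is not None else pattern_net.max_prefixlen
    return low <= route_net.prefixlen <= high


def evaluate_prefix_list(entries, route: str) -> str:
    """First matching entry wins; no match means the implicit deny."""
    for _seq, action, pattern, ge, le in sorted(entries, key=lambda e: e[0]):
        if entry_matches(route, pattern, ge, le):
            return action
    return "deny"


# Hypothetical model of the two statements: (seq, action, pattern, ge, le)
PREFIX = [
    (10, "deny", "10.0.3.0/24", None, 32),   # 10.0.3.0/24, lengths /24 through /32
    (20, "permit", "0.0.0.0/0", None, 32),   # everything else
]

for route in ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/25", "10.0.3.128/25"]:
    print(route, evaluate_prefix_list(PREFIX, route))
```

Running it against the four routes from the example permits the two /24 routes and denies both /25 routes.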

Let's take another example. Say we have the prefixes 10.0.0.0/8, 10.0.0.0/24, and 10.0.0.0/30. We then configure a prefix list using the command ip prefix-list PREFIX seq 10 permit 10.0.0.0/8 ge 22 le 26. So which of those three prefixes will be permitted? Looking only at the High Order Bit Pattern and Bit Count, you might think all three routes would be permitted, but that is wrong. Because we have the ge 22 le 26 statement at the end of the prefix list, any route with a prefix length shorter than 22 or longer than 26 will not match. So out of the three routes, only 10.0.0.0/24 is a match. 

What if you want to match a specific prefix length that is longer than the High Order Bit Count? This requires that the ge and le values match. For example, to match ONLY 10.0.0.0/24 in the above prefix list, we would use the command ip prefix-list PREFIX seq 10 permit 10.0.0.0/8 ge 24 le 24. Configuration of prefix-lists for IPv6 addressing is exactly the same, except the command is ipv6 prefix-list <name> seq <number> [permit|deny] <high-order-pattern/high-order-count> [ge <value>] [le <value>].
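The ge and le rules boil down to a window of allowed prefix lengths. As a small sketch (the helper names are hypothetical, and max_len is simply 32 for IPv4 or 128 for IPv6), the window can be worked out like this:

```python
def allowed_lengths(bit_count: int, ge=None, le=None, max_len: int = 32):
    """Return the (shortest, longest) prefix length an entry will accept."""
    if ge is None and le is None:
        return (bit_count, bit_count)      # no ge/le: exact length only
    low = ge if ge is not None else bit_count
    high = le if le is not None else max_len
    return (low, high)


def length_in_window(length: int, bit_count: int, ge=None, le=None, max_len: int = 32) -> bool:
    low, high = allowed_lengths(bit_count, ge, le, max_len)
    return low <= length <= high


# 10.0.0.0/8 ge 22 le 26 against the lengths /8, /24 and /30
print([length for length in (8, 24, 30) if length_in_window(length, 8, ge=22, le=26)])

# 10.0.0.0/8 ge 24 le 24 accepts a single length
print(allowed_lengths(8, ge=24, le=24))
```

This only checks the length; a route still has to match the High Order Bit Pattern before the window is considered.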

 

Using the same topology as before, let's configure a prefix list on R5 to deny the Loopback IPs of R1, R2, and R3, and permit everything else. This could be done in a couple of ways, but let's use prefix list matching to its full extent. We can do this with two commands as shown below. The first statement matches any prefix in the /6 range that covers 1.0.0.0 to 3.0.0.0, with a prefix length of exactly /24. The second statement permits everything else. As with a Distribute List, applying a prefix list is done under the BGP address family configuration using the command neighbor <ip> prefix-list <name> [in|out]. One thing I would like to mention is that a BGP neighbour cannot be filtered using both a Distribute List and a Prefix List in the same direction. 

 

Configuring a prefix-list

AS Path ACL

The next method of filtering routes in BGP is by using AS Path Access-Lists. These access-lists are just like an IP access-list, but they are used to match AS paths. They do this by using regular expressions. A regular expression uses certain symbols to match a string value. Below are some of the regular expression values that can be used with Cisco routers and AS Path Access Lists. 

Regex Value   Description
_             Matches a space (also a comma, a brace or parenthesis, or the start or end of the string)
^             Matches the start of a string
$             Matches the end of a string
-             Indicates a range of characters inside brackets
[]            Matches a single character or a range of characters
[^]           Excludes the characters listed in the brackets
()            Allows nesting of search patterns
|             This is an OR function
.             Matches any single character, including a space
*             Matches zero or more instances of the preceding character or pattern
+             Matches one or more instances of the preceding character or pattern
?             Matches one or no instances of the preceding character or pattern

 

This list can seem a bit daunting, and regex can take some getting used to, but with some practice you will get the hang of it. For the most part you probably won't actually use a lot of the regex values and will only remember the most common ones. Below I have listed a few examples of common regex ACL values and what they do. 

Regex Example                          Description
^$                                     Matches all locally originated routes
permit ^200_                           Permits only routes received from neighbouring AS 200
permit _200$                           Permits only routes originating from AS 200
permit _200_                           Permits only routes that have passed through AS 200
permit ^[0-9]+(_[0-9]+)?(_[0-9]+)?$    Permits routes with one to three AS numbers in the AS path
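You can also experiment with these patterns offline. The sketch below is an approximation I put together in Python, not how IOS implements it: it assumes the only Cisco-specific symbol that needs translating is the underscore, which it expands into "start of string, end of string, or a delimiter", and it treats the AS path as a plain space-separated string.

```python
import re

# Approximation of the Cisco underscore: start, end, space, comma, braces or parentheses.
UNDERSCORE = r"(?:^|$|[ ,{}()])"


def cisco_to_python(pattern: str) -> str:
    """Translate a Cisco-style AS path regex into a Python regex (underscore only)."""
    return pattern.replace("_", UNDERSCORE)


def as_path_matches(pattern: str, as_path: str) -> bool:
    return re.search(cisco_to_python(pattern), as_path) is not None


# Hypothetical AS paths, written the way they appear in a BGP table.
paths = ["", "200", "200 300", "300 200", "100 200 300", "100 2000 300"]

for pattern in ["^$", "^200_", "_200$", "_200_"]:
    matched = [p for p in paths if as_path_matches(pattern, p)]
    print(pattern, matched)
```

Note how AS 2000 never matches any of the AS 200 patterns, which is exactly why the underscore delimiters matter.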

As you can see, sometimes regex can be straightforward, but sometimes it can be quite complex. You can actually practice your regex skills online against the real Internet BGP table by using what's referred to as a looking glass. Some providers offer this so that if you are advertising routes, you can make sure that the path selection is correct. You can find a list of BGP looking glass sites at http://bgp4.as/looking-glasses. To show you an example, I've selected the PIPE Networks BGP looking glass and will use a query to find all routes that originate from my provider's AS, 4764.

Example output from PIPE Networks looking glass

You can also use regex to parse the BGP table. To do so, use the show command show bgp <afi> <safi> regexp <regex>. Let's take a look at R1, and find all routes that originate from R4. To do so, we use the command show bgp ipv4 unicast regexp _4$.

Output of the show bgp ipv4 unicast regexp _4$ command

What if we wanted to find all routes that have passed through AS4? For this we use the regex _4_.

Example output of routes that pass through AS4

Route maps

Route maps are probably the most effective and versatile way to manipulate route information in BGP, or any routing protocol for that matter. Route maps allow you to match on specific criteria and alter the attributes of routes that are being received or advertised. A route-map has 4 components to it. 

  • Sequence Number
  • Conditional Match Criteria
  • Processing Action (Either a Permit or Deny)
  • An optional Action.

 

When route-maps are processed, they are processed in sequence order. To configure a route map, use the global configuration command route-map <name> [permit|deny] <seq>. If you don't specify the sequence number, it defaults to 10 (Prefix Lists default to 5). Also, if you don't specify the processing action (permit/deny), the default is permit. Once you have configured the initial route-map statement, the next step is to configure any conditional matching criteria. With a route-map, this could be a prefix-list, ACL, AS path ACL, metric value, route-type, community value (I'll go into this more later) etc. If you configure a route-map with no match statement, the default is to match everything. To configure a match statement, use the route-map configuration subcommand match [as-path|community|ip|ipv6|metric|tag|etc.] followed by whatever you're matching. There are so many options here I can't list them all, but as an example, say we want to match a prefix-list called INTERNAL. To do this, we would use the command match ip address prefix-list INTERNAL. Now that we have a match statement configured, the next step, if you need it, is to configure the optional action using the set subcommand. This could be to alter BGP attributes like weight or local preference, change the next-hop IP, or configure a tag or community value. Again, there are a number of options here that can be configured. 

When you configure a route-map, you might want to have multiple route-map statements. As mentioned earlier, to configure multiple statements, you simply configure the next sequence number. This allows you to use a single route-map to perform all of the route manipulations that you require. However, when configuring a route-map, remember that processing stops at the first statement that matches, once its optional action (if configured) has been applied. If you do not configure an optional action, the route is simply permitted or denied without being modified. There is a way to change this behaviour though. You can use the route-map keyword continue <seq>. This tells the route-map that upon a match, once the action is processed, it should continue processing at the configured sequence number. This isn't an ideal solution though, as it can get confusing when trying to troubleshoot exactly what's happening. In saying that, I personally have used it in a route-map before; it just takes some extra thought. 
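To picture the processing order, here is a simplified Python model of a route-map. It is only a sketch under my own assumptions: the Entry class, the dictionary-based routes and the example entries are all hypothetical, and the continue handling is reduced to "jump forward to the given sequence number and keep going".

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Entry:
    seq: int
    action: str                                     # "permit" or "deny"
    match: Optional[Callable[[dict], bool]] = None  # None means match everything
    set_values: dict = field(default_factory=dict)  # the optional action
    continue_to: Optional[int] = None


def apply_route_map(entries, route: dict) -> Optional[dict]:
    """Return the (possibly modified) route, or None if it is denied."""
    route = dict(route)
    ordered = sorted(entries, key=lambda e: e.seq)
    permitted = False
    i = 0
    while i < len(ordered):
        e = ordered[i]
        if e.match is None or e.match(route):
            if e.action == "deny":
                return None
            route.update(e.set_values)
            permitted = True
            if e.continue_to is None or e.continue_to <= e.seq:
                return route  # processing stops at the first match
            # continue: resume at the first entry with seq >= continue_to
            i = next((j for j, x in enumerate(ordered) if x.seq >= e.continue_to), len(ordered))
            continue
        i += 1
    return route if permitted else None  # implicit deny when nothing matched


ROUTE_MAP = [
    Entry(10, "deny", match=lambda r: r["prefix"] == "10.0.3.0/25"),
    Entry(20, "permit", match=lambda r: r["origin_as"] == 4,
          set_values={"local_pref": 150}, continue_to=30),
    Entry(30, "permit", match=lambda r: r["prefix"].startswith("5."),
          set_values={"weight": 100}),
]

print(apply_route_map(ROUTE_MAP, {"prefix": "5.5.5.0/24", "origin_as": 4}))
print(apply_route_map(ROUTE_MAP, {"prefix": "7.7.7.0/24", "origin_as": 7}))
```

The first route picks up both set actions because of the continue, while the second matches nothing and is dropped by the implicit deny.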

Let's take a look at an actual route-map scenario. Using our same topology from before, and being Network Engineers, we want to manually control and know our routing paths. So, let's configure the path from R1 to any routes in AS4, 5, and 6 to prefer the path through R3. Because this is an iBGP path, this is actually quite easy to do and can be done with a single route-map on R3. But first, let's take a look at the current BGP best path on R1 for all routes external to AS123. 

Output of sh bgp ipv4 un on R1

Here we can see that currently, all routes are preferred through R2. This would be due to BGP's best path algorithm selecting R2, as it has the lowest Router-ID. I'll go through the best path algorithm in more detail later, but for now, we are going to use Local Preference to alter this path. The reason I'm using Local Preference is that it is carried between iBGP peers inside an AS. This means that when you are using an iBGP peering session, Local Preference is sent with the route attributes. In order to make R3 the preferred path, we need to set the Local Preference to a higher value (the default is 100), and because I want to match all routes that are received from AS4, I don't need to specify any match criteria; I only need to set the Local Preference value. 

Route-map to set the local preference to 150 on R3

Now let's take a look at the BGP table on R1 again.

R1 BGP table post route-map on R3

Notice something? We're no longer receiving routes from R2. This is what makes configuring route-maps tricky. You really need to think about what you're doing. The reason we are no longer receiving routes from R2 is that R2 now also prefers the path through R3 due to the higher Local Preference, so it no longer advertises its own path to R1. 

R2's BGP table post configuration of route-map on R3

This is a problem we need to fix. There are a few ways to do so, but the best method would be to make R2 prefer the routes it receives from R4. To achieve this, we need another route-map on R2. We can either configure a route-map for routes received from R3, or for routes received from R4. I'm going to configure it for routes received from R4 as this is best practice. To prefer the routes through R4, I'll once again use a route-map with no match statement, as I want to match all routes, and set the Weight of those routes to a value higher than the default for received routes (which is 0). I'm using Weight in this route-map as it is assessed earlier in the BGP path selection process than Local Preference, and it is local to the router, so it is never advertised to other peers. 

Configuring a route-map to set the weight on R2

Now let's take another look at the BGP table on R1. 

R1 BGP table after configuring route-map on R2

Once again, we have routes being received from both R2, and R3, but the path through R3 is preferred due to the higher Local Preference. 

 

 

------------------- Notes 

 

if multiple match variables in a single match statement, only need to match 1. If multiple match statements, need to match all
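As a quick way to remember that rule, here is a tiny Python sketch. The predicates and the route dictionary are made up for illustration; the point is only the any-within-a-statement, all-across-statements logic.

```python
def entry_matches(route: dict, match_statements) -> bool:
    """OR across the variables of one statement, AND across the statements."""
    return all(any(test(route) for test in statement) for statement in match_statements)


# Hypothetical stand-ins for two prefix lists and a tag check.
in_internal = lambda r: r["prefix"].startswith("10.")
in_dmz = lambda r: r["prefix"].startswith("172.16.")
has_tag_100 = lambda r: r.get("tag") == 100

# Models: match ip address prefix-list INTERNAL DMZ  +  match tag 100
statements = [[in_internal, in_dmz], [has_tag_100]]

print(entry_matches({"prefix": "172.16.1.0/24", "tag": 100}, statements))
print(entry_matches({"prefix": "172.16.1.0/24", "tag": 200}, statements))
```

The second route fails because it satisfies only one of the two match statements.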

 

 

BGP has 2 ways to clear a session: hard and soft. A soft clear invalidates the BGP cache and requests a full advertisement from the peer

BGP communities are optional transitive. Can be a 32-bit number (0-4294967295) or two 16-bit numbers (0-65535:0-65535) in the new format.
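The two notations describe the same 32 bits, so converting between them is just a shift and a mask. A short Python sketch (the function names are my own):

```python
def to_new_format(value: int) -> str:
    """Split a 32-bit community into its two 16-bit halves (AS:value)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("community must fit in 32 bits")
    return f"{value >> 16}:{value & 0xFFFF}"


def from_new_format(text: str) -> int:
    """Combine an AS:value community back into a single 32-bit number."""
    high, low = (int(part) for part in text.split(":"))
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each half must fit in 16 bits")
    return (high << 16) | low


print(to_new_format(6553700))        # 100 * 65536 + 100
print(from_new_format("65000:10"))
```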

Private BGP communities follow the format where the first 16 bits represent the AS of community origination and the second 16 bits are a pattern defined by the originating AS

RFC4360 expanded communities by providing extended format commonly used for VPN services.

RFC 8092 supports communities larger than 32bit

must enable sending of community per neighbour - neighbor <ip> send-community [standard|extended|both]

display community in new format use global command ip bgp-community new-format

to find routes with community value use sh bgp <afi> <safi> detail

to view routes with known community sh bgp <afi> <safi> community <value>

To match routes with a community, use a community-list. 2 types: standard (1-99) must match either well-known or private (as:16bit) communities; expanded (100-500) uses regex.

ip community-list [standard|expanded] <name> [permit|deny] <community-pattern>

if multiple communities on the same list statement, must match all, otherwise use multiple statements.

set private community using RM

additive keyword preserves communities already set

routing decisions always start with longest match

BGP recalcs best path if: NH reachability changes, failure of interface connected to eBGP peer, redistribution change, receipt of new or removed paths

BGP best path algorithm

  1. Weight - locally significant 16 bit value. Higher is better. Not advertised.
  2. Local Pref - well-known discretionary. Only sent inside AS. 32 bit long. higher is better. default is 100.
  3. Locally Originated - Locally advertised routes -> Networks aggregated locally, BGP peer routes
  4. AIGP - Accumulated Interior Gateway Protocol. Optional non-transitive. Sent inside AS. Calculates conceptual path metric to route including NH. Used across multiple ASes running a single IGP inside the same company. Path with AIGP preferred over path without; if NH requires recursive lookup, AIGP includes distance to NH; paths compared and lowest AIGP metric wins. Each router hop adds to metric
  5. AS Path - shortest wins. AS path prepending to manipulate routes
  6. Origin Type - IGP -> EGP -> Incomplete (?)
  7. Lowest MED - non trans. 32 bit value. Auto set to IGP path metric during network or redist. MED can be received from ebgp and sent inside AS, but not to other ebgp peer. No MED = 0
  8. eBGP over iBGP - eBGP -> Confederation member AS peer -> iBGP
  9. Lowest IGP metric to NH
  10. If both eBGP, prefer oldest
  11. Prefer route from peer with lowest RID
  12. prefer route with smallest cluster list length - non trans attrib appended by Route Reflector. Cluster list in RR used for loop prevention.
  13. Prefer path from lowest Neighbour IP - used only by iBGP as eBGP has oldest route
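To tie the list back to the R2/R3 example, here is a deliberately simplified Python sketch of the comparison. It is an assumption-laden model rather than the real algorithm: it only covers Weight, Local Preference, locally originated, AS path length, origin, MED, eBGP over iBGP and lowest Router-ID, it ignores the rule that MED is normally only compared between paths from the same neighbouring AS, and all of the path values are hypothetical.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass(frozen=True)
class Path:
    via: str
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: tuple = ()
    origin: str = "igp"
    med: int = 0
    ebgp: bool = False
    router_id: str = "0.0.0.0"


def preference_key(p: Path):
    """Smaller key wins; 'higher is better' attributes are negated."""
    rid = tuple(int(octet) for octet in p.router_id.split("."))
    return (-p.weight, -p.local_pref, not p.locally_originated, len(p.as_path),
            ORIGIN_RANK[p.origin], p.med, not p.ebgp, rid)


def best_path(paths):
    return min(paths, key=preference_key)


# R2 before the weight route-map: the iBGP path via R3 carries Local Preference 150.
via_r4 = Path("R4", as_path=(4,), ebgp=True, router_id="4.4.4.4")
via_r3 = Path("R3", local_pref=150, as_path=(4,), router_id="3.3.3.3")
print(best_path([via_r4, via_r3]).via)

# R2 after setting Weight on routes from R4: Weight is checked before Local Preference.
via_r4_weighted = Path("R4", weight=100, as_path=(4,), ebgp=True, router_id="4.4.4.4")
print(best_path([via_r4_weighted, via_r3]).via)
```

Before the Weight is set, R2 picks the path via R3 on Local Preference; afterwards the R4 path wins at the very first step.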