The Border Gateway Protocol (BGP) is an exterior gateway protocol used to exchange routing updates between Autonomous Systems (ASes). It was first defined in RFC 1105 in 1989. Since 1994, it has been the building block of the Internet. BGP is a path vector routing protocol: routing decisions are based on exchanged lists of Autonomous Systems, called AS_PATHs. Each AS_PATH lists all the Autonomous Systems that a packet needs to traverse to reach a specific destination network.
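To make the path vector idea concrete, here is a minimal sketch of how an AS_PATH grows as a prefix is advertised from one AS to the next. The function names and the AS numbers (taken from the private range) are my own illustrative choices. The loop check reflects the usual rule that a router discards an update whose AS_PATH already contains its own AS.

```python
def prepend_local_as(as_path, local_as):
    """Return the AS_PATH as it leaves local_as: the local AS is added on the left."""
    return [local_as] + list(as_path)


def has_loop(as_path, local_as):
    """An update whose AS_PATH already contains our own AS has looped back to us."""
    return local_as in as_path


# A prefix originated in AS 65001 is advertised on through AS 65002 and AS 65003.
path = []
for asn in (65001, 65002, 65003):
    path = prepend_local_as(path, asn)

print(path)                   # leftmost = most recent AS, rightmost = originator
print(has_loop(path, 65002))  # AS 65002 would reject this update
```

The originator ends up as the rightmost entry, and each AS the update crosses is added on the left.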
Peering: internal-BGP and external-BGP
The peering relationship with a BGP neighbor can be internal or external, depending on whether the two routers belong to the same AS (iBGP) or not (eBGP). Unlike iBGP peers, two eBGP routers are generally directly connected (they reside on the same subnet) to establish a peering relationship. The way that AS_PATH updates are exchanged also changes based on the type of peering. For example, updates received from eBGP peers are propagated to other eBGP and iBGP peers. On the other hand, updates learned from iBGP peers are only advertised to eBGP routers to avoid routing loops. This mechanism is similar to the split horizon rule that distance vector protocols implement to prevent routing loops. As a consequence, the iBGP routers within an AS must be fully meshed to ensure proper route propagation.
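The propagation rule described above can be sketched in a few lines. This is only an illustration: `PeerType`, `peer_type`, and `should_advertise` are hypothetical names, and the sketch assumes a plain full-mesh setup with no route reflectors or confederations.

```python
from enum import Enum


class PeerType(Enum):
    IBGP = "iBGP"
    EBGP = "eBGP"


def peer_type(local_as, peer_as):
    """Peers in the same AS are internal; everything else is external."""
    return PeerType.IBGP if local_as == peer_as else PeerType.EBGP


def should_advertise(learned_from, send_to):
    """Routes learned from an iBGP peer are not passed on to other iBGP peers."""
    return not (learned_from is PeerType.IBGP and send_to is PeerType.IBGP)


# A route learned from an eBGP peer goes to everyone...
print(should_advertise(PeerType.EBGP, PeerType.IBGP))
# ...but a route learned from an iBGP peer only goes to eBGP peers.
print(should_advertise(PeerType.IBGP, PeerType.IBGP))
```

The single blocked combination (iBGP to iBGP) is the reason every iBGP router needs a direct session with every other one.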
BGP Message Types
BGP establishes peering relationships between routers using the TCP protocol port 179. There are five message types that can be exchanged between routers:
- Open – To exchange application parameters.
- Update – To exchange AS_PATH information.
- Notification – To transmit error messages.
- Keepalive – In response to an Open message, or during normal operations, a router regularly sends this message to maintain a peering relationship.
- Route-Refresh – To request route re-advertisement.
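As a rough illustration, the sketch below builds and parses the fixed header that precedes every BGP message. The type codes and the 19-byte layout (16-byte marker, 2-byte length, 1-byte type) come from RFC 4271 and RFC 2918 rather than from this article. The helper names `build_message` and `parse_header` are hypothetical.

```python
import struct
from enum import IntEnum


class MessageType(IntEnum):
    OPEN = 1
    UPDATE = 2
    NOTIFICATION = 3
    KEEPALIVE = 4
    ROUTE_REFRESH = 5


HEADER_LEN = 19          # 16-byte marker + 2-byte length + 1-byte type
MARKER = b"\xff" * 16    # all ones


def build_message(msg_type, body=b""):
    """Prefix a message body with the fixed BGP header."""
    return MARKER + struct.pack("!HB", HEADER_LEN + len(body), msg_type) + body


def parse_header(data):
    """Return (total_length, MessageType) for the message at the start of data."""
    if len(data) < HEADER_LEN:
        raise ValueError("need at least 19 bytes for a BGP header")
    marker, length, type_code = struct.unpack("!16sHB", data[:HEADER_LEN])
    if marker != MARKER:
        raise ValueError("bad marker")
    return length, MessageType(type_code)


# A Keepalive is nothing more than the header itself.
keepalive = build_message(MessageType.KEEPALIVE)
print(parse_header(keepalive))
```

A real implementation would also validate the length field against the allowed range and handle messages split across TCP segments, which this sketch leaves out.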
A peering relationship between two border routers can be described by a finite state machine with six peering states. The peering states are also governed by three timers:
- Connect Retry – (default 120 seconds) the BGP process tries to reconnect to the configured peer when this timer expires.
- Hold Time – (default 90 seconds) the maximum time that BGP waits between successive messages (Keepalive or Update) from its peer before closing the connection.
- Keep Alive – (default 30 seconds) a Keepalive message is sent each time this timer expires.
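Here is a small sketch of how the Hold Time and Keep Alive timers could be checked, using the defaults listed above. The `BgpTimers` class and the two helper functions are hypothetical names, and timestamps are plain numbers in seconds to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class BgpTimers:
    connect_retry: float = 120.0  # seconds between reconnection attempts
    hold_time: float = 90.0       # max silence tolerated from the peer
    keepalive: float = 30.0       # interval between our own Keepalives


def hold_timer_expired(timers, last_received, now):
    """True when the peer has been silent for the whole hold time."""
    return now - last_received >= timers.hold_time


def keepalive_due(timers, last_sent, now):
    """True when it is time to send another Keepalive."""
    return now - last_sent >= timers.keepalive


timers = BgpTimers()
print(keepalive_due(timers, last_sent=0, now=30))           # time to send
print(hold_timer_expired(timers, last_received=0, now=60))  # peer still alive
```

With these defaults, a peer has to miss three consecutive Keepalive intervals before the session is torn down.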
The six peering states are the following ones:
- Idle – This is the initial state where the router refuses all connections.
- Connect – The router is waiting for the completion of the TCP connection:
  - If the TCP connection is successful, the router sends an Open message and the state transitions to Open Sent.
  - If the TCP connection fails, the state transitions to Active and the router continues to listen for incoming TCP connections.
  - If the Connect Retry timer expires, the BGP process resets the timer and loops back to the Connect state.
  - If any other event occurs, BGP goes to Idle.
- Active – The router is listening for and accepting TCP connections:
  - If the TCP connection succeeds, BGP sends an Open message to its peer and moves to the Open Sent state.
  - If the Connect Retry timer expires, BGP restarts the timer, initiates a TCP connection to the peer, and moves to the Connect state.
  - If the peer is not an expected one, BGP rejects the connection attempt and stays in the Active state.
- Open Sent – The BGP process is waiting for an Open message from its peer:
  - If an error is detected in the Open message, BGP sends a Notification message and moves to the Idle state.
  - If the Open message is valid, BGP checks the parameters it carries, sends a Keepalive message, and moves to the Open Confirm state.
  - If the TCP connection ends, BGP closes the connection and moves to the Active state.
  - If any other event occurs, the BGP process sends a Notification message and moves to the Idle state.
- Open Confirm – The BGP process is waiting for a Keepalive or Notification message:
  - If a Keepalive message is received, BGP moves to the Established state.
  - If the Keep Alive timer expires, BGP sends a Keepalive message and resets its Keep Alive timer.
- Established – This is BGP’s operational state, during which BGP exchanges Update, Notification, and Keepalive messages:
  - If the Keep Alive timer expires, the BGP process sends a Keepalive message and resets the timer.
  - Each time BGP receives a Keepalive or Update message from its peer, it resets the Hold Time timer.
  - If an unexpected event occurs, BGP sends a Notification message and switches to the Idle state.
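The transitions listed above can be captured in a simple lookup table. This is a simplified sketch, not a complete implementation: the event names are made up for illustration, the `start` event that leaves Idle is an assumption (the list above does not say what triggers it), and any event missing from the table falls back to Idle.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONNECT = auto()
    ACTIVE = auto()
    OPEN_SENT = auto()
    OPEN_CONFIRM = auto()
    ESTABLISHED = auto()


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "start"): State.CONNECT,  # assumption: an explicit start event
    (State.CONNECT, "tcp_success"): State.OPEN_SENT,
    (State.CONNECT, "tcp_fail"): State.ACTIVE,
    (State.CONNECT, "connect_retry_expired"): State.CONNECT,
    (State.ACTIVE, "tcp_success"): State.OPEN_SENT,
    (State.ACTIVE, "connect_retry_expired"): State.CONNECT,
    (State.ACTIVE, "unexpected_peer"): State.ACTIVE,
    (State.OPEN_SENT, "open_valid"): State.OPEN_CONFIRM,
    (State.OPEN_SENT, "tcp_closed"): State.ACTIVE,
    (State.OPEN_CONFIRM, "keepalive_received"): State.ESTABLISHED,
    (State.OPEN_CONFIRM, "keepalive_timer_expired"): State.OPEN_CONFIRM,
    (State.ESTABLISHED, "keepalive_timer_expired"): State.ESTABLISHED,
    (State.ESTABLISHED, "keepalive_received"): State.ESTABLISHED,
    (State.ESTABLISHED, "update_received"): State.ESTABLISHED,
}


def next_state(state, event):
    """Look up the transition; anything not listed sends the session to Idle."""
    return TRANSITIONS.get((state, event), State.IDLE)


def walk(events, state=State.IDLE):
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = next_state(state, event)
    return state


print(walk(["start", "tcp_success", "open_valid", "keepalive_received"]))
```

The sketch only tracks state changes. The side effects (sending Open, Keepalive, or Notification messages, and restarting timers) would hang off each transition in a real implementation.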
Putting it all together, we get to the BGP finite state machine, which is captured in the following picture:
Figure 1: The BGP finite state machine (not all transitions listed)
An attribute defines the characteristics of the paths that are included in an Update message. The path attributes are divided into four categories:
- Well-known mandatory
- Well-known discretionary
- Optional transitive
- Optional non-transitive
Mandatory attributes must be included in every Update message, while discretionary ones may be omitted. Path updates carrying an optional transitive attribute should be accepted (even if the attribute is not supported) and propagated to a router’s peers. Unrecognized non-transitive attributes can be quietly ignored and are not advertised. The following table summarizes the 10 attributes that are carried within Update messages.
| Type code | Attribute | Description | Category |
|---|---|---|---|
| 1 | ORIGIN | Set by the originating AS; can be set to i for IGP, e for EGP, and ? if incomplete. | Well-known mandatory |
| 2 | AS_PATH | Set by the border router when the update leaves the AS; the leftmost entry is the AS that sent the prefix to you, while the rightmost entry is the originator of the prefix. | Well-known mandatory |
| 3 | NEXT_HOP | Set by the router to the local interface address used to reach the neighbor. It is changed when an update crosses an AS boundary (eBGP peering) and left unchanged across iBGP peerings. | Well-known mandatory |
| 4 | MULTI_EXIT_DISC | Defines the preferred entry point to the local AS. | Optional non-transitive |
| 5 | LOCAL_PREF | Included in Update messages that a BGP speaker sends to its iBGP neighbors. Paths with a higher local preference are preferred. | Well-known discretionary |
| 6 | ATOMIC_AGGREGATE | When presented with a set of overlapping routes, a router selects a less specific route and BGP attaches this attribute. | Well-known discretionary |
| 7 | AGGREGATOR | AS number and router ID of the router that creates the aggregate route. | Optional transitive |
| 8 | COMMUNITY | Group of destinations that share some common property. | Optional transitive |
| 9 | ORIGINATOR_ID | Carries the router ID of the route's originator in the local AS when route reflection is deployed. | Optional non-transitive |
| 10 | CLUSTER_LIST | A cluster is represented by a CLUSTER_ID. A CLUSTER_LIST is a sequence of CLUSTER_ID values that represents the path that the route has passed along. | Optional non-transitive |
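On the wire, each attribute carries a flags byte whose two highest bits say whether it is optional and whether it is transitive (bit values per RFC 4271). The sketch below maps those bits, plus the type code, onto the four categories above. The `classify` function is a hypothetical helper, and the set of mandatory type codes is taken from the table.

```python
OPTIONAL = 0x80    # high-order bit: 1 = optional, 0 = well-known
TRANSITIVE = 0x40  # second bit: 1 = transitive, 0 = non-transitive

# Type codes the table above lists as well-known mandatory.
MANDATORY_CODES = {1, 2, 3}  # ORIGIN, AS_PATH, NEXT_HOP


def classify(type_code, flags):
    """Return the category name for an attribute, given its type code and flags."""
    optional = bool(flags & OPTIONAL)
    transitive = bool(flags & TRANSITIVE)
    if not optional:
        if not transitive:
            raise ValueError("well-known attributes must have the transitive bit set")
        if type_code in MANDATORY_CODES:
            return "Well-known mandatory"
        return "Well-known discretionary"
    return "Optional transitive" if transitive else "Optional non-transitive"


print(classify(2, 0x40))  # AS_PATH
print(classify(8, 0xC0))  # COMMUNITY
print(classify(4, 0x80))  # MULTI_EXIT_DISC
```

Note that the flags alone cannot tell mandatory from discretionary. That distinction depends on which attribute it is, which is why the sketch needs the type code as well.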
The Path Selection Criteria
The BGP path selection criteria define how routes, or paths, to a specific destination network are picked. BGP manages this process by using three routing tables:
- Adj RIB-IN – Routing information learned from inbound Update messages.
- Local RIB – Local routing information that BGP has selected by applying its local policies to the Adj RIB-IN.
- Adj RIB-OUT – Routing information that BGP has selected for advertisement to its peers.
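The flow between the three tables can be sketched as two small functions. Everything here is illustrative: routes are plain dictionaries, the policies are simple callables, the "better" test only looks at local preference, and the prefixes come from documentation address ranges.

```python
def build_loc_rib(adj_rib_in, import_policy, is_better):
    """Apply the local import policy, then keep one best route per prefix."""
    loc_rib = {}
    for route in adj_rib_in:
        if not import_policy(route):
            continue
        current = loc_rib.get(route["prefix"])
        if current is None or is_better(route, current):
            loc_rib[route["prefix"]] = route
    return loc_rib


def build_adj_rib_out(loc_rib, export_policy):
    """Pick the selected routes that may be advertised to peers."""
    return [route for route in loc_rib.values() if export_policy(route)]


adj_rib_in = [
    {"prefix": "203.0.113.0/24", "local_pref": 100, "peer": "192.0.2.1"},
    {"prefix": "203.0.113.0/24", "local_pref": 200, "peer": "192.0.2.2"},
    {"prefix": "198.51.100.0/24", "local_pref": 100, "peer": "192.0.2.1"},
]

loc_rib = build_loc_rib(
    adj_rib_in,
    import_policy=lambda r: True,                                # accept everything
    is_better=lambda a, b: a["local_pref"] > b["local_pref"],    # simplified test
)
adj_rib_out = build_adj_rib_out(
    loc_rib,
    export_policy=lambda r: r["prefix"] != "198.51.100.0/24",    # hold one prefix back
)
print(sorted(loc_rib))
print([r["prefix"] for r in adj_rib_out])
```

Real implementations keep a separate inbound and outbound table per peer. The sketch collapses them into single lists to show only the direction of the data flow.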
The BGP path selection criteria, applied in order until one of them breaks the tie, are the following:
- Highest local preference.
- Shortest AS_PATH.
- Lowest origin code where IGP = 0, EGP = 1, Incomplete = 2.
- Lowest MULTI_EXIT_DISC.
- Prefer eBGP over iBGP paths.
- Lowest IGP cost to next-hop.
- Lowest BGP router ID.
- Shortest CLUSTER_LIST.
- Lowest peer IP address.
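One way to picture the list above is as a sort key in which each criterion is only consulted when all the previous ones tie. The sketch below is a simplification under several assumptions: the `Path` class and its field names are hypothetical, every attribute is assumed to be present, and MULTI_EXIT_DISC is compared across all paths (real implementations usually restrict that comparison).

```python
from dataclasses import dataclass
from ipaddress import ip_address

ORIGIN_RANK = {"i": 0, "e": 1, "?": 2}  # IGP < EGP < Incomplete


@dataclass(frozen=True)
class Path:
    local_pref: int = 100
    as_path: tuple = ()
    origin: str = "i"
    med: int = 0
    is_ebgp: bool = True
    igp_cost: int = 0
    router_id: str = "0.0.0.0"
    cluster_list: tuple = ()
    peer_ip: str = "0.0.0.0"


def selection_key(path):
    """Smaller tuples win; tuple comparison applies the criteria in order."""
    return (
        -path.local_pref,                 # highest local preference
        len(path.as_path),                # shortest AS_PATH
        ORIGIN_RANK[path.origin],         # lowest origin code
        path.med,                         # lowest MULTI_EXIT_DISC
        0 if path.is_ebgp else 1,         # eBGP over iBGP
        path.igp_cost,                    # lowest IGP cost to next hop
        int(ip_address(path.router_id)),  # lowest BGP router ID
        len(path.cluster_list),           # shortest CLUSTER_LIST
        int(ip_address(path.peer_ip)),    # lowest peer IP address
    )


def best_path(paths):
    return min(paths, key=selection_key)


a = Path(local_pref=100, as_path=(65002,), peer_ip="192.0.2.1")
b = Path(local_pref=200, as_path=(65003, 65004, 65005), peer_ip="192.0.2.2")
print(best_path([a, b]) is b)  # local preference beats the shorter AS_PATH
```

Because Python compares tuples element by element, a longer AS_PATH only matters when the local preferences are equal, and so on down the list.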
Please keep in mind that networking vendors may use slightly different path selection criteria and may include some extra checks in their algorithms. Cisco, for example, uses a proprietary weight attribute that is checked first, as a step 0 ahead of the criteria above. Other implementations may also add a step that prefers older routes over newer ones.
To verify what routes a BGP process is using, routers have commands in place that will mark routes with the following flags:
- Used: Routes installed in the Forwarding Information Base (FIB) because the routing table manager selected them as the best or active routes; routers propagate these routes to their neighbors.
- Best: It’s the best route based on the path selection criteria.
- Valid: The entry is present in the RIB.
Also in this case, each networking vendor may implement its own way of reporting this information.
I hope this article provided a quick overview of how this important protocol operates and exchanges updates between routers. Here I only scratched the surface. Keep in mind that there’s more to it that I didn’t cover, such as the use of Route Reflectors, Confederations, Communities, and Route Policies. Perhaps I will cover that in another blog post in the near future 🙂 Please leave a comment if you have any feedback or suggestions.