Internet Routing
Baiscs
Internet: network of networks
High-level View on an IP Router
Control Plane
Routing protocols
Exchange of routing messages for calculation of routes
Data Plane
Lookup
Forwarding of packets at layer 3
Routing table
- Generated by routing protocol
- Entries: Mapping of destination IP prefixes to next hop (IP address)
- Optimized for the particular routing algorithm
- Performance is not critical
- Implemented in software
Forwarding table
- Used for packet forwarding
- Entries: Mapping of IP prefixes to outgoing ports (interface ID and MAC address)
- Optimized for longest prefix matching
- Performance is critical (lookup in line speed)!
- Partially uses dedicated hardware
Routing metric (also named cost, weight)
Metric used by a router to make routing decision
Can be applied to an individual link or to the overall path
Examples
- Utilization, latency, data rate
- Number of hops
Routing policy
- Policy-based routing decisions
- Policies are defined by network operator / owner
Distributed Adaptive Routing
- Currently most commonly used in the Internet
- An instance of the routing protocol in each router
- Exchange of routing information via routing messages
- Adaptation of the paths to the current situation in the network
Path computation
Network is modeled as graph
$$ G = (N, E) $$- $N$: nodes (routers)
- $E$: edges
- Links between routers are edges
- Edges are associated with metric
Example
Autonomous Systems
Structuring into autonomous systems
Internet routing can be divided into Autonomous Systems (AS)
- Routing inside an autonomous system using Interior Gateway Protocol (IGP)
- Routing between autonomous systems using Exterior Gateway Protocol (EGP)
Autonomous Systems
Identification: Unique number called Autonomous Systems Number (ASN)
- earlier 16 bit; now 32 bit
Properties
- Appears as a single entity to the outside
- Uniform routing policy
- Typically uniform interior routing protocol
- Different ASes can use different interior routing protocols
👍 Advantages
- Separated administrative domains
- Scalability by using two logical levels
- Routing protocol inside an AS (not global)
- Routing protocol between ASes
Important Properties
- Scalability of routing protocols
- Overhead increases with size of the network 📈
- Space for storing routing information
- Number of routing messages to exchange
- Computation overhead
- Overhead increases with size of the network 📈
- Operator autonomy
- Choice of interior routing protocol
- Hiding of internal network structure
- Scalability of routing protocols
Allocation
- IANA (Internet Assigned Numbers Authority) delegates allocation to Regional Internet Registries (RIR), e.g.,
ARIN (North America)
RIPE NCC (Europe, Middle East and Central Asia) APNIC (Asia-Pacific)
LACNIC (Latin America, Caribbean)
AfriNIC (Africa)
- IANA (Internet Assigned Numbers Authority) delegates allocation to Regional Internet Registries (RIR), e.g.,
Subdivision into ASs
Classification of ASes
Classification based on role
- Stub AS
- Small organizations and enterprises (Mostly operate only regionally)
- Connected to exactly one provider
- No transit traffic
- Multihomed AS
Large enterprises
Connected to several providers (reliability)
No transit traffic
- Transit AS
- Provider (Often global scope)
- Stub AS
Classification based on “economic position/influence”
- Tier 1, tier 2, tier 3 …
Different Roles
- End customer
- Uses Internet application
- Examples
- Universities
- Enterprises
- Customers of Internet Service Providers (ISP)
- Content delivery provider
- Requested by end customers / Internet application
- Provide content
- Examples: Google, Akamai, Yahoo, YouTube, Facebook…
- Requested by end customers / Internet application
Reachability across autonomous systems
Reachability
Main problem
- How to ensure mutual reachability?
- Cooperation among autonomous systems?
Basic concepts
- Transit: Purchased connectivity 💸
- Peering: Direct connection, typically between ASes of the same tier
Connectivity and Transit
Establish connectivity
Establish paths to all other ASes in the Internet
AS operator purchases connectivity from one or more ASes
Transit
Internet transit is the service of allowing network traffic to cross or “transit” a computer network, usually used to connect a smaller Internet service provider (ISP) to the larger Internet. (wiki)
Purchased connectivity 💸
- Upstream: provider (seller) of transit
- Downstream: customer (buyer)
Traffic exchange
In both directions
Only downstream AS must pay; usually volume rate
Transit AS: Provider AS that offers transit
Options for connecting a stub AS
Stub AS
Dualhomed stub AS
Multihomed stub AS
Peering
Private peering
Direct connection between two ASes, usually of same tier
No cost for traffic exchange; costs for network infrastructure apply
However
- Mostly only data traffic between privately peered ASes
- NO transit traffic of other ASes
Video explanation
Example: peering and transit combination
👍 Advantages
- Benefits both ASes: save transit costs, that otherwise would apply
- Shorter data paths: fewer AS hops between source and destination
🔴 Problems
- Direct connection of ASes complicated (Different geographical locations)
- Full mesh of $n$ ASes ($\frac{(n-1)n}{2}$ separate connections!)
Public peering
Through Internet exchange points (IXPs)
- Central public authority for interconnection
Members / customers: Monthly fixed charges per network port
Necessary for operation and maintenance of IXP‘s switching platform
- Different peering policies
- Open: AS is open for peering with all other ASes
- Selective: Peering only under given terms and conditions
- Restrictive: AS does not engage in new peering relationships
- No Peering: AS does not do any peering
- Different peering policies
Autonomous Systems and Transit/Peering
Tier 1
- Large global ASes with access to (all) other ASes
- Do not buy any transit. Sell transit
- Peering with other tier 1 ASes
- Examples: Deutsche Telekom, AT&T…
- Large global ASes with access to (all) other ASes
Tier 2
- Big national and inter-regional ASes
- Connection to providers of Internet applications
- Downstream of tier 1 ASes
- Sell transit to other ASes
- Usually employ peering
- Examples: Vodafone, Comcast, Tele2
- Big national and inter-regional ASes
Tier 3
- Small mostly regional ASes
- Connections with small providers of Internet applications
- Downstream of tier 2 providers
- Usually do not sell transit to other ASes
- Sell transit mostly to end customers/users
- Usually employ peering
- Examples: KabelBW, NETHINKS, Alice
- Small mostly regional ASes
Content Delivery Provider
🎯 Goal: FAST delivery of content (i.e. low latencies)
$\rightarrow$ Locations close to tier 1 peering points are preferred
Two basic alternatives
- Web servers are hosted directly in tier 1 ASes (Does not require an own AS number)
- Web servers are connected over own routers
Own AS number required
Peering with essential providers at important peering points
Examples: Google, Yahoo, Akamai
Content Delivery Network
World wide network with own AS number
- Thousands of Points of Presence (PoP) spread across the world
Point of Presence
Consists of access routers und core routers
Access router at the edge of a CDN
Core router inside a CDN
Customers are connecting through access routers
Objectives
- Load balancing at access routers
- Be close to customers $\rightarrow$ low latencies
Routing in and between Autonomous Systems
Classification
- Interior gateway protocols (IGPs) INSIDE one AS
A.k.a intra-domain routing protocols
Are encapsulated inside an AS, i.e., not visible to the outside
Different IGPs in different ASes possible
Metric-based
- Exterior gateway protocols (EGPs) BETWEEN ASes
Also named inter-domain routing protocols
Single protocol between all ASes
Policy-based
RIP: Routing Information Protocol
- Interior gateway protocol
- Very simple protocol that requires very little configuration
RIP in the Protocol Stack
- Application process routed implements RIP and manages forwarding table
- RIP routing messages are sent over UDP $\rightarrow$ NOT reliable
Routing Metric
Distance between source and destination = number of hops on the path (hop count)
Hop count
Refer to the number of intermediate devices through which data must pass between source and destination.
- Each time that a packet of data moves from one router (or device) to another, that is considered one HOP.
Limited range of values: 1 - 15
Value of 16 corresponds to “infinity”
RIP Routing Messages
- RIP protocol entities exchange routing messages
- UDP is used as transport protocol
- Types of routing messages
- Request message
- Requires complete routing table or part of it
- Response message for different reasons
- Response to specific query
- Regular update
- Triggered update
- Request message
Routing Updates
Outgoing
- Regular routing update
- Periodically, every 30 seconds
- Sends entire routing table to all its neighbors
- Entries in the routing table are periodically refreshed
- No refresh for at least 180 seconds? $\rightarrow$ Hop-Count is set to 16 („infinite“), corresponding route is invalidated
- Metric for route changes (triggered update)
- Only changes since the last update are communicated, not the complete routing table
- Rate limitation in order to reduce load on the network
Incoming
Entry for a destination address does not exist in routing table and received metric is not „infinite“ $\rightarrow$ Insert new entry in routing table
Current entry for a destination address in routing table has larger metric or routing update was sent by the “next router” for this destination $\rightarrow$ Modify entry
Otherwise $\rightarrow$ Ignore routing update
Example
Scenario
- Connecting lines represent either direct links or LANs between routers
- Ovals represent routers
We have the routing table of router D
30 seconds later D receives new routing update from Router A
- A tells D: “Hey, now I can reach Z through 4 hops”.
- I.e., now D can reach Z through $4+1=5$ hops
As 5 < 7 (the old number of hops to reach Z), D updates its routing table:
OSPF: Open Shortest Path First
OSPF Basics
Interior gateway protocol
Link state protocol
Each router in the network needs to learn complete topology of the network (Otherwise, calculated paths are inconsistent)
- Topology = Nodes and links with their costs (weights)
Each router separately computes shortest paths based on network topology
- Dijkstra shortest path algorithm
OSPF in the Protocol Stack
OSPF is located on top of IP $\rightarrow$ OSPF uses an unreliable communication service
Know the Neighbors
Each router
- learns its neighbors and
- monitors the state of the links to them
Link States of a Router
Router ID of neighbors: dynamically discovered by hello protocol
Availability: dynamically discovered by hello protocol or physical layer
Everything else is configured
Pre-Configuration of OSPF Router
Each router is pre-configured with the following parameters
Router ID: unique ID of a router in the network
Per-interface parameters
- Interface IP address (and mask)
- Interface output cost – metric
- Typically, inversely proportional to link data rate
Routing Metric
Each link is associated with link costs
Example: prefer links with higher data rate
$$ \text { Cost }=\frac{\text { Reference Data Rate }}{\text { Interface Data Rate }} $$- $\text{Reference Data Rate}$ can be configured
- E.g., to 1 $𝐺𝑏𝑖𝑡/𝑠$ or 10 $𝐺𝑏𝑖𝑡/𝑠$
- Should be consistent across all routers in network
Link State Advertisement (LSA)
Each router constructs router link state advertisements (LSAs)
- Router LSAs consist of information about its neighbors and links
- Example
Router floods its LSA on all its interfaces $\rightarrow$ All routers in the network must receive an identical copy of this LSA
Link State Database
Each router maintains a link state database
- Stores most recent LSAs from all other routers in the network
Link state database is used to
- Construct topology graph of the network
- Calculate routing table
Initial Synchronization of link state database
(Re-)start of a router
New router has an empty link state database
Initial database synchronization
Router asks neighboring router to share its database Performed
immediately after a “handshake” of the hello protocol
Routers exchange LSA headers with each other
If an LSA is missing it is requested from the neighbor router
$\rightarrow$ the routers are now considered as adjacent to each other
Link State Advertisement
Example
Each LSA is associated with a lifetime (LS Age
)
Set to “0” by advertising router
- When flooded, incremented by transmission delay (estimated value)
- As LSA is stored in database, Age is incremented over time
When LSA’s age reaches
MaxAge
, LSA is considered out-of-dateMaxAge
is set to 1 hour
Consequence: routers must refresh their LSAs every
LSRefreshTime
LSRefreshTime
is set to 30 minutes- Minimum value between generation of any particular LSA: 5 seconds
Hello Protocol
🎯 Goals
- Ensure bi-directional communication between neighboring OSPF routers
- Establish and maintain logical adjacencies
Determines identity and liveliness of neighboring routers
Hello Message
Contains own
router ID
Contains
router ID
of neighboring router, if known- If not yet known $\rightarrow$
router ID
is set to0.0.0.0
- If own
router ID
is contained in neighbor’s hello message $\rightarrow$ Communication is considered to be bi-directional
- If not yet known $\rightarrow$
Destination IP address of hello message:
224.0.0.5
(multicast address, “AllSPFRouters
”)$\rightarrow$ hello message is received and processed only by OSPF routers
Simplified Workflow
A router periodically sends a hello message on all its links
- “Hello, I am R1, I am still here”
- If known, hello message contains
router ID
of neighboring router- “my neighbor on this link is R2”
If no hello message is received for a pre-defined period of time $\rightarrow$ the link is considered to be down 🤪
- Standard value for periodic hello messages: every 10-30 seconds
- Fast hello extension: < 1 second
OSPF Message
Header of OSPF messages
- Version: OSPF Version, currently 2 for IPv4 and 3 for IPv6
- Type
- Hello
- database description
- link state request
- link state update
- link state acknowledgement
- Router ID: ID of originating router
- Area ID: OSPF area
- Checksum: Internet checksum over entire OSPF message
- AUType and Authentication: Optional authentication of originating router
Link State Update Message
Structure of a Link State Advertisement
- Consists of a header and a body
- LSA header: contains information used to uniquely identify the LSA
- Advertising router
- Sequence number of LSA at advertising router
- …
- LSA body: contains information of all operational links of the router
- Associated cost
- Type of link
- Reachability information
- …
LSA Header
LS Age
Time in seconds since LSA was originated
Options Optional capabilities supported by OSPF domain
LS Type Router LSA, network LSA …
Link State ID Uniquely identifies an LSA
Advertising Router OSPF router ID of originating router
LS Sequence Number Incremented each time a new LSA is generated
Checksum Over entire message exept age field
Length #bytes for entire LSA including header
Coping with Dynamic Changes
Issuing LSAs
- If nothing changes (link, router), nothing needs to be reported with respect to routing $\rightarrow$ keep quiet
- LSAs are refreshed every 30 minutes
- Besides periodic refreshes, communication is only needed in case of changes
- Interface changed to up or down
- Neighboring router on link is unreachable
- Configuration changes
- Minimum time between two consecutive LSAs of a router is set to 5 seconds (Due to stability reasons)
Synchronized Link State Databases
🎯 Goal: link state databases of all routers need to have identical content (need to be synchronized)
Following actions are needed
- Ensure that each LSA is received by every router in the network (reliable flooding)
- Ensure that all routers consistently either store or discard each LSA $\rightarrow$ fully deterministic comparison rules
- Ensure that expired LSAs are deleted from link state databases of every router
Reliable Flooding
Reception of each LSA is acknowledged by neighboring router
- Hop-by-hop acknowledgements
Router R stores received LSA
If R does not have an LSA from the advertising router
If the received LSA is newer than the one in the database
def is_new_LSA(received_LSA, cur_stored_LSA, MAX_AGE=60): if received_LSA.sequence_Nr > cur_stored_LSA.sequence_Nr: return True elif received_LSA.sequence_Nr == cur_stored_LSA.sequence_Nr: if received_LSA.checksum > cur_stored_LSA.checksum or cur_stored_LSA.age == MAX_AGE or received_LSA.age < cur_stored_LSA.age: return True else: return False
If R stores the LSA, it forwards it to its neighbors
- Uses multicast address
224.0.0.5
with hop limit of 1
- Uses multicast address
LSA Flooding Example
Router R receives LSA from advertising router R1
if
received_LSA.age == MAX_AGE
and no LSA from R1 is known$\rightarrow$ Send ACK and discard
If there is no LSA from R1 in database or received LSA is newer $\rightarrow$
Store/replace LSA
Send ACK
Update Age and flood LSA to neighbors
If already stored copy is newer
$\rightarrow$ Send stored copy back to advertising router R1
If LSA and stored copy are the same
$\rightarrow$ Discard LSA
Re-compute routes if content of link state database changed
OSPF Areas
Basic situation: Autonomous systems can grow rather large
Scalability problem
- LSA flooding and
- Route computation overhead
$\rightarrow$ do NOT scale 🤪
🔧 Solution: Divide an AS into areas (i.e., introduce additional level of hierarchy)
- Apply routing only within an area
- LSA flooding and route computation limited to an area
- Only routers within the same area have identical link state databases.
- Areas exchange summary information with each other
- Addresses reachable from these areas
- Typical size of an area: <100 routers
- Apply routing only within an area
OSPF Areas structure
- Two levels of hierarchy
- Area 0 – backbone of the autonomous system Backbone must be always connected
- All other areas are directly connected to backbone
- Area border routers (ABRs) interconnect areas
They belong to both: their area and the backbone
They run an instance of OSPF for each area they are connected to
They generate summary LSAs
- Contain ABR’s routing table for corresponding area
- List of destinations reachable within the area
- Associated with path cost from the ABR to destination
- ABR ́s routing table is constructed after intra-area path computation
- Contain ABR’s routing table for corresponding area
Handle summary LSAs: Same way as “regular” LSAs
- ABR forward summary LSAs of an area into backbone
- ABR forward summary LSAs from backbone into the area
Inter-Area Forwarding
Data between areas are forwarded through backbone (area 0)
End-to-end path consists of path segments
Segment between source and ABR of originating area
Segment between two ABRs in area 0, and
Segment between ABR of target area and destination
Routers within an area select ABRs so that resulting end-to-end path is a shortest path
Based on path costs of ABRs
Example
RIP vs. OSPF
RIP: based on distance vector
🔴 Problems
- Limited in metric selection and size
Only one metric (hop count)
Maximum path length of 15 hops
- Periodic updates every 30 seconds, even without changes
- Slow convergence, count-to-infinity $\rightarrow$ Not suitable for large networks 🤪
- Limited in metric selection and size
👍 Advantage: easier and requires less resources than OSPF
- Still sometimes used in small networks
OSPF: based on link-state
- Addresses shortcomings of RIP
- Faster convergence, no count-to-infinity, lower signaling overhead … 👏
- Large networks can be divided into areas
- Standard in large ASes (together with IS-IS)
BGP: Border Gateway Protocol
Good explanation:
Exterior Gateway Protocols
In aforementioned section, we have devided a large networks into different autonomous systems (ASes). In order to make autonomous systems to be able to communicate with each other, there should be at least one special intermediate system that serves as an interface to other ASes.
👍 Advantages:
- Scalability
Size of routing tables depends on size of AS
Changes in routing tables are only propagated within an AS
- Autonomy
- Internet = network of networks
- Routing can be controlled in the own network
Uniform interior routing protocol within the AS
Interior routing protocols of different ASes do not have to be identical
Border Gateway Protocol
- The most important exterior gateway protocol
- Path vector protocol
- Extension of distance vector approach
- BGP distributed paths, not metrics like costs etc.
- With paths it is easy to guarantee that no loops exist
- Based on policies of network operator
BGP in a Nutshell
What exactly is being distributed?
- Paths (also called routes) that consist of
- Target: prefixes (also called: network, network prefixes, IP address ranges)
- Attributes: path, next hop
- Each traversed AS adds its own AS number to the path
- Paths (also called routes) that consist of
Traffic “follows” UPDATE messages in opposite direction
Example: HW07
AS 100 announces prefix 1.6.17.0/24
. Describe how the routing information is distributed in the network.
The other two UPDATE messages (sent from R1 to R31 and R21) are handled in a similar way.
BGP Structure
External BGP (EBGP)
- Spoken between BGP routers of neighboring ASes
- Announcement and forwarding of path information
- Internal details of AS are NOT exchanged
Internal BGP (IBGP)
- Between BGP routers within an AS
- Synchronization of BGP routers of an AS
- Establishment of transit routes
Categorization of Routing Protocols
Interplay of the Routing Approaches
Routing with BGP and IGPs
Assume Alice wants to sent a packet to an external target ( not part of the local IGP domain, e.g., 2.3.4.5
).
How does IGP router know what to do with this packet?
Is not strictly prescribed by BGP
Network operators can configure this freely
Different approaches possible
Approach 1: IGP distributes “default” routes
Unknown address/prefix packets are routed to default BGP router via shortest path
- Good option for stub ASes
- Not practicable for transit ASes
Example
Approach 2: Publication of external routes via IGP
Allows more fine-grained control such as „all Google traffic goes this way“
Cannot be done with all external routes (scalability!)
Usually combined with default route
Example
Approach 3: IGP router also speaks BGP
Forwarding table is build from two routing tables (BGP + IGP)
Often the case with big backbone providers
Example
BGP-Sessions
Point-to-point
- Usually only between directly connected routers
- Neighbors are called “peers”
- BGP uses TCP connections between these routers
- Usually only between directly connected routers
How to establish TCP connection without working IP routing?
- IBGP: IGP of AS can be used
- EBGP
- Usually direct physical connection $\rightarrow$ no routing required
- Manual configuration at both ends of connection
IBGP Connections
Simplest case: all BGP routers are fully meshed and connected directly to each other
- BGP sessions must be kept active all the time
- Bad scalability 👎
Alternative: Concentrate IBGP traffic in a single router
- Called route reflector
- Only route reflector has to maintain sessions with everyone else
- More than one reflector used in practice for reliability reasons
Alternative: Form hierarchies of sub ASes
Called AS confederations
Can also be used to implement more complex policies
Confederation appears to outside as single AS
BGP Messages
OPEN
Establishment of BGP connection to peer BGP router
- Important: TCP connection must already exist!
Authentication
UPDATE
- Announcement of new or withdrawal of outdated path
- Attention: Only sent if new, better paths available
KEEPALIVE
Keeps connection alive in absence of UDPATE messages
Acknowledgment for an OPEN request
Recommended KeepAliveTimer: 30 s
NOTIFICATION
- Error message and tear down of BGP connection
Routing Information Base
BGP provides mechanisms for distributing path information
- Does NOT dictate how routes should be chosen
- No predefined routing metric
BGP uses policies
BGP instance of a router collects received and dispatched routing information in various internal tables
- Routing Information Base (RIB)
- Mainly for logical structuring
Structure
Adj-RIB-In (Adjacency RIB Incoming)
- Exists per peer
- Stores information received from this peer
Loc-RIB (Local RIB, Routing Information Base)
- „Actual routing table“
- Only preferred (= best=shortest) routes to destination networks are included here
- Forwarding Information Base (FIB) is build based on Loc-RIB
- „Actual routing table“
Adj-RIB-Out (Adjacency RIB Outgoing)
- Exists per peer
Contains routes published to this peer
Routing Table example: HW08
AS 100 announces prefix 1.6.17.0/24
. Fill out the simplified routing table of R5
🔴 Challenges
BGP “struggles” with many challenges and problems, e.g.,
Maintaining scalability
Security problems