TCP Evolution | Haobin Tan

TCP Evolution

TCP Extensions

TCP Options: Basics

TCP Header

TCP Options

🎯 Goal: Flexibility for new developments
TCP header field
- Each option is coded in TLV format (Type-Length-Value)
- Has variable but limited length
  - number of options is limited (max. 40 bytes)
  - TCP header length at most 60 bytes in total (incl. options)
TLV format
- Multiple of 32 bit words (If not padding is needed)
- Type
  - Selective acknowledgements
  - Time stamps
  - Window scaling
  - Maximum segment size
  - Multipath TCP
  - TCP fast open
  - …
- Length: Length of option
- Value: Option data

Option Selective Acknowledgements

TCP uses cumulative acknowledgements
- 👍 Pro: Very robust against loss of ACK segments
- 👎 Cons: Inefficient loss recovery
  - Sender can only learn about a single lost segment per RTT
  - Consequently
    - Fast retransmit/fast recovery can only recover one lost segment
      per RTT
    - Multiple losses often lead to retransmission timeouts and head-of-line blocking
Improvement: selective acknowledgements (SACK)
- Also acknowledge “out-of-order” data
- Implemented as TCP option
💡 Idea: Separately acknowledge continuous blocks of out-of-order data
Usage of SACK option negotiated during connection establishment
SACK option format
- Typically, only 2-4 blocks can be “SACKed” in one segment
Case
Handling:
- Use first entry of SACK option to report new information
- Use subsequent entries of SACK option for redundancy Used for redundancy,
  - if prior ACKs were lost
  - Should repeat most recently sent first blocks
Different alternatives
Example

Option Window Scaling

Header field receive window remains unchanged (16 bit)
Scaling factor can be changed
- E.g., measure window size in 32 bit words instead of bytes
Option is negotiated during connection establishment
- Within SYN and SYN/ACK segments
Scaling factor remains unchanged during lifetime of a TCP connection

Extension SYN Cookies

Multipath TCP (MPTCP)

Motivation
🎯 Goal: Extension of TCP for parallel usage of multiple paths within a single TCP connection
- Improves reliability
- Increases performance
Important requirements
- Application compatibility
- Network compatibility
Challenges
- Middleboxes

Connection vs. Subflow

MPTCP connection
- Communication relation between sender and receiver
- Consists of one or multiple MPTCP subflows
MPTCP subflow
- Flow of TCP segments operating over an individual path
- Started and terminated like a „regular“ TCP connection
  - Started with 3-way handshake
  - Closed with FIN or RST
- Can be dynamically added and removed to/from an MPTCP connection

Embedding into Protocol Stack

Connection Establishment

3-way handshake of TCP

TCP option MP_CAPABLE

X, Y: token for client and server
- Identification for subsequent addition/removal of subflows

Adding a Subflow

TCP option MP_JOIN

3-way handshake of TCP
Use tokens exchanged during MPTCP connection establishment

Sequence Numbers

Each MPTCP segment carries two sequence numbers

Data sequence number for overall MPTCP connection
Subflow sequence number for individual flow
- Each subflow has coherent sequence numbers without „holes“

Congestion Control

🎯 Goals of MPTCP
- Improve throughput
  Multipath flow should perform at least as well as a single path congestion control would on the best available path
- Do not harm
  Multipath flow should not take up more capacity from any of the resources shared than if it were a single flow
- Balance congestion
  A multipath flow should have as much traffic as possible off its most congested paths
Congestion Control algorithm only applies to increase phase of congestion avoidance
- Unchanged: slow start, fast retransmit, fast recovery and multiplicative decrease
Different congestion windows
- $CWnd\_i$ per subflow $i$
- $CWnd\_{total}$ per MPTCP connection (multipath flow)
Assumption: Congestion window maintained in bytes
Basic approach: Couple congestion control of different subflows
Linked increase (congestion avoidance)
For each ACK received on subflow $i$ , increase $CWnd\_i$ by
$\min \left( \underbrace{\frac{\alpha * \text { bytes }\_{\text {acked }} * M S S\_{i}}{C W n d_{\text {total }}}}\_{\text{ Increase for multipath subflow }}, \underbrace{\frac{\text { bytes }\_{\text {acked }} * M S S\_{i}}{C W n d\_{i}}}\_{\text{ Increase „regular“ TCP would get in same scenario }}\right)$
(any multipath subflow cannot be more aggressive than a TCP flow in the same circumstances (do not harm))
- $\alpha$ : Describes aggressiveness of multipath flow $\alpha=C W n d\_{\text {total }} \cdot \frac{\max \_{i}\left(\frac{C W n d\_{i}}{R T T\_{i}^{2}}\right)}{\left(\sum \frac{C W n d\_{i}}{R T T\_{i}}\right)^{2}}$

TCP in Networks with High BDP

Scalability Issues

It can take very long until the available data rate is fully utilized
Cause
- Very conservative behavior of congestion avoidance
  - Congestion window grows by one MSS per RTT
  - Slow window growth in congestion avoidance causes low average data rate
  ➡️ NOT efficient in networks with high bandwidth-delay products
Require faster increase of the congestion window in congestion avoidance

Faster Increase of Congestion Window

🎯 Goals
- High resource utilization in networks with high bandwidth delay product
- Quick reactions to changes of the situation within the network
- Fairness with respect to other TCP variants
Different types of fairness
- intra protocol fairness
  - All senders use same TCP variant
  - Goal: All flows should achieve same data rate
- With new TCP variants: inter protocol fairness
- Furthermore: RTT fairness
  - Fairness among TCP flows with different RTTs

CUBIC TCP

🎯 Goals
- Provide simple algorithm for networks with high bandwidth-delay product
- TCP-friendly
  Behaves like standard TCP (i.e., TCP Reno) in networks with short RTTs and small bandwidth
- Congestion avoidance
  Applies cubic function instead of linear window increase
- Performance should not be worse than TCP Reno
In comparison to TCP Reno
- Better RTT fairness (Window growth independent of RTT)
- Better scalability to high data rates
Currently default congestion control in all major operating systems

Congestion Window Increase

Independent from RTT
- Use of actual time $t$ that has passed since last congestion incident. I.e. Window growth depends on time between consecutive congestion events
- Apply cubic function
  $W(t)=C(t-K)^{3}+W_{\max } \quad \text { with } \mathrm{K}=\sqrt[3]{\frac{W_{\max }(1-\beta)}{C}}$
  - $C$ : predefined constant that determines aggressiveness of increase
  - $W\_{max}$ : congestion window size at latest congestion incident
  - $K$ : time period that it takes to increase current window to $W\_{max}$ (in case of no further congestions)
  - $\beta$ : multiplicative decrease of congestion window
    - $\beta = 0.5$ for TCP-Reno
    - $\beta = 0.7$ for CUBIC TCP

Congestion Window over Time

Example

Three CUBIC Modes

TCP-friendly region
- Ensures that CUBIC achieves at least same data rate as standard TCP in networks with small RTT
- Observation: in networks with small RTTs, Cubic ́s congestion window grows slower than with TCP Reno
- Approach: “emulation” of TCP Reno (which uses AIMD)
- $AIMD(\alpha, \beta)$
  - $\alpha$ : additive increase factor
    $W = W + \alpha$
  - $\beta$ : multiplicative decrease factor
    $W = \beta \cdot W$
  TCP Reno uses $AIMD(1, \frac{1}{2})$
- TCP-fair increment
  $\alpha=3 \cdot \frac{1-\beta}{1+\beta}$
  - Achieves same $W\_{avg}$ as $AIMD(1, \frac{1}{2})$
  - Average data rate of AIMD
    $W\_{avg} = \frac{1}{R T T} \sqrt{\frac{\alpha \cdot(1+\beta)}{2 \cdot(1-\beta) \cdot p}}$
    - $p$ : loss rate
- Window size of emulated TCP at time $t$
  $W\_{T C P}=W\_{\max } \cdot \beta+\frac{3 \cdot(1-\beta)}{1+\beta} \cdot \frac{t}{R T T}$
- Recall window size of TCP cubic
  $W(t)=C(t-K)^{3}+W_{\max }$
$\Rightarrow$ Rule
- $W\_{Cubic} < W\_{TCP}$ , then $CWnd$ is set to $W\_{TCP}$ each time an ACK is received
- otherwise, $CWnd$ is set to $W\_{Cubic}$ each time an ACK is received
Concave region: $CWnd < W\_{max}$ and not in TCP-friendly region
- For each received ACK $CWnd = CWnd+\frac{W\_{cubic}(t+R T T)-CWnd}{C W n d}$
Convex region: $CWnd > W\_{max}$ and not in TCP-friendly region
- $CWnd$ is increased very carefully
- searching for new 𝑊𝑚𝑎𝑥

TCP and Response Time

Basic Issue

Response time
- Time between initiation of a TCP connection and receipt of the requested data
- Important components
  - Handshake of TCP connection establishment
  - Slow start
  - Transmission of the object
- Macroscopic Model
  - Response time without applying congestion control
    - After 1st RTT: Client sends object request
    - After 2nd RTT
      - Client begins to receive object data
      - Receiver needs
        $t = \frac{\text{object size } O}{\text{data rate } D}$
    $\Rightarrow$ lower bound:
    $\text{Response time} \geq 2 RTT + \frac{O}{D}$
    ( With small objects, response time dominated by $RTT$ s)
Used Variables
- $RTT$ : round trip time [Seconds]
- $MSS$ : maximum segment size [bit]
- $W$ : Size of congestion window [MSS], given as multiples of MSS
- $O$ : Size of object that has to be transferred [bit]
- $D$ : Data rate [bit/s]
Observation
- $RTT$ s have significant influence on response time
- On connection establishment: 2 $RTT$ 𝑠 until reception of object begins
- During object transmission
  - Small windows create pauses: waiting for ACKs
- Majority of TCP connections in the Web has short lifetime
  $\rightarrow$ Slow start has significant impact on response time
🎯 Goals
- Avoid „empty“ RTTs without data transport
- Reduce RTTs needed for slow start

Bigger Initial Congestion Window

💡 Idea: Increase initial congestion window (IW)

at least 10 segments, thus, about 15 Kbytes

TCP Fast Open

🎯 Goal: Reduce delays that precede the transmission of an object
TCP Cookie
- Goal
  - Avoid DoS attacks
  - Disallow sending data within first SYN segment of first connection establishment to a server
  - Establish cookie for subsequent connections
- Use cookie $\rightarrow$ avoid state keeping at server
- Basic steps
  1. Client requests TFO cookie from server
  2. Client uses TFO cookies in subsequent TCP connections

HTTP/2

QUIC

Last updated on 2024-09-07

← Data Center 2021-03-19

Access Networks 2021-03-24 →