TCP Evolution


TCP Extensions

TCP Options: Basics

TCP Header


TCP Options

  • 🎯 Goal: Flexibility for new developments

  • TCP header field

    • Each option is coded in TLV format (Type-Length-Value)
    • Has variable but limited length
      • Total size of all options is limited (max. 40 bytes)
      • TCP header length at most 60 bytes in total (incl. options)
  • TLV format


    • Options must fill a multiple of 32-bit words (otherwise padding is needed)

    • Type

    • Length: Length of option

    • Value: Option data
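
The TLV layout above can be illustrated with a minimal parser sketch. It assumes the options area is already available as raw bytes. The two single-byte kinds (0 = End of Option List, 1 = No-Operation, used for padding) and the example option values are not taken from these notes; the function name `parse_tcp_options` is hypothetical.

```python
from typing import List, Tuple

# Kinds 0 (End of Option List) and 1 (No-Operation) are single bytes
# without length/value fields; all other kinds use the TLV layout.
KIND_EOL = 0
KIND_NOP = 1


def parse_tcp_options(raw: bytes) -> List[Tuple[int, bytes]]:
    """Split the options area of a TCP header into (type, value) pairs."""
    if len(raw) > 40:
        raise ValueError("TCP options area is limited to 40 bytes")
    options = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == KIND_EOL:      # everything after this byte is padding
            break
        if kind == KIND_NOP:      # one-byte padding/alignment option
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated option: missing length byte")
        length = raw[i + 1]       # counts the type, length and value bytes
        if length < 2 or i + length > len(raw):
            raise ValueError("invalid option length")
        options.append((kind, raw[i + 2:i + length]))
        i += length
    return options


# Illustrative input: MSS option (type 2, length 4, value 1460), then
# SACK-permitted (type 4, length 2), padded with two NOPs to 8 bytes.
example = bytes([2, 4, 0x05, 0xB4, 4, 2, 1, 1])
parsed = parse_tcp_options(example)
```

The length byte covers the whole option, which is why the value is sliced from `i + 2` up to `i + length`.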

Option Selective Acknowledgements

  • TCP uses cumulative acknowledgements

    • 👍 Pro: Very robust against loss of ACK segments

    • 👎 Con: Inefficient loss recovery

      • Sender can only learn about a single lost segment per RTT

      • Consequently

        • Fast retransmit/fast recovery can only recover one lost segment per RTT

        • Multiple losses often lead to retransmission timeouts and head-of-line blocking

  • Improvement: selective acknowledgements (SACK)

    • Also acknowledge "out-of-order" data
    • Implemented as TCP option
  • 💡 Idea: Separately acknowledge contiguous blocks of out-of-order data

  • Usage of SACK option negotiated during connection establishment


  • SACK option format


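    • Typically, only 2-4 blocks can be "SACKed" in one segment, because the options area is limited to 40 bytes and each block consists of two 32-bit sequence numbers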
  • Case


    Handling:

    • Use first entry of SACK option to report new information
    • Use subsequent entries of SACK option for redundancy
      • Helps if prior ACKs were lost

      • Should repeat the most recently sent first blocks

  • Different alternatives


  • Example

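
The receiver-side handling described above can be sketched as follows. This is a simplified illustration, not a complete implementation: it ignores sequence number wrap-around, the class name `SackReceiver` and the default of three blocks are assumptions, and the segment sizes in the usage note are made up.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (left edge, right edge), right edge exclusive


class SackReceiver:
    """Tracks the cumulative ACK and the out-of-order blocks of a receiver."""

    def __init__(self, next_expected: int, max_blocks: int = 3) -> None:
        self.next_expected = next_expected   # cumulative ACK number
        self.max_blocks = max_blocks         # assumed limit per segment
        self.blocks: List[Block] = []        # most recently changed block first

    def on_segment(self, left: int, right: int) -> Tuple[int, List[Block]]:
        """Process bytes [left, right) and return (ACK number, SACK blocks)."""
        kept: List[Block] = []
        for b_left, b_right in self.blocks:
            if b_left <= right and left <= b_right:
                # Overlapping or adjacent: grow the new block.
                left, right = min(left, b_left), max(right, b_right)
            else:
                kept.append((b_left, b_right))
        if left <= self.next_expected:
            # The block closes the gap: advance the cumulative ACK instead.
            self.next_expected = max(self.next_expected, right)
            self.blocks = kept
        else:
            # First entry reports the new information, older entries follow.
            self.blocks = [(left, right)] + kept
        return self.next_expected, self.blocks[: self.max_blocks]
```

For example, with 100-byte segments starting at sequence number 1000, losing the segments at 1100 and 1300 leaves the cumulative ACK at 1100 while the blocks 1200-1300 and 1400-1500 are reported, newest first. Once the retransmission of 1100-1200 arrives, the cumulative ACK jumps to 1300.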

Option Window Scaling

  • The receive window header field remains unchanged (16 bit)
  • The unit in which the window is measured can be scaled by a negotiated factor
    • E.g., measure window size in 32-bit words instead of bytes
  • Option is negotiated during connection establishment
    • Within SYN and SYN/ACK segments
  • Scaling factor remains unchanged during lifetime of a TCP connection
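
A small sketch of how the scaled window could be computed. It assumes the scaling factor is expressed as a shift count (a power of two) with a maximum of 14, and that scaling is only active if both the SYN and the SYN/ACK carried the option; these details and the function names are not taken from these notes.

```python
from typing import Optional, Tuple

MAX_SHIFT = 14  # assumed upper bound for the shift count


def negotiate(syn_shift: Optional[int],
              synack_shift: Optional[int]) -> Tuple[int, int]:
    """Return (client shift, server shift) fixed for the whole connection.

    None means the option was absent; then neither side scales.
    """
    if syn_shift is None or synack_shift is None:
        return 0, 0
    return min(syn_shift, MAX_SHIFT), min(synack_shift, MAX_SHIFT)


def effective_window(header_value: int, shift: int) -> int:
    """Scale the unchanged 16-bit header field to a window size in bytes."""
    if not 0 <= header_value <= 0xFFFF:
        raise ValueError("receive window field is 16 bits wide")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count out of range")
    return header_value << shift


# A shift of 2 corresponds to counting the window in 32-bit words.
window_in_words = effective_window(0xFFFF, 2)
```

Because the shift is fixed during the handshake, every later segment can be interpreted with the same factor.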

Extension SYN Cookies

Multipath TCP (MPTCP)

  • Motivation

  • 🎯 Goal: Extension of TCP for parallel usage of multiple paths within a single TCP connection

    • Improves reliability
    • Increases performance
  • Important requirements

    • Application compatibility
    • Network compatibility
  • Challenges

    • Middleboxes

Connection vs. Subflow

  • MPTCP connection
    • Communication relation between sender and receiver
    • Consists of one or multiple MPTCP subflows
  • MPTCP subflow
    • Flow of TCP segments operating over an individual path
    • Started and terminated like a "regular" TCP connection
      • Started with 3-way handshake

      • Closed with FIN or RST

    • Can be dynamically added and removed to/from an MPTCP connection

Embedding into Protocol Stack


Connection Establishment

3-way handshake of TCP


TCP option MP_CAPABLE

  • X, Y: token for client and server
    • Identification for subsequent addition/removal of subflows

Adding a Subflow


TCP option MP_JOIN

  • 3-way handshake of TCP
  • Use tokens exchanged during MPTCP connection establishment

Sequence Numbers

Each MPTCP segment carries two sequence numbers

  • Data sequence number for overall MPTCP connection
  • Subflow sequence number for individual flow
    • Each subflow has contiguous sequence numbers without "holes"
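
The two sequence number spaces can be illustrated with a toy sender. This is a simplified sketch: all numbers start at 0 (real implementations would use random initial values), and the class names and subflow names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    subflow: str
    data_seq: int      # position in the byte stream of the MPTCP connection
    subflow_seq: int   # position in the byte stream of this subflow
    length: int


class MptcpSender:
    """Assigns both sequence numbers to each outgoing segment."""

    def __init__(self, subflows: List[str]) -> None:
        self.next_data_seq = 0
        self.next_subflow_seq: Dict[str, int] = {name: 0 for name in subflows}

    def send(self, subflow: str, length: int) -> Segment:
        segment = Segment(subflow, self.next_data_seq,
                          self.next_subflow_seq[subflow], length)
        self.next_data_seq += length             # connection-level counter
        self.next_subflow_seq[subflow] += length  # per-subflow counter
        return segment


sender = MptcpSender(["path_a", "path_b"])
first = sender.send("path_a", 1000)
second = sender.send("path_b", 1000)
third = sender.send("path_a", 500)
```

On `path_a` the subflow sequence numbers run from 0 to 1500 without a gap, although the data sequence numbers carried on that path skip the 1000 bytes sent over `path_b`.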

Congestion Control

  • 🎯 Goals of MPTCP

    • Improve throughput

      Multipath flow should perform at least as well as a single path congestion control would on the best available path

    • Do not harm

      Multipath flow should not take up more capacity from any of the resources shared than if it were a single flow

    • Balance congestion

      A multipath flow should move as much traffic as possible off its most congested paths

  • Congestion Control algorithm only applies to increase phase of congestion avoidance

    • Unchanged: slow start, fast retransmit, fast recovery and multiplicative decrease
  • Different congestion windows

    • $CWnd_i$ per subflow $i$
    • $CWnd_{total}$ per MPTCP connection (multipath flow)
  • Assumption: Congestion window maintained in bytes

  • Basic approach: Couple congestion control of different subflows

  • Linked increase (congestion avoidance)

    For each ACK received on subflow $i$, increase $CWnd_i$ by

    $$\min\left( \underbrace{\frac{\alpha \cdot bytes_{acked} \cdot MSS_i}{CWnd_{total}}}_{\text{increase for multipath subflow}},\ \underbrace{\frac{bytes_{acked} \cdot MSS_i}{CWnd_i}}_{\text{increase "regular" TCP would get in same scenario}} \right)$$

    (No multipath subflow can be more aggressive than a TCP flow in the same circumstances: do not harm)

    • $\alpha$: describes the aggressiveness of the multipath flow

      $$\alpha = CWnd_{total} \cdot \frac{\max_i \left( \frac{CWnd_i}{RTT_i^2} \right)}{\left( \sum_i \frac{CWnd_i}{RTT_i} \right)^2}$$
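
The linked increase can be written down directly from the two formulas. The sketch below assumes congestion windows in bytes and RTTs in seconds; the function names and the example values (two subflows with 10 segments of 1460 bytes each and 100 ms RTT) are illustrative only.

```python
from typing import List


def alpha(cwnd: List[float], rtt: List[float]) -> float:
    """Aggressiveness factor of the multipath flow."""
    cwnd_total = sum(cwnd)
    best = max(w / (r * r) for w, r in zip(cwnd, rtt))
    denominator = sum(w / r for w, r in zip(cwnd, rtt)) ** 2
    return cwnd_total * best / denominator


def linked_increase(i: int, cwnd: List[float], rtt: List[float],
                    bytes_acked: float, mss: float) -> float:
    """Window increase (bytes) for one ACK received on subflow i."""
    multipath = alpha(cwnd, rtt) * bytes_acked * mss / sum(cwnd)
    regular = bytes_acked * mss / cwnd[i]   # what single-path TCP would get
    return min(multipath, regular)


windows = [14600.0, 14600.0]
rtts = [0.1, 0.1]
increase = linked_increase(0, windows, rtts, bytes_acked=1460, mss=1460)
```

With two identical subflows, `alpha` evaluates to 0.5 and each subflow grows by a quarter of what a regular TCP flow would gain per ACK. With a single subflow, `alpha` is 1 and the rule reduces to the regular increase.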

TCP in Networks with High BDP

Scalability Issues

  • It can take very long until the available data rate is fully utilized

  • Cause

    • Very conservative behavior of congestion avoidance

      • Congestion window grows by one MSS per RTT
      • Slow window growth in congestion avoidance causes low average data rate

      ➑️ NOT efficient in networks with high bandwidth-delay products

  • Require faster increase of the congestion window in congestion avoidance

Faster Increase of Congestion Window

  • 🎯 Goals
    • High resource utilization in networks with high bandwidth-delay product

    • Quick reactions to changes of the situation within the network

    • Fairness with respect to other TCP variants

  • Different types of fairness
    • Intra-protocol fairness
      • All senders use same TCP variant
      • Goal: All flows should achieve same data rate
    • With new TCP variants: inter-protocol fairness
    • Furthermore: RTT fairness
      • Fairness among TCP flows with different RTTs

CUBIC TCP

  • 🎯 Goals

    • Provide simple algorithm for networks with high bandwidth-delay product

    • TCP-friendly

      Behaves like standard TCP (i.e., TCP Reno) in networks with short RTTs and small bandwidth

    • Congestion avoidance

      Applies cubic function instead of linear window increase

    • Performance should not be worse than TCP Reno

  • In comparison to TCP Reno

    • Better RTT fairness (Window growth independent of RTT)
    • Better scalability to high data rates
  • Currently default congestion control in all major operating systems

Congestion Window Increase

  • Independent from RTT

    • Uses the actual time $t$ that has passed since the last congestion incident, i.e., window growth depends on the time between consecutive congestion events

    • Apply cubic function

      $$W(t) = C(t-K)^3 + W_{max} \quad \text{with } K = \sqrt[3]{\frac{W_{max}(1-\beta)}{C}}$$
      • $C$: predefined constant that determines aggressiveness of increase
      • $W_{max}$: congestion window size at latest congestion incident
      • $K$: time period that it takes to increase current window to $W_{max}$ (in case of no further congestion events)
      • $\beta$: multiplicative decrease of congestion window
        • $\beta = 0.5$ for TCP Reno
        • $\beta = 0.7$ for CUBIC TCP
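
A quick numeric check of the cubic function. The sketch assumes windows measured in segments and $C = 0.4$, a commonly used value that is not given in these notes; the function names are hypothetical.

```python
def cubic_k(w_max: float, beta: float = 0.7, c: float = 0.4) -> float:
    """Time (seconds) needed to grow back to w_max without further losses."""
    return (w_max * (1 - beta) / c) ** (1 / 3)


def cubic_window(t: float, w_max: float,
                 beta: float = 0.7, c: float = 0.4) -> float:
    """Window size t seconds after the last congestion incident."""
    return c * (t - cubic_k(w_max, beta, c)) ** 3 + w_max


w_max = 100.0
k = cubic_k(w_max)                       # cube root of 75, roughly 4.2 s
w_start = cubic_window(0.0, w_max)       # right after the decrease
w_plateau = cubic_window(k, w_max)       # back at the old maximum
w_probe = cubic_window(2 * k, w_max)     # probing beyond the old maximum
```

At $t = 0$ the function yields $\beta \cdot W_{max}$ (70 segments here), so the choice of $K$ is exactly what makes the curve start at the reduced window and reach $W_{max}$ again at $t = K$.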

Congestion Window over Time

Example


Three CUBIC Modes

  • TCP-friendly region

    • Ensures that CUBIC achieves at least same data rate as standard TCP in networks with small RTT

    • Observation: in networks with small RTTs, CUBIC's congestion window grows slower than with TCP Reno

    • Approach: "emulation" of TCP Reno (which uses AIMD)

    • $AIMD(\alpha, \beta)$

      • $\alpha$: additive increase factor

        $$W = W + \alpha$$
      • $\beta$: multiplicative decrease factor

        $$W = \beta \cdot W$$

      TCP Reno uses $AIMD(1, \frac{1}{2})$

    • TCP-fair increment

      $$\alpha = 3 \cdot \frac{1-\beta}{1+\beta}$$
      • Achieves same $W_{avg}$ as $AIMD(1, \frac{1}{2})$

      • Average data rate of AIMD

        $$W_{avg} = \frac{1}{RTT} \sqrt{\frac{\alpha \cdot (1+\beta)}{2 \cdot (1-\beta) \cdot p}}$$
        • $p$: loss rate
    • Window size of emulated TCP at time $t$

      $$W_{TCP} = W_{max} \cdot \beta + \frac{3 \cdot (1-\beta)}{1+\beta} \cdot \frac{t}{RTT}$$
    • Recall window size of CUBIC TCP

      $$W_{cubic}(t) = W(t) = C(t-K)^3 + W_{max}$$

    $\Rightarrow$ Rule

    • If $W_{cubic} < W_{TCP}$, then $CWnd$ is set to $W_{TCP}$ each time an ACK is received
    • Otherwise, $CWnd$ is set to $W_{cubic}$ each time an ACK is received
  • Concave region: $CWnd < W_{max}$ and not in TCP-friendly region

    • For each received ACK: $CWnd = CWnd + \frac{W_{cubic}(t + RTT) - CWnd}{CWnd}$
  • Convex region: $CWnd > W_{max}$ and not in TCP-friendly region

    • $CWnd$ is increased very carefully
    • Searching for new $W_{max}$
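
The three modes can be sketched as a small decision function. Several points are assumptions rather than content of these notes: $C = 0.4$, windows in segments, treating $CWnd = W_{max}$ as convex, and reusing the concave per-ACK update in the convex region (the notes only say the window is increased very carefully there). All function names are hypothetical.

```python
import math


def reno_friendly_alpha(beta: float) -> float:
    """Additive increase that matches the average rate of AIMD(1, 1/2)."""
    return 3 * (1 - beta) / (1 + beta)


def aimd_avg_rate(alpha: float, beta: float, p: float, rtt: float) -> float:
    return (1 / rtt) * math.sqrt(alpha * (1 + beta) / (2 * (1 - beta) * p))


def w_tcp(t: float, w_max: float, rtt: float, beta: float = 0.7) -> float:
    return w_max * beta + reno_friendly_alpha(beta) * t / rtt


def w_cubic(t: float, w_max: float, beta: float = 0.7, c: float = 0.4) -> float:
    k = (w_max * (1 - beta) / c) ** (1 / 3)
    return c * (t - k) ** 3 + w_max


def cubic_mode(t: float, cwnd: float, w_max: float, rtt: float) -> str:
    if w_cubic(t, w_max) < w_tcp(t, w_max, rtt):
        return "tcp-friendly"
    return "concave" if cwnd < w_max else "convex"


def on_ack(t: float, cwnd: float, w_max: float, rtt: float) -> float:
    """New congestion window after an ACK received at time t."""
    if cubic_mode(t, cwnd, w_max, rtt) == "tcp-friendly":
        return w_tcp(t, w_max, rtt)
    # Assumption: same per-ACK update in the concave and convex regions.
    return cwnd + (w_cubic(t + rtt, w_max) - cwnd) / cwnd
```

With $W_{max} = 100$ segments, one second after a loss the sketch reports the TCP-friendly region for an RTT of 10 ms but the concave region for an RTT of 200 ms, which matches the observation that the emulated Reno window only overtakes the cubic curve when RTTs are small.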

TCP and Response Time

Basic Issue

  • Response time

    • Time between initiation of a TCP connection and receipt of the requested data

    • Important components

      • Handshake of TCP connection establishment

      • Slow start

      • Transmission of the object

    • Macroscopic Model

      • Response time without applying congestion control


        • After 1st RTT: Client sends object request

        • After 2nd RTT

          • Client begins to receive object data

          • Receiver needs

            $$t = \frac{\text{object size } O}{\text{data rate } D}$$

        $\Rightarrow$ lower bound:

        $$\text{Response time} \geq 2 \cdot RTT + \frac{O}{D}$$

        (With small objects, the response time is dominated by the $RTT$s)

  • Used Variables

    • $RTT$: round trip time [seconds]
    • $MSS$: maximum segment size [bit]
    • $W$: size of congestion window [MSS], given as multiples of MSS
    • $O$: size of object that has to be transferred [bit]
    • $D$: data rate [bit/s]
  • Observation

    • $RTT$s have significant influence on response time

    • On connection establishment: 2 $RTT$s until reception of object begins

    • During object transmission

      • Small windows create pauses: waiting for ACKs
    • The majority of TCP connections on the Web are short-lived

      $\rightarrow$ Slow start has significant impact on response time

  • 🎯 Goals

    • Avoid "empty" RTTs without data transport
    • Reduce RTTs needed for slow start
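
The lower bound from the macroscopic model is easy to evaluate numerically. The values below (50 ms RTT, 100 Mbit/s, a 10 KB and a 1 GB object) are made-up examples, and the function name is hypothetical.

```python
def min_response_time(rtt: float, object_bits: float, rate_bps: float) -> float:
    """Lower bound 2*RTT + O/D in seconds, without congestion control."""
    return 2 * rtt + object_bits / rate_bps


rtt = 0.05            # seconds
rate = 100e6          # bit/s

small = min_response_time(rtt, 10_000 * 8, rate)          # 10 KB object
large = min_response_time(rtt, 1_000_000_000 * 8, rate)   # 1 GB object

# Share of the response time that is spent on the two RTTs.
small_rtt_share = 2 * rtt / small
large_rtt_share = 2 * rtt / large
```

For the small object, more than 99 % of the bound consists of the two RTTs; for the large object, the transmission time $O/D$ dominates.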

Bigger Initial Congestion Window

💡 Idea: Increase initial congestion window (IW)

  • To at least 10 segments, i.e., about 15 Kbytes (e.g., with an MSS of 1460 bytes)
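
The effect of a larger initial window can be estimated by counting the RTTs that slow start needs to deliver an object. The sketch assumes an MSS of 1460 bytes, a window that doubles every RTT, and no losses, delayed ACKs, or receive window limits; the comparison value of 3 segments and the 100 KB object are illustrative assumptions.

```python
import math


def slow_start_rounds(object_bytes: int, mss: int = 1460,
                      initial_window: int = 10) -> int:
    """Number of RTTs in which data is sent until the object is delivered."""
    segments = math.ceil(object_bytes / mss)
    window, sent, rounds = initial_window, 0, 0
    while sent < segments:
        sent += window      # one window of segments per RTT
        window *= 2         # slow start doubles the window each RTT
        rounds += 1
    return rounds


rounds_iw10 = slow_start_rounds(100_000, initial_window=10)
rounds_iw3 = slow_start_rounds(100_000, initial_window=3)
```

Under these assumptions a 100 KB object (69 segments) needs 3 RTTs with an initial window of 10 segments and 5 RTTs with an initial window of 3 segments.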

TCP Fast Open

  • 🎯 Goal: Reduce delays that precede the transmission of an object

  • TCP Cookie

    • Goal

      • Avoid DoS attacks

      • Disallow sending data within first SYN segment of first connection establishment to a server

      • Establish cookie for subsequent connections

    • Use cookie $\rightarrow$ avoid state keeping at server

    • Basic steps

      1. Client requests TFO cookie from server

      2. Client uses TFO cookies in subsequent TCP connections

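
How a cookie lets the server stay stateless can be sketched as follows. This is an illustration under assumptions, not the mechanism from these notes: the cookie is derived from the client IP address with an HMAC under a server secret and truncated to 8 bytes, and the class name `TfoServer` and the example addresses are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional


class TfoServer:
    """Issues and checks cookies without storing any per-client state."""

    def __init__(self, secret: Optional[bytes] = None) -> None:
        self._secret = secret if secret is not None else os.urandom(16)

    def make_cookie(self, client_ip: str) -> bytes:
        # Step 1: handed to the client during a regular first handshake.
        digest = hmac.new(self._secret, client_ip.encode("ascii"),
                          hashlib.sha256).digest()
        return digest[:8]   # truncation length is an arbitrary choice here

    def accept_syn_data(self, client_ip: str, cookie: Optional[bytes]) -> bool:
        # Step 2: data in the SYN is only accepted with a valid cookie.
        if cookie is None:
            return False
        return hmac.compare_digest(cookie, self.make_cookie(client_ip))


server = TfoServer(secret=b"k" * 16)
cookie = server.make_cookie("192.0.2.1")
```

The server recomputes the cookie from the source address on every SYN, so a first-time client without a cookie, or a client presenting a cookie issued to a different address, falls back to the regular handshake without data in the SYN.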

HTTP/2

QUIC