The Internet is great. Thanks, TCP
The internet is incredible. It’s almost impossible to keep people away from it. But it can also be unreliable: packets are dropped, links get congested, bits are flipped, and data is corrupted. Oh, it’s dangerous out there! (I’m writing this in Kramer’s voice)
So how is it possible that our apps just work? If you’ve networked your app before, you know the drill: socket(), bind() here, accept() there, maybe a connect() over there, and it just works. Reliable, ordered, uncorrupted data flows back and forth.
Websites (HTTP), email (SMTP) or remote access (SSH) are all built on top of TCP and just work.
Why TCP
Why do we need TCP? Why can’t we just use the layer below, IP?
Remember, the network stack goes: Physical -> Data Link (Ethernet/Wi-Fi, etc.) -> Network (IP) -> Transport (TCP/UDP).
IP (Layer 3) works at the host level, while the Transport Layer (TCP/UDP) works at the application level, using ports. IP can deliver packets to the correct host via its IP address, but once the data reaches the machine, it still needs to be handed off to the correct process. Each process “binds” to a port: its address within the machine. A common analogy is: the IP address is the building, and the port is the apartment. Processes or apps live in those apartments.
Another reason we need TCP is so that if a router (a piece of infra that your average user doesn’t control) drops packets or becomes overloaded, TCP at the edges (on users’ machines) can recover without the router having to participate. Routers stay simple; reliability lives at the edges.
Packets get lost, corrupted, duplicated, and reordered. This is how the Internet works. TCP shields developers from these problems. It handles retransmissions, checksums, and a whole lot of other reliability mechanisms. If every developer had to implement these themselves, they would never have time to properly align their flexboxes, which is a terrible alternate universe indeed.
Jokes aside, data sent and received over a socket arrives uncorrupted, without duplicates, and in order even if the underlying network is unreliable, which is why TCP is amazing.
Flow and Congestion Control
When you step back and think about network communications, this is really what we’re trying to do: Machine A sends data to Machine B. Machine B has a limited amount of space and must store the incoming data somewhere before handing it to the application, which may be sleeping or busy. That temporary storage is called the receive buffer and is managed by the kernel:
sysctl net.ipv4.tcp_rmem
net.ipv4.tcp_rmem = 4096 131072 6291456
Minimum 4 KB, default 128 KB, and maximum 6 MB.
The problem is that space is limited. If you are transferring a large file (hundreds of MB or even GB), you can easily overwhelm the destination. So the receiver needs a way to tell the sender how much more data it can handle. This mechanism is called flow control, and the TCP segment includes a field called the window, which specifies how much data the receiver is currently willing to accept.
Another problem is the load on the network itself, even if the receiving machine has enough buffer space. You’re only as strong as your weakest link: some links carry gigabits per second, others only megabits. If you don’t adapt to the slowest link, congestion is inevitable.
Fun fact: In 1986, the bandwidth of the Internet dropped from a few dozen kbit/s to as low as 40 bps (Yes, bits per second! Yes, those numbers are wild!), which became known as congestion collapse. When packets were lost, the systems tried to resend them, which made the congestion even worse: a doom loop. To fix this, TCP introduced ‘play nice’ and ‘back off’ behaviors known as congestion control, which help prevent the Internet from clogging itself.
Some Code: A Plain TCP Server
With all low-level things like TCP, C examples are the way to go. Show it as it is.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <signal.h>
#include <arpa/inet.h>

int sockfd = -1, clientfd = -1;

// Close any open sockets on Ctrl+C before exiting
void handle_sigint(int sig) {
    (void)sig;
    printf("\nCtrl+C caught, shutting down...\n");
    if (clientfd != -1) close(clientfd);
    if (sockfd != -1) close(sockfd);
    exit(0);
}

int main() {
    signal(SIGINT, handle_sigint);
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    int opt = 1;
    // SO_REUSEADDR to force bind to the port even if an older socket is still terminating (TIME_WAIT)
    setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
    struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(8080), .sin_addr.s_addr = INADDR_ANY };
    bind(sockfd, (struct sockaddr*)&addr, sizeof(addr));
    listen(sockfd, 5);
    printf("Listening on 8080...\n");
    // Block until a single client connects
    clientfd = accept(sockfd, NULL, NULL);
    char buf[1024], out[2048];
    int n;
    // Echo loop: recv() returns 0 when the client closes the connection
    while ((n = recv(clientfd, buf, sizeof(buf) - 1, 0)) > 0) {
        buf[n] = '\0';
        int m = snprintf(out, sizeof(out), "you sent: %s", buf);
        printf("response %s %d\n", out, m);
        send(clientfd, out, m, 0);
    }
    close(clientfd); close(sockfd);
}
This creates a TCP server that echoes back each message sent by the client, prefixed with ‘you sent: ’.
# compile and run server
gcc -o server server.c && ./server
# connect client
telnet 127.0.0.1 8080
# hi
# you sent: hi
127.0.0.1 (localhost) can be replaced with the remote IP and should work as is.
We used the following primitives/functions, following the Berkeley Sockets way of doing things (released with 4.2BSD):
- SOCKET: create an endpoint (a structure in the kernel).
- BIND: associate it with a port.
- LISTEN: get ready to accept connections, with a specified queue size for pending connections (beyond that size, drop!).
- ACCEPT: accept an incoming connection (TCP server).
- CONNECT: attempt a connection (TCP client).
- SEND: send data.
- RECEIVE: receive data.
- CLOSE: release the connection.
In the server example above, we are using client/server dynamics in a request/response pattern. But I can add the following after the send() in the loop:
send(clientfd, out, m, 0);
sleep(5);
const char *msg = "not a response, just doing my thing\n";
send(clientfd, msg, strlen(msg), 0);
Compile, run and telnet:
client here
you sent: client here
client again
not a response, just doing my thing
you sent: client again
In the telnet terminal I typed client here and then client again, but at first I only got back you sent: client here. The server was sleeping at that point, so my second line, client again, waited patiently in the receive buffer. The server then sent not a response, just doing my thing, picked up my second message, and replied with you sent: client again.
It is a full-duplex, bidirectional link. Each party sends what it wants; it just so happens that initially one listens and the other connects. What follows need not be a request/response pattern.
Catfishing Curl: A Dead Simple HTTP Server
Let’s create a very simple HTTP/1.1 server (later versions are more complicated).
// same as before
printf("Listening on 8080...\n");
int i = 1;
while (1) {
    clientfd = accept(sockfd, NULL, NULL);
    char buf[1024], out[2048];
    int n;
    while ((n = recv(clientfd, buf, sizeof(buf) - 1, 0)) > 0) {
        buf[n] = '\0';
        int body_len = snprintf(out, sizeof(out), "[%d] Yo, I am a legit web server\n", i++);
        char header[256];
        int header_len = snprintf(
            header, sizeof(header),
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            "Content-Length: %d\r\n"
            "Connection: close\r\n"
            "\r\n",
            body_len
        );
        printf("header: %s\n", header);
        printf("out: %s\n", out);
        send(clientfd, header, header_len, 0);
        send(clientfd, out, body_len, 0);
        break; // one request per connection
    }
    close(clientfd);
}
~ curl localhost:8080
[1] Yo, I am a legit web server
~ curl localhost:8080
[2] Yo, I am a legit web server
We use i to keep count of requests. We accept a TCP connection and return the HTTP headers that the HTTP client (really just a TCP peer) expects. A real HTTP server would return proper HTML, CSS, and JS, and handle many other options and headers. But underneath, it’s just a process using our trusty, reliable TCP.
The Actual Bytes
 <-------------------------- 32 bits -------------------------->
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          Source Port          |       Destination Port        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Acknowledgment Number                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Header| Rese- |     Flags     |          Window Size          |
|  Len  | rved  |               |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Checksum            |        Urgent Pointer         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Options (if any)                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Data (Payload)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Each TCP segment has a header. And each TCP segment is contained within an IP packet. We have a source and destination port. 16 bits each, and that’s where the 64k port limit comes from!
Each transport-layer connection is identified by a 5-tuple (protocol TCP/UDP, src IP, src port, dst IP, dst port).
Sequence and Acknowledgment Numbers
TCP reliability depends on two key fields: the sequence number, indicating which bytes a segment carries, and the acknowledgment number, indicating which bytes have been received. Sequence numbers let the receiver work out the order of the data, detect and reorder out-of-order segments, and identify losses. TCP uses cumulative acknowledgments: an ACK of 100 means bytes 0-99 have been received. If bytes 100-120 are lost but later bytes arrive, the ACK stays at 100 until the missing data is received.
1. A --> B: Send [Seq=0-99]
2. B --> A: Send [Seq=0-49]
3. B --> A: Receives A's [0-99] --> sends ACK=100
4. A --> B: Receives B's [0-49] --> sends ACK=50
5. A --> B: Send [Seq=100-199] --- lost ---
6. B --> A: Send [Seq=50-99] --- lost ---
7. A --> B: Send [Seq=200-299]
    B receives --> notices gap (100-199 missing) --> sends ACK=100
8. B --> A: Send [Seq=100-149]
    A receives --> notices gap (50-99 missing) --> sends ACK=50
9. A --> B: Send [Seq=300-399]
    B still missing 100-199 --> sends ACK=100
10. B --> A: Send [Seq=150-199]
    A still missing 50-99 --> sends ACK=50
11. A --> B: Retransmit [Seq=100-199]
    B receives --> now has 0-399 --> sends ACK=400
12. B --> A: Retransmit [Seq=50-99]
    A receives --> now has 0-199 --> sends ACK=200
The header length field gives the size of the header in 4-byte words. It is needed because the options field is variable length, and therefore so is the header.
TCP Flags
Next come 8 flags (1 bit each). Some important ones:
- SYN: used to establish a connection.
- ACK: indicates that the acknowledgment number is valid.
These two flags are at the heart of connection setup. Why establish a connection? To detect out-of-order or duplicate segments, you need to track what has been sent and received, i.e. maintain state: a connection.
SYN and ACK participate in the famous 3-way handshake:
- A -> B: SYN (I want to connect)
- B -> A: SYN,ACK (I got your SYN, I want to connect too!)
- A -> B: ACK (Understood, connection established!)
The FIN flag signals connection teardown, which also uses a handshake:
- X -> Y: FIN (I want to disconnect)
- Y -> X: ACK (Got your FIN, whatever!)
- Y -> X: FIN (I also want to disconnect; sometimes sent together with the previous ACK)
- X -> Y: ACK (Got it!)
This is usually a 4-way (sometimes 3-way) goodbye handshake.
RST is the reset flag. It indicates an error or a forced shutdown: drop the connection immediately. The OS sends an RST if no process is listening on the port or if the listening process has crashed. There is also a known TCP reset attack, where a middlebox injects an RST to terminate the connection (used by some firewalls).
Window
We talked about this field under flow control. As mentioned above, it indicates how many bytes the receiver is willing to accept beyond the acknowledgment number.
With the server example above running, ss (socket statistics) provides information about its TCP sockets.
ss -tlpmi
// State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
// LISTEN 0 5 0.0.0.0:http-alt 0.0.0.0:* users:(("server",pid=1113,fd=3))
// skmem:(r0,rb131072,t0,tb16384,f0,w0,o0,bl0,d0) cubic cwnd:10
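rb131072 (128 KB) is the receive buffer size, while tb16384 (16 KB) is the transmit buffer size, where data waits before being sent over the network. On an established connection, Send-Q indicates bytes not yet acknowledged by the remote host, and Recv-Q shows bytes received but not yet read by the application (for example, the second line in the telnet session above, waiting while the server was sleeping). On a listening socket like the one shown here, the two columns describe the accept queue instead: Recv-Q is the number of connections waiting to be accepted, and Send-Q is the backlog limit (the 5 we passed to listen()).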
Finally, the checksum is there for reliability. The receiver adds up all the 16-bit words in the TCP segment (plus a small pseudo-header taken from the IP layer) using ones’ complement arithmetic and checks the result against the checksum field. If they do not match, some bits have probably been corrupted, so the segment is discarded and has to be retransmitted.
Conclusion
It always amazes me how it all works. The network, the Internet. Reliable and consistent. A few decades ago, sending a few KB was quite an accomplishment. And today, 4K streaming is common. God bless all the hard-working people who created this and made it possible!