In this article, I will cover the basics of how the internet works: the history of the web and the networking protocols that make it work, which are important for us as web developers to know and understand.
How the Internet Works
When we type a URL such as google.com in the browser address bar, a request is sent to the ISP (internet service provider), which is the company we pay to have access to the internet. From the ISP, a request is sent to the DNS (Domain Name System), which is a kind of phone book that maps domain names to their IP addresses. The DNS responds with an IP address to the ISP, and from there it goes back to the browser.
With the IP address, the browser knows where to send the request, which in our case is Google's servers. The Google servers will send back the assets (HTML, CSS, and JavaScript) to the browser. The browser will then render those assets as the website we see.
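To make the "phone book" lookup concrete, here is a minimal sketch in Python that asks the operating system's resolver for the IPv4 addresses of a hostname. The function name `resolve_ipv4` is my own invention for this example, and the resolver may answer from a local cache or hosts file rather than going out to a DNS server.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the IPv4 addresses the system resolver reports for hostname."""
    # getaddrinfo delegates to the operating system's resolver, which
    # consults DNS (or a local cache / hosts file) on our behalf.
    results = socket.getaddrinfo(
        hostname, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses: list[str] = []
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (ip_address, port) pair.
    for *_, sockaddr in results:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

On a machine with internet access, calling `resolve_ipv4("google.com")` should return one or more addresses. The exact values depend on where you are and which DNS server answers.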
The internet backbone is the connection between computers around the world, which means that there are a few hops before we get to the desired IP address. We can check the hops by typing “traceroute google.com” in the terminal, and we will see the IP addresses of all the hubs we pass through, from our router at home to the desired address, which in our case is Google.
By understanding how the web works we can improve the performance of our websites:
- Location of the servers – if the servers are close to us, it will be faster.
- Number of trips – fewer requests to the servers will speed up the website.
- Size of files – small HTML, CSS, and JavaScript files will be faster to transfer.
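The three factors above can be combined into a back-of-the-envelope estimate. This is a rough model I am assuming for illustration, not a real measurement: it ignores caching, parallel requests, and connection setup. The function and parameter names are made up for this sketch.

```python
def estimate_load_time_ms(
    round_trips: int,
    rtt_ms: float,
    total_bytes: int,
    bandwidth_mbps: float,
) -> float:
    """Very rough page-load estimate in milliseconds.

    round_trips    - how many request/response trips the page needs
    rtt_ms         - round-trip time to the server (grows with distance)
    total_bytes    - combined size of the HTML, CSS, and JavaScript files
    bandwidth_mbps - connection speed in megabits per second
    """
    # Time spent waiting on the network, independent of file size.
    latency_ms = round_trips * rtt_ms
    # Time spent actually moving the bytes (8 bits per byte).
    transfer_ms = (total_bytes * 8) / (bandwidth_mbps * 1_000_000) * 1000
    return latency_ms + transfer_ms
```

For example, 10 trips at 50 ms each plus 500 KB over a 10 Mbps connection comes to roughly 900 ms. Halving the number of trips saves more time here than halving the file size, which is why all three factors matter.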
The WWW is a common language that computers can speak. It’s a combination of a language and a protocol for sharing documents. HTML arrived together with the first browser and server, and CSS and JavaScript followed:
- HTML – a way for us to write and structure text documents (1991)
- CSS – a way for us to style HTML pages (1996)
- JavaScript – a way for us to add interactivity (1995)
After the creation of the first browser, more browsers were created by different companies over time. The major ones are Internet Explorer, Firefox, Chrome, and Safari. All the browsers had to agree on how they would read HTML, CSS, and JavaScript, but they implemented things differently. This was the era of the browser wars. Today there are standards bodies, such as the W3C, that create standards, so the differences are not as big as they were.
Clouds
I don’t know about you, but when I think of clouds I think of the “Care Bears”.
The cloud does not actually exist as a cloud. It’s just a bunch of actual computers somewhere else in the world. A cloud computer is just someone else’s computer. For example, AWS, which is a cloud service, has computers in data centers in different places. So the cloud is a network of real computers that don’t belong to us and that talk to each other.
Packets
Packets are small chunks of data that are passed around between the computers in the network. When you type a URL in the browser or watch a YouTube video, data is passed back and forth between a server and your client, which is probably a browser but might be a command-line tool too. The data that is passed around is in the form of packets.
A packet is handled by 5 basic layers:
- Application – protocols such as HTTP, FTP, SSH, and SMTP, which applications use to talk to each other.
- Transport – TCP/UDP
- Network (internet) – IP
- Link – wifi/ethernet connection
- Physical – the actual cables that connect stuff together.
As developers, we are interested primarily in the application, transport, and network layers.
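One way to picture the layers is as envelopes inside envelopes: each layer on the sending side wraps what it receives from the layer above, and the receiving side unwraps in reverse. The sketch below is a toy model only. The bracketed labels stand in for real headers, which are binary and far more detailed, and the names `LAYERS`, `encapsulate`, and `decapsulate` are invented for this example.

```python
# Toy labels for the layers that add a header, from the top of the stack down.
# The physical layer carries the bits and adds no label of its own here.
LAYERS = ["HTTP", "TCP", "IP", "ETH"]


def encapsulate(payload: bytes) -> bytes:
    """Wrap the payload with one label per layer, application first."""
    frame = payload
    for name in LAYERS:
        frame = f"[{name}]".encode() + frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Strip the labels in reverse order, as the receiving machine would."""
    for name in reversed(LAYERS):
        header = f"[{name}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"expected {name} header")
        frame = frame[len(header):]
    return frame
```

The outermost label belongs to the lowest layer, because it was added last on the way out and is the first one the receiver sees.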
Protocols Types
Protocols are the languages that computer programs use to speak to each other. Here are some of the network protocols:
- HTTP – browser web pages
- HTTPS – browser web pages with encryption
- SMTP – send emails and relay them between mail servers
- IMAP, POP3 – load emails from the inbox
- IRC – chat
- FTP – file transfer
- SSH – remote shell over an encrypted connection
- SSL/TLS – low-level secure data transfer (used by HTTPS)
Transport Layer Protocols
The transport layer gives our computer 2^16 (65,536) numbered ports, so when we start a Node app on port 3000 we are actually using one of those available ports. You can think of your network connection as a hotel: the hotel is a single building (the network connection) with individually numbered rooms (the ports). When someone comes to the hotel to find a guest, they need to know the room number.
So what really happens is that an application on a given machine issues a network request, let’s say an HTTP request. It originates from port 3200 and wants to talk to port 80 on another computer. That request is handed off to the transport layer, where it is wrapped up in what’s called a segment. Inside the segment there is metadata, including the source port (3200) and the destination port (80). The transport layer hands the segment off to the network layer for further processing. When it gets to the receiving machine, it goes through the process in reverse and eventually finds the right port.
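Here is a small sketch of the port metadata described above. Both TCP and UDP headers begin with a 16-bit source port followed by a 16-bit destination port, which is where the 2^16 limit comes from. The sketch packs only those two fields and leaves out the rest of a real header. The names `PORT_COUNT`, `pack_ports`, and `unpack_ports` are my own.

```python
import struct

# A port number is a 16-bit unsigned integer, so there are 2**16 of them.
PORT_COUNT = 2 ** 16  # 65536 ports, numbered 0 to 65535


def pack_ports(source_port: int, destination_port: int) -> bytes:
    """Pack the first four bytes of a segment: source port, then destination."""
    for port in (source_port, destination_port):
        if not 0 <= port < PORT_COUNT:
            raise ValueError(f"port {port} does not fit in 16 bits")
    # "!" means network byte order (big-endian); "H" is an unsigned 16-bit int.
    return struct.pack("!HH", source_port, destination_port)


def unpack_ports(segment: bytes) -> tuple[int, int]:
    """Read the source and destination ports back out, as the receiver would."""
    source_port, destination_port = struct.unpack("!HH", segment[:4])
    return source_port, destination_port
```

With the ports from the example, `pack_ports(3200, 80)` produces four bytes, and `unpack_ports` on the receiving side recovers `(3200, 80)` so the data can be delivered to the right "room".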
UDP and TCP Overview
There are 2 types of transport layer protocols: UDP and TCP.
UDP:
- Lightweight – the header is only 8 bytes.
- Connectionless – the client doesn’t have to create a connection before talking to the server.
- Consistency – it sends data no matter what, even when packets are lost, the network is congested, or packets arrive out of order.
In conclusion, UDP is fast but unreliable. UDP is used primarily for video games and real-time communication.
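To see what "connectionless" looks like in practice, here is a minimal sketch that sends one UDP datagram to a socket on the same machine. There is no handshake: the sender just fires the datagram at an address. The function name `udp_round_trip` is hypothetical, and on a real network the datagram could be lost, which is why the receiver uses a timeout.

```python
import socket


def udp_round_trip(message: bytes) -> bytes:
    """Send one datagram over loopback and return what the receiver got."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        # Port 0 asks the operating system to pick any free port.
        receiver.bind(("127.0.0.1", 0))
        # UDP gives no delivery guarantee, so never wait forever.
        receiver.settimeout(2.0)
        # No connect() and no handshake: just send to the address.
        sender.sendto(message, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(4096)
        return data
```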
TCP:
- Connection based – the client does a three-way handshake before communicating with the server: the client says “I’d like to talk” (SYN), the server answers “yes” (SYN-ACK) or refuses the connection, the client acknowledges (ACK), and the data starts flowing accordingly.
- Reliable –
  - as we saw above, we know the connection has been established before any data is sent.
  - delivery acknowledgments – every time data comes through, the server will let the client know that it got the data, and vice versa.
  - retransmission of data – if data is not acknowledged, the sender assumes it was lost and sends it again.
  - in-order packets – a guarantee that data is handed to the application in the correct order, regardless of the order in which packets arrive over the network.
  - congestion control – when the network is overwhelmed, TCP will slow down its sending rate to try and keep packet loss to a minimum.
In conclusion, TCP is slower than UDP but is more reliable. HTTP uses TCP because we need the connection to be reliable. If we are going to send a web page across the internet, we can’t allow the packets to show up in a different order, which would make the HTML show up in the wrong order.
TCP and IP together create an environment in which 2 machines can talk to each other reliably.
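For contrast with UDP, here is a minimal sketch of a TCP exchange on the local machine. The operating system performs the three-way handshake for us inside `create_connection` and `accept`, so we never see it in the code. The function name `tcp_echo_once` is invented for this example, and a single `recv` is only safe here because the message is tiny.

```python
import socket
import threading


def tcp_echo_once(message: bytes) -> bytes:
    """Open a TCP connection over loopback, send a message, return the echo."""
    # create_server binds and listens; port 0 lets the OS pick a free port.
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]

    def serve() -> None:
        # accept() returns once the handshake with a client has completed.
        connection, _client_address = server.accept()
        with connection:
            data = connection.recv(4096)
            connection.sendall(data)

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()

    # create_connection performs the client side of the handshake.
    with socket.create_connection(("127.0.0.1", port), timeout=2.0) as client:
        client.sendall(message)
        # For large messages recv may return only part of the data;
        # a small message like this one arrives in a single call.
        reply = client.recv(4096)

    thread.join(timeout=2.0)
    server.close()
    return reply
```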
HTTP Overview
The HTTP protocol lives in the application layer. HTTP was originally made just to pass HTML. Check out the first web page that was ever made at the following link: http://info.cern.ch/.
These days, HTTP does not pass only HTML. It passes images, 4K video, and any other type of digital file you can think of.
HTTP is very efficient:
- It connects and remains connected only until all the data has been sent.
- It does not have to stay open. It is only connected when absolutely necessary, which means that once the response has been delivered, the machines can disconnect entirely from each other. When the next request is needed, a new HTTP exchange is established across TCP (with HTTP/1.1 keep-alive, the same TCP connection can be reused for several requests).
HTTP is stateless
- No dialogue – the machines only know about each other for as long as the connection is open. Stateless means that the machine only knows about what it has received right now and will respond based on that, regardless of what happened before.
When the user types a URL in the browser, the request goes through the internet connection to the ISP and is bounced around until eventually, via TCP/IP, it gets to the host machine. Then a connection is established through the three-way handshake I mentioned above: the client says it wants to make a connection, the server responds with yes or no, and then the data starts to flow. Part of that data is the HTTP request that arrives at the server. At this point the HTTP request itself is complete, but the TCP connection is still open.
After the request is finished, the client still waits for an HTTP response. The server will do what it needs to do with the request, and when it has finished it will send an HTTP response to the client. This will close the connection: the TCP connection will be closed, and the client and server won’t remember that the exchange ever happened.
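The whole request and response cycle can be tried out locally with Python's standard library. This is a sketch under a few assumptions: the server is a throwaway one that handles exactly one request on the local machine, and the names `HelloHandler` and `fetch_once` are made up for this example. By default this basic server speaks HTTP/1.0, so it closes the connection after the response, much like the flow described above.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with the same small HTML body."""

    def do_GET(self) -> None:
        body = b"<h1>hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the example quiet instead of logging each request


def fetch_once(path: str = "/") -> tuple[int, bytes]:
    """Start a one-shot local server, send one GET, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    # handle_request serves exactly one request and then returns.
    thread = threading.Thread(target=server.handle_request, daemon=True)
    thread.start()

    connection = http.client.HTTPConnection(
        "127.0.0.1", server.server_port, timeout=2.0
    )
    try:
        connection.request("GET", path)
        response = connection.getresponse()
        status, body = response.status, response.read()
    finally:
        connection.close()

    thread.join(timeout=2.0)
    server.server_close()
    return status, body
```

Calling `fetch_once()` twice gives the same answer both times. The server keeps no memory of the first call, which is the statelessness in action.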
HTTP Message
The HTTP messages contain:
- “first line” – describes the type of request or the status of the response. For a request, it is a method, path, and protocol, such as “GET /Blog HTTP/1.1”. For a response, it is the protocol and a status, such as “HTTP/1.1 200 OK”.
- “headers” – give details about the request or response and describe the body. They contain metadata and come in the form of key and value pairs.
- “body” – the content itself (binary data, HTML…)
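The three parts are easy to see when a raw message is split by hand. In HTTP/1.1 each line ends with a carriage return and line feed, and an empty line separates the headers from the body. The sketch below is a simplified parser for illustration only: it handles text messages, and the function name `parse_http_message` and the `example.com` host are my own choices.

```python
def parse_http_message(raw: str) -> tuple[str, dict[str, str], str]:
    """Split a raw HTTP/1.1 message into (first line, headers, body)."""
    # An empty line (CRLF CRLF) separates the head from the body.
    head, _separator, body = raw.partition("\r\n\r\n")
    first_line, *header_lines = head.split("\r\n")

    headers: dict[str, str] = {}
    for line in header_lines:
        # Each header is "Name: value"; split on the first colon only.
        name, _colon, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return first_line, headers, body


request = "GET /Blog HTTP/1.1\r\nHost: example.com\r\nAccept: text/html\r\n\r\n"
response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>hi</h1>"
```

Parsing `request` gives the first line `GET /Blog HTTP/1.1`, two headers, and an empty body. Parsing `response` gives the status line, one header, and the HTML as the body.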
For the list of all available methods (verbs), check the following link: https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods