Computer networking

Networking

What is the Internet?

The Internet is a really big network of computers (the really big one). A network of computers is a set of computers that send messages to each other. The Internet relies on protocols: rules for communication that the computers follow, implemented as computer programs. Conceptually, the protocols of the Internet form a stack, and there are several overlapping versions of this model with different names (the OSI model and the TCP/IP model are the best known).

The lowest layer is the physical layer, which can involve electromagnetic waves. Above it are the link layer (devices: switches; identifiers: MAC addresses), the Internet layer (protocol: IP; devices: routers; identifiers: IP addresses), the transport layer (protocols: TCP, UDP; identifiers: port numbers, which identify processes), and the application layer (protocols: HTTP). TLS sits between the transport layer and the application layer; the OSI model has a session layer and a presentation layer there, and TLS is sometimes assigned to one of them.
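One way to picture the stack is as nested wrappers: each layer wraps whatever the layer above hands it. The following is a toy sketch only. The bracketed "headers" are made up for illustration and are not real wire formats, and the function names are hypothetical.

```python
# Toy model of encapsulation: each layer wraps the data handed down from
# the layer above with its own header. The header strings are made up for
# illustration and are not real wire formats.

LAYERS = [
    ("application", "HTTP"),
    ("transport", "TCP"),
    ("internet", "IP"),
    ("link", "Ethernet"),
]


def encapsulate(payload: str) -> str:
    """Wrap the payload going down the stack, application layer first."""
    data = payload
    for _layer, protocol in LAYERS:
        data = f"[{protocol} {data}]"
    return data


def decapsulate(frame: str) -> str:
    """Strip the wrappers going back up the stack, link layer first."""
    data = frame
    for _layer, protocol in reversed(LAYERS):
        prefix = f"[{protocol} "
        assert data.startswith(prefix) and data.endswith("]")
        data = data[len(prefix):-1]
    return data


frame = encapsulate("GET /")
print(frame)               # [Ethernet [IP [TCP [HTTP GET /]]]]
print(decapsulate(frame))  # GET /
```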

The link layer

The link layer sits on top of the physical layer. Switches operate here: they forward Ethernet frames based on destination MAC addresses.
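As a rough sketch of forwarding by destination MAC address, here is a toy "learning" switch. It assumes the common behavior where a switch remembers which port each source address was seen on and floods frames whose destination it has not learned yet. The class and method names are hypothetical.

```python
# Toy learning switch: remembers which port each source MAC address was
# seen on, and forwards frames to the known port for the destination MAC.
# Unknown destinations are flooded to every other port.

class Switch:
    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.mac_table: dict[str, int] = {}  # MAC address -> port number

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Return the list of ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port  # learn the sender's port
        if dst_mac in self.mac_table:
            out_port = self.mac_table[dst_mac]
            return [] if out_port == in_port else [out_port]
        # Destination not learned yet: flood to all ports except the ingress.
        return [p for p in range(self.num_ports) if p != in_port]


switch = Switch(num_ports=4)
print(switch.receive(0, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"))  # [1, 2, 3]
print(switch.receive(1, "bb:bb:bb:bb:bb:bb", "aa:aa:aa:aa:aa:aa"))  # [0]
print(switch.receive(0, "aa:aa:aa:aa:aa:aa", "bb:bb:bb:bb:bb:bb"))  # [1]
```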

The Internet Protocol Suite

The two most important protocols are TCP (Transmission Control Protocol) and IP (Internet Protocol). Together with the rest of the stack they are referred to as the Internet Protocol Suite, or simply TCP/IP.

The Internet layer

Above the link layer is the Internet layer, which has routers instead of switches. Each router uses a routing table, which is populated by separate routing protocols (such as OSPF or BGP) or by manual configuration, to send packets toward other routers that are closer to the target and know more about the part of the network near the target. IP makes no guarantee that a packet will arrive, or that packets will arrive in order; it only makes a best-effort attempt.
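A routing-table lookup can be sketched as follows. This assumes the usual longest-prefix-match rule (the most specific matching route wins), which the text above does not spell out. The table entries and next-hop addresses are hypothetical and drawn from documentation and private ranges.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop). The
# addresses come from documentation/private ranges and are illustrative.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),  # default route
    (ipaddress.ip_network("10.0.0.0/8"), "192.0.2.2"),
    (ipaddress.ip_network("10.1.0.0/16"), "192.0.2.3"),
]


def next_hop(destination: str) -> str:
    """Pick the matching route with the longest prefix (most specific)."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if address in net]
    _best_net, best_hop = max(matches, key=lambda m: m[0].prefixlen)
    return best_hop


print(next_hop("10.1.2.3"))     # 192.0.2.3
print(next_hop("10.9.9.9"))     # 192.0.2.2
print(next_hop("203.0.113.7"))  # 192.0.2.1
```

The default route (0.0.0.0/0) matches every address, so a lookup always finds something; more specific routes simply override it.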

The transport layer

The two main protocols of the transport layer are TCP and UDP. UDP is far more minimal than TCP: it adds little more than port numbers and a checksum on top of IP, which is why it is sometimes used as the base for custom transport protocols (QUIC, for example).

TCP

TCP runs on top of IP and guarantees that the data is delivered to the application intact and in order. This introduces a downside called head-of-line blocking. If 5 packets get sent and the first is dropped somewhere, then the 4 packets that have already arrived must wait for the first packet to be resent (and it always will be resent when it is dropped, because that is part of what TCP does to guarantee that the complete data arrives).
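The five-packet scenario above can be simulated with a small receive buffer. This is a simplified sketch, not how a real TCP stack is written: segments are numbered 1, 2, 3, ... rather than by byte offset, and the function name is hypothetical.

```python
def deliver_in_order(arrivals: list[int]) -> list[list[int]]:
    """Simulate a receiver that only releases data in sequence order.

    `arrivals` lists segment numbers in the order they arrive. The result
    has one entry per arrival: the segments handed to the application at
    that moment.
    """
    next_expected = 1
    buffered: set[int] = set()
    delivered_per_arrival = []
    for seq in arrivals:
        buffered.add(seq)
        released = []
        # Release every consecutive segment starting at the expected one.
        while next_expected in buffered:
            buffered.remove(next_expected)
            released.append(next_expected)
            next_expected += 1
        delivered_per_arrival.append(released)
    return delivered_per_arrival


# Segment 1 is dropped and only arrives last, after being resent.
print(deliver_in_order([2, 3, 4, 5, 1]))
# [[], [], [], [], [1, 2, 3, 4, 5]]
```

Nothing reaches the application until segment 1 shows up, even though segments 2 through 5 were sitting in the buffer the whole time.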

The TCP handshake

A TCP connection starts with a handshake. The client sends a SYN packet, the server replies with a SYN-ACK packet, and the client sends an ACK, which can be followed immediately by application data. One round trip of messages therefore has to complete before any application data is transmitted.
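Application code normally never sees the SYN, SYN-ACK, and ACK packets, because the operating system performs the handshake when a program connects. A minimal sketch over the loopback interface, using Python's standard socket module, shows where the handshake happens. The uppercase-echo server and the function names are made up for the example.

```python
import socket
import threading


def echo_once(server: socket.socket) -> None:
    """Accept one connection and send back whatever arrives, uppercased."""
    conn, _addr = server.accept()
    with conn:
        conn.sendall(conn.recv(1024).upper())


def tcp_round_trip(message: bytes) -> bytes:
    # Port 0 asks the operating system for any free port on loopback.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        worker = threading.Thread(target=echo_once, args=(server,))
        worker.start()
        # create_connection returns once the SYN / SYN-ACK / ACK exchange
        # has finished; the operating system performs it for us.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)  # application data follows the handshake
            reply = client.recv(1024)
        worker.join()
    return reply


print(tcp_round_trip(b"hello"))  # b'HELLO'
```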

UDP

UDP is an alternative protocol that can be used for things like video games, where things happen too fast to wait for a single dropped packet to be resent. A distinction to make between IP and the transport protocols is that IP enables packets to be sent between computers (host-to-host), while TCP and UDP enable data to be sent between processes (TCP in segments, UDP in datagrams).
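For contrast with TCP, here is a minimal sketch of sending a single UDP datagram over loopback with Python's standard socket module. There is no connection setup and nothing is resent; the timeout is only there so the example cannot hang if the datagram were lost. The function name and message are hypothetical.

```python
import socket


def udp_send_and_receive(message: bytes) -> bytes:
    """Send one datagram over loopback; no handshake, no retransmission."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS choose a port
        receiver.settimeout(2.0)         # a lost datagram is simply never seen
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(1024)
        return data


print(udp_send_and_receive(b"player moved"))  # b'player moved'
```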

Processes

TCP enables segments to be sent between processes, which are identified here by port numbers. Some port numbers are good to know, like 80, the standard port on which a web server process waits for HTTP requests. If HTTPS is being used instead of HTTP, the web server listens on port 443.

A similar but different thing to a process is a thread. A process always has a primary thread, and it can spawn worker threads. Threads in the same process share memory, so once you have worker threads, any code that touches shared state has to be thread-safe. This is a consideration with Sidekiq, a Ruby library (used with Rails) for queueing background jobs, which uses worker threads to run the jobs.
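A minimal sketch of what "thread-safe" means in practice, written in Python rather than Ruby and unrelated to Sidekiq's own internals: several worker threads increment one shared counter, and a lock keeps their updates from interleaving. The class and function names are hypothetical.

```python
import threading


class Counter:
    """A counter that is safe to increment from several threads."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # `self.value += 1` is a read, an add, and a write. The lock keeps
        # another thread from interleaving between those steps.
        with self._lock:
            self.value += 1


def run_workers(num_threads: int, increments_each: int) -> int:
    counter = Counter()

    def work() -> None:
        for _ in range(increments_each):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run_workers(4, 10_000))  # 40000
```

Without the lock, two threads could both read the same old value and one of the increments would be lost.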

The application layer

The main protocol of the application layer is HTTP (Hypertext Transfer Protocol). This is the most important layer to keep in mind when developing a web application like this one. When HTTP is sent over TLS, it is called HTTPS.

HTTP stands for Hypertext Transfer Protocol. The markup language used to describe the structure of a website as a document is HTML, which stands for Hypertext Markup Language.

The client-server pattern is a very common pattern where a client sends a message to a server, which does something and then sends a message back to the client. HTTP follows the client-server pattern/architecture. Databases also use the pattern, which is exhibited when you use a client CLI program like psql to send SQL queries to the database server, which then gives you a syntax error (if your query was syntactically correct, it will be the data you asked for). SQLite's sqlite3 CLI looks similar but is not client-server: it reads the database file directly. The docker CLI also sends messages to a Docker daemon, which does the heavy lifting.

The client sends an HTTP request to the server, which responds with an HTTP response, a clear example of a client-server interaction. The client is the web browser sending the message on behalf of the user, and the server is a program called a web server running on a host computer. Requests and responses are both HTTP messages with headers and an optional body. They differ in their first line: a request starts with a method and a target (whose query string can carry URL parameters, while POST data travels in the body), and a response starts with a status code. There are also differences in which headers are required for a message to be valid.
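To make the shape of the two message types concrete, here is a small sketch that builds an HTTP/1.1 request as text and picks apart a response. It handles only the simplest well-formed messages and is not a real HTTP parser. The helper names and the example.com host are illustrative.

```python
def build_request(method: str, path: str, host: str, body: str = "") -> str:
    """Assemble an HTTP/1.1 request: request line, headers, blank line, body."""
    headers = [f"Host: {host}"]
    if body:
        headers.append(f"Content-Length: {len(body.encode())}")
    return f"{method} {path} HTTP/1.1\r\n" + "\r\n".join(headers) + "\r\n\r\n" + body


def parse_response(raw: str) -> tuple[int, dict[str, str], str]:
    """Split a response into status code, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(code), headers, body


print(repr(build_request("GET", "/?q=cats", "example.com")))
# 'GET /?q=cats HTTP/1.1\r\nHost: example.com\r\n\r\n'

raw_response = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>Hi</h1>"
print(parse_response(raw_response))
# (200, {'Content-Type': 'text/html'}, '<h1>Hi</h1>')
```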

The web server might be Nginx or Apache. A freshly generated Rails app also comes with a simple web server that you can use right away: after installing the rails gem and running rails new rails_app_name, the command bin/rails server starts it (Puma in Rails 5 and later; older versions used WEBrick). A web server like Apache can be used to serve static assets. Conventionally the asset served in response to a GET request for the root path / is an HTML file called index.html. Instead of serving static assets, the web server can pass requests on to an application server running the Ruby on Rails application logic.
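The index.html convention amounts to a small path-mapping rule. The sketch below shows one way a static file server might map a request path to a file under its document root. It is an illustration only, not how Nginx or Apache are implemented, and the function name is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_static(root: Path, request_path: str) -> Optional[Path]:
    """Map a request path to a file under root; "/" maps to index.html."""
    root = root.resolve()
    relative = request_path.lstrip("/") or "index.html"
    candidate = (root / relative).resolve()
    # Refuse paths that escape the document root (e.g. "/../secret").
    if root not in candidate.parents:
        return None
    return candidate if candidate.is_file() else None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "index.html").write_text("<h1>Home</h1>")
    print(resolve_static(root, "/").name)         # index.html
    print(resolve_static(root, "/missing.html"))  # None
    print(resolve_static(root, "/../secret"))     # None
```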

The web server listens on port 80 or port 443, depending on whether it's HTTP or HTTPS. It receives a message, which is an HTTP request, and passes it to the Ruby on Rails web application. The application then has an opportunity to send commands to the database and to make network calls, which are HTTP requests to other servers. This can be done synchronously or asynchronously. For example, the primary thread of the program could spawn a worker thread to send commands to the database and synchronously wait for it to complete. It could also send off a job to do the work asynchronously; it may only put the job in a queue of jobs that worker processes/threads pick up as they get capacity to. Be careful not to press any "virus attack" buttons that cause too many jobs to be enqueued at the same time. Network calls can also be made asynchronously with a design pattern like the reactor, which arranges for the response to be handled when it arrives later.
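The job-queue idea can be sketched with Python's standard queue and threading modules. This is an in-process toy, not how a Rails job backend is built: the "job" is just squaring a number, and the function name is hypothetical.

```python
import queue
import threading


def process_jobs(jobs: list[int], num_workers: int = 2) -> list[int]:
    """Push jobs onto a queue and let worker threads square each one."""
    job_queue: queue.Queue = queue.Queue()
    results: list[int] = []
    results_lock = threading.Lock()

    def worker() -> None:
        while True:
            job = job_queue.get()
            if job is None:  # sentinel: no more work for this worker
                job_queue.task_done()
                return
            with results_lock:
                results.append(job * job)  # stand-in for slow work
            job_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        job_queue.put(job)  # enqueue and move on (asynchronous)
    for _ in threads:
        job_queue.put(None)
    job_queue.join()  # block until every job is marked done
    for t in threads:
        t.join()
    return sorted(results)


print(process_jobs([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

The code that enqueues a job does not wait for that job to finish; only the final join() waits, and a web application would usually skip that and send its response right away.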

Anyway, all of that can happen, and many other things can happen too (like calls to language models, emails, and messages sent to client devices like mobile phones), and by now you have probably forgotten that after all this the server still needs to send the HTTP response to the client that sent it the HTTP request in the first place. The server would also have looked at things like the session data carried with the HTTP request (typically in a cookie), which could have stored an identifier for the user, and at the parameters in the URL, including the query string, and the POST data (the body of an HTTP request with the POST method).

The method of an HTTP request is an important thing to know about. GET requests are the standard ones your web browser sends when you click a hypertext link. Clicking the link causes a GET request to be sent to the server identified by that URL, which includes a domain name. The domain name identifies the host (a computer) by being a human-readable alias for the IP address of the host computer; DNS translates one into the other. Remember: IP enables host-to-host communication, and IP addresses are the way that the computers are identified. Inside the computer, the processes are identified by port numbers. TCP enables process-to-process communication. It belongs to the transport layer, which sits above the Internet layer, which in turn sits above the link layer and a physical layer that involves waves both electric and magnetic in nature depending on the motion of the reference frame (electromagnetic waves; see Griffiths's Introduction to Electrodynamics).
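The parts of a URL mentioned above (domain name, port, query string) can be pulled apart with Python's standard urllib.parse module. The describe_url helper and the example URL are illustrative. The default-port table reflects the standard ports 80 and 443 for HTTP and HTTPS.

```python
from urllib.parse import urlsplit, parse_qs

DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url: str) -> dict:
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # An explicit port in the URL wins; otherwise use the scheme default.
        "port": parts.port or DEFAULT_PORTS[parts.scheme],
        "path": parts.path,
        "query": parse_qs(parts.query),
    }


print(describe_url("https://example.com/search?q=tcp&page=2"))
# {'scheme': 'https', 'host': 'example.com', 'port': 443, 'path': '/search', 'query': {'q': ['tcp'], 'page': ['2']}}
```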

What is the World Wide Web?

The web is what you access using a web browser like Google Chrome or Microsoft Edge. It is all the websites that you get to see when you use Google. Google is by far the most dominant Internet search engine. It maintains an index, not unlike the index at the back of a textbook, of the web pages it has crawled. If you remember from reading a textbook, there is a section at the end that lists all of the key terms of the textbook in alphabetical order, each accompanied by the page numbers on which the term is discussed. Google's index is a similar data structure that allows Google to efficiently search this huge amount of information for only the things that are relevant to your search.

You will often find Google's index talked about in terms of SEO (Search Engine Optimization), which is the practice of trying to get your website ranked high in Google search results. You can also tell Google not to index your website. I use the Google Search Console to see how Google perceives my website. To do that, I just put an HTML tag in this website, which Google uses to verify that the site is mine before showing me the data it collects. They might find the data useful too, I'm not sure.

It is possible that a "retrieval structure" (in the sense used in the cognitive approach to memory: a deep understanding of an activity that allows someone to be really good at it, with fast and nuanced thinking, perceiving more, knowing and remembering more, and making complex decisions quickly) is a similar kind of thing to an index.
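The textbook-index analogy corresponds to a data structure usually called an inverted index: a map from each word to the pages that contain it. The sketch below is a toy version under heavy simplifying assumptions (whitespace tokenization, no ranking) and says nothing about how Google's actual index works. The page names and text are made up.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of page names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return pages containing every word in the query, sorted by name."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


pages = {
    "tcp.html": "tcp guarantees ordered delivery",
    "udp.html": "udp skips delivery guarantees",
    "ip.html": "ip is best effort",
}
index = build_index(pages)
print(search(index, "delivery guarantees"))  # ['tcp.html', 'udp.html']
print(search(index, "tcp delivery"))         # ['tcp.html']
```

A search looks up only the words in the query instead of scanning every page, which is what makes an index efficient.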

Ruby is a programming language that might also be called a scripting language. It is mainly used for rapid web application development (e.g., gluing a database to a website) with the Ruby on Rails framework. Ruby and Rails are used for the server-side (also referred to as back-end) application logic, which accepts the data submitted by the web browser (the user agent, acting as the user uses the website) and, after considering that data, interacts with the database and then sends a response that the browser renders as the website. The database is managed by PostgreSQL, an impressive open-source object-relational database management system. Explicit SQL statements are rarely used in the application logic here, because Rails provides an object-relational mapping tool called Active Record (apparently an implementation of the active record pattern), whose domain-specific language can be used instead.

Since the user interface is a website, the three main languages of front-end web development (HTML, CSS, and JavaScript) are being used. HTML describes the structure of the website and is pretty straightforward for the most part. CSS describes how the HTML is displayed and is concerned with colors, fonts, sizes, placement of things, and a lot of other things too. At times it may seem to encroach on what JavaScript is concerned with, which is client-side interactivity: programming where the code executes in the browser without a round-trip message to the server (although JavaScript can make requests to servers too). The flashcards on this page use a little JavaScript in this way: clicking or tapping on them hides or reveals the back. These fundamental technologies of the web are enhanced and built upon by additional tools and frameworks. For example, the HTML is created on the server from ERB (Embedded Ruby) templates, almost all of the CSS classes are provided by a framework called Tailwind CSS, and the JavaScript is mostly written inside a modest JavaScript framework called Stimulus. Stimulus is one part of a larger framework called Hotwire, along with Turbo, which is also being used here but is more related to HTML.

Linux

Some videos about Linux: