| title | Unit 2: The Web and HTTP |
|---|---|
| id | unit2-topic2 |
| tags | |
| aliases | |

Now that we understand the core principles of network applications, let's examine the most well-known application of all: the World Wide Web and its protocol, HTTP.
[!note] Definition: The World Wide Web consists of documents and other web resources which are identified by URLs. HTTP (HyperText Transfer Protocol) is the application-layer protocol that enables communication between web browsers and web servers to exchange these resources.
HTTP is the heart of the web. It follows a client-server model and uses TCP as its transport protocol, because a web page loads correctly only if every byte arrives intact and in order. A critical concept is that HTTP is stateless: the server maintains no information about past client requests.
How HTTP uses TCP connections has evolved over time.
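- Non-Persistent HTTP (HTTP/1.0): A new TCP connection is opened for every object on a webpage (the HTML file, each image, etc.) and closed once that object is delivered. Each object therefore costs roughly two round-trip times (one for the TCP handshake, one for the request and response) plus its transmission time, so pages with many objects load slowly.
- Persistent HTTP (HTTP/1.1): The server leaves the TCP connection open after sending a response, and the browser reuses it to request further objects. The handshake cost is paid once per connection rather than once per object, which substantially reduces latency.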
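To make the difference concrete, here is a rough back-of-the-envelope sketch. It assumes objects are fetched one after another (no parallel connections, no pipelining) and uses the common textbook approximation of one RTT for the TCP handshake plus one RTT per request and response. The function names and the example page (11 objects, 100 ms RTT) are made up for illustration.

```python
def non_persistent_time(num_objects: int, rtt: float, transmit: float = 0.0) -> float:
    """Serial non-persistent HTTP: every object pays one RTT for the TCP
    handshake and one RTT for the request/response, plus transmission time."""
    return num_objects * (2 * rtt + transmit)


def persistent_time(num_objects: int, rtt: float, transmit: float = 0.0) -> float:
    """Persistent HTTP without pipelining: one handshake RTT in total,
    then one RTT (plus transmission time) per object."""
    return rtt + num_objects * (rtt + transmit)


# Hypothetical page: 1 HTML file + 10 images, RTT of 100 ms,
# transmission time ignored.
objects, rtt = 11, 0.100
print(f"non-persistent: {non_persistent_time(objects, rtt):.1f} s")
print(f"persistent:     {persistent_time(objects, rtt):.1f} s")
```

Under these assumptions the persistent connection saves one RTT per object after the first. Real browsers also open several connections in parallel, which this sketch deliberately ignores.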
HTTP communication consists of two types of messages: requests and responses.
The request message is a human-readable ASCII message sent by the client.
```http
GET /path/to/file/index.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0
...
```
- Request Line: Contains the Method, the URL, and the HTTP version.
- Header Lines: Provide additional information to the server.
- Entity Body: Contains data being sent to the server (e.g., from a form). It is normally empty for GET requests.
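The three parts listed above can be pulled apart with a few lines of string handling. This is a minimal sketch, not a full HTTP parser: `parse_request` is a hypothetical helper, and it assumes lines end in `\r\n` and that a blank line separates the headers from the body, as in HTTP/1.x.

```python
def parse_request(raw: str):
    """Split a raw HTTP/1.x request into request line, headers, and body."""
    # A blank line (CRLF CRLF) marks the end of the header section.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Request line: method, URL, HTTP version.
    method, url, version = lines[0].split(" ", 2)

    # Header lines: "Name: value". Header names are case-insensitive,
    # so they are stored in lowercase.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return method, url, version, headers, body


raw = (
    "GET /path/to/file/index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: Mozilla/5.0\r\n"
    "\r\n"
)
method, url, version, headers, body = parse_request(raw)
print(method, url, version)
print(headers["host"])
print(repr(body))  # empty: this GET carries no entity body
```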
| Method | Purpose |
|---|---|
| GET | Requests a representation of the specified resource. This is the most common method. |
| POST | Submits data to the specified resource for processing (e.g., submitting a form or uploading a file). |
| HEAD | Identical to GET, but the server does not send the message body in the response. Useful for checking if a resource exists and getting its metadata. |
| PUT | Uploads a representation of the specified resource. Can be used to create or overwrite a file on the server. |
| DELETE | Deletes the specified resource. |
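The difference between GET and HEAD is easy to observe against a throwaway local server. The sketch below uses only Python's standard library (`http.server` and `http.client`); `DemoHandler`, `PAGE`, and `fetch` are names invented for this example, and the server is a toy that answers every path with the same page.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html>hello</html>"


class DemoHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)  # GET: headers followed by the body

    def do_HEAD(self):
        self._send_headers()    # HEAD: same headers, no body

    def log_message(self, *args):
        pass                    # keep the demo quiet


def fetch(method: str, port: int):
    """Send one request and return (status, Content-Length, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/")
    resp = conn.getresponse()
    body = resp.read()
    result = (resp.status, resp.getheader("Content-Length"), body)
    conn.close()
    return result


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

get_result = fetch("GET", port)
head_result = fetch("HEAD", port)
server.shutdown()
server.server_close()

print("GET :", get_result)
print("HEAD:", head_result)
```

Both responses carry the same status and `Content-Length` header, but only the GET response includes the page itself.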
The response message is sent back by the server.
```http
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1234
...
<html>...</html>
```
- Status Line: Contains the version, a status code, and a status message (e.g., `200 OK`, `404 Not Found`).
- Header Lines: Provide information about the response.
- Entity Body: The actual content of the requested object.
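As a small illustration of the status line, the sketch below splits it into its three fields and groups the code by its first digit. `parse_status_line`, `status_class`, and the `CLASSES` table are illustrative names; `http.HTTPStatus` is Python's standard-library enumeration of status codes and is used here only to look up the standard reason phrase.

```python
from http import HTTPStatus

# The first digit of a status code identifies its general class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str):
    """Split 'HTTP/1.1 404 Not Found' into (version, code, reason)."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def status_class(code: int) -> str:
    return CLASSES[code // 100]


for line in ("HTTP/1.1 200 OK", "HTTP/1.1 404 Not Found"):
    version, code, reason = parse_status_line(line)
    print(code, reason, "->", status_class(code),
          "| standard phrase:", HTTPStatus(code).phrase)
```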
Since HTTP is stateless, cookies are used to allow servers to maintain session information.
- The server includes a `Set-Cookie:` header in its response with a unique ID.
- The browser stores this cookie.
- The browser includes this cookie in the `Cookie:` header of all future requests to that same server.
- The server uses the ID to retrieve session information (e.g., shopping cart items, login status).
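The four steps above can be simulated in memory without any networking. This is a toy model under stated assumptions: `server_handle`, `Browser`, the `session_id` cookie name, and the shopping-cart session are all invented for illustration, and headers are passed around as plain dictionaries. Only `http.cookies.SimpleCookie` (used to parse the `Cookie` header) and `uuid` come from the standard library.

```python
import uuid
from http.cookies import SimpleCookie

sessions = {}  # server-side state, keyed by session ID


def server_handle(request_headers: dict, item=None):
    """Toy server: returns (response_headers, cart contents)."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    response_headers = {}
    if "session_id" in cookie and cookie["session_id"].value in sessions:
        # Step 4: use the ID to look up the existing session.
        sid = cookie["session_id"].value
    else:
        # Step 1: first visit, so create a session and send Set-Cookie.
        sid = uuid.uuid4().hex
        sessions[sid] = {"cart": []}
        response_headers["Set-Cookie"] = f"session_id={sid}"
    if item is not None:
        sessions[sid]["cart"].append(item)
    return response_headers, list(sessions[sid]["cart"])


class Browser:
    """Toy browser that remembers one cookie for one server."""

    def __init__(self):
        self.cookie = None

    def send(self, item=None):
        headers = {}
        if self.cookie:
            headers["Cookie"] = self.cookie  # Step 3: send it back
        response_headers, cart = server_handle(headers, item)
        if "Set-Cookie" in response_headers:
            self.cookie = response_headers["Set-Cookie"]  # Step 2: store it
        return response_headers, cart


alice = Browser()
first_headers, _ = alice.send("book")
second_headers, cart = alice.send("pen")
print("Set-Cookie on first response: ", "Set-Cookie" in first_headers)
print("Set-Cookie on second response:", "Set-Cookie" in second_headers)
print("cart:", cart)
```

The protocol itself stays stateless: each request is self-contained, and the only thing tying the two requests together is the ID that the browser echoes back.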
A web cache is a server that satisfies HTTP requests on behalf of an origin server. It stores copies of recently requested objects.
- Benefits: Reduces response time for clients and reduces traffic on an institution's access link.
- Conditional GET: Caches use the `If-Modified-Since` request header to ask the server if the cached object is still valid. If it is, the server sends back a small `304 Not Modified` response, saving bandwidth.
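A conditional GET can be sketched as a single function on the origin-server side. The code below is a simplified model, not a real server: `origin_server` is a hypothetical function, headers are plain dictionaries, and the dates are arbitrary. The date helpers `format_datetime` and `parsedate_to_datetime` are from Python's standard `email.utils` module, which reads and writes the date format used in HTTP headers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def origin_server(request_headers: dict, last_modified: datetime, body: bytes):
    """Toy origin server: returns (status, headers, body)."""
    ims = request_headers.get("If-Modified-Since")
    if ims is not None and last_modified <= parsedate_to_datetime(ims):
        # The cached copy is still fresh: no body is sent.
        return 304, {}, b""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    return 200, headers, body


modified = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# First request: the cache has nothing, so it receives the full object
# and remembers the Last-Modified value.
status1, headers1, body1 = origin_server({}, modified, b"<html>v1</html>")
cached = {"body": body1, "last_modified": headers1["Last-Modified"]}

# Revalidation: the cache asks whether its copy is still valid.
conditional = {"If-Modified-Since": cached["last_modified"]}
status2, _, body2 = origin_server(conditional, modified, b"<html>v1</html>")

# Later the object changes on the server, so the same question now
# gets a full 200 response with the new content.
newer = datetime(2025, 1, 2, 9, 30, 0, tzinfo=timezone.utc)
status3, _, body3 = origin_server(conditional, newer, b"<html>v2</html>")

print(status1, status2, status3)
```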
You will notice that most websites today use https:// instead of http://.
- What it is: HTTPS is not a separate protocol. It is standard HTTP layered on top of TLS (Transport Layer Security). TLS is a security protocol that runs on top of TCP.
- Purpose: TLS provides the key security services that plain HTTP lacks:
  - Confidentiality: It encrypts all HTTP message content, protecting it from eavesdroppers (packet sniffers).
  - Integrity & Authentication: It ensures that the client is talking to the real, authentic server (preventing man-in-the-middle attacks) and that the messages have not been altered in transit.
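The layering can be seen directly in code: the HTTP request is the same text as before, but it is written into a TLS-wrapped TCP socket instead of a bare one. The sketch below uses Python's standard `socket` and `ssl` modules; `make_tls_context` and `https_get` are hypothetical helper names, and `https_get` is defined but not called here because it needs network access.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    # The default context verifies the server's certificate chain and
    # checks that the certificate matches the hostname.
    return ssl.create_default_context()


def https_get(host: str, path: str = "/", port: int = 443) -> bytes:
    """Send a plain HTTP/1.1 GET through a TLS-protected TCP connection."""
    context = make_tls_context()
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    with socket.create_connection((host, port), timeout=10) as tcp:
        # TLS runs on top of the TCP connection.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            # The HTTP message is unchanged; TLS encrypts it on the wire.
            tls.sendall(request.encode("ascii"))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)


context = make_tls_context()
print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
print("hostname checked:    ", context.check_hostname)
```

Calling `https_get("www.example.com")` on a machine with internet access should return the raw response bytes, starting with the status line.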
- 2-Mark Questions:
  - Is HTTP a stateful or stateless protocol? (Stateless).
  - What is the purpose of cookies? (To add state to HTTP).
  - What is the difference between HTTP and HTTPS? (HTTPS is HTTP over an encrypted TLS session).
- 10-Mark Question:
  - "Explain the format of HTTP request and response messages, including common methods and status codes. Then, describe how the stateless nature of HTTP is overcome using cookies."
    - Tip: Show the structure of both message types. Include a small list of methods (GET/POST). Then detail the 4-step cookie process: 1) Server sends Set-Cookie, 2) Browser stores it, 3) Browser sends it back in future requests, 4) Server uses it to identify the session.