This is the start of our new blog series. I plan to keep it simple, and we will call it the Web Application Pentesting series. Websites and APIs are everywhere around us, from daily money transactions to streaming services, media, and music. Web application testing has therefore become one of the most valued skills, and since it has been quite a long time, let’s get started. The first blog in the series covers HTTP.
Some of the basic pre-requisites needed to follow along with this blog are:
- Familiarity with Linux
- Willingness to learn, and to search the web whenever and wherever you get stuck
1. Basic Introduction to HTTP and URL
Introduction to HTTP
If you ever had computer classes during middle school or high school, your teacher probably taught you at least once that http stands for Hypertext Transfer Protocol, and that https stands for the same thing, where the s stands for secure. They might even have pointed out the URL scheme, which puts https:// at the start of the website’s domain. Understanding http is really important because the websites we see and use on a daily basis communicate via the http protocol. Hence, in order to tamper with or manipulate websites, we should understand http at least at a surface level.
Now, if you are interested in reading about the evolution of the internet and the web, feel free to check out this article. In the forthcoming paragraphs we will use http interchangeably with “the http protocol”. Note that a protocol is a set of agreed-upon standards, and this particular set of standards is widely used by us, by websites, and by web servers to communicate and send data across the web.
HTTP has undergone significant transformations since its early days:
HTTP/0.9: The first version of HTTP, released in 1991. Very simple: the only method allowed was GET, with no headers or response codes. Here’s an example of a typical request.
GET /something.txt
HTTP/1.0: Released in 1996, this version introduced status codes, headers, and more complex interactions. A simple request in this version might look like this:
GET /something/webpage.php HTTP/1.0
Host: www.example.com
HTTP/1.1: First specified in 1997 and updated in 1999, with new features like persistent connections, chunked transfer encoding, and cleaner header management. A sample request might look like:
POST /submit HTTP/1.1
Host: www.example.com
User-Agent: MyBrowser/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 16
name=John&age=30
HTTP/2 and HTTP/3: Modern iterations focus on performance and security.
HTTP/2 leverages:
Multiplexing (multiplexing allows your browser to fire off multiple requests at once over the same connection and receive the responses back in any order) and
Binary framing (the layer that makes all other features and performance optimizations in HTTP/2 possible, and which dictates how HTTP messages are wrapped and transported over the wire from the client to the server, and vice versa).
While HTTP/3 uses QUIC (built on UDP) for reduced latency and improved resilience.
Learn more about the new QUIC protocol here and here.
As of today, HTTP/2 and HTTP/3 are the recommended standards due to their enhanced speed and efficiency.
References: RFC 9113 (HTTP/2) and RFC 9114 (HTTP/3).
Introduction to URL
Pentesting and exploiting misconfigurations starts with understanding URLs. A URL consists of the following parts:
protocol://username:password@host:port/path?query#fragment
Protocol: HTTP, HTTPS, FTP, SMB, etc.
Host: Domain or IP (e.g., example.com).
Port: Defaults to 80 for HTTP and 443 for HTTPS, but can vary.
Path: The resource location (e.g., /index.html).
Query: Key-value pairs (e.g., ?id=1&name=test).
Fragment: Section within the resource (e.g., #section1).
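If you want to see these parts programmatically, Python’s standard urllib.parse module splits a URL into exactly these components. A quick sketch (the URL below is made up purely for illustration):

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL containing every component from the list above.
url = "https://alice:s3cret@example.com:8443/index.html?id=1&name=test#section1"
parts = urlparse(url)

print(parts.scheme)    # protocol -> "https"
print(parts.username)  # "alice"
print(parts.password)  # "s3cret"
print(parts.hostname)  # "example.com"
print(parts.port)      # 8443
print(parts.path)      # "/index.html"
print(parts.query)     # "id=1&name=test"
print(parts.fragment)  # "section1"
print(parse_qs(parts.query))  # {'id': ['1'], 'name': ['test']}
```

Being able to decompose a URL like this is handy when you are hunting for injectable parameters or odd schemes during a test.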
Exploiting URL Schemes in CTFs
If this is the first time you have heard the word CTF, no worries: a CTF, or Capture the Flag, is a cybersecurity contest where you are given tasks that require you to think critically and analytically and apply various cybersecurity concepts to solve them. Once you have correctly solved a task, you get a value called a flag, which earns you points. The best places to get started with CTFs are TryHackMe and HackTheBox; feel free to look these websites up.
Certain protocols like FTP (File Transfer Protocol, used to share files) or SMB (Server Message Block, used to share resources remotely) embedded in URLs can be abused to retrieve files or gain unauthorized access:
FTP Authentication:
ftp://username:password@ftp.example.com/file.txt
HackTheBox has a cool machine called Forge, in which we need to retrieve files from FTP via an HTTP URL. This can be achieved by crafting a URL like this:
http://admin.forge.htb/upload?u=ftp://user:heightofsecurity123!@127.0.0.1/
SMB Authentication:
smb://username:password@share.example.com/resource
Such manipulations are invaluable in CTF challenges, often leading to flag discovery.
2. HTTP Requests and Responses
HTTP operates via a request-response model. Note that HTTP is a request-response protocol built on TCP/IP. A typical session involves a client request and a server response, which look somewhat like this:
- Client Request:
GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: curl/7.85.0
Accept: text/html
- Server Response:
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1256
<html>...</html>
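To make the request-response cycle concrete, here is a small Python sketch that starts the standard library’s built-in file server locally and talks to it over a raw TCP socket, so you can see the literal bytes that travel over the wire. The handler choice and headers are illustrative, not a fixed recipe:

```python
import socket
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, *args):  # keep the demo output clean
        pass

# Port 0 asks the OS for any free port, so the demo never collides.
server = HTTPServer(("127.0.0.1", 0), QuietHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# The same shape of request shown above, written out byte for byte.
request = (
    f"GET / HTTP/1.1\r\n"
    f"Host: 127.0.0.1:{port}\r\n"
    "User-Agent: raw-socket-demo\r\n"
    "Accept: text/html\r\n"
    "Connection: close\r\n"
    "\r\n"
)

with socket.create_connection(("127.0.0.1", port)) as sock:
    sock.sendall(request.encode())
    response = b""
    while chunk := sock.recv(4096):
        response += chunk

# Print just the status line and headers, like the server response above.
print(response.decode(errors="replace").split("\r\n\r\n")[0])
server.shutdown()
```

This is exactly what curl and your browser do for you behind the scenes: open a TCP connection, write a text request, and read back a status line, headers, and body.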
Do note that you can view these requests and responses via:
- Intercepting proxies like Burp Suite, ZAP, or Caido
- Command-line tools like curl
- Browser dev tools
To watch these requests live, if you are a Firefox user check out this YouTube tutorial, and if you use Chrome / Chromium, check out this tutorial.
Key features in modern HTTP requests and responses are:
Headers: Metadata like Content-Type, Authorization, and Accept-Encoding.
Persistent Connections: Keeps TCP connections open for multiple requests (enabled by default in HTTP/1.1).
Chunked Transfers: Allows streaming of large responses in smaller, manageable chunks.
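Chunked transfer encoding is easy to see in miniature: each chunk is prefixed with its size in hexadecimal, and a zero-length chunk marks the end of the body. Here is a small Python sketch of the decoding logic (the wire bytes are a classic illustrative example, not captured traffic):

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP chunked-encoded body: hex size line, chunk, repeat."""
    body, rest = b"", raw
    while True:
        size_line, rest = rest.split(b"\r\n", 1)
        size = int(size_line, 16)          # chunk size is hexadecimal
        if size == 0:                      # zero-length chunk ends the body
            return body
        body += rest[:size]
        rest = rest[size + 2:]             # skip the chunk's trailing CRLF

# "Wikipedia" sent as a 4-byte chunk, a 5-byte chunk, then the terminator.
wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(wire))  # b'Wikipedia'
```

In practice your HTTP library handles this for you; the point is that the server can start streaming chunks before it knows the total response length.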
Before we go to the next section, make sure you are comfortable with everything we have covered so far. If you are ready, proceed to the next paragraph; otherwise, re-read the blog and follow the external links and references to reinforce what we have discussed.
Now, HTTP requests have certain ways of communicating with the server, called HTTP methods, and the server answers these methods with certain codes, called HTTP response or status codes.
HTTP Methods
HTTP methods are just like commands you give the server to tell it what you want to do. Imagine the server is like a librarian, and these methods are the requests you make:
GET:
- Retrieves information from the server (it doesn’t change anything on the server).
- Example: Fetching user data or opening a webpage.
POST:
- Creates or processes data on the server (you send data to the server for processing).
- Example: Submitting a login form or creating a new user.
PUT:
- Creates or fully replaces a resource.
- Example: Updating a user’s email address or profile picture.
DELETE:
- Deletes a resource (used cautiously, because it really does delete data).
- Example: Deleting a user account.
HEAD:
- Retrieves headers only, without downloading the body. Useful for validating a resource without transferring its data.
- Example: Checking whether a webpage has changed by looking at its metadata.
OPTIONS:
- Lists server capabilities.
- Example: Checking whether the server supports CORS (Cross-Origin Resource Sharing).
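To see these methods acting on a resource, here is a toy Python sketch: a tiny in-process server that keeps “users” in a dict, exercised with GET, POST, and DELETE via http.client. The endpoint paths and names are made up for the demo:

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

users = {"1": "John"}  # our in-memory "database"

class UserHandler(BaseHTTPRequestHandler):
    def _reply(self, code, body):
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):     # read: return the current users
        self._reply(200, users)

    def do_POST(self):    # create: add a user from the request body
        length = int(self.headers["Content-Length"])
        users[str(len(users) + 1)] = self.rfile.read(length).decode()
        self._reply(201, users)

    def do_DELETE(self):  # delete: remove the id named in the path, e.g. /users/1
        users.pop(self.path.rsplit("/", 1)[-1], None)
        self._reply(200, users)

    def log_message(self, *args):  # keep the demo output clean
        pass

server = HTTPServer(("127.0.0.1", 0), UserHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def call(method, path, body=None):
    conn = HTTPConnection("127.0.0.1", port)
    conn.request(method, path, body=body)
    resp = conn.getresponse()
    result = (resp.status, resp.read().decode())
    conn.close()
    return result

print(call("GET", "/users"))            # (200, '{"1": "John"}')
print(call("POST", "/users", "Alice"))  # 201: John and Alice both exist now
print(call("DELETE", "/users/1"))       # 200: John is gone
server.shutdown()
```

Notice how the method alone tells the “librarian” what to do with the same resource path: read it, add to it, or remove from it.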
HTTP Status Codes
HTTP status codes are the librarian’s replies to your requests. They’re grouped into categories:
1xx Informational: “I’m working on it.”
- Example: The server has received your request but hasn’t finished responding yet.
2xx Success: “Everything’s fine!”
- Example:
200 OK — Your GET or POST request succeeded.
201 Created — Your new user account was successfully created (POST).
Thanks to HTTP status codes and error pages, a server can state not only that a request succeeded but also what went wrong, so a client can react differently depending on the code an API returns.
3xx Redirection: “Go somewhere else.”
- Example: The webpage you want has moved to a new address, and the server tells your browser where to find it.
- A temporary (not permanent) redirect to another URL returns a 302 Found.
4xx Client Errors: “You made a mistake.”
- Example:
400 Bad Request — You sent a bad or malformed request.
401 Unauthorized — You have to log in first.
- If the resource you’re looking for doesn’t exist, you get a 404 Not Found. A 404 means the URL you entered is a dead end: the page doesn’t exist, you typed the URL incorrectly, or the server has removed it.
5xx Server Errors: “I made a mistake.”
- Example: The server couldn’t handle your request.
503 Service Unavailable — The server is down or overloaded.
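Python’s standard library already knows the official reason phrases for these codes, which makes it easy to play with the categories above. A quick sketch, with a small helper that buckets a code into its 1xx-5xx group:

```python
from http import HTTPStatus

def category(code: int) -> str:
    """Map a status code to its class, based on the leading digit."""
    return {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }[code // 100]

for code in (200, 201, 302, 400, 401, 404, 503):
    print(code, HTTPStatus(code).phrase, "->", category(code))
# 200 OK -> success
# 201 Created -> success
# 302 Found -> redirection
# ...
```

This is also a handy reference when a tool reports only the numeric code.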
3. Command Your Web: Mastering cURL and wget
HTTP is the basis of the web, and tools such as curl and wget give you direct power to talk to servers. These tricks will help you move faster and with fewer surprises, whether you’re testing APIs, downloading files, or mirroring websites.
Using curl
curl (Client URL) is a command-line tool for sending HTTP requests. It supports protocols including HTTP, HTTPS, FTP, and others, which makes it perfect for debugging and testing.
Basic Commands
Basic GET Request
Retrieve a webpage or resource:
curl https://example.com
This command fetches the HTML content of the specified URL and prints it in the terminal. It is great for checking whether a resource is reachable.
Inspecting Headers
Fetch only the HTTP response headers with the -I (uppercase “i”) option:
curl -I https://example.com
Headers are metadata such as the content type, server information, and status code; they help you understand how the server responds to a request.
Example output:
HTTP/2 200
Content-Type: text/html; charset=UTF-8
Submitting Form Data
Simulate form submissions by sending data with the -d option in a POST request:
curl -X POST -d "username=admin&password=1234" https://example.com/login
What is happening here? curl sends a POST request whose body is the URL-encoded form data; unless told otherwise, it sets the Content-Type header to application/x-www-form-urlencoded, just as a browser does when submitting a form.
Authentication
Handle HTTP Basic Authentication using the -u flag:
curl -u admin:password https://example.com/secure
This is useful for fetching protected resources you are authorized to access, such as admin dashboards or APIs that require credentials. By default, the username and password are Base64 encoded and sent in the Authorization header.
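Here is what curl -u does under the hood: Basic auth is just “username:password” Base64-encoded into the Authorization header. A short Python sketch with made-up credentials, which also shows why this is encoding, not encryption:

```python
import base64

username, password = "admin", "password"  # example credentials, not real ones
token = base64.b64encode(f"{username}:{password}".encode()).decode()
header = f"Authorization: Basic {token}"

print(header)  # Authorization: Basic YWRtaW46cGFzc3dvcmQ=

# Anyone who sees the header can trivially reverse it:
print(base64.b64decode(token).decode())  # admin:password
```

This is why Basic auth over plain HTTP is a finding in itself: a sniffed header hands over the credentials directly.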
Advanced Features
Handling Redirects
Use the -L flag to follow redirects:
curl -L https://example.com
curl follows the redirect produced by a 3xx status code and retrieves the final resource.
Debugging Using Verbose Mode
Add the -v option for detailed debugging:
curl -v https://example.com
Verbose mode shows details of the complete exchange: request headers, response headers, connection information, and the rest of it.
Downloading Files
Save a file to your system with -O (keep the remote filename) or -o (choose a custom filename):
curl -O https://example.com/file.zip
Using wget
wget is a non-interactive tool intended for downloading files or entire directories. It shines in scenarios where downloads need to run unattended, sequentially, or recursively.
Basic Commands
- Downloading a File
wget https://example.com/file.zip
wget fetches the file and saves it into your working directory under its original name.
Recursive Downloads
Mirror an entire website or directory:
wget --recursive --no-parent https://example.com/docs/
--recursive: Downloads files contained in the specified directory.
--no-parent: Prevents wget from ascending to the parent directory, restricting the download to the specified path.
Use case: Archiving web pages or collecting all files in a directory.
Advanced Features
- Spiders and Testing URLs
Use --spider with the -S flag to check if a URL exists without downloading its content:
wget --spider -S https://example.com
This is useful for verifying the existence of a resource or testing URLs in bulk. The -S flag prints the server’s response headers, so you can see the status code for each URL.
- Downloading Entire Websites
wget --mirror https://example.com
--mirror: Mirrors the website, preserving directory structure and timestamps.
- Custom Headers
While wget isn’t as flexible as curl for API testing, you can still send custom headers using --header:
wget --header="Authorization: Bearer token" https://example.com/resource
Key Differences Between cURL and wget
When to Use Each Tool
- Use cURL if you’re:
- Building or testing APIs.
- Debugging HTTP headers, cookies, or authentication workflows.
- Automating web tasks like form submissions or API interactions.
- Use wget if you’re:
- A researcher or sysadmin archiving web data.
- Downloading entire directories or websites recursively.
- Resuming interrupted downloads where you left off.
What Are APIs and how to test them with curl and wget?
Application Programming Interfaces (APIs) allow two software applications to talk to each other. They define a set of rules for interacting with a service, whether that’s sending data, receiving a response, or triggering an action. RESTful APIs are a common example: they use HTTP methods such as GET, POST, PUT, and DELETE to manage resources.
Don’t worry, it’s trivial with both curl and wget to play along and understand what an API is doing. Check out the Fastly article on using curl to test an origin server’s response and the DigitalOcean tutorial on using wget with REST APIs, both listed in the references below.
References:
[1] https://stackoverflow.com/questions/38906626/curl-to-return-http-status-code-along-with-the-response
[2] https://www.fastly.com/blog/anatomy-of-a-curl-how-to-use-curl-to-test-an-origin-servers-response
[3] https://www.warp.dev/terminus/curl-post-request
[4] https://www.digitalocean.com/community/tutorials/how-to-use-wget-to-download-files-and-interact-with-rest-apis
[5] https://stackoverflow.com/questions/6264726/can-i-use-wget-to-check-but-not-download
[6] https://www.geeksforgeeks.org/wget-command-in-linux-unix/
[7] https://terminalcheatsheet.com/guides/curl-rest-api
[8] https://www.hostinger.in/tutorials/wget-command-examples/
[9] https://www.gnu.org/software/wget/manual/wget.html