This is the start of our new blog series. I plan to keep it simple, and we will call it the Web Application Pentesting series. Websites and APIs are everywhere around us, from daily money transactions to streaming services, media, and music. Web application testing has therefore become one of the most valued skills, and since it has been quite a long time, let’s get started. The first blog in the series covers HTTP.
Some of the basic pre-requisites needed to follow along with this blog are:
- Familiarity with Linux
- Willingness to learn, and to search the web whenever and wherever you get stuck
1. Basic Introduction to HTTP and URL
Introduction to HTTP
If you ever had computer classes during middle school or high school, your teacher probably taught you at least once that http stands for Hypertext Transfer Protocol, and that https stands for the same thing, where the s stands for secure. They might even have pointed out the URL scheme, which puts https:// at the start of the website’s domain. Understanding http is really important because the websites we see and use on a daily basis communicate via the http protocol. Hence, in order to tamper with or manipulate websites, we should understand http at least at a surface level.
Now, if you are interested in reading about the evolution of the internet and the web, feel free to check out this article. In the forthcoming paragraphs we will use http interchangeably with “the http protocol”. Note that a protocol is a set of agreed-upon standards, and this particular set of standards is widely used by us, by websites, and by web servers to communicate and send data across the web.
HTTP has undergone significant transformations since its early days:
HTTP/0.9: The first version of HTTP, released in 1991. Very simple: the only method allowed was GET, with no headers or response codes. Here’s an example of a typical request.
GET /something.txt
HTTP/1.0: Released in 1996, this version introduced status codes, headers, and more complex interactions. A simple request in this version might look like this:
GET /something/webpage.php HTTP/1.0
Host: www.example.com
HTTP/1.1: First specified in 1997 and updated in 1999, with new features like persistent connections, chunked transfer encoding, and cleaner header management. A sample request might look like:
POST /submit HTTP/1.1
Host: www.example.com
User-Agent: MyBrowser/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 16
name=John&age=30
HTTP/2 and HTTP/3: Modern iterations focus on performance and security.
HTTP/2 leverages:
Multiplexing (multiplexing allows your browser to fire off multiple requests at once over the same connection and receive the responses back in any order) and
Binary framing (the layer that makes all other features and performance optimizations in HTTP/2 possible, and which dictates how HTTP messages are wrapped and transported over the wire from the client to the server, and vice versa).
While HTTP/3 uses QUIC (built on UDP) for reduced latency and improved resilience.
Learn more about the new QUIC protocol here and here.
As of today, HTTP/2 and HTTP/3 are the recommended standards due to their enhanced speed and efficiency.
References: RFC 9113 (HTTP/2) and RFC 9114 (HTTP/3).
Introduction to URL
Pentesting and exploiting misconfigurations starts with understanding URLs. A URL consists of the following parts:
protocol://username:password@host:port/path?query#fragment
Protocol: HTTP, HTTPS, FTP, SMB, etc.
Host: Domain or IP (e.g., example.com).
Port: Defaults to 80 for HTTP and 443 for HTTPS, but can vary.
Path: The resource location (e.g., /index.html).
Query: Key-value pairs (e.g., ?id=1&name=test).
Fragment: Section within the resource (e.g., #section1).
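If you want to see these parts programmatically, Python’s standard urllib.parse module splits a URL into exactly these components. A quick sketch (the URL below is made up purely for illustration):

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL containing every component from the list above.
url = "https://alice:s3cret@example.com:8443/index.html?id=1&name=test#section1"
parts = urlparse(url)

print(parts.scheme)    # protocol -> "https"
print(parts.username)  # "alice"
print(parts.password)  # "s3cret"
print(parts.hostname)  # "example.com"
print(parts.port)      # 8443
print(parts.path)      # "/index.html"
print(parts.query)     # "id=1&name=test"
print(parts.fragment)  # "section1"
print(parse_qs(parts.query))  # {'id': ['1'], 'name': ['test']}
```

Being able to decompose a URL like this is handy when you are hunting for injectable parameters or odd schemes during a test.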
Exploiting URL Schemes in CTFs
If this is the first time you have heard the word CTF, no worries: a CTF, or Capture the Flag, is a cybersecurity contest where you are given tasks that require you to think critically and analytically and apply various cybersecurity concepts to solve them. Once you have correctly solved a task, you get a value called a flag, which earns you points. The best places to get started with CTFs are TryHackMe and HackTheBox; feel free to look these websites up.
Certain protocols like FTP (File Transfer Protocol, used to share files) or SMB (Server Message Block, used to share resources remotely) embedded in URLs can be abused to retrieve files or gain unauthorized access:
FTP Authentication:
ftp://username:password@ftp.example.com/file.txt
HackTheBox has a cool machine called Forge, in which we need to retrieve files from FTP via an HTTP URL. This can be achieved by crafting a URL like this:
http://admin.forge.htb/upload?u=ftp://user:heightofsecurity123!@127.0.0.1/
SMB Authentication:
smb://username:password@share.example.com/resource
Such manipulations are invaluable in CTF challenges, often leading to flag discovery.
2. HTTP Requests and Responses
HTTP operates via a request-response model. Note that HTTP is a request-response protocol built on TCP/IP. A typical session involves a client request and a server response, which look somewhat like this:
- Client Request:
GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: curl/7.85.0
Accept: text/html
- Server Response:
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1256
<html>...</html>
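To make the request-response cycle concrete, here is a small Python sketch that starts the standard library’s built-in file server locally and talks to it over a raw TCP socket, so you can see the literal bytes that travel over the wire. The handler choice and headers are illustrative, not a fixed recipe:

```python
import socket
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

class QuietHandler(SimpleHTTPRequestHandler):
    def log_message(self, *args):  # keep the demo output clean
        pass

# Port 0 asks the OS for any free port, so the demo never collides.
server = HTTPServer(("127.0.0.1", 0), QuietHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# The same shape of request shown above, written out byte for byte.
request = (
    f"GET / HTTP/1.1\r\n"
    f"Host: 127.0.0.1:{port}\r\n"
    "User-Agent: raw-socket-demo\r\n"
    "Accept: text/html\r\n"
    "Connection: close\r\n"
    "\r\n"
)

with socket.create_connection(("127.0.0.1", port)) as sock:
    sock.sendall(request.encode())
    response = b""
    while chunk := sock.recv(4096):
        response += chunk

# Print just the status line and headers, like the server response above.
print(response.decode(errors="replace").split("\r\n\r\n")[0])
server.shutdown()
```

This is exactly what curl and your browser do for you behind the scenes: open a TCP connection, write a text request, and read back a status line, headers, and body.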
Do note that you can view these requests and responses via:
- Intercepting proxies like Burp Suite, ZAP, or Caido
- Command-line tools like curl
- Browser dev tools
To watch these requests live, if you are a Firefox user check out this YouTube tutorial, and if you use Chrome / Chromium, check out this tutorial.
Key features in modern HTTP requests and responses are:
Headers: Metadata like Content-Type, Authorization, and Accept-Encoding.
Persistent Connections: Keeps TCP connections open for multiple requests (enabled by default in HTTP/1.1).
Chunked Transfers: Allows streaming of large responses in smaller, manageable chunks.
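Chunked transfer encoding is easy to see in miniature: each chunk is prefixed with its size in hexadecimal, and a zero-length chunk marks the end of the body. Here is a small Python sketch of the decoding logic (the wire bytes are a classic illustrative example, not captured traffic):

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP chunked-encoded body: hex size line, chunk, repeat."""
    body, rest = b"", raw
    while True:
        size_line, rest = rest.split(b"\r\n", 1)
        size = int(size_line, 16)          # chunk size is hexadecimal
        if size == 0:                      # zero-length chunk ends the body
            return body
        body += rest[:size]
        rest = rest[size + 2:]             # skip the chunk's trailing CRLF

# "Wikipedia" sent as a 4-byte chunk, a 5-byte chunk, then the terminator.
wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(wire))  # b'Wikipedia'
```

In practice your HTTP library handles this for you; the point is that the server can start streaming chunks before it knows the total response length.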
Before we go to the next section, make sure you are comfortable with everything we have covered so far. If you are ready, proceed to the next paragraph; otherwise, re-read the blog and follow the external links and references to reinforce what we have discussed.
Now, HTTP requests have certain ways of communicating with the server, called HTTP methods, and the server answers these methods with certain codes, called HTTP response or status codes.
HTTP Methods
HTTP methods are just like commands you give the server to tell it what you want to do. Imagine the server is like a librarian, and these methods are the requests you make:
GET:
- Retrieves information from the server (it doesn’t change anything on the server).
- Example: Fetching user data or opening a webpage.
POST:
- Creates or processes data on the server (you send data to the server for processing).
- Example: Submitting a login form or creating a new user.
PUT:
- Creates or fully replaces a resource.
- Example: Updating a user’s email address or profile picture.
DELETE:
- Deletes a resource (used cautiously, because it really does delete data).
- Example: Deleting a user account.
HEAD:
- Retrieves headers only, without downloading the body. Useful for validating a resource without transferring its data.
- Example: Checking whether a webpage has changed by looking at its metadata.
OPTIONS:
- Lists server capabilities.
- Example: Checking whether the server supports CORS (Cross-Origin Resource Sharing).
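To see these methods acting on a resource, here is a toy Python sketch: a tiny in-process server that keeps “users” in a dict, exercised with GET, POST, and DELETE via http.client. The endpoint paths and names are made up for the demo:

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

users = {"1": "John"}  # our in-memory "database"

class UserHandler(BaseHTTPRequestHandler):
    def _reply(self, code, body):
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):     # read: return the current users
        self._reply(200, users)

    def do_POST(self):    # create: add a user from the request body
        length = int(self.headers["Content-Length"])
        users[str(len(users) + 1)] = self.rfile.read(length).decode()
        self._reply(201, users)

    def do_DELETE(self):  # delete: remove the id named in the path, e.g. /users/1
        users.pop(self.path.rsplit("/", 1)[-1], None)
        self._reply(200, users)

    def log_message(self, *args):  # keep the demo output clean
        pass

server = HTTPServer(("127.0.0.1", 0), UserHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

def call(method, path, body=None):
    conn = HTTPConnection("127.0.0.1", port)
    conn.request(method, path, body=body)
    resp = conn.getresponse()
    result = (resp.status, resp.read().decode())
    conn.close()
    return result

print(call("GET", "/users"))            # (200, '{"1": "John"}')
print(call("POST", "/users", "Alice"))  # 201: John and Alice both exist now
print(call("DELETE", "/users/1"))       # 200: John is gone
server.shutdown()
```

Notice how the method alone tells the “librarian” what to do with the same resource path: read it, add to it, or remove from it.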
HTTP Status Codes
HTTP status codes are the librarian’s replies to your requests. They’re grouped into categories:
1xx Informational: “I’m working on it.”
- Example: The server has received your request but hasn’t finished responding yet.
2xx Success: “Everything’s fine!”
- Example:
200 OK — Your GET or POST request succeeded.
201 Created — Your new user account was successfully created (POST).
Thanks to HTTP status codes and error pages, a server can state not only that a request succeeded but also what went wrong, so a client can react differently depending on the code an API returns.
3xx Redirection: “Go somewhere else.”
- Example: The webpage you want has moved to a new address, and the server tells your browser where to find it.
- A temporary (not permanent) redirect to another URL returns a 302 Found.
4xx Client Errors: “You made a mistake.”
- Example:
400 Bad Request — You sent a bad or malformed request.
401 Unauthorized — You have to log in first.
- If the resource you’re looking for doesn’t exist, you get a 404 Not Found. A 404 means the URL you entered is a dead end: the page doesn’t exist, you typed the URL incorrectly, or the server has removed it.
5xx Server Errors: “I made a mistake.”
- Example: The server couldn’t handle your request.
503 Service Unavailable — The server is down or overloaded.
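Python’s standard library already knows the official reason phrases for these codes, which makes it easy to play with the categories above. A quick sketch, with a small helper that buckets a code into its 1xx-5xx group:

```python
from http import HTTPStatus

def category(code: int) -> str:
    """Map a status code to its class, based on the leading digit."""
    return {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }[code // 100]

for code in (200, 201, 302, 400, 401, 404, 503):
    print(code, HTTPStatus(code).phrase, "->", category(code))
# 200 OK -> success
# 201 Created -> success
# 302 Found -> redirection
# ...
```

This is also a handy reference when a tool reports only the numeric code.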
3. Command Your Web: Mastering cURL and wget
HTTP is the basis of the web, and tools such as curl and wget give you direct power to talk to servers. These tricks will help you move faster and with fewer surprises, whether you’re testing APIs, downloading files, or mirroring websites.
Using curl
curl (Client URL) is a command-line tool for sending HTTP requests. It supports protocols including HTTP, HTTPS, FTP, and others, which makes it perfect for debugging and testing.
Basic Commands
Basic GET Request
Retrieve a webpage or resource:
curl https://example.com
This command fetches the HTML content of the specified URL and prints it in the terminal. It is great for checking whether a resource is reachable.
Inspecting Headers
Fetch only the HTTP response headers with the -I (uppercase “i”) option:
curl -I https://example.com
Headers are metadata such as the content type, server information, and status code; they help you understand how the server responds to a request.
Example output:
HTTP/2 200
Content-Type: text/html; charset=UTF-8
Submitting Form Data
Simulate form submissions by sending data with the -d option in a POST request:
curl -X POST -d "username=admin&password=1234" https://example.com/login
What is happening here? curl sends a POST request whose body is the URL-encoded form data; unless told otherwise, it sets the Content-Type header to application/x-www-form-urlencoded, just as a browser does when submitting a form.
Authentication
Handle HTTP Basic Authentication using the -u flag:
curl -u admin:password https://example.com/secure
This is useful for fetching protected resources you are authorized to access, such as admin dashboards or APIs that require credentials. By default, the username and password are Base64 encoded and sent in the Authorization header.
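Here is what curl -u does under the hood: Basic auth is just “username:password” Base64-encoded into the Authorization header. A short Python sketch with made-up credentials, which also shows why this is encoding, not encryption:

```python
import base64

username, password = "admin", "password"  # example credentials, not real ones
token = base64.b64encode(f"{username}:{password}".encode()).decode()
header = f"Authorization: Basic {token}"

print(header)  # Authorization: Basic YWRtaW46cGFzc3dvcmQ=

# Anyone who sees the header can trivially reverse it:
print(base64.b64decode(token).decode())  # admin:password
```

This is why Basic auth over plain HTTP is a finding in itself: a sniffed header hands over the credentials directly.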
Advanced Features
Handling Redirects
Use the -L flag to follow redirects:
curl -L https://example.com
curl follows the redirect produced by a 3xx status code and retrieves the final resource.
Debugging Using Verbose Mode
Add the -v option for detailed debugging:
curl -v https://example.com
Verbose mode shows details of the complete exchange: request headers, response headers, connection information, and the rest of it.
Downloading Files
Save a file to your system with -O (keep the remote filename) or -o (choose a custom filename):
curl -O https://example.com/file.zip
Using wget
wget is a non-interactive tool intended for downloading files or entire directories. It shines in scenarios where downloads need to run unattended, sequentially, or recursively.
Basic Commands
- Downloading a File
wget https://example.com/file.zip
wget fetches the file and saves it into your working directory under its original name.
Recursive Downloads
Mirror an entire website or directory:
wget --recursive --no-parent https://example.com/docs/
--recursive: Downloads files contained in the specified directory.
--no-parent: Prevents wget from ascending to the parent directory, restricting the download to the specified path.
Use case: Archiving web pages or collecting all files in a directory.
Advanced Features
- Spiders and Testing URLs
Use --spider with the -S flag to check if a URL exists without downloading its content:
wget --spider -S https://example.com
This is useful for verifying the existence of a resource or testing URLs in bulk. The -S flag prints the server’s response headers, so you can see the status code for each URL.
- Downloading Entire Websites
wget --mirror https://example.com
--mirror: Mirrors the website, preserving directory structure and timestamps.
- Custom Headers
While wget isn’t as flexible as curl for API testing, you can still send custom headers using --header:
wget --header="Authorization: Bearer token" https://example.com/resource
Key Differences Between cURL and wget
When to Use Each Tool
- Use cURL if you’re:
- Building or testing APIs.
- Debugging HTTP headers, cookies, or authentication workflows.
- Automating web tasks like form submissions or API interactions.
- Use wget if you’re:
- A researcher or sysadmin archiving web data.
- Downloading entire directories or websites recursively.
- Resuming interrupted downloads where you left off.
What Are APIs and how to test them with curl and wget?
Application Programming Interfaces (APIs) allow two software applications to talk to each other. They define a set of rules for interacting with a service, whether that’s sending data, receiving a response, or triggering an action. RESTful APIs are a common example: they use HTTP methods such as GET, POST, PUT, and DELETE to manage resources.
Don’t worry, it’s trivial with both curl and wget to play along and understand what an API is doing. Check out the Fastly article on using curl to test an origin server’s response and the DigitalOcean tutorial on using wget with REST APIs, both listed in the references below.
References:
[1] https://stackoverflow.com/questions/38906626/curl-to-return-http-status-code-along-with-the-response
[2] https://www.fastly.com/blog/anatomy-of-a-curl-how-to-use-curl-to-test-an-origin-servers-response
[3] https://www.warp.dev/terminus/curl-post-request
[4] https://www.digitalocean.com/community/tutorials/how-to-use-wget-to-download-files-and-interact-with-rest-apis
[5] https://stackoverflow.com/questions/6264726/can-i-use-wget-to-check-but-not-download
[6] https://www.geeksforgeeks.org/wget-command-in-linux-unix/
[7] https://terminalcheatsheet.com/guides/curl-rest-api
[8] https://www.hostinger.in/tutorials/wget-command-examples/
[9] https://www.gnu.org/software/wget/manual/wget.html