🏮 What is Subdomain Bruteforcing
Subdomain brute forcing is a method for discovering subdomains of a target domain. It works by attempting to resolve a list of common subdomain names against the target domain's DNS servers. If the DNS server returns a valid IP address for a candidate, that subdomain is considered to exist. (The tool built in this post takes the closely related HTTP approach: it sends a request to each candidate subdomain and inspects the response.)
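The core idea fits in a few lines. A minimal sketch using the stdlib `socket` resolver (the `resolver` parameter is only there so the logic can be exercised without network access; the names `resolve` and `dns_brute` are my own, not part of the tool below):

```python
import socket

def resolve(host, resolver=socket.gethostbyname):
    """Return the IP for host, or None if it does not resolve."""
    try:
        return resolver(host)
    except socket.gaierror:
        return None

def dns_brute(domain, words, resolver=socket.gethostbyname):
    """Try each candidate word as a subdomain; keep the ones that resolve."""
    found = {}
    for word in words:
        host = f"{word}.{domain}"
        ip = resolve(host, resolver)
        if ip:
            found[host] = ip
    return found
```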
How does Subdomain Bruteforcing help hackers?
- To find hidden attack surfaces.
- To identify subdomains that are vulnerable to subdomain takeover attacks.
- To find subdomains that are hosting sensitive data.
---
⭕ Building The tool
🗡 Before building the tool itself, let me give a brief overview of what you are going to build: a multithreaded subdomain bruteforcer that reads a wordlist, probes each candidate subdomain over HTTPS, and filters the results by status code and response size.
Importing all the necessary modules in our file
```python
import argparse
import requests
from threading import Thread
from queue import Queue
from bs4 import BeautifulSoup
from termcolor import colored
from os import path
from sys import exit
from datetime import datetime
```
These are all the modules that will be used. Note that `requests`, `bs4`, and `termcolor` are third-party packages, so install them first (e.g., `pip install requests beautifulsoup4 termcolor`).
Taking command-line arguments
We will be using `argparse`, a module in Python that helps in parsing command-line arguments. If you want a detailed walkthrough of the `argparse` module, check out my first blog. Here I will just give a high-level overview of the code used to parse command-line arguments.
```python
def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--domain', dest="target", help="Domain to scan", required=True)
    parser.add_argument('-t', '--threads', dest="threads", type=int, default=10,
                        help="Specify the threads (default: 10)")
    parser.add_argument('-r', '--follow-redirect', dest="follow_redirect", action='store_true',
                        help="Follow redirects")
    parser.add_argument('-H', '--headers', dest="header", nargs='*',
                        help="Specify HTTP headers (e.g., -H 'Header1: val1' -H 'Header2: val2')")
    parser.add_argument('-a', '--useragent', metavar='string', dest="user_agent", default="SubBuster/1.0",
                        help="Set the User-Agent string (default 'SubBuster/1.0')")
    parser.add_argument('--ignore-code', dest='ignore_codes', type=int, nargs='*',
                        help="Codes to ignore")
    parser.add_argument('-ht', '--hide-title', dest="hide_title", action='store_true',
                        help="Hide response title in output")
    parser.add_argument('-mc', '--match-codes', dest='match_codes', nargs='*',
                        help="Status codes to match, separated by spaces (e.g., -mc 200 404)")
    parser.add_argument('-ms', '--match-size', dest='match_size', nargs='*',
                        help="Response sizes to match, separated by spaces")
    parser.add_argument('-fc', '--filter-codes', dest="filter_codes", nargs='*',
                        help="Status codes to filter out, separated by spaces")
    parser.add_argument('-fs', dest='filter_size', nargs='*',
                        help="Response sizes to filter out, separated by spaces")
    parser.add_argument('-w', '--wordlist', dest='wordlist', default='wordlist.txt',
                        help="Specify the wordlist to use")
    try:
        return parser.parse_args()
    except argparse.ArgumentError:
        parser.print_help()
        exit(1)
```
`get_args()` is the function responsible for declaring all the command-line arguments that will be used.
Let’s understand the different arguments that we are declaring:
- `-d` or `--domain`: specify the target domain.
- `-t` or `--threads`: allows users to specify the number of threads.
- `-r` or `--follow-redirect`: instructs the script to follow redirects.
- `-H` or `--headers`: lets users specify custom HTTP headers for the requests sent during scanning. Users can provide multiple headers in the format `-H 'Header1: val1' -H 'Header2: val2'`.
- `-a` or `--useragent`: define a custom User-Agent string for the HTTP requests. By default, it is set to "SubBuster/1.0".
- `--ignore-code`: allows users to specify HTTP status codes to ignore during the scanning process.
- `-ht` or `--hide-title`: hides the response title in the output.
- `-mc` or `--match-codes`: specify a list of HTTP status codes to match during scanning. Multiple status codes can be provided as arguments (e.g., `-mc 200 404`).
- `-ms` or `--match-size`: specify response sizes to match during scanning. Multiple response size values can be provided as arguments.
- `-fc` or `--filter-codes`: specify a list of HTTP status codes to filter out during scanning. Multiple status codes can be provided as arguments.
- `-fs`: allows users to define response size values to filter out during scanning. Multiple response size values can be provided.
- `-w` or `--wordlist`: specify the path to a wordlist file for generating subdomains. By default, it is set to `wordlist.txt`.
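One handy trick when building a CLI like this: `parse_args()` accepts an explicit list of tokens, so you can sanity-check the parser without touching the real command line. A minimal sketch with a few of the flags above:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-d', '--domain', dest='target', required=True)
parser.add_argument('-t', '--threads', dest='threads', type=int, default=10)
parser.add_argument('-mc', dest='match_codes', nargs='*')

# Simulate: subbuster -d example.com -mc 200 301
args = parser.parse_args(['-d', 'example.com', '-mc', '200', '301'])
print(args.target, args.threads, args.match_codes)
# → example.com 10 ['200', '301']
```

Note that `nargs='*'` values arrive as strings, which is why the matching logic later compares string-to-string.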
Coding out the code outside functions
```python
if __name__ == '__main__':
    arguments = get_args()
    target = arguments.target
    hide_title = arguments.hide_title or False
    redirection = arguments.follow_redirect or False
    user_agent = arguments.user_agent
    match_codes = arguments.match_codes
    match_size = arguments.match_size
    filter_codes = arguments.filter_codes
    filter_size = arguments.filter_size
    header = arguments.header
    wordlist = arguments.wordlist
    threads = arguments.threads

    if match_size and filter_size:
        print(colored(
            "[+] For now, using Matching and Filtering Response Length together is not available!", 'red'))
        exit()
    if match_codes and filter_codes:
        print(colored(
            "[+] For now, using Matching and Filtering Response Status codes together is not available!", 'red'))
        exit()
    if not path.exists(wordlist):
        print(colored("[-] Provide a valid wordlist file!", 'red'))
        exit()

    headers = {}
    if header:
        for h in header:
            key, value = h.split(':', 1)
            headers[key] = value.strip()
    headers['User-Agent'] = user_agent

    bruteforcer = SubdomainBruteforcer(target, wordlist, redirection, headers, match_codes,
                                       match_size, filter_codes, filter_size, hide_title, threads)
    bruteforcer.print_banner()
    bruteforcer.main()
```
We are simply verifying that the namespace in the global scope is `__main__`, and if that is the case, we execute the code.
Let’s understand the code:
- In the first few lines, after calling the `get_args()` function, we parse the command-line arguments and store the values in variables for later use.
- After that, several checks are performed to ensure that the specified options are valid:
  - It checks whether matching and filtering by response length are used together, and if so, prints a warning and exits.
  - It checks whether matching and filtering by response status code are used together, and if so, prints a warning and exits.
  - It verifies that the specified wordlist file exists and prints an error message if it does not.
- Custom headers are processed into a dictionary: each header string is split into a key-value pair and cleaned. The User-Agent is set to the value supplied or the default ("SubBuster/1.0").
- Then an object named `bruteforcer` is created from the class `SubdomainBruteforcer` using all the variables.
- After instantiating, we call the `print_banner` method on the `bruteforcer` object.
- At last, we call the `main` method, where all the real functionality starts.
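The header-parsing step above can be pulled out and tested on its own. `split(':', 1)` splits only on the first colon, so header values that themselves contain colons (like URLs) survive intact (`parse_headers` is my own helper name for the sketch):

```python
def parse_headers(pairs, user_agent="SubBuster/1.0"):
    """Turn ['Key: value', ...] strings into a headers dict."""
    headers = {}
    for h in pairs or []:
        key, value = h.split(':', 1)   # split on the first ':' only
        headers[key] = value.strip()
    headers['User-Agent'] = user_agent
    return headers

print(parse_headers(["X-Api-Key: abc123", "Referer: https://example.com/"]))
```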
Coding out the `SubdomainBruteforcer` class
Defining the `__init__` method
```python
class SubdomainBruteforcer:
    def __init__(self, target, wordlist, follow_redirect, headers, match_codes,
                 match_size, filter_codes, filter_size, hide_title, threads):
        self.target = target
        self.wordlist = wordlist
        self.follow_redirect = follow_redirect
        self.headers = headers
        self.hide_title = hide_title
        self.match_codes = match_codes
        self.match_size = match_size
        self.filter_codes = filter_codes
        self.filter_size = filter_size
        self.threads = threads
        self.q = Queue()
        self.subdomains = []
```
🙌 The `__init__` method is used as the class constructor.
Let’s understand the various elements of the code:
- After declaring the class `SubdomainBruteforcer`, we construct it using the `__init__` method.
- The `__init__` method initializes the `SubdomainBruteforcer` object using the various parameters passed to it.
- The `__init__` method also creates two instance variables, `self.q` and `self.subdomains`. We assign `self.q` a `Queue()` object, so all the queue methods are available on it.
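The `Queue`/`Thread` pattern this class relies on can be seen in miniature here: daemon workers pull items off the queue forever, and `join()` blocks until every queued item has been marked done (doubling a number stands in for scanning a subdomain):

```python
from queue import Queue
from threading import Thread

q = Queue()
results = []

def worker():
    while True:
        item = q.get()
        results.append(item * 2)   # stand-in for "scan this subdomain"
        q.task_done()              # tell the queue this item is finished

for _ in range(3):                 # three daemon workers, like self.threads
    Thread(target=worker, daemon=True).start()

for n in [1, 2, 3]:
    q.put(n)
q.join()                           # blocks until every item is task_done()
print(sorted(results))             # → [2, 4, 6]
```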
Defining the `print_banner` method
As you may recall, we invoked the `print_banner` method after instantiating the class in the global scope, so let’s write it down.
```python
def print_banner(self):
    print("-" * 80)
    print(colored(
        f"Subdomain Bruteforcing starting at {datetime.now().strftime('%d/%m/%Y %H:%M:%S')}",
        'cyan', attrs=['dark']))
    print("-" * 80)
    print(colored("[*] Target Domain".ljust(20, " "), 'light_red'), ":", f"{self.target}")
    print(colored("[*] Wordlist".ljust(20, " "), 'light_red'), ":", f"{self.wordlist}")
    if self.headers:
        print(colored("[*] Headers".ljust(20, " "), 'light_red'), ":", f"{self.headers}")
    if self.match_size:
        print(colored("[*] Match Res size".ljust(20, " "), 'light_red'), ":", f"{self.match_size}")
    if self.threads:
        print(colored("[*] Threads".ljust(20, " "), 'light_red'), ":", f"{self.threads}")
    if self.match_codes or self.filter_codes:
        if self.match_codes:
            print(colored("[*] Match Codes".ljust(20, " "), 'light_red'), ":", f"{self.match_codes}")
        if self.filter_codes:
            print(colored("[*] Filter Codes".ljust(20, " "), 'light_red'), ":", f"{self.filter_codes}")
    else:
        print(colored("[*] Status Codes".ljust(20, " "), 'light_red'), ":", "All Status Codes")
    if self.filter_size:
        print(colored("[*] Filter Response Size".ljust(20, " "), 'light_red'), ":", f"{self.filter_size}")
    print("-" * 80)
    print("-" * 80)
```
The code is simple: it takes the values stored in the instance variables and prints them as a banner.
Defining the `main` method
As you might remember, after invoking the `print_banner` method in the global scope, we invoked the `main` method of the object. This method is the entry point for the subdomain brute-forcing process.
```python
def main(self):
    for _ in range(self.threads):
        thread = Thread(target=self.get_subdomain)
        thread.daemon = True
        thread.start()
    with open(self.wordlist, 'r') as f:
        self.subdomains.extend(f.read().strip().splitlines())
    for subdomain in self.subdomains:
        self.q.put(subdomain)
    self.q.join()
```
- Inside the `main` method, we first loop over the number of threads the user requested; in each iteration we create a thread, assign it the class method `get_subdomain` as its task, mark it as a daemon, and start it.
- After that, we open and read the specified wordlist file (`self.wordlist`), which contains the list of subdomains to try, strip any leading/trailing whitespace, and split the content into a list of subdomains.
- Then, in a loop, we put each subdomain into the shared queue `self.q`.
- Finally, `self.q.join()` blocks until every queued subdomain has been processed by the workers.
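The wordlist-reading idiom can be checked with an in-memory file (`io.StringIO` stands in for the real wordlist here, so nothing needs to exist on disk):

```python
import io

# Stand-in for open(self.wordlist, 'r'); note the trailing newline
fake_wordlist = io.StringIO("www\nmail\nftp\n")
subdomains = fake_wordlist.read().strip().splitlines()
print(subdomains)   # → ['www', 'mail', 'ftp']
```

The `strip()` before `splitlines()` discards the trailing newline so no empty candidate sneaks into the queue.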
Defining the `get_subdomain` method
```python
def get_subdomain(self):
    while True:
        subdomain = self.q.get()
        self.sub_brute(subdomain)
        self.q.task_done()
```
- Inside the method, we first create an infinite loop. In the loop, the worker thread fetches a subdomain from the shared queue `self.q` and stores it in the variable named `subdomain`.
- Then we call the `sub_brute()` method with `subdomain` as its argument.
- After processing a subdomain, the worker thread signals that it has completed its task for that specific subdomain by calling `self.q.task_done()`.
- This is important for synchronization and is used in conjunction with the `self.q.join()` statement in the `main` method. It allows the main thread to know when all worker threads have finished processing all subdomains in the queue, so it can safely exit.
Defining the `sub_brute` method
```python
def sub_brute(self, subdomain):
    try:
        url = f"https://{subdomain}.{self.target}"
        response = self.make_request(url)
    except requests.RequestException:
        # Most candidates will fail to resolve or connect; silently skip them.
        pass
    else:
        response_length = len(response.text)
        status_code = response.status_code
        soup = BeautifulSoup(response.text, 'html.parser')
        title = soup.title.string if soup.title else ""
        if self.is_match(status_code, response_length):
            self.print_data(status_code, response_length, url, title)
```
- Inside the `sub_brute` method, we take an argument called `subdomain`. In the try block, we build the URL that will go out on the web and then call another method, `make_request`, which is responsible for making the request and returning the response.
- If an exception occurs, we do nothing; while making connections, many will occur (failed DNS lookups, timeouts, refused connections).
- If no exception occurs, execution reaches the else block, where we extract `response_length`, `status_code`, and the page title from the response.
- Then we call `is_match` with `status_code` and `response_length`; this function checks the various match and filter conditions. Only if those conditions are fulfilled does execution proceed to `print_data`, which takes `status_code`, `response_length`, the URL, and the title and prints the result.
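BeautifulSoup gets the title in one line (`soup.title.string`); if you ever want to drop that dependency, the stdlib `html.parser` can do the same job. A minimal sketch (the `TitleParser` class is my own, not part of the tool):

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

p = TitleParser()
p.feed("<html><head><title>Admin Portal</title></head><body></body></html>")
print(p.title)   # → Admin Portal
```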
Defining the `make_request` method
```python
def make_request(self, url):
    session = requests.Session()
    session.headers.update(self.headers)
    response = session.get(url, allow_redirects=self.follow_redirect, timeout=1)
    return response
```
The `make_request` method is responsible for sending the HTTP request, honoring options such as whether redirects should be followed, and returning the response.
In the method, we first create a session, update it with our defined headers, and finally send the request and return the response.
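The reason for using a `Session` rather than a bare `requests.get()` is that headers set on the session are merged into every request it sends. You can see the merging without any network traffic:

```python
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'SubBuster/1.0', 'X-Custom': 'demo'})

# These merged headers ride along on every subsequent session.get() call
print(session.headers['User-Agent'])   # → SubBuster/1.0
print(session.headers['X-Custom'])     # → demo
```

Sessions also reuse the underlying TCP connection across requests, which helps when hammering thousands of candidates.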
Defining the `is_match` method
This method is responsible for deciding whether results from a response should be displayed or not.
```python
def is_match(self, status_code, response_length):
    status_code = str(status_code)
    response_length = str(response_length)
    if self.match_codes:
        if status_code not in self.match_codes:
            return False
    if self.filter_codes and status_code in self.filter_codes:
        return False
    if self.match_size:
        if response_length not in self.match_size:
            return False
    if self.filter_size and response_length in self.filter_size:
        return False
    return True
```
- In this method, we first convert the arguments to strings, since the match/filter lists parsed from the command line contain strings.
- After that, the defined conditions decide whether this response should be printed or not.
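The matching logic is easy to exercise standalone. A module-level sketch of the same rules (argument names mirror the method above; keyword arguments stand in for the instance variables):

```python
def is_match(status_code, response_length, match_codes=None,
             filter_codes=None, match_size=None, filter_size=None):
    """Return True when the response passes every active match/filter rule."""
    status_code = str(status_code)         # CLI values arrive as strings
    response_length = str(response_length)
    if match_codes and status_code not in match_codes:
        return False
    if filter_codes and status_code in filter_codes:
        return False
    if match_size and response_length not in match_size:
        return False
    if filter_size and response_length in filter_size:
        return False
    return True

print(is_match(200, 1234, match_codes=['200', '301']))   # → True
print(is_match(404, 1234, filter_codes=['404']))         # → False
```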
Defining the `print_data` method
```python
def print_data(self, status_code, response_length, url, title):
    status_code = int(status_code)
    color = 'grey'
    if 200 <= status_code < 300:
        color = 'green'
    elif 300 <= status_code < 400:
        color = 'yellow'
    elif 400 <= status_code < 500:
        color = 'red'
    elif 500 <= status_code < 600:
        color = 'magenta'
    status_code_str = str(status_code).ljust(9, " ")
    response_length_str = str(response_length).ljust(9)
    url_str = url.ljust(30)
    output = f"{colored(status_code_str, color)} {response_length_str} {url_str}"
    if not self.hide_title:
        output += f" [{title}]"
    print(output)
```
`print_data` is the last method, responsible for printing the data in a readable format.
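The status-code bucketing can be checked in isolation. A sketch of the same color mapping; note that 500 itself belongs to the server-error range, so the boundary check needs `<=` on the left (`status_color` is my own helper name):

```python
def status_color(code):
    """Map an HTTP status code to a display color."""
    if 200 <= code < 300:
        return 'green'
    if 300 <= code < 400:
        return 'yellow'
    if 400 <= code < 500:
        return 'red'
    if 500 <= code < 600:   # <= so that 500 itself is included
        return 'magenta'
    return 'grey'

print(status_color(301))   # → yellow
print(status_color(500))   # → magenta
```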
So this is the end of the code. If you want to see the whole code, you can check it out at: 🔗Link to code
🦊 Conclusion
Creating a multithreaded subdomain bruteforcer is a reasonably simple task. You can significantly improve the speed and efficiency of your brute-forcing operation by employing a multithreaded technique. Multithreaded subdomain bruteforcers are thus a valuable tool for penetration testers and security researchers 🔨🛡️.
However, it is critical to employ multithreaded subdomain bruteforcers with caution ⚠️. Bruteforcing can be a time-consuming and resource-intensive procedure that puts pressure on the target website. It is critical to only bruteforce websites with the owner’s permission 🤝 and to respect the target website’s resources 🌐.