🏮 What is Subdomain Bruteforcing
Subdomain brute forcing is a method for discovering subdomains of a target domain. It works by attempting to resolve a list of common subdomain names against the target domain's DNS servers. If the DNS server returns a valid IP address for a candidate, that subdomain is considered to exist. (The tool built in this post takes the closely related HTTP approach: it sends a request to each candidate subdomain and inspects the response.)
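The core idea fits in a few lines. A minimal sketch using the stdlib `socket` resolver (the `resolver` parameter is only there so the logic can be exercised without network access; the names `resolve` and `dns_brute` are my own, not part of the tool below):

```python
import socket

def resolve(host, resolver=socket.gethostbyname):
    """Return the IP for host, or None if it does not resolve."""
    try:
        return resolver(host)
    except socket.gaierror:
        return None

def dns_brute(domain, words, resolver=socket.gethostbyname):
    """Try each candidate word as a subdomain; keep the ones that resolve."""
    found = {}
    for word in words:
        host = f"{word}.{domain}"
        ip = resolve(host, resolver)
        if ip:
            found[host] = ip
    return found
```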
How does Subdomain Bruteforcing help hackers?
- To find hidden attack surfaces.
- To identify subdomains that are vulnerable to subdomain takeover attacks.
- To find subdomains that are hosting sensitive data.
---
⭕ Building The tool
🗡 Before building the tool itself, let me give a brief overview of what you are going to build: a multithreaded subdomain bruteforcer that reads a wordlist, probes each candidate subdomain over HTTPS, and filters the results by status code and response size.
Importing all the necessary modules in our file
```python
import argparse
import requests
from threading import Thread
from queue import Queue
from bs4 import BeautifulSoup
from termcolor import colored
from os import path
from sys import exit
from datetime import datetime
```
These are all the modules that will be used. Note that `requests`, `bs4`, and `termcolor` are third-party packages, so install them first (e.g., `pip install requests beautifulsoup4 termcolor`).
Taking command-line arguments
We will be using `argparse`, a module in Python that helps in parsing command-line arguments. If you want a detailed walkthrough of the `argparse` module, check out my first blog. Here I will just give a high-level overview of the code used to parse command-line arguments.
```python
def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--domain', dest="target", help="Domain to scan", required=True)
    parser.add_argument('-t', '--threads', dest="threads", type=int, default=10,
                        help="Specify the threads (default: 10)")
    parser.add_argument('-r', '--follow-redirect', dest="follow_redirect", action='store_true',
                        help="Follow redirects")
    parser.add_argument('-H', '--headers', dest="header", nargs='*',
                        help="Specify HTTP headers (e.g., -H 'Header1: val1' -H 'Header2: val2')")
    parser.add_argument('-a', '--useragent', metavar='string', dest="user_agent", default="SubBuster/1.0",
                        help="Set the User-Agent string (default 'SubBuster/1.0')")
    parser.add_argument('--ignore-code', dest='ignore_codes', type=int, nargs='*',
                        help="Codes to ignore")
    parser.add_argument('-ht', '--hide-title', dest="hide_title", action='store_true',
                        help="Hide response title in output")
    parser.add_argument('-mc', '--match-codes', dest='match_codes', nargs='*',
                        help="Status codes to match, separated by spaces (e.g., -mc 200 404)")
    parser.add_argument('-ms', '--match-size', dest='match_size', nargs='*',
                        help="Response sizes to match, separated by spaces")
    parser.add_argument('-fc', '--filter-codes', dest="filter_codes", nargs='*',
                        help="Status codes to filter out, separated by spaces")
    parser.add_argument('-fs', dest='filter_size', nargs='*',
                        help="Response sizes to filter out, separated by spaces")
    parser.add_argument('-w', '--wordlist', dest='wordlist', default='wordlist.txt',
                        help="Specify the wordlist to use")
    try:
        return parser.parse_args()
    except argparse.ArgumentError:
        parser.print_help()
        exit(1)
```
`get_args()` is the function responsible for declaring all the command-line arguments that will be used.
Let’s understand the different arguments that we are declaring:
- `-d` or `--domain`: specify the target domain.
- `-t` or `--threads`: allows users to specify the number of threads.
- `-r` or `--follow-redirect`: instructs the script to follow redirects.
- `-H` or `--headers`: lets users specify custom HTTP headers for the requests sent during scanning. Users can provide multiple headers in the format `-H 'Header1: val1' -H 'Header2: val2'`.
- `-a` or `--useragent`: define a custom User-Agent string for the HTTP requests. By default, it is set to "SubBuster/1.0".
- `--ignore-code`: allows users to specify HTTP status codes to ignore during the scanning process.
- `-ht` or `--hide-title`: hides the response title in the output.
- `-mc` or `--match-codes`: specify a list of HTTP status codes to match during scanning. Multiple status codes can be provided as arguments (e.g., `-mc 200 404`).
- `-ms` or `--match-size`: specify response sizes to match during scanning. Multiple response size values can be provided as arguments.
- `-fc` or `--filter-codes`: specify a list of HTTP status codes to filter out during scanning. Multiple status codes can be provided as arguments.
- `-fs`: allows users to define response size values to filter out during scanning. Multiple response size values can be provided.
- `-w` or `--wordlist`: specify the path to a wordlist file for generating subdomains. By default, it is set to `wordlist.txt`.
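One handy trick when building a CLI like this: `parse_args()` accepts an explicit list of tokens, so you can sanity-check the parser without touching the real command line. A minimal sketch with a few of the flags above:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-d', '--domain', dest='target', required=True)
parser.add_argument('-t', '--threads', dest='threads', type=int, default=10)
parser.add_argument('-mc', dest='match_codes', nargs='*')

# Simulate: subbuster -d example.com -mc 200 301
args = parser.parse_args(['-d', 'example.com', '-mc', '200', '301'])
print(args.target, args.threads, args.match_codes)
# → example.com 10 ['200', '301']
```

Note that `nargs='*'` values arrive as strings, which is why the matching logic later compares string-to-string.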
Coding out the code outside functions
```python
if __name__ == '__main__':
    arguments = get_args()
    target = arguments.target
    hide_title = arguments.hide_title or False
    redirection = arguments.follow_redirect or False
    user_agent = arguments.user_agent
    match_codes = arguments.match_codes
    match_size = arguments.match_size
    filter_codes = arguments.filter_codes
    filter_size = arguments.filter_size
    header = arguments.header
    wordlist = arguments.wordlist
    threads = arguments.threads

    if match_size and filter_size:
        print(colored(
            "[+] For now, using Matching and Filtering Response Length together is not available!", 'red'))
        exit()
    if match_codes and filter_codes:
        print(colored(
            "[+] For now, using Matching and Filtering Response Status codes together is not available!", 'red'))
        exit()
    if not path.exists(wordlist):
        print(colored("[-] Provide a valid wordlist file!", 'red'))
        exit()

    headers = {}
    if header:
        for h in header:
            key, value = h.split(':', 1)
            headers[key] = value.strip()
    headers['User-Agent'] = user_agent

    bruteforcer = SubdomainBruteforcer(target, wordlist, redirection, headers, match_codes,
                                       match_size, filter_codes, filter_size, hide_title, threads)
    bruteforcer.print_banner()
    bruteforcer.main()
```
We are simply verifying that the namespace in the global scope is `__main__`, and if that is the case, we execute the code.
Let’s understand the code:
- In the first few lines, after calling the `get_args()` function, we parse the command-line arguments and store the values in variables for later use.
- After that, several checks are performed to ensure that the specified options are valid:
  - It checks whether matching and filtering by response length are used together, and if so, prints a warning and exits.
  - It checks whether matching and filtering by response status code are used together, and if so, prints a warning and exits.
  - It verifies that the specified wordlist file exists and prints an error message if it does not.
- Custom headers are processed into a dictionary: each header string is split into a key-value pair and cleaned. The User-Agent is set to the value supplied or the default ("SubBuster/1.0").
- Then an object named `bruteforcer` is created from the class `SubdomainBruteforcer` using all the variables.
- After instantiating, we call the `print_banner` method on the `bruteforcer` object.
- At last, we call the `main` method, where all the real functionality starts.
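The header-parsing step above can be pulled out and tested on its own. `split(':', 1)` splits only on the first colon, so header values that themselves contain colons (like URLs) survive intact (`parse_headers` is my own helper name for the sketch):

```python
def parse_headers(pairs, user_agent="SubBuster/1.0"):
    """Turn ['Key: value', ...] strings into a headers dict."""
    headers = {}
    for h in pairs or []:
        key, value = h.split(':', 1)   # split on the first ':' only
        headers[key] = value.strip()
    headers['User-Agent'] = user_agent
    return headers

print(parse_headers(["X-Api-Key: abc123", "Referer: https://example.com/"]))
```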
Coding out the `SubdomainBruteforcer` class
Defining the `__init__` method
```python
class SubdomainBruteforcer:
    def __init__(self, target, wordlist, follow_redirect, headers, match_codes,
                 match_size, filter_codes, filter_size, hide_title, threads):
        self.target = target
        self.wordlist = wordlist
        self.follow_redirect = follow_redirect
        self.headers = headers
        self.hide_title = hide_title
        self.match_codes = match_codes
        self.match_size = match_size
        self.filter_codes = filter_codes
        self.filter_size = filter_size
        self.threads = threads
        self.q = Queue()
        self.subdomains = []
```
🙌 The `__init__` method is used as the class constructor.
Let’s understand the various elements of the code:
- After declaring the class `SubdomainBruteforcer`, we construct it using the `__init__` method.
- The `__init__` method initializes the `SubdomainBruteforcer` object using the various parameters passed to it.
- The `__init__` method also creates two instance variables, `self.q` and `self.subdomains`. We assign `self.q` a `Queue()` object, so all the queue methods are available on it.
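The `Queue`/`Thread` pattern this class relies on can be seen in miniature here: daemon workers pull items off the queue forever, and `join()` blocks until every queued item has been marked done (doubling a number stands in for scanning a subdomain):

```python
from queue import Queue
from threading import Thread

q = Queue()
results = []

def worker():
    while True:
        item = q.get()
        results.append(item * 2)   # stand-in for "scan this subdomain"
        q.task_done()              # tell the queue this item is finished

for _ in range(3):                 # three daemon workers, like self.threads
    Thread(target=worker, daemon=True).start()

for n in [1, 2, 3]:
    q.put(n)
q.join()                           # blocks until every item is task_done()
print(sorted(results))             # → [2, 4, 6]
```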
Defining the `print_banner` method
As you may recall, we invoked the `print_banner` method after instantiating the class in the global scope, so let’s write it down.
```python
def print_banner(self):
    print("-" * 80)
    print(colored(
        f"Subdomain Bruteforcing starting at {datetime.now().strftime('%d/%m/%Y %H:%M:%S')}",
        'cyan', attrs=['dark']))
    print("-" * 80)
    print(colored("[*] Target Domain".ljust(20, " "), 'light_red'), ":", f"{self.target}")
    print(colored("[*] Wordlist".ljust(20, " "), 'light_red'), ":", f"{self.wordlist}")
    if self.headers:
        print(colored("[*] Headers".ljust(20, " "), 'light_red'), ":", f"{self.headers}")
    if self.match_size:
        print(colored("[*] Match Res size".ljust(20, " "), 'light_red'), ":", f"{self.match_size}")
    if self.threads:
        print(colored("[*] Threads".ljust(20, " "), 'light_red'), ":", f"{self.threads}")
    if self.match_codes or self.filter_codes:
        if self.match_codes:
            print(colored("[*] Match Codes".ljust(20, " "), 'light_red'), ":", f"{self.match_codes}")
        if self.filter_codes:
            print(colored("[*] Filter Codes".ljust(20, " "), 'light_red'), ":", f"{self.filter_codes}")
    else:
        print(colored("[*] Status Codes".ljust(20, " "), 'light_red'), ":", "All Status Codes")
    if self.filter_size:
        print(colored("[*] Filter Response Size".ljust(20, " "), 'light_red'), ":", f"{self.filter_size}")
    print("-" * 80)
    print("-" * 80)
```
The code is simple: it takes the values stored in the instance variables and prints them as a banner.
Defining the `main` method
As you might remember, after invoking the `print_banner` method in the global scope, we invoked the `main` method of the object. This method is the entry point for the subdomain brute-forcing process.
```python
def main(self):
    for _ in range(self.threads):
        thread = Thread(target=self.get_subdomain)
        thread.daemon = True
        thread.start()
    with open(self.wordlist, 'r') as f:
        self.subdomains.extend(f.read().strip().splitlines())
    for subdomain in self.subdomains:
        self.q.put(subdomain)
    self.q.join()
```
- Inside the `main` method, we first loop over the number of threads the user requested; in each iteration we create a thread, assign it the class method `get_subdomain` as its task, mark it as a daemon, and start it.
- After that, we open and read the specified wordlist file (`self.wordlist`), which contains the list of subdomains to try, strip any leading/trailing whitespace, and split the content into a list of subdomains.
- Then, in a loop, we put each subdomain into the shared queue `self.q`.
- Finally, `self.q.join()` blocks until every queued subdomain has been processed by the workers.
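The wordlist-reading idiom can be checked with an in-memory file (`io.StringIO` stands in for the real wordlist here, so nothing needs to exist on disk):

```python
import io

# Stand-in for open(self.wordlist, 'r'); note the trailing newline
fake_wordlist = io.StringIO("www\nmail\nftp\n")
subdomains = fake_wordlist.read().strip().splitlines()
print(subdomains)   # → ['www', 'mail', 'ftp']
```

The `strip()` before `splitlines()` discards the trailing newline so no empty candidate sneaks into the queue.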
Defining the `get_subdomain` method
```python
def get_subdomain(self):
    while True:
        subdomain = self.q.get()
        self.sub_brute(subdomain)
        self.q.task_done()
```
- Inside the method, we first create an infinite loop. In the loop, the worker thread fetches a subdomain from the shared queue `self.q` and stores it in the variable named `subdomain`.
- Then we call the `sub_brute()` method with `subdomain` as its argument.
- After processing a subdomain, the worker thread signals that it has completed its task for that specific subdomain by calling `self.q.task_done()`.
- This is important for synchronization and is used in conjunction with the `self.q.join()` statement in the `main` method. It allows the main thread to know when all worker threads have finished processing all subdomains in the queue, so it can safely exit.
Defining the `sub_brute` method
```python
def sub_brute(self, subdomain):
    try:
        url = f"https://{subdomain}.{self.target}"
        response = self.make_request(url)
    except requests.RequestException:
        # Most candidates will fail to resolve or connect; silently skip them.
        pass
    else:
        response_length = len(response.text)
        status_code = response.status_code
        soup = BeautifulSoup(response.text, 'html.parser')
        title = soup.title.string if soup.title else ""
        if self.is_match(status_code, response_length):
            self.print_data(status_code, response_length, url, title)
```
- Inside the `sub_brute` method, we take an argument called `subdomain`. In the try block, we build the URL that will go out on the web and then call another method, `make_request`, which is responsible for making the request and returning the response.
- If an exception occurs, we do nothing; while making connections, many will occur (failed DNS lookups, timeouts, refused connections).
- If no exception occurs, execution reaches the else block, where we extract `response_length`, `status_code`, and the page title from the response.
- Then we call `is_match` with `status_code` and `response_length`; this function checks the various match and filter conditions. Only if those conditions are fulfilled does execution proceed to `print_data`, which takes `status_code`, `response_length`, the URL, and the title and prints the result.
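BeautifulSoup gets the title in one line (`soup.title.string`); if you ever want to drop that dependency, the stdlib `html.parser` can do the same job. A minimal sketch (the `TitleParser` class is my own, not part of the tool):

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

p = TitleParser()
p.feed("<html><head><title>Admin Portal</title></head><body></body></html>")
print(p.title)   # → Admin Portal
```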
Defining the `make_request` method
```python
def make_request(self, url):
    session = requests.Session()
    session.headers.update(self.headers)
    response = session.get(url, allow_redirects=self.follow_redirect, timeout=1)
    return response
```
The `make_request` method is responsible for sending the HTTP request, honoring options such as whether redirects should be followed, and returning the response.
In the method, we first create a session, update it with our defined headers, and finally send the request and return the response.
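The reason for using a `Session` rather than a bare `requests.get()` is that headers set on the session are merged into every request it sends. You can see the merging without any network traffic:

```python
import requests

session = requests.Session()
session.headers.update({'User-Agent': 'SubBuster/1.0', 'X-Custom': 'demo'})

# These merged headers ride along on every subsequent session.get() call
print(session.headers['User-Agent'])   # → SubBuster/1.0
print(session.headers['X-Custom'])     # → demo
```

Sessions also reuse the underlying TCP connection across requests, which helps when hammering thousands of candidates.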
Defining the `is_match` method
This method is responsible for deciding whether results from a response should be displayed or not.
```python
def is_match(self, status_code, response_length):
    status_code = str(status_code)
    response_length = str(response_length)
    if self.match_codes:
        if status_code not in self.match_codes:
            return False
    if self.filter_codes and status_code in self.filter_codes:
        return False
    if self.match_size:
        if response_length not in self.match_size:
            return False
    if self.filter_size and response_length in self.filter_size:
        return False
    return True
```
- In this method, we first convert the arguments to strings, since the match/filter lists parsed from the command line contain strings.
- After that, the defined conditions decide whether this response should be printed or not.
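The matching logic is easy to exercise standalone. A module-level sketch of the same rules (argument names mirror the method above; keyword arguments stand in for the instance variables):

```python
def is_match(status_code, response_length, match_codes=None,
             filter_codes=None, match_size=None, filter_size=None):
    """Return True when the response passes every active match/filter rule."""
    status_code = str(status_code)         # CLI values arrive as strings
    response_length = str(response_length)
    if match_codes and status_code not in match_codes:
        return False
    if filter_codes and status_code in filter_codes:
        return False
    if match_size and response_length not in match_size:
        return False
    if filter_size and response_length in filter_size:
        return False
    return True

print(is_match(200, 1234, match_codes=['200', '301']))   # → True
print(is_match(404, 1234, filter_codes=['404']))         # → False
```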
Defining the `print_data` method
```python
def print_data(self, status_code, response_length, url, title):
    status_code = int(status_code)
    color = 'grey'
    if 200 <= status_code < 300:
        color = 'green'
    elif 300 <= status_code < 400:
        color = 'yellow'
    elif 400 <= status_code < 500:
        color = 'red'
    elif 500 <= status_code < 600:
        color = 'magenta'
    status_code_str = str(status_code).ljust(9, " ")
    response_length_str = str(response_length).ljust(9)
    url_str = url.ljust(30)
    output = f"{colored(status_code_str, color)} {response_length_str} {url_str}"
    if not self.hide_title:
        output += f" [{title}]"
    print(output)
```
`print_data` is the last method, responsible for printing the data in a readable format.
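The status-code bucketing can be checked in isolation. A sketch of the same color mapping; note that 500 itself belongs to the server-error range, so the boundary check needs `<=` on the left (`status_color` is my own helper name):

```python
def status_color(code):
    """Map an HTTP status code to a display color."""
    if 200 <= code < 300:
        return 'green'
    if 300 <= code < 400:
        return 'yellow'
    if 400 <= code < 500:
        return 'red'
    if 500 <= code < 600:   # <= so that 500 itself is included
        return 'magenta'
    return 'grey'

print(status_color(301))   # → yellow
print(status_color(500))   # → magenta
```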
So this is the end of the code. If you want to see the whole code, you can check it out at: 🔗Link to code
🦊 Conclusion
Creating a multithreaded subdomain bruteforcer is a reasonably simple task. You can significantly improve the speed and efficiency of your brute-forcing operation by employing a multithreaded technique. Multithreaded subdomain bruteforcers are thus a valuable tool for penetration testers and security researchers 🔨🛡️.
However, it is critical to employ multithreaded subdomain bruteforcers with caution ⚠️. Bruteforcing can be a time-consuming and resource-intensive procedure that puts pressure on the target website. It is critical to only bruteforce websites with the owner’s permission 🤝 and to respect the target website’s resources 🌐.