
Machine Learning models are everywhere, from facial recognition to fraud detection. But what if an attacker could fool a model without even accessing it?
That's where Transfer Attacks come in: one of the most practical and dangerous techniques in AI Security.
In this blog, you’ll learn:
- What an Evasion Attack is
- The difference between Adversarial and Evasion Attacks
- White-box vs Black-box attacks
- A deep dive into Transfer Attacks with real-world scenarios
What is an Evasion Attack?
An Evasion Attack occurs when an attacker manipulates input data during the inference phase (i.e., when the model is already deployed).
The main goal is to trick the model into making a wrong prediction without changing the model itself.
Example:
- A spam filter classifies emails
- The attacker modifies an email slightly (adds special characters, spacing)
- The model fails → spam gets through
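As a toy illustration of the idea above, here is a minimal sketch where a naive keyword-based spam filter (hypothetical, for illustration only) is evaded by a tiny character-level edit that a human reader would barely notice:

```python
# Minimal sketch of evasion at inference time: the deployed "model" is a
# naive keyword filter, and the attacker changes only the input, not the model.

def naive_spam_filter(text: str) -> bool:
    """Flags an email as spam if it contains a known trigger phrase."""
    return "free money" in text.lower()

original = "Claim your FREE MONEY now!"
evasive = "Claim your F.R.E.E M.O.N.E.Y now!"  # tiny edit, same meaning to a human

print(naive_spam_filter(original))  # True  -> caught
print(naive_spam_filter(evasive))   # False -> slips through
```

Real classifiers are far more robust than a keyword match, but the principle is the same: small, carefully chosen input changes push the input across the model's decision boundary.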
What are Adversarial Attacks?
An adversarial attack is any technique where an attacker tries to fool or manipulate an AI system. Evasion is one specific type of adversarial attack: "adversarial" is the umbrella term, while an evasion attack specifically targets the inference phase of a deployed model.
Example:
Image of a panda + tiny noise → model predicts “gibbon”
There are two types of evasion attacks:
- White-box Attack
- Black-box Attack
Understanding this distinction is critical before we get to Transfer Attacks.
White-box Attack
The attacker has full access to:
- Model architecture
- Source Code
- Weights
- Gradients
Example:
The attacker directly computes gradients of the loss with respect to the input to craft an adversarial example.
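The gradient-based crafting step can be sketched with a tiny logistic-regression "model" (the weights and inputs below are illustrative, not from any real system). Because the attacker knows the weights, they can compute the gradient of the loss with respect to the input directly and take a signed step that increases the loss, which is exactly the FGSM idea:

```python
import numpy as np

# White-box sketch: the attacker knows the model's weights, so they can
# compute the input gradient analytically. All numbers are illustrative.

w = np.array([2.0, -1.0])  # known weights (white-box access)
b = 0.0

def predict(x):
    """Sigmoid score for class 1."""
    return 1 / (1 + np.exp(-(w @ x + b)))

x = np.array([0.5, 0.2])  # benign input, classified as class 1

# For logistic loss with true label y=1, d(loss)/d(input) = (p - 1) * w
p = predict(x)
grad_x = (p - 1) * w

eps = 0.5
x_adv = x + eps * np.sign(grad_x)  # signed step that increases the loss

print(predict(x))      # above 0.5 -> class 1
print(predict(x_adv))  # below 0.5 -> prediction flipped
```

With full white-box access, one closed-form gradient step is enough to flip this toy model's decision.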
Black-box Attack
The attacker has NO internal access, only input → output interaction.
Example:
API-based model (like Google Vision API)
Black-box models are assumed to be more secure because there is:
- No internal visibility
- No access to gradients
So how do attackers still succeed?
Answer: Transferability
What is a Transfer Attack?
A Transfer Attack is a type of Black-box attack where:
Adversarial examples crafted on one model (the surrogate) are used to fool another model (the target).
Core Idea: Transferability
Different ML models trained on similar data tend to:
- Learn similar patterns
- Share similar decision boundaries
So:
If one model is fooled, another model often gets fooled too.
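Here is a minimal sketch of transferability using two simple linear classifiers with similar (but not identical) weights; all numbers are illustrative. The adversarial step is computed only from the surrogate's weights, yet it flips the target's decision too, because the two decision boundaries are close:

```python
import numpy as np

# Transferability sketch: an input perturbed against the surrogate model
# also fools a different target model. All weights are illustrative.

w_surrogate = np.array([2.0, -1.0])  # attacker's own model
w_target = np.array([1.8, -1.2])     # victim model (similar, not identical)

def score(w, x):
    """Positive score -> class 1, negative -> class 0."""
    return float(w @ x)

x = np.array([0.5, 0.2])  # classified as class 1 by both models

# FGSM-style step computed ONLY from the surrogate's weights
# (for true label 1, the loss gradient points along -sign(w))
eps = 0.5
x_adv = x - eps * np.sign(w_surrogate)

print(score(w_surrogate, x), score(w_surrogate, x_adv))  # flips on surrogate
print(score(w_target, x), score(w_target, x_adv))        # flips on target too
</n```

The attacker never touched `w_target`, yet the same perturbation crosses both decision boundaries: that is transferability in miniature.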
Step-by-Step Working of Transfer Attack
Step 1: Build a Surrogate Model
The attacker trains their own model on:
- Similar dataset
- Similar task
Example:
Target: Face recognition system
Attacker: trains their own face classifier
Step 2: Generate Adversarial Examples
Using techniques like:
- FGSM
- PGD
The attacker perturbs the input:
Original Image + small noise → Adversarial Image
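The two techniques named above can be contrasted in a toy one-dimensional setting (the gradient function here is a stand-in, not a real model's loss gradient): FGSM takes a single signed step of size eps, while PGD takes many small signed steps, each time clipping back into the eps-ball around the original input:

```python
import numpy as np

# Toy sketch contrasting FGSM and PGD. The "gradient" is an arbitrary
# smooth function standing in for d(loss)/d(input).

def grad(x):
    return np.cos(x)  # illustrative stand-in, not a real model

def fgsm(x, eps):
    """One signed-gradient step of size eps."""
    return x + eps * np.sign(grad(x))

def pgd(x, eps, alpha=0.05, steps=10):
    """Many small signed steps, projected back into the eps-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within eps of x
    return x_adv

x = np.array([0.3])
print(fgsm(x, eps=0.25))  # single jump to the edge of the eps-ball
print(pgd(x, eps=0.25))   # iterative walk, re-checking the gradient each step
```

PGD is generally the stronger attack because it re-evaluates the gradient at every step, while FGSM commits to one direction up front.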
Step 3: Transfer to Target Model
Now the attacker sends this adversarial input to the real (black-box) system.
Result:
The target model gets fooled.
Real-World Scenario 1: Face Recognition Bypass
Situation:
A company uses face recognition for login.
Attack:
- The attacker trains a similar model
- Creates an adversarial face image
- Uploads it to the real system
Outcome:
The system misidentifies the attacker as someone else.
You can also watch the video on this channel for better understanding:
https://whatsapp.com/channel/0029VbCbhQ16RGJG5tbrN42T/109
Thank you!