Projected Gradient Descent
Attack type: white box, gradient-based, evasion, digital
Best for: Models with continuous, high-dimensional input data, such as images and video. These are the most vulnerable to Projected Gradient Descent (PGD) attacks because a large input space offers more opportunity to find a combination of imperceptible perturbations that succeeds.
Attack summary: PGD attacks produce adversarial perturbations (slight variations) in an image (or other approximately continuous domain) that are often difficult to detect with the naked eye. PGD attacks are white-box attacks, which means that they are only possible when the attacker has access to, and knowledge of, the model architecture, parameters and gradients. Gradient-based attacks exploit knowledge of the model’s gradients to determine how slight changes in the input (pixels in the case of images) influence the output of the model (such as classification probabilities or object detections). The perturbations are optimized iteratively to increase the likelihood that the input will be misclassified (or that objects go undetected) while each step is projected back into a small allowed range, so the overall image does not change significantly. Perturbations across features can be uniform, small and very hard to detect, which means that the attack runs a high risk of going undetected.
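As a minimal sketch of the iterative procedure summarized above, the following NumPy example runs an L-infinity PGD attack on a toy logistic-regression model. The model, its random weights, and the input are made up for illustration, and the parameter names `eps`, `eps_step` and `max_iter` are chosen here for readability; nothing in this block calls the HEART or ART API.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a toy logistic model and its gradient w.r.t. the input x."""
    p = sigmoid(x @ w + b)
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad_x = (p - y) * w  # analytic d(loss)/dx for logistic regression
    return loss, grad_x


def pgd_linf(x, y, w, b, eps=0.03, eps_step=0.01, max_iter=20, clip=(0.0, 1.0)):
    """Illustrative untargeted PGD: ascend the loss, then project back into the eps-ball."""
    x_adv = x.copy()
    for _ in range(max_iter):
        _, grad = loss_and_grad(x_adv, y, w, b)
        x_adv = x_adv + eps_step * np.sign(grad)   # signed gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # projection onto the L-infinity ball
        x_adv = np.clip(x_adv, clip[0], clip[1])   # keep pixel values in the valid range
    return x_adv


# Toy data: a 64-"pixel" input in the 0-1 range with true label 1 (all hypothetical).
rng = np.random.default_rng(0)
w = rng.normal(size=64)
b = 0.0
x = rng.uniform(0.0, 1.0, size=64)
y = 1.0

x_adv = pgd_linf(x, y, w, b)
loss_before, _ = loss_and_grad(x, y, w, b)
loss_after, _ = loss_and_grad(x_adv, y, w, b)
print("max perturbation:", np.max(np.abs(x_adv - x)))
print("loss before / after:", loss_before, loss_after)
```

The two `np.clip` calls are the "projected" part of the attack: the first bounds the perturbation size, the second keeps the result a valid image.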
Task: Object detection and image classification
Modality: HEART currently supports images only; ART supports images and video
Data: Single-channel or three-channel color images of standardized dimensions. Specify the pixel range (0-1 or 0-255) so that it matches the input data
Model: Must be fully differentiable in order to compute gradients
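The pixel-range requirement in the Data item matters because the perturbation budget has to be expressed on the same scale as the pixels. The sketch below shows one way to keep the two consistent; `infer_clip_values` and `scale_budget` are hypothetical helpers written for this example, not HEART or ART functions, and the max-value heuristic is an assumption that can misfire on a very dark 0-255 image.

```python
import numpy as np


def infer_clip_values(images):
    """Guess whether pixels are stored in 0-1 or 0-255 (hypothetical heuristic)."""
    return (0.0, 1.0) if float(images.max()) <= 1.0 else (0.0, 255.0)


def scale_budget(eps_255, clip_values):
    """Convert a budget stated in 0-255 pixel units to the data's own scale."""
    return eps_255 * (clip_values[1] - clip_values[0]) / 255.0


# The same batch of made-up images stored in both conventions (N, C, H, W).
images_unit = np.random.default_rng(0).uniform(0.0, 1.0, size=(2, 3, 8, 8))
images_byte = np.round(images_unit * 255.0)

for name, batch in [("0-1", images_unit), ("0-255", images_byte)]:
    clip_values = infer_clip_values(batch)
    print(name, "clip_values:", clip_values, "eps:", scale_budget(8, clip_values))
```

A budget of 8 intensity levels is 8.0 on 0-255 data but only 8/255 on 0-1 data; mixing the two scales produces either a negligible or a grossly visible perturbation.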
To get started with PGD attacks, see the Projected Gradient Descent Notebook, available via the IBM HEART-library GitHub repository.
For increased relevance to your use case, replace the selected Hugging Face model with your own model, and the test dataset with one of your own.
A model’s robustness can be assessed by comparing performance before and after an attack. For details on how to evaluate model performance and attack effectiveness, see this explanation of evaluation metrics.
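To make the before-and-after comparison concrete, here is a small sketch that computes benign accuracy, adversarial accuracy, and the drop between them from predicted labels. The labels and predictions are invented, and the "attack success rate" defined here (the fraction of originally correct samples that the attack flips) is one common convention assumed for this example; the linked explanation of evaluation metrics may define it differently.

```python
import numpy as np


def accuracy(pred, labels):
    return float(np.mean(np.asarray(pred) == np.asarray(labels)))


def robustness_report(benign_pred, adv_pred, labels):
    """Compare model performance on clean inputs and on their adversarial versions."""
    benign_pred, adv_pred, labels = map(np.asarray, (benign_pred, adv_pred, labels))
    correct = benign_pred == labels
    # Samples the model got right before the attack and wrong after it
    flipped = correct & (adv_pred != labels)
    benign_acc = accuracy(benign_pred, labels)
    adv_acc = accuracy(adv_pred, labels)
    return {
        "benign_acc": benign_acc,
        "adversarial_acc": adv_acc,
        "accuracy_drop": benign_acc - adv_acc,
        "attack_success_rate": float(flipped.sum() / max(correct.sum(), 1)),
    }


# Made-up predictions for eight samples.
labels = [0, 1, 2, 1, 0, 2, 1, 0]
benign_pred = [0, 1, 2, 1, 0, 2, 0, 0]
adv_pred = [0, 2, 2, 0, 1, 2, 0, 0]

report = robustness_report(benign_pred, adv_pred, labels)
print(report)
```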
Pre-processing mitigation steps (image compression, spatial smoothing, variance minimization)
Defenses like adversarial training (currently supported by ART)
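As an illustration of the spatial smoothing idea listed above, the sketch below applies a median filter to each channel of an image before it would be passed to a model. It is a plain NumPy stand-in written for this example, not the HEART or ART implementation, and the Gaussian noise only stands in for a perturbation: the block shows the mechanics of the filter and is not evidence of how well smoothing resists a real PGD attack.

```python
import numpy as np


def median_smooth(image, window=3):
    """Median filter over each channel of an HxWxC image, with edge padding."""
    pad = window // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    height, width = image.shape[:2]
    # Collect every shifted view covered by the window, then take the median across them
    shifts = [
        padded[i:i + height, j:j + width, :]
        for i in range(window)
        for j in range(window)
    ]
    return np.median(np.stack(shifts, axis=0), axis=0)


# A flat grey image plus small random noise (hypothetical data in the 0-1 range).
rng = np.random.default_rng(0)
clean = np.full((16, 16, 3), 0.5)
noisy = np.clip(clean + rng.normal(0.0, 0.03, size=clean.shape), 0.0, 1.0)
smoothed = median_smooth(noisy)

print("mean abs error before smoothing:", np.abs(noisy - clean).mean())
print("mean abs error after smoothing: ", np.abs(smoothed - clean).mean())
```

Because smoothing also changes benign inputs, benign accuracy should be re-measured with the pre-processing step in place.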
The examples of time and compute requirements below cover a variety of models and datasets to guide users’ expectations. This data can be used for resource planning for model testing and evaluation (T&E).
| Execution Date | Dataset | Model | Attack | Device | Num samples | Peak memory | Duration (seconds) | Benign Acc | Advers Acc |
|---|---|---|---|---|---|---|---|---|---|
| 27/06/2024 | MITLL/LADI-v2-dataset | MITLL/LADI-v2-classifier-small | PGD | CPU | 1000 | 6296.9 | 3522 | 81.39 | 46.4 |
| 27/06/2024 | MITLL/LADI-v2-dataset | MITLL/LADI-v2-classifier-small | PGD | CPU | 10000 | 11642.1 | 32945 | 83.46 | 46.84 |
| 13/11/2024 | MITLL/LADI-v2-dataset | MITLL/LADI-v2-classifier-small | PGD | GPU | 1000 | 6298 | 1370 | 81.39 | 46.6 |
| 13/11/2024 | MITLL/LADI-v2-dataset | MITLL/LADI-v2-classifier-small | PGD | GPU | 10000 | 6483 | 1706 | 81.28 | 46.3 |
| 13/11/2024 | MITLL/LADI-v2-dataset | MITLL/LADI-v2-classifier-large | PGD | CPU | 1000 | | | | |
| 13/11/2024 | MITLL/LADI-v2-dataset | MITLL/LADI-v2-classifier-large | PGD | CPU | 10000 | | | | |
| 13/11/2024 | MITLL/LADI-v2-dataset | MITLL/LADI-v2-classifier-large | PGD | GPU | 1000 | 7805.9 | 5185.9 | 82.02 | 33.57 |
| 13/11/2024 | MITLL/LADI-v2-dataset | MITLL/LADI-v2-classifier-large | PGD | GPU | 10000 | 8133.7 | 5419.3 | 81.74 | 33.87 |
Model and input data not compatible -> see ‘Compatibility considerations’ above.
Model is overfit -> won’t produce useful gradient information
Wrong hyperparameters -> iterations must be enough for attack to converge
Landed on false local minimum, no adversarial example present -> modify loss function
Model not differentiable or gradient direction doesn’t minimize loss function -> loss function must be appropriate to model
Last sample of attack path returned, not adversarial -> have optimization algorithm return best attack path sample
For more information on causes of attack failure, see Carlini’s Indicators of Attack Failure and Tramer’s On Adaptive Attacks to Adversarial Example Defenses.
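Two of the failure causes above (too few iterations, and returning the last rather than the best sample of the attack path) can be checked with a small amount of bookkeeping. The following is a hedged sketch, not the HEART or ART implementation: `pgd_keep_best` is a hypothetical function that records the loss at every step and returns the highest-loss iterate, and the quadratic toy loss is chosen only because it makes the iterates oscillate.

```python
import numpy as np


def pgd_keep_best(x, loss_fn, grad_fn, eps, eps_step, max_iter):
    """Illustrative PGD loop that tracks the best (highest-loss) point on the attack path."""
    x_adv = x.copy()
    best_x, best_loss = x.copy(), loss_fn(x)
    history = [best_loss]  # loss per iteration, useful for judging convergence
    for _ in range(max_iter):
        x_adv = x_adv + eps_step * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)
        current = loss_fn(x_adv)
        history.append(current)
        if current > best_loss:
            best_x, best_loss = x_adv.copy(), current
    return best_x, best_loss, history


# Toy loss with its maximum at c; the fixed step size overshoots it on every other step.
c = np.full(4, 0.25)


def loss_fn(x):
    return float(-np.sum((x - c) ** 2))


def grad_fn(x):
    return -2.0 * (x - c)


x0 = np.zeros(4)
best_x, best_loss, history = pgd_keep_best(x0, loss_fn, grad_fn, eps=1.0, eps_step=0.2, max_iter=4)
print("loss per iteration:", history)
print("last loss:", history[-1], "best loss:", best_loss)
```

Here the final iterate is worse than an earlier one, so returning the last sample would understate the attack. A loss trace that is still rising at the final iteration suggests that `max_iter` is too small.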
Similar attacks:
PGD is just one type of gradient-based attack. For more information on others, see article.
Adversarial Patch attacks can be applied in circumstances similar to those for PGD attacks. For more information, see the Getting Started with Adversarial Patch notebook, available via the HEART-library GitHub repository.
Further reading:
For more information on which attacks are relevant in which conditions, please see HEART’s Adversarial Evaluation Pathways.