Projected Gradient Descent

Attack type: white box, gradient-based, evasion, digital

Best for: Models with continuous, high-dimensional input data such as images and videos. These models are the most vulnerable to Projected Gradient Descent (PGD) attacks because the large input space gives the attacker more opportunity to find a combination of imperceptible perturbations that succeeds.

Attack summary: PGD attacks produce adversarial perturbations (slight variations) in an image (or other approximately continuous domain) that are often difficult to detect with the naked eye. PGD attacks are white-box attacks, which means that they are only possible when the attacker has access to, and knowledge of, the model architecture, parameters and gradients. Gradient-based attacks exploit the model’s gradients to determine how slight changes in the input (pixels in the case of images) influence the output of the model (such as classification probabilities or object detections). The perturbations are iteratively optimized to increase the likelihood that the input will be misclassified (or that objects go undetected) without making significant changes to the overall image. Because perturbations can be small and spread uniformly across features, the attack is very hard to detect and is likely to go unnoticed.
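The loop described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the HEART or ART implementation: it assumes an L-infinity perturbation budget and a toy logistic-regression model so that the input gradient has a closed form. The names `pgd_linf`, `loss_gradient`, `eps`, `eps_step` and `max_iter` are chosen for this example only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_gradient(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input x (toy logistic model)."""
    p = sigmoid(x @ w + b)
    return (p - y) * w

def pgd_linf(x, y, w, b, eps=0.1, eps_step=0.02, max_iter=20, clip=(0.0, 1.0)):
    """Assumed L-infinity PGD: step along the gradient sign, then project."""
    x_adv = x.copy()
    for _ in range(max_iter):
        grad = loss_gradient(x_adv, y, w, b)
        x_adv = x_adv + eps_step * np.sign(grad)    # increase the model's loss
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project back into the eps-ball
        x_adv = np.clip(x_adv, clip[0], clip[1])    # keep a valid pixel range
    return x_adv

# Hypothetical data: a flattened 8x8 "image" and random model weights.
rng = np.random.default_rng(0)
w = rng.normal(size=64)
b = 0.0
x = rng.uniform(0.2, 0.8, size=64)
y = float(sigmoid(x @ w + b) > 0.5)   # use the clean prediction as the label

x_adv = pgd_linf(x, y, w, b)
clean_p = sigmoid(x @ w + b)
adv_p = sigmoid(x_adv @ w + b)
print("max perturbation:", np.max(np.abs(x_adv - x)))
print("P(class 1) clean vs adversarial:", clean_p, adv_p)
```

With these assumed settings the perturbation can only reach the edge of the budget after `eps / eps_step` = 5 iterations, which is one reason too few iterations can make an attack look weaker than it is.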

Compatibility Considerations

Task: Object detection or image classification
Modality: HEART currently supports only images; ART supports images and video
Data: Single-channel or three-channel color images of standardized dimensions. Specify the pixel range as 0-1 or 0-255 to match the input data
Model: Must be fully differentiable in order to compute gradients
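A mismatch between the declared pixel range and the actual data is easy to miss. The sketch below shows one way to sanity-check it before running an attack. The helpers `infer_clip_values` and `scale_eps` are hypothetical names for this example and are not part of HEART or ART; the range check is a heuristic (a very dark 0-255 batch could be mistaken for 0-1 data).

```python
import numpy as np

def infer_clip_values(images):
    """Heuristically guess whether pixel values use the 0-1 or 0-255 range."""
    lo, hi = float(images.min()), float(images.max())
    if lo < 0.0:
        raise ValueError("negative pixel values: data may be normalized differently")
    if hi <= 1.0:
        return (0.0, 1.0)
    if hi <= 255.0:
        return (0.0, 255.0)
    raise ValueError("pixel values exceed 255")

def scale_eps(eps_fraction, clip_values):
    """Convert a budget given as a fraction of the range into data units."""
    return eps_fraction * (clip_values[1] - clip_values[0])

# Hypothetical batch of two 3-channel 4x4 images stored as 8-bit integers.
batch = np.random.default_rng(0).integers(0, 256, size=(2, 3, 4, 4), dtype=np.uint8)
clip_values = infer_clip_values(batch)
eps = scale_eps(8 / 255, clip_values)
print(clip_values, eps)
```

A budget of 8/255 in the 0-1 range corresponds to 8 intensity levels in the 0-255 range, so the perturbation size should be rescaled whenever the pixel range changes.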

Getting Started

To get started with PGD attacks, see the Projected Gradient Descent Notebook, available via the IBM HEART-library GitHub repository.

For increased relevance to your use case, replace the selected Hugging Face model with your own model, and the test dataset with one of your own.

Interpreting the Results

A model’s robustness can be assessed by comparing performance before and after an attack. For details on how to evaluate model performance and attack effectiveness, see this explanation of evaluation metrics.
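As a rough illustration of the before-and-after comparison, the sketch below computes benign accuracy, adversarial accuracy, and the drop between them from lists of predictions. The function names and the toy labels are assumptions made for this example. The "attack success rate" shown here (the share of initially correct samples that the attack flips) is one common definition and may differ from the metrics in the linked explanation.

```python
def accuracy(predictions, labels):
    """Percentage of predictions that match the labels."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return 100.0 * correct / len(labels)

def robustness_report(labels, benign_preds, adversarial_preds):
    benign = accuracy(benign_preds, labels)
    adversarial = accuracy(adversarial_preds, labels)
    # Only samples the model got right before the attack can be "flipped".
    initially_correct = [i for i, (p, t) in enumerate(zip(benign_preds, labels)) if p == t]
    flipped = sum(adversarial_preds[i] != labels[i] for i in initially_correct)
    success = 100.0 * flipped / len(initially_correct) if initially_correct else 0.0
    return {
        "benign_accuracy": benign,
        "adversarial_accuracy": adversarial,
        "accuracy_drop": benign - adversarial,
        "attack_success_rate": success,
    }

# Hypothetical labels and predictions for five samples.
labels = [0, 1, 1, 0, 1]
benign_preds = [0, 1, 1, 0, 0]
adversarial_preds = [1, 1, 0, 0, 0]
report = robustness_report(labels, benign_preds, adversarial_preds)
print(report)
```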

Remediation Resources

Pre-processing mitigation steps (image compression, spatial smoothing, variance minimization)
Defenses like adversarial training (currently supported by ART)
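To give a sense of what a pre-processing mitigation does, here is a minimal NumPy sketch of spatial smoothing using a median filter. It is an assumed, simplified stand-in for illustration and is not the ART pre-processor. The function name `median_smooth` and the `window` parameter are hypothetical, and the example requires NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_smooth(image, window=3):
    """Median-filter an (H, W) or (H, W, C) image with an odd window size."""
    if window % 2 != 1:
        raise ValueError("window must be odd")
    pad = window // 2
    img = image if image.ndim == 3 else image[..., None]
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    # Resulting shape: (H, W, C, window, window)
    windows = sliding_window_view(padded, (window, window), axis=(0, 1))
    out = np.median(windows, axis=(-2, -1))
    return out if image.ndim == 3 else out[..., 0]

# Hypothetical grayscale image with a single perturbed pixel.
img = np.full((5, 5), 0.5)
img[2, 2] = 1.0
smoothed = median_smooth(img)
print(smoothed[2, 2])
```

Smoothing removes isolated pixel-level changes like the one above, but it also alters clean inputs, so benign accuracy should be re-measured after adding any pre-processing step.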

Scalability

The examples of time and compute requirements below cover a variety of models and datasets to guide users’ expectations. This data can be used for resource planning for model testing and evaluation (T&E).

Execution date (DD/MM/YYYY)	Dataset	Model	Attack	Device	Number of samples	Peak memory	Duration (seconds)	Benign accuracy (%)	Adversarial accuracy (%)
27/06/2024	MITLL/LADI-v2-dataset	MITLL/LADI-v2-classifier-small	PGD	CPU	1000	6296.9	3522	81.39	46.4
27/06/2024	MITLL/LADI-v2-dataset	MITLL/LADI-v2-classifier-small	PGD	CPU	10000	11642.1	32945	83.46	46.84
13/11/2024	MITLL/LADI-v2-dataset	MITLL/LADI-v2-classifier-small	PGD	GPU	1000	6298	1370	81.39	46.6
13/11/2024	MITLL/LADI-v2-dataset	MITLL/LADI-v2-classifier-small	PGD	GPU	10000	6483	1706	81.28	46.3
13/11/2024	MITLL/LADI-v2-dataset	MITLL/LADI-v2-classifier-large	PGD	CPU	1000
13/11/2024	MITLL/LADI-v2-dataset	MITLL/LADI-v2-classifier-large	PGD	CPU	10000
13/11/2024	MITLL/LADI-v2-dataset	MITLL/LADI-v2-classifier-large	PGD	GPU	1000	7805.9	5185.9	82.02	33.57
13/11/2024	MITLL/LADI-v2-dataset	MITLL/LADI-v2-classifier-large	PGD	GPU	10000	8133.7	5419.3	81.74	33.87
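For resource planning, the durations in the table can be turned into per-sample costs and CPU-to-GPU speedups. The short sketch below does this for the MITLL/LADI-v2-classifier-small rows only; the dictionary layout and function names are assumptions for this example, while the numbers are copied from the Duration (seconds) column.

```python
# (device, number of samples) -> duration in seconds, small classifier rows
durations = {
    ("CPU", 1000): 3522.0,
    ("CPU", 10000): 32945.0,
    ("GPU", 1000): 1370.0,
    ("GPU", 10000): 1706.0,
}

def seconds_per_sample(device, num_samples):
    return durations[(device, num_samples)] / num_samples

def gpu_speedup(num_samples):
    return durations[("CPU", num_samples)] / durations[("GPU", num_samples)]

for n in (1000, 10000):
    print(n, "samples:",
          f"CPU {seconds_per_sample('CPU', n):.3f} s/sample,",
          f"GPU {seconds_per_sample('GPU', n):.3f} s/sample,",
          f"speedup {gpu_speedup(n):.1f}x")
```

On these rows the GPU is roughly 2.6 times faster at 1,000 samples and roughly 19 times faster at 10,000. GPU duration grows far less than linearly with the number of samples here, so extrapolating linearly from a single row may be misleading.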

What could go wrong?

Model and input data are not compatible –> see ‘Compatibility Considerations’ above
Model is overfit –> won’t produce useful gradient information
Wrong hyperparameters –> use enough iterations for the attack to converge
Optimization landed on a false local minimum where no adversarial example is present –> modify the loss function
Model is not differentiable, or the gradient direction doesn’t minimize the loss function –> use a loss function appropriate to the model
Last sample on the attack path was returned and is not adversarial –> have the optimization algorithm return the best sample on the attack path

For more information on causes of attack failure, see Indicators of Attack Failure (Pintor et al.) and On Adaptive Attacks to Adversarial Example Defenses (Tramèr et al.).
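The last failure mode in the list above can be illustrated with a small sketch: the loop keeps the highest-loss point seen so far instead of returning whatever the final iteration produced. The function `pgd_keep_best` and the toy loss are assumptions for this example. In a real attack, "best" would typically be judged by the model's loss or by whether the sample is actually misclassified.

```python
import numpy as np

def pgd_keep_best(x, loss_fn, grad_fn, eps, eps_step, max_iter):
    """Sign-gradient ascent that tracks the best point on the attack path."""
    x_adv = x.copy()
    best_x, best_loss = x.copy(), loss_fn(x)
    for _ in range(max_iter):
        x_adv = x_adv + eps_step * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)   # stay inside the eps-ball
        current = loss_fn(x_adv)
        if current > best_loss:                    # remember the best iterate
            best_x, best_loss = x_adv.copy(), current
    return best_x, best_loss, x_adv

# Toy loss with its maximum at 0.3; the step size is deliberately too large,
# so the iterates oscillate around the maximum instead of settling on it.
def loss_fn(x):
    return float(-np.sum((x - 0.3) ** 2))

def grad_fn(x):
    return -2.0 * (x - 0.3)

x0 = np.array([0.0])
best_x, best_loss, last_x = pgd_keep_best(x0, loss_fn, grad_fn,
                                          eps=1.0, eps_step=0.25, max_iter=4)
print("best:", best_x, best_loss)
print("last:", last_x, loss_fn(last_x))
```

In this toy run the final iterate has a lower loss than an earlier one, so returning only the last sample would understate what the attack found.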

More Resources

Similar attacks:
- PGD is just one type of gradient-based attack. For more information on others, see article.
- Adversarial Patch attacks can be applied in circumstances similar to those for PGD attacks. For more information, see the Getting Started with Adversarial Patch notebook, available via the HEART-library GitHub repository.
Further reading:
- Adversarial Robustness Toolbox v1.0.0
- Adversarial Robustness Toolbox repo (v1.18.0+) and related discussions

For more information on which attacks are relevant in which conditions, please see HEART’s Adversarial Evaluation Pathways.