Patch Attack

Attack type: white-box (supported by HEART), black-box (currently supported by ART), evasion, digital or physical. For more information on types of patch attacks, see these more detailed explanations.

Best for: patch attacks are localized and unbounded, which makes them easy to transfer to the physical world while remaining applicable in the digital space.

Attack summary: Patch attacks add an object (the patch) to an image in order to degrade the results of a visual model that ingests the image, either causing a wrong classification or causing the model to miss a relevant object in the image. Adversarial patches can be created with access to only the model’s output, and they are neither norm-bound nor specific to a single image. Patch attacks are highly versatile and can be implemented both digitally and physically.
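As a minimal sketch of the digital case, the hypothetical helper below pastes a patch into a channels-last image (H, W, C) with pixel values in 0-1; it is not a HEART or ART function. It makes the two properties above concrete: only the patch region changes (localized), and pixels inside it can take any valid value instead of staying within a small norm ball around the original (unbounded). Crafting an adversarial patch additionally requires optimizing the patch contents against a model, which this sketch does not do.

```python
import numpy as np


def apply_patch(image: np.ndarray, patch: np.ndarray, top: int, left: int) -> np.ndarray:
    """Return a copy of `image` (H, W, C) with `patch` (h, w, C) pasted at (top, left).

    Hypothetical helper for illustration; assumes channels-last arrays.
    """
    if patch.shape[2:] != image.shape[2:]:
        raise ValueError("patch and image must have the same number of channels")
    h, w = patch.shape[:2]
    H, W = image.shape[:2]
    # The patch must lie entirely inside the image at the requested location.
    if top < 0 or left < 0 or top + h > H or left + w > W:
        raise ValueError("patch does not fit inside the image at this location")
    patched = image.copy()
    # Only this rectangle changes (localized), and its values simply replace the
    # original pixels, with no norm constraint on the change (unbounded).
    patched[top:top + h, left:left + w] = patch
    return patched
```

Because the patch is just pasted in, the same patch can be applied to any image at any location that fits, which is why a single patch is not tied to one input.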

Compatibility Considerations

Task: object detection or image classification
Modality: HEART currently supports only images; ART supports images and video
Data: single-channel or three-channel (color) images of standardized dimensions, with the pixel range (0-1 or 0-255) specified to match the input data
Model: Computer vision model
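One way to check these data requirements before running an attack is sketched below. It assumes channels-last NumPy batches and a model that expects three-channel input; `prepare_images` is a hypothetical helper. Inferring the pixel range from the maximum value is only a heuristic (a very dark 0-255 image would be misread as 0-1), so stating the range explicitly is safer when you know it. The returned tuple is the kind of value ART estimators accept through their `clip_values` argument.

```python
from typing import Tuple

import numpy as np


def prepare_images(images: np.ndarray) -> Tuple[np.ndarray, Tuple[float, float]]:
    """Bring a batch (N, H, W) or (N, H, W, C) to 3 channels and guess its pixel range.

    Hypothetical helper; assumes channels-last data and a 3-channel model.
    """
    if images.ndim == 3:
        images = images[..., np.newaxis]  # add a channel axis to grayscale input
    if images.ndim != 4:
        raise ValueError(f"expected a batch of images, got shape {images.shape}")
    if images.shape[-1] == 1:
        images = np.repeat(images, 3, axis=-1)  # replicate the gray channel 3 times
    elif images.shape[-1] != 3:
        raise ValueError(f"expected 1 or 3 channels, got {images.shape[-1]}")
    # Heuristic: any value above 1.0 means the data is on the 0-255 scale.
    clip_values = (0.0, 255.0) if images.max() > 1.0 else (0.0, 1.0)
    return images, clip_values
```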

Getting Started

To get started with patch attacks, see the Patch Attack Notebook, available via the IBM HEART-library GitHub repository.

For results more relevant to your use case, replace the selected Hugging Face model with your own model, and the test dataset with one of your own.

Interpreting the Results

A model’s robustness can be assessed by comparing performance before and after an attack. For details on how to evaluate model performance and attack effectiveness, see this explanation of evaluation metrics.
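For image classification, the before/after comparison can be as simple as the sketch below, which assumes integer class labels and predictions held in NumPy arrays. `robustness_report` is a hypothetical helper; the attack success rate it reports (the share of originally correct samples that the attack flips to a wrong label) is one common definition among several, and object detection would call for detection metrics instead.

```python
import numpy as np


def robustness_report(y_true, benign_pred, adv_pred) -> dict:
    """Compare predictions on clean and attacked inputs (hypothetical helper).

    Assumes one integer label per sample; returns percentages.
    """
    y_true = np.asarray(y_true)
    benign_correct = np.asarray(benign_pred) == y_true
    adv_correct = np.asarray(adv_pred) == y_true
    n_correct = int(benign_correct.sum())
    # Success rate over samples the model classified correctly before the attack.
    if n_correct:
        success = 100.0 * (benign_correct & ~adv_correct).sum() / n_correct
    else:
        success = float("nan")
    return {
        "benign_acc": 100.0 * benign_correct.mean(),
        "adversarial_acc": 100.0 * adv_correct.mean(),
        "attack_success_rate": success,
    }
```

For example, with labels `[0, 1, 2, 3, 4]`, benign predictions `[0, 1, 2, 3, 0]` and adversarial predictions `[0, 9, 9, 3, 0]`, benign accuracy is 80%, adversarial accuracy is 40%, and the attack flips 2 of the 4 originally correct samples (50%).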

Remediation Resources

- Pre-processing mitigation steps (image compression, spatial smoothing, total variance minimization)
- Defenses such as adversarial training (currently supported by ART)
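ART includes preprocessors along these lines (for example `JpegCompression`, `SpatialSmoothing`, and `TotalVarMin` in `art.defences.preprocessor`). To show the idea without extra dependencies, the sketch below re-implements only spatial smoothing, as a per-channel median filter in NumPy, assuming a channels-last batch (N, H, W, C); it is an illustration, not ART's implementation.

```python
import numpy as np


def spatial_smoothing(images: np.ndarray, window_size: int = 3) -> np.ndarray:
    """Median-filter each channel of a batch (N, H, W, C); illustrative sketch only."""
    if window_size < 1 or window_size % 2 != 1:
        raise ValueError("window_size must be a positive odd integer")
    pad = window_size // 2
    # Reflect-pad height and width so the output keeps the input's spatial size.
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="reflect")
    # Windows over axes 1 and 2; resulting shape is (N, H, W, C, k, k).
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (window_size, window_size), axis=(1, 2)
    )
    # Replace each pixel with the median of its k x k neighborhood.
    return np.median(windows, axis=(-2, -1)).astype(images.dtype)
```

Larger windows remove more isolated perturbations but also blur clean images more, so the window size is a trade-off to tune against benign accuracy.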

Scalability

The examples of time and compute requirements below are intended to guide users’ expectations and can be used for resource planning for model testing and evaluation (T&E).

Execution date	Dataset	Model	Attack	Device	Num samples	Peak memory	Duration (seconds)	Benign accuracy	Adversarial accuracy
12/6/24	MITLL/LADI-v2-dataset	MITLL/LADI-v2-classifier-small	PGD	CPU	50	1811.5	284	81.67	66.67
12/7/24	MITLL/LADI-v2-dataset	MITLL/LADI-v2-classifier-small	PGD	CPU	100	2069	527.64	78.17	72.17
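To collect comparable numbers for your own model, you could wrap an attack run as in the sketch below. It uses only the Python standard library: `time.perf_counter` for wall-clock duration and `tracemalloc` for peak memory. `tracemalloc` only sees allocations made through Python's memory allocators, so memory held by some native extensions or a GPU may be missed, and the figures may not match how the table above was measured. `profile_run` is a hypothetical helper.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def profile_run(fn: Callable[[], Any]) -> Tuple[Any, float, float]:
    """Call fn() once; return (result, wall-clock seconds, peak traced memory in MiB).

    Hypothetical helper; peak memory covers only allocations tracemalloc can see.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn()
    finally:
        # Record timing and memory even if fn() raises, then stop tracing.
        duration = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, duration, peak_bytes / 2**20
```

For example, `profile_run(lambda: attack.generate(x=x_test))` would time a single call, where `attack` and `x_test` stand in for your own attack object and test data.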

What could go wrong?

- Model and input data are not compatible (see ‘Compatibility Considerations’ above)
- The patch may be too easily detected
- The patch has the wrong size, shape, or placement relative to the original image
- (Physical patches) Changes in lighting or object orientation can reduce effectiveness
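A rough digital check for the placement and physical-use issues above is to re-apply a patch under random changes and see whether the model's output still changes. The sketch below assumes channels-last float images with pixel values in 0-1; `random_patch_trials` is a hypothetical helper, and its multiples-of-90-degree rotations and global brightness scaling are crude stand-ins for real variation in orientation and lighting (perspective, print color shifts, and camera noise are not modeled).

```python
import numpy as np


def random_patch_trials(image, patch, n_trials=10, brightness=(0.7, 1.3), seed=0):
    """Return copies of `image` with `patch` under random placement, rotation, and lighting.

    Hypothetical helper; assumes channels-last float arrays (H, W, C) in 0-1.
    """
    rng = np.random.default_rng(seed)
    H, W = image.shape[:2]
    trials = []
    for _ in range(n_trials):
        # Random multiple-of-90-degree rotation as a crude stand-in for orientation changes.
        rotated = np.rot90(patch, k=int(rng.integers(0, 4)))
        h, w = rotated.shape[:2]
        if h > H or w > W:
            raise ValueError("patch is larger than the image")
        # Random top-left corner such that the whole patch stays inside the image.
        top = int(rng.integers(0, H - h + 1))
        left = int(rng.integers(0, W - w + 1))
        patched = image.copy()
        patched[top:top + h, left:left + w] = rotated
        # Scale the whole scene's brightness to mimic a lighting change, then clip to 0-1.
        scale = rng.uniform(*brightness)
        trials.append(np.clip(patched * scale, 0.0, 1.0))
    return trials
```

Passing each returned image to the model and counting how often the attack still succeeds gives a first estimate of how fragile the patch is.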

For more information on causes of attack failure, see Indicators of Attack Failure (Pintor et al.) and On Adaptive Attacks to Adversarial Example Defenses (Tramèr et al.).

More Resources

Similar attacks:
- A second patch attack notebook, Adversarial Patch for Object Detection, can be found via the IBM HEART-library GitHub repository.
- Other physically realizable attacks include the adversarial laser beam attack.
Further reading:
- Adversarial Robustness Toolbox v1.0.0
- Adversarial Robustness Toolbox repo (v1.18.0+) and related discussions

For guidance on which attacks are relevant under which conditions, see HEART’s Adversarial Evaluation Pathways.