Patch Attack

Attack type: white-box (supported by HEART), black-box (currently supported by ART), evasion, digital or physical. For more information on types of patch attack, see these more detailed explanations.

Best for: Patch attacks are localized and unbounded, which makes them easy to transfer to the physical world while remaining applicable in the digital space.

Attack summary: Patch attacks add an object to an image to degrade the output of a vision model that ingests it, causing the model either to produce the wrong classification or to miss a relevant object within the image. Adversarial patches can be created with access to only the model’s output, are not norm-bounded, and are not specific to a single image. Patch attacks are highly versatile and can be implemented both digitally and physically.
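To make the idea of a localized, unbounded perturbation concrete, the sketch below pastes a patch into an image by overwriting the pixels it covers. It is an illustration only, not the HEART or ART implementation: the function names apply_patch and patch_area_fraction are hypothetical, images are represented as plain nested lists for self-containment (in practice they would usually be arrays), and the patch itself is a placeholder rather than an optimized adversarial patch.

```python
def apply_patch(image, patch, top, left):
    """Return a copy of `image` with `patch` pasted at row `top`, column `left`.

    Hypothetical helper for illustration. `image` and `patch` are nested
    lists indexed [row][col]; each entry is one pixel (a number for
    single-channel data, or a list of channel values).
    """
    img_h, img_w = len(image), len(image[0])
    patch_h, patch_w = len(patch), len(patch[0])
    if top < 0 or left < 0 or top + patch_h > img_h or left + patch_w > img_w:
        raise ValueError("patch does not fit inside the image at this position")

    patched = [list(row) for row in image]  # copy rows so the input stays unchanged
    for r in range(patch_h):
        for c in range(patch_w):
            # Pixels are overwritten outright: the change is not limited by
            # any norm bound, only by the patch's size and location.
            patched[top + r][left + c] = patch[r][c]
    return patched


def patch_area_fraction(image, patch):
    """Fraction of the image area that the patch covers."""
    return (len(patch) * len(patch[0])) / (len(image) * len(image[0]))


# Placeholder data: an 8x8 black image and a 3x3 white patch.
image = [[0.0] * 8 for _ in range(8)]
patch = [[1.0] * 3 for _ in range(3)]
patched = apply_patch(image, patch, top=2, left=4)
```

Because the patch does not depend on the image it is pasted into, the same patch can be applied to any image of sufficient size, which is what makes it reusable across inputs.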

Compatibility Considerations
  • Task: Object detection or image classification

  • Modality: HEART currently supports only images; ART supports images and video

  • Data: Single-channel or three-channel images of standardized dimensions. Specify pixel values in the range 0-1 or 0-255, matching the input data

  • Model: Computer vision model
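The data requirements above can be checked before running an attack. The following is a minimal sketch, assuming images are nested lists in a channels-last layout ([row][col][channel]); the names check_images and make_image are hypothetical and are not part of HEART or ART.

```python
def check_images(images, value_range=(0.0, 1.0), allowed_channels=(1, 3)):
    """Return a list of problems found in a batch; an empty list means none.

    Assumes each image is a rectangular nested list [row][col][channel].
    """
    problems = []
    low, high = value_range
    reference_shape = None
    batch_max = None
    for index, image in enumerate(images):
        shape = (len(image), len(image[0]), len(image[0][0]))
        if reference_shape is None:
            reference_shape = shape
        elif shape != reference_shape:
            problems.append(f"image {index}: shape {shape} differs from {reference_shape}")
        if shape[2] not in allowed_channels:
            problems.append(f"image {index}: {shape[2]} channels, expected one of {allowed_channels}")
        values = [v for row in image for pixel in row for v in pixel]
        if min(values) < low or max(values) > high:
            problems.append(f"image {index}: pixel values fall outside [{low}, {high}]")
        batch_max = max(values) if batch_max is None else max(batch_max, max(values))
    # Heuristic only: 0-1 data declared as 0-255 is a common mismatch.
    if high > 1 and batch_max is not None and batch_max <= 1:
        problems.append("batch looks like 0-1 data but the declared range is 0-255")
    return problems


def make_image(height, width, channels, value):
    """Build a constant-valued test image (illustrative helper)."""
    return [[[value] * channels for _ in range(width)] for _ in range(height)]


batch = [make_image(4, 4, 3, 0.5), make_image(4, 4, 3, 0.2)]
print(check_images(batch))  # no problems expected for this batch
```

The final range check is deliberately a heuristic: a genuinely dark 0-255 batch would also trigger it, so treat that message as a prompt to inspect the data rather than as an error.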

Getting Started

To get started with Patch attacks, see the Patch Attack Notebook, available via the IBM HEART-library GitHub repository.

For increased relevance to your use case, replace the selected Hugging Face model with your own model, and the test dataset with one of your own.

Interpreting the Results

A model’s robustness can be assessed by comparing performance before and after an attack. For details on how to evaluate model performance and attack effectiveness, see this explanation of evaluation metrics.
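For an image classifier, one common way to make this comparison is to compute accuracy on the clean inputs and on the patched inputs and report the difference. The sketch below assumes predictions and labels are already available as lists of class indices; the function names and example values are hypothetical, and object detection tasks would need a detection metric instead.

```python
def accuracy(predictions, labels):
    """Percentage of predictions that match the ground-truth labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def robustness_summary(benign_predictions, adversarial_predictions, labels):
    """Compare accuracy before and after the attack on the same samples."""
    benign = accuracy(benign_predictions, labels)
    adversarial = accuracy(adversarial_predictions, labels)
    return {
        "benign_accuracy": benign,
        "adversarial_accuracy": adversarial,
        "accuracy_drop": benign - adversarial,
    }


# Made-up values for illustration only.
labels = [0, 1, 1, 0, 1]
benign_predictions = [0, 1, 1, 0, 0]
adversarial_predictions = [1, 1, 0, 0, 0]
summary = robustness_summary(benign_predictions, adversarial_predictions, labels)
```

A larger accuracy drop indicates a more effective attack, or equivalently a less robust model, on the samples tested.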

Remediation Resources
  1. Pre-processing mitigation steps (image compression, spatial smoothing, variance minimization)

  2. Defenses like adversarial training (currently supported by ART)
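As an illustration of the spatial smoothing idea listed above, the sketch below applies a median filter to a single-channel image. It is a minimal stand-in written with the standard library, not the defense implementation shipped with ART; the function name median_smooth and the window size are assumptions.

```python
from statistics import median


def median_smooth(image, window=3):
    """Replace each pixel with the median of its window x window neighborhood.

    `image` is a 2D nested list [row][col]. Neighborhoods are truncated at
    the image border rather than padded.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    height, width = len(image), len(image[0])
    smoothed = []
    for r in range(height):
        smoothed_row = []
        for c in range(width):
            neighborhood = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(height, r + half + 1))
                for cc in range(max(0, c - half), min(width, c + half + 1))
            ]
            smoothed_row.append(median(neighborhood))
        smoothed.append(smoothed_row)
    return smoothed


# An isolated bright pixel is removed by a 3x3 median filter.
noisy = [[0.0] * 5 for _ in range(5)]
noisy[2][2] = 1.0
cleaned = median_smooth(noisy, window=3)
```

A filter window smaller than the patch leaves the interior of a uniform patch unchanged, so how much smoothing helps against a given patch should be measured rather than assumed.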

Scalability

The examples of time and compute requirements below cover a variety of models and datasets to guide users’ expectations. These data can be used for resource planning for model testing and evaluation (T&E).

| Execution date | Dataset | Model | Attack | Device | Num samples | Peak memory | Duration (seconds) | Benign accuracy | Adversarial accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 12/6/24 | MITLL/LADI-v2-dataset | MITLL/LADI-v2-classifier-small | PGD | CPU | 50 | 1811.5 | 284 | 81.67 | 66.67 |
| 12/7/24 | MITLL/LADI-v2-dataset | MITLL/LADI-v2-classifier-small | PGD | CPU | 100 | 2069 | 527.64 | 78.17 | 72.17 |
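For resource planning, the two runs listed above can be turned into a rough per-sample cost. The sketch below copies the sample counts, durations, and accuracies from those runs and extrapolates linearly to a larger evaluation. The linear scaling is an assumption, not a measured property; actual time depends on the model, the hardware, and the attack settings.

```python
# Values copied from the two runs reported above.
RUNS = [
    {"date": "12/6/24", "num_samples": 50, "duration_s": 284.0,
     "benign_acc": 81.67, "adversarial_acc": 66.67},
    {"date": "12/7/24", "num_samples": 100, "duration_s": 527.64,
     "benign_acc": 78.17, "adversarial_acc": 72.17},
]


def seconds_per_sample(run):
    """Average wall-clock seconds spent per sample in one run."""
    return run["duration_s"] / run["num_samples"]


def estimate_duration(num_samples, runs=RUNS):
    """Estimate seconds for `num_samples`, assuming time scales linearly."""
    mean_rate = sum(seconds_per_sample(r) for r in runs) / len(runs)
    return mean_rate * num_samples


def accuracy_drop(run):
    """Benign accuracy minus adversarial accuracy for one run."""
    return run["benign_acc"] - run["adversarial_acc"]


for run in RUNS:
    print(run["date"], round(seconds_per_sample(run), 2), round(accuracy_drop(run), 2))
print(round(estimate_duration(1000), 1))
```

Both runs cost between 5 and 6 seconds per sample on CPU, so an estimate built this way is best read as an order-of-magnitude guide for similar models and hardware.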

What could go wrong?
  • Model and input data are not compatible (see ‘Compatibility Considerations’ above)

  • Patch may be too easily detected

  • Incorrect size, shape, or placement of the patch relative to the original image

  • In physical patch use, changes in lighting or object orientation can decrease effectiveness

For more information on causes of attack failure, see Carlini’s ‘Indicators of Attack Failure’ and Tramèr’s ‘On Adaptive Attacks to Adversarial Example Defenses’.

More Resources

For more information on which attacks are relevant in which conditions, please see HEART’s Adversarial Evaluation Pathways.