# How to Simulate White Box Attacks

## Introduction

This notebook provides a beginner-friendly introduction to using adversarial attacks on image classification as part of Test & Evaluation (T&E) of a small benchmark dataset based on drone imagery. The first attack we will use is Projected Gradient Descent (PGD), a simple attack based on the gradients of the model with respect to the input. We then visualize the attack and afterwards use a patch attack to constrain the change to a fraction of the input image. Computing the performance under these white-box attacks is a crucial step in T&E.

::::{grid} 5

:::{grid-item}
**Intended Audience:** All T&E Users
:::

:::{grid-item}
**Requirements:** Basic Python and Torchvision / ML Skills
:::

:::{grid-item}
**Notebook Runtime:** Full run of the notebook: <10 minutes
:::

:::{grid-item}
**Reading time:** ~10 Minutes
:::

:::{grid-item}
**Order of Completion:** Sections 1 and 2 first; Sections 3 and 4 can be completed in any order or independently.
:::

::::

::::{grid} 2

:::{grid-item}
:columns: 8

Before you begin, make sure that you download the how-to guide's companion Jupyter notebook. This notebook allows you to follow along in your own environment and interact with the code as you learn. The code snippets are also included in the documentation, but the notebook is provided for ease of use and to enable you to try things on your own.
:::

:::{grid-item}
:child-align: center
:columns: 4

```{note}
The [How to Simulate White-Box Attacks for Image Classification Companion Notebook](https://github.com/IBM/heart-library/blob/main/notebooks/how_tos/image_classification/1_How_to_Simulate_White-box_attacks_for_Image_Classification.ipynb) can be downloaded via the HEART public GitHub.
```
:::

::::

### Contents

1. Imports and set-up
1. Load data and model
1. Projected Gradient Descent Attack
1. Patch attack
1. Targeted White-box Attack
1. Conclusion
1. Next Steps

### Learning Objectives

- How to define a custom model and use drone imagery
- How to run an attack with JATIC
- How to inspect images and understand whether they fool the model
- How white-box attacks (PGD, patch attacks) work

## 1. Imports and Set-up

We import all necessary libraries for this tutorial. In order, we first import general libraries such as NumPy, then load the relevant attacks from ART. We then load the corresponding HEART functionality and the torchvision functions needed to support the model. Lastly, we use a magic command to plot within the notebook.

```python
import numpy as np
import os
import torch
from datasets import load_dataset
import matplotlib.pyplot as plt

# ART imports
from art.attacks.evasion import ProjectedGradientDescentPyTorch
from art.attacks.evasion import AdversarialPatchPyTorch

# HEART imports
from heart_library.estimators.classification.pytorch import JaticPyTorchClassifier
from heart_library.metrics import AccuracyPerturbationMetric
from heart_library.attacks.attack import JaticAttack
from heart_library.metrics import HeartAccuracyMetric

# torchvision imports
import torchvision
from torchvision import transforms

%matplotlib inline
```

## 2. Load Drone Data and Model for Classification

We load only a small part of the data to save compute in this demonstration. We then define the model and wrap it as a JATIC PyTorch classifier. The data can be replaced as desired by the user. We first define the six labels, select the subset used in this notebook (10 images), resize the images to a consistent 224 x 224 pixels, and then wrap everything as a modified dataset.
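The resizing and tensor conversion described above change both the memory layout and the value range of each image. The following is a rough NumPy-only sketch of that conversion, not the notebook's actual pipeline: the helper names `to_chw_float` and `to_hwc` are hypothetical, the resize step is omitted, and a random array stands in for an already upscaled drone image. It assumes the usual behaviour of torchvision's `ToTensor`, which maps an H x W x C image with values in [0, 255] to a C x H x W float array with values in [0, 1].

```python
import numpy as np


def to_chw_float(image_hwc_uint8: np.ndarray) -> np.ndarray:
    """Hypothetical helper: scale to [0, 1] and move channels first."""
    scaled = image_hwc_uint8.astype(np.float32) / 255.0
    return np.transpose(scaled, (2, 0, 1))  # H x W x C -> C x H x W


def to_hwc(image_chw: np.ndarray) -> np.ndarray:
    """Hypothetical helper: move channels last again, e.g. for plotting."""
    return np.transpose(image_chw, (1, 2, 0))  # C x H x W -> H x W x C


# Random stand-in for a 224 x 224 RGB image (assumption: no real data is loaded).
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

tensor_like = to_chw_float(fake_image)
restored = to_hwc(tensor_like)

print(tensor_like.shape, tensor_like.dtype)
print(restored.shape)
```

The channels-first layout is the reason the plotting code later in this guide calls `transpose(1,2,0)` on an adversarial image before passing it to `plt.imshow`.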
```python
classes = {
    0: 'Building',
    1: 'Construction Site',
    2: 'Engineering Vehicle',
    3: 'Fishing Vessel',
    4: 'Oil Tanker',
    5: 'Vehicle Lot'
}

# load only the first 10 test images to keep the demonstration small
data = load_dataset("CDAO/xview-subset-classification", split="test[0:10]")

# show one example; note that the title displays the dataset label, not a model prediction
idx = 3
plt.title(f"Prediction: {classes[data[idx]['label']]}")
plt.imshow(data[idx]['image'])

'''
Transform dataset
'''
IMAGE_H, IMAGE_W = 224, 224

# resize every image to 224 x 224 and convert it to a channels-first float tensor
preprocess = transforms.Compose([
    transforms.Resize((IMAGE_H, IMAGE_W)),
    transforms.ToTensor()
])

dataT = data.map(lambda x: {"image": preprocess(x["image"]), "label": x["label"]})
to_image = lambda x: transforms.ToPILImage()(torch.Tensor(x))

sample_data = torch.utils.data.Subset(dataT, range(10))
```

```text
Resolving data files: 0%| | 0/31 [00:00
```

[...]

```python
# NOTE: the class header and constructor were truncated in the source;
# they are reconstructed here from how the class is used below.
class TargetedImageDataset:
    def __init__(self, images):
        self.images = images

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, ind: int):
        image = np.asarray(self.images[ind]['image'])
        # every image gets the target label 4 ("Oil Tanker") and empty metadata
        return image, 4, {}

targeted_sample_data = TargetedImageDataset(sample_data)
```

The only other step is to add the `targeted` parameter and set it to `True`. This tells the PGD algorithm to minimize the loss between the model's prediction and the provided target label, in this case "Oil Tanker".

```python
# define attack
pgd_attack = ProjectedGradientDescentPyTorch(estimator=jptc, max_iter=100, eps=0.1, eps_step=0.1, targeted=True)

# wrap and run attack
jattack = JaticAttack(pgd_attack, norm=2)
x_adv, y, metadata = jattack(data=targeted_sample_data)

# plot adversarial counterpart of image above
pred_batch = jptc(x_adv)
plt.imshow(x_adv[1].transpose(1,2,0))
_ = plt.title(f'adversarial classification: {classes[np.argmax(pred_batch,axis=1)[1]]}')
plt.show()
```

```text
PGD - Batches: 0%| | 0/1 [00:00