A Defense Model Against Adversarial Attacks Based on Generative Adversarial Networks
doi: 10.16383/j.aas.2020.c200033
1. School of Intelligent Systems Science and Engineering, Jinan University (Zhuhai Campus), Zhuhai 519070
2. College of Information Science and Technology, Jinan University, Guangzhou 510632
Abstract: Deep neural networks have achieved remarkable success in solving complex problems and are widely applied across many areas of daily life. Recent research has shown, however, that they are vulnerable to carefully crafted adversarial examples, which cause a model to output incorrect predictions; this poses a serious challenge to the security of deep learning systems. Adversarial attacks are a major obstacle that must be overcome as deep neural networks develop. Designing an efficient, strongly robust defense model that can withstand multiple adversarial attack algorithms is one promising direction for advancing adversarial defense, and it is of great significance to explore whether adversarial attacks themselves can be used to train a network classifier and thereby improve its robustness. This paper combines generative adversarial networks (GAN) with existing attack algorithms and proposes a GAN-based defense model against adversarial attacks, AC-DefGAN. Adversarial examples produced by attack algorithms serve as training samples for the GAN; conditional constraints are added to the network to stabilize training; and the classifier's decisions on the generator's samples guide the GAN training process. By letting the user specify which attack algorithms the classifier must defend against, adversarial examples are generated to train the discriminator, yielding a classifier that defends against multiple adversarial attacks. Experiments on the MNIST, CIFAR-10 and ImageNet datasets show that, once trained, AC-DefGAN classifies both original and adversarial examples correctly, defends well against various adversarial attack algorithms, and achieves better defense effectiveness and robustness than existing methods.
Keywords:
- Adversarial attack
- Generative adversarial network
- Adversarial training
- Defense model
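The abstract describes the AC-DefGAN training scheme only in prose, and this excerpt contains no implementation, so the PyTorch-style sketch below is a minimal, hypothetical illustration of how such a pipeline could be wired together. All names (`G`, `D`, `attacks`, `train_step`) and the unweighted loss sum are assumptions, not the authors' code; the discriminator follows the AC-GAN convention of returning a real/fake score together with class logits, and adversarial examples produced by user-chosen attack algorithms stand in for the real training batch.

```python
# Hypothetical sketch of one AC-DefGAN-style training step (not the authors' code).
# Assumed interfaces: G(z, y) -> image batch; D(x) -> (real/fake logit, class logits);
# each entry of `attacks` maps (model, x, y) -> an adversarial version of x.
import random
import torch
import torch.nn.functional as F

def train_step(G, D, x, y, attacks, opt_g, opt_d, z_dim=100):
    # 1) Replace the clean batch with adversarial examples from a randomly chosen
    #    attack, so the discriminator learns to classify attacked inputs correctly.
    x_adv = random.choice(attacks)(D, x, y).detach()

    # 2) Discriminator update: real/fake loss plus the auxiliary classification
    #    loss on the adversarial "real" samples.
    z = torch.randn(x.size(0), z_dim, device=x.device)
    x_fake = G(z, y).detach()
    src_real, cls_real = D(x_adv)
    src_fake, _ = D(x_fake)
    d_loss = (F.binary_cross_entropy_with_logits(src_real, torch.ones_like(src_real))
              + F.binary_cross_entropy_with_logits(src_fake, torch.zeros_like(src_fake))
              + F.cross_entropy(cls_real, y))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 3) Generator update: the label y acts as the conditional constraint, and the
    #    classifier head's judgment of G's samples guides the GAN training process.
    src_fake, cls_fake = D(G(z, y))
    g_loss = (F.binary_cross_entropy_with_logits(src_fake, torch.ones_like(src_fake))
              + F.cross_entropy(cls_fake, y))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

After training, only the discriminator's classification head would be needed at test time: it labels clean and adversarial inputs directly, which is how the tables below appear to evaluate the model.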
Fig. 8 Accuracy of the AC-DefGAN discriminator in identifying adversarial examples $x_{real}^{adv}$ on the CIFAR-10 dataset
Table 1 Misclassification rates of various adversarial examples for target models and AC-DefGAN on MNIST
| Attack (%) | VGG11 target | VGG11 AC-DefGAN | ResNet-18 target | ResNet-18 AC-DefGAN | DenseNet-40 target | DenseNet-40 AC-DefGAN | InceptionV3 target | InceptionV3 AC-DefGAN |
|---|---|---|---|---|---|---|---|---|
| FGSM | 95.31 | 0.66 | 87.83 | 0.58 | 79.32 | 0.56 | 92.91 | 0.61 |
| BIM | 95.70 | 0.78 | 88.51 | 0.69 | 82.01 | 0.64 | 93.12 | 0.73 |
| DeepFool | 96.24 | 1.42 | 89.74 | 1.13 | 88.61 | 1.10 | 93.80 | 1.25 |
| C&W | 99.37 | 1.79 | 97.52 | 1.71 | 96.21 | 1.68 | 98.93 | 1.75 |
| PGD | 98.13 | 1.61 | 95.81 | 1.52 | 93.26 | 1.37 | 97.15 | 1.58 |
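Every table in this section reports a single metric, the misclassification rate: the percentage of inputs, here adversarial examples, that a classifier assigns to a wrong label. A small helper of the following shape could compute it; the `attack(model, x, y)` interface is an assumption for illustration, not code from the paper.

```python
import torch

def misclassification_rate(model, loader, attack):
    """Percentage of adversarial examples that `model` labels incorrectly."""
    model.eval()
    wrong, total = 0, 0
    for x, y in loader:
        x_adv = attack(model, x, y)  # crafting needs input gradients, so no torch.no_grad() here
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        wrong += (pred != y).sum().item()
        total += y.numel()
    return 100.0 * wrong / total
```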
Table 2 Misclassification rates of AC-DefGAN on adversarial examples generated by combinations of attack algorithms on MNIST
| Attack combination (%) | VGG11 | ResNet-18 | DenseNet-40 | InceptionV3 |
|---|---|---|---|---|
| BIM + FGSM | 0.69 | 0.64 | 0.59 | 0.67 |
| BIM + DeepFool | 1.11 | 0.91 | 0.87 | 1.01 |
| FGSM + DeepFool | 1.05 | 0.86 | 0.81 | 0.93 |
| BIM + FGSM + DeepFool | 1.01 | 0.84 | 0.79 | 0.89 |
Table 3 Misclassification rates of various adversarial examples for AC-DefGAN and other defense strategies on MNIST
| Attack (%) | MagNet | Adv. training | APE-GANm | Defense-GAN-Rec | AC-DefGAN |
|---|---|---|---|---|---|
| FGSM | 80.91 | 18.40 | 2.80 | 1.11 | **0.66** |
| BIM | 83.09 | 19.21 | 2.91 | 1.24 | **0.78** |
| DeepFool | 89.93 | 23.16 | 2.43 | 1.53 | **1.42** |
| C&W | 93.18 | 62.23 | 1.74 | 2.29 | **1.67** |

Bold indicates the best value.

Table 4 Misclassification rates of adversarial examples generated by FGSM with different perturbation thresholds $\varepsilon$ for the target model and AC-DefGAN on MNIST (%)

| FGSM perturbation threshold | Target model | AC-DefGAN |
|---|---|---|
| $\varepsilon = 0.1$ | 96.29 | 0.68 |
| $\varepsilon = 0.2$ | 96.98 | 0.83 |
| $\varepsilon = 0.3$ | 97.35 | 0.91 |
| $\varepsilon = 0.4$ | 98.76 | 1.69 |
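Table 4 above (and Table 8 below for CIFAR-10) sweeps the FGSM perturbation threshold $\varepsilon$. FGSM takes a single step of size $\varepsilon$ in the direction of the sign of the loss gradient with respect to the input, $x^{adv} = x + \varepsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, y))$. A minimal sketch, assuming pixel values normalized to $[0, 1]$:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.1):
    """One-step FGSM: x_adv = x + eps * sign(grad_x loss). Assumes inputs in [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```

A larger $\varepsilon$ yields a stronger but more visible perturbation, consistent with the monotone rise of both columns as $\varepsilon$ grows from 0.1 to 0.4.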
Table 5 Misclassification rates of various adversarial examples for target models and AC-DefGAN on CIFAR-10
| Attack (%) | VGG19 target | VGG19 AC-DefGAN | ResNet-18 target | ResNet-18 AC-DefGAN | DenseNet-40 target | DenseNet-40 AC-DefGAN | InceptionV3 target | InceptionV3 AC-DefGAN |
|---|---|---|---|---|---|---|---|---|
| FGSM | 77.81 | 16.95 | 74.92 | 13.07 | 73.37 | 13.74 | 76.74 | 15.49 |
| BIM | 84.73 | 19.80 | 75.74 | 14.27 | 76.44 | 13.83 | 79.52 | 18.93 |
| DeepFool | 88.52 | 23.47 | 83.48 | 22.55 | 86.16 | 21.79 | 88.26 | 23.15 |
| C&W | 98.94 | 31.13 | 92.79 | 30.24 | 96.68 | 29.85 | 97.43 | 30.97 |
| PGD | 87.13 | 28.37 | 86.41 | 26.29 | 86.28 | 25.91 | 87.04 | 26.74 |
Table 6 Misclassification rates of AC-DefGAN on adversarial examples generated by combinations of attack algorithms on CIFAR-10
| Attack combination (%) | VGG19 | ResNet-18 | DenseNet-40 | InceptionV3 |
|---|---|---|---|---|
| BIM + FGSM | 19.62 | 13.73 | 13.18 | 16.45 |
| BIM + DeepFool | 21.71 | 18.65 | 17.42 | 22.14 |
| FGSM + DeepFool | 20.95 | 15.21 | 16.35 | 19.78 |
| BIM + FGSM + DeepFool | 21.37 | 17.56 | 16.93 | 20.81 |
Table 7 Misclassification rates of various adversarial examples for AC-DefGAN and other defense strategies on CIFAR-10
| Attack (%) | Target network F | Adv. training | APE-GANm | Defense-GAN-Rec | AC-DefGAN |
|---|---|---|---|---|---|
| FGSM | 82.83 | 32.68 | 26.41 | 22.50 | **16.91** |
| BIM | 89.75 | 39.49 | 24.33 | 21.72 | **19.83** |
| DeepFool | 93.54 | 44.71 | **25.29** | 28.09 | 25.56 |
| C&W | 98.71 | 78.23 | 30.50 | 32.21 | **30.24** |

Bold indicates the best value.

Table 8 Misclassification rates of adversarial examples generated by FGSM with different perturbation thresholds $\varepsilon$ for the target model and AC-DefGAN on CIFAR-10 (%)

| FGSM perturbation threshold | Target model | AC-DefGAN |
|---|---|---|
| $\varepsilon = 0.1$ | 77.82 | 12.92 |
| $\varepsilon = 0.2$ | 80.89 | 17.47 |
| $\varepsilon = 0.3$ | 82.33 | 18.86 |
| $\varepsilon = 0.4$ | 84.74 | 24.13 |
Table 9 Misclassification rates of various adversarial examples for target models and AC-DefGAN on ImageNet
| Attack (%) | VGG19 target | VGG19 AC-DefGAN | ResNet-18 target | ResNet-18 AC-DefGAN | DenseNet-40 target | DenseNet-40 AC-DefGAN | InceptionV3 target | InceptionV3 AC-DefGAN |
|---|---|---|---|---|---|---|---|---|
| FGSM | 71.21 | 39.42 | 69.14 | 38.52 | 68.42 | 37.92 | 69.65 | 38.77 |
| DeepFool | 88.45 | 44.80 | 85.73 | 42.96 | 86.24 | 43.17 | 87.67 | 44.63 |
| C&W | 97.39 | 39.13 | 96.19 | 36.75 | 95.84 | 36.74 | 96.43 | 38.68 |
Table 10 Misclassification rates of various adversarial examples for AC-DefGAN and other defense strategies on ImageNet (%)
| Attack (%) | Target model | APE-GANm | AC-DefGAN |
|---|---|---|---|
| FGSM | 72.92 | 40.14 | **38.94** |
| C&W | 97.84 | 38.70 | **36.52** |
| BIM | 76.79 | 41.28 | **40.78** |
| DeepFool | 94.71 | 45.93 | **44.31** |

Bold indicates the best value.