Defect Detection of Metal Surface Based on Attention Cascade R-CNN

doi:10.3778/j.issn.1673-9418.2007005

Abstract

Automatic detection of metal surface defects is an important part of quality control in industrial production. In complex industrial scenarios, traditional image processing methods cannot detect defect areas effectively, and manual inspection is time-consuming and labor-intensive. Detecting metal surface defects quickly and effectively has therefore become key to improving production efficiency. However, complex lighting conditions produce strong reflections and mirror images on metal surfaces, and defects are varied and have unclear boundaries, which poses a great challenge to defect detection. This paper proposes a novel cascade R-CNN (region-based convolutional neural network) defect detection method based on an attention mechanism to classify and locate metal surface defects with high quality. A lightweight network module is designed to compute attention along two separate dimensions, spatial and channel; it can be inserted into a convolutional neural network to improve its feature extraction ability. To improve detection accuracy, two detection heads trained with increasing IoU (intersection over union) thresholds are cascaded, and the output of each head is used as the input to the next, refining the detection results in turn. In addition, the factors affecting performance are explored in extensive experiments. Compared with existing methods, the proposed method achieves higher accuracy and good robustness, and can be applied in practical production.
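
The role of increasing IoU thresholds in a cascade can be illustrated with a minimal sketch. It is not the authors' implementation: the thresholds (0.5 and 0.6), the box coordinates, and the helper names `iou` and `label_for_stage` are assumptions chosen only for illustration. The sketch shows that a loosely localized proposal counts as a positive training sample for the first head but not for the stricter second head, while a refined proposal qualifies for both.

```python
# Illustrative sketch only. Thresholds and boxes are assumed values,
# not taken from the paper.

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_for_stage(proposal, ground_truth, iou_threshold):
    """Return True if the proposal is a positive sample at this threshold."""
    return iou(proposal, ground_truth) >= iou_threshold


# Hypothetical, increasing thresholds for the two cascaded heads.
STAGE_THRESHOLDS = (0.5, 0.6)

ground_truth = (10.0, 10.0, 50.0, 50.0)
loose_proposal = (22.0, 10.0, 62.0, 50.0)    # rough localization
refined_proposal = (12.0, 10.0, 52.0, 50.0)  # after a regression step

for name, box in (("loose", loose_proposal), ("refined", refined_proposal)):
    labels = [label_for_stage(box, ground_truth, t) for t in STAGE_THRESHOLDS]
    print(name, round(iou(box, ground_truth), 3), labels)
```

In a cascade, the first head's regressed boxes, which are typically closer to the ground truth, become the proposals for the second head, so the stricter threshold still leaves enough positive samples for training.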

Key words: defect detection, object detection, attention mechanism, deep learning, convolutional neural network (CNN)

FANG Junting, TAN Xiaoyang. Defect Detection of Metal Surface Based on Attention Cascade R-CNN[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(7): 1245-1254.