[1] HAN H S, HU X, HAO Y F, et al. Real-time robust video object detection system against physical-world adversarial attacks[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024, 43(1): 366-379.
[2] 吴瑞东, 刘冰, 付平, 等. 应用于极致边缘计算场景的卷积神经网络加速器架构设计[J]. 电子与信息学报, 2023, 45(6): 1933-1943.
WU R D, LIU B, FU P, et al. Convolutional neural network accelerator architecture design for ultimate edge computing scenario[J]. Journal of Electronics & Information Technology, 2023, 45(6): 1933-1943.
[3] MOONS B, BANKMAN D, VERHELST M. Embedded deep learning: algorithms, architectures and circuits for always-on neural network processing[M]. Berlin, Heidelberg: Springer, 2019: 55-111.
[4] 郭朝鹏, 王馨昕, 仲昭晋, 等. 能耗优化的神经网络轻量化方法研究进展[J]. 计算机学报, 2023, 46(1): 85-102.
GUO C P, WANG X X, ZHONG Z J, et al. Research advance on neural network lightweight for energy optimization[J]. Chinese Journal of Computers, 2023, 46(1): 85-102.
[5] JANG M, KIM J, NAM H, et al. Zero and narrow-width value-aware compression for quantized convolutional neural networks[J]. IEEE Transactions on Computers, 2024, 73(1): 249-262.
[6] FUJIWARA Y, KAWAHARA T. BNN training algorithm with ternary gradients and BNN based on MRAM array[C]//Proceedings of the 2023 IEEE Region 10 Conference. Piscataway: IEEE, 2023: 311-316.
[7] MAO W D, WANG M Q, XIE X R, et al. Hardware accelerator design for sparse DNN inference and training: a tutorial[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2024, 71(3): 1708-1714.
[8] ARUNACHALAM A, KUNDU S, RAHA A, et al. A novel low-power compression scheme for systolic array-based deep learning accelerators[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2023, 42(4): 1085-1098.
[9] CHEN T S, DU Z D, SUN N H, et al. DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning[J]. ACM SIGARCH Computer Architecture News, 2014, 42(1): 269-284.
[10] DU Z D, FASTHUBER R, CHEN T S, et al. ShiDianNao: shifting vision processing closer to the sensor[C]//Proceedings of the 42nd Annual International Symposium on Computer Architecture. New York: ACM, 2015: 92-104.
[11] 鲁蔚征, 张峰, 贺寅烜, 等. 华为昇腾神经网络加速器性能评测与优化[J]. 计算机学报, 2022, 45(8): 1618-1637.
LU W Z, ZHANG F, HE Y X, et al. Evaluation and optimization for Huawei ascend neural network accelerator[J]. Chinese Journal of Computers, 2022, 45(8): 1618-1637.
[12] JOUPPI N P, YOUNG C, PATIL N, et al. In-datacenter performance analysis of a tensor processing unit[C]//Proceedings of the 44th Annual International Symposium on Computer Architecture. New York: ACM, 2017: 1-12.
[13] ROSS J, THORSON G M. Rotating data for neural network computations: US9747548[P]. 2017-08-29.
[14] CHEN Y H, KRISHNA T, EMER J S, et al. Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks[J]. IEEE Journal of Solid-State Circuits, 2017, 52(1): 127-138.
[15] FAN H X, LIU S L, FERIANC M, et al. A real-time object detection accelerator with compressed SSDLite on FPGA[C]//Proceedings of the 2018 International Conference on Field-Programmable Technology. Piscataway: IEEE, 2018: 14-21.
[16] NGUYEN D T, NGUYEN T N, KIM H, et al. A high-throughput and power-efficient FPGA implementation of YOLO CNN for object detection[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2019, 27(8): 1861-1873.
[17] LI R D, WANG Y, LIANG F, et al. Fully quantized network for object detection[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 2805-2814.
[18] SHARMA H, PARK J, SUDA N, et al. Bit fusion: bit-level dynamically composable architecture for accelerating deep neural network[C]//Proceedings of the 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture. Piscataway: IEEE, 2018: 764-775.
[19] HAO Y F, ZHAO Y W, LIU C X, et al. Cambricon-P: a bitflow architecture for arbitrary precision computing[C]//Proceedings of the 2022 55th IEEE/ACM International Symposium on Microarchitecture. Piscataway: IEEE, 2022: 57-72.
[20] ZHOU S, WU Y, NI Z, et al. DoReFa-Net: training low bitwidth convolutional neural networks with low bitwidth gradients[EB/OL]. [2024-09-14]. https://arxiv.org/abs/1606.06160.
[21] LI M, ZHANG F, ZHANG C T. Branch convolution quantization for object detection[J]. Machine Intelligence Research, 2024, 21(6): 1192-1200.
[22] CHEN Y W, WANG R H, CHENG Y H, et al. SUN: dynamic hybrid-precision SRAM-based CIM accelerator with high macro utilization using structured pruning mixed-precision networks[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024, 43(7): 2163-2176.
[23] PHAM N S, SUH T. Optimization of microarchitecture and dataflow for sparse tensor CNN acceleration[J]. IEEE Access, 2023, 11: 108818-108832.
[24] KNAG P C, CHEN G K, SUMBUL H E, et al. A 617-TOPS/W all-digital binary neural network accelerator in 10-nm FinFET CMOS[J]. IEEE Journal of Solid-State Circuits, 2021, 56(4): 1082-1092.
[25] ISONO T, YAMAKURA M, SHIMAYA S, et al. A 12.1 TOPS/W mixed-precision quantized deep convolutional neural network accelerator for low power on edge/endpoint device[C]//Proceedings of the 2020 IEEE Asian Solid-State Circuits Conference. Piscataway: IEEE, 2020: 1-4.
[26] CHOI W H, CHIU P F, MA W, et al. An in-flash binary neural network accelerator with SLC NAND flash array[C]//Proceedings of the 2020 IEEE International Symposium on Circuits and Systems. Piscataway: IEEE, 2020: 1-5.
[27] DORRANCE R, DASALUKUNTE D, WANG H C, et al. Energy efficient BNN accelerator using CiM and a time-interleaved Hadamard digital GRNG in 22nm CMOS[C]//Proceedings of the 2022 IEEE Asian Solid-State Circuits Conference. Piscataway: IEEE, 2022: 2-4.
[28] BANKMAN D, YANG L T, MOONS B, et al. An always-on 3.8 μJ/86% CIFAR-10 mixed-signal binary CNN processor with all memory on chip in 28nm CMOS[C]//Proceedings of the 2018 IEEE International Solid-State Circuits Conference. Piscataway: IEEE, 2018: 222-224.