Journal of Frontiers of Computer Science and Technology, 2021, Vol. 15, Issue (11): 2116-2126. DOI: 10.3778/j.issn.1673-9418.2011090

• Brain-Inspired Computing •

Computing In-Memory Design Based on Double Word Line and Double Threshold 4T SRAM

LIN Zhiting, NIU Jianchao, WU Xiulong, PENG Chunyu   

  1. School of Electronic Information Engineering, Anhui University, Hefei 230601, China
  • Online: 2021-11-01 Published: 2021-11-09

Abstract:

To address the memory wall of the von Neumann computing architecture, the computing in-memory (CIM) architecture embeds logic in the memory and completes operations while the data are being read, which gives the storage cells computing capability and reduces data transfer between the processor and the memory. To enable large-capacity, low-cost memory design, this paper proposes a storage system based on a double-word-line, double-threshold 4T SRAM (static random access memory) that supports not only data storage and reading but also BCAM (binary content addressable memory) operations and logic operations such as AND, NOR, and XOR. During a logic operation, the decoding circuit selects any two rows of stored data, all bit lines are pre-discharged to a low level, and the sense amplifiers at the bit-line ends compare the bit-line voltage with a reference voltage and output the result. During a BCAM operation, the external input data are decoded by the decoding circuit to switch the left and right pass transistors of the storage cells on or off, and the bit-line sense amplifiers output the match result through a NOR gate. The proposed circuit is built and simulated in a 65 nm CMOS process. Compared with a 6T memory cell, the 4T memory cell reduces the storage area by 25%. Compared with a single-word-line 4T memory structure, the double-word-line 4T memory structure saves about 47% of the read power consumption in very large scale integration (VLSI) applications. The maximum energy consumption of data matching during BCAM operation is 909.72 fJ, and the operation speed of an N-column array can reach 16161.6×N MB/Hz when the word line voltage is 600 mV.

Key words: computing in-memory (CIM), binary content addressable memory (BCAM), 4T SRAM
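
The logic and BCAM operations summarized in the abstract can be illustrated with a purely behavioral sketch. This is not the authors' circuit: the abstract does not say how bit-line voltages map to logic values, so the model below assumes that each selected cell storing a 1 raises the pre-discharged bit line by one step and that two reference levels separate the three possible outcomes. The function names `bitline_logic` and `bcam_match` are hypothetical, and the physical mapping of search bits to the double word lines is not modelled.

```python
from typing import Dict, List


def bitline_logic(row_a: List[int], row_b: List[int]) -> Dict[str, List[int]]:
    """Column-wise AND / NOR / XOR of two selected rows (behavioral model only)."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same number of columns")
    # Assumption: the bit line starts low and each stored 1 adds one step,
    # so every column ends at level 0, 1, or 2.
    levels = [a + b for a, b in zip(row_a, row_b)]
    return {
        "AND": [int(v == 2) for v in levels],  # above the higher reference
        "NOR": [int(v == 0) for v in levels],  # below the lower reference
        "XOR": [int(v == 1) for v in levels],  # between the two references
    }


def bcam_match(stored_words: List[List[int]], key: List[int]) -> List[int]:
    """Return one match flag per stored word: 1 if every bit equals the key."""
    flags = []
    for word in stored_words:
        if len(word) != len(key):
            raise ValueError("word and key must have the same width")
        # Per-bit mismatch signals; the match flag is the NOR of all of them.
        mismatches = [int(s != k) for s, k in zip(word, key)]
        flags.append(int(not any(mismatches)))
    return flags


if __name__ == "__main__":
    print(bitline_logic([1, 1, 0, 0], [1, 0, 1, 0]))
    print(bcam_match([[1, 0, 1], [0, 0, 1]], [1, 0, 1]))
```

In this model XOR is simply the case where neither the AND nor the NOR condition holds, which is one common way to derive it from two sense-amplifier outputs; whether the paper's circuit does the same is an assumption.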