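A Low-Power Network-on-Chip Power-Gating Design with Bypass Mechanism

OUYANG Yiming; CHEN Zhiyuan; XU Dongyu; LIANG Huaguo

doi: 10.11999/JEIT231257 cstr: 32379.14.JEIT231257

School of Computing and Information, Hefei University of Technology, Hefei 230601, China

Funds: The National Natural Science Foundation of China (62374049)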

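Author biographies:
OUYANG Yiming: male, professor; research interests include networks-on-chip and systems-on-chip, synthesis and testing of embedded systems, and digital system design automation

CHEN Zhiyuan: male, master's student; research interest: power gating for networks-on-chip

XU Dongyu: male, PhD student; research interest: reconfigurable techniques for networks-on-chip

LIANG Huaguo: male, professor; research interests include fault-tolerant computing and hardware security, synthesis and testing of embedded systems, and intelligent control systems

Corresponding author:
CHEN Zhiyuan, czy20221002@163.com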

CLC number: TP302
Publication history
- Received: 2023-11-14
- Revised: 2024-04-23
- Published online: 2024-05-13
- Issue date: 2024-08-30

Abstract: As technology nodes shrink, static power consumption comes to dominate the power overhead of Network-on-Chip (NoC). Power gating, a general-purpose power-saving technique, turns off idle modules in the NoC to reduce static power. However, conventional power gating introduces problems such as packet wake-up latency and break-even time. To address these problems, this paper proposes the Partition Bypass Transmission Infrastructure (PBTI), which transmits packets in place of power-gated routers, and designs a low-latency, low-power power-gating scheme based on this bypass mechanism. PBTI uses mutually independent bypasses to handle packets travelling in the east and west directions separately, and shares a common buffer within each bypass to improve buffer utilization. PBTI can inject, transmit, and eject packets while routers are powered off, so packets can travel from the source node to the destination node even when every router in the network is power gated. When traffic exceeds the transmission capacity of PBTI, routers are woken up uniformly in units of whole columns. Experimental results show that, compared with an NoC without power gating, the proposed scheme reduces static power consumption by 83.4% and packet latency by 17.2%, at an additional area overhead of only 6.2%. Compared with conventional power-gating schemes, the proposed design achieves lower power consumption and lower latency.
- Network-on-Chip /
- Power gating /
- Bypass /
- Static power

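Figure 1 Effect of buffer depth on packet latency and saturation point

Figure 2 PBTI bypass design

Figure 3 PBTI packet transmission network

Figure 4 Three packet transmission cases

Figure 5 Bypass control mechanism

Figure 6 Network interface (NI) design

Figure 7 Router power-gating hardware

Figure 8 Average packet latency under different traffic patterns

Figure 9 Average packet latency under real applications

Figure 10 Normalized static power under different traffic patterns

Figure 11 Static power and total power savings

Figure 12 Normalized static power under real applications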

Algorithm 1 Buffer-balanced routing algorithm

Input: destination address of the packet D, buffer-available signals from neighboring disconnected routers Available, address of the local router R
Output: the packet routing port Direction
Begin
IF ((Available.E==0 || Available.W==0) && (Available.N==1) && (R.y<D.y)) THEN
    // using YX routing algorithm
    Direction=North;
ELSE IF ((Available.E==0 || Available.W==0) && (Available.S==1) && (R.y>D.y)) THEN
    // using YX routing algorithm
    Direction=South;
ELSE
    // using XY routing algorithm
    IF (R.x<D.x) THEN Direction=East;
    ELSE IF (R.x>D.x) THEN Direction=West;
    ELSE IF (R.y<D.y) THEN Direction=North;
    ELSE IF (R.y>D.y) THEN Direction=South;
    ELSE Direction=Local;
    END IF
END IF
End
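
The routing decision in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the names `Coord`, `Available`, and `route` are hypothetical, and the buffer-available signals are modelled as plain booleans. As in the pseudocode, a larger `y` is taken to mean further north.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coord:
    """Router address in the mesh (hypothetical helper type)."""
    x: int
    y: int


@dataclass(frozen=True)
class Available:
    """Buffer-available signals from the four neighbours (True = free space)."""
    E: bool
    W: bool
    N: bool
    S: bool


def route(R: Coord, D: Coord, avail: Available) -> str:
    """Return the output port for a packet at router R heading to D."""
    # An east or west neighbour has no free buffer space.
    east_west_blocked = (not avail.E) or (not avail.W)

    # Fall back to YX order: move vertically first if that makes progress.
    if east_west_blocked and avail.N and R.y < D.y:
        return "North"
    if east_west_blocked and avail.S and R.y > D.y:
        return "South"

    # Default XY order: resolve the x offset first, then the y offset.
    if R.x < D.x:
        return "East"
    if R.x > D.x:
        return "West"
    if R.y < D.y:
        return "North"
    if R.y > D.y:
        return "South"
    return "Local"
```

For example, `route(Coord(0, 0), Coord(2, 2), Available(False, True, True, True))` selects the north port because the east buffer is full, whereas the same call with all buffers free selects the east port.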

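Table 1 Basic experimental parameter settings

Parameter	Setting
Network topology	8×8 Mesh
Buffer size per port	8 flits
Virtual channels per port	2
Packet size	2 to 6 flits
Routing algorithm	XY, buffer-balanced routing algorithm
Link width	32 bits
Router frequency	1 GHz
Traffic patterns	Uniform random, transpose, shuffle
Router wake-up latency	8 cycles
Break-even time	10 cycles
Router power-off wait time	4 cycles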
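
To make the three power-gating parameters in Table 1 concrete, the sketch below checks whether gating a router during a given idle period would recover its switching overhead. It rests on assumptions not stated in the source: the router is taken to power off only after staying idle for the full wait time, and the break-even time is read in its usual sense as the minimum number of gated cycles needed to offset the energy cost of switching off and on. The names `GatingParams`, `gated_cycles`, and `gating_pays_off` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GatingParams:
    """Power-gating parameters, defaults taken from Table 1 (in cycles)."""
    wakeup_latency: int = 8    # delay before a gated router can forward again
    break_even_time: int = 10  # gated cycles needed to offset switching energy
    idle_wait: int = 4         # idle cycles observed before powering off


def gated_cycles(idle_period: int, p: GatingParams) -> int:
    """Cycles actually spent powered off during an idle period (assumed model)."""
    return max(0, idle_period - p.idle_wait)


def gating_pays_off(idle_period: int, p: GatingParams) -> bool:
    """True if the gated interval reaches the break-even time."""
    return gated_cycles(idle_period, p) >= p.break_even_time
```

Under these assumptions an idle period of 13 cycles gives only 9 gated cycles and does not break even, while 14 idle cycles give 10 gated cycles and do.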

