CV | Zhenyu Bai

Contact Information

Name	Zhenyu Bai
Professional Title	ARTIC Fellow
Email	zhenyu.bai@nus.edu.sg

Experience

2023 -

Singapore
Research Fellow (ARTIC Fellow from 2026)

National University of Singapore, School of Computing

PI: Tulika Mitra
- Reconfigurable spatial-dataflow architecture and compiler design (collaborations with Tenstorrent & IBM).
- Hardware accelerators for sparse and quantized AI workloads.
- Compilers for Coarse-Grained Reconfigurable Arrays (CGRAs).
- Dataflow architecture and software co-design for Spiking Neural Networks.
- Heterogeneous FPGA-GPU systems for AI workloads (collaboration with AMD).
2019 - 2023

Toulouse, France
PhD student

IRIT, University of Toulouse
- CPU micro-architecture modeling and program performance analysis for real-time systems.
2019 - 2019

Grenoble, France
Research Intern

Verimag, Grenoble Alpes University
- CPU cache analysis and program analysis for real-time systems.
-

Toulouse, France
Teaching (Computer Architecture & Compilation)

University of Toulouse
- Computer Architecture and VHDL (≈80h)
- Computer Architecture and ARM assembly (≈50h)
- Compilation Theory (≈60h)
- Advanced Compilation (≈10h)
- Master's student project supervision (3 months/year)

Education

2019 - 2023

Toulouse, France
PhD

IRIT lab, University of Toulouse

Computer Science
- Scholarship funded by the French Ministry of Higher Education and Research (awarded to top Master's students).
2017 - 2019

Toulouse, France

Master

University of Toulouse

Embedded Computing Systems
2014 - 2017

Toulouse, France

BS

University of Toulouse

Computer Science

Publications

2026

A Data-Driven Dynamic Execution Orchestration Architecture

31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems
2024

SWAT: Scalable and efficient window attention-based transformers acceleration on FPGAs

Proceedings of the 61st ACM/IEEE Design Automation Conference
2024

Zed: A generalized accelerator for variably sparse matrix computations in ML

Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques
2025

TerEffic: Highly Efficient Ternary LLM Inference on FPGA

arXiv preprint arXiv:2502.16473
2025

Enhancing CGRA Efficiency Through Aligned Compute and Communication Provisioning

Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1
2024

SparrowSNN: A Hardware/software Co-design for Energy Efficient ECG Classification

arXiv preprint arXiv:2406.06543
2024

Reconsidering the energy efficiency of spiking neural networks

arXiv preprint arXiv:2409.08290
2025

Data-aware Dynamic Execution of Irregular Workloads on Heterogeneous Systems

arXiv preprint arXiv:2502.06304
2020

Improving the Performance of WCET Analysis in the Presence of Variable Latencies

The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)
2022

A Framework for Calculating WCET Based on Execution Decision Diagrams

ACM Transactions on Embedded Computing Systems
2023

Computing Execution Times With Execution Decision Diagrams in the Presence of Out-of-Order Resources

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
2019

PLRU cache analysis

Proceedings of the 13th Junior Researcher Workshop on Real-Time Computing (JRWRTC 2019)
2025

TL: Automatic End-to-End Compiler of Tile-Based Languages for Spatial Dataflow Architectures

arXiv preprint arXiv:2512.22168
2024

ASADI: Accelerating sparse attention using diagonal-based in-situ computing

2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA)
2023

Modélisation du comportement temporel du pipeline pour le calcul de WCET

PhD thesis, Université Paul Sabatier-Toulouse III
2021

Déterminer le WCET d’applications temps-réel en présence de latences d’exécution variables

Conférence francophone d’informatique en Parallélisme, Architecture et Système (COMPAS 2021)

Languages

Chinese : Native

French : Near-native

English : Fluent
