Zhenyu Bai

ARTIC Fellow @ School of Computing, National University of Singapore

Moore’s Law is over; memory is the enemy.
Don’t repeat the VLIW mistake: compiler–hardware co-design is the key to making dataflow architectures the winner.

My current research focuses on dataflow architecture and compilers. We develop CGRA-style dataflow architectures and polyhedral-based compilation for systems with explicit data movement, distributed memories, and parallel compute units.

On the software side, we build compiler support for tile-based DSLs (e.g., Triton and Helion) targeting commercial dataflow platforms, including Tenstorrent, IBM AIU, AMD NPU/AIE, and classical NPU/TPU-like architectures. Our prototype end-to-end flow (Helion/Triton → MLIR → TT-Metal) on Tenstorrent Wormhole achieves performance comparable to vendor libraries on tensor kernels and fused AI operators.

selected publications

  1. ASPLOS26
    A data-driven dynamic execution orchestration architecture
    Zhenyu Bai, Pranav Dangi, Rohan Juneja, and 4 more authors
    In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, 2026
  2. arXiv25
    TL: Automatic End-to-End Compiler of Tile-Based Languages for Spatial Dataflow Architectures
    Wei Li, Zhenyu Bai, Heru Wang, and 6 more authors
    arXiv preprint arXiv:2512.22168, 2025
  3. DAC24
    SWAT: Scalable and efficient window attention-based transformers acceleration on FPGAs
    Zhenyu Bai, Pranav Dangi, Huize Li, and 1 more author
    In Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
  4. PACT24
    ZeD: A generalized accelerator for variably sparse matrix computations in ML
    Pranav Dangi, Zhenyu Bai, Rohan Juneja, and 2 more authors
    In Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 2024
  5. arXiv24
    Reconsidering the energy efficiency of spiking neural networks
    Zhanglu Yan, Zhenyu Bai, and Weng-Fai Wong
    arXiv preprint arXiv:2409.08290, 2024