Zhenyu Bai

Moore’s Law is over; memory is the enemy.
Don’t repeat the VLIW mistake: compiler–hardware co-design is the key to making dataflow architectures the winner.

My current research focuses on dataflow architecture and compilers. We develop CGRA-style dataflow architectures and polyhedral-based compilation for systems with explicit data movement, distributed memories, and parallel compute units.

On the software side, we build compiler support for tile-based DSLs (e.g., Triton and Helion) targeting commercial dataflow platforms, including Tenstorrent, IBM AIU, AMD NPU/AIE, and classical NPU/TPU-like architectures. Our prototype end-to-end flow (Helion/Triton → MLIR → TT-Metal) on Tenstorrent Wormhole achieves performance comparable to that of vendor libraries on tensor kernels and fused AI operators.

selected publications

ASPLOS26

A data-driven dynamic execution orchestration architecture

Zhenyu Bai, Pranav Dangi, Rohan Juneja, and 4 more authors

In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, 2026

@inproceedings{bai2026data,
  title = {A data-driven dynamic execution orchestration architecture},
  author = {Bai, Zhenyu and Dangi, Pranav and Juneja, Rohan and Li, Zhaoying and Yan, Zhanglu and Lan, Huiying and Mitra, Tulika},
  booktitle = {Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1},
  pages = {1--19},
  year = {2026},
}

arXiv25

TL: Automatic End-to-End Compiler of Tile-Based Languages for Spatial Dataflow Architectures

Wei Li, Zhenyu Bai, Heru Wang, and 6 more authors

arXiv preprint arXiv:2512.22168, 2025

@article{li2025tl,
  title = {TL: Automatic End-to-End Compiler of Tile-Based Languages for Spatial Dataflow Architectures},
  author = {Li, Wei and Bai, Zhenyu and Wang, Heru and Dangi, Pranav and Zhang, Zhiqiang and Tan, Cheng and Lan, Huiying and Wong, Weng-Fai and Mitra, Tulika},
  journal = {arXiv preprint arXiv:2512.22168},
  year = {2025},
}

DAC24

Swat: Scalable and efficient window attention-based transformers acceleration on FPGAs

Zhenyu Bai, Pranav Dangi, Huize Li, and 1 more author

In Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

@inproceedings{bai2024swat,
  title = {Swat: Scalable and efficient window attention-based transformers acceleration on {FPGAs}},
  author = {Bai, Zhenyu and Dangi, Pranav and Li, Huize and Mitra, Tulika},
  booktitle = {Proceedings of the 61st ACM/IEEE Design Automation Conference},
  pages = {1--6},
  year = {2024},
}

PACT24

Zed: A generalized accelerator for variably sparse matrix computations in ML

Pranav Dangi, Zhenyu Bai, Rohan Juneja, and 2 more authors

In Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, 2024

@inproceedings{dangi2024zed,
  title = {Zed: A generalized accelerator for variably sparse matrix computations in {ML}},
  author = {Dangi, Pranav and Bai, Zhenyu and Juneja, Rohan and Wijerathne, Dhananjaya and Mitra, Tulika},
  booktitle = {Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques},
  pages = {246--257},
  year = {2024},
}

arXiv24

Reconsidering the energy efficiency of spiking neural networks

Zhanglu Yan, Zhenyu Bai, and Weng-Fai Wong

arXiv preprint arXiv:2409.08290, 2024

@article{yan2024reconsidering,
  title = {Reconsidering the energy efficiency of spiking neural networks},
  author = {Yan, Zhanglu and Bai, Zhenyu and Wong, Weng-Fai},
  journal = {arXiv preprint arXiv:2409.08290},
  year = {2024},
}