Agenda
8:30 am
Opening Remarks
8:35 am
HPC on a Reconfigurable Substrate with Machine Learning Support
[slides]
Lizy Kurian John, UT Austin
Abstract:
Static RAM FPGAs, with their reconfigurability, yield options to accomplish instruction set metamorphism or dynamic creation of accelerators/coprocessors as needed. In addition, the abundance of matrix multiplications in many HPC problems makes it possible to utilize Machine Learning (ML) support on FPGAs to achieve customized dynamic reconfiguration. Many HPC problems can be solved by Processing-in-Memory, and hence BlockRAMs enhanced with computing can be utilized to accelerate HPC applications. In this talk, I will describe some emerging avenues for reconfigurable HPC considering ML support in FPGAs.
Bio:
Lizy Kurian John is the Truchard Foundation Chair in Engineering at the University of Texas at Austin. She received her Ph.D. in Computer Engineering from the Pennsylvania State University. Her research interests include workload characterization, performance evaluation, memory systems, reconfigurable architectures, and high performance architectures for emerging workloads. She is the recipient of many awards, including the Joe J. King Professional Engineering Achievement Award (2023), the Pennsylvania State University Outstanding Engineering Alumnus Award (2011), the NSF CAREER award, the UT Austin Engineering Foundation Faculty Award, the Halliburton, Brown and Root Engineering Foundation Young Faculty Award, and the University of Texas Alumni Association (Texas Exes) Teaching Award. She has coauthored books on Digital Systems Design using VHDL (Cengage Publishers, 2007, 2017) and a book on Digital Systems Design using Verilog (Cengage Publishers, 2014), and has edited 4 books, including a book on Computer Performance Evaluation and Benchmarking. She holds 18 US patents and is an IEEE Fellow (Class of 2009), ACM Fellow, AAAS Fellow, and Fellow of the National Academy of Inventors (NAI).
9:20 am
Lowering the Barriers to Programming FPGAs and AIEs for HPC
[slides]
Nick Brown, University of Edinburgh
Abstract:
We have seen great advances in the hardware and software ecosystem for FPGAs in recent years, with the release of ever more powerful FPGAs with specialised hardened components such as AI Engines for accelerating compute, and large investment put into tooling, high level synthesis and libraries. However, there is still a disconnect between HPC developers, many of whom still write their codes in Fortran, and the ability to run codes effectively on these architectures, which currently requires a redevelopment of codes that involves significant time and expertise. In this talk I will describe our work leveraging MLIR to identify and seamlessly offload key computational components of a programmer's code to FPGAs and AIEs based upon the underlying algorithmic pattern. With the objective of requiring no code-level modifications to be made by the programmer, our approach connects frontends such as Flang to AMD's MLIR-AIE dialects and HLS LLVM backend to deliver optimised execution on FPGAs and AIEs.
Bio:
Dr Nick Brown is a Senior Research Fellow at EPCC, the University of Edinburgh. His main interest is in the role that novel hardware can play in future supercomputers, and he is specifically motivated by the grand challenge of how we can ensure scientific programmers are able to effectively exploit such technologies without extensive hardware/architecture expertise. Combining novel algorithmic techniques for new hardware, programming language & library design, and compilers, he has over 80 peer reviewed publications and has worked on a number of large scale parallel codes. He is chair of the RISC-V HPC SIG, leads EPCC's RISC-V testbed, and has organised and led the RISC-V and UrgentHPC series of workshops at SC and ISC conferences. He also coordinates knowledge exchange for the UK's ExCALIBUR exascale programme. He has moderated numerous panels previously, for instance organising and chairing an invited panel at ISC19.
9:40 am
What Should be Used for Reconfigurable HPC, FPGA or Coarser-Grain Reconfigurable Architecture?
[slides]
Kentaro Sano, RIKEN
Abstract:
At the Processor Research Team in the RIKEN Center for Computational Science (R-CCS), we have been researching reconfigurable HPC with FPGAs and custom reconfigurable architectures such as the coarse-grained reconfigurable array (CGRA) for general-purpose HPC and/or domain-specific computing. Although FPGAs allow us to have localized data movement and lower power consumption through dataflow computing, they also have overhead in area and frequency as well as long compilation times for place-and-route. In this talk, we introduce our previous research on an FPGA-based reconfigurable HPC system and share lessons learned from it, and then present the RIKEN CGRA project for HPC and AI, with architectural exploration for more efficient reconfigurable computing.
Bio:
Kentaro Sano has been the leader of the processor research team and the advanced AI device development unit at the RIKEN Center for Computational Science (R-CCS) since 2017, responsible for research and development of future processors and systems for HPC and AI. He is also a visiting professor with an advanced computing system laboratory at Tohoku University. He received his Ph.D. from the graduate school of information sciences, Tohoku University, in 2000. From 2000 until 2018, he was a Research Associate and an Associate Professor at Tohoku University. He was a visiting researcher at the Department of Computing, Imperial College, London, and Maxeler Technology corporation in 2006 and 2007. He currently leads the architecture research group in the feasibility study project for the next-generation supercomputer development in Japan. His research interests include data-driven and spatial-parallel processor architectures such as a coarse-grain reconfigurable array (CGRA), FPGA-based high-performance reconfigurable computing, high-level synthesis compilers and tools for reconfigurable custom computing machines, and system architectures for next-generation supercomputing based on the data-flow computing model.
10:00 am
Break
10:30 am
"ProTEA: Programmable Transformer Encoder Acceleration on FPGA"
[slides]
[paper]
Ehsan Kabir, Jason D. Bakos, David Andrews, and Miaoqing Huang
11:00 am
"DeLiBA-K: Speeding-up Hardware-Accelerated Distributed Storage Access by Tighter Linux Kernel Integration and Use of Modern API"
[slides]
[paper]
Babar Khan
Abstract:
This talk presents an open-source Linux block I/O storage framework called "DeLiBA-K" that leverages FPGAs to accelerate block storage operations in data centers. Notably, DeLiBA-K is the first storage research framework to integrate Linux’s new asynchronous I/O interface, "io_uring", into an FPGA-based framework. Using the open-source distributed storage system Ceph as a use case, DeLiBA-K delivers better speed-ups than contemporary Ceph I/O hardware accelerators. These speed-ups have been rigorously tested and validated in an industrial environment with real-world workloads.
Bio:
Babar Khan is a PhD candidate in the Embedded Systems and Applications Group at the Technical University of Darmstadt, Germany. The group is headed by Prof. Andreas Koch. Babar's research is focused on accelerating distributed storage with FPGAs. Currently, he is also working at a stealth-mode startup, with a focus on leveraging AI chips for large language models (LLMs).
11:30 am
"Developing a BLAS library for the AMD AI Engine"
[slides]
[paper]
Tristan Laan and Tiziano De Matteis
Abstract:
Spatial (dataflow) computer architectures can mitigate the control and performance overhead of classical von Neumann architectures such as traditional CPUs. Driven by the popularity of Machine Learning (ML) workloads, spatial devices are being marketed as ML inference accelerators. Although these devices provide a rich software ecosystem for ML practitioners, their adoption in other scientific domains is hindered by the steep learning curve and lack of reusable software, which makes them inaccessible to non-experts. We present our ongoing project AIEBLAS, an open-source, expandable implementation of the Basic Linear Algebra Subprograms (BLAS) for the AMD AI Engine. Numerical routines are designed to be easily reused, customized, and composed in dataflow programs, leveraging the characteristics of the targeted device without requiring the user to deeply understand the underlying hardware and programming model.
Bio:
Tiziano De Matteis received his Ph.D. from the University of Pisa. He is currently a tenure-track Assistant Professor at the Computer Science Department of Vrije Universiteit Amsterdam. His principal research interests are in Parallel and Distributed Computing, with a particular focus on systems and programming models for spatial and reconfigurable computing for HPC, post-Moore architectures, and sustainability in modern computing systems.
11:45 am
AMD University Program Overview
[slides]
Andrew Schmidt, AMD University Program
12:00 pm
Closing Remarks