



FACULTY OF ENGINEERING

# Integrating network-attached FPGAs into the cloud using partial reconfiguration

Burkhard Ringlein<sup>1,2</sup>, Francois Abel<sup>2</sup>, <u>Alexander Ditter<sup>1</sup></u>, Christoph Hagleitner<sup>2</sup>, Dietmar Fey<sup>1</sup>

#### Fourth International Workshop on Heterogeneous High-performance Reconfigurable Computing (H<sup>2</sup>RC'18)

Sunday, November 11, 2018 Dallas, TX

<sup>1</sup> Chair of Computer Architecture, Erlangen, Germany
<sup>2</sup> IBM Research Zurich, Rüschlikon, Switzerland





# Agenda

1. Background and Motivation
2. Cloud Deployment
3. Management Framework
4. Partial Reconfiguration via Network
5. Conclusion and Outlook



# cloudFPGA @ ZRL





### **Host- vs Network-Attached FPGAs**







# **Cloud Deployment**

#### Goals

- Easy-to-use cloud service
- Fast and scalable deployment
- Protection of user-specific IP
- Enabling heterogeneous architectures

#### Constraints

- One communication interface only
- Protection of and separation from DC network
- Standalone FPGA configuration (without any CPU)
- Integration into cloud middleware (OpenStack)



# **Cloud Deployment Schematic**





# **Protection of Intellectual Property**

#### Transfer of user code onto FPGAs

Keeping algorithms and business secrets protected as users

- may not wish to expose algorithmic details of their codes,
- may not be allowed to upload source code due to corporate policies,
- might want to be able to deploy pre-compiled bitstreams.
- The "off-site compilation" of an FPGA design for deployment would take hours every time!
- → This would be *neither fast nor scalable*

➔ Architecture must be able to deploy and run applications based on their bitstreams

➔ Applications must either be independent of the final FPGA or know the target at compile time



# Privilege Separation within the FPGA

# **Q:** How to deploy user-synthesized designs?

# **A:** Partial Reconfiguration (PR)

- allows a design to keep its internal states, while parts are updated
- makes sure that only the user logic can be modified
- creates a clear separation between management logic, application logic, and the network
- enables the potential for multitenancy and "virtualization"
- leads to overall faster reconfiguration





# Management of Disaggregated FPGAs

#### Tasks for standalone network-attached FPGAs

- Acquisition of the necessary number of FPGAs (of requested type)
- Powering FPGAs on/off (before and after usage)
- Deployment of application binary on all devices
- Set-up network for user specific application

#### **Consider different levels for above tasks**

- **DC level:** acquire (& release) FPGAs and IP addresses; store bitstreams
- **Rack level:** power FPGAs on/off; program IP addresses
- **FPGA level:** configure application bitstream; set up network routing and provide node ID
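The three levels above can be captured in a small data structure. The following is a minimal sketch: the `Level` enum, the `TASKS` mapping and the top-down ordering (DC, then rack, then FPGA) are illustrative assumptions and not part of the cloudFPGA framework.

```python
from enum import Enum


class Level(Enum):
    """Management levels named on the slide."""
    DC = "DC level"
    RACK = "Rack level"
    FPGA = "FPGA level"


# Tasks per level, copied from the slide text.
TASKS = {
    Level.DC: ["acquire FPGAs and IP addresses", "store bitstreams"],
    Level.RACK: ["power FPGAs on/off", "program IP addresses"],
    Level.FPGA: ["configure application bitstream",
                 "set up network routing and provide node ID"],
}


def deployment_steps():
    """Flatten the tasks top-down (assumed order: DC, rack, FPGA)."""
    order = (Level.DC, Level.RACK, Level.FPGA)
    return [(level.value, task) for level in order for task in TASKS[level]]


if __name__ == "__main__":
    for level_name, task in deployment_steps():
        print(f"{level_name}: {task}")
```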



### **Management Framework**





#### cloudFPGA Resource Manager API

| Group     | Operation                      | Description                                   |
|-----------|--------------------------------|-----------------------------------------------|
| Clusters  | GET /clusters                  | Get all user clusters                         |
| Clusters  | POST /clusters                 | Request a cluster                             |
| Clusters  | DELETE /clusters/{cluster_id}  | Delete a cluster                              |
| Clusters  | GET /clusters/{cluster_id}     | Get a cluster                                 |
| Images    | GET /images                    | Get all user images                           |
| Images    | POST /images                   | Upload an image                               |
| Images    | DELETE /images/{image_id}      | Delete an image                               |
| Images    | GET /images/{image_id}         | Get an image                                  |
| Instances | GET /instances                 | Get all instances                             |
| Instances | POST /instances                | Create an instance                            |
| Instances | DELETE /instances/{instance_id} | Remove an instance                           |
| Instances | GET /instances/{instance_id}   | Get a single instance                         |
| Resources | GET /resources                 | Get all cloudFPGA resources                   |
| Resources | POST /resources                | Create a cloudFPGA resource                   |
| Resources | GET /resources/status/{status} | Get all cloudFPGA resources in state {status} |
| Resources | DELETE /resource_id}           | Remove a resource                             |
| Resources | GET /resource_id}              | Get details of one resource                   |
| Resources | PUT /resource_id}              | Update a resource                             |
| Resources | GET /resource_id}/status/      | Get status of one resource                    |
| Resources | PUT /resource_id}/status/      | Update status of a resource                   |

[ BASE URL: / , API VERSION: 0.2 ]
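A client for the Resource Manager operations listed above might be sketched as follows. This is only an illustration under stated assumptions: the host name in `BASE_URL`, the JSON request bodies and their field names (`image_id`, `node_count`) are hypothetical, since the slide shows only methods, paths and descriptions. The requests are built but not sent.

```python
import json
import urllib.request

# Hypothetical host; the slide only states "BASE URL: /".
BASE_URL = "http://resource-manager.example"


def build_request(method, path, payload=None):
    """Build (but do not send) an HTTP request for one API operation."""
    data = None
    headers = {}
    if payload is not None:
        # Assumption: request bodies are JSON; the slide does not show a schema.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


def request_cluster(image_id, node_count):
    """POST /clusters: request a cluster (field names are hypothetical)."""
    return build_request("POST", "/clusters",
                         {"image_id": image_id, "node_count": node_count})


def get_cluster(cluster_id):
    """GET /clusters/{cluster_id}: get a cluster."""
    return build_request("GET", f"/clusters/{cluster_id}")


def resources_in_state(status):
    """GET /resources/status/{status}: list resources in a given state."""
    return build_request("GET", f"/resources/status/{status}")


if __name__ == "__main__":
    for req in (request_cluster("img-1", 4), get_cluster("42"),
                resources_in_state("AVAILABLE")):
        print(req.get_method(), req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would send it; that step is left out because no reachable endpoint is assumed here.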



| Group      | Operation                       | Description                                       |
|------------|---------------------------------|---------------------------------------------------|
| FPGA calls | POST /default_gateway/{slot_id} | Program the IP address of the default gateway     |
| FPGA calls | POST /ip_address/{slot_id}      | Program the IP address of the FPGA                |
| FPGA calls | PATCH /reset/{slot_id}/soft     | Triggers the FPGA application reset               |
| FPGA calls | POST /subnet_mask/{slot_id}     | Program the subnetwork mask                       |
| PSoC calls | POST /flash/{slot_id}           | Program the flash of a slot                       |
| PSoC calls | GET /power                      | Get the system status of all FPGAs                |
| PSoC calls | GET /power/{slot_id}            | Get the system status of the FPGA                 |
| PSoC calls | PUT /power/{slot_id}            | Powers the FPGA on or off                         |
| PSoC calls | POST /program/{slot_id}         | Program the FPGA                                  |
| PSoC calls | PATCH /reset/{slot_id}          | Triggers the FPGA reset                           |
| PSoC calls | PATCH /restart_app/{slot_id}/   | Restart the current application state in the FPGA |
| PSoC calls | GET /temperature/{slot_id}      | Get the die temperature of the FPGA               |
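The rack-level calls above could be combined into a bring-up sequence for one slot. The sketch below is an assumption-laden illustration: the call order, the helper name `network_calls` and the way values are attached to each call are hypothetical, and the consistency check (gateway inside the subnet) is an added safeguard rather than documented behaviour of the service.

```python
import ipaddress


def network_calls(slot_id, ip, mask, gateway):
    """Return ordered (method, path, value) tuples to power on one FPGA slot
    and program its network settings.

    The order and the value encoding are assumptions; the slide lists only
    the operations themselves.
    """
    # Sanity check before touching hardware: the gateway should be
    # reachable within the subnet defined by the IP address and mask.
    net = ipaddress.ip_network(f"{ip}/{mask}", strict=False)
    if ipaddress.ip_address(gateway) not in net:
        raise ValueError(f"gateway {gateway} is outside subnet {net}")

    return [
        ("PUT", f"/power/{slot_id}", "on"),            # power the FPGA on
        ("POST", f"/ip_address/{slot_id}", ip),        # program the IP address
        ("POST", f"/subnet_mask/{slot_id}", mask),     # program the subnet mask
        ("POST", f"/default_gateway/{slot_id}", gateway),
        ("GET", f"/power/{slot_id}", None),            # read back system status
    ]


if __name__ == "__main__":
    for call in network_calls(3, "10.0.0.5", "255.255.255.0", "10.0.0.1"):
        print(call)
```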



### **Partial Reconfiguration via Network**



#### → RESTful management core inside FPGA for distributed management



# FPGA Management Core API (1/2)

| Management operation | Description                          |
|----------------------|--------------------------------------|
| POST /configure      | Uploads partial bitfile to configure |
|                      | Get the current status               |

Parameters of POST /configure

| Parameter      | Value      | Description                      | Parameter Type | Data Type |
|----------------|------------|----------------------------------|----------------|-----------|
| authentication | (required) | Authentication                   | formData       | string    |
| bit_file       |            | partial bitfile to be programmed | formData       | file      |

Response Messages of POST /configure

| HTTP Status Code | Reason                              |
|------------------|-------------------------------------|
| 200              | OK, payload configured              |
| 400              | Bad request                         |
| 403              | Unauthorized                        |
| 500              | Internal error during configuration |
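The `POST /configure` operation above takes two `formData` fields, `authentication` and `bit_file`. A request body for it could be assembled roughly as follows. This is a sketch only: the function names, the default file name `role.bit` and the `application/octet-stream` part type are assumptions, while the field names and the status reasons are taken from the slide.

```python
import uuid


def encode_configure_form(authentication, bitfile_bytes, filename="role.bit"):
    """Encode the two form fields of POST /configure as multipart/form-data.

    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="authentication"',
        b"",
        authentication.encode("utf-8"),
        delimiter,
        ('Content-Disposition: form-data; name="bit_file"; '
         f'filename="{filename}"').encode("utf-8"),
        b"Content-Type: application/octet-stream",  # assumed part type
        b"",
        bitfile_bytes,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    body = b"\r\n".join(parts)
    return f"multipart/form-data; boundary={boundary}", body


# Status reasons as listed for POST /configure.
CONFIGURE_RESPONSES = {
    200: "OK, payload configured",
    400: "Bad request",
    403: "Unauthorized",
    500: "Internal error during configuration",
}


def describe_response(status):
    """Map a status code to the documented reason, if any."""
    return CONFIGURE_RESPONSES.get(status, "Unexpected status")


if __name__ == "__main__":
    content_type, body = encode_configure_form("secret", b"\x00\x01\x02")
    print(content_type, len(body), describe_response(200))
```

Building the body by hand keeps the sketch free of third-party dependencies; a real client would more likely rely on an HTTP library's multipart support.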



# FPGA Management Core API (2/2)

#### **FPGA Management Core API**

| Runtime operation | Description                                          |
|-------------------|------------------------------------------------------|
| PUT /rank/{id}    | Set the rank (node ID) of the FPGA                   |
| POST /routing     | Uploads the routing table for messages between nodes |
| PUT /size/{size}  | Set the size of the FPGA cluster                     |

Parameters of POST /routing

| Parameter     | Description                             | Parameter Type | Data Type |
|---------------|-----------------------------------------|----------------|-----------|
| routing_table | Routing table in ASCII, max 128 entries | formData       | file      |

Response Messages of POST /routing

| HTTP Status Code | Reason           |
|------------------|------------------|
| 200              | OK               |
| 400              | Bad request      |
| 422              | Too many entries |

[ BASE URL: / , API VERSION: 0.1 ]
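The `POST /routing` operation above accepts an ASCII routing table with at most 128 entries and answers with 200, 400 or 422. A validation step mirroring those responses might look like the sketch below. The line format (`<rank> <IP address>` per entry) is an assumption made for illustration, as the slide does not specify the ASCII layout; only the 128-entry limit and the status codes come from the source.

```python
import ipaddress

MAX_ENTRIES = 128  # limit stated in the API description


def check_routing_table(text):
    """Validate an ASCII routing table and return (status, table).

    Assumed format: one entry per line, "<rank> <ip address>".
    Status codes follow the documented responses of POST /routing.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) > MAX_ENTRIES:
        return 422, {}  # too many entries

    table = {}
    for ln in lines:
        fields = ln.split()
        if len(fields) != 2:
            return 400, {}  # malformed line
        try:
            rank = int(fields[0])
            addr = ipaddress.ip_address(fields[1])
        except ValueError:
            return 400, {}  # non-numeric rank or invalid address
        if rank < 0 or rank in table:
            return 400, {}  # negative or duplicate rank
        table[rank] = addr
    return 200, table


if __name__ == "__main__":
    status, table = check_routing_table("0 10.0.0.1\n1 10.0.0.2\n")
    print(status, {rank: str(addr) for rank, addr in table.items()})
```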



# **Proof of Concept – Prototype in the Lab**





# **Proof of Concept and Implementation**



Figure legend: SMC, HWICAP and Decoupler; MPE; UDP, TCP, IP and ICMP; DDR4 memory controller (both); Role



# **Conclusion and outlook**

#### Proposition of an architecture to

- acquire network-attached FPGAs,
- execute distributed applications,
- protect user-specific IP and
- support large scale deployment in DCs.

#### **Future work**

- Fine-tune and extend management service
- Extend MPI implementation for FPGAs
- Evaluate FPGAs for "Function-as-a-Service" computation





#### **Contact:** Alexander Ditter alexander.ditter@fau.de

Burkhard Ringlein NGL@zurich.ibm.com