Vortex Workshop and Tutorial at MICRO 2024

Date: Sunday, November 3rd, 2024

Time: 8:00 AM - 12:00 PM CST

Room: 101

Organizers:

Hyesoon Kim (Georgia Institute of Technology)

Blaise Tine (University of California, Los Angeles)

Jeff Young (Georgia Institute of Technology)

Jaewon Lee (Georgia Institute of Technology)

Seonjin Na (Georgia Institute of Technology)

Liam Cooper (Georgia Institute of Technology)

Chihyo (Mark) Ahn (Georgia Institute of Technology)

Schedule:

08:00-08:20 | Introduction and GPU Background

08:20-09:20 | Vortex Microarchitecture and Software Stack

09:20-09:40 | CuPBoP: Running OpenCL/CUDA on Vortex

09:40-10:00 | Q&A Session

10:00-10:20 | Coffee Break

10:20-11:00 | Tutorial Assignments / Hands-on Exercises

11:00-12:00 | Vortex Workshop Presentations (3 Speakers)

More details for the Tutorial will be posted in the tutorial repository: Vortex Tutorial GitHub


Workshop Presentation Schedule

Each presentation will consist of a 15-minute talk followed by a 5-minute Q&A session.

Time | Speaker | Institution | Title
11:00-11:20 | Davy Million | Université Grenoble Alpes | Preliminary Integration of Vortex within the OpenPiton Platform
11:20-11:40 | Matt Sinclair | University of Wisconsin-Madison | Prototyping Mechanisms to Co-design Power Management and Performance in Accelerator-Rich Systems
11:40-12:00 | Martin Troiber | Technical University of Munich | Analysis of the RISC-V Vector Extension for Vulkan Graphics Kernels on the Vortex GPU

1. Preliminary Integration of Vortex within the OpenPiton Platform

Authors: *Davy Million, *César Fuguet, *Adrian Evans, †Jonathan Balkind, ‡Frédéric Petrot

Institution: *Univ. Grenoble Alpes, CEA, List, †University of California, Santa Barbara, ‡Univ. Grenoble Alpes, TIMA

Abstract

This paper details the initial integration of the Vortex GPU core into the OpenPiton many-core research platform. Vortex, an open-source RISC-V-based GPU, was connected to OpenPiton using an AXI-4 to P-Mesh bridge, enabling shared memory access between Vortex and general-purpose processors without data copying. The integration allows for heterogeneous computing but faces performance bottlenecks due to the non-pipelined bridge, particularly for workloads exceeding the L1 cache size. Initial benchmarks using a vector addition kernel showed that while Vortex can handle memory latency in standalone mode, performance drops when integrated with OpenPiton due to cache miss serialization. Future improvements include pipelining the bridge and enhancing memory coherence mechanisms to optimize performance in this open-source heterogeneous CPU/GPU system.

2. Prototyping Mechanisms to Co-design Power Management and Performance in Accelerator-Rich Systems

Authors: Matt Sinclair, Shivaram Venkataraman

Institution: University of Wisconsin–Madison

Abstract

In recent years, modern computing systems have increasingly turned to large numbers of compute accelerators to reach performance goals, since accelerators offer greater power efficiency and thus enable higher performance within a constrained power budget. However, using accelerators increases heterogeneity at multiple levels, including the architecture, resource allocation, competing user needs, and manufacturing variability. Accordingly, current and future systems need to efficiently handle many simultaneous jobs while balancing power management (PM) and multiple levels of heterogeneity. In recent work, we have demonstrated the extent of this variability in modern accelerator-rich systems (SC'22) and shown how to embrace variability in cluster-level job schedulers (SC'24). This work significantly improves the efficiency of modern systems for a range of ML workloads. However, scheduling jobs more efficiently at the software and runtime layers is limited in its ability to quickly and dynamically change policies as cluster conditions evolve. A major limiter to further improving efficiency is the lack of standards for exposing power information in modern accelerators. Thus, for future systems we propose to build on the insights generated by our optimizations for current systems and apply co-design that makes the hardware, software, and runtime layers aware of the variance in the systems. To do this, we will design a standard for accelerators to expose PM information from the hardware to the software and runtime. Using this information, instead of performing PM locally, we plan to develop a global power management scheme to enable optimal PM decisions across accelerators and further reduce performance variability. In this talk, I will discuss our ongoing efforts at harnessing variability in accelerator-rich systems and how we hope to leverage the Vortex open-source GPU hardware to prototype our proposed ideas.

3. Analysis of the RISC-V Vector Extension for Vulkan Graphics Kernels on the Vortex GPU

Authors: *Martin Troiber, †Hyesoon Kim, ‡Blaise Tine, *Martin Schulz

Institution: *Technical University of Munich, †Georgia Institute of Technology, ‡University of California, Los Angeles

Abstract

In this work we analyze the benefits of the RISC-V vector extension for accelerating 3D graphics kernels. Our work uses open-source projects for the graphics driver (SwiftShader), GPU (Vortex) and instruction set architecture (RISC-V). For our purpose we modified the graphics driver to generate kernels with vector instructions. We then extended the GPU simulator to execute the vectorized kernels. To augment our simulated performance measurements we used a CPU (Kendryte K230) with the same vector instructions. On both platforms we analyzed the popular gears example from the Vulkan demo scenes. In GPU simulation, we could reduce the cycle count by up to 54% compared to scalar kernels. On the CPU we increased the frame rate of the gears demo scene by over 85% compared to scalar execution. The measured 13fps@1080p is the highest frame rate achieved using SwiftShader on any RISC-V single board computer.


Call for Papers for Vortex Workshop

The Vortex Workshop aims to bring together Vortex developers, contributors, and users from academia and industry to discuss Vortex-related research. The Vortex Tutorial will be held in conjunction with the workshop, providing an introduction to Vortex and its use in research and teaching. The workshop will offer an opportunity to share ongoing efforts in Vortex development or research using Vortex, thereby fostering the Vortex and broader open-source GPGPU community.

Topics of interest for the Vortex Workshop include, but are not limited to:

  • Design and implementation of Vortex GPGPU
  • OpenCL/CUDA running on Vortex
  • Compiler optimizations for Vortex
  • Memory hierarchy and management in Vortex GPGPU
  • Benchmarking and performance evaluation
  • Applications and case studies using Vortex
  • Security and reliability in GPGPU architecture
  • Comparison of Vortex with other GPGPU architectures
  • FPGA/ASIC implementation of Vortex

Submission Guidelines:

Authors are invited to submit 2-page papers, which must be formatted in accordance with the ACM two-column style. ACM Word or LaTeX style templates are available here.

Note: Workshop publications do not preclude publishing at future conference venues.

Important Dates:

  • Paper submission deadline: September 5th, 2024, 23:59 EDT
  • Author notification: September 17th, 2024
  • Workshop date: November 3rd, 2024, 8:00 AM CST – 12:00 PM CST

Submission Link:

Organizing Committee:

  • General Chairs: Hyesoon Kim (Georgia Tech), Blaise Tine (UCLA)
  • Session Chair: Jaewon Lee (Georgia Tech)
  • Web Chair: Seonjin Na (Georgia Tech)

Contact Information:

For more information about submissions to the Vortex Workshop, please email vortex_submission@groups.gatech.edu