Layerweaver+: A QoS-Aware Layer-Wise DNN Scheduler for Multi-Tenant Neural Processing Units

Young H. OH, Yunho JIN, Tae Jun HAM, Jae W. LEE

Summary:

Many cloud service providers employ specialized hardware accelerators, called neural processing units (NPUs), to accelerate deep neural networks (DNNs). An NPU scheduler is responsible for scheduling incoming user requests and must satisfy two often-conflicting optimization goals: maximizing system throughput and satisfying the quality-of-service (QoS) constraints (e.g., deadlines) of individual requests. We propose Layerweaver+, a low-cost layer-wise DNN scheduler for NPUs, which provides both high system throughput and minimal QoS violations. For a serving scenario based on the industry-standard MLPerf inference benchmark, Layerweaver+ significantly improves system throughput by up to 266.7% over a baseline scheduler that serves one DNN at a time.

Publication
IEICE TRANSACTIONS on Information Vol.E105-D No.2 pp.427-431
Publication Date
2022/02/01
Publicized
2021/11/11
Online ISSN
1745-1361
DOI
10.1587/transinf.2021EDL8084
Type of Manuscript
LETTER
Category
Fundamentals of Information Systems

Authors

Young H. OH
  Sungkyunkwan University
Yunho JIN
  Seoul National University
Tae Jun HAM
  Seoul National University
Jae W. LEE
  Seoul National University

Keyword