Co-Located Workshop · REMCI · 2024

Workshop on Reliability Engineering for Multi-Tenant Cloud Inference

October 21, 2024 · Marina Bay Sands Convention Centre · Singapore

Acronym: REMCI
Submissions: 64
Accepted: 18
Co-Located With: CIOTP 2024

About the Workshop

REMCI was a full-day workshop on operating shared LLM and recommendation inference platforms under hard latency, cost, and trust-and-safety constraints. The workshop combined invited industry talks, peer-reviewed experience reports, and a closing roadmap session that fed into the CIOTP 2025 main-track CFP.

Call for Papers — Topics of Interest

· KV-cache management and prefix sharing
· Multi-tenant isolation and noisy-neighbor mitigation
· Token-level SLOs and admission control
· Evaluation harnesses and drift detection in production
· Cost attribution across tenants and models

Organizers

Prof. Dawn Song

Workshop Chair

UC Berkeley

Dr. Mei Hwang

Workshop Co-Chair

National University of Singapore

Ravi Menon

Industry Co-Chair

NVIDIA

Invited Keynote

Dr. Jeff Dean

Chief Scientist, Google DeepMind & Google Research

"Serving Foundation Models at Internet Scale: Lessons from a Decade"

Important Dates

Paper submission deadline: July 1, 2024
Author notification: August 19, 2024
Camera-ready due: September 9, 2024
Workshop date: October 21, 2024

Workshop Programme

09:00 - 09:10  Opening remarks (Song, Hwang, Menon)
09:10 - 10:10  Keynote: Dr. Jeff Dean (Google DeepMind)
10:10 - 10:40  Coffee break
10:40 - 12:10  Session 1: Inference Scheduling and Batching (4 papers)
12:10 - 13:30  Lunch
13:30 - 15:00  Session 2: Multi-Tenant Isolation and Cost Attribution (5 papers)
15:00 - 15:30  Coffee break
15:30 - 16:30  Session 3: Reliability and Drift Detection (4 papers)
16:30 - 17:30  Roadmap discussion: Inputs to the CIOTP 2025 Main-Track CFP
17:30 - 17:45  Closing and best-paper announcement

Accepted Papers

01
Continuous Batching with Bounded Tail Latency for Mixed-Model Workloads
L. Zhao, R. Menon
OpenAI · NVIDIA
02
Noisy-Neighbor Mitigation in Shared GPU Inference Clusters
H. Brooks, M. Patel
Anthropic · Amazon Web Services
03
Cost Attribution Across Tenants in a Multi-Model Serving Platform
R. Menon, M. Hwang
NVIDIA · National University of Singapore
04
Drift Detection for Production Recommendation Models
A. Krishnan, F. Okafor
Netflix · Airbnb
05
SLO-Aware Admission Control for Token-Level Streaming
O. Reid, M. Ribeiro
Cloudflare · Fastly
06
Reliability Patterns for Multi-Region LLM Inference
K. Iyer, D. Park
Google Cloud · Cloudflare

Workshop Programme Committee

Prof. Andre Dupont
EPFL
Dr. Maya Patel
Amazon Web Services
Lin Zhao
OpenAI
Hannah Brooks
Anthropic
Ravi Menon
NVIDIA
Aditi Krishnan
Netflix
Olivia Reid
Cloudflare
Dr. Rohan Mehta
IIT Bombay
Prof. Linnea Bergstrom
Chalmers University of Technology
Prof. Maria Chen
University of Toronto

Citation

@proceedings{remci_2024,
  title     = {Proceedings of the Workshop on Reliability Engineering for Multi-Tenant Cloud Inference (REMCI 2024)},
  booktitle = {Co-located with Proceedings of the 6th International Conference on Cloud, IoT \& Agentic AI (CIOTP 2024)},
  year      = {2024},
  address   = {Singapore},
  publisher = {IEEE}
}