
Co-Located Workshop · REMCI · 2024

Workshop on Reliability Engineering for Multi-Tenant Cloud Inference

October 21, 2024 · Marina Bay Sands Convention Centre · Singapore

Acronym: REMCI
Submissions: 64
Accepted: 18
Co-Located With: CIOTP 2024

About the Workshop

REMCI was a full-day workshop on operating shared LLM and recommendation inference platforms under hard latency, cost, and trust-and-safety constraints. The workshop combined invited industry talks, peer-reviewed experience reports, and a closing roadmap session that fed into the CIOTP 2025 main-track CFP.

Call for Papers — Topics of Interest

  • KV-cache management and prefix sharing
  • Multi-tenant isolation and noisy-neighbor mitigation
  • Token-level SLOs and admission control
  • Evaluation harnesses and drift detection in production
  • Cost attribution across tenants and models

Organizers

Prof. Dawn Song

Workshop Chair

UC Berkeley

Dr. Mei Hwang

Workshop Co-Chair

National University of Singapore

Ravi Menon

Industry Co-Chair

NVIDIA

Invited Keynote

Dr. Jeff Dean

Chief Scientist, Google DeepMind & Google Research

"Serving Foundation Models at Internet Scale: Lessons from a Decade"

Important Dates

Paper submission deadline: July 1, 2024
Author notification: August 19, 2024
Camera-ready due: September 9, 2024
Workshop date: October 20, 2024

Workshop Programme

  1. 09:00 - 09:10 Opening remarks (Song, Hwang, Menon)
  2. 09:10 - 10:10 Keynote: Dr. Jeff Dean (Google DeepMind)
  3. 10:10 - 10:40 Coffee break
  4. 10:40 - 12:10 Session 1: Inference Scheduling and Batching (4 papers)
  5. 12:10 - 13:30 Lunch
  6. 13:30 - 15:00 Session 2: Multi-Tenant Isolation and Cost Attribution (5 papers)
  7. 15:00 - 15:30 Coffee break
  8. 15:30 - 16:30 Session 3: Reliability and Drift Detection (4 papers)
  9. 16:30 - 17:30 Roadmap discussion: Inputs to the CIOTP 2025 Main-Track CFP
  10. 17:30 - 17:45 Closing and best-paper announcement

Accepted Papers

  1. Continuous Batching with Bounded Tail Latency for Mixed-Model Workloads

    L. Zhao, R. Menon

    OpenAI · NVIDIA

  2. Noisy-Neighbor Mitigation in Shared GPU Inference Clusters

    H. Brooks, M. Patel

    Anthropic · Amazon Web Services

  3. Cost Attribution Across Tenants in a Multi-Model Serving Platform

    R. Menon, M. Hwang

    NVIDIA · National University of Singapore

  4. Drift Detection for Production Recommendation Models

    A. Krishnan, F. Okafor

    Netflix · Airbnb

  5. SLO-Aware Admission Control for Token-Level Streaming

    O. Reid, M. Ribeiro

    Cloudflare · Fastly

  6. Reliability Patterns for Multi-Region LLM Inference

    K. Iyer, D. Park

    Google Cloud · Cloudflare

Workshop Programme Committee

  • Prof. Andre Dupont

    EPFL

  • Dr. Maya Patel

    Amazon Web Services

  • Lin Zhao

    OpenAI

  • Hannah Brooks

    Anthropic

  • Ravi Menon

    NVIDIA

  • Aditi Krishnan

    Netflix

  • Olivia Reid

    Cloudflare

  • Dr. Rohan Mehta

    IIT Bombay

  • Prof. Linnea Bergstrom

    Chalmers University of Technology

  • Prof. Maria Chen

    University of Toronto

Citation

@proceedings{remci_2024,
  title     = {Proceedings of the Workshop on Reliability Engineering for Multi-Tenant Cloud Inference (REMCI 2024)},
  booktitle = {Co-located with Proceedings of the 6th International Conference on Cloud, IoT \& Agentic AI (CIOTP 2024)},
  year      = {2024},
  address   = {Singapore},
  publisher = {IEEE}
}