Co-Located Workshop · REMCI · 2024
October 21, 2024 · Marina Bay Sands Convention Centre · Singapore
REMCI was a full-day workshop on operating shared LLM and recommendation inference platforms under hard latency, cost, and trust-and-safety constraints. The workshop combined invited industry talks, peer-reviewed experience reports, and a closing roadmap session that fed into the CIOTP 2025 main-track CFP.
Prof. Dawn Song
Workshop Chair
UC Berkeley
Dr. Mei Hwang
Workshop Co-Chair
National University of Singapore
Ravi Menon
Industry Co-Chair
NVIDIA
Chief Scientist, Google DeepMind & Google Research
"Serving Foundation Models at Internet Scale: Lessons from a Decade"
Continuous Batching with Bounded Tail Latency for Mixed-Model Workloads
L. Zhao, R. Menon
OpenAI · NVIDIA
Noisy-Neighbor Mitigation in Shared GPU Inference Clusters
H. Brooks, M. Patel
Anthropic · Amazon Web Services
Cost Attribution Across Tenants in a Multi-Model Serving Platform
R. Menon, M. Hwang
NVIDIA · National University of Singapore
Drift Detection for Production Recommendation Models
A. Krishnan, F. Okafor
Netflix · Airbnb
SLO-Aware Admission Control for Token-Level Streaming
O. Reid, M. Ribeiro
Cloudflare · Fastly
Reliability Patterns for Multi-Region LLM Inference
K. Iyer, D. Park
Google Cloud · Cloudflare
Prof. Andre Dupont
EPFL
Dr. Maya Patel
Amazon Web Services
Lin Zhao
OpenAI
Hannah Brooks
Anthropic
Ravi Menon
NVIDIA
Aditi Krishnan
Netflix
Olivia Reid
Cloudflare
Dr. Rohan Mehta
IIT Bombay
Prof. Linnea Bergstrom
Chalmers University of Technology
Prof. Maria Chen
University of Toronto
@proceedings{remci_2024,
title = {Proceedings of the Workshop on Reliability Engineering for Multi-Tenant Cloud Inference (REMCI 2024)},
booktitle = {Co-located with Proceedings of the 6th International Conference on Cloud, IoT \& Agentic AI (CIOTP 2024)},
year = {2024},
address = {Singapore},
publisher = {IEEE}
}