System for AI
We build scalable AI systems that combine advances in computer architecture, systems software, and compilers to accelerate training and inference across cloud, edge, and heterogeneous hardware.
About the Initiative
System for AI at KAIST unites researchers in architecture, systems, and compiler optimization to deliver faster, more efficient AI training and inference for large-scale, multimodal, and edge workloads.
Platform Innovation
Designing scalable cloud, edge, and datacenter platforms for LLM serving, video analytics, and multimodal AI.
Accelerated Intelligence
Co-optimizing software stacks, compiler techniques, and accelerators to reduce latency, cost, and energy consumption.
Responsible Reliability
Building secure, reliable systems with memory protection, efficient caching, and resilient training pipelines.
Faculty
Meet the professors leading System for AI research and education.
Dongsu Han
LLM serving systems and compiler techniques for faster, cheaper, and more scalable AI across heterogeneous hardware.
Youngjin Kwon
Efficient AI training/inference systems with speculative decoding, adaptive inference, and energy-aware multi-GPU runtimes.
Jongse Park
AI systems and architecture for generative serving, video intelligence, and memory-efficient execution.
Minsoo Rhu
AI infrastructure and accelerators: agent cost analysis, Gaudi NPU evaluation, Vision Mamba acceleration, and elastic RecSys serving.
Jaehyuk Huh
Architectures and system software for secure, high-throughput AI with memory/PIM optimizations.
Research Highlights
Compact snapshots of recent research directions across the faculty.
Dongsu Han
Trinity: tile-level equality saturation for joint tensor optimization (1.35x over manual kernels).
SAND: video DL preprocessing abstraction with up to 12.3x speedup.
SpecEdge: edge-assisted speculative decoding boosting throughput and cost efficiency.
StellaTrain: multi-cloud training acceleration up to 104x over DDP.
Youngjin Kwon
Speculative decoding systems (hierarchical drafting) for low-latency LLM serving; a minimal sketch follows this list.
Test-time search/adaptive inference that trades compute for accuracy.
Energy-aware training/inference frameworks that preserve throughput on multi-GPU systems.
Full-stack acceleration: DRAM-PIM emulation and video-based DL programming abstractions.
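Several of the projects above build on speculative decoding: a small draft model proposes a few tokens and the large target model verifies them, so most generated tokens cost far less than a full large-model step. The sketch below illustrates the loop with a simplified greedy acceptance test; `draft_model`, `target_model`, and all parameters are hypothetical placeholders, not the hierarchical-drafting or SpecEdge designs.

```python
# Minimal speculative decoding loop (illustration only).
# `draft_model` and `target_model` are hypothetical callables mapping a token
# sequence to a next-token probability vector over a shared vocabulary.
import numpy as np

def speculative_decode(target_model, draft_model, prompt, k=4, max_new_tokens=64):
    """Draft k tokens cheaply with the small model, then verify with the target model."""
    tokens = list(prompt)
    rng = np.random.default_rng(0)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Draft: sample k candidate tokens autoregressively from the small model.
        draft = []
        for _ in range(k):
            p = draft_model(tokens + draft)          # next-token distribution
            draft.append(int(rng.choice(len(p), p=p)))
        # 2. Verify: accept the longest prefix the target model also prefers.
        #    (A real system verifies all k positions in one batched target pass;
        #    production schemes also use rejection sampling rather than greedy checks.)
        accepted = []
        for tok in draft:
            q = target_model(tokens + accepted)      # target-model distribution
            if int(np.argmax(q)) == tok:             # greedy acceptance test
                accepted.append(tok)
            else:
                accepted.append(int(np.argmax(q)))   # fall back to the target's choice
                break
        tokens.extend(accepted)
    return tokens
```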
Jongse Park
LLM serving systems: hybrid KV cache, PIM/NPU acceleration, and simulator-driven design (a KV-cache sketch follows this list).
Video & vision systems: computation reuse for continuous learning and video-language queries.
Efficient inference: mixed-precision quantization for diffusion and large models.
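The hybrid KV-cache work above extends the standard decoder-side KV cache, in which keys and values for past tokens are stored once and reused at every decoding step instead of being recomputed. The sketch below shows that basic mechanism for a single attention head in NumPy; the weight matrices, the dict-based cache, and `attend_with_cache` are illustrative placeholders, not the group's hybrid caching design.

```python
# Minimal single-head attention step with a KV cache (illustration only).
import numpy as np

def attend_with_cache(x_t, Wq, Wk, Wv, cache):
    """One decoding step: append this token's K/V to the cache, then attend over it."""
    q = x_t @ Wq                      # query for the current token
    cache["K"].append(x_t @ Wk)       # cache key/value instead of recomputing them later
    cache["V"].append(x_t @ Wv)
    K = np.stack(cache["K"])          # (t, d)
    V = np.stack(cache["V"])
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                # attention output for the current token

# Toy usage with random weights and tokens.
d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
cache = {"K": [], "V": []}
for _ in range(5):                    # five decoding steps reuse the same cache
    out = attend_with_cache(rng.normal(size=d), Wq, Wk, Wv, cache)
print(len(cache["K"]), out.shape)     # -> 5 (8,)
```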
Minsoo Rhu
AI agent infrastructure: quantified cost and bottleneck analysis for test-time scaling.
Edge acceleration: Vision Mamba optimizations and quantization on GPUs (a quantization sketch follows this list).
Gaudi NPU serving: end-to-end performance and programmability evaluation.
Data/serving systems: in-storage preprocessing and elastic RecSys serving.
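Quantization appears in both the edge-acceleration item above and the diffusion/LLM inference work listed earlier; the shared building block is mapping floating-point tensors to low-bit integers with a scale factor. The sketch below shows a plain symmetric per-tensor int8 mapping as a point of reference; it is a generic illustration with made-up names, not the mixed-precision schemes referenced above.

```python
# Minimal symmetric per-tensor int8 quantization (illustration only).
# Mixed-precision schemes apply this kind of mapping selectively,
# keeping sensitive tensors in higher precision.
import numpy as np

def quantize_int8(x):
    """Map float values to int8 using a single per-tensor scale."""
    scale = np.abs(x).max() / 127.0 if x.size else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, s = quantize_int8(x)
err = np.abs(dequantize(q, s) - x).mean()
print(f"mean absolute quantization error: {err:.5f}")
```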
Jaehyuk Huh
Secure heterogeneous memory: multi-granular integrity and metadata management (an integrity-check sketch follows this list).
High-throughput LLM inference with KV/activation hybrid caching on a single GPU.
PIM and NPU memory optimizations for sparse and training workloads.
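Memory-integrity protection of the kind mentioned above attaches authentication metadata to memory blocks so that tampered or swapped contents are detected on read. The toy sketch below models that write-tag / read-verify idea in software with a per-block MAC; `ProtectedMemory`, the key, and the block size are hypothetical, and real designs manage such metadata in hardware at multiple granularities.

```python
# Toy per-block memory integrity check (illustration only).
import hmac, hashlib

KEY = b"secret-key"                      # hypothetical device key
BLOCK = 64                               # bytes per protected block

def tag(block_id: int, data: bytes) -> bytes:
    """MAC over (block id, contents) so swapped or stale blocks are detected."""
    return hmac.new(KEY, block_id.to_bytes(8, "little") + data, hashlib.sha256).digest()

class ProtectedMemory:
    def __init__(self, num_blocks: int):
        self.data = [bytes(BLOCK)] * num_blocks
        self.tags = [tag(i, bytes(BLOCK)) for i in range(num_blocks)]

    def write(self, i: int, data: bytes):
        self.data[i] = data
        self.tags[i] = tag(i, data)      # update metadata on every write

    def read(self, i: int) -> bytes:
        if not hmac.compare_digest(self.tags[i], tag(i, self.data[i])):
            raise RuntimeError(f"integrity violation in block {i}")
        return self.data[i]

mem = ProtectedMemory(4)
mem.write(2, b"x" * BLOCK)
assert mem.read(2) == b"x" * BLOCK       # verified read succeeds
mem.data[2] = b"y" * BLOCK               # simulate tampering outside the API
try:
    mem.read(2)
except RuntimeError as e:
    print(e)                             # -> integrity violation in block 2
```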