System for AI
We build scalable AI systems that combine advances in computer architecture, systems software, and compilers to accelerate training and inference across cloud, edge, and heterogeneous hardware.
About the Initiative
System for AI at KAIST unites researchers in architecture, systems, and compiler optimization to deliver faster, more efficient AI training and inference for large-scale, multimodal, and edge workloads.
Platform Innovation
Designing scalable cloud, edge, and datacenter platforms for LLM serving, video analytics, and multimodal AI.
Accelerated Intelligence
Co-optimizing software stacks, compiler techniques, and accelerators to reduce latency, cost, and energy consumption.
Responsible Reliability
Building secure, reliable systems with memory protection, efficient caching, and resilient training pipelines.
Faculty
Meet the professors leading System for AI research and education.
Dongsu Han
LLM serving systems and compiler techniques for faster, cheaper, and more scalable AI across heterogeneous hardware.
Youngjin Kwon
Efficient AI training/inference systems with speculative decoding, adaptive inference, and energy-aware multi-GPU runtimes.
Jongse Park
AI systems and architecture for generative serving, video intelligence, and memory-efficient execution.
Minsoo Rhu
AI infrastructure and accelerators: agent cost analysis, Gaudi NPU evaluation, Vision Mamba acceleration, and elastic RecSys serving.
Jaehyuk Huh
Architectures and system software for secure, high-throughput AI with memory/PIM optimizations.
Research Highlights
Compact snapshots of recent research directions across the faculty.
Dongsu Han
Trinity: tile-level equality saturation for joint tensor optimization (1.35x over manual kernels).
SAND: video DL preprocessing abstraction with up to 12.3x speedup.
SpecEdge: edge-assisted speculative decoding boosting throughput and cost efficiency.
StellaTrain: multi-cloud training acceleration up to 104x over DDP.
Youngjin Kwon
Speculative decoding systems (hierarchical drafting) for low-latency LLM serving; a minimal sketch follows this list.
Test-time search/adaptive inference that trades compute for accuracy.
Energy-aware training/inference frameworks that preserve throughput on multi-GPU systems.
Full-stack acceleration: DRAM-PIM emulation and video-based DL programming abstractions.
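Several of the projects above build on speculative decoding: a small draft model proposes a few tokens and the large target model verifies them, so most generated tokens cost far less than a full large-model step. The sketch below illustrates the loop with a simplified greedy acceptance test; `draft_model`, `target_model`, and all parameters are hypothetical placeholders, not the hierarchical-drafting or SpecEdge designs.

```python
# Minimal speculative decoding loop (illustration only).
# `draft_model` and `target_model` are hypothetical callables mapping a token
# sequence to a next-token probability vector over a shared vocabulary.
import numpy as np

def speculative_decode(target_model, draft_model, prompt, k=4, max_new_tokens=64):
    """Draft k tokens cheaply with the small model, then verify with the target model."""
    tokens = list(prompt)
    rng = np.random.default_rng(0)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. Draft: sample k candidate tokens autoregressively from the small model.
        draft = []
        for _ in range(k):
            p = draft_model(tokens + draft)          # next-token distribution
            draft.append(int(rng.choice(len(p), p=p)))
        # 2. Verify: accept the longest prefix the target model also prefers.
        #    (A real system verifies all k positions in one batched target pass;
        #    production schemes also use rejection sampling rather than greedy checks.)
        accepted = []
        for tok in draft:
            q = target_model(tokens + accepted)      # target-model distribution
            if int(np.argmax(q)) == tok:             # greedy acceptance test
                accepted.append(tok)
            else:
                accepted.append(int(np.argmax(q)))   # fall back to the target's choice
                break
        tokens.extend(accepted)
    return tokens
```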
Jongse Park
LLM serving systems: hybrid KV cache, PIM/NPU acceleration, and simulator-driven design (a KV-cache sketch follows this list).
Video & vision systems: computation reuse for continuous learning and video-language queries.
Efficient inference: mixed-precision quantization for diffusion and large models.
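The hybrid KV-cache work above extends the standard decoder-side KV cache, in which keys and values for past tokens are stored once and reused at every decoding step instead of being recomputed. The sketch below shows that basic mechanism for a single attention head in NumPy; the weight matrices, the dict-based cache, and `attend_with_cache` are illustrative placeholders, not the group's hybrid caching design.

```python
# Minimal single-head attention step with a KV cache (illustration only).
import numpy as np

def attend_with_cache(x_t, Wq, Wk, Wv, cache):
    """One decoding step: append this token's K/V to the cache, then attend over it."""
    q = x_t @ Wq                      # query for the current token
    cache["K"].append(x_t @ Wk)       # cache key/value instead of recomputing them later
    cache["V"].append(x_t @ Wv)
    K = np.stack(cache["K"])          # (t, d)
    V = np.stack(cache["V"])
    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                # attention output for the current token

# Toy usage with random weights and tokens.
d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
cache = {"K": [], "V": []}
for _ in range(5):                    # five decoding steps reuse the same cache
    out = attend_with_cache(rng.normal(size=d), Wq, Wk, Wv, cache)
print(len(cache["K"]), out.shape)     # -> 5 (8,)
```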
Minsoo Rhu
AI agent infrastructure: quantified cost and bottleneck analysis for test-time scaling.
Edge acceleration: Vision Mamba optimizations and quantization on GPUs (a quantization sketch follows this list).
Gaudi NPU serving: end-to-end performance and programmability evaluation.
Data/serving systems: in-storage preprocessing and elastic RecSys serving.
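Quantization appears in both the edge-acceleration item above and the diffusion/LLM inference work listed earlier; the shared building block is mapping floating-point tensors to low-bit integers with a scale factor. The sketch below shows a plain symmetric per-tensor int8 mapping as a point of reference; it is a generic illustration with made-up names, not the mixed-precision schemes referenced above.

```python
# Minimal symmetric per-tensor int8 quantization (illustration only).
# Mixed-precision schemes apply this kind of mapping selectively,
# keeping sensitive tensors in higher precision.
import numpy as np

def quantize_int8(x):
    """Map float values to int8 using a single per-tensor scale."""
    scale = np.abs(x).max() / 127.0 if x.size else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, s = quantize_int8(x)
err = np.abs(dequantize(q, s) - x).mean()
print(f"mean absolute quantization error: {err:.5f}")
```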
Jaehyuk Huh
Secure heterogeneous memory: multi-granular integrity and metadata management (an integrity-check sketch follows this list).
High-throughput LLM inference with KV/activation hybrid caching on a single GPU.
PIM and NPU memory optimizations for sparse and training workloads.
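Memory-integrity protection of the kind mentioned above attaches authentication metadata to memory blocks so that tampered or swapped contents are detected on read. The toy sketch below models that write-tag / read-verify idea in software with a per-block MAC; `ProtectedMemory`, the key, and the block size are hypothetical, and real designs manage such metadata in hardware at multiple granularities.

```python
# Toy per-block memory integrity check (illustration only).
import hmac, hashlib

KEY = b"secret-key"                      # hypothetical device key
BLOCK = 64                               # bytes per protected block

def tag(block_id: int, data: bytes) -> bytes:
    """MAC over (block id, contents) so swapped or stale blocks are detected."""
    return hmac.new(KEY, block_id.to_bytes(8, "little") + data, hashlib.sha256).digest()

class ProtectedMemory:
    def __init__(self, num_blocks: int):
        self.data = [bytes(BLOCK)] * num_blocks
        self.tags = [tag(i, bytes(BLOCK)) for i in range(num_blocks)]

    def write(self, i: int, data: bytes):
        self.data[i] = data
        self.tags[i] = tag(i, data)      # update metadata on every write

    def read(self, i: int) -> bytes:
        if not hmac.compare_digest(self.tags[i], tag(i, self.data[i])):
            raise RuntimeError(f"integrity violation in block {i}")
        return self.data[i]

mem = ProtectedMemory(4)
mem.write(2, b"x" * BLOCK)
assert mem.read(2) == b"x" * BLOCK       # verified read succeeds
mem.data[2] = b"y" * BLOCK               # simulate tampering outside the API
try:
    mem.read(2)
except RuntimeError as e:
    print(e)                             # -> integrity violation in block 2
```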