News

Jun 18, 2025 The first version of Mirage Persistent Kernel is released :sparkles:! We developed a compiler that automatically transforms LLM inference into a single megakernel: a fused GPU kernel that performs all necessary computation and communication in one launch. This end-to-end GPU fusion approach reduces LLM inference latency by 1.2-6.7x. Have a look at the blog and the codebase.
May 31, 2025 Excited to share that our paper Mirage: A Multi-Level Superoptimizer for Tensor Programs has been accepted to OSDI '25 :confetti_ball:!