The first version of Mirage Persistent Kernel has been released :sparkles:! We developed a compiler that automatically transforms LLM inference into a single megakernel: a fused GPU kernel that performs all necessary computation and communication in one launch. This end-to-end GPU fusion approach reduces LLM inference latency by 1.2x to 6.7x. Have a look at the blog and the codebase.