CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA tensor core units. The project provides a comprehensive ecosystem for expressing and optimizing tiled computations for NVIDIA GPUs, simplifying the development of high-performance CUDA kernels through abstractions for common tiling patterns, memory hierarchy management, and GPU-specific optimizations.
This open-source release is aligned with the CUDA Toolkit 13.1 release. For more information about CUDA Tile, visit https://developer.nvidia.com/cuda/tile.
CUDA Tile is composed of:
CUDA Tile development is driven by the CUDA Tile IR specification, which defines the formal semantics, operations, and type system for tile-based computations on NVIDIA GPUs. For detailed information about the CUDA Tile IR specification, including dialect operations, type system, and transformation passes, please refer to the CUDA Tile Specification.
Last modified 15 January 2026