Writing

May 11, 2025

Flash Attention Explained

A practical walkthrough of how Flash Attention reduces memory traffic and speeds up transformer training.

Deep Learning, CUDA, GPU Kernels

Apr 27, 2025

GPU Kernels — First Notes

Notes from learning the CUDA memory hierarchy and occupancy, and from writing my first custom kernels.

CUDA, GPU Kernels, PyTorch

Projects

All projects →
