Flash Attention Explained
A practical walkthrough of how Flash Attention reduces memory traffic and speeds up transformer training.