Dense reference pages designed to be useful long after publication.
Notes on autograd, tensors, and the PyTorch execution model.
Evergreen reference on transformer blocks, attention, and positional encoding.