<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Harikanth Lingutla — AI/ML Engineer</title><description>I build practical AI systems, GPU projects, and software tools. Writing about PyTorch, CUDA, Triton, and building AI products.</description><link>https://harikanth.site/</link><language>en-us</language><item><title>Flash Attention Explained</title><link>https://harikanth.site/blog/flash-attention-explained/</link><guid isPermaLink="true">https://harikanth.site/blog/flash-attention-explained/</guid><description>A practical walkthrough of how Flash Attention reduces memory traffic and speeds up transformer training.</description><pubDate>Mon, 12 May 2025 00:00:00 GMT</pubDate><category>Deep Learning</category><category>CUDA</category><category>GPU Kernels</category></item><item><title>GPU Kernels — First Notes</title><link>https://harikanth.site/blog/gpu-kernels-notes/</link><guid isPermaLink="true">https://harikanth.site/blog/gpu-kernels-notes/</guid><description>Notes from learning CUDA memory hierarchy, occupancy, and writing my first custom kernels.</description><pubDate>Mon, 28 Apr 2025 00:00:00 GMT</pubDate><category>CUDA</category><category>GPU Kernels</category><category>PyTorch</category></item></channel></rss>