karpathy GitHub repos: in reverse chronological order (that is, most recently developed first)
Building a Small Language Model (SLM) from Scratch: This repository provides a Jupyter Notebook for building a small language model from scratch using the TinyStories dataset. It covers data preprocessing, BPE tokenization, binary storage, GPU memory management, and training a Transformer in PyTorch, and ends by generating sample stories to test the model. Ideal for learning NLP and PyTorch.
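To make the "binary storage" step concrete, here is a minimal sketch of one way token IDs might be written to a flat binary file and read back as training windows. It is not taken from the notebook: the helper names (encode, write_tokens, read_tokens, sample_window), the file name train.bin, and the byte-level tokenizer standing in for BPE are all assumptions chosen so the example runs with the Python standard library alone.

```python
import os
import random
import tempfile
from array import array


def encode(text):
    # Stand-in for a BPE tokenizer (assumption): one token ID per UTF-8 byte.
    return list(text.encode("utf-8"))


def write_tokens(path, ids):
    # "H" stores unsigned 16-bit integers, enough for vocabularies up to 65,536 tokens.
    with open(path, "wb") as f:
        array("H", ids).tofile(f)


def read_tokens(path):
    # Recover the token count from the file size, then load all IDs.
    ids = array("H")
    count = os.path.getsize(path) // ids.itemsize
    with open(path, "rb") as f:
        ids.fromfile(f, count)
    return ids


def sample_window(ids, block_size, rng):
    # Pick a random (input, target) pair; the target is the input shifted by one token.
    start = rng.randrange(len(ids) - block_size)
    x = ids[start : start + block_size]
    y = ids[start + 1 : start + block_size + 1]
    return list(x), list(y)


# Hypothetical two-story corpus, used only for illustration.
stories = ["Once upon a time, a cat sat.", "The dog ran home."]
ids = [token for story in stories for token in encode(story)]

path = os.path.join(tempfile.mkdtemp(), "train.bin")
write_tokens(path, ids)

loaded = read_tokens(path)
x, y = sample_window(loaded, block_size=8, rng=random.Random(0))
```

Storing IDs as fixed-width integers keeps the file compact and lets a training loop slice out windows by offset. A full pipeline would more likely memory-map the file and move batches to the GPU as tensors, which this sketch leaves out.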
Last modified 15 April 2026