Uncategorized (yet)
General
History
Criticism
- "The Copilot Delusion"
- The Hidden Costs of Coding With Generative AI
- The Hidden Costs of AI Coding Assistants: Insights from a Senior Developer
- "My new hobby: watching AI slowly drive Microsoft employees insane"
- Where's Your Ed At?
- We need to stop pretending AI is intelligent -- here's how
- Artificial intelligence is not intelligent at all
- AI: Not That Smart
- Building AI Products In The Probabilistic Era: "Just as physics underwent a conceptual revolution when we moved past Newton's deterministic universe, and into a strange and counterintuitive place made by wave functions, software too is undergoing its own quantum shift. We're leaving a world where code reliably and deterministically takes certain inputs to produce specific outputs, and entering a very different one where machines now produce statistical distributions instead. Building probabilistic software is like nothing we've done before."
- "The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic’s con"
- "AI coding tools make developers slower but they think they're faster, study finds"
- Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity: "We conduct a randomized controlled trial (RCT) to understand how early-2025 AI tools affect the productivity of experienced open-source developers working on their own repositories. Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower. We view this result as a snapshot of early-2025 AI capabilities in one relevant setting; as these systems continue to rapidly evolve, we plan on continuing to use this methodology to help estimate AI acceleration from AI R&D automation." (full paper)
- Methodology: To directly measure the real-world impact of AI tools on software development, we recruited 16 experienced developers from large open-source repositories (averaging 22k+ stars and 1M+ lines of code) that they’ve contributed to for multiple years. Developers provide lists of real issues (246 total) that would be valuable to the repository—bug fixes, features, and refactors that would normally be part of their regular work. Then, we randomly assign each issue to either allow or disallow use of AI while working on the issue. When AI is allowed, developers can use any tools they choose (primarily Cursor Pro with Claude 3.5/3.7 Sonnet—frontier models at the time of the study); when disallowed, they work without generative AI assistance. Developers complete these tasks (which average two hours each) while recording their screens, then self-report the total implementation time they needed. We pay developers $150/hr as compensation for their participation in the study.
- Core Result: When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against developer beliefs and expert forecasts. This gap between perception and reality is striking: developers expected AI to speed them up by 24%, and even after experiencing the slowdown, they still believed AI had sped them up by 20%. Below, we show the raw average developer forecasted times, and the observed implementation times—we can clearly see that developers take substantially longer when they are allowed to use AI tools.
- Clarifications: We list claims that we do not provide evidence for:
| We do not provide evidence that: | Clarification |
| --- | --- |
| AI systems do not currently speed up many or most software developers | We do not claim that our developers or repositories represent a majority or plurality of software development work |
| AI systems do not speed up individuals or groups in domains other than software development | We only study software development |
| AI systems in the near future will not speed up developers in our exact setting | Progress is difficult to predict, and there has been substantial AI progress over the past five years [3] |
| There are not ways of using existing AI systems more effectively to achieve positive speedup in our exact setting | Cursor does not sample many tokens from LLMs, it may not use optimal prompting/scaffolding, and domain/repository-specific training/finetuning/few-shot learning could yield positive speedup |
- Factor Analysis: We investigate 20 potential factors that might explain the slowdown, finding evidence that 5 likely contribute. We rule out many experimental artifacts—developers used frontier models, complied with their treatment assignment, didn’t differentially drop issues (e.g. dropping hard AI-disallowed issues, reducing the average AI-disallowed difficulty), and submitted similar quality PRs with and without AI. The slowdown persists across different outcome measures, estimator methodologies, and many other subsets/analyses of our data. See the paper for further details and analysis.
- FAQs:
- How were developers actually slowed down given they had the option to not use AI? After the study, developers estimated that they were sped up by 20% on average when using AI--so they were mistaken about AI’s impact on their productivity. Furthermore, it’s possible that developers use AI tools for reasons other than pure productivity--for example, they may find it a more pleasant/enjoyable experience, or they may view it as an investment into learning skills that they expect to be useful with future (more capable) systems.
- What was the motivation for this study? Were we incentivized/motivated to find this result? METR is a non-profit (funded by donations) interested in understanding how close AI systems are to accelerating the AI R&D process, which could pose significant destabilizing risks [1]. This study was designed to give us evidence about a similar domain: experienced, open-source developers working on projects they’re highly familiar with. We initially were broadly expecting to see positive speedup—scientific integrity is a core value of ours, and we were (and are) committed to sharing results regardless of the outcome.
- You only had 16 developers, so these results will not generalize/replicate. We compute confidence intervals accounting for the number of developers by using clustered standard errors (not reported in the released paper, but forthcoming). Because we don’t observe meaningful within-developer structure, and each developer completes issues in both conditions, the 246 total completed issues give us (just enough) sufficient statistical power to reject the null hypothesis of zero speedup/slowdown. See Appendix D for discussion of our empirical strategy. There is still a question of representativeness—i.e. there are likely biases in which developers ended up participating in the study. For example, there may be experienced, open-source developers who decided to not participate because they believe they have significant positive speedup from AI, and they didn’t want to be forced to not use AI on 50% of their tasks. No developer reports thinking in this way, but we can’t rule this (or other sampling biases) out.
- Are the developers beginners at using Cursor/AI tools? Does this explain the result? Developers seem to be qualitatively in-distribution for Cursor Pro users, although we can’t rule out learning effects beyond 50 hours of Cursor usage. Nearly all developers have substantial (dozens to hundreds of hours) prior experience prompting LLMs. See Appendix C.2.7 for more discussion/analysis of developer AI tool use skill.
- Do these results say that AI isn't useful in software engineering? No--it seems plausible or likely that AI tools are useful in many other contexts different from our setting, for example, for less experienced developers, or for developers working in an unfamiliar codebase. See Appendix B for potential misreadings/overgeneralizations we do not endorse on the basis of our results.
- It's not appropriate to use homoskedastic SEs. What gives? Appendix C.3.5 explores alternative estimators, including a naive ratio estimator. All alternative estimators evaluated yield similar results, suggesting that the slowdown result is robust to our empirical strategy. That said, we are actively evaluating further standard error estimation methods, in response to community feedback (thank you to those who have given feedback so far!).
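For intuition only, here is a toy sketch of a naive ratio estimator with a developer-clustered bootstrap, in the spirit of the FAQ above. The data is invented and this is not METR's actual estimator (see the paper's Appendices C and D for that):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented data: per-issue completion times (hours) for a handful of developers,
# split by whether AI was allowed. Purely illustrative.
times = {  # developer -> (AI-allowed times, AI-disallowed times)
    "dev1": ([2.5, 3.0, 2.8], [2.0, 2.4]),
    "dev2": ([1.9, 2.2],      [1.6, 1.8, 1.7]),
    "dev3": ([3.4, 3.1, 2.9], [2.6, 2.5]),
}

def ratio_estimate(devs):
    """Naive ratio estimator: mean AI-allowed time / mean AI-disallowed time."""
    ai = np.concatenate([np.asarray(times[d][0]) for d in devs])
    no_ai = np.concatenate([np.asarray(times[d][1]) for d in devs])
    return ai.mean() / no_ai.mean()

point = ratio_estimate(list(times))
print(f"estimated slowdown: {100 * (point - 1):.0f}%")

# Bootstrap clustered by developer: resample whole developers rather than
# individual issues, so the interval respects within-developer correlation.
devs = list(times)
boots = [ratio_estimate(list(rng.choice(devs, size=len(devs)))) for _ in range(2000)]
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"95% cluster-bootstrap CI: {100 * (lo - 1):.0f}% to {100 * (hi - 1):.0f}%")
```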
- AI Anthropomorphism
- Why do Humans Anthropomorphize AI?
- Are You ...?
- We need to stop ...
- The Four Degrees of ...
- Why Human-like is not human
- Anthropomorphism and AI: hype and fallacy (PDF): "As a form of hype, anthropomorphism is shown to exaggerate AI capabilities and performance by attributing human-like traits to systems that do not possess them. As a fallacy, anthropomorphism is shown to distort moral judgments about AI, such as those concerning its moral character and status, as well as judgments of responsibility and trust. By focusing on these two dimensions of anthropomorphism in AI, the essay highlights negative ethical consequences of the phenomenon in this field."
- A Vaccine: "Here is an explanation that I finally settled on, based on a tweet by prominent AI researcher and educator Andrej Karpathy. When I first tried it out on my dad, he kept quiet for a little bit, and then shifted the way he saw and used LLMs. I’ve tested this on a few friends since, of varying levels of technical sophistication, and am pleased to report that it works quite well. This explanation is useful but not accurate. I’ll give you the explanation, explain why it works, and then give a brief sketch on how it is not true. Finally, I’ll argue that even if it is not accurate, this explanation points you towards more productive mental models of LLMs.
- "Imagine that you can visualise words like stars in the sky. The position of the words relative to other words are based on the relationships between each of the words in your language (that is: how closely and how frequently a word appears next to another word in all the sentences ever written). What is important to know is that you can draw arrows from one star to another! These arrows have some surprising properties. One property is that the arrow from the word ‘king’ to the word ‘queen’ is the same as ‘king - boy + girl’. On top of that, let’s imagine that you throw up a starfield for English and a starfield for Spanish. It turns out that if you can draw a one-to-one mapping between the two starfields, the king to queen arrow is the same in both languages! This was a very surprising finding when it first came out!
- " What a Large Language Model is is that it is a very sophisticated auto complete. But it is a bit more than that.
- "When you ask “what are the 10 best places to visit in Bali” the AI will give you a plausible-sounding answer. But the way it gives you that answer is that during the AI’s training, some human somewhere wrote an answer like “the top 10 places to visit in London” and “the top 10 places to visit in Tokyo” and “the top 10 places to visit in New York” based on some cursory research and Google searches. Then the AI took those written examples, and memorised the statistical relationships between the sentences, which you can imagine like the arrows between large clusters of stars. Then when you ask “what are the 10 best places in Bali or Singapore or Lisbon”, it just moves the arrows it learnt over to the part of its starfield that has concepts related to Bali, Singapore and Lisbon, and spits out something similar to what the human trainer wrote.
- "Notice that this is not thinking. The LLM is doing autocomplete, but using this ‘arrow in the starfield’ property to give you very good, plausible-sounding, novel answers.
- "But because it writes so eloquently, and answers you like a human would, you think that it’s actually intelligent and sentient. I suppose you could say that it is ‘intelligent’ (by some definition of ‘intelligent’) but it is not a person. It doesn’t understand concepts the way a human does. What it is relying on is some complex version of this ‘arrow in a starfield’ property.
- "You may get better answers by constraining it to a smaller set of documents. So if you give it a bunch of papers, and ask it for themes or a summary of those papers, it may give you better answers than if you assumed those papers were in its original training corpus.
"And that’s it."
- OpenAI Realizes It Made a Terrible Mistake: "In a paper published last week, a team of OpenAI researchers attempted to come up with an explanation. They suggest that large language models hallucinate because when they're being created, they're incentivized to guess rather than admit they simply don't know the answer. ... In simple terms, in other words, guessing is rewarded — because it might be right — over an AI admitting it doesn't know the answer, which will be graded as incorrect no matter what. As a result, through 'natural statistical pressures,' LLMs are far more prone to hallucinate an answer instead of 'acknowledging uncertainty.'" (A toy scoring calculation below makes this incentive concrete.)
- Hallucinations getting worse as AI models get more capable
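The incentive argument above reduces to expected-value arithmetic: under a benchmark that scores a correct answer 1 and anything else (including "I don't know") 0, guessing always dominates abstaining. A toy illustration with an invented probability:

```python
# Toy illustration of the scoring incentive described above.
# Grading: correct answer = 1 point, wrong answer or "I don't know" = 0 points.
p_correct_guess = 0.3  # hypothetical chance a confident guess happens to be right

expected_score_guess = p_correct_guess * 1 + (1 - p_correct_guess) * 0   # 0.3
expected_score_abstain = 0.0                                             # always 0

# A score-maximizing model should therefore always guess, however unsure it is,
# unless wrong answers are penalized (e.g. wrong = -1 point):
expected_score_guess_penalized = p_correct_guess * 1 + (1 - p_correct_guess) * -1  # -0.4

print(expected_score_guess, expected_score_abstain, expected_score_guess_penalized)
```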
Tools
- Lovable: Create apps and websites by chatting with AI
- ChatGPT-5
Tutorials
Reading
Definitions
Fuzzy Logic
Java
Natural Language Processing
Natural Language Programming
Large Language Model (LLM)
An advanced artificial intelligence (AI) system, built on deep learning and transformer architectures, that is pre-trained on massive amounts of text data to understand, process, and generate human-like language. LLMs learn to predict the next word in a sequence, enabling them to perform tasks like text generation, translation, summarization, and responding to complex queries, though they are not perfect oracles and can generate incorrect information or exhibit bias.
- "On the Biology of a Large Language Model": "We investigate the internal mechanisms used by Claude 3.5 Haiku — Anthropic's lightweight production model — in a variety of contexts, using our circuit tracing methodology."
Small Language Model (SLM)
An AI model designed to handle specific tasks, using fewer parameters and less computational power than a large language model (LLM). This efficiency makes SLMs faster to train, more accessible, and suitable for deployment on devices with limited resources or for performing specialized functions, such as data extraction from documents, language translation, or specific conversational agents. In terms of size, SLM parameters range from a few million to a few billion, as opposed to LLMs with hundreds of billions or even trillions of parameters. Parameters are internal variables, such as weights and biases, that a model learns during training. These parameters influence how a machine learning model behaves and performs.
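To make "parameters" concrete, a tiny PyTorch sketch that counts the learned weights and biases of a toy model (real SLMs and LLMs differ only in scale):

```python
import torch.nn as nn

# A toy two-layer network: every weight and bias below is a "parameter"
# that training adjusts.
model = nn.Sequential(
    nn.Linear(128, 64),  # 128*64 weights + 64 biases = 8,256 parameters
    nn.ReLU(),
    nn.Linear(64, 10),   # 64*10 weights + 10 biases = 650 parameters
)

total = sum(p.numel() for p in model.parameters())
print(total)  # 8,906 — SLMs have millions to billions; LLMs hundreds of billions
```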
Coding Assistants
- Create a Coding Assistant with StarCoder
- "How to use GPT as a natural language to SQL query engine"
- "Who owns the code?": "This shift raises an important question: who is accountable when something goes wrong – Copilot, the reviewer, or someone else?Rajesh Jethwa, CTO of software engineering consultancy Digiterre, describes this issue as a “minefield”, because there are a number of entities involved in creating the code. First, there are the providers of the models themselves, such as OpenAI or Anthropic. It is currently unclear whether these providers own the code generated by their models. Second, there are the authors of the code used to train the model. There are still questions around whether they have any claim to ownership of the resulting code, given the provenance of the training data. Third, there are employees and the organizations they work for. Typically, when an employee creates code as part of their job, the organization owns that code. However, it remains uncertain whether the organization or the individual employee should bear responsibility for the code that is produced with the help of a coding assistance."
Generative AI
- 10 Github Repositories to Master Reinforcement Learning
- Machine Learning for Software Engineering: A curated list of papers, theses, datasets, and tools related to the application of Machine Learning for Software Engineering.
- Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations
- A Brief Introduction to Machine Learning for Engineers - Osvaldo Simeone (PDF)
- A Brief Introduction to Neural Networks
- A Comprehensive Guide to Machine Learning - Soroush Nasiriany, Garrett Thomas, William Wang, Alex Yang (PDF)
- A Course in Machine Learning (PDF)
- A First Encounter with Machine Learning - Max Welling (PDF) (:card_file_box: archived)
- A Selective Overview of Deep Learning - Fan, Ma, and Zhong (PDF)
- Algorithms for Reinforcement Learning - Csaba Szepesvári (PDF)
- An Introduction to Statistical Learning - Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (PDF)
- Approaching Almost Any Machine Learning Problem - Abhishek Thakur (PDF)
- Bayesian Reasoning and Machine Learning
- Deep Learning - Ian Goodfellow, Yoshua Bengio and Aaron Courville
- Deep Learning for Coders with Fastai and PyTorch - Jeremy Howard, Sylvain Gugger (Jupyter Notebooks)
- Deep Learning with PyTorch - Eli Stevens, Luca Antiga, Thomas Viehmann (PDF)
- Dive into Deep Learning
- Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises - James L. McClelland
- Foundations of Machine Learning, Second Edition - Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar
- Free and Open Machine Learning - Maikel Mardjan (HTML)
- Gaussian Processes for Machine Learning
- IBM Machine Learning for Dummies - Judith Hurwitz, Daniel Kirsch
- Information Theory, Inference, and Learning Algorithms
- Interpretable Machine Learning - Christoph Molnar
- Introduction to CNTK Succinctly - James McCaffrey
- Introduction to Machine Learning - Amnon Shashua
- Keras Succinctly - James McCaffrey
- Learn Tensorflow - Jupyter Notebooks
- Learning Deep Architectures for AI (PDF)
- Machine Learning
- Machine Learning for Data Streams - Albert Bifet, Ricard Gavaldà, Geoff Holmes, Bernhard Pfahringer
- Machine Learning from Scratch - Danny Friedman (HTML, PDF, Jupyter Book)
- Machine Learning, Neural and Statistical Classification
- Machine Learning with Python - Tutorials Point (HTML, PDF)
- Mathematics for Machine Learning - Garrett Thomas (PDF)
- Mathematics for Machine Learning - Marc Peter Deisenroth, A Aldo Faisal, and Cheng Soon Ong
- Neural Networks and Deep Learning
- Practitioners guide to MLOps - Khalid Salama, Jarek Kazmierczak, Donna Schut (PDF)
- Probabilistic Models in the Study of Language (Draft, with R code)
- Python Machine Learning Projects - Lisa Tagliaferri, Brian Boucheron, Michelle Morales, Ellie Birkbeck, Alvin Wan (PDF, EPUB, Kindle)
- Reinforcement Learning: An Introduction - Richard S. Sutton, Andrew G. Barto (PDF)
- Speech and Language Processing (3rd Edition Draft) - Daniel Jurafsky, James H. Martin (PDF)
- The Elements of Statistical Learning - Trevor Hastie, Robert Tibshirani, and Jerome Friedman
- The LION Way: Machine Learning plus Intelligent Optimization - Roberto Battiti, Mauro Brunato (PDF)
- The Mechanics of Machine Learning - Terence Parr and Jeremy Howard
- The Python Game Book - Horst Jens (:card_file_box: archived)
- Top 10 Machine Learning Algorithms Every Engineer Should Know - Binny Mathews and Omair Aasim
- Understanding Machine Learning: From Theory to Algorithms - Shai Shalev-Shwartz, Shai Ben-David
- You Don't Need Backpropagation To Train Neural Networks Anymore
Semantic Entity Resolution (Knowledge Graphs) (?)
Detail Pages:
- Agent Development Kit (ADK) An open-source, code-first toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
- Agentic AI A collection of topics and notes on the subject.
- Agent-to-Agent (A2A) protocol An open standard designed to enable seamless communication and collaboration between AI agents.
- AIAC (Artificial Intelligence Infrastructure-as-Code generator) A library and command line tool to generate IaC (Infrastructure as Code) templates, configurations, utilities, queries and more via LLM providers such as OpenAI, Amazon Bedrock and Ollama.
- AIScript A language designed specifically for web development in the AI era, with AI capabilities as first-class language features, and an intuitive route DSL and directive design.
- Arcee Own your own small language models.
- Barfi A Flow-Based Programming framework that offers a graphical programming interface.
- Claude Code Anthropic's agentic coding tool that runs in the terminal.
- ComfyUI A node-based Gradio GUI designed for generative AI models to generate AI images, video, and audio locally on your own hardware.
- Copilot GitHub/Microsoft's AI coding assistant.
- Faiss A library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. (A minimal usage sketch appears after this list.)
- Fuzzy logic Readings and links on fuzzy logic.
- Gemini Google's family of AI models and its AI assistant.
- Goblin.tools A collection of small, simple, single-task tools, mostly designed to help neurodivergent people with tasks they find overwhelming or difficult.
- gptlang A new programming language implemented by GPT-4.
- Husky Research language aimed at next generation AI and software.
- Jlama A modern Java inference engine for LLMs.
- LangChain, LangGraph The open, composable framework that provides a standard interface for every model, tool, and database – so you can build LLM apps that adapt as fast as the ecosystem evolves. (More description needed!)
- LangStream Combines data streaming with generative AI.
- Large Language Model (LLM) Collection of links, notes, and models.
- LittleHorse A high-performance microservice orchestration engine that allows developers to build scalable, maintainable, and observable applications.
- Llama-3 Practical Llama 3 inference in Java.
- LMQL A programming language for LLMs; Robust and modular LLM prompting using types, templates, constraints and an optimizing runtime.
- Loglan A language which was originally devised to test the Sapir-Whorf hypothesis that the structure of language determines the boundaries of human thought.
- Machine Learning Links and notes on the topic.
- Manifest An Open Source, portable backend that fits into 1 YAML file. Easy for both humans and LLMs to generate and validate.
- Model Context Protocol (MCP) An open standard that enables developers to build secure, two-way connections between their data sources and AI-powered tools.
- Ollama A tool for running open-source LLMs locally.
- Outerbase An AI-powered database platform.
- Paradigms of Artificial Intelligence Programming Norvig's classic, on the Web.
- PocketFlow A 100-line minimalist LLM framework.
- Priml A new programming language aimed at facilitating systems and AI infrastructure development.
- Prompt Orchestration Markup Language (POML) An open-source framework designed to bring order, modularity, and extensibility to prompt engineering for LLMs.
- Retrieval Augmented Generation (RAG) Concepts, notes, links, reading.
- Small Language Models (SLM) Collection of links and notes.
- SmartMock An AI-powered API mock server built with Spring Boot, Ollama, and LangChain4j that, instead of serving hardcoded responses, uses an LLM to generate realistic, context-aware mock data directly from your OpenAPI specifications.
- Spec Kit An effort to allow organizations to focus on product scenarios rather than writing undifferentiated code with the help of Spec-Driven Development.
- SpecLang An attempt at lifting the developer experience to a higher level of abstraction, closer to how we conceptually think about our programs: where programming is much more similar to how you'd instruct a human being.
- Spec-Oriented Programming Using natural language and language models to build software.
- Structured and Unstructured Query Language (SUQL) Conversational search over structured and unstructured data with LLMs.
- Suno Generate music from prompts.
- TabbyML Opensource, self-hosted AI coding assistant.
- The Edge of Sentience Notes on (and links to) the book.
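A minimal Faiss usage sketch, referenced from the Faiss entry above. It assumes the faiss-cpu package; the vectors are random placeholders:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 64                                                 # vector dimensionality
xb = np.random.random((10_000, d)).astype("float32")   # database vectors
xq = np.random.random((5, d)).astype("float32")        # query vectors

index = faiss.IndexFlatL2(d)  # exact L2 (Euclidean) search, no training needed
index.add(xb)                 # index the database
D, I = index.search(xq, 4)    # 4 nearest neighbours per query

print(I)  # indices of the nearest database vectors
print(D)  # corresponding squared L2 distances
```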
Last modified 17 October 2025