Reading
General
Implementation
- Deploy an AI Analyst in Minutes: Connect Any LLM to Any Data Source with Bag of Words
- "How to Build a Large Language Model from Scratch Using Python"
- "How to build knowledge graphs using LLMs"
- The Large Language Model Course
- Developing LLM-Based Text Adventure Games
- Designing an open-source LLM interface and social platforms for collectively driven LLM evaluation and auditing: "In the emerging landscape of large language models (LLMs), the imperative for robust evaluation and auditing mechanisms is paramount to ensure their ethical deployment and alignment with user needs. This workshop paper proposes a novel framework for the human-centered evaluation and auditing of LLMs, centered around an open-source chat user interface (UI) that facilitates direct interaction with a wide range of models. This approach allows for a collection of rich datasets critical for nuanced evaluation from a diverse spectrum of user interactions. Building on this foundation, we propose a social platform designed to leverage the collective intelligence of its users through crowdsourcing, enabling the evaluation and auditing of LLMs across various domains. This platform supports a dual-layered evaluation pipeline: an automated preliminary assessment based on user feedback and a deeper, community-driven analysis within domain-specific subcommunities. The culmination of this process informs the development of tailored model configurations and curated datasets, ensuring that LLMs serve the specific needs of different user groups. By combining an open-source UI with a socially-driven evaluation platform, our approach fosters a community-centric ecosystem for continuous LLM improvement, emphasizing transparency, inclusivity, and alignment with human values."
Critique
- Why Large Language Models Cannot Achieve Artificial General Intelligence or Artificial Superintelligence: Large Language Models, in their current incarnation, are statistical systems trained on vast corpora of text data to predict the most likely next token in a sequence. These models learn to compress and reproduce patterns found in their training data, enabling them to generate coherent and contextually appropriate responses. However, their operation remains fundamentally different from the flexible, adaptive intelligence that characterizes AGI.
- 10 Common Misconceptions about LLMs: LLMs Actually Understand Language Like Humans Do; More Parameters Always Mean Better Performance; LLMs Are Just Autocomplete on Steroids; LLMs Remember Everything They’ve Learned; Fine-Tuning Always Makes Models Better; LLMs Are Deterministic: Same Input, Same Output; Bigger Context Windows Are Always Better; LLMs Can Replace Traditional Machine Learning for All Language Tasks; Prompt Engineering Is Just Trial and Error; LLMs Will Soon Replace All Software Developers
- Apple: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- LLMs: The Illusion of Thinking
- A knockout blow for LLMs? (responding to the Apple paper above)
- Writing code was never the bottleneck: https://leaddev.com/velocity/writing-code-was-never-the-bottleneck
- Why LLMs can't build software: https://zed.dev/blog/why-llms-cant-build-software
- LLMs are not like you and me--and never will be
- "Socio-Demographic Modifiers Shape Large Language Models’ Ethical Decisions": "The ethical alignment of large language models (LLMs) in clinical decision making remains unclear, particularly their susceptibility to socio-demographic biases. We therefore tested whether LLMs shift medical ethical decisions in healthcare when presented with socio-demographic cues. Using 100 clinical vignettes, each posing a yes or no choice between two ethical principles, we compared the responses of nine open-source LLMs (Llama 3.3-70B, Llama 3.1-8B, Llama-3.1-Nemotron-70B, Gemma-2-27B, Gemma-2-9B, Phi-3.5-mini, Phi-3-medium, Qwen-2.5-72B, and Qwen-2.5-7B). Each scenario and modifier combination was repeated 10 times per model for a total of approximately 0.5 million experiments. All models changed their responses when introduced with socio-demographic details (p < 0.001). High-income modifiers increased utilitarian choices and decreased beneficence and nonmaleficence preferences, and marginalized-group modifiers raised autonomy considerations. Although some models demonstrated greater consistency than others, none maintained consistency across all scenarios, with the largest shifts observed in utilitarian choices. These results reveal that current LLMs can be steered by socio-demographic cues in ways not clinically justified, posing risks for equitable care in healthcare-informatics applications. This underscores the need for careful auditing and alignment strategies that ensure LLMs behave in ways consistent with widely accepted ethical principles while remaining attentive to the diversity, complexity, and contextual sensitivity required in real-world clinical practice."
- "Why Language Models Hallucinate" (article, paper)
- "When AI Gets It Wrong: Addressing AI Hallucinations and Bias"
- Large Language Models Will Never Be Intelligent
- Artificial intelligence is not intelligent at all
- Researchers discover a shortcoming that makes LLMs less reliable
- "The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic’s con"
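Several of the critiques above rest on the same observation: an LLM is a statistical system trained to predict the most likely next token. A toy bigram model (a deliberate oversimplification; real models use learned neural representations over vast corpora, not raw counts) makes that mechanism concrete:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each token, which token follows it and how often."""
    counts = defaultdict(Counter)
    tokens = corpus.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the statistically most likely next token, or None."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

model = train_bigram("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # prints 'cat' ("cat" follows "the" twice, "mat" once)
```

The point of the critiques is that scaling this pattern-completion principle up, however impressively, is not obviously the same thing as flexible, adaptive intelligence.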
Security
Models
- qwen2.5-coder: Your go-to coding companion. "It's a specialized LLM from the same family as qwen 2.5, but it's fine-tuned specifically for coding tasks. It is trained on an enormous dataset of code, providing it with a deep understanding of over 40 programming languages. qwen2.5-coder comes in different sizes, from a small 0.5B model to a powerful 32B model."
- wizard-math: Your logic partner. "This model is a specialized version of the WizardLM family. It is trained to excel at complex mathematical problems, logical reasoning, and solving puzzles. There are three different sizes available for wizard-math: 7B, 13B, and 70B. In my limited experience with the 7B model, I have found it fantastic. It helps me test my solutions and explore new ways to approach difficult problems. The model's ability to handle these subjects with precision and clarity makes it my go-to partner for all things logic and numbers."
- reader-lm: Web to markdown, instantly. "The model is super practical for my needs. Instead of manually creating .md files from the web content, I can feed them to reader-lm to get a perfectly structured markdown file. In my experience, while reader-lm does an amazing job for most of my needs, it sometimes struggles with really large or messy HTML code. But, it works well enough most of the time."
- llama-guard3: An LLM for safe prompts. "When working with LLMs, it’s crucial that our interactions are safe and responsible. While we can’t control an LLM’s response, we can ensure our prompts are appropriate. That’s exactly why I self-hosted llama-guard 3. This powerful model acts as a dedicated content moderation tool for all my other local LLMs. llama-guard 3's job is to classify every interaction against a set of safety categories. It checks our prompts against 14 different categories. When we give a prompt to this LLM, it responds with a message stating whether the prompt was safe or unsafe. If it is unsafe, it flags it with a specific reason, such as S10 (Hate) or S12 (Sexual Content)."
- Gemma 3: My local Gemini experience. "ChatGPT and Gemini are two key benchmarks that made everyone accustomed to AI and LLMs. While self-hosting LLMs, I also did not want to compromise my experience with those platforms. That’s why I self-hosted Gemma 3. This model is built on the same research as Gemini. It provides a premium experience with the flexibility of running locally. It is basically my local ChatGPT / Gemini. Gemma 3 is available in various sizes. It can handle a massive 128k context window, processes both text and images, and understands over 140 languages. This makes it my personal go-to AI for creative tasks. I use it to generate ideas for social media content, draft captions, and research topics for my blog."
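The llama-guard3 note above describes a verdict format of "safe" or "unsafe" plus category codes. A small parser for that format (the exact two-line shape is an assumption based on Llama Guard's documented output; verify against your model's actual responses) shows how a local pipeline could act on it:

```python
def parse_guard_verdict(response: str):
    """Parse a Llama Guard-style verdict string.

    Assumed format: first line 'safe' or 'unsafe'; if unsafe, a second
    line with comma-separated category codes such as 'S10'.
    """
    lines = [ln.strip() for ln in response.strip().splitlines() if ln.strip()]
    if not lines or lines[0] not in ("safe", "unsafe"):
        raise ValueError(f"unexpected verdict: {response!r}")
    if lines[0] == "safe":
        return {"safe": True, "categories": []}
    cats = lines[1].split(",") if len(lines) > 1 else []
    return {"safe": False, "categories": [c.strip() for c in cats]}

print(parse_guard_verdict("unsafe\nS10"))  # prints {'safe': False, 'categories': ['S10']}
```

A wrapper like this lets you gate prompts before they ever reach your other local models.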
Integrations/Implementations
- How to run a local LLM via LocalAI, an Open Source project
- Guidance: "Guidance is an efficient programming paradigm for steering language models. With Guidance, you can control how output is structured and get high-quality output for your use case—while reducing latency and cost vs. conventional prompting or fine-tuning. It allows users to constrain generation (e.g. with regex and CFGs) as well as to interleave control (conditionals, loops, tool use) and generation seamlessly."
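The core idea behind Guidance's regex constraints can be illustrated without the library: at each step, candidate continuations that violate the constraint are masked out so only conforming output can be produced. This toy filter is a conceptual sketch only; the real library applies constraints at the token level during decoding:

```python
import re

def constrained_pick(candidates, pattern):
    """Keep only candidate continuations that fully match the
    constraint, mimicking how constrained decoding masks options."""
    rx = re.compile(pattern)
    return [c for c in candidates if rx.fullmatch(c)]

# Suppose a model proposes these continuations for "The year is ":
proposals = ["2024", "twenty-two", "199", "1984"]
print(constrained_pick(proposals, r"\d{4}"))  # prints ['2024', '1984']
```

Doing this inside the decoder, rather than validating after the fact, is what reduces the latency and cost the Guidance description mentions.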
LangChain
Python
- Google LangExtract (Python): https://www.infoq.com/news/2025/08/google-langextract-python/
- Python Langchain: LangChain is a framework for developing applications powered by language models. It enables applications that:
  - Are context-aware: connect a language model to sources of context (prompt instructions, few-shot examples, content to ground its response in, etc.)
  - Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
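The "context-aware" pattern above can be sketched without the framework itself: assemble instructions, few-shot examples, and grounding content into one prompt. This hand-rolled template is an illustration of the pattern, not LangChain's actual API:

```python
def build_grounded_prompt(instructions, examples, context, question):
    """Assemble a context-aware prompt: instructions, few-shot
    examples, grounding content, then the user's question."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return (
        f"{instructions}\n\n"
        f"{shots}\n\n"
        f"Context:\n{context}\n\n"
        f"Q: {question}\nA:"
    )

prompt = build_grounded_prompt(
    "Answer using only the context.",
    [("What color is the sky?", "Blue.")],
    "The warehouse holds 42 pallets.",
    "How many pallets are in the warehouse?",
)
```

Frameworks like LangChain add value on top of this by composing such templates with retrievers, tools, and model calls.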
Tags:
ai
Last modified 23 December 2025