Reading
General
Implementation
- Deploy an AI Analyst in Minutes: Connect Any LLM to Any Data Source with Bag of Words
- "How to Build a Large Language Model from Scratch Using Python"
- "How to build knowledge graphs using LLMs"
- The Large Language Model Course
- Developing LLM-Based Text Adventure Games
- Designing an open-source LLM interface and social platforms for collectively driven LLM evaluation and auditing: "In the emerging landscape of large language models (LLMs), the imperative for robust evaluation and auditing mechanisms is paramount to ensure their ethical deployment and alignment with user needs. This workshop paper proposes a novel framework for the human-centered evaluation and auditing of LLMs, centered around an open-source chat user interface (UI) that facilitates direct interaction with a wide range of models. This approach allows for a collection of rich datasets critical for nuanced evaluation from a diverse spectrum of user interactions. Building on this foundation, we propose a social platform designed to leverage the collective intelligence of its users through crowdsourcing, enabling the evaluation and auditing of LLMs across various domains. This platform supports a dual-layered evaluation pipeline: an automated preliminary assessment based on user feedback and a deeper, community-driven analysis within domain-specific subcommunities. The culmination of this process informs the development of tailored model configurations and curated datasets, ensuring that LLMs serve the specific needs of different user groups. By combining an open-source UI with a socially-driven evaluation platform, our approach fosters a community-centric ecosystem for continuous LLM improvement, emphasizing transparency, inclusivity, and alignment with human values."
Critique
- Why Large Language Models Cannot Achieve Artificial General Intelligence or Artificial Superintelligence: Large Language Models, in their current incarnation, are statistical systems trained on vast corpora of text data to predict the most likely next token in a sequence. These models learn to compress and reproduce patterns found in their training data, enabling them to generate coherent and contextually appropriate responses. However, their operation remains fundamentally different from the flexible, adaptive intelligence that characterizes AGI.
- 10 Common Misconceptions about LLMs: LLMs Actually Understand Language Like Humans Do; More Parameters Always Mean Better Performance; LLMs Are Just Autocomplete on Steroids; LLMs Remember Everything They’ve Learned; Fine-Tuning Always Makes Models Better; LLMs Are Deterministic: Same Input, Same Output; Bigger Context Windows Are Always Better; LLMs Can Replace Traditional Machine Learning for All Language Tasks; Prompt Engineering Is Just Trial and Error; LLMs Will Soon Replace All Software Developers
- Apple: The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
- LLMs: The Illusion of Thinking
- A knockout blow for LLMs? (responding to the Apple paper above)
- Writing code was never the bottleneck: https://leaddev.com/velocity/writing-code-was-never-the-bottleneck
- Why LLMs can't build software: https://zed.dev/blog/why-llms-cant-build-software
- LLMs are not like you and me--and never will be
- "Socio-Demographic Modifiers Shape Large Language Models’ Ethical Decisions": "The ethical alignment of large language models (LLMs) in clinical decision making remains unclear, particularly their susceptibility to socio-demographic biases. We therefore tested whether LLMs shift medical ethical decisions in healthcare when presented with socio-demographic cues. Using 100 clinical vignettes, each posing a yes or no choice between two ethical principles, we compared the responses of nine open-source LLMs (Llama 3.3-70B, Llama 3.1-8B, Llama-3.1-Nemotron-70B, Gemma-2-27B, Gemma-2-9B, Phi-3.5-mini, Phi-3-medium, Qwen-2.5-72B, and Qwen-2.5-7B). Each scenario and modifier combination was repeated 10 times per model for a total of approximately 0.5 million experiments. All models changed their responses when introduced with socio-demographic details (p < 0.001). High-income modifiers increased utilitarian choices and decreased beneficence and nonmaleficence preferences, and marginalized-group modifiers raised autonomy considerations. Although some models demonstrated greater consistency than others, none maintained consistency across all scenarios, with the largest shifts observed in utilitarian choices. These results reveal that current LLMs can be steered by socio-demographic cues in ways not clinically justified, posing risks for equitable care in healthcare-informatics applications. This underscores the need for careful auditing and alignment strategies that ensure LLMs behave in ways consistent with widely accepted ethical principles while remaining attentive to the diversity, complexity, and contextual sensitivity required in real-world clinical practice."
- "Why Language Models Hallucinate" (article, paper)
- "When AI Gets It Wrong: Addressing AI Hallucinations and Bias"
- Large Language Models Will Never Be Intelligent
- Artificial intelligence is not intelligent at all
- Researchers discover a shortcoming that makes LLMs less reliable
- "The LLMentalist Effect: how chat-based Large Language Models replicate the mechanisms of a psychic’s con"
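Several of the critiques above rest on the same observation: an LLM is a statistical system trained to predict the most likely next token. A toy bigram model (a deliberate oversimplification; real models use learned neural representations over vast corpora, not raw counts) makes that mechanism concrete:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each token, which token follows it and how often."""
    counts = defaultdict(Counter)
    tokens = corpus.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the statistically most likely next token, or None."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]

model = train_bigram("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # prints 'cat' ("cat" follows "the" twice, "mat" once)
```

The point of the critiques is that scaling this pattern-completion principle up, however impressively, is not obviously the same thing as flexible, adaptive intelligence.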
Security
Models
- qwen2.5-coder: Your go-to coding companion. "It's a specialized LLM from the same family as qwen 2.5, but it's fine-tuned specifically for coding tasks. It is trained on an enormous dataset of code, providing it with a deep understanding of over 40 programming languages. qwen2.5-coder comes in different sizes, from a small 0.5B model to a powerful 32B model."
- wizard-math: Your logic partner. "This model is a specialized version of the WizardLM family. It is trained to excel at complex mathematical problems, logical reasoning, and solving puzzles. There are three different sizes available for wizard-math: 7B, 13B, and 70B. In my limited experience with the 7B model, I have found it fantastic. It helps me test my solutions and explore new ways to approach difficult problems. The model's ability to handle these subjects with precision and clarity makes it my go-to partner for all things logic and numbers."
- reader-lm: Web to markdown, instantly. "The model is super practical for my needs. Instead of manually creating .md files from the web content, I can feed them to reader-lm to get a perfectly structured markdown file. In my experience, while reader-lm does an amazing job for most of my needs, it sometimes struggles with really large or messy HTML code. But, it works well enough most of the time."
- llama-guard3: An LLM for safe prompts. "When working with LLMs, it’s crucial that our interactions are safe and responsible. While we can’t control an LLM’s response, we can ensure our prompts are appropriate. That’s exactly why I self-hosted llama-guard 3. This powerful model acts as a dedicated content moderation tool for all my other local LLMs. llama-guard 3's job is to classify every interaction against a set of safety categories. It checks our prompts against 14 different categories. When we give a prompt to this LLM, it responds with a message stating whether the prompt was safe or unsafe. If it is unsafe, it flags it with a specific reason, such as S10 (Hate) or S12 (Sexual Content)."
- Gemma 3: My local Gemini experience. "ChatGPT and Gemini are two key benchmarks that made everyone accustomed to AI and LLMs. While self-hosting LLMs, I also did not want to compromise my experience with those platforms. That’s why I self-hosted Gemma 3. This model is built on the same research as Gemini. It provides a premium experience with the flexibility of running locally. It is basically my local ChatGPT / Gemini. Gemma 3 is available in various sizes. It can handle a massive 128k context window, processes both text and images, and understands over 140 languages. This makes it my personal go-to AI for creative tasks. I use it to generate ideas for social media content, draft captions, and research topics for my blog."
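The llama-guard3 note above describes a verdict format of "safe" or "unsafe" plus category codes. A small parser for that format (the exact two-line shape is an assumption based on Llama Guard's documented output; verify against your model's actual responses) shows how a local pipeline could act on it:

```python
def parse_guard_verdict(response: str):
    """Parse a Llama Guard-style verdict string.

    Assumed format: first line 'safe' or 'unsafe'; if unsafe, a second
    line with comma-separated category codes such as 'S10'.
    """
    lines = [ln.strip() for ln in response.strip().splitlines() if ln.strip()]
    if not lines or lines[0] not in ("safe", "unsafe"):
        raise ValueError(f"unexpected verdict: {response!r}")
    if lines[0] == "safe":
        return {"safe": True, "categories": []}
    cats = lines[1].split(",") if len(lines) > 1 else []
    return {"safe": False, "categories": [c.strip() for c in cats]}

print(parse_guard_verdict("unsafe\nS10"))  # prints {'safe': False, 'categories': ['S10']}
```

A wrapper like this lets you gate prompts before they ever reach your other local models.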
Integrations/Implementations
- How to run a local LLM via LocalAI, an Open Source project
- Guidance: "Guidance is an efficient programming paradigm for steering language models. With Guidance, you can control how output is structured and get high-quality output for your use case—while reducing latency and cost vs. conventional prompting or fine-tuning. It allows users to constrain generation (e.g. with regex and CFGs) as well as to interleave control (conditionals, loops, tool use) and generation seamlessly."
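The core idea behind Guidance's regex constraints can be illustrated without the library: at each step, candidate continuations that violate the constraint are masked out so only conforming output can be produced. This toy filter is a conceptual sketch only; the real library applies constraints at the token level during decoding:

```python
import re

def constrained_pick(candidates, pattern):
    """Keep only candidate continuations that fully match the
    constraint, mimicking how constrained decoding masks options."""
    rx = re.compile(pattern)
    return [c for c in candidates if rx.fullmatch(c)]

# Suppose a model proposes these continuations for "The year is ":
proposals = ["2024", "twenty-two", "199", "1984"]
print(constrained_pick(proposals, r"\d{4}"))  # prints ['2024', '1984']
```

Doing this inside the decoder, rather than validating after the fact, is what reduces the latency and cost the Guidance description mentions.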
LangChain
Python
- Google LangExtract (Python): https://www.infoq.com/news/2025/08/google-langextract-python/
- Python Langchain: LangChain is a framework for developing applications powered by language models. It enables applications that:
  - Are context-aware: connect a language model to sources of context (prompt instructions, few-shot examples, content to ground its response in, etc.)
  - Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)
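The "context-aware" pattern above can be sketched without the framework itself: assemble instructions, few-shot examples, and grounding content into one prompt. This hand-rolled template is an illustration of the pattern, not LangChain's actual API:

```python
def build_grounded_prompt(instructions, examples, context, question):
    """Assemble a context-aware prompt: instructions, few-shot
    examples, grounding content, then the user's question."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return (
        f"{instructions}\n\n"
        f"{shots}\n\n"
        f"Context:\n{context}\n\n"
        f"Q: {question}\nA:"
    )

prompt = build_grounded_prompt(
    "Answer using only the context.",
    [("What color is the sky?", "Blue.")],
    "The warehouse holds 42 pallets.",
    "How many pallets are in the warehouse?",
)
```

Frameworks like LangChain add value on top of this by composing such templates with retrievers, tools, and model calls.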
Tags:
ai
Last modified 23 December 2025