Types and type systems (reading)

A type, also known as a data type, is a classification identifying one of various types of "things" used within a programming language or platform. More prosaically, a type describes the possible values of a "thing" (such as a variable), the semantic meaning of that "thing", and how the values of that "thing" can be stored in memory.

A type system is a collection of rules that assign a property called type to various constructs in a computer program, such as variables, expressions, functions or modules, with the end goal of reducing the number of bugs by verifying that data is represented properly throughout a program.

See "What is a type system, anyway": "Type systems provide a way of writing down properties of our programs that we would like to be true, and then mechanically checking that those properties hold. Type systems come in all shapes and sizes; some are more expressive than others. Types are also a great tool to use when actually writing code. Static type systems provide strong guarantees about program behavior at the expense of some friction in programming: dynamic languages make it easy to throw together a prototype, but can become unwieldy or difficult to maintain once the codebase grows. Gradual typing is an increasingly popular method to get the best of both worlds."

Most types break out into the following categories:

Primitives: the atoms of the programming language; usually things like integers, strings, booleans, and so on. The internal representation of a primitive is usually entirely opaque to the programmer without significant effort (like taking memory addresses and poking around).
- Booleans: true/false values
- Integrals: non-fractional/whole numbers, often using the first bit to indicate sign (positive or negative). Using twos-complement math, this gives a fairly large range of integral numbers, depending on how many bytes are used for the integral:
  - 1-byte/8-bit: -128 to +127/0 to 255
  - 2-byte/16-bit: -32k to +32k/0 to 64k
  - 4-byte/32-bit: -2,147,483,648 to +2,147,483,647/0 to 4gb
  - 8-byte/64-bit: if you have to ask...
  - 16-byte/128-bit: if you have to ask...
  - "word" or "natural" sized, which is the natural "size" of the CPU (typically, in 2021, this is 64 bits)
- Floating-points: fractional numbers, usually represented in IEEE 754 (Wikipedia) format.
- Pointers: One common primitive type is a pointer to a memory location (whether directly manipulable by the programmer or not); Java, for example, talks about Object o = new Object(); as allocating an object o, but in truth o is a reference to the object created, not the object itself. The reference o is allocated on the stack, whereas the object o points to is allocated out of the heap. Pointers/references are almost always word/natural-sized.
Composite: these are the various "structures" that a programmer can build up out of primitives. Some are built within the language (arrays within C/C++/Java/C#, etc), some are built using language constructs (class or struct within C++/Java/C#/etc). I tend to break these into "simple composite types" (composites provided by the language itself) and "complex composite types" (composites defined by the developer), also sometimes known as "user-defined types" (UDTs).
Abstract data types: these are types that do not have specific implementation, and could thus be represented by multiple concrete types. ADTs are usually defined by interfaces in Java/C#, but one does not need an interface keyword to have an ADT.
Nominal typing: types are unique based on their declared name
Structural typing: types are unique based on the structure that makes up their definition, such that two declared types with the same structure are considered the same type, even if they are declared with different names.
Dependent types: "Dependent types allow types to depend on values; i.e. you can have a type for "list with three integers". Dependent typing, as I understand it, opens up complete programmability of the type system, at the cost of type checking becoming undecidable. These type systems allow you describe the behavior of your programs with incredible precision." (from https://lambdaland.org/posts/2023-01-17_what_is_a_type_system_really/ )

Type-checking

The existence of types is useless without a process of verifying that those types make logical sense in the program so that the program can be executed successfully. Type checking is the process of verifying and enforcing the constraints of types, and it can occur either at ahead-of-(execution-)time (i.e. statically) or at runtime/execution time (i.e. dynamically). Type checking is all about ensuring that the program will not encounter errors due to inappropriate or undefined intersection of types; a type error is an erroneous program behavior in which an operation occurs (or trys to occur) on a particular data type that it’s not meant to occur on.

When a program is considered not type-safe, there is no single standard course of action that happens upon reaching a type error. Many programming languages throw type errors which halt the runtime or compilation of the program, while other languages have built-in safety features to handle a type error and continue running (allowing developers to exhibit poor type safety).

"Basic type checking"

What to know before debating type systems

Static: A language is statically-typed if the type of a variable is known at ahead-of-(execution-)time instead of at runtime. Since most ahead-of-time checking is considered compilation, most statically-type-checked languages are compilation-based languages, though certainly nothing stops an interpreter from statically-type-checking a program when it is first loaded. Consequences include:
- Many type errors, because they are often programmer mental mistakes, can be caught early in the development cycle and thus never appear to the user.
- Static typing information can be used to optimize compiled code that executes more quickly because when the compiler knows the exact data types that are in use, it can produce better machine code (i.e. faster and/or using less memory).
- A static type-checker will quickly detect type errors in rarely used code paths. Without static type checking, even code coverage tests with 100% coverage may be unable to find such type errors.
- Static type-checkers make it nearly impossible to manually raise a type error in code because even if that code block hardly gets called, the type-checker would almost always find a situation/codepath that will yield the possibility of that type error occurring at runtime and thus prevent compilation/execution (because a type error was raised).
- Many/most static type checkers evaluate only the type information that can be determined at compile time, but are able to verify that the checked conditions hold for all possible executions of the program, which eliminates the need to repeat type checks every time the program is executed; this can also yield faster startup times.
Dynamic: Dynamic type checking is the process of verifying the type safety of a program at runtime.

Most type-safe languages include some form of runtime/dynamic type checking, even if they also have a static type checker; many useful features or properties are difficult or impossible to verify statically. Languages frequently want to allow a programmer to assert (cast) that a given returned value is, in fact, of a different type; the assumption is that the programmer has awareness of how the code will execute at runtime that the compiler cannot verify. However, humans are fallible creatures, and blindly accepting the cast could yield significant error or corruption. Therefore, a dynamic check is needed to verify that the operation is safe. Consequences:
- In contrast to static type checking, dynamic type checking may cause a program to fail at runtime due to type errors.
- In some programming languages, it is possible to anticipate and recover from these failures – either by error handling or blindly carrying on assuming everything will work out. In others, type checking errors are considered fatal. Because type errors are more difficult to determine in dynamic type checking, it is a common practice to supplement development in these languages with unit testing.
- Dynamic type checking typically results in less optimized code than does static type checking. (Modern runtimes and JIT compilers can offset this, but usually by doing static type analysis at runtime and then generating/compiling code for direct execution--in essence, just pushing the compilation process to the last possible moment before execution.)
- Dynamic type checking forces runtime checks to occur for every execution of the program (instead of just at compile-time).
- It opens up the doors for more powerful language features and makes certain other development practices significantly easier. For example, metaprogramming – especially when using eval functionality – is not impossible in statically-typed languages, but it is much, much easier to work with in dynamically-typed languages.

Note that while these terms usually are applied most directly to programming languages, there's a strong case to be made that they apply to other areas of programming, too, like storage. A relational database, for example, could be said to be a strongly-type-safe (because it insists on only integers in INTEGER columns) and statically-type-checked (since it parses SQL and does type-checking).

"Type tailoring": Type systems evolve too slowly to keep up with the quick evolution of libraries – especially libraries that introduce abstractions. Type tailoring offers a lightweight solution by equipping the core language with an API for modifying the elaboration of surface code into the internal language of the typechecker. Through user-programmable elaboration, tailoring rules appear to improve the precision and expressiveness of the underlying type system. Furthermore, type tailoring cooperates with the host type system by expanding to code that the host then typechecks. In the context of a hygienic metaprogramming system, tailoring rules can even harmoniously compose with one another. Type tailoring has emerged as a theme across several languages and metaprogramming systems, but never with direct support and rarely in the same shape twice. For example, both OCaml and Typed Racket enable forms of tailoring, but in quite different ways. This paper identifies key dimensions of type tailoring systems and tradeoffs along each dimension. It demonstrates the usefulness of tailoring with examples that cover sized vectors, database queries, and optional types. Finally, it outlines a vision for future research at the intersection of types and metaprogramming.

Type-safety

Strong: A strongly-typed language is one in which variables are bound to specific data types, and will result in type errors if types to not match up as expected in the expression. A simple way to think of strong typing is to consider it to be a guarantor of high degrees of type safety.
Weak: A weakly-typed language on the other hand is a language in which variables are not bound to a specific data type; they still have a type, but type safety constraints are lower compared to strongly-typed languages.

Type inference

"Damas-Hindley-Milner inference two ways": Damas-Hindley-Milner (HM) is a type system for Standard ML and the ML-family languages with parametric polymorphism, aka generic functions. It sits at a sweet spot in PL design: the type system is quite expressive, and there are well known type inference algorithms that require absolutely no annotations from the programmer.
"How to implement type inference"

Type theory

"Lambda cube": In mathematical logic and type theory, the λ-cube (also written lambda cube) is a framework introduced by Henk Barendregt[1] to investigate the different dimensions in which the calculus of constructions is a generalization of the simply typed λ-calculus. Each dimension of the cube corresponds to a new kind of dependency between terms and types. Here, "dependency" refers to the capacity of a term or type to bind a term or type. The respective dimensions of the λ-cube correspond to:
- x-axis: types that can bind terms, corresponding to dependent types.
- y-axis: terms that can bind types, corresponding to polymorphism.
- z-axis: types that can bind types, corresponding to (binding) type operators.
The different ways to combine these three dimensions yield the 8 vertices of the cube, each corresponding to a different kind of typed system. The λ-cube can be generalized into the concept of a pure type system.

Beyond

"Moving beyond type systems": Types and effects, where effects are hard to model in types (witness monads!). "What I was quick to realize is that there are two distinct ways of implementing an effect system — static and dynamic. What present day research is trying to figure out are dynamic effects. You create callbacks that handle different state changes in the program accordingly (see that blog post linked earlier if you’re not familiar). A static effect system would be something akin to Rust’s borrow checker but for effects — instead of being able to interact and handle effects through functions, the programming language instead has an effect checker that statically analyzes the program and makes sure that everything holds. This would mean making sure that all effects are annotated properly and that no side effects occur where they shouldn’t. Contrary to the dynamic approach, in a static system you have to prove to the effect checker that your code doesn’t do anything funky. If you can’t prove that some function doesn’t have a side effect, that’s an error. Drawing parallels to the borrow checker once more: if you can’t prove that a variable will be used only once in a closure, the borrow checker will complain. The rest of this post will now focus on implementing a static effect system and how it might be beneficial to future programming languages."

Code/implementations

From PLZoo: Type Inference Library written in TypeScript (used in Heron and Cat).

Reading

Type Theory for All: podcasts about type theory and applications
Type Systems for Memory Safety: "Manual memory management and memory safety used to be incompatible. But it is possible to design programming languages and type systems that provide memory safety at compile time, combining the safety of high-level languages with the performance and low-level control of languages like C." ... "Memory safety is a bundle of things: Null Safety: dereferencing a NULL pointer is bad. This causes a segfault or a NullPointerException or undefined is not a function, depending on your language. This is the easiest one to solve and arguably isn’t about memory at all. Buffer Overflow: indexing past the end of a contiguous chunk of memory. This is solved by storing the length of arrays and checking it. No Use-After-Free: using a chunk of memory after it has been deallocated. This is a source of too many security vulnerabilities to count. Leak Freedom: all memory that is allocated is freed. Data Race Freedom: memory can be used by multiple threads without complex runtime access checks (locks, mutexes etc.) Null safety and buffer overflows are solved by quotidian solutions: option types and range checks. Use-after-free and leak freedom are harder to enforce, and require potentially a lot more compile time machinery."
"How to Write a type checker/type inferrer with good error message" (Source)
Creating the Bolt Compiler: Part 4: An accessible introduction to type theory and implementing a type-checker: This post is split into 2 halves: the first half explains the theory behind type-checkers, and the second half gives you a detailed deep-dive into how it’s implemented in our compiler for our language Bolt. Even if you aren’t interested in writing your own language, the first half is useful if you’ve ever heard about type systems and want to know how they even work!
Type Theory and Functional Programming | Crash Course on Notation in Programming Language Theory, Jeremy G. Siek; LambdaConf 2018 Part 1, Part 2 Slides
Types and Programming Languages: TAPL (rhymes with “apple”), as it’s better known, has a solid introduction to formal semantics in the first few chapters and would be my pick for a starting point on formal semantics. The remainder of the book deals with type systems, which form only one part of programming languages, but it’s the canonical reference if you’re looking to learn about types.
"Building a Typechecker from scratch": Untyped programs are often prone to errors, runtime exceptions, and can make debugging much harder. That’s why many production languages implement a static typechecker — an extra module, which is aimed to increase programs safety and make development simpler.
Basic Type checking (PDF)
Subtype Inference by Example:
- Part 1: Introducing CubiML
- Part 2: Parsing and Biunification
- Part 3: The Typechecker Frontend
- Part 4: The Typechecker Core
- Part 5: Incremental Reachability
- Part 6: Numeric Types and Operators
- Part 7: Spanned Error Messages
- Part 8: Mutability
- Part 9: Match Wildcards, Record Extension, and Row Polymorphism
- Part 10: Let Polymorphism
- Part 11: The Value Restriction and Polymorphic Recursion
- Part 12: Flow Typing and Mixed Comparison Operators
- Part 13: Conditional Flow Constraints, Presence Polymorphism, and Type Level Computation
- Part 14: Type Annotations -- what are they good for?
- Part 15: Monomorphic Type Annotations
What are GADTs (generic abstract data types) and why do they make type inference sad?
On the Relationship Between Static Analysis and Type Theory
Soundness and completeness: with precision
What is soundness (in static analysis)?
What Type Soundness Theorem Do You Really Want to Prove? (YouTube)
- Milner Award Lecture: The Type Soundness Theorem That You Really Want to Prove (and now you can), POPL 2018; Derek Dreyer
Type Theory Behind Glasgow Haskell Compiler Internals (video): LambdaConf 2018; Vitaly Bragilevsky - https://github.com/bravit/tt-ghc-exercises/
Reconstructing TypeScript: I've been building a "document development environment" called Programmable Matter that supports live code embedded in documents, with a simple TypeScript-like programming language. It's been fun figuring out how to implement it—the type system in TypeScript is unusual and very cool! I want to dig into what's cool and unusual about TypeScript by presenting a type checker for a fragment of this language (written in actual TypeScript). I'll start with a tiny fragment and build it up over several posts. In this first post I'll give some background to help make sense of the code.
- Part 0: Intro and Background
- Part 1: bidirectional type checking
- Part 2: functions and function calls
- Part 3: operators and singleton types
- Part 4: union types
- Part 5: intersection types
- Part 6: narrowing
"lambda-me/blog", type inference:
- 2023-09-30 - Implementing kind inference
- 2021-04-13 - Typing polymorphic variants in Giml
- 2021-04-10 - Typing extensible records in Giml
- 2021-04-06 - Giml's type inference engine
"Evolving Languages Faster with Type Tailoring"
"Implementing Type Systems with Macros", which works off of...
"Type Systems as Macros" Cited as Chang, S., Knauth, A. and Greenman, B. 2017. Type systems as macros. Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages - POPL 2017 (Paris, France, 2017), 694–705.

Research

Tags: reading language language development

Last modified 13 August 2025