A type, also known as a data type, is a classification identifying one of various types of "things" used within a programming language or platform. More prosaically, a type describes the possible values of a "thing" (such as a variable), the semantic meaning of that "thing", and how the values of that "thing" can be stored in memory.
A type system is a collection of rules that assign a property called type to various constructs in a computer program, such as variables, expressions, functions or modules, with the end goal of reducing the number of bugs by verifying that data is represented properly throughout a program.
See "What is a type system, anyway": "Type systems provide a way of writing down properties of our programs that we would like to be true, and then mechanically checking that those properties hold. Type systems come in all shapes and sizes; some are more expressive than others. Types are also a great tool to use when actually writing code. Static type systems provide strong guarantees about program behavior at the expense of some friction in programming: dynamic languages make it easy to throw together a prototype, but can become unwieldy or difficult to maintain once the codebase grows. Gradual typing is an increasingly popular method to get the best of both worlds."
Most types break out into the following categories:
Primitives: the atoms of the programming language; usually things like integers, strings, booleans, and so on. The internal representation of a primitive is usually entirely opaque to the programmer without significant effort (like taking memory addresses and poking around).
Floating-points: fractional numbers, usually represented in IEEE 754 (Wikipedia) format.
Object o = new Object();
as allocating an object o
, but in truth o
is a reference to the object created, not the object itself. The reference o
is allocated on the stack, whereas the object o
points to is allocated out of the heap. Pointers/references are almost always word/natural-sized.Composite: these are the various "structures" that a programmer can build up out of primitives. Some are built within the language (arrays within C/C++/Java/C#, etc), some are built using language constructs (class
or struct
within C++/Java/C#/etc). I tend to break these into "simple composite types" (composites provided by the language itself) and "complex composite types" (composites defined by the developer), also sometimes known as "user-defined types" (UDTs).
Abstract data types: these are types that do not have specific implementation, and could thus be represented by multiple concrete types. ADTs are usually defined by interface
s in Java/C#, but one does not need an interface keyword to have an ADT.
Nominal typing: types are unique based on their declared name
Structural typing: types are unique based on the structure that makes up their definition, such that two declared types with the same structure are considered the same type, even if they are declared with different names.
Dependent types: "Dependent types allow types to depend on values; i.e. you can have a type for "list with three integers". Dependent typing, as I understand it, opens up complete programmability of the type system, at the cost of type checking becoming undecidable. These type systems allow you describe the behavior of your programs with incredible precision." (from https://lambdaland.org/posts/2023-01-17_what_is_a_type_system_really/ )
The existence of types is useless without a process of verifying that those types make logical sense in the program so that the program can be executed successfully. Type checking is the process of verifying and enforcing the constraints of types, and it can occur either at ahead-of-(execution-)time (i.e. statically) or at runtime/execution time (i.e. dynamically). Type checking is all about ensuring that the program will not encounter errors due to inappropriate or undefined intersection of types; a type error is an erroneous program behavior in which an operation occurs (or trys to occur) on a particular data type that it’s not meant to occur on.
When a program is considered not type-safe, there is no single standard course of action that happens upon reaching a type error. Many programming languages throw type errors which halt the runtime or compilation of the program, while other languages have built-in safety features to handle a type error and continue running (allowing developers to exhibit poor type safety).
What to know before debating type systems
Static: A language is statically-typed if the type of a variable is known at ahead-of-(execution-)time instead of at runtime. Since most ahead-of-time checking is considered compilation, most statically-type-checked languages are compilation-based languages, though certainly nothing stops an interpreter from statically-type-checking a program when it is first loaded. Consequences include:
Dynamic: Dynamic type checking is the process of verifying the type safety of a program at runtime.
Most type-safe languages include some form of runtime/dynamic type checking, even if they also have a static type checker; many useful features or properties are difficult or impossible to verify statically. Languages frequently want to allow a programmer to assert (cast) that a given returned value is, in fact, of a different type; the assumption is that the programmer has awareness of how the code will execute at runtime that the compiler cannot verify. However, humans are fallible creatures, and blindly accepting the cast could yield significant error or corruption. Therefore, a dynamic check is needed to verify that the operation is safe. Consequences:
In contrast to static type checking, dynamic type checking may cause a program to fail at runtime due to type errors.
In some programming languages, it is possible to anticipate and recover from these failures – either by error handling or blindly carrying on assuming everything will work out. In others, type checking errors are considered fatal. Because type errors are more difficult to determine in dynamic type checking, it is a common practice to supplement development in these languages with unit testing.
Dynamic type checking typically results in less optimized code than does static type checking. (Modern runtimes and JIT compilers can offset this, but usually by doing static type analysis at runtime and then generating/compiling code for direct execution--in essence, just pushing the compilation process to the last possible moment before execution.)
Dynamic type checking forces runtime checks to occur for every execution of the program (instead of just at compile-time).
It opens up the doors for more powerful language features and makes certain other development practices significantly easier. For example, metaprogramming – especially when using eval
functionality – is not impossible in statically-typed languages, but it is much, much easier to work with in dynamically-typed languages.
Note that while these terms usually are applied most directly to programming languages, there's a strong case to be made that they apply to other areas of programming, too, like storage. A relational database, for example, could be said to be a strongly-type-safe (because it insists on only integers in INTEGER columns) and statically-type-checked (since it parses SQL and does type-checking).
Strong: A strongly-typed language is one in which variables are bound to specific data types, and will result in type errors if types to not match up as expected in the expression. A simple way to think of strong typing is to consider it to be a guarantor of high degrees of type safety.
Weak: A weakly-typed language on the other hand is a language in which variables are not bound to a specific data type; they still have a type, but type safety constraints are lower compared to strongly-typed languages.
Note that while these terms usually are applied most directly to programming languages, there's a strong case to be made that they apply to other areas of programming, too, like storage. A relational database, for example, could be said to be a strongly-type-safe (because it insists on only integers in INTEGER columns) and statically-type-checked (since it parses SQL and does type-checking).
The different ways to combine these three dimensions yield the 8 vertices of the cube, each corresponding to a different kind of typed system. The λ-cube can be generalized into the concept of a pure type system.
From PLZoo: Type Inference Library written in TypeScript (used in Heron and Cat).
Type Systems for Memory Safety: "Manual memory management and memory safety used to be incompatible. But it is possible to design programming languages and type systems that provide memory safety at compile time, combining the safety of high-level languages with the performance and low-level control of languages like C." ... "Memory safety is a bundle of things: Null Safety: dereferencing a NULL pointer is bad. This causes a segfault or a NullPointerException or undefined is not a function, depending on your language. This is the easiest one to solve and arguably isn’t about memory at all. Buffer Overflow: indexing past the end of a contiguous chunk of memory. This is solved by storing the length of arrays and checking it. No Use-After-Free: using a chunk of memory after it has been deallocated. This is a source of too many security vulnerabilities to count. Leak Freedom: all memory that is allocated is freed. Data Race Freedom: memory can be used by multiple threads without complex runtime access checks (locks, mutexes etc.) Null safety and buffer overflows are solved by quotidian solutions: option types and range checks. Use-after-free and leak freedom are harder to enforce, and require potentially a lot more compile time machinery."
"How to Write a type checker/type inferrer with good error message" (Source)
Creating the Bolt Compiler: Part 4: An accessible introduction to type theory and implementing a type-checker: This post is split into 2 halves: the first half explains the theory behind type-checkers, and the second half gives you a detailed deep-dive into how it’s implemented in our compiler for our language Bolt. Even if you aren’t interested in writing your own language, the first half is useful if you’ve ever heard about type systems and want to know how they even work!
Type Theory and Functional Programming | Crash Course on Notation in Programming Language Theory, Jeremy G. Siek; LambdaConf 2018 Part 1, Part 2 Slides
Types and Programming Languages: TAPL (rhymes with “apple”), as it’s better known, has a solid introduction to formal semantics in the first few chapters and would be my pick for a starting point on formal semantics. The remainder of the book deals with type systems, which form only one part of programming languages, but it’s the canonical reference if you’re looking to learn about types.
"Building a Typechecker from scratch": Untyped programs are often prone to errors, runtime exceptions, and can make debugging much harder. That’s why many production languages implement a static typechecker — an extra module, which is aimed to increase programs safety and make development simpler.
Basic Type checking (PDF)
Subtype Inference by Example:
What are GADTs (generic abstract data types) and why do they make type inference sad?
What Type Soundness Theorem Do You Really Want to Prove? (YouTube)
Type Theory Behind Glasgow Haskell Compiler Internals (video): LambdaConf 2018; Vitaly Bragilevsky - https://github.com/bravit/tt-ghc-exercises/
Reconstructing TypeScript: I've been building a "document development environment" called Programmable Matter that supports live code embedded in documents, with a simple TypeScript-like programming language. It's been fun figuring out how to implement it—the type system in TypeScript is unusual and very cool! I want to dig into what's cool and unusual about TypeScript by presenting a type checker for a fragment of this language (written in actual TypeScript). I'll start with a tiny fragment and build it up over several posts. In this first post I'll give some background to help make sense of the code.
"lambda-me/blog", type inference:
"Implementing Type Systems with Macros", which works off of...
"Type Systems as Macros" Cited as Chang, S., Knauth, A. and Greenman, B. 2017. Type systems as macros. Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages - POPL 2017 (Paris, France, 2017), 694–705.
Last modified 07 October 2024