Tools
- ANTLR
- APG: a recursive-descent parser generator using a variation of Augmented BNF that its authors call Superset Augmented BNF. ABNF is a variant of BNF designed to better support bidirectional communications protocols. APG also supports additional operators, such as syntactic predicates and custom user-defined matching functions. It can generate parsers in C/C++, Java and JavaScript. Support for JavaScript seems the most complete and up to date: it has a few more features, and the documentation says it is designed to have the look and feel of JavaScript RegExp.
- Bison/yacc
- BYACC/J
- Canopy: PEG parser compiler targeting Java, JavaScript, Python and Ruby. It takes a file describing a parsing expression grammar and compiles it into a parser module in the target language. The generated parsers have no runtime dependency on Canopy itself.
- Coco/R
- CookCC
- CUP (Untouched in 2016)
- Funcj.parser: a Java parser combinator framework for constructing LL(1) parsers. It is based primarily on the “Deterministic, error-correcting combinator parsers” paper by S.D. Swierstra & L. Duponcheel, and draws inspiration from various parsers in the Haskell world, as well as the ParsecJ library. It is part of a framework of functional Java libraries. The library aims to provide a simple internal domain-specific language for expressing grammars.
- Gold
- Grammatica
- Jacc
- JavaCC
- JParsec: a Java port of Haskell's Parsec library.
- JFlex
- Lambda PEG parser: a Java library providing easy-to-use, powerful and elegant parsing of arbitrary text. It is based on Parsing Expression Grammars (PEGs), which are similar to simple recursive descent parsers as you would write them by hand, with the addition of backtracking (try something; if it fails, try something else) and support for left recursion. The project is similar to parboiled, with the important difference that the rule methods are executed directly and can return any value. This reduces the conceptual distance from hand-coded parsers and thus makes the parser easier to learn.
- Lemon
- ModelCC
- Mouse: a tool to transcribe PEG into an executable parser written in Java
- Myna: an efficient and easy-to-use parsing library for JavaScript, written in TypeScript 2.0 and targeting ECMAScript 5.1.
- Parboiled: a recursive descent PEG parser implementation that operates on PEG rules you specify.
- PetitParser: combines ideas from scannerless parsing, parser combinators, parsing expression grammars and packrat parsers to model grammars and parsers as objects that can be reconfigured dynamically. All the information is written in the source code, but the source code is divided into two files: in one you define the grammar, and in the other you define the actions corresponding to the various elements. The idea is that this should allow you to redefine grammars dynamically. While it is smartly engineered, it is debatable whether it is also smartly designed.
- Rats!: a parser generator that is part of xtc. (Appears to be down.)
- Rekex: a new parser generator with a novel approach that flips writing a parser on its head. With traditional parser generators you write a grammar, and the generated parser then produces a parse tree. One issue with this approach is that the parse tree is rarely what you want, so you need to post-process it to create a data structure that fits your program. This can be a long process in itself, particularly if you are dealing with a large grammar. You might also be forced to optimize the grammar for parsing performance, which leads to a convoluted parse tree, so you then have to spend more time creating a sensible AST for your end users. Rekex changes the process by designing the parser starting from the desired AST.
- Tree-sitter
- syntax: Syntactic analysis toolkit, language-agnostic parser generator.
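
The PEG idea described in the Lambda PEG parser entry (rules written like hand-coded recursive descent, with backtracking, where each rule runs directly and returns a value) can be sketched in a few lines. This is a minimal illustration in Python, not the API of any library listed above: the names `number`, `literal`, `choice`, `parens`, `expr` and `parse` are all hypothetical, and the sketch does not attempt left-recursion support.

```python
from typing import Callable, Optional, Tuple

# Assumption for this sketch: a rule is a function (text, pos) that returns
# (value, new_pos) on success and None on failure. It never mutates state,
# so "backtracking" is simply retrying another rule at the same position.
Result = Optional[Tuple[object, int]]
Rule = Callable[[str, int], Result]


def number(text: str, pos: int) -> Result:
    """number <- [0-9]+   (returns the matched digits as an int)"""
    end = pos
    while end < len(text) and text[end].isdigit():
        end += 1
    if end == pos:
        return None
    return int(text[pos:end]), end


def literal(token: str) -> Rule:
    """Build a rule that matches one fixed string."""
    def rule(text: str, pos: int) -> Result:
        if text.startswith(token, pos):
            return token, pos + len(token)
        return None
    return rule


def choice(*rules: Rule) -> Rule:
    """PEG ordered choice: try each rule at the same position, first success wins."""
    def rule(text: str, pos: int) -> Result:
        for candidate in rules:
            result = candidate(text, pos)
            if result is not None:
                return result
        return None
    return rule


def parens(text: str, pos: int) -> Result:
    """parens <- '(' expr ')'   (returns the value of the inner expression)"""
    opened = literal("(")(text, pos)
    if opened is None:
        return None
    inner = expr(text, opened[1])
    if inner is None:
        return None
    closed = literal(")")(text, inner[1])
    if closed is None:
        return None
    return inner[0], closed[1]


# atom <- number / parens
atom = choice(number, parens)


def expr(text: str, pos: int) -> Result:
    """expr <- atom ('+' atom)*   (returns the sum directly, no parse tree)"""
    first = atom(text, pos)
    if first is None:
        return None
    total, pos = first
    while True:
        plus = literal("+")(text, pos)
        if plus is None:
            break
        operand = atom(text, plus[1])
        if operand is None:
            break  # backtrack: pos still points before the '+'
        total += operand[0]
        pos = operand[1]
    return total, pos


def parse(text: str) -> int:
    """Parse the whole input or raise ValueError."""
    result = expr(text, 0)
    if result is None or result[1] != len(text):
        raise ValueError(f"could not parse {text!r}")
    return result[0]
```

Calling `parse("1+(2+3)")` evaluates the expression while parsing, so no separate parse tree has to be post-processed. That is also the concern the Rekex entry raises about traditional generators, although Rekex itself takes a different route (starting from the AST types).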
Articles
Theory and approaches
Pratt parsing
Tree-sitter: a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. Tree-sitter aims to be:
- General enough to parse any programming language
- Fast enough to parse on every keystroke in a text editor
- Robust enough to provide useful results even in the presence of syntax errors
- Dependency-free so that the runtime library (which is written in pure C) can be embedded in any application
Tags:
reading
language development
tool
library
Last modified 28 April 2025