Introduction
Lingua.NET is a parser generator that uses code-based grammar definitions. Most parser generators read a text-based grammar specification and emit source code that must then be compiled. Lingua.NET instead uses reflection to extract the grammar from a compiled assembly and build the corresponding parser directly.
A grammar consists of three primary elements:
- Terminals or tokens. These define the individual "words" to be recognized by the parser. For a grammar that specifies a programming language such as C#, terminals would include:
  - Literals such as strings and numeric constants.
  - Identifiers such as variable and procedure names.
  - Operators such as + and -.
  - Punctuation such as { and }.
- Nonterminals. These represent the allowed "phrases" to be recognized by the parser.
- Rules or productions. These specify how nonterminals are constructed from other nonterminals and terminals. For example:
  - expression ::= numeric_constant;
  - expression ::= boolean_expression;
  - boolean_expression ::= expression boolean_operator expression;
  - boolean_operator ::= op_addition;
  - boolean_operator ::= op_subtraction;
Specifying a Grammar
Terminals and nonterminals are defined as classes, and rules as static methods, as illustrated below.
Terminals
[Terminal(@"\d+")]
public class Number : Terminal
{
}
A terminal is a class that:
- Inherits directly or indirectly from Terminal.
- Has a public default constructor.
- Is adorned with a TerminalAttribute containing the regular expression that defines the terminal.
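Because Lingua.NET discovers terminals by reflection, the three requirements above map naturally onto ordinary System.Reflection checks. The following is a minimal sketch of such a scan, not Lingua.NET's own code: Terminal and TerminalAttribute are hypothetical stand-ins declared locally so the example compiles on its own, and TerminalScanner.FindTerminals is an invented helper. Classes that fail a check are simply skipped here; how Lingua.NET itself reacts to them is not described in this document.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

namespace DiscoverySketch
{
    // Hypothetical stand-in for Lingua.NET's Terminal base class.
    public abstract class Terminal { }

    // Hypothetical stand-in for Lingua.NET's TerminalAttribute.
    [AttributeUsage(AttributeTargets.Class, AllowMultiple = false)]
    public sealed class TerminalAttribute : Attribute
    {
        public TerminalAttribute(string pattern) { Pattern = pattern; }
        public string Pattern { get; }
    }

    // The terminal from the example above.
    [Terminal(@"\d+")]
    public class Number : Terminal { }

    public static class TerminalScanner
    {
        // Returns (type, regular expression) for every class meeting the three requirements.
        public static List<(Type Type, string Pattern)> FindTerminals(Assembly assembly)
        {
            var result = new List<(Type Type, string Pattern)>();
            foreach (Type type in assembly.GetTypes())
            {
                // Requirement 1: derives directly or indirectly from Terminal (and is instantiable).
                if (type.IsAbstract || !typeof(Terminal).IsAssignableFrom(type)) continue;

                // Requirement 2: has a public parameterless constructor.
                if (type.GetConstructor(Type.EmptyTypes) == null) continue;

                // Requirement 3: carries a TerminalAttribute holding the defining regular expression.
                TerminalAttribute attribute = type.GetCustomAttribute<TerminalAttribute>(inherit: false);
                if (attribute == null) continue;

                result.Add((type, attribute.Pattern));
            }
            return result;
        }
    }
}
```

For example, calling TerminalScanner.FindTerminals(typeof(Number).Assembly) yields an entry pairing Number with the pattern \d+, while the abstract base class itself is excluded.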
Nonterminals
public class BooleanOperator : Nonterminal
{
}
A nonterminal is a class that:
- Inherits directly or indirectly from Nonterminal.
- Has a public default constructor.
Rules
public class BooleanOperator : Nonterminal
{
    public static void Rule(BooleanOperator result, OperatorAddition op)
    {
        // Code
    }
}
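A rule is a static method that:
- Has at least one parameter.
- Takes a nonterminal as its first parameter. This is the nonterminal the rule produces (the left-hand side of the production).
- Takes a terminal or nonterminal for each remaining parameter. These form the right-hand side of the production, in order.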
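The rule requirements above are purely about the method's signature, so they can also be checked with reflection. This is a minimal sketch under stated assumptions, not Lingua.NET's implementation: Terminal and Nonterminal are hypothetical stand-ins, RuleChecker is an invented helper, and the sketch checks only the signature (the document does not say whether Lingua.NET also requires the method to be named Rule).

```csharp
using System;
using System.Reflection;

namespace RuleSketch
{
    // Hypothetical stand-ins for the Lingua.NET base classes.
    public abstract class Terminal { }
    public abstract class Nonterminal { }

    // Terminal from the example (its TerminalAttribute is omitted; only the type matters here).
    public class OperatorAddition : Terminal { }

    public class BooleanOperator : Nonterminal
    {
        // boolean_operator ::= op_addition;
        public static void Rule(BooleanOperator result, OperatorAddition op) { }
    }

    public static class RuleChecker
    {
        // True when the method satisfies the rule requirements listed above.
        public static bool IsRule(MethodInfo method)
        {
            if (!method.IsStatic) return false;
            ParameterInfo[] parameters = method.GetParameters();

            // At least one parameter.
            if (parameters.Length == 0) return false;

            // First parameter: the nonterminal being produced.
            if (!typeof(Nonterminal).IsAssignableFrom(parameters[0].ParameterType)) return false;

            // Remaining parameters: terminals or nonterminals forming the right-hand side.
            for (int i = 1; i < parameters.Length; i++)
            {
                Type t = parameters[i].ParameterType;
                if (!typeof(Terminal).IsAssignableFrom(t) && !typeof(Nonterminal).IsAssignableFrom(t))
                    return false;
            }
            return true;
        }

        // Formats a rule method as a production; assumes IsRule(method) is true.
        public static string Describe(MethodInfo method)
        {
            ParameterInfo[] parameters = method.GetParameters();
            var rhs = new string[parameters.Length - 1];
            for (int i = 1; i < parameters.Length; i++)
                rhs[i - 1] = parameters[i].ParameterType.Name;
            return parameters[0].ParameterType.Name + " ::= " + string.Join(" ", rhs) + ";";
        }
    }
}
```

Applied to BooleanOperator.Rule, Describe produces "BooleanOperator ::= OperatorAddition;", which mirrors the boolean_operator ::= op_addition production from the introduction.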
Using Lingua.NET
Once a grammar has been defined, use Lingua.NET to generate a terminal reader and parser.
Load Grammar
Construct a Grammar object and read in the grammar defined in an assembly (here, the assembly that contains the App class).
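using System.Reflection; // for Assembly

// Locate the assembly that defines the grammar's terminals, nonterminals and rules.
Assembly assembly = Assembly.GetAssembly(typeof(App));
Grammar grammar = new Grammar();
grammar.Load(assembly);
grammar.LoadRules(assembly);
grammar.Resolve();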
Generate Parser and Terminal Reader
The grammar is used to construct a terminal reader and parser.
ITerminalReaderGenerator terminalReaderGenerator = new TerminalReaderGenerator();
TerminalReaderGeneratorResult terminalReaderGeneratorResult = terminalReaderGenerator.GenerateTerminalReader(grammar);
ITerminalReader terminalReader = terminalReaderGeneratorResult.TerminalReader;
IParserGenerator parserGenerator = new ParserGenerator();
ParserGeneratorResult parserGeneratorResult = parserGenerator.GenerateParser(grammar);
IParser parser = parserGeneratorResult.Parser;
Open Terminal Reader and Parse Terminal Stream
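Open the terminal reader on the text to be parsed (here, txtExpression.Text), then pass the reader to the parser. The parser returns an instance of the grammar's start nonterminal, here a class named Start.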
terminalReader.Open(txtExpression.Text);
Start result = parser.Parse(terminalReader);
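This document does not describe how the generated terminal reader works internally. As a rough mental model only, the hypothetical SketchTerminalReader below splits input into terminals by trying each terminal's regular expression at the current position and keeping the longest match. The longest-match policy, the whitespace skipping, the error handling and all names here are assumptions for illustration, not Lingua.NET's behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

namespace ReaderSketch
{
    public static class SketchTerminalReader
    {
        // Splits input into (terminal name, matched text) pairs.
        public static List<(string Name, string Text)> Tokenize(
            string input, IReadOnlyList<(string Name, string Pattern)> terminals)
        {
            var tokens = new List<(string Name, string Text)>();
            int position = 0;
            while (position < input.Length)
            {
                // Assumption: whitespace between terminals is ignored.
                if (char.IsWhiteSpace(input[position])) { position++; continue; }

                string bestName = null;
                int bestLength = 0;
                foreach (var (name, pattern) in terminals)
                {
                    // \G anchors the match at the current position.
                    Match match = new Regex(@"\G(?:" + pattern + ")").Match(input, position);

                    // Keep the longest non-empty match; earlier terminals win ties.
                    if (match.Success && match.Length > bestLength)
                    {
                        bestName = name;
                        bestLength = match.Length;
                    }
                }

                if (bestName == null)
                    throw new FormatException($"No terminal matches at position {position}.");

                tokens.Add((bestName, input.Substring(position, bestLength)));
                position += bestLength;
            }
            return tokens;
        }
    }
}
```

With the patterns \d+, \+ and - for Number, OperatorAddition and a hypothetical OperatorSubtraction, the input "12 + 3-4" splits into five terminals: 12, +, 3, - and 4. A real generated reader can precompile its expressions once instead of constructing a Regex per attempt as this sketch does.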